Reason maintenance is a knowledge representation approach to efficient handling of inferred information that is explicitly stored. Reason maintenance distinguishes between base facts, which can be defeated, and derived facts. As such it differs from belief revision which, in its basic form, assumes that all facts are equally important. Reason maintenance was originally developed as a technique for implementing problem solvers. It encompasses a variety of techniques that share a common architecture: two components—a reasoner and a reason maintenance system—communicate with each other via an interface. The reasoner uses the reason maintenance system to record its inferences and justifications of ("reasons" for) the inferences. The reasoner also informs the reason maintenance system which are the currently valid base facts (assumptions). The reason maintenance system uses the information to compute the truth value of the stored derived facts and to restore consistency if an inconsistency is derived. == Truth maintenance system == A truth maintenance system, or TMS, is a knowledge representation method for representing both beliefs and their dependencies and an algorithm called the "truth maintenance algorithm" that manipulates and maintains the dependencies. The name truth maintenance is due to the ability of these systems to restore consistency. A truth maintenance system maintains consistency between old believed knowledge and current believed knowledge in the knowledge base (KB) through revision. If the current believed statements contradict the knowledge in the KB, then the KB is updated with the new knowledge. It may happen that the same data will again be believed, and the previous knowledge will be required in the KB. If the previous data are not present, but may be required for new inference. But if the previous knowledge was in the KB, then no retracing of the same knowledge is needed. The use of TMS avoids such retracing; it keeps track of the contradictory data with the help of a dependency record. This record reflects the retractions and additions which makes the inference engine (IE) aware of its current belief set. == Algorithm == Each statement having at least one valid justification is made a part of the current belief set. When a contradiction is found, the statement(s) responsible for the contradiction are identified and the records are appropriately updated. This process is called dependency-directed backtracking. The TMS algorithm maintains the records in the form of a dependency network. Each node in the network is an entry in the KB (a premise, antecedent, or inference rule etc.) Each arc of the network represent the inference steps through which the node was derived. A premise is a fundamental belief which is assumed to be true. They do not need justifications. The set of premises are the basis from which justifications for all other nodes will be derived. == Justification == There are two types of justification for a node. They are: Support list [SL] Conditional proof (CP) == Examples == Many kinds of truth maintenance systems exist. Two major types are single-context and multi-context truth maintenance. In single context systems, consistency is maintained among all facts in memory (KB) and relates to the notion of consistency found in classical logic. Multi-context systems support paraconsistency by allowing consistency to be relevant to a subset of facts in memory, a context, according to the history of logical inference. This is achieved by tagging each fact or deduction with its logical history. Multi-agent truth maintenance systems perform truth maintenance across multiple memories, often located on different machines. de Kleer's assumption-based truth maintenance system (ATMS, 1986) was utilized in systems based upon KEE on the Lisp Machine. The first multi-agent TMS was created by Mason and Johnson. It was a multi-context system. Bridgeland and Huhns created the first single-context multi-agent system.
Region Based Convolutional Neural Networks
Region-based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision, and specifically object detection and localization. The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and also the category (e.g. car or pedestrian) of the object. In general, R-CNN architectures perform selective search over feature maps outputted by a CNN. R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera, locating text in an image, and enabling object detection in Google Lens. Mask R-CNN is also one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks. == History == The following covers some of the versions of R-CNN that have been developed. November 2013: R-CNN. April 2015: Fast R-CNN. June 2015: Faster R-CNN. March 2017: Mask R-CNN. December 2017: Cascade R-CNN is trained with increasing Intersection over Union (IoU, also known as the Jaccard index) thresholds, making each stage more selective against nearby false positives. June 2019: Mesh R-CNN adds the ability to generate a 3D mesh from a 2D image. == Architecture == For review articles see. === Selective search === Given an image (or an image-like feature map), selective search (also called Hierarchical Grouping) first segments the image by the algorithm in (Felzenszwalb and Huttenlocher, 2004), then performs the following: Input: (colour) image Output: Set of object location hypotheses L Segment image into initial regions R = {r1, ..., rn} using Felzenszwalb and Huttenlocher (2004) Initialise similarity set S = ∅ foreach Neighbouring region pair (ri, rj) do Calculate similarity s(ri, rj) S = S ∪ s(ri, rj) while S ≠ ∅ do Get highest similarity s(ri, rj) = max(S) Merge corresponding regions rt = ri ∪ rj Remove similarities regarding ri: S = S \ s(ri, r∗) Remove similarities regarding rj: S = S \ s(r∗, rj) Calculate similarity set St between rt and its neighbours S = S ∪ St R = R ∪ rt Extract object location boxes L from all regions in R === R-CNN === With R-CNN, prediction follows a two-step process. A preprocessing selective search step generates a large set of candidate objects (typically as many as 2000), known as regions of interest (ROI). These are forwarded to a CNN, which predicts an object class score and bounding box estimate, independently for each ROI. Importantly, the ROIs are heavily filtered to remove excess candidates. This is achieved using two mechanism. Filtering begins by removing ROIs assigned to the background category. This is a specialized category, which is scored by the CNN alongside other categories. An unfortunate reality is that remaining ROIs typically suffer from heavy duplication. Namely, multiple ROIs that cover same objects in the image are all assigned non-background categories. This is resolved by a heuristic non-maximum suppression (NMS) step. === Fast R-CNN === While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a ROIPooling module, which slices out each ROI from the network's output tensor, reshapes it, and classifies it. As in the original R-CNN, the Fast R-CNN uses selective search to generate its region proposals. === Faster R-CNN === While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself. === Mask R-CNN === While previous versions of R-CNN focused on object detections, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.
AI: When a Robot Writes a Play
AI: When a Robot Writes a Play (in Czech: AI: Když robot píše hru) is a 2021 experimental theatre play, where 90% of its script was automatically generated by artificial intelligence (the GPT-2 language model). The play is in Czech language, but an English version of the script also exists. == Creation == The play is the first result of the THEaiTRE research project, aiming to commemorate the centenary of the R.U.R. play by Karel Čapek by investigating to what extent artificial intelligence could be used to create theatre play scripts. The script of the play was created using the THEaiTRobot tool, based on the GPT-2 language model. First, the play dramaturge, David Košťák, described the initial setting of each scene in a few sentences, and wrote the first line for each character. Next, THEaiTRobot suggested a continuation of the script, which the dramaturge could use, reject, or use part of it and let the tool generate a new continuation. Another option was to manually insert another line or a scenic remark. The script was generated in English and was automatically translated to Czech by the state-of-the-art CUBBITT machine translation tool. The resulting script was then further post-edited by the dramaturge. The resulting script was made freely available for non-commercial use both in English and in Czech, with marked manually inserted texts and manual edits. The analysis shows that 90% of the English script is automatically generated, with 10% manually written or manually post-edited. In the Czech script, a larger amount of edits were made, but the analysis claims that these additional edits are corrections of errors of the automated translation and stylistic corrections which do not change the meaning of the lines as represented by the English script, but rather bring the Czech script closer to the English one. == Characters == The play contains 9 characters. The Robot appears in all the scenes, while each of the other characters appears in only one scene. Robot – The lead character, a male humanoid robot. Master – An old man, the creator of the Robot. Boy – A schoolboy. Masseuse – A sex worker in a brothel. Stranger – An engineer. Man. Psychologist. Administrator – A female clerk at an employment agency. Actress – A film actress and a model in a robot-like costume. == Plot == The play is composed of 8 scenes. It tells the story of a humanoid robot, who encounters 8 other characters and engages into various typically human situations and activities, related to death, love, sex, violence, etc. The individual scenes are not tightly linked, but there are some linking points, such as the central character of the robot or some repeated and developing themes, such as the robot's search for love. The scenes often contain some absurd turns and it is often hard to find sense in them. It is therefore a very complicated piece interpretationally, requiring the director and the actors to invest a lot of effort and creativity in finding a meaningful interpretation which would not deviate from the script. In the interpretation by Švanda theatre, who premiered the play and who also participated on the creation of the script, the scenes typically contain non-verbally expressed content which can add a lot to the meaning of the scene compared to what is contained in the actual script (as the script only contains the lines said by the characters). === Scene 1: Death === The play opens by the Robot parting with his dying Master. The Master gives the Robot several last lessons and talks with him about death, soul, and love. === Scene 2: Sense of Humour === In the second scene, the Robot meets a sad and angry Boy, who complains that he wants to go to school, that his girlfriend is crazy, that he wants to buy a car, etc. The Robot tries to help the Boy by giving him advice, but the Boy's reactions are quite negative and irritated. The Boy then repeatedly asks the Robot to tell him a joke; the Robot keeps refusing, but ultimately tells the following joke: When you are dead. When your children are dead. When your grandchildren are dead, I will be still alive. === Scene 3: Nightclub === The Robot wants to feel pleasure, so he goes to a "night club" (a brothel), where he meets a "Masseuse" (a prostitute). The Robot is initially "a bit cold", but eventually manages to enjoy the experience and falls in love with the Masseuse. In the Švanda theatre performance, the Robot and the Masseuse seem to have a sort of virtual sex without touching each other, reminiscent of the sex scene in Demolition Man. === Scene 4: Fear of the Dark === It is the night. The Robot is standing under a lamp, unable to move away from the light as he finds that he is afraid of the dark. He meets a Stranger, an engineer who tells him that robots don't have feelings and that people cannot be trusted, and keeps hurting him. In the Švanda theatre performance, the Man repeatedly zaps the Robot with some kind of electric pulse. === Scene 5: Killer Robot === A Man approaches the Robot and repeatedly asks him to kill him. Instead, the Robot sticks a finger into the Man's anus, which leads to an argument between the Man and the Robot. === Scene 6: Burn Out === The Robot meets a Psychologist, who keeps asking him lots of questions regarding his life, burnout feeling, love, relationships, and emotions. They also talk about the Robot using a device called emotion machine which helps him to get rid of stress. === Scene 7: Search for Job === The Robot comes to an employment agency. He meets an Administrator and asks her to help him find a job. He expresses the wish to become an actor, and talks about his experience as a clown. He reveals his name to be Troy McClure, which is a character from The Simpsons who is an actor. In the Švanda theatre performance, the Administrator starts to seduce the Robot once his name is revealed, which he keeps ignoring; the Administrator then becomes irritated. === Scene 8: Love at First Sight === The Robot meets a human Actress in a robotic costume and falls in love with her immediately. The Actress is first reluctant, but the Robot manages to seduce her and she also falls in love with him. The Robot tells her about a binary world, in which he lives and where he will also take her. Ultimately, the Actress agrees, and the whole play concludes by the Robot and the Actress promising each to other to always be together. In the Švanda theatre performance, the Robot does not have a physical body in this scene, we can only hear his voice and see a pulsating light (based on the line in the script where the Robot says: "I have no body. So I don't need to wear clothes. You can't see me, you only hear me."), and the Actress eventually also agrees to lose her physical body so that she can be with the Robot forever. == Theatrical performances == The play premiered on 26 February 2021 in Švanda Theatre in Prague, Czech Republic, directed by Daniel Hrbek. Due to the COVID-19 pandemic, the play was not played in front of a live audience, but it was broadcast online, in Czech language with English subtitles. The play was followed by a panel discussion by the project members and experts on artificial intelligence. The premiere was viewed by 13,498 spectators worldwide. A short trailer of the premiere is available on YouTube. In 2021, after the opening of the theatres in the Czech Republic to spectators, the play can be viewed at Švanda Theatre. The performance takes approximately 60 minutes, and is followed by a discussion of the creators with the audience. The derniere is planned for 4 February 2023. == Reception == The play received a number of reviews, both in its country of origin as well as internationally. It is praised as first of its kind, although some reviewers note the similarity to previous works, such as the musical Beyond the Fence, the play Lifestyle of the Richard and Family, or the short movie Sunspring; however, these works used less advanced technology, and either were very short (Sunspring) or necessitated a larger amount of human interventions. The reviewers note that the script is far from perfect, with many inconsistencies and nonsensical parts, and conclude that the technology is definitely not yet ready to replace human authors; however, some find some parts of the script frighteningly human-like. The amount of human intervention is a somewhat controversial topic, with some reviewers finding the human influence too large (especially in interpreting the script and putting the play on scene), while others feel that a greater amount of human intervention would have been favorable as this could greatly improve the quality of the play. The reviews also frequently comment on the amount of sex, violence and strong language in the play; this can be attributed to the method used for creating the script, where the GPT-2 language model reflects topics and language common in the human-written articles on the internet that were used to train the model. Furthermore, some r
Sedona Canada Principles
The Sedona Canada Principles are a set of authoritative guidelines published by The Sedona Conference to aid members of the Canadian legal community involved in the identification, collection, preservation, review and production of electronically stored information (ESI). The principles were drafted by a small group of lawyers, judges and technologists called the Sedona Working Group 7 or Sedona Canada. Sedona Canada is an offshoot of The Sedona Conference which is an American "non-profit ... research and educational institute dedicated to the advanced study of law and policy in the areas of antitrust law, complex litigation, and intellectual property rights". == Background == Civil procedure in Canada is jurisdictional with each province following its own rules of civil procedure. However, each province must address the fact that due to the advancement of technology the discovery process enshrined in the rules of civil procedure can be potentially derailed due to the sheer volume of electronically stored information (ESI). When dealing with litigation matters that involve electronically stored information (ESI), the discovery process is commonly called e-discovery. The problems associated with e-discovery in Canada led to the creation of the Sedona Canada Principles. Rule 29.1.03(4) of the wikibooks:Ontario Rules of Civil Procedure specifically refers to the Sedona Canada Principles in referencing Principles re Electronic Discovery although it has been reported that this rule has been largely ignored in practice. == Summary == The Sedona Canada Principles largely refer to the processes found in the Electronic Discovery Reference Model. The principles urge proportionality due to the potentially enormous volumes of documents that may be discoverable when dealing with ESI. They also encourage good faith in the document preservation stage and regular meetings between parties to discuss the scope of the litigation. Parties are urged to be aware of the potential costs involved in producing relevant ESI but are advised that only reasonably accessible ESI need be produced. The principles stipulate that parties should not be required to search for or collect deleted material unless there is an agreement or court order related to those terms. The use of electronic tools and processes such as data sampling and web harvesting are acceptable practices. Parties are encouraged to agree early in the litigation process on production format required for the exchange of relevant documents as part of the discovery process (native files, pdf, tiff, metadata requirements etc.). Agreements or direction should be sought, if necessary, with respect to privilege or other confidential information related to production of electronic documents and data. Parties should be aware that legal precedents can be formed as a result of e-discovery practices and sanctions can be considered for a party's failure to meet their discovery obligations unless it can be demonstrated that the failure was not intentional. All parties must bear the “reasonable” costs associated with e-discovery but other arrangements can be agreed upon by the parties or by court order. == Caselaw == In Warman v. National Post Company proportionality was at issue in a case where the plaintiff was suing the defendant for libel. A motion was brought by the defendant to have the plaintiff provide a mirror image of his hard drive in an effort to prove an internet article was indeed authored by the plaintiff. Issues of proportionality and the work of the Sedona Conference and Sedona Canada Principles were factored in to the decision to grant the defendant only limited access to the hard drive. In Innovative Health Group Inc. v. Calgary Health Region the plaintiff's legal obligation to produce imaged hard drives is in question. Justice Conrad refers to the advice of Sedona Canada on proportionality and problems associated with time and expense related to the difficulties associated with electronically stored information. In York University v. Michael Markicevic Justice Brown specifically refers to the need for the parties to agree upon a formal e-discovery plan to be drafted in consultation with Sedona Canada Principles. In Friends of Lansdowne v. Ottawa Master MacLeod refers to the need for Sedona Canada principles and states “This is particularly true in the current information age when e-mail is ubiquitous and multiple copies or variants of messages may be held on various kinds of data storage devices including individual hard drives, e-mail and Blackberry servers. Even documents that ultimately exist in paper form normally begin their life on computers and negotiations frequently involve exchanges of electronic drafts. To find every scrap of paper and every electronic trace of relevant information has become a nightmarish task that threatens to render any kind of litigation extravagantly expensive.” == Criticism == Critics of the Sedona Canada Principles believe they should address system integrity and that the true history of any file preserved cannot be identified without proof of the integrity of the electronic record systems management it comes from. Other criticism is more directed to the Sedona Canada working group and complaints that it is insular and irrelevant.
Parchive
Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used for protecting any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection. As of 2015, PAR1 is obsolete, PAR2 is mature for widespread use, and PAR3 is a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been worked on since April 28, 2019 by PAR2 specification author Michael Nahas. An alpha version of the PAR3 specification has been published on January 29, 2022 while the program itself is being developed. == History == Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short in length and limited to 7-bit ASCII text. Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8 bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained. With the introduction of Parchive, parity files could be created that were then uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (.par in version 1 and .par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available). In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download. Version 1 became widely used on Usenet, but it did suffer some limitations: It was restricted to handle at most 255 files. The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.) The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based. It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience. In January 2002, Howard Fukada proposed that a new Par2 specification should be devised with the significant changes that data verification and repair should work on blocks of data rather than whole files, and that the algorithm should switch to using 16 bit numbers rather than the 8 bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002. Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. Abandoned since 2004, Paul Houle created phpar2 to supersede par2cmdline. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe (which is partially based on par2cmdline's optimization techniques) to use as MultiPar's backend engine. == Versions == Versions 1 and 2 of the file format are incompatible. (However, many clients support both.) === Par1 === For Par1, the files f1, f2, ..., fn, the Parchive consists of an index file (f.par), which is CRC type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). Given all of the original files except for one (for example, f2), it is possible to create the missing f2 given all of the other original files and any one of the parity volumes. Alternatively, it is possible to recreate two missing files from any two of the parity volumes and so forth. Par1 supports up to a total of 256 source and recovery files. === Par2 === Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the "+" in the filename indicates how many blocks it contains, and the number after "vol" indicates the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be by downloading filename.vol003+04.PAR2. However, due to the redundancy, filename.vol007+06.PAR2 is also acceptable. There is also an index file filename.PAR2, it is identical in function to the small index file used in PAR1. Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it. === Par3 === The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by specification owner Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained fork par2cmdline on January 29, 2019. The discussion led to a new format which is also named as Par3. The new Par3 format's specification is published on GitHub, but remains being an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of Par2 specification, with the help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including support for: More than 216 files and more than 216 blocks. Packing small files into one block, as well as deduplication when a block appears in multiple files. UTF-8 file names. File permissions, hard links, symbolic/soft links, and empty directories. Embedding PAR data inside other formats, like ZIP archives or ISO disk images. "Incremental backups", where a user creates recovery files for some file or folder, change some data, and create new recovery files reusing some of the older files. More error correction code algorithms (such as LDPC and sparse random matrix). BLAKE3 hashes, dropping support for the MD5 hashes used in PAR2. == Software == === Multi-platform === par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86 based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb. Original par2cmdline — (obsolete). Available in the FreeBSD Ports system as par2cmdline. par2cmdline maintained fork by BlackIkeEagle. par2cmdline-mt is another multithreaded version of par2cmdline using OpenMP, GPLv2, or later. Currently merged into BlackIkeEagle's fork and maintained there. ParPar (CC0) is a high performance, multithreaded PAR2 client and Node.js library. Does not support verifying or repair, it can currently only create PAR2 archives. par2deep (LGPL-3.0) — Produce, verify and repair par2 files recursively, both on the command line as well as with the aid of a graphical user interface. It is available in the Python Package Index system as par2deep. par2cron (MIT License) is an o
Kleene star
In formal language theory, the Kleene star (or Kleene operator or Kleene closure) refers to two related unary operations, that can be applied either to an alphabet of symbols or to a formal language, a set of strings (finite sequences of symbols). The Kleene star operator on an alphabet V generates the set V of all finite-length strings over V, that is, finite sequences whose elements belong to V; in mathematics, it is more commonly known as the free monoid construction. The Kleene star operator on a language L generates another language L, the set of all strings that can be obtained as a concatenation of zero or more members of L. In both cases, repetitions are allowed. The Kleene star operators are named after American mathematician Stephen Cole Kleene, who first introduced and widely used it to characterize automata for regular expressions. == Of an alphabet == Given an alphabet V {\displaystyle V} , define V 0 = { ε } {\displaystyle V^{0}=\{\varepsilon \}} (the set consists only of the empty string), V 1 = V , {\displaystyle V^{1}=V,} and define recursively the set V i + 1 = { w v : w ∈ V i and v ∈ V } {\displaystyle V^{i+1}=\{wv:w\in V^{i}{\text{ and }}v\in V\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by appending the single character v {\displaystyle v} to the end of w {\displaystyle w} . Here, V i {\displaystyle V^{i}} can be understood to be the set of all strings of length exactly i {\displaystyle i} , with characters from V {\displaystyle V} . The definition of Kleene star on V {\displaystyle V} is V ∗ = ⋃ i ≥ 0 V i = V 0 ∪ V 1 ∪ V 2 ∪ V 3 ∪ V 4 ∪ ⋯ . {\displaystyle V^{}=\bigcup _{i\geq 0}V^{i}=V^{0}\cup V^{1}\cup V^{2}\cup V^{3}\cup V^{4}\cup \cdots .} == Of a language == Given a language L {\displaystyle L} (any finite or infinite set of strings), define L 0 = { ε } {\displaystyle L^{0}=\{\varepsilon \}} (the language consisting only of the empty string), L 1 = L , {\displaystyle L^{1}=L,} and define recursively the set L i + 1 = { w v : w ∈ L i and v ∈ L } {\displaystyle L^{i+1}=\{wv:w\in L^{i}{\text{ and }}v\in L\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by concatenating w {\displaystyle w} and v {\displaystyle v} . Here, L i {\displaystyle L^{i}} can be understood to be the set of all strings that can be obtained by concatenating exactly i {\displaystyle i} strings from L {\displaystyle L} , allowing repetitions. The definition of Kleene star on L {\displaystyle L} is L ∗ = ⋃ i ≥ 0 L i = L 0 ∪ L 1 ∪ L 2 ∪ L 3 ∪ L 4 ∪ ⋯ . {\displaystyle L^{}=\bigcup _{i\geq 0}L^{i}=L^{0}\cup L^{1}\cup L^{2}\cup L^{3}\cup L^{4}\cup \cdots .} == Kleene plus == In some formal language studies, (e.g. AFL theory) a variation on the Kleene star operation called the Kleene plus is used. The Kleene plus omits the V 0 {\displaystyle V^{0}} or L 0 {\displaystyle L^{0}} term in the above unions. In other words, the Kleene plus on V {\displaystyle V} is V + = ⋃ i ≥ 1 V i = V 1 ∪ V 2 ∪ V 3 ∪ ⋯ , {\displaystyle V^{+}=\bigcup _{i\geq 1}V^{i}=V^{1}\cup V^{2}\cup V^{3}\cup \cdots ,} or V + = V ∗ V . {\displaystyle V^{+}=V^{}V.} == Examples == Example of Kleene star applied to a set of strings: {"ab","c"} = { ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", "abcc", "cabab", "cabc", "ccab", "ccc", ...}. Example of Kleene star applied to a set of strings without the prefix property: {"a","ab","b"} = { ε, "a", "ab", "b", "aa", "aab", "aba", "abab", "abb", "ba", "bab", "bb", ...};In this example, the string "aab" can be obtained in two different ways. The Sardinas-Patterson algorithm can be used to check for a given V whether any member of V can be obtained in more than one way. Example of Kleene and Kleene plus applied to a set of characters (following the C programming language convention where a character is denoted by single quotes and a string is denoted by double quotes): {'a', 'b', 'c'} = { ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. {'a', 'b', 'c'}+ = { "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. == Properties == If V {\displaystyle V} is any finite or countably infinite set of characters, then V ∗ {\displaystyle V^{}} is a countably infinite set. As a result, each formal language over a finite or countably infinite alphabet Σ {\displaystyle \Sigma } is countable, since it is a subset of the countably infinite set Σ ∗ {\displaystyle \Sigma ^{}} . ( L ∗ ) ∗ = L ∗ {\displaystyle (L^{})^{}=L^{}} , which means that the Kleene star operator is an idempotent unary operator, as ( L ∗ ) i = L ∗ {\displaystyle (L^{})^{i}=L^{}} for every i ≥ 1 {\displaystyle i\geq 1} . V ∗ = { ε } {\displaystyle V^{}=\{\varepsilon \}} , if V {\displaystyle V} is the empty set ∅. For the version of the Kleene star operator on languages, L ∗ = { ε } {\displaystyle L^{}=\{\varepsilon \}} when L {\displaystyle L} is either the empty set ∅ or the singleton set { ε } {\displaystyle \{\varepsilon \}} . == Generalization == Strings form a monoid with concatenation as the binary operation and ε the identity element. In addition to strings, the Kleene star is defined for any monoid. More precisely, let (M, ⋅) be a monoid, and S ⊆ M. Then S is the smallest submonoid of M containing S; that is, S contains the neutral element of M, the set S, and is such that if x,y ∈ S, then x⋅y ∈ S. Furthermore, the Kleene star is generalized by including the -operation (and the union) in the algebraic structure itself by the notion of complete star semiring.
Artificial intelligence industry in Italy
The artificial intelligence industry in Italy is growing and supports industrial development. In 2024 it reached a new record, reaching 1.2 billion euros with a growth of +58% compared to 2023. While in 2025, the growth of artificial intelligence in the industrial application was even greater than in 2024 both in terms of value and application to industrial sectors. == History == The roots of AI research in Italy extend back to the 1970s, when Italian scholars began exploring automated reasoning, programming language semantics, and pattern recognition. Researchers such as those involved in early projects at the National Research Council and various universities laid the groundwork for subsequent academic and industrial developments in the field. During this period, the focus was predominantly on developing algorithms for automated theorem proving and building systems to reason about complex mathematical problems. This era witnessed the birth of methodologies that would later influence numerous AI subfields, from natural language processing (NLP) to robotics. === Institutional milestones and academic contributions === A turning point in the Italian AI landscape was the formation of the Italian Association for Artificial Intelligence (AIxIA) in 1988. Founded by academics, including Luigia Carlucci Aiello, the association established a platform for collaboration between universities, research centers, and industry. Led by Aiello, AIIA played a role in promoting research, organizing national conferences, and fostering international partnerships that connected Italy's AI community to global networks. At the same time, professors such as Roberto Navigli and numerous practitioners contributed to the advancement of AI in Italy. Navigli has worked in multilingual NLP, including the creation of BabelNet, and led the Minerva project. === Industrial AI === Over recent decades, numerous national and European initiatives supported by funding from programs such as the National Recovery and Resilience Plan (PNRR) have spurred the transition from theoretical research to practical applications. Industrial sectors including manufacturing, banking, and healthcare increasingly embraced AI-driven automation, while research institutions collaborated with industrial partners to deploy cutting-edge solutions. In recent years, Italy has also seen the establishment of specialized research centers and institutes aimed at bridging the gap between academic innovation and industrial application. These initiatives indicate a broader national commitment to integrating AI into the fabric of Italian industry. == Recent developments == === Emergence of generative AI === A landmark in Italy's modern AI evolution is the development of Minerva AI. Developed by the Sapienza NLP research group at Sapienza University of Rome and led by Professor Roberto Navigli, Minerva represents the first family of large language models (LLMs) trained from scratch with a primary focus on the Italian language. ==== Minerva 7B ==== The latest iteration, Minerva 7B, has 7 billion parameters and has been trained on an extensive corpus of over 1.5 trillion words. By using advanced instruction tuning techniques, Minerva 7B is able to produce highly accurate, coherent, and contextually sensitive responses addressing common issues such as hallucinations and inappropriate content generation. This breakthrough sets a benchmark for transparent, open-source AI development in the country. Minerva's development, carried out within the FAIR (Future Artificial Intelligence Research) project in collaboration with CINECA and supported by supercomputing resources like the Leonardo (supercomputer), aligns closely with Italy's cultural and linguistic heritage. === Establishment of AI4I === The recent establishment of the Istituto Italiano per l’Intelligenza Artificiale (AI4I) is part of Italy's strategy to improve its industrial competitiveness in AI. This dedicated institute aims to bridge the gap between research institutions and industrial enterprises; promote training and R&D support to nurture the next generation of Italian AI experts; and enhance national competitiveness. This initiative is expected to serve as a hub for applied AI research, driving innovations that are tailored to the specific needs of Italian industry and public administration. === Benefits of InvestAI === Italy's AI industry stands to benefit from the European InvestAI initiative, a plan unveiled at the recent AI Action Summit in Paris. InvestAI is an effort by the European Commission to mobilize €200 billion for AI investments, with a dedicated €20 billion fund earmarked for building AI gigafactories. These gigafactories are planned as large-scale hubs for training advanced, complex AI models using approximately 100,000 last-generation AI chips. For Italy, this investment presents several major opportunities: Access to State-of-the-Art Infrastructure: Italian companies, research institutions, and start-ups can leverage the gigafactories’ immense computational resources, enabling them to train highly sophisticated language models and other AI systems. Enhanced Competitiveness and Collaboration: With InvestAI's layered funding model where EU funds help de-risk private investments Italian firms can access capital more readily. This will bolster public–private partnerships and create a more dynamic AI ecosystem that spans from academic research to industrial applications. Alignment with National and Regional Initiatives: The Istituto Italiano per l’Intelligenza Artificiale (AI4I), based in Turin, is already recognized as a strategic asset by both Italy and the European Union. As the main recipient of InvestAI funds in Italy, AI4I will play a pivotal role in implementing these investments locally, fostering innovation in sectors like manufacturing, healthcare and aerospace. Commission President Ursula von der Leyen emphasized that InvestAI is designed to democratize AI innovation throughout Europe by ensuring that even smaller companies have access to high-performance computing power. For Italy, this means not only keeping pace with global leaders but also harnessing European-scale investments to transform its AI industry and drive economic growth.