Induction of regular languages

In computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction. == Definitions == A regular language is defined as a (finite or infinite) set of strings that can be described by one of the mathematical formalisms called "finite automaton", "regular grammar", or "regular expression", all of which have the same expressive power. Since the last of these formalisms leads to the shortest notations, it is introduced and used here. Given a set Σ of symbols (a.k.a. alphabet), a regular expression can be any of ∅ (denoting the empty set of strings), ε (denoting the singleton set containing just the empty string), a (where a is any character in Σ; denoting the singleton set containing just the single-character string a), r + s (where r and s are, in turn, simpler regular expressions; denoting the union of their sets), r ⋅ s (denoting the set of all possible concatenations of strings from r's and s's set), r⁺ (denoting the set of n-fold repetitions of strings from r's set, for any n ≥ 1), or r* (similarly denoting the set of n-fold repetitions, but also including the empty string, seen as 0-fold repetition). For example, using Σ = {0,1}, the regular expression (0+1+ε)⋅(0+1) denotes the set of all binary numbers with one or two digits (leading zero allowed), while 1⋅(0+1)*⋅0 denotes the (infinite) set of all even binary numbers (no leading zeroes). Given a set of strings (also called "positive examples"), the task of regular language induction is to come up with a regular expression that denotes a set containing all of them.
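The two example expressions over Σ = {0,1} can be checked mechanically by enumerating short strings. The following Python sketch is only an illustration: the translation into Python's `re` syntax and the helper name `language_up_to` are assumptions made here, not part of the formal definition.

```python
import re
from itertools import product

# Assumed translation into Python's `re` syntax:
# union (0+1) becomes the character class [01], concatenation is
# juxtaposition, and the alternative ε becomes an optional "?".
ONE_OR_TWO_DIGITS = re.compile(r"[01]?[01]")    # (0+1+ε)⋅(0+1)
EVEN_NO_LEADING_ZERO = re.compile(r"1[01]*0")   # 1⋅(0+1)*⋅0


def language_up_to(pattern, max_len, alphabet="01"):
    """Collect all strings over `alphabet` of length <= max_len
    that belong to the set denoted by `pattern`."""
    words = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            # fullmatch: the whole string must be described by the expression
            if pattern.fullmatch(word):
                words.add(word)
    return words


if __name__ == "__main__":
    print(sorted(language_up_to(ONE_OR_TWO_DIGITS, 3)))
    print(sorted(language_up_to(EVEN_NO_LEADING_ZERO, 3)))
```

Enumerating up to a length bound only inspects a finite slice of each language; the second expression denotes an infinite set, so the enumeration is a sanity check rather than a proof.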
As an example, given {1, 10, 100}, a "natural" description could be the regular expression 1⋅0*, corresponding to the informal characterization "a 1 followed by arbitrarily many (maybe even none) 0's". However, (0+1)* and 1+(1⋅0)+(1⋅0⋅0) are other regular expressions, denoting the largest (assuming Σ = {0,1}) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively. Some approaches work in an extended setting where a set of "negative example" strings is also given; then, a regular expression is to be found that generates all of the positive, but none of the negative examples. == Lattice of automata == Dupont et al. have shown that the set of all structurally complete finite automata generating a given input set of example strings forms a lattice, with the trivial undergeneralized and the trivial overgeneralized automaton as bottom and top element, respectively. Each member of this lattice can be obtained by factoring the undergeneralized automaton by an appropriate equivalence relation. For the above example string set {1, 10, 100}, the picture shows at its bottom the undergeneralized automaton A_{a,b,c,d} in grey, consisting of states a, b, c, and d. On the state set {a,b,c,d}, a total of 15 equivalence relations exist, forming a lattice. Mapping each equivalence E to the corresponding quotient automaton language L(A_{a,b,c,d} / E) yields the partially ordered set shown in the picture. Each node's language is denoted by a regular expression. The language may be recognized by quotient automata w.r.t. different equivalence relations, all of which are shown below the node. An arrow between two nodes indicates that the lower node's language is a proper subset of the higher node's. If both positive and negative example strings are given, Dupont et al.
build the lattice from the positive examples, and then investigate the separation border between automata that generate some negative example and those that do not. Most interesting are those automata immediately below the border. In the picture, separation borders are shown for the negative example strings 11 (green), 1001 (blue), 101 (cyan), and 0 (red). Coste and Nicolas present their own search method within the lattice, which they relate to Mitchell's version space paradigm. To find the separation border, they use a graph coloring algorithm on the state inequality relation induced by the negative examples. Later, they investigate several ordering relations on the set of all possible state fusions. Kudo and Shimbo use the representation by automaton factorizations to give a unifying framework for the following approaches (sketched below): k-reversible languages and the "tail clustering" follow-up approach, successor automata and the predecessor-successor method, and pumping-based approaches (framework integration challenged by Luzeaux, however). Each of these approaches is shown to correspond to a particular kind of equivalence relation used for factorization. == Approaches == === k-reversible languages === Angluin considers so-called "k-reversible" regular automata, that is, deterministic automata in which each state can be reached from at most one state by following a transition chain of length k + 1. Formally, if Σ, Q, and δ denote the input alphabet, the state set, and the transition function of an automaton A, respectively, then A is called k-reversible if: ∀a₀, ..., aₖ ∈ Σ ∀s₁, s₂ ∈ Q: δ*(s₁, a₀...aₖ) = δ*(s₂, a₀...aₖ) ⇒ s₁ = s₂, where δ* means the homomorphic extension of δ to arbitrary words. Angluin gives a cubic algorithm for learning the smallest k-reversible language from a given set of input words; for k = 0, the algorithm even has almost linear complexity.
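The k-reversibility condition stated above can be tested directly on a small automaton. The following Python sketch assumes the automaton is given as a dictionary mapping (state, symbol) pairs to states, with a missing entry meaning "no transition". The names `run` and `is_k_reversible` are illustrative and not taken from Angluin's work. The sketch checks only the displayed condition by brute force; it is not the learning algorithm.

```python
from itertools import product


def run(delta, state, word):
    """Homomorphic extension of the (partial) transition function:
    follow `word` symbol by symbol, or return None if a step is undefined."""
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return None
    return state


def is_k_reversible(states, alphabet, delta, k):
    """Check that no word of length k + 1 leads two distinct states
    to the same target state."""
    for word in product(alphabet, repeat=k + 1):
        reached_from = {}  # target state -> the source state that reached it
        for source in states:
            target = run(delta, source, word)
            if target is None:
                continue  # undefined runs cannot collide
            if target in reached_from:
                return False  # two distinct sources share one target
            reached_from[target] = source
    return True


if __name__ == "__main__":
    # Hypothetical two-state automaton for "a 1 followed by any number of 0's".
    delta = {("a", "1"): "b", ("b", "0"): "b"}
    print(is_k_reversible({"a", "b"}, "01", delta, 0))
```

The check enumerates all |Σ|^(k+1) words, so it is practical only for small alphabets and small k.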
The required state uniqueness after k + 1 given symbols forces unifying automaton states, thus leading to a proper generalization different from the trivial undergeneralized automaton. This algorithm has been used to learn simple parts of English syntax; later, an incremental version was provided. Another approach based on k-reversible automata is the tail clustering method. === Successor automata === From a given set of input strings, Vernadat and Richetin build a so-called successor automaton, consisting of one state for each distinct character and a transition between each two adjacent characters' states. For example, the singleton input set {aabbaabb} leads to an automaton corresponding to the regular expression (a⁺⋅b⁺)*. An extension of this approach is the predecessor-successor method, which generalizes each character repetition immediately to a Kleene + and then includes for each character the set of its possible predecessors in its state. Successor automata can learn exactly the class of local languages. Since each regular language is the homomorphic image of a local language, grammars from the former class can be learned by lifting, if an appropriate (depending on the intended application) homomorphism is provided. In particular, there is such a homomorphism for the class of languages learnable by the predecessor-successor method. The learnability of local languages can be reduced to that of k-reversible languages. === Early approaches === Chomsky and Miller (1957) used the pumping lemma: they guess a part v of an input string uvw and try to build a corresponding cycle into the automaton to be learned; using membership queries they ask, for appropriate k, which of the strings uw, uvvw, uvvvw, ..., uvᵏw also belong to the language to be learned, thereby refining the structure of their automaton. In 1959, Solomonoff generalized this approach to context-free languages, which also obey a pumping lemma. === Cover automata === Câmpeanu et al.
learn a finite automaton as a compact representation of a large finite language. Given such a language F, they search for a so-called cover automaton A such that its language L(A) covers F in the following sense: L(A) ∩ Σ^{≤l} = F, where l is the length of the longest string in F, and Σ^{≤l} denotes the set of all strings not longer than l. If such a cover automaton exists, F is uniquely determined by A and l. For example, F = {ad, read, reread} has l = 6 and a cover automaton corresponding to the regular expression (r⋅e)*⋅a⋅d. For two strings x and y, Câmpeanu et al. define x ~ y if xz ∈ F ⇔ yz ∈ F for all strings z of a length such that both xz and yz are not longer than l. Based on this relation, whose lack of transitivity causes considerable technical problems, they give an O(n⁴) algorithm to construct from F a cover automaton A of minimal state count. Moreover, for union, intersection, and difference of two finite languages they provide corresponding operations on their cover automata. Păun et al. improve the time complexity to O(n²). === Residual automata === For a set S of strings and a string u, the Brzozowski derivative u⁻¹S is defined as the set of all rest-strings obtainable from a string in S by cutting off its prefix u (if possible), formally: u⁻¹S = {v ∈ Σ*: uv ∈ S}, cf. picture. Denis et al. define a

AppyStore

AppyStore is a learning videos and games app for children up to the age of 8 years. The platform, developed by Mauj Mobile, a mobile value-added services (VAS) provider, curates content intended to support child development. Mauj is funded by Sequoia Capital, Westbridge Capital and Intel Capital. == Background == AppyStore was launched in 2014 as a platform providing content for children between the ages of 1.5 and 6 years. AppyStore subsequently extended its services to children up to 8 years of age. The company operates on a subscription-based model and claims to have 5,000 learning games and videos, organized into 18 learning areas and intended to help children develop skills. According to an article published in Business Standard, the application is claimed to be one of the top 5 apps that help to enhance the logical and imaginative capabilities of children. AppyStore was awarded Best App for Kids by Google Play in December 2017. == Service == The company provides content via a website and an Android app. Both offer learning games, rhymes, phonics, reading, stories, science, numbers, maths, and logic content in the form of puzzles, worksheets, videos and activities; the premium subscription also includes physical worksheets, which are home delivered. The content is described as educational and as having been selected by teachers and experts familiar with child development milestones for children up to 8 years of age. The mobile application also allows parents to track the progress of their child on the basis of the number of videos viewed.

Deepfake pornography

Deepfake pornography is a form of non-consensual AI pornography created by altering existing photographs or videos using deepfake technology to modify the appearance of the participants. The use of deepfake pornography has sparked controversy because it involves the making and sharing of realistic videos featuring non-consenting individuals and is sometimes used for revenge porn. Many countries have criminalized this "new voyeurism" through legislative measures and technological solutions. == History == The term "deepfake" was coined in 2017 on a Reddit forum where users shared altered pornographic videos created using machine learning algorithms. It is a combination of the word "deep learning", which refers to the program used to create the videos, and "fake" meaning the videos are not real. Deepfake pornography was originally created on a small individual scale using a combination of machine learning algorithms, computer vision techniques, and AI software. The process began by gathering a large amount of source material (including both images and videos) of a person's face, and then using a deep learning model to train a Generative Adversarial Network to create a fake video that convincingly swaps the face of the source material onto the body of a pornographic performer. However, the production process has significantly evolved since 2018, with the advent of several public apps that have largely automated the process. While several AI "nudification" apps emerged on mainstream platforms like Google Play and the Apple App Store around 2023, major tech storefronts have since implemented stricter policies and automated detection to ban such software. Consequently, the proliferation of non-consensual deepfake pornography has largely shifted to decentralized websites, specialized online forums, and third-party messaging bot ecosystems. Deepfake pornography is sometimes confused with fake nude photography, but the two are mostly different. 
Fake nude photography typically uses non-sexual images and merely makes it appear that the people in them are nude. == Notable cases == Deepfake technology has been used to create non-consensual pornographic images and videos of famous women. One of the earliest examples occurred in 2017 when a deepfake pornographic video of Gal Gadot was created by a Reddit user and quickly spread online. Since then, there have been numerous instances of similar deepfake content targeting other female celebrities, such as Emma Watson, Natalie Portman, and Scarlett Johansson. Johansson spoke publicly on the issue in December 2018, condemning the practice but declining to pursue legal action because she viewed the harassment as inevitable. === Rana Ayyub === In 2018, Rana Ayyub, an Indian investigative journalist, was the target of an online hate campaign stemming from her condemnation of the Indian government, specifically her speaking out against the rape of an eight-year-old Kashmiri girl. Ayyub was bombarded with rape and death threats, and a doctored pornographic video of her was circulated online. In a Huffington Post article, Ayyub discussed the long-lasting psychological and social effects this experience has had on her. She explained that she continued to struggle with her mental health and that the images and videos continued to resurface whenever she took on a high-profile case. === Atrioc controversy === In 2023, Twitch streamer Atrioc stirred controversy when he accidentally revealed deepfake pornographic material featuring female Twitch streamers during a live stream. The influencer has since admitted to paying for AI-generated porn, and apologized to the women and his fans. === Taylor Swift === In January 2024, AI-generated sexually explicit images of American singer Taylor Swift were posted on X (formerly Twitter), and spread to other platforms such as Facebook, Reddit and Instagram. One tweet with the images was viewed over 45 million times before being removed.
A report from 404 Media found that the images appeared to have originated from a Telegram group, whose members used tools such as Microsoft Designer to generate the images, using misspellings and keyword hacks to work around Designer's content filters. After the material was posted, Swift's fans posted concert footage and images to bury the deepfake images, and reported the accounts posting the deepfakes. Searches for Swift's name were temporarily disabled on X, returning an error message instead. Graphika, a disinformation research firm, traced the creation of the images back to a 4chan community. A source close to Swift told the Daily Mail that she would be considering legal action, saying, "Whether or not legal action will be taken is being decided, but there is one thing that is clear: These fake AI-generated images are abusive, offensive, exploitative, and done without Taylor's consent and/or knowledge." The controversy drew condemnation from White House Press Secretary Karine Jean-Pierre, Microsoft CEO Satya Nadella, the Rape, Abuse & Incest National Network, and SAG-AFTRA. Several US politicians called for federal legislation against deepfake pornography. Later in the month, US senators Dick Durbin, Lindsey Graham, Amy Klobuchar and Josh Hawley introduced a bipartisan bill that would allow victims to sue individuals who produced or possessed "digital forgeries" with intent to distribute, or those who received the material knowing it was made non-consensually. === 2024 Telegram deepfake scandal === It emerged in South Korea in August 2024, that many teachers and female students were victims of deepfake images created by users who utilized AI technology. Journalist Ko Narin of The Hankyoreh uncovered the deepfake images through Telegram chats. On Telegram, group chats were created specifically for image-based sexual abuse of women, including middle and high school students, teachers, and even family members. 
Women with photos on social media platforms like KakaoTalk, Instagram, and Facebook are often targeted as well. Perpetrators use AI bots to generate fake images, which are then sold or widely shared, along with the victims' social media accounts, phone numbers, and KakaoTalk usernames. One Telegram group reportedly drew around 220,000 members, according to a Guardian report. Investigations revealed numerous chat groups on Telegram where users, mainly teenagers, create and share explicit deepfake images of classmates and teachers. The issue came in the wake of a troubling history of digital sex crimes, notably the notorious Nth Room case in 2019. The Korean Teachers Union estimated that more than 200 schools had been affected by these incidents. Activists called for a "national emergency" declaration to address the problem. South Korean police reported over 800 deepfake sex crime cases by the end of September 2024, a stark rise from just 156 cases in 2021, with most victims and offenders being teenagers. On September 21, 6,000 people gathered at Marronnier Park in northeastern Seoul to demand stronger legal action against deepfake crimes targeting women. On September 26, following widespread outrage over the Telegram scandal, South Korean lawmakers passed a bill criminalizing the possession or viewing of sexually explicit deepfake images and videos, imposing penalties that include prison terms and fines. Under the new law, those caught buying, saving, or watching such material could face up to three years in prison or fines of up to 30 million won ($22,600). At the time the bill was proposed, creating sexually explicit deepfakes for distribution carried a maximum penalty of five years, but the new legislation would increase this to seven years, regardless of intent. By October 2024, it was estimated that "nudify" deepfake bots on Telegram had up to four million monthly users.
=== 2025–2026 Grok/X chatbot deepfake scandal === In December 2025, Bloomberg reported that X users found Grok would comply with non-consensual requests to digitally undress individuals, including minors, or show them performing sexually explicit acts. The majority of these prompts were targeted at women and girls. An analysis of 20,000 images generated by Grok between December 25, 2025 and January 1, 2026 showed that 2% depicted people in bikinis or transparent clothes who appeared to be 18 or younger, including 30 of "young or very young" women or girls. A separate analysis conducted over 24 hours from January 5 to 6 calculated that users had Grok create 6,700 sexually suggestive or nudified images per hour. xAI responded to requests for comment from media organizations with the automated reply, "Legacy Media Lies". The bot's image generation sparked an international backlash and calls for legal or regulatory action from officials in the European Union, United Kingdom, Poland, France, India, Malaysia, and Brazil. === Fernandes–Ulmen case === German TV presenter Collien Fernandes filed a complaint against her ex-husband, actor Christian Ulmen, for several accusations including, ident

Felix, Net i Nika

Felix, Net i Nika ("Felix, Net and Nika") is a series of Polish-language science fiction books for teenagers, written by Rafał Kosik. It tells the adventures of three friends, Felix Polon, Net Bielecki and Nika Mickiewicz, who attend the fictional Professor Kuszmiński Middle School in Warsaw. As of 2024, eighteen books have been published. == Books == The series comprises 18 books:
Felix, Net and Nika and the Gang of Invisible People - November 2004
Felix, Net and Nika and the Theoretically Possible Catastrophe - November 2005
Felix, Net and Nika and the Palace of Dreams - November 2006
Felix, Net and Nika and the Trap of Immortality - November 2007
Felix, Net and Nika and the Orbital Conspiracy - November 2008
Felix, Net and Nika and the Orbital Conspiracy 2: Small Army - May 2009
Felix, Net and Nika and the Third Cousin - November 2009
Felix, Net and Nika and the Rebellion of Machines - March 2011
Felix, Net and Nika and the World Zero - November 2011
Felix, Net and Nika and the World Zero 2: Alternauts - November 2012
Felix, Net and Nika and the Extracurricular Stories - April 2013
Felix, Net and Nika and the Secret of Czerwona Hańcza - November 2013
Felix, Net and Nika and Curse of McKillian's House - November 2014
Felix, Net and Nika and (un)Safe Growing up - November 2015
Felix, Net and Nika and The End of The World as We Know It - November 2018
Felix, Net and Nika and No Chance - November 2022
Felix, Net and Nika and No Chance 2: Other Tomorrow - 2023
Felix, Net and Nika and Fantology - June 2024
== Film == A feature motion picture, Felix, Net i Nika oraz Teoretycznie Możliwa Katastrofa (Felix, Net and Nika and the Theoretically Possible Catastrophe), was released in Poland on September 28, 2012. == Main characters == Felix Polon is a foresighted, fair-haired boy with dark brown eyes. He inherited a talent for constructing various things, especially robots, from his father; it has saved his friends many times.
He can make anything from nothing, always finds a way out of a situation, and almost always has a plan. Together with his parents Marlene and Peter, his grandmother Lucy, his dog Caban (a Black Russian Terrier) and Golem Golem, a robot he built, Felix lives on Serdeczna Street in a small family house. Net Bielecki is quite tall and slim, and has blue eyes and a high IQ. "Net" is his nickname; his true name is unknown. He is the most trendy and 'awesome' person in his entire class. He is a human calculator and is excellent at mathematics. He hates dictations and spelling because he is dyslexic. He is also quite lazy, absent-minded and sometimes hysterical or prone to panic. His dark blond hair looks like a heap of hay after a grenade explosion. He is best at ICT and writes many of his own programs. His love interest is Nika Mickiewicz. Together with his parents Lila and Mark, and their newborn twins nicknamed Pompek and Prumcia, he lives in a top-floor penthouse apartment. Nika Mickiewicz is a girl with character. She is very brave and mature. She likes reading books. She has curly red hair, green eyes and a few freckles. She is not very rich; she wears second-hand clothes and her only pair of black Dr. Martens shoes. She lives in a tiny apartment. She is an orphan, but hides that fact from people for almost three years. However, Felix and Net, her best and possibly only friends, find out about it. She also has abnormal abilities: she can move distant objects using her powers, ski uphill, and know some things by intuition. In other words, she is telekinetic. Manfred is a friendly AI program started and never finished by Net's father, and mastered and programmed further by Net himself. He likes going on adventures and solving mysteries with the trio much more than his actual job, which is controlling the traffic lights. He has helped the three friends many times and is their reliable and faithful friend. Morten is also an AI program, but he is the antagonist of the trio.
He appears in all of the first six books of Felix, Net and Nika. In the first book, the trio thinks they have finished him off for good, but he comes back in the third book. In the fifth and sixth books, he is the mastermind of the Orbital Conspiracy. Morten's logo also appears in all of the first six books, and it is still a mystery what he has to do with each event.

CogX Festival

CogX Festival is a global festival focusing on the impact of artificial intelligence (AI) and emerging technology on industry, government, and society. It takes place annually, usually in September, in London, England. Founded by Charlie Muirhead and Tabitha Goldstaub in 2017, CogX aims to facilitate dialogue and understanding about AI and its implications across various sectors. CogX Festival 2023 was held from September 12 to September 14 across multiple sites in London. == History == The inaugural CogX event took place in 2017, intending to bring together experts from diverse fields to discuss the role and impact of AI and emerging technologies. Since then, it has evolved to include a broader range of topics and attract a diverse audience. In 2018, the first CogX Awards were hosted. That year, over 50 awards were presented before 300 guests. In 2021, CogX and Hopin, a video conferencing software, signed a four-year agreement to make CogX a hybrid conference due to the COVID-19 pandemic. CogX 2021 attracted over 5,000 attendees in person and over 100,000 virtually. In 2022, the festival returned to a live event format after two years of hybrid events and controlled physical attendance. It also launched the CogX app, which curated insights from the world's top podcasts. In 2023, after delivering the keynote address, guest speaker Stephen Fry fell off the stage and broke his leg, hip, pelvis and a "bunch of ribs". A court filing in 2026 revealed that Fry was seeking £100,000 in damages from CogX Festival Ltd and creative agency Blonstein Events. == Programming == The festival features sessions, discussions, workshops, and exhibitions, encompassing various domains of AI and technology. Recent CogX Festivals have featured summits on topics such as global leadership and industry transformation.

Astrostatistics

Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex datasets, and to link astronomical data to astrophysical theory. Many branches of statistics are involved in astronomical analysis including nonparametrics, multivariate regression and multivariate classification, time series analysis, and especially Bayesian inference. The field is closely related to astroinformatics.

Fuzzy differential equation

A fuzzy differential equation is a generalization of the ordinary differential equation in mathematics, defined as a differential inclusion for non-uniform upper hemicontinuous convex sets with compactness in a fuzzy set: $dx(t)/dt = F(t, x(t), \alpha)$ for all $\alpha \in [0,1]$. == First order fuzzy differential equation == A first order fuzzy differential equation with real constant or variable coefficients is $x'(t) + p(t)x(t) = f(t)$, where $p(t)$ is a real continuous function, $f(t)\colon [t_0, \infty) \rightarrow R_F$ is a fuzzy continuous function, and $y(t_0) = y_0$ such that $y_0 \in R_F$. == Linear systems of fuzzy differential equations == A linear system of fuzzy differential equations is a system of equations of the form $x'_n(t) = a_{n1}(t)x_1(t) + \cdots + a_{nn}(t)x_n(t) + f_n(t)$, where $a_{ij}$ are real functions and $f_i$ are fuzzy functions: $x'_n(t) = \sum_{i=0}^{1} a_{ij}x_i$. == Fuzzy partial differential equations == A fuzzy differential equation with partial differential operator is $\nabla x(t) = F(t, x(t), \alpha)$ for all $\alpha \in [0,1]$. == Fuzzy fractional differential equation == A fuzzy differential equation with fractional differential operator is $\frac{d^n x(t)}{dt^n} = F(t, x(t), \alpha)$ for all $\alpha \in [0,1]$, where $n$ is a rational number.