Open information extraction

In natural language processing, open information extraction (OIE) is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions.

== Overview ==

A proposition can be understood as a truth-bearer, a textual expression of a potential fact (e.g., "Dante wrote the Divine Comedy"), represented in a structure amenable to computers [e.g., ("Dante", "wrote", "Divine Comedy")]. An OIE extraction normally consists of a relation and a set of arguments. For instance, ("Dante", "passed away in", "Ravenna") is a proposition formed by the relation "passed away in" and the arguments "Dante" and "Ravenna". The first argument is usually referred to as the subject, while the second is considered to be the object. The extraction is said to be a textual representation of a potential fact because its elements are not linked to a knowledge base. Furthermore, the factual nature of the proposition has not yet been established. In the above example, transforming the extraction into a full-fledged fact would first require linking, if possible, the relation and the arguments to a knowledge base. Second, the truth of the extraction would need to be determined. In computer science, transforming OIE extractions into ontological facts is known as relation extraction. In fact, OIE can be seen as the first step toward a wide range of deeper text understanding tasks such as relation extraction, knowledge-base construction, question answering, and semantic role labeling. The extracted propositions can also be used directly in end-user applications such as structured search (e.g., retrieving all propositions with "Dante" as subject). OIE was first introduced by TextRunner, developed at the University of Washington Turing Center headed by Oren Etzioni. Other methods introduced later, such as Reverb, OLLIE, ClausIE, and CSD, helped to shape the OIE task by characterizing some of its aspects.
At a high level, all of these approaches make use of a set of patterns to generate the extractions. Depending on the particular approach, these patterns are either hand-crafted or learned.

== OIE systems and contributions ==

Reverb suggested the necessity of producing meaningful relations to more accurately capture the information in the input text. For instance, given the sentence "Faust made a pact with the devil", it would be erroneous to produce only the extraction ("Faust", "made", "a pact"), since it would not be adequately informative. A more precise extraction would be ("Faust", "made a pact with", "the devil"). Reverb also argued against the generation of overspecific relations. OLLIE stressed two important aspects of OIE. First, it pointed to the possible lack of factuality of propositions. For instance, in a sentence like "If John studies hard, he will pass the exam", it would be inaccurate to consider ("John", "will pass", "the exam") a fact. Second, the authors indicated that an OIE system should be able to extract non-verb-mediated relations, which account for a significant portion of the information expressed in natural language text. For instance, in the sentence "Obama, the former US president, was born in Hawaii", an OIE system should be able to recognize the proposition ("Obama", "is", "former US president"). ClausIE introduced the connection between grammatical clauses, propositions, and OIE extractions. The authors stated that, as each grammatical clause expresses a proposition, each verb-mediated proposition can be identified solely by recognizing the set of clauses expressed in each sentence. This implies that to correctly recognize the set of propositions in an input sentence, it is necessary to understand its grammatical structure. The authors studied the case of English, which admits only seven clause types, meaning that identifying each proposition requires defining only seven grammatical patterns.
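The clause-based view above can be sketched in Python. The sketch assumes the hard part is already done: each clause arrives with its constituents labelled by role (S = subject, V = verb, O = object, C = complement, A = adverbial), whereas ClausIE itself derives this from a syntactic analysis of the sentence. The names `to_proposition` and `split_coordination` are hypothetical, and emitting constituents in sentence order is a simplification.

```python
from typing import List, Tuple

# The seven clause types of English used by ClausIE, written as role sequences:
# S = subject, V = verb, O = object, C = complement, A = adverbial.
CLAUSE_TYPES = ("SV", "SVA", "SVC", "SVO", "SVOO", "SVOA", "SVOC")

Constituent = Tuple[str, str]  # (role, text)


def to_proposition(clause: List[Constituent]) -> Tuple[str, ...]:
    """Turn one clause with role-labelled constituents into a proposition.

    The clause must match one of the seven patterns; constituents are
    emitted in sentence order (subject, verb, then the rest)."""
    pattern = "".join(role for role, _ in clause)
    if pattern not in CLAUSE_TYPES:
        raise ValueError(f"not one of the seven clause types: {pattern!r}")
    return tuple(text for _, text in clause)


def split_coordination(subject: str, predicates: List[List[Constituent]]) -> List[Tuple[str, ...]]:
    """'S V1 X1 and V2 X2' holds one clause per predicate; each clause
    receives its own copy of the shared subject."""
    return [to_proposition([("S", subject)] + predicate) for predicate in predicates]


einstein = split_coordination(
    "Albert Einstein",
    [[("V", "was born"), ("A", "in Ulm")], [("V", "died"), ("A", "in Princeton")]],
)
print(einstein)
# [('Albert Einstein', 'was born', 'in Ulm'), ('Albert Einstein', 'died', 'in Princeton')]
```

A role sequence outside the seven types raises an error, mirroring the claim that seven patterns suffice to identify verb-mediated propositions in English.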
ClausIE's finding also established a separation between the recognition of propositions and their materialization. In a first step, a proposition can be identified without any consideration of its final form, in a domain-independent and unsupervised way, mostly based on linguistic principles. In a second step, the information can be represented according to the requirements of the underlying application, without conditioning the identification phase. Consider the sentence "Albert Einstein was born in Ulm and died in Princeton". The first step will recognize the two propositions ("Albert Einstein", "was born", "in Ulm") and ("Albert Einstein", "died", "in Princeton"). Once the information has been correctly identified, the propositions can take the particular form required by the underlying application [e.g., ("Albert Einstein", "was born in", "Ulm") and ("Albert Einstein", "died in", "Princeton")]. CSD introduced the idea of minimality in OIE. It holds that computers can make better use of extractions if they are expressed in a compact way. This is especially important in sentences with subordinate clauses, for which CSD suggests generating nested extractions. For example, consider the sentence "The Embassy said that 6,700 Americans were in Pakistan". CSD generates two extractions: [i] ("6,700 Americans", "were", "in Pakistan") and [ii] ("The Embassy", "said", "that [i]"). This is usually known as reification.
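A minimal sketch of the second (materialization) step, and of CSD-style nesting, follows. It assumes propositions arrive as (subject, relation, complement) string triples from the identification step; the preposition list, the `Ref` class used for reified arguments, and the rendering format are illustrative assumptions, not the output format of ClausIE or CSD.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Triple = Tuple[str, str, str]

# Illustrative subset of prepositions that may be moved into the relation.
PREPOSITIONS = {"in", "on", "at", "with", "to", "from", "for", "of"}


def materialize(prop: Triple) -> Triple:
    """Application-specific form: move a leading preposition from the
    complement into the relation ("was born" + "in Ulm" -> "was born in" + "Ulm")."""
    subject, relation, complement = prop
    first, _, rest = complement.partition(" ")
    if first.lower() in PREPOSITIONS and rest:
        return (subject, f"{relation} {first}", rest)
    return prop  # nothing to move; keep the identified form unchanged


@dataclass(frozen=True)
class Ref:
    """An argument that points to another extraction (reification)."""
    index: int        # position of the referenced extraction in the list
    marker: str = ""  # connective kept with the reference, e.g. "that"


Arg = Union[str, Ref]


def render(extractions: List[Tuple[str, str, Arg]]) -> List[str]:
    """Format nested extractions, showing references as "[index]"."""
    lines = []
    for i, (subject, relation, arg) in enumerate(extractions):
        if isinstance(arg, Ref):
            arg = f"{arg.marker} [{arg.index}]".strip()
        lines.append(f"[{i}] ({subject!r}, {relation!r}, {arg!r})")
    return lines


identified = [("Albert Einstein", "was born", "in Ulm"), ("Albert Einstein", "died", "in Princeton")]
print([materialize(p) for p in identified])
# [('Albert Einstein', 'was born in', 'Ulm'), ('Albert Einstein', 'died in', 'Princeton')]

csd = [
    ("6,700 Americans", "were", "in Pakistan"),
    ("The Embassy", "said", Ref(0, "that")),
]
print("\n".join(render(csd)))
# [0] ('6,700 Americans', 'were', 'in Pakistan')
# [1] ('The Embassy', 'said', 'that [0]')
```

Because `materialize` only rewrites already-identified triples, an application could keep the prepositional complements intact, or apply a different rule, without re-running identification.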

Concept mining

Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents.

== Methods ==

Traditionally, the conversion of words to concepts has been performed using a thesaurus, and computational techniques tend to do the same. The thesauri used are either specially created for the task or a pre-existing language model, usually related to Princeton's WordNet. The mappings of words to concepts are often ambiguous: typically, each word in a given language relates to several possible concepts. Humans use context to disambiguate the various meanings of a given piece of text, whereas available machine translation systems cannot easily infer context. For the purposes of concept mining, however, these ambiguities tend to be less important than they are for machine translation, because in large documents the ambiguities tend to even out, much as is the case with text mining. Many techniques can be used for disambiguation. Examples are linguistic analysis of the text and the use of word and concept association frequency information that may be inferred from large text corpora. Recently, techniques based on semantic similarity between the possible concepts and the context have appeared and gained interest in the scientific community.

== Applications ==

=== Detecting and indexing similar documents in large corpora ===

One of the spin-offs of calculating document statistics in the concept domain, rather than the word domain, is that concepts form natural tree structures based on hypernymy and meronymy.
These structures can be used to generate simple tree membership statistics, which can in turn be used to locate any document in a Euclidean concept space. If the size of a document is also considered as another dimension of this space, then an extremely efficient indexing system can be created. This technique is currently in commercial use for locating similar legal documents in a 2.5-million-document corpus.

=== Clustering documents by topic ===

Standard numeric clustering techniques may be used in "concept space" as described above to locate and index documents by the inferred topic. These are numerically far more efficient than their text mining cousins, and tend to behave more intuitively, in that they map better to the similarity measures a human would generate.
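The following Python sketch strings the ideas above together on a toy scale: ambiguous words are mapped to concepts in a small hypernym tree, tree-membership frequencies (plus document size) give each document a point in Euclidean concept space, and nearest-neighbour search and k-means clustering operate on those points. The tree, the thesaurus entries, the ancestor-overlap disambiguation score and the log-size dimension are all assumptions made for illustration; a real system would draw on a resource such as WordNet and a more careful similarity measure. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical hypernym tree: each concept points to its broader parent.
PARENT: Dict[str, str] = {
    "breach": "contract_law", "offer": "contract_law", "negligence": "tort_law",
    "contract_law": "law", "tort_law": "law",
    "cardiology": "medicine", "surgery": "medicine",
}
# Hypothetical thesaurus: word -> candidate concepts ("discharge" is ambiguous).
THESAURUS: Dict[str, List[str]] = {
    "breach": ["breach"], "offer": ["offer"], "negligence": ["negligence"],
    "heart": ["cardiology"], "surgery": ["surgery"],
    "discharge": ["contract_law", "medicine"],
}
NODES = sorted(set(PARENT) | set(PARENT.values()))  # one dimension per tree node


def ancestors(concept: str) -> List[str]:
    """The concept itself, then every broader concept up to the root."""
    chain = []
    node = concept
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def to_concepts(words: List[str]) -> List[str]:
    """Map words to concepts; an ambiguous word takes the candidate sharing the
    most tree nodes with the document's unambiguous concepts (its context)."""
    known = [w for w in words if w in THESAURUS]
    context: Set[str] = set()
    for w in known:
        if len(THESAURUS[w]) == 1:
            context.update(ancestors(THESAURUS[w][0]))
    return [max(THESAURUS[w], key=lambda c: len(set(ancestors(c)) & context)) for w in known]


def membership_vector(concepts: List[str]) -> List[float]:
    """Tree-membership frequency per node, plus log document size as a last dimension."""
    counts = Counter(node for c in concepts for node in ancestors(c))
    n = max(len(concepts), 1)
    return [counts[node] / n for node in NODES] + [math.log1p(len(concepts))]


def nearest(query: List[float], corpus: Dict[str, List[float]]) -> str:
    """Name of the corpus document closest to the query (Euclidean distance)."""
    return min(corpus, key=lambda name: math.dist(query, corpus[name]))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means on concept vectors, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = {
    "lease_dispute": "breach offer breach discharge",
    "heart_study": "heart surgery discharge",
    "car_accident": "negligence negligence",
    "surgery_report": "surgery",
}
vectors = {name: membership_vector(to_concepts(text.split())) for name, text in docs.items()}

query = membership_vector(to_concepts("offer breach".split()))
print(nearest(query, vectors))              # lease_dispute
print(kmeans(list(vectors.values()), k=2))  # [0, 1, 0, 1]: legal vs. medical
```

Note how "discharge" resolves to `contract_law` in the lease document but to `medicine` in the heart study, because the surrounding unambiguous concepts pull it toward different branches of the tree.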

Shy Girl

Shy Girl is a horror novel initially self-published in February 2025 by Mia Ballard. Publishing rights for the book were acquired by Hachette Book Group, which released the book in the United Kingdom in November 2025 and planned to publish it in the United States in 2026. Its US release was cancelled and its UK release was discontinued after it faced accusations of being created with generative AI. Ballard denied having personally used AI in the book's writing, claiming that a freelance editor had introduced AI-generated changes. She also stated that she would take legal action against the editor.

== Premise ==

The novel follows Gia, a depressed woman with obsessive–compulsive disorder, who encounters a mysterious man named Nathan while looking for a sugar daddy to ease her financial troubles. Nathan offers to erase all of Gia's debts in exchange for her agreeing to live as his pet. Living like an animal convinces her that she is becoming one, and she begins to behave like an animal.

== Publication and cancellation ==

Shy Girl was first self-published online by Mia Ballard in February 2025. Marketing material described the book as a "buzzy BookTok sensation" and "bloody and unforgiving". The self-published edition was highly successful, receiving over 4,900 ratings on Goodreads with an average score of 3.52 stars. In an interview, Ballard described her writing style as lyrical, feverish, and introspective, and stated she was more interested in "what it feels like to live inside a body" than in plot-driven storylines. Hachette Book Group acquired the publishing rights, and its Wildfire imprint published the book in the United Kingdom in November 2025. By March 2026, the book had sold 1,800 copies in the United Kingdom. A US release by the imprint Orbit Books was planned for 2026. After the British publication, critics and readers began to claim that the book appeared to have been written by generative AI.
A January 2026 post on Reddit claimed that the book had many of the hallmarks of having been written with a large language model, and stated that it was "repulsive" that the book was accepted by Hachette. A two-and-a-half-hour video essay covering the book, titled "i'm pretty sure this book is ai slop", received 1.2 million views on YouTube by March 2026. In response, Hachette Book Group announced in March 2026 that it would cancel the book's US publication and discontinue its UK publication. It told The Wall Street Journal that it had made "a lengthy investigation" before deciding to cancel the book. Ballard told The New York Times that she had not used AI when writing the book, but that AI-generated elements were added by a freelance editor without her knowledge. She also stated that she could not elaborate on her claim because she was pursuing legal action against the editor. Writer Andrea Bartz opined that the situation "raises many concerns about trust, authenticity and publishing's readiness for a new, A.I.-assisted world", but that "readers made it abundantly clear they want books by humans, not machines".

We Appreciate Power

"We Appreciate Power" is a song by Canadian musician Grimes, featuring American musician Hana. It was released on November 29, 2018, and billed as the lead single from her fifth studio album, Miss Anthropocene; however, it is only available on the album's Japanese and deluxe releases. The song was written and produced by Grimes, Poppy (originally), Hana and Chris Greatti.

== Background and release ==

The song was originally intended to be one of two collaborations between Grimes and American singer Poppy for the latter's second studio album, Am I a Girl? In an interview, Poppy mentioned that she wrote two songs with Grimes: one about "destroying things" and another about "power". The other song, "Play Destroy", was featured on the album. Grimes shared a lyric from the song alongside a photo of herself with Poppy on Twitter in May 2018. Following feuds between the two singers, Grimes instead released the song featuring singer Hana. On November 26, Grimes announced she would be releasing new music on November 29. Two days later, she revealed that the single was titled "We Appreciate Power" and featured Hana, and shared the artwork. The release of the song was accompanied by a lyric video directed by Grimes and her brother Mac Boucher.

== Music and lyrics ==

"We Appreciate Power" is an industrial rock, nu metal, and techno-industrial song. The track is regarded as a further step in Grimes's experimentation with guitars, which began on 2015's Art Angels. The track was compared to the works of Nine Inch Nails; Jillian Mapes of Pitchfork described the song as "an immediate onslaught of mutilated noise—distorted metal guitar chug, bloody screams, a guitar loop that conjures fear and demands worship. Flashes of Nine Inch Nails' Pretty Hate Machine reverberate through the drum programming and synths."
Brendan Klinkenberg of Rolling Stone placed the song "somewhere between power pop and straightforward industrial (with an extended bridge reminiscent of the most sweeping moments in a Final Fantasy score)" and called it "a distinctly 2018 take on Nine Inch Nails-esque hard-edged rock." A press release stated that the song was inspired by the North Korean band Moranbong and was written "from the perspective of a Pro-A.I. Girl Group Propaganda machine who use song, dance, sex and fashion to spread goodwill towards Artificial Intelligence." Grimes also stated that simply listening to the song would reduce a listener's risk of ending up on the hit list of any future AI overlord once it reigns supreme, an idea mirroring the Roko's basilisk thought experiment. Lyrically, the song touches on transhumanist ideas such as the betterment and future of the human race, the possibility of merging consciousness with machines to extend life indefinitely through mind uploading, and the idea that reality may be simulated. The song's chorus generated a spike in interest in the word "capitulate".

== Critical reception ==

Pitchfork critic Jillian Mapes wrote: "If 'Freak on a Leash' isn't a dealbreaker, then the supervillain allure of 'We Appreciate Power' might pull you in (it legitimately slaps), but it just as well may leave you weighed down by Grimes' commitment to the absolute darkest timeline." Billboard's Gil Kaufman described the song as "a dystopian, aggressive dive into a more rock-leaning sound." Similarly, Brendan Klinkenberg of Rolling Stone called it "the most aggressive single Grimes has released to date". Noisey called the song "an absolute motherfucker of a single" and opined that it sounds "like a K-pop band covering nu-metal". Justin Kamp of Paste described the track as a "glitchy empowerment anthem that chugs along on screeching synths and Grimes' repeated exultations of power."

== Personnel ==

Credits adapted from Tidal.
* Grimes – vocals, guitar, production, engineering
* Hana – vocals, guitar, additional production
* Chris Greatti – guitar, keyboards, production, engineering
* Zakk Cervini – mixing

== Track listing ==

== Charts ==

Ensemble averaging (machine learning)

In machine learning, ensemble averaging is the process of creating multiple models (typically artificial neural networks) and combining them to produce a desired output, as opposed to creating just one model. Ensembles of models often outperform individual models, as the various errors of the ensemble constituents "average out".

== Overview ==

Ensemble averaging is one of the simplest types of committee machines. Along with boosting, it is one of the two major types of static committee machines. In contrast to standard neural network design, in which many networks are generated but only one is kept, ensemble averaging keeps the less satisfactory networks, but with less weight assigned to their outputs. The theory of ensemble averaging relies on two properties of artificial neural networks:

* In any network, the bias can be reduced at the cost of increased variance.
* In a group of networks, the variance can be reduced at no cost to the bias.

The first property is known as the bias–variance tradeoff. Ensemble averaging creates a group of networks, each with low bias and high variance, and combines them to form a new network that should theoretically exhibit low bias and low variance. Hence, this can be thought of as a resolution of the bias–variance tradeoff. The idea of combining experts can be traced back to Pierre-Simon Laplace.

== Method ==

The theory mentioned above gives an obvious strategy: create a set of experts with low bias and high variance, and average them. Generally, this means creating a set of experts with varying parameters; frequently, these are the initial synaptic weights of a neural network, although other factors (such as learning rate, momentum, etc.) may also be varied. Some authors recommend against varying weight decay and early stopping.
The steps are therefore:

1. Generate N experts, each with its own initial parameters (these values are usually sampled randomly from a distribution).
2. Train each expert separately.
3. Combine the experts and average their values.

Alternatively, domain knowledge may be used to generate several classes of experts. An expert from each class is trained, and then the experts are combined. A more complex version of ensemble averaging views the final result not as a mere average of all the experts, but rather as a weighted sum. If the output of the $j$-th expert is $y_j$, then the overall result $\tilde{y}$ can be defined as:

$$\tilde{y}(\mathbf{x};\boldsymbol{\alpha}) = \sum_{j=1}^{N} \alpha_j \, y_j(\mathbf{x})$$

where $\boldsymbol{\alpha}$ is a set of weights, one per expert. The weights $\boldsymbol{\alpha}$ can readily be found with a neural network: a "meta-network" in which each "neuron" is in fact an entire expert network can be trained, and the synaptic weights of this final network are the weights applied to each expert. This is known as a linear combination of experts. Most forms of neural network can be seen as special cases of a linear combination: the standard neural net (where only one expert is used) is simply a linear combination with $\alpha_k = 1$ for one expert $k$ and $\alpha_j = 0$ for all $j \neq k$. A raw average is the case in which all $\alpha_j$ are equal to one over the total number of experts, $\alpha_j = 1/N$. A more recent ensemble averaging method is negative correlation learning, proposed by Y. Liu and X. Yao. This method has been widely used in evolutionary computing.
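The three steps and the weighted combination can be sketched in plain Python. The sketch assumes a toy regression task (fitting sin x), tiny one-hidden-layer tanh networks trained by per-sample gradient descent, and hand-chosen weights; `TinyNet` and `combine` are hypothetical names, and learning the weights α with a meta-network is not shown. The experts differ only in the random seed used for their initial weights, which is the kind of variation described above.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


class TinyNet:
    """One 'expert': a one-hidden-layer tanh network for scalar regression."""

    def __init__(self, hidden: int, seed: int) -> None:
        rng = random.Random(seed)  # the seed alone makes each expert's initial weights differ
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, x: float) -> float:
        return self.b2 + sum(
            v * math.tanh(w * x + b) for w, b, v in zip(self.w1, self.b1, self.w2)
        )

    def train(self, data: Sequence[Tuple[float, float]], lr: float = 0.05, epochs: int = 200) -> None:
        """Per-sample gradient descent on the squared error 0.5 * (prediction - y)**2."""
        for _ in range(epochs):
            for x, y in data:
                hs = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
                err = self.b2 + sum(v * h for v, h in zip(self.w2, hs)) - y
                for i, h in enumerate(hs):
                    grad_pre = err * self.w2[i] * (1.0 - h * h)  # back through tanh, using old w2
                    self.w2[i] -= lr * err * h
                    self.w1[i] -= lr * grad_pre * x
                    self.b1[i] -= lr * grad_pre
                self.b2 -= lr * err


def combine(experts: List[TinyNet], x: float, alphas: Optional[Sequence[float]] = None) -> float:
    """Linear combination y~(x; alpha) = sum_j alpha_j * y_j(x); a raw average by default."""
    if alphas is None:
        alphas = [1.0 / len(experts)] * len(experts)
    return sum(a * e.predict(x) for a, e in zip(alphas, experts))


# Toy data: 21 samples of sin(x) on [-3, 3].
data = [(i / 10.0, math.sin(i / 10.0)) for i in range(-30, 31, 3)]

experts = [TinyNet(hidden=6, seed=s) for s in range(5)]  # step 1: N = 5 experts
for expert in experts:
    expert.train(data)                                   # step 2: train separately

print(combine(experts, 0.5))                   # step 3: raw average, alpha_j = 1/N
print(combine(experts, 0.5, [0, 0, 1, 0, 0]))  # single expert: alpha_k = 1, others 0
```

Passing explicit weights reproduces the special cases in the text: all weights equal to 1/N gives the raw average, and a one-hot weight vector gives a single standard network.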
== Benefits ==

* The resulting committee is almost always less complex than a single network that would achieve the same level of performance.
* The resulting committee can be trained more easily on smaller datasets.
* The resulting committee often has improved performance over any single model.
* The risk of overfitting is lessened, as there are fewer parameters (e.g. neural network weights) which need to be set.
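The Overview's claim that averaging lowers variance without changing bias can be checked with a short calculation for the raw average. It rests on an idealizing assumption not stated above: each of the N experts outputs $y_j(\mathbf{x}) = f(\mathbf{x}) + b(\mathbf{x}) + \varepsilon_j(\mathbf{x})$, with a common bias $b$ and zero-mean errors $\varepsilon_j$ that have variance $\sigma^2$ and are mutually uncorrelated.

```latex
\bar{y}(\mathbf{x}) = \frac{1}{N}\sum_{j=1}^{N} y_j(\mathbf{x})
  = f(\mathbf{x}) + b(\mathbf{x}) + \frac{1}{N}\sum_{j=1}^{N} \varepsilon_j(\mathbf{x})

\mathbb{E}\big[\bar{y}(\mathbf{x})\big] - f(\mathbf{x}) = b(\mathbf{x}),
\qquad
\operatorname{Var}\big[\bar{y}(\mathbf{x})\big]
  = \frac{1}{N^2}\sum_{j=1}^{N} \operatorname{Var}\big[\varepsilon_j(\mathbf{x})\big]
  = \frac{\sigma^2}{N}
```

In practice the experts' errors are correlated (they are trained on the same data), so the reduction is smaller: if every pair of errors has correlation $\rho$, the variance of the average becomes $\rho\sigma^2 + (1-\rho)\sigma^2/N$, which approaches $\rho\sigma^2$ rather than zero as $N$ grows.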

Evolutionary robotics

Evolutionary robotics is an embodied approach to artificial intelligence (AI) in which robots are automatically designed using Darwinian principles of natural selection. The design of a robot, or of a subsystem of a robot such as a neural controller, is optimized against a behavioral goal (e.g. run as fast as possible). Designs are usually evaluated in simulation, because fabricating thousands or millions of designs and testing them in the real world is prohibitively expensive in terms of time, money, and safety. An evolutionary robotics experiment starts with a population of randomly generated robot designs. The worst-performing designs are discarded and replaced with mutations and/or combinations of the better designs. This evolutionary algorithm continues until a prespecified amount of time elapses or some target performance metric is surpassed. Evolutionary robotics methods are particularly useful for engineering machines that must operate in environments in which humans have limited intuition (nanoscale, space, etc.). Evolved simulated robots can also be used as scientific tools to generate new hypotheses in biology and cognitive science, and to test old hypotheses that require experiments that have proven difficult or impossible to carry out in reality.

== History ==

In the early 1990s, two separate European groups demonstrated different approaches to the evolution of robot control systems. Dario Floreano and Francesco Mondada at EPFL evolved controllers for the Khepera robot. Adrian Thompson, Nick Jakobi, Dave Cliff, Inman Harvey, and Phil Husbands evolved controllers for a Gantry robot at the University of Sussex. However, the bodies of these robots were fixed before evolution began. The first simulations of evolved robots were reported by Karl Sims and Jeffrey Ventrella of the MIT Media Lab, also in the early 1990s. However, these so-called virtual creatures never left their simulated worlds.
The first evolved robots to be built in reality were 3D-printed by Hod Lipson and Jordan Pollack at Brandeis University at the turn of the 21st century.
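The basic evolutionary loop described at the start of this article can be sketched in Python. Everything about the robot is a stand-in: a design is a hypothetical list of real-valued parameters (for example, controller gains), and `simulated_speed` is a made-up fitness function in place of a physics simulation. Truncation selection (keep the better half), one-point crossover and Gaussian mutation are common choices, not the specific operators of any system named above.

```python
import random
from typing import Callable, List

Design = List[float]  # hypothetical robot design, e.g. controller parameters


def simulated_speed(design: Design) -> float:
    """Made-up stand-in for a simulator: fastest when every parameter is 0.5."""
    return -sum((g - 0.5) ** 2 for g in design)


def evolve(
    fitness: Callable[[Design], float],
    genome_len: int,
    pop_size: int = 20,
    generations: int = 200,
    target: float = float("inf"),
    seed: int = 0,
) -> Design:
    rng = random.Random(seed)
    # Start from a population of randomly generated designs.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):  # stop after a fixed budget ...
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) > target:  # ... or once the target is surpassed
            break
        survivors = population[: pop_size // 2]  # discard the worst-performing half
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len) if genome_len > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]  # combination (one-point crossover)
            children.append([g + rng.gauss(0.0, 0.1) for g in child])  # mutation
        population = survivors + children
    return max(population, key=fitness)


best = evolve(simulated_speed, genome_len=4)
print(best, simulated_speed(best))
```

Because the survivors are carried over unchanged, the best design found so far is never lost; in a real experiment each call to `fitness` would be a full simulated trial, which is why the number of evaluations is usually the main cost.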

Blended artificial intelligence

Blended artificial intelligence (blended AI) refers to the blending of different artificial intelligence techniques or approaches to achieve more robust and practical solutions. It involves integrating multiple AI models, algorithms, and technologies to leverage their respective strengths and compensate for their weaknesses.

== Background ==

In the context of machine learning, blended AI can involve using different types of models, such as generative AI, decision trees, neural networks, and support vector machines. Combining their results can make predictions more accurate and reliable. This blending of models can be done through techniques like ensemble learning, where multiple models are trained independently and their predictions are combined to make a final decision. Blended AI can also involve combining different AI techniques or technologies, such as natural language processing, computer vision, and expert systems, to tackle complex problems that require a multi-dimensional approach. For example, in a sales scenario, AI could be used for lead generation, for gathering information from social media such as LinkedIn posts, or for understanding a prospect's hobbies and interests. Another AI component could build customer profiles covering past interactions and purchasing habits, as well as the customer's industry and growth areas. Blended AI could also be used for predictive analytics, examining historical sales data, market trends, and external factors to generate accurate sales forecasts. Such forecasting is considered critical for gauging and increasing "efficiency, revenue, and productivity". Lastly, another component could integrate all the information into the customer relationship management (CRM) system to build and maintain better prospect and customer profiles. Blended AI aims to leverage the strengths of different AI techniques and technologies, allowing them to complement each other and create more powerful and comprehensive AI solutions.
By combining multiple approaches, blended AI aims to achieve better performance, higher accuracy, improved robustness, and enhanced capabilities in solving diverse and challenging problems.
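The ensemble-learning form of blending can be sketched with three deliberately different models voting on whether a sales lead is promising: a hand-written rule (standing in for an expert system), a one-feature threshold (standing in for a learned decision tree) and a nearest-centroid classifier fitted to a few labelled leads. The features, thresholds and data are hypothetical, and majority voting is only one of several ways to combine predictions.

```python
from collections import Counter
from typing import Callable, Dict, List

Lead = Dict[str, float]  # hypothetical lead features
Model = Callable[[Lead], str]


def expert_rule(lead: Lead) -> str:
    """Hand-written rule, standing in for an expert system."""
    return "hot" if lead["past_purchases"] >= 2 else "cold"


def engagement_stump(lead: Lead) -> str:
    """Single-feature threshold, standing in for a learned decision tree."""
    return "hot" if lead["site_visits"] > 5 else "cold"


def fit_nearest_centroid(examples: List[Lead], labels: List[str]) -> Model:
    """Fit a nearest-centroid classifier on labelled example leads."""
    keys = sorted(examples[0])
    centroids = {}
    for label in sorted(set(labels)):
        rows = [ex for ex, lab in zip(examples, labels) if lab == label]
        centroids[label] = [sum(row[k] for row in rows) / len(rows) for k in keys]

    def predict(lead: Lead) -> str:
        point = [lead[k] for k in keys]
        return min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )

    return predict


def blend(models: List[Model], lead: Lead) -> str:
    """Majority vote over independently built models (an odd count avoids ties)."""
    votes = Counter(model(lead) for model in models)
    return votes.most_common(1)[0][0]


history = [
    {"past_purchases": 3, "site_visits": 8},
    {"past_purchases": 0, "site_visits": 1},
    {"past_purchases": 2, "site_visits": 2},
    {"past_purchases": 0, "site_visits": 7},
]
outcomes = ["hot", "cold", "hot", "cold"]
models = [expert_rule, engagement_stump, fit_nearest_centroid(history, outcomes)]

print(blend(models, {"past_purchases": 1, "site_visits": 9}))  # hot (2 votes to 1)
```

Here the rule alone would reject the lead, but the two data-driven models outvote it, which is the kind of mutual compensation blended AI is meant to provide.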