AI Content Idea Generator

AI Content Idea Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Symbolic artificial intelligence

    Symbolic artificial intelligence

    In artificial intelligence, symbolic artificial intelligence (also known as classical artificial intelligence or logic-based artificial intelligence) is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic, and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems (in particular, expert systems), symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to important ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems. Symbolic AI was the dominant paradigm of AI research from the mid-1950s until the mid-1990s. Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating a machine with artificial general intelligence and considered this the ultimate goal of their field. An early boom, with early successes such as the Logic Theorist and Samuel's Checkers Playing Program, led to unrealistic expectations and promises and was followed by the first AI Winter as funding dried up. A second boom (1969–1986) occurred with the rise of expert systems, their promise of capturing corporate expertise, and an enthusiastic corporate embrace. That boom, and some early successes, e.g., with XCON at DEC, was followed again by later disappointment. Problems with difficulties in knowledge acquisition, maintaining large knowledge bases, and brittleness in handling out-of-domain problems arose. Another, second, AI Winter (1988–2011) followed. Subsequently, AI researchers focused on addressing underlying problems in handling uncertainty and in knowledge acquisition. Uncertainty was addressed with formal methods such as hidden Markov models, Bayesian reasoning, and statistical relational learning. Symbolic machine learning addressed the knowledge acquisition problem with contributions including Version Space, Valiant's PAC learning, Quinlan's ID3 decision-tree learning, case-based learning, and inductive logic programming to learn relations. Neural networks, a subsymbolic approach, had been pursued from early days and reemerged strongly in 2012. Early examples are Rosenblatt's perceptron learning work, the backpropagation work of Rumelhart, Hinton and Williams, and work in convolutional neural networks by LeCun et al. in 1989. However, neural networks were not viewed as successful until about 2012: "Until Big Data became commonplace, the general consensus in the Al community was that the so-called neural-network approach was hopeless. Systems just didn't work that well, compared to other methods. ... A revolution came in 2012, when a number of people, including a team of researchers working with Hinton, worked out a way to use the power of GPUs to enormously increase the power of neural networks." Over the next several years, deep learning had spectacular success in handling vision, speech recognition, speech synthesis, image generation, and machine translation, though symbolic approaches continue to be useful in a few domains such as computer algebra systems and proof assistants. == History == A short history of symbolic AI to the present day follows below. Time periods and titles are drawn from Henry Kautz's 2020 AAAI Robert S. Engelmore Memorial Lecture and the longer Wikipedia article on the History of AI, with dates and titles differing slightly for increased clarity. === The first AI summer: irrational exuberance, 1948–1966 === Success at early attempts in AI occurred in three main areas: artificial neural networks, knowledge representation, and heuristic search, contributing to high expectations. This section summarizes Kautz's reprise of early AI history. ==== Approaches inspired by human or animal cognition or behavior ==== Cybernetic approaches attempted to replicate the feedback loops between animals and their environments. A robotic turtle, with sensors, motors for driving and steering, and seven vacuum tubes for control, based on a preprogrammed neural net, was built as early as 1948. This work can be seen as an early precursor to later work in neural networks, reinforcement learning, and situated robotics. An important early symbolic AI program was the Logic theorist, written by Allen Newell, Herbert Simon and Cliff Shaw in 1955–56, as it was able to prove 38 elementary theorems from Whitehead and Russell's Principia Mathematica. Newell, Simon, and Shaw later generalized this work to create a domain-independent problem solver, GPS (General Problem Solver). GPS solved problems represented with formal operators via state-space search using means-ends analysis. During the 1960s, symbolic approaches achieved great success at simulating intelligent behavior in structured environments such as game-playing, symbolic mathematics, and theorem-proving. AI research was concentrated in four institutions in the 1960s: Carnegie Mellon University, Stanford, MIT and (later) University of Edinburgh. Each one developed its own style of research. Earlier approaches based on cybernetics or artificial neural networks were abandoned or pushed into the background. Herbert Simon and Allen Newell studied human problem-solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as cognitive science, operations research and management science. Their research team used the results of psychological experiments to develop programs that simulated the techniques that people used to solve problems. This tradition, centered at Carnegie Mellon University would eventually culminate in the development of the Soar architecture in the middle 1980s. ==== Heuristic search ==== In addition to the highly specialized domain-specific kinds of knowledge that we will see later used in expert systems, early symbolic AI researchers discovered another more general application of knowledge. These were called heuristics, rules of thumb that guide a search in promising directions: "How can non-enumerative search be practical when the underlying problem is exponentially hard? The approach advocated by Simon and Newell is to employ heuristics: fast algorithms that may fail on some inputs or output suboptimal solutions." Another important advance was to find a way to apply these heuristics that guarantees a solution will be found, if there is one, not withstanding the occasional fallibility of heuristics: "The A algorithm provided a general frame for complete and optimal heuristically guided search. A is used as a subroutine within practically every AI algorithm today but is still no magic bullet; its guarantee of completeness is bought at the cost of worst-case exponential time. ==== Early work on knowledge representation and reasoning ==== Early work covered both applications of formal reasoning emphasizing first-order logic, along with attempts to handle common-sense reasoning in a less formal manner. ===== Modeling formal reasoning with logic: the "neats" ===== Unlike Simon and Newell, John McCarthy felt that machines did not need to simulate the exact mechanisms of human thought, but could instead try to find the essence of abstract reasoning and problem-solving with logic, regardless of whether people used the same algorithms. His laboratory at Stanford (SAIL) focused on using formal logic to solve a wide variety of problems, including knowledge representation, planning and learning. Logic was also the focus of the work at the University of Edinburgh and elsewhere in Europe which led to the development of the programming language Prolog and the science of logic programming. ===== Modeling implicit common-sense knowledge with frames and scripts: the "scruffies" ===== Researchers at MIT (such as Marvin Minsky and Seymour Papert) found that solving difficult problems in vision and natural language processing required ad hoc solutions—they argued that no simple and general principle (like logic) would capture all the aspects of intelligent behavior. Roger Schank described their "anti-logic" approaches as "scruffy" (as opposed to the "neat" paradigms at CMU and Stanford). Commonsense knowledge bases (such as Doug Lenat's Cyc) are an example of "scruffy" AI, since they must be built by hand, one complicated concept at a time. === The first AI winter: crushed dreams, 1967–1977 === The first AI winter was a shock: During the first AI summer, many people thought that machine intelligence could be achieved in just a few years. The Defense Advance Research Projects Agency (DARPA) launched programs to support AI research to use AI to solve problems of national security; in particular, to automate the translation of Russian to English for inte

    Read more →
  • General Data Protection Regulation

    General Data Protection Regulation

    The General Data Protection Regulation (Regulation (EU) 2016/679), abbreviated GDPR, is a European Union regulation on information privacy in the European Union (EU) and the European Economic Area (EEA). The GDPR is an important component of EU privacy law and human rights law, in particular Article 8(1) of the Charter of Fundamental Rights of the European Union. It also governs the transfer of personal data outside the EU and EEA. The GDPR's goals are to enhance individuals' control and rights over their personal information and to simplify the regulations for international business. It supersedes the Data Protection Directive 95/46/EC and, among other things, simplifies the terminology. The European Parliament and Council of the European Union adopted the GDPR on 14 April 2016, to become effective on 25 May 2018. As an EU regulation (instead of a directive), the GDPR has direct legal effect and does not require transposition into national law. However, it also provides flexibility for individual member states to modify (derogate from) some of its provisions. As an example of the Brussels effect, the regulation became a model for many other laws around the world, including in Brazil, Japan, Singapore, South Africa, South Korea, Sri Lanka, and Thailand. After leaving the European Union, the United Kingdom enacted its "UK GDPR", identical to the GDPR. The California Consumer Privacy Act (CCPA), adopted on 28 June 2018, has many similarities with the GDPR. == Contents == The GDPR 2016 has eleven chapters, concerning general provisions, principles, rights of the data subject, duties of data controllers or processors, transfers of personal data to third-party countries, supervisory authorities, cooperation among member states, remedies, liability or penalties for breach of rights, provisions related to specific processing situations, and miscellaneous final provisions. The GDPR also contains 173 recitals purposed to clarify scope and rationale for the regulatory provisions, as well as its legislative intents – Recital 4, for instance, begins by saying that the processing of personal data should be "designed to serve mankind". === General provisions === The regulation applies if the data controller, or processor, or the data subject (person) is based in the EU. The regulation also applies to organisations based outside the EU if they collect or process personal data of individuals located inside the EU. The regulation does not apply to the processing of data by private persons provided that the purpose has no connection to a professional or commercial activity." (Recital 18). According to the European Commission, "Personal data is information that relates to an identified or identifiable individual. If you cannot directly identify an individual from that information, then you need to consider whether the individual is still identifiable. You should take into account the information you are processing together with all the means reasonably likely to be used by either you or any other person to identify that individual." The precise definitions of terms such as "personal data", "processing", "data subject", "controller", and "processor" are stated in Article 4. The regulation does not purport to apply to the processing of personal data for national security activities or law enforcement of the EU; however, industry groups concerned about facing a potential conflict of laws have questioned whether Article 48 could be invoked to seek to prevent a data controller subject to a third country's laws from complying with a legal order from that country's law enforcement, judicial, or national security authorities to disclose to such authorities the personal data of an EU person, regardless of whether the data resides in or out of the EU. Article 48 states that any judgement of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may not be recognised or enforceable in any manner unless based on an international agreement, like a mutual legal assistance treaty in force between the requesting third (non-EU) country and the EU or a member state. The data protection reform package also includes a separate Data Protection Directive for the police and criminal justice sector that provides rules on personal data exchanges at State level, Union level, and international levels. A single set of rules applies to all EU member states. Each member state establishes an independent supervisory authority (SA) to hear and investigate complaints, sanction administrative offences, etc. SAs in each member state co-operate with other SAs, providing mutual assistance and organising joint operations. If a business has multiple establishments in the EU, it must have a single SA as its "lead authority", based on the location of its "main establishment" where the main processing activities take place. The lead authority thus acts as a "one-stop shop" to supervise all the processing activities of that business throughout the EU. A European Data Protection Board (EDPB) co-ordinates the SAs. EDPB thus replaces the Article 29 Data Protection Working Party. There are exceptions for data processed in an employment context or in national security that still might be subject to individual country regulations. === Principles and lawful purposes === Article 5 sets out six principles relating to the lawfulness of processing personal data. The first of these specifies that data must be processed lawfully, fairly and in a transparent manner. Article 6 develops this principle by specifying that personal data may not be processed unless there is at least one legal basis for doing so. The other principles refer to "purpose limitation", "data minimisation", "accuracy", "storage limitation", and "integrity and confidentiality". Article 6 states that the lawful purposes are: (a) If the data subject has given consent to the processing of his or her personal data; (b) To fulfill contractual obligations with a data subject, or for tasks at the request of a data subject who is in the process of entering into a contract; (c) To comply with a data controller's legal obligations; (d) To protect the vital interests of a data subject or another individual; (e) To perform a task in the public interest or in official authority; (f) For the legitimate interests of a data controller or a third party, unless these interests are overridden by interests of the data subject or her or his rights according to the Charter of Fundamental Rights (especially in the case of children). If informed consent is used as the lawful basis for processing, consent must have been explicit for data collected and each purpose data is used for. Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject; an online form which has consent options structured as an opt-out selected by default is a violation of the GDPR, as the consent is not unambiguously affirmed by the user. In addition, multiple types of processing may not be "bundled" together into a single affirmation prompt, as this is not specific to each use of data, and the individual permissions are not freely given. (Recital 32). Data subjects must be allowed to withdraw this consent at any time, and the process of doing so must not be harder than it was to opt in. A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service. Consent for children, defined in the regulation as being less than 16 years old (although with the option for member states to individually make it as low as 13 years old), must be given by the child's parent or custodian, and verifiable. If consent to processing was already provided under the Data Protection Directive, a data controller does not have to re-obtain consent if the processing is documented and obtained in compliance with the GDPR's requirements (Recital 171). === Rights of the data subject === ==== Transparency and modalities ==== Article 12 requires the data controller to provide information to the "data subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child." ==== Information and access ==== The right of access (Article 15) is a data subject right. It gives people the right to access their personal data and information about how this personal data is being processed. A data controller must provide, upon request, an overview of the categories of data that are being processed as well as a copy of the actual data; furthermore, the data controller has to inform the data subject on details about the processing, such as the purposes of the processing, with whom the data is shared, and how it acquired the data. A data subject must be able to transfer personal data from one electro

    Read more →
  • OpenCog

    OpenCog

    OpenCog is a project that aims to build an open source artificial intelligence framework. OpenCog Prime is an architecture for robot and virtual embodied cognition that defines a set of interacting components designed to give rise to human-equivalent artificial general intelligence (AGI) as an emergent phenomenon of the whole system. OpenCog Prime's design is primarily the work of Ben Goertzel while the OpenCog framework is intended as a generic framework for broad-based AGI research. Research utilizing OpenCog has been published in journals and presented at conferences and workshops including the annual Conference on Artificial General Intelligence. OpenCog is released under the terms of the GNU Affero General Public License. OpenCog is in use by more than 50 companies, including Huawei and Cisco. == Origin == OpenCog was originally based on the release in 2008 of the source code of the proprietary "Novamente Cognition Engine" (NCE) of Novamente LLC. The original NCE code is discussed in the PLN book (ref below). Ongoing development of OpenCog is supported by Artificial General Intelligence Research Institute (AGIRI), the Google Summer of Code project, Hanson Robotics, SingularityNET and others. == Components == OpenCog consists of: A graph database, dubbed the AtomSpace, that holds "atoms" (that is, terms, atomic formulas, sentences and relationships) together with their "values" (valuations or interpretations, which can be thought of as per-atom key-value databases). An example of a value would be a truth value. Atoms are globally unique, immutable and are indexed (searchable); values are fleeting and changeable. A collection of pre-defined atoms, termed Atomese, used for generic knowledge representation, such as conceptual graphs and semantic networks, as well as to represent and store the rules (in the sense of term rewriting) needed to manipulate such graphs. A collection of pre-defined atoms that encode a type subsystem, including type constructors and function types. These are used to specify the types of variables, terms and expressions, and are used to specify the structure of generic graphs containing variables. A collection of pre-defined atoms that encode both functional and imperative programming styles. These include the lambda abstraction for binding free variables into bound variables, as well as for performing beta reduction. A collection of pre-defined atoms that encode a satisfiability modulo theories solver, built in as a part of a generic graph query engine, for performing graph and hypergraph pattern matching (isomorphic subgraph discovery). This generalizes the idea of a structured query language (SQL) to the domain of generic graphical queries; it is an extended form of a graph query language. A generic rule engine, including a forward chainer and a backward chainer, that is able to chain together rules. The rules are exactly the graph queries of the graph query subsystem, and so the rule engine vaguely resembles a query planner. It is designed so as to allow different kinds of inference engines and reasoning systems to be implemented, such as Bayesian inference or fuzzy logic, or practical tasks, such as constraint solvers or motion planners. An attention allocation subsystem based on economic theory, termed ECAN. This subsystem is used to control the combinatorial explosion of search possibilities that are met during inference and chaining. An implementation of a probabilistic reasoning engine based on probabilistic logic networks. The current implementation uses the rule engine to chain together specific rules of logical inference (such as modus ponens), together with some very specific mathematical formulas assigning a probability and a confidence to each deduction. This subsystem can be thought of as a certain kind of proof assistant that works with a modified form of Bayesian inference. A probabilistic genetic program evolver called Meta-Optimizing Semantic Evolutionary Search, or MOSES. This is used to discover collections of short Atomese programs that accomplish tasks; these can be thought of as performing a kind of decision tree learning, resulting in a kind of decision forest, or rather, a generalization thereof. A natural language input system consisting of Link Grammar, and partly inspired by both Meaning-Text Theory as well as Dick Hudson's Word Grammar, which encodes semantic and syntactic relations in Atomese. A natural language generation system. An implementation of Psi-Theory for handling emotional states, drives and urges, dubbed OpenPsi. Interfaces to Hanson Robotics robots, including emotion modelling via OpenPsi. This includes the Loving AI project, used to demonstrate meditation techniques. == Organization and funding == In 2008, the Machine Intelligence Research Institute (MIRI), formerly called Singularity Institute for Artificial Intelligence (SIAI), sponsored several researchers and engineers. Many contributions from the open source community have been made since OpenCog's involvement in the Google Summer of Code in 2008 and 2009. Currently MIRI no longer supports OpenCog. OpenCog has received funding and support from several sources, including the Hong Kong government, Hong Kong Polytechnic University, the Jeffrey Epstein VI Foundation and Hanson Robotics. In 2013, OpenCog began providing AI solutions to Hanson Robotics, and in 2017, OpenCog became a founding member of SingularityNET. == Applications == Similar to other cognitive architectures, the main purpose is to create virtual humans, which are three dimensional avatar characters. The goal is to mimic behaviors like emotions, gestures and learning. For example, the emotion module in the software was only programmed because humans have emotions. Artificial General Intelligence can be realized if it simulates intelligence of humans. The self-description of the OpenCog project provides additional possible applications which are going into the direction of natural language processing and the simulation of a dog.

    Read more →
  • Mental mapping

    Mental mapping

    In behavioral geography, a mental map is a person's point-of-view perception of their area of interaction. Although this kind of subject matter would seem most likely to be studied by fields in the social sciences, this particular subject is most often studied by modern-day geographers. Researchers have also applied mental mapping to understand and define cognitive regions. They study it to determine subjective qualities from the public such as personal preference and practical uses of geography like driving directions. Mass media also have a virtually direct effect on a person's mental map of the geographical world. The perceived geographical dimensions of a foreign nation (relative to one's own nation) may often be heavily influenced by the amount of time and relative news coverage that the news media may spend covering news events from that foreign region. For instance, a person might perceive a small island to be nearly the size of a continent, merely based on the amount of news coverage that they are exposed to on a regular basis. In psychology, the term names the information maintained in the mind of an organism by means of which it may plan activities, select routes over previously traveled territories, etc. The rapid traversal of a familiar maze depends on this kind of mental map if scents or other markers laid down by the subject are eliminated before the maze is re-run. == Background == Mental maps are an outcome of the field of behavioral geography. The imagined maps are considered one of the first studies that intersected geographical settings with human action. The most prominent contribution and study of mental maps was in the writings of Kevin Lynch. In The Image of the City, Lynch used simple sketches of maps created from memory of an urban area to reveal five elements of the city; nodes, edges, districts, paths and landmarks. Lynch claimed that “Most often our perception of the city is not sustained, but rather partial, fragmentary, mixed with other concerns. Nearly every sense is in operation, and the image is the composite of them all.” (Lynch, 1960, p 2.) The creation of a mental map relies on memory as opposed to being copied from a preexisting map or image. In The Image of the City, Lynch asks a participant to create a map as follows: “Make it just as if you were making a rapid description of the city to a stranger, covering all the main features. We don’t expect an accurate drawing- just a rough sketch.” (Lynch 1960, p 141) In the field of human geography mental maps have led to an emphasizing of social factors and the use of social methods versus quantitative or positivist methods. Mental maps have often led to revelations regarding social conditions of a particular space or area. Haken and Portugali (2003) developed an information view, which argued that the face of the city is its information . Bin Jiang (2012) argued that the image of the city (or mental map) arises out of the scaling of city artifacts and locations. He addressed that why the image of city can be formed , and he even suggested ways of computing the image of the city, or more precisely the kind of collective image of the city, using increasingly available geographic information such as Flickr and Twitter . Using mental maps, we will be able to predict individual decision making and spatial selection, as well as evaluate their routing and navigation. A cognitive maps utility as a mnemonic and metaphorical device is precisely one of its other benefits as a shaper of the world and local attitudes. The first major field of study within the domain of memory maps is geography, spatial cognition and neurophysiology. This aims to understand how routes are drawn by subject from their set of subjects out into space which lead to memorization and internal representations. Overall these representations take the form of drawings, positioning in a graph, or oral/textual narratives, but are reflected as behavior is space that can be recorded as tracking items. == Research applications == Mental maps have been used in a collection of spatial research. Many studies have been performed that focus on the quality of an environment in terms of feelings such as fear, desire and stress. A study by Matei et al. in 2001 used mental maps to reveal the role of media in shaping urban space in Los Angeles. The study used Geographic Information Systems (GIS) to process 215 mental maps taken from seven neighborhoods across the city. The results showed that people's fear perceptions in Los Angeles are not associated with high crime rates but are instead associated with a concentration of certain ethnicities in a given area. The mental maps recorded in the study draw attention to these areas of concentrated ethnicities as parts of the urban space to avoid or stay away from. Mental maps have also been used to describe the urban experience of children. In a 2008 study by Olga den Besten mental maps were used to map out the fears and dislikes of children in Berlin and Paris. The study looked into the absence of children in today's cities and the urban environment from a child's perspective of safety, stress and fear. Peter Gould and Rodney White have performed prominent analyses in the book “Mental Maps.” This book is an investigation into people's spatial desires. The book asks of its participants: “Suppose you were suddenly given the chance to choose where you would like to live- an entirely free choice that you could make quite independently of the usual constraints of income or job availability. Where would you choose to go?” (Gould, 1974, p 15) Gould and White use their findings to create a surface of desire for various areas of the world. The surface of desire is meant to show people's environmental preferences and regional biases. In an experiment done by Edward C. Tolman, the development of a mental map was seen in rats. A rat was placed in a cross shaped maze and allowed to explore it. After this initial exploration, the rat was placed at one arm of the cross and food was placed at the next arm to the immediate right. The rat was conditioned to this layout and learned to turn right at the intersection in order to get to the food. When placed at different arms of the cross maze however, the rat still went in the correct direction to obtain the food because of the initial mental map it had created of the maze. Rather than just deciding to turn right at the intersection no matter what, the rat was able to determine the correct way to the food no matter where in the maze it was placed. The idea of mental maps is also used in strategic analysis. David Brewster, an Australian strategic analyst, has applied the concept to strategic conceptions of South Asia and Southeast Asia. He argues that popular mental maps of where regions begin and end can have a significant impact on the strategic behaviour of states. A collection of essays, documenting current geographical and historical research in mental maps is published by the Journal of Cultural Geography in 2018.

    Read more →
  • Winner-take-all in action selection

    Winner-take-all in action selection

    Winner-take-all is a computer science concept that has been widely applied in behavior-based robotics as a method of action selection for intelligent agents. Winner-take-all systems work by connecting modules (task-designated areas) in such a way that when one action is performed it stops all other actions from being performed, so only one action is occurring at a time. The name comes from the idea that the "winner" action takes all of the motor system's power. == History == In the 1980s and 1990s, many roboticists and cognitive scientists were attempting to find speedier and more efficient alternatives to the traditional world modeling method of action selection. In 1982, Jerome A. Feldman and D.H. Ballard published the "Connectionist Models and Their Properties", referencing and explaining winner-take-all as a method of action selection. Feldman's architecture functioned on the simple rule that in a network of interconnected action modules, each module will set its own output to zero if it reads a higher input than its own in any other module. In 1986, Rodney Brooks introduced behavior-based artificial intelligence. Winner-take-all architectures for action selection soon became a common feature of behavior-based robots, because selection occurred at the level of the action modules (bottom-up) rather than at a separate cognitive level (top-down), producing a tight coupling of stimulus and reaction. == Types of winner-take-all architectures == === Hierarchy === In the hierarchical architecture, actions or behaviors are programmed in a high-to-low priority list, with inhibitory connections between all the action modules. The agent performs low-priority behaviors until a higher-priority behavior is stimulated, at which point the higher behavior inhibits all other behaviors and takes over the motor system completely. Prioritized behaviors are usually key to the immediate survival of the agent, while behaviors of lower priority are less time-sensitive. For example, "run away from predator" would be ranked above "sleep." While this architecture allows for clear programming of goals, many roboticists have moved away from the hierarchy because of its inflexibility. === Heterarchy and fully distributed === In the heterarchy and fully distributed architecture, each behavior has a set of pre-conditions to be met before it can be performed, and a set of post-conditions that will be true after the action has been performed. These pre- and post-conditions determine the order in which behaviors must be performed and are used to causally connect action modules. This enables each module to receive input from other modules as well as from the sensors, so modules can recruit each other. For example, if the agent's goal were to reduce thirst, the behavior "drink" would require the pre-condition of having water available, so the module would activate the module in charge of "find water". The activations organize the behaviors into a sequence, even though only one action is performed at a time. The distribution of larger behaviors across modules makes this system flexible and robust to noise. Some critics of this model hold that any existing set of division rules for the predecessor and conflictor connections between modules produce sub-par action selection. In addition, the feedback loop used in the model can in some circumstances lead to improper action selection. === Arbiter and centrally coordinated === In the arbiter and centrally coordinated architecture, the action modules are not connected to each other but to a central arbiter. When behaviors are triggered, they begin "voting" by sending signals to the arbiter, and the behavior with the highest number of votes is selected. In these systems, bias is created through the "voting weight", or how often a module is allowed to vote. Some arbiter systems take a different spin on this type of winner-take-all by using a "compromise" feature in the arbiter. Each module is able to vote for or against each smaller action in a set of actions, and the arbiter selects the action with the most votes, meaning that it benefits the most behavior modules. This can be seen as violating the general rule against creating representations of the world in behavior-based AI, established by Brooks. By performing command fusion, the system is creating a larger composite pool of knowledge than is obtained from the sensors alone, forming a composite inner representation of the environment. Defenders of these systems argue that forbidding world-modeling puts unnecessary constraints on behavior-based robotics, and that agents benefits from forming representations and can still remain reactive.

    Read more →
  • BabelNet

    BabelNet

    BabelNet is a multilingual lexical-semantic knowledge graph, ontology and encyclopedic dictionary developed at the NLP group of the Sapienza University of Rome under the supervision of Roberto Navigli. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages harvested from both WordNet and Wikipedia. == Statistics of BabelNet == As of December 2023, BabelNet (version 5.3) covers 600 languages. It contains almost 23 million synsets and around 1.7 billion word senses (regardless of their language). Each Babel synset contains 2 synonyms per language, i.e., word senses, on average. The semantic network includes all the lexico-semantic relations from WordNet (hypernymy and hyponymy, meronymy and holonymy, antonymy and synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.9 billion edges). Version 5.3 also associates around 61 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. 2.67 million synsets are assigned domain labels. == Applications == BabelNet has been shown to enable multilingual natural language processing applications. The lexicalized knowledge available in BabelNet has been shown to obtain state-of-the-art results in: Semantic relatedness, Multilingual word-sense disambiguation and entity linking, with the Babelfy system, Video games with a purpose. == Prizes and acknowledgments == BabelNet received the META prize 2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources". The Artificial Intelligence Journal paper that describes BabelNet won the Prominent Paper Award in 2017. BabelNet featured prominently in a Time magazine article about the new age of innovative and up-to-date lexical knowledge resources available on the Web.

    Read more →
  • Mathematical model

    Mathematical model

    A mathematical model is an abstract description of a concrete system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in many fields, including applied mathematics, natural sciences, social sciences and engineering. In particular, the field of operations research studies the use of mathematical modelling and related tools to solve problems in business or military operations. A model may help to characterize a system by studying the effects of different components, which may be used to make predictions about behavior or solve specific problems. == Elements of a mathematical model == Mathematical models can take many forms, including dynamical systems, statistical models, differential equations, or game theoretic models. These and other types of models can overlap, with a given model involving a variety of abstract structures. In many cases, the quality of a scientific field depends on how well the mathematical models developed on the theoretical side agree with results of repeatable experiments. Lack of agreement between theoretical mathematical models and experimental measurements often leads to important advances as better theories are developed. In the physical sciences, a traditional mathematical model contains most of the following elements: Governing equations Supplementary sub-models Defining equations Constitutive equations Assumptions and constraints Initial and boundary conditions Classical constraints and kinematic equations == Classifications == Mathematical models are of different types: === Linear vs. nonlinear === If all the operators in a mathematical model exhibit linearity, the resulting mathematical model is defined as linear. All other models are considered nonlinear. The definition of linearity and nonlinearity is dependent on context, and linear models may have nonlinear expressions in them. For example, in a statistical linear model, it is assumed that a relationship is linear in the parameters, but it may be nonlinear in the predictor variables. Similarly, a differential equation is said to be linear if it can be written with linear differential operators, but it can still have nonlinear expressions in it. In a mathematical programming model, if the objective functions and constraints are represented entirely by linear equations, then the model is regarded as a linear model. If one or more of the objective functions or constraints are represented with a nonlinear equation, then the model is known as a nonlinear model. Linear structure implies that a problem can be decomposed into simpler parts that can be treated independently or analyzed at a different scale, and therefore that the results will remain valid if the initial is recomposed or rescaled. Nonlinearity, even in fairly simple systems, is often associated with phenomena such as chaos and irreversibility. Although there are exceptions, nonlinear systems and models tend to be more difficult to study than linear ones. A common approach to nonlinear problems is linearization, but this can be problematic if one is trying to study aspects such as irreversibility, which are strongly tied to nonlinearity. === Static vs. dynamic === A dynamic model accounts for time-dependent changes in the state of the system, while a static (or steady-state) model calculates the system in equilibrium, and thus is time-invariant. Dynamic models are typically represented by differential equations or difference equations. === Explicit vs. implicit === If all of the input parameters of the overall model are known, and the output parameters can be calculated by a finite series of computations, the model is said to be explicit. But sometimes it is the output parameters which are known, and the corresponding inputs must be solved for by an iterative procedure, such as Newton's method or Broyden's method. In such a case the model is said to be implicit. For example, a jet engine's physical properties such as turbine and nozzle throat areas can be explicitly calculated given a design thermodynamic cycle (air and fuel flow rates, pressures, and temperatures) at a specific flight condition and power setting, but the engine's operating cycles at other flight conditions and power settings cannot be explicitly calculated from the constant physical properties. === Discrete vs. continuous === A discrete model treats objects as discrete, such as the particles in a molecular model or the states in a statistical model; while a continuous model represents the objects in a continuous manner, such as the velocity field of fluid in pipe flows, temperatures and stresses in a solid, and electric field that applies continuously over the entire model due to a point charge. === Deterministic vs. probabilistic (stochastic) === A deterministic model is one in which every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables; therefore, a deterministic model always performs the same way for a given set of initial conditions. Conversely, in a stochastic model—usually called a "statistical model"—randomness is present, and variable states are not described by unique values, but rather by probability distributions. === Deductive, inductive, or floating === A deductive model is a logical structure based on a theory. An inductive model arises from empirical findings and generalization from them. If a model rests on neither theory nor observation, it may be described as a 'floating' model. Application of mathematics in social sciences outside of economics has been criticized for unfounded models. Application of catastrophe theory in science has been characterized as a floating model. === Strategic vs. non-strategic === Models used in game theory are distinct in the sense that they model agents with incompatible incentives, such as competing species or bidders in an auction. Strategic models assume that players are autonomous decision makers who rationally choose actions that maximize their objective function. A key challenge of using strategic models is defining and computing solution concepts such as the Nash equilibrium. An interesting property of strategic models is that they separate reasoning about rules of the game from reasoning about behavior of the players. == Construction == In business and engineering, mathematical models may be used to maximize a certain output. The system under consideration will require certain inputs. The system relating inputs to outputs depends on other variables too: decision variables, state variables, exogenous variables, and random variables. Decision variables are sometimes known as independent variables. Exogenous variables are sometimes known as parameters or constants. The variables are not independent of each other as the state variables are dependent on the decision, input, random, and exogenous variables. Furthermore, the output variables are dependent on the state of the system (represented by the state variables). Objectives and constraints of the system and its users can be represented as functions of the output variables or state variables. The objective functions will depend on the perspective of the model's user. Depending on the context, an objective function is also known as an index of performance, as it is some measure of interest to the user. Although there is no limit to the number of objective functions and constraints a model can have, using or optimizing the model becomes more involved (computationally) as the number increases. For example, economists often apply linear algebra when using input–output models. Complicated mathematical models that have many variables may be consolidated by use of vectors where one symbol represents several variables. === A priori information === Mathematical modeling problems are often classified into black box or white box models, according to how much a priori information on the system is available. A black-box model is a system of which there is no a priori information available. A white-box model (also called glass box or clear box) is a system where all necessary information is available. Practically all systems are somewhere between the black-box and white-box models, so this concept is useful only as an intuitive guide for deciding which approach to take. Usually, it is preferable to use as much a priori information as possible to make the model more accurate. Therefore, the white-box models are usually considered easier, because if you have used the information correctly, then the model will behave correctly. Often the a priori information comes in forms of knowing the type of functions relating different variables. For example, if we make a model of how a medicine works in a human system, we know that usually the amount of medicine in the blood is an exponentially decaying function, but we are still left with several unknown parameters; how

    Read more →
  • 4E cognition

    4E cognition

    4E cognition refers to a group of theories in (the philosophy of) cognitive science that challenge traditional views of the mind as something that happens only inside the brain. The four Es stand for: embodied, meaning that a brain is found in and, more importantly, vitally interconnected with a larger physical/biological body; embedded, which refers to the limitations placed on the body by the external environment and laws of nature; extended, which argues that the mind is supplemented and even enhanced by the exterior world (e.g., writing, a calculator, etc.); and enactive, which is the argument that without dynamic processes, actions that require reactions, the mind would be ineffectual. It could be argued that the four Es are compounding extensions of cognition or the mind, being part of a body that is, in turn, part of an environment which limits it but also allows for certain extensions, all of which require dynamic actions and reactions. == History == Ideas of embodied cognition, or rather the idea that our physical bodies play a crucial role in our decision making, can be traced back as far as Plato's dialogues and Aristotelian thought. It was, however, in the twentieth century that this debate began to resemble the current discussion, fueled by disagreements between cognitivists and behaviourists. Tensions within cognitivism, as well as the increasing popularity of neurobiology, led, on the one side, to a predominant focus on internal, cognitive processes while neglecting environmental factors, which in turn caused a push-back fuelling our modern understanding of embodied cognition. The term 4E cognition is hard to trace back to its first use, however, some sources attribute it to Shaun Gallagher and the conference on 4E cognition he organised in 2007, while others indicate the term to be first used in 2006 at an 'Embodied mind workshop' at Cardiff University that Gallagher attended. Embodiment or embodied cognition arguably presents the bridge between cognitivism and 4E cognition as the embodiment of cognitive function provides the necessary conditions for embeddedness, enactedness, and extendedness to connect to cognition. 4E cognition was and is heavily influenced by phenomenology. The ideas are still rather fragmented in nature due to their four main components, which can not be neatly divided, causing conceptual questions of internal boundary concepts. As a young field, it is held back both by its fragmented nature and a relative lack of critical evaluations. It is important to acknowledge that 4E cognition, though young, is a broad field containing and combining several different theoretical perspectives that conflict with one another to varying degrees. The somewhat convoluted and competing nature of the theories that can be grouped as 4E cognition, as well as the field's relative youth, make it difficult to put together an exhaustive history beyond the history of its four main theoretical pillars: embodiment, embeddedness, extendedness, and enactedness. == Importance and core tenets of 4E == If there are separate theories of cognition (e.g., embodied, extended, etc.), why group them under this umbrella, causing important epistemological and especially ontological dilemmas? Notably, other theories of 'non-traditional' cognition are not included under the 4E umbrella. The four E's in 4E cognition importantly all reject, or at a minimum draw into question, some of the core tenets of traditional cognitivism. Importantly, 4E cognition is seen as deindividualizing cognition to some extent, allowing for a broader examination of the interplay of personal, social, political, and ethical aspects that shape human cognition. This can be compared to advancements in the field of epigenetics, which have allowed for a broader examination of environmental (both natural and social) factors and their influence on what had previously only been subject to genetic theorizing. In a similar vein, 4E cognition might also help ground cognition in evolutionary theory by extending cognition to a biological account subject to development over time by means of evolution. Overall, the importance of the extension that is 4E cognition aims to reexamine ideas of a self-centered view of cognition, advocating for a more holistic approach. Ideally, this would allow us to reconsider ideas of justice and individual rights and responsibilities that take into account a more nuanced understanding of the relations between people and their context, balancing self-agency with factors beyond it. === Conceptual differences from cognitive psychology === According to the traditional teachings of cognitive psychology, cognition is a type of information processing based on representational mental structures. This idea, as the name suggests, was heavily influenced by computer science. In this light, the brain is a kind of central processing unit that organises and directs all else. The classical cognitivist view draws a strong boundary between 'the internal' and 'the external', where cognition is solely a subject of 'the internal' realm. The four E's, however, break down this boundary. Cognition can not reside solely within the confines of our heads if it is also embodied, embedded, enacted, and extended. In a way, 4E cognition is interested in the extracranial processes affecting cognition. == From embodied cognition to 4E cognition == === The strong and the weak view === ==== Embodied cognition ==== Broadly speaking, there is a strong and a weak perspective of embodied cognition in 4E cognition. The weak understanding refers to mental processes being causally dependent on extracranial processes. This essentially means that there is a cause and effect or action-reaction relationship between the mind and the body and its environment, etc. The strong perspective views extracranial processes as a (partial) constitutive aspect of cognition. An example here could be using a calculator to solve math problems. The calculator is not part of your brain or mind, but it supports your cognitive processes. === Extracranial processes: bodily or extrabodily === In addition to the weak and the strong reading of 4E cognition, there is also the distinction between bodily and extrabodily extracranial processes. Bodily extracranial processes refer to processes within the body, e.g., sensory perception. Extrabodily extracranial processes refer to processes outside of the body, like the aforementioned calculator example. === Four claims of embodied cognition === ==== Embedded and extended cognition ==== When combining the weak/strong reading of embodied cognition and bodily/extrabodily extracranial process, four claims about embodied cognition emerge: strongly embodied and bodily processes strongly embodied and extrabodily processes weakly embodied and bodily processes weakly embodied and extrabodily processes The first and third claims signify a strong and a weak reading of embodied cognition in the more classical sense. The second claim fits almost perfectly with embedded cognition. Claim two is most compatible with extended cognition. ==== Enacted cognition ==== Finally, enacted cognition refers to cognition being connected to active interaction between a conscious agent and their environment. Here, too, there can be a weak and a strong reading. == Criticisms == Given the divided nature of the field, much criticism surrounding the lack of unity within the field has emerged. In particular, the claims of embodied cognition centering around the body appear to conflict with the tenets of extended cognition, which also appear to conflict with the body/environment distinction that is central to enactivism. Some theoreticians argue that the umbrella of 4E theories is still lacking a common language that might bridge the gaps between the theories that constitute it. There is also the concern that the grouping of such variable theories results in an important loss of nuance and complexity, which is a part of human cognition. Another concern raised is the "dogma of harmony". The criticism contained there regards the notion that within 4E theorizing, there is generally an optimistic and harmonic expectation of the extension between humans and their technologies, ignoring the possibility of those extensions detracting from cognition in some way rather than adding to it. Recent attempts to incorporate embodied cognitive neuroscience have been argued to hold the potential to resolve internal issues within 4E cognition. Overall, a concern often voiced regarding 4E cognition is that its proponents are at best only vaguely interested in cognition. More broadly, this concern reflects the arguably too distracted nature of this emerging field.

    Read more →
  • Proximal gradient methods for learning

    Proximal gradient methods for learning

    Proximal gradient (forward backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is ℓ 1 {\displaystyle \ell _{1}} regularization (also known as Lasso) of the form min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , where x i ∈ R d and y i ∈ R . {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},\quad {\text{ where }}x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application. Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso). == Relevant background == Proximal gradient methods are applicable in a wide variety of scenarios for solving convex optimization problems of the form min x ∈ H F ( x ) + R ( x ) , {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x),} where F {\displaystyle F} is convex and differentiable with Lipschitz continuous gradient, R {\displaystyle R} is a convex, lower semicontinuous function which is possibly nondifferentiable, and H {\displaystyle {\mathcal {H}}} is some set, typically a Hilbert space. The usual criterion of x {\displaystyle x} minimizes F ( x ) + R ( x ) {\displaystyle F(x)+R(x)} if and only if ∇ ( F + R ) ( x ) = 0 {\displaystyle \nabla (F+R)(x)=0} in the convex, differentiable setting is now replaced by 0 ∈ ∂ ( F + R ) ( x ) , {\displaystyle 0\in \partial (F+R)(x),} where ∂ φ {\displaystyle \partial \varphi } denotes the subdifferential of a real-valued, convex function φ {\displaystyle \varphi } . Given a convex function φ : H → R {\displaystyle \varphi :{\mathcal {H}}\to \mathbb {R} } an important operator to consider is its proximal operator prox φ : H → H {\displaystyle \operatorname {prox} _{\varphi }:{\mathcal {H}}\to {\mathcal {H}}} defined by prox φ ⁡ ( u ) = arg ⁡ min x ∈ H φ ( x ) + 1 2 ‖ u − x ‖ 2 2 , {\displaystyle \operatorname {prox} _{\varphi }(u)=\operatorname {arg} \min _{x\in {\mathcal {H}}}\varphi (x)+{\frac {1}{2}}\|u-x\|_{2}^{2},} which is well-defined because of the strict convexity of the ℓ 2 {\displaystyle \ell _{2}} norm. The proximal operator can be seen as a generalization of a projection. We see that the proximity operator is important because x ∗ {\displaystyle x^{}} is a minimizer to the problem min x ∈ H F ( x ) + R ( x ) {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x)} if and only if x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) , {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right),} where γ > 0 {\displaystyle \gamma >0} is any positive real number. === Moreau decomposition === One important technique related to proximal gradient methods is the Moreau decomposition, which decomposes the identity operator as the sum of two proximity operators. Namely, let φ : X → R {\displaystyle \varphi :{\mathcal {X}}\to \mathbb {R} } be a lower semicontinuous, convex function on a vector space X {\displaystyle {\mathcal {X}}} . We define its Fenchel conjugate φ ∗ : X → R {\displaystyle \varphi ^{}:{\mathcal {X}}\to \mathbb {R} } to be the function φ ∗ ( u ) := sup x ∈ X ⟨ x , u ⟩ − φ ( x ) . {\displaystyle \varphi ^{}(u):=\sup _{x\in {\mathcal {X}}}\langle x,u\rangle -\varphi (x).} The general form of Moreau's decomposition states that for any x ∈ X {\displaystyle x\in {\mathcal {X}}} and any γ > 0 {\displaystyle \gamma >0} that x = prox γ φ ⁡ ( x ) + γ prox φ ∗ / γ ⁡ ( x / γ ) , {\displaystyle x=\operatorname {prox} _{\gamma \varphi }(x)+\gamma \operatorname {prox} _{\varphi ^{}/\gamma }(x/\gamma ),} which for γ = 1 {\displaystyle \gamma =1} implies that x = prox φ ⁡ ( x ) + prox φ ∗ ⁡ ( x ) {\displaystyle x=\operatorname {prox} _{\varphi }(x)+\operatorname {prox} _{\varphi ^{}}(x)} . The Moreau decomposition can be seen to be a generalization of the usual orthogonal decomposition of a vector space, analogous with the fact that proximity operators are generalizations of projections. In certain situations it may be easier to compute the proximity operator for the conjugate φ ∗ {\displaystyle \varphi ^{}} instead of the function φ {\displaystyle \varphi } , and therefore the Moreau decomposition can be applied. This is the case for group lasso. == Lasso regularization == Consider the regularized empirical risk minimization problem with square loss and with the ℓ 1 {\displaystyle \ell _{1}} norm as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},} where x i ∈ R d and y i ∈ R . {\displaystyle x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} The ℓ 1 {\displaystyle \ell _{1}} regularization problem is sometimes referred to as lasso (least absolute shrinkage and selection operator). Such ℓ 1 {\displaystyle \ell _{1}} regularization problems are interesting because they induce sparse solutions, that is, solutions w {\displaystyle w} to the minimization problem have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{0},} where ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", which is the number of nonzero entries of the vector w {\displaystyle w} . Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors. === Solving for L1 proximity operator === For simplicity we restrict our attention to the problem where λ = 1 {\displaystyle \lambda =1} . To solve the problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\|w\|_{1},} we consider our objective function in two parts: a convex, differentiable term F ( w ) = 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 {\displaystyle F(w)={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}} and a convex function R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} . Note that R {\displaystyle R} is not strictly convex. Let us compute the proximity operator for R ( w ) {\displaystyle R(w)} . First we find an alternative characterization of the proximity operator prox R ⁡ ( x ) {\displaystyle \operatorname {prox} _{R}(x)} as follows: u = prox R ⁡ ( x ) ⟺ 0 ∈ ∂ ( R ( u ) + 1 2 ‖ u − x ‖ 2 2 ) ⟺ 0 ∈ ∂ R ( u ) + u − x ⟺ x − u ∈ ∂ R ( u ) . {\displaystyle {\begin{aligned}u=\operatorname {prox} _{R}(x)\iff &0\in \partial \left(R(u)+{\frac {1}{2}}\|u-x\|_{2}^{2}\right)\\\iff &0\in \partial R(u)+u-x\\\iff &x-u\in \partial R(u).\end{aligned}}} For R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} it is easy to compute ∂ R ( w ) {\displaystyle \partial R(w)} : the i {\displaystyle i} th entry of ∂ R ( w ) {\displaystyle \partial R(w)} is precisely ∂ | w i | = { 1 , w i > 0 − 1 , w i < 0 [ − 1 , 1 ] , w i = 0. {\displaystyle \partial |w_{i}|={\begin{cases}1,&w_{i}>0\\-1,&w_{i}<0\\\left[-1,1\right],&w_{i}=0.\end{cases}}} Using the recharacterization of the proximity operator given above, for the choice of R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} and γ > 0 {\displaystyle \gamma >0} we have that prox γ R ⁡ ( x ) {\displaystyle \operatorname {prox} _{\gamma R}(x)} is defined entrywise by ( prox γ R ⁡ ( x ) ) i = { x i − γ , x i > γ 0 , | x i | ≤ γ x i + γ , x i < − γ , {\displaystyle \left(\operatorname {prox} _{\gamma R}(x)\right)_{i}={\begin{cases}x_{i}-\gamma ,&x_{i}>\gamma \\0,&|x_{i}|\leq \gamma \\x_{i}+\gamma ,&x_{i}<-\gamma ,\end{cases}}} which is known as the soft thresholding operator S γ ( x ) = prox γ ‖ ⋅ ‖ 1 ⁡ ( x ) {\displaystyle S_{\gamma }(x)=\operatorname {prox} _{\gamma \|\cdot \|_{1}}(x)} . === Fixed point iterative schemes === To finally solve the lasso problem we consider the fixed point equation shown earlier: x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) . {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right).} Given that we have computed the form of the proximity operator explicitly, then we can define a standard fixed point iteration procedure. Namely, fix some initial w 0 ∈ R d {\displaystyle w^{0}\in \mathbb {R} ^{d}} , and for k = 1 , 2 , … {\displaystyle k=1,2,\ldots } define w k + 1 = S γ ( w k − γ ∇ F ( w k ) ) . {\displaystyle w^{k+1}=S_{\gamma }\left(w^{k}-\gamma \nabla F\l

    Read more →
  • Issue tree

    Issue tree

    An issue tree, also called logic tree, is a graphical breakdown of a question that dissects it into its different components vertically and that progresses into details as it reads to the right. Issue trees are useful in problem solving to identify the root causes of a problem as well as to identify its potential solutions. They also provide a reference point to see how each piece fits into the whole picture of a problem. == Types == According to professor of strategy Arnaud Chevallier, elaborating an approach used at McKinsey & Company, there are two types of issue trees: diagnostic ones and solution ones. Diagnostic trees break down a "why" key question, identifying all the possible root causes for the problem. Solution trees break down a "how" key question, identifying all the possible alternatives to fix the problem. == Rules == Four basic rules can help ensure that issue trees are optimal, according to Chevallier: Consistently answer a "why" or a "how" question Progress from the key question to the analysis as it moves to the right Have branches that are mutually exclusive and collectively exhaustive (MECE) Use an insightful breakdown The requirement for issue trees to be collectively exhaustive implies that divergent thinking is a critical skill. == Applications == === In management interviews === Issue trees are used to answer questions in case interviews for management consulting positions. A quantitative type of question, the market sizing question, requires the interviewee to estimate the size of a data group such as a specific segment of a population, an amount of objects, a company's revenues, or similar. The candidates are expected to use a structured and logical method of arriving at their answer, and using an issue tree provides a diagram to aid the candidate's logical reasoning. Issue trees are used for other types of case interview questions as well.

    Read more →
  • Jakub Pachocki

    Jakub Pachocki

    Jakub Pachocki (born 1991) is a Polish computer scientist and former competitive programmer. He is best known as OpenAI's chief scientist and for his role in overseeing development of GPT-4. == Background == Pachocki was born in 1991 in Gdańsk, Poland. In high school, he was a six-time finalist of the Polish Olympiad in Informatics. In 2009, he qualified for the International Olympiad in Informatics, winning a silver medal. Pachocki obtained his undergraduate degree in Computer Science from the University of Warsaw. He represented his university at the International Collegiate Programming Contest with his team winning a gold medal and coming second place overall in 2012. In the same year he was also the champion of the Google Code Jam. From 2011 to 2012, Pachocki worked at Facebook as a software engineering intern. Pachocki attended graduate school at Carnegie Mellon University, where he obtained his PhD under the supervision of Gary Miller. == Career == After graduation, Pachocki did postdoc work at Harvard University and Simons Institute for the Theory of Computing. === OpenAI === In 2017, Pachocki joined OpenAI. In 2021, he became OpenAI's research director where he led the development of GPT-4 and OpenAI Five. In May 2024, he became chief scientist after his mentor Ilya Sutskever left the company. OpenAI CEO Sam Altman has called Pachocki "easily one of the greatest minds of our generation". == Competitive programming achievements == International Olympiad in Informatics: Silver medal (2009) International Collegiate Programming Contest World Finals: Gold medal (second place overall in 2012) Google Code Jam: Champion (2012), Third place (2011) Facebook Hacker Cup: Second place (2013) TopCoder Open Algorithm: Second place (2012) A more comprehensive list of achievements can be found at the Competitive Programming Hall Of Fame website.

    Read more →
  • Babelfy

    Babelfy

    Babelfy is a software algorithm for the disambiguation of text written in any language. It performs the tasks of multilingual Word Sense Disambiguation (i.e., the disambiguation of common nouns, verbs, adjectives and adverbs) and Entity Linking (i.e. the disambiguation of mentions to encyclopedic entities like people, companies, places, etc.). == Overview == Babelfy uses the BabelNet multilingual knowledge graph to perform disambiguation and entity linking in three steps: It associates with each vertex of the BabelNet semantic network, i.e., either concept or named entity, a semantic signature, that is, a set of related vertices. This is a preliminary step which needs to be performed only once, independently of the input text. Given an input text, it extracts all the linkable fragments from this text and, for each of them, lists the possible meanings according to the semantic network. It creates a graph-based semantic interpretation of the whole text by linking the candidate meanings of the extracted fragments using the previously computed semantic signatures. It then extracts a dense subgraph of this representation and selects the best candidate meaning for each fragment. As a result, the text, written in any of the 271 languages supported by BabelNet, is output with possibly overlapping semantic annotations.

    Read more →
  • OpenIO

    OpenIO

    OpenIO offered object storage for a wide range of high-performance applications. OpenIO was founded in 2015 by Laurent Denel (CEO), Jean-François Smigielski (CTO) and five other co-founders; it leveraged open source software, developed since 2006, based on a grid technology that enabled dynamic behaviour and supported heterogenous hardware. In October 2017 OpenIO was completed a $5 million funding rounds. In July 2020 OpenIO had been acquired by OVH and withdrawn from the market to become the core technology of OVHcloud object storage offering. == Software == OpenIO is a software-defined object store that supports S3 and can be deployed on-premises, cloud-hosted or at the edge, on any hardware mix. It has been designed from the beginning for performance and cost-efficiency at any scale, and it has been optimized for Big Data, HPC and AI. OpenIO stores objects within a flat structure within a massively distributed directory with indirections, which allows the data query path to be independent of the number of nodes and the performance not to be affected by the growth of capacity. Servers are organized as a grid of nodes massively distributed, where each node takes part in directory and storage services, which ensures that there is no single point of failure and that new nodes are automatically discovered and immediately available without the need to rebalance data. The software is built on top of a technology that ensures optimal data placement based on real-time metrics and allows the addition or removal of storage devices with automatic performance and load impact optimization. For data protection OpenIO has synchronous and asynchronous replication with multiple copies, and an erasure coding implementation based on Reed-Solomon that can be deployed in one data center or geo-distributed or stretched clusters. The software has a feature that catches all events that occur in the cluster and can pass them up in the stack or to applications running on OpenIO nodes. This enables event-driven computing directly into the storage infrastructure. The open source code is available on Github and it is licensed under AGPL3 for server code and LGPL3 for client code. == Performance == OpenIO claimed in 2019 to have reached 1.372 Tbit/s write speed (171 GB/s) on a cluster of 350 physical machines. The benchmark scenario, conducted under production conditions with standard hardware (commodity servers with 7200 rpm HDDs), consisted in backing up a 38 PB Hadoop datalake via the DistCp command. This level of performance marked, according to analysts, the arrival of a new generation of object storage technologies oriented toward high performance and hyper-scalability.

    Read more →
  • Framework Convention on Artificial Intelligence

    Framework Convention on Artificial Intelligence

    The Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law (also called Framework Convention on Artificial Intelligence or AI convention) is an international treaty on artificial intelligence. It was adopted under the auspices of the Council of Europe (CoE) and signed on 5 September 2024. The treaty aims to ensure that the development and use of AI technologies align with fundamental human rights, democratic values, and the rule of law, addressing risks such as misinformation, algorithmic discrimination, and threats to public institutions. More than 50 countries, including the EU member states, have endorsed the Framework Convention on Artificial Intelligence. == Background == The development of the Framework Convention on AI emerged in response to growing concerns over the ethical, legal, and societal impacts of artificial intelligence. The Council of Europe, which has historically played a key role in setting human rights standards across Europe, initiated discussions on AI governance in 2020, leading to the drafting of a binding legal framework. The process of creating the Framework Convention began in 2019 with the ad hoc Committee on Artificial Intelligence (CAHAI) assessing the feasibility of the instrument. In 2022, the Committee on Artificial Intelligence (CAI) took over the process, drafting and negotiating the text of the Convention. The treaty is designed to complement existing international human rights instruments, including the European Convention on Human Rights and the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data. == Structure and content == The Convention establishes fundamental principles for AI governance, including transparency, accountability, non-discrimination, and human rights protection through eight chapters and 26 articles. Adopted in 2024, this landmark treaty addresses AI governance through seven core principles and detailed implementation mechanisms. It mandates risk and impact assessments to mitigate potential harms and provides safeguards such as the right to challenge AI-driven decisions. It applies to public authorities and private entities acting on their behalf but excludes national security and defense activities. Implementation is overseen by a Conference of the Parties, ensuring compliance and international cooperation. Activities within the AI system lifecycle must adhere to seven fundamental principles, ensuring compliance with human rights, democracy, and the rule of law. The treaty also establishes remedies, procedural rights and safeguards, and risk and impact management requirements to promote accountability, transparency, and responsible AI development. The treaty consists of five chapters. Chapter I contains general provisions. Chapter II states the general obligation to protect human rights and the integrity of democratic processes and respect of the rule of law. The main principles and rights are contained in Chapter III, which consists of Articles 6 to 13. Chapter IV (Articles 14 to 15) sets up the legal remedies. Chapter V states the risk and impact management framework. Chapter VI facilitates the implementation criteria of the treaty. Chapter VII sets the co-operation and oversight mechanisms. Chapter VIII contains various concluding clauses. Article 1 declares the objectives of the treaty, to ensure that activities within the lifecycle of artificial intelligence systems are fully consistent with human rights, democracy and the rule of law. == Entry into force == The treaty will enter into force on the first day of the month following the expiration of a period of three months after the date on which five ratification made by five countries, including three member states of the Council of Europe. == Competing approaches == While the CoE's AI Convention represents a multilateral effort to regulate AI through a human rights-based approach, alternative frameworks have also been proposed. One notable example is the Munich Draft for a Convention on AI, Data and Human Rights, an initiative led by legal scholars and policymakers in Germany. The Munich Draft advocates for stronger safeguards against AI-related risks, emphasizing stricter data protection measures, accountability for AI developers, and explicit prohibitions on high-risk AI applications, such as mass surveillance and autonomous lethal weapons. Unlike the CoE convention, which focuses on balancing innovation with regulation, the Munich Draft takes a more precautionary stance, calling for tighter controls over AI deployment in sensitive domains. Other competing international efforts include the OECD’s AI Principles, the GPAI (Global Partnership on AI), and the European Union's AI Act, each of which offers different regulatory strategies to govern AI at regional and global levels. == Signatories == Signatories include Andorra, Canada, the European Union, Georgia, Iceland, Israel, Japan, Liechtenstein, the Republic of Moldova, Montenegro, Norway, San Marino, Switzerland, Ukraine, the United Kingdom, the United States, and Uruguay. == Endorsement == The treaty was widely endorsed by leading AI policy experts, including Stuart J. Russell, Virginia Dignum, Emma Ruttkamp-Bloem, Pascal Pichonnaz, Maria Helen Murphy, Angella Ndaka, Hannes Werthner, Katja Langenbucher, Gry Hasselbalch, Ricardo Baeza-Yates, Kutoma Wakunuma, Gianclaudio Malgieri, Oreste Pollicino, Nagla Rizk, Giovanni Sartor, Lee Tiedrich, Ingrid Schneider, Eduardo Bertoni, Garry Kasparov, Merve Hikcok, and Marc Rotenberg. The treaty was also endorsed by notable political leaders, including Theodoros Roussopoulos, President of the Parliamentart Assembly in the Council of Europe, and Christopher Holmes, Member of the House of Lords of the United Kingdom, and by the International Bar Association (IBA), and personally by Almudena Arpón de Mendívil, President of the IBA. The Center for AI and Digital Policy (CAIDP) has been carrying out a campaign to promote endorsement of the treaty by urging various countries to sign and ratify the treaty. The CAIDP further urged the countries to make a clear and firm commitment to ensure the full inclusion of the private sector under the treaty’s provisions.

    Read more →
  • Lumpers and splitters

    Lumpers and splitters

    Lumpers and splitters are opposing factions in any academic discipline that has to place individual examples into rigorously defined categories. The lumper–splitter problem occurs when there is the desire to create classifications and assign examples to them, for example, schools of literature, biological taxa, and so on. A "lumper" is a person who assigns examples broadly, judging that differences are not as important as signature similarities. A "splitter" makes precise definitions, and creates new categories to classify samples that differ in key ways. == Origin of the terms == The earliest known use of these terms was thought to be by Charles Darwin, in a letter to Joseph Dalton Hooker in 1857: "It is good to have hair-splitters & lumpers". But according to research done by the deputy director at NCSE, Glenn Branch, the credit is due to naturalist Edward Newman who wrote in 1845, "The time has arrived for discarding imaginary species, and the duty of doing this is as imperative as the admission of new ones when such are really discovered. The talents described under the respective names of 'hair-splitting' and 'lumping' are unquestionably yielding their power to the mightier power of Truth." They were then introduced more widely by George G. Simpson in his 1945 work The Principles of Classification and a Classification of Mammals. As he put it: splitters make very small units – their critics say that if they can tell two animals apart, they place them in different genera ... and if they cannot tell them apart, they place them in different species. ... Lumpers make large units – their critics say that if a carnivore is neither a dog nor a bear, they call it a cat. A later use can be found in the title of a 1969 paper "On lumpers and splitters ..." by the medical geneticist Victor McKusick. Reference to lumpers and splitters in the humanities appeared in a debate in 1975 between J. H. Hexter and Christopher Hill, in the Times Literary Supplement. It followed from Hexter's detailed review of Hill's book Change and Continuity in Seventeenth Century England, in which Hill developed Max Weber's argument that the rise of capitalism was facilitated by Calvinist Puritanism. Hexter objected to Hill's "mining" of sources to find evidence that supported his theories. Hexter argued that Hill plucked quotations from sources in a way that distorted their meaning. Hexter explained this as a mental habit that he called "lumping". According to him, "lumpers" rejected differences and chose to emphasise similarities. Any evidence that did not fit their arguments was ignored as aberrant. Splitters, by contrast, emphasised differences, and resisted simple schemes. While lumpers consistently tried to create coherent patterns, splitters preferred incoherent complexity. == Usage in various fields == === Biology === The categorisation and naming of a particular species should be regarded as a hypothesis about the evolutionary relationships and distinguishability of that group of organisms. As further information comes to hand, the hypothesis may be confirmed or refuted. Sometimes, especially in the past when communication was more difficult, taxonomists working in isolation have given two distinct names to individual organisms later identified as the same species. When two named species are agreed to be of the same species, the older species name is almost always retained dropping the newer species name honouring a convention known as "priority of nomenclature". This form of lumping is technically called synonymisation. Dividing a taxon into multiple, often new, taxa is called splitting. Taxonomists are often referred to as "lumpers" or "splitters" by their colleagues, depending on their personal approach to recognizing differences or commonalities between organisms. For example, the number of genera used in Pteridophyte Phylogeny Group I (PPG I) has proved controversial. PPG I uses 18 lycophyte and 319 fern genera. The earlier system put forward by Smith et al. (2006) had suggested a range of 274 to 312 genera for ferns alone. By contrast, the system of Christenhusz & Chase (2014) used 5 lycophyte and about 212 fern genera. The number of fern genera was further reduced to 207 in a subsequent publication. Defending PPG I, Schuettpelz et al. (2018) argue that the larger number of genera is a result of "the gradual accumulation of new collections and new data" and hence "a greater appreciation of fern diversity and ... an improved ability to distinguish taxa". They also argue that the number of species per genus in the PPG I system is already higher than in other groups of organisms (about 33 species per genus for ferns as opposed to about 22 species per genus for angiosperms) and that reducing the number of genera as Christenhusz and Chase propose yields the excessive number of about 50 species per genus for ferns. In response, Christenhusz and Chase (2018) argue that the excessive splitting of genera destabilises the usage of names and will lead to greater instability in future, and that the highly split genera have few if any characters that can be used to recognise them, making identification difficult, even to generic level. They further argue that comparing numbers of species per genus in different groups is "fundamentally meaningless". === History === In history, lumpers are those who tend to create broad definitions that cover large periods of time and many disciplines, whereas splitters want to assign names to tight groups of inter-relationships. Lumping tends to create a more and more unwieldy definition, with members having less and less mutually in common. This can lead to definitions which are little more than conventionalities, or groups which join fundamentally different examples. Splitting often leads to "distinctions without difference", ornate and fussy categories, and failure to see underlying similarities. For example, in the arts, "Romantic" can refer specifically to a period of German poetry roughly from 1780 to 1810, but would exclude the later work of Goethe, among other writers. In music it can mean every composer from Hummel through Rachmaninoff, plus many that came after. === Software modelling === Software engineering often proceeds by building models (sometimes known as model-driven architecture). A lumper is keen to generalise, and produces models with a small number of broadly defined objects. A splitter is reluctant to generalise, and produces models with a large number of narrowly defined objects. Conversion between the two styles is not necessarily symmetrical. For example, if error messages in two narrowly defined classes behave in the same way, the classes can be easily combined. But if some messages in a broad class behave differently, every object in the class must be examined before the class can be split. This illustrates the principle that "splits can be lumped more easily than lumps can be split". === Language classification === There is no agreement among historical linguists about what amount of evidence is needed for two languages to be safely classified in the same language family. For this reason, many proposed language families have had lumper–splitter controversies, including Altaic, Pama–Nyungan, Nilo-Saharan, and most of the larger families of the Americas. At a completely different level, the splitting of a mutually intelligible dialect continuum into different languages, or lumping them into one, is also an issue that continually comes up, though the consensus in contemporary linguistics is that there is no completely objective way to settle the question. Splitters regard the comparative method (meaning not comparison in general, but only reconstruction of a common ancestor or protolanguage) as the only valid proof of kinship, and consider genetic relatedness to be the question of interest. American linguists of recent decades tend to be splitters. Lumpers are more willing to admit techniques like mass lexical comparison or lexicostatistics, and mass typological comparison, and to tolerate the uncertainty of whether relationships found by these methods are the result of linguistic divergence (descent from common ancestor) or language convergence (borrowing). Much long-range comparison work has been from Russian linguists belonging to the Moscow School of Comparative Linguistics, most notably Vladislav Illich-Svitych and Sergei Starostin. In the United States, Greenberg and Ruhlen's work has been met with little acceptance from linguists. Earlier American linguists like Morris Swadesh and Edward Sapir also pursued large-scale classifications like Sapir's 1929 scheme for the Americas, accompanied by controversy similar to that today. === Religious studies === Paul F. Bradshaw suggests that the same principles of lumping and splitting apply to the study of early Christian liturgy. Lumpers, who tend to predominate in this field, try to find a single line of successive texts from the apostolic age to the

    Read more →