AI Builder Pricing

AI Builder Pricing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Symbolic regression


    Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a starting point for symbolic regression. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique. The symbolic regression problem for mathematical functions has been tackled with a variety of methods, including recombining equations most commonly using genetic programming, as well as more recent methods utilizing Bayesian methods and neural networks. Another non-classical alternative method to SR is called Universal Functions Originator (UFO), which has a different mechanism, search-space, and building strategy. Further methods such as Exact Learning attempt to transform the fitting problem into a moments problem in a natural function space, usually built around generalizations of the Meijer-G function. By not requiring a priori specification of a model, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. 
This facilitates reasoning and improves the odds of gaining insight into the data-generating system, and it improves generalisability and extrapolation behaviour by preventing overfitting. Accuracy and simplicity may be left as two separate objectives of the regression, in which case the optimum solutions form a Pareto front, or they may be combined into a single objective by means of a model selection principle such as minimum description length. It has been proven that symbolic regression is an NP-hard problem. Nevertheless, if the sought-for equation is not too complex, it is possible to solve the symbolic regression problem exactly by generating every possible function (built from some predefined set of operators) and evaluating each one on the dataset in question.

== Difference from classical regression ==
While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters. This approach has the disadvantage of having a much larger space to search: not only is the search space in symbolic regression infinite, but there are also infinitely many models that will perfectly fit a finite data set (provided that the model complexity isn't artificially limited). This means that a symbolic regression algorithm may take longer to find an appropriate model and parametrization than traditional regression techniques. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data; but in the end, using symbolic regression is a decision that has to be balanced with how much is known about the underlying system.
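The exhaustive approach mentioned above, restricted to a small set of building blocks, can be sketched in a few lines. This is only an illustration under stated assumptions: the primitives (`sin`, `sq`, `+`, `*`), the node-count complexity measure, and the toy target x^2 + x are all choices made for the example, not part of any particular symbolic regression tool. It enumerates every expression up to a fixed nesting depth, scores each by mean squared error, and keeps the accuracy/simplicity Pareto front.

```python
import itertools
import math

# Hypothetical building blocks; which primitives to allow is an assumption.
LEAVES = {"x": lambda x: x, "1": lambda x: 1.0, "2": lambda x: 2.0}
UNARY = {"sin": math.sin, "sq": lambda v: v * v}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def expressions(depth):
    """Yield (text, function, size) for expressions nested up to `depth` levels.

    Duplicates are not removed; they only cost extra evaluations.
    """
    if depth == 0:
        for name, fn in LEAVES.items():
            yield name, fn, 1
        return
    subs = list(expressions(depth - 1))
    yield from subs
    for name, op in UNARY.items():
        for text, fn, size in subs:
            yield f"{name}({text})", (lambda x, op=op, fn=fn: op(fn(x))), size + 1
    for name, op in BINARY.items():
        for (lt, lf, ls), (rt, rf, rs) in itertools.product(subs, repeat=2):
            combined = lambda x, op=op, lf=lf, rf=rf: op(lf(x), rf(x))
            yield f"({lt} {name} {rt})", combined, ls + rs + 1


def mse(fn, xs, ys):
    """Mean squared error of `fn` on the dataset."""
    return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pareto_front(scored):
    """Keep (size, error, text) entries that no simpler entry matches or beats."""
    front, best = [], math.inf
    for size, err, text in sorted(scored, key=lambda s: (s[0], s[1])):
        if err < best:
            front.append((size, err, text))
            best = err
    return front


XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [x * x + x for x in XS]  # toy target: x^2 + x
SCORED = [(size, mse(fn, XS, YS), text) for text, fn, size in expressions(2)]
FRONT = pareto_front(SCORED)
for size, err, text in FRONT:
    print(size, round(err, 4), text)
```

The number of candidates grows very quickly with depth, which is why this only works when the sought-for equation is small and the primitive set is tightly limited.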
Nevertheless, this characteristic of symbolic regression also has advantages: because the evolutionary algorithm requires diversity in order to effectively explore the search space, the result is likely to be a selection of high-scoring models (and their corresponding sets of parameters). Examining this collection could provide better insight into the underlying process, and allows the user to identify an approximation that better fits their needs in terms of accuracy and simplicity.

== Benchmarking ==

=== SRBench ===
In 2021, SRBench was proposed as a large benchmark for symbolic regression. At its inception, SRBench featured 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB. The benchmark intends to be a living project: it encourages the submission of improvements, new datasets, and new methods, to keep track of the state of the art in SR.

=== SRBench Competition 2022 ===
In 2022, SRBench announced the competition Interpretable Symbolic Regression for Data Science, which was held at the GECCO conference in Boston, MA. The competition pitted nine leading symbolic regression algorithms against each other on a novel set of data problems and considered different evaluation criteria. The competition was organized in two tracks, a synthetic track and a real-world data track.

==== Synthetic Track ====
In the synthetic track, methods were compared according to five properties: re-discovery of exact expressions; feature selection; resistance to local optima; extrapolation; and sensitivity to noise. The ranking of the methods was: (1) QLattice, (2) PySR (Python Symbolic Regression), (3) uDSR (Deep Symbolic Optimization).

==== Real-world Track ====
In the real-world track, methods were trained to build interpretable predictive models for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York State. These models were reviewed by a subject expert, assigned trust ratings, and evaluated for accuracy and simplicity.
The ranking of the methods was: (1) uDSR (Deep Symbolic Optimization), (2) QLattice, (3) geneticengine (Genetic Engine).

== Non-standard methods ==
Most symbolic regression algorithms prevent combinatorial explosion by implementing evolutionary algorithms that iteratively improve the best-fit expression over many generations. Recently, researchers have proposed algorithms utilizing other tactics in AI. Silviu-Marian Udrescu and Max Tegmark developed the "AI Feynman" algorithm, which attempts symbolic regression by training a neural network to represent the mystery function, then runs tests against the neural network to attempt to break up the problem into smaller parts. For example, if $f(x_1, \ldots, x_i, x_{i+1}, \ldots, x_n) = g(x_1, \ldots, x_i) + h(x_{i+1}, \ldots, x_n)$, tests against the neural network can recognize the separation and proceed to solve for $g$ and $h$ separately and with different variables as inputs. This is an example of divide and conquer, which reduces the size of the problem to be more manageable. AI Feynman also transforms the inputs and outputs of the mystery function in order to produce a new function which can be solved with other techniques, and performs dimensional analysis to reduce the number of independent variables involved. The algorithm was able to "discover" 100 equations from The Feynman Lectures on Physics, while Eureqa, a leading software package using evolutionary algorithms, solved only 71. AI Feynman, in contrast to classic symbolic regression methods, requires a very large dataset in order to first train the neural network and is naturally biased towards equations that are common in elementary physics.
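The additive-separability check described above can be illustrated numerically. The sketch below is not the AI Feynman implementation: an ordinary Python callable stands in for the trained neural network, and the function name `additive_separable`, the sampling range, and the tolerance are assumptions made for the example. It relies on the identity that if f(x) = g(first block) + h(second block), then f(a) + f(b) = f(a-first, b-second) + f(b-first, a-second) for any two points a and b.

```python
import math
import random


def additive_separable(f, split, n_vars, trials=200, tol=1e-9, seed=0):
    """Numerically test whether f(x) = g(x[:split]) + h(x[split:])."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-2, 2) for _ in range(n_vars)]
        b = [rng.uniform(-2, 2) for _ in range(n_vars)]
        # Swap the second block between the two sample points.
        ab = a[:split] + b[split:]
        ba = b[:split] + a[split:]
        if abs(f(a) + f(b) - f(ab) - f(ba)) > tol:
            return False
    return True


# Stand-ins for the fitted network (hypothetical mystery functions).
separable = lambda x: x[0] ** 2 + math.sin(x[1]) + x[2]
coupled = lambda x: x[0] * x[1] + x[2]

print(additive_separable(separable, split=1, n_vars=3))
print(additive_separable(coupled, split=1, n_vars=3))
print(additive_separable(coupled, split=2, n_vars=3))
```

With a real fitted network the tolerance would have to be loosened to account for approximation error, since the network only approximates the mystery function.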

  • Maia and Marco


    Maia and Marco are artificial intelligence (AI) avatars used by GMA Network. Unveiled in 2023, they fulfill the role of sports newscasters.

    == Background ==
    Maia and Marco take the form of three-dimensional human avatars. Maia uses a female avatar while Marco uses a male likeness. They have aesthetic features that are typical of Filipino show business personalities. The technologies used in making and operating the AI include image generation, text-to-speech AI voice synthesis/generation, and deep learning face animation. They have also been demonstrated to be bilingual, able to speak in English and Tagalog (Filipino).

    == Use ==
    The AI pair was unveiled by GMA Network on September 24, 2023, for its coverage of Season 99 of the National Collegiate Athletic Association (NCAA). Fulfilling the role of sports newscasters, Maia and Marco would join GMA's courtside human reporters. The AI pair is scheduled to appear four times a month on GMA's digital media platforms. They will not appear in traditional television broadcasts.

    == Reception ==
    The launch of Maia and Marco was met with strong reactions. Various journalists and other personalities across the Philippine media industry expressed concern that their employment would be at risk with the introduction of AI. Critics characterized the AI's ability to emulate human behavior as "soulless". Responding to these concerns, GMA stated that the AI would complement rather than replace its live human journalists, including sportscasters. The National Union of Journalists of the Philippines urged dialogue among its peers in the newsroom on policy for the use of AI, which the group acknowledged as "inevitable".

  • Xinhua–Sogou AI news anchor


    Xinhua News Agency and Sogou of China developed an artificial intelligence (AI) for news reporting purposes. The AI was unveiled in 2018. It is touted as the "world's first AI news anchor".

    == History ==
    The AI was unveiled at the 2018 World Internet Conference in Wuzhen, Zhejiang, China. The AI generates avatars patterned after real-life Xinhua anchors. The avatar patterned after Qiu Hao speaks in Chinese, while the one derived from the likeness of Zhang Zhao speaks in English. The unveiling of the AI raised concerns about its impact on employment. In 2019, Xinhua and Sogou unveiled Xin Xiaomeng, an AI with a female avatar. People's Daily followed suit by unveiling its own AI newscaster in 2023.

  • AgMES


    The AgMES (Agricultural Metadata Element Set) initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations. It aims to address semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability, and data exchange for different types of information resources. There are numerous other metadata schemas for different types of information resources. A few examples:

      • Document-like Information Objects (DLIOs): Dublin Core, Agricultural Metadata Element Set (AgMES)
      • Events: vCalendar
      • Geographic and Regional Information: Geographic information - Metadata ISO/IEC 11179 Standards
      • Persons: Friend-of-a-friend (FOAF), vCard
      • Plant Production and Protection: Darwin Core (1.0 and 2.0) (DwC)

    AgMES as a namespace is designed to include agriculture-specific extensions for terms and refinements from established standard metadata namespaces such as Dublin Core and AGLS. Thus, to be used for Document-like Information Objects such as publications, articles, books, web sites, and papers, it has to be used in conjunction with the standard namespaces mentioned before. The AgMES initiative strives to improve interoperability between information resources in the agricultural domain by enabling the exchange of information. Describing a DLIO with AgMES means exposing its major characteristics and contents in a standard way that can be reused easily in any information system. The more institutions and organizations in the agricultural domain that use AgMES to describe their DLIOs, the easier it will be to interchange data between information systems such as digital libraries and other repositories of agricultural information.
== Use of AgMES ==
Metadata on agricultural Document-like Information Objects (DLIOs) can be created and stored in various formats: embedded in a web site (in the same manner as the HTML meta tag), in a separate metadata database, in an XML file, or in an RDF file. AgMES defines elements for describing a DLIO that can be used together with other metadata standards such as Dublin Core and the Australian Government Locator Service (AGLS). A complete list of all elements, refinements and schemes endorsed by AgMES is available from the AgMES website.

=== Creating application profiles ===
Application profiles are defined as schemas which consist of data elements drawn from one or more namespaces, combined by implementers, and optimized for a particular local application. Application profiles share the following four characteristics. They draw upon the existing pool of metadata definition standards to extract suitable application- or requirement-oriented elements. An application profile cannot create new elements. Application profiles specify application-specific details such as the schemes or controlled vocabularies. An application profile also contains information such as the format for the element value, cardinality, or data type. Lastly, an application profile can refine standardized definitions as long as the refinement is "semantically narrower or more specific". This capability caters to situations where domain-specific terminology is needed to replace a more general one.

=== Sample application profiles using AgMES ===
The AGRIS Application Profile is a standard created specifically to enhance the description, exchange and subsequent retrieval of agricultural Document-like Information Objects (DLIOs). It is a format that allows sharing of information across dispersed bibliographic systems and is based on well-known and accepted metadata standards.
The Event Application Profile is a standard created to allow members of the agricultural community to learn about an upcoming event and to guide them to the event web site, where they can find further information. The information communicated is thus minimal yet interoperable across domains and organizations.

== AgMES and the semantic web ==
One of the advantages of the AgMES metadata schema is the ability to link metadata elements to controlled vocabularies. The use of a controlled vocabulary provides a "known" set of options to the indexer (and the search programmer) as to how the field can be filled out. Often the values may come from a specific thesaurus (e.g. AGROVOC) or a classification scheme (e.g. the AGRIS/CARIS classification scheme). Because metadata elements can draw on controlled vocabularies, the user is provided with the most precise information. Work is also being carried out on exploiting the power of controlled vocabularies expressed using URIs and machine-understandable semantics. In this context, FAO is promoting the Agricultural Ontology Service (AOS) initiative with the objective of expressing more semantics within the traditional thesaurus AGROVOC and building a Concept Server as a repository from which it will always be possible to extract traditional KOS.
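The application-profile characteristics described under "Creating application profiles" (elements drawn from existing namespaces, no newly invented elements, cardinality and scheme details) can be modelled roughly as follows. This is a hedged sketch only: the `NAMESPACES` table, the `ProfileElement` class, and both helper functions are hypothetical, and the entries under the `ags` prefix are placeholders that have not been checked against the published AgMES element list.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative namespace contents; the `ags` entries are placeholders
# and are not checked against the published AgMES element list.
NAMESPACES = {
    "dc": {"title", "creator", "subject", "date"},
    "ags": {"subjectThesaurus", "subjectClassification"},
}


@dataclass(frozen=True)
class ProfileElement:
    name: str                          # qualified name such as "dc:title"
    min_occurs: int = 0
    max_occurs: Optional[int] = None   # None means unbounded
    scheme: Optional[str] = None       # controlled vocabulary, if any


def build_profile(elements):
    """Return a profile dict, refusing elements absent from every namespace."""
    profile = {}
    for el in elements:
        prefix, _, local = el.name.partition(":")
        if local not in NAMESPACES.get(prefix, set()):
            raise ValueError(f"{el.name} is not defined in a source namespace")
        profile[el.name] = el
    return profile


def validate(record, profile):
    """List cardinality violations and unknown fields for one metadata record."""
    problems = []
    for name, el in profile.items():
        count = len(record.get(name, []))
        if count < el.min_occurs:
            problems.append(f"{name}: needs at least {el.min_occurs}")
        if el.max_occurs is not None and count > el.max_occurs:
            problems.append(f"{name}: at most {el.max_occurs} allowed")
    problems += [f"{name}: not in profile" for name in record if name not in profile]
    return problems


PROFILE = build_profile([
    ProfileElement("dc:title", min_occurs=1, max_occurs=1),
    ProfileElement("dc:creator"),
    ProfileElement("ags:subjectThesaurus", scheme="AGROVOC"),
])
RECORD = {"dc:creator": ["FAO"], "ags:subjectThesaurus": ["maize"]}
print(validate(RECORD, PROFILE))
```

The `build_profile` check is where the "cannot create new elements" rule is enforced: a profile may only select and constrain what a source namespace already defines.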

  • Adversarial stylometry


    Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop. All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches to this exist, with some overlap: imitation, substituting the author's own style for another's; translation, applying machine translation with the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style to make it not resemble the author's own. Manually obscuring style is possible, but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully-automatic, could assist an author. How best to perform the task and the design of such tools is an open research question. While some approaches have been shown to be able to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses is an issue. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools. It is uncertain if the practice of adversarial stylometry is detectable in itself. 
Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them.

== History ==
Rao & Rohatgi (2000), an early work in adversarial stylometry, identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges. Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts. Brennan & Greenstadt (2009) introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods; other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus.

== Motivations ==
Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but pseudonymous authors who have not practiced adversarial stylometry in producing corpora of thousands of words may be vulnerable. Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but only accurately matched the blog and author a fifth of the time; identification improved with the number of posts written by the author in the corpus. Even if an author is not identified, some of their characteristics may still be deduced stylometrically, or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification. Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates.
Modern machine learning techniques offer powerful tools for identification; further development of corpora and computational stylometric techniques is likely to raise further privacy issues. Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry (that authors have invariant, content-independent "style fingerprints") is uncertain, but "the deanonymisation attack is a real privacy concern". Those interested in practicing adversarial stylometry and stylistic deception include whistleblowers avoiding retribution; journalists and activists; perpetrators of frauds and hoaxes; authors of fake reviews; literary forgers; criminals disguising their identity from investigators; and, generally, anyone with a desire for anonymity or pseudonymity. Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias). Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored.

== Methods ==
With imitation, the author attempts to mislead stylometry by matching their style to another author's. An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry. Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from. Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can lead to texts being significantly altered, even to the point of incomprehensibility; improved translation tools reduce this risk.
More simply-structured texts can be easier to machine translate without losing the original meaning. Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output. With low-quality translation tools, an author can be required to manually correct major translation errors while avoiding the hazard of re-introducing stylistic characteristics. Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, but more common with several intermediate translations—however, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal. Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed. Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be performed at the time of writing by conscious modification, or as part of a revision process with feedback from the metric being targeted as an input to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning, and likewise genres with more permissible variation allow more obfuscation. However, longer texts are harder to thoroughly obfuscate. Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style. With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles. 
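The revision process with metric feedback described above can be sketched as a simple loop: measure similarity to the author's known writing, apply the change that lowers it most, and stop once the text is far enough away. Everything here is an assumption made for illustration: the word-frequency cosine similarity stands in for a real stylometric metric, the `SUBSTITUTIONS` table is hypothetical, and the sketch makes no attempt to guarantee that meaning is preserved.

```python
import math
from collections import Counter

# Hypothetical substitution table; a real tool would need a paraphraser
# that preserves meaning, which this sketch does not attempt.
SUBSTITUTIONS = {"utilise": "use", "whilst": "while", "upon": "on", "hence": "so"}


def profile(text):
    """Word-frequency vector used as a stand-in stylometric feature set."""
    return Counter(text.lower().split())


def cosine(p, q):
    """Cosine similarity between two frequency vectors."""
    dot = sum(p[w] * q[w] for w in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def obfuscate(text, reference, threshold):
    """Greedily substitute words until similarity to `reference` is below `threshold`."""
    ref = profile(reference)
    words = text.split()
    score = cosine(profile(" ".join(words)), ref)
    while score >= threshold:
        best = None
        lowered = [w.lower() for w in words]
        for old, new in SUBSTITUTIONS.items():
            if old not in lowered:
                continue
            trial = [new if w.lower() == old else w for w in words]
            trial_score = cosine(profile(" ".join(trial)), ref)
            if best is None or trial_score < best[0]:
                best = (trial_score, trial)
        if best is None or best[0] >= score:
            break  # no remaining substitution lowers the similarity
        score, words = best
    return " ".join(words), score


REFERENCE = "whilst we utilise the tool upon request hence we utilise it often"
DRAFT = "whilst they utilise the method upon arrival"
REVISED, FINAL_SCORE = obfuscate(DRAFT, REFERENCE, threshold=0.2)
print(REVISED, round(FINAL_SCORE, 3))
```

Because the loop optimises against one specific metric, the result is only an adversarial example for that metric; it offers no assurance against a different, unknown analysis.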
Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or to format a document in a characteristic way. Stylometric signals vary in how simply they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation. Automated obfuscation may require large amounts of training data written by the author. Concerning automated implementations of adversarial stylometry, two possible implementations are rule-based systems for paraphrasing; and encoder–decoder architectures, where the text passes through an intermediate format that is (intended to be) style-neutral. Another division in automated methods is whether there is feedback from an identification system or not. With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem, exploring textual variants until the result is stylistically sufficiently far (in the case of obfuscation) or near (in the case of imitation), which then constitutes an adversarial example for that identification system. == Evaluation == How

  • Z.ai


    Knowledge Atlas Technology Joint Stock Co., Ltd., branded internationally as Z.ai, is a Chinese technology company specializing in artificial intelligence (AI). The company was known as Zhipu AI outside China until its rebranding in 2025. Z.ai's flagship product is the GLM (General Language Model) family of large language models, which the company has released under the free and open-source MIT License since July 2025. As of 2024, it is regarded by investors as one of China's "AI tiger" companies and is considered the third-largest LLM market player in China's AI industry according to the International Data Corporation. In January 2025, the United States Commerce Department added the company to its Entity List due to national security concerns.

    == History ==
    The company was founded in 2019 at Tsinghua University and was later spun out as an independent company. Researchers published an Association for Computational Linguistics conference paper in May 2022 introducing the GLM (General Language Model) training algorithm, which uses an "autoregressive blank infilling" strategy that creates cloze tests by randomly removing segments of input text and trains the model to autoregressively regenerate the removed text. In 2023, it raised 2.5 billion yuan (approx. US$350 million) from Alibaba Group and Tencent, along with Meituan, Ant Group, Xiaomi, and HongShan. In March 2024, Zhipu AI announced it was developing a Sora-like technology to achieve artificial general intelligence (AGI). In May 2024, the Saudi Arabian finance firm Prosperity7 Ventures, LLC participated in a US$400 million financing round for Zhipu AI at a valuation of approximately US$3 billion. In July 2024, the company debuted the Ying text-to-video model. Zhipu released GLM-4-Plus in August 2024. In October 2024, Zhipu released GLM-4-Voice, an end-to-end speech large language model that can adjust its tone or dialect.
Zhipu disclosed in April 2025 that it had started preparing for its initial public offering (IPO) and released two models under the free and open-source MIT License. In May 2025, the company sealed a 61.28 million yuan deal with the Chinese government for city projects in Hangzhou. In July 2025, Zhipu AI released GLM-4.5 and GLM-4.5 Air, its next-generation language models, and the company rebranded itself as Z.ai internationally. In August 2025, Z.ai announced that its GLM models are compatible with Huawei's Ascend processors. On August 11, 2025, Z.ai released GLM-4.5V, a new vision-language model (VLM) with a total of 106B parameters. In late September 2025, the company released GLM-4.6 using China's domestic chips such as those from Cambricon Technologies. Z.ai released GLM-4.6V and GLM-4.7 in December 2025. That same year, the company changed its official name to Knowledge Atlas Technology JSC Ltd. On January 8, 2026, Z.ai held its IPO on the Hong Kong Stock Exchange to become a listed company. It is considered to be China's first major LLM company to go through an IPO. On February 11, 2026, Z.ai released GLM-5. In late February 2026, Z.ai's shares fell by 23% and the company had a shortage of compute resources, leading to user complaints and to Z.ai issuing a public call for support. Z.ai also restricted new user signups. In late March 2026, Z.ai released the GLM-5.1 model to subscription users. On April 8, 2026, Z.ai released GLM-5.1 as open-source. The same day, Z.ai increased its API prices by 10%, but maintained a lower price than the Opus 4.6 model of its United States competitor Anthropic. On release, the company's share price increased 11.5%.

== Description ==
Z.ai provides the following products and services:

  • General Language Model (commonly abbreviated as GLM; formerly known as ChatGLM), a series of pre-trained dialogue models initially developed by Zhipu AI and Tsinghua KEG in 2023. GLM-4.5, released in July 2025 by Z.ai, can run on eight NVIDIA H20 chips. The release of GLM-4.6 in late September 2025 marked the first integration of FP8 and Int4 quantization on Cambricon chips. It also supports native FP8 on Moore Threads GPUs.
  • Ying, a text-to-video model that turns image and text prompts into a six-second video clip in around 30 seconds.
  • AutoGLM, an AI agent application that uses voice commands to complete tasks within a smartphone. The app can handle complex tasks such as ordering an item from a nearby store and repeating an order based on the user's shopping history.
  • AMiner, created by Jie Tang (co-founder of Z.ai) in March 2006 and now owned by Z.ai.

As of 2025, Z.ai has offices in the Middle East, United Kingdom, Singapore, and Malaysia, along with innovation center projects across Southeast Asia. In January 2025, the United States Commerce Department added the company to its Entity List, citing national security concerns.

== Models ==

  • RightsCon


    RightsCon is an annual conference on digital rights hosted by Access Now. It convenes international leaders and organizations to discuss global problems including internet censorship, the regulation of algorithms, electronic surveillance, the ethics of technology, online hate speech, content moderation, cyberwarfare, and more.

    == History ==
    The conference was first convened by Access (today, Access Now) in Silicon Valley in 2011, with the intention of gathering civil society to discuss the impacts of the growing tech industry on digital rights and human rights. It sought the participation of leaders from both industry (including companies such as Twitter, Google, Mozilla, and Comcast) and civil society organizations (such as the Electronic Frontier Foundation and New America). Keynote speakers included the then-Assistant Secretary of State, Michael Posner; Egyptian blogger and political prisoner, Alaa Abd El-Fattah; and then-director of public policy at Google, Bob Boorstin. RightsCon organizers have sought to ensure the event is accessible to attendees from across the globe, particularly global majority countries, informing the decision to hold the conference in Asia, the Middle East, and Latin America.

    === Online convenings ===
    In 2020, RightsCon was to be held in San José, Costa Rica, but due to the COVID-19 pandemic, the meeting took place in an online format. In 2021, the 10th edition of RightsCon was again held online, from Monday, June 7 to Friday, June 11, due to the continuing global COVID-19 pandemic, which disrupted several in-person digital rights meetings. The topics for RightsCon 2021 included artificial intelligence (AI), automation, data protection and user control, digital futures, democracy, elections, new business models, content control, peacebuilding, censorship, internet shutdowns, freedom of the media and many others, which were discussed by several digital rights organizations and individuals.
=== 2026 cancellation ===
The 14th RightsCon was scheduled to be held in Zambia from May 5 to 8, 2026. On April 29, 2026, the Zambian government abruptly postponed the conference, writing in a statement that the postponement was "necessitated by the need for comprehensive disclosure […] relating to key thematic issues proposed for discussion during the Summit." In May 2026, the conference was cancelled due to pressure from the Chinese government. In a statement the same day, Access Now wrote that it was "told that diplomats from the People's Republic of China (PRC) were putting pressure on the Government of Zambia because Taiwanese civil society participants were planning to join us in person."

== List of conferences ==
Past RightsCon conferences include:

    Read more →
  • Greg Brockman

    Greg Brockman

    Gregory Brockman (born November 29, 1987) is an American entrepreneur and software engineer. He is co-founder and president of OpenAI. He began his career at Stripe in 2010, upon leaving MIT, and became CTO in 2013. He left Stripe in 2015 to co-found OpenAI, where he also served as CTO. == Early life == Brockman was born in Thompson, North Dakota, and attended Red River High School, where he excelled in mathematics, chemistry, and computer science. He won a silver medal in the 2006 International Chemistry Olympiad and became the first finalist from North Dakota to participate in the Intel Science Talent Search since 1973. In 2003, 2005, and 2007, he attended Canada/USA Mathcamp, a summer program for mathematically talented high-school students. In 2008, Brockman enrolled at Harvard University but left a year later, briefly enrolling at the Massachusetts Institute of Technology. == Career == In 2010, he dropped out of MIT to join Stripe, a company founded by Patrick Collison, his MIT classmate, and John Collison. In 2013, he became Stripe's first CTO, while the company grew from 5 to 205 employees. Brockman left Stripe in May 2015. === OpenAI === Brockman met with Sam Altman and Elon Musk, and led the recruiting of the OpenAI founding team. Many of its members, including Ilya Sutskever, were top AI researchers who left high-paying jobs for the opportunity at OpenAI. He co-founded OpenAI in December 2015 alongside Altman, Sutskever and others. The company initially operated from Brockman's living room. He led various projects at OpenAI, including OpenAI Gym and OpenAI Five, a Dota 2 bot. On February 14, 2019, OpenAI announced that it had developed a new large language model called GPT-2, but kept it private out of concern for its potential misuse. It released the model to a limited group of beta testers in May 2019. On March 14, 2023, in a live video demo, Brockman unveiled GPT-4, the fourth iteration in the GPT series.
On November 17, 2023, alongside the firing of Sam Altman, Brockman was told he had been removed from the board. Sutskever supplied the board with a document alleging bullying by Brockman. Mira Murati said Brockman's relationship with Altman made it impossible for her to do her job, and Altman had already "fielded many requests from OpenAI employees to rein in Brockman". Brockman was to report to Murati, but on November 17, he announced that he had quit the company. On November 20, 2023, Microsoft CEO Satya Nadella announced that Brockman and Altman would join Microsoft to lead a new advanced AI research team. The following day, after a deal was reached to reinstate Altman as CEO, Brockman returned to OpenAI. Brockman took a sabbatical from August to November 2024. === Elon Musk lawsuit === Jury selection for OpenAI cofounder Elon Musk's lawsuit against OpenAI and its current executives, including Brockman, began on April 27, 2026. By April 28, 2026, trial testimony was underway, with Musk beginning his testimony against Altman and OpenAI. On April 30, 2026, Musk entered his third day of testimony. == Personal life == In November 2019, after a year of dating, Brockman married Anna at OpenAI's offices on a workday. Ilya Sutskever officiated. == Political activities == Brockman and his wife were the biggest donors to Donald Trump's Super PAC, MAGA Inc., in 2025, each donating US$12.5 million. Brockman and his wife also donated $50 million to Leading the Future, a super PAC dedicated to AI deregulation that he helped found with Andreessen Horowitz co-founders Marc Andreessen and Ben Horowitz. OpenAI has publicly expressed openness to increased regulatory oversight and has a policy against donating to such Super PACs.

    Read more →
  • Actor-critic algorithm

    Actor-critic algorithm

    The actor-critic algorithm (AC) is a family of reinforcement learning (RL) algorithms that combine policy-based RL algorithms, such as policy gradient methods, with value-based RL algorithms, such as value iteration, Q-learning, SARSA, and TD learning. An AC algorithm consists of two main components: an "actor" that determines which actions to take according to a policy function, and a "critic" that evaluates those actions according to a value function. Some AC algorithms are on-policy, some are off-policy. Some apply to either continuous or discrete action spaces only, while others work in both cases. == Overview == Actor-critic methods can be understood as an improvement over pure policy gradient methods such as REINFORCE, obtained by introducing a baseline. === Actor === The actor uses a policy function $\pi(a|s)$, while the critic estimates either the value function $V(s)$, the action-value Q-function $Q(s,a)$, the advantage function $A(s,a)$, or any combination thereof. The actor is a parameterized function $\pi_\theta$, where $\theta$ are the parameters of the actor. The actor takes as argument the state of the environment $s$ and produces a probability distribution $\pi_\theta(\cdot|s)$. If the action space is discrete, then $\sum_a \pi_\theta(a|s) = 1$. If the action space is continuous, then $\int_a \pi_\theta(a|s)\,da = 1$. The goal of policy optimization is to improve the actor.
That is, to find some $\theta$ that maximizes the expected episodic reward $J(\theta)$:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t r_t\right]$$

where $\gamma$ is the discount factor, $r_t$ is the reward at step $t$, and $T$ is the time horizon (which can be infinite). The goal of the policy gradient method is to optimize $J(\theta)$ by gradient ascent on the policy gradient $\nabla J(\theta)$. As detailed on the policy gradient method page, there are many unbiased estimators of the policy gradient:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{0 \le j \le T} \nabla_\theta \ln \pi_\theta(A_j|S_j) \cdot \Psi_j \,\Big|\, S_0 = s_0\right]$$

where $\Psi_j$ is a linear sum of the following:

    • $\sum_{0 \le i \le T} (\gamma^i R_i)$.
    • $\gamma^j \sum_{j \le i \le T} (\gamma^{i-j} R_i)$: the REINFORCE algorithm.
    • $\gamma^j \sum_{j \le i \le T} (\gamma^{i-j} R_i) - b(S_j)$: the REINFORCE with baseline algorithm. Here $b$ is an arbitrary function.
    • $\gamma^j \left(R_j + \gamma V^{\pi_\theta}(S_{j+1}) - V^{\pi_\theta}(S_j)\right)$: TD(1) learning.
    • $\gamma^j Q^{\pi_\theta}(S_j, A_j)$.
    • $\gamma^j A^{\pi_\theta}(S_j, A_j)$: Advantage Actor-Critic (A2C).
    • $\gamma^j \left(R_j + \gamma R_{j+1} + \gamma^2 V^{\pi_\theta}(S_{j+2}) - V^{\pi_\theta}(S_j)\right)$: TD(2) learning.
    • $\gamma^j \left(\sum_{k=0}^{n-1} \gamma^k R_{j+k} + \gamma^n V^{\pi_\theta}(S_{j+n}) - V^{\pi_\theta}(S_j)\right)$: TD(n) learning.
    • $\gamma^j \sum_{n=1}^{\infty} (1-\lambda)\lambda^{n-1} \cdot \left(\sum_{k=0}^{n-1} \gamma^k R_{j+k} + \gamma^n V^{\pi_\theta}(S_{j+n}) - V^{\pi_\theta}(S_j)\right)$: TD(λ) learning, also known as GAE (generalized advantage estimate). This is obtained by an exponentially decaying sum of the TD(n) learning terms, with weights $(1-\lambda)\lambda^{n-1}$ that sum to 1.

=== Critic === In the unbiased estimators given above, certain functions such as $V^{\pi_\theta}$, $Q^{\pi_\theta}$, $A^{\pi_\theta}$ appear. These are approximated by the critic. Since these functions all depend on the actor, the critic must learn alongside the actor. The critic is learned by value-based RL algorithms. For example, if the critic is estimating the state-value function $V^{\pi_\theta}(s)$, then it can be learned by any value function approximation method. Let the critic be a function approximator $V_\phi(s)$ with parameters $\phi$.
The simplest example is TD(1) learning, which trains the critic to minimize the TD(1) error:

$$\delta_i = R_i + \gamma V_\phi(S_{i+1}) - V_\phi(S_i)$$

The critic parameters are updated by gradient descent on half the squared TD error:

$$\phi \leftarrow \phi - \alpha \nabla_\phi \tfrac{1}{2}(\delta_i)^2 = \phi + \alpha \delta_i \nabla_\phi V_\phi(S_i)$$

where $\alpha$ is the learning rate. Note that the gradient is taken with respect to the $\phi$ in $V_\phi(S_i)$ only: the $\phi$ in $\gamma V_\phi(S_{i+1})$ constitutes a moving target and is treated as a constant. This is a common source of error in implementations that use automatic differentiation, which require "stopping the gradient" at that point. Similarly, if the critic is estimating the action-value function $Q^{\pi_\theta}$, then it can be learned by Q-learning or SARSA. In SARSA, the critic maintains an estimate of the Q-function, parameterized by $\phi$, denoted as $Q_\phi(s,a)$. The temporal difference error is then calculated as $\delta_i = R_i + \gamma Q_\phi(S_{i+1}, A_{i+1}) - Q_\phi(S_i, A_i)$. The critic is then updated by

$$\phi \leftarrow \phi + \alpha \delta_i \nabla_\phi Q_\phi(S_i, A_i)$$

The advantage critic can be trained by training both a Q-function $Q_\phi(s,a)$ and a state-value function $V_\phi(s)$, then letting $A_\phi(s,a) = Q_\phi(s,a) - V_\phi(s)$.
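The TD(1) critic update described above can be sketched in plain Python. This is a minimal illustration, assuming a linear critic $V_\phi(s) = \phi \cdot x(s)$ over a feature vector $x(s)$; the function names, the `done` flag for terminal states, and the two-state chain are hypothetical and not part of the original text. Because the bootstrapped target is computed as an ordinary number, no gradient flows through it, which plays the role of "stopping the gradient".

```python
def td_critic_update(phi, feats_s, feats_next, reward,
                     gamma=0.9, alpha=0.1, done=False):
    """One semi-gradient TD update for a linear critic V_phi(s) = phi . x(s)."""
    v_s = sum(p * x for p, x in zip(phi, feats_s))
    # Bootstrapped value of the next state, used as a fixed target
    # (assumption: a terminal next state has value 0).
    v_next = 0.0 if done else sum(p * x for p, x in zip(phi, feats_next))
    delta = reward + gamma * v_next - v_s
    # For a linear critic, the gradient of V_phi(s) w.r.t. phi is x(s).
    new_phi = [p + alpha * delta * x for p, x in zip(phi, feats_s)]
    return new_phi, delta


def demo_two_state_chain(sweeps=500):
    """Hypothetical chain: s0 -> s1 (reward 0), s1 -> terminal (reward 1)."""
    x0, x1, x_end = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
    phi = [0.0, 0.0]
    for _ in range(sweeps):
        phi, _ = td_critic_update(phi, x0, x1, reward=0.0)
        phi, _ = td_critic_update(phi, x1, x_end, reward=1.0, done=True)
    return phi
```

With one-hot features the learned parameters are the state values themselves, so in this toy chain `phi[1]` approaches 1 and `phi[0]` approaches `gamma` times that value.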
In practice, it is more common to train only a state-value function $V_\phi(s)$ and estimate the advantage by

$$A_\phi(S_i, A_i) \approx \sum_{j=0}^{n-1} \gamma^j R_{i+j} + \gamma^n V_\phi(S_{i+n}) - V_\phi(S_i)$$

Here, $n$ is a positive integer. The higher $n$ is, the lower the bias in the advantage estimate, but at the price of higher variance. Generalized Advantage Estimation (GAE) introduces a hyperparameter $\lambda$ that smoothly interpolates between Monte Carlo returns ($\lambda = 1$, high variance, no bias) and 1-step TD learning ($\lambda = 0$, low variance, high bias). This hyperparameter can be adjusted to tune the bias-variance trade-off in advantage estimation. It uses an exponentially decaying average of n-step returns with $\lambda$ being the decay strength. == Variants ==

    • Asynchronous Advantage Actor-Critic (A3C): parallel and asynchronous version of A2C.
    • Soft Actor-Critic (SAC): incorporates entropy maximization for improved exploration.
    • Deep Deterministic Policy Gradient (DDPG): specialized for continuous action spaces.
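The n-step advantage estimate and GAE from the Critic section can be sketched as follows. This is an illustrative sketch under stated assumptions: `rewards` holds the rewards of one finite trajectory of length T, `values` holds T+1 critic outputs (the last one for the state after the final step, 0 if terminal), and GAE is computed with the commonly used backward recursion over 1-step TD errors, which is the finite-trajectory form of the exponentially weighted average. The function names are hypothetical.

```python
def n_step_advantage(rewards, values, i, n, gamma):
    """sum_{j<n} gamma^j R_{i+j} + gamma^n V(S_{i+n}) - V(S_i)."""
    n_step_return = sum(gamma ** j * rewards[i + j] for j in range(n))
    return n_step_return + gamma ** n * values[i + n] - values[i]


def gae(rewards, values, gamma, lam):
    """Advantages via the recursion A_t = delta_t + gamma * lam * A_{t+1}."""
    horizon = len(rewards)
    advantages = [0.0] * horizon
    running = 0.0
    for t in reversed(range(horizon)):
        # 1-step TD error at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]  # last entry: value after the final step
one_step = gae(rewards, values, gamma=0.9, lam=0.0)
full_return = gae(rewards, values, gamma=0.9, lam=1.0)
```

Setting `lam=0.0` reproduces the 1-step estimate at every step, and `lam=1.0` reproduces the estimate that uses all remaining rewards of the trajectory, matching the two endpoints described in the text.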

    Read more →
  • NLWeb

    NLWeb

    Natural Language Web, or NLWeb, was introduced by Microsoft in 2025. It is an open Python project designed to simplify the creation of natural language interfaces for websites. It enables users to query website contents using natural language, similar to interacting with an AI assistant. Every instance functions as a Model Context Protocol (MCP) server, allowing websites to make their content discoverable and accessible to AI agents and other participants. NLWeb leverages existing web standards such as Schema.org and RSS to build conversational capabilities: it processes user queries through language models, performs semantic searches against website content, and generates natural language responses. It is platform-agnostic, running on all major systems and connecting to any vector database. Content to be indexed by NLWeb works best when it is organized in an AI-friendly way, that is, as short, interlinked and semantically annotated articles. Initial adopters of NLWeb include TripAdvisor, Shopify, Eventbrite, and Hearst.

    Read more →
  • Integrated Operations in the High North

    Integrated Operations in the High North

    Integrated Operations in the High North (IOHN, IO High North or IO in the High North) is a collaboration project that, over a four-year period starting in May 2008, is designing, implementing and testing a digital platform for what the upstream oil and gas industry calls the next, or second, generation of Integrated Operations. Work on the digital platform focuses on the capture, transfer and integration of real-time data from remote production installations to decision makers. A risk evaluation across the whole chain is also included. The platform is based on open standards and enables a higher degree of interoperability. Requirements for the digital platform come from use cases defined within three domains: Drilling and Completion; Reservoir and Production; and Operations and Maintenance. The platform will subsequently be demonstrated through pilots within these three domains. The project was a sidecar initiative for Statoil’s Global Operations Data Integration Project. This was part of the ambitious Master Plan IT (MapIT), which also included the Real Time Visualization (RTV) tender. The RTV tender sought an ontology-aware information workspace for a wide range of disciplines, as per the IO Capability Stack. Additionally, the sidecar project aimed to increase semantic web knowledge among suppliers in the industry. This new platform is considered an important enabler for safe and sustainable operations in remote, vulnerable and hazardous areas such as the High North, but the technology is also applicable more generally. The IOHN project consortium consists of 23 participants, including operators, service providers, software vendors, technology providers, research institutions and universities. In addition, the Norwegian Defence Force is working with the project to resolve common infrastructural and interoperability challenges. The project is managed by Det Norske Veritas (DNV).
Nils Sandsmark was the project manager during the initiation and start-up phase. Frédéric Verhelst took over as project manager from the beginning of 2009. Financing comes from the participants and the Research Council of Norway (RCN) for parts of the project (GOICT and AutoConRig). == Participants == The consortium consists of the following 22 participants (in alphabetical order):

    Read more →
  • Interactive activation and competition networks

    Interactive activation and competition networks

    Interactive activation and competition (IAC) networks are artificial neural networks used to model memory and intuitive generalizations. They are made up of nodes or artificial neurons which are arrayed and activated in ways that emulate the behaviors of human memory. The IAC model is used by the parallel distributed processing (PDP) Group and is associated with James L. McClelland and David E. Rumelhart; it is described in detail in their book Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. This model does not contradict any currently known biological data or theories, and its performance is close enough to human performance to warrant further investigation.

    Read more →
  • Commission on Enhancing National Cybersecurity

    Commission on Enhancing National Cybersecurity

    The President's Commission on Enhancing National Cybersecurity is a Presidential Commission formed on April 13, 2016, to develop a plan for protecting cyberspace and America's economic reliance on it. The commission released its final report in December 2016. The report made recommendations regarding the intertwining roles of the military, government administration and the private sector in providing cyber security. Chairman Donilon said of the report that its coverage "is unusual in the breadth of issues" with which it deals. == Recommendations == The report made sixteen major recommendations with fifty-three specific action items, broadly grouped under six areas: protecting the information and digital infrastructure; investing in the secure growth of information and digital infrastructure; consumer information access; building the cybersecurity workforce; building a secure governmental cybersecurity framework; and keeping interconnectivity open, fair, competitive, and secure. The Commission found that strong authentication systems were mandatory for adequate cybersecurity, not just for the government but also for all commercial systems and private individuals. The commission also stressed remote identity proofing and security for the Internet of things (IoT). Finding that technicians who know cybersecurity and can protect systems are in short supply, the commission recommended nationally supported training programs to produce an adequate workforce, as well as increasing the level of expertise in the existing workforce. The Commission highlighted the importance of partnerships between government and the private sector as a powerful tool for encouraging the technology, policies and practices we need to secure and grow the digital economy (page 2).
Some criticised the commission's work as lacking an understanding of cybersecurity and not being cognizant of "cyber reality" and the cost of some of the action items, but others found the report constructive and meaningful. == Commission members == The initial members of the Commission are: Tom Donilon, former Assistant to the President and National Security Advisor (Chair); Sam Palmisano, former CEO of IBM (Vice Chair); General Keith Alexander, CEO of IronNet Cybersecurity, former Director of the National Security Agency and former Commander of U.S. Cyber Command; Annie Antón, Professor and Chair of the School of Interactive Computing at Georgia Tech; Ajay Banga, President and CEO of MasterCard; Steven Chabinsky, General Counsel and Chief Risk Officer of CrowdStrike; Patrick Gallagher, Chancellor of the University of Pittsburgh and former Director of the National Institute of Standards and Technology; Peter Lee, Corporate Vice President, Microsoft Research; Herbert Lin, Senior Research Scholar for Cyber Policy and Security at the Stanford Center for International Security and Cooperation and Research Fellow at the Hoover Institution; Heather Murren, former member of the Financial Crisis Inquiry Commission and co-founder of the Nevada Cancer Institute; Joe Sullivan, Chief Security Officer of Uber and former Chief Security Officer of Facebook; and Maggie Wilderotter, Executive Chairman of Frontier Communications. == Follow-on == Incoming President Trump has indicated that he wants a full review of U.S. cyber protection policy.

    Read more →
  • Open Mind Common Sense

    Open Mind Common Sense

    Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and utilize a large commonsense knowledge base from the contributions of many thousands of people across the Web. It was active from 1999 to 2016. Since its founding, it has accumulated more than a million English facts from over 15,000 contributors in addition to knowledge bases in other languages. Much of OMCS's software is built on three interconnected representations: the natural language corpus that people interact with directly, a semantic network built from this corpus called ConceptNet, and a matrix-based representation of ConceptNet called AnalogySpace that can infer new knowledge using dimensionality reduction. The knowledge collected by Open Mind Common Sense has enabled research projects at MIT and elsewhere. == History == The project was the brainchild of Marvin Minsky, Push Singh, Catherine Havasi, and others. Development work began in September 1999, and the project opened to the Internet a year later. Havasi described it in her dissertation as "an attempt to ... harness some of the distributed human computing power of the Internet, an idea which was then only in its early stages." The original OMCS was influenced by the website Everything2 and its predecessor, and presents a minimalist interface inspired by Google. Push Singh was slated to become a professor at the MIT Media Lab and lead the Common Sense Computing group in 2007, but died by suicide on February 28, 2006. The project is currently run by the Digital Intuition Group at the MIT Media Lab under Catherine Havasi. == Database and website == There are many different types of knowledge in OMCS.
Some statements convey relationships between objects or events, expressed as simple phrases of natural language: some examples include "A coat is used for keeping warm", "The sun is very hot", and "The last thing you do when you cook dinner is wash your dishes". The database also contains information on the emotional content of situations, in such statements as "Spending time with friends causes happiness" and "Getting into a car wreck makes one angry". OMCS contains information on people's desires and goals, both large and small, such as "People want to be respected" and "People want good coffee". Originally, these statements could be entered into the Web site as unconstrained sentences of text, which had to be parsed later. The current version of the Web site collects knowledge only using more structured fill-in-the-blank templates. OMCS also makes use of data collected by the Game With a Purpose "Verbosity". In its native form, the OMCS database is simply a collection of these short sentences that convey some common knowledge. In order to use this knowledge computationally, it has to be transformed into a more structured representation. == ConceptNet == ConceptNet is a semantic network based on the information in the OMCS database. ConceptNet is expressed as a directed graph whose nodes are concepts, and whose edges are assertions of common sense about these concepts. Concepts represent sets of closely related natural language phrases, which could be noun phrases, verb phrases, adjective phrases, or clauses. ConceptNet is created from the natural-language assertions in OMCS by matching them against patterns using a shallow parser. Assertions are expressed as relations between two concepts, selected from a limited set of possible relations. The various relations represent common sentence patterns found in the OMCS corpus, and in particular, every "fill-in-the-blanks" template used on the knowledge-collection Web site is associated with a particular relation. 
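The mapping from template sentences to relations in a directed graph can be illustrated with a short sketch. It is a simplified stand-in, not the project's actual shallow parser: the three regular-expression templates, the relation labels attached to them, and the function names are assumptions chosen to fit the example sentences quoted above.

```python
import re
from collections import defaultdict

# Hypothetical templates: each pattern is tied to exactly one relation label.
TEMPLATES = [
    (re.compile(r"^(?:an? |the )?(.+?) is used for (.+)$", re.I), "UsedFor"),
    (re.compile(r"^(?:an? |the )?(.+?) causes (.+)$", re.I), "Causes"),
    (re.compile(r"^(?:an? |the )?(.+?) want (.+)$", re.I), "Desires"),
]


def parse_assertion(sentence):
    """Return (concept, relation, concept), or None if no template matches."""
    text = sentence.strip().rstrip(".")
    for pattern, relation in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (match.group(1).lower(), relation, match.group(2).lower())
    return None


def build_graph(sentences):
    """Directed graph: concept -> list of (relation, concept) edges."""
    graph = defaultdict(list)
    for sentence in sentences:
        triple = parse_assertion(sentence)
        if triple is not None:
            source, relation, target = triple
            graph[source].append((relation, target))
    return graph


graph = build_graph([
    "A coat is used for keeping warm",
    "Spending time with friends causes happiness",
    "People want to be respected",
    "People want good coffee",
    "The sun is very hot",  # matches no template in this sketch, so it is skipped
])
```

Sentences that fit no template are dropped here, which hints at why free-text entries are harder to use than structured fill-in-the-blank input.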
The data structures that make up ConceptNet were significantly reorganized in 2007, and published as ConceptNet 3. The Software Agents group currently distributes a database and API for the new version 4.0. In 2010, OMCS co-founder and director Catherine Havasi, with Robyn Speer, Dennis Clark and Jason Alonso, created Luminoso, a text analytics software company that builds on ConceptNet. It uses ConceptNet as its primary lexical resource in order to help businesses make sense of and derive insight from vast amounts of qualitative data, including surveys, product reviews and social media. == Machine learning tools == The information in ConceptNet can be used as a basis for machine learning algorithms. One representation, called AnalogySpace, uses singular value decomposition to generalize and represent patterns in the knowledge in ConceptNet, in a way that can be used in AI applications. Its creators distribute a Python machine learning toolkit called Divisi for performing machine learning based on text corpora, structured knowledge bases such as ConceptNet, and combinations of the two. == Comparison to other projects == Other similar projects include Never-Ending Language Learning, Mindpixel (discontinued), Cyc, Learner, SenticNet, Freebase, YAGO, DBpedia, and Open Mind 1001 Questions, which have explored alternative approaches to collecting knowledge and providing incentive for participation. The Open Mind Common Sense project differs from Cyc because it has focused on representing the common sense knowledge it collected as English sentences, rather than using a formal logical structure. ConceptNet is described by one of its creators, Hugo Liu, as being structured more like WordNet than Cyc, due to its "emphasis on informal conceptual-connectedness over formal linguistic-rigor".
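The idea behind AnalogySpace, using singular value decomposition to generalize from known assertions, can be sketched on a toy concept-by-feature matrix. This is a deliberately reduced illustration: it keeps only the single largest singular component, computed by power iteration in plain Python, whereas a real system would keep several components and use a numerical library. The concepts, features, matrix entries and function names are invented for the example.

```python
import math


def top_singular_triplet(matrix, iterations=100):
    """Power iteration for the largest singular value and its two vectors."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v, normalized to unit length.
        u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v = A^T u; its length converges to the largest singular value.
        v = [sum(matrix[r][c] * u[r] for r in range(rows)) for c in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return u, sigma, v


def rank_one_scores(matrix):
    """Rank-1 reconstruction: smoothed scores for every concept/feature pair."""
    u, sigma, v = top_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]


concepts = ["dog", "cat", "wolf"]
features = ["IsA animal", "HasA fur", "HasA tail"]
known = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],  # "cat HasA tail" was never entered
    [1.0, 1.0, 1.0],
]
scores = rank_one_scores(known)
```

In the reconstruction, the missing "cat HasA tail" entry receives a positive score because cats share their other features with dogs and wolves, which is the kind of inference by analogy that dimensionality reduction makes possible.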

    Read more →
  • Semantic parameterization

    Semantic parameterization

    Semantic parameterization is a conceptual modeling process for expressing natural language descriptions of a domain in first-order predicate logic. The process yields a formalization of natural language sentences in Description Logic to answer the who, what and where questions in the Inquiry-Cycle Model (ICM) developed by Colin Potts and his colleagues at the Georgia Institute of Technology. The parameterization process complements the Knowledge Acquisition and autOmated Specification (KAOS) method, which formalizes answers to the when, why and how ICM questions in Temporal Logic, to complete the ICM formalization. The artifacts used in the parameterization process include a dictionary that aligns the domain lexicon with unique concepts, distinguishing between synonyms and polysemes, and several natural language patterns that aid in mapping common domain descriptions to formal specifications. == Relationship to other theories == Semantic Parameterization defines a meta-model consisting of eight roles that are domain-independent and reusable. Seven of these roles correspond to Jeffrey Gruber's thematic relations and case roles in Charles Fillmore's case grammar. The Inquiry-Cycle Model (ICM) was introduced to drive elicitation between engineers and stakeholders in requirements engineering. The ICM consists of who, what, where, why, how and when questions. All but the when questions, which require a Temporal Logic to represent such phenomena, have been aligned with the meta-model in semantic parameterization using Description Logic (DL).
== Introduction with Example == The semantic parameterization process is based on Description Logic, wherein the TBox is composed of words in a dictionary, including nouns, verbs, and adjectives, and the ABox is partitioned into two sets of assertions: 1) those assertions that come from words in the natural language statement, called the grounding, and 2) those assertions that are inferred by the (human) modeler, called the meta-model. Consider the following unstructured natural language statement (UNLS) (see Breaux et al. for an extended discussion): UNLS1.0: The customer_{1,1} must not share_{2,2} the access-code_{3,3} of the customer_{1,1} with someone_{4,4} who is not the provider_{5,4}. The modeler first identifies intensional and extensional polysemes and synonyms, denoted by the subscripts. The first subscript is the intensional index: the same first index in two or more words refers to the same concept in the TBox. The second subscript is the extensional index: the same second index in two or more words refers to the same individual in the ABox. This indexing step aligns words in the statement and concepts in the dictionary. Next, the modeler identifies concepts from the dictionary to compose the meta-model. The following table illustrates the complete DL expression that results from applying semantic parameterization.

    Read more →