AI Chatbot Zoom

AI Chatbot Zoom — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Record sealing

    Record sealing

    Record sealing is the process of making public records inaccessible to the public. In many cases, a person with a sealed record gains the legal right to deny or not acknowledge anything to do with the arrest and the legal proceedings from the case itself. Records are commonly sealed in a number of situations: Sealed birth records (typically after adoption or determination of paternity) Juvenile criminal records may be sealed Other types of cases involving juveniles may be sealed, anonymized, or pseudonymized ("impounded"); e.g., child sex offense or custody cases Cases using witness protection information may be partly sealed Cases involving trade secrets Cases involving state secrets == Filing under seal in US court == Normally, records should not be filed under seal without a court permission. However, FRCP 5.2 requires that sensitive text – like Social Security number, Taxpayer Identification Number, birthday, bank accounts, and children’s names – should be redacted off the filings made with the court and accompanying exhibits. A person making a redacted filing can file an unredacted copy under seal, or the Court can choose to order later that an additional filing be made under seal without redaction. Alternately, the filing party may ask the court’s permission to file some exhibits completely under seal. When the document is filed "under seal", it should have a clear indication for the court clerk to file it separately – most often by stamping words "Filed Under Seal" on the bottom of each page. Person making filing should also provide instructions to the court clerk that the document needs to be filed "under seal". Courts often have specific requirements to these filings in their Local Rules. == Difference from expungement == Expungement, which is a physical destruction, namely a complete erasure of one's criminal records, and therefore usually carries a higher standard, differs from record sealing, which is only to restrict the public's access to records, so that only certain law enforcement agencies or courts, under special circumstances, will have access to them. A record seal will greatly improve the chance of employment, as employers will not have access to damning records. There are occasions, like expungement, where one can truthfully state under oath that they have never been convicted before. Most of the time, a record seal has more relaxed requirements than an expungement. If an expungement is not allowed with a case, then sealing a record may be the best bet. Different states have different terms for what constitutes sealing of a record. == Cybersecurity incidents involving sealed records == Several cybersecurity incidents have demonstrated that sealed court documents are not always secure in practice, with vulnerabilities and data breaches exposing sensitive information. In January 2021, following the SolarWinds cyber attack, the U.S. Bankruptcy Court United States District Court for the District of Nevada announced that its Case Management/Electronic Case Files CM/ECF system had been potentially compromised. The judiciary stated that additional safeguards were being implemented to protect filings, and that the review of the incident and its impact was ongoing. Reports noted that the breach raised concerns about exposure of highly sensitive and sealed documents submitted through the CM/ECF system. In 2023, security researcher Jason Parker, following a tip from an activist, identified flaws in online court systems that exposed sealed records including confidential testimony and medical records through publicly accessible portals. In 2024, a cyber intrusion targeting attorneys in a civil case involving Representative Matt Gaetz led to the unauthorized access and leak of sealed depositions and related records. The breach exposed confidential testimony and financial records, some of which were later reported by news outlets, raising concerns about the security of electronically stored legal materials and the handling of sealed filings. In 2025, multiple reports confirmed that the federal judiciary's CM/ECF and PACER (law) filing system was compromised, exposing sealed indictments, confidential informant information, and other sensitive filings. Some courts temporarily reverted to paper-based filing to mitigate the risks of further disclosure. The FBI later confirmed that the breach had exposed sealed records, and investigators suspected foreign state actors were involved. == GAO publications referencing sealed records == Closed Criminal Plea and Sentencing Proceedings (1983) – Reviewed Department of Justice policies on closing plea and sentencing hearings. GAO noted that sealed transcripts should be unsealed once the reasons for closure no longer applied. Information on Plea Agreements and Settlements in Defense Procurement Fraud Cases (1992) – Examined outcomes of procurement fraud prosecutions. GAO observed that in some instances the results were sealed from public access. Military Recruiting: More Needs to Be Done to Better Screen Applicants and Detect Fraud (1999) – Investigated fraudulent enlistments in the armed forces. The report highlighted that sealed juvenile records often prevented recruiters from discovering prior offenses. Social Security Numbers: Governments Could Do More to Reduce Display in Public Records (2004) – Analyzed risks associated with SSN availability in state and local records. GAO pointed out that some categories of records, such as adoption proceedings, were sealed and less likely to expose identifiers. Social Security Numbers: Stronger Safeguards Needed to Protect Privacy (2005 testimony) – Testimony before Congress reiterating concerns over SSN exposure in public records, while noting that sealed categories (e.g., adoption) were exceptions. U.S. Supreme Court: Policies and Perspectives on Video and Audio Coverage of Appellate Court Proceedings (2016) – Surveyed appellate court policies on courtroom media coverage. The report acknowledged distinctions between public filings, confidential submissions, and sealed materials. Evictions: National Data Are Limited and Challenging to Collect (2024) – Examined nationwide eviction data. GAO reported that in some states eviction records may be sealed or expunged, limiting researchers' ability to compile datasets. DOD Fraud Risk Management: Enhanced Data and Collaboration Could Improve Efforts (2024) – Reviewed Department of Defense fraud-risk management. GAO noted that some adjudicative records in its dataset were sealed, restricting completeness of oversight data.

    Read more →
  • INDIAai

    INDIAai

    INDIAai is a web portal launched by the Government of India on 07 March 2024 for artificial intelligence-related developments in India. It is known as the National AI Portal of India, which was jointly started by the Ministry of Electronics and Information Technology (MeitY), the National e-Governance Division (NeGD) and the National Association of Software and Service Companies (NASSCOM) with support from the Department of School Education and Literacy (DoSE&L) and Ministry of Human Resource Development. == History == The portal was launched on 30 May 2020, by Ravi Shankar Prasad, the Union Minister for Electronics and IT, Law and Justice and Communications, on the first anniversary of the second tenure of Prime Minister Narendra Modi-led government. A national program for the youth, 'Responsible AI for Youth', was also launched on the same day. As of 2022, the website was visited by more than 4.5 lakh users with 1.2 million page views. It has 1151 articles on artificial intelligence, 701 news stories, 98 reports, 95 case studies and 213 videos on its portal. It maintains a database on AI ecosystem of India featuring 121 government initiatives and 281 startups. In May 2022, INDIAai released a book titled 'AI for Everyone' that covers the basics of AI. Cabinet chaired by the Prime Minister Narendra Modi has approved the comprehensive national-level IndiaAI mission with a budget outlay of Rs.10,371.92 crore. The Mission will be implemented by ‘IndiaAI’ Independent Business Division (IBD) under Digital India Corporation (DIC). == Objective and features == It aims to function as a one-stop portal for all AI-related development in India. The platform publishes resources such as articles, news, interviews, and investment funding news and events for AI startups, AI companies, and educational firms related to artificial intelligence in India. It also distributes documents, case studies, and research reports. Additionally, the platform provides education and employment opportunities related to AI. It offers AI courses, both free and paid.

    Read more →
  • Granular computing

    Granular computing

    Granular computing is an emerging computing paradigm of information processing that concerns the processing of complex information entities called "information granules", which arise in the process of data abstraction and derivation of knowledge from information or data. Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency, indistinguishability, coherency, or the like. At present, granular computing is more a theoretical perspective than a coherent set of methods or principles. As a theoretical perspective, it encourages an approach to data that recognizes and exploits the knowledge present in data at various levels of resolution or scales. In this sense, it encompasses all methods which provide flexibility and adaptability in the resolution at which knowledge or information is extracted and represented. == Types of granulation == As mentioned above, granular computing is not an algorithm or process; there is no particular method that is called "granular computing". It is rather an approach to looking at data that recognizes how different and interesting regularities in the data can appear at different levels of granularity, much as different features become salient in satellite images of greater or lesser resolution. On a low-resolution satellite image, for example, one might notice interesting cloud patterns representing cyclones or other large-scale weather phenomena, while in a higher-resolution image, one misses these large-scale atmospheric phenomena but instead notices smaller-scale phenomena, such as the interesting pattern that is the streets of Manhattan. The same is generally true of all data: At different resolutions or granularities, different features and relationships emerge. The aim of granular computing is to try to take advantage of this fact in designing more effective machine-learning and reasoning systems. There are several types of granularity that are often encountered in data mining and machine learning, and we review them below: === Value granulation (discretization/quantization) === One type of granulation is the quantization of variables. It is very common that in data mining or machine-learning applications the resolution of variables needs to be decreased in order to extract meaningful regularities. An example of this would be a variable such as "outside temperature" (temp), which in a given application might be recorded to several decimal places of precision (depending on the sensing apparatus). However, for purposes of extracting relationships between "outside temperature" and, say, "number of health-club applications" (club), it will generally be advantageous to quantize "outside temperature" into a smaller number of intervals. ==== Motivations ==== There are several interrelated reasons for granulating variables in this fashion: Based on prior domain knowledge, there is no expectation that minute variations in temperature (e.g., the difference between 80–80.7 °F (26.7–27.1 °C)) could have an influence on behaviors driving the number of health-club applications. For this reason, any "regularity" which our learning algorithms might detect at this level of resolution would have to be spurious, as an artifact of overfitting. By coarsening the temperature variable into intervals the difference between which we do anticipate (based on prior domain knowledge) might influence number of health-club applications, we eliminate the possibility of detecting these spurious patterns. Thus, in this case, reducing resolution is a method of controlling overfitting. By reducing the number of intervals in the temperature variable (i.e., increasing its grain size), we increase the amount of sample data indexed by each interval designation. Thus, by coarsening the variable, we increase sample sizes and achieve better statistical estimation. In this sense, increasing granularity provides an antidote to the so-called curse of dimensionality, which relates to the exponential decrease in statistical power with increase in number of dimensions or variable cardinality. Independent of prior domain knowledge, it is often the case that meaningful regularities (i.e., which can be detected by a given learning methodology, representational language, etc.) may exist at one level of resolution and not at another. For example, a simple learner or pattern recognition system may seek to extract regularities satisfying a conditional probability threshold such as p ( Y = y j | X = x i ) ≥ α . {\displaystyle p(Y=y_{j}|X=x_{i})\geq \alpha .} In the special case where α = 1 , {\displaystyle \alpha =1,} this recognition system is essentially detecting logical implication of the form X = x i → Y = y j {\displaystyle X=x_{i}\rightarrow Y=y_{j}} or, in words, "if X = x i , {\displaystyle X=x_{i},} then Y = y j {\displaystyle Y=y_{j}} ". The system's ability to recognize such implications (or, in general, conditional probabilities exceeding threshold) is partially contingent on the resolution with which the system analyzes the variables. As an example of this last point, consider the feature space shown to the right. The variables may each be regarded at two different resolutions. Variable X {\displaystyle X} may be regarded at a high (quaternary) resolution wherein it takes on the four values { x 1 , x 2 , x 3 , x 4 } {\displaystyle \{x_{1},x_{2},x_{3},x_{4}\}} or at a lower (binary) resolution wherein it takes on the two values { X 1 , X 2 } . {\displaystyle \{X_{1},X_{2}\}.} Similarly, variable Y {\displaystyle Y} may be regarded at a high (quaternary) resolution or at a lower (binary) resolution, where it takes on the values { y 1 , y 2 , y 3 , y 4 } {\displaystyle \{y_{1},y_{2},y_{3},y_{4}\}} or { Y 1 , Y 2 } , {\displaystyle \{Y_{1},Y_{2}\},} respectively. At the high resolution, there are no detectable implications of the form X = x i → Y = y j , {\displaystyle X=x_{i}\rightarrow Y=y_{j},} since every x i {\displaystyle x_{i}} is associated with more than one y j , {\displaystyle y_{j},} and thus, for all x i , {\displaystyle x_{i},} p ( Y = y j | X = x i ) < 1. {\displaystyle p(Y=y_{j}|X=x_{i})<1.} However, at the low (binary) variable resolution, two bilateral implications become detectable: X = X 1 ↔ Y = Y 1 {\displaystyle X=X_{1}\leftrightarrow Y=Y_{1}} and X = X 2 ↔ Y = Y 2 {\displaystyle X=X_{2}\leftrightarrow Y=Y_{2}} , since every X 1 {\displaystyle X_{1}} occurs iff Y 1 {\displaystyle Y_{1}} and X 2 {\displaystyle X_{2}} occurs iff Y 2 . {\displaystyle Y_{2}.} Thus, a pattern recognition system scanning for implications of this kind would find them at the binary variable resolution, but would fail to find them at the higher quaternary variable resolution. ==== Issues and methods ==== It is not feasible to exhaustively test all possible discretization resolutions on all variables in order to see which combination of resolutions yields interesting or significant results. Instead, the feature space must be preprocessed (often by an entropy analysis of some kind) so that some guidance can be given as to how the discretization process should proceed. Moreover, one cannot generally achieve good results by naively analyzing and discretizing each variable independently, since this may obliterate the very interactions that we had hoped to discover. A sample of papers that address the problem of variable discretization in general, and multiple-variable discretization in particular, is as follows: Chiu, Wong & Cheung (1991), Bay (2001), Liu et al. (2002), Wang & Liu (1998), Zighed, Rabaséda & Rakotomalala (1998), Catlett (1991), Dougherty, Kohavi & Sahami (1995), Monti & Cooper (1999), Fayyad & Irani (1993), Chiu, Cheung & Wong (1990), Nguyen & Nguyen (1998), Grzymala-Busse & Stefanowski (2001), Ting (1994), Ludl & Widmer (2000), Pfahringer (1995), An & Cercone (1999), Chiu & Cheung (1989), Chmielewski & Grzymala-Busse (1996), Lee & Shin (1994), Liu & Wellman (2002), Liu & Wellman (2004). === Variable granulation (clustering/aggregation/transformation) === Variable granulation is a term that could describe a variety of techniques, most of which are aimed at reducing dimensionality, redundancy, and storage requirements. We briefly describe some of the ideas here, and present pointers to the literature. ==== Variable transformation ==== A number of classical methods, such as principal component analysis, multidimensional scaling, factor analysis, and structural equation modeling, and their relatives, fall under the genus of "variable transformation." Also in this category are more modern areas of study such as dimensionality reduction, projection pursuit, and independent component analysis. The common goal of these methods in general is to find a representation of the data in terms of new variables, which are a linear or nonlinear transformation of the original variables, and in which important stati

    Read more →
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the H ( W H , x ) {\displaystyle H(W_{H},x)} gate: the transform gate T ( W T , x ) {\displaystyle T(W_{T},x)} and the carry gate C ( W C , x ) {\displaystyle C(W_{C},x)} . The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function H {\displaystyle H} can be any desired transfer function. The carry gate is defined as: C ( W C , x ) = 1 − T ( W T , x ) {\displaystyle C(W_{C},x)=1-T(W_{T},x)} while the transform gate is just a gate with a sigmoid transfer function. == Structure == The structure of a hidden layer in the Highway Network follows the equation: y = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ C ( x , W C ) = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ ( 1 − T ( x , W T ) ) {\displaystyle {\begin{aligned}y=H(x,W_{H})\cdot T(x,W_{T})+x\cdot C(x,W_{C})\\=H(x,W_{H})\cdot T(x,W_{T})+x\cdot (1-T(x,W_{T}))\end{aligned}}} == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and attributed to it the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute y t + 1 = F ( x t ) + x t {\textstyle y_{t+1}=F(x_{t})+x_{t}} . During backpropagation through time, this becomes the residual formula y = F ( x ) + x {\textstyle y=F(x)+x} for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988) which has the form x ↦ F ( x ) + A x {\displaystyle x\mapsto F(x)+Ax} . Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20, 50, and 100 layers networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20 layers counterpart (on the MNIST dataset, Figure 1 in ). No improvement on test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in ). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in ). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.

    Read more →
  • TIMIT

    TIMIT

    TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and corpus design was a joint effort between the Massachusetts Institute of Technology, SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publishing by the National Institute of Standards and Technology (NIST). There is also a telephone bandwidth version called NTIMIT (Network TIMIT). TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. == Data == TIMIT contains ~5 hours of speech, of 10 sentences spoken by each of 630 speakers. The sentences were randomly sampled from a corpus of 2342 sentences. The speakers were native speakers of American English, classified under 8 major dialect regions: New England, Northern, North Midland, South Midland, Southern, New York City, Western, Army Brat (moved around). The speakers were 70% male and 30% female. Recordings were made in a noise-isolated recording booth at Texas Instrument, using a semi-automatic computer system (STEROIDS) to control the presentation of prompts to the speaker and the recording. Two-channel recordings were made using a Sennheiser HMD 414 headset-mounted microphone and a Brüel & Kjær 1/2" far-field pressure microphone (#4165). The speech was digitized at a sample rate of 20 kHz then and downsampled to 16 kHz. == History == The TIMIT telephone corpus was an early attempt to create a database with speech samples. It was published in the year 1988 on CD-ROM and consists of only 10 sentences per speaker. Two 'dialect' sentences were read by each speaker, as well as another 8 sentences selected from a larger set Each sentence averages 3 seconds long and is spoken by 630 different speakers. It was the first notable attempt in creating and distributing a speech corpus and the overall project has produced costs of 1.5 million US$. An update was released in October 1990. It included full 630-speaker corpus; checked and corrected transcriptions; word-alignment transcriptions; NIST SPHERE-headered waveform files and header manipulation software; phonemic dictionary; new test and training subsets balanced for dialectal and phonetic coverage; more extensive documentation. The full name of the project is DARPA-TIMIT Acoustic-Phonetic Continuous Speech Corpus and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The main reason why a corpus of telephone speech was created was to train speech recognition software. In the Blizzard challenge, different software has the obligation to convert audio recordings into textual data and the TIMIT corpus was used as a standardized baseline.

    Read more →
  • Compute (machine learning)

    Compute (machine learning)

    In machine learning and deep learning, compute is the amount of computing power or computational resources required to train machine learning models and large language models. More broadly, compute is the computational power or resources necessary for a computer or computer program to function. == Definition == Compute is commonly defined as the amount of computing power or computational resources required to train machine learning and large language models. The term "compute" has also been more broadly applied to cloud computing, referencing processing power, memory, networking, storage, and other resources required for the computation of any program. Compute is measured in petaflop/s-days and is used to document AI training. A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kilowatt-hour for energy. An amount of compute is meant to give an idea of the number of actual operations performed. == History == In a 2018 analysis titled "AI and compute", artificial intelligence company OpenAI introduced the concept of compute. OpenAI identified two eras of training AI systems in terms of compute-usage. From 1959 to 2012, compute roughly followed Moore’s law. Between 2012 and 2018, the amount of compute used in the largest AI training runs increased exponentially, growing by more than 300,000 times — roughly doubling every 3.4 months. By comparison, Moore’s Law doubled every two years over the same period. One of the largest models, released in 2020, used 600,000 times more computing power than the 2012 model. After 2020, compute growth began to slow down, with the compute needed for the largest AI models continuing to slow down in 2023. The notion of compute has become increasingly used from the mid-2020s onwards. == Compute growth and AI progress == Larger AI models trained on more data and using more computational resources, tend to perform better. This happens even if the algorithms themselves remain unchanged. As early as 2018, OpenAI noted the exponential increase in compute to be have a key role in AI progress. OpenAI considers three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. AI models with more compute not only improve in the tasks they were trained on but can develop emergent abilities. Incremental improvements can lead to more abrupt leaps in capabilities. AI provider SpaceXAI said in 2026 that their AI progress is driven by compute and used it a key metric in the AI training of its supercomputer Colossus, the which contains 1 million GPUs. Anthropic has a contract of $1.25 billion per month with SpaceXAI to buy all the compute capacity at Colossus 1 data center. === Criticism and policy === Increasing, promoting or constraining progress in artificial intelligence has often be done via controlling the amount of compute. Policymarkers have enacted policies and provided support to make compute resources more accessible to domestic AI researchers. In a January 2022 report, the Center for Security and Emerging Technology (CSET) suggested to institutions that increasingly powerful and generalizable AI (AGI) will likely require other strategies than maximizing compute. Some AI researchers are also concerned that government might exclusively focus on scaling compute instead of other strategies. The CSET has reported on the various bottlenecks which could explain why deep learning needs for compute have slow down: training is expensive and training extremely large models generates traffic jams across many processors that are difficult to manage. there is a limited supply of AI chips (see AI chip memory shortage). CSET advances that the main resource is human capital, specifically talented researchers — according to a 2023 published survey of more than 400 AI researchers, academic and private sector workers. The survey found that AI researchers are not primarily or exclusively constrained by compute access. However, both academic and industry AI researchers equally report concerns that insufficient compute could prevent them from contributing meaningfully to AI research in the future. High compute users are more concerned about compute access. When asked about which resource provided by the government would be the most useful to them, some AI researchers select compute, other prefer grant funding. For this goal, CSET advised policymakers to ensure that even researchers with smaller budgets could effectively contribute to AI research. Other proposed strategies include using contemporary AI algorithms, managing modern AI infrastructure or focusing on interdisciplinary work between the AI field and other fields of computer science. A 2024 study on compute access found that academic-only AI research teams often have less compute intensive research topics, especially foundation models, compared to industry AI labs. As a consequence, academia is likely to play a smaller role in advancing such techniques. The researchers suggest nationally-sponsored computing infrastructure as well as open science initiatives to boost academic compute access. === Data === A 2022 study found that current large language models are significantly under-trained, a consequence of focusing on scaling language models whilst keeping the amount of training data constant. By training over 400 language models of various parameter and token size, they found that "for compute-optimal training", the model size and the number of training tokens should ideally be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

    Read more →
  • Embodied agent

    Embodied agent

    In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents (usually with a graphical front-end as opposed to a robotic body) that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do (such as gesture, facial expression, and so forth). == Embodied conversational agents == Embodied conversational agents are a form of intelligent user interface. Graphically embodied agents aim to unite gesture, facial expression and speech to enable face-to-face communication with users, providing a powerful means of human-computer interaction. == Advantages == Face-to-face communication allows communication protocols that give a much richer communication channel than other means of communicating. It enables pragmatic communication acts such as conversational turn-taking, facial expression of emotions, information structure and emphasis, visualization and iconic gestures, and orientation in a three-dimensional environment. This communication takes place through both verbal and non-verbal channels such as gaze, gesture, spoken intonation and body posture. Research has found that users prefer a non-verbal visual indication of an embodied system's internal state to a verbal indication, demonstrating the value of additional non-verbal communication channels. As well as this, the face-to-face communication involved in interacting with an embodied agent can be conducted alongside another task without distracting the human participants, instead improving the enjoyment of such an interaction. Furthermore, the use of an embodied presentation agent results in improved recall of the presented information. Embodied agents also provide a social dimension to the interaction. Humans willingly ascribe social awareness to computers, and thus interaction with embodied agents follows social conventions, similar to human to human interactions. This social interaction both raises the believably and perceived trustworthiness of agents, and increases the user's engagement with the system. Rickenberg and Reeves found that the presence of an embodied agent on a website increased the level of user trust in that website, but also increased users' anxiety and affected their performance, as if they were being watched by a real human. Another effect of the social aspect of agents is that presentations given by an embodied agent are perceived as being more entertaining and less difficult than similar presentations given without an agent. Research shows that perceived enjoyment, followed by perceived usefulness and ease of use, is the major factor influencing user adoption of embodied agents. A study in January 2004 by Byron Reeves at Stanford demonstrated how digital characters could "enhance online experiences" through explaining how virtual characters essentially add a sense of familiarity to the user experience and make it more approachable. This increase in likability in turn helps make the products better, which benefits both the end users and those creating the product. === Applications === The rich style of communication that characterizes human conversation makes conversational interaction with embodied conversational agents ideal for many non-traditional interaction tasks. A familiar application of graphically embodied agents is computer games; embodied agents are ideal for this setting because the richer communication style makes interacting with the agent enjoyable. Embodied conversational agents have also been used in virtual training environments, portable personal navigation guides, interactive fiction and storytelling systems, interactive online characters and automated presenters and commentators. Major virtual assistants like Siri, Amazon Alexa and Google Assistant do not come with any visual embodied representation, which is believed to limit the sense of human presence by users. The U.S. Department of Defense utilizes a software agent called SGT STAR on U.S. Army-run Web sites and Web applications for site navigation, recruitment and propaganda purposes. Sgt. Star is run by the Army Marketing and Research Group, a division operated directly from The Pentagon. Sgt. Star is based upon the ActiveSentry technology developed by Next IT, a Washington-based information technology services company. Other such bots in the Sgt. Star "family" are utilized by the Federal Bureau of Investigation and the Central Intelligence Agency for intelligence gathering purposes.

    Read more →
  • Cross-entropy method

    Cross-entropy method

    The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective. The method approximates the optimal importance sampling estimator by repeating two phases: Draw a sample from a probability distribution. Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration. Reuven Rubinstein developed the method in the context of rare-event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems. == Estimation via importance sampling == Consider the general problem of estimating the quantity ℓ = E u [ H ( X ) ] = ∫ H ( x ) f ( x ; u ) d x {\displaystyle \ell =\mathbb {E} _{\mathbf {u} }[H(\mathbf {X} )]=\int H(\mathbf {x} )\,f(\mathbf {x} ;\mathbf {u} )\,{\textrm {d}}\mathbf {x} } , where H {\displaystyle H} is some performance function and f ( x ; u ) {\displaystyle f(\mathbf {x} ;\mathbf {u} )} is a member of some parametric family of distributions. Using importance sampling this quantity can be estimated as ℓ ^ = 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) g ( X i ) {\displaystyle {\hat {\ell }}={\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{g(\mathbf {X} _{i})}}} , where X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} is a random sample from g {\displaystyle g\,} . For positive H {\displaystyle H} , the theoretically optimal importance sampling density (PDF) is given by g ∗ ( x ) = H ( x ) f ( x ; u ) / ℓ {\displaystyle g^{}(\mathbf {x} )=H(\mathbf {x} )f(\mathbf {x} ;\mathbf {u} )/\ell } . This, however, depends on the unknown ℓ {\displaystyle \ell } . The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF g ∗ {\displaystyle g^{}} . == Generic CE algorithm == Choose initial parameter vector v ( 0 ) {\displaystyle \mathbf {v} ^{(0)}} ; set t = 1. Generate a random sample X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} from f ( ⋅ ; v ( t − 1 ) ) {\displaystyle f(\cdot ;\mathbf {v} ^{(t-1)})} Solve for v ( t ) {\displaystyle \mathbf {v} ^{(t)}} , where v ( t ) = argmax v ⁡ 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) f ( X i ; v ( t − 1 ) ) log ⁡ f ( X i ; v ) {\displaystyle \mathbf {v} ^{(t)}=\mathop {\textrm {argmax}} _{\mathbf {v} }{\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})}}\log f(\mathbf {X} _{i};\mathbf {v} )} If convergence is reached then stop; otherwise, increase t by 1 and reiterate from step 2. In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are When f {\displaystyle f\,} belongs to the natural exponential family When f {\displaystyle f\,} is discrete with finite support When H ( X ) = I { x ∈ A } {\displaystyle H(\mathbf {X} )=\mathrm {I} _{\{\mathbf {x} \in A\}}} and f ( X i ; u ) = f ( X i ; v ( t − 1 ) ) {\displaystyle f(\mathbf {X} _{i};\mathbf {u} )=f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})} , then v ( t ) {\displaystyle \mathbf {v} ^{(t)}} corresponds to the maximum likelihood estimator based on those X k ∈ A {\displaystyle \mathbf {X} _{k}\in A} . == Continuous optimization—example == The same CE algorithm can be used for optimization, rather than estimation. Suppose the problem is to maximize some function S {\displaystyle S} , for example, S ( x ) = e − ( x − 2 ) 2 + 0.8 e − ( x + 2 ) 2 {\displaystyle S(x)={\textrm {e}}^{-(x-2)^{2}}+0.8\,{\textrm {e}}^{-(x+2)^{2}}} . To apply CE, one considers first the associated stochastic problem of estimating P θ ( S ( X ) ≥ γ ) {\displaystyle \mathbb {P} _{\boldsymbol {\theta }}(S(X)\geq \gamma )} for a given level γ {\displaystyle \gamma \,} , and parametric family { f ( ⋅ ; θ ) } {\displaystyle \left\{f(\cdot ;{\boldsymbol {\theta }})\right\}} , for example the 1-dimensional Gaussian distribution, parameterized by its mean μ t {\displaystyle \mu _{t}\,} and variance σ t 2 {\displaystyle \sigma _{t}^{2}} (so θ = ( μ , σ 2 ) {\displaystyle {\boldsymbol {\theta }}=(\mu ,\sigma ^{2})} here). Hence, for a given γ {\displaystyle \gamma \,} , the goal is to find θ {\displaystyle {\boldsymbol {\theta }}} so that D K L ( I { S ( x ) ≥ γ } ‖ f θ ) {\displaystyle D_{\mathrm {KL} }({\textrm {I}}_{\{S(x)\geq \gamma \}}\|f_{\boldsymbol {\theta }})} is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance corresponding to the elite samples, which are those samples that have objective function value ≥ γ {\displaystyle \geq \gamma } . The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm. === Pseudocode === // Initialize parameters μ := −6 σ2 := 100 t := 0 maxits := 100 N := 100 Ne := 10 // While maxits not exceeded and not converged while t < maxits and σ2 > ε do // Obtain N samples from current sampling distribution X := SampleGaussian(μ, σ2, N) // Evaluate objective function at sampled points S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2) // Sort X by objective function values in descending order X := sort(X, S) // Update parameters of sampling distribution via elite samples μ := mean(X(1:Ne)) σ2 := var(X(1:Ne)) t := t + 1 // Return mean of final sampling distribution as solution return μ == Related methods == Simulated annealing Genetic algorithms Harmony search Estimation of distribution algorithm Tabu search Natural Evolution Strategy Ant colony optimization algorithms

    Read more →
  • FMLLR

    FMLLR

    In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

    Read more →
  • Virtual intelligence

    Virtual intelligence

    Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

    Read more →
  • Incremental heuristic search

    Incremental heuristic search

    Incremental heuristic search algorithms combine both incremental and heuristic search to speed up searches of sequences of similar search problems, which is important in domains that are only incompletely known or change dynamically. Incremental search has been studied at least since the late 1960s. Incremental search algorithms reuse information from previous searches to speed up the current search and solve search problems potentially much faster than solving them repeatedly from scratch. Similarly, heuristic search has also been studied at least since the late 1960s. Heuristic search algorithms, often based on A, use heuristic knowledge in the form of approximations of the goal distances to focus the search and solve search problems potentially much faster than uninformed search algorithms. The resulting search problems, sometimes called dynamic path planning problems, are graph search problems where paths have to be found repeatedly because the topology of the graph, its edge costs, the start vertex or the goal vertices change over time. So far, three main classes of incremental heuristic search algorithms have been developed: The first class restarts A at the point where its current search deviates from the previous one (example: Fringe Saving A). The second class updates the h-values (heuristic, i.e. approximate distance to goal) from the previous search during the current search to make them more informed (example: Generalized Adaptive A). The third class updates the g-values (distance from start) from the previous search during the current search to correct them when necessary, which can be interpreted as transforming the A search tree from the previous search into the A search tree for the current search (examples: Lifelong Planning A, D, D Lite). All three classes of incremental heuristic search algorithms are different from other replanning algorithms, such as planning by analogy, in that their plan quality does not deteriorate with the number of replanning episodes. == Applications == Incremental heuristic search has been extensively used in robotics, where a larger number of path planning systems are based on either D (typically earlier systems) or D Lite (current systems), two different incremental heuristic search algorithms.

    Read more →
  • AI agent

    AI agent

    In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools. == Overview == AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems. A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, with examples including the Model Context Protocol, Gibberlink, and many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively. == History == AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024. == Training and testing == Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky as well as replicas of company websites, have also been used for training such agents. == Autonomous capabilities == The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical. == Cognitive architecture == The following are some internal design options for reasoning within an agent: Retrieval-augmented generation ReAct (Reason + Act) pattern is an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps. Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache. A tool/agent registry, for organizing software functions or other agents that the agent can use. One-shot model querying, which queries the model once to create the plan of action. === Reference architecture === Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it: Layer 1: Foundation models - provide the core AI engines to power agent capabilities. Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including Vector database, data loaders, RAG. Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of the AI agents. Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents. Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents. Layer 6: Security and compliance - a crucial protective framework ensuring AI agents operate safely, securely, and conform to regulatory boundaries. At this layer security and compliance features embedded into all the AI agent stack layers are integrated together. Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users. == Orchestration patterns == To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following: Prompt chaining: A sequence where the output of one step serves as the input for the next. Routing: The classification of an input to direct it to a specialized downstream task or tool. Parallelization: The simultaneous execution of multiple tasks. Sequential processing: A fixed, linear progression of tasks through a predefined pipeline. Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement. == Multimodal AI agents == In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots. == Applications == As of April 2025, per the Associated Press, there are few real-world applications of AI agents. As of June 2025, per Fortune, many companies are primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents have been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, by October 2025, noting a decline in expectations, The Information noted AI coding agents and customer support as the primary use cases by businesses. In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment. === Applications in government === Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r

    Read more →
  • Flat-field correction

    Flat-field correction

    Flat-field correction (FFC) is a digital imaging technique to mitigate pixel-to-pixel differences in the photodetector sensitivity and distortions in the optical path. It is a standard calibration procedure in everything from personal digital cameras to large telescopes. == Overview == Flat fielding refers to the process of compensating for different gains and dark currents in a detector. Once a detector has been appropriately flat-fielded, a uniform signal will create a uniform output (hence flat-field). This then means any further signal is due to the phenomenon being detected and not a systematic error. A flat-field image is acquired by imaging a uniformly-illuminated screen, thus producing an image of uniform color and brightness across the frame. For handheld cameras, the screen could be a piece of paper at arm's length, but a telescope will frequently image a clear patch of sky at twilight, when the illumination is uniform and there are few, if any, stars visible. Once the images are acquired, processing can begin. A flat-field consists of two numbers for each pixel, the pixel's gain and its dark current (or dark frame). The pixel's gain is how the amount of signal given by the detector varies as a function of the amount of light (or equivalent). The gain is almost always a linear variable, as such the gain is given simply as the ratio of the input and output signals. The dark-current is the amount of signal given out by the detector when there is no incident light (hence dark frame). In many detectors this can also be a function of time, for example in astronomical telescopes it is common to take a dark-frame of the same time as the planned light exposure. The gain and dark-frame for optical systems can also be established by using a series of neutral density filters to give input/output signal information and applying a least squares fit to obtain the values for the dark current and gain. C = ( R − D ) × m ( F − D ) = ( R − D ) × G {\displaystyle C={\frac {(R-D)\times m}{(F-D)}}=(R-D)\times G} where: C = corrected image R = raw image F = flat field image D = dark frame image m = image-averaged value of (F−D) G = Gain = m ( F − D ) {\displaystyle m \over (F-D)} In this equation, capital letters are 2D matrices, and lowercase letters are scalars. All matrix operations are performed element-by-element. In order for an astrophotographer to capture a light frame, they must place a light source over the imaging instrument's objective lens such that the light source emanates evenly through the users optics. The photographer must then adjust the exposure of their imaging device (charge-coupled device (CCD) or digital single-lens reflex camera (DSLR) ) so that when the histogram of the image is viewed, a peak reaching about 40–70% of the dynamic range (maximum range of pixel values) of the imaging device is seen. The photographer typically takes 15–20 light frames and performs median stacking. Once the desired light frames are acquired, the objective lens is covered so that no light is allowed in, then 15–20 dark frames are taken, each of equal exposure time as a light frame. These are called Dark-Flat frames. == In X-ray imaging == In X-ray imaging, the acquired projection images generally suffer from fixed-pattern noise, which is one of the limiting factors of image quality. It may stem from beam inhomogeneity, gain variations of the detector response due to inhomogeneities in the photon conversion yield, losses in charge transport, charge trapping, or variations in the performance of the readout. Also, the scintillator screen may accumulate dust and/or scratches on its surface, resulting in systematic patterns in every acquired X-ray projection image. In X-ray computed tomography (CT), fixed-pattern noise is known to significantly degrade the achievable spatial resolution and generally leads to ring or band artifacts in the reconstructed images. Fixed pattern noise can be easily removed using flat field correction. In conventional flat field correction, projection images without sample are acquired with and without the X-ray beam turned on, which are referred to as flat fields (F) and dark fields (D). Based on the acquired flat and dark fields, the measured projection images (P) with sample are then normalized to new images (N) according to: N = ( P − D ) ( F − D ) {\displaystyle N={\frac {(P-D)}{(F-D)}}} == Dynamic flat field correction == While conventional flat field correction is an elegant and easy procedure that largely reduces fixed-pattern noise, it heavily relies on the stationarity of the X-ray beam, scintillator response and CCD sensitivity. In practice, however, this assumption is only approximately met. Indeed, detector elements are characterized by intensity dependent, nonlinear response functions and the incident beam often shows time dependent non-uniformities, which render conventional FFC inadequate. In synchrotron X-ray tomography, many factors may cause flat field variations: instability of the bending magnets of the synchrotron, temperature variations due to the water cooling in mirrors and the monochromator, or vibrations of the scintillator and other beamline components. The latter is responsible for the biggest variations in the flat fields. To deal with such variations, a dynamic flat field correction procedure can be employed that estimates a flat field for each individual projection. Through principal component analysis of a set of flat fields, which are acquired prior and/or posterior to the actual scan, eigen flat fields can be computed. A linear combination of the most important eigen flat fields can then be used to individually normalize each X-ray projection: N j = P j − D ¯ F ¯ + ∑ k w j k u k − D ¯ {\displaystyle N_{j}={\frac {P_{j}-{\bar {D}}}{{\bar {F}}+\sum _{k}w_{jk}u_{k}-{\bar {D}}}}} where N j {\displaystyle N_{j}} = intensity normalized X-ray projection P j {\displaystyle P_{j}} = raw X-ray projection F ¯ {\displaystyle {\bar {F}}} = mean flat field image (average of flat fields) u k {\displaystyle u_{k}} = k-th eigen flat field w j k {\displaystyle w_{jk}} = weight of the eigen flat field u k {\displaystyle u_{k}} D ¯ {\displaystyle {\bar {D}}} = mean dark field (average of dark fields)

    Read more →
  • Intelligent control

    Intelligent control

    Intelligent control is a class of control techniques that use various artificial intelligence computing approaches like neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms. == Overview == Intelligent control can be divided into the following major sub-domains: Neural network control Machine learning control Reinforcement learning Bayesian control Fuzzy control Neuro-fuzzy control Expert Systems Genetic control New control techniques are created continuously as new models of intelligent behavior are created and computational methods developed to support them. === Neural network controller === Neural networks have been used to solve problems in almost all spheres of science and technology. Neural network control basically involves two steps: System identification Control It has been shown that a feedforward network with nonlinear, continuous and differentiable activation functions have universal approximation capability. Recurrent networks have also been used for system identification. Given, a set of input-output data pairs, system identification aims to form a mapping among these data pairs. Such a network is supposed to capture the dynamics of a system. For the control part, deep reinforcement learning has shown its ability to control complex systems. === Bayesian controllers === Bayesian probability has produced a number of algorithms that are in common use in many advanced control systems, serving as state space estimators of some variables that are used in the controller. The Kalman filter and the Particle filter are two examples of popular Bayesian control components. The Bayesian approach to controller design often requires an important effort in deriving the so-called system model and measurement model, which are the mathematical relationships linking the state variables to the sensor measurements available in the controlled system. In this respect, it is very closely linked to the system-theoretic approach to control design.

    Read more →
  • Hugging Face

    Hugging Face

    Hugging Face, Inc., is an American company based in New York City that develops computation tools for building applications using machine learning. Its transformers library built for natural language processing applications and its platform allow users to share machine learning models and datasets and showcase their work. == History == === Founding === The company was founded in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf in New York City, originally as a company that developed a chatbot app targeted at teenagers. The company was named after the U+1F917 🤗 HUGGING FACE emoji. After open sourcing the model behind the chatbot, the company pivoted to focus on being a platform for machine learning. === AI boom === On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model. In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large language model with 176 billion parameters. In February 2023, the company announced partnership with Amazon Web Services (AWS) which would allow Hugging Face's products to be available to AWS customers to use them as the building blocks for their custom applications. The company also said the next generation of BLOOM will be run on Trainium, a proprietary machine learning chip created by AWS. In June 2024, the company announced, along with Meta and Scaleway, their launch of a new AI accelerator program for European startups. The initiative aimed to help startups integrate open foundation models into their products, accelerating the EU AI ecosystem. The program, based at STATION F in Paris, ran from September 2024 to February 2025. Selected startups received mentoring, and access to AI models and tools and Scaleway's computing power. On September 23, 2024, to further the International Decade of Indigenous Languages, Hugging Face teamed up with Meta and UNESCO to launch a new online language translator. It was built on Meta's No Language Left Behind open-source AI model, enabling free text translation across 200 languages, including many low-resource languages. In April 2025, Hugging Face announced that they acquired a humanoid robotics startup, Pollen Robotics, based in France and founded by Matthieu Lapeyre and Pierre Rouanet in 2016. In an X tweet, Delangue shared his vision to "make Artificial Intelligence robotics Open Source". === Cyberattacks === In early 2026, hackers hijacked the Hugging Face platform to launch Android-targeted attacks involving "powerful malware" which could completely take over a compromised target.

    Read more →