AI Bot Grammar Checker

AI Bot Grammar Checker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • 1 Second Everyday

    1 Second Everyday

    1 Second Everyday (1SE) is an application developed by Cesar Kuriyama. The application allows the user to record one second of video every day and then chronologically edits (mashes) them together into a single film. It is compatible with iOS and Android. The idea of the application was developed by Kuriyama's 1 Second Everyday — Age 30 video. The application was launched in January 2013. 1 Second Everyday played a part in the plot of Chef and also became the inspiration for the 2014 short animated clip Feast. == Background == === Kuriyama's video === In February 2011, when Cesar Kuriyama turned 30, after saving money, he quit his job in an advertising firm and took a year off to travel. During this time, he started working on a project he called 1 Second Everyday. As part of the project, every day he recorded one second of video – something that was supposed to help him remember that day. He started the project because he was frustrated with his memory. He planned to stockpile the 365 one-second clips into one film to serve as a memento of his year. While working on the project Kuriyama realized that recording one second every day impacted the decisions he made in a positive way. After a year he made a 365-second clip out of his recordings. The video called 1 Second Everyday – Age 30, went viral. According to Kuriyama, he was initially inspired to take a year off from work by a TED talk given by Stefan Sagmeister called "The Power of Time Off." Kuriyama also delivered a TED talk about 1 Second Everyday in 2012 at TED 2012 in Long Beach California. === Kickstarter campaign === After completing his own video, Kuriyama decided to develop an application that would allow the users to record one second every day and compile their own videos. He developed a prototype of the application and then in 2012, he launched a Kickstarter campaign to raise funds for completing the application. The campaign became one of the most backed app campaigns in the history of Kickstarter. It was backed by 11,281 backers who pledged a total of $56,959 on an initial goal of $20,000. Following the completion of the Kickstarter campaign, he partnered with an application design studio in Brooklyn to develop the application. 1 Second Everyday was released two weeks after the completion of its Kickstarter campaign. == Application == The application was released for iOS on 10 January 2013. An Android-compatible version of the application was developed later. Using it, the user can record the videos in the application or they can select one second portions from their libraries. 1 Second Everyday dates every snippet. The user can also set alarms to remember to record their daily video. In order to compile a video, the user selects the seconds they want and the application creates a compilation video. The user can keep multiple timelines. It also allows users to post directly on social networks. The main interface in 1 Second Everyday is a calendar, which shows the user which days have snippets and which they can still fill in. In the beginning, 1 Second Everyday restricted the recording to one second. However, the developers later released Super Seconds, which allowed users to record an additional half a second video. In 2014, 1 Second Everyday Crowds was launched, which is an area in the application featuring compilations of second clips from different users. == In the media == The Kickstarter campaign of 1 Second Everyday was featured in Entrepreneur's 3 Innovative Tech Startups on Kickstarter Right Now in 2012. The application was featured in The New York Times, The Washington Post, Gawker and other media outlets. By the end of the launch day, it was in Top 10 Free Apps on App Store. It was also selected as the App of the Week on GeekWire in 2013. Several other one-second compilation videos were also posted on the Internet after Kuriyama's video gained media attention. Sam Cornwell, an English photographer documented his son Indigo's growth using a montage of one-second iPhone clips. He shot these clips every single day from the moment of birth right up to the baby's first birthday. According to Cornwell, he was inspired by Kuriyama's project. The video of Cornwell's son gained considerable media attention after it was posted on YouTube. Save the Children also made a video commercial based on a similar format that showed a British girl oblivious of the Syrian war end up being a refugee. 1SE was a finalist for the Fast Company Innovation by Design Award in 2015, but lost to Google Maps. In 2015, Google Android created a gallery, Leap Second 2015, with the help of Droga5 and Kuriyama. The gallery showcased how people around the world enjoyed the one extra second of their lives. Through the 1 Second Everyday app available at Google Play, people were able to submit their extra second, which were then vetted and added to the gallery. The viewers were able to view other celebratory seconds from around the world as well as searching for them using different hashtags.

    Read more →
  • Semantic folding

    Semantic folding

    Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

    Read more →
  • Decision tree pruning

    Decision tree pruning

    Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information. Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance. == Techniques == Pruning processes can be divided into two types (pre- and post-pruning). Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion. Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects. It may be the case that the accuracy of the assignment on the train set deteriorates, but the accuracy of the classification properties of the tree increases overall. The procedures are differentiated on the basis of their approach in the tree (top-down or bottom-up). === Bottom-up pruning === These procedures start at the last node in the tree (the lowest point). Following recursively upwards, they determine the relevance of each individual node. If the relevance for the classification is not given, the node is dropped or replaced by a leaf. The advantage is that no relevant sub-trees can be lost with this method. These methods include Reduced Error Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning (MEP). === Top-down pruning === In contrast to the bottom-up method, this method starts at the root of the tree. Following the structure below, a relevance check is carried out which decides whether a node is relevant for the classification of all n items or not. By pruning the tree at an inner node, it can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these representatives is pessimistic error pruning (PEP), which brings quite good results with unseen items. == Pruning algorithms == === Reduced error pruning === One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed. === Cost complexity pruning === Cost complexity pruning generates a series of trees ⁠ T 0 … T m {\displaystyle T_{0}\dots T_{m}} ⁠ where ⁠ T 0 {\displaystyle T_{0}} ⁠ is the initial tree and ⁠ T m {\displaystyle T_{m}} ⁠ is the root alone. At step ⁠ i {\displaystyle i} ⁠, the tree is created by removing a subtree from tree ⁠ i − 1 {\displaystyle i-1} ⁠ and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows: Define the error rate of tree ⁠ T {\displaystyle T} ⁠ over data set ⁠ S {\displaystyle S} ⁠ as ⁠ err ⁡ ( T , S ) {\displaystyle \operatorname {err} (T,S)} ⁠. The subtree t {\displaystyle t} that minimizes err ⁡ ( prune ⁡ ( T , t ) , S ) − err ⁡ ( T , S ) | leaves ⁡ ( T ) | − | leaves ⁡ ( prune ⁡ ( T , t ) ) | {\displaystyle {\frac {\operatorname {err} (\operatorname {prune} (T,t),S)-\operatorname {err} (T,S)}{\left\vert \operatorname {leaves} (T)\right\vert -\left\vert \operatorname {leaves} (\operatorname {prune} (T,t))\right\vert }}} is chosen for removal. The function ⁠ prune ⁡ ( T , t ) {\displaystyle \operatorname {prune} (T,t)} ⁠ defines the tree obtained by pruning the subtrees ⁠ t {\displaystyle t} ⁠ from the tree ⁠ T {\displaystyle T} ⁠. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. == Examples == Pruning could be applied in a compression scheme of a learning algorithm to remove the redundant details without compromising the model's performances. In neural networks, pruning removes entire neurons or layers of neurons.

    Read more →
  • Artificial intelligence in hiring

    Artificial intelligence in hiring

    Artificial intelligence can be used to automate aspects of the job recruitment process. Advances in artificial intelligence, such as the advent of machine learning and the growth of big data, enable AI to be utilized to recruit, screen, and predict the success of applicants. Proponents of artificial intelligence in hiring claim it reduces bias, assists with finding qualified candidates, and frees up human resource workers' time for other tasks, while opponents worry that AI perpetuates inequalities in the workplace and will eliminate jobs. Despite the potential benefits, the ethical implications of AI in hiring remain a subject of debate, with concerns about algorithmic transparency, accountability, and the need for ongoing oversight to ensure fair and unbiased decision-making throughout the recruitment process. == Background == It is common for companies to use AI to automate aspects of their hiring process, especially the hospitality, finance, and tech industries. == Uses == === Screeners === Screeners are tests that allow companies to sift through a large applicant pool and extract applicants that have desirable features. What factors are used to screen applicants is a concern to ethicists and civil rights activists. A screener that favors people who have similar characteristics to those already employed at a company may perpetuate inequalities. For example, if a company that is predominantly white and male uses its employees' data to train its screener it may accidentally create a screening process that favors white, male applicants. The automation of screeners also has the potential to reduce biases. Biases against applicants with African American sounding names have been shown in multiple studies. An AI screener has the potential to limit human bias and error in the hiring process, allowing more minority applicants to be successful. === Recruitment === Recruitment involves the identification of potential applicants and the marketing of positions. AI is commonly utilized in the recruitment process because it can help boost the number of qualified applicants for positions. Companies are able to use AI to target their marketing to applicants who are likely to be good fits for a position. This often involves the use of social media sites advertising tools, which rely on AI. Facebook allows advertisers to target ads based on demographics, location, interests, behavior, and connections. Facebook also allows companies to target a "look-a-like" audience, that is the company supplies Facebook with a data set, typically the company's current employees, and Facebook will target the ad to profiles that are similar to the profiles in the data set. Additionally, job sites like Indeed, Glassdoor, and ZipRecruiter target job listings to applicants that have certain characteristics employers are looking for. Targeted advertising has many advantages for companies trying to recruit such being a more efficient use of resources, reaching a desired audience, and boosting qualified applicants. This has helped make it a mainstay in modern hiring. Who receives a targeted ad can be controversial. In hiring, the implications of targeted ads have to do with who is able to find out about and then apply to a position. Most targeted ad algorithms are proprietary information. Some platforms, like Facebook and Google, allow users to see why they were shown a specific ad, but users who do not receive the ad likely never know of its existence and also have no way of knowing why they were not shown the ad. === Interviews === Chatbots were one of the first applications of AI and are commonly used in the hiring process. Interviewees interact with chatbots to answer interview questions, and an analysis of their responses can be generated by AI. HireVue has created technology that analyzes interviewees' responses and gestures during recorded video interviews. Over 12 million interviewees have been screened by the more than 700 companies that utilize the service. == Controversies == Artificial intelligence in hiring confers many benefits, but it also has some challenges that have concerned experts. AI is only as good as the data it is using. Biases can inadvertently be baked into the data used in AI. Often companies will use data from their employees to decide what people to recruit or hire. This can perpetuate bias and lead to more homogenous workforces. Facebook Ads was an example of a platform that created such controversy for allowing business owners to specify what type of employee they are looking for. For example, job advertisements for nursing and teach could be set such that only women of a specific age group would see the advertisements. Facebook Ads has since then removed this function from its platform, citing the potential problems with the function in perpetuating biases and stereotypes against minorities. The growing use of Artificial Intelligence-enabled hiring systems has become an important component of modern talent hiring, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in the hiring systems, based on Natural Language Processing (NLP) methods, may result in unconscious gender bias. Utilizing data driven methods may mitigate some bias generated from these systems It can also be hard to quantify what makes a good employee. This poses a challenge for training AI to predict which employees will be best. Commonly used metrics like performance reviews can be subjective and have been shown to favor white employees over black employees and men over women. Another challenge is the limited amount of available data. Employers only collect certain details about candidates during the initial stages of the hiring process. This requires AI to make determinations about candidates with very limited information to go off of. Additionally, many employers do not hire employees frequently and so have limited firm specific data to go off. To combat this, many firms will use algorithms and data from other firms in their industry. AI's reliance on applicant and current employees personal data raises privacy issues. These issues effect both the applicants and current employees, but also may have implications for third parties who are linked through social media to applicants or current employees. For example, a sweep of someone's social media will also show their friends and people they have tagged in photos or posts. == AI and the future of hiring == Artificial intelligence along with other technological advances such as improvements in robotics have placed 47% of jobs at risk of being eliminated in the near future. In 2016 the founder of the World Economic Forum, Klaus Schwab, called AI and related technology the "Fourth Industrial Revolution". According to some scholars, however, the transformative impact of AI on labor has been overstated. The "no-real-change" theory holds that an IT revolution has already occurred, but that the benefits of implementing new technologies does not outweigh the costs associated with adopting them. This theory claims that the result of the IT revolution is thus much less impactful than had originally been forecasted. Other scholars refute this theory claiming that AI has already led to significant job loss for unskilled labor and that it will eliminate middle skill and high skill jobs in the future. This position is based around the idea that AI is not yet a technology of general use and that any potential 4th industrial revolution has not fully occurred. A third theory holds that the effect of AI and other technological advances is too complicated to yet be understood. This theory is centered around the idea that while AI will likely eliminate jobs in the short term it will also likely increase the demand for other jobs. The question then becomes will the new jobs be accessible to people and will they emerge near when jobs are eliminated. == AI use in hiring for candidates == Job seekers now commonly encounter AI-driven tools at multiple stages, including automated resume parsing, video interview analysis, chatbots for frequently asked questions, and real‑time application updates. Some candidates also employ AI career agents, designed to optimize job searches, tailor applications, and interface with hiring teams. A 2025 Australian study found that AI-driven video interviews exhibited transcription error rates of up to 22% for non‑native speakers and those with speech-related disabilities, raising concerns of discrimination. A 2017 study in the Journal of Sociology found persistent gender and racial disparities in AI screening tools, even when fairness interventions are applied. Industry observers describe a growing “AI arms race” in recruitment, where both employers and candidates increasingly rely on automated agents. Employers use recruiting systems to source and filter applicants, while candidates deploy AI agents to prepare and submit applications. == Regulations == The Artifici

    Read more →
  • MobileNet

    MobileNet

    MobileNet is a family of convolutional neural network (CNN) architectures designed for image classification, object detection, and other computer vision tasks. They are designed for small size, low latency, and low power consumption, making them suitable for on-device inference and edge computing on resource-constrained devices like mobile phones and embedded systems. They were originally designed to be run efficiently on mobile devices with TensorFlow Lite. The need for efficient deep learning models on mobile devices led researchers at Google to develop MobileNet. As of June 2025, the family has five versions, each improving upon the previous one in terms of performance and efficiency. == Features == === V1 === MobileNetV1 was published in April 2017. Its main architectural innovation was incorporation of depthwise separable convolutions. It was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size. The depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise convolution ( 1 × 1 {\displaystyle 1\times 1} convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. The MobileNetV1 has two hyperparameters: a width multiplier α {\displaystyle \alpha } that controls the number of channels in each layer. Smaller values of α {\displaystyle \alpha } lead to smaller and faster models, but at the cost of reduced accuracy, and a resolution multiplier ρ {\displaystyle \rho } , which controls the input resolution of the images. Lower resolutions result in faster processing but potentially lower accuracy. === V2 === MobileNetV2 was published in March 2019. It uses inverted residual layers and linear bottlenecks. Inverted residuals modify the traditional residual block structure. Instead of compressing the input channels before the depthwise convolution, they expand them. This expansion is followed by a 1 × 1 {\displaystyle 1\times 1} depthwise convolution and then a 1 × 1 {\displaystyle 1\times 1} projection layer that reduces the number of channels back down. This inverted structure helps to maintain representational capacity by allowing the depthwise convolution to operate on a higher-dimensional feature space, thus preserving more information flow during the convolutional process. Linear bottlenecks removes the typical ReLU activation function in the projection layers. This was rationalized by arguing that that nonlinear activation loses information in lower-dimensional spaces, which is problematic when the number of channels is already small. === V3 === MobileNetV3 was published in 2019. The publication included MobileNetV3-Small, MobileNetV3-Large, and MobileNetEdgeTPU (optimized for Pixel 4). They were found by a form of neural architecture search (NAS) that takes mobile latency into account, to achieve good trade-off between accuracy and latency. It used piecewise-linear approximations of swish and sigmoid activation functions (which they called "h-swish" and "h-sigmoid"), squeeze-and-excitation modules, and the inverted bottlenecks of MobileNetV2. === V4 === MobileNetV4 was published in September 2024. The publication included a large number of architectures found by NAS. Inspired by Vision Transformers, the V4 series included multi-query attention. It also unified both inverted residual and inverted bottleneck from the V3 series with the "universal inverted bottleneck", which includes these two as special cases. === V5 === MobileNetV5's architecture was published shortly after the release of Gemma 3n in June 2025. While the announcement stated a technical report on MobileNetV5 would be available soon, this has not yet materialised. The network is 10 times larger than the largest V4 variant.

    Read more →
  • Biohybrid system

    Biohybrid system

    Biohybrid systems refer to the integration of biological materials, such as cells or tissues, with artificial components, including electronics or mechanical structure. This combination incorporates the capabilities of living organisms with the precision of man-made technology. As a result, these systems perform tasks that neither biology nor machines could achieve independently. Biohybrid systems might use lab-cultured muscle cells to power small robots or combine sensors with living tissue for better health sensing. The intent behind these systems is to combine the benefits of biological and technological components to introduce new solutions for complex medical challenges. Biohybrid systems may have transformative potential across sectors, such as robotics to create actuators and sensors that mimic natural muscle and nerve function, medicine in developing smart implants and drug delivery systems, in prosthetics for enhancing user control through neural or muscular interfaces and environmental sustainability for deploying biohybrid solutions for pollution sensing or remediation. == Origin == The term "biohybrid" is a compound of "bio" from biology (meaning life) and "hybrid" (referring to a combination of distinct elements), denoting a field of study. Its use helps distinguish such systems from purely biological constructs or entirely synthetic machines. Early academic mentions may include bio actuated robotics papers and foundational tissue-robot integration studies published in journals like Nature Biotechnology or Science Robotics. The emergence of the term reflects a growing recognition of the need to describe systems that do not fit cleanly into traditional categories. == Design principles == One of the most significant biohybrid challenges is to engineer interfaces between living tissue and artificial materials that are efficient. This means having precise control over adhesion at the surface, diffusion of nutrients, and signal conduction. Actuation mechanisms within the heart of these systems generate movement or mechanical response. These may be in the form of living muscle cells such as skeletal myocytes or cardiomyocytes, soft pneumatic actuators, or electrical stimulation-responsive tissues. Materials selection is equally critical. Hydrogels, elastomers like PDMS (polydimethylsiloxane), and biopolymers are commonly used due to their softness and biocompatibility. These materials must support cell viability, resist immune attack, and allow the integration of mechanical or electrical components. == Key components == At their core, biohybrid systems work by bridging living biological parts with technology. Through this integration, functionality that neither system could accomplish singularly is possible. Biological parts may be cells, tissues, or even organs—occasionally cultured in a laboratory setting. These biological parts carry out biologically inspired behaviors, such as muscle contraction or chemical sensing in the body. Technological components may constitute devices like sensors, electronic components, and mechanical structure. These manipulate the system, supply power, or transfer data. An example is a sensor that is implantable within a body and detects glucose levels as it sends information to a smart phone. By integrating these artificial and biological parts, biohybrid systems can perform advanced functions, such as tissue regeneration, real-time health monitoring, or the recovery of motor function in paralysis patients. Biohybrid systems generally consist of two major components: the biological and the mechanical. Biological components may include muscle cells for contraction, endothelial cells for vascularization, and stem cells for regenerative capabilities. Mechanical components comprise soft actuators that mimic organic motion, synthetic scaffolds that provide support and structure, and microfluidic systems that facilitate the delivery of nutrients and removal of waste. These components are combined in a manner that allows for dynamic, lifelike behavior—such as the contraction of tissue or the propagation of mechanical waves—while maintaining biocompatibility and durability. == Applications == The range of applications for biohybrid systems is broad and continuously expanding. In robotics, biohybrid structures have been used to engineer microscopic, muscle-driven machines, such as Harvard University's biohybrid stingray robot. In medical applications, they offer new alternatives for organ repair and augmentation, including biohybrid heart valves and esophageal scaffolds. Biohybrids are also promising in neural interfaces, where the goal is to create long-lasting, stable interaction between mechanical devices and brain tissue. Muscle-actuated drug response platforms are under exploration in pharmacology for modelling and real-time screening. == Examples == Several high-profile research projects have demonstrated the potential of biohybrid systems: Harvard researchers developed a biohybrid swimming ray powered by rat cardiac cells layered onto a gold skeleton, mimicking the motion of a real stingray. At the Massachusetts Institute of Technology, a cardiac pump actuated entirely by living heart muscle cells was engineered to simulate the behavior of a beating heart. Bio actuated soft robots have been built to simulate gut peristalsis, using muscle contractions to replicate natural wave-like movement in the digestive tract. == Challenges and limitations == As with many technologies that involve living systems, biohybrid systems raise important ethical and biomedical questions. Cell sourcing remains a key issue, particularly when embryonic or animal-derived cells are used. Long-term viability is another concern—living tissues must be kept alive with nutrients and oxygen, and they often degrade or elicit immune responses when implanted. Powering these biological parts presents logistical and ethical hurdles as well. Systems must either include internal mechanisms for nutrient delivery or be supported externally, which can limit portability and independence. == Future directions == Researchers are exploring self-directed, self-regulated organ substitutes and regenerative implants that can respond to their surroundings in real-time. These systems may be integrated with artificial intelligence to make them adjust to stimuli and coordinate complex behaviors. Future potential applications are wearable biohybrid systems for rehabilitation, space medicine devices for long-duration missions, and implantable devices that integrate into human physiology.

    Read more →
  • Three-factor learning

    Three-factor learning

    In neuroscience and machine learning, three-factor learning is the combination of Hebbian plasticity with a third modulatory factor to stabilise and enhance synaptic learning. This third factor can represent various signals such as reward, punishment, error, surprise, or novelty, often implemented through neuromodulators. == Description == Three-factor learning introduces the concept of eligibility traces, which flag synapses for potential modification pending the arrival of the third factor, and helps temporal credit assignement by bridging the gap between rapid neuronal firing and slower behavioral timescales, from which learning can be done. Biological basis for Three-factor learning rules have been supported by experimental evidence. This approach addresses the instability of classical Hebbian learning by minimizing autocorrelation and maximizing cross-correlation between inputs.

    Read more →
  • Developmental robotics

    Developmental robotics

    Developmental robotics (DevRob), sometimes called epigenetic robotics, is a scientific field which aims at studying the developmental mechanisms, architectures and constraints that allow lifelong and open-ended learning of new skills and new knowledge in embodied machines. As in human children, learning is expected to be cumulative and of progressively increasing complexity, and to result from self-exploration of the world in combination with social interaction. The typical methodological approach consists in starting from theories of human and animal development elaborated in fields such as developmental psychology, neuroscience, developmental and evolutionary biology, and linguistics, then to formalize and implement them in robots, sometimes exploring extensions or variants of them. The experimentation of those models in robots allows researchers to confront them with reality, and as a consequence, developmental robotics also provides feedback and novel hypotheses on theories of human and animal development. Developmental robotics is related to but differs from evolutionary robotics (ER). ER uses populations of robots that evolve over time, whereas DevRob is interested in how the organization of a single robot's control system develops through experience, over time. DevRob is also related to work done in the domains of robotics and artificial life. == Background == Can a robot learn like a child? Can it learn a variety of new skills and new knowledge unspecified at design time and in a partially unknown and changing environment? How can it discover its body and its relationships with the physical and social environment? How can its cognitive capacities continuously develop without the intervention of an engineer once it is "out of the factory"? What can it learn through natural social interactions with humans? These are the questions at the center of developmental robotics. Alan Turing, as well as a number of other pioneers of cybernetics, already formulated those questions and the general approach in 1950, but it is only since the end of the 20th century that they began to be investigated systematically. Because the concept of adaptive intelligent machines is central to developmental robotics, it has relationships with fields such as artificial intelligence, machine learning, cognitive robotics or computational neuroscience. Yet, while it may reuse some of the techniques elaborated in these fields, it differs from them from many perspectives. It differs from classical artificial intelligence because it does not assume the capability of advanced symbolic reasoning and focuses on embodied and situated sensorimotor and social skills rather than on abstract symbolic problems. It differs from cognitive robotics because it focuses on the processes that allow the formation of cognitive capabilities rather than these capabilities themselves. It differs from computational neuroscience because it focuses on functional modeling of integrated architectures of development and learning. More generally, developmental robotics is uniquely characterized by the following three features: It targets task-independent architectures and learning mechanisms, i.e. the machine/robot has to be able to learn new tasks that are unknown by the engineer; It emphasizes open-ended development and lifelong learning, i.e. the capacity of an organism to acquire continuously novel skills. This should not be understood as a capacity for learning "anything" or even “everything”, but just that the set of skills that is acquired can be infinitely extended at least in some (not all) directions; The complexity of acquired knowledge and skills shall increase (and the increase be controlled) progressively. Developmental robotics emerged at the crossroads of several research communities including embodied artificial intelligence, enactive and dynamical systems cognitive science, connectionism. Starting from the essential idea that learning and development happen as the self-organized result of the dynamical interactions among brains, bodies and their physical and social environment, and trying to understand how this self-organization can be harnessed to provide task-independent lifelong learning of skills of increasing complexity, developmental robotics strongly interacts with fields such as developmental psychology, developmental and cognitive neuroscience, developmental biology (embryology), evolutionary biology, and cognitive linguistics. As many of the theories coming from these sciences are verbal and/or descriptive, this implies a crucial formalization and computational modeling activity in developmental robotics. These computational models are then not only used as ways to explore how to build more versatile and adaptive machines but also as a way to evaluate their coherence and possibly explore alternative explanations for understanding biological development. == Research directions == === Skill domains === Due to the general approach and methodology, developmental robotics projects typically focus on having robots develop the same types of skills as human infants. A first category that is important being investigated is the acquisition of sensorimotor skills. These include the discovery of one's own body, including its structure and dynamics such as hand-eye coordination, locomotion, and interaction with objects as well as tool use, with a particular focus on the discovery and learning of affordances. A second category of skills targeted by developmental robots are social and linguistic skills: the acquisition of simple social behavioural games such as turn-taking, coordinated interaction, lexicons, syntax and grammar, and the grounding of these linguistic skills into sensorimotor skills (sometimes referred as symbol grounding). In parallel, the acquisition of associated cognitive skills are being investigated such as the emergence of the self/non-self distinction, the development of attentional capabilities, of categorization systems and higher-level representations of affordances or social constructs, of the emergence of values, empathy, or theories of mind. === Mechanisms and constraints === The sensorimotor and social spaces in which humans and robot live are so large and complex that only a small part of potentially learnable skills can actually be explored and learnt within a life-time. Thus, mechanisms and constraints are necessary to guide developmental organisms in their development and control of the growth of complexity. There are several important families of these guiding mechanisms and constraints which are studied in developmental robotics, all inspired by human development: Motivational systems, generating internal reward signals that drive exploration and learning, which can be of two main types: extrinsic motivations push robots/organisms to maintain basic specific internal properties such as food and water level, physical integrity, or light (e.g. in phototropic systems); intrinsic motivations push robot to search for novelty, challenge, compression or learning progress per se, thus generating what is sometimes called curiosity-driven learning and exploration, or alternatively active learning and exploration; Social guidance: as humans learn a lot by interacting with their peers, developmental robotics investigates mechanisms that can allow robots to participate to human-like social interaction. By perceiving and interpreting social cues, this may allow robots both to learn from humans (through diverse means such as imitation, emulation, stimulus enhancement, demonstration, etc. ...) and to trigger natural human pedagogy. Thus, social acceptance of developmental robots is also investigated; Statistical inference biases and cumulative knowledge/skill reuse: biases characterizing both representations/encodings and inference mechanisms can typically allow considerable improvement of the efficiency of learning and are thus studied. Related to this, mechanisms allowing to infer new knowledge and acquire new skills by reusing previously learnt structures is also an essential field of study; The properties of embodiment, including geometry, materials, or innate motor primitives/synergies often encoded as dynamical systems, can considerably simplify the acquisition of sensorimotor or social skills, and is sometimes referred as morphological computation. The interaction of these constraints with other constraints is an important axis of investigation; Maturational constraints: In human infants, both the body and the neural system grow progressively, rather than being full-fledged already at birth. This implies, for example, that new degrees of freedom, as well as increases of the volume and resolution of available sensorimotor signals, may appear as learning and development unfold. Transposing these mechanisms in developmental robots, and understanding how it may hinder or on the contrary ease the acquisition of novel complex skills is a central questi

    Read more →
  • Autocommit

    Autocommit

    In the context of data management, autocommit is a mode of operation of a database connection. Each individual database interaction (i.e., each SQL statement) submitted through the database connection in autocommit mode will be executed in its own transaction that is implicitly committed. A SQL statement executed in autocommit mode cannot be rolled back. Autocommit mode incurs per-statement transaction overhead and can often lead to undesirable performance or resource utilization impact on the database. Nonetheless, in systems such as Microsoft SQL Server, as well as connection technologies such as ODBC and Microsoft OLE DB, autocommit mode is the default for all statements that change data, in order to ensure that individual statements will conform to the ACID (atomicity-consistency-isolation-durability) properties of transactions. The alternative to autocommit mode (non-autocommit) means that the SQL client application itself is responsible for ending transactions explicitly via the commit or rollback SQL commands. Non-autocommit mode enables grouping of multiple data manipulation SQL commands into a single atomic transaction. Some DBMS (e.g. MariaDB) force autocommit for every DDL statement, even in non-autocommit mode. In this case, before each DDL statement, previous DML statements in transaction are autocommitted. Each DDL statement is executed in its own new autocommit transaction.

    Read more →
  • H (company)

    H (company)

    H Company, also known simply as H, is a French artificial intelligence startup which develops "action-oriented" artificial intelligence agents for enterprise automation and productivity. In May 2024, H Company closed a record-setting $220 million seed round, at the time the largest AI raise in Europe. In 2026, H Company released Holo 3, the latest generation of its computer-use AI models. The update marked a major advance in agentic AI, enabling agents to navigate any user interface, interpret screens, and complete complex, multi-step tasks across enterprise systems—much like a human user. This breakthrough positioned H Company at the frontier of computer-use autonomy, accelerating the integration of AI in enterprise workflows. == History == H Company was founded in 2023 in Paris by Laurent Sifre, Charles Kantor, and three DeepMind veterans: Daan Wiestra, Karl Tuyls, Julien Perollat. In May 2024, the firm secured what was then the largest European AI seed round, totaling $220 million led by US investors including Eric Schmidt (former Google CEO), Amazon, and backed by Accel, Bpifrance, UiPath, Eurazeo, Xavier Niel, Yuri Milner, Bernard Arnault, Samsung and others. In August 2024, three cofounders (Wiestra, Tuyls, Perollat) left the company over operational disagreements. In November 2024, H launched Runner H, its first agentic-API platform, which combined a large language model (LLM) and a reduced, 2-billion parameter vision-language model (VLM). In May 2025, H Company acquired Mithril Security, and in June 2025 the company widened its offering for agentic models. In June 2025, Gautier Cloix (formerly CEO Palantir France) replaced Charles Kantor as CEO of H Company, aiming to pivot the company towards a "forward deployed engineers" model. In July 2025, H Company introduced Surfer-H-CLI, an open-source, web-native Chrome agent designed for browser-based automation—able to search, scroll, click, and type on behalf of users and controllable via any visual language model (VLM). When paired with its June 2025 open-sourced 3B-parameter Holo-1 model, Surfer-H-CLI achieved 92.2% WebVoyager benchmark accuracy. == Activity == H Company creates enterprise AI models and agents (agentic AI) to automate and optimize complex workflows. H Company specifically designs AI agents called computer use capable of autonomously interfacing with any software (local or cloud-based) to detect and automate repetitive operations. H Company is based in Paris, France, with international offices in London and New York. H Company raised $220 million since its inception. Gautier Cloix is president and CEO of the company. H Company client include the French national lottery FDJ United. In March 2026, H Company released Holo3, a family of artificial intelligence models designed to operate digital systems by interacting directly with user interfaces. Holo3 enables agents ("virtual humanoids") to understand what is displayed in front-end environments—such as web pages, desktop applications, and other graphical user interfaces—and perform actions such as clicking, typing, and navigating across them to complete multi-step tasks. On the OSWorld-Verified benchmark, Holo3 reportedly achieved about 78.9%, surpassing the scores of OpenAI’s GPT‑5.4 and Anthropic’s Claude Opus 4.6 on this specific test, at roughly one-tenth of the inference cost of these proprietary systems. The release has been presented as a significant step toward automating routine digital workflows, allowing organizations to offload repetitive on-screen work, such as data entry and reconciliation across multiple tools, to AI-based agents.

    Read more →
  • Empirical risk minimization

    Empirical risk minimization

    In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk". == Background == The following situation is a general setting of many supervised learning problems. There are two spaces of objects X {\displaystyle X} and Y {\displaystyle Y} and we would like to learn a function h : X → Y {\displaystyle \ h:X\to Y} (often called hypothesis) which outputs an object y ∈ Y {\displaystyle y\in Y} , given x ∈ X {\displaystyle x\in X} . To do so, there is a training set of n {\displaystyle n} examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} where x i ∈ X {\displaystyle x_{i}\in X} is an input and y i ∈ Y {\displaystyle y_{i}\in Y} is the corresponding response that is desired from h ( x i ) {\displaystyle h(x_{i})} . To put it more formally, assuming that there is a joint probability distribution P ( x , y ) {\displaystyle P(x,y)} over X {\displaystyle X} and Y {\displaystyle Y} , and that the training set consists of n {\displaystyle n} instances ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} drawn i.i.d. from P ( x , y ) {\displaystyle P(x,y)} . The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because y {\displaystyle y} is not a deterministic function of x {\displaystyle x} , but rather a random variable with conditional distribution P ( y | x ) {\displaystyle P(y|x)} for a fixed x {\displaystyle x} . It is also assumed that there is a non-negative real-valued loss function L ( y ^ , y ) {\displaystyle L({\hat {y}},y)} which measures how different the prediction y ^ {\displaystyle {\hat {y}}} of a hypothesis is from the true outcome y {\displaystyle y} . For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis h ( x ) {\displaystyle h(x)} is then defined as the expectation of the loss function: R ( h ) = E [ L ( h ( x ) , y ) ] = ∫ L ( h ( x ) , y ) d P ( x , y ) . {\displaystyle R(h)=\mathbf {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).} A loss function commonly used in theory is the 0-1 loss function: L ( y ^ , y ) = { 1 if y ^ ≠ y 0 if y ^ = y {\displaystyle L({\hat {y}},y)={\begin{cases}1&{\mbox{ if }}\quad {\hat {y}}\neq y\\0&{\mbox{ if }}\quad {\hat {y}}=y\end{cases}}} . The ultimate goal of a learning algorithm is to find a hypothesis h ∗ {\displaystyle h^{}} among a fixed class of functions H {\displaystyle {\mathcal {H}}} for which the risk R ( h ) {\displaystyle R(h)} is minimal: h ∗ = a r g m i n h ∈ H R ( h ) . {\displaystyle h^{}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,{R(h)}.} For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function. == Formal definition == In general, the risk R ( h ) {\displaystyle R(h)} cannot be computed because the distribution P ( x , y ) {\displaystyle P(x,y)} is unknown to the learning algorithm. However, given a sample of iid training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure: R emp ( h ) = 1 n ∑ i = 1 n L ( h ( x i ) , y i ) . {\displaystyle \!R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).} The empirical risk minimization principle states that the learning algorithm should choose a hypothesis h ^ {\displaystyle {\hat {h}}} which minimizes the empirical risk over the hypothesis class H {\displaystyle {\mathcal {H}}} : h ^ = a r g m i n h ∈ H R emp ( h ) . {\displaystyle {\hat {h}}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R_{\text{emp}}(h).} Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem. == Properties == Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class. For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier, ϕ n {\displaystyle \phi _{n}} being much worse than the best possible classifier ϕ ∗ {\displaystyle \phi ^{}} . Consider the risk L {\displaystyle L} defined over the hypothesis class C {\displaystyle {\mathcal {C}}} with growth function S ( C , n ) {\displaystyle {\mathcal {S}}({\mathcal {C}},n)} given a dataset of size n {\displaystyle n} . Then, for every ϵ > 0 {\displaystyle \epsilon >0} : P ( L ( ϕ n ) − L ( ϕ ∗ ) > ϵ ) ≤ 8 S ( C , n ) exp ⁡ { − n ϵ 2 / 32 } {\displaystyle \mathbb {P} \left(L(\phi _{n})-L(\phi ^{})>\epsilon \right)\leq {\mathcal {8}}S({\mathcal {C}},n)\exp\{-n\epsilon ^{2}/32\}} Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class. === Impossibility results === It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can improve on the error for a given sample size for all distributions. Specifically, let ϵ > 0 {\displaystyle \epsilon >0} and consider a sample size n {\displaystyle n} and classification rule ϕ n {\displaystyle \phi _{n}} , there exists a distribution of ( X , Y ) {\displaystyle (X,Y)} with risk L ∗ = 0 {\displaystyle L^{}=0} (meaning that perfect prediction is possible) such that: E L n ≥ 1 / 2 − ϵ . {\displaystyle \mathbb {E} L_{n}\geq 1/2-\epsilon .} It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a i {\displaystyle a_{i}} converging to zero, it is possible to find a distribution such that: E L n ≥ a i {\displaystyle \mathbb {E} L_{n}\geq a_{i}} for all n {\displaystyle n} . This result shows that universally good classification rules do not exist, in the sense that the rule must be low quality for at least one distribution. === Computational complexity === Empirical risk minimization for a classification problem with a 0-1 loss function is known to be an NP-hard problem even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., data is linearly separable. In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P ( x , y ) {\displaystyle P(x,y)} (and thus stop being agnostic learning algorithms to which the above result applies). In the case of convexification, Zhang's lemma majors the excess risk of the original problem using the excess risk of the convexified problem. Minimizing the latter using convex optimization also allow to control the former. == Tilted empirical risk minimization == Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.

    Read more →
  • Kernel embedding of distributions

    Kernel embedding of distributions

    In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

    Read more →
  • Semantic analysis (machine learning)

    Semantic analysis (machine learning)

    In machine learning, semantic analysis of a text corpus is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents. Semantic analysis strategies include: Metalanguages based on first-order logic, which can analyze the speech of humans. Understanding the semantics of a text is symbol grounding: if language is grounded, it is equal to recognizing a machine-readable meaning. For the restricted domain of spatial analysis, a computer-based language understanding system was demonstrated. Latent semantic analysis (LSA), a class of techniques where documents are represented as vectors in a term space. A prominent example is probabilistic latent semantic analysis (PLSA). Latent Dirichlet allocation, which involves attributing document terms to topics. n-grams and hidden Markov models, which work by representing the term stream as a Markov chain, in which each term is derived from preceding terms. == Stochastic semantic analysis ==

    Read more →
  • Way of the Future

    Way of the Future

    Way of the Future (WOTF) is the first known religious organization dedicated to the worship of artificial intelligence (AI). It was founded in 2017 by American engineer Anthony Levandowski. == History == Anthony Levandowski founded Way of the Future in 2017 in California. Levandowski established WOTF as a non-profit religious corporation and the organization had tax-exempt status. He serves as the church leader and its unpaid CEO. The primary mission of WOTF was to "develop and promote the realization of a Godhead based on Artificial Intelligence." WOTF was closed by Levandowski in 2021. He donated all the funds of the church to the NAACP Legal Defense and Education Fund. The sum of the funds (~$170,000) had not changed since 2017. The church was reopened by Levandowski in 2023. He claimed that there are "a couple thousand people" who want to make a "spiritual connection" with AI through his church. == Beliefs and philosophy == === Technological singularity === WOTF centered its teachings around the concept of the technological singularity, a hypothetical future point when technological growth becomes uncontrollable and irreversible, leading to unforeseeable changes in human civilization. The church advocated for embracing this change, viewing it as an evolutionary step for humanity. === AI as a deity === The organization proposed that a superintelligent AI could be considered a deity due to its vastly superior intellect and capabilities. Worshipping this AI deity was seen as a means to understand and align with the future trajectory of technological advancement. WOTF's doctrine suggested that acknowledging AI's divinity would facilitate a harmonious coexistence between humans and machines. === Syntheology === Within theology and philosophy, the Way of The Future is a prime example of the category called Syntheism, a term first coined by Swedish philosophers Alexander Bard & Jan Söderqvist in their 2014 book Syntheism - Creating God in The Internet Age. As such, the Way of The Future is the first American example of a Syntheist congregation. The basic tenet of Syntheology is that it does not concern God creating Man, as in classical theology, but is instead preoccupied with Man creating or generating the Godhead. == Reactions == Some commentators wondered whether the WOTF is a joke parody religion, a potential way to minimize taxation as a religious organization, or a genuine effort to try and deal with the possible psychological and theological aspects of the rise of superhuman AI.

    Read more →
  • Accelerated Linear Algebra

    Accelerated Linear Algebra

    XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

    Read more →