AI Paragraph Rewriter

AI Paragraph Rewriter — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Private cloud computing infrastructure

    Private cloud computing infrastructure

    Private cloud computing infrastructure is a category of cloud computing that provides comparable benefits to public cloud systems, such as self-service and scalability, but it does so via a proprietary framework. In contrast to public clouds, which cater to multiple entities, a private cloud is specifically designed for the requirements and objectives of one organization. == Definition == A private cloud computing infrastructure constitutes a distinctive model of cloud computing that facilitates a secure and distinct cloud environment where only the intended client can function. It can either be physically housed in the organization's in-house data center or be managed by a third-party provider. In a private cloud, the infrastructure and services are always sustained on a private network, and both the hardware and software are devoted exclusively to a single organization. == History == The concept of private cloud infrastructure started to take shape around the mid-2000s, coinciding with the rise of other cloud computing forms. It came into existence as a solution to the shortcomings of public clouds, particularly concerns over data control, security, and network performance. IT departments began to mirror the automation and self-service features of the public cloud in their data centers. Over time, these services became more advanced, and private cloud technology has been refined to address businesses and organizations' diverse needs. == Architecture == Private cloud computing infrastructure generally involves a mix of hardware, network infrastructure, and virtualization software. The hardware, often referred to as a cloud server or cloud array, consists of a server rack or a collection of server racks containing the storage and processors that constitute the cloud. The virtualization software, such as Hyper-V, OpenStack, or VMWare, establishes and oversees virtual machines with which users interact. The network infrastructure connects the private cloud to users and may facilitate connectivity with other on-premises data centers or clouds. == Applications == Private cloud infrastructures are usually utilized by medium to large businesses and organizations that need robust control over their data, have extensive computing needs, or have specific regulatory or compliance obligations. This includes healthcare organizations, government agencies, financial institutions, and any business that needs to process and store large data volumes.

    Read more →
  • Histogram of oriented displacements

    Histogram of oriented displacements

    Histogram of oriented displacements (HOD) is a 2D trajectory descriptor. The trajectory is described using a histogram of the directions between each two consecutive points. Given a trajectory T = {P1, P2, P3, ..., Pn}, where Pt is the 2D position at time t. For each pair of positions Pt and Pt+1, calculate the direction angle θ(t, t+1). Value of θ is between 0 and 360. A histogram of the quantized values of θ is created. If the histogram is of 8 bins, the first bin represents all θs between 0 and 45. The histogram accumulates the lengths of the consecutive moves. For each θ, a specific histogram bin is determined. The length of the line between Pt and Pt+1 is then added to the specific histogram bin. To show the intuition behind the descriptor, consider the action of waving hands. At the end of the action, the hand falls down. When describing this down movement, the descriptor does not care about the position from which the hand started to fall. This fall will affect the histogram with the appropriate angles and lengths, regardless of the position where the hand started to fall. HOD records for each moving point: how much it moves in each range of directions. HOD has a clear physical interpretation. It proposes that, a simple way to describe the motion of an object, is to indicate how much distance it moves in each direction. If the movement in all directions are saved accurately, the movement can be repeated from the initial position to the final destination regardless of the displacements order. However, the temporal information will be lost, as the order of movements is not stored-this is what we solve by applying the temporal pyramid, as shown in section \ref{sec:temp-pyramid}. If the angles quantization range is small, classifiers that use the descriptor will overfit. Generalization needs some slack in directions-which can be done by increasing the quantization range.

    Read more →
  • Conditional random field

    Conditional random field

    Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. The kind of graph used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions. Other examples where CRFs are used are: labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech tagging, shallow parsing, named entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision. == Description == CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations X {\displaystyle {\boldsymbol {X}}} and random variables Y {\displaystyle {\boldsymbol {Y}}} as follows: Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph such that Y = ( Y v ) v ∈ V {\displaystyle {\boldsymbol {Y}}=({\boldsymbol {Y}}_{v})_{v\in V}} , so that Y {\displaystyle {\boldsymbol {Y}}} is indexed by the vertices of G {\displaystyle G} . Then ( X , Y ) {\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})} is a conditional random field when each random variable Y v {\displaystyle {\boldsymbol {Y}}_{v}} , conditioned on X {\displaystyle {\boldsymbol {X}}} , obeys the Markov property with respect to the graph; that is, its probability is dependent only on its neighbours in G and not its past states: P ( Y v | X , { Y w : w ≠ v } ) = P ( Y v | X , { Y w : w ∼ v } ) {\displaystyle P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\neq v\})=P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\sim v\})} , where w ∼ v {\displaystyle {\mathit {w}}\sim v} means that w {\displaystyle w} and v {\displaystyle v} are neighbors in G {\displaystyle G} . What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets X {\displaystyle {\boldsymbol {X}}} and Y {\displaystyle {\boldsymbol {Y}}} , the observed and output variables, respectively; the conditional distribution p ( Y | X ) {\displaystyle p({\boldsymbol {Y}}|{\boldsymbol {X}})} is then modeled. === Inference === For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible: If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs. If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions. If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include: Loopy belief propagation Alpha expansion Mean field inference Linear programming relaxations === Parameter learning === Learning the parameters θ {\displaystyle \theta } is usually done by maximum likelihood learning for p ( Y i | X i ; θ ) {\displaystyle p(Y_{i}|X_{i};\theta )} . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved for example using gradient descent algorithms, or Quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used. === Examples === In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X {\displaystyle X} represents a sequence of observations and Y {\displaystyle Y} represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y i {\displaystyle Y_{i}} are structured to form a chain, with an edge between each Y i − 1 {\displaystyle Y_{i-1}} and Y i {\displaystyle Y_{i}} . As well as having a simple interpretation of the Y i {\displaystyle Y_{i}} as "labels" for each element in the input sequence, this layout admits efficient algorithms for: model training, learning the conditional distributions between the Y i {\displaystyle Y_{i}} and feature functions from some corpus of training data. decoding, determining the probability of a given label sequence Y {\displaystyle Y} given X {\displaystyle X} . inference, determining the most likely label sequence Y {\displaystyle Y} given X {\displaystyle X} . The conditional dependency of each Y i {\displaystyle Y_{i}} on X {\displaystyle X} is defined through a fixed set of feature functions of the form f ( i , Y i − 1 , Y i , X ) {\displaystyle f(i,Y_{i-1},Y_{i},X)} , which can be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y i {\displaystyle Y_{i}} . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y i {\displaystyle Y_{i}} . Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence. Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X {\displaystyle X} at any point during inference, and the range of the feature functions need not have a probabilistic interpretation. == Variants == === Higher-order CRFs and semi-Markov CRFs === CRFs can be extended into higher order models by making each Y i {\displaystyle Y_{i}} dependent on a fixed number k {\displaystyle k} of previous variables Y i − k , . . . , Y i − 1 {\displaystyle Y_{i-k},...,Y_{i-1}} . In conventional formulations of higher order CRFs, training and inference are only practical for small values of k {\displaystyle k} (such as k ≤ 5), since their computational cost increases exponentially with k {\displaystyle k} . However, another recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely-long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely-long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model, without undermining its capability to capture and model temporal dependencies of arbitrary length. There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence Y {\displaystyle Y} . This provides much of the power of higher-order CRFs to model long-range dependencies of the Y i {\displaystyle Y_{i}} , at a reasonable computational cost. Finally, large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs. === Latent-dynamic conditional random field === Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , the main problem the model must solve is how to assign a sequence of labels y = y 1 , … , y n {\displaystyle y_{1},\dots ,y_{n}} from one finite set

    Read more →
  • Character computing

    Character computing

    Character computing is a trans-disciplinary field of research at the intersection of computer science and psychology. It is any computing that incorporates the human character within its context. Character is defined as all features or characteristics defining an individual and guiding their behavior in a specific situation. It consists of stable trait markers (e.g., personality, background, history, socio-economic embeddings, culture,...) and variable state markers (emotions, health, cognitive state, ...). Character computing aims at providing a holistic psychologically driven model of human behavior. It models and predicts behavior based on the relationships between a situation and character. Three main research modules fall under the umbrella of character computing: character sensing and profiling, character-aware adaptive systems, and artificial characters. == Overview == Character computing can be viewed as an extension of the well-established field of affective computing. Based on the foundations of the different psychology branches, it advocates defining behavior as a compound attribute that is not driven by either personality, emotions, situation or cognition alone. It rather defines behavior as a function of everything that makes up an individual i.e., their character and the situation they are in. Affective computing aims at allowing machines to understand and translate the non-verbal cues of individuals into affect. Accordingly, character computing aims at understanding the character attributes of an individual and the situation to translate it to predicted behavior, and vice versa. ''In practical terms, depending on the application context, character computing is a branch of research that deals with the design of systems and interfaces that can observe, sense, predict, adapt to, affect, understand, or simulate the following: character based on behavior and situation, behavior based on character and situation, or situation based on character and behavior.'' The Character-Behavior-Situation (CBS) triad is at the core of character computing and defines each of the three edges based on the other two. Character computing relies on simultaneous development from a computational and psychological perspective and is intended to be used by researchers in both fields. Its main concept is aligning the computational model of character computing with empirical results from in-lab and in-the-wild psychology experiments. The model is to be continuously built and validated through the emergence of new data. Similar to affective and personality computing, the model is to be used as a base for different applications towards improving user experience. == History == Character computing as such was first coined in its first workshop in 2017. Since then it has had 3 international workshops and numerous publications. Despite its young age, it has already drawn some interest in the research community, leading to the publication of the first book under the same title in early 2020 published by Springer Nature. Research that can be categorized under the field dates much older than 2017. The notion of combining several factors towards the explanation of behavior or traits and states has long been investigated in both Psychology and Computer Science, for example. == Character == The word character originates from the Greek word meaning “stamping tool”, referring to distinctive features and traits. Over the years it has been given many different connotations, like the moral character in philosophy, the temperament in psychology, a person in literature or an avatar in various virtual worlds, including video games. According to character computing character is a unification of all the previous definitions, by referring back to the original meaning of the word. Character is defined as the holistic concept representing all interacting trait and state markers that distinguish an individual. Traits are characteristics that mainly remain stable over time. Traits include personality, affect, socio-demographics, and general health. States are characteristics that vary in short periods of time. They include emotions, well-being, health, cognitive state. Each characteristic has many representation methods and psychological models. The different models can be combined or one model can be preset for each characteristic. This depends on the use-case and the design choices. == Areas == Research into character computing can be divided into three areas, which complement each other but can each be investigated separately. The first area is sensing and predicting character states and traits or ensuing behavior. The second area is adapting applications to certain character states or traits and the behavior they predict. It also deals with trying to change or monitor such behavior. The final area deals with creating artificial agents e.g., chatbots or virtual reality avatars that exhibit certain characteristics. The three areas are investigated separately and build on existing findings in the literature. The results of each of the three areas can also be used as a stepping stone for the next area. Each of the three areas has already been investigated on its own in different research fields with focus on different subsets of character. For example, affective computing and personality computing both cover different areas with a focus on some character components without the others to account for human behavior. == The Character-Behavior-Situation triad == Character computing is based on a holistic psychologically driven model of human behavior. Human behavior is modeled and predicted based on the relationships between a situation and a human's character. To further define character in a more formal or holistic manner, we represent it in light of the Character–Behavior–Situation triad. This highlights that character not only determines who we are but how we are, i.e., how we behave. The triad investigated in Personality Psychology is extended through character computing to the Character–Behavior–Situation triad. Any member of the CBS triad is a function of the two other members, e.g., given the situation and personality, the behavior can be predicted. Each of the components in the triad can be further decomposed into smaller units and features that may best represent the human's behavior or character in a particular situation. Character is thus behind a person's behavior in any given situation. While this is a causality relation, the correlation between the three components is often more easily used to predict the components that are most difficult to measure from those measured more easily. There are infinitely many components to include in the representation of any of C, B, and S. The challenge is always to choose the smallest subset needed for prediction of a person's behavior in a particular situation.

    Read more →
  • Tokken

    Tokken

    Tokken is a payment system and mobile app most known for being a legal and secure option for businesses transactions within the cannabis industry, because of its compliance with bank requirements. The startup company was created by Lamine Zarrad, a former regulator at the Office of the Comptroller of the Currency. == Operability == In order for a person to start using the app, they need to provide evidence, in the form of bioidentification data and mobile carrier records, that they can legally purchase weed. After they have been verified, customers can pay directly through the app at any dispensary that is using Tokken. Tokken turns credit card transactions into a digital token, which can be exchanged back for money that can later be deposited into a bank account. All transactions are logged publicly through a blockchain leger, making the process both anonymous and verified. === Banking services === Tokken has a "pay taxes" function which enables dispensaries to pay their taxes directly to the department.

    Read more →
  • Human–AI interaction

    Human–AI interaction

    Human–AI interaction is a developing field of research and a sub-field of human–computer interaction (HCI). HCI is a field of research that explores the interactions between humans and computer-based technology, focusing on design implementation, user experience, and psychological factors. With the proliferation of artificial intelligence (AI), there has developed a sub-section of HCI research dedicated specifically to artificial intelligence and how people interact with and are impacted by it. This is human–AI interaction, abbreviated either as HAX or HAII. == Introduction == Artificial intelligence (AI), in general, has fluid definitions and varied research applications, but in brief can be applied to mechanizing tasks that would require human intelligence to complete. AI are tools designed to replicate the human abilities of navigating uncertainty, active learning, and processing information in different contexts. Within the context of HCI and HAX research, artificial intelligence can be broken into two sub-fields, natural language processing (NLP) and computer vision (CV). AI technologies notably include machine-learning, deep-learning and neural networks, and large-language models (LLMs). As a new and rapidly developing technology, AI is changing how computers work and therefore changing how humans interact with computers. Unlike the traditional human-computer interaction, where a human directs a machine, human-AI interaction is characterized by a more collaborative relationship between the computer program (the AI) and the human user, as AI is perceived as an active agent rather than a tool. This changing dynamic creates new questions and necessitates new research methods that are not present in traditional HCI research. According to a scoping review on the state of the discipline, the HAX field comprises research on the "design, development, and evaluation of AI systems" and encompasses the themes of human-AI collaboration, human-AI competition, human-AI conflict, and human-AI symbiosis. == Design == Machine learning and artificial intelligence have been used for decades in targeted advertising and to recommend content in social media. Ethical Guidelines (Framework for ethical AI development) == User Experience (UX) == This section should handle research on how users interact with tools. What techniques do they use, do they develop habits, what types of programs and devices are they using to access these tools, what do they use these tools to do exactly. === Cognitive Frameworks in AI Tool Users === AI has been viewed with various expectations, attributions, and often misconceptions. Many people exclusively understand AI as the LLM chatbots they interact with, like ChatGPT or Claude, or other generative AI programs. [Insert section: discuss how people interact with these specific AI tools as a connection to the following paragraphs] Most fundamentally, humans have a mental model of understanding AI's reasoning and motivation for its decision recommendations, and building a holistic and precise mental model of AI helps people create prompts to receive more valuable responses from AI. However, these mental models are not whole because people can only gain more information about AI through their limited interaction with it; more interaction with AI builds a better mental model that a person may build to produce better prompt outcomes. Research on human-AI interaction has emphasized that users develop mental models of AI systems and revise those models through repeated use, feedback, and explanation, while design research has stressed the importance of communicating capabilities and limitations early and supporting trust calibration through explanation and correction. In a 2025 SSRN working paper, John DeVadoss proposed "Hypothetico-Deductive Interaction" (HDI), a framework that describes human-AI interaction as a mutual process of conjecture and refutation in which users test assumptions about an AI system's capabilities while the system infers and updates assumptions about user goals through its responses and clarifying questions. DeVadoss argued that this framing helps explain prompt iteration, weak capability awareness, and trust miscalibration, and suggested design responses such as clearer communication of uncertainty, easier correction, actionable explanations, and safer failure modes. == Research themes == === Human-AI collaboration === Human-AI collaboration occurs when the human and AI supervise the task on the same level and extent to achieve the same goal. Some collaboration occurs in the form of augmenting human capability. AI may help human ability in analysis and decision-making through providing and weighing a volume of information, and learning to defer to the human decision when it recognizes its unreliability. It is especially beneficial when the human can detect a task that AI can be trusted to make few errors so that there is not a lot of excessive checking process required on the human's end. Some findings show signs of human-AI augmentation, or human–AI symbiosis, in which AI enhances human ability in a way that co-working on a task with AI produces better outcomes than a human working alone. For example: the quality and speed of customer service tasks increase when a human agent collaborates with AI, training on specific models allows AI to improve diagnoses in clinical settings, and AI with human-intervention can improve creativity of artwork while fully AI-generated haikus were rated negatively. Human-AI synergy, a concept in which human-AI collaboration would produce more optimal outcomes than either human or AI working alone could explain why AI does not always help with performance. Some AI features and development may accelerate human-AI synergy, while others may stagnate it. For example, when AI updates for better performance, it sometimes worsens the team performance with human and AI by reducing the compatibility with the new model and the mental model a user has developed on the previous version. Research has found that AI often supports human capabilities in the form of human-AI augmentation and not human-AI synergy, potentially because people rely too much on AI and stop thinking on their own. Prompting people to actively engage in analysis and think when to follow AI recommendations reduces their over-reliance, especially for individuals with higher need for cognition. === Human-AI competition === Robots and computers have substituted routine tasks historically completed by humans, but agentic AI has made it possible to also replace cognitive tasks including taking phone calls for appointments and driving a car. At the point of 2016, research has estimated that 45% of paid activities could be replaced by AI by 2030. Perceived autonomy of robots is known to increase people's negative attitude toward them, and worry about the technology taking over leads people to reject it. There has been a consistent tendency of algorithm aversion in which people prefer human advice over AI advice. However, people are not always able to tell apart tasks completed by AI or other humans. See AI takeover for more information. It is also notable that this sentiment is more prominent in the Western cultures as Westerners tend to show less positive views about AI compared to East Asians. == Research on the psychological impacts of AI == === Perception on others who use AI === As much as people perceive and make judgment about AI itself, they also form impressions of themselves and others who use AI. In the workplace, employees who disclose the use of AI in their tasks are more likely to receive feedback that they are not as hardworking as those who are in the same job who receive non-AI help to complete the same tasks. AI use disclosure diminishes the perceived legitimacy in the employee's task and decision making which ultimately leads observers to distrust people who use AI. Although these negative effects of AI use disclosure are weakened by the observers who use AI frequently themselves, the effect is still not attenuated by the observers' positive attitude towards AI. === Bias, AI, and human === Although AI provides a wide range of information and suggestions to its users, AI itself is not free of biases and stereotypes, and it does not always help people reduce their cognitive errors and biases. People are prone to such errors by failing to see other potential ideas and cases that are not listed by AI responses and committing to a decision suggested by AI that directly contradicts the correct information and directions that they are already aware of. Gender bias is also reflected as the female gendering of AI technologies which conceptualizes females as a helpful assistant. == Emotional connection with AI == Human-AI interaction has been theorized in the context of interpersonal relationships mainly in social psychology, communications and media studies, and as a technology interface through the lens of hu

    Read more →
  • Data-driven model

    Data-driven model

    Data-driven models are a class of computational models that primarily rely on historical data collected throughout a system's or process' lifetime to establish relationships between input, internal, and output variables. Commonly found in numerous articles and publications, data-driven models have evolved from earlier statistical models, overcoming limitations posed by strict assumptions about probability distributions. These models have gained prominence across various fields, particularly in the era of big data, artificial intelligence, and machine learning, where they offer valuable insights and predictions based on the available data. == Background == These models have evolved from earlier statistical models, which were based on certain assumptions about probability distributions that often proved to be overly restrictive. The emergence of data-driven models in the 1950s and 1960s coincided with the development of digital computers, advancements in artificial intelligence research, and the introduction of new approaches in non-behavioural modelling, such as pattern recognition and automatic classification. == Key Concepts == Data-driven models encompass a wide range of techniques and methodologies that aim to intelligently process and analyse large datasets. Examples include fuzzy logic, fuzzy and rough sets for handling uncertainty, neural networks for approximating functions, global optimization and evolutionary computing, statistical learning theory, and Bayesian methods. These models have found applications in various fields, including economics, customer relations management, financial services, medicine, and the military, among others. Machine learning, a subfield of artificial intelligence, is closely related to data-driven modelling as it also focuses on using historical data to create models that can make predictions and identify patterns. In fact, many data-driven models incorporate machine learning techniques, such as regression, classification, and clustering algorithms, to process and analyse data. In recent years, the concept of data-driven models has gained considerable attention in the field of water resources, with numerous applications, academic courses, and scientific publications using the term as a generalization for models that rely on data rather than physics. This classification has been featured in various publications and has even spurred the development of hybrid models in the past decade. Hybrid models attempt to quantify the degree of physically based information used in hydrological models and determine whether the process of building the model is primarily driven by physics or purely data-based. As a result, data-driven models have become an essential topic of discussion and exploration within water resources management and research. The term "data-driven modelling" (DDM) refers to the overarching paradigm of using historical data in conjunction with advanced computational techniques, including machine learning and artificial intelligence, to create models that can reveal underlying trends, patterns, and, in some cases, make predictions Data-driven models can be built with or without detailed knowledge of the underlying processes governing the system behavior, which makes them particularly useful when such knowledge is missing or fragmented.

    Read more →
  • Audio inpainting

    Audio inpainting

    Audio inpainting (also known as audio interpolation) is an audio restoration task which deals with the reconstruction of missing or corrupted portions of a digital audio signal. Inpainting techniques are employed when parts of the audio have been lost due to various factors such as transmission errors, data corruption or errors during recording. The goal of audio inpainting is to fill in the gaps (i.e., the missing portions) in the audio signal seamlessly, making the reconstructed portions indistinguishable from the original content and avoiding the introduction of audible distortions or alterations. Many techniques have been proposed to solve the audio inpainting problem and this is usually achieved by analyzing the temporal and spectral information surrounding each missing portion of the considered audio signal. Classic methods employ statistical models or digital signal processing algorithms to predict and synthesize the missing or damaged sections. Recent solutions, instead, take advantage of deep learning models, thanks to the growing trend of exploiting data-driven methods in the context of audio restoration. Depending on the extent of the lost information, the inpainting task can be divided in three categories. Short inpainting refers to the reconstruction of few milliseconds (approximately less than 10) of missing signal, that occurs in the case of short distortions such as clicks or clipping. In this case, the goal of the reconstruction is to recover the lost information exactly. In long inpainting instead, with gaps in the order of hundreds of milliseconds or even seconds, this goal becomes unrealistic, since restoration techniques cannot rely on local information. Therefore, besides providing a coherent reconstruction, the algorithms need to generate new information that has to be semantically compatible with the surrounding context (i.e., the audio signal surrounding the gaps). The case of medium duration gaps lays between short and long inpainting. It refers to the reconstruction of tens of millisecond of missing data, a scale where the non-stationary characteristic of audio already becomes important. == Definition == Consider a digital audio signal x {\displaystyle \mathbf {x} } . A corrupted version of x {\displaystyle \mathbf {x} } , which is the audio signal presenting missing gaps to be reconstructed, can be defined as x ~ = m ∘ x {\displaystyle \mathbf {\tilde {x}} =\mathbf {m} \circ \mathbf {x} } , where m {\displaystyle \mathbf {m} } is a binary mask encoding the reliable or missing samples of x {\displaystyle \mathbf {x} } , and ∘ {\displaystyle \circ } represents the element-wise product. Audio inpainting aims at finding x ^ {\displaystyle \mathbf {\hat {x}} } (i.e., the reconstruction), which is an estimation of x {\displaystyle \mathbf {x} } . This is an ill-posed inverse problem, which is characterized by a non-unique set of solutions. For this reason, similarly to the formulation used for the inpainting problem in other domains, the reconstructed audio signal can be found through an optimization problem that is formally expressed as x ^ ∗ = argmin X ^ L ( m ∘ x ^ , x ~ ) + R ( x ^ ) {\displaystyle \mathbf {\hat {x}} ^{}={\underset {\hat {\mathbf {X} }}{\text{argmin}}}~L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )+R(\mathbf {\hat {x}} )} . In particular, x ^ ∗ {\displaystyle \mathbf {\hat {x}} ^{}} is the optimal reconstructed audio signal and L {\displaystyle L} is a distance measure term that computes the reconstruction accuracy between the corrupted audio signal and the estimated one. For example, this term can be expressed with a mean squared error or similar metrics. Since L {\displaystyle L} is computed only on the reliable frames, there are many solutions that can minimize L ( m ∘ x ^ , x ~ ) {\displaystyle L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )} . It is thus necessary to add a constraint to the minimization, in order to restrict the results only to the valid solutions. This is expressed through the regularization term R {\displaystyle R} that is computed on the reconstructed audio signal x ^ {\displaystyle \mathbf {\hat {x}} } . This term encodes some kind of a-priori information on the audio data. For example, R {\displaystyle R} can express assumptions on the stationarity of the signal, on the sparsity of its representation or can be learned from data. == Techniques == There exist various techniques to perform audio inpainting. These can vary significantly, influenced by factors such as the specific application requirements, the length of the gaps and the available data. In the literature, these techniques are broadly divided in model-based techniques (sometimes also referred as signal processing techniques) and data-driven techniques. === Model-based techniques === Model-based techniques involve the exploitation of mathematical models or assumptions about the underlying structure of the audio signal. These models can be based on prior knowledge of the audio content or statistical properties observed in the data. By leveraging these models, missing or corrupted portions of the audio signal can be inferred or estimated. An example of a model-based techniques are autoregressive models. These methods interpolate or extrapolate the missing samples based on the neighboring values, by using mathematical functions to approximate the missing data. In particular, in autoregressive models the missing samples are completed through linear prediction. The autoregressive coefficients necessary for this prediction are learned from the surrounding audio data, specifically from the data adjacent to each gap. Some more recent techniques approach audio inpainting by representing audio signals as sparse linear combinations of a limited number of basis functions (as for example in the Short Time Fourier Transform). In this context, the aim is to find the sparse representation of the missing section of the signal that most accurately matches the surrounding, unaffected signal. The aforementioned methods exhibit optimal performance when applied to filling in relatively short gaps, lasting only a few tens of milliseconds, and thus they can be included in the context of short inpainting. However, these signal-processing techniques tend to struggle when dealing with longer gaps. The reason behind this limitation lies in the violation of the stationarity condition, as the signal often undergoes significant changes after the gap, making it substantially different from the signal preceding the gap. As a way to overcome these limitations, some approaches add strong assumptions also about the fundamental structure of the gap itself, exploiting sinusoidal modeling or similarity graphs to perform inpainting of longer missing portions of audio signals. === Data-driven techniques === Data-driven techniques rely on the analysis and exploitation of the available audio data. These techniques often employ deep learning algorithms that learn patterns and relationships directly from the provided data. They involve training models on large datasets of audio examples, allowing them to capture the statistical regularities present in the audio signals. Once trained, these models can be used to generate missing portions of the audio signal based on the learned representations, without being restricted by stationarity assumptions. Data-driven techniques also offer the advantage of adaptability and flexibility, as they can learn from diverse audio datasets and potentially handle complex inpainting scenarios. As of today, such techniques constitute the state-of-the-art of audio inpainting, being able to reconstruct gaps of hundreds of milliseconds or even seconds. These performances are made possible by the use of generative models that have the capability to generate novel content to fill in the missing portions. For example, generative adversarial networks, which are the state-of-the-art of generative models in many areas, rely on two competing neural networks trained simultaneously in a two-player minmax game: the generator produces new data from samples of a random variable, the discriminator attempts to distinguish between generated and real data. During the training, the generator's objective is to fool the discriminator, while the discriminator attempts to learn to better classify real and fake data. In GAN-based inpainting methods the generator acts as a context encoder and produces a plausible completion for the gap only given the available information surrounding it. The discriminator is used to train the generator and tests the consistency of the produced inpainted audio. Recently, also diffusion models have established themselves as the state-of-the-art of generative models in many fields, often beating even GAN-based solutions. For this reason they have also been used to solve the audio inpainting problem, obtaining valid results. These models generate new data instances by inverting the

    Read more →
  • Intelligent database

    Intelligent database

    Until the 1980s, databases were viewed as computer systems that stored record-oriented and business data such as manufacturing inventories, bank records, and sales transactions. A database system was not expected to merge numeric data with text, images, or multimedia information, nor was it expected to automatically notice patterns in the data it stored. In the late 1980s the concept of an intelligent database was put forward as a system that manages information (rather than data) in a way that appears natural to users and which goes beyond simple record keeping. The term was introduced in 1989 by the book Intelligent Databases by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong. The concept postulated three levels of intelligence for such systems: high level tools, the user interface and the database engine. The high level tools manage data quality and automatically discover relevant patterns in the data with a process called data mining. This layer often relies on the use of artificial intelligence techniques. The user interface uses hypermedia in a form that uniformly manages text, images and numeric data. The intelligent database engine supports the other two layers, often merging relational database techniques with object orientation. In the twenty-first century, intelligent databases have now become widespread, e.g. hospital databases can now call up patient histories consisting of charts, text and x-ray images just with a few mouse clicks, and many corporate databases include decision support tools based on sales pattern analysis.

    Read more →
  • Hugging Face

    Hugging Face

    Hugging Face, Inc., is an American company based in New York City that develops computation tools for building applications using machine learning. Its transformers library built for natural language processing applications and its platform allow users to share machine learning models and datasets and showcase their work. == History == === Founding === The company was founded in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf in New York City, originally as a company that developed a chatbot app targeted at teenagers. The company was named after the U+1F917 🤗 HUGGING FACE emoji. After open sourcing the model behind the chatbot, the company pivoted to focus on being a platform for machine learning. === AI boom === On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model. In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large language model with 176 billion parameters. In February 2023, the company announced partnership with Amazon Web Services (AWS) which would allow Hugging Face's products to be available to AWS customers to use them as the building blocks for their custom applications. The company also said the next generation of BLOOM will be run on Trainium, a proprietary machine learning chip created by AWS. In June 2024, the company announced, along with Meta and Scaleway, their launch of a new AI accelerator program for European startups. The initiative aimed to help startups integrate open foundation models into their products, accelerating the EU AI ecosystem. The program, based at STATION F in Paris, ran from September 2024 to February 2025. Selected startups received mentoring, and access to AI models and tools and Scaleway's computing power. On September 23, 2024, to further the International Decade of Indigenous Languages, Hugging Face teamed up with Meta and UNESCO to launch a new online language translator. It was built on Meta's No Language Left Behind open-source AI model, enabling free text translation across 200 languages, including many low-resource languages. In April 2025, Hugging Face announced that they acquired a humanoid robotics startup, Pollen Robotics, based in France and founded by Matthieu Lapeyre and Pierre Rouanet in 2016. In an X tweet, Delangue shared his vision to "make Artificial Intelligence robotics Open Source". === Cyberattacks === In early 2026, hackers hijacked the Hugging Face platform to launch Android-targeted attacks involving "powerful malware" which could completely take over a compromised target.

    Read more →
  • AlphaChip (controversy)

    AlphaChip (controversy)

    The AlphaChip controversy refers to a series of public, scholarly, and legal disputes surrounding a 2021 Nature paper by Google-affiliated researchers. The paper describes an approach to macro placement, a stage of chip floorplanning, based on reinforcement learning (RL), a machine learning method in which a system iteratively improves its decisions by optimizing performance-based reward signals. The primary technical question is whether the new techniques are better than existing (non-AI) techniques. Both internal Google studies and external attempts to replicate the algorithm have failed to show the claimed benefits. No head-to-head comparison is available because the data used in the paper is proprietary, and Google has not released any results from running its algorithm on public benchmarks. This has resulted in considerable skepticism over the paper's claims. In addition, the inability of others (both inside and outside of Google) to replicate the claimed results have sparked concerns about the paper’s methodology, reproducibility, and scientific integrity. The lead researchers of the Nature paper were affiliated with Google Brain, which became part of Google DeepMind, and later spun off into the company Ricursive. == Motivation for research: Macro placement in chip layout == Chip design for modern integrated circuits is a complex, expert-driven process that relies on electronic design automation. It determines the performance of the final chip, and takes weeks or months to complete. Advances that produce better designs, or complete the process faster, are commercially and academically significant. Macro placement is a step during chip design that determines the locations of large circuit components (macros) within a chip. It is followed by detailed placement, which places the far more numerous but much smaller standard cells. Alternatively, mixed-size placement simultaneously places both large macros and millions of small cells, requiring algorithms to handle objects that differ by several orders of magnitude in area and mobility. The number of macros per circuit typically ranges from several to thousands. Wiring must be performed after placement, and the details of this wiring strongly influence the power, performance, and area (PPA) of the completed chip. The full wiring calculation is very resource intensive, so placement tools typically use a proxy cost, a simplified objective function used to guide the placement algorithm during training and evaluation. The faithfulness of the chosen proxy cost to the final objective cost is a critical aspect of placer performance. === State of the art as of 2021 === Chips have been designed since the 1960s, so there were many existing methods as of 2021. Available options included manual design, academic tools, and commercial offerings. Academic methods include combinatorial optimization techniques such as simulated annealing, analytical placement, hierarchical heuristics, and as of 2019 reinforcement learning and broader machine learning techniques.. Existing (non-AI) academic tools for solving the same problem include APlace, NTUplace3, ePlace, RePlace, and DREAMPlace. Commercial EDA vendors also offered automated software tools for floorplanning and mixed-size placement. For instance, as of 2019 Cadence’s Innovus implementation software offered a Concurrent Macro Placer (CMP) feature to automatically place large blocks and standard cells. == The 2021 Nature paper and its claims == In 2021, Nature published a paper under the title “A graph‑placement methodology for fast chip design” co‑authored by 21 Google-affiliated researchers. The paper reported that an RL agent could generate macro placements for integrated circuits "in under six hours" and achieve improvements over human-designed layouts in power, timing performance, and area (PPA), standard chip-quality metrics referring respectively to energy consumption, chip operating speed, and silicon footprint (evaluated after wire routing). It introduced a sequential macro placement algorithm in which macros are placed one at a time instead of optimizing their locations concurrently. At each step, the algorithm selects a location for a single macro on a discretized chip canvas, conditioning its decision on the placements of previously placed macros. This sequential formulation converts macro placement into a long-horizon decision process in which early placement choices constrain later ones. After macro placement, force-directed placement is applied to place standard cells connected to the macros. Deep reinforcement learning is used to train a policy network to place macros by maximizing a reward that reflects final placement quality (for example, wirelength and congestion). Policy learning occurs during self‑play for one or multiple circuit designs. Further placement optimizations refine the overall layout by balancing wirelength, density, and overlap constraints, while treating the macro locations produced by the RL policy as fixed obstacles. The approach relies on pre-training, in which the RL model is first trained on a corpus of prior designs (twenty in the Nature paper) to learn general placement patterns before being fine-tuned on a specific chip. Circuit examples used in the study were parts of proprietary Google TPU designs, called blocks (or floorplan partitions). The paper reported results on five blocks and described the approach as generalizable across chip designs. == Controversy == Soon after the paper's publication, controversy arose over whether the claims were true, whether they were sufficiently proven, and whether academic standards were followed. These controversies arose both within Google and among external academic experts. === Internal dispute at Google and legal proceedings === In 2022, Satrajit Chatterjee, a Google engineer involved in reviewing the AlphaChip work, raised concerns internally and drafted an alternative analysis, (Stronger Baselines) arguing that established methods outperformed the RL approach under fair comparison. In March 2022, Google declined to publish this analysis and terminated Chatterjee's employment. Chatterjee filed a wrongful dismissal lawsuit, alleging that representations related to the AlphaChip research involved fraud and scientific misconduct. According to court documents, Chatterjee's study was conducted "in the context of a large potential Google Cloud deal". He noted that it "would have been unethical to imply that we had revolutionary technology when our tests showed otherwise" and claimed Google was deliberately withholding material information. Furthermore, the committee that reviewed his paper and disapproved its publication was allegedly chaired by subordinates of Jeff Dean, a senior co-author of the Nature paper. Google’s subsequent motion to dismiss was denied, holding that Chatterjee had plausibly alleged retaliation for refusing to engage in conduct he believed would violate state or federal law. === External controversy === The external questions can be summarized in four main points: (a) Are the claims supported by the evidence provided? (b) Did the paper provide enough information to allow the results to be independently reproduced and verified? If so, are the results an improvement over existing academic and commercial tools? (c) Were the comparisons in the paper done fairly and with full disclosure? (d) Were academic standards followed? Each of these is discussed below. ==== Are the claims supported by the evidence provided? ==== The Nature paper described the reduction in design-process time as going from "days or weeks" to "hours", but did not provide per-design time breakdowns or specify the number of engineers, their level of expertise, or the baseline tools and workflow against which this comparison was made. It was also unclear whether the "days or weeks" baseline included time spent on other tasks such as functional design changes. The paper also evaluated the method on fewer benchmarks (five) than is common in the field, and showed mixed results across different evaluation goals While the approach was described as improving circuit area, this claim seems unsupported, as the RL optimization did not alter the overall circuit area, as it adjusted only the locations of fixed-shape non-overlapping circuit components within a fixed rectangular layout boundary. ==== Comparison with existing methods, and replicating the algorithm ==== Because macro placement is largely geometric and its fundamental algorithms are not tied to a specific process node, competing approaches can be evaluated on public benchmarks (tests) across technologies, rather than primarily on proprietary internal designs. This is standard procedure when comparing academic placers, see . In contrast, Google has only reported results only on internal proprietary designs, and as of 2026 has not offered comparisons with prior methods on common benchmarks. Researchers at the University of Califor

    Read more →
  • Sparkles emoji

    Sparkles emoji

    The Sparkles emoji (U+2728 ✨ SPARKLES) is an emoji that has one large star surrounded by smaller stars. Originating from Japan to represent sparkles used in anime and manga, the sparkles are often used as emphasis in text by surrounding words or phrases with it. It is the third most-used emoji in the world on Twitter as of 2021. Since the early 2020s it has been used by major software companies to represent artificial intelligence, marketing the technology as "like magic". == Development == According to Emojipedia, the Sparkles emoji was first used by Japanese mobile operators SoftBank, Docomo and au in the late 1990s. The emoji was added to Unicode 6.0 in 2010 and Emoji 1.0 in 2015. On some platforms the Sparkles emoji has been multicoloured whilst on other platforms it has been one colour. Twitter and Microsoft's Sparkles have changed from being multicoloured to being a single colour. Samsung's version of the emoji previously had a night sky in the background. == Usage == === Interpersonal communication === The Sparkles emoji was originally meant to represent the usage of sparkles in Japanese anime and manga, where the sparkles are used to represent beauty, happiness or awe. The emoji has several meanings and depends upon context. Starting in the late 2010s, the emoji started being used to surround words or phrases to be used as emphasis, an example from the book Because Internet being "I would simply ✨pass away✨". It can also be used as sarcasm, irony or as a way to mock people. Without emoji this could be represented with tildes or asterisks, for example, "~tildes~" or "~asterisk plus tilde~" or "~~true sparkle exuberance~~". The sparkles emoji can be used to represent stars in text, be used to represent cleanliness or can be used to mean "orgasm" whilst sexting. In September 2021 the Sparkles emoji overtook the Pleading Face (🥺) emoji to become the third most-used emoji in the world according to Emojipedia, with approximately 1 per cent of all tweets containing the Sparkles emoji. === Artificial intelligence === In the early 2020s, the Sparkles emoji started being used as an icon to represent artificial intelligence (AI). Companies who use the emoji this way include Google, OpenAI, Samsung, Microsoft, Adobe, Spotify and Zoom. As of August 2024, seven of the top 10 software companies by market capitalisation use the Sparkles emojis with AI. OpenAI has different versions of the Sparkles for different versions of the models that ChatGPT uses. One explanation is that Sparkles is being used by these companies as a way to market AI as "magic". Marketing technology as "magic" has been used before AI, particularly by Apple. Another explanation given by designers and marketers choosing to use Sparkles to signify AI is simply that other platforms are doing it, making it familiar to users. Around 2024, some of these companies started removing two of the smaller stars from the emoji in their AI services and have kept the one large star, an example being Google's Gemini chatbot. In early 2024, the Nielsen Norman Group provided test subjects with the star in isolation and found that people did not associate the symbol with AI, but instead mostly with "optimisation" or "favourite or save an item".

    Read more →
  • AsoSoft text corpus

    AsoSoft text corpus

    The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.

    Read more →
  • Machine learning

    Machine learning

    Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without being explicitly programmed. Advances in the field of deep learning have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance. Statistics and mathematical optimisation methods compose the foundations of machine learning. Data mining is a related field of study, focusing on exploratory data analysis (EDA) through unsupervised learning. From a theoretical viewpoint, probably approximately correct learning provides a mathematical and statistical framework for describing machine learning. Most traditional machine learning and deep learning algorithms can be described as empirical risk minimisation under this framework. == History == The term machine learning was coined in 1959 by Arthur Samuel, an IBM employee and pioneer in the field of computer gaming and artificial intelligence. The synonym self-teaching computers was also used during this time period. The earliest machine learning program was introduced in the 1950s, when Samuel invented a computer program that calculated the chance of winning in checkers for each side, but the history of machine learning is rooted in decades of efforts to study human cognitive processes. In 1949, Canadian psychologist Donald Hebb published the book The Organization of Behavior, in which he introduced a theoretical neural structure formed by certain interactions among nerve cells. The Hebbian theory of neuron interaction set the groundwork for how many machine learning algorithms work, with connected artificial neurons changing the strength of their connections based on data. Other researchers who have studied human cognitive systems contributed to the modern machine learning technologies as well, including Walter Pitts and Warren McCulloch, who proposed the first mathematical model of neural networks including algorithms that mirror human thought processes. By the early 1960s, an experimental "learning machine" with punched tape memory, called Cybertron, had been developed by Raytheon Company to analyse sonar signals, electrocardiograms, and speech patterns using rudimentary reinforcement learning. It was repetitively "trained" by a human operator/teacher to recognise patterns and equipped with a "goof" button to cause it to reevaluate incorrect decisions. A representative book on research into machine learning during the 1960s was Nils Nilsson's book "Learning Machines", dealing mostly with machine learning for pattern classification. Interest related to pattern recognition continued into the 1970s, as described by Duda and Hart in 1973. In 1981, a report was given on using teaching strategies so that an artificial neural network learns to recognise 40 characters (26 letters, 10 digits, and 4 special symbols) from a computer terminal. Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." This definition of the tasks in which machine learning is concerned is fundamentally operational rather than defining the field in cognitive terms. This follows Alan Turing's proposal in his paper "Computing Machinery and Intelligence", in which the question, "Can machines think?", is replaced by asking whether machines can convincingly imitate a human in its responses to human-posed questions. In 2014 Ian Goodfellow and others introduced generative adversarial networks (GANs) which could produce realistic synthetic data. By 2016 AlphaGo had won against top human players in Go using reinforcement learning techniques. == Relationships to other fields == === Artificial intelligence === As a scientific endeavour, machine learning grew out of the quest for artificial intelligence (AI). In the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalised linear models of statistics. Probabilistic reasoning was also employed, especially in automated medical diagnosis. However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation. By 1980, expert systems had come to dominate AI, and statistics was out of favour. Work on symbolic/knowledge-based learning continued within AI, leading to inductive logic programming (ILP), but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval. Neural network research was abandoned by AI and computer science around the same time. This subfield, termed "connectionism", was continued by researchers from other disciplines, including John Hopfield, David Rumelhart, and Geoffrey Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation. Machine learning (ML), reorganised and recognised as its own field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics, fuzzy logic, and probability theory. === Data compression === === Data mining === Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction based on known properties learned from the training data, data mining focuses on the discovery of previously unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data. Machine learning also has intimate ties to optimization: Many learning problems are formulated as minimisation of some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the preassigned labels of a set of examples). === Generalization === Characterizing the generalisation of various learning algorithms is an active topic of current research, especially for deep learning algorithms. === Statistics === Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalisable predictive patterns. Conventional statistical analyses require the a priori selection of a model most suitable for the study data set. In addition, only significant or theoretically relevant variables based on previous experience are included for analysis. In contrast, machine learning is not built on a pre-structured model; rather, the data shape the model by detecting underlying patterns. The more variables (input) used to train the model, the more accurate the ultimate model will be. Leo Breiman distinguished two statistical modelling paradigms: the data model and the algorithmic model, wherein "algorithmic model" means more or less the machine learning algorithms like Random forest. Some statisticians have adopted methods from machine learning, producing the field of statistical learning. === Statistical physics === Analytical and computational techniques derived from deep-rooted physics of disordered systems can be extended to large-scale problems, including machine learning, e.g., to analyse the weight space of deep neural networks. Statistical physics is thus

    Read more →
  • Embedding (machine learning)

    Embedding (machine learning)

    In machine learning, embedding is a representation learning technique that maps complex, high-dimensional data into a lower-dimensional vector space of numerical vectors. == Technique == It also denotes the resulting representation, where meaningful patterns or relationships are preserved. As a technique, it learns these vectors from data like words, images, or user interactions, differing from manually designed methods such as one-hot encoding. This process reduces complexity and captures key features without needing prior knowledge of the domain. == Similarity == In natural language processing, words or concepts may be represented as feature vectors, where similar concepts are mapped to nearby vectors. The resulting embeddings vary by type, including word embeddings for text (e.g., Word2Vec), image embeddings for visual data, and knowledge graph embeddings for knowledge graphs, each tailored to tasks like NLP, computer vision, or recommendation systems. This dual role enhances model efficiency and accuracy by automating feature extraction and revealing latent similarities across diverse applications. To measure the distance between two embeddings, a similarity measure can be used to find the overall similarity of the concepts represented by the embeddings. If the vectors are normalized to have a magnitude of 1, then the similarity measures are proportional to cos ⁡ ( θ a b ) {\displaystyle \cos \left(\theta _{ab}\right)} . The cosine similarity disregards the magnitude of the vector when determining similarity, so it is less biased towards training data that appears very frequently. The dot product includes the magnitude inherently, so it will tend to value more popular data. Generally, for high-dimensional vector spaces, vectors tend to converge in distance, so Euclidean distance becomes less reliable for large embedding vectors.

    Read more →