Sentence embedding

Sentence embedding

In natural language processing, a sentence embedding is a representation of a sentence as a vector of numbers which encodes meaningful semantic information. State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset. Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions; this has been shown to achieve worse performance than approaches such as InferSent or SBERT. An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks. == Applications == In recent years, sentence embedding has seen a growing level of interest due to its applications in natural language queryable knowledge bases through the usage of vector indexing for semantic search. LangChain for instance utilizes sentence transformers for purposes of indexing documents. In particular, an indexing is generated by generating embeddings for chunks of documents and storing (document chunk, embedding) tuples. Then given a query in natural language, the embedding for the query can be generated. A top k similarity search algorithm is then used between the query embedding and the document chunk embeddings to retrieve the most relevant document chunks as context information for question answering tasks. This approach is also known formally as retrieval-augmented generation. Though not as predominant as BERTScore, sentence embeddings are commonly used for sentence similarity evaluation which sees common use for the task of optimizing a Large language model's generation parameters is often performed via comparing candidate sentences against reference sentences. By using the cosine-similarity of the sentence embeddings of candidate and reference sentences as the evaluation function, a grid-search algorithm can be utilized to automate hyperparameter optimization. == Evaluation == A way of testing sentence encodings is to apply them on Sentences Involving Compositional Knowledge (SICK) corpus for both entailment (SICK-E) and relatedness (SICK-R). In the best results are obtained using a BiLSTM network trained on the Stanford Natural Language Inference (SNLI) Corpus. The Pearson correlation coefficient for SICK-R is 0.885 and the result for SICK-E is 86.3. A slight improvement over previous scores is presented in: SICK-R: 0.888 and SICK-E: 87.8 using a concatenation of bidirectional Gated recurrent unit.

Transfer learning

Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing or transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency. Since transfer learning makes use of training with multiple objective functions it is related to cost-sensitive machine learning and multi-objective optimization. == History == In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training. The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning. In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm. By 1998, the field had advanced to include multi-task learning, along with more formal theoretical foundations. Influential publications on transfer learning include the book Learning to Learn in 1998, a 2009 survey and a 2019 survey. Ng said in his NIPS 2016 tutorial that TL would become the next driver of machine learning commercial success after supervised learning. In the 2020 paper, "Rethinking Pre-Training and self-training", Zoph et al. reported that pre-training can hurt accuracy, and advocate self-training instead. == Definition == The definition of transfer learning is given in terms of domains and tasks. A domain D {\displaystyle {\mathcal {D}}} consists of: a feature space X {\displaystyle {\mathcal {X}}} and a marginal probability distribution P ( X ) {\displaystyle P(X)} , where X = { x 1 , . . . , x n } ∈ X {\displaystyle X=\{x_{1},...,x_{n}\}\in {\mathcal {X}}} . Given a specific domain, D = { X , P ( X ) } {\displaystyle {\mathcal {D}}=\{{\mathcal {X}},P(X)\}} , a task consists of two components: a label space Y {\displaystyle {\mathcal {Y}}} and an objective predictive function f : X → Y {\displaystyle f:{\mathcal {X}}\rightarrow {\mathcal {Y}}} . The function f {\displaystyle f} is used to predict the corresponding label f ( x ) {\displaystyle f(x)} of a new instance x {\displaystyle x} . This task, denoted by T = { Y , f ( x ) } {\displaystyle {\mathcal {T}}=\{{\mathcal {Y}},f(x)\}} , is learned from the training data consisting of pairs { x i , y i } {\displaystyle \{x_{i},y_{i}\}} , where x i ∈ X {\displaystyle x_{i}\in {\mathcal {X}}} and y i ∈ Y {\displaystyle y_{i}\in {\mathcal {Y}}} . Given a source domain D S {\displaystyle {\mathcal {D}}_{S}} and learning task T S {\displaystyle {\mathcal {T}}_{S}} , a target domain D T {\displaystyle {\mathcal {D}}_{T}} and learning task T T {\displaystyle {\mathcal {T}}_{T}} , where D S ≠ D T {\displaystyle {\mathcal {D}}_{S}\neq {\mathcal {D}}_{T}} , or T S ≠ T T {\displaystyle {\mathcal {T}}_{S}\neq {\mathcal {T}}_{T}} , transfer learning aims to help improve the learning of the target predictive function f T ( ⋅ ) {\displaystyle f_{T}(\cdot )} in D T {\displaystyle {\mathcal {D}}_{T}} using the knowledge in D S {\displaystyle {\mathcal {D}}_{S}} and T S {\displaystyle {\mathcal {T}}_{S}} . == Applications == Algorithms for transfer learning are available in Markov logic networks and Bayesian networks. Transfer learning has been applied to cancer subtype discovery, building utilization, general game playing, text classification, digit recognition, medical imaging and spam filtering. In 2020, it was discovered that, due to their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves, from the gesture recognition domain to the mental state recognition domain. It was noted that this relationship worked in both directions, showing that electroencephalographic can likewise be used to classify EMG. The experiments noted that the accuracy of neural networks and convolutional neural networks were improved through transfer learning both prior to any learning (compared to standard random weight distribution) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of fully-connected layers to improve performance.

Batch normalization

In artificial neural networks, batch normalization (also known as batch norm) is a normalization technique used to make training faster and more stable by adjusting the inputs to each layer—re-centering them around zero and re-scaling them to a standard size. It was introduced by Sergey Ioffe and Christian Szegedy in 2015. Experts still debate why batch normalization works so well. It was initially thought to tackle internal covariate shift, a problem where parameter initialization and changes in the distribution of the inputs of each layer affect the learning rate of the network. However, newer research suggests it doesn’t fix this shift but instead smooths the objective function—a mathematical guide the network follows to improve—enhancing performance. In very deep networks, batch normalization can initially cause a severe gradient explosion—where updates to the network grow uncontrollably large—but this is managed with shortcuts called skip connections in residual networks. Another theory is that batch normalization adjusts data by handling its size and path separately, speeding up training. == Internal covariate shift == Each layer in a neural network has inputs that follow a specific distribution, which shifts during training due to two main factors: the random starting values of the network’s settings (parameter initialization) and the natural variation in the input data. This shifting pattern affecting the inputs to the network’s inner layers is called internal covariate shift. While a strict definition isn’t fully agreed upon, experiments show that it involves changes in the means and variances of these inputs during training. Batch normalization was first developed to address internal covariate shift. During training, as the parameters of preceding layers adjust, the distribution of inputs to the current layer changes accordingly, such that the current layer needs to constantly readjust to new distributions. This issue is particularly severe in deep networks, because small changes in shallower hidden layers will be amplified as they propagate within the network, resulting in significant shift in deeper hidden layers. Batch normalization was proposed to reduced these unwanted shifts to speed up training and produce more reliable models. Beyond possibly tackling internal covariate shift, batch normalization offers several additional advantages. It allows the network to use a higher learning rate—a setting that controls how quickly the network learns—without causing problems like vanishing or exploding gradients, where updates become too small or too large. It also appears to have a regularizing effect, improving the network’s ability to generalize to new data, reducing the need for dropout, a technique used to prevent overfitting (when a model learns the training data too well and fails on new data). Additionally, networks using batch normalization are less sensitive to the choice of starting settings or learning rates, making them more robust and adaptable. == Procedures == === Transformation === In a neural network, batch normalization is achieved through a normalization step that fixes the means and variances of each layer's inputs. Ideally, the normalization would be conducted over the entire training set, but to use this step jointly with stochastic optimization methods, it is impractical to use the global information. Thus, normalization is restrained to each mini-batch in the training process. Let us use B to denote a mini-batch of size m of the entire training set. The empirical mean and variance of B could thus be denoted as μ B = 1 m ∑ i = 1 m x i {\displaystyle \mu _{B}={\frac {1}{m}}\sum _{i=1}^{m}x_{i}} and σ B 2 = 1 m ∑ i = 1 m ( x i − μ B ) 2 {\displaystyle \sigma _{B}^{2}={\frac {1}{m}}\sum _{i=1}^{m}(x_{i}-\mu _{B})^{2}} . For a layer of the network with d-dimensional input, x = ( x ( 1 ) , . . . , x ( d ) ) {\displaystyle x=(x^{(1)},...,x^{(d)})} , each dimension of its input is then normalized (i.e. re-centered and re-scaled) separately, x ^ i ( k ) = x i ( k ) − μ B ( k ) ( σ B ( k ) ) 2 + ϵ {\displaystyle {\hat {x}}_{i}^{(k)}={\frac {x_{i}^{(k)}-\mu _{B}^{(k)}}{\sqrt {\left(\sigma _{B}^{(k)}\right)^{2}+\epsilon }}}} , where k ∈ [ 1 , d ] {\displaystyle k\in [1,d]} and i ∈ [ 1 , m ] {\displaystyle i\in [1,m]} ; μ B ( k ) {\displaystyle \mu _{B}^{(k)}} and σ B ( k ) {\displaystyle \sigma _{B}^{(k)}} are the per-dimension mean and standard deviation, respectively. ϵ {\displaystyle \epsilon } is added in the denominator for numerical stability and is an arbitrarily small positive constant. The resulting normalized activation x ^ ( k ) {\displaystyle {\hat {x}}^{(k)}} have zero mean and unit variance, if ϵ {\displaystyle \epsilon } is not taken into account. To restore the representation power of the network, a transformation step then follows as y i ( k ) = γ ( k ) x ^ i ( k ) + β ( k ) {\displaystyle y_{i}^{(k)}=\gamma ^{(k)}{\hat {x}}_{i}^{(k)}+\beta ^{(k)}} , where the parameters γ ( k ) {\displaystyle \gamma ^{(k)}} and β ( k ) {\displaystyle \beta ^{(k)}} are subsequently learned in the optimization process. Formally, the operation that implements batch normalization is a transform B N γ ( k ) , β ( k ) : x 1... m ( k ) → y 1... m ( k ) {\displaystyle BN_{\gamma ^{(k)},\beta ^{(k)}}:x_{1...m}^{(k)}\rightarrow y_{1...m}^{(k)}} called the Batch Normalizing transform. The output of the BN transform y ( k ) = B N γ ( k ) , β ( k ) ( x ( k ) ) {\displaystyle y^{(k)}=BN_{\gamma ^{(k)},\beta ^{(k)}}(x^{(k)})} is then passed to other network layers, while the normalized output x ^ i ( k ) {\displaystyle {\hat {x}}_{i}^{(k)}} remains internal to the current layer. === Backpropagation === The described BN transform is a differentiable operation, and the gradient of the loss l {\displaystyle l} with respect to the different parameters can be computed directly with the chain rule. Specifically, ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial y_{i}^{(k)}}}} depends on the choice of activation function, and the gradient against other parameters could be expressed as a function of ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial y_{i}^{(k)}}}} : ∂ l ∂ x ^ i ( k ) = ∂ l ∂ y i ( k ) γ ( k ) {\displaystyle {\frac {\partial l}{\partial {\hat {x}}_{i}^{(k)}}}={\frac {\partial l}{\partial y_{i}^{(k)}}}\gamma ^{(k)}} , ∂ l ∂ γ ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) x ^ i ( k ) {\displaystyle {\frac {\partial l}{\partial \gamma ^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}{\hat {x}}_{i}^{(k)}} , ∂ l ∂ β ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial \beta ^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}} , ∂ l ∂ σ B ( k ) 2 = ∑ i = 1 m ∂ l ∂ y i ( k ) ( x i ( k ) − μ B ( k ) ) ( − γ ( k ) 2 ( σ B ( k ) 2 + ϵ ) − 3 / 2 ) {\displaystyle {\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}(x_{i}^{(k)}-\mu _{B}^{(k)})\left(-{\frac {\gamma ^{(k)}}{2}}(\sigma _{B}^{(k)^{2}}+\epsilon )^{-3/2}\right)} , ∂ l ∂ μ B ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) − γ ( k ) σ B ( k ) 2 + ϵ + ∂ l ∂ σ B ( k ) 2 1 m ∑ i = 1 m ( − 2 ) ⋅ ( x i ( k ) − μ B ( k ) ) {\displaystyle {\frac {\partial l}{\partial \mu _{B}^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}{\frac {-\gamma ^{(k)}}{\sqrt {\sigma _{B}^{(k)^{2}}+\epsilon }}}+{\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}{\frac {1}{m}}\sum _{i=1}^{m}(-2)\cdot (x_{i}^{(k)}-\mu _{B}^{(k)})} , and ∂ l ∂ x i ( k ) = ∂ l ∂ x ^ i ( k ) 1 σ B ( k ) 2 + ϵ + ∂ l ∂ σ B ( k ) 2 2 ( x i ( k ) − μ B ( k ) ) m + ∂ l ∂ μ B ( k ) 1 m {\displaystyle {\frac {\partial l}{\partial x_{i}^{(k)}}}={\frac {\partial l}{\partial {\hat {x}}_{i}^{(k)}}}{\frac {1}{\sqrt {\sigma _{B}^{(k)^{2}}+\epsilon }}}+{\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}{\frac {2(x_{i}^{(k)}-\mu _{B}^{(k)})}{m}}+{\frac {\partial l}{\partial \mu _{B}^{(k)}}}{\frac {1}{m}}} . === Inference === During the training stage, the normalization steps depend on the mini-batches to ensure efficient and reliable training. However, in the inference stage, this dependence is not useful any more. Instead, the normalization step in this stage is computed with the population statistics such that the output could depend on the input in a deterministic manner. The population mean, E [ x ( k ) ] {\displaystyle E[x^{(k)}]} , and variance, Var ⁡ [ x ( k ) ] {\displaystyle \operatorname {Var} [x^{(k)}]} , are computed as: E [ x ( k ) ] = E B [ μ B ( k ) ] {\displaystyle E[x^{(k)}]=E_{B}[\mu _{B}^{(k)}]} , and Var ⁡ [ x ( k ) ] = m m − 1 E B [ ( σ B ( k ) ) 2 ] {\displaystyle \operatorname {Var} [x^{(k)}]={\frac {m}{m-1}}E_{B}[\left(\sigma _{B}^{(k)}\right)^{2}]} . The population statistics thus is a complete representation of the mini-batches. The BN transform in the inference step thus becomes y ( k ) = B N γ ( k ) , β ( k ) inf ( x ( k ) ) = γ ( k ) x ( k ) − E [ x ( k ) ] Var ⁡ [ x ( k ) ] + ϵ + β

Digital organism

A digital organism is a self-replicating computer program that mutates and evolves. Digital organisms are used as a tool to study the dynamics of Darwinian evolution, and to test or verify specific hypotheses or mathematical models of evolution. The study of digital organisms is closely related to the area of artificial life. == History == Digital organisms can be traced back to the game Darwin, developed in 1961 at Bell Labs, in which computer programs had to compete with each other by trying to stop others from executing . A similar implementation that followed this was the game Core War. In Core War, it turned out that one of the winning strategies was to replicate as fast as possible, which deprived the opponent of all computational resources. Programs in the Core War game were also able to mutate themselves and each other by overwriting instructions in the simulated "memory" in which the game took place. This allowed competing programs to embed damaging instructions in each other that caused errors (terminating the process that read it), "enslaved processes" (making an enemy program work for you), or even change strategies mid-game and heal themselves. Steen Rasmussen at Los Alamos National Laboratory took the idea from Core War one step further in his core world system by introducing a genetic algorithm that automatically wrote programs. However, Rasmussen did not observe the evolution of complex and stable programs. It turned out that the programming language in which core world programs were written was very brittle, and more often than not mutations would completely destroy the functionality of a program. The first to solve the issue of program brittleness was Thomas S. Ray with his Tierra system, which was similar to core world. Ray made some key changes to the programming language such that mutations were much less likely to destroy a program. With these modifications, he observed for the first time computer programs that did indeed evolve in a meaningful and complex way. Later, Chris Adami, Titus Brown, and Charles Ofria started developing their Avida system, which was inspired by Tierra but again had some crucial differences. In Tierra, all programs lived in the same address space and could potentially execute or otherwise interfere with each other's code. In Avida, on the other hand, each program lives in its own address space. Because of this modification, experiments with Avida became much cleaner and easier to interpret than those with Tierra. With Avida, digital organism research has begun to be accepted as a valid contribution to evolutionary biology by a growing number of evolutionary biologists. Evolutionary biologist Richard Lenski of Michigan State University has used Avida extensively in his work. Lenski, Adami, and their colleagues have published in journals such as Nature and the Proceedings of the National Academy of Sciences (USA). In 1996, Andy Pargellis created a Tierra-like system called Amoeba that evolved self-replication from a randomly seeded initial condition. More recently REvoSim - a software package based around binary digital organisms - has allowed evolutionary simulations of large populations that can be run for geological timescales.

Structure mapping engine

In artificial intelligence and cognitive science, the structure mapping engine (SME) is an implementation in software of an algorithm for analogical matching based on the psychological theory of Dedre Gentner. The basis of Gentner's structure-mapping idea is that an analogy is a mapping of knowledge from one domain (the base) into another (the target). The structure-mapping engine is a computer simulation of the analogy and similarity comparisons. The theory is useful because it ignores surface features and finds matches between potentially very different things if they have the same representational structure. For example, SME could determine that a pen is like a sponge because both are involved in dispensing liquid, even though they do this very differently. == Structure mapping theory == Structure mapping theory is based on the systematicity principle, which states that connected knowledge is preferred over independent facts. Therefore, the structure mapping engine should ignore isolated source-target mappings unless they are part of a bigger structure. The SME, the theory goes, should map objects that are related to knowledge that has already been mapped. The theory also requires that mappings be done one-to-one, which means that no part of the source description can map to more than one item in the target and no part of the target description can be mapped to more than one part of the source. The theory also requires that if a match maps subject to target, the arguments of subject and target must also be mapped. If both these conditions are met, the mapping is said to be "structurally consistent." == Concepts in SME == SME maps knowledge from a source into a target. SME calls each description a dgroup. Dgroups contain a list of entities and predicates. Entities represent the objects or concepts in a description — such as an input gear or a switch. Predicates are one of three types and are a general way to express knowledge for SME. Relation predicates contain multiple arguments, which can be other predicates or entities. An example relation is: (transmit (what from to)). This relation has a functor transmit and takes three arguments: what, from, and to. Attribute predicates are the properties of an entity. An example of an attribute is (red gear) which means that gear has the attribute red. Function predicates map an entity into another entity or constant. An example of a function is (joules power source) which maps the entity power source onto the numerical quantity joules. Functions and attributes have different meanings, and consequently SME processes them differently. For example, in SME's true analogy rule set, attributes differ from functions because they cannot match unless there is a higher-order match between them. The difference between attributes and functions will be explained further in this section's examples. All predicates have four parameters. They have (1) a functor, which identifies it, and (2) a type, which is either relation, attribute, or function. The other two parameters (3 and 4) are for determining how to process the arguments in the SME algorithm. If the arguments have to be matched in order, commutative is false. If the predicate can take any number of arguments, N-ary is false. An example of a predicate definition is: (sme:defPredicate behavior-set (predicate) relation :n-ary? t :commutative? t) The predicate's functor is “behavior-set,” its type is “relation,” and its n-ary and commutative parameters are both set to true. The “(predicate)” part of the definition specifies that there will be one or more predicates inside an instantiation of behavior-set. == Algorithm details == The algorithm has several steps. The first step of the algorithm is to create a set of match hypotheses between source and target dgroups. A match hypothesis represents a possible mapping between any part of the source and the target. This mapping is controlled by a set of match rules. By changing the match rules, one can change the type of reasoning SME does. For example, one set of match rules may perform a kind of analogy called literal similarity, and another performs a kind of analogy called true-analogy. These rules are not the place where domain-dependent information is added, but rather where the analogy process is tweaked, depending on the type of cognitive function the user is trying to emulate. For a given match rule, there are two types of rules that further define how it will be applied: filter rules and intern rules. Intern rules use only the arguments of the expressions in the match hypotheses that the filter rules identify. This limitation makes the processing more efficient by constraining the number of match hypotheses that are generated. At the same time, it also helps to build the structural consistencies that are needed later on in the algorithm. An example of a filter rule from the true-analogy rule set creates match hypotheses between predicates that have the same functor. The true-analogy rule set has an intern rule that iterates over the arguments of any match hypothesis, creating more match hypotheses if the arguments are entities or functions, or if the arguments are attributes and have the same functor. In order to illustrate how the match rules produce match hypotheses consider these two predicates: transmit torque inputgear secondgear (p1) transmit signal switch div10 (p2) Here we use true analogy for the type of reasoning. The filter match rule generates a match between p1 and p2 because they share the same functor, transmit. The intern rules then produce three more match hypotheses: torque to signal, inputgear to switch, and secondgear to div10. The intern rules created these match hypotheses because all the arguments were entities. If the arguments were functions or attributes instead of entities, the predicates would be expressed as: transmit torque (inputgear gear) (secondgear gear) (p3) transmit signal (switch circuit) (div10 circuit) (p4) These additional predicates make inputgear, secondgear, switch, and div10 functions or attributes depending on the value defined in the language input file. The representation also contains additional entities for gear and circuit. Depending on what type inputgear, secondgear, switch, and div10 are, their meanings change. As attributes, each one is a property of the gear or circuit. For example, the gear has two attributes, inputgear and secondgear. The circuit has two attributes, switch and circuit. As functions inputgear, secondgear, switch, and div10 become quantities of the gear and circuit. In this example, the functions inputgear and secondgear now map to the numerical quantities “torque from inputgear” and “torque from secondgear,” For the circuit the quantities map to logical quantity “switch engaged” and the numerical quantity “current count on the divide by 10 counter.” SME processes these differently. It does not allow attributes to match unless they are part of a higher-order relation, but it does allow functions to match, even if they are not part of such a relation. It allows functions to match because they indirectly refer to entities and thus should be treated like relations that involve no entities. However, as next section shows, the intern rules assign lower weights to matches between functions than to matches between relations. The reason SME does not match attributes is because it is trying to create connected knowledge based on relationships and thus satisfy the systematicity principle. For example, if both a clock and a car have inputgear attributes, SME will not mark them as similar. If it did, it would be making a match between the clock and car based on their appearance — not on the relationships between them. When the additional predicates in p3 and p4 are functions, the results from matching p3 and p4 are similar to the results from p1 and p2 except there is an additional match between gear and circuit and the values for the match hypotheses between (inputgear gear) and (switch circuit), and (secondgear gear) and (div10 circuit), are lower. The next section describes the reason for this in more detail. If the inputgear, secondgear, switch, and div10 are attributes instead of entities, SME does not find matches between any of the attributes. It finds matches only between the transmit predicates and between torque and signal. Additionally, the structural-evaluation scores for the remaining two matches decrease. In order to get the two predicates to match, p3 would need to be replaced by p5, which is demonstrated below. transmit torque (inputgear gear) (div10 gear) (p5) Since the true-analogy rule set identifies that the div10 attributes are the same between p5 and p4 and because the div10 attributes are both part of the higher-relation match between torque and signal, SME makes a match between (div10 gear) and (div10 circuit) — which leads to a match between gear and circuit. Being part of a higher-order match is a requiremen

AIX Toolbox for Linux Applications

The AIX Toolbox for Linux Applications is a collection of GNU tools for IBM AIX. These tools are available for installation using Red Hat's RPM format. == Licensing == Each of these packages includes its own licensing information and while IBM has made the code available to AIX users, the code is provided as is and has not been thoroughly tested. The Toolbox is meant to provide a core set of some of the most common development tools and libraries along with the more popular GNU packages.

Mata v. Avianca, Inc.

Mata v. Avianca, Inc. was a U.S. District Court for the Southern District of New York case in which the Court dismissed a personal injury case against the airline Avianca and issued a $5,000 fine to the plaintiffs' lawyers who had submitted fake precedents generated by ChatGPT in their legal briefs. == Background == In February 2022, Roberto Mata filed a personal injury lawsuit in the U.S. District Court for the Southern District of New York against Avianca, alleging that he was injured when a metal serving cart struck his knee during an international flight. The plaintiff's lawyers used ChatGPT to generate a legal motion, which contained numerous fake legal cases involving fictitious airlines with fabricated quotations and internal citations. Avianca's lawyers notified the Court that they had been "unable to locate" a few legal cases cited in the legal motion. The Court could not locate the cases either and ordered the plaintiff's lawyers to provide copies of the cited legal cases. Mata's lawyers provided copies of documents purportedly containing all but one of the legal cases, after ChatGPT assured that the cases "indeed exist" and "can be found in reputable legal databases such as LexisNexis and Westlaw." == Opinion == In May 2023, Judge P. Kevin Castel dismissed the personal injury case against Avianca and ordered the plaintiff's attorneys to pay a $5,000 fine. Judge Castel noted numerous inconsistencies in the opinion summaries, describing one of the legal analyses as "gibberish." Judge Castel held that Mata's lawyers had acted with "subjective bad faith" sufficient for sanctions under Federal Rule of Civil Procedure Rule 11. == Impact == In July 2024, the American Bar Association issued its first formal ethics opinion on the responsibilities of lawyers using generative AI (GAI). The 15-page opinion outlines how the Rules of Professional Conduct apply to the use of GAI in the practice of law. Experts caution that lawyers cannot reasonably rely on the accuracy, completeness, or validity of content generated by GAI tools. Due to the continued usage of GAI in the practice of law, Mata has been described as a landmark case by legal professionals, as it is frequently cited by courts in cases where usage of GAI during the course of proceedings leads to the creation and citation of nonexistent caselaw.