The closest point method (CPM) is an embedding method for solving partial differential equations on surfaces. The closest point method uses standard numerical approaches such as finite differences, finite element or spectral methods in order to solve the embedding partial differential equation (PDE) which is equal to the original PDE on the surface. The solution is computed in a band surrounding the surface in order to be computationally efficient. In order to extend the data off the surface, the closest point method uses a closest point representation. This representation extends function values to be constant along directions normal to the surface. == Definitions == Closest Point function: Given a surface S , c p ( x ) {\displaystyle {\mathcal {S}},cp(\mathbf {x} )} refers to a (possibly non-unique) point belonging to S {\displaystyle {\mathcal {S}}} , which is closest to x {\displaystyle \mathbf {x} } [SE]. Closest point extension: Let S {\displaystyle {\mathcal {S}}} , be a smooth surface in R d {\displaystyle \mathbb {R} ^{d}} . The closest point extension of a function u : S → R {\displaystyle u:{\mathcal {S}}\rightarrow \mathbb {R} } , to a neighborhood Ω {\displaystyle \Omega } of S {\displaystyle {\mathcal {S}}} , is the function v : Ω → R {\displaystyle v:\Omega \rightarrow \mathbb {R} } , defined by v ( x ) = u ( c p ( x ) ) {\displaystyle v(\mathbf {x} )=u(cp(\mathbf {x} ))} . == Closest point method == Initialization consists of these steps [EW]: If it is not already given, a closest point representation of the surface is constructed. A computational domain is chosen. Typically this is a band around the surface. Replace surface gradients by standard gradients in R 3 {\displaystyle \mathbb {R} ^{3}} . Solution is initialized by extending the initial surface data on to the computational domain using the closest point function. After initialization, alternate between the following two steps: Using the closest point function, extend the solution off the surface to the computational domain. Compute the solution to the embedding PDE on a Cartesian mesh in the computational domain for one time step. == Banding == The surface PDE is extended into R 3 {\displaystyle \mathbb {R} ^{3}} however it is only necessary to solve this new PDE near the surface. Hence, we solve the PDE in a band surrounding the surface for efficient computational purposes. Ω c x : ‖ x − c p ( x ) ‖ 2 ≤ λ {\displaystyle \Omega _{c}{x:\|x-cp(x)\|_{2}\leq \lambda }} where λ {\displaystyle \lambda } is the bandwidth. == Example: Heat equation on a circle == Using initial profile u S ( θ , t ) = sin ( θ ) {\displaystyle u_{S}(\theta ,t)=\sin(\theta )} leads to the solution u S ( θ , t ) = exp ( − t ) sin ( θ ) {\displaystyle u_{S}(\theta ,t)=\exp(-t)\sin(\theta )} for the heat equation. Forward Euler time-stepping is used with relation Δ t = 0.1 Δ x 2 {\displaystyle \Delta t=0.1\Delta x^{2}} and degree-four interpolation polynomials for the interpolations. Second-order centered differences are used for the spatial discretization. The CPM results in the expected second order error in the solution u {\displaystyle u} . == Applications == The closest point method can be applied to various PDEs on surfaces. Reaction–diffusion problems on point clouds [RD], eigenvalue problems [EV], and level set equations [LS] are a few examples.
Hierarchical Risk Parity
Hierarchical Risk Parity (HRP) is an advanced investment portfolio optimization framework developed in 2016 by Marcos López de Prado at Guggenheim Partners and Cornell University. HRP is a probabilistic graph-based alternative to the prevailing mean-variance optimization (MVO) framework developed by Harry Markowitz in 1952, and for which he received the Nobel Prize in economic sciences. HRP algorithms apply discrete mathematics and machine learning techniques to create diversified and robust investment portfolios that outperform MVO methods out-of-sample. HRP aims to address the limitations of traditional portfolio construction methods, particularly when dealing with highly correlated assets. Following its publication, HRP has been implemented in numerous open-source libraries, and received multiple extensions. == Key features == HRP portfolios have been proposed as a robust alternative to traditional quadratic optimization methods, including the Critical Line Algorithm (CLA) of Markowitz. HRP addresses three central issues commonly associated with quadratic optimizers: numerical instability, excessive concentration in a small number of assets, and poor out-of-sample performance. HRP leverages techniques from graph theory and machine learning to construct diversified portfolios using only the information embedded in the covariance matrix. Unlike quadratic programming methods, HRP does not require the covariance matrix to be invertible. Consequently, HRP remains applicable even in cases where the covariance matrix is ill-conditioned or singular—conditions under which standard optimizers fail. Monte Carlo simulations indicate that HRP achieves lower out-of-sample variance than CLA, despite the fact that minimizing variance is the explicit optimization objective of CLA. Furthermore, HRP portfolios exhibit lower realized risk compared to those generated by traditional risk parity methodologies. Empirical backtests have demonstrated that HRP would have historically outperformed conventional portfolio construction techniques. Algorithms within the HRP framework are characterized by the following features: Machine Learning Approach: HRP employs hierarchical clustering, a machine learning technique, to group similar assets based on their correlations. This allows the algorithm to identify the underlying hierarchical structure of the portfolio, and avoid that errors spread through the entire network. Risk-Based Allocation: The algorithm allocates capital based on risk, ensuring that assets only compete with similar assets for representation in the portfolio. This approach leads to better diversification across different risk sources, while avoiding the instability associated with noisy returns estimates. Covariance Matrix Handling: Unlike traditional methods like Mean-Variance Optimization, HRP does not require inverting the covariance matrix. This makes it more stable and applicable to portfolios with a large number of assets, particularly when the covariance matrix's condition number is high. == The problem: Markowitz's Curse == Portfolio construction is perhaps the most recurrent financial problem. On a daily basis, investment managers must build portfolios that incorporate their views and forecasts on risks and returns. Despite the theoretical elegance of Markowitz's mean-variance framework, its practical implementation is hindered by several limitations that undermine the reliability of solutions derived from the Critical Line Algorithm (CLA). A principal concern is the high sensitivity of optimal portfolios to small perturbations in expected returns: even minor forecasting errors can result in significantly different allocations (Michaud, 1998). Given the inherent difficulty of producing accurate return forecasts, numerous researchers have advocated for approaches that forgo expected returns entirely and instead rely solely on the covariance structure of asset returns. This has given rise to risk-based allocation methods, among which risk parity is a widely cited example (Jurczenko, 2015). While eliminating return forecasts mitigates some instability, it does not eliminate it. Quadratic programming techniques employed in portfolio optimization require the inversion of a positive-definite covariance matrix, meaning all eigenvalues must be strictly positive. When the matrix is numerically ill-conditioned—that is, when the ratio of its largest to smallest eigenvalue (its condition number) is large—matrix inversion becomes unreliable and prone to significant numerical errors (Bailey and López de Prado, 2012). The condition number of a covariance, correlation, or any symmetric (and thus diagonalizable) matrix is defined as the absolute value of the ratio between its largest and smallest eigenvalues in modulus. The figure on the right presents the sorted eigenvalues of several correlation matrices; the condition number is represented by the ratio of the first to last eigenvalues in each sequence. A diagonal correlation matrix, which is equal to its own inverse, exhibits the minimum possible condition number. As the number of correlated (or multicollinear) assets in a portfolio increases, the condition number rises. At high levels, this leads to severe numerical instability, whereby slight modifications in any matrix entry may result in drastically different inverses. This phenomenon, often referred to as Markowitz’s curse, encapsulates the paradox wherein increased correlation among assets heightens the theoretical need for diversification, yet simultaneously increases the likelihood of unstable optimization outcomes. Consequently, the potential benefits of diversification are frequently overshadowed by estimation errors. These problems are exacerbated as the dimensionality of the covariance matrix increases. The estimation of each covariance term consumes degrees of freedom, and in general, a minimum of 1 2 N ( N + 1 ) {\displaystyle {\frac {1}{2}}N(N+1)} independent and identically distributed (IID) observations is required to estimate a non-singular covariance matrix of dimension N {\displaystyle N} . For example, constructing an invertible covariance matrix of dimension 50 necessitates at least five years of daily IID observations. However, empirical evidence suggests that the correlation structure of financial assets is highly unstable over such extended periods. These difficulties are highlighted by the observation that even naïve allocation strategies—such as equally weighted portfolios—have frequently outperformed both mean-variance and risk-based optimizations in out-of-sample tests (De Miguel et al., 2009). == The solution: Hierarchical Risk Parity == The HRP algorithm addresses Markowitz's curse in three steps: Hierarchical Clustering: Assets are grouped into clusters based on their correlations, forming a hierarchical tree structure. Quasi-Diagonalization: The correlation matrix is reordered based on the clustering results, revealing a block diagonal structure. Recursive Bisection: Weights are assigned to assets through a top-down approach, splitting the portfolio into smaller sub-portfolios and allocating capital based on inverse variance. === Step 1: Hierarchical clustering === Given a T × N {\displaystyle T\times N} matrix of asset returns X {\displaystyle X} , where each column represents a time series of returns for one of N {\displaystyle N} assets over T {\displaystyle T} time periods, a hierarchical clustering process can be used to construct a tree-based representation of asset relationships. First, we compute the N × N {\displaystyle N\times N} correlation matrix ρ = ρ i , j i , j = 1 . . . N {\displaystyle \rho ={\rho _{i,j}}\;{i,j=1\;...\;N}} , where ρ i , j = c o r r ( X i , X j ) {\displaystyle \rho _{i,j}=\mathrm {corr} (X_{i},X_{j})} . From this, a pairwise distance matrix D = d i , j {\displaystyle D={d_{i,j}}} is defined using the transformation: d i , j = 1 2 ( 1 − ρ i , j ) {\displaystyle d_{i,j}={\sqrt {{\frac {1}{2}}(1-\rho _{i,j})}}} This distance function defines a proper metric space, satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Next, a secondary distance matrix D ~ = d ~ i , j {\displaystyle {\tilde {D}}={{\tilde {d}}_{i,j}}} is computed, where each entry measures the Euclidean distance between the distance profiles of two assets: d ~ i , j = ∑ n = 1 N ( d n , i − d n , j ) 2 {\displaystyle {\tilde {d}}_{i,j}={\sqrt {\sum _{n=1}^{N}(d_{n,i}-d_{n,j})^{2}}}} While d i , j {\displaystyle d_{i,j}} reflects correlation-based proximity between two assets, d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} quantifies dissimilarity across the entire system, as it depends on all pairwise distances. Hierarchical clustering proceeds by identifying the pair ( i , j ) {\displaystyle (i,j)} with the smallest value of d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} (for i ≠ j {\displaystyle i\neq j} ), and forming a new cluster u [ 1 ] = ( i , j ) {\displaystyle u[1]=(i,j)} .
Hyperparameter (machine learning)
In machine learning, a hyperparameter is a parameter that can be set in order to define any configurable part of a model's learning process. Hyperparameters can be classified as either model hyperparameters (such as the topology and size of a neural network) or algorithm hyperparameters (such as the learning rate and the batch size of an optimizer). These are named hyperparameters in contrast to parameters, which are characteristics that the model learns from the data. Hyperparameters are not required by every model or algorithm. Some simple algorithms such as ordinary least squares regression require none. However, the LASSO algorithm, for example, adds a regularization hyperparameter to ordinary least squares which must be set before training. Even models and algorithms without a strict requirement to define hyperparameters may not produce meaningful results if these are not carefully chosen. However, optimal values for hyperparameters are not always easy to predict. Some hyperparameters may have no meaningful effect, or one important variable may be conditional upon the value of another. Often a separate process of hyperparameter tuning is needed to find a suitable combination for the data and task. As well as improving model performance, hyperparameters can be used by researchers to introduce robustness and reproducibility into their work, especially if it uses models that incorporate random number generation. == Considerations == The time required to train and test a model can depend upon the choice of its hyperparameters. A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems. The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers. === Difficulty-learnable parameters === The objective function is typically non-differentiable with respect to hyperparameters. As a result, in most instances, hyperparameters cannot be learned using gradient-based optimization methods (such as gradient descent), which are commonly employed to learn model parameters. These hyperparameters are those parameters describing a model representation that cannot be learned by common optimization methods, but nonetheless affect the loss function. An example would be the tolerance hyperparameter for errors in support vector machines. === Untrainable parameters === Sometimes, hyperparameters cannot be learned from the training data because they aggressively increase the capacity of a model and can push the loss function to an undesired minimum (overfitting to the data), as opposed to correctly mapping the richness of the structure in the data. For example, if we treat the degree of a polynomial equation fitting a regression model as a trainable parameter, the degree would increase until the model perfectly fit the data, yielding low training error, but poor generalization performance. === Tunability === Most performance variation can be attributed to just a few hyperparameters. The tunability of an algorithm, hyperparameter, or interacting hyperparameters is a measure of how much performance can be gained by tuning it. For an LSTM, while the learning rate followed by the network size are its most crucial hyperparameters, batching and momentum have no significant effect on its performance. Although some research has advocated the use of mini-batch sizes in the thousands, other work has found the best performance with mini-batch sizes between 2 and 32. === Robustness === An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance. Methods that are not robust to simple changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be integrated into mission critical control systems without significant simplification and robustification. Reinforcement learning algorithms, in particular, require measuring their performance over a large number of random seeds, and also measuring their sensitivity to choices of hyperparameters. Their evaluation with a small number of random seeds does not capture performance adequately due to high variance. Some reinforcement learning methods, e.g. DDPG (Deep Deterministic Policy Gradient), are more sensitive to hyperparameter choices than others. == Optimization == Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. Typically these methods are not gradient based, and instead apply concepts from derivative-free optimization or black box optimization. == Reproducibility == Apart from tuning hyperparameters, machine learning involves storing and organizing the parameters and results, and making sure they are reproducible. In the absence of a robust infrastructure for this purpose, research code often evolves quickly and compromises essential aspects like bookkeeping and reproducibility. Online collaboration platforms for machine learning go further by allowing scientists to automatically share, organize and discuss experiments, data, and algorithms. Reproducibility can be particularly difficult for deep learning models. For example, research has shown that deep learning models depend very heavily even on the random seed selection of the random number generator.
Kolmogorov–Arnold Networks
Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.
AZFinText
Arizona Financial Text System (AZFinText) is a textual-based quantitative financial prediction system written by Robert P. Schumaker of University of Texas at Tyler and Hsinchun Chen of the University of Arizona. == System == This system differs from other systems in that it uses financial text as one of its key means of predicting stock price movement. This reduces the information lag-time problem evident in many similar systems where new information must be transcribed (e.g., such as losing a costly court battle or having a product recall), before the quant can react appropriately. AZFinText overcomes these limitations by utilizing the terms used in financial news articles to predict future stock prices twenty minutes after the news article has been released. It is believed that certain article terms can move stocks more than others. Terms such as factory exploded or workers strike will have a depressing effect on stock prices whereas terms such as earnings rose will tend to increase stock prices. The AZFinText system analyzes financial news to identify the patterns in how investors react to such specific information. It uses methods like sentiment analysis and term weighting to examine the text of news articles. This system is designed to find price differences that occur when the market responds to news stories. This approach provides an alternative and easier method for predicting stock market movements. == Overview of research == The foundation of AZFinText can be found in the ACM TOIS article. Within this paper, the authors tested several different prediction models and linguistic textual representations. From this work, it was found that using the article terms and the price of the stock at the time the article was released was the most effective model and using proper nouns was the most effective textual representation technique. Combining the two, AZFinText netted a 2.84% trading return over the five-week study period. AZFinText was then extended to study what combination of peer organizations help to best train the system. Using the premise that IBM has more in common with Microsoft than GM, AZFinText studied the effect of varying peer-based training sets. To do this, AZFinText trained on the various levels of GICS and evaluated the results. It was found that sector-based training was most effective, netting an 8.50% trading return, outperforming Jim Cramer, Jim Jubak and DayTraders.com during the study period. AZFinText was also compared against the top 10 quantitative systems and outperformed 6 of them. A third study investigated the role of portfolio building in a textual financial prediction system. From this study, Momentum and Contrarian stock portfolios were created and tested. Using the premise that past winning stocks will continue to win and past losing stocks will continue to lose, AZFinText netted a 20.79% return during the study period. It was also noted that traders were generally overreacting to news events, creating the opportunity of abnormal returns. A fourth study looked into using author sentiment as an added predictive variable. Using the premise that an author can unwittingly influence market trades simply by the terms they use, AZFinText was tested using tone and polarity features. It was found that Contrarian activity was occurring within the market, where articles of a positive tone would decrease in price and articles of a negative tone would increase in price. A further study investigated what article verbs have the most influence on stock price movement. From this work, it was found that planted, announcing, front, smaller and crude had the highest positive impact on stock price. == Notable publicity == AZFinText has been the topic of discussion by numerous media outlets. Some of the more notable ones include The Wall Street Journal, MIT's Technology Review, Dow Jones Newswire, WBIR in Knoxville, TN, Slashdot and other media outlets.
Concurrency control
In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible. Computer systems, both software and hardware, consist of modules, or components. Each component is designed to operate correctly, i.e., to obey or to meet certain consistency rules. When components that operate concurrently interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be violated by another component. The general area of concurrency control provides rules, methods, design methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means applying operation constraints which typically result in some performance reduction. Operation consistency and correctness should be achieved with as good as possible efficiency, without reducing performance below reasonable levels. Concurrency control can require significant additional complexity and overhead in a concurrent algorithm compared to the simpler sequential algorithm. For example, a failure in concurrency control can result in data corruption from torn read or write operations. == Concurrency control in databases == Comments: This section is applicable to all transactional systems, i.e., to all systems that use database transactions (atomic transactions; e.g., transactional objects in Systems management and in networks of smartphones which typically implement private, dedicated database systems), not only general-purpose database management systems (DBMSs). DBMSs need to deal also with concurrency control issues not typical just to database transactions but rather to operating systems in general. These issues (e.g., see Concurrency control in operating systems below) are out of the scope of this section. Concurrency control in Database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen 2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing) ensures that database transactions are performed concurrently without violating the data integrity of the respective databases. Thus concurrency control is an essential element for correctness in any system where two database transactions or more, executed with time overlap, can access the same data, e.g., virtually in any general-purpose database system. Consequently, a vast body of related research has been accumulated since database systems emerged in the early 1970s. A well established concurrency control theory for database systems is outlined in the references mentioned above: serializability theory, which allows to effectively design and analyze concurrency control methods and mechanisms. An alternative theory for concurrency control of atomic transactions over abstract data types is presented in (Lynch et al. 1993), and not utilized below. This theory is more refined, complex, with a wider scope, and has been less utilized in the Database literature than the classical theory above. Each theory has its pros and cons, emphasis and insight. To some extent they are complementary, and their merging may be useful. To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not harmed. For maintaining correctness in cases of failed (aborted) transactions (which can always happen for many reasons) schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990, and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special attention. === Database transaction and the ACID rules === The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well understood database system behavior in a faulty environment where crashes can happen any time, and recovery from a crash to a well understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in database and also other systems. Each transaction has well defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs): Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not affect the database at all. Either all the operations are done or none of them are. Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform (from the application's point of view) while the predefined integrity rules are enforced by the DBMS). Thus since a database can be normally changed only by transactions, all the database's states are consistent. Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control. Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in a non-volatile memory). The concept of atomic transaction has been extended during the years to what has become Business transactions which actually implement types of Workflow and are not atomic. However also such enhanced transactions typically utilize atomic transactions as components. === Why is concurrency control needed? === If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable results may occur, such as: The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results. The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results. The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not. Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their databases consistently. === Concurrency control mechanisms === ==== Categories ==== The main categories of concurrency control mechanis
Virtual intelligence
Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.