General Internet Corpus of Russian (GICR) is a corpus of Russian internet texts that has been accessible on request through an online query interface since 2013. The corpus includes rich text materials from the blogosphere, social networks, major news sources and literary magazines. == Goals of the project == The project has the status of an educational and scientific one, and many tasks of computational linguistics are solved by independent researchers and research groups with the materials obtained by GICR. While other corpus projects of Russian are focused on fiction and edited texts, General Internet Corpus provides linguists timely opportunity to learn the language as it is, with all the slang and regional peculiarities. Corpus gives the opportunity to carry out research in Linguistic research of a wide range: dialectological research, study of word distribution, study of the language of the social networks, study of the influence of gender, age and other factors on the language, frequency of words, fixed expressions and different constructions, stylistic features of texts of different segments of the Internet, etc. Social media analysis Corpus-based machine learning for evaluating automatic tagging At various times, student papers and independent researches were carried out on the project material by students, graduates and employees of MSU, MIPT, Russian State Humanitarian University, Novosibirsk State University, Higher School of Economics, Russian Academy of Sciences, SFU, CSU, SGMP, IAAS of MSU. Scientific project leaders: Belikov V. - RSUH, Moscow, Russia Selegey V. - RSUH, ABBYY, Moscow, Russia Sharoff S. - RSUH, Moscow, Russia; University of Leeds, UK The organizations involved in support of GICR: Russian State University of Humanities ABBYY Company Moscow Institute of Physics and Technology Skolkovo Institute of Science and Technology == Size and content of the corpus == Corpus size for the summer 2016 is 19.8 billion tokens, of which 49% are from VKontakte, 40% are from LiveJournal, another 4% - from Mail.ru Blogs and News, and 2% - from Russian Magazine Hall. The sources collected in news segment are: RIA Novosti, Regnum, Lenta.ru, Rosbalt. Texts are provided with metamarkup (by date of creation of the text, sex, place and year of birth of the author, Internet genre, etc.); all texts are provided with automatic morphological tagging and lemmatization. Most of the texts collected are of 2013–2014 years of creation, although in some segments, such as in Russian Magazine Hall, there are some texts collected since 1994. GICR is one of the few mega-corpora projects nowadays, which means its available size is reaching several billion of words. == Access == Currently the interface of GICR is in beta stage, so access to the search in the corpora is provided and is free, but is available for researchers on request.
LemonStand
LemonStand was a Canadian e-commerce company headquartered in Vancouver, British Columbia, that developed cloud-based computer software for online retailers. LemonStand was shut down on June 5, 2019. == History == LemonStand Version 1 was launched on July 28, 2001. It is written in the PHP programming language. Version 1 was released as an on-premises proprietary licensed software, and the commercial license was not free. However, there was a free trial license available. June 2012, LemonStand raised seed funding from the BDC Venture Capital, and a group of angel investors. December 20, 2013, a cloud-based SaaS version of the LemonStand eCommerce platform was released publicly. May 9, 2014, LemonStand and Payfirma, a payments processing company, partnered to provide integrated services for online retailers. May 3, 2016, LemonStand raised funding from BDC Venture Capital and Silicon Valley–based angel investors. March 5, 2019, LemonStand announced their intention to shut down on June 5, 2019. LemonStand was quietly acquired by Mailchimp at the end of February. == Pricing == LemonStand offered three levels of service plans. LemonStand did not charge any transaction fees.
Kolmogorov–Arnold Networks
Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.
Google Research
Google Research (also known as Research at Google) is the research division of Google, a subsidiary of Alphabet Inc.. According to its official website, Google Research publishes findings, releases open-source software, and applies research results within Google products and services as well as within the wider scientific community. == Notable contributions == The 2017 landmark paper Attention Is All You Need, which introduced the Transformer architecture, which has subsequently been used to build modern large language models. Advances in neural machine translation powering Google Translate. Time series forecasting. Development of scalable learning systems and infrastructure for large-model training. Flood forecasting. Research into computational discovery via Google Accelerated Science including demonstrating the first below-threshold quantum calculations.
Class activation mapping
Class activation mapping methods are explainable AI (XAI) techniques used to visualize the regions of an input image that are the most relevant for a particular task, especially image classification, in convolutional neural networks (CNNs). These methods generate heatmaps by weighting the feature maps from a convolutional layer according to their relevance to the target class. In the field of artificial intelligence, generically defined as "the effort to automate intellectual tasks normally performed by humans", machine learning and deep learning were created. They both use statistical and computational methods to learn patterns from data, reducing the need for manually coded rules. Machine learning models are trained on input data and the known respective answers, learning the underlying patterns or structures present in the data. Traditional Machine learning algorithms employ manually designed feature sets, posing a direct link between machine learning designers and employed features. Deep learning is a subfield of machine learning, based on the concept of successive layers of representation, in which the data is progressively unfolded in different ways, to extract relevant and informative patterns in data analysis. Deep learning algorithms are defined as feature learning algorithms automatically learning hierarchical feature representations from raw data, extracting increasingly abstract features through multiple layers. CNNs are a specific architecture of deep learning models, designed to process spatially structured data, such as images, exploiting a series of convolution, non-linear activation and pooling operations to extract relevant features, contained in the so-called feature maps from input data. CNNs have demonstrated to be highly effective in a variety of computer vision and image processing tasks. CNNs (and deep learning models more broadly) are described as black boxes due to their complex and non-transparent internal layers of representation. The need for clearer indications on its internal working and decision-making process gave birth to XAI techniques. Among the proposed XAI techniques for computer vision tasks, Class activation mapping methods can show which pixels in an input image are important to the predicted logit for a class of interest, in a classification task. Class activation mapping methods were originally developed for class-discriminative scenarios to visualize which parts of the input image influenced the classification decision, namely to visually highlight the regions of those feature maps that contribute most strongly to the prediction of a given class. More advanced versions of these methods are not limited to image classification tasks, but have been extended also to several vision-related tasks, such as object detection, image captioning, visual question answering and image segmentation. == Background == The following methods laid the groundwork for the class activation maps approaches, forming the conceptual basis of using gradients to highlight class-discriminative regions. === Class model visualization and saliency maps for convolutional neural networks === The class model visualization and image-specific saliency maps approaches have been presented in the foundational work "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman and it generalizes the deconvnet method by Zeiler and Fergus. Class model visualization synthesizes an artificial input image that strongly activates the output neurons associated with a target class. Given a trained, fixed model, this method starts with a zero-initialized image, backpropagates the gradients from the class score to the image pixels, updates the image pixels increasing the specific class scores and it repeats the pixel updating process, showing an encoded (idealized version) prototype of the class of interest. Image-specific class saliency visualization method provides a visual explanation by highlighting the most relevant pixels in an image for predicting a certain class C of interest. This is done by computing the gradient of the class score with respect to the input image, I 0 , {\displaystyle I_{0},} w = ∂ S C ∂ I | I 0 {\displaystyle w=\left.{\frac {\partial S_{C}}{\partial I}}\right|_{I_{0}}} approximating the model locally (around I 0 {\displaystyle I_{0}} ) as linear, using a first-order Taylor expansion: S C ( I ) ≈ w C T I + b {\displaystyle S_{C}(I)\approx w_{C}^{T}I+b} . The magnitude of w C {\displaystyle w_{C}} , the gradient, indicates the importancy of the pixels: larger gradients suggest greater influence on the prediction. Once the gradient is known, the saliency map is defined as the maximum absolute gradient across the color channels: M i j = m a x C | ∂ S C ∂ I i j C | {\displaystyle M_{ij}=max_{C}\left|{\frac {\partial S_{C}}{\partial I_{ij}^{C}}}\right|} resulting in an saliency map (i.e. heatmap). === Guided backpropagation === The concept of guided backpropagation can be traced for the first time in the paper by Springenberg et al. "Striving For Simplicity: The All Convolutional Net" and also this method builds upon the work by Zeiler and Fergus "Visualizing and Understanding Convolutional Networks". Guided backpropagation core is to understand what a CNN is learning, by visualizing the patterns that activate more strongly individual neurons (or filters), in architectures which do not rely on max-pooling layer. When propagating gradients back through a rectified linear unit (ReLU), guided backpropagation passes the gradient if and only if the input to the ReLU was positive (forward pass) and the output gradient is positive (backward signal), tackling both inactive neurons, negative gradients and suppressing the noise. The result displays sharper, high-resolution visualizations of what each neuron is responding to. Guided backpropagation represents a simple and practical method for model interpretability, helping understand how and where neural networks detect semantic concepts across layers. Moreover, it can be applied to any network architecture, due to its working principle. == Base versions == Class activation mapping and gradient-weighted class activation mapping are the original and most widely used methods for visual explanations in convolutional neural networks. These methods serve as the foundation for many later developments in explainable AI. Notation: In this article, the symbols i and j represent integer indices that disappear inside sums or averages, while x and y are the continuous (or up-sampled integer) coordinates of the final heat-map that is plotted. === Class activation mapping (CAM) === Class activation mapping (CAM) was the first, and the original, version of CAM methods, and it gave the name to the whole category. The approach was firstly introduced by Zhou et al. in their seminal work "Learning Deep Features for Discriminative Localization". This approach achieves class-specific heatmaps by modifying image classification CNN architectures, replacing fully-connected layers with convolutional layers and a final global average pooling layer. Its main scope is to localize and highlight discriminative regions of an input image that a CNN uses to identify a particular class, without needing explicit bounding box annotations. ==== Global average pooling (GAP) ==== Global average pooling (GAP) represents the key element in the original CAM approach. It is a dimensionality reduction technique and, similarly to other pooling layers, it allows the downsampling of the feature maps, calculating representative values for a specific region of the feature map. The particularity of GAP is that it calculates a single value for an entire feature map, significantly reducing the model dimensions. ==== Mathematical description ==== The mathematical description considers as its key the combination of convolutional and GAP layers. In CAM, it is mandatory to have the GAP layer after the last convolutional layer and before the final linear classifier layer. This last element of the architecture connects the output logits (the network predictions) y C {\displaystyle y^{C}} , to the GAP values, with its respective fine-tuned weights, w k C {\displaystyle w_{k}^{C}} . Considering A k {\displaystyle A^{k}} as the last feature maps of the last convolutional layer, GAP produces one value for each feature map, by averaging all the matrix elements (i, j) of the feature map: F k = 1 m n ∑ i = 1 m ∑ j = 1 n A i j k {\displaystyle F^{k}={\frac {1}{mn}}\sum _{i=1}^{m}\sum _{j=1}^{n}A_{ij}^{k}} with A k = [ A 11 k A 12 k ⋯ A 1 n k A 21 k A 22 k ⋯ A 2 n k ⋮ ⋮ ⋱ ⋮ A m 1 k A m 2 k ⋯ A m n k ] = { A i j k ∣ 1 ≤ i ≤ m , 1 ≤ j ≤ n } {\displaystyle A^{k}={\begin{bmatrix}A_{11}^{k}&A_{12}^{k}&\cdots &A_{1n}^{k}\\A_{21}^{k}&A_{22}^{k}&\cdots &A_{2n}^{k}\\\vdots &\vdots &\ddots &\vdots \\A_{m1}^{k}&A_{m2}^{k}&\cdots &A_{mn}^{k}\end{bmatrix}}=\left\{A_{
Shell Control Box
Shell Control Box (SCB) is a network security appliance that controls privileged access to remote IT systems, records activities in replayable audit trails, and prevents malicious actions. For example, it records as a system administrator updates a file server or a third-party network operator configures a router. The recorded audit trails can be replayed like a movie to review the events as they occurred. The content of the audit trails is indexed to make searching for events and automatic reporting possible. SCB is a Linux-based device developed by Balabit. It is an application level proxy gateway. In 2017, Balabit changed the name of the product to Privileged Session Management (PSM) and repositioned it as the core module of its Privileged Access Management solution. == Main Features == Balabit’s Privileged Session Management (PSM), Shell Control Box (SCB) is a device that controls, monitors, and audits remote administrative access to servers and network devices. It is a tool to oversee system administrators by controlling the encrypted connections used for administration. PSM (SCB) has full control over the SSH, RDP, Telnet, TN3270, TN5250, Citrix ICA, and VNC connections, providing a framework (with solid boundaries) for the work of the administrators. === Gateway Authentication === PSM (SCB) acts as an authentication gateway, enforcing strong authentication before users access IT assets. PSM can also integrate to user directories (for example, a Microsoft Active Directory) to resolve the group memberships of the users who access the protected servers. Credentials for accessing the server are retrieved transparently from PSM’s credential store or a third-party password management system by PSM impersonating the authenticated user. This automatic password retrieval protects the confidentiality of passwords as users can never access them. === Access Control === PSM controls and audits privileged access over the most wide-spread protocols such as SSH, RDP, or HTTP(s). The detailed access management helps to control who can access what and when on servers. It is also possible to control advanced features of the protocols, like the type of channels permitted. For example, unneeded channels like file transfer or file sharing can be disabled, reducing the security risk on the server. With PSM policies for privileged access can be enforced in one single system. === 4-eyes Authorization === To avoid accidental misconfiguration and other human errors, PSM supports the 4-eyes authorization principle. This is achieved by requiring an authorizer to allow administrators to access the server. The authorizer also has the possibility to monitor – and terminate - the session of the administrator in real-time, as if they were watching the same screen. === Real-time Monitoring and Session Termination === PSM can monitor the network traffic in real time, and execute various actions if a certain pattern (for example, a suspicious command, window title or text) appears on the screen. PSM can also detect specific patterns such as credit card numbers. In case of detecting a suspicious user action, PSM can send an e-mail alert or immediately terminate the connection. For example, PSM can block the connection before a destructive administrator command, such as the „rm” comes into effect. === Session Recording === PSM makes user activities traceable by recording them in tamper-proof and confidential audit trails. It records the selected sessions into encrypted, timestamped, and digitally signed audit trails. Audit trails can be browsed online, or followed real-time to monitor the activities of the users. PSM replays the recorded sessions just like a movie – actions of the users can be seen exactly as they appeared on their monitor. The Balabit Desktop Player enables fast forwarding during replays, searching for events (for example, typed commands or pressing Enter) and texts seen by the user. In the case of any problems (database manipulation, unexpected shutdown, etc.) the circumstances of the event are readily available in the trails, thus the cause of the incident can be identified. In addition to recording audit trails, transferred files can be also recorded and extracted for further analysis.
Kolmogorov–Arnold Networks
Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.