AI For Business Cards

AI For Business Cards — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apache Kudu

    Apache Kudu

    Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable fast analytics on fast data. The open source project to build Apache Kudu began as internal project at Cloudera. The first version Apache Kudu 1.0 was released 19 September 2016. == Comparison with other storage engines == Kudu was designed and optimized for OLAP workloads. Like HBase, it is a real-time store that supports key-indexed record lookup and mutation. Kudu differs from HBase since Kudu's datamodel is a more traditional relational model, while HBase is schemaless. Kudu's "on-disk representation is truly columnar and follows an entirely different storage design than HBase/Bigtable".

    Read more →
  • Xiaomi MiMo

    Xiaomi MiMo

    Xiaomi MiMo is a family of large language models (LLMs) developed by Xiaomi. It was initially released in April 2025 with the MiMo-7B model. Currently, MiMo is available for developers through API service. It is used as the key AI model in Xiaomi's "Human x Car x Home" ecosystem. == Development == Xiaomi developed MiMo as a reasoning-focused language model. Its development team was led by Luo Fuli, who had previously worked at DeepSeek before joining Xiaomi in late 2025. The model was trained using multi-token prediction and reinforcement learning, with a particular emphasis on mathematical reasoning and code generation tasks. In March 2026, Xiaomi CEO Lei Jun announced that the company planned to invest at least US$8.7 billion in artificial intelligence over the following three years. == Models == === List of models === === MiMo-7B === MiMo-7B is the first model of this LLM. The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using web pages, academic papers, books, and synthetic reasoning data. MiMo-7B-RL underwent supervised fine-tuning and reinforcement learning on 130,000 mathematics and code problems. MiMo-7B-RL-0530 was released in May 2025. It scaled the fine-tuning dataset from 500,000 to 6 million instances and extended the RL window from 32,000 to 48,000 tokens and improved AIME 2024 scores from 68.2 to 80.1. MiMo-VL-7B was a vision-language model combining a Vision Transformer encoder with the MiMo-7B backbone. It was trained in four stages consuming 2.4 trillion tokens. Its reinforcement learning variant used Mixed On-Policy Reinforcement Learning (MORL) which integrated reward signals across perception, grounding, and reasoning. Xiaomi also released MiMo-Audio-7B, an audio-language model for voice conversion, style transfer, and speech editing. === MiMo-V2-Flash === MiMo-V2-Flash was launched in December 2025. It is a open-sourced Mixture-of-experts model with 309 billion total parameters and 15 billion active parameters. It was trained on 27 trillion tokens using FP8 mixed precision. It used hybrid attention interleaving Sliding Window and Global Attention at a 5:1 ratio. === MiMo-V2-Pro === Xiaomi publicly introduced MiMo-V2-Pro on 18 March 2026. It has over 1 trillion total parameters, 42 billion active, and a 1-million-token context window. Before the official release, the model had appeared anonymously on OpenRouter under the codename "Hunter Alpha," where it drew substantial usage and topped daily charts for several days, according to Xiaomi and Reuters. During its listing on OpenRouter, the model reportedly processed over one trillion tokens in total usage. Xiaomi later said Hunter Alpha was an early internal test build of MiMo-V2-Pro, and Reuters reported that the model had been mistaken by some users for a possible DeepSeek system before Xiaomi confirmed its origin. The model was released as a proprietary API product, and Luo Fuli stated that Xiaomi intended to open-source a variant at an unspecified future date. Xiaomi has partnered with several API web platforms like OpenClaw to launch the model. All these websites initially offered a free trial of this model for a week, but due to the overwhelming response, Xiaomi later extended the free trial period of the model until 2 April 2026. === MiMo-V2-Omni === Alongside MiMo-V2-Pro, Xiaomi launched MiMo-V2-Omni on 18 March 2026. It handles image, video, audio, and text inputs. Before the official release, it was codenamed "Healer Alpha" in OpenRouter. === MiMo-V2-TTS === On the same date as the release of MiMo-V2-Pro and MiMo-V2-Omni, a Text-to-Speech model named MiMo-V2-TTS was released also. It is a speech synthesis model. It was trained on audio data, which makes it capable of emotional transitions, mid-sentence tone shifts, singing, and synthesis of regional dialects like Sichuan, Cantonese, Henan, and Taiwanese. == Licensing == Xiaomi has used different licensing approaches for different models in the MiMo family. The MiMo-7B series and MiMo-V2-Flash were released as open-weight models. MiMo-V2-Flash was published under the MIT license with model weights and inference code available on Hugging Face. MiMo-V2-Pro and MiMo-V2-Omni were released as proprietary models. It was accessible through Xiaomi's API platform and third-party API providers. Luo Fuli stated that Xiaomi intended to open-source a variant of MiMo-V2-Pro. Although, she did not specify any timeline. MiMo-V2-TTS was released as a proprietary model with no publicly available weights.

    Read more →
  • AsoSoft text corpus

    AsoSoft text corpus

    The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.

    Read more →
  • Version space learning

    Version space learning

    Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space is a disjunction H 1 ∨ H 2 ∨ . . . ∨ H n {\displaystyle H_{1}\lor H_{2}\lor ...\lor H_{n}} (i.e., one or more of hypotheses 1 through n are true). A version space learning algorithm is presented with examples, which it will use to restrict its hypothesis space; for each example x, the hypotheses that are inconsistent with x are removed from the space. This iterative refining of the hypothesis space is called the candidate elimination algorithm, the hypothesis space maintained inside the algorithm, its version space. == The version space algorithm == In settings where there is a generality-ordering on hypotheses, it is possible to represent the version space by two sets of hypotheses: (1) the most specific consistent hypotheses, and (2) the most general consistent hypotheses, where "consistent" indicates agreement with observed data. The most specific hypotheses (i.e., the specific boundary SB) cover the observed positive training examples, and as little of the remaining feature space as possible. These hypotheses, if reduced any further, exclude a positive training example, and hence become inconsistent. These minimal hypotheses essentially constitute a (pessimistic) claim that the true concept is defined just by the positive data already observed: Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be negative. (I.e., if data has not previously been ruled in, then it's ruled out.) The most general hypotheses (i.e., the general boundary GB) cover the observed positive training examples, but also cover as much of the remaining feature space without including any negative training examples. These, if enlarged any further, include a negative training example, and hence become inconsistent. These maximal hypotheses essentially constitute a (optimistic) claim that the true concept is defined just by the negative data already observed: Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be positive. (I.e., if data has not previously been ruled out, then it's ruled in.) Thus, during learning, the version space (which itself is a set – possibly infinite – containing all consistent hypotheses) can be represented by just its lower and upper bounds (maximally general and maximally specific hypothesis sets), and learning operations can be performed just on these representative sets. After learning, classification can be performed on unseen examples by testing the hypothesis learned by the algorithm. If the example is consistent with multiple hypotheses, a majority vote rule can be applied. == Historical background == The notion of version spaces was introduced by Mitchell in the early 1980s as a framework for understanding the basic problem of supervised learning within the context of solution search. Although the basic "candidate elimination" search method that accompanies the version space framework is not a popular learning algorithm, there are some practical implementations that have been developed (e.g., Sverdlik & Reynolds 1992, Hong & Tsang 1997, Dubois & Quafafou 2002). A major drawback of version space learning is its inability to deal with noise: any pair of inconsistent examples can cause the version space to collapse, i.e., become empty, so that classification becomes impossible. One solution of this problem is proposed by Dubois and Quafafou that proposed the Rough Version Space, where rough sets based approximations are used to learn certain and possible hypothesis in the presence of inconsistent data.

    Read more →
  • PenTile matrix family

    PenTile matrix family

    PenTile matrix is a family of patented subpixel matrix schemes used in electronic device displays. PenTile is a trademark of Samsung. PenTile matrices are used in AMOLED and LCD displays. These subpixel layouts are specifically designed to operate with proprietary algorithms for subpixel rendering embedded in the display driver, allowing plug and play compatibility with conventional RGB (Red-Green-Blue) stripe panels. == Overview == "PenTile Matrix" (a neologism from penta-, meaning "five" in Greek and tile) describes the geometric layout of the prototypical subpixel arrangement developed in the early 1990s. The layout consists of a quincunx comprising two red subpixels, two green subpixels, and one central blue subpixel in each unit cell. It was inspired by biomimicry of the human retina, which has nearly equal numbers of L and M type cone cells, but significantly fewer S cones. As the S cones are primarily responsible for perceiving blue colors, which do not appreciably affect the perception of luminance, reducing the number of blue subpixels with respect to the red and green subpixels in a display does not reduce the image quality. However, the layout may cause color leakage image distortion, which can be reduced by filters. In some cases the layout causes reduced moiré and blockiness compared to conventional RGB layouts. The PenTile layout is specifically designed to work with and be dependent upon subpixel rendering that uses only one and a quarter subpixel per pixel, on average, to render an image. That is, that any given input pixel is mapped to either a red-centered logical pixel, or a green-centered logical pixel. === History === PenTile was invented by Candice H. Brown Elliott, for which she was awarded the Society for Information Display's Otto Schade Prize in 2014. The technology was licensed by the company Clairvoyante from 2000 until 2008, during which time several prototype PenTile displays were developed by a number of Asian liquid crystal display (LCD) manufacturers. In March 2008, Samsung Electronics acquired Clairvoyante's PenTile IP assets. Samsung then funded a new company, Nouvoyance, Inc. to continue development of the PenTile technology. == PenTile RGBG == PenTile RGBG layout used in AMOLED and plasma displays uses green pixels interleaved with alternating red and blue pixels. The human eye is most sensitive to green, especially for high resolution luminance information. The green subpixels are mapped to input pixels on a one-to-one basis. The red and blue subpixels are subsampled, reconstructing the chroma signal at a lower resolution. The luminance signal is processed using adaptive subpixel rendering filters to optimize reconstruction of high spatial frequencies from the input image, wherein the green subpixels provide the majority of the reconstruction. The red and blue subpixels are capable of reconstructing the horizontal and vertical spatial frequencies, but not the highest of the diagonal. Diagonal high spatial frequency information in the red and blue channels of the input image are transferred to the green subpixels for image reconstruction. Thus the RG-BG scheme creates a color display with one third fewer subpixels than a traditional RGB-RGB scheme but with the same measured luminance display resolution. This is similar to the Bayer filter commonly used in digital cameras. === Devices === As of 2021, "almost all" OLED screens in portable consumer devices use some form of Pentile subpixel layout. == PenTile RGBW == PenTile RGBW technology, used in LCD, adds an extra subpixel to the traditional red, green and blue subpixels that is a clear area without color filtering material and with the only purpose of letting backlight come through, hence W for white. This makes it possible to produce a brighter image compared to an RGB-matrix while using the same amount of power, or produce an equally bright image while using less power. The PenTile RGBW layout uses each red, green, blue and white subpixel to present high-resolution luminance information to the human eyes' red-sensing and green-sensing cone cells, while using the combined effect of all the color subpixels to present lower-resolution chroma (color) information to all three cone cell types. Combined, this optimizes the match of display technology to the biological mechanisms of human vision. The layout uses one third fewer subpixels for the same resolution as the RGB stripe (RGB-RGB) layout, in spite of having four color primaries instead of the conventional three, using subpixel rendering combined with metamer rendering. Metamer rendering optimizes the energy distribution between the white subpixel and the combined red, green, and blue subpixels: W <> RGB, to improve image sharpness. The display driver chip has an RGB to RGBW color vector space converter and gamut mapping algorithm, followed by metamer and subpixel rendering algorithms. In order to maintain saturated color quality, to avoid simultaneous contrast error between saturated colors and peak white brightness, while simultaneously reducing backlight power requirements, the display backlight brightness is under control of the PenTile driver engine. When the image is mostly desaturated colors, those near white or grey, the backlight brightness is significantly reduced, often to less than 50% peak, while the LCD levels are increased to compensate. When the image has very bright saturated colors, the backlight brightness is maintained at higher levels. The PenTile RGBW also has an optional high-brightness mode that doubles the brightness of the desaturated color image areas, such as black-and-white text, for improved outdoor viewability. === Devices === Motorola MC65 Motorola ES55 Motorola ES400 Motorola Atrix 4G Samsung Galaxy Note 10.1 2014 version Lenovo Yoga 2 Pro Lenovo Yoga 3 Pro HP ENVY TouchSmart 14-k022tx Sleekbook MSI GS60 Ghost Pro 4K Lenovo IdeaPad Y50 4K Asus ZenBook UX303LN 4K Asus ZenBook Pro UX501JW LG UH7500/6500/6100 LG ThinQ G7/G7+ Oculus Quest 1 == Controversy == An ongoing controversy regarding the definition or measurement of resolution of color subpixelated flat panel displays led many people to question the resolution claims of PenTile display products. Journalists have noted that in "just about every flat-panel TV in existence, each pixel is composed of one red, one green, and one blue subpixel (RGB), all of uniform size". In traditional flat-panel screens, the resolution is defined by the number of red, green, and blue subpixels, in groups of three, in an array in each axis. As a result, each pixel or group of subpixels can render any colour on the screen, regardless of neighbouring pixels. This is not the case with PenTile screens. The Video Electronics Standards Association (VESA) method of measuring and defining resolution in color displays is to measure the contrast of line pairs, requiring a minimum of 50% Michelson contrast for displays intended for rendering text. The developers of PenTile displays use this VESA criterion for contrast of line pairs to calculate the resolutions specified. In the RGBG layout the alternate red and blue subpixels are 'shared' or sub-sampled with neighboring pixels. Due to the one third lower subpixel density on PenTile displays the pixel structure may be more visible when compared to RGB stripe displays with the same pixel density. The loss of subpixels for a given resolution specification has led some journalists to describe the use of PenTile as "shady practice" and "sort of cheating". For a given size and resolution specification, the PenTile screen can appear grainy, pixelated, speckled, with blurred text on some saturated colors and backgrounds when compared to RGB stripe color. This effect is understood to be caused by the restriction of the number of subpixels that may participate in the image reconstruction when colors are highly saturated to primaries. In the RGBW case, this is caused as the W subpixel will not be available in order to maintain the saturated color. In the RGBG case, this effect will occur when the color boundary is primarily red or blue, as the fully populated (one green per pixel) sub-pixel cannot contribute. For all other cases, text and especially full color images are effectively reconstructed. == Advantages and disadvantages == The PenTile layout reduces the number of subpixels needed to create a specified resolution. Consequently it is possible to achieve an HD resolution on a PenTile AMOLED screen at lower cost than other technologies, and most reviewers note that "300 ppi" (as per VESA - not full pixels) resolution displays (such as Samsung Galaxy S III) make the PenTile effect less obvious than lower resolution PenTile displays (Droid Razr). The second advantage is lower power consumption: the HTC One S's use of a PenTile display makes it more energy efficient and thinner than equivalent LCD screens, giving it better battery life than the HTC One X's IPS LCD. A PenTile AMOLED screen is also

    Read more →
  • Point-set registration

    Point-set registration

    In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation (e.g., scaling, rotation and translation) that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model (or coordinate frame), and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging. As a special case, registration of two point sets that only differ by a 3D rotation (i.e., there is no scaling and translation), is called the Wahba Problem and also related to the orthogonal procrustes problem. == Formulation == The problem may be summarized as follows: Let { M , S } {\displaystyle \lbrace {\mathcal {M}},{\mathcal {S}}\rbrace } be two finite size point sets in a finite-dimensional real vector space R d {\displaystyle \mathbb {R} ^{d}} , which contain M {\displaystyle M} and N {\displaystyle N} points respectively (e.g., d = 3 {\displaystyle d=3} recovers the typical case of when M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} are 3D point sets). The problem is to find a transformation to be applied to the moving "model" point set M {\displaystyle {\mathcal {M}}} such that the difference (typically defined in the sense of point-wise Euclidean distance) between M {\displaystyle {\mathcal {M}}} and the static "scene" set S {\displaystyle {\mathcal {S}}} is minimized. In other words, a mapping from R d {\displaystyle \mathbb {R} ^{d}} to R d {\displaystyle \mathbb {R} ^{d}} is desired which yields the best alignment between the transformed "model" set and the "scene" set. The mapping may consist of a rigid or non-rigid transformation. The transformation model may be written as T {\displaystyle T} , using which the transformed, registered model point set is: The output of a point set registration algorithm is therefore the optimal transformation T ⋆ {\displaystyle T^{\star }} such that M {\displaystyle {\mathcal {M}}} is best aligned to S {\displaystyle {\mathcal {S}}} , according to some defined notion of distance function dist ⁡ ( ⋅ , ⋅ ) {\displaystyle \operatorname {dist} (\cdot ,\cdot )} : where T {\displaystyle {\mathcal {T}}} is used to denote the set of all possible transformations that the optimization tries to search for. The most popular choice of the distance function is to take the square of the Euclidean distance for every pair of points: where ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} denotes the vector 2-norm, s m {\displaystyle s_{m}} is the corresponding point in set S {\displaystyle {\mathcal {S}}} that attains the shortest distance to a given point m {\displaystyle m} in set M {\displaystyle {\mathcal {M}}} after transformation. Minimizing such a function in rigid registration is equivalent to solving a least squares problem. == Types of algorithms == When the correspondences (i.e., s m ↔ m {\displaystyle s_{m}\leftrightarrow m} ) are given before the optimization, for example, using feature matching techniques, then the optimization only needs to estimate the transformation. This type of registration is called correspondence-based registration. On the other hand, if the correspondences are unknown, then the optimization is required to jointly find out the correspondences and transformation together. This type of registration is called simultaneous pose and correspondence registration. === Rigid registration === Given two point sets, rigid registration yields a rigid transformation which maps one point set to the other. A rigid transformation is defined as a transformation that does not change the distance between any two points. Typically such a transformation consists of translation and rotation. In rare cases, the point set may also be mirrored. In robotics and computer vision, rigid registration has the most applications. === Non-rigid registration === Given two point sets, non-rigid registration yields a non-rigid transformation which maps one point set to the other. Non-rigid transformations include affine transformations such as scaling and shear mapping. However, in the context of point set registration, non-rigid registration typically involves nonlinear transformation. If the eigenmodes of variation of the point set are known, the nonlinear transformation may be parametrized by the eigenvalues. A nonlinear transformation may also be parametrized as a thin plate spline. === Other types === Some approaches to point set registration use algorithms that solve the more general graph matching problem. However, the computational complexity of such methods tend to be high and they are limited to rigid registrations. In this article, we will only consider algorithms for rigid registration, where the transformation is assumed to contain 3D rotations and translations (possibly also including a uniform scaling). The PCL (Point Cloud Library) is an open-source framework for n-dimensional point cloud and 3D geometry processing. It includes several point registration algorithms. == Correspondence-based registration == Correspondence-based methods assume the putative correspondences m ↔ s m {\displaystyle m\leftrightarrow s_{m}} are given for every point m ∈ M {\displaystyle m\in {\mathcal {M}}} . Therefore, we arrive at a setting where both point sets M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} have N {\displaystyle N} points and the correspondences m i ↔ s i , i = 1 , … , N {\displaystyle m_{i}\leftrightarrow s_{i},i=1,\dots ,N} are given. === Outlier-free registration === In the simplest case, one can assume that all the correspondences are correct, meaning that the points m i , s i ∈ R 3 {\displaystyle m_{i},s_{i}\in \mathbb {R} ^{3}} are generated as follows:where l > 0 {\displaystyle l>0} is a uniform scaling factor (in many cases l = 1 {\displaystyle l=1} is assumed), R ∈ SO ( 3 ) {\displaystyle R\in {\text{SO}}(3)} is a proper 3D rotation matrix ( SO ( d ) {\displaystyle {\text{SO}}(d)} is the special orthogonal group of degree d {\displaystyle d} ), t ∈ R 3 {\displaystyle t\in \mathbb {R} ^{3}} is a 3D translation vector and ϵ i ∈ R 3 {\displaystyle \epsilon _{i}\in \mathbb {R} ^{3}} models the unknown additive noise (e.g., Gaussian noise). Specifically, if the noise ϵ i {\displaystyle \epsilon _{i}} is assumed to follow a zero-mean isotropic Gaussian distribution with standard deviation σ i {\displaystyle \sigma _{i}} , i.e., ϵ i ∼ N ( 0 , σ i 2 I 3 ) {\displaystyle \epsilon _{i}\sim {\mathcal {N}}(0,\sigma _{i}^{2}I_{3})} , then the following optimization can be shown to yield the maximum likelihood estimate for the unknown scale, rotation and translation:Note that when the scaling factor is 1 and the translation vector is zero, then the optimization recovers the formulation of the Wahba problem. Despite the non-convexity of the optimization (cb.2) due to non-convexity of the set SO ( 3 ) {\displaystyle {\text{SO}}(3)} , seminal work by Berthold K.P. Horn showed that (cb.2) actually admits a closed-form solution, by decoupling the estimation of scale, rotation and translation. Similar results were discovered by Arun et al. In addition, in order to find a unique transformation ( l , R , t ) {\displaystyle (l,R,t)} , at least N = 3 {\displaystyle N=3} non-collinear points in each point set are required. More recently, Briales and Gonzalez-Jimenez have developed a semidefinite relaxation using Lagrangian duality, for the case where the model set M {\displaystyle {\mathcal {M}}} contains different 3D primitives such as points, lines and planes (which is the case when the model M {\displaystyle {\mathcal {M}}} is a 3D mesh). Interestingly, the semidefinite relaxation is empirically tight, i.e., a certifiably globally optimal solution can be extracted from the solution of the semidefinite relaxation. === Robust registration === The least squares formulation (cb.2) is known to perform arbitrarily badly in the presence of outliers. An outlier correspondence is a pair of measurements s i ↔ m i {\displaystyle s_{i}\leftrightarrow m_{i}} that departs from the generative model (cb.1). In this case, one can consider a differen

    Read more →
  • Facial age estimation

    Facial age estimation

    Facial age estimation is the use of artificial intelligence to estimate the age of a person based on their facial features. Computer vision techniques are used to analyse the facial features in the images of millions of people whose age is known and then deep learning is used to create an algorithm that tries to predict the age of an unknown person. The key use of the technology is to prevent access to age-restricted goods and services. Examples include restricting children from accessing internet pornography, checking that they meet a mandatory minimum age when registering for an account on social media, or preventing adults from accessing websites, online chat or games designed only for use by children. The technology is distinct from facial recognition systems as the software does not attempt to uniquely identify the individual. Researchers have applied neural networks for age estimation since at least 2010. == Evaluation == An ongoing study by the National Institute of Standards and Technology (NIST) entitled 'Face Analysis Technology Evaluation' seeks to establish the technical performance of prototype age estimation algorithms submitted by academic teams and software vendors including Brno University of Technology, Czech Technical University in Prague, Dermalog, IDEMIA, Incode Technologies Inc, Jumio, Nominder, Rank One Computing, Unissey and Yoti. == Public sector use == The UK government has explored using facial age estimation at the UK border as an alternative to bone X-rays and MRI scans when determining child status of asylum seekers. == Commercial use == Commercial users of facial age estimation include Instagram and OnlyFans. In January 2025, John Lewis & Partners announced that had started using the technology to check the age of people shopping for knives on its website, to comply with UK legislation to limit knife crime. In the UK, several supermarket chains have taken part in Home Office trials of the technology to automate the checking of a customer's age when buying age-restricted goods such as alcohol. UK legislation introduced in January 2025 mandates robust forms of age verification hosting adult content viewable in the UK by July 2025. Allowable methods include facial age estimation. == Criticism == Adam Schwartz, a lawyer for the Electronic Frontier Foundation, criticized the use of facial age estimation software, noting its inaccuracy especially in cases of minorities and women, as was found in NIST's 2024 report. Twenty organisations jointly under European Digital Rights called the practice a "systematic and invasive processing of young people's data" that risks discriminatory profiling.

    Read more →
  • NetMiner

    NetMiner

    NetMiner is an all-in-one software platform for analyzing and visualizing complex network data, based on Social Network Analysis (SNA). Originally released in 2001, it supports research and education in a wide range of domains through interactive and visual data exploration. This tool allows researchers to explore their network data visually and interactively, and helps them to detect underlying patterns and structures of the network. It has also been recognized for its comprehensive features and user-friendly interface in comparative reviews of SNA software packages. == Features == === Integrated Data Environment === NetMiner supports unified management of diverse data types—including network (nodes and links), tabular, and unstructured text data—within a single platform. This enables users to perform the entire analysis workflow seamlessly without switching between tools. NetMiner also supports a wide range of analytical methods, allowing users to derive new insights by combining multiple approaches. Analytical results can be saved and reused across workflows(Add to Dataset) Graph and Network Analysis: Includes Centrality, Community Detection, Blockmodeling, and Similarity Measures. Machine learning: Provides algorithms for regression, classification, clustering, ensemble modeling and XAI(Explainable AI) Graph Neural Networks (GNNs): Supports models such as GraphSAGE, GCN, and GAT to learn from both node attributes and graph structure. Natural language processing (NLP): Uses pretrained deep learning models to analyze unstructured text, including named entity recognition and keyword extraction. Text mining and Text network analysis: Supports construction of word co-occurrence networks and topic modeling using LDA, BERTopic, enabling identification of thematic patterns and semantic structures in text data. Data Visualization: Offers advanced network visualization features, supporting multiple layout algorithms. Analytical outcomes such as centrality or community detection can be directly reflected in the network map via node size, color, and position, enhancing intuitive understanding. === AI Assistant === NetMiner integrates with external large language models such as OpenAI GPT and Google Gemini to interpret complex analysis results in natural language, summarize key findings, and suggest next steps for exploration. === Workflow and Usability === Designed to follow the structure of real-world data analysis workflows, NetMiner adopts a hierarchical data organization (Project → Workspace → Dataset → Data Item). Its web-based user interface improves clarity and reduces complexity. NetMiner 5 supports Windows 10 or higher and macOS 11 or later with M1 chip. Both academic and commercial licenses are available. == Extension == NetMiner Extension is small program to extend the functionality of NetMiner. In other words, it enables you to customize NetMiner according to your needs. By adding ‘NetMiner Extension’, you can expand your research. === Web Data Collection === NetMiner allows users to collect data from services such as YouTube, OpenAlex, Springer, and KCI via Open APIs. Collected data is automatically preprocessed and transformed to fit NetMiner’s internal structure, requiring no additional coding or external tools. SNS Data Collector: It collects social media data from YouTube, which has a large number of social media users worldwide. Biblio Data Collector: It collects the bibliographic data from Springer, OpenAlex, and KCI essential for research trend analysis. == File formats == === NetMiner data file format === .NMF === Importable/exportable formats === Plain text data: .TXT, .CSV Microsoft Excel data: .XLS, .XLSX Unstructured text data: .TXT, .CSV, .XLS(X) ※ NetMiner 4 only NetMiner 2 data: .NTF UCINet data: .DL, .DAT Pajek data: .NET, .VEC, .CLU, .PER StOCNET data file: .DAT Graph Modelling Language data: .GML(importing only) Related software UCINET Pajek Gephi StoCNET == Data structure == === Hierarchy of NetMiner data structure === NetMiner 5 supports not only graph data composed of nodes and links, but also tabular and unstructured data without fixed schema or identifiers. This enables users to easily import a wide variety of raw and unstructured data suitable for machine learning applications. Within a single workspace, users can manage node sets, link sets, and structured/unstructured data simultaneously. Multiple graph layers under a node set can be organized in a tree structure, allowing for intuitive understanding of the data currently being analyzed. == Release history == The first version of NetMiner was released on Dec 21, 2001. There have been five major updates from 2001. === NetMiner 5 === Released on June 9, 2025. NetMiner 5 retains the core features and no-code concept of NetMiner 4, but has evolved by integrating cutting-edge AI technologies. AI Assistant, Personal Analytics Tutor Support for Graph, Structured, and Unstructured Data Graph Analytics / Social Network Analysis Machine Learning(M/L) & XAI Graph Machine Learning(GML): Graph Neural Network Text Mining: Natural Language Processing(NLP), Text Network, Topic Modeling Data Visualization === NetMiner 4 (2011) === Latest version is 4.5.1. Introduced Python scripting, encrypted NMF format, semantic analysis tools (word cloud, topic modeling), and Extension - Data Collector. === NetMiner 3 (2007) === Enhanced scalability, integrated analysis-visualization modules, and DB import from Oracle, MS SQL. === NetMiner 2 (2003) === Improved statistical and network measures, visualization algorithms, and external data import modules.

    Read more →
  • Intelligent database

    Intelligent database

    Until the 1980s, databases were viewed as computer systems that stored record-oriented and business data such as manufacturing inventories, bank records, and sales transactions. A database system was not expected to merge numeric data with text, images, or multimedia information, nor was it expected to automatically notice patterns in the data it stored. In the late 1980s the concept of an intelligent database was put forward as a system that manages information (rather than data) in a way that appears natural to users and which goes beyond simple record keeping. The term was introduced in 1989 by the book Intelligent Databases by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong. The concept postulated three levels of intelligence for such systems: high level tools, the user interface and the database engine. The high level tools manage data quality and automatically discover relevant patterns in the data with a process called data mining. This layer often relies on the use of artificial intelligence techniques. The user interface uses hypermedia in a form that uniformly manages text, images and numeric data. The intelligent database engine supports the other two layers, often merging relational database techniques with object orientation. In the twenty-first century, intelligent databases have now become widespread, e.g. hospital databases can now call up patient histories consisting of charts, text and x-ray images just with a few mouse clicks, and many corporate databases include decision support tools based on sales pattern analysis.

    Read more →
  • Grammar checker

    Grammar checker

    A grammar checker, in computing terms, is a program, or part of a program, that attempts to verify written text for grammatical correctness. Grammar checkers are most often implemented as a feature of a larger program, such as a word processor, but are also available as a stand-alone application that can be activated from within programs that work with editable text. The implementation of a grammar checker makes use of natural language processing. == History == The earliest "grammar checkers" were programs that checked for punctuation and style inconsistencies, rather than a complete range of possible grammatical errors. The first system was called Writer's Workbench, and was a set of writing tools included with Unix systems as far back as the 1970s. The whole Writer's Workbench package included several separate tools to check for various writing problems. The "diction" tool checked for wordy, trite, clichéd or misused phrases in a text. The tool would output a list of questionable phrases and provide suggestions for improving the writing. The "style" tool analyzed the writing style of a given text. It performed a number of readability tests on the text and output the results, and gave some statistical information about the sentences of the text. Aspen Software of Albuquerque, New Mexico released the earliest version of a diction and style checker for personal computers, Grammatik, in 1981. Grammatik was first available for a Radio Shack - TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software International of San Francisco, California, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking. Other early diction and style checking programs included Punctuation & Style, Correct Grammar, RightWriter and PowerEdit. While all the earliest programs started as simple diction and style checkers, all eventually added various levels of language processing, and developed some level of true grammar checking capability. Until 1992, grammar checkers were sold as add-on programs. There were a large number of different word processing programs available at that time, with WordPerfect and Microsoft Word the top two in market share. In 1992, Microsoft decided to add grammar checking as a feature of Word, and licensed CorrecText, a grammar checker from Houghton Mifflin that had not yet been marketed as a standalone product. WordPerfect answered Microsoft's move by acquiring Reference Software, and the direct descendant of Grammatik is still included with WordPerfect. As of 2019, grammar checkers are built into systems like Google Docs, browser extensions like Grammarly and Qordoba, desktop applications like Ginger, free and open-source software like LanguageTool, and text editor plugins like those available from WebSpellChecker Software. == Technical issues == The earliest writing style programs checked for wordy, trite, clichéd, or misused phrases in a text. This process was based on simple pattern matching. The heart of the program was a list of many hundreds or thousands of phrases that are considered poor writing by many experts. The list of questionable phrases included alternative wording for each phrase. The checking program would simply break text into sentences, check for any matches in the phrase dictionary, flag suspect phrases and show an alternative. These programs could also perform some mechanical checks. For example, they would typically flag doubled words, doubled punctuation, some capitalization errors, and other simple mechanical mistakes. True grammar checking is more complex. While a programming language has a very specific syntax and grammar, this is not so for natural languages. One can write a somewhat complete formal grammar for a natural language, but there are usually so many exceptions in real usage that a formal grammar is of minimal help in writing a grammar checker. One of the most important parts of a natural language grammar checker is a dictionary of all the words in the language, along with the part of speech of each word. The fact that a natural word may be used as any one of several parts of speech (such as "free" being used as an adjective, adverb, noun, or verb) greatly increases the complexity of any grammar checker. A grammar checker will find each sentence in a text, look up each word in the dictionary, and then attempt to parse the sentence into a form that matches a grammar. Using various rules, the program can then detect various errors, such as agreement in tense, number, word order, and so on. It is also possible to detect some stylistic problems with the text. For example, some popular style guides such as The Elements of Style deprecate excessive use of the passive voice. Grammar checkers may attempt to identify passive sentences and suggest an active-voice alternative. The software elements required for grammar checking are closely related to some of the development issues that need to be addressed for speech recognition software. In voice recognition, parsing can be used to help predict which word is most likely intended, based on part of speech and position in the sentence. In grammar checking, the parsing is used to detect words that fail to follow accepted grammar usage. Recently, research has focused on developing algorithms which can recognize grammar errors based on the context of the surrounding words. == Criticism == Grammar checkers are considered a type of foreign language writing aid which non-native speakers can use to proofread their writings as such programs endeavor to identify syntactical errors. However, as with other computerized writing aids such as spell checkers, popular grammar checkers are often criticized when they fail to spot errors and incorrectly flag correct text as erroneous. The linguist Geoffrey K. Pullum argued in 2007 that they were generally so inaccurate as to do more harm than good: "for the most part, accepting the advice of a computer grammar checker on your prose will make it much worse, sometimes hilariously incoherent."

    Read more →
  • Phase correlation

    Phase correlation

    Phase correlation is an approach to estimate the relative translative offset between two similar images (digital image correlation) or other data sets. It is commonly used in image registration and relies on a frequency-domain representation of the data, usually calculated by fast Fourier transforms. The term is applied particularly to a subset of cross-correlation techniques that isolate the phase information from the Fourier-space representation of the cross-correlogram. == Example == The following image demonstrates the usage of phase correlation to determine relative translative movement between two images corrupted by independent Gaussian noise. The image was translated by (20,23) pixels. Accordingly, one can clearly see a peak in the phase-correlation representation at approximately (20,23). == Method == Given two input images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} : Apply a window function (e.g., a Hamming window) on both images to reduce edge effects (this may be optional depending on the image characteristics). Then, calculate the discrete 2D Fourier transform of both images. G a = F { g a } , G b = F { g b } {\displaystyle \ \mathbf {G} _{a}={\mathcal {F}}\{g_{a}\},\;\mathbf {G} _{b}={\mathcal {F}}\{g_{b}\}} Calculate the cross-power spectrum by taking the complex conjugate of the second result, multiplying the Fourier transforms together elementwise, and normalizing this product elementwise. R = G a ∘ G b ∗ | G a ∘ G b ∗ | {\displaystyle \ R={\frac {\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}|}}} Where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product) and the absolute values are taken entry-wise as well. Written out entry-wise for element index ( j , k ) {\displaystyle (j,k)} : R j k = G a , j k ⋅ G b , j k ∗ | G a , j k ⋅ G b , j k ∗ | {\displaystyle \ R_{jk}={\frac {G_{a,jk}\cdot G_{b,jk}^{}}{|G_{a,jk}\cdot G_{b,jk}^{}|}}} Obtain the normalized cross-correlation by applying the inverse Fourier transform. r = F − 1 { R } {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}} Determine the location of the peak in r {\displaystyle \ r} . ( Δ x , Δ y ) = arg ⁡ max ( x , y ) { r } {\displaystyle \ (\Delta x,\Delta y)=\arg \max _{(x,y)}\{r\}} === Subpixel registration === Commonly, interpolation methods are used to estimate the peak location in the cross-correlogram to non-integer values, despite the fact that the data are discrete, and this procedure is often termed 'subpixel registration'. A large variety of subpixel interpolation methods are given in the technical literature. Common peak interpolation methods such as parabolic interpolation have been used, and the OpenCV computer vision package uses a centroid-based method, though these generally have inferior accuracy compared to more sophisticated methods. Because the Fourier representation of the data has already been computed, it is especially convenient to use the Fourier shift theorem with real-valued (sub-integer) shifts for this purpose, which essentially interpolates using the sinusoidal basis functions of the Fourier transform. An especially popular FT-based estimator is given by Foroosh et al. In this method, the subpixel peak location is approximated by a simple formula involving peak pixel value and the values of its nearest neighbors, where r ( 0 , 0 ) {\displaystyle r_{(0,0)}} is the peak value and r ( 1 , 0 ) {\displaystyle r_{(1,0)}} is the nearest neighbor in the x direction (assuming, as in most approaches, that the integer shift has already been found and the comparand images differ only by a subpixel shift). Δ x = r ( 1 , 0 ) r ( 1 , 0 ) ± r ( 0 , 0 ) {\displaystyle \ \Delta x={\frac {r_{(1,0)}}{r_{(1,0)}\pm r_{(0,0)}}}} The Foroosh et al. method is quite fast compared to most methods, though it is not always the most accurate. Some methods shift the peak in Fourier space and apply non-linear optimization to maximize the correlogram peak, but these tend to be very slow since they must apply an inverse Fourier transform or its equivalent in the objective function. It is also possible to infer the peak location from phase characteristics in Fourier space without the inverse transformation, as noted by Stone. These methods usually use a linear least squares (LLS) fit of the phase angles to a planar model. The long latency of the phase angle computation in these methods is a disadvantage, but the speed can sometimes be comparable to the Foroosh et al. method depending on the image size. They often compare favorably in speed to the multiple iterations of extremely slow objective functions in iterative non-linear methods. Since all subpixel shift computation methods are fundamentally interpolative, the performance of a particular method depends on how well the underlying data conform to the assumptions in the interpolator. This fact also may limit the usefulness of high numerical accuracy in an algorithm, since the uncertainty due to interpolation method choice may be larger than any numerical or approximation error in the particular method. Subpixel methods are also particularly sensitive to noise in the images, and the utility of a particular algorithm is distinguished not only by its speed and accuracy but its resilience to the particular types of noise in the application. == Rationale == The method is based on the Fourier shift theorem. Let the two images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} be circularly-shifted versions of each other: g b ( x , y ) = d e f g a ( ( x − Δ x ) mod M , ( y − Δ y ) mod N ) {\displaystyle \ g_{b}(x,y)\ {\stackrel {\mathrm {def} }{=}}\ g_{a}((x-\Delta x){\bmod {M}},(y-\Delta y){\bmod {N}})} (where the images are M × N {\displaystyle \ M\times N} in size). Then, the discrete Fourier transforms of the images will be shifted relatively in phase: G b ( u , v ) = G a ( u , v ) e − 2 π i ( u Δ x M + v Δ y N ) {\displaystyle \mathbf {G} _{b}(u,v)=\mathbf {G} _{a}(u,v)e^{-2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}} One can then calculate the normalized cross-power spectrum to factor out the phase difference: R ( u , v ) = G a G b ∗ | G a G b ∗ | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ | = e 2 π i ( u Δ x M + v Δ y N ) {\displaystyle {\begin{aligned}R(u,v)&={\frac {\mathbf {G} _{a}\mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\mathbf {G} _{b}^{}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}|}}\\&=e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}\end{aligned}}} since the magnitude of an imaginary exponential always is one, and the phase of G a G a ∗ {\displaystyle \ \mathbf {G} _{a}\mathbf {G} _{a}^{}} always is zero. The inverse Fourier transform of a complex exponential is a Dirac delta function, i.e. a single peak: r ( x , y ) = δ ( x + Δ x , y + Δ y ) {\displaystyle \ r(x,y)=\delta (x+\Delta x,y+\Delta y)} This result could have been obtained by calculating the cross correlation directly. The advantage of this method is that the discrete Fourier transform and its inverse can be performed using the fast Fourier transform, which is much faster than correlation for large images. === Benefits === Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images. The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation. === Limitations === In practice, it is more likely that g b {\displaystyle \ g_{b}} will be a simple linear shift of g a {\displaystyle \ g_{a}} , rather than a circular shift as required by the explanation above. In such cases, r {\displaystyle \ r} will not be a simple delta function, which will reduce the performance of the method. In such cases, a window function (such as a Gaussian or Tukey window) should be employed during the Fourier transform to reduce edge effects, or the images should be zero padded so that the edge effects can be ignored. If the images consist of a flat background, with all detail situated away from the edges, then a linear shift will be equivalent to a circular shift, and the above derivation will hold exactly. The peak can be sharpened by using edge or vector correlation. For periodic images (such as a chessboard or picket fence), phase correlation may yield ambiguous results with several peaks in the resulting output. == Applications == Phase correlation is the preferred m

    Read more →
  • Brill tagger

    Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is: a form of supervised learning, which aims to minimize error; and, a transformation-based process, in the sense that a tag is assigned to each word and changed using a set of predefined rules. In the transformation process, if the word is known, it first assigns the most frequent tag, or if the word is unknown, it naively assigns the tag "noun" to it. High accuracy is eventually achieved by applying these rules iteratively and changing the incorrect tags. This approach ensures that valuable information such as the morphosyntactic construction of words is employed in an automatic tagging process. == Algorithm == The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase: Initialization: Known words (in vocabulary): assigning the most frequent tag associated to a form of the word Unknown word == Rules and processing == The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations). After all word tokens have (provisional) tags, contextual rules apply iteratively, to correct the tags by examining small amounts of context. This is where the Brill method differs from other part of speech tagging methods such as those using Hidden Markov Models. Rules are reapplied repeatedly, until a threshold is reached, or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation: IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun), if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple Finite-state machines. See Part of speech tagging for more general information including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version as it was available at Plymouth Tech can be found on Archive.org. The software uses the MIT License.

    Read more →
  • Wide-column store

    Wide-column store

    A wide-column store (or extensible record store) is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store. Google's Bigtable is one of the prototypical examples of a wide-column store. == Wide-column stores versus columnar databases == Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. In genuine column stores, a columnar data layout is adopted such that each column is stored separately on disk. Wide-column stores do often support the notion of column families that are stored separately. However, each such column family typically contains multiple columns that are used together, similar to traditional relational database tables. Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately. Wide-column stores that support column families are also known as column family databases. == Notable examples == Notable wide-column stores include: Apache Accumulo Apache Cassandra Apache HBase Bigtable DataStax Enterprise (uses Apache Cassandra) DataStax Astra DB (uses Apache Cassandra) Hypertable Azure Tables ScyllaDB

    Read more →
  • 1.58-bit large language model

    1.58-bit large language model

    A 1.58-bit large language model (also known as a ternary LLM) is a type of large language model (LLM) designed to be computationally efficient. It achieves this by using weights that are restricted to only three values: -1, 0, and +1. This restriction significantly reduces the model's memory footprint and allows for faster processing, as computationally expensive multiplication operations can be replaced with lower-cost additions. This contrasts with traditional models that use 16-bit floating-point numbers (FP16 or BF16) for their weights. Studies have shown that for models up to several billion parameters, the performance of 1.58-bit LLMs on various tasks is comparable to their full-precision counterparts. This approach could enable powerful AI to run on less specialized and lower-power hardware. The name "1.58-bit" comes from the fact that a system with three states contains log 2 ⁡ 3 ≈ 1.58 {\displaystyle \log _{2}3\approx 1.58} bits of information. These models are sometimes also referred to as 1-bit LLMs in research papers, although this term can also refer to true binary models (with weights of -1 and +1). == BitNet == In 2024, Ma et al., researchers at Microsoft, declared that their 1.58-bit model, BitNet b1.58 is comparable in performance to the 16-bit Llama 2 and opens the era of 1-bit LLM. BitNet creators did not use the post-training quantization of weights but instead relied on the new BitLinear transform that replaced the nn.Linear layer of the traditional transformer design. In 2025, Microsoft researchers had released an open-weights and open inference code model BitNet b1.58 2B4T demonstrating performance competitive with the full precision models at 2B parameters and 4T training tokens. == Post-training quantization == BitNet derives its performance from being trained natively in 1.58 bit instead of being quantized from a full-precision model after training. Still, training is an expensive process and it would be desirable to be able to somehow convert an existing model to 1.58 bits. In 2024, HuggingFace reported a way to gradually ramp up the 1.58-bit quantization in fine-tuning an existing model down to 1.58 bits. == Critique == Some researchers point out that the scaling laws of large language models favor the low-bit weights only in case of undertrained models. As the number of training tokens increases, the deficiencies of low-bit quantization surface.

    Read more →
  • Transfer learning

    Transfer learning

    Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing or transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency. Since transfer learning makes use of training with multiple objective functions it is related to cost-sensitive machine learning and multi-objective optimization. == History == In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training. The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning. In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm. By 1998, the field had advanced to include multi-task learning, along with more formal theoretical foundations. Influential publications on transfer learning include the book Learning to Learn in 1998, a 2009 survey and a 2019 survey. Ng said in his NIPS 2016 tutorial that TL would become the next driver of machine learning commercial success after supervised learning. In the 2020 paper, "Rethinking Pre-Training and self-training", Zoph et al. reported that pre-training can hurt accuracy, and advocate self-training instead. == Definition == The definition of transfer learning is given in terms of domains and tasks. A domain D {\displaystyle {\mathcal {D}}} consists of: a feature space X {\displaystyle {\mathcal {X}}} and a marginal probability distribution P ( X ) {\displaystyle P(X)} , where X = { x 1 , . . . , x n } ∈ X {\displaystyle X=\{x_{1},...,x_{n}\}\in {\mathcal {X}}} . Given a specific domain, D = { X , P ( X ) } {\displaystyle {\mathcal {D}}=\{{\mathcal {X}},P(X)\}} , a task consists of two components: a label space Y {\displaystyle {\mathcal {Y}}} and an objective predictive function f : X → Y {\displaystyle f:{\mathcal {X}}\rightarrow {\mathcal {Y}}} . The function f {\displaystyle f} is used to predict the corresponding label f ( x ) {\displaystyle f(x)} of a new instance x {\displaystyle x} . This task, denoted by T = { Y , f ( x ) } {\displaystyle {\mathcal {T}}=\{{\mathcal {Y}},f(x)\}} , is learned from the training data consisting of pairs { x i , y i } {\displaystyle \{x_{i},y_{i}\}} , where x i ∈ X {\displaystyle x_{i}\in {\mathcal {X}}} and y i ∈ Y {\displaystyle y_{i}\in {\mathcal {Y}}} . Given a source domain D S {\displaystyle {\mathcal {D}}_{S}} and learning task T S {\displaystyle {\mathcal {T}}_{S}} , a target domain D T {\displaystyle {\mathcal {D}}_{T}} and learning task T T {\displaystyle {\mathcal {T}}_{T}} , where D S ≠ D T {\displaystyle {\mathcal {D}}_{S}\neq {\mathcal {D}}_{T}} , or T S ≠ T T {\displaystyle {\mathcal {T}}_{S}\neq {\mathcal {T}}_{T}} , transfer learning aims to help improve the learning of the target predictive function f T ( ⋅ ) {\displaystyle f_{T}(\cdot )} in D T {\displaystyle {\mathcal {D}}_{T}} using the knowledge in D S {\displaystyle {\mathcal {D}}_{S}} and T S {\displaystyle {\mathcal {T}}_{S}} . == Applications == Algorithms for transfer learning are available in Markov logic networks and Bayesian networks. Transfer learning has been applied to cancer subtype discovery, building utilization, general game playing, text classification, digit recognition, medical imaging and spam filtering. In 2020, it was discovered that, due to their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves, from the gesture recognition domain to the mental state recognition domain. It was noted that this relationship worked in both directions, showing that electroencephalographic can likewise be used to classify EMG. The experiments noted that the accuracy of neural networks and convolutional neural networks were improved through transfer learning both prior to any learning (compared to standard random weight distribution) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of fully-connected layers to improve performance.

    Read more →