AI Assistant Qgis

AI Assistant Qgis — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Echo Lake (software)

    Echo Lake (software)

    Echo Lake (AKA Family Album Creator) was the most notable multimedia software product produced by Delrina, which debuted in June 1995. It was touted internally as a "cross [of] Quark Xpress and Myst". It featured an immersive 3D environment where a user could go to a virtual desktop in a virtual office and assemble video and audio clips along with images, and then print them out as either a virtual book other users of the program could use, or for print. It was a highly innovative product for its time, and ultimately was hampered by the inability of many users able to input their own multimedia content easily into a computer from that period. Creative Wonders bought the rights to the Echo Lake multimedia product, which was re-shaped as an introductory program on multimedia and re-released as Family Album Creator in 1996.

    Read more →
  • Best AI Content Generators in 2026

    Best AI Content Generators in 2026

    Trying to pick the best AI content generator? An AI content generator is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI content generator slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Radford M. Neal

    Radford M. Neal

    Radford M. Neal (born September 12, 1956) is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he held a Canada research chair in statistics and machine learning. == Education and career == Neal studied computer science at the University of Calgary, where he received his B.Sc. in 1977 and M.Sc. in 1980, with thesis work supervised by David Hill. He worked for several years as a sessional instructor at the University of Calgary and as a statistical consultant in the industry before coming back to the academia. Neal continued his study at the University of Toronto, where he received his Ph.D. in 1995 under the supervision of Geoffrey Hinton. Neal became an assistant professor at the University of Toronto in 1995, an associated professor in 1999 and a full professor since 2001. He was the Canada Research Chair in Statistics and Machine Learning from 2003 to 2016 and retired in 2017. Neal has made great contributions in the area of machine learning and statistics, where he is particularly well known for his work on Markov chain Monte Carlo, error correcting codes and Bayesian learning for neural networks. He is also known for his blog and as the developer of pqR: a new version of the R interpreter.

    Read more →
  • Restricted Boltzmann machine

    Restricted Boltzmann machine

    A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986, and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning, topic modelling, immunology, and even many‑body quantum mechanics. They can be trained in either supervised or unsupervised ways, depending on the task. As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph: a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group. By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm. Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation. == Structure == The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights W {\displaystyle W} of size m × n {\displaystyle m\times n} . Each weight element ( w i , j ) {\displaystyle (w_{i,j})} of the matrix is associated with the connection between the visible (input) unit v i {\displaystyle v_{i}} and the hidden unit h j {\displaystyle h_{j}} . In addition, there are bias weights (offsets) a i {\displaystyle a_{i}} for v i {\displaystyle v_{i}} and b j {\displaystyle b_{j}} for h j {\displaystyle h_{j}} . Given the weights and biases, the energy of a configuration (pair of Boolean vectors) (v,h) is defined as E ( v , h ) = − ∑ i a i v i − ∑ j b j h j − ∑ i ∑ j v i w i , j h j {\displaystyle E(v,h)=-\sum _{i}a_{i}v_{i}-\sum _{j}b_{j}h_{j}-\sum _{i}\sum _{j}v_{i}w_{i,j}h_{j}} or, in matrix notation, E ( v , h ) = − a T v − b T h − v T W h . {\displaystyle E(v,h)=-a^{\mathrm {T} }v-b^{\mathrm {T} }h-v^{\mathrm {T} }Wh.} This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows, P ( v , h ) = 1 Z e − E ( v , h ) {\displaystyle P(v,h)={\frac {1}{Z}}e^{-E(v,h)}} where Z {\displaystyle Z} is a partition function defined as the sum of e − E ( v , h ) {\displaystyle e^{-E(v,h)}} over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of P ( v , h ) {\displaystyle P(v,h)} over all possible hidden layer configurations, P ( v ) = 1 Z ∑ { h } e − E ( v , h ) {\displaystyle P(v)={\frac {1}{Z}}\sum _{\{h\}}e^{-E(v,h)}} , and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations. That is, for m visible units and n hidden units, the conditional probability of a configuration of the visible units v, given a configuration of the hidden units h, is P ( v | h ) = ∏ i = 1 m P ( v i | h ) {\displaystyle P(v|h)=\prod _{i=1}^{m}P(v_{i}|h)} . Conversely, the conditional probability of h given v is P ( h | v ) = ∏ j = 1 n P ( h j | v ) {\displaystyle P(h|v)=\prod _{j=1}^{n}P(h_{j}|v)} . The individual activation probabilities are given by P ( h j = 1 | v ) = σ ( b j + ∑ i = 1 m w i , j v i ) {\displaystyle P(h_{j}=1|v)=\sigma \left(b_{j}+\sum _{i=1}^{m}w_{i,j}v_{i}\right)} and P ( v i = 1 | h ) = σ ( a i + ∑ j = 1 n w i , j h j ) {\displaystyle \,P(v_{i}=1|h)=\sigma \left(a_{i}+\sum _{j=1}^{n}w_{i,j}h_{j}\right)} where σ {\displaystyle \sigma } denotes the logistic sigmoid. The visible units of Restricted Boltzmann Machine can be multinomial, although the hidden units are Bernoulli. In this case, the logistic function for visible units is replaced by the softmax function P ( v i k = 1 | h ) = exp ⁡ ( a i k + Σ j W i j k h j ) Σ k ′ = 1 K exp ⁡ ( a i k ′ + Σ j W i j k ′ h j ) {\displaystyle P(v_{i}^{k}=1|h)={\frac {\exp(a_{i}^{k}+\Sigma _{j}W_{ij}^{k}h_{j})}{\Sigma _{k'=1}^{K}\exp(a_{i}^{k'}+\Sigma _{j}W_{ij}^{k'}h_{j})}}} where K is the number of discrete values that the visible values have. They are applied in topic modeling, and recommender systems. === Relation to other models === Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields. The graphical model of RBMs corresponds to that of factor analysis. == Training algorithm == Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set V {\displaystyle V} (a matrix, each row of which is treated as a visible vector v {\displaystyle v} ), arg ⁡ max W ∏ v ∈ V P ( v ) {\displaystyle \arg \max _{W}\prod _{v\in V}P(v)} or equivalently, to maximize the expected log probability of a training sample v {\displaystyle v} selected randomly from V {\displaystyle V} : arg ⁡ max W E [ log ⁡ P ( v ) ] {\displaystyle \arg \max _{W}\mathbb {E} \left[\log P(v)\right]} The algorithm most often used to train RBMs, that is, to optimize the weight matrix W {\displaystyle W} , is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models. The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute weight update. The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows: Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution. Compute the outer product of v and h and call this the positive gradient. From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step) Compute the outer product of v' and h' and call this the negative gradient. Let the update to the weight matrix W {\displaystyle W} be the positive gradient minus the negative gradient, times some learning rate: Δ W = ϵ ( v h T − v ′ h ′ T ) {\displaystyle \Delta W=\epsilon (vh^{\mathsf {T}}-v'h'^{\mathsf {T}})} . Update the biases a and b analogously: Δ a = ϵ ( v − v ′ ) {\displaystyle \Delta a=\epsilon (v-v')} , Δ b = ϵ ( h − h ′ ) {\displaystyle \Delta b=\epsilon (h-h')} . A Practical Guide to Training RBMs written by Hinton can be found on his homepage. == Stacked Restricted Boltzmann Machine == The difference between the Stacked Restricted Boltzmann Machines and RBM is that RBM has lateral connections within a layer that are prohibited to make analysis tractable. On the other hand, the Stacked Boltzmann consists of a combination of an unsupervised three-layer network with symmetric weights and a supervised fine-tuned top layer for recognizing three classes. The usage of Stacked Boltzmann is to understand Natural languages, retrieve documents, image generation, and classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected symmetric top layer, with a two-way unsymmetric layer for connection for RBM. The restricted Boltzmann's connection is three-layers with asymmetric weights, and two networks are combined into one. Stacked Boltzmann does share similarities with RBM, the neuron for Stacked Boltzmann is a stochastic binary Hopfield neuron, which is the same as the Restricted Boltzmann Machine. The energy from both Restricted Boltzmann and RBM is given by Gibb's probability measure: E = − 1 2 ∑ i , j w i j s i s j + ∑ i θ i s i {\displaystyle E=-{\frac {1}{2}}\sum _{i,j}{w_{ij}{s_{i}}{s_{j}}}+\sum _{i}{\theta _{i}}{s_{i}}} . The training process of Restricted Boltzmann is similar to RBM. Restricted Boltzmann train one layer at a time and approximate equilibrium state with a 3-segment pass, not performing back propagation. Restricted Boltzmann uses both supervised and unsupervised on different RBM for pre-training for classification and recognition. The training uses contrastive divergence with

    Read more →
  • Whitelist

    Whitelist

    A whitelist or allowlist is a list or register of entities that are being provided a particular privilege, service, mobility, access or recognition. Entities on the list will be accepted, approved and/or recognized. Whitelisting is the reverse of blacklisting, the practice of identifying entities that are denied, unrecognized, or ostracized. == Email whitelists == Spam filters often include the ability to "whitelist" certain sender IP addresses, email addresses or domain names to protect their email from being rejected or sent to a junk mail folder. These can be manually maintained by the user or system administrator - but can also refer to externally maintained whitelist services. === Non-commercial whitelists === Non-commercial whitelists are operated by various non-profit organizations, ISPs, and others interested in blocking spam. Rather than paying fees, the sender must pass a series of tests; for example, their email server must not be an open relay and have a static IP address. The operator of the whitelist may remove a server from the list if complaints are received. === Commercial whitelists === Commercial whitelists are a system by which an Internet service provider allows someone to bypass spam filters when sending email messages to its subscribers, in return for a pre-paid fee, either an annual or a per-message fee. A sender can then be more confident that their messages have reached recipients without being blocked, or having links or images stripped out of them, by spam filters. The purpose of commercial whitelists is to allow companies to reliably reach their customers by email. == Advertising whitelist == Many websites rely on ads as a source of revenue, but the use of ad blockers is increasingly common. Websites that detect an adblocker in use often ask for it to be disabled - or their site to be "added to the whitelist" - a standard feature of most adblockers. == Network whitelists == === LAN whitelists === A use for whitelists is in local area network (LAN) security. Many network admins set up MAC address whitelists, or a MAC address filter, to control who is allowed on their networks. This is used when encryption is not a practical solution or in tandem with encryption. However, it's sometimes ineffective because a MAC address can be faked. === IP whitelist === Firewalls can usually be configured to only allow data-traffic from/to certain (ranges of) IP-addresses. === Application whitelists === One approach in combating viruses and malware is to whitelist software which is considered safe to run, blocking all others. This is particularly attractive in a corporate environment, where there are typically already restrictions on what software is approved. Leading providers of application whitelisting technology include Bit9, Velox, McAfee, Lumension, ThreatLocker, Airlock Digital and SMAC. On Microsoft Windows, recent versions include AppLocker, which allows administrators to control which executable files are denied or allowed to execute. With AppLocker, administrators are able to create rules based on file names, publishers or file location that will allow certain files to execute. Rules can apply to individuals or groups. Policies are used to group users into different enforcement levels. For example, some users can be added to a report-only policy that will allow administrators to understand the impact before moving that user to a higher enforcement level. Linux systems typically have AppArmor and SE Linux features available which can be used to effectively block all applications which are not explicitly whitelisted, and commercial products are also available. On HP-UX introduced a feature called "HP-UX Whitelisting" on 11iv3 version. == Controversy regarding name == In 2018, a journal commentary on a report on predatory publishing was released making claims that "white" and "black" are racially charged terms that need to be avoided in instances such as "whitelist" and "blacklist". The premise of the journal is that "black" and "white" have negative and positive connotations respectively. It states that since "blacklisting" was first referred to during "the time of mass enslavement and forced deportation of Africans to work in European-held colonies in the Americas," the word is therefore related to race. There is no mention of "whitelist" and its origin or relation to race. This issue is most widely disputed in computing industries where "whitelist" and "blacklist" are prevalent (e.g. IP whitelisting). Despite the commentary nature of the journal, some companies and individuals in others have taken to replacing "whitelist" and "blacklist" with new alternatives such as "allow list" and "deny list". Those adopting this change consider using the "whitelist"/"blacklist" names as a code smell. Those that oppose these changes question its attribution to race, citing the same etymology quote that the 2018 journal uses. According to the remark, the term "blacklist" evolved from the term "black book" about a century ago. The term "black book" does not appear to have any etymology or sources that support racial associations, instead originating in the 1400s as a reference to "a list of people who had committed crimes or fallen out of favor with leaders", and popularized by King Henry VIII's literal use of a black book. Others also note the prevalence of positive and negative connotations to "white" and "black" in the Bible, predating attributions to skin tone and slavery. It wasn't until the 1960s Black Power movement that "Black" became a widespread word to refer to one's race as a person of color in America (alternate to African-American) lending itself to the argument that the negative connotation behind "black" and "blacklist" both predate attribution to race.

    Read more →
  • Yun Sing Koh

    Yun Sing Koh

    Yun Sing Koh (born 1978) is a New Zealand computer science academic, and is a full professor at the University of Auckland, specialising in machine learning and artificial intelligence. She is a co-director of the Centre of Machine Learning for Social Good, and the Advanced Machine Learning and Data Analytics Research (MARS) Lab at Auckland. == Academic career == Koh earned a Bachelor of Science with Honours and a Master of Software Engineering at the University of Malaya. She then completed a PhD titled Generating sporadic association rules at the University of Otago in 2007. Koh joined the faculty of the University of Auckland in 2010, rising to full professor. As of 2024, she is director of the Centre of Machine Learning for Social Good at Auckland, alongside Gillian Dobbie and Daniel Wilson, and is director of the Master of AI course at the university. Koh also co-directs the Advanced Machine Learning and Data Analytics Research (MARS) Lab. Koh's research covers machine learning and artificial intelligence. She is especially interested in designing machine learning algorithms for data streams, and has led research using AI systems to identify individual stoats for pest population research. In 2018 she was awarded a Marsden grant for a research project "An Adaptive Predictive System for Life-long Learning on Data Streams", and has been part of three MBIE projects. In 2025 the stoat identification project Koh co-leads with Daniel Wilson was awarded $1 million per annum by the MBIE Smart Ideas fund. Koh was a finalist in the AI in Climate section of the Women in AI Australia and New Zealand Awards in 2022. She was a 2023 Fellow at the United States National Science Foundation-funded Convergence Research (CORE) Institute. Koh has chaired a number of sessions at international conferences on data mining. In March 2026 it was announced that Koh would be a member of the New Zealand Human Rights Commission's Expert Advisory Group on Artificial Intelligence, Emerging Digital Technologies and Human Rights. == Selected works == Philippe Fournier-Viger; Jerry Chun-Wei Lin; Rage Uday Kiran; Yun Sing Koh; Rincy Thomas (2017). "A Survey of Sequential Pattern Mining". Data Science and Pattern Recognition. 1 (1): 54–77. Wikidata Q138719481. Yun Sing Koh; Nathan Rountree; Richard O’Keefe (1 April 2006). "Finding Non-Coincidental Sporadic Rules Using Apriori-Inverse". International Journal of Data Warehousing and Mining (in Ndonga). 2 (2): 38–54. doi:10.4018/JDWM.2006040102. ISSN 1548-3924. Wikidata Q125185222. Russel Pears; Sripirakas Sakthithasan; Yun Sing Koh (11 January 2014). "Detecting concept change in dynamic data streams". Machine Learning. 97 (3): 259–293. doi:10.1007/S10994-013-5433-9. ISSN 1573-0565. Zbl 1319.68186. Wikidata Q125185156. David Tse Jung Huang; Yun Sing Koh; Gillian Dobbie; Russel Pears (December 2014), Detecting Volatility Shift in Data Streams, Institute of Electrical and Electronics Engineers, doi:10.1109/ICDM.2014.50, Wikidata Q125185151 Sidney Tsang; Yun Sing Koh; Gillian Dobbie (2011). "RP-Tree: Rare Pattern Tree Mining". Lecture Notes in Computer Science: 277–288. doi:10.1007/978-3-642-23544-3_21. ISSN 0302-9743. Wikidata Q125185206. Yun Sing Koh; Sri Devi Ravana (24 May 2016). "Unsupervised Rare Pattern Mining". ACM Transactions on Knowledge Discovery from Data. 10 (4): 1–29. doi:10.1145/2898359. ISSN 1556-4681. Wikidata Q125185136. Jack Julian; Yun Sing Koh; Albert Bifet (1 October 2025), Building adaptive knowledge bases for evolving continual learning models (PDF), vol. 1, doi:10.1038/S44387-025-00028-4, Wikidata Q138719496

    Read more →
  • Bin Yang

    Bin Yang

    Bin Yang (Chinese: 杨彬; Pinyin: Yáng Bīn) is a professor of computer science the department of computer science, Aalborg University. His research interests include data management and machine learning. == Education and career == Bin Yang received his bachelor and master degrees from Northwestern Polytechnical University, China in 2004 and 2007, respectively, and his Ph.D. from Fudan University in China in 2010. From 2010 to 2011, he worked at the Databases and Information Systems department at Max-Planck-Institut für Informatik in Germany. From 2011 to 2014, he was employed at the department of computer science, Aarhus University. He has been employed at Aalborg University since 2014. At the present moment, he works on a number of different projects: Time Series Analytics and Spatio-temporal Data Management, funded by Huawei, 2020 - 2022. Light-AI for Cognitive Power Electronics, funded by Villum Synergy Programme, 2020 - 2022. Advance: A Data-Intensive Paradigm for Dynamic, Uncertain Networks, funded by Independent Research Fund Denmark, 2019 - 2023. Algorithmic Foundations for Data-Intensive Routing, funded by The Danish Agency for Science and Higher Education, 2019 - 2021. Astra: AnalyticS of Time seRies in spAtial networks, funded by Independent Research Fund Denmark, 2018 - 2021. Distinguished Scholar, funded by The Technical Faculty of IT and Design, Aalborg University, 2018 - 2021. == Awards == Bin Yang has received a series of awards throughout his career: Sapere Aude Research Leader, Independent Research Fund Denmark, 2018. Distinguished Scholar, The Technical Faculty of IT and Design, Aalborg University, 2018. Early Career Distinguished Lecturer, 20th IEEE International Conference on Mobile Data Management (MDM), 2019. Distinguished Program Committee Member, 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019 Best paper award at IEEE 14th International Conference on Mobile Data Management (MDM2013), Milan, Italy Best demo award at IEEE 14th International Conference on Mobile Data Management (MDM2013), Milan, Italy 2015 best paper in Pervasive and Embedded Computing, Shanghai Computer Academy == Selected publications == Sean Bin Yang, Chenjuan Guo, Jilin Hu, Jian Tang, and Bin Yang. Unsupervised Path Representation Learning with Curriculum Negative Sampling. IJCAI 2021. Razvan-Gabriel Cirstea, Tung Kieu, Chenjuan Guo, Bin Yang, and Sinno Jialin Pan. EnhanceNet: Plugin Neural Networks for Enhancing Correlated Time Series Forecasting. ICDE 2021. Sean Bin Yang, Chenjuan Guo, and Bin Yang. Context-Aware Path Ranking in Road Networks. TKDE 2021. Simon Aagaard Pedersen, Bin Yang, and Christian S. Jensen. Anytime Stochastic Routing with Hybrid Learning. PVLDB 13(9): 1555-1567 (2020). Tung Kieu, Bin Yang, Chenjuan Guo, and Christian S. Jensen. Outlier Detection for Time Series with Recurrent Autoencoder Ensembles. IJCAI 2019, 2725–2732. Jilin Hu, Chenjuan Guo, Bin Yang, and Christian S. Jensen. Stochastic Weight Completion for Road Networks using Graph Convolutional Networks. ICDE 2019, 1274–1285. Chenjuan Guo, Bin Yang, Jilin Hu, and Christian S. Jensen. Learning to Route with Sparse Trajectory Sets. ICDE 2018, 1073–1084. Bin Yang, Jian Dai, Chenjuan Guo, Christian S. Jensen, and Jilin Hu. PACE: A PAth-CEntric Paradigm For Stochastic Path Finding. The VLDB Journal 27(2): 153-178 (2018). Jian Dai, Bin Yang, Chenjuan Guo, and Zhiming Ding. Personalized Route Recommendation using Big Trajectory Data. ICDE 2015, 543–554, Seoul, Korea, April 2015. Bin Yang, Manohar Kaul, and Christian S. Jensen. Using Incomplete Information for Complete Weight Annotation of Road Networks. TKDE 26(5):1267-1279. Bin Yang, Chenjuan Guo, and Christian S. Jensen. Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models. PVLDB 6(9):769-780. VLDB 2013, Riva del Garda, Trento, Italy, August 2013.

    Read more →
  • Markov chain Monte Carlo

    Markov chain Monte Carlo

    In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it, i.e. the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too high dimensional to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the Metropolis–Hastings algorithm. == General explanation == Markov chain Monte Carlo methods create samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, as its expected value or variance. Practically, an ensemble of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities. Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlations of samples introduces the need to use the Markov chain central limit theorem when estimating the error of mean values. These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given. == History == The development of MCMC methods is deeply rooted in the early exploration of Monte Carlo (MC) techniques in the mid-20th century, particularly in physics. These developments were marked by the Metropolis algorithm proposed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller in 1953, which was designed to tackle high-dimensional integration problems using early computers. Then in 1970, W. K. Hastings generalized this algorithm and inadvertently introduced the component-wise updating idea, later known as Gibbs sampling. Simultaneously, the theoretical foundations for Gibbs sampling were being developed, such as the Hammersley–Clifford theorem from Julian Besag's 1974 paper. Although the seeds of MCMC were sown earlier, including the formal naming of Gibbs sampling in image processing by Stuart Geman and Donald Geman (1984) and the data augmentation method by Martin A. Tanner and Wing Hung Wong (1987), its "revolution" in mainstream statistics largely followed demonstrations of the universality and ease of implementation of sampling methods (especially Gibbs sampling) for complex statistical (particularly Bayesian) problems, spurred by increasing computational power and software like BUGS. This transformation was accompanied by significant theoretical advancements, such as Luke Tierney's (1994) rigorous treatment of MCMC convergence, and Jun S. Liu, Wong, and Augustine Kong's (1994, 1995) analysis of Gibbs sampler structure. Subsequent developments further expanded the MCMC toolkit, including particle filters (Sequential Monte Carlo) for sequential problems, Perfect sampling aiming for exact simulation (Jim Propp and David B. Wilson, 1996), RJMCMC (Peter J. Green, 1995) for handling variable-dimension models, and deeper investigations into convergence diagnostics and the central limit theorem. Overall, the evolution of MCMC represents a paradigm shift in statistical computation, enabling the analysis of numerous previously intractable complex models and continually expanding the scope and impact of statistics. == Mathematical setting == Suppose (Xn) is a Markov Chain in the general state space X {\displaystyle {\mathcal {X}}} with specific properties. We are interested in the limiting behavior of the partial sums: S n ( h ) = 1 n ∑ i = 1 n h ( X i ) {\displaystyle S_{n}(h)={\dfrac {1}{n}}\sum _{i=1}^{n}h(X_{i})} as n goes to infinity. Particularly, we hope to establish the Law of Large Numbers and the Central Limit Theorem for MCMC. In the following, we state some definitions and theorems necessary for the important convergence results. In short, we need the existence of invariant measure and Harris recurrent to establish the Law of Large Numbers of MCMC (Ergodic Theorem). And we need aperiodicity, irreducibility and extra conditions such as reversibility to ensure the Central Limit Theorem holds in MCMC. === Irreducibility and aperiodicity === Recall that in the discrete setting, a Markov chain is said to be irreducible if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, φ-irreducibility generalizes irreducibility by using a reference measure φ on the measurable space ( X , B ( X ) ) {\displaystyle ({\mathcal {X}},{\mathcal {B}}({\mathcal {X}}))} . Definition (φ-irreducibility) Given a measure φ {\displaystyle \varphi } defined on ( X , B ( X ) ) {\displaystyle ({\mathcal {X}},{\mathcal {B}}({\mathcal {X}}))} , the Markov chain ( X n ) {\displaystyle (X_{n})} with transition kernel K ( x , y ) {\displaystyle K(x,y)} is φ-irreducible if, for every A ∈ B ( X ) {\displaystyle A\in {\mathcal {B}}({\mathcal {X}})} with φ ( A ) > 0 {\displaystyle \varphi (A)>0} , there exists n {\displaystyle n} such that K n ( x , A ) > 0 {\displaystyle K^{n}(x,A)>0} for all x ∈ X {\displaystyle x\in {\mathcal {X}}} (Equivalently, P x ( τ A < ∞ ) > 0 {\displaystyle P_{x}(\tau _{A}<\infty )>0} , here τ A = inf { n ≥ 1 ; X n ∈ A } {\displaystyle \tau _{A}=\inf\{n\geq 1;X_{n}\in A\}} is the first n {\displaystyle n} for which the chain enters the set A {\displaystyle A} ). This is a more general definition for irreducibility of a Markov chain in non-discrete state space. In the discrete case, an irreducible Markov chain is said to be aperiodic if it has period 1. Formally, the period of a state ω ∈ X {\displaystyle \omega \in {\mathcal {X}}} is defined as: d ( ω ) := g c d { m ≥ 1 ; K m ( ω , ω ) > 0 } {\displaystyle d(\omega ):=\mathrm {gcd} \{m\geq 1\,;\,K^{m}(\omega ,\omega )>0\}} For the general (non-discrete) case, we define aperiodicity in terms of small sets: Definition (Cycle length and small sets) A φ-irreducible Markov chain ( X n ) {\displaystyle (X_{n})} has a cycle of length d if there exists a small set C {\displaystyle C} , an associated integer M {\displaystyle M} , and a probability distribution ν M {\displaystyle \nu _{M}} such that d is the greatest common divisor of: { m ≥ 1 ; ∃ δ m > 0 such that C is small for ν m ≥ δ m ν M } . {\displaystyle \{m\geq 1\,;\,\exists \,\delta _{m}>0{\text{ such that }}C{\text{ is small for }}\nu _{m}\geq \delta _{m}\nu _{M}\}.} A set C {\displaystyle C} is called small if there exists m ∈ N ∗ {\displaystyle m\in \mathbb {N} ^{}} and a nonzero measure ν m {\displaystyle \nu _{m}} such that: K m ( x , A ) ≥ ν m ( A ) , ∀ x ∈ C , ∀ A ∈ B ( X ) . {\displaystyle K^{m}(x,A)\geq \nu _{m}(A),\quad \forall x\in C,\,\forall A\in {\mathcal {B}}({\mathcal {X}}).} === Harris recurrent === Definition (Harris recurrence) A set A {\displaystyle A} is Harris recurrent if P x ( η A = ∞ ) = 1 {\displaystyle P_{x}(\eta _{A}=\infty )=1} for all x ∈ A {\displaystyle x\in A} , where η A = ∑ n = 1 ∞ I A ( X n ) {\displaystyle \eta _{A}=\sum _{n=1}^{\infty }\mathbb {I} _{A}(X_{n})} is the number of visits of the chain ( X n ) {\displaystyle (X_{n})} to the set A {\displaystyle A} . The chain ( X n ) {\displaystyle (X_{n})} is said to be Harris recurrent if there exists a measure ψ {\displaystyle \psi } such that the chain is ψ {\displaystyle \psi } -irreducible and every measurable set A {\displaystyle A} with ψ ( A ) > 0 {\displaystyle \psi (A)>0} is Harris recurrent. A useful criterion for verifying Harris recurrence is the following: Proposition If for every A ∈ B ( X ) {\displaystyle A\in {\mathcal {B}}({\mathcal {X}})} , we have P x ( τ A < ∞ ) = 1 {\displaystyle P_{x}(\tau _{A}<\infty )=1} for every x ∈ A {\displaystyle x\in A} , then P x ( η A = ∞ ) = 1 {\displaystyle P_{x}(\eta _{A}=\infty )=1} for all x ∈ X {\displaystyle x\in {\mathcal {X}}} , and the chain ( X n ) {\displaystyle (X_{n})} is Harris recurrent. This definition is only needed when the state space X {\displaystyle {\mathcal {X}}} is uncountable. In the countable case, recurrence corresponds to E x [ η x ] = ∞ {\displaystyle \mathbb {E} _{x}[\eta _{x}]=\infty } , which is equivalent to P x ( τ x < ∞ ) = 1 {\displaystyle P_{x}(\tau _{x}<\infty )=1} for all x ∈ X {\displaystyle x\i

    Read more →
  • Wilkinson's Grammar of Graphics

    Wilkinson's Grammar of Graphics

    The Grammar of Graphics (GoG) is a grammar-based system for representing graphics to provide grammatical constraints on the composition of data and information visualizations. A graphical grammar differs from a graphics pipeline as it focuses on semantic components such as scales and guides, statistical functions, coordinate systems, marks and aesthetic attributes. For example, a bar chart can be converted into a pie chart by specifying a polar coordinate system without any other change in graphical specification. The grammar of graphics concept was launched by Leland Wilkinson in 2001 (Wilkinson et al., 2001; Wilkinson, 2005) and graphical grammars have since been written in a variety of languages with various parameterisations and extensions. The major implementations of graphical grammars are nViZn created by a team at SPSS/IBM, followed by Polaris focusing on multidimensional relational databases which is commercialised as Tableau, a revised Layered Grammar of Graphics by Hadley Wickham in Ggplot2, and Vega-Lite which is a visualisation grammar with added interactivity. The grammar of graphics continues to evolve with alternate parameterisations, extensions, or new specifications. == Wilkinson's Grammar of Graphics == === Theory === Wilkinson conceived the seven elements of a graphics to be Variables: mapping of objects to values represented in a graphic Algebra: operations to combine variables and specify dimensions of graphs Geometry: creation of geometric graphs from variables Aesthetics: sensory attributes Statistics: functions to change the appearance and representation of graphs Scales: represent variables on measured dimensions Coordinates: mapping to coordinate systems With these, Wilkinson hypothesised that These seven constructs are orthogonal and virtually all known statistical charts can be generated relatively parsimoniously This computational system is not a taxonomy of charts and rather it describes the meaning of what we do when we construct statistical graphics. === Implementations === Wilkinson wrote SYSTAT, a statistical software package, in the early 1980s. This program was noted for its comprehensive graphics, including the first software implementation of the heatmap display now widely used among biologists. After his company grew to 50 employees, he sold it to SPSS in 1995. At SPSS, he assembled a team of graphics programmers who developed the nViZn platform that produces the visualizations in SPSS, Clementine, and other analytics products. While at Stanford, Tableau founders Hanrahan and Stolte, as well as Diane Tang, created the predecessor to Tableau, named Polaris. Polaris was a data visualization software tool, built with the support of a United States Department of Energy defense program, the Accelerated Strategic Computing Initiative (ASCI). The main differences between Wilkinson's system and Polaris are the use of SQL relational algebra for database services and using shelves instead of cross and nest operators. == Wickham's Layered Grammar of Graphics == === Theory === Hadley Wickham conceived an alternate parameterisation of the syntax Wilkinson had derived, creating a layered grammar of graphics which he implemented as ggplot2 for R (programming language) users. This added a hierarchy of defaults based around the idea of building up a graphic from multiple layers. Wickham conceived these elements to be: Defaults: consists of data and mapping Data: dataset Mapping: aesthetic mappings Layer: consists of data, mapping, geom, stat, and position Data: dataset, or inherit from defaults Mapping: aesthetic mappings, or inherit from defaults Geom: geometric object Stat: statistical transformation Position: position adjustment Scale: mapping of data to aesthetic attributes Coord: mapping of data to the plane of the plot Facet: split up the data === Reception === Wilkinson is generally positive on Wickham's parameterisation and implementation of ggplot2, praising its elegance and expressivity whilst claiming that his original Grammar of Graphics is capable of representing a wider range of statistical graphics. === Implementations === ggplot2 is the first implementation of a layered grammar of graphics in R and implementations in other programming languages have ensued. These include direct ports plotnine for Python, gramm for MATLAB, Lets-Plot for Kotlin and gadfly for Julia. Projects inspired by elements of Wickham's grammar include Vega-Lite which specifies plots in JSON and uses a JavaScript engine. Implementations for Python include Vega-Altair (built on top of Vega-Lite). == Vega-Lite: A Grammar of Interactive Graphics == === Theory === Vega-Lite combines ideas from Wilkinson's Grammar of Graphics and Wickham's Layered Grammar of Graphics with a composition algebra for layered and multi-view displays with a grammar of interaction. The Vega-Lite specification is instantiated in JSON and rendered by the lower-level Vega. The graphical grammar implemented by Vega-Lite is composed of the following: Unit: consists of data, transforms, mark-type and encoding Data: relational table consisting of records (rows) and named attributes (columns) Transforms: data transformations Mark-type: geometric object for visual encoding Encodings: mapping of data attributes to visual marks properties where each encoding consists of: Channel: e.g. colour, shape, size, or text Field: data attribute Data-type: e.g. nominal, ordinal, quantitative, or temporal Value: use a literal instead of a data-type Functions: e.g. binning, aggregation, and sorting Scale: maps from data domain to visual range Guide: axis or legend for visualising scale Composite Views: compose views from multiple unit specifications with operators: Layer: charts plotted on top of each other Hconcat/Vconcat: place views side-by-side Facet: subset data to produce a trellis plot Repeat: multiple plots similar to facet but with full data replication in each cell Interaction: selections identify the set of points a user is interested in manipulating, with components: Selection: get the minimal number of backing points Name: reference Type: how many backing values are stored Predicate: determine the set of selected points e.g. single, list, interval Domain|Range: store data domain or visual range Event: e.g. mouseover, mousedown, mouseup, Init: initialise with specific backing points Transforms: e.g. project, toggle, translate, zoom, and nearest Resolve: resolve selections to union or intersect ==== Implementations ==== Whilst Vega-Lite is the sole implementation of this graphics grammar specification with compilation to Vega, other implementations do create JSON files which can be interpreted by Vega-Lite. == Related projects == Ggplot2 is an R package for plotting Tableau Software (originally known as Polaris) is a commercial software built using the Grammar of Graphics nViZn built by Wilkinson. SYSTAT (statistics package) built by Wilkinson ggpy, ggplot for Python, but has not been updated since 20 November 2016 plotnine started as an effort to improve the scalability of ggplot for Python and is largely compatible with ggplot2 syntax. Plotly - Interactive, online ggplot2 graphs gramm, a plotting class for MATLAB inspired by ggplot2 gadfly, a system for plotting and visualization written in Julia, based largely on ggplot2 Chart::GGPlot - ggplot2 port in Perl, but has not been updated since 16 March 2023 The Lets-Plot for Python library includes a native backend and a Python API, which was mostly based on the ggplot2 package. Lets-Plot Kotlin API is an open-source plotting library for statistical data implemented using the Kotlin programming language, and is built on the principles of layered graphics first described in the Leland Wilkinson's work The Grammar of Graphics. ggplotnim, plotting library using the Nim programming language inspired by ggplot2. Vega and Vega-Lite are plotting libraries that use JSON to specify plots. Vega-Altair, a Python library built on top of Vega-Lite chart-parts - React-friendly Grammar of Graphics, but has not been updated since 10 Dec 2021 g2 - a JavaScript library

    Read more →
  • Alberto Broggi

    Alberto Broggi

    Alberto Broggi is General Manager at VisLab srl (spinoff of the University of Parma acquired by Silicon-Valley company Ambarella Inc. in June 2015) and a professor of Computer Engineering at the University of Parma in Italy. == Research in computer vision, hardware, and AV == Broggi's research activities started in 1991–1994. His group together with the Dipartimento di Elettronica, Politecnico di Torino, Italy, built their own hardware architecture (named PAPRICA, for PArallel PRocessor for Image Checking and Analysis, based on 256 single-bit processing elements working in SIMD fashion) and installed it on board of a mobile laboratory (Mob-Lab) to develop and test some initial concepts in the field of intelligent vehicles. In 1996, Broggi's group worked to develop a real vehicle prototype (named ARGO, a Lancia Thema passenger car which was equipped with vision sensors, processing systems, and vehicle actuators) and developed the necessary software and hardware that made it able to drive autonomously on standard roads. Broggi's research group (called VisLab from then on) gathered all their findings in a book, which was then also translated in Chinese. When Broggi was with the University of Pavia, his research was extended and applied to extreme conditions (automatic driving on snow and ice): in 2001, VisLab led the research effort of providing a vehicle (RAS, Robot Antartico di Superficie) with sensing capabilities so that it was able to automatically follow the vehicle in front. In 2010 Broggi's group embarked on driving 4 vehicles autonomously from Italy to China with no human intervention. This challenge is called VIAC, for VisLab Intercontinental Autonomous Challenge . Soon after this, Broggi was awarded a second ERC grant (Proof of concept) to industrialize some of the results obtained and successfully tested on the VIAC vehicles. On July 12, 2013, VisLab tested the BRAiVE vehicle in downtown Parma, negotiating two-way narrow rural roads, pedestrian crossings, traffic lights, artificial bumps, pedestrian areas, and tight roundabouts. The vehicle traveled from Parma University Campus up to Piazza della Pilotta (downtown Parma): a 20 minutes run in a real environment, together with real traffic at 11am on a working day, that required absolutely no human intervention. Part of this test was driven with nobody in the driver seat, for the first time ever on public roads.

    Read more →
  • DeepL Translator

    DeepL Translator

    DeepL is a German AI research company known for its language AI platform, which includes DeepL Translator and DeepL Voice, and for DeepL Agent, an AI agent capable of planning workflows and using office systems and tools autonomously, in response to natural language instructions. Its algorithm uses the transformer architecture. It offers a paid subscription for additional features and access to its translation application programming interface. DeepL was founded in 2017 by Jaroslaw Kutylowski and is a unicorn, valued at $2 billion after a Series C funding round raised $300 million in May 2024. Its more than 200,000 business customers include a large proportion of the Fortune 500. == History == The translating system was first developed within Linguee by a team led by Chief Technology Officer Jarosław Kutyłowski in 2016. It was launched as DeepL Translator on 28 August 2017 and offered translations between English, German, French, Spanish, Italian, Polish and Dutch. At its launch, it claimed to have surpassed its competitors in blind tests and BLEU scores, including Google Translate, Amazon Translate, Microsoft Translator and Facebook's translation feature. With the release of DeepL in 2017, Linguee's company name was changed to DeepL GmbH, and it is also financed by advertising on its sister site, linguee.com. Support for Portuguese and Russian was added on 5 December 2018. In July 2019, Jarosław Kutyłowski became the CEO of DeepL GmbH and restructured the company into a Societas Europaea in 2021. Translation software for Microsoft Windows and macOS was released in September 2019. Support for Chinese (simplified) and Japanese was added on 19 March 2020, which the company claimed to have surpassed the aforementioned competitors as well as Baidu and Youdao. Then, 13 more European languages were added in March 2021: Bulgarian, Czech, Danish, Estonian, Finnish, Greek, Hungarian, Latvian, Lithuanian, Romanian, Slovak, Slovenian, and Swedish, bringing the total number of supported languages to 24. On 25 May 2022, support for Indonesian and Turkish was added, and support for Ukrainian was added on 14 September 2022. In January 2023, the company reached a valuation of 1 billion euro and became the most valued startup company in Cologne. At the end of the month, support for Korean and Norwegian (Bokmål) was also added. In May 2024, the company announced an investment of US$300 million at AI. In January 2026, more languages were supported, including Luxembourgish and Irish. == Services == === Translation method === The service uses a proprietary algorithm with convolutional neural networks (CNNs) that have been trained with the Linguee database. According to the developers, the service uses a newer improved architecture of neural networks, resulting in a more natural sound of translations than by competing services. The translation is generated using a supercomputer that reaches 5.1 petaflops and is operated in Iceland with hydropower. DeepL's data centers are located at the EcoDataCenter in Falun, Sweden, which is a data center for sustainability. In general, CNNs are slightly more suitable for long coherent word sequences, but they have so far not been used by the competition because of their weaknesses compared to recurrent neural networks. The weaknesses of DeepL are compensated for by supplemental techniques, some of which are publicly known. === Translator and subscription === The translator can be used for free with a maximum limit of 1,500 characters per translation. Microsoft Word and PowerPoint files in Office Open XML file formats (.docx and .pptx) and PDF files up to 5MB in size can also be translated. It offers paid subscription DeepL Pro, which has been available since March 2018 and includes application programming interface access and a software plug-in for computer-assisted translation tools, including SDL Trados Studio. Unlike the free version, translated texts are stated to not be saved on the server; also, the character limit is removed. The monthly pricing model includes a set amount of text, with texts beyond that being calculated according to the number of characters. ==== Supported languages ==== As of May 2026, the translation service supports the following languages: Additionally, these languages are currently in beta, indicated by an asterisk after their name in the language picker: === DeepL Write === In November 2022, DeepL launched a tool to improve monolingual texts in English and German, called DeepL Write. In December, the company removed access and informed journalists that it was only for internal use and that DeepL Write would be relaunched in early 2023. The public beta version was then released on January 17, 2023. In the summer of 2024, DeepL announced the availability of two more languages in DeepL Write: French and Spanish. By January 2024, DeepL had added an additional two: Portuguese (European and Brazilian) and Italian. === DeepL Agent === In November 2025, DeepL launched an AI agent called DeepL Agent which is capable of operating business applications in a human-like manner. == Reception == The reception of DeepL has been generally positive. TechCrunch appreciates it for the accuracy of its translations and stating that it was more accurate and nuanced than Google Translate. Le Monde thanks its developers for translating French text into more "French-sounding" expressions. RTL Z stated that DeepL Translator "offers better translations […] when it comes to Dutch to English and vice versa". La Repubblica, and a Latin American website, "WWWhat's new?", showed praise as well. A 2018 paper by the University of Bologna evaluated the Italian-to-German translation capabilities and found the preliminary results to be similar in quality to Google Translate. In September 2021, Slator remarked that the language industry response was more measured than the press and noted that DeepL is still highly regarded by users. A reviewer noted in 2018 that DeepL had far fewer languages available for translation than competing products. == Awards and honors == DeepL won the 2020 Webby Award for Best Practices and the 2020 Webby Award for Technical Achievement (Apps, Mobile, and Features), both in the category Apps, Mobile & Voice. In April 2025, DeepL was featured in the Forbes AI 50 list.

    Read more →
  • Magnetic ink character recognition

    Magnetic ink character recognition

    Magnetic ink character recognition code, known in short as MICR code, is a character recognition technology used mainly by the banking industry to streamline the processing and clearance of cheques and other documents. MICR encoding, called the MICR line, is at the bottom of cheques and other vouchers and typically includes the document-type indicator, bank code, bank account number, cheque number, cheque amount (usually added after a cheque is presented for payment), and a control indicator. The format for the bank code and bank account number is country-specific. The technology allows MICR readers to scan and read the information directly into a data-collection device. Unlike barcode and similar technologies, MICR characters can be read easily by humans. MICR encoded documents can be processed much faster and more accurately than conventional OCR encoded documents. == Pre-Unicode standard representation == The ISO standard ISO 2033:1983, and the corresponding Japanese Industrial Standard JIS X 9010:1984 (originally JIS C 6229–1984), define character encodings for OCR-A, OCR-B and E-13B. == International spread == There are two major MICR fonts in use: E-13B and CMC-7. There is no particular international agreement on which countries use which font. In practice, this does not create particular problems as cheques and other vouchers do not usually flow out of a particular jurisdiction. The E-13B font has been adopted as an international standard in ISO 1004-1:2013, and is the standard in Australia, Canada, the United Kingdom, the United States, as well as Central America and much of Asia, besides other countries. The CMC-7 font has been adopted as an international standard in ISO 1004-2:2013, and is widely used in Europe, including France and Italy, Mexico, and South America, including Argentina, Brazil, Chile, besides other countries. Israel is the only country that can use both fonts simultaneously, though the practice makes the system significantly less efficient. This situation is the product of the Israelis adopting CMC-7, while the Palestinians opted for E-13B. == Fonts == === E-13B === E-13B is a 14-character set, comprising the 10 decimal digits, and the following symbols: ⑆ (transit: used to delimit a bank code); ⑈ (on-us: used to delimit a customer account number); ⑇ (amount: used to delimit a transaction amount); ⑉ (dash: used to delimit parts of numbers—e.g., routing numbers or account numbers). In the check printing and banking industries the E-13B MICR line is also commonly referred to as the TOAD line. This reference comes from the 4 characters: Transit, On-us, Amount, and Dash. Compared to CMC-7, some pairs of E-13B characters (notably 2 and 5) can produce relatively similar results when magnetically scanned; however, as a fallback if magnetic reading fails, E-13B also performs well under optical character recognition. The E-13B repertoire can be represented in Unicode (see below). The official Unicode names contain misnomers. For example, the ⑈ on-us symbol is official titled "OCR Dash". Prior to Unicode, it could be encoded according to ISO 2033:1983, which encodes digits in their usual ASCII locations, transit as 0x3A, on-us as 0x3C, amount as 0x3B, and dash as 0x3D. For EBCDIC, IBM code page 1001 encodes digits in their usual EBCDIC locations, transit as 0xDB, on-us as 0xEB, amount as 0xCB, and dash as 0xFB. IBM code page 1032 extends code page 1001 by adding alternative encodings for transit at 0x5C, 0x7A and 0xC1, on-us at 0x4C, 0x61 and 0xC3, amount at 0x5B, 0x5E and 0xC2 and dash at 0x60, 0x7E and 0xC4, in addition to a zero-width space at 0x5A. These alternative representations were added for interoperability with Siemens and Océ printers. === CMC-7 === CMC-7 includes 10 numeric digits, 26 capital letters, and 5 control characters: S I (internal), S II (terminator), S III (amount), S IV (an unused character), and S V (routing). CMC-7 has a barcode format, with every character having two distinct large gaps in different places, as well as distinct patterns in between, to minimize any chance for character confusion while reading magnetically; however, these bars are too close and narrow to be reliably recognised at a typical scan resolution if falling back to optical scanning. CMC-7 can also produce superficially successful, but incorrect, scans of upside-down MICR lines. Unicode does not include support for the CMC-7 control symbols. IBM code page 1033 encodes: Digits and capitals in their usual EBCDIC locations S I (internal) as 0x5E, 0x61 or 0xCB; S II (terminator) as 0x4C, 0x5B or 0xEB; S III (amount) as 0x60, 0x7E or 0xFB; S IV as 0x50, 0x7A or 0xDB; S V (routing) as 0x5C, 0x6E or 0xBB. == MICR reader == MICR characters are printed on documents in one of the two MICR fonts, using magnetizable (commonly known as magnetic) ink or toner, usually containing iron oxide. In scanning, the document is passed through a MICR reader, which performs two functions: magnetization of the ink, and detection of the characters. The characters are read by a MICR reader head, a device similar to the playback head of a tape recorder. As each character passes over the head, it produces a unique waveform that can be easily identified by the system. MICR readers are the primary tool for cheque sorting and are used across the cheque distribution network at multiple stages. For example, a merchant will use a MICR reader to sort cheques by bank and send the sorted cheques to a clearing house for redistribution to those banks. Upon receipt, the banks perform another MICR sort to determine which customer's account is charged and to which branch the cheque should be sent on its way back to the customer. However, many banks no longer offer this last step of returning the cheque to the customer. Instead, cheques are scanned and stored digitally. Sorting of cheques is done as per the geographical coverage of banks in a nation. == Unicode == OCR and MICR characters have been included in the Unicode Standard since at least version 1.1 (June 1993). Since the Unicode Character Database only tracks characters starting with version 1.1, they may also have been present in Unicode 1.0 or 1.0.1. The Unicode block that includes OCR and MICR characters is called Optical Character Recognition and covers U+2440–U+245F. Of the characters in this block, four are from the MICR E-13B font: U+2446 ⑆ OCR BRANCH BANK IDENTIFICATION U+2447 ⑇ OCR AMOUNT OF CHECK U+2448 ⑈ OCR DASH (corrected alias MICR ON US SYMBOL) U+2449 ⑉ OCR CUSTOMER ACCOUNT NUMBER (corrected alias MICR DASH SYMBOL) The names of the latter two characters were inadvertently switched when they were named in ISO/IEC 10646:1993, and they have been assigned accurate names as formal aliases. Per the Unicode Stability Policy, the existing names remain, allowing their use as stable identifiers. Additionally, all four characters have informative (non-formal) aliases in the Unicode charts: "transit", "amount", "on-us", and "dash" respectively. Prior to Unicode, these symbols had been encoded by the ISO-IR-98 encoding defined by ISO 2033:1983, in which they were simply named SYMBOL ONE through SYMBOL FOUR. They were encoded immediately following the digits, which were encoded at their ASCII locations. Although ISO 2033 also specifies encoding for OCR-A and OCR-B, its encoding for E-13B is known simply as ISO_2033-1983 by the IANA. == History == Before the mid-1940s, cheques were processed manually using the Sort-A-Matic or Top Tab Key method. The processing and cheque clearing was very time-consuming and was a significant cost in cheque clearance and bank operations. As the number of cheques increased, ways were sought for automating the process. Standards were developed to ensure uniformity in financial institutions. By the mid-1950s, the Stanford Research Institute and General Electric Computer Laboratory had developed the first automated system to process cheques using MICR. The same team also developed the E-13B MICR font. "E" refers to the font being the fifth considered, and "B" to the fact that it was the second version. The "13" refers to the 0.013-inch character grid. The trial of MICR E-13B font was shown to the American Bankers Association (ABA) in July 1956, which adopted it in 1958 as the MICR standard for negotiable documents in the United States. ABA adopted MICR as its standard because machines could read MICR accurately, and MICR could be printed using existing technology. In addition, MICR remained machine readable, even through overstamping, marking, mutilation and more. The first cheques using MICR were printed by the end of 1959. Although compliance with MICR standards was voluntary in the United States, it had been almost universally adopted in the United States by 1963. In 1963, ANSI adopted the ABA's E-13B font as the American standard for MICR printing, and E-13B was also standardized as ISO 1004:1995. Other countries set their own standards, though the MICR readers and m

    Read more →
  • MultiValue database

    MultiValue database

    A MultiValue database is a type of NoSQL and multidimensional database. It is typically considered synonymous with PICK, a database originally developed as the Pick operating system. MultiValue databases include commercial products from Rocket Software, Revelation, InterSystems, Northgate Information Solutions, ONgroup, and other companies. These databases differ from a relational database in that they have features that support and encourage the use of attributes which can take a list of values, rather than all attributes being single-valued. They are often categorized with MUMPS within the category of post-relational databases, although the data model actually pre-dates the relational model. Unlike SQL-DBMS tools, most MultiValue databases can be accessed both with or without SQL. == History == Don Nelson designed the MultiValue data model in the early to mid-1960s. Dick Pick, a developer at TRW, worked on the first implementation of this model for the US Army in 1965. Pick considered the software to be in the public domain because it was written for the military, this was but the first dispute regarding MultiValue databases that was addressed by the courts. Ken Simms wrote DataBASIC, sometimes known as S-BASIC, in the mid-1970s. It was based on Dartmouth BASIC, but had enhanced features for data management. Simms played a lot of Star Trek (a text-based early computer game originally written in Dartmouth BASIC) while developing the language, to ensure that DataBASIC functioned to his satisfaction. Three of the implementations of MultiValue - PICK version R77, Microdata Reality 3.x, and Prime Information 1.0 - were very similar. In spite of attempts to standardize, particularly by International Spectrum and the Spectrum Manufacturers Association, who designed a logo for all to use, there are no standards across MultiValue implementations. Subsequently, these flavors diverged, although with some cross-over. These streams of MultiValue database development could be classified as one stemming from PICK R83, one from Microdata Reality, and one from Prime Information. Because of the differences, some implementations have provisions for supporting several flavors of the languages. An attempt to document the similarities and differences can be found at the Post-Relational Database Reference (PRDB). One reasonable hypothesis for this data model lasting 50 years, with new database implementations of the model even in the 21st century is that it provides inexpensive database solutions. == Data model example == In a MultiValue database system: a database or schema is called an "account" a table or collection is called a "file" a column or field is called a field or an "attribute", which is composed of "multi-value attributes" and "sub-value attributes" to store multiple values in the same attribute. a row or document is called a "record" or "item" Data is stored using two separate files: a "file" to store raw data and a "dictionary" to store the format for displaying the raw data. For example, assume there's a file (table) called "PERSON". In this file, there is an attribute called "eMailAddress". The eMailAddress field can store a variable number of email address values in a single record. The list [[email protected], [email protected], [email protected]] can be stored and accessed via a single query when accessing the associated record. Achieving the same (one-to-many) relationship within a traditional relational database system would include creating an additional table to store the variable number of email addresses associated with a single "PERSON" record. However, modern relational database systems support this multi-value data model too. For example, in PostgreSQL, a column can be an array of any base type. == MultiValue Basic Language == Multivalue Basic (now commonly styled as mvBasic) is a family of programming languages more or less common (and portable) to all the multivalue databases derived from the original Pick Operating System. The variations between implementations are known as flavours. The language originates from Dartmouth Basic and the earliest implementation of PickBASIC (now D3 FlashBasic). Over time various customisations and extensions have been added to take advantage of capabilities added to the different flavours while staying mainly in sync. mvBasic statements and functions are designed to access and take advantage of the multivalue database model and providing the usual capabilities of most modern languages. For example, cryptography and communications. mvBasic is typeless and lends itself to structured programming techniques. Example code is available but limited. Whilst there are commercial applications and tools available, the multivalue database community has not embraced the open source library/package model to the degree seen with other languages. The typical mvBasic compiler compiles program source to a P-code executable object and runs in an interpreter, with D3 FlashBasic and jBASE being notable exceptions. == MultiValue Query Language == Known as ENGLISH, ACCESS, AQL, UniQuery, Retrieve, CMQL, and by many other names over the years, corresponding to the different MultiValue implementations, the MultiValue query language differs from SQL in several respects. Each query is issued against a single dictionary within the schema, which could be understood as a virtual file or a portal to the database through which to view the data. LIST PEOPLE LAST_NAME FIRST_NAME EMAIL_ADDRESSES WITH LAST_NAME LIKE "Van..." The above statement would list all e-mail addresses for each person whose last name starts with "Van". A single entry would be output for each person, with multiple lines showing the multiple e-mail addresses (without repeating other data about the person).

    Read more →
  • Top 10 AI Text-to-video Tools Compared (2026)

    Top 10 AI Text-to-video Tools Compared (2026)

    Trying to pick the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Jiliang Tang

    Jiliang Tang

    Jiliang Tang is a Chinese-born computer scientist and a University Foundation Professor of Computer Science and Engineering at Michigan State University, where he is the director of the Data Science and Engineering (DSE) Lab. His research expertise is in data mining and machine learning. == Education and career == He received his BEng in software engineering (2008) and MSc in computer science (2010) from the Beijing Institute of Technology, Beijing, China. His PhD is from Arizona State University (2015), under the direction of Huan Liu. After gaining his PhD, he worked as a research scientist at Yahoo Labs (2015–16) before joining Michigan State University as an assistant professor (2016). His research has mostly been published jointly with Huan Liu. It has received over thirteen thousand citations documented by Google Scholar, and has received coverage in the media. == Awards == He has received the 2020 ACM SIGKDD Rising Star Award that "aims to celebrate the early accomplishments of the SIGKDD communities' brightest new minds", NSF Career Award, and Michigan State University's Distinguished Withrow Research Award. == Selected publications == === Books === Jiliang Tang, Huan Liu. Trust in Social Media, (Synthesis digital library of engineering and computer science; Synthesis lectures on information security, privacy, and trust, # 13) Morgan & Claypool, 2015 ISBN 9781627054058 === Peer reviewed journal articles === Shu K, Sliva A, Wang S, Tang J, Liu H. Fake news detection on social media: A data mining perspective. ACM SIGKDD explorations newsletter. 2017 Sep 1;19(1):22-36. [1] Tang J, Alelyani S, Liu H. Feature selection for classification: A review. Data classification: Algorithms and applications. 2014:37. [2] Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. Feature selection: A data perspective. ACM Computing Surveys (CSUR). 2017 Dec 6;50(6):1-45. [3] Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS. Heterogeneous network embedding via deep architectures. InProceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining 2015 Aug 10 (pp. 119–128) Gao H, Tang J, Hu X, Liu H. Exploring temporal effects for location recommendation on location-based social networks. InProceedings of the 7th ACM conference on Recommender systems 2013 Oct 12 (pp. 93–100). Hu X, Tang J, Gao H, Liu H. Unsupervised sentiment analysis with emotional signals. InProceedings of the 22nd international conference on World Wide Web 2013 May 13 (pp. 607–618).

    Read more →