AI Art Queen

AI Art Queen — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI safety

    AI safety

    AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence systems. It encompasses AI alignment (which aims to ensure AI systems behave as intended), monitoring AI systems for risks, and enhancing their robustness. The field is particularly concerned with existential risks posed by advanced AI models. Beyond technical research, AI safety involves developing norms and policies that promote safety, including advocacy for regulations at different levels of government. The field gained significant popularity in 2023, with rapid progress in generative AI and public concerns voiced by researchers and CEOs about potential dangers. During the 2023 AI Safety Summit, the United States and the United Kingdom both established their own AI Safety Institute. However, researchers have expressed concern that AI safety measures are not keeping pace with the rapid development of AI capabilities. == Motivations == Scholars discuss current risks from critical systems failures, bias, and AI-enabled surveillance, as well as emerging risks like technological unemployment, digital manipulation, weaponization, AI-enabled cyberattacks and bioterrorism. They also discuss speculative risks from losing control of future artificial general intelligence (AGI) agents, or from AI enabling perpetually stable dictatorships. === Existential safety === Some have criticized concerns about AGI, such as Andrew Ng who compared them in 2015 to "worrying about overpopulation on Mars when we have not even set foot on the planet yet". Stuart J. Russell on the other side urges caution, arguing that "it is better to anticipate human ingenuity than to underestimate it". AI researchers have widely differing opinions about the severity and primary sources of risk posed by AI technology – though surveys suggest that experts take high consequence risks seriously. In two surveys of AI researchers, the median respondent was optimistic about AI overall, but placed a 5% probability on an "extremely bad (e.g. human extinction)" outcome of advanced AI. In a 2022 survey of the natural language processing community, 37% agreed or weakly agreed that it is plausible that AI decisions could lead to a catastrophe that is "at least as bad as an all-out nuclear war". == History == Risks from AI began to be seriously discussed at the start of the computer age: Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. In 1988 Blay Whitby published a book outlining the need for AI to be developed along ethical and socially responsible lines. From 2008 to 2009, the Association for the Advancement of Artificial Intelligence (AAAI) commissioned a study to explore and address potential long-term societal influences of AI research and development. The panel was generally skeptical of the radical views expressed by science-fiction authors but agreed that "additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes". In 2011, Roman Yampolskiy introduced the term "AI safety engineering" at the Philosophy and Theory of Artificial Intelligence conference, listing prior failures of AI systems and arguing that "the frequency and seriousness of such events will steadily increase as AIs become more capable". In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. He has the opinion that the rise of AGI has the potential to create various societal issues, ranging from the displacement of the workforce by AI, manipulation of political and military structures, to even the possibility of human extinction. His argument that future advanced systems may pose a threat to human existence prompted Elon Musk, Bill Gates, and Stephen Hawking to voice similar concerns. In 2015, dozens of artificial intelligence experts signed an open letter on artificial intelligence calling for research on the societal impacts of AI and outlining concrete directions. To date, the letter has been signed by over 8000 people including Yann LeCun, Shane Legg, Yoshua Bengio, and Stuart Russell. In the same year, a group of academics led by professor Stuart J. Russell founded the Center for Human-Compatible AI at the University of California Berkeley and the Future of Life Institute awarded $6.5 million in grants for research aimed at "ensuring artificial intelligence (AI) remains safe, ethical and beneficial". In 2016, the White House Office of Science and Technology Policy and Carnegie Mellon University announced The Public Workshop on Safety and Control for Artificial Intelligence, which was one of a sequence of four White House workshops aimed at investigating "the advantages and drawbacks" of AI. In the same year, Concrete Problems in AI Safety – one of the first and most influential technical AI Safety agendas – was published. In 2017, the Future of Life Institute sponsored the Asilomar Conference on Beneficial AI, where more than 100 thought leaders formulated principles for beneficial AI including "Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards". In 2018, the DeepMind Safety team outlined AI safety problems in specification, robustness, and assurance. The following year, researchers organized a workshop at ICLR that focused on these problem areas. In 2021, Unsolved Problems in ML Safety was published, outlining research directions in robustness, monitoring, alignment, and systemic safety. In 2023, Rishi Sunak said he wants the United Kingdom to be the "geographical home of global AI safety regulation" and to host the first global summit on AI safety. The AI safety summit took place in November 2023, and focused on the risks of misuse and loss of control associated with frontier AI models. During the summit the intention to create the International Scientific Report on the Safety of Advanced AI was announced. In 2024, The US and UK forged a new partnership on the science of AI safety. The MoU was signed on 1 April 2024 by US commerce secretary Gina Raimondo and UK technology secretary Michelle Donelan to jointly develop advanced AI model testing, following commitments announced at an AI Safety Summit in Bletchley Park in November. In 2025, an international team of 96 experts chaired by Yoshua Bengio published the first International AI Safety Report. The report, commissioned by 30 nations and the United Nations, represents the first global scientific review of potential risks associated with advanced artificial intelligence. It details potential threats stemming from misuse, malfunction, and societal disruption, with the objective of informing policy through evidence-based findings, without providing specific recommendations. == Research focus == AI safety research areas include robustness, monitoring, and alignment. === Robustness === ==== Adversarial robustness ==== AI systems are often vulnerable to adversarial examples or "inputs to machine learning (ML) models that an attacker has intentionally designed to cause the model to make a mistake". For example, in 2013, Szegedy et al. discovered that adding specific imperceptible perturbations to an image could cause it to be misclassified with high confidence. This continues to be an issue with neural networks, though in recent work the perturbations are generally large enough to be perceptible. The image on the right is predicted to be an ostrich after the perturbation is applied. (Left) is a correctly predicted sample, (center) perturbation applied magnified by 10x, (right) adversarial example. Adversarial robustness is often associated with security. Researchers demonstrated that an audio signal could be imperceptibly modified so that speech-to-text systems transcribe it to any message the attacker chooses. Network intrusion and malware detection systems also must be adversarially robust since attackers may design their attacks to fool detectors. Models that represent objectives (reward models) must also be adversarially robust. For example, a reward model might estimate how helpful a text response is and a language model might be trained to maximize this score. Researchers have shown that if a language model is trained for long enough, it will leverage the vulnerabilities of the reward model to achieve a better score and perform worse on the intended task. This issue can be addressed by improving the adversarial robustness of the reward model. More generally, any AI system used to evaluate another AI system must be adversarially robust. This could include monitoring tools, since they could also potentially be tampered with to produce a higher reward. Large language models (LLMs) can be vulnerable to prom

    Read more →
  • Douwe Kiela

    Douwe Kiela

    Douwe Kiela is a Dutch-American research scientist and entrepreneur working in the field of artificial intelligence with a focus on machine learning and natural language processing. He is a research scientist director at Google DeepMind. He previously co-founded and served as CEO of Contextual AI, an enterprise software company that provides a platform for building grounded AI agents for enterprise knowledge bases. He previously led the research team at Meta AI that introduced the RAG approach in 2020, co-authoring the foundational paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Kiela also served as Head of Research at Hugging Face and is an adjunct professor in Symbolic Systems at Stanford University. == Early life and education == Douwe Kiela was born in Amsterdam, Netherlands, in 1986. He earned a Bachelor of Science degree in Liberal Arts and Sciences from Utrecht University, with a double major in Cognitive Artificial Intelligence and Philosophy. He then obtained an MSc in logic (cum laude) from the University of Amsterdam's Institute for Logic, Language and Computation (ILLC). Kiela received an MPhil and PhD in Computer Science from the University of Cambridge, specializing in natural language processing and machine learning. == Career == === Facebook AI Research (Meta) === In 2016, Kiela joined Facebook AI Research (FAIR) as a postdoctoral researcher, later becoming a research scientist in New York. While at Meta, he co-authored papers in natural language processing, with a focus on multimodal and grounded language learning. His projects included creating a virtual assistant bot that could navigate tourists around a city and leading the development of Dynabench, an interactive benchmarking platform released in 2020 that used human feedback to test and improve language models. In 2020, Kiela led the Meta AI research team that introduced Retrieval-Augmented Generation (RAG), co-authoring the influential paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," alongside Patrick Lewis, Ethan Perez, and other researchers. The RAG framework transformed how large language models access and incorporate external information by allowing them to retrieve relevant context from external knowledge bases at query time, rather than relying solely on pre-trained data. This approach addressed key limitations such as hallucination, outdated information, and lack of source attribution. The RAG technique has since become widely adopted in enterprise AI applications and knowledge-intensive natural language processing tasks. === Hugging Face === After leaving Meta, Kiela served as Head of Research at Hugging Face. === Contextual AI === In 2023, Kiela co-founded Contextual AI with Amanpreet Singh, another former researcher at Facebook AI Research and Hugging Face. The Mountain View-based company develops a platform for building grounded AI agents for enterprises, focusing on applications in technology, semiconductor, logistics, finance, and media sectors. Contextual AI raised $20 million in seed funding in June 2023, led by Bain Capital Ventures. In August 2024, the company completed an $80 million Series A funding round led by Greycroft, with participation from Bezos Expeditions, NVentures (Nvidia), HSBC Ventures, and Snowflake Ventures, among others. In May 2026, Kiela joined Google DeepMind as part of a licensing agreement between Google and Contextual AI under which more than 20 Contextual AI researchers joined DeepMind. Following his departure, Jay Chen became interim CEO of Contextual AI. === Academic roles === Douwe Kiela serves as an adjunct professor in Symbolic Systems at Stanford University. In a 2023 interview with the Stanford Daily, he commented on the development of Alpaca, a low-cost instruction-finetuned model based on Meta's LLaMA, and emphasized the importance of open academic research in large language models.

    Read more →
  • Top 10 AI Blog Writers Compared (2026)

    Top 10 AI Blog Writers Compared (2026)

    Comparing the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Markov information source

    Markov information source

    In mathematics, a Markov information source, or simply, a Markov source, is an information source whose underlying dynamics are given by a stationary finite Markov chain. == Formal definition == An information source is a sequence of random variables ranging over a finite alphabet Γ {\displaystyle \Gamma } , having a stationary distribution. A Markov information source is then a (stationary) Markov chain M {\displaystyle M} , together with a function f : S → Γ {\displaystyle f:S\to \Gamma } that maps states S {\displaystyle S} in the Markov chain to letters in the alphabet Γ {\displaystyle \Gamma } . A unifilar Markov source is a Markov source for which the values f ( s k ) {\displaystyle f(s_{k})} are distinct whenever each of the states s k {\displaystyle s_{k}} are reachable, in one step, from a common prior state. Unifilar sources are notable in that many of their properties are far more easily analyzed, as compared to the general case. == Applications == Markov sources are commonly used in communication theory, as a model of a transmitter. Markov sources also occur in natural language processing, where they are used to represent hidden meaning in a text. Given the output of a Markov source, whose underlying Markov chain is unknown, the task of solving for the underlying chain is undertaken by the techniques of hidden Markov models, such as the Viterbi algorithm.

    Read more →
  • Circle Hough Transform

    Circle Hough Transform

    The circle Hough Transform (CHT) is a basic feature extraction technique used in digital image processing for detecting circles in imperfect images. The circle candidates are produced by “voting” in the Hough parameter space and then selecting local maxima in an accumulator matrix. It is a specialization of the Hough transform. == Theory == In a two-dimensional space, a circle can be described by: ( x − a ) 2 + ( y − b ) 2 = r 2 ( 1 ) {\displaystyle \left(x-a\right)^{2}+\left(y-b\right)^{2}=r^{2}\ \ \ \ \ (1)} where (a,b) is the center of the circle, and r is the radius. If a 2D point (x,y) is fixed, then the parameters can be found according to (1). The parameter space would be three dimensional, (a, b, r). And all the parameters that satisfy (x, y) would lie on the surface of an inverted right-angled cone whose apex is at (x, y, 0). In the 3D space, the circle parameters can be identified by the intersection of many conic surfaces that are defined by points on the 2D circle. This process can be divided into two stages. The first stage is fixing radius then find the optimal center of circles in a 2D parameter space. The second stage is to find the optimal radius in a one dimensional parameter space. === Find parameters with known radius R === If the radius is fixed, then the parameter space would be reduced to 2D (the position of the circle center). For each point (x, y) on the original circle, it can define a circle centered at (x, y) with radius R according to (1). The intersection point of all such circles in the parameter space would be corresponding to the center point of the original circle. Consider 4 points on a circle in the original image (left). The circle Hough transform is shown in the right. Note that the radius is assumed to be known. For each (x,y) of the four points (white points) in the original image, it can define a circle in the Hough parameter space centered at (x, y) with radius r. An accumulator matrix is used for tracking the intersection point. In the parameter space, the voting number of those points that have a newly defined circle passing through them would be increased by one for every circle. Then the local maxima point (the red point in the center in the right figure) can be found. The position (a, b) of the maxima would be the center of the original circle. === Multiple circles with known radius R === Multiple circles with same radius can be found with the same technique. Note that, in the accumulator matrix (right fig), there would be at least 3 local maxima points. === Accumulator matrix and voting === In practice, an accumulator matrix is introduced to find the intersection point in the parameter space. First, we need to divide the parameter space into “buckets” using a grid and produce an accumulator matrix according to the grid. The element in the accumulator matrix denotes the number of “circles” in the parameter space that are passing through the corresponding grid cell in the parameter space. The number is also called “voting number”. Initially, every element in the matrix is zeros. Then for each “edge” point in the original space, we can formulate a circle in the parameter space and increase the voting number of the grid cell which the circle passes through. This process is called “voting”. After voting, we can find local maxima in the accumulator matrix. The positions of the local maxima are corresponding to the circle centers in the original space. === Find circle parameter with unknown radius === Since the parameter space is 3D, the accumulator matrix would be 3D, too. We can iterate through possible radii; for each radius, we use the previous technique. Finally, find the local maxima in the 3D accumulator matrix. Accumulator array should be A[x,y,r] in the 3D space. Voting should be for each pixels, radius and theta A[x,y,r] += 1 The algorithm : For each A[a,b,r] = 0; Process the filtering algorithm on image Gaussian Blurring, convert the image to grayscale ( grayScaling), make Canny operator, The Canny operator gives the edges on image. Vote on all possible circles in accumulator. The local maximum voted circles of Accumulator A gives the circle Hough space. The maximum voted circle of Accumulator gives the circle. The Incrementing for Best Candidate : For each A[a,b,r] = 0; // fill with zeroes initially, instantiate 3D matrix For each cell(x,y) For each theta t = 0 to 360 // the possible theta 0 to 360 b = y – r sin(t PI / 180); //polar coordinate for center (convert to radians) a = x – r cos(t PI / 180); //polar coordinate for center (convert to radians) A[a,b,r] +=1; //voting end end == Examples == === Find circles in a shoe-print === The original picture (right) is first turned into a binary image (left) using a threshold and Gaussian filter. Then edges (mid) are found from it using canny edge detection. After this, all the edge points are used by the Circle Hough Transform to find underlying circle structure. == Limitations == Since the parameter space of the CHT is three dimensional, it may require lots of storage and computation. Choosing a bigger grid size can ameliorate this problem. However, choosing an appropriate grid size is difficult. Since too coarse a grid can lead to large values of the vote being obtained falsely because many quite different structures correspond to a single bucket. Too fine a grid can lead to structures not being found because votes resulting from tokens that are not exactly aligned end up in different buckets, and no bucket has a large vote. Also, the CHT is not very robust to noise. == Extensions == === Adaptive Hough Transform === J. Illingworth and J. Kittler introduced this method for implementing Hough Transform efficiently. The AHT uses a small accumulator array and the idea of a flexible iterative "coarse to fine" accumulation and search strategy to identify significant peaks in the Hough parameter spaces. This method is substantially superior to the standard Hough Transform implementation in both storage and computational requirements. == Application == === People Counting === Since the head would be similar to a circle in an image, CHT can be used for detecting heads in a picture, so as to count the number of persons in the image. === Brain Aneurysm Detection === Modified Hough Circle Transform (MHCT) is used on the image extracted from Digital Subtraction Angiogram (DSA) to detect and classify aneurysms type. == Implementation code == Circle Detection via Standard Hough Transform, by Amin Sarafraz, Mathworks (File Exchange) Hough Circle Transform, OpenCV-Python Tutorials (archived version on archive.org)

    Read more →
  • Datacap

    Datacap

    Datacap (an IBM Company), a privately owned company, manufactures and sells computer software, and services. Datacap's first product, Paper Keyboard, was a "forms processing" product and shipped in 1989. In August 2010, IBM announced that it had acquired Datacap for an undisclosed amount. == Overview == Datacap sells products through a value-added distribution network worldwide. The software is classified as "enterprise software", meaning that it requires trained professionals to install and configure. Although the Company has focused on providing solutions for scanning paper documents, most recently Company materials have emphasized customer requirements to handle electronic documents ("eDocs"), documents being received into an organization electronically (usually email). Datacap claims that its software is unique because of the rules engine ("Rulerunner") used for processing inbound documents, including performing the image processing (deskew, noise removal, etc.), optical character recognition (OCR), intelligent character recognition (ICR), validations, and export-release formatting of extracted data to target ERP and line of business application.

    Read more →
  • How to Choose an AI Copywriting Tool

    How to Choose an AI Copywriting Tool

    Trying to pick the best AI copywriting tool? An AI copywriting tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI copywriting tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Is an AI Virtual Assistant Worth It in 2026?

    Is an AI Virtual Assistant Worth It in 2026?

    Shopping for the best AI virtual assistant? An AI virtual assistant is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI virtual assistant slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Manifold hypothesis

    Manifold hypothesis

    The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. As a consequence of the manifold hypothesis, many data sets that appear to initially require many variables to describe, can actually be described by a comparatively small number of variables, linked to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features. The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many techniques of dimensional reduction make the assumption that data lies along a low-dimensional submanifold, such as manifold sculpting, manifold alignment, and manifold regularization. The major implications of this hypothesis is that Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds). Within one of these manifolds, it's always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold. The ability to interpolate between samples is the key to generalization in deep learning. == The information geometry of statistical manifolds == An empirically-motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding and variational Bayesian methods. The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis: We can sample large amounts of data from the underlying generative process. Machine Learning experiments are reproducible, so the statistics of the generating process exhibit stationarity. In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket.

    Read more →
  • Markov chain central limit theorem

    Markov chain central limit theorem

    In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity. == Statement == Suppose that: the sequence X 1 , X 2 , X 3 , … {\textstyle X_{1},X_{2},X_{3},\ldots } of random elements of some set is a Markov chain that has a stationary probability distribution; and the initial distribution of the process, i.e. the distribution of X 1 {\textstyle X_{1}} , is the stationary distribution, so that X 1 , X 2 , X 3 , … {\textstyle X_{1},X_{2},X_{3},\ldots } are identically distributed. In the classic central limit theorem these random variables would be assumed to be independent, but here we have only the weaker assumption that the process has the Markov property; and g {\textstyle g} is some (measurable) real-valued function for which var ⁡ ( g ( X 1 ) ) < + ∞ . {\textstyle \operatorname {var} (g(X_{1}))<+\infty .} Now let μ = E ⁡ ( g ( X 1 ) ) , μ ^ n = 1 n ∑ k = 1 n g ( X k ) σ 2 := lim n → ∞ var ⁡ ( n μ ^ n ) = lim n → ∞ n var ⁡ ( μ ^ n ) = var ⁡ ( g ( X 1 ) ) + 2 ∑ k = 1 ∞ cov ⁡ ( g ( X 1 ) , g ( X 1 + k ) ) . {\displaystyle {\begin{aligned}\mu &=\operatorname {E} (g(X_{1})),\\{\widehat {\mu }}_{n}&={\frac {1}{n}}\sum _{k=1}^{n}g(X_{k})\\\sigma ^{2}&:=\lim _{n\to \infty }\operatorname {var} ({\sqrt {n}}{\widehat {\mu }}_{n})=\lim _{n\to \infty }n\operatorname {var} ({\widehat {\mu }}_{n})=\operatorname {var} (g(X_{1}))+2\sum _{k=1}^{\infty }\operatorname {cov} (g(X_{1}),g(X_{1+k})).\end{aligned}}} Then as n → ∞ , {\textstyle n\to \infty ,} we have n ( μ ^ n − μ ) → D Normal ( 0 , σ 2 ) , {\displaystyle {\sqrt {n}}({\hat {\mu }}_{n}-\mu )\ {\xrightarrow {\mathcal {D}}}\ {\text{Normal}}(0,\sigma ^{2}),} where the decorated arrow indicates convergence in distribution. == Monte Carlo Setting == The Markov chain central limit theorem can be guaranteed for functionals of general state space Markov chains under certain conditions. In particular, this can be done with a focus on Monte Carlo settings. An example of the application in a MCMC (Markov Chain Monte Carlo) setting is the following: Consider a simple hard spheres model on a grid. Suppose X = { 1 , … , n 1 } × { 1 , … , n 2 } ⊆ Z 2 {\displaystyle X=\{1,\ldots ,n_{1}\}\times \{1,\ldots ,n_{2}\}\subseteq Z^{2}} . A proper configuration on X {\displaystyle X} consists of coloring each point either black or white in such a way that no two adjacent points are white. Let χ {\displaystyle \chi } denote the set of all proper configurations on X {\displaystyle X} , N χ ( n 1 , n 2 ) {\displaystyle N_{\chi }(n_{1},n_{2})} be the total number of proper configurations and π be the uniform distribution on χ {\displaystyle \chi } so that each proper configuration is equally likely. Suppose our goal is to calculate the typical number of white points in a proper configuration; that is, if W ( x ) {\displaystyle W(x)} is the number of white points in x ∈ χ {\displaystyle x\in \chi } then we want the value of E π W = ∑ x ∈ χ W ( x ) N χ ( n 1 , n 2 ) {\displaystyle E_{\pi }W=\sum _{x\in \chi }{\frac {W(x)}{N_{\chi }{\bigl (}n_{1},n_{2}{\bigr )}}}} If n 1 {\displaystyle n_{1}} and n 2 {\displaystyle n_{2}} are even moderately large then we will have to resort to an approximation to E π W {\displaystyle E_{\pi }W} . Consider the following Markov chain on χ {\displaystyle \chi } . Fix p ∈ ( 0 , 1 ) {\displaystyle p\in (0,1)} and set X 1 = x 1 {\displaystyle X_{1}=x_{1}} where x 1 ∈ χ {\displaystyle x_{1}\in \chi } is an arbitrary proper configuration. Randomly choose a point ( x , y ) ∈ X {\displaystyle (x,y)\in X} and independently draw U ∼ U n i f o r m ( 0 , 1 ) {\displaystyle U\sim \mathrm {Uniform} (0,1)} . If u ≤ p {\displaystyle u\leq p} and all of the adjacent points are black then color ( x , y ) {\displaystyle (x,y)} white leaving all other points alone. Otherwise, color ( x , y ) {\displaystyle (x,y)} black and leave all other points alone. Call the resulting configuration X 1 {\displaystyle X_{1}} . Continuing in this fashion yields a Harris ergodic Markov chain { X 1 , X 2 , X 3 , … } {\displaystyle \{X_{1},X_{2},X_{3},\ldots \}} having π {\displaystyle \pi } as its invariant distribution. It is now a simple matter to estimate E π W {\displaystyle E_{\pi }W} with w n ¯ = ∑ i = 1 n W ( X i ) / n {\displaystyle {\overline {w_{n}}}=\sum _{i=1}^{n}W(X_{i})/n} . Also, since χ {\displaystyle \chi } is finite (albeit potentially large) it is well known that X {\displaystyle X} will converge exponentially fast to π {\displaystyle \pi } which implies that a CLT holds for w n ¯ {\displaystyle {\overline {w_{n}}}} . == Implications == Not taking into account the additional terms in the variance which stem from correlations (e.g. serial correlations in markov chain monte carlo simulations) can result in the problem of pseudoreplication when computing e.g. the confidence intervals for the sample mean.

    Read more →
  • The Best Free AI Blog Writer for Beginners

    The Best Free AI Blog Writer for Beginners

    Looking for the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Edward Stabler

    Edward Stabler

    Edward Stabler is a Professor of Linguistics at the University of California, Los Angeles. His primary areas of research are (1) Natural Language Processing (NLP), (2) Parsing and formal language theory, and (3) Philosophy of Logic and Language. He was a member of the faculty at UCLA from 1984 to 2016. His work involves the production of software for minimalist grammars (MGs) and related systems. == Early life and education == Stabler received his Ph.D. from the Department of Linguistics and Philosophy at MIT in 1981. == Recent publications == Edward Stabler (2011) Computational perspectives on minimalism. Revised version in C. Boeckx, ed, Oxford Handbook of Linguistic Minimalism, pp. 617–642. Edward Stabler (2010) A defense of this perspective against the Evans&Levinson critique appears here, with revised version in Lingua 120(12): 2680-2685. Edward Stabler (2010) After GB. Revised version in J. van Benthem & A. ter Meulen, eds, Handbook of Logic and Language, pp. 395–414. Edward Stabler (2010) Recursion in grammar and performance. Presented at the 2009 UMass recursion conference. Edward Stabler (2009) Computational models of language universals. Revised version appears in M. H. Christiansen, C. Collins, and S. Edelman, eds., Language Universals, Oxford: Oxford University Press, pages 200-223. Edward Stabler (2008) Tupled pregroup grammars. Revised version appears in P. Casadio and J. Lambek, eds., Computational Algebraic Approaches to Natural Language, Milan: Polimetrica, pages 23–52. Edward Stabler (2006) Sidewards without copying. Proceedings of the 11th Conference on Formal Grammar, edited by P. Monachesi, G. Penn, G. Satta, and S. Wintner. Stanford: CSLI Publications, 2006, pages 133-146.

    Read more →
  • Inverse consistency

    Inverse consistency

    In image registration, inverse consistency measures the consistency of mappings between images produced by a registration algorithm. The inverse consistency error, introduced by Christiansen and Johnson in 2001, quantifies the distance between the composition of the mappings from each image to the other, produced by the registration procedure, and the identity function, and is used as a regularisation constraint in the loss function of many registration algorithms to enforce consistent mappings. Inverse consistency is necessary for good image registration but it is not sufficient, since a mapping can be perfectly consistent but not register the images at all. == Definition == Image registration is the process of establishing a common coordinate system between two images, and given two images I 1 : Ω 1 → R I 2 : Ω 2 → R {\displaystyle {\begin{aligned}I_{1}:\Omega _{1}\to \mathbb {R} \\I_{2}:\Omega _{2}\to \mathbb {R} \end{aligned}}} registering a source image I 1 {\displaystyle I_{1}} to a target image I 2 {\displaystyle I_{2}} consists of determining a transformation f 1 : Ω 2 → Ω 1 {\displaystyle f_{1}:\Omega _{2}\to \Omega _{1}} that maps points from the target space to the source space. An ideal registration algorithm should not be sensitive to which image in the pair is used as source or target, and the registration operator should be antisymmetric such that the mappings f 1 : Ω 2 → Ω 1 f 2 : Ω 1 → Ω 2 {\displaystyle {\begin{aligned}f_{1}:\Omega _{2}\to \Omega _{1}\\f_{2}:\Omega _{1}\to \Omega _{2}\end{aligned}}} produced when registering I 1 {\displaystyle I_{1}} to I 2 {\displaystyle I_{2}} and I 2 {\displaystyle I_{2}} to I 1 {\displaystyle I_{1}} respectively should be the inverse of each other, i.e. f 2 = f 1 − 1 {\displaystyle f_{2}=f_{1}^{-1}} and f 1 = f 2 − 1 {\displaystyle f_{1}=f_{2}^{-1}} or, equivalently, f 2 ∘ f 1 = id Ω 2 {\displaystyle f_{2}\circ f_{1}=\operatorname {id} _{\Omega _{2}}} and f 1 ∘ f 2 = id Ω 1 {\displaystyle f_{1}\circ f_{2}=\operatorname {id} _{\Omega _{1}}} , where ∘ {\displaystyle \circ } denotes the function composition operator. Real algorithms are not perfect, and when swapping the role of source and target image in a registration problem the so obtained transformations are not the inverse of each other. Inverse consistency can be enforced by adding to the loss function of the registration a symmetric regularisation term that penalises inconsistent transformations ∫ Ω 2 ‖ f 2 ( f 1 ( x ) ) − x ‖ 2 d x + ∫ Ω 1 ‖ f 1 ( f 2 ( x ) ) − x ‖ 2 d x . {\displaystyle \int _{\Omega _{2}}\left\Vert f_{2}(f_{1}(x))-x\right\Vert ^{2}\mathrm {d} x+\int _{\Omega _{1}}\left\Vert f_{1}(f_{2}(x))-x\right\Vert ^{2}\mathrm {d} x.} Inverse consistency can be used as a quality metric to evaluate image registration results. The inverse consistency error ( I C E {\displaystyle ICE} ) measures the distance between the composition of the two transforms and the identity function, and it can be formulated in terms of both average ( I C E a {\displaystyle ICE_{a}} ) or maximum ( I C E m {\displaystyle ICE_{m}} ) over a region of interest Ω {\displaystyle \Omega } of the image: I C E a = 1 ∫ Ω d x ∫ Ω ‖ f 2 ( f 1 ( x ) ) − x ‖ d x I C E m = max x ∈ Ω ‖ f 2 ( f 1 ( x ) ) − x ‖ . {\displaystyle {\begin{aligned}ICE_{a}&={\frac {1}{\int _{\Omega }\mathrm {d} x}}\int _{\Omega }\left\Vert f_{2}(f_{1}(x))-x\right\Vert \mathrm {d} x\\ICE_{m}&=\max _{x\in \Omega }\left\Vert f_{2}(f_{1}(x))-x\right\Vert .\end{aligned}}} While inverse consistency is a necessary property of good registration algorithms, inverse consistency error alone is not a sufficient metric to evaluate the quality of image registration results, since a perfectly consistent mapping, with no other constraint, may be not even close to correctly register a pair of images.

    Read more →
  • Probabilistic context-free grammar

    Probabilistic context-free grammar

    In theoretical linguistics and computational linguistics, probabilistic context free grammars (PCFGs) extend context-free grammars, similar to how hidden Markov models extend regular grammars. Each production is assigned a probability. The probability of a derivation (parse) is the product of the probabilities of the productions used in that derivation. These probabilities can be viewed as parameters of the model, and for large problems it is convenient to learn these parameters via machine learning. A probabilistic grammar's validity is constrained by context of its training dataset. PCFGs originated from grammar theory, and have application in areas as diverse as natural language processing to the study the structure of RNA molecules and design of programming languages. Designing efficient PCFGs has to weigh factors of scalability and generality. Issues such as grammar ambiguity must be resolved. The grammar design affects results accuracy. Grammar parsing algorithms have various time and memory requirements. == Definitions == Derivation: The process of recursive generation of strings from a grammar. Parsing: Finding a valid derivation using an automaton. Parse Tree: The alignment of the grammar to a sequence. An example of a parser for PCFG grammars is the pushdown automaton. The algorithm parses grammar nonterminals from left to right in a stack-like manner. This brute-force approach is not very efficient. In RNA secondary structure prediction variants of the Cocke–Younger–Kasami (CYK) algorithm provide more efficient alternatives to grammar parsing than pushdown automata. Another example of a PCFG parser is the Stanford Statistical Parser which has been trained using Treebank. == Formal definition == Similar to a CFG, a probabilistic context-free grammar G can be defined by a quintuple: G = ( M , T , R , S , P ) {\displaystyle G=(M,T,R,S,P)} where M is the set of non-terminal symbols T is the set of terminal symbols R is the set of production rules S is the start symbol P is the set of probabilities on production rules == Relation with hidden Markov models == PCFGs models extend context-free grammars the same way as hidden Markov models extend regular grammars. The Inside-Outside algorithm is an analogue of the Forward-Backward algorithm. It computes the total probability of all derivations that are consistent with a given sequence, based on some PCFG. This is equivalent to the probability of the PCFG generating the sequence, and is intuitively a measure of how consistent the sequence is with the given grammar. The Inside-Outside algorithm is used in model parametrization to estimate prior frequencies observed from training sequences in the case of RNAs. Dynamic programming variants of the CYK algorithm find the Viterbi parse of a RNA sequence for a PCFG model. This parse is the most likely derivation of the sequence by the given PCFG. == Grammar construction == Context-free grammars are represented as a set of rules inspired from attempts to model natural languages. The rules are absolute and have a typical syntax representation known as Backus–Naur form. The production rules consist of terminal { a , b } {\displaystyle \left\{a,b\right\}} and non-terminal S symbols and a blank ϵ {\displaystyle \epsilon } may also be used as an end point. In the production rules of CFG and PCFG the left side has only one nonterminal whereas the right side can be any string of terminal or nonterminals. In PCFG nulls are excluded. An example of a grammar: S → a S , S → b S , S → ϵ {\displaystyle S\to aS,S\to bS,S\to \epsilon } This grammar can be shortened using the '|' ('or') character into: S → a S | b S | ϵ {\displaystyle S\to aS|bS|\epsilon } Terminals in a grammar are words and through the grammar rules a non-terminal symbol is transformed into a string of either terminals and/or non-terminals. The above grammar is read as "beginning from a non-terminal S the emission can generate either a or b or ϵ {\displaystyle \epsilon } ". Its derivation is: S ⇒ a S ⇒ a b S ⇒ a b b S ⇒ a b b {\displaystyle S\Rightarrow aS\Rightarrow abS\Rightarrow abbS\Rightarrow abb} Ambiguous grammar may result in ambiguous parsing if applied on homographs since the same word sequence can have more than one interpretation. Pun sentences such as the newspaper headline "Iraqi Head Seeks Arms" are an example of ambiguous parses. One strategy of dealing with ambiguous parses (originating with grammarians as early as Pāṇini) is to add yet more rules, or prioritize them so that one rule takes precedence over others. This, however, has the drawback of proliferating the rules, often to the point where they become difficult to manage. Another difficulty is overgeneration, where unlicensed structures are also generated. Probabilistic grammars circumvent these problems by ranking various productions on frequency weights, resulting in a "most likely" (winner-take-all) interpretation. As usage patterns are altered in diachronic shifts, these probabilistic rules can be re-learned, thus updating the grammar. Assigning probability to production rules makes a PCFG. These probabilities are informed by observing distributions on a training set of similar composition to the language to be modeled. On most samples of broad language, probabilistic grammars where probabilities are estimated from data typically outperform hand-crafted grammars. CFGs when contrasted with PCFGs are not applicable to RNA structure prediction because while they incorporate sequence-structure relationship they lack the scoring metrics that reveal a sequence structural potential == Weighted context-free grammar == A weighted context-free grammar (WCFG) is a more general category of context-free grammar, where each production has a numeric weight associated with it. The weight of a specific parse tree in a WCFG is the product (or sum ) of all rule weights in the tree. Each rule weight is included as often as the rule is used in the tree. A special case of WCFGs are PCFGs, where the weights are (logarithms of ) probabilities. An extended version of the CYK algorithm can be used to find the "lightest" (least-weight) derivation of a string given some WCFG. When the tree weight is the product of the rule weights, WCFGs and PCFGs can express the same set of probability distributions. == Applications == === RNA structure prediction === Since the 1990s, PCFG has been applied to model RNA structures. Energy minimization and PCFG provide ways of predicting RNA secondary structure with comparable performance. However structure prediction by PCFGs is scored probabilistically rather than by minimum free energy calculation. PCFG model parameters are directly derived from frequencies of different features observed in databases of RNA structures rather than by experimental determination as is the case with energy minimization methods. The types of various structure that can be modeled by a PCFG include long range interactions, pairwise structure and other nested structures. However, pseudoknots can not be modeled. PCFGs extend CFG by assigning probabilities to each production rule. A maximum probability parse tree from the grammar implies a maximum probability structure. Since RNAs preserve their structures over their primary sequence, RNA structure prediction can be guided by combining evolutionary information from comparative sequence analysis with biophysical knowledge about a structure plausibility based on such probabilities. Also search results for structural homologs using PCFG rules are scored according to PCFG derivations probabilities. Therefore, building grammar to model the behavior of base-pairs and single-stranded regions starts with exploring features of structural multiple sequence alignment of related RNAs. S → a S a | b S b | a a | b b {\displaystyle S\to aSa|bSb|aa|bb} The above grammar generates a string in an outside-in fashion, that is the basepair on the furthest extremes of the terminal is derived first. So a string such as a a b a a b a a {\displaystyle aabaabaa} is derived by first generating the distal a's on both sides before moving inwards: S ⇒ a S a ⇒ a a S a a ⇒ a a b S b a a ⇒ a a b a a b a a {\displaystyle S\Rightarrow aSa\Rightarrow aaSaa\Rightarrow aabSbaa\Rightarrow aabaabaa} A PCFG model extendibility allows constraining structure prediction by incorporating expectations about different features of an RNA . Such expectation may reflect for example the propensity for assuming a certain structure by an RNA. However incorporation of too much information may increase PCFG space and memory complexity and it is desirable that a PCFG-based model be as simple as possible. Every possible string x a grammar generates is assigned a probability weight P ( x | θ ) {\displaystyle P(x|\theta )} given the PCFG model θ {\displaystyle \theta } . It follows that the sum of all probabilities to all possible grammar productions is ∑ x P ( x | θ ) = 1 {\displaystyle \sum _{\text{x}}P(x|\theta )=1} . The scores

    Read more →
  • Hebbian theory

    Hebbian theory

    Hebbian theory is a neuropsychological theory claiming that an increase in synaptic efficacy arises from a presynaptic cell's repeated and persistent stimulation of a postsynaptic cell. It is an attempt to explain synaptic plasticity, the adaptation of neurons during the learning process. Hebbian theory was introduced by Donald Hebb in his 1949 book The Organization of Behavior. The theory is also called Hebb's rule, Hebb's law, Hebb's postulate, and cell assembly theory. Hebb states it as follows: Let us assume that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability. ... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. The theory is often summarized as "Neurons that fire together, wire together." However, Hebb emphasized that cell A needs to "take part in firing" cell B, and such causality can occur only if cell A fires just before, not at the same time as, cell B. This aspect of causation in Hebb's work foreshadowed what is now known about spike-timing-dependent plasticity, which requires temporal precedence. Hebbian theory attempts to explain associative or Hebbian learning, in which simultaneous activation of cells leads to pronounced increases in synaptic strength between those cells. It also provides a biological basis for errorless learning methods for education and memory rehabilitation. In the study of neural networks in cognitive function, it is often regarded as the neuronal basis of unsupervised learning. == Engrams, cell assembly theory, and learning == Hebbian theory provides an explanation for how neurons might connect to become engrams, which may be stored in overlapping cell assemblies, or groups of neurons that encode specific information. Initially created as a way to explain recurrent activity in specific groups of cortical neurons, Hebb's theories on the form and function of cell assemblies can be understood from the following: The general idea is an old one, that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated' so that activity in one facilitates activity in the other. Hebb also wrote: When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell. D. Alan Allport posits additional ideas regarding cell assembly theory and its role in forming engrams using the concept of auto-association, or the brain's ability to retrieve information based on a partial cue, described as follows: If the inputs to a system cause the same pattern of activity to occur repeatedly, the set of active elements constituting that pattern will become increasingly strongly inter-associated. That is, each element will tend to turn on every other element and (with negative weights) to turn off the elements that do not form part of the pattern. To put it another way, the pattern as a whole will become 'auto-associated'. We may call a learned (auto-associated) pattern an engram. Research conducted in the laboratory of Nobel laureate Eric Kandel has provided evidence supporting the role of Hebbian learning mechanisms at synapses in the marine gastropod Aplysia californica. Because synapses in the peripheral nervous system of marine invertebrates are much easier to control in experiments, Kandel's research found that Hebbian long-term potentiation along with activity-dependent presynaptic facilitation are both necessary for synaptic plasticity and classical conditioning in Aplysia californica. While research on invertebrates has established fundamental mechanisms of learning and memory, much of the work on long-lasting synaptic changes between vertebrate neurons involves the use of non-physiological experimental stimulation of brain cells. However, some of the physiologically relevant synapse modification mechanisms that have been studied in vertebrate brains do seem to be examples of Hebbian processes. One such review indicates that long-lasting changes in synaptic strengths can be induced by physiologically relevant synaptic activity using both Hebbian and non-Hebbian mechanisms. == Principles == In artificial neurons and artificial neural networks, Hebb's principle can be described as a method of determining how to alter the weights between model neurons. The weight between two neurons increases if the two neurons activate simultaneously, and reduces if they activate separately. Nodes that tend to be either both positive or both negative at the same time have strong positive weights, while those that tend to be opposite have strong negative weights. The following is a formulaic description of Hebbian learning (many other descriptions are possible): w i j = x i x j , {\displaystyle \,w_{ij}=x_{i}x_{j},} where w i j {\displaystyle w_{ij}} is the weight of the connection from neuron j {\displaystyle j} to neuron i {\displaystyle i} , and x i {\displaystyle x_{i}} is the input for neuron i {\displaystyle i} . This is an example of pattern learning, where weights are updated after every training example. In a Hopfield network, connections w i j {\displaystyle w_{ij}} are set to zero if i = j {\displaystyle i=j} (no reflexive connections allowed). With binary neurons (activations either 0 or 1), connections would be set to 1 if the connected neurons have the same activation for a pattern. When several training patterns are used, the expression becomes an average of the individuals: w i j = 1 p ∑ k = 1 p x i k x j k , {\displaystyle w_{ij}={\frac {1}{p}}\sum _{k=1}^{p}x_{i}^{k}x_{j}^{k},} where w i j {\displaystyle w_{ij}} is the weight of the connection from neuron j {\displaystyle j} to neuron i {\displaystyle i} , p {\displaystyle p} is the number of training patterns and x i k {\displaystyle x_{i}^{k}} the k {\displaystyle k} -th input for neuron i {\displaystyle i} . This is learning by epoch, with weights updated after all the training examples are presented and is last term applicable to both discrete and continuous training sets. Again, in a Hopfield network, connections w i j {\displaystyle w_{ij}} are set to zero if i = j {\displaystyle i=j} (no reflexive connections). A variation of Hebbian learning that takes into account phenomena such as blocking and other neural learning phenomena is the mathematical model of Harry Klopf. Klopf's model assumes that parts of a system with simple adaptive mechanisms can underlie more complex systems with more advanced adaptive behavior, such as neural networks. == Relationship to unsupervised learning, stability, and generalization == Because of the simple nature of Hebbian learning, based only on the coincidence of pre- and post-synaptic activity, it may not be intuitively clear why this form of plasticity leads to meaningful learning. However, it can be shown that Hebbian plasticity does pick up the statistical properties of the input in a way that can be categorized as unsupervised learning. This can be mathematically shown in a simplified example. Let us work under the simplifying assumption of a single rate-based neuron of rate y ( t ) {\displaystyle y(t)} , whose inputs have rates x 1 ( t ) . . . x N ( t ) {\displaystyle x_{1}(t)...x_{N}(t)} . The response of the neuron y ( t ) {\displaystyle y(t)} is usually described as a linear combination of its input, ∑ i w i x i {\displaystyle \sum _{i}w_{i}x_{i}} , followed by a response function f {\displaystyle f} : y = f ( ∑ i = 1 N w i x i ) . {\displaystyle y=f\left(\sum _{i=1}^{N}w_{i}x_{i}\right).} As defined in the previous sections, Hebbian plasticity describes the evolution in time of the synaptic weight w {\displaystyle w} : d w i d t = η x i y . {\displaystyle {\frac {dw_{i}}{dt}}=\eta x_{i}y.} Assuming, for simplicity, an identity response function f ( a ) = a {\displaystyle f(a)=a} , we can write d w i d t = η x i ∑ j = 1 N w j x j {\displaystyle {\frac {dw_{i}}{dt}}=\eta x_{i}\sum _{j=1}^{N}w_{j}x_{j}} or in matrix form: d w d t = η x x T w . {\displaystyle {\frac {d\mathbf {w} }{dt}}=\eta \mathbf {x} \mathbf {x} ^{T}\mathbf {w} .} As in the previous chapter, if training by epoch is done an average ⟨ … ⟩ {\displaystyle \langle \dots \rangle } over discrete or continuous (time) training set of x {\displaystyle \mathbf {x} } can be done: d w d t = ⟨ η x x T w ⟩ = η ⟨ x x T ⟩ w = η C w . {\displaystyle {\frac {d\mathbf {w} }{dt}}=\langle \eta \mathbf {x} \mathbf {x} ^{T}\mathbf {w} \rangle =\eta \langle \mathbf {x} \mathbf {x} ^{T}\rangle \mathbf {w} =\eta C\mathbf {w} .} where C = ⟨ x x T ⟩ {\displaystyle C=\langle \,\mathbf {x} \mathbf {x} ^{T}\rangle } is the correlation matrix of the input under the additional assumption that ⟨ x ⟩ = 0 {\displaystyle \langle \mathbf

    Read more →