AI Chat Bots Roleplay

AI Chat Bots Roleplay — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Image analysis

    Image analysis

    Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophisticated as identifying a person from their face. Computers are indispensable for the analysis of large amounts of data, for tasks that require complex computation, or for the extraction of quantitative information. On the other hand, the human visual cortex is an excellent image analysis apparatus, especially for extracting higher-level information, and for many applications — including medicine, security, and remote sensing — human analysts still cannot be replaced by computers. For this reason, many important image analysis tools such as edge detectors and neural networks are inspired by human visual perception models. == Digital == Digital Image Analysis or Computer Image Analysis is when a computer or electrical device automatically studies an image to obtain useful information from it. Note that the device is often a computer but may also be an electrical circuit, a digital camera or a mobile phone. It involves the fields of computer or machine vision, and medical imaging, and makes heavy use of pattern recognition, digital geometry, and signal processing. This field of computer science developed in the 1950s at academic institutions such as the MIT A.I. Lab, originally as a branch of artificial intelligence and robotics. It is the quantitative or qualitative characterization of two-dimensional (2D) or three-dimensional (3D) digital images. 2D images are, for example, to be analyzed in computer vision, and 3D images in medical imaging. The field was established in the 1950s—1970s, for example with pioneering contributions by Azriel Rosenfeld, Herbert Freeman, Jack E. Bresenham, or King-Sun Fu. == Techniques == There are many different techniques used in automatically analysing images. Each technique may be useful for a small range of tasks, however there still aren't any known methods of image analysis that are generic enough for wide ranges of tasks, compared to the abilities of a human's image analysing capabilities. Examples of image analysis techniques in different fields include: 2D and 3D object recognition, image segmentation, motion detection e.g. Single particle tracking, video tracking, optical flow, medical scan analysis, 3D Pose Estimation. == Deep learning == Since the early 2010s, deep learning methods have substantially advanced the field of image analysis. In 2012, a deep convolutional neural network (CNN) known as AlexNet achieved a significant reduction in error rates on the ImageNet large-scale image classification benchmark, demonstrating the effectiveness of deep learning for visual recognition tasks. Subsequent architectures such as ResNet introduced residual connections that enabled training of much deeper networks, further improving accuracy across image analysis tasks. Real-time object detection became practical with frameworks such as YOLO (You Only Look Once), which unified detection and classification into a single network pass. In 2020, the Vision Transformer (ViT) demonstrated that transformer architectures, originally developed for natural language processing, could achieve competitive results on image classification when applied directly to sequences of image patches. More recently, foundation models trained on large-scale datasets have enabled zero-shot generalisation across image analysis tasks. The Segment Anything Model (SAM), trained on over one billion masks, can segment arbitrary objects in images without task-specific fine-tuning. These advances have made image analysis techniques increasingly accessible through browser-based tools and open-source implementations. == Applications == The applications of digital image analysis are continuously expanding through all areas of science and industry, including: anatomy, allows for precise measurements, visualization, and statistical analysis of anatomical structures. assay micro plate reading, such as detecting where a chemical was manufactured. astronomy, such as calculating the size of a planet. automated species identification (e.g. plant and animal species) defense error level analysis filtering machine vision, such as to automatically count items in a factory conveyor belt. materials science, such as determining if a metal weld has cracks. medicine, such as detecting cancer in a mammography scan. metallography, such as determining the mineral content of a rock sample. microscopy, such as counting the germs in a swab. automatic number plate recognition; optical character recognition, such as automatic license plate detection. remote sensing, such as detecting intruders in a house, and producing land cover/land use maps. robotics, such as to avoid steering into an obstacle. security, such as detecting a person's eye color or hair color. == Object-based == Object-based image analysis (OBIA) involves two typical processes, segmentation and classification. Segmentation helps to group pixels into homogeneous objects. The objects typically correspond to individual features of interest, although over-segmentation or under-segmentation is very likely. Classification then can be performed at object levels, using various statistics of the objects as features in the classifier. Statistics can include geometry, context and texture of image objects. Over-segmentation is often preferred over under-segmentation when classifying high-resolution images. Object-based image analysis has been applied in many fields, such as cell biology, medicine, earth sciences, and remote sensing. For example, it can detect changes of cellular shapes in the process of cell differentiation.; it has also been widely used in the mapping community to generate land cover. When applied to earth images, OBIA is known as geographic object-based image analysis (GEOBIA), defined as "a sub-discipline of geoinformation science devoted to (...) partitioning remote sensing (RS) imagery into meaningful image-objects, and assessing their characteristics through spatial, spectral and temporal scale". The international GEOBIA conference has been held biannually since 2006. OBIA techniques are implemented in software such as eCognition or the Orfeo toolbox.

    Read more →
  • Neural operators

    Neural operators

    Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. Neural operators represent an extension of traditional artificial neural networks, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn operators between function spaces; they can receive input functions, and the output function can be evaluated at any discretization. The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs), which are critical tools in modeling the natural environment. Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies while being significantly faster than numerical solvers. Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data, and the geosciences. In particular, they have been applied to learning stress-strain fields in materials, classifying complex data like spatial transcriptomics, predicting multiphase flow in porous media, and carbon dioxide migration simulations. Finally, the operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces, and subsumes these settings as special cases when limited to a fixed input resolution. == Operator learning == Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, one can cast the problem of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a machine learning paradigm to learn solution operators mapping the input function to the output function . Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces the operator learning to finite-dimensional function learning and has some limitations, such as generalizing to discretizations beyond the grid used in training. The primary properties of neural operators that differentiate them from traditional neural networks is discretization invariance and discretization convergence. Unlike conventional neural networks, which are fixed on the discretization of training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids. == Definition and formulation == Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, neural operators have been instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities. Using an analogous architecture to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set. Neural operators seek to approximate some operator G : A → U {\displaystyle {\mathcal {G}}:{\mathcal {A}}\to {\mathcal {U}}} between function spaces A {\displaystyle {\mathcal {A}}} and U {\displaystyle {\mathcal {U}}} by building a parametric map G ϕ : A → U {\displaystyle {\mathcal {G}}_{\phi }:{\mathcal {A}}\to {\mathcal {U}}} . Such parametric maps G ϕ {\displaystyle {\mathcal {G}}_{\phi }} can generally be defined in the form G ϕ := Q ∘ σ ( W T + K T + b T ) ∘ ⋯ ∘ σ ( W 1 + K 1 + b 1 ) ∘ P , {\displaystyle {\mathcal {G}}_{\phi }:={\mathcal {Q}}\circ \sigma (W_{T}+{\mathcal {K}}_{T}+b_{T})\circ \cdots \circ \sigma (W_{1}+{\mathcal {K}}_{1}+b_{1})\circ {\mathcal {P}},} where P , Q {\displaystyle {\mathcal {P}},{\mathcal {Q}}} are the lifting (lifting the codomain of the input function to a higher dimensional space) and projection (projecting the codomain of the intermediate function to the output dimension) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. σ {\displaystyle \sigma } is a pointwise nonlinearity, such as a rectified linear unit (ReLU), or a Gaussian error linear unit (GeLU). Each layer t = 1 , … , T {\displaystyle t=1,\dots ,T} has a respective local operator W t {\displaystyle W_{t}} (usually parameterized by a pointwise neural network), a kernel integral operator K t {\displaystyle {\mathcal {K}}_{t}} , and a bias function b t {\displaystyle b_{t}} . Given some intermediate functional representation v t {\displaystyle v_{t}} with domain D {\displaystyle D} in the t {\displaystyle t} -th hidden layer, a kernel integral operator K ϕ {\displaystyle {\mathcal {K}}_{\phi }} is defined as ( K ϕ v t ) ( x ) := ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x):=\int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy,} where the kernel κ ϕ {\displaystyle \kappa _{\phi }} is a learnable implicit neural network, parametrized by ϕ {\displaystyle \phi } . In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of v t {\displaystyle v_{t}} at n {\displaystyle n} points { y j } j n {\displaystyle \{y_{j}\}_{j}^{n}} . Borrowing from Nyström integral approximation methods such as Riemann sum integration and Gaussian quadrature, the above integral operation can be computed as follows: ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y ≈ ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j , {\displaystyle \int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy\approx \sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}},} where Δ y j {\displaystyle \Delta _{y_{j}}} is the sub-area volume or quadrature weight associated to the point y j {\displaystyle y_{j}} . Thus, a simplified layer can be computed as v t + 1 ( x ) ≈ σ ( ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j + W t ( v t ( y j ) ) + b t ( x ) ) . {\displaystyle v_{t+1}(x)\approx \sigma \left(\sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}}+W_{t}(v_{t}(y_{j}))+b_{t}(x)\right).} The above approximation, along with parametrizing κ ϕ {\displaystyle \kappa _{\phi }} as an implicit neural network, results in the graph neural operator (GNO). There have been various parameterizations of neural operators for different applications. These typically differ in their parameterization of κ {\displaystyle \kappa } . The most popular instantiation is the Fourier neural operator (FNO). FNO takes κ ϕ ( x , y , v t ( x ) , v t ( y ) ) := κ ϕ ( x − y ) {\displaystyle \kappa _{\phi }(x,y,v_{t}(x),v_{t}(y)):=\kappa _{\phi }(x-y)} and by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator: ( K ϕ v t ) ( x ) = F − 1 ( R ϕ ⋅ ( F v t ) ) ( x ) , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x)={\mathcal {F}}^{-1}(R_{\phi }\cdot ({\mathcal {F}}v_{t}))(x),} where F {\displaystyle {\mathcal {F}}} represents the Fourier transform and R ϕ {\displaystyle R_{\phi }} represents the Fourier transform of some periodic function κ ϕ {\displaystyle \kappa _{\phi }} . That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the discrete Fourier transform (DFT) with frequencies below some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation. == Training == Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some Lp norm or Sobolev norm. In particular, for a dataset { ( a i , u i ) } i = 1 N {\displaystyle \{(a_{i},u_{i})\}_{i=1}^{N}} of size N {\displaystyle N} , neural operators minimize (a discretization of) L U ( { ( a i , u i ) } i = 1 N ) := ∑ i = 1 N ‖ u i − G θ ( a i ) ‖ U 2 {\displaystyle {\mathcal {L}}_{\mathca

    Read more →
  • MegaHAL

    MegaHAL

    MegaHAL is a computer conversation simulator, or "chatterbot", created by Jason Hutchens. == Background == In 1996, Jason Hutchens entered the Loebner Prize Contest with HeX, a chatterbot based on ELIZA. HeX won the competition that year and took the $2000 prize for having the highest overall score. In 1998, Hutchens again entered the Loebner Prize Contest with his new program, MegaHAL. MegaHAL made its debut in the 1998 Loebner Prize Contest. Like many chatterbots, the intent is for MegaHAL to appear as a human fluent in a natural language. As a user types sentences into MegaHAL, MegaHAL will respond with sentences that are sometimes coherent and at other times complete gibberish. MegaHAL learns as the conversation progresses, remembering new words and sentence structures. It will even learn new ways to substitute words or phrases for other words or phrases. Many would consider conversation simulators like MegaHAL to be a primitive form of artificial intelligence. However, MegaHAL doesn't understand the conversation or even the sentence structure. It generates its conversation based on sequential and mathematical relationships. In the world of conversation simulators, MegaHAL is based on relatively old technology and could be considered primitive. However, its popularity has grown due to its humorous nature; it has been known to respond with twisted or nonsensical statements that are often amusing. == Theory of Operation == MegaHal is based at least in part on a so-called "hidden Markov Model", so that the first thing that Megahal does when it "trains" on a script or text is to build a database of text fragments encompassing every possible subset of perhaps 4, 5, or even 6 consecutive words, so that for example - if MegaHal trains on the Declaration of Independence, then MegaHal will build a database containing text fragments such as "When in the course", "in the course of", "the course of human", "course of human events", "of human events, one", "human events, one people", and so on. Then if Megahal is fed another text, such has "Superman, Yes! It's Superman - he can change the course of mighty rivers, bend steel with his bare hands - and who disguised at Clark Kent …" IT MIGHT induce Megahal to apparently bemuse itself to proffer whether Superman can change the course of human events, or something else altogether - such as some rambling about "when in the course of mighty rivers", and so on. Thus likewise - if a phrase like "the White house said" comes up a lot in some text; then Megahal's ability to switch randomly between different contexts which otherwise share some similarity can result at times in some surprising lucidity, or else it might otherwise seem quite bizarre. == Examples == There are some sentences that MegaHAL generated: CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS. and COWS FLY LIKE CLOUDS BUT THEY ARE NEVER COMPLETELY SUCCESSFUL. == Distribution == MegaHAL is distributed under the Unlicense. Its source code can be downloaded from the Github repository.

    Read more →
  • Lexxe

    Lexxe

    Lexxe is an internet search engine that applies Natural Language Processing in its semantic search technology. Founded in 2005 by Dr. Hong Liang Qiao, Lexxe is based in Sydney, Australia. Today, Lexxe's key focus is on sentiment search with the launch of a news sentiment search site at News & Moods (www.newsandmoods.com). Lexxe has experienced several stages of change of focus in search technology: Lexxe launched its Alpha version in 2005, featuring Natural Language question answering (i.e. users could ask questions in English to the search engine apart from keyword searches — this feature has been suspended for redevelopment since 2010). It used only algorithms to extract answers from web pages, with no question-answer pair databases prepared in advance. In 2011, Lexxe launched a beta version with a new search technology called Semantic Key. Semantic Keys enable users to query with a conceptual keyword (or a keyword with a special meaning, hence the term Semantic Key) in order to find instances under the concept, e.g. price → $5.95 or €200, color → red, yellow, white. For example, “price: a pound of apples”, “color: ferrari”. With initial 500 Semantic Keys at the Beta launch, Lexxe became the first search engine in the world to offer this unique and useful search technology to the users. The cost of building Semantic Keys was too heavy though. In 2017, Lexxe launched News & Moods (www.newsandmoods.com), an open platform for news sentiment search, a first step towards sentiment search feature for the entire Internet search in Lexxe search engine. News & Moods also comes with smartphone apps in Android and iOS.

    Read more →
  • Artificial intelligence

    Artificial intelligence

    Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. High-profile applications of AI include advanced web search engines, chatbots, virtual assistants, autonomous vehicles, and play and analysis in strategy games (e.g., chess and Go). Since the 2020s, generative AI has become widely available to generate images, audio, and videos from text prompts. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, and perception, as well as support for robotics. To reach these goals, AI researchers have used techniques including state space search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics. AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields. Some companies, such as OpenAI, Google DeepMind and Meta, aim to create artificial general intelligence (AGI) – AI that can complete virtually any cognitive task at least as well as a human. Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest increased substantially after 2012, when graphics processing units began being used to accelerate neural networks, and deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture. In the 2020s, an AI boom has coincided with advances in generative AI, which allowed for the creation and modification of media. In addition to AI safety and unintended consequences and harms from the use of AI, ethical concerns, AI's long-term effects, and potential existential risks have prompted discussions of AI regulation. == Goals == The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research. === Reasoning and problem-solving === Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions. By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics. Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow. Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments. Accurate and efficient reasoning is an unsolved problem. === Knowledge representation === Knowledge representation and knowledge engineering allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval, scene interpretation, clinical decision support, knowledge discovery (mining "interesting" and actionable inferences from large databases), and other areas. A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge. Knowledge bases need to represent things such as objects, properties, categories, and relations between objects; situations, events, states, and time; causes and effects; knowledge about knowledge (what we know about what other people know); default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing); and many other aspects and domains of knowledge. Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous); and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally). There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications. === Planning and decision-making === An "agent" is any entity (artificial or not) that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen. In automated planning, the agent has a specific goal. In automated decision-making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility. In classical planning, the agent knows exactly what the effect of any action will be. In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked. Alongside thorough testing and improvement based on previous decisions, having an explanation for why the agent took certain decisions is a way to build trust, especially when the decisions have to be relied upon. In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences. Information value theory can be used to weigh the value of exploratory or experimental actions. The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be. A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned. Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents. === Learning === Machine learning is the study of programs that can improve their performance on a given task automatically. It has been a part of AI from the beginning. There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance. Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input). In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good". Transfer learning is when the knowledge gained from one problem is applied to a new problem. Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning. Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization. === Natural language processing === Natural language processing (NLP) allows programs to read, write and communicate in human languages. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering. Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation unless

    Read more →
  • Racter

    Racter

    Racter is an artificial intelligence program that generates English language prose at random. It was published by Mindscape for IBM PC compatibles in 1984, then for the Apple II, Mac, and Amiga. An expanded version of the software, not the one released through Mindscape, was used to generate the text for the published book The Policeman's Beard Is Half Constructed. == History == Racter, short for raconteur, was written by William Chamberlain and Thomas Etter. Racter's initial creation was the short story Soft Ions, which appeared in the October 1981 issue of Omni (magazine). The publication's editors bought the story in January 1980, before it had even been written. In exchange for the rights, the editors offered financial support to Chamberlain and Etter so the two could refine Racter. In 1983, Racter produced a book called The Policeman's Beard Is Half Constructed (ISBN 0-446-38051-2). The program originally was written for an OSI which only supported file names at most six characters long, causing the name to be shorted to Racter and it was later adapted to run on a CP/M machine where it was written in "compiled ASIC on a Z80 microcomputer with 64K of RAM." This version, the program that allegedly wrote the book, was not released to the general public. The sophistication claimed for the program was likely exaggerated, as could be seen by investigation of the template system of text generation. In 1984, Mindscape released an interactive version of Racter, developed by Inrac Corporation, for IBM PC compatibles, and it was ported to the Apple II, Mac, and Amiga. The published Racter was similar to a chatterbot. The BASIC program that was released by Mindscape was far less sophisticated than anything that could have written the fairly sophisticated prose of The Policeman's Beard. The commercial version of Racter could be likened to a computerized version of Mad Libs, the game in which you fill in the blanks in advance and then plug them into a text template to produce a surrealistic tale. The commercial program attempted to parse text inputs, identifying significant nouns and verbs, which it would then regurgitate to create "conversations", plugging the input from the user into phrase templates which it then combined, along with modules that conjugated English verbs. By contrast, the text in The Policeman's Beard, apart from being edited from a large amount of output, would have been the product of Chamberlain's own specialized templates and modules, which were not included in the commercial release of the program. == Reception == The Boston Phoenix called the story Soft Ions "schematic nonsense. But the scheme is obvious enough and the nonsense accessible enough to an attentive reader that one can almost believe Chamberlain when he predicts that before long Racter will be ready to write for the pulp-reading public." PC Magazine described some of Policeman's Beard's scenes as "surprising for their frankness" and "reflective". It concluded that the book was "whimsical and wise and sometimes fun". Computer Gaming World described Racter as "a diversion into another dimension that might best be seen before paying the price of a ticket. (Try before you buy!)" A 1985 review of the program in The New York Times notes that, "As computers move ever closer to artificial intelligence, Racter is on the edge of artificial insanity." It also states that Racter's "always-changing sentences are grammatically correct, often funny and, for a computer, sometimes profound." The article includes examples showing interaction with Racter, most often Racter asking the user questions. == Reviews == Jeux & Stratégie #47

    Read more →
  • Superellipsoid

    Superellipsoid

    In mathematics, a superellipsoid (or super-ellipsoid) is a solid whose horizontal sections are superellipses (Lamé curves) with the same squareness parameter ϵ 2 {\displaystyle \epsilon _{2}} , and whose vertical sections through the center are superellipses with the squareness parameter ϵ 1 {\displaystyle \epsilon _{1}} . It is a generalization of an ellipsoid, which is a special case when ϵ 1 = ϵ 2 = 1 {\displaystyle \epsilon _{1}=\epsilon _{2}=1} . Superellipsoids as computer graphics primitives were popularized by Alan H. Barr (who used the name "superquadrics" to refer to both superellipsoids and supertoroids). In modern computer vision and robotics literatures, superquadrics and superellipsoids are used interchangeably, since superellipsoids are the most representative and widely utilized shape among all the superquadrics. Superellipsoids have a rich shape vocabulary, including cuboids, cylinders, ellipsoids, octahedra and their intermediates. It becomes an important geometric primitive widely used in computer vision, robotics, and physical simulation. The main advantage of describing objects and environment with superellipsoids is its conciseness and expressiveness in shape. Furthermore, a closed-form expression of the Minkowski sum between two superellipsoids is available. This makes it a desirable geometric primitive for robot grasping, collision detection, and motion planning. == Special cases == A handful of notable mathematical figures can arise as special cases of superellipsoids given the correct set of values, which are depicted in the above graphic: Cylinder Sphere Steinmetz solid Bicone Regular octahedron Cube, as a limiting case where the exponents tend to infinity Piet Hein's supereggs are also special cases of superellipsoids. == Formulas == === Basic (normalized) superellipsoid === The basic superellipsoid is defined by the implicit function f ( x , y , z ) = ( x 2 ϵ 2 + y 2 ϵ 2 ) ϵ 2 / ϵ 1 + z 2 ϵ 1 {\displaystyle f(x,y,z)=\left(x^{\frac {2}{\epsilon _{2}}}+y^{\frac {2}{\epsilon _{2}}}\right)^{\epsilon _{2}/\epsilon _{1}}+z^{\frac {2}{\epsilon _{1}}}} The parameters ϵ 1 {\displaystyle \epsilon _{1}} and ϵ 2 {\displaystyle \epsilon _{2}} are positive real numbers that control the squareness of the shape. The surface of the superellipsoid is defined by the equation: f ( x , y , z ) = 1 {\displaystyle f(x,y,z)=1} For any given point ( x , y , z ) ∈ R 3 {\displaystyle (x,y,z)\in \mathbb {R} ^{3}} , the point lies inside the superellipsoid if f ( x , y , z ) < 1 {\displaystyle f(x,y,z)<1} , and outside if f ( x , y , z ) > 1 {\displaystyle f(x,y,z)>1} . Any "parallel of latitude" of the superellipsoid (a horizontal section at any constant z between -1 and +1) is a Lamé curve with exponent 2 / ϵ 2 {\displaystyle 2/\epsilon _{2}} , scaled by a = ( 1 − z 2 ϵ 1 ) ϵ 1 2 {\displaystyle a=(1-z^{\frac {2}{\epsilon _{1}}})^{\frac {\epsilon _{1}}{2}}} , which is ( x a ) 2 ϵ 2 + ( y a ) 2 ϵ 2 = 1. {\displaystyle \left({\frac {x}{a}}\right)^{\frac {2}{\epsilon _{2}}}+\left({\frac {y}{a}}\right)^{\frac {2}{\epsilon _{2}}}=1.} Any "meridian of longitude" (a section by any vertical plane through the origin) is a Lamé curve with exponent 2 / ϵ 1 {\displaystyle 2/\epsilon _{1}} , stretched horizontally by a factor w that depends on the sectioning plane. Namely, if x = u cos ⁡ θ {\displaystyle x=u\cos \theta } and y = u sin ⁡ θ {\displaystyle y=u\sin \theta } , for a given θ {\displaystyle \theta } , then the section is ( u w ) 2 ϵ 1 + z 2 ϵ 1 = 1 , {\displaystyle \left({\frac {u}{w}}\right)^{\frac {2}{\epsilon _{1}}}+z^{\frac {2}{\epsilon _{1}}}=1,} where w = ( cos 2 ϵ 2 ⁡ θ + sin 2 ϵ 2 ⁡ θ ) − ϵ 2 2 . {\displaystyle w=(\cos ^{\frac {2}{\epsilon _{2}}}\theta +\sin ^{\frac {2}{\epsilon _{2}}}\theta )^{-{\frac {\epsilon _{2}}{2}}}.} In particular, if ϵ 2 {\displaystyle \epsilon _{2}} is 1, the horizontal cross-sections are circles, and the horizontal stretching w {\displaystyle w} of the vertical sections is 1 for all planes. In that case, the superellipsoid is a solid of revolution, obtained by rotating the Lamé curve with exponent 2 / ϵ 1 {\displaystyle 2/\epsilon _{1}} around the vertical axis. === Superellipsoid === The basic shape above extends from −1 to +1 along each coordinate axis. The general superellipsoid is obtained by scaling the basic shape along each axis by factors a x {\displaystyle a_{x}} , a y {\displaystyle a_{y}} , a z {\displaystyle a_{z}} , the semi-diameters of the resulting solid. The implicit function is F ( x , y , z ) = ( ( x a x ) 2 ϵ 2 + ( y a y ) 2 ϵ 2 ) ϵ 2 ϵ 1 + ( z a z ) 2 ϵ 1 {\displaystyle F(x,y,z)=\left(\left({\frac {x}{a_{x}}}\right)^{\frac {2}{\epsilon _{2}}}+\left({\frac {y}{a_{y}}}\right)^{\frac {2}{\epsilon _{2}}}\right)^{\frac {\epsilon _{2}}{\epsilon _{1}}}+\left({\frac {z}{a_{z}}}\right)^{\frac {2}{\epsilon _{1}}}} . Similarly, the surface of the superellipsoid is defined by the equation F ( x , y , z ) = 1 {\displaystyle F(x,y,z)=1} For any given point ( x , y , z ) ∈ R 3 {\displaystyle (x,y,z)\in \mathbb {R} ^{3}} , the point lies inside the superellipsoid if f ( x , y , z ) < 1 {\displaystyle f(x,y,z)<1} , and outside if f ( x , y , z ) > 1 {\displaystyle f(x,y,z)>1} . Therefore, the implicit function is also called the inside-outside function of the superellipsoid. The superellipsoid has a parametric representation in terms of surface parameters η ∈ [ − π / 2 , π / 2 ) {\displaystyle \eta \in [-\pi /2,\pi /2)} , ω ∈ [ − π , π ) {\displaystyle \omega \in [-\pi ,\pi )} . x ( η , ω ) = a x cos ϵ 1 ⁡ η cos ϵ 2 ⁡ ω {\displaystyle x(\eta ,\omega )=a_{x}\cos ^{\epsilon _{1}}\eta \cos ^{\epsilon _{2}}\omega } y ( η , ω ) = a y cos ϵ 1 ⁡ η sin ϵ 2 ⁡ ω {\displaystyle y(\eta ,\omega )=a_{y}\cos ^{\epsilon _{1}}\eta \sin ^{\epsilon _{2}}\omega } z ( η , ω ) = a z sin ϵ 1 ⁡ η {\displaystyle z(\eta ,\omega )=a_{z}\sin ^{\epsilon _{1}}\eta } === General posed superellipsoid === In computer vision and robotic applications, a superellipsoid with a general pose in the 3D Euclidean space is usually of more interest. For a given Euclidean transformation of the superellipsoid frame g = [ R ∈ S O ( 3 ) , t ∈ R 3 ] ∈ S E ( 3 ) {\displaystyle g=[\mathbf {R} \in SO(3),\mathbf {t} \in \mathbb {R} ^{3}]\in SE(3)} relative to the world frame, the implicit function of a general posed superellipsoid surface defined the world frame is F ( g − 1 ∘ ( x , y , z ) ) = 1 {\displaystyle F\left(g^{-1}\circ (x,y,z)\right)=1} where ∘ {\displaystyle \circ } is the transformation operation that maps the point ( x , y , z ) ∈ R 3 {\displaystyle (x,y,z)\in \mathbb {R} ^{3}} in the world frame into the canonical superellipsoid frame. === Volume of superellipsoid === The volume encompassed by the superelllipsoid surface can be expressed in terms of the beta functions β ( ⋅ , ⋅ ) {\displaystyle \beta (\cdot ,\cdot )} , V ( ϵ 1 , ϵ 2 , a x , a y , a z ) = 2 a x a y a z ϵ 1 ϵ 2 β ( ϵ 1 2 , ϵ 1 + 1 ) β ( ϵ 2 2 , ϵ 2 + 2 2 ) {\displaystyle V(\epsilon _{1},\epsilon _{2},a_{x},a_{y},a_{z})=2a_{x}a_{y}a_{z}\epsilon _{1}\epsilon _{2}\beta ({\frac {\epsilon _{1}}{2}},\epsilon _{1}+1)\beta ({\frac {\epsilon _{2}}{2}},{\frac {\epsilon _{2}+2}{2}})} or equivalently with the Gamma function Γ ( ⋅ ) {\displaystyle \Gamma (\cdot )} , since β ( m , n ) = Γ ( m ) Γ ( n ) Γ ( m + n ) {\displaystyle \beta (m,n)={\frac {\Gamma (m)\Gamma (n)}{\Gamma (m+n)}}} == Recovery from data == Recoverying the superellipsoid (or superquadrics) representation from raw data (e.g., point cloud, mesh, images, and voxels) is an important task in computer vision, robotics, and physical simulation. Traditional computational methods model the problem as a least-square problem. The goal is to find out the optimal set of superellipsoid parameters θ ≐ [ ϵ 1 , ϵ 2 , a x , a y , a z , g ] {\displaystyle \theta \doteq [\epsilon _{1},\epsilon _{2},a_{x},a_{y},a_{z},g]} that minimize an objective function. Other than the shape parameters, g ∈ {\displaystyle g\in } SE(3) is the pose of the superellipsoid frame with respect to the world coordinate. There are two commonly used objective functions. The first one is constructed directly based on the implicit function G 1 ( θ ) = a x a y a z ∑ i = 1 N ( F ϵ 1 ( g − 1 ∘ ( x i , y i , z i ) ) − 1 ) 2 {\displaystyle G_{1}(\theta )=a_{x}a_{y}a_{z}\sum _{i=1}^{N}\left(F^{\epsilon _{1}}\left(g^{-1}\circ (x_{i},y_{i},z_{i})\right)-1\right)^{2}} The minimization of the objective function provides a recovered superellipsoid as close as possible to all the input points { ( x i , y i , z i ) ∈ R 3 , i = 1 , 2 , . . . , N } {\displaystyle \{(x_{i},y_{i},z_{i})\in \mathbb {R} ^{3},i=1,2,...,N\}} . At the mean time, the scalar value a x , a y , a z {\displaystyle a_{x},a_{y},a_{z}} is positively proportional to the volume of the superellipsoid, and thus have the effect of minimizing the volume as well. The other objective function tries to minimized the radial distance between the points and the superellipsoid. That is G 2 ( θ ) = ∑ i = 1 N ( | r

    Read more →
  • GPT-5

    GPT-5

    GPT-5 is a multimodal large language model developed by OpenAI and the fifth in its series of generative pre-trained transformer (GPT) foundation models. Preceded in the series by GPT-4, it was launched on August 7, 2025. It is publicly accessible to users of the chatbot products ChatGPT and Microsoft Copilot as well as to developers through the OpenAI API. == Background == On April 14, 2023, Sam Altman, the chief executive officer of OpenAI, spoke at an event at the Massachusetts Institute of Technology and said that the company was not training GPT-5 at that time. He stated that OpenAI was "prioritizing GPT-4 development" and that "we are not and won't for some time" release GPT-5. On July 18, OpenAI filed for a "GPT-5" trademark in the United States. On November 13, Altman confirmed to the Financial Times that the company was working to develop GPT-5. According to The Information, "[f]or much of the second half of 2024, OpenAI was developing a model known internally as Orion and intended to become GPT-5", "[b]ut the Orion effort failed to produce a better model, and the company instead released it as GPT-4.5 in February [2025]." By late July 2025, OpenAI was widely anticipated as planning to release GPT-5 in early August. On July 30, The Verge reported that "Microsoft is getting ready for GPT-5" as "sources familiar with Microsoft's AI plans" told an editor that the company was testing a new mode for its Copilot chatbot that would offer a model that "thinks deeply or quickly based on the task". On August 5, in the leadup to the release of GPT-5, OpenAI released GPT-OSS, a set of two open-weight models that have reasoning capabilities. GPT-5 was then unveiled during a livestream event on August 7. == Capabilities == At the time of its release, GPT-5 had state-of-the-art performance on benchmarks that test mathematics, programming, finance, and multimodal understanding. According to OpenAI, improvements over its predecessor models include faster response times, better coding and writing skills, more accurate answers to health questions, and lower levels of hallucination. Also, compared to previous models, GPT-5 aims to give safe, high-level responses to potentially harmful queries rather than outright declining them, an approach that OpenAI refers to as "safe completions", aiming to result "in GPT-5 being able to refuse more unsafe questions, while offering fewer rejections to users seeking harmless information." In addition, GPT-5 was trained to give more critical, "less effusively agreeable" answers compared to its predecessor models. Days before the launch of GPT-5, two early testers of the model stated that they were "impressed" by its ability to code and to solve mathematical and scientific problems. They suggested that the model shows great improvement from GPT-4, but not as large of a gain as from GPT-3 to GPT-4. A day prior to the release of GPT-5, during a press briefing, Sam Altman, the chief executive officer of OpenAI, called GPT-5 "a significant step along the path to AGI", referring to artificial general intelligence, the hypothetical level of intelligence that OpenAI defines as the ability to perform any economically valuable task that a human can. According to Altman, GPT-5 is "significantly better" than its predecessors, offering "PhD-level" abilities across a wide range of tasks. The exact energy consumption of GPT-5 use has not been disclosed by OpenAI. Researchers at the University of Rhode Island estimated that a medium-length response consumes slightly over 18 watt-hours, equivalent to using an incandescent bulb for 18 minutes. === Architecture === GPT-5 is a system that contains a fast, high-throughput model, a deeper reasoning model, and a real-time router that decides which model to use based on conversation type, complexity, tool needs, and explicit user intent. Altman had previously criticized the manual model picker for being overly complex, suggesting a need for unification. GPT-5 also includes agentic functionality through which it can set up its own desktop and can use its browser to search autonomously for sources that relate to its task. The GPT-5 system card defines two fast, high-throughput models – gpt-5-main and gpt-5-main-mini – and two thinking models – gpt-5-thinking and gpt-5-thinking-mini. In the OpenAI API, developers can access the thinking model, its mini version, and gpt-5-thinking-nano, an even smaller and faster nano version of the thinking model. The version of GPT-5 that is accessible via the API has adjustable reasoning effort (low, medium, high, or minimal) and verbosity (low, medium, or high). Additionally, ChatGPT provides access to gpt-5-thinking with a setting that makes use of parallel test-time compute, referred to as gpt-5-thinking-pro. == Limitations == === Safety === Neuraltrust, a security research company, claimed to have successfully compromised GPT-5 within its first day of testing the model. According to its report, it enabled GPT-5 to generate detailed instructions for manufacturing explosive devices. SPLX, another company, conducted similar tests and came to similar conclusions about GPT-5's security. Their assessments suggest that GPT-5 has significant security gaps, potentially rendering it as being unsafe for use in a corporate environment. == Training == According to AIMultiple, GPT-5 is natively multimodal, meaning that it was trained from scratch on multiple modalities (like text and images) at once without relying on already-trained language or vision models. Its training process involved three stages: unsupervised pretraining, supervised fine-tuning, and reinforcement learning from human feedback. Pretraining used a large-scale multilingual dataset of books, articles, web pages, academic papers, and licensed sources. GPT-5's visual and text capabilities were described as having been developed alongside each other throughout training, unlike with GPT-4. == Use == GPT-5 is used in ChatGPT. Although GPT-5 is free for all ChatGPT users, Plus users get higher use limits while Pro users get unlimited access to GPT-5 as well as limited access to GPT-5 Pro. Standard limits for lower-tier users on responses per hour still apply. Additionally, with the introduction of GPT-5, ChatGPT's "Advanced Voice Mode" was replaced by "ChatGPT Voice", which is supposed to enable more natural-sounding conversations. OpenAI stated that "Standard Voice Mode retires on September 9, 2025, unifying all users on ChatGPT Voice". On November 24, 2025, the feature of shopping research was added to ChatGPT, claimed to be a mini model post-trained on gpt-5-thinking-mini. GPT-5 is also available in Microsoft Copilot, and Microsoft stated that it will incorporate GPT-5 into a wide variety of its products. According to 9to5Mac, Apple Inc. is planning to integrate the model into the Apple Intelligence feature in its iOS 26, iPadOS 26, and macOS Tahoe operating systems. It is also accessible via the OpenAI API. A number of American companies were reported as having received access to GPT-5 ahead of its launch. OpenAI stated that the private health insurance company Oscar Health was checking applications from its policyholders with the model. In addition, Uber was using GPT-5 for its customer support system; GitLab, Windsurf, and Cursor were using the model for software development; and the Spanish bank BBVA was using it for financial analysis. Other companies that OpenAI listed as having used GPT-5 pre-release include Amgen, Lowe's, and Notion. == Reception == === Critical reviews === Grace Huckins in MIT Technology Review found that, "[w]hereas o1 was a major technological advancement, GPT-5 is, above all else, a refined product." In response to claims that Sam Altman, the chief executive officer of OpenAI, had made about the model, she stated that "GPT-5 will furnish a more pleasant and seamless user experience. That's not nothing, but it falls far short of the transformative AI future that Altman has spent much of the past year hyping." In response to Altman's claim that GPT-5 is "a significant step along the path" to artificial general intelligence, she noted: "[M]aybe he's right—but if so, it's a very small step." In The Information, Stephanie Palazzolo praised GPT-5's coding capabilities. According to Matteo Wong in The Atlantic, GPT-5 "is intuitive, fast, and efficient; adapts to human preferences and intentions; and is easy to personalize." He stated: "At this stage of the AI boom, when every major chatbot is legitimately helpful in numerous ways, benchmarks, science, and rigor feel almost insignificant. What matters is how the chatbot feels [...]". John Herrman from the New York magazine wrote: "Casual users who encounter GPT-5 through ChatGPT aren't likely to feel like they're using a completely different product [...] while people who use it for software development or in a corporate context are more likely to notice a major change." Mashable's Christian de Looper found that "GPT-5

    Read more →
  • Azure Data Lake

    Azure Data Lake

    Azure Data Lake is a scalable data storage and analytics service. The service is hosted in Azure, Microsoft's public cloud. == History == Azure Data Lake service was released on November 16, 2016. It is based on COSMOS, which is used to store and process data for applications such as Azure, AdCenter, Bing, MSN, Skype and Windows Live. COSMOS features a SQL-like query engine called SCOPE upon which U-SQL was built. == Storage == Data Lake Storage is a cloud service to store structured, semi-structured or unstructured data produced from applications including social networks, relational data, sensors, videos, web apps, mobile or desktop devices. A single account can store trillions of files where a single file can be greater than a petabyte in size. == Analytics == Data Lake Analytics is a parallel on-demand job service. The parallel processing system is based on Microsoft Dryad. Dryad can represent arbitrary Directed Acyclic Graphs (DAGs) of computation. Data Lake Analytics provides a distributed infrastructure that can dynamically allocate resources so that customers pay for only the services they use. The system uses Apache YARN, the part of Apache Hadoop which governs resource management across clusters. Data Lake Store supports any application that uses the Hadoop Distributed File System (HDFS) interface. == U-SQL == U-SQL is a query language for Data Lake Analytics parallel data transformation and processing programs. It combines SQL and C#: it is and an evolution of the declarative SQL language with native extensibility through user code written in C#. U-SQL uses C# data types and the C# expression language. == Retirement == In 2021, Microsoft announced the 2024 retirement of the original Azure Data Lake Storage, now called "Gen1". The related Azure Data Lake Analytics / U-SQL technologies are also being retired. Azure Data Lake Storage Gen2, an extension of Azure Storage, will continue. The suggested replacement technologies are Azure Synapse Analytics and Apache Spark.

    Read more →
  • Residual neural network

    Residual neural network

    A residual neural network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of that year. As a point of terminology, "residual connection" refers to the specific architectural motif of x ↦ f ( x ) + x {\displaystyle x\mapsto f(x)+x} , where f {\displaystyle f} is an arbitrary neural network module. The motif had been used previously (see §History for details). However, the publication of ResNet made it widely popular for feedforward networks, appearing in neural networks that are seemingly unrelated to ResNet. The residual connection stabilizes the training and convergence of deep neural networks with hundreds of layers, and is a common motif in deep neural networks, such as transformer models (e.g., BERT, and GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system. == Mathematics == === Residual connection === In a multilayer neural network model, consider a (non-residual) subnetwork with a certain number of stacked layers (e.g., 2 or 3). Let H ( x ; α ) {\displaystyle H(x;\alpha )} denote the subnetwork. Suppose H ∗ {\displaystyle H^{}} is the desired optimal output of this subnetwork. Residual learning simply adds x {\displaystyle x} directly to the output, such that the optimal learned output now becomes be H ∗ − x {\displaystyle H^{}-x} , which is interpreted as a "residual" with respect to x {\displaystyle x} . The operation of "adding x {\displaystyle x} " is implemented via a "skip connection" that performs an identity mapping to connect the input of the subnetwork with its output. This connection is referred to as a "residual connection" in later work. Let F ( x ; α ) = H ( x ; a ) + x {\displaystyle F(x;\alpha )=H(x;a)+x} . The function F {\displaystyle F} is often represented by matrix multiplication interlaced with activation functions and normalization operations (e.g., batch normalization or layer normalization). As a whole, one of these subnetworks is referred to as a "residual block". A deep residual network is constructed by simply stacking these blocks. Long short-term memory (LSTM) has a memory mechanism that serves as a residual connection. In an LSTM without a forget gate, an input x t {\displaystyle x_{t}} is processed by a function F {\displaystyle F} and added to a memory cell c t {\displaystyle c_{t}} , resulting in c t + 1 = c t + F ( x t ) {\displaystyle c_{t+1}=c_{t}+F(x_{t})} . An LSTM with a forget gate essentially functions as a highway network. To stabilize the variance of the layers' inputs, it is recommended to replace the residual connections x + f ( x ) {\displaystyle x+f(x)} with x / L + f ( x ) {\displaystyle x/L+f(x)} , where L {\displaystyle L} is the total number of residual layers. === Projection connection === If the function F {\displaystyle F} is of type F : R n → R m {\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{m}} where n ≠ m {\displaystyle n\neq m} , then F ( x ) + x {\displaystyle F(x)+x} is undefined. To handle this special case, a projection connection is used: y = F ( x ) + P ( x ) {\displaystyle y=F(x)+P(x)} where P {\displaystyle P} is typically a linear projection, defined by P ( x ) = M x {\displaystyle P(x)=Mx} where M {\displaystyle M} is a m × n {\displaystyle m\times n} matrix. The matrix is trained via backpropagation, as is any other parameter of the model. === Signal propagation === The introduction of identity mappings facilitates signal propagation in both forward and backward paths. ==== Forward propagation ==== If the output of the ℓ {\displaystyle \ell } -th residual block is the input to the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th residual block (assuming no activation function between blocks), then the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th input is: x ℓ + 1 = F ( x ℓ ) + x ℓ {\displaystyle x_{\ell +1}=F(x_{\ell })+x_{\ell }} Applying this formulation recursively, e.g.: x ℓ + 2 = F ( x ℓ + 1 ) + x ℓ + 1 = F ( x ℓ + 1 ) + F ( x ℓ ) + x ℓ {\displaystyle {\begin{aligned}x_{\ell +2}&=F(x_{\ell +1})+x_{\ell +1}\\&=F(x_{\ell +1})+F(x_{\ell })+x_{\ell }\end{aligned}}} yields the general relationship: x L = x ℓ + ∑ i = ℓ L − 1 F ( x i ) {\displaystyle x_{L}=x_{\ell }+\sum _{i=\ell }^{L-1}F(x_{i})} where L {\textstyle L} is the index of a residual block and ℓ {\textstyle \ell } is the index of some earlier block. This formulation suggests that there is always a signal that is directly sent from a shallower block ℓ {\textstyle \ell } to a deeper block L {\textstyle L} . ==== Backward propagation ==== The residual learning formulation provides the added benefit of mitigating the vanishing gradient problem to some extent. However, it is crucial to acknowledge that the vanishing gradient issue is not the root cause of the degradation problem, which is tackled through the use of normalization. To observe the effect of residual blocks on backpropagation, consider the partial derivative of a loss function E {\displaystyle {\mathcal {E}}} with respect to some residual block input x ℓ {\displaystyle x_{\ell }} . Using the equation above from forward propagation for a later residual block L > ℓ {\displaystyle L>\ell } : ∂ E ∂ x ℓ = ∂ E ∂ x L ∂ x L ∂ x ℓ = ∂ E ∂ x L ( 1 + ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) ) = ∂ E ∂ x L + ∂ E ∂ x L ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) {\displaystyle {\begin{aligned}{\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial x_{L}}{\partial x_{\ell }}}\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}\left(1+{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\right)\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}+{\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\end{aligned}}} This formulation suggests that the gradient computation of a shallower layer, ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} , always has a later term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} that is directly added. Even if the gradients of the F ( x i ) {\displaystyle F(x_{i})} terms are small, the total gradient ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} resists vanishing due to the added term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} . == Variants of residual blocks == === Basic block === A basic block is the simplest building block studied in the original ResNet. This block consists of two sequential 3x3 convolutional layers and a residual connection. The input and output dimensions of both layers are equal. === Bottleneck block === A bottleneck block consists of three sequential convolutional layers and a residual connection. The first layer in this block is a 1×1 convolution for dimension reduction (e.g., to 1/2 of the input dimension); the second layer performs a 3×3 convolution; the last layer is another 1×1 convolution for dimension restoration. The models of ResNet-50, ResNet-101, and ResNet-152 are all based on bottleneck blocks. === Pre-activation block === The pre-activation residual block applies activation functions before applying the residual function F {\displaystyle F} . Formally, the computation of a pre-activation residual block can be written as: x ℓ + 1 = F ( ϕ ( x ℓ ) ) + x ℓ {\displaystyle x_{\ell +1}=F(\phi (x_{\ell }))+x_{\ell }} where ϕ {\displaystyle \phi } can be any activation (e.g. ReLU) or normalization (e.g. LayerNorm) operation. This design reduces the number of non-identity mappings between residual blocks, and allows an identity mapping directly from the input to the output. This design was used to train models with 200 to over 1000 layers, and was found to consistently outperform variants where the residual path is not an identity function. The pre-activation ResNet with 200 layers took 3 weeks to train for ImageNet on 8 GPUs in 2016. Since GPT-2, transformer blocks have been mostly implemented as pre-activation blocks. This is often referred to as "pre-normalization" in the literature of transformer models. == Applications == Originally, ResNet was designed for computer vision. All transformer architectures include residual connections. Indeed, very deep transformers cannot be trained without them. The original ResNet paper made no claim on being inspired by biological systems. However, later research has related ResNet to biologically-plausible algorithms. A study published in Science in 2023 disclosed the complete connectome of an insect brain (specifically that of a fruit fly larva). This study discovered "multilayer shortcuts" that resemble the skip connections in artificial neural networks, including ResNets. == History == === Previous work === Residual connections were noticed in neu

    Read more →
  • Latent semantic analysis

    Latent semantic analysis

    Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). == Overview == === Occurrence matrix === LSA can use a document-term matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): the weight of an element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance. This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used. === Rank lowering === After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There could be various reasons for these approximations: The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low rank matrix is interpreted as an approximation (a "least and necessary evil"). The original term-document matrix is presumed noisy: for example, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a de-noisified matrix (a better matrix than the original). The original term-document matrix is presumed overly sparse relative to the "true" term-document matrix. That is, the original matrix lists only the words actually in each document, whereas we might be interested in all words related to each document—generally a much larger set due to synonymy. The consequence of the rank lowering is that some dimensions are combined and depend on more than one term: {(car), (truck), (flower)} → {(1.3452 car + 0.2828 truck), (flower)} This mitigates the problem of identifying synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also partially mitigates the problem with polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense. === Derivation === Let X {\displaystyle X} be a matrix where element ( i , j ) {\displaystyle (i,j)} describes the occurrence of term i {\displaystyle i} in document j {\displaystyle j} (this can be, for example, the frequency). X {\displaystyle X} will look like this: d j ↓ t i T → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] {\displaystyle {\begin{matrix}&{\textbf {d}}_{j}\\&\downarrow \\{\textbf {t}}_{i}^{T}\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}\end{matrix}}} Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document: t i T = [ x i , 1 … x i , j … x i , n ] {\displaystyle {\textbf {t}}_{i}^{T}={\begin{bmatrix}x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\end{bmatrix}}} Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term: d j = [ x 1 , j ⋮ x i , j ⋮ x m , j ] {\displaystyle {\textbf {d}}_{j}={\begin{bmatrix}x_{1,j}\\\vdots \\x_{i,j}\\\vdots \\x_{m,j}\\\end{bmatrix}}} Now the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} between two term vectors gives the correlation between the terms over the set of documents. The matrix product X X T {\displaystyle XX^{T}} contains all these dot products. Element ( i , p ) {\displaystyle (i,p)} (which is equal to element ( p , i ) {\displaystyle (p,i)} ) contains the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} ( = t p T t i {\displaystyle ={\textbf {t}}_{p}^{T}{\textbf {t}}_{i}} ). Likewise, the matrix X T X {\displaystyle X^{T}X} contains the dot products between all the document vectors, giving their correlation over the terms: d j T d q = d q T d j {\displaystyle {\textbf {d}}_{j}^{T}{\textbf {d}}_{q}={\textbf {d}}_{q}^{T}{\textbf {d}}_{j}} . Now, from the theory of linear algebra, there exists a decomposition of X {\displaystyle X} such that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices and Σ {\displaystyle \Sigma } is a diagonal matrix. This is called a singular value decomposition (SVD): X = U Σ V T {\displaystyle {\begin{matrix}X=U\Sigma V^{T}\end{matrix}}} The matrix products giving us the term and document correlations then become X X T = ( U Σ V T ) ( U Σ V T ) T = ( U Σ V T ) ( V T T Σ T U T ) = U Σ V T V Σ T U T = U Σ Σ T U T X T X = ( U Σ V T ) T ( U Σ V T ) = ( V T T Σ T U T ) ( U Σ V T ) = V Σ T U T U Σ V T = V Σ T Σ V T {\displaystyle {\begin{matrix}XX^{T}&=&(U\Sigma V^{T})(U\Sigma V^{T})^{T}=(U\Sigma V^{T})(V^{T^{T}}\Sigma ^{T}U^{T})=U\Sigma V^{T}V\Sigma ^{T}U^{T}=U\Sigma \Sigma ^{T}U^{T}\\X^{T}X&=&(U\Sigma V^{T})^{T}(U\Sigma V^{T})=(V^{T^{T}}\Sigma ^{T}U^{T})(U\Sigma V^{T})=V\Sigma ^{T}U^{T}U\Sigma V^{T}=V\Sigma ^{T}\Sigma V^{T}\end{matrix}}} Since Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} and Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } are diagonal we see that U {\displaystyle U} must contain the eigenvectors of X X T {\displaystyle XX^{T}} , while V {\displaystyle V} must be the eigenvectors of X T X {\displaystyle X^{T}X} . Both products have the same non-zero eigenvalues, given by the non-zero entries of Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} , or equally, by the non-zero entries of Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } . Now the decomposition looks like this: X U Σ V T ( d j ) ( d ^ j ) ↓ ↓ ( t i T ) → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] = ( t ^ i T ) → [ [ u 1 ] … [ u l ] ] ⋅ [ σ 1 … 0 ⋮ ⋱ ⋮ 0 … σ l ] ⋅ [ [ v 1 ] ⋮ [ v l ] ] {\displaystyle {\begin{matrix}&X&&&U&&\Sigma &&V^{T}\\&({\textbf {d}}_{j})&&&&&&&({\hat {\textbf {d}}}_{j})\\&\downarrow &&&&&&&\downarrow \\({\textbf {t}}_{i}^{T})\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}&=&({\hat {\textbf {t}}}_{i}^{T})\rightarrow &{\begin{bmatrix}{\begin{bmatrix}\,\\\,\\{\textbf {u}}_{1}\\\,\\\,\end{bmatrix}}\dots {\begin{bmatrix}\,\\\,\\{\textbf {u}}_{l}\\\,\\\,\end{bmatrix}}\end{bmatrix}}&\cdot &{\begin{bmatrix}\sigma _{1}&\dots &0\\\vdots &\ddots &\vdots \\0&\dots &\sigma _{l}\\\end{bmatrix}}&\cdot &{\begin{bmatrix}{\begin{bmatrix}&&{\textbf {v}}_{1}&&\end{bmatrix}}\\\vdots \\{\begin{bmatrix}&&{\textbf {v}}_{l}&&\end{bmatrix}}\end{bmatrix}}\end{matrix}}} The values σ 1 , … , σ l {\displaystyle \sigma _{1},\dots ,\sigma _{l}} are called the singular values, and u 1 , … , u l {\displaystyle u_{1},\dots ,u_{l}} and v 1 , … , v l {\displaystyle v_{1},\dots ,v_{l}} the left and right singular vectors. Notice the only part of U {\displaystyle U} that contributes to t i {\displaystyle {\textbf {t}}_{i}} is the i 'th {\displaystyle i{\textrm {'th}}} row. Let this row vector be called t ^ i T {\displaystyle {\hat {\textrm {t}}}_{i}^{T}} . Likewise, the only part of V T {\displaystyle V^{T}} that contributes to d j {\displaystyle {\textbf {d}}_{j}} is the j 'th {\displaystyle j{\textrm {'th}}} column, d ^ j {\displaystyle {\hat {\textrm {d}}}_{j}} . These are not the eigenvectors, but depend on all the eigenvectors. I

    Read more →
  • ViBe

    ViBe

    ViBe is a background subtraction algorithm which has been presented at the IEEE ICASSP 2009 conference and was refined in later publications. More precisely, it is a software module for extracting background information from moving images. It has been developed by Oliver Barnich and Marc Van Droogenbroeck of the Montefiore Institute, University of Liège, Belgium. ViBe is patented: the patent covers various aspects such as stochastic replacement, spatial diffusion, and non-chronological handling. ViBe is written in the programming language C, and has been implemented on CPU, GPU and FPGA. == Technical description == Source: === Pixel model and classification process === Many advanced techniques are used to provide an estimate of the temporal probability density function (pdf) of a pixel x. ViBe's approach is different, as it imposes the influence of a value in the polychromatic space to be limited to the local neighborhood. In practice, ViBe does not estimate the pdf, but uses a set of previously observed sample values as a pixel model. To classify a value pt(x), it is compared to its closest values among the set of samples. === Model update: Sample values lifespan policy === ViBe ensures a smooth exponentially decaying lifespan for the sample values that constitute the pixel models. This makes ViBe able to successfully deal with concomitant events with a single model of a reasonable size for each pixel. This is achieved by choosing, randomly, which sample to replace when updating a pixel model. Once the sample to be discarded has been chosen, the new value replaces the discarded sample. The pixel model that would result from the update of a given pixel model with a given pixel sample cannot be predicted since the value to be discarded is chosen at random. === Model update: Spatial Consistency === To ensure the spatial consistency of the whole image model and handle practical situations such as small camera movements or slowly evolving background objects, ViBe uses a technique similar to that developed for the updating process in which it chooses at random and update a pixel model in the neighborhood of the current pixel. By denoting NG(x) and p(x) respectively the spatial neighborhood of a pixel x and its value, and assuming that it was decided to update the set of samples of x by inserting p(x), then ViBe also use this value p(x) to update the set of samples of one of the pixels in the neighborhood NG(x), chosen at random. As a result, ViBe is able to produce spatially coherent results directly without the use of any post-processing method. === Model initialization === Although the model could easily recover from any type of initialization, for example by choosing a set of random values, it is convenient to get an accurate background estimate as soon as possible. Ideally a segmentation algorithm would like to be able to segment the video sequences starting from the second frame, the first frame being used to initialize the model. Since no temporal information is available prior to the second frame, ViBe populates the pixel models with values found in the spatial neighborhood of each pixel; more precisely, it initializes the background model with values taken randomly in each pixel neighborhood of the first frame. The background estimate is therefore valid starting from the second frame of a video sequence.

    Read more →
  • Color gradient

    Color gradient

    In color science, a color gradient (also known as a color ramp or a color progression) specifies a range of position-dependent colors, usually used to fill a region. In assigning colors to a set of values, a gradient is a continuous colormap, a type of color scheme. In computer graphics, the term swatch has come to mean a palette of active colors. == Definitions == Color gradient is a set of colors arranged in a linear order (ordered) A continuous colormap is a curve through a colorspace === Strict definition === A colormap is a function which associate a real value r with point c in color space C {\displaystyle C} f : [ r m i n , r m a x ] ⊂ R → C {\displaystyle f:[r_{min},r_{max}]\subset \mathbf {R} \to C} which is defined by: a colorspace C an increasing sequence of sampling points r 0 < . . . < r m ∈ [ r m i n , r m a x ] {\displaystyle r_{0}<... Read more →

  • Night Sky (app)

    Night Sky (app)

    Night Sky (app) is an application developed and published by indie studio iCandi Apps Ltd. from the UK. Night Sky is a stargazing reference app, where the user can explore a virtual representation of the night sky to identify stars, planets, constellations and satellites. The app is developed specifically for iOS, tvOS and watchOS devices. Night Sky was first released on November 1, 2011 for iOS, and has had multiple updates since launch. Night Sky was mentioned in the September 2016 Apple Keynote during the Apple Watch Series 2 announcement. In October 2016, Night Sky was featured as the Free App of The Week on the Apple App Store. == Reception == Night Sky was featured in Apple's 'Best of 2012' and has also been pre-installed onto iPads in Apple retail stores worldwide.

    Read more →
  • Glossary of machine vision

    Glossary of machine vision

    The following are common definitions related to the machine vision field. General related fields Machine vision Computer vision Image processing Signal processing == 0-9 == 1394. FireWire is Apple Inc.'s brand name for the IEEE 1394 interface. It is also known as i.Link (Sony's name) or IEEE 1394 (although the 1394 standard also defines a backplane interface). It is a personal computer (and digital audio/digital video) serial bus interface standard, offering high-speed communications and isochronous real-time data services. 1D. One-dimensional. 2D computer graphics. The computer-based generation of digital images—mostly from two-dimensional models (such as 2D geometric models, text, and digital images) and by techniques specific to them. 3D computer graphics. 3D computer graphics are different from 2D computer graphics in that a three-dimensional representation of geometric data is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be for later display or for real-time viewing. Despite these differences, 3D computer graphics rely on many of the same algorithms as 2D computer vector graphics in the wire frame model and 2D computer raster graphics in the final rendered display. In computer graphics software, the distinction between 2D and 3D is occasionally blurred; 2D applications may use 3D techniques to achieve effects such as lighting, and primarily 3D may use 2D rendering techniques. 3D scanner. This is a device that analyzes a real-world object or environment to collect data on its shape and possibly color. The collected data can then be used to construct digital, three dimensional models useful for a wide variety of applications. == A == Aberration. Optically, defocus refers to a translation along the optical axis away from the plane or surface of best focus. In general, defocus reduces the sharpness and contrast of the image. What should be sharp, high-contrast edges in a scene become gradual transitions. Algebraic distance or algebraic error. The algebraic distance from a point xi to a curve or surface defined by f ( x , β ) = 0 {\displaystyle f(x,\beta )=0} is the value of f ( x i , β ) {\displaystyle f(x_{i},\beta )} , i.e. the residual in the least squares problem with data point (xi, 0) and model function f. This term is mainly used in computer vision.[1][2] Aperture. In context of photography or machine vision, aperture refers to the diameter of the aperture stop of a photographic lens. The aperture stop can be adjusted to control the amount of light reaching the film or image sensor. aspect ratio (image). The aspect ratio of an image is its displayed width divided by its height (usually expressed as "x:y"). Angular resolution. Describes the resolving power of any image forming device such as an optical or radio telescope, a microscope, a camera, or an eye. Automated optical inspection. == B == Barcode. A barcode (also bar code) is a machine-readable representation of information in a visual format on a surface. Blob discovery. Inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. These blobs frequently represent optical targets for machining, robotic capture, or manufacturing failure. Bitmap. A raster graphics image, digital image, or bitmap, is a data file or structure representing a generally rectangular grid of pixels, or points of color, on a computer monitor, paper, or other display device. == C == Camera. A camera is a device used to take pictures, either singly or in sequence. A camera that takes pictures singly is sometimes called a photo camera to distinguish it from a video camera. Camera Link. Camera Link is a serial communication protocol designed for computer vision applications based on the National Semiconductor interface Channel-link. It was designed for the purpose of standardizing scientific and industrial video products including cameras, cables and frame grabbers. The standard is maintained and administered by the Automated Imaging Association, or AIA, the global machine vision industry's trade group. Charge-coupled device. A charge-coupled device (CCD) is a sensor for recording images, consisting of an integrated circuit containing an array of linked, or coupled, capacitors. CCD sensors and cameras tend to be more sensitive, less noisy, and more expensive than CMOS sensors and cameras. CIE 1931 Color Space. In the study of the perception of color, one of the first mathematically defined color spaces was the CIE XYZ color space (also known as CIE 1931 color space), created by the International Commission on Illumination (CIE) in 1931. CMOS. CMOS ("see-moss")stands for complementary metal-oxide semiconductor, is a major class of integrated circuits. CMOS imaging sensors for machine vision are cheaper than CCD sensors but more noisy. CoaXPress. CoaXPress (CXP) is an asymmetric high speed serial communication standard over coaxial cable. CoaXPress combines high speed image data, low speed camera control and power over a single coaxial cable. The standard is maintained by JIIA, the Japan Industrial Imaging Association. Color. The perception of the frequency (or wavelength) of light, and can be compared to how pitch (or a musical note) is the perception of the frequency or wavelength of sound. Color blindness. Also known as color vision deficiency, in humans is the inability to perceive differences between some or all colors that other people can distinguish Color temperature. "White light" is commonly described by its color temperature. A traditional incandescent light source's color temperature is determined by comparing its hue with a theoretical, heated black-body radiator. The lamp's color temperature is the temperature in kelvins at which the heated black-body radiator matches the hue of the lamp. Color vision. CV is the capacity of an organism or machine to distinguish objects based on the wavelengths (or frequencies) of the light they reflect or emit. computer vision. The study and application of methods which allow computers to "understand" image content. Contrast. In visual perception, contrast is the difference in visual properties that makes an object (or its representation in an image) distinguishable from other objects and the background. C-Mount. Standardized adapter for optical lenses on CCD - cameras. C-Mount lenses have a back focal distance 17.5 mm vs. 12.5 mm for "CS-mount" lenses. A C-Mount lens can be used on a CS-Mount camera through the use of a 5 mm extension adapter. C-mount is a 1" diameter, 32 threads per inch mounting thread (1"-32UN-2A.) CS-Mount. Same as C-Mount but the focal point is 5 mm shorter. A CS-Mount lens will not work on a C-Mount camera. CS-mount is a 1" diameter, 32 threads per inch mounting thread. == D == Data matrix. A two dimensional Barcode. Depth of field. In optics, particularly photography and machine vision, the depth of field (DOF) is the distance in front of and behind the subject which appears to be in focus. Depth perception. DP is the visual ability to perceive the world in three dimensions. It is a trait common to many higher animals. Depth perception allows the beholder to accurately gauge the distance to an object. Diaphragm. In optics, a diaphragm is a thin opaque structure with an opening (aperture) at its centre. The role of the diaphragm is to stop the passage of light, except for the light passing through the aperture. == E == Edge detection. ED marks the points in a digital image at which the luminous intensity changes sharply. It also marks the points of luminous intensity changes of an object or spatial-taxon silhouette. Electromagnetic interference. Radio Frequency Interference (RFI) is electromagnetic radiation which is emitted by electrical circuits carrying rapidly changing signals, as a by-product of their normal operation, and which causes unwanted signals (interference or noise) to be induced in other circuits. == F == FireWire. FireWire (also known as i. Link or IEEE 1394) is a personal computer (and digital audio/video) serial bus interface standard, offering high-speed communications. It is often used as an interface for industrial cameras. Fixed-pattern noise. Flat-field correction. Frame grabber. An electronic device that captures individual, digital still frames from an analog video signal or a digital video stream. Fringe Projection Technique. 3D data acquisition technique employing projector displaying fringe pattern on a surface of measured piece, and one or more cameras recording image(s). Field of view. The field of view (FOV) is the part which can be seen by the machine vision system at one moment. The field of view depends from the lens of the system and from the working distance between object and camera. Focus. An image, or image point or region, is said to be in focus if light from object points is converged about as well as possible in the image; conversely, it is out of focus if light is not w

    Read more →