In formal language theory, the Kleene star (or Kleene operator or Kleene closure) refers to two related unary operations, that can be applied either to an alphabet of symbols or to a formal language, a set of strings (finite sequences of symbols). The Kleene star operator on an alphabet V generates the set V of all finite-length strings over V, that is, finite sequences whose elements belong to V; in mathematics, it is more commonly known as the free monoid construction. The Kleene star operator on a language L generates another language L, the set of all strings that can be obtained as a concatenation of zero or more members of L. In both cases, repetitions are allowed. The Kleene star operators are named after American mathematician Stephen Cole Kleene, who first introduced and widely used it to characterize automata for regular expressions. == Of an alphabet == Given an alphabet V {\displaystyle V} , define V 0 = { ε } {\displaystyle V^{0}=\{\varepsilon \}} (the set consists only of the empty string), V 1 = V , {\displaystyle V^{1}=V,} and define recursively the set V i + 1 = { w v : w ∈ V i and v ∈ V } {\displaystyle V^{i+1}=\{wv:w\in V^{i}{\text{ and }}v\in V\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by appending the single character v {\displaystyle v} to the end of w {\displaystyle w} . Here, V i {\displaystyle V^{i}} can be understood to be the set of all strings of length exactly i {\displaystyle i} , with characters from V {\displaystyle V} . The definition of Kleene star on V {\displaystyle V} is V ∗ = ⋃ i ≥ 0 V i = V 0 ∪ V 1 ∪ V 2 ∪ V 3 ∪ V 4 ∪ ⋯ . {\displaystyle V^{}=\bigcup _{i\geq 0}V^{i}=V^{0}\cup V^{1}\cup V^{2}\cup V^{3}\cup V^{4}\cup \cdots .} == Of a language == Given a language L {\displaystyle L} (any finite or infinite set of strings), define L 0 = { ε } {\displaystyle L^{0}=\{\varepsilon \}} (the language consisting only of the empty string), L 1 = L , {\displaystyle L^{1}=L,} and define recursively the set L i + 1 = { w v : w ∈ L i and v ∈ L } {\displaystyle L^{i+1}=\{wv:w\in L^{i}{\text{ and }}v\in L\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by concatenating w {\displaystyle w} and v {\displaystyle v} . Here, L i {\displaystyle L^{i}} can be understood to be the set of all strings that can be obtained by concatenating exactly i {\displaystyle i} strings from L {\displaystyle L} , allowing repetitions. The definition of Kleene star on L {\displaystyle L} is L ∗ = ⋃ i ≥ 0 L i = L 0 ∪ L 1 ∪ L 2 ∪ L 3 ∪ L 4 ∪ ⋯ . {\displaystyle L^{}=\bigcup _{i\geq 0}L^{i}=L^{0}\cup L^{1}\cup L^{2}\cup L^{3}\cup L^{4}\cup \cdots .} == Kleene plus == In some formal language studies, (e.g. AFL theory) a variation on the Kleene star operation called the Kleene plus is used. The Kleene plus omits the V 0 {\displaystyle V^{0}} or L 0 {\displaystyle L^{0}} term in the above unions. In other words, the Kleene plus on V {\displaystyle V} is V + = ⋃ i ≥ 1 V i = V 1 ∪ V 2 ∪ V 3 ∪ ⋯ , {\displaystyle V^{+}=\bigcup _{i\geq 1}V^{i}=V^{1}\cup V^{2}\cup V^{3}\cup \cdots ,} or V + = V ∗ V . {\displaystyle V^{+}=V^{}V.} == Examples == Example of Kleene star applied to a set of strings: {"ab","c"} = { ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", "abcc", "cabab", "cabc", "ccab", "ccc", ...}. Example of Kleene star applied to a set of strings without the prefix property: {"a","ab","b"} = { ε, "a", "ab", "b", "aa", "aab", "aba", "abab", "abb", "ba", "bab", "bb", ...};In this example, the string "aab" can be obtained in two different ways. The Sardinas-Patterson algorithm can be used to check for a given V whether any member of V can be obtained in more than one way. Example of Kleene and Kleene plus applied to a set of characters (following the C programming language convention where a character is denoted by single quotes and a string is denoted by double quotes): {'a', 'b', 'c'} = { ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. {'a', 'b', 'c'}+ = { "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. == Properties == If V {\displaystyle V} is any finite or countably infinite set of characters, then V ∗ {\displaystyle V^{}} is a countably infinite set. As a result, each formal language over a finite or countably infinite alphabet Σ {\displaystyle \Sigma } is countable, since it is a subset of the countably infinite set Σ ∗ {\displaystyle \Sigma ^{}} . ( L ∗ ) ∗ = L ∗ {\displaystyle (L^{})^{}=L^{}} , which means that the Kleene star operator is an idempotent unary operator, as ( L ∗ ) i = L ∗ {\displaystyle (L^{})^{i}=L^{}} for every i ≥ 1 {\displaystyle i\geq 1} . V ∗ = { ε } {\displaystyle V^{}=\{\varepsilon \}} , if V {\displaystyle V} is the empty set ∅. For the version of the Kleene star operator on languages, L ∗ = { ε } {\displaystyle L^{}=\{\varepsilon \}} when L {\displaystyle L} is either the empty set ∅ or the singleton set { ε } {\displaystyle \{\varepsilon \}} . == Generalization == Strings form a monoid with concatenation as the binary operation and ε the identity element. In addition to strings, the Kleene star is defined for any monoid. More precisely, let (M, ⋅) be a monoid, and S ⊆ M. Then S is the smallest submonoid of M containing S; that is, S contains the neutral element of M, the set S, and is such that if x,y ∈ S, then x⋅y ∈ S. Furthermore, the Kleene star is generalized by including the -operation (and the union) in the algebraic structure itself by the notion of complete star semiring.
INDECT
INDECT is a research project in the area of intelligent security systems performed by several European universities since 2009 and funded by the European Union. The purpose of the project is to involve European scientists and researchers in the development of solutions to and tools for automatic threat detection through e.g. processing of CCTV camera data streams, standardization of video sequence quality for user applications, threat detection in computer networks as well as data and privacy protection. The area of research, applied methods, and techniques are described in the public deliverables which are available to the public on the project's website. Practically, all information related to the research is public. Only documents that comprise information related to financial data or information that could negatively influence the competitiveness and law enforcement capabilities of parties involved in the project are not published. This follows regulations and practices applied in EU research projects. == Application and target users == The main end-user of INDECT solutions are police forces and security services. The principle of operation of the project is detecting threats and identifying sources of threats, without monitoring and searching for particular citizens or groups of citizens. Then, the system operator (i.e. police officer) decides whether an intervention of services responsible for public security are required or not. Further investigation eventually leading to persons related to threats is performed, preserving the presumption of innocence, based on existing procedures already used by police services and prosecutors. As it can be found in the project deliverables, INDECT does not involve storage of personal data (such as names, addresses, identity document numbers, etc.). A similar, behavior-based surveillance program was SAMURAI (Suspicious and Abnormal behavior Monitoring Using a netwoRk of cAmeras & sensors for sItuation awareness enhancement). == Expected results == The main expected results of the INDECT project are: Trial of intelligent analysis of video and audio data for threat detection in urban environments Creation of tools and technology for privacy and data protection during storage and transmission of information using quantum cryptography and new methods of digital watermarking Performing computer-aided detection of threats and targeted crimes in Internet resources with privacy-protecting solutions Construction of a search engine for rapid semantic search based on watermarking of content related to child pornography and human organ trafficking Implementation of a distributed computer system that is capable of effective intelligent processing == Controversy == Some media and other sources accuse INDECT of privacy abuse, collecting personal data, and keeping information from the public. Consequently, these issues have been commented and discussed by some Members of the European Parliament. As seen in the project's documentation, INDECT does not involve mobile phone tracking or call interception. The rumors about testing INDECT during 2012 UEFA European Football Championship also turned out to be false. The mid-term review of the Seventh Framework Programme to the European Parliament strongly urges the European Commission to immediately make all documents available and to define a clear and strict mandate for the research goal, the application, and the end users of INDECT, and stresses a thorough investigation of the possible impact on fundamental rights. Nevertheless, according to Mr. Paweł Kowal, MEP, the project had the ethical review on 15 March 2011 in Brussels with the participation of ethics experts from Austria, France, Netherlands, Germany and Great Britain.
Agent Ruby
Agent Ruby (1998–2002) by Lynn Hershman Leeson is an interactive, multiuser work using artificial intelligence. == Description == On Agent Ruby's website, "Agent Ruby's Edream Portal," a female face moves her eyes and lips. Ruby, named from Hershman Leeson's own film, Teknolust, answers questions and often responds that she needs a better algorithm to answer questions not within her database. The work, created with AI, explores relationships between real and virtual worlds. Hershman Leeson had created an earlier version of Ruby, CyberRoberta, which was a custom-made doll with webcam eyes that interacted with the internet. The work in a gallery provides a screen and a sign inviting gallery-goers to "Chat with Ruby." == Artificial intelligence == In 2015 when Agent Ruby was exhibited at the gallery Modern Art Oxford, a review in Aesthetica Magazine described it as an artificial intelligence agent. A review in New Scientist noted that "Ruby is a fast learner, but perhaps not a natural conversationalist." A 2024 list of "25 Essential AI Artworks" published by ARTnews wrote that while "Agent Ruby's capabilities seem limited by today's standards," it was extensive for its day. == Publications and exhibitions == Agent Ruby was commissioned and displayed at the San Francisco Museum of Modern Art, Modern Art Oxford, and the ZKM Center for Art and Media in Karlsruhe, Germany. The San Francisco Museum of Modern Art (SFMOMA) presented Lynn Hershman Leeson: The Agent Ruby Files, March 30 through June 2, 2013 which presented the project server's archive of user conversations over the 12 years of exhibitions.
Superquadrics
In mathematics, the superquadrics or super-quadrics (also superquadratics) are a family of geometric shapes defined by formulas that resemble those of ellipsoids and other quadrics, except that the squaring operations are replaced by arbitrary powers. They can be seen as the three-dimensional relatives of the superellipses. The term may refer to the solid object or to its surface, depending on the context. The equations below specify the surface; the solid is specified by replacing the equality signs by less-than-or-equal signs. The superquadrics include many shapes that resemble cubes, octahedra, cylinders, lozenges and spindles, with rounded or sharp corners. Because of their flexibility and relative simplicity, they are popular geometric modeling tools, especially in computer graphics. It becomes an important geometric primitive widely used in computer vision, robotics, and physical simulation. Some authors, such as Alan Barr, define "superquadrics" as including both the superellipsoids and the supertoroids. In modern computer vision literatures, superquadrics and superellipsoids are used interchangeably, since superellipsoids are the most representative and widely utilized shape among all the superquadrics. Comprehensive coverage of geometrical properties of superquadrics and methods of their recovery from range images and point clouds are covered in several computer vision literatures. == Formulas == === Implicit equation === The surface of the basic superquadric is given by | x | r + | y | s + | z | t = 1 {\displaystyle \left|x\right|^{r}+\left|y\right|^{s}+\left|z\right|^{t}=1} where r, s, and t are positive real numbers that determine the main features of the superquadric. Namely: less than 1: a pointy octahedron modified to have concave faces and sharp edges. exactly 1: a regular octahedron. between 1 and 2: an octahedron modified to have convex faces, blunt edges and blunt corners. exactly 2: a sphere greater than 2: a cube modified to have rounded edges and corners. infinite (in the limit): a cube Each exponent can be varied independently to obtain combined shapes. For example, if r=s=2, and t=4, one obtains a solid of revolution which resembles an ellipsoid with round cross-section but flattened ends. This formula is a special case of the superellipsoid's formula if (and only if) r = s. If any exponent is allowed to be negative, the shape extends to infinity. Such shapes are sometimes called super-hyperboloids. The basic shape above spans from -1 to +1 along each coordinate axis. The general superquadric is the result of scaling this basic shape by different amounts A, B, C along each axis. Its general equation is | x A | r + | y B | s + | z C | t = 1. {\displaystyle \left|{\frac {x}{A}}\right|^{r}+\left|{\frac {y}{B}}\right|^{s}+\left|{\frac {z}{C}}\right|^{t}=1.} === Parametric description === Parametric equations in terms of surface parameters u and v (equivalent to longitude and latitude if m equals 2) are x ( u , v ) = A g ( v , 2 r ) g ( u , 2 r ) y ( u , v ) = B g ( v , 2 s ) f ( u , 2 s ) z ( u , v ) = C f ( v , 2 t ) − π 2 ≤ v ≤ π 2 , − π ≤ u < π , {\displaystyle {\begin{aligned}x(u,v)&{}=Ag\left(v,{\frac {2}{r}}\right)g\left(u,{\frac {2}{r}}\right)\\y(u,v)&{}=Bg\left(v,{\frac {2}{s}}\right)f\left(u,{\frac {2}{s}}\right)\\z(u,v)&{}=Cf\left(v,{\frac {2}{t}}\right)\\&-{\frac {\pi }{2}}\leq v\leq {\frac {\pi }{2}},\quad -\pi \leq u<\pi ,\end{aligned}}} where the auxiliary functions are f ( ω , m ) = sgn ( sin ω ) | sin ω | m g ( ω , m ) = sgn ( cos ω ) | cos ω | m {\displaystyle {\begin{aligned}f(\omega ,m)&{}=\operatorname {sgn}(\sin \omega )\left|\sin \omega \right|^{m}\\g(\omega ,m)&{}=\operatorname {sgn}(\cos \omega )\left|\cos \omega \right|^{m}\end{aligned}}} and the sign function sgn(x) is sgn ( x ) = { − 1 , x < 0 0 , x = 0 + 1 , x > 0. {\displaystyle \operatorname {sgn}(x)={\begin{cases}-1,&x<0\\0,&x=0\\+1,&x>0.\end{cases}}} === Spherical product === Barr introduces the spherical product which given two plane curves produces a 3D surface. If f ( μ ) = ( f 1 ( μ ) f 2 ( μ ) ) , g ( ν ) = ( g 1 ( ν ) g 2 ( ν ) ) {\displaystyle f(\mu )={\begin{pmatrix}f_{1}(\mu )\\f_{2}(\mu )\end{pmatrix}},\quad g(\nu )={\begin{pmatrix}g_{1}(\nu )\\g_{2}(\nu )\end{pmatrix}}} are two plane curves then the spherical product is h ( μ , ν ) = f ( μ ) ⊗ g ( ν ) = ( f 1 ( μ ) g 1 ( ν ) f 1 ( μ ) g 2 ( ν ) f 2 ( μ ) ) {\displaystyle h(\mu ,\nu )=f(\mu )\otimes g(\nu )={\begin{pmatrix}f_{1}(\mu )\ g_{1}(\nu )\\f_{1}(\mu )\ g_{2}(\nu )\\f_{2}(\mu )\end{pmatrix}}} This is similar to the typical parametric equation of a sphere: x = x 0 + r sin θ cos φ y = y 0 + r sin θ sin φ ( 0 ≤ θ ≤ π , 0 ≤ φ < 2 π ) z = z 0 + r cos θ {\displaystyle {\begin{aligned}x&=x_{0}+r\sin \theta \;\cos \varphi \\y&=y_{0}+r\sin \theta \;\sin \varphi \qquad (0\leq \theta \leq \pi ,\;0\leq \varphi <2\pi )\\z&=z_{0}+r\cos \theta \end{aligned}}} which give rise to the name spherical product. Barr uses the spherical product to define quadric surfaces, like ellipsoids, and hyperboloids as well as the torus, superellipsoid, superquadric hyperboloids of one and two sheets, and supertoroids. == Plotting code == The following GNU Octave code generates a mesh approximation of a superquadric:
Machine vision
Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to many technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science. It attempts to integrate existing technologies in new ways and apply them to solve real world problems. The term is the prevalent one for these functions in industrial automation environments but is also used for these functions in other environment vehicle guidance. The overall machine vision process includes planning the details of the requirements and project, and then creating a solution. During run-time, the process starts with imaging, followed by automated analysis of the image and extraction of the required information. == Definition == Definitions of the term "Machine vision" vary, but all include the technology and methods used to extract information from an image on an automated basis, as opposed to image processing, where the output is another image. The information extracted can be a simple good-part/bad-part signal, or more a complex set of data such as the identity, position and orientation of each object in an image. The information can be used for such applications as automatic inspection and robot and process guidance in industry, for security monitoring and vehicle guidance. This field encompasses a large number of technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision is practically the only term used for these functions in industrial automation applications; the term is less universal for these functions in other environments such as security and vehicle guidance. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of basic computer science; machine vision attempts to integrate existing technologies in new ways and apply them to solve real world problems in a way that meets the requirements of industrial automation and similar application areas. The term is also used in a broader sense by trade shows and trade groups such as the Automated Imaging Association and the European Machine Vision Association. This broader definition also encompasses products and applications most often associated with image processing. The primary uses for machine vision are automatic inspection and industrial robot/process guidance. In more recent times the terms computer vision and machine vision have converged to a greater degree. See glossary of machine vision. == Imaging based automatic inspection and sorting == The primary uses for machine vision are imaging-based automatic inspection and sorting and robot guidance.; in this section the former is abbreviated as "automatic inspection". The overall process includes planning the details of the requirements and project, and then creating a solution. This section describes the technical process that occurs during the operation of the solution. === Methods and sequence of operation === The first step in the automatic inspection sequence of operation is acquisition of an image, typically using cameras, lenses, and lighting that has been designed to provide the differentiation required by subsequent processing. MV software packages and programs developed in them then employ various digital image processing techniques to extract the required information, and often make decisions (such as pass/fail) based on the extracted information. === Equipment === The components of an automatic inspection system usually include lighting, a camera or other imager, a processor, software, and output devices. === Imaging === The imaging device (e.g. camera) can either be separate from the main image processing unit or combined with it in which case the combination is generally called a smart camera or smart sensor. Inclusion of the full processing function into the same enclosure as the camera is often referred to as embedded processing. When separated, the connection may be made to specialized intermediate hardware, a custom processing appliance, or a frame grabber within a computer using either an analog or standardized digital interface (Camera Link, CoaXPress). MV implementations also use digital cameras capable of direct connections (without a framegrabber) to a computer via FireWire, USB or Gigabit Ethernet interfaces. While conventional (2D visible light) imaging is most commonly used in MV, alternatives include multispectral imaging, hyperspectral imaging, imaging various infrared bands, line scan imaging, 3D imaging of surfaces and X-ray imaging. Key differentiations within MV 2D visible light imaging are monochromatic vs. color, frame rate, resolution, and whether or not the imaging process is simultaneous over the entire image, making it suitable for moving processes. Though the vast majority of machine vision applications are solved using two-dimensional imaging, machine vision applications utilizing 3D imaging are a growing niche within the industry. The most commonly used method for 3D imaging is scanning based triangulation which utilizes motion of the product or image during the imaging process. A laser is projected onto the surfaces of an object. In machine vision this is accomplished with a scanning motion, either by moving the workpiece, or by moving the camera & laser imaging system. The line is viewed by a camera from a different angle; the deviation of the line represents shape variations. Lines from multiple scans are assembled into a depth map or point cloud. Stereoscopic vision is used in special cases involving unique features present in both views of a pair of cameras. Other 3D methods used for machine vision are time of flight and grid based. One method is grid array based systems using pseudorandom structured light system as employed by the Microsoft Kinect system circa 2012. === Image processing === After an image is acquired, it is processed. Central processing functions are generally done by a CPU, a GPU, a FPGA or a combination of these. Deep learning training and inference impose higher processing performance requirements. Multiple stages of processing are generally used in a sequence that ends up as a desired result. A typical sequence might start with tools such as filters which modify the image, followed by extraction of objects, then extraction (e.g. measurements, reading of codes) of data from those objects, followed by communicating that data, or comparing it against target values to create and communicate "pass/fail" results. Machine vision image processing methods include; Stitching/Registration: Combining of adjacent 2D or 3D images. Filtering (e.g. morphological filtering) Thresholding: Thresholding starts with setting or determining a gray value that will be useful for the following steps. The value is then used to separate portions of the image, and sometimes to transform each portion of the image to simply black and white based on whether it is below or above that grayscale value. Pixel counting: counts the number of light or dark pixels Segmentation: Partitioning a digital image into multiple segments to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Edge detection: finding object edges Color Analysis: Identify parts, products and items using color, assess quality from color, and isolate features using color. Blob detection and extraction: inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. Neural network / deep learning / machine learning processing: weighted and self-training multi-variable decision making Circa 2019 there is a large expansion of this, using deep learning and machine learning to significantly expand machine vision capabilities. The most common result of such processing is classification. Examples of classification are object identification,"pass fail" classification of identified objects and OCR. Pattern recognition including template matching. Finding, matching, and/or counting specific patterns. This may include location of an object that may be rotated, partially hidden by another object, or varying in size. Barcode, Data Matrix and "2D barcode" reading Optical character recognition: automated reading of text such as serial numbers Gauging/Metrology: measurement of object dimensions (e.g. in pixels, inches or millimeters) Comparison against target values to determine a "pass or fail" or "go/no go" result. For example, with code or bar code verification, the read value is compared to the stored target value. For gauging, a measurement is compared against the proper value and tolerances. For verification of alpha-numberic codes, the
Statistical shape analysis
Statistical shape analysis is an analysis of the geometrical properties of some given set of shapes by statistical methods. For instance, it could be used to quantify differences between male and female gorilla skull shapes, normal and pathological bone shapes, leaf outlines with and without herbivory by insects, etc. Important aspects of shape analysis are to obtain a measure of distance between shapes, to estimate mean shapes from (possibly random) samples, to estimate shape variability within samples, to perform clustering and to test for differences between shapes. One of the main methods used is principal component analysis (PCA). Statistical shape analysis has applications in various fields, including medical imaging, computer vision, computational anatomy, sensor measurement, and geographical profiling. == Landmark-based techniques == In the point distribution model, a shape is determined by a finite set of coordinate points, known as landmark points. These landmark points often correspond to important identifiable features such as the corners of the eyes. Once the points are collected some form of registration is undertaken. This can be a baseline methods used by Fred Bookstein for geometric morphometrics in anthropology. Or an approach like Procrustes analysis which finds an average shape. David George Kendall investigated the statistical distribution of the shape of triangles, and represented each triangle by a point on a sphere. He used this distribution on the sphere to investigate ley lines and whether three stones were more likely to be co-linear than might be expected. Statistical distribution like the Kent distribution can be used to analyse the distribution of such spaces. Alternatively, shapes can be represented by curves or surfaces representing their contours, by the spatial region they occupy. == Shape deformations == Differences between shapes can be quantified by investigating deformations transforming one shape into another. In particular a diffeomorphism preserves smoothness in the deformation. This was pioneered in D'Arcy Thompson's On Growth and Form before the advent of computers. Deformations can be interpreted as resulting from a force applied to the shape. Mathematically, a deformation is defined as a mapping from a shape x to a shape y by a transformation function Φ {\displaystyle \Phi } , i.e., y = Φ ( x ) {\displaystyle y=\Phi (x)} . Given a notion of size of deformations, the distance between two shapes can be defined as the size of the smallest deformation between these shapes. Diffeomorphometry is the focus on comparison of shapes and forms with a metric structure based on diffeomorphisms, and is central to the field of Computational anatomy. Diffeomorphic registration, introduced in the 90's, is now an important player with existing codes bases organized around ANTS, DARTEL, DEMONS, LDDMM, StationaryLDDMM, and FastLDDMM are examples of actively used computational codes for constructing correspondences between coordinate systems based on sparse features and dense images. Voxel-based morphometry (VBM) is an important technology built on many of these principles. Methods based on diffeomorphic flows are also used. For example, deformations could be diffeomorphisms of the ambient space, resulting in the LDDMM (Large Deformation Diffeomorphic Metric Mapping) framework for shape comparison.
Geometric hashing
In computer science, geometric hashing is a method for efficiently finding two-dimensional objects represented by discrete points that have undergone an affine transformation, though extensions exist to other object representations and transformations. In an off-line step, the objects are encoded by treating each pair of points as a geometric basis. The remaining points can be represented in an invariant fashion with respect to this basis using two parameters. For each point, its quantized transformed coordinates are stored in the hash table as a key, and indices of the basis points as a value. Then a new pair of basis points is selected, and the process is repeated. In the on-line (recognition) step, randomly selected pairs of data points are considered as candidate bases. For each candidate basis, the remaining data points are encoded according to the basis and possible correspondences from the object are found in the previously constructed table. The candidate basis is accepted if a sufficiently large number of the data points index a consistent object basis. Geometric hashing was originally suggested in computer vision for object recognition in 2D and 3D, but later was applied to different problems such as structural alignment of proteins. == Geometric hashing in computer vision == Geometric hashing is a method used for object recognition. Let’s say that we want to check if a model image can be seen in an input image. This can be accomplished with geometric hashing. The method could be used to recognize one of the multiple objects in a base, in this case the hash table should store not only the pose information but also the index of object model in the base. === Example === For simplicity, this example will not use too many point features and assume that their descriptors are given by their coordinates only (in practice local descriptors such as SIFT could be used for indexing). ==== Training Phase ==== Find the model's feature points. Assume that 5 feature points are found in the model image with the coordinates ( 12 , 17 ) ; {\displaystyle (12,17);} ( 45 , 13 ) ; {\displaystyle (45,13);} ( 40 , 46 ) ; {\displaystyle (40,46);} ( 20 , 35 ) ; {\displaystyle (20,35);} ( 35 , 25 ) {\displaystyle (35,25)} , see the picture. Introduce a basis to describe the locations of the feature points. For 2D space and similarity transformation the basis is defined by a pair of points. The point of origin is placed in the middle of the segment connecting the two points (P2, P4 in our example), the x ′ {\displaystyle x'} axis is directed towards one of them, the y ′ {\displaystyle y'} is orthogonal and goes through the origin. The scale is selected such that absolute value of x ′ {\displaystyle x'} for both basis points is 1. Describe feature locations with respect to that basis, i.e. compute the projections to the new coordinate axes. The coordinates should be discretised to make recognition robust to noise, we take the bin size 0.25. We thus get the coordinates ( − 0.75 , − 1.25 ) ; {\displaystyle (-0.75,-1.25);} ( 1.00 , 0.00 ) ; {\displaystyle (1.00,0.00);} ( − 0.50 , 1.25 ) ; {\displaystyle (-0.50,1.25);} ( − 1.00 , 0.00 ) ; {\displaystyle (-1.00,0.00);} ( 0.00 , 0.25 ) {\displaystyle (0.00,0.25)} Store the basis in a hash table indexed by the features (only transformed coordinates in this case). If there were more objects to match with, we should also store the object number along with the basis pair. Repeat the process for a different basis pair (Step 2). It is needed to handle occlusions. Ideally, all the non-colinear pairs should be enumerated. We provide the hash table after two iterations, the pair (P1, P3) is selected for the second one. Hash Table: Most hash tables cannot have identical keys mapped to different values. So in real life one won’t encode basis keys (1.0, 0.0) and (-1.0, 0.0) in a hash table. ==== Recognition Phase ==== Find interesting feature points in the input image. Choose an arbitrary basis. If there isn't a suitable arbitrary basis, then it is likely that the input image does not contain the target object. Describe coordinates of the feature points in the new basis. Quantize obtained coordinates as it was done before. Compare all the transformed point features in the input image with the hash table. If the point features are identical or similar, then increase the count for the corresponding basis (and the type of object, if any). For each basis such that the count exceeds a certain threshold, verify the hypothesis that it corresponds to an image basis chosen in Step 2. Transfer the image coordinate system to the model one (for the supposed object) and try to match them. If successful, the object is found. Otherwise, go back to Step 2. === Finding mirrored pattern === It seems that this method is only capable of handling scaling, translation, and rotation. However, the input image may contain the object in mirror transform. Therefore, geometric hashing should be able to find the object, too. There are two ways to detect mirrored objects. For the vector graph, make the left side positive, and the right side negative. Multiplying the x position by -1 will give the same result. Use 3 points for the basis. This allows detecting mirror images (or objects). Actually, using 3 points for the basis is another approach for geometric hashing. === Geometric hashing in higher-dimensions === Similar to the example above, hashing applies to higher-dimensional data. For three-dimensional data points, three points are also needed for the basis. The first two points define the x-axis, and the third point defines the y-axis (with the first point). The z-axis is perpendicular to the created axis using the right-hand rule. Notice that the order of the points affects the resulting basis