AI App Orange Logo

AI App Orange Logo — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Randomized Hough transform

    Randomized Hough transform

    Hough transforms are techniques for object detection, a critical step in many implementations of computer vision, or data mining from images. Specifically, the Randomized Hough transform is a probabilistic variant to the classical Hough transform, and is commonly used to detect curves (straight line, circle, ellipse, etc.) The basic idea of Hough transform (HT) is to implement a voting procedure for all potential curves in the image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Randomized Hough transform (RHT) is different from HT in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improve the time efficiency and reduce the storage requirement of the original algorithm. == Motivation == Although Hough transform (HT) has been widely used in curve detection, it has two major drawbacks: First, for each nonzero pixel in the image, the parameters for the existing curve and redundant ones are both accumulated during the voting procedure. Second, the accumulator array (or Hough space) is predefined in a heuristic way. The more accuracy needed, the higher parameter resolution should be defined. These two needs usually result in a large storage requirement and low speed for real applications. Therefore, RHT was brought up to tackle this problem. == Implementation == In comparison with HT, RHT takes advantage of the fact that some analytical curves can be fully determined by a certain number of points on the curve. For example, a straight line can be determined by two points, and an ellipse (or a circle) can be determined by three points. The case of ellipse detection can be used to illustrate the basic idea of RHT. The whole process generally consists of three steps: Fit ellipses with randomly selected points. Update the accumulator array and corresponding scores. Output the ellipses with scores higher than some predefined threshold. === Ellipse fitting === One general equation for defining ellipses is: a ( x − p ) 2 + 2 b ( x − p ) ( y − q ) + c ( y − q ) 2 = 1 {\displaystyle a(x-p)^{2}+2b(x-p)(y-q)+c(y-q)^{2}=1} with restriction: a c − b 2 > 0 {\displaystyle ac-b^{2}>0} However, an ellipse can be fully determined if one knows three points on it and the tangents in these points. RHT starts by randomly selecting three points on the ellipse. Let them be X 1 {\displaystyle X_{1}} , X 2 {\displaystyle X_{2}} and X 3 {\displaystyle X_{3}} . The first step is to find the tangents of these three points. They can be found by fitting a straight line using least squares technique for a small window of neighboring pixels. The next step is to find the intersection points of the tangent lines. This can be easily done by solving the line equations found in the previous step. Then let the intersection points be T 12 {\displaystyle T_{12}} and T 23 {\displaystyle T_{23}} , the midpoints of line segments X 1 X 2 {\displaystyle X_{1}X_{2}} and X 2 X 3 {\displaystyle X_{2}X_{3}} be M 12 {\displaystyle M_{12}} and M 23 {\displaystyle M_{23}} . Then the center of the ellipse will lie in the intersection of T 12 M 12 {\displaystyle T_{12}M_{12}} and T 23 M 23 {\displaystyle T_{23}M_{23}} . Again, the coordinates of the intersected point can be determined by solving line equations and the detailed process is skipped here for conciseness. Let the coordinates of ellipse center found in previous step be ( x 0 , y 0 ) {\displaystyle (x_{0},y_{0})} . Then the center can be translated to the origin with x ′ = x − x 0 {\displaystyle x'=x-x_{0}} and y ′ = y − y 0 {\displaystyle y'=y-y_{0}} so that the ellipse equation can be simplified to: a x ′ 2 + 2 b x ′ y ′ + c y ′ 2 = 1 {\displaystyle ax'^{2}+2bx'y'+cy'^{2}=1} Now we can solve for the rest of ellipse parameters: a {\displaystyle a} , b {\displaystyle b} and c {\displaystyle c} by substituting the coordinates of X 1 {\displaystyle X_{1}} , X 2 {\displaystyle X_{2}} and X 3 {\displaystyle X_{3}} into the equation above. === Accumulating === With the ellipse parameters determined from previous stage, the accumulator array can be updated correspondingly. Different from classical Hough transform, RHT does not keep "grid of buckets" as the accumulator array. Rather, it first calculates the similarities between the newly detected ellipse and the ones already stored in accumulator array. Different metrics can be used to calculate the similarity. As long as the similarity exceeds some predefined threshold, replace the one in the accumulator with the average of both ellipses and add 1 to its score. Otherwise, initialize this ellipse to an empty position in the accumulator and assign a score of 1. === Termination === Once the score of one candidate ellipse exceeds the threshold, it is determined as existing in the image (in other words, this ellipse is detected), and should be removed from the image and accumulator array so that the algorithm can detect other potential ellipses faster. The algorithm terminates when the number of iterations reaches a maximum limit or all the ellipses have been detected. Pseudo code for RHT: while (we find ellipses AND not reached the maximum epoch) { for (a fixed number of iterations) { Find a potential ellipse. if (the ellipse is similar to an ellipse in the accumulator) then Replace the one in the accumulator with the average of two ellipses and add 1 to the score; else Insert the ellipse into an empty position in the accumulator with a score of 1; } Select the ellipse with the best score and save it in a best ellipse table; Eliminate the pixels of the best ellipse from the image; Empty the accumulator; }

    Read more →
  • AI Text-to-video Tools Reviews: What Actually Works in 2026

    AI Text-to-video Tools Reviews: What Actually Works in 2026

    Looking for the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • AI Art Generators Reviews: What Actually Works in 2026

    AI Art Generators Reviews: What Actually Works in 2026

    Comparing the best AI art generator? An AI art generator is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI art generator slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Sparse dictionary learning

    Sparse dictionary learning

    Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries have immense applications in image compression, image fusion, and inpainting. == Problem statement == Given the input dataset X = [ x 1 , . . . , x K ] , x i ∈ R d {\displaystyle X=[x_{1},...,x_{K}],x_{i}\in \mathbb {R} ^{d}} we wish to find a dictionary D ∈ R d × n : D = [ d 1 , . . . , d n ] {\displaystyle \mathbf {D} \in \mathbb {R} ^{d\times n}:D=[d_{1},...,d_{n}]} and a representation R = [ r 1 , . . . , r K ] , r i ∈ R n {\displaystyle R=[r_{1},...,r_{K}],r_{i}\in \mathbb {R} ^{n}} such that both ‖ X − D R ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}} is minimized and the representations r i {\displaystyle r_{i}} are sparse enough. This can be formulated as the following optimization problem: argmin D ∈ C , r i ∈ R n ∑ i = 1 K ‖ x i − D r i ‖ 2 2 + λ ‖ r i ‖ 0 {\displaystyle {\underset {\mathbf {D} \in {\mathcal {C}},r_{i}\in \mathbb {R} ^{n}}{\text{argmin}}}\sum _{i=1}^{K}\|x_{i}-\mathbf {D} r_{i}\|_{2}^{2}+\lambda \|r_{i}\|_{0}} , where C ≡ { D ∈ R d × n : ‖ d i ‖ 2 ≤ 1 ∀ i = 1 , . . . , n } {\displaystyle {\mathcal {C}}\equiv \{\mathbf {D} \in \mathbb {R} ^{d\times n}:\|d_{i}\|_{2}\leq 1\,\,\forall i=1,...,n\}} , λ > 0 {\displaystyle \lambda >0} C {\displaystyle {\mathcal {C}}} is required to constrain D {\displaystyle \mathbf {D} } so that its atoms would not reach arbitrarily high values allowing for arbitrarily low (but non-zero) values of r i {\displaystyle r_{i}} . λ {\displaystyle \lambda } controls the trade off between the sparsity and the minimization error. The minimization problem above is not convex because of the ℓ0-"norm" and solving this problem is NP-hard. In some cases L1-norm is known to ensure sparsity and so the above becomes a convex optimization problem with respect to each of the variables D {\displaystyle \mathbf {D} } and R {\displaystyle \mathbf {R} } when the other one is fixed, but it is not jointly convex in ( D , R ) {\displaystyle (\mathbf {D} ,\mathbf {R} )} . === Properties of the dictionary === The dictionary D {\displaystyle \mathbf {D} } defined above can be "undercomplete" if n < d {\displaystyle n d {\displaystyle n>d} with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered. Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis which require atoms d 1 , . . . , d n {\displaystyle d_{1},...,d_{n}} to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. And dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification. However, their main downside is limiting the choice of atoms. Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they will never have a basis anyway) thus allowing for more flexible dictionaries and richer data representations. An overcomplete dictionary which allows for sparse representation of signal can be a famous transform matrix (wavelets transform, fourier transform) or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in a best way. Learned dictionaries are capable of giving sparser solutions as compared to predefined transform matrices. == Algorithms == As the optimization problem described above can be solved as a convex problem with respect to either dictionary or sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other. The problem of finding an optimal sparse coding R {\displaystyle R} with a given dictionary D {\displaystyle \mathbf {D} } is known as sparse approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below. === Method of optimal directions (MOD) === The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. The core idea of it is to solve the minimization problem subject to the limited number of non-zero components of the representation vector: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T} Here, F {\displaystyle F} denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem given by D = X R + {\displaystyle \mathbf {D} =XR^{+}} where R + {\displaystyle R^{+}} is a Moore-Penrose pseudoinverse. After this update D {\displaystyle \mathbf {D} } is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until a sufficiently small residue). MOD has proved to be a very efficient method for low-dimensional input data X {\displaystyle X} requiring just a few iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing the pseudoinverse in high-dimensional cases is in many cases intractable. This shortcoming has inspired the development of other dictionary learning methods. === K-SVD === K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and basically is a generalization of K-means. It enforces that each element of the input data x i {\displaystyle x_{i}} is encoded by a linear combination of not more than T 0 {\displaystyle T_{0}} elements in a way identical to the MOD approach: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T 0 {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T_{0}} This algorithm's essence is to first fix the dictionary, find the best possible R {\displaystyle R} under the above constraint (using Orthogonal Matching Pursuit) and then iteratively update the atoms of dictionary D {\displaystyle \mathbf {D} } in the following manner: ‖ X − D R ‖ F 2 = | X − ∑ i = 1 K d i x T i | F 2 = ‖ E k − d k x T k ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}=\left|X-\sum _{i=1}^{K}d_{i}x_{T}^{i}\right|_{F}^{2}=\|E_{k}-d_{k}x_{T}^{k}\|_{F}^{2}} The next steps of the algorithm include rank-1 approximation of the residual matrix E k {\displaystyle E_{k}} , updating d k {\displaystyle d_{k}} and enforcing the s

    Read more →
  • Kdan Mobile

    Kdan Mobile

    Kdan Mobile Software Limited is a software application development company based in Tainan City, Taiwan. Kdan also has branches in Taipei, Changsha, Irvine, California, Japan, and South Korea. The company was founded in 2009 by Kenny Su, the company's CEO. == History == Kdan Mobile was founded in 2009 by Kenny Su (蘇柏州) and develops an application for PDF documents. Su previously worked at the Industrial Technology Research Institute (ITRI) . In 2018, the company completed its Series B round of fundraising, in which it raised 16 million USD in total. Four global firms, Dattoz Partners (South Korea), WI Harper Group (U.S.), Taiwania Capital (Taiwan), and Golden Asia Fund Mitsubishi UFJ Capital (Japan), made up the Series B investment. Kdan previously raised 5 million USD in its Series A round in 2018.

    Read more →
  • Machine translation in China

    Machine translation in China

    Machine translation in China is the history of machine translation systems developed in China. China became the fourth country that began machine translation (MT) research following USA, UK, and the Soviet Union. In 1957, the Language Institute of Chinese Academy of Sciences took the initiative in Russian-Chinese MT research program and set up an MT research group. From then on the research activities were directed and applied for academic purposes in Universities. The turning point of MT systems launching initiatives in market began from 1990s. MT systems went into blossom into the market. Among these systems, there were commercialized MT systems. To be more specific, Transtar was the first commercialized MT system and has been constantly upgraded. What's more, IMC/EC MT system which was developed by Computer Institute of Chinese Academy of Sciences has further made great advancement. Meanwhile, the practical MT system MT-IT-EC specific to communication domain was also striking to notice, for it has greatly improved the efficiency and productivity in the issue of publications. Government funding is a critical component and support in the development of market-oriented machine translation in China. It is evident to see that since Chinese opened up to the outside world and joined the WTO, the vigorous import and export trade generate opportunities for machine translation to transfer technical terms of products into the readable target information. Facing the increasing demand of sophisticated state-of -the -art translation technology, the academic area including research institute and universities are even launching bachelors’ and master's programs regarding machine translation. Thus, strong evidence illustrates the promising field of machine translation in the future market of China.

    Read more →
  • How to Choose an AI Clip Maker

    How to Choose an AI Clip Maker

    Curious about the best AI clip maker? An AI clip maker is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Thompson's construction

    Thompson's construction

    In computer science, Thompson's construction algorithm, also called the McNaughton–Yamada–Thompson algorithm, is a method of transforming a regular expression into an equivalent nondeterministic finite automaton (NFA). This NFA can be used to match strings against the regular expression. This algorithm is credited to Ken Thompson. Regular expressions and nondeterministic finite automata are two representations of formal languages. For instance, text processing utilities use regular expressions to describe advanced search patterns, but NFAs are better suited for execution on a computer. Hence, this algorithm is of practical interest, since it can compile regular expressions into NFAs. From a theoretical point of view, this algorithm is a part of the proof that they both accept exactly the same languages, that is, the regular languages. An NFA can be made deterministic by the powerset construction and then be minimized to get an optimal automaton corresponding to the given regular expression. However, an NFA may also be interpreted directly. To decide whether two given regular expressions describe the same language, each can be converted into an equivalent minimal deterministic finite automaton via Thompson's construction, powerset construction, and DFA minimization. If, and only if, the resulting automata agree up to renaming of states, the regular expressions' languages agree. == The algorithm == The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the NFA will be constructed using a set of rules. More precisely, from a regular expression E, the obtained automaton A with the transition function Δ respects the following properties: A has exactly one initial state q0, which is not accessible from any other state. That is, for any state q and any letter a, Δ ( q , a ) {\displaystyle \Delta (q,a)} does not contain q0. A has exactly one final state qf, which is not co-accessible from any other state. That is, for any letter a, Δ ( q f , a ) = ∅ {\displaystyle \Delta (q_{f},a)=\emptyset } . Let c be the number of concatenation of the regular expression E and let s be the number of symbols apart from parentheses — that is, |, , a and ε. Then, the number of states of A is 2s − c (linear in the size of E). The number of transitions leaving any state is at most two. Since an NFA of m states and at most e transitions from each state can match a string of length n in time O(emn), a Thompson NFA can do pattern matching in linear time, assuming a fixed-size alphabet. === Rules === The following rules are depicted according to Aho et al. (2007), p. 122. In what follows, N(s) and N(t) are the NFA of the subexpressions s and t, respectively. The empty-expression ε is converted to A symbol a of the input alphabet is converted to The union expression s|t is converted to State q goes via ε either to the initial state of N(s) or N(t). Their final states become intermediate states of the whole NFA and merge via two ε-transitions into the final state of the NFA. The concatenation expression st is converted to The initial state of N(s) is the initial state of the whole NFA. The final state of N(s) becomes the initial state of N(t). The final state of N(t) is the final state of the whole NFA. The Kleene star expression s is converted to An ε-transition connects initial and final state of the NFA with the sub-NFA N(s) in between. Another ε-transition from the inner final to the inner initial state of N(s) allows for repetition of expression s according to the star operator. The parenthesized expression (s) is converted to N(s) itself. With these rules, using the empty expression and symbol rules as base cases, it is possible to prove with structural induction that any regular expression may be converted into an equivalent NFA. == Example == Two examples are now given, a small informal one with the result, and a bigger with a step by step application of the algorithm. === Small Example === The picture below shows the result of Thompson's construction on (ε|ab). The purple oval corresponds to a, the teal oval corresponds to a, the green oval corresponds to b, the orange oval corresponds to ab, and the blue oval corresponds to ε. === Application of the algorithm === As an example, the picture shows the result of Thompson's construction algorithm on the regular expression (0|(1(01(00)0)1)) that denotes the set of binary numbers that are multiples of 3: { ε, "0", "00", "11", "000", "011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", ... }. The upper right part shows the logical structure (syntax tree) of the expression, with "." denoting concatenation (assumed to have variable arity); subexpressions are named a-q for reference purposes. The left part shows the nondeterministic finite automaton resulting from Thompson's algorithm, with the entry and exit state of each subexpression colored in magenta and cyan, respectively. An ε as transition label is omitted for clarity — unlabelled transitions are in fact ε transitions. The entry and exit state corresponding to the root expression q is the start and accept state of the automaton, respectively. The algorithm's steps are as follows: An equivalent minimal deterministic automaton is shown below. == Relation to other algorithms == Thompson's is one of several algorithms for constructing NFAs from regular expressions; an earlier algorithm was given by McNaughton and Yamada. Converse to Thompson's construction, Kleene's algorithm transforms a finite automaton into a regular expression. Glushkov's construction algorithm is similar to Thompson's construction, once the ε-transitions are removed. == Use in string pattern matching == Regular expressions are often used to specify patterns that software is then asked to match. Generating an NFA by Thompson's construction, and using an appropriate algorithm to simulate it, it is possible to create pattern-matching software with performance that is ⁠ O ( m n ) {\displaystyle O(mn)} ⁠, where m is the length of the regular expression and n is the length of the string being matched. This is much better than is achieved by many popular programming-language implementations; however, it is restricted to purely regular expressions and does not support patterns for non-regular languages like backreferences.

    Read more →
  • Apprenticeship learning

    Apprenticeship learning

    In artificial intelligence, apprenticeship learning (or learning from demonstration or imitation learning) is the process of learning by observing an expert. It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher. == Mapping function approach == Mapping methods try to mimic the expert by forming a direct mapping either from states to actions, or from states to reward values. For example, in 2002 researchers used such an approach to teach an AIBO robot basic soccer skills. === Inverse reinforcement learning approach === Inverse reinforcement learning (IRL) is the process of deriving a reward function from observed behavior. While ordinary "reinforcement learning" involves using rewards and punishments to learn behavior, in IRL the direction is reversed, and a robot observes a person's behavior to figure out what goal that behavior seems to be trying to achieve. The IRL problem can be defined as: Given 1) measurements of an agent's behaviour over time, in a variety of circumstances; 2) measurements of the sensory inputs to that agent; 3) a model of the physical environment (including the agent's body): Determine the reward function that the agent is optimizing. IRL researcher Stuart J. Russell proposes that IRL might be used to observe humans and attempt to codify their complex "ethical values", in an effort to create "ethical robots" that might someday know "not to cook your cat" without needing to be explicitly told. The scenario can be modeled as a "cooperative inverse reinforcement learning game", where a "person" player and a "robot" player cooperate to secure the person's implicit goals, despite these goals not being explicitly known by either the person nor the robot. In 2017, OpenAI and DeepMind applied deep learning to the cooperative inverse reinforcement learning in simple domains such as Atari games and straightforward robot tasks such as backflips. The human role was limited to answering queries from the robot as to which of two different actions were preferred. The researchers found evidence that the techniques may be economically scalable to modern systems. Apprenticeship via inverse reinforcement learning (AIRP) was developed by in 2004 Pieter Abbeel, Professor in Berkeley's EECS department, and Andrew Ng, Associate Professor in Stanford University's Computer Science Department. AIRP deals with "Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform". AIRP has been used to model reward functions of highly dynamic scenarios where there is no obvious reward function intuitively. Take the task of driving for example, there are many different objectives working simultaneously - such as maintaining safe following distance, a good speed, not changing lanes too often, etc. This task, may seem easy at first glance, but a trivial reward function may not converge to the policy wanted. One domain where AIRP has been used extensively is helicopter control. While simple trajectories can be intuitively derived, complicated tasks like aerobatics for shows has been successful. These include aerobatic maneuvers like - in-place flips, in-place rolls, loops, hurricanes and even auto-rotation landings. This work was developed by Pieter Abbeel, Adam Coates, and Andrew Ng - "Autonomous Helicopter Aerobatics through Apprenticeship Learning" === System model approach === System models try to mimic the expert by modeling world dynamics. == Plan approach == The system learns rules to associate preconditions and postconditions with each action. In one 1994 demonstration, a humanoid learns a generalized plan from only two demonstrations of a repetitive ball collection task. == Example == Learning from demonstration is often explained from a perspective that the working Robot-control-system is available and the human-demonstrator is using it. And indeed, if the software works, the Human operator takes the robot-arm, makes a move with it, and the robot will reproduce the action later. For example, he teaches the robot-arm how to put a cup under a coffeemaker and press the start-button. In the replay phase, the robot is imitating this behavior 1:1. But that is not how the system works internally; it is only what the audience can observe. In reality, Learning from demonstration is much more complex. One of the first works on learning by robot apprentices (anthropomorphic robots learning by imitation) was Adrian Stoica's PhD thesis in 1995. In 1997, robotics expert Stefan Schaal was working on the Sarcos robot-arm. The goal was simple: solve the pendulum swingup task. The robot itself can execute a movement, and as a result, the pendulum is moving. The problem is, that it is unclear what actions will result into which movement. It is an Optimal control-problem which can be described with mathematical formulas but is hard to solve. The idea from Schaal was, not to use a Brute-force solver but record the movements of a human-demonstration. The angle of the pendulum is logged over three seconds at the y-axis. This results into a diagram which produces a pattern. In computer animation, the principle is called spline animation. That means, on the x-axis the time is given, for example 0.5 seconds, 1.0 seconds, 1.5 seconds, while on the y-axis is the variable given. In most cases it's the position of an object. In the inverted pendulum it is the angle. The overall task consists of two parts: recording the angle over time and reproducing the recorded motion. The reproducing step is surprisingly simple. As an input we know, in which time step which angle the pendulum must have. Bringing the system to a state is called “Tracking control” or PID control. That means, we have a trajectory over time, and must find control actions to map the system to this trajectory. Other authors call the principle “steering behavior”, because the aim is to bring a robot to a given line.

    Read more →
  • SNNS

    SNNS

    SNNS (Stuttgart Neural Network Simulator) is a neural network simulator originally developed at the University of Stuttgart. While it was originally built for X11 under Unix, there are Windows ports. Its successor JavaNNS never reached the same popularity. == Features == SNNS is written around a simulation kernel to which user written activation functions, learning procedures and output functions can be added. It has support for arbitrary network topologies and the standard release contains support for a number of standard neural network architectures and training algorithms. == Status == There is currently no ongoing active development of SNNS. In July 2008 the license was changed to the GNU LGPL.

    Read more →
  • GLIMMER

    GLIMMER

    In bioinformatics, GLIMMER (Gene Locator and Interpolated Markov ModelER) is used to find genes in prokaryotic DNA. "It is effective at finding genes in bacteria, archea, viruses, typically finding 98-99% of all relatively long protein coding genes". GLIMMER was the first system that used the interpolated Markov model to identify coding regions. The GLIMMER software is open source and is maintained by Steven Salzberg, Art Delcher, and their colleagues at the Center for Computational Biology at Johns Hopkins University. The original GLIMMER algorithms and software were designed by Art Delcher, Simon Kasif and Steven Salzberg and applied to bacterial genome annotation in collaboration with Owen White. == Versions == === GLIMMER 1.0 === First Version of GLIMMER "i.e., GLIMMER 1.0" was released in 1998 and it was published in the paper Microbial gene identification using interpolated Markov model. Markov models were used to identify microbial genes in GLIMMER 1.0. GLIMMER considers the local composition sequence dependencies which makes GLIMMER more flexible and more powerful when compared to fixed-order Markov model. There was a comparison made between interpolated Markov model used by GLIMMER and fifth order Markov model in the paper Microbial gene identification using interpolated Markov models. "GLIMMER algorithm found 1680 genes out of 1717 annotated genes in Haemophilus influenzae where fifth order Markov model found 1574 genes. GLIMMER found 209 additional genes which were not included in 1717 annotated genes where fifth order Markov model found 104 genes."' === GLIMMER 2.0 === Second Version of GLIMMER i.e., GLIMMER 2.0 was released in 1999 and it was published in the paper Improved microbial identification with GLIMMER. This paper provides significant technical improvements such as using interpolated context model instead of interpolated Markov model and resolving overlapping genes which improves the accuracy of GLIMMER. Interpolated context models are used instead of interpolated Markov model which gives the flexibility to select any base. In interpolated Markov model probability distribution of a base is determined from the immediate preceding bases. If the immediate preceding base is irrelevant amino acid translation, interpolated Markov model still considers the preceding base to determine the probability of given base where as interpolated context model which was used in GLIMMER 2.0 can ignore irrelevant bases. False positive predictions were increased in GLIMMER 2.0 to reduce the number of false negative predictions. Overlapped genes are also resolved in GLIMMER 2.0. Various comparisons between GLIMMER 1.0 and GLIMMER 2.0 were made in the paper Improved microbial identification with GLIMMER which shows improvement in the later version. "Sensitivity of GLIMMER 1.0 ranges from 98.4 to 99.7% with an average of 99.1% where as GLIMMER 2.0 has a sensitivity range from 98.6 to 99.8% with an average of 99.3%. GLIMMER 2.0 is very effective in finding genes of high density. The parasite Trypanosoma brucei, responsible for causing African sleeping sickness is being identified by GLIMMER 2.0" === GLIMMER 3.0 === Third version of GLIMMER, "GLIMMER 3.0" was released in 2007 and it was published in the paper Identifying bacterial genes and endosymbiont DNA with Glimmer. This paper describes several major changes made to the GLIMMER system including improved methods to identify coding regions and start codon. Scoring of ORF in GLIMMER 3.0 is done in reverse order i.e., starting from stop codon and moves back towards the start codon. Reverse scanning helps in identifying the coding portion of the gene more accurately which is contained in the context window of IMM. GLIMMER 3.0 also improves the generated training set data by comparing the long-ORF with universal amino acid distribution of widely disparate bacterial genomes."GLIMMER 3.0 has an average long-ORF output of 57% for various organisms where as GLIMMER 2.0 has an average long-ORF output of 39%." GLIMMER 3.0 reduces the rate of false positive predictions which were increased in GLIMMER 2.0 to reduce the number of false negative predictions. "GLIMMER 3.0 has a start-site prediction accuracy of 99.5% for 3'5' matches where as GLIMMER 2.0 has 99.1% for 3'5' matches. GLIMMER 3.0 uses a new algorithm for scanning coding regions, a new start site detection module, and architecture which integrates all gene predictions across an entire genome." Minimum description length === Theoretical and Biological Foundation === The GLIMMER project helped introduce and popularize the use of variable length models in Computational Biology and Bioinformatics that subsequently have been applied to numerous problems such as protein classification and others. Variable length modeling was originally pioneered by information theorists and subsequently ingeniously applied and popularized in data compression (e.g. Ziv-Lempel compression). Prediction and compression are intimately linked using Minimum Description Length Principles. The basic idea is to create a dictionary of frequent words (motifs in biological sequences). The intuition is that the frequently occurring motifs are likely to be most predictive and informative. In GLIMMER the interpolated model is a mixture model of the probabilities of these relatively common motifs. Similarly to the development of HMMs in Computational Biology, the authors of GLIMMER were conceptually influenced by the previous application of another variant of interpolated Markov models to speech recognition by researchers such as Fred Jelinek (IBM) and Eric Ristad (Princeton). The learning algorithm in GLIMMER is different from these earlier approaches. == Access == GLIMMER can be downloaded from The Glimmer home page (requires a C++ compiler). Alternatively, an online version is hosted by NCBI [1]. == How it works == GLIMMER primarily searches for long-ORFS. An open reading frame might overlap with any other open reading frame which will be resolved using the technique described in the sub section. Using these long-ORFS and following certain amino acid distribution GLIMMER generates training set data. Using these training data, GLIMMER trains all the six Markov models of coding DNA from zero to eight order and also train the model for noncoding DNA GLIMMER tries to calculate the probabilities from the data. Based on the number of observations, GLIMMER determines whether to use fixed order Markov model or interpolated Markov model. If the number of observations are greater than 400, GLIMMER uses fixed order Markov model to obtain there probabilities. If the number of observations are less than 400, GLIMMER uses interpolated Markov model which is briefly explained in the next sub section. GLIMMER obtains score for every long-ORF generated using all the six coding DNA models and also using non-coding DNA model. If the score obtained in the previous step is greater than a certain threshold then GLIMMER predicts it to be a gene. The steps explained above describes the basic functionality of GLIMMER. There are various improvements made to GLIMMER and some of them are described in the following sub-sections. === The GLIMMER system === GLIMMER system consists of two programs. First program called build-imm, which takes an input set of sequences and outputs the interpolated Markov model as follows. The probability for each base i.e., A,C,G,T for all k-mers for 0 ≤ k ≤ 8 is computed. Then, for each k-mer, GLIMMER computes weight. New sequence probability is computed as follows. where n is the length of the sequence S x {\displaystyle S_{x}} is the oligomer at position x. I M M 8 ( S x ) {\displaystyle IMM_{8}(S_{x})} , the 8 t h {\displaystyle 8^{th}} -order interpolated Markov model score is computed as "where Y k ( S x − 1 ) {\displaystyle Y_{k}(S_{x-1})} is the weight of the k-mer at position x-1 in the sequence S and P k ( S x ) {\displaystyle P_{k}(S_{x})} is the estimate obtained from the training data of the probability of the base located at position x in the k t h {\displaystyle k^{th}} -order model." The probability of base S x {\displaystyle S_{x}} given the i previous bases is computed as follows. "The value of Y i ( S x ) {\displaystyle Y_{i}(S_{x})} associated with P i ( S x ) {\displaystyle P_{i}(S_{x})} can be regarded as a measure of confidence in the accuracy of this value as an estimate of the true probability. GLIMMER uses two criteria to determine Y i ( S x ) {\displaystyle Y_{i}(S_{x})} . The first of these is simple frequency occurrence in which the number of occurrences of context string S x , i {\displaystyle S_{x,i}} in the training data exceeds a specific threshold value, then Y i ( S x ) {\displaystyle Y_{i}(S_{x})} is set to 1.0. The current default value for threshold is 400, which gives 95% confidence. When there are insufficient sample occurrences of a context string, build-imm employ additional criteria to determine Y {\displaystyle Y} value. For a

    Read more →
  • FastText

    FastText

    fastText is a library for learning of word embeddings and text classification created by Facebook's AI Research (FAIR) lab. The model allows one to create an unsupervised learning or supervised learning algorithm for obtaining vector representations for words. Facebook makes available pretrained models for 294 languages. Several papers describe the techniques used by fastText. The GitHub repository was archived on March 19, 2024.

    Read more →
  • Graphics

    Graphics

    Graphics (from Ancient Greek γραφικός (graphikós) 'pertaining to drawing, painting, writing, etc.') are visual images or designs on some surface, such as a wall, canvas, screen, paper, or stone, to inform, illustrate, or entertain. In contemporary usage, it includes a pictorial representation of data, as in design and manufacture, in typesetting and the graphic arts, and in educational and recreational software. Images that are generated by a computer are called computer graphics. Examples are photographs, drawings, line art, mathematical graphs, line graphs, charts, diagrams, typography, numbers, symbols, geometric designs, maps, engineering drawings, or other images. Graphics often combine text, illustration, and color. Graphic design may consist of the deliberate selection, creation, or arrangement of typography alone, as in a brochure, flyer, poster, web site, or book without any other element. The objective can be clarity or effective communication, association with other cultural elements, or merely the creation of a distinctive style. Graphics can be functional or artistic. The latter can be a recorded version, such as a photograph, or an interpretation by a scientist to highlight essential features, or an artist, in which case the distinction with imaginary graphics may become blurred. It can also be used for architecture. == History == The earliest graphics known to anthropologists studying prehistoric periods are cave paintings and markings on boulders, bone, ivory, and antlers, which were created during the Upper Palaeolithic period from 40,000 to 10,000 B.C. or earlier. Many of these were found to record astronomical, seasonal, and chronological details. Some of the earliest graphics and drawings are known to the modern world, from almost 6,000 years ago, are that of engraved stone tablets and ceramic cylinder seals, marking the beginning of the historical periods and the keeping of records for accounting and inventory purposes. Records from Egypt predate these and papyrus was used by the Egyptians as a material on which to plan the building of pyramids; they also used slabs of limestone and wood. From 600 to 250 BC, the Greeks played a major role in geometry. They used graphics to represent their mathematical theories such as the Circle Theorem and the Pythagorean theorem. In art, "graphics" is often used to distinguish work in a monotone and made up of lines, as opposed to painting. === Drawing === Drawing generally involves making marks on a surface by applying pressure from a tool or moving a tool across a surface. In which a tool is always used as if there were no tools it would be art. Graphical drawing is an instrumental guided drawing. === Printmaking === Woodblock printing, including images is first seen in China after paper was invented (about A.D. 105). In the West, the main techniques have been woodcut, engraving and etching, but there are many others. ==== Etching ==== Etching is an intaglio method of printmaking in which the image is incised into the surface of a metal plate using an acid. The acid eats the metal, leaving behind roughened areas, or, if the surface exposed to the acid is very thin, burning a line into the plate. The use of the process in printmaking is believed to have been invented by Daniel Hopfer (c. 1470–1536) of Augsburg, Germany, who decorated armour in this way. Etching is also used in the manufacturing of printed circuit boards and semiconductor devices. === Line art === Line art is a rather non-specific term sometimes used for any image that consists of distinct straight and curved lines placed against a (usually plain) background, without gradations in shade (darkness) or hue (color) to represent two-dimensional or three-dimensional objects. Line art is usually monochromatic, although lines may be of different colors. === Illustration === An illustration is a visual representation such as a drawing, painting, photograph or other work of art that stresses the subject more than form. The aim of an illustration is to elucidate or decorate a story, poem or piece of textual information (such as a newspaper article), traditionally by providing a visual representation of something described in the text. The editorial cartoon, also known as a political cartoon, is an illustration containing a political or social message. Illustrations can be used to display a wide range of subject matter and serve a variety of functions, such as: giving faces to characters in a story displaying a number of examples of an item described in an academic textbook (e.g. A Typology) visualizing step-wise sets of instructions in a technical manual communicating subtle thematic tone in a narrative linking brands to the ideas of human expression, individuality, and creativity making a reader laugh or smile for fun (to make laugh) funny === Graphs === A graph or chart is a graphic that represents tabular or numeric data. Charts are often used to make it easier to understand large quantities of data and the relationships between different parts of the data. === Diagrams === A diagram is a simplified and structured visual representation of concepts, ideas, constructions, relations, statistical data, etc., used to visualize and clarify the topic. === Symbols === A symbol, in its basic sense, is a representation of a concept or quantity; i.e., an idea, object, concept, quality, etc. In more psychological and philosophical terms, all concepts are symbolic in nature, and representations for these concepts are simply token artifacts that are allegorical to (but do not directly codify) a symbolic meaning, or symbolism. === Maps === A map is a simplified depiction of a space, a navigational aid which highlights relations between objects within that space. Usually, a map is a two-dimensional, geometrically accurate representation of a three-dimensional space. One of the first 'modern' maps was made by Waldseemüller. === Photography === One difference between photography and other forms of graphics is that a photographer, in principle, just records a single moment in reality, with seemingly no interpretation. However, a photographer can choose the field of view and angle, and may also use other techniques, such as various lenses to choose the view or filters to change the colors. In recent times, digital photography has opened the way to an infinite number of fast, but strong, manipulations. Even in the early days of photography, there was controversy over photographs of enacted scenes that were presented as 'real life' (especially in war photography, where it can be very difficult to record the original events). Shifting the viewer's eyes ever so slightly with simple pinpricks in the negative could have a dramatic effect. The choice of the field of view can have a strong effect, effectively 'censoring out' other parts of the scene, accomplished by cropping them out or simply not including them in the photograph. This even touches on the philosophical question of what reality is. The human brain processes information based on previous experience, making us see what we want to see or what we were taught to see. Photography does the same, although the photographer interprets the scene for their viewer. === Engineering drawings === An engineering drawing is a type of drawing and is technical in nature, used to fully and clearly define requirements for engineered items. It is usually created in accordance with standardized conventions for layout, nomenclature, interpretation, appearance (such as typefaces and line styles), size, etc. === Computer graphics === There are two types of computer graphics: raster graphics, where each pixel is separately defined (as in a digital photograph), and vector graphics, where mathematical formulas are used to draw lines and shapes, which are then interpreted at the viewer's end to produce the graphic. Using vectors results in infinitely sharp graphics and often smaller files, but, when complex, like vectors take time to render and may have larger file sizes than a raster equivalent. In 1950, the first computer-driven display was attached to MIT's Whirlwind I computer to generate simple pictures. This was followed by MIT's TX-0 and TX-2, interactive computing which increased interest in computer graphics during the late 1950s. In 1962, Ivan Sutherland invented Sketchpad, an innovative program that influenced alternative forms of interaction with computers. In the mid-1960s, large computer graphics research projects were begun at MIT, General Motors, Bell Labs, and Lockheed Corporation. Douglas T. Ross of MIT developed an advanced compiler language for graphics programming. S.A.Coons, also at MIT, and J. C. Ferguson at Boeing, began work in sculptured surfaces. GM developed their DAC-1 system, and other companies, such as Douglas, Lockheed, and McDonnell, also made significant developments. In 1968, ray tracing was first described by Arthur Appel of the IBM Research Center, Yorktown Heights, N

    Read more →
  • Georgetown–IBM experiment

    Georgetown–IBM experiment

    The Georgetown–IBM experiment was an influential demonstration of machine translation, which was performed on January 7, 1954. Developed jointly by Georgetown University and IBM, the experiment involved completely automatic translation of more than sixty Russian sentences into English. == Background == Conceived and performed primarily in order to attract governmental and public interest and funding by showing the possibilities of machine translation, it was by no means a fully featured system: It had only six grammar rules and 250 lexical items in its vocabulary (of stems and endings). Words in the vocabulary were in the fields of politics, law, mathematics, chemistry, metallurgy, communications and military affairs. Vocabulary was punched onto punch cards. This complete dictionary was never fully shown (only the extended one from Garvin's article). Apart from general topics, the system was specialized in the domain of organic chemistry. The translation was carried out using an IBM 701 mainframe computer (launched in April 1953). The Georgetown-IBM experiment is the best-known result of the MIT conference in June 1952 to which all active researchers in the machine translation field were invited. At the conference, Duncan Harkin from US Department of Defense suggested that his department would finance a new machine translation project. Jerome Weisner supported the idea and offered finance from the Research Laboratory of Electronics at MIT. Leon Dostert had been invited to the project for his previous experience with the automatic correction of translations (back then 'mechanical translation'); his interpretation system had a strong impact on the Nuremberg War Crimes Tribunal. The linguistics part of the demonstration was carried out for the most part by linguist Paul Garvin who had also good knowledge of Russian. Over 60 Romanized Russian statements from a wide range of political, legal, mathematical, and scientific topics were entered into the machine by a computer operator who knew no Russian, and the resulting English translations appeared on a printer. The sentences to be translated were carefully selected. Many operations for the demonstration were fitted to specific words and sentences. In addition, there was no relational or sentence analysis which could recognize the sentence structure. The approach was mostly 'lexicographical' based on a dictionary where a specific word had a connection with specific rules and steps. == Algorithm == The algorithm first translates Russian words into numerical codes, then performs the following case-analysis on each numerical code to choose between possible English word translations, reorder the English words, or omit some English words. The flowchart of the algorithm is reproduced in (see Table 1 for the 6 rules). == Translation examples == How it analyzes Vyelyichyina ugla opryedyelyayetsya otnoshyenyiyem dlyini dugi k radyiusu (figure 2 of ). == Reception == Well publicized by journalists and perceived as a success, the experiment did encourage governments to invest in computational linguistics. The authors claimed that within three or five years, machine translation could well be a solved problem. However, the real progress was much slower, and after the ALPAC report in 1966, which found that the ten years of long research had failed to fulfill the expectations, funding was reduced dramatically. The demonstration was given widespread coverage in the foreign press, but only a small fraction of journalists drew attention to previous machine translation attempts.

    Read more →
  • Best AI Presentation Makers in 2026

    Best AI Presentation Makers in 2026

    In search of the best AI presentation maker? An AI presentation maker is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI presentation maker slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →