AI Bot Grammar Checker

AI Bot Grammar Checker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems. == Product overview == Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical. Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks. Other notable features include performance monitoring, built-in real-time analytics & BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. In addition to scalability, by distributing configurations, logs, etc., across multiple servers there is no single point of failure, as can be the case if all configuration information is stored on one server. == History == Ericom Connect was introduced in 2015. Ericom Connect is a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMWare, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • Information gain ratio

    Information gain ratio

    In decision tree learning, information gain ratio is a ratio of information gain to the intrinsic information. It was proposed by Ross Quinlan, to reduce a bias towards multi-valued attributes by taking the number and size of branches into account when choosing an attribute. Information gain is also known as mutual information. == Information gain calculation == Information gain is the reduction in entropy produced from partitioning a set with attributes a {\displaystyle a} and finding the optimal candidate that produces the highest value: IG ( T , a ) = H ( T ) − H ( T | a ) , {\displaystyle {\text{IG}}(T,a)=\mathrm {H} {(T)}-\mathrm {H} {(T|a)},} where T {\displaystyle T} is a random variable and H ( T | a ) {\displaystyle \mathrm {H} {(T|a)}} is the entropy of T {\displaystyle T} given the value of attribute a {\displaystyle a} . The information gain is equal to the total entropy for an attribute if for each of the attribute values a unique classification can be made for the result attribute. In this case the relative entropies subtracted from the total entropy are 0. == Split information calculation == The split information value for a test is defined as follows: SplitInformation ( X ) = − ∑ i = 1 n N ( x i ) N ( x ) ∗ log ⁡ 2 N ( x i ) N ( x ) {\displaystyle {\text{SplitInformation}}(X)=-\sum _{i=1}^{n}{{\frac {\mathrm {N} (x_{i})}{\mathrm {N} (x)}}\log {_{2}}{\frac {\mathrm {N} (x_{i})}{\mathrm {N} (x)}}}} where X {\displaystyle X} is a discrete random variable with possible values x 1 , x 2 , . . . , x i {\displaystyle {x_{1},x_{2},...,x_{i}}} and N ( x i ) {\displaystyle N(x_{i})} being the number of times that x i {\displaystyle x_{i}} occurs divided by the total count of events N ( x ) {\displaystyle N(x)} where x {\displaystyle x} is the set of events. The split information value is a positive number that describes the potential worth of splitting a branch from a node. This in turn is the intrinsic value that the random variable possesses and will be used to remove the bias in the information gain ratio calculation. == Information gain ratio calculation == The information gain ratio is the ratio between the information gain and the split information value: IGR ( T , a ) = IG ( T , a ) / SplitInformation ( T ) {\displaystyle {\text{IGR}}(T,a)={\text{IG}}(T,a)/{\text{SplitInformation}}(T)} IGR ( T , a ) = − ∑ i = 1 n P ( T ) log ⁡ P ( T ) − ( − ∑ i = 1 n P ( T | a ) log ⁡ P ( T | a ) ) − ∑ i = 1 n N ( t i ) N ( t ) ∗ log ⁡ 2 N ( t i ) N ( t ) {\displaystyle {\text{IGR}}(T,a)={\frac {-\sum _{i=1}^{n}{\mathrm {P} (T)\log \mathrm {P} (T)}-(-\sum _{i=1}^{n}{\mathrm {P} (T|a)\log \mathrm {P} (T|a)})}{-\sum _{i=1}^{n}{{\frac {\mathrm {N} (t_{i})}{\mathrm {N} (t)}}\log {_{2}}{\frac {\mathrm {N} (t_{i})}{\mathrm {N} (t)}}}}}} == Example == Using weather data published by Fordham University, the table was created below: Using the table above, one can find the entropy, information gain, split information, and information gain ratio for each variable (outlook, temperature, humidity, and wind). These calculations are shown in the tables below: Using the above tables, one can deduce that Outlook has the highest information gain ratio. Next, one must find the statistics for the sub-groups of the Outlook variable (sunny, overcast, and rainy), for this example one will only build the sunny branch (as shown in the table below): One can find the following statistics for the other variables (temperature, humidity, and wind) to see which have the greatest effect on the sunny element of the outlook variable: Humidity was found to have the highest information gain ratio. One will repeat the same steps as before and find the statistics for the events of the Humidity variable (high and normal): Since the play values are either all "No" or "Yes", the information gain ratio value will be equal to 1. Also, now that one has reached the end of the variable chain with Wind being the last variable left, they can build an entire root to leaf node branch line of a decision tree. Once finished with reaching this leaf node, one would follow the same procedure for the rest of the elements that have yet to be split in the decision tree. This set of data was relatively small, however, if a larger set was used, the advantages of using the information gain ratio as the splitting factor of a decision tree can be seen more. == Advantages == Information gain ratio biases the decision tree against considering attributes with a large number of distinct values. For example, suppose that we are building a decision tree for some data describing a business's customers. Information gain ratio is used to decide which of the attributes are the most relevant. These will be tested near the root of the tree. One of the input attributes might be the customer's telephone number. This attribute has a high information gain, because it uniquely identifies each customer. Due to its high amount of distinct values, this will not be chosen to be tested near the root. == Disadvantages == Although information gain ratio solves the key problem of information gain, it creates another problem. If one is considering an amount of attributes that have a high number of distinct values, these will never be above one that has a lower number of distinct values. == Difference from information gain == Information gain's shortcoming is created by not providing a numerical difference between attributes with high distinct values from those that have less. Example: Suppose that we are building a decision tree for some data describing a business's customers. Information gain is often used to decide which of the attributes are the most relevant, so they can be tested near the root of the tree. One of the input attributes might be the customer's credit card number. This attribute has a high information gain, because it uniquely identifies each customer, but we do not want to include it in the decision tree: deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we haven't seen before. Information gain ratio's strength is that it has a bias towards the attributes with the lower number of distinct values. Below is a table describing the differences of information gain and information gain ratio when put in certain scenarios.

    Read more →
  • LIBSVM

    LIBSVM

    LIBSVM and LIBLINEAR are two popular open source machine learning libraries, both developed at the National Taiwan University and both written in C++ though with a C API. LIBSVM implements the sequential minimal optimization (SMO) algorithm for kernelized support vector machines (SVMs), supporting classification and regression. LIBLINEAR implements linear SVMs and logistic regression models trained using a coordinate descent algorithm. The SVM learning code from both libraries is often reused in other open source machine learning toolkits, including GATE, KNIME, Orange and scikit-learn. Bindings and ports exist for programming languages such as Java, MATLAB, R, Julia, and Python. It is available in e1071 library in R and scikit-learn in Python. Both libraries are free software released under the 3-clause BSD license.

    Read more →
  • NETtalk (artificial neural network)

    NETtalk (artificial neural network)

    NETtalk is an artificial neural network that learns to pronounce written English text by supervised learning. It takes English text as input, and produces a matching phonetic transcriptions as output. It is the result of research carried out in the mid-1980s by Terrence Sejnowski and Charles Rosenberg. The intent behind NETtalk was to construct simplified models that might shed light on the complexity of learning human level cognitive tasks, and their implementation as a connectionist model that could also learn to perform a comparable task. The authors trained it by backpropagation. The network was trained on a large amount of English words and their corresponding pronunciations, and is able to generate pronunciations for unseen words with a high level of accuracy. The output of the network was a stream of phonemes, which fed into DECtalk to produce audible speech. It achieved popular success, appearing on the Today show. From the point of view of modeling human cognition, NETtalk does not specifically model the image processing stages and letter recognition of the visual cortex. Rather, it assumes that the letters have been pre-classified and recognized. It is NETtalk's task to learn proper associations between the correct pronunciation with a given sequence of letters based on the context in which the letters appear. A similar architecture was subsequently used for the opposite task, that of converting continuous speech signal to a phoneme sequence. == Training == The training dataset was a 20,008-word subset of the Brown Corpus, with manually annotated phoneme and stress for each letter. The development process was described in a 1993 interview. It took three months -- 250 person-hours -- to create the training dataset, but only a few days to train the network. After it was run successfully on this, the authors tried it on a phonological transcription of an interview with a young Latino boy from a barrio in Los Angeles. This resulted in a network that reproduced his Spanish accent. The original NETtalk was implemented on a Ridge 32, which took 0.275 seconds per learning step (one forward and one backward pass). Training NETtalk became a benchmark to test for the efficiency of backpropagation programs. For example, an implementation on Connection Machine-1 (with 16384 processors) ran at 52x speedup. An implementation on a 10-cell Warp ran at 340x speedup. The following table compiles the benchmark scores as of 1988. Speed is measured in "millions of connections per second" (MCPS). For example, the original NETtalk on Ridge 32 took 0.275 seconds per forward-backward pass, giving 18629 / 10 6 0.275 = 0.068 {\displaystyle {\frac {18629/10^{6}}{0.275}}=0.068} MCPS. Relative times are normalized to the MicroVax. == Architecture == The network had three layers and 18,629 adjustable weights, large by the standards of 1986. There were worries that it would overfit the dataset, but it was trained successfully. The input of the network has 203 units, divided into 7 groups of 29 units each. Each group is a one-hot encoding of one character. There are 29 possible characters: 26 letters, comma, period, and word boundary (whitespace). To produce the pronunciation of a single character, the network takes the character itself, as well as 3 characters before and 3 characters after it. The hidden layer has 80 units. The output has 26 units. 21 units encode for articulatory features (point of articulation, voicing, vowel height, etc.) of phonemes, and 5 units encode for stress and syllable boundaries. Sejnowski studied the learned representation in the network, and found that phonemes that sound similar are clustered together in representation space. The output of the network degrades, but remains understandable, when some hidden neurons are removed.

    Read more →
  • DABUS

    DABUS

    DABUS (Device for the Autonomous Bootstrapping of Unified Sentience) is an artificial intelligence (AI) system created by Stephen Thaler. It reportedly conceived of two novel products — a food container constructed using fractal geometry, which enables rapid reheating, and a flashing beacon for attracting attention in an emergency. The filing of patent applications designating DABUS as inventor has led to decisions by patent offices and courts on whether a patent can be granted for an invention reportedly made by an AI system. == History in different jurisdictions == === Australia === On 17 September 2019, Thaler filed an application to patent a "Food container and devices and methods for attracting enhanced attention," naming DABUS as the inventor. On 21 September 2020, IP Australia found that section 15(1) of the Patents Act 1990 (Cth) is inconsistent with an artificial intelligence machine being treated as an inventor, and Thaler's application had lapsed. Thaler sought judicial review, and on 30 July 2021, the Federal Court set aside IP Australia's decision and ordered IP Australia to reconsider the application. On 13 April 2022, the Full Court of the Federal Court set aside that decision, holding that only a natural person can be an inventor for the purposes of the Patents Act 1990 (Cth) and the Patents Regulations 1991 (Cth), and that such an inventor must be identified for any person to be entitled to a grant of a patent. On 11 November 2022, Thaler was refused special leave to appeal to the High Court. === European Patent Office === On 17 October 2018 and 7 November 2018, Thaler filed two European patent applications with the European Patent Office. The first claimed invention was a "Food Container" and the second was "Devices and Methods for Attracting Enhanced Attention." On 27 January 2020, the EPO rejected the applications on the grounds that the application listed an AI system named DABUS, and not a human, as the inventor, based on Article 81 and Rule 19(1) of the European Patent Convention (EPC). On 21 December 2021, the Board of Appeal of the EPO dismissed Thaler's appeal from the EPO's primary decision. The Board of Appeal confirmed that "under the EPC the designated inventor has to be a person with legal capacity. This is not merely an assumption on which the EPC was drafted. It is the ordinary meaning of the term inventor." === United Kingdom === Similar applications were filed by Thaler to the United Kingdom Intellectual Property Office on 17 October and 7 November 2018. The Office asked Thaler to file statements of inventorship and of right of grant to a patent (Patent Form 7) in respect of each invention within 16 months of the filing date. Thaler filed those forms naming DABUS as the inventor and explaining in some detail why he believed that machines should be regarded as inventors in the circumstances. His application was rejected on the grounds that: (1) naming a machine as inventor did not meet the requirements of the Patents Act 1977; and (2) the IPO was not satisfied as to the manner in which Thaler had acquired rights that would otherwise vest in the inventor. Thaler was not satisfied with the decision and asked for a hearing before an official known as the "hearing officer". By a decision dated 4 December 2019 the hearing officer rejected Thaler's appeal. Thaler appealed against the hearing officer's decision to the Patents Court (a specialist court within the Chancery Division of the High Court of England and Wales that determines patent disputes). On 21 September 2020, Mr Justice Marcus Smith upheld the decision of the hearing officer. On 21 September 2021, Thaler's further appeal to the Court of Appeal was dismissed by Arnold LJ and Laing LJ (Birss LJ dissenting). On 20 December 2023, the UK Supreme Court dismissed a further appeal by Thaler. In its judgment, the court held that an "inventor" under the Patents Act 1977 must be a natural person. === United States === The patent applications on the inventions were refused by the USPTO, which held that only natural persons can be named as inventors in a patent application. Thaler first fought this result by filing a complaint under the Administrative Procedure Act alleging that the decision was "arbitrary, capricious, an abuse of discretion and not in accordance with the law; unsupported by substantial evidence, and in excess of Defendants’ statutory authority." A month later on August 19, 2019, Thaler filed a petition with the USPTO as allowed in 37 C.F.R. § 1.181 stating that DABUS should be the inventor. The judge and Thaler agreed in this case that Thaler himself is unable to receive the patent on behalf of DABUS. In their August 5, 2022, Thaler decision, the US Court of Appeals for the Federal Circuit affirmed that only a natural person could be an inventor, which means that the AI that invents any other type of invention is not addressed by the "who" mentioned in the legislation. === New Zealand === On January 31, 2022, the Intellectual Property Office of New Zealand (IPONZ) decided that a patent application (776029) filed by Stephen Thaler was void, on the basis that no inventor was identified on the patent application. IPONZ determined that DABUS could not be "an actual devisor of the invention" as required by the Patents Act 2013, and that this must be a natural person as held by the previous patent offices above. The High Court of New Zealand confirmed the decision in 2023. === South Africa === On 24 June 2021, the South African Companies and Intellectual Property Commission (CIPC) accepted Dr Thaler's Patent Cooperation Treaty, for a patent in respect of inventions generated by DABUS. In July 2021, the CIPC released a notice of issuance for the patent. It is the first patent granted for an AI invention. === Switzerland === On June 26, 2025, the Swiss Federal Administrative Court ruled that artificial intelligence systems such as DABUS cannot be listed as inventors in patent applications. The court upheld the existing practice of the Swiss Federal Institute of Intellectual Property (IPI), which requires that only natural persons can be recognized as inventors under Swiss patent law. The case concerned a patent application, which sought to designate DABUS as the sole inventor of a food container designed with a fractal geometry to enhance heat distribution. The IPI had rejected the application, arguing that both the absence of a human inventor and the attribution of inventorship to an AI system were inadmissible. While the court dismissed Thaler's main request, it accepted a subsidiary request: if a human applicant recognizes and files a patent based on an AI-generated invention, that person may be considered the inventor. As a result, the application may proceed with Thaler listed as the inventor. The decision (B-2532/2024) can still be appealed to the Swiss Federal Supreme Court.

    Read more →
  • Inverted pendulum

    Inverted pendulum

    An inverted pendulum is a pendulum that has its center of mass above its pivot point. It is unstable and falls over without additional help. It can be suspended stably in this inverted position by using a control system to monitor the angle of the pole and move the pivot point horizontally back under the center of mass when it starts to fall over, keeping it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is used as a benchmark for testing control strategies. It is often implemented with the pivot point mounted on a cart that can move horizontally under control of an electronic servo system as shown in the photo; this is called a cart and pole apparatus. Most applications limit the pendulum to 1 degree of freedom by affixing the pole to an axis of rotation. Whereas a normal pendulum is stable when hanging downward, an inverted pendulum is inherently unstable, and must be actively balanced in order to remain upright; this can be done either by applying a torque at the pivot point, by moving the pivot point horizontally as part of a feedback system, changing the rate of rotation of a mass mounted on the pendulum on an axis parallel to the pivot axis and thereby generating a net torque on the pendulum, or by oscillating the pivot point vertically. A simple demonstration of moving the pivot point in a feedback system is achieved by balancing an upturned broomstick on the end of one's finger. A second type of inverted pendulum is a tiltmeter for tall structures, which consists of a wire anchored to the bottom of the foundation and attached to a float in a pool of oil at the top of the structure that has devices for measuring movement of the neutral position of the float away from its original position. == Overview == A pendulum with its bob hanging directly below the support pivot is at a stable equilibrium point, where it remains motionless because there is no torque on the pendulum. If displaced from this position, it experiences a restoring torque that returns it toward the equilibrium position. A pendulum with its bob in an inverted position, supported on a rigid rod directly above the pivot, 180° from its stable equilibrium position, is at an unstable equilibrium point. At this point again there is no torque on the pendulum, but the slightest displacement away from this position causes a gravitation torque on the pendulum that accelerates it away from equilibrium, causing it to fall over. In order to stabilize a pendulum in this inverted position, a feedback control system can be used, which monitors the pendulum's angle and moves the position of the pivot point sideways when the pendulum starts to fall over, to keep it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, state-space representation, neural networks, fuzzy control, genetic algorithms, etc.). Variations on this problem include multiple links, allowing the motion of the cart to be commanded while maintaining the pendulum, and balancing the cart-pendulum system on a see-saw. The inverted pendulum is related to rocket or missile guidance, where the center of gravity is located behind the center of drag causing aerodynamic instability. The understanding of a similar problem can be shown by simple robotics in the form of a balancing cart. Balancing an upturned broomstick on the end of one's finger is a simple demonstration, and the problem is solved by self-balancing personal transporters such as the Segway PT, the self-balancing hoverboard and the self-balancing unicycle. Another way that an inverted pendulum may be stabilized, without any feedback or control mechanism, is by oscillating the pivot rapidly up and down. This is called Kapitza's pendulum. If the oscillation is sufficiently strong (in terms of its acceleration and amplitude) then the inverted pendulum can recover from perturbations in a strikingly counterintuitive manner. If the driving point moves in simple harmonic motion, the pendulum's motion is described by the Mathieu equation. == Equations of motion == The equations of motion of inverted pendulums are dependent on what constraints are placed on the motion of the pendulum. Inverted pendulums can be created in various configurations resulting in a number of Equations of Motion describing the behavior of the pendulum. === Stationary pivot point === In a configuration where the pivot point of the pendulum is fixed in space, the equation of motion is similar to that for an uninverted pendulum. The equation of motion below assumes no friction or any other resistance to movement, a rigid massless rod, and the restriction to 2-dimensional movement. θ ¨ − g ℓ sin ⁡ θ = 0 {\displaystyle {\ddot {\theta }}-{g \over \ell }\sin \theta =0} Where θ ¨ {\displaystyle {\ddot {\theta }}} is the angular acceleration of the pendulum, g {\displaystyle g} is the standard gravity on the surface of the Earth, ℓ {\displaystyle \ell } is the length of the pendulum, and θ {\displaystyle \theta } is the angular displacement measured from the equilibrium position. When θ ¨ {\displaystyle {\ddot {\theta }}} added to both sides, it has the same sign as the angular acceleration term: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } Thus, the inverted pendulum accelerates away from the vertical unstable equilibrium in the direction initially displaced, and the acceleration is inversely proportional to the length. Tall pendulums fall more slowly than short ones. Derivation using torque and moment of inertia: The pendulum is assumed to consist of a point mass, of mass m {\displaystyle m} , affixed to the end of a massless rigid rod, of length ℓ {\displaystyle \ell } , attached to a pivot point at the end opposite the point mass. The net torque of the system must equal the moment of inertia times the angular acceleration: τ n e t = I θ ¨ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=I{\ddot {\theta }}} The torque due to gravity providing the net torque: τ n e t = m g ℓ sin ⁡ θ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=mg\ell \sin \theta \,\!} Where θ {\displaystyle \theta \ } is the angle measured from the inverted equilibrium position. The resulting equation: I θ ¨ = m g ℓ sin ⁡ θ {\displaystyle I{\ddot {\theta }}=mg\ell \sin \theta \,\!} The moment of inertia for a point mass: I = m R 2 {\displaystyle I=mR^{2}} In the case of the inverted pendulum the radius is the length of the rod, ℓ {\displaystyle \ell } . Substituting in I = m ℓ 2 {\displaystyle I=m\ell ^{2}} m ℓ 2 θ ¨ = m g ℓ sin ⁡ θ {\displaystyle m\ell ^{2}{\ddot {\theta }}=mg\ell \sin \theta \,\!} Mass and ℓ 2 {\displaystyle \ell ^{2}} is divided from each side resulting in: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } === Inverted pendulum on a cart === An inverted pendulum on a cart consists of a mass m {\displaystyle m} at the top of a pole of length ℓ {\displaystyle \ell } pivoted on a horizontally moving base as shown in the adjacent image. The cart is restricted to linear motion and is subject to forces resulting in or hindering motion. === Essentials of stabilization === The essentials of stabilizing the inverted pendulum can be summarized qualitatively in three steps. 1. If the tilt angle θ {\displaystyle \theta } is to the right, the cart must accelerate to the right and vice versa. 2. The position of the cart x {\displaystyle x} relative to track center is stabilized by slightly modulating the null angle (the angle error that the control system tries to null) by the position of the cart, that is, null angle = θ + k x {\displaystyle =\theta +kx} where k {\displaystyle k} is small. This makes the pole want to lean slightly toward track center and stabilize at track center where the tilt angle is exactly vertical. Any offset in the tilt sensor or track slope that would otherwise cause instability translates into a stable position offset. A further added offset gives position control. 3. A normal pendulum subject to a moving pivot point such as a load lifted by a crane, has a peaked response at the pendulum radian frequency of ω p = g / ℓ {\displaystyle \omega _{p}={\sqrt {g/\ell }}} . To prevent uncontrolled swinging, the frequency spectrum of the pivot motion should be suppressed near ω p {\displaystyle \omega _{p}} . The inverted pendulum requires the same suppression filter to achieve stability. As a consequence of the null angle modulation strategy, the position feedback is positive, that is, a sudden command to move right produces an initial cart motion to the left followed by a move right to rebalance the pendulum. The interaction of the pendulum instability and the positive position feedback instability to produce a stable system is a feature that makes the mathematical analysis an interesting and challenging problem. === From Lagrange's equations === The equations of motion c

    Read more →
  • Averaged one-dependence estimators

    Averaged one-dependence estimators

    Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently develops substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation. == The AODE classifier == AODE seeks to estimate the probability of each class y given a specified set of features x1, ... xn, P(y | x1, ... xn). To do so it uses the formula P ^ ( y ∣ x 1 , … x n ) = ∑ i : 1 ≤ i ≤ n ∧ F ( x i ) ≥ m P ^ ( y , x i ) ∏ j = 1 n P ^ ( x j ∣ y , x i ) ∑ y ′ ∈ Y ∑ i : 1 ≤ i ≤ n ∧ F ( x i ) ≥ m P ^ ( y ′ , x i ) ∏ j = 1 n P ^ ( x j ∣ y ′ , x i ) {\displaystyle {\hat {P}}(y\mid x_{1},\ldots x_{n})={\frac {\sum _{i:1\leq i\leq n\wedge F(x_{i})\geq m}{\hat {P}}(y,x_{i})\prod _{j=1}^{n}{\hat {P}}(x_{j}\mid y,x_{i})}{\sum _{y^{\prime }\in Y}\sum _{i:1\leq i\leq n\wedge F(x_{i})\geq m}{\hat {P}}(y^{\prime },x_{i})\prod _{j=1}^{n}{\hat {P}}(x_{j}\mid y^{\prime },x_{i})}}} where P ^ ( ⋅ ) {\displaystyle {\hat {P}}(\cdot )} denotes an estimate of P ( ⋅ ) {\displaystyle P(\cdot )} , F ( ⋅ ) {\displaystyle F(\cdot )} is the frequency with which the argument appears in the sample data and m is a user specified minimum frequency with which a term must appear in order to be used in the outer summation. In recent practice m is usually set at 1. == Derivation of the AODE classifier == We seek to estimate P(y | x1, ... xn). By the definition of conditional probability P ( y ∣ x 1 , … x n ) = P ( y , x 1 , … x n ) P ( x 1 , … x n ) . {\displaystyle P(y\mid x_{1},\ldots x_{n})={\frac {P(y,x_{1},\ldots x_{n})}{P(x_{1},\ldots x_{n})}}.} For any 1 ≤ i ≤ n {\displaystyle 1\leq i\leq n} , P ( y , x 1 , … x n ) = P ( y , x i ) P ( x 1 , … x n ∣ y , x i ) . {\displaystyle P(y,x_{1},\ldots x_{n})=P(y,x_{i})P(x_{1},\ldots x_{n}\mid y,x_{i}).} Under an assumption that x1, ... xn are independent given y and xi, it follows that P ( y , x 1 , … x n ) = P ( y , x i ) ∏ j = 1 n P ( x j ∣ y , x i ) . {\displaystyle P(y,x_{1},\ldots x_{n})=P(y,x_{i})\prod _{j=1}^{n}P(x_{j}\mid y,x_{i}).} This formula defines a special form of One Dependence Estimator (ODE), a variant of the naive Bayes classifier that makes the above independence assumption that is weaker (and hence potentially less harmful) than the naive Bayes' independence assumption. In consequence, each ODE should create a less biased estimator than naive Bayes. However, because the base probability estimates are each conditioned by two variables rather than one, they are formed from less data (the training examples that satisfy both variables) and hence are likely to have more variance. AODE reduces this variance by averaging the estimates of all such ODEs. == Features of the AODE classifier == Like naive Bayes, AODE does not perform model selection and does not use tuneable parameters. As a result, it has low variance. It supports incremental learning whereby the classifier can be updated efficiently with information from new examples as they become available. It predicts class probabilities rather than simply predicting a single class, allowing the user to determine the confidence with which each classification can be made. Its probabilistic model can directly handle situations where some data are missing. AODE has computational complexity O ( l n 2 ) {\displaystyle O(ln^{2})} at training time and O ( k n 2 ) {\displaystyle O(kn^{2})} at classification time, where n is the number of features, l is the number of training examples and k is the number of classes. This makes it infeasible for application to high-dimensional data. However, within that limitation, it is linear with respect to the number of training examples and hence can efficiently process large numbers of training examples. == Implementations == The free Weka machine learning suite includes an implementation of AODE.

    Read more →
  • Facial recognition system

    Facial recognition system

    A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image. Development on similar systems began in the 1960s as a form of computer application. Since their inception, facial recognition systems have seen wider uses in recent times on smartphones and in other forms of technology, such as robotics. Because computerized facial recognition involves the measurement of a human's physiological characteristics, facial recognition systems are categorized as biometrics. Although the accuracy of facial recognition systems as a biometric technology is lower than iris recognition, fingerprint image acquisition, palm recognition or voice recognition, it is widely adopted due to its contactless process. Facial recognition systems have been deployed in advanced human–computer interaction, video surveillance, law enforcement, passenger screening, decisions on employment and housing, and automatic indexing of images. Facial recognition systems are employed throughout the world today by governments and private companies. Their effectiveness varies, and some systems have previously been scrapped because of their ineffectiveness. The use of facial recognition systems has also raised controversy, with claims that the systems violate citizens' privacy, commonly make incorrect identifications, encourage gender norms and racial profiling, and do not protect important biometric data. The appearance of synthetic media such as deepfakes has also raised concerns about its security. These claims have led to the ban of facial recognition systems in several cities in the United States. Growing societal concerns led social networking company Meta Platforms to shut down its Facebook facial recognition system in 2021, deleting the face-scan data of more than one billion users. The change represented one of the largest shifts in facial recognition usage in the technology's history. IBM also stopped offering facial recognition technology due to similar concerns. == History of facial recognition technology == Automated facial recognition was pioneered in the 1960s by Woody Bledsoe, Helen Chan Wolf, and Charles Bisson, whose work focused on teaching computers to recognize human faces. Their early facial recognition project was dubbed "man-machine" because a human first needed to establish the coordinates of facial features in a photograph before they could be used by a computer for recognition. Using a graphics tablet, a human would pinpoint facial features coordinates, such as the pupil centers, the inside and outside corners of eyes, and the widows peak in the hairline. The coordinates were used to calculate 20 individual distances, including the width of the mouth and of the eyes. A human could process about 40 pictures an hour, building a database of these computed distances. A computer would then automatically compare the distances for each photograph, calculate the difference between the distances, and return the closed records as a possible match. In 1970, Takeo Kanade publicly demonstrated a face-matching system that located anatomical features such as the chin and calculated the distance ratio between facial features without human intervention. Later tests revealed that the system could not always reliably identify facial features. Nonetheless, interest in the subject grew and in 1977 Kanade published the first detailed book on facial recognition technology. In 1993, the Defense Advanced Research Project Agency (DARPA) and the Army Research Laboratory (ARL) established the face recognition technology program FERET to develop "automatic face recognition capabilities" that could be employed in a productive real life environment "to assist security, intelligence, and law enforcement personnel in the performance of their duties." Face recognition systems that had been trialled in research labs were evaluated. The FERET tests found that while the performance of existing automated facial recognition systems varied, a handful of existing methods could viably be used to recognize faces in still images taken in a controlled environment. The FERET tests spawned three US companies that sold automated facial recognition systems. Vision Corporation and Miros Inc were founded in 1994, by researchers who used the results of the FERET tests as a selling point. Viisage Technology was established by an identification card defense contractor in 1996 to commercially exploit the rights to the facial recognition algorithm developed by Alex Pentland at MIT. Following the 1993 FERET face-recognition vendor test, the Department of Motor Vehicles (DMV) offices in West Virginia and New Mexico became the first DMV offices to use automated facial recognition systems to prevent people from obtaining multiple driving licenses using different names. Driver's licenses in the United States were at that point a commonly accepted form of photo identification. DMV offices across the United States were undergoing a technological upgrade and were in the process of establishing databases of digital ID photographs. This enabled DMV offices to deploy the facial recognition systems on the market to search photographs for new driving licenses against the existing DMV database. DMV offices became one of the first major markets for automated facial recognition technology and introduced US citizens to facial recognition as a standard method of identification. The increase of the US prison population in the 1990s prompted U.S. states to established connected and automated identification systems that incorporated digital biometric databases, in some instances this included facial recognition. In 1999, Minnesota incorporated the facial recognition system FaceIT by Visionics into a mug shot booking system that allowed police, judges and court officers to track criminals across the state. Until the 1990s, facial recognition systems were developed primarily by using photographic portraits of human faces. Research on face recognition to reliably locate a face in an image that contains other objects gained traction in the early 1990s with the principal component analysis (PCA). The PCA method of face detection is also known as Eigenface and was developed by Matthew Turk and Alex Pentland. Turk and Pentland combined the conceptual approach of the Karhunen–Loève theorem and factor analysis, to develop a linear model. Eigenfaces are determined based on global and orthogonal features in human faces. A human face is calculated as a weighted combination of a number of Eigenfaces. Because few Eigenfaces were used to encode human faces of a given population, Turk and Pentland's PCA face detection method greatly reduced the amount of data that had to be processed to detect a face. Pentland in 1994 defined Eigenface features, including eigen eyes, eigen mouths and eigen noses, to advance the use of PCA in facial recognition. In 1997, the PCA Eigenface method of face recognition was improved upon using linear discriminant analysis (LDA) to produce Fisherfaces. LDA Fisherfaces became dominantly used in PCA feature based face recognition. While Eigenfaces were also used for face reconstruction. In these approaches no global structure of the face is calculated which links the facial features or parts. Purely feature based approaches to facial recognition were overtaken in the late 1990s by the Bochum system, which used Gabor filter to record the face features and computed a grid of the face structure to link the features. Christoph von der Malsburg and his research team at the University of Bochum developed Elastic Bunch Graph Matching in the mid-1990s to extract a face out of an image using skin segmentation. By 1997, the face detection method developed by Malsburg outperformed most other facial detection systems on the market. The so-called "Bochum system" of face detection was sold commercially on the market as ZN-Face to operators of airports and other busy locations. The software was "robust enough to make identifications from less-than-perfect face views. It can also often see through such impediments to identification as mustaches, beards, changed hairstyles and glasses—even sunglasses". Real-time face detection in video footage became possible in 2001 with the Viola–Jones object detection framework for faces. Paul Viola and Michael Jones combined their face detection method with the Haar-like feature approach to object recognition in digital images to launch AdaBoost, the first real-time frontal-view face detector. By 2015, the Viola–Jones algorithm had been implemented using small low power detectors on handheld devices and embedded systems. Therefore, the Viola–Jones algorithm has not only broadened the practical application of face recognition systems but

    Read more →
  • Artificial Inventor Project

    Artificial Inventor Project

    The Artificial Inventor Project (AIP) is a global legal initiative headed by Professor Ryan Abbott dedicated to pursuing intellectual property (IP) rights for inventions and creative works generated autonomously by artificial intelligence (AI) systems without traditional human inventorship or authorship. The project coordinates a series of pro bono test cases worldwide, aiming to prompt law reform and public debate on how IP law should accommodate non-human creators. == History == In 2019, AIP filed patent applications in multiple jurisdictions, including the United States, United Kingdom, European Patent Office, Australia, Switzerland, and South Africa, naming the AI system DABUS (Device for the Autonomous Bootstrapping of Unified Sentience), created by Stephen Thaler, as the inventor. The aim was to challenge legal norms that require inventors to be natural persons and highlight pressing policy questions about AI-generated innovation and IP regimes. == Legal proceedings by jurisdiction == === Australia === In July 2021, a Federal Court of Australia judge (Beach J) ruled that AI can be considered an inventor under the Patents Act 1990, ordering IP Australia to reinstate the relevant patent. However, the full court then overturned this ruling on appeal and denied further review. === European Patent Office === The EPO Board of Appeal determined in 2022 that only a human inventor may be named, rendering DABUS‑based applications unacceptable. === South Africa === In 2021, a patent was granted listing DABUS as the inventor. As South Africa’s procedural system does not involve substantive inventorship review, the grant proceeded on formal grounds alone. === Switzerland === On 26 June 2025, the Swiss Federal Administrative Court ruled that artificial intelligence systems such as DABUS cannot be listed as inventors on patent applications. The court upheld the existing practice of the Swiss Federal Institute of Intellectual Property (IPI), affirming that only natural persons may be recognized as inventors under Swiss patent law. === United Kingdom === In December 2023, the UK Supreme Court unanimously held that AI systems cannot be legally recognized as inventors, affirming that "an inventor must be a person" under current British law. === United States === In Thaler v. Hirshfeld (2021), a U.S. federal court agreed with the USPTO that inventors must be natural persons, rejecting the DABUS application and setting a precedent consistent with existing statute and administrative policy. == Criticism and impact == The project has fueled substantial discourse. Critics caution that allowing AI inventorship may complicate notions of accountability and ownership. Proponents argue that legal recognition must evolve to avoid disincentivizing innovation produced by AI and to maintain honesty about the true source of invention.

    Read more →
  • Probabilistic latent semantic analysis

    Probabilistic latent semantic analysis

    Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles) is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved. Compared to standard latent semantic analysis which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model. == Model == Considering observations in the form of co-occurrences ( w , d ) {\displaystyle (w,d)} of words and documents, PLSA models the probability of each co-occurrence as a mixture of conditionally independent multinomial distributions: P ( w , d ) = ∑ c P ( d ) P ( c | d ) P ( w | c ) = P ( d ) ∑ c P ( c | d ) P ( w | c ) {\displaystyle P(w,d)=\sum _{c}P(d)P(c|d)P(w|c)=P(d)\sum _{c}P(c|d)P(w|c)} with c {\displaystyle c} being the words' topic. Note that the number of topics is a hyperparameter that must be chosen in advance and is not estimated from the data. The first formulation is the symmetric formulation, where w {\displaystyle w} and d {\displaystyle d} are both generated from the latent class c {\displaystyle c} in similar ways (using the conditional probabilities P ( d | c ) {\displaystyle P(d|c)} and P ( w | c ) {\displaystyle P(w|c)} ), whereas the second formulation is the asymmetric formulation, where, for each document d {\displaystyle d} , a latent class is chosen conditionally to the document according to P ( c | d ) {\displaystyle P(c|d)} , and a word is then generated from that class according to P ( w | c ) {\displaystyle P(w|c)} . Although we have used words and documents in this example, the co-occurrence of any couple of discrete variables may be modelled in exactly the same way. So, the number of parameters is equal to c d + w c {\displaystyle cd+wc} . The number of parameters grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents. Their parameters are learned using the EM algorithm. == Application == PLSA may be used in a discriminative setting, via Fisher kernels. PLSA has applications in information retrieval and filtering, natural language processing, machine learning from text, bioinformatics, and related areas. It is reported that the aspect model used in the probabilistic latent semantic analysis has severe overfitting problems. == Extensions == Hierarchical extensions: Asymmetric: MASHA ("Multinomial ASymmetric Hierarchical Analysis") Symmetric: HPLSA ("Hierarchical Probabilistic Latent Semantic Analysis") Generative models: The following models have been developed to address an often-criticized shortcoming of PLSA, namely that it is not a proper generative model for new documents. Latent Dirichlet allocation – adds a Dirichlet prior on the per-document topic distribution Higher-order data: Although this is rarely discussed in the scientific literature, PLSA extends naturally to higher order data (three modes and higher), i.e. it can model co-occurrences over three or more variables. In the symmetric formulation above, this is done simply by adding conditional probability distributions for these additional variables. This is the probabilistic analogue to non-negative tensor factorisation. == History == This is an example of a latent class model (see references therein), and it is related to non-negative matrix factorization. The present terminology was coined in 1999 by Thomas Hofmann.

    Read more →
  • Distribution learning theory

    Distribution learning theory

    The distributional learning theory or learning of probability distribution is a framework in computational learning theory. It has been proposed from Michael Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert Schapire and Linda Sellie in 1994 and it was inspired from the PAC-framework introduced by Leslie Valiant. In this framework the input is a number of samples drawn from a distribution that belongs to a specific class of distributions. The goal is to find an efficient algorithm that, based on these samples, determines with high probability the distribution from which the samples have been drawn. Because of its generality, this framework has been used in a large variety of different fields like machine learning, approximation algorithms, applied probability and statistics. This article explains the basic definitions, tools and results in this framework from the theory of computation point of view. == Definitions == Let X {\displaystyle \textstyle X} be the support of the distributions of interest. As in the original work of Kearns et al. if X {\displaystyle \textstyle X} is finite it can be assumed without loss of generality that X = { 0 , 1 } n {\displaystyle \textstyle X=\{0,1\}^{n}} where n {\displaystyle \textstyle n} is the number of bits that have to be used in order to represent any y ∈ X {\displaystyle \textstyle y\in X} . We focus in probability distributions over X {\displaystyle \textstyle X} . There are two possible representations of a probability distribution D {\displaystyle \textstyle D} over X {\displaystyle \textstyle X} . probability distribution function (or evaluator) an evaluator E D {\displaystyle \textstyle E_{D}} for D {\displaystyle \textstyle D} takes as input any y ∈ X {\displaystyle \textstyle y\in X} and outputs a real number E D [ y ] {\displaystyle \textstyle E_{D}[y]} which denotes the probability that of y {\displaystyle \textstyle y} according to D {\displaystyle \textstyle D} , i.e. E D [ y ] = Pr [ Y = y ] {\displaystyle \textstyle E_{D}[y]=\Pr[Y=y]} if Y ∼ D {\displaystyle \textstyle Y\sim D} . generator a generator G D {\displaystyle \textstyle G_{D}} for D {\displaystyle \textstyle D} takes as input a string of truly random bits y {\displaystyle \textstyle y} and outputs G D [ y ] ∈ X {\displaystyle \textstyle G_{D}[y]\in X} according to the distribution D {\displaystyle \textstyle D} . Generator can be interpreted as a routine that simulates sampling from the distribution D {\displaystyle \textstyle D} given a sequence of fair coin tosses. A distribution D {\displaystyle \textstyle D} is called to have a polynomial generator (respectively evaluator) if its generator (respectively evaluator) exists and can be computed in polynomial time. Let C X {\displaystyle \textstyle C_{X}} a class of distribution over X, that is C X {\displaystyle \textstyle C_{X}} is a set such that every D ∈ C X {\displaystyle \textstyle D\in C_{X}} is a probability distribution with support X {\displaystyle \textstyle X} . The C X {\displaystyle \textstyle C_{X}} can also be written as C {\displaystyle \textstyle C} for simplicity. In order to evaluate learnability, it is necessary to have a way to measure how well an approximated distribution D ′ {\displaystyle \textstyle D'} fits the sampled distribution D {\displaystyle \textstyle D} . There are several ways to measure the divergence between two distributions. Three common possibilities are Kullback–Leibler divergence Total variation distance of probability measures Kolmogorov distance Total variation and Kolmogorov distance are true metrics, while KL divergence is not (it lacks symmetry). These measures are ordered by convergence strength: closeness in KL divergence implies closeness in total variation (via Pinsker's inequality), which in turn implies closeness in Kolmogorov distance. Therefore, a learnability result proven under KL divergence automatically holds under the weaker measures, but not vice versa. Since certain measures may be more appropriate in specific applications, we will use d ( D , D ′ ) {\displaystyle \textstyle d(D,D')} to denote a selected divergence between the distribution D {\displaystyle \textstyle D} and the distribution D ′ {\displaystyle \textstyle D'} . The basic input that we use in order to learn a distribution is a number of samples drawn by this distribution. For the computational point of view the assumption is that such a sample is given in a constant amount of time. So it's like having access to an oracle G E N ( D ) {\displaystyle \textstyle GEN(D)} that returns a sample from the distribution D {\displaystyle \textstyle D} . Sometimes the interest is, apart from measuring the time complexity, to measure the number of samples that have to be used in order to learn a specific distribution D {\displaystyle \textstyle D} in class of distributions C {\displaystyle \textstyle C} . This quantity is called sample complexity of the learning algorithm. In order for the problem of distribution learning to be more clear consider the problem of supervised learning as defined in. In this framework of statistical learning theory a training set S = { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \textstyle S=\{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} and the goal is to find a target function f : X → Y {\displaystyle \textstyle f:X\rightarrow Y} that minimizes some loss function, e.g. the square loss function. More formally f = arg ⁡ min g ∫ V ( y , g ( x ) ) d ρ ( x , y ) {\displaystyle f=\arg \min _{g}\int V(y,g(x))d\rho (x,y)} , where V ( ⋅ , ⋅ ) {\displaystyle V(\cdot ,\cdot )} is the loss function, e.g. V ( y , z ) = ( y − z ) 2 {\displaystyle V(y,z)=(y-z)^{2}} and ρ ( x , y ) {\displaystyle \rho (x,y)} the probability distribution according to which the elements of the training set are sampled. If the conditional probability distribution ρ x ( y ) {\displaystyle \rho _{x}(y)} is known then the target function has the closed form f ( x ) = ∫ y y d ρ x ( y ) {\displaystyle f(x)=\int _{y}yd\rho _{x}(y)} . So the set S {\displaystyle S} is a set of samples from the probability distribution ρ ( x , y ) {\displaystyle \rho (x,y)} . Now the goal of distributional learning theory if to find ρ {\displaystyle \rho } given S {\displaystyle S} which can be used to find the target function f {\displaystyle f} . Definition of learnability A class of distributions C {\displaystyle \textstyle C} is called efficiently learnable if for every ϵ > 0 {\displaystyle \textstyle \epsilon >0} and 0 < δ ≤ 1 {\displaystyle \textstyle 0<\delta \leq 1} given access to G E N ( D ) {\displaystyle \textstyle GEN(D)} for an unknown distribution D ∈ C {\displaystyle \textstyle D\in C} , there exists a polynomial time algorithm A {\displaystyle \textstyle A} , called learning algorithm of C {\displaystyle \textstyle C} , that outputs a generator or an evaluator of a distribution D ′ {\displaystyle \textstyle D'} such that Pr [ d ( D , D ′ ) ≤ ϵ ] ≥ 1 − δ {\displaystyle \Pr[d(D,D')\leq \epsilon ]\geq 1-\delta } If we know that D ′ ∈ C {\displaystyle \textstyle D'\in C} then A {\displaystyle \textstyle A} is called proper learning algorithm, otherwise is called improper learning algorithm. In some settings the class of distributions C {\displaystyle \textstyle C} is a class with well known distributions which can be described by a set of parameters. For instance C {\displaystyle \textstyle C} could be the class of all the Gaussian distributions N ( μ , σ 2 ) {\displaystyle \textstyle N(\mu ,\sigma ^{2})} . In this case the algorithm A {\displaystyle \textstyle A} should be able to estimate the parameters μ , σ {\displaystyle \textstyle \mu ,\sigma } . In this case A {\displaystyle \textstyle A} is called parameter learning algorithm. Obviously the parameter learning for simple distributions is a very well studied field that is called statistical estimation and there is a very long bibliography on different estimators for different kinds of simple known distributions. But distributions learning theory deals with learning class of distributions that have more complicated description. == First results == In their seminal work, Kearns et al. deal with the case where A {\displaystyle \textstyle A} is described in term of a finite polynomial sized circuit and they proved the following for some specific classes of distribution. O R {\displaystyle \textstyle OR} gate distributions for this kind of distributions there is no polynomial-sized evaluator, unless # P ⊆ P / poly {\displaystyle \textstyle \#P\subseteq P/{\text{poly}}} . On the other hand, this class is efficiently learnable with generator. Parity gate distributions this class is efficiently learnable with both generator and evaluator. Mixtures of Hamming Balls this class is efficiently learnable with both generator and evaluator. Probabilistic Finite Automata this class is not efficiently learnable with evaluator under the Noisy Parity Assumption which is an impossibility assumption in the PAC learning fram

    Read more →
  • Multi-label classification

    Multi-label classification

    In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of several (greater than or equal to two) classes. In the multi-label problem the labels are nonexclusive and there is no constraint on how many of the classes the instance can be assigned to. The formulation of multi-label learning was first introduced by Shen et al. in the context of Semantic Scene Classification, and later gained popularity across various areas of machine learning. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y; that is, it assigns a value of 0 or 1 for each element (label) in y. == Problem transformation methods == Several problem transformation methods exist for multi-label classification, and can be roughly broken down into: === Transformation into binary classification problems === The baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. Although this method of dividing the task into multiple binary tasks may resemble superficially the one-vs.-all (OvA) and one-vs.-rest (OvR) methods for multiclass classification, it is essentially different from both, because a single classifier under binary relevance deals with a single label, without any regard to other labels whatsoever. A classifier chain is an alternative method for transforming a multi-label classification problem into several binary classification problems. It differs from binary relevance in that labels are predicted sequentially, and the output of all previous classifiers (i.e. positive or negative for a particular label) are input as features to subsequent classifiers. Classifier chains have been applied, for instance, in HIV drug resistance prediction. Bayesian network has also been applied to optimally order classifiers in Classifier chains. In case of transforming the problem to multiple binary classifications, the likelihood function reads L = ∏ i = 1 n ( ∏ k ( ∏ j k ( p k , j k ( x i ) δ y i , k , j k ) ) ) {\displaystyle L=\prod _{i=1}^{n}(\prod _{k}(\prod _{j_{k}}(p_{k,j_{k}}(x_{i})^{\delta _{y_{i,k},j_{k}}})))} where index i {\displaystyle i} runs over the samples, index k {\displaystyle k} runs over the labels, j k {\displaystyle j_{k}} indicates the binary outcomes 0 or 1, δ a , b {\displaystyle \delta _{a,b}} indicates the Kronecker delta, y i , k ∈ 0 , 1 {\displaystyle y_{i,k}\in {0,1}} indicates the multiple hot encoded labels of sample i {\displaystyle i} . === Transformation into multi-class classification problem === The label powerset (LP) transformation creates one binary classifier for every label combination present in the training set. For example, if possible labels for an example were A, B, and C, the label powerset representation of this problem is a multi-class classification problem with the classes [0 0 0], [1 0 0], [0 1 0], [0 0 1], [1 1 0], [1 0 1], [0 1 1], and [1 1 1] where for example [1 0 1] denotes an example where labels A and C are present and label B is absent. === Ensemble methods === A set of multi-class classifiers can be used to create a multi-label ensemble classifier. For a given example, each classifier outputs a single class (corresponding to a single label in the multi-label problem). These predictions are then combined by an ensemble method, usually a voting scheme where every class that receives a requisite percentage of votes from individual classifiers (often referred to as the discrimination threshold) is predicted as a present label in the multi-label output. However, more complex ensemble methods exist, such as committee machines. Another variation is the random k-labelsets (RAKEL) algorithm, which uses multiple LP classifiers, each trained on a random subset of the actual labels; label prediction is then carried out by a voting scheme. A set of multi-label classifiers can be used in a similar way to create a multi-label ensemble classifier. In this case, each classifier votes once for each label it predicts rather than for a single label. == Adapted algorithms == Some classification algorithms/models have been adapted to the multi-label task, without requiring problem transformations. Examples of these including for multi-label data are k-nearest neighbors: the ML-kNN algorithm extends the k-NN classifier to multi-label data. decision trees: "Clare" is an adapted C4.5 algorithm for multi-label classification; the modification involves the entropy calculations. MMC, MMDT, and SSC refined MMDT, can classify multi-labeled data based on multi-valued attributes without transforming the attributes into single-values. They are also named multi-valued and multi-labeled decision tree classification methods. kernel methods for vector output neural networks: BP-MLL is an adaptation of the popular back-propagation algorithm for multi-label learning. == Learning paradigms == Based on learning paradigms, the existing multi-label classification techniques can be classified into batch learning and online machine learning. Batch learning algorithms require all the data samples to be available beforehand. It trains the model using the entire training data and then predicts the test sample using the found relationship. The online learning algorithms, on the other hand, incrementally build their models in sequential iterations. In iteration t, an online algorithm receives a sample, xt and predicts its label(s) ŷt using the current model; the algorithm then receives yt, the true label(s) of xt and updates its model based on the sample-label pair: (xt, yt). == Multi-label stream classification == Data streams are possibly infinite sequences of data that continuously and rapidly grow over time. Multi-label stream classification (MLSC) is the version of multi-label classification task that takes place in data streams. It is sometimes also called online multi-label classification. The difficulties of multi-label classification (exponential number of possible label sets, capturing dependencies between labels) are combined with difficulties of data streams (time and memory constraints, addressing infinite stream with finite means, concept drifts). Many MLSC methods resort to ensemble methods in order to increase their predictive performance and deal with concept drifts. Below are the most widely used ensemble methods in the literature: Online Bagging (OzaBagging)-based methods: Observing the probability of having K many of a certain data point in a bootstrap sample is approximately Poisson(1) for big datasets, each incoming data instance in a data stream can be weighted proportional to Poisson(1) distribution to mimic bootstrapping in an online setting. This is called Online Bagging (OzaBagging). Many multi-label methods that use Online Bagging are proposed in the literature, each of which utilizes different problem transformation methods. EBR, ECC, EPS, EBRT, EBMT, ML-Random Rules are examples of such methods. ADWIN Bagging-based methods: Online Bagging methods for MLSC are sometimes combined with explicit concept drift detection mechanisms such as ADWIN (Adaptive Window). ADWIN keeps a variable-sized window to detect changes in the distribution of the data, and improves the ensemble by resetting the components that perform poorly when there is a drift in the incoming data. Generally, the letter 'a' is used as a subscript in the name of such ensembles to indicate the usage of ADWIN change detector. EaBR, EaCC, EaHTPS are examples of such multi-label ensembles. GOOWE-ML-based methods: Interpreting the relevance scores of each component of the ensemble as vectors in the label space and solving a least squares problem at the end of each batch, Geometrically-Optimum Online-Weighted Ensemble for Multi-label Classification (GOOWE-ML) is proposed. The ensemble tries to minimize the distance between the weighted prediction of its components and the ground truth vector for each instance over a batch. Unlike Online Bagging and ADWIN Bagging, GOOWE-ML utilizes a weighted voting scheme where better performing components of the ensemble are given more weight. The GOOWE-ML ensemble grows over time, and the lowest weight component is replaced by a new component when it is full at the end of a batch. GOBR, GOCC, GOPS, GORT are the proposed GOOWE-ML-based multi-label ensembles. Multiple Windows : Here, BR models that use a sliding window are replaced with two windows for each label, one for relevant and one for non-relevant examples. Instances are oversampled or undersampled according to a load factor that is kept

    Read more →
  • Tresorit

    Tresorit

    Tresorit is a Swiss company providing end-to-end encrypted cloud storage and secure content collaboration services. Founded in 2011, the company primarily serves businesses and organizations with elevated data protection and compliance requirements. Since 2021, Tresorit has been part of Swiss Post's digital business services, which, under the name 'Swiss Post Digital' offer secure communication platforms and connectable software solutions for SMEs, public authorities, and the healthcare sector, among others. == History == Tresorit was founded in 2011 by Hungarian software developers Istvan Lam, Szilveszter Szebeni and Gyorgy Szilagyi with the aim of providing a secure alternative to traditional cloud storage solutions. The company developed a cloud collaboration platform based on client-side end-to-end encryption and a zero-knowledge architecture. In its early years, Tresorit gained attention through a public security challenge inviting researchers to attempt to compromise its encryption system. The initiative received coverage in technology and cybersecurity media. The company initially positioned itself as a secure alternative to conventional cloud storage services and gradually expanded its offering toward enterprise-focused collaboration tools. In 2021, Swiss Post Communications Services acquired a majority stake in Tresorit. The company is now part of Swiss Post, and continues to operate independently within Swiss Post’s digital division, while benefiting from the broader infrastructure and institutional framework of its parent organization. Tresorit has offices in Zurich, Munich, and Budapest. == Products and Services == Tresorit provides a cloud-based platform for secure file storage and collaboration. Its services include encrypted file sharing, email encryption, electronic signatures, and encrypted data rooms for managing sensitive documents and workflows. The platform is available on Windows, macOS, Linux, Android, and iOS. == Technology == Tresorit uses client-side end-to-end encryption based on a zero-knowledge model. Files are encrypted on the user’s device before being uploaded to company servers. According to the company, encryption keys remain under user control, meaning that Tresorit and third parties cannot access the content of stored files. == Security challenge == Between 2013 and 2014, Tresorit organized a public challenge inviting security researchers to attempt to compromise the service's encryption implementation. The challenge received coverage in technology and cybersecurity media. == Acquisition by Swiss Post == In 2021, Swiss Post Communications Services acquired a majority stake in Tresorit as part of Swiss Post’s broader digital services strategy. The company is now part of Swiss Post. == Reception == Tresorit has been covered by international technology and business publications in the context of secure cloud storage and encrypted collaboration services. TechCrunch described the company as an early European provider of end-to-end encrypted cloud services, while The New York Times included it in discussions of secure file-sharing tools. Other publications such as TechRadar and ITPro have reviewed Tresorit in the context of enterprise security and confidential data handling.

    Read more →
  • Dendrogram

    Dendrogram

    A dendrogram is a diagram representing a tree graph. This diagrammatic representation is frequently used in different contexts: in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. in computational biology, it shows the clustering of genes or samples, sometimes in the margins of heatmaps. in phylogenetics, it displays the evolutionary relationships among various biological taxa. In this case, the dendrogram is also called a phylogenetic tree. The name dendrogram derives from the two ancient greek words δένδρον (déndron), meaning "tree", and γράμμα (grámma), meaning "drawing, mathematical figure". == Clustering example == For a clustering example, suppose that five taxa ( a {\displaystyle a} to e {\displaystyle e} ) have been clustered by UPGMA based on a matrix of genetic distances. The hierarchical clustering dendrogram would show a column of five nodes representing the initial data (here individual taxa), and the remaining nodes represent the clusters to which the data belong, with the arrows representing the distance (dissimilarity). The distance between merged clusters is monotone, increasing with the level of the merger: the height of each node in the plot is proportional to the value of the intergroup dissimilarity between its two daughters (the nodes on the right representing individual observations all plotted at zero height).

    Read more →
  • Conference on Computer Vision and Pattern Recognition

    Conference on Computer Vision and Pattern Recognition

    The Conference on Computer Vision and Pattern Recognition is an annual conference on computer vision and pattern recognition. == Affiliations == The conference was first held in 1983 in Washington, DC, organized by Takeo Kanade and Dana H. Ballard. From 1985 to 2010 it was sponsored by the IEEE Computer Society. In 2011 it was also co-sponsored by University of Colorado Colorado Springs. Since 2012 it has been co-sponsored by the IEEE Computer Society and the Computer Vision Foundation, which provides open access to the conference papers. == Scope == The conference considers a wide range of topics related to computer vision and pattern recognition—basically any topic that is extracting structures or answers from images or video or applying mathematical methods to data to extract or recognize patterns. Common topics include object recognition, image segmentation, motion estimation, 3D reconstruction, and deep learning. The conference generally has less than 30% acceptance rates for all papers and less than 5% for oral presentations. It is managed by a rotating group of volunteers who are chosen in a public election at the Pattern Analysis and Machine Intelligence-Technical Community (PAMI-TC) meeting four years before the meeting. The conference uses a multi-tier double-blind peer review process. The program chairs, who cannot submit papers, select area chairs who manage the reviewers for their subset of submissions. == Location and time == The conference is usually held in June in North America. == Awards == === Best Paper Award === These awards are picked by committees delegated by the program chairs of the conference. === Longuet-Higgins Prize === The Longuet-Higgins Prize recognizes papers from ten years ago that have made a significant impact on computer vision research. === PAMI Young Researcher Award === The Pattern Analysis and Machine Intelligence Young Researcher Award is an award given by the Technical Committee on Pattern Analysis and Machine Intelligence of the IEEE Computer Society to a researcher within 7 years of completing their Ph.D. for outstanding early career research contributions. Candidates are nominated by the computer vision community, with winners selected by a committee of senior researchers in the field. This award was originally instituted in 2012 by the journal Image and Vision Computing, also presented at the conference, and the journal continues to sponsor the award. === PAMI Thomas S. Huang Memorial Prize === The Thomas Huang Memorial Prize was established at the 2020 conference and is awarded annually starting from 2021 to honor researchers who are recognized as examples in research, teaching/mentoring, and service to the computer vision community.

    Read more →