AI Detector Yang Dipakai Dosen

AI Detector Yang Dipakai Dosen — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Stencil buffer

    Stencil buffer

    A stencil buffer is an extra data buffer, in addition to the color buffer and Z-buffer, found on modern graphics hardware. The buffer is per pixel and works on integer values, usually with a depth of one byte per pixel. The Z-buffer and stencil buffer often share the same area in the RAM of the graphics hardware. In the simplest case, the stencil buffer is used to limit the area of rendering (stenciling). More advanced usage of the stencil buffer makes use of the strong connection between the Z-buffer and the stencil buffer in the rendering pipeline. For example, stencil values can be automatically increased/decreased for every pixel that fails or passes the depth test. The simple combination of depth test and stencil modifiers make a vast number of effects possible (such as stencil shadow volumes, Two-Sided Stencil, compositing, decaling, dissolves, fades, swipes, silhouettes, outline drawing, or highlighting of intersections between complex primitives) though they often require several rendering passes and, therefore, can put a heavy load on the graphics hardware. The most typical application is still to add shadows to 3D applications. It is also used for planar reflections. Other rendering techniques, such as portal rendering, use the stencil buffer in other ways; for example, it can be used to find the area of the screen obscured by a portal and re-render those pixels correctly. The stencil buffer and its modifiers can be accessed in computer graphics by using APIs like OpenGL, Direct3D, Vulkan or Metal. == Architecture == The stencil buffer typically shares the same memory space as the Z-buffer, and typically the ratio is 24 bits for Z-buffer + 8 bits for stencil buffer or, in the past, 15 bits for Z-buffer + 1 bit for stencil buffer. Another variant is 4 + 24, where 28 of the 32 bits are used and 4 ignored. Stencil and Z-buffers are part of the frame buffer, coupled to the color buffer. The first chip available to a wider market was 3Dlabs' Permedia II, which supported a one-bit stencil buffer. The bits allocated to the stencil buffer can be used to represent numerical values in the range [0, 2n-1], and also as a Boolean matrix (n is the number of allocated bits), each of which may be used to control the particular part of the scene. Any combination of these two ways of using the available memory is also possible. == Stencil test == Stencil test or stenciling is among the operations on the pixels/fragments (Per-pixel operations), located after the alpha test, and before the depth test. The stencil test ensures undesired pixels do not reach the depth test. This saves processing time for the scene. Similarly, the alpha test can prevent corresponding pixels to reach the stencil test. The test itself is carried out over the stencil buffer to some value in it, or altered or used it, and carried out through the so-called stencil function and stencil operations. The stencil function is a function by which the stencil value of a certain pixel is compared to a given reference value. If this comparison is logically true, the stencil test passes. Otherwise not. In doing so, the possible reaction caused by the result of comparing three different state-depth and stencil buffer: Stencil test is not passed Stencil test is passed but not the depth test Both tests are passed (or stencil test is passed, and the depth is not enabled) For each of these cases, different operations can be set over the examined pixel. In the OpenGL stencil functions, the reference value and mask, respectively, define the function glStencilFunc. In Direct3D each of these components is adjusted individually using methods SetRenderState devices currently in control. This method expects two parameters, the first of which is a condition that is set and the other its value. In the order that was used above, these conditions are called D3DRS_STENCILFUNC, D3DRS_STENCILREF, and D3DRS_STENCILMASK. Stencil operations in OpenGL adjust glStencilOp function that expects three values. In Direct3D, again, each state sets a specific method SetRenderState. The three states that can be assigned to surgery are called D3DRS_STENCILFAIL, D3DRENDERSTATE_STENCILZFAIL, and D3DRENDERSTATE_STENCILPASS. == Z-fighting == Due to the lack of precision in the Z-buffer, coplanar polygons that are short-range, or overlapping, can be portrayed as a single plane with a multitude of irregular cross-sections. These sections can vary depending on the camera position and other parameters and are rapidly changing. This is called Z-fighting. There exist multiple solutions to this issue: - Bring the far plane closer to restrict the scene's depth, thus increasing the accuracy of the Z-buffer, or reducing the distance at which objects are visible in the scene. - Increase the number of bits allocated to the Z-buffer, which is possible at the expense of memory for the stencil buffer. - Move polygons farther apart from one another, which restricts the possibilities for the artist to create an elaborate scene. All of these approaches to the problem can only reduce the likelihood that the polygons will experience Z-fighting, and do not guarantee a definitive solution in the general case. A solution that includes the stencil buffer is based on the knowledge of which polygon should be in front of the others. The silhouette of the front polygon is drawn into the stencil buffer. After that, the rest of the scene can be rendered only where the silhouette is negative, and so will not clash with the front polygon. == Shadow volume == Shadow volume is a technique used in 3D computer graphics to add shadows to a rendered scene. They were first proposed by Frank Crow in 1977 as the geometry describing the 3D shape of the region occluded from a light source. A shadow volume divides the virtual world in two: areas that are in shadow and areas that are not. The stencil buffer implementation of shadow volumes is generally considered among the most practical general-purpose real-time shadowing techniques for use on modern 3D graphics hardware. It has been popularised by the video game Doom 3, and a particular variation of the technique used in this game has become known as Carmack's Reverse. == Reflections == Reflection of a scene is drawn as the scene itself transformed and reflected relative to the "mirror" plane, which requires multiple render passes and using of stencil buffer to restrict areas where the current render pass works: Draw the scene excluding mirror areas – for each mirror lock the Z-buffer and color buffer Render visible part of the mirror Depth test is set up so that each pixel is passed to enter the maximum value and always passes for each mirror: Depth test is set so that it passes only if the distance of a pixel is less than the current (default behavior) The matrix transformation is changed to reflect the scene relative to the mirror plane Unlock the Z-buffer and color buffer Draw the scene, but only the part of it that lies between the mirror plane and the camera. In other words, a mirror plane is also a clipping plane Again locks color buffer, depth test is set so that it always passes, reset stencil for the next mirror. == Planar Shadows == While drawing a plane of shadows, there are two dominant problems: The first concerns the problem of deep struggle in case the flat geometry is not awarded on the part covered with the shadow of shadows and outside. See the section that relates to this. Another problem relates to the extent of the shadows outside the area where the plane there. Another problem, which may or may not appear, depending on the technique, the design of more polygons in one part of the shadow, resulting in darker and lighter parts of the same shade. All three problems can be solved geometrically, but because of the possibility that hardware acceleration is directly used, it is a far more elegant implementation using the stencil buffer: 1. Enable lights and the lights 2. Draw a scene without any polygon that should be projected shadows 3. Draw all polygons which should be projected shadows, but without lights. In doing so, the stencil buffer, the pixel of each polygon to be assigned to a specific value for the ground to which they belong. The distance between these values should be at least two, because for each plane to be used two values for two states: in the shadows and bright. 4. Disable any global illumination (to ensure that the next steps will affect only individual selected light) For each plane: For each light: 1. Edit a stencil buffer and only the pixels that carry a specific value for the selected level. Increase the value of all the pixels that are projected objects between the date of a given level and bright. 2. Allow only selected light for him to draw level at which part of her specific value was not changed. == Spatial shadows == Stencil buffer implementation of spatial drawing shadows is any shadow of a geometric body that its volume includes part of the scene that is

    Read more →
  • Genotypic and phenotypic repair

    Genotypic and phenotypic repair

    Genotypic and phenotypic repair are optional components of an evolutionary algorithm (EA). An EA reproduces essential elements of biological evolution as a computer algorithm in order to solve demanding optimization or planning tasks, at least approximately. A candidate solution is represented by a - usually linear - data structure that plays the role of an individual's chromosome. New solution candidates are generated by mutation and crossover operators following the example of biology. These offspring may be defective, which is corrected or compensated for by genotypic or phenotypic repair. == Description == Genotypic repair, also known as genetic repair, is the removal or correction of impermissible entries in the chromosome that violate restrictions. In phenotypic repair, the corrections are only made in the genotype-phenotype mapping and the chromosome remains unchanged. Michalewicz wrote about the importance of restrictions in real-world applications: "In general, constraints are an integral part of the formulation of any problem". Restriction violations are application-specific and therefore it depends on the current problem whether and which type of repair is useful. They can usually also be treated by a correspondingly extended evaluation and it depends on the problem which measures are possible and which is the most suitable. If a phenotypic repair is feasible, then it is usually the most efficient compared to the other measures. A survey on repair methods used as constraint handling techniques can be found in. Violations of the range limits of genes should be avoided as far as possible by the formulation of the genome. If this is not possible or if restrictions within the search space defined by the genome are involved, their violations are usually handled by the evaluation. This can be done, for example, by penalty functions that lower the fitness. Repair is often also required for combinatorial tasks. The application of a 1- or n-point crossover operator can, for example, lead to genes being missing in one of the child genomes that are present in duplicate in the other. In this case, a suitable genotypic repair measure is to move the surplus genes to the other genome in a positional manner. The use of the aforementioned operators in combinatorial tasks has also proven to be useful in combination with crossover types specially developed for permutations, at least for certain problems. Particularly in combinatorial problems, it has been observed that genotypic repair can promote premature convergence to a suboptimum, but can also significantly accelerate a successful search. Studies on various tasks have shown that this is application-dependent. An effective measure to avoid premature convergence is generally the use of structured populations instead of the usual panmictic ones. Sequence restrictions play a role in many scheduling tasks, for example when it comes to planning workflows. If, for example, it is specified that step A must be carried out before step B and the gene of step B is located before the gene of A in the chromosome, then there is an impermissible gene sequence. This is because the scheduling operation of step B requires the planned end of step A for correct scheduling, but this is not yet scheduled at the time gene B is processed. The problem can be solved in two ways: The scheduling operation of step B is postponed until the gene from step A has been processed. The genome remains unchanged and the repair only influences the genotype-phenotype mapping. Since only the phenotype is changed, this is referred to as phenotypic repair. If, on the other hand, the gene of step B is moved behind the gene of step A, this is a genotypic repair. The same applies to the alternative shift of gene A in front of gene B. In this case, genotypic repair has the disadvantage that it prevents a meaningful restructuring of the gene sequence in the chromosome if this requires several intermediate steps (mutations) that at least partially violate restrictions.

    Read more →
  • Junction tree algorithm

    Junction tree algorithm

    The junction tree algorithm (also known as 'Clique Tree') is a method used in machine learning to extract marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The graph is called a tree because it branches into different sections of data; nodes of variables are the branches. The basic premise is to eliminate cycles by clustering them into single nodes. Multiple extensive classes of queries can be compiled at the same time into larger structures of data. There are different algorithms to meet specific needs and for what needs to be calculated. Inference algorithms gather new developments in the data and calculate it based on the new information provided. == Junction tree algorithm == === Hugin algorithm === If the graph is directed then moralize it to make it un-directed. Introduce the evidence. Triangulate the graph to make it chordal. Construct a junction tree from the triangulated graph (we will call the vertices of the junction tree "supernodes"). Propagate the probabilities along the junction tree (via belief propagation) Note that this last step is inefficient for graphs of large treewidth. Computing the messages to pass between supernodes involves doing exact marginalization over the variables in both supernodes. Performing this algorithm for a graph with treewidth k will thus have at least one computation which takes time exponential in k. It is a message passing algorithm. The Hugin algorithm takes fewer computations to find a solution compared to Shafer-Shenoy. === Shafer-Shenoy algorithm === Computed recursively Multiple recursions of the Shafer-Shenoy algorithm results in Hugin algorithm Found by the message passing equation Separator potentials are not stored The Shafer-Shenoy algorithm is the sum product of a junction tree. It is used because it runs programs and queries more efficiently than the Hugin algorithm. The algorithm makes calculations for conditionals for belief functions possible. Joint distributions are needed to make local computations happen. === Underlying theory === The first step concerns only Bayesian networks, and is a procedure to turn a directed graph into an undirected one. We do this because it allows for the universal applicability of the algorithm, regardless of direction. The second step is setting variables to their observed value. This is usually needed when we want to calculate conditional probabilities, so we fix the value of the random variables we condition on. Those variables are also said to be clamped to their particular value. The third step is to ensure that graphs are made chordal if they aren't already chordal. This is the first essential step of the algorithm. It makes use of the following theorem: Theorem: For an undirected graph, G, the following properties are equivalent: Graph G is triangulated. The clique graph of G has a junction tree. There is an elimination ordering for G that does not lead to any added edges. Thus, by triangulating a graph, we make sure that the corresponding junction tree exists. A usual way to do this, is to decide an elimination order for its nodes, and then run the Variable elimination algorithm. The variable elimination algorithm states that the algorithm must be run each time there is a different query. This will result to adding more edges to the initial graph, in such a way that the output will be a chordal graph. All chordal graphs have a junction tree. The next step is to construct the junction tree. To do so, we use the graph from the previous step, and form its corresponding clique graph. Now the next theorem gives us a way to find a junction tree: Theorem: Given a triangulated graph, weight the edges of the clique graph by their cardinality, |A∩B|, of the intersection of the adjacent cliques A and B. Then any maximum-weight spanning tree of the clique graph is a junction tree. So, to construct a junction tree we just have to extract a maximum weight spanning tree out of the clique graph. This can be efficiently done by, for example, modifying Kruskal's algorithm. The last step is to apply belief propagation to the obtained junction tree. Usage: A junction tree graph is used to visualize the probabilities of the problem. The tree can become a binary tree to form the actual building of the tree. A specific use could be found in auto encoders, which combine the graph and a passing network on a large scale automatically. === Inference Algorithms === Loopy belief propagation: A different method of interpreting complex graphs. The loopy belief propagation is used when an approximate solution is needed instead of the exact solution. It is an approximate inference. Cutset conditioning: Used with smaller sets of variables. Cutset conditioning allows for simpler graphs that are easier to read but are not exact.

    Read more →
  • Latent class model

    Latent class model

    In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.

    Read more →
  • GeoNetwork opensource

    GeoNetwork opensource

    The GeoNetwork opensource (GNOS) project is a free and open source (FOSS) cataloging application for spatially referenced resources. It is a catalog of location-oriented information. == Outline == It is a standardized and decentralized spatial information management environment designed to enable access to geo-referenced databases, cartographic products and related metadata from a variety of sources, enhancing the spatial information exchange and sharing between organizations and their audience, using the capacities of the internet. Using the Z39.50 protocol it both accesses remote catalogs and makes its data available to other catalog services. As of 2007, OGC Web Catalog Service are being implemented. Maps, including those derived from satellite imagery, are effective communicational tools and play an important role in the work of decision makers (e.g., sustainable development planners and humanitarian and emergency managers) in need of quick, reliable and up-to-date user-friendly cartographic products as a basis for action and to better plan and monitor their activities; GIS experts in need of exchanging consistent and updated geographical data; and spatial analysts in need of multidisciplinary data to perform preliminary geographical analysis and make reliable forecasts. == Deployment == The software has been deployed to various organizations, the first being FAO GeoNetwork and WFP VAM-SIE-GeoNetwork, both at their headquarters in Rome, Italy. Furthermore, the WHO, CGIAR, BRGM, ESA, FGDC and the Global Change Information and Research Centre (GCIRC) of China are working on GeoNetwork opensource implementations as their spatial information management capacity. It is used for several risk information systems, in particular in the Gambia. Several related tools are packaged with GeoNetwork, including GeoServer. GeoServer stores geographical data, while GeoNetwork catalogs collections of such data.

    Read more →
  • Semantic mapping (statistics)

    Semantic mapping (statistics)

    Semantic mapping (SM) is a statistical method for dimensionality reduction (the transformation of data from a high-dimensional space into a low-dimensional space). SM can be used in a set of multidimensional vectors of features to extract a few new features that preserves the main data characteristics. SM performs dimensionality reduction by clustering the original features in semantic clusters and combining features mapped in the same cluster to generate an extracted feature. Given a data set, this method constructs a projection matrix that can be used to map a data element from a high-dimensional space into a reduced dimensional space. SM can be applied in construction of text mining and information retrieval systems, as well as systems managing vectors of high dimensionality. SM is an alternative to random mapping, principal components analysis and latent semantic indexing methods.

    Read more →
  • Amazon Rekognition

    Amazon Rekognition

    Amazon Rekognition is a cloud-based software as a service (SaaS) computer vision platform that was launched in 2016. It has been sold to, and used by, a number of United States government agencies, including U.S. Immigration and Customs Enforcement (ICE) and Orlando, Florida police, as well as private entities. == Capabilities == Rekognition provides a number of computer vision capabilities, which can be divided into two categories: Algorithms that are pre-trained on data collected by Amazon or its partners, and algorithms that a user can train on a custom dataset. As of July 2019, Rekognition provides the following computer vision capabilities. === Pre-trained algorithms === Celebrity recognition in images Facial attribute detection in images, including gender, age range, emotions (e.g. happy, calm, disgusted), whether the face has a beard or mustache, whether the face has eyeglasses or sunglasses, whether the eyes are open, whether the mouth is open, whether the person is smiling, and the location of several markers such as the pupils and jaw line. People Pathing enables tracking of people through a video. An advertised use-case of this capability is to track sports players for post-game analysis. Text detection and classification in images Unsafe visual content detection === Algorithms that a user can train on a custom dataset === SearchFaces enables users to import a database of images with pre-labeled faces, to train a machine learning model on this database, and to expose the model as a cloud service with an API. Then, the user can post new images to the API and receive information about the faces in the image. The API can be used to expose a number of capabilities, including identifying faces of known people, comparing faces, and finding similar faces in a database. Face-based user verification == History and use == === 2017 === In late 2017, the Washington County, Oregon Sheriff's Office began using Rekognition to identify suspects' faces. Rekognition was marketed as a general-purpose computer vision tool, and an engineer working for Washington County decided to use the tool for facial analysis of suspects. Rekognition was offered to the department for free, and Washington County became the first US law enforcement agency known to use Rekognition. In 2018, the agency logged over 1,000 facial searches. The county, according to the Washington Post, by 2019 was paying about $7 a month for all of its searches. The relationship was unknown to the public until May 2018. In 2018, Rekognition was also used to help identify celebrities during a royal wedding telecast. === 2018 === In April 2018, it was reported that FamilySearch was using Rekognition to enable their users to "see which of their ancestors they most resemble based on family photographs". In early 2018, the FBI also began using it as a pilot program for analyzing video surveillance. In May 2018, it was reported by the ACLU that Orlando, Florida was running a pilot using Rekognition for facial analysis in law enforcement, with that pilot ending in July 2019. After the report, on June 22, 2018, Gizmodo reported that Amazon workers had written a letter to CEO Jeff Bezos requesting he cease selling Rekognition to US law enforcement, particularly ICE and Homeland Security. A letter was also sent to Bezos by the ACLU. On June 26, 2018, it was reported that the Orlando police force had ceased using Rekognition after their trial contract expired, reserving the right to use it in the future. The Orlando Police Department said that they had "never gotten to the point to test images" due to old infrastructure and low bandwidth. In July 2018, the ACLU released a test showing that Rekognition had falsely matched 28 members of Congress with mugshot photos, particularly Congresspeople of color. 25 House members afterwards sent a letter to Bezos, expressing concern about Rekognition. Amazon responded saying the Rekognition test had generated 80 percent confidence, while it recommended law enforcement only use matches rated at 99 percent confidence. The Washington Post states that Oregon instead has officers pick a "best of five" result, instead of adhering to the recommendation. In September 2018, it was reported that Mapillary was using Rekognition to read the text on parking signs (e.g. no stopping, no parking, or specific parking hours) in cities. In October 2018, it was reported that Amazon had earlier that year pitched Rekognition to U.S. Immigration and Customs Enforcement agency. Amazon defended government use of Rekognition. On December 1, 2018, it was reported that 8 Democratic lawmakers had said in a letter that Amazon had "failed to provide sufficient answers" about Rekognition, writing that they had "serious concerns that this type of product has significant accuracy issues, places disproportionate burdens on communities of color, and could stifle Americans' willingness to exercise their First Amendment rights in public." === 2019 === In January 2019, MIT researchers published a peer-reviewed study asserting that Rekognition had more difficulty in identifying dark-skinned females than competitors such as IBM and Microsoft. In the study, Rekognition misidentified darker-skinned women as men 31% of the time, but made no mistakes for light-skinned men. Amazon called the report "misinterpreted results" of the research with an improper "default confidence threshold." In January 2019, Amazon's shareholders "urged Amazon to stop selling Rekognition software to law enforcement agencies." Amazon in response defended its use of Rekognition, but supported new federal oversight and guidelines to "make sure facial recognition technology cannot be used to discriminate." In February 2019, it was reported that Amazon was collaborating with the National Institute of Standards and Technology (NIST) on developing standardized tests to improve accuracy and remove bias with facial recognition. In March 2019, an open letter regarding Rekognition was sent by a group of prominent AI researchers to Amazon, criticizing its sale to law enforcement with around 50 signatures. In April 2019, Amazon was told by the Securities and Exchange Commission that they had to vote on two shareholder proposals seeking to limit Rekognition. Amazon argued that the proposals were an "insignificant public policy issue for the Company" not related to Amazon's ordinary business, but their appeal was denied. The vote was set for May. The first proposal was tabled by shareholders. On May 24, 2019, 2.4% of shareholders voted to stop selling Rekognition to government agencies, while a second proposal calling for a study into Rekognition and civil rights had 27.5% support. In August 2019, the ACLU again used Rekognition on members of government, with 26 of 120 lawmakers in California flagged as matches to mugshots. Amazon stated the ACLU was "misusing" the software in the tests, by not dismissing results that did not meet Amazon's recommended accuracy threshold of 99%. By August 2019, there had been protests against ICE's use of Rekognition to surveil immigrants. In March 2019, Amazon announced a Rekognition update that would improve emotional detection, and in August 2019, "fear" was added to emotions that Rekognition could detect. === 2020 === In June 2020, Amazon announced it was implementing a one-year moratorium on police use of Rekognition, in response to the George Floyd protests. === 2024 === The Department of Justice disclosed that the FBI is initiating the use of Amazon Rekognition. The DOJ's AI inventory revealed the FBI's "Project Tyr" aims to customize Rekognition to identify nudity, weapons, explosives, and other information from lawfully acquired media. === 2025 === In late 2025, the New York Times reported that scientist, Dr. Jürgen Matthäus, retired from as the head of research at the U.S. Holocaust Memorial Museum in Washington, D.C., used Amazon Rekognition to identify the shooter in the Holocaust photograph known as The Last Jew in Vinnitsa "with more than 99 percent certainty" — as Jakobus Onnen (1906–1943), a teacher from Tichelwarf near Weener in East Frisia who had been a member of the SS since 1934 and was later killed in action near Zhitomir in 1943. The photographer and victim remain unidentified. == Controversy regarding facial analysis == === Racial and gender bias === In 2018, MIT researchers Joy Buolamwini and Timnit Gebru published a study called Gender Shades. In this study, a set of images was collected, and faces in the images were labeled with face position, gender, and skin tone information. The images were run through SaaS facial recognition platforms from Face++, IBM, and Microsoft. In all three of these platforms, the classifiers performed best on male faces (with error rates on female faces being 8.1% to 20.6% higher than error rates on male faces), and they performed worst on dark female faces (with error rates ranging from 20.8% to 30.4%). The authors hypothesized that this discr

    Read more →
  • Random projection

    Random projection

    In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. They have been applied to many natural language tasks under the name random indexing. == Dimensionality reduction == Dimensionality reduction, as the name suggests, is reducing the number of random variables using various mathematical methods from statistics and machine learning. Dimensionality reduction is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions. For this purpose there are various related techniques, including: principal component analysis, linear discriminant analysis, canonical correlation analysis, discrete cosine transform, random projection, etc. Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset. == Method == The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability. In random projection, the original d {\displaystyle d} -dimensional data is projected to a k {\displaystyle k} -dimensional subspace, by multiplying on the left by a random matrix R ∈ R k × d {\displaystyle R\in \mathbb {R} ^{k\times d}} . Using matrix notation: If X d × N {\displaystyle X_{d\times N}} is the original set of N d-dimensional observations, then X k × N R P = R k × d X d × N {\displaystyle X_{k\times N}^{RP}=R_{k\times d}X_{d\times N}} is the projection of the data onto a lower k-dimensional subspace. Random projection is computationally simple: form the random matrix "R" and project the d × N {\displaystyle d\times N} data matrix X onto K dimensions of order O ( d k N ) {\displaystyle O(dkN)} . If the data matrix X is sparse with about c nonzero entries per column, then the complexity of this operation is of order O ( c k N ) {\displaystyle O(ckN)} . === Orthogonal random projection === A unit vector can be orthogonally projected to a random subspace. Let u {\displaystyle u} be the original unit vector, and let v {\displaystyle v} be its projection. The norm-squared ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first k {\displaystyle k} coordinates. This is equivalent to sampling a random point in the multivariate gaussian distribution x ∼ N ( 0 , I d × d ) {\displaystyle x\sim {\mathcal {N}}(0,I_{d\times d})} , then normalizing it. Therefore, ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as ∑ i = 1 k x i 2 ∑ i = 1 k x i 2 + ∑ i = k + 1 d x i 2 {\displaystyle {\frac {\sum _{i=1}^{k}x_{i}^{2}}{\sum _{i=1}^{k}x_{i}^{2}+\sum _{i=k+1}^{d}x_{i}^{2}}}} , which by the chi-squared construction of the Beta distribution, has distribution Beta ⁡ ( k / 2 , ( d − k ) / 2 ) {\displaystyle \operatorname {Beta} (k/2,(d-k)/2)} , with mean k / d {\displaystyle k/d} . We have a concentration inequality P r [ | ‖ v ‖ 2 − k d | ≥ ϵ k d ] ≤ 3 exp ⁡ ( − k ϵ 2 / 64 ) {\displaystyle Pr\left[\left|\|v\|_{2}-{\frac {k}{d}}\right|\geq \epsilon {\sqrt {\frac {k}{d}}}\right]\leq 3\exp \left(-k\epsilon ^{2}/64\right)} for any ϵ ∈ ( 0 , 1 ) {\displaystyle \epsilon \in (0,1)} . === Gaussian random projection === The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from S d − 1 {\displaystyle S^{d-1}} . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, and the following properties are satisfied: Spherical symmetry: For any orthogonal matrix A ∈ O ( d ) {\displaystyle A\in O(d)} , RA and R have the same distribution. Orthogonality: The rows of R are orthogonal to each other. Normality: The rows of R are unit-length vectors. === More computationally efficient random projections === Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to R i , j = 3 / k × { + 1 with probability 1 6 0 with probability 2 3 − 1 with probability 1 6 {\displaystyle R_{i,j}={\sqrt {3/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{6}}\\0&{\text{with probability }}{\frac {2}{3}}\\-1&{\text{with probability }}{\frac {1}{6}}\end{cases}}} or the full matrix can be sampled IID according to R i , j = 1 / k × { + 1 with probability 1 2 − 1 with probability 1 2 {\displaystyle R_{i,j}={\sqrt {1/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{2}}\\-1&{\text{with probability }}{\frac {1}{2}}\end{cases}}} Both are efficient for database applications because the computations can be performed using integer arithmetic. More related study is conducted in. It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzeroes per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to lower dimension even faster. === Random Projection with Quantization === Random projection can be further condensed by quantization (discretization), with 1-bit (sign random projection) or multi-bits. It is the building block of SimHash, RP tree, and other memory efficient estimation and learning methods. == Large quasiorthogonal bases == The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped in a space of much lower (but still high) dimension n with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space. There are exponentially large (in dimension n) sets of almost orthogonal vectors (with small value of inner products) in n–dimensional Euclidean space. This observation is useful in indexing of high-dimensional data. Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one. This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than dimension of the data space. == Implementations == RandPro - An R package for random projection sklearn.random_projection - A module for random projection from the scikit-learn Python library Weka implementation [1]

    Read more →
  • Medical data breach

    Medical data breach

    Medical data, including patients' identity information, health status, disease diagnosis and treatment, and biogenetic information, not only involve patients' privacy but also have a special sensitivity and important value, which may bring physical and mental distress and property loss to patients and even negatively affect social stability and national security once leaked. However, the development and application of medical AI must rely on a large amount of medical data for algorithm training, and the larger and more diverse the amount of data, the more accurate the results of its analysis and prediction will be. However, the application of big data technologies such as data collection, analysis and processing, cloud storage, and information sharing has increased the risk of data leakage. In the United States, the rate of such breaches has increased over time, with 176 million records breached by the end of 2017. By 2024, the U.S. Department of Health and Human Services reported 725 large healthcare data breaches affecting approximately 275 million individual records in a single year, marking a significant escalation in both the frequency and scale of incidents. == Black market for health data == In February 2015 an NPR report claimed that organized crime networks had ways of selling health data in the black market. In 2015 a Beazley employee estimated that medical records could sell on the black market for US$40-50. == How data is lost == Theft, data loss, hacking, and unauthorized account access are ways in which medical data breaches happen. Among reported breaches of medical information in the United States networked information systems accounted for the largest number of records breached. There are many data breaches happening in the US health care system, among business associates of the health care providers that continuously gain access to patients' data. == List of data breaches == In February 2024, a ransomware attack on Change Healthcare, a subsidiary of UnitedHealth Group, compromised the protected health information of approximately 100 million individuals, making it the largest healthcare data breach in United States history. The attack disrupted claims processing for healthcare providers nationwide for several weeks. In May 2024, MediSecure suffered a cyberattack involving ransomware in Australia. In May 2021, the Health Service Executive in the Republic of Ireland was the victim of a cyberattack involving ransomware, in the Health Service Executive cyberattack, with admission records and test results present in a sample of the data reviewed by the Financial Times. In October 2018, the Centers for Medicare and Medicaid Services in the US reported that around 75,000 individual records had been affected by a data breach that took place through the ACA Agent and Broker Portal. In 2018, Social Indicators Research published the scientific evidence of 173,398,820 (over 173 million) individuals affected in USA from October 2008 (when the data were collected) to September 2017 (when the statistical analysis took place). In 2015, Anthem Inc. lost data for 37 million people in the Anthem medical data breach In 2014 4.5 million people using Complete Health Systems had their data stolen In 2013-14 1 million people using Montana Department of Public Health and Human Services had their data stolen In 2013 4 million people using Advocate Health and Hospitals Corporation had their data stolen In 2011 4.9 million users of Tricare services had their data stolen due to an employee error by Science Applications International Corporation In 2011 1.9 million people using Health Net had their data stolen In 2011 1 million people using Nemours Foundation had their data stolen In 2010 6800 people using New York-Presbyterian Hospital and Columbia University Medical Center had their data breached. In response, those organizations agreed to pay the United States Department of Health and Human Services a US$4.8 million dollar fine. In 2009 1 million people using BlueCross BlueShield of Tennessee had their data stolen == Regulation == In the United States, the Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Act require companies to report data breaches to affected individuals and the federal government. Under the HIPAA Breach Notification Rule, covered entities must notify affected individuals without unreasonable delay and no later than 60 days after discovering a breach of unsecured protected health information. Breaches affecting 500 or more individuals must also be reported to the HHS Secretary and to prominent media outlets serving the affected state or jurisdiction within the same timeframe; HHS publicly lists these larger breaches on its breach portal, commonly known as the "wall of shame." Breaches affecting fewer than 500 individuals are reported to HHS annually, no later than 60 days after the end of the calendar year in which they were discovered. Health Information Privacy Health Insurance Portability and Accountability Act of 1996 (HIPAA). - 45 CFR Parts 160 and 164, Standards for Privacy of Individually Identifiable Health Information and Security Standards for the Protection of Electronic Protected Health Information. HIPAA includes provisions designed to save health care businesses money by encouraging electronic transactions, as well as regulations to protect the security and confidentiality of patient information. The Privacy Rule became effective April 14, 2001, and most covered entities (health plans, health care clearinghouses, and health care providers that conduct certain financial and administrative transactions electronically) had until April 2003 to comply. This security provision became effective April 21, 2003. The Health Insurance Portability and Accountability Act (HIPAA) is the baseline set of federal regulations governing medical information. It does three things: i. i. i.Establish a structure for how personal health information is disclosed and establish the rights of individuals with respect to health information; ii.Specify security standards for the retention and transmission of electronic patient information; iii.Need a common format and data structure for the electronic exchange of health information. California-Specific Laws California’s medical privacy laws, primarily the Confidentiality of Medical Information Act (CMIA), the data breach sections of the Civil Code, and sections of the Health and Safety Code, provide HIPAA-like protections, although the terminology is different. HIPAA establishes a federal "minimum standard" that applies where there are gaps in California law, and HIPAA also specifies that stricter state laws will override or supersede HIPAA. California's health care privacy laws apply to providers who provide personal health records (PHR), while HIPAA only applies when the provider providing the PHR is a business associate of a covered entity. Federal law does not grant individuals the right to file a lawsuit in the event of a data breach (only the Attorney General can file a lawsuit), but California law does. This means that California law sets a higher standard for medical privacy, and that individuals in California enjoy stronger legal protections and more ways to hold entities that violate their medical privacy accountable. In the UK, the legal framework for how patient data is cared for and processed is the Data Protection Act 2018 (DPA), which incorporates the EU General Data Protection Regulation (GDPR) into law, and the common law duty of confidentiality (CLDC). The data protection legislation requires that the collection and processing of personal data be fair, lawful and transparent. This means that the collection and processing of data as defined by data protection legislation must always have a valid lawful basis and must also meet the requirements of the CLDC. In the China, Article 18 of the "National Health Care Big Data Standards, Security and Services Management Measures (for Trial Implementation)" (National Health Planning and Development (2018) No. 23) promulgated by the National Health Care Commission in 2018 states, "The responsible unit shall adopt measures such as data classification, important data backup, and encryption authentication to guarantee the security of health care big data." However, the scope and definition of important data are not covered. Although the "Information Security Technology-Healthcare Data Security Guide" (the "Guide") issued by the National Standardization Committee also proposes that important data should be evaluated and approved in accordance with the regulations, there is likewise no definition of the connotation and definition of important data.

    Read more →
  • World Programming System

    World Programming System

    The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data and to perform data science tasks. It has interactive visual programming tools using data workflows, and it has coding tools supporting the use of the SAS language mixed with Python, R and SQL. == About == WPS can use programs written in the language of SAS without the need for translating them into any other language. In this regard WPS is compatible with the SAS system. WPS has a built-in language interpreter able to process the language of SAS and produce similar results. WPS is available to run on z/OS, Windows, macOS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX. On all supported platforms, programs written in the language of SAS can be executed from a WPS command line interface, often referred to as running in batch mode. WPS can also be used from a graphical user interface known as the WPS Workbench for managing, editing and running programs written in the language of SAS. The WPS Workbench user interface is based on Eclipse. WPS version 4 (released in March 2018) introduced a drag-and-drop workflow canvas providing interactive blocks for data retrieval, blending and preparation, data discovery and profiling, predictive modelling powered by machine learning algorithms, model performance validation and scorecards. WPS version 3 (released in February 2012) provided a new client/server architecture that allows the WPS Workbench GUI to execute SAS programs on remote server installations of WPS in a network or cloud. The resulting output, data sets, logs, etc., can then all be viewed and manipulated from inside the Workbench as if the workloads had been executed locally. SAS programs do not require any special language statements to use this feature. == Summary of main features == Runs on Windows, macOS, z/OS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX An integrated development environment based on Eclipse for Linux, macOS and Windows. Support for language of SAS elements. Support for the language of SAS Macros. Matrix Programming support using PROC IML. Support for generating band plots, bar charts, box plots, bubble plots, contour plots, dendrogram plots, ellipse plots, fringe plots, heat maps, high-low plots, histograms, loess plots, needle plots, pie charts, penalised b-spline, radar charts, reference lines, scatter plots, series plots, step plots, regression plots and vector plots. Support for statistical procedures ACECLUS, ASSOCRULES, ANOVA, BIN, BOXPLOT, CANCORR, CANDISC, CLUSTER, CORRESP, DISCRIM, DISTANCE, FACTOR, FASTCLUS, FREQ, GAM, GANNO, GENMOD, GLIMMIX, GLM, GLMMOD, GLMSELECT, ICLIFETEST, KDE, LIFEREG, LIFETEST, LOESS, LOGISTIC, MDS, MEANS, MI, MIANALYSE, MIXED, MODECLUS, NESTED, NLIN, NPAR1WAY, PHREG, PLAN, PLS, POWER, PRINCOMP, PROBIT, QUANTREG, RBF, REG, ROBUSTREG, RSREG, SCORE, SEGMENT, SIMNORMAL, STANDARD, STDSIZE, STDRATE, STEPDISC, SUMMARY, SURVEYMEANS, SURVEYSELECT, TPSPLINE, TRANSREG, TREE, TTEST, UNIVARIATE, VARCLUS, VARCOMP Support for time series procedures ARIMA, AUTOREG, ESM, EXPAND, FORECAST, LOAN, SEVERITY, SPECTRA, TIMESERIES, X12 Support for machine learning procedures DECISIONFOREST, DECISIONTREE, GMM, MLP, OPTIMALBIN, SEGMENT, SVM Support for ODS. Reads and writes SAS datasets (compressed or uncompressed). Access: Actian Matrix (previously known as ParAccel), DASD, DB2, Excel, Greenplum, Hadoop, Informix, Kognitio Archived 2012-08-24 at the Wayback Machine, MariaDB, MySQL, Netezza, ODBC, OLEDB, Oracle, PostgreSQL, SAND, Snowflake, SPSS/PSPP, SQL Server, Sybase, Sybase IQ, Teradata, VSAM, Vertica and XML. Support for SAS Tape Format. Direct output of reports to CSV, PDF and HTML. Support to connect WPS systems programmatically, remote submit parts of a program to execute on connected remote servers, upload and download data between the connected systems. Support for Hadoop Support for R Support for Python == Industry recognition == Gartner recognized World Programming in their Cool Vendors in Data Science, 2014 Report. == Lawsuit == In 2010 World Programming defended its use of the language of SAS in the High Court of England and Wales in SAS Institute Inc. v World Programming Ltd. The software was the subject of a lawsuit by SAS Institute. The EU Court of Justice ruled in favor of World Programming, stating that the copyright protection does not extend to the software functionality, the programming language used and the format of the data files used by the program. It stated that there is no copyright infringement when a company which does not have access to the source code of a program studies, observes and tests that program to create another program with the same functionality.

    Read more →
  • Rprop

    Rprop

    Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks. This is a first-order optimization algorithm. This algorithm was created by Martin Riedmiller and Heinrich Braun in 1992. Similarly to the Manhattan update rule, Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each "weight". For each weight, if there was a sign change of the partial derivative of the total error function compared to the last iteration, the update value for that weight is multiplied by a factor η−, where η− < 1. If the last iteration produced the same sign, the update value is multiplied by a factor of η+, where η+ > 1. The update values are calculated for each weight in the above manner, and finally each weight is changed by its own update value, in the opposite direction of that weight's partial derivative, so as to minimise the total error function. η+ is empirically set to 1.2 and η− to 0.5. Rprop can result in very large weight increments or decrements if the gradients are large, which is a problem when using mini-batches as opposed to full batches. RMSprop addresses this problem by keeping the moving average of the squared gradients for each weight and dividing the gradient by the square root of the mean square. RPROP is a batch update algorithm. Next to the cascade correlation algorithm and the Levenberg–Marquardt algorithm, Rprop is one of the fastest weight update mechanisms. == Variations == Martin Riedmiller developed three algorithms, all named RPROP. Igel and Hüsken assigned names to them and added a new variant: RPROP+ is defined at A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. RPROP− is defined at Advanced Supervised Learning in Multi-layer Perceptrons – From Backpropagation to Adaptive Learning Algorithms. Backtracking is removed from RPROP+. iRPROP− is defined in Rprop – Description and Implementation Details and was reinvented by Igel and Hüsken. This variant is very popular and most simple. iRPROP+ is defined at Improving the Rprop Learning Algorithm and is very robust and typically faster than the other three variants.

    Read more →
  • FastICA

    FastICA

    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let the X := ( x i j ) ∈ R N × M {\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}} denote the input data matrix, M {\displaystyle M} the number of columns corresponding with the number of samples of mixed signals and N {\displaystyle N} the number of rows corresponding with the number of independent source signals. The input data matrix X {\displaystyle \mathbf {X} } must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data X {\displaystyle \mathbf {X} } , that is, for each i = 1 , … , N {\displaystyle i=1,\ldots ,N} and j = 1 , … , M {\displaystyle j=1,\ldots ,M} . After centering, each row of X {\displaystyle \mathbf {X} } has an expected value of 0 {\displaystyle 0} . Whitening the data requires a linear transformation L : R N × M → R N × M {\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}} of the centered data so that the components of L ( X ) {\displaystyle \mathbf {L} (\mathbf {X} )} are uncorrelated and have variance one. More precisely, if X {\displaystyle \mathbf {X} } is a centered data matrix, the covariance of L x := L ( X ) {\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )} is the ( N × N ) {\displaystyle (N\times N)} -dimensional identity matrix, that is, A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data X {\displaystyle \mathbf {X} } , E { X X T } = E D E T {\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}} , where E {\displaystyle \mathbf {E} } is the matrix of eigenvectors and D {\displaystyle \mathbf {D} } is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by === Single component extraction === The iterative algorithm finds the direction for the weight vector w ∈ R N {\displaystyle \mathbf {w} \in \mathbb {R} ^{N}} that maximizes a measure of non-Gaussianity of the projection w T X {\displaystyle \mathbf {w} ^{T}\mathbf {X} } , with X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} denoting a prewhitened data matrix as described above. Note that w {\displaystyle \mathbf {w} } is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f ( u ) {\displaystyle f(u)} , its first derivative g ( u ) {\displaystyle g(u)} , and its second derivative g ′ ( u ) {\displaystyle g^{\prime }(u)} . Hyvärinen states that the functions are useful for general purposes, while may be highly robust. The steps for extracting the weight vector w {\displaystyle \mathbf {w} } for single component in FastICA are the following: Randomize the initial weight vector w {\displaystyle \mathbf {w} } Let w + ← E { X g ( w T X ) T } − E { g ′ ( w T X ) } w {\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} } , where E { . . . } {\displaystyle E\left\{...\right\}} means averaging over all column-vectors of matrix X {\displaystyle \mathbf {X} } Let w ← w + / ‖ w + ‖ {\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|} If not converged, go back to 2 === Multiple component extraction === The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, 1 M {\displaystyle \mathbf {1_{M}} } is a column vector of 1's of dimension M {\displaystyle M} . Algorithm FastICA Input: C {\displaystyle C} Number of desired components Input: X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} Prewhitened matrix, where each column represents an N {\displaystyle N} -dimensional sample, where C <= N {\displaystyle C<=N} Output: W ∈ R N × C {\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}} Un-mixing matrix where each column projects X {\displaystyle \mathbf {X} } onto independent component. Output: S ∈ R C × M {\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}} Independent components matrix, with M {\displaystyle M} columns representing a sample with C {\displaystyle C} dimensions. for p in 1 to C: w p ← {\displaystyle \mathbf {w_{p}} \leftarrow } Random vector of length N while w p {\displaystyle \mathbf {w_{p}} } changes w p ← 1 M X g ( w p T X ) T − 1 M g ′ ( w p T X ) 1 M w p {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1_{M}} \mathbf {w_{p}} } w p ← w p − ∑ j = 1 p − 1 ( w p T w j ) w j {\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} )\mathbf {w_{j}} } w p ← w p ‖ w p ‖ {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}} output W ← [ w 1 , … , w C ] {\displaystyle \mathbf {W} \leftarrow {\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}} output S ← W T X {\displaystyle \mathbf {S} \leftarrow \mathbf {W^{T}} \mathbf {X} }

    Read more →
  • Service-oriented software engineering

    Service-oriented software engineering

    Service-oriented software engineering (SOSE), also referred to as service engineering, is a software engineering methodology focused on the development of software systems by composition of reusable services (service-orientation) often provided by other service providers. Since it involves composition, it shares many characteristics of component-based software engineering, the composition of software systems from reusable components, but it adds the ability to dynamically locate necessary services at run-time. These services may be provided by others as web services, but the essential element is the dynamic nature of the connection between the service users and the service providers. == Service-oriented interaction pattern == There are three types of actors in a service-oriented interaction: service providers, service users and service registries. They participate in a dynamic collaboration which can vary from time to time. Service providers are software services that publish their capabilities and availability with service registries. Service users are software systems (which may be services themselves) that accomplish some task through the use of services provided by service providers. Service users use service registries to discover and locate the service providers they can use. This discovery and location occurs dynamically when the service user requests them from a service registry.

    Read more →
  • Huber loss

    Huber loss

    In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used. == Definition == The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by L δ ( a ) = { 1 2 a 2 for | a | ≤ δ , δ ⋅ ( | a | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(a)={\begin{cases}{\frac {1}{2}}{a^{2}}&{\text{for }}|a|\leq \delta ,\\[4pt]\delta \cdot \left(|a|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}} This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where | a | = δ {\displaystyle |a|=\delta } . The variable a often refers to the residuals, that is to the difference between the observed and predicted values a = y − f ( x ) {\displaystyle a=y-f(x)} , so the former can be expanded to L δ ( y , f ( x ) ) = { 1 2 ( y − f ( x ) ) 2 for | y − f ( x ) | ≤ δ , δ ⋅ ( | y − f ( x ) | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(y,f(x))={\begin{cases}{\frac {1}{2}}{\left(y-f(x)\right)}^{2}&{\text{for }}\left|y-f(x)\right|\leq \delta ,\\[4pt]\delta \ \cdot \left(\left|y-f(x)\right|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}} The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smoothens out" the former's corner at the origin. == Motivation == Two very commonly used loss functions are the squared loss, L ( a ) = a 2 {\displaystyle L(a)=a^{2}} , and the absolute loss, L ( a ) = | a | {\displaystyle L(a)=|a|} . The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it has the tendency to be dominated by outliers—when summing over a set of a {\displaystyle a} 's (as in ∑ i = 1 n L ( a i ) {\textstyle \sum _{i=1}^{n}L(a_{i})} ), the sample mean is influenced too much by a few particularly large a {\displaystyle a} -values when the distribution is heavy tailed: in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions. As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum a = 0 {\displaystyle a=0} ; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at points a = − δ {\displaystyle a=-\delta } and a = δ {\displaystyle a=\delta } . These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function). == Pseudo-Huber loss function == The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex when close to the target/minimum and less steep for extreme values. The scale at which the Pseudo-Huber loss function transitions from L2 loss for values close to the minimum to L1 loss for extreme values and the steepness at extreme values can be controlled by the δ {\displaystyle \delta } value. The Pseudo-Huber loss function ensures that derivatives are continuous for all degrees. It is defined as L δ ( a ) = δ 2 ( 1 + ( a / δ ) 2 − 1 ) . {\displaystyle L_{\delta }(a)=\delta ^{2}\left({\sqrt {1+(a/\delta )^{2}}}-1\right).} As such, this function approximates a 2 / 2 {\displaystyle a^{2}/2} for small values of a {\displaystyle a} , and approximates a straight line with slope δ {\displaystyle \delta } for large values of a {\displaystyle a} . While the above is the most common form, other smooth approximations of the Huber loss function also exist. == Variant for classification == For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction f ( x ) {\displaystyle f(x)} (a real-valued classifier score) and a true binary class label y ∈ { + 1 , − 1 } {\displaystyle y\in \{+1,-1\}} , the modified Huber loss is defined as L ( y , f ( x ) ) = { max ( 0 , 1 − y f ( x ) ) 2 for y f ( x ) > − 1 , − 4 y f ( x ) otherwise. {\displaystyle L(y,f(x))={\begin{cases}\max(0,1-y\,f(x))^{2}&{\text{for }}\,\,y\,f(x)>-1,\\[4pt]-4y\,f(x)&{\text{otherwise.}}\end{cases}}} The term max ( 0 , 1 − y f ( x ) ) {\displaystyle \max(0,1-y\,f(x))} is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of L {\displaystyle L} . == Applications == The Huber loss function is used in robust statistics, M-estimation and additive modelling.

    Read more →
  • Characteristic samples

    Characteristic samples

    Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm I {\displaystyle I} is given a set of pairs of strings and labels S {\displaystyle S} , and returns a representation R {\displaystyle R} that is consistent with S {\displaystyle S} . Characteristic samples consider the scenario when the goal is not only finding a representation consistent with S {\displaystyle S} , but finding a representation that recognizes a specific target language. A characteristic sample of language L {\displaystyle L} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} where: l ( s ) = 1 {\displaystyle l(s)=1} if and only if s ∈ L {\displaystyle s\in L} l ( s ) = − 1 {\displaystyle l(s)=-1} if and only if s ∉ L {\displaystyle s\notin L} Given the characteristic sample S {\displaystyle S} , I {\displaystyle I} 's output on it is a representation R {\displaystyle R} , e.g. an automaton, that recognizes L {\displaystyle L} . == Formal Definition == === The Learning Paradigm associated with Characteristic Samples === There are three entities in the learning paradigm connected to characteristic samples, the adversary, the teacher and the inference algorithm. Given a class of languages C {\displaystyle \mathbb {C} } and a class of representations for the languages R {\displaystyle \mathbb {R} } , the paradigm goes as follows: The adversary A {\displaystyle A} selects a language L ∈ C {\displaystyle L\in \mathbb {C} } and reports it to the teacher The teacher T {\displaystyle T} then computes a set of strings and label them correctly according to L {\displaystyle L} , trying to make sure that the inference algorithm will compute L {\displaystyle L} The adversary can add correctly labeled words to the set in order to confuse the inference algorithm The inference algorithm I {\displaystyle I} gets the sample and computes a representation R ∈ R {\displaystyle R\in \mathbb {R} } consistent with the sample. The goal is that when the inference algorithm receives a characteristic sample for a language L {\displaystyle L} , or a sample that subsumes a characteristic sample for L {\displaystyle L} , it will return a representation that recognizes exactly the language L {\displaystyle L} . === Sample === Sample S {\displaystyle S} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} such that l ( s ) ∈ { − 1 , 1 } {\displaystyle l(s)\in \{-1,1\}} ==== Sample consistent with a language ==== We say that a sample S {\displaystyle S} is consistent with language L {\displaystyle L} if for every pair ( s , l ( s ) ) {\displaystyle (s,l(s))} in S {\displaystyle S} : l ( s ) = 1 if and only if s ∈ L {\displaystyle l(s)=1{\text{ if and only if }}s\in L} l ( s ) = − 1 if and only if s ∉ L {\displaystyle l(s)=-1{\text{ if and only if }}s\notin L} === Characteristic sample === Given an inference algorithm I {\displaystyle I} and a language L {\displaystyle L} , a sample S {\displaystyle S} that is consistent with L {\displaystyle L} is called a characteristic sample of L {\displaystyle L} for I {\displaystyle I} if: I {\displaystyle I} 's output on S {\displaystyle S} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . For every sample D {\displaystyle D} that is consistent with L {\displaystyle L} and also fulfils S ⊆ D {\displaystyle S\subseteq D} , I {\displaystyle I} 's output on D {\displaystyle D} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . A Class of languages C {\displaystyle \mathbb {C} } is said to have charistaristic samples if every L ∈ C {\displaystyle L\in \mathbb {C} } has a characteristic sample. == Related Theorems == === Theorem === If equivalence is undecidable for a class C {\textstyle \mathbb {C} } over Σ {\textstyle \Sigma } of cardinality bigger than 1, then C {\textstyle \mathbb {C} } doesn't have characteristic samples. ==== Proof ==== Given a class of representations C {\textstyle \mathbb {C} } such that equivalence is undecidable, for every polynomial p ( x ) {\displaystyle p(x)} and every n ∈ N {\displaystyle n\in \mathbb {N} } , there exist two representations r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} of sizes bounded by n {\displaystyle n} , that recognize different languages but are inseparable by any string of size bounded by p ( n ) {\displaystyle p(n)} . Assuming this is not the case, we can decide if r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} are equivalent by simulating their run on all strings of size smaller than p ( n ) {\displaystyle p(n)} , contradicting the assumption that equivalence is undecidable. === Theorem === If S 1 {\displaystyle S_{1}} is a characteristic sample for L 1 {\displaystyle L_{1}} and is also consistent with L 2 {\displaystyle L_{2}} , then every characteristic sample of L 2 {\displaystyle L_{2}} , is inconsistent with L 1 {\displaystyle L_{1}} . ==== Proof ==== Given a class C {\textstyle \mathbb {C} } that has characteristic samples, let R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} be representations that recognize L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} respectively. Under the assumption that there is a characteristic sample for L 1 {\displaystyle L_{1}} , S 1 {\displaystyle S_{1}} that is also consistent with L 2 {\displaystyle L_{2}} , we'll assume falsely that there exist a characteristic sample for L 2 {\displaystyle L_{2}} , S 2 {\displaystyle S_{2}} that is consistent with L 1 {\displaystyle L_{1}} . By the definition of characteristic sample, the inference algorithm I {\displaystyle I} must return a representation which recognizes the language if given a sample that subsumes the characteristic sample itself. But for the sample S 1 ∪ S 2 {\displaystyle S_{1}\cup S_{2}} , the answer of the inferring algorithm needs to recognize both L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} , in contradiction. === Theorem === If a class is polynomially learnable by example based queries, it is learnable with characteristic samples. == Polynomialy characterizable classes == === Regular languages === The proof that DFA's are learnable using characteristic samples, relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation, ∼ L {\displaystyle \sim _{L}} (where x ∼ L y {\displaystyle x\sim _{L}y} for x , y ∈ Σ ∗ {\displaystyle x,y\in \Sigma ^{}} if and only if ∀ z ∈ Σ ∗ : x z ∈ L ↔ y z ∈ L {\displaystyle \forall z\in \Sigma ^{}:xz\in L\leftrightarrow yz\in L} ). Note that if x {\displaystyle x} , y {\displaystyle y} are not congruent with respect to ∼ L {\displaystyle \sim _{L}} , there exists a string z {\displaystyle z} such that x z ∈ L {\displaystyle xz\in L} but y z ∉ L {\displaystyle yz\notin L} or vice versa, this string is called a separating suffix. ==== Constructing a characteristic sample ==== The construction of a characteristic sample for a language L {\displaystyle L} by the teacher goes as follows. Firstly, by running a depth first search on a deterministic automaton A {\displaystyle A} recognizing L {\displaystyle L} , starting from its initial state, we get a suffix closed set of words, W {\displaystyle W} , ordered in shortlex order. From the fact above, we know that for every two states in the automaton, there exists a separating suffix that separates between every two strings that the run of A {\displaystyle A} on them ends in the respective states. We refer to the set of separating suffixes as S {\displaystyle S} . The labeled set (sample) of words the teacher gives the adversary is { ( w , l ( w ) ) | w ∈ W ⋅ S ∪ W ⋅ Σ ⋅ S } {\displaystyle \{(w,l(w))|w\in W\cdot S\cup W\cdot \Sigma \cdot S\}} where l ( w ) {\displaystyle l(w)} is the correct label of w {\displaystyle w} (whether it is in L {\displaystyle L} or not). We may assume that ϵ ∈ S {\displaystyle \epsilon \in S} . ==== Constructing a deterministic automata ==== Given the sample from the adversary W {\displaystyle W} , the construction of the automaton by the inference algorithm I {\displaystyle I} starts with defining P = prefix ( W ) {\displaystyle P={\text{prefix}}(W)} and S = suffix ( W ) {\displaystyle S={\text{suffix}}(W)} , which are the set of prefixes and suffixes of W {\displaystyle W} respectively. Now the algorithm constructs a matrix M {\displaystyle M} where the elements of P {\displaystyle P} function as the rows, ordered by the shortlex order, and the elements of S {\displaystyle S} function as the columns, ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix p i {\displaystyle p_{i}} and suffix s j {\displaystyle s_{j}} : If p i s j ∈ W → M i j = l ( p i s j ) {\displaystyle p_{i}s_{j}\in W\rightarrow M_{ij}=l(p_{i}s_{j})} else, M i j = 0 {\displaystyle M_{ij}=0} Now, we say row i {\displaystyle i} and t {\displaystyle t} are distinguishable if there exi

    Read more →