AI Detector Zero Chatgpt

AI Detector Zero Chatgpt — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Rapid prototyping

    Rapid prototyping

    Rapid prototyping is a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer aided design (CAD) data. Construction of the part or assembly is usually done using 3D printing or "additive layer manufacturing" technology. The first methods for rapid prototyping became available in mid 1987 and were used to produce models and prototype parts. Today, they are used for a wide range of applications and are used to manufacture production-quality parts in relatively small numbers if desired without the typical unfavorable short-run economics. This economy has encouraged online service bureaus. Historical surveys of RP technology start with discussions of simulacra production techniques used by 19th-century sculptors. Some modern sculptors use the progeny technology to produce exhibitions and various objects. The ability to reproduce designs from a dataset has given rise to issues of rights, as it is now possible to interpolate volumetric data from 2D images. As with CNC subtractive methods, the computer-aided-design – computer-aided manufacturing CAD -CAM workflow in the traditional rapid prototyping process starts with the creation of geometric data, either as a 3D solid using a CAD workstation, or 2D slices using a scanning device. For rapid prototyping this data must represent a valid geometric model; namely, one whose boundary surfaces enclose a finite volume, contain no holes exposing the interior, and do not fold back on themselves. In other words, the object must have an "inside". The model is valid if for each point in 3D space the computer can determine uniquely whether that point lies inside, on, or outside the boundary surface of the model. CAD post-processors will approximate the application vendors' internal CAD geometric forms (e.g., B-splines) with a simplified mathematical form, which in turn is expressed in a specified data format which is a common feature in additive manufacturing: STL file format, a de facto standard for transferring solid geometric models to SFF machines. To obtain the necessary motion control trajectories to drive the actual SFF, rapid prototyping, 3D printing or additive manufacturing mechanism, the prepared geometric model is typically sliced into layers, and the slices are scanned into lines (producing a "2D drawing" used to generate trajectory as in CNC's toolpath), mimicking in reverse the layer-to-layer physical building process. == Application areas == Rapid prototyping is also commonly applied in software engineering to try out new business models and application architectures such as Aerospace, Automotive, Financial Services, Product development, and Healthcare. Aerospace design and industrial teams rely on prototyping in order to create new AM methodologies in the industry. Using SLA they can quickly make multiple versions of their projects in a few days and begin testing quicker. Rapid Prototyping allows designers/developers to provide an accurate idea of how the finished product will turn out before putting too much time and money into the prototype. 3D printing being used for Rapid Prototyping allows for Industrial 3D printing to take place. With this, you could have large-scale moulds to spare parts being pumped out quickly within a short period of time. == Types of Rapid Prototyping == Stereolithography (SLA) → a laser-cured photopolymer for materials such as thermoplastic-like photopolymers. Selective Laser Sintering (SLS) → a laser-sintered powder for materials such as Nylon or TPU. Direct Metal Laser Sintering (DMLS) → laser-sintered metal powder for materials like stainless steel, titanium, chrome, and aluminum. Fused Deposition Modeling (FDM) → fused extrusions of filaments like ABS, PC, and PPCU. Multi Jet Fusion (MJF) → it is an inkjet array selective fusing across bed of nylon powder for Black Nylon 12. PolyJet (PJET) → it is a uv-cured jetted photopolymer to work with acrylic-based and elastomeric photopolymers. Computer Numerical Controlled Machine (CNC) → it is used for manipulating engineering-grade thermoplastics and metals. Injection Molding (IM) → the injection is done using aluminum molds and it is used for thermoplastics, metals and liquid silicone rubber. Vacuum Casting→ is a manufacturing process used to create high-quality prototypes and small batches of parts. == History == In the 1970s, Joseph Henry Condon and others at Bell Labs developed the Unix Circuit Design System (UCDS), automating the laborious and error-prone task of manually converting drawings to fabricate circuit boards for the purposes of research and development. By the 1980s, U.S. policy makers and industrial managers were forced to take note that America's dominance in the field of machine tool manufacturing evaporated, in what was named the machine tool crisis. Numerous projects sought to counter these trends in the traditional CNC CAM area, which had begun in the US. Later when Rapid Prototyping Systems moved out of labs to be commercialized, it was recognized that developments were already international and U.S. rapid prototyping companies would not have the luxury of letting a lead slip away. The National Science Foundation was an umbrella for the National Aeronautics and Space Administration (NASA), the US Department of Energy, the US Department of Commerce NIST, the US Department of Defense, Defense Advanced Research Projects Agency (DARPA), and the Office of Naval Research coordinated studies to inform strategic planners in their deliberations. One such report was the 1997 Rapid Prototyping in Europe and Japan Panel Report in which Joseph J. Beaman founder of DTM Corporation [DTM RapidTool pictured] provides a historical perspective: The roots of rapid prototyping technology can be traced to practices in topography and photosculpture. Within TOPOGRAPHY Blanther (1892) suggested a layered method for making a mold for raised relief paper topographical maps .The process involved cutting the contour lines on a series of plates which were then stacked. Matsubara (1974) of Mitsubishi proposed a topographical process with a photo-hardening photopolymer resin to form thin layers stacked to make a casting mold. PHOTOSCULPTURE was a 19th-century technique to create exact three-dimensional replicas of objects. Most famously Francois Willeme (1860) placed 24 cameras in a circular array and simultaneously photographed an object. The silhouette of each photograph was then used to carve a replica. Morioka (1935, 1944) developed a hybrid photo sculpture and topographic process using structured light to photographically create contour lines of an object. The lines could then be developed into sheets and cut and stacked, or projected onto stock material for carving. The Munz (1956) Process reproduced a three-dimensional image of an object by selectively exposing, layer by layer, a photo emulsion on a lowering piston. After fixing, a solid transparent cylinder contains an image of the object. "The Origins of Rapid Prototyping - RP stems from the ever-growing CAD industry, more specifically, the solid modeling side of CAD. Before solid modeling was introduced in the late 1980's, three-dimensional models were created with wire frames and surfaces. But not until the development of true solid modeling could innovative processes such as RP be developed. Charles Hull, who helped found 3D Systems in 1986, developed the first RP process. This process, called stereolithography, builds objects by curing thin consecutive layers of certain ultraviolet light-sensitive liquid resins with a low-power laser. With the introduction of RP, CAD solid models could suddenly come to life". The technologies referred to as Solid Freeform Fabrication are what we recognize today as rapid prototyping, 3D printing or additive manufacturing: Swainson (1977), Schwerzel (1984) worked on polymerization of a photosensitive polymer at the intersection of two computer controlled laser beams. Ciraud (1972) considered magnetostatic or electrostatic deposition with electron beam, laser or plasma for sintered surface cladding. These were all proposed but it is unknown if working machines were built. Hideo Kodama of Nagoya Municipal Industrial Research Institute was the first to publish an account of a solid model fabricated using a photopolymer rapid prototyping system (1981). The first 3D rapid prototyping system relying on Fused Deposition Modeling (FDM) was made in April 1992 by Stratasys but the patent did not issue until June 9, 1992. Sanders Prototype, Inc introduced the first desktop inkjet 3D Printer (3DP) using an invention from August 4, 1992 (Helinski), Modelmaker 6Pro in late 1993 and then the larger industrial 3D printer, Modelmaker 2, in 1997. Z-Corp using the MIT 3DP powder binding for Direct Shell Casting (DSP) invented 1993 was introduced to the market in 1995. Even at that early date the technology was seen as having a place in manufacturing practice. A low resol

    Read more →
  • Margin classifier

    Margin classifier

    In machine learning (ML), a margin classifier is a type of classification model which is able to give an associated distance from the decision boundary for each data sample. For instance, if a linear classifier is used, the distance (typically Euclidean, though others may be used) of a sample from the separating hyperplane is the margin of that sample. The notion of margins is important in several ML classification algorithms, as it can be used to bound the generalization error of these classifiers. These bounds are frequently shown using the VC dimension. The generalization error bound in boosting algorithms and support vector machines is particularly prominent. == Margin for boosting algorithms == The margin for an iterative boosting algorithm given a dataset with two classes can be defined as follows: the classifier is given a sample pair ( x , y ) {\displaystyle (x,y)} , where x ∈ X {\displaystyle x\in X} is a domain space and y ∈ Y = { − 1 , + 1 } {\displaystyle y\in Y=\{-1,+1\}} is the sample's label. The algorithm then selects a classifier h j ∈ C {\displaystyle h_{j}\in C} at each iteration j {\displaystyle j} where C {\displaystyle C} is a space of possible classifiers that predict real values. This hypothesis is then weighted by α j ∈ R {\displaystyle \alpha _{j}\in R} as selected by the boosting algorithm. At iteration t {\displaystyle t} , the margin of a sample x {\displaystyle x} can thus be defined as y ∑ j t α j h j ( x ) ∑ | α j | . {\displaystyle {\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}.} By this definition, the margin is positive if the sample is labeled correctly, or negative if the sample is labeled incorrectly. This definition may be modified and is not the only way to define the margin for boosting algorithms. However, there are reasons why this definition may be appealing. == Examples of margin-based algorithms == Many classifiers can give an associated margin for each sample. However, only some classifiers utilize information of the margin while learning from a dataset. Many boosting algorithms rely on the notion of a margin to assign weight to samples. If a convex loss is utilized (as in AdaBoost or LogitBoost, for instance) then a sample with a higher margin will receive less (or equal) weight than a sample with a lower margin. This leads the boosting algorithm to focus weight on low-margin samples. In non-convex algorithms (e.g., BrownBoost), the margin still dictates the weighting of a sample, though the weighting is non-monotone with respect to the margin. == Generalization error bounds == One theoretical motivation behind margin classifiers is that their generalization error may be bound by the algorithm parameters and a margin term. An example of such a bound is for the AdaBoost algorithm. Let S {\displaystyle S} be a set of m {\displaystyle m} data points, sampled independently at random from a distribution D {\displaystyle D} . Assume the VC-dimension of the underlying base classifier is d {\displaystyle d} and m ≥ d ≥ 1 {\displaystyle m\geq d\geq 1} . Then, with probability 1 − δ {\displaystyle 1-\delta } , we have the bound: P D ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ 0 ) ≤ P S ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ θ ) + O ( 1 m d log 2 ⁡ ( m / d ) / θ 2 + log ⁡ ( 1 / δ ) ) {\displaystyle P_{D}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq 0\right)\leq P_{S}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq \theta \right)+O\left({\frac {1}{\sqrt {m}}}{\sqrt {d\log ^{2}(m/d)/\theta ^{2}+\log(1/\delta )}}\right)} for all θ > 0 {\displaystyle \theta >0} .

    Read more →
  • Variable kernel density estimation

    Variable kernel density estimation

    In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate are varied depending upon either the location of the samples or the location of the test point. It is a particularly effective technique when the sample space is multi-dimensional. == Rationale == Given a set of samples, { x → i } {\displaystyle \lbrace {\vec {x}}_{i}\rbrace } , we wish to estimate the density, P ( x → ) {\displaystyle P({\vec {x}})} , at a test point, x → {\displaystyle {\vec {x}}} : P ( x → ) ≈ W n h D {\displaystyle P({\vec {x}})\approx {\frac {W}{nh^{D}}}} W = ∑ i = 1 n w i {\displaystyle W=\sum _{i=1}^{n}w_{i}} w i = K ( x → − x → i h ) {\displaystyle w_{i}=K\left({\frac {{\vec {x}}-{\vec {x}}_{i}}{h}}\right)} where n is the number of samples, K is the "kernel", h is its width and D is the number of dimensions in x → {\displaystyle {\vec {x}}} . The kernel can be thought of as a simple, linear filter. Using a fixed filter width may mean that in regions of low density, all samples will fall in the tails of the filter with very low weighting, while regions of high density will find an excessive number of samples in the central region with weighting close to unity. To fix this problem, we vary the width of the kernel in different regions of the sample space. There are two methods of doing this: balloon and pointwise estimation. In a balloon estimator, the kernel width is varied depending on the location of the test point. In a pointwise estimator, the kernel width is varied depending on the location of the sample. For multivariate estimators, the parameter, h, can be generalized to vary not just the size, but also the shape of the kernel. This more complicated approach will not be covered here. == Balloon estimators == A common method of varying the kernel width is to make it inversely proportional to the density at the test point: h = k [ n P ( x → ) ] 1 / D {\displaystyle h={\frac {k}{\left[nP({\vec {x}})\right]^{1/D}}}} where k is a constant. If we back-substitute the estimated PDF, and assuming a Gaussian kernel function, we can show that W is a constant: W = k D ( 2 π ) D / 2 {\displaystyle W=k^{D}(2\pi )^{D/2}} A similar derivation holds for any kernel whose normalising function is of the order hD, although with a different constant factor in place of the (2 π)D/2 term. This produces a generalization of the k-nearest neighbour algorithm. That is, a uniform kernel function will return the KNN technique. There are two components to the error: a variance term and a bias term. The variance term is given as: e 1 = P ∫ K 2 n h D {\displaystyle e_{1}={\frac {P\int K^{2}}{nh^{D}}}} . The bias term is found by evaluating the approximated function in the limit as the kernel width becomes much larger than the sample spacing. By using a Taylor expansion for the real function, the bias term drops out: e 2 = h 2 n ∇ 2 P {\displaystyle e_{2}={\frac {h^{2}}{n}}\nabla ^{2}P} An optimal kernel width that minimizes the error of each estimate can thus be derived. == Use for statistical classification == The method is particularly effective when applied to statistical classification. There are two ways we can proceed: the first is to compute the PDFs of each class separately, using different bandwidth parameters, and then compare them as in Taylor. Alternatively, we can divide up the sum based on the class of each sample: P ( j , x → ) ≈ 1 n ∑ i = 1 , c i = j n w i {\displaystyle P(j,{\vec {x}})\approx {\frac {1}{n}}\sum _{i=1,c_{i}=j}^{n}w_{i}} where ci is the class of the ith sample. The class of the test point may be estimated through maximum likelihood.

    Read more →
  • Bondy's theorem

    Bondy's theorem

    In mathematics, Bondy's theorem is a bound on the number of elements needed to distinguish the sets in a family of sets from each other. It belongs to the field of combinatorics, and is named after John Adrian Bondy, who published it in 1972. == Statement == The theorem is as follows: Let X be a set with n elements and let A1, A2, ..., An be distinct subsets of X. Then there exists a subset S of X with n − 1 elements such that the sets Ai ∩ S are all distinct. In other words, if we have a 0-1 matrix with n rows and n columns such that each row is distinct, we can remove one column such that the rows of the resulting n × (n − 1) matrix are distinct. == Example == Consider the 4 × 4 matrix [ 1 1 0 1 0 1 0 1 0 0 1 1 0 1 1 0 ] {\displaystyle {\begin{bmatrix}1&1&0&1\\0&1&0&1\\0&0&1&1\\0&1&1&0\end{bmatrix}}} where all rows are pairwise distinct. If we delete, for example, the first column, the resulting matrix [ 1 0 1 1 0 1 0 1 1 1 1 0 ] {\displaystyle {\begin{bmatrix}1&0&1\\1&0&1\\0&1&1\\1&1&0\end{bmatrix}}} no longer has this property: the first row is identical to the second row. Nevertheless, by Bondy's theorem we know that we can always find a column that can be deleted without introducing any identical rows. In this case, we can delete the third column: all rows of the 3 × 4 matrix [ 1 1 1 0 1 1 0 0 1 0 1 0 ] {\displaystyle {\begin{bmatrix}1&1&1\\0&1&1\\0&0&1\\0&1&0\end{bmatrix}}} are distinct. Another possibility would have been deleting the fourth column. == Learning theory application == From the perspective of computational learning theory, Bondy's theorem can be rephrased as follows: Let C be a concept class over a finite domain X. Then there exists a subset S of X with the size at most |C| − 1 such that S is a witness set for every concept in C. This implies that every finite concept class C has its teaching dimension bounded by |C| − 1.

    Read more →
  • Artificial intelligence in hiring

    Artificial intelligence in hiring

    Artificial intelligence can be used to automate aspects of the job recruitment process. Advances in artificial intelligence, such as the advent of machine learning and the growth of big data, enable AI to be utilized to recruit, screen, and predict the success of applicants. Proponents of artificial intelligence in hiring claim it reduces bias, assists with finding qualified candidates, and frees up human resource workers' time for other tasks, while opponents worry that AI perpetuates inequalities in the workplace and will eliminate jobs. Despite the potential benefits, the ethical implications of AI in hiring remain a subject of debate, with concerns about algorithmic transparency, accountability, and the need for ongoing oversight to ensure fair and unbiased decision-making throughout the recruitment process. == Background == It is common for companies to use AI to automate aspects of their hiring process, especially the hospitality, finance, and tech industries. == Uses == === Screeners === Screeners are tests that allow companies to sift through a large applicant pool and extract applicants that have desirable features. What factors are used to screen applicants is a concern to ethicists and civil rights activists. A screener that favors people who have similar characteristics to those already employed at a company may perpetuate inequalities. For example, if a company that is predominantly white and male uses its employees' data to train its screener it may accidentally create a screening process that favors white, male applicants. The automation of screeners also has the potential to reduce biases. Biases against applicants with African American sounding names have been shown in multiple studies. An AI screener has the potential to limit human bias and error in the hiring process, allowing more minority applicants to be successful. === Recruitment === Recruitment involves the identification of potential applicants and the marketing of positions. AI is commonly utilized in the recruitment process because it can help boost the number of qualified applicants for positions. Companies are able to use AI to target their marketing to applicants who are likely to be good fits for a position. This often involves the use of social media sites advertising tools, which rely on AI. Facebook allows advertisers to target ads based on demographics, location, interests, behavior, and connections. Facebook also allows companies to target a "look-a-like" audience, that is the company supplies Facebook with a data set, typically the company's current employees, and Facebook will target the ad to profiles that are similar to the profiles in the data set. Additionally, job sites like Indeed, Glassdoor, and ZipRecruiter target job listings to applicants that have certain characteristics employers are looking for. Targeted advertising has many advantages for companies trying to recruit such being a more efficient use of resources, reaching a desired audience, and boosting qualified applicants. This has helped make it a mainstay in modern hiring. Who receives a targeted ad can be controversial. In hiring, the implications of targeted ads have to do with who is able to find out about and then apply to a position. Most targeted ad algorithms are proprietary information. Some platforms, like Facebook and Google, allow users to see why they were shown a specific ad, but users who do not receive the ad likely never know of its existence and also have no way of knowing why they were not shown the ad. === Interviews === Chatbots were one of the first applications of AI and are commonly used in the hiring process. Interviewees interact with chatbots to answer interview questions, and an analysis of their responses can be generated by AI. HireVue has created technology that analyzes interviewees' responses and gestures during recorded video interviews. Over 12 million interviewees have been screened by the more than 700 companies that utilize the service. == Controversies == Artificial intelligence in hiring confers many benefits, but it also has some challenges that have concerned experts. AI is only as good as the data it is using. Biases can inadvertently be baked into the data used in AI. Often companies will use data from their employees to decide what people to recruit or hire. This can perpetuate bias and lead to more homogenous workforces. Facebook Ads was an example of a platform that created such controversy for allowing business owners to specify what type of employee they are looking for. For example, job advertisements for nursing and teach could be set such that only women of a specific age group would see the advertisements. Facebook Ads has since then removed this function from its platform, citing the potential problems with the function in perpetuating biases and stereotypes against minorities. The growing use of Artificial Intelligence-enabled hiring systems has become an important component of modern talent hiring, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in the hiring systems, based on Natural Language Processing (NLP) methods, may result in unconscious gender bias. Utilizing data driven methods may mitigate some bias generated from these systems It can also be hard to quantify what makes a good employee. This poses a challenge for training AI to predict which employees will be best. Commonly used metrics like performance reviews can be subjective and have been shown to favor white employees over black employees and men over women. Another challenge is the limited amount of available data. Employers only collect certain details about candidates during the initial stages of the hiring process. This requires AI to make determinations about candidates with very limited information to go off of. Additionally, many employers do not hire employees frequently and so have limited firm specific data to go off. To combat this, many firms will use algorithms and data from other firms in their industry. AI's reliance on applicant and current employees personal data raises privacy issues. These issues effect both the applicants and current employees, but also may have implications for third parties who are linked through social media to applicants or current employees. For example, a sweep of someone's social media will also show their friends and people they have tagged in photos or posts. == AI and the future of hiring == Artificial intelligence along with other technological advances such as improvements in robotics have placed 47% of jobs at risk of being eliminated in the near future. In 2016 the founder of the World Economic Forum, Klaus Schwab, called AI and related technology the "Fourth Industrial Revolution". According to some scholars, however, the transformative impact of AI on labor has been overstated. The "no-real-change" theory holds that an IT revolution has already occurred, but that the benefits of implementing new technologies does not outweigh the costs associated with adopting them. This theory claims that the result of the IT revolution is thus much less impactful than had originally been forecasted. Other scholars refute this theory claiming that AI has already led to significant job loss for unskilled labor and that it will eliminate middle skill and high skill jobs in the future. This position is based around the idea that AI is not yet a technology of general use and that any potential 4th industrial revolution has not fully occurred. A third theory holds that the effect of AI and other technological advances is too complicated to yet be understood. This theory is centered around the idea that while AI will likely eliminate jobs in the short term it will also likely increase the demand for other jobs. The question then becomes will the new jobs be accessible to people and will they emerge near when jobs are eliminated. == AI use in hiring for candidates == Job seekers now commonly encounter AI-driven tools at multiple stages, including automated resume parsing, video interview analysis, chatbots for frequently asked questions, and real‑time application updates. Some candidates also employ AI career agents, designed to optimize job searches, tailor applications, and interface with hiring teams. A 2025 Australian study found that AI-driven video interviews exhibited transcription error rates of up to 22% for non‑native speakers and those with speech-related disabilities, raising concerns of discrimination. A 2017 study in the Journal of Sociology found persistent gender and racial disparities in AI screening tools, even when fairness interventions are applied. Industry observers describe a growing “AI arms race” in recruitment, where both employers and candidates increasingly rely on automated agents. Employers use recruiting systems to source and filter applicants, while candidates deploy AI agents to prepare and submit applications. == Regulations == The Artifici

    Read more →
  • Yooreeka

    Yooreeka

    Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The project started with the code of the book "Algorithms of the Intelligent Web". Although the term "Web" prevailed in the title, in essence, the algorithms are valuable in any software application. It covers all major algorithms and provides many examples. Yooreeka 2.x is licensed under the Apache License rather than the somewhat more restrictive LGPL (which was the license of v1.x). The library is written 100% in the Java language. == Algorithms == The following algorithms are covered: Clustering Hierarchical—Agglomerative (e.g. MST single link; ROCK) and Divisive Partitional (e.g. k-means) Classification Bayesian Decision trees Neural Networks Rule based (via Drools) Recommendations Collaborative filtering Content based Search PageRank DocRank Personalization

    Read more →
  • Hinge loss

    Hinge loss

    In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as ℓ ( y ) = max ( 0 , 1 − t ⋅ y ) {\displaystyle \ell (y)=\max(0,1-t\cdot y)} Note that y {\displaystyle y} should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = w ⋅ x + b {\displaystyle y=\mathbf {w} \cdot \mathbf {x} +b} , where ( w , b ) {\displaystyle (\mathbf {w} ,b)} are the parameters of the hyperplane and x {\displaystyle \mathbf {x} } is the input variable(s). When t and y have the same sign (meaning y predicts the right class) and | y | ≥ 1 {\displaystyle |y|\geq 1} , the hinge loss ℓ ( y ) = 0 {\displaystyle \ell (y)=0} . When they have opposite signs, ℓ ( y ) {\displaystyle \ell (y)} increases linearly with y, and similarly if | y | < 1 {\displaystyle |y|<1} , even if it has the same sign (correct prediction, but not by enough margin). The Hinge loss is not a proper scoring rule. == Extensions == While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion, it is also possible to extend the hinge loss itself for such an end. Several different variations of multiclass hinge loss have been proposed. For example, Crammer and Singer defined it for a linear classifier as ℓ ( y ) = max ( 0 , 1 + max y ≠ t w y x − w t x ) {\displaystyle \ell (y)=\max(0,1+\max _{y\neq t}\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )} , where t {\displaystyle t} is the target label, w t {\displaystyle \mathbf {w} _{t}} and w y {\displaystyle \mathbf {w} _{y}} are the model parameters. Weston and Watkins provided a similar definition, but with a sum rather than a max: ℓ ( y ) = ∑ y ≠ t max ( 0 , 1 + w y x − w t x ) {\displaystyle \ell (y)=\sum _{y\neq t}\max(0,1+\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )} . In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where w denotes the SVM's parameters, y the SVM's predictions, φ the joint feature function, and Δ the Hamming loss: ℓ ( y ) = max ( 0 , Δ ( y , t ) + ⟨ w , ϕ ( x , y ) ⟩ − ⟨ w , ϕ ( x , t ) ⟩ ) = max ( 0 , max y ∈ Y ( Δ ( y , t ) + ⟨ w , ϕ ( x , y ) ⟩ ) − ⟨ w , ϕ ( x , t ) ⟩ ) {\displaystyle {\begin{aligned}\ell (\mathbf {y} )&=\max(0,\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle -\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\\&=\max(0,\max _{y\in {\mathcal {Y}}}\left(\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle \right)-\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\end{aligned}}} . == Optimization == The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to model parameters w of a linear SVM with score function y = w ⋅ x {\displaystyle y=\mathbf {w} \cdot \mathbf {x} } that is given by ∂ ℓ ∂ w i = { − t ⋅ x i if t ⋅ y < 1 , 0 otherwise . {\displaystyle {\frac {\partial \ell }{\partial w_{i}}}={\begin{cases}-t\cdot x_{i}&{\text{if }}t\cdot y<1,\\0&{\text{otherwise}}.\end{cases}}} However, since the derivative of the hinge loss at t y = 1 {\displaystyle ty=1} is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's ℓ ( y ) = { 1 2 − t y if t y ≤ 0 , 1 2 ( 1 − t y ) 2 if 0 < t y < 1 , 0 if 1 ≤ t y {\displaystyle \ell (y)={\begin{cases}{\frac {1}{2}}-ty&{\text{if}}~~ty\leq 0,\\{\frac {1}{2}}(1-ty)^{2}&{\text{if}}~~0 Read more →

  • Harrison White

    Harrison White

    Harrison Colyar White (March 21, 1930 – May 18, 2024) was an American sociologist who was the Giddings Professor of Sociology at Columbia University. White played an influential role in the “Harvard Revolution” in social networks and the New York School of relational sociology. He is credited with the development of a number of mathematical models of social structure including vacancy chains and blockmodels. He has been a leader of a revolution in sociology that is still in process, using models of social structure that are based on patterns of relations instead of the attributes and attitudes of individuals. Among social network researchers, White is widely respected. For instance, at the 1997 International Network of Social Network Analysis conference, the organizer held a special “White Tie” event, dedicated to White. Social network researcher Emmanuel Lazega refers to him as both “Copernicus and Galileo” because he invented both the vision and the tools. The most comprehensive documentation of his theories can be found in the book Identity and Control, first published in 1992. A major rewrite of the book appeared in June 2008. In 2011, White received the W.E.B. DuBois Career of Distinguished Scholarship Award from the American Sociological Association, which honors "scholars who have shown outstanding commitment to the profession of sociology and whose cumulative work has contributed in important ways to the advancement of the discipline." Before his retirement to live in Tucson, Arizona, White was interested in sociolinguistics and business strategy as well as sociology. == Life and career == === Early years === White was born on March 21, 1930, in Washington, D.C. He had three siblings and his father was a doctor in the US Navy. Although moving around to different Naval bases throughout his adolescence, he considered himself Southern, and Nashville, TN to be his home. At the age of 15, he entered the Massachusetts Institute of Technology (MIT), receiving his undergraduate degree at 20 years of age; five years later, in 1955, he received a doctorate in theoretical physics, also from MIT with John C. Slater as his advisor. His dissertation was titled A quantum-mechanical calculation of inter-atomic force constants in copper. This was published in the Physical Review as "Atomic Force Constants of Copper from Feynman's Theorem" (1958). While at MIT he also took a course with the political scientist Karl Deutsch, who White credits with encouraging him to move toward the social sciences. === Princeton University === After receiving his PhD in theoretical physics, he received a Fellowship from the Ford Foundation to begin his second doctorate in sociology at Princeton University. His dissertation advisor was Marion J. Levy. White also worked with Wilbert Moore, Fred Stephan, and Frank W. Notestein while at Princeton. His cohort was very small, with only four or five other graduate students including David Matza, and Stanley Udy. At the same time, he took up a position as an operations analyst at the Operations Research Office, Johns Hopkins University from 1955 to 1956. During this period, he worked with Lee S. Christie on Queuing with Preemptive Priorities or with Breakdown, which was published in 1958. Christie previously worked alongside mathematical psychologist R. Duncan Luce in the Small Group Laboratory at MIT while White was completing his first PhD in physics also at MIT. While continuing his studies at Princeton, White also spent a year as a fellow at the Center for Advanced Study in the Behavioral Sciences, Stanford University, California where he met Harold Guetzkow. Guetzkow was a faculty member at the Carnegie Institute of Technology, known for his application of simulations to social behavior and long-time collaborator with many other pioneers in organization studies, including Herbert A. Simon, James March, and Richard Cyert. Upon meeting Simon through his mutual acquaintance with Guetzkow, White received an invitation to move from California to Pittsburgh to work as an assistant professor of Industrial Administration and Sociology at the Graduate School of Industrial Administration, Carnegie Institute of Technology (later Carnegie-Mellon University), where he stayed for a couple of years, between 1957 and 1959. In an interview, he claimed to have fought with the dean, Leyland Bock, to have the word "sociology" included in his title. It was also during his time at the Stanford Center for Advanced Study that White met his first wife, Cynthia A. Johnson, who was a graduate of Radcliffe College, where she had majored in art history. The couple's joint work on the French Impressionists, Canvases and Careers (1965) and “Institutional Changes in the French Painting World” (1964), originally grew out of a seminar on art in 1957 at the Center for Advanced Study led by Robert Wilson. White originally hoped to use sociometry to map the social structure of French art to predict shifts, but he had an epiphany that it was not social structure but institutional structure which explained the shift. It was also during these years that White, still a graduate student in sociology, wrote and published his first social scientific work, "Sleep: A Sociological Interpretation" in Acta Sociologica in 1960, together with Vilhelm Aubert, a Norwegian sociologist. This work was a phenomenological examination of sleep which attempted to "demonstrate that sleep was more than a straightforward biological activity... [but rather also] a social event". For his dissertation, White carried out empirical research on a research and development department in a manufacturing firm, consisting of interviews and a 110-item questionnaire with managers. He specifically used sociometric questions, which he used to model the "social structure" of relationships between various departments and teams in the organization. In May 1960 he submitted as his doctoral dissertation, titled Research and Development as a Pattern in Industrial Management: A Case Study in Institutionalisation and Uncertainty, earning a PhD in sociology from Princeton University. His first publication based on his dissertation was ''Management conflict and sociometric structure'' in the American Journal of Sociology. === University of Chicago === In 1959 James Coleman left the University of Chicago to found a new department of social relations at Johns Hopkins University, this left a vacancy open for a mathematical sociologist like White. He moved to Chicago to start working as an associate professor at the Department of Sociology. At that time, highly influential sociologists, such as Peter Blau, Mayer Zald, Elihu Katz, Everett Hughes, Erving Goffman were there. As Princeton only required one year in residence, and White took the opportunity to take positions at Johns Hopkins, Stanford, and Carnegie while still working on his dissertation, it was at Chicago that White credits as being his "real socialization in a way, into sociology." It was here that White advised his first two graduate students Joel H. Levine and Morris Friedell, both who went on to make contributions to social network analysis in sociology. While at the Center for Advanced Study, White began learning anthropology and became fascinated with kinship. During his stay at the University of Chicago White was able to finish An Anatomy of Kinship, published in 1963 within the Prentice-Hall series in Mathematical Analysis of Social Behavior, with James Coleman and James March as chief editors. The book received significant attention from many mathematical sociologists of the time, and contributed greatly to establish White as a model builder. === The Harvard Revolution === In 1963, White left Chicago to be an associate professor of sociology at the Harvard Department of Social Relations—the same department founded by Talcott Parsons and still heavily influenced by the structural-functionalist paradigm of Parsons. As White previously only taught graduate courses at Carnegie and Chicago, his first undergraduate course was An Introduction to Social Relations (see Influence) at Harvard, which became infamous among network analysts. As he "thought existing textbooks were grotesquely unscientific," the syllabus of the class was noted for including few readings by sociologists, and comparatively more readings by anthropologists, social psychologists, and historians. White was also a vocal critic of what he called the "attributes and attitudes" approach of Parsonsian sociology, and came to be the leader of what has been variously known as the “Harvard Revolution," the "Harvard breakthrough," or the "Harvard renaissance" in social networks. He worked closely with small group researchers George C. Homans and Robert F. Bales, which was largely compatible with his prior work in organizational research and his efforts to formalize network analysis. Overlapping White's early years, Charles Tilly, a graduate of the Harvard Department of Social

    Read more →
  • Artificial Linguistic Internet Computer Entity

    Artificial Linguistic Internet Computer Entity

    A.L.I.C.E. (Artificial Linguistic Internet Computer Entity), also referred to as Alicebot, or simply Alice, is a natural language processing chatbot—a program that engages in a conversation with a human by applying some heuristical pattern matching rules to the human's input. It was inspired by Joseph Weizenbaum's classical ELIZA program. It is one of the strongest programs of its type and has won the Loebner Prize, awarded to accomplished humanoid, talking robots, three times (in 2000, 2001, and 2004). The program is unable to pass the Turing test, as even the casual user will often expose its mechanistic aspects in short conversations. Alice was originally composed by Richard Wallace; it "came to life" on November 23, 1995. The program was rewritten in Java beginning in 1998. The current incarnation of the Java implementation is Program D. The program uses an XML Schema called AIML (Artificial Intelligence Markup Language) for specifying the heuristic conversation rules. Alice code has been reported to be available as open source. The AIML source is available from ALICE A.I. Foundation on Google Code and from the GitHub account of Richard Wallace. These AIML files can be run using an AIML interpreter like Program O or Program AB. == In popular culture == Spike Jonze has cited ALICE as the inspiration for his academy award-winning film Her, in which a human falls in love with a chatbot. In a New Yorker article titled “Can Humans Fall in Love with Bots?” Jonze said “that the idea originated from a program he tried about a decade ago called the ALICE bot, which engages in friendly conversation.” The Los Angeles Times reported:Though the film’s premise evokes comparisons to Siri, Jonze said he actually had the idea well before the Apple digital assistant came along, after using a program called Alicebot about ten years ago. As geek nostalgists will recall, that intriguing if at times crude software (it flunked the industry-standard Turing Test) would attempt to engage users in everyday chatter based on a database of prior conversations. Jonze liked it, and decided to apply a film genre to it. “I thought about that idea, and what if you had a real relationship with it?” Jonze told reporters. “And I used that as a way to write a relationship movie and a love story.”

    Read more →
  • Ground truth

    Ground truth

    Ground truth is information that is known to be real or true, provided by direct observation and measurement (i.e. empirical evidence) as opposed to information provided by inference. The term ground truth appeared in remote sensing literature as early as 1972, when NASA described it as essential "data about ... materials on the earth's surface" used to calibrate measurements. It was later adopted by the statistical modeling and machine learning communities. == Etymology == The Oxford English Dictionary (s.v. ground truth) records the use of the word Groundtruth in the sense of 'fundamental truth' from Henry Ellison's poem "The Siberian Exile's Tale", published in 1833. == Usage == The term "ground truth" can be used as a noun, adjective, and verb. Noun: "ground truth" (no hyphen). Example: "The ground truth is essential for training accurate models." Adjective: "ground-truth" (hyphenated compound adjective). Example: "We need to use ground-truth data to validate the model." Verb: "to ground-truth" or "to groundtruth" (compound verb,). Example: "We need to ground-truth the results to ensure their accuracy." == Statistics and machine learning == In statistics and machine learning, ground truth is the ideal expected result, used in statistical models to prove or disprove research hypotheses. "Ground truthing" is the process of gathering the good data for this test. Ground truth is typically included in labeled data. In machine learning, "ground truth" is not necessarily objectively correct or true. For example, in training AI models or relevance rankers, it may be a set of judgments made by people or inferred from user behavior, which may depend on context. For example, in Bayesian spam filtering, a supervised learning system is typically trained by examples labeled as spam and non-spam. Although these labels may be subjective or inaccurate, they are considered ground truth. True ground truth in machine learning is objective data. For example, suppose we are testing a stereo vision system to see how well it can estimate 3D positions. A calibrated laser rangefinder may provide accurate distances as ground truth. == Remote sensing == In remote sensing, "ground truth" refers to information collected at the imaged location. Ground truth allows image data to be related to real features and materials on the ground. The collection of ground truth data enables calibration of remote-sensing data, and aids in the interpretation and analysis of what is being sensed. Examples include cartography, meteorology, analysis of aerial photographs, satellite imagery and other techniques in which data are gathered at a distance. More specifically, ground truth may refer to a process in which "pixels" on a satellite image are compared to what is imaged (at the time of capture) in order to verify the contents of the "pixels" in the image (noting that the concept of "pixel" is imaging-system-dependent). In the case of a classified image, supervised classification can help to determine the accuracy of the classification by the remote sensing system which can minimize error in the classification. Ground truth is usually done on site, correlating what is known with surface observations and measurements of various properties of the features of the ground resolution cells under study in the remotely sensed digital image. The process also involves taking geographic coordinates of the ground resolution cell with GPS technology and comparing those with the coordinates of the "pixel" being studied provided by the remote sensing software to understand and analyze the location errors and how it may affect a particular study. Ground truth is important in the initial supervised classification of an image. When the identity and location of land cover types are known through a combination of field work, maps, and personal experience these areas are known as training sites. The spectral characteristics of these areas are used to train the remote sensing software using decision rules for classifying the rest of the image. These decision rules such as Maximum Likelihood Classification, Parallelopiped Classification, and Minimum Distance Classification offer different techniques to classify an image. Additional ground truth sites allow the remote sensor to establish an error matrix that validates the accuracy of the classification method used. Different classification methods may have different percentages of error for a given classification project. It is important that the remote sensor chooses a classification method that works best with the number of classifications used while providing the least amount of error. Ground truth also helps with atmospheric correction. Since images from satellites have to pass through the atmosphere, they can get distorted because of absorption in the atmosphere. So ground truth can help fully identify objects in satellite photos. === Errors of commission === An example of an error of commission is when a pixel reports the presence of a feature (such a tree) that, in reality, is absent (no tree is actually present). Ground truthing ensures that the error matrices have a higher accuracy percentage than would be the case if no pixels were ground-truthed. This value is the complement of the user's accuracy, i.e. Commission Error = 1 - user's accuracy. === Errors of omission === An example of an error of omission is when pixels of a certain type, for example, maple trees, are not classified as maple trees. The process of ground-truthing helps to ensure that the pixel is classified correctly and the error matrices are more accurate. This value is the complement of the producer's accuracy, i.e. Omission Error = 1 - producer's accuracy == Geographical information systems == In GIS the spatial data is modeled as field (like in remote sensing raster images) or as object (like in vectorial map representation). They are modeled from the real world (also named geographical reality), typically by a cartographic process (illustrated). Geographic information systems such as GIS, GPS, and GNSS, have become so widespread that the term "ground truth" has taken on special meaning in that context. If the location coordinates returned by a location method such as GPS are an estimate of a location, then the "ground truth" is the actual location on Earth. A smart phone might return a set of estimated location coordinates such as 43.87870, −103.45901. The ground truth being estimated by those coordinates is the tip of George Washington's nose on Mount Rushmore. The accuracy of the estimate is the maximum distance between the location coordinates and the ground truth. We could say in this case that the estimate accuracy is 10 meters, meaning that the point on Earth represented by the location coordinates is thought to be within 10 meters of George's nose—the ground truth. In slang, the coordinates indicate where we think George Washington's nose is located, and the ground truth is where it really is. In practice a smart phone or hand-held GPS unit is routinely able to estimate the ground truth within 6–10 meters. Specialized instruments can reduce GPS measurement error to under a centimeter. == Military usage == US military slang uses "ground truth" to refer to the facts comprising a tactical situation—as opposed to intelligence reports, mission plans, and other descriptions reflecting the conative or policy-based projections of the industrial·military complex. The term appears in the title of the Iraq War documentary film The Ground Truth (2006), and also in military publications, for example Stars and Stripes saying: "Stripes decided to figure out what the ground truth was in Iraq."

    Read more →
  • Margin classifier

    Margin classifier

    In machine learning (ML), a margin classifier is a type of classification model which is able to give an associated distance from the decision boundary for each data sample. For instance, if a linear classifier is used, the distance (typically Euclidean, though others may be used) of a sample from the separating hyperplane is the margin of that sample. The notion of margins is important in several ML classification algorithms, as it can be used to bound the generalization error of these classifiers. These bounds are frequently shown using the VC dimension. The generalization error bound in boosting algorithms and support vector machines is particularly prominent. == Margin for boosting algorithms == The margin for an iterative boosting algorithm given a dataset with two classes can be defined as follows: the classifier is given a sample pair ( x , y ) {\displaystyle (x,y)} , where x ∈ X {\displaystyle x\in X} is a domain space and y ∈ Y = { − 1 , + 1 } {\displaystyle y\in Y=\{-1,+1\}} is the sample's label. The algorithm then selects a classifier h j ∈ C {\displaystyle h_{j}\in C} at each iteration j {\displaystyle j} where C {\displaystyle C} is a space of possible classifiers that predict real values. This hypothesis is then weighted by α j ∈ R {\displaystyle \alpha _{j}\in R} as selected by the boosting algorithm. At iteration t {\displaystyle t} , the margin of a sample x {\displaystyle x} can thus be defined as y ∑ j t α j h j ( x ) ∑ | α j | . {\displaystyle {\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}.} By this definition, the margin is positive if the sample is labeled correctly, or negative if the sample is labeled incorrectly. This definition may be modified and is not the only way to define the margin for boosting algorithms. However, there are reasons why this definition may be appealing. == Examples of margin-based algorithms == Many classifiers can give an associated margin for each sample. However, only some classifiers utilize information of the margin while learning from a dataset. Many boosting algorithms rely on the notion of a margin to assign weight to samples. If a convex loss is utilized (as in AdaBoost or LogitBoost, for instance) then a sample with a higher margin will receive less (or equal) weight than a sample with a lower margin. This leads the boosting algorithm to focus weight on low-margin samples. In non-convex algorithms (e.g., BrownBoost), the margin still dictates the weighting of a sample, though the weighting is non-monotone with respect to the margin. == Generalization error bounds == One theoretical motivation behind margin classifiers is that their generalization error may be bound by the algorithm parameters and a margin term. An example of such a bound is for the AdaBoost algorithm. Let S {\displaystyle S} be a set of m {\displaystyle m} data points, sampled independently at random from a distribution D {\displaystyle D} . Assume the VC-dimension of the underlying base classifier is d {\displaystyle d} and m ≥ d ≥ 1 {\displaystyle m\geq d\geq 1} . Then, with probability 1 − δ {\displaystyle 1-\delta } , we have the bound: P D ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ 0 ) ≤ P S ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ θ ) + O ( 1 m d log 2 ⁡ ( m / d ) / θ 2 + log ⁡ ( 1 / δ ) ) {\displaystyle P_{D}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq 0\right)\leq P_{S}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq \theta \right)+O\left({\frac {1}{\sqrt {m}}}{\sqrt {d\log ^{2}(m/d)/\theta ^{2}+\log(1/\delta )}}\right)} for all θ > 0 {\displaystyle \theta >0} .

    Read more →
  • Discrete diffusion model

    Discrete diffusion model

    In machine learning, discrete diffusion models are a class of diffusion models, which themselves are a class of latent variable generative models. Each discrete diffusion model consists of two major components: the forward jump diffusion process, and the reverse jump diffusion process. The goal of diffusion modeling is, given a given dataset and a forward process, to learn a model for the reverse process, such that the reverse process can generate new elements that are distributed similarly as the original dataset. A trained discrete diffusion model can be sampled in many ways, which trades off computational efficiency and sample quality. In general, higher quality data can be obtained, but at the price of higher computational cost. In standard diffusion modeling, the diffusion process takes place over a state space that is continuous space of R n {\displaystyle \mathbb {R} ^{n}} , but over a discrete set S {\displaystyle S} . A discrete set is simply a set where one cannot speak of "infinitesimally close" points. Points can be more or less separated from each other, but the separation is always a finite number. This in particular means the standard framework of continuous diffusion does not apply, since it uses gaussian noise, which is continuous. Nevertheless, an analogous theory can be produced. Discrete diffusion is usually used for language modeling. In practice, the state space S {\displaystyle S} is not only discrete, but finite, so this is what we will assume from now on. == Continuous time Markov process == In the case of continuous state space, during the forward discrete diffusion process, at each step t → t + d t {\displaystyle t\to t+dt} , we mix in an infinitesimal amount of gaussian noise d x t = − 1 2 β ( t ) x t d t + β ( t ) d W t {\displaystyle dx_{t}=-{\frac {1}{2}}\beta (t)x_{t}dt+{\sqrt {\beta (t)}}dW_{t}} . This changes the probability density function, by first a convolution with the density of a gaussian, followed by a scaling. In the case of discrete state space, the gaussian noise must be replaced by a noise that takes values over a finite set. For example, if the noise is the uniform distribution over S {\displaystyle S} , then the probability distribution at time t + d t {\displaystyle t+dt} satisfies q t + d t ( x ) = ( 1 − d t ) q t ( x ) + d t ( 1 | S | ∑ y ∈ S q t ( y ) ) {\displaystyle q_{t+dt}(x)=(1-dt)q_{t}(x)+dt\left({\frac {1}{|S|}}\sum _{y\in S}q_{t}(y)\right)} More succinctly, ∂ t q t ( x ) = − ( 1 − 1 | S | ) q t ( x ) + ∑ y ∈ S , y ≠ x 1 | S | q t ( y ) {\displaystyle \partial _{t}q_{t}(x)=-\left(1-{\frac {1}{|S|}}\right)q_{t}(x)+\sum _{y\in S,y\neq x}{\frac {1}{|S|}}q_{t}(y)} In general, we do not need to convolve with a uniformly distributed noise, but with an arbitrary noise process. That is, we use an arbitrary matrix Q t {\displaystyle Q_{t}} such that ∂ t q t ( y ) = ∑ x ∈ S Q t ( y , x ) q t ( x ) {\displaystyle \partial _{t}q_{t}(y)=\sum _{x\in S}Q_{t}(y,x)q_{t}(x)} where Q t {\displaystyle Q_{t}} is called the rate matrix. Any matrix may be used as a rate matrix if it has non-negative off-diagonals, and each column sums to 0: Q t ( y , x ) ≥ 0 ∀ y ≠ x , ∑ y ∈ S Q t ( y , x ) = 0 ∀ x {\displaystyle Q_{t}(y,x)\geq 0\quad \forall y\neq x,\quad \sum _{y\in S}Q_{t}(y,x)=0\quad \forall x} A continuous time Markov chain (CTMC) is defined by a continuous function Q {\displaystyle Q} that maps any time t ∈ [ 0 , T ) {\displaystyle t\in [0,T)} to a rate matrix Q t {\displaystyle Q_{t}} . Given the function Q {\displaystyle Q} , time-evolution under the CTMC is done as follows: Given state x t {\displaystyle x_{t}} at time t {\displaystyle t} , and given an infinitesimal d t {\displaystyle dt} , the state at t + d t {\displaystyle t+dt} is x t + d t {\displaystyle x_{t+dt}} , such that Pr ( x t + d t | x t ) = { 1 + Q t ( x t + d t , x t ) d t if x t + d t = x t Q t ( x t + d t , x t ) d t else {\displaystyle \Pr(x_{t+dt}|x_{t})={\begin{cases}1+Q_{t}(x_{t+dt},x_{t})dt&{\text{if }}x_{t+dt}=x_{t}\\Q_{t}(x_{t+dt},x_{t})dt&{\text{else}}\end{cases}}} This implies that the probability distribution function evolves according to ∂ t q t ( y ) = ∑ x ∈ S Q t ( y , x ) q t ( x ) {\displaystyle \partial _{t}q_{t}(y)=\sum _{x\in S}Q_{t}(y,x)q_{t}(x)} which is what we previously specified. === Backward process === Similarly to the case of continuous diffusion, in discrete diffusion, there exists a backward diffusion process Q ¯ t {\displaystyle {\bar {Q}}_{t}} : s ( x , t ) y := q t ( y ) q t ( x ) , Q ¯ t ( y , x ) := { s ( x , t ) y Q t ( x , y ) if y ≠ x − ∑ y : y ≠ x Q ¯ t ( y , x ) if y = x {\displaystyle s(x,t)_{y}:={\frac {q_{t}(y)}{q_{t}(x)}},\quad {\bar {Q}}_{t}(y,x):={\begin{cases}s(x,t)_{y}Q_{t}(x,y)&{\text{if }}y\neq x\\-\sum _{y:y\neq x}{\bar {Q}}_{t}(y,x)&{\text{if }}y=x\end{cases}}} where s ( x , t ) y {\displaystyle s(x,t)_{y}} should be interpreted as the discrete score or concrete score, since, abusing notation a bit, the score function is ∇ ln ⁡ ρ t ( x ) = 1 d x ( ρ t ( x + d x ) ρ t ( x ) − 1 ) {\displaystyle \nabla \ln \rho _{t}(x)={\frac {1}{dx}}\left({\frac {\rho _{t}(x+dx)}{\rho _{t}(x)}}-1\right)} . If we picture the distribution q t {\displaystyle q_{t}} as a bunch of point-masses, one per state x ∈ S {\displaystyle x\in S} , then the forward diffusion from time t {\displaystyle t} to t + d t {\displaystyle t+dt} is performed by removing Q t ( x , y ) q t ( y ) d t {\displaystyle Q_{t}(x,y)q_{t}(y)dt} from the mass at y {\displaystyle y} and moving it to the mass at x {\displaystyle x} , for each pair x ≠ y {\displaystyle x\neq y} . Thus, the process is reversed in detail by the CTMC defined by Q ¯ {\displaystyle {\bar {Q}}} , since Q ¯ t ( y , x ) q t ( x ) = Q t ( x , y ) q t ( y ) {\displaystyle {\bar {Q}}_{t}(y,x)q_{t}(x)=Q_{t}(x,y)q_{t}(y)} . Given Q ¯ t {\displaystyle {\bar {Q}}_{t}} , if we have a way to sample from q t {\displaystyle q_{t}} , then we can sample from q t − d t {\displaystyle q_{t-dt}} by first sampling x t ∼ q t {\displaystyle x_{t}\sim q_{t}} , then sampling x t − d t {\displaystyle x_{t-dt}} according to Pr ( x t − d t | x t ) = { 1 + Q ¯ t ( x t − d t , x t ) d t if x t − d t = x t Q ¯ t ( x t − d t , x t ) d t else {\displaystyle \Pr(x_{t-dt}|x_{t})={\begin{cases}1+{\bar {Q}}_{t}(x_{t-dt},x_{t})dt&{\text{if }}x_{t-dt}=x_{t}\\{\bar {Q}}_{t}(x_{t-dt},x_{t})dt&{\text{else}}\end{cases}}} === Overall plan of score-matching discrete diffusion modeling === Similar to score-matching continuous diffusion, score-matching discrete diffusion is a method to sample an initial distribution. If we have a certain function s θ {\displaystyle s_{\theta }} that approximates the true score function s θ ( x , t ) y ≈ s ( x , t ) y {\displaystyle s_{\theta }(x,t)_{y}\approx s(x,t)_{y}} , then it allows a corresponding Q ¯ θ {\displaystyle {\bar {Q}}^{\theta }} to be defined in the same way. If we also have a base distribution q base {\displaystyle q_{\text{base}}} such that it is easy to sample from, and approximately equal to the true terminal distribution q base ≈ q T {\displaystyle q_{\text{base}}\approx q_{T}} , then we can perform the backward CTMC with Q ¯ θ {\displaystyle {\bar {Q}}^{\theta }} and q T θ := q terminal {\displaystyle q_{T}^{\theta }:=q_{\text{terminal}}} . When both approximations are good, the backward CTMC would give q 0 θ ≈ q 0 {\displaystyle q_{0}^{\theta }\approx q_{0}} . This is the idea of score-matching discrete diffusion modeling. If q data {\displaystyle q_{\text{data}}} is sharp, in the sense that for some x , x ′ {\displaystyle x,x'} , we have q data ( x ) ≫ q data ( x ′ ) {\displaystyle q_{\text{data}}(x)\gg q_{\text{data}}(x')} , then the score function would diverge as 1 / t {\displaystyle 1/t} at the t → 0 {\displaystyle t\to 0} limit. To avoid this in practice, it is common to use early stopping, which is to stop the backward process at some time δ > 0 {\displaystyle \delta >0} , and sample from q δ θ {\displaystyle q_{\delta }^{\theta }} instead of q 0 θ {\displaystyle q_{0}^{\theta }} . === Tractable forward processes === The theory of CTMC works for any continuous choice of rate matrices Q {\displaystyle Q} . However, most choices are computationally expensive and cannot be used in practice. In the case of continuous diffusion, the gaussian noise is used for the simple reason that the sum of any number of gaussians is still a gaussian. This allows one to sample any x t ∼ ρ t {\displaystyle x_{t}\sim \rho _{t}} by sampling a single x 0 ∼ ρ 0 {\displaystyle x_{0}\sim \rho _{0}} , followed by a single gaussian noise z ∼ N ( 0 , I ) {\displaystyle z\sim {\mathcal {N}}(0,I)} , and let x t = α ¯ t x 0 + σ t z {\displaystyle x_{t}={\sqrt {{\bar {\alpha }}_{t}}}x_{0}+\sigma _{t}z} , without needing any x s {\displaystyle x_{s}} for any 0 < s < t {\displaystyle 0 Read more →

  • Resolution enhancement technology

    Resolution enhancement technology

    Resolution enhancement technology (RET) is a form of image processing technology used to manipulate dot characteristics popular among laser printer and inkjet printer manufacturers. Closely related RET techniques are also used in VLSI photolithography manufacturing technology, in particular in relation to 90 nanometre technology. Resolution refers to the sharpness of image detail, smoothness of curved lines, and the faithful reproduction of an image. In both cases, RET uses pre-compensation of the image in order to try to mitigate the effects of the printing process. Among the major issues in RET in VLSI technology are the fundamental properties of a wave: amplitude, phase, and direction.

    Read more →
  • Confirmatory blockmodeling

    Confirmatory blockmodeling

    Confirmatory blockmodeling is a deductive approach in blockmodeling, where a blockmodel (or part of it) is prespecify before the analysis, and then the analysis is fit to this model. When only a part of analysis is prespecify (like individual cluster(s) or location of the block types), it is called partially confirmatory blockmodeling. This is so-called indirect approach, where the blockmodeling is done on the blockmodel fitting (e.g., a priori hypothesized blockmodel). Opposite approach to the confirmatory blockmodeling is an inductive exploratory blockmodeling.

    Read more →
  • Population model (evolutionary algorithm)

    Population model (evolutionary algorithm)

    The population model of an evolutionary algorithm (EA) describes the structural properties of its population to which its members are subject. A population is the set of all proposed solutions of an EA considered in one iteration, which are also called individuals according to the biological role model. The individuals of a population can generate further individuals as offspring with the help of the genetic operators of the procedure. The simplest and widely used population model in EAs is the global or panmictic model, which corresponds to an unstructured population. It allows each individual to choose any other individual of the population as a partner for the production of offspring by crossover, whereby the details of the selection are irrelevant as long as the fitness of the individuals plays a significant role. Due to global mate selection, the genetic information of even slightly better individuals can prevail in a population after a few generations (iteration of an EA), provided that no better other offspring have emerged in this phase. If the solution found in this way is not the optimum sought, that is called premature convergence. This effect can be observed more often in panmictic populations. In nature global mating pools are rarely found. What prevails is a certain and limited isolation due to spatial distance. The resulting local neighbourhoods initially evolve independently and mutants have a higher chance of persisting over several generations. As a result, genotypic diversity in the gene pool is preserved longer than in a panmictic population. It is therefore obvious to divide the previously global population by substructures. Two basic models were introduced for this purpose, the island models, which are based on a division of the population into fixed subpopulations that exchange individuals from time to time, and the neighbourhood models, which assign individuals to overlapping neighbourhoods, also known as cellular genetic or evolutionary algorithms (cGA or cEA). The associated division of the population also suggests a corresponding parallelization of the procedure. For this reason, the topic of population models is also frequently discussed in the literature in connection with the parallelization of EAs. == Island models == In the island model, also called the migration model or coarse grained model, evolution takes place in strictly divided subpopulations. These can be organised panmictically, but do not have to be. From time to time an exchange of individuals takes place, which is called migration. The time between an exchange is called an epoch and its end can be triggered by various criteria: E.g. after a given time or given number of completed generations, or after the occurrence of stagnation. Stagnation can be detected, for example, by the fact that no fitness improvement has occurred in the island for a given number of generations. Island models introduce a variety of new strategy parameters: Number of subpopulations Size of the subpopulations Neighbourhood relations between islands: they determine which islands are considered neighbouring and can thus exchange individuals, see picture of a simple unidirectional ring (black arrows) and its extension by additional bidirectional neighbourhood relations (additional green arrows) Criteria for the termination of an epoch, synchronous or asynchronous migration Migration rate: number or proportion of individuals involved in migration. Migrant selection: There are many alternatives for this. E.g. the best individuals can replace the worst or randomly selected ones. Depending on the migration rate, this can affect one or more individuals at a time. With these parameters, the selection pressure can be influenced to a considerable extent. For example, it increases with the interconnectedness of the islands and decreases with the number of subpopulations or the epoch length. == Neighbourhood models or cellular evolutionary algorithms == The neighbourhood model, also called diffusion model or fine grained model, defines a topological neighbouhood relation between the individuals of a population that is independent of their phenotypic properties. The fundamental idea of this model is to provide the EA population with a special structure defined as a connected graph, in which each vertex is an individual that communicates with its nearest neighbours. Particularly, individuals are conceptually set in a toroidal mesh, and are only allowed to recombine with close individuals. This leads to a kind of locality known as isolation by distance. The set of potential mates of an individual is called its neighbourhood or deme. The adjacent figure illustrates that by showing two slightly overlapping neighbourhoods of two individuals marked yellow, through which genetic information can spread between the two demes. It is known that in this kind of algorithm, similar individuals tend to cluster and create niches that are independent of the deme boundaries and, in particular, can be larger than a deme. There is no clear borderline between adjacent groups, and close niches could be easily colonized by competitive ones and maybe merge solution contents during this process. Simultaneously, farther niches can be affected more slowly. EAs with this type of population are also well known as cellular EAs (cEA) or cellular genetic algorithms (cGA). A commonly used structure for arranging the individuals of a population is a 2D toroidal grid, although the number of dimensions can be easily extended (to 3D) or reduced (to 1D, e.g. a ring, see the figure on the right). The neighbourhood of a particular individual in the grid is defined in terms of the Manhattan distance from it to others in the population. In the basic algorithm, all the neighbourhoods have the same size and identical shapes. The two most commonly used neighbourhoods for two-dimensional cEAs are L5 and C9, see the figure on the left. Here, L stands for Linear while C stands for Compact. Each deme represents a panmictic subpopulation within which mate selection and the acceptance of offspring takes place by replacing the parent. The rules for the acceptance of offspring are local in nature and based on the neighbourhood: for example, it can be specified that the best offspring must be better than the parent being replaced or, less strictly, only better than the worst individual in the deme. The first rule is elitist and creates a higher selective pressure than the second non-elitist rule. In elitist EAs, the best individual of a population always survives. In this respect, they deviate from the biological model. The overlap of the neighbourhoods causes a mostly slow spread of genetic information across the neighbourhood boundaries, hence the name diffusion model. A better offspring now needs more generations than in panmixy to spread in the population. This promotes the emergence of local niches and their local evolution, thus preserving genotypic diversity over a longer period of time. The result is a better and dynamic balance between breadth and depth search adapted to the search space during a run. Depth search takes place in the niches and breadth search in the niche boundaries and through the evolution of the different niches of the whole population. For the same neighbourhood size, the spread of genetic information is larger for elongated figures like L9 than for a block like C9, and again significantly larger than for a ring. This means that ring neighbourhoods are well suited for achieving high quality results, even if this requires comparatively long run times. On the other hand, if one is primarily interested in fast and good, but possibly suboptimal results, 2D topologies are more suitable. == Comparison == When applying both population models to genetic algorithms, evolutionary strategy and other EAs, the splitting of a total population into subpopulations usually reduces the risk of premature convergence and leads to better results overall more reliably and faster than would be expected with panmictic EAs. Island models have the disadvantage compared to neighbourhood models that they introduce a large number of new strategy parameters. Despite the existing studies on this topic in the literature, a certain risk of unfavourable settings remains for the user. With neighbourhood models, on the other hand, only the size of the neighbourhood has to be specified and, in the case of the two-dimensional model, the choice of the neighbourhood figure is added. == Parallelism == Since both population models imply population partitioning, they are well suited as a basis for parallelizing an EA. This applies even more to cellular EAs, since they rely only on locally available information about the members of their respective demes. Thus, in the extreme case, an independent execution thread can be assigned to each individual, so that the entire cEA can run on a parallel hardware platform. The island model also supports p

    Read more →