FERET (facial recognition technology)

FERET (facial recognition technology)

The Facial Recognition Technology (FERET) program was a government-sponsored project that aimed to create a large, automatic face-recognition system for intelligence, security, and law enforcement purposes. The program began in 1993 under the combined leadership of Dr. Harry Wechsler at George Mason University (GMU) and Dr. Jonathon Phillips at the Army Research Laboratory (ARL) in Adelphi, Maryland and resulted in the development of the Facial Recognition Technology (FERET) database. The goal of the FERET program was to advance the field of face recognition technology by establishing a common database of facial imagery for researchers to use and setting a performance baseline for face-recognition algorithms. Potential areas where this face-recognition technology could be used include: Automated searching of mug books using surveillance photos Controlling access to restricted facilities or equipment Checking the credentials of personnel for background and security clearances Monitoring airports, border crossings, and secure manufacturing facilities for particular individuals Finding and logging multiple appearances of individuals over time in surveillance videos Verifying identities at ATM machines Searching photo ID records for fraud detection The FERET database has been used by more than 460 research groups and is currently managed by the National Institute of Standards and Technology (NIST). By 2017, the FERET database has been used to train artificial intelligence programs and computer vision algorithms to identify and sort faces. == History == The origin of facial recognition technology is largely attributed to Woodrow Wilson Bledsoe and his work in the 1960s, when he developed a system to identify faces from a database of thousands of photographs. The FERET program first began as a way to unify a large body of face-recognition technology research under a standard database. Before the program's inception, most researchers created their own facial imagery database that was attuned to their own specific area of study. These personal databases were small and usually consisted of images from less than 50 individuals. The only notable exceptions were the following: Alex Pentland’s database of around 7500 facial images at the Massachusetts Institute of Technology (MIT) Joseph Wilder's database of around 250 individuals at Rutgers University Christoph von der Malsburg’s database of around 100 facial images at the University of Southern California (USC) The lack of a common database made it difficult to compare the results of face recognition studies in the scientific literature because each report involved different assumptions, scoring methods, and images. Most of the papers that were published did not use images from a common database nor follow a standard testing protocol. As a result, researchers were unable to make informed comparisons between the performances of different face-recognition algorithms. In September 1993, the FERET program was spearheaded by Dr. Harry Wechsler and Dr. Jonathon Phillips under the sponsorship of the U.S. Department of Defense Counterdrug Technology Development Program through DARPA with ARL serving as technical agent. === Phase I === The first facial images for the FERET database were collected from August 1993 to December 1994, a time period known as Phase I. The pictures were initially taken with a 35-mm camera at both GMU and ARL facilities, and the same physical setup was used in each photography session to keep the images consistent. For each individual, the pictures were taken in sets, including two frontal views, a right and left profile, a right and left quarter profile, a right and left half profile, and sometimes at five extra locations. Therefore, a set of images consisted of 5 to 11 images per person. At the end of Phase I, the FERET database had collected 673 sets of images, resulting in over 5000 total images. At the end of Phase I, five organizations were given the opportunity to test their face-recognition algorithm on the newly created FERET database in order to compare how they performed against each other. There five principal investigators were: MIT, led by Alex Pentland Rutgers University, led by Joseph Wilder The Analytic Science Company (TASC), led by Gale Gordon The University of Illinois at Chicago (UIC) and the University of Illinois at Urbana-Champaign, led by Lewis Sadler and Thomas Huang USC, led by Christoph von der Malsburg During this evaluation, three different automatic tests were given to the principal investigators without human intervention: The large gallery test, which served to baseline how algorithms performed against a database when it has not been properly tuned. The false-alarm test, which tested how well the algorithm monitored an airport for suspected terrorists. The rotation test, which measured how well the algorithm performed when the images of an individual in the gallery had different poses compared to those in the probe set. For most of the test trials, the algorithms developed by USC and MIT managed to outperform the other three algorithms for the Phase I evaluation. === Phase II === Phase II began after Phase I, and during this time, the FERET database acquired more sets of facial images. By the start of the Phase II evaluation in March 1995, the database contained 1109 sets of images for a total of 8525 images of 884 individuals. During the second evaluation, the same algorithms from the Phase I evaluation were given a single test. However, the database now contained significantly more duplicate images (463, compared to the previous 60), making the test more challenging. === Phase III === Afterwards, the FERET program entered Phase III where another 456 sets of facial images were added to the database. The Phase III evaluation, which took place in September 1996, aimed to not only gauge the progress of the algorithms since the Phase I assessment but also identify the strengths and weaknesses of each algorithm and determine future objectives for research. By the end of 1996, the FERET database had accumulated a total of 14,126 facial images pertaining to 1199 different individuals as well as 365 duplicate sets of images. As a result of the FERET program, researchers were able to establish a common baseline for comparing different face-recognition algorithms and create a large standard database of facial images that is open for research. In 2003, DARPA released a high-resolution, 24-bit color version of the images in the FERET database (existing reference).

Reverse correlation technique

The reverse correlation technique is a data driven study method used primarily in psychological and neurophysiological research. This method earned its name from its origins in neurophysiology, where cross-correlations between white noise stimuli and sparsely occurring neuronal spikes could be computed quicker when only computing it for segments preceding the spikes. The term has since been adopted in psychological experiments that usually do not analyze the temporal dimension, but also present noise to human participants. In contrast to the original meaning, the term is here thought to reflect that the standard psychological practice of presenting stimuli of defined categories to the participants is "reversed": Instead, the participant's mental representations of categories are estimated from interactions of the presented noise and the behavioral responses. It is used to create composite pictures of individual and/or group mental representations of various items (e.g. faces, bodies, and the self) that depict characteristics of said items (e.g. trustworthiness and self-body image). This technique is helpful when evaluating the mental representations of those with and without mental illnesses. == Terms == This technique utilizes spike-triggered average to explain what areas of signal and noise in an image are valuable for the given research question. Signal is information used to produce objects of value that help explain and connect the world around us. Noise is commonly referred to as unwanted signal that obscures the information that the signal is trying to present. Most importantly for reverse correlation studies, noise is randomly varying information. To determine the areas of importance using reverse correlation, noise is applied to a base image and then evaluated by observers. A base image is any image void of noise that relates to the research question. A base image that has noise superimposed on top is the stimuli that is presented to and evaluated by participants. Each time a new set of stimuli is presented to a participant, this is known as a trial. After a participant has responded to hundreds to thousands of trials, a researcher is ready to create a classification image. A classification image (abbreviated as "CI" in some studies) is a single image that represents the average noise patterns in the images selected by participants. A classification image can also be computed for groups by averaging the individuals’ classification images. These classification images are what researchers use to interpret the data and draw conclusions. As a whole, the reverse correlation method is a process that results in a composite image (from an individual or group) that can be used to estimate and interpret mental representations. == Basic study layout == The reverse correlation method is typically executed as an in-lab computer experiment. This method follows four broad steps. Each of the following steps are described in greater detail below. After creating a research question and determining that the reverse correlation method is the most suitable technique to answer the question, a researcher must (1) design randomly varying stimuli. After the stimuli have been prepared, a researcher should (2) collect data from participants who will see and respond to approximately 300 -1,000 trials. Each trial will either consist of one or two images (side by side) derived from the same base image with noise superimposed on top. Participant responses will depend on the chosen study design; if a researcher presents only one image at a time, participants rate the image on a 4pt scale, but when two images are shown, the participant is asked to choose which best aligns with the given category (e.g. choose the image that looks the most aggressive). Once all of the data is collected, the researcher will (3) compute classification images for each participant and using those images compute group classification images. Finally, with the classification images available, the researcher will (4) evaluate the images and draw conclusions about their results. === Step 1: making stimuli === When designing the stimuli for a reverse correlation study, the two primary factors that one should consider are (1) the base image and (2) the noise that will be used. While not all bases are images per se, the majority are and for this reason the base is typically referred to as a base image. The base image should represent whatever the research question is addressing. For example, if you are interested in peoples’ mental representations of Chinese people, it would not make sense to use a base image of a Spanish or Caucasian person. Again, if you are interested in the mental representations of male vocal patterns, it would make the most sense to use a base vocal pattern that has been produced by a male. Having a base is important because it provides a kind of anchor for participants to work from. When there is no base image, the number of trials that are required increases dramatically, thus making it harder to collect data. While there are studies that have excluded a base image, (e.g. the S study), for more elaborate and nuanced research questions, it is important to have a base image that is a fair representation of what participants are being asked to categorize. Photographs of faces are generally the most popular base image. Although the reverse correlation method is capable of investigating a wide variety of research questions, the most common application of the method is for evaluating faces on a single trait. Reverse correlation studies that address evaluations of the face are sometimes referred to as being a face space reverse correlation model (FSRCM). Thankfully, there are existing databases for face images of varying demographics and emotion that work well as base images. The reverse correlation method can also be used to help researchers identify what areas of an image (e.g. the areas on the face) have diagnostic value. In order to identify these areas of value, researchers start by minimizing the space a participant can pull information from. By imposing a “mask” on an image (e.g. blur an image while leaving random areas un-blurred), this reduces the information individuals might see, and forces them to focus on certain areas. Then, if/when participants are able to correctly identify an image with a trait repeatedly, we can draw conclusions about what areas have diagnostic value. While faces and visual stimuli are the most popular, this is not the only stimuli that can be used in a reverse correlation study. This method was originally designed for auditory stimuli which allows researchers to investigate how perceivers interpret auditory information and create trait based attributions to different sound patterns. For example, by segmenting a vocal recording of a single word (total sound time 426 ms) into six segments (71 ms each), and varying each segment's pitch using Gaussian distributions, researchers were able to uncover what vocal patterns people associated with certain traits. Specifically, this study investigated how listeners rated sound clips of the word “really” as sounding more interrogative (i.e. like the more common reverse correlation studies this study had participants listen to two sound clips per trial, choose which fit the category the best, and then created an average of the pitch contours). Beyond face and auditory perception, research utilizing the reverse correlation method has expanded to investigate how individuals see three-dimensional objects in images with noise (but no signal). After selecting your base image, regardless of what the image is, it is helpful to apply a Gaussian blur to smooth noise in the image. While noise will be applied later, it is helpful to reduce existing noise in the photo before applying your chosen noise. There are three primary choices when it comes to noise: white noise, sine-wave noise, and Gabor noise. The latter two of these constrain the configurations that the noise can have, and because of this white noise is usually the most commonly used. Regardless of the type of noise that is chosen, it is crucial that the noise randomly varies. === Step 2: data collection === Once the stimuli for the study has been developed, the researcher must make a few decisions before actually collecting the data. The researcher must come to a conclusion on how many stimuli will be presented at a time and how many trials the participants will see. In terms of stimuli presentation, a researcher can choose from either a 2-Image Forced Choice (2IFC) or a 4-Alternative Forced Choice (4AFC). The 2IFC presents two images at once (side by side) and requires participants to choose between the two on a specified category (e.g. which image looks the most like a male). Typically the noise from the left image is the mathematical inverse of the noise from the right image. This method was developed to better answer questions that could n

Intelligent character recognition

Intelligent character recognition (ICR) is a method of extracting handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data from physical documents. ICR is used to organize paper-based unstructured data by scanning documents, extracting information, and adapting extracted data for database storage. ICR algorithms collaborate with OCR to automate data entry from forms by removing the need for keystrokes. It has a high degree of accuracy and is a dependable method for processing various handwritten media quickly. == Capabilities == Most ICR software has a self-learning neural network-based algorithms, which automatically update the recognition database for new handwriting patterns. It extends the usefulness of scanning devices for the purpose of document processing, from printed character recognition (a function of OCR) to hand-written matter recognition. Because this process is involved in recognizing hand writing, accuracy levels may, in some circumstances, not be very good but can achieve 97%+ accuracy rates in reading handwriting in structured forms. Often to achieve these high recognition rates several read engines are used within the software and each is given elective voting rights to determine the true reading of characters. In numeric fields, engines which are designed to read numbers take preference, while in alpha fields, engines designed to read hand written letters have higher elective rights. When used in conjunction with a bespoke interface hub, hand-written data can be automatically populated into a back office system avoiding laborious manual keying and can be more accurate than traditional human data entry. === Automated forms processing === An important development of ICR was the invention of automated forms processing in 1993 by Joseph Corcoran who was awarded a patent on the invention. This involved a three-stage process of capturing the image of the form to be processed by ICR and preparing it to enable the ICR engine to give best results, then capturing the information using the ICR engine and finally processing the results to automatically validate the output from the ICR engine. This application of ICR increased the usefulness of the technology and made it applicable for use with real world forms in normal business applications. Modern software applications use ICR as a technology of recognizing text in forms filled in by hand (hand-printed). == Differences between ICR and OCR == === OCR === Optical character recognition (OCR) is commonly considered to apply to any recognition technique that reads machine printed text. An example of a traditional OCR use case would be to translate the characters from an image of a printed document, such as a book page, newspaper clipping, or legal contract, into a separate file that could be searched and updated with a word processor or document viewer. It's also quite helpful for automating the processing of forms. Information can be swiftly extracted from form fields and entered into another application, like a spreadsheet or database, by zonally applying the OCR engine to those fields. Yet, data is typically manually input rather than typed into form fields. Character identification becomes even more challenging while reading handwritten material. The diversity of more than 700,000 printed font variants is tiny compared to the near unlimited variations in hand-printed characters. The recognition program must take into account not just stylistic differences but also the kind of writing implement used, the standard of the paper, errors, hand stability, and smudges or running ink. === ICR === Intelligent character recognition (ICR) makes use of continuously improving algorithms to collect more information about the variances in hand-printed characters and more precisely identify them. ICR, which was created in the early 1990s to aid in the automation of forms processing, enables the conversion of manually entered data into text that is simple to read, search for, and change. When used to read characters that are obviously divided into distinct areas or zones, such as fixed fields seen on many structured forms, it works best. Both OCR and ICR can be configured to read a variety of languages; however, limiting the expected character set to a smaller number of languages will produce better recognition outcomes. ICR cannot read cursive handwriting since it must still be able to assess each character individually. While writing in cursive, it might be difficult to tell where one character ends and another one begins, and there are more differences across samples than when hand-printing text. A more recent method called intelligent word recognition (IWR) focuses on reading a word in context rather than recognizing individual characters. == Intelligent word recognition == Intelligent word recognition (IWR) can recognize and extract not only printed-handwritten information, cursive handwriting as well. ICR recognizes on the character-level, whereas IWR works with full words or phrases. Capable of capturing unstructured information from every day pages, IWR is said to be more evolved than hand print ICR. Not meant to replace conventional ICR and OCR systems, IWR is optimized for processing real-world documents that contain mostly free-form, hard-to-recognize data fields that are inherently unsuitable for ICR. This means that the highest and best use of IWR is to eliminate a high percentage of the manual entry of handwritten data and run-on hand print fields on documents that otherwise could be keyed only by humans.

Wasserstein GAN

The Wasserstein Generative Adversarial Network (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches". Compared with the original GAN discriminator, the Wasserstein GAN discriminator provides a better learning signal to the generator. This allows the training to be more stable when generator is learning distributions in very high dimensional spaces. == Motivation == === The GAN game === The original GAN method is based on the GAN game, a zero-sum game with 2 players: generator and discriminator. The game is defined over a probability space ( Ω , B , μ r e f ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{ref})} , The generator's strategy set is the set of all probability measures μ G {\displaystyle \mu _{G}} on ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} , and the discriminator's strategy set is the set of measurable functions D : Ω → [ 0 , 1 ] {\displaystyle D:\Omega \to [0,1]} . The objective of the game is L ( μ G , D ) := E x ∼ μ r e f [ ln ⁡ D ( x ) ] + E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] . {\displaystyle L(\mu _{G},D):=\mathbb {E} _{x\sim \mu _{ref}}[\ln D(x)]+\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))].} The generator aims to minimize it, and the discriminator aims to maximize it. A basic theorem of the GAN game states that Repeat the GAN game many times, each time with the generator moving first, and the discriminator moving second. Each time the generator μ G {\displaystyle \mu _{G}} changes, the discriminator must adapt by approaching the ideal D ∗ ( x ) = d μ r e f d ( μ r e f + μ G ) . {\displaystyle D^{}(x)={\frac {d\mu _{ref}}{d(\mu _{ref}+\mu _{G})}}.} Since we are really interested in μ r e f {\displaystyle \mu _{ref}} , the discriminator function D {\displaystyle D} is by itself rather uninteresting. It merely keeps track of the likelihood ratio between the generator distribution and the reference distribution. At equilibrium, the discriminator is just outputting 1 2 {\displaystyle {\frac {1}{2}}} constantly, having given up trying to perceive any difference. Concretely, in the GAN game, let us fix a generator μ G {\displaystyle \mu _{G}} , and improve the discriminator step-by-step, with μ D , t {\displaystyle \mu _{D,t}} being the discriminator at step t {\displaystyle t} . Then we (ideally) have L ( μ G , μ D , 1 ) ≤ L ( μ G , μ D , 2 ) ≤ ⋯ ≤ max μ D L ( μ G , μ D ) = 2 D J S ( μ r e f ‖ μ G ) − 2 ln ⁡ 2 , {\displaystyle L(\mu _{G},\mu _{D,1})\leq L(\mu _{G},\mu _{D,2})\leq \cdots \leq \max _{\mu _{D}}L(\mu _{G},\mu _{D})=2D_{JS}(\mu _{ref}\|\mu _{G})-2\ln 2,} so we see that the discriminator is actually lower-bounding D J S ( μ r e f ‖ μ G ) {\displaystyle D_{JS}(\mu _{ref}\|\mu _{G})} . === Wasserstein distance === Thus, we see that the point of the discriminator is mainly as a critic to provide feedback for the generator, about "how far it is from perfection", where "far" is defined as Jensen–Shannon divergence. Naturally, this brings the possibility of using a different criteria of farness. There are many possible divergences to choose from, such as the f-divergence family, which would give the f-GAN. The Wasserstein GAN is obtained by using the Wasserstein metric, which satisfies a "dual representation theorem" that renders it highly efficient to compute: A proof can be found in the main page on Wasserstein metric. == Definition == By the Kantorovich-Rubenstein duality, the definition of Wasserstein GAN is clear:A Wasserstein GAN game is defined by a probability space ( Ω , B , μ r e f ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{ref})} , where Ω {\displaystyle \Omega } is a metric space, and a constant K > 0 {\displaystyle K>0} . There are 2 players: generator and discriminator (also called "critic"). The generator's strategy set is the set of all probability measures μ G {\displaystyle \mu _{G}} on ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} . The discriminator's strategy set is the set of measurable functions of type D : Ω → R {\displaystyle D:\Omega \to \mathbb {R} } with bounded Lipschitz-norm: ‖ D ‖ L ≤ K {\displaystyle \|D\|_{L}\leq K} . The Wasserstein GAN game is a zero-sum game, with objective function L W G A N ( μ G , D ) := E x ∼ μ G [ D ( x ) ] − E x ∼ μ r e f [ D ( x ) ] . {\displaystyle L_{WGAN}(\mu _{G},D):=\mathbb {E} _{x\sim \mu _{G}}[D(x)]-\mathbb {E} _{x\sim \mu _{ref}}[D(x)].} The generator goes first, and the discriminator goes second. The generator aims to minimize the objective, and the discriminator aims to maximize the objective: min μ G max D L W G A N ( μ G , D ) . {\displaystyle \min _{\mu _{G}}\max _{D}L_{WGAN}(\mu _{G},D).} By the Kantorovich-Rubenstein duality, for any generator strategy μ G {\displaystyle \mu _{G}} , the optimal reply by the discriminator is D ∗ {\displaystyle D^{}} , such that L W G A N ( μ G , D ∗ ) = K ⋅ W 1 ( μ G , μ r e f ) . {\displaystyle L_{WGAN}(\mu _{G},D^{})=K\cdot W_{1}(\mu _{G},\mu _{ref}).} Consequently, if the discriminator is good, the generator would be constantly pushed to minimize W 1 ( μ G , μ r e f ) {\displaystyle W_{1}(\mu _{G},\mu _{ref})} , and the optimal strategy for the generator is just μ G = μ r e f {\displaystyle \mu _{G}=\mu _{ref}} , as it should. == Comparison with GAN == In the Wasserstein GAN game, the discriminator provides a better gradient than in the GAN game. Consider for example a game on the real line where both μ G {\displaystyle \mu _{G}} and μ r e f {\displaystyle \mu _{ref}} are Gaussian. Then the optimal Wasserstein critic D W G A N {\displaystyle D_{WGAN}} and the optimal GAN discriminator D {\displaystyle D} are plotted as below: For fixed discriminator, the generator needs to minimize the following objectives: For GAN, E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] {\displaystyle \mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))]} . For Wasserstein GAN, E x ∼ μ G [ D W G A N ( x ) ] {\displaystyle \mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)]} . Let μ G {\displaystyle \mu _{G}} be parametrized by θ {\displaystyle \theta } , then we can perform stochastic gradient descent by using two unbiased estimators of the gradient: ∇ θ E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] = E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ⋅ ∇ θ ln ⁡ ρ μ G ( x ) ] {\displaystyle \nabla _{\theta }\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))]=\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))\cdot \nabla _{\theta }\ln \rho _{\mu _{G}}(x)]} ∇ θ E x ∼ μ G [ D W G A N ( x ) ] = E x ∼ μ G [ D W G A N ( x ) ⋅ ∇ θ ln ⁡ ρ μ G ( x ) ] {\displaystyle \nabla _{\theta }\mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)]=\mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)\cdot \nabla _{\theta }\ln \rho _{\mu _{G}}(x)]} where we used the reparameterization trick. As shown, the generator in GAN is motivated to let its μ G {\displaystyle \mu _{G}} "slide down the peak" of ln ⁡ ( 1 − D ( x ) ) {\displaystyle \ln(1-D(x))} . Similarly for the generator in Wasserstein GAN. For Wasserstein GAN, D W G A N {\displaystyle D_{WGAN}} has gradient 1 almost everywhere, while for GAN, ln ⁡ ( 1 − D ) {\displaystyle \ln(1-D)} has flat gradient in the middle, and steep gradient elsewhere. As a result, the variance for the estimator in GAN is usually much larger than that in Wasserstein GAN. See also Figure 3 of. The problem with D J S {\displaystyle D_{JS}} is much more severe in actual machine learning situations. Consider training a GAN to generate ImageNet, a collection of photos of size 256-by-256. The space of all such photos is R 256 2 {\displaystyle \mathbb {R} ^{256^{2}}} , and the distribution of ImageNet pictures, μ r e f {\displaystyle \mu _{ref}} , concentrates on a manifold of much lower dimension in it. Consequently, any generator strategy μ G {\displaystyle \mu _{G}} would almost surely be entirely disjoint from μ r e f {\displaystyle \mu _{ref}} , making D J S ( μ G ‖ μ r e f ) = + ∞ {\displaystyle D_{JS}(\mu _{G}\|\mu _{ref})=+\infty } . Thus, a good discriminator can almost perfectly distinguish μ r e f {\displaystyle \mu _{ref}} from μ G {\displaystyle \mu _{G}} , as well as any μ G ′ {\displaystyle \mu _{G}'} close to μ G {\displaystyle \mu _{G}} . Thus, the gradient ∇ μ G L ( μ G , D ) ≈ 0 {\displaystyle \nabla _{\mu _{G}}L(\mu _{G},D)\approx 0} , creating no learning signal for the generator. Detailed theorems can be found in. == Training Wasserstein GANs == Training the generator in Wasserstein GAN is just gradient descent, the same as in GAN (or most deep learning methods), but training the discriminator is different, as the discriminator is now restricted to have bounded Lipschitz norm. There are several methods for this. === Upper-bounding the Lipschitz norm === Let the discriminator function D {\displaystyle D} to be implemented by a multilayer perceptron: D = D n ∘ D n − 1 ∘ ⋯ ∘ D 1 {\displaystyle D=D_{n}\circ D_{n-1}\circ \cdots \circ D_{1}} where D i ( x ) = h ( W i x ) {\displaystyle D_{i}(x)=h(W_

Alexander Gammerman

Alexander Gammerman (born 2 November 1944) is a British computer scientist, and professor at Royal Holloway University of London. He is the co-inventor of conformal prediction. He is the founding director of the Centre for Machine Learning at Royal Holloway, University of London, and a Fellow of the Royal Statistical Society. == Career == Gammerman's academic career has been pursued in the Soviet Union and the United Kingdom. He started working as a Research Fellow in the Agrophysical Research Institute, St. Petersburg. In 1983, he emigrated to the United Kingdom and was appointed as a lecturer in the Computer Science Department at Heriot-Watt University, Edinburgh. Together with Roger Thatcher, Gammerman published several articles on Bayesian inference. In 1993, he was appointed to the established chair in Computer Science at University of London tenable at Royal Holloway and Bedford New College, where he served as the Head of Computer Science department from 1995 to 2005. In 1998, the Centre for Reliable Machine Learning was established, and Gammerman became the first director of the centre. Gammerman has written 7 books. == Honours and awards == In 1996, Gammerman received the P.W. Allen Award from the Forensic Science Society. In 2006, he became an Honorary Professor, at University College London. In 2009, he became a Distinguished Professor at Complutense University of Madrid, Spain. In 2019, he received a research grant funded by the energy company Centrica about predicting the time to the next failure of equipment. In 2020, he received the Amazon Research Award for the project titled Conformal Martingales for Change-Point Detection == Selected books == Measures of Complexity (2016), Springer, ISBN 3319357786. Algorithmic Learning in a Random World (2005), Springer, ISBN 0387001522. Causal Models and Intelligent Data Management (1999), Springer, ISBN 978-3-642-58648-4. Probabilistic Reasoning and Bayesian Belief Networks (1998), Nelson Thornes Ltd, ISBN 1872474268. Computational Learning and Probabilistic Reasoning (1996), Wiley, ISBN 0471962791.

ReRites

ReRites (also known as RERITES, ReadingRites, Big Data Poetry) is a literary work of "Human + A.I. poetry" by David Jhave Johnston that used neural network models trained to generate poetry which the author then edited. ReRites won the Robert Coover Award for a Work of Electronic Literature in 2022. == About the project == The ReRites project began as a daily rite of writing with a neural network, expanded into a series of performances from which video documentation has been published online, and concluded with a set of 12 books and an accompanying book of essays published by Anteism Books in 2019. In Electronic Literature, Scott Rettberg describes the early phases of the project in 2016, when it bore the preliminary name Big Data Poetry. Jhave (the artist name that David Jhave Johnston goes by) describes the process of writing ReRites as a rite: "Every morning for 2 hours (normally 6:30–8:30am) I get up and edit the poetic output of a neural net. Deleting, weaving, conjugating, lineating, cohering. Re-writing. Re-wiring authorship: hybrid augmented enhanced evolutionary". There is video documentation of the writing process. The human editing of the neural network's output is fundamental to this project, and Jhave gives examples of both unedited text extracts and his edited versions in publications about the project. Kyle Booten describes ReRites as "simultaneously dusty and outrageously verdant, monotonously sublime and speckled with beautiful and rare specimens". === Performances === ReRites was first shared with an audience through a series of performances where audience members and poets would participate in reading the automatically generated texts, which appeared on screen so fast that human readers could barely keep up. This has been described as allowing participants to "re-discover[..] the peculiar pleasures of being embodied", or, in Jhave's own words, as a space where human participants were "playing their wits and voices against an evocative infinite deep-learning muse". The first performance was at Brown University's Interrupt Festival in 2019. It has been performed many times since, including at the Barbican Centre in London and Anteism Books. === Print publications === For a single year Jhave published one book of poetry from the ReRites project each month. These twelve volumes are accompanied by a book of essays, all published by Anteism Books. The accompanying essays provide critical responses to the project from poets and scholars including Allison Parrish, Johanna Drucker, Kyle Booten, Stephanie Strickland, John Cayley, Lai-Tze Fan, Nick Montfort, Mairéad Byrne, and Chris Funkhouser. Allison Parrish notes elsewhere that these paratexts to ReRites serve a legitimising function for a genre of poetry that is not yet institutionally acknowledged. === Technical details === Starting in 2016 under the name Big Data Poetry, Jhave generated poems using, in his own words, "neural network code (..) adapted from three corporate github-hosted machine-learning libraries: TensorFlow (Google), PyTorch (Facebook), and AWD-LSTM (SalesForce)". He explains that the "models were trained on a customised corpus of 600,000 lines of poetry ranging from the romantic epoch to the 20th century avant garde". Jhave maintains a GitHub repository with some of the code supporting ReRites. == Reception == ReRites is described by John Cayley as "one of the most thorough and beautiful" poetic responses to machine learning. The work's influence on the field of electronic literature was acknowledged in 2022, when the work won the Electronic Literature Organization's Robert Coover Award for a Work of Electronic Literature. The jury described ReRites as particularly poignant in the time of the pandemic, as it was "a documentation of the performance of the private ritual of writing and the obsessive-compulsive need for writers to communicate — even when no one else is reading". The question of authorship and voice in ReRites has been raised by several critics. Although generated poetry is an established genre in electronic literature, Cayley notes that unlike the combinatory poems created by authors like Nick Montfort, where the author explicitly defines which words and phrases will be recombined, ReRites has "not been directed by literary preconceptions inscribed in the program itself, but only by patterns and rhythms pre-existing in the corpora". In an essay for the Australian journal TEXT, David Thomas Henry Wright asks how to understand authorship and authority in ReRites: "Who or what is the authority of the work? The original data fed into the machine, that is not currently retrievable or discernible from the final works? The code that was taken and adapted for his purposes? Or Jhave, the human editor?" Wright concludes that Jhave is the only actor with any intentionality and therefore the authority of the work. The centrality of the human editor is also emphasised by other scholars. In a chapter analysing ReRites Malthe Stavning Erslev argues that the machine learning misrepresents the dataset it is trained on. While ReRites uses 21st century neural networks, it has been compared to earlier literary traditions. Poet Victoria Stanton, who read at one of the ReRites performances, has compared ReRites to found poetry, while David Thomas Henry Wright compares it to the Oulipo movement and Mark Amerika to the cut-up technique. Scholars also position ReRites firmly within the long tradition of generative poetry both in electronic literature and print, stretching from the I Ching, Queneau's Cent Mille Milliards de Poemes and Nabokov's Pale Fire to computer-generated poems like Christopher Strachey's Love Letter Generator (1952) and more contemporary examples. Jhave describes the process of working with the output from the neural network as "carving". In his book My Life as an Artificial Creative Intelligence, Mark Amerika writes that the "method of carving the digital outputs provided by the language model as part of a collaborative remix jam session with GPT-2, where the language artist and the language model play off each other’s unexpected outputs as if caught in a live postproduction set, is one I share with electronic literature composer David Jhave Johnston, whose AI poetry experiments precede my own investigations."

Alberto Broggi

Alberto Broggi is General Manager at VisLab srl (spinoff of the University of Parma acquired by Silicon-Valley company Ambarella Inc. in June 2015) and a professor of Computer Engineering at the University of Parma in Italy. == Research in computer vision, hardware, and AV == Broggi's research activities started in 1991–1994. His group together with the Dipartimento di Elettronica, Politecnico di Torino, Italy, built their own hardware architecture (named PAPRICA, for PArallel PRocessor for Image Checking and Analysis, based on 256 single-bit processing elements working in SIMD fashion) and installed it on board of a mobile laboratory (Mob-Lab) to develop and test some initial concepts in the field of intelligent vehicles. In 1996, Broggi's group worked to develop a real vehicle prototype (named ARGO, a Lancia Thema passenger car which was equipped with vision sensors, processing systems, and vehicle actuators) and developed the necessary software and hardware that made it able to drive autonomously on standard roads. Broggi's research group (called VisLab from then on) gathered all their findings in a book, which was then also translated in Chinese. When Broggi was with the University of Pavia, his research was extended and applied to extreme conditions (automatic driving on snow and ice): in 2001, VisLab led the research effort of providing a vehicle (RAS, Robot Antartico di Superficie) with sensing capabilities so that it was able to automatically follow the vehicle in front. In 2010 Broggi's group embarked on driving 4 vehicles autonomously from Italy to China with no human intervention. This challenge is called VIAC, for VisLab Intercontinental Autonomous Challenge . Soon after this, Broggi was awarded a second ERC grant (Proof of concept) to industrialize some of the results obtained and successfully tested on the VIAC vehicles. On July 12, 2013, VisLab tested the BRAiVE vehicle in downtown Parma, negotiating two-way narrow rural roads, pedestrian crossings, traffic lights, artificial bumps, pedestrian areas, and tight roundabouts. The vehicle traveled from Parma University Campus up to Piazza della Pilotta (downtown Parma): a 20 minutes run in a real environment, together with real traffic at 11am on a working day, that required absolutely no human intervention. Part of this test was driven with nobody in the driver seat, for the first time ever on public roads.