AI News and Guides

Explore the best AI News and Guides — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Singularity studies

    Singularity studies

    Singularity studies is an interdisciplinary academic field which examines the idea of technological singularity — the hypothesised point at which artificial intelligence may surpass human intelligence, might be attained by artificial intelligence (AI), robotics, and other technologies and sciences, and its social impacts. In this academic field, the study and research are conducted across a broad array of terrains such as information science, robotics, social informatics, economics, philosophy, and ethics. The primary aim of singularity studies is to gain an integrative understanding of the transformation of social systems occurring in tandem with the explosive evolution of AI and also the changes to be effected by such transformation in the view of humans, ethics, and legal systems. == History == An academic work on technological singurality has appeared in computer science, philosophy, sociology, and law since the early 1990s. Early discussions of an intelligence explosion were popularised by science-fiction writer Vernor Vinge in 1993 and later systematised by futurist Ray Kurzweil. Since the 2010s, universities such as Oxford, Stanford, and Keio have established dedicated programmes, while peer-reviewed journals have begun to publish scenario analyses and policy studies. Ongoing debates question the predictive value of singularity scenarios and warn against a deterministic view of technology. == Characteristics of research == Singularity studies extends beyond mere future predictions and offer an intellectual foundation for proactively designing and creating a desirable future. Principal research themes in this realm include: Ethics of AI; Social implications of technologies; Possibility of harmonious coexistence of humans and AI; Communication with AI; and Redesign of social systems. == Technologists and academics == Vernor Vinge: Propounded the concept of singularity in 1993, making a massive impact on the academic and science-fiction spheres. Ray Kurzweil: Predicted the advent around 2045 of the technological singularity in his 2005 book The Singularity Is Near. Nick Bostrom: Offered philosophical reflections on superintelligence and the risks posed by AI. He is the founding director of the now-dissolved Future of Humanity Institute at the University of Oxford. === Japan === Kento Sasano: A social informatician, AI educator, and inventor. He is the president of the Japan Society of Singularity Studies. == Challenges and outlook == Singularity studies is still evolving as an academic field, and quite a few challenges remain unresolved in regard to the systematization of their theories, research methods, and educational curricula. That said, in this day and age of accelerating technological and societal shifts, interdisciplinary approaches have gained in importance and are drawing much attention in the arenas of scholarly research, intercorporate collaboration, and policy planning.

    Read more →
  • Lenna

    Lenna

    Lenna (or Lena) is a standard test image used in the field of digital image processing, starting in 1973. It is a picture of the Swedish model Lena Forsén, shot by photographer Dwight Hooker and cropped from the centerfold of the November 1972 issue of Playboy magazine. Lenna has attracted controversy because of its subject matter. Starting in the mid-2010s, many journals have deemed it inappropriate and discouraged its use, while others have banned it from publication outright. Forsén herself has called for it to be retired, saying "It's time I retired from tech." The spelling "Lenna" came from the model's desire to encourage the proper pronunciation of her name. "I didn't want to be called Leena [English: ]," she explained. == History == Before Lenna, the first use of a Playboy magazine image to illustrate image processing algorithms was in 1961. Lawrence G. Roberts used two cropped six-bit grayscale facsimile scanned images from Playboy's July 1960 issue featuring Playmate Teddi Smith, in his master's thesis on image dithering at Massachusetts Institute of Technology. Lenna was originally intended for high resolution color image processing study. Its history was described in the May 2001 newsletter of the IEEE Professional Communication Society, in an article by Jamie Hutchinson: Alexander Sawchuk estimates that it was in June or July of 1973 when he, then an assistant professor of electrical engineering at the University of Southern California Signal and Image Processing Institute (SIPI), along with a graduate student and the SIPI lab manager, was hurriedly searching the lab for a good image to scan for a colleague's conference paper. They got tired of their stock of usual test images, dull stuff dating back to television standards work in the early 1960s. They wanted something glossy to ensure good output dynamic range, and they wanted a human face. Just then, somebody happened to walk in with a recent issue of Playboy. The engineers tore away the top third of the centerfold so they could wrap it around the drum of their Muirhead wirephoto scanner, which they had outfitted with analog-to-digital converters (one each for the red, green, and blue channels) and a Hewlett Packard 2100 minicomputer. The Muirhead had a fixed resolution of 100 lines per inch and the engineers wanted a 512×512 image, so they limited the scan to the top 5.12 inches of the picture, effectively cropping it at the subject's shoulders. The image's reach was limited in the 1970s and 80s, which is reflected in it initially only appearing in .org domains, but in July 1991, the image featured on the cover of Optical Engineering alongside Peppers, another popular test image. This drew the attention of Playboy to the potential copyright infringement. The peak of image hits on the internet was in 1995. The scan became one of the most used images in computer history. The use of the photo in electronic imaging has been described as "clearly one of the most important events in [its] history". The image spread to over 100 different domains, particularly .com and .edu. In a 1999 issue of IEEE Transactions on Image Processing "Lena" was used in three separate articles, and the picture continued to appear in scientific journals throughout the beginning of the 21st century. Lenna is so widely accepted in the image processing community that Forsén was a guest at the 50th annual Conference of the Society for Imaging Science and Technology (IS&T) in 1997. In 2015, Lena Forsén was also guest of honor at the banquet of IEEE ICIP 2015. After delivering a speech, she chaired the best paper award ceremony. To explain why the image became a standard in the field, David C. Munson, editor-in-chief of IEEE Transactions on Image Processing, stated that it was a good test image because of its detail, flat regions, shading, and texture. He also noted that "the Lena image is a picture of an attractive woman. It is not surprising that the (mostly male) image processing research community gravitated toward an image that they found attractive." While Playboy often cracks down on illegal uses of its material and did initially send a notice to the publisher of Optical Engineering about its unauthorized use in that publication, over time it has decided to overlook the wide use of Lena. Eileen Kent, VP of new media at Playboy, said, "We decided we should exploit this, because it is a phenomenon." == Criticism == The use of the image has produced controversy because Playboy is "seen (by some) as being degrading to women". In a 1999 essay on reasons for the male predominance in computer science, applied mathematician Dianne P. O'Leary wrote: Suggestive pictures used in lectures on image processing ... convey the message that the lecturer caters to the males only. For example, it is amazing that the "Lena" pin-up image is still used as an example in courses and published as a test image in journals today. A 2012 paper on compressed sensing used a photo of the model Fabio Lanzoni as a test image to draw attention to this issue. The use of the test image at the magnet school Thomas Jefferson High School for Science and Technology in Fairfax County, Virginia, provoked a guest editorial by a senior in The Washington Post in 2015 about its detrimental impact on aspiring female students in computer science. In 2017, the Journal of Modern Optics published an editorial titled "On alternatives to Lenna" suggesting three images (Pirate, Cameraman, and Peppers) that "are reasonably close to Lenna in feature space". In 2018, the Nature Nanotechnology journal announced that they would no longer consider articles using Lenna. In the same year SPIE, the publishers of Optical Engineering, also announced that they "strongly discourage" the use of Lenna, and would no longer consider new submissions containing the image "without convincing scientific justification for its use". They noted that aside from the copyright and ethical issues, that it was also no longer useful as a standard image: "In today's age of high-resolution digital image technology, it seems difficult to argue that a 512 × 512 image produced with a 1970s-era analog scanner is the best we have to offer as an image quality test standard". Forsén stated in the 2019 documentary film Losing Lena, "I retired from modeling a long time ago. It's time I retired from tech, too... Let's commit to losing me." The Institute of Electrical and Electronics Engineers (IEEE) announced that, starting April 1, 2024, it will no longer allow use of Lenna in its publications.

    Read more →
  • Haskins Laboratories

    Haskins Laboratories

    Haskins Laboratories, Inc. is an independent research laboratory, founded in 1935 and located in New Haven, Connecticut since 1970. Many current Haskins researchers are affiliated with Yale University's Child Study Center and/or the University of Connecticut. Haskins is a multidisciplinary and international community of researchers who conduct basic research on spoken and written language and global literacy. A guiding perspective of their research has been to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and development of an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system. == Research tools and facilities == Haskins Laboratories is equipped, in-house, with a comprehensive suite of tools and capabilities to advance its mission of research into language and literacy. As of 2014, these included: Anechoic chamber Electroencephalography BioSemi 264 electrode, 24 bit Active Two System EGI 128 electrode, Geodesic EEG System 300 Electromagnetic articulography (EMMA) Carstens AG501 NDI WAVE Eye tracking: HL is equipped with 3 SR Research eye-trackers. 2 Model Eyelink 1000 systems. 1 Model Eyelink 1000plus system. Magnetic resonance imaging: Haskins has access to MRI scanners through agreements with the University of Connecticut and the Yale School of Medicine. On-site, HL has a Linux computer cluster dedicated to analysis of MRI data. Motion capture: HL is equipped with a Vicon motion capture system with one Basler high-speed digital camera, six Vicon MX T-20 cameras and a Vicon MX Giganet for synching camera data and connecting cameras to the data capture computer. Near infrared spectroscopy: HL has a TechEn CW6 8x8 system (four emitters; eight detectors). Ultrasound sonogram == History == Many researchers have contributed to scientific breakthroughs at Haskins Laboratories since its founding. All of them are indebted to the pioneering work and leadership of Caryl Parker Haskins, Franklin S. Cooper, Alvin Liberman, Seymour Hutner and Luigi Provasoli. The history presented here focuses on the research program of the division of Haskins Laboratories that, since the 1940s, has been most well known for its work in the areas of speech, language, and reading. === 1930s === Caryl Haskins and Franklin S. Cooper established Haskins Laboratories in 1935. It was originally affiliated with Harvard University, MIT, and Union College in Schenectady, NY. Caryl Haskins conducted research in microbiology, radiation physics, and other fields in Cambridge, MA and Schenectady. In 1939 Haskins Laboratories moved its center to New York City. Seymour Hutner joined the staff to set up a research program in microbiology, genetics, and nutrition. The descendant of the division led by Hutner program eventually became a department of Pace University in New York. The two identically named organizations are no longer formally affiliated. === 1940s === The U. S. Office of Scientific Research and Development, under Vannevar Bush asked Haskins Laboratories to evaluate and develop technologies for assisting blinded World War II veterans. Experimental psychologist Alvin Liberman joined Haskins Laboratories to assist in developing a "sound alphabet" to represent the letters in a text for use in a reading machine for the blind. Luigi Provasoli joined Haskins Laboratories to set up a research program in marine biology. The program in marine biology moved to Yale University in 1970 and disbanded with Provasoli's retirement in 1978. === 1950s === Franklin S. Cooper invented the pattern playback, a machine that converts pictures of the acoustic patterns of speech back into sound. With this device, Alvin Liberman, Cooper, and Pierre Delattre (and later joined by Katherine Safford Harris, Leigh Lisker, Arthur Abramson, and others), discovered the acoustic cues for the perception of phonetic segments (consonants and vowels). Liberman and colleagues proposed a motor theory of speech perception to resolve the acoustic complexity: they hypothesized that we perceive speech by tapping into a biological specialization, a speech module, that contains knowledge of the acoustic consequences of articulation. Liberman, aided by Frances Ingemann and others, organized the results of the work on speech cues into a groundbreaking set of rules for speech synthesis by the Pattern Playback. === 1960s === Franklin S. Cooper and Katherine Safford Harris, working with Peter MacNeilage, were the first researchers in the U.S. to use electromyographic techniques, pioneered at the University of Tokyo, to study the neuromuscular organization of speech. Leigh Lisker and Arthur Abramson looked for simplification at the level of articulatory action in the voicing of certain contrasting consonants. They showed that many acoustic properties of voicing contrasts arise from variations in voice onset time, the relative phasing of the onset of vocal cord vibration and the end of a consonant. Their work has been widely replicated and elaborated, here and abroad, over the following decades. Donald Shankweiler and Michael Studdert-Kennedy used a dichotic listening technique (presenting different nonsense syllables simultaneously to opposite ears) to demonstrate the dissociation of phonetic (speech) and auditory (nonspeech) perception by finding that phonetic structure devoid of meaning is an integral part of language, typically processed in the left cerebral hemisphere. Liberman, Cooper, Shankweiler, and Studdert-Kennedy summarized and interpreted fifteen years of research in "Perception of the Speech Code", still among the most cited papers in the speech literature. It set the agenda for many years of research at Haskins and elsewhere by describing speech as a code in which speakers overlap (or coarticulate) segments to form syllables. Researchers at Haskins connected their first computer to a speech synthesizer designed by Haskins Laboratories' engineers. Ignatius Mattingly, with British collaborators, John N. Holmes and J.N. Shearme, adapted the Pattern playback rules to write the first computer program for synthesizing continuous speech from a phonetically spelled input. A further step toward a reading machine for the blind combined Mattingly's program with an automatic look-up procedure for converting alphabetic text into strings of phonetic symbols. === 1970s === In 1970, Haskins Laboratories moved to New Haven, Connecticut, and entered into affiliation agreements with Yale University and the University of Connecticut; Haskins remains fully independent of both Yale and UConn, administratively and financially. The lab's original location in New Haven, at 270 Crown Street (from 1970 to 2005), was leased from Yale University. Isabelle Liberman, Donald Shankweiler, and Alvin Liberman teamed up with Ignatius Mattingly to study the relationship between speech perception and reading, a topic implicit in Haskins Laboratories' research program since its inception. They developed the concept of phonemic awareness, the knowledge that would-be readers must be aware of the phonemic structure of their language in order to be able to read. Leonard Katz related the work to contemporary cognitive theory and provided expertise in experimental design and data analysis. Under the broad rubric of the "alphabetic principle", this is the core of the lab's present program of reading pedagogy. Patrick Nye joined Haskins Laboratories to lead a team working on the reading machine for the blind. The project culminated when the addition of an optical character recognizer allowed investigators to assemble the first automatic text-to-speech reading machine. By the end of the decade this technology had advanced to the point where commercial concerns assumed the task of designing and manufacturing reading machines for the blind. In 1973, Franklin S. Cooper was selected to form a panel of six experts charged with investigating the famous 18-minute gap in the White House office tapes of President Richard Nixon related to the Watergate scandal. Building on earlier work, Philip Rubin developed the sinewave synthesis program, which was then used by Robert Remez, Rubin, and colleagues to show that listeners can perceive continuous speech without traditional speech cues from a pattern of sinewaves that track the changing resonances of the vocal tract. This paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space. Philip Rubin and colleagues developed Paul Mermelstein's anatomically simplified vocal tract model, originally worked on at Bell Laboratories, into the first articulatory synthesizer that can be controlled in a phy

    Read more →
  • Buckeye Corpus

    Buckeye Corpus

    The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt. It contains high-quality recordings from 40 speakers in Columbus, Ohio conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as Sociolinguistics interviews, and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is also available at the project web site. The corpus is available to researchers in academia and industry. The project was funded by the National Institute on Deafness and Other Communication Disorders and the Office of Research at Ohio State University.

    Read more →
  • Supper (Spotify)

    Supper (Spotify)

    Supper is a web-based application on the Spotify digital music streaming platform. The Supper app was born from a group of friends who had backgrounds in the music and gastronomy industries. Digital music solutions company Artisan Council later executed it. The app now sits in the top 40 applications on Spotify. == About == The Supper Spotify application matches recipes for all occasions and skill levels with a playlist for both preparation and presentation, as envisioned by the chefs themselves. Supper is credited with being one of the first apps to pair music with food. Playing on the social nature of music and food culture, users can seamlessly experience both for the first time with real time music streaming. == Supper.mx == In May 2014 Supper was launched outside of the Spotify streaming platform. Though still in partnership with Spotify, supper.mx allows users to view Supper's music + food collaborations on mobile, tablet and desktop, without the need to download Spotify directly. == Curators == All of the recipes and playlists featured on the Supper app come straight from a growing network of tastemakers, including chefs, musicians and institutions around the world. Each month the recipes and playlists are updated in conjunction with current holidays, events and seasons. === Launch === Launching in October 2013 the first edition of Supper featured content from a range of eating institutions and culture makers from the US and Australia. Brooklyn Bowl (Brooklyn) Roberta's Pizza (Brooklyn) Fancy Hanks (Melbourne) The Foresters/Queenies Upstairs (Sydney) Hipstamatic Panama House (Bondi) Sweetwater Inn (Melbourne) Soul Clap (Syd record label) Yellow Birds (Melbourne) === November 2013 === Yardbird (Hong Kong) Sonoma Bakery (Sydney) Do or Dine (Brooklyn) Cameo Gallery (Brooklyn) Hypertrak (Blog) Blue Smoke (NYC) The Crepes of Wrath (Blog) Willin Low // Wild Rocket - Wild Oats - Relish === December 2013 === The Copper Mill (Sydney) Thug Kitchen Mamak (Sydney) Tutu's (Brooklyn) Chin Chin (Melbourne) Flat Iron Steak (London) Greasy Spoon (Copenhagen) === January 2014 === Mexicali Taco & Co. (LA) Church & State (LA) Salts Cure (LA) Nopa (SF) L & E Oyster (LA) 4100 bar (LA) Golden Gopher (LA) The Pie Hole (LA) State Bird Provisions (SF) === Momofuku === In February 2014 Supper teamed up with restaurant heavy weights Momofuku. The recipes featured came from their iconic New York, Toronto and Sydney restaurants. Head office also got involved with an instructional from Brand Director Sue Chan on how to paint Momofuku vibes on to any party. === SXSW === March sees the Supper team migrate to Austin, Texas for SXSW, bringing together the best eateries the city has to offer as well as the music that has influenced them. Restaurants and eateries on board in 2014 included: The Backspace Kelis Swifts Attic Uchi Jackalope Paul Qui/East Side King Thai Kun Wonderland Hole in the Wall Justine's Brasserie The Liberty === Kelis === In April 2014 Kelis presented 5 of her recipes paired with a personal playlist for Supper. Kelis shared her recipes for apple farro, jerk ribs, New York vanilla bean cheesecake and Jerk Ribs. The Kelis/Supper collaboration coincided with the release of Kelis' 2014 album titled 'Food'. === Roberta's Pizza === In May 2014 Bushwick's Roberta's Pizza was guest curator on the Supper app and website. Included in their selections were restaurants and bars from across New York including Bun-ker Vietnamese, Old Stanley's Bar, St. Anselm, Chuko, Frank's Cocktail Lounge, Junior's Cheesecake, Xi'an Famous Foods, Xe Lua, 124 Old Rabbit and Yuji Ramen.

    Read more →
  • Separable filter

    Separable filter

    A separable filter in image processing can be written as product of two more simple filters. Typically a 2-dimensional convolution operation is separated into two 1-dimensional filters. This reduces the computational costs on an N × M {\displaystyle N\times M} image with a m × n {\displaystyle m\times n} filter from O ( M ⋅ N ⋅ m ⋅ n ) {\displaystyle {\mathcal {O}}(M\cdot N\cdot m\cdot n)} down to O ( M ⋅ N ⋅ ( m + n ) ) {\displaystyle {\mathcal {O}}(M\cdot N\cdot (m+n))} . == Examples == 1. A two-dimensional smoothing filter: 1 3 [ 1 1 1 ] ∗ 1 3 [ 1 1 1 ] = 1 9 [ 1 1 1 1 1 1 1 1 1 ] {\displaystyle {\frac {1}{3}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}{\frac {1}{3}}{\begin{bmatrix}1&1&1\end{bmatrix}}={\frac {1}{9}}{\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}}} 2. Another two-dimensional smoothing filter with stronger weight in the middle: 1 4 [ 1 2 1 ] ∗ 1 4 [ 1 2 1 ] = 1 16 [ 1 2 1 2 4 2 1 2 1 ] {\displaystyle {\frac {1}{4}}{\begin{bmatrix}1\\2\\1\end{bmatrix}}{\frac {1}{4}}{\begin{bmatrix}1&2&1\end{bmatrix}}={\frac {1}{16}}{\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}}} 3. The Sobel operator, used commonly for edge detection: [ 1 2 1 ] ∗ [ 1 0 − 1 ] = [ 1 0 − 1 2 0 − 2 1 0 − 1 ] {\displaystyle {\begin{bmatrix}1\\2\\1\end{bmatrix}}{\begin{bmatrix}1&0&-1\end{bmatrix}}={\begin{bmatrix}1&0&-1\\2&0&-2\\1&0&-1\end{bmatrix}}} This works also for the Prewitt operator. In the examples, there is a cost of 3 multiply–accumulate operations for each vector which gives six total (horizontal and vertical). This is compared to the nine operations for the full 3x3 matrix. Another notable example of a separable filter is the Gaussian blur whose performance can be greatly improved the bigger the convolution window becomes.

    Read more →
  • The Triple Revolution

    The Triple Revolution

    "The Triple Revolution" was an open memorandum sent to U.S. President Lyndon B. Johnson and other government figures on March 22, 1964. It concerned three megatrends of the time: increasing use of automation, the nuclear arms race, and advancements in human rights. Drafted under the auspices of the Center for the Study of Democratic Institutions, it was signed by an array of noted social activists, professors, and technologists who identified themselves as the Ad Hoc Committee on the Triple Revolution. The chief initiator of the proposal was W. H. "Ping" Ferry, at that time a vice-president of CSDI, basing it in large part on the ideas of the futurist Robert Theobald. == Overview == The statement identified three revolutions underway in the world: the cybernation revolution of increasing automation; the weaponry revolution of mutually assured destruction; and the human rights revolution. It discussed primarily the cybernation revolution. The committee claimed that machines would usher in "a system of almost unlimited productive capacity" while continually reducing the number of manual laborers needed, and increasing the skill needed to work, thereby producing increasing levels of unemployment. It proposed that the government should ease this transformation through large-scale public works, low-cost housing, public transit, electrical power development, income redistribution, union representation for the unemployed, and government restraint on technology deployment. == Legacy == Martin Luther King Jr.'s final Sunday sermon, delivered six days before his April 1968 assassination, explicitly references the thesis of "The Triple Revolution": There can be no gainsaying of the fact that a great revolution is taking place in the world today. In a sense it is a triple revolution: that is, a technological revolution, with the impact of automation and cybernation; then there is a revolution in weaponry, with the emergence of atomic and nuclear weapons of warfare; then there is a human rights revolution, with the freedom explosion that is taking place all over the world. Yes, we do live in a period where changes are taking place. And there is still the voice crying through the vista of time saying, "Behold, I make all things new; former things are passed away." In Harlan Ellison's 1967 anthology Dangerous Visions, Philip José Farmer's story "Riders of the Purple Wage" uses the Triple Revolution document as the premise of a future society, in which the "purple wage" of the title is a guaranteed income dole on which most of the population lives. At the 1968 World Science Fiction Convention in San Francisco, Farmer delivered a lengthy Guest of Honor speech in which he called for the founding of a grassroots activist organization called REAP which would work for implementation of the Ad Hoc Committee's recommendations. Looking back on the proposal in his 2008 book, Daniel Bell wrote: "the cybernetic revolution quickly proved to be illusory. There were no spectacular jumps in productivity. ... Cybernation had proved to be one more instance of the penchant for overdramatizing a momentary innovation and blowing it up far out of proportion to its actuality. ... The image of a completely automated production economy—with an endless capacity to turn out goods—was simply a social-science fiction of the early 1960s. Paradoxically, the vision of Utopia was suddenly replaced by the spectre of Doomsday. In place of the early-sixties theme of endless plenty, the picture by the end of the decade was one of a fragile planet of limited resources whose finite stocks were being rapidly depleted, and whose wastes from soaring industrial production were polluting the air and waters." In his 2015 book Rise of the Robots, Martin Ford claims The Triple Revolution's predictions of steady decline in future employment were not wrong, but rather premature. He cites "Seven Deadly Trends" that began in the 1970s-1980s and by the mid-2010s appeared set to continue: Stagnation in real wages Decline in labor's share of national income in many countries (breakdown of Bowley's law), while corporate profits increased Declining labor force participation Diminishing job creation, lengthening jobless recoveries, and soaring long-term unemployment Rising inequality Declining incomes, and underemployment for recent college graduates Polarization and part-time jobs (middle-class jobs are disappearing, to be replaced by a small number of high-paying jobs and large number of low-paying jobs) According to Ford, the 1960s were part of what in retrospect seems like a golden age for labor in the United States, when productivity and wages rose together in near lockstep, and unemployment was low. But after about 1980, wages began stagnating while productivity continued to rise. Labor's share of the economic output began to decline. Ford describes the role that automation and information technology play in these trends, and how new technologies including narrow AI threaten to destroy jobs faster than displaced workers can be retrained for new jobs, before automation takes the new jobs as well. This includes many job categories, such as in transportation, that were never threatened by automation before. According to a 2013 study, about 47% of US jobs are susceptible to automation. == Signatories ==

    Read more →
  • Super-resolution imaging

    Super-resolution imaging

    Super-resolution imaging (SR) is a class of techniques that improve the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced. In some radar and sonar imaging applications (e.g. magnetic resonance imaging (MRI), high-resolution computed tomography), subspace decomposition-based methods (e.g. MUSIC) and compressed sensing-based algorithms (e.g., SAMV) are employed to achieve SR over standard periodogram algorithm. Super-resolution imaging techniques are used in general image processing and in super-resolution microscopy. == Super-resolution principles == Several concepts are fundamental to super-resolution imaging: Diffraction limit: the capacity of an optical instrument to reproduce the details of an object in an image has limits that are imposed by laws of physics: the diffraction equations in the wave theory of light, or the uncertainty principle for photons in quantum mechanics. Information transfer can never be increased beyond this boundary, but packets outside the limits can be cleverly swapped for (or multiplexed with) some inside it. Super-resolution microscopy does not so much “break” as “circumvent” the diffraction limit. New procedures probing electro-magnetic disturbances at the molecular level (in the so-called near field) remain fully consistent with Maxwell's equations. Spatial frequency domain: A succinct expression of the diffraction limit is given in the spatial frequency domain. In Fourier optics light distributions are expressed as superpositions of a series of grating light patterns in a range of fringe widths - these widths represent the spatial frequencies. It is generally taught that diffraction theory stipulates an upper limit, the cut-off spatial-frequency, beyond which pattern elements fail to be transferred into the optical image, i.e., are not resolved. But in fact what is set by diffraction theory is the width of the passband, not a fixed upper limit. No laws of physics are broken when a spatial frequency band beyond the cut-off spatial frequency is swapped for one inside it: this has long been implemented in dark-field microscopy. Nor are information-theoretical rules broken when superimposing several bands, disentangling them in the received image needs assumptions of object invariance during multiple exposures, i.e., the substitution of one kind of uncertainty for another. Information: When the term super-resolution is used in techniques based on the inference of object details using a statistical treatment of the image within standard resolution limits (for example, averaging multiple exposures), it involves an exchange of one kind of information (extracting signal from noise) for another (the assumption that the target has remained invariant). Recent breakthroughs incorporate quantum-transformer hybrids into super-resolution, such as QUIET‑SR, a 2025 model that employs shifted quantum window attention within a transformer to enhance image detail while respecting diffraction and information-theory limits Similarly, frequency-integrated transformers (e.g., FIT) enrich super-resolution by explicitly combining spatial and frequency-domain information via FFT-based attention, improving reconstruction across scales Resolution and localization: True resolution involves the distinction of whether a target, e.g. a star or a spectral line, is single or double, ordinarily requiring separable peaks in the image. When a target is known to be single, its location can be determined with higher precision than the image width by finding the centroid (center of gravity) of its image light distribution. The word ultra-resolution had been proposed for this process but it did not catch on, and the high-precision localization procedure is typically referred to as super-resolution. == Techniques == === Optical or diffractive super-resolution === Substituting spatial-frequency bands: Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis. ==== Multiplexing spatial-frequency bands ==== An image is formed using the normal passband of the optical device. Then, some known light structure (for example, a set of light fringes) is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed. For an example, see structured illumination (figure to left). ==== Multiple parameter use within traditional diffraction limit ==== If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit the other beyond it. Both would use normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution. ==== Probing near-field electromagnetic disturbance ==== Super-resolution microscopy is generally discussed within the realm of conventional optical imagery. However, modern technology allows the probing of electromagnetic disturbance within molecular distances of the source, which has superior resolution properties. See also evanescent waves and the development of the new super lens. === Geometrical or image-processing super-resolution === ==== Multi-exposure image noise reduction ==== When an image is degraded by noise, the resolution may be improved by averaging multiple exposures. See example on the right. ==== Single-frame deblurring ==== Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it. ==== Sub-pixel image localization ==== The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, very much better than pixel width of the detecting apparatus and the resolution limit for the decision of whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as super-resolution microscopy, e.g. stochastic optical reconstruction microscopy (STORM), where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity. ==== Bayesian induction beyond traditional diffraction limit ==== Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?" The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to ℓ 2 − ℓ 2 {\displaystyle \ell _{2}-\ell _{2}} problems has been proposed and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly. == Aliasing == Geometrical SR reconstruction algorithms are possible if and only if the input low resolution images have been under-sampled and therefore contain aliasing. Because of this aliasing, the high-frequency content of the desired reconstruction image is embedded in the low-frequency content of each of the observed images. Given a sufficient number of observation images, and if the set of observations vary in their phase (i.e. if the images of the scene are shifted by a sub-pixel amount), then the phase information can be used to separate the aliased high-frequency content from the true low-frequency content, and the full-resolution image can be accurate

    Read more →
  • Ugly duckling theorem

    Ugly duckling theorem

    The ugly duckling theorem is an argument showing that classification is not really possible without some sort of bias. More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties. The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other. It was derived by Satosi Watanabe in 1969. == Mathematical formula == Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects. There are 2 n {\displaystyle 2^{n}} such ways, the size of the power set of n objects. One can use that to measure the similarity between two objects, and one would see how many sets they have in common. However, one cannot. Any two objects have exactly the same number of classes in common if we can form any possible class, namely 2 n − 1 {\displaystyle 2^{n-1}} (half the total number of classes there are). To see this is so, one may imagine each class is represented by an n-bit string (or binary encoded integer), with a zero for each element not in the class and a one for each element in the class. As one finds, there are 2 n {\displaystyle 2^{n}} such strings. As all possible choices of zeros and ones are there, any two bit-positions will agree exactly half the time. One may pick two elements and reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2 n / 2 {\displaystyle 2^{n}/2} numbers will have bit #1 set to zero, and the second 2 n / 2 {\displaystyle 2^{n}/2} will have it set to one. Within each of those blocks, the top 2 n / 4 {\displaystyle 2^{n}/4} will have bit #2 set to zero and the other 2 n / 4 {\displaystyle 2^{n}/4} will have it as one, so they agree on two blocks of 2 n / 4 {\displaystyle 2^{n}/4} or on half of all the cases, no matter which two elements one picks. So if we have no preconceived bias about which categories are better, everything is then equally similar (or equally dissimilar). The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs. Thus, some kind of inductive bias is needed to make judgements to prefer certain categories over others. === Boolean functions === Let x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} be a set of vectors of k {\displaystyle k} booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance. However, the choice of boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of booleans in the vector can be extended with new features computed as boolean functions of the k {\displaystyle k} original features. The only canonical way to do this is to extend it with all possible Boolean functions. The resulting completed vectors have 2 k {\displaystyle 2^{k}} features. The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features. Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i {\displaystyle i} where the i {\displaystyle i} -th coordinate of x {\displaystyle x} differs from the i {\displaystyle i} -th coordinate of y {\displaystyle y} . Now the completed features contain every Boolean function on k {\displaystyle k} Boolean variables, with each one exactly once. Viewing these Boolean functions as polynomials in k {\displaystyle k} variables over GF(2), segregate the functions into pairs ( f , g ) {\displaystyle (f,g)} where f {\displaystyle f} contains the i {\displaystyle i} -th coordinate as a linear term and g {\displaystyle g} is f {\displaystyle f} without that linear term. Now, for every such pair ( f , g ) {\displaystyle (f,g)} , x {\displaystyle x} and y {\displaystyle y} will agree on exactly one of the two functions. If they agree on one, they must disagree on the other and vice versa. (This proof is believed to be due to Watanabe.) == Discussion == A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, for instance, between A and B. However Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem since in what respects A is similar to B: "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another". For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature striped had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet the property "striped" as a weight 'fix' or constraint is arbitrary itself, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous". Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense they are useful: "Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic... If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten. In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success." Unless some properties are considered more salient, or 'weighted' more important than others, everything will appear equally similar, hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar". In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers: "Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity. It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on. Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute." According to Woodward, the ugly duckling theorem is related to Schaffer's Conservation Law for Generalization Performance, which states that all algorithms for learning of boolean functions from input/output examples have the same overall generalization performance as random guessing. The latter result is generalized by Woodward to functions on countably infinite domains.

    Read more →
  • Dispo

    Dispo

    Dispo (formerly David's Disposable) is an American photo sharing and social networking app owned by Dispo, Inc. and co-founded by CEO Daniel Liss, YouTuber David Dobrik, and Natalie Mariduena. When the app initially launched on iOS in December 2019, it briefly charted as the most downloaded free app on the App Store, ahead of both Disney+ and Instagram. The app was rebranded and relaunched as Dispo, expanding from a simple camera app to a full social network in March 2021. It is based on the disposable camera. == History == On December 21, 2019, the app was first launched on the App Store under the name "David's Disposable." In its first week of release, it was downloaded more than a million times, reaching number one among free apps in the App Store. In June 2020, the team decided to rename the app to Dispo, purchasing the Dispo.fun domain on June 21, 2020. The company announced the change in September 2020. The early Dispo team consisted of Dobrik's longtime friend and business associate Natalie Mariduena as its treasurer, entrepreneur and venture capitalist Daniel Liss as chief executive officer, Regynald Augustin as first engineer, and Briana Hokanson as lead designer. In October 2020, the company raised a $4M seed round with backing from Alexis Ohanian's venture fund Seven Seven Six alongside other investors including Unshackled Ventures, Shrug Capital, and Weekend Fund. In February 2021, Axios reported that the app had generated US$20 million in its series A round, led by Spark Capital. At this time, the app was valued at US$200 million. A New York Times profile asked, "Are Disposables the Future of Photosharing?" In March 2021, the app was officially relaunched with new social network features and its invite-only feature was dropped. On March 21, 2021, it was announced that Spark Capital would sever all ties with Dispo in light of several disparaging allegations against David Dobrik and The Vlog Squad. The same day, it was announced that Dobrik would leave the company and step down from the company's board of directors. On March 22, 2021, Seven Seven Six and Unshackled Ventures announced they would be standing by the company and its remaining employees but donating profits to charity. In June, 2021, CEO Daniel Liss announced Dispo's official Series A. Investors and advisors in the new Dispo include Ohanian's Seven Seven Six, Unshackled, Endeavor, photographers Annie Leibovitz and Raven B. Varona, NBA stars Kevin Durant and Andre Iguodala (through their 35 Ventures and F9 Strategies venture firms, respectively). Other participants include Cara Delevingne, Sofia Vergara, Shade Room CEO Angelica Nwandu, Latin World Entertainment CEO Luis Balaguer, and Amplify Africa co-founders Damilare Kujembola and Timi Adeyeba. == Overview == Dispo has been compared to other image sharing and social networking services, most notably Instagram and VSCO, although users cannot immediately see the photos they have taken using the app. When a user attempts to take a photo, the interface mimics the developing process of a disposable camera. Users can take as many photos on the app as they want; they do not appear on the app however, until 9 am the next day. Once the set of photos appear on the app, users can choose to save them or share them with other users in a "roll". == Reception == Screen Rant has called the app "like Clubhouse [referring to the app] but for photos," comparing the early invite-only features of the apps. As it greatly restricts the user's editing options and sets out to offer a more authentic social networking experience, the app has been widely dubbed the "anti-Instagram". Between March 2021 and June 2021, the app reached the top ten in the App Store's photo/video rankings on 5 continents including in the US, Japan, Spain, Germany, Brazil, and Australia. It has been a notable success in Japan, where it opened its first international office in July 2021. In July 2021, NBA number one draft pick Cade Cunningham announced he had selected Dispo as his exclusive social media partner for the NBA draft.

    Read more →
  • Free Studio

    Free Studio

    Free Studio is a freeware set of multimedia computer programs developed by DVDVideoSoft. The programs are available in one integrated package and also as separate downloads (Free Studio Manager is included in both). == Overview == The Free Studio software bundle consists of about 48 programs, grouped into several sections: YouTube, MP3 & Audio, CD-DVD-BD, DVD & Video, Photo & Images, Mobiles, Apple Devices, and 3D. The largest group is the DVD & Video section containing 14 different applications. Mobiles section is the second largest group with 13 programs. However, the YouTube section, particularly YouTube downloading programs, has gained more popularity among users. The programs have been tested and endorsed by a dozen of software portals and have won awards from these sites. Free Studio is most popular in Germany, Greece, Italy, and the United States. It is also popular in Japan, France, and the United Kingdom. Some of the programs in the package are free and open-source software. == History == DVDVideoSoft project was launched in 2006 by company Digital Wave Ltd., for software development to produce multimedia application software. The founders distributed paid software as an affiliate at the start, later their own products appeared on the site. Free YouTube Download was the first successful program, then DVDVideoSoft created and launched several other 'Free YouTube' applications. Later on upon users' requests DVDVideoSoft started developing other kinds of applications including media converters etc. Today DVDVideoSoft offers up to 49 different programs for video, audio and image processing individually or integrated into the Free Studio package. == Features == DVDVideoSoft YouTube programs can be used to download YouTube videos in their original format and convert them to AVI, DVD, MP4, WMV etc. or different audio formats. YouTube section contains Free Video Call Recorder for Skype button, but the program itself is not included into FS installation (it has to be downloaded and installed separately). The "MP3 & Audio" section consists of the programs which convert audio files between different formats, convert audio files to Flash for web, extract audio from video files, edit audio files (Free Audio Dub), rip and burn CDs. Enclosed in the CD-DVD-BD section are the applications that enable users to burn files and folders to discs, to convert videos to a DVD format and vice versa, to burn CDs, and to copy music from audio CDs into files. The "DVD and Video" section contains several desktop video and DVD converters. Some of the programs can flip, rotate and cut (Free Video Dub) videos. One of the most popular programs from the section is Free Video Dub. Converted videos are now, contrary to previous versions, watermarked if no paid membership is present. Free Studio includes several applications for Apple phones, iPods and other devices. The Mobiles section contains a dozen video converters for various mobile devices such as cell phones, Tablets and Game consoles. They convert videos to play them on (BlackBerry, HTC, LG phones, Sony/Sony Ericsson, Nintendo, Xbox, Motorola phones, etc.) The "Photo & Images" section incorporates the programs for image conversion and resizing, extracting JPEG frames from videos (Free Video To JPEG Converter), recording screen activities, making screenshots (Free Screen Recorder). The 3D section is composed of the programs to make 3D videos and 3D images. There are several algorithms which allow to create different types of 3D images. == Supported formats == === Video formats === Input: .avi; .ivf; .div; .divx; .mpg; .mpeg; .mpe; .mp4; .m4v; .wmv; .asf; .webm; .mkv; .mov; .qt; .ts; .mts; .m2t; .m2ts; .mod; .tod; .vro; .dat; .3gp2; .3gpp; .3gp; .3g2; .dvr-ms; .flv; .f4v; .amv; .rm; .rmm; .rv; .rmvb; .ogv; DVD video Output: .mp4; .wmv; .avi; .mkv; .webm; .flv; .swf; .mov; .3gp; .m2ts; DVD video === Audio formats === Input: .mp3 .wav; .aac; .m4a; .m4b; .wma; .ogg; .flac; .ra; .ram; .amr; .ape; .mka; .tta; .aiff; .au; .mpc; .spx; .ac3; audio cd Output: .mp3; .m4a; .aac; .wav; .wma; .ogg; .flac; .ape; audio CD === Image formats === Input: .jpg, .png, .bmp, .gif, .tga Output: .jpg, .png, .bmp, .gif, .tga, .pdf == Reception == The programs have been tested and endorsed by Chip Online, Tucows, SnapFiles, Brothersoft, and Softonic and have won awards from these sites. Free Studio is most popular in Germany, United States and Italy. It is also popular in Japan, France and the United Kingdom. The most popular applications, according to CNET statistics, include Free YouTube to MP3 Converter, Free Video to MP3 Converter, Free MP4 Video Converter and Free YouTube Download. Other programs with high rank: Free AVI Video Converter, Free Video Editor, Free Audio Converter and Free Studio in a whole. == Criticism == Free Studio (as can be common for freeware packages) is criticized for toolbar and Web search engine installation. Older versions have also included OpenCandy, which is loaded automatically, with no request for user approval. There can be difficulties installing only the programs needed without installing bundled extra programs. In March 2017, DVDVideoSoft announced that it had stopped showing other products' ads during installation and removed all toolbars, search engines, and OpenCandy.

    Read more →
  • AstroPay

    AstroPay

    AstroPay is a global digital wallet that provides users with a way to pay, send, and receive money. The app provides online payments, virtual and physical debit cards, peer-to-peer money transfers, and more. == History == AstroPay was founded in Uruguay in 2009 as a payment processing company. Over time, it expanded its services across Latin America, EMEA, and APAC. A significant milestone occurred in 2016, when AstroPay spun off dLocal, focusing on cross-border payments for emerging markets. dLocal became Uruguay's first unicorn and eventually went public through a successful IPO. In 2020, AstroPay spun off its payment processing services into a new entity, D24, to focus on mobile wallet for cross border. Between 2023 and 2024 the Company brought new leadership to guide its transition towards becoming a fully focused global digital multicurrency wallet where users save, send, and spend globally. This shift introduced enhanced features, including loyalty prepaid cards and multicurrency accounts. == Services == AstroPay offers three main products: AstroPay Wallet, AstroPay check-out, and AstroPay Platform. AstroPay Wallet is a digital wallet for consumers, where they have multicurrency accounts, prepaid card and marketplace. With AstroPay check-out, businesses can tap into AstroPay's wallet user base by accepting AstroPay as a payment method in their check-out options. Lastly, AstroPay Platform enables other businesses to use the AstroPay network to launch their own global wallet. == Brand endorsements, partnerships == AstroPay's marketing strategy has included the development of co-branded products with sports teams and other brand. The company sponsored Burnley Football Club during the 2018–19 Premier League season, renewing the partnership for the 2021–22 Premier League season when it became the club's official payment service partner. In August 2021, AstroPay entered into a partnership with the Wolverhampton Wanderers for the 2021-22 Premier League season, and the following year, became the team's shirt sponsor. Later, in September 2021, AstroPay expanded its partnership with Wolverhampton Wanderers, which included becoming the team's official payment partner and later, in 2023, co-launching a co-branded card. Other partnerships include Newcastle United in 2021 in the English Premier League. AstroPay made arrangements to ensure that branding and logo would be visible on the pitch-side LED advertising during Premier League matches. Furthermore, in June 2022, the company renewed it's partnership with Wolverhampton Wanderers for the 2022-23 Premier League season and launched its Wolves debit card in February 2023. Some other notable partnerships include: Universidad de Chile in 2024, Tottenham Hotspurs in 2023-25, and even a collaboration with Lionel Messi across all of Latin America. == Recent developments == AstroPay has refocused its strategy since 2023, pivoting from payment processing to concentrate on its global digital wallet. This move reflects a broader effort to redefine the company's market positioning by emphasizing global user-friendly financial services, while separating its identity from previous operations managed by dLocal and D24.

    Read more →
  • Sayre's paradox

    Sayre's paradox

    Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized. The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named. == Nature of the problem == It is relatively easy to design automated systems capable of recognizing words inscribed in a printed format. Such words are segmented into letters by the very act of writing them on the page. Given templates matching typical letter shapes in a given language, individual letters can be identified with a high degree of probability. In cases of ambiguity, probable letter sequences can be compared with a selection of properly spelled words in that language (called a lexicon). If necessary, syntactic features of the language can be applied to render a generally accurate identification of the words in question. Printed-character recognition systems of this sort are commonly used in processing standardized government forms, in sorting mail by zip code, and so forth. In cursive writing, however, letters comprising a given word typically flow sequentially without gaps between them. Unlike a sequence of printed letters, cursively connected letters are not segmented in advance. Here is where Sayre's Paradox comes into play. Unless the word is already segmented into letters, template-matching techniques like those described above cannot be applied. That is, segmentation is a prerequisite for word recognition. But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified. Word recognition requires letter segmentation, and letter segmentation requires word recognition. There is no way a cursive writing recognition system employing standard template-matching techniques can do both simultaneously. Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses, reading handwritten bank checks, and automated digitalization of hand-written documents. These are practical incentives for finding ways of circumventing Sayre's Paradox. == Avoiding the paradox == One way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized. Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer, such as unusual slope of the letters and unusual slant of the cursive line. This procedure can increase the probability of a correct match with a letter template, resulting in an incremental improvement in the success rate of the system. Since improvement of this sort still depends on accurate segmentation, however, it remains subject to the limitations of Sayre's Paradox. Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation. == Directions of current research == Segmentation is accurate to the extent that it matches distinctions among letters in the actual inscriptions presented to the system for recognition (the input data). This is sometimes referred to as “explicit segmentation”. “Implicit segmentation,” by contrast, is division of the cursive line into more parts than the number of actual letters in the cursive line itself. Processing these “implicit parts” to achieve eventual word identification requires specific statistical procedures involving hidden Markov models (HMM). A Markov model is a statistical representation of a random process, which is to say a process in which future states are independent of states occurring before the present. In such a process, a given state is dependent only on the conditional probability of its following the state immediately before it. An example is a series of outcomes from successive casts of a die. An HMM is a Markov model, individual states of which are not fully known. Conditional probabilities between states are still determinate, but the identities of individual states are not fully disclosed. Recognition proceeds by matching HMMs of words to be recognized with previously prepared HMMs of words in the lexicon. The best match in a given case is taken to indicate the identity of the handwritten word in question. As with systems based on explicit segmentation, automated recognition systems based on implicit segmentation are judged more or less successful according to the percentage of correct identifications they accomplish. Instead of explicit segmentation techniques, most automated handwriting recognition systems today employ implicit segmentation in conjunction with HMM-based matching procedures. The constraints epitomized by Sayre's Paradox are largely responsible for this shift in approach.

    Read more →
  • Morphological antialiasing

    Morphological antialiasing

    Morphological antialiasing (MLAA) is a spatial anti-aliasing technique used in real-time computer graphics. It reduces artifacts, such as jaggies, when representing a high-resolution image at a lower resolution. MLAA is a post-process filtering which detects borders in the resulting image and then finds specific patterns in these. Anti-aliasing is achieved by blending pixels in these borders, according to the pattern they belong to and their position within the pattern. Introduced in 2009, MLAA was an early and influential example of anti-aliasing techniques done in post-processing, which makes them suitable for deferred shading. A similar method in this class is fast approximate anti-aliasing (FXAA). Temporal anti-aliasing, also a post-process, has become the most common anti-aliasing method for real-time rendering and video games. Enhanced subpixel morphological antialiasing, or SMAA, is an image-based GPU-based implementation of MLAA developed by Universidad de Zaragoza and Crytek.

    Read more →
  • Inpainting

    Inpainting

    Inpainting is a conservation process where damaged, deteriorated, or missing parts of an artwork are filled in to present a complete image. This process is commonly used in image restoration. It can be applied to both physical and digital art mediums such as oil or acrylic paintings, chemical photographic prints, sculptures, or digital images and video. With its roots in physical artwork, such as painting and sculpture, traditional inpainting is performed by a trained art conservator who has carefully studied the artwork to determine the mediums and techniques used in the piece, potential risks of treatments, and ethical appropriateness of treatment. == History == The modern use of inpainting can be traced back to Pietro Edwards (1744–1821), Director of the Restoration of the Public Pictures in Venice, Italy. Using a scientific approach, Edwards focused his restoration efforts on the intentions of the artist. It was during the 1930 International Conference for the Study of Scientific Methods for the Examination and Preservation of Works of Art, that the modern approach to inpainting was established. Helmut Ruhemann (1891–1973), a German restorer and conservator, led the discussions on the use of inpainting in conservation. Helmut Ruhemann was a leading figure in modernizing restoration and conservation. His greatest contribution to the field of conservation "was his insistence on following the methods of the original painter exactly, and on understanding the painter's artistic intention". After his career of over 40 years as a conservator, Ruhemann published his treatise The Cleaning of Paintings: Problems & Potentialities in 1968. In describing his method, Ruhemann states that "The surface [of the fill] should be slightly lower than that of the surrounding paint to allow for the thickness of the inpainting...Inpainting medium should look and behave like the original medium, but must not darken with age." Cesare Brandi (1906–1988) developed the teoria del restauro, the inpainting approach combining aesthetics and psychology. However, this approach was used primarily by Italian restorers and conservators, with the terminology becoming widespread in the 1990s. Technological advancements led to new applications of inpainting. Widespread use of digital techniques range from entirely automatic computerized inpainting to tools used to simulate the process manually. Since the mid-1990s, the process of inpainting has evolved to include digital media. More commonly known as image or video interpolation, a form of estimation, digital inpainting includes the use of computer software that relies on sophisticated algorithms to replace lost or corrupted parts of the image data. == Ethics == In order to preserve the integrity of an original artwork, any inpainting technique or treatment applied to physical or digital work should be reversible or distinguishable from the original content of the artwork. Prior to any treatments, conservators proceed according to the American Institute of Conservation of Historical and Artistic Works. There are several ethic considerations before Inpainting can be justified. Various deliberation decisions over the ethical appropriateness of the amount and type of inpainting done, resides on many factors. As most conservation treatments, inpainting's ethical questions rest mainly with authenticity, reversibility and documentation.Any intervention to compensate for loss should be documented in treatment records and reports and should be detectable by common examination methods. Such compensation should be reversible and should not falsely modify the known aesthetic, conceptual, and physical characteristics of the cultural property, especially by removing or obscuring original material.New technologies and the aesthetic demand for perfect images without imperfections challenge conservators' ethical practices to protect the integrity of originals. == Methods == Inpainting methods and techniques depend on the desired goal and type of image being treated. Treatments to fill in the gaps are different between physical and digital art. In inpainting, detailed records of the initial state of the images can help with the treatment and replicate the original closer. === Physical inpainting === Inpainting is rooted in the conservation and restoration of paintings. Inpainting can aim to make a visual improvement to the artwork as a whole by repairing missing or damaged parts using methods and materials equivalent to the original artist's work. ==== Application techniques ==== By studying the painting methods of various artists and the composition of paints used historically, conservators are able to restore works very closely to their original visual appearance. The picture as a whole determines how to fill in the gap. Helmut Ruhemann's inpainting techniques by Jessell have procedures to "preserve" the quality of oil and tempera paintings. === Digital inpainting === Many programs are able to reconstruct missing or damaged areas of digital photographs and videos. Most widely known for use with digital images is Adobe Photoshop. Given the various abilities of the digital camera and the digitization of old photos, inpainting has become an automatic process that can be performed on digital images. The inpainting techniques can be applied to object removal, text removal, and other automatic modifications of images and videos. In video special effects, inpainting is usually performed after video matting. They can also be observed in applications like image compression and super-resolution. In photography and cinema, it is used for film restoration to reverse, repair, or mitigate deterioration (e.g., physical damage such as cracks in photographs, scratches and dust spots in film, or chemical damage resulting in image loss; performed infrared cleaning). It can also be used for removing red-eye, the stamped date from photographs, and objects for creative effect. This technique can be used to replace any lost blocks in the coding and transmission of images, for example, in a streaming video. It can also be used to remove logos or watermarks in videos. Deep learning neural network-based inpainting can be used for decensoring images. Deep image prior-based techniques can be used for digital image inpainting, where a trained deep learning model is either unavailable or infeasible. Deep models for visual content generation, like text-to-image or text-to-video, learn complex priors over the distribution of visual content, and can be used to inpaint missing parts. For example, videos can be separated into layers, using a technique called omnimatte, which either pretrain an omnimatte model or without any training using an omnimatte-zero model. Three main groups of 2D image-inpainting algorithms can be found in the literature. The first one to be noted is structural (or geometric) inpainting, the second one is texture inpainting, the last one is a combination of these two techniques. They use the information of the known or non-destroyed image areas in order to fill the gap, similar to how physical images are restored. ==== Structural ==== Structural or geometric inpainting is used for smooth images that have strong, defined borders. There are many different approaches to geometric inpainting, but they all come from the idea that geometry can be recovered from similar areas or domains. Bertalmio proposed a method of structural inpainting that mimics how conservators address painting restoration. Bertalmio proposed that by progressively transferring similar information from the borders of an inpainting domain inwards, the gap can be filled. ==== Textural ==== While structural/geometric inpainting works to repair smooth images, textural inpainting works best with images that are heavily textured. Texture has a repetitive pattern which means that a missing portion cannot be restored by continuing the level lines into the gap; level lines provide a complete, stable representation of an image. To repair texture in an image, one can combine frequency and spatial domain information to fill in a selected area with a desired texture. This method, while the most simple and very effective, works well when selecting a texture to be in-painted. For a texture that covers a wider area or a larger frame one would have to go through the image segmenting the areas to be in-painted and selecting the corresponding textures from throughout the image; there are programs that can help find the corresponding areas that work in a similar way as 'find and replace' works in a word processor. ==== Combined structural and textural ==== Combined structural and textural inpainting approaches simultaneously try to perform texture- and structure-filling in regions of missing image information. Most parts of an image consist of texture and structure and the boundaries between image regions contain a large amount of structural information. This is the result when blending differ

    Read more →