Best AI Image Generators

Best AI Image Generators — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Data augmentation

    Data augmentation is a statistical technique which allows maximum likelihood estimation from incomplete data. Data augmentation has important applications in Bayesian analysis, and the technique is widely used in machine learning to reduce overfitting by training models on several slightly modified copies of existing data.

== Synthetic oversampling techniques for traditional machine learning ==

Synthetic Minority Over-sampling Technique (SMOTE) is a method used to address imbalanced datasets in machine learning. In such datasets, the number of samples in different classes varies significantly, leading to biased model performance. For example, in a medical diagnosis dataset with 90 samples representing healthy individuals and only 10 samples representing individuals with a particular disease, traditional algorithms may struggle to accurately classify the minority class. SMOTE rebalances the dataset by generating synthetic samples for the minority class. For instance, if there are 100 samples in the majority class and 10 in the minority class, SMOTE can create synthetic samples by randomly selecting a minority class sample, finding its nearest neighbors within the minority class, and generating new samples along the line segments joining the sample to those neighbors. This process helps increase the representation of the minority class, improving model performance.

== Data augmentation for image classification ==

When convolutional neural networks grew larger in the mid-1990s, there was a lack of data to use, especially considering that some part of the overall dataset should be set aside for later testing. It was proposed to perturb existing data with affine transformations to create new examples with the same labels. These were complemented by so-called elastic distortions in 2003, and the technique was widely used by the 2010s. Data augmentation can enhance CNN performance and acts as a countermeasure against CNN profiling attacks.
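The SMOTE interpolation step described in the oversampling section above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `smote`, the parameter `k`, and the toy two-dimensional points are all hypothetical, and Euclidean distance is assumed for the neighbor search.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Find its k nearest minority-class neighbors (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Interpolate at a random position on the segment base -> neighbor.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class.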
Data augmentation has become fundamental in image classification, enriching training dataset diversity to improve model generalization and performance. The evolution of this practice has introduced a broad spectrum of techniques, including geometric transformations, color space adjustments, and noise injection.

=== Geometric Transformations ===

Geometric transformations alter the spatial properties of images to simulate different perspectives, orientations, and scales. Common techniques include:

• Affine Transformation
• Rotation: Rotating images by a specified degree to help models recognize objects at various angles.
• Reflection: Reflecting images horizontally or vertically to introduce variability in orientation.
• Translation: Shifting images in different directions to teach models positional invariance.
• Scaling
• Shear Mapping
• Cropping: Removing sections of the image to focus on particular features or simulate closer views.
• Elastic Distortion
• Morphing within the same class: Generating new samples by applying morphing techniques between two images belonging to the same class, thereby increasing intra-class diversity.

=== Color Space Transformations ===

Color space transformations modify the color properties of images, addressing variations in lighting, color saturation, and contrast. Techniques include:

• Brightness Adjustment: Varying the image's brightness to simulate different lighting conditions.
• Contrast Adjustment: Changing the contrast to help models recognize objects under various clarity levels.
• Saturation Adjustment: Altering saturation to prepare models for images with diverse color intensities.
• Color Jittering: Randomly adjusting brightness, contrast, saturation, and hue to introduce color variability.

=== Noise Injection ===

Injecting noise into images simulates real-world imperfections, teaching models to ignore irrelevant variations. Techniques involve:

• Gaussian Noise: Adding Gaussian noise mimics sensor noise or graininess.
• Salt and Pepper Noise: Introducing black or white pixels at random simulates sensor dust or dead pixels.

== Data augmentation for signal processing ==

Residual or block bootstrap can be used for time series augmentation.

=== Biological signals ===

Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high dimensional and scarce. The applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. Data scarcity is notable in signal processing problems such as Parkinson's disease electromyography signals, which are difficult to source. Zanini et al. noted that it is possible to use a generative adversarial network (in particular, a DCGAN) to perform style transfer in order to generate synthetic electromyographic signals that correspond to those exhibited by sufferers of Parkinson's disease.

The approaches are also important in electroencephalography (brainwaves). Wang et al. explored the idea of using deep convolutional neural networks for EEG-based emotion recognition; their results show that emotion recognition improved when data augmentation was used.

A common approach is to generate synthetic signals by re-arranging components of real data. Lotte proposed a method of "Artificial Trial Generation Based on Analogy" in which three data examples $x_{1}, x_{2}, x_{3}$ are given and an artificial $x_{synthetic}$ is formed which is to $x_{3}$ what $x_{2}$ is to $x_{1}$. A transformation is applied to $x_{1}$ to make it more similar to $x_{2}$; the same transformation is then applied to $x_{3}$, which generates $x_{synthetic}$.
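The analogy idea can be illustrated with a deliberately simplified sketch. The text does not specify which transformation Lotte's method uses, so the element-wise additive shift below is purely a stand-in assumption, and the function name `analogy_trial` is hypothetical.

```python
def analogy_trial(x1, x2, x3):
    """Form a synthetic trial that is to x3 what x2 is to x1.

    Assumption: the "transformation" is a plain element-wise shift
    (x2 - x1). The published method may use a different transformation.
    """
    shift = [b - a for a, b in zip(x1, x2)]       # what turns x1 into x2
    return [c + s for c, s in zip(x3, shift)]     # apply the same change to x3


# Hypothetical two-sample signals.
x_synthetic = analogy_trial([1, 2], [2, 4], [10, 10])
```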
This approach was shown to improve performance of a Linear Discriminant Analysis classifier on three different datasets.

Current research shows great impact can be derived from relatively simple techniques. For example, Freer observed that introducing noise into gathered data to form additional data points improved the learning ability of several models which otherwise performed relatively poorly. Tsinganos et al. studied the approaches of magnitude warping, wavelet decomposition, and synthetic surface EMG models (generative approaches) for hand gesture recognition, finding classification performance increases of up to +16% when augmented data was introduced during training. More recently, data augmentation studies have begun to focus on the field of deep learning, more specifically on the ability of generative models to create artificial data which is then introduced during the classification model training process. In 2018, Luo et al. observed that useful EEG signal data could be generated by Conditional Wasserstein Generative Adversarial Networks (GANs) and then introduced to the training set in a classical train-test learning framework. The authors found classification performance was improved when such techniques were introduced.

=== Mechanical signals ===

The prediction of mechanical signals based on data augmentation brings a new generation of technological innovations, such as new energy dispatch, 5G communication, and robotics control engineering. In 2022, Yang et al. integrated constraints, optimization and control into a deep network framework based on data augmentation and data pruning with spatio-temporal data correlation, and improved the interpretability, safety and controllability of deep learning in real industrial projects through explicit mathematical programming equations and analytical solutions.
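Returning to the image techniques listed earlier (reflection, translation, Gaussian noise, and salt-and-pepper noise), the following is a minimal sketch on a tiny grayscale image stored as nested lists. All function names are hypothetical, pixel values are assumed to lie in 0..255, and a real pipeline would normally use an image library rather than plain lists.

```python
import random


def reflect_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def translate_right(img, shift, fill=0):
    """Shift pixels right by `shift` columns (0 <= shift <= width), padding with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in img]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def add_salt_and_pepper(img, prob, rng):
    """Set each pixel to black or white with total probability `prob`."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < prob / 2:
                new_row.append(0)        # pepper
            elif r < prob:
                new_row.append(255)      # salt
            else:
                new_row.append(p)        # unchanged
        out.append(new_row)
    return out


img = [[10, 20, 30], [40, 50, 60]]
rng = random.Random(0)
augmented = [
    reflect_horizontal(img),
    translate_right(img, 1),
    add_gaussian_noise(img, 5.0, rng),
    add_salt_and_pepper(img, 0.2, rng),
]
```

Each augmented copy keeps the label of the original image, which is what lets these copies be added to the training set directly.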

  • United States export controls on AI chips and semiconductors

    United States export controls on AI chips and semiconductors are a series of regulations imposed by the United States restricting the export of technology and equipment related to artificial intelligence to other countries, primarily targeting China. This has happened in the context of a broader trade war. In January 2026, BIS formalized a flexible license review policy for these transactions.

  • Maia and Marco

    Maia and Marco are artificial intelligence used by GMA Network. Unveiled in 2023, they are used to fulfill the role of sports newscasters.

== Background ==

Maia and Marco are artificial intelligence (AI) which take the form of three-dimensional human avatars. Maia makes use of a female avatar while Marco uses a male likeness. They have aesthetic features that are typical of Filipino show business personalities. The technologies used in making and operating the AI include image generation, text-to-speech AI voice synthesis/generation, and deep learning face animation. They have also been demonstrated to be bilingual, being able to speak in English and Tagalog (Filipino).

== Use ==

The AI pair was unveiled by GMA Network on September 24, 2023, for their coverage of Season 99 of the National Collegiate Athletic Association (NCAA). Fulfilling the role of sports newscasters, Maia and Marco would join GMA's courtside human reporters. The AI pair are scheduled to appear four times a month on GMA's digital media platforms. They will not appear in traditional television broadcast.

== Reception ==

The launch of Maia and Marco was met with strong reactions. Various journalists and other personalities across the Philippine media industry expressed concern that their employment would be at risk with the introduction of AI. The quality of the AI's ability to emulate human behavior was characterized by critics as "soulless". Responding to these concerns, GMA stated that the AI would complement rather than replace its live human journalists, including sportscasters. The National Union of Journalists of the Philippines urged dialogue among its peers in the newsroom on policy for the use of AI, which the group acknowledged as "inevitable".

  • Minion (solver)

    Minion is a solver for constraint satisfaction problems (CSPs). Unlike constraint programming toolkits, which expect users to write programs in a traditional programming language like C++, Java or Prolog, Minion takes a text file which specifies the problem, and solves using only this. This makes using Minion much simpler, at the cost of much less customization. Minion has been reported to be faster than state-of-the-art constraint toolkits of its time, including ILOG Solver and Gecode.

== Overview ==

Minion was introduced in 2006 by researchers at the University of St Andrews as a “fast, scalable” solver for large and hard CSP instances. The project provides a compact input language and a low-overhead C++ implementation aimed at throughput and memory efficiency.

== Design and features ==

Minion implements a range of variable and constraint types commonly used in CSP modelling, plus search heuristics and optimisation support. The solver architecture prioritises cache-friendly data structures and specialised propagators. Notably, the developers adapted watched literal techniques from SAT solving to speed up constraint propagation for, among others, Boolean sums, the element global constraint, and table constraints. The modelling approach relies on a plain-text format (parsed by Minion) rather than embedding models into a host programming language. This reduces overhead and supports rapid “model-and-run” experimentation for large benchmark sets.

== Performance ==

In the original evaluation on standard benchmarks, the authors reported that Minion often ran between one and two orders of magnitude faster than state-of-the-art toolkits of the time (including ILOG Solver and Gecode) on large, hard instances, with smaller gains, or even slowdowns, on easier problems. Subsequent research has used Minion as a baseline solver in empirical studies and test generation tasks, reflecting its adoption within parts of the constraint programming community.

== Applications ==

Minion has been applied in academic work on combinatorial search, scheduling and test generation, and is available to other environments via wrappers (for example, from the R language).
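To make the notion of a constraint satisfaction problem concrete, here is a minimal sketch of a declarative problem description handed to a generic solver. It is not Minion's input format, and the plain backtracking search below is not Minion's algorithm (it has no propagators or watched literals); the names `solve`, `domains`, and `constraints` are hypothetical.

```python
def solve(domains, constraints, assignment=None):
    """Plain backtracking search over binary constraints.

    domains:     dict mapping variable name -> list of candidate values
    constraints: list of (var_a, var_b, predicate) triples
    """
    assignment = dict(assignment or {})
    if len(assignment) == len(domains):
        return assignment
    # Pick the first variable that has no value yet.
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Check only constraints whose two variables are both assigned.
        consistent = all(
            check(assignment[a], assignment[b])
            for a, b, check in constraints
            if a in assignment and b in assignment
        )
        if consistent:
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None


# The problem is described as data, separate from the solver code.
domains = {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]}
constraints = [
    ("x", "y", lambda a, b: a < b),
    ("y", "z", lambda a, b: a < b),
]
solution = solve(domains, constraints)
```

Keeping the problem description apart from the search code is the same separation that a "model-and-run" solver relies on, here shown at toy scale.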

  • BERT (language model)

    Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments.

BERT is trained by masked token prediction and next sentence prediction. With this training, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2. It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution. It improved on ELMo and spawned the study of "BERTology", which attempts to interpret what is learned by BERT.

BERT was originally implemented in the English language at two model sizes, BERTBASE (110 million parameters) and BERTLARGE (340 million parameters). Both were trained on the Toronto BookCorpus (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub. On March 11, 2020, 24 smaller models were released, the smallest being BERTTINY with just 4 million parameters.

== Architecture ==

BERT is an "encoder-only" transformer architecture. At a high level, BERT consists of 4 modules:

• Tokenizer: This module converts a piece of English text into a sequence of integers ("tokens").
• Embedding: This module converts the sequence of tokens into an array of real-valued vectors representing the tokens. It represents the conversion of discrete token types into a lower-dimensional Euclidean space.
• Encoder: A stack of Transformer blocks with self-attention, but without causal masking.
• Task head: This module converts the final representation vectors into one-hot encoded tokens again by producing a predicted probability distribution over the token types.
It can be viewed as a simple decoder, decoding the latent representation into token types, or as an "un-embedding layer".

The task head is necessary for pre-training, but it is often unnecessary for so-called "downstream tasks," such as question answering or sentiment classification. Instead, one removes the task head, replaces it with a newly initialized module suited for the task, and fine-tunes the new module. The latent vector representation of the model is directly fed into this new module, allowing for sample-efficient transfer learning.

=== Embedding ===

This section describes the embedding used by BERTBASE. The other one, BERTLARGE, is similar, just larger. The tokenizer of BERT is WordPiece, which is a sub-word strategy like byte-pair encoding. Its vocabulary size is 30,000, and any token not appearing in its vocabulary is replaced by [UNK] ("unknown").

The first layer is the embedding layer, which contains three components: token type embeddings, position embeddings, and segment type embeddings.

• Token type: The token type is a standard embedding layer, translating a one-hot vector into a dense vector based on its token type.
• Position: The position embeddings are based on a token's position in the sequence. BERT uses absolute position embeddings, where each position in a sequence is mapped to a real-valued vector. In BERT these vectors are learned parameters rather than fixed sinusoidal functions of the position.
• Segment type: Using a vocabulary of just 0 or 1, this embedding layer produces a dense vector based on whether the token belongs to the first or second text segment in that input. In other words, type-1 tokens are all tokens that appear after the [SEP] special token. All prior tokens are type-0.

The three embedding vectors are added together, so that the initial token representation is a function of these three pieces of information.
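The sum of the three embeddings can be sketched with small lookup tables. This is a toy illustration only: the table sizes, the random initial values, and the names `token_emb`, `pos_emb`, `seg_emb`, and `embed` are assumptions for the example (the hidden size is 4 here instead of 768), and any normalization applied afterwards is left out.

```python
import random

HIDDEN = 4                         # toy hidden size; BERTBASE uses 768
VOCAB, MAX_POS, SEGMENTS = 10, 8, 2
rng = random.Random(0)


def table(rows):
    """Build a lookup table of `rows` random vectors of length HIDDEN."""
    return [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(rows)]


token_emb = table(VOCAB)           # one vector per token type
pos_emb = table(MAX_POS)           # one vector per absolute position
seg_emb = table(SEGMENTS)          # one vector per segment id (0 or 1)


def embed(token_ids, segment_ids):
    """Return one vector per token: token + position + segment embedding."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        vectors.append([
            t + p + s
            for t, p, s in zip(token_emb[tok], pos_emb[pos], seg_emb[seg])
        ])
    return vectors


# Hypothetical token ids; the last two tokens belong to the second segment.
vectors = embed([1, 5, 2, 7, 2], [0, 0, 0, 1, 1])
```

Note that token id 2 appears twice but receives two different vectors, because its position and segment differ.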
After embedding, the vector representation is normalized using a LayerNorm operation, outputting a 768-dimensional vector for each input token. After this, the representation vectors are passed forward through 12 Transformer encoder blocks, and are decoded back to 30,000-dimensional vocabulary space using a basic affine transformation layer.

=== Architectural family ===

The encoder stack of BERT has 2 free parameters: $L$, the number of layers, and $H$, the hidden size. There are always $H/64$ self-attention heads, and the feed-forward/filter size is always $4H$. By varying these two numbers, one obtains an entire family of BERT models.

For BERT:

• The feed-forward size and filter size are synonymous. Both of them denote the number of dimensions in the middle layer of the feed-forward network.
• The hidden size and embedding size are synonymous. Both of them denote the number of real numbers used to represent a token.

The notation for the encoder stack is written as L/H. For example, BERTBASE is written as 12L/768H, BERTLARGE as 24L/1024H, and BERTTINY as 2L/128H.

== Training ==

=== Pre-training ===

BERT was pre-trained simultaneously on two tasks:

• Masked language modeling (MLM): In this task, BERT ingests a sequence of words, where one word may be randomly changed ("masked"), and BERT tries to predict the original words that had been changed. For example, in the sentence "The cat sat on the [MASK]," BERT would need to predict "mat." This helps BERT learn bidirectional context, meaning it understands the relationships between words not just from left to right or right to left but from both directions at the same time.
• Next sentence prediction (NSP): In this task, BERT is trained to predict whether one sentence logically follows another.
For example, given two sentences, "The cat sat on the mat" and "It was a sunny day", BERT has to decide if the second sentence is a valid continuation of the first one. This helps BERT understand relationships between sentences, which is important for tasks like question answering or document classification.

==== Masked language modeling ====

In masked language modeling, 15% of tokens would be randomly selected for the masked-prediction task, and the training objective was to predict the masked token given its context. In more detail, the selected token is:

• replaced with a [MASK] token with probability 80%,
• replaced with a random word token with probability 10%,
• not replaced with probability 10%.

The reason not all selected tokens are masked is to avoid the dataset shift problem. The dataset shift problem arises when the distribution of inputs seen during training differs significantly from the distribution encountered during inference. A trained BERT model might be applied to word representation (like Word2Vec), where it would be run over sentences not containing any [MASK] tokens. It was later found that more diverse training objectives are generally better.

As an illustrative example, consider the sentence "my dog is cute". It would first be divided into tokens like "my1 dog2 is3 cute4". Then a random token in the sentence would be picked. Let it be the 4th one, "cute4". Next, there would be three possibilities:

• with probability 80%, the chosen token is masked, resulting in "my1 dog2 is3 [MASK]4";
• with probability 10%, the chosen token is replaced by a uniformly sampled random token, such as "happy", resulting in "my1 dog2 is3 happy4";
• with probability 10%, nothing is done, resulting in "my1 dog2 is3 cute4".

After processing the input text, the model's 4th output vector is passed to its decoder layer, which outputs a probability distribution over its 30,000-dimensional vocabulary space.
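The 15% selection and the 80/10/10 replacement rule can be sketched as follows. This is a minimal illustration, not the original training code: the function name `mask_tokens` and the toy vocabulary are hypothetical, and selecting each token independently with probability 0.15 is an assumption about how the 15% is drawn.

```python
import random


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (inputs, targets) for masked language modeling.

    targets[i] holds the original token where position i was selected
    for prediction, and None everywhere else.
    """
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            targets.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")            # 80%: mask the token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))   # 10%: random token
            else:
                inputs.append(tok)                 # 10%: keep unchanged
        else:
            targets.append(None)
            inputs.append(tok)
    return inputs, targets


tokens = ["my", "dog", "is", "cute"]
vocab = ["my", "dog", "is", "cute", "happy", "cat"]
inputs, targets = mask_tokens(tokens, vocab, random.Random(0))
```

Positions with a target of `None` contribute nothing to the loss; only the selected positions are predicted, whichever of the three replacements they received.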
==== Next sentence prediction ==== Given two sentences, the model predicts if they appear sequentially in the training corpus, outputting either [IsNext] or [NotNext]. During training, the algorithm sometimes samples two sentences from a single continuous span in the training corpus, while at other times, it samples two sentences from two discontinuous spans. The first sentence starts with a special token, [CLS] (for "classify"). The two sentences are separated by another special token, [SEP] (for "separate"). After processing the two sentences, the final vector for the [CLS] token is passed to a linear layer for binary classification into [IsNext] and [NotNext]. For example: Given "[CLS] my dog is cute [SEP] he likes playing [SEP]", the model should predict [IsNext]. Given "[CLS] my dog is cute [SEP] how do magnets work [SEP]", the model should predict [NotNext]. === Fine-tuning === BERT is meant as a general pretrained model for various applications in natural language processing. That is, after pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks such as natural language inference and text classification, and sequence-to-sequence-based language generation tasks such as question answering and conversational response generation. The original BERT paper published results demonstrating that a small amount of fine

  • John F. Sowa

    John Florian Sowa (born 1940) is an American computer scientist, an expert in artificial intelligence and computer design, and the inventor of conceptual graphs. == Biography == Sowa received a BS in mathematics from Massachusetts Institute of Technology in 1962, an MA in applied mathematics from Harvard University in 1966, and a PhD in computer science from the Vrije Universiteit Brussel in 1999 with a dissertation titled "Knowledge Representation: Logical, Philosophical, and Computational Foundations". Sowa spent most of his professional career at IBM, starting in 1962 at IBM's applied mathematics group. Over the decades he has researched and developed emerging fields of computer science from compilers, programming languages, and system architecture to artificial intelligence and knowledge representation. In the 1990s Sowa was associated with the IBM Educational Center in New York. Over the years he taught courses at the IBM Systems Research Institute, Binghamton University, Stanford University, the Linguistic Society of America and the Université du Québec à Montréal. He is a fellow of the Association for the Advancement of Artificial Intelligence. After early retirement at IBM, Sowa in 2001 cofounded VivoMind Intelligence, Inc. with Arun K. Majumdar. With this company he was developing data-mining and database technology, more specifically high-level "ontologies" for artificial intelligence and automated natural language understanding. Currently Sowa is working with Kyndi Inc., also founded by Majumdar. John Sowa is married to the philologist Cora Angier Sowa, and they live in Croton-on-Hudson, New York. == Work == Sowa's research interests since the 1970s were in the field of artificial intelligence, expert systems and database query linked to natural languages. 
In his work he combines ideas from numerous disciplines and eras, modern and ancient: for example, applying ideas ranging from Aristotle and the medieval scholastics to Alfred North Whitehead, including database schema theory, and incorporating the model of analogy of Islamic scholar Ibn Taymiyyah.

=== Conceptual graph ===

Sowa invented conceptual graphs, a graphic notation for logic and natural language, based on the structures in semantic networks and on the existential graphs of Charles S. Peirce. He introduced the concept in the 1976 article "Conceptual graphs for a data base interface" in the IBM Journal of Research and Development. He elaborated upon it in the 1983 book Conceptual structures: information processing in mind and machine. In the 1980s, this theory had "been adopted by a number of research and development groups throughout the world." International conferences on conceptual structures (ICCS) have been held since 1993, following a series of conceptual graph workshops that began in 1986.

=== Sowa's law of standards ===

In 1991, Sowa first stated his Law of Standards: "Whenever a major organization develops a new system as an official standard for X, the primary result is the widespread adoption of some simpler system as a de facto standard for X." Like Gall's law, the Law of Standards is essentially an argument in favour of underspecification.
Examples include:

• The introduction of PL/I resulting in COBOL and FORTRAN becoming the de facto standards for business and scientific programming respectively
• The introduction of Algol-68 resulting in Pascal becoming the de facto standard for academic programming
• The introduction of the Ada language resulting in C becoming the de facto standard for US Department of Defense programming
• The introduction of OS/2 resulting in Windows becoming the de facto standard for desktop OS
• The introduction of X.400 resulting in SMTP becoming the de facto standard for electronic mail
• The introduction of X.500 resulting in LDAP becoming the de facto standard for directory services

== Publications ==

• 1984. Conceptual Structures - Information Processing in Mind and Machine. The Systems Programming Series, Addison-Wesley.
• 1991. Principles of Semantic Networks. Morgan Kaufmann.
• Mineau, Guy W; Moulin, Bernard; Sowa, John F, eds. (1993). Conceptual Graphs for Knowledge Representation. LNCS. Vol. 699. doi:10.1007/3-540-56979-0. ISBN 978-3-540-56979-4. S2CID 32275791.
• 1994. International Conference on Conceptual Structures (2nd : 1994 : College Park, Md.) Conceptual structures, current practices : Second International Conference on Conceptual Structures, ICCS'94, College Park, Maryland, USA, August 16–20, 1994 : proceedings. William M. Tepfenhart, Judith P. Dick, John F. Sowa, eds.
• Ellis, Gerard; Levinson, Robert; Rich, William; Sowa, John F, eds. (1995). Conceptual Structures: Applications, Implementation and Theory. LNCS. Vol. 954. doi:10.1007/3-540-60161-9. ISBN 978-3-540-60161-6. S2CID 27300281.
• Lukose, Dickson; Delugach, Harry; Keeler, Mary; Searle, Leroy; Sowa, John, eds. (1997). Conceptual Structures: Fulfilling Peirce's Dream. LNCS. Vol. 1257. doi:10.1007/BFb0027865. ISBN 3-540-63308-1. S2CID 1934069.
• 2000. Knowledge representation : logical, philosophical, and computational foundations, Brooks Cole Publishing Co., Pacific Grove.

Articles, a selection:

• Sowa, J. F. (July 1976). "Conceptual Graphs for a Data Base Interface". IBM Journal of Research and Development. 20 (4): 336–357. doi:10.1147/rd.204.0336.
• Sowa, J. F.; Zachman, J. A. (1992). "Extending and formalizing the framework for information systems architecture". IBM Systems Journal. 31 (3): 590–616. doi:10.1147/sj.313.0590.
• 1992. "Conceptual Graph Summary". In: T.E. Nagle et al. (Eds.). Conceptual Structures: Current Research and Practice. Chichester: Ellis Horwood.
• 1995. "Top-level ontological categories". In: International Journal of Human-Computer Studies. Vol. 43, Iss. 5–6, Nov. 1995, pp. 669–685.
• 2006. "Semantic Networks". In: Encyclopedia of Cognitive Science. John Wiley & Sons.

  • Composite portrait

    Composite portraiture (also known as composite photographs) is a technique invented by Sir Francis Galton in the 1880s after a suggestion by Herbert Spencer for registering photographs of human faces on the two eyes to create an "average" photograph of all those in the photographed group. Spencer had suggested using onion paper and line drawings, but Galton devised a technique for multiple exposures on the same photographic plate. He noticed that these composite portraits were more attractive than any individual member, and this has generated a large body of research on human attractiveness and averageness one hundred years later. He also suggested in a Royal Society presentation in 1883 that the composites provided an interesting concrete representation of human ideal types and concepts. He discussed using the technique to investigate characteristics of common types of humanity, such as criminals. In his mind, it was an extension of the statistical techniques of averages and correlation. In this sense, it represents one of the first implementations of convolution factor analysis and neural networks in the understanding of knowledge representation in the human mind. Galton also suggested that the technique could be used for creating natural types of common objects. During the late 19th century, English psychometrician Sir Francis Galton attempted to define physiognomic characteristics of health, disease, beauty, and criminality, via a method of composite photography. Galton's process involved the photographic superimposition of two or more faces by multiple exposures. After averaging together photographs of violent criminals, he found that the composite appeared "more respectable" than any of the faces comprising it; this was likely due to the irregularities of the skin across the constituent images being averaged out in the final blend. 
Since the advancement of computer graphics technology in the early 1990s, Galton's composite technique has been adopted and greatly improved using computer graphics software.
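In digital form, the superimposition described above amounts to averaging aligned images pixel by pixel. The sketch below assumes the images are already registered (for example on the two eyes), share the same size, and are stored as nested lists of grayscale values; the function name `composite` is hypothetical.

```python
def composite(images):
    """Average a list of equally sized, pre-aligned grayscale images."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(width)]
        for r in range(height)
    ]


# Three hypothetical 1x2 "portraits": the first pixel varies between
# images, the second is identical in all of them.
average_face = composite([[[0, 100]], [[100, 100]], [[200, 100]]])
```

Features shared by all inputs survive the averaging unchanged, while individual irregularities are pulled toward the mean, which is the smoothing effect noted for Galton's composites.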

  • Andrew Ng

    Andrew Yan-Tak Ng (Chinese: 吳恩達; born April 18, 1976) is a British-American computer scientist and technology entrepreneur focusing on machine learning and artificial intelligence (AI). Ng was a cofounder and head of Google Brain and was the former Chief Scientist at Baidu. Ng is an adjunct professor at Stanford University (formerly associate professor and Director of its Stanford AI Lab or SAIL). Ng has also worked in online education, cofounding Coursera and DeepLearning.AI. He has spearheaded many efforts to "democratize deep learning" teaching over 8 million students through his online courses. Ng is renowned globally in computer science, recognized in Time magazine's 100 Most Influential People in 2012 and Fast Company's Most Creative People in 2014. His influence extends to being named in the Time100 AI Most Influential People in 2023. In 2018, he launched and currently heads the AI Fund, initially a $175-million investment fund for backing artificial intelligence startups. He has founded Landing AI, which provides AI-powered SaaS products. On April 11, 2024, Amazon announced Ng's appointment to its board of directors. == Early life and education == Andrew Yan-Tak Ng was born in London, in 1976 to Ronald Paul Ng, a hematologist and lecturer at UCL Medical School, and Tisa Ho, an arts administrator working at the London Film Festival. His parents were both immigrants from Hong Kong. His family moved back to Hong Kong and he spent his early childhood there. In 1984 he and his family moved to Singapore. Ng attended and graduated from Raffles Institution. In 1997, he earned his undergraduate degree with a triple major in computer science, statistics, and economics from Carnegie Mellon University in Pittsburgh, Pennsylvania. Between 1996 and 1998 he also conducted research on reinforcement learning, model selection, and feature selection at the AT&T Bell Labs. 
In 1998, Ng earned his master's degree in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT) in Cambridge, Massachusetts. At MIT, he built the first publicly available, automatically indexed web-search engine for research papers on the web. It was a precursor to CiteSeerX/ResearchIndex, but specialized in machine learning. In 2002, he received his Doctor of Philosophy (Ph.D.) in Computer Science from the University of California, Berkeley, under the supervision of Michael I. Jordan. His thesis is titled "Shaping and policy search in reinforcement learning" and is well-cited to this day. == Career == === Academia and teaching === Ng started working as an assistant professor at Stanford University in 2002 and as an associate professor in 2009. Ng is a professor at Stanford University departments of Computer Science and electrical engineering. He served as the director of the Stanford Artificial Intelligence Laboratory (SAIL), where he taught students and undertook research related to data mining, big data, and machine learning. His machine learning course CS229 at Stanford is the most popular course offered on campus with over 1,000 students enrolling some years. As of 2020, three of the most popular courses on Coursera are Ng's: Machine Learning (#1), AI for Everyone (#5), Neural Networks and Deep Learning (#6). In 2008, his group at Stanford was one of the first in the US to start advocating the use of GPUs in deep learning. The rationale was that an efficient computation infrastructure could speed up statistical model training by orders of magnitude, ameliorating some of the scaling issues associated with big data. At the time it was a controversial and risky decision, but since then and following Ng's lead, GPUs have become a cornerstone in the field. Since 2017, Ng has been advocating the shift to high-performance computing (HPC) for scaling up deep learning and accelerating progress in the field. 
In 2012, along with Stanford computer scientist Daphne Koller he cofounded and was CEO of Coursera, a website that offers free online courses to everyone. It took off with over 100,000 students registered for Ng's popular CS229A course. Today, several million people have enrolled in Coursera courses, making the site one of the leading massive open online courses (MOOCs) in the world. === Industry === From 2011 to 2012, he worked at Google, where he founded and directed the Google Brain Deep Learning Project with Jeff Dean, Greg Corrado, and Rajat Monga. In 2014, he joined Baidu as chief scientist, and carried out research related to big data and AI. There he set up several research teams for things like facial recognition and Melody, an AI chatbot for healthcare. He also developed for the company the AI platform called DuerOS and other technologies that positioned Baidu ahead of Google in the discourse and development of AI. In March 2017, he announced his resignation from Baidu. He soon afterward launched DeepLearning.AI, an online series of deep learning courses (including the AI for Good Specialization). Then Ng launched LandingAI, which provides AI-powered SaaS products. In January 2018, Ng unveiled the AI Fund, raising $175 million to invest in new startups. In November 2021, LandingAI secured a $57 million round of series A funding led by McRock Capital, to help enterprises adopt AI. In October 2024, Ng's AI Fund made its first investment in India, backing AI healthcare startup Jivi, which uses AI for diagnoses, treatment recommendations, and administrative tasks. The investment highlights the growth of India's AI sector, expected to reach $22 billion by 2027. === Research === Ng researches primarily in machine learning, deep learning, machine perception, computer vision, and natural language processing; and is one of the world's most famous and influential computer scientists. 
He has frequently won best paper awards at academic conferences and has had a huge impact on the fields of AI, computer vision, and robotics. During graduate school, together with David M. Blei and Michael I. Jordan, Ng co-authored the influential paper that introduced latent Dirichlet allocation (LDA); his thesis work was on reinforcement learning for drones. His early work includes the Stanford Autonomous Helicopter project, which developed one of the most capable autonomous helicopters in the world. He was the leading scientist and principal investigator on the STAIR (Stanford Artificial Intelligence Robot) project, which resulted in Robot Operating System (ROS), a widely used open source software robotics platform. His vision to build an AI robot and put a robot in every home inspired Scott Hassan to back him and create Willow Garage. He is also one of the founding team members for the Stanford WordNet project, which uses machine learning to expand the Princeton WordNet database created by Christiane Fellbaum. In 2011, Ng founded the Google Brain project at Google, which developed large-scale artificial neural networks using Google's distributed computing infrastructure. Among its notable results was a neural network trained using deep learning algorithms on 16,000 CPU cores, which learned to recognize cats after watching only YouTube videos, and without ever having been told what a "cat" is. The project's technology is also currently used in the Android operating system's speech recognition system. === Views on AI === Ng thinks that the real threat is contemplating the future of work: "Rather than being distracted by evil killer robots, the challenge to labor caused by these machines is a conversation that academia and industry and government should have." He has emphasized the importance of expanding access to AI education, stating that empowering people around the world to use AI tools is essential to building AI applications. 
In a December 2023 Financial Times interview, Ng highlighted concerns regarding the impact of potential regulations on open-source AI, emphasizing how reporting, licensing, and liability risks could unfairly burden smaller firms and stifle innovation. He argued that regulating basic technologies like open-source models could hinder progress without markedly enhancing safety. Ng advocated for carefully designed regulations to prevent obstacles to the development and distribution of beneficial AI technologies. In a June 2024 interview with the Financial Times, Ng expressed concerns about proposed AI legislation in California that would have required developers to implement safety mechanisms such as a "kill switch" for advanced models. He described the bill as creating "massive liabilities for science-fiction risks" and said it "stokes fear in anyone daring to innovate." Other critics argued the bill would impose burdens on open-source developers and smaller AI companies. The bill was ultimately vetoed by Governor Gavin Newsom in September 2024. == Online education: massive open online course == In 2011, Stanford launched a total of three massive open online course (MOOCs) on machine learning (CS229a), databases, and AI, taught by Ng

    Read more →
  • Lesk algorithm

    Lesk algorithm

    The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. It operates on the premise that words within a given context are likely to share a common meaning. This algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense. Variations, such as the Simplified Lesk algorithm, have demonstrated improved precision and efficiency. However, the Lesk algorithm has faced criticism for its sensitivity to definition wording and its reliance on brief glosses. Researchers have sought to enhance its accuracy by incorporating additional resources like thesauruses and syntactic models. == Overview == The Lesk algorithm is based on the assumption that words in a given "neighborhood" (section of text) will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet. An implementation might look like this: for every sense of the word being disambiguated, count the number of words that appear both in the neighborhood of that word and in the dictionary definition of that sense; the sense to choose is the one with the largest count. A frequently used example illustrating this algorithm is the context "pine cone". The following dictionary definitions are used. PINE: 1. kinds of evergreen tree with needle-shaped leaves; 2. waste away through sorrow or illness. CONE: 1. solid body which narrows to a point; 2. something of this shape whether solid or hollow; 3. fruit of certain evergreen trees. As can be seen, the best intersection is Pine #1 ⋂ Cone #3 = 2. 
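The "pine cone" example can be reproduced with a short Python sketch. The glosses are the ones quoted above; the stop list, the crude plural stripping and the function names are assumptions made for illustration and are not part of Lesk's original description.

```python
from itertools import product

# Glosses copied from the "pine cone" example; everything else is illustrative.
GLOSSES = {
    "pine": [
        "kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}

# Assumed stop list of function words that should not count as overlap.
STOP_WORDS = {"of", "with", "a", "to", "or", "this", "which", "through"}


def content_words(gloss):
    """Return the set of content words in a gloss.

    Plural handling is a deliberately crude assumption: a trailing "s" is
    dropped unless the word ends in "ss", so "trees" matches "tree".
    """
    words = set()
    for token in gloss.lower().split():
        if token in STOP_WORDS:
            continue
        if token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        words.add(token)
    return words


def best_sense_pair(word_a, word_b):
    """Return (overlap, sense_a, sense_b) using 1-based sense numbers."""
    best = (-1, 0, 0)
    numbered_a = enumerate(GLOSSES[word_a], 1)
    numbered_b = enumerate(GLOSSES[word_b], 1)
    for (i, gloss_a), (j, gloss_b) in product(numbered_a, numbered_b):
        overlap = len(content_words(gloss_a) & content_words(gloss_b))
        if overlap > best[0]:
            best = (overlap, i, j)
    return best


print(best_sense_pair("pine", "cone"))  # (2, 1, 3): Pine #1 and Cone #3 share two words
```

The two shared words are "evergreen" and "tree", which matches the intersection size of 2 given in the example.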
== Simplified Lesk algorithm == In the Simplified Lesk algorithm, the correct meaning of each word in a given context is determined individually by locating the sense whose dictionary definition overlaps the most with the given context. Rather than simultaneously determining the meanings of all words in a given context, this approach tackles each word individually, independent of the meaning of the other words occurring in the same context. "A comparative evaluation performed by Vasilescu et al. (2004) has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all words data, they measure a 58% precision using the simplified Lesk algorithm compared to the only 42% under the original algorithm. Note: Vasilescu et al. implementation considers a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all their possible meanings lead to zero overlap with current context or with other word definitions are by default assigned sense number one in WordNet." In the simplified Lesk algorithm with smart default word sense (Vasilescu et al., 2004), the COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way. == Criticisms == Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions. 
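As a rough illustration of the simplified variant and its back-off to the most frequent sense, the following Python sketch scores each sense of a single word against the surrounding sentence. It is not Vasilescu et al.'s implementation: the function names, the stop list and the example sentences are assumptions, and the sense list is simply assumed to be ordered from most to least frequent.

```python
# Assumed stop list; a real system would use a much larger one.
STOP_WORDS = frozenset({"of", "with", "a", "to", "or", "through", "the"})

# Glosses for "pine" from the earlier example, assumed to be in frequency order.
PINE_SENSES = [
    "kinds of evergreen tree with needle-shaped leaves",
    "waste away through sorrow or illness",
]


def compute_overlap(signature, context, stop_words=STOP_WORDS):
    """Count the words shared by two sets, ignoring words on the stop list."""
    return len((signature - stop_words) & (context - stop_words))


def simplified_lesk(word, sentence, senses):
    """Return the 0-based index of the sense that best overlaps the sentence.

    If no sense overlaps the context at all, index 0 is returned, which is
    the back-off to the most frequent sense.
    """
    context = set(sentence.lower().split()) - {word}
    best_index = 0
    max_overlap = 0
    for index, gloss in enumerate(senses):
        signature = set(gloss.lower().split())
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_index = index
    return best_index


print(simplified_lesk("pine", "she began to pine away with sorrow", PINE_SENSES))  # 1
print(simplified_lesk("pine", "the pine fell over", PINE_SENSES))                  # 0 (back-off)
```

The first sentence shares "away" and "sorrow" with the second gloss, so sense index 1 is chosen; the second sentence shares nothing with either gloss, so the default sense is returned.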
A lot of work has appeared offering different modifications of this algorithm. These works use other resources for analysis (thesauruses, synonym dictionaries, or morphological and syntactic models): for instance, they may use such information as synonyms, different derivatives, or words from definitions of words from definitions. == Lesk variants == Original Lesk (Lesk, 1986). Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003): In the adapted Lesk algorithm, a word vector is created for every content word in the WordNet gloss. Concatenating glosses of related concepts in WordNet can be used to augment this vector. The vector contains the co-occurrence counts of words co-occurring with w in a large corpus. Adding all the word vectors for all the content words in its gloss creates the gloss vector g for a concept. Relatedness is determined by comparing the gloss vectors using the cosine similarity measure. There are a lot of studies concerning Lesk and its extensions: Wilks and Stevenson, 1998, 1999; Mahesh et al., 1997; Cowie et al., 1992; Yarowsky, 1992; Pook and Catlett, 1988; Kilgarriff and Rosensweig, 2000; Kwong, 2001; Nastase and Szpakowicz, 2001; Gelbukh and Sidorov, 2004.
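The gloss-vector idea described under the adapted variant can be sketched in Python as follows. The co-occurrence counts are invented toy numbers, not corpus statistics, and the function names are hypothetical; the sketch only shows how word vectors are summed into a gloss vector and compared with cosine similarity.

```python
import math
from collections import Counter

# Toy co-occurrence counts (hypothetical, for illustration only):
# each content word maps to counts of words it co-occurs with.
WORD_VECTORS = {
    "tree": {"forest": 3, "leaf": 2},
    "evergreen": {"forest": 1, "winter": 2},
    "sorrow": {"grief": 4},
}


def gloss_vector(gloss_words, word_vectors=WORD_VECTORS):
    """Sum the co-occurrence vectors of all content words in a gloss."""
    total = Counter()
    for word in gloss_words:
        total.update(word_vectors.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


g_evergreen_tree = gloss_vector(["evergreen", "tree"])
g_tree = gloss_vector(["tree"])
g_sorrow = gloss_vector(["sorrow"])

print(round(cosine(g_evergreen_tree, g_tree), 3))  # high: the glosses share co-occurrence context
print(cosine(g_evergreen_tree, g_sorrow))          # 0.0: no shared dimensions
```

Because the comparison runs over co-occurrence dimensions rather than the gloss words themselves, two glosses can be judged related even when they have no word in common.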

    Read more →
  • Ian Goodfellow

    Ian Goodfellow

    Ian J. Goodfellow (born 1987) is an American computer scientist, engineer, and executive, most noted for his work on artificial neural networks and deep learning. He is a research scientist at Google DeepMind, was previously employed as a research scientist at Google Brain and director of machine learning at Apple as well as one of the first employees at OpenAI, and has made several important contributions to the field of deep learning, including the invention of the generative adversarial network (GAN). Goodfellow co-wrote, as the first author, the textbook Deep Learning (2016) and wrote the chapter on deep learning in the authoritative textbook of the field of artificial intelligence, Artificial Intelligence: A Modern Approach (used in more than 1,500 universities in 135 countries). == Education == Goodfellow obtained his BSc and MSc in computer science from Stanford University under the supervision of Andrew Ng, and his PhD in machine learning from the Université de Montréal in February 2015, under the supervision of Yoshua Bengio and Aaron Courville. Goodfellow's thesis is titled Deep learning of representations and its application to computer vision. == Career == After graduation, Goodfellow joined Google as part of the Google Brain research team. In March 2016, he left Google to join the newly founded OpenAI research laboratory. 11 months later, in March 2017, Goodfellow returned to Google Research, but left again in 2019. In 2019, Goodfellow joined Apple as director of machine learning in the Special Projects Group. He resigned from Apple in April 2022 to protest Apple's plan to require in-person work for its employees. Shortly after, Goodfellow then joined Google DeepMind as a research scientist. In 2025, Goodfellow left Google. As of July 2026, based on information on Goodfellow's LinkedIn profile, he is co-founding a startup company. 
== Research == Goodfellow is best known for inventing generative adversarial networks (GANs), using deep learning to generate images. This approach uses two neural networks to competitively improve an image's quality. A “generator” network creates a synthetic image based on an initial set of images such as a collection of faces. A “discriminator” network tries to determine whether images are authentic or created by the generator. The generate-detect cycle is repeated. For each iteration, the generator and the discriminator use the other's feedback to improve or detect the generated images, until the discriminator can no longer distinguish between generated and authentic images. However, GANs have also been used to create deepfakes. At Google, Goodfellow developed a system enabling Google Maps to automatically transcribe addresses from photos taken by Street View cars and demonstrated security vulnerabilities of machine learning systems. == Recognition == In 2017, Goodfellow was cited in MIT Technology Review's 35 Innovators Under 35. In 2019, he was included in Foreign Policy's list of 100 Global Thinkers.

    Read more →
  • Freddy II

    Freddy II

    Freddy (1969–1971) and Freddy II (1973–1976) were experimental robots built in the Department of Machine Intelligence and Perception (later Department of Artificial Intelligence, now part of the School of Informatics at the University of Edinburgh). == Technology == Technical innovations involving Freddy were at the forefront of 1970s robotics. Freddy was one of the earliest robots to integrate vision, manipulation and intelligent systems, and it was versatile and easy to retrain and reprogram for new tasks. The idea of moving the table instead of the arm simplified the construction. Freddy also recognised parts visually by using graph matching on the detected features. The system used an innovative collection of high level procedures for programming the arm movements which could be reused for each new task. == Lighthill controversy == In the mid 1970s there was controversy about the utility of pursuing a general purpose robotics programme in both the USA and the UK. A BBC TV programme in 1973, referred to as the "Lighthill Debate", pitted James Lighthill, who had written a critical report for the science and engineering research funding agencies in the UK, against Donald Michie from the University of Edinburgh and John McCarthy from Stanford University. The Edinburgh Freddy II and Stanford/SRI Shakey robots were used to illustrate the state-of-the-art at the time in intelligent robotics systems. == Freddy I and II == Freddy Mark I (1969–1971) was an experimental prototype, with 3 degrees-of-freedom created by a rotating platform driven by a pair of independent wheels. The other main components were a video camera and bump sensors connected to a computer. The computer moved the platform so that the camera could see and then recognise the objects. 
Freddy II (1973–1976) was a 5 degrees of freedom manipulator with a large vertical 'hand' that could move up and down, rotate about the vertical axis and rotate objects held in its gripper around one horizontal axis. Two remaining translational degrees of freedom were generated by a work surface that moved beneath the gripper. The gripper was a two finger pinch gripper. A video camera was added as well as later a light stripe generator. The Freddy and Freddy II projects were initiated and overseen by Donald Michie. The mechanical hardware and analogue electronics were designed and built by Stephen Salter (who also pioneered renewable energy from waves (see Salter's Duck)), and the digital electronics and computer interfacing were designed by Harry Barrow and Gregan Crawford. The software was developed by a team led by Rod Burstall, Robin Popplestone and Harry Barrow which used the POP-2 programming language, one of the world's first functional programming languages. The computing hardware was an Elliot 4130 computer with 384KB (128K 24-bit words) RAM and a hard disk linked to a small Honeywell H316 computer with 16KB of RAM which directly performed sensing and control. Freddy was a versatile system which could be trained and reprogrammed to perform a new task in a day or two. The tasks included putting rings on pegs and assembling simple model toys consisting of wooden blocks of different shapes, a boat with a mast and a car with axles and wheels. Information about part locations was obtained using the video camera, and then matched to previously stored models of the parts. It was soon realised in the Freddy project that the 'move here, do this, move there' style of robot behavior programming (actuator or joint level programming) is tedious and also did not allow for the robot to cope with variations in part position, part shape and sensor noise. 
Consequently, the RAPT robot programming language was developed by Pat Ambler and Robin Popplestone, in which robot behavior was specified at the object level. This meant that robot goals were specified in terms of desired position relationships between the robot, objects and the scene, leaving the details of how to achieve the goals to the underlying software system. Although developed in the 1970s RAPT is still considerably more advanced than most commercial robot programming languages. The team of people who contributed to the project were leaders in the field at the time and included Pat Ambler, Harry Barrow, Ilona Bellos, Chris Brown, Rod Burstall, Gregan Crawford, Jim Howe, Donald Michie, Robin Popplestone, Stephen Salter, Austin Tate and Ken Turner. Also of interest in the project was the use of a structured-light 3D scanner to obtain the 3D shape and position of the parts being manipulated. The Freddy II robot is currently on display at the Royal Museum in Edinburgh, Scotland, with a segment of the assembly video shown in a continuous loop.

    Read more →
  • ChipTest

    ChipTest

    ChipTest was a 1985 chess playing computer built by Feng-hsiung Hsu, Thomas Anantharaman and Murray Campbell at Carnegie Mellon University. It is the predecessor of Deep Thought which in turn evolved into Deep Blue. == History == ChipTest was based on a special VLSI-technology move generator chip developed by Hsu. ChipTest was controlled by a Sun-3/160 workstation and capable of searching approximately 50,000 moves per second. Hsu and Anantharaman entered ChipTest in the 1986 North American Computer Chess Championship, and it was only partially tested when the tournament began. It lost its first two rounds, but finished with an even score. In August 1987, ChipTest was overhauled and renamed ChipTest-M, M standing for microcode. The new version had eliminated ChipTest's bugs and was ten times faster, searching 500,000 moves per second and running on a Sun-4 workstation. ChipTest-M won the North American Computer Chess Championship in 1987 with a 4–0 sweep. ChipTest was invited to play in the 1987 American Open, but the team did not enter due to an objection by the HiTech team, also from Carnegie Mellon University. HiTech and ChipTest shared some code, and Hitech was already playing in the tournament. The two teams became rivals. Designing and implementing ChipTest revealed many possibilities for improvement, so the designers started on a new machine. Deep Thought 0.01 was created in May 1988 and the version 0.02 in November the same year. This new version had two customized VLSI chess processors and it was able to search 720,000 moves per second. With the "0.02" dropped from its name, Deep Thought won the World Computer Chess Championship with a perfect 5–0 score in 1989.

    Read more →
  • Pattern playback

    Pattern playback

    The pattern playback is an early talking device that was built by Dr. Franklin S. Cooper and his colleagues, including John M. Borst and Caryl Haskins, at Haskins Laboratories in the late 1940s and completed in 1950. There were several different versions of this hardware device. Only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman, Frank Cooper, and Pierre Delattre (later joined by Katherine Safford Harris, Leigh Lisker, and others) were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels). This research was fundamental to the development of modern techniques of speech synthesis, reading machines for the blind, the study of speech perception and speech recognition, and the development of the motor theory of speech perception. To create sound, the pattern playback machine uses an arc light source which is directed against a rotating disk with 50 concentric tracks whose transparencies vary systematically in order to produce 50 harmonics of a fundamental frequency. The light is further projected against a spectrogram, whose reflectance corresponds to the sound pressure level of the partial of the signal, and is then directed towards a photovoltaic cell by which the light variation is converted into sound pressure variations. The pattern playback was last used in an experimental study by Robert Remez in 1976. The pattern playback now resides in the Museum at Haskins Laboratories in New Haven, Connecticut. The technique of pattern playback also now refers, more generally, to algorithms or techniques for converting spectrograms, cochleagrams, and correlograms from pictures back into sounds. A demonstration is in the TV show Adventure. Pioneering technology in psycholinguistics (CBS Television. 1953). 
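A digital analogue of this optical process is additive synthesis: each spectrogram frame supplies an amplitude for every harmonic, and the output is the sum of the weighted sinusoids. The Python sketch below illustrates that idea only. The 120 Hz fundamental, the sample rate, the frame length and the function name are all assumptions chosen for the example, not properties of the original machine taken from the text.

```python
import math


def synthesize(spectrogram, fundamental_hz=120.0, sample_rate=16000,
               frame_seconds=0.01):
    """Turn a harmonic spectrogram into audio samples by additive synthesis.

    spectrogram: list of frames; each frame is a list of amplitudes in [0, 1],
    where index k holds the amplitude of harmonic k + 1 (up to 50 harmonics).
    The default fundamental and timing values are illustrative assumptions.
    """
    samples_per_frame = round(sample_rate * frame_seconds)
    samples = []
    n = 0  # running sample index so phase stays continuous across frames
    for frame in spectrogram:
        for _ in range(samples_per_frame):
            t = n / sample_rate
            value = sum(
                amplitude * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
                for k, amplitude in enumerate(frame)
            )
            samples.append(value)
            n += 1
    return samples


# One silent frame followed by one frame containing only the fundamental.
silent = [0.0] * 50
fundamental_only = [1.0] + [0.0] * 49
audio = synthesize([silent, fundamental_only])
print(len(audio))  # 320 samples: two frames of 160 samples each
```

With 50 harmonics of an assumed 120 Hz fundamental, the highest partial is 6,000 Hz, so the sketch uses a 16,000 Hz sample rate to keep every harmonic below the Nyquist frequency.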
== Digital pattern playback == In the 1970s, digital pattern playbacks began to supplant the earlier version. An early prototype was developed by Patrick Nye, Philip Rubin, and colleagues at Haskins Laboratories. It combined a "Ubiquitous Spectrum Analyzer"[1] for automatic spectral analysis, along with a VAX GT-40 display processor for graphic manipulation of the displayed spectrogram, a form of "synthesis by art", and subsequent re-synthesis using a 40 channel filter bank. This hybrid hardware/software digital pattern playback was eventually replaced at Haskins Laboratories by the HADES analysis and display system, designed by Philip Rubin, and implemented in Fortran on the VAX family of computers. A more modern version has been described by Arai and colleagues [2]. An on-line demonstration is available [3].

    Read more →
  • Vilém Flusser

    Vilém Flusser

    Vilém Flusser (May 12, 1920 – November 27, 1991) was a Czech-born Brazilian philosopher, writer and journalist, best known for his contributions to media studies, communication theory, and the philosophy of language. He lived for a long period in São Paulo (where he became a Brazilian citizen) and later in France, and his works are written in many different languages. His early work was marked by discussion of the thought of Martin Heidegger, and by the influence of existentialism and phenomenology. Phenomenology would play a major role in the transition to the later phase of his work, in which he turned his attention to the philosophy of communication and of artistic production. He contributed to the dichotomy logic theory through history: the period of image worship, and period of text worship, with deviations consequently into idolatry and "textolatry". == Life == Flusser was born in 1920 in Prague, Czechoslovakia into a family of Jewish intellectuals. His father, Gustav Flusser, studied mathematics and physics (under Albert Einstein among others). Vilém attended German and Czech primary schools and later a German grammar school. In 1938, Flusser started to study philosophy at the Juridical Faculty of the Charles University in Prague. In 1939, shortly after the Nazi occupation, Flusser emigrated to London (with Edith Barth, his later wife, and her parents) to continue his studies for one term at the London School of Economics and Political Science. Vilém Flusser lost all of his family in the German concentration camps: his father died in Buchenwald in 1940; his grandparents, his mother and his sister were brought to Theresienstadt and later to Auschwitz where they were killed. The next year, he emigrated to Brazil, living both in São Paulo and Rio de Janeiro. He started working at a Czech import/export company and then at Stabivolt, a manufacturer of radios and transistors. 
In 1960 he started to collaborate with the Brazilian Institute of Philosophy (IBF) in São Paulo and published in the Revista Brasileira de Filosofia; by these means he seriously approached the Brazilian intellectual community. Flusser had as his friend and closest interlocutor the Brazilian philosopher Vicente Ferreira da Silva. Flusser and Vicente Ferreira da Silva met in São Paulo in the 1960s and began a close intellectual dialogue that continued until Ferreira da Silva's death in 1963. Flusser wrote several essays on Ferreira da Silva's work, and Ferreira da Silva's concept of "fundamental ontology" had a significant impact on Flusser's understanding of the nature of reality. During the 1960s Flusser published and taught at several schools in São Paulo, being Lecturer for Philosophy of Science at the Escola Politécnica of the University of São Paulo and Professor of Philosophy of Communication at the Escola Dramática and the Escola Superior de Cinema in São Paulo. He also participated actively in the arts, collaborating with the Bienal de São Paulo, among other cultural events. Beginning in the 1950s he taught philosophy and worked as a journalist, before publishing his first book Língua e realidade (Language and Reality) in 1963. In 1972 he decided to leave Brazil. Some say it was because the military regime was making it difficult to publish. Others dispute this reason, since his work on communication and language did not threaten the military. In 1970, when the Brazilian military government imposed a reform on the University of São Paulo, all Lecturers of Philosophy (members of the Department of Philosophy) were dismissed. Flusser, who taught at the Engineering School (Escola Politécnica), had to leave the university as well. In 1972 he and his wife Edith settled temporarily in Merano (Tyrol). 
Further short stays in various European countries followed until they moved to Robion in southern France in 1981, where they remained until Flusser's death in 1991. To the end of his life, he was quite active writing and giving lectures around media theory and working with new topics (Philosophy of Photography, Technical Images, etc.). He died in 1991 in a car accident near the Czech–German border, while trying to visit his native city, Prague, to give a lecture. Vilém Flusser is the cousin of David Flusser. == Philosophy == Flusser's essays are short, provocative and lucid, with a resemblance to the style of journalistic articles. Critics have noted he is less a 'systematic' thinker than a 'dialogic' one, purposefully eclectic and provocative (Cubitt 2004). However, his early books, written in the 1960s, primarily in Portuguese, and published in Brazil, have a slightly different style. Flusser's writings relate to each other, however, which means that he intensively works over certain topics and dissects them into a number of brief essays. His main topics of interest were: epistemology, ethics, aesthetics, ontology, language philosophy, semiotics, philosophy of science, the history of Western culture, the philosophy of religion, the history of symbolic language, technology, writing, the technical image, photography, migration, media and literature, and, especially in his later years, the philosophy of communication and of artistic production. His writings reflect his wandering life: although the majority of his work was written in German and Portuguese, he also wrote in English and French, with scarce translation to other languages. Because Flusser's writings in different languages are dispersed in the form of books, articles or sections of books, his work as a media philosopher and cultural theorist is only now becoming more widely known. 
The first book by Flusser to be published in English was Towards a Philosophy of Photography in 1984 by the then new journal European Photography, which was his own translation of the work. The Shape of Things, was published in London in 1999 and was followed by a new translation of Towards a Philosophy of Photography. Flusser's archives have been held by the Academy of Media Arts in Cologne and are currently housed at the Berlin University of the Arts. === Philosophy of photography === Writing about photography in the 1970s and 80s, in the face of the early worldwide impact of computer technologies, Flusser argued that the photograph was the first in a number of technical image forms to have fundamentally changed the way in which the world is seen. Historically, the importance of photography had been that it introduced nothing less than a new epoch: 'The invention of photography constitutes a break in history that can only be understood in comparison to that other historical break constituted by the invention of linear writing.' Whereas ideas might previously have been interpreted in terms of their written form, photography heralded new forms of perceptual experience and knowledge. As Flusser Archive Supervisor Claudia Becker describes, "For Flusser, photography is not only a reproductive imaging technology, it is a dominant cultural technique through which reality is constituted and understood". In this context, Flusser argued that photographs have to be understood in strict separation from 'pre-technical image forms'. For example, he contrasted them to paintings which he described as images that can be sensibly 'decoded', because the viewer is able to interpret what he or she sees as more or less direct signs of what the painter intended. By contrast, even though photography produces images that seem to be 'faithful reproductions' of objects and events they cannot be so directly 'decoded'. 
The crux of this difference stems, for Flusser, from the fact that photographs are produced through the operations of an apparatus. And the photographic apparatus operates in ways that are not immediately known or shaped by its operator. For example, he described the act of photographing as follows: The photographer's gesture as the search for a viewpoint onto a scene takes place within the possibilities offered by the apparatus. The photographer moves within specific categories of space and time regarding the scene: proximity and distance, bird- and worm's-eye views, frontal- and side-views, short or long exposures, etc. The Gestalt of space–time surrounding the scene is prefigured for the photographer by the categories of his camera. These categories are an a priori for him. He must 'decide' within them: he must press the trigger. Roughly put, the person using a camera might think that they are operating its controls to produce a picture that shows the world the way they want it to be seen, but it is the pre-programmed character of the camera that sets the parameters of this act and it is the apparatus that shapes the meaning of the resulting image. Given the central role of photography to almost all aspects of contemporary life, the programmed character of the photographic apparatus shapes the experience of looking at and interpreting photographs as well as most of the cultural contexts in which we do so. Flusse

    Read more →
  • Journal of Experimental and Theoretical Artificial Intelligence

    Journal of Experimental and Theoretical Artificial Intelligence

    The Journal of Experimental and Theoretical Artificial Intelligence is a quarterly peer-reviewed scientific journal published by Taylor and Francis. It covers all aspects of artificial intelligence and was established in 1989. The editor-in-chief is Eric Dietrich (Binghamton University), and the deputy editors-in-chief are Li Pheng Khoo (School of Mechanical & Aerospace Engineering, Nanyang Technological University) and Antonio Lieto (Department of Computer Science, University of Turin). == Abstracting and indexing == The journal is abstracted and indexed in: According to the Journal Citation Reports, the journal has a 2020/2021 impact factor of 2.340.

    Read more →