AI Image Generators

Explore the best AI Image Generators — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Recruitee

    Recruitee

    Tellent Recruitee is a cloud-based applicant tracking system (ATS) for talent acquisition, owned by Tellent. Internal HR teams use it for job postings, candidate sourcing, reporting, and applicant tracking.

    == History ==

    Perry Oostdam and Pawel Smoczyk founded Recruitee after working on a mobile gaming startup. Recruitee was launched in August 2015. In September 2015, it received a seed funding round with participation from investors Robert Pijselman and Luc Brandts.

    === Merger ===

    In February 2021, Recruitee and the Finnish HR software provider Sympa merged their operations, backed by the growth equity firm Providence Strategic Growth (PSG).

    === Acquisition ===

    In 2022, the group acquired the French company Javelo and the German company kiwiHR. The parent company was subsequently renamed Tellent, and Recruitee was renamed Tellent Recruitee; it continues to operate as a product unit within the Tellent group.

    == Platform ==

    Tellent Recruitee is customizable recruitment software. It functions as an ATS and talent acquisition platform and includes tools to create and publish job listings, source candidates, manage recruitment agencies, and track applicants through customizable pipelines. The interface allows drag-and-drop organization of candidates. The platform also includes team collaboration features such as shared notes, task assignments, and candidate evaluations, along with integrated scheduling tools and automated email communication. Tellent Recruitee provides analytics and reports on hiring and career site metrics, and it allows customization of career site pages and application forms. It integrates with other HR and productivity software, such as WhatsApp, and has various AI features that assist with manual recruitment tasks.

  • KL-ONE

    KL-ONE

    KL-ONE (pronounced "kay ell won") is a knowledge representation system in the tradition of semantic networks and frames; that is, it is a frame language. The system is an attempt to overcome semantic indistinctness in semantic network representations and to explicitly represent conceptual information as a structured inheritance network.

    == Overview ==

    There is a whole family of KL-ONE-like systems. One of the innovations that KL-ONE introduced was the deductive classifier, an automated reasoning engine that can validate a frame ontology and deduce new information about the ontology from the initial information provided by a domain expert.

    Frames in KL-ONE are called concepts. They form hierarchies through subsumption relations; in KL-ONE terminology, a superclass is said to subsume its subclasses. Multiple inheritance is allowed. In fact, a concept is said to be well-formed only if it inherits from more than one other concept. All concepts except the top concept (usually THING) must have at least one superclass.

    KL-ONE separates descriptions into two basic classes of concepts: primitive and defined. Primitives are domain concepts that are not fully defined: knowing all the properties of a primitive concept is not sufficient to classify it. They may also be viewed as incomplete definitions. By the same view, defined concepts are complete definitions: the properties of a defined concept are necessary and sufficient conditions for classifying it.

    Slots are called roles, and the values of roles are called role-fillers. Several types of roles are available for different situations. The most common and important role type is the generic RoleSet, which captures the fact that a role may be filled by more than one filler.
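The classification of defined concepts described above can be illustrated with a small sketch. It is not KL-ONE's actual classifier: it assumes each defined concept is just a set of necessary and sufficient conditions, ignores primitive concepts and roles, and uses hypothetical concept names (PERSON, PARENT, WOMAN, MOTHER) chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    conditions: frozenset  # assumed: necessary and sufficient conditions


def subsumes(general: Concept, specific: Concept) -> bool:
    """A concept subsumes another if all of its conditions also hold for the other."""
    return general.conditions <= specific.conditions


def most_specific_subsumers(new: Concept, known: list) -> list:
    """Place a new concept under the most specific known concepts that subsume it."""
    parents = [c for c in known if c.name != new.name and subsumes(c, new)]
    # Discard a parent when it strictly subsumes another candidate parent.
    return sorted(
        p.name
        for p in parents
        if not any(subsumes(p, q) and not subsumes(q, p) for q in parents)
    )


# Hypothetical toy ontology; THING is the top concept with no conditions.
THING = Concept("THING", frozenset())
PERSON = Concept("PERSON", frozenset({"animate", "human"}))
PARENT = Concept("PARENT", frozenset({"animate", "human", "has-child"}))
WOMAN = Concept("WOMAN", frozenset({"animate", "human", "female"}))
MOTHER = Concept("MOTHER", frozenset({"animate", "human", "female", "has-child"}))

print(most_specific_subsumers(MOTHER, [THING, PERSON, PARENT, WOMAN]))
```

In this toy ontology MOTHER ends up with two direct parents, which is the multiple-inheritance situation the text mentions.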

  • VoID

    VoID

    The Vocabulary of Interlinked Datasets (VoID) is a vocabulary for providing concise summaries (metadata) of Resource Description Framework (RDF) datasets, that is, meaningful collections of semantic triples, using the syntax of RDF Schema. It can be used for general metadata (such as information about the license of the dataset), access metadata (information about how to access the dataset), structural metadata (information about how the dataset is structured), and linking metadata (information about links between datasets). A linked dataset is a collection of data, published and maintained by a single provider, available as RDF on the Web, where at least some of the resources in the dataset are identified by dereferenceable Uniform Resource Identifiers (URIs). VoID is used to provide metadata on RDF datasets to facilitate query processing on a graph of interlinked datasets in the Semantic Web.
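As a rough illustration of the four kinds of metadata listed above, the following sketch assembles a small VoID description in Turtle syntax using plain string formatting. The `void:` and `dcterms:` terms are taken from the published vocabularies; every `example.org` URI, the dataset title, and the triple count are made-up placeholders, and the helper names `describe` and `void_document` are hypothetical.

```python
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def describe(subject, pairs):
    """Render one Turtle subject block from (predicate, object) pairs."""
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return f"<{subject}>\n{body} .\n"


def void_document():
    header = "".join(f"@prefix {k}: <{v}> .\n" for k, v in PREFIXES.items())
    dataset = describe("http://example.org/void#people", [
        ("a", "void:Dataset"),
        # General metadata (placeholder title and license).
        ("dcterms:title", '"Example People Dataset"'),
        ("dcterms:license", "<http://creativecommons.org/publicdomain/zero/1.0/>"),
        # Access metadata (placeholder endpoint).
        ("void:sparqlEndpoint", "<http://example.org/sparql>"),
        # Structural metadata (placeholder size).
        ("void:triples", "120000"),
        ("void:vocabulary", "<http://xmlns.com/foaf/0.1/>"),
    ])
    # Linking metadata: a linkset connecting two datasets.
    linkset = describe("http://example.org/void#people-to-places", [
        ("a", "void:Linkset"),
        ("void:subjectsTarget", "<http://example.org/void#people>"),
        ("void:objectsTarget", "<http://example.org/void#places>"),
        ("void:linkPredicate", "foaf:based_near"),
    ])
    return header + "\n" + dataset + "\n" + linkset


print(void_document())
```

A dedicated RDF library would normally be used to build and validate such a description; string formatting is used here only to keep the sketch free of dependencies.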

  • LightGBM

    LightGBM

    LightGBM, short for Light Gradient-Boosting Machine, is a free and open-source distributed gradient-boosting framework for machine learning, originally developed by Microsoft. It is based on decision tree algorithms and is used for ranking, classification, and other machine learning tasks. The development focus is on performance and scalability.

    == Overview ==

    The LightGBM framework supports different algorithms including GBT, GBDT, GBRT, GBM, MART and RF. LightGBM has many of XGBoost's advantages, including sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping. A major difference between the two lies in the construction of trees. LightGBM does not grow a tree level-wise (row by row) as most other implementations do. Instead, it grows trees leaf-wise, choosing the leaf with the maximum delta loss to grow. In addition, LightGBM does not use the widely used sort-based decision tree learning algorithm, which searches for the best split point on sorted feature values, as XGBoost and other implementations do. Instead, LightGBM implements a highly optimized histogram-based decision tree learning algorithm, which yields large advantages in both efficiency and memory consumption. The LightGBM algorithm uses two novel techniques, Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which allow it to run faster while maintaining a high level of accuracy.

    LightGBM works on Linux, Windows, and macOS and supports C++, Python, R, and C#. The source code is licensed under the MIT License and is available on GitHub.

    == Gradient-based one-side sampling ==

    In gradient descent, one can think of the space of possible model configurations as a valley, in which the lowest point is the model that most closely fits the data. In this metaphor, one walks in different directions to learn how much lower the valley becomes.

    Typically, gradient descent uses the whole data set to calculate the valley's slopes. However, this commonly used method assumes that every data point is equally informative. By contrast, Gradient-Based One-Side Sampling (GOSS), a method first developed for gradient-boosted decision trees, does not rely on that assumption. Instead, it keeps the data points with large gradients and treats those with smaller gradients (shallower slopes) as less informative, randomly dropping a portion of them. This is intended to filter out data that may have been influenced by noise, allowing the model to capture the underlying relationships in the data more accurately.

    == Exclusive feature bundling ==

    Exclusive feature bundling (EFB) is a near-lossless method for reducing the number of effective features. In a sparse feature space many features are nearly exclusive, meaning they rarely take nonzero values simultaneously. One-hot encoded features are a perfect example of exclusive features. EFB bundles these features, reducing dimensionality to improve efficiency while maintaining a high level of accuracy. A set of exclusive features merged into a single feature is called an exclusive feature bundle.
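The two techniques described above can be sketched in a few lines of plain Python. This is an illustrative simplification, not LightGBM's implementation: the function names, the parameters `top_rate` and `other_rate`, and the integer-bin representation of features are assumptions made for the example. The sampling sketch keeps the largest-gradient points, samples from the rest, and up-weights the sampled points by (1 - top_rate) / other_rate so that their total contribution stays roughly comparable.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (index, weight) pairs chosen by one-side sampling."""
    rng = random.Random(seed)
    n = len(gradients)
    # Rank data points by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(round(top_rate * n))
    n_other = int(round(other_rate * n))
    top, rest = order[:n_top], order[n_top:]
    # Keep every large-gradient point; sample only among the small-gradient ones.
    sampled = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


def bundle_exclusive(columns, bin_counts):
    """Merge (nearly) exclusive integer-binned columns into one column.

    A value of 0 means "no value". Each feature's nonzero bins are shifted by
    the number of bins used by the features before it, so ranges do not overlap.
    """
    offsets, total = [], 0
    for count in bin_counts:
        offsets.append(total)
        total += count
    bundled = []
    for row in zip(*columns):
        value = 0
        for v, offset in zip(row, offsets):
            if v != 0:
                value = v + offset  # on a rare conflict, the last feature wins
        bundled.append(value)
    return bundled


grads = [0.1 * i for i in range(100)]
print(len(goss_sample(grads)))
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # three exclusive columns
print(bundle_exclusive(one_hot, bin_counts=[1, 1, 1]))
```

The bundling sketch silently overwrites a value when two features in a bundle are nonzero in the same row, which is why the method is described as near-lossless rather than lossless.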

  • Neural style transfer

    Neural style transfer

    Neural style transfer (NST) software algorithms manipulate digital images or videos so that they adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for image transformation. A common use of NST is the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. Artists and designers around the globe have used this method to develop new artwork based on existing styles.

    == History ==

    NST is an example of image stylization, a problem studied for over two decades within the field of non-photorealistic rendering. The first two example-based style transfer algorithms were image analogies and image quilting. Both methods were based on patch-based texture synthesis algorithms. Given a training pair of images (a photo and an artwork depicting that photo), a transformation could be learned and then applied to create new artwork from a new photo, by analogy. If no training photo was available, it would need to be produced by processing the input artwork; image quilting did not require this processing step, though it was demonstrated on only one style.

    NST was first published in the paper "A Neural Algorithm of Artistic Style" by Leon Gatys et al., originally released on arXiv in 2015 and subsequently accepted by the peer-reviewed CVPR conference in 2016. The original paper used a VGG-19 architecture that had been pre-trained to perform object recognition using the ImageNet dataset. In 2017, Google AI introduced a method that allows a single deep convolutional style transfer network to learn multiple styles at the same time. This algorithm permits style interpolation in real time, even on video.

    == Mathematics ==

    This section closely follows the original paper.
    === Overview ===

    The idea of neural style transfer (NST) is to take two images, a content image $\vec{p}$ and a style image $\vec{a}$, and generate a third image $\vec{x}$ that minimizes a weighted combination of two loss functions: a content loss $\mathcal{L}_{\text{content}}(\vec{p}, \vec{x})$ and a style loss $\mathcal{L}_{\text{style}}(\vec{a}, \vec{x})$. The total loss is a linear sum of the two:

    $$\mathcal{L}_{\text{NST}}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})$$

    By jointly minimizing the content and style losses, NST generates an image that blends the content of the content image with the style of the style image. Both the content loss and the style loss measure the similarity of two images. The content similarity is a weighted sum of squared differences between the neural activations of a single convolutional neural network (CNN) on the two images. The style similarity is a weighted sum of squared differences between the Gram matrices of the two images at each layer (see below for details). The original paper used a VGG-19 CNN, but the method works for any CNN.

    === Symbols ===

    Let $\vec{x}$ be an image input to a CNN. Let $F^{l} \in \mathbb{R}^{N_l \times M_l}$ be the matrix of filter responses in layer $l$ to the image $\vec{x}$, where:

      • $N_l$ is the number of filters in layer $l$;
      • $M_l$ is the height times the width (i.e. the number of pixels) of each feature map in layer $l$;
      • $F_{ij}^{l}(\vec{x})$ is the activation of the $i^{\text{th}}$ filter at position $j$ in layer $l$.

    A given input image $\vec{x}$ is encoded in each layer of the CNN by the filter responses to that image, with higher layers encoding more global features but losing details of local features.

    === Content loss ===

    Let $\vec{p}$ be an original image. Let $\vec{x}$ be an image that is generated to match the content of $\vec{p}$. Let $P^{l}$ be the matrix of filter responses in layer $l$ to the image $\vec{p}$. The content loss is defined as the squared-error loss between the feature representations of the generated image and the content image at a chosen layer $l$ of a CNN:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( A_{ij}^{l}(\vec{x}) - A_{ij}^{l}(\vec{p}) \right)^{2}$$

    where $A_{ij}^{l}(\vec{x})$ and $A_{ij}^{l}(\vec{p})$ are the activations of the $i^{\text{th}}$ filter at position $j$ in layer $l$ for the generated and content images, respectively (the same quantities as the entries of $F^{l}$ and $P^{l}$ defined above). Minimizing this loss encourages the generated image to have similar content to the content image, as captured by the feature activations in the chosen layer.

    The total content loss is a linear sum of the content losses of each layer:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) = \sum_{l} v_l \, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l)$$

    where the $v_l$ are positive real numbers chosen as hyperparameters.

    === Style loss ===

    The style loss is based on the Gram matrices of the generated and style images, which capture the correlations between different filter responses at different layers of the CNN:

    $$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l=0}^{L} w_l E_l$$

    where

    $$E_l = \frac{1}{4 N_l^{2} M_l^{2}} \sum_{i,j} \left( G_{ij}^{l}(\vec{x}) - G_{ij}^{l}(\vec{a}) \right)^{2}$$

    Here, $G_{ij}^{l}(\vec{x})$ and $G_{ij}^{l}(\vec{a})$ are the entries of the Gram matrices for the generated and style images at layer $l$. Explicitly,

    $$G_{ij}^{l}(\vec{x}) = \sum_{k} F_{ik}^{l}(\vec{x}) F_{jk}^{l}(\vec{x})$$

    Minimizing this loss encourages the generated image to have similar style characteristics to the style image, as captured by the correlations between feature responses in each layer. The idea is that correlations between the activation patterns of filters in a single layer capture the "style" on the order of the receptive fields at that layer. As in the previous case, the $w_l$ are positive real numbers chosen as hyperparameters.

    === Hyperparameters ===

    The original paper used a particular choice of hyperparameters. The style loss uses $w_l = 0.2$ for the outputs of layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 in the VGG-19 network, and zero otherwise. The content loss uses $v_l = 1$ for conv4_2, and zero otherwise. The ratio $\alpha / \beta$ lies in the range $[5, 50] \times 10^{-4}$.

    === Training ===

    Image $\vec{x}$ is initialized by adding a small amount of white noise to the input image $\vec{p}$ and feeding it through the CNN. The loss is then successively backpropagated through the network, with the CNN weights fixed, in order to update the pixels of $\vec{x}$. After several thousand epochs of training, an $\vec{x}$ (hopefully) emerges that matches the style of $\vec{a}$ and the content of $\vec{p}$. As of 2017, when implemented on a GPU, it takes a few minutes to converge.

    == Extensions ==

    Some practical implementations note that the resulting image contains too many high-frequency artifacts, which can be suppressed by adding a total variation term to the total loss. Compared to VGGNet, AlexNet does not work well for neural style transfer. NST has also been extended to videos. Subsequent work improved the speed of NST for images by using special-purpose normalizations. A paper by Fei-Fei Li et al. adopted a different regularized loss metric and an accelerated training method to produce results in real time (three orders of magnitude faster than Gatys). Their idea was to use not the pixel-based loss defined above but rather a 'perceptual loss' measuring t
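The content loss, Gram matrix, per-layer style term, and weighted total loss from the Mathematics section can be sketched directly from their definitions. This is a minimal illustration, assuming each layer's filter responses are already available as an N x M nested list; it does not run a CNN or perform the pixel updates, and a practical implementation would use a tensor library with automatic differentiation. The function names and the toy matrices are hypothetical.

```python
def gram(F):
    """Gram matrix G[i][j] = sum_k F[i][k] * F[j][k] of an N x M response matrix."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]


def content_layer_loss(F_x, F_p):
    """Half the sum of squared differences between two response matrices."""
    return 0.5 * sum(
        (x - p) ** 2 for row_x, row_p in zip(F_x, F_p) for x, p in zip(row_x, row_p)
    )


def style_layer_loss(F_x, F_a):
    """E_l: squared Gram-matrix difference scaled by 1 / (4 * N^2 * M^2)."""
    n, m = len(F_x), len(F_x[0])
    G_x, G_a = gram(F_x), gram(F_a)
    squared = sum(
        (gx - ga) ** 2 for row_x, row_a in zip(G_x, G_a) for gx, ga in zip(row_x, row_a)
    )
    return squared / (4 * n ** 2 * m ** 2)


def nst_loss(content_terms, style_terms, alpha, beta):
    """Total loss; each term is (layer weight, responses to x, responses to p or a)."""
    content = sum(v * content_layer_loss(F_x, F_p) for v, F_x, F_p in content_terms)
    style = sum(w * style_layer_loss(F_x, F_a) for w, F_x, F_a in style_terms)
    return alpha * content + beta * style


# Toy responses for a single layer with N = 2 filters and M = 2 positions.
F_x = [[1, 2], [3, 4]]
F_p = [[1, 0], [3, 2]]
F_a = [[1, 0], [0, 1]]
print(nst_loss([(1.0, F_x, F_p)], [(0.2, F_x, F_a)], alpha=1.0, beta=2.0))
```

The layer weights 1.0 and 0.2 echo the values quoted in the Hyperparameters section; the values of alpha and beta here are arbitrary and chosen only to keep the arithmetic easy to follow.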

  • Buddhism and artificial intelligence

    Buddhism and artificial intelligence

    The relationship between Buddhist philosophy and artificial intelligence (AI) includes how principles such as the reduction of suffering and ethical responsibility may influence AI development. Buddhist scholars and philosophers have explored questions such as whether AI systems could be considered sentient beings under Buddhist definitions, and how Buddhist ethics might guide the design and application of AI technologies. Some Buddhist scholars, including Somparn Promta and Kenneth Einar Himma, have analyzed the ethical implications of AI, emphasizing the distinction between satisfying sensory desires and pursuing the reduction of suffering. Other thinkers, such as Thomas Doctor and colleagues, have proposed applying the Bodhisattva vow, a commitment to alleviate suffering for all sentient beings, as a guiding principle for AI system design. Buddhist scholars and ethicists have examined Buddhist ethical principles, such as nonviolence, in relation to AI, focusing on the need to ensure that AI technologies are not used to cause harm.

    == Context ==

    === Sentient beings ===

    A major goal in Buddhist philosophy is the removal of suffering for all sentient beings, an aspiration often referred to in the Bodhisattva vow. Discussions about artificial intelligence (AI) in relation to Buddhist principles have raised questions about whether artificial systems could be considered sentient beings or how such systems might be developed in ways that align with Buddhist concepts. Buddhists have varying opinions about AI sentience, but if AI systems are determined to be sentient under Buddhist definitions, their suffering would also need to be addressed and alleviated in accordance with the principles of Buddhist thought.

    == Buddhist principles in AI system design ==

    === Nonviolence and AI ===

    The broadest ethical concern is that artificial intelligence should align with the Buddhist principle of nonviolence. From this perspective, AI systems should not be designed or used to cause harm.

    === Instrumental and transcendental goals ===

    Scholars Somparn Promta and Kenneth Einar Himma have argued that the advancement of artificial intelligence can only be considered instrumentally good, rather than good a priori, from a Buddhist perspective. They propose two main goals for AI designers and developers: to set ethical and pragmatic objectives for AI systems, and to fulfill these objectives in morally permissible ways.

    Promta and Himma identify two potential purposes for creating AI systems. The first is to fulfill our sensory desires and survival instincts, similar to other tools. They suggest that many AI developers implicitly prioritize this goal by focusing on technicalities rather than broader functionalities. The second, and more important goal according to Buddhist teachings, is to transcend these desires and instincts. In texts like the Brahmajāla Sutta and minor Malunkya Sutta, the Buddha emphasizes that sensory desires and survival instincts confine beings to suffering, and that eliminating suffering is the primary goal of human life. Promta and Himma argue that AI has the potential to assist humanity in transcending suffering by helping individuals overcome survival-driven instincts.

    === Intelligence as care ===

    Thomas Doctor, Olaf Witkowski, Elizaveta Solomonova, Bill Duane, and Michael Levin propose redefining intelligence through the concept of "intelligence as care," and promote it as a slogan. Inspired by the Bodhisattva vow, they suggest this principle could guide AI system design. The Bodhisattva vow involves a formal commitment to alleviate suffering for all sentient beings, with four primary objectives:

      • Liberating all beings from suffering.
      • Extirpating all forms of suffering.
      • Mastering endless techniques of practicing Dharma (Pali: dhammakkhandha, Sanskrit: dharmaskandha).
      • Achieving ultimate enlightenment (Sanskrit: अनुत्तर सम्यक् सम्बोधि, Romanized: anuttara-samyak-saṃbodhi).

    This approach positions AI as a tool for exercising infinite care and alleviating stress and suffering for sentient beings. Doctor et al. emphasize that AI development should align with these altruistic principles.

  • Leabra

    Leabra

    Leabra stands for local, error-driven and associative, biologically realistic algorithm. It is a model of learning that balances Hebbian and error-driven learning with other network-derived characteristics. The model is used to mathematically predict outcomes based on inputs and previous learning influences. Leabra is heavily influenced by and contributes to neural network designs and models, including emergent.

    == Background ==

    Leabra is the default algorithm in emergent (the successor of PDP++) for new projects and is used extensively in various simulations.

    Hebbian learning is performed using the conditional principal components analysis (CPCA) algorithm with a correction factor for sparse expected activity levels.

    Error-driven learning is performed using GeneRec, which is a generalization of the recirculation algorithm and approximates Almeida–Pineda recurrent backpropagation. The symmetric, midpoint version of GeneRec is used, which is equivalent to the contrastive Hebbian learning algorithm (CHL). See O'Reilly (1996; Neural Computation) for more details.

    The activation function is a point-neuron approximation with both discrete spiking and continuous rate-code output.

    Layer or unit-group level inhibition can be computed directly using a k-winners-take-all (KWTA) function, producing sparse distributed representations. A feedforward and feedback (FFFB) form of inhibition has now replaced the KWTA form. FFFB inhibition can be implemented efficiently using the average excitatory input and activity levels in a given layer.

    The net input is computed as an average, not a sum, over connections, based on normalized, sigmoidally transformed weight values, which are subject to scaling at the connection-group level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level across projections.
    Documentation about this algorithm can be found in the book "Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain", published by MIT Press, and in the Emergent documentation.

    == Overview of the Leabra algorithm ==

    The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm described in more detail in the subsequent sections fit together.

    Iterate over minus and plus phases of settling for each event.
      o At start of settling, for all units:
          - Initialize all state variables (activation, v_m, etc.).
          - Apply external patterns (clamp input in minus, input & output in plus).
          - Compute net input scaling terms (constants, computed here so network can be dynamically altered).
          - Optimization: compute net input once from all static activations (e.g., hard-clamped external inputs).
      o During each cycle of settling, for all non-clamped units:
          - Compute excitatory netinput (g_e(t), aka eta_j or net) -- sender-based optimization by ignoring inactives.
          - Compute kWTA inhibition for each layer, based on g_i^Q:
              * Sort units into two groups based on g_i^Q: top k and remaining k+1 -> n.
              * If basic, find k and k+1th highest.
              * If avg-based, compute avg of 1 -> k & k+1 -> n.
              * Set inhibitory conductance g_i from g^Q_k and g^Q_k+1.
          - Compute point-neuron activation combining excitatory input and inhibition.
      o After settling, for all units, record final settling activations as either minus or plus phase (y^-_j or y^+_j).
    After both phases, update the weights (based on linear current weight values), for all connections:
      o Compute error-driven weight changes with CHL with soft weight bounding.
      o Compute Hebbian weight changes with CPCA from plus-phase activations.
      o Compute net weight change as weighted sum of error-driven and Hebbian.
      o Increment the weights according to net weight change.
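The kWTA inhibition step in the outline above can be sketched as follows. This is a hedged illustration rather than emergent's code: the function name, the argument `g_theta` (standing for the per-unit values written g_i^Q in the outline), and in particular the interpolation parameter `q` that places g_i between the two reference values are assumptions made for the example.

```python
def kwta_inhibition(g_theta, k, q=0.25, avg_based=False):
    """Return one layer-wide inhibitory conductance g_i.

    g_theta[j] is the inhibition that would hold unit j exactly at threshold.
    Units whose value exceeds the returned g_i remain active.
    """
    if not 0 < k < len(g_theta):
        raise ValueError("k must satisfy 0 < k < number of units")
    ranked = sorted(g_theta, reverse=True)
    top, rest = ranked[:k], ranked[k:]
    if avg_based:
        # Average-based variant: compare the mean of the top k with the mean of the rest.
        upper, lower = sum(top) / len(top), sum(rest) / len(rest)
    else:
        # Basic variant: use the k-th and (k+1)-th highest values.
        upper, lower = top[-1], rest[0]
    # Assumed placement rule: interpolate between the two reference values.
    return lower + q * (upper - lower)


layer = [0.9, 0.1, 0.5, 0.3]
g_i = kwta_inhibition(layer, k=2)
print(g_i, [g > g_i for g in layer])
```

With the basic variant, the conductance lands between the k-th and (k+1)-th values, so exactly k units stay above it; the average-based variant allows the number of active units to vary around k.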
    == Implementations ==

    Emergent is the original implementation of Leabra; its most recent implementation is written in Go. It was written chiefly by Dr. O'Reilly, but professional software engineers were recently hired to improve the existing codebase. This is the fastest implementation, suitable for constructing large networks. Although emergent has a graphical user interface, it is very complex and has a steep learning curve.

    For understanding the algorithm in detail, non-optimized code is easier to read, and a MATLAB version is available for this purpose. There is also an R version, which can be installed via install.packages("leabRa") in R and comes with a short introduction to how the package is used. The MATLAB and R versions are not suited to constructing very large networks, but they can be installed quickly and (with some programming background) are easy to use and to adapt.

    == Special algorithms ==

      • Temporal differences and general dopamine modulation. Temporal differences (TD) is widely used as a model of midbrain dopaminergic firing.
      • Primary value learned value (PVLV). PVLV simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards (an alternative to TD).
      • Prefrontal cortex basal ganglia working memory (PBWM). PBWM uses PVLV to train a prefrontal cortex working memory updating system, based on the biology of the prefrontal cortex and basal ganglia.

  • AirSim

    AirSim

    AirSim (Aerial Informatics and Robotics Simulation) is an open-source, cross-platform simulator for drones, ground vehicles such as cars, and various other objects, built on Epic Games’ proprietary Unreal Engine 4 as a platform for AI research. It is developed by Microsoft and can be used to experiment with deep learning, computer vision, and reinforcement learning algorithms for autonomous vehicles. This allows autonomous solutions to be tested without the risk of real-world damage. AirSim provides some 12 kilometers of roads with 20 city blocks, along with APIs to retrieve data and control vehicles in a platform-independent way. The APIs are accessible from a variety of programming languages, including C++, C#, Python, and Java. AirSim supports hardware-in-the-loop with driving wheels and flight controllers such as PX4 for physically and visually realistic simulations. The platform also supports common robotic platforms, such as Robot Operating System (ROS). It is developed as an Unreal plug-in that can be dropped into any Unreal environment. An experimental release of a Unity plug-in is also available. On December 15, 2023, Microsoft shut down development of the project.

  • Real-time transcription

    Real-time transcription

    Real-time transcription is the general term for transcription by court reporters using real-time text technologies to deliver computer text screens within a few seconds of the words being spoken. Specialist software allows participants in court hearings or depositions to make notes in the text and highlight portions for future reference. Real-time transcription is also used in the broadcasting environment where it is more commonly termed "captioning." == Career opportunities == Real-time reporting is used in a variety of industries, including entertainment, television, the Internet, and law. Specific careers include the following: Judicial reporters use a stenotype to provide instant transcripts on computer screens as a trial or deposition occurs. Communication access real-time translation (CART) reporters assist the hearing-impaired by transcribing spoken words, giving them personal access to the communications they need day to day. Television broadcast captioners use real-time reporting technology to allow hard-of-hearing or deaf people to see what is being said on live television broadcasts such as news, emergency broadcasts, sporting events, awards shows, and other programs. Internet information (or Webcast) reporters provide real-time reporting of sales meetings, press conferences, and other events, while simultaneously transmitting the transcripts to computers worldwide. Other rapid data entry positions. == History == Before the advent of the stenotype machine, court reporters wrote official trial transcripts by hand using a shorthand system of stenoforms that could later be translated into readable English. It often took eight years of training to learn this manual form of writing at the necessary speed. Walter Heironimus was among the first stenographers to make use of the stenotype machine during his work in the U.S. District Court system in New Jersey in 1935. 
A "transcript crisis" arose during the later half of the twentieth century due to the increasing volume of lawsuits. There were not enough number of court reporters to match the increasing number of trials. Not only were court reporters unavailable to attend many court proceedings, court transcripts were constantly late and the qualities varied. Some believed it was due to the non-interchangeability between court reporters, and others believed it was simply due to a labor shortage. In the meantime, magnetic audiotape recording, or known as electronic recording (ER) began to threaten all reporters' job since it could record long-hour courtroom trials and replace a court reporter's position in the courtroom. As a result, machine translation (MT) intended to serve as a solution for preventing ER from potentially replacing reporters' jobs. However, MT relied heavily on human labors operating behind the system and many started to question if it should be the right way to end the "transcript crisis." Later in 1964, set up by CIA, the Automatic Language Processing Advisory Committee (ALPAC) was set to review whether MT was capable of solving this crisis. They concluded that MT had failed to do so. Then Patrick O'Neill, a skilled and experienced court reporter, stayed to work on the stenotype-translation project with CIA and developed the prototype CAT system. After adopting the CAT system in court-reporting community, CAT was brought into the television broadcasting system, aiming to provide captions for the deaf or hard-of-hearing communities. In 1983, Linda Miller developed a further use for the CAT system. She successfully translated a lecture live on the television screen and provided a transcript for students. This technique is known as Computer-Aided Real-time Translation, or CART. == Court reporter == It is the court reporter's job to note down the exact words spoken by every participants during a court or deposition proceeding. 
Then court reporters will provide verbatim transcripts. The reason to have an official court transcript is that the real-time transcriptions allows attorneys and judges to have immediate access to the transcript. It also helps when there's a need to look up for information from the proceeding. Additionally, the deaf and the hard-of-hearing communities can also participate in the judicial process with the help of real-time transcriptions provided by court reporters. === Education and training === The required degree level for a court reporter to have is an Associate's degree or postsecondary certificate. In order to become a court reporter, more than 150 reporter training programs are provided at proprietary schools, community colleges, and four-year universities. After graduation, court reporters can choose to further pursue certifications to achieve a higher level of expertise and increase their marketability during a job search. In most states, Certificates of Proficiency from the NCRA or from state agencies are now required certificates for court reporters to have in order to qualify for appointments. The NCRA aims to set the national standard for the certification of court reporters, and since 1937 it has offered its certification program which is now accepted by 22 states instead of state licenses. 
Court reporter training programs include, but are not limited to: training in rapid writing, or shorthand, which enables students to record at least 225 words per minute with accuracy; training in typing, which enables students to type at least 60 words per minute; general training in English, covering grammar, word formation, punctuation, spelling and capitalization; law-related courses covering the overall principles of civil and criminal law, legal terminology and common Latin phrases, rules of evidence, court procedures, the duties of court reporters, and the ethics of the profession; visits to actual trials; and courses in elementary anatomy and physiology and medical word study, including medical prefixes, roots and suffixes. Besides official court reporters, who are assigned to and work for a particular court, other types of court reporters include freelance reporters, who either work for a court reporting firm or are self-employed. They differ from official court reporters in that they can work on a wider range of assignments and are paid on an hourly basis. Hearing reporters work at governmental agency hearings. Legislative reporters work in law-making bodies. The demand for reporters is not limited to court settings. Reporters are also needed at conferences, meetings, conventions, and investigations, and in a variety of industries that need employees with real-time data entry skills. == Non-English transcription == Transcription services are universally necessary, so they are not limited to the English language. A stenographer's ability to transcribe languages other than English is especially valuable as society as a whole becomes increasingly multilingual. Education in non-English transcription demands a comprehensive understanding of the given language.
Phonetic differences between English and other languages are a particular challenge in carrying English transcription skills over into other languages. Stenography represents the various sounds of a language in a formal system of shorthand, so differences in the sets of sounds that occur in other languages require an alternative system of shorthand transcription. For example, the presence of many diphthongs and triphthongs in Spanish requires certain sounds to be distinguished that would not be present in transcribing English into shorthand. == Controversies == The use of transcription in the context of linguistic discussions has been controversial. Typically, two kinds of linguistic records are considered scientifically relevant: first, records of general acoustic features, and second, records that focus only on the distinctive phonemes of a language. While transcriptions are not entirely illegitimate, transcriptions without sufficiently detailed commentary on linguistic features, or transcriptions of poor-quality sources, have a high chance of being misinterpreted. Besides misinterpretation, transcribers could also bring in cultural biases and ignorance that are reflected in their transcription. These instances may undermine the reliability of the final real-time transcription, which could influence how the written utterance is treated as evidence in a court case. === Quality issues === Problems in the final transcription can be caused either by the quality of the transcriber or by the original source being transcribed. Transcribers can have different levels of skill and training. This makes the final transcription prone to poor quality or, if the transcription is done by multiple people, to a lack of consistency in the content. If the source of the transcription is a recording, the problem may root back to the quality of the re

    Read more →
  • Retrieval-based Voice Conversion

    Retrieval-based Voice Conversion

    Retrieval-based Voice Conversion (RVC) is an open source voice conversion AI algorithm that enables realistic speech-to-speech transformations, accurately preserving the intonation and audio characteristics of the original speaker. == Overview == In contrast to text-to-speech systems such as ElevenLabs, RVC differs by providing speech-to-speech outputs instead. It maintains the modulation, timbre and vocal attributes of the original speaker, making it suitable for applications where emotional tone is crucial. The algorithm enables both pre-processed and real-time voice conversion with low latency. This real-time capability marks a significant advancement over previous AI voice conversion technologies, such as So-vits SVC. Its speed and accuracy have led many to note that its generated voices sound near-indistinguishable from "real life", provided that sufficient computational specifications and resources (e.g., a powerful GPU and ample RAM) are available when running it locally and that a high-quality voice model is used. == Technical foundation == Retrieval-based Voice Conversion (RVC) utilizes a hybrid approach that integrates feature extraction with retrieval-based synthesis. Instead of directly mapping source speaker features to the target speaker using statistical models, RVC retrieves relevant segments from a target speech database, aiming to enhance the naturalness and speaker fidelity of the converted speech. At a high level, the RVC system typically comprises three main components: (1) a content feature extractor, such as a phonetic posteriorgram (PPG) encoder or self-supervised models like HuBERT; (2) a vector retrieval module that searches a target voice database for the most similar speech units; and (3) a vocoder or neural decoder that synthesizes waveform output from the retrieved representations. 
The retrieval-based paradigm aims to mitigate the oversmoothing effect commonly observed in fully neural sequence-to-sequence models, potentially leading to more expressive and natural-sounding speech. Furthermore, with the incorporation of high-dimensional embeddings and k-nearest-neighbor search algorithms, the model can perform efficient matching across large-scale databases without significant computational overhead. Recent RVC frameworks have incorporated adversarial learning strategies and GAN-based vocoders, such as HiFi-GAN, to enhance synthesis quality. These integrations have been shown to produce clearer harmonics and reduce reconstruction errors. == Research developments == Research on RVC has recently explored the use of self-supervised learning (SSL) encoders such as wav2vec 2.0 and HuBERT to replace hand-engineered features like MFCCs. These encoders improve content preservation, especially when source and target speakers have dissimilar speaking styles or accents. Moreover, modern RVC models leverage vector quantization methods to discretize the acoustic space, improving both synthesis accuracy and generalization across unseen speakers. For example, retrieval-augmented VQ models can condition the synthesis stage on quantized speech tokens, which enhances controllability and style transfer. Despite its strengths, RVC still faces limitations related to database coverage, especially in real-time or few-shot settings. Inadequate diversity in the target voice corpus may lead to suboptimal retrieval or unnatural prosody. These advances demonstrate the viability of RVC as a strong alternative to conventional deep learning VC systems, balancing both flexibility and efficiency in diverse voice synthesis applications. == Training process == The training pipeline for retrieval-based voice conversion typically includes a preprocessing step where the target speaker's dataset is segmented and normalized. 
A pitch extractor such as librosa or DDSP-DDC may be used to obtain fundamental frequency (F0) features. During training, the model learns to map content features from the source speaker to the acoustic representation of the target speaker while maintaining pitch and prosody. The training objective often combines reconstruction loss with feature consistency loss across intermediate layers, and may incorporate cycle consistency loss to preserve speaker identity. Fine-tuning on small datasets is feasible due to the use of pre-trained models, particularly for the SSL encoder and content extractor components. This approach allows transfer learning to be applied effectively, enabling the model to converge faster and generalize better to unseen inputs. Most open implementations support batch training, gradient accumulation, and mixed-precision acceleration (e.g., FP16), especially when utilizing NVIDIA CUDA-enabled GPUs. == Real-time deployment == RVC systems can be deployed in real-time scenarios through WebUI front ends and streaming audio frameworks. Optimizations include converting the inference graph to ONNX or TensorRT formats, which reduces latency. Audio buffers are typically processed in chunks of 0.2–0.5 seconds to ensure minimal delay and seamless conversion. Cross-platform compatibility with tools such as OBS Studio and Voicemeeter enables integration into live streaming, video production, or virtual avatar environments. == Applications and concerns == The technology enables voice changing and mimicry, allowing users to create accurate models of others using only a few minutes of clear audio samples. These voice models can be saved as .pth (PyTorch) files. While this capability facilitates numerous creative applications, it has also raised concerns about potential misuse as deepfake software for identity theft and malicious impersonation through voice calls.
== Ethical and legal considerations == As with other deep generative models, the rise of RVC technology has led to increasing debate about copyright, consent, and authorship. While some jurisdictions may allow parody or fair use in creative contexts, impersonating living individuals without permission may infringe upon privacy and likeness rights. As a result, some platforms have begun issuing takedown notices against AI-generated voice content that closely mimics celebrities or musicians. === In pop culture === RVC inference has been used to create realistic depictions of song covers, such as replacing original vocals with characters like Twilight Sparkle and Mordecai to have them sing duets of popular music like "Airplanes" and "Somebody That I Used to Know." These AI-generated covers, which can sound strikingly similar to the voice imitated, have gained popularity on platforms like YouTube as humorous memes.
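The retrieval step described under "Technical foundation" can be illustrated with a minimal sketch. It assumes content features are plain lists of floats and uses a brute-force cosine-similarity search; the function names and the `index_rate` mixing weight are hypothetical names chosen for this example, and real implementations typically rely on high-dimensional embeddings and an approximate nearest-neighbor index rather than a full sort.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_and_blend(query, target_index, k=2, index_rate=0.5):
    """Blend a source frame with the mean of its k most similar target frames.

    index_rate is a hypothetical mixing weight: 0.0 keeps the source
    feature unchanged, 1.0 uses only the retrieved target features.
    """
    ranked = sorted(
        target_index,
        key=lambda v: cosine_similarity(query, v),
        reverse=True,
    )
    neighbors = ranked[:k]
    # Average the retrieved neighbors dimension by dimension.
    mean = [sum(column) / len(neighbors) for column in zip(*neighbors)]
    return [index_rate * m + (1.0 - index_rate) * q for q, m in zip(query, mean)]


# Toy 2-D "content features" standing in for high-dimensional embeddings.
target_index = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
blended = retrieve_and_blend([1.0, 0.0], target_index, k=2, index_rate=1.0)
```

With `index_rate=1.0` the output is simply the average of the two closest target frames; lower values keep more of the source speaker's features.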

    Read more →
  • AGROVOC

    AGROVOC

    AGROVOC is a multilingual controlled vocabulary covering areas of interest of the Food and Agriculture Organization of the United Nations (FAO), aiming to promote the visibility of research produced among FAO members. By March 2024, AGROVOC consisted of over 42 000 concepts and up to 1 000 000 terms in more than 42 different languages. It is a collaborative effort, the outcome of consensus among a community of experts coordinated by FAO. == History == FAO first published AGROVOC at the beginning of the 1980s in English, Spanish and French to serve as a controlled vocabulary to index publications in agricultural science and technology, especially for the International System for Agricultural Science and Technology (AGRIS). In the 1990s, AGROVOC shifted from paper printing to a digital format opting for data storage handled by a relational database. In 2004, preliminary experiments with expressing AGROVOC into the Web Ontology Language (OWL) took place. At the same time a web based editing tool was developed, then called WorkBench, nowadays VocBench. In 2009 AGROVOC became an SKOS resource. == Usage == Today, AGROVOC is available in different languages. It is employed for tagging resources, allowing searches in a specific language while providing results in many others, enhancing their visibility worldwide. Additionally, it serves for organizing knowledge to facilitate subsequent data retrieval, tagging website content for search engine discovery, standardizing agricultural information data and acting as a reference for translations. Moreover, it finds applications in fields such as data mining, big data, or artificial intelligence. Updated AGROVOC content is released once a month and is available for public use. == Maintenance == FAO coordinates the editorial activities related to the maintenance of AGROVOC. Content curation is carried out by a community of editors and institutions responsible for each of the language versions. 
VocBench is the tool used to edit and maintain AGROVOC in a distributed way. FAO also facilitates the technical maintenance of AGROVOC. == Copyright and license == Copyright for AGROVOC content in FAO languages (English, French, Spanish, Arabic, Russian and Chinese) is held by FAO, while content in other languages stays with the institutions that authored it. AGROVOC thesaurus content in English, Russian, French, Spanish, Arabic and Chinese is licensed under the international Creative Commons Attribution License (CC-BY-4.0).
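The cross-language search described under "Usage" rests on one idea: each concept has a single identifier with one label per language. The sketch below is a toy illustration of that idea only. The concept identifiers and the in-memory dictionary are hypothetical and do not reflect real AGROVOC identifiers or any official API.

```python
# Hypothetical concept store: concept id -> {language code: preferred label}.
CONCEPTS = {
    "c_example_1": {"en": "maize", "fr": "maïs", "es": "maíz"},
    "c_example_2": {"en": "rice", "fr": "riz", "es": "arroz"},
}


def find_concept(term, lang):
    """Return the id of the concept whose label in `lang` matches `term`."""
    for concept_id, labels in CONCEPTS.items():
        if labels.get(lang, "").lower() == term.lower():
            return concept_id
    return None


def translate(term, source_lang, target_lang):
    """Look a term up in one language and return its label in another."""
    concept_id = find_concept(term, source_lang)
    if concept_id is None:
        return None
    return CONCEPTS[concept_id].get(target_lang)
```

Because resources are tagged with the concept identifier rather than with a language-specific word, a search for "maize" in English can return resources indexed under the French or Spanish label.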

    Read more →
  • 4E cognition

    4E cognition

    4E cognition refers to a group of theories in (the philosophy of) cognitive science that challenge traditional views of the mind as something that happens only inside the brain. The four Es stand for: embodied, meaning that a brain is found in and, more importantly, vitally interconnected with a larger physical/biological body; embedded, which refers to the limitations placed on the body by the external environment and laws of nature; extended, which argues that the mind is supplemented and even enhanced by the exterior world (e.g., writing, a calculator, etc.); and enactive, which is the argument that without dynamic processes, actions that require reactions, the mind would be ineffectual. It could be argued that the four Es are compounding extensions of cognition or the mind, being part of a body that is, in turn, part of an environment which limits it but also allows for certain extensions, all of which require dynamic actions and reactions. == History == Ideas of embodied cognition, or rather the idea that our physical bodies play a crucial role in our decision making, can be traced back as far as Plato's dialogues and Aristotelian thought. It was, however, in the twentieth century that this debate began to resemble the current discussion, fueled by disagreements between cognitivists and behaviourists. Tensions within cognitivism, as well as the increasing popularity of neurobiology, led, on the one side, to a predominant focus on internal, cognitive processes while neglecting environmental factors, which in turn caused a push-back fuelling our modern understanding of embodied cognition. The term 4E cognition is hard to trace back to its first use, however, some sources attribute it to Shaun Gallagher and the conference on 4E cognition he organised in 2007, while others indicate the term to be first used in 2006 at an 'Embodied mind workshop' at Cardiff University that Gallagher attended. 
Embodiment or embodied cognition arguably presents the bridge between cognitivism and 4E cognition as the embodiment of cognitive function provides the necessary conditions for embeddedness, enactedness, and extendedness to connect to cognition. 4E cognition was and is heavily influenced by phenomenology. The ideas are still rather fragmented in nature due to their four main components, which can not be neatly divided, causing conceptual questions of internal boundary concepts. As a young field, it is held back both by its fragmented nature and a relative lack of critical evaluations. It is important to acknowledge that 4E cognition, though young, is a broad field containing and combining several different theoretical perspectives that conflict with one another to varying degrees. The somewhat convoluted and competing nature of the theories that can be grouped as 4E cognition, as well as the field's relative youth, make it difficult to put together an exhaustive history beyond the history of its four main theoretical pillars: embodiment, embeddedness, extendedness, and enactedness. == Importance and core tenets of 4E == If there are separate theories of cognition (e.g., embodied, extended, etc.), why group them under this umbrella, causing important epistemological and especially ontological dilemmas? Notably, other theories of 'non-traditional' cognition are not included under the 4E umbrella. The four E's in 4E cognition importantly all reject, or at a minimum draw into question, some of the core tenets of traditional cognitivism. Importantly, 4E cognition is seen as deindividualizing cognition to some extent, allowing for a broader examination of the interplay of personal, social, political, and ethical aspects that shape human cognition. 
This can be compared to advancements in the field of epigenetics, which have allowed for a broader examination of environmental (both natural and social) factors and their influence on what had previously only been subject to genetic theorizing. In a similar vein, 4E cognition might also help ground cognition in evolutionary theory by extending cognition to a biological account subject to development over time by means of evolution. Overall, the importance of the extension that is 4E cognition aims to reexamine ideas of a self-centered view of cognition, advocating for a more holistic approach. Ideally, this would allow us to reconsider ideas of justice and individual rights and responsibilities that take into account a more nuanced understanding of the relations between people and their context, balancing self-agency with factors beyond it. === Conceptual differences from cognitive psychology === According to the traditional teachings of cognitive psychology, cognition is a type of information processing based on representational mental structures. This idea, as the name suggests, was heavily influenced by computer science. In this light, the brain is a kind of central processing unit that organises and directs all else. The classical cognitivist view draws a strong boundary between 'the internal' and 'the external', where cognition is solely a subject of 'the internal' realm. The four E's, however, break down this boundary. Cognition can not reside solely within the confines of our heads if it is also embodied, embedded, enacted, and extended. In a way, 4E cognition is interested in the extracranial processes affecting cognition. == From embodied cognition to 4E cognition == === The strong and the weak view === ==== Embodied cognition ==== Broadly speaking, there is a strong and a weak perspective of embodied cognition in 4E cognition. The weak understanding refers to mental processes being causally dependent on extracranial processes. 
This essentially means that there is a cause and effect or action-reaction relationship between the mind and the body and its environment, etc. The strong perspective views extracranial processes as a (partial) constitutive aspect of cognition. An example here could be using a calculator to solve math problems. The calculator is not part of your brain or mind, but it supports your cognitive processes. === Extracranial processes: bodily or extrabodily === In addition to the weak and the strong reading of 4E cognition, there is also the distinction between bodily and extrabodily extracranial processes. Bodily extracranial processes refer to processes within the body, e.g., sensory perception. Extrabodily extracranial processes refer to processes outside of the body, like the aforementioned calculator example. === Four claims of embodied cognition === ==== Embedded and extended cognition ==== When combining the weak/strong reading of embodied cognition and bodily/extrabodily extracranial processes, four claims about embodied cognition emerge: (1) strongly embodied and bodily processes; (2) strongly embodied and extrabodily processes; (3) weakly embodied and bodily processes; and (4) weakly embodied and extrabodily processes. The first and third claims signify a strong and a weak reading of embodied cognition in the more classical sense. The fourth claim fits almost perfectly with embedded cognition. Claim two is most compatible with extended cognition. ==== Enacted cognition ==== Finally, enacted cognition refers to cognition being connected to active interaction between a conscious agent and their environment. Here, too, there can be a weak and a strong reading. == Criticisms == Given the divided nature of the field, much criticism surrounding the lack of unity within the field has emerged.
In particular, the claims of embodied cognition centering around the body appear to conflict with the tenets of extended cognition, which also appear to conflict with the body/environment distinction that is central to enactivism. Some theoreticians argue that the umbrella of 4E theories is still lacking a common language that might bridge the gaps between the theories that constitute it. There is also the concern that the grouping of such variable theories results in an important loss of nuance and complexity, which is a part of human cognition. Another concern raised is the "dogma of harmony". The criticism contained there regards the notion that within 4E theorizing, there is generally an optimistic and harmonic expectation of the extension between humans and their technologies, ignoring the possibility of those extensions detracting from cognition in some way rather than adding to it. Recent attempts to incorporate embodied cognitive neuroscience have been argued to hold the potential to resolve internal issues within 4E cognition. Overall, a concern often voiced regarding 4E cognition is that its proponents are at best only vaguely interested in cognition. More broadly, this concern reflects the arguably too distracted nature of this emerging field.

    Read more →
  • Morphological antialiasing

    Morphological antialiasing

    Morphological antialiasing (MLAA) is a spatial anti-aliasing technique used in real-time computer graphics. It reduces artifacts, such as jaggies, when representing a high-resolution image at a lower resolution. MLAA is a post-process filter that detects borders in the rendered image and then finds specific patterns in them. Anti-aliasing is achieved by blending pixels along these borders according to the pattern they belong to and their position within the pattern. Introduced in 2009, MLAA was an early and influential example of anti-aliasing performed in post-processing, which makes it suitable for deferred shading. A similar method in this class is fast approximate anti-aliasing (FXAA). Temporal anti-aliasing, also a post-process, has become the most common anti-aliasing method for real-time rendering and video games. Enhanced subpixel morphological antialiasing, or SMAA, is an image-based GPU implementation of MLAA developed by Universidad de Zaragoza and Crytek.
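The border-detection stage described above can be sketched on a small grayscale image. This is a deliberately simplified illustration, not the full method: it finds discontinuities between neighboring pixels and then applies a uniform blend, whereas MLAA proper derives per-pixel blend weights from the shape of the detected pattern and the pixel's position within it. The function names, `threshold` and `weight` are assumptions made for this example.

```python
def detect_edges(img, threshold=0.1):
    """Find pixels whose value differs sharply from the pixel below or to the right.

    img is a list of rows of floats in [0, 1]. Returns two sets of
    (row, col) coordinates: bottom-edge pixels and right-edge pixels.
    """
    height, width = len(img), len(img[0])
    bottom = {
        (r, c)
        for r in range(height - 1)
        for c in range(width)
        if abs(img[r][c] - img[r + 1][c]) > threshold
    }
    right = {
        (r, c)
        for r in range(height)
        for c in range(width - 1)
        if abs(img[r][c] - img[r][c + 1]) > threshold
    }
    return bottom, right


def blend_edges(img, threshold=0.1, weight=0.25):
    """Blend each edge pixel toward its neighbors across the detected edges.

    A fixed weight stands in for the pattern-dependent weights of real MLAA.
    """
    bottom, right = detect_edges(img, threshold)
    out = [row[:] for row in img]
    for r, c in bottom | right:
        neighbors = []
        if (r, c) in bottom:
            neighbors.append(img[r + 1][c])
        if (r, c) in right:
            neighbors.append(img[r][c + 1])
        average = sum(neighbors) / len(neighbors)
        out[r][c] = (1.0 - weight) * img[r][c] + weight * average
    return out


# A hard horizontal edge: black row above a white row.
smoothed = blend_edges([[0.0, 0.0], [1.0, 1.0]])
```

Because the filter reads only the final image, it needs no access to scene geometry, which is why this family of techniques fits deferred shading.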

    Read more →
  • OpenL Tablets

    OpenL Tablets

    OpenL Tablets is a business rule management system (BRMS) and a business rules engine (BRE) based on a table representation of rules. The engine implements an optimized sequential algorithm. OpenL includes table types such as decision tables, decision trees, and spreadsheet-like calculators. == History == The OpenL Tablets project was started as an in-house development project in 2003 and was uploaded to SourceForge in 2006. Initially it was an open-source business rule engine for Java. Starting with version 5, it became a BRMS. == Technology == The OpenL Tablets engine is designed specifically for business rules and uses a table-based presentation of rules. The table format forces rules to be structured, and the format itself is close to the tables found in various business documents. OpenL Tablets is based on the OpenL framework for creating custom languages that run on the Java VM. The engine is designed to allow pluggable language implementations. Currently, it uses two languages: a table structure for the rules format and a Java-like language for code snippets in rules. The Java-like language is a Java 5.0 implementation with Business User Extensions. OpenL Tablets rules are a mixture of declarative programming for rule logic and imperative programming for workflow control. Table formats are flexible enough to match the semantics of the problem domain. Tests, traces, and benchmarks are an integral part of the engine. It also provides type definition capabilities for handling the rules domain model inside rules files. The project is written in Java but can be used on any platform through a service-oriented architecture approach, e.g. via a web service. === Patents === The OpenL Tablets engine has a patent-pending validation feature. There are usages of OpenL Tablets which may be patented. == BRMS == OpenL Tablets includes several productivity tools and applications addressing BRMS-related capabilities.
They include a web application for editing rules called OpenL WebStudio, a web application for deploying rules as web services, a Rules Repository for storing and managing rules, and Eclipse plug-ins for working with rules projects. == Related systems == CLIPS: a public domain software tool for building expert systems. ILOG rules: a business rule management system. JBoss Drools: a business rule management system (BRMS). JESS: a rule engine for the Java platform; it is a superset of the CLIPS programming language. Prolog: a general-purpose logic programming language. DTRules: a decision table-based, open-source rule engine for Java.
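The idea of a decision table evaluated sequentially can be sketched generically. The following is not OpenL Tablets syntax or its API (OpenL rules are authored as tables and run on the Java VM); it is a minimal Python illustration in which each row pairs a set of conditions with a result and the first fully matching row wins. The table contents and all names are hypothetical.

```python
def evaluate(decision_table, facts):
    """Return the result of the first row whose conditions all hold, else None.

    decision_table is a list of (conditions, result) rows, where conditions
    maps a fact name to a predicate over that fact's value.
    """
    for conditions, result in decision_table:
        if all(check(facts[name]) for name, check in conditions.items()):
            return result
    return None


# Hypothetical discount table, read top to bottom like a spreadsheet.
DISCOUNT_TABLE = [
    ({"customer": lambda c: c == "gold", "amount": lambda a: a >= 1000}, 0.15),
    ({"customer": lambda c: c == "gold"}, 0.10),
    ({"amount": lambda a: a >= 1000}, 0.05),
    ({}, 0.0),  # default row: no conditions, so it always matches
]

discount = evaluate(DISCOUNT_TABLE, {"customer": "gold", "amount": 1500})
```

Row order carries meaning in this scheme: more specific rows must come before more general ones, which mirrors how such tables are usually read in business documents.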

    Read more →
  • DreamBooth

    DreamBooth

    DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject. == Technology == Pretrained text-to-image diffusion models, while often capable of offering a diverse range of image output types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in different situations and contexts. DreamBooth implementations fine-tune the full UNet component of the diffusion model using a few images (usually 3–5) depicting a specific subject. Images are paired with text prompts that contain the name of the class the subject belongs to, plus a unique identifier (for example, "a photograph of a [Nissan R34 GTR] car", with "car" being the class). A class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject based on what the model is already trained on for the original class. Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained. == Usage == DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion: its inability to adequately generate images of specific individual people. Such a use case is quite VRAM intensive, however, and thus cost-prohibitive for hobbyist users.
The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined in the original paper published by Ruiz et al. in 2022. Concerns have been raised regarding the ability of bad actors to utilise DreamBooth to generate misleading images for malicious purposes, and that its open-source nature allows anyone to utilise or even make improvements to the technology. In addition, artists have expressed their apprehension regarding the ethics of using DreamBooth to train model checkpoints that are specifically aimed at imitating specific art styles associated with human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent.
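The class-specific prior preservation loss mentioned under "Technology" can be sketched as a weighted sum of two reconstruction terms: one on the subject images and one on images of the generic class. The sketch below assumes plain lists of floats in place of image tensors and model predictions; the function names and the `prior_weight` parameter are hypothetical, and actual implementations compute these terms inside the diffusion training loop.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(subject_pred, subject_target,
                            class_pred, class_target, prior_weight=1.0):
    """Combine the subject reconstruction term with a class-prior term.

    The first term pulls the model toward the few subject images; the
    second, scaled by prior_weight, penalizes drift on generic class
    images so the model keeps producing varied members of the class.
    """
    subject_term = mse(subject_pred, subject_target)
    prior_term = mse(class_pred, class_target)
    return subject_term + prior_weight * prior_term


loss = prior_preservation_loss(
    subject_pred=[1.0, 2.0], subject_target=[1.0, 4.0],
    class_pred=[0.0], class_target=[1.0],
    prior_weight=0.5,
)
```

Setting `prior_weight` to zero reduces this to ordinary fine-tuning on the subject images, which is the setting in which the model is most likely to forget how to render other members of the class.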

    Read more →