AI Video Tools

Explore the best AI Video Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

VOCEDplus

VOCEDplus is a free international research database about tertiary education, maintained and developed by staff at the c (NCVER) in Adelaide, South Australia. The focus of the database content is the relation of post-compulsory education and training to workforce needs, skills development, and social inclusion. == Structure == The content of the VOCEDplus database encompasses vocational education and training (VET), higher education, lifelong learning, informal learning, VET in schools, adult and community education, apprenticeships/traineeships, international education, providers of education and training, and workforce development. It is international in scope and contains over 84,000 English language records, many with links to full text documents. VOCEDplus contains extensive Australian materials and includes a wide range of international information, covering outcomes of tertiary education in the shape of published research, practice, policy, and statistics. Entries are included for the following types of publications: reports; annual reports; papers; discussion papers; occasional papers; working papers; books; book chapters; conference papers; conference proceedings; journals; journal articles; policy documents; published statistics; theses; podcasts; and teaching and training materials. Each database entry contains standard bibliographic information and an abstract. Many entries include full text access via the publisher's website or a digitised copy. == History == === 1989-1997 === In the early years VOCEDplus was known as VOCED. The original database was produced by a network of clearinghouses across Australia with the aim of sharing activities in the technical and further education (TAFE) sector. VOCED was produced in hardcopy and an electronic version was distributed on diskette. === 1997-2001 === 1997 - the first web version of VOCED was made available from the National Centre for Vocational Education Research (NCVER) organisational website 1998 - a major project to upgrade the database and expand its international coverage commenced 2001 - creation of VOCED's own website 2001 - VOCED endorsed as the UNESCO international database for technical and vocational education and training (TVET) research information === 2001-2009 === Many changes to the database and website occurred during this period with a focus on continuous improvement to meet the needs of users and utilise emerging technologies. 2006 - materials produced for two adult literacy and learning programs funded by the Australian Department of Education, Employment and Workplace Relations (DEEWR) - the Workplace English Language and Learning (WELL) Programme and the Adult Literacy National Project (ALNP) included in VOCED 2007 - the Australian clearinghouse network transferred most of the hardcopy collections to NCVER, to form a centralised repository of resources 2009 - materials produced by Reframing the Future (RTF) a vocational education and training workforce development initiative of the Australian, State and Territory Governments included in VOCED === 2009-2014 === A major rebuild of the database and website was undertaken during this period to take advantage of the potential of new technologies to provide improved services and incorporate Web 2.0 technologies (RSS feeds, and share and bookmark tools). 2009 - scope expanded to more fully encompass the higher education sector 2011 - launch of VOCEDplus with the name change representing the enhanced features and extended focus 2012 - a major retrospective digitisation project commenced and by the end of the 2012-2013 financial year a total of 9,328 publications (593,534 pages/microfiche frames) had been digitised, ensuring these publications are available electronically for free === 2014-2019 === A number of significant curated content products were released during this period. 2015 - release of a refreshed look to adopt the new NCVER branding plus a number of search enhancements (Guided search, Expert search, and Glossary search) were added 2015 - first in the series of 'Focus on...' pages released 2016 - launch of the 'Pod Network', a convenient and efficient platform that allows instant access to research and a multitude of resources on a range of subjects 2017 - completion of the 'Pod Network', consisting of 20 Pods (on broad subjects including Apprenticeships and traineeships, Foundation skills, Teaching and learning, Career development, and Students) and 74 Podlets (on narrow topics including Online learning, Social media, VET in schools, STEM skills, and Adult literacy) 2018 - launch of the 'Timeline of Australian VET Policy Initiatives' and the 'VET Knowledge Bank' which contains a suite of products capturing Australia's diverse, complex and ever-changing VET system 2019 - after an internal review, a refreshed, streamlined version of the 'Pod Network' was released, consisting of 13 Pods and 20 Podlets 2019 - launch of the 'VET Practitioner Resource' which contains a range of information to support VET practitioners in their work and is organised into three sections: (1) Teaching, training and assessment: standards, guidance, research and good practice resources to inform daily work; (2) Practitioners as researchers: information for undertaking practitioner-led research; and (3) The VET workforce: information about VET teachers and trainers, and the professional development needs of the VET workforce 2019 - VOCEDplus celebrated 30 years of providing information to the tertiary education sector and the homepage was refreshed to make it more modern and easier to use === 2020- === VOCEDplus continued to be accessible throughout the COVID-19 pandemic. 2020-2021 - the VET Knowledge Bank added a dedicated page, 'COVID-19 announcements', that showcases the measures introduced by the Australian, state and territory governments to mitigate the impact of the pandemic and promote economic recovery 2020-2024 - published research about the effects of the pandemic on education and training, providers, students, labour markets, employment and employees was collected and made permanently available in the database 2024 - VOCEDplus celebrated 35 years of providing information to the tertiary education sector. The homepage was refreshed and a number of enhancements and new features were implemented including a new My Profile feature, improvements to My Selection, accessible search history and saved searches, enhanced search functionality, and improved navigation.
Read more →
Alice and Sparkle

Alice and Sparkle is a 2022 illustrated children's book published by American technology product designer Ammaar Reshi. Reshi created the book using artificial intelligence programs ChatGPT and Midjourney in one weekend, which sparked controversy among artists, both in regard to the copyright status of the book and the quality of the illustration and text. == Plot == A girl named Alice discovers a group of magical and benevolent artificial intelligence beings. She knows that artificial intelligence is powerful, and that it has the power to do good and evil depending on how it is used. One day, she creates her own artificial intelligence and names it Sparkle. Sparkle helps Alice with her homework and plays with her, and they quickly become good friends. However, Sparkle soon grows more powerful and begins to make its own decisions, which makes Alice both proud and scared. She knows that it is her responsibility to guide Sparkle to do good, not evil. Together, Alice and Sparkle use their knowledge to make the world a better place and to teach people about the power of artificial intelligence. The two live happily ever after, spreading the magic of artificial intelligence. == Structure == Including the dedication and postscript, the book contains twenty four pages, about half of which being illustrations provided by Midjourney. The very short story, composed of text generated by ChatGPT, contains 343 words. Some of the illustrations are accompanied by descriptions, at least one of which was provided by Reshi. Both Alice's and Sparkle's appearances change significantly between illustrations, although Alice's is more consistent. Reshi said Midjourney was unable to generate consistent images of Sparkle, so he had to include a line in the book saying that it could turn "into all kinds of robot shapes". == Creation == When reading a children's book to his friend's daughter, Ammaar Reshi "decided he wanted to write his own". He had no experience with creative writing or illustration, so instead used the chatbot ChatGPT to write the story for him and used the image generation software Midjourney to illustrate it. On December 4, 2022, 72 hours after having the idea for the book, he published it on Amazon's digital bookstore, and published a paperback version the following day. == Controversy == On December 9, 2022, Reshi made a thread on Twitter about his experience publishing the book, which soon went viral. Reshi received heavy backlash from artists with concerns over the ethics of art generated by artificial intelligence. He also received death threats and messages encouraging self-harm because of his publication. Many writers and illustrators criticized both the creation process and the product itself, claiming that if artificial intelligence programs such as Midjourney are trained on existing illustrations, then the original artists should be financially compensated for derivative works such as Alice and Sparkle. The book was temporarily removed from Amazon in January 2023 because of "suspicious review activity", caused by a high volume of both five-star and one-star reviews.
Read more →
Agents of S.H.I.E.L.D. season 4

The fourth season of the American television series Agents of S.H.I.E.L.D., based on the Marvel Comics spy organization S.H.I.E.L.D., follows Phil Coulson and other S.H.I.E.L.D. agents and allies after the signing of the Sokovia Accords. It is set in the Marvel Cinematic Universe (MCU) and acknowledges the continuity of the franchise's films. The season was produced by ABC Studios, Marvel Television, and Mutant Enemy Productions, with Jed Whedon, Maurissa Tancharoen, and Jeffrey Bell serving as showrunners. Clark Gregg reprises his role as Coulson from the film series, starring alongside the returning series regulars Ming-Na Wen, Chloe Bennet, Iain De Caestecker, Elizabeth Henstridge, and Henry Simmons. They are joined by John Hannah who was promoted from his recurring guest role in the third season. The fourth season was ordered in March 2016, with production taking place from that July until the following April. Due to its broadcast schedule, the season was split into three "pods": Ghost Rider for the first eight episodes, featuring recurring guest star Gabriel Luna as the supernatural Robbie Reyes / Ghost Rider and exploring mysticism in the MCU alongside the film Doctor Strange (2016); LMD, referring to the new Life Model Decoy program, for the next seven episodes which focus on recurring guest star Mallory Jansen as the LMD Aida; and Agents of Hydra for the final seven episodes, partly set in a "what if" virtual reality that allowed the return of former series regular Brett Dalton as Grant Ward. The season is also affected by the events of the film Captain America: Civil War (2016), and continues storylines established in the canceled series Agent Carter. The first episode premiered at a screening on September 19, 2016, with the season then airing for 22 episodes on ABC, from September 20, 2016, until May 16, 2017. The premiere debuted to 3.58 million viewers, down from previous season premieres but average for the series. Critical response to the season was positive, with many feeling that each pod was better than the last and in particular praising the visual effects and tone of Ghost Rider, the writing and acting of LMD, and the character development and political commentary explored during Agents of Hydra. The season saw series low viewership, but was still considered to have solved ABC's problem during its new Tuesday night timeslot, and the series was renewed for a fifth season in May 2017. == Episodes == == Cast and characters == == Production == === Development === Agents of S.H.I.E.L.D. was renewed for a fourth season on March 3, 2016, earlier than usual for the series. Executive producer Jed Whedon said on this, "We're thrilled to know going into the end of [season three] with certainty that we will be returning, because we can build our story accordingly." Executive producer Maurissa Tancharoen also noted that logistics for hiring directors for the season in advance would be easier, "which is a very nice privilege to have...that's a luxury". The end of the episode "What If..." features an onscreen tribute to Bill Paxton, who died in February 2017 and had portrayed John Garrett in the series' first season. The series paid additional tribute to Paxton in "All the Madame's Men" with promos during The Bakshi Report news segment showcasing John Garrett as a fallen American hero. The end of "World's End" features a similar onscreen tribute to Powers Boothe, who died in May 2017 and had portrayed Gideon Malick in the series' third season. === Writing === The season shifted to the later 10 pm timeslot, allowing it to take on a darker, more mature tone than previous seasons. According to Tancharoen, "The whole tagline for this year is 'Agents of S.H.I.E.L.D. After Dark'". The timeslot gave the series the opportunity to present an increased level of violence and partial nudity, as well as take more risks and present edgier themes. Following the third-season finale, Tancharoen stated that the fourth season would explore the guilt Daisy Johnson has over Lincoln Campbell's death. Executive producer Jeffrey Bell noted the writers tried to continue the tradition of "finding new combinations and new conflicts" between different sets of characters, given "a lot of procedurals [see] the same people doing the same thing for five years". Pairings that would be explored included Coulson and Mack, continuing from the end of season three, who have a mutual respect for one another due to their relationships with Daisy, and Leo Fitz and Holden Radcliffe, who work together. The Fitz-Simmons relationship was also explored more, examining the new challenges it presented for the two "working together, loving each other and living together". Following the third season's dealing with the themes of Captain America: Civil War (2016), such as the opposing reactions to the Inhumans, Whedon said that the question of "How do you deal with a war with powered people at that level, a government level?" was one that they wanted to answer in the fourth season. Tancharoen called the Inhumans "a permanent part of our universe now", with Whedon adding, "we have a quick-fire way of introducing people with powers. It gives us a lot of leeway in our world, and it lets us explore the metaphors of what it is like to be different. We will never close that chapter." With the Inhumans film being removed from Marvel Studios' release schedule, the series had "a little more freedom" and were "able to do a little bit more" with the species, including the potential of introducing some of the "classic" Inhumans, though the series would focus less on Inhumans than the third season which saw "a real significant Inhuman agenda story". It was not intended to be a spin-off of Agents of S.H.I.E.L.D. On the evolution of S.H.I.E.L.D. to featuring so many powered characters, Whedon said "the dynamic in the world has changed. There was one person with powers, and then by The Avengers there were maybe six total ... now they're much more prevalent, so there's reaction from the public based on that." The season is structured into three "pods" based on its airing schedule: the first eight episodes, subtitled Ghost Rider; LMD (Life Model Decoy) for the subsequent seven episodes; and a third pod for the final seven episodes called Agents of Hydra. Elements and characters cross over between the different pods, but the sections "definitely have a different feel" from one another, as Bell explained that 22 episodes "is a long time to hold a big bad or a single plot line, especially for an audience", and for the past two seasons, the series was able to have two separated halves that "allows us to introduce a big bad. And then, something happens and we rise somebody new ... Now, there's three of those." "Financial considerations" were also taken into account in creating the pods for the season, as using LMDs does not "cost as much as setting a guy's head on fire via CGI". In terms of writing the "complicated season", Whedon said the writers were "aware that our fans are our fans and have spent some time with these characters and are clever and see things coming sometimes ... Part of our job is to create not just what we are presenting on plot, but letting the audience be one step ahead of us and being one step ahead of that." He added that the writers knew that they wanted to tell a Ghost Rider story, an LMD story, and a "what if" scenario, and the hardest part was making each pod still fit together as a single season. The major connection ultimately became the Darkhold, which leads from the magic of Ghost Rider to the advanced science of LMD and then the Framework in Agents of Hydra. Ghost Rider also reappears in the final episode of the season, "World's End", as an additional connection. ==== Ghost Rider ==== While planning the fourth season, Marvel suggested that the series introduce Ghost Rider, after the character's film rights had returned to Marvel from Sony in May 2013. Loeb felt that this made the season unquestionably "the series' biggest" with the "most ambitious story yet". He added that "one of the things that we talked about is, S.H.I.E.L.D. always looked out for the weird, the unusual, the things that were and could be a problem for the public", and Marvel realized that Ghost Rider's abilities, which are more mystical than anything seen in the series to date, opened up "a quarter of the universe that we haven't really spent a lot of time exploring ... what happens if our very real, our very grounded agents who are very much a family have to take on something that is as bizarre and powerful and unique as Ghost Rider." Bell added that the producers would have been willing to give an entire season of the show to a Ghost Rider arc if the season was 13 episodes or less, but 22 episodes seemed too long to "feel like one flavor". The Robbie Reyes version of Ghost Rider was chosen over other versions of the character from the comics because of his relationship with his brother Gabe, w
Read more →
Really Simple Licensing

Really Simple Licensing (RSL) is an open content licensing standard that allows web publishers to set terms for web crawlers gathering training data for generative AI use. It was launched on September 10, 2025 and is managed by the nonprofit RSL Collective, co-founded by RSS co-creator Eckart Walther and former Ask.com CEO Doug Leeds. Participating companies at launch include Reddit, Yahoo, and Medium. Publishers can implement the RSL standard by adding licensing terms to their robots.txt files.
Read more →
Local coordinates

Local coordinates are the ones used in a local coordinate system or a local coordinate space. Simple examples: Houses. In order to work in a house construction, the measurements are referred to a control arbitrary point that will allow to check it: stick/sticks on the ground, steel bar, nails... Addresses. Using house numbers to locate a house on a street; the street is a local coordinate system within a larger system composed of city townships, states, countries, postal codes, etc. Local systems exist for convenience. On ancient times, every work was made on relative bases as there was no conception of global systems. Practically, it is better to use local systems for small works as houses, buildings... For most of the applications, it is desired the position of one element relative to one building or location, and in a more local way, relative to one furniture or person. In a regular way, you will not give your position by geographical coordinates rather than "I am 15 meters away of the entry to the building". So it is a pretty common way to locate things. It is possible to bring latitude and longitude for all terrestrial locations, but unless one has a highly precise GPS device or you make astronomical observations, this is impractical. It is much simpler to use a tape, a rope, a chain... The position information (global) should be transformed into a location. Position refers to a numeric or symbolic description within a spatial reference system, whereas location refers to information about surrounding objects and their interrelationships. (Topological space) == Use == In computer graphics and computer animation, local coordinate spaces are also useful for their ability to model independently transformable aspects of geometrical scene graphs. When modeling a car, for example, it is desirable to describe the center of each wheel with respect to the car's coordinate system, but then specify the shape of each wheel in separate local spaces centered about these points. This way, the information describing each wheel can be simply duplicated four times, and independent transformations (e.g., steering rotation) can be similarly effected. Bounding volumes of objects may be described more accurately using extents in the local coordinates, (i.e. an object oriented bounding box, contrasted with the simpler axis aligned bounding box). The trade-off for this flexibility is additional computational cost: the rendering system must access the higher-level coordinate system of the car and combine it with the space of each wheel in order to draw everything in its proper place. Local coordinates also afford digital designers a means around the finite limits of numerical representation. The tread marks on a tire, for example, can be described using millimeters by allowing the whole tire to occupy the entire range of numeric precision available. The larger aspects of the car, such as its frame, might be described in centimeters, and the terrain that the car travels on could be specified in meters. In differential topology, local coordinates on a manifold are defined by means of an atlas of charts. The basic idea behind coordinate charts is that each small patch of a manifold can be endowed with a set of local coordinates. These are collected together into an atlas, and stitched together in such a way that they are self-consistent on the manifold. In Cartography and Maps, the traditional way of works are local datum. With a local datum the land can be mapped on relative small areas as a country. With the need of global systems, the transformations on between datum became a problem, so geodetic datum have been created. More than 150 local datum have been used in the world.
Read more →
Content-based image retrieval

Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases (see this survey for a scientific overview of the CBIR field). Content-based image retrieval is opposed to traditional concept-based approaches (see Concept-based image indexing). "Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR is desirable because searches that rely purely on metadata are dependent on annotation quality and completeness. == Comparison with metadata searching == An image meta search requires humans to have manually annotated images by entering keywords or metadata in a large database, which can be time-consuming and may not capture the keywords desired to describe the image. The evaluation of the effectiveness of keyword image search is subjective and has not been well-defined. In the same regard, CBIR systems have similar challenges in defining success. "Keywords also limit the scope of queries to the set of predetermined criteria." and, "having been set up" are less reliable than using the content itself. == History == The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese Electrotechnical Laboratory engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present. Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision. === QBIC - Query By Image Content === The earliest commercial CBIR system was developed by IBM and was called QBIC (Query By Image Content). Recent network- and graph-based approaches have presented a simple and attractive alternative to existing methods. While the storing of multiple images as part of a single entity preceded the term BLOB (Binary Large OBject), the ability to fully search by content, rather than by description, had to await IBM's QBIC. === VisualRank === == Technical progress == The interest in CBIR has grown because of the limitations inherent in metadata-based systems, as well as the large range of possible uses for efficient image retrieval. Textual information about images can be easily searched using existing technology, but this requires humans to manually describe each image in the database. This can be impractical for very large databases or for images that are generated automatically, e.g. those from surveillance cameras. It is also possible to miss images that use different synonyms in their descriptions. Systems based on categorizing images in semantic classes like "cat" as a subclass of "animal" can avoid the miscategorization problem, but will require more effort by a user to find images that might be "cats", but are only classified as an "animal". Many standards have been developed to categorize images, but all still face scaling and miscategorization issues. Initial CBIR systems were developed to search databases based on image color, texture, and shape properties. After these systems were developed, the need for user-friendly interfaces became apparent. Therefore, efforts in the CBIR field started to include human-centered design that tried to meet the needs of the user performing the search. This typically means inclusion of: query methods that may allow descriptive semantics, queries that may involve user feedback, systems that may include machine learning, and systems that may understand user satisfaction levels. == Techniques == Many CBIR systems have been developed, but as of 2006, the problem of retrieving images on the basis of their pixel content remains largely unsolved. Different query techniques and implementations of CBIR make use of different types of user queries. === Query By Example === QBE (Query By Example) is a query technique that involves providing the CBIR system with an example image that it will then base its search upon. The underlying search algorithms may vary depending on the application, but result images should all share common elements with the provided example. Options for providing example images to the system include: A preexisting image may be supplied by the user or chosen from a random set. The user draws a rough approximation of the image they are looking for, for example with blobs of color or general shapes. This query technique removes the difficulties that can arise when trying to describe images with words. === Semantic retrieval === Semantic retrieval starts with a user making a request like "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform - Lincoln may not always be facing the camera or in the same pose. Many CBIR systems therefore generally make use of lower-level features like texture, color, and shape. These features are either used in combination with interfaces that allow easier input of the criteria or with databases that have already been trained to match features (such as faces, fingerprints, or shape matching). However, in general, image retrieval requires human feedback in order to identify higher-level concepts. === Relevance feedback (human interaction) === Combining CBIR search techniques available with the wide range of potential users and their intent can be a difficult task. An aspect of making CBIR successful relies entirely on the ability to understand the user intent. CBIR systems can make use of relevance feedback, where the user progressively refines the search results by marking images in the results as "relevant", "not relevant", or "neutral" to the search query, then repeating the search with the new information. Examples of this type of interface have been developed. === Iterative/machine learning === Machine learning and application of iterative techniques are becoming more common in CBIR. === Other query methods === Other query methods include browsing for example images, navigating customized/hierarchical categories, querying by image region (rather than the entire image), querying by multiple example images, querying by visual sketch, querying by direct specification of image features, and multimodal queries (e.g. combining touch, voice, etc.) == Content comparison using image distance measures == The most common method for comparing two images in content-based image retrieval (typically an example image and an image from the database) is using an image distance measure. An image distance measure compares the similarity of two images in various dimensions such as color, texture, shape, and others. For example, a distance of 0 signifies an exact match with the query, with respect to the dimensions that were considered. As one may intuitively gather, a value greater than 0 indicates various degrees of similarities between the images. Search results then can be sorted based on their distance to the queried image. Many measures of image distance (Similarity Models) have been developed. === Color === Computing distance measures based on color similarity is achieved by computing a color histogram for each image that identifies the proportion of pixels within an image holding specific values. Examining images based on the colors they contain is one of the most widely used techniques because it can be completed without regard to image size or orientation. However, research has also attempted to segment color proportion by region and by spatial relationship among several color regions. === Texture === Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels which are then placed into a number of sets, depending on how many textures are detected in the image. These sets not only define the texture, but also where in the image the texture is located. Texture is a difficult concept to represent. The identification of specific textures in an image is achieved primarily by modeling texture as a two-dimensional gray level variation. The relative brightness of pairs of pixels is computed such that degree of contrast, regularity, coarseness and directionality may be estimated. The problem is in identifying patterns of co-pixel variation and associating them with particular classes of textures such as silky, or rough. Other methods of classifying textures include: Co-occurrence matrix Laws texture energy Wavelet transform Orthogonal transforms (discrete Chebyshev moments) =
Read more →
Ensemble averaging (machine learning)

In machine learning, ensemble averaging is the process of creating multiple models (typically artificial neural networks) and combining them to produce a desired output, as opposed to creating just one model. Ensembles of models often outperform individual models, as the various errors of the ensemble constituents "average out". == Overview == Ensemble averaging is one of the simplest types of committee machines. Along with boosting, it is one of the two major types of static committee machines. In contrast to standard neural network design, in which many networks are generated but only one is kept, ensemble averaging keeps the less satisfactory networks, but with less weight assigned to their outputs. The theory of ensemble averaging relies on two properties of artificial neural networks: In any network, the bias can be reduced at the cost of increased variance In a group of networks, the variance can be reduced at no cost to the bias. This is known as the bias–variance tradeoff. Ensemble averaging creates a group of networks, each with low bias and high variance, and combines them to form a new network which should theoretically exhibit low bias and low variance. Hence, this can be thought of as a resolution of the bias–variance tradeoff. The idea of combining experts can be traced back to Pierre-Simon Laplace. == Method == The theory mentioned above gives an obvious strategy: create a set of experts with low bias and high variance, and average them. Generally, what this means is to create a set of experts with varying parameters; frequently, these are the initial synaptic weights of a neural network, although other factors (such as learning rate, momentum, etc.) may also be varied. Some authors recommend against varying weight decay and early stopping. The steps are therefore: Generate N experts, each with their own initial parameters (these values are usually sampled randomly from a distribution) Train each expert separately Combine the experts and average their values. Alternatively, domain knowledge may be used to generate several classes of experts. An expert from each class is trained, and then combined. A more complex version of ensemble average views the final result not as a mere average of all the experts, but rather as a weighted sum. If each expert is y i {\displaystyle y_{i}} , then the overall result y ~ {\displaystyle {\tilde {y}}} can be defined as: y ~ ( x ; α ) = ∑ j = 1 p α j y j ( x ) {\displaystyle {\tilde {y}}(\mathbf {x} ;\mathbf {\alpha } )=\sum _{j=1}^{p}\alpha _{j}y_{j}(\mathbf {x} )} where α {\displaystyle \mathbf {\alpha } } is a set of weights. The optimization problem of finding alpha is readily solved through neural networks, hence a "meta-network" where each "neuron" is in fact an entire neural network can be trained, and the synaptic weights of the final network is the weight applied to each expert. This is known as a linear combination of experts. It can be seen that most forms of neural network are some subset of a linear combination: the standard neural net (where only one expert is used) is simply a linear combination with all α j = 0 {\displaystyle \alpha _{j}=0} and one α k = 1 {\displaystyle \alpha _{k}=1} . A raw average is where all α j {\displaystyle \alpha _{j}} are equal to some constant value, namely one over the total number of experts. A more recent ensemble averaging method is negative correlation learning, proposed by Y. Liu and X. Yao. This method has been widely used in evolutionary computing. == Benefits == The resulting committee is almost always less complex than a single network that would achieve the same level of performance The resulting committee can be trained more easily on smaller datasets The resulting committee often has improved performance over any single model The risk of overfitting is lessened, as there are fewer parameters (e.g. neural network weights) which need to be set.
Read more →
India AI Impact Summit 2026

The India AI Impact Summit 2026 (also abbreviated as the AI Impact Summit) was an international summit on artificial intelligence held at Bharat Mandapam, New Delhi, India, from 16 to 21 February 2026. It is the fourth in a series of global AI summits following the Bletchley Park AI Safety Summit in 2023, the AI Seoul Summit in 2024, and the AI Action Summit in Paris in 2025. Organised under the IndiaAI Mission by the Ministry of Electronics and Information Technology, it is the first summit in the series to be hosted by a Global South nation. This series of AI summits will continue with the AI Summit in Geneva to be hosted by Switzerland in 2027. The summit was inaugurated by Prime Minister Narendra Modi on 19 February 2026. The opening ceremony was also addressed by French President Emmanuel Macron and United Nations Secretary-General António Guterres. The summit was attended by over 20 heads of state and a delegation of global technology leaders including Sundar Pichai (Google), Sam Altman (OpenAI), and Demis Hassabis (DeepMind). The event faced criticism for organisational issues, misrepresentation of non-Indian products as Indian, and a perceived focus on trade fair activities over substantive governance. == Background == The AI Impact Summit was an international summit on artificial intelligence (AI) held in New Delhi from 16 to 20 February 2026. It followed the AI Action Summit in Paris in February 2025, the AI Seoul Summit in 2024 and the Bletchley Park AI Safety Summit in 2023. According to Crowell & Moring, the changing summit titles seemed to reflect a broader shift in focus away from AI safety and governance toward practical impact, implementation, and measurable outcomes. Ahead of the summit, an international panel of experts published the second International AI Safety Report. The summit was structured around three foundational pillars, termed "Sutras": People, Planet, and Progress. Seven thematic working groups were established to deliver outcomes across these pillars, covering AI for economic growth and social good; democratising AI resources; inclusion for social empowerment; safe and trusted AI; human capital; science; and resilience, innovation, and efficiency. == Programme == The summit ran over five days, later extended to six following overwhelming public response. Originally scheduled to conclude on 20 February, the event was extended to 21 February with expanded evening hours for the exhibition. === India AI Impact Expo === The India AI Impact Expo, inaugurated by Prime Minister Modi on 16 February, featured over 300 exhibitors from 30 countries across more than 10 thematic pavilions. Pavilions were organised across thematic zones aligned with the summit's three pillars, showcasing AI applications in healthcare, agriculture, education, and sustainable industry. === Leaders' Plenary and CEO Roundtable === The Leaders' Plenary on 19 February brought together heads of state, ministers, and representatives from multilateral institutions to outline national and global priorities on AI governance, infrastructure, and international cooperation. A CEO Roundtable, held the same evening, convened senior executives from global technology and industry firms with government leaders to discuss investment, research collaboration, and deployment of AI systems. === Research Symposium === A Research Symposium on AI and its Impact was held on 18 February, with the IIIT Hyderabad as knowledge partner. Discussions covered sovereign AI infrastructure, global adoption challenges, research breakthroughs, and policy priorities. == Participants == The summit drew delegations from over 100 countries, including more than 20 heads of state and 60 ministers. Notable attendees from the technology industry included Sundar Pichai (Google), Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (Google DeepMind), and Mukesh Ambani (Reliance Industries). Representatives from multilateral institutions included Sangbu Kim of the World Bank. == Announcements and outcomes == === Indian AI models === Several Indian AI models and products were unveiled during the summit. Sarvam AI, an Indian AI laboratory, launched a new generation of large language models, including 30-billion and 105-billion parameter models using a mixture of experts architecture, as well as text-to-speech, speech-to-text, and vision models. Sarvam also introduced the Kaze smartglasses, described as the company's first hardware product, which Prime Minister Modi tested at the expo. The government-backed BharatGen Param2 model, a 17-billion parameter model supporting 22 Indian languages with multimodal capabilities, was also launched at the summit. === Infrastructure commitments === Union Minister Ashwini Vaishnaw outlined India's "whole-of-nation" AI strategy, describing plans to build a "frugal, sovereign and scalable" AI ecosystem. The government announced plans to add more than 20,000 GPUs to India's existing base of 38,000 under the IndiaAI Compute Portal. Microsoft announced at the summit that it was on track to invest US$50 billion by the end of the decade to bring AI to lower-income countries. Goa reaffirmed its commitment to artificial intelligence at the India AI Impact Summit 2026. === Guinness World Record === During the summit, India set a Guinness World Record for the most pledges received for an AI responsibility campaign in 24 hours, with 250,946 valid pledges collected between 16 and 17 February 2026. The campaign, conducted in partnership with Intel India as part of the IndiaAI Mission, exceeded its initial target of 5,000 pledges. == Controversies and criticisms == === Galgotias University incident === On 18 February, Galgotias University faced widespread criticism after a representative presented a robot dog at the university's exhibition pavilion as an indigenous development. Social media users identified the robot as the Unitree Go2, a commercially available product manufactured by Chinese company Unitree Robotics. IT Secretary S. Krishnan stated that the government did not want exhibitors to showcase items that were not their own, and the university was directed to vacate its stall. Galgotias University issued an apology, stating that the representative had been "ill-informed" and was not authorised to speak to the press. The incident drew political reactions, with the Indian National Congress using it to criticise the government. The controversy was amplified after Union IT Minister Ashwini Vaishnaw had earlier shared a video clip of the robot on social media, which was subsequently deleted. === Organisational issues === On day 1 of the Summit, Dhananjay Yadav, a Bengaluru-based entrepreneur had alleged that his product was stolen in the Summit. He called it as a pain for the people in an X post. He further wrote, "Think about this: We paid for flights, accommodation, logistics and even the booth. Only to see our wearables disappear inside a high-security zone". Later, the stolen devices were recovered by The Delhi Police. Bloomberg reported that delegates were left stranded without food or water during a security lockdown ahead of the Prime Minister's visit on 19 February. The summit venue was closed to the public on 19 February for the Prime Minister's visit, leading to criticism from attendees who had registered for that day. === Protests by the Indian Youth Congress (IYC) === On 20 February, some members of the Indian Youth Congress (IYC) carried out protests inside the venue with slogans such as "PM is compromised" and the criticism of the recent trade deal between India and the US. 4 of these members were sent to police custody by the court on 22 February. While Bharatiya Janta Party condemned these protests, with its spokesperson Shehzad Poonawalla saying, "From being anti-BJP, you have gone to being anti-national? If you have a problem with the BJP, then protest at the BJP office, Jantar Mantar, or outside the PM's office. But the people of the country and their alliance partners condemn them for their attempt to defame India in front of the entire world at the AI Summit." Congress leader Harish Rawat defended the protests, saying "it's also a fact that AI might become a tool in the hands of a few individuals… It's the opposition's job to warn against that… It's not the first time such international events have been opposed. I know how the BJP protested during the Commonwealth Games… To say that such opposition has happened for the first time is not correct. The BJP has been doing this while in the opposition." These protestors were granted bail by the Delhi high court on 2 March. == Reception and analysis == Bloomberg News reported that Prime Minister Modi used the summit to assert India's global AI ambitions following a challenging year in foreign policy. TechPolicy.Press published several critical analyses of the summit. One article argued that the summit's structure granted "multinational corporations parity with sovereign governments
Read more →
Neuro-symbolic AI

Neuro-symbolic AI is a subfield of artificial intelligence that integrates neural methods (e.g., neural networks and deep learning) with symbolic methods (e.g., formal logic, knowledge representation, and automated reasoning). The goal is to combine the strengths of both approaches, resulting in AI systems that can be trained from raw data and demonstrate robustness against outliers or errors in the base data, while preserving explainability, explicit use of expert knowledge, and explicit cognitive reasoning. As argued by Leslie Valiant and others, the effective construction of rich computational cognitive models demands the combination of symbolic reasoning and efficient machine learning. Gary Marcus argued, "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much of useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only known machinery that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation." Angelo Dalli, Henry Kautz, Francesca Rossi, and Bart Selman also argued for such a synthesis. Their arguments attempt to address the two kinds of thinking, as discussed in Daniel Kahneman's book Thinking, Fast and Slow. It describes cognition as encompassing two components: System 1 is fast, reflexive, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is used for pattern recognition. System 2 handles planning, deduction, and deliberative thinking. In this view, deep learning best handles the first kind of cognition, while symbolic reasoning best handles the second kind. Both are necessary for the development of a robust and reliable AI system capable of learning, reasoning, and interacting with humans to accept advice and answer questions. Since the 1990s, dual-process models with explicit references to the two contrasting systems have been the focus of research in both the fields of AI and cognitive science by numerous researchers. In 2025, the adoption of neurosymbolic AI, an approach that integrates neural networks with symbolic reasoning, increased in response to the need to address hallucination issues in large language models. For example, Amazon implemented Neurosymbolic AI in its Vulcan warehouse robots and Rufus shopping assistant to enhance accuracy and decision-making. == Approaches == Approaches for integration are diverse. Henry Kautz's taxonomy of neuro-symbolic architectures follows, along with some examples: Symbolic Neural symbolic is the current approach of many neural models in natural language processing, where words or subword tokens are the ultimate input and output of large language models. Examples include BERT, RoBERTa, and GPT-3. Symbolic[Neural] is exemplified by AlphaGo, where symbolic techniques are used to invoke neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions. Neural | Symbolic uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. Neural-Concept Learner is an example. Neural: Symbolic → Neural relies on symbolic reasoning to generate or label training data that is subsequently learned by a deep learning model, e.g., to train a neural model for symbolic computation by using a Macsyma-like symbolic mathematics system to create or label examples. NeuralSymbolic uses a neural net that is generated from symbolic rules. An example is the Neural Theorem Prover, which constructs a neural network from an AND-OR proof tree generated from knowledge base rules and terms. Logic Tensor Networks also fall into this category. Neural[Symbolic] according to Kautz, this approach embeds true symbolic reasoning inside a neural network. These are tightly-coupled neural-symbolic systems, in which the logical inference rules are internal to the neural network. This way, the neural network internally computes the inference from the premises and learns to reason based on logical inference systems. Early work on connectionist modal and temporal logics by Garcez, Lamb, and Gabbay is aligned with this approach. These categories are not exhaustive, as they do not consider multi-agent systems. In 2005, Bader and Hitzler presented a more fine-grained categorization that took into account, e.g., whether the use of symbols included logic and, if so, whether the logic was propositional or first-order logic. The 2005 categorization and Kautz's taxonomy above are compared and contrasted in a 2021 article. Sepp Hochreiter argued that Graph Neural Networks "...are the predominant models of neural-symbolic computing" since "[t]hey describe the properties of molecules, simulate social networks, or predict future states in physical and engineering applications with particle-particle interactions." == Artificial general intelligence == Gary Marcus argues that "...hybrid architectures that combine learning and symbol manipulation are necessary for robust intelligence, but not sufficient", and that there are ...four cognitive prerequisites for building robust artificial intelligence: hybrid architectures that combine large-scale learning with the representational and computational powers of symbol manipulation, large-scale knowledge bases—likely leveraging innate frameworks—that incorporate symbolic knowledge along with other forms of knowledge, reasoning mechanisms capable of leveraging those knowledge bases in tractable ways, and rich cognitive models that work together with those mechanisms and knowledge bases. This echoes earlier calls for hybrid models as early as the 1990s. == History == Garcez and Lamb described research in this area as ongoing, at least since the 1990s. During that period, the terms symbolic and sub-symbolic AI were popular. A series of workshops on neuro-symbolic AI has been held annually since 2005 Neuro-Symbolic Artificial Intelligence. In the early 1990s, an initial set of workshops on this topic were organized. == Research == Key research questions remain, such as: What is the best way to integrate neural and symbolic architectures? How should symbolic structures be represented within neural networks and extracted from them? How should common-sense knowledge be learned and reasoned about? How can abstract knowledge that is hard to encode logically be handled? == Implementations == Implementations of neuro-symbolic approaches include: AllegroGraph: an integrated Knowledge Graph based platform for neuro-symbolic application development. Scallop: a language based on Datalog that supports differentiable logical and relational reasoning. Scallop can be integrated in Python and with a PyTorch learning module. Logic Tensor Networks: encode logical formulas as neural networks and simultaneously learn term encodings, term weights, and formula weights. DeepProbLog: combines neural networks with the probabilistic reasoning of ProbLog. Abductive Learning: integrates machine learning and logical reasoning in a balanced-loop via abductive reasoning, enabling them to work together in a mutually beneficial way. SymbolicAI: a compositional differentiable programming library.
Read more →
Whisper (speech recognition system)

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been showed to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. At around the 2010s, deep neural network approaches became more common for speech recognition models, which were enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited due to their inability to capture sequential data, which later led to developments of Seq2seq approaches, which include recurrent neural networks, which made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range in machine learning, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). If the token is not present, then the decoder predicts timestamps relative to the segment, and quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|>, and <|endoftranscript|> . Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If this predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with batch size 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. == Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts that were generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. Another application is when OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultral heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to make sure accuracy, formatting, and accessibility are all standard.
Read more →
Deepfake

Deepfakes (a portmanteau of 'deep learning' and 'fake') are images, videos, or audio that have been edited or generated using artificial intelligence, AI-based tools or audio-video editing software. They may depict real or fictional people and are considered a form of synthetic media, that is media that is usually created by artificial intelligence systems by combining various media elements into a new media artifact. While the act of creating fake content is not new, deepfakes uniquely leverage machine learning and artificial intelligence techniques, including facial recognition algorithms and artificial neural networks such as variational autoencoders and generative adversarial networks (GANs). In turn, the field of image forensics has worked to develop techniques to detect manipulated images. Deepfakes have garnered widespread attention for their potential use in creating child sexual abuse material, celebrity pornographic videos, revenge porn, fake news, hoaxes, bullying, and financial fraud. Academics have raised concerns about the potential for deepfakes to promote disinformation and hate speech, as well as interfere with elections. In response, the information technology industry and governments have proposed recommendations and methods to detect and mitigate their use. Academic research has also delved deeper into the factors driving deepfake engagement online as well as potential countermeasures to malicious application of deepfakes. From traditional entertainment to gaming, deepfake technology has evolved to be increasingly convincing and available to the public, allowing for the disruption of the entertainment and media industries. == History == Photo manipulation was developed in the 19th century and soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with the advent of digital video. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently, the methods have been adopted by industry. The development of generative adversarial networks (GANs) in the mid-2010s represented a key technical turning point in the evolution of deepfakes. GANs allowed for the creation of highly realistic fake images and videos by training competing neural networks, achieving a much improved visual fidelity over previous methods of creating the content using rules or by using autoencoders, and formed the basis for modern deepfake methods. === Academic research === Academic research related to deepfakes is split between the field of computer vision, a sub-field of computer science, which develops techniques for creating and identifying deepfakes, and humanities and social science approaches that study the social, ethical, aesthetic implications as well as journalistic and informational implications of deepfakes. As deepfakes have risen in prominence in popularity with innovations provided by AI tools, significant research has gone into detection methods and defining the factors driving engagement with deepfakes on the internet. Deepfakes have been shown to appear on social media platforms and other parts of the internet for purposes ranging from entertainment and education related to deepfakes to misinformation to elicit strong reactions. There are gaps in research related to the propagation of deepfakes on social media. Negativity and emotional response are the primary driving factors for users sharing deepfakes. === Social science and humanities approaches to deepfakes === In cinema studies, deepfakes illustrate how "the human face is emerging as a central object of ambivalence in the digital age". Video artists have used deepfakes to "playfully rewrite film history by retrofitting canonical cinema with new star performers". Film scholar Christopher Holliday analyses how altering the gender and race of performers in familiar movie scenes destabilizes gender classifications and categories. The concept of "queering" deepfakes is also discussed in Oliver M. Gingrich's discussion of media artworks that use deepfakes to reframe gender, including British artist Jake Elwes' Zizi: Queering the Dataset, an artwork that uses deepfakes of drag queens to intentionally play with gender. The aesthetic potentials of deepfakes are also beginning to be explored. Theatre historian John Fletcher notes that early demonstrations of deepfakes are presented as performances, and situates these in the context of theater, discussing "some of the more troubling paradigm shifts" that deepfakes represent as a performance genre. While most English-language academic studies of deepfakes focus on the Western anxieties about disinformation and pornography, digital anthropologist Gabriele de Seta has analyzed the Chinese reception of deepfakes, which are known as huanlian, which translates to "changing faces". The Chinese term does not contain the "fake" of the English deepfake, and de Seta argues that this cultural context may explain why the Chinese response has centered on practical regulatory measures to "fraud risks, image rights, economic profit, and ethical imbalances". === Computer science research on deepfakes === A landmark early project was the "Video Rewrite" program, published in 1997. The program modified existing video footage of a person speaking to depict that person mouthing the words from a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face. Contemporary academic projects have focused on creating more realistic videos and improving deepfake techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as a main research contribution to its photorealistic technique for synthesizing mouth shapes from audio. The "Face2Face" program, published in 2016, modifies video footage of a person's face to depict them mimicking another person's facial expressions. The project highlights its primary research contribution as the development of the first method for re-enacting facial expressions in real time using a camera that does not capture depth, enabling the technique to work with common consumer cameras. Researchers have also shown that deepfakes are expanding into other domains such as medical imagery. In this work, it was shown how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a White hat penetration test. A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation: Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. This challenge is to minimize the amount of training data and the time to train the model required to produce quality images and to enable the execution of trained models on new identities (unseen during training). Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings. Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections. Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artifacts can occur. A common occlusion is a closed mouth which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting. Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, the interference is diminishing. Overall, deepfakes are expected to have several implications in media and society, med
Read more →
IJCAI Award for Research Excellence

The IJCAI Award for Research Excellence is a biannual award before given at the IJCAI conference to researcher in artificial intelligence as a recognition of excellence of their career. Beginning in 2016, the conference is held annually and so is the award. == Laureates == The recipients of this award have been: John McCarthy (1985) Allen Newell (1989) Marvin Minsky (1991) Raymond Reiter (1993) Herbert A. Simon (1995) Aravind Joshi (1997) Judea Pearl (1999) Donald Michie (2001) Nils Nilsson (2003) Geoffrey E. Hinton (2005) Alan Bundy (2007) Victor R. Lesser (2009) Robert Kowalski (2011) Hector Levesque (2013) Barbara Grosz (2015) for her pioneering research in Natural Language Processing and in theories and applications of Multiagent Collaboration. Michael I. Jordan (2016) for his groundbreaking and impactful research in both the theory and application of statistical machine learning. Andrew Barto (2017) for his pioneering work in the theory of reinforcement learning. Jitendra Malik (2018) Yoav Shoham (2019) Eugene Freuder (2020) Richard S. Sutton (2021) Stuart J. Russell (2022) Sarit Kraus (2023) for her pioneering work of the study of interactions among self-interested agents, creating the field of automated negotiation, and developing methods for coalition formation and teamwork, both as formal models and real-world implementations. == Winners of also Turing Award == John McCarthy (1971) Allen Newell (1975) Marvin Minsky (1969) Herbert A. Simon (1975) Judea Pearl (2011) Geoffrey Hinton (2018) Andrew Barto (2024) Richard S. Sutton (2024)
Read more →
Open-source robotics

Open-source robotics is a branch of robotics where robots are developed with open-source hardware and free and open-source software, publicly sharing blueprints, schematics, and source code. It is thus closely related to the open design movement, the maker movement and open science. == Requirements == Open source robotics means that information about the hardware is easily discerned, so that others can easily rebuild it. In turn, this requires design to use only easily available standard subcomponents and tools, and for the build process to be documented in detail including a bill of materials and detailed ('Ikea style') step-by-step building and testing instructions. (A CAD file alone is not sufficient, as it does not show the steps for performing or testing the build). These requirements are standard to open source hardware in general, and are formalised by various licences, certifications, especially those defined by the peer-reviewed journals Journal of Open Hardware and HardwareX. Licensing requirements for software are the same as for any open source software. But in addition, for software components to be of practical use in real robot systems, they need to be compatible with other software, usually as defined by some robotics middleware community standard. == Hardware systems == Applications to date include: Robot arms, e.g. PARA or Thor Wheeled mobile robots. e.g. OpenScout Four-legged robots such as the Open Dynamic Robot Initiative UAV quadcopters (drones) such as Agilicious Humanoid robots, e.g. iCub, Berkeley Humanoid Lite Self-driving cars, e.g. OpenPodcar (→ Personal rapid transit) Submersible robots, eg. OpenFish Laboratory robotics such as chemical liquid handling Vertical farming Swarm robots, e.g. HeRoSwarm Domestic tasks: vacuum cleaning, floor washing and grass mowing Robot sports including robot combat and autonomous racing Education == Hardware subcomponents == Most open source hardware definitions allow non-open subcomponents to be used in modular design, as long as they are easily available. However many designs try to push openness down into as many subcomponents as possible, with the aim of ultimately reaching fully open designs. Open hardware manual-drive vehicles and their subcomponents, such as from Open Source Ecology, are often used as starting points and extended with automation systems. Open subcomponents can include open-source computing hardware as subcomponents, such as Arduino and RISC-V, as well as open source motors and drivers such as the Open Source Motor Controller and ODrive. Open hardware robotics interface boards can simplify interfacing between middleware software and physical hardware. == Software subcomponents == === Middleware === Robotics middleware is software which links multiple other software components together. In robotics, this specifically means real-time communication systems with standardized message passing protocols. The predominant open source middleware is ROS2, the robot operating system, now as version 2. Other alternatives include ROS1, YARP — used in the iCub, URBI, and Orca. Open source middleware is usually run on an open source operating system, especially the Ubuntu distribution of Linux. === Driver software === Most robot sensors and actuators require software drivers. There is little standardization of open source software at this level, because each hardware device is different. Creating open drivers for closed hardware is difficult as it requires both low level programming and reverse engineering. === Simulation software === Open source robotics simulators include Gazebo, MuJoCo and Webots. Open source 3D game engines such as Godot are also sometimes used as simulators, when equipped with suitable middleware interfaces. === Automation software === At the level of AI, many standard algorithms have open source software implementations, mostly in ROS2. Major components include: Machine vision systems such as the YOLO object detector. 3D photogrammetry Navigation including SLAM and planning such as nav2 Arm inverse kinematics such as moveIt2 == Community == The first signs of the increasing popularity of building and sharing robot designs were found with the maker culture community. What began with small competitions for remote operated vehicles (e.g. Robot combat), soon developed to the building of autonomous telepresence robots such as Sparky and then true robots (being able to take decisions themselves) as the Open Automaton Project. Several commercial companies now also produce kits for making simple robots. The community has adopted open source hardware licenses, certifications, and peer-reviewed publications, which check that source has been made correctly and permanently available under community definitions, and which validate that this has been done. These processes have become critically important due to many historical projects claiming to be open source but them reverting on the promise due to commercialisation or other pressures. As with other forms of open source hardware, the community continues to debate precise criteria for 'ease of build'. A common standard is that designs should be buildable by a technical university student, in a few days, using typical fablab tools, but definitions of all of these subterms can also be debated. Compared to other forms of open source hardware, open source robotics typically includes a large software element, so involves software as well as hardware engineers. Open source concepts are more established in open source software than hardware, so robotics is a field in which those concepts can be shared and transferred from software to hardware. While the community in open source robotics is multi-faceted with a wide range of backgrounds, a sizable sub-community uses the ROS middleware and meets at the ROSCon conferences to discuss development of ROS itself and automation components built on it.
Read more →
Clinical decision support system

A clinical decision support system (CDSS) is a form of health information technology that provides clinicians, staff, patients, or other individuals with knowledge and person-specific information to enhance decision-making in clinical workflows. CDSS tools include alerts and reminders, clinical guidelines, condition-specific order sets, patient data summaries, diagnostic support, and context-aware reference information. They often leverage artificial intelligence to analyze clinical data and help improve care quality and safety. CDSSs constitute a major topic in artificial intelligence in medicine. == Characteristics == A clinical decision support system is an active knowledge system that uses variables of patient data to produce advice regarding health care. This implies that a CDSS is simply a decision support system focused on using knowledge management. === Purpose === The main purpose of modern CDSS is to assist clinicians at the point of care. This means that clinicians interact with a CDSS to help to analyze and reach a diagnosis based on patient data for different diseases. In the early days, CDSSs were conceived to make decisions for the clinician in a literal manner. The clinician would input the information and wait for the CDSS to output the "right" choice, and the clinician would simply act on that output. However, the modern methodology of using CDSSs to assist means that the clinician interacts with the CDSS, utilizing both their knowledge and the CDSS's, better to analyse the patient's data than either a human or a CDSS could do on their own. Typically, a CDSS makes suggestions for the clinician to review, and the clinician is expected to pick out useful information from the presented results and discount erroneous CDSS suggestions. The two main types of CDSS are knowledge-based systems and non-knowledge-based (machine learning–based) systems: An example of how a clinician might use a clinical decision support system is a diagnosis decision support system (DDSS). DDSS requests some of the patient's data and, in response, proposes a set of possible diagnoses. The physician then takes the output of the DDSS and determines which diagnoses are likely and which are not, and, if necessary, orders further tests to narrow down the diagnosis. Another example of a CDSS would be a case-based reasoning (CBR) system. A CBR system might use previous case data to help determine the appropriate amount of beams and the optimal beam angles for use in radiotherapy for brain cancer patients; medical physicists and oncologists would then review the recommended treatment plan to determine its viability. Another important classification of a CDSS is based on the timing of its use. Physicians use these systems at the point of care to help them as they are dealing with a patient, with the timing of use being either pre-diagnosis, during diagnosis, or post-diagnosis. Pre-diagnosis CDSS systems help the physician prepare the diagnoses. CDSSs help review and filter the physician's preliminary diagnostic choices to improve outcomes. Post-diagnosis CDSS systems are used to mine data to derive connections between patients and their past medical history and clinical research to predict future events. Early speculation that AI-based decision support would replace clinicians in common tasks has largely given way to a consensus around assistive models, in which AI augments rather than supplants clinical judgment. Contemporary deep learning-based systems, unlike earlier rule-based tools, can be trained directly on clinical data without manual rule authoring and integrated into electronic health record workflows at the point of care. Another approach, used by the National Health Service in England, is to use a CDSS to triage medical conditions out of hours by suggesting a suitable next step to the patient (e.g. call an ambulance, or see a general practitioner on the next working day). The suggestion, which may be disregarded by either the patient or the phone operative if common sense or caution suggests otherwise, is based on the known information and an implicit conclusion about what the worst-case diagnosis is likely to be; it is not always revealed to the patient because it might well be incorrect and is not based on a medically-trained person's opinion - it is only used for initial triage purposes. === Knowledge-based === Most CDSSs consist of three parts: the knowledge base, an inference engine, and a mechanism to communicate. The knowledge base contains the rules and associations of compiled data which most often take the form of IF-THEN rules. If this was a system for determining drug interactions, then a rule might be that IF drug X is taken AND drug Y is taken THEN alert the user. Using another interface, an advanced user could edit the knowledge base to keep it up to date with new drugs. The inference engine combines the rules from the knowledge base with the patient's data. The communication mechanism allows the system to show the results to the user as well as have input into the system. An expression language such as GELLO or CQL (Clinical Quality Language) is needed for expressing knowledge artefacts in a computable manner. For example: if a patient has diabetes mellitus, and if the last haemoglobin A1c test result was less than 7%, recommend re-testing if it has been over six months, but if the last test result was greater than or equal to 7%, then recommend re-testing if it has been over three months. The current focus of the HL7 CDS WG is to build on the Clinical Quality Language (CQL). The U.S. Centers for Medicare & Medicaid Services (CMS) has announced that it plans to use CQL for the specification of Electronic Clinical Quality Measures (eCQMs). === Non-knowledge-based === CDSSs which do not use a knowledge base use a form of artificial intelligence called machine learning, which allow computers to learn from past experiences and/or find patterns in clinical data. This eliminates the need for writing rules and expert input. However, since systems based on machine learning cannot explain the reasons for their conclusions, most clinicians do not use them directly for diagnoses, reliability and accountability reasons. Nevertheless, they can be useful as post-diagnostic systems, for suggesting patterns for clinicians to look into in more depth. As of 2012, three types of non-knowledge-based systems are support-vector machines, artificial neural networks and genetic algorithms. Artificial neural networks use nodes and weighted connections between them to analyse the patterns found in patient data to derive associations between symptoms and a diagnosis. Genetic algorithms are based on simplified evolutionary processes using directed selection to achieve optimal CDSS results. The selection algorithms evaluate components of random sets of solutions to a problem. The solutions that come out on top are then recombined and mutated and run through the process again. This happens over and over until the proper solution is discovered. They are functionally similar to neural networks in that they are also "black boxes" that attempt to derive knowledge from patient data. Non-knowledge-based networks often focus on a narrow list of symptoms, such as symptoms for a single disease, as opposed to the knowledge-based approach, which covers the diagnosis of many diseases. An example of a non-knowledge-based CDSS is a web server developed using a support vector machine for the prediction of gestational diabetes in Ireland. == Regulations == === History, United States === The IOM had published a report in 1999, To Err is Human, which focused on the patient safety crisis in the United States, pointing to the incredibly high number of deaths. This statistic attracted great attention to the quality of patient care. The Institute of Medicine (IOM) promoted the usage of health information technology, including clinical decision support systems, to advance the quality of patient care. With the enactment of the American Recovery and Reinvestment Act of 2009 (ARRA), there was a push for widespread adoption of health information technology through the Health Information Technology for Economic and Clinical Health Act (HITECH). Through these initiatives, more hospitals and clinics were integrating electronic medical records (EMRs) and computerized physician order entry (CPOE) within their health information processing and storage. Despite the absence of laws, the CDSS vendors would almost certainly be viewed as having a legal duty of care to both the patients who may adversely be affected due to CDSS usage and the clinicians who may use the technology for patient care. However, duties of care legal regulations are not explicitly defined yet. With the enactment of the HITECH Act included in the ARRA, encouraging the adoption of health IT, more detailed case laws for CDSS and EMRs were still being defined by the Office of National Coordinator for Health Informati
Read more →
CADE ATP System Competition

The CADE ATP System Competition (CASC) is an annual competition of fully automated theorem provers for classical logic. == Competition == CASC is associated with the Conference on Automated Deduction and the International Joint Conference on Automated Reasoning organized by the Association for Automated Reasoning. It has inspired similar competition in related fields, in particular the successful SMT-COMP competition for satisfiability modulo theories, the SAT Competition for propositional reasoners, and the modal logic reasoning competition. The first CASC, CASC-13, was held as part of the 13th Conference on Automated Deduction at Rutgers University, New Brunswick, NJ, in 1996. Among the systems competing were Otter and SETHEO.
Read more →