Voice user interface

A voice user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface. Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants (which route phone calls to the correct extension) and interactive voice response systems (which conduct more complicated transactions over the phone) can respond to the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow callers to speak requests and responses without having to press any buttons. Newer voice command devices are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation. == Overview == A VUI is the interface to any speech application. Not long ago, controlling a machine simply by talking to it was possible only in science fiction. Until recently, this area was considered to be artificial intelligence. However, advances in technologies like text-to-speech, speech-to-text, natural language processing, and cloud services contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations. VUIs rely on the ability to process input reliably; inconsistent performance often leads to decreased user engagement and negative feedback.
Designing a good VUI requires interdisciplinary talents of computer science, linguistics and human factors such as psychology. Even with advanced development tools, constructing an effective VUI requires an understanding of both the tasks to be performed and the target audience that will use the final system. The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction. A VUI designed for the general public should emphasize ease of use and provide ample help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers) should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate "mixed initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated. Not all business processes lend themselves equally well to speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the flip side, speech is perfect for handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts. == History == Early applications for VUI included voice-activated dialing of phones, either directly or through a (typically Bluetooth) headset or vehicle audio system.
In 2007, a CNN business article reported that voice command was over a billion dollar industry and that companies like Google and Apple were trying to create speech recognition features. In the years since the article was published, the world has witnessed a variety of voice command devices. Additionally, Google has created a speech recognition engine called Pico TTS and Apple released Siri. Voice command devices are becoming more widely available, and innovative ways for using the human voice are always being created. For example, Business Week suggests that the future remote controller is going to be the human voice. Currently Xbox Live allows such features and Jobs hinted at such a feature on the new Apple TV. == Voice command software products on computing devices == Both Apple Mac and Windows PC provide built in speech recognition features for their latest operating systems. === Microsoft Windows === Two Microsoft operating systems, Windows 7 and Windows Vista, provide speech recognition capabilities. Microsoft integrated voice commands into their operating systems to provide a mechanism for people who want to limit their use of the mouse and keyboard, but still want to maintain or increase their overall productivity. ==== Windows Vista ==== With Windows Vista voice control, a user may dictate documents and emails in mainstream applications, start and switch between applications, control the operating system, format documents, save documents, edit files, efficiently correct errors, and fill out forms on the Web. The speech recognition software learns automatically every time a user uses it, and speech recognition is available in English (U.S.), English (U.K.), German (Germany), French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified). In addition, the software comes with an interactive tutorial, which can be used to train both the user and the speech recognition engine. 
==== Windows 7 ==== In addition to all the features provided in Windows Vista, Windows 7 provides a wizard for setting up the microphone and a tutorial on how to use the feature. === Mac OS X === All Mac OS X computers come pre-installed with the speech recognition software. The software is user-independent, and it allows a user to "navigate menus and enter keyboard shortcuts; speak checkbox names, radio button names, list items, and button names; and open, close, control, and switch among applications." However, the Apple website recommends a user buy a commercial product called Dictate. === Commercial products === If a user is not satisfied with the built-in speech recognition software, or does not have built-in speech recognition software for their OS, then the user may experiment with a commercial product such as Braina Pro or Dragon NaturallySpeaking for Windows PCs, and Dictate, the name of the same software for Mac OS. == Voice command mobile devices == Any mobile device running Android OS, Microsoft Windows Phone, iOS 9 or later, or BlackBerry OS provides voice command capabilities. In addition to the built-in speech recognition software for each mobile phone's operating system, a user may download third-party voice command applications from each operating system's application store: Apple App Store, Google Play, Windows Phone Marketplace (initially Windows Marketplace for Mobile), or BlackBerry App World. === Android OS === Google has developed an open source operating system called Android, which allows a user to perform voice commands such as: send text messages, listen to music, get directions, call businesses, call contacts, send email, view a map, go to websites, write a note, and search Google. The speech recognition software is available for all devices since Android 2.2 "Froyo", but the settings must be set to English.
Google allows for the user to change the language, and the user is prompted when he or she first uses the speech recognition feature if he or she would like their voice data to be attached to their Google account. If a user decides to opt into this service, it allows Google to train the software to the user's voice. Google introduced the Google Assistant with Android 7.0 "Nougat". It is much more advanced than the older version. Amazon.com has the Echo that uses Amazon's custom version of Android to provide a voice interface. === Microsoft Windows === Windows Phone is Microsoft's mobile device's operating system. On Windows Phone 7.5, the speech app is user independent and can be used to: call someone from your contact list, call any phone number, redial the last number, send a text message, call your voice mail, open an application, read appointments, query phone status, and search the web. In addition, speech can also be used during a phone call, and the following actions are possible during a phone call: press a number, turn the speaker phone on, or call someone, which puts the current call on hold. Windows 10 introduces Cortana, a voice control system that replaces the formerly used voice control on Windows

Read the Docs

Read the Docs is an open-sourced free software documentation hosting platform. It generates documentation written with the Sphinx documentation generator, MkDocs, or Jupyter Book. == History == The site was created in 2010 by Eric Holscher, Bobby Grace, and Charles Leifer. On March 9, 2011, the Python Software Foundation Board awarded a grant of US$840 to the Read the Docs project for one year of hosting fees. On November 13, 2017, the Linux Mint project announced that they were moving their documentation to Read the Docs. In 2020, Read the Docs received a $200,000 grant from the Chan Zuckerberg Initiative. For 2021, Read the Docs reported 700 million page views and 196 million unique visitors. In 2013, a "Write the Docs" conference for Read the Docs users was launched, which has since turned into a generic software-documentation community. As of 2024, it continues to hold annual global conferences, organize local meetups, and maintain a Slack channel for "people who care about documentation."

International Olympiad in Artificial Intelligence

The International Olympiad in Artificial Intelligence (IOAI) is an annual International Science Olympiad in the field of artificial intelligence (AI) for secondary education students under the age of 20. The first IOAI was held in Burgas, Bulgaria, in 2024. Each country or territory may send up to two teams, each consisting of up to four students supported by one leader. Participants are selected through a multi-stage National Olympiad in Artificial Intelligence (NOAI) and/or a Regional Olympiad such as the NAOAI or APOAI. Participants at the IOAI compete on an individual basis. As of 2025, there were 61 countries and territories participating in the IOAI. Three hundred students participated in IOAI 2025. As of 2026, 130 countries and territories are accredited for participation in the IOAI. == Competition Structure == The IOAI consists of three contests: the Individual Contest, the Team Challenge, and the GAITE contest. Medals are awarded based solely on the Individual Contest. === Individual Contest === The Individual Contest is the main competition of the IOAI in which contestants compete individually on separate computers and are not permitted to communicate during the contest. Medals are awarded solely on the basis of the total score from the two-day Individual Contest. The Individual Contest consists of two on-site contest days (six hours per day), preceded by an at-home practice round and an on-site practice session. In IOAI 2025, three at-home problems were released for preparation approximately one month before the on-site contest. Results from this at-home round do not affect final results. The first on-site contest day (Individual Contest 1) comprises three tasks as extensions and continuations of the at-home tasks, while the second day (Individual Contest 2) comprises two or three tasks which are novel and different from the at-home tasks. 
The Individual Contest tasks span various AI domains such as machine learning, natural language processing, and computer vision. The IOAI 2025 contest rules describe tasks as requiring typical machine-learning workflows, including writing code, fitting models on training data, and running inference on test data, using identical local machines and GPU resources (minimum 24 GB RAM). Tasks, datasets, and submissions are handled through a contest platform (Bohrium), including a web-based Jupyter notebook environment for GPU access. Internet access is restricted to a whitelist of documentation sites and an integrated compact large language model accessible within the platform. The use of external APIs is prohibited unless a task explicitly allows them. In IOAI 2025, each contest task was scored up to 100 points and could include multiple subtasks. Scores are normalized using a baseline solution and a maximum score derived from either a Scientific Committee solution or the best contestant submission. Contestants can view only their own scores during the contest; a live scoreboard may be available publicly outside the contest hall, but contestants are not permitted to view it during the contest. For non-English-speaking teams, the IOAI holds a translation session beginning three hours before each contest day in which team leaders review and may amend machine-translated task statements; translations must match the English original and are published after the contest. The IOAI committee also enforces quarantine restrictions during these translation sessions, during which neither contestants nor team leaders may use cell phones, laptops, or other communication devices. === Team Challenge === The Team Challenge is a team-based component of the IOAI. The results of this part do not affect the distribution of medals.
The IOAI 2025 rules describe it as a “creative and AI-oriented challenge” in which a team's contestants sit together and cooperate, with the format varying by year. In IOAI 2024, teams worked with existing AI image and video generation tools to produce a visual result. In IOAI 2025, teams were assigned to program a robot to complete various tasks. === GAITE Contest === The GAITE (Global AI Talent Empowerment) contest is a simplified version of the individual contest with a separate scoreboard, where participants may ask for hints. It is designed for countries and territories with a limited history in International Science Olympiads, and it awards alternative prizes instead of medals. == Awards Distribution == The top 50% of the participants in the individual contest receive gold, silver and bronze medals in a ratio of 1:2:3, respectively. The top three individuals receive honorary trophies. As in other International Science Olympiads, if an individual is in the top 50% on one of the days, but does not receive a medal, they receive an honorary mention during the awards ceremony. The GAITE contest uses similar cutoff logic, but its participants receive alternative prizes instead of medals. The top three teams in the Team Challenge receive trophies. == National selection and regional competitions == National delegations are selected through country-level qualification processes referred to as National Olympiads in Artificial Intelligence (NOAI) or equivalent, which are widely known for their low success rates. Although the total number of participants worldwide is not published, available data indicate exceptionally competitive national pools; for example, Brazil reports over 716,000 competitors, while Russia reports more than 72,000. In addition, Regional Olympiads (for example, APOAI or NAOAI) provide continent-level competition and preparation platforms in most regions.
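As a worked example of the 1:2:3 medal ratio described under Awards Distribution, the following minimal sketch splits the top half of a field into gold, silver and bronze counts. The rounding rules (rounding the 50% cutoff down and giving any remainder to bronze) and the function name `medal_counts` are assumptions made for illustration; the source does not state how the IOAI handles rounding or ties.

```python
def medal_counts(participants: int) -> dict[str, int]:
    """Split the top half of the field into gold:silver:bronze = 1:2:3."""
    medals = participants // 2        # assumption: round the 50% cutoff down
    gold = medals // 6                # 1 part out of 1 + 2 + 3 = 6
    silver = medals // 3              # 2 parts out of 6
    bronze = medals - gold - silver   # assumption: bronze absorbs any remainder
    return {"gold": gold, "silver": silver, "bronze": bronze}


# With the 300 students reported for IOAI 2025, half the field is 150 medals.
print(medal_counts(300))  # {'gold': 25, 'silver': 50, 'bronze': 75}
```

Under these assumptions, roughly one participant in twelve receives gold, one in six silver, and one in four bronze.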
=== National Selection (National Olympiads in Artificial Intelligence) === Participating countries and territories select their students for the IOAI through a National Olympiad in Artificial Intelligence (NOAI) or an equivalent process. The names of these selection processes differ by country, but almost all of them (excluding newer countries participating in the GAITE contest) have in common that the process comprises multiple and/or extremely rigorous selection stages. United States / Canada – The USA–North America AI Olympiad (USAAIO) is a three-round process including an invitational in-person round and a subsequent selection camp, after which a national delegation is selected for IOAI. Russia – The Russian Olympiad in Artificial Intelligence is organized as a multi-stage process (training, qualification, main round, final). Organizers reported 72,316 registrations for the training round and 52,260 registrations for the qualifying round in one season, with tasks spanning mathematics, algorithms/programming, and machine learning; 977 students were disqualified following plagiarism checks. Japan – Japan's national selection consists of multiple stages, beginning with the Japan Olympiad in Artificial Intelligence (JOAI), a large-scale Kaggle-style competition. High-performing participants advance through additional assessment stages, including written solution reports and technical interviews. From this process, eight students are selected for the APOAI team, with four ultimately chosen to represent Japan at the IOAI. Brazil – Brazil's National Olympiad in Artificial Intelligence (ONIA) is conducted as a large competition which consists of progressive rounds of evaluation. It identifies 28 top students from over 716,000 competitors, four of which are selected for the IOAI. The competition is held in four phases across two cycles, including a two-step third phase and a final training-and-evaluation phase that selects a four-student national team. 
Singapore – Singapore's national Olympiad consists of two rounds: an online preliminary round (300 MCQs in 3 hours) selects the top 150 performers to advance to the final assessment, which includes both theory questions and Python programming tasks. Additional training and selection may follow the finals for top performers. Poland – The Polish AI Olympiad adopts a two-stage structure: an open online first stage (at-home tasks) and a second-stage competitive camp with 30 selected participants competing for a four-person IOAI team. France – The Olympiades Françaises d'Intelligence Artificielle (OFIA), organized by France-IOI, follow a three-stage structure consisting of an open online qualification round, a second selection round, and a multi-day national training camp and final in Paris. Bangladesh – The Bangladesh AI Olympiad (BdAIO) selects competitors in three rounds: the online preliminary round, the national finals, and the team selection camp. In 2025, 406 participants competed in the national finals. Norway – The Norwegian AI Olympiad (NOKI) is a three-stage selection system; however, unlike other countries, its first two rounds are shared with the Norwegian Informatics Olympiad. The national Olympiad reports 1,180 participants in the first round. Hong Kong – The national Olympiad reported more than 800 preliminary-round entrants, narrowing through multiple rounds to 25 finalists, with a subsequent

Argument Interchange Format

The Argument Interchange Format (AIF) is an international effort to develop a representational mechanism for exchanging argument resources between research groups, tools, and domains using a semantically rich language. AIF traces its history back to a 2005 colloquium in Budapest. The result of the work in Budapest was first published as a draft description in 2006. Building on this foundation, further work then used the AIF to build foundations for the Argument Web. AIF-RDF is the extended ontology represented in the Resource Description Framework Schema (RDFS) semantic language. The Argument Interchange Format introduces a small set of ontological concepts that aim to capture a common understanding of argument, one that works in multiple domains (both domains of argumentation and also domains of academic research), so that data can be shared and re-used across different projects in different areas. These ontological concepts are: Information (I-nodes); Applications of Rules of Inference (RA-nodes); Applications of Rules of Conflict (CA-nodes); and Applications of Rules of Preference (PA-nodes). They are extended by Schematic Forms (F-nodes), which are instantiated by RA, CA and PA nodes. The AIF has reifications in a variety of development environments and implementation languages, including MySQL database schema, RDF, Prolog, and JSON, as well as translations to visual languages such as DOT and SVG. AIF data can be accessed online at AIFdb.
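The node types listed above can be illustrated with a small argument graph serialized to JSON. This is a minimal sketch and not the official AIF JSON schema: the class names, the field names (`node_id`, `kind`, `text`, `from`, `to`) and the example sentences are hypothetical. It also assumes the commonly described AIF constraint that two I-nodes are never linked directly, only through a scheme-application node such as an RA-node.

```python
import json
from dataclasses import dataclass, field, asdict

# Node kinds named in the text: information plus three kinds of rule application.
NODE_KINDS = {"I", "RA", "CA", "PA"}


@dataclass
class Node:
    node_id: str   # hypothetical field name
    kind: str      # one of NODE_KINDS
    text: str


@dataclass
class ArgumentGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes.append(Node(node_id, kind, text))

    def _kind_of(self, node_id: str) -> str:
        for node in self.nodes:
            if node.node_id == node_id:
                return node.kind
        raise KeyError(node_id)

    def add_edge(self, source: str, target: str) -> None:
        # Assumed constraint: information nodes connect only via RA/CA/PA nodes.
        if self._kind_of(source) == "I" and self._kind_of(target) == "I":
            raise ValueError("I-nodes must be linked through an RA, CA or PA node")
        self.edges.append({"from": source, "to": target})

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


graph = ArgumentGraph()
graph.add_node("n1", "I", "Socrates is a man")
graph.add_node("n2", "RA", "Default inference")
graph.add_node("n3", "I", "Socrates is mortal")
graph.add_edge("n1", "n2")   # premise feeds the inference application
graph.add_edge("n2", "n3")   # inference application supports the conclusion
print(graph.to_json())
```

Keeping the inference itself as a node, instead of a bare edge between two statements, is what lets a conflict (CA) or preference (PA) node later target the inference step as well as the statements.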

The Old Axolotl

The Old Axolotl (Polish: Starość aksolotla) is a 2015 digital-only novel by Polish science-fiction author Jacek Dukaj. The novel was released in Polish on March 10, 2015, and shortly afterward, on March 24 that year, in English (translated by Stanley Bill). It has been described as "an experiment in reading (and creating) the electronic literature of the future". It is Dukaj's first novel to be published in English, though several of his short stories (The Golden Galley, 1996, The Iron General, 2010, The Apocrypha of Lem, 2011) have been translated prior to this. The novel has inspired two Netflix original series: the 2020 Belgian Into the Night, and its 2022 Turkish language spin-off Yakamoz S-245. == Plot == The novel presents a post-apocalyptic, cyberpunk vision of Earth where biological life has been wiped out, inhabited by robots and mechs, many of which are humans whose consciousness has been digitized in the wake of an extinction event. == Significance and analysis == The novel is an example of electronic literature, available only in digital formats, and has no traditional paper version. It was designed from the beginning not only to incorporate more traditional elements such as illustrations, but also hypertext, and 3D-printable models of main robotic characters designed by Alex Jaeger, the art director of Transformers films. The novel composition is layered, with the narrative layer, an encyclopedic/hyperlinked footnote layer, and a multimedia layer, including illustrations and a short promotional video by the Oscar-nominated Platige Image studio. One of the novel's central questions is: "What does it mean to be human?" Other subjects include post humanism and other "staples of cyberpunk and related genres, such as the artificial intelligence". The novel is representative of Dukaj's prose, posing philosophical questions about the future of man and technology. 
The author explained that: "stories such as The Old Axolotl that model an ‘escape from the body’ are born out of a sense of progress as a process of ‘de-animalising’ human beings through science. This has its origin in the pre-Enlightenment intuition of ‘liberation from nature’. For one of the last shackles of nature is corporeality itself, the limitations of our physicality." The other major element of the novel is Dukaj's attempts to introduce the reader to the new style of electronic literature. The novel was nominated for the 2016 Janusz A. Zajdel Award.

Joseph Stanislaus Ostoja-Kotkowski

Joseph Stanislaus Ostoja-Kotkowski AM, FRSA (also known as J. S. Ostoja-Kotkowski, Ostoja and Stan Ostoja-Kotkowski; 28 December 1922 – 2 April 1994) was best known for his ground-breaking work in chromasonics, laser kinetics and 'sound and image' productions. He earned recognition in Australia and overseas for his pioneering work in laser sound and image technology. His work included painting (instrumental in developing geometric art in Australia), photography, film-making, theatre design, fabric design, murals, kinetic and static sculpture, stained glass, vitreous enamel murals, op-collages, computer graphics, and laser art. Ostoja flourished between 1940 and 1994. Ostoja's films are still being exhibited. == Biography == Joseph Stanislaus Ostoja-Kotkowski was born in Golub, Poland, on 28 December 1922, descending from an old noble family that was part of the Clan of Ostoja. He studied drawing under Olgierd Vetesco in Przasnysz from 1940-1945. After winning a scholarship, he completed his studies at the Düsseldorf Academy of Fine Arts in Germany in 1949. In 1950 Ostoja migrated to Australia, arriving in Melbourne where he supported himself with work as a labourer. He enrolled at the Victorian School of Fine Arts National Gallery School under Alan Sumner and William Dargie 1950-1955 and there introduced the new abstract expression of Europe both to lecturers and students. He settled in the Adelaide Hills, South Australia, on the Booth estate at Stirling, living under the patronage of the Booth family for over 40 years (Freya Booth, the wife of Edward Stirling Booth, was a daughter of the artist Sir Hans Heysen). His first one-man exhibition was also in South Australia at the Royal Society of Arts, Adelaide. In 1956 Ostoja met and collaborated with Ian Davidson in the production of the short film Five South Australian Artists, and became involved in stage and theatre set design. 
He co-produced several experimental films again with Ian Davidson, including The Quest of Time in 1957. Ostoja's work in abstract expression began to receive accolades. He won the Cornell Prize for the canvas Form in Landscape. He started to design sets for theatre and dance including for Six Characters in Search of an Author by Luigi Pirandello (1957); the South Australian production of Samuel Beckett's Waiting for Godot (1958); Gaetano Donizetti's Elixir of Love, with novel light settings and modulations, for the Elder Conservatorium of the University of Adelaide which used his techniques for their Opera Workshops (1959); for The Egg; and for two performances of the South Australian Ballet Theatre with light/colour abstract presentations (1959). 1960 This year he designed sets for a new opera group which would eventually grow into the South Australian Opera Company. Among other theatrical events, he designed and executed the scenery for Moon on a Rainbow Shawl by Errol John, and The Teahouse of the August Moon by John Patrick (a production by the University of Adelaide Theatre Guild). He received artistic satisfaction but little financial reward for these efforts. In this year also, he staged a visual production on the theme of Orpheus, using dance, music and voice with several projectors. This was the first attempt at quadraphonic sound in Australia, working in collaboration with Derek Jolly, who provided the sound and projection equipment. It was also the first demonstration of "Chromasonics" - the science of translating sound into visual images. Ostoja then designed innovative "abstracted" scenery for a production of The Marriage of Figaro and Benjamin Britten's The Turn of the Screw. 1961 Ostoja designed the sets for the controversial South Australian production of Patrick White's The Ham Funeral - also Alan Seymour's Swamp Creatures, both performed by the University of Adelaide Theatre Guild.
He designed and constructed six stained glass windows for the Refectory at the University of Adelaide. In this period Ostoja designed special lights and gauzes for difficult effects required in an ambitious production of the opera Don Carlos by the Opera Workshop, for the Elder Conservatorium. 1962 Ostoja designed and built sets for the production of J.B. by Archibald MacLeish, for the second Adelaide Festival of Arts. He exhibited vitreous enamel works in Melbourne's Argus Gallery. Max Harris, in The Bulletin of 20 October 1962, praised Ostoja's sets for My Cousin from Fiji in Union Theatre, Adelaide, and his technique of rear screen projections as later adopted throughout Australia. 1963 Ostoja continued to develop Multi-Image projections, demonstrating for the first time in Australia the concept later to be known as 'audio-visuals'. Ostoja gave Sir Herbert Read, the art critic, a personal viewing of one of his visual presentations. At Christmas, in the Elder Conservatorium, collaborating again with Derek Jolly, Ostoja gave what was probably the world's first "visual concert", using special projectors and incorporating music, colours and shapes. 1964 With fellow Adelaide artist John Dallwitz, Ostoja co-designed the first of several experimental dance and stage productions in the Adelaide Festival of Arts, Sound and Image. The production featured Adelaide dancer Elizabeth Cameron Dalman. Also for the Adelaide Festival of Arts of that year, he designed the largest light mosaic ever staged up to that time, upon the facade of an 11-storey building. Ostoja was invited to New Zealand, and exhibited the first electronically generated images in Australia in Melbourne, at the Argus Gallery. His design for the 50-foot (15 m) bas-relief mural for the new B.P. building in Melbourne was the subject of a film which won the "Blue Ribbon" Award in the American Film Festival in New York.
1965 Ostoja designed and made the first light kinetic mural in Australia, and continued to evolve theatrical works using multi-screen and Multi-projector techniques. The Production of Jean Genet's The Balcony was very controversial. With Elizabeth Dalman, Ostoja produced new dance forms for Melbourne Television. He introduced Op Art to Australia, both at South Yarra Gallery in Melbourne, and Gallery A in Sydney. 1966 With John Dallwitz, Ostoja was invited by the Adelaide Festival of Arts to present more experimental theatre, Sound and image 1966. This highly acclaimed production incorporated Australian poetry into the sound, electronic music, and visual images and featured the dancer Antonio Rodrigues. The architect Robin Boyd commissioned Ostoja to design two large Op murals for the Australian Pavilion entrance at the Expo 67. Ostoja was awarded a Churchill Fellowship, which enabled him to have extensive world travel, comparing art and technology in many countries. He began to work with language, contemporary poetry and prose, and computers. 1967 John Dallwitz and Ostoja presented Sound and Image at the Festival of Perth. In Berne, Switzerland, Ostoja received the "Excellence F.I.A.P." Award for innovative photography. 1968 At the Adelaide Festival of Arts, Ostoja and John Dallwitz collaborated again to stage Sound and Image. This was the first theatre production in the world to use a laser beam. It also included the first science fiction play (The Veldt by Ray Bradbury) performed in Australia. Ostoja's theatre methods were increasingly attracting the attention of critics to how plays were staged. "Chromasonics", developed and introduced by Ostoja, was now being used extensively in the entertainment industry. 1969 Ostoja staged Krzysztof Penderecki's St. Luke Passion, a controversial, contemporary religious work. The South Australian The Advertiser wrote an extensive critique of Ostoja's work. 
Robin Boyd commissioned Ostoja to build a "Chromasonic" exhibit located in the Space Tube at the Australian Pavilion for Expo '70 in Osaka. 1970 Ostoja presented an Australian Aboriginal Dreamtime theme in his "Sound and Image" theatre, working with leading contemporary figures in poetry, music and dance. This was the first production of its kind in Australia, and appeared after the Festival in Melbourne, Sydney, Canberra and Perth. Ostoja's Space Scape mural, sixty feet long by ten feet high, won the Australia-wide competition for a mural for Adelaide Airport. His 120-foot (37 m) high 'light and sound' structure for the Adelaide Festival was the first of its kind in the world. 1971 Ostoja was awarded a Creative Arts Fellowship at the Australian National University, Canberra. His 18-month stay resulted in the design and building of a "Chromasonics unit-laser", a 100-foot (30 m) Chromasonic tower, and a world premiere of a Synchronos concert. 1972 With Don Burrows and Don Banks, Ostoja presented Synchronos 72, where one could "hear the colours and see the sounds". Ostoja added Cymatics, developed during the Fellowship, to his workshop repertoire. He was invited to exhibit his photography in the National Gallery, Melbourne. 1973 Ostoja received a Fellowship from the Australian American Education Associatio

Riffusion

Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. The resulting music has been described as "de otro mundo" (otherworldly), although unlikely to replace man-made music. The model was made available on December 15, 2022, with the code also freely available on GitHub. The first version of Riffusion was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms, resulting in a model which used text prompts to generate image files which could then be put through an inverse Fourier transform and converted into audio files. While these files were only several seconds long, the model could also use latent space between outputs to interpolate different files together (using the img2img capabilities of SD). It was one of many models derived from Stable Diffusion. In December 2022, Mubert similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on their own text-to-music generator called MusicLM. Forsgren and Martiros formed a startup, also called Riffusion, and raised $4 million in venture capital funding in October 2023.
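The step from a spectrogram image back to sound can be illustrated with a toy inverse Fourier transform. This is a minimal sketch, not Riffusion's actual pipeline: it treats each spectrogram column as a list of magnitudes for frequency bins 0 to N/2, assumes zero phase, and joins frames end to end without overlap. A practical system has to estimate the phase that an image does not store and blend overlapping frames. The function names and the example spectrogram are hypothetical.

```python
import cmath
import math


def frame_from_magnitudes(mags):
    """Inverse DFT of one spectrogram column (bins 0..N/2), assuming zero phase."""
    half = len(mags) - 1
    n = 2 * half  # number of audio samples produced per column
    # Rebuild the full spectrum: a real signal's upper half mirrors the lower half.
    spectrum = [complex(m, 0.0) for m in mags]
    spectrum += [spectrum[n - k].conjugate() for k in range(half + 1, n)]
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectrogram_to_audio(columns):
    """Concatenate the inverse-transformed frames (simplification: no overlap)."""
    audio = []
    for mags in columns:
        audio.extend(frame_from_magnitudes(mags))
    return audio


# Two columns of a tiny "image": energy in bin 1, then in bin 2 (a higher pitch).
spectrogram = [[0, 4, 0, 0, 0], [0, 0, 4, 0, 0]]
audio = spectrogram_to_audio(spectrogram)
print(len(audio))  # 16 samples: 8 per column
```

Each column here becomes a short cosine whose frequency matches the bright bin, which is the sense in which an image of sound can be read back as audio.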