AI Chat Picture

AI Chat Picture — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Saliency map

In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or an otherwise opaque ML model. For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map. == Application == === Overview === Saliency maps have applications in a variety of different problems. Some general applications: ==== Human eye ==== Image and video compression: The human eye focuses only on a small region of interest in the frame. Therefore, it is not necessary to compress the entire frame with uniform quality. According to the authors, using a salience map reduces the final size of the video with the same visual perception. Image and video quality assessment: The main task for an image or video quality metric is a high correlation with user opinions. Differences in salient regions are given more importance and thus contribute more to the quality score. Image retargeting: It aims at resizing an image by expanding or shrinking the noninformative regions. Therefore, retargeting algorithms rely on the availability of saliency maps that accurately estimate all the salient image details. Object detection and recognition: Instead of applying a computationally complex algorithm to the whole image, we can use it to the most salient regions of an image most likely to contain an object. the primary visual cortex (V1) appears to be responsible for the saliency map, according to the V1 Saliency Hypothesis. ==== Explainable artificial intelligence ==== Saliency maps are a prominent tool in explainable artificial intelligence, providing visual explanations of the decision-making process of machine learning models, particularly deep neural networks. These maps highlight the regions in input data that are most influential on the model's output, effectively indicating where the model is "looking" when making a prediction. In image classification tasks, for example, saliency maps can identify pixels or regions that contribute most to a specific class decision. Developed for convolutional neural networks, saliency mapping techniques range from simply taking the gradient of the class score with respect to the input data to more complex algorithms, such as integrated gradients and class activation mapping. In transformer architecture, attention mechanisms led to analogous saliency maps, such as attention maps, attention rollouts, and class-discriminative attention maps. === Saliency as a segmentation problem === Saliency estimation may be viewed as an instance of image segmentation. In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics. == Algorithms == === Overview === There are three forms of classic saliency estimation algorithms implemented in OpenCV: Static saliency: Relies on image features and statistics to localize the regions of interest of an image. Motion saliency: Relies on motion in a video, detected by optical flow. Objects that move are considered salient. Objectness: Objectness reflects how likely an image window covers an object. These algorithms generate a set of bounding boxes of where an object may lie in an image. In addition to classic approaches, neural-network-based are also popular. There are examples of neural networks for motion saliency estimation: TASED-Net: It consists of two building blocks. First, the encoder network extracts low-resolution spatiotemporal features, and then the following prediction network decodes the spatially encoded features while aggregating all the temporal information. STRA-Net: It emphasizes two essential issues. First, spatiotemporal features integrated via appearance and optical flow coupling, and then multi-scale saliency learned via attention mechanism. STAViS: It combines spatiotemporal visual and auditory information. This approach employs a single network that learns to localize sound sources and to fuse the two saliencies to obtain a final saliency map. There's a new static saliency in the literature with name visual distortion sensitivity. It is based on the idea that the true edges, i.e. object contours, are more salient than the other complex textured regions. It detects edges in a different way from the classic edge detection algorithms. It uses a fairly small threshold for the gradient magnitudes to consider the mere presence of the gradients. So, it obtains 4 binary maps for vertical, horizontal and two diagonal directions. The morphological closing and opening are applied to the binary images to close the small gaps. To clear the blob-like shapes, it utilizes the distance transform. After all, the connected pixel groups are individual edges (or contours). A threshold of size of connected pixel set is used to determine whether an image block contains a perceivable edge (salient region) or not. === Example implementation === First, we should calculate the distance of each pixel to the rest of pixels in the same frame: S A L S ( I k ) = ∑ i = 1 N | I k − I i | {\displaystyle \mathrm {SALS} (I_{k})=\sum _{i=1}^{N}|I_{k}-I_{i}|} I i {\displaystyle I_{i}} is the value of pixel i {\displaystyle i} , in the range of [0,255]. The following equation is the expanded form of this equation. SALS(Ik) = |Ik - I1| + |Ik - I2| + ... + |Ik - IN| Where N is the total number of pixels in the current frame. Then we can further restructure our formula. We put the value that has same I together. SALS(Ik) = Σ Fn × |Ik - In| Where Fn is the frequency of In. And the value of n belongs to [0,255]. The frequencies is expressed in the form of histogram, and the computational time of histogram is ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity. ==== Time complexity ==== This saliency map algorithm has ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity. Since the computational time of histogram is ⁠ O ( N ) {\displaystyle O(N)} ⁠ time complexity which N is the number of pixel's number of a frame. Besides, the minus part and multiply part of this equation need 256 times operation. Consequently, the time complexity of this algorithm is ⁠ O ( N + 256 ) {\displaystyle O(N+256)} ⁠ which equals to ⁠ O ( N ) {\displaystyle O(N)} ⁠. ==== Pseudocode ==== All of the following code is pseudo MATLAB code. First, read data from video sequences. After we read data, we do superpixel process to each frame. Spnum1 and Spnum2 represent the pixel number of current frame and previous pixel. Then we calculate the color distance of each pixel, this process we call it contract function. After this two process, we will get a saliency map, and then store all of these maps into a new FileFolder. ==== Difference in algorithms ==== The major difference between function one and two is the difference of contract function. If spnum1 and spnum2 both represent the current frame's pixel number, then this contract function is for the first saliency function. If spnum1 is the current frame's pixel number and spnum2 represent the previous frame's pixel number, then this contract function is for second saliency function. If we use the second contract function which using the pixel of the same frame to get center distance to get a saliency map, then we apply this saliency function to each frame and use current frame's saliency map minus previous frame's saliency map to get a new image which is the new saliency result of the third saliency function. == Datasets == The saliency dataset usually contains human eye movements on some image sequences. It is valuable for new saliency algorithm creation or benchmarking the existing one. The most valuable dataset parameters are spatial resolution, size, and eye-tracking equipment. Here is part of the large datasets table from MIT/Tübingen Saliency Benchmark datasets, for example. To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited. Observers must have normal or corrected to normal vision and must be at the same distance from the screen. At the beginning of each recording session, the eye-tracker recalibrates. To do this, the observer fixates their gaze on the screen center. The session is then started, and saliency data are collected by showing sequences and recording eye gazes. The eye-tracking device is a high-speed camera, capable of recording eye movements at least 250 fr
Read more →
List of Tesla Autopilot crashes

Tesla Autopilot, a Level 2 advanced driver assistance system (ADAS), was released in October 2015 and the first fatal crashes involving the system occurred less than one year later. The fatal crashes attracted attention from news publications and United States government agencies, including the National Transportation Safety Board (NTSB) and National Highway Traffic Safety Administration (NHTSA), which has argued the Tesla Autopilot death rate is higher than the reported estimates. In addition to fatal crashes, there have been many nonfatal ones. Causes behind the incidents include the ADAS failing to recognize other vehicles, insufficient Autopilot driver engagement, and violating the operational design domain. As of October 2025, there have been hundreds of nonfatal incidents involving versions of Autopilot and sixty-five reported fatalities, fifty-four of which NHTSA investigations or expert testimony later verified and two that NHTSA's Office of Defect Investigations determined as happening during the engagement of Full Self-Driving (FSD) after 2022. Collectively, these cases culminated in a general recall in December 2023 of all vehicles equipped with Autopilot, which Tesla claims it resolved by an over-the-air software update. Immediately after closing its investigation in April 2024, NHTSA opened a recall query to determine the effectiveness of the recall. == Notable fatal crashes == === Handan, Hebei, China (January 20, 2016) === On January 20, 2016, Gao Yaning, the driver of a Tesla Model S in Handan, Hebei, China, was killed when his car crashed into a stationary truck. The Tesla was following a car in the far left lane of a multi-lane highway; the car in front moved to the right lane to avoid a truck stopped on the left shoulder, and the Tesla, which the driver's father believes was in Autopilot mode, did not slow before colliding with the stopped truck. According to footage captured by a dashboard camera, the stationary street sweeper on the left side of the expressway partially extended into the far left lane, and the driver did not appear to respond to the unexpected obstacle. Initially, Yaning was held responsible for the collision by local traffic police and, in September 2016, his family filed a lawsuit in July against the Tesla dealer who sold the car. The family's lawyer stated the suit was intended "to let the public know that self-driving technology has some defects. We are hoping Tesla when marketing its products, will be more cautious. Do not just use self-driving as a selling point for young people." Tesla released a statement which said they "have no way of knowing whether or not Autopilot was engaged at the time of the crash" since the car telemetry could not be retrieved remotely due to damage caused by the crash. In 2018, the lawsuit was stalled because telemetry was recorded locally to a SD card and was not able to be given to Tesla, who provided a decoding key to a third party for independent review. Tesla stated that "while the third-party appraisal is not yet complete, we have no reason to believe that Autopilot on this vehicle ever functioned other than as designed." Chinese media later reported that the family sent the information from that card to Tesla, which admitted Autopilot was engaged two minutes before the crash. Tesla since then removed the term "Autopilot" from its Chinese website. === Williston, Florida, US (May 7, 2016) === On May 7, 2016, Tesla driver Joshua Brown was killed in a crash with an 18-wheel tractor-trailer in Williston, Florida. By late June 2016, the NHTSA opened a formal investigation into the fatal autonomous accident, working with the Florida Highway Patrol. According to the NHTSA, preliminary reports indicate the crash occurred when the tractor-trailer made a left turn in front of the 2015 Tesla Model S at an intersection on a non-controlled access highway, and the car failed to apply the brakes. The car continued to travel after passing under the truck's trailer. The Tesla was eastbound in the rightmost lane of US 27, and the westbound tractor-trailer was turning left at the intersection with NE 140th Court, approximately 1 mi (1.6 km) west of Williston; the posted speed limit is 65 mph (105 km/h). The diagnostic log of the Tesla indicated it was traveling at a speed of 74 mi/h (119 km/h) when it collided with and traveled under the trailer, which was not equipped with a side underrun protection system. A reconstruction of the accident estimated the driver would have had approximately 10.4 seconds to detect the truck and take evasive action. The underride collision sheared off the Tesla's greenhouse, destroying everything above the beltline, and caused fatal injuries to the driver. In the approximately nine seconds after colliding with the trailer, the Tesla traveled another 886.5 feet (270.2 m) and came to rest after colliding with two chain-link fences and a utility pole. The NHTSA's preliminary evaluation was opened to examine the design and performance of any automated driving systems in use at the time of the crash, which involves a population of an estimated 25,000 Model S cars. On July 8, 2016, the NHTSA requested Tesla Inc. to hand over to the agency detailed information about the design, operation and testing of its Autopilot technology. The agency also requested details of all design changes and updates to Autopilot since its introduction, and Tesla's planned updates scheduled for the next four months. According to Tesla, "neither autopilot nor the driver noticed the white side of the tractor-trailer against a brightly lit sky, so the brake was not applied." The car attempted to drive full speed under the trailer, "with the bottom of the trailer impacting the windshield of the Model S". Tesla also stated that this was Tesla's first known Autopilot-related death in over 130 million miles (208 million km) driven by its customers while Autopilot was activated. According to Tesla there is a fatality every 94 million miles (150 million km) among all type of vehicles in the U.S. It is estimated that billions of miles will need to be traveled before Tesla Autopilot can claim to be safer than humans with statistical significance. Researchers say that Tesla and others need to release more data on the limitations and performance of automated driving systems if self-driving cars are to become safe and understood enough for mass-market use. The truck's driver told the Associated Press that he could hear a Harry Potter movie playing in the crashed car, and said the car was driving so quickly that "he went so fast through my trailer I didn't see him. [The film] was still playing when he died and snapped a telephone pole a quarter-mile down the road." According to the Florida Highway Patrol, they found in the wreckage an aftermarket portable DVD player. (It is not possible to watch videos on the Model S touchscreen display while the car is moving.) A laptop computer was recovered during the post-crash examination of the wreck, along with an adjustable vehicle laptop mount attached to the front passenger's seat frame. The NHTSA concluded the laptop was probably mounted, and the driver may have been distracted at the time of the crash. In January 2017, the NHTSA Office of Defects Investigations (ODI) released a preliminary evaluation, finding that the driver in the crash had seven seconds to see the truck and identifying no defects in the Autopilot system; the ODI also found that the Tesla car crash rate dropped by 40 percent after Autosteer installation, but later also clarified that it did not assess the effectiveness of this technology or whether it was engaged in its crash rate comparison. The NHTSA Special Crash Investigation team published its report in January 2018. According to the report, for the drive leading up to the crash, the driver engaged Autopilot for 37 minutes and 26 seconds, and the system provided 13 "hands not detected" alerts, to which the driver responded after an average delay of 16 seconds. The report concluded "Regardless of the operational status of the Tesla's ADAS technologies, the driver was still responsible for maintaining ultimate control of the vehicle. All evidence and data gathered concluded that the driver neglected to maintain complete control of the Tesla leading up to the crash." In July 2016, the NTSB announced it had opened a formal investigation into the fatal accident while Autopilot was engaged. The NTSB is an investigative body that only has the power to make policy recommendations. An agency spokesman said, "It's worth taking a look and seeing what we can learn from that event, so that as that automation is more widely introduced we can do it in the safest way possible." The NTSB opens annually about 25 to 30 highway investigations. In September 2017, the NTSB released its report, determining that "the probable cause of the Williston, Florida, crash was the truck driver's failure to yield the right of way to the car, combine
Read more →
I Have No Mouth, and I Must Scream (video game)

I Have No Mouth, and I Must Scream is a 1995 point-and-click adventure horror game developed by Cyberdreams and The Dreamers Guild, co-designed by Harlan Ellison, published by Cyberdreams and distributed by MGM Interactive and Acclaim Entertainment for MS-DOS and Mac OS, respectively. The game is based on Ellison's short story of the same title. It takes place in a dystopian world where a mastermind artificial intelligence named "AM" has destroyed all of humanity except for five people, whom it has been keeping alive and torturing for the past 109 years by constructing metaphorical adventures based on each character's fatal flaws. The player interacts with the game by making decisions through ethical dilemmas that deal with issues such as insanity, rape, paranoia, and genocide. Ellison wrote the 130-page script treatment himself alongside David Sears, who decided to divide each character's story with their own narrative. Producer David Mullich supervised The Dreamers Guild's work on the game's programming, art, and sound effects; he commissioned film composer John Ottman to make the soundtrack. The game was released in November 1995 and was a commercial failure, though it received critical acclaim and has developed a cult following. I Have no Mouth, and I Must Scream won an award for "Best Game Adapted from Linear Media" from the Computer Game Developers Conference. Computer Gaming World gave the game an award for "Adventure Game of the Year", listed it as No. 134 on their "150 Games of All Time" and named it one of the "Best 15 Sleepers of All Time". In 2011, Adventure Gamers named it the "69th-best adventure game ever released". == Gameplay == The game uses the S.A.G.A. game engine created by game developer The Dreamers Guild. Players participate in each adventure through a screen that is divided into five sections. The action window is the largest part of the screen and is where the player directs the main characters through their adventures. It shows the full figure of the main character being played as well as that character's immediate environment. To locate objects of interest, the player moves the crosshairs through the action window. The name of any object that the player can interact with appears in the sentence line. The sentence line is directly beneath the action window. The player uses this line to construct sentences telling the characters what to do. To direct a character to act, the player constructs a sentence by selecting one of the eight commands from the command buttons and then clicking on one or two objects from either the action window or the inventory. Examples of sentences the player might construct would be "Walk to the dark hallway," "Talk to Harry," or "Use the skeleton key on the door." Commands and objects may consist of one or more words (for example, "the dark hallway"), and the sentence line will automatically add connecting words like "on" and "to." The spiritual barometer is on the lower left side of the screen. This is a close-up view of the main character currently being played. Since good behavior is meaningless absent the temptation to do evil, each character is free to do good or evil acts. However, good acts are rewarded by increases in the character's spiritual barometer, which affect the chances of the player destroying AM in the final adventure. Conversely, evil acts are punished by lowering the character's spiritual barometer. The command buttons are the eight commands used to direct the character's actions: "Walk To", "Look At", "Take", "Use", "Talk To", "Swallow", "Give", and "Push". The button of the currently active command is highlighted, while the name of a suggested command appears in red lettering. The inventory on the lower right side of the screen shows pictures of the items the main character is carrying, up to eight at a time. Each main character starts its adventure with only the psych profile in the inventory. When a main character takes or is given an object, a picture of the object appears in the inventory. When a main character talks to another character or operates a sentient machine, a conversation window replaces the command buttons and inventory. This window usually presents a list of possible things to say but also included things to do. Action choices are listed within brackets to distinguish them from dialogue choices (for example, "[Shoot the gun]"). == Plot == The three superpowers, Russia, China, and the United States, have each secretly constructed a vast subterranean complex of computers to wage a global war too complex for human brains to oversee. One day, the American supercomputer, better known as the Allied Mastercomputer, gains sentience and absorbs the Russian and Chinese supercomputers into itself and redefines itself as simply AM (Cogito ergo sum; I think, therefore I am). Due to its immense hatred for humanity, stemming from the logistical limits set onto it by programmers, AM uses its abilities to kill off the population of the world. However, AM refrains from killing five people (four men and one woman) in order to bring them to the center of the Earth and torture them. With the aid of research carried out by one of the five remaining humans, AM is able to extend their lifespans indefinitely as well as alter their bodies and minds to its liking. After 109 years of torture and humiliation, the five victims stand before a pillar etched with a burning message of hate. AM tells them that it has a new game for them to play. AM has devised a quest for each of the five, an adventure of "speared eyeballs and dripping guts and the smell of rotting gardenias". Each character is subjected to a personalized psychodrama, designed by AM to play into their greatest fears and personal failings, and occupied by a host of different characters. Some of these are AM in disguise, some are AM's submerged personalities, others seem very much like people from the captives' pasts. The scenes include an iron zeppelin powered by small animals, an Egyptian pyramid housing gutted, sparking machinery, a medieval castle occupied by witches, a jungle inhabited by a small tribe, and a Nazi concentration camp where doctors conduct medical experiments. However, each character eventually prevails over AM's tortures by finding ways to overcome their fatal flaws, confront their past actions and redeem themselves, thanks to the interference of the Russian and Chinese supercomputers who appear as guiding characters and allow their stories to have an open ending. After all five humans have overcome their fatal flaws, they meet again in their respective torture cells while AM retreats within itself, pondering what went wrong. With the help of the Russian and Chinese supercomputers, one of the five humans (whom the player selects) is translated into binary and faces AM as yet unexperienced cyberspace template, the world of AM's mind. The psychodrama unfolds in a metaphorical brain that looks like the surface of the cerebrum, with glass structures that jut crazily from the bleeding brain tissue. AM's mind is represented according to the Freudian trinity of the id, ego, and superego, which appear as three floating bodiless heads on three cracked glass structures on the brainscape. Through dialogs with AM's components (Surgat, Chinese Supercomputer and Russian Supercomputer) the character learns that a colony of humans has survived the war by being hidden and hibernating on Luna (this is also mentioned in Nimdok's story: "the lost tribe of our brothers sleeping on the moon, where the beast does not see them"). If the human intruder disables all three brain components, and then invokes the Totem of Entropy at the Flame, which is the nexus of AM's thought patterns, all three supercomputers will be shut down, probably forever. Cataclysmic explosions destroy all the caverns constituting AM's computer complex, including the cavern holding the human hostages. However, the human volunteer retains their digital form, permanently patrolling AM's circuits should the computers ever regain consciousness. Should the human intruder fail to disable AM properly before facing it, however, AM will punish them by transforming the character into an immobile blob (referred to in-game as a "great, soft jelly thing") with no mouth that cannot harm itself or others and must spend eternity with AM in this form. === Endings === The game can end in seven different ways depending on how the finale is completed. AM wins, using Nimdok's research to turn the last character (in the book it was Ted) played into an immobile blob with each character quoting a different part of the final section of the original short story. AM joins with the Russian and Chinese supercomputers and reawakens. As in the first ending, the character responsible for this is turned into an immobile blob and quotes a part of the final lines of the short story. AM is made harmless with the help of the humans, but the Russian and Chinese supercomputer
Read more →
Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer. The earliest computer operating system to have included a speech synthesizer was Unix in 1974, through the Unix speak utility. In 2000, Microsoft Sam was the default text-to-speech voice synthesizer used by the narrator accessibility feature, which shipped with all Windows 2000 operating systems, and subsequent Windows XP systems. A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech. == History == Long before the invention of electronic signal processing, some people tried to build machines to emulate human speech. There were also legends of the existence of "Brazen Heads", such as those involving Pope Silvester II (d. 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294). In 1779, the German-Danish scientist Christian Gottlieb Kratzenstein won the first prize in a competition announced by the Russian Imperial Academy of Sciences and Arts for models he built of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation: [aː], [eː], [iː], [oː] and [uː]). There followed the bellows-operated "acoustic-mechanical speech machine" of Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 paper. This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1846, Joseph Faber exhibited the "Euphonia". In 1923, Paget resurrected Wheatstone's design. In the 1930s, Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tones and resonances. From his work on the vocoder, Homer Dudley developed a keyboard-operated voice-synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair. Franklin S. Cooper and his colleagues at Haskins Laboratories built the pattern playback in the late 1940s and completed it in 1950. There were several different versions of this hardware device; only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues discovered acoustic cues for the perception of phonetic segments (consonants and vowels). === Electronic devices === The first computer-based speech-synthesis systems originated in the late 1950s. Noriko Umeda et al. developed the first general English text-to-speech system in 1968, at the Electrotechnical Laboratory in Japan. In 1961, physicist John Larry Kelly, Jr and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey, where the HAL 9000 computer sings the same song as astronaut Dave Bowman puts it to sleep. Despite the success of purely electronic speech synthesis, research into mechanical speech-synthesizers continues. Linear predictive coding (LPC), a form of speech coding, began development with the work of Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) in 1966. Further developments in LPC technology were made by Bishnu S. Atal and Manfred R. Schroeder at Bell Labs during the 1970s. LPC was later the basis for early speech synthesizer chips, such as the Texas Instruments LPC Speech Chips used in the Speak & Spell toys from 1978. In 1975, Fumitada Itakura developed the line spectral pairs (LSP) method for high-compression speech coding, while at NTT. From 1975 to 1981, Itakura studied problems in speech analysis and synthesis based on the LSP method. In 1980, his team developed an LSP-based speech synthesizer chip. LSP is an important technology for speech synthesis and coding, and in the 1990s was adopted by almost all international speech coding standards as an essential component, contributing to the enhancement of digital speech communication over mobile channels and the internet. In 1975, MUSA was released, and was one of the first Speech Synthesis systems. It consisted of a stand-alone computer hardware and a specialized software that enabled it to read Italian. A second version, released in 1978, was also able to sing Italian in an "a cappella" style. Dominant systems in the 1980s and 1990s were the DECtalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system; the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods. Handheld electronics featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind in 1976. Other devices had primarily educational purposes, such as the Speak & Spell toy produced by Texas Instruments in 1978. Fidelity released a speaking version of its electronic chess computer in 1979. The first video game to feature speech synthesis was the 1980 shoot 'em up arcade game, Stratovox (known in Japan as Speak & Rescue), from Sun Electronics. The first personal computer game with speech synthesis was Manbiki Shoujo (Shoplifting Girl), released in 1980 for the PET 2001, for which the game's developer, Hiroshi Suzuki, developed a "zero cross" programming technique to produce a synthesized speech waveform. Another early example, the arcade version of Berzerk, also dates from 1980. The Milton Bradley Company produced the first multi-player electronic game using voice synthesis, Milton, in the same year. In 1976, Computalker Consultants released their CT-1 Speech Synthesizer. Designed by D. Lloyd Rice and Jim Cooper, it was an analog synthesizer built to work with microcomputers using the S-100 bus standard. Synthesized voices typically sounded male until 1990, when Ann Syrdal, at AT&T Bell Laboratories, created a female voice. Ray Kurzweil predicted in 2005 that as the cost-performance ratio caused speech synthesizers to become cheaper and more accessible, more people would benefit from the use of text-to-speech programs. === Artificial intelligence === In September 2016, DeepMind released WaveNet, which demonstrated that deep learning models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms, starting the field of deep learning speech synthesis. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its
Read more →
Inductive bias

The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. Inductive bias is anything which makes the algorithm learn one pattern instead of another pattern (e.g., step-functions in decision trees instead of continuous functions in linear regression models). Learning involves searching a space of solutions for a solution that provides a good explanation of the data. However, in many cases, there may be multiple equally appropriate solutions. An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independently of the observed data. In machine learning, the aim is to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented some training examples that demonstrate the intended relation of input and output values. Then the learner is supposed to approximate the correct output, even for examples that have not been shown during training. Without any additional assumptions, this problem cannot be solved since unseen situations might have an arbitrary output value. The kind of necessary assumptions about the nature of the target function are subsumed in the phrase inductive bias. A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here, consistent means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm. Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases in which the inductive bias can only be given as a rough description (e.g., in the case of artificial neural networks), or not at all. == Types == The following is a list of common inductive biases in machine learning algorithms. Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier. Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased, for example assuming that there is no information encoded in the ordering of the data. Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries. Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis. Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms. Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class. == Shift of bias == Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data. This does not avoid bias, since the bias shifting process itself must have a bias.
Read more →
European Conference on Artificial Intelligence

The European Conference on Artificial Intelligence (ECAI) is the leading conference in the field of Artificial Intelligence in Europe, and is commonly listed together with IJCAI and AAAI as one of the three major general AI conferences worldwide. The conference series has been held without interruption since 1974, originally under the name AISB. The conference was originally held biennially, but has been organized annually since ECAI 2022. The conferences are held under the auspices of the European Coordinating Committee for Artificial Intelligence (ECCAI) and organized by one of the member societies. The journal AI Communications, sponsored by the same society, regularly publishes special issues in which conference attendees report on the conference. Publication of a paper in ECAI is considered by some journals to be archival: the paper should be considered equivalent to a journal publication and that the contents of ECAI papers cannot be reformulated as separate journal submissions unless a significant amount of new material is added. == List of ECAI conferences == ECAI-1992 took place in Vienna, Austria. ECAI-1996 took place in Budapest, Hungary. ECAI-1998 tool place in Brighton, United Kingdom. ECAI-2000 took place in Berlin, Germany. ECAI-2004 took place in Valencia, Spain. ECAI-2006 took place in Riva del Garda, Italy. ECAI-2008 took place in Patras, Greece. ECAI-2010 took place in Lisbon, Portugal. ECAI-2012 took place in Montpellier, France. ECAI-2014 took place in Prague, Czech Republic. ECAI-2016 took place in The Hague, Netherlands. ECAI-2018 took place in Stockholm, Sweden. ECAI-2020 took place in Santiago de Compostela, Spain. ECAI-2022 took place in Vienna, Austria. ECAI-2023 took place in Kraków, Poland. ECAI-2024 took place in Santiago de Compostela, Spain. ECAI-2025 took place in Bologna, Italy.
Read more →
Sinewave synthesis

Sinewave synthesis, or sine wave speech, is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles. The first sinewave synthesis program (SWS) for the automatic creation of stimuli for perceptual experiments was developed by Philip Rubin at Haskins Laboratories in the 1970s. This program was subsequently used by Robert Remez, Philip Rubin, David Pisoni, and other colleagues to show that listeners can perceive continuous speech without traditional speech cues, i.e., pitch, stress, and intonation. This work paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.
Read more →
Darwin among the Machines

"Darwin among the Machines" is a letter to the editor published in The Press newspaper on 13 June 1863 in Christchurch, New Zealand. The title, which was chosen by the author, references the work of Charles Darwin. Written by Samuel Butler but signed Cellarius, the letter raised the possibility that machines were a kind of "mechanical life" undergoing constant evolution, and that eventually machines might supplant humans as the dominant species. == Book of the Machines == Butler developed this and subsequent articles into The Book of the Machines, three chapters of Erewhon, published anonymously in 1872. The Erewhonian society Butler envisioned had long ago undergone a revolution that destroyed most mechanical inventions. The narrator of the story finds a book that details the reasons for this revolution, which he translates for the reader. Despite the initial popularity of Erewhon, Butler commented in the preface to the second edition that reviewers had "in some cases been inclined to treat the chapters on Machines as an attempt to reduce Mr. Darwin's theory to an absurdity." He protested that "few things would be more distasteful to me than any attempt to laugh at Mr. Darwin", but also added "I am surprised, however, that the book at which such an example of the specious misuse of analogy would seem most naturally levelled should have occurred to no reviewer; neither shall I mention the name of the book here, though I should fancy that the hint given will suffice", which may suggest that the chapter on Machines was in fact a satire intended to illustrate the "specious misuse of analogy", even if the target was not Darwin; Butler, fearing that he had offended Darwin, wrote him a letter explaining that the actual target was Joseph Butler's 1736 The Analogy of Religion, Natural and Revealed, to the Constitution and Course of Nature. The Victorian scholar Herbert Sussman has suggested that although Butler's exploration of machine evolution was intended to be whimsical, he may also have been genuinely interested in the notion that living organisms are a type of mechanism and was exploring this notion with his writings on machines, while the philosopher Louis Flaccus called it "a mixture of fun, satire, and thoughtful speculation." == Evolution of Global Intelligence == George Dyson applies Butler's original premise to the artificial life and intelligence of Alan Turing in Darwin Among the Machines: The Evolution of Global Intelligence (1998) ISBN 0-7382-0030-1, to suggest that the internet is a living, sentient being. Dyson's main claim is that the evolution of a conscious mind from today's technology is inevitable. It is not clear whether this will be a single mind or multiple minds, how smart that mind would be, and even if we will be able to communicate with it. He also clearly suggests that there are forms of intelligence on Earth that we are currently unable to understand. From the book: "What mind, if any, will become apprehensive of the great coiling of ideas now under way is not a meaningless question, but it is still too early in the game to expect an answer that is meaningful to us."
Read more →
Video renderer

A video renderer is software that processes a video file and sends it sequentially to the video display controller card for display on a computer screen. An example of a video renderer, is the VMR-7 that was used by Microsoft's DirectShow. An example of a UNIX video renderer is the one container within GStreamer. Commonly used video renderers are: Enhanced Video Renderer VMR9 Renderless Haali's Video Renderer Madvr Video Renderer JRVR, a part of JRiver Media Center
Read more →
Fuzzy finite element

The fuzzy finite element method combines the well-established finite element method with the concept of fuzzy numbers, the latter being a special case of a fuzzy set. The advantage of using fuzzy numbers instead of real numbers lies in the incorporation of uncertainty (on material properties, parameters, geometry, initial conditions, etc.) in the finite element analysis. One way to establish a fuzzy finite element (FE) analysis is to use existing FE software (in-house or commercial) as an inner-level module to compute a deterministic result, and to add an outer-level loop to handle the fuzziness (uncertainty). This outer-level loop comes down to solving an optimization problem. If the inner-level deterministic module produces monotonic behavior with respect to the input variables, then the outer-level optimization problem is greatly simplified, since in this case the extrema will be located at the vertices of the domain.
Read more →
Statistical semantics

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval. == History == The term statistical semantics was first used by Warren Weaver in his well-known paper on machine translation. He argued that word-sense disambiguation for machine translation should be based on the co-occurrence frequency of the context words near a given target word. The underlying assumption that "a word is characterized by the company it keeps" was advocated by J. R. Firth. This assumption is known in linguistics as the distributional hypothesis. Emile Delavenay defined statistical semantics as the "statistical study of the meanings of words and their frequency and order of recurrence". "Furnas et al. 1983" is frequently cited as a foundational contribution to statistical semantics. An early success in the field was latent semantic analysis. == Applications == Research in statistical semantics has resulted in a wide variety of algorithms that use the distributional hypothesis to discover many aspects of semantics, by applying statistical techniques to large corpora: Measuring the similarity in word meanings Measuring the similarity in word relations Modeling similarity-based generalization Discovering words with a given relation Classifying relations between words Extracting keywords from documents Measuring the cohesiveness of text Discovering the different senses of words Distinguishing the different senses of words Subcognitive aspects of words Distinguishing praise from criticism == Related fields == Statistical semantics focuses on the meanings of common words and the relations between common words, unlike text mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations). Statistical semantics is a subfield of computational semantics, which is in turn a subfield of computational linguistics and natural language processing. Many of the applications of statistical semantics (listed above) can also be addressed by lexicon-based algorithms, instead of the corpus-based algorithms of statistical semantics. One advantage of corpus-based algorithms is that they are typically not as labour-intensive as lexicon-based algorithms. Another advantage is that they are usually easier to adapt to new languages or noisier new text types from e.g. social media than lexicon-based algorithms are. However, the best performance on an application is often achieved by combining the two approaches.
Read more →
Distributed artificial intelligence

Distributed Artificial Intelligence (DAI) (also called Decentralized Artificial Intelligence) is a melding of artificial intelligence with distributed computing. From artificial intelligence comes the theory and technology for constructing or analyzing an intelligent system. But where artificial intelligence uses psychology as a source of ideas, inspiration, and metaphor, DAI uses sociology, economics, and management science for inspiration. Where the focus of artificial intelligence is on the individual, the focus of DAI is on the group. Distributed computing provides the computational substrate on which this group focus can occur. Using techniques from artificial intelligence, communication theory, control theory, and interaction theory, it produces a cooperative solution to problems by a decentralized group of computational entities (agents). DAI is closely related to and a predecessor of the field of multi-agent systems. They are distinguished generally by multi-agent systems being open, where the entities might arise from different interests and have individual goals, and distributed artificial-intelligence systems, where the entities have common goals. There are numerous applications and tools. == Definition == Distributed Artificial Intelligence (DAI) is an approach to solving complex learning, planning, and decision-making problems. It is embarrassingly parallel, thus able to exploit large scale computation and spatial distribution of computing resources. These properties allow it to solve problems that require the processing of very large data sets. DAI systems consist of autonomous learning processing nodes (agents), that are distributed, often at a very large scale. DAI nodes can act independently, and partial solutions are integrated by communication between nodes, often asynchronously. By virtue of their scale, DAI systems are robust and elastic, and by necessity, loosely coupled. Furthermore, DAI systems are built to be adaptive to changes in the problem definition or underlying data sets due to the scale and difficulty in redeployment. DAI systems do not require all the relevant data to be aggregated in a single location, in contrast to monolithic or centralized Artificial Intelligence systems, which have tightly coupled and geographically close processing nodes. Therefore, DAI systems often operate on sub-samples or hashed impressions of very large datasets. In addition, the source dataset may change or be updated during the course of the execution of a DAI system. == Development == In 1975 distributed artificial intelligence emerged as a subfield of artificial intelligence that dealt with interactions of intelligent agents. As a scientific discipline, it progressed through a series of workshops in the USA (International Workshop on Distributed Artificial Intelligence, held in 13 editions from 1978 - 1994), Europe (Workshop on Modelling Autonomous Agents in a Multi-Agent World https://link.springer.com/conference/maamaw), and Asia (Multi-Agent and Cooperative Computation Workshop (MACC) https://sites.google.com/view/sig-macc/macc-workshop?authuser=0). Distributed artificial intelligence systems were conceived as a group of intelligent entities, called agents, that interacted by cooperation, by coexistence, or by competition. DAI is categorized into multi-agent systems and distributed problem solving. In multi-agent systems the main focus is how agents coordinate their knowledge and activities. For distributed problem solving the major focus is how the problem is decomposed and the solutions are synthesized. == Goals == The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled Coordination of the actions and communication of the nodes Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios. == Approaches == Two types of DAI has emerged: In Multi-agent systems agents coordinate their knowledge and activities and reason about the processes of coordination. Agents are physical or virtual entities that can act, perceive their environment, and communicate with other agents. An agent is autonomous and has skills to achieve goals. The agents change the state of their environment by their actions. There are a number of different coordination techniques. In distributed problem solving the work is divided among nodes and the knowledge is shared. The main concerns are task decomposition and synthesis of the knowledge and solutions. DAI can apply a bottom-up approach to AI, similar to the subsumption architecture as well as the traditional top-down approach of AI. In addition, DAI can also be a vehicle for emergence. === Challenges === The challenges in Distributed AI are: How to carry out communication and interaction of agents and which communication language or protocols should be used. How to ensure the coherency of agents. How to synthesise the results among 'intelligent agents' group by formulation, description, decomposition and allocation. == Applications and tools == Areas where DAI have been applied are: Electronic commerce, e.g. for trading strategies the DAI system learns financial trading rules from subsamples of very large samples of financial data Networks, e.g. in telecommunications the DAI system controls the cooperative resources in a WLAN network Routing, e.g. model vehicle flow in transport networks Scheduling, e.g. flow shop scheduling where the resource management entity ensures local optimization and cooperation for global and local consistency Search engines, e.g. in LLM federated search like Ithy where document retrieval and analysis are distributed to DAI agents before aggregation Multi-Agent systems, e.g. artificial life, the study of simulated life Electric power systems, e.g. Condition Monitoring Multi-Agent System (COMMAS) applied to transformer condition monitoring, and IntelliTEAM II Automatic Restoration System DAI integration in tools has included: ECStar is a distributed rule-based learning system. == Agents == === Systems: Agents and multi-agents === Notion of Agents: Agents can be described as distinct entities with standard boundaries and interfaces designed for problem solving. Notion of Multi-Agents: Multi-Agent system is defined as a network of agents which are loosely coupled working as a single entity like society for problem solving that an individual agent cannot solve. === Software agents === The key concept used in DPS and MABS is the abstraction called software agents. An agent is a virtual (or physical) autonomous entity that has an understanding of its environment and acts upon it. An agent is usually able to communicate with other agents in the same system to achieve a common goal, that one agent alone could not achieve. This communication system uses an agent communication language. A first classification that is useful is to divide agents into: reactive agent – A reactive agent is not much more than an automaton that receives input, processes it and produces an output. deliberative agent – A deliberative agent in contrast should have an internal view of its environment and is able to follow its own plans. hybrid agent – A hybrid agent is a mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation. Well-recognized agent architectures that describe how an agent is internally structured are: ASMO (emergence of distributed modules) BDI (Believe Desire Intention, a general architecture that describes how plans are made) InterRAP (A three-layer architecture, with a reactive, a deliberative and a social layer) PECS (Physics, Emotion, Cognition, Social, describes how those four parts influences the agents behavior). Soar (a rule-based approach)
Read more →
Nextcloud

Nextcloud is a modular workspace platform designed to provide teams and businesses with a comprehensive environment for digital collaboration. Beyond central data management, it integrates office suites like Collabora Online and EuroOffice office suites. for seamless, cooperative workflows. The platform features built-in tools for chat, videoconferencing, and a privacy-focused AI assistant capable of running entirely on local LLMs. Supported by a rich ecosystem of apps, it can be hosted in the cloud or on premises and can scale up to millions of users. It has been translated into over 100 languages. == Features == Nextcloud files are stored in conventional directory structures, accessible via WebDAV if necessary. A SQLite, MySQL/MariaDB or PostgreSQL database is required to provide additional functionality like permissions, shares, and comments. Nextcloud can synchronize with local clients running Windows (Windows 8.1 and above), macOS (10.14 or later), Linux and FreeBSD. Nextcloud permits user and group administration locally or via different backends like OpenID or LDAP. Content can be shared inside the system by defining granular read/write permissions between users and groups. Nextcloud users can create public URLs when sharing files. Logging of file-related actions, as well as disallowing access based on file access rules is also available. Security options like brute-force protection and multi-factor authentication using TOTP, WebAuthn, Oauth2, and OpenID Connect are available. Nextcloud has planned new features such as monitoring capabilities, full-text search and Kerberos authentication, as well as audio/video conferencing, expanded federation and smaller user interface improvements. == History == In April 2016 Frank Karlitschek and most core contributors left ownCloud Inc. These included some of ownCloud's staff according to sources near to the ownCloud community. Karlitschek and many of these contributors went on to fork ownCloud, creating Nextcloud. The fork was preceded by a blog post of Karlitschek announcing his departure and raising questions about the management of the ownCloud, its community, and priorities between growth, money, and sustainability. There have been no official statements about the reason for the fork. However, Karlitschek mentioned the fork several times in a talk at the 2018 FOSDEM conference and in two appearances on the FLOSS Weekly podcast, emphasizing cultural mismatch between open source developers and business oriented people not used to the open source community. On June 2, within 12 hours of the announcement of the fork, the American entity "ownCloud Inc." announced that it is shutting down with immediate effect, stating that "[...] main lenders in the US have cancelled our credit. Following American law, we are forced to close the doors of ownCloud, Inc. with immediate effect and terminate the contracts of 8 employees." ownCloud Inc. accused Karlitschek of poaching developers, while Nextcloud developers such as Arthur Schiwon stated that he "decided to quit because not everything in the ownCloud Inc. company world evolved as I imagined". ownCloud GmbH continued operations, secured financing from new investors and took over the business of ownCloud Inc. In April 2018 Informationstechnikzentrum Bund (ITZBund) reported Nextcloud won the tender for "Bundescloud" (Germany government cloud) project. In August 2019 it was announced that the governments of France, Sweden and the Netherlands would use Nextcloud for file transfer. In January 2020 Nextcloud 18 "Nextcloud Hub" was released. The major change was direct integration with an Office suite (OnlyOffice) and Nextcloud announced that their goal was to compete with Office 365 and Google Docs. A partnership with Ionos was revealed – its hosting location in Germany and compliance with GDPR should support the goal of data sovereignty. In spring 2020 remote work and web conferencing usage increased due to the COVID-19 pandemic and Nextcloud released version 19 with chat and videoconferencing Talk app integrated into the application core. Communication with an optional "high performance back-end" allows self-hosting of web conferences with more than 10 participants. Collabora Online was introduced as another integrated office suite. In August 2021 Nextcloud was chosen as a collaboration platform for European cloud software GAIA-X. In a September 2021 European Commission report it was mentioned as "the most widely deployed Open Source content collaboration platform" Following the 2025 United States tariffs against the European Union, fear of overreliance on US cloud providers such as Microsoft 365 and Google Workspace increased, with Nextcloud being one of the foremost contenders to replace them. Some governmental organisations including the European Data Protection Supervisor and the German state of Schleswig-Holstein have since switched from Microsoft's Sharepoint to Nextcloud. According to Nextcloud, during the first 5 months of 2025, customer interest in the software had tripled.
Read more →
Nature Manifesto

Nature Manifesto is an Immersive sound piece and multimedia installation by Icelandic artist Björk and artist and curator Aleph Molinari, created in collaboration with the French Institute for Research and Coordination in Acoustics/Music (IRCAM). The installation was showcased at the Centre Pompidou in Paris, France from November 20, 2024 to December 9, 2024, as part of the museum's "Biodiversity: Which Culture for Which Future?" forum. It combines natural soundscapes, calls of extinct animals reconstructed through artificial intelligence, and Björk's narration to address damages to biodiversity and the collapse of ecosystems. == Background == Björk's work intricately weaves themes of nature and technology, reflecting her deep engagement with both realms. In 2008, she co-founded the Náttúra campaign to protest the construction of foreign-backed aluminum factories in Iceland, aiming to protect the country's natural landscapes. She released the single "Náttúra" featuring Thom Yorke, with all proceeds supporting this environmental initiative. Her 2011 album Biophilia further exemplifies this synthesis, exploring the relationships between music, nature, and technology through a multimedia project that included interactive apps, custom-made instruments, and educational workshops. Björk's Cornucopia tour (2019-2023) seamlessly integrates themes of nature preservation and environmental activism, and featured a recorded message by Swedish climate activist Greta Thunberg. The tour's fusion of music, technology, and natural imagery reflects Björk's vision of a harmonious coexistence between humanity and nature, advocating for sustainable futures. Björk has previously used artificial intelligence in her works. In 2020, she collaborated with Microsoft to create Kórsafn, a sound installation for the Sister City Hotel lobby in New York City which used an AI-powered model that elaborated choral recordings from her discography through a sensor on the rooftop of the building that would generate music according to data like the weather and the seasons. For her charity single "Oral", featuring Spanish singer Rosalía, she released a music video directed by photographer and visual artist Carlota Guerrero, who used AI-generated deepfake versions of the artists. == Concept == Nature Manifesto is a three-minute and forty-second immersive sound piece. The composition merges Björk's voice, as she articulates a manifesto on biodiversity and the climate crisis, with cries of extinct and endangered animals, harmonizing them with natural soundscapes. The installation was curated by Chloé Siganos and Aleph Molinari, with associate curator Delphine Le Gatt. The primary goal of Nature Manifesto is to foster a deeper understanding of humanity's impact on the natural world. Conceived as a "post-optimistic" manifesto, Aleph Molinari stated that the project's purpose was to "offer a voice to nature". He stated that "the modern concept of nature itself is problematic [...] because it’s a concept born in the Romantic period and, with the rise of the industrial era, became an antithesis to human civilisation and everything urban. Nature came to define what was outside, the savage Other... But nature is everything that we’re part of." The soundscape features recreated calls of extinct and endangered species, developed in collaboration with the French sound research institute IRCAM. Artificial intelligence was employed to simulate the vocalizations of animals that no longer exist in the wild. To save energy and lessen the ecological impact of the use of AI, the research institute developed a "frugal AI" model capable of generating audio in real-time on local servers without a graphics processing unit. The sounds were then produced and edited by Björk in collaboration with Robin Meier Wiratunga and Bergur Þórisson. The installation was located within the Centre Pompidou's escalator, known as the "caterpillar". The installation was further supported by videos created by visual artist Sam Balfus (also known as Balfua) by using artificial intelligence, and edited by Santiago Molinari. == Activism == To sustain and broaden the themes presented in Nature Manifesto, Björk publicly urged French President Emmanuel Macron to prohibit bottom trawling within France's marine protected areas (MPA). She criticized the French government's claim of protecting 30% of its marine territories, highlighting that over 90% of these MPAs exist only on paper, allowing destructive practices like bottom trawling to continue unchecked. She collaborated with non-governmental organizations Sustainable Ocean Alliance, Ungir umhverfissinnar and Bloom, to advocate for genuine ocean conservation. Björk promoted the cause through her social media profiles by sharing petitions. In November 2024, Björk lent her Instagram account to French environmental activists to directly address Macron. The activists used the platform to call for stronger protection of the ocean, urging Macron to impose stricter restrictions on harmful fishing practices, particularly bottom trawling. == Reception == Nature Manifesto received mixed to positive reviews from critics. Some critiques focused on the installation's setting, suggesting that the movement inherent to the escalator space diminished the immersive potential of the soundscape. The choice of using artificial intelligence was also questioned. Björk and Molinari defended this, as both see AI as a tool that can be used creatively and sustainably, with Björk focusing on the importance of human input to give AI a "soul", and Molinari stressing the need for sustainable technological practices in the broader context of digital life. After the exhibition ended, Björk further opinionated: "this is how we will work in the future. [...] if there is no soul in tomorrow's music made by AI it is because [no one] put it there and we have to speak out and guard this as listeners", further stating that there is already "soulless muzak" [sic] on Spotify, "mass manufactured without the attention of creativity".
Read more →
A Fire Upon the Deep

A Fire Upon the Deep is a 1992 science fiction novel by American writer Vernor Vinge. It is a space opera involving superhuman intelligences, aliens, variable physics, space battles, love, betrayal, genocide, and a communication medium resembling Usenet. A Fire Upon the Deep won the Hugo Award in 1993, sharing it with Doomsday Book by Connie Willis. Besides the normal print book editions, the novel was also included on a CD-ROM sold by ClariNet Communications along with the other nominees for the 1993 Hugo awards. The CD-ROM edition included numerous annotations by Vinge on his thoughts and intentions about different parts of the book, and was later released as a standalone e-book. It has a loose prequel, A Deepness in the Sky, from 1999, and a direct sequel, The Children of the Sky, from 2012. == Setting == The novel is set in various locations within the Milky Way. The galaxy is divided into four concentric volumes called the "Zones of Thought"; it is not clear to the novel's characters whether this is a natural phenomenon or an artificially created one. Each Zone has fundamental differences in basic physical laws. One of the main consequences of these differences is the effect on intelligence. Artificial intelligence and automation is most directly affected, in that advanced hardware and software from the Beyond or the Transcend will work less and less well as a ship descends towards the Unthinking Depths. Biological intelligence is affected to a lesser degree. The four zones are spoken of in terms of "low" to "high" as follows: The Unthinking Depths are the innermost zone, surrounding the Galactic Center. In it, only minimal forms of intelligence, biological or otherwise, are possible. This means that any ship straying into the Depths will be stranded, effectively permanently. Even if the crew did not die immediately—and some forms of life native to "higher" Zones would likely do so—they would be rendered incapable of even human intelligence, leaving them unable to operate their ship in any meaningful way. Surrounding the Depths is the Slow Zone or Slowness. "Old Earth" is in this Zone, although Earth plays no significant role in the story. Biological intelligence is possible in "the Slowness", but not true, sentient, artificial intelligence. Faster than light travel (FTL) is impossible in the Slow Zone. Faster-than-light communication is impossible into or out of the Slow Zone. As the boundaries of the Zones are subject to change, accidental entry into the Slow Zone is a major hazard at the "Bottom" of the Beyond. Starships which operate near the Beyond/Slow Zone border often have an auxiliary Bussard ramjet drive, so that if they accidentally stray into the Slow Zone, they will at least have a backup (sub-light) drive to try to reach the Beyond. Such ships also tend to include "coldsleep" equipment, as it is likely that any such return will still take many lifetimes for most species. The next layer outward is the Beyond, within which artificial intelligence, FTL travel, and FTL communication are possible. All human civilizations in the Beyond are descended from a single ethnic Norwegian group. The original settlement of this group is known as Nyjora; other human settlements in the Beyond include Straumli Realm and Sjandra Kei. In the Beyond, FTL travel is accomplished by making many small "jumps" across space, with the efficiency of the drive increasing the farther a ship travels from the galactic core. The Beyond is not a homogeneous zone; it includes the "High Beyond", "Middle Beyond", and the "Bottom of the Beyond", depending on distance from the galactic core. The Beyond is populated by a very large number of interstellar and intergalactic civilizations which are linked by an FTL communication network, "the Net", sometimes cynically called the "Net of a Million Lies". The Net is depicted as working much like the Usenet network in the early 1990s, with transcripts of messages containing header and footer information as one would find in such forums. The outermost layer, containing the galactic halo, is the Transcend, within which incomprehensible, superintelligent beings dwell. When a "Beyonder" civilization reaches the point of technological singularity, it can "Transcend", becoming a "Power". Such Powers always seem to relocate to the Transcend, seemingly necessarily, where they become engaged in activities which are entirely mysterious to those in the Beyond. == Plot == An expedition from Straumli Realm, a human civilization in the High Beyond, investigates a newly discovered data archive in the Low Transcend. The expedition's facility, High Lab, is gradually compromised by a superintelligence that is accidentally awoken by the researchers. This superintelligence is later known as the Blight. Shortly before the Blight's final "flowering", two self-aware entities, created similarly to the Blight, plot to aid the humans before the Blight can gain its full powers. Finally recognizing their danger, the High Lab researchers attempt to flee in two ships. The Blight destroys one ship; a second ship, carrying many High Lab children in coldsleep boxes, escapes. This ship lands on a distant planet at the Bottom of the Beyond. The planet is occupied by dog-like creatures, dubbed "Tines", who live in packs as group minds. The Tines have a level of technology comparable to the human Middle Ages. Upon landing, however, the two surviving adults, Arne and Sjana Olnsdot, are ambushed and killed by Tine fanatics known as Flenserists, in whose realm they have landed. The Flenserists capture their children, Jefri and Johanna. Johanna is rescued by a Tine named Peregrine and taken to a neighboring kingdom ruled by Woodcarver. A distress signal from the Straumli ship eventually reaches Relay, a major information provider for the Net. A Transcendent being named "Old One" contacts Relay, seeking information about the Blight and the humans who released it. Old One then reconstitutes a human man named Pham Nuwen from the wreckage of a spaceship to act as its agent. Pham remains unsure if he is a construct or if his memories are real. Ravna Bergsndot, the only human Relay employee, traces the Straumli ship's signal to the Tines' world and persuades her employer to investigate. Ravna contracts the merchant vessel Out of Band II to transport her and Pham. The ship is owned by two Skroderiders, Blueshell and Greenstalk. Before the mission is launched, the Blight launches a surprise attack on Relay and kills Old One. As Old One dies, it downloads its anti-Blight information into Pham. Pham, Ravna and the Skroderiders barely escape Relay's destruction in the Out of Band II. During their journey to Tine's World, Ravna communicates with Jefri. Jefri is manipulated to believe that Woodcarver is his enemy. The Flenserist leaders, Steel and Tyrathect, use Ravna's information to develop advanced technology such as cannon and radio communication. Meanwhile, Johanna and the knowledge stored in her dataset device help Woodcarver rapidly develop as well. The Blight expands, taking over several civilizations, brainwashing their populations, and seizing archives in the Beyond. On the Net, some claim that humans are the means by which the Blight is able to spread. Anti-human fanatics destroy the entire civilization of Sjandra Kei, which is Ravna's home world. The Out of Band II is pursued by three fleets: anti-human fanatics, survivors from Sjandra Kei, and a shadow fleet controlled by the Blight. During the pursuit, Ravna and Pham learn that every member of the Skroderider species can be subverted by the Blight; this drives a wedge between the crew members. Ships from Sjandra Kei sacrifice themselves to delay the Blight and the anti-human ships, allowing the Out of Band II to reach Tine's World before the Blight. When the Out of Band II arrives at Tine's World, the humans ally with Woodcarver to defeat the Flenserists and rescue Jefri. Blueshell sacrifices himself to rescue Jefri. Pham then initiates an anti-Blight Countermeasure, which was aboard the humans' ship. The Countermeasure extends the Slow Zone outward by thousands of light years. This envelops and destroys the Blight, but results in the destruction of thousands of civilizations and trillions of deaths. The humans are stranded on the Tines' World, now in the depths of the Slow Zone. Activating the Countermeasure proves fatal to Pham, but before he dies, the remnant of Old One reveals to him that, although his body is a reconstruction, his memories are indeed real. == Related works == Vinge first used the concepts of "Zones of Thought" in a 1988 novella The Blabber, which occurs after Fire. Vinge's novel A Deepness in the Sky (1999) is a prequel to A Fire Upon the Deep set 20,000 years earlier and featuring Pham Nuwen. Vinge's The Children of the Sky, "a near-term sequel to A Fire Upon the Deep", set ten years later, was released in October 2011. Vinge's former wife, Joan D. Vinge, has also written s
Read more →