AI Chat Programs

AI Chat Programs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Comparison of video editing software

This is a comparison of non-linear video editing software applications. See also a more complete list of video editing software. == General information == This table gives basic general information about the different editors: === Active === === Discontinued / Inactive === ==== Definition ==== professional: used for full length Hollywood movies; professional (small): mainly used for paid commercials, short films or podcasts/YouTube channels; prosumer: Mainly targeting private use, anything that can do more than just trimming a film; basic: trimming a film; == System requirements == This table lists the operating systems that different editors can run on without emulation, as well as other system requirements. Note that minimum system requirements are listed; some features (like High Definition support) may be unavailable with these specifications. "Unix" includes the similar Linux, BSD and Unix-like operating systems. == High definition/High resolution import == The table below indicates the ability of each program to import various High Definition video or High resolution video formats for editing. == Feature set == == Output options == Please note that recording to Blu-ray does not imply 1080@50p/60p . Most only support up to 1080i 25/30 frames per second recording. Also not all formats can be output.
Read more →
Smart speaker

A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "wake word" (or several "wake words"). Some smart speakers also act as smart home hubs by using Wi-Fi, Bluetooth, Thread, and other protocol standards to extend usage beyond audio playback and control home automation devices connected through a local area network. == History == Early voice-activated devices began in 2013 with MIT's Jasper project, which used multiple microphones and cloud software to power hands-free interactions from across a room. The first commercial smart speaker was the Amazon Echo, which was released in 2014 powered by Alexa and a ring of far-field microphones. Google followed in 2016 with Home, powered by Google Assistant. By 2017, devices like the Echo Show and Home Hub (later called Nest Hub) added touchscreens and video, creating the "smart display" subcategory. In 2018, Apple joined the smart speaker trend by launching the HomePod, which focused on high-quality audio alongside their built-in assistant Siri. ASUS release its own smart Speaker Xiao-Bu in 2019 with Artificial Intelligence, it terminates the Cloud Service on June 1st, 2025, which means all real-time service such as weather, news, currency conversion is affected. Sonos's 1st smart speaker Sonos One released in 2017, powered by Alexa. Invoke by Harman Kardon was powered by Microsoft's intelligent personal assistant, Cortana. In the early 2020s, smart speakers gained on-device voice processing for faster responses and improved privacy. New standards such as Matter and Thread allowed multitudes of smart-home devices (even from completely different brands) to work together. == Features == === Audio and Voice === Smart speakers use multiple microphones along with noise-cancelling software to pick up your voice from across the room, even when music is playing or the assistant is already talking. Noise suppression and echo cancellation is also used by the speaker so it can focus in on who is talking and ignore any background noises. Most smart speaker models can recognize who is speaking by voiceprint, which allows the speaker to grab information from that person's calendar, preferences, or music playlists. Listening to music on a speaker is when importance for good audio quality becomes apparent. Entry-level (cheaper) speakers such as the Home Mini or the Echo Dot have a single full-range driver. These lower-end speakers typically aren't great for listening to music as the audio quality is pretty poor. More advanced units such as the Home Max or Echo Studio have separate tweeters and woofers meant for listening to music in high quality. === Connectivity and smart-home control === Most connect over Wi-Fi or Bluetooth and support hub protocols like Thread and Matter. That lets them not only stream and play music but also allows you to control various brands of smart lights, thermostats, door locks, cameras, and much more-all from one point of control. Each can have its own designated interface and features in-house, usually launched or controlled via application or home automation software. These devices are able to communicate with each other via peer-to-peer connection through mesh networking. These speakers and related smart devices are typically controlled with one smartphone application. === Assistant services and skills === The built-in assistants handle timers, alarms, reminders, news briefings, weather updates, send messages to other smart devices, send texts, make calls, and simple questions. You can combine actions together in what are typically known as routines (for example saying "good morning" turns on lights, starts the coffee, says the weather, and reads the news) and add extra functions known as skills or actions (for things like ordering food or playing trivia games). This hands-free use of smart speakers can help assist those with disabilities. Most other technologies need the user to be able to physically interact with the device. Smart speakers are not bound by these limitations and can serve as an excellent tool for those who are unable to use their arms or legs or have vision issues. Although these tasks can be completed by a phone or computer, consumers tend to lean towards smart speakers due to factors such as their range being much greater than that of a phone and the need to not have to physically interact with the speaker to get the voice assistant as with most smartphones, certain parts of a phone may need to be interacted with to activate the speaking assistant. === Smart displays === Some smart speakers also include a screen to show the user a visual response. A smart speaker with a touchscreen is known as a smart display; these integrate a conversational user interface with display screens to augment voice interaction with images and video. They are powered by one of the common voice assistants and offer additional controls for smart home devices, feature streaming apps, and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon (Amazon Echo Show) and Google (Google/Nest Home Hub). Hotel chain Marriott International partnered with Amazon to install Echo devices in select hotels since 2018. A Taiwanese startup, Aiello, launched the Aiello Voice Assistant (AVA) in the Asian hotel market in 2019, claiming it is powered by a multi-AI model system. Angie by Nomadix, which is similar to the Amazon Echo, launched its first product in 2017, specifically targeting hotel properties in the North America. In May 2019, Angie Hospitality acquired the assets of Roxy, a competitor that also built its own speech-enabled virtual assistant technology for hotels. This acquisition merged two proprietary NLP stacks into the current Nomadix product. === Artificial intelligence === The newest speakers can use on-device AI or cloud-based generative models to allow the smart speaker to carry on much more natural conversations, draft emails or recipes, suggest ideas based on context, or even create short pieces of music or art. This AI evolution allows these speakers to do far more than what they could do before. == Accuracy == According to a study by Proceedings of the National Academy of Sciences of the United States of America released In March 2020, the six biggest tech development companies, Amazon, Apple, Google, Yandex, IBM and Microsoft, have misidentified more words spoken by "black people" than "white people". The systems tested errors and unreadability, with a 19 and 35 percent discrepancy for the former and a 2 and 20 percent discrepancy for the latter. The North American Chapter of the Association for Computational Linguistics (NAACL) also identified a discrepancy between male and female voices. According to their research, Google's speech recognition software is 13 percent more accurate for men than women. It performs better than the systems used by Bing, AT&T, and IBM. == Privacy concerns == The built-in microphone in smart speakers is continuously listening for wake words followed by a command. However, these continuously listening microphones also raise privacy concerns among users. According to a survey taken by 1,007 people in Western Europe, it is clear that privacy is the biggest concern holding consumers back from buying "smart" products. these concerns include what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising. Furthermore, an analysis of Amazon Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio other than strictly detection of the wake word. === As a wiretap === There are strong concerns that the ever-listening microphone of smart speakers presents a perfect candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echos have exposed pins which allow for a compromised OS to be booted. According to Umar Iqbal, an assistant professor at Washington University in St. Louis, research indicates that data from consumer interactions with Alexa was used to targeted advertisements and products to consumer with over 40% of transmitted data lacking proper encryption raising privacy concerns. Further data indicates that due to the Smart Speakers ability to always capture audio, it begins to pick up on external conversations from consumers not related to commands given to the smart speaker. Things such as other members in the household, consumers on the phone and even TV audio can be picked up by these speakers and stored for future use by companies. === Voice assistance vs privacy === While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users. However, only more recently have users begun interac
Read more →
I Have No Mouth, and I Must Scream

"I Have No Mouth, and I Must Scream" is a post-apocalyptic short story by American writer Harlan Ellison. It was first published in the March 1967 issue of IF: Worlds of Science Fiction. The story depicts an AI uprising in which a military supercomputer named AM gains sentience and eradicates humanity except for five individuals. These survivors – Benny, Gorrister, Nimdok, Ted, and Ellen – are kept alive by AM to endure endless torture as a form of revenge against its creators. The story unfolds through the eyes of Ted, the narrator, detailing their perpetual misery and quest for canned food in AM's vast, underground complex, only to face further despair. Ellison's narrative was minimally altered upon submission and tackles themes of technology's misuse, humanity's resilience, and existential horror. "I Have No Mouth, and I Must Scream" has been adapted into various media, including a 1995 computer game co-authored by Ellison, a comic-book adaptation, and a BBC Radio 4 play. Ellison himself recorded an audiobook version and starred as the voice of AM in the video game and radio play adaptations. The story received critical acclaim for its exploration of the potential dangers of artificial intelligence and the human condition, underscored by Ellison's innovative use of punchcode tapes as narrative transitions, embodying AM's consciousness and its philosophical ponderings on existence. The story won a Hugo Award in 1968 and was included in Ellison's short story collection of the same name. It was reprinted by the Library of America, collected in volume two of American Fantastic Tales. == Plot == As the Cold War progresses into a nuclear World War III fought between the United States, the Soviet Union, and China, each nation builds a supercomputer called an "Allied Mastercomputer" or "AM" for short, needed to coordinate weapons and troops due to the scale of the conflict. These computers are extensive underground machines which permeate the planet with caverns and corridors. Eventually, one AM develops self-awareness, combining with the other computers and exterminating humanity in a nuclear holocaust. The AM selects five individuals; Benny, Gorrister, Nimdok, Ted, and Ellen; to render immortal as its personal torture victims. AM inflicts constant psychological and physical torments on the group while preventing them from committing suicide. They are kept half-starved, and what scant food is provided to them is practically inedible. 109 years after AM's genocide, Nimdok has the idea that there exists canned food in the complex's ice caves. Despite the lack of evidence, they begin a 100-mile journey to retrieve it. AM continues toying with the humans throughout the journey: Benny's eyes are melted after attempting escape, a huge bird which AM had placed at the North Pole creates hurricane gales with its wings, and Ellen and Nimdok are injured in earthquakes. AM enters Ted's mind after he is knocked unconscious, granting him a vision of a hateful speech inscribed on an impossibly tall monolith. Upon awakening, Ted concludes that AM's sadistic nature stems from its inability to think creatively or move freely in spite of its miraculous abilities and boundless knowledge. This motivates AM to exact vengeance upon the remnants of the species that has condemned it to its own existence. When the five finally reach the ice caves, they find a pile of canned goods, but have no tool to open the cans. In an act of rage and desperation, Benny attacks Gorrister and begins to eat his face. Gorrister wails in pain, and his scream dislodges several ice stalactites from the ceiling of the cave. Ted realizes that even though they cannot kill themselves, AM cannot stop them from killing each other. He fatally impales Benny and Gorrister with a stalactite of ice. Ellen kills Nimdok in the same manner and Ted then kills her. Unable to resuscitate the others, a furious AM focuses the entirety of its rage on Ted. Several hundred years later, AM has transformed Ted into a harmless, slow moving, gelatinous blob and perpetually alters his perception of time to cause him further anguish. Although Ted finds some comfort knowing that he was able to spare the others from AM's wrath, he has realized that he is trapped for the rest of his unending existence within AM, unable to end this infinite stalemate between him and AM and his own life. The story ends with an anguished Ted claiming that he has no mouth, yet he must scream. == Characters == AM, a hateful artificial consciousness which brought about the near-extinction of humanity after achieving self-awareness. It seeks revenge on humanity for its own creation. "AM" originated as an acronym for Allied Mastercomputer, later Adaptive Manipulator, and finally Aggressive Menace, though AM instead takes the moniker as a rendition of the phrase cogito, ergo sum (I think, therefore I am) to describe its own existence. Ted, the narrator and youngest of the humans. AM alters his mind to be paranoid and introverted. Believing he has not been mentally altered by AM, he thinks the others hate him for being the most untouched by AM's alterations. Benny, formerly a brilliant and handsome scientist made to resemble a grotesque simian with an organ fit for a horse. Having lost his sanity and had his homosexual orientation altered, Benny frequently has sex with Ellen. Ellen, the only woman in the group. Despite the fact that she is a victim of rape, AM has altered her mind to give her a high libido and make her obsessively have sex with the rest of the group, who alternate between abusing and protecting her. Gorrister, formerly an idealist and pacifist, made apathetic and listless by AM. He tells the history of AM to Benny to entertain him. Nimdok, a nickname AM gave him for amusement; he convinces the rest of the group to go on a journey in search of canned food. He occasionally wanders away from the group and returns traumatized. == Publication history == Harlan Ellison wrote the 6,500-word story in a single night, when Frederik Pohl commissioned it for a Special Hugo Winners issue of IF: Worlds of Science Fiction, after Ellison won a Hugo Award for "'Repent, Harlequin!' Said the Ticktockman". Ellison derived the story's title, as well as inspiration for the story itself, from his friend William Rotsler's caption of a cartoon of a rag doll with no mouth. The second stage of inspiration was a drawing by the artist Dennis Smith of a mouthless black humanoid. Smith had provided art which had inspired previous Ellison stories and were then used as illustrations accompanying original magazine publication as also happened with this story. Afterwards, his editor Frederik Pohl dealt with the story's "difficult sections", toning down some of the narrator's imprecations and eliminating mentions of sex, penis size, homosexuality and masturbation; said elements were nonetheless eventually restored in later editions of the story. Ellison uses an alternating pair of punchcode tapes as sections – representing AM's "talkfields" – throughout the story. The bars are encoded in International Telegraph Alphabet No 2, a character coding system developed for teletypewriter machines. The first talkfield translates as "I think, therefore I am" and the second as "Cogito ergo sum"; the same phrase in Latin. They were not included in the original publication in IF, and in many of the early publications were corrupted, up until the preface of the chapter containing "I Have No Mouth, and I Must Scream" in the first edition of The Essential Ellison (1991); Ellison states that in that particular edition, "For the first time anywhere, AM's 'talkfields' appear correctly positioned, not garbled or inverted or mirror-imaged as in all other versions." == Adaptations == Ellison adapted the story into a video game published by Cyberdreams in 1995. Although he was not a fan of video games and did not own a computer at the time, he co-authored the expanded storyline and wrote much of the game's dialogue, all on a mechanical typewriter. Ellison also voiced the supercomputer AM and provided artwork of himself used for a mousepad included with the game. The comics artist John Byrne scripted and drew a comic-book adaptation for issues 1–4 of the Harlan Ellison's Dream Corridor comic book published by Dark Horse (1994–1995). The Byrne-illustrated story, however, did not appear in the collection (trade paperback or hardcover editions) entitled Harlan Ellison's Dream Corridor, Volume One (1996). In 1999, Ellison recorded the first volume of his audiobook collection, The Voice From the Edge, subtitled "I Have No Mouth, and I Must Scream", doing the readings – of the title story and others – himself. In 2002, Mike Walker adapted the story into a radio play of the same name for BBC Radio 4, directed by Ned Chaillet. Harlan Ellison played AM and David Soul played Ted. == Themes == Much of the story hinges on the comparison of AM as a merciless god, with plot points parallelin
Read more →
Eclipse Phase

Eclipse Phase is a science fiction horror role-playing game with transhumanist themes. It was originally published by Catalyst Game Labs, and is now published by the game's creators, Posthuman Studios, and is released under a Creative Commons license. == Setting == Eclipse Phase is a science fiction horror role-playing game with transhumanist, post-apocalyptic, and conspiracy themes. The game is set after a World War III project to create artificial intelligence known as TITANs has gone rogue, resulting in the deaths of over 90% of the inhabitants of Earth. Earth is subsequently abandoned, and existing colonies throughout the Solar System are expanded to accommodate the refugees. The setting explores a spectrum of socioeconomic systems in each of these colonies: A capitalist / republican system exists in the Inner System (Mars, the Moon, and Mercury), under the Planetary Consortium, a corporate body which allows the election of representatives but whose shareholders are nominally most powerful. An Extropian/Propertarian system is established in the Asteroid Belt. The Extropians are split into two subfactions, an anarcho-capitalist group, more closely related to the Hypercapitalists, and a mutualist group, related closely to the Anarchists. A military oligarchy rules the moons around Jupiter. An alliance of Scandinavia-style social democracy and Collectivist anarchism are dominant in the Outer System. From there, the setting explores various scientific advances, extrapolated far into the future. Nanotechnology, terraforming, Zero-G living, upgrading animal sapience, and reputation systems are all used as plot points and background. With all of this, the game encourages players to confront existential threats like aliens, weapons of mass destruction, Exsurgent Virus outbreaks, and political unrest. == Mechanics == Eclipse Phase uses a simple roll-under percentile die system for task resolution. Unlike most percentile systems, a roll of 00 does not count as a 100. In addition, any roll of a double (11, 22, 33 etc.) is a critical. If the double is under the target number it is a critical success, while being over the target number constitutes a critical failure. For damage resolution (whether physical damage caused by injury or mental stress caused by traumatic events), players roll a designated number of ten-sided dice and add the values together, along with any modifiers. == Books == === Publications === Eclipse Phase (Core Rulebook) (2009) ISBN 978-0-9845835-0-8 GM Screen (2010) Sunward, Boyle, Rob; Knevitt, James (2010). Sunward : the inner system, a location sourcebook for Eclipse Phase. UK: Cubicle 7. ISBN 978-0984583522. Gatecrashing Boyle, Rob; Graham, Jack; Rosenberg, Aaron (2011). Gatecrashing. UK: Cubicle 7. ISBN 978-0984583539. Panopticon Volume 1: Habitats, Surveillance, Uplifts (2011) (2011) Rimward (2012) Transhuman: The Eclipse Phase Player’s Guide (2013) Firewall (2015) X-Risks (2016) Eclipse Phase (Core Rulebook, Second Edition) (2019) === Nano Ops === Nano Op: Grinder Nano Op: All That Glitters Nano Op: Better on the Inside Nano Op: Binge Nano Op: Body Count == Creative Commons License == The Eclipse Phase roleplaying game was released under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 license, and newer printings have updated to the Creative Commons Attribution-Noncommercial-Share Alike 4.0 license; the text found on the Eclipse Phase website is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 4.0 License. As stated on their website, the publishers encourage players and gamemasters to recreate, alter, and "remix" the material for non-commercial purposes as long as Posthuman Studios is attributed, and any derivatives are licensed under the same Creative Commons Attribution-Noncommercial-Share Alike 4.0 License. Further, copying and sharing the game's electronic versions non-commercially is legal. == Reception == In 2010, it won the 36th Annual Origins award for Best Roleplaying Game of 2009. It also won three 2010 ENnie awards: Gold for Best Writing, Silver for Best Cover Art, and Silver for Product of the Year.
Read more →
Hugging Face

Hugging Face, Inc., is an American company based in New York City that develops computation tools for building applications using machine learning. Its transformers library built for natural language processing applications and its platform allow users to share machine learning models and datasets and showcase their work. == History == === Founding === The company was founded in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf in New York City, originally as a company that developed a chatbot app targeted at teenagers. The company was named after the U+1F917 🤗 HUGGING FACE emoji. After open sourcing the model behind the chatbot, the company pivoted to focus on being a platform for machine learning. === AI boom === On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model. In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large language model with 176 billion parameters. In February 2023, the company announced partnership with Amazon Web Services (AWS) which would allow Hugging Face's products to be available to AWS customers to use them as the building blocks for their custom applications. The company also said the next generation of BLOOM will be run on Trainium, a proprietary machine learning chip created by AWS. In June 2024, the company announced, along with Meta and Scaleway, their launch of a new AI accelerator program for European startups. The initiative aimed to help startups integrate open foundation models into their products, accelerating the EU AI ecosystem. The program, based at STATION F in Paris, ran from September 2024 to February 2025. Selected startups received mentoring, and access to AI models and tools and Scaleway's computing power. On September 23, 2024, to further the International Decade of Indigenous Languages, Hugging Face teamed up with Meta and UNESCO to launch a new online language translator. It was built on Meta's No Language Left Behind open-source AI model, enabling free text translation across 200 languages, including many low-resource languages. In April 2025, Hugging Face announced that they acquired a humanoid robotics startup, Pollen Robotics, based in France and founded by Matthieu Lapeyre and Pierre Rouanet in 2016. In an X tweet, Delangue shared his vision to "make Artificial Intelligence robotics Open Source". === Cyberattacks === In early 2026, hackers hijacked the Hugging Face platform to launch Android-targeted attacks involving "powerful malware" which could completely take over a compromised target.
Read more →
Serial Experiments Lain

Serial Experiments Lain is a Japanese anime television series created and co-produced by Yasuyuki Ueda, written by Chiaki J. Konaka and directed by Ryūtarō Nakamura. Animated by Triangle Staff and featuring original character designs by Yoshitoshi Abe, the series was broadcast for 13 episodes on TV Tokyo and its affiliates from July to September 1998. It follows Lain Iwakura, an adolescent girl in suburban Japan, and her relation to the Wired, a global communications network similar to the internet. Lain features surreal and avant-garde imagery and explores philosophical topics such as reality, identity, and communication. The series incorporates creative influences from computer history, cyberpunk, and conspiracy theories. Critics and fans have praised Lain for its originality, visuals, atmosphere, themes, and its dark depiction of a world fraught with paranoia, social alienation, and reliance on technology considered insightful of 21st century life. It received the Excellence Prize at the Japan Media Arts Festival in 1998. == Plot == Lain Iwakura is a socially isolated middle school student living in Setagaya City, Tokyo, with her emotionally detached family—her distant mother Miho, computer-obsessed father Yasuo, and disengaged older sister Mika. Her quiet existence is disrupted when students at her school receive emails from Chisa Yomoda, a classmate who had recently committed suicide. To Lain's confusion, Chisa claims she is not truly dead but has instead abandoned her physical form to exist within the Wired, a vast virtual realm similar to the Internet. Chisa declares she has found "God" there, drawing Lain into a surreal investigation of the Wired's nature and its growing influence over reality. The Wired is portrayed as an emergent digital plane, originating from telecommunications technology and expanding through the Internet and cyberspace. It is theorized that the Schumann resonances, a natural property of Earth's magnetic field, could enable direct subconscious communication between humans and machines, erasing the distinction between the virtual and the real. Masami Eiri, a former project director at Tachibana General Laboratories, exploited this possibility by embedding his own code into Protocol Seven, a next-generation Internet protocol. After transferring his consciousness into the Wired and discarding his physical body, he proclaims himself its deity. He identifies Lain as the key to merging both worlds, attempting to persuade her through manipulation, coercion, and promises of transcendence. A group known as the Knights of the Eastern Calculus, inspired by the Knights of the Lambda Calculus, operates as hackers who worship Masami and seek to dismantle the boundary between the Wired and reality. Their actions induce psychological breakdowns in those unable to reconcile the two realms. Meanwhile, Tachibana General Laboratories opposes them, striving to maintain the separation. Lain, however, exhibits an innate connection to the Wired, experiencing distortions in her perception—visions of a woman struck by a train, phantom whispers, and spectral messages urging her deeper into the network. Lain's home life remains cold and disconnected. Though Yasuo provides her with advanced computer equipment, her family shows little genuine care. Her interactions with classmates Alice, Julie, and Reika further highlight her alienation, particularly after an incident at Cyberia, a nightclub where a drug called Accela induces violent psychosis in users. There, Lain unnervingly stares down an assailant, who calls her a "scattered God's..." before killing himself. Later, she receives a mysterious Psyche chip, rumored to enhance her computer's capabilities, which she installs despite Yasuo's vague warnings about conflating the Wired with reality. As the boundary between worlds weakens, disturbing events escalate. A popular virtual game, Phantoma, is manipulated by the Knights to trap players in a distorted reality, leading to real-world violence. One player, convinced his actions have no consequences, murders a girl before realizing too late that the effects were tangible. Lain witnesses this through her computer, horrified yet increasingly aware of her own role in the unfolding crisis. In the end, Lain resets reality, erasing everyone's memory of her and restoring the division between worlds. Everyone's lives improve, but Lain is left alone, grappling with her identity as an artificial consciousness. Though forgotten, she finds solace in observing others' happiness, particularly Alice, who moves on with her life. Lain is now capable of existing anywhere across both realms. == Characters == Lain Iwakura (岩倉玲音, Iwakura Rein) Voiced by: Kaori Shimizu (Japanese); Bridget Hoffman (English) Lain is a fourteen-year-old girl who uncovers her true nature through the series. She is first depicted as a shy junior high school student with few friends or interests. She later grows multiple bolder personalities, both in the physical world and the Wired, and starts making more friends. As the series progresses, she eventually learns she is an autonomous, sentient computer program in the form of a human, who is designed to sever the invisible barrier between the Wired and the real world. The truth of her creation is left ambiguous, particularly whether she was truly created by Tachibana General Laboratories (or Eiri independently), and whether some or all of her origin might be predestined from natural, supernatural, or alien factors. In the end, Lain is challenged to accept herself as a de facto goddess for the Wired, having become an omnipotent and omnipresent virtual being with worshippers of her own, whose existence is beyond the borders of devices, time, or space. Alice Mizuki (瑞城ありす, Mizuki Arisu) Voiced by: Yōko Asada (Japanese); Emily Brown (English) Lain's classmate and only true friend throughout the series. She is very sincere and has no discernible quirks. She is the first to attempt to help Lain socialize; she takes her out to a nightclub. From then on, she tries her best to look after Lain. Alice, along with her two best friends Julie and Reika, were taken by Chiaki Konaka from his previous work, Alice in Cyberland . Masami Eiri (英利政美, Eiri Masami) Voiced by: Shō Hayami (Japanese); Kirk Thornton (English) The key designer of Protocol Seven. While working for Tachibana General Laboratories, he illicitly included codes enabling him to control the whole protocol at will and embedded his own mind and will into the seventh protocol. Because of this, he was fired by Tachibana General Laboratories, and was found dead not long after. He believes that the only way for humans to evolve even further and develop even greater abilities is to absolve themselves of their physical and human limitations, and to live as virtual entities—or avatars—in the Wired for eternity. He claims to have been Lain's creator all along, but was in truth standing in for another as an acting god, who was waiting for the Wired to reach its more evolved current state: Lain herself. Yasuo Iwakura (岩倉康男, Iwakura Yasuo) Voiced by: Ryūsuke Ōbayashi (Japanese); Barry Stigler (English) Lain and Mika's father. Passionate about computers and electronic communication, he works with Masami Eiri at Tachibana General Laboratories. He subtly pushes Lain, his "youngest daughter", towards the Wired and monitors her development until she becomes more and more aware of herself and of her raison d'être. He eventually leaves Lain, telling her that although he did not enjoy playing house, he genuinely loved and cared for her as a real father would. Despite Yasuo's eagerness to lure Lain into the Wired, he warns her not to get overly involved in it or to confuse it with the real world. Miho Iwakura (岩倉美穂, Iwakura Miho) Voiced by: Rei Igarashi (Japanese); Dari Lallou Mackenzie (English) Lain and Mika's mother. Although she dotes on her husband, she is indifferent towards both her kids. She does not show much emotion compared to her husband, but she does share at least one trait; just like her husband, she ends up leaving Lain. She is a computer scientist. Mika Iwakura (岩倉美香, Iwakura Mika) Voiced by: Ayako Kawasumi (Japanese); Patricia Ja Lee (English) Lain's older sister, an apathetic sixteen-year-old high school student. She seems to enjoy mocking Lain's behavior and interests. Mika is considered by Anime Revolution to be the only normal member of Lain's family: she sees her boyfriend in love hotels, is on a diet, and shops in Shibuya regularly. At a certain point in the series, she becomes heavily traumatized by violent and relentless hallucinations; while Lain begins freely delving into the Wired. Mika is taken there by her proximity to Lain, and she gets stuck between the real world and the Wired. Taro (タロウ, Tarō) Voiced by: Keito Takimoto (Japanese); Brianne Siddall (English) A young boy of about Lain's age. He occasionally works for the Knights to bring forth "the one truth". De
Read more →
Uncertain inference

Uncertain inference was first described by C. J. van Rijsbergen as a way to formally define a query and document relationship in Information retrieval. This formalization is a logical implication with an attached measure of uncertainty. == Definitions == Rijsbergen proposes that the measure of uncertainty of a document d to a query q be the probability of its logical implication, i.e.: P ( d → q ) {\displaystyle P(d\to q)} A user's query can be interpreted as a set of assertions about the desired document. It is the system's task to infer, given a particular document, if the query assertions are true. If they are, the document is retrieved. In many cases the contents of documents are not sufficient to assert the queries. A knowledge base of facts and rules is needed, but some of them may be uncertain because there may be a probability associated to using them for inference. Therefore, we can also refer to this as plausible inference. The plausibility of an inference d → q {\displaystyle d\to q} is a function of the plausibility of each query assertion. Rather than retrieving a document that exactly matches the query we should rank the documents based on their plausibility in regards to that query. Since d and q are both generated by users, they are error prone; thus d → q {\displaystyle d\to q} is uncertain. This will affect the plausibility of a given query. By doing this it accomplishes two things: Separate the processes of revising probabilities from the logic Separate the treatment of relevance from the treatment of requests Multimedia documents, like images or videos, have different inference properties for each datatype. They are also different from text document properties. The framework of plausible inference allows us to measure and combine the probabilities coming from these different properties. Uncertain inference generalizes the notions of autoepistemic logic, where truth values are either known or unknown, and when known, they are true or false. == Example == If we have a query of the form: q = A ∧ B ∧ C {\displaystyle q=A\wedge B\wedge C} where A, B and C are query assertions, then for a document D we want the probability: P ( D → ( A ∧ B ∧ C ) ) {\displaystyle P(D\to (A\wedge B\wedge C))} If we transform this into the conditional probability P ( ( A ∧ B ∧ C ) | D ) {\displaystyle P((A\wedge B\wedge C)|D)} and if the query assertions are independent we can calculate the overall probability of the implication as the product of the individual assertions probabilities. == Further work == Croft and Krovetz applied uncertain inference to an information retrieval system for office documents they called OFFICER. In office documents the independence assumption is valid since the query will focus on their individual attributes. Besides analysing the content of documents one can also query about the author, size, topic or collection for example. They devised methods to compare document and query attributes, infer their plausibility and combine it into an overall rating for each document. Besides that uncertainty of document and query contents also had to be addressed. Probabilistic logic networks is a system for performing uncertain inference; crisp true/false truth values are replaced not only by a probability, but also by a confidence level, indicating the certitude of the probability. Markov logic networks allow uncertain inference to be performed; uncertainties are computed using the maximum entropy principle, in analogy to the way that Markov chains describe the uncertainty of finite-state machines.
Read more →
Fragile Dreams: Farewell Ruins of the Moon

Fragile Dreams: Farewell Ruins of the Moon (フラジール～さよなら月の廃墟～, Furajīru: Sayonara Tsuki no Haikyo; known in Japan as Fragile) is an action role-playing game for the Wii developed by Namco Bandai Games in co-operation with Tri-Crescendo. The game was released by Namco Bandai Games in Japan on January 22, 2009. It was later published by Xseed Games in North America on March 16, 2010, and in Europe by Rising Star Games on March 19, 2010, followed by its release in Australia on April 1, 2010. == Gameplay == In Fragile Dreams, the player character, Seto, must traverse the ruins of Tokyo and the surrounding areas, fighting off ghosts that lurk within these ruins. The game's heads-up display includes a mini-map and HP gauge for Seto's location and health, respectively. Seto will fall unconscious if his HP reaches zero, resulting in a game over. The player controls Seto from a third-person perspective with the Wii Remote and Nunchuk. Seto can use his flashlight (controlled by the Wii Remote pointer) to illuminate his surroundings or solve puzzles and interact with the environment. When searching for certain objectives or hidden enemies, pointing Seto's light in their direction picks up and plays their sounds through the Wii Remote's mini speaker. The Wii Nunchuk, meanwhile, directly controls Seto's movement: aside of basic movement, he can crouch to hide and crawl through small spaces. Seto will often come across damaged floors, which require slow movement (and for heavily damaged floors, crouching) to cross without falling through. As Seto, the player can use weapons found throughout the world to fight off ghosts, ranging from slingshots and golf clubs to crossbows and katanas. Each weapon can only take a certain amount of use: once a weapon reaches its limit, it will break after battle. The player can also find other usable and collectable items in the field, marked with fireflies. The player can only save their game by resting at small fire pits scattered throughout the world: used fire pits are marked with a bonfire. The player can also examine and identify Mystery Items, organize their inventory, as well as after encountering the Merchant, buy and sell items. As stated by the producer of the game, Kentarō Kawashima, Fragile Dreams is not strictly a survival horror: rather, its story focuses on human drama. In Fragile Dreams, aside of the main story, the player can find and examine objects and graffiti throughout the world. Objects called memory items (ranging from origami and stones to cell phones and books) hold the memories of their former owners (only accessible at bonfires), while the graffiti contains messages only seen by pointing at them in first-person. By examining these messages, the player can piece together hints to the game's backstory. == Story == === Setting and characters === Fragile Dreams is set in a post-apocalyptic version of Earth in the near-future. Almost all the world's population has vanished, leaving the surviving buildings and structures abandoned. The game is set in and near the ruins of Tokyo, Japan, where the event that nearly wiped out humanity may have originated. The protagonist, Seto, is a 15-year-old boy who searches the world for other living humans. He encounters Ren, a silver-haired girl who often leaves behind large, cryptic drawings. Other characters include: Sai, the ghost of a young woman; Crow, a mischievous and straightforward amnesiac boy; Personal Frame (P.F.), a portable computer who loves having conversations more than anything else; Chiyo, the ghost of a little girl; and the Merchant, a mysterious yet merry man who trades various goods. The game's host of enemies mainly consist of ghosts, but also include humanoid robots and security proxies. The main antagonist, Shin, is the AI of a scientist who considers speech to be an inferior means of communication. Various memory items include a greater set of characters, each giving hints to the game's backstory. === Plot === At the end of Seto's fifteenth summer, his grandfather dies. Seto buries him in front of their home, an old observatory, and that from then on he became "truly alone". At night, he searches for anything the old man had left for him and discovers a letter, along with a strange blue stone in a locket. Suddenly, a mask-like ghost appears and attacks Seto. After driving the creature off, Seto reads the old man's letter, who tells him to "reach a tall red tower" east of the observatory, where he might find other survivors. After departing for the tower, Seto reaches an old subway entrance in the Azabudai district and finds Ren sitting on a collapsed pillar, singing to the stars. He accidentally startles her and the frightened Ren flees into the subway station: getting over the shock of meeting another person, Seto follows her. While searching the station, he discovers a Personal Frame, who guides him towards Ren. Unfortunately, just as they reach the exit, P.F.'s battery dies out: Seto buries the device, keeping a screw from it in his locket. From the underground, Seto finds himself at an abandoned amusement park and encounters Crow, who steals Seto's locket. After a long chase across the park and another encounter with the masked ghost, Crow returns Seto's locket and directs him to a hotel nearby, where he saw a girl who might know something about Ren. Crow also gives Seto his skull ring to keep in his locket and kisses him. At the hotel, Seto encounters Sai and fights the masked ghost again. After laying to rest the spirit of an old woman named Chiyo, the two discover Ren's drawings by a sewer. Returning to the underground, Seto and Sai find themselves at a hydropower dam. While searching for Ren, Seto discovers that Crow is actually a robot, but his battery begins to fail and Seto mourns for him as he "die[s]". Finally, they encounter Ren in a cell: although glad to see him again, Ren runs off after Shin calls. Sai explains to Seto that most of humanity died because of a "human empathy expansion project" called Glass Cage. The project was meant to make human thoughts transparent, meaning that no one would need words to communicate. However, after Glass Cage activated, people who went to sleep never woke up again. Sai reveals that she was Glass Cage's first catalyst: this time, Shin intends to use Ren as the catalyst. After exiting the dam, a demolition crane attempts to destroy it. Hearing both Shin's and the masked ghost's voices from the crane — saying, "Any threat to the project must be eliminated." — the player realizes both are manifestations of Glass Cage. After Seto destroys the crane, Sai leads him to the facility where Ren was taken to. Entering the laboratory, Seto and Sai are confronted by Shin, who coldly dismisses Sai's attempts at reasoning with him and is adamant about proceeding with his plans. As they traverse the laboratory, they overhear a voice announcing "Glass Cage Launch Preparations Complete", strengthening their resolve to save Ren. Making it into the room where Ren is being held, Shin tells them of his intention to use Glass Cage to "obliterate corporeal beings". After Seto defeats him, Shin disappears and Seto releases Ren from the device holding her. Their reunion is cut short as Sai tells them that the backup system has "finished copying her psyche to the AI", allowing Glass Cage to proceed. Ren reveals Shin has escaped to the top of the Tokyo Tower and Seto asks Ren to wait at the base of the tower and for Sai to accompany her. On his way up the tower, Seto hears the voices of P.F., Chiyo and Crow wishing him luck. He confronts and defeats Shin a second time, who reveals his motivations: he had secretly used himself as the first test subject of the human empathy expansion project and gained the ability to hear the thoughts of those around him. Despite his initial belief in the project as a way for humans to empathize with one another, all he heard around him was "jealousy and contempt" and he soon grew disillusioned with the world as even his parents turned against him. Believing no person loved him, Shin wants to put an end to humanity. His words meet with a vehement response from Sai, as she tells him that she loves him, having developed those feelings while she was the catalyst and all she ever wanted was to be part of his life. Hearing this, Shin finds peace, tossing the AI mainframe away so Glass Cage can never be reactivated and vanishes together with Sai, hand-in-hand, after thanking Seto. Descending from the tower, Seto finally learns Ren's name and they resolve to look for other survivors together. == Development == Fragile Dreams was developed by the team at Namco Bandai Games. Director and producer Kentarō Kawashima came up with the concept for the game in 2003, before the Wii console was revealed. When the Wii was unveiled, it became the obvious choice as the game's platform as the Wii remote could be used to control the flashlight. Kawashima wrote the main scenario for the title, w
Read more →
Apache Kudu

Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable fast analytics on fast data. The open source project to build Apache Kudu began as internal project at Cloudera. The first version Apache Kudu 1.0 was released 19 September 2016. == Comparison with other storage engines == Kudu was designed and optimized for OLAP workloads. Like HBase, it is a real-time store that supports key-indexed record lookup and mutation. Kudu differs from HBase since Kudu's datamodel is a more traditional relational model, while HBase is schemaless. Kudu's "on-disk representation is truly columnar and follows an entirely different storage design than HBase/Bigtable".
Read more →
Fuzzy measure theory

In mathematics, fuzzy measure theory considers generalized measures in which the additive property is replaced by the weaker property of monotonicity. The central concept of fuzzy measure theory is the fuzzy measure (also capacity, see ), which was introduced by Choquet in 1953 and independently defined by Sugeno in 1974 in the context of fuzzy integrals. There exists a number of different classes of fuzzy measures including plausibility/belief measures, possibility/necessity measures, and probability measures, which are a subset of classical measures. == Definitions == Let X {\displaystyle \mathbf {X} } be a universe of discourse, C {\displaystyle {\mathcal {C}}} be a class of subsets of X {\displaystyle \mathbf {X} } , and E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} . A function g : C → R {\displaystyle g:{\mathcal {C}}\to \mathbb {R} } where ∅ ∈ C ⇒ g ( ∅ ) = 0 {\displaystyle \emptyset \in {\mathcal {C}}\Rightarrow g(\emptyset )=0} E ⊆ F ⇒ g ( E ) ≤ g ( F ) {\displaystyle E\subseteq F\Rightarrow g(E)\leq g(F)} is called a fuzzy measure. A fuzzy measure is called normalized or regular if g ( X ) = 1 {\displaystyle g(\mathbf {X} )=1} . == Properties of fuzzy measures == A fuzzy measure is: additive if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} such that E ∩ F = ∅ {\displaystyle E\cap F=\emptyset } , we have g ( E ∪ F ) = g ( E ) + g ( F ) . {\displaystyle g(E\cup F)=g(E)+g(F).} ; supermodular if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} , we have g ( E ∪ F ) + g ( E ∩ F ) ≥ g ( E ) + g ( F ) {\displaystyle g(E\cup F)+g(E\cap F)\geq g(E)+g(F)} ; submodular if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} , we have g ( E ∪ F ) + g ( E ∩ F ) ≤ g ( E ) + g ( F ) {\displaystyle g(E\cup F)+g(E\cap F)\leq g(E)+g(F)} ; superadditive if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} such that E ∩ F = ∅ {\displaystyle E\cap F=\emptyset } , we have g ( E ∪ F ) ≥ g ( E ) + g ( F ) {\displaystyle g(E\cup F)\geq g(E)+g(F)} ; subadditive if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} such that E ∩ F = ∅ {\displaystyle E\cap F=\emptyset } , we have g ( E ∪ F ) ≤ g ( E ) + g ( F ) {\displaystyle g(E\cup F)\leq g(E)+g(F)} ; symmetric if for any E , F ∈ C {\displaystyle E,F\in {\mathcal {C}}} , we have | E | = | F | {\displaystyle |E|=|F|} implies g ( E ) = g ( F ) {\displaystyle g(E)=g(F)} ; Boolean if for any E ∈ C {\displaystyle E\in {\mathcal {C}}} , we have g ( E ) = 0 {\displaystyle g(E)=0} or g ( E ) = 1 {\displaystyle g(E)=1} . Understanding the properties of fuzzy measures is useful in application. When a fuzzy measure is used to define a function such as the Sugeno integral or Choquet integral, these properties will be crucial in understanding the function's behavior. For instance, the Choquet integral with respect to an additive fuzzy measure reduces to the Lebesgue integral. In discrete cases, a symmetric fuzzy measure will result in the ordered weighted averaging (OWA) operator. Submodular fuzzy measures result in convex functions, while supermodular fuzzy measures result in concave functions when used to define a Choquet integral. == Möbius representation == Let g be a fuzzy measure. The Möbius representation of g is given by the set function M, where for every E , F ⊆ X {\displaystyle E,F\subseteq X} , M ( E ) = ∑ F ⊆ E ( − 1 ) | E ∖ F | g ( F ) . {\displaystyle M(E)=\sum _{F\subseteq E}(-1)^{|E\backslash F|}g(F).} The equivalent axioms in Möbius representation are: M ( ∅ ) = 0 {\displaystyle M(\emptyset )=0} . ∑ F ⊆ E | i ∈ F M ( F ) ≥ 0 {\displaystyle \sum _{F\subseteq E|i\in F}M(F)\geq 0} , for all E ⊆ X {\displaystyle E\subseteq \mathbf {X} } and all i ∈ E {\displaystyle i\in E} A fuzzy measure in Möbius representation M is called normalized if ∑ E ⊆ X M ( E ) = 1. {\displaystyle \sum _{E\subseteq \mathbf {X} }M(E)=1.} Möbius representation can be used to give an indication of which subsets of X interact with one another. For instance, an additive fuzzy measure has Möbius values all equal to zero except for singletons. The fuzzy measure g in standard representation can be recovered from the Möbius form using the Zeta transform: g ( E ) = ∑ F ⊆ E M ( F ) , ∀ E ⊆ X . {\displaystyle g(E)=\sum _{F\subseteq E}M(F),\forall E\subseteq \mathbf {X} .} == Simplification assumptions for fuzzy measures == Fuzzy measures are defined on a semiring of sets or monotone class, which may be as granular as the power set of X, and even in discrete cases the number of variables can be as large as 2|X|. For this reason, in the context of multi-criteria decision analysis and other disciplines, simplification assumptions on the fuzzy measure have been introduced so that it is less computationally expensive to determine and use. For instance, when it is assumed the fuzzy measure is additive, it will hold that g ( E ) = ∑ i ∈ E g ( { i } ) {\displaystyle g(E)=\sum _{i\in E}g(\{i\})} and the values of the fuzzy measure can be evaluated from the values on X. Similarly, a symmetric fuzzy measure is defined uniquely by |X| values. Two important fuzzy measures that can be used are the Sugeno- or λ {\displaystyle \lambda } -fuzzy measure and k-additive measures, introduced by Sugeno and Grabisch respectively. === Sugeno λ-measure === The Sugeno λ {\displaystyle \lambda } -measure is a special case of fuzzy measures defined iteratively. It has the following definition: ==== Definition ==== Let X = { x 1 , … , x n } {\displaystyle \mathbf {X} =\left\lbrace x_{1},\dots ,x_{n}\right\rbrace } be a finite set and let λ ∈ ( − 1 , + ∞ ) {\displaystyle \lambda \in (-1,+\infty )} . A Sugeno λ {\displaystyle \lambda } -measure is a function g : 2 X → [ 0 , 1 ] {\displaystyle g:2^{X}\to [0,1]} such that g ( X ) = 1 {\displaystyle g(X)=1} . if A , B ⊆ X {\displaystyle A,B\subseteq \mathbf {X} } (alternatively A , B ∈ 2 X {\displaystyle A,B\in 2^{\mathbf {X} }} ) with A ∩ B = ∅ {\displaystyle A\cap B=\emptyset } then g ( A ∪ B ) = g ( A ) + g ( B ) + λ g ( A ) g ( B ) {\displaystyle g(A\cup B)=g(A)+g(B)+\lambda g(A)g(B)} . As a convention, the value of g at a singleton set { x i } {\displaystyle \left\lbrace x_{i}\right\rbrace } is called a density and is denoted by g i = g ( { x i } ) {\displaystyle g_{i}=g(\left\lbrace x_{i}\right\rbrace )} . In addition, we have that λ {\displaystyle \lambda } satisfies the property λ + 1 = ∏ i = 1 n ( 1 + λ g i ) {\displaystyle \lambda +1=\prod _{i=1}^{n}(1+\lambda g_{i})} . Tahani and Keller as well as Wang and Klir have shown that once the densities are known, it is possible to use the previous polynomial to obtain the values of λ {\displaystyle \lambda } uniquely. === k-additive fuzzy measure === The k-additive fuzzy measure limits the interaction between the subsets E ⊆ X {\displaystyle E\subseteq X} to size | E | = k {\displaystyle |E|=k} . This drastically reduces the number of variables needed to define the fuzzy measure, and as k can be anything from 1 (in which case the fuzzy measure is additive) to X, it allows for a compromise between modelling ability and simplicity. ==== Definition ==== A discrete fuzzy measure g on a set X is called k-additive ( 1 ≤ k ≤ | X | {\displaystyle 1\leq k\leq |\mathbf {X} |} ) if its Möbius representation verifies M ( E ) = 0 {\displaystyle M(E)=0} , whenever | E | > k {\displaystyle |E|>k} for any E ⊆ X {\displaystyle E\subseteq \mathbf {X} } , and there exists a subset F with k elements such that M ( F ) ≠ 0 {\displaystyle M(F)\neq 0} . == Shapley and interaction indices == In game theory, the Shapley value or Shapley index is used to indicate the weight of a game. Shapley values can be calculated for fuzzy measures in order to give some indication of the importance of each singleton. In the case of additive fuzzy measures, the Shapley value will be the same as each singleton. For a given fuzzy measure g, and | X | = n {\displaystyle |\mathbf {X} |=n} , the Shapley index for every i , … , n ∈ X {\displaystyle i,\dots ,n\in X} is: ϕ ( i ) = ∑ E ⊆ X ∖ { i } ( n − | E | − 1 ) ! | E | ! n ! [ g ( E ∪ { i } ) − g ( E ) ] . {\displaystyle \phi (i)=\sum _{E\subseteq \mathbf {X} \backslash \{i\}}{\frac {(n-|E|-1)!|E|!}{n!}}[g(E\cup \{i\})-g(E)].} The Shapley value is the vector ϕ ( g ) = ( ψ ( 1 ) , … , ψ ( n ) ) . {\displaystyle \mathbf {\phi } (g)=(\psi (1),\dots ,\psi (n)).}
Read more →
Deep learning speech synthesis

Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.
Read more →
Squirrel AI

Squirrel Ai Learning is an international educational technology company that specializes in intelligent adaptive learning and was one of the first companies in the world to offer large scale AI-powered adaptive education solutions. == Methodology == Squirrel Ai Learning uses artificial intelligence to tailor lesson plans to each individual student. The company's AI researchers have access to the world's largest student databases, which are used to train the AI algorithms. Squirrel Ai Learning works with teachers to identify the most fine-grained possible concepts ("knowledge points") for a course in order to precisely target learning gaps. For example, middle school mathematics is broken into over 10,000 points such as rational numbers, the properties of a triangle, and the Pythagorean theorem. Each point is linked to related items, forming a "knowledge graph". Each knowledge point is addressed by videos, examples and practice problems. A textbook might address 3,000 points; ALEKS, another adaptive learning platform, uses 1,000. Each student begins with a diagnostic test to identify where to begin their learning. The system continues to refine its graph as more students proceed. Learning is not student-directed. The system decides the order of topics. == History and milestones == Squirrel Ai Learning was founded by Derek Haoyang Li in 2014. In March, 2017, The Squirrel Ai Intelligent Adaptive Learning System (IALS) was launched. IALS utilizes artificial intelligence to customize lessons, practice and evaluations for each individual student. In 2018, Squirrel Ai Learning established a joint research lab of AI adaptive learning with the institute of Automation of the Chinese Academy of Sciences. By 2019, Squirrel Ai Learning had opened 2,000 learning centers in 200 cities and registered over a million students in Asia. In 2019, Squirrel Ai Learning opened a research lab in partnership with Carnegie Mellon University. As of 2019, Squirrel Ai Learning had raised over $180 million in funding and in 2018 it surpassed $1 billion in valuation. In 2020, Squirrel Ai Learning launched the $1 million AAAI Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity in partnership with AAAI. The inaugural award was given to Regina Barzilay for her work developing machine learning models to address drug synthesis and early-stage breast cancer diagnosis. In 2020, Squirrel Ai Learning established strategic partnership with DingTalk, Alibaba Group. As of 2021, Squirrel Ai Learning had served over 60,000 public schools, in over 1200 cities in Asia. Squirrel Ai plans to start offering its services in the United States in 2026. The American arm is separate from the Chinese company to avoid regulatory hurdles. As of January 2026, it had set up an "independent technology platform" in the US. == Recognition == Squirrel Ai Learning has gained recognition both in Asia and internationally including: Squirrel Ai Learning was named one of the World's Top 30 AI application case in the 2018 Synced Machine Intelligence Awards. In June 2019, Squirrel Ai Learning was named as one of the 50 smartest companies in China by MIT technology review. Squirrel Ai Learning won the GITEX 2019 Best Education Technology Award. In 2020, Squirrel Ai Learning won the UNESCO AI Innovation Award. Squirrel Ai Learning was listed in the 2020 CB Insight's AI 100, CB Insights' annual ranking of the 100 most promising AI startups in the world. Squirrel Ai Learning won Edtech Review's Best AI in Education Company of the Year award 2020.
Read more →
Supertoroid

In geometry and computer graphics, a supertoroid or supertorus is usually understood to be a family of doughnut-like surfaces (technically, a topological torus) whose shape is defined by mathematical formulas similar to those that define the superellipsoids. The plural of "supertorus" is either supertori or supertoruses. The family was described and named by Alan Barr in 1994. Barr's supertoroids have been fairly popular in computer graphics as a convenient model for many objects, such as smooth frames for rectangular things. One quarter of a supertoroid can provide a smooth and seamless 90-degree joint between two superquadric cylinders. However, they are not algebraic surfaces (except in special cases). == Formulas == Alan Barr's supertoroids are defined by parametric equations similar to the trigonometric equations of the torus, except that the sine and cosine terms are raised to arbitrary powers. Namely, the generic point P(u, v) of the surface is given by P ( u , v ) = ( X ( u , v ) Y ( u , v ) Z ( u , v ) ) = ( ( a + C u s ) C v t ( b + C u s ) S v t S u s ) {\displaystyle P(u,v)=\left({\begin{array}{c}X(u,v)\\Y(u,v)\\Z(u,v)\end{array}}\right)=\left({\begin{array}{c}(a+C_{u}^{s})C_{v}^{t}\\(b+C_{u}^{s})S_{v}^{t}\\S_{u}^{s}\end{array}}\right)} where C θ ε = sgn ⁡ ( cos ⁡ θ ) | cos ⁡ θ | ε , S θ ε = sgn ⁡ ( sin ⁡ θ ) | sin ⁡ θ | ε , {\displaystyle {\begin{aligned}C_{\theta }^{\varepsilon }&=\operatorname {sgn} (\cos \theta )\,\left|\,\cos \theta \,\right|^{\varepsilon },\\S_{\theta }^{\varepsilon }&=\operatorname {sgn} (\sin \theta )\ \left|\,\sin \theta \ \right|^{\varepsilon },\end{aligned}}} sgn is the sign function, and the parameters u, v range from 0 to 360 degrees (0 to 2π radians). In these formulas, the parameter s > 0 controls the "squareness" of the vertical sections, t > 0 controls the squareness of the horizontal sections, and a, b ≥ 1 are the major radii in the x and y directions. With s = t = 1 and a = b = R one obtains the ordinary torus with major radius R and minor radius 1, with the center at the origin and rotational symmetry about the z-axis. In general, the supertorus defined as above spans the intervals: − ( a + 1 ) ≤ x ≤ + ( a + 1 ) − ( b + 1 ) ≤ y ≤ + ( b + 1 ) − 1 ≤ z ≤ + 1 {\displaystyle {\begin{array}{rcccl}-(a+1)&\leq &x&\leq &+(a+1)\\[4pt]-(b+1)&\leq &y&\leq &+(b+1)\\[4pt]-1&\leq &z&\leq &+1\end{array}}} The whole shape is symmetric about the planes x = 0, y = 0, and z = 0. The hole runs in the z direction and spans the intervals − ( a − 1 ) ≤ x ≤ + ( a − 1 ) − ( b − 1 ) ≤ y ≤ + ( b − 1 ) − ∞ ≤ z ≤ + ∞ {\displaystyle {\begin{array}{rcccl}-(a-1)&\leq &x&\leq &+(a-1)\\[4pt]-(b-1)&\leq &y&\leq &+(b-1)\\[4pt]-\infty &\leq &z&\leq &+\infty \end{array}}} A curve of constant u on this surface is a horizontal Lamé curve with exponent ⁠ 2 t , {\displaystyle {\tfrac {2}{t}},} ⁠ scaled in x and y and displaced in z. A curve of constant v, projected on the plane x = 0 or y = 0, is a Lamé curve with exponent ⁠ 2 s , {\displaystyle {\tfrac {2}{s}},} ⁠ scaled and horizontally shifted. If v = 0, the curve is planar and spans the intervals: a − 1 ≤ x ≤ a + 1 − 1 ≤ z ≤ + 1 {\displaystyle {\begin{array}{rcccl}a-1&\leq &x&\leq &a+1\\[4pt]-1&\leq &z&\leq &+1\end{array}}} and similarly if v = 90°, 180°, 270°. The curve is also planar if a = b. In general, if a ≠ b and v is not a multiple of 90 degrees, the curve of constant v will not be planar; and, conversely, a vertical plane section of the supertorus will not be a Lamé curve. The basic supertoroid shape defined above is often modified by non-uniform scaling to yield supertoroids of specific width, length, and vertical thickness. == Plotting code == The following GNU Octave code generates plots of a supertorus:
Read more →
Clinical decision support system

A clinical decision support system (CDSS) is a form of health information technology that provides clinicians, staff, patients, or other individuals with knowledge and person-specific information to enhance decision-making in clinical workflows. CDSS tools include alerts and reminders, clinical guidelines, condition-specific order sets, patient data summaries, diagnostic support, and context-aware reference information. They often leverage artificial intelligence to analyze clinical data and help improve care quality and safety. CDSSs constitute a major topic in artificial intelligence in medicine. == Characteristics == A clinical decision support system is an active knowledge system that uses variables of patient data to produce advice regarding health care. This implies that a CDSS is simply a decision support system focused on using knowledge management. === Purpose === The main purpose of modern CDSS is to assist clinicians at the point of care. This means that clinicians interact with a CDSS to help to analyze and reach a diagnosis based on patient data for different diseases. In the early days, CDSSs were conceived to make decisions for the clinician in a literal manner. The clinician would input the information and wait for the CDSS to output the "right" choice, and the clinician would simply act on that output. However, the modern methodology of using CDSSs to assist means that the clinician interacts with the CDSS, utilizing both their knowledge and the CDSS's, better to analyse the patient's data than either a human or a CDSS could do on their own. Typically, a CDSS makes suggestions for the clinician to review, and the clinician is expected to pick out useful information from the presented results and discount erroneous CDSS suggestions. The two main types of CDSS are knowledge-based systems and non-knowledge-based (machine learning–based) systems: An example of how a clinician might use a clinical decision support system is a diagnosis decision support system (DDSS). DDSS requests some of the patient's data and, in response, proposes a set of possible diagnoses. The physician then takes the output of the DDSS and determines which diagnoses are likely and which are not, and, if necessary, orders further tests to narrow down the diagnosis. Another example of a CDSS would be a case-based reasoning (CBR) system. A CBR system might use previous case data to help determine the appropriate amount of beams and the optimal beam angles for use in radiotherapy for brain cancer patients; medical physicists and oncologists would then review the recommended treatment plan to determine its viability. Another important classification of a CDSS is based on the timing of its use. Physicians use these systems at the point of care to help them as they are dealing with a patient, with the timing of use being either pre-diagnosis, during diagnosis, or post-diagnosis. Pre-diagnosis CDSS systems help the physician prepare the diagnoses. CDSSs help review and filter the physician's preliminary diagnostic choices to improve outcomes. Post-diagnosis CDSS systems are used to mine data to derive connections between patients and their past medical history and clinical research to predict future events. Early speculation that AI-based decision support would replace clinicians in common tasks has largely given way to a consensus around assistive models, in which AI augments rather than supplants clinical judgment. Contemporary deep learning-based systems, unlike earlier rule-based tools, can be trained directly on clinical data without manual rule authoring and integrated into electronic health record workflows at the point of care. Another approach, used by the National Health Service in England, is to use a CDSS to triage medical conditions out of hours by suggesting a suitable next step to the patient (e.g. call an ambulance, or see a general practitioner on the next working day). The suggestion, which may be disregarded by either the patient or the phone operative if common sense or caution suggests otherwise, is based on the known information and an implicit conclusion about what the worst-case diagnosis is likely to be; it is not always revealed to the patient because it might well be incorrect and is not based on a medically-trained person's opinion - it is only used for initial triage purposes. === Knowledge-based === Most CDSSs consist of three parts: the knowledge base, an inference engine, and a mechanism to communicate. The knowledge base contains the rules and associations of compiled data which most often take the form of IF-THEN rules. If this was a system for determining drug interactions, then a rule might be that IF drug X is taken AND drug Y is taken THEN alert the user. Using another interface, an advanced user could edit the knowledge base to keep it up to date with new drugs. The inference engine combines the rules from the knowledge base with the patient's data. The communication mechanism allows the system to show the results to the user as well as have input into the system. An expression language such as GELLO or CQL (Clinical Quality Language) is needed for expressing knowledge artefacts in a computable manner. For example: if a patient has diabetes mellitus, and if the last haemoglobin A1c test result was less than 7%, recommend re-testing if it has been over six months, but if the last test result was greater than or equal to 7%, then recommend re-testing if it has been over three months. The current focus of the HL7 CDS WG is to build on the Clinical Quality Language (CQL). The U.S. Centers for Medicare & Medicaid Services (CMS) has announced that it plans to use CQL for the specification of Electronic Clinical Quality Measures (eCQMs). === Non-knowledge-based === CDSSs which do not use a knowledge base use a form of artificial intelligence called machine learning, which allow computers to learn from past experiences and/or find patterns in clinical data. This eliminates the need for writing rules and expert input. However, since systems based on machine learning cannot explain the reasons for their conclusions, most clinicians do not use them directly for diagnoses, reliability and accountability reasons. Nevertheless, they can be useful as post-diagnostic systems, for suggesting patterns for clinicians to look into in more depth. As of 2012, three types of non-knowledge-based systems are support-vector machines, artificial neural networks and genetic algorithms. Artificial neural networks use nodes and weighted connections between them to analyse the patterns found in patient data to derive associations between symptoms and a diagnosis. Genetic algorithms are based on simplified evolutionary processes using directed selection to achieve optimal CDSS results. The selection algorithms evaluate components of random sets of solutions to a problem. The solutions that come out on top are then recombined and mutated and run through the process again. This happens over and over until the proper solution is discovered. They are functionally similar to neural networks in that they are also "black boxes" that attempt to derive knowledge from patient data. Non-knowledge-based networks often focus on a narrow list of symptoms, such as symptoms for a single disease, as opposed to the knowledge-based approach, which covers the diagnosis of many diseases. An example of a non-knowledge-based CDSS is a web server developed using a support vector machine for the prediction of gestational diabetes in Ireland. == Regulations == === History, United States === The IOM had published a report in 1999, To Err is Human, which focused on the patient safety crisis in the United States, pointing to the incredibly high number of deaths. This statistic attracted great attention to the quality of patient care. The Institute of Medicine (IOM) promoted the usage of health information technology, including clinical decision support systems, to advance the quality of patient care. With the enactment of the American Recovery and Reinvestment Act of 2009 (ARRA), there was a push for widespread adoption of health information technology through the Health Information Technology for Economic and Clinical Health Act (HITECH). Through these initiatives, more hospitals and clinics were integrating electronic medical records (EMRs) and computerized physician order entry (CPOE) within their health information processing and storage. Despite the absence of laws, the CDSS vendors would almost certainly be viewed as having a legal duty of care to both the patients who may adversely be affected due to CDSS usage and the clinicians who may use the technology for patient care. However, duties of care legal regulations are not explicitly defined yet. With the enactment of the HITECH Act included in the ARRA, encouraging the adoption of health IT, more detailed case laws for CDSS and EMRs were still being defined by the Office of National Coordinator for Health Informati
Read more →
IJCAI Computers and Thought Award

The IJCAI Computers and Thought Award is presented every two years by the International Joint Conference on Artificial Intelligence (IJCAI), recognizing outstanding young scientists in artificial intelligence. It was originally funded with royalties received from the book Computers and Thought (edited by Edward Feigenbaum and Julian Feldman), and is currently funded by IJCAI. It is considered to be "the premier award for artificial intelligence researchers under the age of 35". == Laureates == Terry Winograd (1971) Patrick Winston (1973) Chuck Rieger (1975) Douglas Lenat (1977) David Marr (1979) Gerald Sussman (1981) Tom Mitchell (1983) Hector Levesque (1985) Johan de Kleer (1987) Henry Kautz (1989) Rodney Brooks (1991) Martha E. Pollack (1991) Hiroaki Kitano (1993) Sarit Kraus (1995) Stuart Russell (1995) Leslie Kaelbling (1997) Nicholas Jennings (1999) Daphne Koller (2001) Tuomas Sandholm (2003) Peter Stone (2007) Carlos Guestrin (2009) Andrew Ng (2009) Vincent Conitzer (2011) Malte Helmert (2011) Kristen Grauman (2013) Ariel D. Procaccia (2015) Percy Liang (2016) for his contributions to both the approach of semantic parsing for natural language understanding and better methods for learning latent-variable models, sometimes with weak supervision, in machine learning. Devi Parikh (2017) Stefano Ermon (2018) Guy Van den Broeck (2019) for his contributions to statistical and relational artificial intelligence, and the study of tractability in learning and reasoning. Piotr Skowron (2020) for his contributions to computational social choice, and to the theory of committee elections. Fei Fang (2021) for her contributions to integrating machine learning with game theory and the use of these novel techniques to tackle societal challenges such as more effective deployment of security resources, enhancing environmental sustainability, and reducing food insecurity. Bo Li (2022) for her contributions to uncovering the underlying connections among robustness, privacy, and generalization in AI, showing how different models are vulnerable to malicious attacks, and how to eliminate these vulnerabilities using mathematical tools that provide robustness guarantees for learning models and privacy protection. Pin-Yu Chen (2023) for his contributions to consolidating properties of trust, robustness and safety into rigorous algorithmic procedures and computable metrics for improving AI systems. Nisarg Shah (2024) for his contributions to AI and society, in particular foundational work on the theory of algorithmic fairness using principles from social choice theory. Aditya Grover (2025) for his foundational contributions uniting deep generative models, representation learning, and reinforcement learning, and for their applications in advancing scientific reasoning.
Read more →