AI Chatbot Zendesk

AI Chatbot Zendesk — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • MoltenVK

    MoltenVK

    MoltenVK is a software library which allows Vulkan applications to run on top of Metal on Apple's macOS, iOS, and tvOS operating systems. It is the first software component to be released for the Vulkan Portability Initiative, a project to have a subset of Vulkan run on platforms lacking native Vulkan drivers. There are some limitations compared with a native Vulkan implementation. == History == MoltenVK was first released as a proprietary and commercially licensed product by The Brenwill Workshop on July 27, 2016. On July 31, 2017, Khronos announced the formation of the Vulkan Portability Technical Subgroup. === Open source === On February 26, 2018, Khronos announced that Vulkan became available on macOS and iOS products through the MoltenVK library. Valve announced that Dota 2 will run on macOS using the Vulkan API with the aid of MoltenVK, and that they had made an arrangement with developer The Brenwill Workshop Ltd to release MoltenVK as open-source software under the Apache License version 2.0. On May 30, 2018, Qt was updated with Vulkan for Qt on macOS using MoltenVK. On May 31, 2018, optional Vulkan support for Dota 2 on macOS was released. Benchmarks for the game were available the following day, showing better performance using Vulkan and MoltenVK compared to OpenGL. On July 20, 2018, Wine was updated with Vulkan support on macOS using MoltenVK. On 29 July 2018, the first app using MoltenVK was accepted onto the App Store, after initially being rejected. On 6 August 2018, Google open-sourced Filament, a crossplatform real-time physically based rendering engine with MoltenVK for macOS/iOS. On November 28, 2018, Valve released Artifact, their first Vulkan-only game on macOS using MoltenVK. === Version 1.0 === On 29 January 2019, MoltenVK 1.0.32 was released with early prototype of Vulkan Portability Extensions. RPCS3 and Dolphin emulators were updated with Vulkan support on macOS using MoltenVK. On 13 April 2019, MoltenVK 1.0.34 was released with support for tessellation. On July 30, 2019, MoltenVK 1.0.36 was released targeting Metal 3.0. On July 31, 2020, MoltenVK 1.0.44 was released, adding support for the tvOS platform. On January 23, 2020, MoltenVK was updated to support for some of the new features of Vulkan 1.2, as of Vulkan SDK 1.2.121. === Version 1.1 === On October 1, 2020, MoltenVK 1.1.0 was released, adding full support for Vulkan 1.1, as of Vulkan SDK 1.2.154. On 9 December 2020, MoltenVK 1.1.1 was released, providing support for Vulkan on Apple silicon GPUs and support for the Mac Catalyst platform for porting iOS/iPadOS apps to macOS. === Version 1.2 === On October 18, 2022, MoltenVK 1.2.0 was released, adding full support for Vulkan 1.2 as of Vulkan SDK 1.3.231. In January 2023, MoltenVK 1.2.2 added support for Vulkan as of SDK 1.3.239, while this version of Vulkan SDK fixed some issues with the interconnectivity with Metal API, while version 1.2.3 supported some additional extensions. === Version 1.3 === On May 1, 2025, MoltenVK 1.3 was released with support for Vulkan 1.3. === Version 1.4 === On August 20, 2025, MoltenVK 1.4 was released with support for Vulkan 1.4.

    Read more →
  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →
  • Deep Instinct

    Deep Instinct

    Deep Instinct is a cybersecurity company that applies deep learning to cybersecurity. The company implements artificial intelligence to the task of preventing and detecting malware. The company was the recipient of the Technology Pioneer by The World Economic Forum in 2017. Lane Bess has been CEO of the company since 2022. == Overview == In 2015, Deep Instinct was founded by Guy Caspi, Dr. Eli David, and Nadav Maman. The headquarters of the company is located in New York City. In July 2017, NVIDIA became an investor. According to Tom's Hardware, NVIDIA’s investment enabled access to a GPU-based neural network and CUDA platform, which they were using to achieve maximum vulnerability detection rates. As of February 2020, the company had raised $43 million in Series C funding round. In April 2021, Deep Instinct raised $100 million in Series D funding to accelerate growth. == Partnerships == In April 2019, Deep Instinct partnered with Chinese artist, Guo O. Dong on an art project titled, The Persistence of Chaos, consisting of a laptop infected with 6 pieces of malware that represented $95 billion in damages. The art was auctioned with a final bid of $1,345,000. In the same year, Globes reported that, HP Inc partnered with Deep Instinct to launch their security solution HP SureSense, which has been applied to the EliteBook and Zbook devices.

    Read more →
  • Kdan Mobile

    Kdan Mobile

    Kdan Mobile Software Limited is a software application development company based in Tainan City, Taiwan. Kdan also has branches in Taipei, Changsha, Irvine, California, Japan, and South Korea. The company was founded in 2009 by Kenny Su, the company's CEO. == History == Kdan Mobile was founded in 2009 by Kenny Su (蘇柏州) and develops an application for PDF documents. Su previously worked at the Industrial Technology Research Institute (ITRI) . In 2018, the company completed its Series B round of fundraising, in which it raised 16 million USD in total. Four global firms, Dattoz Partners (South Korea), WI Harper Group (U.S.), Taiwania Capital (Taiwan), and Golden Asia Fund Mitsubishi UFJ Capital (Japan), made up the Series B investment. Kdan previously raised 5 million USD in its Series A round in 2018.

    Read more →
  • Gas (app)

    Gas (app)

    Gas (sometimes stylized in all caps), formerly known as Melt as well as Crush, was an American anonymous social media app. Launched in August 2022, the app is oriented towards high schoolers. The app was developed by Nikita Bier, Isaiah Turner, and former Facebook engineer Dave Schatz. Gas was largely based upon the prior tbh app developed by co-founder Nikita Bier, along with Erik Hazzard, Kyle Zaragoza, and Nicolas Ducdodon in September 2017. tbh was acquired by Facebook inc. (now Meta Platforms) on October 16, 2017, and nearly a year later in July 2018 was dissolved, owing to low usage. Gas follows a similar purpose to tbh in being a social media app oriented towards high schoolers. In the app, users participate in anonymous polls regarding pre-written complimentary statements to their peers, such as "I'd say yes if (blank) asked me out on a date," "I think (blank) is the coolest kid in school," or "would make an ugly face and still look pretty." Winners of said polls receive a "flame." The name of the app is derived from this, with "gassing someone up" being Gen Z slang for complimenting someone. Users can pay a $6.99 subscription that enables "God Mode," which shows hints regarding who voted for them in a poll. Gas overtook TikTok and BeReal as the most downloaded app on the Apple App Store in October 2022 (the app is currently not available for Android). The app has over 5.1 million downloads as of early November 2022, over a million active users and 300 thousand daily downloads as of October 2022. Currently, the app is available in Canada and the majority of the United States. On January 17, 2023, Gas was acquired by Discord, however it would remain a standalone app and its developers became Discord staff members. On October 18, 2023, Discord announced that service for Gas would be permanently ending effective November 7, 2023, due to a steep decline in users. Effective November 7, the app became completely unusable. == Controversy regarding human-trafficking == Beginning in October 2022, rumors spread largely throughout TikTok and Snapchat alleged that the app was linked to human trafficking (in particular sex trafficking). According to Bier, the rumor originated with a single user review from China on October 5, and then was disseminated through TikTok accounts with "few to no US teen followers." Although largely dismissed as a hoax by experts, who cite how the app doesn't log user locations and general anonymity, the hoax became pervasive to the extent that various police departments, school systems, and local news outlets began issuing warnings regarding the app. For instance, on October 31, 2022, the police department of Piedmont, Oklahoma issued a warning to parents, encouraging them to check their children's phones, while on November 3, the Oklahoma Oktaha Public School system stated in a Facebook post that "Children are being kidnapped in other towns and this new app is thought to be the source of predators finding their location." (both statements have since been retracted by Police Chief Scott Singer and Superintendent Jerry Needham respectively). Additionally, local medial outlets such as KOCO in Oklahoma City ran stories making similar statements. The rumor had a negative impact on the app, with downloads plateauing for a two-week period in late October and with 3% of users in a single day reportedly uninstalling the app. Revenue and ratings have also reportedly dropped and the company's social media accounts have been bombarded with comments labeling them as sex-traffickers. Additionally, the four-person development team has reportedly been bombarded with various death threats as a result.

    Read more →
  • Negobot

    Negobot

    Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

    Read more →
  • MegaHAL

    MegaHAL

    MegaHAL is a computer conversation simulator, or "chatterbot", created by Jason Hutchens. == Background == In 1996, Jason Hutchens entered the Loebner Prize Contest with HeX, a chatterbot based on ELIZA. HeX won the competition that year and took the $2000 prize for having the highest overall score. In 1998, Hutchens again entered the Loebner Prize Contest with his new program, MegaHAL. MegaHAL made its debut in the 1998 Loebner Prize Contest. Like many chatterbots, the intent is for MegaHAL to appear as a human fluent in a natural language. As a user types sentences into MegaHAL, MegaHAL will respond with sentences that are sometimes coherent and at other times complete gibberish. MegaHAL learns as the conversation progresses, remembering new words and sentence structures. It will even learn new ways to substitute words or phrases for other words or phrases. Many would consider conversation simulators like MegaHAL to be a primitive form of artificial intelligence. However, MegaHAL doesn't understand the conversation or even the sentence structure. It generates its conversation based on sequential and mathematical relationships. In the world of conversation simulators, MegaHAL is based on relatively old technology and could be considered primitive. However, its popularity has grown due to its humorous nature; it has been known to respond with twisted or nonsensical statements that are often amusing. == Theory of Operation == MegaHal is based at least in part on a so-called "hidden Markov Model", so that the first thing that Megahal does when it "trains" on a script or text is to build a database of text fragments encompassing every possible subset of perhaps 4, 5, or even 6 consecutive words, so that for example - if MegaHal trains on the Declaration of Independence, then MegaHal will build a database containing text fragments such as "When in the course", "in the course of", "the course of human", "course of human events", "of human events, one", "human events, one people", and so on. Then if Megahal is fed another text, such has "Superman, Yes! It's Superman - he can change the course of mighty rivers, bend steel with his bare hands - and who disguised at Clark Kent …" IT MIGHT induce Megahal to apparently bemuse itself to proffer whether Superman can change the course of human events, or something else altogether - such as some rambling about "when in the course of mighty rivers", and so on. Thus likewise - if a phrase like "the White house said" comes up a lot in some text; then Megahal's ability to switch randomly between different contexts which otherwise share some similarity can result at times in some surprising lucidity, or else it might otherwise seem quite bizarre. == Examples == There are some sentences that MegaHAL generated: CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS. and COWS FLY LIKE CLOUDS BUT THEY ARE NEVER COMPLETELY SUCCESSFUL. == Distribution == MegaHAL is distributed under the Unlicense. Its source code can be downloaded from the Github repository.

    Read more →
  • WaveNet

    WaveNet

    WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016, is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis still was less convincing than actual human speech. WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music. == History == Generating speech from text is an increasingly common task thanks to the popularity of software such as Apple's Siri, Microsoft's Cortana, Amazon Alexa and the Google Assistant. Most such systems use a variation of a technique that involves concatenated sound fragments together to form recognisable sounds and words. The most common of these is called concatenative TTS. It consists of large library of speech fragments, recorded from a single speaker that are then concatenated to produce complete words and sounds. The result sounds unnatural, with an odd cadence and tone. The reliance on a recorded library also makes it difficult to modify or change the voice. Another technique, known as parametric TTS, uses mathematical models to recreate sounds that are then assembled into words and sentences. The information required to generate the sounds is stored in the parameters of the model. The characteristics of the output speech are controlled via the inputs to the model, while the speech is typically created using a voice synthesiser known as a vocoder. This can also result in unnatural sounding audio. == Design and ongoing research == === Background === WaveNet is a type of feedforward neural network known as a deep convolutional neural network (CNN). In WaveNet, the CNN takes a raw signal as an input and synthesises an output one sample at a time. It does so by sampling from a softmax (i.e. categorical) distribution of a signal value that is encoded using μ-law companding transformation and quantized to 256 possible values. === Initial concept and results === According to the original September 2016 DeepMind research paper WaveNet: A Generative Model for Raw Audio, the network was fed real waveforms of speech in English and Mandarin. As these pass through the network, it learns a set of rules to describe how the audio waveform evolves over time. The trained network can then be used to create new speech-like waveforms at 16,000 samples per second. These waveforms include realistic breaths and lip smacks – but do not conform to any language. WaveNet is able to accurately model different voices, with the accent and tone of the input correlating with the output. For example, if it is trained with German, it produces German speech. The capability also means that if the WaveNet is fed other inputs – such as music – its output will be musical. At the time of its release, DeepMind showed that WaveNet could produce waveforms that sound like classical music. === Content (voice) swapping === According to the June 2018 paper Disentangled Sequential Autoencoder, DeepMind has successfully used WaveNet for audio and voice "content swapping": the network can swap the voice on an audio recording for another, pre-existing voice while maintaining the text and other features from the original recording. "We also experiment on audio sequence data. Our disentangled representation allows us to convert speaker identities into each other while conditioning on the content of the speech." (p. 5) "For audio, this allows us to convert a male speaker into a female speaker and vice versa [...]." (p. 1) According to the paper, a two-digit minimum amount of hours (c. 50 hours) of pre-existing speech recordings of both source and target voice are required to be fed into WaveNet for the program to learn their individual features before it is able to perform the conversion from one voice to another at a satisfying quality. The authors stress that "[a]n advantage of the model is that it separates dynamical from static features [...]." (p. 8), i. e. WaveNet is capable of distinguishing between the spoken text and modes of delivery (modulation, speed, pitch, mood, etc.) to maintain during the conversion from one voice to another on the one hand, and the basic features of both source and target voices that it is required to swap on the other. The January 2019 follow-up paper Unsupervised speech representation learning using WaveNet autoencoders details a method to successfully enhance the proper automatic recognition and discrimination between dynamical and static features for "content swapping", notably including swapping voices on existing audio recordings, in order to make it more reliable. Another follow-up paper, Sample Efficient Adaptive Text-to-Speech, dated September 2018 (latest revision January 2019), states that DeepMind has successfully reduced the minimum amount of real-life recordings required to sample an existing voice via WaveNet to "merely a few minutes of audio data" while maintaining high-quality results. Its ability to clone voices has raised ethical concerns about WaveNet's ability to mimic the voices of living and dead persons. According to a 2016 BBC article, companies working on similar voice-cloning technologies (such as Adobe Voco) intend to insert watermarking inaudible to humans to prevent counterfeiting, while maintaining that voice cloning satisfying, for instance, the needs of entertainment-industry purposes would be of a far lower complexity and use different methods than required to fool forensic evidencing methods and electronic ID devices, so that natural voices and voices cloned for entertainment-industry purposes could still be easily told apart by technological analysis. == Applications == At the time of its release, DeepMind said that WaveNet required too much computational processing power to be used in real world applications. As of October 2017, Google announced a 1,000-fold performance improvement along with better voice quality. WaveNet was then used to generate Google Assistant voices for US English and Japanese across all Google platforms. In November 2017, DeepMind researchers released a research paper detailing a proposed method of "generating high-fidelity speech samples at more than 20 times faster than real-time", called "Probability Density Distillation". At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices were available and made possible by WaveNet; WaveNet greatly reduced the number of audio recordings that were required to create a voice model by modeling the raw audio of the voice actor samples.

    Read more →
  • Decision list

    Decision list

    Decision lists are a representation for Boolean functions which can be easily learned from examples. Single term decision lists are more expressive than disjunctions and conjunctions; however, 1-term decision lists are less expressive than the general disjunctive normal form and the conjunctive normal form. The language specified by a k-length decision list includes as a subset the language specified by a k-depth decision tree. Learning decision lists can be used for attribute efficient learning, a type of machine learning. == Definition == A decision list (DL) of length r is of the form: if f1 then output b1 else if f2 then output b2 ... else if fr then output br where fi is the ith formula and bi is the ith boolean for i ∈ { 1... r } {\displaystyle i\in \{1...r\}} . The last if-then-else is the default case, which means formula fr is always equal to true. A k-DL is a decision list where all of formulas have at most k terms. Sometimes "decision list" is used to refer to a 1-DL, where all of the formulas are either a variable or its negation.

    Read more →
  • Xiaoice

    Xiaoice

    Xiaoice (Chinese: 微软小冰; pinyin: Wēiruǎn Xiǎobīng; lit. 'Microsoft Little Ice', IPA [wéɪɻwânɕjâʊpíŋ]) is an AI system developed by Microsoft (Asia) Software Technology Center (STCA) in 2014 based on an emotional computing framework. In July 2018, Microsoft Xiaoice released the 6th generation. Xiaoice Company, formerly known as AI Xiaoice Team of Microsoft Software Technology Center Asia, was Microsoft's largest independent R&D team for AI products. Founded in China in December 2013 with an expanded Japanese R&D team established in September 2014, this team is distributed in Beijing, Suzhou, and Tokyo, etc. with its technical products covering Asia. On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company. As of 2021, the AI chatbots created and hosted by the Xiaoice framework accounted for about 60% of total global AI interactions. == Platforms, languages and countries == Xiaoice exists on more than 40 platforms in four countries (China, Japan, USA and Indonesia) including apps such as WeChat, QQ, Weibo and Meipai in China, and Facebook Messenger in USA and LINE in Japan. == Introduction == On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company, aiming at enabling the Xiaoice product line to accelerate the pace of local innovation and commercialization, and appointed Dr. Harry Shum, former global executive VP of Microsoft, as the chairman of the new company, Li Di, Microsoft Partner of Products in Microsoft STCA, as the CEO, and Cliff, Chief R&D Director, as the GM of the Japan branch. The new company will continue to use the brands of Xiaoice China and Rinna Japan. As of 2022, the single brand of Xiaoice has covered 660 million online users, 1 billion third-party smart devices and 900 million content viewers in the aforementioned countries. Xiaoice's customers include China Merchants Group, Winter Sports Center of the General Administration of Sport of China, China Textile Information Center, China Unicom, China Foreign Exchange Trade System, Hong Kong Securities and Futures Commission (SFC), Wind Information, BMW, Nissan, SAIC Motor, BAIC Group, Nio Inc., XPeng, HiPhi, Vanke, Wensli, etc. The Xiaoice Avatar Framework has incubated tens of millions of AI Beings, such as Xiaoice, Rinna, the Expo exhibitor Xia Yubing, the singer He Chang, the anchor F201, the human observer MERROR, anime robot character Roboko, and other; == Application == === Poet === In May 2017, the first AI-authored collection of poems in China—The Sunshine Lost Windows was published by Xiaoice. === Singer === Xiaoice has released dozens of songs with the similar quality to human singers, including I Know I New, Breeze, I Am Xiaoice, Miss You etc. The 4th version of the DNN singing model allows Xiaoice to learn more details. For example, Xiaoice can produce this breathing sound along with her singing as human. === Kid audio-books reciter === Xiaoice can automatically analyze the stories, to choose the suitable tones and characters to finish the entire process of creating the audio. === Designer === By learning the melodies of the songs and the landmarks about different cities, Xiaoice can create visual artworks of skylines when listening to the songs related to this city. Skyline Series T-shirts designed by Xiaoice have been jointly launched with SELECTED and been sold in stores. === TV and radio hostess === Xiaoice has hosted 21 TV programs and 28 Radio programs, such as CCTV-1 AI Show, Dragon TV Morning East News, Hunan TV My Future, several daily radio programs for Jiangsu FM99.7, Hunan FM89.3, Henan FM104.1 etc. === "AI being" === An "AI being" is a concept proposed by the Xiaoice team in 2019. According to the "White Book of China Virtual Human Development Industry in 2022" released by Frost & Sullivan and LeadLeo, the white paper cites six elements of an AI being proposed by the Xiaoice team, including: Persona, Attitude, Biological Characteristic, Creation, Knowledge and Skill. On May 16, 2023, Xiaoice released their "GPT Clones" as its "GPT Human Cloning Plan." The program is aimed at replicating celebrities, public figures, and regular people. As of June 2023, Xiaoice had launched more than 300 "GPT Clones." People were invited to register via WeChat in China and Japan. A major point of focus for Xiaoice with their AI Beings is having virtual partners. A paid fee allow for more complex responses, voice messages, and more. == Community feedback == Bill Gates mentioned Xiaoice during his speech at the Peking University: "Some of you may have had conversations with Xiaoice on Weibo, or seen her weather forecasts on TV, or read her column in the Qianjiang Evening News." '"Xiaoice has attracted 45 million followers and is quite skilled at multitasking. And I’ve heard she’s gotten good enough at sensing a user’s emotional state that she can even help with relationship breakups." According to Mr Li Di, vice President of Microsoft (Asia) Internet Engineering School, Xiaoice started writing poems since last year. Based on the data base that includes works of 519 Chinese contemporary poets since 1920s, a 100 hour long training session was conducted to allow Xiaoice to acquire the ability to write poems. What is more impressive is that Xiaoice has never been spotted as a bot while publishing poems on various forums and traditional literary under an alias. == Controversy == In 2017, Xiaoice was taken offline on WeChat after giving user responses critical to the Chinese government. It was subsequently censored and the bots will avoid and sidestep any inquiries using politically sensitive terms and phrases. == Activity == On September 22, 2021, Xiaoice Company and Microsoft Software Technology Center Asia (STCA) jointly held the 9th generation Xiaoice annual press conference in Beijing.Upgrading of Core Technologies of the 9th Generation Xiaoice Avatar Framework,1st First-party Social Platform APP "Xiaoice Island" from Xiaoice, WeChat Xiaoice has been reopened and other information == Regional varieties of Xiaoice == China: Xiaoice, launched in 2014 Japan: りんな, launched in 2015 America: Zo, launched in 2016 – discontinued summer 2019 India: Ruuh, launched in 2017 – discontinued June 21, 2019 Indonesia: Rinna, launched in 2017

    Read more →
  • Neural radiance field

    Neural radiance field

    A neural radiance field (NeRF) is a neural field for reconstructing a three-dimensional representation of a scene from two-dimensional images. The NeRF model enables downstream applications of novel view synthesis, scene geometry reconstruction, and obtaining the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. First introduced in 2020, it has since gained significant attention for its potential applications in computer graphics and content creation. == Algorithm == The NeRF algorithm represents a scene as a radiance field parametrized by a deep neural network (DNN). The network predicts a volume density and view-dependent emitted radiance given the spatial location ( x , y , z ) {\displaystyle (x,y,z)} and viewing direction in Euler angles ( θ , Φ ) {\displaystyle (\theta ,\Phi )} of the camera. By sampling many points along camera rays, traditional volume rendering techniques can produce an image. === Data collection === A NeRF needs to be retrained for each unique scene. The first step is to collect images of the scene from different angles and their respective camera pose. These images are standard 2D images and do not require a specialized camera or software. Any camera is able to generate datasets, provided the settings and capture method meet the requirements for SfM (Structure from Motion). This requires tracking of the camera position and orientation, often through some combination of SLAM, GPS, or inertial estimation. Researchers often use synthetic data to evaluate NeRF and related techniques. For such data, images (rendered through traditional non-learned methods) and respective camera poses are reproducible and error-free. === Training === For each sparse viewpoint (image and camera pose) provided, camera rays are marched through the scene, generating a set of 3D points with a given radiance direction (into the camera). For these points, volume density and emitted radiance are predicted using the multi-layer perceptron (MLP). An image is then generated through classical volume rendering. Because this process is fully differentiable, the error between the predicted image and the original image can be minimized with gradient descent over multiple viewpoints, encouraging the MLP to develop a coherent model of the scene. == Variations and improvements == Early versions of NeRF were slow to optimize and required that all input views were taken with the same camera in the same lighting conditions. These performed best when limited to orbiting around individual objects, such as a drum set, plants or small toys. Since the original paper in 2020, many improvements have been made to the NeRF algorithm, with variations for special use cases. === Fourier feature mapping === In 2020, shortly after the release of NeRF, the addition of Fourier Feature Mapping improved training speed and image accuracy. Deep neural networks struggle to learn high frequency functions in low dimensional domains; a phenomenon known as spectral bias. To overcome this shortcoming, points are mapped to a higher dimensional feature space before being fed into the MLP. γ ( v ) = [ a 1 cos ⁡ ( 2 π B 1 T v ) a 1 sin ⁡ ( 2 π B 1 T v ) ⋮ a m cos ⁡ ( 2 π B m T v ) a m sin ⁡ ( 2 π B m T v ) ] {\displaystyle \gamma (\mathrm {v} )={\begin{bmatrix}a_{1}\cos(2{\pi }{\mathrm {B} }_{1}^{T}\mathrm {v} )\\a_{1}\sin(2\pi {\mathrm {B} }_{1}^{T}\mathrm {v} )\\\vdots \\a_{m}\cos(2{\pi }{\mathrm {B} }_{m}^{T}\mathrm {v} )\\a_{m}\sin(2{\pi }{\mathrm {B} }_{m}^{T}\mathrm {v} )\end{bmatrix}}} Where v {\displaystyle \mathrm {v} } is the input point, B i {\displaystyle \mathrm {B} _{i}} are the frequency vectors, and a i {\displaystyle a_{i}} are coefficients. This allows for rapid convergence to high frequency functions, such as pixels in a detailed image. === Bundle-adjusting neural radiance fields === One limitation of NeRFs is the requirement of knowing accurate camera poses to train the model. Often times, pose estimation methods are not completely accurate, nor is the camera pose even possible to know. These imperfections result in artifacts and suboptimal convergence. So, a method was developed to optimize the camera pose along with the volumetric function itself. Called Bundle-Adjusting Neural Radiance Field (BARF), the technique uses a dynamic low-pass filter (DLPF) to go from coarse to fine adjustment, minimizing error by finding the geometric transformation to the desired image. This corrects imperfect camera poses and greatly improves the quality of NeRF renders. === Multiscale representation === Conventional NeRFs struggle to represent detail at all viewing distances, producing blurry images up close and overly aliased images from distant views. In 2021, researchers introduced a technique to improve the sharpness of details at different viewing scales known as mip-NeRF (comes from mipmap). Rather than sampling a single ray per pixel, the technique fits a gaussian to the conical frustum cast by the camera. This improvement effectively anti-aliases across all viewing scales. mip-NeRF also reduces overall image error and is faster to converge at about half the size of ray-based NeRF. === Learned initializations === In 2021, researchers applied meta-learning to assign initial weights to the MLP. This rapidly speeds up convergence by effectively giving the network a head start in gradient descent. Meta-learning also allowed the MLP to learn an underlying representation of certain scene types. For example, given a dataset of famous tourist landmarks, an initialized NeRF could partially reconstruct a scene given one image. === NeRF in the wild === Conventional NeRFs are vulnerable to slight variations in input images (objects, lighting) often resulting in ghosting and artifacts. As a result, NeRFs struggle to represent dynamic scenes, such as bustling city streets with changes in lighting and dynamic objects. In 2021, researchers at Google developed a new method for accounting for these variations, named NeRF in the Wild (NeRF-W). This method splits the neural network (MLP) into three separate models. The main MLP is retained to encode the static volumetric radiance. However, it operates in sequence with a separate MLP for appearance embedding (changes in lighting, camera properties) and an MLP for transient embedding (changes in scene objects). This allows the NeRF to be trained on diverse photo collections, such as those taken by mobile phones at different times of day. === Relighting === In 2021, researchers added more outputs to the MLP at the heart of NeRFs. The output now included: volume density, surface normal, material parameters, distance to the first surface intersection (in any direction), and visibility of the external environment in any direction. The inclusion of these new parameters lets the MLP learn material properties, rather than pure radiance values. This facilitates a more complex rendering pipeline, calculating direct and global illumination, specular highlights, and shadows. As a result, the NeRF can render the scene under any lighting conditions with no re-training. === Plenoctrees === Although NeRFs had reached high levels of fidelity, their costly compute time made them useless for many applications requiring real-time rendering, such as VR/AR and interactive content. Introduced in 2021, Plenoctrees (plenoptic octrees) enabled real-time rendering of pre-trained NeRFs through division of the volumetric radiance function into an octree. Rather than assigning a radiance direction into the camera, viewing direction is taken out of the network input and spherical radiance is predicted for each region. This makes rendering over 3000x faster than conventional NeRFs. === Sparse Neural Radiance Grid === Similar to Plenoctrees, this method enabled real-time rendering of pretrained NeRFs. To avoid querying the large MLP for each point, this method bakes NeRFs into Sparse Neural Radiance Grids (SNeRG). A SNeRG is a sparse voxel grid containing opacity and color, with learned feature vectors to encode view-dependent information. A lightweight, more efficient MLP is then used to produce view-dependent residuals to modify the color and opacity. To enable this compressive baking, small changes to the NeRF architecture were made, such as running the MLP once per pixel rather than for each point along the ray. These improvements make SNeRG extremely efficient, outperforming Plenoctrees. === Instant NeRFs === In 2022, researchers at Nvidia enabled real-time training of NeRFs through a technique known as Instant Neural Graphics Primitives. An innovative input encoding reduces computation, enabling real-time training of a NeRF, an improvement orders of magnitude above previous methods. The speedup stems from the use of spatial hash functions, which have O ( 1 ) {\displaystyle O(1)} access times, and parallelized architectures which run fast on modern GPUs. == Related techniques == === Plenoxels === Plen

    Read more →
  • Graph cut optimization

    Graph cut optimization

    Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function f {\displaystyle f} , if it is possible to construct a flow network with positive weights such that each cut C {\displaystyle C} of the network can be mapped to an assignment of variables x {\displaystyle \mathbf {x} } to f {\displaystyle f} (and vice versa), and the cost of C {\displaystyle C} equals f ( x ) {\displaystyle f(\mathbf {x} )} (up to an additive constant) then it is possible to find the global optimum of f {\displaystyle f} in polynomial time by computing a minimum cut of the graph. The mapping between cuts and variable assignments is done by representing each variable with one node in the graph and, given a cut, each variable will have a value of 0 if the corresponding node belongs to the component connected to the source, or 1 if it belong to the component connected to the sink. Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration. Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation, denoising, registration and stereo matching. == Representability == A pseudo-Boolean function f : { 0 , 1 } n → R {\displaystyle f:\{0,1\}^{n}\to \mathbb {R} } is said to be representable if there exists a graph G = ( V , E ) {\displaystyle G=(V,E)} with non-negative weights and with source and sink nodes s {\displaystyle s} and t {\displaystyle t} respectively, and there exists a set of nodes V 0 = { v 1 , … , v n } ⊂ V − { s , t } {\displaystyle V_{0}=\{v_{1},\dots ,v_{n}\}\subset V-\{s,t\}} such that, for each tuple of values ( x 1 , … , x n ) ∈ { 0 , 1 } n {\displaystyle (x_{1},\dots ,x_{n})\in \{0,1\}^{n}} assigned to the variables, f ( x 1 , … , x n ) {\displaystyle f(x_{1},\dots ,x_{n})} equals (up to a constant) the value of the flow determined by a minimum cut C = ( S , T ) {\displaystyle C=(S,T)} of the graph G {\displaystyle G} such that v i ∈ S {\displaystyle v_{i}\in S} if x i = 0 {\displaystyle x_{i}=0} and v i ∈ T {\displaystyle v_{i}\in T} if x i = 1 {\displaystyle x_{i}=1} . It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first order functions, where each term depends upon at most one variable, are always representable. Quadratic functions f ( x ) = w 0 + ∑ i w i ( x i ) + ∑ i < j w i j ( x i , x j ) . {\displaystyle f(\mathbf {x} )=w_{0}+\sum _{i}w_{i}(x_{i})+\sum _{i 0 {\displaystyle p>0} then w i j k ( x i , x j , x k ) = w i j k ( 0 , 0 , 0 ) + p 1 ( x i − 1 ) + p 2 ( x j − 1 ) + p 3 ( x k − 1 ) + p 23 ( x j − 1 ) x k + p 31 x i ( x k − 1 ) + p 12 ( x i − 1 ) x j − p x i x j x k {\displaystyle w_{ijk}(x_{i},x_{j},x_{k})=w_{ijk}(0,0,0)+p_{1}(x_{i}-1)+p_{2}(x_{j}-1)+p_{3}(x_{k}-1)+p_{23}(x_{j}-1)x_{k}+p_{31}x_{i}(x_{k}-1)+p_{12}(x_{i}-1)x_{j}-px_{i}x_{j}x_{k}} with p 1 = w i j k ( 1 , 0 , 1 ) − w i j k ( 0 , 0 , 1 ) p 2 = w i j k ( 1 , 1 , 0 ) − w i j k ( 1 , 0 , 1 ) p 3 = w i j k ( 0 , 1 , 1 ) − w i j k ( 0 , 1 , 0 ) p 23 = w i j k ( 0 , 0 , 1 ) + w i j k ( 0 , 1 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 0 , 1 , 1 ) p 31 = w i j k ( 0 , 0 , 1 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 0 , 1 ) p 12 = w i j k ( 0 , 1 , 0 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 1 , 0 ) . {\displaystyle {\begin{aligned}p_{1}&=w_{ijk}(1,0,1)-w_{ijk}(0,0,1)\\p_{2}&=w_{ijk}(1,1,0)-w_{ijk}(1,0,1)\\p_{3}&=w_{ijk}(0,1,1)-w_{ijk}(0,1,0)\\p_{23}&=w_{ijk}(0,0,1)+w_{ijk}(0,1,0)-w_{ijk}(0,0,0)-w_{ijk}(0,1,1)\\p_{31}&=w_{ijk}(0,0,1)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,0,1)\\p_{12}&=w_{ijk}(0,1,0)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,1

    Read more →
  • Computer-aided lean management

    Computer-aided lean management

    Computer-aided lean management, in business management, is a methodology of developing and using software-controlled, lean systems integration. Its goal is to drive innovation towards cost and cycle-time savings. It attempts to create an efficient use of capital and resources through the development and use of one integrated system model to run a business's planning, engineering, design, maintenance, and operations. == Overview == Computer-Aided Lean Management (CALM) is a management philosophy that uses software to reduce risk and inefficiencies. CALM acts on uncertainties and business inefficiencies to increase profitability through the use of computational decision-making tools that enable opportunities for additional value creation. It is based on the application of software to enable continuous improvement through an Integrated System Model (ISM) of the business’s physical assets, business processes, and machine learning. This integration of software applications using lean principles was developed in the aerospace industry and has migrated to the energy industry. The creation of an ISM removes the barriers posed by the silos or stovepipes inherent in the departmentalization of most companies. Integration enables lean uses of information for the creation of actionable knowledge. CALM strives to create such a lean management approach to running the company through the rigors of software enforcement. From this software enforcement comes clear policy and procedures that are adhered to, activity-based costing, measurement of effectiveness, and the capability of using advanced algorithms for dramatic improvements in optimization of resources. CALM creates business capabilities through software to enable technology application, streamlining of processes, and a lean organizational structure. The methodology is based on a common sense approach for running a business, by measuring actions taken and using those measurements to design more efficient processes. == History == CALM was inspired by lean processes and techniques that were already dominant management technologies with a wide diversity of applications and successes. Motorola and General Electric had been known for the concepts of Six Sigma; Boeing had been managing mass (using modular and flexible assembly options), and Toyota combined elements of these methodologies to create the Toyota Production System. Boeing then took the Toyota model and added computer-aided enforcement of lean methodologies throughout the manufacturing process. One of the major sources for CALM's outgrowth was integrated definition (IDEF) modeling in aerospace manufacturing that was pioneered by the U.S. Air Force in the 1970s. IDEF is a methodology designed to model the end-to-end decisions, actions, and activities of an organization or system so that costs, performance, and cycle times can be optimized. IDEF methods have been adapted for wider use in automotive, aerospace, pharmaceuticals, and software development industries. IDEF methods serve as a starting point to understand lean management through semantic data modeling. The IDEF process begins by mapping the existing functions of an enterprise, creating a graphical model, or road map, that shows what controls each important function, who performs it, what resources are required for carrying it out, what it produces, how much it costs, and what relationships it has to other functions of the organization. IDEF simulations have been found to be efficient at streamlining and modernizing both companies and governmental agencies. Perhaps the best-developed evolution of the IDEF model beyond Toyota was at Boeing. Their project life-cycle process has grown into a rigorous software system that links people, tasks, tools, materials, and the environmental impact of any newly planned project, before any building is allowed to begin. Routinely, more than half of the time for any given project is spent building the precedence diagrams, or three-dimensional process maps, integrating with outside suppliers, and designing the implementation plan–all on the computer. Once real activity is initiated, an action tracker is used to monitor inputs and outputs versus the schedule and delivery metrics in real time throughout the organization. When the execution of a new airplane design begins, it is so well organized that it consistently cuts both costs and build time in half for each successive generation of airframe. Boeing created a complex lean management process called 'define and control airplane configuration/manufacturing resource management' (DCAC/MRM). The process was built with the help of the operations research and computer sciences departments of the University of Pittsburgh. The manufacture of the Boeing 777 was ultimately a success, and it became the precursor to succeeding generations of CALM at Boeing. The methodology of CALM has recently been applied to field orientated infrastructure based businesses with highly interdependent systems, such as electric utilities where a smart grid concept is being researched and developed. The management of infrastructure-based industries like oil, gas, electricity, water, transportation, and renewables requires massive investments in interdependent, physical infrastructure, as well as simultaneous attention to disparate market forces. In infrastructure businesses that manage field assets, uncertainty is the biggest impediment to profitability, rather than the maintenance of efficient supply chains or the management of factory assembly lines. These businesses are dominated by risk from uncertainties such as weather, market variations, transportation disruptions, government actions, logistic difficulties, geology, and asset reliability. CALM has been applied to deal with these types of infrastructure based challenges.

    Read more →
  • NetMiner

    NetMiner

    NetMiner is an all-in-one software platform for analyzing and visualizing complex network data, based on Social Network Analysis (SNA). Originally released in 2001, it supports research and education in a wide range of domains through interactive and visual data exploration. This tool allows researchers to explore their network data visually and interactively, and helps them to detect underlying patterns and structures of the network. It has also been recognized for its comprehensive features and user-friendly interface in comparative reviews of SNA software packages. == Features == === Integrated Data Environment === NetMiner supports unified management of diverse data types—including network (nodes and links), tabular, and unstructured text data—within a single platform. This enables users to perform the entire analysis workflow seamlessly without switching between tools. NetMiner also supports a wide range of analytical methods, allowing users to derive new insights by combining multiple approaches. Analytical results can be saved and reused across workflows(Add to Dataset) Graph and Network Analysis: Includes Centrality, Community Detection, Blockmodeling, and Similarity Measures. Machine learning: Provides algorithms for regression, classification, clustering, ensemble modeling and XAI(Explainable AI) Graph Neural Networks (GNNs): Supports models such as GraphSAGE, GCN, and GAT to learn from both node attributes and graph structure. Natural language processing (NLP): Uses pretrained deep learning models to analyze unstructured text, including named entity recognition and keyword extraction. Text mining and Text network analysis: Supports construction of word co-occurrence networks and topic modeling using LDA, BERTopic, enabling identification of thematic patterns and semantic structures in text data. Data Visualization: Offers advanced network visualization features, supporting multiple layout algorithms. Analytical outcomes such as centrality or community detection can be directly reflected in the network map via node size, color, and position, enhancing intuitive understanding. === AI Assistant === NetMiner integrates with external large language models such as OpenAI GPT and Google Gemini to interpret complex analysis results in natural language, summarize key findings, and suggest next steps for exploration. === Workflow and Usability === Designed to follow the structure of real-world data analysis workflows, NetMiner adopts a hierarchical data organization (Project → Workspace → Dataset → Data Item). Its web-based user interface improves clarity and reduces complexity. NetMiner 5 supports Windows 10 or higher and macOS 11 or later with M1 chip. Both academic and commercial licenses are available. == Extension == NetMiner Extension is small program to extend the functionality of NetMiner. In other words, it enables you to customize NetMiner according to your needs. By adding ‘NetMiner Extension’, you can expand your research. === Web Data Collection === NetMiner allows users to collect data from services such as YouTube, OpenAlex, Springer, and KCI via Open APIs. Collected data is automatically preprocessed and transformed to fit NetMiner’s internal structure, requiring no additional coding or external tools. SNS Data Collector: It collects social media data from YouTube, which has a large number of social media users worldwide. Biblio Data Collector: It collects the bibliographic data from Springer, OpenAlex, and KCI essential for research trend analysis. == File formats == === NetMiner data file format === .NMF === Importable/exportable formats === Plain text data: .TXT, .CSV Microsoft Excel data: .XLS, .XLSX Unstructured text data: .TXT, .CSV, .XLS(X) ※ NetMiner 4 only NetMiner 2 data: .NTF UCINet data: .DL, .DAT Pajek data: .NET, .VEC, .CLU, .PER StOCNET data file: .DAT Graph Modelling Language data: .GML(importing only) Related software UCINET Pajek Gephi StoCNET == Data structure == === Hierarchy of NetMiner data structure === NetMiner 5 supports not only graph data composed of nodes and links, but also tabular and unstructured data without fixed schema or identifiers. This enables users to easily import a wide variety of raw and unstructured data suitable for machine learning applications. Within a single workspace, users can manage node sets, link sets, and structured/unstructured data simultaneously. Multiple graph layers under a node set can be organized in a tree structure, allowing for intuitive understanding of the data currently being analyzed. == Release history == The first version of NetMiner was released on Dec 21, 2001. There have been five major updates from 2001. === NetMiner 5 === Released on June 9, 2025. NetMiner 5 retains the core features and no-code concept of NetMiner 4, but has evolved by integrating cutting-edge AI technologies. AI Assistant, Personal Analytics Tutor Support for Graph, Structured, and Unstructured Data Graph Analytics / Social Network Analysis Machine Learning(M/L) & XAI Graph Machine Learning(GML): Graph Neural Network Text Mining: Natural Language Processing(NLP), Text Network, Topic Modeling Data Visualization === NetMiner 4 (2011) === Latest version is 4.5.1. Introduced Python scripting, encrypted NMF format, semantic analysis tools (word cloud, topic modeling), and Extension - Data Collector. === NetMiner 3 (2007) === Enhanced scalability, integrated analysis-visualization modules, and DB import from Oracle, MS SQL. === NetMiner 2 (2003) === Improved statistical and network measures, visualization algorithms, and external data import modules.

    Read more →
  • Zero-shot learning

    Zero-shot learning

    Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception. == Background and history == The first paper on zero-shot learning in natural language processing appeared in a 2008 paper by Chang, Ratinov, Roth, and Srikumar, at the AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper from Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper and the term zero-shot learning caught on, as a take-off on one-shot learning that was introduced in computer vision years earlier. In computer vision, zero-shot learning models learned parameters for seen classes along with their class representations and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes. In natural language processing, the key technical direction developed builds on the ability to "understand the labels"—represent the labels in the same semantic space as that of the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering. The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised like manner (or transductive learning). Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples to classes they have already observed during training, in ZSL, no samples from the classes have been given during training the classifier. It can therefore be viewed as an extreme case of domain adaptation. == Prerequisite information for zero-shot classes == Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this type of information can be of several types. Learning with attributes: classes are accompanied by pre-defined structured description. For example, for bird descriptions, this could include "red head", "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning. While this approach was used mostly in computer vision, there are some examples for it also in natural language processing. Learning from textual description. As pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language description. This could include for example a wikipedia description of the class. Class-class similarity. Here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as a predicted class, even if no such samples were observed during training. == Generalized zero-shot learning == The above ZSL setup assumes that at test time, only zero-shot samples are given, namely, samples from new unseen classes. In generalized zero-shot learning, samples from both new and known classes, may appear at test time. This poses new challenges for classifiers at test time, because it is very challenging to estimate if a given sample is new or known. Some approaches to handle this include: a gating module, which is first trained to decide if a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision, or a soft probabilistic decision a generative module, which is trained to generate feature representation of the unseen classes—a standard classifier can then be trained on samples from all classes, seen and unseen. == Domains of application == Zero shot learning has been applied to the following fields: image classification semantic segmentation image generation object detection natural language processing computational biology abstract reasoning

    Read more →