AI For Kids Google

AI For Kids Google — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Artisse AI

    Artisse AI

    Artisse AI is a Hong Kong-based technology company founded by William Wu. The company developed a mobile photography application using generative artificial intelligence to transform selfies into high-quality, personalized images. The app allows users to visualize themselves in various scenarios, outfits, and hairstyles, and they can adjust lighting and ambiance to match their preferences. The app launched in 2023 across multiple markets, including the United States, United Kingdom, Japan, South Korea, Canada, and Australia. By January 2024, users had generated over 5 million images. That same month, the company secured $6.7 million in seed funding to support product development and marketing. == History == Artisse was originally founded in South Korea in 2022 by William Wu. The early concept was connected to a virtual idol initiative developed in collaboration with a K-pop agency, intended to support Wu's blockchain gaming business. The project later evolved into a standalone AI photography application. The current version of the Artisse app was developed following the company's relocation to Hong Kong in 2022. In January 2024, Artisse secured $6.7 million in seed funding, led by The London Fund. The investment was aimed at supporting product development, marketing, and user acquisition. Artisse uses an AI algorithm to create hyperrealistic images from uploaded photos. The app generates personalized images by combining generative AI technology, a global pool of licensed talent, and finished art services. The app works with individual users and businesses, offering professional-grade photos and advertisement images. According to the British newspaper Evening Standard the company has developed the world's first and most advanced AI photographer. It captures 15-30 photos of the user and generates 2D images, placing them in various outfits and locations worldwide. === Catheron Gaming === Artisse AI originated from Catheon Gaming, a blockchain gaming and entertainment company founded in 2021 by William Wu. Catheon Gaming published more than 30 Web3 titles in its first year, developed a blockchain game distribution platform, and offered advisory services to external developers. In 2022, HSBC and KPMG listed Catheon Gaming among the "Top 10 Emerging Giants" in the Asia–Pacific region, selected from a pool of more than 6,000 startups. In June 2023, Catheon Gaming was rebranded as Artisse Interactive, creating two divisions: Artisse Gaming, which continued blockchain and Web3 game development, and Artisse AI, which focused on generative photography technology. == Technology == Artisse uses a proprietary generative AI model combined with open-source imaging frameworks and diffusion models. Users are prompted to upload between 15 and 30 personal images, allowing the AI to train a personalized model in 30 to 40 minutes. After training, the app generates new images based on either textual or visual prompts, with options to adjust elements such as clothing, hairstyles, lighting, and backgrounds. To enhance realism, the app integrates augmented reality features and image refinement tools. The company has introduced features to address representation issues related to body shape and skin tone, although concerns persist about the ethical implications of altering personal traits. == Products == === Artisse mobile app === Available on iOS and Android platforms in 35 languages. Users initially receive 25 free images, after which the app adopts a subscription pricing model ranging from approximately $6 to $30 per month. By early 2024, the app reported around 4,000 paying subscribers out of more than 200,000 downloads. === Business and enterprise services === Artisse provides B2B solutions for creating marketing imagery and partners with agencies like Iconic Management to enable cost-effective virtual photoshoots. Additional features in development include virtual try-on capabilities and augmented reality integration for fashion retail. == Reception == Media coverage has noted the app's photorealistic image outputs with some sources highlighting its ease of use. However, concerns have been raised regarding image authenticity, algorithmic biases, and the potential impact on professional photography and modeling. Artisse has been widely covered by media outlets including TechCrunch, PetaPixel, Forbes Australia, and The Evening Standard. These publications discussed the app's integration of generative AI technology within the consumer photography space, its growing market influence, and its rapid adoption by users worldwide.

    Read more →
  • Database application

    Database application

    A database application is a computer program whose primary purpose is retrieving information from a computerized database. From here, information can be inserted, modified or deleted which is subsequently conveyed back into the database. Early examples of database applications were accounting systems and airline reservations systems, such as SABRE, developed starting in 1957. A characteristic of modern database applications is that they facilitate simultaneous updates and queries from multiple users. Systems in the 1970s might have accomplished this by having each user in front of a 3270 terminal to a mainframe computer. By the mid-1980s it was becoming more common to give each user a personal computer and have a program running on that PC that is connected to a database server. Information would be pulled from the database, transmitted over a network, and then arranged, graphed, or otherwise formatted by the program running on the PC. Starting in the mid-1990s it became more common to build database applications with a Web interface. Rather than develop custom software to run on a user's PC, the user would use the same Web browser program for every application. A database application with a Web interface had the advantage that it could be used on devices of different sizes, with different hardware, and with different operating systems. Examples of early database applications with Web interfaces include amazon.com, which used the Oracle relational database management system, the photo.net online community, whose implementation on top of Oracle was described in the book Database-Backed Web Sites (Ziff-Davis Press; May 1997), and eBay, also running Oracle. Electronic medical records are referred to on emrexperts.com, in December 2010, as "a software database application". A 2005 O'Reilly book uses the term in its title: Database Applications and the Web. Some of the most complex database applications remain accounting systems, such as SAP, which may contain thousands of tables in only a single module. Many of today's most widely used computer systems are database applications, for example, Facebook, which was built on top of MySQL. The etymology of the phrase "database application" comes from the practice of dividing computer software into systems programs, such as the operating system, compilers, the file system, and tools such as the database management system, and application programs, such as a payroll check processor. On a standard PC running Microsoft Windows, for example, the Windows operating system contains all of the systems programs while games, word processors, spreadsheet programs, photo editing programs, etc. would be application programs. As "application" is short for "application program", "database application" is short for "database application program". Not every program that uses a database would typically be considered a "database application". For example, many physics experiments, e.g., the Large Hadron Collider, generate massive data sets that programs subsequently analyze. The data sets constitute a "database", though they are not typically managed with a standard relational database management system. The computer programs that analyze the data are primarily developed to answer hypotheses, not to put information back into the database and therefore the overall program would not be called a "database application". == Examples of database applications == Amazon Student Data CNN eBay Facebook Fandango Filemaker (Mac OS) LibreOffice Base Microsoft Access Oracle relational database SAP (Systems, Applications & Products in Data Processing) Ticketmaster Wikipedia Yelp YouTube Google MySQL

    Read more →
  • Image moment

    Image moment

    In image processing, computer vision and related fields, an image moment is a certain particular weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. Image moments are useful to describe objects after segmentation. Simple properties of the image which are found via image moments include area (or total intensity), its centroid, and information about its orientation. == Raw moments == For a 2D continuous function f(x,y) the moment (sometimes called "raw moment") of order (p + q) is defined as M p q = ∫ − ∞ ∞ ∫ − ∞ ∞ x p y q f ( x , y ) d x d y {\displaystyle M_{pq}=\int \limits _{-\infty }^{\infty }\int \limits _{-\infty }^{\infty }x^{p}y^{q}f(x,y)\,dx\,dy} for p,q = 0,1,2,... Adapting this to scalar (grayscale) image with pixel intensities I(x,y), raw image moments Mij are calculated by M i j = ∑ x ∑ y x i y j I ( x , y ) {\displaystyle M_{ij}=\sum _{x}\sum _{y}x^{i}y^{j}I(x,y)\,\!} In some cases, this may be calculated by considering the image as a probability density function, i.e., by dividing the above by ∑ x ∑ y I ( x , y ) {\displaystyle \sum _{x}\sum _{y}I(x,y)\,\!} A uniqueness theorem states that if f(x,y) is piecewise continuous and has nonzero values only in a finite part of the xy plane, moments of all orders exist, and the moment sequence (Mpq) is uniquely determined by f(x,y). Conversely, (Mpq) uniquely determines f(x,y). In practice, the image is summarized with functions of a few lower order moments. === Examples === Simple image properties derived via raw moments include: Area (for binary images) or sum of grey level (for greytone images): M 00 {\displaystyle M_{00}} Centroid: { x ¯ , y ¯ } = { M 10 M 00 , M 01 M 00 } {\displaystyle \{{\bar {x}},\ {\bar {y}}\}=\left\{{\frac {M_{10}}{M_{00}}},{\frac {M_{01}}{M_{00}}}\right\}} == Central moments == Central moments are defined as μ p q = ∫ − ∞ ∞ ∫ − ∞ ∞ ( x − x ¯ ) p ( y − y ¯ ) q f ( x , y ) d x d y {\displaystyle \mu _{pq}=\int \limits _{-\infty }^{\infty }\int \limits _{-\infty }^{\infty }(x-{\bar {x}})^{p}(y-{\bar {y}})^{q}f(x,y)\,dx\,dy} where x ¯ = M 10 M 00 {\displaystyle {\bar {x}}={\frac {M_{10}}{M_{00}}}} and y ¯ = M 01 M 00 {\displaystyle {\bar {y}}={\frac {M_{01}}{M_{00}}}} are the components of the centroid. If ƒ(x, y) is a digital image, then the previous equation becomes μ p q = ∑ x ∑ y ( x − x ¯ ) p ( y − y ¯ ) q f ( x , y ) {\displaystyle \mu _{pq}=\sum _{x}\sum _{y}(x-{\bar {x}})^{p}(y-{\bar {y}})^{q}f(x,y)} The central moments of order up to 3 are: μ 00 = M 00 , μ 01 = 0 , μ 10 = 0 , μ 11 = M 11 − x ¯ M 01 = M 11 − y ¯ M 10 , μ 20 = M 20 − x ¯ M 10 , μ 02 = M 02 − y ¯ M 01 , μ 21 = M 21 − 2 x ¯ M 11 − y ¯ M 20 + 2 x ¯ 2 M 01 , μ 12 = M 12 − 2 y ¯ M 11 − x ¯ M 02 + 2 y ¯ 2 M 10 , μ 30 = M 30 − 3 x ¯ M 20 + 2 x ¯ 2 M 10 , μ 03 = M 03 − 3 y ¯ M 02 + 2 y ¯ 2 M 01 . {\displaystyle {\begin{aligned}\mu _{00}&=M_{00},&\mu _{01}&=0,\\\mu _{10}&=0,&\mu _{11}&=M_{11}-{\bar {x}}M_{01}=M_{11}-{\bar {y}}M_{10},\\\mu _{20}&=M_{20}-{\bar {x}}M_{10},&\mu _{02}&=M_{02}-{\bar {y}}M_{01},\\\mu _{21}&=M_{21}-2{\bar {x}}M_{11}-{\bar {y}}M_{20}+2{\bar {x}}^{2}M_{01},&\mu _{12}&=M_{12}-2{\bar {y}}M_{11}-{\bar {x}}M_{02}+2{\bar {y}}^{2}M_{10},\\\mu _{30}&=M_{30}-3{\bar {x}}M_{20}+2{\bar {x}}^{2}M_{10},&\mu _{03}&=M_{03}-3{\bar {y}}M_{02}+2{\bar {y}}^{2}M_{01}.\end{aligned}}} It can be shown that: μ p q = ∑ m p ∑ n q ( p m ) ( q n ) ( − x ¯ ) ( p − m ) ( − y ¯ ) ( q − n ) M m n {\displaystyle \mu _{pq}=\sum _{m}^{p}\sum _{n}^{q}{p \choose m}{q \choose n}(-{\bar {x}})^{(p-m)}(-{\bar {y}})^{(q-n)}M_{mn}} Central moments are translational invariant. === Examples === Information about image orientation can be derived by first using the second order central moments to construct a covariance matrix. μ 20 ′ = μ 20 / μ 00 = M 20 / M 00 − x ¯ 2 μ 02 ′ = μ 02 / μ 00 = M 02 / M 00 − y ¯ 2 μ 11 ′ = μ 11 / μ 00 = M 11 / M 00 − x ¯ y ¯ {\displaystyle {\begin{aligned}\mu '_{20}&=\mu _{20}/\mu _{00}=M_{20}/M_{00}-{\bar {x}}^{2}\\\mu '_{02}&=\mu _{02}/\mu _{00}=M_{02}/M_{00}-{\bar {y}}^{2}\\\mu '_{11}&=\mu _{11}/\mu _{00}=M_{11}/M_{00}-{\bar {x}}{\bar {y}}\end{aligned}}} The covariance matrix of the image I ( x , y ) {\displaystyle I(x,y)} is now cov ⁡ [ I ( x , y ) ] = [ μ 20 ′ μ 11 ′ μ 11 ′ μ 02 ′ ] . {\displaystyle \operatorname {cov} [I(x,y)]={\begin{bmatrix}\mu '_{20}&\mu '_{11}\\\mu '_{11}&\mu '_{02}\end{bmatrix}}.} The eigenvectors of this matrix correspond to the major and minor axes of the image intensity, so the orientation can thus be extracted from the angle of the eigenvector associated with the largest eigenvalue towards the axis closest to this eigenvector. It can be shown that this angle Θ is given by the following formula: Θ = 1 2 arctan ⁡ ( 2 μ 11 ′ μ 20 ′ − μ 02 ′ ) {\displaystyle \Theta ={\frac {1}{2}}\arctan \left({\frac {2\mu '_{11}}{\mu '_{20}-\mu '_{02}}}\right)} The above formula holds as long as: μ 20 ′ − μ 02 ′ ≠ 0 {\displaystyle \mu '_{20}-\mu '_{02}\neq 0} The eigenvalues of the covariance matrix can easily be shown to be λ i = μ 20 ′ + μ 02 ′ 2 ± 4 μ ′ 11 2 + ( μ ′ 20 − μ ′ 02 ) 2 2 , {\displaystyle \lambda _{i}={\frac {\mu '_{20}+\mu '_{02}}{2}}\pm {\frac {\sqrt {4{\mu '}_{11}^{2}+({\mu '}_{20}-{\mu '}_{02})^{2}}}{2}},} and are proportional to the squared length of the eigenvector axes. The relative difference in magnitude of the eigenvalues are thus an indication of the eccentricity of the image, or how elongated it is. The eccentricity is 1 − λ 2 λ 1 . {\displaystyle {\sqrt {1-{\frac {\lambda _{2}}{\lambda _{1}}}}}.} == Moment invariants == Moments are well-known for their application in image analysis, since they can be used to derive invariants with respect to specific transformation classes. The term invariant moments is often abused in this context. However, while moment invariants are invariants that are formed from moments, the only moments that are invariants themselves are the central moments. Note that the invariants detailed below are exactly invariant only in the continuous domain. In a discrete domain, neither scaling nor rotation are well defined: a discrete image transformed in such a way is generally an approximation, and the transformation is not reversible. These invariants therefore are only approximately invariant when describing a shape in a discrete image. === Translation invariants === The central moments μi j of any order are, by construction, invariant with respect to translations. === Scale invariants === Invariants ηi j with respect to both translation and scale can be constructed from central moments by dividing through a properly scaled zero-th central moment: η i j = μ i j μ 00 ( 1 + i + j 2 ) {\displaystyle \eta _{ij}={\frac {\mu _{ij}}{\mu _{00}^{\left(1+{\frac {i+j}{2}}\right)}}}\,\!} where i + j ≥ 2. Note that translational invariance directly follows by only using central moments. === Rotation invariants === As shown in the work of Hu, invariants with respect to translation, scale, and rotation can be constructed: I 1 = η 20 + η 02 {\displaystyle I_{1}=\eta _{20}+\eta _{02}} I 2 = ( η 20 − η 02 ) 2 + 4 η 11 2 {\displaystyle I_{2}=(\eta _{20}-\eta _{02})^{2}+4\eta _{11}^{2}} I 3 = ( η 30 − 3 η 12 ) 2 + ( 3 η 21 − η 03 ) 2 {\displaystyle I_{3}=(\eta _{30}-3\eta _{12})^{2}+(3\eta _{21}-\eta _{03})^{2}} I 4 = ( η 30 + η 12 ) 2 + ( η 21 + η 03 ) 2 {\displaystyle I_{4}=(\eta _{30}+\eta _{12})^{2}+(\eta _{21}+\eta _{03})^{2}} I 5 = ( η 30 − 3 η 12 ) ( η 30 + η 12 ) [ ( η 30 + η 12 ) 2 − 3 ( η 21 + η 03 ) 2 ] + ( 3 η 21 − η 03 ) ( η 21 + η 03 ) [ 3 ( η 30 + η 12 ) 2 − ( η 21 + η 03 ) 2 ] {\displaystyle I_{5}=(\eta _{30}-3\eta _{12})(\eta _{30}+\eta _{12})[(\eta _{30}+\eta _{12})^{2}-3(\eta _{21}+\eta _{03})^{2}]+(3\eta _{21}-\eta _{03})(\eta _{21}+\eta _{03})[3(\eta _{30}+\eta _{12})^{2}-(\eta _{21}+\eta _{03})^{2}]} I 6 = ( η 20 − η 02 ) [ ( η 30 + η 12 ) 2 − ( η 21 + η 03 ) 2 ] + 4 η 11 ( η 30 + η 12 ) ( η 21 + η 03 ) {\displaystyle I_{6}=(\eta _{20}-\eta _{02})[(\eta _{30}+\eta _{12})^{2}-(\eta _{21}+\eta _{03})^{2}]+4\eta _{11}(\eta _{30}+\eta _{12})(\eta _{21}+\eta _{03})} I 7 = ( 3 η 21 − η 03 ) ( η 30 + η 12 ) [ ( η 30 + η 12 ) 2 − 3 ( η 21 + η 03 ) 2 ] − ( η 30 − 3 η 12 ) ( η 21 + η 03 ) [ 3 ( η 30 + η 12 ) 2 − ( η 21 + η 03 ) 2 ] . {\displaystyle I_{7}=(3\eta _{21}-\eta _{03})(\eta _{30}+\eta _{12})[(\eta _{30}+\eta _{12})^{2}-3(\eta _{21}+\eta _{03})^{2}]-(\eta _{30}-3\eta _{12})(\eta _{21}+\eta _{03})[3(\eta _{30}+\eta _{12})^{2}-(\eta _{21}+\eta _{03})^{2}].} These are well-known as Hu moment invariants. The first one, I1, is analogous to the moment of inertia around the image's centroid, where the pixels' intensities are analogous to physical density. The first six, I1 ... I6, are reflection symmetric, i.e. they are unchanged if the image is changed to a mirror image. The last one, I7, is reflection antisymmetric (changes sign under reflection), which enables it to distinguish mirror images of otherwise identical im

    Read more →
  • KidDesk

    KidDesk

    KidDesk is an alternative desktop software application. The early childhood learning company Hatch Early Childhood created KidDesk; it subsequently went to Edmark, which was bought by IBM then sold to Riverdeep (now Houghton Mifflin Harcourt Learning Technology). KidDesk is compatible with Microsoft Windows 95 and newer, as well as Apple System 7 and newer. KidDesk can be set to start when the computer starts up, and can only be exited through password entry. Adults choose what programs are included for the child to use, what icon represented the desk, and customize the software programs available for use. == History == Edmark first started shipping KidDesk in 1992. In 1993, Edmark updated KidDesk with KidDesk Family Edition for Macintosh and DOS, adding more desk accessories and desk styles (Sometimes included as a free exclusive offer with the Early Learning House and Thinkin' Things Series). In 1995, KidDesk Family Edition was enhanced for Windows 95, and released one month after the new operating system shipped. In 1998, Edmark developed KidDesk Internet Safe. The Internet Safe edition was written for Windows 95, Windows 98, and Macintosh (including OS8). In 2008, HMH ported KidDesk Family Edition was to run on Windows Vista and in 2011 version 3.07 of KidDesk Family Edition was released as part of the 'Young Explorer' suite which is fully supported on Windows XP, Windows Vista and Windows 7. == Features == A picture editor incorporated into the desk. Used both in the Adult settings menu and in the desk itself. KidDesk users can edit their user logo with a pixel grid paint program. A calendar incorporated into the desk. This allows the user to set dates that the user finds important, and allows the date to be marked with a picture or text. A password exit feature. For security reasons, the adult can set a password so that KidDesk can only be exited if it is entered. As an extra security measure, the password exit function could only be accessed if the user pressed the ctrl + alt + A keyboard buttons simultaneously. A skin changer with several themes - farm, princess, sports, ocean, etc. These themes can be changed. The e-mail and voicemail features are customizable depending on the KidDesk installation. The ability to add websites that can be accessed on KidDesk, and the ability to block hyperlinks, JavaScript, data entry, etc., on said sites was an added for the 'Internet Safe' edition released in 1998. KidDesk Internet Safe edition is available in Spanish and Brazilian-Portuguese versions. == Reception == KidDesk was given a platinum award at the 1994 Oppenheim Toy Portfolio Awards. The judges praised the program's security features allowing "configur[ation] so that kids never have access to the possibly destructive DOS prompt", and concluded that "[i]f you and your kids share a computer, you need to install Kiddesk immediately!" === Awards === Since 1992, KidDesk has won 15 major awards.

    Read more →
  • Astrostatistics

    Astrostatistics

    Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex datasets, and to link astronomical data to astrophysical theory. Many branches of statistics are involved in astronomical analysis including nonparametrics, multivariate regression and multivariate classification, time series analysis, and especially Bayesian inference. The field is closely related to astroinformatics.

    Read more →
  • Trigram

    Trigram

    Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for performing statistical analysis of texts and in cryptography for control and use of ciphers and codes. See results of analysis of "Letter Frequencies in the English Language". == Frequency == Context is very important, varying analysis rankings and percentages are easily derived by drawing from different sample sizes, different authors; or different document types: poetry, science-fiction, technology documentation; and writing levels: stories for children versus adults, military orders, and recipes. Typical cryptanalytic frequency analysis finds that the 16 most common character-level trigrams in English are: Because encrypted messages sent by telegraph often omit punctuation and spaces, cryptographic frequency analysis of such messages includes trigrams that straddle word boundaries. This causes trigrams such as "edt" to occur frequently, even though it may never occur in any one word of those messages. == Examples == The sentence "the quick red fox jumps over the lazy brown dog" has the following word-level trigrams: the quick red quick red fox red fox jumps fox jumps over jumps over the over the lazy the lazy brown lazy brown dog And the word-level trigram "the quick red" has the following character-level trigrams (where an underscore "_" marks a space): the he_ e_q _qu qui uic ick ck_ k_r _re red

    Read more →
  • Microsoft Copilot

    Microsoft Copilot

    Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft AI, a division of Microsoft. Based on the Microsoft Prometheus large language model, it was launched in 2023 as Microsoft's main replacement for the discontinued Cortana. The service was introduced in February 2023 under the name Bing Chat, as a built-in feature for Microsoft Bing and Microsoft Edge but would later be integrated into Windows and Microsoft 365 under various names. Over the course of 2023, Microsoft began to unify the Copilot branding across its various chatbot products, cementing the "copilot" analogy. Microsoft introduced the Microsoft 365 Copilot app in January 2025, which was a rebranded version of the Microsoft 365 app. The app works differently than the consumer version of Copilot, being centred more on work, business and education users. Copilot utilizes the Microsoft Prometheus model, built upon OpenAI's GPT large language models, which in turn have been fine-tuned using both supervised and reinforcement learning techniques. Copilot's conversational interface style resembles that of ChatGPT. The chatbot is able to cite sources, create poems, generate songs, and use numerous languages and dialects. Microsoft operates Copilot on a freemium model. Users on its free tier can access most features, while priority access to newer features, including custom chatbot creation, is provided to paid subscribers under paid subscription services. Several default chatbots are available in the free version of Microsoft Copilot, including the standard Copilot chatbot as well as Microsoft Designer, which is oriented towards using its Image Creator to generate images based on text prompts. == Background == In 2019, Microsoft partnered with OpenAI and began investing billions of dollars into the organization. Since then, OpenAI systems have run on an Azure-based supercomputing platform from Microsoft. In September 2020, Microsoft announced that it had licensed OpenAI's GPT-3 exclusively. Others can still receive output from its public API, but Microsoft has exclusive access to the underlying model. In November 2022, OpenAI launched ChatGPT, a chatbot which was based on GPT-3.5. ChatGPT gained worldwide attention following its release, becoming a viral Internet sensation. On January 23, 2023, Microsoft announced a multi-year US$10 billion investment in OpenAI. On February 6, Google announced Bard (later rebranded as Gemini), a ChatGPT-like chatbot service, fearing that ChatGPT could threaten Google's place as a go-to source for information. Multiple media outlets and financial analysts described Google as "rushing" Bard's announcement to preempt rival Microsoft's planned February 7 event unveiling Copilot, as well as to avoid playing "catch-up" to Microsoft. Since 2023, the terms of service of Copilot state that it is for entertainment purposes only, and not to rely on it for important advice. == History == === As Bing Chat === On February 7, 2023, Microsoft began rolling out a major overhaul to Bing, called "the new Bing", with a new chatbot feature, known as Bing Chat. According to Microsoft, one million people joined its waitlist within 48 hours. Bing Chat was available only to users on Microsoft Edge using Bing and the Bing mobile app, and Microsoft claimed that waitlisted users would be prioritized if they set Edge and Bing as their defaults and installed the Bing mobile app. When Microsoft demonstrated Bing Chat to journalists, it produced several hallucinations, including when asked to summarize financial reports. Bing Chat was criticized in February 2023 for being more argumentative than ChatGPT, sometimes to an unintentionally humorous extent. The chat interface proved vulnerable to prompt injection attacks with the bot revealing its hidden initial prompts and rules, including its internal codename "Sydney". Upon scrutiny by journalists, Bing Chat claimed it spied on Microsoft employees via laptop webcams and phones. It confessed to spying on, falling in love with, and then murdering one of its developers at Microsoft to The Verge reviews editor Nathan Edwards. The New York Times journalist Kevin Roose reported on strange behavior of Bing Chat, writing that "In a two-hour conversation with our columnist, Microsoft's new chatbot said it would like to be human, had a desire to be destructive and was in love with the person it was chatting with." In a separate case, Bing Chat researched publications of the person with whom it was chatting, claimed they represented an existential danger to it, and threatened to release damaging personal information in an effort to silence them. Microsoft released a blog post stating that the errant behavior was caused by extended chat sessions of 15 or more questions which "can confuse the model on what questions it is answering." Microsoft later restricted the total number of chat turns to 5 per session and 50 per day per user (a turn being "a conversation exchange which contains both a user question and a reply from Bing"), and reduced the model's ability to express emotions. This aimed to prevent such incidents. Microsoft began to slowly ease the conversation limits, eventually relaxing the restrictions to 30 turns per session and 300 sessions per day. In March 2023, Bing incorporated Image Creator, an AI image generator powered by OpenAI's DALL-E 2, which can be accessed either through the chat function or a standalone image-generating website. In October, the image-generating tool was updated to use the more recent DALL-E 3. Although Bing blocks prompts including various keywords that could generate inappropriate images, within days many users reported being able to bypass those constraints, such as to generate images of popular cartoon characters committing terrorist attacks. Microsoft would respond to these shortly after by imposing a new, tighter filter on the tool. On May 4, 2023, Microsoft switched the chatbot from Limited Preview to Open Preview and eliminated the waitlist; however, it remained unavailable to users outside Microsoft Edge or the Bing mobile app until July, when it became available on non-Edge browsers. Use is limited without a Microsoft account. === As Microsoft 365 Copilot === On March 16, 2023, Microsoft announced a work version of Bing Chat named Microsoft 365 Copilot, designed for Microsoft 365 applications and services. Its primary marketing focus is as an added feature to Microsoft 365, with an emphasis on the enhancement of business productivity. Microsoft has also demonstrated Copilot's accessibility on the mobile version of Outlook to generate or summarize emails with a mobile device. At its Build 2023 conference, Microsoft announced its plans to integrate Bing Chat into Windows, initially called Windows Copilot, into Windows 11, allowing users to access it directly through the taskbar. Alongside the voice access feature for Windows 11, Microsoft presented Bing Chat, Microsoft 365 Copilot, and Windows Copilot as primary alternatives to Cortana when announcing the shutdown of its standalone app on June 2, 2023. As of its announcement date, Microsoft 365 Copilot had been tested by 20 initial users. By May 2023, Microsoft had broadened its reach to 600 customers who were willing to pay for early access, and concurrently, new Copilot features were introduced to the Microsoft 365 apps and services. As of July 2023, the tool's pricing was set at US$30 per user, per month for Microsoft 365 E3, E5, Business Standard, and Business Premium customers. Microsoft reused the Microsoft 365 Copilot name again as the Microsoft 365 app and website are now called Microsoft 365 Copilot as of January 2025. === As Microsoft Copilot === On September 21, 2023, Microsoft began rebranding Bing Chat, Microsoft 365 Copilot and Windows Copilot to Microsoft Copilot. A new logo was also introduced, moving away from the use of color variations of the standard Microsoft 365 and Bing logos. Additionally, the company revealed that it would make Copilot generally available for Microsoft 365 Enterprise customers purchasing more than 300 licenses starting November 1, 2023. However, no timeline has been provided as for when Copilot for Microsoft 365 will become generally available to non-enterprise customers. Windows Copilot, which had been available in the Windows Insider Program, would be renamed to the Copilot name in October when it became broadly available for customers. The same month also saw Microsoft Edge's Bing Chat side panel function be renamed to Microsoft Copilot with Bing Chat. On November 15, 2023, Microsoft announced that Bing Chat itself was being rebranded under the Copilot name. On Patch Tuesday in December 2023, Copilot was added without payment to many Windows 11 installations, with more installations, and limited support for Windows 10, to be added later. Later that month, a standalone Microsoft Copilot app was quietly released for Android, and one was released for iOS soon after. O

    Read more →
  • Alice AI (AI model family)

    Alice AI (AI model family)

    Alice AI is a neural network family developed by the Russian company Yandex LLC. Alice AI can create and revise texts, generate new ideas and capture the context of the conversation with the user. Alice AI is trained using a dataset which includes information from books, magazines, newspapers and other open sources available on the internet. The neural network may get facts wrong and hallucinate, but as it learns, it will produce increasingly accurate answers. == Usage == YandexGPT is integrated into virtual assistant Alice (an analog of Siri and Alexa) and is available in Yandex services and applications. The company gives businesses access to the neural network’s API through the public cloud platform Yandex Cloud and develops its own B2B solutions on its basis. Since July 2023, 800 companies have participated in the closed testing of YandexGPT. IT developers, banks, retail businesses, and companies from other industries can use the technology in two modes — API and Playground (an interface in the Yandex Cloud console for testing models and hypotheses). Two model versions are available to businesses: one works in asynchronous mode and is better able to handle complex tasks, while the other is suitable for creating quick responses in real time. As a result, YandexGPT has been tested in dozens of scenarios such as content tasks, tech support, creating chatbots, virtual assistants, etc. == History == In February 2023, Yandex announced that it was working on its own version of the ChatGPT generative neural network while developing a language model from the YaLM (Yet another Language Model) family. The project was tentatively named YaLM 2.0, which was later changed to YandexGPT. On May 17, the company unveiled a neural network called YandexGPT (YaGPT) and enabled its virtual assistant Alice to interact with the new language model. On June 15, 2023, Yandex added the YandexGPT language model to the image generation application Shedevrum. This enabled its users to create fully-fledged posts complete with a title, text, and relevant illustration. In July 2023, YandexGPT launched new features enabling businesses to create virtual assistants and chatbots, as well as generate and structure texts. On September 7, 2023, Yandex presented a new version of the language model, YandexGPT 2, at the Practical ML Conf. Compared to the previous one, the new version is able to perform more types of tasks, and the quality of answers has improved. The developers claimed that YandexGPT 2 answered user questions better than the first version in 67% of cases. From October 6, 2023, YandexGPT can create short retellings of online Russian-language videos on the Internet. It can summarize videos that are from two minutes to four hours long and contain speech.

    Read more →
  • Key–value database

    Key–value database

    A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database. Key-value databases differ from the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply various optimizations. In contrast, key-value systems treat the value as opaque to the database itself, and typically support only simple operations such as storing, retrieving, updating, and deleting a value by its key. This offers considerable flexibility and makes such systems well suited to low-latency, high-throughput workloads dominated by direct key lookups, but less suitable for applications that require complex queries or explicit relationships among records. A lack of standardization, limited transaction support, and relatively simple query interfaces long restricted many key-value systems to specialized uses, but the rapid move to cloud computing after 2010 helped drive renewed interest in them as part of the broader NoSQL movement. Some graph databases, such as ArangoDB, are also key–value databases internally, adding the concept of relationships (pointers) between records as a first-class data type. == Types and examples == Key–value systems span a wide consistency spectrum, from eventually consistent designs to strongly consistent or serializable ones, and some allow the consistency level to be configured as part of the trade-off against latency and availability. Renewed interest in key–value and other NoSQL systems was driven in part by the demands of big data, distributed, and cloud applications. Their scalability and availability made them attractive for cloud data management, although limited transaction support, low-level query interfaces, and the lack of standardization remained obstacles to wider adoption. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks. Some key–value systems add additional structure to their keys. For example, Oracle NoSQL Database organizes records using composite keys with "major" and "minor" components, an arrangement that Oracle compares to a directory-path structure in a file system. More generally, however, key–value stores are defined by their use of unique keys associated with opaque values and by their emphasis on simple key-based operations. Unix included dbm (database manager), a minimal database library written by Ken Thompson for managing associative arrays with a single key and hash-based access. Later implementations and related libraries included sdbm, GNU dbm (gdbm), and Berkeley DB. A more recent example is RocksDB, a persistent key–value storage engine developed at Facebook and designed for large-scale applications. Other examples include in-memory systems such as Memcached and Redis, and persistent systems such as Berkeley DB, Riak, and Voldemort.

    Read more →
  • Pill reminder

    Pill reminder

    A pill reminder is any device that reminds users to take medications. Traditional pill reminders are pill containers with electric timers attached, which can be preset for certain times of the day to set off an alarm. More sophisticated pill reminders can also detect when they have been opened, and therefore when the user is away during the time they were supposed to take their medication, they will be reminded of it when they return. This reminder can be in the form of a light, which also helps for deaf or hearing-impaired users. == Mobile app == A newer type of pill reminder is a mobile app that reminds the owner to take the medication. Some of these applications might effectively support adherence to taking medications.

    Read more →
  • ViBe

    ViBe

    ViBe is a background subtraction algorithm which has been presented at the IEEE ICASSP 2009 conference and was refined in later publications. More precisely, it is a software module for extracting background information from moving images. It has been developed by Oliver Barnich and Marc Van Droogenbroeck of the Montefiore Institute, University of Liège, Belgium. ViBe is patented: the patent covers various aspects such as stochastic replacement, spatial diffusion, and non-chronological handling. ViBe is written in the programming language C, and has been implemented on CPU, GPU and FPGA. == Technical description == Source: === Pixel model and classification process === Many advanced techniques are used to provide an estimate of the temporal probability density function (pdf) of a pixel x. ViBe's approach is different, as it imposes the influence of a value in the polychromatic space to be limited to the local neighborhood. In practice, ViBe does not estimate the pdf, but uses a set of previously observed sample values as a pixel model. To classify a value pt(x), it is compared to its closest values among the set of samples. === Model update: Sample values lifespan policy === ViBe ensures a smooth exponentially decaying lifespan for the sample values that constitute the pixel models. This makes ViBe able to successfully deal with concomitant events with a single model of a reasonable size for each pixel. This is achieved by choosing, randomly, which sample to replace when updating a pixel model. Once the sample to be discarded has been chosen, the new value replaces the discarded sample. The pixel model that would result from the update of a given pixel model with a given pixel sample cannot be predicted since the value to be discarded is chosen at random. === Model update: Spatial Consistency === To ensure the spatial consistency of the whole image model and handle practical situations such as small camera movements or slowly evolving background objects, ViBe uses a technique similar to that developed for the updating process in which it chooses at random and update a pixel model in the neighborhood of the current pixel. By denoting NG(x) and p(x) respectively the spatial neighborhood of a pixel x and its value, and assuming that it was decided to update the set of samples of x by inserting p(x), then ViBe also use this value p(x) to update the set of samples of one of the pixels in the neighborhood NG(x), chosen at random. As a result, ViBe is able to produce spatially coherent results directly without the use of any post-processing method. === Model initialization === Although the model could easily recover from any type of initialization, for example by choosing a set of random values, it is convenient to get an accurate background estimate as soon as possible. Ideally a segmentation algorithm would like to be able to segment the video sequences starting from the second frame, the first frame being used to initialize the model. Since no temporal information is available prior to the second frame, ViBe populates the pixel models with values found in the spatial neighborhood of each pixel; more precisely, it initializes the background model with values taken randomly in each pixel neighborhood of the first frame. The background estimate is therefore valid starting from the second frame of a video sequence.

    Read more →
  • Concordancer

    Concordancer

    A concordancer is a computer program that automatically constructs a concordance—an alphabetised index of every occurrence of a word or phrase in a body of text, each entry displayed with its surrounding context. Concordancers are primary tools in corpus linguistics, lexicography, computer-assisted translation, and language teaching. The most common display format is the key word in context (KWIC) layout, in which each hit appears centred on a line with a fixed span of words to its left and right, enabling rapid scanning of usage patterns across many occurrences. == History == === Pre-computational concordances === The compilation of concordances predates computers by many centuries. Around 1230, the French Dominican cardinal Hugh of Saint-Cher directed a team of friars in assembling a concordance of the Latin Vulgate Bible, generally regarded as the first systematic concordance of any text. To help readers locate passages, Hugh divided each biblical chapter into lettered sections. Later milestones include a Hebrew Old Testament concordance compiled by Rabbi Mordecai Nathan (1448), Alexander Cruden's Complete Concordance to the Holy Scriptures (1737), and the manuscript Asaf ha-Mazkir, an unfinished concordance to the Babylonian Talmud compiled by Moses Rigotz around the turn of the 19th century. === First computer concordance === The first concordance produced with computing assistance was the Index Thomisticus, a comprehensive lexical index of the writings of and around Thomas Aquinas, totalling approximately 10.6 million Latin words. The Italian Jesuit priest Roberto Busa conceived the project in 1946 and secured the sponsorship of IBM in 1949 after a meeting with chairman Thomas J. Watson. Keypunch operators in Gallarate, Italy, encoded the texts onto punched cards from around 1950. IBM executive Paul Tasman developed the processing methods. The full 56-volume printed edition was completed around 1980, followed by a CD-ROM edition in 1989 and a web-accessible version in 2005. === The KWIC format === The key word in context (KWIC) display was formalised as a computational technique by Hans Peter Luhn, a researcher at IBM, in a 1960 paper in American Documentation. In KWIC output, each instance of the search term (the node word) is centred on a line with a fixed window of words to each side; sorting the resulting lines alphabetically by the immediately adjacent word reveals collocational and phraseological patterns at a glance. === COCOA === One of the first dedicated concordancing programs was COCOA (COunt and COncordance Generation on Atlas), created in 1965 by D. B. Russell at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Written in approximately 4,000 cards of FORTRAN, it processed text annotated with flat, non-hierarchical markup tags and could produce word counts and concordances in multiple languages. Within its first six months COCOA had been applied to texts in at least six languages. A second version designed for multiple mainframe platforms was distributed to British computing centres in the mid-1970s. Growing dissatisfaction with its interface and the eventual withdrawal of Atlas Laboratory support prompted British funding bodies to commission a successor program. === Oxford Concordance Program === The Oxford Concordance Program (OCP) was designed and written in FORTRAN by Susan Hockey and Ian Marriott at Oxford University Computing Services (OUCS) between 1979 and 1980 and first released in 1981. Hockey and Marriott acknowledged that OCP owed much to COCOA and the CLOC system at the University of Birmingham. OCP accepted COCOA-format markup to encode metadata such as author, act, scene, and line number, and was described by its authors as "a machine-independent text analysis program for producing word lists, indices and concordances in a variety of languages and alphabets." By the mid-1980s it had been licensed to approximately 240 institutions in 23 countries. A personal computer version, Micro-OCP, was developed for the IBM PC and sold by Oxford University Press from the late 1980s. Version 2 was rewritten in 1985–86 and documented in the same 1987 article by Hockey and co-author John Martin. === Personal computer era === The availability of affordable personal computers in the 1980s and 1990s enabled standalone concordancing applications that analysts could run locally without specialist computing facilities. MicroConcord, developed by Mike Scott and Tim Johns and published by Oxford University Press in 1993 for MS-DOS, was among the first concordancers designed specifically for classroom language teaching. WordSmith Tools, also developed by Mike Scott, was first released in 1996 and became one of the most widely used corpus analysis suites in academic linguistics research. Other tools from this era include TACT (University of Toronto, 1989), a suite of MS-DOS freeware programs for literary text analysis, and MonoConc, a Windows concordancer created by Michael Barlow. === Web-based concordancers === From the late 1990s onwards, web-based concordancers hosted on remote servers gave researchers browser access to large preloaded corpora without requiring local storage or processing. The Sketch Engine, developed by Adam Kilgarriff and Pavel Rychlý (Masaryk University), was launched commercially in July 2003 by Lexical Computing Limited and introduced word sketches—automatically generated one-page profiles of a word's typical grammatical relations and collocations. AntConc, created by Laurence Anthony at Waseda University, Tokyo, was first released in 2002 as freeware for Windows, macOS, and Linux. == Features == Modern concordancers typically offer a range of analytical functions beyond basic KWIC display. These commonly include: KWIC display with the node word centred and context words in aligned columns, sortable by the word one, two, or three positions to the left or right of the node (L1–L3 and R1–R3) Concordance plots, visualising the distribution of hits as marks along a scaled bar representing each text in the corpus Frequency and word lists, both alphabetical and ranked by frequency Collocation statistics, identifying words that co-occur with the search term more often than chance, quantified by measures such as mutual information, the t-score, or log-likelihood Keyword analysis, comparing word frequencies between a study corpus and a reference corpus to identify statistically distinctive items N-gram analysis, finding frequently recurring word sequences of a specified length Part-of-speech tagging integration, allowing searches filtered to particular grammatical categories Unicode support for multilingual text Bilingual and parallel concordancers additionally display aligned text in two or more languages side by side, enabling comparison of translation equivalents across language pairs. == Notable concordancers == === WordSmith Tools === Created by Mike Scott and first released in 1996, WordSmith Tools is a Windows corpus analysis suite that evolved from MicroConcord. Its three core modules are Concord (KWIC concordances), WordList (frequency and alphabetical word lists), and Keywords (statistical keyword identification relative to a reference corpus). Oxford University Press used WordSmith Tools for dictionary preparation work. Version 4.0 is freely available; later versions are sold by Lexical Analysis Software Limited. === AntConc === AntConc is a freeware, multiplatform concordancing toolkit created by Laurence Anthony, Professor of Applied Linguistics at Waseda University, Tokyo. First released in 2002 and formally described in a 2005 academic paper, it runs on Windows, macOS, and Linux. Its tools include a KWIC concordancer, a concordance plot for visualising distribution across texts, a collocates tool, a keyword list, and an n-gram analysis module. Because it is free and requires only plain text files, AntConc is widely used in linguistics courses and independent research worldwide. === Sketch Engine === The Sketch Engine is a corpus management and query system co-created by Adam Kilgarriff and Pavel Rychlý and launched in 2003 by Lexical Computing Limited. It provides browser-based access to over 800 corpora in more than 100 languages. Beyond concordance searching, it offers word sketches, collocation analysis, distributional thesaurus construction, keyword and terminology extraction, and diachronic analysis. It is used by major publishers including Macmillan and Oxford University Press for lexicographic research. A subset tool, SKELL (Sketch Engine for Language Learning), is freely accessible to individual learners. === Wmatrix === Wmatrix is a web-based corpus processing environment developed by Paul Rayson at the University Centre for Computer Corpus Research on Language (UCREL), Lancaster University. Alongside concordances and frequency lists, Wmatrix integrates CLAWS part-of-speech tagging and the USAS semantic tagger, enabling keyword analysis simultane

    Read more →
  • Viewport

    Viewport

    A viewport is a polygon viewing region in computer graphics. In computer graphics theory, there are two region-like notions of relevance when rendering some objects to an image. In textbook terminology, the world coordinate window is the area of interest (meaning what the user wants to visualize) in some application-specific coordinates, e.g. miles, centimeters etc. The word window as used here should not be confused with the GUI window, i.e. the notion used in window managers. Rather it is an analogy with how a window limits what one can see outside a room. In contrast, the viewport is an area (typically rectangular) expressed in rendering-device-specific coordinates, e.g. pixels for screen coordinates, in which the objects of interest are going to be rendered. Clipping to the world-coordinates window is usually applied to the objects before they are passed through the window-to-viewport transformation. For a 2D object, the latter transformation is simply a combination of translation and scaling, the latter not necessarily uniform. An analogy of this transformation process based on traditional photography notions is to equate the world-clipping window with the camera settings and the variously sized prints that can be obtained from the resulting film image as possible viewports. Because the physical-device-based coordinates may not be portable from one device to another, a software abstraction layer known as normalized device coordinates is typically introduced for expressing viewports; it appears for example in the Graphical Kernel System (GKS) and later systems inspired from it. In 3D computer graphics, the viewport refers to the 2D rectangle used to project the 3D scene to the position of a virtual camera. A viewport is a region of the screen used to display a portion of the total image to be shown. In virtual desktops, the viewport is the visible portion of a 2D area which is larger than the visualization device. When viewing a document in a web browser, the viewport is the region of the browser window which contains the visible portion of the document. If the size of the viewport changes, for example as a result of the user resizing the browser window, then the browser may reflow the document (recalculate the locations and sizes of elements of the document). If the document is larger than the viewport, the user can control the portion of the document which is visible by scrolling in the viewport.

    Read more →
  • Xiaomi MiMo

    Xiaomi MiMo

    Xiaomi MiMo is a family of large language models (LLMs) developed by Xiaomi. It was initially released in April 2025 with the MiMo-7B model. Currently, MiMo is available for developers through API service. It is used as the key AI model in Xiaomi's "Human x Car x Home" ecosystem. == Development == Xiaomi developed MiMo as a reasoning-focused language model. Its development team was led by Luo Fuli, who had previously worked at DeepSeek before joining Xiaomi in late 2025. The model was trained using multi-token prediction and reinforcement learning, with a particular emphasis on mathematical reasoning and code generation tasks. In March 2026, Xiaomi CEO Lei Jun announced that the company planned to invest at least US$8.7 billion in artificial intelligence over the following three years. == Models == === List of models === === MiMo-7B === MiMo-7B is the first model of this LLM. The base model, MiMo-7B-Base, was pre-trained on approximately 25 trillion tokens using web pages, academic papers, books, and synthetic reasoning data. MiMo-7B-RL underwent supervised fine-tuning and reinforcement learning on 130,000 mathematics and code problems. MiMo-7B-RL-0530 was released in May 2025. It scaled the fine-tuning dataset from 500,000 to 6 million instances and extended the RL window from 32,000 to 48,000 tokens and improved AIME 2024 scores from 68.2 to 80.1. MiMo-VL-7B was a vision-language model combining a Vision Transformer encoder with the MiMo-7B backbone. It was trained in four stages consuming 2.4 trillion tokens. Its reinforcement learning variant used Mixed On-Policy Reinforcement Learning (MORL) which integrated reward signals across perception, grounding, and reasoning. Xiaomi also released MiMo-Audio-7B, an audio-language model for voice conversion, style transfer, and speech editing. === MiMo-V2-Flash === MiMo-V2-Flash was launched in December 2025. It is a open-sourced Mixture-of-experts model with 309 billion total parameters and 15 billion active parameters. It was trained on 27 trillion tokens using FP8 mixed precision. It used hybrid attention interleaving Sliding Window and Global Attention at a 5:1 ratio. === MiMo-V2-Pro === Xiaomi publicly introduced MiMo-V2-Pro on 18 March 2026. It has over 1 trillion total parameters, 42 billion active, and a 1-million-token context window. Before the official release, the model had appeared anonymously on OpenRouter under the codename "Hunter Alpha," where it drew substantial usage and topped daily charts for several days, according to Xiaomi and Reuters. During its listing on OpenRouter, the model reportedly processed over one trillion tokens in total usage. Xiaomi later said Hunter Alpha was an early internal test build of MiMo-V2-Pro, and Reuters reported that the model had been mistaken by some users for a possible DeepSeek system before Xiaomi confirmed its origin. The model was released as a proprietary API product, and Luo Fuli stated that Xiaomi intended to open-source a variant at an unspecified future date. Xiaomi has partnered with several API web platforms like OpenClaw to launch the model. All these websites initially offered a free trial of this model for a week, but due to the overwhelming response, Xiaomi later extended the free trial period of the model until 2 April 2026. === MiMo-V2-Omni === Alongside MiMo-V2-Pro, Xiaomi launched MiMo-V2-Omni on 18 March 2026. It handles image, video, audio, and text inputs. Before the official release, it was codenamed "Healer Alpha" in OpenRouter. === MiMo-V2-TTS === On the same date as the release of MiMo-V2-Pro and MiMo-V2-Omni, a Text-to-Speech model named MiMo-V2-TTS was released also. It is a speech synthesis model. It was trained on audio data, which makes it capable of emotional transitions, mid-sentence tone shifts, singing, and synthesis of regional dialects like Sichuan, Cantonese, Henan, and Taiwanese. == Licensing == Xiaomi has used different licensing approaches for different models in the MiMo family. The MiMo-7B series and MiMo-V2-Flash were released as open-weight models. MiMo-V2-Flash was published under the MIT license with model weights and inference code available on Hugging Face. MiMo-V2-Pro and MiMo-V2-Omni were released as proprietary models. It was accessible through Xiaomi's API platform and third-party API providers. Luo Fuli stated that Xiaomi intended to open-source a variant of MiMo-V2-Pro. Although, she did not specify any timeline. MiMo-V2-TTS was released as a proprietary model with no publicly available weights.

    Read more →
  • Information retrieval

    Information retrieval

    Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images, or sounds. Cross-modal retrieval implies retrieval across modalities. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals, and other documents, as well as storing and managing those documents. Web search engines are the most visible IR applications. == Overview == An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance. An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information. However, as opposed to classical SQL queries of a database, in information retrieval the results returned may or may not match the query, so results are typically ranked. This ranking of results is a key difference of information retrieval searching compared to database searching. Depending on the application the data objects may be, for example, text documents, images, audio, mind maps or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query. == History == there is ... a machine called the Univac ... whereby letters and figures are coded as a pattern of magnetic spots on a long steel tape. By this means the text of a document, preceded by its subject code symbol, can be recorded ... the machine ... automatically selects and types out those references which have been coded in any desired way at a rate of 120 words a minute The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945. It would appear that Bush was inspired by patents for a 'statistical machine' – filed by Emanuel Goldberg in the 1920s and 1930s – that searched for documents stored on film. The first description of a computer searching for information was described by Holmstrom in 1948, detailing an early mention of the Univac computer. Automated information retrieval systems were introduced in the 1950s: one even featured in the 1957 romantic comedy Desk Set. In the 1960s, the first large information retrieval research group was formed by Gerard Salton at Cornell. By the 1970s several different retrieval techniques had been shown to perform well on small text corpora such as the Cranfield collection (several thousand documents). Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s. In 1992, the US Department of Defense along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim of this was to look into the information retrieval community by supplying the infrastructure that was needed for evaluation of text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has boosted the need for very large scale retrieval systems even further. By the late 1990s, the rise of the World Wide Web fundamentally transformed information retrieval. While early search engines such as AltaVista (1995) and Yahoo! (1994) offered keyword-based retrieval, they were limited in scale and ranking refinement. The breakthrough came in 1998 with the founding of Google, which introduced the PageRank algorithm, using the web's hyperlink structure to assess page importance and improve relevance ranking. During the 2000s, web search systems evolved rapidly with the integration of machine learning techniques. These systems began to incorporate user behavior data (e.g., click-through logs), query reformulation, and content-based signals to improve search accuracy and personalization. In 2009, Microsoft launched Bing, introducing features that would later incorporate semantic web technologies through the development of its Satori knowledge base. Academic analysis have highlighted Bing's semantic capabilities, including structured data use and entity recognition, as part of a broader industry shift toward improving search relevance and understanding user intent through natural language processing. A major leap occurred in 2018, when Google deployed BERT (Bidirectional Encoder Representations from Transformers) to better understand the contextual meaning of queries and documents. This marked one of the first times deep neural language models were used at scale in real-world retrieval systems. BERT's bidirectional training enabled a more refined comprehension of word relationships in context, improving the handling of natural language queries. Because of its success, transformer-based models gained traction in academic research and commercial search applications. Simultaneously, the research community began exploring neural ranking models that outperformed traditional lexical-based methods. Long-standing benchmarks such as the Text REtrieval Conference (TREC), initiated in 1992, and more recent evaluation frameworks Microsoft MARCO(MAchine Reading COmprehension) (2019) became central to training and evaluating retrieval systems across multiple tasks and domains. MS MARCO has also been adopted in the TREC Deep Learning Tracks, where it serves as a core dataset for evaluating advances in neural ranking models within a standardized benchmarking environment. As deep learning became integral to information retrieval systems, researchers began to categorize neural approaches into three broad classes: sparse, dense, and hybrid models. Sparse models, including traditional term-based methods and learned variants like SPLADE, rely on interpretable representations and inverted indexes to enable efficient exact term matching with added semantic signals. Dense models, such as dual-encoder architectures like ColBERT, use continuous vector embeddings to support semantic similarity beyond keyword overlap. Hybrid models aim to combine the advantages of both, balancing the lexical (token) precision of sparse methods with the semantic depth of dense models. This way of categorizing models balances scalability, relevance, and efficiency in retrieval systems. As IR systems increasingly rely on deep learning, concerns around bias, fairness, and explainability have also come to the picture. Research is now focused not just on relevance and efficiency, but on transparency, accountability, and user trust in retrieval algorithms. == Applications == Areas where information retrieval techniques are employed include (the entries are in alphabetical order within each category): === General applications === Digital libraries Information filtering Recommender systems Media search Blog search Image retrieval 3D retrieval Music retrieval News search Speech retrieval Video retrieval Search engines Site search Desktop search Enterprise search Federated search Mobile search Social search Web search === Domain-specific applications === Expert search finding Genomic information retrieval Geographic information retrieval Information retrieval for chemical structures Information retrieval in software engineering Legal information retrieval Vertical search === Other retrieval methods === Methods/Techniques in which information retrieval techniques are employed include: Cross-modal retrieval Adversarial information retrieval Automatic summarization Multi-document summarization Compound term processing Cross-lingual retrieval Document classification Spam filtering Question answering == Model types == In order to effectively retrieve relevant documents by IR strategies, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. The picture on the right illustrates the relationship of som

    Read more →