AI News and Guides

Explore the best AI News and Guides — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Hildon

    Hildon is an application framework originally developed for mobile devices (PDAs, mobile phones, etc.) running the Linux operating system as well as the Symbian operating system. The Symbian variant of Hildon was discontinued with the cancellation of Series 90. It was developed by Nokia for the Maemo operating system and focuses on providing a finger-friendly interface. It is primarily a set of GTK extensions that provide mobile-device–oriented functionality, but it also provides a desktop environment that includes a task navigator for opening and switching between programs, a control panel for user settings, and status bar, task bar and home applets. It is standard on the Maemo platform used by the Nokia Internet Tablets and the Nokia N900 smartphone. Hildon has also been selected as the framework for Ubuntu Mobile and Embedded Edition. Hildon was an early instance of a software platform for generic computing in a tablet device intended for internet consumption. However, Nokia did not commit to it as the only platform for its future mobile devices, and the project competed against other in-house platforms. The strategic advantage of a modern platform was not exploited, and Hildon was displaced by Series 60, though its development is continued by the Maemo Leste project. == Components == The Hildon framework includes components that effectively provide a desktop environment. === Hildon Application Manager === Hildon Application Manager is the Hildon graphical package manager. It uses the Debian package management tools APT (Advanced Packaging Tool) and dpkg, and provides a graphical interface for installing, updating and removing packages. It is a limited package manager, designed specifically for end users, in that it does not directly offer the user access to system files and libraries. 
With the Diablo release of Maemo, Hildon Application Manager gained support for "Seamless Software Update" (SSU), which implements a variety of features that allow system upgrades to be performed easily through it. === Hildon Control Panel === Hildon Control Panel is the user settings interface for Hildon. It provides simple access to control panels used to change system settings. === Hildon Desktop === Hildon Desktop is the primary UI component of Hildon, so it makes up the bulk of what a user will see as "Hildon". It controls application launching and switching and general system control, and provides interfaces for task bar (application menu and task switcher), status bar (brightness and volume control), and home (internet radio and web search) applets. === Hildon Library === The Hildon library was originally developed by Nokia but, since Maemo 5, has been developed by Igalia and Lanedo (who developed MaemoGTK+, the Maemo version of GTK+). It is a set of mobile-specific GTK+ widgets for applications in Maemo. Up to Maemo 4, these widgets were designed for stylus usage. In Maemo 5, however, most widgets were deprecated and new widgets for direct finger manipulation were introduced, including a kinetic panning container.

  • Supervised learning

    In machine learning, supervised learning (SL) is a type of machine learning paradigm where an algorithm learns to map input data to a specific output based on example input-output pairs. This process involves training a statistical model using labeled data, meaning each piece of input data is provided with the correct output. The term "supervised" refers to the role of a teacher or supervisor who provides this training data, guiding the algorithm towards correct predictions. For instance, if you want a model to identify cats in images, supervised learning would involve feeding it many images of cats (inputs) that are explicitly labeled "cat" (outputs). The goal of supervised learning is for the trained model to accurately predict the output for new, unseen data. This requires the algorithm to effectively generalize from the training examples, a quality measured by its generalization error. Supervised learning is commonly used for tasks like classification (predicting a category, e.g., spam or not spam) and regression (predicting a continuous value, e.g., house prices). == Steps to follow == To solve a given problem of supervised learning, the following steps must be performed: Determine the type of training samples. Before doing anything else, the user should decide what kind of data is to be used as a training set. In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, an entire sentence of handwriting, or a full paragraph of handwriting. Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered together with corresponding outputs, either from human experts or from measurements. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. 
Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but the feature vector should contain enough information to accurately predict the output. Determine the structure of the learned function and corresponding learning algorithm. For example, one may choose to use support-vector machines or decision trees. Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. == Algorithm choice == A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems (see the No free lunch theorem). There are four major issues to consider in supervised learning: === Bias–variance tradeoff === A first issue is the tradeoff between bias and variance. Imagine that we have available several different, but equally good, training data sets. A learning algorithm is biased for a particular input x if, when trained on each of these data sets, it is systematically incorrect when predicting the correct output for x. A learning algorithm has high variance for a particular input x if it predicts different output values when trained on different training sets. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm. 
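The definitions of bias and variance at a particular input x can be made concrete with a small simulation. The sketch below is illustrative only and uses standard-library Python: the target function, the noise level, the training-set size, and the two toy learners (`mean_learner` and `nearest_neighbour_learner`) are all assumptions chosen for the demonstration, not part of the text above. It reports the squared bias, which is the form that appears in the usual squared-error decomposition.

```python
import math
import random

def true_function(x):
    # Assumed "true" function for the demonstration.
    return math.sin(2 * math.pi * x)

def sample_training_set(n, noise_sd, rng):
    """Draw n noisy examples (x, y) with x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_function(x) + rng.gauss(0.0, noise_sd)) for x in xs]

def mean_learner(train, x):
    """Inflexible learner: ignores x and predicts the average training output."""
    return sum(y for _, y in train) / len(train)

def nearest_neighbour_learner(train, x):
    """Flexible learner: copies the output of the closest training input."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def bias_and_variance(learner, x, trials=500, n=20, noise_sd=0.3, seed=0):
    """Train on many independent training sets and summarise predictions at x."""
    rng = random.Random(seed)
    predictions = [
        learner(sample_training_set(n, noise_sd, rng), x) for _ in range(trials)
    ]
    mean_prediction = sum(predictions) / trials
    bias = mean_prediction - true_function(x)  # systematic error at x
    variance = sum((p - mean_prediction) ** 2 for p in predictions) / trials
    return bias, variance

x0 = 0.25
for name, learner in (("mean", mean_learner), ("1-NN", nearest_neighbour_learner)):
    bias, variance = bias_and_variance(learner, x0)
    print(f"{name:5s} bias^2={bias ** 2:.3f} variance={variance:.3f}")
```

Under these assumptions the inflexible learner should show the larger squared bias and the flexible one the larger variance; the exact numbers depend on the seed and settings.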
Generally, there is a tradeoff between bias and variance. A learning algorithm with low bias must be "flexible" so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training data set differently, and hence have high variance. A key aspect of many supervised learning methods is that they are able to adjust this tradeoff between bias and variance (either automatically or by providing a bias/variance parameter that the user can adjust). === Function complexity and amount of training data === The second issue is the amount of training data available relative to the complexity of the "true" function (classifier or regression function). If the true function is simple, then an "inflexible" learning algorithm with high bias and low variance will be able to learn it from a small amount of data. But if the true function is highly complex (e.g., because it involves complex interactions among many different input features and behaves differently in different parts of the input space), then the function will only be learnable with a large amount of training data paired with a "flexible" learning algorithm with low bias and high variance. === Dimensionality of the input space === A third issue is the dimensionality of the input space. If the input feature vectors have large dimensions, learning the function can be difficult even if the true function only depends on a small number of those features. This is because the many "extra" dimensions can confuse the learning algorithm and cause it to have high variance. Hence, input data of large dimensions typically requires tuning the classifier to have low variance and high bias. In practice, if the engineer can manually remove irrelevant features from the input data, it will likely improve the accuracy of the learned function. In addition, there are many algorithms for feature selection that seek to identify the relevant features and discard the irrelevant ones. 
This is an instance of the more general strategy of dimensionality reduction, which seeks to map the input data into a lower-dimensional space prior to running the supervised learning algorithm. === Noise in the output values === A fourth issue is the degree of noise in the desired output values (the supervisory target variables). If the desired output values are often incorrect (because of human error or sensor errors), then the learning algorithm should not attempt to find a function that exactly matches the training examples. Attempting to fit the data too carefully leads to overfitting. You can overfit even when there are no measurement errors (stochastic noise) if the function you are trying to learn is too complex for your learning model. In such a situation, the part of the target function that cannot be modeled "corrupts" your training data – this phenomenon has been called deterministic noise. When either type of noise is present, it is better to go with a higher bias, lower variance estimator. In practice, there are several approaches to alleviate noise in the output values, such as early stopping to prevent overfitting, as well as detecting and removing the noisy training examples prior to training the supervised learning algorithm. Several algorithms identify noisy training examples, and removing the suspected noisy training examples prior to training has decreased generalization error with statistical significance. === Other factors to consider === Other factors to consider when choosing and applying a learning algorithm include the following: Heterogeneity of the data. If the feature vectors include features of many different kinds (discrete, discrete ordered, counts, continuous values), some algorithms are easier to apply than others. 
Many algorithms, including support-vector machines, linear regression, logistic regression, neural networks, and nearest neighbor methods, require that the input features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval). Methods that employ a distance function, such as nearest neighbor methods and support-vector machines with Gaussian kernels, are particularly sensitive to this. An advantage of decision trees is that they easily handle heterogeneous data. Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization. Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, support-vector machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support-vector machines with Gaussian kernels) generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, becaus

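The workflow described in this excerpt (gather labeled data, choose a feature representation, tune a control parameter on a validation set, then measure accuracy on a separate test set) can be sketched in a few lines of standard-library Python. Everything specific here is an assumption made for illustration: the synthetic two-class data, the 180/60/60 split, the k-nearest-neighbour learner, and the candidate values of k. Features are rescaled to roughly the [-1, 1] interval using ranges from the training set, because nearest-neighbour methods are sensitive to feature scale.

```python
import random

def min_max_scaler(rows):
    """Return a function mapping each feature to [-1, 1] using ranges from `rows`."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    def scale(row):
        return [2 * (v - lo) / (hi - lo) - 1 if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]
    return scale

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    votes = [label for _, label in sorted(train, key=squared_distance)[:k]]
    return max(set(votes), key=votes.count)

def accuracy(train, data, k):
    """Fraction of examples in `data` that k-NN trained on `train` labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

random.seed(0)
# Synthetic labeled data: two classes; the second feature is on a much larger scale.
data = []
for label, center in ((0, 0.0), (1, 4.0)):
    for _ in range(150):
        features = [random.gauss(center, 1.0), 100 * random.gauss(center, 1.0)]
        data.append((features, label))
random.shuffle(data)
train, valid, test = data[:180], data[180:240], data[240:]

# Fit the scaling on the training set only, then apply it to every split.
scale = min_max_scaler([x for x, _ in train])
train = [(scale(x), y) for x, y in train]
valid = [(scale(x), y) for x, y in valid]
test = [(scale(x), y) for x, y in test]

# Tune the control parameter k on the validation set (odd values avoid ties).
candidates = [1, 3, 5, 7, 9]
best_k = max(candidates, key=lambda k: accuracy(train, valid, k))

# Report performance on the held-out test set, which played no part in tuning.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The test set is touched only once, after k has been fixed, so the reported accuracy is not inflated by the tuning step.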
  • Abeba Birhane

    Abeba Birhane is an Ethiopian-born cognitive scientist who works at the intersection of complex adaptive systems, machine learning, algorithmic bias, and critical race studies. Birhane's work with Vinay Prabhu uncovered that large-scale image datasets commonly used to develop AI systems, including ImageNet and 80 Million Tiny Images, carried racist and misogynistic labels and offensive images. She has been recognized by VentureBeat as a top innovator in computer vision and was named one of the 100 most influential people in AI in 2023 by TIME magazine. == Early life and education == Birhane was born in Ethiopia. She received her Bachelor of Science in Psychology and a Bachelor of Arts in Philosophy from The Open University. In 2015, she completed her Master of Science in Cognitive Science and, in 2021, her Ph.D. at the Complex Software Lab in the School of Computer Science at University College Dublin. == Career and research == Birhane studied the impacts of emerging AI technologies and how they shape individuals and local communities. She found that AI algorithms tend to disproportionately impact vulnerable groups such as older workers, trans people, immigrants, and children. Her research on relational ethics won the best paper award at NeurIPS’s Black in AI workshop in 2019. She has also studied and written about algorithmic colonization driven by corporate agendas. Her work in decolonizing computational sciences addressed the inherited oppressions in current systems, especially towards women of color. In 2020, Birhane and Vinay Prabhu, principal machine learning scientist at UnifyID, published a paper examining the problematic data collection, labelling, classification, and consequences of large image datasets. These datasets, including ImageNet and MIT's 80 Million Tiny Images, have been used to develop thousands of AI algorithms and systems. 
Birhane and Prabhu found that they contained many racist and misogynistic labels and slurs as well as offensive images. This resulted in MIT voluntarily and formally taking down the 80 Million Tiny Images dataset. More recently, Birhane has worked with Rediet Abebe, George Obaido, and Sekou Remy on researching the barriers to data sharing in Africa. They found that power imbalances are significant in the data sharing process, even when the data comes from Africa. Their research was published at the ACM Conference on Fairness, Accountability, and Transparency. In 2024, Birhane established the AI Accountability Lab research group at Trinity College Dublin. == Selected awards == 2019: NeurIPS Black in AI Workshop Best Paper Award; 2020: VentureBeat AI Innovations Award in the category Computer Vision Innovation (received with Vinay Prabhu); 2021: 100 Brilliant Women in AI Ethics Hall of Fame Honoree; 2022: Lero Director’s Prize for PhD/PostDoctoral Contribution; 2023: 100 Most Influential People in AI by TIME magazine.

  • Stefano Soatto

    Stefano Soatto is a professor of computer science at the University of California, Los Angeles (UCLA), in Los Angeles, CA, where he is also a professor of electrical engineering and founding director of the UCLA Vision Lab. He is also Vice President of applied science for Amazon Web Services' (AWS) AI division. == Academic biography == Soatto obtained his D. Eng. in electrical engineering, cum laude, from the University of Padua in 1992, was an EAP Fellow at the University of California, Berkeley in 1990–1991, and received his Ph.D. in control and dynamical systems from the California Institute of Technology in 1996 with the dissertation "A Geometric Approach to Dynamic Vision". In 1996–97 he was a postdoctoral scholar at Harvard University, and subsequently held positions as assistant and associate professor of electrical engineering and biomedical engineering at Washington University in St. Louis, and of mathematics and computer science at the University of Udine, Italy. He has been at UCLA since 2000. == Research == Soatto's research focuses on computer vision, machine learning and robotics. He co-developed optimal algorithms for structure from motion (SFM, or visual SLAM, simultaneous localization and mapping, in robotics; Best Paper Award at CVPR 1998), characterized its ambiguities (David Marr Prize at ICCV 1999), and characterized the identifiability and observability of visual-inertial sensor fusion (Best Paper Award at ICRA 2015). His research focus is the development of representations, which are functions of the data that capture their informative content and discard irrelevant variability in the data (a generalized form of 'noise' or 'clutter'). Soatto's lab was the first to demonstrate real-time SFM and augmented reality (AR) on commodity hardware, in live demos at CVPR 2000, ICCV 2001, and ECCV 2002. 
He also co-led the UCLA-Golem Team in the second DARPA Grand Challenge for autonomous vehicles, with Emilio Frazzoli (co-founder of NuTonomy), and Amnon Shashua (co-founder of Mobileye). == Recognition == Soatto was named Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2013 for contributions to dynamic visual processes. He received the David Marr Prize in Computer Vision in 1999. He was named to the 2022 class of ACM Fellows, "for contributions to the foundations and applications of visual geometry and visual representations learning".

  • Confusion network

    A confusion network (sometimes called a word confusion network or informally known as a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems. Confusion networks are simple linear directed acyclic graphs with the property that each path from the start node to the end node goes through all the other nodes. The set of words represented by edges between two nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring committal translation decisions until later stages of processing. This approach is used in the open source machine translation software Moses and the proprietary translation API in IBM Bluemix Watson.
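Because every path visits every node, a confusion network can be stored as an ordered list of confusion sets. The sketch below is a minimal illustration in Python, not the representation used by Moses or any other system: the example words, the probabilities, and the use of an empty string as a "skip" arc are all assumptions made for the demonstration.

```python
from math import prod

# One dict per pair of adjacent nodes: candidate word -> assumed posterior probability.
# The empty string stands for a hypothetical "skip" arc that emits no word.
network = [
    {"i": 0.9, "eye": 0.1},
    {"have": 0.6, "half": 0.3, "": 0.1},
    {"it": 0.7, "bit": 0.3},
]

def best_path(network):
    """Pick the most probable word in each confusion set independently."""
    words, score = [], 1.0
    for confusion_set in network:
        word, probability = max(confusion_set.items(), key=lambda kv: kv[1])
        score *= probability
        if word:  # drop skip arcs from the output
            words.append(word)
    return words, score

def count_paths(network):
    """Each path chooses one edge per confusion set, so the sizes multiply."""
    return prod(len(confusion_set) for confusion_set in network)

words, score = best_path(network)
print(" ".join(words), round(score, 3), count_paths(network))
```

The linear structure is what makes the best path a per-slot decision; a later processing stage can instead keep all alternatives and defer the choice.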

  • Top 10 AI Text-to-video Tools Compared (2026)

    Trying to pick the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to generate video from a written prompt or script, and the same workflow scales from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Xuedong Huang

    Xuedong David Huang (born October 20, 1962) is a Chinese-American computer scientist and technology executive who has made contributions to spoken language processing and artificial intelligence, including Azure AI Services. He is Zoom's chief technology officer after serving as Microsoft's Technical Fellow and Azure AI Chief Technology Officer for 30 years. Huang is a strong advocate of AI for Accessibility and AI for Cultural Heritage. == Education == Huang received his PhD from the University of Edinburgh in 1989 (sponsored by the British ORS and Edinburgh University Scholarship), his MS from Tsinghua University in 1984, and his BS from Hunan University in 1982. == Career == After receiving his PhD in 1989, Huang joined Carnegie Mellon University and worked with Raj Reddy and Kai-Fu Lee on speech recognition. At CMU, he directed the Sphinx-II speech system research, which achieved the best performance in every category of DARPA's 1992 benchmarking. Microsoft Research recruited him to found and lead Microsoft's spoken language initiatives in 1993. The book Spoken Language Processing, which he co-authored, and his historical review of speech recognition summarize several generations of spoken language research. As Microsoft's "Mr. Speech" for three decades, Huang has been instrumental in creating Microsoft's Speech Application Programming Interface (SAPI), shipping Microsoft Speech Server, and modernizing spoken language and integrative AI services via Azure AI, which not only enables millions of third-party customers but also powers Microsoft's Windows, Office, Teams, and Azure OpenAI Services. Huang helped Microsoft and Azure Cognitive Services achieve multiple industry-first human parity milestones on the following open research tasks: transcribing conversational speech, machine translation, conversational QnA, and computer vision image captioning. 
Huang has made significant contributions to the software and AI industry through his executive leadership and his scientific publications, owning more than 170 US patents and impacting billions through Azure AI enabled products and services. In 2016, Wired magazine named him one of 25 Geniuses. In 2021, Azure AI was named the winner of InfoWorld's Technology of the Year Award. Huang was awarded the Allen Newell research excellence medal in 1992 and the IEEE Speech Processing Best Paper award in 1993. He was recognized as an IEEE Fellow by the Institute of Electrical and Electronics Engineers in 2000 and named an ACM Fellow by the Association for Computing Machinery in 2017, and he is a member of the Washington State Academy of Sciences. Huang received the 2022 Asian American Corporate Leadership Award and the IEEE Amar Bose Industrial Leader Award. In 2023, he was elected a member of the US National Academy of Engineering (NAE) and a member of the American Academy of Arts and Sciences.

  • Top 10 AI Background Removers Compared (2026)

    Curious about the best AI background remover? An AI background remover is software that uses machine learning to separate the subject of an image from its background, and the best ones combine speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI background remover slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Synthesia (company)

    Synthesia Limited is a British multinational artificial intelligence company based in London, United Kingdom. It is a synthetic media-generation software developer and creator of AI-generated video content, including audio-visual agents and cloned avatars. Britain's largest generative-AI firm, it is used by 70% of FTSE 100 and over 90% of Fortune 100 companies. == Overview == Synthesia is most often used by corporations for localized communication, orientation, employee training videos, advertising campaigns, reporting, product demonstrations, customer service, and to create chatbots. Its software algorithm mimics speech and facial movements based on video recordings of an individual’s speech and facial expressions. From this, a text-to-speech video is created to look and sound like the individual. Swiss bank UBS incorporated Synthesia AI-powered avatars of their human financial experts, for instance, in 2025. Users create content via the platform's pre-generated AI presenters or by creating digital representations of themselves, or personal avatars, using the platform's AI video editing tool. These avatars can be used to narrate videos generated from text. As of August 2021, Synthesia's voice database included multiple gender options in over 60 languages. Its free voice library doubled by 2025, to 140 languages and accents, and its Express-Voice technology can clone a user's own voice, or generate a synthetic one. === Deepfakes === The platform prohibits use of its software to create non-consensual clones, including of celebrities or political figures for satirical purposes. Explicit consent must be provided in addition to a strict pre-screening regimen for use of an individual's likeness to avoid “deepfaking”. 
While the company prohibits use of its technology for misinformation or "news-like content", an October 2023 Freedom House report stated that Synthesia tools had been used by governments in Venezuela, China, Burkina Faso, and Russia to create videos of fake TV news outlets with AI-generated avatars in order to spread propaganda. Actor Dan Dewhirst signed a contract with the company in 2021, becoming one of the first actors whose likeness would be made into an AI avatar; he later found his likeness used in the Venezuelan videos. The company stated, in February 2024, that it had improved its misuse detection systems, and, in April 2024, that new users of its technology are screened by the company, and content employing it is further vetted by Synthesia moderators. == History == Synthesia's software utilizes deep learning architecture developed by Lourdes Agapito and Matthias Niessner. The company was co-founded in 2017 by Agapito, Niessner, Victor Riparbelli, and Steffen Tjerrild. In 2018, the company first demonstrated the software's capabilities on the BBC programme Click when it presented a digitization of Matthew Amroliwala speaking Spanish, Mandarin, and Hindi. Through its first two years of existence, Synthesia employed 10 people and struggled to make sales, leading to an expansion of the company's focus. It moved on from targeting only entertainment studios to a variety of businesses. In 2020, Synthesia users were reported to include Amazon, Tiffany & Co. and IHG Hotels & Resorts. In January 2024, the company introduced its AI video assistant, which turns text into video. That April, with a reported 55,000 customers, including half of the Fortune 100, Synthesia launched "expressive avatars". That September, an enhanced dubbing feature was launched, to translate video in 30 languages with naturalized lip-syncing. Peter Hill joined Synthesia as CTO in January 2025, following 25 years at Amazon, and two years as CEO and CPO of Wildfire Studios. 
That March, the company set aside a million-dollar pool of shares to give company stock, which all of its employees hold, to the human actors employed to generate digital avatars. By June of that year, 150,000 individuals from among Synthesia's 65,000 customers had created AI-generated avatars of themselves. In July 2025, the company's new global headquarters at Regent’s Place was opened by London mayor Sadiq Khan, who described Britain's largest generative-AI company, then valued at over $2 billion, as a "London success story". By that October, its technology was employed by 90% of the Fortune 100, and Synthesia 3.0 was launched, with hyper-realistic digital avatars equipped with AI-powered dubbing and translation, and a built-in video assistant. In January 2026, it reached a $4 billion valuation, with 70% of FTSE 100 companies noted among its customers. === Funding === The company raised $3.1 million in seed funding in 2019. In April 2021, the company raised $12.5 million in Series A funding. In December 2021, it raised $50 million in a Series B funding round led by Kleiner Perkins and GV (then Google Ventures). Synthesia reached a valuation of $1 billion, achieving unicorn status, when it raised $90 million from Accel and Nvidia partnership NVentures in June 2023, during its Series C funding round. Counting 60,000 customers by January 2025, including over 60% of Fortune 100 companies, the company raised $180 million in a Series D round led by NEA, with new investors World Innovation Lab (WiL), Atlassian Ventures and PSP Growth, as well as existing investors GV, MMC Ventures and FirstMark, doubling Synthesia's valuation to $2.1 billion. Capital raised by 2025 had reached $330 million, with investments slated to further product innovation, talent growth, and company expansion in North America, Europe, Japan and Australia. In April 2025, Adobe Inc. invested £10 million in the company for a strategic partnership. 
Synthesia subsequently rejected a $3 billion acquisition offer from Adobe, choosing to remain independent. With annual revenue then exceeding $100 million, Synthesia raised $200 million from GV, Nvidia and Accel in a Series E funding round led by GV in October 2025, resulting in its $4 billion valuation. The funding is intended to develop, in 2026, interactive audio-visual avatar "agents" that converse on topic, for automated sales training and corporate communications, such as recruiting. == Recognition == In 2021, Synthesia partnered with Lay's to create the Messi Messages campaign featuring Argentine footballer Lionel Messi. Users created personalized messages with Synthesia's software and sent custom artificial reality video messages from Messi based on their text input. The campaign received a Cannes Lion Award under the Bronze category. In February 2025, UK Science and Technology Minister Peter Kyle commended Synthesia's "pioneering generative AI innovations."

  • Best AI Clip Makers in 2026

    Trying to pick the best AI clip maker? An AI clip maker is software that uses machine learning to cut short clips out of longer videos, and the same workflow scales from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

  • General Regionally Annotated Corpus of Ukrainian

    General Regionally Annotated Corpus of the Ukrainian Language (GRAC, Ukrainian: Генеральний регіонально анотований корпус української мови, romanized: Heneralnyi rehionalno anotovanyi korpus ukrainskoi movy, ГРАК, Ukrainian грак for rook) is a text corpus of the Ukrainian language comprising more than 2 billion tokens, intended for linguistic research in grammar, vocabulary, and the history of the Ukrainian literary language, as well as for use in compiling dictionaries and grammars. The corpus can be used for language study and also for preparing teaching materials, textbooks, learner’s dictionaries, and exercises using examples from real texts, taking into account frequency and collocational patterns, and so on. The corpus is not a model of standard Ukrainian: it may contain words and combinations that do not match current norms of the literary language. The corpus covers the period from 1816 to 2025, and as of 29 November 2025 it contains more than 812,000 texts by about 35,000 authors. == Composition of the corpus == In the 10th version of the corpus, available for searching from 20 October 2020, 35% consists of fiction. Some fiction genres are highlighted separately: children’s literature, folklore, dramatic works, and scripts. Among non-fiction texts: journalistic writing, including newspaper collections from 1888–1893, 1905, 1913–1918, 1919–1943, modern newspapers from different regions, and texts from online news/information sites; memoirs, letters, and diaries, including a sizeable corpus of Facebook texts representing blogs by people from all regions of Ukraine and the diaspora; scholarly and educational texts: monographs, dissertations, academic articles, textbooks; large subcorpora of academic literature in history, ethnography, philosophy, and law are singled out separately; religious texts, including two Ukrainian translations of the Bible; speeches and interviews. 
Some dictionaries that include phrasal examples and phraseology have also been incorporated, including the Ukrainian dictionary by Borys Hrinchenko and the Russian-Ukrainian idiomatic dictionary by I. Vyrhan and M. Pylynska. Using the corpus tools, these dictionaries can be searched not only for words, but also for lexico-grammatical patterns within examples and phraseological expressions. About 20% of the texts in the corpus are translations. The corpus includes translations from more than 80 languages, most often from English and Russian. == Dating == Texts in the corpus are dated by the year of writing, or by the latest year in which a work could have been written; translated texts are dated by the year the translation was produced. A publication year may also be indicated, corresponding to the edition from which the text is taken. == Regional annotation == The corpus’s regional annotation is based on the modern administrative division of Ukraine. The corpus includes texts from all oblasts of Ukraine and from Crimea. A single text may belong to several regional subcorpora (if the author or translator was born, studied, or lived for a long time in different regions). In addition to regional subcorpora, there are subcorpora of works by authors of the Ukrainian diaspora (USA, Canada, Poland, Germany, the United Kingdom, France, etc.). These are mostly texts by emigrants of the 1940s and, to a lesser extent, of the period from 1917 through the 1920s. == Morphological annotation == GRAC is based on the morphological analysis system nlp_uk, developed by specialists from the r2u group. The program analyzes the text and, for each word form, determines the lemma (lexeme) and tags (grammatical features). == Research based on the corpus == Research on the Ukrainian language has been carried out using the corpus, including studies of the historical dynamics of language norms and of letter and letter-combination frequencies for font development.

  • Foma (software)

    Foma is a free and open-source finite-state toolkit created and maintained by Mans Hulden. It includes a compiler, a programming language, and a C library for constructing finite-state automata and transducers (FSTs) for various uses, most typically natural language processing tasks such as morphological analysis. Foma can replace the proprietary Xerox Finite State Toolkit for compiling and running FSTs written in the lexc and xfst formalisms. Its speed is comparable to that of the Xerox tools for most lexicons, although Foma can be 3 or 4 times slower for very large lexicons (e.g. more than 100,000 words). Foma is also one of the possible backends of the free and open-source Helsinki Finite State Toolkit (where other backends provide support for further formalisms). There are several FOSS morphologies written in lexc/xfst that are compatible with Foma, e.g. for the Sámi, Cornish, Faroese, Finnish, Komi, Mari, Udmurt, Buriat, Greenlandic, and Iñupiaq languages.

  • GEPIR

    GEPIR (Global Electronic Party Information Registry) was a distributed database, owned and operated by GS1, that contained basic information on over 1,000,000 companies in over 100 countries. The database could be searched by Global Trade Item Number (GTIN, including Universal Product Code (UPC) and EAN-13 codes), container code (Serial Shipping Container Code, SSCC), location number (Global Location Number, GLN), and, in some countries, company name. A SOAP web service existed for API access. As of the end of December 2023, GEPIR was replaced by a service called Verified by GS1. While it operated, GEPIR had more than 1 million members in more than 100 countries. In 2013, all 111 GS1 member organisations joined GEPIR. == Access == GEPIR was accessible for free in almost all countries, but the number of requests per day was limited (from 20 to 30). From October 2013, GS1 France restricted access to GEPIR to companies (registration with a SIREN code was required to use it). GS1 France had created a premium access service in January 2010 that allowed companies to use the GS1 web and SOAP interfaces without any limit. == System architecture == GEPIR was a lookup service coordinated by the GS1 Global Office (GS1 GO) that gave all end users the ability to look up information about GS1 Identification Keys. Depending on the service, systems were provided by GS1 Member Organisations (MOs), by third-party service providers, or by both. Where a GS1 MO did not choose to provide the service directly to its end users, the GS1 Global Office provided the service for that geography. Some services involved a technical component deployed by the GS1 Global Office that coordinated the systems provided by GS1 MOs and/or third-party service providers. The GEPIR service was provided by systems deployed by GS1 MOs, with the GS1 GO providing a central point of coordination to federate the local systems. 
The GS1 GO also provided the MO-level service for MOs that could not or did not wish to deploy their own system.

  • AI Paraphrasing Tools Reviews: What Actually Works in 2026

    Comparing the best AI paraphrasing tools? An AI paraphrasing tool is software that uses machine learning to rewrite text, lowering the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI paraphrasing tool slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

  • P4-metric

    The P4 metric (also known as FS or Symmetric F) enables performance evaluation of a binary classifier. It is calculated from precision, recall, specificity, and NPV (negative predictive value). The definition of the P4 metric is similar to that of the F1 metric, but it addresses criticisms leveled against the definition of F1, so it may be understood as an extension of the F1 metric. Like the other known metrics, the P4 metric is a function of TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives).

    == Justification == The key concept of the P4 metric is to leverage the four key conditional probabilities:

    $P(+\mid C{+})$: the probability that the sample is positive, provided the classifier result was positive.
    $P(C{+}\mid +)$: the probability that the classifier result will be positive, provided the sample is positive.
    $P(C{-}\mid -)$: the probability that the classifier result will be negative, provided the sample is negative.
    $P(-\mid C{-})$: the probability that the sample is negative, provided the classifier result was negative.

    The main assumption behind this metric is that all the probabilities mentioned above are close to 1 for a properly designed binary classifier. Indeed, $\mathrm{P}_4 = 1$ if, and only if, all of the probabilities above are equal to 1. Another important feature is that $\mathrm{P}_4$ tends to zero when any of the above probabilities tends to zero.

    == Definition == P4 is defined as the harmonic mean of the four key conditional probabilities:

    $$\mathrm{P}_4 = \frac{4}{\frac{1}{P(+\mid C{+})} + \frac{1}{P(C{+}\mid +)} + \frac{1}{P(C{-}\mid -)} + \frac{1}{P(-\mid C{-})}} = \frac{4}{\frac{1}{\mathit{precision}} + \frac{1}{\mathit{recall}} + \frac{1}{\mathit{specificity}} + \frac{1}{\mathit{NPV}}}.$$

    In terms of TP, TN, FP, and FN it can be calculated as follows:

    $$\mathrm{P}_4 = \frac{4\cdot \mathrm{TP}\cdot \mathrm{TN}}{4\cdot \mathrm{TP}\cdot \mathrm{TN} + (\mathrm{TP}+\mathrm{TN})\cdot(\mathrm{FP}+\mathrm{FN})}.$$

    == Evaluation of the binary classifier performance == Evaluating the performance of binary classifiers is a multidisciplinary concept. It spans the evaluation of medical tests and psychiatric tests through to machine learning classifiers from a variety of fields. Thus, many of the metrics in use exist under several names, some defined independently.

    == Properties of P4 metric == Symmetry: in contrast to the F1 metric, P4 is symmetric. It does not change its value when the dataset labeling is swapped, that is, when positives are named negatives and negatives are named positives. Range: $\mathrm{P}_4 \in [0,1]$. Achieving $\mathrm{P}_4 \approx 1$ requires all four key conditional probabilities to be close to 1. For $\mathrm{P}_4 \approx 0$ it is sufficient that one of the four key conditional probabilities is close to 0.

    == Examples, comparing with the other metrics == The selected metrics depend on the four key conditional probabilities as follows: P4 depends on all four; F1 depends only on $P(+\mid C{+})$ (precision) and $P(C{+}\mid +)$ (recall); informedness depends only on $P(C{+}\mid +)$ and $P(C{-}\mid -)$; markedness depends only on $P(+\mid C{+})$ and $P(-\mid C{-})$. Metrics that do not depend on a given probability are prone to misrepresentation when that probability approaches 0.

    === Example 1: Rare disease detection test === Let us consider a medical test used to detect a rare disease. Suppose a population size of 100000, of which 0.05% is infected. 
Further suppose the following test performance: 95% of all positive individuals are classified correctly (TPR = 0.95) and 95% of all negative individuals are classified correctly (TNR = 0.95). In such a case, due to the high population imbalance and in spite of high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low: $P(+\mid C{+}) = 0.0095$. We can observe how this low probability is reflected in some of the metrics: $\mathrm{P}_4 = 0.0370$, $\mathrm{F}_1 = 0.0188$, $\mathrm{J} = \mathbf{0.9100}$ (informedness / Youden index), $\mathrm{MK} = 0.0095$ (markedness).

    === Example 2: Image recognition: cats vs dogs === Consider the problem of training a neural-network-based image classifier with only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, the goal is to distinguish between cats and dogs. Suppose that the classifier overpredicts in favour of cats ("positive" samples): 99.99% of cats are classified correctly and only 1% of dogs are classified correctly. Further, suppose that the image dataset consists of 100000 images, 90% of which are pictures of cats and 10% of which are pictures of dogs. In this situation, the probability that a picture containing a dog will be classified correctly is very low: $P(C{-}\mid -) = 0.01$. Not all metrics reflect this low probability: $\mathrm{P}_4 = 0.0388$, $\mathrm{F}_1 = \mathbf{0.9478}$, $\mathrm{J} = 0.0099$ (informedness / Youden index), $\mathrm{MK} = \mathbf{0.8183}$ (markedness).
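
The two examples can be checked numerically with a minimal Python sketch. The `Confusion` class and the function names are illustrative and do not come from any library. The confusion-matrix counts are an assumption: the text gives only rates and population sizes, and for Example 1 the expected counts are rounded to whole individuals (TP = 48, FN = 2, FP = 4998, TN = 94952), which is what reproduces the quoted figures to four decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (illustrative helper)."""
    tp: float
    tn: float
    fp: float
    fn: float

    def swapped(self) -> "Confusion":
        """Relabel the classes: positives become negatives and vice versa."""
        return Confusion(tp=self.tn, tn=self.tp, fp=self.fn, fn=self.fp)


def precision(c: Confusion) -> float:      # P(+ | C+)
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:         # P(C+ | +)
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:    # P(C- | -)
    return c.tn / (c.tn + c.fp)


def npv(c: Confusion) -> float:            # P(- | C-)
    return c.tn / (c.tn + c.fn)


def p4(c: Confusion) -> float:
    """P4 from the closed form in TP, TN, FP, FN."""
    num = 4 * c.tp * c.tn
    return num / (num + (c.tp + c.tn) * (c.fp + c.fn))


def p4_harmonic(c: Confusion) -> float:
    """P4 as the harmonic mean of the four conditional probabilities."""
    return 4 / (1 / precision(c) + 1 / recall(c)
                + 1 / specificity(c) + 1 / npv(c))


def f1(c: Confusion) -> float:
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def informedness(c: Confusion) -> float:   # J, Youden index
    return recall(c) + specificity(c) - 1


def markedness(c: Confusion) -> float:     # MK
    return precision(c) + npv(c) - 1


# Assumed counts: Example 1 rounded to whole individuals, Example 2 exact.
rare_disease = Confusion(tp=48, tn=94952, fp=4998, fn=2)
cats_dogs = Confusion(tp=89991, tn=100, fp=9900, fn=9)

for name, c in [("rare disease", rare_disease), ("cats vs dogs", cats_dogs)]:
    print(f"{name}: P4={p4(c):.4f} F1={f1(c):.4f} "
          f"J={informedness(c):.4f} MK={markedness(c):.4f}")
# rare disease: P4=0.0370 F1=0.0188 J=0.9100 MK=0.0095
# cats vs dogs: P4=0.0388 F1=0.9478 J=0.0099 MK=0.8183
```

Calling `p4(cats_dogs.swapped())` returns the same value as `p4(cats_dogs)`, which illustrates the symmetry property, whereas `f1` changes sharply under the same relabeling. `p4_harmonic` agrees with `p4` up to floating-point error, which confirms that the two forms of the definition are equivalent.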
