AI Generator With No Limits

AI Generator With No Limits — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Protocol Builder

    Protocol Builder

    Protocol Builder is a tool in programming languages to generate code to build protocols in a fast and reliable way. Network programming for all kinds of protocols (such as TCP, UDP, and SNMP) includes converting data to be transferred to raw bytes in the sending side and parsing these bytes in the receiving side. Protocol builders facilitate this stage, usually by automatically generating the code. Protocol Programming has many components to be developed, these are: server listener, server connection, client connection, packets, and loggers. Most protocol builders implement these components automatically so developers save time and money. Currently, there are two Protocol Builders in the market, one for C++ from UpRedSun which is for TCP and UDP protocols. The second one is for .Net languages which generates the code in C# for TCP Protocols, this tool is called .Net Protocol Builder.

    Read more →
  • Language resource

    Language resource

    In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications." According to Bird & Simons (2003), this includes data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar", tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data", and advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards". In a narrower sense, language resource is specifically applied to resources that are available in digital form, and then, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management". == Typology == As of May 2020, no widely used standard typology of language resources has been established (current proposals include the LREMap, METASHARE, and, for data, the LLOD classification). Important classes of language resources include data lexical resources, e.g., machine-readable dictionaries, linguistic corpora, i.e., digital collections of natural language data, linguistic data bases such as the Cross-Linguistic Linked Data collection, tools linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion (e.g., tools for annotating interlinear glossed text such as Toolbox and FLEx, or other language documentation tools), applications for search and retrieval over such data (corpus management systems), for automated annotation (part-of-speech tagging, syntactic parsing, semantic parsing, etc.), metadata and vocabularies vocabularies, repositories of linguistic terminology and language metadata, e.g., MetaShare (for language resource metadata), the ISO 12620 data category registry (for linguistic features, data structures and annotations within a language resource), or the Glottolog database (identifiers for language varieties and bibliographical database). == Language resource publication, dissemination and creation == A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources. Selected contributions in this regard include: a series of International Conferences on Language Resources and Evaluation (LREC), the European Language Resources Association (ELRA, EU-based), and the Linguistic Data Consortium (LDC, US-based), which represent commercial hosting and dissemination platforms for language resources, the Open Languages Archives Community (OLAC), which provides and aggregates language resource metadata, the Language Resources and Evaluation Journal (LREJ), the European Language Grid is a European platform for language technologies (eg services), data and resources. As for the development of standards and best practices for language resources, these are subject of several community groups and standardization efforts, including ISO Technical Committee 37: Terminology and other language and content resources (ISO/TC 37), developing standards for all aspects of language resources, W3C Community Group Best Practices for Multilingual Linked Open Data (BPMLOD), working on best practice recommendations for publishing language resources as Linked Data or in RDF, W3C Community Group Linked Data for Language Technology (LD4LT), working on linguistic annotations on the web and language resource metadata, W3C Community Group Ontology-Lexica (OntoLex), working on lexical resources, the Open Linguistics working group of the Open Knowledge Foundation, working on conventions for publishing and linking open language resources, developing the Linguistic Linked Open Data cloud, the Text Encoding Initiative (TEI), working on XML-based specifications for language resources and digitally edited text.

    Read more →
  • Adversarial stylometry

    Adversarial stylometry

    Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop. All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches to this exist, with some overlap: imitation, substituting the author's own style for another's; translation, applying machine translation with the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style to make it not resemble the author's own. Manually obscuring style is possible, but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully-automatic, could assist an author. How best to perform the task and the design of such tools is an open research question. While some approaches have been shown to be able to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses is an issue. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools. It is uncertain if the practice of adversarial stylometry is detectable in itself. Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them. == History == Rao & Rohatgi (2000), an early work in adversarial stylometry, identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges. Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts. Brennan & Greenstadt (2009) introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods; other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus. == Motivations == Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but pseudonymous authors who have not practiced adversarial stylometry in producing corpuses of thousands of words may be vulnerable. Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but only accurately matched the blog and author a fifth of the time; identification improved with the number of posts written by the author in the corpus. Even if an author is not identified, some of their characteristics may still be deduced stylometrically, or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification. Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates. Modern machine learning techniques offer powerful tools for identification; further development of corpora and computational stylometric techniques are likely to raise further privacy issues. Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry—that authors have invariant, content-independent 'style fingerprints'—is uncertain, but "the deanonymisation attack is a real privacy concern". Those interested in practicing adversarial stylometry and stylistic deception include whistleblowers avoiding retribution; journalists and activists; perpetrators of frauds and hoaxes; authors of fake reviews; literary forgers; criminals disguising their identity from investigators; and, generally, anyone with a desire for anonymity or pseudonymity. Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias). Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored. == Methods == With imitation, the author attempts to mislead stylometry by matching their style to another author's. An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry. Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from. Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can lead to texts being significantly altered, even to the point of incomprehensibility; improved translation tools reduce this risk. More simply-structured texts can be easier to machine translate without losing the original meaning. Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output. With low-quality translation tools, an author can be required to manually correct major translation errors while avoiding the hazard of re-introducing stylistic characteristics. Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, but more common with several intermediate translations—however, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal. Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed. Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be performed at the time of writing by conscious modification, or as part of a revision process with feedback from the metric being targeted as an input to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning, and likewise genres with more permissible variation allow more obfuscation. However, longer texts are harder to thoroughly obfuscate. Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style. With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles. Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or to format a document in a characteristic way. Stylometric signals vary in how simply they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation. Automated obfuscation may require large amounts of training data written by the author. Concerning automated implementations of adversarial stylometry, two possible implementations are rule-based systems for paraphrasing; and encoder–decoder architectures, where the text passes through an intermediate format that is (intended to be) style-neutral. Another division in automated methods is whether there is feedback from an identification system or not. With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem, exploring textual variants until the result is stylistically sufficiently far (in the case of obfuscation) or near (in the case of imitation), which then constitutes an adversarial example for that identification system. == Evaluation == How

    Read more →
  • NetOwl

    NetOwl

    NetOwl is a suite of multilingual text and identity analytics products that analyze big data in the form of text data – reports, web, social media, etc. – as well as structured entity data about people, organizations, places, and things. NetOwl utilizes artificial intelligence (AI)-based approaches, including natural language processing (NLP), machine learning (ML), and computational linguistics, to extract entities, relationships, and events; to perform sentiment analysis; to assign latitude/longitude to geographical references in text; to translate names written in foreign languages; and to perform name matching and identity resolution. NetOwl's uses include semantic search and discovery, geospatial analysis, intelligence analysis, content enrichment, compliance monitoring, cyber threat monitoring, risk management, and bioinformatics. == History == The first NetOwl product was NetOwl Extractor, which was initially released in 1996. Since then, Extractor has added many new capabilities, including relationship and event extraction, categorization, name translation, geotagging, and sentiment analysis, as well as entity extraction in other languages. Other products were added later to the NetOwl suite, namely TextMiner, NameMatcher, and EntityMatcher. NetOwl has participated in several 3rd party-sponsored text and entity analytics software benchmarking events. NetOwl Extractor was the top-scoring named entity extraction system at the DARPA-sponsored Message Understanding Conference MUC-6 and the top-scoring link and event extraction system in MUC-7. It was also the top-scoring system at several of the NIST-sponsored Automatic Content Extraction (ACE) evaluation tasks. NetOwl NameMatcher was the top-scoring system at the MITRE Challenge for Multicultural Person Name Matching. == Products == The NetOwl suite includes, among others, the following text and entity analytics products: === Text Analytics === NetOwl Extractor performs entity extraction from unstructured texts using natural language processing (NLP), machine learning (ML), and computational linguistics. Extractor also performs semantic relationship and event extraction as well as geotagging of text. It is used for a variety of data sources including both traditional sources (e.g., news, reports, web pages, email) and social media (e.g., Twitter, Facebook, chats, blogs). It runs on a variety of Big Data analytics platforms, including Apache Hadoop and LexisNexis’s High-Performance Computer Cluster (HPCC) technology. It has been integrated with a number of 3rd party analytical tools such as Esri ArcGIS and Google Earth/Maps. === Identity Analytics === NetOwl NameMatcher and EntityMatcher perform name matching and identity resolution for large multicultural and multilingual entity databases using machine learning (ML) and computational linguistics approaches. They are used for applications such as anti–money laundering (AML), watch lists, regulatory compliance, fraud detection, etc.

    Read more →
  • AlternativeTo

    AlternativeTo

    AlternativeTo is a website which lists alternatives to web-based software, desktop computer software, and mobile apps, and sorts the alternatives by various criteria, including the number of registered users who have "Liked" each of them on AlternativeTo. Users can search the site to find better alternatives to an application they are using or previously have used, including free alternatives such as free web applications (cloud computing) which don't require any installation and can be accessed from any browser. == Differences == Unlike a number of other software directory websites, the software is not arranged into categories, but each individual piece of software has its own list of alternatives. However, users can also search by tag to find software, which offers an alternative way of finding related software. Users can also narrow their search by focusing on particular platforms and license types (such as "free for non-commercial use"). AlternativeTo lists basic information such as platform and license type at the top of each entry, but does not have comparison tables listing software features side by side. AlternativeTo does not host software for download but it provides links to official websites to where you can download or buy them. AlternativeTo allows anyone to register and suggest new alternatives, or to update the information held about existing entries. Suggestions and alterations are reviewed before being made publicly visible. Users can register using either email and password or OpenID. Login with Facebook has been discontinued. As AlternativeTo is itself a web application, it even has a page for alternatives to itself. == Features == Tweets from anyone mentioning particular pieces of software are also pulled in dynamically from Twitter. Each application has an RSS feed for notifying users of newly listed alternatives to that application. After a user has clicked the Like button next to an application, they are offered the opportunity to tell their friends on Facebook or their followers on Twitter that they liked it. The site also has a forum. For software developers, a JSON API used to be available, but has been taken down indefinitely.

    Read more →
  • Conversational user interface

    Conversational user interface

    A conversational user interface (CUI) is a user interface for computers that emulates a conversation with a human. Historically, computers have relied on text-based user interfaces and graphical user interfaces (GUIs) (such as the user pressing a "back" button) to translate the user's desired action into commands the computer understands. While an effective mechanism of completing computing actions, there is a learning curve for the user associated with GUI. Instead, CUIs provide opportunity for the user to communicate with the computer in their natural language rather than in a syntax specific commands.

    Read more →
  • Language identification

    Language identification

    In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

    Read more →
  • Microapp

    Microapp

    A microapp is a super-specialized application designed to perform one task or use case with the only objective of doing it well. They follow the single responsibility principle, which states that "a class should have one and only one reason to change." Micro applications help developers create less complex applications while reducing costs by breaking down monolithic systems into groups of independent services acting as one system. A good example of Microapps would be https://docs.citrix.com/en-us/legacy-archive/downloads/microapps.pdfthat provide single purpose action from Salesforce and over 40 applications on its workspace. == Requirements and characteristics == Microapps usually are accessible on any device, display, or operating system without installation on the viewer's device. To qualify as a microapp, the entity must: be built and deployed as an independent software module bring together various media types into a single experience have advanced security and compliance features be functionally-extensible comply with granular data demands be agnostic single use case oriented Microapps differentiate from traditional web or mobile applications by how the end-user interacts with them. Consequently, they can be embedded in websites or viewed online to bypass app stores and are typically built to provide a focused experience to the user. == Usage == Microapps are typically used for commercial purposes to reduce development costs for projects not requiring the large scope of a traditional web or mobile application. In addition, they are often used to showcase in-depth information or enrich marketing material with interactivity. Lately, micro apps are being used to boost productivity by providing quick tools to people to reuse best practices. Users have been interacting with microapps for a while with suites like Microsoft 365 and Google Workspace, where each one of their end-user services could be considered as a microapp. All these microapps share a unique identity manager to provide a unified user experience. == Benefits == Replacing monolith systems with microapps provide several advantages like: Reduce complexity for developers and users. Smaller, more cohesive, and maintainable codebases Scalable organizations with decoupled, autonomous teams Allows for hyper-specialization Independent deployment Multi-stack == Cloud-native microapps == Technologies like Kubernetes, or OpenShift, allow companies to replace their monolith and legacy systems with modular software taking advantage of microapps on reducing costs and improve reliability and security. == Microapps vs. microservices == There is a widespread misunderstanding between these two concepts, which is the key difference. Microservices is an architectural style that is systems-centric, meaning it decouples the presentation and data layer using web services APIs. On the other side, micro apps behave more as a super-architecture style (that embraces microservices among other types), and it is user-centric, meaning they decouple the whole monolith system onto modules that are designed to interact with final users. Both architectural styles rely on modularity to provide high performance, scalability, and resilience. == Considerations == Developing Micro apps requires a different approach than traditional software, and user experience is crucial. The following considerations are essential for switching to microapps. To run multiple microapps is required a single identity management system. Microservices are well suited to make microapps more powerful Apps with different levels of maturity might create a non-unified user experience. Duplication of dependencies can create security issues and inefficiencies. Suitable for well-organized teams

    Read more →
  • Inverse depth parametrization

    Inverse depth parametrization

    In computer vision, the inverse depth parametrization is a parametrization used in methods for 3D reconstruction from multiple images such as simultaneous localization and mapping (SLAM). Given a point p {\displaystyle \mathbf {p} } in 3D space observed by a monocular pinhole camera from multiple views, the inverse depth parametrization of the point's position is a 6D vector that encodes the optical centre of the camera c 0 {\displaystyle \mathbf {c} _{0}} when in first observed the point, and the position of the point along the ray passing through p {\displaystyle \mathbf {p} } and c 0 {\displaystyle \mathbf {c} _{0}} . Inverse depth parametrization generally improves numerical stability and allows to represent points with zero parallax. Moreover, the error associated to the observation of the point's position can be modelled with a Gaussian distribution when expressed in inverse depth. This is an important property required to apply methods, such as Kalman filters, that assume normality of the measurement error distribution. The major drawback is the larger memory consumption, since the dimensionality of the point's representation is doubled. == Definition == Given 3D point p = ( x , y , z ) {\displaystyle \mathbf {p} =(x,y,z)} with world coordinates in a reference frame ( e 1 , e 2 , e 3 ) {\displaystyle (e_{1},e_{2},e_{3})} , observed from different views, the inverse depth parametrization y {\displaystyle \mathbf {y} } of p {\displaystyle \mathbf {p} } is given by: y = ( x 0 , y 0 , z 0 , θ , ϕ , ρ ) {\displaystyle \mathbf {y} =(x_{0},y_{0},z_{0},\theta ,\phi ,\rho )} where the first five components encode the camera pose in the first observation of the point, being c 0 = ( x 0 , y 0 , z 0 ) {\displaystyle \mathbf {c_{0}} =(x_{0},y_{0},z_{0})} the optical centre, ϕ {\displaystyle \phi } the azimuth, θ {\displaystyle \theta } the elevation angle, and ρ = 1 ‖ p − c 0 ‖ {\displaystyle \rho ={\frac {1}{\left\Vert \mathbf {p} -\mathbf {c} _{0}\right\Vert }}} the inverse depth of p {\displaystyle p} at the first observation.

    Read more →
  • Scene text

    Scene text

    Scene text is text that appears in an image captured by a camera in an outdoor environment. The detection and recognition of scene text from camera captured images are computer vision tasks which became important after smart phones with good cameras became ubiquitous. The text in scene images varies in shape, font, colour and position. The recognition of scene text is further complicated sometimes by non-uniform illumination and focus. To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) conducts a robust reading competition once in two years. The competition was held in 2003, 2005 and during every ICDAR conference. International association for pattern recognition (IAPR) has created a list of datasets as Reading systems. == Text detection == Text detection is the process of detecting the text present in the image, followed by surrounding it with a rectangular bounding box. Text detection can be carried out using image based techniques or frequency based techniques. In image based techniques, an image is segmented into multiple segments. Each segment is a connected component of pixels with similar characteristics. The statistical features of connected components are utilised to group them and form the text. Machine learning approaches such as support vector machine and convolutional neural networks are used to classify the components into text and non-text. In frequency based techniques, discrete Fourier transform (DFT) or discrete wavelet transform (DWT) are used to extract the high frequency coefficients. It is assumed that the text present in an image has high frequency components and selecting only the high frequency coefficients filters the text from the non-text regions in an image. == Word recognition == In word recognition, the text is assumed to be already detected and located and the rectangular bounding box containing the text is available. The word present in the bounding box needs to be recognized. The methods available to perform word recognition can be broadly classified into top-down and bottom-up approaches. In the top-down approaches, a set of words from a dictionary is used to identify which word suits the given image. Images are not segmented in most of these methods. Hence, the top-down approach is sometimes referred as segmentation free recognition. In the bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine. Either an off the shelf Optical character recognition (OCR) engine or a custom-trained one is used to recognise the text.

    Read more →
  • Automatic summarization

    Automatic summarization

    Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence (AI) algorithms are commonly developed and employed to achieve this, specialized for different types of data. Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document. On the other hand, visual content can be summarized using computer vision algorithms. Image summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection. Video summarization algorithms identify and extract from the original video content the most important frames (key-frames), and/or the most important video segments (key-shots), normally in a temporally ordered fashion. Video summaries simply retain a carefully selected subset of the original video frames and, therefore, are not identical to the output of video synopsis algorithms, where new video frames are being synthesized based on the original video content. == Commercial products == In 2022 Google Docs released an automatic summarization feature. == Approaches == There are two general approaches to automatic summarization: extraction and abstraction. === Extraction-based summarization === Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key-phrases that can be used to "tag" or index a text document, or key sentences (including headings) that collectively comprise an abstract, and representative images or video segments, as stated above. For text, extraction is analogous to the process of skimming, where the summary (if available), headings and subheadings, figures, the first and last paragraphs of a section, and optionally the first and last sentences in a paragraph are read before one chooses to read the entire document in detail. Other examples of extraction that include key sequences of text in terms of clinical relevance (including patient/problem, intervention, and outcome). === Abstractive-based summarization === Abstractive summarization methods generate new text that did not exist in the original text. This has been applied mainly for text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by paraphrasing sections of the source document, to condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction, involving both natural language processing and often a deep understanding of the domain of the original text in cases where the original document relates to a special field of knowledge. "Paraphrasing" is even more difficult to apply to images and videos, which is why most summarization systems are extractive. === Aided summarization === Approaches aimed at higher summarization quality rely on combined software and human effort. In Machine Aided Human Summarization, extractive techniques highlight candidate passages for inclusion (to which the human adds or removes text). In Human Aided Machine Summarization, a human post-processes software output, in the same way that one edits the output of automatic translation by Google Translate. == Applications and systems for summarization == There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on. The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.). The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs. An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a cluster of articles on the same topic). This problem is called multi-document summarization. A related application is summarizing news articles. Imagine a system, which automatically pulls together news articles on a given topic (from the web), and concisely represents the latest news as a summary. Image collection summarization is another application example of automatic summarization. It consists in selecting a representative set of images from a larger set of images. A summary in this context is useful to show the most representative images of results in an image collection exploration system. Video summarization is a related domain, where the system automatically creates a trailer of a long video. This also has applications in consumer or personal videos, where one might want to skip the boring or repetitive actions. Similarly, in surveillance videos, one would want to extract important and suspicious activity, while ignoring all the boring and redundant frames captured. At a very high level, summarization algorithms try to find subsets of objects (like set of sentences, or a set of images), which cover information of the entire set. This is also called the core-set. These algorithms model notions like diversity, coverage, information and representativeness of the summary. Query based summarization techniques, additionally model for relevance of the summary with the query. Some techniques and algorithms which naturally model summarization problems are TextRank and PageRank, Submodular set function, Determinantal point process, maximal marginal relevance (MMR) etc. === Keyphrase extraction === The task is the following. You are given a piece of text, such as a journal article, and you must produce a list of keywords or key[phrase]s that capture the primary topics discussed in the text. In the case of research articles, many authors provide manually assigned keywords, but most text lacks pre-existing keyphrases. For example, news articles rarely have keyphrases attached, but it would be useful to be able to automatically do so for a number of applications discussed below. Consider the example text from a news article: "The Army Corps of Engineers, rushing to meet President Bush's promise to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps last year despite warnings from its own expert that the equipment would fail during a storm, according to documents obtained by The Associated Press". A keyphrase extractor might select "Army Corps of Engineers", "President Bush", "New Orleans", and "defective flood-control pumps" as keyphrases. These are pulled directly from the text. In contrast, an abstractive keyphrase system would somehow internalize the content and generate keyphrases that do not appear in the text, but more closely resemble what a human might produce, such as "political negligence" or "inadequate protection from floods". Abstraction requires a deep understanding of the text, which makes it difficult for a computer system. Keyphrases have many applications. They can enable document browsing by providing a short summary, improve information retrieval (if documents have keyphrases assigned, a user could search by keyphrase to produce more reliable hits than a full-text search), and be employed in generating index entries for a large text corpus. Depending on the different literature and the definition of key terms, words or phrases, keyword extraction is a highly related theme. ==== Supervised learning approaches ==== Beginning with the work of Turney, many researchers have approached keyphrase extraction as a supervised machine learning problem. Given a document, we construct an example for each unigram, bigram, and trigram found in the text (though other text units are also possible, as discussed below). We then compute various features describing each example (e.g., does the phrase begin with an upper-case letter?). We assume there are known keyphrases available for a set of training documents. Using the known keyphrases, we can assign positive or negative labels to the examples. Then we learn a classifier that can discriminate between positive and negative examples as a function of the features. Some classifiers make a binary classification for a test example, while others assign a probability of being a keyphrase. For ins

    Read more →
  • Ernie Bot

    Ernie Bot

    Ernie Bot (Chinese: 文心一言, Pinyin: wénxīn yīyán), full name Enhanced Representation through Knowledge Integration, is an artificial intelligence chatbot developed by the Chinese technology company Baidu. Ernie Bot rivals GPT models in Chinese NLP tasks. It is built on the company's ERNIE series of large language models, which have been in development since 2019. The service was first launched for invited testing on March 16, 2023, and was released to the general public on August 31, 2023, after receiving approval from Chinese regulators. Since its public launch, Ernie Bot has undergone several updates, with newer versions like ERNIE 4.0 and 4.5 released to improve its capabilities. The service has seen rapid user adoption, reportedly reaching over 200 million users by April 2024. It has been integrated into various products, notably powering AI features for the Chinese release of Samsung's Galaxy S24 smartphones. As a product operating in China, Ernie Bot is subject to the country's censorship regulations. It has been observed to refuse answers to politically sensitive questions, such as those regarding CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, and other topics deemed taboo by the government. == History == Ernie Bot was initially released for invited testing on March 16, 2023. The live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent on the day of the launch. The company's stock gained 14 percent the following day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it positive preliminary reviews. On August 31, 2023, Ernie Bot was released to the public after receiving approval from Chinese regulatory authorities. By December 2023, Baidu announced the service had surpassed 100 million users. In January 2024, Hong Kong newspaper South China Morning Post reported that a university research lab linked to the People's Liberation Army (PLA) had tested Ernie Bot for military response scenarios. Baidu denied the allegations, stating it had no connection with the academic paper. That same month, Ernie was integrated into Samsung's Galaxy S24 lineup for its launch in China. The user base reportedly grew to 200 million by April 2024 and 300 million by June 2024. In September 2024, Baidu changed the chatbot's Chinese name from "Wenxin Yiyan" (文心一言) to "Wenxiaoyan" (文小言) to position it as a search assistant. On March 16, 2025, Baidu announced version 4.5 and the reasoning model ERNIE X1. The following month, at the Create2025 Baidu AI Developer Conference, the company released the Wenxin 4.5 Turbo and Wenxin X1 Turbo models, designed to be faster and less expensive to operate. == Development == Ernie Bot is based on Baidu's ERNIE (Enhanced Representation through Knowledge Integration) series of foundation models. The general training process begins with pre-training on large datasets, followed by refinement using techniques like supervised fine-tuning, reinforcement learning with human feedback, and prompt engineering. === Foundation models === ==== Ernie 3.0 ==== The model powering the initial launch of Ernie Bot. It was trained with 10 billion parameters on a 4-terabyte corpus consisting of plain text and a large-scale knowledge graph. ==== Ernie 3.5 ==== Released in June 2023. At the time of release, its performance was reported as "slightly inferior" to OpenAI's GPT-4. ==== Ernie 4.0 ==== Unveiled in October 2023 and released to paying subscribers in November. According to Baidu, this version featured improved performance over its predecessor, with information updated to April 2023. ==== Ernie X1 ==== Announced in March 2025, with Ernie X1 positioned as a specialized reasoning model. Baidu stated that performance improvements were achieved through new technologies such as "FlashMask" dynamic attention masking and a heterogeneous multimodal mixture-of-experts architecture. === Turbo Models === In June 2024, Baidu announced Ernie 4.0 Turbo. In April 2025, Ernie 4.5 Turbo and X1 Turbo were released. These models are optimized for faster response times and lower operational costs. == Service == In its subscription options, the professional plan gives users access to Ernie 4.0 with a payment either for a month or with reduced payment for auto-renewal per month. Meanwhile, Ernie 3.5 is free of charge. Ernie 4.0, the language model for Ernie bot, has information updated to April 2023. == Censorship == Ernie Bot is subject to the Chinese government's censorship regime. In public tests with journalists, Ernie Bot refused to answer questions about CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, the persecution of Uyghurs in China in Xinjiang, and the 2019–2020 Hong Kong protests. When queried about the origin of SARS-CoV-2, Ernie Bot stated that it originated among American vape users.

    Read more →
  • Tessellation (computer graphics)

    Tessellation (computer graphics)

    In computer graphics, tessellation is the dividing of datasets of polygons (sometimes called vertex sets) presenting objects in a scene into suitable structures for rendering. Especially for real-time rendering, data is tessellated into triangles, for example in OpenGL 4.0 and Direct3D 11. == In graphics rendering == A key advantage of tessellation for realtime graphics is that it allows detail to be dynamically added and subtracted from a 3D polygon mesh and its silhouette edges based on control parameters (often camera distance). In previously leading realtime techniques such as parallax mapping and bump mapping, surface details could be simulated at the pixel level, but silhouette edge detail was fundamentally limited by the quality of the original dataset. In Direct3D 11 pipeline (a part of DirectX 11), the graphics primitive is the patch. The tessellator generates a triangle-based tessellation of the patch according to tessellation parameters such as the TessFactor, which controls the degree of fineness of the mesh. The tessellation, along with shaders such as a Phong shader, allows for producing smoother surfaces than would be generated by the original mesh. By offloading the tessellation process onto the GPU hardware, smoothing can be performed in real time. Tessellation can also be used for implementing subdivision surfaces, level of detail scaling and fine displacement mapping. OpenGL 4.0 uses a similar pipeline, where tessellation into triangles is controlled by the Tessellation Control Shader and a set of four tessellation parameters. == In computer-aided design == In computer-aided design the constructed design is represented by a boundary representation topological model, where analytical 3D surfaces and curves, limited to faces, edges, and vertices, constitute a continuous boundary of a 3D body. Arbitrary 3D bodies are often too complicated to analyze directly. So they are approximated (tessellated) with a mesh of small, easy-to-analyze pieces of 3D volume—usually either irregular tetrahedra, or irregular hexahedra. The mesh is used for finite element analysis. The mesh of a surface is usually generated per individual faces and edges (approximated to polylines) so that original limit vertices are included into mesh. To ensure that approximation of the original surface suits the needs of further processing, three basic parameters are usually defined for the surface mesh generator: The maximum allowed distance between the planar approximation polygon and the surface (known as "sag"). This parameter ensures that mesh is similar enough to the original analytical surface (or the polyline is similar to the original curve). The maximum allowed size of the approximation polygon (for triangulations it can be maximum allowed length of triangle sides). This parameter ensures enough detail for further analysis. The maximum allowed angle between two adjacent approximation polygons (on the same face). This parameter ensures that even very small humps or hollows that can have significant effect to analysis will not disappear in mesh. An algorithm generating a mesh is typically controlled by the above three and other parameters. Some types of computer analysis of a constructed design require an adaptive mesh refinement, which is a mesh made finer (using stronger parameters) in regions where the analysis needs more detail.

    Read more →
  • Integreat

    Integreat

    Integreat (former project name: Refguide+) is an open source mobile app that provides local information and services tailored to refugees and migrants coming to Germany. The content is maintained by local organizations, such as local governments or integration officers, and made available in locally relevant languages. It was developed by Tür an Tür - Digitalfabrik gGmbH (formerly Tür an Tür - Digital Factory gGmbH) in Augsburg together with a team of researchers and students from the Technical University of Munich. == History == In 1997, the Augsburg association "Tür an Tür", which has been working for refugees since 1992, published the brochure "First Steps", which answers local everyday questions. Since addresses and contact persons change quickly, some information is already outdated after a few weeks. Students of business informatics at the Technical University of Munich therefore developed the app Integreat within eight months together with the association and the social department of the city of Augsburg. The app was then also used by other cities and districts within months. As of February 3, 2022, information is available at 72 locations, including Munich, Dortmund, Nuremberg and Augsburg. == Mode of action == Refugees need information on areas such as registration, contact persons, health care, education, family, work and everyday life. Integreat seeks to provide refugees with this information by allowing them to select their geographic location and receive locally relevant information. This information is available offline once the app is opened so it can be used without an internet connection. In addition, the content is translated into the native languages of refugees and migrants to facilitate access. The content is licensed with a CC BY 4.0 license to facilitate collaboration and translation between content creators and dissemination of the content. Integreat is now being used for a broader migrant audience and says it can also support professionals, volunteers, and counseling centers. == Comparable mobile apps == Other mobile apps that are likewise intended to provide initial orientation for refugees include the app Ankommen, a joint project of the Federal Office for Migration and Refugees, the Goethe-Institut, the Federal Employment Agency and the Bavarian Broadcasting Corporation, which is intended as a companion for the first few weeks in Germany, and the Welcome App, a company-sponsored non-profit initiative for information about Germany and asylum procedures with a regional focus, and a book by the Konrad Adenauer Foundation (KAS) and Verlag Herder with a corresponding app Deutschland - Erste Informationen für Flüchtlinge (Germany - First Information for Refugees) as a companion for Arabic-speaking refugees in Germany.

    Read more →
  • Elastix (image registration)

    Elastix (image registration)

    Elastix is an image registration toolbox built upon the Insight Segmentation and Registration Toolkit (ITK). It is entirely open-source and provides a wide range of algorithms employed in image registration problems. Its components are designed to be modular to ease a fast and reliable creation of various registration pipelines tailored for case-specific applications. It was first developed by Stefan Klein and Marius Staring under the supervision of Josien P.W. Pluim at Image Sciences Institute (ISI). Its first version was command-line based, allowing the final user to employ scripts to automatically process big data-sets and deploy multiple registration pipelines with few lines of code. Nowadays, to further widen its audience, a version called SimpleElastix is also available, developed by Kasper Marstal, which allows the integration of elastix with high level languages, such as Python, Java, and R. == Image registration fundamentals == Image registration is a well-known technique in digital image processing that searches for the geometric transformation that, applied to a moving image, obtains a one-to-one map with a target image. Generally, the images acquired from different sensors (multimodal), time instants (multitemporal), and points of view (multiview) should be correctly aligned to proceed with further processing and feature extraction. Even though there are a plethora of different approaches to image registration, the majority is composed of the same macro building blocks, namely the transformation, the interpolator, the metric, and the optimizer. Registering two or more images can be framed as an optimization problem that requires multiple iterations to converge to the best solution. Starting from an initial transformation computed from the image moments the optimization process searches for the best transformation parameters based on the value of the selected similarity metric. The figure on the right shows the high-level representation of the registration of two images, where the reference remains constant during the entire process, while the moving one will be transformed according to the transformation parameters. In other words, the registration ends when the similarity metric, which is a mathematical function with a certain number of parameters to be optimized, reaches the optimal value which is highly dependent on the specific application. == Main building blocks == Following the structure of the image registration workflow, the elastix toolbox proposes a modular solution that implements for each of the building blocks different algorithms, highly employed in medical image registration, and helps the final users to build their specific pipeline by selecting the most suitable algorithm for each of the main building blocks. Each block is easily configurable both by selecting pre-defined initialization values or by trying multiple sets of parameters and then choosing the most performing one. The registration is performed on images, and the elastix toolbox supports all the data formats supported by ITK, ranging from JPEG and PNG to medical standard formats such as DICOM and NIFTI. It also stores physical pixel spacing, the origin and the relative position to an external world reference system, when provided in the metadata, to facilitate the registration process, especially in medical field applications. === Transformation === The transformation is an essential building block, since it defines the allowable transformations. In image registration, the main distinction can be done between parallel-to-parallel and parallel-to-non parallel (deformable) line mapping transformations. In the elastix toolbox, the final users can select one transformation or compose more transformations either through addition or via composition. Below are reported the different transformation models in order of increasing flexibility, along with the corresponding elastix class names between brackets. Translation (TranslationTransform) allows only translations Rigid (EulerTransform) expands the translation adding rotations and the object is seen as a rigid body Similarity (SimilarityTransform) expands the rigid transformation by introducing isotropic scaling Affine (AffineTransform) expands the rigid transformation allowing both scaling and shear B-splines (BSplineTransform) is a deformable transformation usually preceded by a rigid or affine one Thin-plate splines (SplineKernelTransform) is a deformable transformation belonging to the class of kernel-based transformations that is a composition of and affine and a non-rigid part === Metric === The similarity metric is the mathematical function whose parameters should be optimized to reach the desired registration, and, during the process, it is computed multiple times. Below are reported the available metrics computed employing the reference and the transformed images and the corresponding elastix class names between brackets. Mean squared difference (AdvancedMeanSquares) to be used for mono-modal applications Normalized correlation coefficient (AdvancedNormalizedCorrelation) to be used for images that have an intensity linear relationship Mutual information (AdvancedMattesMutualInformation) to be used for both mono- and multi-modal applications and optimized to reach better performance compared to the normalized version Normalized mutual information (NormalizedMutualInformation) for both mono- and multi-modal applications Kappa statistic (AdvancedKappaStatistic) to be used only for binary images === Sampler === For the computation of the similarity metrics, it is not always necessary to consider all the voxels and, sometimes, it can be useful to use only a fraction of the voxels of the images, i.e. to reduce the execution time for big input images. Below are reported the available criteria for selecting a fraction of the voxels for the similarity metric computation and the corresponding elastix class names between brackets. Full (Full) to employ all the voxels Grid (Grid) to employ a regular grid defined by the user to downsample the image Random (Random) to randomly select a percentage of voxels defined by the users (all voxels have equal probability to be selected) Random coordinate (RandomCoordinate) like the random criterion, but in this case also off-grid positions can be selected to simplify the optimization process === Interpolator === After the application of the transformation, it may occur that the voxels used for the similarity metric computation are at non-voxel positions, so intensity interpolation should be performed to ensure the correctness of the computed values. Below are reported the implemented interpolators and the corresponding elastix class names between brackets. Nearest neighbor (NearestNeighborInterpolator) exploits little resources, but gives low quality results Linear (LinearInterpolator) is sufficient in general applications N-th order B-spline (BSplineInterpolator) can be used to increase the order N, increasing quality and computation time. N=0 and N=1 indicate the nearest neighbor and linear cases respectively. === Optimizer === The optimizer defines the strategy employed for searching the best transformation parameter to reach the correct registration, and it is commonly an iterative strategy. Below are reported some of the implemented optimization strategies. Gradient descent Robbins-Monro, similar to the gradient descent, but employing an approximation of the cost function derivatives A wider range of optimizers is also available, such as Quasi-Newton or evolutionary strategies. === Other features === The elastix software also offers other features that can be employed to speed up the registration procedure and to provide more advanced algorithms to the end-users. Some examples are the introduction of blur and Gaussian pyramid to reduce data complexity, and multi-image and multi-metric framework to deal with more complex applications. == Applications == Elastix has applications mainly in the medical field, where image registration is fundamental to get comprehensive information regarding the analysed anatomical region. It is widely employed in image-guided surgery, tumour monitoring, and treatment assessment. For example, in radiotherapy planning, image registration allows to correctly deliver the treatment and evaluate the obtained results. Thanks to the wide range of implemented algorithms, the use of the elastix software allows physicians and researchers to test different registration pipelines from the simplest to more complex ones, and to save the best one as a configuration file. This file and the fact that the software is completely open-source makes it easy to reproduce the work, that can help supporting the open science paradigm, and allows fast reuse on different patients data. In image-guided surgery, registration time and accuracy are critical points, considering that, during the registration, the patient is on the operating table, and the imag

    Read more →