AI Assistant Jetbrains Plugin

AI Assistant Jetbrains Plugin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Patch management

    Patch management

    Patch management (or patch management policy or patch policy or patch management process) is concerned with the identification, acquisition, distribution, testing and installation of patches to systems. Proper patch management can be a net productivity boost for an organization. Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them. Problems can arise during patch management, including buggy patches that either fail to fix their problem or introduce new issues. Patch management tools help orchestrate all of the procedures involved in patch management. == Description == Patch management is defined as a sub-practice of various disciplines including vulnerability management (part of security management), lifecycle management (with further possible sub-classification into application lifecycle management and release management), change management, and systems management. The practice is broadly concerned with the identification, acquisition, distribution, and installation of patches to systems. Some definitions of patch management are as a software-level practice, while others are as a systems-level process: software, drivers, and firmware. == Cost–benefit analysis == While reserving time for patching takes up enterprise resources, there are balancing factors which can make proper patch management into a net productivity boost for an organization. Up-to-date systems often perform more efficiently, less costly, with less errors, less security risks, and better user workflow. Additionally, compliance with changing local and federal regulations are more likely to be satisfied. Patching security vulnerabilities has been one among many competing priorities for organizations, leading to longer periods before patching for some organizations. Equifax was too slow to implement its 2015 patch management plan to be able to mitigate or prevent the 2017 Equifax data breach, leading to scrutiny from regulators. == Relation to security management == Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them; therefore, patch management can be considered a sub-discipline of vulnerability management. Every patchable device in a system presents an attack surface that must be secured. === Time plan === Automatic updates are where the patch is applied automatically with little to know actions or planning required. This approach is recommended for many individuals and organizations. Some organizations also have to prioritize which patches to prioritize given limited resources. Patch Tuesday is the most common process when major companies like Microsoft and Adobe release patches on a known date so that companies can plan resources around implementing the patches more quickly. Linux is open-sourced and patches can be released at any time, leading some to rely on mailing lists or other ways to be alerted to updates. === Inventory === Taking an inventory of software and hardware, including versions can make it easier to correlate with bugs or patches as they become known. Taking stock of how much education and support others in an organization need to install their patches can also help for planning how to implement the patch or design systems to begin with. Streamlining the process by using tools that can communicate with each other can also help to reduce the time of exposure to known vulnerabilities. == Challenges == There are a multitude of problems that can arise during patch management. A common issue is buggy patches, which either fail to fix their problem or introduce new issues. Another issue is deployment synchronization, since various subsystems may receive instructions to update at different times. Similarly, the difficulty of patch management across many devices may grow at an uncontrollable rate depending on organizational size. One prominent demonstration of the challenges facing proper patch management was the buggy Falcon Sensor patch by CrowdStrike which caused one of the worst IT outages of all time. == Implementations == A patch management tool (alternatively patch manager, patch management system, patch management software, or centralized patch management) help orchestrate all of the procedures involved in patch management. Tools can be in-house (applied locally by local administrators), or external, as with managed service providers (applied externally by a provider). === Patch management software === Windows Update for Business, System Center Configuration Manager, and Windows Server Update Services offer control over patch deployment, with features enabling testing, scheduling updates, and setting custom configurations on Windows platforms. === Managed service providers === == Regulatory requirements (United States) == Timely patching of software vulnerabilities is a requirement under multiple regulatory frameworks in the United States. The Health Insurance Portability and Accountability Act (HIPAA) Security Rule requires covered entities to protect electronic protected health information by implementing security measures sufficient to reduce risks to a reasonable and appropriate level, which industry guidance has long interpreted to include timely patch management. A proposed new HIPAA Security Rule would make patch management requirements explicit, mandating that covered entities and business associates deploy security patches and updates within a defined risk-based timeline and maintain written procedures for prioritizing, testing, and applying patches to systems that store, process, or transmit ePHI. The 2025 proposal continues to receive industry pushback as of December 2025. HIPAA was last updated in 2013. The Payment Card Industry Data Security Standard (PCI DSS) requires organizations to protect system components from known vulnerabilities by installing applicable security patches within one month of release for critical patches. The Cybersecurity and Infrastructure Security Agency (CISA) maintains a Known Exploited Vulnerabilities (KEV) catalog that compels U.S. federal agencies to remediate listed vulnerabilities within specified timelines. Agencies are typically required to patch within 3 weeks, though some vulnerabilities must be fixed within 24 hours.

    Read more →
  • Grammatik

    Grammatik

    Grammatik was the first grammar-checking program for home computers. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers. It was first released no later than 1981, and was inspired by the Writer's Workbench. Grammatik was first available for the TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software International of San Francisco, California, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking. Subsequent versions were released for MS-DOS, Windows, Macintosh, and Unix. Grammatik was ultimately acquired by WordPerfect Corporation and is integrated into the WordPerfect word processor.

    Read more →
  • Brill tagger

    Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is: a form of supervised learning, which aims to minimize error; and, a transformation-based process, in the sense that a tag is assigned to each word and changed using a set of predefined rules. In the transformation process, if the word is known, it first assigns the most frequent tag, or if the word is unknown, it naively assigns the tag "noun" to it. High accuracy is eventually achieved by applying these rules iteratively and changing the incorrect tags. This approach ensures that valuable information such as the morphosyntactic construction of words is employed in an automatic tagging process. == Algorithm == The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase: Initialization: Known words (in vocabulary): assigning the most frequent tag associated to a form of the word Unknown word == Rules and processing == The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations). After all word tokens have (provisional) tags, contextual rules apply iteratively, to correct the tags by examining small amounts of context. This is where the Brill method differs from other part of speech tagging methods such as those using Hidden Markov Models. Rules are reapplied repeatedly, until a threshold is reached, or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation: IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun), if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple Finite-state machines. See Part of speech tagging for more general information including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version as it was available at Plymouth Tech can be found on Archive.org. The software uses the MIT License.

    Read more →
  • Live Transcribe

    Live Transcribe

    Live Transcribe is a mobile app for real-time captioning, developed by Google for the Android operating system. Development on the application began in partnership with Gallaudet University. It was publicly released as a free beta for Android 5.0+ on the Google Play Store on February 4, 2019. As of early 2023 it had been downloaded over 500 million times. == Development == Researchers Dimitri Kanevsky, Sagar Savla and Chet Gnegy at Google developed the app in collaboration with researchers at Gallaudet University, an American university for the education of the deaf and hard of hearing. The app uses machine learning to generate captions, similar to YouTube's auto-generated captions. In August 2019, Google made Live Transcribe an open-source project. == Features == The app uses speech recognition to generate live captions in over 80 languages with varying accuracy. The app, which requires connection to the Internet to function, is available to download on the Google Play Store. A later update to the app displayed information on sounds such as clapping, laughter, music, applause, and whistling. In May 2020, the app started supporting transcription in Albanian, Burmese, Estonian, Macedonian, Mongolian, Punjabi, and Uzbek, supporting 70 languages. In March 2022, the app was updated with support to transcribe offline, without Internet connection, so long as the appropriate language pack has been installed. The offline mode is only available for devices with 6GB of RAM and certain Google Pixel devices.

    Read more →
  • Showbox.com

    Showbox.com

    Showbox is an online video streaming platform that enables users to stream and download many videos, commonly movies and TV shows, for free. == History == The company opened the platforms to users who registered from its beta in late 2015. The platform was officially launched in February 2016, enabling any visitor to sign up and create videos online. In April 2016, Showbox was featured on the Product Hunt website, coming to the top of the website's lists for that day and week with over 1400 upvotes from the Product Hunt community. Also in April 2016, Showbox partnered with YouTube's leading multi-channel networks, including Fullscreen, BroadbandTV, StyleHaul, AwesomenessTV, and BuzzMyVideos, to enable their communities of creators to access the platform. In June 2016, the company launched Showbox For Brands, a business-oriented video creation platform, enabling companies to create video content in-house and with their communities and influencers. In March 2017, the company launched Showbox Engage, a use case of its B2B product launched in 2016, enabling companies to launch user-generated content campaigns with their communities. In April 2017, Showbox and the United Nations announced a partnership around the 70th anniversary of the declaration of human rights, with an annual, ongoing global campaign in 135 languages, inviting people worldwide to create their part of the declaration in a video from anywhere around the world. In November 2017, Showbox partnered with the Ad:tech and Digital Marketing World Forum conferences (DMWF) in New York to provide their users and communities with a User Generated Content video solution. == Technology == Showbox's video creation technology includes an online green screen feature, proprietary computer vision algorithms, deep learning technology to support the automatic creation of videos in the cloud, and advanced video composition, including special effects. == Coverage and awards == In March 2015, Showbox was nominated as one of the 10 Israeli startups to take over our TV screens this year. In July 2016, Showbox won the Publicis90 award as part of Publicis' "global initiative to foster digital entrepreneurship". In March 2017, Showbox was chosen as one of The Culture Trip's 10 startups to watch for in 2017.

    Read more →
  • List of chatbots

    List of chatbots

    A chatbot is a software application or web interface that is designed to mimic human conversation through text or voice interactions. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use large language models (LLMs) and natural language processing, but simpler chatbots have existed for decades. == LLM chatbots == == General chatbots == == Historical chatbots ==

    Read more →
  • IruSoft

    IruSoft

    IruSoft (Arabic: آيروسوفت) is an insurance regulatory platform designated for licensing, supervision and inspection of the insurance sector within a country. The platform introduced unique supervision-technology (suptech), insurance-technology (insurtech) and regulatory-technology (regtech) automated modules by which a regulator requires less resources to ensure fairness, transparency and competition and to prevent conflicts of interest in the sector. IruSoft was founded by Abdullah Al-Salloum and owned by the Insurance Regulatory Unit in Kuwait. The Insurance Regulatory Unit optimized processing insurance-sector's customer complaints by issuing Resolution No. (1) of 2022 that introduced IruSoft's complaints public module; an automated resolution center, by which the process of receiving submitted complaints, passing them on to the platforms of licensed insurance companies, tracking matter-related discussions and updates and getting them escalated if unresolved to be discussed by a committee assigned by the unit is integrally automated and analyzed for better key performance indicators.

    Read more →
  • Object co-segmentation

    Object co-segmentation

    In computer vision, object co-segmentation is a special case of image segmentation, which is defined as jointly segmenting semantically similar objects in multiple images or video frames. == Challenges == It is often challenging to extract segmentation masks of a target/object from a noisy collection of images or video frames, which involves object discovery coupled with segmentation. A noisy collection implies that the object/target is present sporadically in a set of images or the object/target disappears intermittently throughout the video of interest. Early methods typically involve mid-level representations such as object proposals. == Dynamic Markov networks-based methods == A joint object discover and co-segmentation method based on coupled dynamic Markov networks has been proposed recently, which claims significant improvements in robustness against irrelevant/noisy video frames. Unlike previous efforts which conveniently assumes the consistent presence of the target objects throughout the input video, this coupled dual dynamic Markov network based algorithm simultaneously carries out both the detection and segmentation tasks with two respective Markov networks jointly updated via belief propagation. Specifically, the Markov network responsible for segmentation is initialized with superpixels and provides information for its Markov counterpart responsible for the object detection task. Conversely, the Markov network responsible for detection builds the object proposal graph with inputs including the spatio-temporal segmentation tubes. == Graph cut-based methods == Graph cut optimization is a popular tool in computer vision, especially in earlier image segmentation applications. As an extension of regular graph cuts, multi-level hypergraph cut is proposed to account for more complex high order correspondences among video groups beyond typical pairwise correlations. With such hypergraph extension, multiple modalities of correspondences, including low-level appearance, saliency, coherent motion and high level features such as object regions, could be seamlessly incorporated in the hyperedge computation. In addition, as a core advantage over co-occurrence based approach, hypergraph implicitly retains more complex correspondences among its vertices, with the hyperedge weights conveniently computed by eigenvalue decomposition of Laplacian matrices. == CNN/LSTM-based methods == In action localization applications, object co-segmentation is also implemented as the segment-tube spatio-temporal detector. Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), Le et al. present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. This Segment-tube detector can temporally pinpoint the starting/ending frame of each action category in the presence of preceding/subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation. The proposed segment-tube detector is illustrated in the flowchart on the right. The sample input is an untrimmed video containing all frames in a pair figure skating video, with only a portion of these frames belonging to a relevant category (e.g., the DeathSpirals). Initialized with saliency based image segmentation on individual frames, this method first performs temporal action localization step with a cascaded 3D CNN and LSTM, and pinpoints the starting frame and the ending frame of a target action with a coarse-to-fine strategy. Subsequently, the segment-tube detector refines per-frame spatial segmentation with graph cut by focusing on relevant frames identified by the temporal action localization step. The optimization alternates between the temporal action localization and spatial action segmentation in an iterative manner. Upon practical convergence, the final spatio-temporal action localization results are obtained in the format of a sequence of per-frame segmentation masks (bottom row in the flowchart) with precise starting/ending frames.

    Read more →
  • Spatial embedding

    Spatial embedding

    Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension. Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks == Embedded data types == Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, there may be a need to combine multimodal data from different sources. The next section describes examples of different types of data and their uses. === Text === Geolocated posts on social media can be used to acquire a library of documents bound to a given place that can be later transformed to embedded vectors using word embedding techniques. === Image === Satellites and aircraft collect digital spatial data acquired from remotely sensed images which can be used in machine learning. They are sometimes hard to analyse using basic image analysis methods and convolutional neural networks can be used to acquire an embedding of images bound to a given geographical object or a region. === Point === A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph. === Line / multiline === Among other things, motion trajectories are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and also features of points visited along the way. Embedding of trajectories allows to improve performance of such tasks as clustering and also categorization. === Polygon === The geographic areas analyzed in machine learning are defined by both administrative boundaries and top-down division into grids of regular shapes such as rectangles, for example. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size of the area or shape it represents. === Graph === An example domain where graph representation is used is the street layout in a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points like public transport stops or important points in the city, and the edges represent the flow between them. Embedding graphs or single vertices allows to improve accuracy of analysis methods in which the treated geographical domain can be represented as a network. == Usage == POI recommendation - generating personalized point of interest recommendations based on user preferences. Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory. Zone functions classification - based on different mobility of people or POI distribution a function of a given area in a city can be predicted. Crime prediction - estimation of crime rate in different regions of a city. Local event detection - studying spatio-temporal changes in embeddings can provide valuable information in detection of local event occurring in specific location. Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city. Shape matching - finding a similar shape of given polygon, for example finding building with the same shape as input building. Travel time estimation - predicting estimated travel time given current traffic conditions and special occurring events. Time estimation for on-demand food delivery - estimation of delivery time when placing an order through the website. == Temporal aspect == Some of the data analyzed has a timestamp associated with it. In some cases of data analysis this information is omitted and in others it is used to divide the set into groups. The most common division is the separation of weekdays from weekends or division into hours of the day. This is particularly important in the analysis of mobility data, because the characteristics of mobility during the week and at different times of the day are very different from each other. Another area in which time division into, for example, individual months can be used is in the analysis of tourism of a given region. In order to take such a split into account, embedding methods treat the time stamp specifically or separate versions of the model are developed for different subgroups of the analyzed set.

    Read more →
  • Cleverbot

    Cleverbot

    Cleverbot is a chatterbot web application. It was created by British AI scientist Rollo Carpenter and launched in October 2008. It was preceded by Jabberwacky, a chatbot project that began in 1988 and went online in 1997. In its first decade, Cleverbot held several thousand conversations with Carpenter and his associates. Since launching on the web, the number of conversations held has exceeded 150 million. Besides the web application, Cleverbot is also available as an iOS, Android, and Windows Phone app. == Operation == Cleverbot's responses are not pre-programmed because it learns from human input: Humans type into the box below the Cleverbot logo and the system finds all keywords or an exact phrase matching the input. After searching through its saved conversations, it responds to the input by finding how a human responded to that input when it was asked, in part or in full, by Cleverbot. Cleverbot participated in a formal Turing test at the 2011 Techniche festival at the Indian Institute of Technology Guwahati on 3 September 2011. Out of the 1334 votes cast, Cleverbot was judged to be 59.3% human, compared to the rating of 63.3% human achieved by human participants. A score of 50.05% or higher is often considered to be a passing grade. The software running for the event had to handle just 1 or 2 simultaneous requests, whereas online Cleverbot is usually talking to around 10,000 to 50,000 people at once. == Developments == Cleverbot is constantly growing in data size at the rate of 4 to 7 million interactions per day. Updates to the software have been mostly behind the scenes. In 2014, Cleverbot was upgraded to use GPU serving techniques. Unlike Eliza, the program does not respond in a fixed way, instead choosing its responses heuristically using fuzzy logic, the whole of the conversation being compared to the millions that have taken place before. Cleverbot now uses over 279 million interactions, about 3-4% of the data it has already accumulated. The developers of Cleverbot are attempting to build a new version using machine learning techniques. An app that uses the Cleverscript engine to play a game of 20 Questions has been launched under the name Clevernator. Unlike other such games, the player asks the questions and it is the role of the AI to understand, and answer factually. An app that allows owners to create and talk to their own small Cleverbot-like AI has been launched, called Cleverme! for Apple products. == In popular culture == Cleverbot received media attention after being featured in the popular 2010 creepypasta ARG web serial Ben Drowned by Alexander D. Hall. In early 2017, a Twitch stream of two Google Home devices modified to talk to each other using Cleverbot garnered over 700,000 visitors and over 30,000 peak concurrent viewers.

    Read more →
  • Microsoft Whiteboard

    Microsoft Whiteboard

    Microsoft Whiteboard is a free multi-platform application, as well as an online service and a feature in Microsoft Teams, which simulates a virtual whiteboard and enables real-time collaboration between users. == Overview and features == Microsoft Whiteboard allows users to draw on a virtual whiteboard using input methods such as a stylus pen or a mouse and keyboard, and write down notes, draw connections between shareable ideas, and interact in real time. Microsoft Whiteboard is available to download on the following platforms and devices: Microsoft Windows (on Windows 10 or above) Android Apple iOS Surface Hub devices It is also available on the web and as a feature in Microsoft Teams. Microsoft Whiteboard allows users with Microsoft accounts to view, edit, and share whiteboards using the provided tools and options. The feature set includes tools for drawing, shapes, and media. Drawing in Microsoft Whiteboard is called inking. It works both on mobile devices and computers. The inking toolbar has customizable pencils, a ruler, a highlighter, an eraser, and an object selector. Whiteboard can recognize shapes drawn by hand and straighten them. Holding the Shift key on a computer while inking draws straight lines. Microsoft Whiteboard has keyboard shortcuts for some functions. Additional features include inserting sticky notes, text boxes, stickers, as well as images. Grid lines and colors are adjustable. Different templates can be inserted into the whiteboard. Users can also share their reactions. A feature limited to boards created in Microsoft Teams, is the ability to make them read-only; other participants from the meeting cannot edit them. == Reviews == PC Magazine gave Microsoft Whiteboard a score of 3.5 out of 5, praising the app's free availability and plentiful templates. It compared it to other, paid whiteboarding solutions, and concluded that Microsoft offers the best free one. Some of the cons, described by PCMag, include the inability to view boards without a Microsoft account and the inability to create custom templates.

    Read more →
  • LIVAC Synchronous Corpus

    LIVAC Synchronous Corpus

    LIVAC is an uncommon language corpus dynamically maintained since 1995. Different from other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach in processing and filtering massive media texts from representative Chinese speech communities such as Beijing, Hong Kong, Macau, Taipei, Singapore, Shanghai, as well as Guangzhou, and Shenzhen. The contents are thus deliberately repetitive in most cases, represented by textual samples drawn from editorials, local and international news, cross-Taiwan Strait news, as well as news on finance, sports and entertainment. By 2023, more than 3 billion characters of news media texts have been filtered, of which 700 million characters have been processed and analyzed and have yielded an expanding Pan-Chinese dictionary of 2.5 million words from the Pan-Chinese printed media. Through rigorous analysis based on computational linguistic methodology, LIVAC has at the same time accumulated a large amount of accurate and meaningful statistical data on the Chinese language and on their diverse speech communities in the Pan-Chinese context, and the results show considerable and important long standing as well as evolving variations. The "Windows" approach is the most innovative feature of LIVAC and has enabled Pan-Chinese media texts to be quantitatively analyzed according to various attributes such as locations, time and subject domains. Thus, various types of comparative studies and applications in information technology as well as development of often related innovative applications have been possible. Moreover, LIVAC has allowed longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) search and comprehensive study of target words and their underlying concepts as well as linguistic structures over the past 25 years, based on the above mentioned variables of location, time and subject. Results from the extensive and accumulative data analysis contained in LIVAC have enabled the cultivation of textual databases of proper names, place names, organization names, new words, and bi-weekly and annual rosters of media figures. Related applications have included the establishment of verb and adjective databases, the formulation of sentiment indices, and related opinion mining, to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed as the Pan-Chinese Newsmaker Rosters). Notable among these are the decades long periodic reviews of the 25 years of annual pan-Chinese rosters since 2000 and compilation of new word databases (LIVAC Annual Pan-Chinese New Word Rosters). On this basis, the analysis of the emergence, diffusion and transformation of new words, and the publication of dictionaries of neologisms have been made possible. A recent focus is on the relative balance between disyllabic words and growing trisyllabic words in the Chinese language, and the comparative study of light verbs in three Chinese speech communities. as well as the link between the language use and use of language as a reflection of epochal change in China. A new LIVAC version 3.1 was launched in February 2024. == Corpus data processing == Accessing media texts, manual input, etc. Text unification including conversion from simplified to traditional Chinese characters, stored as Big5 and Unicode versions Automatic word segmentation Automatic alignment of parallel texts Manual verification, part-of-speech tagging Extraction of words and addition to regional sub-corpora Combination of regional sub-corpora to update the LIVAC corpus, and master lexical database == Labeling for data curation == Categories used include general terms and proper names, such as: general names, surnames, semi titles; geographical, organizations and commercial entities, etc.; time, prepositions, locations, etc.; stack-words; loanwords; case-word; numerals, etc. Construction of databases of proper names, place names, and specific terms, etc. Generate rosters: "new word rosters", "celebrity or media personality rosters", "place name rosters", compound words and matched words Other parts of speech tagging for sub-database, such as common nouns, numerals, numeral classifiers, different types of verbs, and of adjectives, pronouns, adverbs, prepositions, conjunctions, particles marking mood, onomatopoeia, interjection, etc. == Applications == Compilation of Pan-Chinese dictionaries or local dictionaries Information technology research, such as predictive Chinese text input for mobile phones, automatic speech to text conversion, opinion mining Comparative studies on linguistic and cultural developments in the Pan-Chinese regions, especially in a critical period of history in modern China. Language teaching and learning research, and speech-to-text conversion Customized service on linguistic research and lexical search for international corporations and government agencies The above applications are provided by the following functions: Word Segmentation Search Phrase Search Example Sentence Selection Multi-word Comparison Word Cloud

    Read more →
  • List of performance analysis tools

    List of performance analysis tools

    This is a list of performance analysis tools for use in software development. == General purpose, language independent == The following tools work based on log files that can be generated from various systems. time (Unix) - can be used to determine the run time of a program, separately counting user time vs. system time, and CPU time vs. clock time. timem (Unix) - can be used to determine the wall-clock time, CPU time, and CPU utilization similar to time (Unix) but supports numerous extensions. Supports reporting peak resident set size, major and minor page faults, priority and voluntary context switches via getrusage. Supports sampling procfs on supporting systems to report metrics such as page-based resident set size, virtual memory size, read-bytes, and write-bytes, etc. Supports collecting hardware counters when built with PAPI support. == Multiple languages == The following tools work for multiple languages or binaries. == C and C++ == Arm MAP, a performance profiler supporting Linux platforms. AppDynamics, an application performance management service for C/C++ applications via SDK. AQtime Pro, a performance profiler and memory allocation debugger that can be integrated into Microsoft Visual Studio, and Embarcadero RAD Studio, or can run as a stand-alone application. IBM Rational Purify was a memory debugger allowing performance analysis. Instruments (bundled with Xcode) is used to profile an executable's memory allocations, time usage, filesystem activity, GPU activity etc. Intel Parallel Studio contains Intel VTune Amplifier, which tunes both serial and parallel programs. It also includes Intel Advisor and Intel Inspector. Intel Advisor optimizes vectorization (use of SIMD instructions) and prototypes threading implementations. Intel Inspector detects and debugs races, deadlocks and memory errors. Parasoft Insure++ provides a graphical tool that displays and animates memory allocations in real time to expose memory blowout, fragmentation, overuse, bottlenecks and leaks. Visual Studio Team System Profiler, commercial profiler by Microsoft. == Java == inspectIT is an open-source application performance management (APM) service for monitoring and analyzing software applications, available under the Apache License, Version 2.0 (ALv2). JConsole is the profiler which comes with the Java Development Kit JProfiler JRockit Mission Control, a profiler with low overhead. Netbeans Profiler, a profiler integrated into the NetBeans IDE (internally uses jvisualvm profiler) Plumbr, Java application performance monitoring with automated root cause detection. Links memory leaks, GC inefficiency, slow database and external web service calls, locked threads, and other performance problems to the line in source code that causes them. OverOps, Continuous reliability for the modern software supply chain, automatically detect and deliver root cause automation for all errors. VisualVM is a visual tool integrating several commandline JDK tools and lightweight profiling capabilities. It is bundled with the Java Development Kit since version 6, update 7. == JavaScript == The Firefox web browser's developer tools contain a Performance tool, which gives insight into JavaScript performance of a website. Microsoft Visual Studio AJAX Profiling Extensions is a free profiling tool for JavaScript by Microsoft Research. == .NET == CLR Profiler is a free memory profiler provided by Microsoft for CLR applications. GlowCode is a performance and memory profiler for .NET applications using C# and other .NET languages. It identifies time-intensive functions and detects memory leaks and errors in native, managed and mixed Windows x64 and x86 applications. Visual Studio == PHP == BlackFire.io Dbg Xdebug is a PHP extension which provides debugging and profiling capabilities.

    Read more →
  • Language engineering

    Language engineering

    Language engineering involves the creation of natural language processing systems, whose cost and outputs are measurable and predictable. It is a distinct field contrasted to natural language processing and computational linguistics. A recent trend of language engineering is the use of Semantic Web technologies for the creation, archiving, processing, and retrieval of machine processable language data. Meta-Language Engineering is a proposed extension of Language Engineering first recorded in 2025, associated with the work of Delyone de Paula Canedo Filho. The term is used to designate an approach that, in addition to natural language processing, encompasses the symbolic, cognitive, and epistemological structuring of language systems.

    Read more →
  • Pooling layer

    Pooling layer

    In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, thus reducing the amount of computation and memory required, which makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers in the network. == Convolutional neural network pooling == Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x ∈ R H × W × C {\displaystyle x\in \mathbb {R} ^{H\times W\times C}} , where H {\displaystyle H} is height, W {\displaystyle W} is width, and C {\displaystyle C} is the number of channels. A pooling layer outputs a tensor y ∈ R H ′ × W ′ × C ′ {\displaystyle y\in \mathbb {R} ^{H'\times W'\times C'}} . We define two variables f , s {\displaystyle f,s} called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables: f H , f W , s H , s W {\displaystyle f_{H},f_{W},s_{H},s_{W}} . The receptive field of an entry in the output tensor, y {\displaystyle y} , are all the entries in x {\displaystyle x} that can affect that entry. === Max pooling === Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps. Define M a x P o o l ( x | f , s ) 0 , 0 , 0 = max ( x 0 : f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{0,0,0}=\max(x_{0:f-1,0:f-1,0})} where 0 : f − 1 {\displaystyle 0:f-1} means the range 0 , 1 , … , f − 1 {\displaystyle 0,1,\dots ,f-1} . Note that we need to avoid the off-by-one error. The next input is M a x P o o l ( x | f , s ) 1 , 0 , 0 = max ( x s : s + f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{1,0,0}=\max(x_{s:s+f-1,0:f-1,0})} and so on. The receptive field of y i , j , c {\displaystyle y_{i,j,c}} is x i s + f − 1 , j s + f − 1 , c {\displaystyle x_{is+f-1,js+f-1,c}} , so in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is:is+f-1,js:js+f-1,c})} If the horizontal and vertical filter size and strides differ, then in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s H : i s H + f H − 1 , j s W : j s W + f W − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is_{H}:is_{H}+f_{H}-1,js_{W}:js_{W}+f_{W}-1,c})} More succinctly, we can write y k = max ( { x k ′ | k ′ in the receptive field of k } ) {\displaystyle y_{k}=\max(\{x_{k'}|k'{\text{ in the receptive field of }}k\})} . If H {\displaystyle H} is not expressible as k s + f {\displaystyle ks+f} where k {\displaystyle k} is an integer, then for computing the entries of the output tensor on the boundaries, max pooling would attempt to take as inputs variables off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions, illustrated on the right. Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape R C {\displaystyle \mathbb {R} ^{C}} and the receptive field of y c {\displaystyle y_{c}} is all of x 0 : H , 0 : W , c {\displaystyle x_{0:H,0:W,c}} . That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head. === Average pooling === Average pooling (AvgPool) is similarly defined A v g P o o l ( x | f , s ) i , j , c = a v e r a g e ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) = 1 f 2 ∑ k ∈ i s : i s + f − 1 ∑ l ∈ j s : j s + f − 1 x k , l , c {\displaystyle \mathrm {AvgPool} (x|f,s)_{i,j,c}=\mathrm {average} (x_{is:is+f-1,js:js+f-1,c})={\frac {1}{f^{2}}}\sum _{k\in is:is+f-1}\sum _{l\in js:js+f-1}x_{k,l,c}} Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head. === Interpolations === There are some interpolations of max pooling and average pooling. Mixed Pooling is a linear sum of max pooling and average pooling. That is, M i x e d P o o l ( x | f , s , w ) = w M a x P o o l ( x | f , s ) + ( 1 − w ) A v g P o o l ( x | f , s ) {\displaystyle \mathrm {MixedPool} (x|f,s,w)=w\mathrm {MaxPool} (x|f,s)+(1-w)\mathrm {AvgPool} (x|f,s)} where w ∈ [ 0 , 1 ] {\displaystyle w\in [0,1]} is either a hyperparameter, a learnable parameter, or randomly sampled anew every time. Lp Pooling is similar to average pooling, but uses Lp norm average instead of average: y k = ( 1 N ∑ k ′ in the receptive field of k | x k ′ | p ) 1 / p {\displaystyle y_{k}=\left({\frac {1}{N}}\sum _{k'{\text{ in the receptive field of }}k}|x_{k'}|^{p}\right)^{1/p}} where N {\displaystyle N} is the size of receptive field, and p ≥ 1 {\displaystyle p\geq 1} is a hyperparameter. If all activations are non-negative, then average pooling is the case of p = 1 {\displaystyle p=1} , and max pooling is the case of p → ∞ {\displaystyle p\to \infty } . Square-root pooling is the case of p = 2 {\displaystyle p=2} . Stochastic pooling samples a random activation x k ′ {\displaystyle x_{k'}} from the receptive field with probability x k ′ ∑ k ″ x k ″ {\displaystyle {\frac {x_{k'}}{\sum _{k''}x_{k''}}}} . It is the same as average pooling in expectation. Softmax pooling is like max pooling, but uses softmax, i.e. ∑ k ′ e β x k ′ x k ′ ∑ k ″ e β x k ″ {\displaystyle {\frac {\sum _{k'}e^{\beta x_{k'}}x_{k'}}{\sum _{k''}e^{\beta x_{k''}}}}} where β > 0 {\displaystyle \beta >0} . Average pooling is the case of β ↓ 0 {\displaystyle \beta \downarrow 0} , and max pooling is the case of β ↑ ∞ {\displaystyle \beta \uparrow \infty } Local Importance-based Pooling generalizes softmax pooling by ∑ k ′ e g ( x k ′ ) x k ′ ∑ k ″ e g ( x k ″ ) {\displaystyle {\frac {\sum _{k'}e^{g(x_{k'})}x_{k'}}{\sum _{k''}e^{g(x_{k''})}}}} where g {\displaystyle g} is a learnable function. === Other poolings === Spatial pyramidal pooling applies max pooling (or any other form of pooling) in a pyramid structure. That is, it applies global max pooling, then applies max pooling to the image divided into 4 equal parts, then 16, etc. The results are then concatenated. It is a hierarchical form of global pooling, and similar to global pooling, it is often used just before a classification head. Region of Interest Pooling (also known as RoI pooling) is a variant of max pooling used in R-CNNs for object detection. It is designed to take an arbitrarily-sized input matrix, and output a fixed-sized output matrix. Covariance pooling computes the covariance matrix of the vectors { x k , l , 0 : C − 1 } k ∈ i s : i s + f − 1 , l ∈ j s : j s + f − 1 {\displaystyle \{x_{k,l,0:C-1}\}_{k\in is:is+f-1,l\in js:js+f-1}} which is then flattened to a C 2 {\displaystyle C^{2}} -dimensional vector y i , j , 0 : C 2 − 1 {\displaystyle y_{i,j,0:C^{2}-1}} . Global covariance pooling is used similarly to global max pooling. As average pooling computes the average, which is a first-degree statistic, and covariance is a second-degree statistic, covariance pooling is also called "second-order pooling". It can be generalized to higher-order poolings. Blur Pooling means applying a blurring method before downsampling. For example, the Rect-2 blur pooling means taking an average pooling at f = 2 , s = 1 {\displaystyle f=2,s=1} , then taking every second pixel (identity with s = 2 {\displaystyle s=2} ). == Vision Transformer pooling == In Vision Transformers (ViT), there are the following common kinds of poolings. BERT-like pooling uses a dummy [CLS] token, "classification". For classification, the output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution, which is the network's prediction of class probability distribution. This is the one used by the original ViT and Masked Autoencoder. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multi headed attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. It then applies a feedforward layer F F N {\displaystyle \mathrm {FFN} } on each vector, resulting in a matrix V = [ F F N ( v 1 ) , … , F F N ( v n ) ] {\displaystyle V=[\mathrm {FFN} (v_{1}),\dots ,\mathrm {FFN} (v_{n})]} . This is then sent to a multi-headed attention, resulting in M u l t i h e a d e d A

    Read more →