Dynamic knowledge repository

Dynamic knowledge repository

The dynamic knowledge repository (DKR) is a concept developed by Douglas C. Engelbart as a primary strategic focus for allowing humans to address complex problems. He has proposed that a DKR will enable us to develop a collective IQ greater than any individual's IQ. References and discussion of Engelbart's DKR concept are available at the Doug Engelbart Institute. == Definition == A knowledge repository is a computerized system that systematically captures, organizes and categorizes an organization's knowledge. The repository can be searched and data can be quickly retrieved. The effective knowledge repositories include factual, conceptual, procedural and meta-cognitive techniques. The key features of knowledge repositories include communication forums. A knowledge repository can take many forms to "contain" the knowledge it holds. A customer database is a knowledge repository of customer information and insights – or electronic explicit knowledge. A Library is a knowledge repository of books – physical explicit knowledge. A community of experts is a knowledge repository of tacit knowledge or experience. The nature of the repository only changes to contain/manage the type of knowledge it holds. A repository (as opposed to an archive) is designed to get knowledge out. It should therefore have some rules of structure, classification, taxonomy, record management, etc., to facilitate user engagement.

VieON

VieON is an mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. Understanding this, DatVietVAC Group has planned to research and develop an OTT application, even though the Vietnamese market already has some major players such as FPT Play and the international giant Netflix. Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

Predictor–corrector method

In numerical analysis, predictor–corrector methods belong to a class of algorithms designed to integrate ordinary differential equations – to find an unknown function that satisfies a given differential equation. All such algorithms proceed in two steps: The initial, "prediction" step, starts from a function fitted to the function-values and derivative-values at a preceding set of points to extrapolate ("anticipate") this function's value at a subsequent, new point. The next, "corrector" step refines the initial approximation by using the predicted value of the function and another method to interpolate that unknown function's value at the same subsequent point. == Predictor–corrector methods for solving ODEs == When considering the numerical solution of ordinary differential equations (ODEs), a predictor–corrector method typically uses an explicit method for the predictor step and an implicit method for the corrector step. === Example: Euler method with the trapezoidal rule === A simple predictor–corrector method (known as Heun's method) can be constructed from the Euler method (an explicit method) and the trapezoidal rule (an implicit method). Consider the differential equation y ′ = f ( t , y ) , y ( t 0 ) = y 0 , {\displaystyle y'=f(t,y),\quad y(t_{0})=y_{0},} and denote the step size by h {\displaystyle h} . First, the predictor step: starting from the current value y i {\displaystyle y_{i}} , calculate an initial guess value y ~ i + 1 {\displaystyle {\tilde {y}}_{i+1}} via the Euler method, y ~ i + 1 = y i + h f ( t i , y i ) . {\displaystyle {\tilde {y}}_{i+1}=y_{i}+hf(t_{i},y_{i}).} Next, the corrector step: improve the initial guess using trapezoidal rule, y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle y_{i+1}=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.} That value is used as the next step. === PEC mode and PECE mode === There are different variants of a predictor–corrector method, depending on how often the corrector method is applied. The Predict–Evaluate–Correct–Evaluate (PECE) mode refers to the variant in the above example: y ~ i + 1 = y i + h f ( t i , y i ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} It is also possible to evaluate the function f only once per step by using the method in Predict–Evaluate–Correct (PEC) mode: y ~ i + 1 = y i + h f ( t i , y ~ i ) , y i + 1 = y i + 1 2 h ( f ( t i , y ~ i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},{\tilde {y}}_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},{\tilde {y}}_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} Additionally, the corrector step can be repeated in the hope that this achieves an even better approximation to the true solution. If the corrector method is run twice, this yields the PECECE mode: y ~ i + 1 = y i + h f ( t i , y i ) , y ^ i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ^ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\{\hat {y}}_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )},\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\hat {y}}_{i+1}){\bigr )}.\end{aligned}}} The PECEC mode has one fewer function evaluation than PECECE mode. More generally, if the corrector is run k times, the method is in P(EC)k or P(EC)kE mode. If the corrector method is iterated until it converges, this could be called PE(CE)∞.

Outline of artificial intelligence

The following outline is provided as an overview of and topical guide to artificial intelligence: Artificial intelligence (AI) is intelligence exhibited by machines or software. It is also the name of the scientific field which studies how to create computers and computer software that are capable of intelligent behavior. == AI terminology == Glossary of artificial intelligence == Goals and applications == === General intelligence === Artificial general intelligence AI-complete === Reasoning and problem solving === Automated reasoning Mathematics Automated theorem prover Computer-assisted proof – Computer algebra General Problem Solver Expert system – Decision support system – Clinical decision support system – === Knowledge representation === Knowledge representation Knowledge management Cyc === Planning === Automated planning and scheduling Strategic planning Sussman anomaly – === Learning === Machine learning – Constrained Conditional Models – Deep learning – Neural modeling fields – Supervised learning – Weak supervision (semi-supervised learning) – Unsupervised learning – === Natural language processing === Natural language processing (outline) – Chatterbots – Language identification – Large language model – Retrieval-augmented generation – Natural language user interface – Natural language understanding – Machine translation – Statistical semantics – Question answering – Semantic translation – Concept mining – Data mining – Text mining – Process mining – E-mail spam filtering – Information extraction – Named-entity extraction – Coreference resolution – Named-entity recognition – Relationship extraction – Terminology extraction – === Perception === Machine perception Pattern recognition – Computer Audition – Speech recognition – Speaker recognition – Computer vision (outline) – Image processing Intelligent word recognition – Object recognition – Optical mark recognition – Handwriting recognition – Optical character recognition – Automatic number plate recognition – Information extraction – Image retrieval – Automatic image annotation – Facial recognition systems – Silent speech interface – Activity recognition – Percept (artificial intelligence) === Robotics === Robotics – Behavior-based robotics – Cognitive – Cybernetics – Developmental robotics – Evolutionary robotics – === Control === Intelligent control Self-management (computer science) – Autonomic Computing – Autonomic Networking – === Social intelligence === Affective computing Kismet === Game playing === Game artificial intelligence – Computer game bot – computer replacement for human players. Video game AI – Computer chess – Computer Go – General game playing – General video game playing – === Creativity, art and entertainment === Artificial creativity Artificial life Artificial intelligence art AI anthropomorphism AI agent AI web browser AI boom AI slop Creative computing Generative artificial intelligence Generative pre trained transformer Uncanny valley Music and artificial intelligence Computational humor Chatbot === Integrated AI systems === AIBO – Sony's robot dog. It integrates vision, hearing and motorskills. Asimo (2000 to present) – humanoid robot developed by Honda, capable of walking, running, negotiating through pedestrian traffic, climbing and descending stairs, recognizing speech commands and the faces of specific individuals, among a growing set of capabilities. MIRAGE – A.I. embodied humanoid in an augmented reality environment. Cog – M.I.T. humanoid robot project under the direction of Rodney Brooks. QRIO – Sony's version of a humanoid robot. TOPIO, TOSY's humanoid robot that can play ping-pong with humans. Watson (2011) – computer developed by IBM that played and won the game show Jeopardy! It is now being used to guide nurses in medical procedures. Purpose: Open domain question answering Technologies employed: Natural language processing Information retrieval Knowledge representation Automated reasoning Machine learning Project Debater (2018) – artificially intelligent computer system, designed to make coherent arguments, developed at IBM's lab in Haifa, Israel. === Intelligent personal assistants === Intelligent personal assistant – Amazon Alexa – Assistant – Braina – Cortana – Google Assistant – Google Now – Mycroft – Siri – Viv – === Other applications === Artificial life – simulation of natural life through the means of computers, robotics, or biochemistry. Automatic target recognition – Diagnosis (artificial intelligence) – Speech generating device – Vehicle infrastructure integration – Virtual Intelligence – == History == History of artificial intelligence Progress in artificial intelligence Timeline of artificial intelligence AI effect – as soon as AI successfully solves a problem, the problem is no longer considered by the public to be a part of AI. This phenomenon has occurred in relation to every AI application produced, so far, throughout the history of development of AI. AI winter – a period of disappointment and funding reductions occurring after a wave of high expectations and funding in AI. Such funding cuts occurred in the 1970s, for instance. Moore's law === History by period === 2017 in artificial intelligence 2018 in artificial intelligence 2019 in artificial intelligence 2020 in artificial intelligence 2021 in artificial intelligence 2022 in artificial intelligence 2023 in artificial intelligence 2024 in artificial intelligence 2025 in artificial intelligence 2026 in artificial intelligence 2027 in artificial intelligence 2028 in artificial intelligence 2029 in artificial intelligence === History by subject === History of logic (formal reasoning is an important precursor of AI) History of machine learning (timeline) History of machine translation (timeline) History of natural language processing History of optical character recognition (timeline) == AI algorithms and techniques == === Search === Discrete search algorithms Uninformed search Brute force search – Problem-solving technique and algorithmic paradigmPages displaying short descriptions of redirect targets Search tree – Data structure in tree form sorted for fast lookup Breadth-first search – Algorithm to search the nodes of a graph Depth-first search – Algorithm to search the nodes of a graph State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Informed search Best-first search – Graph exploring search algorithm A search algorithm – Algorithm used for pathfinding and graph traversal Heuristics – Problem-solving methodPages displaying short descriptions of redirect targets Pruning (algorithm) – Data compression techniquePages displaying short descriptions of redirect targets Adversarial search Minmax algorithm – Decision rule used for minimizing the possible loss for a worst-case scenarioPages displaying short descriptions of redirect targets Logic as search Production system (computer science) – Computer program used to provide artificial intelligence Rule based system – Type of computer systemPages displaying short descriptions of redirect targets Production rule – Computer program used to provide artificial intelligence Inference rule – Method of deriving conclusionsPages displaying short descriptions of redirect targets Horn clause – Type of logical formula Forward chaining – Inference engine in an expert system Backward chaining – Method of forming inferences Planning as search State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Means–ends analysis – Problem solving technique === Optimization search === Optimization (mathematics) algorithms Hill climbing – Optimization algorithm Simulated annealing – Probabilistic optimization technique and metaheuristic Beam search – Heuristic search algorithm Random optimization – Optimization technique in mathematics Evolutionary computation Genetic algorithms – Competitive algorithm for searching a problem spacePages displaying short descriptions of redirect targets Gene expression programming – Evolutionary algorithm Genetic programming – Evolving computer programs with techniques analogous to natural genetic processes Differential evolution – Method of mathematical optimization Society based learning algorithms. Swarm intelligence – Collective behavior of decentralized, self-organized systems Particle swarm optimization – Iterative simulation method Ant colony optimization – Optimization algorithmPages displaying short descriptions of redirect targets Metaheuristic – Optimization technique === Logic === Logic and automated reasoning Programming using logic Logic programming – Programming paradigm based on formal logic See "Logic as search" above. Forms of Logic Propositional logic First-order logic First-order logic with equality Constraint satisfaction – Process in artificial intelligence and operations research Fuzzy logic Fuzzy set theory – Sets whose elements have degrees of membershipPages displaying short descriptions

Randomized rounding

In computer science and operations research, randomized rounding is a widely used approach for designing and analyzing approximation algorithms. Many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). For such problems, randomized rounding can be used to design fast (polynomial time) approximation algorithms—that is, algorithms that are guaranteed to return an approximately optimal solution given any input. The basic idea of randomized rounding is to convert an optimal solution of a relaxation of the problem into an approximately-optimal solution to the original problem. The resulting algorithm is usually analyzed using the probabilistic method. == Overview == The basic approach has three steps: Formulate the problem to be solved as an integer linear program (ILP). Compute an optimal fractional solution x {\displaystyle x} to the linear programming relaxation (LP) of the ILP. Round the fractional solution x {\displaystyle x} of the LP to an integer solution x ′ {\displaystyle x'} of the ILP. (Although the approach is most commonly applied with linear programs, other kinds of relaxations are sometimes used. For example, see Goemans' and Williamson's semidefinite programming-based Max-Cut approximation algorithm.) In the first step, the challenge is to choose a suitable integer linear program. Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well, such as in the Set Cover example below. (The integer linear program should have a small integrality gap; indeed randomized rounding is often used to prove bounds on integrality gaps.) In the second step, the optimal fractional solution can typically be computed in polynomial time using any standard linear programming algorithm. In the third step, the fractional solution must be converted into an integer solution (and thus a solution to the original problem). This is called rounding the fractional solution. The resulting integer solution should (provably) have cost not much larger than the cost of the fractional solution. This will ensure that the cost of the integer solution is not much larger than the cost of the optimal integer solution. The main technique used to do the third step (rounding) is to use randomization, and then to use probabilistic arguments to bound the increase in cost due to the rounding (following the probabilistic method from combinatorics). Therein, probabilistic arguments are used to show the existence of discrete structures with desired properties. In this context, one uses such arguments to show the following: Given any fractional solution x {\displaystyle x} of the LP, with positive probability the randomized rounding process produces an integer solution x ′ {\displaystyle x'} that approximates x {\displaystyle x} according to some desired criterion. Finally, to make the third step computationally efficient, one either shows that x ′ {\displaystyle x'} approximates x {\displaystyle x} with high probability (so that the step can remain randomized) or one derandomizes the rounding step, typically using the method of conditional probabilities. The latter method converts the randomized rounding process into an efficient deterministic process that is guaranteed to reach a good outcome. == Example: the set cover problem == The following example illustrates how randomized rounding can be used to design an approximation algorithm for the set cover problem. Fix any instance ⟨ c , S ⟩ {\displaystyle \langle c,{\mathcal {S}}\rangle } of set cover over a universe U {\displaystyle {\mathcal {U}}} . === Computing the fractional solution === For step 1, let IP be the standard integer linear program for set cover for this instance. For step 2, let LP be the linear programming relaxation of IP, and compute an optimal solution x ∗ {\displaystyle x^{}} to LP using any standard linear programming algorithm. This takes time polynomial in the input size. The feasible solutions to LP are the vectors x {\displaystyle x} that assign each set s ∈ S {\displaystyle s\in {\mathcal {S}}} a non-negative weight x s {\displaystyle x_{s}} , such that, for each element e ∈ U {\displaystyle e\in {\mathcal {U}}} , x ′ {\displaystyle x'} covers e {\displaystyle e} —the total weight assigned to the sets containing e {\displaystyle e} is at least 1, that is, ∑ s ∋ e x s ≥ 1. {\displaystyle \sum _{s\ni e}x_{s}\geq 1.} The optimal solution x ∗ {\displaystyle x^{}} is a feasible solution whose cost ∑ s ∈ S c ( S ) x s ∗ {\displaystyle \sum _{s\in {\mathcal {S}}}c(S)x_{s}^{}} is as small as possible. Note that any set cover C {\displaystyle {\mathcal {C}}} for S {\displaystyle {\mathcal {S}}} gives a feasible solution x {\displaystyle x} (where x s = 1 {\displaystyle x_{s}=1} for s ∈ C {\displaystyle s\in {\mathcal {C}}} , x s = 0 {\displaystyle x_{s}=0} otherwise). The cost of this C {\displaystyle {\mathcal {C}}} equals the cost of x {\displaystyle x} , that is, ∑ s ∈ C c ( s ) = ∑ s ∈ S c ( s ) x s . {\displaystyle \sum _{s\in {\mathcal {C}}}c(s)=\sum _{s\in {\mathcal {S}}}c(s)x_{s}.} In other words, the linear program LP is a relaxation of the given set-cover problem. Since x ∗ {\displaystyle x^{}} has minimum cost among feasible solutions to the LP, the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover. === Randomized rounding step === In step 3, we must convert the minimum-cost fractional set cover x ∗ {\displaystyle x^{}} into a feasible integer solution x ′ {\displaystyle x'} (corresponding to a true set cover). The rounding step should produce an x ′ {\displaystyle x'} that, with positive probability, has cost within a small factor of the cost of x ∗ {\displaystyle x^{}} .Then (since the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover), the cost of x ′ {\displaystyle x'} will be within a small factor of the optimal cost. As a starting point, consider the most natural rounding scheme: For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( 1 , x s ∗ ) {\displaystyle \min(1,x_{s}^{})} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . With this rounding scheme, the expected cost of the chosen sets is at most ∑ s c ( s ) x s ∗ {\displaystyle \sum _{s}c(s)x_{s}^{}} , the cost of the fractional cover. This is good. Unfortunately the coverage is not good. When the variables x s ∗ {\displaystyle x_{s}^{}} are small, the probability that an element e {\displaystyle e} is not covered is about ∏ s ∋ e 1 − x s ∗ ≈ ∏ s ∋ e exp ⁡ ( − x s ∗ ) = exp ⁡ ( − ∑ s ∋ e x s ∗ ) ≈ exp ⁡ ( − 1 ) . {\displaystyle \prod _{s\ni e}1-x_{s}^{}\approx \prod _{s\ni e}\exp(-x_{s}^{})=\exp {\Big (}-\sum _{s\ni e}x_{s}^{}{\Big )}\approx \exp(-1).} So only a constant fraction of the elements will be covered in expectation. To make x ′ {\displaystyle x'} cover every element with high probability, the standard rounding scheme first scales up the rounding probabilities by an appropriate factor λ > 1 {\displaystyle \lambda >1} . Here is the standard rounding scheme: Fix a parameter λ ≥ 1 {\displaystyle \lambda \geq 1} . For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( λ x s ∗ , 1 ) {\displaystyle \min(\lambda x_{s}^{},1)} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . Scaling the probabilities up by λ {\displaystyle \lambda } increases the expected cost by λ {\displaystyle \lambda } , but makes coverage of all elements likely. The idea is to choose λ {\displaystyle \lambda } as small as possible so that all elements are provably covered with non-zero probability. Here is a detailed analysis. ==== Lemma (approximation guarantee for rounding scheme) ==== Fix λ = ln ⁡ ( 2 | U | ) {\displaystyle \lambda =\ln(2|{\mathcal {U}}|)} . With positive probability, the rounding scheme returns a set cover x ′ {\displaystyle x'} of cost at most 2 ln ⁡ ( 2 | U | ) c ⋅ x ∗ {\displaystyle 2\ln(2|{\mathcal {U}}|)c\cdot x^{}} (and thus of cost O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} times the cost of the optimal set cover). (Note: with care the O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} can be reduced to ln ⁡ ( | U | ) + O ( log ⁡ log ⁡ | U | ) {\displaystyle \ln(|{\mathcal {U}}|)+O(\log \log |{\mathcal {U}}|)} .) ==== Proof ==== The output x ′ {\displaystyle x'} of the random rounding scheme has the desired properties as long as none of the following "bad" events occur: the cost c ⋅ x ′ {\displaystyle c\cdot x'} of x ′ {\displaystyle x'} exceeds 2 λ c ⋅ x ∗ {\displaystyle 2\lambda c\cdot x^{}} , or for some element e {\displaystyle e} , x ′ {\displaystyle x'} fails to cover e {\displaystyle e} . The expectation of each x s ′ {\displaystyle x'_{s}} is at most λ x s ∗ {\displaystyle \lambda x_{s

LIVAC Synchronous Corpus

LIVAC is an uncommon language corpus dynamically maintained since 1995. Different from other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach in processing and filtering massive media texts from representative Chinese speech communities such as Beijing, Hong Kong, Macau, Taipei, Singapore, Shanghai, as well as Guangzhou, and Shenzhen. The contents are thus deliberately repetitive in most cases, represented by textual samples drawn from editorials, local and international news, cross-Taiwan Strait news, as well as news on finance, sports and entertainment. By 2023, more than 3 billion characters of news media texts have been filtered, of which 700 million characters have been processed and analyzed and have yielded an expanding Pan-Chinese dictionary of 2.5 million words from the Pan-Chinese printed media. Through rigorous analysis based on computational linguistic methodology, LIVAC has at the same time accumulated a large amount of accurate and meaningful statistical data on the Chinese language and on their diverse speech communities in the Pan-Chinese context, and the results show considerable and important long standing as well as evolving variations. The "Windows" approach is the most innovative feature of LIVAC and has enabled Pan-Chinese media texts to be quantitatively analyzed according to various attributes such as locations, time and subject domains. Thus, various types of comparative studies and applications in information technology as well as development of often related innovative applications have been possible. Moreover, LIVAC has allowed longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) search and comprehensive study of target words and their underlying concepts as well as linguistic structures over the past 25 years, based on the above mentioned variables of location, time and subject. Results from the extensive and accumulative data analysis contained in LIVAC have enabled the cultivation of textual databases of proper names, place names, organization names, new words, and bi-weekly and annual rosters of media figures. Related applications have included the establishment of verb and adjective databases, the formulation of sentiment indices, and related opinion mining, to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed as the Pan-Chinese Newsmaker Rosters). Notable among these are the decades long periodic reviews of the 25 years of annual pan-Chinese rosters since 2000 and compilation of new word databases (LIVAC Annual Pan-Chinese New Word Rosters). On this basis, the analysis of the emergence, diffusion and transformation of new words, and the publication of dictionaries of neologisms have been made possible. A recent focus is on the relative balance between disyllabic words and growing trisyllabic words in the Chinese language, and the comparative study of light verbs in three Chinese speech communities. as well as the link between the language use and use of language as a reflection of epochal change in China. A new LIVAC version 3.1 was launched in February 2024. == Corpus data processing == Accessing media texts, manual input, etc. Text unification including conversion from simplified to traditional Chinese characters, stored as Big5 and Unicode versions Automatic word segmentation Automatic alignment of parallel texts Manual verification, part-of-speech tagging Extraction of words and addition to regional sub-corpora Combination of regional sub-corpora to update the LIVAC corpus, and master lexical database == Labeling for data curation == Categories used include general terms and proper names, such as: general names, surnames, semi titles; geographical, organizations and commercial entities, etc.; time, prepositions, locations, etc.; stack-words; loanwords; case-word; numerals, etc. Construction of databases of proper names, place names, and specific terms, etc. Generate rosters: "new word rosters", "celebrity or media personality rosters", "place name rosters", compound words and matched words Other parts of speech tagging for sub-database, such as common nouns, numerals, numeral classifiers, different types of verbs, and of adjectives, pronouns, adverbs, prepositions, conjunctions, particles marking mood, onomatopoeia, interjection, etc. == Applications == Compilation of Pan-Chinese dictionaries or local dictionaries Information technology research, such as predictive Chinese text input for mobile phones, automatic speech to text conversion, opinion mining Comparative studies on linguistic and cultural developments in the Pan-Chinese regions, especially in a critical period of history in modern China. Language teaching and learning research, and speech-to-text conversion Customized service on linguistic research and lexical search for international corporations and government agencies The above applications are provided by the following functions: Word Segmentation Search Phrase Search Example Sentence Selection Multi-word Comparison Word Cloud

Ontology for Biomedical Investigations

The Ontology for Biomedical Investigations (OBI) is an open-access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL). As of March 2008, a pre-release version of the ontology was made available at the project's SVN repository. == Scope == The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration and joint ("cross-omics") analysis of experimental data, a need originally identified in the transcriptomics domain by the FGED Society, which developed the MGED Ontology as an annotation resource for microarray data.Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. (November 2007). "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology. 25 (11): 1251–5. doi:10.1038/nbt1346. PMC 2814061. PMID 17989687. OBI uses the basic formal ontology upper-level ontology as a means of describing general entities that do not belong to a specific problem domain. As such, all OBI classes are a subclass of some BFO class. The ontology has the scope of modeling all biomedical investigations and as such contains ontology terms for aspects such as: biological material – for example blood plasma instrument (and parts of an instrument therein) – for example DNA microarray, centrifuge information content – such as an image or a digital information entity such as an electronic medical record design and execution of an investigation (and individual experiments therein) – for example study design, electrophoresis material separation data transformation (incorporating aspects such as data normalization and data analysis) – for example principal components analysis dimensionality reduction, mean calculation Less 'concrete' aspects such as the role a given entity may play in a particular scenario (for example the role of a chemical compound in an experiment) and the function of an entity (for example the digestive function of the stomach to nutriate the body) are also covered in the ontology. == OBI consortium == The MGED Ontology was originally identified in the transcriptomics domain by the FGED Society and was developed to address the needs of data integration. Following a mutual decision to collaborate, this effort later became a wider collaboration between groups such as FGED, PSI and MSI in response to the needs of areas such as transcriptomics, proteomics and metabolomics and the FuGO (Functional Genomics Investigation Ontology) was created. This later became the OBI covering the wider scope of all biomedical investigations. As an international, cross-domain initiative, the OBI consortium draws upon a pool of experts from a variety of fields, not limited to biology. The current list of OBI consortium members is available at the OBI consortium website. The consortium is made up of a coordinating committee which is a combination of two subgroups, the Community Representative (those representing a particular biomedical community) and the Core Developers (ontology developers who may or may not be members of any single community). Separate to the coordinating committee is the Developers Working Group which consists of developers within the communities collaborating in the development of OBI at the discretion of current OBI Consortium members. == Papers on OBI ==