List of text mining software

Text mining computer programs are available from many commercial and open source companies and sources. == Commercial == Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded AUTINDEX – is a commercial text mining software package based on sophisticated linguistics by IAI (Institute for Applied Information Sciences), Saarbrücken. DigitalMR – social media listening & text+image analytics tool for market research. FICO Score – leading provider of analytics. General Sentiment – Social Intelligence platform that uses natural language processing to discover affinities between the fans of brands with the fans of traditional television shows in social media. Stand alone text analytics to capture social knowledge base on billions of topics stored to 2004. IBM LanguageWare – the IBM suite for text analytics (tools and Runtime). IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multi-lingual sentiment, event and fact extraction), that can be used in conjunction with Predictive Modeling. Text Analytics for Surveys provides the ability to categorize survey responses using NLP-based capabilities for further analysis or reporting. Inxight – provider of text analytics, search, and unstructured visualization technologies. (Inxight was bought by Business Objects that was bought by SAP AG in 2008). Language Computer Corporation – text extraction and analysis tools, available in multiple languages. Lexalytics – provider of a text analytics engine used in Social Media Monitoring, Voice of Customer, Survey Analysis, and other applications. Salience Engine. The software provides the unique capability of merging the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis. Linguamatics – provider of natural language processing (NLP) based enterprise text mining and text analytics software, I2E, for high-value knowledge discovery and decision support. Mathematica – provides built in tools for text alignment, pattern matching, clustering and semantic analysis. See Wolfram Language, the programming language of Mathematica. MATLAB offers Text Analytics Toolbox for importing text data, converting it to numeric form for use in machine and deep learning, sentiment analysis and classification tasks. Medallia – offers one system of record for survey, social, text, written and online feedback. NetMiner – software for network analysis and text mining. Supports social media and bibliographic data collection, NLP for english and chinese, sentiment analysis, work co-occurrence network(text network analysis) and visualization. NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others. PolyAnalyst - text analytics environment. PoolParty Semantic Suite - graph-based text mining platform. RapidMiner with its Text Processing Extension – data and text mining software. SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for Information Management. Sketch Engine – a corpus manager and analysis software which providing creating text corpora from uploaded texts or the Web including part-of-speech tagging and lemmatization or detecting a particular website. Sysomos – provider social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations. WordStat – Content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data. == Open source == Carrot2 – text and search results clustering framework. GATE – general Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering. Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python). KH Coder – for Quantitative Content Analysis or Text Mining The KNIME Text Processing extension. Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. OpenNLP – natural language processing. Orange with its text mining add-on. The PLOS Text Mining Collection. The programming language R provides a framework for text mining applications in the package tm. The Natural Language Processing task view contains tm and other text mining library packages. spaCy – open-source Natural Language Processing library for Python Stanbol – an open source text mining engine targeted at semantic content management. Voyant Tools – a web-based text analysis environment, created as a scholarly project.

Augment (app)

Augment is an augmented reality SaaS platform that allows users to visualize their products in 3D in real environment and in real-time through tablets or smartphones. The software can be used for retail, e-commerce, architecture, and other purposes. Augment created a mobile app of the same name, used to visualize 3D models in augmented reality and a web application called Augment Manager for 3D content management. The company is based in Paris, France, and was founded in October 2011 by Jean-François Chianetta, Cyril Champier, and Mickaël Jordan. In March 2016, Augment announced €3 million in its series-A round from Salesforce Ventures, which bringing the total funding since launch to $4.7 million. Augment lets businesses and 3D professionals visualize projects in their actual size and environment, on iPhone, iPad, and Android, using the power of augmented reality. Users can print the Augment tracker or create their own tracker to place the 3D models in space and at scale in real time. Common uses of the technology include product presentations, interactive print campaigns and e-Commerce product visualization. Augment has just released its augmented reality SDK solutions for retail and augmented commerce. The SDK solutions, available for both native mobile app and web integrations, allow companies to embed augmented reality product visualization in their existing eCommerce platforms. == Technology == Augment uses the following 3D technologies: Vuforia Augmented Reality SDK OpenGL == Customer cases == Companies such as Coca-Cola, Siemens, Nokia, Nestle, and Boeing are using Augment's solutions. == History == Augment was first created by Jean-François Chianetta in October 2011. Chianetta later teamed up with Cyril Champier and Mickaël Jordan for further development. The co-founding team was among the 12 startups of Season 3 of French accelerator Le Camping. The team raised one million euros (US$1,300,000) in April 2013 and moved its office to Paris. In March 2016, Augment raised US$3M Series A funding from Salesforce and other investors. In 2013, Augment's first service, Boost Business Catalog, was made available to help businesses catalogue and display their product models. Customers can rotate the images in 3D and view augmented content before deciding what to buy. == Awards == "Best Innovation" at Ecommerce Mag Trophy 2013

Maximum-entropy Markov model

In statistics, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discriminative model that extends a standard maximum entropy classifier by assuming that the unknown values to be learnt are connected in a Markov chain rather than being conditionally independent of each other. MEMMs find applications in natural language processing, specifically in part-of-speech tagging and information extraction. == Model == Suppose we have a sequence of observations O 1 , … , O n {\displaystyle O_{1},\dots ,O_{n}} that we seek to tag with the labels S 1 , … , S n {\displaystyle S_{1},\dots ,S_{n}} that maximize the conditional probability P ( S 1 , … , S n ∣ O 1 , … , O n ) {\displaystyle P(S_{1},\dots ,S_{n}\mid O_{1},\dots ,O_{n})} . In a MEMM, this probability is factored into Markov transition probabilities, where the probability of transitioning to a particular label depends only on the observation at that position and the previous position's label: P ( S 1 , … , S n ∣ O 1 , … , O n ) = ∏ t = 1 n P ( S t ∣ S t − 1 , O t ) . {\displaystyle P(S_{1},\dots ,S_{n}\mid O_{1},\dots ,O_{n})=\prod _{t=1}^{n}P(S_{t}\mid S_{t-1},O_{t}).} Each of these transition probabilities comes from the same general distribution P ( s ∣ s ′ , o ) {\displaystyle P(s\mid s',o)} . For each possible label value of the previous label s ′ {\displaystyle s'} , the probability of a certain label s {\displaystyle s} is modeled in the same way as a maximum entropy classifier: P ( s ∣ s ′ , o ) = P s ′ ( s ∣ o ) = 1 Z ( o , s ′ ) exp ⁡ ( ∑ a λ a f a ( o , s ) ) . {\displaystyle P(s\mid s',o)=P_{s'}(s\mid o)={\frac {1}{Z(o,s')}}\exp \left(\sum _{a}\lambda _{a}f_{a}(o,s)\right).} Here, the f a ( o , s ) {\displaystyle f_{a}(o,s)} are real-valued or categorical feature-functions, and Z ( o , s ′ ) {\displaystyle Z(o,s')} is a normalization term ensuring that the distribution sums to one. This form for the distribution corresponds to the maximum entropy probability distribution satisfying the constraint that the empirical expectation for the feature is equal to the expectation given the model: E e ⁡ [ f a ( o , s ) ] = E p ⁡ [ f a ( o , s ) ] for all a . {\displaystyle \operatorname {E} _{e}\left[f_{a}(o,s)\right]=\operatorname {E} _{p}\left[f_{a}(o,s)\right]\quad {\text{ for all }}a.} The parameters λ a {\displaystyle \lambda _{a}} can be estimated using generalized iterative scaling. Furthermore, a variant of the Baum–Welch algorithm, which is used for training HMMs, can be used to estimate parameters when training data has incomplete or missing labels. The optimal state sequence S 1 , … , S n {\displaystyle S_{1},\dots ,S_{n}} can be found using a very similar Viterbi algorithm to the one used for HMMs. The dynamic program uses the forward probability: α t + 1 ( s ) = ∑ s ′ ∈ S α t ( s ′ ) P s ′ ( s ∣ o t + 1 ) . {\displaystyle \alpha _{t+1}(s)=\sum _{s'\in S}\alpha _{t}(s')P_{s'}(s\mid o_{t+1}).} == Strengths and weaknesses == An advantage of MEMMs rather than HMMs for sequence tagging is that they offer increased freedom in choosing features to represent observations. In sequence tagging situations, it is useful to use domain knowledge to design special-purpose features. In the original paper introducing MEMMs, the authors write that "when trying to extract previously unseen company names from a newswire article, the identity of a word alone is not very predictive; however, knowing that the word is capitalized, that is a noun, that it is used in an appositive, and that it appears near the top of the article would all be quite predictive (in conjunction with the context provided by the state-transition structure)." Useful sequence tagging features, such as these, are often non-independent. Maximum entropy models do not assume independence between features, but generative observation models used in HMMs do. Therefore, MEMMs allow the user to specify many correlated, but informative features. Another advantage of MEMMs versus HMMs and conditional random fields (CRFs) is that training can be considerably more efficient. In HMMs and CRFs, one needs to use some version of the forward–backward algorithm as an inner loop in training. However, in MEMMs, estimating the parameters of the maximum-entropy distributions used for the transition probabilities can be done for each transition distribution in isolation. A drawback of MEMMs is that they potentially suffer from the "label bias problem," where states with low-entropy transition distributions "effectively ignore their observations." Conditional random fields were designed to overcome this weakness, which had already been recognised in the context of neural network-based Markov models in the early 1990s. Another source of label bias is that training is always done with respect to known previous tags, so the model struggles at test time when there is uncertainty in the previous tag.

Mirella Lapata

Mirella Lapata is a computer scientist and Professor in the School of Informatics at the University of Edinburgh. Working on the general problem of extracting semantic information from large bodies of text, Lapata develops computer algorithms and models in the field of natural language processing (NLP). == Education == Lapata obtained a Master of Arts (MA) degree from Carnegie Mellon University and subsequently earned a doctorate from the University of Edinburgh. Lapata's doctoral research investigated the acquisition of information from polysemous linguistic units using probabilistic methods supervised by Alex Lascarides, Chris Brew and Steve Finch. == Career and research == After her doctorate, Lapata assumed academic positions at Saarland University and at the Department of Computer Science at the University of Sheffield. At the University of Edinburgh she became a reader in the School of Informatics where she is a full Professor and holds a personal chair in natural language processing. Lapata is a member of the Human Communication Research Center and Institute for Language, Cognition and Computation, both in Edinburgh. Between 2015 and 2017, Lapata served as a member of the Royal Society Machine Learning Working Group. Recently Lapata was granted a European Research Council (ERC) Consolidator Grant worth €1.9M to fund five years of her project, TransModal: Translating from Multiple Modalities into Text. === Awards and honours === In 2009 Lapata became the first recipient of the Microsoft British Computer Society (BCS)/BCS IRSG Karen Spärck Jones Award. The award recognises achievement in furthering the progress in information retrieval and natural language processing; the award commemorates the life and work of Karen Spärck Jones. In 2012 Lapata won an Empirical Methods in Natural Language Processing (EMNLP)-CoNLL 2012 Best Reviewer Award. In 2018 Lapata was awarded, alongside Li Dong, an Association for Computational Linguistics (ACL) Best Paper Honorable Mention. In 2019 Lapata was elected a Fellow of the Royal Society of Edinburgh In 2020 Lapata was elected to the Academia Europaea. In 2025 Lapata was awarded the BCS Lovelace Medal for Computing Research.

Ernst Dickmanns

Ernst Dieter Dickmanns is a German pioneer of dynamic computer vision and of driverless cars. Dickmanns has been a professor at the University of the Bundeswehr Munich (1975–2001), and visiting professor to Caltech and to MIT, teaching courses on "dynamic vision". == Biography == Dickmanns was born in 1936. He studied aerospace and aeronautics at RWTH Aachen (1956–1961), and control engineering at Princeton University (1964/65); from 1961 to 1975 he was associated with the German Aero-Space Research Establishment (now DLR) Oberpfaffenhofen, working in the fields of flight dynamics and trajectory optimization. In 1971/72 he spent a Post-Doc Research Associateship with the NASA-Marshall Space Flight Center, Huntsville (orbiter re-entry). From 1975 to 2001 he was with UniBw Munich, where he initiated the 'Institut fuer Flugmechanik und Systemdynamik' (IFS), the Institut fuer die 'Technik Autonomer Systeme' (TAS), and the research activities in machine vision for vehicle guidance. == Pioneering work in autonomous driving == In the early 1980s his team equipped a Mercedes-Benz van with cameras and other sensors. The 5-ton van was re-engineered that it was possible to control steering wheel, throttle, and brakes through computer commands based on real-time evaluation of image sequences. Software was written that translated the sensory data into appropriate driving commands. For safety reasons, initial experiments in Bavaria took place on streets without traffic. In 1986 the Robot Car "VaMoRs" managed to drive all by itself and by 1987 was capable of driving itself at speeds up to 96 kilometres per hour (60 mph). One of the greatest challenges in high-speed autonomous driving arises through the rapidly changing visual street scenes. Back then, computers were much slower than they are today (~1% of 1%); therefore, sophisticated computer vision strategies were necessary to react in real time. The team of Dickmanns solved the problem through an innovative approach to dynamic vision. Spatiotemporal models were used right from the beginning, dubbed '4-D approach', which did not need storing previous images but nonetheless was able to yield estimates of all 3-D position and velocity components. Attention control including artificial saccadic movements of the platform carrying the cameras allowed the system to focus its attention on the most relevant details of the visual input. Kalman filters have been extended to perspective imaging and were used to achieve robust autonomous driving even in presence of noise and uncertainty. Feedback of prediction errors allowed bypassing the (ill-conditioned) inversion of perspective projection by least-squares parameter fits. When in 1986/83 the EUREKA-project 'PROgraMme for a European Traffic of Highest Efficiency and Unprecedented Safety' (PROMETHEUS) was initiated by the European car manufacturing industry (funding in the range of several hundred million Euros), the initially planned autonomous lateral guidance by buried cables was dropped and substituted by the much more flexible machine vision approach proposed by Dickmanns, and partially encouraged by his successes. Most of the major car companies participated; so did Dickmanns and his team in cooperation with the Daimler-Benz AG. Substantial progress was made in the following 7 years. In particular, Dickmanns' robot cars learned to drive in traffic under various conditions. An accompanying human driver with a "red button" made sure the robot vehicle could not get out of control and become a danger to the public. Since 1992, driving in public traffic was standard as final step in real-world testing. Several dozen Transputers, a special breed of parallel computers, were used to deal with the (by 1990s standards) enormous computational demands. Two culmination points were achieved in 1994/95, when Dickmanns´ re-engineered autonomous S-Class Mercedes-Benz performed international demonstrations. The first was the final presentation of the PROMETHEUS project in October 1994 on Autoroute 1 near the airport Charles-de-Gaulle in Paris. With guests on board, the twin vehicles of Daimler-Benz (VITA-2) and UniBwM (VaMP) drove more than 1,000 kilometres (620 mi) on the three-lane highway in standard heavy traffic at speeds up to 130 kilometres per hour (81 mph). Driving in free lanes, convoy driving with distance keeping depending on speed, and lane changes left and right with autonomous passing have been demonstrated; the latter required interpreting the road scene also in the rear hemisphere. Two cameras with different focal lengths for each hemisphere have been used in parallel for this purpose. The second culmination point was a 1,758 kilometres (1,092 mi) trip in the fall of 1995 from Munich in Bavaria to Odense in Denmark to a project meeting and back. Both longitudinal and lateral guidance were performed autonomously by vision. On highways, the robot achieved speeds exceeding 175 kilometres per hour (109 mph) (there is no general speed limit on the Autobahn). Publications from Dickmann's research group indicate a mean autonomously driven distance without resets of ~9 kilometres (5.6 mi); the longest autonomously driven stretch reached 158 kilometres (98 mi). More than half of the resets required were achieved autonomously (no human intervention). This is particularly impressive considering that the system used black-and-white video-cameras and did not model situations like road construction sites with yellow lane markings; lane-changes at over 140 kilometres per hour (87 mph), and other traffic with more than 40 kilometres per hour (25 mph) relative speed have been handled. In total, 95% autonomous driving (by distance) was achieved. In the years 1994 to 2004 the elder 5-ton van 'VaMoRs' was used to develop the capabilities needed for driving on networks of minor (also unsealed) roads and for cross-country driving including avoidance of negative obstacles like ditches. Turning off onto crossroads of unknown width and intersection angles required a big effort, but has been achieved with "Expectation-based, Multi-focal, Saccadic vision" (EMS-vision). This vertebrate-type vision uses animation capabilities based on knowledge about subject classes (including the autonomous vehicle itself) and their potential behaviour in certain situations. This rich background is used for control of gaze and attention as well as for locomotion. Beside ground vehicle guidance, also applications of the 4-D approach to dynamic vision for unmanned air vehicles (conventional aircraft and helicopters) have been investigated. Autonomous visual landing approaches and landings have been demonstrated in hardware-in-the-loop simulations with visual/inertial data fusion. Real-world autonomous visual landing approaches till shortly before touchdown have been performed in 1992 with the twin-propeller aircraft Dornier 128 of the University of Brunswick at the airport there. Another success of this machine vision technology was the first ever visually controlled grasping experiment of a free-floating object in weightlessness on board the Space Shuttle Columbia D2-mission in 1993 as part of the 'Rotex'-experiment of DLR.

Rejoyn

Rejoyn is a prescription-only digital therapeutic smartphone app approved by the US FDA for the treatment of major depressive disorder (MDD) in adults ages 22 and up. It is prescribed in conjunction with standard antidepressant medication and professional guidance and support. Rejoyn was developed by Click Therapeutics and Otsuka America Pharmaceutical Inc., and gained FDA clearance as a "medical device" on March 30th, 2024. The smartphone app helps patients with depression using exercises based on cognitive behavioral therapy (CBT) along with timed notifications to keep the patient engaged and in treatment. Randomized controlled trials showed that the Rejoyn app was more effective at relieving depression symptoms compared to a "sham app", a placebo app that required similar effort but was not intended to be helpful. Dr. John Torous, MD, MBI,[a] a psychiatrist at the Beth Israel Deaconess Medical Center in Boston, said that the app seems to pose minimal risks, and is an important step forward in unlocking the power of smartphones in treating psychiatric disorders. Some experts have signaled that the claims should be taken with caution, since the app was "tested only in a narrow subset of patients." and its benefits are "not statistically significant," according to the study’s primary outcome."

Trigram tagger

In computational linguistics, a trigram tagger is a statistical method for automatically identifying words as being nouns, verbs, adjectives, adverbs, etc. based on second order Markov models that consider triples of consecutive words. It is trained on a text corpus as a method to predict the next word, taking the product of the probabilities of unigram, bigram and trigram. In speech recognition, algorithms utilizing trigram-tagger score better than those algorithms utilizing IIMM tagger but less well than Net tagger. The description of the trigram tagger is provided by Brants (2000).