Apache Mahout

Apache Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations use the Apache Hadoop platform, however today it is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented. == Features == === Samsara === Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that allows users to use R-like syntax as opposed to traditional Scala-like syntax. This allows user to express algorithms concisely and clearly. === Backend agnostic === Apache Mahout's code abstracts the domain-specific language from the engine where the code is run. While active development is done with the Apache Spark engine, users are free to implement any engine they choose- H2O and Apache Flink have been implemented in the past and examples exist in the code base. === GPU/CPU accelerators === The JVM has notoriously slow computation. To improve speed, "native solvers" were added which move in-core, and by extension, distributed BLAS operations out of the JVM, offloading to off-heap or GPU memory for processing via multiple CPUs and/or CPU cores, or GPUs when built against the ViennaCL library. ViennaCL is a highly optimized C++ library with BLAS operations implemented in OpenMP, and OpenCL. As of release 14.1, the OpenMP build considered to be stable, leaving the OpenCL build is still in its experimental proof-of-concept phase. === Recommenders === Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated Co-Occurrence, a unique-to-Mahout recommender algorithm that extends co-occurrence to be used on multiple dimensions of data. == History == === Transition from Map Reduce to Apache Spark === While Mahout's core algorithms for clustering, classification and batch based collaborative filtering were implemented on top of Apache Hadoop using the map/reduce paradigm, it did not restrict contributions to Hadoop-based implementations. Contributions that run on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop. Starting with the release 0.10.0, the project shifted its focus to building a backend-independent programming environment, code named "Samsara". The environment consists of an algebraic backend-independent optimizer and an algebraic Scala DSL unifying in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink. Support for MapReduce algorithms started being gradually phased out in 2014. === Release history === === Developers === Apache Mahout is developed by a community. The project is managed by a group called the "Project Management Committee" (PMC). The current PMC is Andrew Musselman, Andrew Palumbo, Drew Farris, Isabel Drost-Fromm, Jake Mannix, Pat Ferrel, Paritosh Ranjan, Trevor Grant, Robin Anil, Sebastian Schelter, Stevo Slavić.

Corpus of Linguistic Acceptability

Corpus of Linguistic Acceptability (CoLA) is a dataset the primary purpose of which is to serve as a benchmark for evaluating the ability of artificial neural networks, including large language models, to judge the grammatical correctness of sentences. It consists of 10,657 English sentences from published linguistics literature that were manually labeled either as grammatical or ungrammatical. == Public version == The publicly available version of CoLA contains 9,594 sentences that belong to training and development sets. It excludes 1,063 sentences reserved for a held-out test set.

AI Paragraph Rewriters Reviews: What Actually Works in 2026

Looking for the best AI paragraph rewriter? An AI paragraph rewriter is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI paragraph rewriter slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

Hans Uszkoreit

Hans Uszkoreit is a German computational linguist. Hans Uszkoreit studied Linguistics and Computer Science at Technische Universität Berlin and the University of Texas at Austin. While he was studying in Austin, he also worked as a research associate in a large machine translation project at the Linguistics Research Center. After he received his Ph.D. in linguistics from the University of Texas, he worked as a computer scientist at the Artificial Intelligence Center and was affiliated with the Center for the Study of Language and Information at Stanford University. Nowadays, he is teaching as a professor of Computational Linguistics at Saarland University. Moreover, he serves as a Scientific Director at the German Research Center for Artificial Intelligence (DFKI) where he heads the DFKI Language Technology Lab. == Life and career == Hans Uszkoreit, a native of East Berlin, was actively involved in a group of young individuals who opposed the East Germany regime. His protesting against the 1968 invasion of Czechoslovakia led to his expulsion from high school and subsequent imprisonment for a period of fifteen months on charges of subversive agitation. Realizing that continuing his education in East Germany was not feasible, Uszkoreit made the decision to escape to West Berlin. There, he completed his high school education and pursued a degree in Linguistics and Computer Science at Technische Universität Berlin. During his time as a student, he worked part-time as an editor and writer for Zitty, a city magazine, which he co-founded. In 1977, Uszkoreit was granted a Fulbright Grant to further his studies at the University of Texas at Austin. During his time in Austin, he concurrently served as a research associate in a significant machine translation project. Subsequently, he received a second Fulbright grant, which enabled him to pursue a Ph.D. program in linguistics. In 1984, he successfully completed his doctoral studies, earning a Ph.D. in linguistics. Between 1982 and 1986, Uszkoreit held the position of a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, California. In 1988, he created the Department of Computational Linguistics and Phonetics at Saarland University. In 1989 he was elected head of the Language Technology Lab at DFKI. In 2012, Uszkoreit's achievements in the domain of relation extraction led to his receipt of a Google Faculty Research Award, acknowledging the substantial progress made by Uszkoreit and his team in advancing the field. In 2013, Uszkoreit, in collaboration with Feiyu Xu and Roberto Navigli, was granted an additional Google Research Award, which provided support for a targeted project within Google's Language Understanding Program, focusing on the augmentation of language comprehension and analysis. == Personal life == He is father of a son Jakob Uszkoreit, machine learning researcher scientist, an author of the landmark paper "Attention Is All You Need", and daughter Lena Uszkoreit. == Awards == 2002 Elected Member of the European Academy of Sciences 2012 Google Faculty Research Award 2013 Google Focused Research Award

Julie Beth Lovins

Julie Beth Lovins (October 19, 1945, in Washington, D.C. – January 26, 2018, in Mountain View, California) was a computational linguist who published The Lovins Stemming Algorithm - a type of stemming algorithm for word matching - in 1968. The Lovins Stemmer is a single pass, context sensitive stemmer, which removes endings based on the longest-match principle. The stemmer was the first to be published and was extremely well developed considering the date of its release, having been the main influence on a large amount of the future work in the area. -Adam G., et al == Background == Born on October 19, 1945, in Washington, D.C., Lovins grew up in Amherst, Massachusetts. Her father Gerald H. Lovins was an engineer and her mother, Miriam Lovins, a social services administrator. Lovins' brother Amory Lovins is the co-founder and chief environmental scientist of Rocky Mountain Institute. For her undergraduate degree, Lovins attended Pembroke College, the women's college of Brown University, which later combined into Brown University in 1971. At Pembroke College, Lovins studied mathematics and linguistics, graduating with honors. Her thesis was named, A Study of Idioms. She received the inaugural Bloch Fellowship in 1970 from the Linguistic Society of America to attend graduate school. Lovins obtained her Master of Arts in 1970 and Doctor of Philosophy in 1973 from the University of Chicago, studying linguistics. At the University of Chicago, her dissertation was titled, Loan Phonology -- Subject Matter. A revision of her thesis on loanwords and the phonological structure of Japanese was published in 1975 by the Indiana University Linguistics Club. == Teaching career == Following Lovins' PhD, she spent a year working as a linguist-at-large at a University of Tokyo language research institute and as an English conversation teacher. She then joined the faculty at Tsuda College as a professor of English and linguistics, where she taught for seven years. During her time as a faculty member at Tsuda College, Lovins also served as a guest researcher in the University of Tokyo's Research Institute of Logopedics and Phoniatrics, a research center for speech science. == Industry career == After teaching Japanese phonology at Japanese universities abroad, Lovins moved back to the U.S. to work in the computing industry. She worked on early speech synthesis at Bell Labs in Murray Hill, New Jersey. At Bell Labs, Lovins worked with Osamu Fujimura, a Japanese linguist who is credited as a pioneer in speech sciences. Lovins also worked as a software engineer at various companies in Silicon Valley and served as a consultant for computational linguistics throughout the 1990s. As a consultant, she called her business, "The Language Doctor." == The Lovins Stemming Algorithm == Lovins published an article about her work on developing a stemming algorithm through the Research Laboratory of Electronics at MIT in 1968. Lovins' stemming algorithm is frequently referred to as the Lovins stemmer. A stemming algorithm is the process of taking a word with suffixes and reducing it to its root, or base word. Stemming algorithms are used to improve the accuracy in information retrieval and in domain analysis. These algorithms help find variants of the terms being queried. Stemming algorithms bring value in their reduction of a given query into its less complex form, allowing more similar documents to be retrieved for similar queries. Stemming algorithms are prevalent in search engines, such as Google Search, which did not implement word stemming until 2003. This means that up until 2003, a Google search for the word warm would not have explicitly returned results for related words like warmth or warming. As the first published stemming algorithm, Lovins' work set a precedent and influenced future work in stemming algorithms, such as the Porter Stemmer published by Martin Porter in 1980 which has been recognized widely as the most common stemming algorithm for stemming English. Additionally, the Dawson Stemmer developed by John Dawson is an extension of the Lovins stemmer. The Lovins stemmer follows a rule-based affix elimination approach. It first removes the longest identifiable suffix from the target word - producing a base stem word - then indexes a lookup table to convert the (potentially malformed) stem word to a valid word. This process can be split into two phases. In the first phase, a word is compared with a pre-determined list of endings, and when a word is found to contain one of these endings, the ending is removed, leaving only the stem of the word. The second phase standardizes spelling exceptions that come from the first phase, ensuring that words with only marginally varying stems are appropriately paired together. For example, with the word dried, phase one results in dri, which should match with the word dry. The second phase takes care of these exceptions. Compared to other stemmers, Lovins' algorithm is fast and equipped to handle irregular plural words like person and people. Disadvantages, however, include many suffixes not being available in the table of endings. Furthermore, it is sometimes highly unreliable and frequently fails to form valid words from the stems or to match the stems of like-meaning words. This is most often caused by the usage of specialist terminology and domain-specific vocabulary by the author. == Personal life == Lovins moved to Mountain View, California, in 1979, and later to Old Mountain View in 1981 with her partner and later husband Greg Fowler, a software engineer and advocate for environmental issues & the blind. In their free time, she and her husband enjoyed taking walks and volunteering for their local community. Lovins actively volunteered for organizations like the Old Mountain View Neighborhood Association, Mountain View Friends of the Library, League of Women Voters, Mountain View Cool Cities Team, and the Mountain View Sustainability Task Force. In 2016, Lovins' husband died unexpectedly, following a heart attack. Eighteen days after her husband died, Lovins was diagnosed with brain cancer. She died on January 26, 2018, at a hospice, surrounded by friends, family and caregivers.

Backend as a service

Backend as a service (BaaS), sometimes also referred to as mobile backend as a service (MBaaS), is a service for providing web app and mobile app developers with a way to easily build a backend to their frontend applications. Features available include user management, push notifications, and integration with social networking services. These services are provided via the use of custom software development kits (SDKs) and application programming interfaces (APIs). BaaS is a relatively recent development in cloud computing, with most BaaS startups dating from 2011 or later. Some of the most popular service providers are AWS Amplify and Firebase. == Purpose == Web and mobile apps require a similar set of features on the backend, including notification service, integration with social networks, and cloud storage. Each of these services has its own API that must be individually incorporated into an app, a process that can be time-consuming and complicated for app developers. BaaS providers form a bridge between the frontend of an application and various cloud-based backends via a unified API and SDK. Providing a consistent way to manage backend data means that developers do not need to redevelop their own backend for each of the services that their apps need to access, potentially saving both time and money. Although similar to other cloud-computing business models, such as serverless computing, software as a service (SaaS), infrastructure as a service (IaaS), and platform as a service (PaaS), BaaS is distinct from these other services in that it specifically addresses the cloud-computing needs of web and mobile app developers by providing a unified means of connecting their apps to cloud services. == Features == BaaS providers offer different set of features and backend tools. Some of the most common features include: Database management. Most BaaS solutions provide SQL and/or NoSQL database management services for applications. Developers can store their app data without deploying and managing databases themselves. BaaS usually provides client SDKs, REST and GraphQL APIs for the frontend to interact with databases. File storage. BaaS providers often offer storage solutions for media files, user uploads, and other binary data. Applications can upload, download, and delete files through provided SDKs and APIs. Authentication and authorization. Some BaaS offer authentication and authorization services that allow developers to easily manage app users. This includes user sign-up, login, password reset, social media login integration through OAuth, user group and permission management etc. Notification service. Some BaaS providers such as Firebase and AWS Amplify have notification services that can send custom emails to users and push native notifications on mobile platforms. This is especially useful for applications that need to send messages, alerts, and reminders. Cloud functions. Some BaaS allow developers to deploy and run serverless functions. The functions are usually stateless and can be triggered by various ways including HTTP requests, SDK invocation, background server events, and cloud scheduled executions. Different providers offer runtime support for different languages, some of the popular languages are JavaScript/TypeScript (Node.js, Deno), Python, Java/Kotlin. Cloud functions extend the potential and flexibility of BaaS by allowing developers to write custom functionalities for their apps, working in a way similar to a traditional REST API backend framework. Usage analytics. Analytics data about application usage is often included in BaaS. This allows developers to monitor user behaviors and make decisions correspondingly in marketing strategies and performance optimizations. UI design. Some BaaS providers, such as AWS Amplify and Backendless, offer user interface designing tools that help developers design the frontend UI of web and mobile apps. While this may be useful for small teams and individual developers, UI design assistance may not be conventional in BaaS as it goes beyond the scope of backend infrastructure. Real-Time. Real-time features in a BaaS platform ensure that data updates and synchronizations occur instantly across all clients, making changes immediately visible to users. This is crucial for applications like live chat and collaborative tools, using technologies like WebSockets to maintain continuous server-client connections. == Service providers == BaaS providers have a broad focus, providing SDKs and APIs that work for app development on multiple platforms with different technology stacks, such as JavaScript (for Web apps), Flutter, Java/Kotlin (for Android apps), Swift/Objective-C (for iOS/MacOS/WatchOS/TvOS apps), .NET (for Windows) and others. BaaS providers also come in different types, suiting developers of different needs. === Cloud-based BaaS === Most BaaS providers host backend platforms on their cloud servers. They also manage the infrastructure, security, and scalability of the platforms. Developers can access the backend services via a web interface or the provided APIs. Some examples of cloud-based BaaS include Firebase (hosted on Google Cloud Platform), AWS Amplify (hosted on Amazon Web Services), and Microsoft Azure Mobile Apps (hosted on Microsoft Azure). === Self-hosted BaaS === Self-hosted BaaS allow developers to host backend on their own servers, providing more flexibility and potential to customization compared to cloud-based BaaS, which often is more difficult to migrate from. However, developers are also in charge of managing the infrastructure, security, and scalability of their servers. === Mobile BaaS === Mobile backend as a service (MBaaS) is a type of BaaS specifically for applications deployed in mobile systems. While some references use MBaaS interchangeably for BaaS, BaaS can have a wider variety of support such as for web apps and desktop apps. == Business model == BaaS providers generate revenue from their services in various ways, often using a freemium model. Under this model, a client receives a certain number of free active users or API calls per month, and pays a fee for each user or call over this limit. Alternatively, clients can pay a set fee for a package which allows for a greater number of calls or active users per month. There are also flat fee plans that make the pricing more predictable. Some of the providers offer the unlimited API calls inside their free plan offerings. Another business model that has been used by a lot of BaaS providers is PAYG (pay as you go), which has a flexible cost based on developers' usage of database, storage, bandwidth, function calls, user numbers etc.

Frederick Jelinek

Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up". Jelinek was born in Czechoslovakia before World War II and emigrated with his family to the United States in the early years of the communist regime. He studied engineering at the Massachusetts Institute of Technology and taught for 10 years at Cornell University before accepting a job at IBM Research. In 1961, he married Czech screenwriter Milena Jelinek. At IBM, his team advanced approaches to computer speech recognition and machine translation. After IBM, he went to head the Center for Language and Speech Processing at Johns Hopkins University for 17 years, where he was still working on the day he died. == Personal life == Jelinek was born on November 18, 1932, as Bedřich Jelínek in Kladno to Vilém and Trude Jelínek. His father was Jewish; his mother was born in Switzerland to Czech Catholic parents and had converted to Judaism. Jelínek senior, a dentist, had planned early to escape Nazi occupation and flee to England; he arranged for a passport, visa, and the shipping of his dentistry materials. The couple planned to send their son to an English private school. However, Vilém decided to stay at the last minute and was eventually sent to the Theresienstadt concentration camp, where he died in 1945. The family was forced to move to Prague in 1941, but Frederick, his sister and mother—thanks to the latter's background—escaped the concentration camps. After the war, Jelinek entered in the gymnasium, despite having missed several years of schooling because education of Jewish children had been forbidden since 1942. His mother, anxious that her son should get a good education, made great efforts for their emigration, especially when it became clear he would not be allowed to even attempt the graduation examination. His mother hoped her son would become a physician, but Jelinek dreamed of being a lawyer. He studied engineering in evening classes at the City College of New York and received stipends from the National Committee for a Free Europe that allowed him to study at the Massachusetts Institute of Technology. About his choice of specialty, he said: "Fortunately, to electrical engineering there belonged a discipline whose aim was not the construction of physical systems: the theory of information". He obtained his Ph.D. in 1962, with Robert Fano as his adviser. In 1957, Jelinek paid an unexpected visit to Prague. He had been in Vienna and applied for a visa, hoping to see his former acquaintances again. He met with his old friend Miloš Forman, who introduced him to film student Milena Tobolová—whose screenplay had been the basis for the movie Easy Life (Snadný život). His flight back to the U.S. had a stopover in Munich, during which he called her to propose. Tobolová was considered a dissident and the authorities were not happy with her film. Jelinek asked for help from Jerome Wiesner and Cyrus Eaton, the latter who lobbied Nikita Khrushchev. Following the inauguration of John F. Kennedy, a group of Czech dissidents were allowed to emigrate in January 1961. Thanks to the lobbying, the future Milena Jelinek was one of them. After completing his graduate studies, Jelinek, who had developed an interest in linguistics, had plans to work with Charles F. Hockett at Cornell University. However these fell through and during the next ten years he continued to study information theory. Having previously worked at IBM during a sabbatical, he began full-time work there in 1972—at first on leave for Cornell, but permanently from 1974. He remained there for over twenty years. Although at first he had been offered a regular research job, upon his arrival he learned that Josef Raviv had recently been promoted to head of the newly opened IBM Haifa Research Laboratory, and became head of the Continuous Speech Recognition group at the Thomas J. Watson Research Center. Despite his team's successes in this area, Jelinek's work remained little known in his home country because Czech scientists were not allowed to participate in key conferences. After the 1989 fall of communism, Jelinek helped establish scientific relationships, regularly visiting to lecture and helping to persuade IBM to establish a computing centre at Charles University. In 1993, he retired from IBM and went to Johns Hopkins University's Center for Language and Speech Processing, where he was director and Julian Sinclair Smith Professor of Electrical and Computer Engineering. He was still working there at the time of his death; Jelinek died of a heart attack at the close of an otherwise normal workday in mid-September 2010. He was survived by his wife, daughter and son, sister, stepsister, and three grandchildren, including Sophie Gold Jelinek. == Research and legacy == Information theory was a fashionable scientific approach in the mid '50s. However, pioneer Claude Shannon wrote in 1956 that this trendiness was dangerous. He said, "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems ... It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems." During the next decade, a combination of factors shut down the application of information theory to natural language processing (NLP) problems—in particular machine translation. One factor was the 1957 publication of Noam Chomsky's Syntactic Structures, which stated, "probabilistic models give no insight into the basic problems of syntactic structure". This accorded well with the philosophy of the artificial intelligence research of the time, which promoted rule-based approaches. The other factor was the 1966 ALPAC report, which recommended that the government should stop funding research into machine translation. ALPAC chairman John Pierce later said that the field was filled with "mad inventors or untrustworthy engineers". He said that the underlying linguistic problems must be solved before attempts at NLP could be reasonably made. These elements essentially halted research in the field. Jelinek had begun to develop an interest in linguistics after the immigration of his wife, who initially enrolled in the MIT linguistics program with the help of Roman Jakobson. Jelinek often accompanied her to Chomsky's lectures, and even discussed the possibility of changing orientation with his adviser. Fano was "really upset", and after the failure of his project with Hockett at Cornell, he did not return to this field of research until starting work at IBM. The scope of research at IBM was considerably different from that of most other teams. According to Mark Liberman, "While [Jelinek] was leading IBM's effort to solve the general dictation problem during the decade or so following 1972, most other U.S. companies and academic researchers were working on very limited problems ... or were staying out of the field entirely". Jelinek regarded speech recognition as an information theory problem—a noisy channel, in this case the acoustic signal—which some observers considered a daring approach. The concept of perplexity was introduced in their first model, New Raleigh Grammar, which was published in 1976 as the paper "Continuous Speech Recognition by Statistical Methods" in the journal Proceedings of the IEEE. According to Young, the basic noisy channel approach "reduced the speech recognition problem to one of producing two statistical models". Whereas New Raleigh Grammar was a hidden Markov model, their next model, called Tangora, was broader and involved n-grams, specifically trigrams. Even though "it was obvious to everyone that this model was hopelessly impoverished", it was not improved upon until Jelinek presented another paper in 1999. The same trigram approach was applied to phones in single words. Although the identification of parts of speech turned out not to be very useful for speech recognition, tagging methods developed during these projects are now used in various NLP applications. The incremental research techniques developed at IBM eventually became dominant in the field after DARPA, in the mid-80s, returned to NLP research and imposed that methodology to participating teams, shared common goals, data, and precise evaluation metrics. The Continuous Speech Recognition Group's research, which required large amounts of data to train the algorithms, eventually led to the creation of the Linguistic Data Consortium. In the 1980s, although the broader problem of speech recognition remained unsolved, they sought to apply the methods developed to other problems; machine translat