Linguistics Research Center at UT Austin

The Linguistics Research Center (LRC) at the University of Texas is a center for computational linguistics research & development. It was directed by Prof. Winfred Lehmann until his death in 2007, and subsequently by Dr. Jonathan Slocum. Since its founding, virtually all projects at the LRC have involved processing natural language texts with the aid of computers. The principal activities of the Center at present focus on Indo-European languages and comprise historical study, lexicography, and web-based teaching; staff members engage in several independent but often complementary projects in these fields using a variety of software, almost all of it developed in-house. == History == The LRC was founded by Winfred Lehmann in 1961. In the early days, research efforts at the LRC concentrated on machine translation (MT), the translation of texts from one human language to another with the aid of computers, which is now a well-developed part of the language industry. This work was funded by the USAF and other sponsors. The LRC concentrated on German-English translation, though a copy of the Russian Master Dictionary was deposited at the LRC after the ALPAC report. After a general hiatus ca. 1975-78, new funding led to the development by Jonathan Slocum and others of a new system with the same name (the METAL MT system), but with new sets of tools for linguists and vastly greater success, resulting in the delivery of a production prototype and later a full-fledged commercial MT system. MT R&D continued at the LRC, with funding by various sponsors, until well into the 1990s. From its early years to the present, the LRC has mounted a number of smaller projects resulting in the publication of significant works relating to Indo-European languages and/or their common ancestor, Proto-Indo-European. The hallmark of this work has been the use of computers to transcribe texts and prepare them for publication. 
A prominent example of the LRC using computers to prepare texts for print publication is the book by Winfred P. Lehmann, A Gothic Etymological Dictionary (Leiden: Brill, 1986). The final print-ready version was produced with the aid of a laser printer (exotic new technology, in those days) using, for the various languages included in the entries, approximately 500 special characters, many of them designed at the Center. This was the first major etymological dictionary for Indo-European languages to be produced with the aid of computers. Current LRC projects have concentrated on transcribing early Indo-European texts, developing language lessons based on them, and publishing on the web these and other materials related to the study of Indo-European languages, of their common ancestor Proto-Indo-European, and of historical linguistics more generally. == Alumni == Winfred Lehmann, Rolf A. Stachowitz, Jonathan Slocum, Winfield S. Bennett, John White

Feature engineering

Feature engineering is a preprocessing step in supervised machine learning and statistical modeling that transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One application of feature engineering is the clustering of feature-objects or sample-objects in a dataset. In particular, feature engineering based on matrix decomposition has been used extensively for data clustering under non-negativity constraints on the feature coefficients. These methods include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), and Non-Negative Tensor Decomposition/Factorization (NTF/NTD). The non-negativity constraints on the coefficients of the feature vectors mined by these algorithms yield a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of these feature engineering methods have been reported in the literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms leverage a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. 
An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering); it is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include creating features from existing data, transforming and imputing missing or invalid features, reducing data dimensionality through methods such as Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include feature templates (implementing feature templates instead of coding new features) and feature combinations (combinations that cannot be represented by a linear system). Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. 
Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL), which uses a supervised algorithm that is similar to a decision tree, and Deep Feature Synthesis, which uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It uses selection graphs as decision nodes, which are refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series. featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD is an open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM, or One-Button Machine, combines feature transformations and feature selection on relational data. OneBM helps data scientists reduce data exploration time, allowing them to try many ideas by trial and error in a short time. It also enables non-experts, who are not familiar with data science, to quickly extract value from their data with little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. 
tsflex is an open source Python library for extracting features from time series data. Despite being written entirely in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == A feature store is where features are stored and organized for the explicit purpose of being used either to train models (by data scientists) or to make predictions (by applications that have a trained model). It is a central location where one can create or update groups of features derived from multiple data sources, or create and update new datasets from those feature groups, either for training models or for use in applications that do not compute the features themselves but retrieve them when needed to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. 
In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.
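
As an illustration of the steps listed under predictive modelling (feature creation, imputation of missing values, and correlation-based selection), the following is a minimal sketch using only the Python standard library. The dataset, the column names, the helper functions `build_features` and `select_features`, and the 0.5 threshold are all hypothetical and chosen for illustration; practical work would normally rely on a dedicated library.

```python
import math
from statistics import mean

# Hypothetical raw records: (length, width, age or None, price).
# The price was constructed as 3 * length * width + 2 for this example.
raw = [
    (10.0, 5.0, 3.0, 152.0),
    (8.0, 4.0, None, 98.0),
    (12.0, 6.0, 10.0, 218.0),
    (6.0, 5.0, 7.0, 92.0),
    (9.0, 7.0, 1.0, 191.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def build_features(rows):
    """Impute missing ages with the mean and create an 'area' feature."""
    known_ages = [age for _, _, age, _ in rows if age is not None]
    age_fill = mean(known_ages)  # simple mean imputation
    features = {"length": [], "width": [], "age": [], "area": []}
    for length, width, age, _ in rows:
        features["length"].append(length)
        features["width"].append(width)
        features["age"].append(age_fill if age is None else age)
        features["area"].append(length * width)  # feature creation
    return features


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target is high."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


target = [price for _, _, _, price in raw]
features = build_features(raw)
selected = select_features(features, target)
print(sorted(selected))
```

In this constructed data the derived `area` feature correlates almost perfectly with the target, while the imputed `age` column falls below the threshold and is dropped, which mirrors the idea that created features can carry more signal than the raw columns they come from.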

Bainu (website)

Bainu ("how are you?") is a Chinese social networking website written in the Mongolian language. As of 2020 it had about 400,000 users, concentrated in Inner Mongolia. == Core features and positioning == Language and cultural characteristics: Bainu is based on the traditional Mongolian script and supports social interactions in the Mongolian language, including various message formats such as text, voice, images, and video. This design aims to preserve and promote Mongolian language and culture, and appeals particularly to users in Inner Mongolia and other Mongolian-populated areas. Social features: Instant messaging supports one-on-one private chats and group chats, and users can create interest-based groups or join local communities. Life sharing, through the "Chomorlig" feature (similar to Moments or a dynamic feed), lets users share daily highlights to enhance community interaction. Location-based socializing recommends nearby users based on location, making it easier to connect with Mongolian friends in the same city or neighboring regions. Multilingual support: The app interface is available in English, Mongolian, and Simplified Chinese. == Technical Features and User Experience == Cross-platform compatibility: The app supports iPhone, iPad, Mac (with M1 chip or later), and Apple Vision Pro devices, covering users across the Apple ecosystem. Pricing model: The app is free to download and its basic features are free. Premium services (e.g., ad-free experience, extended social functions) require a subscription, with pricing options including $0.99/month, $2.99/quarter, and $6.99/year. User feedback: In positive reviews, some users praise it as the "best Mongolian-language chat app," recognizing its cultural value and social convenience. Negative feedback includes reports of app crashes and technical issues, with some users calling for improved stability (e.g., frequent crashes in the iOS version). 
== Privacy and Data Policy == Bainu collects user data such as location, contact information, and device identifiers, which are linked to user identities. Additionally, user behavior may be tracked through third-party services, raising some privacy concerns. == Current Development and Challenges == User base: As of 2020, Bainu had approximately 400,000 users, primarily concentrated in Inner Mongolia. Policy impact: Voice of America (VOA) reported that the Chinese authorities blocked Bainu on 23 August 2020 in order to prevent Mongolians from discussing the authorities' implementation of "bilingual education" in elementary schools. As of 2025, the software is available for download and use; see https://bainu.com/

TheFWA

FWA (Favourite Website Awards) is an international award platform that honors and rewards web designers, developers and agencies around the world for excellence in web design and development. The FWA was founded in May 2000 by Rob Ford. As of November 2012, the FWA was the most visited website award program in the history of the internet, with over 170 million site visits. == Jury == The FWA jury is composed of more than 500 web professionals (200 women + 200 men) from 35 countries. == Awards granted == FWA of the Day (FOTD): every day, the FWA jury selects the best project; FWA of the Month (FOTM): every month, the FWA jury selects the best project; People's Choice Award (PCA): every year, a public vote selects the people's favourite project; FWA of the Year (FOTY): every year, the FWA jury selects the best project. == Hall Of Fame == The FWA Hall of Fame was established in May 2007 (to celebrate the seventh anniversary of the FWA), in recognition of the web's greatest individuals and companies.

Comet (programming)

Comet is a web application model in which a long-held HTTPS request allows a web server to push data to a browser, without the browser explicitly requesting it. Comet is an umbrella term, encompassing multiple techniques for achieving this interaction. All these methods rely on features included by default in browsers, such as JavaScript, rather than on non-default plugins. The Comet approach differs from the original model of the web, in which a browser requests a complete web page at a time. The use of Comet techniques in web development predates the use of the word Comet as a neologism for the collective techniques. Comet is known by several other names, including Ajax Push, Reverse Ajax, Two-way-web, HTTP Streaming, and HTTP server push. The term Comet is not an acronym, but was coined by Alex Russell in his 2006 blog post. In recent years, the standardisation and widespread support of WebSocket and Server-sent events have rendered the Comet model obsolete. == History == === Early Java applets === The ability to embed Java applets into browsers (starting with Netscape Navigator 2.0 in March 1996) made two-way sustained communications possible, using a raw TCP socket to communicate between the browser and the server. This socket can remain open as long as the browser is at the document hosting the applet. Event notifications can be sent in any format (text or binary) and decoded by the applet. === The first browser-to-browser communication framework === The very first application using browser-to-browser communications was Tango Interactive, implemented in 1996–98 at the Northeast Parallel Architectures Center (NPAC) at Syracuse University using DARPA funding. The TANGO architecture has been patented by Syracuse University. The TANGO framework has been used extensively as a distance education tool. 
The framework has been commercialized by CollabWorx and used in a dozen or so Command & Control and Training applications in the United States Department of Defense. === First Comet applications === The first set of Comet implementations dates back to 2000, with the Pushlets, Lightstreamer, and KnowNow projects. Pushlets, a framework created by Just van den Broecke, was one of the first open source implementations. Pushlets were based on server-side Java servlets and a client-side JavaScript library. Bang Networks, a Silicon Valley start-up backed by Netscape co-founder Marc Andreessen, made a lavishly financed attempt to create a real-time push standard for the entire web. In April 2001, Chip Morningstar began developing a Java-based (J2SE) web server which used two HTTP sockets to keep open two communications channels between the custom HTTP server he designed and a client designed by Douglas Crockford; a functioning demo system existed as of June 2001. The server and client used a messaging format that the founders of State Software, Inc. agreed to name JSON, following Crockford's suggestion. The entire system (the client libraries, the messaging format known as JSON, and the server) became the State Application Framework, parts of which were sold and used by Sun Microsystems, Amazon.com, EDS and Volkswagen. In March 2006, software engineer Alex Russell coined the term Comet in a post on his personal blog. The new term was a play on Ajax (Ajax and Comet both being common household cleaners in the USA). In 2006, some applications exposed those techniques to a wider audience: Meebo’s multi-protocol web-based chat application enabled users to connect to AOL, Yahoo, and Microsoft chat platforms through the browser; Google added web-based chat to Gmail; JotSpot, a startup since acquired by Google, built Comet-based real-time collaborative document editing. 
New Comet variants were created, such as the Java-based ICEfaces JSF framework (although they prefer the term "Ajax Push"). Others that had previously used Java-applet based transports switched instead to pure-JavaScript implementations. == Implementations == Comet applications attempt to eliminate the limitations of the page-by-page web model and traditional polling by offering two-way sustained interaction, using a persistent or long-lasting HTTP connection between the server and the client. Since browsers and proxies are not designed with server events in mind, several techniques to achieve this have been developed, each with different benefits and drawbacks. The biggest hurdle is the HTTP 1.1 specification, which states "this specification... encourages clients to be conservative when opening multiple connections". Therefore, holding one connection open for real-time events has a negative impact on browser usability: the browser may be blocked from sending a new request while waiting for the results of a previous request, e.g., a series of images. This can be worked around by creating a distinct hostname for real-time information, which is an alias for the same physical server. This strategy is an application of domain sharding. Specific methods of implementing Comet fall into two major categories: streaming and long polling. === Streaming === An application using streaming Comet opens a single persistent connection from the client browser to the server for all Comet events. These events are incrementally handled and interpreted on the client side every time the server sends a new event, with neither side closing the connection. Specific techniques for accomplishing streaming Comet include the following: ==== Hidden iframe ==== A basic technique for dynamic web application is to use a hidden iframe HTML element (an inline frame, which allows a website to embed one HTML document inside another). 
This invisible iframe is sent as a chunked block, which implicitly declares it as infinitely long (sometimes called "forever frame"). As events occur, the iframe is gradually filled with script tags, containing JavaScript to be executed in the browser. Because browsers render HTML pages incrementally, each script tag is executed as it is received. Some browsers require a specific minimum document size before parsing and execution is started, which can be obtained by initially sending 1–2 kB of padding spaces. One benefit of the iframes method is that it works in every common browser. Two downsides of this technique are the lack of a reliable error handling method, and the impossibility of tracking the state of the request calling process. ==== XMLHttpRequest ==== The XMLHttpRequest (XHR) object, a tool used by Ajax applications for browser–server communication, can also be pressed into service for server–browser Comet messaging by generating a custom data format for an XHR response, and parsing out each event using browser-side JavaScript; relying only on the browser firing the onreadystatechange callback each time it receives new data. === Ajax with long polling === None of the above streaming transports work across all modern browsers without negative side-effects. This forces Comet developers to implement several complex streaming transports, switching between them depending on the browser. Consequently, many Comet applications use long polling, which is easier to implement on the browser side, and works, at minimum, in every browser that supports XHR. As the name suggests, long polling requires the client to poll the server for an event (or set of events). The browser makes an Ajax-style request to the server, which is kept open until the server has new data to send to the browser, which is sent to the browser in a complete response. The browser initiates a new long polling request in order to obtain subsequent events. 
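
The long-polling cycle described above (the server holds a request open until an event is available, replies with a complete response, and the client immediately issues a new request) can be sketched with the Python standard library. This is a minimal illustration and not how any particular Comet framework is implemented: a Python client stands in for the browser's XHR, and the names `LongPollHandler`, `poll_events`, the `/events` path, and the five-second hold time are assumptions made for the example.

```python
import json
import queue
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

events = queue.Queue()  # events waiting to be delivered to the client


class LongPollHandler(BaseHTTPRequestHandler):
    """Holds each GET open until an event is available or a timeout passes."""

    def do_GET(self):
        try:
            event = events.get(timeout=5)  # block while nothing has happened
        except queue.Empty:
            event = None  # timed out; the client simply polls again
        data = json.dumps({"event": event}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)  # complete response ends this poll

    def log_message(self, *args):
        pass  # keep the example quiet


def poll_events(url, count):
    """Client loop: re-issue a request after every completed response."""
    received = []
    while len(received) < count:
        with urllib.request.urlopen(url, timeout=10) as resp:
            event = json.loads(resp.read())["event"]
        if event is not None:
            received.append(event)
    return received


def produce():
    """Simulates server-side events arriving over time."""
    for name in ("first", "second", "third"):
        time.sleep(0.2)
        events.put(name)


server = ThreadingHTTPServer(("127.0.0.1", 0), LongPollHandler)  # any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/events"

threading.Thread(target=produce, daemon=True).start()
received = poll_events(url, 3)
server.shutdown()
server.server_close()
print(received)
```

Because every response is complete, the client needs no incremental parsing, which is the main reason long polling is simpler on the browser side than the streaming transports; the cost is one full request and response per event (or per timeout).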
IETF RFC 6202 "Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP" compares long polling and HTTP streaming. Specific technologies for accomplishing long-polling include the following: ==== XMLHttpRequest long polling ==== For the most part, XMLHttpRequest long polling works like any standard use of XHR. The browser makes an asynchronous request of the server, which may wait for data to be available before responding. The response can contain encoded data (typically XML or JSON) or Javascript to be executed by the client. At the end of the processing of the response, the browser creates and sends another XHR, to await the next event. Thus the browser always keeps a request outstanding with the server, to be answered as each event occurs. ==== Script tag long polling ==== While any Comet transport can be made to work across subdomains, none of the above transports can be used across different second-level domains (SLDs), due to browser security policies designed to prevent cross-site scripting attacks. That is, if the main web page is served from one SLD, and the Comet server is located at another SLD (which does not have cross-origin resource sharing enabled), Comet events cannot be used to modify the HTML and DOM of the main page, using those transports. This problem can be sidestepped by creating a proxy server in

Interactions Corporation

Interactions LLC (also known as Interactions Corporation) is an American software company that develops voice and text-based virtual assistant applications for customer-service contact centers. Since September 2025, it has been a subsidiary of SoundHound AI. == History == Interactions was founded in 2004. In July 2011, the company announced a $12 million venture-capital funding round led by Sigma Partners. In November 2014, AT&T sold its "Watson" speech recognition platform and related patents to Interactions in exchange for equity. In May 2017, Interactions acquired the social media customer-engagement company Digital Roots; financial terms were not disclosed. On September 3, 2025, SoundHound AI completed its acquisition of Interactions Corporation, with the acquired company becoming a wholly owned subsidiary. == Products and services == Interactions' products have been described as automated voice portals and intelligent virtual assistants used for customer-service tasks. In 2011, Humana expanded the use of an Interactions voice portal for Medicare Part D enrollment.

Electronic submission

Electronic submission refers to the submission of a document by electronic means: that is, via e-mail or a web form on the Internet, or on an electronic medium such as a compact disc, a hard disk or a USB flash drive. Traditionally, the term "manuscript" referred to anything that was explicitly "written by hand". However, in popular usage and especially in the context of computers and the internet, the term "manuscript" may also refer to documents (text or otherwise) typed out or prepared on typewriters and computers, and can be extended to digital photographs, videos, and online surveys. In other words, any manuscript prepared and submitted online can be considered to be an electronic submission. == History and early usage == There is no concrete data indicating when and by whom electronic submissions were first used. However, research-based universities in several countries have been encouraging the collection of course assignments and projects in the form of electronic submissions for almost a decade now. Several governments and organizations are also switching to electronic submissions for the collection of research papers, grant applications and government application forms. == Types of electronic submissions == Because modern computers can store and process data in virtually any format, and the Internet allows easy transfer of this data, the number of scenarios in which submissions can be collected electronically has increased rapidly in the last few years. Some of these scenarios are described below. In most of these scenarios, submissions were collected on paper until the Information Technology revolution occurred. === Academic Submissions === Teachers, professors and teaching assistants often collect course assignments and projects electronically. 
Electronic submissions are usually collected using a web-based system, which more often than not also helps in the management of the submissions collected and stored on it. === Research Papers === In calls for papers for academic conferences, prospective presenters are usually asked to submit electronically a short abstract or a full paper on their presentation or research work, which is reviewed before being accepted for the conference. === Proposals for Grants === Several grant-giving organizations like the NSA, W3C, NIA, NIH etc. require grant seekers to submit a proposal which, if accepted, results in the desired grants. A majority of these proposals are now submitted electronically on systems that also help in managing and tracking the proposals submitted. === Articles for Publication === Magazines, newspapers and other publishing houses have begun accepting electronic submissions of articles from various sources, both internal (journalists and writers hired by them) and external (users and readers). The submitted articles are stored on a server hosted by the publication house or by a third-party vendor and are usually evaluated before being approved. === Contests and Competition Entries === Almost every kind of contest or competition requires participants to submit an entry in a format described by the organizers of the contest. If the contest is an Internet-based one, then the entries or nominations for the contest are collected electronically using e-mail or other electronic means depending on feasibility and the choice of the organizers. === Government Applications === The governments of several countries are turning to electronic submission of applications and forms for various government procedures. Electronic submissions allow easier management of the applications and forms submitted. 
=== Legal documents === Many legal documents may be submitted to the courts electronically. In England and Wales, the Civil Procedure Rules include a suitable "document exchange" as an acceptable "method of service". Case law in employment law cases has established that where a claim is submitted electronically, a prudent legal adviser should "check that it has been received and there must be systems in place for doing that". === Resumés and CVs === It has become commonplace for job-seekers to submit soft copies (electronic versions) of their resumés and CVs to recruiting agencies and online job portals. This is usually done over the Internet using e-mail or a pre-hosted web-based system. == Submission management systems == The art and science of collecting and managing electronic submissions is called Submission Management. Certain software vendors have begun developing submission management systems to assist in the collection, tracking and management of complex submission processes carried out electronically. Most of these systems are web based and accessible from any device with a browser and an Internet connection. However, a majority of these systems are application specific and cannot be applied to all submission management scenarios. == Resistance to electronic submissions == Despite the easier management and tracking of electronic submissions compared to their paper-based counterparts, widespread adoption and use of electronic submissions and systems for managing them has been hampered by several factors, which include but are not limited to: inconvenience when drawing figures, diagrams and equations on a computer; resistance to change and to the adoption of new technologies; and lack of, or limited access to, the Internet.