AI Email Message Generator

AI Email Message Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Rule-based machine translation

    Rule-based machine translation

    Rule-based machine translation (RBMT) is a classical approach of machine translation systems based on linguistic information about source and target languages. Such information is retrieved from (unilingual, bilingual or multilingual) dictionaries and grammars covering the main semantic, morphological, and syntactic regularities of each language. Having input sentences, an RBMT system generates output sentences on the basis of analysis of both the source and the target languages involved. RBMT has been progressively superseded by more efficient methods, particularly neural machine translation. == History == The first RBMT systems were developed in the early 1970s. The most important steps of this evolution were the emergence of the following RBMT systems: Systran Japanese MT systems Today, other common RBMT systems include: Apertium GramTrans == Types of RBMT == There are three different types of rule-based machine translation systems: Direct Systems (Dictionary Based Machine Translation) map input to output with basic rules. Transfer RBMT Systems (Transfer Based Machine Translation) employ morphological and syntactical analysis. Interlingual RBMT Systems (Interlingua) use an abstract meaning. RBMT systems can also be characterized as the systems opposite to Example-based Systems of Machine Translation (Example Based Machine Translation), whereas Hybrid Machine Translations Systems make use of many principles derived from RBMT. == Basic principles == The main approach of RBMT systems is based on linking the structure of the given input sentence with the structure of the demanded output sentence, necessarily preserving their unique meaning. The following example can illustrate the general frame of RBMT: A girl eats an apple. Source Language = English; Demanded Target Language = German Minimally, to get a German translation of this English sentence one needs: A dictionary that will map each English word to an appropriate German word. Rules representing regular English sentence structure. Rules representing regular German sentence structure. And finally, we need rules according to which one can relate these two structures together. Accordingly, we can state the following stages of translation: 1st: getting basic part-of-speech information of each source word: a = indef.article; girl = noun; eats = verb; an = indef.article; apple = noun 2nd: getting syntactic information about the verb "to eat": NP-eat-NP; here: eat – Present Simple, 3rd Person Singular, Active Voice 3rd: parsing the source sentence: (NP an apple) = the object of eat Often only partial parsing is sufficient to get to the syntactic structure of the source sentence and to map it onto the structure of the target sentence. 4th: translate English words into German a (category = indef.article) => ein (category = indef.article) girl (category = noun) => Mädchen (category = noun) eat (category = verb) => essen (category = verb) an (category = indef. article) => ein (category = indef.article) apple (category = noun) => Apfel (category = noun) 5th: Mapping dictionary entries into appropriate inflected forms (final generation): A girl eats an apple. => Ein Mädchen isst einen Apfel. == Ontologies == An ontology is a formal representation of knowledge that includes the concepts (such as objects, processes etc.) in a domain and some relations between them. If the stored information is of linguistic nature, one can speak of a lexicon. In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, rule-based systems can be enabled to resolve many (especially lexical) ambiguities on their own. In the following classic examples, as humans, we are able to interpret the prepositional phrase according to the context because we use our world knowledge, stored in our lexicons:I saw a man/star/molecule with a microscope/telescope/binoculars.Since the syntax does not change, a traditional rule-based machine translation system may not be able to differentiate between the meanings. With a large enough ontology as a source of knowledge however, the possible interpretations of ambiguous words in a specific context can be reduced. === Building ontologies === The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled: A large-scale ontology is necessary to help parsing in the active modules of the machine translation system. In the PANGLOSS example, about 50,000 nodes were intended to be subsumed under the smaller, manually-built upper (abstract) region of the ontology. Because of its size, it had to be created automatically. The goal was to merge the two resources LDOCE online and WordNet to combine the benefits of both: concise definitions from Longman, and semantic relations allowing for semi-automatic taxonomization to the ontology from WordNet. A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources, based on the words that the definitions of those meanings have in common in LDOCE and WordNet. Using a similarity matrix, the algorithm delivered matches between meanings including a confidence factor. This algorithm alone, however, did not match all meanings correctly on its own. A second hierarchy match algorithm was therefore created which uses the taxonomic hierarchies found in WordNet (deep hierarchies) and partially in LDOCE (flat hierarchies). This works by first matching unambiguous meanings, then limiting the search space to only the respective ancestors and descendants of those matched meanings. Thus, the algorithm matched locally unambiguous meanings (for instance, while the word seal as such is ambiguous, there is only one meaning of seal in the animal subhierarchy). Both algorithms complemented each other and helped constructing a large-scale ontology for the machine translation system. The WordNet hierarchies, coupled with the matching definitions of LDOCE, were subordinated to the ontology's upper region. As a result, the PANGLOSS MT system was able to make use of this knowledge base, mainly in its generation element. == Components == The RBMT system contains: a SL morphological analyser - analyses a source language word and provides the morphological information; a SL parser - is a syntax analyser which analyses source language sentences; a translator - used to translate a source language word into the target language; a TL morphological generator - works as a generator of appropriate target language words for the given grammatica information; a TL parser - works as a composer of suitable target language sentences; Several dictionaries - more specifically a minimum of three dictionaries: a SL dictionary - needed by the source language morphological analyser for morphological analysis, a bilingual dictionary - used by the translator to translate source language words into target language words, a TL dictionary - needed by the target language morphological generator to generate target language words. The RBMT system makes use of the following: a Source Grammar for the input language which builds syntactic constructions from input sentences; a Source Lexicon which captures all of the allowable vocabulary in the domain; Source Mapping Rules which indicate how syntactic heads and grammatical functions in the source language are mapped onto domain concepts and semantic roles in the interlingua; a Domain Model/Ontology which defines the classes of domain concepts and restricts the fillers of semantic roles for each class; Target Mapping Rules which indicate how domain concepts and semantic roles in the interlingua are mapped onto syntactic heads and grammatical functions in the target language; a Target Lexicon which contains appropriate target lexemes for each domain concept; a Target Grammar for the target language which realizes target syntactic constructions as linearized output sentences. == Advantages == No bilingual texts are required. This makes it possible to create translation systems for languages that have no texts in common, or even no digitized data whatsoever. Domain independent. Rules are usually written in a domain independent manner, so the vast majority of rules will "just work" in every domain, and only a few specific cases per domain may need rules written for them. No quality ceiling. Every error can be corrected with a targeted rule, even if the trigger case is extremely rare. This is in contrast to statistical systems where infrequent forms will be washed away by default. Total control. Because all rules are hand-written, you can easily debug a rule-based system to see exactly where a given error enters the system, and why. Reusability. Because RBMT systems are generally built from a strong source language analysis that is fed to a transfer step and target language generator, the source language analysis and targe

    Read more →
  • The Citation Project

    The Citation Project

    The Citation Project is a series of studies that measure and analyze first-year college writing students' source use and their ability to understand and implement sources within their own writing. The Citation Project reveals students' source-use habits and the issues that can be seen based on their lack of proper citation skills, such as the prevalence of plagiarism, institution policies, and the results of current writing pedagogy. The Citation Project's central findings were first presented at the Conference on College Composition and Communication in 2012. Although The Citation Project originally referred to this single 2012 study, the feedback received led to the conception of the Project as a broader initiative and as a place to gather and publish studies and data relating to student writing habits for the usage of other researches. == Method == The Citation Project's data comes from the work of 20 researchers analyzing 174 first-year composition students' research papers. The student papers studied originated from 16 institutions across the United States of America, including community colleges, public and private universities, denominational colleges, and Ivy Leagues. Researchers used bibliographic coding to aggregate data regarding the type, length, reading level, and usage of students' sources. == Findings == === Student source assessment and use === This study found that students were capable of identifying, locating, and accessing librarian-approved academic sources, most commonly accessing them with the internet. Despite students demonstrating their ability to find appropriate sources, they tend to exclusively cite the first few pages of their sources. Students' use and analysis of their citations are often limited, frequently resorting to patchwriting, directly restating their source's points, and omitting their own interpretations of their reference's ideas. The Citation Project also highlights students' struggle to accurately determine, address, and value their sources' bias, authority, and credibility. According to the Project's researchers' analysis, these habits demonstrate that first-year college writing students minimally engage with their sources and the academic conversations between them. One researcher from the Citation Project, Rebecca Moore Howard, believes these findings do not point towards students being lazy, but is rather a result of a writing pedagogy that prioritizes efficient, product-focused writing. Another interpretation offered by Sandra Jamieson, another researcher from the Citation Project explains their findings as a result of a lack of adherence to Information Learning (IL) Standards. === Pedagogy === A significant focus of The Citation Project is the development of pedagogical practices intended to equip students with writing and research techniques that will set them up for future success. Writers associated with The Citation Project, such as Tricia Serviss, believe that the practices of teachers surrounding academic integrity and writing practices are what form the foundation of how students think about writing and how to engage with assignments throughout their academic career. They also stress the importance of teaching students to effectively engage with sources rather than simply how to correctly cite them. The Citation Project asserts that endowing students with the ability to read, understand, and synthesize a variety of sources in their writing is a skill that will benefit them throughout their academic careers, and that the surface level typographical focus that many writing programs utilize is inadequate. == Plagiarism == One of the areas that The Citation Project also looks at is how students commit plagiarism throughout their writing. Plagiarism tends to be a checkpoint that gives instructors a sense where students' citation skills stand. Findings from The Citation Project reveal that the most common type of plagiarism is patchwriting which is the act of using the same sentence with only changing a couple of words. These types of issues can be seen as a learning curve due to lack of proper training. Student's that commit plagiarism are often unaware. === Policies === Another issue found is that academic plagiarism policies may not benefit a student's growth but may instead obstruct it. Policies against plagiarism tend to be harsh on the student that committed of offense. Even though student plagiarism is often unintentional academic institutions see this behavior as intentional. Student may then face harsh consequences as a result from their lack of citation knowledge. Additionally, higher level institutions assume that new students already have the skill set to avoid plagiarism which may be an unrealistic expectation. == Legacy == === Inspired studies === ==== Parrott and Napier ==== In one study, "Critical Reading and Student Self-Selected Texts: Results of a Collaborative, Explicit Curricular Approach," Jill Parrot and Trenia Napier quoted the Citation Project's findings as evidence that current collegiate writing curriculums are an ineffective means of teaching students how to properly write academic research papers. The researchers accredited current writing pedagogy's lack of emphasis on teaching critical reading skills. Parrott and Napier tested their thesis by seeing if students would produce more academic writing if they partook in a writing course that taught critical reading. Their results mostly went against this hypothesis, finding students who received additional critical reading training only significantly improved in how they integrated their sources. ==== Kocatepe ==== In May Mehtap Kocatep's study, "Reconceptualising the notion of finding information: How undergraduate students construct information as they read-to-write in an academic writing class," Kocatep expresses that she believes current conversations around writing pedagogy, including the Citation Project, operate with the underlying misconception that information is an easily discoverable static entity and its retrieval is an objective, unbiased decision. Kocatepe instead offers the analysis of what students view as valuable information and if it is worth using is influenced by the socially constructed meanings available to writers at the moment. To further examine students' source engagement, Kocatepe did a study on how female university students from the United Arab Emirates find, retrieve, use, and value sources. Kocatepe's results mainly noted students' almost exclusive reliance on using Google to find sources, as well as how students' navigated mainly English-speaking academic conversations as non-native English speakers. ==== Dahlen, Nordstrom-Sanchez, and Graff ==== Dahlen, Nordstrom-Sanchez, and Graff built their study off The Citation Project research in order to explore the attitudes and practices of students in an undergraduate writing course. As the researchers acknowledge, data collected by the Citation Project was the subject of the bulk of their analysis. This study sought to examine undergraduate writing practices tied to source-usage and elucidate any relevant trends. Dahlen, Nordstrom-Sanchez and Graff found that undergraduate writing students were not engaging with outside sources properly. Key issues discussed include lack of engagement with broad source ideas (in favor of picking out quotes), lack of paraphrasing, and inability to link information between multiple sources. ==== Davis ==== Phillip M. Davis based much of the analysis in his study on data gathered by the Citation Project. This study aimed to examine the particular effects web-based research and study had on undergraduate's papers and the replicability of their bibliographies. Davis sought to see how the shift from physical in-person library based research to online, often at-home research changed the function and usability of the bibliography as a form of documenting source usage in a given work. The primary method of analysis involved examining students' bibliographies to see where they were finding information online and how these sources were accessed. A main issue Davis found was "persistency" of URLs used for online citations. He found that only 18% of URL-based citations continued to function (the others either no longer pointing to the correct document or ceasing to exist altogether) within 3 years of their usage by students, and more than half of claimed online citations could not be found in any form. He suggests that this result brings up questions about how web-based citations should be dealt with in a university context.

    Read more →
  • Document capture software

    Document capture software

    Document capture software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process. == Typical features == Typical features of Document Capture Software include: Barcode recognition Patch Code recognition Separation Optical Character Recognition (OCR) Optical Mark Recognition (OMR) Quality Assurance Indexing Migration === Goal for implementation of a document capture solution === The goal for implementing a document capture solution is to reduce the amount of time spent scanning, separating, enhancing, organizing, classifying, normalizing, and collecting information from document collections, and to produce metadata along with an image/PDF file, and/or OCR text. This information is then migrated to a file share, FTP site, database, Document Management or Enterprise Content Management system. These systems often provide a search function, allowing search of the assets based on the produced metadata, and then viewed using document imaging software. == General document capture system solutions == === Integration with document management system === ECM (Enterprise Content management) and their DMS component (Document Management System) are being adopted by many organizations as a corporate document management system for all types of electronic files, e.g. MS word, PDF ... However, much of the information held by organisations is on paper and this needs to be integrated within the same document repository. By converting paper documents into digital format through scanning, organizations convert paper into image formats such as TIF, JPG, and PDF, and also extract valuable index information or business data from the document using OCR technology. Digital documents and associated metadata can easily be stored in the ECM in a variety of formats. The most popular of these formats is PDF which not only provides an accurate representation of the document but also allows all the OCR text in the document to be stored behind the PDF image. This format is known as PDF with hidden text or text-searchable PDF. This allows users to search for documents by using keywords in the metadata fields or by searching the content of PDF files across the repository. ==== Advantages of scanning documents into a ECM/DMS ==== Information held on paper is usually just as valuable to organisations as the electronic documents that are generated internally. Often this information represents a large proportion of the day to day correspondence with suppliers and customers. Having the ability to manage and share this information internally through a document management system such as SharePoint or a CMIS-compatible repository improves collaboration between departments or employees and also eliminates the risk of losing this information through disasters such as floods or fire. Organisations adopting an ECM/DMS often implement electronic workflow which allows the information held on paper to be included as part of an electronic business process and incorporated into a customer record file along with other associated office documents and emails. For business critical documents, such as purchase orders and supplier invoices, digitising documents helps speed up business transactions as well as reduce manual effort involved in keying data into business systems, such as CRM, ERP and Accounting. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow. == Electronic document capture == In the earlier implementations of Document Capture Software, the technology focused solely on the digitization and capture of information from paper documents. Document images were acquired from document scanners via TWAIN/ISIS drivers. Only image-based file formats like TIF, JPG, and BMP were typically compatible with these solutions. But in recent years, as the volume of electronically-created documents and the number of proprietary file formats continues to increase at exponential rates, the need for handling documents existing in electronic formats has grown. The relevant document capture products have adapted to function with non-image file formats with the end-goal of creating a unified processing workflow capable of handling all incoming documents The ability to import files from a variety of sources is one example of such adaptation. Importing documents from ECM/DMS software solutions, email servers, FTP, and EDI is now as much of a requirement of document capture software as is paper capture. The normalization of output files to text-based PDF format is now another critical factor in long-term archival of proprietary electronic file formats. Normalization expands access and usage of files to users throughout the enterprise, rather than only those that created the original electronic file.

    Read more →
  • Randomized rounding

    Randomized rounding

    In computer science and operations research, randomized rounding is a widely used approach for designing and analyzing approximation algorithms. Many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). For such problems, randomized rounding can be used to design fast (polynomial time) approximation algorithms—that is, algorithms that are guaranteed to return an approximately optimal solution given any input. The basic idea of randomized rounding is to convert an optimal solution of a relaxation of the problem into an approximately-optimal solution to the original problem. The resulting algorithm is usually analyzed using the probabilistic method. == Overview == The basic approach has three steps: Formulate the problem to be solved as an integer linear program (ILP). Compute an optimal fractional solution x {\displaystyle x} to the linear programming relaxation (LP) of the ILP. Round the fractional solution x {\displaystyle x} of the LP to an integer solution x ′ {\displaystyle x'} of the ILP. (Although the approach is most commonly applied with linear programs, other kinds of relaxations are sometimes used. For example, see Goemans' and Williamson's semidefinite programming-based Max-Cut approximation algorithm.) In the first step, the challenge is to choose a suitable integer linear program. Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well, such as in the Set Cover example below. (The integer linear program should have a small integrality gap; indeed randomized rounding is often used to prove bounds on integrality gaps.) In the second step, the optimal fractional solution can typically be computed in polynomial time using any standard linear programming algorithm. In the third step, the fractional solution must be converted into an integer solution (and thus a solution to the original problem). This is called rounding the fractional solution. The resulting integer solution should (provably) have cost not much larger than the cost of the fractional solution. This will ensure that the cost of the integer solution is not much larger than the cost of the optimal integer solution. The main technique used to do the third step (rounding) is to use randomization, and then to use probabilistic arguments to bound the increase in cost due to the rounding (following the probabilistic method from combinatorics). Therein, probabilistic arguments are used to show the existence of discrete structures with desired properties. In this context, one uses such arguments to show the following: Given any fractional solution x {\displaystyle x} of the LP, with positive probability the randomized rounding process produces an integer solution x ′ {\displaystyle x'} that approximates x {\displaystyle x} according to some desired criterion. Finally, to make the third step computationally efficient, one either shows that x ′ {\displaystyle x'} approximates x {\displaystyle x} with high probability (so that the step can remain randomized) or one derandomizes the rounding step, typically using the method of conditional probabilities. The latter method converts the randomized rounding process into an efficient deterministic process that is guaranteed to reach a good outcome. == Example: the set cover problem == The following example illustrates how randomized rounding can be used to design an approximation algorithm for the set cover problem. Fix any instance ⟨ c , S ⟩ {\displaystyle \langle c,{\mathcal {S}}\rangle } of set cover over a universe U {\displaystyle {\mathcal {U}}} . === Computing the fractional solution === For step 1, let IP be the standard integer linear program for set cover for this instance. For step 2, let LP be the linear programming relaxation of IP, and compute an optimal solution x ∗ {\displaystyle x^{}} to LP using any standard linear programming algorithm. This takes time polynomial in the input size. The feasible solutions to LP are the vectors x {\displaystyle x} that assign each set s ∈ S {\displaystyle s\in {\mathcal {S}}} a non-negative weight x s {\displaystyle x_{s}} , such that, for each element e ∈ U {\displaystyle e\in {\mathcal {U}}} , x ′ {\displaystyle x'} covers e {\displaystyle e} —the total weight assigned to the sets containing e {\displaystyle e} is at least 1, that is, ∑ s ∋ e x s ≥ 1. {\displaystyle \sum _{s\ni e}x_{s}\geq 1.} The optimal solution x ∗ {\displaystyle x^{}} is a feasible solution whose cost ∑ s ∈ S c ( S ) x s ∗ {\displaystyle \sum _{s\in {\mathcal {S}}}c(S)x_{s}^{}} is as small as possible. Note that any set cover C {\displaystyle {\mathcal {C}}} for S {\displaystyle {\mathcal {S}}} gives a feasible solution x {\displaystyle x} (where x s = 1 {\displaystyle x_{s}=1} for s ∈ C {\displaystyle s\in {\mathcal {C}}} , x s = 0 {\displaystyle x_{s}=0} otherwise). The cost of this C {\displaystyle {\mathcal {C}}} equals the cost of x {\displaystyle x} , that is, ∑ s ∈ C c ( s ) = ∑ s ∈ S c ( s ) x s . {\displaystyle \sum _{s\in {\mathcal {C}}}c(s)=\sum _{s\in {\mathcal {S}}}c(s)x_{s}.} In other words, the linear program LP is a relaxation of the given set-cover problem. Since x ∗ {\displaystyle x^{}} has minimum cost among feasible solutions to the LP, the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover. === Randomized rounding step === In step 3, we must convert the minimum-cost fractional set cover x ∗ {\displaystyle x^{}} into a feasible integer solution x ′ {\displaystyle x'} (corresponding to a true set cover). The rounding step should produce an x ′ {\displaystyle x'} that, with positive probability, has cost within a small factor of the cost of x ∗ {\displaystyle x^{}} .Then (since the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover), the cost of x ′ {\displaystyle x'} will be within a small factor of the optimal cost. As a starting point, consider the most natural rounding scheme: For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( 1 , x s ∗ ) {\displaystyle \min(1,x_{s}^{})} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . With this rounding scheme, the expected cost of the chosen sets is at most ∑ s c ( s ) x s ∗ {\displaystyle \sum _{s}c(s)x_{s}^{}} , the cost of the fractional cover. This is good. Unfortunately the coverage is not good. When the variables x s ∗ {\displaystyle x_{s}^{}} are small, the probability that an element e {\displaystyle e} is not covered is about ∏ s ∋ e 1 − x s ∗ ≈ ∏ s ∋ e exp ⁡ ( − x s ∗ ) = exp ⁡ ( − ∑ s ∋ e x s ∗ ) ≈ exp ⁡ ( − 1 ) . {\displaystyle \prod _{s\ni e}1-x_{s}^{}\approx \prod _{s\ni e}\exp(-x_{s}^{})=\exp {\Big (}-\sum _{s\ni e}x_{s}^{}{\Big )}\approx \exp(-1).} So only a constant fraction of the elements will be covered in expectation. To make x ′ {\displaystyle x'} cover every element with high probability, the standard rounding scheme first scales up the rounding probabilities by an appropriate factor λ > 1 {\displaystyle \lambda >1} . Here is the standard rounding scheme: Fix a parameter λ ≥ 1 {\displaystyle \lambda \geq 1} . For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( λ x s ∗ , 1 ) {\displaystyle \min(\lambda x_{s}^{},1)} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . Scaling the probabilities up by λ {\displaystyle \lambda } increases the expected cost by λ {\displaystyle \lambda } , but makes coverage of all elements likely. The idea is to choose λ {\displaystyle \lambda } as small as possible so that all elements are provably covered with non-zero probability. Here is a detailed analysis. ==== Lemma (approximation guarantee for rounding scheme) ==== Fix λ = ln ⁡ ( 2 | U | ) {\displaystyle \lambda =\ln(2|{\mathcal {U}}|)} . With positive probability, the rounding scheme returns a set cover x ′ {\displaystyle x'} of cost at most 2 ln ⁡ ( 2 | U | ) c ⋅ x ∗ {\displaystyle 2\ln(2|{\mathcal {U}}|)c\cdot x^{}} (and thus of cost O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} times the cost of the optimal set cover). (Note: with care the O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} can be reduced to ln ⁡ ( | U | ) + O ( log ⁡ log ⁡ | U | ) {\displaystyle \ln(|{\mathcal {U}}|)+O(\log \log |{\mathcal {U}}|)} .) ==== Proof ==== The output x ′ {\displaystyle x'} of the random rounding scheme has the desired properties as long as none of the following "bad" events occur: the cost c ⋅ x ′ {\displaystyle c\cdot x'} of x ′ {\displaystyle x'} exceeds 2 λ c ⋅ x ∗ {\displaystyle 2\lambda c\cdot x^{}} , or for some element e {\displaystyle e} , x ′ {\displaystyle x'} fails to cover e {\displaystyle e} . The expectation of each x s ′ {\displaystyle x'_{s}} is at most λ x s ∗ {\displaystyle \lambda x_{s

    Read more →
  • Tea (app)

    Tea (app)

    Tea, officially Tea Dating Advice, is a dating surveillance mobile phone application that allows women to post personal data about men they are interested in or are currently dating. Founded by Sean Cook, the app rose to prominence in July 2025 after it was the subject of three major data leaks in July and August 2025. It was removed from Apple's App Store in October 2025, but remains available on the Google Play Store. == History == The app enables its users to upload, view, and comment on photos of men, check men's public records, and perform image searches. It also provides the ability to rate and review men, as well as a group chat function. The app uses artificial intelligence to verify that the user is a woman through facial analysis and other personal information to preserve the app as a women-only space. Users are required to submit their photo and an ID to access the app. The company that created the app was founded by businessman and tech capitalist Sean Cook, who stated in July 2025 that he was inspired to create the app because of his mother's experiences from online dating. According to the company, users remain anonymous, and the requirement to upload an ID was removed in 2023. An August 2025 investigation by 404 Media suggested that much of the information given by Cook on the historical background of the company was inaccurate. In July 2025, private messages, other personally identifying information, and approximately 72,000 images were leaked via 4chan. A further 1.1 million private messages were subsequently leaked using a separate security vulnerability; these included intimate conversations about controversial topics such as adultery and other forms of infidelity to their partners, discussions of abortion, phone numbers, meeting locations, and other confidential communications. The app's publishers subsequently revoked the ability to private message users in the app. Shortly after, the app was hidden from search on Android and an interactive, unverified map was also created of those in the files. By 7 August 2025, ten class action lawsuits had been filed. A further leak was reported later that month. Proponents have praised the app as an aid for women's safety by helping them check men for adultery, catfishing, criminal convictions and other "red flag" behaviors. Critics have described the app as a doxing tool and a violation of privacy, an opportunity for defamation against innocent individuals, and a witch hunt. Cook has stated that the company's legal team receives about three legal threats per day. Another mobile app, called TeaOnHer, was created in response of the app’s popularity. It was described as the male version of the Tea app. The app also reported a data breach in August 2025. In October 2025, Apple removed the app from their app store, telling journalists that the removal was due to a failure to meet company terms regarding content moderation and user privacy. Apple also mentioned an excessive amount of complaints, including allegations that the personal information of minors was being shared. The app remains on the Google Play Store.

    Read more →
  • Timeline of algorithms

    Timeline of algorithms

    The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception. == Antiquity == Before – writing about "recipes" (on cooking, rituals, agriculture and other themes) c. 1700–2000 BC – Egyptians develop earliest known algorithms for multiplying two numbers c. 1600 BC – Babylonians develop earliest known algorithms for factorization and finding square roots c. 300 BC – Euclid's algorithm c. 200 BC – the Sieve of Eratosthenes 263 AD – Gaussian elimination described by Liu Hui == Medieval Period == 628 – Chakravala method described by Brahmagupta c. 820 – Al-Khawarizmi described algorithms for solving linear equations and quadratic equations in his Algebra; the word algorithm comes from his name 825 – Al-Khawarizmi described the algorism, algorithms for using the Hindu–Arabic numeral system, in his treatise On the Calculation with Hindu Numerals, which was translated into Latin as Algoritmi de numero Indorum, where "Algoritmi", the translator's rendition of the author's name gave rise to the word algorithm (Latin algorithmus) with a meaning "calculation method" c. 850 – cryptanalysis and frequency analysis algorithms developed by Al-Kindi (Alkindus) in A Manuscript on Deciphering Cryptographic Messages, which contains algorithms on breaking encryptions and ciphers c. 1025 – Ibn al-Haytham (Alhazen), was the first mathematician to derive the formula for the sum of the fourth powers, and in turn, he develops an algorithm for determining the general formula for the sum of any integral powers c. 1400 – Ahmad al-Qalqashandi gives a list of ciphers in his Subh al-a'sha which include both substitution and transposition, and for the first time, a cipher with multiple substitutions for each plaintext letter; he also gives an exposition on and worked example of cryptanalysis, including the use of tables of letter frequencies and sets of letters which can not occur together in one word == Before 1940 == 1540 – Lodovico Ferrari discovered a method to find the roots of a quartic polynomial 1545 – Gerolamo Cardano published Cardano's method for finding the roots of a cubic polynomial 1614 – John Napier develops method for performing calculations using logarithms 1671 – Newton–Raphson method developed by Isaac Newton 1690 – Newton–Raphson method independently developed by Joseph Raphson 1706 – John Machin develops a quickly converging inverse-tangent series for π and computes π to 100 decimal places 1768 – Leonhard Euler publishes his method for numerical integration of ordinary differential equations in problem 85 of Institutiones calculi integralis 1789 – Jurij Vega improves Machin's formula and computes π to 140 decimal places, 1805 – FFT-like algorithm known by Carl Friedrich Gauss 1842 – Ada Lovelace writes the first algorithm for a computing engine 1903 – A fast Fourier transform algorithm presented by Carle David Tolmé Runge 1918 - Soundex 1926 – Borůvka's algorithm 1926 – Primary decomposition algorithm presented by Grete Hermann 1927 – Hartree–Fock method developed for simulating a quantum many-body system in a stationary state. 1934 – Delaunay triangulation developed by Boris Delaunay 1936 – Turing machine, an abstract machine developed by Alan Turing, with others developed the modern notion of algorithm. == 1940s == 1942 – A fast Fourier transform algorithm developed by G.C. Danielson and Cornelius Lanczos 1945 – Merge sort developed by John von Neumann 1947 – Simplex algorithm developed by George Dantzig == 1950s == 1950 – Hamming codes developed by Richard Hamming 1952 – Huffman coding developed by David A. Huffman 1953 – Simulated annealing introduced by Nicholas Metropolis 1954 – Radix sort computer algorithm developed by Harold H. Seward 1964 – Box–Muller transform for fast generation of normally distributed numbers published by George Edward Pelham Box and Mervin Edgar Muller. Independently pre-discovered by Raymond E. A. C. Paley and Norbert Wiener in 1934. 1956 – Kruskal's algorithm developed by Joseph Kruskal 1956 – Ford–Fulkerson algorithm developed and published by R. Ford Jr. and D. R. Fulkerson 1957 – Prim's algorithm developed by Robert Prim 1957 – Bellman–Ford algorithm developed by Richard E. Bellman and L. R. Ford, Jr. 1959 – Dijkstra's algorithm developed by Edsger Dijkstra 1959 – Shell sort developed by Donald L. Shell 1959 – De Casteljau's algorithm developed by Paul de Casteljau 1959 – QR factorization algorithm developed independently by John G.F. Francis and Vera Kublanovskaya 1959 – Rabin–Scott powerset construction for converting NFA into DFA published by Michael O. Rabin and Dana Scott == 1960s == 1960 – Karatsuba multiplication 1961 – CRC (Cyclic redundancy check) invented by W. Wesley Peterson 1962 – AVL trees 1962 – Quicksort developed by C. A. R. Hoare 1962 – Bresenham's line algorithm developed by Jack E. Bresenham 1962 – Gale–Shapley 'stable-marriage' algorithm developed by David Gale and Lloyd Shapley 1964 – Heapsort developed by J. W. J. Williams 1964 – multigrid methods first proposed by R. P. Fedorenko 1965 – Cooley–Tukey algorithm rediscovered by James Cooley and John Tukey 1965 – Levenshtein distance developed by Vladimir Levenshtein 1965 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Tadao Kasami 1965 – Buchberger's algorithm for computing Gröbner bases developed by Bruno Buchberger 1965 – LR parsers invented by Donald Knuth 1966 – Dantzig algorithm for shortest path in a graph with negative edges 1967 – Viterbi algorithm proposed by Andrew Viterbi 1967 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Daniel H. Younger 1968 – A graph search algorithm described by Peter Hart, Nils Nilsson, and Bertram Raphael 1968 – Risch algorithm for indefinite integration developed by Robert Henry Risch 1969 – Strassen algorithm for matrix multiplication developed by Volker Strassen == 1970s == 1970 – Dinic's algorithm for computing maximum flow in a flow network by Yefim (Chaim) A. Dinitz 1970 – Knuth–Bendix completion algorithm developed by Donald Knuth and Peter B. Bendix 1970 – BFGS method of the quasi-Newton class 1970 – Needleman–Wunsch algorithm published by Saul B. Needleman and Christian D. Wunsch 1972 – Edmonds–Karp algorithm published by Jack Edmonds and Richard Karp, essentially identical to Dinic's algorithm from 1970 1972 – Graham scan developed by Ronald Graham 1972 – Red–black trees and B-trees discovered 1973 – RSA encryption algorithm discovered by Clifford Cocks 1973 – Jarvis march algorithm developed by R. A. Jarvis 1973 – Hopcroft–Karp algorithm developed by John Hopcroft and Richard Karp 1974 – Pollard's p − 1 algorithm developed by John Pollard 1974 – Quadtree developed by Raphael Finkel and J.L. Bentley 1975 – Genetic algorithms popularized by John Holland 1975 – Pollard's rho algorithm developed by John Pollard 1975 – Aho–Corasick string matching algorithm developed by Alfred V. Aho and Margaret J. Corasick 1975 – Cylindrical algebraic decomposition developed by George E. Collins 1976 – Salamin–Brent algorithm independently discovered by Eugene Salamin and Richard Brent 1976 – Knuth–Morris–Pratt algorithm developed by Donald Knuth and Vaughan Pratt and independently by J. H. Morris 1977 – Boyer–Moore string-search algorithm for searching the occurrence of a string into another string. 1977 – RSA encryption algorithm rediscovered by Ron Rivest, Adi Shamir, and Len Adleman 1977 – LZ77 algorithm developed by Abraham Lempel and Jacob Ziv 1977 – multigrid methods developed independently by Achi Brandt and Wolfgang Hackbusch 1978 – LZ78 algorithm developed from LZ77 by Abraham Lempel and Jacob Ziv 1978 – Bruun's algorithm proposed for powers of two by Georg Bruun 1979 – Khachiyan's ellipsoid method developed by Leonid Khachiyan 1979 – ID3 decision tree algorithm developed by Ross Quinlan == 1980s == 1980 – Brent's Algorithm for cycle detection Richard P. Brendt 1981 – Quadratic sieve developed by Carl Pomerance 1981 – Smith–Waterman algorithm developed by Temple F. Smith and Michael S. Waterman 1983 – Simulated annealing developed by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi 1983 – Classification and regression tree (CART) algorithm developed by Leo Breiman, et al. 1984 – LZW algorithm developed from LZ78 by Terry Welch 1984 – Karmarkar's interior-point algorithm developed by Narendra Karmarkar 1984 – ACORN PRNG discovered by Roy Wikramaratna and used privately 1985 – Simulated annealing independently developed by V. Cerny 1985 – Car–Parrinello molecular dynamics developed by Roberto Car and Michele Parrinello 1985 – Splay trees discovered by Sleator and Tarjan 1986 – Blum Blum Shub proposed by L. Blum, M. Blum, and M. Shub 1986 – Push relabel maximum flow algorithm by Andrew Goldberg and Robert Tarjan 1986 – Barnes–Hut tree method developed by Josh Barnes and Piet Hut for fast approximate simulation of n-body problems 1987 – Fast multipole method developed by Leslie Greengard and Vladimir

    Read more →
  • Applications of artificial intelligence

    Applications of artificial intelligence

    Artificial intelligence is the capability of computational systems to perform tasks that are typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. Artificial intelligence has been used in applications throughout industry and academia. Within the field of Artificial Intelligence, there are multiple subfields. The subfield of machine learning has been used for various scientific and commercial purposes, including language translation, image recognition, decision-making, credit scoring, and e-commerce. In recent years, massive advancements have been made in the field of generative artificial intelligence, which uses generative models to generate text, images, videos, and other forms of data. This article describes applications of AI in different sectors. == Agriculture == In agriculture, AI has been proposed as a way for farmers to identify areas that need irrigation, fertilization, or pesticide treatments to increase yields, thereby improving efficiency. AI has been used to attempt to classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and optimize irrigation. == AI-assisted software develoment == == Architecture and design == == Business == A 2023 study found that generative AI increased productivity by 15% in contact centers. Another 2023 study found it increased productivity by up to 40% in writing tasks. An August 2025 review by MIT found that of surveyed companies, 95% did not report any improvement in revenue from the use of AI. A September 2025 article by the Harvard Business Review describes how increased use of AI does not automatically lead to increases in revenue or actual productivity. Referring to "AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task" the article coins the term workslop. Per studies done in collaboration with the Stanford Social Media Lab, workslop does not improve productivity and undermines trust and collaboration among colleagues. In telehealth, agentic AI is reportedly facilitating the creation of large business models (millions in annual profit) with 1-2 employees, such as MEDVi, which as of August 2025 only had 2 employees and ~$75M in annual profit for GLP-1 weight-loss telehealth services. == Chatbots == == Computer science == === Programming assistance === ==== AI-assisted software development ==== AI can be used for real-time code completion, chat, and automated test generation. These tools are typically integrated with editors and IDEs as plugins. AI-assisted software development systems differ in functionality, quality, speed, and approach to privacy. Creating software primarily via AI is known as "vibe coding". Code created or suggested by AI can be incorrect or inefficient. The use of AI-assisted coding can potentially speed-up software development, but can also slow-down the process by creating more work when debugging and testing. The rush to prematurely adopt AI technology can also incur additional technical debt. AI also requires additional consideration and careful review for cybersecurity, since AI coding software is trained on a wide range of code of inconsistent quality and often replicates poor practices. ==== Neural network design ==== AI can be used to create other AIs. For example, around November 2017, Google's AutoML project to evolve new neural net topologies created NASNet, a system optimized for ImageNet and POCO F1. NASNet's performance exceeded all previously published performance on ImageNet. ==== Quantum computing ==== Research and development of quantum computers has been performed with machine learning algorithms. For example, there is a prototype, photonic, quantum memristive device for neuromorphic computers (NC)/artificial neural networks and NC-using quantum materials with some variety of potential neuromorphic computing-related applications. The use of quantum machine learning for quantum simulators has been proposed for solving physics and chemistry problems. === Historical contributions === AI researchers have created many tools to solve the most difficult problems in computer science. Many of their inventions have been adopted by mainstream computer science and are no longer considered AI. All of the following were originally developed in AI laboratories: Time sharing Interactive interpreters Graphical user interfaces and the computer mouse Rapid application development environments The linked list data structure Automatic storage management Symbolic programming Functional programming Dynamic programming Object-oriented programming Optical character recognition Constraint satisfaction == Customer service == === Human resources === AI programs have been used in hiring processes to screen resumes and rank candidates based on their qualifications, predict a candidate's likelihood of success in a given role, and automate repetitive communication tasks using chatbots. Studies on these programs have identified tendencies for gender bias, favoring male names and male-coded characteristics, as well as bias against disabled candidates and racial minorities. === Online and telephone customer service === AI underlies avatars (automated online assistants) on web pages. It can reduce operation and training costs. Pypestream automated customer service for its mobile application to streamline communication with customers. A Google app analyzes language and converts speech into text. The platform can identify angry customers through their language and respond appropriately. Amazon uses a chatbot for customer service that can perform tasks like checking the status of an order, cancelling orders, offering refunds and connecting the customer with a human representative. Generative AI (GenAI), such as ChatGPT, is increasingly used in business to automate tasks and enhance decision-making. === Hospitality === In the hospitality industry, AI is used to reduce repetitive tasks, analyze trends, interact with guests, and predict customer needs. AI hotel services come in the form of a chatbot, application, virtual voice assistant and service robots. == Education == In educational institutions, AI has been used to automate routine tasks such as attendance tracking, grading, and marking. AI tools have also been used to monitor student progress and analyze learning behaviors, with the goal of facilitating timely interventions for students facing academic challenges. == Energy and environment == === Energy system === The U.S. Department of Energy wrote in an April 2024 report that AI may have applications in modeling power grids, reviewing federal permits with large language models, predicting levels of renewable energy production, and improving the planning process for electrical vehicle charging networks. Other studies have suggested that machine learning can be used for energy consumption prediction and scheduling, e.g. to help with renewable energy intermittency management (see also: smart grid and climate change mitigation in the power grid). === Environmental monitoring === Autonomous ships that monitor the ocean, AI-driven satellite data analysis, passive acoustics or remote sensing and other applications of environmental monitoring make use of machine learning. For example, "Global Plastic Watch" is an AI-based satellite monitoring-platform for analysis/tracking of plastic waste sites to help prevention of plastic pollution – primarily ocean pollution – by helping identify who and where mismanages plastic waste, dumping it into oceans. === Early-warning systems === Machine learning can be used to spot early-warning signs of disasters and environmental issues, possibly including natural pandemics, earthquakes, landslides, heavy rainfall, long-term water supply vulnerability, tipping-points of ecosystem collapse, cyanobacterial bloom outbreaks, and droughts. === Economic and social challenges === The University of Southern California launched the Center for Artificial Intelligence in Society, with the goal of using AI to address problems such as homelessness. Stanford researchers use AI to analyze satellite images to identify high poverty areas. == Entertainment and media == === Media === AI applications analyze media content such as movies, TV programs, advertisement videos or user-generated content. The solutions often involve computer vision. Typical scenarios include the analysis of images using object recognition or face recognition techniques, or the analysis of video for scene recognizing scenes, objects or faces. AI-based media analysis can facilitate media search, the creation of descriptive keywords for content, content policy monitoring (such as verifying the suitability of content for a particular TV viewing time), speech to text for archival or other purposes, and the detection of logos, products or celebrity faces for ad placement. Motion interpolation Pixel-art scaling algorithms Image scaling Imag

    Read more →
  • Conceptions of Library and Information Science

    Conceptions of Library and Information Science

    Conceptions of Library and Information Science (CoLIS) is a series of conferences about historical, empirical and theoretical perspectives in Library and Information Science. == CoLIS conferences == CoLIS 1 1991 in Tampere, Finland CoLIS 2 1996 in Copenhagen, Denmark CoLIS 3 1999 in Dubrovnik, Croatia CoLIS 4 2002 in Seattle, US CoLIS 5 2005 in Glasgow, Scotland CoLIS 6 2007 in Borås, Sweden CoLIS 7 June 2010 in London, at City University London. CoLIS 8 August 19–22, 2013, in Copenhagen, Denmark, at The Royal School of Library and Information Science. CoLIS 9 June 27–29, 2016, in Uppsala, Sweden, at Uppsala University. CoLIS 10 June 16–19, 2019, in Ljubljana, Slovenia, Faculty of Arts CoLIS 11 May 29–June 1, 2022, in Oslo, Norway, Oslo Metropolitan University.

    Read more →
  • Generative art

    Generative art

    Generative art is post-conceptual art that has been created (in whole or in part) with the use of an autonomous system. An autonomous system in this context is generally one that is non-human and can independently determine features of an artwork that would otherwise require decisions made directly by the artist. In some cases the human creator may claim that the generative system represents their own artistic idea, and in others that the system takes on the role of the creator. "Generative art" often refers to algorithmic art (algorithmically determined computer generated artwork) and synthetic media (general term for any algorithmically generated media), but artists can also make generative art using systems of chemistry, biology, mechanics and robotics, smart materials, manual randomization, mathematics, data mapping, symmetry, and tiling. Generative algorithms, algorithms programmed to produce artistic works through predefined rules, stochastic methods, or procedural logic, often yielding dynamic, unique, and contextually adaptable outputs—are central to many of these practices. == History == The use of the word "generative" in the discussion of art has developed over time. The use of "Artificial DNA" defines a generative approach to art focused on the construction of a system able to generate unpredictable events, all with a recognizable common character. The use of autonomous systems, required by some contemporary definitions, focuses a generative approach where the controls are strongly reduced. This approach is also named "emergent". Margaret Boden and Ernest Edmonds have noted the use of the term "generative art" in the broad context of automated computer graphics in the 1960s, beginning with artwork exhibited by Georg Nees and Frieder Nake in 1965: A. Michael Noll did his initial computer art, combining randomness with order, in 1962, and exhibited it along with works by Bell Julesz in 1965. The terms "generative art" and "computer art" have been used in tandem, and more or less interchangeably, since the very earliest days. The first such exhibition showed the work of Nees in February 1965, which some claim was titled "Generative Computergrafik". While Nees does not himself remember, this was the title of his doctoral thesis published a few years later. The correct title of the first exhibition and catalog was "computer-grafik". "Generative art" and related terms was in common use by several other early computer artists around this time, including Manfred Mohr and Ken Knowlton. Vera Molnár (born 1924) is a French media artist of Hungarian origin. Molnar is widely considered to be a pioneer of generative art, and is also one of the first women to use computers in her art practice. The term "Generative Art" with the meaning of dynamic artwork-systems able to generate multiple artwork-events was clearly used the first time for the "Generative Art" conference in Milan in 1998. The term has also been used to describe geometric abstract art where simple elements are repeated, transformed, or varied to generate more complex forms. Thus defined, generative art was practiced by the Argentinian artists Eduardo Mac Entyre and Miguel Ángel Vidal in the late 1960s. In 1972 the Romanian-born Paul Neagu created the Generative Art Group in Britain. It was populated exclusively by Neagu using aliases such as "Hunsy Belmood" and "Edward Larsocchi". In 1972 Neagu gave a lecture titled 'Generative Art Forms' at the Queen's University, Belfast Festival. In 1970 the School of the Art Institute of Chicago created a department called Generative Systems. As described by Sonia Landy Sheridan the focus was on art practices using the then new technologies for the capture, inter-machine transfer, printing and transmission of images, as well as the exploration of the aspect of time in the transformation of image information. Also noteworthy is John Dunn, first a student and then a collaborator of Sheridan. In 1988 Clauser identified the aspect of systemic autonomy as a critical element in generative art: It should be evident from the above description of the evolution of generative art that process (or structuring) and change (or transformation) are among its most definitive features, and that these features and the very term 'generative' imply dynamic development and motion. (the result) is not a creation by the artist but rather the product of the generative process - a self-precipitating structure. In 1989 Celestino Soddu defined the Generative Design approach to Architecture and Town Design in his book Citta' Aleatorie. In 1989 Franke referred to "generative mathematics" as "the study of mathematical operations suitable for generating artistic images." From the mid-1990s Brian Eno popularized the terms generative music and generative systems, making a connection with earlier experimental music by Terry Riley, Steve Reich and Philip Glass. From the end of the 20th century, communities of generative artists, designers, musicians and theoreticians began to meet, forming cross-disciplinary perspectives. The first meeting about generative Art was in 1998, at the inaugural International Generative Art conference at Politecnico di Milano University, Italy. In Australia, the Iterate conference on generative systems in the electronic arts followed in 1999. On-line discussion has centered around the eu-gene mailing list, which began late 1999, and has hosted much of the debate which has defined the field. These activities have more recently been joined by the Generator.x conference in Berlin starting in 2005. In 2012 the new journal GASATHJ, Generative Art Science and Technology Hard Journal was founded by Celestino Soddu and Enrica Colabella jointing several generative artists and scientists in the editorial board. Some have argued that as a result of this engagement across disciplinary boundaries, the community has converged on a shared meaning of the term. As Boden and Edmonds put it in 2011: Today, the term "Generative Art" is still current within the relevant artistic community. Since 1998 a series of conferences have been held in Milan with that title (Generativeart.com), and Brian Eno has been influential in promoting and using generative art methods (Eno, 1996). Both in music and in visual art, the use of the term has now converged on work that has been produced by the activation of a set of rules and where the artist lets a computer system take over at least some of the decision-making (although, of course, the artist determines the rules). In the call of the Generative Art conferences in Milan (annually starting from 1998), the definition of Generative Art by Celestino Soddu: Generative Art is the idea realized as genetic code of artificial events, as construction of dynamic complex systems able to generate endless variations. Each Generative Project is a concept-software that works producing unique and non-repeatable events, like music or 3D Objects, as possible and manifold expressions of the generating idea strongly recognizable as a vision belonging to an artist / designer / musician / architect /mathematician. Discussion on the eu-gene mailing list was framed by the following definition by Adrian Ward from 1999: Generative art is a term given to work which stems from concentrating on the processes involved in producing an artwork, usually (although not strictly) automated by the use of a machine or computer, or by using mathematic or pragmatic instructions to define the rules by which such artworks are executed. A similar definition is provided by Philip Galanter: Generative art refers to any art practice where the artist creates a process, such as a set of natural language rules, a computer program, a machine, or other procedural invention, which is then set into motion with some degree of autonomy contributing to or resulting in a completed work of art. Around the 2020s, generative AI models learned to imitate the distinct style of particular authors. For example, a generative image model such as Stable Diffusion is able to model the stylistic characteristics of an artist like Pablo Picasso (including his particular brush strokes, use of colour, perspective, and so on), and a user can engineer a prompt such as "an astronaut riding a horse, by Picasso" to cause the model to generate a novel image applying the artist's style to an arbitrary subject. Generative image models have received significant backlash from artists who object to their style being imitated without their permission, arguing that this harms their ability to profit from their own work. The emergence of text-to-image generative AI systems has expanded debates over authorship, copyright, and artistic labor. The main issues in these debates include the eligibility of AI-generated outputs for copyright protection and the legal and ethical questions of using existing copyrighted works as training data for generative AI systems. == Types == === Music === Johann Kirnberger's Mu

    Read more →
  • Artificial empathy

    Artificial empathy

    Artificial empathy or computational empathy is the development of AI systems—such as companion robots or virtual agents—that can detect emotions and respond to them in an empathic way. Although such technology can be perceived as scary or threatening, it could also have a significant advantage over humans for roles in which emotional expression can be important, such as in the health care sector. An October 2025 review and meta-analysis in the British Medical Bulletin found that AI chatbots were rated as showing more empathy than human healthcare professionals in 13 of 15 studies that compared them. Care-givers who perform emotional labor above and beyond the requirements of paid labor can experience chronic stress or burnout, and can become desensitized to patients. Artificial empathy could also help the socialization of care-givers, or serve as role model for emotional detachment. A broader definition of artificial empathy is "the ability of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals (s)he emits (e.g., facial expression, voice, gesture) or to predict a person's reaction (including, but not limited to internal states) when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.)". A 2025 study reported that some multimodal large language models can recognize basic facial expressions with human-level accuracy on a commonly used research dataset of posed facial expressions. == Areas of research == There are a variety of philosophical, theoretical, and applicative questions related to artificial empathy. For example: Which conditions would have to be met for a robot to respond competently to a human emotion? What models of empathy can or should be applied to Social and Assistive Robotics? Must the interaction of humans with robots imitate affective interaction between humans? Can a robot help science learn about affective development of humans? Would robots create unforeseen categories of inauthentic relations? What relations with robots can be considered authentic? How can we assess artificial empathy in AI systems? == Examples of artificial empathy research and practice == People often communicate and make decisions based on inferences about each other's internal states (e.g., emotional, cognitive, and physical states) that are in turn based on signals emitted by the person such as facial expression, body gesture, voice, and words. Broadly speaking, artificial empathy focuses on developing non-human models that achieve similar objectives using similar data. === Streams of artificial empathy research === Artificial empathy has been applied in various research disciplines, including artificial intelligence and business. Two main streams of research in this domain are: the use of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals he or she emits (e.g., facial expression, voice, gesture) the use of nonhuman models to predict a person's reaction when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.). Research on affective computing, such as emotional speech recognition and facial expression detection, falls within the first stream of artificial empathy. Contexts that have been studied include oral interviews, call centers, human-computer interaction, sales pitches, and financial reporting. The second stream of artificial empathy has been researched more in marketing contexts, such as advertising, branding, customer reviews, in-store recommendation systems, movies, and online dating. === Artificial empathy applications in practice === With the increasing volume of visual, audio, and text data in commerce, many business applications for artificial empathy have followed. For example, Affectiva analyses viewers' facial expressions from video recordings while they are watching video advertisements in order to optimize the content design of video ads. Software like HireVue, BarRaiser, a hiring intelligence firm, helps firms make recruitment decisions by analyzing audio and video information from candidates' video interviews. Lapetus Solutions develops a model to estimate an individual's longevity, health status, and disease susceptibility from a face photo. Their technology has been applied in the insurance industry. == Artificial empathy and human services == Although artificial intelligence cannot yet replace social workers themselves, the technology has been deployed in that field. Florida State University published a study about Artificial Intelligence being used in the human services field. The research used computer algorithms to analyze health records for combinations of risk factors that could predict a future suicide attempt. The article reports, "machine learning—a future frontier for artificial intelligence—can predict with 80% to 90% accuracy whether someone will attempt suicide as far off as two years into the future. The algorithms become even more accurate as a person's suicide attempt gets closer. For example, the accuracy climbs to 92% one week before a suicide attempt when artificial intelligence focuses on general hospital patients". Such algorithmic machines can help social workers. Social work operates on a cycle of engagement, assessment, intervention, and evaluation with clients. Earlier assessment for risk of suicide can lead to earlier interventions and prevention, therefore saving lives. The system would learn, analyze, and detect risk factors, alerting the clinician of a patient's suicide risk score (analogous to a patient's cardiovascular risk score). Then, social workers could step in for further assessment and preventive intervention.

    Read more →
  • Learning augmented algorithm

    Learning augmented algorithm

    A learning augmented algorithm (also called algorithm with predictions) is an algorithm that can make use of a prediction to improve its performance. Whereas in regular algorithms just the problem instance is inputted, learning augmented algorithms accept an extra parameter. This extra parameter often is a prediction of some property of the solution. This prediction is then used by the algorithm to improve its running time or the quality of its output. The most common application are online algorithms, where a prediction on the uncertain instance is provided. == Description == A learning augmented algorithm typically takes an input ( I , A ) {\displaystyle ({\mathcal {I}},{\mathcal {A}})} . Here I {\displaystyle {\mathcal {I}}} is a problem instance and A {\displaystyle {\mathcal {A}}} is the prediction. A prediction can be any object. Common are the following types: Prediction of an optimal solution. The prediction gives a solution to the problem or characterizes an optimal solution. Prediction of the input. This is mainly used for online problems. Prediction of algorithmic actions. A prediction tailored to a specific algorithm that suggests a specific algorithm execution. Learning augmented algorithms usually satisfy the following three properties: Consistency. A learning augmented algorithm is said to be consistent if the algorithm can be proven to have a good performance when it is provided with an accurate prediction. Smoothness. A learning augmented algorithm is called smooth if its performance can be bounded by a function of the quality of the prediction. Here, the quality can be measured in a problem specific way. This is also called the prediction error. Robustness. A learning augmented algorithm is called robust if its worst-case performance can be bounded even if the given prediction is inaccurate. Learning augmented algorithms generally do not prescribe how the prediction should be done. For this purpose machine learning can be used. == Applications == A few examples of problems where learning augmented algorithms have been applied are the following. === Online algorithms === The ski rental problem The weighted paging problem The set cover problem Nonclairvoyant scheduling The online bipartite matching problem === Warm starting === ==== Data structures ==== The binary search algorithm is an algorithm for finding elements of a sorted list x 1 , … , x n {\displaystyle x_{1},\ldots ,x_{n}} . It needs O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps to find an element with some known value y {\displaystyle y} in a list of length n {\displaystyle n} . With a prediction i {\displaystyle i} for the position of y {\displaystyle y} , the following learning augmented algorithm can be used. First, look at position i {\displaystyle i} in the list. If x i = y {\displaystyle x_{i}=y} , the element has been found. If x i < y {\displaystyle x_{i} y {\displaystyle x_{i}>y} , do the same as in the previous case, but instead consider i − 1 , i − 2 , i − 4 , … {\displaystyle i-1,i-2,i-4,\ldots } . The error is defined to be η = | i − i ∗ | {\displaystyle \eta =|i-i^{}|} , where i ∗ {\displaystyle i^{}} is the real index of y {\displaystyle y} . In the learning augmented algorithm, probing the positions i + 1 , i + 2 , i + 4 , … {\displaystyle i+1,i+2,i+4,\ldots } takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. Then a binary search is performed on a list of size at most 2 η {\displaystyle 2\eta } , which takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. This makes the total running time of the algorithm 2 log 2 ⁡ ( η ) {\displaystyle 2\log _{2}(\eta )} . So, when the error is small, the algorithm is faster than a normal binary search. This shows that the algorithm is consistent. Even in the worst case, the error will be at most n {\displaystyle n} . Then the algorithm takes at most O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps, so the algorithm is robust. ==== More examples ==== The maximum weight matching problem === Approximation algorithms === The maximum cut problem The vertex cover problem === Mechanism Design === The facility location problem

    Read more →
  • Artificial intelligence in marketing

    Artificial intelligence in marketing

    Artificial intelligence marketing (AI marketing) is a form of marketing that uses artificial intelligence concepts and models such as machine learning, natural language processing, and computer vision to achieve marketing goals. The main difference between AI marketing and traditional forms of marketing reside in the reasoning, which is performed through a computer algorithm rather than a human. Each form of marketing has a different technique to the core of the marketing theory. Traditional marketing directly focuses on the needs of consumers; meanwhile some believe the shift AI may cause will lead marketing agencies to manage consumer needs instead. AI is used in various digital marketing spaces, such as content marketing, email marketing, online advertisement (in combination with machine learning), social media marketing, affiliate marketing, and beyond. == Historical development == AI in marketing has a long history, which goes all the way back to the 1980s. At this time, AI research was focusing on expert systems and robotics. Despite the initial research and the studies that were carried out, AI adoption remained limited. Research on it came to a stop for a while, until research was revived two decades later with the advancement in technology, the rise of big data, and a significant increase in computational power. Eventually, AI became very popular in the marketing world, and caught the eyes of many researchers as well as professionals. A large‐scale bibliometric study covering 1,580 peer‑reviewed papers published between 1982 and 2020 confirms that scholarly output on AI in marketing has surged since 2017, with Expert Systems with Applications emerging as the most prolific outlet. Prior to the application of artificial Intelligence in marketing, there was something called "collaborative filtering". This was used as early as 1998 by Amazon, and one of the first ways companies predicted consumer behavior, which enabled millions of recommendations to different customers. Personalized recommender systems are now widely used, for example to suggest music on Spotify, or TV shows on Netflix. A big milestone in AI marketing happened in 2014, when programmatic ad buying gained much greater popularity. Marketing consists of numerous manual tasks such as researching target markets, insertion orders, and managing high budgets as well as prices. In order to cut costs, and remove the need for these tedious tasks, many companies started to automate the marketing process with AI. In 2015, Google introduced RankBrain, a machine learning component of its search algorithm designed to interpret the intent behind user queries. RankBrain was followed by further AI-based search updates, including BERT in 2019, which improved the understanding of conversational queries, and the Multitask Unified Model (MUM) in 2021, which is multimodal and processes information across 75 languages. These advances shifted search engine optimization practice away from keyword matching toward content that satisfies user intent. Artificial intelligence is increasingly used in marketing to personalize user experiences and automate decision-making. For example, Netflix uses AI algorithms to recommend content based on viewing history, while Sephora employs chatbots to assist customers with product selection and availability. Programmatic advertising platforms like Google Ads leverage machine learning to optimize bidding strategies and target audiences more effectively. These applications demonstrate how AI enhances efficiency, engagement, and conversion rates across digital channels. === Artificial neural networks === An artificial neural network is a form of computer program modeled on the brain and nervous system of humans. Neural networks are composed of a series of interconnected processing neurons that function in unison to achieve certain outcomes. Using “human-like trial and error learning methods neural networks detect patterns existing within a data set ignoring data that is not significant while emphasizing the data which is most influential”. From a marketing perspective, neural networks are a form of software tool used to assist in decision making. Neural networks are effective in gathering and extracting information from large data sources and have the ability to identify cause and effect within tha data. These neural nets through the process of learning, identify relationships and connections between databases. Once knowledge has been accumulated, neural networks can be relied on to provide generalizations and can apply past knowledge and learning to a variety of situations. Neural networks help fulfill the role of marketing companies through effectively aiding in market segmentation and measurement of performance while reducing costs and improving accuracy. Due to their learning ability, flexibility, adaption, and knowledge discovery, neural networks offer many advantages over traditional models. Neural networks can be used to assist in pattern classification, forecasting and marketing analysis. == Tools and uses == Classification of customers can be facilitated through the neural network approach allowing companies to make informed marketing decisions. An example of this was employed by Spiegel Inc., a firm dealing in direct-mail operations that used neural networks to improve efficiencies. Using software developed by NeuralWare Inc., Spiegel identified the demographics of customers who had made a single purchase and those customers who had made repeat purchases. Neural networks where then able to identify the key patterns and consequently identify the customers that were most likely to repeat purchase. Understanding this information allowed Spiegel to streamline marketing efforts, and reduced costs. Sales forecasting “is the process of estimating future events with the goal of providing benchmarks for monitoring actual performance and reducing uncertainty". Artificial intelligence techniques have emerged to facilitate the process of forecasting through increasing accuracy in the areas of demand for products, distribution, employee turnover, performance measurement, and inventory control. An example of forecasting using neural networks is the Airline Marketing Assistant/Tactician; an application developed by BehabHeuristics which allows for the forecasting of passenger demand and consequent seat allocation through neural networks. This system has been used by National air Canada and USAir. Neural networks provide a useful alternative to traditional statistical models due to their reliability, time-saving characteristics and ability to recognize patterns from incomplete or noisy data. Examples of marketing analysis systems includes the Target Marketing System developed by Churchull Systems for Veratex Corporation. This support system scans a market database to identify dormant customers allowing management to make decisions regarding which key customers to target. When performing marketing analysis, neural networks can assist in the gathering and processing of information ranging from consumer demographics and credit history to the purchase patterns of consumers. Predictive analytics is a form of analytics involving the use of historical data and artificial intelligence algorithms to predict future trends and outcomes. It serves as a tool for anticipating and understanding user behavior based on patterns found in data. Predictive analytics uses artificial intelligence machine learning algorithms to recognize and predict patterns within data. Machine learning algorithms analyze the data, recognize patterns, and make predictions through continuous learning and adaptation. Predictive analytics is widely used across businesses and industries as a way to identify opportunities, avoid risks, and anticipate customer needs based on information derived from the analysis of user data. By analyzing historical customer data, artificial intelligence algorithms can deliver relevant and targeted marketing content. Recent systematic reviews show that generative large‑language models such as GPT‑3 and GPT‑4 are now routinely embedded in predictive‑analytics pipelines to mine unstructured market data and anticipate customer intent with greater precision. Personalization engines use artificial intelligence and machine learning to provide content or advertisements that are relevant to the user. User data is gathered, which then gets processed with machine learning, and patterns and trends among the users are identified. Users with shared characteristics or behaviors are then segmented into groups, and the personalization engine adjusts content and advertisements to match each segment's preferences. By processing a large amount of data, personalization engines are able to match users to advertisements and recommendations that align with their interests or preferences. Field evidence from consumer‑goods and electronics firms indicates that AI‑driven personalization can raise

    Read more →
  • Snapshot isolation

    Snapshot isolation

    In databases, and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation has been adopted by several major database management systems, such as InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most of the concurrency anomalies that serializability avoids (but not all). In practice snapshot isolation is implemented within multiversion concurrency control (MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations of several last relevant versions (of each object). Snapshot isolation has been used to criticize the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI). In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle. == Definition == A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort. In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies. As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2. If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing. If the database is under snapshot isolation(MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither update conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200. Some systems built using multiversion concurrency control (MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. This is convenient because MVCC maintains a series of recent history consistent states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, MVCC systems (such as MarkLogic) will use locks to serialize writes together with MVCC to obtain some of the performance gains and still support the stronger "serializability" level of isolation. == Workarounds == Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property. Materialize the conflict Add a special conflict table, which both transactions update in order to create a direct write–write conflict. Promotion Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE). In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach violates the normal form. Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible. In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance. == Terminology == Snapshot isolation is called "serializable" mode in Oracle and PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic. == History == Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. InterBase, later owned by Borland, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985. Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions. In 2008, Cahill et al. showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions. This implementation of serializability is well-suited to multiversion concurrency control databases, and has been adopted in PostgreSQL 9.1, where it is known as Serializable Snapshot Isolation (SSI). When used consistently, this eliminates the need for the above workarounds. The downside over snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.

    Read more →
  • Chinese speech synthesis

    Chinese speech synthesis

    Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitched-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the speech synthesised by the engine claims clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesis speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these don't change the question of which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they can't be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.

    Read more →
  • Information scientist

    Information scientist

    The term information scientist developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the term information officer or information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute "Criteria for Information Science" (appendix 1) as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also Award of Merit - Association for Information Science and Technology Marcia Bates David Blair (information technologist) Samuel C. Bradford Michael Buckland John M. Carroll Blaise Cronin Emilia Currás Brenda Dervin Eugene Garfield Paul B. Kantor Frederick Wilfrid Lancaster Calvin Mooers Tefko Saracevic Linda C. Smith Robert Saxton Taylor Brian Campbell Vickery Thomas D. Wilson == Additional reading == Ellis, David and Merete Haugan. (1997) "Modelling the information seeking patterns of engineers and research scientists in an industrial environment" (Journal of Documentation, Volume 53(4): pp. 384–403) Poole, Alex H. (2024). "'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988) "Essays presented to B. C. Vickery" (Journal of Documentation, Volume 44, pp. 199–283). Vickery, B. & Vickery, A. (1987) Information Science in theory and practice (London: Bowker-Saur, pp. 361–369)

    Read more →