AI Assistant For Writing

AI Assistant For Writing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apertus (LLM)

    Apertus (LLM)

    Apertus is a public large language model, developed by the Swiss AI Initiative (a collaboration between EPFL, ETH Zurich, and the Swiss National Supercomputing Centre). It was released on September 2, 2025, under the free and open-source Apache 2.0 license. Designed initially for business and research use cases around the world, Apertus was trained on over 1800 languages, and comes in 8 billion or 70 billion parameter versions and is available on Hugging Face for download. The model was developed aiming to adhere to European copyright law, and is one of the first examples of AI as a public good in the vein of AI Sovereignty. It is also the first large model to comply with the European Union's Artificial Intelligence Act. At its launch, the model creators emphasized multilinguality, transparency, and auditability as priorities in contrast to commercial frontier model. While international reception was largely positive, the first iteration was significantly behind the capabilities of frontier models and needs adaptation for many use cases with chatbots being a secondary but not a primary use case. As of late 2025, it was considered the largest and most capable fully open model. The capability of future models will depend in part on how much more funding can be secured.

    Read more →
  • Gallery software

    Gallery software

    Gallery software is software that helps the user publish or share photos, pictures, videos or other digital media. Most galleries are located on Web servers, where users are allowed to register and publish their pictures. Gallery software usually features automatic image resizing, allows digital media be categorized into sets, and allows comments. == Types == Early digital media publishing and sharing was done with imageboards. The boards are by topics, sometimes called "chan". Each discussion in a "chan" are started with a piece of digital media, and follow-up discussions can contain another piece too. Software works in this way: Futallaby, Danbooru. Traditionally, galleries are managed. An administrator maintains a set of or hierarchy of albums. The users can upload their digital media in one of the existing albums defined by an administrator, or create their own albums. The users with sufficient permission can re-categorise the digital media others uploaded. Often, the site's administrator can define which album the users are allowed to categorise their media into, or delete other user's content. Examples are open source galleries Coppermine, Gallery Project. There are decentralised gallery software that does not have an administrator for managing contents. Pinterest, Flickr and DeviantArt has been successful with this model. Open source gallery software MediaGoblin works in this way. Each user can create their own "collections", to categorise theirs or other users' media. However users cannot put media into other user's collections. Each user's category is separate. There is no centralised theme or hierarchy for the media.

    Read more →
  • Loebner Prize

    Loebner Prize

    The Loebner Prize was an annual competition in artificial intelligence that awarded prizes to the computer programs considered by the judges to be the most human-like. The format of the competition was that of a standard Turing test. In each round, a human judge simultaneously held textual conversations with a computer program and a human being via computer. Based upon the responses, the judge would attempt to determine which was which. The contest was launched in 1990 by Hugh Loebner in conjunction with the Cambridge Center for Behavioral Studies, Massachusetts, United States. In 2004 and 2005, it was held in Loebner's apartment in New York City. Within the field of artificial intelligence, the Loebner Prize is somewhat controversial; the most prominent critic, Marvin Minsky, called it a publicity stunt that does not help the field along. Beginning in 2014, it was organised by the AISB at Bletchley Park. It has also been associated with Flinders University, Dartmouth College, the Science Museum in London, University of Reading and Ulster University, Magee Campus, Derry, UK City of Culture. For the final 2019 competition, the format changed. There was no panel of judges. Instead, the chatbots were judged by the public and there were to be no human competitors. The prize has been reported as defunct as of 2020. == Prizes == Originally, $2,000 was awarded for the most human-seeming program in the competition. The prize was $3,000 in 2005 and $2,250 in 2006. In 2008, $3,000 was awarded. In addition, there were two one-time-only prizes that have never been awarded. $25,000 is offered for the first program that judges cannot distinguish from a real human and which can convince judges that the human is the computer program. $100,000 is the reward for the first program that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. The competition was planned to end after the achievement of this prize. == Competition rules and restrictions == The rules varied over the years and early competitions featured restricted conversation Turing tests but since 1995 the discussion has been unrestricted. For the three entries in 2007, Robert Medeksza, Noah Duncan and Rollo Carpenter, some basic "screening questions" were used by the sponsor to evaluate the state of the technology. These included simple questions about the time, what round of the contest it is, etc.; general knowledge ("What is a hammer for?"); comparisons ("Which is faster, a train or a plane?"); and questions demonstrating memory for preceding parts of the same conversation. "All nouns, adjectives and verbs will come from a dictionary suitable for children or adolescents under the age of 12." Entries did not need to respond "intelligently" to the questions to be accepted. For the first time in 2008 the sponsor allowed introduction of a preliminary phase to the contest opening up the competition to previously disallowed web-based entries judged by a variety of invited interrogators. The available rules do not state how interrogators are selected or instructed. Interrogators (who judge the systems) have limited time: 5 minutes per entity in the 2003 competition, 20+ per pair in 2004–2007 competitions, 5 minutes to conduct simultaneous conversations with a human and the program in 2008–2009, increased to 25 minutes of simultaneous conversation since 2010. == Criticisms == The prize has long been scorned by experts in the field, for a variety of reasons. It is regarded by many as a publicity stunt. Marvin Minsky scathingly offered a "prize" to anyone who could stop the competition. Loebner responded by jokingly observing that Minsky's offering a prize to stop the competition effectively made him a co-sponsor. The rules of the competition have encouraged poorly qualified judges to make rapid judgements. Interactions between judges and competitors was originally very brief, for example effectively 2.5 mins of questioning, which permitted only a few questions. Questioning was initially restricted to a single topic of the contestant's choice, such as "whimsical conversation", a domain suiting standard chatbot tricks. Competition entrants do not aim at understanding or intelligence but resort to basic ELIZA style tricks, and successful entrants find deception and pretense is rewarded. == Contests == See article history for more details of some earlier contests. A very incomplete listing of a few of the contests: === 2003 === In 2003, the contest was organised by Professor Richard H. R. Harper and Dr. Lynne Hamill from the Digital World Research Centre at the University of Surrey. Although no bot passed the Turing test, the winner was Jabberwock, created by Juergen Pirner. Second was Elbot (Fred Roberts, Artificial Solutions). Third was Jabberwacky, (Rollo Carpenter). === 2006 === In 2006, the contest was organised by Tim Child (CEO of Televirtual) and Huma Shah. On August 30, the four finalists were announced: Rollo Carpenter Richard Churchill and Marie-Claire Jenkins Noah Duncan Robert Medeksza The contest was held on 17 September in the VR theatre, Torrington Place campus of University College London. The judges included the University of Reading's cybernetics professor, Kevin Warwick, a professor of artificial intelligence, John Barnden (specialist in metaphor research at the University of Birmingham), a barrister, Victoria Butler-Cole and a journalist, Graham Duncan-Rowe. The latter's experience of the event can be found in an article in Technology Review. The winner was 'Joan', based on Jabberwacky, both created by Rollo Carpenter. === 2007 === The 2007 competition was held on October 21 in New York City. The judges were: computer science professor Russ Abbott, philosophy professor Hartry Field, psychology assistant professor Clayton Curtis and English lecturer Scott Hutchins. No bot passed the Turing test, but the judges ranked the three contestants as follows: 1st: Robert Medeksza, creator of Ultra Hal 2nd: Noah Duncan, a private entry, creator of Cletus 3rd: Rollo Carpenter from Icogno, creator of Jabberwacky The winner received $2,250 and the annual medal. The runners-up received $250 each. === 2008 === The 2008 competition was organised by professor Kevin Warwick, coordinated by Huma Shah and held on October 12 at the University of Reading, UK. After testing by over one hundred judges during the preliminary phase, in June and July 2008, six finalists were selected from thirteen original entrant artificial conversational entities (ACEs). Five of those invited competed in the finals: Brother Jerome, Peter Cole and Benji Adams Elbot, Fred Roberts / Artificial Solutions Eugene Goostman, Vladimir Veselov, Eugene Demchenko and Sergey Ulasen Jabberwacky, Rollo Carpenter Ultra Hal, Robert Medeksza In the finals, each of the judges was given five minutes to conduct simultaneous, split-screen conversations with two hidden entities. Elbot of Artificial Solutions won the 2008 Loebner Prize bronze award, for most human-like artificial conversational entity, through fooling three of the twelve judges who interrogated it (in the human-parallel comparisons) into believing it was human. This is coming very close to the 30% traditionally required to consider that a program has actually passed the Turing test. Eugene Goostman and Ultra Hal both deceived one judge each that it was the human. Will Pavia, a journalist for The Times, has written about his experience; a Loebner finals' judge, he was deceived by Elbot and Eugene. Kevin Warwick and Huma Shah have reported on the parallel-paired Turing tests. === 2009 === The 2009 Loebner Prize Competition was held September 6, 2009, at the Brighton Centre, Brighton UK in conjunction with the Interspeech 2009 conference. The prize amount for 2009 was $3,000. Entrants were David Levy, Rollo Carpenter, and Mohan Embar, who finished in that order. The writer Brian Christian participated in the 2009 Loebner Prize Competition as a human confederate, and described his experiences at the competition in his book The Most Human Human. === 2010 === The 2010 Loebner Prize Competition was held on October 23 at California State University, Los Angeles. The 2010 competition was the 20th running of the contest. The winner was Bruce Wilcox with Suzette. === 2011 === The 2011 Loebner Prize Competition was held on October 19 at the University of Exeter, Devon, United Kingdom. The prize amount for 2011 was $4,000. The four finalists and their chatterbots were Bruce Wilcox (Rosette), Adeena Mignogna (Zoe), Mohan Embar (Chip Vivant) and Ron Lee (Tutor), who finished in that order. That year there was an addition of a panel of junior judges, namely Georgia-Mae Lindfield, William Dunne, Sam Keat and Kirill Jerdev. The results of the junior contest were markedly different from the main contest, with chatterbots Tutor and Zoe tying for first place and Chip Vivant and Rosette coming in third and fourt

    Read more →
  • Viola–Jones object detection framework

    Viola–Jones object detection framework

    The Viola–Jones object detection framework is a machine learning object detection framework proposed in 2001 by Paul Viola and Michael Jones. It was motivated primarily by the problem of face detection, although it can be adapted to the detection of other object classes. In short, it consists of a sequence of classifiers. Each classifier is a single perceptron with several binary masks (Haar features). To detect faces in an image, a sliding window is computed over the image. For each image, the classifiers are applied. If at any point, a classifier outputs "no face detected", then the window is considered to contain no face. Otherwise, if all classifiers output "face detected", then the window is considered to contain a face. The algorithm is efficient for its time, able to detect faces in 384 by 288 pixel images at 15 frames per second on a conventional 700 MHz Intel Pentium III. It is also robust, achieving high precision and recall. While it has lower accuracy than more modern methods such as convolutional neural network, its efficiency and compact size (only around 50k parameters, compared to millions of parameters for typical CNN like DeepFace) means it is still used in cases with limited computational power. For example, in the original paper, they reported that this face detector could run on the Compaq iPAQ at 2 fps (this device has a low power StrongARM without floating point hardware). == Problem description == Face detection is a binary classification problem combined with a localization problem: given a picture, decide whether it contains faces, and construct bounding boxes for the faces. To make the task more manageable, the Viola–Jones algorithm only detects full view (no occlusion), frontal (no head-turning), upright (no rotation), well-lit, full-sized (occupying most of the frame) faces in fixed-resolution images. The restrictions are not as severe as they appear, as one can normalize the picture to bring it closer to the requirements for Viola-Jones. any image can be scaled to a fixed resolution for a general picture with a face of unknown size and orientation, one can perform blob detection to discover potential faces, then scale and rotate them into the upright, full-sized position. the brightness of the image can be corrected by white balancing. the bounding boxes can be found by sliding a window across the entire picture, and marking down every window that contains a face. This would generally detect the same face multiple times, for which duplication removal methods, such as non-maximal suppression, can be used. The "frontal" requirement is non-negotiable, as there is no simple transformation on the image that can turn a face from a side view to a frontal view. However, one can train multiple Viola-Jones classifiers, one for each angle: one for frontal view, one for 3/4 view, one for profile view, a few more for the angles in-between them. Then one can at run time execute all these classifiers in parallel to detect faces at different view angles. The "full-view" requirement is also non-negotiable, and cannot be simply dealt with by training more Viola-Jones classifiers, since there are too many possible ways to occlude a face. == Components of the framework == A full presentation of the algorithm is in. Consider an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution ( M , N ) {\displaystyle (M,N)} . Our task is to make a binary decision: whether it is a photo of a standardized face (frontal, well-lit, etc) or not. Viola–Jones is essentially a boosted feature learning algorithm, trained by running a modified AdaBoost algorithm on Haar feature classifiers to find a sequence of classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . Haar feature classifiers are crude, but allows very fast computation, and the modified AdaBoost constructs a strong classifier out of many weak ones. At run time, a given image I {\displaystyle I} is tested on f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". For this reason, the Viola-Jones classifier is also called "Haar cascade classifier". === Haar feature classifiers === Consider a perceptron f w , b {\displaystyle f_{w,b}} defined by two variables w ( x , y ) , b {\displaystyle w(x,y),b} . It takes in an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution, and returns f w , b ( I ) = { 1 , if ∑ x , y w ( x , y ) I ( x , y ) + b > 0 0 , else {\displaystyle f_{w,b}(I)={\begin{cases}1,\quad {\text{if }}\sum _{x,y}w(x,y)I(x,y)+b>0\\0,\quad {\text{else}}\end{cases}}} A Haar feature classifier is a perceptron f w , b {\displaystyle f_{w,b}} with a very special kind of w {\displaystyle w} that makes it extremely cheap to calculate. Namely, if we write out the matrix w ( x , y ) {\displaystyle w(x,y)} , we find that it takes only three possible values { + 1 , − 1 , 0 } {\displaystyle \{+1,-1,0\}} , and if we color the matrix with white on + 1 {\displaystyle +1} , black on − 1 {\displaystyle -1} , and transparent on 0 {\displaystyle 0} , the matrix is in one of the 5 possible patterns shown on the right. Each pattern must also be symmetric to x-reflection and y-reflection (ignoring the color change), so for example, for the horizontal white-black feature, the two rectangles must be of the same width. For the vertical white-black-white feature, the white rectangles must be of the same height, but there is no restriction on the black rectangle's height. ==== Rationale for Haar features ==== The Haar features used in the Viola-Jones algorithm are a subset of the more general Haar basis functions, which have been used previously in the realm of image-based object detection. While crude compared to alternatives such as steerable filters, Haar features are sufficiently complex to match features of typical human faces. For example: The eye region is darker than the upper-cheeks. The nose bridge region is brighter than the eyes. Composition of properties forming matchable facial features: Location and size: eyes, mouth, bridge of nose Value: oriented gradients of pixel intensities Further, the design of Haar features allows for efficient computation of f w , b ( I ) {\displaystyle f_{w,b}(I)} using only constant number of additions and subtractions, regardless of the size of the rectangular features, using the summed-area table. === Learning and using a Viola–Jones classifier === Choose a resolution ( M , N ) {\displaystyle (M,N)} for the images to be classified. In the original paper, they recommended ( M , N ) = ( 24 , 24 ) {\displaystyle (M,N)=(24,24)} . ==== Learning ==== Collect a training set, with some containing faces, and others not containing faces. Perform a certain modified AdaBoost training on the set of all Haar feature classifiers of dimension ( M , N ) {\displaystyle (M,N)} , until a desired level of precision and recall is reached. The modified AdaBoost algorithm would output a sequence of Haar feature classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . The details of the modified AdaBoost algorithm is detailed below. ==== Using ==== To use a Viola-Jones classifier with f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} on an image I {\displaystyle I} , compute f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". === Learning algorithm === The speed with which features may be evaluated does not adequately compensate for their number, however. For example, in a standard 24x24 pixel sub-window, there are a total of M = 162336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image. Thus, the object detection framework employs a variant of the learning algorithm AdaBoost to both select the best features and to train classifiers that use them. This algorithm constructs a "strong" classifier as a linear combination of weighted simple “weak” classifiers. h ( x ) = sgn ⁡ ( ∑ j = 1 M α j h j ( x ) ) {\displaystyle h(\mathbf {x} )=\operatorname {sgn} \left(\sum _{j=1}^{M}\alpha _{j}h_{j}(\mathbf {x} )\right)} Each weak classifier is a threshold function based on the feature f j {\displaystyle f_{j}} . h j ( x ) = { − s j if f j < θ j s j otherwise {\displaystyle h_{j}(\mathbf {x} )={\begin{cases}-s_{j}&{\text{if }}f_{j}<\theta _{j}\\s_{j}&{\text{otherwise}}\end{cases}}} The threshold value θ j {\displaystyle \theta _{j}} and the polarity s j ∈ ± 1 {\displaystyle s_{j}\in \pm 1} are determined in the training, as well as the coefficients α j {\displaystyle \alpha _{j}} . Here a simplified version of the lea

    Read more →
  • Medical data breach

    Medical data breach

    Medical data, including patients' identity information, health status, disease diagnosis and treatment, and biogenetic information, not only involve patients' privacy but also have a special sensitivity and important value, which may bring physical and mental distress and property loss to patients and even negatively affect social stability and national security once leaked. However, the development and application of medical AI must rely on a large amount of medical data for algorithm training, and the larger and more diverse the amount of data, the more accurate the results of its analysis and prediction will be. However, the application of big data technologies such as data collection, analysis and processing, cloud storage, and information sharing has increased the risk of data leakage. In the United States, the rate of such breaches has increased over time, with 176 million records breached by the end of 2017. By 2024, the U.S. Department of Health and Human Services reported 725 large healthcare data breaches affecting approximately 275 million individual records in a single year, marking a significant escalation in both the frequency and scale of incidents. == Black market for health data == In February 2015 an NPR report claimed that organized crime networks had ways of selling health data in the black market. In 2015 a Beazley employee estimated that medical records could sell on the black market for US$40-50. == How data is lost == Theft, data loss, hacking, and unauthorized account access are ways in which medical data breaches happen. Among reported breaches of medical information in the United States networked information systems accounted for the largest number of records breached. There are many data breaches happening in the US health care system, among business associates of the health care providers that continuously gain access to patients' data. == List of data breaches == In February 2024, a ransomware attack on Change Healthcare, a subsidiary of UnitedHealth Group, compromised the protected health information of approximately 100 million individuals, making it the largest healthcare data breach in United States history. The attack disrupted claims processing for healthcare providers nationwide for several weeks. In May 2024, MediSecure suffered a cyberattack involving ransomware in Australia. In May 2021, the Health Service Executive in the Republic of Ireland was the victim of a cyberattack involving ransomware, in the Health Service Executive cyberattack, with admission records and test results present in a sample of the data reviewed by the Financial Times. In October 2018, the Centers for Medicare and Medicaid Services in the US reported that around 75,000 individual records had been affected by a data breach that took place through the ACA Agent and Broker Portal. In 2018, Social Indicators Research published the scientific evidence of 173,398,820 (over 173 million) individuals affected in USA from October 2008 (when the data were collected) to September 2017 (when the statistical analysis took place). In 2015, Anthem Inc. lost data for 37 million people in the Anthem medical data breach In 2014 4.5 million people using Complete Health Systems had their data stolen In 2013-14 1 million people using Montana Department of Public Health and Human Services had their data stolen In 2013 4 million people using Advocate Health and Hospitals Corporation had their data stolen In 2011 4.9 million users of Tricare services had their data stolen due to an employee error by Science Applications International Corporation In 2011 1.9 million people using Health Net had their data stolen In 2011 1 million people using Nemours Foundation had their data stolen In 2010 6800 people using New York-Presbyterian Hospital and Columbia University Medical Center had their data breached. In response, those organizations agreed to pay the United States Department of Health and Human Services a US$4.8 million dollar fine. In 2009 1 million people using BlueCross BlueShield of Tennessee had their data stolen == Regulation == In the United States, the Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Act require companies to report data breaches to affected individuals and the federal government. Under the HIPAA Breach Notification Rule, covered entities must notify affected individuals without unreasonable delay and no later than 60 days after discovering a breach of unsecured protected health information. Breaches affecting 500 or more individuals must also be reported to the HHS Secretary and to prominent media outlets serving the affected state or jurisdiction within the same timeframe; HHS publicly lists these larger breaches on its breach portal, commonly known as the "wall of shame." Breaches affecting fewer than 500 individuals are reported to HHS annually, no later than 60 days after the end of the calendar year in which they were discovered. Health Information Privacy Health Insurance Portability and Accountability Act of 1996 (HIPAA). - 45 CFR Parts 160 and 164, Standards for Privacy of Individually Identifiable Health Information and Security Standards for the Protection of Electronic Protected Health Information. HIPAA includes provisions designed to save health care businesses money by encouraging electronic transactions, as well as regulations to protect the security and confidentiality of patient information. The Privacy Rule became effective April 14, 2001, and most covered entities (health plans, health care clearinghouses, and health care providers that conduct certain financial and administrative transactions electronically) had until April 2003 to comply. This security provision became effective April 21, 2003. The Health Insurance Portability and Accountability Act (HIPAA) is the baseline set of federal regulations governing medical information. It does three things: i. i. i.Establish a structure for how personal health information is disclosed and establish the rights of individuals with respect to health information; ii.Specify security standards for the retention and transmission of electronic patient information; iii.Need a common format and data structure for the electronic exchange of health information. California-Specific Laws California’s medical privacy laws, primarily the Confidentiality of Medical Information Act (CMIA), the data breach sections of the Civil Code, and sections of the Health and Safety Code, provide HIPAA-like protections, although the terminology is different. HIPAA establishes a federal "minimum standard" that applies where there are gaps in California law, and HIPAA also specifies that stricter state laws will override or supersede HIPAA. California's health care privacy laws apply to providers who provide personal health records (PHR), while HIPAA only applies when the provider providing the PHR is a business associate of a covered entity. Federal law does not grant individuals the right to file a lawsuit in the event of a data breach (only the Attorney General can file a lawsuit), but California law does. This means that California law sets a higher standard for medical privacy, and that individuals in California enjoy stronger legal protections and more ways to hold entities that violate their medical privacy accountable. In the UK, the legal framework for how patient data is cared for and processed is the Data Protection Act 2018 (DPA), which incorporates the EU General Data Protection Regulation (GDPR) into law, and the common law duty of confidentiality (CLDC). The data protection legislation requires that the collection and processing of personal data be fair, lawful and transparent. This means that the collection and processing of data as defined by data protection legislation must always have a valid lawful basis and must also meet the requirements of the CLDC. In the China, Article 18 of the "National Health Care Big Data Standards, Security and Services Management Measures (for Trial Implementation)" (National Health Planning and Development (2018) No. 23) promulgated by the National Health Care Commission in 2018 states, "The responsible unit shall adopt measures such as data classification, important data backup, and encryption authentication to guarantee the security of health care big data." However, the scope and definition of important data are not covered. Although the "Information Security Technology-Healthcare Data Security Guide" (the "Guide") issued by the National Standardization Committee also proposes that important data should be evaluated and approved in accordance with the regulations, there is likewise no definition of the connotation and definition of important data.

    Read more →
  • Ed (chatbot)

    Ed (chatbot)

    Ed was a chatbot co-developed by the Los Angeles Unified School District and AllHere Education. Described as a learning acceleration platform, it was the first personal assistant for students in the United States. Part of the district's Individual Acceleration Plan, it was able to interact with students both verbally and visually, offering support in 100 languages. The chatbot was launched on March 20, 2024, as part of the district's plan for academic recovery from the COVID-19 pandemic and to improve overall academic performance. Utilizing artificial intelligence, Ed organizes data and reports on grades, test scores, and attendance, creating individualized plans for each student. After the company behind it, AllHere, collapsed, the district shuttered operations of the chatbot on June 14, 2024. The firm is under investigation by the US Federal Bureau of Investigation. == History == On February 14, 2022, Alberto M. Carvalho became the Superintendent of the Los Angeles Unified School District, pledging to give the district a full academic recovery from the COVID-19 pandemic. In December 2022, he announced the Individual Acceleration Plan for the district, which aimed to provide each student with a unique progress report and help them determine if they were on track to graduate. The district faced criticism from disability advocates for its management of Individualized Education Programs, and in April 2022, the United States Department of Education announced that the district had failed to provide appropriate educational services to students with disabilities during the pandemic. The district had been grappling with significant absenteeism issues since the pandemic, which led to declining academic performance and disengagement among students. On February 17, 2023, the district issued a request for proposals to develop a fully integrated portal system. Later that year, they signed a $6 million, five-year contract with AllHere Education, a Boston-based company founded in 2016. The introduction of Ed follows the public launch of ChatGPT, which has been utilized by both teachers and students in educational settings. On August 4, 2023, during an annual address at the Walt Disney Concert Hall, Carvalho and the Los Angeles Unified School District announced the launch of Ed. The district invested $4 million into the chatbot, with Carvalho noting that this cost would be halved thanks to donor and grant funding. The chatbot was launched on March 20, 2024. Following its launch, a press conference was held to address security and technology concerns. Carvalho stated that the district had collaborated with security companies and incorporated filters to screen for threatening language. Months after its launch, AllHere Education furloughed most of its staff on June 14, citing their “current financial position” on its website as the reason. After learning about the furlough, the district terminated its dealings with AllHere Education. However, it stated its intention to bring the chatbot back in the future once officials determine the best course of action. Carvalho announced that he would appoint an independent task force to review what went wrong with AllHere Education and the chatbot. On February 25, 2026, the FBI served a search warrant on Carvalho’s home and office in connection with AllHere. The FBI also raided the LAUSD's headquarters. == Service == The chatbot was described as a personal assistant and a "one-stop shop for parents and students" who want to see information about a student's attendance and grades, as well as other resources from the district. Additionally, the application can function as an alarm clock, provide daily lunch menus from the school cafeteria, and offer updates on the location of school buses. The chatbot also helps students and parents who do not speak English as their first language by translating displayed information into approximately 100 different languages. The application can also help with submitting applications and give updates on progress and upcoming assignments. The district stated that the primary goal of Ed was to actively motivate students to complete homework and other tasks. == Reception == The chatbot received a mostly positive reception among parents and observers upon its launch. Some parents and teachers expressed caution about the technology, voicing concerns that the district's push for its implementation lacked public accountability. Rob Nelson from the University of Pennsylvania described the district's strategy as risky, saying that the release felt "like the beginning of a Clippy-level disaster". After the chatbot's shutdown, The 74 criticized it for misusing student data. Chris Whiteley, a former software engineer at AllHere Education, alleged that the data collected by the chatbot likely violated the district's data privacy rules.

    Read more →
  • Chinchilla (language model)

    Chinchilla (language model)

    Chinchilla is a family of large language models (LLMs) developed by the research team at Google DeepMind, presented in March 2022. == Models == It is named "chinchilla" because it is a further development over a previous model family named Gopher. Both model families were trained in order to investigate the scaling laws of large language models. It claimed to outperform GPT-3. It considerably simplifies downstream utilization because it requires much less computer power for inference and fine-tuning. Based on the training of previously employed language models, it has been determined that if one doubles the model size, one must also have twice the number of training tokens. This hypothesis has been used to train Chinchilla by DeepMind. Similar to Gopher in terms of cost, Chinchilla has 70B parameters and four times as much data. Chinchilla has an average accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7% higher than Gopher's performance. Chinchilla was still in the testing phase as of January 12, 2023. Chinchilla contributes to developing an effective training paradigm for large autoregressive language models with limited compute resources. The Chinchilla team recommends that the number of training tokens is twice for every model size doubling, meaning that using larger, higher-quality training datasets can lead to better results on downstream tasks. It has been used for the Flamingo vision-language model. == Architecture == Both the Gopher family and Chinchilla family are families of transformer models. In particular, they are essentially the same as GPT-2, with different sizes and minor modifications. Gopher family uses RMSNorm instead of LayerNorm; relative positional encoding rather than absolute positional encoding. The Chinchilla family is the same as the Gopher family, but trained with AdamW instead of Adam optimizer. The Gopher family contains six models of increasing size, from 44 million parameters to 280 billion parameters. They refer to the largest one as "Gopher" by default. Similar naming conventions apply for the Chinchilla family. Table 1 of shows the entire Gopher family: Table 4 of compares the 70-billion-parameter Chinchilla with Gopher 280B.

    Read more →
  • GPT-4Chan

    GPT-4Chan

    Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016-November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. Furthermore, he also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher also made his model publicly available on Hugging Face, a platform for sharing and using AI models. He wanted to share his work with the AI community and the public, and that he hoped that his model would inspire and enable others to create and explore new applications and possibilities with large language models. Likewise, he also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone. He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.

    Read more →
  • Catalog server

    Catalog server

    A catalog server provides a single point of access that allows users to centrally search for information across a distributed network. In other words, it indexes databases, files and information across large network and allows keywords, Boolean and other searches. If you need to provide a comprehensive searching service for your intranet, extranet or even the Internet, a catalog server is a standard solution.

    Read more →
  • Law practice management software

    Law practice management software

    Law practice management software is software designed to manage the business operations of a law firm. This can include software that manages cases, client intake, court communications, electronic discovery, time tracking, trust accounting, and billing. == Features of law practice management software == Common features of practice management software include: Case management Time tracking Document assembly Contact management Calendaring Docket management Client portal Contract Management Court Case Status Tracker Trust accounting == Examples of law practice management software == Smokeball LEAP Legal Software PracticeEvolve Dye & Durham

    Read more →
  • Conversica

    Conversica

    Conversica is a US-based cloud software technology company, headquartered in San Mateo, California, that provides two-way AI-driven conversational software and a suite of Intelligent Virtual Assistants for businesses to engage customers via email, chat, and SMS. == History == 2007: The company was founded by Ben Brigham in Bellingham, Washington, originally as AutoFerret.com. The company's initial product was a Customer Relationship Management (CRM) targeted at automotive dealerships. This soon expanded to lead generation, and then lead validation and qualification. The AI Conversica uses currently was made to follow up on and filter out low-quality leads. The focus of the company shifted toward this automated lead engagement technology. 2010: The company started commercially selling AVA, the first Automated Virtual Assistant for sales, and the company name was changed to AVA.ai. Early customers for AVA were automotive dealerships. As the company moved away from generating leads themselves, and providing the CRM themselves, it became necessary to integrate with existing CRM and Marketing Automation platforms, such as DealerSocket, VinSolutions and Salesforce. 2013: The company raised $16m Series A funding, led by Kennet Partners, and named Mark Bradley as CEO. It also moved its headquarters from Bellingham, Washington to Foster City, California. 2014: The company changed its name from AVA.ai to Conversica. 2015: Alex Terry joined Conversica as its CEO. The business expanded to include customers in additional verticals, including technology, education, and financial services. 2016: The company raised $34m Series B funding, led by Providence Strategic Growth. 2017: Conversica expanded its intelligent automation platform and IVAs to support additional communication channels (e-mail and SMS text messaging) and communication languages. Conversica also opened a new technology center in Seattle, Washington to expand its AI and machine learning capabilities. 2018: The company raised $31m Series C funding, led by Providence Strategic Growth. Conversica also acquired Intelligens.ai, providing a regional presence in Latin America with an office in Las Condes, Santiago, Chile. The company launched an AI-powered Admissions Assistant for Higher Education industry. 2019: Conversica was selected by Fast Company magazine as one of the Top 10 Most Innovative AI Companies in the World, and was named Marketo's Technology Partner of the Year. The company officially expanded into the EMEA region with the opening of a London office. As of August 2019, Conversica has over 50 different integrations with third parties. In October Conversica won three awards at the fourth annual Global Annual Achievement Awards for Artificial Intelligence. Also that month, Alex Terry stepped down from his role as CEO and was replaced by Jim Kaskade. 2020: As part of Conversica's response to COVID-19, they optimized the business to become profitable in both 2Q20 and 3Q20, before reinvesting in 4Q20. The company transitioned both international operations in EMEA and LATAM to an indirect model with partners (LeadFabric and Nectia Cloud Solutions respectively), and moved a portion of its US-based employees to near-shore centers in Mexico and Brazil, effectively downsizing the company from 250 to 200. Conversica's reseller partner, Nectia, is a major Latin American affiliate and Chile's number one Salesforce partner, and, as part of the partnership, Nectia devoted capital to a brand new company segment, Predict-IA, dedicated to web-based artificial intelligent solutions. Predict-IA was able to immediately service all LATAM opportunities and clients with Conversica's AI Assistants with end-to-end services (marketing, sales, professional services, customer success, and technical support). Conversica's reseller partner, Leadfabric, has offices in Belgium, Amsterdam, Paris, UK, Taiwan, and Romania. == Technology == Conversica's Revenue Digital Assistants™ are AI assistants who engage with leads, prospects, customers, employees, and other persons of interest (Contacts) in a two-way human-like manner, via email, SMS text, and website chat, in English, French, German, Spanish, Portuguese, and Japanese. The RDAs are built on an Intelligent Automation platform that leverages natural language understanding, natural language processing, natural language generation, deep learning and machine learning. The Assistants are generally deployed alongside sales and marketing, customer success, account management, and higher education admissions teams, as part of an augmented workforce. The Intelligent Automation platform integrates with over 50 external systems, including CRM, Marketing Automation, and other systems of record. A partial list of integration partners includes: Salesforce, Marketo, Oracle, HubSpot, DealerSocket, Reynolds & Reynolds, CDK Global, VinSolutions and many more.

    Read more →
  • Text normalization

    Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example: "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan. "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed ‑e "s/\s+/ /g" inputfile would normalize runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. == Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.

    Read more →
  • Linked timestamping

    Linked timestamping

    Linked timestamping is a type of trusted timestamping where issued time-stamps are related to each other. Each time-stamp would contain data that authenticates the time-stamp before it, the authentication would be authenticating the entire message, including the previous time-stamps authentication, making a chain. This makes it impossible to add a time-stamp in to the middle of the chain, as any time-stamps afterwards would be different. == Description == Linked timestamping creates time-stamp tokens which are dependent on each other, entangled in some authenticated data structure. Later modification of the issued time-stamps would invalidate this structure. The temporal order of issued time-stamps is also protected by this data structure, making backdating of the issued time-stamps impossible, even by the issuing server itself. The top of the authenticated data structure is generally published in some hard-to-modify and widely witnessed media, like printed newspaper or public blockchain. There are no (long-term) private keys in use, avoiding PKI-related risks. Suitable candidates for the authenticated data structure include: Linear hash chain Merkle tree (binary hash tree) Skip list The simplest linear hash chain-based time-stamping scheme is illustrated in the following diagram: The linking-based time-stamping authority (TSA) usually performs the following distinct functions: Aggregation For increased scalability the TSA might group time-stamping requests together which arrive within a short time-frame. These requests are aggregated together without retaining their temporal order and then assigned the same time value. Aggregation creates a cryptographic connection between all involved requests; the authenticating aggregate value will be used as input for the linking operation. Linking Linking creates a verifiable and ordered cryptographic link between the current and already issued time-stamp tokens. Publishing The TSA periodically publishes some links, so that all previously issued time-stamp tokens depend on the published link and that it is practically impossible to forge the published values. By publishing widely witnessed links, the TSA creates unforgeable verification points for validating all previously issued time-stamps. == Security == Linked timestamping is inherently more secure than the usual, public-key signature based time-stamping. All consequential time-stamps "seal" previously issued ones - hash chain (or other authenticated dictionary in use) could be built only in one way; modifying issued time-stamps is nearly as hard as finding a preimage for the used cryptographic hash function. Continuity of operation is observable by users; periodic publications in widely witnessed media provide extra transparency. Tampering with absolute time values could be detected by users, whose time-stamps are relatively comparable by system design. Absence of secret keys increases system trustworthiness. There are no keys to leak and hash algorithms are considered more future-proof than modular arithmetic based algorithms, e.g. RSA. Linked timestamping scales well - hashing is much faster than public key cryptography. There is no need for specific cryptographic hardware with its limitations. The common technology for guaranteeing long-term attestation value of the issued time-stamps (and digitally signed data) is periodic over-time-stamping of the time-stamp token. Because of missing key-related risks and of the plausible safety margin of the reasonably chosen hash function this over-time-stamping period of hash-linked token could be an order of magnitude longer than of public-key signed token. == Research == === Foundations === Stuart Haber and W. Scott Stornetta proposed in 1990 to link issued time-stamps together into linear hash-chain, using a collision-resistant hash function. The main rationale was to diminish TSA trust requirements. Tree-like schemes and operating in rounds were proposed by Benaloh and de Mare in 1991 and by Bayer, Haber and Stornetta in 1992. Benaloh and de Mare constructed a one-way accumulator in 1994 and proposed its use in time-stamping. When used for aggregation, one-way accumulator requires only one constant-time computation for round membership verification. Surety started the first commercial linked timestamping service in January 1995. Linking scheme is described and its security is analyzed in the following article by Haber and Sornetta. Buldas et al. continued with further optimization and formal analysis of binary tree and threaded tree based schemes. Skip-list based time-stamping system was implemented in 2005; related algorithms are quite efficient. === Provable security === Security proof for hash-function based time-stamping schemes was presented by Buldas, Saarepera in 2004. There is an explicit upper bound N {\displaystyle N} for the number of time stamps issued during the aggregation period; it is suggested that it is probably impossible to prove the security without this explicit bound - the so-called black-box reductions will fail in this task. Considering that all known practically relevant and efficient security proofs are black-box, this negative result is quite strong. Next, in 2005 it was shown that bounded time-stamping schemes with a trusted audit party (who periodically reviews the list of all time-stamps issued during an aggregation period) can be made universally composable - they remain secure in arbitrary environments (compositions with other protocols and other instances of the time-stamping protocol itself). Buldas, Laur showed in 2007 that bounded time-stamping schemes are secure in a very strong sense - they satisfy the so-called "knowledge-binding" condition. The security guarantee offered by Buldas, Saarepera in 2004 is improved by diminishing the security loss coefficient from N {\displaystyle N} to N {\displaystyle {\sqrt {N}}} . The hash functions used in the secure time-stamping schemes do not necessarily have to be collision-resistant or even one-way; secure time-stamping schemes are probably possible even in the presence of a universal collision-finding algorithm (i.e. universal and attacking program that is able to find collisions for any hash function). This suggests that it is possible to find even stronger proofs based on some other properties of the hash functions. At the illustration above hash tree based time-stamping system works in rounds ( t {\displaystyle t} , t + 1 {\displaystyle t+1} , t + 2 {\displaystyle t+2} , ...), with one aggregation tree per round. Capacity of the system ( N {\displaystyle N} ) is determined by the tree size ( N = 2 l {\displaystyle N=2^{l}} , where l {\displaystyle l} denotes binary tree depth). Current security proofs work on the assumption that there is a hard limit of the aggregation tree size, possibly enforced by the subtree length restriction. == Standards == ISO 18014 part 3 covers 'Mechanisms producing linked tokens'. American National Standard for Financial Services, "Trusted Timestamp Management and Security" (ANSI ASC X9.95 Standard) from June 2005 covers linking-based and hybrid time-stamping schemes. There is no IETF RFC or standard draft about linking based time-stamping. RFC 4998 (Evidence Record Syntax) encompasses hash tree and time-stamp as an integrity guarantee for long-term archiving.

    Read more →
  • Textual case-based reasoning

    Textual case-based reasoning

    Textual case-based reasoning (TCBR) is a subtopic of case-based reasoning, in short CBR, a popular area in artificial intelligence. CBR suggests the ways to use past experiences to solve future similar problems, requiring that past experiences be structured in a form similar to attribute-value pairs. This leads to the investigation of textual descriptions for knowledge exploration whose output will be, in turn, used to solve similar problems. == Subareas == Textual case-base reasoning research has focused on: measuring similarity between textual cases mapping texts into structured case representations adapting textual cases for reuse automatically generating representations.

    Read more →
  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →