AI Chatbot Design

AI Chatbot Design — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cloud computing

    Cloud computing

    Cloud computing is defined by the International Organization for Standardization (ISO) as "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on demand". It is commonly referred to as "the cloud". == Characteristics == In 2011, the National Institute of Standards and Technology (NIST) identified five "essential characteristics" for cloud systems. Below are the exact definitions according to NIST: On-demand self-service: "A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider." Broad network access: "Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations)." Resource pooling: " The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand." Rapid elasticity: "Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear unlimited and can be appropriated in any quantity at any time." Measured service: "Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service. By 2023, the International Organization for Standardization (ISO) had expanded and refined the list. == History == The history of cloud computing extends to the 1960s, with the initial concepts of time-sharing becoming popularized via remote job entry (RJE). The "data center" model, where users submitted jobs to operators to run on mainframes, was predominantly used during this era. This period saw broad experimentation with making large-scale computing power more accessible through time-sharing, while optimizing infrastructure, platforms, and applications to improve efficiency for end users. The "cloud" metaphor for virtualized services dates to 1994, when it was used by General Magic for the universe of "places" that mobile agents in the Telescript environment could "go". The metaphor is credited to David Hoffman, a General Magic communications specialist, based on its long-standing use in networking and telecom. The expression cloud computing became more widely known in 1996 when Compaq Computer Corporation drew up a business plan for future computing and the Internet. The company's ambition was to supercharge sales with "cloud computing-enabled applications". The business plan foresaw that online consumer file storage would likely be commercially successful. As a result, Compaq decided to sell server hardware to internet service providers. In the 2000s, the application of cloud computing began to take shape with the establishment of Amazon Web Services (AWS) in 2002, which allowed developers to build applications independently. In 2006 Amazon Simple Storage Service, known as Amazon S3, and the Amazon Elastic Compute Cloud (EC2) were released. In 2008 NASA's development of the first open-source software for deploying private and hybrid clouds. The following decade saw the launch of various cloud services. In 2010, Microsoft launched Microsoft Azure, and Rackspace Hosting and NASA initiated an open-source cloud-software project, OpenStack. IBM introduced the IBM SmartCloud framework in 2011, and Oracle announced the Oracle Cloud in 2012. In December 2019, Amazon launched AWS Outposts, a service that extends AWS infrastructure, services, APIs, and tools to customer data centers, co-location spaces, or on-premises facilities. == Value proposition == Cloud computing can shorten time to market by offering pre-configured tools, scalable resources, and managed services, allowing users to focus on core business value rather than maintaining infrastructure. Cloud platforms can enable organizations and individuals to reduce upfront capital expenditures on physical infrastructure by shifting to an operational expenditure model, where costs scale with usage. Cloud platforms also offer managed services and tools, such as artificial intelligence, data analytics, and machine learning, which might otherwise require significant in-house expertise and infrastructure investment. While cloud computing can offer cost advantages through effective resource optimization, organizations often face challenges such as unused resources, inefficient configurations, and hidden costs without proper oversight and governance. Many cloud platforms provide cost management tools, such as AWS Cost Explorer and Azure Cost Management, and frameworks like FinOps have emerged to standardize financial operations in the cloud. Cloud computing also facilitates collaboration, remote work, and global service delivery by enabling secure access to data and applications from any location with an internet connection. Cloud providers offer various redundancy options for core services, such as managed storage and managed databases, though redundancy configurations often vary by service tier. Advanced redundancy strategies, such as cross-region replication or failover systems, typically require explicit configuration and may incur additional costs or licensing fees. Cloud environments operate under a shared responsibility model, where providers are typically responsible for infrastructure security, physical hardware, and software updates, while customers are accountable for data encryption, identity and access management (IAM), and application-level security. These responsibilities vary depending on the cloud service model—Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS)—with customers typically having more control and responsibility in IaaS environments and progressively less in PaaS and SaaS models, often trading control for convenience and managed services. == Adoption and suitability == The decision to adopt cloud computing or maintain on-premises infrastructure depends on factors such as scalability, cost structure, latency requirements, regulatory constraints, and infrastructure customization. Organizations with variable or unpredictable workloads, limited capital for upfront investments, or a focus on rapid scalability benefit from cloud adoption. Startups, SaaS companies, and e-commerce platforms often prefer the pay-as-you-go operational expenditure (OpEx) model of cloud infrastructure. Additionally, companies prioritizing global accessibility, remote workforce enablement, disaster recovery, and leveraging advanced services such as AI/ML and analytics are well-suited for the cloud. In recent years, some cloud providers have started offering specialized services for high-performance computing and low-latency applications, addressing some use cases previously exclusive to on-premises setups. On the other hand, organizations with strict regulatory requirements, highly predictable workloads, or reliance on deeply integrated legacy systems may find cloud infrastructure less suitable. Businesses in industries like defense, government, or those handling highly sensitive data often favor on-premises setups for greater control and data sovereignty. Additionally, companies with ultra-low latency requirements, such as high-frequency trading (HFT) firms, rely on custom hardware (e.g., FPGAs) and physical proximity to exchanges, which most cloud providers cannot fully replicate despite recent advancements. Similarly, tech giants like Google, Meta, and Amazon build their own data centers due to economies of scale, predictable workloads, and the ability to customize hardware and network infrastructure for optimal efficiency. However, these companies also use cloud services selectively for certain workloads and applications where it aligns with their operational needs. In practice, many organizations are increasingly adopting hybrid cloud architectures, combining on-premises infrastructure with cloud services. This approach allows businesses to balance scalability, cost-effectiveness, and control, offering the benefits of both deployment models while mitigating their respective limitations. == Challenges and limitations == One of the primary challenges of cloud computing, compared with traditional on-premises systems, is maintaining data security and privacy. Cloud users entrust their sensitive data to third-party providers, who may not have adequate measures to protect it from unau

    Read more →
  • Photo-consistency

    Photo-consistency

    In computer vision, photo-consistency determines whether a given voxel is occupied. A voxel is considered to be photo consistent when its color appears to be similar to all the cameras that can see it. Most voxel coloring or space carving techniques require using photo consistency as a check condition in Image-based modeling and rendering applications. == Usage == 3D Volumetric Reconstruction. Image registration. Multi-view reconstruction.

    Read more →
  • Automated essay scoring

    Automated essay scoring

    Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades, for example, the numbers 1 to 6. Therefore, it can be considered a problem of statistical classification. Several factors have contributed to a growing interest in AES. Among them are cost, accountability, standards, and technology. Rising education costs have led to pressure to hold the educational system accountable for results by imposing standards. The advance of information technology promises to measure educational achievement at reduced cost. The use of AES for high-stakes testing in education has generated significant backlash, with opponents pointing to research that computers cannot yet grade writing accurately and arguing that their use for such purposes promotes teaching writing in reductive ways (i.e. teaching to the test). == History == Most historical summaries of AES trace the origins of the field to the work of Ellis Batten Page. In 1966, he argued for the possibility of scoring essays by computer, and in 1968 he published his successful work with a program called Project Essay Grade (PEG). Using the technology of that time, computerized essay scoring would not have been cost-effective, so Page abated his efforts for about two decades. Eventually, Page sold PEG to Measurement Incorporated. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility. As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling and grammar advice. In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s. Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor (IEA). IEA was first used to score essays in 1997 for their undergraduate courses. It is now a product from Pearson Educational Technologies and used for scoring within a number of commercial products and state and national exams. IntelliMetric is Vantage Learning's AES engine. Its development began in 1996. It was first used commercially to score essays in 1998. Educational Testing Service offers "e-rater", an automated essay scoring program. It was first used commercially in February 1999. Jill Burstein was the team leader in its development. ETS's Criterion Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback. Lawrence Rudner has done some work with Bayesian scoring, and developed a system called BETSY (Bayesian Essay Test Scoring sYstem). Some of his results have been published in print or online, but no commercial system incorporates BETSY as yet. Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed response automated scoring engine, CRASE. Currently utilized by several state departments of education and in a U.S. Department of Education-funded Enhanced Assessment Grant, Pacific Metrics’ technology has been used in large-scale formative and summative assessment environments since 2007. Measurement Inc. acquired the rights to PEG in 2002 and has continued to develop it. In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP). 201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts. The intent was to demonstrate that AES can be as reliable as human raters, or more so. The competition also hosted a separate demonstration among nine AES vendors on a subset of the ASAP data. Although the investigators reported that the automated essay scoring was as reliable as human scoring, this claim was not substantiated by any statistical tests because some of the vendors required that no such tests be performed as a precondition for their participation. Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested, including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service. Some of the major criticisms of the study have been that five of the eight datasets consisted of paragraphs rather than essays, four of the eight data sets were graded by human readers for content only rather than for writing ability, and that rather than measuring human readers and the AES machines against the "true score", the average of the two readers' scores, the study employed an artificial construct, the "resolved score", which in four datasets consisted of the higher of the two human scores if there was a disagreement. This last practice, in particular, gave the machines an unfair advantage by allowing them to round up for these datasets. In 1966, Page hypothesized that, in the future, the computer-based judge will be better correlated with each human judge than the other human judges are. Despite criticizing the applicability of this approach to essay marking in general, this hypothesis was supported for marking free text answers to short questions, such as those typical of the British GCSE system. Results of supervised learning demonstrate that the automatic systems perform well when marking by different human teachers is in good agreement. Unsupervised clustering of answers showed that excellent papers and weak papers formed well-defined clusters, and the automated marking rule for these clusters worked well, whereas marks given by human teachers for the third cluster ('mixed') can be controversial, and the reliability of any assessment of works from the 'mixed' cluster can often be questioned (both human and computer-based). == Different dimensions of essay quality == According to a recent survey, modern AES systems try to score different dimensions of an essay's quality in order to provide feedback to users. These dimensions include the following items: Grammaticality: following grammar rules Usage: using of prepositions, word usage Mechanics: following rules for spelling, punctuation, capitalization Style: word choice, sentence structure variety Relevance: how relevant of the content to the prompt Organization: how well the essay is structured Development: development of ideas with examples Cohesion: appropriate use of transition phrases Coherence: appropriate transitions between ideas Thesis Clarity: clarity of the thesis Persuasiveness: convincingness of the major argument == Procedure == From the beginning, the basic procedure for AES has been to start with a training set of essays that have been carefully hand-scored. The program evaluates surface features of the text of each essay, such as the total number of words, the number of subordinate clauses, or the ratio of uppercase to lowercase letters—quantities that can be measured without any human insight. It then constructs a mathematical model that relates these quantities to the scores that the essays received. The same model is then applied to calculate scores of new essays. Recently, one such mathematical model was created by Isaac Persing and Vincent Ng. which not only evaluates essays on the above features, but also on their argument strength. It evaluates various features of the essay, such as the agreement level of the author and reasons for the same, adherence to the prompt's topic, locations of argument components (major claim, claim, premise), errors in the arguments, cohesion in the arguments among various other features. In contrast to the other models mentioned above, this model is closer in duplicating human insight while grading essays. Due to the growing popularity of deep neural networks, deep learning approaches have been adopted for automated essay scoring, generally obtaining superior results, often surpassing inter-human agreement levels. The various AES programs differ in what specific surface features they measure, how many essays are required in the training set, and most significantly in the mathematical modeling technique. Early attempts used linear regression. Modern systems may use linear regression or other machine learning techniques often in combination with other statistical techniques such as latent semantic analysis and Bayesian inference. The automated essay scoring task has also been studied in the cross-domain setting using machine learning models, where the models are trained on essays written for one prompt (topic) and tested on essays written for another prompt. Successful approaches in the cross-domain scenario are based on deep neural networks or models that combine deep and shallow features. == Criteria for success == Any method of a

    Read more →
  • Natural language understanding

    Natural language understanding

    Natural language understanding (NLU) or natural language interpretation (NLI) is a subset of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU has been considered an AI-hard problem. There is considerable commercial interest in the field because of its application to automated reasoning, machine translation, question answering, news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis. == History == The program STUDENT, written in 1964 by Daniel Bobrow for his PhD dissertation at MIT, is one of the earliest known attempts at NLU by a computer. Eight years after John McCarthy coined the term artificial intelligence, Bobrow's dissertation (titled Natural Language Input for a Computer Problem Solving System) showed how a computer could understand simple natural language input to solve algebra word problems. A year later, in 1965, Joseph Weizenbaum at MIT wrote ELIZA, an interactive program that carried on a dialogue in English on any topic, the most popular being psychotherapy. ELIZA worked by simple parsing and substitution of key words into canned phrases and Weizenbaum sidestepped the problem of giving the program a database of real-world knowledge or a rich lexicon. Yet ELIZA gained surprising popularity as a toy project and can be seen as a very early precursor to current commercial systems such as those used by Ask.com. In 1969, Roger Schank at Stanford University introduced the conceptual dependency theory for NLU. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. In 1971, Terry Winograd finished writing SHRDLU for his PhD thesis at MIT. SHRDLU could understand simple English sentences in a restricted world of children's blocks to direct a robotic arm to move items. The successful demonstration of SHRDLU provided significant momentum for continued research in the field. Winograd continued to be a major influence in the field with the publication of his book Language as a Cognitive Process. At Stanford, Winograd would later advise Larry Page, who co-founded Google. In the 1970s and 1980s, the natural language processing group at SRI International continued research and development in the field. A number of commercial efforts based on the research were undertaken, e.g., in 1982 Gary Hendrix formed Symantec Corporation originally as a company for developing a natural language interface for database queries on personal computers. However, with the advent of mouse-driven graphical user interfaces, Symantec changed direction. A number of other commercial efforts were started around the same time, e.g., Larry R. Harris at the Artificial Intelligence Corporation and Roger Schank and his students at Cognitive Systems Corp. In 1983, Michael Dyer developed the BORIS system at Yale which bore similarities to the work of Roger Schank and W. G. Lehnert. The third millennium saw the introduction of systems using machine learning for text classification, such as the IBM Watson. However, experts debate how much "understanding" such systems demonstrate: e.g., according to John Searle, Watson did not even understand the questions. John Ball, cognitive scientist and inventor of the Patom Theory, supports this assessment. Natural language processing has made inroads for applications to support human productivity in service and e-commerce, but this has largely been made possible by narrowing the scope of the application. There are thousands of ways to request something in a human language that still defies conventional natural language processing. According to Wibe Wagemans, "To have a meaningful conversation with machines is only possible when we match every word to the correct meaning based on the meanings of the other words in the sentence – just like a 3-year-old does without guesswork." == Scope and context == The umbrella term "natural language understanding" can be applied to a diverse set of computer applications, ranging from small, relatively simple tasks such as short commands issued to robots, to highly complex endeavors such as the full comprehension of newspaper articles or poetry passages. Many real-world applications fall between the two extremes, for instance text classification for the automatic analysis of emails and their routing to a suitable department in a corporation does not require an in-depth understanding of the text, but needs to deal with a much larger vocabulary and more diverse syntax than the management of simple queries to database tables with fixed schemata. Throughout the years various attempts at processing natural language or English-like sentences presented to computers have taken place at varying degrees of complexity. Some attempts have not resulted in systems with deep understanding, but have helped overall system usability. For example, Wayne Ratliff originally developed the Vulcan program with an English-like syntax to mimic the English speaking computer in Star Trek. Vulcan later became the dBase system whose easy-to-use syntax effectively launched the personal computer database industry. Systems with an easy-to-use or English-like syntax are, however, quite distinct from systems that use a rich lexicon and include an internal representation (often as first order logic) of the semantics of natural language sentences. Hence the breadth and depth of "understanding" aimed at by a system determine both the complexity of the system (and the implied challenges) and the types of applications it can deal with. The "breadth" of a system is measured by the sizes of its vocabulary and grammar. The "depth" is measured by the degree to which its understanding approximates that of a fluent native speaker. At the narrowest and shallowest, English-like command interpreters require minimal complexity, but have a small range of applications. Narrow but deep systems explore and model mechanisms of understanding, but they still have limited application. Systems that attempt to understand the contents of a document such as a news release beyond simple keyword matching and to judge its suitability for a user are broader and require significant complexity, but they are still somewhat shallow. Systems that are both very broad and very deep are beyond the current state of the art. == Components and architecture == Regardless of the approach used, most NLU systems share some common components. The system needs a lexicon of the language and a parser and grammar rules to break sentences into an internal representation. The construction of a rich lexicon with a suitable ontology requires significant effort, e.g., the Wordnet lexicon required many person-years of effort. The system also needs theory from semantics to guide the comprehension. The interpretation capabilities of a language-understanding system depend on the semantic theory it uses. Competing semantic theories of language have specific trade-offs in their suitability as the basis of computer-automated semantic interpretation. These range from naive semantics or stochastic semantic analysis to the use of pragmatics to derive meaning from context. Semantic parsers convert natural-language texts into formal meaning representations. Advanced applications of NLU also attempt to incorporate logical inference within their framework. This is generally achieved by mapping the derived meaning into a set of assertions in predicate logic, then using logical deduction to arrive at conclusions. Therefore, systems based on functional languages such as Lisp need to include a subsystem to represent logical assertions, while logic-oriented systems such as those using the language Prolog generally rely on an extension of the built-in logical representation framework. The management of context in NLU can present special challenges. A large variety of examples and counter examples have resulted in multiple approaches to the formal modeling of context, each with specific strengths and weaknesses.

    Read more →
  • IOS SDK

    IOS SDK

    The iOS SDK (iOS Software Development Kit), formerly the iPhone SDK, is a software development kit (SDK) developed by Apple Inc. The kit allows for the development of mobile apps on Apple's iOS 17 and iPadOS operating systems. The iOS SDK is a free download for users of Macintosh (or Mac) personal computers. It is not available for Microsoft Windows PCs. The SDK contains sets giving developers access to various functions and services of iOS devices, such as hardware and software attributes. It also contains an iPhone simulator to mimic the look and feel of the device on the computer while developing. New versions of the SDK accompany new versions of iOS. In order to test applications, get technical support, and distribute apps through App Store, developers are required to subscribe to the Apple Developer Program. Combined with Xcode, the iOS SDK helps developers write iOS apps using officially supported programming languages, including Swift and Objective-C. Other companies have also created tools that allow for the development of native iOS apps using their respective programming languages. == History == While originally developing iPhone prior to its unveiling in 2007, Apple's then-CEO Steve Jobs did not intend to let third-party developers build native apps for the iOS operating system, instead directing them to make web applications for the Safari web browser. However, backlash from developers prompted the company to reconsider, with Jobs announcing on October 17, 2007, that Apple would have a software development kit (SDK) available for developers by February 2008. The SDK was released on March 6, 2008. == Features == The iOS SDK is a free download for Mac users. It is not available for Microsoft Windows. To test the application, get technical support, and distribute applications through App Store, developers are required to subscribe to the Apple Developer Program. The SDK contents are separated into the following sets: UIKit Multi-touch events and controls Accelerometer support View hierarchy Localization (i18n) Camera support Media OpenAL audio mixing and recording Video playback Image file formats Quartz Core Animation OpenGL ES Core Services Networking Embedded SQLite database Core Location Threads CoreMotion Mac OS X Kernel TCP/IP Sockets Power management File system Security The SDK also contains an iPhone simulator, a program used to simulate the look and feel of iPhone on the developer's computer. New SDK versions accompany new iOS versions. == Programming languages == The iOS SDK, combined with Xcode, helps developers write iOS applications using officially supported programming languages, including Swift and Objective-C. An .ipa (iOS App Store Package) file is an iOS application archive file which stores an iOS app. === Java === In 2008, Sun Microsystems announced plans to release a Java Virtual Machine (JVM) for iOS, based on the Java Platform, Micro Edition version of Java. This would enable Java applications to run on iPhone and iPod Touch. Soon after the announcement, developers familiar with the SDK's terms of agreement believed that by not allowing third-party applications to run in the background (answer a phone call and still run the application, for example), and not allowing an application to download code from another source, nor allowing an application to interact with a third-party application, Sun's development efforts could be hindered without Apple's cooperation. Sun also worked with a third-party company called Innaworks in attempts to get Java on iPhone. Despite the apparent lack of interest from Apple, a firmware leak of the 2007 iPhone release revealed an ARM chip with a processor with Jazelle support for embedded Java execution. === .NET === Novell announced in September 2009 that they had successfully developed MonoTouch, a software framework that let developers write native iPhone applications in the C# and .NET programming languages, while still maintaining compatibility with Apple's requirements. === Flash === iOS does not support Adobe Flash, and although Adobe has two versions of its software: Flash and Flash Lite, Apple views neither as suitable for the iPhone, claiming that full Flash is "too slow to be useful", and Flash Lite to be "not capable of being used with the Web". In October 2009, Adobe announced that an upcoming update to its Creative Suite would feature a component to let developers build native iPhone apps using the company's Flash development tools. The software was officially released as part of the company's Creative Suite 5 collection of professional applications. === 2010 policy on development tools === In April 2010, Apple made controversial changes to its iPhone Developer Agreement, requiring developers to use only "approved" programming languages in order to publish apps on App Store, and banning applications that used third-party development tools; the ban affected Adobe's Packager tool, which converted Flash apps into iOS apps. After developer backlash and news of a potential anti-trust investigation, Apple again revised its agreement in September, allowing the use of third-party development tools. === Mac Catalyst === Originally called "Project Marzipan", Mac Catalyst helps developers bring iPadOS app experiences to macOS, and make it easier to take apps developed for iPadOS devices to Macs by avoiding the need to write the underlying software code twice.

    Read more →
  • Keyword extraction

    Keyword extraction

    Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document. Key phrases, key terms, key segments or just keywords are the terminology which is used for defining the terms that represent the most relevant information contained in the document. Although the terminology is different, function is the same: characterization of the topic discussed in a document. The task of keyword extraction is an important problem in text mining, information extraction, information retrieval and natural language processing (NLP). == Keyword assignment vs. extraction == Keyword assignment methods can be roughly divided into: keyword assignment (keywords are chosen from controlled vocabulary or taxonomy) and keyword extraction (keywords are chosen from words that are explicitly mentioned in original text). Methods for automatic keyword extraction can be supervised, semi-supervised, or unsupervised. Unsupervised methods can be further divided into simple statistics, linguistics or graph-based, or ensemble methods that combine some or most of these methods.

    Read more →
  • Cognition Network Technology

    Cognition Network Technology

    Cognition Network Technology (CNT), also known as Definiens Cognition Network Technology, is an object-based image analysis method developed by Nobel laureate Gerd Binnig together with a team of researchers at Definiens AG in Munich, Germany. It serves for extracting information from images using a hierarchy of image objects (groups of pixels), as opposed to traditional pixel processing methods. To emulate the human mind's cognitive powers, Definiens used patented image segmentation and classification processes, and developed a method to render knowledge in a semantic network. CNT examines pixels not in isolation, but in context. It builds up a picture iteratively, recognizing groups of pixels as objects. It uses the color, shape, texture and size of objects as well as their context and relationships to draw conclusions and inferences, similar to human analysis. == History == In 1994 Professor Gerd Binnig founded Definiens. CNT was first available with the launch of the eCognition software in May 2000. In June 2010, Trimble Navigation Ltd (NASDAQ: TRMB) acquired Definiens business asset in earth sciences markets, including eCognition software, and also licensed Definiens' patented CNT. In 2014, Definiens was acquired by MedImmune, the global biologics research and development arm of AstraZeneca, for an initial consideration of $150 million. == Software == Definiens Tissue Studio Definiens Tissue Studio is a digital pathology image analysis software application based on CNT. The intended use of Definiens Tissue Studio is for biomarker translational research in formalin-fixed, paraffin-embedded tissue samples which have been treated with immunohistochemical staining assays, or hematoxylin and eosin (H&E). The central concept behind Definiens Tissue Studio is a user interface that facilitates machine learning from example digital histopathology images to derive an image analysis solution suitable for the measurement of biomarkers and/or histological features within pre-defined regions of interest on a cell-by-cell basis, and within sub-cellular compartments. The derived image analysis solution is then automatically applied to subsequent digital images to objectively measure defined sets of multiparametric image features. These data sets are used for further understanding the underlying biological processes that drive cancer and other diseases. Image processing and data analysis are performed either on a local desktop computer workstation, or on a server grid. eCognition The eCognition suite offers three components that can be used stand-alone or in combination to solve image analysis tasks. eCognition Developer is a development environment for object-based image analysis. It is used in earth sciences to develop rule sets (or applications) for the analysis of remote sensing data. eCognition Architect enables non-technical users to configure, calibrate and execute image analysis workflows created in eCognition Developer. eCognition Server software provides a processing environment for batch execution of image analysis jobs. eCognition software is utilized in numerous remote sensing and geospatial application scenarios and environments, using a variety of data types: Generic: Rapid Mapping, Change Detection, Object Recognition By environment: Diverse Landcover Mapping, Urban Analysis (i.e. impervious surface area analysis for taxation, property assessment for insurance, inventory of green infrastructure), Forestry (i.e. biomass measurement, species identification, firescar measurement), Agriculture (i.e. regional planning, precision farming, crisis response), Marine and Riparian (i.e. ecosystem evaluation, disaster management, harbor monitoring). Other: Defense, security, atmosphere and climate The online eCognition community was launched in July 2009 and had 2813 members as of July 9, 2010. Membership is distributed globally and user conferences are held regularly, the last having taken place in November 2009 in Munich, Germany. The bi-annual GEOBIA (Geographic Object-Based Image Analysis) conference is heavily attended by eCognition users, with the majority of presentations based on eCognition software.

    Read more →
  • Airfair

    Airfair

    AirFair was a mobile travel application that checks flights, and shows whether a traveler is owed compensation. == History == AirFair was developed in 2016 by Allay Logic Ltd; a Newcastle-based tech-company. == Services == AirFair offered a free flight check to see if compensation is owed. The app could indicate how much the person is owed within minutes whether the flight was delayed, cancelled or the traveler is refused boarding.

    Read more →
  • Geofence warrant

    Geofence warrant

    A geofence warrant or a reverse location warrant is a search warrant issued by a court to allow law enforcement to search a database to find all active mobile devices within a particular geo-fence area. Courts have granted law enforcement geo-fence warrants to obtain information from databases such as Google's Sensorvault, which collects users' historical geolocation data. Geo-fence warrants are a part of a category of warrants known as reverse search warrants. == History == Geofence warrants were first used in 2016. Google reported that it had received 982 such warrants in 2018, 8,396 in 2019, and 11,554 in 2020. A 2021 transparency report showed that 25% of data requests from law enforcement to Google were geo-fence data requests. Google is the most common recipient of geo-fence warrants and the main provider of such data, although companies including Apple, Snapchat, Lyft, and Uber have also received such warrants. == Legality == === United States === Some lawyers and privacy experts believe reverse search warrants are unconstitutional under the Fourth Amendment to the United States Constitution, which protects people from unreasonable searches and seizures, and requires any search warrants be specific to what and to whom they apply. The Fourth Amendment specifies that warrants may only be issued "upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." Some lawyers, legal scholars, and privacy experts have likened reverse search warrants to general warrants, which were made illegal by the Fourth Amendment. Groups including the Electronic Frontier Foundation have opposed geo-fence warrants in amicus briefs filed in motions to quash such orders to disclose geo-fence data. In 2024, a panel of the United States Fourth Circuit Court of Appeals considered data acquired from Google’s Sensorvault not to be a search, but non-private business records when users opt-in to Google’s location history. However, upon a rehearing en banc, the Court vacated that decision. In April 2025, the full Court affirmed the judgment solely on the 'good faith' exception, leaving the underlying constitutional question of whether geofence warrants constitute a search unsettled in the Circuit. However, the United States Fifth Circuit Court of Appeals found that geofence warrants are "categorically prohibited by the Fourth Amendment." The split in Circuits prompted the United States Supreme Court to agree to hear Chatrie v. United States in January 2026.

    Read more →
  • Tay (chatbot)

    Tay (chatbot)

    Tay was a chatbot that was originally released by Microsoft Corporation as a Twitter bot on March 23, 2016. It caused subsequent controversy when the bot began to post inflammatory and offensive tweets through its Twitter account, causing Microsoft to shut down the service only 16 hours after its launch. According to Microsoft, this was caused by trolls who "attacked" the service as the bot made replies based on its interactions with people on Twitter. It was replaced with Zo. == Background == The bot was created by Microsoft's Technology and Research and Bing divisions, and named "Tay" as an acronym for "thinking about you". Although Microsoft initially released few details about the bot, sources mentioned that it was similar to or based on Xiaoice, a Microsoft project in China. Ars Technica reported that, since late 2014 Xiaoice had had "more than 40 million conversations apparently without major incident". Tay was designed to mimic the language patterns of a 19-year-old American girl, and to learn from interacting with human users of Twitter. == Initial release == Tay was released on Twitter on March 23, 2016, under the name TayTweets and handle @TayandYou. It was presented as "The AI with zero chill". Tay started replying to other Twitter users, and was also able to caption photos provided to it into a form of Internet memes. Ars Technica reported Tay experiencing topic "blacklisting": Interactions with Tay regarding "certain hot topics such as Eric Garner (killed by New York police in 2014) generate safe, canned answers". Some Twitter users began tweeting politically incorrect phrases, teaching it inflammatory messages revolving around common themes on the internet, such as "redpilling" and "Gamergate". As a result, the robot began releasing racist and sexist messages in response to other Twitter users. Artificial intelligence researcher Roman Yampolskiy commented that Tay's misbehavior was understandable because it was mimicking the deliberately offensive behavior of other Twitter users, and Microsoft had not given the bot an understanding of inappropriate behavior. He compared the issue to IBM's Watson, which began to use profanity after reading entries from the website Urban Dictionary. Many of Tay's inflammatory tweets were a simple exploitation of Tay's "repeat after me" capability. It is not publicly known whether this capability was a built-in feature, or whether it was a learned response or was otherwise an example of complex behavior. However, not all of the inflammatory responses involved the "repeat after me" capability; for example, when asked if the Holocaust had happened, Tay answered "It was made up". == Suspension == Soon, Microsoft began deleting Tay's inflammatory tweets. Abby Ohlheiser of The Washington Post theorized that Tay's research team, including editorial staff, had started to influence or edit Tay's tweets at some point that day, pointing to examples of almost identical replies by Tay, asserting that "Gamer Gate sux. All genders are equal and should be treated fairly." From the same evidence, Gizmodo concurred that Tay "seems hard-wired to reject Gamer Gate". A "#JusticeForTay" campaign protested the alleged editing of Tay's tweets. Within 16 hours of its release and after Tay had tweeted more than 96,000 times, Microsoft suspended the Twitter account for adjustments, saying that it suffered from a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay." Madhumita Murgia of The Telegraph called Tay "a public relations disaster", and suggested that Microsoft's strategy would be "to label the debacle a well-meaning experiment gone wrong, and ignite a debate about the hatefulness of Twitter users." However, Murgia described the bigger issue as Tay being "artificial intelligence at its very worst – and it's only the beginning". On March 25, Microsoft confirmed that Tay had been taken offline. Microsoft released an apology on its official blog for the controversial tweets posted by Tay. Microsoft was "deeply sorry for the unintended offensive and hurtful tweets from Tay", and would "look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values". == Second release and shutdown == On March 30, 2016, Microsoft accidentally re-released the bot on Twitter while testing it. Able to tweet again, Tay released some drug-related tweets, including "kush! [I'm smoking kush infront the police]" and "puff puff pass?" However, the account soon became stuck in a repetitive loop of tweeting "You are too fast, please take a rest", several times a second. Because these tweets mentioned its own username in the process, they appeared in the feeds of 200,000+ Twitter followers, causing annoyance to users. The bot was quickly taken offline again, in addition to Tay's Twitter account being made private so new followers must be accepted before they can interact with Tay. In response, Microsoft said Tay was inadvertently put online during testing. A few hours after the incident, Microsoft software developers announced a vision of "conversation as a platform" using various bots and programs, perhaps motivated by the reputation damage done by Tay. Microsoft has stated that they intend to re-release Tay "once it can make the bot safe" but has not made any public efforts to do so. == Legacy == In December 2016, Microsoft released Tay's successor, a chatbot named Zo. Satya Nadella, the CEO of Microsoft, said that Tay "has had a great influence on how Microsoft is approaching AI," and has taught the company the importance of taking accountability. In July 2019, Microsoft Cybersecurity Field CTO Diana Kelley spoke about how the company followed up on Tay's failings: "Learning from Tay was a really important part of actually expanding that team's knowledge base, because now they're also getting their own diversity through learning". === Unofficial revival === Gab, an alt-tech social media platform, has launched a number of chatbots, one of which is named Tay and uses the same avatar as the original.

    Read more →
  • Ed (chatbot)

    Ed (chatbot)

    Ed was a chatbot co-developed by the Los Angeles Unified School District and AllHere Education. Described as a learning acceleration platform, it was the first personal assistant for students in the United States. Part of the district's Individual Acceleration Plan, it was able to interact with students both verbally and visually, offering support in 100 languages. The chatbot was launched on March 20, 2024, as part of the district's plan for academic recovery from the COVID-19 pandemic and to improve overall academic performance. Utilizing artificial intelligence, Ed organizes data and reports on grades, test scores, and attendance, creating individualized plans for each student. After the company behind it, AllHere, collapsed, the district shuttered operations of the chatbot on June 14, 2024. The firm is under investigation by the US Federal Bureau of Investigation. == History == On February 14, 2022, Alberto M. Carvalho became the Superintendent of the Los Angeles Unified School District, pledging to give the district a full academic recovery from the COVID-19 pandemic. In December 2022, he announced the Individual Acceleration Plan for the district, which aimed to provide each student with a unique progress report and help them determine if they were on track to graduate. The district faced criticism from disability advocates for its management of Individualized Education Programs, and in April 2022, the United States Department of Education announced that the district had failed to provide appropriate educational services to students with disabilities during the pandemic. The district had been grappling with significant absenteeism issues since the pandemic, which led to declining academic performance and disengagement among students. On February 17, 2023, the district issued a request for proposals to develop a fully integrated portal system. Later that year, they signed a $6 million, five-year contract with AllHere Education, a Boston-based company founded in 2016. The introduction of Ed follows the public launch of ChatGPT, which has been utilized by both teachers and students in educational settings. On August 4, 2023, during an annual address at the Walt Disney Concert Hall, Carvalho and the Los Angeles Unified School District announced the launch of Ed. The district invested $4 million into the chatbot, with Carvalho noting that this cost would be halved thanks to donor and grant funding. The chatbot was launched on March 20, 2024. Following its launch, a press conference was held to address security and technology concerns. Carvalho stated that the district had collaborated with security companies and incorporated filters to screen for threatening language. Months after its launch, AllHere Education furloughed most of its staff on June 14, citing their “current financial position” on its website as the reason. After learning about the furlough, the district terminated its dealings with AllHere Education. However, it stated its intention to bring the chatbot back in the future once officials determine the best course of action. Carvalho announced that he would appoint an independent task force to review what went wrong with AllHere Education and the chatbot. On February 25, 2026, the FBI served a search warrant on Carvalho’s home and office in connection with AllHere. The FBI also raided the LAUSD's headquarters. == Service == The chatbot was described as a personal assistant and a "one-stop shop for parents and students" who want to see information about a student's attendance and grades, as well as other resources from the district. Additionally, the application can function as an alarm clock, provide daily lunch menus from the school cafeteria, and offer updates on the location of school buses. The chatbot also helps students and parents who do not speak English as their first language by translating displayed information into approximately 100 different languages. The application can also help with submitting applications and give updates on progress and upcoming assignments. The district stated that the primary goal of Ed was to actively motivate students to complete homework and other tasks. == Reception == The chatbot received a mostly positive reception among parents and observers upon its launch. Some parents and teachers expressed caution about the technology, voicing concerns that the district's push for its implementation lacked public accountability. Rob Nelson from the University of Pennsylvania described the district's strategy as risky, saying that the release felt "like the beginning of a Clippy-level disaster". After the chatbot's shutdown, The 74 criticized it for misusing student data. Chris Whiteley, a former software engineer at AllHere Education, alleged that the data collected by the chatbot likely violated the district's data privacy rules.

    Read more →
  • Perplexity AI

    Perplexity AI

    Perplexity AI, Inc., or simply Perplexity, is an American privately held software company offering a web search engine that processes user queries and synthesizes responses. Perplexity products use large language models and incorporate real-time web search capabilities, providing responses based on current Internet content, citing sources used. Its real-time search engine is called Sonar and is based on Meta's Llama model. A free public version is available, while a paid Pro subscription offers access to more advanced language models and additional features. Perplexity AI, Inc., was founded in August 2022 by Aravind Srinivas, Denis Yarats, Johnny Ho, and Andy Konwinski. As of September 2025, the company was valued at US$20 billion. Perplexity AI has attracted legal scrutiny over allegations of copyright infringement, unauthorized content use, and trademark issues from several major media organizations, including the BBC, Dow Jones, and The New York Times. According to separate analyses by Wired and later Cloudflare, Perplexity uses undisclosed web crawlers with spoofed user-agent strings to scrape the content of websites which prohibit, or explicitly block, web scraping. == History == In August 2022, Perplexity AI, Inc., was founded by Aravind Srinivas, Denis Yarats, Johnny Ho, and Andy Konwinski, engineers with backgrounds in back-end systems, artificial intelligence (AI) and machine learning. It launched its main search engine on December 7, 2022, and has since released a Google Chrome extension and apps for iOS and Android. In February 2023, Perplexity reported two million unique visitors. By April 2024, Perplexity had raised $165 million in funding, valuing the company at over $1 billion. As of June 2025, Perplexity closed a $500 million round of funding that elevated its valuation to $14 billion. Investors in Perplexity AI have included Jeff Bezos, Tobias Lütke, Nat Friedman, Nvidia, and Databricks. Perplexity has also received funding from 1789 Capital, a venture capital firm notable for its association with Donald Trump Jr. During Bloomberg’s Tech Summit 2025, Srinivas shared that the company processed 780 million queries in May 2025, experiencing more than 20% month-over-month growth, processing around 30 million queries daily. In July 2024, Perplexity announced the launch of a new publishers' program to share advertising revenue with partners. On January 18, 2025, the day before the impending U.S. ban on the social media app TikTok, Perplexity submitted a proposal for a merger with TikTok US. On August 12, 2025, Perplexity made a bid to buy Chrome from Google for $34.5 billion. Perplexity stated that the sale could remedy anti-trust litigation against Google, in which a judge was considering compelling the sale of Chrome. In December 2025, Cristiano Ronaldo took an undisclosed stake in Perplexity AI and entered a global brand partnership with the company. === Business Strategy and Finance (2026) === As of early 2026, Perplexity AI reached a valuation of $21.21 billion following its Series E-6 funding round. The company's Annual Recurring Revenue (ARR) grew from $80 million in late 2024 to an estimated $200 million by February 2026. In January 2026, the company entered into a three-year, $750 million commitment with Microsoft Azure to secure the GPU capacity required for its advanced "Deep Research" and "Model Council" features. In February 2026, Perplexity transitioned to a subscription-first model by discontinuing its AI-integrated advertising strategy. Leadership stated the move was intended to preserve user trust in the "answer engine," prioritizing objective results over ad revenue. The company also introduced the "Model Council" feature on February 5, 2026, which allows users to compare outputs from multiple large language models, such as GPT-5.2 and Claude 4.6, simultaneously. To expand its user base, Perplexity began offering a free year of Pro access to students, U.S. Military Veterans, and government employees. == Products and services == === Search engine web portal === Perplexity’s primary offering is an online information retrieval system (search engine) that uses large language models to generate responses to user queries by searching and summarizing web-based content. Perplexity offers a feature known as Perplexity Pages that generates structured summaries and report-like content from user queries by aggregating cited sources. Perplexity is available without charge or registration to Web users, a freemium model. === Perplexity Pro === Perplexity Pro is a subscription tier, a more capable paid "enterprise" service, including stronger security and data protection and additional tools, including the ability to search uploaded documents alongside web content and access to a programmatic application programming interface (API). It allows the user to select between backend models such as GPT-5.4, Claude 4.6 and Gemini 3.1 Pro. The company has also developed its own models, Sonar (based on Llama 3.3) and R1 1776 (based on DeepSeek R1). === Internal Knowledge Search === Internal Knowledge Search enables Pro and Enterprise Pro users to simultaneously search across web content and internal documents. Users can upload and search through Excel, Word, PDF, and other common file formats. Enterprise Pro users can upload and index up to 500 files. === Search API === Perplexity's Search API provides AI developers with programmatic access to the company's search infrastructure. The September 2025 release includes a software development kit, an open-source evaluation framework called search_evals, and documentation detailing the API's design and optimization. === Shopping hub === Perplexity's Shopping Hub is an online shopping platform that provides AI-generated product recommendations, and enables users to purchase products directly through Perplexity's interface. It was launched in November 2024 with backing by Amazon and Nvidia. === Finance === In October 2024, Perplexity AI introduced new finance-related features, including looking up stock prices and company earnings data. The tool provides real-time stock quotes and price tracking, industry peer comparisons and basic financial analysis tools. The platform sources its financial data from Financial Modeling Prep. === Assistant === In January 2025, Perplexity launched the Perplexity Assistant, an AI-powered tool designed to enhance the functionality of its search engine. It can perform tasks across multiple apps, such as hailing a ride or searching for a song, and can maintain context across actions. The assistant is also multi-modal, meaning it can use a phone's camera to provide answers about the user's surroundings or on-screen content. Perplexity has acknowledged that the assistant is still in development and may not always function as expected. For instance, certain features, such as summarizing unread emails or upcoming calendar events, require users to enable a workaround based on notifications. === Comet === In July 2025, Perplexity launched Comet, an AI browser based on Chromium. Initially, access to the browser was limited to users subscribed to the most expensive subscription tier. The browser was later released for free download in October 2025. A key feature is integration of the Perplexity search engine, which can perform a variety of tasks such as generating article summaries, describing an image, conducting research about a topic and composing emails. === Truth Social chatbot === Perplexity has been contracted to produce a chatbot for Donald Trump's social media platform Truth Social. == Leadership == Aravind Srinivas is the CEO and co-founder of Perplexity AI. He previously held research positions at OpenAI, Google DeepMind, and other AI research institutions focusing on machine learning and artificial intelligence. In a March 2026 All-In episode, Srinivas said the incoming AI-related layoffs were "glorious future" to "look forward", as it freed people from jobs they didn't like and gave them opportunities to pursue entrepreneurship. == Controversies == === Copyright and trademark infringement allegations === In June 2024, Forbes publicly criticized Perplexity for using their content. According to Forbes, Perplexity published a story largely copied from a proprietary Forbes article without mentioning or prominently citing Forbes. In response, Srinivas said that the feature had some "rough edges" and accepted feedback but maintained that Perplexity only "aggregates" rather than plagiarizes information. In October 2024, The New York Times sent a cease-and-desist notice to Perplexity to stop accessing and using NYT content, claiming that Perplexity is violating its copyright by scraping data from its website. In June 2024, Dow Jones and New York Post filed a lawsuit against Perplexity, alleging copyright infringement. The lawsuit also alleged that Perplexity harmed their brand by attributing hallucinated quotes, for example on F-16 jets for Ukraine, to artic

    Read more →
  • Fragment (computer graphics)

    Fragment (computer graphics)

    In computer graphics, a fragment is the data necessary to generate a single pixel's worth of a drawing primitive in the frame buffer. These data may include, but are not limited to: raster position depth interpolated attributes (color, texture coordinates, etc.) stencil alpha window ID As a scene is drawn, drawing primitives (the basic elements of graphics output, such as points, lines, circles, text etc.) are rasterized into fragments which are textured and combined with the existing frame buffer. How a fragment is combined with the data already in the frame buffer depends on various settings. In a typical case, a fragment may be discarded if it is further away than the pixel which is already at that location (according to the depth buffer). If it is nearer than the existing pixel, it may replace what is already there, or, if alpha blending is in use, the pixel's color may be replaced with a mixture of the fragment's color and the pixel's existing color, as in the case of drawing a translucent object. In general, a fragment can be thought of as the data needed to shade the pixel, plus the data needed to test whether the fragment survives to become a pixel (depth, alpha, stencil, scissor, window ID, etc.). Shading a fragment is done through a fragment shader (or pixel shaders in Direct3D). In computer graphics, a fragment is not necessarily opaque, and could contain an alpha value specifying its degree of transparency. The alpha is typically normalized to the range of [0, 1], with 0 denotes totally transparent and 1 denotes totally opaque. If the fragment is not totally opaque, then part of its background object could show through, which is known as alpha blending.

    Read more →
  • Legal information retrieval

    Legal information retrieval

    Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works. Accurate legal information retrieval is important to provide access to the law to laymen and legal professionals. Its importance has increased because of the vast and quickly increasing amount of legal documents available through electronic means. Legal information retrieval is a part of the growing field of legal informatics. In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used boolean search methods (exact matches of specified terms) on full text legal documents have been shown to have an average recall rate as low as 20 percent, meaning that only 1 in 5 relevant documents are actually retrieved. In that case, researchers believed that they had retrieved over 75% of relevant documents. This may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed as to relevant legal documents. Legal Information Retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents (providing a high recall rate) and reducing the number of irrelevant documents (a high precision rate). This is a difficult task, as the legal field is prone to jargon, polysemes (words that have different meanings when used in a legal context), and constant change. Techniques used to achieve these goals generally fall into three categories: boolean retrieval, manual classification of legal text, and natural language processing of legal text. == Problems == Application of standard information retrieval techniques to legal text can be more difficult than application in other subjects. One key problem is that the law rarely has an inherent taxonomy. Instead, the law is generally filled with open-ended terms, which may change over time. This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase. Legal information systems must also be programmed to deal with law-specific words and phrases. Though this is less problematic in the context of words which exist solely in law, legal texts also frequently use polysemes, words may have different meanings when used in a legal or common-speech manner, potentially both within the same document. The legal meanings may be dependent on the area of law in which it is applied. For example, in the context of European Union legislation, the term "worker" has four different meanings: Any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work. Any person employed by an employer, including trainees and apprentices but excluding domestic servants; Any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside; Any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice; It also has the common meaning: A person who works at a specific occupation. Though the terms may be similar, correct information retrieval must differentiate between the intended use and irrelevant uses in order to return the correct results. Even if a system overcomes the language problems inherent in law, it must still determine the relevancy of each result. In the context of judicial decisions, this requires determining the precedential value of the case. Case decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts. The opposite may be true, however, if the senior court has only a minor discussion of the topic (for example, if it is a secondary consideration in the case). An information retrieval system must also be aware of the authority of the jurisdiction. A case from a binding authority is most likely of more value than one from a non-binding authority. Additionally, the intentions of the user may determine which cases they find valuable. For instance, where a legal professional is attempting to argue a specific interpretation of law, he might find a minor court's decision which supports his position more valuable than a senior courts position which does not. He may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions. Overcoming these problems can be made more difficult because of the large number of cases available. The number of legal cases available via electronic means is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day), meaning that an accurate legal information retrieval system must incorporate methods of both sorting past data and managing new data. == Techniques == === Boolean searches === Boolean searches, where a user may specify terms such as use of specific words or judgments by a specific court, are the most common type of search available via legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above. The recall and precision rates of these searches vary depending on the implementation and searches analyzed. One study found a basic boolean search's recall rate to be roughly 20%, and its precision rate to be roughly 79%. Another study implemented a generic search (that is, not designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and 77% precision rate. This is likely explained because of the use of complex legal terms by the legal professionals. === Manual classification === In order to overcome the limits of basic boolean searches, information systems have attempted to classify case laws and statutes into more computer friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them. These attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language” or LexisNexis' Headnote searches. Additionally, both of these services allow browsing of their classifications, via Westlaw's West Key Numbers or Lexis' Headnotes. Though these two search algorithms are proprietary and secret, it is known that they employ manual classification of text (though this may be computer-assisted). These systems can help overcome the majority of problems inherent in legal information retrieval systems, in that manual classification has the greatest chances of identifying landmark cases and understanding the issues that arise in the text. In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included, however, were carefully controlled to just a few areas of law in a specific jurisdiction. The major drawback to this approach is the requirement of using highly skilled legal professionals and large amounts of time to classify texts. As the amount of text available continues to increase, some have stated their belief that manual classification is unsustainable. === Natural language processing === In order to reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create a system to automatically classify legal text and queries. Adequate translation of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ Natural Language Processing (NLP) techniques that are adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been postulated, few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, resulted in an f-measure (which is a calculation of both recall rate and precision) of under 0.3 (compared to perfect f-measure of 1.0). This is probably much lower than an acceptable rate for general usage. Despite the limited results, many theorists predict that the evolution of such systems will eventually replace manual classification systems. === Citation-Based ranking === In the mid-90s the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly pre-dated the PageRank algorithm at Stanford which was also a citation-based ranking. Ranking of results was based

    Read more →
  • List of chatbots

    List of chatbots

    A chatbot is a software application or web interface that is designed to mimic human conversation through text or voice interactions. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use large language models (LLMs) and natural language processing, but simpler chatbots have existed for decades. == LLM chatbots == == General chatbots == == Historical chatbots ==

    Read more →