Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words. == Offline recognition == Offline handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation of handwriting. Offline handwriting recognition is comparatively difficult, as different people have different handwriting styles. And, as of today, OCR engines are primarily focused on machine printed text and ICR for hand "printed" (written in capital letters) text. === Traditional techniques === ==== Character extraction ==== Offline character recognition often involves scanning a form or document. This means the individual characters contained in the scanned image will need to be extracted. Tools exist that are capable of performing this step. However, there are several common imperfections in this step. The most common is when characters that are connected are returned as a single sub-image containing both characters. This causes a major problem in the recognition stage. Yet many algorithms are available that reduce the risk of connected characters. ==== Character recognition ==== After individual characters have been extracted, a recognition engine is used to identify the corresponding computer character. Several different recognition techniques are currently available. ===== Feature extraction ===== Feature extraction works in a similar fashion to neural network recognizers. However, programmers must manually determine the properties they feel are important. This approach gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network because the properties are not learned automatically. === Modern techniques === Where traditional techniques focus on segmenting individual characters for recognition, modern techniques focus on recognizing all the characters in a segmented line of text. Particularly they focus on machine learning techniques that are able to learn visual features, avoiding the limiting feature engineering previously used. State-of-the-art methods use convolutional networks to extract visual features over several overlapping windows of a text line image which a recurrent neural network uses to produce character probabilities. == Online recognition == Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes that are usable within computer and text-processing applications. The elements of an online handwriting recognition interface typically include: a pen or stylus for the user to write with a touch sensitive surface, which may be integrated with, or adjacent to, an output display. a software application which interprets the movements of the stylus across the writing surface, translating the resulting strokes into digital text. The process of online handwriting recognition can be broken down into a few general steps: preprocessing, feature extraction and classification The purpose of preprocessing is to discard irrelevant information in the input data, that can negatively affect the recognition. This concerns speed and accuracy. Preprocessing usually consists of binarization, normalization, sampling, smoothing and denoising. The second step is feature extraction. Out of the two- or higher-dimensional vector field received from the preprocessing algorithms, higher-dimensional data is extracted. The purpose of this step is to highlight important information for the recognition model. This data may include information like pen pressure, velocity or the changes of writing direction. The last big step is classification. In this step, various models are used to map the extracted features to different classes and thus identifying the characters or words the features represent. === Hardware === Commercial products incorporating handwriting recognition as a replacement for keyboard input were introduced in the early 1980s. Examples include handwriting terminals such as the Pencept Penpad and the Inforite point-of-sale terminal. With the advent of the large consumer market for personal computers, several commercial products were introduced to replace the keyboard and mouse on a personal computer with a single pointing/handwriting system, such as those from Pencept, CIC and others. The first commercially available tablet-type portable computer was the Write-Top from Linus Technologies, released in July 1988. Its operating system was based on MS-DOS. In the early 1990s, hardware makers including NCR, IBM and EO released tablet computers running the PenPoint operating system developed by GO Corp. PenPoint used handwriting recognition and gestures throughout and provided the facilities to third-party software. IBM's tablet computer was the first to use the ThinkPad name and used IBM's handwriting recognition. This recognition system was later ported to Microsoft Windows for Pen Computing, and IBM's Pen for OS/2. None of these were commercially successful. Advancements in electronics allowed the computing power necessary for handwriting recognition to fit into a smaller form factor than tablet computers, and handwriting recognition is often used as an input method for hand-held PDAs. The first PDA to provide written input was the Apple Newton, which exposed the public to the advantage of a streamlined user interface. However, the device was not a commercial success, owing to the unreliability of the software, which tried to learn a user's writing patterns. By the time of the release of the Newton OS 2.0, wherein the handwriting recognition was greatly improved, including unique features still not found in current recognition systems such as modeless error correction, the largely negative first impression had been made. After discontinuation of Apple Newton, the feature was incorporated in Mac OS X 10.2 and later as Inkwell. Palm later launched a successful series of PDAs based on the Graffiti recognition system. Graffiti improved usability by defining a set of "unistrokes", or one-stroke forms, for each character. This narrowed the possibility for erroneous input, although memorization of the stroke patterns did increase the learning curve for the user. The Graffiti handwriting recognition was found to infringe on a patent held by Xerox, and Palm replaced Graffiti with a licensed version of the CIC handwriting recognition which, while also supporting unistroke forms, pre-dated the Xerox patent. The court finding of infringement was reversed on appeal, and then reversed again on a later appeal. The parties involved subsequently negotiated a settlement concerning this and other patents. A Tablet PC is a notebook computer with a digitizer tablet and a stylus, which allows a user to handwrite text on the unit's screen. The operating system recognizes the handwriting and converts it into text. Windows Vista and Windows 7 include personalization features that learn a user's writing patterns or vocabulary for English, Japanese, Chinese Traditional, Chinese Simplified and Korean. The features include a "personalization wizard" that prompts for samples of a user's handwriting and uses them to retrain the system for higher accuracy recognition. This system is distinct from the less advanced handwriting recognition system employed in its Windows Mobile OS for PDAs. Although handwriting recognition is an input form that the public has become accustomed to, it has not achieved widespread use in either desktop computers or laptops. It is still generally accepted that keyboard input is both faster and more reliable. As of 2006, many PDAs offer handwriting input, sometimes even accepting natural cursive handwriting, but accuracy is still a problem, and some people still find even a simple on-screen keyboard more efficient. === Software === Early software could understand print handwriting where the characters were separated; however, cursive handwriting
Knowledge graph embedding
In representation learning, knowledge graph embedding (KGE), also called knowledge representation learning (KRL), or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction. == Definition == A knowledge graph G = { E , R , F } {\displaystyle {\mathcal {G}}=\{E,R,F\}} is a collection of entities E {\displaystyle E} , relations R {\displaystyle R} , and facts F {\displaystyle F} . A fact is a triple ( h , r , t ) ∈ F {\displaystyle (h,r,t)\in F} that denotes a link r ∈ R {\displaystyle r\in R} between the head h ∈ E {\displaystyle h\in E} and the tail t ∈ E {\displaystyle t\in E} of the triple. Another notation that is often used in the literature to represent a triple (or fact) is ⟨ head , relation , tail ⟩ {\displaystyle \langle {\text{head}},{\text{relation}},{\text{tail}}\rangle } . This notation is called the Resource Description Framework (RDF). A knowledge graph represents the knowledge related to a specific domain; leveraging this structured representation, it is possible to infer a piece of new knowledge from it after some refinement steps. However, nowadays, people have to deal with the sparsity of data and the computational inefficiency to use them in a real-world application. The embedding of a knowledge graph is a function that translates each entity and each relation into a vector of a given dimension d {\displaystyle d} , called embedding dimension. It is even possible to embed the entities and relations with different dimensions. The embedding vectors can then be used for other tasks. A knowledge graph embedding is characterized by four aspects: Representation space: The low-dimensional space in which the entities and relations are represented. Scoring function: A measure of the goodness of a triple-embedded representation. Encoding models: The modality in which the embedded representation of the entities and relations interact with each other. Additional information: Any additional information coming from the knowledge graph that can enrich the embedded representation. Usually, an ad hoc scoring function is integrated into the general scoring function for each additional piece of information. == Embedding procedure == All algorithms for creating a knowledge graph embedding follow the same approach. First, the embedding vectors are initialized to random values. Then, they are iteratively optimized using a training set of triples. In each iteration, a batch of size b {\displaystyle b} triples is sampled from the training set, and a triple from it is sampled and corrupted—i.e., a triple that does not represent a true fact in the knowledge graph. The corruption of a triple involves substituting the head or the tail (or both) of the triple with another entity that makes the fact false. The original triple and the corrupted triple are added in the training batch, and then the embeddings are updated, optimizing a scoring function. Iteration stops when a stop condition is reached. Usually, the stop condition depends on the overfitting of the training set. At the end, the learned embeddings should have extracted semantic meaning from the training triples and should correctly predict unseen true facts in the knowledge graph. === Pseudocode === The following is the pseudocode for the general embedding procedure. algorithm Compute entity and relation embeddings input: The training set S = { ( h , r , t ) } {\displaystyle S=\{(h,r,t)\}} , entity set E {\displaystyle E} , relation set R {\displaystyle R} , embedding dimension k {\displaystyle k} output: Entity and relation embeddings initialization: the entities e {\displaystyle e} and relations r {\displaystyle r} embeddings (vectors) are randomly initialized while stop condition do S b a t c h ← s a m p l e ( S , b ) {\displaystyle S_{batch}\leftarrow sample(S,b)} // Sample a batch from the training set for each ( h , r , t ) {\displaystyle (h,r,t)} in S b a t c h {\displaystyle S_{batch}} do ( h ′ , r , t ′ ) ← s a m p l e ( S ′ ) {\displaystyle (h',r,t')\leftarrow sample(S')} // Sample a corrupted fact T b a t c h ← T b a t c h ∪ { ( ( h , r , t ) , ( h ′ , r , t ′ ) ) } {\displaystyle T_{batch}\leftarrow T_{batch}\cup \{((h,r,t),(h',r,t'))\}} end for Update embeddings by minimizing the loss function end while == Performance indicators == These indexes are often used to measure the embedding quality of a model. The simplicity of the indexes makes them very suitable for evaluating the performance of an embedding algorithm even on a large scale. Given Q {\displaystyle {\ce {Q}}} as the set of all ranked predictions of a model, it is possible to define three different performance indexes: Hits@K, MR, and MRR. === Hits@K === Hits@K or in short, H@K, is a performance index that measures the probability to find the correct prediction in the first top K model predictions. Usually, it is used k = 10 {\displaystyle k=10} . Hits@K reflects the accuracy of an embedding model to predict the relation between two given triples correctly. Hits@K = | { q ∈ Q : q < k } | | Q | ∈ [ 0 , 1 ] {\displaystyle ={\frac {|\{q\in Q:q In knowledge representation, particularly in the Semantic Web, a metaclass is a class whose instances can themselves be classes. Similar to their role in programming languages, metaclasses in ontology languages can have properties otherwise applicable only to individuals, while retaining the same class's ability to be classified in a concept hierarchy. This enables knowledge about instances of those metaclasses to be inferred by semantic reasoners using statements made in the metaclass. Metaclasses thus enhance the expressivity of knowledge representations in a way that can be intuitive for users. While classes are suitable to represent a population of individuals, metaclasses can, as one of their feature, be used to represent the conceptual dimension of an ontology. Metaclasses are supported in the Web Ontology Language (OWL) and the data-modeling vocabulary RDFS. Metaclasses are often modeled by setting them as the object of claims involving rdf:type and rdfs:subClassOf—built-in properties commonly referred to as instance of and subclass of. Instance of entails that the subject of the claim is an instance, i.e. an individual that is a member of a class. Subclass of entails that the subject is a class. In the context of instance of and subclass of, the key difference between metaclasses and ordinary classes is that metaclasses are the object of instance of claims used on a class, while ordinary classes are not objects of such claims. (e.g. in a claim Bob instance of Human, Bob is the subject and an Instance, while the object, Human, is an ordinary class; but a further claim that Human instance of Animal species makes "Animal species" a metaclass because it has a member, "Human", that is also a Class). OWL 2 DL supports metaclasses by a feature called punning, in which one entity is interpreted as two different types of thing—a class and an individual—depending on its syntactic context. For example, through punning, an ontology could have a concept hierarchy such as Harry the eagle instance of golden eagle, golden eagle subclass of bird, and golden eagle instance of species. In this case, the punned entity would be golden eagle, because it is represented as a class (second claim) and an instance (third claim); whereas the metaclass would be species, as it has an instance that is a class. Punning also enables other properties that would otherwise be applicable only to ordinary instances to be used directly on classes, for example "golden eagle conservation status least concern." Having arisen from the fields of knowledge representation, description logic and formal ontology, Semantic Web languages have a closer relationship to philosophical ontology than do conventional programming languages such as Java or Python. Accordingly, the nature of metaclasses is informed by philosophical notions such as abstract objects, the abstract and concrete, and type-token distinction. Metaclasses permit concepts to be construed as tokens of other concepts while retaining their ontological status as types. This enables types to be enumerated over, while preserving the ability to inherit from types. For example, metaclasses could allow a machine reasoner to infer from a human-friendly ontology how many elements are in the periodic table, or, given that number of protons is a property of chemical element and isotopes are a subclass of elements, how many protons exist in the isotope hydrogen-2. Metaclasses are sometime organized by levels, in a similar way to the simple Theory of types where classes that are not metaclasses are assigned the first level, classes of classes in the first level are in the second level, classes of classes in the second level on the next and so on. == Examples == Following the type-token distinction, real world objects such as Abraham Lincoln or the planet Mars are regrouped into classes of similar objects. Abraham Lincoln is said to be an instance of human, and Mars is an instance of planet. This is a kind of is-a relationship. Metaclasses are class of classes, such as for example the nuclide concept. In chemistry, atoms are often classified as elements and, more specifically, isotopes. The glass of water one last drank has many hydrogen atoms, each of which is an instance of hydrogen. Hydrogen itself, a class of atoms, is an instance of nuclide. Nuclide is a class of classes, hence a metaclass. == Implementations == === RDF and RDFS === In RDF, the rdf:type property is used to state that a resource is an instance of a class. This enables metaclasses to be easily created by using rdf:type in a chain-like fashion. For example, in the two triples the resource species is a metaclass, because golden eagle is used as a class in the first statement and the class golden eagle is said to be an instance of the class species in the second statement. This way of doing allows :species to have non-class instances. RDF also provides rdf:Property as a way to create properties beyond those defined in the built-in vocabulary. Properties can be used directly on metaclasses, for example "species quantity 8.7 million", where quantity is a property defined via rdf:Property and species is a metaclass per the preceding example above. RDFS, an extension of RDF, introduced rdfs:Class and rdfs:subClassOf and enriched how vocabularies can classify concepts. Whereas rdf:type enables vocabularies to represent instantiation, the property rdfs:subClassOf enables vocabularies to represent subsumption. RDFS thus makes it possible for vocabularies to represent taxonomies, also known as subsumption hierarchies or concept hierarchies, which is an important addition to the type–token distinction made possible by RDF. Notably, the resource rdfs:Class is an instance of itself, demonstrating both the use of metaclasses in the language's internal implementation and a reflexive usage of rdf:type. RDFS is its own metamodel. This allows a second way to express that a resource is a metaclass. A triple to instantiate rdfs:Class, for example :golden_eagle rdf:type rdfs:Class will declare :golden_eagle as a class. It's also possible to subclass the rdfs:Class resource to declare a meta-class resource, for example :species rdfs:SubclassOf. By deduction, any instance of :species is then a class, so it is a class with class-instances, a meta-class.. This second way does not allows non-class instances of species and explicitly declares :tpecies as a meta-class. === OWL === In some OWL flavors like OWL1-DL, entities can be either classes or instances, but cannot be both. This limitations forbids metaclasses and metamodeling. This is not the case in the OWL1 full flavor, but this allows the model to be computationally undecidable. In OWL2, metaclasses can implemented with punning, that is a way to treat classes as if they were individuals. Other approaches have also been proposed and used to check the properties of ontologies at a meta level. ==== Punning ==== OWL 2 supports metaclasses through a feature called punning. In metaclasses implemented by punning, the same subject is interpreted as two fundamentally different types of thing—a class and an individual—depending on its syntactic context. This is similar to a pun in natural language, where different senses of the same word are emphasized to illustrate a point. Unlike in natural language, where puns are typically used for comedic or rhetorical effect, the main goal of punning in Semantic Web technologies is to make concepts easier to represent, closer to how they are discussed in everyday speech or academic literature. Although OWL 2 permits the same symbol to assume different roles, its standard semantics (known as Direct Semantics) still interprets the symbol differently depending on whether it is used as an individual, a class, or a property. === Protégé === In the ontology editor Protégé, metaclasses are templates for other classes who are their instances. == Classification == Some ontologies like the Cyc AI project's classifies classes and metaclasses. Classes are divided into fixed-order classes and variable-order classes. In the case of fixed-order classes, an order is attributed for metaclasses by measuring the distance to individuals with respect to the number of "instance of" triples that are necessary to find an individual. Classes that are not metaclasses are classes of individuals, so their order is "1" (first-order classes). Metaclasses that are classes of first-order classes' order is "2" (second-order classes), and so on. Variable-order metaclasses, on the other hand, can have instances; one example of variable-order metaclass is the class of all fixed-order classes. Ari Holtzman is a professor of Computer Science at the University of Chicago and an expert in the area of natural language processing and computational linguistics. Previously, Holtzman was a PhD student at the University of Washington where he was advised by Luke Zettlemoyer. In 2017, he was a member of the winning team for the inaugural Alexa Prize for developing a conversational AI system for the Amazon Alexa device. Holtzman has made multiple contributions in the area of text generation and language models such as the introduction of nucleus sampling in 2019, his work on AI safety and neural fake news detection, and the fine-tuning of quantized large language models. The DARPA AlphaDogfight was a 2019–2020 DARPA program that pitted computers using F-16 flight simulators against one another. The computers were managed by eight teams of humans, who competed in a single-round elimination for the right to battle a skilled human dogfighter. Heron Systems corporation wrote a deep reinforcement learning software tool that bested the human pilot by a score of 5–0. The tournament program was managed by the Applied Physics Laboratory. The trials took place in October 2019 and January 2020 while the finals were held in August 2020. In 2024 a successor version of the program was tested with in the physical world with the X-62A. Azure Maps is a suite of cloud-based, location-based services provided by Microsoft as part of the company's Azure platform. The platform provides geospatial and location-based services via REST APIs and software development kits (SDKs). The service is typically used to integrate maps or geospatial data into applications. Azure Maps differs from Microsoft's other enterprise mapping service, Bing Maps, in its pricing model, focus on privacy, and its level of integration into the broader Azure cloud ecosystem. == History == Azure Maps was first introduced in public preview mode under the name "Azure Location Based Services" in 2017, primarily as an enterprise solution. The services was intended to add mapping and location-based functionality onto the existing Azure cloud services suite, seen as a critical part of Microsoft's broader Internet-of-Things (IoT) strategy. The preview version included APIs which could be used to develop location aware apps for use cases such as logistics and mobility. In 2018, the software was renamed "Azure Maps," and became generally available to the public, and a number of new functions were added, including route calculation, travel time calculation, and incorporation of real-time traffic data and incident information. Azure Maps was integrated with Azure IoT Central in 2018, which added tracking, monitoring, and geofencing capabilities. A set of mobility APIs on were added in 2019, with applications such as use in public transport apps and shared bicycle fleet management. “Azure Maps Creator,” which converts private facility floor plans into indoor map data, was also introduced in 2019. Some commentators linked these services to Microsoft's broader development of augmented reality products. In 2020, Azure Maps Visual for Power BI was released, integrating location-based features and mapping capabilities into Microsoft's business intelligence software. An elevation API (which was later retired), geolocation services, and an iOS and Android software development kit were introduced in 2021. In 2022, support for historical weather, air quality, and tropical storm data was made generally available and custom styling for indoor maps was also introduced. In 2023, Azure Maps was certified as HIPAA compliant in a move to target healthcare and health insurance companies. == Functionality == === Geocoding === Geocoding is one of the core functionalities of Azure Maps, converting addresses or place names into geographic coordinates. Batch geocoding is used to process large amounts of address data, a function used for route optimization and spatial analysis. === Reverse geocoding === Reverse geocoding derives human-readable information from geographic coordinates like longitude and latitude, used in navigation and by geographic information systems. === Routing === Azure Maps uses map data and routing algorithms to calculate the shortest or fastest routes between locations based on factors like vehicle size and type, traffic conditions, and distance. Routing also supports multi-modal routing, which include multiple modes of transport in a single trip, including cycling, walking, and ferries. This functionality is used for location-based searches and route optimization in applications like fleet management, proximity marketing, and emergency services as well as logistics and delivery, urban planning, ride sharing apps, and outdoor activities. === Map visualization === The platform supports map visualizations that can be modified to reflect real-time data (including from IoT sensors) as well as historical data patterns. Visualizations include heat maps, street maps, satellite imagery and other custom data layers. Maps are rendered using raster or vector tiles which reduce the load of displaying large data sets or complex maps. This can be used in various applications in areas like transportation, smart cities, retail and marketing, public health, and environmental monitoring. For example, it can be used for tracking the spread of diseases or measuring the impact of changing climatic patterns. === Geofencing and spatial analytics === Azure Maps supports polygonal geofencing, which enables the definition of custom geographic boundaries. Geofenced areas can be monitored in real-time for events of interest. For example, an application could send an alert when equipment or persons enter or leave a defined area. Tools for analyzing historical geofencing data are also available via the APIs for optimization purposes. == Industry usage == Azure Maps' geofencing function has seen usage in the construction industry, designating hazardous areas for safety purposes and sending alerts if anyone enters the area. Private facility maps are used by construction companies for monitoring large construction sites to increase productivity and prevent accidents or damage. In emergency management, New Zealand based company Beca has used Azure Maps to provide analysis on the impact of earthquakes to users, including information on the severity and location of an earthquake and the impact on affected properties. Alaska's Department of Transportation uses Azure Maps as part of an information system providing weather-related warnings and analytics to road crews. Airmap, an airspace management platform for drones, uses Azure Maps. Azure Maps has also been used in conjunction with Azure Monitor for risk monitoring by an insurance company. Other companies that use or have used Azure Maps include BMW, Banco Santander, Jvion, MV Transportation, C.H. Robertson, Wise Skulls, Tata Consultancy Services, Providence Health and Services, Gas Brasiliano Distribuidora S.A., Shell plc, Persistent Systems, Phase 2 Dining and Entertainment, Symbio, HID, Globant, and Insight Enterprises. == Partnerships == Azure Maps and TomTom have been partners since 2016, and TomTom provides location data to Azure Maps and can process data from Azure Maps for mapping purposes. In 2021, Azure Maps partnered with AccuWeather to make climatic data available via its APIs, making weather data along all parts of calculated routes available for mobility and logistics purposes. Microsoft has partnered with Esri, the developer of ArcGIS, and there is cross-compatibility between Azure and ArcGIS so that data from Azure Maps can be integrated into ArcGIS and vice versa. Azure Maps partnered with Moovit in 2019, a startup providing software that interfaces with public transport data. Moovit's database on global public transit networks, including information on which stations and facilities are wheelchair accessible, was linked to Azure Maps. This service was noted for its use increasing accessibility to public transport for the visually impaired by means of voice activated route planning assistance. NORAD has used some Azure Maps functions for their NORAD Tracks Santa website during Christmas holidays. == Components == === REST APIs === Various APIs cover the major functionalities across Azure Maps: Data registry API Geolocation API Render API Route API Search API Spatial API Time zone API Traffic API Weather API === SDKs === Azure Maps SDKs uses MapLibre-style specifications and open source MapLibre GL-based libraries as a rendering engine. The Web SDK is used for developing web apps with maps and location-based data and functionality. It includes a map control module as well as modules with drawing tools. It also supports Azure Maps Creator and various spatial data formats. The platform also includes a set of REST SDKs for developers integrating Azure Maps REST APIs into Python, C#, Java or JavaScript applications. Azure Maps also includes Android and iOS SDKs used for developing applications for Android and Apple devices. === Azure Maps Creator === Azure Maps Creator is a tool for generating custom maps for locations like large office complexes, construction sites, or university campuses. These maps can then be integrated into applications and used with other Azure Maps functions for purposes such as wayfinding and maintenance and security in building automation contexts. === Azure Maps Visual for Power BI === Azure Maps is integrated with Microsoft Power BI, a graphical tool for producing data visualizations. Since July 2020, Power BI can be used in conjunction with Azure Maps for developing map-based data visualizations. This functionality entered general availability in May 2023. Partial-order planning is an approach to automated planning that maintains a partial ordering between actions and only commits ordering between actions when forced to, that is, ordering of actions is partial. Also this planning doesn't specify which action will come out first when two actions are processed. By contrast, total-order planning maintains a total ordering between all actions at every stage of planning. Given a problem in which some sequence of actions is needed to achieve a goal, a partial-order plan specifies all actions that must be taken, but specifies an ordering between actions only where needed. Consider the following situation: a person must travel from the start to the end of an obstacle course. The course is composed of a bridge, a see-saw, and a swing-set. The bridge must be traversed before the see-saw and swing-set are reachable. Once reachable, the see-saw and swing-set can be traversed in any order, after which the end is reachable. In a partial-order plan, ordering between these obstacles is specified only when needed. The bridge must be traversed first. Second, either the see-saw or swing-set can be traversed. Third, the remaining obstacle can be traversed. Then the end can be traversed. Partial-order planning relies upon the principle of least commitment for its efficiency. == Partial-order plan == A partial-order plan or partial plan is a plan which specifies all actions that must be taken, but only specifies the order between actions when needed. It is the result of a partial-order planner. A partial-order plan consists of four components: A set of actions (also known as operators). A partial order for the actions. It specifies the conditions about the order of some actions. A set of causal links. It specifies which actions meet which preconditions of other actions. Alternatively, a set of bindings between the variables in actions. A set of open preconditions. It specifies which preconditions are not fulfilled by any action in the partial-order plan. To keep the possible orders of the actions as open as possible, the set of order conditions and causal links must be as small as possible. A plan is a solution if the set of open preconditions is empty. A linearization of a partial order plan is a total order plan derived from the particular partial order plan; in other words, both order plans consist of the same actions, with the order in the linearization being a linear extension of the partial order in the original partial order plan. === Example === For example, a plan for baking a cake might start: go to the store get eggs; get flour; get milk pay for all goods go to the kitchen This is a partial plan because the order for finding eggs, flour and milk is not specified, the agent can wander around the store reactively accumulating all the items on its shopping list until the list is complete. == Partial-order planner == A partial-order planner is an algorithm or program which will construct a partial-order plan and search for a solution. The input is the problem description, consisting of descriptions of the initial state, the goal and possible actions. The problem can be interpreted as a search problem where the set of possible partial-order plans is the search space. The initial state would be the plan with the open preconditions equal to the goal conditions. The final state would be any plan with no open preconditions, i.e. a solution. The initial state is the starting conditions, and can be thought of as the preconditions to the task at hand. For a task of setting the table, the initial state could be a clear table. The goal is simply the final action that needs to be accomplished, for example setting the table. The operators of the algorithm are the actions by which the task is accomplished. For this example there may be two operators: lay (tablecloth), and place (glasses, plates, and silverware). === Plan space === The plan space of the algorithm is constrained between its start and finish. The algorithm starts, producing the initial state and finishes when all parts of the goal have been achieved. In the setting a table example, two types of actions exist that must be addressed: the put-out and lay operators. Four unsolved operators also exist: Action 1, lay-tablecloth, Action 2, Put-out (plates), Action 3, Put-out (silverware), and Action 4, Put-out (glasses). However, a threat arises if Action 2, 3, or 4 comes before Action 1. This threat is that the precondition to the start of the algorithm will be unsatisfied as the table will no longer be clear. Thus, constraints exist that must be added to the algorithm that force Actions 2, 3, and 4 to come after Action 1. Once these steps are completed, the algorithm will finish and the goal will have been completed. === Threats === As seen in the algorithm presented above, partial-order planning can encounter certain threats, meaning orderings that threaten to break connected actions, thus potentially destroying the entire plan. There are two ways to resolve threats: Promotion Demotion Promotion orders the possible threat after the connection it threatens. Demotion orders the possible threat before the connection it threatens. Partial-order planning algorithms are known for being both sound and complete, with sound being defined as the total ordering of the algorithm, and complete being defined as the capability to find a solution, given that a solution does in fact exist. == Partial-order vs. total-order planning == Partial-order planning is the opposite of total-order planning, in which actions are sequenced all at once and for the entirety of the task at hand. The question arises when one has two competing processes, which one is better? Anthony Barret and Daniel Weld have argued in their 1993 book, that partial-order planning is superior to total-order planning, as it is faster and thus more efficient. They tested this theory using Korf’s taxonomy of subgoal collections, in which they found that partial-order planning performs better because it produces more trivial serializability than total-order planning. Trivial serializability facilitates a planner’s ability to perform quickly when dealing with goals that contain subgoals. Planners perform more slowly when dealing with laboriously serializable or nonserializable subgoals. The determining factor that makes a subgoal trivially or laboriously serializable is the search space of different plans. They found that partial-order planning is more adept at finding the quickest path, and is therefore the more efficient of these two main types of planning. == The Sussman anomaly == Partial-order plans are known to easily and optimally solve the Sussman anomaly. Using this type of incremental planning system solves this problem quickly and efficiently. This was a result of partial-order planning that solidified its place as an efficient planning system. == Disadvantages to partial-order planning == One drawback of this type of planning system is that it requires a lot more computational power for each node. This higher per-node cost occurs because the algorithm for partial-order planning is more complex than others. This has important artificial intelligence implications. When coding a robot to do a certain task, the creator needs to take into account how much energy is needed. Though a partial-order plan may be quicker it may not be worth the energy cost for the robot. The creator must be aware of and weigh these two options to build an efficient robot.Metaclass (knowledge representation)
Ari Holtzman
DARPA AlphaDogfight
Azure Maps
Partial-order planning