Ho–Kashyap algorithm

Ho–Kashyap algorithm

The Ho–Kashyap algorithm is an iterative method in machine learning for finding a linear decision boundary that separates two linearly separable classes. It was developed by Yu-Chi Ho and Rangasami L. Kashyap in 1965, and usually presented as a problem in linear programming. == Setup == Given a training set consisting of samples from two classes, the Ho–Kashyap algorithm seeks to find a weight vector w {\displaystyle \mathbf {w} } and a margin vector b {\displaystyle \mathbf {b} } such that: Y w = b {\displaystyle \mathbf {Yw} =\mathbf {b} } where Y {\displaystyle \mathbf {Y} } is the augmented data matrix with samples from both classes (with appropriate sign conventions, e.g., samples from class 2 are negated), w {\displaystyle \mathbf {w} } is the weight vector to be determined, and b {\displaystyle \mathbf {b} } is a positive margin vector. The algorithm minimizes the criterion function: J ( w , b ) = | | Y w − b | | 2 {\displaystyle J(\mathbf {w} ,\mathbf {b} )=||\mathbf {Yw} -\mathbf {b} ||^{2}} subject to the constraint that b > 0 {\displaystyle \mathbf {b} >\mathbf {0} } (element-wise). Given a problem of linearly separating two classes, we consider a dataset of elements { ( x i , y i ) } i ∈ 1 : N {\displaystyle \{(\mathbf {x_{i}} ,y_{i})\}_{i\in 1:N}} where y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} . Linearly separating them by a perceptron is equivalent to finding weight and bias w , b {\displaystyle \mathbf {w} ,b} for a perceptron, such that: [ y 1 x 1 1 ⋮ ⋮ y N x N 1 ] [ w b ] > 0 {\displaystyle {\begin{bmatrix}y_{1}\mathbf {x} _{1}&1\\\vdots &\vdots \\y_{N}\mathbf {x} _{N}&1\\\end{bmatrix}}{\begin{bmatrix}\mathbf {w} \\b\end{bmatrix}}>0} == Algorithm == The idea of the Ho–Kashyap algorithm is as follows: Given any b {\displaystyle \mathbf {b} } , the corresponding w {\displaystyle \mathbf {w} } is known: It is simply w = Y + b {\displaystyle \mathbf {w} =\mathbf {Y} ^{+}\mathbf {b} } , where Y + {\displaystyle \mathbf {Y} ^{+}} denotes the Moore–Penrose pseudoinverse of Y {\displaystyle \mathbf {Y} } . Therefore, it only remains to find b {\displaystyle \mathbf {b} } by gradient descent. However, the gradient descent may sometimes decrease some of the coordinates of b {\displaystyle \mathbf {b} } , which may cause some coordinates of b {\displaystyle \mathbf {b} } to become negative, which is undesirable. Therefore, whenever some coordinates of b {\displaystyle \mathbf {b} } would have decreased, those coordinates are unchanged instead. As for the coordinates of b {\displaystyle \mathbf {b} } that would increase, those would increase without issue. Formally, the algorithm is as follows: Initialization: Set b ( 0 ) {\displaystyle \mathbf {b} (0)} to an arbitrary positive vector, typically b ( 0 ) = 1 {\displaystyle \mathbf {b} (0)=\mathbf {1} } (a vector of ones). Set the iteration counter k = 0 {\displaystyle k=0} . Set w ( 0 ) = Y + b ( 0 ) {\displaystyle \mathbf {w} (0)=\mathbf {Y} ^{+}\mathbf {b} (0)} Loop until convergence, or until iteration counter exceeds some k m a x {\displaystyle k_{max}} . Error calculation: Compute the error vector: e ( k ) = Y w ( k ) − b ( k ) {\displaystyle \mathbf {e} (k)=\mathbf {Yw} (k)-\mathbf {b} (k)} . Margin update: Update the margin vector: b ( k + 1 ) = b ( k ) + 2 η k ( e ( k ) + | e ( k ) | ) {\displaystyle \mathbf {b} (k+1)=\mathbf {b} (k)+2\eta _{k}(\mathbf {e} (k)+|\mathbf {e} (k)|)} where η k {\displaystyle \eta _{k}} is a positive learning rate parameter, and | e ( k ) | {\displaystyle |\mathbf {e} (k)|} denotes the element-wise absolute value. Weight calculation: Compute the weight vector using the pseudoinverse: w ( k + 1 ) = Y + b ( k + 1 ) {\displaystyle \mathbf {w} (k+1)=\mathbf {Y} ^{+}\mathbf {b} (k+1)} . Convergence check: If | | e ( k ) | | ≤ θ {\displaystyle ||\mathbf {e} (k)||\leq \theta } for some predetermined threshold θ {\displaystyle \theta } (close to zero), then return b ( k + 1 ) , w ( k + 1 ) {\displaystyle \mathbf {b} (k+1),\mathbf {w} (k+1)} . if e ( k ) ≤ 0 {\displaystyle \mathbf {e} (k)\leq \mathbf {0} } (all components non-positive), return "Samples not separable.". Return "Algorithm failed to converge in time.". == Properties == If the training data is linearly separable, the algorithm converges to a solution (where e ( k ) = 0 {\displaystyle \mathbf {e} (k)=\mathbf {0} } ) in a finite number of iterations. If the data is not linearly separable, the algorithm may or may not ever reach the point where e ( k ) = 0 {\displaystyle \mathbf {e} (k)=\mathbf {0} } . However, if it does happen that e ( k ) ≤ 0 {\displaystyle \mathbf {e} (k)\leq \mathbf {0} } at some iteration, this proves non-separability. The convergence rate depends on the choice of the learning rate parameter ρ {\displaystyle \rho } and the degree of linear separability of the data. == Relationship to other algorithms == Perceptron algorithm: Both seek linear separators. The perceptron updates weights incrementally based on individual misclassified samples, while Ho–Kashyap is a batch method that processes all samples to compute the pseudoinverse and updates based on an overall error vector. Linear discriminant analysis (LDA): LDA assumes underlying Gaussian distributions with equal covariances for the classes and derives the decision boundary from these statistical assumptions. Ho–Kashyap makes no explicit distributional assumptions and instead tries to solve a system of linear inequalities directly. Support vector machines (SVM): For linearly separable data, SVMs aim to find the maximum-margin hyperplane. The Ho–Kashyap algorithm finds a separating hyperplane but not necessarily the one with the maximum margin. If the data is not separable, soft-margin SVMs allow for some misclassifications by optimizing a trade-off between margin size and misclassification penalty, while Ho–Kashyap provides a least-squares solution. == Variants == Modified Ho–Kashyap algorithm changes weight calculation step w ( k + 1 ) = Y + b ( k + 1 ) {\displaystyle \mathbf {w} (k+1)=\mathbf {Y} ^{+}\mathbf {b} (k+1)} to w ( k + 1 ) = w ( k ) + η k Y + | e ( k ) | {\displaystyle \mathbf {w} (k+1)=\mathbf {w} (k)+\eta _{k}\mathbf {Y} ^{+}|\mathbf {e} (k)|} . Kernel Ho–Kashyap algorithm: Applies kernel methods (the "kernel trick") to the Ho–Kashyap framework to enable non-linear classification by implicitly mapping data to a higher-dimensional feature space.

Symbaloo

Symbaloo is a cloud-based site that allows users to organize and categorize web links in the form of buttons. Symbaloo works from a web browser and can be configured as a homepage, allowing users to create a personalized virtual desktop accessible from any device with an Internet connection. Symbaloo users, which must be previously registered, have a page with a grid of buttons that can be configured to link to a specific page. The site allows users to assign different colors to the buttons for easy visual classification. Symbaloo allows a single user to create different pages or screens with buttons. These screens called webmix are useful to separate topics and links can be shared with other users, making them public and sending the link via email. As of 2015 Symbaloo has 6 million users worldwide and mainly used as an online education resource. Symbaloo's slogan is "Start Simple".

Golden record (informatics)

In informatics, a golden record is the valid version of a data element (record) in a single source of truth system. It may refer to a database, specific table or data field, or any unit of information used. A golden copy is a consolidated data set, and is supposed to provide a single source of truth and a "well-defined version of all the data entities in an organizational ecosystem". Other names sometimes used include master source or master version. The term has been used in conjunction with data quality, master data management, and similar topics. (Different technical solutions exist, see master data management). == Master data == In master data management (MDM), the golden copy refers to the master data (master version) of the reference data which works as an authoritative source for the "truth" for all applications in a given IT landscape.

Schema crosswalk

A schema crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another. Crosswalk tables are often employed within or in parallel to enterprise systems, especially when multiple systems are interfaced or when the system includes legacy system data. In the context of Interfaces, they function as an internal extract, transform, load (ETL) mechanism. For example, this is a metadata crosswalk from MARC standards to Dublin Core: Crosswalks show people where to put the data from one scheme into a different scheme. They are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC standards, Dublin Core, Text Encoding Initiative (TEI), and other metadata schemes. For example, an archive has a MARC record in its catalog describing a manuscript. Suppose the archive makes a digital copy of that manuscript and wants to display it on the web along with the information from the catalog. In that case, it will have to translate the data from the MARC catalog record into a different format, such as Metadata Object Description Schema, that is viewable on a webpage. Because MARC has various fields than MODS, decisions must be made about where to put the data into MODS. This type of "translating" from one format to another is often called "metadata mapping" or "field mapping," and is related to "data mapping", and "semantic mapping". Crosswalks also have several technical capabilities. They help databases using different metadata schemes to share information. They help metadata harvesters create union catalogs. They enable search engines to search multiple databases simultaneously with a single query. == Challenges for crosswalks == One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme or a field that is split into two different fields in another scheme; this is why data is often lost when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, the distinction between types of titles is lost: Simple Dublin Core only has one "Title" element, so all of the different types of MARC titles get lumped together without further distinctions. A future attempt to convert the metadata back into MARC would enter the information in the basic MARC 245 Title Statement field, with none of the original distinctions. This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A. === Difficulties in mapping === Other mapping problems arise when: One scheme has one element that needs to be split up with different parts of it placed in multiple other elements in the second scheme ("one-to-many" mapping) One scheme allows an element to be repeated more than once while another only allows that element to appear once with multiple terms in it Schemes have different data formats (e.g. John Doe or Doe, John) An element in one scheme is indexed, but the equivalent element in the other scheme is not Schemes may use different controlled vocabularies Schemes change their standards over time Some of these problems are not fixable. As Karen Coyle says in "Crosswalking Citation Metadata: The University of California's Experience," "The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. When metadata is crosswalked between two or more unrelated sources, there will be data elements that cannot be reconciled in an ideal manner. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise to reach a practical conclusion to projects."

AI notetaker

An AI notetaker is a tool using artificial intelligence to take notes during meetings. They are created by tech companies such as Microsoft and Google; by AI transcription services such Otter.ai, and by smaller firms such as Cluely and Krisp. Some business executives send AI notetakers to attend meetings not only to take notes, but also to answer questions on their behalf. The use of AI notetakers raises ethical questions, including recording meetings without the consent of all participants and the possibility that the notetaker will hallucinate and misrepresent what was said during meetings. There are also concerns when it comes to the privacy and security of meeting data and the sensitive information that lives inside meetings. Further controversies have developed from the use of AI notetakers such as Cluely to cheat in technical job interviews. == Technology == Large technology companies have integrated transcription capabilities into broader productivity and accessibility tools, including real-time captioning, dictation, and meeting documentation features embedded in operating systems and office platforms. Standalone transcription platforms, such as Transkriptor, focus specifically on automated transcription workflows and apply AI-based speech recognition to convert audio and video recordings into text. The software supports transcription in multiple languages and processes recordings uploaded via a web interface as well as through mobile and browser extensions. Tools of this type typically provide editable, time-aligned transcripts and export options for text and subtitle formats, cloud-based processing, multilingual support, and automation in transcription technology.

Visualization (graphics)

Visualization (or visualisation in Commonwealth English; see spelling differences), also known as graphics visualization, is any technique for creating images, diagrams, or animations to communicate a message. Visualization through visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of humanity. Examples from history include cave paintings, Egyptian hieroglyphs, Greek geometry, and Leonardo da Vinci's revolutionary methods of technical drawing for engineering purposes that actively involve scientific requirements. Visualization today has ever-expanding applications in science, education, engineering (e.g., product visualization), interactive multimedia, medicine, etc. Typical of a visualization application is the field of computer graphics. The invention of computer graphics (and 3D computer graphics) may be the most important development in visualization since the invention of central perspective in the Renaissance period. The development of animation also helped advance visualization. == Overview == The use of visualization to present information is not a new phenomenon. It has been used in maps, scientific drawings, and data plots for over a thousand years. Examples from cartography include Ptolemy's Geographia (2nd century AD), a map of China (1137 AD), and Minard's map (1861) of Napoleon's invasion of Russia a century and a half ago. Most of the concepts learned in devising these images carry over in a straightforward manner to computer visualization. Edward Tufte has written three critically acclaimed books that explain many of these principles. Computer graphics has from its beginning been used to study scientific problems. However, in its early days the lack of graphics power often limited its usefulness. The recent emphasis on visualization started in 1987 with the publication of Visualization in Scientific Computing, a special issue of Computer Graphics. Since then, there have been several conferences and workshops, co-sponsored by the IEEE Computer Society and ACM SIGGRAPH, devoted to the general topic, and special areas in the field, for example volume visualization. Most people are familiar with the digital animations produced to present meteorological data during weather reports on television, though few can distinguish between those models of reality and the satellite photos that are also shown on such programs. TV also offers scientific visualizations when it shows computer drawn and animated reconstructions of road or airplane accidents. Some of the most popular examples of scientific visualizations are computer-generated images that show real spacecraft in action, out in the void far beyond Earth, or on other planets. Dynamic forms of visualization, such as educational animation or timelines, have the potential to enhance learning about systems that change over time. Apart from the distinction between interactive visualizations and animation, the most useful categorization is probably between abstract and model-based scientific visualizations. The abstract visualizations show completely conceptual constructs in 2D or 3D. These generated shapes are completely arbitrary. The model-based visualizations either place overlays of data on real or digitally constructed images of reality or make a digital construction of a real object directly from the scientific data. Scientific visualization is usually done with specialized software, though there are a few exceptions, noted below. Some of these specialized programs have been released as open source software, having very often its origins in universities, within an academic environment where sharing software tools and giving access to the source code is common. There are also many proprietary software packages of scientific visualization tools. Models and frameworks for building visualizations include the data flow models popularized by systems such as AVS, IRIS Explorer, and VTK toolkit, and data state models in spreadsheet systems such as the Spreadsheet for Visualization and Spreadsheet for Images. == Applications == === Scientific visualization === As a subject in computer science, scientific visualization is the use of interactive, sensory representations, typically visual, of abstract data to reinforce cognition, hypothesis building, and reasoning. Scientific visualization is the transformation, selection, or representation of data from simulations or experiments, with an implicit or explicit geometric structure, to allow the exploration, analysis, and understanding of the data. Scientific visualization focuses and emphasizes the representation of higher order data using primarily graphics and animation techniques. It is a very important part of visualization and maybe the first one, as the visualization of experiments and phenomena is as old as science itself. Traditional areas of scientific visualization are flow visualization, medical visualization, astrophysical visualization, and chemical visualization. There are several different techniques to visualize scientific data, with isosurface reconstruction and direct volume rendering being the more common. === Data and information visualization === Data visualization is a related subcategory of visualization dealing with statistical graphics and geospatial data (as in thematic cartography) that is abstracted in schematic form. Information visualization concentrates on the use of computer-supported tools to explore large amount of abstract data. The term "information visualization" was originally coined by the User Interface Research Group at Xerox PARC and included Jock Mackinlay. Practical application of information visualization in computer programs involves selecting, transforming, and representing abstract data in a form that facilitates human interaction for exploration and understanding. Important aspects of information visualization are dynamics of visual representation and the interactivity. Strong techniques enable the user to modify the visualization in real-time, thus affording unparalleled perception of patterns and structural relations in the abstract data in question. === Educational visualization === Educational visualization is using a simulation to create an image of something so it can be taught about. This is very useful when teaching about a topic that is difficult to otherwise see, for example, atomic structure, because atoms are far too small to be studied easily without expensive and difficult to use scientific equipment. === Knowledge visualization === The use of visual representations to transfer knowledge between at least two persons aims to improve the transfer of knowledge by using computer and non-computer-based visualization methods complementarily. Thus properly designed visualization is an important part of not only data analysis but knowledge transfer process, too. Knowledge transfer may be significantly improved using hybrid designs as it enhances information density but may decrease clarity as well. For example, visualization of a 3D scalar field may be implemented using iso-surfaces for field distribution and textures for the gradient of the field. Examples of such visual formats are sketches, diagrams, images, objects, interactive visualizations, information visualization applications, and imaginary visualizations as in stories. While information visualization concentrates on the use of computer-supported tools to derive new insights, knowledge visualization focuses on transferring insights and creating new knowledge in groups. Beyond the mere transfer of facts, knowledge visualization aims to further transfer insights, experiences, attitudes, values, expectations, perspectives, opinions, and estimates in different fields by using various complementary visualizations. See also: picture dictionary, visual dictionary === Product visualization === Product visualization involves visualization software technology for the viewing and manipulation of 3D models, technical drawing and other related documentation of manufactured components and large assemblies of products. It is a key part of product lifecycle management. Product visualization software typically provides high levels of photorealism so that a product can be viewed before it is actually manufactured. This supports functions ranging from design and styling to sales and marketing. Technical visualization is an important aspect of product development. Originally technical drawings were made by hand, but with the rise of advanced computer graphics the drawing board has been replaced by computer-aided design (CAD). CAD-drawings and models have several advantages over hand-made drawings such as the possibility of 3-D modeling, rapid prototyping, and simulation. 3D product visualization promises more interactive experiences for online shoppers, but also challenges retailers to overcome hurdles in the production of 3D content, as large-scale 3D content production can be extremel

Artificial intelligence in government

Artificial intelligence (AI) has a range of uses in government. It can be used to further public policy objectives (in areas such as emergency services, health and welfare), as well as assist the public to interact with the government (through the use of virtual assistants, for example). According to the Harvard Business Review, "Applications of artificial intelligence to the public sector are broad and growing, with early experiments taking place around the world." Hila Mehr from the Ash Center for Democratic Governance and Innovation at Harvard University notes that AI in government is not new, with postal services using machine methods in the late 1990s to recognise handwriting on envelopes to automatically route letters. The use of AI in government comes with significant benefits, including efficiencies resulting in cost savings (for instance by reducing the number of front office staff) and reducing the opportunities for corruption. However, it also carries risks (described below). == Uses of AI in government == The potential uses of AI in government are wide and varied, with Deloitte considering that "Cognitive technologies could eventually revolutionize every facet of government operations". Mehr suggests that six types of government problems are appropriate for AI applications: Resource allocation—such as where administrative support is required to complete tasks more quickly. Large datasets—where these are too large for employees to work efficiently and multiple datasets could be combined to provide greater insights. Expert shortage—including where basic questions could be answered and niche issues can be learned. Predictable scenario—historical data makes the situation predictable. Procedural tasks refer to repetitive tasks in which the answers to inputs or outputs are binary. Diverse data—where data takes various forms (such as visual and linguistic) and needs to be summarized regularly. Mehr states that "While applications of AI in government work have not kept pace with the rapid expansion of AI in the private sector, the potential use cases in the public sector mirror common applications in the private sector." Potential and actual uses of AI in government can be divided into three broad categories: those that contribute to public policy objectives, those that assist public interactions with the government, and other uses. === Contributing to public policy objectives === There are a range of examples of where AI can contribute to public policy objectives. These include: Receiving benefits at job loss, retirement, bereavement and child birth almost immediately, in an automated way (thus without requiring any actions from citizens at all) Social insurance service provision Classifying emergency calls based on their urgency (like the system used by the Cincinnati Fire Department in the United States) Detecting and preventing the spread of diseases Assisting public servants in making welfare payments and immigration decisions Adjudicating bail hearings Triaging health care cases Monitoring social media for public feedback on policies Monitoring social media to identify emergency situations Identifying fraudulent benefits claims Predicting a crime and recommending optimal police presence Predicting traffic congestion and car accidents Anticipating road maintenance requirements Identifying breaches of health regulations Providing personalised education to students Marking exam papers Assisting with defence and national security (see Artificial intelligence § Military and Applications of artificial intelligence § Other fields in which AI methods are implemented respectively) Artificial Intelligence in China has been used to drive both political and economic markets. In 2019, Shanghai’s government rolled out 100 billion yuan to assist in funding enterprises that used AI to introduce 22 new policy agendas. Shanghai invested in these enterprises to attract top international talent in order to set up the Shanghai Municipal Big Data Center. City Brain AI is an urban management platform made by Alibaba. China uses City Brain AI to maintain a significant share of capital investment through public and state owned enterprises. The synergy between public and private sectors are more than capital-driven with City Brain AI. The blend of both public and private shareholding is only made out to be through the role of provincial and sub-provincial governments. Both hold control over the direction that City Brain AI makes both socially and economically. === Assisting public interactions with government === AI can be used to assist members of the public to interact with government and access government services, for example by: Answering questions using virtual assistants or chatbots (see below) Directing requests to the appropriate area within government Filling out forms Assisting with searching documents (e.g. IP Australia's trade mark search) Scheduling appointments Various governments, including those of Australia and Estonia, have implemented virtual assistants to aid citizens in navigating services, with applications ranging from tax inquiries to life-event registrations. === Gerrymandering === Gerrymandering is a method of influencing political process by drawing map boundaries in favor of incumbent parties. Academic researchers Wendy Tam Cho and Bruce Cain have proposed partially automating the map-drawing process with an AI system to reduce partisan gerrymandering. Even with this AI system, the process may still be manipulated to favor partisan interests, so the researchers emphasized the importance of transparency and human involvement. === Other uses === Other uses of AI in government include: Translation Language interpretation pioneered by the European Commission's Directorate General for Interpretation and Florika Fink-Hooijer. Drafting documents == Potential benefits == AI offers potential efficiencies and cost savings for the government. For example, Deloitte has estimated that automation could save US Government employees between 96.7 million to 1.2 billion hours a year, resulting in potential savings of between $3.3 billion to $41.1 billion a year. The Harvard Business Review has stated that while this may lead a government to reduce employee numbers, "Governments could instead choose to invest in the quality of its services. They can re-employ workers' time towards more rewarding work that requires lateral thinking, empathy, and creativity—all things at which humans continue to outperform even the most sophisticated AI program." == Risks == Risks associated with the use of AI in government include AI becoming susceptible to bias, a lack of transparency in how an AI application may make decisions, and the accountability for any such decisions. For example, a 2026 lawsuit alleged that the U.S. Department of Government Efficiency used ChatGPT to flag and cancel federal humanities grants, including projects on Jewish history and Israeli culture, over some objections from NEH officials, illustrating how automated decision-making could affect funding outcomes.