AI Code Unlimited

AI Code Unlimited — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Elastic cloud storage

    Elastic cloud storage

    An elastic cloud is a cloud computing offering that provides variable service levels based on changing needs. Elasticity is an attribute that can be applied to most cloud services. It states that the capacity and performance of any given cloud service can expand or contract according to a customer's requirements and that this can potentially be changed automatically as a consequence of some software-driven event or, at worst, can be reconfigured quickly by the customer's infrastructure management team. Elasticity has been described as one of the five main principles of cloud computing by Rosenburg and Mateos in The Cloud at Your Service - Manning 2011. == History == Cloud computing was first described by Gillet and Kapor in 1996; however, the first practical implementation was a consequence of a strategy to leverage Amazon's excess data center capacity. Amazon and other pioneers of the commercial use of this technology were primarily interested in providing a “public” cloud service, whereby they could offer customers the benefits of using the cloud, particularly the utility-based pricing model benefit. Other suppliers followed suit with a range of cloud-based models all offering elasticity as a core component, but these suppliers were only offering this service as an element of their public cloud service. Due to perceived weaknesses in security, or at least a lack of proven compliance, many organizations, particularly in the financial and public sectors, have been slow adopters of cloud technologies. These wary organizations can achieve some of the benefits of cloud computing by adopting private cloud technologies. An alternative form of the elastic cloud has been offered by vendors such as EMC and IBM, whereby the service is based around an enterprise's own infrastructure but still retains elements of elasticity and the potential to bill by consumption. == Description == Elasticity in cloud computing is the ability for the organization to adjust its storage requirements in terms of capacity and processing with respect to operational requirements. This has the following benefits: Operational Benefits - Services can be acquired quickly, meaning that the evolving requirements of the business can be addressed almost immediately, giving an organization a potential agility advantage. A properly implemented elastic system will provision/de-provision according to application demands, so if a particular business has activity spikes then the provision can be enabled to match the demand and the capacity can be re-allocated. Research and Development (R&D) Projects - R&D activities are no longer hindered by a requirement to secure a capex budget prior to a project starting. Capability can simply be provisioned from the cloud and released at the end of the exercise. Testing and Deployment - With most large-scale projects a size test needs to be performed prior to final rollout. By taking advantage of the elasticity of the cloud and creating a full-scale avatar of the proposed production system, realistic data and traffic volumes can be provisioned and released as needed. Expensive Resources Allocated - This will normally apply only in the context where a customer is applying at least some of their own servers as part of a cloud infrastructure, specifically where a business (for performance reasons) has decided to invest in solid-state storage as opposed to spinning platters. There are instances when, due to activity spikes, a less critical process may need to be moved from the high-performance resources to more traditional storage. Server Specification - When a customer has elected to own/lease hardware, they can select and specify servers that are specifically tuned to meet the likely needs of their operation (i.e., directly controlling the cost/benefit equation). Utility Based Payments - There is, of course, a key cost driver in this process, and the notion that you should pay for what you consume is acceptable for many organizations. When hardware capacity is sourced internally, organizations need to over-provision. This applies just as much to traditional outsourcing as it does to capex-related expenditure on in-house servers. Cloud Platform – At the heart of any cloud storage system is the ability to manage hyperscale object storage and a Hadoop Distributed Files System (HDFS). Elastic storage capability is particularly well suited to hyperscale and Hadoop environments, where its capability to rapidly respond to changing circumstances and priorities is essential

    Read more →
  • Principle of rationality

    Principle of rationality

    The principle of rationality (or rationality principle) was coined by Karl R. Popper in his Harvard Lecture of 1963, and published in his book Myth of Framework. It is related to what he called the 'logic of the situation' in an Economica article of 1944/1945, published later in his book The Poverty of Historicism. According to Popper's rationality principle, agents act in the most adequate way according to the objective situation. It is an idealized conception of human behavior which he used to drive his model of situational analysis. Cognitive scientist Allen Newell elaborated on the principle in his account of knowledge level modeling. == Popper == Popper called for social science to be grounded in what he called situational analysis or situational logic. This requires building models of social situations which include individual actors and their relationship to social institutions, e.g. markets, legal codes, bureaucracies, etc. These models attribute certain aims and information to the actors. This forms the 'logic of the situation', the result of reconstructing meticulously all circumstances of an historical event. The 'principle of rationality' is the assumption that people are instrumental in trying to reach their goals, and this is what drives the model. Popper believed that this model could be continuously refined to approach the objective truth. Popper called his principle of rationality nearly empty (a technical term meaning without empirical content) and strictly speaking false, but nonetheless tremendously useful. These remarks earned him a lot of criticism because seemingly he had swerved from his famous Logic of Scientific Discovery. Among the many philosophers having discussed Popper's principle of rationality from the 1960s up to now are Noretta Koertge, R. Nadeau, Viktor J. Vanberg, Hans Albert, E. Matzner, Ian C. Jarvie, Mark A. Notturno, John Wettersten, Ian C. Böhm. == Newell == In the context of knowledge-based systems, Newell (in 1982) proposed the following principle of rationality: "If an agent has knowledge that one of its actions will lead to one of its goals, then the agent will select that action." This principle is employed by agents at the knowledge level to move closer to a desired goal. An important philosophical difference between Newell and Popper is that Newell argued that the knowledge level is real in the sense that it exists in nature and is not made up. This allowed Newell to treat the rationality principle as a way of understanding nature and avoid the problems Popper ran into by treating knowledge as non physical and therefore non empirical.

    Read more →
  • Human-centered AI

    Human-centered AI

    Human-centered AI is the initiative at the intersection of the fields of artificial intelligence (AI) and human-computer interaction (HCI) to develop AI systems in a way that prioritizes human values, needs, and general flourishing. Emphasis is placed on the recognition that artificial intelligence systems are rapidly changing, and will continue to influence, many aspects of the human experience, in areas ranging from scientific inquiry, governance and policy, labor and the economy, and creative expression, with an aim set to adapt current developments and guide future developments on a trajectory which is most beneficial to the human population at large, with the goal of augmenting human intelligence and capacities across these areas, as opposed to replacing them. Particular attention is paid to mitigating negative effects of AI automation on the livelihoods of the labor force, the use of AI in healthcare fields, and imbuing AI systems with societal values. Human-centered AI is linked to related endeavors in AI alignment and AI safety, but while these fields primarily focus on mitigating risks posed by AI that is unaligned to human values and/or uncontrollable AI self-development, human-centered AI places significant focus in exploring how AI systems can augment human capacities and serve as collaborators. == Conceptual history == The importance of the alignment of artificial intelligence development towards human values in some sense predates artificial intelligence itself, as before the modern conception of artificial intelligence as coined at the 1956 Dartmouth Workshop, the conception of robots as constructed, autonomous agents entered the cultural consciousness as early as the 1920s, with Karel Capek's Rossum's Universal Robots. The imagined issues relating to robots' aims and values requiring intentional alignment and direction with those of humans followed soon after, most widely known from science fiction author Isaac Asimov’s Three Laws of Robotics, dating to his 1942 short story “Runaround”. Two of the three eponymous laws are directly concerned with robots’ interaction with and positioned deference towards humans, and have in recent times been reexamined in the face of modern AI. In 1985, after artificial intelligence research had taken off and its effects were more acutely conceptualized, Asimov added a Rule Zero, treating robots' relationship with humanity as a whole, distinct from individual humans. While modern artificial intelligence is largely distinct from robotics, the conceptualization of both robots and AI systems as autonomous agents positions this as a foundation for conceptions of human-centered AI. Aside from robots, artificially intelligent autonomous agents in interaction with humans have been conceived of for at least 75 years. In 1950, Alan Turing published his famous "Imitation Game", often also called the Turing Test, a thought experiment that uses human-machine interaction as an assessor for the intelligence of a system. In recent times, artificial intelligence researchers such as Stanford's Erik Brynjolfsson have conceived of rapid AI development leading to a so-called "Turing Trap". == Augmentation and automation == A major stated aim of human-centered AI is to promote the development of AI in ways that augment human capabilities, rather than replacing them. To this end, organizations and initiatives that take a human-centered approach to AI development focus on frameworks that encourage collaboration between humans and artificial intelligence systems to build towards even greater progress, rather than attempting to automate tasks currently handled by humans. Such avenues include everything from data visualization for big data, allowing human engineers to better understand extremely large datasets, allowing for the design of better machine learning models to handle them, to AI-powered sensors to monitor vitals, allowing for better responsiveness from healthcare providers. Many human-centered AI initiatives often position it as a better alternative to the apparent mainstream in AI development, which is primarily concerned with automation. Driven by the pressures of the market economy, AI development that does replace tasks currently performed by humans with automated processes is incentivized, as it allows for greater profit margins; this often comes at the detriment of the human whose performance is replaced, thus leading to an environment wherein human workers are outcompeted by AI systems across various service-sector and technology-based industries. At the same time, automation and augmentation are not always incompatible; a major aim of human-centered AI is towards the automation of rote tasks that would otherwise hinder a human’s productivity or creativity, freeing them to direct their energy and intelligence towards higher-level tasks, thus achieving augmentation through automation. Empirical research in pharmaceutical sales has shown that a human-centered implementation - where work procedures, training, and incentives are designed around individuals' cognitive needs - improves augmentation performance, while implementation without such adaptation can worsen outcomes relative to a legacy system. == Research == Much of the work done on human-centered AI comes from research institutes, within universities, companies, and as freestanding organizations. The Stanford Institute for Human-Centered AI (abbreviated to HAI) is one such group, engaging academics, industry professionals, and policymakers centered in Stanford University to conduct research and inform policy in various areas in human-centered AI, including on aspects of the intelligence itself, augmentation, and on measuring the impacts of AI systems on sociopolitcal and cultural institutions. Similar groups exist at other universities, including the Chicago Human + AI (CHAI) Lab at the University of Chicago, the HCAI@GU group at the University of Gothenburg, and the Human-Centered AI (HAI) Lab at the University of Oxford. Outside of the academy, companies such as IBM have research initiatives dedicated to advancements in human-centered AI. At Kenyon College, the Integrated Program for Humane Studies (IPHS) launched a human-centered AI program in 2016 integrating artificial intelligence research with humanities and social science inquiry. This approach treats computation and humanistic scholarship as a single unified field of research rather than as separate disciplines requiring collaboration. The program's researchers have published in both AI venues (such as the International Conference on Machine Learning and Frontiers of Computer Science) and humanities journals (such as PMLA and Poetics Today), and the lab was selected in December 2025 by Schmidt Sciences for its Humanities and AI Virtual Institute to apply AI methods to cultural heritage preservation.

    Read more →
  • Knowledge integration

    Knowledge integration

    Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

    Read more →
  • Computer-aided lean management

    Computer-aided lean management

    Computer-aided lean management, in business management, is a methodology of developing and using software-controlled, lean systems integration. Its goal is to drive innovation towards cost and cycle-time savings. It attempts to create an efficient use of capital and resources through the development and use of one integrated system model to run a business's planning, engineering, design, maintenance, and operations. == Overview == Computer-Aided Lean Management (CALM) is a management philosophy that uses software to reduce risk and inefficiencies. CALM acts on uncertainties and business inefficiencies to increase profitability through the use of computational decision-making tools that enable opportunities for additional value creation. It is based on the application of software to enable continuous improvement through an Integrated System Model (ISM) of the business’s physical assets, business processes, and machine learning. This integration of software applications using lean principles was developed in the aerospace industry and has migrated to the energy industry. The creation of an ISM removes the barriers posed by the silos or stovepipes inherent in the departmentalization of most companies. Integration enables lean uses of information for the creation of actionable knowledge. CALM strives to create such a lean management approach to running the company through the rigors of software enforcement. From this software enforcement comes clear policy and procedures that are adhered to, activity-based costing, measurement of effectiveness, and the capability of using advanced algorithms for dramatic improvements in optimization of resources. CALM creates business capabilities through software to enable technology application, streamlining of processes, and a lean organizational structure. The methodology is based on a common sense approach for running a business, by measuring actions taken and using those measurements to design more efficient processes. == History == CALM was inspired by lean processes and techniques that were already dominant management technologies with a wide diversity of applications and successes. Motorola and General Electric had been known for the concepts of Six Sigma; Boeing had been managing mass (using modular and flexible assembly options), and Toyota combined elements of these methodologies to create the Toyota Production System. Boeing then took the Toyota model and added computer-aided enforcement of lean methodologies throughout the manufacturing process. One of the major sources for CALM's outgrowth was integrated definition (IDEF) modeling in aerospace manufacturing that was pioneered by the U.S. Air Force in the 1970s. IDEF is a methodology designed to model the end-to-end decisions, actions, and activities of an organization or system so that costs, performance, and cycle times can be optimized. IDEF methods have been adapted for wider use in automotive, aerospace, pharmaceuticals, and software development industries. IDEF methods serve as a starting point to understand lean management through semantic data modeling. The IDEF process begins by mapping the existing functions of an enterprise, creating a graphical model, or road map, that shows what controls each important function, who performs it, what resources are required for carrying it out, what it produces, how much it costs, and what relationships it has to other functions of the organization. IDEF simulations have been found to be efficient at streamlining and modernizing both companies and governmental agencies. Perhaps the best-developed evolution of the IDEF model beyond Toyota was at Boeing. Their project life-cycle process has grown into a rigorous software system that links people, tasks, tools, materials, and the environmental impact of any newly planned project, before any building is allowed to begin. Routinely, more than half of the time for any given project is spent building the precedence diagrams, or three-dimensional process maps, integrating with outside suppliers, and designing the implementation plan–all on the computer. Once real activity is initiated, an action tracker is used to monitor inputs and outputs versus the schedule and delivery metrics in real time throughout the organization. When the execution of a new airplane design begins, it is so well organized that it consistently cuts both costs and build time in half for each successive generation of airframe. Boeing created a complex lean management process called 'define and control airplane configuration/manufacturing resource management' (DCAC/MRM). The process was built with the help of the operations research and computer sciences departments of the University of Pittsburgh. The manufacture of the Boeing 777 was ultimately a success, and it became the precursor to succeeding generations of CALM at Boeing. The methodology of CALM has recently been applied to field orientated infrastructure based businesses with highly interdependent systems, such as electric utilities where a smart grid concept is being researched and developed. The management of infrastructure-based industries like oil, gas, electricity, water, transportation, and renewables requires massive investments in interdependent, physical infrastructure, as well as simultaneous attention to disparate market forces. In infrastructure businesses that manage field assets, uncertainty is the biggest impediment to profitability, rather than the maintenance of efficient supply chains or the management of factory assembly lines. These businesses are dominated by risk from uncertainties such as weather, market variations, transportation disruptions, government actions, logistic difficulties, geology, and asset reliability. CALM has been applied to deal with these types of infrastructure based challenges.

    Read more →
  • Intelligent database

    Intelligent database

    Until the 1980s, databases were viewed as computer systems that stored record-oriented and business data such as manufacturing inventories, bank records, and sales transactions. A database system was not expected to merge numeric data with text, images, or multimedia information, nor was it expected to automatically notice patterns in the data it stored. In the late 1980s the concept of an intelligent database was put forward as a system that manages information (rather than data) in a way that appears natural to users and which goes beyond simple record keeping. The term was introduced in 1989 by the book Intelligent Databases by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong. The concept postulated three levels of intelligence for such systems: high level tools, the user interface and the database engine. The high level tools manage data quality and automatically discover relevant patterns in the data with a process called data mining. This layer often relies on the use of artificial intelligence techniques. The user interface uses hypermedia in a form that uniformly manages text, images and numeric data. The intelligent database engine supports the other two layers, often merging relational database techniques with object orientation. In the twenty-first century, intelligent databases have now become widespread, e.g. hospital databases can now call up patient histories consisting of charts, text and x-ray images just with a few mouse clicks, and many corporate databases include decision support tools based on sales pattern analysis.

    Read more →
  • Knowledge integration

    Knowledge integration

    Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

    Read more →
  • Symbol level

    Symbol level

    In knowledge-based systems, agents choose actions based on the principle of rationality to move closer to a desired goal. The agent is able to make decisions based on knowledge it has about the world (see knowledge level). But for the agent to actually change its state, it must use whatever means it has available. This level of description for the agent's behavior is the symbol level. The term was coined by Allen Newell in 1982. For example, in a computer program, the knowledge level consists of the information contained in its data structures that it uses to perform certain actions. The symbol level consists of the program's algorithms, the data structures themselves, and so on.

    Read more →
  • TAChart

    TAChart

    TAChart is a component for the Lazarus IDE that provides charting services. Similar to Tchart and Teechart for Delphi it supports a collection of different chart types including bar charts, pie charts, line charts and point series. Apart from a screen canvas, output is possible in form of SVG, OpenGL, printer, WMF, and other formats. TAChart is bundled with the Lazarus Component Library. Although not intended to be a TChart clone, why its usage differs in certain points, its basic functionality is very similar and some source code written for TeeChart may be reused. == History == The first version of TAChart was developed by Philippe Martinole for the TeleAuto project, a program for automation of astronomic observations. Later functionality was introduced by Luis Rodrigues while porting the Epanet application from Delphi to Lazarus. In the ensuing years the code has extensively rewritten, expanded and is now maintained by Alexander Klenin. == Data sources == TAChart is able to use input from various sources. Examples include lists of real values, user defined buffers in the computer's memory, vectors of random values, fields in databases, calculated values provided by pre-defined functions and results of embedded code written in Pascal Script

    Read more →
  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

    Read more →
  • Journal of Machine Learning Research

    Journal of Machine Learning Research

    The Journal of Machine Learning Research is a peer-reviewed open access scientific journal covering machine learning. It was established in 2000 and the first editor-in-chief was Leslie Kaelbling. The current editors-in-chief are Francis Bach (Inria) and David Blei (Columbia University). == History == The journal was established as an open-access alternative to the journal Machine Learning. In 2001, forty editorial board members of Machine Learning resigned, saying that in the era of the Internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. The open access model employed by the Journal of Machine Learning Research allows authors to publish articles for free and retain copyright, while archives are freely available online. Print editions of the journal were published by MIT Press until 2004 and by Microtome Publishing thereafter. From its inception, the journal received no revenue from the print edition and paid no subvention to MIT Press or Microtome Publishing. In response to the prohibitive costs of arranging workshop and conference proceedings publication with traditional academic publishing companies, the journal launched a proceedings publication arm in 2007 and now publishes proceedings for several leading machine learning conferences, including the International Conference on Machine Learning, COLT, AISTATS, and workshops held at the Conference on Neural Information Processing Systems.

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • Zero-shot learning

    Zero-shot learning

    Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception. == Background and history == The first paper on zero-shot learning in natural language processing appeared in a 2008 paper by Chang, Ratinov, Roth, and Srikumar, at the AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper from Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper and the term zero-shot learning caught on, as a take-off on one-shot learning that was introduced in computer vision years earlier. In computer vision, zero-shot learning models learned parameters for seen classes along with their class representations and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes. In natural language processing, the key technical direction developed builds on the ability to "understand the labels"—represent the labels in the same semantic space as that of the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering. The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised like manner (or transductive learning). Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples to classes they have already observed during training, in ZSL, no samples from the classes have been given during training the classifier. It can therefore be viewed as an extreme case of domain adaptation. == Prerequisite information for zero-shot classes == Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this type of information can be of several types. Learning with attributes: classes are accompanied by pre-defined structured description. For example, for bird descriptions, this could include "red head", "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning. While this approach was used mostly in computer vision, there are some examples for it also in natural language processing. Learning from textual description. As pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language description. This could include for example a wikipedia description of the class. Class-class similarity. Here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as a predicted class, even if no such samples were observed during training. == Generalized zero-shot learning == The above ZSL setup assumes that at test time, only zero-shot samples are given, namely, samples from new unseen classes. In generalized zero-shot learning, samples from both new and known classes, may appear at test time. This poses new challenges for classifiers at test time, because it is very challenging to estimate if a given sample is new or known. Some approaches to handle this include: a gating module, which is first trained to decide if a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision, or a soft probabilistic decision a generative module, which is trained to generate feature representation of the unseen classes—a standard classifier can then be trained on samples from all classes, seen and unseen. == Domains of application == Zero shot learning has been applied to the following fields: image classification semantic segmentation image generation object detection natural language processing computational biology abstract reasoning

    Read more →
  • Hyperparameter optimization

    Hyperparameter optimization

    In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts. Hyperparameter optimization determines the set of hyperparameters that yields an optimal model which minimizes a predefined loss function on a given data set. The objective function takes a set of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance, and therefore choose the set of values for hyperparameters that maximize it. == Approaches == === Grid search === The traditional method for hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a hold-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say C ∈ { 10 , 100 , 1000 } {\displaystyle C\in \{10,100,1000\}} γ ∈ { 0.1 , 0.2 , 0.5 , 1.0 } {\displaystyle \gamma \in \{0.1,0.2,0.5,1.0\}} Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure. Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other. === Random search === Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. A benefit over grid search is that random search can explore many more values than grid search could for continuous hyperparameters. It can outperform Grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Despite its simplicity, random search remains one of the important base-lines against which to compare the performance of new hyperparameter optimization methods. === Bayesian optimization === Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run. === Gradient-based optimization === For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression. A different approach in order to obtain a gradient with respect to hyperparameters consists in differentiating the steps of an iterative optimization algorithm using automatic differentiation. A more recent work along this direction uses the implicit function theorem to calculate hypergradients and proposes a stable approximation of the inverse Hessian. The method scales to millions of hyperparameters and requires constant memory. In a different approach, a hypernetwork is trained to approximate the best response function. One of the advantages of this method is that it can handle discrete hyperparameters as well. Self-tuning networks offer a memory efficient version of this approach by choosing a compact representation for the hypernetwork. More recently, Δ-STN has improved this method further by a slight reparameterization of the hypernetwork which speeds up training. Δ-STN also yields a better approximation of the best-response Jacobian by linearizing the network in the weights, hence removing unnecessary nonlinear effects of large changes in the weights. Apart from hypernetwork approaches, gradient-based methods can be used to optimize discrete hyperparameters also by adopting a continuous relaxation of the parameters. Such methods have been extensively used for the optimization of architecture hyperparameters in neural architecture search. === Evolutionary optimization === Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution: Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+) Evaluate the hyperparameter tuples and acquire their fitness function (e.g., 10-fold cross-validation accuracy of the machine learning algorithm with those hyperparameters) Rank the hyperparameter tuples by their relative fitness Replace the worst-performing hyperparameter tuples with new ones generated via crossover and mutation Repeat steps 2-4 until satisfactory algorithm performance is reached or is no longer improving. Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, typical neural network and deep neural network architecture search, as well as training of the weights in deep neural networks. === Population-based === Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. As with evolutionary methods, poorly performing models are iteratively replaced with models that adopt modified hyperparameter values and weights based on the better performers. This replacement model warm starting is the primary differentiator between PBT and other evolutionary methods. PBT thus allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures. PBT and its variants are adaptive methods: they update hyperparameters during the training of the models. On the contrary, non-adaptive methods have the sub-optimal strategy to assign a constant set of hyperparameters for the whole training. === Early stopping-based === A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost to evaluate the performance of a set of hyperparameters is high. Irace implements the iterated racing algorithm, that focuses the search around the most promising configurations, using statistical tests to discard the ones that perform poorly. Another early stopping hyperparameter optimization algorithm is successive halving (SHA), which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA) further improves upon SHA's resource utilization profile by removing the need to synchronously evaluate a

    Read more →
  • Gibberlink

    Gibberlink

    GibberLink is an acoustic data transmission project, with an open-source client available on GitHub, in which two conversational AI agents switch from speaking to one another in a Human-listenable language (such as English) to their own unique language that consists of a sound-level protocol after confirming they are both AI agents. The project was created by Anton Pidkuiko and Boris Starkov. == Reception == The project won the global top prize at the ElevenLabs Worldwide Hackathon. It has also been cited as raising questions around AI ethics and oversight. On February 23, 2025, a YouTube video of two independent conversational ElevenLabs AI agents being prompted to chat about booking a hotel (one as a caller, one as a receptionist) received coverage for going viral. In this video, both agents are prompted to switch to ggwave data-over-sound protocol when they identify the other side as AI, and keep speaking in English otherwise.

    Read more →