AI Essay Writer Undetectable Free

AI Essay Writer Undetectable Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Computer security

    Computer security

    Computer security (also cybersecurity, digital security, or information technology (IT) security) is a subdiscipline within the field of information security. It focuses on protecting computer software, systems, and networks from threats that can lead to unauthorized information disclosure, theft, or damage to hardware, software, or data, as well as to the disruption or misdirection of the services they provide. The growing significance of computer security reflects the increasing dependence on computer systems, the Internet, and evolving wireless network standards. This reliance has expanded with the proliferation of smart devices, including smartphones, televisions, and other components of the Internet of things (IoT). As digital infrastructure becomes more embedded in everyday life, cybersecurity has emerged as a critical concern. The complexity of modern information systems—and the societal functions they underpin—has introduced new vulnerabilities. Systems that manage essential services, such as power grids, electoral processes, and finance, are particularly sensitive to security breaches. Although many aspects of computer security involve digital security, such as electronic passwords and encryption, physical security measures, such as metal locks, are still used to prevent unauthorized tampering. IT security is not a perfect subset of information security and therefore does not completely align with the security convergence schema. == Vulnerabilities and attacks == A vulnerability refers to a flaw in the structure, execution, functioning, or internal oversight of a computer or system that compromises its security. Most of the vulnerabilities that have been discovered are documented in the Common Vulnerabilities and Exposures (CVE) database. An exploitable vulnerability is one for which at least one working exploit exists. Actors maliciously seeking vulnerabilities are known as threats. Vulnerabilities can be researched, reverse-engineered, hunted, or exploited using automated tools or customized scripts. Various people or parties are vulnerable to cyberattacks; however, different groups are likely to experience different types of attacks more than others. In April 2023, the United Kingdom Department for Science, Innovation & Technology released a report on cyberattacks over the previous 12 months. They surveyed 2,263 UK businesses, 1,174 UK registered charities, and 554 education institutions. The research found that "32% of businesses and 24% of charities overall recall any breaches or attacks from the last 12 months." These figures were much higher for "medium businesses (59%), large businesses (69%), and high-income charities with £500,000 or more in annual income (56%)." Yet, although medium or large businesses are more often the victims, since larger companies have generally improved their security over the last decade, small and midsize businesses (SMBs) have also become increasingly vulnerable as they often "do not have advanced tools to defend the business." SMBs are most likely to be affected by malware, ransomware, phishing, man-in-the-middle attacks, and Denial-of Service (DoS) Attacks. Normal internet users are most likely to be affected by untargeted cyberattacks. These are where attackers indiscriminately target as many devices, services, or users as possible. They do this using techniques that take advantage of the openness of the Internet. These strategies mostly include phishing, ransomware, water holing and scanning. To secure a computer system, it is important to understand the attacks that can be made against it, and these threats can typically be classified into one of the following categories: === Backdoor === A backdoor in a computer system, a cryptosystem or an algorithm, is any secret method of bypassing normal authentication or security controls. These weaknesses may exist for many reasons, including original design or poor configuration. Due to the nature of backdoors, they are of greater concern to companies and databases as opposed to individuals. Backdoors may be added by an authorized party to allow some legitimate access or by an attacker for malicious reasons. Criminals often use malware to install backdoors, giving them remote administrative access to a system. Once they have access, cybercriminals can "modify files, steal personal information, install unwanted software, and even take control of the entire computer." Backdoors can be difficult to detect, as they often remain hidden within source code or system firmware and may require intimate knowledge of the operating system to identify. === Denial-of-service attack === Denial-of-service attacks (DoS) are designed to make a machine or network resource unavailable to its intended users. Attackers can deny service to individual victims, such as by deliberately entering an incorrect password enough consecutive times to cause the victim's account to be locked, or they may overload the capabilities of a machine or network and block all users at once. While a network attack from a single IP address can be blocked by adding a new firewall rule, many forms of distributed denial-of-service (DDoS) attacks are possible, where the attack comes from a large number of points. In this case, defending against these attacks is much more difficult. Such attacks can originate from the zombie computers of a botnet or from a range of other possible techniques, including distributed reflective denial-of-service (DRDoS), where innocent systems are fooled into sending traffic to the victim. With such attacks, the amplification factor makes the attack easier for the attacker because they have to use little bandwidth themselves. To understand why attackers may carry out these attacks, see the 'attacker motivation' section. === Physical access attacks === A direct-access attack is when an unauthorized user (an attacker) gains physical access to a computer, typically to copy data from it or steal information. Attackers may also compromise security by making operating system modifications, installing software worms, keyloggers, covert listening devices or using wireless microphones. Even when the system is protected by standard security measures, these may be bypassed by booting another operating system or tool from a CD-ROM or other bootable media. Disk encryption and the Trusted Platform Module standard are designed to prevent these attacks. Direct service attackers are related in concept to direct memory attacks which allow an attacker to gain direct access to a computer's memory. The attacks "take advantage of a feature of modern computers that allows certain devices, such as external hard drives, graphics cards, or network cards, to access the computer's memory directly." === Eavesdropping === Eavesdropping is the act of surreptitiously listening to a private computer conversation (communication), usually between hosts on a network. It typically occurs when a user connects to a network where traffic is not secured or encrypted and sends sensitive business data to a colleague, which, when listened to by an attacker, could be exploited. Data transmitted across an open network can be intercepted by an attacker using various methods. Unlike malware, direct-access attacks, or other forms of cyberattacks, eavesdropping attacks are unlikely to negatively affect the performance of networks or devices, making them difficult to notice. In fact, "the attacker does not need to have any ongoing connection to the software at all. The attacker can insert the software onto a compromised device, perhaps by direct insertion or perhaps by a virus or other malware, and then come back some time later to retrieve any data that is found or trigger the software to send the data at some determined time." Using a virtual private network (VPN), which encrypts data between two points, is one of the most common forms of protection against eavesdropping. Using the best form of encryption possible for wireless networks is best practice, as well as using HTTPS instead of an unencrypted HTTP. Programs such as Carnivore and NarusInSight have been used by the Federal Bureau of Investigation (FBI) and the NSA to eavesdrop on the systems of internet service providers. Even machines that operate as a closed system (i.e., with no contact with the outside world) can be eavesdropped upon by monitoring the faint electromagnetic transmissions generated by the hardware. TEMPEST is a specification by the NSA referring to these attacks. === Malware === Malicious software (malware) is any software code or computer program "intentionally written to harm a computer system or its users." Once present on a computer, it can leak sensitive details such as personal information, business information and passwords, can give control of the system to the attacker, and can corrupt or delete data permanently. ==== Types of malware ==== Viruses are a specific type of malware, and are normally a malicious code that hijac

    Read more →
  • DIKW pyramid

    DIKW pyramid

    The DIKW pyramid (also known as the knowledge pyramid or information hierarchy) is a model describing relationships between data, information, knowledge and wisdom sometimes also stylized as a chain, refer to models of possible structural and functional relationships between a set of components—often four, data, information, knowledge, and wisdom. The concept has roots predating the 1980s. In the latter years of that decade, interest in the models grew after explicit presentations and discussions, including from Milan Zeleny, Russell Ackoff, and Robert W. Lucky. Subsequent important discussions extended along theoretical and practical lines into the coming decades. While debate continues as to actual meaning of the component terms of DIKW-type models, and the actual nature of their relationships—including occasional doubt being cast over any simple, linear, unidirectional model—even so they have become very popular visual representations in use by business, the military, and others. Among the academic and popular, not all versions of the DIKW-type models include all four components (earlier ones excluding data, later ones excluding or downplaying wisdom, and several including additional components (for instance Ackoff inserting "understanding" before and Zeleny adding "enlightenment" after the wisdom component). In addition, DIKW-type models are no longer always presented as pyramids, instead also as a chart or framework (e.g., by Zeleny), as flow diagrams (e.g., by Liew, and by Chisholm et al.), and sometimes as a continuum (e.g., by Choo et al.). == Short description == As Rowley noted in 2007, the DIKW model "is often quoted, or used implicitly, in definitions of data, information and knowledge in the information management, information systems and knowledge management literatures, but [as of that date] there ha[d] been limited direct discussion of the hierarchy". Reviews of textbooks and a survey of scholars in relevant fields indicate that there was not a consensus as to definitions used in the model as of that date, and as reviewed by Liew in that year, even less "in the description of the processes that transform components lower in the hierarchy into those above them". Zins work, published in 2007—from studies in 2003-2005 that documented "130 definitions of data, information, and knowledge formulated by 45 scholars", published in 2007—to suggest that the data–information–knowledge components of DIKW refer to a class of no less than five models, as a function of whether data, information, and knowledge are each conceived of as subjective, objective (what Zins terms, "universal" or "collective") or both. In Zins' usage, subjective and objective "are not related to arbitrariness and truthfulness, which are usually attached to the concepts of subjective knowledge and objective knowledge". Information science, Zins argues, studies data and information, but not knowledge, as knowledge is an internal (subjective) rather than an external (universal–collective) phenomenon. == Representations == === Graphical representation === DIKW is a hierarchical model often depicted as a pyramid, sometimes as a chain, with data at its base and wisdom at its apex (or chain-beginning and -end). Both Zeleny and Ackoff have been credited with originating the pyramid representation, although neither used a pyramid to present their ideas. According to Wallace, Debons and colleagues may have been the first to "present the hierarchy graphically". Many variations of the DIKW-type pyramid have been produced. One, in use by knowledge managers in the United States Department of Defense, attempts to show the DIKW progression to enable effective decisions and consequent activities supporting shared understanding throughout defense organizations, as well as supporting management of risks associated with decisions. DIKW-type hierarchical information paradigms have also been represented as two-dimensional charts, and as flow diagrams, where relationships between the components may be presented less hierarchically, with defining aspects of the relationships, feedback loops, etc. === Computational representation === Intelligent decision support systems are trying to improve decision making by introducing new technologies and methods from the domain of modeling and simulation in general, and in particular from the domain of intelligent software agents in the contexts of agent-based modeling. The following example describes a military decision support system, but the architecture and underlying conceptual idea are transferable to other application domains: The value chain starts with data quality describing the information within the underlying command and control systems. Information quality tracks the completeness, correctness, currency, consistency and precision of the data items and information statements available. Knowledge quality deals with procedural knowledge and information embedded in the command and control system such as templates for adversary forces, assumptions about entities such as ranges and weapons, and doctrinal assumptions, often coded as rules. Awareness quality measures the degree of using the information and knowledge embedded within the command and control system. Awareness is explicitly placed in the cognitive domain. By the introduction of a common operational picture, data are put into context, which leads to information instead of data. The next step, which is enabled by service-oriented web-based infrastructures (but not yet operationally used), is the use of models and simulations for decision support. Simulation systems are the prototype for procedural knowledge, which is the basis for knowledge quality. Finally, using intelligent software agents to continually observe the battle sphere, apply models and simulations to analyze what is going on, to monitor the execution of a plan, and to do all the tasks necessary to make the decision maker aware of what is going on, command and control systems could even support situational awareness, the level in the value chain traditionally limited to pure cognitive methods. == History == Danny P. Wallace, a professor of library and information science, explained that the origin of the DIKW pyramid is uncertain: The presentation of the relationships among data, information, knowledge, and sometimes wisdom in a hierarchical arrangement has been part of the language of information science for many years. Although it is uncertain when and by whom those relationships were first presented, the ubiquity of the notion of a hierarchy is embedded in the use of the acronym DIKW as a shorthand representation for the data-to-information-to-knowledge-to-wisdom transformation.Many authors think that the idea of the DIKW relationship originated from two lines in the poem "Choruses", by T. S. Eliot, that appeared in the pageant play The Rock, in 1934: === Knowledge, intelligence, and wisdom === In 1927, Clarence W. Barron addressed his employees at Dow Jones & Company on the hierarchy: "Knowledge, Intelligence and Wisdom". === Data, information, knowledge === In 1955, English-American economist and educator Kenneth Boulding presented a variation on the hierarchy consisting of "signals, messages, information, and knowledge". However, "[t]he first author to distinguish among data, information, and knowledge and to also employ the term 'knowledge management' may have been American educator Nicholas L. Henry", in a 1974 journal article. === Data, information, knowledge, wisdom === Other early versions (prior to 1982) of the hierarchy that refer to a data tier include those of Chinese-American geographer Yi-Fu Tuan and sociologist-historian Daniel Bell.. In 1980, Irish-born engineer Mike Cooley invoked the same hierarchy in his critique of automation and computerization, in his book Architect or Bee?: The Human / Technology Relationship. Thereafter, in 1987, Czechoslovakia-born educator Milan Zeleny mapped the components of the hierarchy to knowledge forms: know-nothing, know-what, know-how, and know-why. Zeleny "has frequently been credited with proposing the [representation of DIKW as a pyramid ]... although he actually made no reference to any such graphical model." The hierarchy appears again in a 1988 address to the International Society for General Systems Research, by American organizational theorist Russell Ackoff, published in 1989. Subsequent authors and textbooks cite Ackoff's as the "original articulation" of the hierarchy or otherwise credit Ackoff with its proposal. Ackoff's version of the model includes an understanding tier (as Adler had, before him), interposed between knowledge and wisdom. Although Ackoff did not present the hierarchy graphically, he has also been credited with its representation as a pyramid. In 1989, Bell Labs veteran Robert W. Lucky wrote about the four-tier "information hierarchy" in the form of a pyramid in his book Silicon Dreams. In the same year as Ackoff presented his a

    Read more →
  • Upper ontology

    Upper ontology

    In information science, an upper ontology (also known as a top-level ontology, upper model, or foundation ontology) is an ontology (in the sense used in information science) that consists of very general terms (such as "object", "property", "relation") that are common across all domains. An important function of an upper ontology is to support broad semantic interoperability among a large number of domain-specific ontologies by providing a common starting point for the formulation of definitions. Terms in the domain ontology are ranked under the terms in the upper ontology, e.g., the upper ontology classes are superclasses or supersets of all the classes in the domain ontologies. A number of upper ontologies have been proposed, each with its own proponents. Library classification systems predate upper ontology systems. Though library classifications organize and categorize knowledge using general concepts that are the same across all knowledge domains, neither system is a replacement for the other. == Development == Any standard foundational ontology is likely to be contested among different groups, each with its own idea of "what exists". One factor exacerbating the failure to arrive at a common approach has been the lack of open-source applications that would permit the testing of different ontologies in the same computational environment. The differences have thus been debated largely on theoretical grounds, or are merely the result of personal preferences. Foundational ontologies can however be compared on the basis of adoption for the purposes of supporting interoperability across domain ontologies. No particular upper ontology has yet gained widespread acceptance as a de facto standard. Different organizations have attempted to define standards for specific domains. The 'Process Specification Language' (PSL) created by the National Institute of Standards and Technology (NIST) is one example. Another important factor leading to the absence of wide adoption of any existing upper ontology is the complexity. Some upper ontologies—Cyc is often cited as an example in this regard—are very large, ranging up to thousands of elements (classes, relations), with complex interactions among them and with a complexity similar to that of a human natural language, and the learning process can be even longer than for a natural language because of the unfamiliar format and logical rules. The motivation to overcome this learning barrier is largely absent because of the paucity of publicly accessible examples of use. As a result, those building domain ontologies for local applications tend to create the simplest possible domain-specific ontology, not related to any upper ontology. Such domain ontologies may function adequately for the local purpose, but they are very time-consuming to relate accurately to other domain ontologies. To solve this problem, some genuinely top level ontologies have been developed, which are deliberately designed to have minimal overlap with any domain ontologies. Examples are Basic Formal Ontology and the DOLCE (see below). === Arguments for the infeasibility of an upper ontology === Historically, many attempts in many societies have been made to impose or define a single set of concepts as more primal, basic, foundational, authoritative, true or rational than all others. A common objection to such attempts points out that humans lack the sort of transcendent perspective — or God's eye view — that would be required to achieve this goal. Humans are bound by language or culture, and so lack the sort of objective perspective from which to observe the whole terrain of concepts and derive any one standard. Thomasson, under the headline "1.5 Skepticism about Category Systems", wrote: "category systems, at least as traditionally presented, seem to presuppose that there is a unique true answer to the question of what categories of entity there are – indeed the discovery of this answer is the goal of most such inquiries into ontological categories. [...] But actual category systems offered vary so much that even a short survey of past category systems like that above can undermine the belief that such a unique, true and complete system of categories may be found. Given such a diversity of answers to the question of what the ontological categories are, by what criteria could we possibly choose among them to determine which is uniquely correct?" Another objection is the problem of formulating definitions. Top level ontologies are designed to maximize support for interoperability across a large number of terms. Such ontologies must therefore consist of terms expressing very general concepts, but such concepts are so basic to our understanding that there is no way in which they can be defined, since the very process of definition implies that a less basic (and less well understood) concept is defined in terms of concepts that are more basic and so (ideally) more well understood. Very general concepts can often only be elucidated, for example by means of examples, or paraphrase. There is no self-evident way of dividing the world up into concepts, and certainly no non-controversial one There is no neutral ground that can serve as a means of translating between specialized (or "lower" or "application-specific") ontologies Human language itself is already an arbitrary approximation of just one among many possible conceptual maps. To draw any necessary correlation between English words and any number of intellectual concepts, that we might like to represent in our ontologies, is just asking for trouble. (WordNet, for instance, is successful and useful, precisely because it does not pretend to be a general-purpose upper ontology; rather, it is a tool for semantic / syntactic / linguistic disambiguation, which is richly embedded in the particulars and peculiarities of the English language.) Any hierarchical or topological representation of concepts must begin from some ontological, epistemological, linguistic, cultural, and ultimately pragmatic perspective. Such pragmatism does not allow for the exclusion of politics between persons or groups, indeed it requires they be considered as perhaps more basic primitives than any that are represented. Those who doubt the feasibility of general purpose ontologies are more inclined to ask "what specific purpose do we have in mind for this conceptual map of entities and what practical difference will this ontology make?" This pragmatic philosophical position surrenders all hope of devising the encoded ontology version of "The world is everything that is the case." (Wittgenstein, Tractatus Logico-Philosophicus). Finally, there are objections similar to those against artificial intelligence. Technically, the complex concept acquisition and the social / linguistic interactions of human beings suggest any axiomatic foundation of "most basic" concepts must be cognitive biological or otherwise difficult to characterize since we don't have axioms for such systems. Ethically, any general-purpose ontology could quickly become an actual tyranny by recruiting adherents into a political program designed to propagate it and its funding means, and possibly defend it by violence. Historically, inconsistent and irrational belief systems have proven capable of commanding obedience to the detriment or harm of persons both inside and outside a society that accepts them. How much more harmful would a consistent rational one be, were it to contain even one or two basic assumptions incompatible with human life? === Arguments for the feasibility of an upper ontology === Many of those who doubt the possibility of developing wide agreement on a common upper ontology fall into one of two traps: they assert that there is no possibility of universal agreement on any conceptual scheme; but they argue that a practical common ontology does not need to have universal agreement, it only needs a large enough user community (as is the case for human languages) to make it profitable for developers to use it as a means to general interoperability, and for third-party developer to develop utilities to make it easier to use; and they point out that developers of data schemes find different representations congenial for their local purposes; but they do not demonstrate that these different representations are in fact logically inconsistent. In fact, different representations of assertions about the real world (though not philosophical models), if they accurately reflect the world, must be logically consistent, even if they focus on different aspects of the same physical object or phenomenon. If any two assertions about the real world are logically inconsistent, one or both must be wrong, and that is a topic for experimental investigation, not for ontological representation. In practice, representations of the real world are created as and known to be approximations to the basic reality, and their use is circumscribed by the limits of e

    Read more →
  • Artificial intelligence industry in Italy

    Artificial intelligence industry in Italy

    The artificial intelligence industry in Italy is growing and supports industrial development. In 2024 it reached a new record, reaching 1.2 billion euros with a growth of +58% compared to 2023. While in 2025, the growth of artificial intelligence in the industrial application was even greater than in 2024 both in terms of value and application to industrial sectors. == History == The roots of AI research in Italy extend back to the 1970s, when Italian scholars began exploring automated reasoning, programming language semantics, and pattern recognition. Researchers such as those involved in early projects at the National Research Council and various universities laid the groundwork for subsequent academic and industrial developments in the field. During this period, the focus was predominantly on developing algorithms for automated theorem proving and building systems to reason about complex mathematical problems. This era witnessed the birth of methodologies that would later influence numerous AI subfields, from natural language processing (NLP) to robotics. === Institutional milestones and academic contributions === A turning point in the Italian AI landscape was the formation of the Italian Association for Artificial Intelligence (AIxIA) in 1988. Founded by academics, including Luigia Carlucci Aiello, the association established a platform for collaboration between universities, research centers, and industry. Led by Aiello, AIIA played a role in promoting research, organizing national conferences, and fostering international partnerships that connected Italy's AI community to global networks. At the same time, professors such as Roberto Navigli and numerous practitioners contributed to the advancement of AI in Italy. Navigli has worked in multilingual NLP, including the creation of BabelNet, and led the Minerva project. === Industrial AI === Over recent decades, numerous national and European initiatives supported by funding from programs such as the National Recovery and Resilience Plan (PNRR) have spurred the transition from theoretical research to practical applications. Industrial sectors including manufacturing, banking, and healthcare increasingly embraced AI-driven automation, while research institutions collaborated with industrial partners to deploy cutting-edge solutions. In recent years, Italy has also seen the establishment of specialized research centers and institutes aimed at bridging the gap between academic innovation and industrial application. These initiatives indicate a broader national commitment to integrating AI into the fabric of Italian industry. == Recent developments == === Emergence of generative AI === A landmark in Italy's modern AI evolution is the development of Minerva AI. Developed by the Sapienza NLP research group at Sapienza University of Rome and led by Professor Roberto Navigli, Minerva represents the first family of large language models (LLMs) trained from scratch with a primary focus on the Italian language. ==== Minerva 7B ==== The latest iteration, Minerva 7B, has 7 billion parameters and has been trained on an extensive corpus of over 1.5 trillion words. By using advanced instruction tuning techniques, Minerva 7B is able to produce highly accurate, coherent, and contextually sensitive responses addressing common issues such as hallucinations and inappropriate content generation. This breakthrough sets a benchmark for transparent, open-source AI development in the country. Minerva's development, carried out within the FAIR (Future Artificial Intelligence Research) project in collaboration with CINECA and supported by supercomputing resources like the Leonardo (supercomputer), aligns closely with Italy's cultural and linguistic heritage. === Establishment of AI4I === The recent establishment of the Istituto Italiano per l’Intelligenza Artificiale (AI4I) is part of Italy's strategy to improve its industrial competitiveness in AI. This dedicated institute aims to bridge the gap between research institutions and industrial enterprises; promote training and R&D support to nurture the next generation of Italian AI experts; and enhance national competitiveness. This initiative is expected to serve as a hub for applied AI research, driving innovations that are tailored to the specific needs of Italian industry and public administration. === Benefits of InvestAI === Italy's AI industry stands to benefit from the European InvestAI initiative, a plan unveiled at the recent AI Action Summit in Paris. InvestAI is an effort by the European Commission to mobilize €200 billion for AI investments, with a dedicated €20 billion fund earmarked for building AI gigafactories. These gigafactories are planned as large-scale hubs for training advanced, complex AI models using approximately 100,000 last-generation AI chips. For Italy, this investment presents several major opportunities: Access to State-of-the-Art Infrastructure: Italian companies, research institutions, and start-ups can leverage the gigafactories’ immense computational resources, enabling them to train highly sophisticated language models and other AI systems. Enhanced Competitiveness and Collaboration: With InvestAI's layered funding model where EU funds help de-risk private investments Italian firms can access capital more readily. This will bolster public–private partnerships and create a more dynamic AI ecosystem that spans from academic research to industrial applications. Alignment with National and Regional Initiatives: The Istituto Italiano per l’Intelligenza Artificiale (AI4I), based in Turin, is already recognized as a strategic asset by both Italy and the European Union. As the main recipient of InvestAI funds in Italy, AI4I will play a pivotal role in implementing these investments locally, fostering innovation in sectors like manufacturing, healthcare and aerospace. Commission President Ursula von der Leyen emphasized that InvestAI is designed to democratize AI innovation throughout Europe by ensuring that even smaller companies have access to high-performance computing power. For Italy, this means not only keeping pace with global leaders but also harnessing European-scale investments to transform its AI industry and drive economic growth.

    Read more →
  • Dabbler

    Dabbler

    Dabbler is natural media drawing software for beginners. It was initially developed by Fractal Design Corporation. It is a simplified version of Fractal Design Painter, and included multimedia tutorials and a fullscreen interface. Dabbler was released as "Art Dabbler" after the MetaCreations merger, and rights were eventually transferred to Corel. Dabbler operating systems are Mac OS and Microsoft Windows.

    Read more →
  • Higuchi dimension

    Higuchi dimension

    In fractal geometry, the Higuchi dimension (or Higuchi fractal dimension (HFD)) is an approximate value for the box-counting dimension of the graph of a real-valued function or time series. This value is obtained via an algorithmic approximation so one also talks about the Higuchi method. It has many applications in science and engineering and has been applied to subjects like characterizing primary waves in seismograms, clinical neurophysiology and analyzing changes in the electroencephalogram in Alzheimer's disease. == Formulation of the method == The original formulation of the method is due to T. Higuchi. Given a time series X : { 1 , … , N } → R {\displaystyle X:\{1,\dots ,N\}\to \mathbb {R} } consisting of N {\displaystyle N} data points and a parameter k m a x ≥ 2 {\displaystyle k_{\mathrm {max} }\geq 2} the Higuchi Fractal dimension (HFD) of X {\displaystyle X} is calculated in the following way: For each k ∈ { 1 , … , k m a x } {\displaystyle k\in \{1,\dots ,k_{\mathrm {max} }}\} and m ∈ { 1 , … , k } {\displaystyle m\in \{1,\dots ,k}\} define the length L m ( k ) {\displaystyle L_{m}(k)} by L m ( k ) = N − 1 ⌊ N − m k ⌋ k 2 ∑ i = 1 ⌊ N − m k ⌋ | X N ( m + i k ) − X N ( m + ( i − 1 ) k ) | . {\displaystyle L_{m}(k)={\frac {N-1}{\lfloor {\frac {N-m}{k}}\rfloor k^{2}}}\sum _{i=1}^{\lfloor {\frac {N-m}{k}}\rfloor }|X_{N}(m+ik)-X_{N}(m+(i-1)k)|.} The length L ( k ) {\displaystyle L(k)} is defined by the average value of the k {\displaystyle k} lengths L 1 ( k ) , … , L k ( k ) {\displaystyle L_{1}(k),\dots ,L_{k}(k)} , L ( k ) = 1 k ∑ m = 1 k L m ( k ) . {\displaystyle L(k)={\frac {1}{k}}\sum _{m=1}^{k}L_{m}(k).} The slope of the best-fitting linear function through the data points { ( log ⁡ 1 k , log ⁡ L ( k ) ) } {\displaystyle \left\{\left(\log {\frac {1}{k}},\log L(k)\right)\right\}} is defined to be the Higuchi fractal dimension of the time-series X {\displaystyle X} . == Application to functions == For a real-valued function f : [ 0 , 1 ] → R {\displaystyle f:[0,1]\to \mathbb {R} } one can partition the unit interval [ 0 , 1 ] {\displaystyle [0,1]} into N {\displaystyle N} equidistantly intervals [ t j , t j + 1 ) {\displaystyle [t_{j},t_{j+1})} and apply the Higuchi algorithm to the times series X ( j ) = f ( t j ) {\displaystyle X(j)=f(t_{j})} . This results into the Higuchi fractal dimension of the function f {\displaystyle f} . It was shown that in this case the Higuchi method yields an approximation for the box-counting dimension of the graph of f {\displaystyle f} as it follows a geometrical approach (see Liehr & Massopust 2020). == Robustness and stability == Applications to fractional Brownian functions and the Weierstrass function reveal that the Higuchi fractal dimension can be close to the box-dimension. On the other hand, the method can be unstable in the case where the data X ( 1 ) , … , X ( N ) {\displaystyle X(1),\dots ,X(N)} are periodic or if subsets of it lie on a horizontal line (see Liehr & Massopust 2020).

    Read more →
  • Very large database

    Very large database

    A very large database, (originally written very large data base) or VLDB, is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies. == Definition == The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements have continually changed what is considered very large. One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet". == Sizes of a VLDB database == There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded. == VLDB challenges == Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources. === Configuration === Careful configuration of databases that lie in the VLDB realm is necessary to alleviate or reduce issues raised by VLDB databases. === Administration === The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases. === Availability and maintenance === When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB take very significant amounts of time and resources for a VLDB database. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives. To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable. ==== Backup and recovery ==== Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution. === Performance === Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer (linear time); while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data (sub-linear time). Other effects can be caching becoming less efficient because proportionally less data can be cached and while some indexes such as the B+ automatically sustain well with growth others such as a hash table may need to be rebuilt. Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine. ==== Partitioning ==== Partitioning may be able assist the performance of bulk operations on a VLDB including backup and recovery., bulk movements due to information lifecycle management (ILM), reducing contention as well as allowing optimization of some query processing. === Storage === In order to satisfy needs of a VLDB the database storage needs to have low access latency and contention, high throughput, and high availability. === Server resources === The increasing size of a VLDB may put pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve. == Relationship to big data == VLDB is not the same as big data, but the storage aspect of big data may involve a VLDB database. That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators may not encounter VLDB issues that older versions of traditional RDBMS's might encounter.

    Read more →
  • Gutmann method

    Gutmann method

    The Gutmann method is an algorithm for securely erasing the contents of computer hard disk drives, such as files. Devised by Peter Gutmann and Colin Plumb and presented in the paper Secure Deletion of Data from Magnetic and Solid-State Memory in July 1996, it involved writing a series of 35 patterns over the region to be erased. The selection of patterns assumes that the user does not know the encoding mechanism used by the drive, so it includes patterns designed specifically for three types of drives. A user who knows which type of encoding the drive uses can choose only those patterns intended for their drive. A drive with a different encoding mechanism would need different patterns. Most of the patterns in the Gutmann method were designed for older MFM/RLL-encoded disks. Gutmann himself has noted that more modern drives no longer use these older encoding techniques, making parts of the method irrelevant. He said "In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques". Since about 2001, some ATA IDE and SATA hard drive manufacturer designs include support for the ATA Secure Erase standard, obviating the need to apply the Gutmann method when erasing an entire drive. The Gutmann method does not apply to USB sticks: a 2011 study reports that 71.7% of data remained available. On solid state drives it resulted in 0.8–4.3% recovery. == Background == The delete function in most operating systems simply marks the space occupied by the file as reusable (removes the pointer to the file) without immediately removing any of its contents. At this point the file can be fairly easily recovered by numerous recovery applications. However, once the space is overwritten with other data, there is no known way to use software to recover it. It cannot be done with software alone since the storage device only returns its current contents via its normal interface. Gutmann claims that intelligence agencies have sophisticated tools, including magnetic force microscopes, which together with image analysis, can detect the previous values of bits on the affected area of the media (for example hard disk). This claim however seems to be invalid based on the thesis "Data Reconstruction from a Hard Disk Drive using Magnetic Force Microscopy". == Method == An overwrite session consists of a lead-in of four random write patterns, followed by patterns 5 to 31 (see rows of table below), executed in a random order, and a lead-out of four more random patterns. Each of patterns 5 to 31 was designed with a specific magnetic media encoding scheme in mind, which each pattern targets. The drive is written to for all the passes even though the table below only shows the bit patterns for the passes that are specifically targeted at each encoding scheme. The result should obscure any data on the drive so that only the most advanced physical scanning (e.g., using a magnetic force microscope) of the drive is likely to be able to recover any data. The series of patterns is as follows: Encoded bits shown in bold are what should be present in the ideal pattern, although due to the encoding the complementary bit is actually present at the start of the track. == Criticism == Daniel Feenberg of the National Bureau of Economic Research, an American private nonprofit research organization, criticized Gutmann's claim that intelligence agencies are likely to be able to read overwritten data, citing a lack of evidence for such claims. He finds that Gutmann cites one non-existent source and sources that do not actually demonstrate recovery, only partially-successful observations. The definition of "random" is also quite different from the usual one used: Gutmann expects the use of pseudorandom data with sequences known to the recovering side, not an unpredictable one such as a cryptographically secure pseudorandom number generator. Nevertheless, some published government security procedures consider an overwritten disk to still be sensitive. Human factors and potential limitations in the overwriting software create a residual risk that is not considered acceptable at the highest security levels. Gutmann himself has responded to some of these criticisms and also criticized how his algorithm has been abused in an epilogue to his original paper, in which he states: In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don't understand that statement, re-read the paper). If you're using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, "A good scrubbing with random data will do about as well as can be expected". This was true in 1996, and is still true now. Gutmann's statement has been criticized for not recognizing that PRML/EPRML does not replace RLL, with critics claiming PRML/EPRML to be a signal detection method rather than a data encoding method. Polish data recovery service Kaleron has also claimed that Gutmann's publication contains further factual errors and assumptions that do not apply to actual disks.

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • Artificial intelligence industry in China

    Artificial intelligence industry in China

    The roots of the development of artificial intelligence in the People's Republic of China started in the late 1970s following Deng Xiaoping's reform and opening up emphasizing science and technology as the country's primary productive force. The initial stages of China's AI development were slow and encountered significant challenges due to lack of resources and talent. At the beginning China was behind most Western countries in terms of AI development. A majority of the research was led by scientists who had received higher education abroad. Since 2006, the Chinese government has steadily developed a national agenda for artificial intelligence development and emerged as one of the leading nations in artificial intelligence research and development. In 2016, the Chinese Communist Party (CCP) released its 13th Five-Year Plan in which it aimed to become a global AI leader by 2030. As of 2025, China is considered to be a world leader in AI technology along with the United States. The State Council has a list of "national AI teams" including fifteen China-based companies, including Baidu, Tencent, Alibaba, SenseTime, and iFlytek. Each company should lead the development of a designated specialized AI sector in China, such as facial recognition, software/hardware, and speech recognition. China's rapid AI development has significantly impacted Chinese society in many areas, including the socio-economic, military, intelligence, and political spheres. Agriculture, transportation, accommodation and food services, and manufacturing are the top industries that would be the most impacted by further AI deployment. The private sector, university laboratories, and the military are working collaboratively in many aspects as there are few current existing boundaries. In 2021, China published the Data Security Law of the People's Republic of China, its first national law addressing AI-related ethical concerns. In October 2022, the United States federal government announced a series of export controls and trade restrictions intended to restrict China's access to advanced computer chips for AI applications. In 2023, the Cyberspace Administration of China issued guidelines requiring that AI content upholds the ideology of the CCP including Core Socialist Values, avoids discrimination, respects intellectual property rights, and safeguards user data. In 2025, the Chinese government issued a document regarding training data, requiring companies to use as little as data deemed "unsafe" as possible, as well as requiring companies to test models regularly. Concerns have been raised about the effects of the Chinese government's censorship regime on the development of generative artificial intelligence and long-term talent acquisition with state of the country's demographics. Others have noted that official notions of AI safety require following the priorities of the CCP and are antithetical to standards in democratic societies and raised concerns about the extension of China's system of mass surveillance and censorship abroad. == History == The Chinese term for artificial intelligence (réngōngzhìnéng 人工智能) connotes "humanmade" intelligence. The term developed as mid-20th century localisation of the Japanese term jinko chino. The research and development of artificial intelligence in China started in the 1980s, with the announcement by Deng Xiaoping of the importance of science and technology for China's economic growth. === Late 1970s to early 2010s === Chinese artificial intelligence research and development began in late 1970s after Deng Xiaoping's reform and opening up. China's first national conference on AI occurred in 1979. Academic journals in the late 1970s began publishing literature reviews of Western research on AI topics. In the 1980s, a group of Chinese scientists launched AI research led by Qian Xuesen and Wu Wenjun. However, during the time, China's society still had a generally conservative view towards AI. In the early 1980s, Science Press published translated versions of Western textbooks such as Patrick Winston's Artificial Intelligence and Nils John Nilsson's Principles of Artificial Intelligence. In 1980, a journal of the Chinese Academy of Sciences convened its first annual National Symposium on Artificial Intelligence, which included national and international scholars like Herbert A. Simon. The Chinese Association for Artificial Intelligence (CAAI) was founded in September 1981 and was authorized by the Ministry of Civil Affairs. CAAI has continued to be the largest AI association in China as of 2025. In 1982, CAAI began publishing the Artificial Intelligence Journal, which published early AI research by Chinese academics. In the 1980s, Chinese research on AI was influenced by the field of cybernetics, particularly the work of Norbert Weiner and his text Cybernetics: Or Control and Communication in the Animal and the Machine. Chinese researchers at the time sought to situate AI as part of a broader "Intelligence Science" field which would include disciplines like mathematics, computer science, cognitive science, social sciences, and philosophy. In 1987, Tsinghua University began a research publication on AI. Beginning in 1993, smart automation and intelligence have been part of China's national technology plan. Since the 2000s, the Chinese government has further expanded its research and development funds for AI and the number of government-sponsored research projects has dramatically increased. In 2006, China announced a policy priority for the development of artificial intelligence, which was included in the National Medium and Long Term Plan for the Development of Science and Technology (2006–2020), released by the State Council. In the same year, artificial intelligence was also mentioned in the 11th Five-Year Plan. In 2011, the Association for the Advancement of Artificial Intelligence (AAAI) established a branch in Beijing, China. At same year, the Wu Wenjun Artificial Intelligence Science and Technology Award was founded in honor of Chinese mathematician Wu Wenjun, and it became the highest award for Chinese achievements in the field of artificial intelligence. The first award ceremony was held on May 14, 2012. In 2013, the International Joint Conferences on Artificial Intelligence (IJCAI) was held in Beijing, marking the first time the conference was held in China. This event coincided with the Chinese government's announcement of the "Chinese Intelligence Year," a significant milestone in China's development of artificial intelligence. === Late 2010s to early 2020s === AI became a major issue of commercial, public, and political focus in China in the latter half of the 2010s. Various interpretations of the primary cause for this increased focus exist, with some analyses focusing on the 2016 Go match between Google's AlphaGo and Lee Sedol, others emphasising the U.S. increasing trade restrictions on China's technology industries and the desire to achieve national technological self-sufficiency. The State Council of China issued "A Next Generation Artificial Intelligence Development Plan" (State Council Document [2017] No. 35) on 20 July 2017. In the document, the CCP Central Committee and the State Council urged governing bodies in China to promote the development of artificial intelligence. Specifically, the plan described AI as a strategic technology that has become a "focus of international competition".:2 The document urged significant investment in a number of strategic areas related to AI and called for close cooperation between the state and private sectors. It set the goal of China becoming the preeminent country for AI research and application by 2030. During the general secretaryship of Xi Jinping, artificial intelligence has been a focus of the CCP's military-civil fusion efforts. On the occasion of Xi's speech at the first plenary meeting of the Central Military-Civil Fusion Development Committee (CMCFDC), scholars from the National Defense University wrote in the PLA Daily that the "transferability of social resources" between economic and military ends is an essential component to being a great power. During the Two Sessions 2017,"artificial intelligence plus" was proposed to be elevated to a strategic level. The same year witnessed the emergence of multiple application-level usages in the medical field according to reports. In 2018, Xinhua News Agency, in partnership with Tencent's subsidiary Sogou, launched its first artificial intelligence-generated news anchor. In 2018, the State Council budgeted $2.1 billion for an AI industrial park in Mentougou district. In order to achieve this the State Council stated the need for massive talent acquisition, theoretical and practical developments, as well as public and private investments. Some of the stated motivations that the State Council gave for pursuing its AI strategy include the potential of artificial intelligence for industrial transformation, better social

    Read more →
  • Ballin' (Mustard and Roddy Ricch song)

    Ballin' (Mustard and Roddy Ricch song)

    "Ballin'" is a song by American record producer Mustard featuring American rapper Roddy Ricch. The track was released as the third single from Mustard's third studio album, Perfect Ten, on August 20, 2019, though it was available as early as the end of June 2019. The song and its accompanying video received acclaim from music critics, with Complex magazine naming it the Best Song of 2019. It peaked at number 11 on the Billboard Hot 100, marking Mustard's highest charting song in the US. The song received a nomination for Best Rap/Sung Performance at the 2020 Grammy Awards, making it the first time Ricch has been nominated for a Grammy and Mustard's first nomination as an artist. Later in 2019, the two released another collaboration, "High Fashion". == Background == Roddy Ricch revealed in an interview that the song was composed in late 2018, but Mustard wanted to keep it for his album, Perfect Ten, which he was still working on. The song was later included on the album, released in June 2019. Ricch said he knew the song was "hard enough" the first time he heard it, while Mustard proclaimed "this is going to be the one". == Composition and lyrics == "Ballin'" has a "rags to riches" theme. In its intro, the song samples girl group 702's 1997 top ten hit "Get It Together". The song features a "smooth, bouncy beat", with Roddy Ricch rapping about his come-up and ascent in the music industry. In the first verse, Ricch salutes fellow Los Angeles rapper, the late Nipsey Hussle and his girlfriend Lauren London: "I run these racks up with my queen like London and Nip". The line simultaneously references Ricch and Hussle's collaboration "Racks in the Middle", released earlier in 2019 as Hussle's last single before his death. Billboard's Heran Mamo noted that "in typical Hussle fashion", Roddy Ricch "narrates his life's hardships before delving into his newfound treasures". == Critical reception == The song was widely acclaimed by music critics. Charles Holmes of Rolling Stone magazine called it "a song of the year contender", while Complex and Billboard both named it as a "standout track" on the album. Pitchfork magazine included "Ballin'" in its list of The Best Rap Songs of 2019 and called it "the centerpiece of Mustard's underappreciated album Perfect Ten". Complex later named it the Best Song of 2019, calling it "a feel-good anthem so infectious you'll need antibiotics just to stop running it back". == Chart performance == "Ballin'" was at the time Mustard's highest charting song in the US, peaking at number 11 on the Billboard Hot 100. It was also Roddy Ricch's highest charting song, until he surpassed it a week later, with the release of his album track "The Box", which eventually reached number 1 on the chart. It reached number one on Billboard's Rhythmic Songs chart, becoming Mustard's second number one following "Pure Water" and Ricch's first number one. The song also topped the Rap Airplay chart. == Music video == The music video for the track was teased by Mustard on his Instagram page on September 29, 2019. The music video for the track was eventually released on October 2, 2019 to critical acclaim. The video features Mustard and Roddy Ricch driving a Lamborghini Aventador in Los Angeles, where they both are from, playing poker in a casino, and going to a strip club. This is contrasted with scenes in which Mustard and Roddy Ricch as children play cards with Monopoly money and playing with miniature toy Lamborghinis together, aspiring for wealth and luxury, representing how they went from "rags to riches". The video also pays tribute to rapper Nipsey Hussle, who had been killed a few months ago. == Live performances == On December 16, 2019, Roddy Ricch performed the song live, alongside an 8-piece orchestra, at Peppermint Club in Los Angeles for Audiomack's Trap Symphony series. Along with Mustard, he performed it at The Pop Out: Ken & Friends on June 19, 2024. == Other uses == The song can be heard on "Elyse's Skit", track 10 off Roddy Ricch's debut album Please Excuse Me for Being Antisocial. In the skit, which is an actual voicenote recording, the mother of a woman named Elyse sends her daughter a voicenote, with "Ballin'" playing in the background, while the mother proceeds to say "I can't get that damn song out my head", jokingly calling it "inappropriate music". Ricch called the skit "something natural". In 2023, AI covers of the song using models based on pop culture characters and real-world celebrities gained viral popularity. == Awards and nominations == 62nd Annual Grammy Awards == Charts == == Certifications ==

    Read more →
  • Anyword

    Anyword

    Anyword is a technology company that offers an artificial intelligence platform, using natural language processing to generate and optimize marketing text for websites, social media, email, and ads. The company also offers a complete managed service to publishers and brands to help them increase their revenue through social ads. It is used by National Geographic, Red Bull, The New York Times, BBC, Ted Baker, etc. The company has an office in New York, and Tel Aviv. == History == It was founded in 2013 — its original name was Keywee Inc. In March 2015, Anyword received $9.1 million in the Series A funding round led by a notable group of investors. In July 2016, the company was selected as an official Facebook Marketing Partner. In August 2019, Anyword was named Best Content Marketing Platform in the Digiday Technology Award winners. In November 2021, it raised $21 million in its Series B funding round.

    Read more →
  • Mozilla VPN

    Mozilla VPN

    Mozilla VPN is an open-source virtual private network developed by Mozilla. It launched in beta as Firefox Private Network on September 10, 2019, and officially launched on July 15, 2020, as Mozilla VPN. Mozilla VPN should not be confused with the built-in VPN in Firefox since version 149 released in March 2026, which is free with a monthly data limit of 50 GB but only masks traffic that originates in Firefox unlike Mozilla VPN that protects the entire device. == History == The Firefox Private Network web browser extension beta version was released on September 10, 2019, as part of the relaunch of Mozilla's Test Pilot Program, a program that allowed Firefox users to test experimental new features which had been shuttered in January 2019. The beta of the subscription-based standalone virtual private network for Android, Microsoft Windows, and Chromebook launched on February 19, 2020, with the iOS version following soon after. Firefox Private Network was rebranded as "Mozilla VPN" on June 18, 2020, and officially launched as Mozilla VPN on July 15, 2020. At launch, Mozilla VPN was available in six countries (the United States, Canada, the United Kingdom, Singapore, Malaysia, and New Zealand) for Windows 10, Android, and iOS (beta). Over time, the service also launched in Germany, France, Italy, Spain, Switzerland, Austria, Belgium, Netherlands, Ireland, Finland, Sweden, Poland, Czechia, Hungary, Romania, Bulgaria, Slovakia, Portugal, Denmark, Croatia, Lithuania, Slovenia, Latvia, Luxembourg, Estonia, Cyprus, and Malta. == Audits history == Cybersecurity firm Cure53 conducted a security audit for Mozilla VPN in August 2020 and identified multiple vulnerabilities, including one critical-severity vulnerability. In March 2021, Cure53 conducted a second security audit, which noted significant improvements since the 2020 audit. The second audit identified multiple issues, including two medium-severity and one high-severity vulnerability, but concluded that by the time of publication, only one vulnerability remained unresolved, and that it would require "a strong state-funded attacker-model" to be exploitable. Mozilla disclosed most of the vulnerabilities in July 2021 and released the full report by Cure53 in August 2021. In April 2023, Cure53 conducted a third security audit, the results of which Mozilla disclosed in December that year, along with the full report by Cure53. == Features == Mozilla VPN masks the user's IP address, hiding the user's location data from the websites accessed by the user, and encrypts all network activity. The service allows for up to 5 simultaneous connections, to any of more than 500 servers in 30+ countries, and is available on the mobile operating systems iOS and Android and the desktop operating systems Microsoft Windows, macOS and Linux. Mozilla VPN's infrastructure is provided by the Swedish Mullvad VPN service, which uses the WireGuard VPN protocol. The VPN software comes with additional features, like recommended server locations, the ability to block ads, block ad trackers and malware, the ability to exclude certain applications from protection, the ability to set multi-hop connections, and to set custom DNS servers. When used with Firefox and the official extension, Mozilla VPN allows the use of different settings per container as well as bypassing the VPN for specific websites.

    Read more →
  • Algorithm engineering

    Algorithm engineering

    Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithmics theory and practical applications of algorithms in software engineering. It is a general methodology for algorithmic research. == Origins == In 1995, a report from an NSF-sponsored workshop "with the purpose of assessing the current goals and directions of the Theory of Computing (TOC) community" identified the slow speed of adoption of theoretical insights by practitioners as an important issue and suggested measures to reduce the uncertainty by practitioners whether a certain theoretical breakthrough will translate into practical gains in their field of work, and tackle the lack of ready-to-use algorithm libraries, which provide stable, bug-free and well-tested implementations for algorithmic problems and expose an easy-to-use interface for library consumers. But also, promising algorithmic approaches have been neglected due to difficulties in mathematical analysis. The term "algorithm engineering" was first used with specificity in 1997, with the first Workshop on Algorithm Engineering (WAE97), organized by Giuseppe F. Italiano. == Difference from algorithm theory == Algorithm engineering does not intend to replace or compete with algorithm theory, but tries to enrich, refine and reinforce its formal approaches with experimental algorithmics (also called empirical algorithmics). This way it can provide new insights into the efficiency and performance of algorithms in cases where the algorithm at hand is less amenable to algorithm theoretic analysis, formal analysis pessimistically suggests bounds which are unlikely to appear on inputs of practical interest, the algorithm relies on the intricacies of modern hardware architectures like data locality, branch prediction, instruction stalls, instruction latencies which the machine model used in Algorithm Theory is unable to capture in the required detail, the crossover between competing algorithms with different constant costs and asymptotic behaviors needs to be determined. == Methodology == Some researchers describe algorithm engineering's methodology as a cycle consisting of algorithm design, analysis, implementation and experimental evaluation, joined by further aspects like machine models or realistic inputs. They argue that equating algorithm engineering with experimental algorithmics is too limited, because viewing design and analysis, implementation and experimentation as separate activities ignores the crucial feedback loop between those elements of algorithm engineering. === Realistic models and real inputs === While specific applications are outside the methodology of algorithm engineering, they play an important role in shaping realistic models of the problem and the underlying machine, and supply real inputs and other design parameters for experiments. === Design === Compared to algorithm theory, which usually focuses on the asymptotic behavior of algorithms, algorithm engineers need to keep further requirements in mind: Simplicity of the algorithm, implementability in programming languages on real hardware, and allowing code reuse. Additionally, constant factors of algorithms have such a considerable impact on real-world inputs that sometimes an algorithm with worse asymptotic behavior performs better in practice due to lower constant factors. === Analysis === Some problems can be solved with heuristics and randomized algorithms in a simpler and more efficient fashion than with deterministic algorithms. Unfortunately, this makes even simple randomized algorithms difficult to analyze because there are subtle dependencies to be taken into account. === Implementation === Huge semantic gaps between theoretical insights, formulated algorithms, programming languages and hardware pose a challenge to efficient implementations of even simple algorithms, because small implementation details can have rippling effects on execution behavior. The only reliable way to compare several implementations of an algorithm is to spend an considerable amount of time on tuning and profiling, running those algorithms on multiple architectures, and looking at the generated machine code. === Experiments === See: Experimental algorithmics === Application engineering === Implementations of algorithms used for experiments differ in significant ways from code usable in applications. While the former prioritizes fast prototyping, performance and instrumentation for measurements during experiments, the latter requires thorough testing, maintainability, simplicity, and tuning for particular classes of inputs. === Algorithm libraries === Stable, well-tested algorithm libraries like LEDA play an important role in technology transfer by speeding up the adoption of new algorithms in applications. Such libraries reduce the required investment and risk for practitioners, because it removes the burden of understanding and implementing the results of academic research. == Conferences == Two main conferences on Algorithm Engineering are organized annually, namely: Symposium on Experimental Algorithms (SEA), established in 1997 (formerly known as WEA). SIAM Meeting on Algorithm Engineering and Experiments (ALENEX), established in 1999. The 1997 Workshop on Algorithm Engineering (WAE'97) was held in Venice (Italy) on September 11–13, 1997. The Third International Workshop on Algorithm Engineering (WAE'99) was held in London, UK in July 1999. The first Workshop on Algorithm Engineering and Experimentation (ALENEX99) was held in Baltimore, Maryland on January 15–16, 1999. It was sponsored by DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science (at Rutgers University), with additional support from SIGACT, the ACM Special Interest Group on Algorithms and Computation Theory, and SIAM, the Society for Industrial and Applied Mathematics.

    Read more →
  • Divide-and-conquer algorithm

    Divide-and-conquer algorithm

    In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. The divide-and-conquer technique is the basis of efficient algorithms for many problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., the Karatsuba algorithm), finding the closest pair of points, syntactic analysis (e.g., top-down parsers), SAT solving, and computing the discrete Fourier transform (FFT). Designing efficient divide-and-conquer algorithms can be difficult. As in mathematical induction, it is often necessary to generalize the problem to make it amenable to a recursive solution. The correctness of a divide-and-conquer algorithm is usually proved by mathematical induction, and its computational cost is often determined by solving recurrence relations. == Divide and conquer == The divide-and-conquer paradigm is often used to find an optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems, to solve them in turn, and to compose their solutions to solve the given problem. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n natural numbers, split it into two lists of about n/2 numbers each, sort each of them in turn, and interleave both results appropriately to obtain the sorted version of the given list (see the picture). This approach is known as the merge sort algorithm. The name "divide and conquer" is sometimes applied to algorithms that reduce each problem to only one sub-problem, such as the binary search algorithm for finding a record in a sorted list (or its analogue in numerical computing, the bisection algorithm for root finding). These algorithms can be implemented more efficiently than general divide-and-conquer algorithms; in particular, if they use tail recursion, they can be converted into simple loops. Under this broad definition, however, every algorithm that uses recursion or loops could be regarded as a "divide-and-conquer algorithm". Therefore, some authors consider that the name "divide and conquer" should be used only when each problem may generate two or more subproblems. The name decrease and conquer has been proposed instead for the single-subproblem class. An important application of divide and conquer is in optimization, where if the search space is reduced ("pruned") by a constant factor at each step, the overall algorithm has the same asymptotic complexity as the pruning step, with the constant depending on the pruning factor (by summing the geometric series); this is known as prune and search. == Early historical examples == Early examples of these algorithms are primarily decrease and conquer – the original problem is successively broken down into single subproblems, and indeed can be solved iteratively. Binary search, a decrease-and-conquer algorithm where the subproblems are of roughly half the original size, has a long history. While a clear description of the algorithm on computers appeared in 1946 in an article by John Mauchly, the idea of using a sorted list of items to facilitate searching dates back at least as far as Babylonia in 200 BC. Another ancient decrease-and-conquer algorithm is the Euclidean algorithm to compute the greatest common divisor of two numbers by reducing the numbers to smaller and smaller equivalent subproblems, which dates to several centuries BC. An early example of a divide-and-conquer algorithm with multiple subproblems is Gauss's 1805 description of what is now called the Cooley–Tukey fast Fourier transform (FFT) algorithm, although he did not analyze its operation count quantitatively, and FFTs did not become widespread until they were rediscovered over a century later. An early two-subproblem D&C algorithm that was specifically developed for computers and properly analyzed is the merge sort algorithm, invented by John von Neumann in 1945. Another notable example is the algorithm invented by Anatolii A. Karatsuba in 1960 that could multiply two n-digit numbers in O ( n log 2 ⁡ 3 ) {\displaystyle O(n^{\log _{2}3})} operations (in Big O notation). This algorithm disproved Andrey Kolmogorov's 1956 conjecture that Ω ( n 2 ) {\displaystyle \Omega (n^{2})} operations would be required for that task. As another example of a divide-and-conquer algorithm that did not originally involve computers, Donald Knuth gives the method a post office typically uses to route mail: letters are sorted into separate bags for different geographical areas, each of these bags is itself sorted into batches for smaller sub-regions, and so on until they are delivered. This is related to a radix sort, described for punch-card sorting machines as early as 1929. == Advantages == === Solving difficult problems === Divide and conquer is a powerful tool for solving conceptually difficult problems: all it requires is a way of breaking the problem into sub-problems, of solving the trivial cases, and of combining sub-problems to the original problem. Similarly, decrease and conquer only requires reducing the problem to a single smaller problem, such as the classic Tower of Hanoi puzzle, which reduces moving a tower of height n {\displaystyle n} to move a tower of height n − 1 {\displaystyle n-1} . === Algorithm efficiency === The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms. In all these examples, the D&C approach led to an improvement in the asymptotic cost of the solution. For example, if (a) the base cases have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size n {\displaystyle n} , and (b) there is a bounded number p {\displaystyle p} of sub-problems of size ~ n p {\displaystyle {\frac {n}{p}}} at each stage, then the cost of the divide-and-conquer algorithm will be O ( n log p ⁡ n ) {\displaystyle O(n\log _{p}n)} . For other types of divide-and-conquer approaches, running times can also be generalized. For example, when a) the work of splitting the problem and combining the partial solutions take c n {\displaystyle cn} time, where n {\displaystyle n} is the input size and c {\displaystyle c} is some constant; b) when n < 2 {\displaystyle n<2} , the algorithm takes time upper-bounded by c {\displaystyle c} , and c) there are q {\displaystyle q} subproblems where each subproblem has size ~ n 2 {\displaystyle {\frac {n}{2}}} . Then, the running times are as follows: if the number of subproblems q > 2 {\displaystyle q>2} , then the divide-and-conquer algorithm's running time is bounded by O ( n log 2 ⁡ q ) {\displaystyle O(n^{\log _{2}q})} . if the number of subproblems is exactly one, then the divide-and-conquer algorithm's running time is bounded by O ( n ) {\displaystyle O(n)} . If, instead, the work of splitting the problem and combining the partial solutions take c n 2 {\displaystyle cn^{2}} time, and there are 2 subproblems where each has size n 2 {\displaystyle {\frac {n}{2}}} , then the running time of the divide-and-conquer algorithm is bounded by O ( n 2 ) {\displaystyle O(n^{2})} . === Parallelism === Divide-and-conquer algorithms are naturally adapted for execution in multi-processor machines, especially shared-memory systems where the communication of data between processors does not need to be planned in advance because distinct sub-problems can be executed on different processors. === Memory access === Divide-and-conquer algorithms naturally tend to make efficient use of memory caches. The reason is that once a sub-problem is small enough, it and all its sub-problems can, in principle, be solved within the cache, without accessing the slower main memory. An algorithm designed to exploit the cache in this way is called cache-oblivious, because it does not contain the cache size as an explicit parameter. Moreover, D&C algorithms can be designed for important algorithms (e.g., sorting, FFTs, and matrix multiplication) to be optimal cache-oblivious algorithms–they use the cache in a probably optimal way, in an asymptotic sense, regardless of the cache size. In contrast, the traditional approach to exploiting the cache is blocking, as in loop nest optimization, where the problem is explicitly divided into chunks of the appropriate size—this can also use the cache optimally, but only when the algorithm is tuned for the specific cache sizes of a particular machine. The same advantage exists with regards to other hierarchical storage systems, such as NUMA or virtual memory, as well as for multip

    Read more →