Trigram tagger

Trigram tagger

In computational linguistics, a trigram tagger is a statistical method for automatically identifying words as being nouns, verbs, adjectives, adverbs, etc. based on second order Markov models that consider triples of consecutive words. It is trained on a text corpus as a method to predict the next word, taking the product of the probabilities of unigram, bigram and trigram. In speech recognition, algorithms utilizing trigram-tagger score better than those algorithms utilizing IIMM tagger but less well than Net tagger. The description of the trigram tagger is provided by Brants (2000).

Confidential computing

Confidential computing is a security and privacy-enhancing computational technique focused on protecting data in use. Confidential computing can be used in conjunction with storage and network encryption, which protect data at rest and data in transit respectively. It is designed to address software, protocol, cryptographic, and basic physical and supply-chain attacks, although some critics have demonstrated architectural and side-channel attacks effective against the technology. The technology protects data in use by performing computations in a hardware-based trusted execution environment (TEE). Confidential data is released to the TEE only once it is assessed to be trustworthy. Different types of confidential computing define the level of data isolation used, whether virtual machine, application, or function, and the technology can be deployed in on-premise data centers, edge locations, or the public cloud. It is often compared with other privacy-enhancing computational techniques such as fully homomorphic encryption, secure multi-party computation, and Trusted Computing. Confidential computing is promoted by the Confidential Computing Consortium (CCC) industry group, whose membership includes major providers of the technology. == Properties == Trusted execution environments (TEEs) "prevent unauthorized access or modification of applications and data while they are in use, thereby increasing the security level of organizations that manage sensitive and regulated data". Trusted execution environments can be instantiated on a computer's processing components such as a central processing unit (CPU) or a graphics processing unit (GPU). In their various implementations, TEEs can provide different levels of isolation including virtual machine, individual application, or compute functions. Typically, data in use in a computer's compute components and memory exists in a decrypted state and can be vulnerable to examination or tampering by unauthorized software or administrators. According to the CCC, confidential computing protects data in use through a minimum of three properties: Data confidentiality: "Unauthorized entities cannot view data while it is in use within the TEE". Data integrity: "Unauthorized entities cannot add, remove, or alter data while it is in use within the TEE". Code integrity: "Unauthorized entities cannot add, remove, or alter code executing in the TEE". In addition to trusted execution environments, remote cryptographic attestation is an essential part of confidential computing. The attestation process assesses the trustworthiness of a system and helps ensure that confidential data is released to a TEE only after it presents verifiable evidence that it is genuine and operating with an acceptable security posture. It allows the verifying party to assess the trustworthiness of a confidential computing environment through an "authentic, accurate, and timely report about the software and data state" of that environment. "Hardware-based attestation schemes rely on a trusted hardware component and associated firmware to execute attestation routines in a secure environment". Without attestation, a compromised system could deceive others into trusting it, claim it is running certain software in a TEE, and potentially compromise the confidentiality or integrity of the data being processed or the integrity of the trusted code. == Technical approaches == Technical approaches to confidential computing may vary in which software, infrastructure and administrator elements are allowed to access confidential data. The "trust boundary," which circumscribes a trusted computing base (TCB), defines which elements have the potential to access confidential data, whether they are acting benignly or maliciously. Confidential computing implementations enforce the defined trust boundary at a specific level of data isolation. The three main types of confidential computing are: Virtual machine isolation Application isolation, also known as process isolation Function isolation, also known as library isolation Virtual machine isolation removes the elements controlled by the computer infrastructure or cloud provider, but allows potential data access by elements inside a virtual machine running on the infrastructure. Application or process isolation permits data access only by authorized software applications or processes. Function or library isolation is designed to permit data access only by authorized subroutines or modules within a larger application, blocking access by any other system element, including unauthorized code in the larger application. == Threat model == As confidential computing is concerned with the protection of data in use, only certain threat models can be addressed by this technique. Other types of attacks are better addressed by other privacy-enhancing technologies. === In scope === The following threat vectors are generally considered in scope for confidential computing: Software attacks: including attacks on the host’s software and firmware. This may include the operating system, hypervisor, BIOS, other software and workloads. Protocol attacks: including "attacks on protocols associated with attestation as well as workload and data transport". This includes vulnerabilities in the "provisioning or placement of the workload" or data that could cause a compromise. Cryptographic attacks: including "vulnerabilities found in ciphers and algorithms due to a number of factors, including mathematical breakthroughs, availability of computing power and new computing approaches such as quantum computing". The CCC notes several caveats in this threat vector, including relative difficulty of upgrading cryptographic algorithms in hardware and recommendations that software and firmware be kept up-to-date. A multi-faceted, defense-in-depth strategy is recommended as a best practice. Basic physical attacks: including cold boot attacks, bus and cache snooping and plugging attack devices into an existing port, such as a PCI Express slot or USB port. Basic upstream supply-chain attacks: including attacks that would compromise TEEs through changes such as added debugging ports. The degree and mechanism of protection against these threats varies with specific confidential computing implementations. === Out of scope === Threats generally defined as out of scope for confidential computing include: Sophisticated physical attacks: including physical attacks that "require long-term and/or invasive access to hardware" such as chip scraping techniques and electron microscope probes. Upstream hardware supply-chain attacks: including attacks on the CPU manufacturing process, CPU supply chain in key injection/generation during manufacture. Attacks on components of a host system that are not directly providing the capabilities of the trusted execution environment are also generally out-of-scope. Availability attacks: confidential computing is designed to protect the confidentiality and integrity of protected data and code. It does not address availability attacks such as Denial of Service or Distributed Denial of Service attacks. == Use cases == Confidential computing can be deployed in the public cloud, on-premise data centers, or distributed "edge" locations, including network nodes, branch offices, industrial systems and others. === Data privacy and security === Confidential computing protects the confidentiality and integrity of data and code from the infrastructure provider, unauthorized or malicious software and system administrators, and other cloud tenants, which may be a concern for organizations seeking control over sensitive or regulated data. The additional security capabilities offered by confidential computing can help accelerate the transition of more sensitive workloads to the cloud or edge locations. === Multi-party analytics === Confidential computing can enable multiple parties to engage in joint analysis using confidential or regulated data inside a TEE while preserving privacy and regulatory compliance. In this case, all parties benefit from the shared analysis, but no party's sensitive data or confidential code is exposed to the other parties or system host. Examples include multiple healthcare organizations contributing data to medical research, or multiple banks collaborating to identify financial fraud or money laundering. Oxford University researchers proposed the alternative paradigm called "Confidential Remote Computing" (CRC), which supports confidential operations in Trusted Execution Environments across endpoint computers considering multiple stakeholders as mutually distrustful data, algorithm and hardware providers. === Confidential generative AI === Confidential computing technologies can be applied to various stages of a generative AI deployments to help increase data or model privacy, security, and regulatory compliance. TEEs and remote attestation can protect the integrity of data during AI model training, keep

AlphaGo Zero

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in Nature in October 2017 introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version. By playing games against itself, AlphaGo Zero: surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0; reached the level of AlphaGo Master in 21 days; and exceeded all previous versions in 40 days. Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills, as expert data is "often expensive, unreliable, or simply unavailable." Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge". Furthermore, AlphaGo Zero performed better than standard deep reinforcement learning models (such as Deep Q-Network implementations) due to its integration of Monte Carlo tree search. David Silver, one of the first authors of DeepMind's papers published in Nature on AlphaGo, said that it is possible to have generalized AI algorithms by removing the need to learn from humans. Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and shōgi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top Shōgi program (Elmo). == Architecture == The network in AlphaGo Zero is a ResNet with two heads. The stem of the network takes as input a 17x19x19 tensor representation of the Go board. 8 channels are the positions of the current player's stones from the last eight time steps. (1 if there is a stone, 0 otherwise. If the time step go before the beginning of the game, then 0 in all positions.) 8 channels are the positions of the other player's stones from the last eight time steps. 1 channel is all 1 if black is to move, and 0 otherwise. The body is a ResNet with either 20 or 40 residual blocks and 256 channels. There are two heads, a policy head and a value head. Policy head outputs a logit array of size 19 × 19 + 1 {\displaystyle 19\times 19+1} , representing the logit of making a move in one of the points, plus the logit of passing. Value head outputs a number in the range ( − 1 , + 1 ) {\displaystyle (-1,+1)} , representing the expected score for the current player. -1 represents current player losing, and +1 winning. == Training == AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome. In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession. It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level. According to Epoch.ai, training cost 3e23 FLOPs. For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run. DeepMind submitted its initial findings in a paper to Nature in April 2017, which was then published in October 2017. == Hardware cost == The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million. == Applications == According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding (see AlphaFold) or accurately simulating chemical reactions. AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car. DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings. == Reception == AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it—and their ability to train the system in 40 days, on four TPUs". The Guardian called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it an impressive feat and an “outstanding engineering accomplishment" respectively. Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory". Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains". In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn’t perfect, and I believe that’s why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he will have to wait and see but also said it will affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out from AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I’ve become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It’s now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said. Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement." == Comparison with predecessors == == AlphaZero == On 5 December 2017, DeepMind team released a preprint on arXiv, introducing AlphaZero, a program using generalized AlphaGo Zero's approach, which achieved within 24 hours a superhuman level of play in chess, shogi, and Go, defeating world-champion programs, Stockfish, Elmo, and 3-day version of AlphaGo Zero in each case. AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include: AZ has hard-coded rules for setting search hyperparameters. The neural network is now updated continually. Chess (unlike Go) can end in a tie; therefore AZ can take into account the possibility of a tie game. An open source program, Leela Zero, based on the ideas from the AlphaGo papers is available. It uses a GPU instead of the TPUs recent versions of AlphaGo rely on.

SAS Viya

SAS Viya is an artificial intelligence, analytics and data management platform developed by SAS Institute. == History == SAS Viya was released in 2016. The software was containerized with the release of Viya 4 in 2020. Viya has become one of SAS' most widely used platforms during the AI boom, as artificial intelligence becomes more widely used in business and computing. == Technical overview == The platform is cloud-native, and is executed on SAS's Cloud Analytics Services (CAS) engine. It is compatible with open source software, allowing users to build models using open sources tool such as R, Python and Jupyter. It integrates with major large language models like GPT-4 and Gemini Pro. The platform uses econometrics to create predictive models for forecasting scenarios based on complex data. It also has features for detecting algorithmic bias, auditing decisions and monitoring models. It is implemented through a low-code, no-code platform. The software is available on Amazon AWS Marketplace, Google Cloud, Red Hat OpenShift, and on Microsoft Azure Marketplace under a pay-as-you-use model. == Software == SAS Viya has released software as a service (SaaS) modules for creating AI content. These include Viya Workbench, Viya App Factory, Viya Copilot, and SAS Data Maker. The company also develops industry specific models, used by companies including Georgia-Pacific. == Applications == === Banking === The software is also widely used in business, especially in areas such as predictive modelling and fraud detection. === Insurance === SAS Viya is used in insurance for tasks such as actuarial analytics and modelling, as well as regulatory reporting. === Healthcare and life sciences === In 2023, the company introduced SAS Health, a common health data model built on the SAS Viya platform. AstraZeneca has partnered with SAS to use SAS Viya and SAS Life Science Analytics Framework in its delivery and approval processes. In 2024, SAS partnered with the University of Cambridge's Maxwell Center to use SAS Viya for healthcare research and development. === Public sector === SAS Viya is used in partnership with national and local governments to provide services and detect tax fraud. === Education === SAS Viya is used in research and education, particularly studies related to business intelligence, cybersecurity and data management. SAS Institute has partnered with educational institutions such as Appalachian State University, Clemson University, University of Arkansas, Stockholm University, and Marian University, to provide access to and training for using SAS Viya.

OntoCAPE

OntoCAPE is a large-scale ontology for the domain of Computer-Aided Process Engineering (CAPE). It can be downloaded free of charge via the OntoCAPE Homepage. OntoCAPE is partitioned into 62 sub-ontologies, which can be used individually or as an integrated suite. The sub-ontologies are organized across different abstraction layers, which separate general knowledge from knowledge about particular domains and applications. The upper layers have the character of an upper ontology, covering general topics such as mereotopology, systems theory, quantities and units. The lower layers conceptualize the domain of chemical process engineering, covering domain-specific topics such as materials, chemical reactions, or unit operations.

Frankenstein complex

The Frankenstein complex is a term coined by Isaac Asimov in his robot series, referring to the fear of mechanical men. == History == Some of Asimov's science fiction short stories and novels predict that this suspicion will become strongest and most widespread in respect of "mechanical men" that most-closely resemble human beings (see android), but it is also present on a lower level against robots that are plainly electromechanical automatons. The "Frankenstein complex" is similar in many respects to Masahiro Mori's uncanny valley hypothesis. The name, "Frankenstein complex", is derived from the name of Victor Frankenstein in the 1818 novel Frankenstein; or, The Modern Prometheus by Mary Shelley. In Shelley's story, Frankenstein created an intelligent, somewhat superhuman being, but he finds that his creation is horrifying to behold and abandons it. This ultimately leads to Victor's death at the conclusion of a vendetta between himself and his creation. In much of his fiction, Asimov depicts the general attitude of the public towards robots as negative, with ordinary people fearing that robots will either replace them or dominate them, although dominance would not be allowed under the specifications of the Three Laws of Robotics, the first of which is: "A robot may not harm a human being or, through inaction, allow a human being to come to harm." However, Asimov's fictitious earthly public is not fully persuaded by this, and remains largely suspicious and fearful of robots. I, Robot's short story "Little Lost Robot" is about this "fear of robots". In Asimov's robot novels, the Frankenstein complex is a major problem for roboticists and robot manufacturers. They do all they can to reassure the public that robots are harmless, even though this sometimes involves hiding the truth because they think that the public would misunderstand it. The fear by the public and the response of the manufacturers is an example of the theme of paternalism, the dread of paternalism, and the conflicts that arise from it in Asimov's fiction. The same theme occurs in many later works of fiction featuring robots, although it is rarely referred to as such.

ELVIS Act

The ELVIS Act or Ensuring Likeness Voice and Image Security Act, signed into law by Tennessee Governor Bill Lee on March 21, 2024, marked a significant milestone in the area of regulation of artificial intelligence and public sector policies for artists in the era of artificial intelligence (AI) and AI alignment. It was noted as the first enacted legislation in the United States specifically designed to protect musicians from the unauthorized use of their voices through artificial intelligence technologies and against audio deepfakes and voice cloning. This legislation distinguishes itself by adding penalties for copying a performer's voice. == Origin and advocacy == The inception of the ELVIS Act has been attributed to Gebre Waddell, founder of Sound Credit, who initially conceptualized a framework in 2023 that later evolved into the legislation. Representative Justin J. Pearson acknowledged Waddell's pivotal role during the March 4 House Floor Session on the bill. Leading Tennessee musicians supported the ELVIS Act. Tennessee Governor Bill Lee endorsed it as a Governor's Bill, and it was introduced in the Tennessee Legislature as House Bill 2091 by William Lamberth (R-44) and Senate Bill 2096 by Jack Johnson (R-27). The ELVIS Act is an amendment to a 1984 law that was the result of the Elvis Presley estate litigation for controlling how his likeness could be used after death. == Lobbying from the recording industry == The legislative journey of the ELVIS Act included a broad coalition of music industry stakeholders, including: These organizations, led by the Recording Academy and the RIAA, played roles in drafting the legislation, advocating for passage, and rallying support among the industry and legislators. The act gained momentum through discussions that bridged industry concerns with legislative action. This collaborative process led to a proposal that specifically targets the use of AI to create unauthorized reproductions of artists' voices and images. == Opposition == The ELVIS Act saw industry opposition from the Motion Picture Association, including testimony in the House Banking & Consumer Affairs Subcommittee, including remarks that the law risks "interference with our members’ ability to portray real people and events." TechNet, representing companies such as OpenAI, Google and Amazon, expressed their opposition in the hearing to the bill as drafted, asserting that the language was too broadly written and could have unintended consequences. Other concerns included its potential application to cover bands, but lawmakers assured people that this was not the intention. The bill passed the Tennessee House and Senate with a unanimous, bi-partisan vote including 93 ayes and 0 Noes in the House, and 30 ayes and 0 noes in the Senate. == Passage == By explicitly addressing AI impersonation, the ELVIS Act originated a legal approach to safeguarding personal rights, in the context of digital and technological advancements. It extends protections to an artist's voice and likeness, areas vulnerable to exploitation with the proliferation of AI technologies that occurred in 2023. The legislation received widespread support from the music industry, signaling a significant step forward in the ongoing effort to balance innovation with the protection of individual rights and creative integrity. It was reported as underscoring Tennessee's commitment to its musical heritage and showed the state as a leader in adapting copyright and privacy protections to the modern technological landscape. Artists including Chris Janson and Luke Bryan appeared at the signing ceremony hosted at Robert's Western World to support the new law and commemorate its passing. == Legal precedent == The ELVIS Act was reported as representing a development in the discourse surrounding AI, intellectual property, and personal rights. It was hoped by proponents to set a precedent for future legislative efforts both within and beyond Tennessee, offering a model for how states and potentially the federal government could address similar challenges. As AI technology continues to evolve, the act represents a foundational framework for protecting the authenticity and rights of artists, ensuring contributions remain protected. The act prohibits usage of AI to clone the voice of an artist without consent and can be criminally enforced as a Class A misdemeanor. This legislation's success was hoped by its supporters to inspire similar actions in other states, contributing to a unified approach to copyright and privacy in the digital age. Such a national response would reinforce the importance of safeguarding artists' rights against unauthorized use of their voices and likenesses.