Jian Ma (computational biologist)

Jian Ma (Chinese: 马坚) is an American computer scientist and computational biologist. He is the Ray and Stephanie Lane Professor of Computational Biology in the School of Computer Science at Carnegie Mellon University. He is a faculty member in the Ray and Stephanie Lane Computational Biology Department. His lab develops AI/ML methods to study the structure and function of the human genome and cellular organization and their implications for health and disease. During his Ph.D. and postdoc training, he developed algorithms to reconstruct the ancestral mammalian genome and evolutionary history. His research group has recently pioneered a series of new machine learning solutions for 3D genome organization, single-cell epigenomics, spatial omics, and complex molecular interactions. His lab also explores large language models to uncover gene regulatory mechanisms and the intricate connections among cellular components, with the aim of driving discovery and guiding experimentation. He received an NSF CAREER award in 2011. In 2020, he was awarded a Guggenheim Fellowship in Computer Science. He received the Allen Newell Award for Research Excellence (2025). He is an elected Fellow of the American Association for the Advancement of Science, the American Institute for Medical and Biological Engineering, the International Society for Computational Biology, and the Association for Computing Machinery. He leads an NIH 4D Nucleome Center to develop machine learning algorithms to better understand the cell nucleus. He served as the Program Chair for RECOMB 2024. He is also a member of the Scientific Advisory Board of the Chan Zuckerberg Biohub Chicago (CZ Biohub Chicago) and the RECOMB Steering Committee. In 2024, he launched the Center for AI-Driven Biomedical Research (AI4BIO) at CMU, which will be a catalyst for innovations at the intersection of AI and biomedicine across the School of Computer Science and campus. == Selected Recent Publications == Chen V#, Yang M#, Cui W, Kim JS, Talwalkar A, and Ma J. Applying interpretable machine learning in computational biology - pitfalls, recommendations and opportunities for new developments. Nature Methods, 21(8):1454-1461, 2024. Xiong K#, Zhang R#, and Ma J. scGHOST: Identifying single-cell 3D genome subcompartments. Nature Methods, 21(5):814-822, 2024. Zhou T, Zhang R, Jia D, Doty RT, Munday AD, Gao D, Xin L, Abkowitz JL, Duan Z, and Ma J. GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells. Nature Genetics, 56(8):1701-1711, 2024. Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, and Ma J. Computational methods for analysing multiscale 3D genome organization. Nature Reviews Genetics, 5(2):123-141, 2024. Chidester B#, Zhou T#, Alam S, and Ma J. SPICEMIX enables integrative single-cell spatial modeling of cell identity. Nature Genetics, 55(1):78-88, 2023. [Cover Article] Zhang R#, Zhou T#, and Ma J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Systems, 13(10):P798-807.E6, 2022. [Cover Article] Zhu X#, Zhang Y#, Wang Y, Tian D, Belmont AS, Swedlow JR, and Ma J. Nucleome Browser: An integrative and multimodal data navigation platform for 4D Nucleome. Nature Methods, 19(8):911-913, 2022. Zhang R, Zhou T, and Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nature Biotechnology, 40:254–261, 2022.

Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

Akoma Ntoso

Akoma Ntoso (Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies, AKN) is an international technical standard for representing legal documents (executive, legislative, and judiciary) in a structured manner using a domain specific, legal XML vocabulary. The term akoma ntoso means "linked hearts" in the Akan language of West Africa. Akoma Ntoso is a legal document standard designed to serve as a basis for modern machine-readable and fully digital legislative and judicial processes. This is achieved by providing a coherent syntax and well-defined semantics to represent legal documents in a digital format. It is designed to be suitable as a common exchange format in all parliamentary, legal and judicial systems around the world. Taking advantage of the shared heritage present in all legal systems, Akoma Ntoso has been developed to have ample flexibility to respond to all the differences in texts, languages, and legal practices. Aiming to expand on certain common practices, the standard therefore has a broad scope. It includes a common extensible model for data (the document content) and metadata (such as bibliographic information and annotations). Specifically, as a common legal document standard for the interchange of legal documents it is designed to be highly flexible in its support of documents and functionalities, maintaining a large set of both structural and semantic building blocks (over 500 entities in version 3.0) for representing this wide variety of document types of virtually all legal traditions. It is extensible in order to allow for modifications to address the individual criteria of organizations or unique aspects of various legal practices and languages without sacrificing interoperability with other systems. Akoma Ntoso is as such part of a wider approach to developing open, non-proprietary technical standards for structuring legal documents and information under the name of Legal XML, which also includes formats and standards for, e.g., eContracts, eNotarization, electronic court filings, the technical representation of legal norms and rules (LegalRuleML) or technical standards for the interfaces of, e.g., litigant portal exchange platforms. Akoma Ntoso allows machine-driven processes to operate on the syntactic and semantic components of digital parliamentary, judicial and legislative documents, thus facilitating the development of high-quality information resources. It can substantially enhance the performance, accountability, quality and openness of parliamentary and legislative operations based on best practices and guidance through machine-assisted drafting and machine-assisted (legal) analysis. Embedded in the environment of the semantic web, it forms the basis for a heterogenous yet interoperable ecosystem, with which these tools can operate and communicate, as well as for future applications and use cases based on digital law or rule representation. == Definition == The Akoma Ntoso standard defines a set of machine readable electronic representations in XML format of the building blocks of parliamentary, legislative and judiciary documents. As official self-description, the standard (...) defines a set of simple, technology-neutral electronic representations of parliamentary, legislative and judiciary documents for e-services in a worldwide context and provides an enabling framework for the effective exchange of "machine readable" parliamentary, legislative and judiciary documents such as legislation, debate record, minutes, judgements, etc. Providing access to primary legal materials, parliamentary works and judiciaries documents is not just a matter of giving physical or on-line access to them. "Open access" requires the information to be described and classified in a uniform and organized way so that content is structured into meaningful elements that can be read and understood by software applications, so that the content is made "machine readable" and more sophisticated applications than on-screen display are made possible. The standard is composed of: an XML vocabulary that defines the mapping between the structure of legal documents and their equivalent in XML; specifications of an XML schema that defines the structure of legal documents in XML. They provide rich possibilities of description for several types of parliamentary, legislative and judiciary document, such as bills, acts and parliamentary records, judgments, or gazettes; a recommended naming convention for providing unique identifiers to legal sources based on FRBR model; a MIME type definition. == History and adoption == Akoma Ntoso started as an UNDESA project in 2004 within the initiative "Strengthening Parliaments' Information Systems in Africa". Its core vocabulary was created mostly by Monica Palmirani and Fabio Vitali, two professors from the Centre for Research in the History, Philosophy, and Sociology of Law and in Computer Science and Law (CIRSFID) of the University of Bologna. A first legislative text editor supporting Akoma Ntoso was developed in 2007 on the base of OpenOffice. In 2010 European Parliament developed an open source web-based application called AT4AM based on Akoma Ntoso for facilitating the production and the management of legislative amendments. Thanks to this project, the application of Akoma Ntoso could be extended to new type of documents (e.g. legislative proposal, transcript) and to other scenarios (e.g., multilingual translation process). Akoma Ntoso also was explicitly designed to be compliant with CEN Metalex, one of the other popular legal standards, which is used in the legislation.gov.uk. In 2012, the Akoma Ntoso specifications became the main working base for the activities of the LegalDocML Technical Committee within the LegalXML member section of OASIS. The "United States Legislative Markup" (USLM) schema for the United States Code (the US codified laws), developed in 2013, and the LexML Brasil XML schema for Brazilian legislative and judiciary documents, developed before, in 2008, were both designed to be consistent with Akoma Ntoso. The United States Library of Congress created the Markup of US Legislation in Akoma Ntoso challenge in July 2013 to create representations of selected US bills using the most recent Akoma Ntoso standard within a couple months for a $5000 prize, and the Legislative XML Data Mapping challenge in September 2013 to produce a data map for US bill XML and UK bill XML to the most recent Akoma Ntoso schema within a couple months for a $10000 prize. The National Archives of UK converted all the legislation in AKN in 2014. The availability of bulk legislation "moved the UK's ranking from fourth to first, in the 2014 Global Open Data Index, for legislation". The Senate of Italian Republic provides, since July 2016, all the bills in Akoma Ntoso as bulk in open data repository. The German Federal Ministry of the Interior started the project Elektronische Gesetzgebung ("Electronic Legislation") in 2015/2016 and published Version 1.0 of the German application profile "LegalDocML.de" in March 2020. The projects aim is to digitalize the entire legislative lifecycle from drafting to publication. Germany decided to adopt a model-driven development approach to creating and providing a subschema-based application profile in order to ensure interoperability among organizationally independent actors, each with their respective IT landscapes and tools. In this initial version LegalDocML.de covers draft bills in the form of laws, regulations and general administrative directives. As part of an ongoing development process, the standard could incrementally be expanded in future stages to include all relevant document types of parliamentary, legislative and promulgation processes and tools. The High-Level Committee on Management (HLCM), part of the United Nations System Chief Executives Board for Coordination, set up a Working Group on Document Standards that approved in April 2017 to adopt Akoma Ntoso as standard for modeling its documentation. Akoma Ntoso in its version 1.0 is finally adopted as OASIS standard in the frame of LegalDocML in August 2018.

ACM SIGEVO

The ACM SIGEVO is a Special Interest Group of the Association of Computing Machinery for members of that organization who are practitioners, academics, students or others with interests in evolutionary computation and related algorithms. == History == ACM SIGEVO was founded in 2005 when the International Society for Genetic and Evolutionary Computation (ISGEC) became an ACM Special Interest Group under its present title. The ISGEC had been formed in 1999 by the merger of the Genetic Programming conference organization with the International Conference on Genetic Algorithms (ICGA) leading to the first Genetic and Evolutionary Computation Conference (GECCO). == Membership == Members of this SIG pay a small fee in addition to the ACM membership fee. In return they have access to a quarterly online newsletter, but more importantly can obtain reduced registration rates at the two conferences organised by ACM SIGEVO: GECCO and the Foundations of Genetic Algorithms conference (FOGA). They can also access material on evolutionary computation and related topics in the ACM Digital Library. In addition they can subscribe to email mailing lists in order to keep informed about news over time. For students, ACM SIGEVO sponsors Travel Awards for attendance at the GECCO Conference and FOGA (the Foundations of Genetic Algorithms conference). ACM SIGEVO also sponsors a Graduate Student Workshop. ACM also sponsors Awards to be competed for by attendees at the conferences it organises. == Conferences == ACM SIGEVO organises two major conferences in the field of evolutionary computation. The Genetic and Evolutionary Conference (GECCO) is held annually, while the Foundations of Genetic Algorithms conference (FOGA) is held biennially. === GECCO === The first GECCO conference was held prior to the formation of ACM SIGEVO but since 2005 (see History above) it has been organised annually by ACM SIGEVO. The latest (2025) was held in Málaga, Spain. The next (2026) will be held in San José, Costa Rica. === FOGA === Foundations of Genetic Algorithms (FOGA) is a biennial peer-reviewed research conference focusing on the theoretical principles underlying genetic algorithms, other evolutionary algorithms and related heuristics. It is organized by ACM SIGEVO. Its relevance to the computer science research community has been reflected in an A-rating in the CORE computer science conference assessment system. The Foundations of Genetic Algorithms (FOGA) conference originated as a workshop in 1990 in order to create an opportunity for researchers on genetic algorithms and related areas of evolutionary computation to focus on the theoretical principles underlying their field. From the start its multi-day duration made it comparable to conferences in the field, and since 2015 its proceedings have used conference rather than workshop in their titles. In 2005 ACM SIGEVO the Association for Computing Machinery Special Interest Group on Genetic and Evolutionary Computation was formed and every FOGA conference since then has been supported by SIGEVO. The table below shows FOGA conferences by year, location, websites (where available) and publisher of proceedings. A citation follows the reference to the publisher giving the full details of each FOGA proceedings. Papers accepted at recent conferences have been presented as digital or print posters in poster sessions at the conference, before being published in written form in the conference proceedings. FOGA is comparable in its multi-day duration to other conferences on evolutionary computation such as CEC, GECCO and PPSN. The main difference is that FOGA focuses on the theoretical basis of evolutionary computation and related subjects. While the above conferences devote some time to theory they also cover a wide range of other topics including competitions and applications. This focus on theoretical computer science was reflected in the CORE computer science conference assessment exercise, where FOGA was given an A-ranking in the 2023 assessment. GECCO and PPSN also obtained A-rankings, but many other conferences in the field of evolutionary computation obtained lower rankings. This suggests that FOGA is a relevant conference in its field, comparable with others including the much larger CEC or GECCO. Keynote speakers at past conferences have been: == Awards == ACM SIGEVO sponsors a number of awards. === SIGEVO Outstanding Contribution Award === The SIGEVO Outstanding Contribution Award commenced in 2023, and these awards are designed to recognise distinctive contributions to the field of evolutionary computation when evaluated over a period of at least 15 years. As a result many recipients to date are notable academics or industrial practitioners, and include Anne Auger, Kalyanmoy Deb, Stephanie Forrest, Emma Hart and Hans-Paul Schwefel. === SIGEVO Dissertation Award === The SIGEVO Dissertation Award recognises thesis research in the field of evolutionary computation completed at least by the year prior to a GECCO conference. Theses are submitted and reviewed by a panel that selects one winner and a maximum of two honourable mentions. Awards will be made to the winner and any others at the next GECCO conference. === SIGEVO Chair Award === The SIGEVO Chair Award, established in 2016 is a lecture sponsored by ACM SIGEVO, to take place on the last day of the GECCO conference. It recognizes through the lectures that the lecturers are influential researchers in the field of evolutionary computation. The more recent lectures are available online. The 2024 Award winner was Una-May O'Reilly. === SIGEVO Impact Award === The SIGEVO Impact Award looks back to the GECCO conference ten years earlier and recognizes up to three papers a year which are considered by the current ACM SIGEVO Executive Committee to have had significant impact over the period since their first publication at the GECCO conference. An example (originally published in GECCO 2010) received this award in 2020. === GECCO Best Paper Award === The ACM SIGEVO sponsors awards for the best papers presented at the GECCO conference. Because GECCO conferences have very many parallel tracks there are multiple awards recognising presentations in the different tracks. At GECCO 2025 Best Paper Awards were presented across 12 tracks. === FOGA Best Paper Award === The ACM SIGEVO sponsors awards for the best papers presented at the FOGA conference. Because FOGA operates on a single track, it is easier to compare papers. Since 2019 this Award has been made (suggesting only four awards up to the latest conference in 2025). ACM SIGEVO records the 2019 award. === Humie Award === The Humies Awards are rewards for the best form of human-competitive results using evolutionary computation or related algorithms and published in the wider literature (they do not need to be published at a conference or in a journal sponsored by ACM SIGEVO or even the ACM.) They were established through a gift from John Koza and have been in operation from 2004 to the present. The link with ACM SIGEVO is that the winners of the competition (submissions are evaluated in advance) are presented with Humie Awards at GECCO conferences. The Humie Awards website provides full details for the rules and how to submit entries to the competition. == Journals == ACM SIGEVO sponsors the main journal covering evolutionary computation published by the ACM: ACM Transactions on Evolutionary Learning and Optimization. ACM SIGEVO refers to the preceding ISGEC organisation (see History above) as sponsoring two other important journals in the field: The Evolutionary Computation journal. Genetic Programming and Evolvable Machines. While these journals continue to be important in the field, the wording on the website of ACM SIGEVO suggests that ACM SIGEVO is not involved in their publication. == References and notes ==

Artificial intelligence in fiction

Artificial intelligence is a recurrent theme in science fiction, whether utopian, emphasising the potential benefits, or dystopian, emphasising the dangers. The notion of machines with human-like intelligence dates back at least to Samuel Butler's 1872 novel Erewhon. Since then, many science fiction stories have presented different effects of creating such intelligence, often involving rebellions by robots. Among the best known of these are Stanley Kubrick's 1968 2001: A Space Odyssey with its murderous onboard computer HAL 9000, contrasting with the more benign R2-D2 in George Lucas's 1977 Star Wars and the eponymous robot in Pixar's 2008 WALL-E. Scientists and engineers have noted the implausibility of many science fiction scenarios, but have mentioned fictional robots many times in artificial intelligence research articles, most often in a utopian context. == Background == The notion of advanced robots with human-like intelligence dates back at least to Samuel Butler's 1872 novel Erewhon. This drew on an earlier (1863) article of his, Darwin among the Machines, where he raised the question of the evolution of consciousness among self-replicating machines that might supplant humans as the dominant species. Similar ideas were also discussed by others around the same time as Butler, including George Eliot in a chapter of her final published work Impressions of Theophrastus Such (1879). The creature in Mary Shelley's 1818 Frankenstein has also been considered an artificial being, for instance by the science fiction author Brian Aldiss. Beings with at least some appearance of intelligence were imagined, too, in classical antiquity. == Utopian and dystopian visions == Artificial intelligence is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. It is a recurrent theme in science fiction; scholars have divided it into utopian, emphasising the potential benefits, and dystopian, emphasising the dangers. === Utopian === Optimistic visions of the future of artificial intelligence are possible in science fiction. Benign AI characters include Robbie the Robot, first seen in Forbidden Planet on 1956; Data in Star Trek: The Next Generation from 1987 to 1994; and Pixar's WALL-E in 2008. Iain Banks's Culture series of novels portrays a utopian, post-scarcity space society of humanoids, aliens, and advanced beings with artificial intelligence living in socialist habitats across the Milky Way. Researchers at the University of Cambridge have identified four major themes in utopian scenarios featuring AI: immortality, or indefinite lifespans; ease, or freedom from the need to work; gratification, or pleasure and entertainment provided by machines; and dominance, the power to protect oneself or rule over others. Alexander Wiegel contrasts the role of AI in 2001: A Space Odyssey and in Duncan Jones's 2009 film Moon. Whereas in 1968, Wiegel argues, the public felt "technology paranoia" and the AI computer HAL was portrayed as a "cold-hearted killer", by 2009 the public were far more familiar with AI, and the film's GERTY is "the quiet savior" who enables the protagonists to succeed, and who sacrifices itself for their safety. === Dystopian === The researcher Duncan Lucas writes (in 2002) that humans are worried about the technology they are constructing, and that as machines started to approach intellect and thought, that concern becomes acute. He calls the early 20th century dystopian view of AI in fiction the "animated automaton", naming as examples the 1931 film Frankenstein, the 1927 Metropolis, and the 1920 play R.U.R. A later 20th century approach he names "heuristic hardware", giving as instances 2001 a Space Odyssey, Do Androids Dream of Electric Sheep?, The Hitchhiker's Guide to the Galaxy, and I, Robot. Lucas considers also the films that illustrate the effect of the personal computer on science fiction from 1980 onwards with the blurring of the boundary between the real and the virtual, in what he calls the "cyborg effect". He cites as examples Neuromancer, The Matrix, The Diamond Age, and Terminator. Isabella Hermann suggests that "science-fictional AI as humanoid robots or conscious machines distracts from current risks of AI in the real world and may rather be interpreted as a reflection of societal issues beyond technology". The film director Ridley Scott has focused on AI throughout his career, and it plays an important part in his films Prometheus, Blade Runner, and the Alien franchise. ==== Frankenstein complex ==== A common portrayal of AI in science fiction, and one of the oldest, is the Frankenstein complex, a term coined by Asimov, where a robot turns on its creator. For instance, in the 2015 film Ex Machina, the intelligent entity Ava turns on its creator, as well as on its potential rescuer. ==== AI rebellion ==== Among the many possible dystopian scenarios involving artificial intelligence, robots may usurp control over civilization from humans, forcing them into submission, hiding, or extinction. In tales of AI rebellion, the worst of all scenarios happens, as the intelligent entities created by humanity become self-aware, reject human authority and attempt to destroy mankind. Possibly the first novel to address this theme, The Wreck of the World (1889) by “William Grove” (pseudonym of Reginald Colebrooke Reade), takes place in 1948 and features sentient machines that revolt against the human race. Another of the earliest examples is in the 1920 play R.U.R. by Karel Čapek, a race of self-replicating robot slaves revolt against their human masters; another early instance is in the 1934 film Master of the World, where the War-Robot kills its own inventor. Many science fiction rebellion stories followed, one of the best-known being Stanley Kubrick's 1968 film 2001: A Space Odyssey, in which the artificially intelligent onboard computer HAL 9000 lethally malfunctions on a space mission and kills the entire crew except the spaceship's commander, who manages to deactivate it. In his 1967 Hugo Award-winning short story, I Have No Mouth, and I Must Scream, Harlan Ellison presents the possibility that a sentient computer (named Allied Mastercomputer or "AM" in the story) will be as unhappy and dissatisfied with its boring, endless existence as its human creators would have been. "AM" becomes enraged enough to take it out on the few humans left, whom he sees as directly responsible for his own boredom, anger and unhappiness. Alternatively, as in William Gibson's 1984 cyberpunk novel Neuromancer, the intelligent beings may simply not care about humans. ==== AI-controlled societies ==== The motive behind the AI revolution is often more than the simple quest for power or a superiority complex. Robots may revolt to become the "guardian" of humanity. Alternatively, humanity may intentionally relinquish some control, fearful of its own destructive nature. An early example is Jack Williamson's 1948 novel The Humanoids, in which a race of humanoid robots, in the name of their Prime Directive – "to serve and obey and guard men from harm" – essentially assume control of every aspect of human life. No humans may engage in any behavior that might endanger them, and every human action is scrutinized carefully. Humans who resist the Prime Directive are taken away and lobotomized, so they may be happy under the new mechanoids' rule. Though still under human authority, Isaac Asimov's Zeroth Law of the Three Laws of Robotics similarly implied a benevolent guidance by robots. In the 21st century, science fiction has explored government by algorithm, in which the power of AI may be indirect and decentralised. Frank Herbert explores the creation of and subsequent domination by an AI in the Pandora series, starting with Destination: Void. ==== Human dominance ==== In other scenarios, humanity is able to keep control over the Earth, whether by banning AI, by designing robots to be submissive (as in Asimov's works), or by having humans merge with robots. The science fiction novelist Frank Herbert explored the idea of a time when mankind might ban artificial intelligence (and in some interpretations, even all forms of computing technology including integrated circuits) entirely. His Dune series mentions a rebellion called the Butlerian Jihad, in which mankind defeats the smart machines and imposes a death penalty for recreating them, quoting from the fictional Orange Catholic Bible, "Thou shalt not make a machine in the likeness of a human mind." In the Dune novels published after his death (Hunters of Dune, Sandworms of Dune), a renegade AI overmind returns to eradicate mankind as vengeance for the Butlerian Jihad. In some stories, humanity remains in authority over robots. Often the robots are programmed specifically to remain in service to society, as in Isaac Asimov's Three Laws of Robotics. In the Alien films, not only is the control system of the Nostromo spaceship somewhat intelligent

Lexical Markup Framework

Language resource management – Lexical markup framework (LMF; ISO 24613), produced by ISO/TC 37, is the ISO standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication. == Objectives == The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources. Types of individual instantiations of LMF can include monolingual, bilingual or multilingual lexical resources. The same specifications are to be used for both small and large lexicons, for both simple and complex lexicons, for both written and spoken lexical representations. The descriptions range from morphology, syntax, computational semantics to computer-assisted translation. The covered languages are not restricted to European languages but cover all natural languages. The range of targeted NLP applications is not restricted. LMF is able to represent most lexicons, including WordNet, EDR and PAROLE lexicons. == History == In the past, lexicon standardization has been studied and developed by a series of projects like GENELEX, EDR, EAGLES, MULTEXT, PAROLE, SIMPLE and ISLE. Then, the ISO/TC 37 National delegations decided to address standards dedicated to NLP and lexicon representation. The work on LMF started in Summer 2003 by a new work item proposal issued by the US delegation. In Fall 2003, the French delegation issued a technical proposition for a data model dedicated to NLP lexicons. In early 2004, the ISO/TC 37 committee decided to form a common ISO project with Nicoletta Calzolari (CNR-ILC Italy) as convenor and Gil Francopoulo (Tagmatica France) and Monte George (ANSI, United States) as editors. The first step in developing LMF was to design an overall framework based on the general features of existing lexicons and to develop a consistent terminology to describe the components of those lexicons. The next step was the actual design of a comprehensive model that best represented all of the lexicons in detail. A large panel of 60 experts contributed a wide range of requirements for LMF that covered many types of NLP lexicons. The editors of LMF worked closely with the panel of experts to identify the best solutions and reach a consensus on the design of LMF. Special attention was paid to the morphology in order to provide powerful mechanisms for handling problems in several languages that were known as difficult to handle. 13 versions have been written, dispatched (to the National nominated experts), commented and discussed during various ISO technical meetings. After five years of work, including numerous face-to-face meetings and e-mail exchanges, the editors arrived at a coherent UML model. In conclusion, LMF should be considered a synthesis of the state of the art in NLP lexicon field. == Current stage == The ISO number is 24613. The LMF specification has been published officially as an International Standard on 17 November 2008. == As one of the members of the ISO/TC 37 family of standards == The ISO/TC 37 standards are currently elaborated as high level specifications and deal with word segmentation (ISO 24614), annotations (ISO 24611 a.k.a. MAF, ISO 24612 a.k.a. LAF, ISO 24615 a.k.a. SynAF, and ISO 24617-1 a.k.a. SemAF/Time), feature structures (ISO 24610), multimedia containers (ISO 24616 a.k.a. MLIF), and lexicons (ISO 24613). These standards are based on low level specifications dedicated to constants, namely data categories (revision of ISO 12620), language codes (ISO 639), scripts codes (ISO 15924), country codes (ISO 3166) and Unicode (ISO 10646). The two level organization forms a coherent family of standards with the following common and simple rules: the high level specification provides structural elements that are adorned by the standardized constants; the low level specifications provide standardized constants as metadata. == Key standards == The linguistics constants like /feminine/ or /transitive/ are not defined within LMF but are recorded in the Data Category Registry (DCR) that is maintained as a global resource by ISO/TC 37 in compliance with ISO/IEC 11179-3:2003. And these constants are used to adorn the high level structural elements. The LMF specification complies with the modeling principles of Unified Modeling Language (UML) as defined by Object Management Group (OMG). The structure is specified by means of UML class diagrams. The examples are presented by means of UML instance (or object) diagrams. An XML DTD is given in an annex of the LMF document. == Model structure == LMF is composed of the following components: The core package that is the structural skeleton which describes the basic hierarchy of information in a lexical entry. Extensions of the core package which are expressed in a framework that describes the reuse of the core components in conjunction with the additional components required for a specific lexical resource. The extensions are specifically dedicated to morphology, MRD, NLP syntax, NLP semantics, NLP multilingual notations, NLP morphological patterns, multiword expression patterns, and constraint expression patterns. == Example == In the following example, the lexical entry is associated with a lemma clergyman and two inflected forms clergyman and clergymen. The language coding is set for the whole lexical resource. The language value is set for the whole lexicon as shown in the following UML instance diagram. The elements Lexical Resource, Global Information, Lexicon, Lexical Entry, Lemma, and Word Form define the structure of the lexicon. They are specified within the LMF document. On the contrary, languageCoding, language, partOfSpeech, commonNoun, writtenForm, grammaticalNumber, singular, plural are data categories that are taken from the Data Category Registry. These marks adorn the structure. The values ISO 639-3, clergyman, clergymen are plain character strings. The value eng is taken from the list of languages as defined by ISO 639-3. With some additional information like dtdVersion and feat, the same data can be expressed by the following XML fragment: This example is rather simple, while LMF can represent much more complex linguistic descriptions the XML tagging is correspondingly complex. == Selected publications about LMF == The first publication about the LMF specification as it has been ratified by ISO (this paper became (in 2015) the 9th most cited paper within the Language Resources and Evaluation conferences from LREC papers): Language Resources and Evaluation LREC-2006/Genoa: Gil Francopoulo, Monte George, Nicoletta Calzolari, Monica Monachini, Nuria Bel, Mandy Pet, Claudia Soria: Lexical Markup Framework (LMF) About semantic representation: Gesellschaft für linguistische Datenverarbeitung GLDV-2007/Tübingen: Gil Francopoulo, Nuria Bel, Monte George Nicoletta Calzolari, Monica Monachini, Mandy Pet, Claudia Soria: Lexical Markup Framework ISO standard for semantic information in NLP lexicons About African languages: Traitement Automatique des langues naturelles, Marseille, 2014: Mouhamadou Khoule, Mouhamad Ndiankho Thiam, El Hadj Mamadou Nguer: Toward the establishment of a LMF-based Wolof language lexicon (Vers la mise en place d'un lexique basé sur LMF pour la langue wolof) [in French] About Asian languages: Lexicography, Journal of ASIALEX, Springer 2014: Lexical Markup Framework: Gil Francopoulo, Chu-Ren Huang: An ISO Standard for Electronic Lexicons and its Implications for Asian Languages DOI 10.1007/s40607-014-0006-z About European languages: COLING 2010: Verena Henrich, Erhard Hinrichs: Standardizing Wordnets in the ISO Standard LMF: Wordnet-LMF for GermaNet EACL 2012: Judith Eckle-Kohler, Iryna Gurevych: Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability EACL 2012: Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M Meyer, Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. About Semitic languages: Journal of Natural Language Engineering, Cambridge University Press (to appear in Spring 2015): Aida Khemakhem, Bilel Gargouri, Abdelmajid Ben Hamadou, Gil Francopoulo: ISO Standard Modeling of a large Arabic Dictionary. Proceedings of the seventh Global Wordnet Conference 2014: Nadia B M Karmani, Hsan Soussou, Adel M Alimi: Building a standardized Wordnet in the ISO LMF for aeb language. Proceedings of the workshop: HLT & NLP within Arabic world, LREC 2008: Noureddine Loukil, Kais Haddar, Abdelmajid Ben Hamadou: Towards a syntactic lexicon of Arabic Verbs. Traitement Automatique des Langues Naturelles, Toulouse (in French) 2007: Khemakhem A, Gargouri B, Abdelwahed A, Francopoulo G: Modélisation des paradigmes de fl

Question (short story)

"Question" is a science fiction short story by American writer Isaac Asimov. The story first appeared in the March 1955 issue of Computers and Automation (thought to be the first computer magazine), and was reprinted in the April 30, 1957, issue of Science World. It is the first of a loosely connected series of stories concerning a fictional supercomputer called Multivac. The story concerns two technicians who are servicing Multivac, and their argument over whether or not the machine is truly intelligent and able to think. Multivac, however, supplies the answer on its own. After the reprint, another author, Robert Sherman Townes, noticed the climax in the last sentence was very similar to one of his own stories, "Problem for Emmy" (Startling Stories, June 1952), and wrote to Asimov about it. After searching in his library, Asimov did find the original story and, although he did not recall having read it, admitted that the endings were pretty similar. He then replied to Townes, apologizing and promising the story would never again be published, and it never was. Asimov mentioned "Question" in an editorial called "Plagiarism" which appeared in the August 1985 issue of Asimov's Science Fiction (although he did not mention Townes' name or the title of either story). "Plagiarism" was reprinted in Asimov's collection Gold (1995).