András Kornai

András Kornai

András Kornai (born 1957 in Budapest) is a mathematical linguist. == Education == Kornai is the son of economist János Kornai. He earned two PhDs with the first being in mathematics in 1983 from Eötvös Loránd University in Budapest, where his advisor was Miklós Ajtai. His second was in linguistics in 1991 from Stanford University, where his advisor was Paul Kiparsky. == Career == He is a professor in the Department of Algebra at the Budapest Institute of Technology, where he works on an open source Hungarian morphological analyzer. He was Chief Scientist at MetaCarta, where he worked on information extraction before the company was acquired by Nokia. Prior to MetaCarta, he was Chief Scientist at Northern Light. He is on the board of the journal Grammars and YourAmigo PLC. His research interests include all mathematical aspects of natural language processing, speech recognition, and OCR. As area editor he was responsible for the Mathematical Linguistics area of the Oxford International Encyclopedia of Linguistics, and his joint work with Geoffrey Pullum, "The X-bar Theory of Phrase Structure", formally reconstructed that then-popular linguistic theory. == Awards and honors == 2009: ACM Distinguished Member == Monographs == Semantics. Springer Nature, 2020. ISBN 978-3-319-65644-1 Mathematical Linguistics. Springer Verlag, in the series Advanced Information and Knowledge Processing, November 2007. ISBN 978-1-84628-985-9 Hardbound, approximately 300 pages. See description. Formal Phonology. In the series Outstanding Dissertations in Linguistics, Garland Publishing, 1994, ISBN 0-8153-1730-1, hardbound, 240 pages Contents, Preface, Introduction (20 pages) On Hungarian Morphology. In the series Linguistica, Hungarian Academy of Sciences, 1994, ISBN 963-8461-73-X, paperbound, 174 pages Contents, Preface, Introduction (10 pages) == Books edited == Oxford International Encyclopedia of Linguistics (Mathematical Linguistics Area Editor under Editor in Chief William Frawley). 4 volumes, Oxford University Press, 2003, ISBN 978-0-19-513977-8. Proceedings of the HLT-NAACL Workshop on the Analysis of Geographic References. Jointly with Beth Sundheim. Association for Computational Linguistics, 2003, ISBN 1-932432-04-3 (WS9), paperbound, vi+81 pages. See related material. Extended Finite State Models of Language (editor). In the series Studies in Natural Language Processing, Cambridge University Press, 1999, ISBN 0-521-63198-X, hardbound, x+278 pages Contents, Introduction (7 pages). == Selected papers == Digital Language Death. PLoS ONE 8(10): e77056, 2012. [1] Hunmorph: open source word analysis (Jointly with V. Tron, Gy. Gyepesi, P. Halacsy, L. Nemeth, and D. Varga). In Proc. ACL 2005 Software Workshop 77-85 [2] Leveraging the open source ispell codebase for minority language analysis (Jointly with P. Halacsy, L. Nemeth, A. Rung, I. Szakadat, and V. Tron). In J. Carson-Berndsen (ed): Proc. SALTMIL 2004 56-59 [3] Explicit Finitism, International Journal of Theoretical Physics 2003/2 301-307 [4] Mathematical Linguistics (Jointly with G.K. Pullum) In W. Frawley (ed): Oxford International Encyclopedia of Linguistics, Oxford University Press 2003, v3 17-20 [5] Optical Character Recognition, In W. Frawley (ed): Oxford International Encyclopedia of Linguistics, Oxford University Press 2003, v3 33-34 [6] How many words are there? Glottometrics 2002/4 61-86 [7] Zipf's law outside the middle range Proc. Sixth Meeting on Mathematics of Language University of Central Florida, 1999 347-356 [8] A Robust, Language-Independent OCR System. (Jointly with Z. Lu, I. Bazzi, J. Makhoul, P. Natarajan, and R. Schwartz) In: Robert J. Mericsko (ed): Proc. 27th AIPR Workshop: Advances in Computer-Assisted Recognition SPIE Proceedings 3584 1999 [9] Quantitative Comparison of Languages. Grammars 1998/2 155-165 [10] The generative power of feature geometry. Annals of Mathematics and Artificial Intelligence 8 1993 37-46 [11] The X-bar Theory of Phrase Structure. (Jointly with G.K. Pullum) Language 66 1990 24-50 [12]

Text Database and Dictionary of Classic Mayan

The project Text Database and Dictionary of Classic Mayan (abbr. TWKM) promotes research on the writing and language of pre-Hispanic Maya culture. It is housed in the Faculty of Arts at the University of Bonn and was established with funding from the North Rhine-Westphalian Academy of Sciences, Humanities and the Arts. The project has a projected run-time of fifteen years and is directed by Nikolai Grube from the Department of Anthropology of the Americas at the University of Bonn. The goal of the project is to conduct computer-based studies of all extant Maya hieroglyphic texts from an epigraphic and cultural-historical standpoint, and to produce and publish a database and a comprehensive dictionary of the Classic Mayan language. == Subject of the Project == The text database, as well as the dictionary that will be compiled by the conclusion of the project, will be assembled based on all known texts from the pre-Hispanic Maya culture. These texts were produced and used between approximately the third century B.C. through A.D. 1500, in a region that today includes parts of the countries of Mexico, Guatemala, Belize, and Honduras. The thousands of hieroglyphic inscriptions on monuments, ceramics, or daily objects that have survived into the present offer insight into the language's vocabulary and structure. The project's database and dictionary will digitally represent original spellings using the logo-syllabic Maya hieroglyphs, as well as their transcription and transliteration in the Roman alphabet. The data will be additionally annotated with various epigraphic analyses, translations, and further object-specific information. == Project Partners == TWKM will employ digital technologies in order to compile and make available the data and metadata, as well as to publish the project's research results. The project thereby methodologically positions itself in the field of the digital humanities. The project will be conducted in cooperation with the project partners (below), the research association for the eHumanities TextGrid, as well as the University and Regional Library of Bonn (ULB). The working environment that is currently under construction, in which the data and metadata will be compiled and annotated, will be realized in theTextGrid Laboratory, a software of the virtual research environment. A further component of this software, the TextGrid Repository, will make the data that are authorized for publication freely available online and ensure their long-term storage. The tools for data compilation and annotation attained from the modularly constructed and extended TextGrid lab thereby provide all the necessary materials for facilitating the research team's the typical epigraphic workflow. The workflow usually begins by documenting the texts and the objects on which they are preserved, and by compiling descriptive data. It then continues with the various levels of epigraphic and linguistic analysis, and concludes in the best case scenario with a translation of the analyzed inscription and a corresponding publication. In cooperation with the ULB, selected data will additionally be made available. The project's Virtual Inscription Archive will present online, in the Digital Collections of the ULB, hieroglyphic inscriptions selected from the published data in the repository, including an image of and brief information about the texts and the objects on which they are written, epigraphic analysis, and translation. == Project Goal == One of the project's goals is to produce a dictionary of Classic Mayan, in both digital and print form, towards the end of the project run-time. Additionally, a database with a corpus of inscriptions, including their translations and epigraphic analyses, will be made freely available online. The database furthermore will provide an ontology-like link of the contextual object data with the inscriptions and with each other, thereby allowing a cultural-historical arrangement of all contents within the periods of pre-Hispanic Maya culture. The contents of the database are additionally linked to citations of relevant literature. As a result, the database will also make freely available to both the scientific community and other interested parties a bibliography representing the research history and a base of knowledge concerning ancient Maya culture and script. In addition, the Classic Maya script, in its temporally defined stages of language development, will be gathered into and documented in a comprehensive language corpus with the aid of the information gathered by the project. In collaboration with all project participants, the corpus data can be used, together with the aid of various comparable analyses and also computational linguistic methods, such as inference-based methods, to confirm readings of some hieroglyphs that are currently only partially confirmed, and to eventually completely decipher the Classic Maya script.

Packingham v. North Carolina

Packingham v. North Carolina, 582 U.S. 98 (2017), is a case in which the Supreme Court of the United States held that a North Carolina statute that prohibited registered sex offenders from using social media websites was unconstitutional because it violated the First Amendment to the U.S. Constitution, which protects freedom of speech. In 2010, Lester Gerard Packingham, a registered sex offender, posted on Facebook under a pseudonym to comment favorably on a recent traffic court experience. Police then identified Packingham and charged him with violating North Carolina's law. Packingham moved to dismiss the charges, arguing that the state's law violated the First Amendment. The trial court dismissed this motion and ultimately convicted Packingham. A state appellate court initially reversed the trial court, holding that the law did violate the First Amendment, but the North Carolina Supreme Court, the state's highest court, disagreed and reinstated the conviction. In June 2017, the U.S. Supreme Court unanimously reversed the North Carolina Supreme Court's judgment. In the majority opinion authored by Justice Anthony Kennedy, the Court held that social media—defined broadly to include Facebook, Amazon.com, The Washington Post, and WebMD, among many others—is a "protected space" under the First Amendment for lawful speech. The Court offered that North Carolina could protect children through less restrictive means, such as prohibiting "conduct that often presages a sexual crime, like contacting a minor or using a website to gather information about a minor". == Background == === North Carolina statute === In 2008, the state of North Carolina passed a law that made it a felony for a registered sex offender "to access a commercial social networking Web site where the sex offender knows that the site permits minor children to become members or to create or maintain personal Web pages". The law defined a "commercial social networking Web site" using four criteria. Specifically, the website must: be "operated by a person who derives revenue from membership fees, advertising, or other sources related to the operation of the Web site". facilitate "the social introduction between two or more persons for the purposes of friendship, meeting other persons, or information exchanges". allow "users to create Web pages or personal profiles that contain information such as the name or nickname of the user, photographs placed on the personal Web page by the user, other personal information about the user, and links to other personal Web pages on the commercial social networking Web site of friends or associates of the user that may be accessed by other users or visitors to the Web site". provide "users or visitors... mechanisms to communicate with other users, such as a message board, chat room, electronic mail, or instant messenger". The law exempted websites that "Provid[e] only one of the following discrete services: photo-sharing, electronic mail, instant messenger, or chat room or message board platform", as well as websites that have as their primary purpose "the facilitation of commercial transactions involving goods or services between [their] members or visitors". === Facts of the case === In 2002, Lester Gerard Packingham was convicted of taking "indecent liberties with a child", a felony that required him to register as a sex offender. A North Carolina court sentenced him to 10–12 months in prison with 24 months of supervised release. He was given no other special instructions on his behavior outside of prison other than to "remain away from" the minor. In 2010, after a state court dismissed a traffic ticket against Packingham, he submitted a post on Facebook under the name "J. R. Gerrard", stating: "Man God is Good! How about I got so much favor they dismissed the ticket before court even started? No fine, no court cost, no nothing spent. . . . . .Praise be to GOD, WOW! Thanks JESUS!" The Durham Police Department identified Packingham as the author of the post after cross-checking the time of the post with recently dismissed traffic tickets, and a grand jury indicted him for violating the North Carolina statute. === Lower court proceedings === Initially, Packingham moved to dismiss his indictment, arguing that it violated the First Amendment. A North Carolina Superior Court judge denied this motion, and he was convicted of violating the North Carolina social media law. Packingham appealed his conviction to the North Carolina Court of Appeals, which reversed the trial court's decision in 2013. Applying intermediate scrutiny, the court of appeals determined that North Carolina's law violated the First Amendment because it was too broad, applying to all registered sex offenders regardless of whether the offender had committed a crime involving a minor or whether the offender was a continuing threat to minors. The appeals court also stated that the law had been defined broadly enough to prohibit a registered sex offender from conducting a wide array of Internet activity, such as "conducting a 'Google' search, purchasing items on Amazon.com, or accessing a plethora of Web sites unrelated to online communication with minors". In 2015, the North Carolina Supreme Court, the state's highest court, reversed the court of appeals, holding that the law was "constitutional in all respects". The North Carolina Supreme Court found that the statute was a "limitation on conduct" and did not impede any free speech. The state had a vested interest in “forestalling the illicit lurking and contact of minors” by registered sex offenders and potential future victims, and upheld Packingham's conviction. == Supreme Court ruling == Packingham filed a petition for a writ of certiorari with the Supreme Court of the United States. The federal government also filed a brief recommending that the Supreme Court grant certiorari, arguing that the North Carolina Supreme Court incorrectly decided the case in favor of the state. The U.S. Supreme Court granted certiorari in October 2016. Amicus briefs in support of Packingham were filed by the libertarian Cato Institute and the American Civil Liberties Union. The North Carolina Supreme Court filed a brief supporting its prior decision, urging the importance of protecting minors from being stalked online. === Oral argument === The oral argument took place in February 2017. Packingham’s lawyer, David T. Goldberg, argued that the law banned “vast swaths of First Amendment activity”, went too far in restricting which Internet sites could be accessed, and forbade use of the Internet in general. The law targeted speech on some of the platforms that Americans use most often, Goldberg noted, and that under the law Packingham could not even use Twitter to read the myriad messages discussing his own case. He further noted that the law imposes punishment without regard to whether the offender actually did anything wrong. North Carolina’s senior deputy Attorney General, Robert C. Montgomery, argued for the state, and claimed that communication through social media sites is a “crucial channel”. Justice Sonia Sotomayor asked Montgomery to provide evidence as to the claim that by giving Packingham Internet privileges, he would commit another crime. Justice Stephen Breyer added that “It seems to be well-settled law that the state can’t (bar usage) unless there is a 'clear and present danger'." === Opinion of the Court === In June 2017 the Supreme Court delivered a judgment in favor of Packingham, unanimously voting to reverse the state court's ruling. Justice Anthony Kennedy authored the decision, joined by Justice Ginsburg, Justice Breyer, Justice Sotomayor, and Justice Kagan. Kennedy explained the decision: "A fundamental principle of the First Amendment is that all persons have access to places where they can speak and listen, and then, after reflection, speak and listen once more." He continued that "By prohibiting sex offenders from using those websites, North Carolina with one broad stroke bars access to what for many are the principal sources for knowing current events, checking ads for employment, speaking and listening in the modern public square, and otherwise exploring the vast realms of human thought and knowledge." Citing Ashcroft v. Free Speech Coalition as a precedent, Kennedy also wrote: "It is well established that, as a general rule, the Government 'may not suppress lawful speech as the means to suppress unlawful speech'." === Concurring opinion === Justice Samuel Alito wrote an opinion concurring in the judgment, joined by John Roberts and Clarence Thomas. While Alito agreed that the state statute at issue violated the First Amendment, he noted that there are reasonable scenarios for which legal bans for sex offenders can be placed, such as for sites targeted at teenagers. Justice Gorsuch took no part in the decision of the case. == Impact == Packingham v. North Carolina was one of the first U.S. Supreme Court cases to ana

CloudLibrary

CloudLibrary (stylized as "cloudLibrary") is a cloud-based software system through which libraries lend electronic books; it is also the name of the app that users download to access the e-books. CloudLibrary was created in 2011 by 3M as part of its library systems unit as a competitor to OverDrive, Inc.; in 2015 3M sold the North American part of that unit to Bibliotheca Group GmbH, a company founded in 2011 that was funded by One Equity Partners Capital Advisors, a division of JP Morgan Chase. By 2019, Bibliotecha had tried, unsuccessfully, to negotiate with Amazon to add Kindle-ebook compatibility to cloudLibrary - something that, as of then, Amazon had only made available to Overdrive. In that year, cloudLibrary, along with hoopla offered by Midwest Tape, ODILO, and Baker & Taylor’s Axis 360, were the main competitors to the Overdrive and Libby apps offered by OverDrive, Inc. in the library e-book market. In April 2024, Bibliotheca sold cloudLibrary to the nonprofit cooperative OCLC. By that time, cloudLibrary was used by around 500 libraries in around 20 countries in around 50 languages, and was used to lend audiobooks, digital magazines, newspapers, and comics, and streaming media, along with e-books.

Radio code

A radio code is any code that is commonly used over a telecommunication system such as Morse code, brevity codes and procedure words. == Brevity code == Brevity codes are designed to convey complex information with a few words or codes. Specific brevity codes include: ACP-131 Aeronautical Code signals ARRL Numbered Radiogram Multiservice tactical brevity code Ten-code Phillips Code NOTAM Code === Operating signals === Brevity codes that are specifically designed for use between communications operators and to support communication operations are referred to as "operating signals". These include: Prosigns for Morse code 92 Code, Western Union telegraph brevity codes Q code, initially developed for commercial radiotelegraph communication, later adopted by other radio services, especially amateur radio. Used since circa 1909. QN Signals, published by the ARRL and used by Amateur radio operators to assist in the transmission of ARRL Radiograms in the National Traffic System. R and S brevity codes, published by the British Post Office in 1908 for coastal wireless stations and ships, superseded in 1912 by Q codes X code, used by European military services as a wireless telegraphy code in the 1930s and 1940s Z code, also used in the early days of radiotelegraph communication. == Other == Morse code is commonly used in amateur radio. Morse code abbreviations are a type of brevity code. Procedure words used in radiotelephony procedure, are a type of radio code. Spelling alphabets, including the ICAO spelling alphabet, are commonly used in communication over radios and telephones. == Other meanings == Many car audio systems (car radios) have a so-called 'radio code' number which needs to be entered after a power disconnection. This was introduced as a measure to deter theft of these devices. If the code is entered correctly, the radio is activated for use. Entering the code incorrectly several times in a row will cause a temporary or permanent lockout. Some car radios have another check which operates in conjunction with car electronics. If the VIN or another vehicle ID matches the previously stored one, the radio is activated. If the radio cannot verify the vehicle, it is considered to be moved into another vehicle. The radio will then request for the code number or simply refuse to operate and display an error message such as "CANCHECK" or "SECURE".

Jais (language model)

Jais is an open-source large language model launched in August 2023. Developed as a collaboration between Emirati AI company G42, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and US-based Cerebras Systems, Jais was designed to produce high-quality Arabic text and was also trained on English data. The model's creation was motivated by the underrepresentation of the Arabic language in the field of generative artificial intelligence. It aims to provide a more culturally and linguistically accurate model for the world's 400 million Arabic speakers. Its name is a reference to Jebel Jais, the highest mountain in the UAE. == Background and development == Jais was developed in response to the limited availability of advanced generative artificial intelligence models for the Arabic language, despite it being spoken by over 400 million people. Existing models were often trained on limited or low-quality Arabic web content, resulting in poor performance. The project represents a significant investment by the United Arab Emirates in the field of AI as part of its national strategy. The model was created through a partnership between Inception (now Core42), a subsidiary of the Abu Dhabi-based AI company G42; the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras Systems, a US company specializing in AI hardware. The model is named after Jebel Jais, the highest peak in the UAE. == Training == The initial version of Jais released in August 2023 had 13 billion parameters. In November 2023, Core42 released Jais 30B, an improved version with 30 billion parameters. Both models were trained on a subset of the Cerebras Condor Galaxy 1 supercomputer. The training dataset consisted of a mix of Arabic, English, and computer code. According to Timothy Baldwin, a professor of natural language processing at MBZUAI, training the model on a diverse Arabic dataset allows it to switch between dialects. == Features == Jais is designed to generate text in both English and Arabic. The project has also released instruction-tuned "Chat" variants for both the 13B and 30B models, which are specifically optimized for conversational applications. Additional functionality for working with images, graphs, and tabular data is planned for future releases.

Virtual DOM

A virtual DOM is a lightweight JavaScript representation of the Document Object Model (DOM) used in declarative web frameworks such as React, Vue.js, and Elm. Since generating a virtual DOM is relatively fast, any given framework is free to rerender the virtual DOM as many times as needed relatively cheaply. The framework can then find the differences between the previous virtual DOM and the current one (diffing), and only makes the necessary changes to the actual DOM (reconciliation). While technically slower than using just vanilla JavaScript, the pattern makes it much easier to write websites with a lot of dynamic content, since markup is directly coupled with state. Similar techniques include Ember.js' Glimmer and Angular's incremental DOM. == History == The JavaScript DOM API has historically been inconsistent across browsers, clunky to use, and difficult to scale for large projects. While libraries like jQuery aimed to improve the overall consistency and ergonomics of interacting with HTML, it too was prone to repetitive code that didn't describe the nature of the changes being made well and decoupled logic from markup. The release of AngularJS in 2010 provided a major paradigm shift in the interaction between JavaScript and HTML with the idea of dirty checking. Instead of imperatively declaring and destroying event listeners and modifying individual DOM nodes, changes in variables were tracked and sections of the DOM were invalidated and rerendered when a variable in their scope changed. This digest cycle provided a framework to write more declarative code that coupled logic and markup in a more logical way. While AngularJS aimed to provide a more declarative experience, it still required data to be explicitly bound to and watched by the DOM, and performance concerns were cited over the expensive process of dirty checking hundreds of variables. To alleviate these issues, React was the first major library to adopt a virtual DOM in 2013, which removed both the performance bottlenecks (since diffing and reconciling the DOM was relatively cheap) and the difficulty of binding data (since components were effectively just objects). Other benefits of a virtual DOM included improved security since XSS was effectively impossible and better extensibility since a component's state was entirely encapsulated. Its release also came with the advent of JSX, which further coupled HTML and JavaScript with an XML-like syntax extension. Following React's success, many other web frameworks copied the general idea of an ideal DOM representation in memory, such as Vue.js in 2014, which used a template compiler instead of JSX and had fine-grained reactivity built as part of the framework. In recent times, the virtual DOM has been criticized for being slow due to the additional time required for diffing and reconciling DOM nodes. This has led to the development of frameworks without a virtual DOM, such as Svelte, and frameworks that edit the DOM in-place such as Angular 2. == Implementations == === React === React pioneered the use of a virtual DOM to make components declaratively. Virtual DOM nodes are constructed using the createElement() function, but are often transpiled from JSX to make writing components more ergonomic. In class-based React, virtual DOM nodes are returned from the render() function, while in functional hook-based components, the return value of the function itself serves as the page markup. === Vue.js === Vue.js uses a virtual DOM to handle state changes, but is usually not directly interacted with; instead, a compiler is used to transform HTML templates into virtual DOM nodes as an implementation detail. While Vue supports writing JSX and custom render functions, it's more typical to use the template compiler since a build step isn't required that way. === Svelte === Svelte does not have a virtual DOM, with its creator Rich Harris calling the virtual DOM "pure overhead". Instead of diffing and reconciling DOM nodes at runtime, Svelte uses compile-time reactivity to analyze markup and generate JavaScript code that directly manipulates the DOM, drastically increasing performance.