AI Headshot Generator

AI Headshot Generator — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Audio mining

    Audio mining

    Audio mining is a technique by which the content of an audio signal can be automatically analyzed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The term audio mining is sometimes used interchangeably with audio indexing, phonetic searching, phonetic indexing, speech indexing, audio analytics, speech analytics, word spotting, and information retrieval. Audio indexing, however, is mostly used to describe the pre-process of audio mining, in which the audio file is broken down into a searchable index of words. == History == Academic research on audio mining began in the late 1970s in schools like Carnegie Mellon University, Columbia University, the Georgia Institute of Technology, and the University of Texas. Audio data indexing and retrieval began to receive attention and demand in the early 1990s, when multimedia content started to develop and the volume of audio content significantly increased. Before audio mining became the mainstream method, written transcripts of audio content were created and manually analyzed. == Process == Audio mining is typically split into four components: audio indexing, speech processing and recognition systems, feature extraction and audio classification. The audio will typically be processed by a speech recognition system in order to identify word or phoneme units that are likely to occur in the spoken content. This information may either be used immediately in pre-defined searches for keywords or phrases (a real-time "word spotting" system), or the output of the speech recognizer may be stored in an index file. One or more audio mining index files can then be loaded at a later date in order to run searches for keywords or phrases. The results of a search will normally be in terms of hits, which are regions within files that are good matches for the chosen keywords. The user may then be able to listen to the audio corresponding to these hits in order to verify if a correct match was found. === Audio Indexing === In audio, there is the main problem of information retrieval - there is a need to locate the text documents that contain the search key. Unlike humans, a computer is not able to distinguish between the different types of audios such as speed, mood, noise, music or human speech - an effective searching method is needed. Hence, audio indexing allows efficient search for information by analyzing an entire file using speech recognition. An index of content is then produced, bearing words and their locations done through content-based audio retrieval, focusing on extracted audio features. It is done through mainly two methods: Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic-based Indexing. ==== Large Vocabulary Continuous Speech Recognizers (LVCSR) ==== In text-based indexing or large vocabulary continuous speech recognition (LVCSR), the audio file is first broken down into recognizable phonemes. It is then run through a dictionary that can contain several hundred thousand entries and matched with words and phrases to produce a full text transcript. A user can then simply search a desired word term and the relevant portion of the audio content will be returned. If the text or word could not be found in the dictionary, the system will choose the next most similar entry it can find. The system uses a language understanding model to create a confidence level for its matches. If the confidence level be below 100 percent, the system will provide options of all the found matches. ===== Advantages and disadvantages ===== The main draw of LVCSR is its high accuracy and high searching speed. In LVCSR, statistical methods are used to predict the likelihood of different word sequences, hence the accuracy is much higher than the single word lookup of a phonetic search. If the word can be found, the probability of the word spoken is very high. Meanwhile, while initial processing of audio takes a fair bit of time, searching is quick as just a simple test to text matching is needed. On the other hand, LVCSR is susceptible to common issues of speech recognition. The inherent random nature of audio and problems of external noise all affect the accuracies of text-based indexing. Another problem with LVCSR is its over reliance on its dictionary database. LVCSR only recognizes words that are found in their dictionary databases, and these dictionaries and databases are unable to keep up with the constant evolving of new terminology, names and words. Should the dictionary not contain a word, there is no way for the system to identify or predict it. This reduces the accuracy and reliability of the system. This is named the Out-of-vocabulary (OOV) problem. Audio mining systems try to cope with OOV by continuously updating the dictionary and language model used, but the problem still remains significant and has probed a search for alternatives. Additionally, due to the need to constantly update and maintain task-based knowledge and large training databases to cope with the OOV problem, high computational costs are incurred. This makes LVCSR an expensive approach to audio mining. ==== Phonetic-based Indexing ==== Phonetic-based indexing also breaks the audio file into recognizable phonemes, but instead of converting them to a text index, they are kept as they are and analyzed to create a phonetic-based index. The process of phonetic-based indexing can be split into two phases. The first phase is indexing. It begins by converting the input media into a standard audio representation format (PCM). Then, an acoustic model is applied to the speech. This acoustic model represents characteristics of both an acoustic channel (an environment in which the speech was uttered and a transducer through which it was recorded) and a natural language (in which human beings expressed the input speech). This produces a corresponding phonetic search track, or phonetic audio track (PAT), a highly compressed representation of the phonetic content of the input media. The second phase is searching. The user's search query term is parsed into a possible phoneme string using a phonetic dictionary. Then, multiple PAT files can be scanned at high speed during a single search for likely phonetic sequences that closely match corresponding strings of phonemes in the query term. ===== Advantages and disadvantages ===== Phonetic indexing is most attractive as it is largely unaffected by linguistic issues such as unrecognized words and spelling errors. Phonetic preprocessing maintains an open vocabulary that does not require updating. That makes it particularly useful for searching specialized terminology or words in foreign languages that do not commonly appear in dictionaries. It is also more effective for searching audio files with disruptive background noise and/or unclear utterances as it can compile results based on the sounds it can discern, and should the user wish to, they can search through the options until they find the desired item. Furthermore, in contrast to LVCSR, it can process audio files very quickly as there are very few unique phonemes between languages. However, phonemes cannot be effectively indexed like an entire word, thus searching on a phonetic-based system is slow. An issue with phonetic indexing is its low accuracy. Phoneme-based searches result in more false matches than text-based indexing. This is especially prevalent for short search terms, which have a stronger likelihood of sounding similar to other words or being part of bigger words. It could also return irrelevant results from other languages. Unless the system recognizes exactly the entire word, or understands phonetic sequences of languages, it is difficult for phonetic-based indexing to return accurate findings. === Speech processing and recognition system === Deemed as the most critical and complex component of audio mining, speech recognition requires the knowledge of human speech production system and its modeling. To correspond the Human speech production system, the electrical speech production system is developed to consist of: Speech generation Speech perception Voiced & unvoiced speech Model of human speech The electrical speech production system converts acoustic signal into corresponding representation of the spoken through the acoustic models in their software where all phonemes are represented. A statistical language model aids in the process by identifying how likely words are to follow each other in certain languages. Put together with a complex probability analysis, the speech recognition system is capable of taking an unknown speech signal and transcribing it into words based on the program's dictionary. ASR (automatic speech recognition) system includes: Acoustic analysis: input sound waveform is transformed into a feature Acoustic model: establishes relationship between speech signal and phonemes, pronunciation model and lang

    Read more →
  • Artificial Intelligence for Digital Response

    Artificial Intelligence for Digital Response

    Artificial Intelligence for Digital Response (AIDR) is a free and open source platform to filter and classify social media messages related to emergencies, disasters, and humanitarian crises. It has been developed by the Qatar Computing Research Institute and awarded the Grand Prize for the 2015 Open Source Software World Challenge. Muhammad Imran stated that he and his team "have developed novel computational techniques and technologies, which can help gain insightful and actionable information from online sources to enable rapid decision-making" - according to him the system "combines human intelligence with machine learning techniques, to solve many real-world challenges during mass emergencies and health issues". == How to use == It can be used by logging in with ones Twitter credentials and by collecting tweets by specifying keywords or hashtags, like #ChileEarthquake, and possibly a geographical region as well. == Use == It has been deployed in conjunction with UNICEF in Zambia to classify short messages related to AIDS/HIV received through the U-Report platform. AIDR was used for the first time during the 2010 Pakistan floods. The first real test of AIDR took place during the 2014 Iquique earthquake in Chile. == Related talks and events == Muhammad Imran delivered a keynote talk on the science behind the AIDR system at the International Conference on Information Systems for Crisis Response And Management (ISCRAM). Abdelkader Lattab and Ji Lucas also presented the system at the 2016 QCRI-IBM Data Science Connect event.

    Read more →
  • Full30

    Full30

    Full30 was an American online video-sharing platform primarily dedicated to firearms and shooting sports-related content. The service was established in 2014 by Tim Harmsen and Mark Hammonds as a result of YouTube's increasing restrictions on gun-related videos. == History == After the 2018 Parkland high school shooting, many companies attempted to distance themselves from any association with the firearms industry. As a result, YouTube began demonetizing and sometimes outright deleting firearms-related videos, and in one case, popular YouTube poster Hickok45's channel was completely deleted but later restored. In response, Harmsen, who operates the Military Arms Channel on YouTube, decided to create his own video-hosting website to allow himself and other firearms content creators a platform free from such restrictions; he named the website Full30 — a reference to the popular 30-round STANAG magazine. In July 2020, site representatives announced the site had new ownership. By the end of 2022, the site began to be redirected to a series of other websites. By 2025, it was largely deactivated with the front page replaced by a form to be filled out to receive "updates", with no other explanation. == Contributors == Hickok45 Military Arms Channel Forgotten Weapons Bavarian Shooter Liberty Doll CloverTac

    Read more →
  • Digital journalism

    Digital journalism

    Digital journalism, also known as netizen journalism or online journalism, is a contemporary form of journalism where editorial content is distributed via the Internet, as opposed to publishing via print or broadcast. What constitutes digital journalism is debated amongst scholars. However, the primary product of journalism, which is news and features on current affairs, is presented solely or in combination as text, audio, video, or some interactive forms like storytelling stories or newsgames and disseminated through digital media technology. Fewer barriers to entry, lowered distribution costs and diverse computer networking technologies have led to the widespread practice of digital journalism. It has democratized the flow of information that was previously controlled by traditional media including newspapers, magazines, radio and television. Most readers expect online journalists to be reliable and competent, but these journalists often fail to meet this standard because they have very short deadlines and do not have enough resources to produce decent work. Some have asserted that a greater degree of creativity can be exercised with digital journalism when compared to traditional journalism and traditional media. The digital aspect may be central to the journalistic message and remains, to some extent, within the creative control of the writer, editor and/or publisher. It has been acknowledged that reports of its growth have tended to be exaggerated. In fact, a 2019 Pew survey showed a 16% decline in the time spent on online news sites since 2016. In the United States, reports issued by the Federal Communications Commission (FCC) in 2011 and by the Government Accountability Office (GAO) and the Congressional Research Service (CRS) in 2023 found that increases in newsroom staffing at digital-native news websites from 2008 to 2020 were not offsetting cuts in newsroom staffing among newspapers (which numbered in the tens of thousands of jobs), and that newspapers and television (which had been seeing declining newsroom staffing alongside newspapers) still employed the majority of payrolled newsroom staff in the United States in 2022 while online-only news websites employed less than 10%. The GAO and CRS reports noted further that the reduction in subscription and advertising revenue for the U.S. newspaper industry from 2000 to 2020 that constituted the overwhelming majority of its inflation-adjusted total revenue was not being offset by digital circulation or online advertising despite almost two-thirds of U.S. advertising spending in total by 2020 being online. Also, while the FCC report noted that local television stations in the United States had become some of the largest providers of local news online, the FCC found in a 2021 working paper that inflation-adjusted advertising revenue for television stations fell nationally from 2010 to 2018. == Overview == Digital journalism flows as journalism flows and is difficult to pinpoint where it is and where it is going. In partnership with digital media, digital journalism uses facets of digital media to perform journalist tasks, for example, using the internet as a tool rather than a singular form of digital media. There is no absolute agreement as to what constitutes digital journalism. Mu Lin argues that, "Web and mobile platforms demand us to adopt a platform-free mindset for an all-inclusive production approach – create the [digital] contents first, then distribute via appropriate platforms." The repurposing of print content for an online audience is sufficient for some, while others require content created with the digital medium's unique features like hypertextuality. Fondevila Gascón adds multimedia and interactivity to complete the digital journalism essence. For Deuze, online journalism can be functionally differentiated from other kinds of journalism by its technological component which journalists have to consider when creating or displaying content. Digital journalistic work may range from purely editorial content like CNN (produced by professional journalists) online to public-connectivity websites like Slashdot (communication lacking formal barriers of entry). The difference of digital journalism from traditional journalism may be in its re-conceptualised role of the reporter in relation to audiences and news organizations. The expectations of society for instant information was important for the evolution of digital journalism. However, it is likely that the exact nature and roles of digital journalism will not be fully known for some time. Some researchers even argue that the free distribution of online content, online advertisement and the new way recipients use news could undermine the traditional business model of mass media distributors that is based on single-copy sales, subscriptions and the selling of advertisement space. == History == The first type of digital journalism, called teletext, was invented in the UK in 1970. Teletext is a system allowing viewers to choose which stories they wish to read and see it immediately. The information provided through teletext is brief and instant, similar to the information seen in digital journalism today. The information was broadcast between the frames of a television signal in what was called the vertical blanking interval or VBI. American journalist Hunter S. Thompson relied on early digital communication technology beginning by using a fax machine to report from the 1971 US presidential campaign trail as documented in his book Fear and Loathing on the Campaign Trail. After the invention of teletext was the invention of videotex, of which Prestel was the world's first system, launching commercially in 1979 with various British newspapers, such as the Financial Times lining up to deliver newspaper stories online through it. Videotex closed down in 1986 due to failing to meet end-user demand. American newspaper companies took notice of the new technology and created their own videotex systems, the largest and most ambitious being Viewtron, a service of Knight-Ridder launched in 1981. Others were Keycom in Chicago and Gateway in Los Angeles. All of them had closed by 1986. Next came computer Bulletin Board Systems. In the late 1980s and early 1990s, several smaller newspapers started online news services using BBS software and telephone modems. The first of these was the Albuquerque Tribune in 1989. Computer Gaming World in September 1992 broke the news of Electronic Arts' acquisition of Origin Systems on Prodigy, before its next issue went to press. Online news websites began to proliferate in the 1990s. An early adopter was The News & Observer in Raleigh, North Carolina which offered online news as Nando. Steve Yelvington wrote on the Poynter Institute website about Nando, owned by The N&O, by saying "Nando evolved into the first serious, professional news site on the World Wide Web". It originated in the early 1990s as "NandO Land". It is believed that a major increase in digital online journalism occurred around this time when the first commercial web browsers, Netscape Navigator (1994) and Internet Explorer (1995). By 1996, most news outlets had an online presence. Although journalistic content was repurposed from original text/video/audio sources without change in substance, it could be consumed in different ways because of its online form through toolbars, topically grouped content, and intertextual links. A twenty-four-hour news cycle and new ways of user-journalist interaction web boards were among the features unique to the digital format. Later, portals such as AOL and Yahoo! and their news aggregators (sites that collect and categorize links from news sources) led to news agencies such as The Associated Press to supplying digitally suited content for aggregation beyond the limit of what client news providers could use in the past. Also, Salon, was founded in 1995. In 2001, the American Journalism Review called Salon the Internet's "preeminent independent venue for journalism." In 2008, for the first time, more Americans reported getting their national and international news from the internet, rather than newspapers. Young people aged 18 to 29 now primarily get their news via the Internet, according to a Pew Research Center report. Audiences to news sites continued to grow due to the launch of new news sites, continued investment in news online by conventional news organizations, and the continued growth in internet audiences overall. Sixty-five percent of youth now primarily access the news online. Mainstream news sites are the most widespread form of online news media production. As of 2000, the vast majority of journalists in the Western world now use the internet regularly in their daily work. In addition to mainstream news sites, digital journalism is found in index and category sites (sites without much original content but multiple links to existing news sites), meta- and comment sites (sites about

    Read more →
  • Language-Theoretic Security

    Language-Theoretic Security

    Language-theoretic security, or LangSec, is an approach to software security that focuses on input handling, complexity, and program design as strategies to improve the verifiability of computer programs. It was introduced in 2005 by Robert J. Hansen and Meredith L. Patterson at BlackHat and in 2011 by Len Sassaman and Patterson. It aims to create a formal description of which software is likely to have security vulnerabilities of particular classes, and why. It considers programs to have an inherent parser component, whether or not explicit, composed of that part of the program which operates on external input before that input is fully parsed. A central hypothesis of language-theoretic security is that vulnerabilities in software increase according to the computational power of the notional input-accepting automaton equivalent to this parser, using the definitions of automata theory. The lower bound on this computational power is the input language complexity of the program. The extent to which reducing this complexity is possible is a function of the specification of the communication protocol or file format the program takes as input. == Parsing as a security mechanism == The behaviour of a program is defined with reference to its expected input. Unexpected input being used by a program is a factor in numerous security bugs, including the so-called Android master key vulnerability (CVE-2013-4787), because accepting unexpected input renders the program's specification ambiguous. In that instance, the unexpected ambiguity came in the form of a ZIP file with duplicate filenames. If a program fully parses its input and only acts on input that unambiguously meets the specification, it follows that the program will avoid these types of vulnerabilities. This is an intentional inversion of the Postel principle. Accepting only unambiguous and valid input is a more formal requirement than input validation or sanitization, and narrows the number of possible but unanticipated program states that can be induced in an application via user input. Conversely, failure to do this is associated with security vulnerabilities. Input sanitization in particular is held to be an inadequate approach to avoiding malicious input because it inherently ignores context-sensitive properties of the input; it can therefore result in paradoxical effects, such as sanitization code activating otherwise inert cross-site scripting payloads in browsers. === Parser differentials === If the language of accepted program input is sufficiently simple, it is possible to verify that two implementations parse the same input language consistently. This is advantageous because it shows no parser differential exists between the two implementations. The requisite level of simplicity is theoretically that for which there is a solution to the equivalence problem. If the two parsers involved in CVE-2013-4787 were equivalent - that is, if they rendered the same output state given the same input state - the vulnerability could not have existed. One strategy for doing this is to publish machine-readable specifications of a format or protocol, and then use a parser generator to generate the parser code. An example of a parser generator built for this purpose is DaeDaLus. The combination of Lex with any of GNU Bison, ANTLR, or Yacc also accomplishes this. However, many parser generators allow the mixing of general purpose code with the parsing definitions, which weakens the guarantees provided by parsing. === Analysis of injection attacks === Injection attacks are generally the result of differences between the serializer (or "unparser") and the corresponding parser at a layer boundary in a system; therefore, they are a special case of parser differentials. In a SQL injection attack, for example, an attacker is able to cause the application with which they are interacting to serialize a SQL query that has different semantics than intended. In the simplest case where the payload ends a string and adds new code, the payload has crossed the code-data boundary in SQL. In language-theoretic security, this is treated as a bug in the serializer of the SQL query, which should instead be written in a way that constrains its possible outputs to those within the scope of the intended query. === Parser combinators === If a parser generator is not used, it is still possible to avoid implementation bugs by using parser combinator such as Nom to implement the parser code. This has the drawback of relying on a programmer correctly translating the specification into the language of the parser generator library, though this task is still less error-prone than hand-coding a parser. == Input format complexity == Complexity in computer programs is associated with security vulnerabilities. Within the domain of language-theoretic security, complexity is described with reference to the computational power of the abstract machine necessary to implement the program, or more particularly, to implement the parser for its input language. This complexity describes whether it is possible to show that there is no unintended or undesired functionality in the program which might be exploitable by an attacker. To be bounded in complexity, the program's input must be well-defined both in terms of form and of semantics. === Weird machines === A weird machine is a model of computation in a program that exists in parallel with, but is distinct from, the intended abstract model of computation in that program. Some classes of weird machine arise from the multi-layered nature of computer programs, or the context in which the programs run; others result from the unanticipated functionality a program has due to its complexity or to software bugs. The more complex the computation model of a program, the more likely it is to implement a weird machine. Depending on context, the weird machine may or may not be concretely useful for an attacker. Since the space of weird machines in the context of some program is the universe of all possible states that are not within the program's intended states, many exploited states including remote code execution and injection attacks belong to the domain of weird machines. A reduction in weird machines is therefore a likely correlate with reduced program vulnerability. === SafeDocs project === SafeDocs is a DARPA project undertaken in 2018 to take existing file formats, create safer subsets of them, and develop programming tools to work for the safer formats. The initial test case for this was PDF. The purpose of creating safer subsets in this case is to lower the minimum bound on parser complexity so that it becomes possible to create tools that will generate correct, normative parsers for them. == Relation to programming languages == The analytic framework of language-theoretic security assumes programs to be virtual machines that execute their input. A document that is read by an application is in this sense a form of machine code, in a generalization of the data as code idea, following the automata theory description of parsers. === Type-safe programming languages === Parsing input and serializing output are operations that consume one data type and emit another. A programming language can therefore check that data is correctly parsed and contains the expected structure by checking data types, and correct serializing (or unparsing) can be implemented as operations on the data types that are relevant to the program's output. This approach can be used to show that the recognizer and unparser patterns have been implemented. It is also possible to implement type checking across a distributed system to enforce parsing and unparsing of the expected structures and to verify that the assumptions made in designing the compositional properties of a distributed system have been followed. === Memory-safe programming languages === In the general case, spatial memory correctness is undecidable. If any proof of spatial memory correctness is to be made, it is therefore necessary to bound the complexity of the code. Interpreted languages such as Java and Python effectively accomplish this via runtime bounds checking, and frameworks for runtime bounds checking also exist for C. The effect of these strategies for spatial memory correctness are to create a halt state in place of a spatial memory correctness violation; therefore, it can be shown that the program will not violate spatial memory correctness, but in exchange, it cannot be shown in the general case that programs will not have runtime bounds checking exceptions. Some programming languages, such as Rust, accomplish this using borrow checking. The borrow checker acts to assure spatial memory correctness by compile-time reference counting. Code for which spatial memory correctness cannot be shown to not be violated therefore does not compile, inherently limiting the complexity of the spatial memory correctness of the program to what is decidable. Thi

    Read more →
  • Photonically Optimized Embedded Microprocessors

    Photonically Optimized Embedded Microprocessors

    The Photonically Optimized Embedded Microprocessors (POEM) is DARPA program. It should demonstrate photonic technologies that can be integrated within embedded microprocessors and enable energy-efficient high-capacity communications between the microprocessor and DRAM. For realizing POEM technology CMOS and DRAM-compatible photonic links should operate at high bit-rates with very low power dissipation. == Current research == Currently research in this field is at University of Colorado, Berkley University, and Nanophotonic Systems Laboratory ( Ultra-Efficient CMOS-Compatible Grating Coupler Design).

    Read more →
  • Optical recording

    Optical recording

    The history of optical recording can be divided into a few number of distinct major contributions. The pioneers of optical recording worked mostly independently, and their solutions to the many technical challenges have very distinctive features, such as reflective disc (Compaan and Kramer) transparent disc (Gregg) floppy disc (Russell) rigid disc (Compaan and Kramer) focused laser beam for read-out through transparent substrate (Compaan and Kramer). == Gregg 1958 == Laserdisc technology, using a transparent disc, was invented by David Paul Gregg in 1958 (and patented in 1970 and 1990). By 1969 Philips had developed a videodisc in reflective mode, which has great advantages over the transparent mode. MCA and Philips decided to join their efforts. They first publicly demonstrated the videodisc in 1972. Laserdisc was first available on the market, in Atlanta, on December 15, 1978, two years after the VHS VCR and four years before the CD, which is based on Laserdisc technology. Philips produced the players and MCA produced the discs. The Philips/MCA cooperation was not successful, and discontinued after a few years. Several of the scientists responsible for the early research (John Winslow, Richard Wilkinson and Ray Dakin) founded Optical Disc Corporation (now ODC Nimbus). == Russell 1965 == While working at Pacific Northwest National Laboratory, James Russell invented an optical storage system for digital audio and video, patenting the concept in 1970. The earliest patents by Russell, US 3,501,586, and 3,795,902 were filed in 1966, and 1969. respectively. He built prototypes, and the first was operating in 1973. Russell had found a way to record digital information onto a photosensitive plate in tiny dark spots, each spot one micrometre from centre to centre, with a laser that wrote the binary patterns. Russell's first optical disc was distinctly different from the eventual compact disc product: the disc in the player was not read by laser light. A key characteristic of Russell's invention is that a laser is not used for the reading the disc, instead the entire disc or oblong sheet to be read is illuminated by a large playback light source at the back of the transparent foil. As a result, the information density is relatively low. By 1985, Russell held over 25 patents to various technologies related to optical recording and playback. Russell's intellectual property was purchased by Optical Recording Corporation (ORC) in Toronto in 1985, and this firm notified a number of CD manufacturers that their CD technology was based on patents held by ORC. In 1987, ORC signed an agreement with Sony whereby Sony paid for licensing of the technology. Further licenses followed from Philips and others. Warner Communications did not sign, and was sued by ORC. In 1992, the large CD manufacturer, now called Time Warner, was ordered to pay ORC US$30 million in patent violations. In the 1970 patent, the spot diameter was around 10 micrometres. Thus, the areal information density was around a factor hundred less than that of the CD as later developed. Russell continued to refine the concept throughout the 1970s. Philips and Sony, however, were able to put far greater resources into the parallel development of the concept, arriving at a smaller and more sophisticated product in just a few years. Russell's various partners and ventures failed to produce a single consumer product. == Korpel 1968 == Adrianus Korpel worked for the Zenith Electronics Corporation, when he developed very early optical videodisc systems, including holographic storage. == Kramer and Compaan 1969 == The Philips development of the videodisc technology began in 1969 with efforts by Dutch physicists Klaas Compaan and Piet Kramer to record video images in holographic form on disc. Their prototype Laserdisc shown in 1972 used a laser beam in reflective mode to read a track of pits using an FM video signal. Together with MCA, Philips brought the optical videodisk to market in 1978. The cooperation between Philips and MCA did not last long, and discontinued after a few years. == Immink and Doi 1979 == The Compact Disc (CD), which is based on MCA/Philips Laserdisc technology, was developed by a taskforce of Sony and Philips in 1979–1980. Toshi Doi and Kees Schouhamer Immink created the digital technologies that turned the analog Laserdisc into a high-density low-cost digital audio disc. The CD, available on the market since October 1982, remains the standard physical medium for sale of commercial audio recordings Standard CDs have a diameter of 120 mm and can hold up to 80 minutes of audio (700 MB of data). The Mini CD has various diameters ranging from 60 to 80 mm; they are sometimes used for CD singles or device drivers, storing up to 24 minutes of audio. The technology was later adapted and expanded to include data storage CD-ROM, write-once audio and data storage CD-R, rewritable media CD-RW, Super Audio CD (SACD), Video Compact Discs (VCD), Super Video Compact Discs (SVCD), PhotoCD, PictureCD, CD-i, and Enhanced CD. CD-ROMs and CD-Rs remain widely used technologies in the computer industry. The CD and its extensions have been extremely successful: in 2004, worldwide sales of CD audio, CD-ROM, and CD-R reached about 30 billion discs. By 2007, 200 billion CDs had been sold worldwide.

    Read more →
  • Ethiopian feminists facing digital gender-based violence

    Ethiopian feminists facing digital gender-based violence

    Against a background of traditional views of women, rising internet use, a young population and an unsafe offline life, women and girls in Ethiopia are facing increasing amounts of digital violence. Some women, feeling endangered, have left the country as a result. Researchers, activists and lawyers have called for online content to be taken down and specific digital legislation to be drafted and enforced. == Online violence and its offline effects == Sexual violence against women and girls in Ethiopia is common. In 2023, in the Women, Peace and Security Index by Georgetown University, Ethiopia came 146 out of 177 countries. Over several years online harassment of and violence against women and girls in Ethiopia has increased. It can range from sexist remarks about appearance and women’s role in society, to revenge porn, threats of beating, acid attacks, abduction, rape or death. The real-life effect on women and girls of these attacks can include mental health problems, damaged reputations and a withdrawal from public and economic life. When the online attacks migrate to the real world, for example when online attackers find out where the targeted women and girls live, this can result in physical attacks, street harassment, threats to children and can cause victims to move house or job or even flee the country in fear of femicide. In a country that criminalises homosexuality, it can also lead to physical attacks on LGBTQI+ people in particular and indeed on anybody labelled as homosexual. == Research studies == The Centre for Information Resilience (CIR) conducted interviews with Ethiopian women holding public roles or being active online. The centre published a report on this in 2024 entitled ‘Silenced, Shamed and Threatened’. They found that technology-facilitated gender-based violence (TFGBV) had become “normalised to the point of invisibility.” In 2024, CER also published an analysis of gendered hate speech on social media in Ethiopia called ‘Normalised and invisible.’ It is thought that traditional views of women, the young population, the rise in internet use and the war in Tigray, when sexual violence was used as a weapon of war by Ethiopian and Eritrean soldiers, have all helped to create an online environment in which even femicide is considered unremarkable. AFP Fact Check collaborated with Deutsche Welle Akademie, to investigate the cyber harassment of women in Ethiopia, analysing misogynistic posts published on TikTok and Facebook. They discovered disparaging remarks about women’s physical appearance, threats of acid attacks and other physical violence, and the public sharing of women’s phone numbers. == Individuals affected == Women in particular jeopardy of digital gender-based violence are feminists, activists, politicians and those with a public profile. Some women are known to have fled Ethiopia fearing for their lives after online and offline threats. Yordanos Bezabih, an Ethiopian women’s rights activist, started a campaign with the hashtag #JusticeforHeaven to fight against gender-based cyberspace violence. As a result, she herself become a target. She experienced years of online threats of acid attacks, gang-rape and death. In 2025, subscribers to an online community organised a search for her address. Deepfake nude images of her were shared, she was filmed in real life, her house and online accounts were broken into, her private photos and messages posted on social media. When the attackers finally circulated her address, suggesting that she be executed, she left Ethiopia on a human rights defender scholarship. In 2023, Lella Misikir helped to start a campaign, called ‘My Whistle, My Voice’, that suggested women carry whistles and use them if they were harassed in the street. A TikTok video of the campaign became popular. Shortly after, videos of Misikir were circulated suggesting that she was gay. Her online attackers next searched for her address. In November 2024, Misikir left the country. == Legal issues == Ethiopia has some laws on online harassment and defamation, for example the Computer Crimes Proclamation. However, technology-facilitated, gender-based violence (TFGBV), such as deepfakes, non-consensual image sharing, and coordinated harassment, is not explicitly recognized as crime. In practice too, women are often not believed when reporting such violence and are not taken seriously. Police advice is often that women affected should simply leave the online space. Social media platforms can remove content when it is brought to their attention but the offenders are not banned. Users can only block them.

    Read more →
  • TAChart

    TAChart

    TAChart is a component for the Lazarus IDE that provides charting services. Similar to Tchart and Teechart for Delphi it supports a collection of different chart types including bar charts, pie charts, line charts and point series. Apart from a screen canvas, output is possible in form of SVG, OpenGL, printer, WMF, and other formats. TAChart is bundled with the Lazarus Component Library. Although not intended to be a TChart clone, why its usage differs in certain points, its basic functionality is very similar and some source code written for TeeChart may be reused. == History == The first version of TAChart was developed by Philippe Martinole for the TeleAuto project, a program for automation of astronomic observations. Later functionality was introduced by Luis Rodrigues while porting the Epanet application from Delphi to Lazarus. In the ensuing years the code has extensively rewritten, expanded and is now maintained by Alexander Klenin. == Data sources == TAChart is able to use input from various sources. Examples include lists of real values, user defined buffers in the computer's memory, vectors of random values, fields in databases, calculated values provided by pre-defined functions and results of embedded code written in Pascal Script

    Read more →
  • Amplified conference

    Amplified conference

    An amplified conference is a conference or similar event in which the talks and discussions at the conference are 'amplified' through use of networked technologies in order to extend the reach of the conference deliberations. The term was originally coined by Lorcan Dempsey in a blog post. The term is now widely used within the academic and research community with Wankel proposing the following definition: The extension of a physical event (or a series of events) through the use of social media tools for expanding access to (aspects of) the event beyond physical and temporal bounds. Such amplification takes place in the context of intent to make the most of the intellectual content, discussion, networking, and discovery initiated by the event through the process of sharing with co-attendees, colleagues, friends and wider informed publics. A paper by Haider and others illustrates how amplified conferences are becoming mainstream in a discussion on "how social media have been employed as part of the project, particularly around event amplification". As described by Guy in the Ariadne ejournal the term is not a prescriptive one, but rather describes a pattern of behaviors which initially took place at IT and Web-oriented conferences once WiFi networks started to become available at conference venues and delegates started to bring with them networked devices such as laptops and, more recently, PDAs and mobile phones. == Different Approaches to 'Amplification' of Conferences == There are a number of ways in which conferences can be amplified through use of networked technologies: Amplification of the audiences' voice: Prior to the availability of real time chat technologies at events (whether use of IRC, Twitter, instant messaging clients, etc.) it was only feasible to discuss talks with immediate neighbours, and even then this may be considered rude. Amplification of the speaker's talk: The availability of video and audio-conferencing technologies make it possible for a speaker to be heard by an audience which isn't physically present at the conference. Although use of video technologies has been available to support conferences for some time, this has normally been expensive and require use of dedicated video-conferencing technologies. However the availability of lightweight desktop tools make it much easier to deploy such technologies, without even, requiring the involvement of conference organisers. Amplification across time: Video and audio technologies can also be used to allow a speaker's talk to be made available after the event, with use of podcasting or videocasting technologies allowing the talks to be easily syndicated to mobile devices as well as accessed on desktop computers. Amplification of the speaker's slides: The popularity of global repository services for slides, such as SlideShare, enable the slides used by a speaker to be more easily found, embedded on other Web sites and commented upon, in ways that were not possible when the slides, if made available at all, were only available on a conference Web site. Amplification of feedback to the speaker: Micro-blogging technologies, such as Twitter, are being used not only as a discussion channel for conference participants but also as a way of providing real-time feedback to a speaker during a talk. We are also now seeing dedicated microblogging technologies, such as Coveritlive and Scribblelive, being developed which aim to provide more sophisticated 'back channels' for use at conferences. Amplification of a conference's collective memory: The popularity of digital cameras and the photographic capabilities of many mobile phones is leading to many photographs being taken at conferences. With such photographs often being uploaded to popular photographic sharing services, such as Flickr, and such collections being made more easy to discover through agreed use of tags, we are seeing amplification of the memories of an event though the sharing of such resources. The ability of such photographic resources to be 'mashed up' with, say, accompanying music, can similarly help to enrich such collective experiences. Amplification of the learning: The ability to be able to follow links to resources and discuss the points made by a speaker during a talk can enrich the learning which takes place at an event, as described by Shabajee's article on "'Hot' or Not? Welcome to real-time peer review" published in the Times Higher Education Supplement in May 2003. Long term amplification of conference outputs: The availability in a digital format of conference resources, including 'official' resources such as slides, video and audio recordings, etc. which have been made by the conference organisers with the approval of speakers, together with more nebulous resources such as archives of conference back channels, and photographs and unofficial recordings taken at the event may help to provide a more authentic record of an event, which could potentially provide a valuable historical record. The amplification of conferences can be viewed as an example of how new technologies are altering standard practice. By using these techniques a different type of interaction is created at the conference itself, but also the boundaries around the conference can be seen as permeable, with remote participants engaging in discussion. An amplified conference also provides a considerably altered archive compared with a 'traditional' one. For the latter, the printed proceedings will be the main record, but for an amplified event this record is distributed across many media and takes in a wider range of content types, including the papers, videos of the presentations (for example on YouTube), the slides (e.g. on Slideshare), photos of the event (Flickr), interaction between participants (Twitter), reflections and comments (blogs), etc. The amplified conference represents an example of changing practice in digital scholarship.

    Read more →
  • Hardware backdoor

    Hardware backdoor

    A hardware backdoor is a backdoor implemented within the physical components of a computer system, also known as its hardware. They can be created by introducing malicious code to a component's firmware, or even during the manufacturing process of an integrated circuit. Often, they are used to undermine security in smartcards and cryptoprocessors, unless investment is made in anti-backdoor design methods. They have also been considered for car hacking. Backdoors differ from hardware Trojans as backdoors are introduced intentionally by the original designer or during the design process, whereas hardware Trojans are inserted later by an external party. == Background == The existence of hardware backdoors poses significant security risks for several reasons. They are difficult to detect and are impossible to remove using conventional methods like antivirus software. They can also bypass other security measures, such as disk encryption. Hardware trojans can be introduced during manufacturing where the end-user lacks control over the production chain. == History == In 2008, the FBI reported the discovery of approximately 3,500 counterfeit Cisco network components in the United States, some of which were introduced in military and government infrastructure. In the same year, the possibility of a backdoor SPARC CPU was demonstrated with an FPGA running Linux that supported various hidden malicious services. A few years later, in 2011, Jonathan Brossard presented "Rakshasa", a proof-of-concept hardware backdoor. This backdoor could be installed by an individual with physical access to the hardware. It utilized coreboot to re-flash the BIOS with a SeaBIOS and iPXE-based bootkit composed of legitimate, open-source tools, allowing malware to be fetched from the internet during the boot process. The following year, in 2012, Sergei Skorobogatov and Christopher Woods from the University of Cambridge Computer Laboratory reported the discovery of a backdoor in a military-grade FPGA device, which could be exploited to access and modify sensitive information. It has been said that this was proven to be a software problem and not a deliberate attempt at sabotage. This still brought to attention that equipment manufacturers should ensure that microchips operate as intended. Later that year, two mobile phones developed by the Chinese company ZTE were found to carry a root access backdoor. According to security researcher Dmitri Alperovitch, the exploit used a hard-coded password in its software. Starting in 2012, the United States stated that Huawei might have backdoors present in their products. In 2013, researchers at the University of Massachusetts devised a method of breaking a CPU's internal cryptographic mechanisms by introducing specific impurities into the crystalline structure of transistors to change Intel's random-number generator. Documents revealed from 2013 onwards during the surveillance disclosures initiated by Edward Snowden showed that the Tailored Access Operations (TAO) unit and other NSA employees intercepted servers, routers, and other network gear being shipped to organizations targeted for surveillance to install covert implant firmware onto them before delivery. These tools include custom BIOS exploits that survive the reinstallation of operating systems and USB cables with spy hardware and radio transceiver packed inside. In June 2016 it was reported that University of Michigan Department of Electrical Engineering and Computer Science had built a hardware backdoor that leveraged "analog circuits to create a hardware attack" so that after the capacitors store up enough electricity to be fully charged, it would be switched on, to give an attacker complete access to whatever system or device − such as a PC − that contains the backdoored chip. In the study that won the "best paper" award at the IEEE Symposium on Privacy and Security they also note that microscopic hardware backdoor wouldn't be caught by practically any modern method of hardware security analysis, and could be planted by a single employee of a chip factory. In October 2018 Bloomberg reported that an attack by Chinese spies reached almost 30 U.S. companies, including Amazon and Apple, by compromising America's technology supply chain. == Countermeasures == Skorobogatov has developed a technique capable of detecting malicious insertions into chips. New York University Tandon School of Engineering researchers have developed a way to corroborate a chip's operation using verifiable computing whereby "manufactured for sale" chips contain an embedded verification module that proves the chip's calculations are correct and an associated external module validates the embedded verification module. Another technique developed by researchers at University College London (UCL) relies on distributing trust between multiple identical chips from disjoint supply chains. Assuming that at least one of those chips remains honest the security of the device is preserved. Researchers at the University of Southern California Ming Hsieh Department of Electrical and Computer Engineering and the Photonic Science Division at the Paul Scherrer Institute have developed a new technique called Ptychographic X-ray laminography. This technique is the only current method that allows for verification of the chips blueprint and design without destroying or cutting the chip. It also does so in significantly less time than other current methods. Anthony F. J. Levi Professor of electrical and computer engineering at University of Southern California explains “It’s the only approach to non-destructive reverse engineering of electronic chips—[and] not just reverse engineering but assurance that chips are manufactured according to design. You can identify the foundry, aspects of the design, who did the design. It’s like a fingerprint.” This method currently is able to scan chips in 3D and zoom in on sections and can accommodate chips up to 12 millimeters by 12 millimeters easily accommodating an Apple A12 chip but not yet able to scan a full Nvidia Volta GPU. "Future versions of the laminography technique could reach a resolution of just 2 nanometers or reduce the time for a low-resolution inspection of that 300-by-300-micrometer segment to less than an hour, the researchers say."

    Read more →
  • DBOS

    DBOS

    DBOS (Formerly Database-Oriented Operating System, now just DBOS) is an open source durable workflow execution software library written for the Python, TypeScript, Java, and Go programming languages. DBOS arose from a joint open source project from MIT and Stanford, after a discussion between Michael Stonebraker and Matei Zaharia on how to scale and improve scheduling and performance of millions of Apache Spark tasks. Today it is a commercial company that offers an open source system to add durable computing to any software, built on concepts derived from the joint research project. == History == === 2020: Academic R&D Project === DBOS originated in 2020 as a joint open source project between MIT, Stanford, and Carnegie Mellon. The project explored the idea of operating system services built atop a distributed database - a database-oriented operating system meant to simplify and improve the scalability, security and resilience of large-scale distributed applications. The basic concept was to run a multi-node multi-core, transactional, highly-available distributed database, such as VoltDB, as the only application for a microkernel, and then to implement scheduling, messaging, file systems and other operating system services on top of the database. The architectural philosophy is described by this quote from the abstract of their initial preprint: All operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features. A prototype was built with competitive performance to existing systems. ==

    Read more →
  • Kuaishou

    Kuaishou

    Kuaishou Technology is a Chinese publicly traded partly state-owned holding company based in Haidian District, Beijing, that was founded in 2011 by Hua Su (Chinese: 宿华) and Cheng Yixiao (Chinese: 程一笑). The company, listed on the Hong Kong Stock Exchange, is known for developing a mobile app for sharing users' short videos, a social network, and video special effects editor. The app is known as Kwai in many countries outside of China. It is also known as Snack Video in India, Pakistan and Indonesia. == Ownership and governance == Kuaishou's overseas team is led by the former CEO of the application 99, and staff from Google, Facebook, Netflix, and TikTok were recruited to lead the company's international expansion. The China Internet Investment Fund, a state-owned enterprise controlled by the Cyberspace Administration of China, holds a golden share ownership stake in Kuaishou. == History == Kuaishou is China's first short video platform that was developed in 2011 by engineer Hua Su and Cheng Yixiao. Prior to co-founding Kuaishou, Su Hua had worked for both Google and Baidu as a software engineer. The company is headquartered in Haidian District, Beijing. Kuaishou's predecessor "GIF Kuaishou" was founded in March 2011. GIF Kuaishou was a mobile app with which users could make and share GIF pictures. In 2013, Kuaishou became a short-video social platform. By 2013, the app had reached 100 million daily users. By 2019, it had exceeded 200 million active daily users. In March 2017, Kuaishou closed a US$350 million investment round that was led by Tencent. In January 2018, Forbes estimated the company's valuation to be US$18 billion. In April 2018, Kuaishou's app was briefly banned from Chinese app stores after China Central Television (CCTV) reported on the platform popularizing videos of teenage mothers. In 2019, the company announced a partnership with the People's Daily, an official newspaper of the Central Committee of the Chinese Communist Party, to help it experiment with the use of artificial intelligence in news. In June 2020, following the start of the 2020–2021 China–India skirmishes, the Government of India banned Kwai along with 58 other apps, citing "data and privacy issues". In January 2021, Kuaishou announced it was planning an initial public offering (IPO) to raise approximately US$5 billion. Kuaishou's stock completed its first day of trading at $300 Hong Kong dollars (HKD) (US$38.70), more than doubling its initial offer price, and causing its market value to rise to over $1 trillion HKD (US$159 billion). In February 2021, Kuaishou made a debut on the Hong Kong Stock Exchange, with its shares soaring by 194% at the opening. The company subsequently encountered major setbacks as a result of heightened regulatory restrictions on Chinese internet firms, which contributed to its share price falling by nearly 80% from its post-IPO peak. By December 2021, Kuaishou announced a major reorganization, including the layoff of 30% of its staff, primarily targeting mid-level employees earning an annual salary of $157,000 or more. This restructuring aimed to cut costs and mitigate financial losses. In October 2022, state-owned Beijing Radio and Television Station took a minority ownership stake in Kuaishou. In April 2024, a Financial Times article citing current and former Kuaishou employees stated that the company has been running an ageist redundancy programme known internally as "Limestone", culling workers in their mid-30s. In June 2024, Kuaishou and the Sichuan international communication center launched a branch center in São Paulo, Brazil. In June 2024, Kuaishou released its diffusion transformer text-to-video model, Kling, which they claimed could generate two minutes of video at 30 frames per second and in 1080p resolution. The model has been compared to that of OpenAI's Sora text-to-video model. It is accessible to the public on Kuaishou's video editing app KwaiCut via signing up for a waitlist with a Chinese phone number. In December 2025, Kuaishou came under a cyberattack which led to a temporary influx of violent and pornographic content. == Popularity == As of 2019, it had a worldwide user base of over 200 million, leading the "Most Downloaded" lists of the Google Play and Apple App Store in eight countries, such as Brazil, where it was introduced in 2019. Its main short-video platform competitor was Douyin, which is known as TikTok outside China. Compared to Douyin, Kuaishou is more popular with older users living outside China's Tier 1 cities. Its initial popularity came from videos of Chinese rural life. The app is particularly well known for its "rustic" aesthetic and is popular among rural people. Kuaishou also relied more on e-commerce revenue than on advertising revenue compared to its main competitor. == Reception == Kwai (as the app is called outside of China) was banned in India in 2020 along with other short video apps like TikTok. Kuaishou then released the clone SnackVideo, which was subsequently also banned. The app is one of the most popular social media platforms in Brazil, where Kuaishou partnered with creators to make telenovela style content, and appeals to football fans by working with football teams CR Flamengo and Santos FC and sponsoring the tournament Copa América. Kwai was notable in Brazil for spreading information (and misinformation) about the COVID-19 vaccine and political misinformation. === Manjiao Wenhua === "Manjiao wenhua" (慢脚文化) is a sarcasm term on Chinese internet on the unethical or illegal contents on Kuaishou. State broadcaster China Central Television (CCTV) reported that many contents are about child pregnancy. "Dating, pregnancy, bearing a child...these are strictly prohibited in the real time by a minor, but these contents can easily shown to audiences here." In addition, many students from primary or secondary schools make a pose of smoking. Wang Zhenhui (王贞会) from CUPSL stated that these kinds of bad values will give negative effects to the minors.

    Read more →
  • Digital cassettes

    Digital cassettes

    Digital audio cassette formats introduced to the professional audio and consumer markets: Digital Audio Tape (or DAT) is the most well-known, and had some success as an audio storage format among professionals and "prosumers" before the prices of hard drive and solid-state flash memory-based digital recording devices dropped in the late 1990s. Hard-drive recording has mostly made DAT obsolete, as hard disk recorders offer more editing versatility than tape, and easier importation into digital audio workstations (DAWs) and non-linear video editing (NLE) systems. Digital Compact Cassette was intended as a digital replacement for the mass-market analog cassette tape, but received very little attention or adaptation. Its failure is generally attributed to higher production costs than audio CDs, durability and indifferent reception by consumers. Digital video cassettes include: Betacam IMX (Sony) D-VHS (JVC) D1 (Sony) D2 (Sony) D3 D5 HD Digital-S D9 (JVC) Digital Betacam (Sony) Digital8 (Sony) DV HDV ProHD (JVC) MiniDV MicroMV == Analog cassettes used as digital data storage == Historically, the compact audio cassette which was originally designed for analog storage of music was used as an alternative to disk drives in the late 1970s and early 1980s to provide data storage for home computers. There is a number of unique and incompatible cassette tape data storage formats that all use the same analog compact audio cassette tape media. The ADAT system uses Super VHS tapes to record 8 synchronized digital audiotracks at once. There have also been several audio recording systems that used VHS video recorders as storage devices and video tape transports, generally by encoding the digital data to be recorded into an analog composite video signal (which resembles static) and then recording this to magnetic tape. These systems were often used as "mixdown" recorders, to record the finished mix from a multi-track recorder in preparation for the manufacture of a vinyl record, cassette tape, or CD. An example was the Dbx Model 700. Another example is the Sony PCM adaptor series. Several companies sold VHS backup solutions in the 1980s and 1990s where data was converted to a video image which was then saved onto a VHS tape. the Corvus "Mirror" ( U.S. patent 4380047A ) the Metrum Model 64 on S-VHS tape, the Danmere Backer tape backup system, the Alpha Microsystems Videotrax the Legacy Storage Systems International VAST (Variable Array Storage) the ArVid the Video Backup System Amiga, The S2 VLBI system at three NASA Deep Space Network complexes and over 20 other radio telescopes stores digital data on SVHS tapes.

    Read more →
  • GlTF

    GlTF

    glTF (Graphics Library Transmission Format or GL Transmission Format and formerly known as WebGL Transmissions Format or WebGL TF) is a standard file format for three-dimensional scenes and models. A glTF file uses one of two possible file extensions: .gltf (JSON/ASCII) or .glb (binary). Both .gltf and .glb files may reference external binary and texture resources. Alternatively, both formats may be self-contained by directly embedding binary data buffers (as base64-encoded strings in .gltf files or as raw byte arrays in .glb files). An open standard developed and maintained by the Khronos Group, it supports 3D model geometry, appearance, scene graph hierarchy, and animation. It is intended to be a streamlined, interoperable format for the delivery of 3D assets, while minimizing file size and runtime processing by apps. As such, its creators have described it as the "JPEG of 3D". == Overview == The glTF format stores data primarily in JSON. The JSON may also contain blobs of binary data known as buffers, and refer to external files, for storing mesh data, images, etc. The binary .glb format also contains JSON text, but serialized with binary chunk headers to allow blobs to be directly appended to the file. The fundamental building blocks of a glTF scene are nodes. Nodes are organized into a hierarchy, such that a node may have other nodes defined as children. Nodes may have transforms relative to their parent. Nodes may refer to resources, such as meshes, skins, and cameras. Meshes may refer to materials, which refer to textures, which refer to images. Scenes are defined using an array of root nodes. Most of the top-level glTF properties use a flat hierarchy for storage. Nodes are saved in an array and are referred to by index, including by other nodes. A glTF scene refers to its root nodes by index. Furthermore, nodes refer to meshes by index, which refer to materials by index, which refer to textures by index, which refer to images by index. All glTF data structures support being extended using a JSON property, allowing arbitrary JSON data to be added. == Releases == === glTF 1.0 === Members of the COLLADA working group conceived the file format in 2012. At SIGGRAPH 2012, Khronos presented a demo of glTF, which was then called WebGL Transmissions Format (WebGL TF). On October 19, 2015, Khronos released the glTF 1.0 specification. ==== Adoption of glTF 1.0 ==== At SIGGRAPH 2016, Oculus announced their adoption of glTF citing the similarities to their ovrscene format. In October 2016, Microsoft joined the 3D Formats working group at Khronos to collaborate on glTF. === glTF 2.0 === The second version, glTF 2.0, was released in June 2017, and is a complete overhaul of the file format from version 1.0, with most tools adopting the 2.0 version. Based on a proposal by Fraunhofer originally presented at SIGGRAPH 2016, physically based rendering (PBR) was added, replacing WebGL shaders used in glTF 1.0. glTF 2.0 added the GLB binary format into the base specification. Other upgrades include sparse accessors and morph targets for techniques such as facial animation, and schema tweaks and breaking changes for corner cases or performance such as replacing top-level glTF object properties with arrays for faster index-based access. There is ongoing work towards import and export in Unity and an integrated multi-engine viewer and validator. ==== Adoption of glTF 2.0 ==== On March 3, 2017, Microsoft announced that they would be using glTF 2.0 as the 3D asset format across their product line, including Paint 3D, 3D Viewer, Remix 3D, Babylon.js, and Microsoft Office. Sketchfab also announced support for glTF 2.0. The glTF and GLB formats are used on and supported by companies including DGG, UX3D, Sketchfab, Facebook, Microsoft, Meta, Google, Adobe, Box, TurboSquid, Unreal Engine, Unity, and Qt Quick 3D. The format has been noted as an important standard for augmented reality, integrating with modeling software such as Autodesk Maya, Autodesk 3ds Max, and Poly. In February 2020, the Smithsonian Institution launched their Open Access Initiative, releasing approximately 2.8 million 2D images and 3D models into the public domain, using glTF for the 3D models. In July 2022, glTF 2.0 was released as the ISO/IEC 12113:2022 International Standard. Khronos stated they would make regular submissions to bring updates and new widely adopted glTF functionality into refreshed versions of ISO/IEC 12113 to ensure that there is no long-term divergence between the ISO/IEC and Khronos specifications. The open-source game engine Godot supports importing glTF 2.0 files since version 3.0 and export since version 4.0. === Extensions === The glTF format can be extended with arbitrary JSON to add new data and functionality. Extensions can be placed on any part of a glTF, including nodes, animations, materials, textures, and on the entire document. Khronos keeps a non-comprehensive registry of glTF extensions on GitHub, including all official Khronos extensions and a few third-party extensions. PBR extensions model the physical appearance of real-world objects, allowing developers to create realistic 3D assets that have the correct appearance. As new PBR extensions are released, they continue to expand PBR capabilities within the glTF framework, allowing a wider range of scenes and objects to be realistically rendered as 3D assets. The KTX 2.0 extension for universal texture compression enables 3D models in the glTF format to be highly compressed and to use natively supported texture formats, reducing file size and boosting rendering speed. Draco is a glTF extension for mesh compression, to compress and decompress 3D meshes, to help reduce the size of 3D files. It compresses vertex attributes, normals, colors, and texture coordinates. Various glTF extensions for game engine interoperability have been developed by OMI group. This includes extensions for physics shapes, physics bodies, physics joints, audio playback, seats, spawn points, and more. The VRM consortium has developed glTF extensions for advanced humanoid 3D avatars including dynamic spring bones and toon materials. == Derivative formats == 3D Tiles, an OGC Community Standard, builds on glTF to add a spatial data structure, metadata, and declarative styling for streaming massive heterogeneous 3D geospatial datasets. VRM, a model format for VR, is built on the .glb format. It is a 3D humanoid avatar specification and file format. == Software ecosystem == Khronos maintains the glTF Sample Viewer for viewing glTF assets. Khronos also maintains the glTF Validator for validating if 3D models conform to the glTF specification. Khronos maintains a glTF Compressor tool to interactively optimize and fine-tune compression settings for glTF assets using KTX 2.0 textures. glTF loaders are in open-source WebGL engines including PlayCanvas, Three.js, Babylon.js, Cesium, PEX, xeogl, and A-Frame. The Godot game engine supports and recommends the glTF format, with both import and export support. Open-source glTF converters are available from COLLADA, FBX, and OBJ. Assimp can import and export glTF. glTF files can also be directly exported from a variety of 3D editors, such as Blender, Unity (using the glTFast importer/exporter), Freecad, Vectary, Autodesk 3ds Max (natively or using Verge3D exporter), Autodesk Maya (using babylon.js exporter), Autodesk Inventor, Modo, Houdini, Paint 3D, Godot, and Substance Painter. Open-source glTF utility libraries are available for programming languages including JavaScript, Node.js, C++, C#, Python, Haskell, Java, Go, Rust, Haxe, Ada, and TypeScript. Khronos keeps a list of these libraries and other related applications on their ecosystem site. The Khronos 3D Commerce Working Group released Asset Creation Guidelines in 2020 outlining best practices for use of the glTF file format in 3D Commerce. In 2025, the Working Group launched Asset Creation Guidelines 2.0, a continuously updated resource with additional guidance for geometry, mesh optimization, UV maps, textures, materials/PBR performance, and web optimization. The Khronos PBR Neutral Tone Mappers specification is a tone mapper designed to faithfully reproduce an object's base color, hue, and saturation when using PBR rendering under grayscale lighting, supporting brand- and product-accurate color representation. Khronos maintains the glTF Asset Auditor to allow retailers and advertising technology platforms to validate 3D assets against either a default Audit Profile modelled on the 2020 3D Commerce Asset Creation Guidelines or a custom profile defined by the target application.

    Read more →