AI Coding Godot

AI Coding Godot — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digistar

    Digistar

    Digistar is the first computer graphics-based planetarium projection and content system. It was designed by Evans & Sutherland and released in 1983. The technology originally focused on accurate and high quality display of stars, including for the first time showing stars from points of view other than Earth's surface, travelling through the stars, and accurately showing celestial bodies from different times in the past and future. Beginning with the Digistar 3 the system now projects full-dome video. == Projector == Unlike modern full-dome systems, which use LCD, DLP, SXRD, or laser projection technology, the Digistar projection system was designed for projecting bright pinpoints of light representing stars. This was accomplished using a calligraphic display, a form of vector graphics, rather than raster graphics. The heart of the Digistar projector is a large cathode-ray tube (CRT). A phosphor plate is mounted atop the tube, and light is then dispersed by a large lens with a 160 degree field of view to cover the planetarium dome. The original lens bore the inscription: "August 1979 mfg. by Lincoln Optical Corp., L.A., CA for Evans and Sutherland Computer Corp., SLC, UT, Digital planetarium CRT projection lens, 43mm, f2.8, 160 degree field of view". The coordinates of the stars and wire-frame models to be displayed by the projector were stored in computer RAM in a display list. The display would read each set of coordinates in turn and drive the CRT's electron beam directly to those coordinates. If the electron beam was enabled while being moved a line would be painted on the phosphor plate. Otherwise, the electron beam would be enabled once at its destination and a star would be painted. Once all coordinates in the display list had been processed, the display would repeat from the top of the display list. Thus, the shorter the display list the more frequently the electron beam would refresh the charge on a given point on the phosphor plate, making the projection of the points brighter. In this way, the stars projected by Digistar were substantially brighter than could be achieved using a raster display, which has to touch every point on the phosphor plate before repeating. Likewise, the calligraphic technology allowed Digistar to have a darker black-level than full-dome projectors, since the portions of the phosphor plate representing dark sky were never hit by the electron beam. As it is only one tube, with no pixelated color filter screen, the Digistar projector is monochromatic. The Digistar projects a bright, phosphorescent green, though many (including both visitors and planetarians) report they cannot distinguish between this green and white. Additionally, unlike a raster display, the calligraphic display is not discretized into pixels, so the displayed stars were a more realistic single spot of light, without the blocky or ropy artifacts that are hard to avoid with raster graphics. Due to the use of vector graphics, as opposed to raster imaging, the Digistar does not have the resolution issues that many full-dome systems have. Thanks to this, and the brightness of the CRT, only one projector is needed to project on the entire dome, whereas most full-dome systems require up to six raster projectors, depending on dome size. The projector in the original Digistar was housed in a square pyramid-shaped sheathing. When powered on, the four sides at the tip of the pyramid would recede into the housing, exposing the lens and appearing as a cut-off pyramid. As Digistar II was being developed, many planetaria were sold Digistar LEA projectors. The LEA, called Digistar 1.5 by many users, was effectively a prototype of the D2 projector, compatible with Digistar and upgradable to Digistar II. There are no significant differences in performance between the LEA and the true D2. == History == Digistar was the brainchild of Stephen McAllister and Brent Watson, both of whom were long-time amateur astronomers and computer graphics engineers. In 1977, E&S had been consulting with Johnson Space Center regarding training simulators for astronauts. McAllister had been writing proof-of-concept software for this consultation and in summer 1977 entered the data for 400 bright stars and wrote the software to display them. Steve and Brent both originally saw the system's purpose as celestial navigation training. Brent, who had until recently worked at Hansen planetarium, asked his planetarium coworkers what they thought of a potential digital planetarium system, and then Steve and Brent both targeted the system toward planetaria. The primary goal of the planetarium system was to use computer graphics to overcome the limitation of traditional star ball technology that only allowed display of star fields from the point of view of Earth's surface. By using computer graphics the stars could be displayed from viewpoints in space, including simulating the appearance of space flight. Likewise, planets and moons within the Solar System could be displayed accurately for any time in history, from any point of view. The system used the location of real stars from the Yale Bright Star Catalogue, as well as random stars. A laboratory prototype of Digistar was used to generate the star fields and tactical displays in the 1982 science fiction film Star Trek II: The Wrath of Khan. Filming was done directly from the Digistar display in the lab. ILM projected the effort would take two weeks, but in fact it took from late November 1981 until mid-February 1982. The last shot recorded was what became the first entirely computer generated feature film sequence. It was the opening scene of the film, a rotating forward translation through a star field that lasted 3.5 minutes. It was recorded in one take, at a rate of one frame every 3.5 seconds, taking four hours for the shoot. The Digistar team members are credited in the film. After prototyping in labs at Evans and Sutherland the team repeatedly used Salt Lake City's Hansen planetarium to beta test the system at the planetarium at night. The Digistar team performed one week of shows at the planetarium as a fund raiser to benefit the planetarium. The company also later gave the planetarium an improved prototype Digistar to replace "Jake", the planetarium's aging Spitz planetarium projector. The first customer installation was to the newly constructed Universe Planetarium at the Science Museum of Virginia in 1983, the largest planetarium dome in the world at the time, for $595,000. By September 1986 there were four installed Digistars. Even at this point the long-term success of the product was very much in doubt, but as of 2019 Digistar has an installed base of over 550 planetaria. === Versions === Digistar (1983) Digistar II (1995) Digistar 3 (2002) Digistar 4 (2010?) Digistar 5 (2012) Digistar 6 (2016) Digistar 7 (2021) == Hardware == Digistar was driven by a VAX-11/780 minicomputer, with custom graphics hardware related to the E&S Picture System 2. Later versions of Digistar 1 used a DEC MicroVAX 2, driving a custom version of a PS/300. The original Digistar and Digistar 2 had a physical control panel that was used for running the star shows. This control panel was approximately 3' x 4' and contained a keyboard, a 6 DOF joystick, and a large array of back-lit buttons. One button that was used for moving the viewpoint forward in space was labeled "Boldly Go". Later iterations of Digistar replaced the physical control panel with a common graphical user interface. Digistar 3 was the first Digistar system to offer full-dome video in 2002, using six projectors. Digistar 4 was able to cover the dome using only two projectors. == System limitations == Though technologically advanced in its day, and the closest system to true full-dome video at the time of its release, the original Digistar and Digistar 2 are limited to only projecting dots and lines—meaning only wireframe models can be projected. To compensate for this, the projector is capable of defocusing specific models, blurring lines and dots together. An example of this is in the Digistar 2's built-in Milky Way model. The model is a circle of parallel lines that, when defocused, appear as the continuous band of the Milky Way across the sky. On more complex models, especially three-dimensional ones, brightness and details may be lost in this process, so it is not useful in all situations. The Digistar and Digistar 2 also suffer focus limitations. Because they use a single lens to cover the entire dome, it is difficult to gain perfect focus across the dome. Coupled with this, stars greater than a certain brightness are "multihit" points, meaning the projector draws two dots at the given position to accommodate the brightness of the star. Errors in the projector can lead the second dot to be slightly out-of-place with the first one. These two issues together, along with other issues that can occur within the projector's focus system, give the stars a blobby look. Some p

    Read more →
  • Voiceverse NFT plagiarism scandal

    Voiceverse NFT plagiarism scandal

    In January 2022, 15—the pseudonymous Massachusetts Institute of Technology (MIT) artificial intelligence researcher and creator of the non-commercial generative artificial intelligence voice synthesis research project 15.ai—discovered that the blockchain-based technology company Voiceverse had plagiarized from their platform. Voiceverse marketed itself as a service that offered AI voice cloning technology that could be purchased and traded as non-fungible tokens (NFTs). Amid heightened controversy over NFTs in the gaming industry, voice actor Troy Baker (who has been described as one of the most famous voice actors in video games) announced his partnership with Voiceverse on January 14, 2022, triggering immediate backlash over concerns about the environmental impact of NFTs, potential for fraud, predatory monetization in video games, and the potential of AI displacing jobs for human voice actors. Later that same day, 15 revealed through server logs that Voiceverse had generated voice lines using 15's free text-to-speech platform, pitch-shifted the audio to make them unrecognizable, and falsely marketed the samples as their own technology before selling them as NFTs. Within an hour of being confronted with evidence, Voiceverse confessed and stated that their marketing team had used 15.ai without proper attribution while rushing to create a technology demo to coincide with Baker's partnership announcement, further exacerbating the already negative reception to the original announcement. In response, 15 replied "Go fuck yourself"; the interaction went viral and garnered a large amount of support for the developer. News publications universally characterized this incident as Voiceverse having "stolen" from 15.ai. The next day, Baker appeared on a podcast and stated that his motivation had been to help independent creators who were unable to afford professional voice actors. Following continued backlash and the plagiarism revelation, Baker ended his partnership with Voiceverse on January 31, 2022. Subsequently, the incident was documented in multiple AI ethics databases, criticisms of predatory monetization in video games, and retrospectives as one of the earliest instances of plagiarism and theft stemming from artificial intelligence during the AI boom. == Background == === Troy Baker === Troy Baker is a prominent voice actor in the video game industry best known for his performances as Joel Miller in The Last of Us franchise. Baker has been described as "ubiquitous" by Polygon, "one of the most high-profile and prolific voice actors in video games" by Eurogamer, and "arguably the most famous voice actor in the gaming industry" by GameGuru. His other prominent roles include voicing Agent John "Jonesy" Jones in Fortnite, Booker DeWitt in BioShock Infinite, and both Batman and Joker in multiple Batman video games. As of October 2025, Baker holds the record for the most acting nominations at the BAFTA Games Awards, with five between 2013 and 2021. === Voiceverse === Voiceverse is a blockchain-based startup founded by the Bored Ape Yacht Club that marketed itself as offering AI voice cloning technology in the form of NFTs. Prior to the announcement of their partnership with Baker, Voiceverse had partnered with LOVO, Inc., an AI voice platform that, according to LOVO, could generate human-like voices. Voiceverse stated that any user who purchases a voice NFT would have unlimited and perpetual access to the voice model, which could be used to create content such as audiobooks, YouTube videos, podcasts, e-learning materials, in-game voice chat, and Zoom calls. Voiceverse promised that buyers would "OWN [sic] all of the IP" of content they created using these voices. Voiceverse's roadmap included plans to release 8,888 initial voice NFTs, a feature to add emotions to existing voices, and the ability for users to mint their own voices as NFTs. Prior to Baker's partnership, Voiceverse had also partnered with voice actors Charlet Chung, who voices D.Va in Overwatch, and Andy Milonakis of The Andy Milonakis Show. === 15.ai === 15.ai is a free web application launched in 2020 that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at MIT, it was an early example of an application of generative artificial intelligence during the initial stages of the AI boom. The platform showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech; the name "15.ai" references the creator's statement that a voice can be convincingly cloned with just 15 seconds of audio, as opposed to the tens of hours of data previously required. 15.ai became an Internet phenomenon in early 2021 when content utilizing it went viral on social media and quickly gained widespread use among various Internet fandoms. 15 has emphasized that it remain free and non-commercial; it only requires users to give proper credit when using the service for content creation. === NFTs in the video game industry === By early 2022, NFTs had become highly controversial within the gaming industry. Critics raised concerns about their environmental impact due to the significant energy consumption of blockchain technology. In addition, the prevalence of scams, fraud, and potential money laundering associated with NFT sales, as well as fears that NFTs were a new form of predatory monetization following the increasing frequency of loot boxes, caused vocal pushback from the gaming community. Several major gaming companies had begun exploring NFT integration into their products, though fan backlash had already forced some projects to be cancelled. On December 16, 2021, the developers of S.T.A.L.K.E.R. 2: Heart of Chernobyl announced that they would be including NFTs in the game, but cancelled within an hour of the announcement due to immediate universal backlash. Simultaneously, the rise of AI voice technology raised concerns among voice actors about potential job displacement and the devaluation of their work amidst the voice acting industry's ongoing struggles for better compensation and working conditions. == Partnership announcement and backlash == On January 14, 2022, 1:02 a.m. EST, Baker announced on Twitter that he was partnering with Voiceverse "to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create." The announcement concluded with the statement "You can hate. Or you can create." Baker's specific role with Voiceverse remained unclear at the time of the announcement. Along with Baker's announcement, Voiceverse promoted their supposed voice AI technology on Twitter by posting animated videos that featured a cat character created by NFT firm Chubbiverse. The videos concluded with text that read "The Voice Powered By Voiceverse"; Voiceverse stated on Twitter that the voices in the animations had been generated using their own AI voice synthesis technology and presented the videos as a technology demonstration of their voice NFT capabilities. The announcement provoked immediate and widespread backlash from the gaming community. Baker's tweet received thousands of replies and quote retweets (the vast majority of which were negative), far more than the number of likes; Michael McWhertor of Polygon described it as a "textbook example of being ratioed" and commented that reactions had been amplified by the final part of Baker's announcement. Michael Beckwith of Metro called Baker's approach "bizarrely aggressive". Later that day, Baker responded to the backlash by apologizing for his choice of words. He said he appreciated people's thoughts and acknowledged that the "hate/create part might have been a bit antagonistic," calling it a "bad attempt to bring levity". Despite the apology, Baker and his fellow voice actors did not distance themselves from Voiceverse at this point. At the same time, Voiceverse attempted to address the criticisms, stating that they were working to move to more environmentally friendly blockchain technology and that voice actors would receive royalties from NFT sales, with actors benefiting from any increase in NFT value. == Plagiarism revelation == On December 13, 2021, amidst the increasingly negative reactions toward NFTs among the general public, the creator of 15.ai (known pseudonymously as 15) announced that they had "no interest in incorporating NFTs into any aspect of [their] work." On January 14, 2022, 11:17 a.m. EST (10 hours after Baker's initial announcement), 15 commented on the Voiceverse venture, stating that it "sounds like a scam". Two hours later, at 1:20 p.m., 15 explicitly accused Voiceverse of "actively attempting to appropriate [15's] work for [Voiceverse's] own benefit." 15 provided evidence through

    Read more →
  • Anyword

    Anyword

    Anyword is a technology company that offers an artificial intelligence platform, using natural language processing to generate and optimize marketing text for websites, social media, email, and ads. The company also offers a complete managed service to publishers and brands to help them increase their revenue through social ads. It is used by National Geographic, Red Bull, The New York Times, BBC, Ted Baker, etc. The company has an office in New York, and Tel Aviv. == History == It was founded in 2013 — its original name was Keywee Inc. In March 2015, Anyword received $9.1 million in the Series A funding round led by a notable group of investors. In July 2016, the company was selected as an official Facebook Marketing Partner. In August 2019, Anyword was named Best Content Marketing Platform in the Digiday Technology Award winners. In November 2021, it raised $21 million in its Series B funding round.

    Read more →
  • Ontology merging

    Ontology merging

    Ontology merging defines the act of bringing together two conceptually divergent ontologies or the instance data associated to two ontologies. This is similar to work in database merging (schema matching). This merging process can be performed in a number of ways, manually, semi automatically, or automatically. Manual ontology merging although ideal is extremely labour-intensive and current research attempts to find semi or entirely automated techniques to merge ontologies. These techniques are statistically driven often taking into account similarity of concepts and raw similarity of instances through textual string metrics and semantic knowledge. These techniques are similar to those used in information integration employing string metrics from open source similarity libraries.

    Read more →
  • Production (computer science)

    Production (computer science)

    In computer science, a production or production rule is a rewrite rule that replaces some symbols with other symbols. A finite set of productions P {\displaystyle P} is the main component in the specification of a formal grammar (specifically a generative grammar). In such grammars, a set of productions is a special case of relation on the set of strings V ∗ {\displaystyle V^{}} (where ∗ {\displaystyle {}^{}} is the Kleene star operator) over a finite set of symbols V {\displaystyle V} called a vocabulary that defines which non-empty strings can be substituted with others. The set of productions is thus a special kind subset P ⊂ V ∗ × V ∗ {\displaystyle P\subset V^{}\times V^{}} and productions are then written in the form u → v {\displaystyle u\to v} to mean that ( u , v ) ∈ P {\displaystyle (u,v)\in P} (not to be confused with → {\displaystyle \to } being used as function notation, since there may be multiple rules for the same u {\displaystyle u} ). Given two subsets A , B ⊂ V ∗ {\displaystyle A,B\subset V^{}} , productions can be restricted to satisfy P ⊂ A × B {\displaystyle P\subset A\times B} , in which case productions are said "to be of the form A → B {\displaystyle A\to B} . Different choices and constructions of A , B {\displaystyle A,B} lead to different types of grammars. In general, any production of the form u → ϵ , {\displaystyle u\to \epsilon ,} where ϵ {\displaystyle \epsilon } is the empty string (sometimes also denoted λ {\displaystyle \lambda } ), is called an erasing rule, while productions that would produce strings out of nowhere, namely of the form ϵ → v , {\displaystyle \epsilon \to v,} are never allowed. In order to allow the production rules to create meaningful sentences, the vocabulary is partitioned into (disjoint) sets Σ {\displaystyle \Sigma } and N {\displaystyle N} providing two different roles: Σ {\displaystyle \Sigma } denotes the terminal symbols known as an alphabet containing the symbols allowed in a sentence; N {\displaystyle N} denotes nonterminal symbols, containing a distinguished start symbol S ∈ N {\displaystyle S\in N} , that are needed together with the production rules to define how to build the sentences. In the most general case of an unrestricted grammar, a production u → v {\displaystyle u\to v} , is allowed to map arbitrary strings u {\displaystyle u} and v {\displaystyle v} in V {\displaystyle V} (terminals and nonterminals), as long as u {\displaystyle u} is not empty. So unrestricted grammars have productions of the form V ∗ ∖ { ϵ } → V ∗ {\displaystyle V^{}\setminus \{\epsilon \}\to V^{}} or if we want to disallow changing finished sentences V ∗ N V ∗ = ( V ∗ ∖ Σ ∗ ) → V ∗ {\displaystyle V^{}NV^{}=(V^{}\setminus \Sigma ^{})\to V^{}} , where V ∗ N V ∗ {\displaystyle V^{}NV^{}} indicates concatenation and forces a non-terminal symbol to always be present on the left-hand side of the productions, and ∖ {\displaystyle \setminus } denotes set minus or set difference. If we do not allow the start symbol to occur in v {\displaystyle v} (the word on the right side), we have to replace V ∗ {\displaystyle V^{}} with ( V ∖ { S } ) ∗ {\displaystyle (V\setminus \{S\})^{}} on the right-hand side. The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol. So productions are of the form: N → V ∗ {\displaystyle N\to V^{}} == Grammar generation == To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous. For example, assume the alphabet consists of a {\displaystyle a} and b {\displaystyle b} , with the start symbol S {\displaystyle S} , and we have the following rules: 1. S → a S b {\displaystyle S\rightarrow aSb} 2. S → b a {\displaystyle S\rightarrow ba} then we start with S {\displaystyle S} , and can choose a rule to apply to it. If we choose rule 1, we replace S {\displaystyle S} with a S b {\displaystyle aSb} and obtain the string a S b {\displaystyle aSb} . If we choose rule 1 again, we replace S {\displaystyle S} with a S b {\displaystyle aSb} and obtain the string a a S b b {\displaystyle aaSbb} . This process is repeated until we only have symbols from the alphabet (i.e., a {\displaystyle a} and b {\displaystyle b} ). If we now choose rule 2, we replace S {\displaystyle S} with b a {\displaystyle ba} and obtain the string a a b a b b {\displaystyle aababb} , and are done. We can write this series of choices more briefly, using symbols: S ⇒ a S b ⇒ a a S b b ⇒ a a b a b b {\displaystyle S\Rightarrow aSb\Rightarrow aaSbb\Rightarrow aababb} . The language of the grammar is the set of all the strings that can be generated using this process: { b a , a b a b , a a b a b b , a a a b a b b b , … } {\displaystyle \{ba,abab,aababb,aaababbb,\dotsc \}} .

    Read more →
  • Recommender system

    Recommender system

    A recommender system, also called a recommendation algorithm, recommendation engine, or recommendation platform, is a type of information filtering system that suggests items most relevant to a particular user. The value of these systems becomes particularly evident in scenarios where users must select from a large number of options, such as products, media, or content. Major social media platforms and streaming services rely on recommender systems that employ machine learning to analyze user behavior and preferences, thereby enabling personalized content feeds. Typically, the suggestions refer to a variety decision-making processes, including the selection of a product, musical selection, or online news source to read. The implementation of recommender systems is pervasive, with commonly recognised examples including the generation of playlist for video and music services, the provision of product recommendations for e-commerce platforms, and the recommendation of content on social media platforms and the open web. These systems can operate using a single type of input, such as music, or multiple inputs from diverse platforms, including news, books and search queries. Additionally, popular recommender systems have been developed for specific topics, such as restaurants and online dating services. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services. A content discovery platform is a software recommendation platform that employs recommender system tools. It utilizes user metadata in order to identify and suggest relevant content, whilst reducing ongoing maintenance and development costs. A content discovery platform delivers personalized content to websites, mobile devices, and set-top boxes. A large range of content discovery platforms currently exist for various forms of content ranging from news articles and academic journal articles to television. As operators compete to serve as the gateway to home entertainment, personalized television emerges as a key service differentiator. Academic content discovery has recently become another area of interest, the emergence of numerous companies dedicated to assisting academic researchers in keeping up to date with relevant academic content and facilitating serendipitous discovery of new content. == Overview == Recommender systems usually make use of either or both collaborative filtering and content-based filtering, as well as other systems such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (e.g., items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties. === Example === The differences between collaborative and content-based filtering can be demonstrated by comparing two early music recommender systems, Last.fm and Pandora Radio. We can also look at how these methods are applied in e-commerce, for example, on platforms like Amazon. Last.fm creates a "station" of recommended songs by observing what bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users. Last.fm will play tracks that do not appear in the user's library, but are often played by other users with similar interests. As this approach leverages the behavior of users, it is an example of a collaborative filtering technique. Pandora uses the properties of a song or artist (a subset of the 450 attributes provided by the Music Genome Project) to seed a "station" that plays music with similar properties. User feedback is used to refine the station's results, deemphasizing certain attributes when a user "dislikes" a particular song and emphasizing other attributes when a user "likes" a song. This is an example of a content-based approach. In e-commerce, Amazon's well-known "customers who bought X also bought Y" feature is a prime example of collaborative filtering. It also uses content-based filtering when it recommends a book by the same author you've previously read or a pair of shoes in a similar style to ones you've viewed. Each type of system has its strengths and weaknesses. In the above example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is an example of the cold start problem, and is common in collaborative filtering systems. Whereas Pandora needs very little information to start, it is far more limited in scope (for example, it can only make recommendations that are similar to the original seed). === Alternative implementations === Recommender systems are a useful alternative to search algorithms since they help users discover items they might not have found otherwise. Of note, recommender systems are often implemented using search engines indexing non-traditional data. In some cases, like in the Gonzalez v. Google Supreme Court case, may argue that search and recommendation algorithms are different technologies. Recommender systems have been the focus of several granted patents, and there are more than 50 software libraries that support the development of recommender systems including LensKit, RecBole, ReChorus and RecPack. == History == Elaine Rich created the first recommender system in 1979, called Grundy. She looked for a way to recommend users books they might like. Her idea was to create a system that asks users specific questions and classifies them into classes of preferences, or "stereotypes", depending on their answers. Depending on users' stereotype membership, they would then get recommendations for books they might like. Another early recommender system, called a "digital bookshelf", was described in a 1990 technical report by Jussi Karlgren at Columbia University, and implemented at scale and worked through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at SICS, and research groups led by Pattie Maes at MIT, Will Hill at Bellcore, and Paul Resnick, also at MIT, whose work with GroupLens was awarded the 2010 ACM Software Systems Award. Montaner provided the first overview of recommender systems from an intelligent agent perspective. Adomavicius provided a new, alternate overview of recommender systems. Herlocker provides an additional overview of evaluation techniques for recommender systems, and Beel et al. discussed the problems of offline evaluations. Beel et al. have also provided literature surveys on available research paper recommender systems and existing challenges. == Approaches == === Collaborative filtering === One approach to the design of recommender systems that has wide use is collaborative filtering. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users/items with a rating history similar to the current user or item, they generate recommendations using this neighborhood. This approach is a cornerstone for e-commerce sites that analyze the purchasing patterns of thousands of users to suggest what you might like. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of memory-based approaches is the user-based algorithm, while that of model-based approaches is matrix factorization (recommender systems). A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used in measuring user similarity or item similarity in recommender systems. For example, the k-nearest neighbor (k-NN) approach and the Pearson Correlation as first implemented by Allen. When building a model from a user's behavior, a distinction is often made between explicit and implicit forms of data collection. Examples of explicit data collection include the following: Asking a user to rate an item on a sliding scale. Asking a user to search. Asking a user to rank a collection of items from favorite to least favorite. Presenting two items to a user and asking him/her to choose the better one of them. Asking a user to create a list of items that he/she likes (see Rocchio classification or other similar techniques). Examples of implicit data collection include the following: Observing the items that a user views in an online store, media library, or other repository of med

    Read more →
  • Quality of Data

    Quality of Data

    Quality of Data (QoD) is a designation coined by L. Veiga, that specifies and describes the required Quality of Service of a distributed storage system from the Consistency point of view of its data. It can be used to support big data management frameworks, Workflow management, and HPC systems (mainly for data replication and consistency). It takes into account data semantics, namely the Time interval of data freshness, the Sequence of tolerable number of outstanding versions of the data read before ore refresh, and the Value divergence allowed before displaying it. Initially it was based on a model from an existing research work regarding vector-field Consistency, awarded the best-paper prize in the ACM/IFIP/Usenix Middleware Conference 2007 and later enhanced for increased scalability and fault-tolerance. This consistency model has been successfully applied and proven in big data key/value store Apache HBase, initially designed as a middleware module seating between clusters from separate data centres. The HBase-QoD coupling minimises bandwidth usage and optimises resources allocation during replication achieving the desired consistency level at a more fine-grained level. QoD is defined by the three-dimensions of vector k=(θ,σ,ν), but with a broader view of the issue, applicable also to large-scale data management techniques in regards to their timely delivery. == Other descriptions == Quality of Data should not be confused with other definitions for data quality such as completeness, validity, and accuracy.

    Read more →
  • Australian Geoscience Data Cube

    Australian Geoscience Data Cube

    The Australian Geoscience Data Cube (AGDC) is an approach to storing, processing and analyzing large collections of Earth observation data. The technology is designed to meet challenges of national interest by being agile and flexible with vast amounts of layered grid data. The AGDC reduces processing time of traditional image analysis by calibrating, pre-computing known extents, pixel alignment and storing metadata in a cell lattice structure. The temporal-pixel aligned data can often be analysed faster across space and time dimensions than previous scene based techniques. This allows the AGDC to be flexible in tackling future challenges and improve analysis times on every-increasing data repositories of earth observation. The AGDC has also been used internationally to allow countries to maintain ecologically sustainable programs and reduce the difficulty curve of utilizing Remote Sensing data. == Background == The AGDC was originally conceived by Geoscience Australia but is now maintained in a partnership between Geoscience Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO) and National Computational Infrastructure National Facility (Australia) (NCI). This is made possible by the funding from the partnership and a number of organisations such as National Collaborative Research Infrastructure Strategy (NCRIS). == Analysis ready data, ingestion and indexing == The data processed in the cube is made analysis ready before being ingested and indexed into the AGDC. Analysis ready data is pre-processed data that has applied corrections for instrument calibration (gains and offsets), geolocation (spatial alignment) and radiometry (solar illumination, incidence angle, topography, atmospheric interference). The ingestion process manages the translation of datasets into the storage units while maintaining a database index. The data within the storage and index can be accessed via API calls often compiled within code such as Python (programming language). Example: s2a_l1c = dc.load(product='s2a_level1c_granule',x=(147.36, 147.41), y=(-35.1, -35.15), measurements=['04','03','02'], output_crs='EPSG:4326', resolution=(-0.00025,0.00025)) === Datasets currently stored === Geoscience Australia Landsat Surface Reflectance (1987 to present) Landsat Pixel Quality Landsat Fractional Cover Landsat NDVI === Datasets that have been piloted === USGS Landsat Surface Reflectance SRTM DEM Himawari 8 MODIS Sentinel-2 L1C / S2A Australian Gridded Climate Data == Open source == The AGDC code base is situated in GitHub as an open repository. The core code base moved to the Open Data Cube in early 2017 as part of an international collaboration. Whilst the code base is the Open Data Cube, individual cubes exist as their own right such as the AGDC on the National Computational Infrastructure National Facility (Australia) (NCI) using the High-Performance Computing Cluster HPCC. The core code can be installed on personal computers or public computers (using git) and has many unit tests. Documentation for the code base exists on Read the Docs. == Challenges of the AGDC == The AGDC is designed to meet nationally significant challenges such as the following. Sustainability Environment Water resource management Disaster assist Policy development Community planning Forest preservation Carbon measurement == International awards == The AGDC won the 2016 Content Platform of the Year award from Geospatial World Forum.

    Read more →
  • CMU Pronouncing Dictionary

    CMU Pronouncing Dictionary

    The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research. CMUdict provides a mapping orthographic/phonetic for English words in their North American pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g. the CMU Sphinx system, and speech synthesis (TTS), e.g. the Festival system. CMUdict can be used as a training corpus for building statistical grapheme-to-phoneme (g2p) models that will generate pronunciations for words not yet included in the dictionary. The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available. == Database format == The database is distributed as a plain text file with one entry to a line in the format "WORD " with a two-space separator between the parts. If multiple pronunciations are available for a word, variants are identified using numbered versions (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with the addition of stress marks on vowels of levels 0, 1, and 2. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines is also available as part of the distribution; this format collapses stress distinctions (typically not used in ASR). The following is a table of phonemes used by CMU Pronouncing Dictionary. == History == == Applications == The Unifon converter is based on the CMU Pronouncing Dictionary. The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary. The Carnegie Mellon Logios tool incorporates the CMU Pronouncing Dictionary. PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing Dictionary as its data source. Pronunciation is transcribed in IPA symbols. This dictionary also supports searching by pronunciation. Some singing voice synthesizer software like CeVIO Creative Studio and Synthesizer V uses modified version of CMU Pronouncing Dictionary for synthesizing English singing voices. Transcriber, a tool for the full text phonetic transcription, uses the CMU Pronouncing Dictionary 15.ai, a real-time text-to-speech tool using artificial intelligence, uses the CMU Pronouncing Dictionary

    Read more →
  • Mathematical knowledge management

    Mathematical knowledge management

    Mathematical knowledge management (MKM) is the study of how society can effectively make use of the vast and growing literature on mathematics. It studies approaches such as databases of mathematical knowledge, automated processing of formulae and the use of semantic information, and artificial intelligence. Mathematics is particularly suited to a systematic study of automated knowledge processing due to the high degree of interconnectedness between different areas of mathematics.

    Read more →
  • Xulvi-Brunet–Sokolov algorithm

    Xulvi-Brunet–Sokolov algorithm

    Xulvi-Brunet and Sokolov's algorithm generates networks with chosen degree correlations. This method is based on link rewiring, in which the desired degree is governed by parameter ρ. By varying this single parameter it is possible to generate networks from random (when ρ = 0) to perfectly assortative or disassortative (when ρ = 1). This algorithm allows to keep network's degree distribution unchanged when changing the value of ρ. == Assortative model == In assortative networks, well-connected nodes are likely to be connected to other highly connected nodes. Social networks are examples of assortative networks. This means that an assortative network has the property that almost all nodes with the same degree are linked only between themselves. The Xulvi-Brunet–Sokolov algorithm for this type of networks is the following. In a given network, two links connecting four different nodes are chosen randomly. These nodes are ordered by their degrees. Then, with probability ρ, the links are randomly rewired in such a way that one link connects the two nodes with the smaller degrees and the other connects the two nodes with the larger degrees. If one or both of these links already existed in the network, the step is discarded and is repeated again. Thus, there will be no self-connected nodes or multiple links connecting the same two nodes. Different degrees of assortativity of a network can be achieved by changing the parameter ρ. Assortative networks are characterized by highly connected groups of nodes with similar degree. As assortativity grows, the average path length and clustering coefficient increase. == Disassortative model == In disassortative networks, highly connected nodes tend to connect to less-well-connected nodes with larger probability than in uncorrelated networks. Examples of such networks include biological networks. The Xulvi-Brunet and Sokolov's algorithm for this type of networks is similar to the one for assortative networks with one minor change. As before, two links of four nodes are randomly chosen and the nodes are ordered with respect to their degrees. However, in this case, the links are rewired (with probability p) such that one link connects the highest connected node with the node with the lowest degree and the other link connects the two remaining nodes randomly with probability 1 − ρ. Similarly, if the new links already existed, the previous step is repeated. This algorithm does not change the degree of nodes and thus the degree distribution of the network.

    Read more →
  • Algorithmic Puzzles

    Algorithmic Puzzles

    Algorithmic Puzzles is a book of puzzles based on computational thinking. It was written by computer scientists Anany and Maria Levitin, and published in 2011 by Oxford University Press. == Topics == The book begins with a "tutorial" introducing classical algorithm design techniques including backtracking, divide-and-conquer algorithms, and dynamic programming, methods for the analysis of algorithms, and their application in example puzzles. The puzzles themselves are grouped into three sets of 50 puzzles, in increasing order of difficulty. A final two chapters provide brief hints and more detailed solutions to the puzzles, with the solutions forming the majority of pages of the book. Some of the puzzles are well known classics, some are variations of known puzzles making them more algorithmic, and some are new. They include: Puzzles involving chessboards, including the eight queens puzzle, knight's tours, and the mutilated chessboard problem Balance puzzles River crossing puzzles The Tower of Hanoi Finding the missing element in a data stream The geometric median problem for Manhattan distance == Audience and reception == The puzzles in the book cover a wide range of difficulty, and in general do not require more than a high school level of mathematical background. William Gasarch notes that grouping the puzzles only by their difficulty and not by their themes is actually an advantage, as it provides readers with fewer clues about their solutions. Reviewer Narayanan Narayanan recommends the book to any puzzle aficionado, or to anyone who wants to develop their powers of algorithmic thinking. Reviewer Martin Griffiths suggests another group of readers, schoolteachers and university instructors in search of examples to illustrate the power of algorithmic thinking. Gasarch recommends the book to any computer scientist, evaluating it as "a delight".

    Read more →
  • Tesla Dojo

    Tesla Dojo

    Tesla Dojo is a series of supercomputers designed and built by Tesla for computer vision video processing and recognition. It was used for training Tesla's machine learning models to improve its Full Self-Driving (FSD) advanced driver-assistance system. It went into production in July 2023. Dojo's goal was to efficiently process millions of terabytes of video data captured from real-life driving situations from Tesla's 4+ million cars. This goal led to a considerably different architecture than conventional supercomputer designs. In August 2025, Bloomberg News reported that the Dojo project had been disbanded, though it was restarted in January 2026. == History == Tesla operates several massively parallel computing clusters for developing its Autopilot advanced driver assistance system. Its primary unnamed cluster using 5,760 Nvidia A100 graphics processing units (GPUs) was touted by Andrej Karpathy in 2021 at the fourth International Joint Conference on Computer Vision and Pattern Recognition (CCVPR 2021) to be "roughly the number five supercomputer in the world" at approximately 81.6 petaflops, based on scaling the performance of the Nvidia Selene supercomputer, which uses similar components. However, the performance of the primary Tesla GPU cluster has been disputed, as it was not clear if this was measured using single-precision or double-precision floating point numbers (FP32 or FP64). Tesla also operates a second 4,032 GPU cluster for training and a third 1,752 GPU cluster for automatic labeling of objects. The primary unnamed Tesla GPU cluster has been used for processing one million video clips, each ten seconds long, taken from Tesla Autopilot cameras operating in Tesla cars in the real world, running at 36 frames per second. Collectively, these video clips contained six billion object labels, with depth and velocity data; the total size of the data set was 1.5 petabytes. This data set was used for training a neural network intended to help Autopilot computers in Tesla cars understand roads. By August 2022, Tesla had upgraded the primary GPU cluster to 7,360 GPUs. Dojo was first mentioned by Elon Musk in April 2019 during Tesla's "Autonomy Investor Day". In August 2020, Musk stated it was "about a year away" due to power and thermal issues. Dojo was officially announced at Tesla's Artificial Intelligence (AI) Day on August 19, 2021. Tesla revealed details of the D1 chip and its plans for "Project Dojo", a datacenter that would house 3,000 D1 chips; the first "Training Tile" had been completed and delivered the week before. In October 2021, Tesla released a "Dojo Technology" whitepaper describing the Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) floating point formats and arithmetic operations as an extension of Institute of Electrical and Electronics Engineers (IEEE) standard 754. At the follow-up AI Day in September 2022, Tesla announced it had built several System Trays and one Cabinet. During a test, the company stated that Project Dojo drew 2.3 megawatts (MW) of power before tripping a local San Jose, California power substation. At the time, Tesla was assembling one Training Tile per day. In August 2023, Tesla powered on Dojo for production use as well as a new training cluster configured with 10,000 Nvidia H100 GPUs. In January 2024, Musk described Dojo as "a long shot worth taking because the payoff is potentially very high. But it's not something that is a high probability." In June 2024, Musk explained that ongoing construction work at Gigafactory Texas is for a computing cluster claiming that it is planned to comprise an even mix of "Tesla AI" and Nvidia/other hardware with a total thermal design power of at first 130 MW and eventually exceeding 500 MW. In August 2025, Bloomberg News reported that the Dojo project was disbanded, though Musk announced it would be restarted in January 2026 with a new chip iteration. == Technical architecture == The fundamental unit of the Dojo supercomputer is the D1 chip, designed by a team at Tesla led by ex-AMD CPU designer Ganesh Venkataramanan, including Emil Talpes, Debjit Das Sarma, Douglas Williams, Bill Chang, and Rajiv Kurian. The D1 chip is manufactured by the Taiwan Semiconductor Manufacturing Company (TSMC) using 7 nanometer (nm) semiconductor nodes, has 50 billion transistors and a large die size of 645 mm2 (1.0 square inch). Updating at Artificial Intelligence (AI) Day in 2022, Tesla announced that Dojo would scale by deploying multiple ExaPODs, in which there would be: 10 Cabinets per ExaPOD (1,062,000 cores, 3,000 D1 chips) 2 System Trays per Cabinet (106,200 cores, 300 D1 chips) 6 Training Tiles per System Tray (53,100 cores, along with host interface hardware) 25 D1 chips per Training Tile (8,850 cores) 354 computing cores per D1 chip According to Venkataramanan, Tesla's senior director of Autopilot hardware, Dojo will have more than an exaflop (a million teraflops) of computing power. For comparison, according to Nvidia, in August 2021, the (pre-Dojo) Tesla AI-training center used 720 nodes, each with eight Nvidia A100 Tensor Core GPUs for 5,760 GPUs in total, providing up to 1.8 exaflops of performance. === D1 chip === Each node (computing core) of the D1 processing chip is a general purpose 64-bit CPU with a superscalar core. It supports internal instruction-level parallelism, and includes simultaneous multithreading (SMT). It doesn't support virtual memory and uses limited memory protection mechanisms. Dojo software/applications manage chip resources. The D1 instruction set supports both 64-bit scalar and 64-byte single instruction, multiple data (SIMD) vector instructions. The integer unit mixes reduced instruction set computer (RISC-V) and custom instructions, supporting 8, 16, 32, or 64 bit integers. The custom vector math unit is optimized for machine learning kernels and supports multiple data formats, with a mix of precisions and numerical ranges, many of which are compiler composable. Up to 16 vector formats can be used simultaneously. ==== Node ==== Each D1 node uses a 32-byte fetch window holding up to eight instructions. These instructions are fed to an eight-wide decoder which supports two threads per cycle, followed by a four-wide, four-way SMT scalar scheduler that has two integer units, two address units, and one register file per thread. Vector instructions are passed further down the pipeline to a dedicated vector scheduler with two-way SMT, which feeds either a 64-byte SIMD unit or four 8×8×4 matrix multiplication units. The network on-chip (NOC) router links cores into a two-dimensional mesh network. It can send one packet in and one packet out in all four directions to/from each neighbor node, along with one 64-byte read and one 64-byte write to local SRAM per clock cycle. Hardware native operations transfer data, semaphores and barrier constraints across memories and CPUs. System-wide double data rate 4 (DDR4) synchronous dynamic random-access memory (SDRAM) memory works like bulk storage. ==== Memory ==== Each core has a 1.25 megabytes (MB) of SRAM main memory. Load and store speeds reach 400 gigabytes (GB) per second and 270 GB/sec, respectively. The chip has explicit core-to-core data transfer instructions. Each SRAM has a unique list parser that feeds a pair of decoders and a gather engine that feeds the vector register file, which together can directly transfer information across nodes. ==== Die ==== Twelve nodes (cores) are grouped into a local block. Nodes are arranged in an 18×20 array on a single die, of which 354 cores are available for applications. The die runs at 2 gigahertz (GHz) and totals 440 MB of SRAM (360 cores × 1.25 MB/core). It reaches 376 teraflops using 16-bit brain floating point (BF16) numbers or using configurable 8-bit floating point (CFloat8) numbers, which is a Tesla proposal, and 22 teraflops at FP32. Each die comprises 576 bi-directional serializer/deserializer (SerDes) channels along the perimeter to link to other dies, and moves 8 TB/sec across all four die edges. Each D1 chip has a thermal design power of approximately 400 watts. === Training Tile === The water-cooled Training Tile packages 25 D1 chips into a 5×5 array. Each tile supports 36 TB/sec of aggregate bandwidth via 40 input/output (I/O) chips - half the bandwidth of the chip mesh network. Each tile supports 10 TB/sec of on-tile bandwidth. Each tile has 11 GB of SRAM memory (25 D1 chips × 360 cores/D1 × 1.25 MB/core). Each tile achieves 9 petaflops at BF16/CFloat8 precision (25 D1 chips × 376 TFLOP/D1). Each tile consumes 15 kilowatts; 288 amperes at 52 volts. === System Tray === Six tiles are aggregated into a System Tray, which is integrated with a host interface. Each host interface includes 512 x86 cores, providing a Linux-based user environment. Previously, the Dojo System Tray was known as the Training Matrix, which includes six Training Tiles, 20 Dojo Interface Processor cards across four host servers, and Ethernet-l

    Read more →
  • Run-to-completion scheduling

    Run-to-completion scheduling

    Run-to-completion scheduling or nonpreemptive scheduling is a scheduling model in which each task runs until it either finishes, or explicitly yields control back to the scheduler. Run-to-completion systems typically have an event queue which is serviced either in strict order of admission by an event loop, or by an admission scheduler which is capable of scheduling events out of order, based on other constraints such as deadlines. Some preemptive multitasking scheduling systems behave as run-to-completion schedulers in regard to scheduling tasks at one particular process priority level, at the same time as those processes still preempt other lower priority tasks and are themselves preempted by higher priority tasks.

    Read more →
  • Information explosion

    Information explosion

    Information explosion is the rapid increase in the amount of published information or data and the effects of this abundance. As the amount of available data grows, the problem of managing the information becomes more difficult, which can lead to information overload. The Online Oxford English Dictionary indicates use of the phrase in a March 1964 New Statesman article. The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed". The earliest known use of the phrase was in a speech about television by NBC president Pat Weaver at the Institute of Practitioners of Advertising in London on September 27, 1955. The speech was rebroadcast on radio station WSUI in Iowa City and excerpted in the Daily Iowan newspaper two months later. Many sectors are seeing this rapid increase in the amount of information available such as healthcare, supermarkets, and governments. Another sector that is being affected by this phenomenon is journalism. Such a profession, which in the past was responsible for the dissemination of information, may be suppressed by the overabundance of information today. Techniques to gather knowledge from an overabundance of electronic information (e.g., data fusion may help in data mining) have existed since the 1970s. Another common technique to deal with such amount of information is qualitative research. Such approaches aim to organize the information, synthesizing, categorizing and systematizing in order to be more usable and easier to search. == Growth patterns == The world's technological capacity to store information grew from, optimally compressed, 2.6 exabytes in 1986 to 15.7 in 1993, over 54.5 in 2000, and to 295 exabytes in 2007. The world's technological capacity to receive information through one-way broadcast networks was 432 exabytes of (optimally compressed) information in 1986, 715 (optimally compressed) exabytes in 1993, 1,200 (optimally compressed) exabytes in 2000, and 1,900 in 2007. The world's effective capacity to exchange information through two-way telecommunications networks was 0.281 exabytes of (optimally compressed) information in 1986, 0.471 in 1993, 2.2 in 2000, and 65 (optimally compressed) exabytes in 2007. A new metric that is being used in an attempt to characterize the growth in person-specific information, is the disk storage per person (DSP), which is measured in megabytes/person (where megabytes is 106 bytes and is abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population. In 1983, one million fixed drives with an estimated total of 90 terabytes were sold worldwide; 30MB drives had the largest market segment. In 1996, 105 million drives, totaling 160,623 terabytes were sold with 1 and 2 gigabyte drives leading the industry. By the year 2000, with 20GB drive leading the industry, rigid drives sold for the year are projected to total 2,829,288 terabytes Rigid disk drive sales to top $34 billion in 1997. According to Latanya Sweeney, there are three trends in data gathering today: Type 1. Expansion of the number of fields being collected, known as the “collect more” trend. Type 2. Replace an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend. Type 3. Gather information by starting a new person-specific data collection, known as the “collect it if you can” trend. == Related terms == Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also dubbed data deluge). Sometimes the term information flood is used as well. All of those basically boil down to the ever-increasing amount of electronic data exchanged per time unit. A term that covers the potential negative effects of information explosion is information inflation. The awareness about non-manageable amounts of data grew along with the advent of ever more powerful data processing since the mid-1960s. == Challenges == Even though the abundance of information can be beneficial in several levels, some problems may be of concern such as privacy, legal and ethical guidelines, filtering and data accuracy. Filtering refers to finding useful information in the middle of so much data, which relates to the job of data scientists. A typical example of a necessity of data filtering (data mining) is in healthcare since in the next years is due to have EHRs (Electronic Health Records) of patients available. With so much information available, the doctors will need to be able to identify patterns and select important data for the diagnosis of the patient. On the other hand, according to some experts, having so much public data available makes it difficult to provide data that is actually anonymous. Another point to take into account is the legal and ethical guidelines, which relates to who will be the owner of the data and how frequently he/she is obliged to the release this and for how long. With so many sources of data, another problem will be accuracy of such. An untrusted source may be challenged by others, by ordering a new set of data, causing a repetition in the information. According to Edward Huth, another concern is the accessibility and cost of such information. The accessibility rate could be improved by either reducing the costs or increasing the utility of the information. The reduction of costs according to the author, could be done by associations, which should assess which information was relevant and gather it in a more organized fashion. == Web servers == As of August 2005, there were over 70 million web servers. As of September 2007 there were over 135 million web servers. == Blogs == According to Technorati, the number of blogs doubles about every 6 months with a total of 35.3 million blogs as of April 2006. This is an example of the early stages of logistic growth, where growth is approximately exponential, since blogs are a recent innovation. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.

    Read more →