Tumblr

Tumblr ( TUM-blər) is a microblogging and social media platform founded by David Karp in 2007 and operated by American company Tumblr, Inc., a subsidiary of Automattic. The service allows users to post multimedia and other content to a short-form blog. It has attracted significant attention and controversy for hosting a wide range of progressive user-generated content. == History == === Beginnings (2006–2012) === Development of Tumblr began in 2006 during a two-week gap between contracts at David Karp's software consulting company, Davidville. Karp had been interested in tumblelogs (short-form blogs, hence the name Tumblr) for some time and was waiting for one of the established blogging platforms to introduce their own tumblelogging platform. As none had done so after a year of waiting, Karp and developer Marco Arment began working on their own platform. Tumblr was launched in February 2007, and within two weeks had gained 75,000 users. Arment left the company in September 2010 to work on Instapaper. In June 2012, Tumblr featured its first major brand advertising campaign in collaboration with Adidas, who launched an official soccer Tumblr blog and bought ad placements on the user dashboard. This launch came only two months after Tumblr announced it would be moving towards paid advertising on its site. === Ownership by Yahoo! (2013–2018) === On May 20, 2013, it was announced that Yahoo and Tumblr had reached an agreement for Yahoo! Inc. to acquire Tumblr for $1.1 billion in cash. Many of Tumblr's users were unhappy with the news, causing some to start a petition, achieving nearly 170,000 signatures. David Karp remained CEO and the deal was finalized on June 20, 2013. Advertising sales goals were not met and in 2016 Yahoo wrote down $712 million of Tumblr's value. Verizon Communications acquired Yahoo in June 2017, and placed Yahoo and Tumblr under its Oath subsidiary. Karp announced in November 2017 that he would be leaving Tumblr by the end of the year. Jeff D'Onofrio, Tumblr's president and COO, took over leading the company. The site, along with the rest of the Oath division (renamed Verizon Media Group in 2019), continued to struggle under Verizon. In March 2019, Similarweb estimated Tumblr had lost 30% of its user traffic since December 2018, when the site had introduced a stricter content policy with heavier restrictions on adult content (which had been a notable draw to the service). In May 2019, it was reported that Verizon was considering selling the site due to its continued struggles since the purchase (as it had done with another Yahoo property, Flickr, via its sale to SmugMug). Following this news, Pornhub's vice president publicly expressed interest in purchasing Tumblr, with a promise to reinstate the previous adult content policies. === Automattic (2019–present) === On August 12, 2019, Verizon Media announced that it would sell Tumblr to Automattic, the operator of blog service WordPress.com and corporate backer of the open source blog software of the same name. The sale was for an undisclosed amount, but Axios reported that the sale price was less than $3 million, less than 0.3% of Yahoo's original purchase price. Automattic CEO Matt Mullenweg stated that the site will operate as a complementary service to WordPress.com, and that there were no plans to reverse the content policy decisions made during Verizon ownership. In November 2022, Mullenweg stated that Tumblr will add support for the decentralized social networking protocol ActivityPub. In November 2023, most of Tumblr's product development and marketing teams were transferred to other groups within Automattic. Mullenweg stated that focus would shift to core functionality and streamlining existing features. In February 2024, Automattic announced that it would begin selling user data from Tumblr and WordPress.com to Midjourney and OpenAI. Tumblr users are opted-in by default, with an option to opt out. In August 2024, Automattic announced that it would migrate Tumblr's backend to an architecture derived from WordPress, in order to ease development and code sharing between the platforms. The company stated that this migration would not impact the service's user experience and content, and that users "won't even notice a difference from the outside". In January 2025, Mullenweg stated that the migration, once completed, would also "unlock" ActivityPub access for Tumblr, including native support for the company's official ActivityPub plugin for WordPress. In April 2025, Automattic announced layoffs for 16% of its workforce, reducing a large portion of Tumblr staff. On March 16, 2026, Tumblr implemented a change to how notes were assigned to reblogs, making it more similar to sites like Twitter and Bluesky. The change was rolled back the next day after heavy user backlash. == Features == === Blog management === Dashboard: The dashboard is the primary tool for the typical Tumblr user. It is a live feed of recent posts from blogs that they follow. Through the dashboard, users are able to comment, reblog, and like posts from other blogs that appear on their dashboard. The dashboard allows the user to upload text posts, images, videos, quotes, or links to their blog with a click of a button displayed at the top of the dashboard. Users are also able to connect their blogs to their Twitter and Facebook accounts, so that whenever they make a post, it will also be sent as a tweet and a status update. As of June 2022, users can also turn off reblogs on specific posts through the dashboard. Queue: Users are able to set up a schedule to delay posts that they make. They can spread their posts over several hours or even days. Tags: Users can help their audience find posts about certain topics by adding tags. If someone were to upload a picture to their blog and wanted their viewers to find pictures, they would add the tag #picture, and their viewers could use that word to search for posts with the tag #picture. HTML editing: Tumblr allows users to edit their blog's theme using HTML to control the appearance of their blog. Custom themes are able to be shared and used by other users, or sold. Custom domains: Tumblr allows users to use custom domains for their blogs. Users must purchase a domain from Tumblr Domains, an in-house registrar that provides domains that can only be used with Tumblr unless removed from the user's blog and transferred to another registrar. Blogs previously were able to be linked with any domain/subdomain from any registrar, however following the introduction of the Tumblr Domains service, now requires you to purchase a domain directly from Tumblr to be used with a blog. Users who kept their blogs connected to a domain after the introduction got to keep their custom domain, as long as they do not disconnect it from Tumblr or let the domain expire. === Tags === The tagging system on the website operates on a hybrid tagging system, involving both self-tagging (user write their own tags on their posts) and an auto-manual function (the website will recommend popular tags and ones that the user has used before.) Only the first 20 tags added to any post will be indexed by the site. The tags are prefaced by a hashtag and separated by commas, and spaces and special characters are allowed, but only up to 140 characters total per tag. There are two main types used by Tumblr users: descriptive tagging, and opinion or commentary tagging. Descriptive tags are usually introduced by the original poster, and describe what is in the post (e.g. #art, #sky). These are important for the original poster to use, so their post will be indexed and searchable by others wishing to view that subject of content. Tags used as a form of communication are unique to Tumblr, and are typically more personal, expressing opinions, reactions, meta-commentary, background information, and more. Instead of adding onto the reblogged post (with their comments becoming an addition to each subsequent reblog from them) a user may add their comments in the tags, not changing the content or appearance of the original post in any way. Not all users choose to use tags this way, but those who do use tags for commentary may prefer it over adding a comment on the actual post. === Mobile === With Tumblr's 2009 acquisition of Tumblerette, an iOS application created by Jeff Rock and Garrett Ross, the service launched its official iPhone app. The site became available to BlackBerry smartphones on April 17, 2010, via a Mobelux application in BlackBerry World. In June 2012, Tumblr released a new version of its iOS app, Tumblr 3.0, allowing support for Spotify integration, hi-res images and offline access. An app for Android is also available. A Windows Phone app was released on April 23, 2013. An app for Google Glass was released on May 16, 2013. === Inbox and messaging === Tumblr blogs have the option to allow users to submit questions, either as themselves or anonymously, to the blog for a response. Tumblr

Symbol level

In knowledge-based systems, agents choose actions based on the principle of rationality to move closer to a desired goal. The agent is able to make decisions based on knowledge it has about the world (see knowledge level). But for the agent to actually change its state, it must use whatever means it has available. This level of description for the agent's behavior is the symbol level. The term was coined by Allen Newell in 1982. For example, in a computer program, the knowledge level consists of the information contained in its data structures that it uses to perform certain actions. The symbol level consists of the program's algorithms, the data structures themselves, and so on.

Investigative Data Warehouse

Investigative Data Warehouse (IDW) is a searchable database operated by the FBI. It was created in 2004. Much of the nature and scope of the database is classified. The database is a centralization of multiple federal and state databases, including criminal records from various law enforcement agencies, the U.S. Department of the Treasury's Financial Crimes Enforcement Network (FinCEN), and public records databases. According to Michael Morehart's testimony before the House Committee on Financial Services in 2006, the "IDW is a centralized, web-enabled, closed system repository for intelligence and investigative data. This system, maintained by the FBI, allows appropriately trained and authorized personnel throughout the country to query for information of relevance to investigative and intelligence matters." == Overview == In 2004, according to a government solicitation for bids to manage the project, it was approximately 10TB in size. In 2005, according to one FBI official, the IDW contained approximately 100 million documents. In 2006 it contained more than 560 million documents and was accessible by more than 12,000 individuals. According to the FBI's website, as of August 22, 2007, the database contained 700 million records from 53 databases and was accessible by 13,000 individuals around the world. As of 2007, the FBI was the subject of a lawsuit brought by the EFF (Electronic Frontier Foundation) because of a lack of public notice describing the database and the criteria for including personal information, as required by the Privacy Act of 1974. The lawsuits were a result of two Freedom of Information Act requests filed by the EFF in 2006. It was built in part by Chiliad corporation, the FBI Office of the Chief Technology Officer, and others. Companies listed on the FOIA files include Northrop Grumman . == Purpose == Investigative Data Warehouse–Secret (IDW-S) "provides data and data processing/analysis services to FBI agents and analysts as they perform counter-terrorism, counter-intelligence, and law enforcement missions". The core subsystem supports the Counter-Terrorism Division (CTD), the Special Event Unit, and via DOCLAB-S, the Joint Intelligence Committee Investigation (JICI) and IntelPlus. According to a 2005 email, "IDW will also be used for criminal and other authorized non-CT investigations as it evolves." (CT being counter terrorism) == Subsystems == Within the system, there were subsystems named IDW-S Core, SPT, and DOCLAB-S The special projects team (SPT): allows for the rapid import of new specialized data sources. These data sources are not made available to the general IDW users but instead are provided to a small group of users who have a demonstrated "need-to-know". The SPT System is similar in function to the IDW-S system, with the main difference is a different set of data sources. The SPT System allows its users to access not only the standard IDW Data Store but the specialized SPT Data Store. == Privacy == According to internal emails, the FBI performed several Privacy Impact Assessments (PIAs) of the IDW system. They worked with lawyers from their National Security Law Branch (NSLB) to attempt to make sure their system was complying with various laws regarding sharing of information and secrecy (for example, rule 6e of the Federal Rules of Criminal Procedure, regarding the secrecy of Grand Jury material ). The Information Sharing Policy Group (ISPG) formed a Discretionary Access Control Team (DACT), to work on "approval of data sets" and "access control requirements" for IDW and DataMart, and responding to other Intelligence Community agencies requesting access. The EFF FOIA IDW website states "Despite the vast amount of personal information contained in the IDW, the FBI has never published a Privacy Act notice describing the system or explaining the ways in which the records might be used." There was also a 2005 email from someone on the Office of General Council (OGC) about "preliminary staff musings that maybe we should limit FBI PIA requirements to non-NS systems" (NS being National Security). There was also an email from 2006 saying that 'national security systems are exempt from E-Gov', apparently referring to the E-Government Act of 2002, which has a section that deals with privacy. == Data sources == The IDW used many data sources. The FOIA documents from EFF are heavily redacted, but some of the sources are as follows: FBI Automated Case Support system (ACS), subset of the Electronic Case File (ECF) system Joint Intelligence Committee Investigation documents (JICI), with OCR text "Open Source News" (public websites, such as the Washington Post and others) Secure Automated Messaging Network (SAMNet) Violent Gang and Terrorist Organizing File (VGTOF) DARPA TIDES program ('open source news' that has been organized and collected) IntelPlus Filerooms, with OCR text FBI National Crime Information Center (NCIC) FBI Records Management Division (RMD), Document Laboratory (DocLab), FBIHQ MiTAP (collects data from public sources, websites, etc.) SPT-Specific data sources (partial list, FOIA files have large parts redacted): Unified Name Index (UNI) extracts Financial Center (FinCen), including Bank Secrecy Act data "Various Sources", including the Transportation Security Administration FBI Counterterrorism Division (CTD) Telephone numbers / addresses from ACS Case data from ACS Terrorist Watch List (TWL) "Other NJTTF data" DoS ... Lost/Stolen Passport data No Fly List, from TSA Selectee list, from TSA ACS/ECF with some case types excluded CIA non-TS/non-SCI Technical Discussions (TDs) and Intelligence Information Reports (IIRs) from 1978 to the May 2004 There was also talk of linking the FTTTF "Data Mart" with IDW. The data in IDW is classified at the 'Secret' level or lower. Higher classifications are not allowed, and can be removed

NCSA Brown Dog

NCSA Brown Dog is a research project to develop a method for easily accessing historic research data stored in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA) that is funded by the National Science Foundation (NSF). == History == Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill. == Unstructured, uncurated, long tail data == Much scientific data is smaller, unstructured and uncurated and thus not easily shared. Such data is sometimes referred to as "long tail" data. This borrows a term from statistics and refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. This so-called "long tail" data, both past and present, has the potential to inform future research in many study areas. Much of this data has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. == Approach == Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today. == Technology == Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP) to aid in the conversion of file formats and the Data Tilling Services (DTS) for the automatic extraction of metadata from file contents. Once developed, researchers and general public users will be able to download browser plugins and other tools from the Brown Dog tool catalog. === Data Tilling Service === Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will utilize DAP to make the file accessible. DTS also indexes the data and extract and appends metadata to files and collections enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443. === Data Access Proxy === Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184. == Use cases == Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science. === Long tail vegetation data in ecology and global change biology === This use case is led by Michael Dietze, Boston University Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team in cooperation with researches from Dietze's lab will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation. === Designing green infrastructure considering storm water and human requirements === This use case is led by Barbara Minsker], University of Illinois at Urbana-Champaign]; William Sullivan, University of Illinois at Urbana-Champaign; Arthur Schmidt, University of Illinois at Urbana-Champaign. This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management and ecosystem and human health and well being. To address the scientific and social problems associated with the design of green spaces, data accessibility and availability is a major challenge. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to under served neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology. === Development and application for critical zone studies === This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign Critical Zone (CZ) is the "skin" of the earth that extends from the treetops to the bedrock that is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration. The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data create significant challenges for inter-disciplinary studies of the CZ because integration of the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context "unstructured" data is viewed broadly as consisting of a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies, data from emerging low cost open hardware based sensors and embedded sensor networks that lack well defined metadata and sensor characteristics, as well as data that are available as maps, images and text. == NSF Award == CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. Estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBB award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. Coleaders are Jong Lee NCSA/UIU

Chinese speech synthesis

Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitched-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the speech synthesised by the engine claims clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesis speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these don't change the question of which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they can't be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.

Crucible (software)

Crucible is a collaborative code review application by Australian software company Atlassian. Like other Atlassian products, Crucible is a Web-based application primarily aimed at enterprise, and certain features that enable peer review of a codebase may be considered enterprise social software. Crucible is particularly tailored to remote workers, and facilitates asynchronous review and commenting on code. Crucible also integrates with popular source control tools, such as Git and Subversion. Crucible is not open source, but customers are allowed to view and modify the code for their own use.

Transaction data

Transaction data or transaction information is a category of data describing transactions. Transaction data/information gather variables generally referring to reference data or master data – e.g. dates, times, time zones, currencies. Typical transactions are: Financial transactions about orders, invoices, payments; Work transactions about plans, activity records; Logistic transactions about deliveries, storage records, travel records, etc.. == Management == Recording and storing transactions is called records management. The record of the transaction is stored in a place where the retention can be guaranteed and where data is archived or removed following a retention period. Formats of recorded transactions can be digital data in databases and spreadsheets, or handwritten texts in physical documents like former bankbooks. Transaction processing systems are application software that generate transactions and manage transaction data/information, e.g. SAP and Oracle Financials. == Data warehousing == Transaction data can be summarised in a data warehouse, which helps accessibility and analysis of the data.