Noisy text analytics

Noisy text analytics

Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While Text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because a lot of common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition and printed or handwritten text using optical character recognition contains processing noise. Text produced under such circumstances is typically highly noisy containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuations, missing letter case information, pause filling words such as “um” and “uh” and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains important historical, religious, ancient medical knowledge that is useful. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques. == Techniques for noisy text analysis == Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques to both learn from the noisy data and then to be able to process the noisy data are only now being developed. == Possible source of noisy text == World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums, newsgroups. Most of these data are unstructured and the style of writing is very different from, say, well-written news articles. Analysis for the web data is important because they are sources for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, it is necessary to find efficient methods of information extraction, classification, automatic summarization and analysis of these data. Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparels. On an average a person in the developed world interacts at least once a week with a contact center agent. A typical contact center agent handles over a hundred calls per day. They operate in various modes such as voice, online chat and E-mail. The contact center industry produces gigabytes of data in the form of E-mails, chat logs, voice conversation transcriptions, customer feedback, etc. A bulk of the contact center data is voice conversations. Transcription of these using state of the art automatic speech recognition results in text with 30-40% word error rate. Further, even written modes of communication like online chat between customers and agents and even the interactions over email tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text. Printed Documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process the content from such documents, they need to be processed using Optical Character Recognition. In addition to printed text, these documents may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, quality of the print etc. It can range from 2-3% word error rates to as high as 50-60% word error rates. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence. Short Messaging Service (SMS): Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message length facilitating faster typing and the need for semantic clarity, shape the structure of this non-standard form known as the texting language.

Zamzar

Zamzar is an online file converter and compressor, created by brothers Mike and Chris Whyley in England in 2006. It allows users to convert files online, without downloading a software tool, and supports over 1,200 different conversion types. Since its formation, the service has converted over 510 million files for users from 245 different countries. The service supports the conversion of documents, images, audio, video, e-Books, CAD files and compressed file formats. Users can type in a URL or upload one or more files (if they are all of the same format) from their computer; Zamzar will then convert the file(s) to another user-specified format, such as an Adobe PDF file to a Microsoft Word document. Once conversion is complete, users can immediately download the file from their web browser. Users can also choose to receive an email with a link to download the converted file. In February 2021 Zamzar expanded their tool and announced a new file compression service. The compressor is visually similar to the conversion tool with a drag and drop download feature. As with the converter, users have the option to subscribe for a paid plan if they wish to compress multiple or larger files than the free service permits == File conversion API == in 2015 Zamzar launched a file conversion API, allowing users to integrate file conversion capabilities into their own websites and applications. Sample code is provided to allow users to integrate file conversion capabilities in C#, Java, Node.js, PHP, Python and cURL. Zamzar also maintains a project on GitHub which allows users to perform file conversion from the command line on Linux, MacOS or Windows systems. == Email file conversion == It is also possible to send files for conversion by emailing them to Zamzar. Zamzar launched this capability in 2012, allowing users to email files to dedicated email addresses for the file to be automatically converted to a different format. A link is then emailed back to the end user to allow them to download their converted file. == User privilege levels == Zamzar is currently free to use, but there is a limit of two conversions per hour for files up to 100MB. Users can pay a monthly subscription in order to access preferential features, such as unlimited file conversions, online file management, shorter response and queuing times and other benefits. == Name == Its name comes from Franz Kafka's The Metamorphosis. Its main character is called Gregor Samsa and it is from his surname that Zamzar is derived. The founders of the service considered three other names – Konvertieren, Khamailen and Obrogo – before settling on Zamzar.

Amino (app)

Amino was a social media application originally developed by Narvii, Inc. It was originally created by Yin Wang and Ben Anderson in 2010, and then launched as an app in 2012. Amino was acquired by MediaLab AI Inc in January 2021, and the founders are no longer associated with the application. The platform ceased all operations in December 2025. == History == In 2010, Wang and Anderson came up with the idea for a convention-like community while attending an anime convention in Boston, Massachusetts. Later that year, they would release two apps revolving around K-pop and photography that allowed fans of those subjects to chat freely. That same year, Amino was officially released. === Shutdown === In early December 2025, the Amino platform abruptly stopped all operations. Users worldwide lost access to the mobile application and website, with server requests returning connection time-out errors. Parent company MediaLab AI has issued no official statement regarding the cause to date, or declared any possible cause behind it. === Final Message === According to Shawn, a member of Amino support, Amino has ceased operations as of December 19th. The message that was sent out from Shawn reads: "Hey there, Thanks for your message. Amino has ceased operations. As of December 19th, we no longer retain personal data relating to you. Accordingly, we are unable to provide a copy of your data. Kind regards, - Amino Support" This message was sent on January 4th, 2026. This was the final support message sent from the Amino Support mail. == Growth == Amino received 1.65 million dollars of seed funding in 2014, primarily from Union Ventures. Some additional seed investors include Google Ventures, SV Angel, Box Group, and other interested parties. By July 2014, Amino's apps were downloaded 500,000 times. Though only having 15 communities at that time, Amino eventually grew to have 41 communities in September 2015. Amino's apps had been downloaded 13 million times by July 2016. Fandoms had migrated from websites like Facebook and Reddit to Amino, partly because of the app's mobile-native experience. Before 2016, when a user wanted to join a new Amino, they had to download another app for the Amino they wanted to join, with each apps name beginning with "Amino for:". In 2016, Amino Apps launched a centralized portal that hosted every Amino community in one app, meaning users no longer had to download multiple apps. In July of the same year, ACM, an app that allowed users to create their own communities, was launched. This resulted in the number of communities on Amino skyrocketing to over 2.5 million as of June 2018. == Features == The main feature of Amino was communities dedicated to a certain topic that users could join. Users could also chat with other members of a community in three ways: text, voice, or screening room, which allowed users to watch videos together while voice chatting. Other features include polls, blog posts, image posts, wiki entries, stories, and quizzes. In some cases, posts that were very well-made and had been noticed by a community's administration would end up receiving a feature, making it appear on the front page along with other featured content. In 2018, a premium membership option called Amino+ was added. Amino+ comes with additional features such as exclusive stickers, the ability to make stickers, custom chat bubbles, high resolution images, and other perks. Membership can now only be purchased with money. Amino coins can be purchased or earned through enabling ads, watching ad videos, completing activities on the Offer Wall, and playing Lucky Draw when checking in, but are of little use due to the users not being able to buy Amino+ by amino coins anymore. Members can give and receive coins through props. In 2019, Amino introduced six original short-form animated series, labelled "Amino Originals," produced by independent artists from across the internet. ATJ's "Little Red," a re-imagining of Little Red Riding Hood, premiered on November 15, 2019. "Little Red" was joined by five other shows in late December. Sophie Feher's "The Reef," a comedy featuring an aspiring marine biologist meeting a merman, premiered on December 27 alongside "Princely," an LGBT fairy tale created by Matt Bruneau-Richardson of Tiny Siren Animation. "Spaced Out," an alien abduction comedy by Michael Jae, and YouTuber Alex Clark's "Wyndvania II" premiered on December 28. Mysie Pereira's fairy tale "Turned to Stone" and Marcin Pawlowski's "Stranded" premiered on December 29, 2019. == Administration == On each community, there are two types of staff members, these being ‘Leader’ and ‘Curator.’ Leaders are higher rank than curators. Curators are usually the ones who feature posts, or post important announcements for users to see. Curators are able to disable a post or public chat, delete comments or chat threads, manage featured content, manage posts in topic categories, and approve Wiki entries. Leaders have more power than curators. In addition to curator powers, leaders can submit a community to be listed, change the Amino's features, change navigation, alter the community appearance, change the Amino's privacy settings, manage the Amino's join requests, send invites, appoint or demote Curators, strike or ban members, manage flagged content, change users' custom titles, manage topics and wiki categories, and create broadcasts (notifications sent for posts). One leader will have the status of agent. An agent is the primary leader of a community; the person who created the community is automatically agent. An agent has the ability to delete their community as long as it is not too large or too active. An agent can appoint and remove both leaders and curators. Agent status can be transferred voluntarily to another leader, curator, or community member. If an agent is inactive, Team Amino may assist in transferring agent status. == Apps == === Amino Community Manager === Otherwise known as ACM, this application is what users use to create and manage their own community in Amino. This app allows moderators to customize a community's theme, icon, and categories. ACM also allows moderation to customize community descriptions, pick leaders, change language settings, create a tagline for the community, change the home page lay out, alter the side navigation menu, and more. Unlisted communities are able to change their community's title and Amino ID, but this is not an option once a community is listed. A leader can use ACM to submit a request for their community to be listed on the explore page, after which the community will be reviewed by Team Amino for approval. Communities can be deleted on ACM, but only by the agent of that community. == Guidelines == Amino has a set of guidelines that all communities must comply with. Amino does not allow harassment or hate, spam or self-promotion (including promotion of one's own Amino community), sexual/NSFW content, self harm, real graphic/gross content (fictional content is generally acceptable), unsafe/illegal content, or content that violates copyright. Communities are allowed to have additional rules so long as they do not violate Amino's rules. In addition to Amino's rules, users are required to be at least 13 years of age in the U.S. and 16 years of age in European Union countries. While sexual imagery is not allowed in any community and text based sexual content is not allowed in public areas, some private communities are allowed to discuss sexual themes. However, they are not exempt from Amino's rules on NSFW content. If guidelines are broken, a leader may disable content or impose a warning, strike, or ban, depending on the severity of the infringement. A warning is a message informing the user that they have violated a guideline and may face further punishment unless they change their behaviour. A strike will put the user in read-only mode for up to 24 hours; this mode prevents the user from posting, chatting, or interacting with posts in that community. A ban removes the user from the community. Team Amino can separately issue users with strikes or bans across the entire platform. == Controversies == In 2017, organizations in Argentina for the protection of minors reported inappropriate material on the app, ranging from pornography to material promoting suicide to underage users. In 2019, Abilene police in Texas released a statement that sexual predators were using Amino chat rooms to approach minors. In 2020, authorities from the Christian County in the state of Kentucky alerted parents about possible sexual predators on Amino. In 2025, the British Police identified Amino as one of several platforms used by a child exploitation network that had previously extorted minors in different countries in Europe and North America. Several families reported to the National Society for the Prevention of Cruelty to Children that pedophiles were using the app for the purpose of sexual role-playing with minors, c

Digital cinema

Digital cinema is the digital technology used within the film industry to distribute or project motion pictures as opposed to the historical use of reels of motion picture film, such as 35 mm film. Whereas film reels have to be shipped to movie theaters, a digital movie can be distributed to cinemas in a number of ways: over the Internet or dedicated satellite links, or by sending hard drives or optical discs such as Blu-ray discs, then projected using a digital video projector instead of a film projector. Typically, digital movies are shot using digital movie cameras or in animation transferred from a file and are edited using a non-linear editing system (NLE). The NLE is often a video editing application installed in one or more computers that may be networked to access the original footage from a remote server, share or gain access to computing resources for rendering the final video, and allow several editors to work on the same timeline or project. Alternatively a digital movie could be a film reel that has been digitized using a motion picture film scanner and then restored, or, a digital movie could be recorded using a film recorder onto film stock for projection using a traditional film projector. Digital cinema is distinct from high-definition television and does not necessarily use traditional television or other traditional high-definition video standards, aspect ratios, or frame rates. In digital cinema, resolutions are represented by the horizontal pixel count, usually 2K (2048×1080 or 2.2 megapixels) or 4K (4096×2160 or 8.8 megapixels). The 2K and 4K resolutions used in digital cinema projection are often referred to as DCI 2K and DCI 4K. DCI stands for Digital Cinema Initiatives. As digital cinema technology improved in the early 2010s, most theaters across the world converted to digital video projection. Digital cinema technology has continued to develop over the years with RealD 3D, IMAX, RPX, 4DX, Dolby Cinema, and ScreenX, allowing moviegoers more immersive experiences. == History == The transition from film to digital video was preceded by cinema's transition from analog to digital audio, with the release of the Dolby Digital (AC-3) audio coding standard in 1991. Its main basis is the modified discrete cosine transform (MDCT), a lossy audio compression algorithm. It is a modification of the discrete cosine transform (DCT) algorithm, which was first proposed by Nasir Ahmed in 1972 and was originally intended for image compression. The DCT was adapted into the MDCT by J.P. Princen, A.W. Johnson and Alan B. Bradley at the University of Surrey in 1987, and then Dolby Laboratories adapted the MDCT algorithm along with perceptual coding principles to develop the AC-3 audio format for cinema needs. Cinema in the 1990s typically combined analog photochemical images with digital audio. Digital media playback of high-resolution 2K files has at least a 20-year history. Early video data storage units (RAIDs) fed custom frame buffer systems with large memories. In early digital video units, the content was usually restricted to several minutes of material. Transfer of content between remote locations was slow and had limited capacity. It was not until the late 1990s that feature-length films could be sent over the "wire" (Internet or dedicated fiber links). On October 23, 1998, Digital light processing (DLP) projector technology was publicly demonstrated with the release of The Last Broadcast, the first feature-length movie, shot, edited and distributed digitally. In conjunction with Texas Instruments, the movie was publicly demonstrated in five theaters across the United States (Philadelphia, Portland (Oregon), Minneapolis, Providence, and Orlando). === Foundations === In the United States, on June 18, 1999, Texas Instruments' DLP Cinema projector technology was publicly demonstrated on two screens in Los Angeles and New York for the release of Lucasfilm's Star Wars Episode I: The Phantom Menace. In Europe, on February 2, 2000, Texas Instruments' DLP Cinema projector technology was publicly demonstrated, by Philippe Binant, on one screen in Paris for the release of Toy Story 2. From 1997 to 2000, the JPEG 2000 image compression standard was developed by a Joint Photographic Experts Group (JPEG) committee chaired by Touradj Ebrahimi (later the JPEG president). In contrast to the original 1992 JPEG standard, which is a DCT-based lossy compression format for static digital images, JPEG 2000 is a discrete wavelet transform (DWT) based compression standard that could be adapted for motion imaging video compression with the Motion JPEG 2000 extension. JPEG 2000 technology was later selected as the video coding standard for digital cinema in 2004. In 1992, Hughes-JVC was founded by JVC and Hughes Electronics to develop ILA (Image Light Amplifer) digital video projectors for commercial movie theaters using liquid crystal on silicon (LCOS) technology. In 1997, JVC introduced D-ILA (Direct-Drive ILA) technology with a 2K resolution digital video projector. In 2000, JVC introduced a 4K resolution video projector using D-ILA technology. === Initiatives === On January 19, 2000, the Society of Motion Picture and Television Engineers, in the United States, initiated the first standards group dedicated to developing digital cinema. By December 2000, there were 15 digital cinema screens in the United States and Canada, 11 in Western Europe, 4 in Asia, and 1 in South America. Digital Cinema Initiatives (DCI) was formed in March 2002 as a joint project of many motion picture studios (Disney, Fox, MGM, Paramount, Sony Pictures, Universal and Warner Bros.) to develop a system specification for digital cinema. The same month it was reported that the number of cinemas equipped with digital projectors had increased to about 50 in the US and 30 more in the rest of the world. In April 2004, in collaboration with the American Society of Cinematographers, DCI created standard evaluation material (the ASC/DCI StEM material) for testing of 2K and 4K playback and compression technologies. DCI selected JPEG 2000 as the basis for the compression in the system the same year. Initial tests with JPEG 2000 produced bit rates of around 75–125 Mbit/s for 2K resolution and 100–200 Mbit/s for 4K resolution. === Worldwide deployment === In China, in June 2005, an e-cinema system called "dMs" was established and was used in over 15,000 screens spread across China's 30 provinces. DMs estimated that the system would expand to 40,000 screens in 2009. In 2005, the UK Film Council Digital Screen Network launched in the UK by Arts Alliance Media creating a chain of 250 2K digital cinema systems. The roll-out was completed in 2006. This was the first mass roll-out in Europe. AccessIT/Christie Digital also started a roll-out in the United States and Canada. By mid-2006, about 400 theaters were equipped with 2K digital projectors with the number increasing every month. In August 2006, the Malayalam digital movie Moonnamathoral, produced by Benzy Martin, was distributed via satellite to cinemas, thus becoming the first Indian digital cinema. This was done by Emil and Eric Digital Films, a company based at Thrissur using the end-to-end digital cinema system developed by Singapore-based DG2L Technologies. In January 2007, Guru became the first Indian film mastered in the DCI-compliant JPEG 2000 Interop format and also the first Indian film to be previewed digitally, internationally, at the Elgin Winter Garden in Toronto. This film was digitally mastered at Real Image Media Technologies in India. In 2007, the UK became home to Europe's first DCI-compliant fully digital multiplex cinemas; Odeon Hatfield and Odeon Surrey Quays (in London), with a total of 18 digital screens, were launched on 9 February 2007. By March 2007, with the release of Disney's Meet the Robinsons, about 600 screens had been equipped with digital projectors. In June 2007, Arts Alliance Media announced the first European commercial digital cinema Virtual Print Fee (VPF) agreements (with 20th Century Fox and Universal Pictures). In March 2009, AMC Theatres announced that it closed a $315 million deal with Sony to replace all of its movie projectors with 4K HDR digital projectors starting in the second quarter of 2009; it was anticipated that this replacement would be finished by 2012. As digital cinema technology improved in the early 2010s, most theaters across the world converted to digital video projection. In January 2011, the total number of digital screens worldwide was 36,242, up from 16,339 at end 2009 or a growth rate of 121.8 percent during the year. There were 10,083 d-screens in Europe as a whole (28.2 percent of global figure), 16,522 in the United States and Canada (46.2 percent of global figure) and 7,703 in Asia (21.6 percent of global figure). Worldwide progress was slower as in some territories, particularly Latin America and Africa. As of 31 March 2015, 38,719 screens (out of a total of 3

Open Mashup Alliance

The Open Mashup Alliance (OMA) is a non-profit consortium that promotes the adoption of mashup solutions in the enterprise through the evolution of enterprise mashup standards like EMML. The initial members of the OMA include some large technology companies such as Adobe Systems, Hewlett-Packard, and Intel and some major technology users such as Bank of America and Capgemini. According to Dion Hinchcliffe, "Ultimately, the OMA creates a standardized approach to enterprise mashups that creates an open and vibrant market for competing runtimes, mashups, and an array of important aftermarket services such as development/testing tools, management and administration appliances, governance frameworks, education, professional services, and so on." == Specification development == The initial focus of the OMA is developing EMML, which is a declarative mashup domain-specific language (DSL) aimed at creating enterprise mashups. The EMML language provides a comprehensive set of high-level mashup-domain vocabulary to consume and mash a variety of web data sources. EMML provides a uniform syntax to invoke heterogeneous service styles: REST, WSDL, RSS/ATOM, RDBMS, and POJO. EMML also provides the ability to mix and match diverse data formats: XML, JSON, JDBC, JavaObjects, and primitive types. The OMA website provides the EMML specification, the EMML schema, a reference runtime implementation capable of running EMML scripts, sample EMML mashup scripts, and technical documentation. The OMA is developing EMML under a Creative Commons Attribution No Derivatives license. The eventual objective of the OMA is to submit the EMML specification and any other OMA specifications to a recognized industry standards body.

Unfold (app)

Unfold is a mobile application that allows users to create social media content using a variety of templates and other tools. It was founded in 2018 by Alfonso Cobo and Andy McCune. It enables users to add photos, video, and text with a variety of tools. In 2019, Unfold was acquired by Squarespace. == History == In January 2017, Alfonso Cobo was studying at Parsons School of Design when he realized there was no software or app that could create a portfolio of his work on an iPad. Cobo created an app called Portfolio, a basic version of a portfolio layout app, and the first one to exist for iPad. He launched it in 2017. After launching the first version of Portfolio, Cobo realized the more popular market and use case was on mobile. Around that time, Instagram was launching Stories. As a result, Cobo pivoted the app away from portfolios and instead focused on an app to showcase one's stories. Cobo later contacted Andy McCune, founder of social media account Earth, to collaborate with Unfold. Unfold also partnered with various companies to create custom templates. These include Equinox, Tommy Hilfiger, NARS, Billboard Music Awards, and Product Red. Unfold also launched a collection of Product Red templates to help eliminate HIV/AIDS in several African countries. In 2019, Squarespace acquired Unfold. The Unfold app has been downloaded over 60 million times and has been used to create over 1 billion Instagram stories. == Features == With Unfold, users can utilize hundreds of templates to make social content for social media platforms such as Instagram, Snapchat, and Facebook. The free app offers users basic templates and standard fonts, filters, and stickers, and there are also premium templates available for a monthly subscription. With Unfold+ and Unfold Pro (previously Unfold for Brands), users can access premium templates and tools, as well as upload custom brand assets and fonts. In 2020, Unfold launched Bio Sites, which allows users to link to multiple sites and platforms.

Enterprise bookmarking

Enterprise bookmarking is a method for Web 2.0 users to tag, organize, store, and search bookmarks of both web pages on the Internet and data resources stored in a distributed database or fileserver. This is done collectively and collaboratively in a process by which users add tag (metadata) and knowledge tags. In early versions of the software, these tags are applied as non-hierarchical keywords, or terms assigned by a user to a web page, and are collected in tag clouds. Examples of this software are Connectbeam and Dogear. New versions of the software such as Jumper 2.0 and Knowledge Plaza expand tag metadata in the form of knowledge tags that provide additional information about the data and are applied to structured and semi-structured data and are collected in tag profiles. == History == Enterprise bookmarking is derived from Social bookmarking that got its modern start with the launch of the website del.icio.us in 2003. The first major announcement of an enterprise bookmarking platform was the IBM Dogear project, developed in Summer 2006. Version 1.0 of the Dogear software was announced at Lotusphere 2007, and shipped later that year on June 27 as part of IBM Lotus Connections. The second significant commercial release was Cogenz in September 2007. Since these early releases, Enterprise bookmarking platforms have diverged considerably. The most significant new release was the Jumper 2.0 platform, with expanded and customizable knowledge tagging fields. == Differences == === Versus social bookmarking === In a social bookmarking system, individuals create personal collections of bookmarks and share their bookmarks with others. These centrally stored collections of Internet resources can be accessed by other users to find useful resources. Often these lists are publicly accessible, so that other people with similar interests can view the links by category or by the tags themselves. Most social bookmarking sites allow users to search for bookmarks which are associated with given "tags", and rank the resources by the number of users which have bookmarked them. Enterprise bookmarking is a method of tagging and linking any information using an expanded set of tags to capture knowledge about data. It collects and indexes these tags in a web-infrastructure knowledge base server residing behind the firewall. Users can share knowledge tags with specified people or groups, shared only inside specific networks, typically within an organization. Enterprise bookmarking is a knowledge management discipline that embraces Enterprise 2.0 methodologies to capture specific knowledge and information that organizations consider proprietary and are not shared on the public Internet. === Tag management === Enterprise bookmarking tools also differ from social bookmarking tools in the way that they often face an existing taxonomy. Some of these tools have evolved to provide Tag management which is the combination of uphill abilities (e.g. faceted classification, predefined tags, etc.) and downhill gardening abilities (e.g. tag renaming, moving, merging) to better manage the bottom-up folksonomy generated from user tagging.