Apertium

Apertium is a free/open-source rule-based machine translation platform. It is free software and released under the terms of the GNU General Public License. == Overview == Apertium is a transfer-based machine translation system, which uses finite state transducers for all of its lexical transformations, and Constraint Grammar taggers as well as hidden Markov models or Perceptrons for part-of-speech tagging / word category disambiguation. A structural transfer component is responsible for word movement and agreement; most Apertium language pairs up until now have used "chunking" or shallow transfer rules, though newer pairs use (possibly recursive) rules defined in a Context-free grammar. Many existing machine translation systems available at present are commercial or use proprietary technologies, which makes them very hard to adapt to new usages. Apertium code and data is free software and uses a language-independent specification, to allow for the ease of contributing to Apertium, more efficient development, and enhancing the project's overall growth. At present (December 2020), Apertium has released 51 stable language pairs, delivering fast translation with reasonably intelligible results (errors are easily corrected). Being an open-source project, Apertium provides tools for potential developers to build their own language pair and contribute to the project. == History == Apertium originated as one of the machine translation engines in the project OpenTrad, which was funded by the Spanish government, and developed by the Transducens research group at the Universitat d'Alacant. It was originally designed to translate between closely related languages, although it has recently been expanded to treat more divergent language pairs. To create a new machine translation system, one just has to develop linguistic data (dictionaries, rules) in well-specified XML formats. 
Language data developed for it (in collaboration with the Universidade de Vigo, the Universitat Politècnica de Catalunya and the Universitat Pompeu Fabra) currently support (in stable version) the Arabic, Aragonese, Asturian, Basque, Belarusian, Breton, Bulgarian, Catalan, Crimean Tatar, Danish, English, Esperanto, French, Galician, Hindi, Icelandic, Indonesian, Italian, Kazakh, Macedonian, Malaysian, Maltese, Northern Sami, Norwegian (Bokmål and Nynorsk), Occitan, Polish, Portuguese, Romanian, Russian, Sardinian, Serbo-Croatian, Silesian, Slovene, Spanish, Swedish, Tatar, Ukrainian, Urdu, and Welsh languages. A full list is available below. Several companies are also involved in the development of Apertium, including Prompsit Language Engineering, Imaxin Software and Eleka Ingeniaritza Linguistikoa. The project has taken part in the 2009, 2010, 2011, 2012, 2013 and 2014 editions of Google Summer of Code and the 2010, 2011, 2012, 2013, 2014, 2015, 2016 and 2017 editions of Google Code-In. == Translation methodology == This is an overall, step-by-step view of how Apertium works. The diagram displays the steps that Apertium takes to translate a source-language text (the text we want to translate) into a target-language text (the translated text). Source-language text is passed into Apertium for translation. The deformatter sets aside formatting markup (HTML, RTF, etc.) that should be kept in place but not translated. The morphological analyser segments the text (expanding elisions, marking set phrases, etc.), and looks up segments in the language dictionaries, returning dictionary forms and tags for all matches. In pairs that involve agglutinative morphology, including a number of Turkic languages, a Helsinki Finite State Transducer (HFST) is used. Otherwise, an Apertium-specific finite state transducer system called lttoolbox is used. 
The morphological disambiguator (the morphological analyser and the morphological disambiguator together form the part-of-speech tagger) resolves ambiguous segments (i.e., when there is more than one match) by choosing one match. Apertium uses Constraint Grammar rules (with the vislcg3 parser) for most of its language pairs. Retokenisation uses a finite state transducer to match sequences of lexical units and may reorder or translate tags (often used for translating idiomatic expressions into something closer to the target-language grammar). Lexical transfer looks up disambiguated source-language basewords to find their target-language equivalents (i.e., mapping source language to target language). For lexical transfer, Apertium uses an XML-based dictionary format called bidix. Lexical selection chooses between alternative translations when the source text word has alternative meanings. Apertium uses a specific XML-based technology, apertium-lex-tools, to perform lexical selection. Structural transfer (specified in an XML format that allows writing complex structural transfer rules) can consist of one-step chunking transfer, three-step chunking transfer or a CFG-based transfer module. The chunking modules flag grammatical differences between the source language and target language (e.g. gender or number agreement) by creating a sequence of chunks containing markers for this. They then reorder or modify chunks in order to produce a grammatical translation in the target language. The newer CFG-based module matches input sequences into possible parse trees, selecting the best-ranking one and applying transformation rules on the tree. The morphological generator uses the tags to deliver the correct target-language surface form. The morphological generator is a morphological transducer, just like the morphological analyser. A morphological transducer both analyses and generates forms. 
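The statement that one morphological transducer both analyses and generates can be illustrated with a toy stand-in. The sketch below is not lttoolbox or HFST: it replaces the finite state transducer with a plain table of pairs, and the dictionary entries and tag notation are assumptions chosen only for illustration.

```python
# Toy stand-in for a morphological transducer: a table of
# (surface form, lexical form) pairs. Entries and tags are illustrative
# assumptions, not taken from any real Apertium dictionary.
PAIRS = [
    ("cats", "cat<n><pl>"),
    ("cat", "cat<n><sg>"),
    ("books", "book<n><pl>"),
    ("books", "book<vblex><pri><p3><sg>"),
]


def analyse(surface):
    """Return every lexical form (lemma plus tags) that matches a surface form."""
    return [lexical for surf, lexical in PAIRS if surf == surface]


def generate(lexical):
    """Read the same table in the other direction: lexical form to surface form."""
    return [surf for surf, lex in PAIRS if lex == lexical]


# "books" is ambiguous, so analysis returns two readings; choosing between
# them is the job of the disambiguator, which is not modelled here.
print(analyse("books"))
print(generate("cat<n><pl>"))
```

Because analysis and generation read the same pairs in opposite directions, adding one entry to the table extends both operations at once.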
The post-generator makes any necessary orthographic changes due to the contact of words (e.g. elisions). The reformatter restores the formatting markup (HTML, RTF, etc.) that was removed by the deformatter in the first step. Apertium delivers the target-language translation. == Supported languages == As of June 2026, the following 108 pairs and 51 languages and language varieties are supported by Apertium.

Cloud load balancing

Cloud load balancing is a type of load balancing that is performed in cloud computing. Cloud load balancing is the process of distributing workloads across multiple computing resources. Cloud load balancing reduces costs associated with document management systems and maximizes availability of resources. It is a type of load balancing and not to be confused with Domain Name System (DNS) load balancing. While DNS load balancing uses software or hardware to perform the function, cloud load balancing uses services offered by various computer network companies. == Comparison With DNS load balancing == Cloud load balancing has an advantage over DNS load balancing as it can transfer loads to servers globally as opposed to distributing it across local servers. In the event of a local server outage, cloud load balancing delivers users to the closest regional server without interruption for the user. Cloud load balancing addresses issues relating to TTL reliance present during DNS load balancing. DNS directives can only be enforced once in every TTL cycle and can take several hours if switching between servers during a lag or server failure. Incoming server traffic will continue to route to the original server until the TTL expires and can create an uneven performance as different internet service providers may reach the new server before other internet service providers. Another advantage is that cloud load balancing improves response time by routing remote sessions to the best performing data centers. == Importance of Load Balancing == Cloud computing brings advantages in "cost, flexibility and availability of service users." Those advantages drive the demand for Cloud services. The demand raises technical issues in Service Oriented Architectures and Internet of Services (IoS)-style applications, such as high availability and scalability. 
As a major concern in these issues, load balancing allows cloud computing to "scale up to increasing demands" by efficiently allocating dynamic local workload evenly across all nodes. == Load Balancing Techniques == === Scheduling Algorithms === Opportunistic Load Balancing (OLB) is an algorithm that assigns workloads to nodes in arbitrary order. It is simple but does not consider the expected execution time on each node. Load Balance Min-Min (LBMM) assigns sub-tasks to the node that requires the minimum execution time. === Load Balancing Policies === Workload and Client Aware Policy (WCAP) specifies the unique and special property (USP) of requests and computing nodes. With the USP information, the scheduler can decide which node is most suitable to complete a request. WCAP makes the most of computing nodes by reducing their idle time. It also reduces performance time through searches based on content information. === A Comparative Study of Algorithms === Biased Random Sampling bases its job allocation on the network represented by a directed graph. For each execution node in this graph, in-degree means available resources and out-degree means allocated jobs. In-degree will decrease during job execution while out-degree will increase after job allocation. Active Clustering is a self-aggregation algorithm to rewire the network. The experimental result is that "Active Clustering and Random Sampling Walk predictably perform better as the number of processing nodes is increased" while the Honeyhive algorithm does not show the increasing pattern. == Client-side Load Balancer Using Cloud Computing == A load balancer forwards packets to web servers according to the different workloads on the servers. However, it is hard to implement a scalable load balancer because of both the "cloud's commodity business model and the limited infrastructure control allowed by cloud providers." The Client-side Load Balancer (CLB) solves this problem by using a scalable cloud storage service. 
CLB lets clients choose the back-end web server for dynamic content, while CLB itself delivers only the static content.
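The two scheduling algorithms described under Load Balancing Techniques can be contrasted in a small sketch. It rests on several assumptions: OLB is approximated by handing tasks to nodes in turn, the second function is the classical Min-Min heuristic rather than LBMM itself, and the task names, node names and execution times are hypothetical.

```python
import itertools


def olb(tasks, nodes):
    """OLB sketch (assumption: nodes are taken in turn).

    Each task goes to the next node in the cycle, with no regard for how
    long the task is expected to run there.
    """
    return {task: node for task, node in zip(tasks, itertools.cycle(nodes))}


def min_min(exec_time, nodes):
    """Classical Min-Min heuristic.

    exec_time[task][node] is the expected execution time (hypothetical input).
    Repeatedly pick the (task, node) pair with the smallest completion time.
    """
    ready = {node: 0.0 for node in nodes}  # time at which each node is free
    assignment = {}
    unassigned = set(exec_time)
    while unassigned:
        finish, task, node = min(
            (ready[n] + exec_time[t][n], t, n) for t in unassigned for n in nodes
        )
        assignment[task] = node
        ready[node] = finish
        unassigned.remove(task)
    return assignment, max(ready.values())  # mapping and overall makespan


# Hypothetical expected execution times for three tasks on two nodes.
EXEC_TIME = {
    "t1": {"A": 4, "B": 8},
    "t2": {"A": 3, "B": 2},
    "t3": {"A": 6, "B": 5},
}
NODES = ["A", "B"]

print(olb(list(EXEC_TIME), NODES))
print(min_min(EXEC_TIME, NODES))
```

With these numbers the OLB-style assignment puts t1 and t3 on node A, which then needs 10 time units, whereas Min-Min finishes everything in 7. The gap comes from the execution-time information that OLB ignores.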

Single address space operating system

In computer science, a single address space operating system (or SASOS) is an operating system that provides only one globally shared address space for all processes. In a single address space operating system, numerically identical (virtual memory) logical addresses in different processes all refer to exactly the same byte of data. In a traditional OS with private per-process address spaces, memory protection is based on address space boundaries ("address space isolation"). Single address space operating systems make translation and protection orthogonal, which in no way weakens protection. The core advantage is that pointers (i.e. memory references) have global validity: a pointer means the same thing regardless of which process uses it. This allows sharing pointer-connected data structures across processes, and making them persistent, i.e. storing them on backing store. Some processor architectures have direct support for protection independent of translation. On such architectures, a SASOS may be able to perform context switches faster than a traditional OS. Such architectures include Itanium and Version 5 of the Arm architecture, as well as capability architectures such as CHERI. A SASOS should not be confused with a flat memory model, which provides no address translation and generally no memory protection. In contrast, a SASOS makes protection orthogonal to translation: it may be possible to name a data item (i.e. know its virtual address) while not being able to access it. SASOS projects using hardware-based protection include the following: Angel; IBM i (formerly called OS/400); Iguana at NICTA, Australia; Mungi at NICTA, Australia; Nemesis; Opal; Scout; Sombrero. Related are OSes that provide protection through language-level type safety: Br1X; Genera; JX, a research Java OS; Phantom OS; Singularity; Theseus OS; Torsion.
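The separation of naming from access described above can be made concrete with a toy model. This is a minimal sketch, not how any of the listed systems is implemented: the class, the per-address rights table and the domain names are all hypothetical, and real systems enforce protection in hardware at page or capability granularity.

```python
class SingleAddressSpace:
    """Toy model: one global address map plus per-domain access rights."""

    def __init__(self):
        self.memory = {}  # one shared map: virtual address -> value
        self.rights = {}  # protection domain -> {address: set of "r"/"w"}

    def grant(self, domain, address, perms):
        """Give a protection domain read and/or write rights on an address."""
        self.rights.setdefault(domain, {})[address] = set(perms)

    def _check(self, domain, address, perm):
        # Protection is checked separately from translation: the address is
        # valid for everyone, but only some domains may use it.
        if perm not in self.rights.get(domain, {}).get(address, set()):
            raise PermissionError(f"{domain} lacks '{perm}' on {hex(address)}")

    def write(self, domain, address, value):
        self._check(domain, address, "w")
        self.memory[address] = value

    def read(self, domain, address):
        self._check(domain, address, "r")
        return self.memory.get(address, 0)


sas = SingleAddressSpace()
ADDR = 0x1000  # the same number names the same data in every domain
sas.grant("producer", ADDR, "rw")
sas.grant("consumer", ADDR, "r")

sas.write("producer", ADDR, 42)
print(sas.read("consumer", ADDR))  # same address, same data

try:
    sas.read("stranger", ADDR)  # knows the address but has no rights
except PermissionError as err:
    print("denied:", err)
```

The "stranger" domain can name the data item, since the address is globally valid, yet cannot access it. A flat memory model without protection would offer no such check.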

Futel

Futel is a public arts organization in Portland, Oregon dedicated to preserving and maintaining public telephone hardware and offering free phone and basic information services. Futel was founded by Karl Anderson, a former software engineer, and Elijah St. Clair. == Technology == Karl Anderson stated that one motivation for the project was to explore the idea of urban furniture. Other reasons were to preserve an important part of hacker history, and to salvage and re-use manufactured items at the end of their lifecycle. The original Futel phones were set up in Portland, Oregon. The organization cleans and repurposes old public payphones, which are often salvaged from Craigslist or scrappers. Using interface boxes, they are converted into VoIP phones which are made available publicly, with no cost for phone calls. Anderson has said the service runs on "Asterisk and OpenVPN and a lot of scripts." The payphones operate using publicly available internet connections. The phones have automated phone trees and users can make a call to local social services, to a weather forecast line, or access local transit information. Volunteers act as telephone operators, offering information about the Futel service, or are available for conversation. Users of Futel's phones may also access voicemail boxes. The system has a "wildcard line" where people can listen to samples of audio left on the main voicemail line along with commentary from Anderson and others. == Network == In February 2021, there were 10 Futel phones in Portland and 3 in other cities. Phones were set up in Detroit and Ypsilanti, Michigan, and Long Beach, Washington. The organization has provided free phone service for a Portland-area homeless encampment after receiving funding from the Awesome Foundation. In 2019 the organization reported that its phones had been used to make 12,000 phone calls. 
Futel also said its usage went up rather than down during the first year of the COVID-19 pandemic, when it outfitted its phone kiosks with handwashing stations and used volunteers to keep the phones clean. The project is funded primarily through grants and is staffed by volunteers. The project has inspired others such as the PhilTel project in Philadelphia and the RandTel project in Randolph, Vermont. Futel publishes a zine called Party Line.

Web series

A web series, also known as a short-form series or web show, is a collection of short scripted or unscripted online videos released on the Internet (i.e., World Wide Web), generally in episodic form. A single installment of a web series can be called a webisode or an episode. The scale of a web series is small, and a typical episode can be anywhere from 3 to 15 minutes long (though some may run up to 20 minutes). Web series first emerged in the mid-1990s and became more prominent in the early 2000s. Web series are distributed online on video-sharing websites and apps, such as YouTube, Vimeo, and TikTok, and can be watched on devices such as smartphones, tablets, desktops, laptops, and Smart TVs (or television sets connected to the Internet with a media streaming device). They can also be released on social media platforms. Because of the nature of the Internet, a web series may be interactive and immersive. Web series are classified as new media. Web series are different from streaming television series, as the latter are designed to be watched on streaming platforms such as Netflix, Amazon Prime Video, or Hotstar, with the streaming services offering original productions made for and by them, as well as acquiring the rights to distribute licensed content. The length of a streaming television series episode is 30 to 60 minutes (runtimes can also be longer). Although the design of a web series can be similar to that of a television series, its development and production do not entail the same financial investment required for a television series. The popularity of some web series, however, has led to them being optioned for television. Web series differ from short-form content in that the latter are vertical videos specifically designed for smartphone viewing and intended for fast-paced consumption, with runtimes typically ranging from less than one minute to three minutes. 
There are film festivals for web series, like Webfest Berlin, NYC Web Fest, LA Web Fest, and Vancouver Web Fest. Awards organizations have also been established to celebrate excellence in web series, such as the Streamys, Webbys, IAWTV Awards, and Indie Series Awards. Most major award ceremonies have also created web series and digital media award categories, including the Emmy Awards and the Canadian Screen Awards. == History == === 1990s === In April 1995, "Global Village Idiots", an episode of the reality-based program Rox on public access cable television in Bloomington, Indiana, was uploaded to the Internet, making Rox the first show distributed via the web. The same year, Scott Zakarin created The Spot, an episodic online story that integrated photos, videos, and blogs into the storyline. Likened to Melrose Place-on-the-Web, The Spot featured a rotating cast of characters playing trendy twenty-somethings who rented rooms in a fabled Santa Monica, California beach house called "The Spot". The Spot earned Infoseek's "Cool Site of the Year," an award which later became the Webby. In January 1999, Showtime licensed the animated sci-fi web series WhirlGirl, making it the first independently produced web series licensed by a national television network. In February 1999, the show premiered simultaneously on Showtime and online. The character occasionally appeared on Showtime, for example, hosting a "Lethal Ladies" programming block, but spent most of her time online, appearing in 100 webisodes. === 2000s === As broadband bandwidth increased in speed and availability, delivering high-quality video over the Internet became a reality. In the early 2000s, the Japanese anime industry began broadcasting original net animation (ONA), a type of original video animation (OVA) series, on the Internet. Early examples of the ONA series include Infinite Ryvius: Illusion (2000), Ajimu (2001), and Mahou Yuugi (2001). 
In 2000, The Brothers Chaps launched the Adobe Flash-created web series Homestar Runner. After being put on hiatus in 2010, it returned in 2014. In 2002, Matt Jolly (better known as "Krinkels") released the first episode of Madness Combat to Newgrounds. The show is still ongoing, with the latest episode "Madness Combat 12: Contravention" released on Twitch in September 2024. In 2003, Microsoft launched MSN Video, offering NBC-related content. Its web series, Weird TV 2000, a spin-off of the syndicated television series Weird TV, featured dozens of shorts, comedy sketches, and mini-documentaries produced exclusively for MSN Video. The video-sharing site YouTube was launched in early 2005, allowing users to share television programs. YouTube co-founder Jawed Karim said the inspiration for YouTube first came from Janet Jackson's role in the 2004 Super Bowl incident, when her breast was exposed during her performance, and later from the 2004 Indian Ocean tsunami. Karim could not easily find video clips of either event online, which led to the idea of a video-sharing site. From 2003 to 2006, many independent web series gained significant popularity, most notably the science fiction series Red vs. Blue by Rooster Teeth. The series was distributed independently via online portals YouTube and Revver, as well as the Rooster Teeth website, acquiring over 100 million social media views during its run. (Rooster Teeth would eventually create the computer-animated web series RWBY in 2013.) In 2004, the adult-animated series Salad Fingers was created, which amassed a cult following. The comedy show The Burg, hailed as the internet's first sitcom and starring Kelli Giddish and Lindsey Broad, rapidly gained an audience and press attention before its creators signed a creation deal with Michael Eisner. 
The drama Sam Has 7 Friends, which ran in the summer and fall of 2006, was nominated for a Daytime Emmy Award and was temporarily removed from the Internet when it was also acquired by Eisner. In 2004–2005, Spanish producer Pedro Alonso Pablos recorded a series of video interviews featuring actors and directors such as Guillermo del Toro, Santiago Segura, Álex de la Iglesia, and Keanu Reeves, which were distributed through his own website. lonelygirl15, California Heaven, The Burg, and Sam Has 7 Friends also gained popularity during this time, acquiring audiences in the millions. (Science fiction thriller lonelygirl15 was so successful that it secured a sponsorship deal with Neutrogena in 2007.) In 2004, Stewart St. John, executive producer and head writer of the 1990s web series The Spot, revived the brand for online audiences as The Spot (2.0), with a new cast, and as a separate soap opera on Sprint PCS Vision-enabled cell phones, creating the first American mobile phone series. St. John and partner Todd Fisher produced over 2,500 daily videos of the mobile soap, driving story lines across platforms to its web counterpart. In 2007, the creators of lonelygirl15 followed up on the show's success with KateModern, a comedy-drama series that debuted on social network Bebo and took place in the same fictional universe as their previous show. Big Fantastic created and produced the soap opera Prom Queen, financed and distributed by Michael Eisner's production firm Vuguru, and debuted the series on MySpace. Vuguru partnered with Mark Cuban's channel HDNet to release All-for-nots, a mockumentary series by The Burg creators Kathleen Grace and Thom Woodley, which debuted at the SXSW Festival in 2008. These web series highlighted interactivity with the audience in addition to the narrative on relatively low budgets. In contrast, the eight-episode show Sanctuary, starring actor/producer Amanda Tapping, cost $4.3 million to produce. 
Both Sanctuary and Prom Queen were nominated for a Daytime Emmy Award. Award-winning producer/director Marshall Herskovitz created the drama Quarterlife, which debuted on MySpace and was later distributed on NBC. In 2008, major television studios began releasing web series, such as the ABC comedy show Squeegies, the NBC sci-fi show Gemini Division, and the Bravo reality series The Malan Show. Warner Bros. relaunched The WB as an online network beginning with original mystery web series, Sorority Forever, created and produced by Big Fantastic and executive produced by McG. Meanwhile, MTV announced a new original web series created by Craig Brewer, $5 Cover, that brought together the indie music world and new media expansion. Joss Whedon created, produced, and self-financed musical comedy-drama Dr. Horrible's Sing-Along Blog starring Neil Patrick Harris and Felicia Day. Big Fantastic wrote and produced Foreign Body, a mystery web series that served as a prequel to Robin Cook's novel of the same name. Beckett and Goodfried founded a new Internet studio, EQAL, and produced a spin-off of lonelygirl15 titled LG15: The Resistance. The mainstream press began to provide coverage. In the United Kingdom, KateModern ended its run on Bebo. Bebo also hosted a six-month-long reality travel show, The Gap Year, produced by Endemol UK, and produced an interactive sci-fi drama Kirill for

Cache language model

A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution. Statistical language models are key components of speech recognition systems and of many machine translation systems: they tell such systems which possible output word sequences are probable and which are improbable. The particular characteristic of a cache language model is that it contains a cache component and assigns relatively high probabilities to words or word sequences that occur elsewhere in a given text. The primary, but by no means sole, use of cache language models is in speech recognition systems. To understand why it is a good idea for a statistical language model to contain a cache component one might consider someone who is dictating a letter about elephants to a speech recognition system. Standard (non-cache) N-gram language models will assign a very low probability to the word "elephant" because it is a very rare word in English. If the speech recognition system does not contain a cache component, the person dictating the letter may be annoyed: each time the word "elephant" is spoken another sequence of words with a higher probability according to the N-gram language model may be recognized (e.g., "tell a plan"). These erroneous sequences will have to be deleted manually and replaced in the text by "elephant" each time "elephant" is spoken. If the system has a cache language model, "elephant" will still probably be misrecognized the first time it is spoken and will have to be entered into the text manually; however, from this point on the system is aware that "elephant" is likely to occur again – the estimated probability of occurrence of "elephant" has been increased, making it more likely that if it is spoken it will be recognized correctly. 
Once "elephant" has occurred several times, the system is likely to recognize it correctly every time it is spoken until the letter has been completely dictated. This increase in the probability assigned to the occurrence of "elephant" is an example of a consequence of machine learning and more specifically of pattern recognition. There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability). The cache language model was first proposed in a paper published in 1990, after which the IBM speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in word-error rates once the first few hundred words of a document had been dictated. A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium training data sizes". The development of the cache language model has generated considerable interest among those concerned with computational linguistics in general and statistical natural language processing in particular: recently, there has been interest in applying the cache language model in the field of statistical machine translation. The success of the cache language model in improving word prediction rests on the human tendency to use words in a "bursty" fashion: when one is discussing a certain topic in a certain context, the frequency with which one uses certain words will be quite different from their frequencies when one is discussing other topics in other contexts. 
The traditional N-gram language models, which rely entirely on information from a very small number (four, three, or two) of words preceding the word to which a probability is to be assigned, do not adequately model this "burstiness". Recently, the cache language model concept – originally conceived for the N-gram statistical language model paradigm – has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the recurrent neural network (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity. Another recent line of research involves incorporating a cache component in a feed-forward neural language model (FN-LM) to achieve rapid domain adaptation.
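The "elephant" scenario described above can be sketched numerically. The sketch assumes one common formulation, a linear interpolation between a fixed base model and a cache of recently seen words; the class name, the unigram base model, the interpolation weight and the probabilities are all hypothetical choices made for illustration.

```python
from collections import Counter


class CacheUnigramLM:
    """Base unigram model linearly interpolated with a cache of recent words."""

    def __init__(self, base_probs, cache_weight=0.2, floor=1e-6):
        self.base = base_probs            # static probabilities (hypothetical)
        self.cache_weight = cache_weight  # interpolation weight (assumption)
        self.floor = floor                # fallback for words unknown to the base
        self.cache = Counter()            # counts of words seen in this text

    def observe(self, word):
        """Record a word that has just occurred in the current text."""
        self.cache[word] += 1

    def prob(self, word):
        p_base = self.base.get(word, self.floor)
        total = sum(self.cache.values())
        if total == 0:
            return p_base  # empty cache: fall back to the base model alone
        p_cache = self.cache[word] / total
        return (1 - self.cache_weight) * p_base + self.cache_weight * p_cache


lm = CacheUnigramLM({"the": 0.05, "elephant": 1e-5})
before = lm.prob("elephant")  # rare word: only the base probability

lm.observe("elephant")        # the word has now been entered once
lm.observe("the")
after = lm.prob("elephant")

print(before, after)
```

After a single occurrence, the estimated probability of "elephant" rises by several orders of magnitude, which is the effect that lets a recognizer prefer it over competing word sequences later in the same text.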

Homeboyz Interactive

Homeboyz Interactive (HBI) was a faith-based recruitment, training and job placement non-profit business in Milwaukee, Wisconsin, United States, founded by a Jesuit brother in 1996 to transform gang members into productive workers. == History == James Holub, a former Jesuit brother affiliated with Wheeling Jesuit University, asked gang members on the South Side of Milwaukee, Wisconsin, how they could be helped to break the cycle of poverty and violence. The youth suggested that they be trained for work they found exciting. To attract interest, the training had to lead to jobs that paid at least a living wage, and computer skills seemed the most attractive. The non-profit Homeboyz Interactive was established to prepare professionals in web design, application development, and PC/network support. This non-profit outfit spawned the for-profit web design firm HBI Consulting, which provided trainees with work experience. It turned out more than 20 teachers yearly for computer and computer network programs for high schools and other clients, as well as for computer service providers. Some graduates of the program continued their education, some founded their own business, and others continued working at HBI. The Economist described this effort as "turning thugs into programmers" on Milwaukee's South Side, which has proportionally twice as many murders as New York. Holub had "buried his 28th gang member" before he implemented the Homeboyz plan, with the understanding that "nothing stops a bullet like a job." The programs would pass through about 80 prospects a year who successfully completed training, and provide them with a job while they studied for their high school equivalency test, before they were asked to decide in which direction to go. Most accepted a job or went on to community college, but about 25 entered the Homeboyz training for computer programmers. Of the first 150 graduates of this program, none lost their job; their average pay after two years was US$63,000. 
Some preferred to return to full-time work at HBI. By 2002, a total of 142 people had graduated from HBI training and moved into full-time IT careers. The training curriculum as of 2000 included JavaScript and Photoshop, among other web-development tools. In 2000, HBI received a 14% ownership stake in reEmploy.com, a payrolling company, in exchange for developing an electronic time sheet. As of 2001, HBI Consulting, the for-profit web design firm, had 72 clients. Among those clients were GE Medical, Toyota Forklift, Northwestern Mutual Life, Verizon Wireless, BP, and Marquette University. Companies at which graduates of HBI's training programs secured positions have included Northwestern Mutual, Manpower Inc., the United Community Center in Milwaukee, and EKI Consulting. A pair of graduates also started their own company in 2002, Innovative Source, a web design firm, which itself has had clients such as the University of Wisconsin-Milwaukee and the Milwaukee Women's Center. Starting their own consulting firms was a common path forward for graduates. In 2004, HBI received a grant for General Support from the Vine and Branches Foundation in the amount of US$120,000. The product Project Foundry found its start in the difficulty of managing project-based learning across dozens of students with widely varying levels of skill, a problem encountered by Shane Krukowski, who developed the software while teaching at HBI. Krukowski subsequently founded an eponymous company to commercialize the software through a subscription-based business model. Some came to Homeboyz through the criminal courts or Department of Corrections. A Jesuit Volunteer (JV) was assigned to work with the program, and to add a spiritual dimension through regular reflection together. Gradually the market began prioritizing graphic design and Flash images over site construction. After 2006, Homeboyz Interactive morphed into several spinoffs and ceased to exist as a separate entity.