Learning to rank

Learning to rank

Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data. == Applications == === In information retrieval === Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which got clicks from users), query chains, or such search engines' features as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, Boolean model, weighted AND, or BM25. This phase is called top- k {\displaystyle k} document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents. === In other areas === Learning to rank algorithms have been applied in areas other than information retrieval: In machine translation for ranking a set of hypothesized translations; In computational biology for ranking candidate 3-D structures in protein structure prediction problems; In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. == Feature vectors == For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representation of documents. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples): Query-independent or static features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. Query-level features or query features, which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language modeling scores of document's zones (title, body, anchors text, URL) for a given query; Lengths and IDF sums of document's zones; Document's PageRank, HITS ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called feature engineering. == Evaluation measures == There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: Mean average precision (MAP); DCG and NDCG; Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on top n documents; Mean reciprocal rank; Kendall's tau; Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: Expected reciprocal rank (ERR); Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document. == Approaches == Learning to Rank approaches are often categorized using one of three approaches: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents are ordered). Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice, x {\displaystyle x} denotes an object to be evaluated, for example, a document or an image, f ( x ) {\displaystyle f(x)} denotes a single-value hypothesis, h ( ⋅ ) {\displaystyle h(\cdot )} denotes a bi-variate or multi-variate function and L ( ⋅ ) {\displaystyle L(\cdot )} denotes the loss function. === Pointwise approach === In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function f ( x ) {\displaystyle f(x)} predicting the real-value or ordinal score of a document x {\displaystyle x} using the loss function L ( f ; x j , y j ) {\displaystyle L(f;x_{j},y_{j})} . A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values. === Pairwise approach === In this case, the learning-to-rank problem is approximated by a classification problem — learning a binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} that can tell which document is better in a given pair of documents. The classifier shall take two documents as its input and the goal is to minimize a loss function L ( h ; x u , x v , y u , v ) {\displaystyle L(h;x_{u},x_{v},y_{u,v})} . The loss function typically reflects the number and magnitude of inversions in the induced ranking. In many cases, the binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} is implemented with a scoring function f ( x ) {\displaystyle f(x)} . As an example, RankNet adapts a probability model and defines h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} as the estimated probability of the document x u {\displaystyle x_{u}} has higher quality than x v {\displaystyle x_{v}} : P u , v ( f ) = CDF ( f ( x u ) − f ( x v ) ) , {\displaystyle P_{u,v}(f)={\text{CDF}

Packed pixel

In packed pixel or chunky framebuffer organization, the bits defining each pixel are clustered and stored consecutively. For example, if there are 16 bits per pixel, each pixel is represented in two consecutive (contiguous) 8-bit bytes in the framebuffer. If there are 4 bits per pixel, each framebuffer byte defines two pixels, one in each nibble. The latter example is as opposed to storing a single 4-bit pixel in a byte, leaving 4 bits of the byte unused. If a pixel has more than one channel, the channels are interleaved when using packed pixel organization. Packed pixel displays were common on early microcomputer system that shared a single main memory for both the central processing unit (CPU) and display driver. In such systems, memory was normally accessed a byte at a time, so by packing the pixels, the display system could read out several pixels worth of data in a single read operation. Packed pixel is one of two major ways to organize graphics data in memory, the other being planar organization, where each pixel is made of individual bits stored in their own plane. For a 4-bit color value, memory would be organized as four screen-sized planes of one bit each and a single pixel's value built up by selecting the appropriate bit from each plane. Planar organization has the advantage that the data can be accessed in parallel, and is used when memory bandwidth is an issue.

Full30

Full30 was an American online video-sharing platform primarily dedicated to firearms and shooting sports-related content. The service was established in 2014 by Tim Harmsen and Mark Hammonds as a result of YouTube's increasing restrictions on gun-related videos. == History == After the 2018 Parkland high school shooting, many companies attempted to distance themselves from any association with the firearms industry. As a result, YouTube began demonetizing and sometimes outright deleting firearms-related videos, and in one case, popular YouTube poster Hickok45's channel was completely deleted but later restored. In response, Harmsen, who operates the Military Arms Channel on YouTube, decided to create his own video-hosting website to allow himself and other firearms content creators a platform free from such restrictions; he named the website Full30 — a reference to the popular 30-round STANAG magazine. In July 2020, site representatives announced the site had new ownership. By the end of 2022, the site began to be redirected to a series of other websites. By 2025, it was largely deactivated with the front page replaced by a form to be filled out to receive "updates", with no other explanation. == Contributors == Hickok45 Military Arms Channel Forgotten Weapons Bavarian Shooter Liberty Doll CloverTac

Global digital divide

The global digital divide describes global disparities, primarily between developed and developing countries, in regards to access to computing and information resources such as the Internet and the opportunities derived from such access. The Internet is expanding very quickly, and not all countries—especially developing countries—can keep up with the constant changes. The term "digital divide" does not necessarily mean that someone does not have technology; it could mean that there is simply a difference in technology. These differences can refer to, for example, high-quality computers, fast Internet, technical assistance, or telephone services. == Statistics == There is a large inequality worldwide in terms of the distribution of installed telecommunication bandwidth. In 2014 only three countries (China, US, Japan) host 50% of the globally installed bandwidth potential (see pie-chart Figure on the right). This concentration is not new, as historically only ten countries have hosted 70–75% of the global telecommunication capacity (see Figure). The U.S. lost its global leadership in terms of installed bandwidth in 2011, being replaced by China, which hosts more than twice as much national bandwidth potential in 2014 (29% versus 13% of the global total). == Versus the digital divide == The global digital divide is a special case of the digital divide; the focus is set on the fact that "Internet has developed unevenly throughout the world" causing some countries to fall behind in technology, education, labor, democracy, and tourism. The concept of the digital divide was originally popularized regarding the disparity in Internet access between rural and urban areas of the United States of America; the global digital divide mirrors this disparity on an international scale. The global digital divide also contributes to the inequality of access to goods and services available through technology. Computers and the Internet provide users with improved education, which can lead to higher wages; the people living in nations with limited access are therefore disadvantaged. This global divide is often characterized as falling along what is sometimes called the North–South divide of "northern" wealthier nations and "southern" poorer ones. == Obstacles to a solution == Some people argue that necessities need to be considered before achieving digital inclusion, such as an ample food supply and quality health care. Minimizing the global digital divide requires considering and addressing the following types of access: === Physical access === Involves "the distribution of ICT devices per capita…and land lines per thousands". Individuals need to obtain access to computers, landlines, and networks in order to access the Internet. This access barrier is also addressed in Article 21 of the convention on the Rights of Persons with Disabilities by the United Nations. === Financial access === The cost of ICT devices, traffic, applications, technician and educator training, software, maintenance, and infrastructures require ongoing financial means. Financial access and "the levels of household income play a significant role in widening the gap". === Socio-demographic access === Empirical tests have identified that several socio-demographic characteristics foster or limit ICT access and usage. Among different countries, educational levels and income are the most powerful explanatory variables, with age being a third one. While a Global Gender Gap in access and usage of ICT's exist, empirical evidence shows that this is due to unfavorable conditions concerning employment, education and income and not to technophobia or lower ability. In the contexts understudy, women with the prerequisites for access and usage turned out to be more active users of digital tools than men. In the US, for example, the figures for 2018 show 89% of men and 88% of women use the Internet. === Cognitive access === In order to use computer technology, a certain level of information literacy is needed. Further challenges include information overload and the ability to find and use reliable information. === Design access === Computers need to be accessible to individuals with different learning and physical abilities including complying with Section 508 of the Rehabilitation Act as amended by the Workforce Investment Act of 1998 in the United States. === Institutional access === In illustrating institutional access, Wilson states "the numbers of users are greatly affected by whether access is offered only through individual homes or whether it is offered through schools, community centers, religious institutions, cybercafés, or post offices, especially in poor countries where computer access at work or home is highly limited". === Political access === Guillen & Suarez argue that "democratic political regimes enable faster growth of the Internet than authoritarian or totalitarian regimes." The Internet is considered a form of e-democracy, and attempting to control what citizens can or cannot view is in contradiction to this. Recently situations in Iran and China have denied people the ability to access certain websites and disseminate information. Iran has prohibited the use of high-speed Internet in the country and has removed many satellite dishes in order to prevent the influence of Western culture, such as music and television. === Cultural access === Many experts claim that bridging the digital divide is not sufficient and that the images and language needed to be conveyed in a language and images that can be read across different cultural lines. A 2013 study conducted by Pew Research Center noted how participants taking the survey in Spanish were nearly twice as likely not to use the internet. == Examples == In the early 21st century, residents of developed countries enjoy many Internet services which are not yet widely available in developing countries, including: Mobile phones and small electronic communication devices; E-communities and social-networking; Fast broadband Internet connections, enabling advanced Internet applications; Affordable and widespread Internet access, either through personal computers at home or work, through public terminals in public libraries and Internet cafes, and through wireless access points; E-commerce enabled by efficient electronic payment networks like credit cards and reliable shipping services; Virtual globes featuring street maps searchable down to individual street addresses and detailed satellite and aerial photography; Online research systems which enable users to peruse newspaper and magazine articles that may be centuries old, without having to leave home; Electronic readers such as Kindle, Sony Reader, Samsung Papyrus and Iliad by iRex Technologies; Price engines which help consumers find the best possible online prices and similar services which find the best possible prices at local retailers; Electronic services delivery of government services, such as the ability to pay taxes, fees, and fines online. Further civic engagement through e-government and other sources such as finding information about candidates regarding political situations. == Proposed remedies == There are four specific arguments why it is important to "bridge the gap": Economic equality – For example, the telephone is often seen as one of the most important components, because having access to a working telephone can lead to higher safety. If there were to be an emergency, one could easily call for help if one could use a nearby phone. In another example, many work-related tasks are online, and people without access to the Internet may not be able to complete work up to company standards. The Internet is regarded by some as a basic component of civic life that developed countries ought to guarantee for their citizens. Additionally, welfare services, for example, are sometimes offered via the Internet. Social mobility – Computer and Internet use is regarded as being very important to development and success. However, some children are not getting as much technical education as others, because lower socioeconomic areas cannot afford to provide schools with computer facilities. For this reason, some kids are being separated and not receiving the same chance as others to be successful. Democracy – Some people believe that eliminating the digital divide would help countries become healthier democracies. They argue that communities would become much more involved in events such as elections or decision making. Economic growth – It is believed that less-developed nations could gain quick access to economic growth if the information infrastructure were to be developed and well used. By improving the latest technologies, certain countries and industries can gain a competitive advantage. While these four arguments are meant to lead to a solution to the digital divide, there are a couple of other components that need to be considered. The first one is rural living versus s

Gaumina

Gaumina is the largest interactive agency in the Baltics, providing services of web design, web development, online advertising, video, multimedia, mobile and viral. The company works on projects for Procter & Gamble, Nokia, Nissan, Unilever, YX Energi, 7 Up, Vodafone, MTV, Dunnes Stores, Philip Morris, FIBA Europe as well as Irish public sector. == History == Founded in 1998, Gaumina accounts for 39 percent of the Lithuanian interactive market and has completed more than 2,000 online projects. Since 2004 the company has been operating in the UK and Ireland as Gaumina.co.uk. In 2007 Gaumina gained wide media coverage for winning three awards in three days. A website developed by Gaumina won the Best Social Networking website award at the same the Irish Golden Spiders awards. A website developed by Gaumina was named among the 21 best European multimedia projects of 2007 in the final of Europrix Top Talent Award in Austria. The company was also named one of the winners of the national Innovation Prize 2007, awarding the Lithuania's most innovative companies, in the category of Innovative Enterprise. The agency was named "Digital Agency of the Year" by International advertising festival Golden Hammer in September 2008. The agency also won the main prize at the best at Best Use of Film, Digital Animation or Motion Graphics category by the Irish Golden Spider awards in November 2008. Gaumina is currently managed by CEO Darius Bagdžiūnas.

Sprayprinter

SprayPrinter is a device that attaches to aerosol paint cans whereby users can print images via Bluetooth from a smartphone onto a wall or almost any surface. == History == The technology behind SprayPrinter was developed by Mihkel Joala. He explained in a 2016 interview with New Atlas that his idea was inspired by the modern car engine and the Nintendo Wii console. "Engines nowadays use extremely fast valves to spray fuel to [the] combustion chamber," says Joala. "I realized I can use them to shoot paint with pinpoint accuracy." As of December 2021, the company appears to be no longer selling products. == Awards and Recognitions == In 2015, SprayPrinter received €8,000 from the Estonian prototyping contest Prototron for its initial prototype. In 2016, the SprayPrinter team won the grand prize of €30,000 from the televised pitching competition Ajujaht.

News ticker

A news ticker (sometimes called a crawler, crawl, slide, zipper, ticker tape, or chyron) is a horizontal or vertical (depending on the language's writing system) text-based display either in the form of a graphic that typically resides in the lower third of the screen space on a television station or network (usually during news programming) or as a long, thin scoreboard-style display seen around the facades of some offices or public buildings dedicated to presenting headlines or minor pieces of news. It is an evolution of the paper strips tapes, a continuous paper print-out of stock quotes from a printing telegraph which was mainly used to transmit companies' share price information over telegraph lines before the advance of technology in the 1960s. News tickers have been used in Europe in countries such as United Kingdom, Germany and Ireland for some years; they are also used in several Asian countries and Australia. In the United States, tickers were long used on a special event basis by broadcast television stations to disseminate weather warnings, school closings, and election results. Sports telecasts occasionally used a ticker to update other contests in progress before the expansion of cable news networks and the internet for news content. In addition, some ticker displays are used to relay continuous business and financial information. Most tickers are traditionally displayed in the form of scrolling text running from right to left across the screen or building display (or in the opposite direction for right-to-left writing systems such as Arabic script and Hebrew), allowing for headlines of varying degrees of detail; some used by television broadcasters, however, display stories in a static manner (allowing for the seamless switching of each story individually programmed for display) or utilize a "flipping" effect (in which each individual headline is shown for a few seconds before transitioning to the next, instead of scrolling across the screen, usually resulting in a relatively quicker run through of all of the information programmed into the ticker). Since the growth in usage of the World Wide Web, some news tickers have syndicated news stories posted largely on websites of broadcasters or by other independent news agencies. == Current uses == === Television === The presentation of headlines or other information in a news ticker has become a common element of many different news networks. The use of the ticker has differed on a number of channels: News networks and local newscasts commonly use a setup in which news headlines are scrolled across an area near the bottom of the screen, though some variations have formed, such as showing one headline at a time with a scrolling or "flipper" effect. Financial news channels use two or more tickers displaying company shares prices and business headlines. Networks with a focus on sports often use a slightly different system, where scores and statuses of ongoing and finished games are displayed one by one, along with minor sports highlights, statistics and sports news headlines. They are typically divided into categories devoted to specific leagues and events (with college basketball and football usually focusing on the top 25 ranked teams on the AP Poll, occasionally supplemented by sections for specific conferences). Some programs, including news-based programs emphasizing viewer interactivity, or special events, may also use tickers to display messages and reactions from viewers and others that relate to the program. These comments are often sourced from social networking services such as Facebook and Twitter, typically curating comments from a specific page or hashtag. Due to their current prevalence, they have been occasionally been made targets of pranks and vandalism. In one such example, News 14 Carolina allowed viewers to submit relevant information such as school closings or traffic delays via telephone or the Internet that would be incorporated into the ticker; the system was exploited in February 2004 to display humorous and crude messages, including the infamous "All your base are belong to us". Occasionally messages intended for training accidentally end up being put on the live ticker as happened on BBC News in 2022 when "Weather rain everywhere" and "Manchester United are rubbish" appeared on the live news ticker. Some businesses and organizations have utilized tickers intended for relaying weather-related closings as a surreptitious source for free guerrilla marketing, proclaiming they were open rather than closed and giving their phone number if possible, allowing them to 'advertise' on a television station all day for free. Since then, many stations have required pre-registration of businesses or organizations with an authorized representative and a signed affidavit on company letterhead affirming their authenticity, along with filtering out unfamiliar businesses and organizations, before being able to display their closing announcements. Stations also confirm all closings involving school districts with authorized officials to prevent situations in which students either show up to canceled classes in dangerous conditions, or do not attend school due to an erroneous, prank-submitted, or false listing. === On personal computers === Various applications have been developed over time to install news tickers on personal computer desktops using RSS feeds from news organizations, which are displayed in a fashion similar to those used by television channels but enable the user to access to underlying news stories, a feature not offered by traditional television channels. The Bloomberg Terminal and other financial information-tracking programs and devices also utilize tickers. A ticker may also be used as an unobtrusive method by businesses in order to deliver important information to their staff. The ticker can be set to reappear, stay on screen, or be put into a retractable mode (where a small tab is left visible on-screen). In the United Kingdom, broadcasters have stopped using this technology as other forms of communications have become available and increased in popularity. BBC News and Sky News discontinued their respective desktop tickers in March 2011 and 2012 to focus on other products, such as smartphone applications, to deliver updated information on breaking news and sport stories. === News tickers on buildings === Since the advent of the telegraph, newspapers commonly used their buildings to share the latest headlines. At first simple chalkboard signs were used for bulletins, but limelight illumination, electric lights, magic lantern projections, and other novel techniques were later employed. The method of using electric lights to spell out moving letters was invented by Frank C. Reilly (August 20, 1888 – April 10, 1947) and patented in 1923. Reilly called his invention the Motograph News Bulletin. In 1928, The New York Times installed a Motograph News Bulletin to display news headlines on the sides of Times Tower. The display was 388 feet (118 m) long, 5 feet (1.5 m) high, and employed over 14,800 light bulbs. Popularly known as the "Zipper", the sign remained in use until the building was sold in 1961. The sign was darkened during World War II to comply with wartime lighting restrictions. The Motograph operated until 1994 and was replaced by an electronic version in 1995, which was in turn removed in 2017 due to the replacement of all individual screens on the front of One Times Square with a 350 foot (110 m)-tall LED billboard in 2018. Ticker displays appear today on the exterior of the News Corp Building, which houses the headquarters for Fox News Channel/News Corp in the west extension of Manhattan's Rockefeller Center, as well as one that displays delayed stock market data that is located in Times Square. NASDAQ itself features a large display screen on the facade of the NASDAQ MarketSite building in Times Square. The Reuters buildings at Canary Wharf and in Toronto have news and stock tickers; the latter type features market data for the New York Stock Exchange, NASDAQ and London Stock Exchange, while the Toronto building's ticker also includes quotes from the Toronto Stock Exchange. A red-LED ticker was added to the perimeter of 10 Rockefeller Center in 1994, as the building was being renovated to accommodate the studios for NBC's Today. Placed at the juncture of the first and second floors, the ticker is visible to spectators in Rockefeller Plaza and passersby on West 49th Street and updates continuously, even at times when Today is not being produced and broadcast. As of 2015, the ticker strip is only a small part of a large two-floor LCD video display that is placed within the window of the studio showing promotional information. The Martin Place Headquarters of Seven News, the news division of Australian television broadcaster Seven Network, also incorporates a ticker that wraps around the building. == In popular culture == The use of new