IPUMS

IPUMS

IPUMS, originally the Integrated Public Use Microdata Series, is the world's largest individual-level population database. IPUMS consists of microdata samples from United States (IPUMS-USA) and international (IPUMS-International) census records, as well as data from U.S. and international surveys. The records are converted into a consistent format and made available to researchers through a web-based data dissemination and analysis system. IPUMS is housed at the Institute for Social Research and Data Innovation (ISRDI), an interdisciplinary research center at the University of Minnesota, under the direction of Professor Steven Ruggles. == Description == IPUMS includes all persons enumerated in the United States censuses from 1850 to 1950 (though, the 1890 census is missing because it was destroyed in a fire) and from the American Community Survey since 2000 and the Current Population Survey since 1962. IPUMS includes household-level data for United States Censuses from 1790 to 1840, due to the first six censuses only including the name of the head of household, with tallied household totals following. IPUMS provides consistent variable names, coding schemes, and documentation across all the samples, facilitating the analysis of long-term change. IPUMS-International includes countries from Africa, Asia, Europe, and Latin America for 1960 forward. The database currently includes more than a billion individuals enumerated in 365 censuses from 94 countries around the world. IPUMS-International converts census microdata for multiple countries into a consistent format, allowing for comparisons across countries and time periods. Special efforts are made to simplify use of the data while losing no meaningful information. Comprehensive documentation is provided in a coherent form to facilitate comparative analyses of social and economic change. Additional databases in the IPUMS family include the: North Atlantic Population Project (NAPP) IPUMS National Historical Geographic Information System (NHGIS) IPUMS Health Surveys IPUMS Global Health IPUMS Time Use The Journal of American History described the effort as "One of the great archival projects of the past two decades." Liens Socio, the French portal for the social sciences, gave IPUMS the only “best site” designation that has gone to any non-French website, writing “IPUMS est un projet absolument extraordinaire...époustouflante [mind-blowing]!” The official motto of IPUMS is "use it for good, never for evil." All public IPUMS data and documentation are available online free of charge.

Local coordinates

Local coordinates are the ones used in a local coordinate system or a local coordinate space. Simple examples: Houses. In order to work in a house construction, the measurements are referred to a control arbitrary point that will allow to check it: stick/sticks on the ground, steel bar, nails... Addresses. Using house numbers to locate a house on a street; the street is a local coordinate system within a larger system composed of city townships, states, countries, postal codes, etc. Local systems exist for convenience. On ancient times, every work was made on relative bases as there was no conception of global systems. Practically, it is better to use local systems for small works as houses, buildings... For most of the applications, it is desired the position of one element relative to one building or location, and in a more local way, relative to one furniture or person. In a regular way, you will not give your position by geographical coordinates rather than "I am 15 meters away of the entry to the building". So it is a pretty common way to locate things. It is possible to bring latitude and longitude for all terrestrial locations, but unless one has a highly precise GPS device or you make astronomical observations, this is impractical. It is much simpler to use a tape, a rope, a chain... The position information (global) should be transformed into a location. Position refers to a numeric or symbolic description within a spatial reference system, whereas location refers to information about surrounding objects and their interrelationships. (Topological space) == Use == In computer graphics and computer animation, local coordinate spaces are also useful for their ability to model independently transformable aspects of geometrical scene graphs. When modeling a car, for example, it is desirable to describe the center of each wheel with respect to the car's coordinate system, but then specify the shape of each wheel in separate local spaces centered about these points. This way, the information describing each wheel can be simply duplicated four times, and independent transformations (e.g., steering rotation) can be similarly effected. Bounding volumes of objects may be described more accurately using extents in the local coordinates, (i.e. an object oriented bounding box, contrasted with the simpler axis aligned bounding box). The trade-off for this flexibility is additional computational cost: the rendering system must access the higher-level coordinate system of the car and combine it with the space of each wheel in order to draw everything in its proper place. Local coordinates also afford digital designers a means around the finite limits of numerical representation. The tread marks on a tire, for example, can be described using millimeters by allowing the whole tire to occupy the entire range of numeric precision available. The larger aspects of the car, such as its frame, might be described in centimeters, and the terrain that the car travels on could be specified in meters. In differential topology, local coordinates on a manifold are defined by means of an atlas of charts. The basic idea behind coordinate charts is that each small patch of a manifold can be endowed with a set of local coordinates. These are collected together into an atlas, and stitched together in such a way that they are self-consistent on the manifold. In Cartography and Maps, the traditional way of works are local datum. With a local datum the land can be mapped on relative small areas as a country. With the need of global systems, the transformations on between datum became a problem, so geodetic datum have been created. More than 150 local datum have been used in the world.

Birkhoff algorithm

Birkhoff's algorithm (also called Birkhoff-von-Neumann algorithm) is an algorithm for decomposing a bistochastic matrix into a convex combination of permutation matrices. It was published by Garrett Birkhoff in 1946. It has many applications. One such application is for the problem of fair random assignment: given a randomized allocation of items, Birkhoff's algorithm can decompose it into a lottery on deterministic allocations. == Terminology == A bistochastic matrix (also called: doubly-stochastic) is a matrix in which all elements are greater than or equal to 0 and the sum of the elements in each row and column equals 1. An example is the following 3-by-3 matrix: ( 0.2 0.3 0.5 0.6 0.2 0.2 0.2 0.5 0.3 ) {\displaystyle {\begin{pmatrix}0.2&0.3&0.5\\0.6&0.2&0.2\\0.2&0.5&0.3\end{pmatrix}}} A permutation matrix is a special case of a bistochastic matrix, in which each element is either 0 or 1 (so there is exactly one "1" in each row and each column). An example is the following 3-by-3 matrix: ( 0 1 0 0 0 1 1 0 0 ) {\displaystyle {\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}}} A Birkhoff decomposition (also called: Birkhoff-von-Neumann decomposition) of a bistochastic matrix is a presentation of it as a sum of permutation matrices with non-negative weights. For example, the above matrix can be presented as the following sum: 0.2 ( 0 1 0 0 0 1 1 0 0 ) + 0.2 ( 1 0 0 0 1 0 0 0 1 ) + 0.1 ( 0 1 0 1 0 0 0 0 1 ) + 0.5 ( 0 0 1 1 0 0 0 1 0 ) {\displaystyle 0.2{\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}}+0.2{\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}}+0.1{\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}}+0.5{\begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix}}} Birkhoff's algorithm receives as input a bistochastic matrix and returns as output a Birkhoff decomposition. == Tools == A permutation set of an n-by-n matrix X is a set of n entries of X containing exactly one entry from each row and from each column. A theorem by Dénes Kőnig says that: Every bistochastic matrix has a permutation-set in which all entries are positive.The positivity graph of an n-by-n matrix X is a bipartite graph with 2n vertices, in which the vertices on one side are n rows and the vertices on the other side are the n columns, and there is an edge between a row and a column if the entry at that row and column is positive. A permutation set with positive entries is equivalent to a perfect matching in the positivity graph. A perfect matching in a bipartite graph can be found in polynomial time, e.g. using any algorithm for maximum cardinality matching. Kőnig's theorem is equivalent to the following:The positivity graph of any bistochastic matrix admits a perfect matching.A matrix is called scaled-bistochastic if all elements are non-negative, and the sum of each row and column equals c, where c is some positive constant. In other words, it is c times a bistochastic matrix. Since the positivity graph is not affected by scaling:The positivity graph of any scaled-bistochastic matrix admits a perfect matching. == Algorithm == Birkhoff's algorithm is a greedy algorithm: it greedily finds perfect matchings and removes them from the fractional matching. It works as follows. Let i = 1. Construct the positivity graph GX of X. Find a perfect matching in GX, corresponding to a positive permutation set in X. Let z[i] > 0 be the smallest entry in the permutation set. Let P[i] be a permutation matrix with 1 in the positive permutation set. Let X := X − z[i] P[i]. If X contains nonzero elements, Let i = i + 1 and go back to step 2. Otherwise, return the sum: z[1] P[1] + ... + z[2] P[2] + ... + z[i] P[i]. The algorithm is correct because, after step 6, the sum in each row and each column drops by z[i]. Therefore, the matrix X remains scaled-bistochastic. Therefore, in step 3, a perfect matching always exists. == Run-time complexity == By the selection of z[i] in step 4, in each iteration at least one element of X becomes 0. Therefore, the algorithm must end after at most n2 steps. However, the last step must simultaneously make n elements 0, so the algorithm ends after at most n2 − n + 1 steps, which implies O ( n 2 ) {\displaystyle O(n^{2})} . In 1960, Joshnson, Dulmage and Mendelsohn showed that Birkhoff's algorithm actually ends after at most n2 − 2n + 2 steps, which is tight in general (that is, in some cases n2 − 2n + 2 permutation matrices may be required). == Application in fair division == In the fair random assignment problem, there are n objects and n people with different preferences over the objects. It is required to give an object to each person. To attain fairness, the allocation is randomized: for each (person, object) pair, a probability is calculated, such that the sum of probabilities for each person and for each object is 1. The probabilistic-serial procedure can compute the probabilities such that each agent, looking at the matrix of probabilities, prefers his row of probabilities over the rows of all other people (this property is called envy-freeness). This raises the question of how to implement this randomized allocation in practice? One cannot just randomize for each object separately, since this may result in allocations in which some people get many objects while other people get no objects. Here, Birkhoff's algorithm is useful. The matrix of probabilities, calculated by the probabilistic-serial algorithm, is bistochastic. Birkhoff's algorithm can decompose it into a convex combination of permutation matrices. Each permutation matrix represents a deterministic assignment, in which every agent receives exactly one object. The coefficient of each such matrix is interpreted as a probability; based on the calculated probabilities, it is possible to pick one assignment at random and implement it. == Extensions == The problem of computing the Birkhoff decomposition with the minimum number of terms has been shown to be NP-hard, but some heuristics for computing it are known. This theorem can be extended for the general stochastic matrix with deterministic transition matrices. Budish, Che, Kojima and Milgrom generalize Birkhoff's algorithm to non-square matrices, with some constraints on the feasible assignments. They also present a decomposition algorithm that minimizes the variance in the expected values. Vazirani generalizes Birkhoff's algorithm to non-bipartite graphs. Valls et al. showed that it is possible to obtain an ϵ {\displaystyle \epsilon } -approximate decomposition with O ( log ⁡ ( 1 / ϵ 2 ) ) {\displaystyle O(\log(1/\epsilon ^{2}))} permutations.

Anyword

Anyword is a technology company that offers an artificial intelligence platform, using natural language processing to generate and optimize marketing text for websites, social media, email, and ads. The company also offers a complete managed service to publishers and brands to help them increase their revenue through social ads. It is used by National Geographic, Red Bull, The New York Times, BBC, Ted Baker, etc. The company has an office in New York, and Tel Aviv. == History == It was founded in 2013 — its original name was Keywee Inc. In March 2015, Anyword received $9.1 million in the Series A funding round led by a notable group of investors. In July 2016, the company was selected as an official Facebook Marketing Partner. In August 2019, Anyword was named Best Content Marketing Platform in the Digiday Technology Award winners. In November 2021, it raised $21 million in its Series B funding round.

Subject (documents)

In library and information science documents (such as books, articles and pictures) are classified and searched by subject – as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this and in general there is not always consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need to have a deeper understanding of what a subject is. The question: "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below) == Theoretical view == === Charles Ammi Cutter (1837–1903) === For Cutter the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred [...] to those intellections [...] that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60) and: the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge (Miksa, 1983a, p. 61). Bernd Frohmann adds: "The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts. Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic... .... The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, 112–113). Cutter's early view on what a subject is, is probably wiser than most understandings that dominated the 20th century – and also the understanding reflected in the ISO-standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. When that is said, it should be added that they are not particularly detailed or clear. We only get a vague idea of the social nature of subjects. === S. R. Ranganathan (1892–1972) === A classification system with an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject": Subject – an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person. A related definition is given by one of Ranganathan's students: A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several... Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on the combination of single elements from facets to subject designation. This is the reason why the combined nature of subjects are emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (but is alternatively termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is formulated in hard words (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...". It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined or not should be examined once their definition has been given, it should not determined a priori, in the definition. Besides the emphasis on the combined, organizing and systematizing nature of subjects contains Ranganathan's definition of subject the pragmatic demand, that a subject should be determined in a way that suits a normal person's competency or specialization. Again we see a strange kind of wishful thinking mixing a general understanding of a concept with demands put by his own specific system. One thing is what the word subject means, quite another issue is how to provide subject descriptions that fulfill demands such as the specificity of a given information retrieval language which fulfill demands put on the system, such as precision and recall. If researchers too often define terms in ways that favor specific kinds of systems, that are such definitions not useful to provide more general theories about subjects, subject analysis and IR. Among other things are comparative studies of different kinds of systems made difficult. Based on these arguments, as well as additional arguments which have been used in the literature, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO-standard for topic maps, may Ranganathan's definition be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For such purpose is another understanding of "subject" necessary. === Patrick Wilson (1927–2003) === In his book Wilson (1968) examined – in particular by thought experiments – the suitability of different methods of examining the subject of a document. The methods were: identifying the author's purpose for writing the document, weighing the relative dominance and subordination of different elements in the picture, which the reading imposes on the reader, grouping or count the document's use of concepts and references, construing a set of rules for selecting elements deemed necessary (as opposed to unnecessary) for the work as a whole. Patrick Wilson shows convincingly that each of these methods are insufficient to determine the subject of a document and is led to conclude ( p. 89): "The notion of the subject of a writing is indeterminate..." or, on p. 92 (about what users may expect to find using a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection to the last quote has Wilson an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. Based on this argumentation is Wilson led to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness". Wilson's concept of subject was discussed by Hjørland (1992) who found that it is problematic to give up the precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning the authors' use of ambiguous terms, the role of the subject analysis is to determine which documents would be fruitful for users to identify whether or not the documents use one or another term or whether a given term i

Stairstep interpolation

In the field of image processing, stairstep interpolation is a widely employed method technique for interpolating pixels after enlarging an image. The fundamental concept is to interpolate multiple times, in small increments, using any interpolation algorithm that is better than nearest-neighbor interpolation such as; bilinear interpolation, and bicubic interpolation. A common scenario is to interpolate an image by using a bicubic interpolation which increases the image size by no more than 10% (110% of the original size) at a time until the desired size is reached. Fred Miranda, a developer, popularized this method by creating and developing several Photoshop plug-ins that incorporate this technique. == Example ==

Education by algorithm

Education by algorithm refers to automated solutions that algorithmic agents or social bots offer to education, to assist with mundane educational tasks. These are often instrumentalist “educational reforms” or “curriculum transformations”, which have been implemented by policy makers and are supported by proprietary education technologies. New educational policies, mandated by transnational governance forums (like the OECD), have manufactured a connection between economies and education. Governments, schools and universities are expected to introduce or prepare students for an “unknown future”, to “future proof” them against an identified issue or to mitigate a national crisis. Technologies are seen as a catalyst to effect these changes. However, these policies mask a deeper problem, which include the assetization of education and the use of technologies as a means for surveillance and behavior modification. The traces that students and leave, through cookies, logins learning activities, assignments and tests, are collected, facetted, and shared with commercial organizations by these agents, to both predict future behavior and shape it. Techno solutionist thinking has led to managers adopting educational policies and reforms, and looking towards technologies to act as disrupters, liberators or agents to improve efficiency. During the COVID-19 pandemic, many more students had to modify their learning and working circumstances to protect themselves. Academics shifted their assessment practices from the dominant assessment of learning paradigm to an orientation that saw value in "assessment for learning". Big tech assisted, and teaching infrastructure became further privatized, and unbundling of education provision went a step further. Following the return to class, this assessment paradigm became rationalised in education. Leaving the space for algorithmic agents to step in. Academics work was increasingly driven by learning experience platforms and student understanding was extended through interleaving, behavior modification nudges and rewards and scheduled high stakes assessments. This data collection may also be construed as surveillance., or perceived as evidence of a Fourth Industrial Revolution