AI Face Mix

AI Face Mix — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Conversica

    Conversica

    Conversica is a US-based cloud software technology company, headquartered in San Mateo, California, that provides two-way AI-driven conversational software and a suite of Intelligent Virtual Assistants for businesses to engage customers via email, chat, and SMS. == History == 2007: The company was founded by Ben Brigham in Bellingham, Washington, originally as AutoFerret.com. The company's initial product was a Customer Relationship Management (CRM) targeted at automotive dealerships. This soon expanded to lead generation, and then lead validation and qualification. The AI Conversica uses currently was made to follow up on and filter out low-quality leads. The focus of the company shifted toward this automated lead engagement technology. 2010: The company started commercially selling AVA, the first Automated Virtual Assistant for sales, and the company name was changed to AVA.ai. Early customers for AVA were automotive dealerships. As the company moved away from generating leads themselves, and providing the CRM themselves, it became necessary to integrate with existing CRM and Marketing Automation platforms, such as DealerSocket, VinSolutions and Salesforce. 2013: The company raised $16m Series A funding, led by Kennet Partners, and named Mark Bradley as CEO. It also moved its headquarters from Bellingham, Washington to Foster City, California. 2014: The company changed its name from AVA.ai to Conversica. 2015: Alex Terry joined Conversica as its CEO. The business expanded to include customers in additional verticals, including technology, education, and financial services. 2016: The company raised $34m Series B funding, led by Providence Strategic Growth. 2017: Conversica expanded its intelligent automation platform and IVAs to support additional communication channels (e-mail and SMS text messaging) and communication languages. Conversica also opened a new technology center in Seattle, Washington to expand its AI and machine learning capabilities. 2018: The company raised $31m Series C funding, led by Providence Strategic Growth. Conversica also acquired Intelligens.ai, providing a regional presence in Latin America with an office in Las Condes, Santiago, Chile. The company launched an AI-powered Admissions Assistant for Higher Education industry. 2019: Conversica was selected by Fast Company magazine as one of the Top 10 Most Innovative AI Companies in the World, and was named Marketo's Technology Partner of the Year. The company officially expanded into the EMEA region with the opening of a London office. As of August 2019, Conversica has over 50 different integrations with third parties. In October Conversica won three awards at the fourth annual Global Annual Achievement Awards for Artificial Intelligence. Also that month, Alex Terry stepped down from his role as CEO and was replaced by Jim Kaskade. 2020: As part of Conversica's response to COVID-19, they optimized the business to become profitable in both 2Q20 and 3Q20, before reinvesting in 4Q20. The company transitioned both international operations in EMEA and LATAM to an indirect model with partners (LeadFabric and Nectia Cloud Solutions respectively), and moved a portion of its US-based employees to near-shore centers in Mexico and Brazil, effectively downsizing the company from 250 to 200. Conversica's reseller partner, Nectia, is a major Latin American affiliate and Chile's number one Salesforce partner, and, as part of the partnership, Nectia devoted capital to a brand new company segment, Predict-IA, dedicated to web-based artificial intelligent solutions. Predict-IA was able to immediately service all LATAM opportunities and clients with Conversica's AI Assistants with end-to-end services (marketing, sales, professional services, customer success, and technical support). Conversica's reseller partner, Leadfabric, has offices in Belgium, Amsterdam, Paris, UK, Taiwan, and Romania. == Technology == Conversica's Revenue Digital Assistants™ are AI assistants who engage with leads, prospects, customers, employees, and other persons of interest (Contacts) in a two-way human-like manner, via email, SMS text, and website chat, in English, French, German, Spanish, Portuguese, and Japanese. The RDAs are built on an Intelligent Automation platform that leverages natural language understanding, natural language processing, natural language generation, deep learning and machine learning. The Assistants are generally deployed alongside sales and marketing, customer success, account management, and higher education admissions teams, as part of an augmented workforce. The Intelligent Automation platform integrates with over 50 external systems, including CRM, Marketing Automation, and other systems of record. A partial list of integration partners includes: Salesforce, Marketo, Oracle, HubSpot, DealerSocket, Reynolds & Reynolds, CDK Global, VinSolutions and many more.

    Read more →
  • MaPS S.A.

    MaPS S.A.

    MaPS S.A. is a software editor founded in 2011 by Thierry Muller. The company is headquartered in Luxembourg. Its platform, called MaPS System, provides Data Management software for Multichannel Marketing. == History and funding == The first version of MaPS System was released under the agency Prem1um S.A. in 2005 in the partnership with Pingroom agency. In combination with MaPS System, Prem1um also provided various consulting services in Marketing, Publishing and Sales. This is where MaPS System takes its names (M stands for Marketing, P for Publishing and S for Sales). In 2011, after being successful, Prem1um S.A. decided to enable the software MaPS System to operate independently under MaPS S.A., as a separate company and editor of the software. The first financial supports were provided by Malta ICI, a Venture Capital firm, and the local partner Chameleon Invest, a seed-capital fund led by Business Angels, who invested €900,000. In a second investment round in 2014 led by Newion Investments, a Venture Capital firm, €1.4 Million were raised, thus amounting to total assets of €2.2 Million. In 2016, the company was taken over by three private investors. In 2018, after two years of continuous growth and European expansion in Belgium, Germany and Switzerland, MaPS S.A acquired Awevo, an e-commerce web agency. == Products == The services included in MaPS System range from the data centralization, Data Governance to an optimized Multichannel Marketing. The software currently includes more than 35 modules for Master Data Management, Product Information Management, Digital Asset Management, Business Process Management including catalogue Publishing features. == Certifications == In 2019, MaPS System and Awevo received "Made in Luxembourg" label, given to the companies whose services are entirely designed in Luxembourg, without any production or development offshoring. MaPS System is a member of ICT Cluster by Luxinnovation.

    Read more →
  • Algorithm

    Algorithm

    In mathematics and computer science, an algorithm ( ) is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a heuristic is an approach to solving problems without well-defined correct or optimal results. For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an effective method, an algorithm can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and input, a computation occurs at each step, eventually producing output and terminating. The transition between states can be non-deterministic; randomized algorithms incorporate random input. == Etymology == Around 825 AD, Persian scientist and polymath Muḥammad ibn Mūsā al-Khwārizmī wrote kitāb al-ḥisāb al-hindī ("Book of Indian computation") and kitab al-jam' wa'l-tafriq al-ḥisāb al-hindī ("Addition and subtraction in Indian arithmetic"). In the early 12th century, Latin translations of these texts involving the Hindu–Arabic numeral system and arithmetic appeared, for example Liber Alghoarismi de practica arismetrice, attributed to John of Seville, and Liber Algoritmi de numero Indorum, attributed to Adelard of Bath. Here, alghoarismi or algoritmi is the Latinization of Al-Khwarizmi's name; the text starts with the phrase Dixit Algoritmi, or "Thus spoke Al-Khwarizmi". The word algorism in English came to mean the use of place-value notation in calculations; it occurs in the Ancrene Wisse from circa 1225. By the time Geoffrey Chaucer wrote The Canterbury Tales in the late 14th century, he used a variant of the same word in describing augrym stones, stones used for place-value calculation. In the 15th century, under the influence of the Greek word ἀριθμός (arithmos, "number"; cf. "arithmetic"), the Latin word was altered to algorithmus. By 1596, this form of the word was used in English, as algorithm, by Thomas Hood. == Definition == One informal definition is "a set of rules that precisely defines a sequence of operations", which would include all computer programs, and any bureaucratic procedure or cook-book recipe. In general, a program is an algorithm only if it stops eventually. Formally, algorithm is an explicit set of instructions to produce an output, that can be followed by a computer or a human performing specific operations on symbols.. == History == === Ancient algorithms === Step-by-step procedures for solving mathematical problems have been recorded since antiquity. This includes in Babylonian mathematics (around 2500 BC), Egyptian mathematics (around 1550 BC), Indian mathematics (around 800 BC and later), the Ifa Oracle (around 500 BC), Greek mathematics (around 240 BC), Chinese mathematics (around 200 BC and later), and Arabic mathematics (around 800 AD). The earliest evidence of algorithms is found in ancient Mesopotamian mathematics. A Sumerian clay tablet found in Shuruppak near Baghdad and dated to c. 2500 BC describes the earliest division algorithm. During the Hammurabi dynasty c. 1800 – c. 1600 BC, Babylonian clay tablets described algorithms for computing formulas. Algorithms were also used in Babylonian astronomy. Babylonian clay tablets describe and employ algorithmic procedures to compute the time and place of significant astronomical events. Algorithms for arithmetic are also found in ancient Egyptian mathematics, dating back to the Rhind Mathematical Papyrus c. 1550 BC. Algorithms were later used in ancient Hellenistic mathematics. Two examples are the Sieve of Eratosthenes, which was described in the Introduction to Arithmetic by Nicomachus, and the Euclidean algorithm, which was first described in Euclid's Elements (c. 300 BC).Examples of ancient Indian mathematics included the Shulba Sutras, the Kerala School, and the Brāhmasphuṭasiddhānta. In the 9th century, Muḥammad ibn Mūsā al-Khwārizmī revolutionized the field by establishing the algorithm as a systematic, finite sequence of logical steps to solve mathematical problems. In his influential work, The Compendious Book on Calculation by Completion and Balancing, he moved beyond specific numerical solutions to introduce general procedures for algebraic reduction and balancing. This transformed mathematics into a 'mechanical' process of well-defined rules—a fundamental shift that laid the groundwork for modern algorithmic theory. The Latin translation of his arithmetic treatise, titled Algoritmi de numero Indorum, led to the term algorithm being derived from the Latinization of his name, Algoritmi, specifically to describe this new rule-based approach to mathematics. The first cryptographic algorithm for deciphering encrypted code was developed by Al-Kindi, a 9th-century Arab mathematician, in A Manuscript On Deciphering Cryptographic Messages. He gave the first description of cryptanalysis by frequency analysis, the earliest codebreaking algorithm. === Computers === ==== Weight-driven clocks ==== Weight-driven clocks were a key European invention in Middle Ages, specifically the verge escapement mechanism producing the tick of mechanical clocks. Accurate automatic machines led to mechanical automata in the 13th century and computational machines—the difference and analytical engines of Charles Babbage and Ada Lovelace in the mid-19th century. Lovelace designed the first algorithm intended for a computer, Babbage's analytical engine, the first real Turing-complete computer, more than the mechanical calculators of the time. Although the full implementation of Babbage's second device was only built decades after her lifetime, Lovelace has been called "history's first programmer". ==== Electromechanical relay ==== The Jacquard loom, a precursor to punch cards, and telephone switching machines led to the development of the first computers. By the mid-19th century, the telegraph, was in use throughout the world. By the late 19th century, ticker tape (c. 1870s) and punch cards (c. 1890) were developed. Then came the teleprinter (c. 1910) with its punched-paper use of Baudot code on tape. Telephone-switching networks of electromechanical relays were invented in 1835. These led to the invention of the digital adding device by George Stibitz in 1937. While working in Bell Laboratories, he observed the "burdensome" use of mechanical calculators with gears, prompting him to experiment create an experimental digital adder at home. === Formalization === In 1928, a partial formalization of the modern concept of algorithms began with attempts to solve David Hilbert's Entscheidungsproblem (decision problem). Later formalizations were framed as attempts to define "effective calculability" or "effective method". Those formalizations included the Gödel–Herbrand–Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's lambda calculus of 1936, Emil Post's Formulation 1 of 1936, and Alan Turing's Turing machines of 1936–37 and 1939. === Modern Algorithms === For decades, it was assumed that algorithm evolution progresses from heuristics to formal algorithms. A Symbolic integration provides a classic illustration. In 1961, James Slagle’s program SAINT used heuristics to solve 52 of 54 freshman calculus exercises from an MIT textbook (≈96%). In 1967, Larry Moses’s SIN refined the heuristics and achieved 100% success, though it remained heuristic. Finally, in 1969, Robert Risch introduced the Risch Algorithm with formal guarantees. This trajectory defined the traditional path: heuristics evolving until a definitive, guaranteed algorithm emerged. However, the rise of transformer-based AI has inverted this sequence — classical algorithms are now being displaced by heuristics once again. Algorithms have evolved and improved in many ways as time goes on. Common uses of algorithms today include social media apps like Instagram and YouTube. Algorithms are used as a way to analyze what people like and push more of those things to the people who interact with them. Quantum computing uses quantum algorithm procedures to solve problems faster. More recently, in 2024, NIST updated their post-quantum encryption standards, which includes new encryption algorithms to enhance defenses against attacks using quantum computing. == Representations == Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables. Natural language expressions of algorithms tend to be verbose and ambiguous and are rarely used for complex or technical algor

    Read more →
  • Generalized distributive law

    Generalized distributive law

    The generalized distributive law (GDL) is a generalization of the distributive property which gives rise to a general message passing algorithm. It is a synthesis of the work of many authors in the information theory, digital communications, signal processing, statistics, and artificial intelligence communities. The law and algorithm were introduced in a semi-tutorial by Srinivas M. Aji and Robert J. McEliece with the same title. == Introduction == "The distributive law in mathematics is the law relating the operations of multiplication and addition, stated symbolically, a ∗ ( b + c ) = a ∗ b + a ∗ c {\displaystyle a(b+c)=ab+ac} ; that is, the monomial factor a {\displaystyle a} is distributed, or separately applied, to each term of the binomial factor b + c {\displaystyle b+c} , resulting in the product a ∗ b + a ∗ c {\displaystyle ab+ac} " – Britannica. As it can be observed from the definition, application of distributive law to an arithmetic expression reduces the number of operations in it. In the previous example the total number of operations reduced from three (two multiplications and an addition in a ∗ b + a ∗ c {\displaystyle ab+ac} ) to two (one multiplication and one addition in a ∗ ( b + c ) {\displaystyle a(b+c)} ). Generalization of distributive law leads to a large family of fast algorithms. This includes the FFT and Viterbi algorithm. This is explained in a more formal way in the example below: α ( a , b ) = d e f ∑ c , d , e ∈ A f ( a , c , b ) g ( a , d , e ) {\displaystyle \alpha (a,\,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c,d,e\in A}f(a,\,c,\,b)\,g(a,\,d,\,e)} where f ( ⋅ ) {\displaystyle f(\cdot )} and g ( ⋅ ) {\displaystyle g(\cdot )} are real-valued functions, a , b , c , d , e ∈ A {\displaystyle a,b,c,d,e\in A} and | A | = q {\displaystyle |A|=q} (say) Here we are "marginalizing out" the independent variables ( c {\displaystyle c} , d {\displaystyle d} , and e {\displaystyle e} ) to obtain the result. When we are calculating the computational complexity, we can see that for each q 2 {\displaystyle q^{2}} pairs of ( a , b ) {\displaystyle (a,b)} , there are q 3 {\displaystyle q^{3}} terms due to the triplet ( c , d , e ) {\displaystyle (c,d,e)} which needs to take part in the evaluation of α ( a , b ) {\displaystyle \alpha (a,\,b)} with each step having one addition and one multiplication. Therefore, the total number of computations needed is 2 ⋅ q 2 ⋅ q 3 = 2 q 5 {\displaystyle 2\cdot q^{2}\cdot q^{3}=2q^{5}} . Hence the asymptotic complexity of the above function is O ( n 5 ) {\displaystyle O(n^{5})} . If we apply the distributive law to the RHS of the equation, we get the following: α ( a , b ) = d e f ∑ c ∈ A f ( a , c , b ) ⋅ ∑ d , e ∈ A g ( a , d , e ) {\displaystyle \alpha (a,\,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c\in A}f(a,\,c,\,b)\cdot \sum _{d,\,e\in A}g(a,\,d,\,e)} This implies that α ( a , b ) {\displaystyle \alpha (a,\,b)} can be described as a product α 1 ( a , b ) ⋅ α 2 ( a ) {\displaystyle \alpha _{1}(a,\,b)\cdot \alpha _{2}(a)} where α 1 ( a , b ) = d e f ∑ c ∈ A f ( a , c , b ) {\displaystyle \alpha _{1}(a,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c\in A}f(a,\,c,\,b)} and α 2 ( a ) = d e f ∑ d , e ∈ A g ( a , d , e ) {\displaystyle \alpha _{2}(a){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{d,\,e\in A}g(a,\,d,\,e)} Now, when we are calculating the computational complexity, we can see that there are q 3 {\displaystyle q^{3}} additions in α 1 ( a , b ) {\displaystyle \alpha _{1}(a,\,b)} and α 2 ( a ) {\displaystyle \alpha _{2}(a)} each and there are q 2 {\displaystyle q^{2}} multiplications when we are using the product α 1 ( a , b ) ⋅ α 2 ( a ) {\displaystyle \alpha _{1}(a,\,b)\cdot \alpha _{2}(a)} to evaluate α ( a , b ) {\displaystyle \alpha (a,\,b)} . Therefore, the total number of computations needed is q 3 + q 3 + q 2 = 2 q 3 + q 2 {\displaystyle q^{3}+q^{3}+q^{2}=2q^{3}+q^{2}} . Hence the asymptotic complexity of calculating α ( a , b ) {\displaystyle \alpha (a,b)} reduces to O ( n 3 ) {\displaystyle O(n^{3})} from O ( n 5 ) {\displaystyle O(n^{5})} . This shows by an example that applying distributive law reduces the computational complexity which is one of the good features of a "fast algorithm". == History == Some of the problems that used distributive law to solve can be grouped as follows: Decoding algorithms: A GDL like algorithm was used by Gallager's for decoding low density parity-check codes. Based on Gallager's work Tanner introduced the Tanner graph and expressed Gallagers work in message passing form. The tanners graph also helped explain the Viterbi algorithm. It is observed by Forney that Viterbi's maximum likelihood decoding of convolutional codes also used algorithms of GDL-like generality. Forward–backward algorithm: The forward backward algorithm helped as an algorithm for tracking the states in the Markov chain. And this also was used the algorithm of GDL like generality Artificial intelligence: The notion of junction trees has been used to solve many problems in AI. Also the concept of bucket elimination used many of the concepts. == The MPF problem == MPF or marginalize a product function is a general computational problem which as special case includes many classical problems such as computation of discrete Hadamard transform, maximum likelihood decoding of a linear code over a memory-less channel, and matrix chain multiplication. The power of the GDL lies in the fact that it applies to situations in which additions and multiplications are generalized. A commutative semiring is a good framework for explaining this behavior. It is defined over a set K {\displaystyle K} with operators " + {\displaystyle +} " and " . {\displaystyle .} " where ( K , + ) {\displaystyle (K,\,+)} and ( K , . ) {\displaystyle (K,\,.)} are a commutative monoids and the distributive law holds. Let p 1 , … , p n {\displaystyle p_{1},\ldots ,p_{n}} be variables such that p 1 ∈ A 1 , … , p n ∈ A n {\displaystyle p_{1}\in A_{1},\ldots ,p_{n}\in A_{n}} where A {\displaystyle A} is a finite set and | A i | = q i {\displaystyle |A_{i}|=q_{i}} . Here i = 1 , … , n {\displaystyle i=1,\ldots ,n} . If S = { i 1 , … , i r } {\displaystyle S=\{i_{1},\ldots ,i_{r}\}} and S ⊂ { 1 , … , n } {\displaystyle S\,\subset \{1,\ldots ,n\}} , let A S = A i 1 × ⋯ × A i r {\displaystyle A_{S}=A_{i_{1}}\times \cdots \times A_{i_{r}}} , p S = ( p i 1 , … , p i r ) {\displaystyle p_{S}=(p_{i_{1}},\ldots ,p_{i_{r}})} , q S = | A S | {\displaystyle q_{S}=|A_{S}|} , A = A 1 × ⋯ × A n {\displaystyle \mathbf {A} =A_{1}\times \cdots \times A_{n}} , and p = { p 1 , … , p n } {\displaystyle \mathbf {p} =\{p_{1},\ldots ,p_{n}\}} Let S = { S j } j = 1 M {\displaystyle S=\{S_{j}\}_{j=1}^{M}} where S j ⊂ { 1 , . . . , n } {\displaystyle S_{j}\subset \{1,...\,,n\}} . Suppose a function is defined as α i : A S i → R {\displaystyle \alpha _{i}:A_{S_{i}}\rightarrow R} , where R {\displaystyle R} is a commutative semiring. Also, p S i {\displaystyle p_{S_{i}}} are named the local domains and α i {\displaystyle \alpha _{i}} as the local kernels. Now the global kernel β : A → R {\displaystyle \beta :\mathbf {A} \rightarrow R} is defined as: β ( p 1 , . . . , p n ) = ∏ i = 1 M α ( p S i ) {\displaystyle \beta (p_{1},...\,,p_{n})=\prod _{i=1}^{M}\alpha (p_{S_{i}})} Definition of MPF problem: For one or more indices i = 1 , . . . , M {\displaystyle i=1,...\,,M} , compute a table of the values of S i {\displaystyle S_{i}} -marginalization of the global kernel β {\displaystyle \beta } , which is the function β i : A S i → R {\displaystyle \beta _{i}:A_{S_{i}}\rightarrow R} defined as β i ( p S i ) = ∑ p S i c ∈ A S i c β ( p ) {\displaystyle \beta _{i}(p_{S_{i}})\,=\displaystyle \sum \limits _{p_{S_{i}^{c}}\in A_{S_{i}^{c}}}\beta (p)} Here S i c {\displaystyle S_{i}^{c}} is the complement of S i {\displaystyle S_{i}} with respect to { 1 , . . . , n } {\displaystyle \mathbf {\{} 1,...\,,n\}} and the β i ( p S i ) {\displaystyle \beta _{i}(p_{S_{i}})} is called the i t h {\displaystyle i^{th}} objective function, or the objective function at S i {\displaystyle S_{i}} . It can observed that the computation of the i t h {\displaystyle i^{th}} objective function in the obvious way needs M q 1 q 2 q 3 ⋯ q n {\displaystyle Mq_{1}q_{2}q_{3}\cdots q_{n}} operations. This is because there are q 1 q 2 ⋯ q n {\displaystyle q_{1}q_{2}\cdots q_{n}} additions and ( M − 1 ) q 1 q 2 . . . q n {\displaystyle (M-1)q_{1}q_{2}...q_{n}} multiplications needed in the computation of the i th {\displaystyle i^{\text{th}}} objective function. The GDL algorithm which is explained in the next section can reduce this computational complexity. The following is an example of the MPF problem. Let p 1 , p 2 , p 3 , p 4 , {\displaystyle p_{1},\,p_{2},\,p_{3},\,p_{4},} and p 5 {\displaystyle p_{5}} be variables such that p 1 ∈ A 1 , p 2 ∈ A 2 , p 3 ∈ A 3 , p 4 ∈ A 4 , {\displaystyle p_{1}\in

    Read more →
  • Smart speaker industry in South Korea

    Smart speaker industry in South Korea

    Smart speakers, or AI speakers, have been developed by multiple domestic electronics and telecommunications firms in South Korea. Since their introduction to the local market in 2016, they have been used by millions of people in the country. == Brands == === Google === In September 2018, Google Home (including the Google Home Mini) launched in South Korea. Running Google Assistant, it featured simultaneous recognition of two languages among a total of seven, including Korean. At launch, it could play music from Bugs!, in addition to YouTube. === Kakao === In November 2017, Kakao launched the Kakao Mini, featuring integrated KakaoTalk functionality. === KT === KT launched the GiGA Genie smart speaker in January 2017, using a Harman Kardon speaker. In November 2017, KT announced GiGA Genie LTE, a portable AI speaker with LTE support. They also released a mini speaker called GiGA Genie Buddy. In 2018, KT created a special version of GiGa Genie with a screen for use in hotels. On 29 April 2019, KT announced the GiGA Genie Table TV, a consumer-oriented smart speaker with a display. It featured paid TV access through Wi-Fi. Based on usage data from the hotel model, KT decided not to add a touchscreen. The Table TV also featured a limited-access "personalized-text-to-speech technology" which could use parents' voice recording inputs to read children books. In February 2022, KT began rolling out Amazon Alexa integration into its speakers for English support. === Naver === In August 2017, Naver announced the Wave smart speaker, operating on Clova. In October 2017, Naver launched the Friends smart speaker, which were designed based on Line characters. ==== LG Uplus ==== In December 2017, LG Uplus launched the Friends+ speaker with Naver, operating on U+ Home AI. === Samsung === In August 2018, Samsung announced the Samsung Galaxy Home in partnership with Spotify. The original size was delayed, while the Galaxy Home Mini appeared briefly as a bonus for Samsung Galaxy S20 preorders in South Korea in February 2020. === SK Telecom === SK Telecom launched the Nugu smart speaker in September 2016, using an Astell & Kern audio system. In August 2017, SKT released a portable speaker named Nugu mini. In July 2018, SKT launched the Nugu Candle, featuring expanded mood lighting. The first-generation Nugu was subsequently discontinued. On 18 April 2019, SKT released the NUGU Nemo AI, which featured a display and JBL stereo speaker. In August 2019, SKT collaborated with SM Entertainment, incorporating functions related to the agency's artists into Nugu. In January 2022, SKT showcased the NUGU Candle SE, introducing Alexa support. == Usage == In 2018, approximately 3 million people in South Korea used smart speakers. According to data from KT in 2018, the most common commands to its speakers were for controlling televisions. Based on a broader survey in 2017, music was selected as the most frequent use case. By 2018, smart speaker companies were partnering with reading and other education services, adding potential use-cases for children. By 2022, smart speakers were being utilized by the South Korean government. SKT, in partnership with 70 regional governments, distributed smart speakers to 12,000 senior citizens living alone. The government paid for monthly subscriptions to help seniors stay mentally engaged. Naver made an agreement with the Seoul Metropolitan Government to provide Clova CareCall, an automated health checkup program to hundreds of senior citizens living alone. KT's AI care service included an emergency dispatch call function and medication notifications. == Criticism == === Communication === In a survey of 300 users in 2017, approximately half reported having some type of communication issue with their smart speakers. === Privacy === South Korean smart speakers sparked privacy concerns when they were found to be collecting and documenting user audio data in 2019. The speaker companies responded that only a minority of data was collected and that it was anonymized. They stated that such recordings were collected for performance improvements.

    Read more →
  • Sikidy

    Sikidy

    Sikidy is a form of algebraic geomancy practiced by Malagasy peoples in Madagascar. It involves algorithmic operations performed on random data generated from tree seeds, which are ritually arranged in a tableau called a toetry and divinely interpreted after being mathematically operated on. Columns of seeds, designated "slaves" or "princes" belonging to respective "lands" for each, interact symbolically to express vintana ('fate') in the interpretation of the diviner. The diviner also prescribes solutions to problems and ways to avoid fated misfortune, often involving a sacrifice. The centuries-old practice derives from Islamic influence brought to the island by medieval Arab traders. The sikidy is consulted for a range of divinatory questions pertaining to fate and the future, including identifying sources of and rectifying misfortune, reading the fate of newborns, and planning annual migrations. The mathematics of sikidy involves Boolean algebra, symbolic logic and parity. == History == The practice is several centuries old, and is influenced by Arab geomantic traditions of Arab Muslim traders on the island. Most writers link the origins of sikidy to the "sea-going trade involving the southwest coast of India, the Persian Gulf, and the east coast of Africa in the 9th or 10th century C.E." Stephen Ellis and Solofo Randrianja describe sikidy as "probably one of the oldest components of Malagasy culture", writing that it most likely the product of an indigenous divinatory art later influenced by Islamic practice. Umar H. D. Danfulani writes that the integration of Arabic divination into indigenous divination is "clearly demonstrated" in Madagascar, where the Arabic astrological system was adapted to the indigenous agricultural system and meshed with Malagasy lunar months by "adapting indigenous months, volana, to the astrological months, vintana". Danfulani also describes the concepts in sikidy of "houses" (lands) and "kings in their houses" as retained from medieval Arabic astrology. Chemillier et al. say the practice's spread across Madagascar likely originated with the southeastern Antemoro people, among whom Arab influence was the strongest. Though the etymology of sikidy is unknown, it has been posited that the word derives from the Arabic sichr ('incantation' or 'charm'). Sikidy was of central importance to pre-Christian Malagasy religion, with one practitioner quoted in 1892 as calling sikidy "the Bible of our ancestors". A missionary report from 1616 describes one form of sikidy using tamarind seeds, and another using fingered markings in the sand. The early colonial French governor of Madagascar Étienne de Flacourt documented sikidy in the mid-17th century: Matatane country in southeastern Madagascar [...] where the Antemoro [...] live was a center of astrological study as early as the fourteenth century [...]. This area was also the site of early Arab settlements, although strict Islamic observances were lost centuries ago [...]. Historical evidence shows that Antemoro diviners, bearers of the astrological system, infiltrated nearly all the ancient kingdoms of Madagascar beginning in the sixteenth century. [...] Today, although many persons claim to be ombiasy [diviners], only the Antemoro diviners are considered true professionals. The area is still a famous place of learning where specialists go for training and then return to their home communities with a certain body of knowledge. Now we can better understand the degree of similarity of divination forms found throughout Madagascar. For centuries Matitanana has remained a training center for diviners who have migrated widely, usually attaining important positions in their home communities and with various royal families. Comparison of contemporary rites with centuries-old texts show that sikidy has been remarkably unchanged throughout its history. The "infiltration" of Malagasy kingdoms by Antemoro diviners, and Matitanana's role as a place for astrological and divinatory learning, help to explain the relatively uniform practicing of sikidy across Madagascar. Chemallier et al. write that the mathematical construction of the arrangement of seeds is procedurally consistent across all of Madagascar, with variations in practice between groups and regions being limited to more minor aspects, such as the alignment of figures according to cardinal directions. One exception is the simplified Merina sikidy joria. === Origin myths === Mythic tradition relating to the origin of sikidy "links [the practice] both to the return by walking on water of Arab ancestors who had intermarried with Malagasy but then left, and to the names of the days of the week" and holds that the art was supernaturally communicated to the ancestors, with Zanahary (the supreme deity of Malagasy religion) giving it to Ranakandriana, who then gave it to a line of diviners (Ranakandriana to Ramanitralanana to Rabibi-andrano to Andriambavi-maitso (who was a woman) to Andriam-bavi-nosy), the last of whom terminated the monopoly by giving it to the people, declaring: "Behold, I give you the sikidy, of which you may inquire what offerings you should present in order to obtain blessings; and what expiation you should make so as to avert evils, when any are ill or under apprehension of some future calamity". A mythic anecdote of Ranakandriana says that two men observed him one day playing in the sand. In fact he was practicing a form of sikidy worked in sand called sikidy alanana. The two men seized him, and Ranakandriana promised that he would teach them something if they released him. They agreed, and Ranakandriana taught them in depth how to work the sikidy. The two men then went to their chief and told him that they could tell him "the past and the future—what was good and what was bad—what increased and what diminished." The chief asked them to tell him how he could obtain plenty of cattle. The two men worked their sikidy and told the chief to kill all of his bulls, and that "great numbers would come to him" on the following Friday. The chieftain, doubting, asked what would happen if their prediction didn't come true, and the two men promised they would pay with their lives. The chief agreed and killed his bulls. On Thursday, thinking he'd been duped, he prematurely killed the first man of the two who'd told him about the divinatory art. On Friday, however, "vast herds" came amidst heavy rain, actually filling an immense plain in their crowd. The chieftain lamented the mpisikidy's wrongful execution and ordered for him a pompous funeral. The chieftain took the second man as his close adviser and friend, and trusted the sikidy forever afterwards. The British missionary William Ellis recorded in 1839 two idiomatic expressions used in Madagascar that come from this story: "Tsy mahandry andro Zoma" (lit. 'He cannot wait 'til Friday') is said of someone extremely impatient, and heavy rainshowers falling in rapid succession are called "sese omby" (lit. 'a crowding together of cattle'). == Rites and arrangement of seeds == The divination is performed by a practitioner called an mpisikidy, ny màsina (lit. 'sacred one'), ombiasy, or ambiàsa (derived from the Arabic anbia, meaning 'prophet') who guides the client through the process and interprets the results in the context of the client's inquiries and desires. As part of an mpisikidy's formal initiation into the art, which includes a long period of apprenticeship, the initiate (called a mianatsy) must gather 124 and 200 fàno (Entada sp.) or kily (tamarind) tree seeds for his subsequent ritual use in sikidy. Raymond Decary writes that, at least among the Sakalava, a man must be 40 years old before learning and practicing sikidy, or he risks death. Before beginning to study, a student practitioner must make incisions at the tips of his index finger, his middle finger, and his tongue, and put within the incisions a paste containing red pepper and crushed wasp. This paste impregnates the fingers that will move the seeds of the sikidy and the tongue that will speak their revelations with the power to decipher the sikidy. Once this is done, he leaves at dawn to search for a fano (Entada chrysostachys) tree. Upon finding it, he throws his spear at its branches, shaking the tree and causing its large seed pods to fall. During this act, some initiates say: "When you were on the steep peak and in the dense forest, on you the crabs climbed, from you the crocodiles made their bed, with their paws the birds trod on you. Whether you are suspended in the trees or buried, you are never dried up nor rotten." In his study (written in 1941 and revised in 1948), Decary reported that the salary paid by a mianatsy to his master is "not very high": up to five francs, plus a red rooster's feather. The mpisikidy ritually arranges his seeds into a sixteen-column table consisting of four columns of randomly-generated data (representing fate) and eight columns of data derived from logical ope

    Read more →
  • Emotion-sensitive software

    Emotion-sensitive software

    Emotion-sensitive software (ESS) is software specifically designed to target and monitor emotional response in a human being. Some software measures anger by comparing the pitch of a voice to a regular, or calm, pitch. Another approach is the measurement of physical appearance. If a camera or similar recording device picks up a certain amount of red pigmentation in the skin the system can be alerted that this person is angered. The competitive landscape in the Electronic Surveillance Software (ESS) industry is marked by a high level of secrecy regarding the operational details of these software systems. Many producers deliberately withhold information about the inner workings of their ESS products, a strategy that serves dual purposes: firstly, it intensifies competition among companies in the sector, as each strives to maintain a unique edge without revealing trade secrets that could be leveraged by competitors; secondly, this secrecy acts as a deterrent against individuals or entities who might try to circumvent the surveillance mechanisms. One application of ESS was developed by University of Notre Dame Assistant Professor of Psychology Sidney D'Mello, Art Graesser from the University of Memphis and a colleague from Massachusetts Institute of Technology. They used the technology to create an electronic tutor that could assess a student's level of boredom and frustration based on facial expression and body language, and react accordingly.

    Read more →
  • Ontology for Biomedical Investigations

    Ontology for Biomedical Investigations

    The Ontology for Biomedical Investigations (OBI) is an open-access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL). As of March 2008, a pre-release version of the ontology was made available at the project's SVN repository. == Scope == The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration and joint ("cross-omics") analysis of experimental data, a need originally identified in the transcriptomics domain by the FGED Society, which developed the MGED Ontology as an annotation resource for microarray data.Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. (November 2007). "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology. 25 (11): 1251–5. doi:10.1038/nbt1346. PMC 2814061. PMID 17989687. OBI uses the basic formal ontology upper-level ontology as a means of describing general entities that do not belong to a specific problem domain. As such, all OBI classes are a subclass of some BFO class. The ontology has the scope of modeling all biomedical investigations and as such contains ontology terms for aspects such as: biological material – for example blood plasma instrument (and parts of an instrument therein) – for example DNA microarray, centrifuge information content – such as an image or a digital information entity such as an electronic medical record design and execution of an investigation (and individual experiments therein) – for example study design, electrophoresis material separation data transformation (incorporating aspects such as data normalization and data analysis) – for example principal components analysis dimensionality reduction, mean calculation Less 'concrete' aspects such as the role a given entity may play in a particular scenario (for example the role of a chemical compound in an experiment) and the function of an entity (for example the digestive function of the stomach to nutriate the body) are also covered in the ontology. == OBI consortium == The MGED Ontology was originally identified in the transcriptomics domain by the FGED Society and was developed to address the needs of data integration. Following a mutual decision to collaborate, this effort later became a wider collaboration between groups such as FGED, PSI and MSI in response to the needs of areas such as transcriptomics, proteomics and metabolomics and the FuGO (Functional Genomics Investigation Ontology) was created. This later became the OBI covering the wider scope of all biomedical investigations. As an international, cross-domain initiative, the OBI consortium draws upon a pool of experts from a variety of fields, not limited to biology. The current list of OBI consortium members is available at the OBI consortium website. The consortium is made up of a coordinating committee which is a combination of two subgroups, the Community Representative (those representing a particular biomedical community) and the Core Developers (ontology developers who may or may not be members of any single community). Separate to the coordinating committee is the Developers Working Group which consists of developers within the communities collaborating in the development of OBI at the discretion of current OBI Consortium members. == Papers on OBI ==

    Read more →
  • Cuboid (computer vision)

    Cuboid (computer vision)

    In computer vision, the term cuboid is used to describe a small spatiotemporal volume extracted for purposes of behavior recognition. The cuboid is regarded as a basic geometric primitive type and is used to depict three-dimensional objects within a three dimensional representation of a flat, two dimensional image. == Production == Cuboids can be produced from both two-dimensional and three-dimensional images. One method used to produce cuboids utilizes scene understanding (SUN) primitive databases, which are collections of pictures that already contain cuboids. By sorting through SUN primitive databases with machine learning tools, computers observe the conditions in which cuboids are produced in images from SUN primitive databases and can learn to produce cuboids from other images. RGB-D images, which are RGB images that also record the depth of each pixel, are occasionally used to produce cuboids because computers no longer need to determine the depth of an object, as they typically do because depth is already recorded. Cuboid production is sensitive to changes in color and illumination, blockage, and background clutter. This means that it is difficult for computers to produce cuboids of objects that are multicolored, irregularly illuminated, or partially covered, or if there are many objects in the background. This is partially due to the fact that algorithms for producing cuboids are still relatively simple. == Usage == Cuboids are created for point cloud-based three-dimensional maps and can be utilized in various situations such as augmented reality, the automated control of cars, drones, and robots, and object detection. Cuboids allow for software to identify a scene through geometric descriptions in an “object-agnostic” fashion. Interest points, locations within images that are identified by a computer as essential to identifying the image, created from two-dimensional images can be used with cuboids for image matching, identifying a room or scene, and instance recognition. Interest points created from three dimensional images can be used with cuboids to recognize activities. This is possible because interest points aid software to focus on only the most important aspects of the images. RGB-D images and SLAM systems are used together in RGB-D SLAM systems, which are employed by Computer-aided design systems to generate point cloud-based three-dimensional maps. Most industrial multi-axis machining tools use computer-aided manufacturing and subsequently work in cuboid work spaces.

    Read more →
  • Nike+iPod

    Nike+iPod

    The Nike+iPod Sport Kit is an activity tracker device, developed by Nike, Inc., which measures and records the distance and pace of a walk or run. The Nike+iPod consists of a small transmitter device attached to or embedded in a shoe, which communicates with either the Nike+ Sportband, or a receiver plugged into an iPod Nano. It can also work directly with a 2nd Generation iPod Touch (or higher), iPhone 3GS, iPhone 4, iPhone 4S, iPhone 5, The Nike+iPod was announced on May 23, 2006. On September 7, 2010, Nike released the Nike+ Running App (originally called Nike+ GPS) on the App Store, which used a tracking engine powered by MotionX that does not require the separate shoe sensor or pedometer. This application works using the accelerometer and GPS of the iPhone and the accelerometer of the iPod Touch, which does not have a GPS chip. Nike+Running is compatible with the iPhone 6 and iPhone 6 Plus down to iPhone 3GS and iPod touch. On June 21, 2012, Nike released Nike+ Running App for Android. The current app is compatible with all Android phones running 4.0.3 and up. == Overview == The sensor and iPod kit were revealed on May 20, 2006. The kit stores information such as the elapsed time of the workout, the distance traveled, pace, and calories burned by the individual. Nike+ was a collaboration between Nike and Apple; the platform consisted of an iPod, a wireless chip, Nike shoes that accepted the wireless chip, an iTunes membership, and a Nike+ online community. iPods using Nike iPod require a sensor and remote. The next upgraded product was the Sportband kit, which was announced in April 2008. The kit allows users to store run information without the iPod Nano. The Sportband consists of two parts: a rubber holding strap which is worn around the wrist, and a receiver which resembles a USB key-disk. The receiver displays information comparable to that of the iPod kit on the built-in display. After a run, the receiver can be plugged straight into a USB port and the software will upload the run information automatically to the Nike+ website. As of August 2008 "Nike+iPod for the Gym" launched, allowing users to record their cardio workouts directly to their iPods. No Sport kit or shoe sensor is required; all that is needed is a compatible iPod (1st–6th generation iPod Nano or 2nd/3rd gen iPod Touch) and an enabled piece of cardio equipment. As of March 2009, the seven largest commercial equipment providers were shipping enabled equipment (Life Fitness, Technogym, Precor USA, Star Trac, Cybex International, Matrix Fitness and Free Motion). The models of compatible cardio equipment include treadmills, stationary bicycles, stair climbers, ellipticals, and others such as Precor's Adaptive Motion Trainer. Once the user syncs an iPod with iTunes, the cardio workouts are automatically stored at Nikeplus.com, where each workout is visualized and tracked based on the number of calories burned. The calories are converted to "CardioMiles", at a ratio of 100:1, allowing cardio users to take full advantage of all the tools and features of Nikeplus.com, and allow them to engage in challenges with other runners, walkers and cardio users, using a common currency. With the release of the second-generation iPod Touch in 2008, Apple Inc. included a built-in ability to receive Nike+ signals, which allowed the iPod to connect directly to the wireless sensor thus eliminating the need for an external receiver to be connected. Apple also added this capability to the iPhone 3GS (released 2009), iPhone 4 (2010), and third-generation iPod Touch (2009). Those devices use their Broadcom Bluetooth chipset to receive the signals. On June 7, 2010, Polar and Nike introduced the Polar WearLink+ that works with Nike+. This new product works with the Nike+ SportBand and the fifth generation iPod nano in conjunction with the Nike+ iPod Sport Kit. Polar WearLink+ that works with Nike+ communicates directly with the fifth generation iPod nano and Nike+ SportBand using a proprietary digital protocol but it is dual-mode so it is also compatible with most Polar training computers (all those using 5 kHz analog transmission technology). Nike+ had 18 million global users as of April 2013. One year later, Nike updated the number of global users to 28 million. In iOS 6.1.2 (and possibly higher), a hole in the compatibility for the app has allowed jailbroken iPad users to use the native Nike + iPod iPhone and iPod app by moving the app bundle and setting permissions for the app. On April 30, 2018, Nike retired services for legacy Nike wearable devices, such as the Nike+ FuelBand and the Nike+ SportWatch GPS, and previous versions of apps, including Nike Run Club and Nike Training Club version 4.X and lower. Likewise, Nike no longer supported the Nike+ Connect software that transferred data to a NikePlus Profile or the Nike+ Fuel/FuelBand and Nike+ Move apps. == Sports kit equipment == The kit consists of two pieces: a piezoelectric sensor with a Nordic Semiconductor nRF2402 transmitter that is mounted under the inner sole of the shoe and a receiver that connects to the iPod. They communicate using a 2.4 GHz wireless radio and use Nordic Semiconductor's "ShockBurst" network protocol. The wireless data is encrypted in transit, but some uniquely identifying data is sent in the plain. The wireless protocol was reverse engineered and documented by Dmitry Grinberg in 2011. Nike recommends that the shoe be a Nike+ model with a special pocket in which to place the device. Nike has released the sensor for individual sale meaning that consumers no longer have to purchase the whole set (the iPod receiver and sensor). As the sensor battery cannot be replaced, a new one must be purchased every time the battery runs out. Aftermarket solutions are available to users who do not want to use shoes with built-in or hand-made pockets for the foot sensor, such as shoe pouches and containment devices designed to affix the sensor against the shoe laces. No matter how the sensor is integrated with the user's shoes, care must be taken that it is firmly fixed in place and will not jerk around while in use, which would degrade the accuracy. == Sports kit usage == The Sports Kit can be used to track running, which it refers to as "workouts". New workouts are started by plugging the receiving unit into the iPod, then navigating through the iPod menu system. The user chooses a goal for the workout, which might be to cover a specific distance, or burn a number of calories, or work out for a specified time. A workout can also be started without a goal, which is called a "Basic Workout". When the workout goal has been set, the receiver seeks the sensor, possibly asking the user to "walk around to activate [the] sensor". The user then must press the center button on the iPod to begin the workout. Audio feedback is provided in the user's choice of generic male or female voice by the iPod over the course of the workout, depending on the type of workout chosen. For goal-oriented workouts, the feedback will correspond to significant milestones toward the goal. In a distance workout, for example, the audio feedback will inform the user as each mile or kilometer has been completed, as well as the half-way point of the workout, and a countdown of four 100-meter increments at the end of the workout. The iPod's control wheel functions change slightly during a workout. The Pause button now not only pauses the music but also the workout. Similarly, the Menu button is used to access the controls to end the workout. The Forward and Back buttons are unchanged, performing audio track skip and reverse functions. The Center button has two functions: audio feedback about the current distance, time, and pace are provided when the button is tapped once, while if the button is held down the iPod skips to the "PowerSong" - an audio track chosen by the user, generally intended for motivation. In addition to the in-workout audio feedback, there are pre-recorded congratulations provided by Lance Armstrong, Tiger Woods, Joan Benoit Samuelson, and Paula Radcliffe whenever a user achieves a personal best (such as fastest mile, fastest 5K, fastest 10K, longest run yet) or reaches certain long-term milestones (such as 250 miles, 500 kilometers). This "celebrity feedback" is heard after the usual end-of-run statistics. While the Sports Kit can be used immediately after purchase, it will report more accurate results if it is calibrated before the first usage and then regularly afterwards. For calibration, the user finds a fixed known distance of at least 0.25 mile or 400 meters and then sets the Nike+ to calibration mode for the walk or run over that distance. When the walk or run is complete, the device calibrates itself and future workout reporting will reflect statistics closer to that individual user's workout style. Consumer Reports magazine tested the device and found it accurate as long as you keep an even pace. In workouts with varied pa

    Read more →
  • Grid-oriented storage

    Grid-oriented storage

    Grid-oriented Storage (GOS) was a term used for data storage by a university project during the era when the term grid computing was popular. == Description == GOS was a successor of the term network-attached storage (NAS). GOS systems contained hard disks, often RAIDs (redundant arrays of independent disks), like traditional file servers. GOS was designed to deal with long-distance, cross-domain and single-image file operations, which is typical in Grid environments. GOS behaves like a file server via the file-based GOS-FS protocol to any entity on the grid. Similar to GridFTP, GOS-FS integrates a parallel stream engine and Grid Security Infrastructure (GSI). Conforming to the universal VFS (Virtual Filesystem Switch), GOS-FS can be pervasively used as an underlying platform to best utilize the increased transfer bandwidth and accelerate the NFS/CIFS-based applications. GOS can also run over SCSI, Fibre Channel or iSCSI, which does not affect the acceleration performance, offering both file level protocols and block level protocols for storage area network (SAN) from the same system. In a grid infrastructure, resources may be geographically distant from each other, produced by differing manufacturers, and have differing access control policies. This makes access to grid resources dynamic and conditional upon local constraints. Centralized management techniques for these resources are limited in their scalability both in terms of execution efficiency and fault tolerance. Provision of services across such platforms requires a distributed resource management mechanism and the peer-to-peer clustered GOS appliances allow a single storage image to continue to expand, even if a single GOS appliance reaches its capacity limitations. The cluster shares a common, aggregate presentation of the data stored on all participating GOS appliances. Each GOS appliance manages its own internal storage space. The major benefit of this aggregation is that clustered GOS storage can be accessed by users as a single mount point. GOS products fit the thin-server categorization. Compared with traditional “fat server”-based storage architectures, thin-server GOS appliances deliver numerous advantages, such as the alleviation of potential network/grid bottle-necks, CPU and OS optimized for I/O only, ease of installation, remote management and minimal maintenance, low cost and Plug and Play, etc. Examples of similar innovations include NAS, printers, fax machines, routers and switches. An Apache server has been installed in the GOS operating system, ensuring an HTTPS-based communication between the GOS server and an administrator via a Web browser. Remote management and monitoring makes it easy to set up, manage, and monitor GOS systems. == History == Frank Zhigang Wang and Na Helian proposed a funding proposal to the UK government titled “Grid-Oriented Storage (GOS): Next Generation Data Storage System Architecture for the Grid Computing Era” in 2003. The proposal was approved and granted one million pounds in 2004. The first prototype was constructed in 2005 at Centre for Grid Computing, Cambridge-Cranfield High Performance Computing Facility. The first conference presentation was at IEEE Symposium on Cluster Computing and Grid (CCGrid), 9–12 May 2005, Cardiff, UK. As one of the five best work-in-progress, it was included in the IEEE Distributed Systems Online. In 2006, the GOS architecture and its implementations was published in IEEE Transactions on Computers, titled “Grid-oriented Storage: A Single-Image, Cross-Domain, High-Bandwidth Architecture”. Starting in January 2007, demonstrations were presented at Princeton University, Cambridge University Computer Lab and others. By 2013, the Cranfield Centre still used future tense for the project. Peer-to-peer file sharings use similar techniques.

    Read more →
  • Broadcast (parallel pattern)

    Broadcast (parallel pattern)

    Broadcast is a collective communication primitive in parallel programming to distribute programming instructions or data to nodes in a cluster. It is the reverse operation of reduction. The broadcast operation is widely used in parallel algorithms, such as matrix-vector multiplication, Gaussian elimination and shortest paths. The Message Passing Interface implements broadcast in MPI_Bcast. == Definition == A message M [ 1.. m ] {\displaystyle M[1..m]} of length m {\displaystyle m} should be distributed from one node to all other p − 1 {\displaystyle p-1} nodes. T byte {\displaystyle T_{\text{byte}}} is the time it takes to send one byte. T start {\displaystyle T_{\text{start}}} is the time it takes for a message to travel to another node, independent of its length. Therefore, the time to send a package from one node to another is t = s i z e × T byte + T start {\displaystyle t=\mathrm {size} \times T_{\text{byte}}+T_{\text{start}}} . p {\displaystyle p} is the number of nodes and the number of processors. == Binomial Tree Broadcast == With Binomial Tree Broadcast the whole message is sent at once. Each node that has already received the message sends it on further. This grows exponentially as each time step the amount of sending nodes is doubled. The algorithm is ideal for short messages but falls short with longer ones as during the time when the first transfer happens only one node is busy. Sending a message to all nodes takes log 2 ⁡ ( p ) t {\displaystyle \log _{2}(p)t} time which results in a runtime of log 2 ⁡ ( p ) ( m T byte + T start ) {\displaystyle \log _{2}(p)(mT_{\text{byte}}+T_{\text{start}})} == Linear Pipeline Broadcast == The message is split up into k {\displaystyle k} packages and sent piecewise from node n {\displaystyle n} to node n + 1 {\displaystyle n+1} . The time needed to distribute the first message piece is p t = m k T byte + T start {\textstyle pt={\frac {m}{k}}T_{\text{byte}}+T_{\text{start}}} whereby t {\displaystyle t} is the time needed to send a package from one processor to another. Sending a whole message takes ( p + k ) ( m T byte k + T start ) = ( p + k ) t = p t + k t {\displaystyle (p+k)\left({\frac {mT_{\text{byte}}}{k}}+T_{\text{start}}\right)=(p+k)t=pt+kt} . Optimal is to choose k = m ( p − 2 ) T byte T start {\displaystyle k={\sqrt {\frac {m(p-2)T_{\text{byte}}}{T_{\text{start}}}}}} resulting in a runtime of approximately m T byte + p T start + m p T start T byte {\displaystyle mT_{\text{byte}}+pT_{\text{start}}+{\sqrt {mpT_{\text{start}}T_{\text{byte}}}}} The run time is dependent on not only message length but also the number of processors that play roles. This approach shines when the length of the message is much larger than the amount of processors. == Pipelined Binary Tree Broadcast == This algorithm combines Binomial Tree Broadcast and Linear Pipeline Broadcast, which makes the algorithm work well for both short and long messages. The aim is to have as many nodes work as possible while maintaining the ability to send short messages quickly. A good approach is to use Fibonacci trees for splitting up the tree, which are a good choice as a message cannot be sent to both children at the same time. This results in a binary tree structure. We will assume in the following that communication is full-duplex. The Fibonacci tree structure has a depth of about d ≈ log Φ ⁡ ( p ) {\displaystyle d\approx \log _{\Phi }(p)} whereby Φ = 1 + 5 2 {\displaystyle \Phi ={\frac {1+{\sqrt {5}}}{2}}} the golden ratio. The resulting runtime is ( m k T byte + T start ) ( d + 2 k − 2 ) {\textstyle ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(d+2k-2)} . Optimal is k = n ( d − 2 ) T byte 3 T start {\displaystyle k={\sqrt {\frac {n(d-2)T_{\text{byte}}}{3T_{\text{start}}}}}} . This results in a runtime of 2 m T byte + T start log Φ ⁡ ( p ) + 2 m log Φ ⁡ ( p ) T start T byte {\displaystyle 2mT_{\text{byte}}+T_{\text{start}}\log _{\Phi }(p)+{\sqrt {2m\log _{\Phi }(p)T_{\text{start}}T_{\text{byte}}}}} . == Two Tree Broadcast (23-Broadcast) == === Definition === This algorithm aims to improve on some disadvantages of tree structure models with pipelines. Normally in tree structure models with pipelines (see above methods), leaves receive just their data and cannot contribute to send and spread data. The algorithm concurrently uses two binary trees to communicate over. Those trees will be called tree A and B. Structurally in binary trees there are relatively more leave nodes than inner nodes. Basic Idea of this algorithm is to make a leaf node of tree A be an inner node of tree B. It has also the same technical function in opposite side from B to A tree. This means, two packets are sent and received by inner nodes and leaves in different steps. === Tree construction === The number of steps needed to construct two parallel-working binary trees is dependent on the amount of processors. Like with other structures one processor can is the root node who sends messages to two trees. It is not necessary to set a root node, because it is not hard to recognize that the direction of sending messages in binary tree is normally top to bottom. There is no limitation on the number of processors to build two binary trees. Let the height of the combined tree be h = ⌈log(p + 2)⌉. Tree A and B can have a height of h − 1 {\displaystyle h-1} . Especially, if the number of processors correspond to p = 2 h − 1 {\displaystyle p=2^{h}-1} , we can make both sides trees and a root node. To construct this model efficiently and easily with a fully built tree, we can use two methods called "Shifting" and "Mirroring" to get second tree. Let assume tree A is already modeled and tree B is supposed to be constructed based on tree A. We assume that we have p {\displaystyle p} processors ordered from 0 to p − 1 {\displaystyle p-1} . ==== Shifting ==== The "Shifting" method, first copies tree A and moves every node one position to the left to get tree B. The node, which will be located on -1, becomes a child of processor p − 2 {\displaystyle p-2} . ==== Mirroring ==== "Mirroring" is ideal for an even number of processors. With this method tree B can be more easily constructed by tree A, because there are no structural transformations in order to create the new tree. In addition, a symmetric process makes this approach simple. This method can also handle an odd number of processors, in this case, we can set processor p − 1 {\displaystyle p-1} as root node for both trees. For the remaining processors "Mirroring" can be used. === Coloring === We need to find a schedule in order to make sure that no processor has to send or receive two messages from two trees in a step. The edge, is a communication connection to connect two nodes, and can be labelled as either 0 or 1 to make sure that every processor can alternate between 0 and 1-labelled edges. The edges of A and B can be colored with two colors (0 and 1) such that no processor is connected to its parent nodes in A and B using edges of the same color- no processor is connected to its children nodes in A or B using edges of the same color. In every even step the edges with 0 are activated and edges with 1 are activated in every odd step. === Time complexity === In this case the number of packet k is divided in half for each tree. Both trees are working together the total number of packets k = k / 2 + k / 2 {\displaystyle k=k/2+k/2} (upper tree + bottom tree) In each binary tree sending a message to another nodes takes 2 i {\displaystyle 2i} steps until a processor has at least a packet in step i {\displaystyle i} . Therefore, we can calculate all steps as d := log 2 ⁡ ( p + 1 ) ⇒ log 2 ⁡ ( p + 1 ) ≈ log 2 ⁡ ( p ) {\displaystyle d:=\log _{2}(p+1)\Rightarrow \log _{2}(p+1)\approx \log _{2}(p)} . The resulting run time is T ( m , p , k ) ≈ ( m k T byte + T start ) ( 2 d + k − 1 ) {\textstyle T(m,p,k)\approx ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(2d+k-1)} . (Optimal k = m ( 2 d − 1 ) T byte / T start {\textstyle k={\sqrt {{m(2d-1)T_{\text{byte}}}/{T_{\text{start}}}}}} ) This results in a run time of T ( m , p ) ≈ m T byte + T start ⋅ 2 log 2 ⁡ ( p ) + m ⋅ 2 log 2 ⁡ ( p ) T start T byte {\displaystyle T(m,p)\approx mT_{\text{byte}}+T_{\text{start}}\cdot 2\log _{2}(p)+{\sqrt {m\cdot 2\log _{2}(p)T_{\text{start}}T_{\text{byte}}}}} . == ESBT-Broadcasting (Edge-disjoint Spanning Binomial Trees) == In this section, another broadcasting algorithm with an underlying telephone communication model will be introduced. A Hypercube creates network system with p = 2 d ( d = 0 , 1 , 2 , 3 , . . . ) {\displaystyle p=2^{d}(d=0,1,2,3,...)} . Every node is represented by binary 0 , 1 {\displaystyle {0,1}} depending on the number of dimensions. Fundamentally ESBT(Edge-disjoint Spanning Binomial Trees) is based on hypercube graphs, pipelining( m {\displaystyle m} messages are divided by k {\displaystyle k} packets) and binomial trees. The Processor 0 d {\displaystyle 0^{d}} cyclically spreads packets to roots of ESB

    Read more →
  • Site reliability engineering

    Site reliability engineering

    Site reliability engineering (SRE) is a discipline in the field of software engineering and IT infrastructure support that monitors and improves the availability and performance of deployed software systems and large software services (which are expected to deliver reliable response times across events such as new software deployments, hardware failures, and cybersecurity attacks). There is typically a focus on automation and an infrastructure as code methodology. SRE uses elements of software engineering, IT infrastructure, web development, and operations to assist with reliability. It is similar to DevOps as they both aim to improve the reliability and availability of deployed software systems. == History == Site Reliability Engineering originated at Google with Benjamin Treynor Sloss, who founded SRE team in 2003. The concept expanded within the software development industry, leading various companies to employ site reliability engineers. By March 2016, Google had more than 1,000 site reliability engineers on staff. Dedicated SRE teams are common at larger web development companies. In middle-sized and smaller companies, DevOps teams sometimes perform SRE, as well. Organizations that have adopted the concept include Airbnb, Dropbox, IBM, LinkedIn, Netflix, and Wikimedia. == Definition == Site reliability engineers (SREs) are responsible for a combination of system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. SREs often have backgrounds in software engineering, systems engineering, and/or system administration. The focuses of SRE include automation, system design, and improvements to system resilience. SRE is considered a specific implementation of DevOps; focusing specifically on building reliable systems, whereas DevOps covers a broader scope of operations. Despite having different focuses, some companies have rebranded their operations teams to SRE teams. == Principles and practices == Common definitions of the practices include (but are not limited to): Automation of repetitive tasks for cost-effectiveness. Defining reliability goals to prevent endless effort. Design of systems with a goal to reduce risks to availability, latency, and efficiency. Observability, the ability to ask arbitrary questions about a system without having to know ahead of time what to ask. Common definitions of the principles include (but are not limited to): Toil management, the implementation of the first principle outlined above. Defining and measuring reliability goals—SLIs, SLOs, and error budgets. Non-Abstract Large Scale Systems Design (NALSD) with a focus on reliability. Designing for and implementing observability. Defining, testing, and running an incident management process. Capacity planning. Change and release management, including CI/CD. Chaos engineering. == Deployment == SRE teams collaborate with other departments within organizations to guide the implementation of the mentioned principles. Below is an overview of common practices: === Kitchen Sink === Kitchen Sink refers to the expansive and often unbounded scope of services and workflows that SRE teams oversee. Unlike traditional roles with clearly defined boundaries, SREs are tasked with various responsibilities, including system performance optimization, incident management, and automation. This approach allows SREs to address multiple challenges, ensuring that systems run efficiently and evolve in response to changing demands and complexities. === Infrastructure === Infrastructure SRE teams focus on maintaining and improving the reliability of systems that support other teams' workflows. While they sometimes collaborate with platform engineering teams, their primary responsibility is ensuring up-time, performance, and efficiency. Platform teams, on the other hand, primarily develop the software and systems used across the organization. While reliability is a goal for both, platform teams prioritize creating and maintaining the tools and services used by internal stakeholders, whereas Infrastructure SRE teams are tasked with ensuring those systems run smoothly and meet reliability standards. === Tools === SRE teams utilize a variety of tools with the aim of measuring, maintaining, and enhancing system reliability. These tools play a role in monitoring performance, identifying issues, and facilitating proactive maintenance. For instance, Nagios Core is commonly employed for system monitoring and alerting, while Prometheus (software) is frequently used for collecting and querying metrics in cloud-native environments. === Product or Application === SRE teams dedicated to specific products or applications are common in large organizations. These teams are responsible for ensuring the reliability, scalability, and performance of key services. In larger companies, it's typical to have multiple SRE teams, each focusing on different products or applications, ensuring that each area receives specialized attention to meet performance and availability targets. === Embedded === In an embedded model, individual SREs or small SRE pairs are integrated within software engineering teams. These SREs collaborate with developers, applying core SRE principles—such as automation, monitoring, and incident response—directly to the software development lifecycle. This approach aims to enhance reliability, performance, and collaboration between SREs and developers. === Consulting === Consulting SRE teams specialize in advising organizations on the implementation of SRE principles and practices. Typically composed of seasoned SREs with a history across various implementations, these teams provide insights and guidance for specific organizational needs. When working directly with clients, these SREs are often referred to as 'Customer Reliability Engineers.' In large organizations that have adopted SRE, a hybrid model is common. This model includes various implementations, such as multiple Product/Application SRE teams dedicated to addressing the specific reliability needs of different products. An Infrastructure SRE team may collaborate with a Platform engineering group to achieve shared reliability goals for a unified platform that supports all products and applications. == Industry == Since 2014, the USENIX organization has hosted the annual SREcon conference, bringing together site reliability engineers from various industries. This conference is a platform for professionals to share knowledge, explore effective practices, and discuss trends in site reliability engineering.

    Read more →
  • The Citation Project

    The Citation Project

    The Citation Project is a series of studies that measure and analyze first-year college writing students' source use and their ability to understand and implement sources within their own writing. The Citation Project reveals students' source-use habits and the issues that can be seen based on their lack of proper citation skills, such as the prevalence of plagiarism, institution policies, and the results of current writing pedagogy. The Citation Project's central findings were first presented at the Conference on College Composition and Communication in 2012. Although The Citation Project originally referred to this single 2012 study, the feedback received led to the conception of the Project as a broader initiative and as a place to gather and publish studies and data relating to student writing habits for the usage of other researches. == Method == The Citation Project's data comes from the work of 20 researchers analyzing 174 first-year composition students' research papers. The student papers studied originated from 16 institutions across the United States of America, including community colleges, public and private universities, denominational colleges, and Ivy Leagues. Researchers used bibliographic coding to aggregate data regarding the type, length, reading level, and usage of students' sources. == Findings == === Student source assessment and use === This study found that students were capable of identifying, locating, and accessing librarian-approved academic sources, most commonly accessing them with the internet. Despite students demonstrating their ability to find appropriate sources, they tend to exclusively cite the first few pages of their sources. Students' use and analysis of their citations are often limited, frequently resorting to patchwriting, directly restating their source's points, and omitting their own interpretations of their reference's ideas. The Citation Project also highlights students' struggle to accurately determine, address, and value their sources' bias, authority, and credibility. According to the Project's researchers' analysis, these habits demonstrate that first-year college writing students minimally engage with their sources and the academic conversations between them. One researcher from the Citation Project, Rebecca Moore Howard, believes these findings do not point towards students being lazy, but is rather a result of a writing pedagogy that prioritizes efficient, product-focused writing. Another interpretation offered by Sandra Jamieson, another researcher from the Citation Project explains their findings as a result of a lack of adherence to Information Learning (IL) Standards. === Pedagogy === A significant focus of The Citation Project is the development of pedagogical practices intended to equip students with writing and research techniques that will set them up for future success. Writers associated with The Citation Project, such as Tricia Serviss, believe that the practices of teachers surrounding academic integrity and writing practices are what form the foundation of how students think about writing and how to engage with assignments throughout their academic career. They also stress the importance of teaching students to effectively engage with sources rather than simply how to correctly cite them. The Citation Project asserts that endowing students with the ability to read, understand, and synthesize a variety of sources in their writing is a skill that will benefit them throughout their academic careers, and that the surface level typographical focus that many writing programs utilize is inadequate. == Plagiarism == One of the areas that The Citation Project also looks at is how students commit plagiarism throughout their writing. Plagiarism tends to be a checkpoint that gives instructors a sense where students' citation skills stand. Findings from The Citation Project reveal that the most common type of plagiarism is patchwriting which is the act of using the same sentence with only changing a couple of words. These types of issues can be seen as a learning curve due to lack of proper training. Student's that commit plagiarism are often unaware. === Policies === Another issue found is that academic plagiarism policies may not benefit a student's growth but may instead obstruct it. Policies against plagiarism tend to be harsh on the student that committed of offense. Even though student plagiarism is often unintentional academic institutions see this behavior as intentional. Student may then face harsh consequences as a result from their lack of citation knowledge. Additionally, higher level institutions assume that new students already have the skill set to avoid plagiarism which may be an unrealistic expectation. == Legacy == === Inspired studies === ==== Parrott and Napier ==== In one study, "Critical Reading and Student Self-Selected Texts: Results of a Collaborative, Explicit Curricular Approach," Jill Parrot and Trenia Napier quoted the Citation Project's findings as evidence that current collegiate writing curriculums are an ineffective means of teaching students how to properly write academic research papers. The researchers accredited current writing pedagogy's lack of emphasis on teaching critical reading skills. Parrott and Napier tested their thesis by seeing if students would produce more academic writing if they partook in a writing course that taught critical reading. Their results mostly went against this hypothesis, finding students who received additional critical reading training only significantly improved in how they integrated their sources. ==== Kocatepe ==== In May Mehtap Kocatep's study, "Reconceptualising the notion of finding information: How undergraduate students construct information as they read-to-write in an academic writing class," Kocatep expresses that she believes current conversations around writing pedagogy, including the Citation Project, operate with the underlying misconception that information is an easily discoverable static entity and its retrieval is an objective, unbiased decision. Kocatepe instead offers the analysis of what students view as valuable information and if it is worth using is influenced by the socially constructed meanings available to writers at the moment. To further examine students' source engagement, Kocatepe did a study on how female university students from the United Arab Emirates find, retrieve, use, and value sources. Kocatepe's results mainly noted students' almost exclusive reliance on using Google to find sources, as well as how students' navigated mainly English-speaking academic conversations as non-native English speakers. ==== Dahlen, Nordstrom-Sanchez, and Graff ==== Dahlen, Nordstrom-Sanchez, and Graff built their study off The Citation Project research in order to explore the attitudes and practices of students in an undergraduate writing course. As the researchers acknowledge, data collected by the Citation Project was the subject of the bulk of their analysis. This study sought to examine undergraduate writing practices tied to source-usage and elucidate any relevant trends. Dahlen, Nordstrom-Sanchez and Graff found that undergraduate writing students were not engaging with outside sources properly. Key issues discussed include lack of engagement with broad source ideas (in favor of picking out quotes), lack of paraphrasing, and inability to link information between multiple sources. ==== Davis ==== Phillip M. Davis based much of the analysis in his study on data gathered by the Citation Project. This study aimed to examine the particular effects web-based research and study had on undergraduate's papers and the replicability of their bibliographies. Davis sought to see how the shift from physical in-person library based research to online, often at-home research changed the function and usability of the bibliography as a form of documenting source usage in a given work. The primary method of analysis involved examining students' bibliographies to see where they were finding information online and how these sources were accessed. A main issue Davis found was "persistency" of URLs used for online citations. He found that only 18% of URL-based citations continued to function (the others either no longer pointing to the correct document or ceasing to exist altogether) within 3 years of their usage by students, and more than half of claimed online citations could not be found in any form. He suggests that this result brings up questions about how web-based citations should be dealt with in a university context.

    Read more →
  • Data quality

    Data quality

    Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Data is deemed of high quality if it correctly represents the real-world construct to which it refers. Apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, businesses may adopt recognised international standards for data quality (See #International Standards for Data Quality below). Data governance can also be used to form agreed upon definitions and standards, including international standards, for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality. == Definitions == Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data. From a consumer perspective, data quality is: "data that are fit for use by data consumers" data "meeting or exceeding consumer expectations" data that "satisfies the requirements of its intended use" From a business perspective, data quality is: data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibits "'conformance to standards' that have been set, so that fitness for use is achieved" data that "are fit for their intended uses in operations, decision making and planning" "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise" From a standards-based perspective, data quality is: the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements" "the usefulness, accuracy, and correctness of data for its application" Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies. == Dimensions of data quality == Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as: accessibility or availability accuracy or correctness comparability completeness or comprehensiveness consistency, coherence, or clarity credibility, reliability, or reputation flexibility plausibility relevance, pertinence, or usefulness timeliness or latency uniqueness validity or reasonableness A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data. == International standards for data quality == ISO 8000 is an international standard for data quality. Managed by the International Organization for Standardization, the ISO 8000 standards address and describe general aspects of data quality including principles, vocabulary and measurement data governance data quality management data quality assessment quality of master data, including exchange of characteristic data and identifiers quality of industrial data == History == Before the rise of the inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services. This was so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available. Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc. == Overview == There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities". MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991). In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources – through data entry, or data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale very quickly in the average database – more than 45 million Americans change their address every year. In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations. Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency. En

    Read more →