Locally recoverable code

Locally recoverable codes are a family of error correction codes that were introduced first by D. S. Papailiopoulos and A. G. Dimakis and have been widely studied in information theory due to their applications related to distributive and cloud storage systems. An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} LRC is an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code such that there is a function f i {\displaystyle f_{i}} that takes as input i {\displaystyle i} and a set of r {\displaystyle r} other coordinates of a codeword c = ( c 1 , … , c n ) ∈ C {\displaystyle c=(c_{1},\ldots ,c_{n})\in C} different from c i {\displaystyle c_{i}} , and outputs c i {\displaystyle c_{i}} . == Overview == Erasure-correcting codes, or simply erasure codes, for distributed and cloud storage systems, are becoming more and more popular as a result of the present spike in demand for cloud computing and storage services. This has inspired researchers in the fields of information and coding theory to investigate new facets of codes that are specifically suited for use with storage systems. It is well-known that LRC is a code that needs only a limited set of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and cloud storage systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much data as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems. The following definition of the LRC follows from the description above: an [ n , k , r ] {\displaystyle [n,k,r]} -Locally Recoverable Code (LRC) of length n {\displaystyle n} is a code that produces an n {\displaystyle n} -symbol codeword from k {\displaystyle k} information symbols, and for any symbol of the codeword, there exist at most r {\displaystyle r} other symbols such that the value of the symbol can be recovered from them. The locality parameter satisfies 1 ≤ r ≤ k {\displaystyle 1\leq r\leq k} because the entire codeword can be found by accessing k {\displaystyle k} symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum distance d {\displaystyle d} , can recover d − 1 {\displaystyle d-1} erasures. == Definition == Let C {\displaystyle C} be a [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code. For i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} , let us denote by r i {\displaystyle r_{i}} the minimum number of other coordinates we have to look at to recover an erasure in coordinate i {\displaystyle i} . The number r i {\displaystyle r_{i}} is said to be the locality of the i {\displaystyle i} -th coordinate of the code. The locality of the code is defined as An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} locally recoverable code (LRC) is an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code C ∈ F q n {\displaystyle C\in \mathbb {F} _{q}^{n}} with locality r {\displaystyle r} . Let C {\displaystyle C} be an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} -locally recoverable code. Then an erased component can be recovered linearly, i.e. for every i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} , the space of linear equations of the code contains elements of the form x i = f ( x i 1 , … , x i r ) {\displaystyle x_{i}=f(x_{i_{1}},\ldots ,x_{i_{r}})} , where i j ≠ i {\displaystyle i_{j}\neq i} . == Optimal locally recoverable codes == Theorem Let n = ( r + 1 ) s {\displaystyle n=(r+1)s} and let C {\displaystyle C} be an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} -locally recoverable code having s {\displaystyle s} disjoint locality sets of size r + 1 {\displaystyle r+1} . Then An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} -LRC C {\displaystyle C} is said to be optimal if the minimum distance of C {\displaystyle C} satisfies == Tamo–Barg codes == Let f ∈ F q [ x ] {\displaystyle f\in \mathbb {F} _{q}[x]} be a polynomial and let ℓ {\displaystyle \ell } be a positive integer. Then f {\displaystyle f} is said to be ( r {\displaystyle r} , ℓ {\displaystyle \ell } )-good if • f {\displaystyle f} has degree r + 1 {\displaystyle r+1} , • there exist distinct subsets A 1 , … , A ℓ {\displaystyle A_{1},\ldots ,A_{\ell }} of F q {\displaystyle \mathbb {F} _{q}} such that – for any i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} , f ( A i ) = { t i } {\displaystyle f(A_{i})=\{t_{i}\}} for some t i ∈ F q {\displaystyle t_{i}\in \mathbb {F} _{q}} , i.e., f {\displaystyle f} is constant on A i {\displaystyle A_{i}} , – # A i = r + 1 {\displaystyle \#A_{i}=r+1} , – A i ∩ A j = ∅ {\displaystyle A_{i}\cap A_{j}=\varnothing } for any i ≠ j {\displaystyle i\neq j} . We say that { A 1 , … , A ℓ {\displaystyle A_{1},\ldots ,A_{\ell }} } is a splitting covering for f {\displaystyle f} . === Tamo–Barg construction === The Tamo–Barg construction utilizes good polynomials. • Suppose that a ( r , ℓ ) {\displaystyle (r,\ell )} -good polynomial f ( x ) {\displaystyle f(x)} over F q {\displaystyle \mathbb {F} _{q}} is given with splitting covering i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} . • Let s ≤ ℓ − 1 {\displaystyle s\leq \ell -1} be a positive integer. • Consider the following F q {\displaystyle \mathbb {F} _{q}} -vector space of polynomials V = { ∑ i = 0 s g i ( x ) f ( x ) i : deg ⁡ ( g i ( x ) ) ≤ deg ⁡ ( f ( x ) ) − 2 } . {\displaystyle V=\left\{\sum _{i=0}^{s}g_{i}(x)f(x)^{i}:\deg(g_{i}(x))\leq \deg(f(x))-2\right\}.} • Let T = ⋃ i = 1 ℓ A i {\textstyle T=\bigcup _{i=1}^{\ell }A_{i}} . • The code { ev T ⁡ ( g ) : g ∈ V } {\displaystyle \{\operatorname {ev} _{T}(g):g\in V\}} is an ( ( r + 1 ) ℓ , ( s + 1 ) r , d , r ) {\displaystyle ((r+1)\ell ,(s+1)r,d,r)} -optimal locally coverable code, where ev T {\displaystyle \operatorname {ev} _{T}} denotes evaluation of g {\displaystyle g} at all points in the set T {\displaystyle T} . === Parameters of Tamo–Barg codes === • Length. The length is the number of evaluation points. Because the sets A i {\displaystyle A_{i}} are disjoint for i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} , the length of the code is | T | = ( r + 1 ) ℓ {\displaystyle |T|=(r+1)\ell } . • Dimension. The dimension of the code is ( s + 1 ) r {\displaystyle (s+1)r} , for s {\displaystyle s} ≤ ℓ − 1 {\displaystyle \ell -1} , as each g i {\displaystyle g_{i}} has degree at most deg ⁡ ( f ( x ) ) − 2 {\displaystyle \deg(f(x))-2} , covering a vector space of dimension deg ⁡ ( f ( x ) ) − 1 = r {\displaystyle \deg(f(x))-1=r} , and by the construction of V {\displaystyle V} , there are s + 1 {\displaystyle s+1} distinct g i {\displaystyle g_{i}} . • Distance. The distance is given by the fact that V ⊆ F q [ x ] ≤ k {\displaystyle V\subseteq \mathbb {F} _{q}[x]_{\leq k}} , where k = r + 1 − 2 + s ( r + 1 ) {\displaystyle k=r+1-2+s(r+1)} , and the obtained code is the Reed-Solomon code of degree at most k {\displaystyle k} , so the minimum distance equals ( r + 1 ) ℓ − ( ( r + 1 ) − 2 + s ( r + 1 ) ) {\displaystyle (r+1)\ell -((r+1)-2+s(r+1))} . • Locality. After the erasure of the single component, the evaluation at a i ∈ A i {\displaystyle a_{i}\in A_{i}} , where | A i | = r + 1 {\displaystyle |A_{i}|=r+1} , is unknown, but the evaluations for all other a ∈ A i {\displaystyle a\in A_{i}} are known, so at most r {\displaystyle r} evaluations are needed to uniquely determine the erased component, which gives us the locality of r {\displaystyle r} . To see this, g {\displaystyle g} restricted to A j {\displaystyle A_{j}} can be described by a polynomial h {\displaystyle h} of degree at most deg ⁡ ( f ( x ) ) − 2 = r + 1 − 2 = r − 1 {\displaystyle \deg(f(x))-2=r+1-2=r-1} thanks to the form of the elements in V {\displaystyle V} (i.e., thanks to the fact that f {\displaystyle f} is constant on A j {\displaystyle A_{j}} , and the g i {\displaystyle g_{i}} 's have degree at most deg ⁡ ( f ( x ) ) − 2 {\displaystyle \deg(f(x))-2} ). On the other hand | A j ∖ { a j } | = r {\displaystyle |A_{j}\backslash \{a_{j}\}|=r} , and r {\displaystyle r} evaluations uniquely determine a polynomial of degree r − 1 {\displaystyle r-1} . Therefore h {\displaystyle h} can be constructed and evaluated at a j {\displaystyle a_{j}} to recover g ( a j ) {\displaystyle g(a_{j})} . === Example of Tamo–Barg construction === We will use x 5 ∈ F 41 [ x ] {\displaystyle x^{5}\in \mathbb {F} _{41}[x]} to construct [ 15 , 8 , 6 , 4 ] {\displaystyle [15,8,6,4]} -LRC. Notice that the degree of this polynomial is 5, and it is constant on A i {\displaystyle A_{i}} for i ∈ { 1 , … , 8 } {\displaystyle i\in \{1,\ldots ,8\}} , where A 1 = { 1 , 10 , 16 , 18 , 37 } {\displaystyle A_{1}=\{1,10,16,18,37\}} , A 2 = 2 A 1 {\displaystyle A_{2}=2A_{1}} , A 3 = 3 A 1 {\displaystyle A_{3}=3A_{1}} , A 4 = 4 A 1 {\displaystyle A_{4}=4A_{1}} , A 5 = 5 A 1 {\displaystyle A_{5}=5A_{1}} , A 6 = 6 A 1 {\displaystyle A_{6}=6A_{1}}

Gollum browser

Gollum browser is a discontinued web browser for accessing Wikipedia. Since 2017, Gollum is no longer accessible online. Gollum is designed to browse Wikipedia in an easier way than directly using the web browser. Links external to Wikipedia are opened in the user's regular browser. Gollum is opened from a regular browser and makes a window that puts the Wikipedia search bar on the toolbar. Gollum was created by Harald Hanek in 2005 using PHP and Ajax. According to one blogger, Gollum provides a way to bypass censorship of Wikipedia in China. == Languages == Though the website is available only in English and German, Gollum's GUI is available in more than 32 languages and can browse nearly 50 Wikipedia editions. === Gollum's GUI === === Browsable Wikipedia editions ===

Apache cTAKES

Apache cTAKES: clinical Text Analysis and Knowledge Extraction System is an open-source Natural Language Processing (NLP) system that extracts clinical information from electronic health record unstructured text. It processes clinical notes, identifying types of clinical named entities — drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context (family history of, current, unrelated to patient), and negated/not negated. cTAKES was built using the UIMA Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. == Components == Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. These components include: Named Section identifier Sentence boundary detector Rule-based tokenizer Formatted list identifier Normalizer Context dependent tokenizer Part-of-speech tagger Phrasal chunker Dictionary lookup annotator Context annotator Negation detector Uncertainty detector Subject detector Dependency parser patient smoking status identifier Drug mention annotator == History == Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes. When Dr. Savova's moved to Boston Children's Hospital in early 2010, the core development team grew to include members there. Further external collaborations include: University of Colorado Brandeis University University of Pittsburgh University of California at San Diego Such collaborations have extended cTAKES' capabilities into other areas such as Temporal Reasoning, Clinical Question Answering, and coreference resolution for the clinical domain. In 2010, cTAKES was adopted by the i2b2 program and is a central component of the SHARP Area 4. In 2013, cTAKES released their first release as an Apache Software Foundation incubator project: cTAKES 3.0. In March 2013, cTAKES became an Apache Software Foundation Top Level Project (TLP).

Sophia Ananiadou

Sophia Ananiadou is a Greek-British computer scientist and computational linguist. She led the development of and directs the National Centre for Text Mining (NaCTeM) in the United Kingdom. She is also Professor in Computer Science in the Department of Computer Science at the University of Manchester. Her research focusses on biomedical text mining and natural language processing and has fed into the development of numerous applications that, for example, facilitate the discovery of new knowledge, enable exploration of historical archives, allow semantic search of biomedical literature, reduce human effort in screening search hits for production of systematic reviews, enable enrichment of metabolic pathway models with evidence from the literature, allow discovery of risk in the construction industry from health and safety incident reports and enable interoperability of components in text mining workflows. == Education == Ananiadou was educated at the Lycée français St Joseph in Athens, Greece (1969–1975). She received a Bachelor of Arts (Ptychion) from the University of Athens (1979), a Master of Advanced Studies (DEA) in Linguistics from Paris VII, Jussieu, France (1980), a DEA in Literature from Paris IV, Sorbonne, France (1984) and a PhD in Computational linguistics from the University of Manchester Institute of Science and Technology (UMIST), in 1988. == Career and research == Ananiadou was a research assistant at Dalle Molle Institute for Semantic and Cognitive Studies (ISSCO, 1983–1984), a research assistant (1985–1988) then research associate (1988–1993) in the department of language engineering at UMIST, senior lecturer at Manchester Metropolitan University (1993–1999), senior lecturer then reader in the School of Computing Science and Engineering, University of Salford (2000–2005), then reader in the School of Computer Science, University of Manchester (2005–2009). Since 2009, she has served as professor in computer science in the Department of Computer Science at the University of Manchester. In July 2025, she became deputy director of the Christabel Pankhurst Institute for health technology research and innovation, University of Manchester. From 2018–2026, she served as the deputy director of the Institute for Data Science and Artificial Intelligence, University of Manchester. She is a senior lead researcher of the ARCHIMEDES research unit of the Athena Research Centre, Greece. ARCHIMEDES is a research and innovation hub fostering international collaboration and knowledge exchange on Artificial Intelligence and Data Science. On February 7, 2025, she was appointed a member of the Artificial Intelligence Sectoral Scientific Council of the Greek Ministry of Development (announcement of appointment in Greek). She is also a Visiting Distinguished Research Fellow in the Knowledge and Information Research Team at the Artificial Intelligence Research Center (AIRC), Japan, which is a research unit of the Japanese National Institute of Advanced Industrial Science and Technology (AIST). In addition, she was appointed to the honorary position of Adjunct Professor of Wuhan University, People's Republic of China, for the period October 2025 to October 2028, collaborating with the School of Artificial Intelligence. Ananiadou has published since 1986, has an h-index of 81 and a Research.com United Kingdom ranking in Computer Science of 104. She is also ranked number 1 internationally in text mining by ScholarGPS. In addition, she is included in the Stanford/Elsevier Top 2% Scientist Rankings for 2025. Ananiadou received a Diplôme de traducteur (Diploma of Translator) from the Institut français d'Athènes, Greece (1979) and a Certificate in Counselling from the University of Salford, UK (2004). === Awards and honours === In 2019, in recognition of her contributions in Artificial Intelligence and text mining for Biomedicine, Ananiadou received an honorary doctorate from the University of the Aegean, on the 20th anniversary of its Department of Mediterranean Studies, Rhodes. Ananiadou received the Unstructured Information Management Architecture (UIMA) innovation award from IBM three years running (2006, 2007 & 2008). She was awarded the Daiwa Adrian Prize in 2004 and also received a Japan Trust award from the Ministry of Education, Japan in 1997. Ananiadou was a Turing Fellow of the Alan Turing Institute in London from 2018 to 2023. Since 2021, she is a member and, since 2024, a Fellow, of the ELLIS Society, the professional society of the cross-national European Laboratory for Learning and Intelligent Systems. Ananiadou served as vice president (VP) of the European Association for Terminology from 1997 to 1999. At the 28th International Conference on Computational Linguistics (COLING 2020), she received, with M. Li and H. Takamura, an Outstanding Paper designation for the paper "A Neural Model for Aggregating Coreference Annotation in Crowdsourcing".

Brian D. Ripley

Brian David Ripley FRSE (born 29 April 1952) is a British statistician. From 1990, he was professor of applied statistics at the University of Oxford and also a professorial fellow at St Peter's College. He retired August 2014 due to ill health. == Biography == Ripley has made contributions to the fields of spatial statistics and pattern recognition. His work on artificial neural networks in the 1990s helped to bring aspects of machine learning and data mining to the attention of statistical audiences. He emphasised the value of robust statistics in his books Pattern Recognition and Neural Networks and Modern Applied Statistics with S. Ripley helped develop the S-PLUS programming language and its open source derivative R. He co-authored two books based on S, S Programming and Modern Applied Statistics with S. Since mid-1997 he is a member of the "R Core Team" and from 2000 to 2021 he was one of the most active committers to the R core. The package MASS is one of only fifteen "recommended packages" for R (with June 2024 more than 20,900). He was educated at the University of Cambridge, where he was awarded both the Smith's Prize (at the time awarded to the best graduate essay writer who had been undergraduate at Cambridge in that cohort) and the Rollo Davidson Prize. The university also awarded him the Adams Prize in 1987 for an essay entitled Statistical Inference for Spatial Processes, later published as a book. He served on the faculty of Imperial College, London from 1976 until 1983, at which point he moved to the University of Strathclyde. == Authored books == Ripley, B. D. (1981) Spatial Statistics. Wiley, 252pp. ISBN 0-471-08367-4. Ripley, B. D. (1983) Stochastic Simulation. Wiley, ISBN 0-471-81884-4. Ripley, B. D. (1988). Statistical Inference for Spatial Processes. Cambridge University Press. ISBN 0-521-35234-7. Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press. 403 pages. ISBN 0-521-46086-7. Venables, W. N. and Ripley, B. D. (2000) S Programming. Springer, 264pp. ISBN 978-0-387-98966-2. Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S (Fourth Edition; previous editions published as Modern Applied Statistics with S-PLUS in 1994, 1997 & 1999). Springer, 462pp. ISBN 978-0-387-95457-8.

Intel Threat Detection Technology

Intel Threat Detection Technology (TDT) is a CPU-level technology created by Intel in 2018 to enable host endpoint protections to use a CPU's low-level access to detect threats to a system. TDT consists of multiple components including Accelerated Memory Scanning, which uses the CPU's integrated GPU to scan memory, and Advanced Platform Telemetry, which uses processor-level activity monitoring to detect unusual activity. It is supported on sixth-generation or newer Intel Core CPUs and additional capabilities were added to the 11th generation Core processors. Intel TDT is integrated into several third-party anti-malware solutions including Microsoft Defender, Check Point Harmony Endpoint, CrowdStrike Falcon, and others. == Accelerated Memory Scanning == Accelerated Memory Scanning (also referred to as "Advanced Memory Scanning") uses the CPU's integrated GPU to scan memory for malicious code, instead of using the CPU directly. This improves system responsiveness during anti-malware scanning. and lowers power consumption. Features include pattern matching, using random forest decision trees, string extraction, entropy calculation, and Euclidean clustering. == Advanced Platform Telemetry == Advanced Platform Telemetry collects CPU-level telemetry to detect uncommon activity patterns which might be indicative of malware. The telemetry data is collected from the CPU performance monitoring unit (PMU) and doesn't require a large signature database to detect malware. Instead, it uses machine-learning based correlations to identify indicators of attack For example, Microsoft Defender is able to use TDT's Advanced Platform Telemetry features to detect processor usage patterns indicative of ransomware and cryptojacking with TDT so it can detect them.

SYSTRAN

SYSTRAN, founded by Dr. Peter Toma in 1968, is one of the oldest machine translation companies. SYSTRAN has done extensive work for the United States Department of Defense and the European Commission. SYSTRAN provided the technology for Yahoo! Babel Fish until May 30, 2012, among others. It was used by Google's language tools until 2007. SYSTRAN is used by the Dashboard Translation widget in macOS. Commercial versions of SYSTRAN can run on Microsoft Windows (including Windows Mobile), Linux, and Solaris. Historically, SYSTRAN systems used rule-based machine translation (RbMT) technology. With the release of SYSTRAN Server 7 in 2010, SYSTRAN implemented a hybrid rule-based/statistical machine translation (SMT) technology which was the first of its kind in the marketplace. As of 2008, the company had 59 employees of whom 26 are computational experts and 15 computational linguists. The number of employees decreased from 70 in 2006 to 59 in 2008. In January 2024, ChapsVision acquired Systran. == History == With its origin in the Georgetown machine translation effort, SYSTRAN was one of the few machine translation systems to survive the major decrease of funding after the ALPAC Report of the mid-1960s. The company was established in La Jolla in California to work on translation of Russian to English text for the United States Air Force during the Cold War. Large numbers of Russian scientific and technical documents were translated using SYSTRAN under the auspices of the USAF Foreign Technology Division (later the National Air and Space Intelligence Center) at Wright-Patterson Air Force Base, Ohio. The quality of the translations, although only approximate, was usually adequate for understanding content. The company headquarters is in Paris, while its U.S. headquarters is in San Diego, CA. During the dot-com boom, the international language industry started a new era, and SYSTRAN entered into agreements with a number of translation integrators, the most successful of these being WorldLingo. In 2016, the Harvard NLP group and SYSTRAN founded OpenNMT, an open source ecosystem for neural machine translation and neural sequence learning. This has enabled machine translation software with learning capabilities, dramatically increasing MT translation quality. The project has since been used in several research and industry applications, and its open source ecosystem is currently maintained by SYSTRAN and Ubiqus. == Business situation == Most of SYSTRAN's revenue comes from a few customers. 57.1% comes from the 10 main customers and the three largest customers account for 10.9%, 8.9%, and 8.9% of its revenues, respectively. Revenues had been declining in the early 2000s: 10.2 million euros in 2004, 10.1 million euros in 2005, 9.3 million euros in 2006, 8.8 million euros in 2007, and 7.6 million euros in 2008, before seeing a rebound in 2009 with 8.6 million euros. == Languages == The following is a list of the languages in which SYSTRAN translate from and to English: Russian into English in 1968 and English into Russian in 1973 for the Apollo–Soyuz project.