AI Avatar Of Deceased

AI Avatar Of Deceased — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cognitive computing

    Cognitive computing refers to technology platforms that, broadly speaking, are based on the scientific disciplines of artificial intelligence and signal processing. These platforms encompass machine learning, reasoning, natural language processing, speech recognition and vision (object recognition), human–computer interaction, and dialog and narrative generation, among other technologies.

    == Definition ==

    At present, there is no widely agreed upon definition for cognitive computing in either academia or industry. In general, the term cognitive computing has been used to refer to new hardware and/or software that mimics the functioning of the human brain (2004). In this sense, cognitive computing is a new type of computing with the goal of building more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Cognitive computing applications link data analysis and adaptive page displays (AUI) to adjust content for a particular type of audience. As such, cognitive computing hardware and applications strive to be more affective and more influential by design.

    The term "cognitive system" also applies to any artificial construct able to perform a cognitive process, where a cognitive process is the transformation of data, information, knowledge, or wisdom to a new level in the DIKW Pyramid. While many cognitive systems employ techniques that originated in artificial intelligence research, cognitive systems themselves may not be artificially intelligent. For example, a neural network trained to recognize cancer on an MRI scan may achieve a higher success rate than a human doctor. This system is certainly a cognitive system but is not artificially intelligent.

    Cognitive systems may be engineered to feed on dynamic data in real time, or near real time, and may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided).

    == Cognitive analytics ==

    Cognitive computing-branded technology platforms typically specialize in the processing and analysis of large, unstructured datasets.

    == Applications ==

    === Education ===

    Even if cognitive computing cannot take the place of teachers, it can still be a strong driving force in the education of students. In the classroom, cognitive computing is applied essentially by giving each student a personalized assistant. This cognitive assistant can relieve the stress that teachers face while teaching, while also enhancing the student's overall learning experience. Teachers may not be able to give each and every student individual attention, and this is the gap that cognitive computers fill. Some students may need a little more help with a particular subject. For many students, interaction between student and teacher can cause anxiety and be uncomfortable. With the help of cognitive computer tutors, students will not have to face their uneasiness and can gain the confidence to learn and do well in the classroom. While a student is in class with their personalized assistant, the assistant can develop various techniques, such as creating lesson plans, tailored to the student and their needs.

    === Healthcare ===

    Numerous tech companies are developing cognitive computing technology that can be used in the medical field. The ability to classify and identify is one of the main goals of these cognitive devices. This trait can be very helpful in identifying carcinogens. A cognitive system with this detection ability would be able to assist the examiner in interpreting large numbers of documents in less time than would be possible without cognitive computing technology. This technology can also evaluate information about the patient, looking through every medical record in depth and searching for indications of the source of their problems.

    === Commerce ===

    Together with artificial intelligence, cognitive computing has been used in warehouse management systems to collect, store, organize and analyze all related supplier data. These uses aim at improving efficiency, enabling faster decision-making, monitoring inventory and detecting fraud.

    === Human Cognitive Augmentation ===

    In situations where humans use or work collaboratively with cognitive systems, an arrangement called a human/cog ensemble, results achieved by the ensemble are superior to results obtainable by the human working alone. Therefore, the human is cognitively augmented. In cases where the human/cog ensemble achieves results at, or superior to, the level of a human expert, the ensemble has achieved synthetic expertise. In a human/cog ensemble, the "cog" is a cognitive system employing virtually any kind of cognitive computing technology.

    === Other use cases ===

      • Speech recognition
      • Sentiment analysis
      • Face detection
      • Risk assessment
      • Fraud detection
      • Behavioral recommendations

    == Industry work ==

    Cognitive computing, in conjunction with big data and algorithms that comprehend customer needs, can be a major advantage in economic decision making.

    The powers of cognitive computing and artificial intelligence hold the potential to affect almost every task that humans are capable of performing. This can negatively affect employment, as there would no longer be the same need for human labor. It would also increase the inequality of wealth; the people at the head of the cognitive computing industry would grow significantly richer, while workers without ongoing, reliable employment would become less well off.

    The more industries start to use cognitive computing, the more difficult it will be for humans to compete. Increased use of the technology will also increase the amount of work that AI-driven robots and machines can perform. The influence of competitive individuals in conjunction with artificial intelligence/cognitive computing has the potential to change the course of humankind.

  • Query rewriting

    Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries, usually together with indices, often with gathered data and query statistics, and other metadata, and yields a set of different queries, which produce the same results but execute with better performance (for example, faster, or with lower memory use).

    Query rewriting can be based on relational algebra or an extension thereof (e.g. multiset relational algebra with sorting, aggregation and three-valued predicates, i.e. NULLs, as in the case of SQL). The equivalence rules of relational algebra are exploited: different query structures and orderings can be mathematically proven to yield the same result. For example, filtering on fields A and B, or cross joining R and S, can be done in any order, but there can be a performance difference. Multiple operations may be combined, and operation orders may be altered.

    The result of query rewriting may not be at the same abstraction level or application programming interface (API) as the original set of queries (though it often is). For example, the input queries may be in relational algebra or SQL, and the rewritten queries may be closer to the physical representation of the data, e.g. array operations. Query rewriting can also involve materialization of views and other subqueries, operations that may or may not be available to the API user.

    The query rewriting transformation can be aided by creating indices from which the optimizer can choose (some database systems create their own indexes if deemed useful), mandating the use of specific indices, creating materialized and/or denormalized views, or helping a database system gather statistics on the data and query use, as the optimality depends on patterns in data and typical query usage.

    Query rewriting may be rule based or optimizer based.
Some sources discuss query rewriting as a distinct step prior to optimization, operating at the level of the user accessible algebra API (e.g. SQL). There are other, largely unrelated concepts also named similarly, for example, query rewriting by search engines.
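
    The equivalence rules described in this entry can be illustrated with a small sketch. The toy relations R and S, the fields A, B and C, and the helper names below are hypothetical and do not come from any particular database system; the code only shows that two filter orderings, and a filter applied before or after a cross join, return the same rows while building intermediate results of different sizes.

```python
from itertools import product

# Hypothetical toy relations; each row is a dict (illustration only).
R = [{"A": a, "B": b} for a in range(4) for b in range(3)]  # 12 rows
S = [{"C": c} for c in range(5)]                            # 5 rows

def select(rows, pred):
    """Relational selection: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]

def cross(left, right):
    """Cross join: pair every left row with every right row."""
    return [{**l, **r} for l, r in product(left, right)]

def canon(rows):
    """Order-independent form of a result, used to compare multisets of rows."""
    return sorted(tuple(sorted(r.items())) for r in rows)

def pred_a(row):
    return row["A"] == 1

def pred_b(row):
    return row["B"] >= 1

# Rule 1: selections commute, so the filters on A and B can run in either order.
a_then_b = select(select(R, pred_a), pred_b)
b_then_a = select(select(R, pred_b), pred_a)

# Rule 2: a selection that reads only R's fields can be applied before the join.
filter_late = select(cross(R, S), pred_a)    # joins first, filters afterwards
filter_early = cross(select(R, pred_a), S)   # filters first, joins afterwards

# Size of the intermediate result each plan has to build.
late_intermediate = len(cross(R, S))
early_intermediate = len(select(R, pred_a))

print(canon(a_then_b) == canon(b_then_a))        # same rows either way
print(canon(filter_late) == canon(filter_early)) # same rows either way
print(late_intermediate, early_intermediate)
```

    Both plans in each pair produce the same result, but the early-filter plan builds a much smaller intermediate relation, which is the kind of difference a rewriter tries to exploit. A real optimizer would also weigh indices and statistics, which this sketch leaves out.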

  • Iteration

    Iteration means repeating a process to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is the starting point of the next iteration. In mathematics and computer science, iteration (along with the related technique of recursion) is a standard element of algorithms.

    == Mathematics ==

    In mathematics, iteration may refer to the process of iterating a function, i.e. applying a function repeatedly, using the output from one iteration as the input to the next. Iteration of apparently simple functions can produce complex behaviors and difficult problems; for examples, see the Collatz conjecture and juggler sequences. Another use of iteration in mathematics is in iterative methods, which are used to produce approximate numerical solutions to certain mathematical problems. Newton's method is an example of an iterative method. Manual calculation of a number's square root is a common use and a well-known example.

    == Computing ==

    In computing, iteration is a technique that marks out a block of statements within a computer program for a defined number of repetitions. That block of statements is said to be iterated. A computer programmer might also refer to that block of statements as an iteration.

    === Implementations ===

    Loops constitute the most common language constructs for performing iterations. For example, a for loop can "iterate" the line of code between begin and end three times, using the values of i as increments. It is permissible, and often necessary, to use values from other parts of the program outside the bracketed block of statements to perform the desired function. Iterators constitute alternative language constructs to loops, which ensure consistent iterations over specific data structures. They can eventually save time and effort in later coding attempts.
    In particular, an iterator allows one to repeat the same kind of operation at each node of such a data structure, often in some pre-defined order. Iteratees are purely functional language constructs, which accept or reject data during the iterations.

    === Relation with recursion ===

    Recursions and iterations have different algorithmic definitions, even though they can generate identical results. The primary difference is that recursion can be a solution without prior knowledge as to how many times the action must repeat, while a successful iteration requires that foreknowledge.

    Some types of programming languages, known as functional programming languages, are designed such that they do not set up a block of statements for explicit repetition, as with the for loop. Instead, those programming languages exclusively use recursion. Rather than call out a block of code to repeat a pre-defined number of times, the executing code block instead "divides" the work into a number of separate pieces, after which the code block executes itself on each individual piece. Each piece of work is divided repeatedly until the "amount" of work is as small as possible, at which point the algorithm does that work very quickly. The algorithm then "reverses" and reassembles the pieces into a complete whole.

    The classic example of recursion is in list-sorting algorithms, such as merge sort. The merge sort recursive algorithm first repeatedly divides the list into consecutive pairs. Each pair is then ordered, then each consecutive pair of pairs, and so forth until the elements of the list are in the desired order.

    A recursive algorithm, for instance one written in the Scheme programming language, can output the same result as the for loop described under the previous heading.
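
    As a minimal sketch of the contrast drawn in this section, the Python code below adds up the values 1 to 3 once with a for loop and once recursively, and then sorts a list with a recursive merge sort. The summing task, the function names and the choice of Python are assumptions made for illustration. The merge sort shown is the common top-down variant (split in half, sort each half, merge), which differs in detail from the pairwise description above.

```python
def sum_iterative(n):
    """Add 1..n with an explicit loop; i supplies the increments."""
    total = 0
    for i in range(1, n + 1):  # the indented block is iterated n times
        total += i
    return total

def sum_recursive(n):
    """Add 1..n by having the function call itself on a smaller input."""
    if n <= 0:  # base case: nothing left to add
        return 0
    return n + sum_recursive(n - 1)

def merge_sort(items):
    """Top-down recursive merge sort; returns a new sorted list."""
    if len(items) <= 1:  # a list of 0 or 1 elements is already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # divide: sort each half recursively
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):  # reassemble: merge the halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(sum_iterative(3), sum_recursive(3))  # prints: 6 6
print(merge_sort([5, 2, 4, 1, 3]))         # prints: [1, 2, 3, 4, 5]
```

    The two sum functions return the same value for the same input, which is the sense in which recursion and iteration "can generate identical results".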

    == Education ==

    In some schools of pedagogy, iterations are used to describe the process of teaching or guiding students to repeat experiments, assessments, or projects, until more accurate results are found, or the student has mastered the technical skill. This idea is found in the old adage, "Practice makes perfect." In particular, "iterative" is defined as the "process of learning and development that involves cyclical inquiry, enabling multiple opportunities for people to revisit ideas and critically reflect on their implication."

    Unlike computing and math, educational iterations are not predetermined; instead, the task is repeated until success according to some external criteria (often a test) is achieved.

  • Vector-field consistency

    Vector-Field Consistency is a consistency model for replicated data (for example, objects), initially described in a paper which was awarded the best-paper prize at the ACM/IFIP/Usenix Middleware Conference 2007. It has since been enhanced for increased scalability and fault-tolerance in a later paper.

    == Description ==

    This consistency model was initially designed for replicated data management in ad hoc gaming in order to minimize bandwidth usage without sacrificing playability. Intuitively, it captures the notion that although players require, wish, and take advantage of information regarding the whole of the game world (as opposed to a view restricted to rooms, arenas, etc. of limited size, as employed in many multiplayer video games), they need to know information with greater freshness, frequency, and accuracy as other game entities are located closer and closer to the player's position.

    It prescribes a multidimensional divergence bounding scheme, based on a vector field that employs consistency vectors k=(θ,σ,ν), standing for maximum allowed time (replica staleness), sequence (missing updates), and value (user-defined measured replica divergence), applied to all space coordinates in the game scenario or world.

    The consistency vector-fields emanate from field-generators designated as pivots (for example, players), and field intensity attenuates as distance grows from these pivots in concentric or square-like regions.

    This consistency model unifies locality-awareness techniques employed in message routing and consistency enforcement for multiplayer games with divergence bounding techniques traditionally employed in replicated database and web scenarios.
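
    The idea of consistency vectors that loosen with distance from a pivot can be sketched as follows. This is a rough illustration and not the algorithm from the original paper: the zone radii, the bound values, the field names and the rule that a replica is refreshed as soon as any one of the three bounds is exceeded are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsistencyVector:
    theta: float  # maximum allowed replica staleness, in seconds
    sigma: int    # maximum number of missed updates
    nu: float     # maximum divergence in value (user-defined metric)

# Hypothetical concentric zones around a pivot: (outer radius, bounds).
# Bounds are tight near the pivot and loosen as distance grows.
ZONES = [
    (10.0, ConsistencyVector(theta=0.1, sigma=1, nu=0.05)),
    (50.0, ConsistencyVector(theta=1.0, sigma=5, nu=0.25)),
    (math.inf, ConsistencyVector(theta=5.0, sigma=20, nu=1.0)),
]

def vector_at(pivot, position):
    """Return the consistency vector that applies at position, given a pivot."""
    distance = math.dist(pivot, position)
    for radius, k in ZONES:
        if distance <= radius:
            return k
    raise ValueError("no zone covers this distance")  # not reached: last radius is inf

def needs_update(k, staleness, missed_updates, divergence):
    """Assumed rule: refresh the replica when any one bound is exceeded."""
    return (
        staleness > k.theta
        or missed_updates > k.sigma
        or divergence > k.nu
    )

player = (0.0, 0.0)               # the pivot, e.g. a player's position
near = vector_at(player, (3.0, 4.0))      # distance 5, innermost zone
far = vector_at(player, (300.0, 400.0))   # distance 500, outermost zone

# The same replica state violates the tight bounds near the pivot
# but is still acceptable far away from it.
print(needs_update(near, staleness=0.05, missed_updates=2, divergence=0.0))
print(needs_update(far, staleness=0.05, missed_updates=2, divergence=0.0))
```

    With several pivots, one plausible extension would be to apply the tightest vector among all pivots at a given position; the sketch handles a single pivot only.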

  • Neurocomputing (journal)

    Neurocomputing is a peer-reviewed scientific journal covering research on artificial intelligence, machine learning, and neural computation. It was established in 1989 and is published by Elsevier. The editor-in-chief is Zidong Wang (Brunel University London). Independent scientometric studies noted that despite being one of the most productive journals in the field, it has kept its reputation across the years intact and plays an important role in leading the research in the area. The journal is abstracted and indexed in Scopus and Science Citation Index Expanded. According to the Journal Citation Reports, its 2023 impact factor is 5.5.

  • Artificial Intelligence Applications Institute

    The Artificial Intelligence Applications Institute (AIAI) at the School of Informatics at the University of Edinburgh is a non-profit technology transfer organisation that promoted research in the field of artificial intelligence.

    == History ==

    The Artificial Intelligence Applications Institute (AIAI) was founded in 1983 at the University of Edinburgh as a specialist research and technology-transfer unit focusing on the practical uses of artificial intelligence (AI). The institute was established by Professor Jim Howe and colleagues from the Science and Engineering Research Council (SERC) Special Interest Group in AI in the Department of Artificial Intelligence, with a mission to apply AI techniques to solve real-world industrial and governmental problems.

    Under the directorship of Austin Tate, who served from 1985 to 2019, AIAI became one of the leading UK research centres devoted to AI programming systems, intelligent planning systems, decision support, and knowledge-based engineering. It collaborated with both academic partners and international organisations such as the European Space Agency and the UK Ministry of Defence.

    In 2001, AIAI joined the newly created Centre for Intelligent Systems and their Applications (CISA) within the University's School of Informatics. In December 2019, the institute was renamed the Artificial Intelligence and its Applications Institute to reflect a broader integration of fundamental and applied AI research.

    == Research programmes ==

    AIAI's research spans multiple areas of artificial intelligence, including:

      • AI programming systems: Edinburgh Prolog, Edinburgh Common Lisp, Logo
      • Knowledge representation and reasoning: development of ontologies, rule-based inference, and semantic modelling
      • Automated planning and scheduling: intelligent task management systems used in aerospace, manufacturing, and emergency response
      • Natural language processing and intelligent agents: interaction frameworks for human–computer collaboration
      • AI ethics and decision-making: research into responsible deployment and evaluation of autonomous systems

    The institute also contributes to interdisciplinary fields such as computational creativity, explainable AI, and human–AI interaction. AIAI maintains close collaboration with the Bayes Centre and the Alan Turing Institute through joint research programmes and doctoral training initiatives.

    == Technology transfer and impact ==

    From its inception, AIAI has combined academic research with technology-transfer activity, offering professional training, industrial consultancy, and bespoke software systems. It pioneered one of the earliest knowledge-based project-management systems, O-Plan, which later evolved into the I-Plan framework used for autonomous planning and workflow management.

  • Artificial intelligence in Brazilian industry

    In 2022, 16.9% (1,620) of the 9,586 Brazilian industrial companies with 100 or more employees used artificial intelligence in their operations. Among the companies that used AI, the areas of administration (73.8%), product project development (65.9%), and processes, services and marketing (65.1%) were those that used it the most, followed by the areas of production (56.4%) and logistics (48.4%).

    == Current scenario ==

    === Adoption in Brazilian industrial sectors ===

    In senior management, the majority (56%) of executives have a long-term vision for its use. The study also shows that IT, Innovation, and Marketing are the areas where AI use is most widespread, and that 43% of companies are developing or adapting the algorithms they use. The majority of large institutions that reported some type of AI use purchased these solutions from other companies (76%).

    Factors in the adoption of artificial intelligence in companies include the establishment of an autonomous strategy by the company (87.0%) and the influence of suppliers and/or customers (63.0%). The main difficulties in using the technologies were high costs (80.8%), lack of qualified personnel in the company (54.6%) and excessive economic risks (49.5%).

    Three variables are considered the most relevant to explain the option to use AI: the implementation of a digital security policy, a company size of 250 or more employees, and the characteristics of the company related to information and communication.

    When analyzing AI use by company size in Brazil, large companies have the highest proportion of AI use, mainly due to their investment capacity and technology experimentation. However, when comparing Brazil and Europe, indicators show an acceleration in AI use among large European companies, while in Brazil the situation remains stable.
    In 2023, 30% of large companies in the European bloc used some type of AI, a figure that rose to 41% in 2024, while in Brazil these proportions were 41% in 2023 and 38% in 2024.

    === Workforce ===

    The challenge of upskilling begins with employees who are capable of understanding recent technological changes. Similarly, companies must create the environment and conditions for workforce development conducive to innovation, and universities must be prepared to provide knowledge aligned with the transition process, which in turn must be supported by public policies.

    The concern with training a specialized workforce in AI can be seen in the low number of graduates and PhDs in computer science and computer engineering in Brazil, compared to the numbers in other countries. As recorded in the document Recommendations for the Advancement of Artificial Intelligence in Brazil, 2019 data from the Coordination for the Improvement of Higher Education Personnel (CAPES) indicate that "the number of PhDs graduated annually in computing remained below 400 in 2016, and is not expected to have increased during the Covid-19 pandemic" (ABC, 2023). In the United States, by contrast, the number of PhDs graduated in these two areas has remained around 1,800 for the past 11 years, and during this period, the share of PhDs specializing in AI jumped from 10% to 19%.

    Based on data from the CNPq Lattes Platform (October 2019), there are 4,429 specialists in the AI field in Brazil. This is still a small number compared to the 415,166 IT jobs in the country's business sector alone.

    === R&D, scientific production and integration with industry ===

    China and the United States lead in the number of publications. These two countries are followed by the G7 members and by India, Austria, South Korea, and Spain. Brazil appears in the next group, alongside the Netherlands, Russia, Indonesia, and Ireland.
    Regarding the promotion of research and technologies related to AI, public entities such as the Coordination for the Improvement of Higher Education Personnel (CAPES) and the National Council for Scientific and Technological Development (CNPq) stood out as the main funders.

    Currently, different countries and territories have been promoting the development of Artificial Intelligence (AI). In the Brazilian case, one of the main initiatives is the creation of Engineering Research Centers/Applied Research Centers (CPE/CPA) in AI by the São Paulo Research Foundation (FAPESP), in collaboration with the Ministry of Science, Technology and Innovation (MCTI), the Ministry of Communications (MC) and the Brazilian Internet Steering Committee (CGI.br).

    In terms of the number of patents filed and the volume of investments, the leading nations in AI are the United States, China, France, Germany, the United Kingdom, Russia, India, Switzerland, Japan, South Korea, the Netherlands, Sweden, Finland, Ireland, Singapore, Canada, Israel, and Italy. Brazil appears among the top twenty countries in some rankings, mainly due to its good number of publications (approximately 10% of the number of articles published by the United States). The US is home to approximately 60% of the world's top AI researchers, followed by China (11%), Europe (10%), and Canada (6%).

    To change this scenario, in August 2024, the Brazilian government announced an investment of R$23 billion until 2028 in artificial intelligence, seeking to “transform the country into a global reference in innovation”.

    == Future challenges ==

    The Organization for Economic Cooperation and Development (2020) report highlighted three factors that hinder the digital transformation journey and application of AI in Brazil: insufficient infrastructure, high costs due to the tax system, and financial limitations, such as limited access to financing.
The costs of adopting technology, its incompatibility with the business, and the lack of training also represent obstacles that Brazilian industry must overcome. There are also inherent obstacles for companies. A McKinsey review emphasizes that once a company chooses one or more sectors to focus on, it must select specific applications. Buyers aren't interested in artificial intelligence simply because it's a breakthrough technology; they want AI to generate a good return on investment, whether by solving specific problems, saving money, or increasing sales. If an AI vendor tried to offer a horizontal solution, the value proposition might not be as compelling. Part of the solution to Brazil's technological backwardness involves building an ecosystem fueled by private institutions, universities, and governments.

  • Small Data

    Small Data: the Tiny Clues that Uncover Huge Trends is Martin Lindstrom's seventh book. It chronicles his work as a branding expert, working with consumers across the world to better understand their behavior. The theory behind the book is that businesses can better create products and services based on observing consumer behavior in their homes, as opposed to relying solely on big data.

    == Content ==

    The book is based on several years of consumer studies for major corporations across the globe. It features case studies of the author's work interviewing consumers in their homes and using his observations to create hypotheses as to why they use products the way that they do.

    == Public reception ==

    The book was a New York Times Bestseller upon release and was positively reviewed on several websites, including Entrepreneur and Forbes. In 2016, it was named a Best Business Book by strategy+business and one of Inc. Magazine's Best Sales and Marketing books.

  • COVID-19 apps

    COVID-19 apps include mobile-software applications for digital contact tracing (the process of identifying persons, or "contacts", who may have been in contact with an infected individual) deployed during the COVID-19 pandemic.

    Numerous tracing applications have been developed or proposed, with official government support in some territories and jurisdictions. Several frameworks for building contact-tracing apps have been developed. Privacy concerns have been raised, especially about systems that are based on tracking the geographical location of app users.

    Less overtly intrusive alternatives include the co-option of Bluetooth signals to log a user's proximity to other cellphones. (Bluetooth technology has a history of being used to track cellphones' locations.) On 10 April 2020, Google and Apple jointly announced that they would integrate functionality to support such Bluetooth-based apps directly into their Android and iOS operating systems. India's COVID-19 tracking app Aarogya Setu became the world's fastest growing application, beating Pokémon Go, with 50 million users in the first 13 days of its release.

    == Rationale ==

    Contact tracing is an important tool in infectious disease control, but as the number of cases rises, time constraints make it more challenging to effectively control transmission. Digital contact tracing, especially if widely deployed, may be more effective than traditional methods of contact tracing. In a March 2020 model by Christophe Fraser's team at the University of Oxford Big Data Institute, a coronavirus outbreak in a city of one million people is halted if 80% of all smartphone users take part in a tracking system; in the model, the elderly are still expected to self-isolate en masse, but individuals who are neither symptomatic nor elderly are exempt from isolation unless they receive an alert that they are at risk of carrying the disease. Some proponents advocate for legislation exempting certain COVID-19 apps from general privacy restrictions.

    == Issues ==

    === Uptake ===

    Ross Anderson, professor of security engineering at Cambridge University, listed a number of potential practical problems with app-based systems, including false positives and the potential lack of effectiveness if takeup of the app is limited to only a small fraction of the population. In Singapore, only one person in three had downloaded the TraceTogether app by the end of June 2020, despite legal requirements for most workers; the app was also underused, as it required users to keep it open at all times on iOS.

    A team at the University of Oxford simulated the effect of a contact tracing app on a city of 1 million. They estimated that if the app was used in conjunction with the shielding of over-70s, then 56% of the population would have to be using the app for it to suppress the virus. This would be equivalent to 80% of smartphone users in the United Kingdom. They found that the app could still slow the spread of the virus if fewer people downloaded it, with one infection being prevented for every one or two users.

    In August 2020, the American Civil Liberties Union (ACLU) argued that there were disparities in smartphone use between demographics and minority groups, and that "even the most comprehensive, all-seeing contact tracing system is of little use without social and medical systems in place to help those who may have the virus — including access to medical care, testing, and support for those who are quarantined."

    === App store restrictions ===

    Addressing concerns about the spread of misleading or harmful apps, Apple, Google and Amazon set limits on which types of organizations could add coronavirus-related apps to their app stores, limiting them to only "official" or otherwise reputable organizations.

    === Ethical principles of mass surveillance using COVID-19 contact tracing apps ===

    The advent of COVID-19 contact tracing apps has led to concerns around privacy, the rights of app users, and governmental authority.
    The European Convention on Human Rights, the International Covenant on Civil and Political Rights (ICCPR), the United Nations, and the Siracusa Principles have outlined four principles to consider when looking at the ethical principles of mass surveillance with COVID-19 contact tracing apps. These are necessity, proportionality, scientific validity, and time boundedness.

    Necessity is defined as the idea that governments should only interfere with a person's rights when deemed essential for public health interests. The potential risks associated with infringements of personal privacy must be outweighed by the possibility of reducing significant harm to others. Potential benefits of contact-tracing apps that may be considered include allowing for blanket population-level quarantine measures to be lifted sooner and the minimization of people under quarantine. Hence, some contend that contact-tracing apps are justified as they may be less intrusive than blanket quarantine measures. Furthermore, the delay of an effective contact-tracing app with significant health and economic benefits may be considered unethical.

    Proportionality refers to the concept that a contact tracing app's potential negative impact on a person's rights should be justifiable by the severity of the health risks that are being addressed. Apps must use the most privacy-preserving options available to achieve their goals, and the selected option should not only be a logical option for achieving the goal but also an effective one.

    Scientific validity evaluates whether an app is effective, timely and accurate. Traditional manual contact-tracing procedures are not efficient enough for the COVID-19 pandemic, and do not consider asymptomatic transmission. Contact-tracing apps, on the other hand, can be effective COVID-19 contact-tracing tools that reduce R value to less than 1, leading to sustained epidemic suppression.
    However, for apps to be effective, there needs to be a minimum 56-60% uptake in the population. Apps should be continually modified to reflect current knowledge on the diseases being monitored. Some argue that contact-tracing apps should be considered societal experimental trials where results and adverse effects are evaluated according to the stringent guidelines of social experiments. Analyses should be conducted by independent research bodies and published for wide dissemination. Despite the urgency of the pandemic situation, the standard rigors of scientific evaluation should still be adhered to.

    Time boundedness describes the need for establishing legal and technical sunset clauses so that apps are only allowed to operate as long as necessary to address the pandemic situation. Apps should be withdrawn as soon as possible after the end of the pandemic. If the end of the pandemic cannot be predicted, the use of apps should be regularly reviewed and decisions about continued use should be made at each review. Collected data should only be retained by public health authorities for research purposes, with clear stipulations on how long the data will be held and who will be responsible for security, oversight, and ownership.

    === Privacy, discrimination and marginalisation concerns ===

    The American Civil Liberties Union (ACLU) has published a set of principles for technology-assisted contact tracing, and Amnesty International and over 100 other organizations issued a statement calling for limits on this kind of surveillance.
The organisations declared eight conditions on governmental projects: surveillance would have to be "lawful, necessary and proportionate"; extensions of monitoring and surveillance would have to have sunset clauses; the use of data would have to be limited to COVID-19 purposes; data security and anonymity would have to be protected and shown to be protected based on evidence; digital surveillance would have to address the risk of exacerbating discrimination and marginalisation; any sharing of data with third parties would have to be defined in law; there would have to be safeguards against abuse and the rights of citizens to respond to abuses; "meaningful participation" by all "relevant stakeholders" would be required, including that of public health experts and marginalised groups. The German Chaos Computer Club (CCC) and Reporters Without Borders also issued checklists. The Exposure Notification service intends to address the problem of persistent surveillance by removing the tracing mechanism from their device operating systems once it is no longer needed. On 20 April 2020, it was reported that over 300 academics had signed a statement favouring decentralised proximity tracing applications over centralised models, given the difficulty in precluding centralised options being used "to enable unwarranted discrimination and surveillance." In a centralised model, a central database records the ID codes of meetings between users. In a decentralised model, this information is recorded on individual phones, with the role of the central

    Read more →
  • ArchiMate

    ArchiMate

    ArchiMate (AR-ki-mayt) is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on concepts from the now superseded IEEE 1471 standard. It is supported by various tool vendors and consulting firms. ArchiMate is also a registered trademark of The Open Group. The Open Group has a certification program for ArchiMate users, software tools and courses. ArchiMate distinguishes itself from other languages such as Unified Modeling Language (UML) and Business Process Model and Notation (BPMN) by its enterprise modelling scope. Also, UML and BPMN are meant for a specific use and they are quite heavy, containing about 150 (UML) and 250 (BPMN) modeling concepts, whereas ArchiMate works with just about 50 (in version 2.0). The goal of ArchiMate is to be “as small as possible”, not to cover every edge scenario imaginable. To be easy to learn and apply, ArchiMate was intentionally restricted “to the concepts that suffice for modeling the proverbial 80% of practical cases”. == Overview == ArchiMate offers a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure. This insight helps the different stakeholders to design, assess, and communicate the consequences of decisions and changes within and between these business domains. The main concepts and relationships of the ArchiMate language can be seen as a framework, the so-called ArchiMate Framework: it divides the enterprise architecture into a business, application and technology layer. In each layer, three aspects are considered: active elements, an internal structure and elements that define use or communicate information. 
One of the objectives of the ArchiMate language is to define the relationships between concepts in different architecture domains. The concepts of this language therefore occupy the middle ground between the detailed concepts, which are used for modeling individual domains (for example, the Unified Modeling Language (UML) for modeling software products), and Business Process Model and Notation (BPMN), which is used for business process modeling. == History == ArchiMate is partly based on the now superseded IEEE 1471 standard. It was developed in the Netherlands by a project team from the Telematica Instituut in cooperation with several Dutch partners from government, industry and academia. Among the partners were Ordina NV, Radboud Universiteit Nijmegen, the Leiden Institute for Advanced Computer Science (LIACS) and the Centrum Wiskunde & Informatica (CWI). Later, tests were performed in organizations such as ABN AMRO, the Dutch Tax and Customs Administration and the ABP. The development process lasted from July 2002 to December 2004, and took about 35 person-years and approximately 4 million euros. The development was funded by the Dutch government (Dutch Tax and Customs Administration), and business partners, including ABN AMRO and the ABP Pension Fund. In 2008 the ownership and stewardship of ArchiMate were transferred to The Open Group. It is now managed by the ArchiMate Forum within The Open Group. In February 2009 The Open Group published the ArchiMate 1.0 standard as a formal technical standard. In January 2012 the ArchiMate 2.0 standard was released, followed in 2013 by the ArchiMate 2.1 standard. In June 2016, The Open Group released version 3.0 of the ArchiMate Specification. An update, ArchiMate 3.0.1, came out in August 2017. ArchiMate 3.1 was published on 5 November 2019. The latest version of the ArchiMate Specification is version 3.2, released in October 2022. 
Version 3.0 adds enhanced support for capability-oriented strategic modelling, new entities representing physical resources (for modelling the ingredients, equipment and transport resources used in the physical world) and a generic metamodel showing the entity types and the relationships between them. == ArchiMate framework == === Core framework === The main concepts and elements of the ArchiMate language are being presented as ArchiMate core framework. It consists of three layers and three aspects. This creates a matrix of combinations. Every layer has its passive structure, behavior and active structure aspects. ==== Layers ==== ArchiMate has a layered and service-oriented look on architectural models. The higher layers make use of services that are provided by the lower layers. Although, at an abstract level, the concepts that are used within each layer are similar, we define more concrete concepts that are specific for a certain layer. In this context, we distinguish three main layers: The business layer is about business processes, services, functions and events of business units. This layer "offers products and services to external customers, which are realized in the organization by business processes performed by business actors and roles". The application layer is about software applications that "support the components in the business with application services". The technology layer deals "with the hardware and communication infrastructure to support the application layer. This layer offers infrastructural services needed to run applications, realized by computer and communication hardware and system software". Each of these main layers can be further divided in sub-layers. 
For example, in the business layer, the primary business processes realising the products of a company may make use of a layer of secondary (supporting) business processes; in the application layer, the end-user applications may make use of generic services offered by supporting applications. On top of the business layer, a separate environment layer may be added, modelling the external customers that make use of the services of the organisation (although these may also be considered part of the business layer). In line with service orientation, the most important relation between layers is formed by use relations, which show how the higher layers make use of the services of lower layers. However, a second type of link is formed by realisation relations: elements in lower layers may realise comparable elements in higher layers; e.g., a ‘data object’ (application layer) may realise a ‘business object’ (business layer); or an ‘artifact’ (technology layer) may realise either a ‘data object’ or an ‘application component’ (application layer). ==== Aspects ==== Passive structure is the set of entities on which actions are conducted. In the business layer the example would be information objects, in the application layer data objects and in the technology layer, they could include physical objects. Behavior refers to the processes and functions performed by the actors. "Structural elements are assigned to behavioral elements, to show who or what displays the behavior". Active structure is the set of entities that display some behavior, e.g. business actors, devices, or application components. === Full framework === The Full ArchiMate framework is enriched by the physical layer, which was added to allow modeling of “physical equipment, materials, and distribution networks” and was not present in the previous version. 
The implementation and migration layer adds elements that allow architects to model a state of transition, to mark parts of the architecture that are temporary for the purpose, as the name says, of implementation and migration. Strategy layer adds three elements: resource, capability and course of action. These elements help to incorporate strategic dimension to the ArchiMate language by allowing it to depict the usage of resources and capabilities in order to achieve some strategic goals. Finally, there is a motivation aspect that allows different stakeholders to describe the motivation of specific actors or domains, which can be quite important when looking at one thing from several different angles. It adds several elements like stakeholder, value, driver, goal, meaning etc. == ArchiMate language == The ArchiMate language is formed as a top-level and is hierarchical. On the top, there is a model. A model is a collection of concepts. A concept can be either an element or a relationship. An element can be either of behavior type, structure, motivation or a so-called composite element (which means that it does not fit just one aspect of the framework, but two or more). The functionality of all concepts without a dependency on a specific layer is described by the generic metamodel. This layer-independent description of concepts is useful when trying to understand the mechanics of the Archimate language. === Concepts === ==== Elements ==== The generic elements are distributed into the same categories as the layers: Active structure elements Behavior elements Passive structure elements Motivation elements Active structure e

    Read more →
  • Reference data

    Reference data

    Reference data is data used to classify or categorize other data. Typically, it is static or slowly changing over time. Examples of reference data include: units of measurement; country codes; corporate codes; fixed conversion rates (e.g., weight, temperature, and length); and calendar structure and constraints. Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business processes or applications, for example, similar things may be described in quite different ways. 
Reference data gains in value when it is widely re-used and widely referenced. Examples of good practice in reference data management include: formalize reference data management; use external reference data as much as possible; govern the reference data specific to your enterprise; manage reference data at enterprise level; and version control your reference data.
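
    The distinction between reference data and master data described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the record layout, the `validate_customer` helper and the customer-type values are hypothetical, and the country codes are a small subset of ISO 3166-1 alpha-2.

```python
# Reference data: small, slowly changing lookup sets used to classify other data.
# COUNTRY_CODES is a subset of ISO 3166-1 alpha-2 (externally defined reference data).
COUNTRY_CODES = {"DE": "Germany", "FR": "France", "NL": "Netherlands"}
# CUSTOMER_TYPES is a hypothetical, internally defined controlled vocabulary.
CUSTOMER_TYPES = {"standard", "gold"}


def validate_customer(record):
    """Check a master-data record (a customer) against the reference data.

    Returns a list of human-readable problems; an empty list means the
    record only uses values from the controlled vocabularies.
    """
    problems = []
    if record.get("country") not in COUNTRY_CODES:
        problems.append(f"unknown country code: {record.get('country')!r}")
    if record.get("customer_type") not in CUSTOMER_TYPES:
        problems.append(f"unknown customer type: {record.get('customer_type')!r}")
    return problems


if __name__ == "__main__":
    # Adding a customer is routine master-data maintenance.
    print(validate_customer({"name": "Example BV", "country": "NL", "customer_type": "gold"}))
    # A value outside the vocabulary is flagged; accepting it would mean
    # changing the reference data, and possibly the business process.
    print(validate_customer({"name": "Example Ltd", "country": "XX", "customer_type": "gold"}))
```

    Keeping the lookup sets in one place, rather than scattering literal values across applications, is one simple way to get the consistency the good-practice list asks for.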

    Read more →
  • Explore-then-commit algorithm

    Explore-then-commit algorithm

    Explore Then Commit (ETC) is an algorithm for the multi-armed bandit problem focused on finding the best trade-off between exploration and exploitation. == Multi-armed bandit problem == The multi-armed bandit problem is a sequential game where one player has to choose at each turn between $K$ actions (arms). Behind every arm $a$ is an unknown distribution $\nu_a$ that lies in a set $\mathcal{D}$ known by the player (for example, $\mathcal{D}$ can be the set of Gaussian distributions or Bernoulli distributions). At each turn $t$ the player chooses (pulls) an arm $a_t$ and then gets an observation $X_t$ of the distribution $\nu_{a_t}$. === Regret minimization === The goal is to minimize the regret at time $T$, which is defined as $R_T := \sum_{a=1}^{K} \Delta_a \mathbb{E}[N_a(T)]$, where $\mu_a := \mathbb{E}[\nu_a]$ is the mean of arm $a$; $\mu^{*} := \max_a \mu_a$ is the highest mean; $\Delta_a := \mu^{*} - \mu_a$; and $N_a(t)$ is the number of pulls of arm $a$ up to turn $t$. The player has to find an algorithm that chooses at each turn $t$ which arm to pull based on the previous actions and observations $(a_s, X_s)_{s<t}$

    Read more →
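
    The excerpt breaks off before the algorithm itself is stated. As it is commonly described, ETC pulls each of the K arms a fixed number of times m, then commits to the arm with the best empirical mean for the remaining turns. The Python sketch below follows that description; the function names, the arm interface (a callable taking a random generator) and the parameter `m` are assumptions made for illustration. `pseudo_regret` evaluates the regret formula above for a single run, using the realised pull counts in place of the expectation.

```python
import random


def explore_then_commit(arms, horizon, m, rng):
    """Sketch of ETC: explore every arm m times, then commit to the best one.

    arms    -- list of callables; arms[a](rng) returns one reward from arm a
    horizon -- total number of turns T
    m       -- exploration pulls per arm (requires m * K <= T)
    Returns (counts, best) where counts[a] is N_a(T) for this run.
    """
    k = len(arms)
    if m * k > horizon:
        raise ValueError("exploration phase does not fit in the horizon")
    counts = [0] * k
    sums = [0.0] * k
    # Exploration phase: pull the arms in round-robin order, m times each.
    for t in range(m * k):
        a = t % k
        sums[a] += arms[a](rng)
        counts[a] += 1
    # Commit phase: pick the arm with the highest empirical mean (ties go to
    # the lowest index) and pull it for all remaining turns. Rewards seen
    # after committing no longer influence the choice, so only the pull
    # count is recorded here.
    means = [sums[a] / counts[a] for a in range(k)]
    best = max(range(k), key=lambda a: means[a])
    counts[best] += horizon - m * k
    return counts, best


def pseudo_regret(true_means, counts):
    """Sum over arms of (mu* - mu_a) * N_a(T) for one run."""
    mu_star = max(true_means)
    return sum((mu_star - mu) * n for mu, n in zip(true_means, counts))


def bernoulli(p):
    """Hypothetical helper: an arm paying 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    true_means = [0.2, 0.8]
    arms = [bernoulli(p) for p in true_means]
    counts, best = explore_then_commit(arms, horizon=1000, m=50, rng=random.Random(0))
    print(counts, best, pseudo_regret(true_means, counts))
```

    Averaging `pseudo_regret` over many independent runs approximates the expectation in the definition. The choice of m carries the trade-off: a small m risks committing to the wrong arm, while a large m spends more turns on arms already known to be worse.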

  • Couchbase Server

    Couchbase Server

    Couchbase Server, originally known as Membase, is a source-available, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value, or JSON document access, with low latency and high sustainability throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines. Couchbase Server provided client protocol compatibility with memcached, but added disk persistence, data replication, live cluster reconfiguration, rebalancing and multitenancy with data partitioning. == Product history == Membase was developed by several leaders of the memcached project, who had founded a company, NorthScale, to develop a key-value store with the simplicity, speed, and scalability of memcached, but also the storage, persistence and querying capabilities of a database. The original membase source code was contributed by NorthScale, and project co-sponsors Zynga and Naver Corporation (then known as NHN) to a new project on membase.org in June 2010. On February 8, 2011, the Membase project founders and Membase, Inc. announced a merger with CouchOne (a company with many of the principal players behind CouchDB) with an associated project merger. The merged company was called Couchbase, Inc. In January 2012, Couchbase released Couchbase Server 1.8. In September of 2012, Orbitz said it had changed some of its systems to use Couchbase. In December of 2012, Couchbase Server 2.0 (announced in July 2011) was released and included a new JSON document store, indexing and querying, incremental MapReduce and replication across data centers. 
== Architecture == Every Couchbase node consists of a data service, index service, query service, and cluster manager component. Starting with the 4.0 release, the three services can be distributed to run on separate nodes of the cluster if needed. In the parlance of Eric Brewer's CAP theorem, Couchbase is normally a CP type system meaning it provides consistency and partition tolerance, or it can be set up as an AP system with multiple clusters. === Cluster manager === The cluster manager supervises the configuration and behavior of all the servers in a Couchbase cluster. It configures and supervises inter-node behavior like managing replication streams and re-balancing operations. It also provides metric aggregation and consensus functions for the cluster, and a RESTful cluster management interface. The cluster manager uses the Erlang programming language and the Open Telecom Platform. ==== Replication and fail-over ==== Data replication within the nodes of a cluster can be controlled with several parameters. In December of 2012, support was added for replication between different data centers. === Data manager === The data manager stores and retrieves documents in response to data operations from applications. It asynchronously writes data to disk after acknowledging to the client. In version 1.7 and later, applications can optionally ensure data is written to more than one server or to disk before acknowledging a write to the client. Parameters define item ages that affect when data is persisted, and how max memory and migration from main-memory to disk is handled. It supports working sets greater than a memory quota per "node" or "bucket". External systems can subscribe to filtered data streams, supporting, for example, full text search indexing, data analytics or archiving. ==== Data format ==== A document is the most basic unit of data manipulation in Couchbase Server. Documents are stored in JSON document format with no predefined schemas. 
Non-JSON documents can also be stored in Couchbase Server (binary, serialized values, XML, etc.). ==== Object-managed cache ==== Couchbase Server includes a built-in multi-threaded object-managed cache that implements memcached-compatible APIs such as get, set, delete, append and prepend. ==== Storage engine ==== Couchbase Server has a tail-append storage design that is immune to data corruption, OOM killers or sudden loss of power. Data is written to the data file in an append-only manner, which enables Couchbase to do mostly sequential writes for updates and to provide optimized access patterns for disk I/O. === Performance === A performance benchmark done by Altoros in 2012 compared Couchbase Server with other technologies. Cisco Systems published a benchmark that measured the latency and throughput of Couchbase Server with a mixed workload in 2012. == Licensing and support == Couchbase Server is a packaged version of Couchbase's open source software technology and is available in a community edition (under an Apache 2.0 license, without recent bug fixes) and in an edition for commercial use. Couchbase Server builds are available for Ubuntu, Debian, Red Hat, SUSE, Oracle Linux, Microsoft Windows and macOS operating systems. Couchbase provides supported software development kits for the programming languages .NET, PHP, Ruby, Python, C, Node.js, Java, Go, and Scala. == SQL++ == A query language called SQL++ (formerly called N1QL) is used for manipulating the JSON data in Couchbase, just as SQL manipulates data in an RDBMS. It has SELECT, INSERT, UPDATE, DELETE and MERGE statements to operate on JSON data. It was initially announced in March 2015 as "SQL for documents". The SQL++ data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The SQL++ data model is also a proper superset and generalization of the relational model. 
=== Example === Like query: SELECT * FROM `bucket` WHERE email LIKE "%@example.org"; Array query: SELECT * FROM `bucket` WHERE ANY x IN friends SATISFIES x.name = "Pavan" END; == Couchbase Mobile == Couchbase Mobile / Couchbase Lite is a mobile database providing data replication. Couchbase Lite (originally TouchDB) provides native libraries for offline-first NoSQL databases with built-in peer-to-peer or client-server replication mechanisms. Sync Gateway manages secure access and synchronization of data between Couchbase Lite and Couchbase Server. Couchbase Lite added support for Vector Search in version 3.2, allowing cloud-to-edge support for vector search in mobile applications. == Uses == Couchbase began as an evolution of Memcached, a high-speed data cache, and can be used as a drop-in replacement for Memcached, providing high availability for memcached applications without code changes. Couchbase is used to support applications where a flexible data model, easy scalability, and consistent high performance are required, such as tracking real-time user activity or providing a store of user preferences or online applications. Couchbase Mobile, which stores data locally on devices (usually mobile devices), is used to create “offline-first” applications that can operate when a device is not connected to a network and synchronize with Couchbase Server once a network connection is re-established. The Catalyst Lab at Northwestern University uses Couchbase Mobile to support the Evo application, a healthy lifestyle research program where data is used to help participants improve dietary quality, physical activity, stress, or sleep. Amadeus uses Couchbase with Apache Kafka to support their “open, simple, and agile” strategy to consume and integrate data on loyalty programs for airline and other travel partners. High scalability is needed when disruptive travel events create a need to recognize and compensate high-value customers. 
Starting in 2012, Couchbase played a role in LinkedIn's caching systems, including backend caching for recruiter and jobs products, counters for security defense mechanisms, and internal applications. == Alternatives == For caching, Couchbase competes with Memcached and Redis. For document databases, Couchbase competes with other document-oriented database systems. It is commonly compared with MongoDB, Amazon DynamoDB, Oracle RDBMS, DataStax, Google Bigtable, MariaDB, IBM Cloudant, Redis Enterprise, SingleStore, and MarkLogic.

    Read more →
  • Subject indexing

    Subject indexing

    Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are about, to summarize their contents or to increase findability. In other words, the objective is to identify and describe the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document, such as a book; objects in a collection, such as a library; and documents (such as books and articles) within a field of knowledge. Subject indexing is used in information retrieval especially to create bibliographic indexes to retrieve documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts, and PubMed. The index terms are mostly assigned by experts, but author keywords are also common. The process of indexing begins with an analysis of the subject of the document. The indexer must then identify terms that appropriately identify the subject, either by extracting words directly from the document or assigning words from a controlled vocabulary. The terms in the index are then presented in a systematic order. Indexers must decide how many terms to include and how specific the terms should be. Together this gives a depth of indexing. == Subject analysis == The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer would consider the subject matter in terms of answers to a set of questions such as "Does the document deal with a specific product, condition or phenomenon?". As the analysis is influenced by the knowledge and experience of the indexer, it follows that two indexers may analyze the content differently and so come up with different index terms. This affects the success of retrieval. 
=== Automatic vs. manual subject analysis === Automatic indexing follows set processes of analyzing frequencies of word patterns and comparing results to other documents in order to assign documents to subject categories. This requires no understanding of the material being indexed. This leads to more uniform indexing but at the expense of the true meaning being interpreted. A computer program will not understand the meaning of statements and may therefore fail to assign some relevant terms or assign terms incorrectly. Human indexers focus their attention on certain parts of the document such as the title, abstract, summary and conclusions, as analyzing the full text in depth is costly and time-consuming. An automated system takes away the time limit and allows the entire document to be analyzed, but also has the option to be directed to particular parts of the document. == Term selection == The second stage of indexing involves the translation of the subject analysis into a set of index terms. This can involve extracting from the document or assigning from a controlled vocabulary. With the ability to conduct a full text search widely available, many people have come to rely on their own expertise in conducting information searches and full text search has become very popular. Subject indexing and its experts, professional indexers, catalogers, and librarians, remain crucial to information organization and retrieval. These experts understand controlled vocabularies and are able to find information that cannot be located by full text search. The cost of expert analysis to create subject indexing is not easily compared to the cost of hardware, software and labor to manufacture a comparable set of full-text, fully searchable materials. With new web applications that allow every user to annotate documents, social tagging has gained popularity, especially on the Web. One application of indexing, the book index, remains relatively unchanged despite the information revolution. 
=== Extraction/Derived indexing === Extraction indexing involves taking words directly from the document. It uses natural language and lends itself well to automated techniques where word frequencies are calculated and those with a frequency over a pre-determined threshold are used as index terms. A stop-list containing common words (such as "the", "and") would be referred to and such stop words would be excluded as index terms. Automated extraction indexing may lead to loss of meaning of terms by indexing single words as opposed to phrases. Although it is possible to extract commonly occurring phrases, it becomes more difficult if key concepts are inconsistently worded in phrases. Automated extraction indexing also has the problem that, even with use of a stop-list to remove common words, some frequent words may not be useful for allowing discrimination between documents. For example, the term glucose is likely to occur frequently in any document related to diabetes. Therefore, use of this term would likely return most or all of the documents in the database. Post-coordinated indexing, where terms are combined at the time of searching, would reduce this effect, but the onus would be on the searcher to link appropriate terms as opposed to the information professional. In addition, terms that occur infrequently may be highly significant; for example, a new drug may be mentioned infrequently, but the novelty of the subject makes any reference significant. One method for allowing rarer terms to be included and common words to be excluded by automated techniques would be a relative frequency approach, where the frequency of a word in a document is compared to its frequency in the database as a whole. Therefore, a term that occurs more often in a document than might be expected based on the rest of the database could then be used as an index term, and terms that occur equally frequently throughout will be excluded. 
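
The relative frequency approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production indexer: the stop-list, the tokenizer, the function names and the `ratio` threshold of 2.0 are all assumptions chosen for the example, and the placeholder word "drugx" stands in for a newly introduced drug name.

```python
import re
from collections import Counter

# Illustrative stop-list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "of", "in", "is", "to", "for"}


def tokenize(text):
    """Lower-case the text, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def relative_frequency_terms(document, collection, ratio=2.0):
    """Select index terms by comparing document and collection frequencies.

    A term is kept when its share of the document's words is at least
    `ratio` times its share of the words in the whole collection.
    """
    doc_counts = Counter(tokenize(document))
    doc_total = sum(doc_counts.values())
    coll_counts = Counter()
    for text in collection:
        coll_counts.update(tokenize(text))
    coll_total = sum(coll_counts.values())

    selected = []
    for term, n in doc_counts.items():
        doc_rate = n / doc_total
        coll_rate = coll_counts[term] / coll_total
        if doc_rate >= ratio * coll_rate:
            selected.append(term)
    return sorted(selected)


if __name__ == "__main__":
    collection = [
        "Glucose levels and insulin in diabetes",
        "Glucose monitoring for diabetes patients",
        "Glucose and drugx in diabetes",
    ]
    # "glucose" and "diabetes" occur in every document, so they do not
    # discriminate; the rarely mentioned "drugx" does.
    print(relative_frequency_terms(collection[2], collection))
```

Because the comparison is made word by word, the sketch shares the limitation noted earlier: it indexes single words rather than phrases.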
Another problem with automated extraction is that it does not recognize when a concept is discussed but is not identified in the text by an indexable keyword. Since this process is based on simple string matching and involves no intellectual analysis, the resulting product is more appropriately known as a concordance than an index. === Assignment indexing === An alternative is assignment indexing where index terms are taken from a controlled vocabulary. This has the advantage of controlling for synonyms as the preferred term is indexed and synonyms or related terms direct the user to the preferred term. This means the user can find articles regardless of the specific term used by the author and saves the user from having to know and check all possible synonyms. It also removes any confusion caused by homographs by inclusion of a qualifying term. A third advantage is that it allows the linking of related terms whether they are linked by hierarchy or association, e.g. an index entry for an oral medication may list other oral medications as related terms on the same level of the hierarchy but would also link to broader terms such as treatment. Assignment indexing is used in manual indexing to improve inter-indexer consistency as different indexers will have a controlled set of terms to choose from. Controlled vocabularies do not completely remove inconsistencies as two indexers may still interpret the subject differently. == Index presentation == The final phase of indexing is to present the entries in a systematic order. This may involve linking entries. In a pre-coordinated index the indexer determines the order in which terms are linked in an entry by considering how a user may formulate their search. In a post-coordinated index, the entries are presented singly and the user can link the entries through searches, most commonly carried out by computer software. Post-coordination results in a loss of precision in comparison to pre-coordination. 
== Depth of indexing == Indexers must make decisions about what entries should be included and how many entries an index should incorporate. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity. === Exhaustivity === An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives a higher recall, or more likelihood of all the relevant articles being retrieved; however, this occurs at the expense of precision. This means that the user may retrieve a larger number of irrelevant documents or documents which only deal with the subject in little depth. In a manual system a greater level of exhaustivity brings with it a greater cost as more man-hours are required. The additional time taken in an automated system would be much less significant. At the other end of the scale, in a selective index only the most important aspects are covered. Recall is reduced in a selective index because, if an indexer does not include enough terms, a highly relevant article may be overlooked. Therefore, indexers should strive for a balance and consider what the document may be used for. They may also have to consider the implications of time and expense. === Specificity === The specificity describes how closel

  • Documentation science

    Documentation science is the study of the recording and retrieval of information. It includes methods for storing, retrieving, and sharing information captured in physical as well as digital documents. This field is closely linked to the fields of library science and information science but has its own theories and practices. The term documentation science was coined by Belgian lawyer and peace activist Paul Otlet, who is considered the forefather of information science. Together with Henri La Fontaine, he laid the foundations of documentation science as a field of study. Professionals in this field are called documentalists. Over the years, documentation science has grown into a large and important field of study, evolving from traditional practices like archiving and retrieval to modern theories about the nature of documents, novel methods for organizing digital information, and applications in libraries, research, healthcare, business, technology, and more. This field continues to evolve in the digital age. == Developments in documentation science == 1895: The International Institute of Bibliography (originally Institut International de Bibliographie, IIB) was established on 12 September 1895 in Brussels, Belgium, by Paul Otlet and Henri La Fontaine. It aimed to catalog all recorded knowledge using a universal classification system now known as the Universal Decimal Classification (UDC). 1931: The International Institute of Bibliography was renamed the International Institute for Documentation (Institut International de Documentation, IID). 1934: Paul Otlet envisioned a “radiated library,” a global network of interconnected documents accessible from anywhere via telecommunication. This early idea is now seen as a forerunner of the internet. 1937: The American Documentation Institute was founded (renamed the American Society for Information Science in 1968). 
1951: Suzanne Briet published Qu'est-ce que la documentation?, in which she proposed that “a document is evidence in support of a fact,” expanding the definition to include objects such as animals in zoos when they are part of a scientific study. This was a significant theoretical shift in defining documents. 1965-1990: Documentation departments were established in settings such as large research libraries and online computer retrieval services. The persons doing the searches were called documentalists. With the appearance of the first CD-ROM databases in the mid-1980s and later the internet in the 1990s, these intermediary searches decreased, and most such departments closed or merged with other departments. 1996: "Dokvit", Documentation Studies, was established at the University of Tromsø in Norway. 2001: The Document Academy was established. It is an international network that celebrates documentation. It was conducted by The Program of Documentation Studies, University of Tromsø, Norway and The School of Information Management and Systems, UC Berkeley. 2003: The first Document Research Conference (DOCAM), part of a series of conferences organized by the Document Academy, took place. DOCAM '03 was held 13–15 August 2003 at The School of Information Management and Systems (SIMS) at the University of California, Berkeley. 2007: Michael Buckland, Ronald Day, and Birger Hjørland expanded the theoretical foundations of documentation science. They explored documents as social artifacts, the role of ideology in classification, and how documents influence knowledge systems. 2010s: The concept of post-documentation or “documentality” emerged, focusing on how digital traces (e.g., tweets, logs) function as documents without traditional physical form. This led to new thinking in document theory. 2016–present: The Document Academy's DOCAM conferences have continued, offering ongoing developments in the theory and practice of documentation. 
Themes include affect, memory, activism, and born-digital records. 2017: The journal Information Research published special issues addressing “document theory,” including views on documentation in virtual environments and digital archives. 2020–present: The growth of research data management (RDM) and open science has made documentation practices central to data sharing, metadata standards, and reproducibility in scientific work. == Theoretical foundations == Documentation science draws on theories that explain what a document is, how people use documents, and how documents are organized. These concepts were introduced by scholars who studied not only libraries but also philosophy, language, and the social sciences. Suzanne Briet described a document as “any material form of evidence” that is made to be used as proof or to share information. An antelope in a zoo, for example, can be a document because it is being studied, classified, and described. Documents are not just things or materials but are also shaped by society. Michael Buckland noted that documents have meaning only when people agree they are useful or valid as information. He explained that a document becomes a document when someone decides to use it as evidence. Ronald Day wrote that documentation is not neutral: it can be influenced by power, ideology, and politics. He argued that classification systems, such as the ways libraries organize books, are not just technical tools; they also show which kinds of knowledge are seen as more important than others. In recent years, new theories have been introduced, such as “documentality” by Maurizio Ferraris. He proposed that a document does not have to be a paper or a file; it can also be something digital like a tweet, a database entry, or a log file, as long as it leaves a trace that can be looked at later. This theory helps explain modern digital documents. 
== Methodologies and practice == Documentation science includes many methods that help people collect, organize, store, and find information. These practices are used in libraries, archives, research labs, companies, and now also in online systems. === Collecting and creating documents === In the past, documentation work included gathering books, articles, reports, and other printed materials. People created records of these materials manually, using catalog cards, indexes, or bibliographies. Paul Otlet’s work with the Universal Bibliographic Repertory is one example. He created millions of card entries to organize knowledge from around the world. Today, documents are not only created by humans. Computers and machines also generate documents, like log files, metadata, and sensor data. These need new tools and methods for collection and management. === Organizing information === Organizing documents has always been a foundational element of documentation science. Methods like classification (dividing things into groups) and indexing (making lists of topics or keywords) help individuals find what they need. A widely used system is the Universal Decimal Classification (UDC) developed by Otlet and La Fontaine. Another is the Library of Congress Classification (LCC) used in the majority of U.S. libraries. Indexing can be performed by humans or by software programs that read the text and add tags to documents. Metadata is also used to describe documents. Metadata is “data about data” like the title, author, date, and subject of a document. Standards like Dublin Core are used in digital libraries to keep metadata consistent. === Retrieval and access === One of the main objectives of documentation is helping users find the right document. This is called information retrieval. In the past, this meant using catalog drawers or printed indexes. Today, people use search engines, databases, and digital libraries. 
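To make the metadata and retrieval ideas above concrete, here is a minimal sketch of a Dublin Core style record and a simple subject lookup. The element names are those of the fifteen-element Dublin Core Metadata Element Set, in which the "author" mentioned above is called `creator`. The helper names `make_record` and `find_by_subject` and the second sample record are assumptions made for illustration, not part of any standard or library.

```python
# The fifteen element names of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def make_record(**fields):
    """Build a metadata record, rejecting names that are not Dublin Core elements."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)

def find_by_subject(records, subject):
    """Return the titles of records whose subject list contains the given term."""
    wanted = subject.lower()
    return [
        record.get("title")
        for record in records
        if wanted in (s.lower() for s in record.get("subject", []))
    ]

records = [
    make_record(
        title="Qu'est-ce que la documentation?",
        creator="Suzanne Briet",
        date="1951",
        subject=["documentation", "document theory"],
    ),
    # Hypothetical machine-generated document, for illustration only.
    make_record(
        title="Example sensor log",
        creator="hypothetical logging system",
        subject=["sensor data"],
    ),
]
```

Here `find_by_subject(records, "Documentation")` returns only the Briet record, while `make_record(author="...")` raises a `ValueError` because `author` is not a Dublin Core element name. Keeping element names consistent in this way is what lets different digital libraries exchange and search each other's records.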
Modern retrieval tools use Boolean logic, ranking algorithms, and sometimes machine learning to show the most useful results first. This is part of what is studied in both documentation science and information retrieval. === Preservation and archiving === Documents require long-term storage; this is called preservation. Printed documents can be damaged by light, pests, or simply the passage of time. Digital documents, on the other hand, can become unusable if formats become outdated or storage facilities fail. Archivists use methods like migration, which moves files to new formats, and emulation, which replicates obsolete systems, to preserve materials. These methods and tools keep changing as new technologies develop, but the main objective of documentation has remained the same: to keep information safe, organized, and easy to find. == Documentation in the digital age == With the expansion of the internet, computers, and cloud storage, documents are no longer just books, papers, or reports. They can now be emails, tweets, videos, websites, databases, or even log files created by machines. === Born-digital documents === Many documents today are created directly in digital form. These are called born-digit
