AI Apple

AI Apple — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • U-Net

    U-Net

    U-Net is a convolutional neural network that was developed for image segmentation. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segmentation of a 512 × 512 image takes less than a second on a modern (2015) GPU using the U-Net architecture. The U-Net architecture has also been employed in diffusion models for iterative image denoising. This technology underlies many modern image generation models, such as DALL-E, Midjourney, and Stable Diffusion. U-Net is also being explored for language models. Tokenization is not a separate step, allowing the model to more easily understand spelling and concurrently vectorizing / tokenizing higher level concepts. == Description == The U-Net architecture stems from the so-called "fully convolutional network". The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information. One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, and yields a u-shaped architecture. The network only uses the valid part of each convolution without any fully connected layers. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images, since otherwise the resolution would be limited by the GPU memory. Recently, there had also been an interest in receptive field based U-Net models for medical image segmentation. == Network architecture == The network consists of a contracting path and an expansive path, which gives it the u-shaped architecture. The contracting path is a typical convolutional network that consists of repeated application of convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, the spatial information is reduced while feature information is increased. The expansive pathway combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path. == Applications == There are many applications of U-Net in biomedical image segmentation, such as brain image segmentation (''BRATS'') and liver image segmentation ("siliver07") as well as protein binding site prediction. U-Net implementations have also found use in the physical sciences, for example in the analysis of micrographs of materials. Variations of the U-Net have also been applied for medical image reconstruction. Here are some variants and applications of U-Net as follows: Pixel-wise regression using U-Net and its application on pansharpening; 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation; TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation. Image-to-image translation to estimate fluorescent stains In binding site prediction of protein structure. == History == U-Net was created by Olaf Ronneberger, Philipp Fischer, Thomas Brox in 2015 and reported in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". It is an improvement and development of FCN: Evan Shelhamer, Jonathan Long, Trevor Darrell (2014). "Fully convolutional networks for semantic segmentation".

    Read more →
  • Information professional

    Information professional

    The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service. The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently. == Skills == Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are: IT skills, such as word-processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills loan systems, databases, content management systems, and specially designed programmes and packages. Customer service. An information professional should have the ability to address the information needs of customers. Language proficiency. This is essential in order to manage the information at hand and deal with customer needs. Soft skills. These include skills such as negotiating, conflict resolution, and time management. Management training. An information professional should be familiar with notions such as strategic planning and project management. Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required. == Associations == Most countries have a professional association who oversee the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia. == Qualifications == Educational institutions around the world offer academic degrees, or degrees on related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals. === Africa === Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". Nowadays, academic degrees in information studies are available at many universities of African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria). === Asia === LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong, University of Tsukuba, Japan, Yonsei University, South Korea, National Taiwan University and Wuhan University, China. Centre of Library and Information Management Science (CLIMS) at Tata Institute of Social Science in Mumbai, India. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems. === Australasia === The Australian Library and Information Association (ALIA) as of 2021 lists six schools offering undergraduate and postgraduate accredited university courses for "Librarian and Information Specialists" on their website. In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals. === Europe === The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK) also offer PhD degrees. === North America === Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of Information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years. === South America === There are many schools and colleges in Latin America, which offer courses in Library Science, Archival Studies, and Information Studies, however these subjects are taught completely separately.

    Read more →
  • Task Force on Process Mining

    Task Force on Process Mining

    The IEEE Task Force on Process Mining (TFPM) is a non-commercial association for process mining. The IEEE (Institute of Electrical and Electronics Engineers) Task Force on Process Mining was established in October 2009 as part of the IEEE Computational Intelligence Society at the Eindhoven University of Technology. The task force is supported by over 80 organizations and has around 750 members. The main goal of the task force is to promote the research, development, education, and understanding of process mining. == About == In 2012, the IEEE World Congress on Computational Intelligence/ IEEE Congress on Evolutionary Computation held a session on Process Mining. Process mining is a type of research that is a mix of computational intelligence and data mining, as well as process modeling and analysis. === Activities and organization === The Task Force on Process Mining has a Steering Committee and an Advisory Board. The Steering Committee, was chaired by Wil van der Aalst in its inception in 2009, defined 15 action lines. These include the organization of the annual International Process Mining Conference (ICPM) series, standardization efforts leading to the IEEE XES standard for storing and exchanging event data, and the Process Mining Manifesto which was translated into 16 languages. The Task Force on Process Mining also publishes a newsletter, provides data sets, organizes workshops and competitions, and connects researchers and practitioners. In 2016, the IEEE Standards Association published the IEEE Standard for Extensible Event Stream (XES), which is a widely accepted file format by the process mining community. As of 2023, Boudewijn van Dongen serves as chair of the Steering Committee. Wil van der Aalst and Moe Wynn both serve as vice-chair of the Steering Committee.

    Read more →
  • ArchiMate

    ArchiMate

    ArchiMate ( AR-ki-mayt) is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on concepts from the now superseded IEEE 1471 standard. It is supported by various tool vendors and consulting firms. ArchiMate is also a registered trademark of The Open Group. The Open Group has a certification program for ArchiMate users, software tools and courses. ArchiMate distinguishes itself from other languages such as Unified Modeling Language (UML) and Business Process Modeling and Notation (BPMN) by its enterprise modelling scope. Also, UML and BPMN are meant for a specific use and they are quite heavy – containing about 150 (UML) and 250 (BPMN) modeling concepts whereas ArchiMate works with just about 50 (in version 2.0). The goal of ArchiMate is to be ”as small as possible”, not to cover every edge scenario imaginable. To be easy to learn and apply, ArchiMate was intentionally restricted “to the concepts that suffice for modeling the proverbial 80% of practical cases". == Overview == ArchiMate offers a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure. This insight helps the different stakeholders to design, assess, and communicate the consequences of decisions and changes within and between these business domains. The main concepts and relationships of the ArchiMate language can be seen as a framework, the so-called Archimate Framework: It divides the enterprise architecture into a business, application and technology layer. In each layer, three aspects are considered: active elements, an internal structure and elements that define use or communicate information. One of the objectives of the ArchiMate language is to define the relationships between concepts in different architecture domains. The concepts of this language therefore hold the middle between the detailed concepts, which are used for modeling individual domains (for example, the Unified Modeling Language (UML) for modeling software products), and Business Process Model and Notation (BPMN), which is used for business process modeling. == History == ArchiMate is partly based on the now superseded IEEE 1471 standard. It was developed in the Netherlands by a project team from the Telematica Instituut in cooperation with several Dutch partners from government, industry and academia. Among the partners were Ordina NV, Radboud Universiteit Nijmegen, the Leiden Institute for Advanced Computer Science (LIACS) and the Centrum Wiskunde & Informatica (CWI). Later, tests were performed in organizations such as ABN AMRO, the Dutch Tax and Customs Administration and the ABP. The development process lasted from July 2002 to December 2004, and took about 35 person years and approximately 4 million euros. The development was funded by the Dutch government (Dutch Tax and Customs Administration), and business partners, including ABN AMRO and the ABP Pension Fund. In 2008 the ownership and stewardship of ArchiMate was transferred to The Open Group. It is now managed by the ArchiMate Forum within The Open Group. In February 2009 The Open Group published the ArchiMate 1.0 standard as a formal technical standard. In January 2012 the ArchiMate 2.0 standard, and in 2013 the ArchiMate 2.1 standard was released. In June 2016, the Open Group released version 3.0 of the ArchiMate Specification. An update to Archimate 3.0.1 came out in August 2017. Archimate 3.1 was published 5 November 2019. The latest version of the ArchiMate Specification is version 3.2 released October 2022. Version 3.0 adds enhanced support for capability-oriented strategic modelling, new entities representing physical resources (for modelling the ingredients, equipment and transport resources used in the physical world) and a generic metamodel showing the entity types and the relationships between them. == ArchiMate framework == === Core framework === The main concepts and elements of the ArchiMate language are being presented as ArchiMate core framework. It consists of three layers and three aspects. This creates a matrix of combinations. Every layer has its passive structure, behavior and active structure aspects. ==== Layers ==== ArchiMate has a layered and service-oriented look on architectural models. The higher layers make use of services that are provided by the lower layers. Although, at an abstract level, the concepts that are used within each layer are similar, we define more concrete concepts that are specific for a certain layer. In this context, we distinguish three main layers: The business layer is about business processes, services, functions and events of business units. This layer "offers products and services to external customers, which are realized in the organization by business processes performed by business actors and roles". The application layer is about software applications that "support the components in the business with application services". The technology layer deals "with the hardware and communication infrastructure to support the application layer. This layer offers infrastructural services needed to run applications, realized by computer and communication hardware and system software". Each of these main layers can be further divided in sub-layers. For example, in the business layer, the primary business processes realising the products of a company may make use of a layer of secondary (supporting) business processes; in the application layer, the end-user applications may make use of generic services offered by supporting applications. On top of the business layer, a separate environment layer may be added, modelling the external customers that make use of the services of the organisation (although these may also be considered part of the business layer). In line with service orientation, the most important relation between layers is formed by use relations, which show how the higher layers make use of the services of lower layers. However, a second type of link is formed by realisation relations: elements in lower layers may realise comparable elements in higher layers; e.g., a ‘data object’ (application layer) may realise a ‘business object’ (business layer); or an ‘artifact’ (technology layer) may realise either a ‘data object’ or an ‘application component’ (application layer). ==== Aspects ==== Passive structure is the set of entities on which actions are conducted. In the business layer the example would be information objects, in the application layer data objects and in the technology layer, they could include physical objects. Behavior refers to the processes and functions performed by the actors. "Structural elements are assigned to behavioral elements, to show who or what displays the behavior". Active structure is the set of entities that display some behavior, e.g. business actors, devices, or application components. === Full framework === The Full ArchiMate framework is enriched by the physical layer, which was added to allow modeling of “physical equipment, materials, and distribution networks” and was not present in the previous version. The implementation and migration layer adds elements that allow architects to model a state of transition, to mark parts of the architecture that are temporary for the purpose, as the name says, of implementation and migration. Strategy layer adds three elements: resource, capability and course of action. These elements help to incorporate strategic dimension to the ArchiMate language by allowing it to depict the usage of resources and capabilities in order to achieve some strategic goals. Finally, there is a motivation aspect that allows different stakeholders to describe the motivation of specific actors or domains, which can be quite important when looking at one thing from several different angles. It adds several elements like stakeholder, value, driver, goal, meaning etc. == ArchiMate language == The ArchiMate language is formed as a top-level and is hierarchical. On the top, there is a model. A model is a collection of concepts. A concept can be either an element or a relationship. An element can be either of behavior type, structure, motivation or a so-called composite element (which means that it does not fit just one aspect of the framework, but two or more). The functionality of all concepts without a dependency on a specific layer is described by the generic metamodel. This layer-independent description of concepts is useful when trying to understand the mechanics of the Archimate language. === Concepts === ==== Elements ==== The generic elements are distributed into the same categories as the layers: Active structure elements Behavior elements Passive structure elements Motivation elements Active structure e

    Read more →
  • GPT-4Chan

    GPT-4Chan

    Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016-November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. Furthermore, he also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher also made his model publicly available on Hugging Face, a platform for sharing and using AI models. He wanted to share his work with the AI community and the public, and that he hoped that his model would inspire and enable others to create and explore new applications and possibilities with large language models. Likewise, he also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone. He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.

    Read more →
  • Webometrics

    Webometrics

    The science of webometrics (also referred to as cybermetrics) aims to quantify the World Wide Web to get knowledge about the number and types of hyperlinks, the structure of the World Wide Web, and using patterns. According to Björneborn and Ingwersen, the definition of webometrics is "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches." The term webometrics was coined by Almind and Ingwersen (1997). A second definition of webometrics has also been introduced, "the study of web-based content with primarily quantitative methods for social science research goals using techniques that are not specific to one field of study", which emphasizes the development of applied methods for use in the wider social sciences. The purpose of this alternative definition was to help publicize appropriate methods outside the information-science discipline rather than to replace the original definition within information science. Similar scientific fields are: bibliometrics, informetrics, scientometrics, virtual ethnography, and web mining. One relatively straightforward measure is the "web impact factor" (WIF) introduced by Ingwersen (1998). The WIF measure may be defined as the number of web pages in a web site receiving links from other web sites, divided by the number of web pages published in the site that are accessible to the crawler. However, the use of WIF has been disregarded due to the mathematical artifacts derived from power law distributions of these variables. Other similar indicators using size of the institution instead of number of webpages have been proved more useful.

    Read more →
  • Berlekamp–Rabin algorithm

    Berlekamp–Rabin algorithm

    In number theory, Berlekamp's root finding algorithm, also called the Berlekamp–Rabin algorithm, is the probabilistic method of finding roots of polynomials over the field F p {\displaystyle \mathbb {F} _{p}} with p {\displaystyle p} elements. The method was discovered by Elwyn Berlekamp in 1970 as an auxiliary to the algorithm for polynomial factorization over finite fields. The algorithm was later modified by Rabin for arbitrary finite fields in 1979. The method was also independently discovered before Berlekamp by other researchers. == History == The method was proposed by Elwyn Berlekamp in his 1970 work on polynomial factorization over finite fields. His original work lacked a formal correctness proof and was later refined and modified for arbitrary finite fields by Michael Rabin. In 1986 René Peralta proposed a similar algorithm for finding square roots in F p {\displaystyle \mathbb {F} _{p}} . In 2000 Peralta's method was generalized for cubic equations. == Statement of problem == Let p {\displaystyle p} be an odd prime number. Consider the polynomial f ( x ) = a 0 + a 1 x + ⋯ + a n x n {\textstyle f(x)=a_{0}+a_{1}x+\cdots +a_{n}x^{n}} over the field F p ≃ Z / p Z {\displaystyle \mathbb {F} _{p}\simeq \mathbb {Z} /p\mathbb {Z} } of remainders modulo p {\displaystyle p} . The algorithm should find all λ {\displaystyle \lambda } in F p {\displaystyle \mathbb {F} _{p}} such that f ( λ ) = 0 {\textstyle f(\lambda )=0} in F p {\displaystyle \mathbb {F} _{p}} . == Algorithm == === Randomization === Let f ( x ) = ( x − λ 1 ) ( x − λ 2 ) ⋯ ( x − λ n ) {\textstyle f(x)=(x-\lambda _{1})(x-\lambda _{2})\cdots (x-\lambda _{n})} . Finding all roots of this polynomial is equivalent to finding its factorization into linear factors. To find such factorization it is sufficient to split the polynomial into any two non-trivial divisors and factorize them recursively. To do this, consider the polynomial f z ( x ) = f ( x − z ) = ( x − λ 1 − z ) ( x − λ 2 − z ) ⋯ ( x − λ n − z ) {\textstyle f_{z}(x)=f(x-z)=(x-\lambda _{1}-z)(x-\lambda _{2}-z)\cdots (x-\lambda _{n}-z)} where z {\displaystyle z} is some element of F p {\displaystyle \mathbb {F} _{p}} . If one can represent this polynomial as the product f z ( x ) = p 0 ( x ) p 1 ( x ) {\displaystyle f_{z}(x)=p_{0}(x)p_{1}(x)} then in terms of the initial polynomial it means that f ( x ) = p 0 ( x + z ) p 1 ( x + z ) {\displaystyle f(x)=p_{0}(x+z)p_{1}(x+z)} , which provides needed factorization of f ( x ) {\displaystyle f(x)} . === Classification of === F p {\displaystyle \mathbb {F} _{p}} elements Due to Euler's criterion, for every monomial ( x − λ ) {\displaystyle (x-\lambda )} exactly one of following properties holds: The monomial is equal to x {\displaystyle x} if λ = 0 {\displaystyle \lambda =0} , The monomial divides g 0 ( x ) = ( x ( p − 1 ) / 2 − 1 ) {\textstyle g_{0}(x)=(x^{(p-1)/2}-1)} if λ {\displaystyle \lambda } is quadratic residue modulo p {\displaystyle p} , The monomial divides g 1 ( x ) = ( x ( p − 1 ) / 2 + 1 ) {\textstyle g_{1}(x)=(x^{(p-1)/2}+1)} if λ {\displaystyle \lambda } is quadratic non-residual modulo p {\displaystyle p} . Thus if f z ( x ) {\displaystyle f_{z}(x)} is not divisible by x {\displaystyle x} , which may be checked separately, then f z ( x ) {\displaystyle f_{z}(x)} is equal to the product of greatest common divisors gcd ( f z ( x ) ; g 0 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{0}(x))} and gcd ( f z ( x ) ; g 1 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{1}(x))} . === Berlekamp's method === The property above leads to the following algorithm: Explicitly calculate coefficients of f z ( x ) = f ( x − z ) {\displaystyle f_{z}(x)=f(x-z)} , Calculate remainders of x , x 2 , x 2 2 , x 2 3 , x 2 4 , … , x 2 ⌊ log 2 ⁡ p ⌋ {\textstyle x,x^{2},x^{2^{2}},x^{2^{3}},x^{2^{4}},\ldots ,x^{2^{\lfloor \log _{2}p\rfloor }}} modulo f z ( x ) {\displaystyle f_{z}(x)} by squaring the current polynomial and taking remainder modulo f z ( x ) {\displaystyle f_{z}(x)} , Using exponentiation by squaring and polynomials calculated on the previous steps calculate the remainder of x ( p − 1 ) / 2 {\textstyle x^{(p-1)/2}} modulo f z ( x ) {\textstyle f_{z}(x)} , If x ( p − 1 ) / 2 ≢ ± 1 ( mod f z ( x ) ) {\textstyle x^{(p-1)/2}\not \equiv \pm 1{\pmod {f_{z}(x)}}} then gcd {\displaystyle \gcd } mentioned below provide a non-trivial factorization of f z ( x ) {\displaystyle f_{z}(x)} , Otherwise all roots of f z ( x ) {\displaystyle f_{z}(x)} are either residues or non-residues simultaneously and one has to choose another z {\displaystyle z} . If f ( x ) {\displaystyle f(x)} is divisible by some non-linear primitive polynomial g ( x ) {\displaystyle g(x)} over F p {\displaystyle \mathbb {F} _{p}} then when calculating gcd {\displaystyle \gcd } with g 0 ( x ) {\displaystyle g_{0}(x)} and g 1 ( x ) {\displaystyle g_{1}(x)} one will obtain a non-trivial factorization of f z ( x ) / g z ( x ) {\displaystyle f_{z}(x)/g_{z}(x)} , thus algorithm allows to find all roots of arbitrary polynomials over F p {\displaystyle \mathbb {F} _{p}} . === Modular square root === Consider equation x 2 ≡ a ( mod p ) {\textstyle x^{2}\equiv a{\pmod {p}}} having elements β {\displaystyle \beta } and − β {\displaystyle -\beta } as its roots. Solution of this equation is equivalent to factorization of polynomial f ( x ) = x 2 − a = ( x − β ) ( x + β ) {\textstyle f(x)=x^{2}-a=(x-\beta )(x+\beta )} over F p {\displaystyle \mathbb {F} _{p}} . In this particular case problem it is sufficient to calculate only gcd ( f z ( x ) ; g 0 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{0}(x))} . For this polynomial exactly one of the following properties will hold: GCD is equal to 1 {\displaystyle 1} which means that z + β {\displaystyle z+\beta } and z − β {\displaystyle z-\beta } are both quadratic non-residues, GCD is equal to f z ( x ) {\displaystyle f_{z}(x)} which means that both numbers are quadratic residues, GCD is equal to ( x − t ) {\displaystyle (x-t)} which means that exactly one of these numbers is quadratic residue. In the third case GCD is equal to either ( x − z − β ) {\displaystyle (x-z-\beta )} or ( x − z + β ) {\displaystyle (x-z+\beta )} . It allows to write the solution as β = ( t − z ) ( mod p ) {\textstyle \beta =(t-z){\pmod {p}}} . === Example === Assume we need to solve the equation x 2 ≡ 5 ( mod 11 ) {\textstyle x^{2}\equiv 5{\pmod {11}}} . For this we need to factorize f ( x ) = x 2 − 5 = ( x − β ) ( x + β ) {\displaystyle f(x)=x^{2}-5=(x-\beta )(x+\beta )} . Consider some possible values of z {\displaystyle z} : Let z = 3 {\displaystyle z=3} . Then f z ( x ) = ( x − 3 ) 2 − 5 = x 2 − 6 x + 4 {\displaystyle f_{z}(x)=(x-3)^{2}-5=x^{2}-6x+4} , thus gcd ( x 2 − 6 x + 4 ; x 5 − 1 ) = 1 {\displaystyle \gcd(x^{2}-6x+4;x^{5}-1)=1} . Both numbers 3 ± β {\displaystyle 3\pm \beta } are quadratic non-residues, so we need to take some other z {\displaystyle z} . Let z = 2 {\displaystyle z=2} . Then f z ( x ) = ( x − 2 ) 2 − 5 = x 2 − 4 x − 1 {\displaystyle f_{z}(x)=(x-2)^{2}-5=x^{2}-4x-1} , thus gcd ( x 2 − 4 x − 1 ; x 5 − 1 ) ≡ x − 9 ( mod 11 ) {\textstyle \gcd(x^{2}-4x-1;x^{5}-1)\equiv x-9{\pmod {11}}} . From this follows x − 9 = x − 2 − β {\textstyle x-9=x-2-\beta } , so β ≡ 7 ( mod 11 ) {\displaystyle \beta \equiv 7{\pmod {11}}} and − β ≡ − 7 ≡ 4 ( mod 11 ) {\textstyle -\beta \equiv -7\equiv 4{\pmod {11}}} . A manual check shows that, indeed, 7 2 ≡ 49 ≡ 5 ( mod 11 ) {\textstyle 7^{2}\equiv 49\equiv 5{\pmod {11}}} and 4 2 ≡ 16 ≡ 5 ( mod 11 ) {\textstyle 4^{2}\equiv 16\equiv 5{\pmod {11}}} . == Correctness proof == The algorithm finds factorization of f z ( x ) {\displaystyle f_{z}(x)} in all cases except for ones when all numbers z + λ 1 , z + λ 2 , … , z + λ n {\displaystyle z+\lambda _{1},z+\lambda _{2},\ldots ,z+\lambda _{n}} are quadratic residues or non-residues simultaneously. According to theory of cyclotomy, the probability of such an event for the case when λ 1 , … , λ n {\displaystyle \lambda _{1},\ldots ,\lambda _{n}} are all residues or non-residues simultaneously (that is, when z = 0 {\displaystyle z=0} would fail) may be estimated as 2 − k {\displaystyle 2^{-k}} where k {\displaystyle k} is the number of distinct values in λ 1 , … , λ n {\displaystyle \lambda _{1},\ldots ,\lambda _{n}} . In this way even for the worst case of k = 1 {\displaystyle k=1} and f ( x ) = ( x − λ ) n {\displaystyle f(x)=(x-\lambda )^{n}} , the probability of error may be estimated as 1 / 2 {\displaystyle 1/2} and for modular square root case error probability is at most 1 / 4 {\displaystyle 1/4} . == Complexity == Let a polynomial have degree n {\displaystyle n} . We derive the algorithm's complexity as follows: Due to the binomial theorem ( x − z ) k = ∑ i = 0 k ( k i ) ( − z ) k − i x i {\textstyle (x-z)^{k}=\sum \limits _{i=0}^{k}{\binom {k}{i}}(-z)^{k-i}x^{i}} , we may transition from f ( x ) {\displaystyle f(x)} to f ( x − z ) {\displaystyle f(x-z)} in O ( n 2 ) {\displaystyle O(n^{2})} time. Polynomial multiplication a

    Read more →
  • DIKW pyramid

    DIKW pyramid

    The DIKW pyramid (also known as the knowledge pyramid or information hierarchy) is a model describing relationships between data, information, knowledge and wisdom sometimes also stylized as a chain, refer to models of possible structural and functional relationships between a set of components—often four, data, information, knowledge, and wisdom. The concept has roots predating the 1980s. In the latter years of that decade, interest in the models grew after explicit presentations and discussions, including from Milan Zeleny, Russell Ackoff, and Robert W. Lucky. Subsequent important discussions extended along theoretical and practical lines into the coming decades. While debate continues as to actual meaning of the component terms of DIKW-type models, and the actual nature of their relationships—including occasional doubt being cast over any simple, linear, unidirectional model—even so they have become very popular visual representations in use by business, the military, and others. Among the academic and popular, not all versions of the DIKW-type models include all four components (earlier ones excluding data, later ones excluding or downplaying wisdom, and several including additional components (for instance Ackoff inserting "understanding" before and Zeleny adding "enlightenment" after the wisdom component). In addition, DIKW-type models are no longer always presented as pyramids, instead also as a chart or framework (e.g., by Zeleny), as flow diagrams (e.g., by Liew, and by Chisholm et al.), and sometimes as a continuum (e.g., by Choo et al.). == Short description == As Rowley noted in 2007, the DIKW model "is often quoted, or used implicitly, in definitions of data, information and knowledge in the information management, information systems and knowledge management literatures, but [as of that date] there ha[d] been limited direct discussion of the hierarchy". Reviews of textbooks and a survey of scholars in relevant fields indicate that there was not a consensus as to definitions used in the model as of that date, and as reviewed by Liew in that year, even less "in the description of the processes that transform components lower in the hierarchy into those above them". Zins work, published in 2007—from studies in 2003-2005 that documented "130 definitions of data, information, and knowledge formulated by 45 scholars", published in 2007—to suggest that the data–information–knowledge components of DIKW refer to a class of no less than five models, as a function of whether data, information, and knowledge are each conceived of as subjective, objective (what Zins terms, "universal" or "collective") or both. In Zins' usage, subjective and objective "are not related to arbitrariness and truthfulness, which are usually attached to the concepts of subjective knowledge and objective knowledge". Information science, Zins argues, studies data and information, but not knowledge, as knowledge is an internal (subjective) rather than an external (universal–collective) phenomenon. == Representations == === Graphical representation === DIKW is a hierarchical model often depicted as a pyramid, sometimes as a chain, with data at its base and wisdom at its apex (or chain-beginning and -end). Both Zeleny and Ackoff have been credited with originating the pyramid representation, although neither used a pyramid to present their ideas. According to Wallace, Debons and colleagues may have been the first to "present the hierarchy graphically". Many variations of the DIKW-type pyramid have been produced. One, in use by knowledge managers in the United States Department of Defense, attempts to show the DIKW progression to enable effective decisions and consequent activities supporting shared understanding throughout defense organizations, as well as supporting management of risks associated with decisions. DIKW-type hierarchical information paradigms have also been represented as two-dimensional charts, and as flow diagrams, where relationships between the components may be presented less hierarchically, with defining aspects of the relationships, feedback loops, etc. === Computational representation === Intelligent decision support systems are trying to improve decision making by introducing new technologies and methods from the domain of modeling and simulation in general, and in particular from the domain of intelligent software agents in the contexts of agent-based modeling. The following example describes a military decision support system, but the architecture and underlying conceptual idea are transferable to other application domains: The value chain starts with data quality describing the information within the underlying command and control systems. Information quality tracks the completeness, correctness, currency, consistency and precision of the data items and information statements available. Knowledge quality deals with procedural knowledge and information embedded in the command and control system such as templates for adversary forces, assumptions about entities such as ranges and weapons, and doctrinal assumptions, often coded as rules. Awareness quality measures the degree of using the information and knowledge embedded within the command and control system. Awareness is explicitly placed in the cognitive domain. By the introduction of a common operational picture, data are put into context, which leads to information instead of data. The next step, which is enabled by service-oriented web-based infrastructures (but not yet operationally used), is the use of models and simulations for decision support. Simulation systems are the prototype for procedural knowledge, which is the basis for knowledge quality. Finally, using intelligent software agents to continually observe the battle sphere, apply models and simulations to analyze what is going on, to monitor the execution of a plan, and to do all the tasks necessary to make the decision maker aware of what is going on, command and control systems could even support situational awareness, the level in the value chain traditionally limited to pure cognitive methods. == History == Danny P. Wallace, a professor of library and information science, explained that the origin of the DIKW pyramid is uncertain: The presentation of the relationships among data, information, knowledge, and sometimes wisdom in a hierarchical arrangement has been part of the language of information science for many years. Although it is uncertain when and by whom those relationships were first presented, the ubiquity of the notion of a hierarchy is embedded in the use of the acronym DIKW as a shorthand representation for the data-to-information-to-knowledge-to-wisdom transformation.Many authors think that the idea of the DIKW relationship originated from two lines in the poem "Choruses", by T. S. Eliot, that appeared in the pageant play The Rock, in 1934: === Knowledge, intelligence, and wisdom === In 1927, Clarence W. Barron addressed his employees at Dow Jones & Company on the hierarchy: "Knowledge, Intelligence and Wisdom". === Data, information, knowledge === In 1955, English-American economist and educator Kenneth Boulding presented a variation on the hierarchy consisting of "signals, messages, information, and knowledge". However, "[t]he first author to distinguish among data, information, and knowledge and to also employ the term 'knowledge management' may have been American educator Nicholas L. Henry", in a 1974 journal article. === Data, information, knowledge, wisdom === Other early versions (prior to 1982) of the hierarchy that refer to a data tier include those of Chinese-American geographer Yi-Fu Tuan and sociologist-historian Daniel Bell.. In 1980, Irish-born engineer Mike Cooley invoked the same hierarchy in his critique of automation and computerization, in his book Architect or Bee?: The Human / Technology Relationship. Thereafter, in 1987, Czechoslovakia-born educator Milan Zeleny mapped the components of the hierarchy to knowledge forms: know-nothing, know-what, know-how, and know-why. Zeleny "has frequently been credited with proposing the [representation of DIKW as a pyramid ]... although he actually made no reference to any such graphical model." The hierarchy appears again in a 1988 address to the International Society for General Systems Research, by American organizational theorist Russell Ackoff, published in 1989. Subsequent authors and textbooks cite Ackoff's as the "original articulation" of the hierarchy or otherwise credit Ackoff with its proposal. Ackoff's version of the model includes an understanding tier (as Adler had, before him), interposed between knowledge and wisdom. Although Ackoff did not present the hierarchy graphically, he has also been credited with its representation as a pyramid. In 1989, Bell Labs veteran Robert W. Lucky wrote about the four-tier "information hierarchy" in the form of a pyramid in his book Silicon Dreams. In the same year as Ackoff presented his a

    Read more →
  • Smart speaker industry in South Korea

    Smart speaker industry in South Korea

    Smart speakers, or AI speakers, have been developed by multiple domestic electronics and telecommunications firms in South Korea. Since their introduction to the local market in 2016, they have been used by millions of people in the country. == Brands == === Google === In September 2018, Google Home (including the Google Home Mini) launched in South Korea. Running Google Assistant, it featured simultaneous recognition of two languages among a total of seven, including Korean. At launch, it could play music from Bugs!, in addition to YouTube. === Kakao === In November 2017, Kakao launched the Kakao Mini, featuring integrated KakaoTalk functionality. === KT === KT launched the GiGA Genie smart speaker in January 2017, using a Harman Kardon speaker. In November 2017, KT announced GiGA Genie LTE, a portable AI speaker with LTE support. They also released a mini speaker called GiGA Genie Buddy. In 2018, KT created a special version of GiGa Genie with a screen for use in hotels. On 29 April 2019, KT announced the GiGA Genie Table TV, a consumer-oriented smart speaker with a display. It featured paid TV access through Wi-Fi. Based on usage data from the hotel model, KT decided not to add a touchscreen. The Table TV also featured a limited-access "personalized-text-to-speech technology" which could use parents' voice recording inputs to read children books. In February 2022, KT began rolling out Amazon Alexa integration into its speakers for English support. === Naver === In August 2017, Naver announced the Wave smart speaker, operating on Clova. In October 2017, Naver launched the Friends smart speaker, which were designed based on Line characters. ==== LG Uplus ==== In December 2017, LG Uplus launched the Friends+ speaker with Naver, operating on U+ Home AI. === Samsung === In August 2018, Samsung announced the Samsung Galaxy Home in partnership with Spotify. The original size was delayed, while the Galaxy Home Mini appeared briefly as a bonus for Samsung Galaxy S20 preorders in South Korea in February 2020. === SK Telecom === SK Telecom launched the Nugu smart speaker in September 2016, using an Astell & Kern audio system. In August 2017, SKT released a portable speaker named Nugu mini. In July 2018, SKT launched the Nugu Candle, featuring expanded mood lighting. The first-generation Nugu was subsequently discontinued. On 18 April 2019, SKT released the NUGU Nemo AI, which featured a display and JBL stereo speaker. In August 2019, SKT collaborated with SM Entertainment, incorporating functions related to the agency's artists into Nugu. In January 2022, SKT showcased the NUGU Candle SE, introducing Alexa support. == Usage == In 2018, approximately 3 million people in South Korea used smart speakers. According to data from KT in 2018, the most common commands to its speakers were for controlling televisions. Based on a broader survey in 2017, music was selected as the most frequent use case. By 2018, smart speaker companies were partnering with reading and other education services, adding potential use-cases for children. By 2022, smart speakers were being utilized by the South Korean government. SKT, in partnership with 70 regional governments, distributed smart speakers to 12,000 senior citizens living alone. The government paid for monthly subscriptions to help seniors stay mentally engaged. Naver made an agreement with the Seoul Metropolitan Government to provide Clova CareCall, an automated health checkup program to hundreds of senior citizens living alone. KT's AI care service included an emergency dispatch call function and medication notifications. == Criticism == === Communication === In a survey of 300 users in 2017, approximately half reported having some type of communication issue with their smart speakers. === Privacy === South Korean smart speakers sparked privacy concerns when they were found to be collecting and documenting user audio data in 2019. The speaker companies responded that only a minority of data was collected and that it was anonymized. They stated that such recordings were collected for performance improvements.

    Read more →
  • Applications of artificial intelligence

    Applications of artificial intelligence

    Artificial intelligence is the capability of computational systems to perform tasks that are typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. Artificial intelligence has been used in applications throughout industry and academia. Within the field of Artificial Intelligence, there are multiple subfields. The subfield of machine learning has been used for various scientific and commercial purposes, including language translation, image recognition, decision-making, credit scoring, and e-commerce. In recent years, massive advancements have been made in the field of generative artificial intelligence, which uses generative models to generate text, images, videos, and other forms of data. This article describes applications of AI in different sectors. == Agriculture == In agriculture, AI has been proposed as a way for farmers to identify areas that need irrigation, fertilization, or pesticide treatments to increase yields, thereby improving efficiency. AI has been used to attempt to classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and optimize irrigation. == AI-assisted software develoment == == Architecture and design == == Business == A 2023 study found that generative AI increased productivity by 15% in contact centers. Another 2023 study found it increased productivity by up to 40% in writing tasks. An August 2025 review by MIT found that of surveyed companies, 95% did not report any improvement in revenue from the use of AI. A September 2025 article by the Harvard Business Review describes how increased use of AI does not automatically lead to increases in revenue or actual productivity. Referring to "AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task" the article coins the term workslop. Per studies done in collaboration with the Stanford Social Media Lab, workslop does not improve productivity and undermines trust and collaboration among colleagues. In telehealth, agentic AI is reportedly facilitating the creation of large business models (millions in annual profit) with 1-2 employees, such as MEDVi, which as of August 2025 only had 2 employees and ~$75M in annual profit for GLP-1 weight-loss telehealth services. == Chatbots == == Computer science == === Programming assistance === ==== AI-assisted software development ==== AI can be used for real-time code completion, chat, and automated test generation. These tools are typically integrated with editors and IDEs as plugins. AI-assisted software development systems differ in functionality, quality, speed, and approach to privacy. Creating software primarily via AI is known as "vibe coding". Code created or suggested by AI can be incorrect or inefficient. The use of AI-assisted coding can potentially speed-up software development, but can also slow-down the process by creating more work when debugging and testing. The rush to prematurely adopt AI technology can also incur additional technical debt. AI also requires additional consideration and careful review for cybersecurity, since AI coding software is trained on a wide range of code of inconsistent quality and often replicates poor practices. ==== Neural network design ==== AI can be used to create other AIs. For example, around November 2017, Google's AutoML project to evolve new neural net topologies created NASNet, a system optimized for ImageNet and POCO F1. NASNet's performance exceeded all previously published performance on ImageNet. ==== Quantum computing ==== Research and development of quantum computers has been performed with machine learning algorithms. For example, there is a prototype, photonic, quantum memristive device for neuromorphic computers (NC)/artificial neural networks and NC-using quantum materials with some variety of potential neuromorphic computing-related applications. The use of quantum machine learning for quantum simulators has been proposed for solving physics and chemistry problems. === Historical contributions === AI researchers have created many tools to solve the most difficult problems in computer science. Many of their inventions have been adopted by mainstream computer science and are no longer considered AI. All of the following were originally developed in AI laboratories: Time sharing Interactive interpreters Graphical user interfaces and the computer mouse Rapid application development environments The linked list data structure Automatic storage management Symbolic programming Functional programming Dynamic programming Object-oriented programming Optical character recognition Constraint satisfaction == Customer service == === Human resources === AI programs have been used in hiring processes to screen resumes and rank candidates based on their qualifications, predict a candidate's likelihood of success in a given role, and automate repetitive communication tasks using chatbots. Studies on these programs have identified tendencies for gender bias, favoring male names and male-coded characteristics, as well as bias against disabled candidates and racial minorities. === Online and telephone customer service === AI underlies avatars (automated online assistants) on web pages. It can reduce operation and training costs. Pypestream automated customer service for its mobile application to streamline communication with customers. A Google app analyzes language and converts speech into text. The platform can identify angry customers through their language and respond appropriately. Amazon uses a chatbot for customer service that can perform tasks like checking the status of an order, cancelling orders, offering refunds and connecting the customer with a human representative. Generative AI (GenAI), such as ChatGPT, is increasingly used in business to automate tasks and enhance decision-making. === Hospitality === In the hospitality industry, AI is used to reduce repetitive tasks, analyze trends, interact with guests, and predict customer needs. AI hotel services come in the form of a chatbot, application, virtual voice assistant and service robots. == Education == In educational institutions, AI has been used to automate routine tasks such as attendance tracking, grading, and marking. AI tools have also been used to monitor student progress and analyze learning behaviors, with the goal of facilitating timely interventions for students facing academic challenges. == Energy and environment == === Energy system === The U.S. Department of Energy wrote in an April 2024 report that AI may have applications in modeling power grids, reviewing federal permits with large language models, predicting levels of renewable energy production, and improving the planning process for electrical vehicle charging networks. Other studies have suggested that machine learning can be used for energy consumption prediction and scheduling, e.g. to help with renewable energy intermittency management (see also: smart grid and climate change mitigation in the power grid). === Environmental monitoring === Autonomous ships that monitor the ocean, AI-driven satellite data analysis, passive acoustics or remote sensing and other applications of environmental monitoring make use of machine learning. For example, "Global Plastic Watch" is an AI-based satellite monitoring-platform for analysis/tracking of plastic waste sites to help prevention of plastic pollution – primarily ocean pollution – by helping identify who and where mismanages plastic waste, dumping it into oceans. === Early-warning systems === Machine learning can be used to spot early-warning signs of disasters and environmental issues, possibly including natural pandemics, earthquakes, landslides, heavy rainfall, long-term water supply vulnerability, tipping-points of ecosystem collapse, cyanobacterial bloom outbreaks, and droughts. === Economic and social challenges === The University of Southern California launched the Center for Artificial Intelligence in Society, with the goal of using AI to address problems such as homelessness. Stanford researchers use AI to analyze satellite images to identify high poverty areas. == Entertainment and media == === Media === AI applications analyze media content such as movies, TV programs, advertisement videos or user-generated content. The solutions often involve computer vision. Typical scenarios include the analysis of images using object recognition or face recognition techniques, or the analysis of video for scene recognizing scenes, objects or faces. AI-based media analysis can facilitate media search, the creation of descriptive keywords for content, content policy monitoring (such as verifying the suitability of content for a particular TV viewing time), speech to text for archival or other purposes, and the detection of logos, products or celebrity faces for ad placement. Motion interpolation Pixel-art scaling algorithms Image scaling Imag

    Read more →
  • Data quality

    Data quality

    Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Data is deemed of high quality if it correctly represents the real-world construct to which it refers. Apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, businesses may adopt recognised international standards for data quality (See #International Standards for Data Quality below). Data governance can also be used to form agreed upon definitions and standards, including international standards, for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality. == Definitions == Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data. From a consumer perspective, data quality is: "data that are fit for use by data consumers" data "meeting or exceeding consumer expectations" data that "satisfies the requirements of its intended use" From a business perspective, data quality is: data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibits "'conformance to standards' that have been set, so that fitness for use is achieved" data that "are fit for their intended uses in operations, decision making and planning" "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise" From a standards-based perspective, data quality is: the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements" "the usefulness, accuracy, and correctness of data for its application" Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies. == Dimensions of data quality == Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as: accessibility or availability accuracy or correctness comparability completeness or comprehensiveness consistency, coherence, or clarity credibility, reliability, or reputation flexibility plausibility relevance, pertinence, or usefulness timeliness or latency uniqueness validity or reasonableness A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data. == International standards for data quality == ISO 8000 is an international standard for data quality. Managed by the International Organization for Standardization, the ISO 8000 standards address and describe general aspects of data quality including principles, vocabulary and measurement data governance data quality management data quality assessment quality of master data, including exchange of characteristic data and identifiers quality of industrial data == History == Before the rise of the inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services. This was so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available. Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc. == Overview == There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities". MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991). In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources – through data entry, or data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale very quickly in the average database – more than 45 million Americans change their address every year. In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations. Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency. En

    Read more →
  • Environmental informatics

    Environmental informatics

    Environmental informatics is the science of information applied to environmental science. As such, it provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences aiming at data, information and knowledge integration, the application of computational intelligence to environmental data as well as the identification of environmental impacts of information technology. Environmental informatics thus acts as a bridge, providing an interdisciplinary means of analysing, describing and understanding the complex interactions between humans, nature and technology. Since each field of applied computer science has its own subject matter, terminology and methods, specialised disciplines, such as environmental, bio- and geoinformatics have emerged, each of which combines computer science with a specific field of application such as environmental, bio- or geosciences. Environmental informatics, bioinformatics and geoinformatics all deal with computer-based processing of environmental phenomena. However, environmental informatics is the only field that pursues normative goals (e.g., political goals of environmental protection, environmental planning, and sustainability). This also influences the choice of methods. This also distinguishes it from application areas such as numerical weather prediction, which is considered an early and important example of computer simulation of environmental phenomena. The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information." Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs." Karatzas argued further that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information and software technology methods and tools for serving environmental engineering needs." Environmental informatics emerged in early 1990 in Central Europe. Current initiatives to effectively manage, share, and reuse environmental and ecological data are indicative of the increasing importance of fields like environmental informatics and ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE and Data Conservancy. == Subject matter and objectives == The subject of environmental informatics are environmental information systems (EIS). An EIS 'is a computer-based system that integrates and stores data collected about the natural environment and provides powerful methods for accessing and evaluating it.' This allows environmental data to be processed by computers for environmental protection, planning, research and technology. According to Jaeschke and Bossel, environmental informatics has three interrelated objectives: Environmental informatics serves to procure data and information for describing the state and development of the environment. Of particular importance is information that is needed to prevent or limit undesirable changes and to support desirable changes. Based on the evaluation and analysis of data, environmental informatics improves our understanding of the environment and the interactions between nature, technology and society. It thus supports environmentally relevant decisions. This enables the influence of development (system correction), the assessment of the effects and side effects of potential measures, and the creation of tools for the routine planning, implementation and monitoring of measures. == History == The simulation model World3, which formed the basis of the highly acclaimed study The Limits to Growth, is considered the starting point of environmental informatics. It incorporated environmental information, among other things, to calculate scenarios for global development. In the mid-1980s, interest grew in structuring environmental protection as an area of application for computer science. One of the first publications in German was the book Informatik im Umweltschutz. Anwendungen und Perspektiven (Computer science in environmental protection. Applications and perspectives) from 1986. The term 'environmental informatics' did not appear until around 1993, which is why the development of environmental informatics is usually referred to as having taken place in the 1990s. In 1993, the first university chair for environmental informatics was established in Cottbus. In 1994, the anthology Umweltinformatik. Informatikmethoden für Umweltschutz und Umweltforschung (Environmental Informatics: Informatics Methods for Environmental Protection and Environmental Research) was published. The development of environmental informatics was 'primarily initiated by German computer science.' In the English-speaking world, the volume Environmental Informatics was published in 1995, mainly based on the German anthology of 1994. An article in the conference proceedings of the World Computer Congress of the International Federation for Information Processing (IFIP) in Hamburg in 1994 describes the initial situation of environmental informatics as follows: 'On the one hand, we suffer from the huge amount of available data – people sometimes speak of data graveyards – on the other hand, the really relevant data may still be missing.' This statement indicates the need that led to the emergence of environmental informatics as a specialised discipline of applied computer science. Furthermore, the specific characteristics and processing requirements of environmental data necessitated the emergence of environmental informatics. The special features of environmental data include: The data structures required are highly heterogeneous due to specific processes and differing perspectives on environmental aspects (e.g., water protection, emission control, hazardous substances). In addition to the heterogeneity of the data, heterogeneous databases also play a role, as environmental data is often obtained and presented in an interdisciplinary manner. Obligations change frequently as a result of new legislation, whether regional (e.g. state regulations on water protection), national (e.g. federal emission control regulations) or international (e.g. Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH). The objects represented are often multidimensional and, therefore, require complex geometric representation using curves or polygons. It is often necessary to process uncertain, imprecise or incomplete data, which is, for example, the result of extrapolations or forecasts. A new "knowledge paradigm" has emerged to meet the requirements of environmental management. Environmental informatics produces its own concepts, methods and techniques and is not merely the result of using information and communication technology methods and tools to meet environmental requirements. The development of environmental informatics since the 1990s has been significantly influenced by the newly established conferences EnviroInfo, ISESS and ITEE and is documented in the respective proceedings. Aspects of sustainability and sustainable development were increasingly integrated into environmental informatics after 2000, thereby expanding the field. In 2004, the Working Group on Sustainable Information Society of the Gesellschaft für Informatik e. V. (German Informatics Society, GI) published the Memorandum on a Sustainable Information Society, which formulates recommendations for an information society that is compatible with human, social and natural needs. Since 2007, environmental informatics has often been described in more detail as informatics for environmental protection, sustainable development and risk management. The increased focus on sustainability has also contributed to the formation of the research focus Information and Communications Technology for Sustainability (ICT4S) and to the emergence of the international conference ICT4S in 2013. ICT-ENSURE, the European Commission's funding measure for the establishment of a European research area on "ICT for Environmental Sustainability Research" (2008–2010), has also contributed to the structuring of environmental informatics. == Environmental informatics and sustainable development == Efforts to place environmental informatics within the context of sustainable development have been growing since 2000 and were significantly influenced by the Memorandum on a Sustainable Information Society. According to this Memorandum, the information society offers great but unevenly distributed opportunities for education, participation and intercultural understanding. In addition, the Memorandum highlighted the material and energy consumption of inf

    Read more →
  • Law practice management software

    Law practice management software

    Law practice management software is software designed to manage the business operations of a law firm. This can include software that manages cases, client intake, court communications, electronic discovery, time tracking, trust accounting, and billing. == Features of law practice management software == Common features of practice management software include: Case management Time tracking Document assembly Contact management Calendaring Docket management Client portal Contract Management Court Case Status Tracker Trust accounting == Examples of law practice management software == Smokeball LEAP Legal Software PracticeEvolve Dye & Durham

    Read more →
  • Information architecture

    Information architecture

    Information architecture is the structural design of shared information environments, in particular the organisation of websites and software to support usability and findability. The term information architecture was coined by Richard Saul Wurman. Since its inception, information architecture has become an emerging community of practice focused on applying principles of design, architecture and information science in digital spaces. Typically, a model or concept of information is used and applied to activities which require explicit details of complex information systems. These activities include library systems and database development. == Definition == The term information architecture has different meanings in different branches of information systems or information technology. === User experience === In user experience design, information architecture has been described as the structural design of shared information environments, comprising the study and practice of organising and labelling web sites, intranets, online communities, and software to support user experience, in particular, the findability and usability of information. It has also been described as an emerging community of practice focused on bringing principles of design and architecture to the digital landscape. === Information systems === Technically speaking, information architecture comprises the combination of organization, labeling, search and navigation systems within websites and intranets, serving as a navigational aid to the content of information-rich systems. === Data architecture === Information architecture can be described as a subset of data architecture where usable data is constructed, designed, and arranged in a fashion most useful to the users of data. === Systems design === In the field of systems design, for example, information architecture is a component of enterprise architecture that deals with the information component when describing the structure of an enterprise. Some system design practitioners regard information architecture as strictly the application of information science to web design, which considers such issues as classification and information retrieval, and not factors like user experience and information design. == Principles == Principles of information architecture include the following: The principle of objects The principle of choices The principle of disclosure The principle of exemplars The principle of front doors The principle of multiple classification The principle of focused navigation The principle of growth == History == Richard Saul Wurman is credited with coining the term information architecture in relation to the design of information. From 1998 to 2015, Peter Morville and Louis Rosenfeld were co-authors of Information Architecture for the World Wide Web. Other authors include Jesse James Garrett and Christina Wodtke.

    Read more →
  • Online public access catalog

    Online public access catalog

    The online public access catalog (OPAC), now frequently synonymous with library catalog, is an online database of materials held by a library or group of libraries. Online catalogs have largely replaced the analog card catalogs previously used in libraries. == History == === Early online === Although a handful of experimental systems existed as early as the 1960s, the first large-scale online catalogs were developed at Ohio State University in 1975 and the Dallas Public Library in 1978. These and other early online catalog systems tended to closely reflect the card catalogs that they were intended to replace. Using a dedicated terminal or telnet client, users could search a handful of pre-coordinate indexes and browse the resulting display in much the same way they had previously navigated the card catalog. Throughout the 1980s, the number and sophistication of online catalogs grew. The first commercial systems appeared, and would by the end of the decade largely replace systems built by libraries themselves. Library catalogs began providing improved search mechanisms, including Boolean and keyword searching, as well as ancillary functions, such as the ability to place holds on items that had been checked-out. At the same time, libraries began to develop applications to automate the purchase, cataloging, and circulation of books and other library materials. These applications, collectively known as an integrated library system (ILS) or library management system, included an online catalog as the public interface to the system's inventory. Most library catalogs are closely tied to their underlying ILS system. === Stagnation and dissatisfaction === The 1990s saw a relative stagnation in the development of online catalogs. Although the earlier character-based interfaces were replaced with ones for the Web, both the design and the underlying search technology of most systems did not advance much beyond that developed in the late 1980s. At the same time, organizations outside of libraries began developing more sophisticated information retrieval systems. Web search engines like Google and popular e-commerce websites such as Amazon.com provided simpler to use (yet more powerful) systems that could provide relevancy ranked search results using probabilistic and vector-based queries. Prior to the widespread use of the Internet, the online catalog was often the first information retrieval system library users ever encountered. Now accustomed to web search engines, newer generations of library users have grown increasingly dissatisfied with the complex (and often arcane) search mechanisms of older online catalog systems. This has, in turn, led to vocal criticisms of these systems within the library community itself, and in recent years to the development of newer (often termed 'next-generation') catalogs. === Next-generation catalogs === Newer generations of library catalog systems, typically called discovery systems (or a discovery layer), are distinguished from earlier OPACs by their use of more sophisticated search technologies, including relevancy ranking and faceted search, as well as features aimed at greater user interaction and participation with the system, including tagging and reviews. These new features rely heavily on existing metadata which may be poor or inconsistent, particularly for older records. Newer catalog platforms may be independent of the organization's integrated library system (ILS), instead providing drivers that allow for the synchronization of data between the two systems. While the original online catalog interfaces were almost exclusively built by ILS vendors, libraries have increasingly sought next-generation catalogs built by enterprise search companies and open-source software projects, often led by libraries themselves. == Union catalogs == Although library catalogs typically reflect the holdings of a single library, they can also contain the holdings of a group or consortium of libraries. These systems, known as union catalogs, are usually designed to aid the borrowing of books and other materials among the member institutions via interlibrary loan. Examples of this type of catalogs include COPAC, SUNCAT, NLA Trove, and WorldCat—the last catalogs the collections of libraries worldwide. == Related systems == There are a number of systems that share much in common with library catalogs, but have traditionally been distinguished from them. Libraries utilize these systems to search for items not traditionally covered by a library catalog, although these systems are sometimes integrated into a more comprehensive discovery system. Bibliographic databases—such as Medline, ERIC, PsycINFO, Scopus, Web of Science, and many others—index journal articles and other research data. There are also a number of applications aimed at managing documents, photographs, and other digitized or born-digital items such as Digital Commons and DSpace. Particularly in academic libraries, these systems (often known as digital library systems or institutional repository systems) assist with efforts to preserve documents created by faculty and students. Electronic resource management helps librarians to track selection, acquisition, and licensing of a library's electronic information resources.

    Read more →