Knowledge organization

Knowledge organization

Knowledge organization (KO), organization of knowledge, organization of information, or information organization is an intellectual discipline concerned with activities such as document description, indexing, and classification that serve to provide systems of representation and order for knowledge and information objects. According to The Organization of Information by Joudrey and Taylor, information organization: examines the activities carried out and tools used by people who work in places that accumulate information resources (e.g., books, maps, documents, datasets, images) for the use of humankind, both immediately and for posterity. It discusses the processes that are in place to make resources findable, whether someone is searching for a single known item or is browsing through hundreds of resources just hoping to discover something useful. Information organization supports a myriad of information-seeking scenarios. Issues related to knowledge sharing can be said to have been an important part of knowledge management for a long time. Knowledge sharing has received a lot of attention in research and business practice both within and outside organizations and its different levels. Sharing knowledge is not only about giving it to others, but it also includes searching, locating, and absorbing knowledge. Unawareness of the employees' work and duties tends to provoke the repetition of mistakes, the waste of resources, and duplication of the same projects. Motivating co-workers to share their knowledge is called knowledge enabling. It leads to trust among individuals and encourages a more open and proactive relationship that grants the exchange of information easily. Knowledge sharing is part of the three-phase knowledge management process which is a continuous process model. The three parts are knowledge creation, knowledge implementation, and knowledge sharing. The process is continuous, which is why the parts cannot be fully separated. Knowledge creation is the consequence of individuals' minds, interactions, and activities. Developing new ideas and arrangements alludes to the process of knowledge creation. Using the knowledge which is present at the company in the most effective manner stands for the implementation of knowledge. Knowledge sharing, the most essential part of the process for our topic, takes place when two or more people benefit by learning from each other. Traditional human-based approaches performed by librarians, archivists, and subject specialists are increasingly challenged by computational (big data) algorithmic techniques. KO as a field of study is concerned with the nature and quality of such knowledge-organizing processes (KOP) (such as taxonomy and ontology) as well as the resulting knowledge organizing systems (KOS). == Theoretical approaches == === Traditional approaches === Among the major figures in the history of KO are Melvil Dewey (1851–1931) and Henry Bliss (1870–1955). Dewey's goal was an efficient way to manage library collections; not an optimal system to support users of libraries. His system was meant to be used in many libraries as a standardized way to manage collections. The first version of this system was created in 1876. An important characteristic in Henry Bliss' (and many contemporary thinkers of KO) was that the sciences tend to reflect the order of Nature and that library classification should reflect the order of knowledge as uncovered by science: The implication is that librarians, in order to classify books, should know about scientific developments. This should also be reflected in their education: Again from the standpoint of the higher education of librarians, the teaching of systems of classification ... would be perhaps better conducted by including courses in the systematic encyclopedia and methodology of all the sciences, that is to say, outlines which try to summarize the most recent results in the relation to one another in which they are now studied together. ... (Ernest Cushing Richardson, quoted from Bliss, 1935, p. 2) Among the other principles, which may be attributed to the traditional approach to KO are: Principle of controlled vocabulary Cutter's rule about specificity Hulme's principle of literary warrant (1911) Principle of organizing from the general to the specific Today, after more than 100 years of research and development in LIS, the "traditional" approach still has a strong position in KO and in many ways its principles still dominate. === Facet analytic approaches === The date of the foundation of this approach may be chosen as the publication of S. R. Ranganathan's colon classification in 1933. The approach has been further developed by, in particular, the British Classification Research Group. The best way to explain this approach is probably to explain its analytico-synthetic methodology. The meaning of the term "analysis" is: breaking down each subject into its basic concepts. The meaning of the term synthesis is: combining the relevant units and concepts to describe the subject matter of the information package in hand. Given subjects (as they appear in, for example, book titles) are first analyzed into a few common categories, which are termed "facets". Ranganathan proposed his PMEST formula: Personality, Matter, Energy, Space and Time: Personality is the distinguishing characteristic of a subject. Matter is the physical material of which a subject may be composed. Energy is any action that occurs with respect to the subject. Space is the geographic component of the location of a subject. Time is the period associated with a subject. === The information retrieval tradition (IR) === Important in the IR-tradition have been, among others, the Cranfield experiments, which were founded in the 1950s, and the TREC experiments (Text Retrieval Conferences) starting in 1992. It was the Cranfield experiments, which introduced the measures "recall" and "precision" as evaluation criteria for systems efficiency. The Cranfield experiments found that classification systems like UDC and facet-analytic systems were less efficient compared to free-text searches or low level indexing systems ("UNITERM"). The Cranfield I test found, according to Ellis (1996, 3–6) the following results: Although these results have been criticized and questioned, the IR-tradition became much more influential while library classification research lost influence. The dominant trend has been to regard only statistical averages. What has largely been neglected is to ask: Are there certain kinds of questions in relation to which other kinds of representation, for example, controlled vocabularies, may improve recall and precision? === User-oriented and cognitive views === The best way to define this approach is probably by method: Systems based upon user-oriented approaches must specify how the design of a system is made on the basis of empirical studies of users. User studies demonstrated very early that users prefer verbal search systems as opposed to systems based on classification notations. This is one example of a principle derived from empirical studies of users. Adherents of classification notations may, of course, still have an argument: That notations are well-defined and that users may miss important information by not considering them. Folksonomies is a recent kind of KO based on users' rather than on librarians' or subject specialists' indexing. === Bibliometric approaches === These approaches are primarily based on using bibliographical references to organize networks of papers, mainly by bibliographic coupling (introduced by Kessler 1963) or co-citation analysis ( independently suggested by Marshakova 1973 and Small 1973). In recent years it has become a popular activity to construe bibliometric maps as structures of research fields. Two considerations are important in considering bibliometric approaches to KO: The level of indexing depth is partly determined by the number of terms assigned to each document. In citation indexing this corresponds to the number of references in a given paper. On the average, scientific papers contain 10–15 references, which provide quite a high level of depth. The references, which function as access points, are provided by the highest subject-expertise: The experts writing in the leading journals. This expertise is much higher than that which library catalogs or bibliographical databases typically are able to draw on. === The domain analytic approach === Domain analysis is a sociological-epistemological standpoint that advocates that the indexing of a given document should reflect the needs of a given group of users or a given ideal purpose. In other words, any description or representation of a given document is more or less suited to the fulfillment of certain tasks. A description is never objective or neutral, and the goal is not to standardize descriptions or make one description once and for all for different target groups. The develo

Cloud load balancing

Cloud load balancing is a type of load balancing that is performed in cloud computing. Cloud load balancing is the process of distributing workloads across multiple computing resources. Cloud load balancing reduces costs associated with document management systems and maximizes availability of resources. It is a type of load balancing and not to be confused with Domain Name System (DNS) load balancing. While DNS load balancing uses software or hardware to perform the function, cloud load balancing uses services offered by various computer network companies. == Comparison With DNS load balancing == Cloud load balancing has an advantage over DNS load balancing as it can transfer loads to servers globally as opposed to distributing it across local servers. In the event of a local server outage, cloud load balancing delivers users to the closest regional server without interruption for the user. Cloud load balancing addresses issues relating to TTL reliance present during DNS load balancing. DNS directives can only be enforced once in every TTL cycle and can take several hours if switching between servers during a lag or server failure. Incoming server traffic will continue to route to the original server until the TTL expires and can create an uneven performance as different internet service providers may reach the new server before other internet service providers. Another advantage is that cloud load balancing improves response time by routing remote sessions to the best performing data centers. == Importance of Load Balancing == Cloud computing brings advantages in "cost, flexibility and availability of service users." Those advantages drive the demand for Cloud services. The demand raises technical issues in Service Oriented Architectures and Internet of Services (IoS)-style applications, such as high availability and scalability. As a major concern in these issues, load balancing allows cloud computing to "scale up to increasing demands" by efficiently allocating dynamic local workload evenly across all nodes. == Load Balancing Techniques == === Scheduling Algorithms === Opportunistic Load Balancing (OLB) is the algorithm that assigns workloads to nodes in free order. It is simple but does not consider the expected execution time of each node. Load balance Min-Min (LBMM) assigns sub-tasks to the node which requires minimum execution time. === Load Balancing Policies === Workload and Client Aware Policy (WCAP) specifies the unique and special property (USP) of requests and computing nodes. With the information of USP, the schedule can decide the most suitable node to complete a request. WCAP makes the most of computing nodes by reducing their idle time. Also, it reduces performance time through searches based on content information. === A Comparative Study of Algorithms === Biased Random Sampling bases its job allocation on the network represented by a directed graph. For each execution node in this graph, in-degree means available resources and out-degree means allocated jobs. In-degree will decrease during job execution while out-degree will increase after job allocation. Active Clustering is a self-aggregation algorithm to rewire the network. The experiment result is that"Active Clustering and Random Sampling Walk predictably perform better as the number of processing nodes is increased" while the Honeyhive algorithm does not show the increasing pattern. == Client-side Load Balancer Using Cloud Computing == Load balancer forwards packets to web servers according to different workloads on servers. However, it is hard to implement a scalable load balancer because of both the "cloud's commodity business model and the limited infrastructure control allowed by cloud providers." Client-side Load Balancer (CLB) solve this problem by using a scalable cloud storage service. CLB allows clients to choose back-end web servers for dynamic content although it delivers static content.

Seismological Facility for the Advancement of Geoscience

The U.S. National Science Foundation's Seismological Facility for the Advancement of Geoscience (NSF SAGE) is a distributed, multi-user national facility that provides support for state of-the-art seismic research. It is operated by EarthScope Consortium. Its previous operator was the Incorporated Research Institutions for Seismology (IRIS), until its merger with UNAVCO to become EarthScope Consortium. NSF SAGE is one of the two premier geophysical facilities in support of geoscience and geoscience education of the National Science Foundation. The other premiere geophysical facility is NSF GAGE, the Geodetic Facility for the Advancement of Geoscience. The services of the facility include support for the Global Seismographic Network (GSN), Data Services, and instrument support via the EarthScope Primary Instrument Center (EPIC), including magnetotelluric (MT) geophysical research. == Global Seismographic Network (GSN) == NSF SAGE manages 40 stations of the 152-station Global Seismographic Network (GSN) for basic global seismicity and Earth structure research. The GSN also enables earthquake hazard mission-related data operations such as: Earthquake location and characterization Tsunami warning Nuclear explosion monitoring == Data Services == SAGE Data Services (DS) is the largest facility for the archiving, curation, and distribution of seismological and other geophysical data in the world. == EarthScope Primary Instrument Center (EPIC) == The EPIC facility maintains the largest open access, shared-use pool of portable seismic sensors in the world. It is located on the campus of New Mexico Tech. == MT == NSF SAGE provides instruments for magnetotelluric (MT) or electromagnetic geophysical research for the recording of our planet's ambient electric and magnetic fields, which allow for the characterization of the conductivity of the area consisting of the shallow crust to upper mantle. This helps with analysis of results obtained from seismic imaging methodologies. The NSF SAGE facility is: Developing open source MT data formatting and processing software. Providing access to proprietary software products.

OpenSMILE

openSMILE is source-available software for automatic extraction of features from audio signals and for classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction". The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community. The openSMILE project exists since 2008 and is maintained by the German company audEERING GmbH since 2013. openSMILE is provided free of charge for research purposes and personal use under a source-available license. For commercial use of the tool, the company audEERING offers custom license options. == Application Areas == openSMILE is used for academic research as well as for commercial applications in order to automatically analyze speech and music signals in real-time. In contrast to automatic speech recognition which extracts the spoken content out of a speech signal, openSMILE is capable of recognizing the characteristics of a given speech or music segment. Examples for such characteristics encoded in human speech are a speaker's emotion, age, gender, and personality, as well as speaker states like depression, intoxication, or vocal pathological disorders. The software further includes music classification technology for automatic music mood detection and recognition of chorus segments, key, chords, tempo, meter, dance-style, and genre. The openSMILE toolkit serves as benchmark in manifold research competitions such as Interspeech ComParE, AVEC, MediaEval, and EmotiW. == History == The openSMILE project was started in 2008 by Florian Eyben, Martin Wöllmer, and Björn Schuller at the Technical University of Munich within the European Union research project SEMAINE. The goal of the SEMAINE project was to develop a virtual agent with emotional and social intelligence. In this system, openSMILE was applied for real-time analysis of speech and emotion. The final SEMAINE software release is based on openSMILE version 1.0.1. In 2009, the emotion recognition toolkit (openEAR) was published based on openSMILE. "EAR" stands for "Emotion and Affect Recognition". In 2010, openSMILE version 1.0.1 was published and was introduced and awarded at the ACM Multimedia Open-Source Software Challenge. Between 2011 and 2013, the technology of openSMILE was extended and improved by Florian Eyben and Felix Weninger in the context of their doctoral thesis at the Technical University of Munich. The software was also applied for the project ASC-Inclusion, which was funded by the European Union. For this project, the software was extended by Erik Marchi in order to teach emotional expression to autistic children, based on automatic emotion recognition and visualization. In 2013, the company audEERING acquired the rights to the code-base from the Technical University of Munich and version 2.0 was published under a source-available research license. Until 2016, openSMILE was downloaded more than 50,000 times worldwide and has established itself as a standard toolkit for emotion recognition. == Awards == openSMILE was awarded in 2010 in the context of the ACM Multimedia Open Source Competition. The software tool is applied in numerous scientific publications on automatic emotion recognition. openSMILE and its extension openEAR have been cited in more than 1000 scientific publications until today.

Unrestricted algorithm

An unrestricted algorithm is an algorithm for the computation of a mathematical function that puts no restrictions on the range of the argument or on the precision that may be demanded in the result. The idea of such an algorithm was put forward by C. W. Clenshaw and F. W. J. Olver in a paper published in 1980. In the problem of developing algorithms for computing, as regards the values of a real-valued function of a real variable (e.g., g[x] in "restricted" algorithms), the error that can be tolerated in the result is specified in advance. An interval on the real line would also be specified for values when the values of a function are to be evaluated. Different algorithms may have to be applied for evaluating functions outside the interval. An unrestricted algorithm envisages a situation in which a user may stipulate the value of x and also the precision required in g(x) quite arbitrarily. The algorithm should then produce an acceptable result without failure.

Astrostatistics

Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex datasets, and to link astronomical data to astrophysical theory. Many branches of statistics are involved in astronomical analysis including nonparametrics, multivariate regression and multivariate classification, time series analysis, and especially Bayesian inference. The field is closely related to astroinformatics.

Novell Storage Manager

Novell Storage Manager is a system software package released by Novell in 2004 that uses identity, policy and directory events to automate full lifecycle management of file storage for individual users and organizational groups. By tying storage management to an organization's existing identity infrastructure, it has been pointed out, Novell Storage Manager enables the administration of users across all file servers "as a single pool rather than [in] separate independently managed domains." Novell Storage Manager is a component of the Novell File Management Suite. == How It Works == Novell Storage Manager dynamically manages and provisions storage based on user and group events that occur in the directory, including user creations, group assignments, moves, renames, and deletions. When a change happens in the directory that affects a user’s file storage needs or user storage policy, Storage Manager applies the appropriate policy and makes the necessary changes at the file system level to address those storage needs. The following key components comprise Novell Storage Manager's identity and policy-driven state machine architecture: Directory services; Storage policies; Novell Storage Manager event monitors; Novell Storage Manager policy engine; Novell Storage Manager agents; and Action objects. This state machine architecture enables the engine to properly deal with transient waits with directory synchronization issues. It also allows recovery from failures involving network communications, a target server or a server running a component of Storage Manager—including the policy engine itself. If a failure or interruption occurs at any point during operation, Storage Manager will be able to successfully continue the operation from where it was when the interruption occurred. == Reviews == Jon Toigo called Novell Storage Manager "a robust and smart approach to corralling user files... into an organized and efficient management scheme". He also said it was "best in class" of the products he'd reviewed.