Ordered weighted averaging

Ordered weighted averaging

In applied mathematics, specifically in fuzzy logic, the ordered weighted averaging (OWA) operators provide a parameterized class of mean type aggregation operators. They were introduced by Ronald R. Yager. Many notable mean operators such as the max, arithmetic average, median and min, are members of this class. They have been widely used in computational intelligence because of their ability to model linguistically expressed aggregation instructions. == Definition == An OWA operator of dimension n {\displaystyle \ n} is a mapping F : R n → R {\displaystyle F:\mathbb {R} ^{n}\rightarrow \mathbb {R} } that has an associated collection of weights W = [ w 1 , … , w n ] {\displaystyle \ W=[w_{1},\ldots ,w_{n}]} lying in the unit interval and summing to one and with F ( a 1 , … , a n ) = ∑ j = 1 n w j b j {\displaystyle F(a_{1},\ldots ,a_{n})=\sum _{j=1}^{n}w_{j}b_{j}} where b j {\displaystyle b_{j}} is the jth largest of the a i {\displaystyle a_{i}} . By choosing different W one can implement different aggregation operators. The OWA operator is a non-linear operator as a result of the process of determining the bj. == Notable OWA operators == F ( a 1 , … , a n ) = max ( a 1 , … , a n ) {\displaystyle \ F(a_{1},\ldots ,a_{n})=\max(a_{1},\ldots ,a_{n})} if w 1 = 1 {\displaystyle \ w_{1}=1} and w j = 0 {\displaystyle \ w_{j}=0} for j ≠ 1 {\displaystyle j\neq 1} F ( a 1 , … , a n ) = min ( a 1 , … , a n ) {\displaystyle \ F(a_{1},\ldots ,a_{n})=\min(a_{1},\ldots ,a_{n})} if w n = 1 {\displaystyle \ w_{n}=1} and w j = 0 {\displaystyle \ w_{j}=0} for j ≠ n {\displaystyle j\neq n} F ( a 1 , … , a n ) = a v e r a g e ( a 1 , … , a n ) {\displaystyle \ F(a_{1},\ldots ,a_{n})=\mathrm {average} (a_{1},\ldots ,a_{n})} if w j = 1 n {\displaystyle \ w_{j}={\frac {1}{n}}} for all j ∈ [ 1 , n ] {\displaystyle j\in [1,n]} == Properties == The OWA operator is a mean operator. It is bounded, monotonic, symmetric, and idempotent, as defined below. == Characterizing features == Two features have been used to characterize the OWA operators. The first is the attitudinal character, also called orness. This is defined as A − C ( W ) = 1 n − 1 ∑ j = 1 n ( n − j ) w j . {\displaystyle A-C(W)={\frac {1}{n-1}}\sum _{j=1}^{n}(n-j)w_{j}.} It is known that A − C ( W ) ∈ [ 0 , 1 ] {\displaystyle A-C(W)\in [0,1]} . In addition A − C(max) = 1, A − C(ave) = A − C(med) = 0.5 and A − C(min) = 0. Thus the A − C goes from 1 to 0 as we go from Max to Min aggregation. The attitudinal character characterizes the similarity of aggregation to OR operation(OR is defined as the Max). The second feature is the dispersion. This defined as H ( W ) = − ∑ j = 1 n w j ln ⁡ ( w j ) . {\displaystyle H(W)=-\sum _{j=1}^{n}w_{j}\ln(w_{j}).} An alternative definition is E ( W ) = ∑ j = 1 n w j 2 . {\displaystyle E(W)=\sum _{j=1}^{n}w_{j}^{2}.} The dispersion characterizes how uniformly the arguments are being used. == Type-1 OWA aggregation operators == The above Yager's OWA operators are used to aggregate the crisp values. Can we aggregate fuzzy sets in the OWA mechanism? The Type-1 OWA operators have been proposed for this purpose. So the type-1 OWA operators provides us with a new technique for directly aggregating uncertain information with uncertain weights via OWA mechanism in soft decision making and data mining, where these uncertain objects are modelled by fuzzy sets. The type-1 OWA operator is defined according to the alpha-cuts of fuzzy sets as follows: Given the n linguistic weights { W i } i = 1 n {\displaystyle \left\{{W^{i}}\right\}_{i=1}^{n}} in the form of fuzzy sets defined on the domain of discourse U = [ 0 , 1 ] {\displaystyle U=[0,\;\;1]} , then for each α ∈ [ 0 , 1 ] {\displaystyle \alpha \in [0,\;1]} , an α {\displaystyle \alpha } -level type-1 OWA operator with α {\displaystyle \alpha } -level sets { W α i } i = 1 n {\displaystyle \left\{{W_{\alpha }^{i}}\right\}_{i=1}^{n}} to aggregate the α {\displaystyle \alpha } -cuts of fuzzy sets { A i } i = 1 n {\displaystyle \left\{{A^{i}}\right\}_{i=1}^{n}} is given as Φ α ( A α 1 , … , A α n ) = { ∑ i = 1 n w i a σ ( i ) ∑ i = 1 n w i | w i ∈ W α i , a i ∈ A α i , i = 1 , … , n } {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\ldots ,A_{\alpha }^{n}}\right)=\left\{{{\frac {\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}}}{\sum \limits _{i=1}^{n}{w_{i}}}}\left|{w_{i}\in W_{\alpha }^{i},\;a_{i}}\right.\in A_{\alpha }^{i},\;i=1,\ldots ,n}\right\}} where W α i = { w | μ W i ( w ) ≥ α } , A α i = { x | μ A i ( x ) ≥ α } {\displaystyle W_{\alpha }^{i}=\{w|\mu _{W_{i}}(w)\geq \alpha \},A_{\alpha }^{i}=\{x|\mu _{A_{i}}(x)\geq \alpha \}} , and σ : { 1 , … , n } → { 1 , … , n } {\displaystyle \sigma :\{\;1,\ldots ,n\;\}\to \{\;1,\ldots ,n\;\}} is a permutation function such that a σ ( i ) ≥ a σ ( i + 1 ) , ∀ i = 1 , … , n − 1 {\displaystyle a_{\sigma (i)}\geq a_{\sigma (i+1)},\;\forall \;i=1,\ldots ,n-1} , i.e., a σ ( i ) {\displaystyle a_{\sigma (i)}} is the i {\displaystyle i} th largest element in the set { a 1 , … , a n } {\displaystyle \left\{{a_{1},\ldots ,a_{n}}\right\}} . The computation of the type-1 OWA output is implemented by computing the left end-points and right end-points of the intervals Φ α ( A α 1 , … , A α n ) {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\ldots ,A_{\alpha }^{n}}\right)} : Φ α ( A α 1 , … , A α n ) − {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\ldots ,A_{\alpha }^{n}}\right)_{-}} and Φ α ( A α 1 , … , A α n ) + , {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\ldots ,A_{\alpha }^{n}}\right)_{+},} where A α i = [ A α − i , A α + i ] , W α i = [ W α − i , W α + i ] {\displaystyle A_{\alpha }^{i}=[A_{\alpha -}^{i},A_{\alpha +}^{i}],W_{\alpha }^{i}=[W_{\alpha -}^{i},W_{\alpha +}^{i}]} . Then membership function of resulting aggregation fuzzy set is: μ G ( x ) = ∨ α : x ∈ Φ α ( A α 1 , ⋯ , A α n ) α ⁡ α {\displaystyle \mu _{G}(x)=\mathop {\vee } _{\alpha :x\in \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{\alpha }}\alpha } For the left end-points, we need to solve the following programming problem: Φ α ( A α 1 , ⋯ , A α n ) − = min W α − i ≤ w i ≤ W α + i A α − i ≤ a i ≤ A α + i ∑ i = 1 n w i a σ ( i ) / ∑ i = 1 n w i {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{-}=\min \limits _{\begin{array}{l}W_{\alpha -}^{i}\leq w_{i}\leq W_{\alpha +}^{i}A_{\alpha -}^{i}\leq a_{i}\leq A_{\alpha +}^{i}\end{array}}\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}/\sum \limits _{i=1}^{n}{w_{i}}}} while for the right end-points, we need to solve the following programming problem: Φ α ( A α 1 , ⋯ , A α n ) + = max W α − i ≤ w i ≤ W α + i A α − i ≤ a i ≤ A α + i ∑ i = 1 n w i a σ ( i ) / ∑ i = 1 n w i {\displaystyle \Phi _{\alpha }\left({A_{\alpha }^{1},\cdots ,A_{\alpha }^{n}}\right)_{+}=\max \limits _{\begin{array}{l}W_{\alpha -}^{i}\leq w_{i}\leq W_{\alpha +}^{i}A_{\alpha -}^{i}\leq a_{i}\leq A_{\alpha +}^{i}\end{array}}\sum \limits _{i=1}^{n}{w_{i}a_{\sigma (i)}/\sum \limits _{i=1}^{n}{w_{i}}}} Zhou et al. presented a fast method to solve two programming problem so that the type-1 OWA aggregation operation can be performed efficiently. == OWA for committee voting == Amanatidis, Barrot, Lang, Markakis and Ries present voting rules for multi-issue voting, based on OWA and the Hamming distance. Barrot, Lang and Yokoo study the manipulability of these rules.

Wetware (brain)

Wetware is a term drawn from the computer-related idea of hardware or software, but applied to biological life forms. == Usage == The prefix "wet" is a reference to the water found in living creatures. Wetware is used to describe the elements equivalent to hardware and software found in a person, especially the central nervous system (CNS) and the human mind. The term wetware finds use in works of fiction, in scholarly publications and in popularizations. The "hardware" component of wetware concerns the bioelectric and biochemical properties of the CNS, specifically the brain. If the sequence of impulses traveling across the various neurons are thought of symbolically as software, then the physical neurons would be the hardware. The amalgamated interaction of this software and hardware is manifested through continuously changing physical connections, and chemical and electrical influences that spread across the body. The process by which the mind and brain interact to produce the collection of experiences that we define as self-awareness is in question. == History == Although the exact definition has shifted over time, the term Wetware and its fundamental reference to "the physical mind" has been around at least since the mid-1950s. Mostly used in relatively obscure articles and papers, it was not until the heyday of cyberpunk, however, that the term found broad adoption. Among the first uses of the term in popular culture was the Bruce Sterling novel Schismatrix (1985) and the Michael Swanwick novel Vacuum Flowers (1987). Rudy Rucker references the term in a number of books, including one entitled Wetware (1988): ... all sparks and tastes and tangles, all its stimulus/response patterns – the whole bio-cybernetic software of mind. Rucker did not use the word to simply mean a brain, nor in the human-resources sense of employees. He used wetware to stand for the data found in any biological system, analogous perhaps to the firmware that is found in a ROM chip. In Rucker's sense, a seed, a plant graft, an embryo, or a biological virus are all wetware. DNA, the immune system, and the evolved neural architecture of the brain are further examples of wetware in this sense. Rucker describes his conception in a 1992 compendium The Mondo 2000 User's Guide to the New Edge, which he quotes in a 2007 blog entry. Early cyber-guru Arthur Kroker used the term in his blog. With the term getting traction in trendsetting publications, it became a buzzword in the early 1990s. In 1991, Dutch media theorist Geert Lovink organized the Wetware Convention in Amsterdam, which was supposed to be an antidote to the "out-of-body" experiments conducted in high-tech laboratories, such as experiments in virtual reality. Timothy Leary, in an appendix to Info-Psychology originally written in 1975–76 and published in 1989, used the term wetware, writing that "psychedelic neuro-transmitters were the hot new technology for booting-up the 'wetware' of the brain". Another common reference is: "Wetware has 7 plus or minus 2 temporary registers." The numerical allusion is to a classic 1957 article by George A. Miller, The magical number 7 plus or minus two: some limits in our capacity for processing information, which later gave way to Miller's law.

Scalable Coherent Interface

The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations. The IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI) was approved by the IEEE standards board on March 19, 1992. It saw some use during the 1990s, but never became widely used and has been replaced by other systems from the early 2000s. == History == Soon after the Fastbus (IEEE 960) follow-on Futurebus (IEEE 896) project in 1987, some engineers predicted it would already be too slow for the high performance computing marketplace by the time it would be released in the early 1990s. In response, a "Superbus" study group was formed in November 1987. Another working group of the standards association of the Institute of Electrical and Electronics Engineers (IEEE) spun off to form a standard targeted at this market in July 1988. It was essentially a subset of Futurebus features that could be easily implemented at high speed, along with minor additions to make it easier to connect to other systems, such as VMEbus. Most of the developers had their background from high-speed computer buses. Representatives from companies in the computer industry and research community included Amdahl, Apple Computer, BB&N, Hewlett-Packard, CERN, Dolphin Server Technology, Cray Research, Sequent, AT&T, Digital Equipment Corporation, McDonnell Douglas, National Semiconductor, Stanford Linear Accelerator Center, Tektronix, Texas Instruments, Unisys, University of Oslo, University of Wisconsin. The original intent was a single standard for all buses in the computer. The working group soon came up with the idea of using point-to-point communication in the form of insertion rings. This avoided the lumped capacitance, limited physical length/speed of light problems and stub reflections in addition to allowing parallel transactions. The use of insertion rings is credited to Manolis Katevenis who suggested it at one of the early meetings of the working group. The working group for developing the standard was led by David B. Gustavson (chair) and David V. James (Vice Chair). David V. James was a major contributor for writing the specifications including the executable C-code. Stein Gjessing’s group at the University of Oslo used formal methods to verify the coherence protocol and Dolphin Server Technology implemented a node controller chip including the cache coherence logic. Different versions and derivatives of SCI were implemented by companies like Dolphin Interconnect Solutions, Convex, Data General AViiON (using cache controller and link controller chips from Dolphin), Sequent and Cray Research. Dolphin Interconnect Solutions implemented a PCI and PCI-Express connected derivative of SCI that provides non-coherent shared memory access. This implementation was used by Sun Microsystems for its high-end clusters, Thales Group and several others including volume applications for message passing within HPC clustering and medical imaging. SCI was often used to implement non-uniform memory access architectures. It was also used by Sequent Computer Systems as the processor memory bus in their NUMA-Q systems. Numascale developed a derivative to connect with coherent HyperTransport. == The standard == The standard defined two interface levels: The physical level that deals with electrical signals, connectors, mechanical and thermal conditions The logical level that describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives, control and status registers, and initialization and error recovery facilities. This structure allowed new developments in physical interface technology to be easily adapted without any redesign on the logical level. Scalability for large systems is achieved through a distributed directory-based cache coherence model. (The other popular models for cache coherency are based on system-wide eavesdropping (snooping) of memory transactions – a scheme which is not very scalable.) In SCI each node contains a directory with a pointer to the next node in a linked list that shares a particular cache line. SCI defines a 64-bit flat address space (16 exabytes) where 16 bits are used for identifying a node (65,536 nodes) and 48 bits for address within the node (256 terabytes). A node can contain many processors and/or memory. The SCI standard defines a packet switched network. === Topologies === SCI can be used to build systems with different types of switching topologies from centralized to fully distributed switching: With a central switch, each node is connected to the switch with a ringlet (in this case a two-node ring). In distributed switching systems, each node can be connected to a ring of arbitrary length and either all or some of the nodes can be connected to two or more rings. The most common way to describe these multi-dimensional topologies is k-ary n-cubes (or tori). The SCI standard specification mentions several such topologies as examples. The 2-D torus is a combination of rings in two dimensions. Switching between the two dimensions requires a small switching capability in the node. This can be expanded to three or more dimensions. The concept of folding rings can also be applied to the Torus topologies to avoid any long connection segments. === Transactions === SCI sends information in packets. Each packet consists of an unbroken sequence of 16-bit symbols. The symbol is accompanied by a flag bit. A transition of the flag bit from 0 to 1 indicates the start of a packet. A transition from 1 to 0 occurs 1 (for echoes) or 4 symbols before the packet end. A packet contains a header with address command and status information, payload (from 0 through optional lengths of data) and a CRC check symbol. The first symbol in the packet header contains the destination node address. If the address is not within the domain handled by the receiving node, the packet is passed to the output through the bypass FIFO. In the other case, the packet is fed to a receive queue and may be transferred to a ring in another dimension. All packets are marked when they pass the scrubber (a node is established as scrubber when the ring is initialized). Packets without a valid destination address will be removed when passing the scrubber for the second time to avoid filling the ring with packets that would otherwise circulate indefinitely. === Cache coherence === Cache coherence ensures data consistency in multiprocessor systems. The simplest form applied in earlier systems was based on clearing the cache contents between context switches and disabling the cache for data that were shared between two or more processors. These methods were feasible when the performance difference between the cache and memory were less than one order of magnitude. Modern processors with caches that are more than two orders of magnitude faster than main memory would not perform anywhere near optimal without more sophisticated methods for data consistency. Bus based systems use eavesdropping (snooping) methods since buses are inherently broadcast. Modern systems with point-to point links use broadcast methods with snoop filter options to improve performance. Since broadcast and eavesdropping are inherently non-scalable, these are not used in SCI. Instead, SCI uses a distributed directory-based cache coherence protocol with a linked list of nodes containing processors that share a particular cache line. Each node holds a directory for the main memory of the node with a tag for each line of memory (same line length as the cache line). The memory tag holds a pointer to the head of the linked list and a state code for the line (three states – home, fresh, gone). Associated with each node is also a cache for holding remote data with a directory containing forward and backward pointers to nodes in the linked list sharing the cache line. The tag for the cache has seven states (invalid, only fresh, head fresh, only dirty, head dirty, mid valid, tail valid). The distributed directory is scalable. The overhead for the directory based cache coherence is a constant percentage of the node’s memory and cache. This percentage is in the order of 4% for the memory and 7% for the cache. == Legacy == SCI is a standard for connecting the different resources within a multiprocessor computer system, and it is not as widely known to the public as for example the Ethernet family for connecting different systems. Different system vendors implemented different variants of SCI for their internal system infrastructure. These different implementations interface to very intricate mechanisms in processors and memory systems and each vendor has to preserve some degrees of

Chunked transfer encoding

Chunked transfer encoding is a streaming data transfer mechanism available in Hypertext Transfer Protocol (HTTP) version 1.1, defined in RFC 9112 §7.1. In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another. At any given time, no knowledge of the data stream outside the currently-being-processed chunk is necessary for either the sender or the receiver. Each chunk is preceded by its size in bytes and transmission ends when a zero-length chunk is received. The chunked keyword in the Transfer-Encoding header is used to indicate chunked transfer. Chunked transfer encoding is not supported in HTTP/2, which provides its own mechanisms for data streaming. == Rationale == The introduction of chunked encoding provided various benefits: Chunked transfer encoding allows a server to maintain an HTTP persistent connection for dynamically generated content. In this case, the HTTP Content-Length header cannot be used to delimit the content and the next HTTP request/response, as the content size is not yet known. Chunked encoding has the benefit that it is not necessary to generate the full content before writing the header, as it allows streaming of content as chunks and explicitly signaling the end of the content, making the connection available for the next HTTP request/response. Chunked encoding allows the sender to send additional header fields after the message body. This is important in cases where values of a field cannot be known until the content has been produced, such as when the content of the message must be digitally signed. Without chunked encoding, the sender would have to buffer the content until it was complete in order to calculate a field value and send it before the content. == Applicability == For version 1.1 of the HTTP protocol, the chunked transfer mechanism is considered to be always and anyway acceptable, even if not listed in the Transfer-Encoding (TE) request header field, and when used with other transfer mechanisms, should always be applied last to the transferred data and never more than one time. This transfer encoding method also allows additional entity header fields to be sent after the last chunk if the client specified the "trailers" parameter as an argument of the TE request field. The origin server of the response can also decide to send additional entity trailers even if the client did not specify the "trailers" parameter, but only if the metadata is optional (i.e. the client can use the received entity without them). Whenever the trailers are used, the server should list their names in the Trailer header field; three header field types are specifically prohibited from appearing as a trailer field: Content-Length, Trailer, and Transfer-Encoding. == Format == If a Transfer-Encoding field with a value of "chunked" is specified in an HTTP message (either a request sent by a client or the response from the server), the body of the message consists of one or more chunks and one terminating chunk with an optional trailer before the final ␍␊ sequence (i.e. carriage return followed by line feed). Each chunk starts with the number of octets of the data it embeds expressed as a hexadecimal number in ASCII followed by optional parameters (chunk extension) and a terminating ␍␊ sequence, followed by the chunk data. The chunk is terminated by ␍␊. If chunk extensions are provided, the chunk size is terminated by a semicolon and followed by the parameters, each also delimited by semicolons. Each parameter is encoded as an extension name followed by an optional equal sign and value. These parameters could be used for a running message digest or digital signature, or to indicate an estimated transfer progress, for instance. The terminating chunk is a special chunk of zero length. It may contain a trailer, which consists of a (possibly empty) sequence of entity header fields. Normally, such header fields would be sent in the message's header; however, it may be more efficient to determine them after processing the entire message entity. In that case, it is useful to send those headers in the trailer. Header fields that regulate the use of trailers are Transfer-Encoding with the "trailers" parameter (used in requests) and Trailer (used in responses). == Use with compression == HTTP servers often use compression to optimize transmission, for example with Content-Encoding: gzip or Content-Encoding: deflate. If both compression and chunked encoding are enabled, then the content stream is first compressed, then chunked; so the chunk encoding itself is not compressed, and the data in each chunk is compressed holistically (i.e. based on the whole content). The remote endpoint then decodes the stream by concatenating the chunks and uncompressing the result. == Example == === Encoded data === The following example contains three chunks of size 4, 7, and 11 (hexadecimal "B") octets of data. 4␍␊Wiki␍␊7␍␊pedia i␍␊B␍␊n ␍␊chunks.␍␊0␍␊␍␊ Below is an annotated version of the encoded data. 4␍␊ (chunk size is four octets) Wiki (four octets of data) ␍␊ (end of chunk) 7␍␊ (chunk size is seven octets) pedia i (seven octets of data) ␍␊ (end of chunk) B␍␊ (chunk size is eleven octets) n ␍␊chunks. (eleven octets of data) ␍␊ (end of chunk) 0␍␊ (chunk size is zero octets, no more chunks) ␍␊ (end of final chunk with zero data octets) Note: Each chunk's size excludes the two ␍␊ bytes that terminate the data of each chunk. === Decoded data === Decoding the above example produces the following octets: Wikipedia in ␍␊chunks. The bytes above are typically displayed as Wikipedia in chunks.

Software token

A software token (a.k.a. soft token) is a piece of a two-factor authentication security device that may be used to authorize the use of computer services. Software tokens are stored on a general-purpose electronic device such as a desktop computer, laptop, PDA, or mobile phone and can be duplicated. (Contrast hardware tokens, where the credentials are stored on a dedicated hardware device and therefore cannot be duplicated — absent physical invasion of the device) Because software tokens are something one does not physically possess, they are exposed to unique threats based on duplication of the underlying cryptographic material - for example, computer viruses and software attacks. Both hardware and software tokens are vulnerable to bot-based man-in-the-middle attacks, or to simple phishing attacks in which the one-time password provided by the token is solicited, and then supplied to the genuine website in a timely manner. Software tokens do have benefits: there is no physical token to carry, they do not contain batteries that will run out, and they are cheaper than hardware tokens. == Security architecture == There are two primary architectures for software tokens: shared secret and public-key cryptography. For a shared secret, an administrator will typically generate a configuration file for each end-user. The file will contain a username, a personal identification number, and the secret. This configuration file is given to the user. The shared secret architecture is potentially vulnerable in a number of areas. The configuration file can be compromised if it is stolen and the token is copied. With time-based software tokens, it is possible to borrow an individual's PDA or laptop, set the clock forward, and generate codes that will be valid in the future. Any software token that uses shared secrets and stores the PIN alongside the shared secret in a software client can be stolen and subjected to offline attacks. Shared secret tokens can be difficult to distribute, since each token is essentially a different piece of software. Each user must receive a copy of the secret, which can create time constraints. Some newer software tokens rely on public-key cryptography, or asymmetric cryptography. This architecture eliminates some of the traditional weaknesses of software tokens, but does not affect their primary weakness (ability to duplicate). A PIN can be stored on a remote authentication server instead of with the token client, making a stolen software token no good unless the PIN is known as well. However, in the case of a virus infection, the cryptographic material can be duplicated and then the PIN can be captured (via keylogging or similar) the next time the user authenticates. If there are attempts made to guess the PIN, it can be detected and logged on the authentication server, which can disable the token. Using asymmetric cryptography also simplifies implementation, since the token client can generate its own key pair and exchange public keys with the server.

Arabic Ontology

Arabic Ontology is a website offering linguistic ontology services for the Arabic language which can be used like the online site WordNet. Users can use Arabic Ontology to classify or clarify the concepts and meanings of Arabic terms. == Ontology Structure == The ontology structure (i.e., data model) is similar to WordNet's structure. Each concept in the database is given a unique concept identifier (URI), informally described by a gloss, and lexicalized by one or more synonymous lemma terms. Each term-concept pair is called a sense, and is given a SenseID. A set of senses is called synset. Concepts and senses are described by further attributes such as era and area — to specify example usage and ontological analysis. Semantic relations are defined between concepts. Some important entities are included in the ontology, such as individual countries and bodies of water. These individuals are given separate IndividualIDs and linked with their concepts through the InstanceOf relation. == Mappings to other resources == Concepts in the Arabic Ontology are mapped to synsets in WordNet, as well as to BFO and DOLCE. Terms used in the Arabic Ontology are mapped to lemmas in the LDC's SAMA database. == Applications == Arabic Ontology can be used in many application domains, such as: Information retrieval, to enrich queries (e.g., in search engines) and improve the quality of the results, i.e. meaningful search rather than string-matching search; Machine translation and word-sense disambiguation, by finding the exact mapping of concepts across languages, especially that the Arabic ontology is also mapped to the WordNet; Data Integration and interoperability in which the Arabic ontology can be used as a semantic reference to link databases and information systems; Semantic Web and Web 3.0, by using the Arabic ontology as a semantic reference to disambiguate the meanings used in websites; among many other applications. == URLs Design == The URLs in the Arabic Ontology are designed according to the W3C's Best Practices for Publishing Linked Data, as described in the following URL schemes. This allows one to also explore the whole database like exploring a graph: Ontology Concept: Each concept in the Arabic Ontology has a ConceptID and can be accessed using: https://{domain}/concept/{ConceptID | Term}. In case of a term, the set of concepts that this term lexicalizes are all retrieved. In case of a ConceptID, the concept and its direct subtypes are retrieved, e.g. https://ontology.birzeit.edu/concept/293198 Semantic relations: Relationships between concepts can be accessed using these schemes: (i) the URL: https:// {domain}/concept/{RelationName}/{ConceptID} allows retrieval of relationships among ontology concepts. (ii) the URL: https://{domain}/lexicalconcept/{RelationName}/{lexicalConceptID} allows retrieval of relations between lexical concepts. For example, https://ontology.birzeit.edu/concept/instances/293121 retrieves the instances of the concept 293121. The relations that are currently used in our database are: {subtypes, type, instances, parts, related, similar, equivalent}.

Data integration

Data integration is the process of combining, sharing, or synchronizing data from multiple sources to provide users with a unified view. There are a wide range of possible applications for data integration, from commercial (such as when a business merges multiple databases) to scientific (combining research data from different bioinformatics repositories). The decision to integrate data tends to arise when the volume, complexity (that is, big data) and need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from a heterogeneous database system and transformed to a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining when analyzing and extracting information from existing databases that can be useful for Business information. == History == Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for interoperability of heterogeneous databases. The first data integration system driven by structured metadata was designed in 1991 at the University of Minnesota for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms, and loads data from heterogeneous sources into a unique view schema so data from different sources become compatible. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries. The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications. A trend began in 2009 favoring the loose coupling of data and providing a unified query-interface to access real time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schema of original sources, and translating a query into decomposed queries to match the schema of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the "Global-as-View" (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the "Local-as-View" (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema. As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas like "earnings" inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires bench-marking of the similarities, computed from different data sources, on a single criterion such as positive predictive value. This enables the data sources to be directly comparable and can be integrated even when the natures of experiments are distinct. As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology that results in the development of disparate data models. Disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will now share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationships that relate the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated. Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level of Data Hubs. (See all three search terms popularity on Google Trends.) These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the Hub. In recent times, as the number of applications being used have increased many fold and application to application integration have become critical and this has given rise to [Unified APIs] that help application developers integrate their apps with other apps and more recently with [MCP - Model Context Protocol] taking it a step further for AI Agents. Data integration plays a big role in business regarding data collection used for studying the market. Converting the raw data retrieved from consumers into coherent data is something businesses try to do when considering what steps they should take next. Organizations are more frequently using data mining for collecting information and patterns from their databases, and this process helps them develop new business strategies to increase business performance and perform economic analyses more efficiently. Compiling the large amount of data they collect to be stored in their system is a form of data integration adapted for Business intelligence to improve their chances of success. == Example == Consider a web application where a user can query a variety of information about cities (such as crime statistics, weather, hotels, demographics, etc.). Traditionally, the information must be stored in a single database with a single schema. But any single enterprise would find information of this breadth somewhat difficult and expensive to collect. Even if the resources exist to gather the data, it would likely duplicate data in existing crime databases, weather websites, and census data. A data-integration solution may address this problem by considering these external resources as materialized views over a virtual mediated schema, resulting in "virtual data integration". This means application-developers construct a virtual schema—the mediated schema—to best model the kinds of answers their users want. Next, they design "wrappers" or adapters for each data source, such as the crime database and weather website. These adapters simply transform the local query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution (see figure 2). When an application-user queries the mediated schema, the data-integration solution transforms this query into appropriate queries over the respective data sources. Finally, the virtual database combines the results of these queries into the answer to the user's query. This solution offers the convenience of adding new sources by simply constructing an adapter or an application software blade for them. It contrasts with ETL systems or with a si