AI Chatbot Emochi

AI Chatbot Emochi — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Data event

    Data event

    A data event is a relevant state transition defined in an event schema. Typically, event schemata are described by pre- and post condition for a single or a set of data items. In contrast to ECA (Event condition action), which considers an event to be a signal, the data event not only refers to the change (signal), but describes specific state transitions, which are referred to in ECA as conditions. Considering data events as relevant data item state transitions allows defining complex event-reaction schemata for a database. Defining data event schemata for relational databases is limited to attribute and instance events. Object-oriented databases also support collection properties, which allows defining changes in collections as data events, too.

    Read more →
  • Data storage

    Data storage

    Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are considered by some as data storage. Recording may be accomplished with virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data. Data stored in a digital, machine-readable medium is called digital data. Computer data storage is one of the core functions of a general-purpose computer. Electronic documents can be stored in much less space than paper documents. Barcodes and magnetic ink character recognition (MICR) are two ways of recording machine-readable data on paper. == Recording media == A recording medium is physical material that holds information. Newly created information is distributed and can be stored in four storage media–print, film, magnetic, and optical–and seen or heard in four information flows–telephone, radio, TV, and the Internet as well as being observed directly. Digital information is stored on electronic media in many different recording formats. With electronic media, the data and the recording media are sometimes referred to as "software" despite the more common use of the word to describe computer software. With (traditional art) static media, art materials such as crayons may be considered both equipment and medium as the wax, charcoal or chalk material from the equipment becomes part of the surface of the medium. Some recording media may be temporary, either by design or by nature. Volatile organic compounds may be used to purposely make data expire over time or to reduce environmental impact. Data such as smoke signals or skywriting are temporary by nature. Depending on the volatility, a gas (e.g., atmosphere, smoke) or a liquid surface such as a lake would be considered a temporary recording medium, if it could be considered a recording medium at all. == Global capacity, digitization, and trends == A 2003 UC Berkeley report estimated that about five exabytes of new information were produced in 2002 and that 92% of this data was stored on magnetic media (primarily hard disk drives). This was about twice the data produced in 1999. The amount of data transmitted over telecommunications systems in 2002 was nearly 18 exabytes—three and a half times more than was recorded on non-volatile storage. Telephone calls constituted 98% of the telecommunicated information in 2002. The researchers' highest estimate for the growth rate of newly stored information (uncompressed) was more than 30% per year. In a more limited study, the International Data Corporation estimated that the total amount of digital data in 2007 was 281 exabytes and that the total amount of digital data produced exceeded the global storage capacity for the first time. A 2011 article in Science estimated that the year 2002 was the beginning of the digital age for information storage: an age in which more information is stored on digital storage devices than on analog storage devices. In 1986, approximately 1% of the world's capacity to store information was in digital format; this grew to 3% by 1993, to 25% by 2000, and to 94% by 2007. These figures correspond to less than three compressed exabytes in 1986, and 295 compressed exabytes in 2007. The quantity of digital storage doubled roughly every three to four years. It is estimated that around 120 zettabytes of data will be generated in 2023, an increase of 60x from 2010, and that it will increase to 181 zettabytes generated in 2025. == Mass storage ==

    Read more →
  • G.9970

    G.9970

    G.9970 (also known as G.hnta) is a Recommendation developed by ITU-T that describes the generic transport architecture for home networks and their interfaces to a provider's access network. G.9970 was developed by Study Group 15, Question 1. G.9970 received Consent on December 12, 2008 and was Approved on January 13, 2009. == Relationship with G.hn == G.9970 (G.hnta) and G.9960 (G.hn) are two ITU-T Recommendations that address home networking in a complementary manner. While G.9970 addresses layer 3 (network layer) of the home network architecture, G.9960 addresses layers 1 (physical layer) and 2 (data link layer).

    Read more →
  • Voice inversion

    Voice inversion

    Voice inversion scrambling is an analog method of obscuring the content of a transmission. It is sometimes used in public service radio, automobile racing, cordless telephones and the Family Radio Service. Without a descrambler, the transmission makes the speaker "sound like Donald Duck". Despite the term, the technique operates on the passband of the information and so can be applied to any information being transmitted. == Forms and details == There are various forms of voice inversion which offer differing levels of security. Overall, voice inversion scrambling offers little true security as software and even hobbyist kits are available from kit makers for scrambling and descrambling. The cadence of the speech is not changed. It is often easy to guess what is happening in the conversation by listening for other audio cues like questions, short responses and other language cadences. In the simplest form of voice inversion, the frequency p {\displaystyle p} of each component is replaced with s − p {\displaystyle s-p} , where s {\displaystyle s} is the frequency of a carrier wave. This can be done by amplitude modulating the speech signal with the carrier, then applying a low-pass filter to select the lower sideband. This will make the low tones of the voice sound like high ones and vice versa. This process also occurs naturally if a radio receiver is tuned to a single sideband transmission but set to decode the wrong sideband. There are more advanced forms of voice inversion which are more complex and require more effort to descramble. One method is to use a random code to choose the carrier frequency and then change this code in real time. This is called Rolling Code voice inversion and one can often hear the "ticks" in the transmission which signal the changing of the inversion point. Another method is split band voice inversion. This is where the band is split and then each band is inverted separately. A rolling code can also be added to this method for variable split band inversion (VSB). Common carrier frequencies are: 2.632 kHz, 2.718 kHz, 2.868 kHz, 3.023 kHz, 3.107 kHz, 3.196 kHz, 3.333 kHz, 3.339 kHz, 3.496 kHz, 3.729 kHz and 4.096 kHz. Voice inversion offers no security at all and software is available to restore the original voice, which is why it is no longer used to protect conversations today. However, voice inversion is still found in low-end Chinese walkie talkies.

    Read more →
  • Esdat

    Esdat

    ESdat is a data management, analysis and reporting software for environmental and groundwater data, developed by EarthScience Information Systems (EScIS). It is used to manage many types of environmental data including laboratory chemistry (analytical results, QA data, lab sample planning, and electronic Chain of Custody), field chemistry (water, gas, and soil), hydrogeological data (groundwater, borehole and well construction, lithological, geotechnical and stratigraphic, and LNAPL), meteorological data (rain, wind, and temperature), emission data (dust deposition, HiVol, air quality, and noise) and logger data. Data can be compared against environmental standards or site-specific trigger levels to generate exceedence tables, time series graphs, maps, statistics, and other outputs. ESdat integrates with Power BI and ArcGIS and data can also be exported in a range of other database formats, including USEPA Regions 2,4 & 5, and NYS DEC. ESdat is used by environmental consultants, government, mining and industry for validation, interrogation, and reporting of data derived from complex environmental programs, such as contaminated sites, groundwater investigations, and regulatory compliance for landfills or mining operations.

    Read more →
  • Malleability (cryptography)

    Malleability (cryptography)

    Malleability is a property of some cryptographic algorithms. An encryption algorithm is said to be malleable if it is possible to transform a ciphertext into another ciphertext which decrypts to a related plaintext. That is, given an encryption of a plaintext m {\displaystyle m} , it is possible to generate another ciphertext which decrypts to f ( m ) {\displaystyle f(m)} , for a known function f {\displaystyle f} , without necessarily knowing or learning m {\displaystyle m} . Malleability is often an undesirable property in a general-purpose cryptosystem, since it allows an attacker to modify the contents of a message. For example, suppose that a bank uses a stream cipher to hide its financial information, and a user sends an encrypted message containing, say, "TRANSFER $0000100.00 TO ACCOUNT #199." If an attacker can modify the message on the wire, and can guess the format of the unencrypted message, the attacker could change the amount of the transaction, or the recipient of the funds, e.g. "TRANSFER $0100000.00 TO ACCOUNT #227". Malleability does not refer to the attacker's ability to read the encrypted message. Both before and after tampering, the attacker cannot read the encrypted message. On the other hand, some cryptosystems are malleable by design. In other words, in some circumstances it may be viewed as a feature that anyone can transform an encryption of m {\displaystyle m} into a valid encryption of f ( m ) {\displaystyle f(m)} (for some restricted class of functions f {\displaystyle f} ) without necessarily learning m {\displaystyle m} . Such schemes are known as homomorphic encryption schemes. A cryptosystem may be semantically secure against chosen-plaintext attacks or even non-adaptive chosen-ciphertext attacks (CCA1) while still being malleable. However, security against adaptive chosen-ciphertext attacks (CCA2) is equivalent to non-malleability. == Example malleable cryptosystems == In a stream cipher, the ciphertext is produced by taking the exclusive or of the plaintext and a pseudorandom stream based on a secret key k {\displaystyle k} , as E ( m ) = m ⊕ S ( k ) {\displaystyle E(m)=m\oplus S(k)} . An adversary can construct an encryption of m ⊕ t {\displaystyle m\oplus t} for any t {\displaystyle t} , as E ( m ) ⊕ t = m ⊕ t ⊕ S ( k ) = E ( m ⊕ t ) {\displaystyle E(m)\oplus t=m\oplus t\oplus S(k)=E(m\oplus t)} . In the RSA cryptosystem, a plaintext m {\displaystyle m} is encrypted as E ( m ) = m e mod n {\displaystyle E(m)=m^{e}{\bmod {n}}} , where ( e , n ) {\displaystyle (e,n)} is the public key. Given such a ciphertext, an adversary can construct an encryption of m t {\displaystyle mt} for any t {\displaystyle t} , as E ( m ) ⋅ t e mod n = ( m t ) e mod n = E ( m t ) {\textstyle E(m)\cdot t^{e}{\bmod {n}}=(mt)^{e}{\bmod {n}}=E(mt)} . For this reason, RSA is commonly used together with padding methods such as OAEP or PKCS1. In the ElGamal cryptosystem, a plaintext m {\displaystyle m} is encrypted as E ( m ) = ( g b , m A b ) {\displaystyle E(m)=(g^{b},mA^{b})} , where ( g , A ) {\displaystyle (g,A)} is the public key. Given such a ciphertext ( c 1 , c 2 ) {\displaystyle (c_{1},c_{2})} , an adversary can compute ( c 1 , t ⋅ c 2 ) {\displaystyle (c_{1},t\cdot c_{2})} , which is a valid encryption of t m {\displaystyle tm} , for any t {\displaystyle t} . In contrast, the Cramer-Shoup system (which is based on ElGamal) is not malleable. In the Paillier, ElGamal, and RSA cryptosystems, it is also possible to combine several ciphertexts together in a useful way to produce a related ciphertext. In Paillier, given only the public key and an encryption of m 1 {\displaystyle m_{1}} and m 2 {\displaystyle m_{2}} , one can compute a valid encryption of their sum m 1 + m 2 {\displaystyle m_{1}+m_{2}} . In ElGamal and in RSA, one can combine encryptions of m 1 {\displaystyle m_{1}} and m 2 {\displaystyle m_{2}} to obtain a valid encryption of their product m 1 m 2 {\displaystyle m_{1}m_{2}} . Block ciphers in the cipher block chaining mode of operation, for example, are partly malleable: flipping a bit in a ciphertext block will completely mangle the plaintext it decrypts to, but will result in the same bit being flipped in the plaintext of the next block. This allows an attacker to 'sacrifice' one block of plaintext in order to change some data in the next one, possibly managing to maliciously alter the message. This is essentially the core idea of the padding oracle attack on CBC, which allows the attacker to decrypt almost an entire ciphertext without knowing the key. For this and many other reasons, a message authentication code is required to guard against any method of tampering. == Complete non-malleability == Fischlin, in 2005, defined the notion of complete non-malleability as the ability of the system to remain non-malleable while giving the adversary additional power to choose a new public key which could be a function of the original public key. In other words, the adversary shouldn't be able to come up with a ciphertext whose underlying plaintext is related to the original message through a relation that also takes public keys into account.

    Read more →
  • Virtual collective consciousness

    Virtual collective consciousness

    Virtual collective consciousness (VCC) is a term rebooted and promoted by two behavioral scientists, Yousri Marzouki and Olivier Oullier in their 2012 Huffington Post article titled: "Revolutionizing Revolutions: Virtual Collective Consciousness and the Arab Spring", after its first appearance in 1999-2000. VCC is now defined as an internal knowledge catalyzed by social media platforms and shared by a plurality of individuals driven by the spontaneity, the homogeneity, and the synchronicity of their online actions. VCC occurs when a large group of persons, brought together by a social media platform think and act with one mind and share collective emotions. Thus, they are able to coordinate their efforts efficiently, and could rapidly spread their word to a worldwide audience. When interviewed about the concept of VCC that appeared in the book - Hyperconnectivity and the Future of Internet Communication - he edited, Professor of Pervasive Computing, Adrian David Cheok mentioned the following: "The idea of a global (collective) virtual consciousness is a bottom-up process and a rather emergent property resulting from a momentum of complex interactions taking place in social networks. This kind of collective behaviour (or intelligence) results from a collision between a physical world and a virtual world and can have a real impact in our life by driving collective action." == Etymology == In 1999-2000, Richard Glen Boire provided a cursory mention and the only occurrence of the term "Virtual collective consciousness" in his text as follows: The trend of technology is to overcome the limitations of the human body. And, the Web has been characterized as a virtual collective consciousness and unconsciousness The recent definition of VCC evolved from the first empirical study that provided a cyberpsychological insight into the contribution of Facebook to the 2011 Tunisian revolution. In this study, the concept was originally called "collective cyberconsciousness". The latter is an extension of the idea of "collective consciousness" coupled with "citizen media" usage. The authors of this study also made a parallel between this original definition of VCC and other comparable concepts such as Durkheim's collective representation, Žižek's "collective mind" or Boguta's "new collective consciousness" that he used to describe the computational history of the Internet shutdown during the Egyptian revolution. Since VCC is the byproduct of the network's successful actions, then these actions must be timely, acute, rapid, domain-specific, and purpose-oriented to successfully achieve their goal. Before reaching a momentum of complexity, each collective behavior starts by a spark that triggers a chain of events leading to a crystallized stance of a tremendous amount of interactions. Thus, VCC is an emergent global pattern from these individual actions. In 2012, the term virtual collective consciousness resurfaced and was brought to light after extending its applications to the Egyptian case and the whole social networking major impact on the success of the so-called Arab Spring. Moreover, the acronym VCC was suggested to identify the theoretical framework covering on-line behaviors leading to a virtual collective consciousness. Hence, online social networks have provided a new and faster way of establishing or modifying "collective consciousness" that was paramount to the 2011 uprisings in the Arab world. == Theoretical underpinnings of VCC == Various theoretical references in fields ranging from sociology to computer science were mentioned in order to account for the key features that render the framework for a virtual collective consciousness. The following list is not exhaustive, but the references it contains are often highlighted: Émile Durkheim's collective representations are at the heart of VCC since collectivity taken decisions according to Durkheim's assumptions will approve or disapprove individuals' actions and help them eventually reach their final goal. Marshall McLuhan's global village: The shrinking of our big world to a small place called cyberspace is made possible by technological extensions of human consciousness. Carl Jung's collective unconscious: When a society witnesses significant changes, the anchoring of archetypal images (e.g., political leaders) seems to be deeply rooted in individuals' collective unconscious that is likely to bias their political choices. Individual memories of public events were also supposed to convey a "collective awareness" that can be subconsciously altered by the instantaneous spread of information through social networking around the world. Daniel Wegner's transactive memory (TM): social-networking platforms such as Facebook during the Tunisian revolution or Twitter during the Egyptian revolution served as placeholders of a VCC where information can be harnessed and steered to the highly specific revolutionary purpose. Although research on TM was originally limited to couples, small groups, and organizations, recent studies strongly suggest that an effective TM can operate on a very large scale too. James Surowiecki's wisdom of crowds Collective influence algorithm: The CI (Collective influence) algorithm is effective in finding influential nodes in a variety of networks, including social networks, communication networks, and biological networks. It has been used to identify influencers on social-media platforms, to identify key nodes in transportation networks, and to identify potential drug-targets in biological networks. == Some illustrations of VCC == Besides the studied effect of social networking on the Tunisian and Egyptian revolutions, the former via Facebook and the latter via Twitter other applications were studied under the prism of VCC framework: The Whitacre's virtual choir: A compelling example of the degree of autonomy and self-identity members of a spontaneously created network through a VCC is Eric Whitacre's unique musical project that involved a collection of singers performing remotely to create a virtual Choir. The effect of all the voices illustrated a genuine virtual collective empathy merging the artist's mind with all the singers through his silent conducting gestures. The Harlem Shake dance: The Bitcoin protocol: It was questioned whether or not the Bitcoin protocol can morph into virtual collective consciousness. The Byzantine generals problem was used as an analogy to understand the behavioral complexity of the community of Bitcoin's users. Artificial Social Networking Intelligence (ASNI): refers to the application of artificial intelligence within social networking services and social media platforms. It encompasses various technologies and techniques used to automate, personalize, enhance, improve, and synchronize users' interactions and experiences within social networks. ASNI is expected to evolve rapidly, influencing how we interact online and shaping our digital experiences. Transparency, ethical considerations, media influence bias, and user control over data will be crucial to ensure responsible development and positive impact.

    Read more →
  • Correlation immunity

    Correlation immunity

    In mathematics, the correlation immunity of a Boolean function is a measure of the degree to which its outputs are uncorrelated with some subset of its inputs. Specifically, a Boolean function is said to be correlation-immune of order m if every subset of m or fewer variables in x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\ldots ,x_{n}} is statistically independent of the value of f ( x 1 , x 2 , … , x n ) {\displaystyle f(x_{1},x_{2},\ldots ,x_{n})} . == Definition == A function f : F 2 n → F 2 {\displaystyle f:\mathbb {F} _{2}^{n}\rightarrow \mathbb {F} _{2}} is k {\displaystyle k} -th order correlation immune if for any independent n {\displaystyle n} binary random variables X 0 … X n − 1 {\displaystyle X_{0}\ldots X_{n-1}} , the random variable Z = f ( X 0 , … , X n − 1 ) {\displaystyle Z=f(X_{0},\ldots ,X_{n-1})} is independent from any random vector ( X i 1 … X i k ) {\displaystyle (X_{i_{1}}\ldots X_{i_{k}})} with 0 ≤ i 1 < … < i k < n {\displaystyle 0\leq i_{1}<\ldots Read more →

  • Document mosaicing

    Document mosaicing

    Document mosaicing is a process that stitches multiple, overlapping snapshot images of a document together to produce one large, high resolution composite. The document is slid under a stationary, over-the-desk camera by hand until all parts of the document are snapshotted by the camera's field of view. As the document slid under the camera, all motion of the document is coarsely tracked by the vision system. The document is periodically snapshotted such that the successive snapshots are overlap by about 50%. The system then finds the overlapped pairs and stitches them together repeatedly until all pairs are stitched together as one piece of document. The document mosaicing can be divided into four main processes. Tracking Feature detecting Correspondences establishing Images mosaicing. == Tracking (simple correlation process) == In this process, the motion of the document slid under the camera is coarsely tracked by the system. Tracking is performed by a process called simple correlation process. In the first frame of snapshots, a small patch is extracted from the center of the image as a correlation template. The correlation process is performed in the four times size of the patch area of the next frame. The motion of the paper is indicated by the peak in the correlation function. The peak in the correlation function indicates the motion of the paper. The template is resampled from this frame and the tracking continues until the template reaches the edge of the document. After the template reaches the edge of the document, another snapshot is taken and the tracking process performs repeatedly until the whole document is imaged. The snapshots are stored in an ordered list to facilitate pairing the overlapped images in later processes. == Feature detecting for efficient matching == Feature detection is the process of finding the transformation that aligns one image with another. There are two main approaches for feature detection. Feature-based approach : Motion parameters are estimated from point correspondences. This approach is suitable for the case that there is plenty supply of stable and detectable features. Featureless approach : When the motion between the two images is small, the motion parameters are estimated using optical flow. On the other hand, when the motion between the two images is large, the motion parameters are estimated using generalised cross-correlation. However, this approach requires a computationally expensive resources. Each image is segmented into a hierarchy of columns, lines, and words to match the organised sets of features across images. Skew angle estimation and columns, lines and words finding are the examples of feature detection operations. === Skew angle estimation === Firstly, the angle that the rows of text make with the image raster lines (skew angle) is estimated. It is assumed to lie in the range of ±20°. A small patch of text in the image is selected randomly and then rotated in the range of ±20° until the variance of the pixel intensities of the patch summed along the raster lines is maximised. To ensure that the found skew angle is accurate, the document mosaic system performs calculation at many image patches and derive the final estimation by finding the average of the individual angles weighted by the variance of the pixel intensities of each patch. === Columns, lines and words finding === In this operation, the de-skewed document is intuitively segmented into a hierarchy of columns, lines and words. The sensitivity to illumination and page coloration of the de-skewed document can be removed by applying a Sobel operator to the de-skewed image and thresholding the output to obtain the binary gradient, de-skewed image. The operation can be roughly separated into 3 steps: column segmentation, line segmentation and word segmentation. Columns are easily segmented from the binary gradient, de-skewed images by summing pixels vertically. Baselines of each row are segmented in the same way as the column segmentation process but horizontally. Finally, individual words are segmented by applying the vertical process at each segmented row. These segmentations are important because the document mosaic is created by matching the lower right corners of words in overlapping images pair. Moreover, the segmentation operation can organize the list of images in the context of a hierarchy of rows and column reliably. The segmentation operation involves a considerable amount of summing in the binary gradient, de-skewed images, which done by construct a matrix of partial sums whose elements are given by p i y = ∑ u = 1 i ∑ v = 1 j b u v {\displaystyle p_{iy}=\sum _{u=1}^{i}\sum _{v=1}^{j}b_{uv}} The matrix of partial sums is calculated in one pass through the binary gradient, de-skewed image. ∑ u = u 1 u 2 ∑ v = v 1 v 2 b u v = p u 2 v 2 + p u 1 v 1 − p u 1 v 2 − p u 2 v 1 {\displaystyle \sum _{u=u_{1}}^{u_{2}}\sum _{v=v_{1}}^{v_{2}}b_{uv}=p_{u_{2}v_{2}}+p_{u_{1}v_{1}}-p_{u_{1}v_{2}}-p_{u_{2}v_{1}}} == Correspondences establishing == The two images are now organized in hierarchy of linked lists in following structure : image=list of columns row=list of words column=list of row word=length (in pixels) At the bottom of the structure, the length of each word is recorded for establishing correspondence between two images to reduce to search only the corresponding structures for the groups of words with the matching lengths. === Seed match finding === A seed match finding is done by comparing each row in image1 with each row in image2. The two rows are then compared to each other by every word. If the length (in pixel) of the two words (one from image1 and one from image2) and their immediate neighbours agree with each other within a predefined tolerance threshold (5 pixels, for example), then they are assumed to match. The row of each image is assumed a match if there are three or more word matches between the two rows. The seed match finding operation is terminated when two pairs of consecutive row match are found. === Match list building === After finishing a seed match finding operation, the next process is to build the match list to generate the correspondences points of the two images. The process is done by searching the matching pairs of rows away from the seed row. == Images mosaicing == Given the list of corresponding points of the two images, finding the transformation of the overlapping portion of the images is the next process. Assuming a pinhole camera model, the transformation between pixels (u,v) of image 1 and pixels (u0, v0) of image 2 is demonstrated by a plane-to-plane projectivity. [ s u ′ s v ′ s ] = [ p 11 p 12 p 13 p 21 p 22 p 23 p 31 p 32 1 ] [ u v 1 ] E q .1 {\displaystyle \left[{\begin{array}{c}su'\\sv'\\s\end{array}}\right]=\left[{\begin{array}{ccc}p_{11}&p_{12}&p_{13}\\p_{21}&p_{22}&p_{23}\\p_{31}&p_{32}&1\end{array}}\right]\left[{\begin{array}{c}u\\v\\1\end{array}}\right]\qquad Eq.1} The parameters of the projectivity is found from four pairs of matching points. RANSAC regression technique is used to reject outlying matches and estimate the projectivity from the remaining good matches. The projectivity is fine-tuned using correlation at the corners of the overlapping portion to obtain four correspondences to sub-pixel accuracy. Therefore, image1 is then transformed into image2's coordinate system using Eq.1. The typical result of the process is shown in Figure 5. === Many images coping === Finally, the whole page composition is built up by mapping all the images into the coordinate system of an "anchor" image, which is normally the one nearest the page center. The transformations to the anchor frame are calculated by concatenating the pair-wise transformations found earlier. The raw document mosaic is shown in Figure 6. However, there might be a problem of non-consecutive images that are overlap. This problem can be solved by performing Hierarchical sub-mosaics. As shown in Figure 7, image1 and image2 are registered, as are image3 and image4, creating two sub-mosaics. These two sub-mosaics are later stitched together in another mosaicing process. == Applied areas == There are various areas that the technique of document mosaicing can be applied to such as : Text segmentation of images of documents Document Recognition Interaction with paper on the digital desk Video mosaics for virtual environments Image registration techniques == Relevant research papers == Huang, T.S.; Netravali, A.N. (1994). "Motion and structure from feature correspondences: A review". Proceedings of the IEEE. 82 (2): 252–268. doi:10.1109/5.265351. D.G. Lowe. [1] Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Boston, 1985. Irani, M.; Peleg, S. (1991). "Improving resolution by image registration". CVGIP: Graphical Models and Image Processing. 53 (3): 231–239. doi:10.1016/1049-9652(91)90045-L. S2CID 4834546. Shivakumara, P.; Kumar, G. Hemantha; Guru, D. S.; Nagabhushan, P. (2006). "

    Read more →
  • White-box cryptography

    White-box cryptography

    In cryptography, the white-box model refers to an extreme attack scenario, in which an adversary has full unrestricted access to a cryptographic implementation, most commonly of a block cipher such as the Advanced Encryption Standard (AES). A variety of security goals may be posed (see the section below), the most fundamental being "unbreakability", requiring that any (bounded) attacker should not be able to extract the secret key hardcoded in the implementation, while at the same time the implementation must be fully functional. In contrast, the black-box model only provides an oracle access to the analyzed cryptographic primitive (in the form of encryption and/or decryption queries). There is also a model in-between, the so-called gray-box model, which corresponds to additional information leakage from the implementation, more commonly referred to as side-channel leakage. White-box cryptography is a practice and study of techniques for designing and attacking white-box implementations. It has many applications, including digital rights management (DRM), pay television, protection of cryptographic keys in the presence of malware, mobile payments and cryptocurrency wallets. Examples of DRM systems employing white-box implementations include CSS and Widevine. White-box cryptography is closely related to the more general notions of obfuscation, in particular, to Black-box obfuscation, proven to be impossible, and to Indistinguishability obfuscation, constructed recently under well-founded assumptions but so far being infeasible to implement in practice. As of January 2023, there are no publicly known unbroken white-box designs of standard symmetric encryption schemes. On the other hand, there exist many unbroken white-box implementations of dedicated block ciphers designed specifically to achieve incompressibility (see § Security goals). == Security goals == Depending on the application, different security goals may be required from a white-box implementation. Specifically, for symmetric-key algorithms the following are distinguished: Unbreakability is the most fundamental goal requiring that a bounded attacker should not be able to recover the secret key embedded in the white-box implementation. Without this requirement, all other security goals are unreachable since a successful attacker can simply use a reference implementation of the encryption scheme together with the extracted key. One-wayness requires that a white-box implementation of an encryption scheme can not be used by a bounded attacker to decrypt ciphertexts. This requirement essentially turns a symmetric encryption scheme into a public-key encryption scheme, where the white-box implementation plays the role of the public key associated to the embedded secret key. This idea was proposed already in the famous work of Diffie and Hellman in 1976 as a potential public-key encryption candidate. Code lifting security is an informal requirement on the context, in which the white-box program is being executed. It demands that an attacker can not extract a functional copy of the program. This goal is particularly relevant in the DRM setting. Code obfuscation techniques are often used to achieve this goal. A commonly used technique is to compose the white-box implementation with so-called external encodings. These are lightweight secret encodings that modify the function computed by the white-box part of an application. It is required that their effect is canceled in other parts of the application in an obscure way, using code obfuscation techniques. Alternatively, the canceling counterparts can be applied on a remote server. Incompressibility requires that an attacker can not significantly compress a given white-box implementation. This can be seen as a way to achieve code lifting security (see above), since exfiltrating a large program from a constrained device (for example, an embedded or a mobile device) can be time-consuming and may be easy to detect by a firewall. Examples of incompressible designs include SPACE cipher, SPNbox, WhiteKey and WhiteBlock. These ciphers use large lookup tables that can be pseudorandomly generated from a secret master key. Although this makes the recovery of the master key hard, the lookup tables themselves play the role of an equivalent secret key. Thus, unbreakability is achieved only partially. Traceability (Traitor tracing) requires that each distributed white-box implementation contains a digital watermark allowing identification of the guilty user in case the white-box program is being leaked and distributed publicly. == History == The white-box model with initial attempts of white-box DES and AES implementations were first proposed by Chow, Eisen, Johnson and van Oorshot in 2003. The designs were based on representing the cipher as a network of lookup tables and obfuscating the tables by composing them with small (4- or 8-bit) random encodings. Such protection satisfied a property that each single obfuscated table individually does not contain any information about the secret key. Therefore, a potential attacker has to combine several tables in their analysis. The first two schemes were broken in 2004 by Billet, Gilbert, and Ech-Chatbi using structural cryptanalysis. The attack was subsequently called "the BGE attack". The numerous consequent design attempts (2005-2022) were quickly broken by practical dedicated attacks. In 2016, Bos, Hubain, Michiels and Teuwen showed that an adaptation of standard side-channel power analysis attacks can be used to efficiently and fully automatically break most existing white-box designs. This result created a new research direction about generic attacks (correlation-based, algebraic, fault injection) and protections against them. == Competitions == Four editions of the WhibOx contest were held in 2017, 2019, 2021 and 2024 respectively. These competitions invited white-box designers both from academia and industry to submit their implementation in the form of (possibly obfuscated) C code. At the same time, everyone could attempt to attack these programs and recover the embedded secret key. Each of these competitions lasted for about 4-5 months. WhibOx 2017 / CHES 2017 Capture the Flag Challenge targeted the standard AES block cipher. Among 94 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for 28 days. WhibOx 2019 / CHES 2019 Capture the Flag Challenge again targeted the AES block cipher. Among 27 submitted implementations, 3 programs stayed unbroken throughout the competition, but were broken after 51 days since the publication. WhibOx 2021 / CHES 2021 Capture the Flag Challenge changed the target to ECDSA, a digital signature scheme based on elliptic curves. Among 97 submitted implementations, all were broken within at most 2 days. WhibOx 2024 / CHES 2024 Capture the Flag Challenge again targeted ECDSA. Among 47 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for almost 5 days.

    Read more →
  • Data validation and reconciliation

    Data validation and reconciliation

    Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation. == Models, data and measurement errors == Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means: Models that express the general structure of the processes, Data that reflects the state of the processes at a given point in time. Models can have different levels of detail, for example one can incorporate simple mass or compound conservation balances, or more advanced thermodynamic models including energy conservation laws. Mathematically the model can be expressed by a nonlinear system of equations F ( y ) = 0 {\displaystyle F(y)=0\,} in the variables y = ( y 1 , … , y n ) {\displaystyle y=(y_{1},\ldots ,y_{n})} , which incorporates all the above-mentioned system constraints (for example the mass or heat balances around a unit). A variable could be the temperature or the pressure at a certain place in the plant. === Error types === Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. raw measurement y {\displaystyle y\,} is not a solution of the nonlinear system F ( y ) = 0 {\displaystyle F(y)=0\,\!} . When using measurements without correction to generate plant balances, it is common to have incoherencies. Measurement errors can be categorized into two basic types: random errors due to intrinsic sensor accuracy and systematic errors (or gross errors) due to sensor calibration or faulty data transmission. Random errors means that the measurement y {\displaystyle y\,\!} is a random variable with mean y ∗ {\displaystyle y^{}\,\!} , where y ∗ {\displaystyle y^{}\,\!} is the true value that is typically not known. A systematic error on the other hand is characterized by a measurement y {\displaystyle y\,\!} which is a random variable with mean y ¯ {\displaystyle {\bar {y}}\,\!} , which is not equal to the true value y ∗ {\displaystyle y^{}\,} . For ease in deriving and implementing an optimal estimation solution, and based on arguments that errors are the sum of many factors (so that the Central limit theorem has some effect), data reconciliation assumes these errors are normally distributed. Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses. The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses. This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases. === Necessity of removing measurement errors === ISA-95 is the international standard for the integration of enterprise and control systems It asserts that: Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...]. Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory. == History == PDR has become more and more important due to industrial processes that are becoming more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables. At the same time the problem of gross error identification and elimination has been presented. In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process., PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models., , Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah. Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992. == Data reconciliation == Data reconciliation is a technique that targets at correcting measurement errors that are due to measurement noise, i.e. random errors. From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation. Given n {\displaystyle n} measurements y i {\displaystyle y_{i}} , data reconciliation can mathematically be expressed as an optimization problem of the following form: min x , y ∗ ∑ i = 1 n ( y i ∗ − y i σ i ) 2 subject to F ( x , y ∗ ) = 0 y min ≤ y ∗ ≤ y max x min ≤ x ≤ x max , {\displaystyle {\begin{aligned}\min _{x,y^{}}&\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\\{\text{subject to }}&F(x,y^{})=0\\&y_{\min }\leq y^{}\leq y_{\max }\\&x_{\min }\leq x\leq x_{\max },\end{aligned}}\,\!} where y i ∗ {\displaystyle y_{i}^{}\,\!} is the reconciled value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), y i {\displaystyle y_{i}\,\!} is the measured value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), x j {\displaystyle x_{j}\,\!} is the j {\displaystyle j} -th unmeasured variable ( j = 1 , … , m {\displaystyle j=1,\ldots ,m\,\!} ), and σ i {\displaystyle \sigma _{i}\,\!} is the standard deviation of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), F ( x , y ∗ ) = 0 {\displaystyle F(x,y^{})=0\,\!} are the p {\displaystyle p\,\!} process equality constraints and x min , x max , y min , y max {\displaystyle x_{\min },x_{\max },y_{\min },y_{\max }\,\!} are the bounds on the measured and unmeasured variables. The term ( y i ∗ − y i σ i ) 2 {\displaystyle \left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\,\!} is called the penalty of measurement i. The objective function is the sum of the penalties, which will be denoted in the following by f ( y ∗ ) = ∑ i = 1 n ( y i ∗ − y i σ i ) 2 {\displaystyle f(y^{})=\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}} . In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy. === Redundancy === Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy", "analytical redundancy", or "topological redundancy". Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints. Redundancy is linked to the concept

    Read more →
  • Social media use in the financial services sector

    Social media use in the financial services sector

    Social media in the financial services sector refers to the use of social media by the financial services sector to promote and distribute financial services. Social media is used in various aspects of the financial industry including customer service, marketing, and product development. It has enabled financial institutions to extend their reach through direct and real-time communication with customers, fostering more personal connections. It also allows individuals to talk to other individuals creating lending and trading via social groups as well as developing new financial services by fintech startup companies. In terms of marketing, social media is utilized by both traditional financial companies as well as disruptive fintech companies such as peer-to-peer lending (P2P) companies. The financial industry has used information technology since its inception in the 1960s and social media fits in with this ongoing development. Larger, traditional financial firms have integrated social media into their marketing strategies. Companies in the financial sector are subject to strict regulations that include how they use social media. In the United States, the Financial Industry Regulatory Authority (FINRA) is a key regulator that sets rules how financial firms can interact with consumers. This includes ensuring that social media posts follow financial advertising rules, such as being fair and balanced and not providing misleading information, and that financial advice is not provided by unqualified personnel, such as influencers. == History == In 2003, at the beginning of social media development, MySpace was founded as a "social networking service." It allowed people to create a profile, connect with other people, and post videos, pictures, and songs. As MySpace grew in popularity, it attracted interest from companies wishing to promote their brands on the social platform. They were joined by Facebook and in 2010 by Instagram. Financial service firms were initially slow to adapt to promotion via social media but soon joined other big firms after they saw the success other industries had in engaging with younger people. == Uses == === Branding === While companies are able to connect with more people remotely through providing online financial services, their branding strategy has shifted from customized to standardized. Prior to the outbreak of technology, most banks used customized branding where they targeted only customers in their regions. Businesses can now use technology to operate beyond their geographic location and maintain a consistent image across multiple countries with standardized branding. By being able to extend a consistent brand reputation across a wider geographic location, financial services companies can take advantage of economies of scale in advertising cost, lower administrative complexity, lower entry into new markets, and improved cross-border learning within the company. === Customer engagement === Online banking reduced face-to-face interaction between customers and their banks. Most banking transactions can now be conducted online or through mobile devices, rather than at a local branch with a teller. Social media provides a channel for firms to maintain personal contact with customers, replicating some of the interaction that was previously available at local branches. For example, a bank's Facebook page may feature an employee profile describing their job duties, which serves to present a more human face for larger institutions. === Lending === Social media is a core marketing channel for online peer-to-peer lending as well as small business lenders. Since these companies operate exclusively online, it makes sense for them to market online through social media channels. They are able to grow and find new lenders and buyers by utilizing social networks. === Trading === Social trading is an alternative way of analyzing financial data by looking at what other traders are doing and comparing, copying and discussing their techniques and strategies. Prior to the advent of social trading, investors and traders were relying on fundamental or technical analysis to form their investment decisions. Using social trading investors and traders could integrate into their investment decision-process social indicators from trading data-feeds of other traders. Investors also use platform like Reddit, Signal messaging or WeChat to create social communities to discuss investments and finance. In some cases they use this to join together using meme stocks to move financial markets, such as the 2021 GameStop short squeeze incident. They can also use social groups to launch and promote new products such as cryptocurrencies. Investing application like WeBull incorporate a forum style messaging system on each stock that is available for trading. Financial brokers such as Fidelity Investments, Interactive Brokers, and E-Trade have moved to incorporate community features in their investment apps. == Regulations == The use of social media by investors and financial services professionals for business purposes is subject to regulatory oversight, in the United States this is done primarily by the Financial Industry Regulatory Authority (FINRA). FINRA's rules, designed to protect investors from misleading information in all communications and this also applies to social media communications. This includes ensuring that social media posts follow financial advertising rules, such as being fair and balanced and not providing misleading information, and that advice is not provided by unqualified personnel, such as influencers and bank staff acting in a personal capacity. Financial firms have to maintain books and records of all interaction with customers and this includes social media. == New products and services == Social media has created entirely new products for the financial services sector, revolutionizing products and developing new industries through the merging of social technology and financial services. Fintech startups use social media to promote products to get them established. Several developing nations have used social media to leapfrog traditional financial technology; for example, WeChat Pay, which developed from the Chinese WeChat social media platform, became a major payment system in China within a few years. In 2015, according to consulting firm Accenture, 390 million people in China had registered to use mobile banking. This figure is more than the population of the United States. In the United States, the fintech company Venmo combines technology and financial services on a social platform. Other financial technology companies that have used social media to develop or promote financial products include: Lending Club – One of the first peer-to-peer lending businesses OnDeck Capital – A US online-only lending business Funding Circle – A UK-based online lending company Wise – A global online money transfers company Kabbage – A US online unsecured loan company later acquired by American Express Avant – A US online unsecured loan company Zopa – A UK online neobank providing peer-to-peer lending == Risks == === Reputational damage === Due to the real-time nature of social media, financial services companies can be impacted by potential reputational issues. Any negative experience by customers can easily be shared online and could become a viral phenomenon, those comments could likely have a detrimental effect on the company’s stock price and reputation. On the other hand, any positive experience a customer has can also be shared online. However, positive experiences are much less likely to become viral. === Scams === The nature of social media makes it easy to target individuals without being seen by the wider community, this allows scammers to target individuals. Example include romance scams such as the pig butchering scam where an individual is tricked to transfer funds or assets to the scammer over social media making it hard for law enforcement to track them or recover funds. === Customer privacy === Customer privacy is important for the financial services industry. It is critical that customer information such as a bank account numbers and other personal information is kept private. However, this information can be leaked if for example, a customer is unhappy with a bank’s service, they may tweet at the bank expressing their frustrations and include their name and account number.

    Read more →
  • Digital video effect

    Digital video effect

    Digital video effects (DVEs) are visual effects that provide comprehensive live video image manipulation, in the same form as optical printer effects in film. DVEs differ from standard video switcher effects (often referred to as analog effects) such as wipes or dissolves, in that they deal primarily with resizing, distortion or movement of the image. Modern video switchers often contain internal DVE functionality. Modern DVE devices are incorporated in high-end broadcast video switchers. Early examples of DVE devices found in the broadcast post-production industry include the Ampex Digital Optics (ADO), Quantel DPE-5000, Vital Squeezoom, NEC E-Flex and the Abekas A5x series of DVEs. By 1988, Grass Valley Group caught up with the competition with their Kaleidoscope, which integrated ADO-type effects with their widely used line of broadcast switching gear. DVEs are used by the broadcast television industry in live television production environments like television studios and outside broadcasts. They are commonly used in video post-production.

    Read more →
  • Scalable Coherent Interface

    Scalable Coherent Interface

    The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations. The IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI) was approved by the IEEE standards board on March 19, 1992. It saw some use during the 1990s, but never became widely used and has been replaced by other systems from the early 2000s. == History == Soon after the Fastbus (IEEE 960) follow-on Futurebus (IEEE 896) project in 1987, some engineers predicted it would already be too slow for the high performance computing marketplace by the time it would be released in the early 1990s. In response, a "Superbus" study group was formed in November 1987. Another working group of the standards association of the Institute of Electrical and Electronics Engineers (IEEE) spun off to form a standard targeted at this market in July 1988. It was essentially a subset of Futurebus features that could be easily implemented at high speed, along with minor additions to make it easier to connect to other systems, such as VMEbus. Most of the developers had their background from high-speed computer buses. Representatives from companies in the computer industry and research community included Amdahl, Apple Computer, BB&N, Hewlett-Packard, CERN, Dolphin Server Technology, Cray Research, Sequent, AT&T, Digital Equipment Corporation, McDonnell Douglas, National Semiconductor, Stanford Linear Accelerator Center, Tektronix, Texas Instruments, Unisys, University of Oslo, University of Wisconsin. The original intent was a single standard for all buses in the computer. The working group soon came up with the idea of using point-to-point communication in the form of insertion rings. This avoided the lumped capacitance, limited physical length/speed of light problems and stub reflections in addition to allowing parallel transactions. The use of insertion rings is credited to Manolis Katevenis who suggested it at one of the early meetings of the working group. The working group for developing the standard was led by David B. Gustavson (chair) and David V. James (Vice Chair). David V. James was a major contributor for writing the specifications including the executable C-code. Stein Gjessing’s group at the University of Oslo used formal methods to verify the coherence protocol and Dolphin Server Technology implemented a node controller chip including the cache coherence logic. Different versions and derivatives of SCI were implemented by companies like Dolphin Interconnect Solutions, Convex, Data General AViiON (using cache controller and link controller chips from Dolphin), Sequent and Cray Research. Dolphin Interconnect Solutions implemented a PCI and PCI-Express connected derivative of SCI that provides non-coherent shared memory access. This implementation was used by Sun Microsystems for its high-end clusters, Thales Group and several others including volume applications for message passing within HPC clustering and medical imaging. SCI was often used to implement non-uniform memory access architectures. It was also used by Sequent Computer Systems as the processor memory bus in their NUMA-Q systems. Numascale developed a derivative to connect with coherent HyperTransport. == The standard == The standard defined two interface levels: The physical level that deals with electrical signals, connectors, mechanical and thermal conditions The logical level that describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives, control and status registers, and initialization and error recovery facilities. This structure allowed new developments in physical interface technology to be easily adapted without any redesign on the logical level. Scalability for large systems is achieved through a distributed directory-based cache coherence model. (The other popular models for cache coherency are based on system-wide eavesdropping (snooping) of memory transactions – a scheme which is not very scalable.) In SCI each node contains a directory with a pointer to the next node in a linked list that shares a particular cache line. SCI defines a 64-bit flat address space (16 exabytes) where 16 bits are used for identifying a node (65,536 nodes) and 48 bits for address within the node (256 terabytes). A node can contain many processors and/or memory. The SCI standard defines a packet switched network. === Topologies === SCI can be used to build systems with different types of switching topologies from centralized to fully distributed switching: With a central switch, each node is connected to the switch with a ringlet (in this case a two-node ring). In distributed switching systems, each node can be connected to a ring of arbitrary length and either all or some of the nodes can be connected to two or more rings. The most common way to describe these multi-dimensional topologies is k-ary n-cubes (or tori). The SCI standard specification mentions several such topologies as examples. The 2-D torus is a combination of rings in two dimensions. Switching between the two dimensions requires a small switching capability in the node. This can be expanded to three or more dimensions. The concept of folding rings can also be applied to the Torus topologies to avoid any long connection segments. === Transactions === SCI sends information in packets. Each packet consists of an unbroken sequence of 16-bit symbols. The symbol is accompanied by a flag bit. A transition of the flag bit from 0 to 1 indicates the start of a packet. A transition from 1 to 0 occurs 1 (for echoes) or 4 symbols before the packet end. A packet contains a header with address command and status information, payload (from 0 through optional lengths of data) and a CRC check symbol. The first symbol in the packet header contains the destination node address. If the address is not within the domain handled by the receiving node, the packet is passed to the output through the bypass FIFO. In the other case, the packet is fed to a receive queue and may be transferred to a ring in another dimension. All packets are marked when they pass the scrubber (a node is established as scrubber when the ring is initialized). Packets without a valid destination address will be removed when passing the scrubber for the second time to avoid filling the ring with packets that would otherwise circulate indefinitely. === Cache coherence === Cache coherence ensures data consistency in multiprocessor systems. The simplest form applied in earlier systems was based on clearing the cache contents between context switches and disabling the cache for data that were shared between two or more processors. These methods were feasible when the performance difference between the cache and memory were less than one order of magnitude. Modern processors with caches that are more than two orders of magnitude faster than main memory would not perform anywhere near optimal without more sophisticated methods for data consistency. Bus based systems use eavesdropping (snooping) methods since buses are inherently broadcast. Modern systems with point-to point links use broadcast methods with snoop filter options to improve performance. Since broadcast and eavesdropping are inherently non-scalable, these are not used in SCI. Instead, SCI uses a distributed directory-based cache coherence protocol with a linked list of nodes containing processors that share a particular cache line. Each node holds a directory for the main memory of the node with a tag for each line of memory (same line length as the cache line). The memory tag holds a pointer to the head of the linked list and a state code for the line (three states – home, fresh, gone). Associated with each node is also a cache for holding remote data with a directory containing forward and backward pointers to nodes in the linked list sharing the cache line. The tag for the cache has seven states (invalid, only fresh, head fresh, only dirty, head dirty, mid valid, tail valid). The distributed directory is scalable. The overhead for the directory based cache coherence is a constant percentage of the node’s memory and cache. This percentage is in the order of 4% for the memory and 7% for the cache. == Legacy == SCI is a standard for connecting the different resources within a multiprocessor computer system, and it is not as widely known to the public as for example the Ethernet family for connecting different systems. Different system vendors implemented different variants of SCI for their internal system infrastructure. These different implementations interface to very intricate mechanisms in processors and memory systems and each vendor has to preserve some degrees of

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is a behavior that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. The submission records are stored in the submission log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is giving up tentative changes to the transaction, which is rolled back. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the fundamental core of distributed transaction management. Subsequently, the Three-phase Commit (3PC), Hypothesis Commit (PC), Hypothesis Abort (PA), and Optimistic Commit protocols gradually emerged, solving the problems of blocking and fault recovery. Today, new fields such as e-commerce payment and blockchain technology are emerging, and submission protocols play a significant role in various business areas. By effectively handling transactions, resolving faults and recovering problems, the commit protocol becomes crucial in ensuring the reliability and consistency of data management. == History == The concept of Commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises have gradually replaced the traditional paper records with computers, which has fully improved the work efficiency. The reliability and consistency of data have become a necessary requirement. Transaction management at this stage is relatively simple, limited to using a single computer for processing. It merely effectively records the changes in data to ensure that the data remains stable after the transaction is completed or terminated. In the late 1970s, as database systems moved from a single calculator operation to multiple distributed collaborations, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the famous two-phase Commit Protocol (2PC), which became an effective solution for distributed transaction management, successfully managing data synchronization problems between multiple nodes. However, this commit protocol has some potential transaction blocking problems when nodes fail. In the early 1980s, researchers discovered that although the two-step commit protocol was effective at synchronizing data, there could be long waits and even system crashes, with limitations. To improve this problem, people have begun to explore new and effective methods, including enhancing efficiency by reducing message communication during the protocol process. IBM's R database introduced the Assumed Commit and Assumed abort protocols, which contributed significantly to transaction management efficiency. These two protocols have greatly improved the processing efficiency of distributed transactions by reducing communication overhead and have become an important breakthrough in the technology of transaction commit protocols. By the early 1990s, with the increase in business demands and the complexity of transactions, enterprises required higher efficiency in distributed transaction processing. In order to adapt to the needs of different environments, the scientific community has gradually developed various variants of commit protocols to provide more flexible transaction management options for different needs. For example, the three-phase commit protocol promotes the commit of transactions more effectively and reduces the occurrence of blocking problems by adding a pre-commit protocol and a timeout mechanism. In the 21st century, with the popularization of mobile Internet and wireless technology, the commit protocol has been further developed, and researchers have begun to pay attention to how to reduce the blocking in the transaction process to solve the problem of broadband limitation, battery life and network instability in the mobile environment. The proposal of optimistic commit protocol marks the extension of commit technology from traditional database to the emerging mobile data field. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience in cases of poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, submission protocols and consensus mechanisms have gradually merged. These consensus algorithms play a role in tamper-proofing and preventing malicious attacks on node pairs in a decentralized environment. This enables commit to no longer be confined to the scope of traditional database management, but to become the core technology of trust computing and distributed ledgers, further expanding the application field of commit in the digital age. This integration has brought about extensive application impacts. Each transaction can achieve the effect of tracking global submissions through the verification of the consensus mechanism, becoming an important technical foundation for promoting the circulation of digital assets, the operation of cryptocurrencies and decentralized applications. == Commit Protocol Types == In the world of data management, a transaction is a series of database operations, such as bank transfers and order submission. In order to ensure the accuracy, consistency, and security of the data, transactions are usually completed completely, or cancelled completely, leaving no partially completed results. Commit protocol is the method used to coordinate this process. Different protocols are applicable to different submission scenarios and have their own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and broadest approach to distributed transactions, which includes both a preparation phase and a commit phase. This commit protocol is designed to allow the database coordinator to determine if all participating nodes agree. The preparation phase is the phase in which the coordination node sends a ready to commit request to all nodes participating in the transaction. The commit phase is a global commit after all participating nodes are ready, and if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the easiest to operate and widely used, its obvious drawback is that it can cause transactions to be blocked for a long time when nodes fail, resulting in a decline in system performance and making it difficult to terminate or continue immediately. === Three-Phase Commit (3PC) === The three-phase commit protocol is an improved non-blocking protocol based on 2PC, which is divided into three stages: preparation, pre-commit and commit. Firstly, each node sends a "preparation" request. After confirmation, a "pre-submission" stage is added. At this point, each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the formal commit stage, after all nodes send the "commit" request, the transaction is completed and committed. Compared with 2PC, it increases the timeout mechanism, avoids the blocking problem caused by single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly optimizes transaction reliability, but adds additional overhead for message transmission and state maintenance. It is more suitable for distributed application scenarios with high transaction sensitivity and no acceptance of long waiting times. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) is the default that the transaction will be committed successfully and rollback will be notified unless an anomaly is encountered. This commit reduces the message overhead and logging costs of a normal commits. Presumed Abort (PA) is assumed that the default state of the transaction is a rollback and will only be committed when all nodes have explicitly agreed. This commit is applicable to transactions that are not updated frequently or have a low probability of successful commit. The IBM R Distributed Database management System was the first to propose and practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the previous commit protocols are facing new challenges, especially in mobile scenarios with unstable networks. Excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing to avoid wait times. This type of commit is suitable f

    Read more →