System of record

System of record

A system of record (SOR) or source system of record (SSoR) is a data management term for an information storage system (commonly implemented on a computer system running a database management system) that is the authoritative data source for a given data element or piece of information, like for example a row (or record) in a table. In data vault it is referred to as the record source. == Background == The need to identify systems of record can become acute in organizations where management information systems have been built by taking output data from multiple source systems, re-processing this data, and then re-presenting the result for a new business use. In these cases, multiple information systems may disagree about the same piece of information. These disagreements may stem from semantic differences, differences in opinion, use of different sources, differences in the timing of the extract, transform, load processes that create the data they report against, or may simply be the result of bugs. == Use == The integrity and validity of any data set is open to question when there is no traceable connection to a good source, and listing a source system of record is a solution to this. Where the integrity of the data is vital, if there is an agreed system of record, the data element must either be linked to, or extracted directly from it. In other cases, the provenance and estimated data quality should be documented. The "system of record" approach is a good fit for environments where both: there is a single authority over all data consumers, and all consumers have similar needs == Trade-offs == In diverse environments, one instead needs to support the presence of multiple opinions. Consumers may accept different authorities or may differ on what constitutes an authoritative source—researchers may prefer carefully vetted data, while tactical military systems may require the most recent credible report.

Articulatory speech recognition

Articulatory speech recognition means the recovery of speech (in forms of phonemes, syllables or words) from acoustic signals with the help of articulatory modeling or an extra input of articulatory movement data. Speech recognition (or automatic speech recognition, acoustic speech recognition) means the recovery of speech from acoustics (sound wave) only. Articulatory information is extremely helpful when the acoustic input is in low quality, perhaps because of noise or missing data. Measurable information from the articulatory system (e.g. tongue, jaw movements) can supplement acoustic signals to improve phone recognition accuracy by 2%. However, attempts to estimate articulatory data from acoustic signals alone have not significantly enhanced recognition performance.

BitFunnel

BitFunnel is the search engine indexing algorithm and a set of components used in the Bing search engine, which were made open source in 2016. BitFunnel uses bit-sliced signatures instead of an inverted index in an attempt to reduce operations cost. == History == Progress on the implementation of BitFunnel was made public in early 2016, with the expectation that there would be a usable implementation later that year. In September 2016, the source code was made available via GitHub. A paper discussing the BitFunnel algorithm and implementation was released as through the Special Interest Group on Information Retrieval of the Association for Computing Machinery in 2017 and won the Best Paper Award. == Components == BitFunnel consists of three major components: BitFunnel – the text search/retrieval system itself WorkBench – a tool for preparing text for use in BitFunnel NativeJIT – a software component that takes expressions that use C data structures and transforms them into highly optimized assembly code == Algorithm == === Initial problem and solution overview === The BitFunnel paper describes the "matching problem", which occurs when an algorithm must identify documents through the usage of keywords. The goal of the problem is to identify a set of matches given a corpus to search and a query of keyword terms to match against. This problem is commonly solved through inverted indexes, where each searchable item is maintained with a map of keywords. In contrast, BitFunnel represents each searchable item through a signature. A signature is a sequence of bits which describe a Bloom filter of the searchable terms in a given searchable item. The bloom filter is constructed through hashing through several bit positions. === Theoretical implementation of bit-string signatures === The signature of a document (D) can be described as the logical-or of its term signatures: S D → = ⋃ t ∈ D S t → {\displaystyle {\overrightarrow {S_{D}}}=\bigcup _{t\in D}{\overrightarrow {S_{t}}}} Similarly, a query for a document (Q) can be defined as a union: S Q → = ⋃ t ∈ Q S t → {\displaystyle {\overrightarrow {S_{Q}}}=\bigcup _{t\in Q}{\overrightarrow {S_{t}}}} Additionally, a document D is a member of the set M' when the following condition is satisfied: S Q → ∩ S D → = S Q → {\displaystyle {\overrightarrow {S_{Q}}}\cap {\overrightarrow {S_{D}}}={\overrightarrow {S_{Q}}}} This knowledge is then combined to produce a formula where M' is identified by documents which match the query signature: M ′ = { D ∈ C ∣ S Q → ∩ S D → = S Q → } {\displaystyle M'=\left\{D\in C\mid {\overrightarrow {S_{Q}}}\cap {\overrightarrow {S_{D}}}={\overrightarrow {S_{Q}}}\right\}} These steps and their proofs are discussed in the 2017 paper. === Pseudocode for bit-string signatures === This algorithm is described in the 2017 paper. M ′ = ∅ foreach D ∈ C do if S D → ∩ S Q → = S Q → then M ′ = M ′ ∪ { D } endif endfor {\displaystyle {\begin{array}{l}M'=\emptyset \\{\texttt {foreach}}\ D\in C\ {\texttt {do}}\\\qquad {\texttt {if}}\ {\overrightarrow {S_{D}}}\cap {\overrightarrow {S_{Q}}}={\overrightarrow {S_{Q}}}\ {\texttt {then}}\\\qquad \qquad M'=M'\cup \{D\}\\\qquad {\texttt {endif}}\\{\texttt {endfor}}\end{array}}}

Internet Security Alliance

Internet Security Alliance (ISA) was founded in 2001 as a non-profit collaboration between Carnegie Mellon University's CyLab and Electronic Industries Alliance, a federation of trade associations. The Internet Security Alliance is focused on cyber security, acting as a forum for information sharing and leadership on information security, and lobbying for corporate security interests. == International operations == The Internet Security Alliance operates with a global membership to provide international security for its partners. The organization's membership includes companies located on four continents, and the Executive Committee always includes at least one non-U.S.-based company. The Internet Security Alliance believes that international communication is crucial for long-term greater information security, as it allows for a more realistic approach to addressing the many challenges faced by users of the Internet. == Publications == Published in 2009, The Financial Impact of Cyber Risk is the first known guidance document to attempt to approach the financial impact of cyber risks from the perspective of core business functions. It claims to provide guidance to CFOs and their colleagues responsible for legal issues, business operations and technology, privacy and compliance, risk assessment and insurance, and corporate communications.

Data exhaust

Data exhaust (also exhaust data) is the trail of data generated as a by-product of users' online activity, behaviour, and transactions, rather than data they deliberately create or submit. It forms part of a broader category of unconventional data that also includes geospatial, network, and time-series data, and may be useful for predictive analytics. Data exhaust can take the form of cookies, temporary files, log files, clickstream records and stored preferences. Actions such as visiting a web page, following a link, or dwelling on an element may all generate exhaust data that is recorded without the user's active awareness. Unlike primary content — which the user intentionally creates — exhaust data is a passive side effect of interaction. A bank, for example, might treat the amounts and parties involved in a transaction as primary data, while secondary data could include whether the transaction was carried out at a cash machine rather than a branch. == Uses == Data exhaust collected by companies is often information that is not immediately useful in isolation, but can be aggregated and analysed to improve products, personalise content, identify trends, and support quality control. Companies may also store exhaust data for future analysis or sell it to third parties. Shoshana Zuboff has described this practice as a core mechanism of what she terms surveillance capitalism, in which behavioural data generated by users is converted into predictive products. Kosciejew notes that large quantities of often raw data are collected in this way, much of which is never analysed. == Medical exhaust data == Many medical devices — including pacemakers, dialysis machines and surgical cameras — generate exhaust data as a by-product of their operation. The majority of this data is never captured or analysed, and is typically discarded once a procedure ends or a device completes its routine monitoring cycle. The potential use of data generated by implanted devices such as pacemakers raises additional legal and ethical questions around ownership and consent. Using electronic health records for research also creates challenges because of the volume of data involved, creating a need for automated algorithms to process it. == Privacy and regulation == The collection and distribution of data exhaust is not in itself illegal in most jurisdictions, but its use raises questions of privacy and informed consent. Steps commonly taken to address these concerns include data anonymisation, offering users an opt-out from the sale of their data, and publishing explicit privacy policies that disclose what data is collected and how it is used.

Amaryllo

Amaryllo Inc. is a multinational company founded in Amsterdam, the Netherlands, and now headquartered in the United States. It operates as a cloud service platform, providing cloud storage and cloud computing solutions to enterprises and brand companies. Amaryllo began with Skype IP camera development, pioneering biometric robotic technologies, encrypted P2P network, and secure cloud storage. Amaryllo was founded by Band of Angels member, Marcus Yang to develop patents for a new type of robotic cameras that is claimed to "talk, hear, sense, recognize human faces, and track intruders". It also claims to have made the world's first security robot based on the WebRTC protocol, Icam PRO FHD, and won the 2015 CES Best of Innovation Award under Embedded Technology category. Its home security robots claim to employ 256-bit encryption and run on the WebRTC protocol. Amaryllo products are sold in over 100 Countries across 6 Continents. == History == Amaryllo revealed its first smart home security products at Internationale Funkausstellung Berlin (IFA) 2013 with a Skype-enabled IP camera called iCam HD. Amaryllo announced its second Skype-certified smart home product, iBabi HD, at CES 2014. The company was chosen as a "Cool Vendor" by Gartner in Connected Home 2014. Amaryllo introduced WebRTC-based smart home products after Microsoft terminated embedded Skype services in mid 2014. Since then, Amaryllo has been developing camera robots with auto-tracking and facial recognition technologies. Its camera robots, ATOM AR3 and ATOM AR3S, were introduced in late 2016. It focuses on wired and wireless technology based on AI services. == Cloud Service Platform == Amaryllo offers prepaid cloud storage through digital codes and gift cards, distributed via InComm Payments, Blackhawk Network, and other partners. It provides high-performance cloud computing service through Rescale partnership. Amaryllo provides free cameras under an annual cloud storage subscription on its website. == Global Supercomputing Network (GSN) == The Global Supercomputing Network (GSN) is a distributed high-performance computing (HPC) platform developed by Amaryllo. The network is designed to provide scalable Infrastructure as a Service (IaaS) by connecting a global array of data centers to offer GPU computing resources for specialized industrial and scientific applications. === Architecture and Technology === GSN operates as a decentralized distributed network of servers rather than a single centralized supercomputer. The platform integrates an artificial intelligence assistant named Genie, also developed by Amaryllo. Genie's primary function is to manage computing allocation, helping users identify and connect to available resources across the network’s various nodes based on the specific requirements of their tasks. === Services === The network primarily focuses on the rental of GPU processing resources, catering to fields that require massive parallel processing capabilities, including: Artificial Intelligence and Machine Learning: Training large language models (LLMs) and neural networks. Scientific Simulations: Executing complex calculations in physics, chemistry, and bioinformatics. Data Analytics: Processing large-scale datasets. By utilizing a rental model, GSN allows organizations to access high-end hardware without the capital expenditure associated with purchasing and maintaining physical server infrastructure. === Infrastructure and Partnerships === The network’s physical footprint is expanded through strategic partnerships with data center operators. GSN collaborates with MettaDC and Cyber DC to provide colocation services. These partnerships facilitate the deployment of Nvidia server clusters within secure, Tier-rated facilities, ensuring high availability and connectivity for GSN users. == Official Brand Licensee of HP == Amaryllo Inc. is an official licensee of HP Inc., managing both B2B and B2C cloud services under the HP brand. Through this partnership, Amaryllo offers a range of secure and scalable cloud solutions, including HP Cloud, which provides subscription and one-time payment storage for reliable data backup and storage for individuals, families, and businesses. HP Cloud employs cloud computing technologies to create smart albums for users.

G.hn

Gigabit Home Networking (G.hn) is a specification for wired home networking that supports speeds up to 2 Gbit/s and operates over four types of legacy wires: telephone wiring, coaxial cables, power lines and plastic optical fiber. Some benefits of a multi-wire standard are lower equipment development costs and lower deployment costs for service providers (by allowing customer self-install). == History == G.hn was developed under the International Telecommunication Union's Telecommunication Standardization sector (the ITU-T) and promoted by the HomeGrid Forum and several other organizations. ITU-T Recommendation (the ITU's term for standard) G.9960, which received approval on October 9, 2009, specified the physical layers and the architecture of G.hn. The Data Link Layer (Recommendation G.9961) was approved on June 11, 2010. Prominent organizations, including CEPca, HomePNA, and UPA, who were creators of some of these interfaces, rallied behind the latest version of the standard, emphasizing its potential and significance in the home networking domain. Moreover, the ITU-T extended the technology with multiple input, multiple output (MIMO) technology to increase data rates and signaling distance. This new feature was approved in March 2012 under G.9963 Recommendation. The development and promotion of G.hn have been significantly supported by the HomeGrid Forum and several other organizations. The technology was not only designed to address home-networking challenges but also found applications beyond this initial scope, showcasing its versatility and potential in the networking domain. == Technical specifications == === Technical overview === G.hn specifies a single physical layer based on fast Fourier transform (FFT) orthogonal frequency-division multiplexing (OFDM) modulation and low-density parity-check code (LDPC) forward error correction (FEC) code. G.hn includes the capability to notch specific frequency bands to avoid interference with amateur radio bands and other licensed radio services. G.hn includes mechanisms to avoid interference with legacy home networking technologies and also with other wireline systems such as VDSL2 or other types of DSL used to access the home. OFDM systems split the transmitted signal into multiple orthogonal sub-carriers. In G.hn each one of the sub-carriers is modulated using QAM. The maximum QAM constellation supported by G.hn is 4096-QAM (12-bit QAM). The G.hn media access control is based on a time division multiple access (TDMA) architecture, in which a "domain master" schedules Transmission Opportunities (TXOPs) that can be used by one or more devices in the "domain". There are two types of TXOPs: Contention-Free Transmission Opportunities (CFTXOP), which have a fixed duration and are allocated to a specific pair of transmitter and receiver. CFTXOP are used for implementing TDMA Channel Access for specific applications that require quality of service (QoS) guarantees. Shared Transmission Opportunities (STXOP), which are shared among multiple devices in the network. STXOP are divided into Time Slots (TS). There are two types of TS: Contention-Free Time Slots (CFTS), which are used for implementing "implicit" token passing Channel Access. In G.hn, a series of consecutive CFTS is allocated to a number of devices. The allocation is performed by the "domain master" and broadcast to all nodes in the network. There are pre-defined rules that specify which device can transmit after another device has finished using the channel. As all devices know "who is next", there is no need to explicitly send a "token" between devices. The process of "passing the token" is implicit and ensures that there are no collisions during Channel access. Contention-Based Time Slots (CBTS), which are used for implementing CSMA/CARP Channel Access. In general, CSMA systems cannot completely avoid collisions, so CBTS are only useful for applications that do not have strict Quality of Service requirements. ==== Optimization for each medium ==== Although most elements of G.hn are common for all three media supported by the standard (power lines, phone lines and coaxial cable), G.hn includes media-specific optimizations for each media. Some of these media-specific parameters include: OFDM Carrier Spacing: 195.31 kHz in coaxial, 48.82 kHz in phone lines, 24.41 kHz in power lines. FEC Rates: G.hn's FEC can operate with code rates 1/2, 2/3, 5/6, 16/18 and 20/21. Although these rates are not media specific, it is expected that the higher code rates will be used in cleaner media (such as coaxial) while the lower code rates will be used in noisy environments such as power lines. Automatic repeat request (ARQ) mechanisms: G.hn supports operation both with and without ARQ (re-transmission). Although this is not media specific, it is expected that ARQ-less operation is sometimes appropriate for cleaner media (such as coaxial) while ARQ operation is appropriate for noisy environments such as power lines. Power levels and frequency bands: G.hn defines different power masks for each medium. MIMO support: Recommendation G.9963 includes provisions for transmitting G.hn signals over multiple AC wires (phase, neutral, ground), if they are physically available. In July 2016, G.9963 was updated to include MIMO support over twisted pairs. ==== Security ==== G.hn uses the Advanced Encryption Standard (AES) encryption algorithm (with a 128-bit key length) using the CCMP protocol to ensure confidentiality and message integrity. Authentication and key exchange is done following ITU-T Recommendation X.1035. G.hn specifies point-to-point security inside a domain, which means that each pair of transmitter and receiver uses a unique encryption key which is not shared by other devices in the same domain. For example, if node Alice sends data to node Bob, node Eve (in the same domain as Alice and Bob) will not be able to easily eavesdrop their communication. G.hn supports the concept of relays, in which one device can receive a message from one node and deliver it to another node farther away in the same domain. Relaying becomes critical for applications with complex network topologies that need to cover large distances, such as those found in industrial or utility applications. While a relay can read the source and target addresses, it cannot read the message's content due to its body being end-to-end-encrypted. ==== Profiles ==== The G.hn architecture includes the concept of profiles. Profiles are intended to address G.hn nodes with significantly different levels of complexity. In G.hn the higher complexity profiles are proper supersets of lower complexity profiles, so that devices based on different profiles can interoperate with each other. Examples of G.hn devices based on high complexity profiles are Residential Gateways or Set-Top Boxes. Examples of G.hn devices based on low complexity profiles are home automation, home security and smart grid devices. ==== Technical parameters ==== The chart depicts a summary of the crucial technical specifications of the G.hn standard. Many of these technical elements are consistent across different physical media, with variations seen in areas such as Tone Spacing and frequency ranges. This uniformity is essential as it allows silicon manufacturers to produce a singular chip capable of implementing all three media types, leading to cost savings. Presently, G.hn chipsets are compatible with all three media types. This compatibility allows system manufacturers to create devices that can adjust to any wiring type simply by modifying a software configuration in the equipment. === Spectrum === The G.hn spectrum depends on the medium as shown in the diagram below: === Protocol stack === G.hn specifies the physical layer and the data link layer, according to the OSI model. The G.hn Data Link Layer (Recommendation G.9961) is divided into three sub-layers: The Application Protocol Convergence (APC) Layer, which accepts frames (usually in Ethernet format) from the upper layer (Application Entity) and encapsulates them into G.hn APC protocol data units (APDUs). The maximum payload of each APDU is 214 bytes. The logical link control (LLC), which is responsible for encryption, aggregation, segmentation and automatic repeat-request. This sub-layer is also responsible for "relaying" of APDUs between nodes that may not be able to communicate through a direct connection. The medium access control (MAC), which schedules channel access. The G.hn physical layer (Recommendation G.9960) is divided into three sub-layers: The Physical Coding Sub-layer (PCS), responsible for generating PHY headers. The Physical Medium Attachment (PMA), responsible for scrambling and forward error correction coding/decoding. The Physical Medium Dependent (PMD), responsible for bit-loading and OFDM modulation. The interface between the Application Entity and the Data Link Layer is called A-interface. The interface between the Data Link Layer and the ph