Powerset construction

Powerset construction

In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) that recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs. The construction, sometimes called the Rabin–Scott powerset construction (or subset construction) to distinguish it from similar constructions for other types of automata, was first published by Michael O. Rabin and Dana Scott in 1959. == Intuition == To simulate the operation of a DFA on a given input string, one needs to keep track of a single state at any time: the state that the automaton will reach after seeing a prefix of the input. In contrast, to simulate an NFA, one needs to keep track of a set of states: all of the states that the automaton could reach after seeing the same prefix of the input, according to the nondeterministic choices made by the automaton. If, after a certain prefix of the input, a set S of states can be reached, then after the next input symbol x the set of reachable states is a deterministic function of S and x. Therefore, the sets of reachable NFA states play the same role in the NFA simulation as single DFA states play in the DFA simulation, and in fact the sets of NFA states appearing in this simulation may be re-interpreted as being states of a DFA. == Construction == The powerset construction applies most directly to an NFA that does not allow state transformations without consuming input symbols (aka: "ε-moves"). Such an automaton may be defined as a 5-tuple (Q, Σ, T, q0, F), in which Q is the set of states, Σ is the set of input symbols, T is the transition function (mapping a state and an input symbol to a set of states), q0 is the initial state, and F is the set of accepting states. The corresponding DFA has states corresponding to subsets of Q. The initial state of the DFA is {q0}, the (one-element) set of initial states. The transition function of the DFA maps a state S (representing a subset of Q) and an input symbol x to the set T(S,x) = ∪{T(q,x) | q ∈ S}, the set of all states that can be reached by an x-transition from a state in S. A state S of the DFA is an accepting state if and only if at least one member of S is an accepting state of the NFA. In the simplest version of the powerset construction, the set of all states of the DFA is the powerset of Q, the set of all possible subsets of Q. However, many states of the resulting DFA may be useless as they may be unreachable from the initial state. An alternative version of the construction creates only the states that are actually reachable. === NFA with ε-moves === For an NFA with ε-moves (also called an ε-NFA), the construction must be modified to deal with these by computing the ε-closure of states: the set of all states reachable from some given state using only ε-moves. Van Noord recognizes three possible ways of incorporating this closure computation in the powerset construction: Compute the ε-closure of the entire automaton as a preprocessing step, producing an equivalent NFA without ε-moves, then apply the regular powerset construction. This version, also discussed by Hopcroft and Ullman, is straightforward to implement, but impractical for automata with large numbers of ε-moves, as commonly arise in natural language processing application. During the powerset computation, compute the ε-closure { q ′ | q → ε ∗ q ′ } {\displaystyle \{q'~|~q\to _{\varepsilon }^{}q'\}} of each state q that is considered by the algorithm (and cache the result). During the powerset computation, compute the ε-closure { q ′ | ∃ q ∈ Q ′ , q → ε ∗ q ′ } {\displaystyle \{q'~|~\exists q\in Q',q\to _{\varepsilon }^{}q'\}} of each subset of states Q' that is considered by the algorithm, and add its elements to Q'. === Multiple initial states === If NFAs are defined to allow for multiple initial states, the initial state of the corresponding DFA is the set of all initial states of the NFA, or (if the NFA also has ε-moves) the set of all states reachable from initial states by ε-moves. == Example == The NFA below has four states; state 1 is initial, and states 3 and 4 are accepting. Its alphabet consists of the two symbols 0 and 1, and it has ε-moves. The initial state of the DFA constructed from this NFA is the set of all NFA states that are reachable from state 1 by ε-moves; that is, it is the set {1,2,3}. A transition from {1,2,3} by input symbol 0 must follow either the arrow from state 1 to state 2, or the arrow from state 3 to state 4. Additionally, neither state 2 nor state 4 have outgoing ε-moves. Therefore, T({1,2,3},0) = {2,4}, and by the same reasoning the full DFA constructed from the NFA is as shown below. As can be seen in this example, there are five states reachable from the start state of the DFA; the remaining 11 sets in the powerset of the set of NFA states are not reachable. == Complexity == Because the DFA states consist of sets of NFA states, an n-state NFA may be converted to a DFA with at most 2n states. For every n, there exist n-state NFAs such that every subset of states is reachable from the initial subset, so that the converted DFA has exactly 2n states, giving Θ(2n) worst-case time complexity. A simple example requiring nearly this many states is the language of strings over the alphabet {0,1} in which there are at least n characters, the nth from last of which is 1. It can be represented by an (n + 1)-state NFA, but it requires 2n DFA states, one for each n-character suffix of the input; cf. picture for n=4. == Applications == Brzozowski's algorithm for DFA minimization uses the powerset construction, twice. It converts the input DFA into an NFA for the reverse language, by reversing all its arrows and exchanging the roles of initial and accepting states, converts the NFA back into a DFA using the powerset construction, and then repeats its process. Its worst-case complexity is exponential, unlike some other known DFA minimization algorithms, but in many examples it performs more quickly than its worst-case complexity would suggest. Safra's construction, which converts a non-deterministic Büchi automaton with n states into a deterministic Muller automaton or into a deterministic Rabin automaton with 2O(n log n) states, uses the powerset construction as part of its machinery.

Topological deep learning

Topological deep learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel in processing data on regular grids and sequences. However, scientific and real-world data often exhibit more intricate data domains encountered in scientific computations, including point clouds, meshes, time series, scalar fields graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties. The mathematical foundations of TDL are algebraic topology, differential topology, and geometric topology. Therefore, TDL can be generalized for data on differentiable manifolds, knots, links, tangles, curves, etc. == History and motivation == Traditional techniques from deep learning often operate under the assumption that a dataset is residing in a highly-structured space (like images, where convolutional neural networks exhibit outstanding performance over alternative methods) or a Euclidean space. The prevalence of new types of data, in particular graphs, meshes, and molecules, resulted in the development of new techniques, culminating in the field of geometric deep learning, which originally proposed a signal-processing perspective for treating such data types. While originally confined to graphs, where connectivity is defined based on nodes and edges, follow-up work extended concepts to a larger variety of data types, including simplicial complexes and CW complexes, with recent work proposing a unified perspective of message-passing on general combinatorial complexes. An independent perspective on different types of data originated from topological data analysis, which proposed a new framework for describing structural information of data, i.e., their "shape," that is inherently aware of multiple scales in data, ranging from local information to global information. While at first restricted to smaller datasets, subsequent work developed new descriptors that efficiently summarized topological information of datasets to make them available for traditional machine-learning techniques, such as support vector machines or random forests. Such descriptors ranged from new techniques for feature engineering over new ways of providing suitable coordinates for topological descriptors, or the creation of more efficient dissimilarity measures. Contemporary research in this field is largely concerned with either integrating information about the underlying data topology into existing deep-learning models or obtaining novel ways of training on topological domains. == Learning on topological spaces == One of the core concepts in topological deep learning is considering the domain upon which this data is defined and supported. In case of Euclidean data, such as images, this domain is a grid, upon which the pixel value of the image is supported. In a more general setting this domain might be a topological domain. Studying and developing deep learning models that are supported ln topological domains constitute the essence of topological deep learning. Next, we introduce the most common topological domains that are encountered in a deep learning setting. These domains include, but not limited to, graphs, simplicial complexes, cell complexes, combinatorial complexes and hypergraphs. Given a finite set S of abstract entities, a neighborhood function N {\displaystyle {\mathcal {N}}} on S is an assignment that attach to every point x {\displaystyle x} in S a subset of S or a relation. Such a function can be induced by equipping S with an auxiliary structure. Edges provide one way of defining relations among the entities of S. More specifically, edges in a graph allow one to define the notion of neighborhood using, for instance, the one hop neighborhood notion. Edges however, limited in their modeling capacity as they can only be used to model binary relations among entities of S since every edge is connected typically to two entities. In many applications, it is desirable to permit relations that incorporate more than two entities. The idea of using relations that involve more than two entities is central to topological domains. Such higher-order relations allow for a broader range of neighborhood functions to be defined on S to capture multi-way interactions among entities of S. Next we review the main properties, advantages, and disadvantages of some commonly studied topological domains in the context of deep learning, including (abstract) simplicial complexes, regular cell complexes, hypergraphs, and combinatorial complexes. ==== Comparisons among topological domains ==== Each of the enumerated topological domains has its own characteristics, advantages, and limitations: Simplicial complexes Simplest form of higher-order domains. Extensions of graph-based models. Admit hierarchical structures, making them suitable for various applications. Hodge theory can be naturally defined on simplicial complexes. Require relations to be subsets of larger relations, imposing constraints on the structure. Cell Complexes Generalize simplicial complexes. Provide more flexibility in defining higher-order relations. Each cell in a cell complex is homeomorphic to an open ball, attached together via attaching maps. Boundary cells of each cell in a cell complex are also cells in the complex. Represented combinatorially via incidence matrices. Hypergraphs Allow arbitrary set-type relations among entities. Relations are not imposed by other relations, providing more flexibility. Do not explicitly encode the dimension of cells or relations. Useful when relations in the data do not adhere to constraints imposed by other models like simplicial and cell complexes. Combinatorial Complexes : Generalize and bridge the gaps between simplicial complexes, cell complexes, and hypergraphs. Allow for hierarchical structures and set-type relations. Combine features of other complexes while providing more flexibility in modeling relations. Can be represented combinatorially, similar to cell complexes. ==== Hierarchical structure and set-type relations ==== The properties of simplicial complexes, cell complexes, and hypergraphs give rise to two main features of relations on higher-order domains, namely hierarchies of relations and set-type relations. ===== Rank function ===== A rank function on a higher-order domain X is an order-preserving function rk: X → Z, where rk(x) attaches a non-negative integer value to each relation x in X, preserving set inclusion in X. Cell and simplicial complexes are common examples of higher-order domains equipped with rank functions and therefore with hierarchies of relations. ===== Set-type relations ===== Relations in a higher-order domain are called set-type relations if the existence of a relation is not implied by another relation in the domain. Hypergraphs constitute examples of higher-order domains equipped with set-type relations. Given the modeling limitations of simplicial complexes, cell complexes, and hypergraphs, we develop the combinatorial complex, a higher-order domain that features both hierarchies of relations and set-type relations. The learning tasks in TDL can be broadly classified into three categories: Cell classification: Predict targets for each cell in a complex. Examples include triangular mesh segmentation, where the task is to predict the class of each face or edge in a given mesh. Complex classification: Predict targets for an entire complex. For example, predict the class of each input mesh. Cell prediction: Predict properties of cell-cell interactions in a complex, and in some cases, predict whether a cell exists in the complex. An example is the prediction of linkages among entities in hyperedges of a hypergraph. In practice, to perform the aforementioned tasks, deep learning models designed for specific topological spaces must be constructed and implemented. These models, known as topological neural networks, are tailored to operate effectively within these spaces. === Topological neural networks === Central to TDL are topological neural networks (TNNs), specialized architectures designed to operate on data structured in topological domains. Unlike traditional neural networks tailored for grid-like structures, TNNs are adept at handling more intricate data representations, such as graphs

Hybrid cryptosystem

In cryptography, a hybrid cryptosystem is one which combines the convenience of a public-key cryptosystem with the efficiency of a symmetric-key cryptosystem. Public-key cryptosystems are convenient in that they do not require the sender and receiver to share a common secret in order to communicate securely. However, they often rely on complicated mathematical computations and are thus generally much more inefficient than comparable symmetric-key cryptosystems. In many applications, the high cost of encrypting long messages in a public-key cryptosystem can be prohibitive. This is addressed by hybrid systems by using a combination of both. A hybrid cryptosystem can be constructed using any two separate cryptosystems: a key encapsulation mechanism, which is a public-key cryptosystem a data encapsulation scheme, which is a symmetric-key cryptosystem The hybrid cryptosystem is itself a public-key system, whose public and private keys are the same as in the key encapsulation scheme. Note that for very long messages the bulk of the work in encryption/decryption is done by the more efficient symmetric-key scheme, while the inefficient public-key scheme is used only to encrypt/decrypt a short key value. == Implementations and standards == All practical implementations of public key cryptography today employ a hybrid system. Examples include the TLS protocol and the SSH protocol, that use a public-key mechanism for key exchange (such as Diffie-Hellman) and a symmetric-key mechanism for data encapsulation (such as AES). The OpenPGP file format and the PKCS#7 file format are other examples. Hybrid Public Key Encryption (HPKE, published as RFC 9180) is a modern standard for generic hybrid encryption. HPKE is used within multiple IETF protocols, including Messaging Layer Security (MLS), Oblivious DNS over HTTPS, Oblivious HTTP, Privacy Preserving Measurement, and TLS Encrypted Client Hello. Envelope encryption is an example of a usage of hybrid cryptosystems in cloud computing. In a cloud context, hybrid cryptosystems also enable centralized key management. == Example == To encrypt a message addressed to Alice in a hybrid cryptosystem, Bob does the following: Obtains Alice's public key. Generates a fresh symmetric key for the data encapsulation scheme. Encrypts the message under the data encapsulation scheme, using the symmetric key just generated. Encrypts the symmetric key under the key encapsulation scheme, using Alice's public key. Sends both of these ciphertexts to Alice. To decrypt this hybrid ciphertext, Alice does the following: Uses her private key to decrypt the symmetric key contained in the key encapsulation segment. Uses this symmetric key to decrypt the message contained in the data encapsulation segment. == Security == If both the key encapsulation and data encapsulation schemes in a hybrid cryptosystem are secure against adaptive chosen ciphertext attacks, then the hybrid scheme inherits that property as well. However, it is possible to construct a hybrid scheme secure against adaptive chosen ciphertext attacks even if the key encapsulation has a slightly weakened security definition (though the security of the data encapsulation must be slightly stronger). == Envelope encryption == Envelope encryption is term used for encrypting with a hybrid cryptosystem used by all major cloud service providers, often as part of a centralized key management system in cloud computing. Envelope encryption gives names to the keys used in hybrid encryption: Data Encryption Keys (abbreviated DEK, and used to encrypt data) and Key Encryption Keys (abbreviated KEK, and used to encrypt the DEKs). In a cloud environment, encryption with envelope encryption involves generating a DEK locally, encrypting one's data using the DEK, and then issuing a request to wrap (encrypt) the DEK with a KEK stored in a potentially more secure service. Then, this wrapped DEK and encrypted message constitute a ciphertext for the scheme. To decrypt a ciphertext, the wrapped DEK is unwrapped (decrypted) via a call to a service, and then the unwrapped DEK is used to decrypt the encrypted message. In addition to the normal advantages of a hybrid cryptosystem, using asymmetric encryption for the KEK in a cloud context provides easier key management and separation of roles, but can be slower. In cloud systems, such as Google Cloud Platform and Amazon Web Services, a key management system (KMS) can be available as a service. In some cases, the key management system will store keys in hardware security modules, which are hardware systems that protect keys with hardware features like intrusion resistance. This means that KEKs can also be more secure because they are stored on secure specialized hardware. Envelope encryption makes centralized key management easier because a centralized key management system only needs to store KEKs, which occupy less space, and requests to the KMS only involve sending wrapped and unwrapped DEKs, which use less bandwidth than transmitting entire messages. Since one KEK can be used to encrypt many DEKs, this also allows for less storage space to be used in the KMS. This also allows for centralized auditing and access control at one point of access.

Cryptochannel

In telecommunications, a cryptochannel is a complete system of crypto-communications between two or more holders or parties. It includes: (a) the cryptographic aids prescribed; (b) the holders thereof; (c) the indicators or other means of identification; (d) the area or areas in which effective; (e) the special purpose, if any, for which provided; and (f) pertinent notes as to distribution, usage, etc. A cryptochannel is analogous to a radio circuit.

Undeniable signature

An undeniable signature is a digital signature scheme which allows the signer to be selective to whom they allow to verify signatures. The scheme adds explicit signature repudiation, preventing a signer later refusing to verify a signature by omission; a situation that would devalue the signature in the eyes of the verifier. It was invented by David Chaum and Hans van Antwerpen in 1989. == Overview == In this scheme, a signer possessing a private key can publish a signature of a message. However, the signature reveals nothing to a recipient/verifier of the message and signature without taking part in either of two interactive protocols: Confirmation protocol, which confirms that a candidate is a valid signature of the message issued by the signer, identified by the public key. Disavowal protocol, which confirms that a candidate is not a valid signature of the message issued by the signer. The motivation for the scheme is to allow the signer to choose to whom signatures are verified. However, that the signer might claim the signature is invalid at any later point, by refusing to take part in verification, would devalue signatures to verifiers. The disavowal protocol distinguishes these cases removing the signer's plausible deniability. It is important that the confirmation and disavowal exchanges are not transferable. They achieve this by having the property of zero-knowledge; both parties can create transcripts of both confirmation and disavowal that are indistinguishable, to a third-party, of correct exchanges. The designated verifier signature scheme improves upon deniable signatures by allowing, for each signature, the interactive portion of the scheme to be offloaded onto another party, a designated verifier, reducing the burden on the signer. == Zero-knowledge protocol == The following protocol was suggested by David Chaum. A group, G, is chosen in which the discrete logarithm problem is intractable, and all operation in the scheme take place in this group. Commonly, this will be the finite cyclic group of order p contained in Z/nZ, with p being a large prime number; this group is equipped with the group operation of integer multiplication modulo n. An arbitrary primitive element (or generator), g, of G is chosen; computed powers of g then combine obeying fixed axioms. Alice generates a key pair, randomly chooses a private key, x, and then derives and publishes the public key, y = gx. === Message signing === Alice signs the message, m, by computing and publishing the signature, z = mx. === Confirmation (i.e., avowal) protocol === Bob wishes to verify the signature, z, of m by Alice under the key, y. Bob picks two random numbers: a and b, and uses them to blind the message, sending to Alice: c = magb. Alice picks a random number, q, uses it to blind, c, and then signing this using her private key, x, sending to Bob: s1 = cgq ands2 = s1x. Note that s1x = (cgq)x = (magb)xgqx = (mx)a(gx)b+q = zayb+q. Bob reveals a and b. Alice verifies that a and b are the correct blind values, then, if so, reveals q. Revealing these blinds makes the exchange zero knowledge. Bob verifies s1 = cgq, proving q has not been chosen dishonestly, and s2 = zayb+q, proving z is valid signature issued by Alice's key. Note that zayb+q = (mx)a(gx)b+q. Alice can cheat at step 2 by attempting to randomly guess s2. === Disavowal protocol === Alice wishes to convince Bob that z is not a valid signature of m under the key, gx; i.e., z ≠ mx. Alice and Bob have agreed an integer, k, which sets the computational burden on Alice and the likelihood that she should succeed by chance. Bob picks random values, s ∈ {0, 1, ..., k} and a, and sends: v1 = msga and v2 = zsya, where exponentiating by a is used to blind the sent values. Note that v2 = zsya = (mx)s(gx)a = v1x. Alice, using her private key, computes v1x and then the quotient, v1xv2−1 = (msga)x(zsgxa)−1 = msxz−s = (mxz−1)s. Thus, v1xv2−1 = 1, unless z ≠ mx. Alice then tests v1xv2−1 for equality against the values: (mxz−1)i for i ∈ {0, 1, …, k}; which are calculated by repeated multiplication of mxz−1 (rather than exponentiating for each i). If the test succeeds, Alice conjectures the relevant i to be s; otherwise, she conjectures random value. Where z = mx, (mxz−1)i = v1xv2−1 = 1 for all i, s is unrecoverable. Alice commits to i: she picks a random r and sends hash(r, i) to Bob. Bob reveals a. Alice confirms that a is the correct blind (i.e., v1 and v2 can be generated using it), then, if so, reveals r. Revealing these blinds makes the exchange zero knowledge. Bob checks hash(r, i) = hash(r, s), proving Alice knows s, hence z ≠ mx. If Alice attempts to cheat at step 3 by guessing s at random, the probability of succeeding is 1/(k + 1). So, if k = 1023 and the protocol is conducted ten times, her chances are 1 to 2100.

Microsoft Support Diagnostic Tool

The Microsoft Support Diagnostic Tool (MSDT) is a legacy service in Microsoft Windows that allows Microsoft technical support agents to analyze diagnostic data remotely for troubleshooting purposes. In April 2022 it was observed to have a security vulnerability that allowed remote code execution which was being exploited to attack computers in Russia and Belarus, and later against the Tibetan government in exile. Microsoft advised a temporary workaround of disabling the MSDT by editing the Windows registry. == Use == When contacting support the user is told to run MSDT and given a unique "passkey" which they enter. They are also given an "incident number" to uniquely identify their case. The MSDT can also be run offline which will generate a .CAB file which can be uploaded from a computer with an internet connection. == Security vulnerabilities == === Follina === Follina is the name given to a remote code execution (RCE) vulnerability, a type of arbitrary code execution (ACE) exploit, in the Microsoft Support Diagnostic Tool (MSDT) which was first widely publicized on May 27, 2022, by a security research group called Nao Sec. This exploit allows a remote attacker to use a Microsoft Office document template to execute code via MSDT. This works by exploiting the ability of Microsoft Office document templates to download additional content from a remote server. If the size of the downloaded content is large enough it causes a buffer overflow allowing a payload of Powershell code to be executed without explicit notification to the user. On May 30 Microsoft issued CVE-2022-30190 with guidance that users should disable MSDT. Malicious actors have been observed exploiting the bug to attack computers in Russia and Belarus since April, and it is believed Chinese state actors had been exploiting it to attack the Tibetan government in exile based in India. Microsoft patched this vulnerability in its June 2022 patches. === DogWalk === The DogWalk vulnerability is a remote code execution (RCE) vulnerability in the Microsoft Support Diagnostic Tool (MSDT). It was first reported in January 2020, but Microsoft initially did not consider it to be a security issue. However, the vulnerability was later exploited in the wild, and Microsoft released a patch for it in August 2022. The vulnerability is caused by a path traversal vulnerability in the sdiageng.dll library. This vulnerability allows an attacker to trick a victim into opening a malicious diagcab file, which is a type of Windows cabinet file that is used to store support files. When the diagcab file is opened, it triggers the MSDT tool, which then executes the malicious code. Originally discovered by Mitja Kolsek, the DogWalk vulnerability is caused by a path traversal vulnerability in the sdiageng.dll library. This vulnerability allows an attacker to trick a victim into opening a malicious diagcab file, which is a type of Windows cabinet file that is used to store support files. When the diagcab file is opened, it triggers the MSDT tool, which then executes the malicious code. The vulnerability is exploited by creating a malicious diagcab file that contains a specially crafted path. This path contains a sequence of characters that is designed to exploit the path traversal vulnerability in the sdiageng.dll library. When the diagcab file is opened, the MSDT tool will attempt to follow the path. However, the path will contain characters that are not valid for a Windows path. This will cause the MSDT tool to crash. When the MSDT tool crashes, it will generate a memory dump. This memory dump will contain the malicious code that was executed by the MSDT tool. The attacker can then use this memory dump to extract the malicious code and execute it on their own computer. == Retirement == Microsoft will no longer be supporting the Windows legacy inbox Troubleshooters. In 2025, Microsoft will remove the MSDT platform entirely. Get Help is the replacement tool. == Windows versions == Windows 7 Windows 8.1 Windows 10 Windows 11 (up to 22H2) Future versions and feature upgrades will deprecate the MSDT after May 23, 2023.

Cipher device

A cipher device was a term used by the US military in the first half of the 20th century to describe a manually operated cipher equipment that converted the plaintext into ciphertext or vice versa. A similar term, cipher machine, was used to describe the cipher equipment that required external power for operation. Cipher box or crypto box is a physical cryptographic device used to encrypt and decrypt messages between plaintext (unencrypted) and ciphertext (encrypted or secret) forms. The ciphertext is suitable for transmission over a channel, such as radio, that might be observed by an adversary the communicating parties wish to conceal the plaintext from.