AI Assistant Stock Trading

AI Assistant Stock Trading — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Algorithmic inference

    Algorithmic inference

    Algorithmic inference gathers new developments in the statistical inference methods made feasible by the powerful computing devices widely available to any data analyst. Cornerstones in this field are computational learning theory, granular computing, bioinformatics, and, long ago, structural probability (Fraser 1966). The main focus is on the algorithms which compute statistics rooting the study of a random phenomenon, along with the amount of data they must feed on to produce reliable results. This shifts the interest of mathematicians from the study of the distribution laws to the functional properties of the statistics, and the interest of computer scientists from the algorithms for processing data to the information they process. == The Fisher parametric inference problem == Concerning the identification of the parameters of a distribution law, the mature reader may recall lengthy disputes in the mid 20th century about the interpretation of their variability in terms of fiducial distribution (Fisher 1956), structural probabilities (Fraser 1966), priors/posteriors (Ramsey 1925), and so on. From an epistemology viewpoint, this entailed a companion dispute as to the nature of probability: is it a physical feature of phenomena to be described through random variables or a way of synthesizing data about a phenomenon? Opting for the latter, Fisher defines a fiducial distribution law of parameters of a given random variable that he deduces from a sample of its specifications. With this law he computes, for instance "the probability that μ (mean of a Gaussian variable – omeur note) is less than any assigned value, or the probability that it lies between any assigned values, or, in short, its probability distribution, in the light of the sample observed". == The classic solution == Fisher fought hard to defend the difference and superiority of his notion of parameter distribution in comparison to analogous notions, such as Bayes' posterior distribution, Fraser's constructive probability and Neyman's confidence intervals. For half a century, Neyman's confidence intervals won out for all practical purposes, crediting the phenomenological nature of probability. With this perspective, when you deal with a Gaussian variable, its mean μ is fixed by the physical features of the phenomenon you are observing, where the observations are random operators, hence the observed values are specifications of a random sample. Because of their randomness, you may compute from the sample specific intervals containing the fixed μ with a given probability that you denote confidence. === Example === Let X be a Gaussian variable with parameters μ {\displaystyle \mu } and σ 2 {\displaystyle \sigma ^{2}} and { X 1 , … , X m } {\displaystyle \{X_{1},\ldots ,X_{m}\}} a sample drawn from it. Working with statistics S μ = ∑ i = 1 m X i {\displaystyle S_{\mu }=\sum _{i=1}^{m}X_{i}} and S σ 2 = ∑ i = 1 m ( X i − X ¯ ) 2 , where X ¯ = S μ m {\displaystyle S_{\sigma ^{2}}=\sum _{i=1}^{m}(X_{i}-{\overline {X}})^{2},{\text{ where }}{\overline {X}}={\frac {S_{\mu }}{m}}} is the sample mean, we recognize that T = S μ − m μ S σ 2 m − 1 m = X ¯ − μ S σ 2 / ( m ( m − 1 ) ) {\displaystyle T={\frac {S_{\mu }-m\mu }{\sqrt {S_{\sigma ^{2}}}}}{\sqrt {\frac {m-1}{m}}}={\frac {{\overline {X}}-\mu }{\sqrt {S_{\sigma ^{2}}/(m(m-1))}}}} follows a Student's t distribution (Wilks 1962) with parameter (degrees of freedom) m − 1, so that f T ( t ) = Γ ( m / 2 ) Γ ( ( m − 1 ) / 2 ) 1 π ( m − 1 ) ( 1 + t 2 m − 1 ) m / 2 . {\displaystyle f_{T}(t)={\frac {\Gamma (m/2)}{\Gamma ((m-1)/2)}}{\frac {1}{\sqrt {\pi (m-1)}}}\left(1+{\frac {t^{2}}{m-1}}\right)^{m/2}.} Gauging T between two quantiles and inverting its expression as a function of μ {\displaystyle \mu } you obtain confidence intervals for μ {\displaystyle \mu } . With the sample specification: x = { 7.14 , 6.3 , 3.9 , 6.46 , 0.2 , 2.94 , 4.14 , 4.69 , 6.02 , 1.58 } {\displaystyle \mathbf {x} =\{7.14,6.3,3.9,6.46,0.2,2.94,4.14,4.69,6.02,1.58\}} having size m = 10, you compute the statistics s μ = 43.37 {\displaystyle s_{\mu }=43.37} and s σ 2 = 46.07 {\displaystyle s_{\sigma ^{2}}=46.07} , and obtain a 0.90 confidence interval for μ {\displaystyle \mu } with extremes (3.03, 5.65). == Inferring functions with the help of a computer == From a modeling perspective the entire dispute looks like a chicken-egg dilemma: either fixed data by first and probability distribution of their properties as a consequence, or fixed properties by first and probability distribution of the observed data as a corollary. The classic solution has one benefit and one drawback. The former was appreciated particularly back when people still did computations with sheet and pencil. Per se, the task of computing a Neyman confidence interval for the fixed parameter θ is hard: you do not know θ, but you look for disposing around it an interval with a possibly very low probability of failing. The analytical solution is allowed for a very limited number of theoretical cases. Vice versa a large variety of instances may be quickly solved in an approximate way via the central limit theorem in terms of confidence interval around a Gaussian distribution – that's the benefit. The drawback is that the central limit theorem is applicable when the sample size is sufficiently large. Therefore, it is less and less applicable with the sample involved in modern inference instances. The fault is not in the sample size on its own part. Rather, this size is not sufficiently large because of the complexity of the inference problem. With the availability of large computing facilities, scientists refocused from isolated parameters inference to complex functions inference, i.e. re sets of highly nested parameters identifying functions. In these cases we speak about learning of functions (in terms for instance of regression, neuro-fuzzy system or computational learning) on the basis of highly informative samples. A first effect of having a complex structure linking data is the reduction of the number of sample degrees of freedom, i.e. the burning of a part of sample points, so that the effective sample size to be considered in the central limit theorem is too small. Focusing on the sample size ensuring a limited learning error with a given confidence level, the consequence is that the lower bound on this size grows with complexity indices such as VC dimension or detail of a class to which the function we want to learn belongs. === Example === A sample of 1,000 independent bits is enough to ensure an absolute error of at most 0.081 on the estimation of the parameter p of the underlying Bernoulli variable with a confidence of at least 0.99. The same size cannot guarantee a threshold less than 0.088 with the same confidence 0.99 when the error is identified with the probability that a 20-year-old man living in New York does not fit the ranges of height, weight and waistline observed on 1,000 Big Apple inhabitants. The accuracy shortage occurs because both the VC dimension and the detail of the class of parallelepipeds, among which the one observed from the 1,000 inhabitants' ranges falls, are equal to 6. == The general inversion problem solving the Fisher question == With insufficiently large samples, the approach: fixed sample – random properties suggests inference procedures in three steps: === Definition === For a random variable and a sample drawn from it a compatible distribution is a distribution having the same sampling mechanism M X = ( Z , g θ ) {\displaystyle {\mathcal {M}}_{X}=(Z,g_{\boldsymbol {\theta }})} of X with a value θ {\displaystyle {\boldsymbol {\theta }}} of the random parameter Θ {\displaystyle \mathbf {\Theta } } derived from a master equation rooted on a well-behaved statistic s. === Example === You may find the distribution law of the Pareto parameters A and K as an implementation example of the population bootstrap method as in the figure on the left. Implementing the twisting argument method, you get the distribution law F M ( μ ) {\displaystyle F_{M}(\mu )} of the mean M of a Gaussian variable X on the basis of the statistic s M = ∑ i = 1 m x i {\textstyle s_{M}=\sum _{i=1}^{m}x_{i}} when Σ 2 {\displaystyle \Sigma ^{2}} is known to be equal to σ 2 {\displaystyle \sigma ^{2}} (Apolloni, Malchiodi & Gaito 2006). Its expression is: F M ( μ ) = Φ ( m μ − s M σ m ) , {\displaystyle F_{M}(\mu )=\Phi {\left({\frac {m\mu -s_{M}}{\sigma {\sqrt {m}}}}\right)},} shown in the figure on the right, where Φ {\displaystyle \Phi } is the cumulative distribution function of a standard normal distribution. Computing a confidence interval for M given its distribution function is straightforward: we need only find two quantiles (for instance δ / 2 {\displaystyle \delta /2} and 1 − δ / 2 {\displaystyle 1-\delta /2} quantiles in case we are interested in a confidence interval of level δ symmetric in the tail's probabilities) as indicated on the left in the diagram showing the behavior of

    Read more →
  • Data governance

    Data governance

    Data governance is a term used on both a macro and a micro level. The former is a political concept and forms part of international relations and Internet governance; the latter is a data management concept and forms part of corporate/organizational data governance. Data governance involves delegating authority over data and exercising that authority through decision-making processes. It plays a role in enhancing the value of data assets. == Macro level == Data governance at the macro level involves regulating cross-border data flows among countries, which is more precisely termed international data governance. This field was first formed in the early 2000s, and consists of "norms, principles and rules governing various types of data." There have been several international groups established by research organizations that aim to grant access to their data. These groups that enable an exchange of data are, as a result, exposed to domestic and international legal interpretations that ultimately decide how data is used. However, as of 2023, there are no international laws or agreements specifically focused on data protection. == Data governance (Data Management) == Data governance is the set of principles, policies, and processes that guide the effective and responsible use of data within an organization. It creates a framework for decision making, accountability, and oversight across the data lifecycle, from creation and storage to sharing and disposal. Data governance is closely linked with data management, which provides the practical methods to carry out governance objectives. These methods include data quality assurance, metadata management, master data management, security controls, and compliance monitoring. Together, governance and management aim to maximize the value of data as a strategic asset, reduce risks from misuse or inaccuracy, and ensure compliance with regulatory, ethical, and business requirements. The importance of this discipline has grown with the rise of big data, cloud computing, and artificial intelligence, where consistent standards and stewardship are essential for privacy protection, interoperability, and informed decision making. == Data governance drivers == While data governance initiatives can be driven by a desire to improve data quality, they are often driven by C-level leaders responding to external regulations. In a recent report conducted by the CIO WaterCooler community, 54% stated the key driver was efficiencies in processes; 39% - regulatory requirements; and only 7% customer service. Examples of these regulations include Sarbanes–Oxley Act, Basel I, Basel II, HIPAA, GDPR, cGMP, and a number of data privacy regulations. To achieve compliance with these regulations, business processes and controls require formal management processes to govern the data subject to these regulations. Successful programs identify drivers that are meaningful to both supervisory and executive leadership. Common themes among the external regulations center on the need to manage risk. The risks can be financial misstatement, inadvertent release of sensitive data, or poor data quality for key decisions. Methods to manage these risks vary from industry to industry. Examples of commonly referenced best practices and guidelines include COBIT, ISO/IEC 38500, and others. The proliferation of regulations and standards creates challenges for data governance professionals, particularly when multiple regulations overlap the data being managed. Organizations often launch data governance initiatives to address these challenges. == Data governance initiatives (Dimensions) == Data governance initiatives improve the quality of data by assigning a team responsible for data's accuracy, completeness, consistency, timeliness, validity, and uniqueness. This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs a methodology for tracking and improving enterprise data, such as Six Sigma, and tools for data mapping, profiling, cleansing, and monitoring data. Data governance initiatives may be aimed at achieving a number of objectives including offering better visibility to internal and external customers (such as supply chain management), compliance with regulatory law, improving operations after rapid company growth or corporate mergers, or to aid the efficiency of enterprise knowledge workers by reducing confusion and error and increasing their scope of knowledge. Many data governance initiatives are also inspired by past attempts to fix information quality at the departmental level, which can lead to incongruent and redundant data quality processes. Most large companies have many applications and databases that can not easily share information. Therefore, knowledge workers within large organizations may not have access to the data they need to best do their jobs. When they do have access to the data, the data quality may be poor. By setting up a data governance practice or corporate data authority (individual or area responsible for determining how to proceed, in the best interest of the business, when a data issue arises), these problems can be mitigated. == Implementation == Implementation of a data governance initiative may vary in scope as well as origin. Sometimes, an executive mandate will arise to initiate an enterprise-wide effort. Sometimes the mandate will be to create a pilot project or projects, limited in scope and objectives, aimed at either resolving existing issues or demonstrating value. Sometimes, an initiative originates from lower down in the organization's hierarchy and will be deployed in a limited scope to demonstrate value to potential sponsors higher up in the organization. The initial scope of an implementation can vary greatly as well, from review of a one-off IT system to a cross-organization initiative. == Data governance tools == Leaders of successful data governance programs declared at the Data Governance Conference in Orlando, FL, in December 2006, that data governance is about 80 to 95 percent communication. That stated, it is a given that many of the objectives of a data governance program must be accomplished with appropriate tools. Many vendors are now positioning their products as data governance tools. Due to the different focus areas of various data governance initiatives, a given tool may or may not be appropriate. Additionally, many tools that are not marketed as governance tools address governance needs and demands.

    Read more →
  • Torus interconnect

    Torus interconnect

    A torus interconnect is a switch-less network topology for connecting processing nodes in a parallel computer system. == Introduction == In geometry, a torus is created by revolving a circle about an axis coplanar to the circle. While this is a general definition in geometry, the topological properties of this type of shape describes the network topology in its essence. === Geometry illustration === In the representations below, the first is a one dimension torus, a simple circle. The second is a two dimension torus, in the shape of a 'doughnut'. The animation illustrates how a two dimension torus is generated from a rectangle by connecting its two pairs of opposite edges. At one dimension, a torus topology is equivalent to a ring interconnect network, in the shape of a circle. At two dimensions, it becomes equivalent to a two dimension mesh, but with extra connection at the edge nodes. === Torus network topology === A torus interconnect is a switch-less topology that can be seen as a mesh interconnect with nodes arranged in a rectilinear array of N = 2, 3, or more dimensions, with processors connected to their nearest neighbors, and corresponding processors on opposite edges of the array connected.[1] In this lattice, each node has 2N connections. This topology is named for the lattice formed in this way, which is topologically homogeneous to an N-dimensional torus. == Visualization == The first 3 dimensions of torus network topology are easier to visualize and are described below: 1D Torus: one dimension, n nodes are connected in closed loop with each node connected to its two nearest neighbors. Communication can take place in two directions, +x and −x. A 1D Torus is the same as ring interconnection. 2D Torus: two dimensions with degree of four, the nodes are imagined laid out in a two-dimensional rectangular lattice of n rows and n columns, with each node connected to its four nearest neighbors, and corresponding nodes on opposite edges connected. Communication can take place in four directions, +x, −x, +y, and −y. The total nodes of a 2D Torus is n2. 3D Torus: three dimensions, the nodes are imagined in a three-dimensional lattice in the shape of a rectangular prism, with each node connected with its six neighbors, with corresponding nodes on opposing faces of the array connected. Each edge consists of n nodes. communication can take place in six directions, +x, −x, +y, −y, +z, −z. Each edge of a 3D Torus consist of n nodes. The total nodes of 3D Torus is n3. ND Torus: N dimensions, each node of an N dimension torus has 2N neighbors, Communication can take place in 2N directions. Each edge consists of n nodes. Total nodes of this torus is nN. The main motivation of having higher dimension of torus is to achieve higher bandwidth, lower latency, and higher scalability. Higher-dimensional arrays are difficult to visualize. The above ruleset shows that each higher dimension adds another pair of nearest neighbor connections to each node. == Performance == A number of supercomputers on the TOP500 list use three-dimensional torus networks, e.g. IBM's Blue Gene/L and Blue Gene/P, and the Cray XT3. IBM's Blue Gene/Q uses a five-dimensional torus network. Fujitsu's K computer and the PRIMEHPC FX10 use a proprietary three-dimensional torus 3D mesh interconnect called Tofu. === 3D Torus performance simulation === Sandeep Palur and Dr. Ioan Raicu from Illinois Institute of Technology conducted experiments to simulate 3D torus performance. Their experiments ran on a computer with 250GB RAM, 48 cores and x86_64 architecture. The simulator they used was ROSS (Rensselaer’s Optimistic Simulation System). They mainly focused on three aspects: Varying network size Varying number of servers Varying message size They concluded that throughput decreases with the increase of servers and network size. Otherwise, throughput increases with the increase of message size. === 6D Torus product performance === Fujitsu Limited developed a 6D torus computer model called "Tofu". In their model, a 6D torus can achieve 100 GB/s off-chip bandwidth, 12 times higher scalability than a 3D torus, and high fault tolerance. The model is used in the K computer and Fugaku. === Cost === While long wrap-around links may be the easiest way to visualize the connection topology, in practice, restrictions on cable lengths often make long wrap-around links impractical. Instead, directly connected nodes—including nodes that the above visualization places on opposite edges of a grid, connected by a long wrap-around link—are physically placed nearly adjacent to each other in a folded torus network. Every link in the folded torus network is very short—almost as short as the nearest-neighbor links in a simple grid interconnect—and therefore low-latency.

    Read more →
  • Data stream management system

    Data stream management system

    A data stream management system (DSMS) is a computer software system to manage continuous data streams. It is similar to a database management system (DBMS), which is, however, designed for static data in conventional databases. A DBMS also offers a flexible query processing so that the information needed can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a continuous query that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to complex event processing so that both technologies are partially coalescing. == Functional principle == One important feature of a DSMS is the possibility to handle potentially infinite and rapidly changing data streams by offering flexible processing at the same time, although there are only limited resources such as main memory. The following table provides various principles of DSMS and compares them to traditional DBMS. == Processing and streaming models == One of the biggest challenges for a DSMS is to handle potentially infinite data streams using a fixed amount of memory and no random access to the data. There are different approaches to limit the amount of data in one pass, which can be divided into two classes. For the one hand, there are compression techniques that try to summarize the data and for the other hand there are window techniques that try to portion the data into (finite) parts. === Synopses === The idea behind compression techniques is to maintain only a synopsis of the data, but not all (raw) data points of the data stream. The algorithms range from selecting random data points called sampling to summarization using histograms, wavelets or sketching. One simple example of a compression is the continuous calculation of an average. Instead of memorizing each data point, the synopsis only holds the sum and the number of items. The average can be calculated by dividing the sum by the number. However, it should be mentioned that synopses cannot reflect the data accurately. Thus, a processing that is based on synopses may produce inaccurate results. === Windows === Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data. There are also different approaches to implementing windows. There are, for example, approaches that use timestamps or time intervals for system-wide windows or buffer-based windows for each single processing step. Sliding-window query processing is also suitable to being implemented in parallel processors by exploiting parallelism between different windows and/or within each window extent. == Query processing == Since there are a lot of prototypes, there is no standardized architecture. However, most DSMS are based on the query processing in DBMS by using declarative languages to express queries, which are translated into a plan of operators. These plans can be optimized and executed. A query processing often consists of the following steps. === Formulation of continuous queries === The formulation of queries is mostly done using declarative languages like SQL in DBMS. Since there are no standardized query languages to express continuous queries, there are a lot of languages and variations. However, most of them are based on SQL, such as the Continuous Query Language (CQL), StreamSQL and ESP. There are also graphical approaches where each processing step is a box and the processing flow is expressed by arrows between the boxes. The language strongly depends on the processing model. For example, if windows are used for the processing, the definition of a window has to be expressed. In StreamSQL, a query with a sliding window for the last 10 elements looks like follows: This stream continuously calculates the average value of "price" of the last 10 tuples, but only considers those tuples whose prices are greater than 100.0. In the next step, the declarative query is translated into a logical query plan. A query plan is a directed graph where the nodes are operators and the edges describe the processing flow. Each operator in the query plan encapsulates the semantic of a specific operation, such as filtering or aggregation. In DSMSs that process relational data streams, the operators are equal or similar to the operators of the Relational algebra, so that there are operators for selection, projection, join, and set operations. This operator concept allows the very flexible and versatile processing of a DSMS. === Optimization of queries === The logical query plan can be optimized, which strongly depends on the streaming model. The basic concepts for optimizing continuous queries are equal to those from database systems. If there are relational data streams and the logical query plan is based on relational operators from the Relational algebra, a query optimizer can use the algebraic equivalences to optimize the plan. These may be, for example, to push selection operators down to the sources, because they are not so computationally intensive like join operators. Furthermore, there are also cost-based optimization techniques like in DBMS, where a query plan with the lowest costs is chosen from different equivalent query plans. One example is to choose the order of two successive join operators. In DBMS this decision is mostly done by certain statistics of the involved databases. But, since the data of a data streams is unknown in advance, there are no such statistics in a DSMS. However, it is possible to observe a data stream for a certain time to obtain some statistics. Using these statistics, the query can also be optimized later. So, in contrast to a DBMS, some DSMS allows to optimize the query even during runtime. Therefore, a DSMS needs some plan migration strategies to replace a running query plan with a new one. === Transformation of queries === Since a logical operator is only responsible for the semantics of an operation but does not consist of any algorithms, the logical query plan must be transformed into an executable counterpart. This is called a physical query plan. The distinction between a logical and a physical operator plan allows more than one implementation for the same logical operator. The join, for example, is logically the same, although it can be implemented by different algorithms like a Nested loop join or a Sort-merge join. Notice, these algorithms also strongly depend on the used stream and processing model. Finally, the query is available as a physical query plan. === Execution of queries === Since the physical query plan consists of executable algorithms, it can be directly executed. For this, the physical query plan is installed into the system. The bottom of the graph (of the query plan) is connected to the incoming sources, which can be everything like connectors to sensors. The top of the graph is connected to the outgoing sinks, which may be for example a visualization. Since most DSMSs are data-driven, a query is executed by pushing the incoming data elements from the source through the query plan to the sink. Each time when a data element passes an operator, the operator performs its specific operation on the data element and forwards the result to all successive operators. == Examples == AURORA, StreamBase Systems, Inc. Archived 23 March 2009 at the Wayback Machine Hortonworks DataFlow IBM Streams NIAGARA Query Engine NiagaraST: A Research Data Stream Management System at Portland State University Odysseus, an open source Java-based framework for Data Stream Management Systems Pipeline DB PIPES Archived 24 December 2016 at the Wayback Machine, webMethods Business Events QStream SAS Event Stream Processing SQLstream STREAM StreamGlobe StreamInsight TelegraphCQ WSO2 Stream Processor

    Read more →
  • Scan line

    Scan line

    A scan line (also scanline) is one line, or row, in a raster scanning pattern, such as a line of video on a cathode-ray tube (CRT) display of a television set or computer monitor. On CRT screens the horizontal scan lines are visually discernible, even when viewed from a distance, as alternating colored lines and black lines, especially when a progressive scan signal with below maximum vertical resolution is displayed. This is sometimes used today as a visual effect in computer graphics. The term is used, by analogy, for a single row of pixels in a raster graphics image. Scan lines are important in representations of image data, because many image file formats have special rules for data at the end of a scan line. For example, there may be a rule that each scan line starts on a particular boundary (such as a byte or word; see for example BMP file format). This means that even otherwise compatible raster data may need to be analyzed at the level of scan lines in order to convert between formats.

    Read more →
  • White-box cryptography

    White-box cryptography

    In cryptography, the white-box model refers to an extreme attack scenario, in which an adversary has full unrestricted access to a cryptographic implementation, most commonly of a block cipher such as the Advanced Encryption Standard (AES). A variety of security goals may be posed (see the section below), the most fundamental being "unbreakability", requiring that any (bounded) attacker should not be able to extract the secret key hardcoded in the implementation, while at the same time the implementation must be fully functional. In contrast, the black-box model only provides an oracle access to the analyzed cryptographic primitive (in the form of encryption and/or decryption queries). There is also a model in-between, the so-called gray-box model, which corresponds to additional information leakage from the implementation, more commonly referred to as side-channel leakage. White-box cryptography is a practice and study of techniques for designing and attacking white-box implementations. It has many applications, including digital rights management (DRM), pay television, protection of cryptographic keys in the presence of malware, mobile payments and cryptocurrency wallets. Examples of DRM systems employing white-box implementations include CSS and Widevine. White-box cryptography is closely related to the more general notions of obfuscation, in particular, to Black-box obfuscation, proven to be impossible, and to Indistinguishability obfuscation, constructed recently under well-founded assumptions but so far being infeasible to implement in practice. As of January 2023, there are no publicly known unbroken white-box designs of standard symmetric encryption schemes. On the other hand, there exist many unbroken white-box implementations of dedicated block ciphers designed specifically to achieve incompressibility (see § Security goals). == Security goals == Depending on the application, different security goals may be required from a white-box implementation. Specifically, for symmetric-key algorithms the following are distinguished: Unbreakability is the most fundamental goal requiring that a bounded attacker should not be able to recover the secret key embedded in the white-box implementation. Without this requirement, all other security goals are unreachable since a successful attacker can simply use a reference implementation of the encryption scheme together with the extracted key. One-wayness requires that a white-box implementation of an encryption scheme can not be used by a bounded attacker to decrypt ciphertexts. This requirement essentially turns a symmetric encryption scheme into a public-key encryption scheme, where the white-box implementation plays the role of the public key associated to the embedded secret key. This idea was proposed already in the famous work of Diffie and Hellman in 1976 as a potential public-key encryption candidate. Code lifting security is an informal requirement on the context, in which the white-box program is being executed. It demands that an attacker can not extract a functional copy of the program. This goal is particularly relevant in the DRM setting. Code obfuscation techniques are often used to achieve this goal. A commonly used technique is to compose the white-box implementation with so-called external encodings. These are lightweight secret encodings that modify the function computed by the white-box part of an application. It is required that their effect is canceled in other parts of the application in an obscure way, using code obfuscation techniques. Alternatively, the canceling counterparts can be applied on a remote server. Incompressibility requires that an attacker can not significantly compress a given white-box implementation. This can be seen as a way to achieve code lifting security (see above), since exfiltrating a large program from a constrained device (for example, an embedded or a mobile device) can be time-consuming and may be easy to detect by a firewall. Examples of incompressible designs include SPACE cipher, SPNbox, WhiteKey and WhiteBlock. These ciphers use large lookup tables that can be pseudorandomly generated from a secret master key. Although this makes the recovery of the master key hard, the lookup tables themselves play the role of an equivalent secret key. Thus, unbreakability is achieved only partially. Traceability (Traitor tracing) requires that each distributed white-box implementation contains a digital watermark allowing identification of the guilty user in case the white-box program is being leaked and distributed publicly. == History == The white-box model with initial attempts of white-box DES and AES implementations were first proposed by Chow, Eisen, Johnson and van Oorshot in 2003. The designs were based on representing the cipher as a network of lookup tables and obfuscating the tables by composing them with small (4- or 8-bit) random encodings. Such protection satisfied a property that each single obfuscated table individually does not contain any information about the secret key. Therefore, a potential attacker has to combine several tables in their analysis. The first two schemes were broken in 2004 by Billet, Gilbert, and Ech-Chatbi using structural cryptanalysis. The attack was subsequently called "the BGE attack". The numerous consequent design attempts (2005-2022) were quickly broken by practical dedicated attacks. In 2016, Bos, Hubain, Michiels and Teuwen showed that an adaptation of standard side-channel power analysis attacks can be used to efficiently and fully automatically break most existing white-box designs. This result created a new research direction about generic attacks (correlation-based, algebraic, fault injection) and protections against them. == Competitions == Four editions of the WhibOx contest were held in 2017, 2019, 2021 and 2024 respectively. These competitions invited white-box designers both from academia and industry to submit their implementation in the form of (possibly obfuscated) C code. At the same time, everyone could attempt to attack these programs and recover the embedded secret key. Each of these competitions lasted for about 4-5 months. WhibOx 2017 / CHES 2017 Capture the Flag Challenge targeted the standard AES block cipher. Among 94 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for 28 days. WhibOx 2019 / CHES 2019 Capture the Flag Challenge again targeted the AES block cipher. Among 27 submitted implementations, 3 programs stayed unbroken throughout the competition, but were broken after 51 days since the publication. WhibOx 2021 / CHES 2021 Capture the Flag Challenge changed the target to ECDSA, a digital signature scheme based on elliptic curves. Among 97 submitted implementations, all were broken within at most 2 days. WhibOx 2024 / CHES 2024 Capture the Flag Challenge again targeted ECDSA. Among 47 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for almost 5 days.

    Read more →
  • Secret London

    Secret London

    Secret London is a Facebook group started by 21-year-old Bristol University graduate, Tiffany Philippou, on 19 January 2010 in response to a Saatchi & Saatchi competition. The group grew rapidly (180,000 members as of 8 February 2010) and is composed mostly of Londoners who use the site to share suggestions and photos of London. After the group's early success, the founder announced her intention to launch a website of the same name by crowdsourcing the design and development. The website was launched on 16 February 2010. == Other secret cities == Following the initial success of Secret London, a number of other secret groups were independently started around the world, some of which already have over 100,000 users. As of 19 February 2010, the list of other groups includes: Secret Frankfurt, Secret Tel Aviv, Secret Paris, Secret New York, Secret Tokyo, Secret Toronto, Secret Los Angeles, Secret Exeter, Secret Boston, Secret Norwich, Secret Singapore, Secret Brighton, Secret Minneapolis, Secret Sydney, Secret Canberra, Secret Brisbane, Secret Wellington, Secret Christchurch, Secret Madeira, Secret Funchal, Secret Bristol and Secret Cardiff. == Controversy == Some commentators have questioned whether it possible to share secrets without compromising them, and whether sharing tips publicly will lead to over-exposure of the businesses who are recommended.

    Read more →
  • Instant messaging

    Instant messaging

    Instant messaging (IM) technology is a type of synchronous computer-mediated communication involving the immediate (real-time) transmission of messages between two or more parties over the Internet or another computer network. Originally involving simple text message exchanges, modern instant messaging applications and services (also variously known as instant messenger, messaging app, chat app, chat client, or simply a messenger) tend to also feature the exchange of multimedia, emojis, file transfer, VoIP (voice calling), and video chat capabilities. Instant messaging systems facilitate connections between specified known users (often using a contact list also known as a "buddy list" or "friend list") or in chat rooms, and can be standalone apps or integrated into a wider social media platform, or in a website where it can, for instance, be used for conversational commerce. Originally the term "instant messaging" was distinguished from "text messaging" by being run on a computer network instead of a cellular/mobile network, being able to write longer messages, real-time communication, presence ("status"), and being free (only cost of access instead of per SMS message sent). Instant messaging was pioneered in the early Internet era; the IRC protocol was the earliest to achieve wide adoption. Later in the 1990s, ICQ was among the first closed and commercialized instant messengers, and several rival services appeared afterwards as it became a popular use of the Internet. Beginning with its first introduction in 2005, BlackBerry Messenger became the first popular example of mobile-based IM, combining features of traditional IM and mobile SMS. Instant messaging remains very popular today; IM apps are the most widely used smartphone apps: in 2018 for instance there were 980 million monthly active users of WeChat and 1.3 billion monthly users of WhatsApp, the largest IM network. == Overview == Instant messaging (IM), sometimes also called "messaging" or "texting", consists of computer-based human communication between two users (private messaging) or more (chat room or "group") in real-time, allowing immediate receipt of acknowledgment or reply. This is in direct contrast to email, where conversations are not in real-time, and the perceived quasi-synchrony of the communications by the users (although many systems allow users to send offline messages that the other user receives when logging in). Earlier IM networks were limited to text-based communication, not dissimilar to mobile text messaging. As technology has moved forward, IM has expanded to include voice calling using a microphone, videotelephony using webcams, file transfer, location sharing, image and video transfer, voice notes, and other features. IM is conducted over the Internet or other types of networks (see also LAN messenger). Depending on the IM protocol, the technical architecture can be peer-to-peer (direct point-to-point transmission) or client–server (when all clients have to first connect to the central server). Primary IM services are controlled by their corresponding companies and usually follow the client-server model. At one point, the term "Instant Messenger" was a service mark of AOL Time Warner and could not be used in software not affiliated with AOL in the United States. For this reason, in April 2007, the instant messaging client formerly named Gaim (or gaim) announced that they would be renamed "Pidgin". === Clients === Modern IM services generally provide their own client, either a separately installed application or a browser-based client. They are normally centralised networks run by the servers of the platform's operators, unlike peer-to-peer protocols like XMPP. These usually only work within the same IM network, although some allow limited function with other services (see #Interoperability). Third-party client software applications exist that will connect with most of the major IM services. There is the class of instant messengers that uses the serverless model, which doesn't require servers, and the IM network consists only of clients. There are several serverless messengers: RetroShare, Tox, Bitmessage, Ricochet. See also: LAN messenger. Some examples of popular IM services today include Signal, Telegram, WhatsApp Messenger, WeChat, QQ Messenger, Viber, Line, and Snapchat. The popularity of certain apps greatly differ between different countries. Certain apps have an emphasis on certain uses - for example, Skype focuses on video calling, Slack focuses on messaging and file sharing for work teams, and Snapchat focuses on image messages. Some social networking services offer messaging services as a component of their overall platform, such as Facebook's Facebook Messenger, who also own WhatsApp. Others have a direct IM function as an additional adjunct component of their social networking platforms, like Instagram, Reddit, Tumblr, TikTok, Clubhouse and Twitter; this also includes for example dating websites, such as OkCupid or Plenty of Fish, and online gaming chat platforms. === Features === ==== Private and group messaging ==== Private chat allows users to converse privately with another person or a group. Privacy can also be enhanced in several ways, such as end-to-end encryption by default. Public and group chat features allow users to communicate with multiple people simultaneously. ==== Calling ==== Many major IM services and applications offer a call feature for user-to-user voice calls, conference calls, and voice messages. The call functionality is useful for professionals who utilize the application for work purposes and as a hands-free method. Videotelephony using a webcam is also possible by some. ==== Games and entertainment ==== Some IM applications include in-app games for entertainment. Yahoo! Messenger, for example, introduced these where users could play a game and viewed by friends in real-time. MSN Messenger featured a number of playable games within the interface. Facebook's Messenger has had a built-in option to play games with people in a chat, including games like Tetris and Blackjack. Discord features multiple games built inside the "activities" tab in voice channels. ==== Payments ==== A relatively new feature to instant messaging, peer-to-peer payments are available for financial tasks on top of communication. The lack of a service fee also makes these advantageous to financial applications. IM services such as Facebook Messenger and the WeChat 'super-app' for example offer a payment feature. == History == === Early systems === Though the term dates from the 1990s, instant messaging predates the Internet, first appearing on multi-user operating systems like Compatible Time-Sharing System (CTSS) and Multiplexed Information and Computing Service (Multics) in the mid-1960s. Initially, some of these systems were used as notification systems for services like printing, but quickly were used to facilitate communication with other users logged into the same machine. CTSS facilitated communication via text message for up to 30 people. Parallel to instant messaging were early online chat facilities, the earliest of which was Talkomatic (1973) on the PLATO system, which allowed 5 people to chat simultaneously on a 512 x 512 plasma display (5 lines of text + 1 status line per person). During the bulletin board system (BBS) phenomenon that peaked during the 1980s, some systems incorporated chat features which were similar to instant messaging; Freelancin' Roundtable was one prime example. The first such general-availability commercial online chat service (as opposed to PLATO, which was educational) was the CompuServe CB Simulator in 1980, created by CompuServe executive Alexander "Sandy" Trevor in Columbus, Ohio. As networks developed, the protocols spread with the networks. Some of these used a peer-to-peer protocol (e.g. talk, ntalk and ytalk), while others required peers to connect to a server (see talker and IRC). The Zephyr Notification Service (still in use at some institutions) was invented at MIT's Project Athena in the 1980s to allow service providers to locate and send messages to users. Early instant messaging programs were primarily real-time text, where characters appeared as they were typed. This includes the Unix "talk" command line program, which was popular in the 1980s and early 1990s. Some BBS chat programs (i.e. Celerity BBS) also used a similar interface. Modern implementations of real-time text also exist in instant messengers, such as AOL's Real-Time IM as an optional feature. In the latter half of the 1980s and into the early 1990s, the Quantum Link online service for Commodore 64 computers offered user-to-user messages between concurrently connected customers, which they called "On-Line Messages" (or OLM for short), and later "FlashMail." Quantum Link later became America Online and made AOL Instant Messenger (AIM, discussed later). While the Quantum Link client software ran on a Commodore 64, using only

    Read more →
  • Topological deep learning

    Topological deep learning

    Topological deep learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel in processing data on regular grids and sequences. However, scientific and real-world data often exhibit more intricate data domains encountered in scientific computations, including point clouds, meshes, time series, scalar fields graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties. The mathematical foundations of TDL are algebraic topology, differential topology, and geometric topology. Therefore, TDL can be generalized for data on differentiable manifolds, knots, links, tangles, curves, etc. == History and motivation == Traditional techniques from deep learning often operate under the assumption that a dataset is residing in a highly-structured space (like images, where convolutional neural networks exhibit outstanding performance over alternative methods) or a Euclidean space. The prevalence of new types of data, in particular graphs, meshes, and molecules, resulted in the development of new techniques, culminating in the field of geometric deep learning, which originally proposed a signal-processing perspective for treating such data types. While originally confined to graphs, where connectivity is defined based on nodes and edges, follow-up work extended concepts to a larger variety of data types, including simplicial complexes and CW complexes, with recent work proposing a unified perspective of message-passing on general combinatorial complexes. An independent perspective on different types of data originated from topological data analysis, which proposed a new framework for describing structural information of data, i.e., their "shape," that is inherently aware of multiple scales in data, ranging from local information to global information. While at first restricted to smaller datasets, subsequent work developed new descriptors that efficiently summarized topological information of datasets to make them available for traditional machine-learning techniques, such as support vector machines or random forests. Such descriptors ranged from new techniques for feature engineering over new ways of providing suitable coordinates for topological descriptors, or the creation of more efficient dissimilarity measures. Contemporary research in this field is largely concerned with either integrating information about the underlying data topology into existing deep-learning models or obtaining novel ways of training on topological domains. == Learning on topological spaces == One of the core concepts in topological deep learning is considering the domain upon which this data is defined and supported. In case of Euclidean data, such as images, this domain is a grid, upon which the pixel value of the image is supported. In a more general setting this domain might be a topological domain. Studying and developing deep learning models that are supported ln topological domains constitute the essence of topological deep learning. Next, we introduce the most common topological domains that are encountered in a deep learning setting. These domains include, but not limited to, graphs, simplicial complexes, cell complexes, combinatorial complexes and hypergraphs. Given a finite set S of abstract entities, a neighborhood function N {\displaystyle {\mathcal {N}}} on S is an assignment that attach to every point x {\displaystyle x} in S a subset of S or a relation. Such a function can be induced by equipping S with an auxiliary structure. Edges provide one way of defining relations among the entities of S. More specifically, edges in a graph allow one to define the notion of neighborhood using, for instance, the one hop neighborhood notion. Edges however, limited in their modeling capacity as they can only be used to model binary relations among entities of S since every edge is connected typically to two entities. In many applications, it is desirable to permit relations that incorporate more than two entities. The idea of using relations that involve more than two entities is central to topological domains. Such higher-order relations allow for a broader range of neighborhood functions to be defined on S to capture multi-way interactions among entities of S. Next we review the main properties, advantages, and disadvantages of some commonly studied topological domains in the context of deep learning, including (abstract) simplicial complexes, regular cell complexes, hypergraphs, and combinatorial complexes. ==== Comparisons among topological domains ==== Each of the enumerated topological domains has its own characteristics, advantages, and limitations: Simplicial complexes Simplest form of higher-order domains. Extensions of graph-based models. Admit hierarchical structures, making them suitable for various applications. Hodge theory can be naturally defined on simplicial complexes. Require relations to be subsets of larger relations, imposing constraints on the structure. Cell Complexes Generalize simplicial complexes. Provide more flexibility in defining higher-order relations. Each cell in a cell complex is homeomorphic to an open ball, attached together via attaching maps. Boundary cells of each cell in a cell complex are also cells in the complex. Represented combinatorially via incidence matrices. Hypergraphs Allow arbitrary set-type relations among entities. Relations are not imposed by other relations, providing more flexibility. Do not explicitly encode the dimension of cells or relations. Useful when relations in the data do not adhere to constraints imposed by other models like simplicial and cell complexes. Combinatorial Complexes : Generalize and bridge the gaps between simplicial complexes, cell complexes, and hypergraphs. Allow for hierarchical structures and set-type relations. Combine features of other complexes while providing more flexibility in modeling relations. Can be represented combinatorially, similar to cell complexes. ==== Hierarchical structure and set-type relations ==== The properties of simplicial complexes, cell complexes, and hypergraphs give rise to two main features of relations on higher-order domains, namely hierarchies of relations and set-type relations. ===== Rank function ===== A rank function on a higher-order domain X is an order-preserving function rk: X → Z, where rk(x) attaches a non-negative integer value to each relation x in X, preserving set inclusion in X. Cell and simplicial complexes are common examples of higher-order domains equipped with rank functions and therefore with hierarchies of relations. ===== Set-type relations ===== Relations in a higher-order domain are called set-type relations if the existence of a relation is not implied by another relation in the domain. Hypergraphs constitute examples of higher-order domains equipped with set-type relations. Given the modeling limitations of simplicial complexes, cell complexes, and hypergraphs, we develop the combinatorial complex, a higher-order domain that features both hierarchies of relations and set-type relations. The learning tasks in TDL can be broadly classified into three categories: Cell classification: Predict targets for each cell in a complex. Examples include triangular mesh segmentation, where the task is to predict the class of each face or edge in a given mesh. Complex classification: Predict targets for an entire complex. For example, predict the class of each input mesh. Cell prediction: Predict properties of cell-cell interactions in a complex, and in some cases, predict whether a cell exists in the complex. An example is the prediction of linkages among entities in hyperedges of a hypergraph. In practice, to perform the aforementioned tasks, deep learning models designed for specific topological spaces must be constructed and implemented. These models, known as topological neural networks, are tailored to operate effectively within these spaces. === Topological neural networks === Central to TDL are topological neural networks (TNNs), specialized architectures designed to operate on data structured in topological domains. Unlike traditional neural networks tailored for grid-like structures, TNNs are adept at handling more intricate data representations, such as graphs

    Read more →
  • Branch number

    Branch number

    In cryptography, the branch number is a numerical value that characterizes the amount of diffusion introduced by a vectorial Boolean function F that maps an input vector a to output vector F ( a ) {\displaystyle F(a)} . For the (usual) case of a linear F the value of the differential branch number is produced by: applying nonzero values of a (i.e., values that have at least one non-zero component of the vector) to the input of F; calculating for each input value a the Hamming weight W {\displaystyle W} (number of nonzero components), and adding weights W ( a ) {\displaystyle W(a)} and W ( F ( a ) ) {\displaystyle W(F(a))} together; selecting the smallest combined weight across for all nonzero input values: B d ( F ) = min a ≠ 0 ( W ( a ) + W ( F ( a ) ) ) {\displaystyle B_{d}(F)={\underset {a\neq 0}{\min }}(W(a)+W(F(a)))} . If both a and F ( a ) {\displaystyle F(a)} have s components, the result is obviously limited on the high side by the value s + 1 {\displaystyle s+1} (this "perfect" result is achieved when any single nonzero component in a makes all components of F ( a ) {\displaystyle F(a)} to be non-zero). A high branch number suggests higher resistance to the differential cryptanalysis: the small variations of input will produce large changes on the output and in order to obtain small variations of the output, large changes of the input value will be required. The term was introduced by Daemen and Rijmen in early 2000s and quickly became a typical tool to assess the diffusion properties of the transformations. == Mathematics == The branch number concept is not limited to the linear transformations, Daemen and Rijmen provided two general metrics: differential branch number, where the minimum is obtained over inputs of F that are constructed by independently sweeping all the values of two nonzero and unequal vectors a, b ( ⊕ {\displaystyle \oplus } is a component-by-component exclusive-or): B d ( F ) = min a ≠ b ( W ( a ⊕ b ) + W ( F ( a ) ⊕ F ( b ) ) {\displaystyle B_{d}(F)={\underset {a\neq b}{\min }}(W(a\oplus b)+W(F(a)\oplus F(b))} ; for linear branch number, the independent candidates α {\displaystyle \alpha } and β {\displaystyle \beta } are independently swept; they should be nonzero and correlated with respect to F (the L A T ( α , β ) {\displaystyle LAT(\alpha ,\beta )} coefficient of the linear approximation table of F should be nonzero): B l ( F ) = min α ≠ 0 , β , L A T ( α , β ) ≠ 0 ( W ( α ) + W ( β ) ) {\displaystyle B_{l}(F)={\underset {\alpha \neq 0,\beta ,LAT(\alpha ,\beta )\neq 0}{\min }}(W(\alpha )+W(\beta ))} .

    Read more →
  • Backup

    Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

    Read more →
  • Rassd News Network

    Rassd News Network

    Rassd News Network, also known by its initials of RNN (Arabic:شبكة رصد الاخبارية), is an alternative media network based in Cairo, Egypt. RNN was launched as a Facebook-based news source launched on January 25, 2011. It quickly advanced to become a primary contributor of Egyptian revolution-related news that year. Applying the motto "From the people to the people," the citizen journalists who created RNN have since added a Twitter feed and launched an independent website dedicated to short news stories favored by an online audience. RNN is an organized citizen news network with four working committees; one for editing the news, another to support the correspondents covering Egypt, a third for managing the multimedia feeds and a fourth for staff functions such as development, training and public relations. RNN's Arabic name, Rassd, is an acronym that stands for Rakeb (observe), Sawwer (record) and Dawwen (blog). RNN created a Ustream channel on January 27, 2011, and a YouTube account a month later. The success of RNN and its new social media model is evidenced in its recent local network expansion into Libya, Morocco, Syria, Jerusalem and Turkey. Even so, one media scholar in the US (commenting in 2011) called the accuracy of RNN's reporting "fairly mediocre". RNN has endured closures of their Facebook profile and YouTube account as part of the attacks from private media, attempting to thwart their work and influence their content. == Use of RNN's news by international media == RNN has been a global source of Egyptian revolution-related news since its launch. During the early days of the citizen uprisings across the Middle East, major networks such as BBC, Reuters, Al Jazeera and Al Arabiya used some of Rassd's news and photos, and followed the network on Twitter. Three days after the online portal went live it was streaming video to MSNBC through its Facebook page. Then on February 5, 2011, Louisville's NBC-affiliate cited RNN, Cairo when it reported that President Hosni Mubarak had stepped down as head of Egypt's ruling party.

    Read more →
  • GoodRx

    GoodRx

    GoodRx Holdings, Inc. is an American healthcare company that operates a telemedicine platform and free-to-use website and mobile app that track prescription drug prices in the United States and provide drug coupons for discounts on medications. GoodRx compares prescription drug prices at more than 75,000 pharmacies in the United States. The platform allows users to consult a doctor online and obtain a prescription for certain types of medications. == History == === Financial performance === GoodRx was founded in Santa Monica, California in 2011. GoodRx experienced substantial growth in net income in 2017 ($9 million), 2018 ($44 million), and 2019 ($66 million), but recorded a loss of $293.6 million in 2020 due to IPO-related expenses. In September 2020, GoodRx went public on the Nasdaq under the ticker symbol GDRX. The company priced its initial public offering at $33 per share, above the expected range of $24 to $28, raising more than $1.1 billion at an initial valuation of approximately $12.7 billion. In the first half of 2020, the company reported revenues of $257 million and net income of $55 million. GoodRx generated $745.4 million in revenue for the full year 2021, a 35.36% increase over 2020. During the first half of 2021, the company’s share price declined by 10.7%. The decline was attributed to increased competition in online pharmacy services and slower user growth. GoodRx reported full-year revenue of $766.6 million, with adjusted EBITDA reaching $213.5 million, exceeding guidance in the fourth quarter. GoodRx reported that 41% of prescriptions filled using its coupons were newly adherent, meaning they would not have been filled without the service. GoodRx reported a full-year 2023 revenue of $750.3 million, a decrease of 2.1% from 2022. However, its fourth-quarter revenue increased by 7% year-over-year. GoodRx achieved an Adjusted EBITDA of $217.4 million for the year and an Adjusted EBITDA Margin of 28.6%. In 2024, GoodRx achieved 6% revenue growth with $792.3 million for the full year and turned a net loss into a positive net income of $16.4 million. The company also demonstrated strong operational efficiency, with a 32.8% increase in full-year Adjusted EBITDA. In Q2 2025, GoodRx reported revenue of $203.1 million, a 1.2% increase from the previous year, and a net income of $12.8 million, a significant 92% jump, which resulted in a 6.3% net income margin. However, prescription transaction revenue declined by 3% due to a decrease in monthly active consumers, but this was offset by strong 32% growth in its Pharma Manufacturer Solutions business. GoodRx also saw a 7% decrease in subscription revenue. === Mergers and acquisitions === In 2019, GoodRx acquired HeyDoctor, a telemedicine company, to integrate virtual healthcare services into the platform. In 2021, a health video content producer, HealthiNation was acquired by GoodRx, which helped provide consumers with health information and offered pharmaceutical manufacturers new ways to reach relevant audiences. In April 2022, GoodRx acquired VitaCare Prescription Services from TherapeuticsMD to strengthen its pharma manufacturer solutions business. === Partnerships === In 2017, the company announced partnerships with major pharmaceutical companies to negotiate lower prescription drug costs. GoodRx has deep relationships with major pharmacy chains, including Walgreens, Walmart, CVS Caremark, and Publix, to allow customers to use GoodRx discounts and Gold benefits. GoodRx began its partnership with CVS Caremark in July 2023 to automatically apply coupons to insured CVS customers purchasing generic prescriptions at certain locations. In April 2024, GoodRx added Publix into its network, allowing GoodRx Gold members to use their cards at Publix Pharmacies. GoodRx partners with Pharmacy Benefit Management like Caremark, Express Scripts, and MedImpact to apply their savings directly to eligible insurance plans and members. GoodRx partners with companies like Affirm, Benefitfocus, and DoorDash to integrate their services that offer members discounts and financial flexibility for prescriptions. GoodRx also partners with organizations like the American Academy of Family Physicians Foundation to support broader access to care. In October 2022, GoodRx launched Provider Mode, which allows healthcare providers to use the app to compare costs of drugs for patients based on different payment methods and drug alternatives. In 2025, GoodRx partnered with Novo Nordisk to offer discounted cash-pay access to semaglutide products like Ozempic and Wegovy through its platform and participating pharmacies. == Products and services == GoodRx started its telemedicine service GoodRx Care in September 2019. It lets people talk to a licensed provider online for common issues and get prescriptions even if they don't have insurance. They also run condition-specific subscription plans that bundle online doctor visits, FDA-approved meds, and home delivery into one monthly payment. On the weight management side, GoodRx offers prescriptions for GLP-1 drugs like semaglutide through their telemedicine platform. This got a boost when the oral version of Wegovy became widely available in the US in early 2026. GoodRx works with drug makers like Novo Nordisk to make some medications (including semaglutide options) more affordable for people paying cash. The telemedicine part took off after GoodRx bought HeyDoctor in 2019 and brought their virtual care tools into the main platform. == Key people == The Santa Monica-based startup was founded in September 2011 by Trevor Bezdek and former Facebook executives Doug Hirsch and Scott Marlette. Marlette was one of the first 20 employees at Facebook and built Facebook's photo application. In 2005, Hirsch was the Vice President of Product at Facebook, working closely with Mark Zuckerberg. Bezdek and Hirsch served as co-chief executive officers until April 2023, when they stepped down from those roles and technology executive Scott Wagner was appointed interim chief executive officer. Bezdek became chair of the board, while Hirsch took on the role of chief mission officer. In December 2024, GoodRx announced that healthcare executive Wendy Barnes would become president and chief executive officer effective January 1, 2025. As of 2025, Barnes serves as the company’s CEO, while Trevor Bezdek and Scott Wagner serve as co-chairs of the board, and Doug Hirsch remains involved as a co-founder and senior executive. == Controversy == On February 25, 2020, Consumer Reports published an article stating that GoodRx shared user data—specifically, pseudonymized advertising ID numbers that companies use to track the behavior of web users across websites, the names of the drugs that users browsed, and the pharmacies where users sought to fill prescriptions—with Google, Facebook, and around twenty other Internet-based companies. A few days later, GoodRx released a statement saying that it had made changes to prevent user search data on medical conditions and pharmaceuticals from being shared with Facebook. In March 2020, GoodRx stopped sending data about user prescriptions to Facebook. On February 1, 2023, the Federal Trade Commission fined GoodRx US$1.5 million for violations of the Breach Notification Rule and the Federal Trade Commission Act for allegedly failing to obtain specific, informed, and unambiguous consent from users before disclosing health-related information to Facebook and Google. In November 2024, independent pharmacies filed at least three class action lawsuits against GoodRx and major pharmacy benefit managers. The cases, brought by independent pharmacies in California, Michigan, Pennsylvania, and Rhode Island, allege that GoodRx and the PBMs collaborated to suppress reimbursements for generic prescription drugs. They allege that agreements using GoodRx’s software suppressed reimbursements for generic drugs and violated the Sherman Antitrust Act. The suits claim the practices amount to price fixing which harms small pharmacies while benefiting PBMs and their affiliates. GoodRx settled both the 2023 FTC action and the 2025 class action lawsuit without admitting wrongdoing.

    Read more →
  • Utah Social Media Regulation Act

    Utah Social Media Regulation Act

    S.B. 152 and H.B. 311, collectively known as the Utah Social Media Regulation Act, were social media regulation bills that were passed by the Utah State Legislature in March 2023. The bills would have collectively imposed restrictions on how social networking services serve minors in the state of Utah, including mandatory age verification and age restrictions, as well as restrictions on data collection and on algorithmic recommendations. The Act was intended to take effect in March 2024. However, following a lawsuit over the Act by NetChoice, a tech industry lobby group, the Utah attorney general stated in January 2024 that its implementation had been delayed to October 2024, but was likely to be repealed and amended. On September 10, 2024 Chief Judge Robert J. Shelby issued a written order granting a request from NetChoice for a preliminary injunction, meaning that Utah will be unable to enforce its social media law as litigation plays out. The law was appealed to the 10th Circuit on October 11, 2024 and is awaiting a decision. == Provisions == The Act comprises two bills, S.B. 152 and H.B. 311, which respectively regulate access to social network accounts registered to minors, and impose obligations on social networking services to follow design practices that protect the privacy of minors. The bills would apply to social networks with more than 5 million active users in the United States. Social networking services would've verified the age of all users in the state of Utah, or else their account must've been deleted. The Act does not specify a specific method of age verification. Users who are under 18 must have consent from a parent or guardian to open an account, and the parent must be able to have access to the account and its data for monitoring. Unless required to comply with state or federal law, social networks were prohibited from collecting data based on the activity of minors, and may've not displayed targeted advertising or algorithmic recommendations of content, users, or groups to minors. A social network must not allow minors to access the service between the hours of 10:30 p.m., and 6:30 a.m. without parental consent. H.B. 311 prohibits social networks from exposing features to minors that cause them to have an "addiction" to the platform; the service must perform quarterly audits, and may be sued by users for harms caused by providing "addictive" features; there is a rebuttable presumption of harm if the plaintiff is 16 or younger. The bills prescribed fines of $2,500 per-violation for violations of the provisions of S.B. 152, and up to $250,000 in liabilities (plus fines of $2,500 per-user) for violations of the addiction rules. == History == The two bills were passed in early-March 2023, and signed by Governor Spencer Cox on March 23, 2023. Cox cited studies linking social media addiction to increases in depression and suicide among youth. They were originally intended to take effect on March 1, 2024. In the wake of a lawsuit in Arkansas by the trade association NetChoice over a similar bill, state senator and bill author Mike McKell stated that he planned to introduce amendments when the legislature resumed in 2024. In December 2023, NetChoice filed a lawsuit in Utah seeking to block the Act, citing that its definition of a social network was too vague, and that it "restricts who can express themselves, what can be said, and when and how speech on covered websites can occur, down to the very hours of the day minors can use covered websites. The First Amendment, reinforced by decades of precedent, allows none of this." In regards to its age verification requirements, NetChoice argued that "it may not be enough to simply verify the age of whatever person may be listed on a form of identification (even if they have such a record) because that record may not accurately reflect who the individual actually is." The office of the attorney general stated that the state was "reviewing the lawsuit but remains intently focused on the goal of this legislation: Protecting young people from negative and harmful effects of social media use." In January 2024, Attorney General Sean Reyes asked the court to delay a hearing over the bill, stating that its effective date had been delayed to October 2024, and that the legislature planned to repeal and replace the bills. On September 10, 2024, Federal Chief Judge Robert Shelby granted a preliminary injunction to stop enforcement of the law as litigation continues. The law was later appealed on October 11, 2024, by the state of Utah and had a court hearing on the appeal on November 20, 2025.

    Read more →
  • Strong cryptography

    Strong cryptography

    Strong cryptography or cryptographically strong are general terms used to designate the cryptographic algorithms that, when used correctly, provide a very high (usually insurmountable) level of protection against any eavesdropper, including the government agencies. There is no precise definition of the boundary line between the strong cryptography and (breakable) weak cryptography, as this border constantly shifts due to improvements in hardware and cryptanalysis techniques. These improvements eventually place the capabilities once available only to the NSA within the reach of a skilled individual, so in practice there are only two levels of cryptographic security, "cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files" (Bruce Schneier). The strong cryptography algorithms have high security strength, for practical purposes usually defined as a number of bits in the key. For example, the United States government, when dealing with export control of encryption, considered as of 1999 any implementation of the symmetric encryption algorithm with the key length above 56 bits or its public key equivalent to be strong and thus potentially a subject to the export licensing. To be strong, an algorithm needs to have a sufficiently long key and be free of known mathematical weaknesses, as exploitation of these effectively reduces the key size. At the beginning of the 21st century, the typical security strength of the strong symmetrical encryption algorithms is 128 bits (slightly lower values still can be strong, but usually there is little technical gain in using smaller key sizes). Demonstrating the resistance of any cryptographic scheme to attack is a complex matter, requiring extensive testing and reviews, preferably in a public forum. Good algorithms and protocols are required (similarly, good materials are required to construct a strong building), but good system design and implementation is needed as well: "it is possible to build a cryptographically weak system using strong algorithms and protocols" (just like the use of good materials in construction does not guarantee a solid structure). Many real-life systems turn out to be weak when the strong cryptography is not used properly, for example, random nonces are reused A successful attack might not even involve algorithm at all, for example, if the key is generated from a password, guessing a weak password is easy and does not depend on the strength of the cryptographic primitives. A user can become the weakest link in the overall picture, for example, by sharing passwords and hardware tokens with the colleagues. == Background == The level of expense required for strong cryptography originally restricted its use to the government and military agencies, until the middle of the 20th century the process of encryption required a lot of human labor and errors (preventing the decryption) were very common, so only a small share of written information could have been encrypted. US government, in particular, was able to keep a monopoly on the development and use of cryptography in the US into the 1960s. In the 1970, the increased availability of powerful computers and unclassified research breakthroughs (Data Encryption Standard, the Diffie-Hellman and RSA algorithms) made strong cryptography available for civilian use. Mid-1990s saw the worldwide proliferation of knowledge and tools for strong cryptography. By the 21st century the technical limitations were gone, although the majority of the communication were still unencrypted. At the same the cost of building and running systems with strong cryptography became roughly the same as the one for the weak cryptography. The use of computers changed the process of cryptanalysis, famously with Bletchley Park's Colossus. But just as the development of digital computers and electronics helped in cryptanalysis, it also made possible much more complex ciphers. It is typically the case that use of a quality cipher is very efficient, while breaking it requires an effort many orders of magnitude larger - making cryptanalysis so inefficient and impractical as to be effectively impossible. == Cryptographically strong algorithms == This term "cryptographically strong" is often used to describe an encryption algorithm, and implies, in comparison to some other algorithm (which is thus cryptographically weak), greater resistance to attack. But it can also be used to describe hashing and unique identifier and filename creation algorithms. See for example the description of the Microsoft .NET runtime library function Path.GetRandomFileName. In this usage, the term means "difficult to guess". An encryption algorithm is intended to be unbreakable (in which case it is as strong as it can ever be), but might be breakable (in which case it is as weak as it can ever be) so there is not, in principle, a continuum of strength as the idiom would seem to imply: Algorithm A is stronger than Algorithm B which is stronger than Algorithm C, and so on. The situation is made more complex, and less subsumable into a single strength metric, by the fact that there are many types of cryptanalytic attack and that any given algorithm is likely to force the attacker to do more work to break it when using one attack than another. There is only one known unbreakable cryptographic system, the one-time pad, which is not generally possible to use because of the difficulties involved in exchanging one-time pads without them being compromised. So any encryption algorithm can be compared to the perfect algorithm, the one-time pad. The usual sense in which this term is (loosely) used, is in reference to a particular attack, brute force key search — especially in explanations for newcomers to the field. Indeed, with this attack (always assuming keys to have been randomly chosen), there is a continuum of resistance depending on the length of the key used. But even so there are two major problems: many algorithms allow use of different length keys at different times, and any algorithm can forgo use of the full key length possible. Thus, Blowfish and RC5 are block cipher algorithms whose design specifically allowed for several key lengths, and who cannot therefore be said to have any particular strength with respect to brute force key search. Furthermore, US export regulations restrict key length for exportable cryptographic products and in several cases in the 1980s and 1990s (e.g., famously in the case of Lotus Notes' export approval) only partial keys were used, decreasing 'strength' against brute force attack for those (export) versions. More or less the same thing happened outside the US as well, as for example in the case of more than one of the cryptographic algorithms in the GSM cellular telephone standard. The term is commonly used to convey that some algorithm is suitable for some task in cryptography or information security, but also resists cryptanalysis and has no, or fewer, security weaknesses. Tasks are varied, and might include: generating randomness encrypting data providing a method to ensure data integrity Cryptographically strong would seem to mean that the described method has some kind of maturity, perhaps even approved for use against different kinds of systematic attacks in theory and/or practice. Indeed, that the method may resist those attacks long enough to protect the information carried (and what stands behind the information) for a useful length of time. But due to the complexity and subtlety of the field, neither is almost ever the case. Since such assurances are not actually available in real practice, sleight of hand in language which implies that they are will generally be misleading. There will always be uncertainty as advances (e.g., in cryptanalytic theory or merely affordable computer capacity) may reduce the effort needed to successfully use some attack method against an algorithm. In addition, actual use of cryptographic algorithms requires their encapsulation in a cryptosystem, and doing so often introduces vulnerabilities which are not due to faults in an algorithm. For example, essentially all algorithms require random choice of keys, and any cryptosystem which does not provide such keys will be subject to attack regardless of any attack resistant qualities of the encryption algorithm(s) used. == Legal issues == Widespread use of encryption increases the costs of surveillance, so the government policies aim to regulate the use of the strong cryptography. In the 2000s, the effect of encryption on the surveillance capabilities was limited by the ever-increasing share of communications going through the global social media platforms, that did not use the strong encryption and provided governments with the requested data. Murphy talks about a legislative balance that needs to be struck between the power of the government that are broad enough to be able to follow the qui

    Read more →