Ultra was the designation adopted by British military intelligence in June 1941 for wartime signals intelligence obtained by breaking high-level encrypted enemy radio and teleprinter communications at the Government Code and Cypher School (GC&CS) at Bletchley Park. Ultra eventually became the standard designation among the western Allies for all such intelligence. The name arose because the intelligence obtained was considered more important than that designated by the highest British security classification then used (Most Secret) and so was regarded as being Ultra Secret. Several other cryptonyms had been used for such intelligence. The code name "Boniface" was used as a cover name for Ultra. In order to ensure that the successful code-breaking did not become apparent to the Germans, British intelligence created a fictional MI6 master spy, Boniface, who controlled a fictional series of agents throughout Germany. Information obtained through code-breaking was often attributed to the human intelligence from the Boniface network. The U.S. used the codename Magic for its decrypts from Japanese sources, including the "Purple" cipher. Much of the German cipher traffic was encrypted on the Enigma machine. Used properly, the German military Enigma would have been virtually unbreakable; in practice, shortcomings in operation allowed it to be broken. The term "Ultra" has often been used almost synonymously with "Enigma decrypts". However, Ultra also encompassed decrypts of the German Lorenz SZ 40/42 machines that were used by the German High Command, and the Hagelin machine. Many observers, at the time and later, regarded Ultra as immensely valuable to the Allies. Winston Churchill was reported to have told King George VI, when presenting to him Stewart Menzies (head of the Secret Intelligence Service and the person who controlled distribution of Ultra decrypts to the government): "It is thanks to the secret weapon of General Menzies, put into use on all the fronts, that we won the war!" F. W. Winterbotham quoted the western Supreme Allied Commander, Dwight D. Eisenhower, at war's end describing Ultra as having been "decisive" to Allied victory. Sir Harry Hinsley, Bletchley Park veteran and official historian of British Intelligence in World War II, made a similar assessment of Ultra, saying that while the Allies would have won the war without it, "the war would have been something like two years longer, perhaps three years longer, possibly four years longer than it was." However, Hinsley and others have emphasized the difficulties of counterfactual history in attempting such conclusions, and some historians, such as John Keegan, have said the shortening might have been as little as the three months it took the United States to deploy the atomic bomb. == Sources of intelligence == Most Ultra intelligence was derived from reading radio messages that had been encrypted with cipher machines, complemented by material from radio communications using traffic analysis and direction finding. In the early phases of the war, particularly during the eight-month Phoney War, the Germans could transmit most of their messages using land lines and so had no need to use radio. This meant that those at Bletchley Park had some time to build up experience of collecting and starting to decrypt messages on the various radio networks. German Enigma messages were the main source, with those of the German air force (the Luftwaffe) predominating, as they used radio more and their operators were particularly ill-disciplined. === German === ==== Enigma ==== "Enigma" refers to a family of electro-mechanical rotor cipher machines. These produced a polyalphabetic substitution cipher and were widely thought to be unbreakable in the 1920s, when a variant of the commercial Model D was first used by the Reichswehr. The German Army (Heer), Navy, Air Force, Nazi party, Gestapo and German diplomats used Enigma machines in several variants. Abwehr (German military intelligence) used a four-rotor machine without a plugboard and Naval Enigma used different key management from that of the army or air force, making its traffic far more difficult to cryptanalyse; each variant required different cryptanalytic treatment. The commercial versions were not as secure and Dilly Knox of GC&CS is said to have broken one before the war. German military Enigma was first broken in December 1932 by Marian Rejewski and the Polish Cipher Bureau, using a combination of brilliant mathematics, the services of a spy in the German office responsible for administering encrypted communications, and good luck. The Poles read Enigma to the outbreak of World War II and beyond, in France. At the turn of 1939, the Germans made the systems ten times more complex, which required a tenfold increase in Polish decryption equipment, which they could not meet. On 25 July 1939, the Polish Cipher Bureau handed reconstructed Enigma machines and their techniques for decrypting ciphers to the French and British. Gordon Welchman wrote, Ultra would never have got off the ground if we had not learned from the Poles, in the nick of time, the details both of the German military Enigma machine, and of the operating procedures that were in use. At Bletchley Park, some of the key people responsible for success against Enigma included mathematicians Alan Turing and Hugh Alexander and, at the British Tabulating Machine Company, chief engineer Harold Keen. After the war, interrogation of German cryptographic personnel led to the conclusion that German cryptanalysts understood that cryptanalytic attacks against Enigma were possible but were thought to require impracticable amounts of effort and investment. The Poles' early start at breaking Enigma and the continuity of their success gave the Allies an advantage when World War II began. ==== Lorenz cipher ==== In June 1941, the Germans started to introduce on-line stream cipher teleprinter systems for strategic point-to-point radio links, to which the British gave the code-name Fish. Several systems were used, principally the Lorenz SZ 40/42 (codenamed "Tunny" by the British) and Geheimfernschreiber ("Sturgeon"). These cipher systems were cryptanalysed, particularly Tunny, which the British thoroughly penetrated. It was eventually attacked using Colossus machines, which were the first digital programme-controlled electronic computers. In many respects the Tunny work was more difficult than for the Enigma, since the British codebreakers had no knowledge of the machine producing it and no head-start such as that the Poles had given them against Enigma. Although the volume of intelligence derived from this system was much smaller than that from Enigma, its importance was often far higher because it produced primarily high-level, strategic intelligence that was sent between Wehrmacht high command (Oberkommando der Wehrmacht, OKW). The eventual bulk decryption of Lorenz-enciphered messages contributed significantly, and perhaps decisively, to the defeat of Nazi Germany. Nevertheless, the Tunny story has become much less well known among the public than the Enigma one. At Bletchley Park, some of the key people responsible for success in the Tunny effort included mathematicians W. T. "Bill" Tutte and Max Newman and electrical engineer Tommy Flowers. === Italian === In June 1940, the Italians were using book codes for most of their military messages, except for the Italian Navy, which in early 1941 had started using a version of the Hagelin rotor-based cipher machine C-38. This was broken from June 1941 onwards by the Italian subsection of GC&CS at Bletchley Park. === Japanese === In the Pacific theatre, a Japanese cipher machine, called "Purple" by the Americans, was used for highest-level Japanese diplomatic traffic. It produced a polyalphabetic substitution cipher, but unlike Enigma, was not a rotor machine, being built around electrical stepping switches. It was broken by the US Army Signal Intelligence Service and disseminated as Magic. Detailed reports by the Japanese ambassador to Germany were encrypted on the Purple machine. His reports included reviews of German assessments of the military situation, reviews of strategy and intentions, reports on direct inspections by the ambassador (in one case, of Normandy beach defences), and reports of long interviews with Hitler. The Japanese are said to have obtained an Enigma machine in 1937, although it is debated whether they were given it by the Germans or bought a commercial version, which, apart from the plugboard and internal wiring, was the German Heer/Luftwaffe machine. Having developed a similar machine, the Japanese did not use the Enigma machine for their most secret communications. The chief fleet communications code system used by the Imperial Japanese Navy was called JN-25 by the Americans, and by early 1942 the US Navy had made considerable progress in decrypting Japanese naval messages. The US Army also made progress on the
MSpy
mSpy is a brand of mobile and computer parental control monitoring software for iOS, Android, Windows, and macOS. The app monitors and logs user activity on the client device and sends the data to a personalized dashboard. Data the users can monitor includes text messages, calls, GPS locations, social media chats, and more. It is owned by Virtuoso Holding. == History == mSpy was launched as a product for mobile monitoring by Altercon Group in 2010. In 2012, the application allowed parents to monitor not only smartphones but also computers running Windows and macOS. In 2013, mSpy became TopTenReviews cell phone monitoring software award winner. By 2014, the business grew nearly 400%, and the app's user numbers exceeded 1 million. In 2015, mSpy received the Parents Tested Parents Approved (PTPA) Winner’s Seal of Approval in the United States. In 2015 and 2018, mSpy was the victim of data breaches which released user data. In 2016, mLite, a light version of mSpy, became available from Google Play. The same year, it was awarded the kidSAFE Certified Seal in the United States. In 2017, mSpy collaborated with YouTuber and journalist Coby Persin to conduct a social experiment on the dangers of social media and online predators. A social experiment, conducted with parental consent, involved Coby Persin to befriend three children—aged 12, 13, and 14—via Snapchat and then invite them to meet personally. Each of the participants agreed to the meeting and arrived at the designated location. The video of the experiment received widespread attention and helped to raise awareness about the importance of online security and parental controls. In early 2021, mSpy released a new feature - Screenrecorder. The feature allows parents to take screenshots of the kid's screen when they are browsing certain apps. In 2024, mSpy's Zendesk was compromised by an unknown threat actor, revealing their customer list. As of 2025, mSpy is compatible with Android, iPhone, and iPad devices. It provides access to various types of data stored on the device, including contact information, calendar entries, emails, SMS messages, browser history, photos, videos, and installed applications. Functions also include GPS tracking, geofencing, keyword alerts etc. == Reception == It was noted that since MSpy runs inconspicuously, there is risk of the software being used illegally. mSpy was called "terrifying" by The Next Web and was featured in NPR coverage of spyware used against victims of stalking and other domestic violence. In response mSpy released security updates aimed at reducing the risk of misuse and stated that it "uses encryption protocols to protect user data and that access is restricted to the account holder". In May 2015, Brian Krebs reported that mSpy was hacked, leaking personal data for hundreds of thousands of users of devices with mSpy installed. mSpy claimed that there was no data leak, but that instead, it was the victim of blackmailers. In September 2018, Krebs claimed and demonstrated that anyone could easily gain access to the mSpy database containing data for millions of users. The company responded by stating that the exposed data consisted primarily of error logs and incorrect login attempts. Following the incident, mSpy implemented new security measures, changed encryption keys, and reset passwords for affected accounts. A 2024 Sky News story characterised mSpy as "stalkerware". Leaked customer support messages from mSpy reveal misuse of its app for illegally monitoring partners and children.
Caste census
Caste census is a proposed census to be conducted in India by the Central Government of India. The proposed census was decided under the leadership of Prime Minister Narendra Modi by the cabinet committee of political affairs (CCPA) on 30 April 2025. It has been decided that a caste enumeration should be included with the forthcoming census. The exact time has not been declared yet. It is unclear that when the next census will be held. The decision of the cabinet was announced by the Central Railway Minister Ashwini Vaishnaw. It has been seen as a step that would help in drafting "equitable and targeted" policies by the present Central Government of India led by the Bhartiya Janta Party in India. The Central Home Minister Amit Shah has described the decision as a "historic decision". He has also described that the historic decision as “committed to social justice”. The leader of opposition Rahul Gandhi has welcomed the decision. He said "We have shown we can pressure govt" He has demanded a clear timeline for its completion. He has called it "The first step towards deep social reform". == Description == The caste census is a systematic recording of individuals’ caste identities during the nationwide census in the country. The Central minister Ashwini Vaishnaw expressed his view on the proposed census and said that it would "strengthen the social and economic structure of our society while the nation continues to progress”. The Caste census will happen for the first time in 100 years by the Central Government of India. It will be the part of the upcoming census in India. == History == According to Peabody, the first systematic caste-wise enumeration of households in the Indian subcontinent was conducted between 1658 and 1664 across seven districts of the then Marwar Kingdom, including Jodhpur city which was its capital. It was conducted by the then home minister Munhata Nainsi of the kingdom for the purpose of tax documentation. It was not to for classification of society or creation of social hierarchies but solving a tax related problem. During the period of the British rule in India, caste census was included in the decadal censuses to categorise the population by caste, religion and occupation. In 1871–72, the first detailed caste census was conducted by the government of British Raj in India. It was practiced between the period 1881 to 1931. The last caste census was conducted in the year 1931 in which 4,147 castes were recorded. The largest population in the whole of British India (including Pakistan and Bangladesh) was of Brahmins. The population of Brahmins was recorded more than 1.5 crores. After Brahmin community, the second place was of Jatav (Chamar)community. The population of Jatav was a little more than 1.23 crores. On the third place were Rajputs. The population of Rajputs was 81 lakhs. The Rajput caste was followed by the Kunbi caste of Maharashtra. The population of Kunbi caste was 64 lakhs and 34 thousands. The Kunbi caste was followed by Yadav (Ahir) caste. The population of Yadav (Ahir) community was 56 lakhs and 82 thousands. The Yadav (Ahir) caste was followed by Teli community. The population of Teli community was 42 lakhs and 58 thousands. The Teli community was followed by Gwala community. The population of the Gwala community was 40 lakhs. After the independence of India, the caste enumeration was stopped by the newly independent Government of India led by the prime minister Pandit Jawahar Lal Nehru in 1951. The caste enumeration was stopped to avoid reinforcing social divisions in the Indian society. But, there was an exception made for the enumeration of the Scheduled Castes (SCs) and Scheduled Tribes (STs) in the decadal censuses. Therefore, the enumeration of the Scheduled Castes and the Scheduled Tribes is being conducted in every census since 1951. In 1961, the Government of India permitted states for conducting their own surveys to compile OBC lists, but national caste census was not conducted.
Verifiable secret sharing
In cryptography, a secret sharing scheme is verifiable if auxiliary information is included that allows players to verify their shares as consistent. More formally, verifiable secret sharing ensures that even if the dealer is malicious there is a well-defined secret that the players can later reconstruct. (In standard secret sharing, the dealer is assumed to be honest.) The concept of verifiable secret sharing (VSS) was first introduced in 1985 by Benny Chor, Shafi Goldwasser, Silvio Micali and Baruch Awerbuch. In a VSS protocol a distinguished player who wants to share the secret is referred to as the dealer. The protocol consists of two phases: a sharing phase and a reconstruction phase. Sharing: Initially the dealer holds secret as input and each player holds an independent random input. The sharing phase may consist of several rounds. At each round each player can privately send messages to other players and can also broadcast a message. Each message sent or broadcast by a player is determined by its input, its random input and messages received from other players in previous rounds. Reconstruction: In this phase each player provides its entire view from the sharing phase and a reconstruction function is applied and is taken as the protocol's output. An alternative definition given by Oded Goldreich defines VSS as a secure multi-party protocol for computing the randomized functionality corresponding to some (non-verifiable) secret sharing scheme. This definition is stronger than that of the other definitions and is very convenient to use in the context of general secure multi-party computation. Verifiable secret sharing is important for secure multiparty computation. Multiparty computation is typically accomplished by making secret shares of the inputs, and manipulating the shares to compute some function. To handle "active" adversaries (that is, adversaries that corrupt nodes and then make them deviate from the protocol), the secret sharing scheme needs to be verifiable to prevent the deviating nodes from throwing off the protocol. == Feldman's scheme == A commonly used example of a simple VSS scheme is the protocol by Paul Feldman, which is based on Shamir's secret sharing scheme combined with any encryption scheme which satisfies a specific homomorphic property (that is not necessarily satisfied by all homomorphic encryption schemes). The following description gives the general idea, but is not secure as written. (Note, in particular, that the published value gs leaks information about the dealer's secret s.) First, a cyclic group G of prime order q, along with a generator g of G, is chosen publicly as a system parameter. The group G must be chosen such that computing discrete logarithms is hard in this group. (Typically, one takes an order-q subgroup of (Z/pZ)×, where q is a prime dividing p − 1.) The dealer then computes (and keeps secret) a random polynomial P of degree t with coefficients in Zq, such that P(0) = s, where s is the secret. Each of the n share holders will receive a value P(1), ..., P(n) modulo q. Any t + 1 share holders can recover the secret s by using polynomial interpolation modulo q, but any set of at most t share holders cannot. (In fact, at this point any set of at most t share holders has no information about s.) So far, this is exactly Shamir's scheme. To make these shares verifiable, the dealer distributes commitments to the coefficients of P modulo q. If P(x) = s + a1x + ... + atxt, then the commitments that must be given are: c0 = gs, c1 = ga1, ... ct = gat. Once these are given, any party can verify their share. For instance, to verify that v = P(i) modulo q, party i can check that g v = c 0 c 1 i c 2 i 2 ⋯ c t i t = ∏ j = 0 t c j i j = ∏ j = 0 t g a j i j = g ∑ j = 0 t a j i j = g P ( i ) {\displaystyle g^{v}=c_{0}c_{1}^{i}c_{2}^{i^{2}}\cdots c_{t}^{i^{t}}=\prod _{j=0}^{t}c_{j}^{i^{j}}=\prod _{j=0}^{t}g^{a_{j}i^{j}}=g^{\sum _{j=0}^{t}a_{j}i^{j}}=g^{P(i)}} . This scheme is, at best, secure against computationally bounded adversaries, namely the intractability of computing discrete logarithms. Pedersen proposed later a scheme where no information about the secret is revealed even with a dealer with unlimited computing power. == Baghery's hash-based scheme == A recent line of research has proposed a unified framework, for building practical VSS schemes that do not necessarily require homomorphic commitments —a key requirement in traditional constructions such as Feldman's and Pedersen's schemes. The framework allows instantiations with different commitment schemes, including post-quantum secure options such as hash-based commitments. This offers a flexible and efficient approach to build VSS schemes, in which the verifiability of shares is decoupled from the need for homomorphic commitments, which are often tied to assumptions like the Discrete Logarithm (DL) problem, known to be insecure against quantum adversaries. One instantiation of the new framework uses hash-based commitments and a random oracle to construct a hash-based VSS scheme based on Shamir's secret sharing. === Protocol Overview === Sharing Phase: Given a secure hash-based commitment scheme C {\displaystyle {\mathcal {C}}} and a hash function H {\displaystyle {\mathcal {H}}} (modeled as a random oracle), to share a secret value s {\displaystyle s} among n {\displaystyle n} parties with threshold t {\displaystyle t} , the dealer acts as follows: Following Shamir sharing, the dealer samples a random degree- t {\displaystyle t} polynomial P ( X ) {\displaystyle P(X)} over a filed or ring, with P ( 0 ) = s {\displaystyle P(0)=s} . Each of the n {\displaystyle n} parties will receive a value v i = P ( i ) {\displaystyle v_{i}=P(i)} modulo q {\displaystyle q} as a share. To prove the validity of the shares, the dealer acts as follows: Samples another random degree- t {\displaystyle t} polynomial R ( X ) {\displaystyle R(X)} and n {\displaystyle n} random values γ 1 , … , γ n {\displaystyle \gamma _{1},\dots ,\gamma _{n}} from the same filed or ring. Computes a set of commitments c i = C ( P ( i ) , R ( i ) , γ i ) {\displaystyle c_{i}={\mathcal {C}}(P(i),R(i),\gamma _{i})} for i = 1 , 2 , … , n {\displaystyle i=1,2,\dots ,n} . Note that, the additional randomness γ i {\displaystyle \gamma _{i}} is used when the secret s {\displaystyle s} does not have sufficient entropy, but it can be omitted when sharing a uniformly random secret. Each of the n {\displaystyle n} parties will also receive a value γ i {\displaystyle \gamma _{i}} modulo q {\displaystyle q} as a share. Calculates a challenge value d {\displaystyle d} via a hash function d = H ( c 1 , … , c n ) {\displaystyle d={\mathcal {H}}(c_{1},\dots ,c_{n})} and then computes a polynomial Z ( X ) = R ( X ) + d ⋅ P ( X ) {\displaystyle Z(X)=R(X)+d\cdot P(X)} . Broadcasts the commitments c 1 , … , c n {\displaystyle c_{1},\dots ,c_{n}} along with Z ( X ) {\displaystyle Z(X)} as the proof and privately sends ( v i , γ i ) {\displaystyle (v_{i},\gamma _{i})} as the individual share to party i {\displaystyle i} . Verification Phase: Given an individual share ( v i , γ i ) {\displaystyle (v_{i},\gamma _{i})} and a proof ( c 1 , … , c n , Z ( X ) ) {\displaystyle (c_{1},\dots ,c_{n},Z(X))} , party i {\displaystyle i} verifies the correctness of it as below: Checks that Z ( X ) {\displaystyle Z(X)} is a valid (up to) degree- t {\displaystyle t} polynomial. Recomputes the challenge value d = H ( c 1 , … , c n ) {\displaystyle d={\mathcal {H}}(c_{1},\dots ,c_{n})} , and verifies the commitment equation c i = C ( v i , Z ( i ) − d v i , γ i ) {\displaystyle c_{i}={\mathcal {C}}(v_{i},Z(i)-dv_{i},\gamma _{i})} . If the verification fails, similar to Feldman’s and Pedersen’s schemes, the party raises a complaint. If too many complaints (more than t {\displaystyle t} ) are raised, the dealer is disqualified. In case of a complaint, the dealer can publicly reveal the disputed share to allow global verification. Honest parties can then collectively agree to either continue or disqualify the dealer. This scheme supports the sharing of both low-entropy and high-entropy secrets. Moreover, since it relies solely on secure hash functions for commitments and on a (quantum) random oracle, it plausibly achieves security even against quantum adversaries. Additionally, by using only lightweight cryptographic primitives, the scheme is considerably more efficient in practice compared to traditional VSS constructions based on number-theoretic assumptions. == Benaloh's scheme == Once n shares are distributed to their holders, each holder should be able to verify that all shares are collectively t-consistent (i.e., any subset t of n shares will yield the same, correct, polynomial without exposing the secret). In Shamir's secret sharing scheme the shares s 1 , s 2 , . . . , s n {\displaystyle s_{1},s_{2},...,s_{n}} are t-consistent if and only if the interpolation of the points ( 1 , s 1 ) , ( 2 , s 2 ) , . . . , (
Data grid
A data grid is an architecture or set of services that allows users to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. == Middleware == Middleware provides all the services and applications necessary for efficient management of datasets and files within the data grid while providing users quick access to the datasets and files. There is a number of concepts and tools that must be available to make a data grid operationally viable. However, at the same time not all data grids require the same capabilities and services because of differences in access requirements, security and location of resources in comparison to users. In any case, most data grids will have similar middleware services that provide for a universal name space, data transport service, data access service, data replication and resource management service. When taken together, they are key to the data grids functional capabilities. === Universal namespace === Since sources of data within the data grid will consist of data from multiple separate systems and networks using different file naming conventions, it would be difficult for a user to locate data within the data grid and know they retrieved what they needed based solely on existing physical file names (PFNs). A universal or unified name space makes it possible to create logical file names (LFNs) that can be referenced within the data grid that map to PFNs. When an LFN is requested or queried, all matching PFNs are returned to include possible replicas of the requested data. The end user can then choose from the returned results the most appropriate replica to use. This service is usually provided as part of a management system known as a Storage Resource Broker (SRB). Information about the locations of files and mappings between the LFNs and PFNs may be stored in a metadata or replica catalogue. The replica catalogue would contain information about LFNs that map to multiple replica PFNs. === Data transport service === Another middleware service is that of providing for data transport or data transfer. Data transport will encompass multiple functions that are not just limited to the transfer of bits, to include such items as fault tolerance and data access. Fault tolerance can be achieved in a data grid by providing mechanisms that ensures data transfer will resume after each interruption until all requested data is received. There are multiple possible methods that might be used to include starting the entire transmission over from the beginning of the data to resuming from where the transfer was interrupted. As an example, GridFTP provides for fault tolerance by sending data from the last acknowledged byte without starting the entire transfer from the beginning. The data transport service also provides for the low-level access and connections between hosts for file transfer. The data transport service may use any number of modes to implement the transfer to include parallel data transfer where two or more data streams are used over the same channel or striped data transfer where two or more steams access different blocks of the file for simultaneous transfer to also using the underlying built-in capabilities of the network hardware or specifically developed protocols to support faster transfer speeds. The data transport service might optionally include a network overlay function to facilitate the routing and transfer of data as well as file I/O functions that allow users to see remote files as if they were local to their system. The data transport service hides the complexity of access and transfer between the different systems to the user so it appears as one unified data source. === Data access service === Data access services work hand in hand with the data transfer service to provide security, access controls and management of any data transfers within the data grid. Security services provide mechanisms for authentication of users to ensure they are properly identified. Common forms of security for authentication can include the use of passwords or Kerberos (protocol). Authorization services are the mechanisms that control what the user is able to access after being identified through authentication. Common forms of authorization mechanisms can be as simple as file permissions. However, need for more stringent controlled access to data is done using Access Control Lists (ACLs), Role-Based Access Control (RBAC) and Tasked-Based Authorization Controls (TBAC). These types of controls can be used to provide granular access to files to include limits on access times, duration of access to granular controls that determine which files can be read or written to. The final data access service that might be present to protect the confidentiality of the data transport is encryption. The most common form of encryption for this task has been the use of SSL while in transport. While all of these access services operate within the data grid, access services within the various administrative domains that host the datasets will still stay in place to enforce access rules. The data grid access services must be in step with the administrative domains access services for this to work. === Data replication service === To meet the needs for scalability, fast access and user collaboration, most data grids support replication of datasets to points within the distributed storage architecture. The use of replicas allows multiple users faster access to datasets and the preservation of bandwidth since replicas can often be placed strategically close to or within sites where users need them. However, replication of datasets and creation of replicas is bound by the availability of storage within sites and bandwidth between sites. The replication and creation of replica datasets is controlled by a replica management system. The replica management system determines user needs for replicas based on input requests and creates them based on availability of storage and bandwidth. All replicas are then cataloged or added to a directory based on the data grid as to their location for query by users. In order to perform the tasks undertaken by the replica management system, it needs to be able to manage the underlying storage infrastructure. The data management system will also ensure the timely updates of changes to replicas are propagated to all nodes. ==== Replication update strategy ==== There are a number of ways the replication management system can handle the updates of replicas. The updates may be designed around a centralized model where a single master replica updates all others, or a decentralized model, where all peers update each other. The topology of node placement may also influence the updates of replicas. If a hierarchy topology is used then updates would flow in a tree like structure through specific paths. In a flat topology it is entirely a matter of the peer relationships between nodes as to how updates take place. In a hybrid topology consisting of both flat and hierarchy topologies updates may take place through specific paths and between peers. ==== Replication placement strategy ==== There are a number of ways the replication management system can handle the creation and placement of replicas to best serve the user community. If the storage architecture supports replica placement with sufficient site storage, then it becomes a matter of the needs of the users who access the datasets and a strategy for placement of replicas. There have been numerous strategies proposed and tested on how to best manage replica placement of datasets within the data grid to meet user requirements. There is not one universal strategy that fits every requirement the best. It is a matter of the type of data grid and user community requirements for access that will determine the best strategy to use. Replicas can even be created where the files are encrypted for confidentiality that would be useful in a research project dealing with medical files. The following section contains several strategies for replica placement. ===== Dynamic replication ===== Dynam
News analytics
In trading strategy, news analysis refers to the measurement of the various qualitative and quantitative attributes of textual (unstructured data) news stories. Some of these attributes are: sentiment, relevance, and novelty. Expressing news stories as numbers and metadata permits the manipulation of everyday information in a mathematical and statistical way. This data is often used in financial markets as part of a trading strategy or by businesses to judge market sentiment and make better business decisions. News analytics are usually derived through automated text analysis and applied to digital texts using elements from natural language processing and machine learning such as latent semantic analysis, support vector machines, "bag of words" among other techniques. == Applications and strategies == The application of sophisticated linguistic analysis to news and social media has grown from an area of research to mature product solutions since 2007. News analytics and news sentiment calculations are now routinely used by both buy-side and sell-side in alpha generation, trading execution, risk management, and market surveillance and compliance. There is however a good deal of variation in the quality, effectiveness and completeness of currently available solutions. A large number of companies use news analysis to help them make better business decisions. Academic researchers have become interested in news analysis especially with regards to predicting stock price movements, volatility and traded volume. Provided a set of values such as sentiment and relevance as well as the frequency of news arrivals, it is possible to construct news sentiment scores for multiple asset classes such as equities, Forex, fixed income, and commodities. Sentiment scores can be constructed at various horizons to meet the different needs and objectives of high and low frequency trading strategies, whilst characteristics such as direction and volatility of asset returns as well as the traded volume may be addressed more directly via the construction of tailor-made sentiment scores. Scores are generally constructed as a range of values. For instance, values may range between 0 and 100, where values above and below 50 convey positive and negative sentiment, respectively. === Absolute return strategies === The objective of absolute return strategies is absolute (positive) returns regardless of the direction of the financial market. To meet this objective, such strategies typically involve opportunistic long and short positions in selected instruments with zero or limited market exposure. In statistical terms, absolute return strategies should have very low correlation with the market return. Typically, hedge funds tend to employ absolute return strategies. Below, a few examples show how news analysis can be applied in the absolute return strategy space with the purpose to identify alpha opportunities applying a market neutral strategy or based on volatility trading. Example 1 Scenario: The gap between the news sentiment scores for direction, S {\displaystyle S} , of Company X {\displaystyle X} and Market Y {\displaystyle Y} has moved beyond + 20 {\displaystyle +20} . That is, S X − S Y {\displaystyle S_{X}-S_{Y}} ≥ 20 {\displaystyle 20} . Action: Buy the stock on Company X {\displaystyle X} and short the future on Market Y {\displaystyle Y} . Exit Strategy: When the gap in the news sentiment scores for direction of Company X {\displaystyle X} and Market Y {\displaystyle Y} has disappeared, S X − S Y {\displaystyle S_{X}-S_{Y}} = 0 {\displaystyle 0} , sell the stock on Company X {\displaystyle X} and go long the future on Market Y {\displaystyle Y} to close the positions. Example 2 Scenario: The news sentiment score for volatility of Company X {\displaystyle X} goes above 70 {\displaystyle 70} out of 100 {\displaystyle 100} indicating an expected volatility above the option implied volatility. Action: Buy a short-dated straddle (the purchase of both a put and a call) on the stock of Company X {\displaystyle X} . Exit Strategy: Keep the straddle on Company X {\displaystyle X} until expiry or until a certain profit target has been reached. === Relative return strategies === The objective of relative return strategies is to either replicate (passive management) or outperform (active management) a theoretical passive reference portfolio or benchmark. To meet these objectives such strategies typically involve long positions in selected instruments. In statistical terms, relative return strategies often have high correlation with the market return. Typically, mutual funds tend to employ relative return strategies. Below, a few examples show how news analysis can be applied in the relative return strategy space with the purpose to outperform the market applying a stock picking strategy and by making tactical tilts to ones asset allocation model. Example 1 Scenario: The news sentiment score for direction of Company X {\displaystyle X} goes above 70 {\displaystyle 70} out of 100 {\displaystyle 100} . Action: Buy the stock on Company X {\displaystyle X} . Exit Strategy: When the news sentiment score for direction of Company X {\displaystyle X} falls below 60 {\displaystyle 60} , sell the stock on Company X {\displaystyle X} to close the position. Example 2 Scenario: The news sentiment score for direction of Sector Z {\displaystyle Z} goes above 70 {\displaystyle 70} out of 100 {\displaystyle 100} . Action: Include Sector Z {\displaystyle Z} as a tactical bet in the asset allocation model. Exit Strategy: When the news sentiment score for direction of Sector Z {\displaystyle Z} falls below 60 {\displaystyle 60} , remove the tactical bet for Sector Z {\displaystyle Z} from the asset allocation model. === Financial risk management === The objective of financial risk management is to create economic value in a firm or to maintain a certain risk profile of an investment portfolio by using financial instruments to manage risk exposures, particularly credit risk and market risk. Other types include Foreign exchange, Shape, Volatility, Sector, Liquidity, Inflation risks, etc. Below, a few examples show how news analysis can be applied in the financial risk management space with the purpose to either arrive at better risk estimates in terms of Value at Risk (VaR) or to manage the risk of a portfolio to meet ones portfolio mandate. Example 1 Scenario: The bank operates a VaR model to manage the overall market risk of its portfolio. Action: Estimate the portfolio covariance matrix taking into account the development of the news sentiment score for volume. Implement the relevant hedges to bring the VaR of the bank in line with the desired levels. Example 2 Scenario: A portfolio manager operates his portfolio towards a certain desired risk profile. Action: Estimate the portfolio covariance matrix taking into account the development of the news sentiment score for volume. Scale the portfolio exposure according to the targeted risk profile. === Computer algorithms using news analytics === Within 0.33 seconds, computer algorithms using news analytics can notify subscribers which company the news is about, if the news article sentiment is positive or negative, if the news is ranked as high or low relative importance … relative relevance. the stock price reaction and the increase in trade volume is concentrated in the first 5 seconds after an news article is released. === Algorithmic order execution === The objective of algorithmic order execution, which is part of the concept of algorithmic trading, is to reduce trading costs by optimizing on the timing of a given order. It is widely used by hedge funds, pension funds, mutual funds, and other institutional traders to divide up large trades into several smaller trades to manage market impact, opportunity cost, and risk more effectively. The example below shows how news analysis can be applied in the algorithmic order execution space with the purpose to arrive at more efficient algorithmic trading systems. Example 1 Scenario: A large order needs to be placed in the market for the stock on Company X {\displaystyle X} . Action: Scale the daily volume distribution for Company X {\displaystyle X} applied in the algorithmic trading system, thus taking into account the news sentiment score for volume. This is followed by the creation of the desired trading distribution forcing greater market participation during the periods of the day when volume is expected to be heaviest. == Effects == Being able to express news stories as numbers permits the manipulation of everyday information in a statistical way that allows computers not only to make decisions once made only by humans, but to do so more efficiently. Since market participants are always looking for an edge, the speed of computer connections and the delivery of news analysis, measured in milliseconds, have become essential.
Format-preserving encryption
In cryptography, format-preserving encryption (FPE), refers to encrypting in such a way that the output (the ciphertext) is in the same format as the input (the plaintext). The meaning of "format" varies. Typically only finite sets of characters are used; numeric, alphabetic or alphanumeric. For example: Encrypting a 16-digit credit card number so that the ciphertext is another 16-digit number. Encrypting an English word so that the ciphertext is another English word. Encrypting an n-bit number so that the ciphertext is another n-bit number (this is the definition of an n-bit block cipher). For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers {0, ... , N−1} where N is the size of the domain. == Motivation == === Restricted field lengths or formats === One motivation for using FPE comes from the problems associated with integrating encryption into existing applications, with well-defined data models. A typical example would be a credit card number, such as 1234567812345670 (16 bytes long, digits only). Adding encryption to such applications might be challenging if data models are to be changed, as it usually involves changing field length limits or data types. For example, output from a typical block cipher would turn credit card number into a hexadecimal (e.g.0x96a45cbcf9c2a9425cde9e274948cb67, 34 bytes, hexadecimal digits) or Base64 value (e.g. lqRcvPnCqUJc3p4nSUjLZw==, 24 bytes, alphanumeric and special characters), which will break any existing applications expecting the credit card number to be a 16-digit number. Apart from simple formatting problems, using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value 0xde015724b081ea7003de4593d792fd8b695b39e095c98f3a220ff43522a2df02. In addition to the problems caused by creating invalid characters and increasing the size of the data, data encrypted using the CBC mode of an encryption algorithm also changes its value when it is decrypted and encrypted again. This happens because the random seed value that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database. FPE attempts to simplify the transition process by preserving the formatting and length of the original data, allowing a drop-in replacement of plaintext values with their ciphertexts in legacy applications. == Comparison to truly random permutations == Although a truly random permutation is the ideal FPE cipher, for large domains it is infeasible to pre-generate and remember a truly random permutation. So the problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly smaller than O(N)). == Comparison to block ciphers == An n-bit block cipher technically is a FPE on the set {0, ..., 2n-1}. If an FPE is needed on one of these standard sized sets (for example, n = 64 for DES and n = 128 for AES) a block cipher of the right size can be used. However, in typical usage, a block cipher is used in a mode of operation that allows it to encrypt arbitrarily long messages, and with an initialization vector as discussed above. In this mode, a block cipher is not an FPE. == Definition of security == In cryptographic literature (see most of the references below), the measure of a "good" FPE is whether an attacker can distinguish the FPE from a truly random permutation. Various types of attackers are postulated, depending on whether they have access to oracles or known ciphertext/plaintext pairs. == Algorithms == In most of the approaches listed here, a well-understood block cipher (such as AES) is used as a primitive to take the place of an ideal random function. This has the advantage that incorporation of a secret key into the algorithm is easy. Where AES is mentioned in the following discussion, any other good block cipher would work as well. === The FPE constructions of Black and Rogaway === Implementing FPE with security provably related to that of the underlying block cipher was first undertaken in a paper by cryptographers John Black and Phillip Rogaway, which described three ways to do this. They proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. Therefore, if AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, E denotes the AES encryption operation that is used to construct an FPE algorithm and F denotes the FPE encryption operation. ==== FPE from a prefix cipher ==== One simple way to create an FPE algorithm on {0, ..., N-1} is to assign a pseudorandom weight to each integer, then sort by weight. The weights are defined by applying an existing block cipher to each integer. Black and Rogaway call this technique a "prefix cipher" and showed it was provably as good as the block cipher used. Thus, to create an FPE on the domain {0,1,2,3}, given a key K apply AES(K) to each integer, giving, for example, weight(0) = 0x56c644080098fc5570f2b329323dbf62 weight(1) = 0x08ee98c0d05e3dad3eb3d6236f23e7b7 weight(2) = 0x47d2e1bf72264fa01fb274465e56ba20 weight(3) = 0x077de40941c93774857961a8a772650d Sorting [0,1,2,3] by weight gives [3,1,2,0], so the cipher is F(0) = 3 F(1) = 1 F(2) = 2 F(3) = 0 This method is only useful for small values of N. For larger values, the size of the lookup table and the required number of encryptions to initialize the table gets too big to be practical. ==== FPE from cycle walking ==== If there is a set M of allowed values within the domain of a pseudorandom permutation P (for example P can be a block cipher like AES), an FPE algorithm can be created from the block cipher by repeatedly applying the block cipher until the result is one of the allowed values (within M). CycleWalkingFPE(x) { if P(x) is an element of M then return P(x) else return CycleWalkingFPE(P(x)) } The recursion is guaranteed to terminate. (Because P is one-to-one and the domain is finite, repeated application of P forms a cycle, so starting with a point in M the cycle will eventually terminate in M.) This has the advantage that the elements of M do not have to be mapped to a consecutive sequence {0,...,N-1} of integers. It has the disadvantage, when M is much smaller than P's domain, that too many iterations might be required for each operation. If P is a block cipher of a fixed size, such as AES, this is a severe restriction on the sizes of M for which this method is efficient. For example, an application may want to encrypt 100-bit values with AES in a way that creates another 100-bit value. With this technique, AES-128-ECB encryption can be applied until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 228 iterations to happen. ==== FPE from a Feistel network ==== It is also possible to make a FPE algorithm using a Feistel network. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is good if enough rounds are used. One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value. This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 1015 possible 16-digit credit card numbers (accounting for the redundant check digit), and because the 1015 ≈ 249.8, using a 50-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average. === The Thorp shuffle === A Thorp shuffle is like an idealized card-shuffle, or equivalently a maximally-unbalanced Feistel cipher where one side is a single bit. It is easier to prove security for unbalanced Feistel ciphers than for balanced ones. === VIL mode === For domain sizes that are a power of two, and an existing block cipher with a smaller bl