AI Chatbot Login

AI Chatbot Login — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Information space analysis

    Information space analysis

    Within the field of information science, information space analysis is a deterministic method, enhanced by machine intelligence, for locating and assessing resources for team-centric efforts. Organizations need to be able to quickly assemble teams backed by the support services, information, and material to do the job. To do so, these teams need to find and assess sources of services that are potential participants in the team effort. To support this initial team and resource development, information needs to be developed via analysis tools that help make sense of sets of data sources in an Intranet or Internet. Part of the process is to characterize them, partition them, and sort and filter them. These tools focus on three key issues in forming a collaborative team: Help individuals responsible for forming the team understand what is available. Assist team members in identifying the structure and categorize the information available to them in a manner specifically suited to the task at hand. Aid team members to understand the mappings of their information between their organization and that used by others who might participate. Information space analysis tools combine multiple methods to assist in this task. This causes the tools to be particularly well-suited to integrating additional technologies in order to create specialized systems.

    Read more →
  • Social influence bias

    Social influence bias

    The social influence bias is an asymmetric herding effect on online social media platforms which makes users overcompensate for negative ratings but amplify positive ones. Driven by the desire to be accepted within a specific group, it surrounds the idea that people alter certain behaviors to be like those of the people within a group. Therefore, it is a subgroup term for various types of cognitive biases. Some social influence bias types include the bandwagon effect, authority bias, groupthinking effect, social comparison bias, social media bias and more. Understanding these biases helps us understand the term overall. However, the composition of the term "social influence bias" requires critical examination to understand the way that it affects individuals' and groups' lives. The term "influence" has 2 different types of stigma. For one, it surrounds the idea that people show their true inner selves when "under the influence". On the other end, it also proposes the idea that people are not their own selves when "under the influence". These tend to be constructions made by people, which also tend to fit the situation based on their own perspectives. So, even in social terms, it requires both sides to be examined to understand whether we truly are affected by context, or we remain to be and behave in terms of our own selves. The term "influence" doesn't necessarily say that there lies greater strength in our inner self's desires and decisions, nor does it say that external factors have the greater power. In a similar manner, both social and non-social judgments are to be associated with anxiety, but the same can't necessarily be said in the case of social conformity. So, the gray areas within this topic beg the question, "What does social influence bias say about us, and does it affect us all in the same way?" == Social media bias == Media bias is reflected in search systems in social media. Kulshrestha and her team found through research in 2018 that the top-ranked results returned by these search engines can influence users' perceptions when they conduct searches for events or people, which is particularly reflected in political bias and polarizing topics. Fueled by confirmation bias, online echo chambers allow users to be steeped within their own ideology. Because social media is tailored to your interests and your selected friends, it is an easy outlet for political echo chambers. Social media bias is also reflected in hostile media effect. Social media has a place in disseminating news in modern society, where viewers are exposed to other people's comments while reading news articles. In their 2020 study, Gearhart and her team showed that viewers' perceptions of bias increased and perceptions of credibility decreased after seeing comments with which they held different opinions. == In research context == In observational data, how social influence affects collected judgment is challenging to fully understand. Positive social influence can accumulate and result in a rating bubble, while negative social influence is neutralized by crowd correction. This phenomenon was first described in a paper written by Lev Muchnik, Sinan Aral and Sean J. Taylor in 2014, then the question was revisited by Cicognani et al., whose experiment reinforced Munchnik's and his co-authors' results. == Relevance == Online customer reviews are trusted sources of information in various contexts such as online marketplaces, dining, accommodation, movies, or digital products. However, these online ratings are not immune to herd behavior, which means that subsequent reviews are not independent from each other. As on many such sites, preceding opinions are visible to a new reviewer, he or she can be heavily influenced by the antecedent evaluations in his or her decision about the certain product, service or online content. This form of herding behavior inspired Muchnik, Aral and Taylor to conduct their experiment on influence in social contexts. == Experimental design == Muchnik, Aral, and Taylor designed a large-scale randomized experiment to measure social influence on user reviews. The experiment was conducted on social news aggregation website like Reddit. The study lasted for 5 months, the authors randomly assigned 101 281 comments to one of the following treatment groups: up-treated (4049), down-treated (1942), or control (the proportions reflect the observed ratio of up-and down-votes. Comments which fell to the first group were given an up-vote upon the creation of the comment, the second group got a down-vote upon creation, the comments in the control group remained untouched. A vote is equivalent to a single rating (+1 or -1). As other users are unable to trace a user’s votes, they were unaware of the experiment. Due to randomization, comments in the control and the treatment group were not different in terms of expected rating. The treated comments were viewed more than 10 million times and rated 308 515 times by successive users. == Results == The up-vote treatment increased the probability of up-voting by the first viewer by 32% over the control group, while the probability of down-voting did not change compared to the control group, which means that users did not correct the random positive rating. The upward bias remained inplace for the observed 5-month period. The accumulating herding effect increased the comment’s mean rating by 25% compared to the control group comments. Positively manipulated comments did receive higher ratings at all parts of the distribution, which means that they were also more likely to collect extremely high scores. The negative manipulation created an asymmetric herd effect: although the probability of subsequent down-votes was increased by the negative treatment, the probability of up-voting also grew for these comments. The community performed a correction which neutralized the negative treatment and resulted non-different final mean ratings from the control group. The authors also compared the final mean scores of comments across the most active topic categories on the website. The observed positive herding effect was present in the "politics," "culture and society," and "business" subreddits, but was not applicable for "economics," "IT," "fun," and "general news".- == Implications == The skewed nature of online ratings makes review outcomes different to what it would be without the social influence bias. In a 2009 experiment by Hu, Zhang and Pavlou showed that the distribution of reviews of a certain product made by unconnected individuals is approximately normal, however, the rating of the same product on Amazon followed a J-Shaped distribution with twice as much five-star ratings than others. Cicognani, Figini and Magnani came to similar conclusions after their experiment conducted on a tourism services website: positive preceding ratings influenced raters' behavior more than mediocre ones. Positive crowd correction makes community-based opinions upward-biased.

    Read more →
  • Philco computers

    Philco computers

    Philco was one of the pioneers of transistorized computers, also known as second-generation computers. After the company developed the surface-barrier transistor, which was much faster than previous point-contact types, it was awarded contracts for military and government computers. Commercialized derivatives of some of these designs became successful business and scientific computers. The TRANSAC (Transistor Automatic Computer) Model S-1000 was released as a scientific computer. The TRANSAC S-2000 mainframe computer system was first produced in 1958, and a family of compatible machines, with increasing performance, was released over the next several years. However, the mainframe computer market was dominated by IBM. Other companies could not deploy resources for development, customer support and marketing on the scale that IBM could afford, making competition in this segment difficult after the introduction of the IBM 360 family. Philco went bankrupt and was purchased in 1961 by Ford Motor Company, but the computer division carried on until the Philco division of Ford exited the computer business in 1963. The Ford company maintained one Philco mainframe in use until 1981. == The surface-barrier transistor == The surface-barrier transistor developed by Philco in 1953 had a much higher frequency response than the original point-contact transistors. The transistor was made of a thin crystal of germanium, which was electrolytically etched with pits on either side forming a very thin base region, on the order of 5 micrometers. Philco's process for etching was United States patent number 2,885,571. Philco surface-barrier transistors were used in TX-0, and in early models of what would become the DEC PDP product line. Although relatively fast, the small size of the devices limited their power to circuits operating at a few tens of milliwatts. == Military and government == Between 1955 and 1957, Philco built transistor computers for use in aircraft, models C-1000, C-1100, and C-1102, intended for airborne real-time applications. By 1957, the C-1102 had been used by a civilian sector customer. The BASICPAC AN/TYK 6V (first delivery in 1961), COMPAC AN/TYK 4V (not completed), and LOGICPAC systems were built for the US Army as transportable computer systems for use with their Fieldata concept of integrated information management. BASICPAC was a transistorized computer with up to 28,672 words of 38-bit core memory (including sign and parity), available in several configurations from a minimum system, to a truck-borne mobile version, to a fully expanded system. Basic clock periods was 1 microsecond (which gives a clock rate of 1 MHz), with 12 microsecond memory access and a fixed-point multiplication taking 242 microseconds. Input/output was by paper tape reader and punch, or through a teletypewriter. With additional hardware, magnetic tape storage was also available, with up to seven I/O devices. The instruction set had 31 basic operation codes and nine opcodes for I/O === CXPQ === Philco was contracted by the US Navy to build the CXPQ computer. One model was completed and installed at the David Taylor Model Basin. This design was later adapted to become the commercial TRANSAC S-2000. Only one CXPQ was built. The CXPQ is a 48-bit transistorized computer. === SOLO === In 1955, the National Security Agency through the US Navy contracted with Philco to produce a computer suitable for use as a workstation, with an architecture based on the vacuum-tube computer system called Atlas II already in use at the NSA, and similar to the commercial UNIVAC 1103. At the time, Philco was the largest producer of surface barrier transistors, which were the only type available with the speed and quantities required for a computer. The SOLO prototype was delivered in 1958, but required extensive debugging at NSA. Difficulties were encountered with core memory and power supplies. SOLO used paper tape and teleprinter machines for input and output. SOLO cost about $1 million US, and contained 8,000 transistors. While the system was extensively used for training, testing, research and development, no additional units were ordered. SOLO was removed from active service in 1963. The design of the SOLO became commercialized as Philco's TRANSAC Model S-1000. == Commercial == === S-1000 === The TRANSAC S-1000 was a scientific computer with a 36-bit word length and 4096 words of core memory. It was packaged in a container about the size of a large office desk, and used only 1.2 kilowatts, much less than vacuum-tube-based computers of similar capacity. In a 1961 survey, about 15 S-1000 computer installations had been identified. It weighed about 1,650 pounds (750 kg). === S-2000 === The TRANSAC S-2000 was a large mainframe system intended for both business and scientific work. It had a 48-bit word length and supported calculations in fixed point, floating point and binary-coded decimal formats. The original S-2000 "TRANSAC" (Transistor Automatic Computer) released in 1958 was later designated Model 210; it was used internally at Philco. Similar to the Control Data Corporation Model 1604, it was a 48-bit fully transistorized computer. Three succeeding models were released in the series, all compatible with the software of the original model. The Model 211 was introduced in 1960, using micro-alloy diffused field-effect transistors, requiring significant redesign of circuits compared to the original. The TRANSAC S-2000/Philco 210/211 weighed about 2,000 pounds (910 kg). By 1964, eighteen Model 210, eighteen Model 211 and seven Model 212 systems had been sold. After Philco was purchased by Ford Motor Company, the Model 212 was introduced in 1962 and released in 1963. It had 65,535 words of 48-bit memory. Initially made with 6-microsecond core memory, it had better performance than the IBM 7094 transistor computer. It was later upgraded in 1964 to 2-microsecond core memory, which gave the machine floating-point performance greater than the IBM 7030 Stretch computer. A Model 213 was announced in 1964 but never built. By that time competition from IBM had made the Philco computer operations no longer profitable for Ford, and the division was closed down. The Model 212 could carry out a floating-point multiplication in 22 microseconds. Each word contained two 24-bit instructions with 16 bits of address information and eight bits for the opcode. There were 225 different valid opcodes in the Model 212; invalid opcodes were detected and halted the machine. The CPU had an accumulator register of 48 bits, three general-purpose registers of 24 bits, and 32 index registers of 15 bits. Main memory size ranged from 4K words to 64K words. Only the first model had a magnetic drum memory; later editions used tape drives. The Model 212 weighed about 6,500 pounds (3.3 short tons; 2.9 t). Software for the S-2000 initially consisted of TAC (Translator-Assembler-Compiler), and ALTAC, a FORTRAN II-like language with some differences from the IBM 704 FORTRAN implementation. A COBOL compiler was also available, targeted at business applications. The Philco 2400 was the input/output system for the S-2000. Operations such as reading cards or printing were carried out through magnetic tapes, thereby offloading the S-2000 from relatively slow input/output processing. The 2400 had a 24-bit word length and could be supplied with 4K to 32K characters (1K to 8K words) of core memory, rated at 3-microsecond cycle time. The instruction set was aimed at character I/O use. The idea of base registers, implemented in Philco computers, influenced the design of IBM/360. The last Philco TRANSAC S-2000 Model 212 was taken out of service in December 1981, after 19 years of service at Ford.

    Read more →
  • POODLE

    POODLE

    POODLE (which stands for "Padding Oracle On Downgraded Legacy Encryption") is a security vulnerability which takes advantage of the fallback to SSL 3.0. If attackers successfully exploit this vulnerability, on average, they only need to make 256 SSL 3.0 requests to reveal one byte of encrypted messages. Bodo Möller, Thai Duong and Krzysztof Kotowicz from the Google Security Team discovered this vulnerability; they disclosed the vulnerability publicly on October 14, 2014 (despite the paper being dated "September 2014"). On December 8, 2014, a variation of the POODLE vulnerability that affected TLS was announced. The CVE-ID associated with the original POODLE attack is CVE-2014-3566. F5 Networks filed for CVE-2014-8730 as well, see POODLE attack against TLS section below. == Prevention == To mitigate the POODLE attack, one approach is to completely disable SSL 3.0 on the client side and the server side. However, some old clients and servers do not support TLS 1.0 and above. Thus, the authors of the paper on POODLE attacks also encourage browser and server implementation of TLS_FALLBACK_SCSV, which will make downgrade attacks impossible. Another mitigation is to implement "anti-POODLE record splitting". It splits the records into several parts and ensures none of them can be attacked. However the problem of the splitting is that, though valid according to the specification, it may also cause compatibility issues due to problems in server-side implementations. A full list of browser versions and levels of vulnerability to different attacks (including POODLE) can be found in the article Transport Layer Security. Opera 25 implemented this mitigation in addition to TLS_FALLBACK_SCSV. Google's Chrome browser and their servers had already supported TLS_FALLBACK_SCSV. Google stated in October 2014 it was planning to remove SSL 3.0 support from their products completely within a few months. Fallback to SSL 3.0 has been disabled in Chrome 39, released in November 2014. SSL 3.0 has been disabled by default in Chrome 40, released in January 2015. Mozilla disabled SSL 3.0 in Firefox 34 and ESR 31.3, which were released in December 2014, and added support of TLS_FALLBACK_SCSV in Firefox 35. Microsoft published a security advisory to explain how to disable SSL 3.0 in Internet Explorer and Windows OS, and on October 29, 2014, Microsoft released a fix which disables SSL 3.0 in Internet Explorer on Windows Vista / Server 2003 and above and announced a plan to disable SSL 3.0 by default in their products and services within a few months. Microsoft disabled fallback to SSL 3.0 in Internet Explorer 11 for Protect Mode sites on February 10, 2015, and for other sites on April 14, 2015. Apple's Safari (on OS X 10.8, iOS 8.1 and later) mitigated against POODLE by removing support for all CBC protocols in SSL 3.0, however, this left RC4 which is also completely broken by the RC4 attacks in SSL 3.0. POODLE was completely mitigated in OS X 10.11 (El Capitan 2015) and iOS 9 (2015). To prevent the POODLE attack, some web services dropped support of SSL 3.0. Examples include CloudFlare and Wikimedia. Network Security Services version 3.17.1 (released on October 3, 2014) and 3.16.2.3 (released on October 27, 2014) introduced support for TLS_FALLBACK_SCSV, and NSS will disable SSL 3.0 by default in April 2015. OpenSSL versions 1.0.1j, 1.0.0o and 0.9.8zc, released on October 15, 2014, introduced support for TLS_FALLBACK_SCSV. LibreSSL version 2.1.1, released on October 16, 2014, disabled SSL 3.0 by default. == POODLE attack against TLS == A new variant of the original POODLE attack was announced on December 8, 2014. This attack exploits implementation flaws of CBC encryption mode in the TLS 1.0 - 1.2 protocols. Even though TLS specifications require servers to check the padding, some implementations fail to validate it properly, which makes some servers vulnerable to POODLE even if they disable SSL 3.0. SSL Pulse showed "about 10% of the servers are vulnerable to the POODLE attack against TLS" before this vulnerability was announced. The CVE-ID for F5 Networks' implementation bug is CVE-2014-8730. The entry in NIST's NVD states that this CVE-ID is to be used only for F5 Networks' implementation of TLS, and that other vendors whose products have the same failure to validate the padding mistake in their implementations like A10 Networks and Cisco Systems need to issue their own CVE-IDs for their implementation errors because this is not a flaw in the protocol but in the implementation. The POODLE attack against TLS was found to be easier to initiate than the initial POODLE attack against SSL. There is no need to downgrade clients to SSL 3.0, meaning fewer steps are needed to execute a successful attack.

    Read more →
  • Digital transaction management

    Digital transaction management

    Digital transaction management (DTM) is a category of cloud services designed to digitally manage document-based transactions. DTM removes the friction inherent in transactions that involve people, documents, and data to create faster, easier, more convenient, and secure processes. DTM goes beyond content and document management to include e-signatures, authentication and non-repudiation; enabling co-browsing between the customer and the business; document transfer and certification; secure archiving that goes beyond records management; and a variety of meta-processes around managing electronic transactions and the documents associated with them. DTM standards are proposed and managed by the xDTM Standard Association Aragon Research has estimated that "by YE 2016, 70% of large enterprises will have a DTM initiative underway or fully implemented."

    Read more →
  • Memory-hard function

    Memory-hard function

    In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to efficiently evaluate. It differs from a memory-bound function, which incurs cost by slowing down computation through memory latency. MHFs have found use in key stretching and proof of work as their increased memory requirements significantly reduce the computational efficiency advantage of custom hardware over general-purpose hardware compared to non-MHFs. == Introduction == MHFs are designed to consume large amounts of memory on a computer in order to reduce the effectiveness of parallel computing. In order to evaluate the function using less memory, a significant time penalty is incurred. As each MHF computation requires a large amount of memory, the number of function computations that can occur simultaneously is limited by the amount of available memory. This reduces the efficiency of specialised hardware, such as application-specific integrated circuits and graphics processing units, which utilise parallelisation, in computing a MHF for a large number of inputs, such as when brute-forcing password hashes or mining cryptocurrency. == Motivation and examples == Bitcoin's proof-of-work uses repeated evaluation of the SHA-256 function, but modern general-purpose processors, such as off-the-shelf CPUs, are inefficient when computing a fixed function many times over. Specialized hardware, such as application-specific integrated circuits (ASICs) designed for Bitcoin mining, can use 30,000 times less energy per hash than x86 CPUs whilst having much greater hash rates. This led to concerns about the centralization of mining for Bitcoin and other cryptocurrencies. Because of this inequality between miners using ASICs and miners using CPUs or off-the shelf hardware, designers of later proof-of-work systems utilised hash functions for which it was difficult to construct ASICs that could evaluate the hash function significantly faster than a CPU. As memory cost is platform-independent, MHFs have found use in cryptocurrency mining, such as for Litecoin, which uses scrypt as its hash function. They are also useful in password hashing because they significantly increase the cost of trying many possible passwords against a leaked database of hashed passwords without significantly increasing the computation time for legitimate users. == Measuring memory hardness == There are various ways to measure the memory hardness of a function. One commonly seen measure is cumulative memory complexity (CMC). In a parallel model, CMC is the sum of the memory required to compute a function over every time step of the computation. Other viable measures include integrating memory usage against time and measuring memory bandwidth consumption on a memory bus. Functions requiring high memory bandwidth are sometimes referred to as "bandwidth-hard functions". == Variants == MHFs can be categorized into two different groups based on their evaluation patterns: data-dependent memory-hard functions (dMHF) and data-independent memory-hard functions (iMHF). As opposed to iMHFs, the memory access pattern of a dMHF depends on the function input, such as the password provided to a key derivation function. Examples of dMHFs are scrypt and Argon2d, while examples of iMHFs are Argon2i and catena. Many of these MHFs have been designed to be used as password hashing functions because of their memory hardness. A notable problem with dMHFs is that they are prone to side-channel attacks such as cache timing. This has resulted in a preference for using iMHFs when hashing passwords. However, iMHFs have been mathematically proven to have weaker memory hardness properties than dMHFs.

    Read more →
  • CARE Principles for Indigenous Data Governance

    CARE Principles for Indigenous Data Governance

    The CARE Principles for Indigenous Data Governance are a set of principles intended to guide open data projects in engaging Indigenous Peoples rights and interests. CARE was created in 2019 by the International Indigenous Data Sovereignty Interest Group, a group that is a part of the Research Data Alliance. It outlines collective rights related to open data in the context of the United Nations Declaration on the Rights of Indigenous Peoples and Indigenous data sovereignty. CARE is an acronym which stands for Collective Benefit, Authority to Control, Responsibility, Ethics. The CARE Principles are 'people and purpose-oriented, reflecting the crucial role of data in advancing Indigenous innovation and self-determination', and intended as a complement to the data-oriented perspective of other standards such as FAIR data (findable, accessible, interoperable, reusable). The CARE principles have been embedded into the Beta version of Standardised Data on Initiatives (STARDIT). CARE principles were the basis of a submission to the UN's Global Digital Compact.

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is a behavior that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. The submission records are stored in the submission log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is giving up tentative changes to the transaction, which is rolled back. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the fundamental core of distributed transaction management. Subsequently, the Three-phase Commit (3PC), Hypothesis Commit (PC), Hypothesis Abort (PA), and Optimistic Commit protocols gradually emerged, solving the problems of blocking and fault recovery. Today, new fields such as e-commerce payment and blockchain technology are emerging, and submission protocols play a significant role in various business areas. By effectively handling transactions, resolving faults and recovering problems, the commit protocol becomes crucial in ensuring the reliability and consistency of data management. == History == The concept of Commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises have gradually replaced the traditional paper records with computers, which has fully improved the work efficiency. The reliability and consistency of data have become a necessary requirement. Transaction management at this stage is relatively simple, limited to using a single computer for processing. It merely effectively records the changes in data to ensure that the data remains stable after the transaction is completed or terminated. In the late 1970s, as database systems moved from a single calculator operation to multiple distributed collaborations, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the famous two-phase Commit Protocol (2PC), which became an effective solution for distributed transaction management, successfully managing data synchronization problems between multiple nodes. However, this commit protocol has some potential transaction blocking problems when nodes fail. In the early 1980s, researchers discovered that although the two-step commit protocol was effective at synchronizing data, there could be long waits and even system crashes, with limitations. To improve this problem, people have begun to explore new and effective methods, including enhancing efficiency by reducing message communication during the protocol process. IBM's R database introduced the Assumed Commit and Assumed abort protocols, which contributed significantly to transaction management efficiency. These two protocols have greatly improved the processing efficiency of distributed transactions by reducing communication overhead and have become an important breakthrough in the technology of transaction commit protocols. By the early 1990s, with the increase in business demands and the complexity of transactions, enterprises required higher efficiency in distributed transaction processing. In order to adapt to the needs of different environments, the scientific community has gradually developed various variants of commit protocols to provide more flexible transaction management options for different needs. For example, the three-phase commit protocol promotes the commit of transactions more effectively and reduces the occurrence of blocking problems by adding a pre-commit protocol and a timeout mechanism. In the 21st century, with the popularization of mobile Internet and wireless technology, the commit protocol has been further developed, and researchers have begun to pay attention to how to reduce the blocking in the transaction process to solve the problem of broadband limitation, battery life and network instability in the mobile environment. The proposal of optimistic commit protocol marks the extension of commit technology from traditional database to the emerging mobile data field. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience in cases of poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, submission protocols and consensus mechanisms have gradually merged. These consensus algorithms play a role in tamper-proofing and preventing malicious attacks on node pairs in a decentralized environment. This enables commit to no longer be confined to the scope of traditional database management, but to become the core technology of trust computing and distributed ledgers, further expanding the application field of commit in the digital age. This integration has brought about extensive application impacts. Each transaction can achieve the effect of tracking global submissions through the verification of the consensus mechanism, becoming an important technical foundation for promoting the circulation of digital assets, the operation of cryptocurrencies and decentralized applications. == Commit Protocol Types == In the world of data management, a transaction is a series of database operations, such as bank transfers and order submission. In order to ensure the accuracy, consistency, and security of the data, transactions are usually completed completely, or cancelled completely, leaving no partially completed results. Commit protocol is the method used to coordinate this process. Different protocols are applicable to different submission scenarios and have their own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and broadest approach to distributed transactions, which includes both a preparation phase and a commit phase. This commit protocol is designed to allow the database coordinator to determine if all participating nodes agree. The preparation phase is the phase in which the coordination node sends a ready to commit request to all nodes participating in the transaction. The commit phase is a global commit after all participating nodes are ready, and if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the easiest to operate and widely used, its obvious drawback is that it can cause transactions to be blocked for a long time when nodes fail, resulting in a decline in system performance and making it difficult to terminate or continue immediately. === Three-Phase Commit (3PC) === The three-phase commit protocol is an improved non-blocking protocol based on 2PC, which is divided into three stages: preparation, pre-commit and commit. Firstly, each node sends a "preparation" request. After confirmation, a "pre-submission" stage is added. At this point, each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the formal commit stage, after all nodes send the "commit" request, the transaction is completed and committed. Compared with 2PC, it increases the timeout mechanism, avoids the blocking problem caused by single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly optimizes transaction reliability, but adds additional overhead for message transmission and state maintenance. It is more suitable for distributed application scenarios with high transaction sensitivity and no acceptance of long waiting times. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) is the default that the transaction will be committed successfully and rollback will be notified unless an anomaly is encountered. This commit reduces the message overhead and logging costs of a normal commits. Presumed Abort (PA) is assumed that the default state of the transaction is a rollback and will only be committed when all nodes have explicitly agreed. This commit is applicable to transactions that are not updated frequently or have a low probability of successful commit. The IBM R Distributed Database management System was the first to propose and practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the previous commit protocols are facing new challenges, especially in mobile scenarios with unstable networks. Excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing to avoid wait times. This type of commit is suitable f

    Read more →
  • Vision transformer

    Vision transformer

    A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings. ViTs were designed as alternatives to convolutional neural networks (CNNs) in computer vision applications. They have different inductive biases, training stability, and data efficiency. Compared to CNNs, ViTs are less data efficient, but have higher capacity. Some of the largest modern computer vision models are ViTs, such as one with 22B parameters. Subsequent to its publication, many variants were proposed, with hybrid architectures with both features of ViTs and CNNs. ViTs have found application in image recognition, image segmentation, weather prediction, and autonomous driving. == History == Transformers were introduced in Attention Is All You Need (2017), and have found widespread use in natural language processing. A 2019 paper applied ideas from the Transformer to computer vision. Specifically, they started with a ResNet, a standard convolutional neural network used for computer vision, and replaced all convolutional kernels by the self-attention mechanism found in a Transformer. It resulted in superior performance. However, it is not a Vision Transformer. In 2020, an encoder-only Transformer was adapted for computer vision, yielding the ViT, which reached state of the art in image classification, overcoming the previous dominance of CNN. The masked autoencoder (2022) extended ViT to work with unsupervised training. The vision transformer and the masked autoencoder, in turn, stimulated new developments in convolutional neural networks. Subsequently, there was cross-fertilization between the previous CNN approach and the ViT approach. In 2021, some important variants of the Vision Transformers were proposed. These variants are mainly intended to be more efficient, more accurate or better suited to a specific domain. Two studies improved efficiency and robustness of ViT by adding a CNN as a preprocessor. The Swin Transformer achieved state-of-the-art results on some object detection datasets such as COCO, by using convolution-like sliding windows of attention mechanism, and the pyramid process in classical computer vision. == Overview == The basic architecture, used by the original 2020 paper, is as follows. In summary, it is a BERT-like encoder-only Transformer. The input image is of type R H × W × C {\displaystyle \mathbb {R} ^{H\times W\times C}} , where H , W , C {\displaystyle H,W,C} are height, width, channel (RGB). It is then split into square-shaped patches of type R P × P × C {\displaystyle \mathbb {R} ^{P\times P\times C}} . For each patch, the patch is pushed through a linear operator, to obtain a vector ("patch embedding"). The position of the patch is also transformed into a vector by "position encoding" (the paper tried no embedding, 1D embedding, 2D embedding, and relative embedding: 1D was adopted). The two vectors are added, then pushed through several Transformer encoders. The attention mechanism in a ViT repeatedly transforms representation vectors of image patches, incorporating more and more semantic relations between image patches in an image. This is analogous to how in natural language processing, as representation vectors flow through a transformer, they incorporate more and more semantic relations between words, from syntax to semantics. The above architecture turns an image into a sequence of vector representations. To use these for downstream applications, an additional head needs to be trained to interpret them. For example, to use it for classification, one can add a shallow MLP on top of it that outputs a probability distribution over classes. The original paper uses a linear-GeLU-linear-softmax network. == Variants == === Original ViT === The original ViT was an encoder-only Transformer supervise-trained to predict the image label from the patches of the image. As in the case of BERT, it uses a special token in the input side, and the corresponding output vector is used as the only input of the final output MLP head. The special token is an architectural hack to allow the model to compress all information relevant for predicting the image label into one vector. Transformers found their initial applications in natural language processing tasks, as demonstrated by language models such as BERT and GPT-3. By contrast the typical image processing system uses a convolutional neural network (CNN). Well-known projects include Xception, ResNet, EfficientNet, DenseNet, and Inception. Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is quadratic in the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. === Architectural improvements === ==== Pooling ==== After the ViT processes an image, it produces some embedding vectors. These must be converted to a single class probability prediction by some kind of network. In the original ViT and Masked Autoencoder, they used a dummy [CLS] token, in emulation of the BERT language model. The output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multiheaded attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. The output from MAP is M u l t i h e a d e d A t t e n t i o n ( Q , V , V ) {\displaystyle \mathrm {MultiheadedAttention} (Q,V,V)} , where q {\displaystyle q} is a trainable query vector, and V {\displaystyle V} is the matrix with rows being x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} . This was first proposed in the Set Transformer architecture. Later papers demonstrated that GAP and MAP both perform better than BERT-like pooling. A variant of MAP was proposed as class attention, which applies MAP, then feedforward, then MAP again. Re-attention was proposed to allow training deep ViT. It changes the multiheaded attention module. === Masked Autoencoder === The Masked Autoencoder took inspiration from denoising autoencoders and context encoders. It has two ViTs put end-to-end. The first one ("encoder") takes in image patches with positional encoding, and outputs vectors representing each patch. The second one (called "decoder", even though it is still an encoder-only Transformer) takes in vectors with positional encoding and outputs image patches again. ==== Training ==== During training, input images (224px x 224 px in the original implementation) are split along a designated number of lines on each axis, producing image patches. A certain percentage of patches are selected to be masked out by mask tokens, while all others are retained in the image. The network is tasked with reconstructing the image from the remaining unmasked patches. Mask tokens in the original implementation are learnable vector quantities. A linear projection with positional embeddings is then applied to the vector of unmasked patches. Experiments varying mask ratio on networks trained on the ImageNet-1K dataset found 75% mask ratios achieved high performance on both finetuning and linear-probing of the encoder's latent space. The MAE processes only unmasked patches during training, increasing the efficiency of data processing in the encoder and lowering the memory usage of the transformer. A less computationally-intensive ViT is used for the decoder in the original implementation of the MAE. Masked patches are added back to the output of the encoder block as mask tokens and both are fed into the decoder. A reconstruction loss is computed for the masked patches to assess network performance. ==== Prediction ==== In prediction, the decoder architecture is discarded entirely. The input image is split into patches by the same algorithm as in training, but no patches are masked out. A linear projection wi

    Read more →
  • Airborne Networking

    Airborne Networking

    An Airborne Network (AN) is the infrastructure owned by the United States Air Force that provides communication transport services through at least one node that is on a platform capable of flight. == Background == === Definition === The intent of the US Air Force's Airborne Network is to expand the Global Information Grid (GIG) to connect the three major domains of warfare: Air, Space, and Terrestrial. The Transformational Satellite Communications System network currently provides connectivity for all communication through space assets. The Combat Information Transport System and Theater Deployable Communications provide terrestrial connectivity for theatre based operations. The Airborne Network is engineered to utilize all airborne assets to connect with space and surface networks building a seamless communications platform across all domains. === Capabilities === The capabilities identified by this type of system are vastly beyond that of our current military. This system will enable the Air Force to provide a transportable network, flexible enough to communicate with any air, space, or ground asset in the area. The network will provide a beyond line-of-sight (LoS) communications infrastructure that can be packed up and moved in and out of the designated battlespace, enabling the military to have a reliable and secure communications network that extends globally. The network is designed to be flexible enough to provide the right communication and network packages for a specific region, mission, or technology. Operationally, The AN is designed to be self-forming, self-organizing, and self-generating, with nodes joining and leaving the network as they enter and exit a specific region. The network consists of dedicated tactical links, wideband air-to-air links, and ad hoc networks constructed by the Joint Tactical Radio System (JTRS) networking services. JTRS is a software-defined radio that will work with many existing military and civilian radios. It includes integrated encryption and Wideband Networking Software to create mobile ad hoc networks. It also provides system performance analysis and fault diagnostics automatically, reducing the demand for human intervention and network maintenance. === Intended Use === The AN was designed as the cornerstone for the new military doctrine known as Network Centric Warfare. This doctrine was developed to use information superiority to equip warfighters with more precise information enabling commanders and shooters to make smarter decisions faster. The AN contributes to Network Centric Warfare by enabling commanders to provide real-time information to warfighters in the air and on the ground. Warfighters can then utilize more information and make more educated decisions about how to act in a particular situation. Once the act has been carried out commanders will have immediate information about the result and can make judgments on how to continue. All-in-all the AN was designed to reduce the time necessary to identify a target, make clear and educated decisions to pull or not to pull the trigger, and assess battle == Topologies == There are four main network topologies that will be deployed and vary based on the placement of backbone and subnet class networks. === Space, Air, Ground Tether === Establishing a direct connection to another aircraft or ground node, via a point-to-point link for nodes within LOS or via a Satellite Communications (SATCOM) link for nodes that are beyond line-of-sight is known as tethering. SATCOM links provide connectivity to a network ground entry point. Strike aircraft that accompany C2 aircraft such as an AWACS are tethered via point-to-point links. Finally, C2 or intelligence, surveillance, and reconnaissnce (ISR) aircraft may connect via a LOS link directly to a network ground entry point. Each of these tethered alternatives works exactly like a hub or switch that has an entry point to a larger network and allows their connected users access to that network. === Flat Ad Hoc === A flat ad hoc topology refers to establishing nonpersistent network connections as needed among AN nodes that are present at a given time. With this network the nodes dynamically “discover” other nodes to which they can interconnect and form the network. The specific interconnections between the nodes are not planned in advance, but are made as opportunities arise. The nodes join and leave the network at will, continually changing connections to neighbor nodes based upon their location and mobility characteristics. === Tiered Ad Hoc === Ad hoc networks can be flat in the sense that all nodes are peers of each other in a single network, as discussed above, or they can dynamically organize themselves into hierarchical tiers such that higher tiers are used to move data between more localized subnets. This network topology can be compared to any conventional deployed network that utilizes routers, switches, and hubs to temporarily connect users. === Persistent Backbone === A network topology characterized by a persistent backbone is established using relatively persistent wideband connections among high-value platforms flying relatively stable orbits. It provides the connectivity between the tactical subnets which are considered edge networks relative to the backbone. This provides concentration points for connectivity to the space backbone as well as to terrestrial networks. This type of network topology is comparable to a conventional permanent network with established data trunks, routers, switches, and hubs to connect users. == Architecture == === Network Management === The platform management system enables operators to manage all on-board network elements. It interfaces and interoperates with the Airborne Network management system to enable operators to manage remote network elements in the airborne network. The network management system monitors the health of the network by passively testing the network for faults and latency. The system will also actively troubleshoot faults with probes to identify and isolate faulty connections, and enables operators to apply network parameters and security changes to all systems based on the status of the network. === Routing/Switching === Routing and switching enables data to be dynamically transmitted over the network to other nodes. Routing protocols must be able to identify nodes transmitted within their own platform and data to be sent to other platforms regardless of the current topology. The routing protocol must also provide seamless roaming by ensuring that no routed packets are lost when a node changes its point of attachment to the network. Maintaining scalability is important in routing as the network is constantly changing. The network must be able to function with numerous levels of platforms, varying numbers of fast moving platforms, and varying amounts of traffic per platform. Routers and switches will use metrics to determine the best paths to take when routing data. The routing protocol utilized for the AN will be an Adaptive Quality of Service routing protocol. === Gateways/Proxies === Gateways and proxies enable the connection numerous technology types regardless of age to communicate across the IP-based network. Gateways and proxies are essential in the operation of this network because so many different technologies are used to communicate in each domain. These systems will facilitate the transition of the legacy on-board infrastructure, transmission systems, tactical data link systems, and user applications to the objective airborne network systems. Therefore, they are only temporary until all platforms use a standardized IP radio for transmission. === Performance Enhancing Proxies === Performance Enhancing Proxies improve the performance of user applications running across the Airborne Network by countering wireless network impairments, such as limited bandwidth, long delays, high loss rates, and disruptions in network connections. Proxy systems are implemented between the user application and the network and can be used to improve performance at the application and transport functional layers of the OSI model. Some techniques that can be employed include: Compression: Data compression or header compression can be used to minimize the number of bits sent over the network. Data bundling: Smaller data packets can be combined (bundled) into a single large packet for transmission over the network. Caching: A local cache can be used to save and provide data objects that are requested multiple times, reducing transmissions over the network (and improving response times). Store and forward: Message queuing can be used to ensure message delivery to users who become disconnected from the network or are unable to connect to the network for a period of time. Once the platform connects, the stored messages are sent. Pipelining: Rather than opening several separate network connections pipelining can be used to share a single networ

    Read more →
  • Factorization of polynomials over finite fields

    Factorization of polynomials over finite fields

    In mathematics and computer algebra the factorization of a polynomial consists of decomposing it into a product of irreducible factors. This decomposition is theoretically possible and is unique for polynomials with coefficients in any field, but rather strong restrictions on the field of the coefficients are needed to allow the computation of the factorization by means of an algorithm. In practice, algorithms have been designed only for polynomials with coefficients in a finite field, in the field of rationals or in a finitely generated field extension of one of them. All factorization algorithms, including the case of multivariate polynomials over the rational numbers, reduce the problem to this case; see polynomial factorization. It is also used for various applications of finite fields, such as coding theory (cyclic redundancy codes and BCH codes), cryptography (public key cryptography by the means of elliptic curves), and computational number theory. As the reduction of the factorization of multivariate polynomials to that of univariate polynomials does not have any specificity in the case of coefficients in a finite field, only polynomials with one variable are considered in this article. == Background == === Finite field === The theory of finite fields, whose origins can be traced back to the works of Gauss and Galois, has played a part in various branches of mathematics. Due to the applicability of the concept in other topics of mathematics and sciences like computer science there has been a resurgence of interest in finite fields and this is partly due to important applications in coding theory and cryptography. Applications of finite fields introduce some of these developments in cryptography, computer algebra and coding theory. A finite field or Galois field is a field with a finite order (number of elements). The order of a finite field is always a prime or a power of prime. For each prime power q = pr, there exists exactly one finite field with q elements, up to isomorphism. This field is denoted GF(q) or Fq. If p is prime, GF(p) is the prime field of order p; it is the field of residue classes modulo p, and its p elements are denoted 0, 1, ..., p−1. Thus a = b in GF(p) means the same as a ≡ b (mod p). === Irreducible polynomials === Let F be a finite field. As for general fields, a non-constant polynomial f in F[x] is said to be irreducible over F if it is not the product of two polynomials of positive degree. A polynomial of positive degree that is not irreducible over F is called reducible over F. Irreducible polynomials allow us to construct the finite fields of non-prime order. In fact, for a prime power q, let Fq be the finite field with q elements, unique up to isomorphism. A polynomial f of degree n greater than one, which is irreducible over Fq, defines a field extension of degree n which is isomorphic to the field with qn elements: the elements of this extension are the polynomials of degree lower than n; addition, subtraction and multiplication by an element of Fq are those of the polynomials; the product of two elements is the remainder of the division by f of their product as polynomials; the inverse of an element may be computed by the extended GCD algorithm (see Arithmetic of algebraic extensions). It follows that, to compute in a finite field of non prime order, one needs to generate an irreducible polynomial. For this, the common method is to take a polynomial at random and test it for irreducibility. For sake of efficiency of the multiplication in the field, it is usual to search for polynomials of the shape xn + ax + b. Irreducible polynomials over finite fields are also useful for pseudorandom number generators using feedback shift registers and discrete logarithm over F2n. The number of irreducible monic polynomials of degree n over Fq is the number of aperiodic necklaces, given by Moreau's necklace-counting function Mq(n). The closely related necklace function Nq(n) counts monic polynomials of degree n which are primary (a power of an irreducible); or alternatively irreducible polynomials of all degrees d which divide n. === Example === The polynomial P = x4 + 1 is irreducible over Q but not over any finite field. On any field extension of F2, P = (x + 1)4. On every other finite field, at least one of −1, 2 and −2 is a square, because the product of two non-squares is a square and so we have If − 1 = a 2 , {\displaystyle -1=a^{2},} then P = ( x 2 + a ) ( x 2 − a ) . {\displaystyle P=(x^{2}+a)(x^{2}-a).} If 2 = b 2 , {\displaystyle 2=b^{2},} then P = ( x 2 + b x + 1 ) ( x 2 − b x + 1 ) . {\displaystyle P=(x^{2}+bx+1)(x^{2}-bx+1).} If − 2 = c 2 , {\displaystyle -2=c^{2},} then P = ( x 2 + c x − 1 ) ( x 2 − c x − 1 ) . {\displaystyle P=(x^{2}+cx-1)(x^{2}-cx-1).} === Complexity === Polynomial factoring algorithms use basic polynomial operations such as products, divisions, gcd, powers of one polynomial modulo another, etc. A multiplication of two polynomials of degree at most n can be done in O(n2) operations in Fq using "classical" arithmetic, or in O(nlog(n) log(log(n)) ) operations in Fq using "fast" arithmetic. A Euclidean division (division with remainder) can be performed within the same time bounds. The cost of a polynomial greatest common divisor between two polynomials of degree at most n can be taken as O(n2) operations in Fq using classical methods, or as O(nlog2(n) log(log(n)) ) operations in Fq using fast methods. For polynomials h, g of degree at most n, the exponentiation hq mod g can be done with O(log(q)) polynomial products, using exponentiation by squaring method, that is O(n2log(q)) operations in Fq using classical methods, or O(nlog(q)log(n) log(log(n))) operations in Fq using fast methods. In the algorithms that follow, the complexities are expressed in terms of number of arithmetic operations in Fq, using classical algorithms for the arithmetic of polynomials. == Factoring algorithms == Many algorithms for factoring polynomials over finite fields include the following three stages: Square-free factorization Distinct-degree factorization Equal-degree factorization An important exception is Berlekamp's algorithm, which combines stages 2 and 3. === Berlekamp's algorithm === Berlekamp's algorithm is historically important as being the first factorization algorithm which works well in practice. However, it contains a loop on the elements of the ground field, which implies that it is practicable only over small finite fields. For a fixed ground field, its time complexity is polynomial, but, for general ground fields, the complexity is exponential in the size of the ground field. === Square-free factorization === The algorithm determines a square-free factorization for polynomials whose coefficients come from the finite field Fq of order q = pm with p a prime. This algorithm firstly determines the derivative and then computes the gcd of the polynomial and its derivative. If it is not one then the gcd is again divided into the original polynomial, provided that the derivative is not zero (a case that exists for non-constant polynomials defined over finite fields). This algorithm uses the fact that, if the derivative of a polynomial is zero, then it is a polynomial in xp, which is, if the coefficients belong to Fp, the pth power of the polynomial obtained by substituting x by x1/p. If the coefficients do not belong to Fp, the pth root of a polynomial with zero derivative is obtained by the same substitution on x, completed by applying the inverse of the Frobenius automorphism to the coefficients. This algorithm works also over a field of characteristic zero, with the only difference that it never enters in the blocks of instructions where pth roots are computed. However, in this case, Yun's algorithm is much more efficient because it computes the greatest common divisors of polynomials of lower degrees. A consequence is that, when factoring a polynomial over the integers, the algorithm which follows is not used: one first computes the square-free factorization over the integers, and to factor the resulting polynomials, one chooses a p such that they remain square-free modulo p. Algorithm: SFF (Square-Free Factorization) Input: A monic polynomial f in Fq[x] where q = pm Output: Square-free factorization of f R ← 1 # Make w be the product (without multiplicity) of all factors of f that have # multiplicity not divisible by p c ← gcd(f, f′) w ← f/c # Step 1: Identify all factors in w i ← 1 while w ≠ 1 do y ← gcd(w, c) fac ← w / y R ← R · faci w ← y; c ← c / y; i ← i + 1 end while # c is now the product (with multiplicity) of the remaining factors of f # Step 2: Identify all remaining factors using recursion # Note that these are the factors of f that have multiplicity divisible by p if c ≠ 1 then c ← c1/p R ← R·SFF(c)p end if Output(R) The idea is to identify the product of all irreducible factors of f with the same multiplicity. This is done in two steps. The first step uses the formal d

    Read more →
  • Private message

    Private message

    In computer networking, a private message (PM), or direct message (DM), refers to a private communication, often text-based, sent or received by a user of a private communication channel on any given platform. Unlike public posts, PMs are only viewable by the participants. Long a function present on IRCs and Internet forums, private channels for PMs have also been prevalent features on instant messaging (IM) and on social media networks. It may be either synchronous (e.g. on an IM) or asynchronous (e.g. on an Internet forum). The term private message (PM) originated as a feature on internet forums, while the term direct message (DM) originated as a feature on Twitter. Due to the popularity of the latter service, DM has since been appropriated by other platforms, such as Instagram, and is often genericized in popular usage. == Overview == There are two main types of private messages, and one obscure type: One type includes those found on IRCs and Internet forums, as well as on social media services like Twitter, Facebook, and Instagram, where the focus is public posting, PMs allow users to communicate privately without leaving the platform. The second type are those relayed through instant messaging platforms such as WhatsApp and Snapchat, where users join the networks primarily to exchange PMs. A third type, peer-to-peer messaging, occurs when users create and own the infrastructure used to transmit and store the messages; while features vary depending on application, they give the user full control over the data they transmit. An example of software that enables this kind of messaging is Classified-ads. Besides serving as a tool to connect privately with friends and family, PMs have gained momentum in the workplace. Working professionals use PMs to reach coworkers in other spaces and increase efficiency during meetings. Although useful, using PMs in the workplace may blur the boundary between work and private lives. Some common forms of private messaging today include Facebook messaging (sometimes referred to as "inboxing"), Twitter direct messaging, and Instagram direct messaging. These forms of private messaging provide a private space on a usually public site. For instance, most activity on Twitter is public, but Twitter DMs provide a private space for communication between two users. This differs from mediums like email, texting, and Snapchat, where most or all activity is always private. Modern forms of private messaging may include multimedia messages, such as pictures or videos. == History == Email was first developed to send messages between different computers on ARPANET in 1971. Access to ARPANET was primarily limited to universities and other research institutions. Starting in 1983 or 1984, FidoNet allowed home computer users to send and receive email via bulletin board systems. Information services such as CompuServe, America Online, and Prodigy also helped to popularizes online messaging. The advent of the public World Wide Web in 1993 increased access to email via internet service providers, and later via webmail. Instant messaging systems became popular in the mid 1990s, as Internet access improved and personal computers became more common. The introduction of Skype in 2003 popularized Internet-based voice and video messaging. Direct messaging is now a feature of all major social networking services. == Privacy concerns == In January 2014, Matthew Campbell and Michael Hurley filed a class-action lawsuit against Facebook for breaching the Electronic Communications Privacy Act. They alleged that private messages which contained URLs were being read and used to generate profit, through data mining and user profiling, and that it was misleading for Facebook to refer to the functionality as "private" with the implication that the communication was "free from surveillance". In 2012, some Facebook users misinterpreted a redesign of the Facebook wall as publicly sharing private messages from 2008–2009. These were found to be public wall posts from those years, made at a time when it was not possible to like or comment on a wall post, making the notes look like private messages.

    Read more →
  • AZFinText

    AZFinText

    Arizona Financial Text System (AZFinText) is a textual-based quantitative financial prediction system written by Robert P. Schumaker of University of Texas at Tyler and Hsinchun Chen of the University of Arizona. == System == This system differs from other systems in that it uses financial text as one of its key means of predicting stock price movement. This reduces the information lag-time problem evident in many similar systems where new information must be transcribed (e.g., such as losing a costly court battle or having a product recall), before the quant can react appropriately. AZFinText overcomes these limitations by utilizing the terms used in financial news articles to predict future stock prices twenty minutes after the news article has been released. It is believed that certain article terms can move stocks more than others. Terms such as factory exploded or workers strike will have a depressing effect on stock prices whereas terms such as earnings rose will tend to increase stock prices. The AZFinText system analyzes financial news to identify the patterns in how investors react to such specific information. It uses methods like sentiment analysis and term weighting to examine the text of news articles. This system is designed to find price differences that occur when the market responds to news stories. This approach provides an alternative and easier method for predicting stock market movements. == Overview of research == The foundation of AZFinText can be found in the ACM TOIS article. Within this paper, the authors tested several different prediction models and linguistic textual representations. From this work, it was found that using the article terms and the price of the stock at the time the article was released was the most effective model and using proper nouns was the most effective textual representation technique. Combining the two, AZFinText netted a 2.84% trading return over the five-week study period. AZFinText was then extended to study what combination of peer organizations help to best train the system. Using the premise that IBM has more in common with Microsoft than GM, AZFinText studied the effect of varying peer-based training sets. To do this, AZFinText trained on the various levels of GICS and evaluated the results. It was found that sector-based training was most effective, netting an 8.50% trading return, outperforming Jim Cramer, Jim Jubak and DayTraders.com during the study period. AZFinText was also compared against the top 10 quantitative systems and outperformed 6 of them. A third study investigated the role of portfolio building in a textual financial prediction system. From this study, Momentum and Contrarian stock portfolios were created and tested. Using the premise that past winning stocks will continue to win and past losing stocks will continue to lose, AZFinText netted a 20.79% return during the study period. It was also noted that traders were generally overreacting to news events, creating the opportunity of abnormal returns. A fourth study looked into using author sentiment as an added predictive variable. Using the premise that an author can unwittingly influence market trades simply by the terms they use, AZFinText was tested using tone and polarity features. It was found that Contrarian activity was occurring within the market, where articles of a positive tone would decrease in price and articles of a negative tone would increase in price. A further study investigated what article verbs have the most influence on stock price movement. From this work, it was found that planted, announcing, front, smaller and crude had the highest positive impact on stock price. == Notable publicity == AZFinText has been the topic of discussion by numerous media outlets. Some of the more notable ones include The Wall Street Journal, MIT's Technology Review, Dow Jones Newswire, WBIR in Knoxville, TN, Slashdot and other media outlets.

    Read more →
  • Upworthy

    Upworthy

    Upworthy is a media brand that focuses on positive storytelling. It was started in March 2012 by Eli Pariser, the former executive director of MoveOn, and Peter Koechley, the former managing editor of The Onion. One of Facebook's co-founders, Chris Hughes, was an early investor. At its peak between 2012 and 2014, it reached up to 100 million people a month. In 2017, the company was acquired by Good Worldwide. == History == Upworthy was launched in 2012 with a focus on aggregating positive content, which aligned with Facebook's algorithm. Originally, Upworthy curators searched the internet for existing content to feature on the site. Once selected as an option, curators brainstormed different headlines and shareable images for the content, and tested it with a small sample of Upworthy's visitors before sharing it on the site. The site popularized a clickbait style of two-phrase headlines. The company simplifies issues that are controversial by nature, which are presented from a politically liberal point of view and are heavily fact-checked for accuracy. In June 2013, an article in Fast Company called Upworthy "the fastest growing media site of all time". It had 8.7 million unique monthly visitors in the first six months, and in November 2013, had a high of 87 million unique visitors in a single month. In 2013, Facebook changed its algorithm, leading to a significant decline in readers from that platform. Upworthy fired one round of writers in 2015, and another in 2016, after an unionization effort by some of the staff. The union involved, the Writers Guild of America, East, has organized several online "viral" news publishers. In January 2017, Upworthy was acquired by media company GOOD Worldwide. The newsrooms of the two organizations would merge as part of the acquisition. About 20 staffers were laid off as part of the merger. In March 2020, Upworthy saw a 65% increase in Instagram followers and a 47% increased interest in positive content on-site page views as a result of increased interest in positive content during the COVID-19 pandemic. In January 2023, National Geographic Books bought Good People: Stories From the Best of Humanity from Upworthy, with a publication date of September 3, 2024. The book is described as "a heartwarming collection of first-person tales that will provide comfort and inspiration to anyone who could use a little dose of joy right now". It was created by two senior Upworthy team members, Gabriel Reilich and Lucia Knell, and features 101 stories from Upworthy's audience. The co-creators encouraged Upworthy followers to connect with the brand through questions on their posts, opening the door for organic and personal stories to be shared in the comment sections. The book debuted on The New York Times nonfiction bestseller list on September 22, 2024, and remained on the list for two weeks. The book is seen in the top 10 on Publishers Weekly Fall 2024 Adult Preview: Lifestyle and on The Washington Post's "5 feel-good books".

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →