AI Grammar Detection

AI Grammar Detection — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Energy-based model

    Energy-based model

    An energy-based model (EBM), also called Canonical Ensemble Learning (CEL) or Learning via Canonical Ensemble (LCE), is an application of canonical ensemble formulation from statistical physics for learning from data. The approach prominently appears in generative artificial intelligence. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models. An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution. Energy-based generative neural networks is a class of generative models, which aim to learn explicit probability distributions of data in the form of energy-based models, the energy functions of which are parameterized by modern deep neural networks. Boltzmann machines are a special form of energy-based models with a specific parametrization of the energy. == Description == For a given input x {\displaystyle x} , the model describes an energy E θ ( x ) {\displaystyle E_{\theta }(x)} such that the Boltzmann distribution P θ ( x ) = e − β E θ ( x ) Z ( θ ) {\displaystyle P_{\theta }(x)={e^{-\beta E_{\theta }(x)} \over Z(\theta )}} is a probability (density), and typically β = 1 {\displaystyle \beta =1} . Since the normalization constant: Z ( θ ) := ∫ x ∈ X e − β E θ ( x ) d x {\displaystyle Z(\theta ):=\int _{x\in X}e^{-\beta E_{\theta }(x)}dx} (also known as the partition function) depends on all the Boltzmann factors of all possible inputs x {\displaystyle x} , it cannot be easily computed or reliably estimated during training simply using standard maximum likelihood estimation. However, for maximizing the likelihood during training, the gradient of the log-likelihood of a single training example x {\displaystyle x} is given by using the chain rule: ∂ θ log ⁡ ( P θ ( x ) ) = E x ′ ∼ P θ [ ∂ θ E θ ( x ′ ) ] − ∂ θ E θ ( x ) ( ∗ ) {\displaystyle \partial _{\theta }\log \left(P_{\theta }(x)\right)=\mathbb {E} _{x'\sim P_{\theta }}[\partial _{\theta }E_{\theta }(x')]-\partial _{\theta }E_{\theta }(x)\,()} The expectation in the above formula for the gradient can be approximately estimated by drawing samples x ′ {\displaystyle x'} from the distribution P θ {\displaystyle P_{\theta }} using Markov chain Monte Carlo (MCMC). Early energy-based models, such as the 2003 Boltzmann machine by Hinton, estimated this expectation via blocked Gibbs sampling. Newer approaches make use of more efficient Stochastic Gradient Langevin Dynamics (LD), drawing samples using: x 0 ′ ∼ P 0 , x i + 1 ′ = x i ′ − α 2 ∂ E θ ( x i ′ ) ∂ x i ′ + ϵ {\displaystyle x_{0}'\sim P_{0},x_{i+1}'=x_{i}'-{\frac {\alpha }{2}}{\frac {\partial E_{\theta }(x_{i}')}{\partial x_{i}'}}+\epsilon } , where ϵ ∼ N ( 0 , α ) {\displaystyle \epsilon \sim {\mathcal {N}}(0,\alpha )} . A replay buffer of past values x i ′ {\displaystyle x_{i}'} is used with LD to initialize the optimization module. The parameters θ {\displaystyle \theta } of the neural network are therefore trained in a generative manner via MCMC-based maximum likelihood estimation: the learning process follows an "analysis by synthesis" scheme, where within each learning iteration, the algorithm samples the synthesized examples from the current model by a gradient-based MCMC method (e.g., Langevin dynamics or Hybrid Monte Carlo), and then updates the parameters θ {\displaystyle \theta } based on the difference between the training examples and the synthesized ones – see equation ( ∗ ) {\displaystyle ()} . This process can be interpreted as an alternating mode seeking and mode shifting process, and also has an adversarial interpretation. Essentially, the model learns a function E θ {\displaystyle E_{\theta }} that associates low energies to correct values, and higher energies to incorrect values. After training, given a converged energy model E θ {\displaystyle E_{\theta }} , the Metropolis–Hastings algorithm can be used to draw new samples. The acceptance probability is given by: P a c c ( x i → x ∗ ) = min ( 1 , P θ ( x ∗ ) P θ ( x i ) ) . {\displaystyle P_{acc}(x_{i}\to x^{})=\min \left(1,{\frac {P_{\theta }(x^{})}{P_{\theta }(x_{i})}}\right).} == History == The term "energy-based models" was first coined in a 2003 JMLR paper where the authors defined a generalisation of independent components analysis to the overcomplete setting using EBMs. Other early work on EBMs proposed models that represented energy as a composition of latent and observable variables. == Characteristics == EBMs demonstrate useful properties: Simplicity and stability. The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance. Adaptive computation time. An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples. Flexibility. In Variational Autoencoders (VAE) and flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes). Adaptive generation. EBM generators are implicitly defined by the probability distribution, and automatically adapt as the distribution changes (without training), allowing EBMs to address domains where generator training is impractical, as well as minimizing mode collapse and avoiding spurious modes from out-of-distribution samples. Compositionality. Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques. == Experimental results == On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM model generated high-quality images relatively quickly. It supported combining features learned from one type of image for generating other types of images. It was able to generalize using out-of-distribution datasets, outperforming flow-based and autoregressive models. EBM was relatively resistant to adversarial perturbations, behaving better than models explicitly trained against them with training for classification. == Applications == Target applications include natural language processing, robotics and computer vision. The first energy-based generative neural network is the generative ConvNet proposed in 2016 for image patterns, where the neural network is a convolutional neural network. The model has been generalized to various domains to learn distributions of videos, and 3D voxels. They are made more effective in their variants. They have proven useful for data generation (e.g., image synthesis, video synthesis, 3D shape synthesis, etc.), data recovery (e.g., recovering videos with missing pixels or image frames, 3D super-resolution, etc), data reconstruction (e.g., image reconstruction and linear interpolation ). == Alternatives == EBMs compete with techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs) or normalizing flows. == Extensions == === Joint energy-based models === Joint energy-based models (JEM), proposed in 2020 by Grathwohl et al., allow any classifier with softmax output to be interpreted as energy-based model. The key observation is that such a classifier is trained to predict the conditional probability p θ ( y | x ) = e f → θ ( x ) [ y ] ∑ j = 1 K e f → θ ( x ) [ j ] for y = 1 , … , K and f → θ = ( f 1 , … , f K ) ∈ R K , {\displaystyle p_{\theta }(y|x)={\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{\sum _{j=1}^{K}e^{{\vec {f}}_{\theta }(x)[j]}}}\ \ {\text{ for }}y=1,\dotsc ,K{\text{ and }}{\vec {f}}_{\theta }=(f_{1},\dotsc ,f_{K})\in \mathbb {R} ^{K},} where f → θ ( x ) [ y ] {\displaystyle {\vec {f}}_{\theta }(x)[y]} is the y-th index of the logits f → {\displaystyle {\vec {f}}} corresponding to class y. Without any change to the logits it was proposed to reinterpret the logits to describe a joint probability density: p θ ( y , x ) = e f → θ ( x ) [ y ] Z ( θ ) , {\displaystyle p_{\theta }(y,x)={\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}},} with unknown partition function Z ( θ ) {\displaystyle Z(\theta )} and energy E θ ( x , y ) = − f θ ( x ) [ y ] {\displaystyle E_{\theta }(x,y)=-f_{\theta }(x)[y]} . By marginalization, we obtain the unnormalized density p θ ( x ) = ∑ y p θ ( y , x ) = ∑ y e f → θ ( x ) [ y ] Z ( θ ) =: e − E θ ( x ) , {\displaystyle p_{\theta }(x)=\sum _{y}p_{\theta }(y,x)=\sum _{y}{\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}}=:e^{-E_{\theta }(x)},} therefore, E θ ( x ) = − log ⁡ ( ∑ y e f → θ ( x ) [ y ] Z ( θ ) ) , {\displaystyle E_{\theta }(x)=-\log \left(\sum _{y}{\frac {e^{{\vec {f}}_{\theta }(x)[y]}}{Z(\theta )}}\right),} so that any classifier can be used to define an energy function E θ ( x ) {\displaystyle E_{\theta }(x)} .

    Read more →
  • IBM 37xx

    IBM 37xx

    IBM 37xx (or 37x5) is a family of IBM Systems Network Architecture (SNA) programmable front-end processors used mainly in mainframe environments. All members of the family ran one of three IBM-supplied programs. Emulation Program (EP) mimicked the operation of the older IBM 270x non-programmable controllers. Network Control Program (NCP) supported Systems Network Architecture devices. Partitioned Emulation Program (PEP) combined the functions of the two. == Models == === 370x series === 3705 — the oldest of the family, introduced in 1972 to replace the non-programmable IBM 270x family. The 3705 could control up to 352 communications lines. 3704 was a smaller version, introduced in 1973. It supported up to 32 lines. === 371x === The 3710 communications controller was introduced in 1984. === 372x series === The 3725 and the 3720 systems were announced in 1983. The 3725 replaced the hardware line scanners used on previous 370x machines with multiple microcoded processors. The 3725 was a large-scale node and front end processor. The 3720 was a smaller version of the 3725, which was sometimes used as a remote concentrator. The 3726 was an expansion unit for the 3725. With the expansion unit, the 3725 could support up to 256 lines at data rates up to 256 kbit/s, and connect to up to eight mainframe channels. Marketing of the 372x machines was discontinued in 1989. IBM discontinued support for the 3705, 3720, 3725 in 1999. === 374x series === The 3745, announced in 1988, provides up to eight T1 circuits. At the time of the announcement, IBM was estimated to have nearly 85% of the over US$825 million market for communications controllers over rivals such as NCR Comten and Amdahl Corporation. The 3745 is no longer marketed, but still supported and used. The 3746 "Nways Controller" model 900, unveiled in 1992, was an expansion unit for the 3745 supporting additional Token Ring and ESCON connections. A stand-alone model 950 appeared in 1995. == Successors == IBM no longer manufactures 37xx processors. The last models, the 3745/46, were withdrawn from marketing in 2002. Replacement software products are Communications Controller for Linux on System z and Enterprise Extender. == Clones == Several companies produced clones of 37xx controllers, including NCR COMTEN and Amdahl Corporation.

    Read more →
  • NRENum.net

    NRENum.net

    The NRENum.net service is an end-user ENUM service run by TERENA and the participating national research and education networking organisations (NRENs), primarily for academia. NRENum.net is considered as a complementary service and a valid alternative to the Golden ENUM tree. The domain nrenum.net is being populated in order to provide the infrastructure in DNS for storage of E.164 numbers. The NRENum.net service includes the operation of the Tier-0 root Domain Name Server(s) and the delegation of county codes to NRENum.net Registries. NRENum.net is a registered community trademark of TERENA. == Service description == E.164 Telephone Number Mapping (ENUM) is a standard protocol that is the result of work of the Internet Engineering Task Force's Telephone Number Mapping working group. ENUM translates a telephone number into a domain name. This allows users to continue to use the existing phone number formats they are familiar with, while allowing the call to be routed using DNS. This makes ENUM a quick, stable and cheap link between telecommunications systems and the Internet. RFC 3761 discusses the use of the Domain Name System for storage of E.164 numbers. More specifically, how DNS can be used for identifying available services connected to one E.164 number. The RIPE NCC provides DNS operations for e164.arpa (known as Golden ENUM tree) in accordance with the instructions from the Internet Architecture Board. The NRENum.net service is an end-user ENUM service run by TERENA and the participating NRENs primarily for academia. NRENum.net is considered as a complementary service and a valid alternative to the Golden ENUM tree. The domain nrenum.net is being populated in order to provide the infrastructure in DNS for storage of E.164 numbers. The NRENum.net service includes the operation of the Tier-0 root Domain Name Servers and the delegation of county codes to NRENum.net Registries. NRENum.net is a registered community trademark of TERENA. NRENum.net facilitates services such as Voice over IP and videoconferencing. NRENum.net tree refers to the tree structure where: Tier-0 root Domain Name Servers (technically one master and several secondary servers ensuring resilience) are run by the hosting organisations and coordinated by the NRENum.net Operations Team. Tier-1 Domain Name Servers are run by the NRENum.net (national or regional) Registries responsible for the country code(s) delegated. Tier-2 and lower DNS sub-delegations may be implemented, regulated by the national service policies. An NRENum.net Registry is an entity that is authorised by the NRENum.net Operations Team to operate the national or regional Tier-1 Domain Name Server and be responsible for the county code(s) delegated. In many countries there is a National Research and Education Networking organisation (NREN) that acts as the Registry of the country. An NRENum.net Registrar is responsible for the number/block registration in the Tier-1 DNS and a Number Validation Entity is responsible for the validation of the E.164 telephone numbers to be registered. The NREN may at the same time have the role of the NRENum.net Registry, Registrar and Validation Entity for the country code(s) delegated. A Registrant (end user) is an E.164 telephone number holder. Holders of E.164 numbers who want to be listed in the service must contact the appropriate NRENum.net Registrar. Number (block) delegation is the technical process of assigning country codes to national registries, or number blocks under country codes to end users. Number (block) registration is the technical process of configuring DNS and populating it with the appropriate ENUM records (i.e., adding NAPTR records to DNS) via registrars. The ITU-T strictly regulates the number structure of valid E.164 telephone numbers and assigns number blocks to national authorities (telecom regulators) or recently to global entities directly. The national authorities can further delegate the number ranges to local operators within the country or region. A virtual number has either a non-valid E.164 number structure (e.g., longer than 15 digits) or has a valid structure but is not assigned to any national authorities or operators. The number Validation Entity is responsible for checking the numbers to be registered to NRENum.net. == History == The idea for the NRENum.net service was conceived in 2006. NRENum.net became operational in August 2006, and was run by Bernie Höneisen, a staff member of SWITCH, and Kewin Stöckigt, a staff member of AARNet, as a private service, with technical support from SWITCH and the participants in the TERENA Task Force on Enhanced Communication Services (TF-ECS). When that task force completed its activities in 2008, TERENA agreed to take over the coordination of the NRENum.net service. By that time, nine NRENs had joined NRENum.net. The service continued to grow during the next years, and in March 2012 NRENum.net went global when RNP from Brazil joined the service as its 14th partificpant and the first outside Europe. In 2011, the participants decided to migrate the operation of the service's master Domain Name Server to NIIF and the operation of the two secondary DNSs to CARNET and SWITCH. In 2013, Internet2, AARNet and NORDUnet set up additional secondary Domain Name Servers for their regions, thereby completing the global distribution of DNS slaves and bringing the resilience of the NRENum.net infrastructure to a high level. == Governance == TERENA has established a lightweight global governance structure. The Global NRENum.net Governance Committee (GNGC) is the highest-level strategic body responsible for overall NRENum.net service definition, sustainability and long-term strategy. This includes formulating and recommending service governance principles and policies. Its members are nominated by the NRENum.net Registries in the various world regions, and are appointed by TERENA. The GNGC is composed of two members representing Europe, two representing the Asia-Pacific region, and two representing the Americas. The NRENum.net Operations Team is responsible for the day-to-day operations of the Tier-0 root DNSs and the handling of country code delegation requests. It may escalate technical or policy issues to the GNGC for discussion. TERENA is responsible for ensuring the correct and secure operations of the NRENum.net service performed by the NRENum.net Operations Team and governance by the GNGC. TERENA also supports the development of technical improvements to the NRENum.net service and promotes the deployment of NRENum.net worldwide. == Geographical deployment == Thirty-two county codes are delegated in the NRENum.net service. Below these are listed per world region. === Europe === === Asia-Pacific === === North America === +1 United States (Internet2) === Latin America === === Caribbean === === Africa === +262 Réunion, Mayotte (RENATER)

    Read more →
  • Data philanthropy

    Data philanthropy

    Data philanthropy refers to the practice of private companies donating corporate data. This data is usually donated to nonprofits or donation-run organizations that have difficulty keeping up with expensive data collection technology. The concept was introduced through the United Nations Global Pulse initiative in 2011 to explore corporate data assets for humanitarian, academic, and societal causes. For example, anonymized mobile data could be used to track disease outbreaks, or data on consumer actions may be shared with researchers to study public health and economic trends. == Definition == A large portion of data collected from the internet consists of user-generated content, such as blogs, social media posts, and information submitted through lead generation and data forms. Additionally, corporations gather and analyze consumer data to gain insight into customer behavior, identify potential markets, and inform investment decisions. United Nations Global Pulse director Robert Kirkpatrick has referred to this type of data as "massive passive data" or "data exhaust." == Challenges == While data philanthropy can enhance development policies, making users' private data available to various organizations raises concerns regarding privacy, ownership, and the equitable use of data. Different techniques, such as differential privacy and alphanumeric strings of information, can allow access to personal data while ensuring user anonymity. However, even if these algorithms work, re-identification may still be possible. Another challenge is convincing corporations to share their data. The data collected by corporations provides them with market competitiveness and insight regarding consumer behavior. Corporations may fear losing their competitive edge if they share the information they have collected with the public. Numerous moral challenges are also encountered. In 2016, Mariarosaria Taddeo, a digital ethics professor at the University of Oxford, proposed an ethical framework to address them. == Sharing strategies == The goal of data philanthropy is to create a global data commons where companies, governments, and individuals can contribute anonymous, aggregated datasets. The United Nations Global Pulse offers four different tactics that companies can use to share their data that preserve consumer anonymity: Share aggregated and derived data sets for analysis under nondisclosure agreements (NDA) Allow researchers to analyze data within the private company's own network under NDAs Real-Time Data Commons: data pooled and aggregated among multiple companies of the same industry to protect competitiveness Public/Private Alerting Network: companies mine data behind their own firewalls and share indicators == Application in various fields == Many corporations take part in data philanthropy, including social networking platforms (e.g., Facebook, Twitter), telecommunications providers (e.g., Verizon, AT&T), and search engines (e.g., Google, Bing). Collecting and sharing anonymized, aggregated user-generated data is made available through data-sharing systems to support research, policy development, and social impact initiatives. By participating in such efforts, these organizations contribute to causes regarded as beneficial to society, allowing institutions to give back meaningfully. With the onset of technological advancements, the sharing of data on a global scale and an in-depth analysis of these data structures could mitigate the effects of global issues such as natural disasters and epidemics. Robert Kirkpatrick, the Director of the United Nations Global Pulse, has argued that this aggregated information is beneficial for the common good and can lead to developments in research and data production in a range of varied fields. === Digital disease detection === Health researchers use digital disease detection by collecting data from various sources—such as social media platforms (e.g., Twitter, Facebook), mobile devices (e.g., cell phones, smartphones), online search queries, mobile apps, and sensor data from wearables and environmental sensors—to monitor and predict the spread of infectious diseases. This approach allows them to track and anticipate outbreaks of epidemics (e.g., COVID-19, Ebola), pandemics, vector-borne diseases (e.g., malaria, dengue fever), and respiratory illnesses (e.g., influenza, SARS), improving response and intervention strategies for the spread of diseases. In 2008, Centers for Disease Control and Prevention collaborated with Google and launched Google Flu Trends, a website that tracked flu-related searches and user locations to track the spread of the flu. Users could visit Google Flu Trends to compare the amount of flu-related search activity versus the reported numbers of flu outbreaks on a graphical map. One drawback of this method of tracking was that Google searches are sometimes performed due to curiosity rather than when an individual is suffering from the flu. According to Ashley Fowlkes, an epidemiologist in the CDC Influenza division, "The Google Flu Trends system tries to account for that type of media bias by modeling search terms over time to see which ones remain stable." Google Flu Trends is no longer publishing current flu estimates on the public website; however, visitors to the site can still view and download previous estimates. Current data can be shared with verified researchers. A study from the Harvard School of Public Health (HSPH), published in the October 12, 2012 issue of Science, discussed how phone data helped curb the spread of malaria in Kenya. The researchers mapped phone calls and texts made by 14,816,521 Kenyan mobile phone subscribers. When individuals left their primary living location, the destination and length of journey were calculated. This data was then compared to a 2009 malaria prevalence map to estimate the disease's commonality in each location. Combining all this information, the researchers could estimate the probability of an individual carrying malaria and map the movement of the disease. This research can be used to track the spread of similar diseases. === Humanitarian aid === Calling patterns of mobile phone users can determine the socioeconomic standings of the populace, which can be used to deduce "its access to housing, education, healthcare, and basic services such as water and electricity." Researchers from Columbia University and Karolinska Institute used daily SIM card location data from both before and after the 2010 Haiti earthquake to estimate the movement of people both in response to the earthquake and during the related 2010 Haiti cholera outbreak. Their research suggests that mobile phone data can provide rapid and accurate estimates of population movements during disasters and outbreaks of infectious disease. Big data can also provide information on looming disasters and can assist relief organizations in rapid-response and locating displaced individuals. By analyzing specific patterns within this 'big data', governments and NGOs can enhance responses to disruptive events such as natural disasters, disease outbreaks, and global economic crises. Leveraging real-time information enables a deeper understanding of individual well-being, allowing for more effective interventions. Corporations utilize digital services, such as human sensor systems, to detect and solve impending problems within communities. This is a strategy used by the private sector to anonymously share customer information for public benefit, while preserving user privacy. === Impoverished areas === Poverty still remains a worldwide issue, with over 2.5 billion people currently impoverished. Statistics indicate the widespread use of mobile phones, even within impoverished communities. Additional data can be collected through Internet access, social media, utility payments and governmental statistics. Data-driven activities can lead to the accumulation of 'big data', which in turn can assist international non-governmental organizations in documenting and evaluating the needs of underprivileged populations. Through data philanthropy, NGOs can distribute information while cooperating with governments and private companies. === Corporate === Data philanthropy incorporates aspects of social philanthropy by allowing corporations to create profound impacts through the act of giving back by dispersing proprietary datasets. The public sector collects and preserves information, considered an essential asset. Companies track and analyze users' online activities to gain insight into their needs related to new products and services. These companies view the welfare of the population as key to business expansion and progression by using their data to highlight global citizens' issues. Experts in the private sector emphasize the importance of integrating diverse data sources—such as retail, mobile, and social media data—to develop essential solutions for global challenges. In Data Philanthropy:

    Read more →
  • Logogen model

    Logogen model

    The logogen model of 1969 is a model of speech recognition that uses units called "logogens" to explain how humans comprehend spoken or written words. Logogens are a vast number of specialized recognition units, each able to recognize one specific word. This model provides for the effects of context on word recognition. == Overview == The word logogen can be traced back to the Greek-language word logos, which means "word", and genus, which means "birth". British scientist John Morton's logogen model was designed to explain word recognition using a new type of unit known as a logogen. A critical element of this theory is the involvement of lexicons, or specialized aspects of memory that include semantic and phonemic information about each item that is contained in memory. A given lexicon consists of many smaller, abstract items known as logogens. Logogens contain a variety of properties about given word such as their appearance, sound, and meaning. Logogens do not store words within themselves, but rather they store information that is specifically necessary for retrieval of whatever word is being searched for. A given logogen will become activated by psychological stimuli or contextual information (words) that is consistent with the properties of that specific logogen and when the logogen's activation level rises to or above its threshold level, the pronunciation of the given word is sent to the output system. Certain stimuli can affect the activation levels of more than one word at a time, usually involving words that are similar to one another. When this occurs, whichever of the words' activation levels reaches the threshold level, it is that word that is then sent to the output system with the subject remaining unaware of any partially excited logogens. This assumption was made by Marslen-Wilson and Welch (1978), who added to the model some assumptions of their own in order to account for their experimental results. They also assumed that the analysis of phonetic input can only become available to other parts of the system by process of how the input affects the logogen system. Finally, Marslen-Wilson and Welch assume that the first syllable of a given word will increase the activation level of a given logogen more than those of the latter syllables, which supported the data found at the time. == Analysis == The logogen model can be used to help linguists explain particular occurrences in the human language. The most-helpful application of the model is to show how one accesses words and their meanings in the lexicon. The word-frequency effect is best explained by the logogen model in that words (or logogens) that have a higher frequency (or are more common) have a lower threshold. This means that they require less perceptual power in the brain to be recognized and decoded from the lexicon and are recognized faster than those words that are less common. Also, with high-frequency words, the recovery from lowering the item's threshold is less fulfilled compared to low-frequency words so less sensory information is needed for that particular item's recognition. There are ways to lower thresholds, such as repetition and semantic priming. Also, each time a word is encountered through these methods, the threshold for that word is temporarily lowered partially because of its recovering ability. This model also conveys that specific concrete words are recalled better because they use images and logogens, whereas abstract words are not as easily recalled well because they only use logogens, hence showing the difference in thresholds between these two types of words. At the time of its conception, Morton's logogen model was one of the most influential models in springing up other parallel word access models and served as the essential basis for these subsequent models. Morton's model also strongly influenced other contemporary theories on lexical access. However, despite the advantages that the logogen theory presents, it also displays some negative facets. First and foremost, the logogen model does not explain all occurrences in language, such as the introduction of new words or non-words into a person's lexicon. Also, because of the distinctive model application, it may vary in its effectiveness in different languages. == Criticisms == While this model does a reasonable job of understanding the underlying semantics of many aspects in psycholinguistics, there are some flaws that have been pointed out in the logogen model. It has been argued that the prior stimulus patterns that have been seen in the logogen theory are not centrally localized in the logogen itself but are actually distributed throughout the different pathways over which the stimulus is being processed. What this directs at is that the notion and proliferation of logogens was due to modality. In essence, the logogen is unnecessary in the idea of attaining the title of being a recognition unit because of the variety of pathways that it is open to, not just logogens. Another criticism has been that this model essentially ignores larger and more critical structures in language and phonetics such as the different syntactic rules or grammatical construction that innately exists in language. Since this model overtly limits itself to the scope of lexical access then this model is seen as biased and misunderstood. To many psychologists, the logogen model does not meet the functional or representational adequacy that a theory should include to sufficiently comprehend language. Also, another criticism is that the logogen theory was supposed to predict that stimulus degradation should affect priming and word frequency in humans. However, many psychologists have conducted studies and researched the model to show that only priming and not word frequency is interacted with stimulus degradation. Priming is supposed to deteriorate a stimulus because it postulates that the semantic characteristics of previously known words are fed back into the detector of a person which in turn raises the threshold of related items. In word frequency, stimulus degradation is supposed to occur because it postulates that familiar words have lower thresholds than their low-frequency counterparts. However, in studies, priming is the only structure that does show observable and notable stimulus decadence. Even though the logogen theory has many unfilled holes, Morton was a revolutionary of his field whose speculation and research has opened up a remarkable era of psycholinguistics. == Other models to consider == cohort model – This model was proposed by Marslen-Wilson and was designed specifically to account for auditory word recognition. It works by breaking the word down and states that when a word is heard all words that begin with the first sound of the target word are activated. This set of words is considered the cohort. Once the first cohort has been activated, the other information, or sounds in the word narrow down the choices. The person recognizes the word when you are left with a single choice; this is considered the "recognition point". checking model – This model was developed by Norris in 1986. In this particular model, he took the approach that any word that partially matches the input is analyzed and checked to see if it fits with the context of the situation. interactive-activation model – This model is considered a connectionist model. Proposed by McClelland and Rumelhart in the 1981 to 1982 period, it is based around nodes, which are visual features, and positions of letters within a given word. They also act as word detectors which have inhibitory and excitatory connections between them. This model starts with first letter and suggests that all the words with that first letter are activated at first and then going through the word one can determine what the word is they are looking at. The main principle is that mental phenomena can be described by interconnected networks of simple units. verification model – The model was developed by Curtis Becker in 1970. The main idea is that a small number of candidates that are activated in parallel are subject to a serial-verification process. This model starts the word-recognition process with a basic representation of the stimulus. Then, sensory trace, consisting of line features is used to activate word detectors. When an acceptable number of detectors are activated these are used to generate a search set. These items are drawn from the lexicon on the basis of similarity to the sensory trace, which help with the identity of the stimulus. Then, in a serial process the candidates are compared to the representation of the sensory-trace input. == Related concepts == word frequency – This is the belief that the speed and accuracy with which a word is recognized is related to how frequently the word occurs in our language. Each logogen has a threshold (for identification) and words with higher frequencies have lower thresholds. Words with higher freq

    Read more →
  • AS1 (networking)

    AS1 (networking)

    AS1 (Applicability Statement 1) is a specification about how to transport structured business-to-business data securely and reliably over the Internet. Security is achieved by using digital certificates and encryption. == AS1 technical overview == The AS1 protocol is based on SMTP and S/MIME. It was the first AS protocol developed and uses signing, encryption and MDN conventions. In other words: Files are sent as "attachments" in a specially coded SMIME email message Messages can be signed, but do not have to be Messages can be encrypted, but do not have to be Messages may request an MDN back if all went well, but do not have to request such a message If the original AS1 message requested an MDN... Upon the receipt of the message and its successful decryption or signature validation (as necessary) a "success" MDN will be sent back to the original sender. This MDN is typically signed but not encrypted. Upon the receipt and successful verification of the signature on the MDN, the original sender will "know" that the recipient got their message (this provides the "Non-repudiation" element of AS1) If there are any problems receiving or interpreting the original AS1 message, a "failed" MDN may be sent back. Like any other AS file transfer, AS1 file transfers typically require both sides of the exchange to trade X.509 certificates and specific "trading partner" names before any transfers can take place.

    Read more →
  • Telenet

    Telenet

    Telenet was an American commercial packet-switched network which went into service in August 16, 1975. It was the first FCC-licensed public data network in the United States. Various commercial and government interests paid monthly fees for dedicated lines connecting their computers and local networks to this backbone network. Free public dialup access to Telenet, for those who wished to access these systems, was provided in hundreds of cities throughout the United States. == History == After establishing that commercial operation of "value added carriers" was legal in the U.S., Bolt Beranek and Newman (BBN), who were the private contractors for constructing packet switching nodes (Interface Message Processor) for the ARPANET, set out to create a private sector version. The original founding company, Telenet Inc., was established by BBN. In January 1975, Telenet Communications Corporation announced that they had acquired the necessary venture capital after a two-year quest. Initially, Bob Kahn was the first President of Telenet; he then moved to ARPA as Larry Roberts left to become President of the company. Barry Wessler also joined from ARPA. On August 16 of the same year they began operating the first public data network. The network offered an email service called Telemail. Telenet had its first offices in downtown Washington, D.C., then moved to McLean, Virginia. It was acquired by GTE in 1979, and then moved to offices in Reston, Virginia. It was later acquired by Sprint and called "Sprintnet". Sprint migrated customers from Telenet to the modern-day Sprintlink IP network, one of many networks composing today's Internet. == Coverage == Originally, the public network had switching nodes in seven US cities: Washington, D.C. (network operations center as well as switching) Boston, Massachusetts New York, New York Chicago, Illinois Dallas, Texas San Francisco, California Los Angeles, California The switching nodes were fed by Telenet Access Controller (TAC) terminal concentrators both colocated and remote from the switches. By 1980, there were over 1000 switches in the public network. At that time, the next largest network using Telenet switches was that of Southern Bell, which had approximately 250 switches. In 1977, Telenet added a London node and a Network Control Centre in a London building of Britain's Post Office Telecommunications. == Internal network technology == Telenet initially used a proprietary virtual connection host interface. The network used statically defined hop-by-hop routing, using Prime commercial minicomputers as switches, but then migrated to a purpose-built multiprocessing switch based on 6502 microprocessors. Among the innovations of this second-generation switch was a patented arbitrated bus interface that created a switched fabric among the microprocessors. By contrast, a typical microprocessor-based system of the time used a bus; switched fabrics did not become common until about twenty years later, with the advent of PCI Express and HyperTransport. Most interswitch lines ran at 56 kbit/s, with a few, such as New York-Washington, at T1 (i.e., 1.544 Mbit/s). Originally, the switching tables could not be altered separately from the main executable code, and topology updates had to be made by deliberately crashing the switch code and forcing a reboot from the network management center. Improvements in the software allowed new tables to be loaded, but the network never used dynamic routing protocols. Multiple static routes, on a switch-by-switch basis, could be defined for fault tolerance. Network management functions continued to run on Prime minicomputers. Roberts and Barry Wessler joined the international effort to standardize the a protocol for packet-switched data communication based on virtual circuits shortly before it was finalized. The CCITT proposal for X.25 was being prepared by Rémi Després and other international experts. A few minor changes, which complemented the proposed specification, were accommodated to enable Telenet to join the agreement. Telenet adopted X.25 shortly after the protocol was published in March 1976. Its X.25 host interface was the first in the industry. The main internal protocol was a proprietary variant on X.75; Telenet also ran standard X.75 gateways to other packet switching networks. == Accessing the network == === Basic asynchronous access === Users could use modems on the Public Switched Telephone Network to dial TAC ports, calling either from "dumb" terminals or from computers emulating such terminals. Organizations with a large number of local terminals could install a TAC on their own site, which used a dedicated line, at up to 56 kbit/s, to connect to a switch at the nearest Telenet location. Dialup modems supported had a maximum speed of 1200 bit/s, and later 4800 bit/s. For example, a customer in NYC could dial into the local number, then type in a command similar to: which would connect (that "c") them to a computer system designated as number "555" located in the same vicinity as the standard telephone "area code" 301. One significant customer was an early (what would now be called) internet service provider The Source which had their equipment in Mclean, Va. Telenet offered a much lower nighttime rate when there were few corporate customers, and this let The Source set up a modestly priced offering to tens of thousands of customers. Another prominent customer in the 1980s was Quantum Link (now AOL). === Other access protocols === Telenet supported remote concentrators for IBM 3270 family intelligent terminals, which communicated, via X.25 to Telenet-written software that ran in IBM 370x series front-end processors. Telenet also supported Block Mode Terminal Interfaces (BMTI) for IBM Remote Job Entry terminals supporting the 2780/3780 and HASP Bisync protocols. === PC Pursuit === In the late 1980s, Telenet offered a service called PC Pursuit. For a flat monthly fee, customers could dial into the Telenet network in one city, then dial out on the modems in another city to access bulletin board systems and other services. PC Pursuit was popular among computer hobbyists because it sidestepped long-distance charges. In this sense, PC Pursuit was similar to the Internet, allowing any user to call any system as if it were local. On connection to the network, the user entered a 5-letter code for the target city they wished to call. This consisted of a 2-letter state code and a 3-letter acronym for the city. For instance, to call a system in Cleveland, Ohio, the user would enter the code OHCLV, for "OHio", "CLeVeland". Once connected, the user could dial out to any local number, and the system simulated a direct connection between the two endpoints.

    Read more →
  • Data dictionary

    Data dictionary

    A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS): A document describing a database or collection of databases An integral component of a DBMS that is required to determine its structure A piece of middleware that extends or supplants the native data dictionary of a DBMS == Documentation == The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. On the other hand, a data dictionary is a data structure that stores metadata, i.e., (structured) data about information. The software package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS, but it is mainly used by the designers, users and administrators of a computer system for information resource management. These systems maintain information on system hardware and software configuration, documentation, application and users as well as other information relevant to system administration. If a data dictionary system is used only by the designers, users, and administrators and not by the DBMS Software, it is called a passive data dictionary. Otherwise, it is called an active data dictionary or data dictionary. When a passive data dictionary is updated, it is done so manually and independently from any changes to a DBMS (database) structure. With an active data dictionary, the dictionary is updated first and changes occur in the DBMS automatically as a result. Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of various tables (records or entities) and their contents (fields), plus additional details, like the type and length of each data element. Another important piece of information that a data dictionary can provide is the relationship between tables. This is sometimes referred to in entity-relationship diagrams (ERDs), or if using set descriptors, identifying which sets database tables participate in. In an active data dictionary constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record-type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located). The data dictionary consists of record types (tables) created in the database by systems generated command files, tailored for each supported back-end DBMS. Oracle has a list of specific views for the "sys" user. This allows users to look up the exact information that is needed. Command files contain SQL Statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database. There is no universal standard as to the level of detail in such a document. == Middleware == In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary also can assist in query optimization against distributed databases. Additionally, DBA functions are often automated using restructuring tools that are tightly coupled to an active data dictionary. Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex joins. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types. Visual DataFlex features provides the ability to use DataDictionaries as class files to form middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications. Some industries use generalized data dictionaries as technical standards to ensure interoperability between systems. The real estate industry, for example, abides by a RESO's Data Dictionary to which the National Association of REALTORS mandates its MLSs comply with through its policy handbook. This intermediate mapping layer for MLSs' native databases is supported by software companies which provide API services to MLS organizations. == Platform-specific examples == Developers use a data description specification (DDS) to describe data attributes in file descriptions that are external to the application program that processes the data, in the context of an IBM i. The sys.ts$ table in Oracle stores information about every table in the database. It is part of the data dictionary that is created when the Oracle Database is created. Developers may also use DDS context from free and open-source software (FOSS) for structured and transactional queries in open environments. == Typical attributes == Here is a non-exhaustive list of typical items found in a data dictionary for columns or fields: Entity or form name or their ID (EntityID or FormID). The group this field belongs to. Field name, such as RDBMS field name Displayed field title. May default to field name if blank. Field type (string, integer, date, etc.) Measures such as min and max values, display width, or number of decimal places. Different field types may interpret this differently. An alternative is to have different attributes depending on field type. Field display order or tab order Coordinates on screen (if a positional or grid-based UI) Default value Prompt type, such as drop-down list, combo-box, check-boxes, range, etc. Is-required (Boolean) - If 'true', the value cannot be blank, null, or only white-spaces Is-read-only (Boolean) Reference table name, if a foreign key. Can be used for validation or selection lists. Various event handlers or references to. Example: "on-click", "on-validate", etc. See event-driven programming. Format code, such as a regular expression or COBOL-style "PIC" statements Description or synopsis Database index characteristics or specification

    Read more →
  • Jais (language model)

    Jais (language model)

    Jais is an open-source large language model launched in August 2023. Developed as a collaboration between Emirati AI company G42, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and US-based Cerebras Systems, Jais was designed to produce high-quality Arabic text and was also trained on English data. The model's creation was motivated by the underrepresentation of the Arabic language in the field of generative artificial intelligence. It aims to provide a more culturally and linguistically accurate model for the world's 400 million Arabic speakers. Its name is a reference to Jebel Jais, the highest mountain in the UAE. == Background and development == Jais was developed in response to the limited availability of advanced generative artificial intelligence models for the Arabic language, despite it being spoken by over 400 million people. Existing models were often trained on limited or low-quality Arabic web content, resulting in poor performance. The project represents a significant investment by the United Arab Emirates in the field of AI as part of its national strategy. The model was created through a partnership between Inception (now Core42), a subsidiary of the Abu Dhabi-based AI company G42; the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras Systems, a US company specializing in AI hardware. The model is named after Jebel Jais, the highest peak in the UAE. == Training == The initial version of Jais released in August 2023 had 13 billion parameters. In November 2023, Core42 released Jais 30B, an improved version with 30 billion parameters. Both models were trained on a subset of the Cerebras Condor Galaxy 1 supercomputer. The training dataset consisted of a mix of Arabic, English, and computer code. According to Timothy Baldwin, a professor of natural language processing at MBZUAI, training the model on a diverse Arabic dataset allows it to switch between dialects. == Features == Jais is designed to generate text in both English and Arabic. The project has also released instruction-tuned "Chat" variants for both the 13B and 30B models, which are specifically optimized for conversational applications. Additional functionality for working with images, graphs, and tabular data is planned for future releases.

    Read more →
  • Data steward

    Data steward

    A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for. Data stewards have a specialist role that utilizes an organization's data governance processes, policies, guidelines and responsibilities for administering an organizations' entire data in compliance with policy and/or regulatory obligations (e.g., GDPR, HIPAA). The overall objective of a data steward is the data quality of the data assets, datasets, data records and data elements. This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship), identifying owners/custodian's various responsibilities, relations insight pertaining to attribute quality, aiding with project requirement data facilitation and documentation of capture rules. Data stewards begin the stewarding process with the identification of the data assets and elements which they will steward, with the ultimate result being standards, controls and data entry. The steward works closely with business glossary standards analysts (for standards), with data architect/modelers (for standards), with DQ analysts (for controls) and with operations team members (good-quality data going in per business rules) while entering data. Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed. Data stewardship must have precise purpose, fit for purpose or fitness. == Data steward responsibilities == A data steward ensures that each assigned data element: Has clear and unambiguous data element definition Does not conflict with other data elements in the metadata registry (removes duplicates, overlap etc.) Has clear enumerated value definitions if it is of type Code Is still being used (remove unused data elements) Is being used consistently in various computer systems Is being used, fit for purpose = Data Fitness Has adequate documentation on appropriate usage and notes Documents the origin and sources of authority on each metadata element Is protected against unauthorised access or change Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices. == Types of data stewards == Depending on the set of data stewardship responsibilities assigned to an individual, there are 4 types (or dimensions of responsibility) of data stewards typically found within an organization: Data object data steward - responsible for managing reference data and attributes of one business data entity Business data steward - responsible for managing critical data, both reference and transactional, created or used by one business function. The data steward may also serve as a liaison between the organization's data users and technical teams, helping to bridge the gap between business needs and technical requirements. They may also play a role in educating others within the organization about best practices for data management, and advocating for data-driven decision-making. Process data steward - responsible for managing data across one business process System data steward - responsible for managing data for at least one IT system == Benefits of data stewardship == Systematic data stewardship can foster: Faster analysis Consistent use of data management resources Easy mapping of data between computer systems and exchange documents Lower costs associated with migration to (for example) service-oriented architecture (SOA) Mitigation of data risk Better control of dangers associated with privacy, legal, errors, etc. Assignment of each data element to a person sometimes seems like an unimportant process. But multiple groups have found that users have greater trust and usage rates in systems where they can contact a person with questions on each data element. == Examples == Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus. Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website, blog and a YouTube channel. The [1]EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact). In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows. == Data stewardship applications == Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases. The initial push for the formation of this new category of packaged software came from operational use cases — that is, use of business data in and between transactional and operational business applications. This is where most of the master data management efforts are undertaken in organizations. However, there is also now a faster-growing interest in the new data lake arena for more analytical use cases.

    Read more →
  • Social media therapy

    Social media therapy

    Social media therapy is a form of expressive therapy. It uses the act of creating and sharing user-generated content as a way of connecting with and understanding people. Social media therapy combines different expressive therapy aspects of talk therapy, art therapy, writing therapy, and drama therapy and applies them to the web domain. Within social media therapy, synchronous or asynchronous dialogue occurs through exchanges of audio, text or visual information. The digital content is published online to serve as a form of therapy. == Background == Time spent online via email, websites, instant messaging and social media has increased: since 1999, more than 2,554 million people have become internet users. This alters the way people communicate with each other, and alters the connotation of certain words. The concepts of "identity", "friend", "like" and "connected" have adapted alongside technology. People are influenced by data sharing, social marketing, and technological tools. There are multiple therapeutic services offered through the internet. E-therapy, online counseling, cyber therapy, and social media therapy are similar in that each utilizes the internet in order to provide therapy for patients. == Controversy == There are pros and cons when it comes to the subject of online therapy. Criticism of providing therapy through online methods comes from concerns over the lack of physical contact. There are important features of therapy created through face-to-face therapy such as transference and countertransference that can not be created through online therapy. Patricia R. Recupero and Samara E. Rainey stated in their article "Informed Consent to E-Therapy" of American Journal of Psychotherapy that the lack of face-to-face interaction increased the risk of misdiagnosis and misunderstanding between the E-therapist and patient, thereby increasing the risk of uncertainty for the clinician. There are also concerns over the internet creating a distraction from the therapy itself. Confidentiality and privacy concerns have been raised as well. However, several systematic reviews have found that online psychotherapy can produce clinical outcomes comparable to face-to-face treatment, suggesting that physical distance does not inherently reduce therapeutic effectiveness.

    Read more →
  • Torus interconnect

    Torus interconnect

    A torus interconnect is a switch-less network topology for connecting processing nodes in a parallel computer system. == Introduction == In geometry, a torus is created by revolving a circle about an axis coplanar to the circle. While this is a general definition in geometry, the topological properties of this type of shape describes the network topology in its essence. === Geometry illustration === In the representations below, the first is a one dimension torus, a simple circle. The second is a two dimension torus, in the shape of a 'doughnut'. The animation illustrates how a two dimension torus is generated from a rectangle by connecting its two pairs of opposite edges. At one dimension, a torus topology is equivalent to a ring interconnect network, in the shape of a circle. At two dimensions, it becomes equivalent to a two dimension mesh, but with extra connection at the edge nodes. === Torus network topology === A torus interconnect is a switch-less topology that can be seen as a mesh interconnect with nodes arranged in a rectilinear array of N = 2, 3, or more dimensions, with processors connected to their nearest neighbors, and corresponding processors on opposite edges of the array connected.[1] In this lattice, each node has 2N connections. This topology is named for the lattice formed in this way, which is topologically homogeneous to an N-dimensional torus. == Visualization == The first 3 dimensions of torus network topology are easier to visualize and are described below: 1D Torus: one dimension, n nodes are connected in closed loop with each node connected to its two nearest neighbors. Communication can take place in two directions, +x and −x. A 1D Torus is the same as ring interconnection. 2D Torus: two dimensions with degree of four, the nodes are imagined laid out in a two-dimensional rectangular lattice of n rows and n columns, with each node connected to its four nearest neighbors, and corresponding nodes on opposite edges connected. Communication can take place in four directions, +x, −x, +y, and −y. The total nodes of a 2D Torus is n2. 3D Torus: three dimensions, the nodes are imagined in a three-dimensional lattice in the shape of a rectangular prism, with each node connected with its six neighbors, with corresponding nodes on opposing faces of the array connected. Each edge consists of n nodes. communication can take place in six directions, +x, −x, +y, −y, +z, −z. Each edge of a 3D Torus consist of n nodes. The total nodes of 3D Torus is n3. ND Torus: N dimensions, each node of an N dimension torus has 2N neighbors, Communication can take place in 2N directions. Each edge consists of n nodes. Total nodes of this torus is nN. The main motivation of having higher dimension of torus is to achieve higher bandwidth, lower latency, and higher scalability. Higher-dimensional arrays are difficult to visualize. The above ruleset shows that each higher dimension adds another pair of nearest neighbor connections to each node. == Performance == A number of supercomputers on the TOP500 list use three-dimensional torus networks, e.g. IBM's Blue Gene/L and Blue Gene/P, and the Cray XT3. IBM's Blue Gene/Q uses a five-dimensional torus network. Fujitsu's K computer and the PRIMEHPC FX10 use a proprietary three-dimensional torus 3D mesh interconnect called Tofu. === 3D Torus performance simulation === Sandeep Palur and Dr. Ioan Raicu from Illinois Institute of Technology conducted experiments to simulate 3D torus performance. Their experiments ran on a computer with 250GB RAM, 48 cores and x86_64 architecture. The simulator they used was ROSS (Rensselaer’s Optimistic Simulation System). They mainly focused on three aspects: Varying network size Varying number of servers Varying message size They concluded that throughput decreases with the increase of servers and network size. Otherwise, throughput increases with the increase of message size. === 6D Torus product performance === Fujitsu Limited developed a 6D torus computer model called "Tofu". In their model, a 6D torus can achieve 100 GB/s off-chip bandwidth, 12 times higher scalability than a 3D torus, and high fault tolerance. The model is used in the K computer and Fugaku. === Cost === While long wrap-around links may be the easiest way to visualize the connection topology, in practice, restrictions on cable lengths often make long wrap-around links impractical. Instead, directly connected nodes—including nodes that the above visualization places on opposite edges of a grid, connected by a long wrap-around link—are physically placed nearly adjacent to each other in a folded torus network. Every link in the folded torus network is very short—almost as short as the nearest-neighbor links in a simple grid interconnect—and therefore low-latency.

    Read more →
  • AlternativeTo

    AlternativeTo

    AlternativeTo is a website which lists alternatives to web-based software, desktop computer software, and mobile apps, and sorts the alternatives by various criteria, including the number of registered users who have "Liked" each of them on AlternativeTo. Users can search the site to find better alternatives to an application they are using or previously have used, including free alternatives such as free web applications (cloud computing) which don't require any installation and can be accessed from any browser. == Differences == Unlike a number of other software directory websites, the software is not arranged into categories, but each individual piece of software has its own list of alternatives. However, users can also search by tag to find software, which offers an alternative way of finding related software. Users can also narrow their search by focusing on particular platforms and license types (such as "free for non-commercial use"). AlternativeTo lists basic information such as platform and license type at the top of each entry, but does not have comparison tables listing software features side by side. AlternativeTo does not host software for download but it provides links to official websites to where you can download or buy them. AlternativeTo allows anyone to register and suggest new alternatives, or to update the information held about existing entries. Suggestions and alterations are reviewed before being made publicly visible. Users can register using either email and password or OpenID. Login with Facebook has been discontinued. As AlternativeTo is itself a web application, it even has a page for alternatives to itself. == Features == Tweets from anyone mentioning particular pieces of software are also pulled in dynamically from Twitter. Each application has an RSS feed for notifying users of newly listed alternatives to that application. After a user has clicked the Like button next to an application, they are offered the opportunity to tell their friends on Facebook or their followers on Twitter that they liked it. The site also has a forum. For software developers, a JSON API used to be available, but has been taken down indefinitely.

    Read more →
  • Hybrid cryptosystem

    Hybrid cryptosystem

    In cryptography, a hybrid cryptosystem is one which combines the convenience of a public-key cryptosystem with the efficiency of a symmetric-key cryptosystem. Public-key cryptosystems are convenient in that they do not require the sender and receiver to share a common secret in order to communicate securely. However, they often rely on complicated mathematical computations and are thus generally much more inefficient than comparable symmetric-key cryptosystems. In many applications, the high cost of encrypting long messages in a public-key cryptosystem can be prohibitive. This is addressed by hybrid systems by using a combination of both. A hybrid cryptosystem can be constructed using any two separate cryptosystems: a key encapsulation mechanism, which is a public-key cryptosystem a data encapsulation scheme, which is a symmetric-key cryptosystem The hybrid cryptosystem is itself a public-key system, whose public and private keys are the same as in the key encapsulation scheme. Note that for very long messages the bulk of the work in encryption/decryption is done by the more efficient symmetric-key scheme, while the inefficient public-key scheme is used only to encrypt/decrypt a short key value. == Implementations and standards == All practical implementations of public key cryptography today employ a hybrid system. Examples include the TLS protocol and the SSH protocol, that use a public-key mechanism for key exchange (such as Diffie-Hellman) and a symmetric-key mechanism for data encapsulation (such as AES). The OpenPGP file format and the PKCS#7 file format are other examples. Hybrid Public Key Encryption (HPKE, published as RFC 9180) is a modern standard for generic hybrid encryption. HPKE is used within multiple IETF protocols, including Messaging Layer Security (MLS), Oblivious DNS over HTTPS, Oblivious HTTP, Privacy Preserving Measurement, and TLS Encrypted Client Hello. Envelope encryption is an example of a usage of hybrid cryptosystems in cloud computing. In a cloud context, hybrid cryptosystems also enable centralized key management. == Example == To encrypt a message addressed to Alice in a hybrid cryptosystem, Bob does the following: Obtains Alice's public key. Generates a fresh symmetric key for the data encapsulation scheme. Encrypts the message under the data encapsulation scheme, using the symmetric key just generated. Encrypts the symmetric key under the key encapsulation scheme, using Alice's public key. Sends both of these ciphertexts to Alice. To decrypt this hybrid ciphertext, Alice does the following: Uses her private key to decrypt the symmetric key contained in the key encapsulation segment. Uses this symmetric key to decrypt the message contained in the data encapsulation segment. == Security == If both the key encapsulation and data encapsulation schemes in a hybrid cryptosystem are secure against adaptive chosen ciphertext attacks, then the hybrid scheme inherits that property as well. However, it is possible to construct a hybrid scheme secure against adaptive chosen ciphertext attacks even if the key encapsulation has a slightly weakened security definition (though the security of the data encapsulation must be slightly stronger). == Envelope encryption == Envelope encryption is term used for encrypting with a hybrid cryptosystem used by all major cloud service providers, often as part of a centralized key management system in cloud computing. Envelope encryption gives names to the keys used in hybrid encryption: Data Encryption Keys (abbreviated DEK, and used to encrypt data) and Key Encryption Keys (abbreviated KEK, and used to encrypt the DEKs). In a cloud environment, encryption with envelope encryption involves generating a DEK locally, encrypting one's data using the DEK, and then issuing a request to wrap (encrypt) the DEK with a KEK stored in a potentially more secure service. Then, this wrapped DEK and encrypted message constitute a ciphertext for the scheme. To decrypt a ciphertext, the wrapped DEK is unwrapped (decrypted) via a call to a service, and then the unwrapped DEK is used to decrypt the encrypted message. In addition to the normal advantages of a hybrid cryptosystem, using asymmetric encryption for the KEK in a cloud context provides easier key management and separation of roles, but can be slower. In cloud systems, such as Google Cloud Platform and Amazon Web Services, a key management system (KMS) can be available as a service. In some cases, the key management system will store keys in hardware security modules, which are hardware systems that protect keys with hardware features like intrusion resistance. This means that KEKs can also be more secure because they are stored on secure specialized hardware. Envelope encryption makes centralized key management easier because a centralized key management system only needs to store KEKs, which occupy less space, and requests to the KMS only involve sending wrapped and unwrapped DEKs, which use less bandwidth than transmitting entire messages. Since one KEK can be used to encrypt many DEKs, this also allows for less storage space to be used in the KMS. This also allows for centralized auditing and access control at one point of access.

    Read more →
  • Cryptographic Service Provider

    Cryptographic Service Provider

    A cryptographic service provider (CSP) is a package that "provides a concrete implementation of certain cryptographic services." A CSP offers operations and protocols to support a variety of use cases. The cryptographic application programming interface (API) provided by the CSP provides common solutions for different platforms, for example hardware and cloud services. == Microsoft Windows == In Microsoft Windows, a Cryptographic Service Provider is a software library that implements the Microsoft CryptoAPI (CAPI). CSPs implement encoding and decoding functions, which computer application programs may use, for example, to implement strong user authentication or for secure email. CSPs are independent modules that can be used by different applications. A user program calls CryptoAPI functions and these are redirected to CSPs functions. Since CSPs are responsible for implementing cryptographic algorithms and standards, applications do not need to be concerned about security details. Furthermore, each application can define which CSP it is going to use on its calls to CryptoAPI. In fact, all cryptographic activity is implemented in CSPs. CryptoAPI only works as a bridge between the application and the CSP. CSPs are implemented basically as a special type of DLL with special restrictions on loading and use. Every CSP must be digitally signed by Microsoft and the signature is verified when Windows loads the CSP. In addition, after being loaded, Windows periodically re-scans the CSP to detect tampering, either by malicious software such as computer viruses or by the user him/herself trying to circumvent restrictions (for example on cryptographic key length) that might be built into the CSP's code. To obtain a signature, non-Microsoft CSP developers must supply paperwork to Microsoft promising to obey various legal restrictions and giving valid contact information. As of circa 2000, Microsoft did not charge any fees to supply these signatures. For development and testing purposes, a CSP developer can configure Windows to recognize the developer's own signatures instead of Microsoft's, but this is a somewhat complex and obscure operation unsuitable for nontechnical end users. The CAPI/CSP architecture had its origins in the era of restrictive US government controls on the export of cryptography. Microsoft's default or "base" CSP then included with Windows was limited to 512-bit RSA public-key cryptography and 40-bit symmetric cryptography, the maximum key lengths permitted in exportable mass market software at the time. CSPs implementing stronger cryptography were available only to U.S. residents, unless the CSPs themselves had received U.S. government export approval. The system of requiring CSPs to be signed only on presentation of completed paperwork was intended to prevent the easy spread of unauthorized CSPs implemented by anonymous or foreign developers. As such, it was presented as a concession made by Microsoft to the government, in order to get export approval for the CAPI itself. After the Bernstein v. United States court decision establishing computer source code as protected free speech and the transfer of cryptographic regulatory authority from the U.S. State Department to the more pro-export Commerce Department, the restrictions on key lengths were dropped, and the CSPs shipped with Windows now include full-strength cryptography. The main use of third-party CSPs is to interface with external cryptography hardware such as hardware security modules (HSM) or smart cards. === Smart Card CSP === These cryptographic functions can be realized by a smart card, thus the Smart Card CSP is the Microsoft way of a PKCS#11. Microsoft Windows is identifying the correct Smart Card CSP, which have to be used, analyzing the answer to reset (ATR) of the smart card, which is registered in the Windows Registry. Installing a new CSP, all ATRs of the supported smart cards are enlisted in the registry. === Use of CSP in MS Office password protection === Cryptographic service providers can be used for encryption of Word, Excel, and PowerPoint documents starting from Microsoft Office XP. A standard encryption algorithm with a 40-bit key is used by default, but enabling a CSP enhances key length and thus makes decryption process more continuous. This only applies to passwords that are required to open document because this password type is the only one that encrypts a password-protected document.

    Read more →