AI Data Center Zoning

AI Data Center Zoning — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

    Read more →
  • Computer network

    Computer network

    In computer science, computer engineering, and telecommunications, a network is a group of communicating computers and peripherals known as hosts, which communicate data to other hosts via communication protocols, as facilitated by networking hardware. Within a computer network, hosts are identified by network addresses, which allow networking hardware to locate and identify hosts. Hosts may also have hostnames, memorable labels for the host nodes, which can be mapped to a network address using a hosts file or a name server such as Domain Name Service. The physical medium that supports information exchange includes wired media like copper cables, optical fibers, and wireless radio-frequency media. The arrangement of hosts and hardware within a network architecture is known as the network topology. The first computer network was created in 1940 when George Stibitz connected a terminal at Dartmouth to his Complex Number Calculator at Bell Labs in New York. Today, almost all computers are connected to a computer network, such as the global Internet or embedded networks such as those found in many modern electronic devices. Many applications have only limited functionality unless they are connected to a network. Networks support applications and services, such as access to the World Wide Web, digital video and audio, application and storage servers, printers, and email and instant messaging applications. == History == === Early origins (1940 – 1960s) === In 1940, George Stibitz of Bell Labs connected a teletype at Dartmouth to a Bell Labs computer running his Complex Number Calculator to demonstrate the use of computers at long distance. This was the first real-time, remote use of a computing machine. In the late 1950s, a network of computers was built for the U.S. military Semi-Automatic Ground Environment (SAGE) radar system using the Bell 101 modem. It was the first commercial modem for computers, released by AT&T Corporation in 1958. The modem allowed digital data to be transmitted over regular unconditioned telephone lines at a speed of 110 bits per second (bit/s). In 1959, Christopher Strachey filed a patent application for time-sharing in the United Kingdom and John McCarthy initiated the first project to implement time-sharing of user programs at MIT. Strachey passed the concept on to J. C. R. Licklider at the inaugural UNESCO Information Processing Conference in Paris that year. McCarthy was instrumental in the creation of three of the earliest time-sharing systems (the Compatible Time-Sharing System in 1961, the BBN Time-Sharing System in 1962, and the Dartmouth Time-Sharing System in 1963). In 1959, Anatoly Kitov proposed to the Central Committee of the Communist Party of the Soviet Union a detailed plan for the re-organization of the control of the Soviet armed forces and of the Soviet economy on the basis of a network of computing centers. Kitov's proposal was rejected, as later was the 1962 OGAS economy management network project. During the 1960s, Paul Baran and Donald Davies independently invented the concept of packet switching for data communication between computers over a network. Baran's work addressed adaptive routing of message blocks across a distributed network, but did not include routers with software switches, nor the idea that users, rather than the network itself, would provide the reliability. Davies' hierarchical network design included high-speed routers, communication protocols and the essence of the end-to-end principle. The NPL network, a local area network at the National Physical Laboratory (United Kingdom), pioneered the implementation of the concept in 1968-69 using 768 kbit/s links. Both Baran's and Davies' inventions were seminal contributions that influenced the development of computer networks. === ARPANET (1969 – 1974) === In 1962 and 1963, J. C. R. Licklider sent a series of memos to office colleagues discussing the concept of the "Intergalactic Computer Network", a computer network intended to allow general communications among computer users. This ultimately became the basis for the ARPANET, which began in 1969. That year, the first four nodes of the ARPANET were connected using 50 kbit/s circuits between the University of California at Los Angeles, the Stanford Research Institute, the University of California, Santa Barbara, and the University of Utah. Designed principally by Bob Kahn, the network's routing, flow control, software design and network control were developed by the IMP team working for Bolt Beranek & Newman. In the early 1970s, Leonard Kleinrock carried out mathematical work to model the performance of packet-switched networks, which underpinned the development of the ARPANET. His theoretical work on hierarchical routing in the late 1970s with student Farouk Kamoun remains critical to the operation of the Internet today. In 1973, Peter Kirstein put internetworking into practice at University College London (UCL), connecting the ARPANET to British academic networks, the first international heterogeneous computer network. That same year, Robert Metcalfe wrote a formal memo at Xerox PARC describing Ethernet, a local area networking system he created with David Boggs. It was inspired by the packet radio ALOHAnet, started by Norman Abramson and Franklin Kuo at the University of Hawaii in the late 1960s. Metcalfe and Boggs, with John Shoch and Edward Taft, also developed the PARC Universal Packet for internetworking. That year, the French CYCLADES network, directed by Louis Pouzin was the first to make the hosts responsible for the reliable delivery of data, rather than this being a centralized service of the network itself. === The internet (1974 – present) === In 1974, Vint Cerf and Bob Kahn published their seminal 1974 paper on internetworking, A Protocol for Packet Network Intercommunication. Later that year, Cerf, Yogen Dalal, and Carl Sunshine wrote the first Transmission Control Protocol (TCP) specification, RFC 675, coining the term Internet as a shorthand for internetworking. In July 1976, Metcalfe and Boggs published their paper "Ethernet: Distributed Packet Switching for Local Computer Networks" and in December 1977, together with Butler Lampson and Charles P. Thacker, they received U.S. patent 4063220A for their invention. In 1976, John Murphy of Datapoint Corporation created ARCNET, a token-passing network first used to share storage devices. In 1979, Robert Metcalfe pursued making Ethernet an open standard. In 1980, Ethernet was upgraded from the original 2.94 Mbit/s protocol to the 10 Mbit/s protocol, which was developed by Ron Crane, Bob Garner, Roy Ogus, Hal Murray, Dave Redell and Yogen Dalal. In 1986, the National Science Foundation (NSF) launched the National Science Foundation Network (NSFNET) as a general-purpose research network connecting various NSF-funded sites to each other and to regional research and education networks. In 1995, the transmission speed capacity for Ethernet increased from 10 Mbit/s to 100 Mbit/s. By 1998, Ethernet supported transmission speeds of 1 Gbit/s. Subsequently, higher speeds of up to 800 Gbit/s were added (as of 2025). The scaling of Ethernet has been a contributing factor to its continued use. In the 1980s and 1990s, as embedded systems were becoming increasingly important in factories, cars, and airplanes, network protocols were developed to allow the embedded computers to communicate. In the late 1990s and 2000s, ubiquitous computing and an Internet of Things became popular. === Commercial usage === In 1960, the commercial airline reservation system semi-automatic business research environment (SABRE) went online with two connected mainframes. In 1965, Western Electric introduced the first widely used telephone switch that implemented computer control in the switching fabric. In 1972, commercial services were first deployed on experimental public data networks in Europe. Public data networks in Europe, North America and Japan began using X.25 in the late 1970s and interconnected with X.75. This underlying infrastructure was used for expanding TCP/IP networks in the 1980s. In 1977, the first long-distance fiber network was deployed by GTE in Long Beach, California. == Hardware == === Network links === The transmission media used to link devices to form a computer network include electrical cable, optical fiber, and free space. In the OSI model, the software to handle the media is defined at layers 1 and 2 — the physical layer and the data link layer. Common examples of networking technologies include: Ethernet is a widely adopted family of networking technologies that use copper and fiber media in local area networks (LAN). The media and protocol standards that enable communication between networked devices over Ethernet are defined by IEEE 802.3. Wireless LAN standards, which use radio waves. Some standards use infrared signals as a transmission medium. Power line communication uses a building's power cabling to transmit

    Read more →
  • Data exchange

    Data exchange

    Data exchange is the process of moving data from one information system to another. It often involves transforming data that is native to the source system into a form that is consumable by the target system or to a standardized form that is consumable by any compatible system. In particular, data exchange allows data to be shared between computer programs. Data exchange is similar to data integration except that data may be restructured with possible loss of content. There may be no way to transform a particular collection based on exchange constraints. Conversely, there may be multiple ways to transform the data, in which case one option must be identified in order to achieve compatibility between source and target. There are two main types of data exchange: broadcast and peer-to-peer (a.k.a. unicast). For broadcast, data is transmitted simultaneously to all consumers. Just as a conference call, all participants get the same information from the speaker at the same time. For peer-to-peer, data is sent to a single receiver, defined by a specific address. For example, a letter goes to just one mail box. == Single-domain == In some domains, a multiple source and target schema (proprietary data formats) may exist. An exchange or interchange format is often developed for a single domain, and then necessary routines (mappings) are written to (indirectly) transform/translate each and every source schema to each and every target schema by using the interchange format as an intermediate step. That requires less work than writing and debugging the many routines that would be required to directly translate each source schema directly to each target schema. Examples of these transformative interchange formats include: Standard Interchange Format for geospatial data; Data Interchange Format for spreadsheet data; Open Document Format for spreadsheets, charts, presentations and word processing documents; GPS eXchange Format or Keyhole Markup Language for describing GPS data; GDSII for integrated circuit layout. == Representation == A data exchange (a.k.a. interchange) language defines a domain-independent way to represent data. These languages have evolved from being markup and display-oriented to support the encoding of metadata that describes the structural attributes of the information. Practice has shown that certain types of formal languages are better suited for this task than others, since their specification is driven by a formal process instead of particular software implementation. For example, XML is a markup language that was designed to enable the creation of dialects (the definition of domain-specific sublanguages). However, it does not contain domain-specific dictionaries or fact types. Beneficial to a reliable data exchange is the availability of standard dictionaries-taxonomies and tools libraries such as parsers, schema validators, and transformation tools. === XML === The popularity of XML for data exchange on the World Wide Web has several reasons. First of all, it is closely related to the preexisting standards Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML), and as such a parser written to support these two languages can be easily extended to support XML as well. For example, XHTML has been defined as a format that is formal XML, but understood correctly by most (if not all) HTML parsers. === YAML === YAML was designed to be human-readable and authored via a text editor with notion similar to reStructuredText and wiki syntax. YAML 1.2 also includes a shorthand notion that is compatible with JSON, and as such any JSON document is also valid YAML; this however does not hold the other way. === REBOL === REBOL was designed to be human-readable and authored via a text editor. It uses a simple free-form syntax with minimal punctuation and a rich set of data types (such as URL, email, date and time, tuple, string, tag) that respect common standards. It is designed to not need any additional meta-language, being designed in a metacircular fashion which is why the parse dialect used for definitions and transformations of REBOL dialects is also itself a dialect of REBOL. REBOL was used as a source of inspiration for JSON. === Gellish === Gellish English is a formalized subset of natural English (language), which includes a simple grammar and a large, extensible dictionary (taxonomy) that defines the general and domain specific terminology, whereas the concepts are arranged in a hierarchy, which supports inheritance of knowledge and requirements. The dictionary also includes standardized fact types. The terms and relation types together can be used to create and interpret expressions of facts, knowledge, requirements and other information. Gellish can be used in combination with SQL, RDF/XML, OWL and various other meta-languages. The Gellish standard is a combination of ISO 10303-221 (AP221) and ISO 15926. === List === The following describes and compares popular data exchange languages. Columns Schemas – Whether supports representing domain specific data structure definition Flexible – Whether supports extension of the semantic expression capabilities without modifying the schema Semantic verification – Whether supports semantic verification of the correctness of expressions in the language Dictionary – Whether includes a dictionary and a taxonomy (hierarchy) of concepts with inheritance Information model – Whether supports an information model Synonyms and homonyms – Whether supports the use of synonyms and homonyms in expressions Dialecting – Whether is available in multiple natural languages or dialects Web standard – Whether is standardized by a recognized body Transformations – Whether includes a translation to other standards Lightweight – Whether a lightweight version is available Human readable – Whether expressions are understandable without training Compatibility – Which other tools can be used or are required

    Read more →
  • Story (social media)

    Story (social media)

    In social media, a story is a function in which the user tells a narrative or provides status messages and information in the form of short, time-limited clips in an automatically running sequence. == Definition == A story is a short sequence of images, videos, or other social media content, which can be accompanied by backgrounds, music, text, stickers, animations, filters or emojis. Social media platforms typically advance through the sequence automatically when presenting a story to a viewer. Although the sequential nature of stories can be used to tell a narrative, the pieces of a story can also be unrelated. Social media platforms that offer stories will typically have a primary story for each user which consists of everything the user posted to their story over a certain period of time, usually the most recent 24 hours. Most stories cannot be changed afterwards and are only available for a short time. Stories are almost exclusively created on a mobile device such as a smartphone or tablet computer and are usually displayed vertically. == History == In October 2013, Snapchat first introduced the story function as a series of Snaps that can together tell a narrative through a chronological order, with each Snap being viewable by all of the poster's friends and deleted after 24 hours. Stories soon surpassed private Snaps to become Snapchat's most-viewed type of post. After 2015, Snapchat introduced a feature allowing users to post private stories viewable by a chosen subset of their friends. Later other apps would copy this feature. In August 2016, Instagram introduced a stories function that deletes the content after 24 hours. Various commenters have accused the site of copying Snapchat. In February 2017, the instant messenger WhatsApp introduced the Now Status stories function in beta, which was later renamed Status. In March 2017, a story function was introduced in Facebook Messenger. In February 2018, Google launched AMP Stories, bringing a story-style format to certain Google search results on mobile devices. In August 2018, YouTube introduced a stories function that initially was limited to pictures, but was later expanded to support short video clips. The feature was shut down in June 2023. In August 2018, the GIF website Giphy introduced a story function. In March 2022, TikTok added a story feature which allowed users to create 15 second long videos that delete after 24 hours. In June 2023, Telegram CEO Pavel Durov announced stories for Telegram would be released in July 2023. In July 2023, the feature was released for premium users, and in August 2023 it was rolled out for all users. == User motivations == In 2022, a study performed by Jia-Dai (Evelyn) Lu and Jhih-Syuan (Elaine) Lin examined the various motivations for updating stories on Instagram. The researchers found a new configuration of motivations for using Instagram Stories: exploration, self-enhancement, perceived functionality, entertainment, social sharing, relationship building, novelty, and surveillance. The findings also highlighted that contribution and creation activities are likely to result in positive emotions, while creation alone predicts negative emotions while updating stories on Instagram. == Usage statistics == In 2019, around 1.5 billion people worldwide every day on average used the stories function in a social network or messenger. Younger people in particular use this function. More than 20% of people aged 18 to 24 use Instagram stories, while it is just under 2% of those over 55. In a Facebook survey of 18,000 participants from 12 countries, 68% said they used the stories function at least once a month. Stories in the areas of fashion and tourism are particularly popular. The website Fanpage Karma analyzed several Instagram accounts and determined the average reach of posts and stories per follower, concluding that posts have a higher reach than stories, which often have less than half the reach.

    Read more →
  • AppBlock

    AppBlock

    AppBlock is a software tool for managing screen time that limits access to selected mobile applications and websites. Developed by the Czech studio MobileSoft, it is distributed for Android and iOS devices as well as through browser extensions for Google Chrome, Microsoft Edge and Brave, and as desktop solutions. The application is used primarily to restrict time spent on social media and similar distracting services while working and studying. By 2025, the application reported 700,000 monthly active users, with the domestic Czech market accounting for less than one percent of its total user base and revenue. == History == === Origins === AppBlock was created by the Czech software studio MobileSoft, based in Hradec Králové. The studio was founded in 2012 by Miroslav Novosvětský, who remains the sole owner. The idea for the application arose from the use of browser-based website blockers on desktop computers. AppBlock was conceived as a way to reduce the time spent on mobile devices. === Early releases === In its early phase, AppBlock was available only for phones running on Android. Early versions allowed users to limit access to selected applications and websites during specified periods. From the outset, the application was distributed internationally rather than only within the Czech market, and early coverage reported a multi-million number of downloads worldwide. === Expansion of functionality === Over time, AppBlock has expanded beyond basic application blocking to include additional functions related to limiting procrastination and managing attention. The development of AppBlock accelerated during the COVID-19 pandemic. Following a reduction in external client orders, the studio reallocated resources from contract development to the application. Increased digital content consumption during lockdowns contributed to a rise in the application's usage and revenue. As the application developed, it became the company's product with the largest user base. Novosvětský described an increase in downloads over a twelve-month period, which he linked in part to the company's activities abroad, including participation in events focused on mobile marketing in the United States. These activities were an important factor in the further development of AppBlock. === Internationalization and market expansion === Within roughly the first eight years of the company's existence, MobileSoft became active both in the domestic Czech market and in the United States, supported among other things by participation in the CzechAccelerator program, which is intended to help Czech firms enter foreign markets. In mid-August 2021 the developers launched a version for iOS, which soon began to attract paying users. The expansion to iOS was accompanied by plans for cooperation with the Procrastination.com platform, intended to complement the blocking functions with educational content related to digital media use, sleep and work habits. By 2025, AppBlock was localised into 15 languages, with the largest share of users in the United States, the United Kingdom, Germany, and France, with recent growth in Brazil, and usage extending across several continents. AppBlock has reached more than 10 million installations. In the same period its creators announced plans to refine existing functions and to expand support beyond mobile phones to desktop use, including through support for additional web browsers. == Features == === Supported platforms === AppBlock is distributed as a mobile application for Android and iOS users through Google Play and the Apple App Store. Browser extensions for desktop systems are available for Google Chrome, Microsoft Edge and Brave. === Functionality === AppBlock's core function is to restrict access to selected applications and websites. The mobile application shows a list of installed apps and lets the user select which ones to block. It also includes tools to block specific websites and, on iOS, to block certain phrases entered in the Safari browser. AppBlock can mute notifications from selected applications, so alerts from those apps do not appear while blocking is active. In addition to choosing which apps or content to block, the software also offers an allowlist mode, where only selected applications remain accessible and all others are blocked. Blocking rules are organized into configurable schedules, called profiles. Users can create profiles that define time periods when selected apps and websites are unavailable. Newer versions also allow profiles to be activated automatically based on the time of day, days of the week, the device's location, or connection to specific Wi-Fi networks. The iOS version lets users set limits on how often or how long certain apps can be used before they are blocked, and it can track and restrict screen time for individual apps. In addition to these recurring rules, AppBlock includes a Quick Block feature that temporarily blocks selected apps and websites with a single action, without requiring a separate long-term schedule. Strict Mode is an optional setting that limits the ability to change blocking once it is active. For a specified period, it prevents editing AppBlock's rules and can be configured to stop the app from being uninstalled during that time. While Strict Mode is enabled, users cannot modify or disable the restrictions they have set. Deactivation requires specific verification steps, such as connecting the device to a charger or obtaining approval from a designated contact person. The mobile application also includes statistical and reporting features. In addition to blocking, AppBlock lets users view statistics and data about their use of applications and websites, including screen-time summaries and focus sessions that silence notifications and enforce blocking during defined work or study periods. Browser extensions for desktop environments apply AppBlock's website-blocking functions on Windows and macOS systems through supported web browsers. == Business model == AppBlock uses a freemium revenue model. The basic version of the application is available free of charge and allows blocking of up to three applications at the same time. The premium version removes this limit and adds further configuration options. In 2020, the application shifted from a one-time payment structure to a subscription model. By 2021, AppBlock had more than seven thousand paying users and annual revenue of about four million Czech crowns. By 2025, annual revenue reached approximately 4 million US dollars (80 million CZK) before taxes and platform fees, with roughly 20 percent of active users subscribing to the paid version. == Usage == AppBlock limits access to selected applications and websites in order to reduce smartphone overuse and digital distraction. It is used to block social media, games and other services considered addictive, with the aim of reducing frequent checking of mobile devices and creating time intervals in which these services are unavailable. Reported use cases of AppBlock cover work, students, parents, ADHD, mental health, well-being and business. The application is used both by individual users and within workplace initiatives in which employees install it to reduce digital distractions during working hours.

    Read more →
  • CANaerospace

    CANaerospace

    CANaerospace is a higher layer protocol based on Controller Area Network (CAN) which has been developed by Stock Flight Systems in 1998 for aeronautical applications. == Background == CANaerospace supports airborne systems employing the Line-replaceable unit (LRU) concept to share data across CAN and ensures interoperability between CAN LRUs by defining CAN physical layer characteristics, network layers, communication mechanisms, data types and aeronautical axis systems. CANaerospace is an open source project, was initiated to standardize the interface between CAN LRUs on the system level. CANaerospace is continuously being developed further and has also been published by NASA as the Advanced General Aviation Transport Experiments Databus Standard in 2001. It found widespread use in aeronautical research worldwide. A major research aircraft that employs several CANaerospace networks for real-time computer interconnection is the Stratospheric Observatory for Infrared Astronomy (SOFIA), a Boeing 747SP with a 2.5m astronomic telescope. CANaerospace is also frequently used in flight simulation and connects entire aircraft cockpits (i.e. in Eurofighter Typhoon simulators) to the simulation host computers. In Italy CANaerospace is used as UAV data bus technology. Furthermore, CANaerospace serves as communication network in several general aviation avionics systems. The CANaerospace interface definition closes the gap between the ISO/OSI layer 1 and 2 CAN protocol (which is implemented in the CAN controller itself) and the specific requirements of distributed systems in aircraft. It may be used as a primary or ancillary avionics network and was designed to meet the following requirements: Democratic network: CANaerospace does not require any master/slave relationships between LRUs or a "bus controller", thereby avoiding a potential single source of failure. Every node in the network has the same rights for participation in the bus traffic. Self-identifying message format: Each CANaerospace message contains information about the type of the data and the transmitting node. This allows the data to be unambiguously recognized at each receiving node. Continuous Message Numbering: Each CANaerospace message contains a continuously incremented number which allows coherent processing of messages in the receiving stations. Message Status Code: Each CANaerospace message contains information about the integrity of the data is conveying. This allows receiving stations to evaluate the quality of the received data and to react accordingly. Emergency Event Signaling: CANaerospace defines a mechanism that allows each node to transmit information about exception or error situations. This information can be used by other stations to determine the network health. Node Service Interface: As an enhancement to CAN, CANaerospace provides a means for individual stations on the network to communicate with each other using connection-oriented and connectionless services. Predefined CAN Identifier Assignment: CANaerospace offers a predefined identifier assignment list for normal operation data. In addition to the predefined list, user-defined identifier assignment lists may be used. Ease of Implementation: The amount of code to implement CANaerospace is very little by design in order to minimize the effort for testing and certification of flight safety critical systems. Openness to Extensions: All CANaerospace definitions are extendable to provide flexibility for future enhancements and to allow adaptions to the requirements of specific applications. Free Availability: No cost whatsoever apply for the use of CANaerospace. The specification can be downloaded from the Internet == Physical interface == To ensure interoperability and reliable communication, CANaerospace specifies the electrical characteristics, bus transceiver requirements and data rates with the corresponding tolerances based on ISO 11898. The bit timing calculation (baud rate accuracy, sample point definition) and robustness to electromagnetic interference are given special emphasis. Also addressed are CAN connector, wiring considerations and design guidelines to maximize electromagnetic compatibility. == Communication layers == The Bosch CAN specification itself allows messages being transmitted both periodically and aperiodically but does not cover issues like data representation, node addressing or connection-oriented protocols. CAN is entirely based on Anyone-to-Many (ATM) communication which means that CAN messages are always received by all stations in the network. The advantage of the CAN concept is inherent data consistency between all stations, the drawback is that it does not allow node addressing which is the basis for Peer-to-Peer (PTP) communication. Using CAN networks in aeronautical applications, however, demands a standard targeted to the specific requirements of airborne systems which implies that communication between individual stations in the network must be possible to enable the required degree of system monitoring. Consequently, CANaerospace defines additional ISO/OSI layer 3, 4 and 6 functions to support node addressing and unified ATM/PTP communication mechanisms. PTP communication allows to set up client/server interactions between individual stations in the network either temporarily or permanently. More than one of these interactions may be in effect at any given time and each node may be client for one operation and server for another at the same time. This CANaerospace mechanism is called "Node Service Concept" and allows i.e. to distribute system functions over several stations in the network or to control dynamic system reconfiguration in case of failure. The Node Service concept supports both connection-oriented and connectionless interactions like with TCP/IP and UDP/IP for Ethernet. Enabling both ATM and PTP communication for CAN requires the introduction of independent network layers to isolate the different types of communication. This is realized for CANaerospace by forming CAN identifier groups as shown in Figure 1. The resulting structure creates Logical Communication Channels (LCCs) and assigns a specific communication type (ATM, PTP) to each of the LCCs. User-defined LCCs provide the necessary freedom for designers and allow the implementation of CANaerospace according to the needs of specific applications. Figure 1: Logical Communication Channels for CANaerospace As a side effect, the CAN identifier groups in Figure 1 affect the priority of the message transmission in case of bus arbitration. The communication channels are therefore arranged according to their relative importance: Emergency Event Data Channel (EED): This communication channel is used for messages which require immediate action (i.e. system degradation or reconfiguration) and have to be transmitted with very high priority. Emergency Event Data uses ATM communication exclusively. High/Low Priority Node Service Data Channel (NSH/NSL): These communication channels are used for client/server interactions using PTP communication. The corresponding services may be of the connection-oriented as well as the connectionless type. NSH/NSL may also be used to support test and maintenance functions. Normal Operation Data Channel (NOD): This communication channel is used for the transmission of the data which is generated during normal system operation and described in the CANaerospace identifier assignment list. These messages may be transmitted periodically or aperiodically as well as synchronously or asynchronously. All messages which cannot be assigned to other communication channels shall use this channel. High/Low Priority User-Defined Data Channel (UDH/UDL): This channel is dedicated to communication which cannot, due to their specific characteristics, be assigned other channels without violating the CANaerospace specification. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. To ensure interoperability it is highly recommended that the use of these channels is minimized. Debug Service Data Channel (DSD): This channel is dedicated to messages which are used temporarily for development and test purposes only and are not transmitted during normal operation. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. == Data representation == The majority of the real-time control systems used in aeronautics employ "big endian" processor architectures. This data representation was therefore specified for CANaerospace as well. With big endian data representation, the most significant bit of any datum is arranged leftmost and transmitted first on CANaerospace as shown in Figure 2. Figure 2: "Big Endian" Data Representation for CANaerospace CANaerospace uses a self-identifying message

    Read more →
  • Semi-Automatic Ground Environment

    Semi-Automatic Ground Environment

    The Semi-Automated Ground Environment (SAGE) was a system of large computers and associated networking equipment that coordinated data from many radar sites and processed it to produce a single unified image of the airspace over a wide area. SAGE directed and controlled the NORAD response to a possible Soviet air attack, operating in this role from the late 1950s into the 1980s. The processing power behind SAGE was supplied by the largest discrete component-based computer ever built, the AN/FSQ-7, manufactured by IBM. Each SAGE Direction Center (DC) housed an FSQ-7 which occupied an entire floor, approximately 22,000 square feet (2,000 m2) not including supporting equipment. The FSQ-7 was actually two computers, "A" side and "B" side. Computer processing was switched from "A" side to "B" side on a regular basis, allowing maintenance on the unused side. Information was fed to the DCs from a network of radar stations as well as readiness information from various defense sites. The computers, based on the raw radar data, developed "tracks" for the reported targets, and automatically calculated which defenses were within range. Operators used light guns to select targets on-screen for further information, select one of the available defenses, and issue commands to attack. These commands would then be automatically sent to the defense site via teleprinter. Connecting the various sites was an enormous network of telephones, modems and teleprinters. Later additions to the system allowed SAGE's tracking data to be sent directly to CIM-10 Bomarc missiles and some of the US Air Force's interceptor aircraft in-flight, directly updating their autopilots to maintain an intercept course without operator intervention. Each DC also forwarded data to a Combat Center (CC) for "supervision of the several sectors within the division" ("each combat center [had] the capability to coordinate defense for the whole nation"). SAGE became operational in the late 1950s and early 1960s at an estimated total cost between 8 and 12 billion dollars, four times the cost of the Manhattan Project. Throughout its development, there were continual concerns about its real ability to deal with large attacks, and the Operation Sky Shield tests showed that only about one-fourth of enemy bombers would have been intercepted. Nevertheless, SAGE was the backbone of NORAD's air defense system into the 1980s, by which time the tube-based FSQ-7s were increasingly costly to maintain and completely outdated. Today the same command and control task is carried out by microcomputers, based on the same basic underlying data. == Background == === Earlier systems === Just prior to World War II, Royal Air Force (RAF) tests with the new Chain Home (CH) radars had demonstrated that relaying information to the fighter aircraft directly from the radar sites was not feasible. The radars determined the map coordinates of the enemy, but could generally not see the fighters at the same time. This meant the fighters had to be able to determine where to fly to perform an interception but were often unaware of their own exact location and unable to calculate an interception while also flying their aircraft. The solution was to send all of the radar information to a central control station where operators collated the reports into single tracks, and then reported these tracks to the airbases, or sectors. The sectors used additional systems to track their own aircraft, plotting both on a single large map. Operators viewing the map could then see what direction their fighters would have to fly to approach their targets and relay that simply by telling them to fly along a certain heading or vector. This Dowding system was the first ground-controlled interception (GCI) system of large scale, covering the entirety of the UK. It proved enormously successful during the Battle of Britain, and is credited as being a key part of the RAF's success. The system was slow, often providing information that was up to five minutes out of date. Against propeller driven bombers flying at perhaps 225 miles per hour (362 km/h) this was not a serious concern, but it was clear the system would be of little use against jet-powered bombers flying at perhaps 600 miles per hour (970 km/h). The system was extremely expensive in manpower terms, requiring hundreds of telephone operators, plotters and trackers in addition to the radar operators. This was a serious drain on manpower, making it difficult to expand the network. The idea of using a computer to handle the task of taking reports and developing tracks had been explored beginning late in the war. By 1944, analog computers had been installed at the CH stations to automatically convert radar readings into map locations, eliminating two people. Meanwhile, the Royal Navy began experimenting with the Comprehensive Display System (CDS), another analog computer that took X and Y locations from a map and automatically generated tracks from repeated inputs. Similar systems began development with the Royal Canadian Navy, DATAR, and the US Navy, the Naval Tactical Data System (NTDS). A similar system was also specified for the Nike SAM project, specifically referring to a US version of CDS, coordinating the defense over a battle area so that multiple batteries did not fire on a single target. All of these systems were relatively small in geographic scale, generally tracking within a city-sized area. === Valley Committee === When the Soviet Union tested its first atomic bomb in August 1949, the topic of air defense of the US became important for the first time. A study group, the "Air Defense Systems Engineering Committee", was set up under the direction of Dr. George Valley to consider the problem and is known to history as the "Valley Committee". Their December report noted a key problem in air defense using ground-based radars. A bomber approaching a radar station would detect the signals from the radar long before the reflection off the bomber was strong enough to be detected by the station. The committee suggested that when this occurred, the bomber would descend to low altitude, thereby greatly limiting the radar horizon, allowing the bomber to fly past the station undetected. Although flying at low altitude greatly increased fuel consumption, the team calculated that the bomber would only need to do this for about 10% of its flight, making the fuel penalty acceptable. The only solution to this problem was to build a huge number of stations with overlapping coverage. At that point the problem became one of managing the information. Manual plotting was ruled out as too slow, and a computerized solution was the only possibility. To handle this task, the computer would need to be fed information directly, eliminating any manual translation by phone operators, and it would have to be able to analyze that information and automatically develop tracks. A system tasked with defending cities against the predicted future Soviet bomber fleet would have to be dramatically more powerful than the models used in the NTDS or DATAR. The Committee then had to consider whether or not such a computer was possible. The Valley Committee was introduced to Jerome Wiesner, associate director of the Research Laboratory of Electronics at MIT. Wiesner noted that the Servomechanisms Laboratory had already begun development of a machine that might be fast enough. This was the Whirlwind I, originally developed for the Office of Naval Research as a general purpose flight simulator that could simulate any current or future aircraft by changing its software. Wiesner introduced the Valley Committee to Whirlwind's project lead, Jay Forrester, who convinced him that Whirlwind was sufficiently capable. In September 1950, an early microwave early-warning radar system at Hanscom Field was connected to Whirlwind using a custom interface developed by Forrester's team. An aircraft was flown past the site, and the system digitized the radar information and successfully sent it to Whirlwind. With this demonstration, the technical concept was proven. Forrester was invited to join the committee. === Project Charles === With this successful demonstration, Louis Ridenour, chief scientist of the Air Force, wrote a memo stating "It is now apparent that the experimental work necessary to develop, test, and evaluate the systems proposals made by ADSEC will require a substantial amount of laboratory and field effort." Ridenour approached MIT President James Killian with the aim of beginning a development lab similar to the war-era Radiation Laboratory that made enormous progress in radar technology. Killian was initially uninterested, desiring to return the school to its peacetime civilian charter. Ridenour eventually convinced Killian the idea was sound by describing the way the lab would lead to the development of a local electronics industry based on the needs of the lab and the students who would leave the lab to start their

    Read more →
  • Harvest now, decrypt later

    Harvest now, decrypt later

    Harvest now, decrypt later (HNDL) is a surveillance strategy that relies on the acquisition and long-term storage of currently unreadable encrypted data awaiting possible breakthroughs in decryption technology that would render it readable in the future—a hypothetical date referred to as Y2Q (a reference to Y2K), or Q-Day. The most common concern is the prospect of developments in quantum computing which would allow current strong encryption algorithms to be broken at some time in the future, making it possible to decrypt any stored material that had been encrypted using those algorithms. However, the improvement in decryption technology need not be due to a quantum-cryptographic advance; any other form of attack capable of enabling decryption would be sufficient. The existence of this strategy has led to concerns about the need to urgently deploy post-quantum cryptography; even though no practical quantum attacks yet exist, some data stored now may still remain sensitive even decades into the future. As of 2022, the U.S. federal government has proposed a roadmap for organizations to start migrating toward quantum-cryptography-resistant algorithms to mitigate these threats. This new version of Commercial National Security Algorithm Suite uses publicly-available algorithms and is allowed for government use up to the TOP SECRET level. == Terminology and scope == The term “harvest now, decrypt later” encompasses various surveillance or espionage operations in which ciphertext or encrypted communications are collected today with the view that they may one day be decrypted, given sufficient advances in computing power or cryptanalysis. The abbreviation HNDL is sometimes used in technical and policy documents. The “Y2Q” (or “Q-Day”) label draws an analogy to the Y2K date-change issue, emphasising a potential future point at which current cryptography may collapse. The strategy is particularly relevant for data with long confidentiality lifetimes, such as diplomatic communications, personal health records, critical infrastructure logs, or intellectual property. == Mitigation strategies == The primary defense against HNDL attacks is the transition to post-quantum cryptography (PQC), which utilizes algorithms believed to be secure against quantum computer attacks. However, because PQC protects the data payload digitally, rather than the transmission itself, the encrypted data can still be harvested and stored. A complementary approach involves physical layer security (also known as optical layer encryption or photonic shielding). Unlike algorithmic encryption, this method modifies the optical waveform itself—often by burying the signal within optical noise or using spectral phase encoding—to render the transmission unrecordable by standard receivers. By preventing the attacker from capturing a valid signal in the first place, this approach aims to eliminate the "harvest" phase of the threat. Commercial implementations of harvest-proof optical encryption have been developed by firms such as CyberRidge to secure long-haul fiber networks. Field trials have demonstrated 100 Gbps throughput over legacy DWDM networks using this method.

    Read more →
  • Comparison of machine learning software

    Comparison of machine learning software

    The following tables are a comparison of machine learning software such as software frameworks, libraries, and computer programs used for machine learning. == Machine learning software == == Other comparisons == == Machine learning helper libraries and platforms == Apache OpenNLP — natural language processing toolkit CUDA — GPU computing platform used to accelerate machine learning and deep learning workloads Horovod — distributed training framework for deep learning Hugging Face Transformers — library of pretrained transformer models built on other machine learning frameworks Kubeflow — machine learning platform for Kubernetes Mallet — toolkit for natural language processing and text analysis NumPy — numerical computing library used in machine learning OpenCV — computer vision library with machine learning functions ONNX — open format for representing machine learning models pandas — data analysis and data preparation library used in machine learning PlaidML — tensor compiler and backend for machine learning frameworks Polars — Dataframe library used for machine learning data preparation and analysis PyArrow — columnar data library used in machine learning data processing ROOT (TMVA) — data analysis framework with machine learning tools SciPy — scientific computing and optimization library used in machine learning == Online development environments for machine learning == Google Colab — hosted Jupyter Notebook environment commonly used for machine learning and deep learning JupyterLab — notebook-based development environment for machine learning and data science Jupyter Notebook — interactive notebook environment used for machine learning and data science Kaggle — online data science and machine learning platform

    Read more →
  • Cryptographic bill of materials

    Cryptographic bill of materials

    Cryptographic bill of materials (CBOM—also cryptography bill of materials) is a structured inventory of all cryptographic assets present in a software, firmware, device, or system. It enumerates algorithms (and parameters such as key sizes and modes), cryptographic libraries or modules, digital certificates, keys and related material, and protocols in use, and maps their relationships to the components that implement or invoke them. CBOMs are used to improve security analysis, compliance, and cryptographic agility, and are increasingly referenced in guidance for post‑quantum cryptography (PQC) migration. == Definition and scope == A CBOM inventories cryptographic primitives and materials—such as encryption and signature algorithms (with specific variants and modes), key sizes, cryptographic libraries/modules, digital certificates (e.g., X.509), keys and other related cryptographic material, and security protocols (e.g., TLS, IPsec). It also documents dependencies (for example, an application uses an algorithm provided by a library; a protocol uses several algorithms) and can capture certificate lifecycles, cryptographic module certifications (e.g., FIPS 140‑3), and policy conformance metadata. In common practice, a CBOM may be embedded within an SBOM format (such as CycloneDX) or exported as a separate, linked artifact. === Typical CBOM fields === The exact schema varies by implementation, but common fields are summarized below (see CycloneDX CBOM guide and NIST SP 1800‑38B). == Relation to SBOM == A CBOM is complementary to, but distinct from, a software bill of materials (SBOM). Whereas an SBOM lists software components and their versions, a CBOM focuses specifically on the cryptography present and how it is configured and used. For example, an SBOM might enumerate inclusion of a library such as OpenSSL, while the CBOM would identify which algorithms and parameters that library enables (e.g., RSA‑2048, ECDH P‑256, AES‑GCM) and list relevant keys and certificates. The pairing enables both supply‑chain transparency and cryptographic transparency. == History == The term and practice emerged in the early–mid 2020s alongside software‑supply‑chain transparency and PQC planning. The OWASP CycloneDX standard introduced native CBOM support (v1.6 and later), modeling algorithms, keys, certificates, and protocols as first‑class “cryptographic assets” and providing dependency semantics (uses/implements) between software and cryptography. Open tooling from industry and researchers (e.g., IBM's CBOMkit and related generators/viewers) appeared to automate discovery and representation of cryptographic use in the CycloneDX CBOM schema. == Regulatory and policy context == In the United States, policy has emphasized cryptographic inventories as a prerequisite to PQC migration. The White House's National Security Memorandum 10 (2022) directed a government‑wide transition to quantum‑resistant cryptography; the Office of Management and Budget's M‑23‑02 (November 2022) operationalized this by requiring agencies to submit a prioritized inventory of cryptographic systems (with algorithm and key details) by 4 May 2023 and annually thereafter, and tasked CISA/NSA/NIST to develop automated discovery and inventory strategies. A 2024 Office of the National Cyber Director report reiterated that a “comprehensive cryptographic inventory” is the baseline for PQC planning and must be maintained iteratively with both automated and manual discovery. NIST's NCCoE practice guide (SP 1800‑38B, preliminary draft) provides concrete methods for cryptographic discovery and documentation across enterprises, aligning with CBOM‑style representations. CISA later published a strategy to migrate federal agencies to automated cryptography discovery and inventory tools to support continuous reporting. Separately, NSA, CISA, and NIST issued joint guidance encouraging all organisations to prepare cryptographic inventories and roadmaps for PQC, beyond government environments. == Role in quantum readiness and cryptographic agility == Because large‑scale quantum computing threatens widely used public‑key algorithms (e.g., RSA, ECC), organisations are planning multi‑year transitions to post-quantum cryptography. CBOMs enable that planning by identifying where quantum‑vulnerable algorithms appear, prioritising high‑impact systems, and tracking replacements over time. A machine‑readable CBOM also supports cryptographic agility and incident response: if an algorithm, library, or certificate lifecycle becomes non‑compliant or vulnerable, the CBOM indicates which products and systems are affected and where mitigations must be applied first. == Standards and tooling == CycloneDX (OWASP): Native CBOM modelling (v1.6+) for algorithms, certificates, keys/related material, and protocols, with dependency semantics and examples. The project publishes a CBOM guide and use‑case profiles (e.g., certificate and algorithm inventories). NIST NCCoE SP 1800‑38 series: Practice guides for PQC migration include enterprise cryptographic discovery methods that produce CBOM‑like inventories and integrate multiple discovery tools. Government automation initiatives: Following M‑23‑02, CISA issued a strategy to migrate to automated cryptography discovery and inventory tools to support agency reporting and continuous inventory management. Open‑source and vendor tools: IBM's CBOMkit and related components generate, analyse, and visualise CBOMs; the IBM CBOM specification work was upstreamed into CycloneDX 1.6. === Data model and interchange (example) === CycloneDX provides machine‑readable encodings (JSON/XML) for CBOM content. The example below (subset) shows an application depending on a crypto library that provides the AES‑256‑GCM algorithm, and the application also depends on a leaf X.509 certificate. See the CycloneDX CBOM guide, JSON reference, and the “Implementation details” use‑case for the semantics of `dependsOn` and `provides`. == Relationship to cybersecurity supply chain initiatives == CBOMs complement SBOM‑focused supply‑chain transparency introduced by U.S. Executive Order 14028 and NTIA/NIST SBOM work. SBOMs document software components; CBOMs add detail on embedded cryptography to support risk management, policy compliance (e.g., disallowing deprecated algorithms), and PQC transition planning.

    Read more →
  • European Grid Infrastructure

    European Grid Infrastructure

    EGI (originally an initialism for European Grid Infrastructure) is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation. As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in their intensive data analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provideds services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation). == Name == Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate. To emphasise the broader scope of EGI's services and avoid any confusion associated with the outdated term "grid," it is recommended to refer to EGI simply as EGI. == Structure == === EGI Federation === The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation. === EGI Foundation === The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed Artificial Intelligence/Machine Learning, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term. === EGI Council === The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem. === EGI Services === EGI offers a suite of services to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and infrastructure manager. Storage and data management solutions include online storage, data transfer, and DataHub. Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing. == History == In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project under the lead of CERN with tens of technical architects from the major High Energy Physics institutes in the world. For the first time, distributed computing was applied to data-intensive processing. It aimed at developing a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology science applications. On 28 February 2003, the first software release of LCG-MW was published. gLite, the Lightweight Middleware for Grid Computing and LCG, Large Hadron Collider Computing Grid, are the cornerstone of the Worldwide LHC Computing Grid, which expanded over time towards the EGI Federation. 2004 marks the year of the first pilot infrastructure, seeing the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. In 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal provides the accounting data for Compute, Storage and Data services gathered from the data centres of the EGI Federation. A few years later, in 2010, EGI was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure to support European research communities primarily. In the same year, EGI launched the flagship project EGI Inspire. That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-Inspire harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning, opening a new avenue in large-scale interactive data analysis. In 2015, within EGI Engage, opening a new avenue in large-scale interactive data analysis. The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper on a 'European Open Science Cloud for Research'. With the EOSC-hub project in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, like EOSC Enhance, EOSC Life and EOSC Synergy. With EGI-ACE and its contribution to EOSC Future, EGI has continued developing the EOSC Core. In early 2024, EGI started providing services to the EOSC EU Node, and with EOSC Beyond it will provide new EOSC Core capabilities and pilot additional national and thematic nodes. In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.

    Read more →
  • Signatures with efficient protocols

    Signatures with efficient protocols

    Signatures with efficient protocols are a form of digital signature invented by Jan Camenisch and Anna Lysyanskaya in 2001. In addition to being secure digital signatures, they need to allow for the efficient implementation of two protocols: A protocol for computing a digital signature in a secure two-party computation protocol. A protocol for proving knowledge of a digital signature in a zero-knowledge protocol. In applications, the first protocol allows a signer to possess the signing key to issue a signature to a user (the signature owner) without learning all the messages being signed or the complete signature. The second protocol allows the signature owner to prove that he has a signature on many messages without revealing the signature and only a (possibly) empty subset of the messages. The combination of these two protocols allows for the implementation of digital credential and ecash protocols.

    Read more →
  • Database index

    Database index

    A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. An index is a copy of selected columns of data, from a table, that is designed to enable very efficient search. An index normally includes a "key" or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of partial index, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. == Usage == === Support for fast lookup === Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases. Suppose a database contains N data items and one must be retrieved based on the value of one of the fields. A simple implementation retrieves and examines each item according to the test. If there is only one matching item, this can stop when it finds that single item, but if there are multiple matches, it must test everything. This means that the number of operations in the average case is O(N) or linear time. Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance. An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose. There are complex design trade-offs involving lookup performance, index size, and index-update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O(1)) performance. === Policing the database constraints === Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN KEY. An index may be declared as UNIQUE, which creates an implicit constraint on the underlying table. Database systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of using an already-existing index to police this constraint. Many database systems require that both referencing and referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts, updates and deletes to the tables participating in the constraint. Some database systems support an EXCLUSION constraint that ensures that, for a newly inserted or updated record, a certain predicate holds for no other record. This can be used to implement a UNIQUE constraint (with equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint. == Index architecture and indexing methods == === Non-clustered === The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines). In a non-clustered index, The physical order of the rows is not the same as the index order. The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses. There can be more than one non-clustered index on a database table. === Clustered === Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). === Cluster === When multiple databases and multiple tables are joined, it is called a cluster (not to be confused with clustered index described previously). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. The cluster configuration defines the data layout in the tables that are parts of the cluster. A cluster can be keyed with a B-tree index or a hash table. The data block where the table record is stored is defined by the value of the cluster key. == Column order == The order that the index definition defines the columns in is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, in a phone book organized by city first, then by last name, and then by first name, in a particular city, one can easily extract the list of all phone numbers. However, it would be very tedious to find all the phone numbers for a particular last name. One would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index. In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all the three fields, search time is minimal—but if we provide the values for city and first_name only, the search uses only the city field to retrieve all matched records. Then a sequential lookup checks the matching with first_name. So, to improve the performance, one must ensure that the index is created on the order of search columns. == Applications and limitations == Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the index data structure (typically a B-tree) until the Smith entry has been found; this is much less computationally expensive than a full table scan. Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@wikipedia.org';. This query would yield an email address for every customer whose email address ends with "@wikipedia.org", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@wikipedia.org');. This puts the wild-card at the right-most part of the query (now gro.aidepikiw@%), which the index on reverse(email_address) can satisfy. When the wildcard characters are used on both sides of the search word as %wikipedia.org%, the index available on this field is not used. Rather only a sequential search is performed, which takes ⁠ O ( N ) {\displaystyle

    Read more →
  • European Grid Infrastructure

    European Grid Infrastructure

    EGI (originally an initialism for European Grid Infrastructure) is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation. As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in their intensive data analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provideds services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation). == Name == Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate. To emphasise the broader scope of EGI's services and avoid any confusion associated with the outdated term "grid," it is recommended to refer to EGI simply as EGI. == Structure == === EGI Federation === The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation. === EGI Foundation === The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed Artificial Intelligence/Machine Learning, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term. === EGI Council === The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem. === EGI Services === EGI offers a suite of services to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and infrastructure manager. Storage and data management solutions include online storage, data transfer, and DataHub. Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing. == History == In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project under the lead of CERN with tens of technical architects from the major High Energy Physics institutes in the world. For the first time, distributed computing was applied to data-intensive processing. It aimed at developing a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology science applications. On 28 February 2003, the first software release of LCG-MW was published. gLite, the Lightweight Middleware for Grid Computing and LCG, Large Hadron Collider Computing Grid, are the cornerstone of the Worldwide LHC Computing Grid, which expanded over time towards the EGI Federation. 2004 marks the year of the first pilot infrastructure, seeing the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. In 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal provides the accounting data for Compute, Storage and Data services gathered from the data centres of the EGI Federation. A few years later, in 2010, EGI was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure to support European research communities primarily. In the same year, EGI launched the flagship project EGI Inspire. That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-Inspire harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning, opening a new avenue in large-scale interactive data analysis. In 2015, within EGI Engage, opening a new avenue in large-scale interactive data analysis. The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper on a 'European Open Science Cloud for Research'. With the EOSC-hub project in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, like EOSC Enhance, EOSC Life and EOSC Synergy. With EGI-ACE and its contribution to EOSC Future, EGI has continued developing the EOSC Core. In early 2024, EGI started providing services to the EOSC EU Node, and with EOSC Beyond it will provide new EOSC Core capabilities and pilot additional national and thematic nodes. In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.

    Read more →
  • Key Transparency

    Key Transparency

    Key Transparency allows communicating parties to verify public keys used in end-to-end encryption. In many end-to-end encryption services, to initiate communication a user will reach out to a central server and request the public keys of the user with which they wish to communicate. If the central server is malicious or becomes compromised, a man-in-the-middle attack can be launched through the issuance of incorrect public keys. The communications can then be intercepted and manipulated. Additionally, legal pressure could be applied by surveillance agencies to manipulate public keys and read messages. With Key Transparency, public keys are posted to a public log that can be universally audited. Communicating parties can verify public keys used are accurate.

    Read more →