eLabFTW is a web application written by Nicolas Carpi in PHP which can be used to create personal and common logbooks. It has been developed at the Curie Institute originally. Besides there, it is used on universities around the world eLabFTW is licensed under the GNU Affero General Public License as free software. It is translated into seven languages. == Description == eLabFTW is a free and open-source lab book. It is written in PHP and uses a MySQL database. Docker containers are also available. Among the various features are Secure. Entries and transmission are encrypted Timestamps. RFC 3161 compliant timestamping of experiments. Inventory management. Apart from experience logs, it also can manage the inventory Import and export. Entries can be imported and exported == Platforms == eLabFTW is a PHP package with Mysql database. Therefore, it can be executed on most servers. Furthermore, the docker containers allow to run it almost everywhere. == Usage == eLabFTW is used by various universities, like University of Alberta, Berkeley University, Hanover Medical School, Cardiff University and UMC Utrecht
Elastic cloud storage
An elastic cloud is a cloud computing offering that provides variable service levels based on changing needs. Elasticity is an attribute that can be applied to most cloud services. It states that the capacity and performance of any given cloud service can expand or contract according to a customer's requirements and that this can potentially be changed automatically as a consequence of some software-driven event or, at worst, can be reconfigured quickly by the customer's infrastructure management team. Elasticity has been described as one of the five main principles of cloud computing by Rosenburg and Mateos in The Cloud at Your Service - Manning 2011. == History == Cloud computing was first described by Gillet and Kapor in 1996; however, the first practical implementation was a consequence of a strategy to leverage Amazon's excess data center capacity. Amazon and other pioneers of the commercial use of this technology were primarily interested in providing a “public” cloud service, whereby they could offer customers the benefits of using the cloud, particularly the utility-based pricing model benefit. Other suppliers followed suit with a range of cloud-based models all offering elasticity as a core component, but these suppliers were only offering this service as an element of their public cloud service. Due to perceived weaknesses in security, or at least a lack of proven compliance, many organizations, particularly in the financial and public sectors, have been slow adopters of cloud technologies. These wary organizations can achieve some of the benefits of cloud computing by adopting private cloud technologies. An alternative form of the elastic cloud has been offered by vendors such as EMC and IBM, whereby the service is based around an enterprise's own infrastructure but still retains elements of elasticity and the potential to bill by consumption. == Description == Elasticity in cloud computing is the ability for the organization to adjust its storage requirements in terms of capacity and processing with respect to operational requirements. This has the following benefits: Operational Benefits - Services can be acquired quickly, meaning that the evolving requirements of the business can be addressed almost immediately, giving an organization a potential agility advantage. A properly implemented elastic system will provision/de-provision according to application demands, so if a particular business has activity spikes then the provision can be enabled to match the demand and the capacity can be re-allocated. Research and Development (R&D) Projects - R&D activities are no longer hindered by a requirement to secure a capex budget prior to a project starting. Capability can simply be provisioned from the cloud and released at the end of the exercise. Testing and Deployment - With most large-scale projects a size test needs to be performed prior to final rollout. By taking advantage of the elasticity of the cloud and creating a full-scale avatar of the proposed production system, realistic data and traffic volumes can be provisioned and released as needed. Expensive Resources Allocated - This will normally apply only in the context where a customer is applying at least some of their own servers as part of a cloud infrastructure, specifically where a business (for performance reasons) has decided to invest in solid-state storage as opposed to spinning platters. There are instances when, due to activity spikes, a less critical process may need to be moved from the high-performance resources to more traditional storage. Server Specification - When a customer has elected to own/lease hardware, they can select and specify servers that are specifically tuned to meet the likely needs of their operation (i.e., directly controlling the cost/benefit equation). Utility Based Payments - There is, of course, a key cost driver in this process, and the notion that you should pay for what you consume is acceptable for many organizations. When hardware capacity is sourced internally, organizations need to over-provision. This applies just as much to traditional outsourcing as it does to capex-related expenditure on in-house servers. Cloud Platform – At the heart of any cloud storage system is the ability to manage hyperscale object storage and a Hadoop Distributed Files System (HDFS). Elastic storage capability is particularly well suited to hyperscale and Hadoop environments, where its capability to rapidly respond to changing circumstances and priorities is essential
Storage area network
A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems. Newer SAN configurations enable hybrid SAN and allow traditional block storage that appears as local storage but also object storage for web services through APIs. == Storage architectures == Storage area networks (SANs) are sometimes referred to as network behind the servers and historically developed out of a centralized data storage model, but with its own data network. A SAN is, at its simplest, a dedicated network for data storage. In addition to storing data, SANs allow for the automatic backup of data, and the monitoring of the storage as well as the backup process. A SAN is a combination of hardware and software. It grew out of data-centric mainframe architectures, where clients in a network can connect to several servers that store different types of data. To scale storage capacities as the volumes of data grew, direct-attached storage (DAS) was developed, where disk arrays or just a bunch of disks (JBODs) were attached to servers. In this architecture, storage devices can be added to increase storage capacity. However, the server through which the storage devices are accessed is a single point of failure, and a large part of the LAN network bandwidth is used for accessing, storing and backing up data. To solve the single point of failure issue, a direct-attached shared storage architecture was implemented, where several servers could access the same storage device. DAS was the first network storage system and is still widely used where data storage requirements are not very high. Out of it developed the network-attached storage (NAS) architecture, where one or more dedicated file server or storage devices are made available in a LAN. Therefore, the transfer of data, particularly for backup, still takes place over the existing LAN. If more than a terabyte of data was stored at any one time, LAN bandwidth became a bottleneck. Therefore, SANs were developed, where a dedicated storage network was attached to the LAN, and terabytes of data are transferred over a dedicated high speed and bandwidth network. Within the SAN, storage devices are interconnected. Transfer of data between storage devices, such as for backup, happens behind the servers and is meant to be transparent. In a NAS architecture data is transferred using the TCP and IP protocols over Ethernet. Distinct protocols were developed for SANs, such as Fibre Channel, iSCSI, Infiniband. Therefore, SANs often have their own network and storage devices, which have to be bought, installed, and configured. This makes SANs inherently more expensive than NAS architectures. == Components == SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays, JBODs and tape libraries. === Host layer === Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN. Such servers have host adapters, which are cards that attach to slots on the server motherboard (usually PCI slots) and run with a corresponding firmware and device driver. Through the host adapters the operating system of the server can communicate with the storage devices in the SAN. In Fibre channel deployments, a cable connects to the host adapter through the gigabit interface converter (GBIC). GBICs are also used on switches and storage devices within the SAN, and they convert digital bits into light impulses that can then be transmitted over the Fibre Channel cables. Conversely, the GBIC converts incoming light impulses back into digital bits. The predecessor of the GBIC was called gigabit link module (GLM). === Fabric layer === The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables. SAN network devices move data within the SAN, or between an initiator, such as an HBA port of a server, and a target, such as the port of a storage device. When SANs were first built, hubs were the only devices that were Fibre Channel capable, but Fibre Channel switches were developed and hubs are now rarely found in SANs. Switches have the advantage over hubs that they allow all attached devices to communicate simultaneously, as a switch provides a dedicated link to connect all its ports with one another. When SANs were first built, Fibre Channel had to be implemented over copper cables, these days multimode optical fibre cables are used in SANs. SANs are usually built with redundancy, so SAN switches are connected with redundant links. SAN switches connect the servers with the storage devices and are typically non-blocking allowing transmission of data across all attached wires at the same time. SAN switches are for redundancy purposes set up in a meshed topology. A single SAN switch can have as few as 8 ports and up to 32 ports with modular extensions. So-called director-class switches can have as many as 128 ports. In switched SANs, the Fibre Channel switched fabric protocol FC-SW-6 is used under which every device in the SAN has a hardcoded World Wide Name (WWN) address in the host bus adapter (HBA). If a device is connected to the SAN its WWN is registered in the SAN switch name server. In place of a WWN, or worldwide port name (WWPN), SAN Fibre Channel storage device vendors may also hardcode a worldwide node name (WWNN). The ports of storage devices often have a WWN starting with 5, while the bus adapters of servers start with 10 or 21. === Storage layer === The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the Infiniband protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, Infiniband and iSCSI storage devices, in particular, disk arrays, are available. The various storage devices in a SAN are said to form the storage layer. It can include a variety of hard disk and magnetic tape devices that store data. In SANs, disk arrays are joined through a RAID which makes a lot of hard disks look and perform like one big storage device. Every storage device, or even partition on that storage device, has a logical unit number (LUN) assigned to it. This is a unique number within the SAN. Every node in the SAN, be it a server or another storage device, can access the storage by referencing the LUN. The LUNs allow for the storage capacity of a SAN to be segmented and for the implementation of access controls. A particular server, or a group of servers, may, for example, be only given access to a particular part of the SAN storage layer, in the form of LUNs. When a storage device receives a request to read or write data, it will check its access list to establish whether the node, identified by its LUN, is allowed to access the storage area, also identified by a LUN. LUN masking is a technique whereby the host bus adapter and the SAN software of a server restrict the LUNs for which commands are accepted. In doing so LUNs that should never be accessed by the server are masked. Another method to restrict server access to particular SAN storage devices is fabric-based access control, or zoning, which is enforced by the SAN networking devices and servers. Under zoning, server access is restricted to storage devices that are in a particular SAN zone. == Network protocols == A mapping layer to other protocols is used to form a network: ATA over Ethernet (AoE), mapping of AT Attachment (ATA) over Ethernet Fibre Channel Protocol (FCP), a mapping of SCSI over Fibre Channel Fibre Channel over Ethernet (FCoE) ESCON over Fibre Channel (FICON), used by mainframe computers HyperSCSI, mapping of SCSI over Ethernet iFCP or SANoIP mapping of FCP over IP iSCSI, mapping of SCSI over TCP/IP iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand Network block device, mapping device node requests on UNIX-like systems over stream sockets like TCP/IP SCSI RDMA Protocol (SRP), another SCSI implementation for remote direct memory access (RDMA) transports Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Para
Reference data
Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time. Examples of reference data include: Units of measurement Country codes Corporate codes Fixed conversion rates e.g., weight, temperature, and length Calendar structure and constraints Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business process or applications, for example, similar things may be described in quite different ways. Reference data gain in value when they are widely re-used and widely referenced. Examples of good practice in reference data management include: Formalize the reference data management Use external reference data as much as possible Govern the reference data specific to your enterprise Manage reference data at enterprise level Version control your reference data
External memory algorithm
In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main memory at once. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory (auxiliary memory) such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model. == Model == External memory algorithms are analyzed in an idealized model of computation called the external memory model (or I/O model, or disk access model). The external memory model is an abstract machine similar to the RAM machine model, but with a cache in addition to main memory. The model captures the fact that read and write operations are much faster in a cache than in main memory, and that reading long contiguous blocks is faster than reading randomly using a disk read-and-write head. The running time of an algorithm in the external memory model is defined by the number of reads and writes to memory required. The model was introduced by Alok Aggarwal and Jeffrey Vitter in 1988. The external memory model is related to the cache-oblivious model, but algorithms in the external memory model may know both the block size and the cache size. For this reason, the model is sometimes referred to as the cache-aware model. The model consists of a processor with an internal memory or cache of size M, connected to an unbounded external memory. Both the internal and external memory are divided into blocks of size B. One input/output or memory transfer operation consists of moving a block of B contiguous elements from external to internal memory, and the running time of an algorithm is determined by the number of these input/output operations. == Algorithms == Algorithms in the external memory model take advantage of the fact that retrieving one object from external memory retrieves an entire block of size B. This property is sometimes referred to as locality. Searching for an element among N objects is possible in the external memory model using a B-tree with branching factor B. Using a B-tree, searching, insertion, and deletion can be achieved in O ( log B N ) {\displaystyle O(\log _{B}N)} time (in Big O notation). Information theoretically, this is the minimum running time possible for these operations, so using a B-tree is asymptotically optimal. External sorting is sorting in an external memory setting. External sorting can be done via distribution sort, which is similar to quicksort, or via a M B {\displaystyle {\tfrac {M}{B}}} -way merge sort. Both variants achieve the asymptotically optimal runtime of O ( N B log M B N B ) {\displaystyle O\left({\frac {N}{B}}\log _{\frac {M}{B}}{\frac {N}{B}}\right)} to sort N objects. This bound also applies to the fast Fourier transform in the external memory model. The permutation problem is to rearrange N elements into a specific permutation. This can either be done either by sorting, which requires the above sorting runtime, or inserting each element in order and ignoring the benefit of locality. Thus, permutation can be done in O ( min ( N , N B log M B N B ) ) {\displaystyle O\left(\min \left(N,{\frac {N}{B}}\log _{\frac {M}{B}}{\frac {N}{B}}\right)\right)} time. == Applications == The external memory model captures the memory hierarchy, which is not modeled in other common models used in analyzing data structures, such as the random-access machine, and is useful for proving lower bounds for data structures. The model is also useful for analyzing algorithms that work on datasets too big to fit in internal memory. A typical example is geographic information systems, especially digital elevation models, where the full data set easily exceeds several gigabytes or even terabytes of data. This methodology extends beyond general purpose CPUs and also includes GPU computing as well as classical digital signal processing. In general-purpose computing on graphics processing units (GPGPU), powerful graphics cards (GPUs) with little memory (compared with the more familiar system memory, which is most often referred to simply as RAM) are utilized with relatively slow CPU-to-GPU memory transfer (when compared with computation bandwidth). == History == An early use of the term "out-of-core" as an adjective is in 1962 in reference to devices that are other than the core memory of an IBM 360. An early use of the term "out-of-core" with respect to algorithms appears in 1971.
Candid (app)
Candid was a mobile app for anonymous discussions. It used machine learning to create personalized newsfeeds of opinions and real conversations, and also for moderation and filtering. Users posted under pseudonyms such as "HyperMantis", "SincereGiraffe", "GroundedTurtle" and "ExuberantRaptor", that are unique for each thread. Founder and CEO Bindu Reddy said that she needed "a place to express myself and engage in discussions where ideas can be debated on their own merits instead of being used to attack me as a person", which Candid tried to solve by redirecting off-topic comments to their appropriate groups, removing spam and flagging negative posts. They used natural language processing to identify hate speech, slander and threats, and removed them accordingly with human intervention. Candid software analyzed topics and tried to flag rumors and lies as such. Users could flag problematic posts and a team of ten contractors would review them individually. With time the system analyzed a user's interactions and give them labels, such as socializer, explorer, positive, influencer, hater, gossip, etc. In June 2017, Candid announced that it would be shut down because its parent company, Post Intelligence, was being acquired. The app was forecast to close on June 23, 2017, but didn't actually close until June 25, 2017.
Collective operation
Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations. A realization of the collective operations is provided by the Message Passing Interface (MPI). == Definitions == In all asymptotic runtime functions, we denote the latency α {\displaystyle \alpha } (or startup time per message, independent of message size), the communication cost per word β {\displaystyle \beta } , the number of processing units p {\displaystyle p} and the input size per node n {\displaystyle n} . In cases where we have initial messages on more than one node we assume that all local messages are of the same size. To address individual processing units we use p i ∈ { p 0 , p 1 , … , p p − 1 } {\displaystyle p_{i}\in \{p_{0},p_{1},\dots ,p_{p-1}\}} . If we do not have an equal distribution, i.e. node p i {\displaystyle p_{i}} has a message of size n i {\displaystyle n_{i}} , we get an upper bound for the runtime by setting n = max ( n 0 , n 1 , … , n p − 1 ) {\displaystyle n=\max(n_{0},n_{1},\dots ,n_{p-1})} . A distributed memory model is assumed. The concepts are similar for the shared memory model. However, shared memory systems can provide hardware support for some operations like broadcast (§ Broadcast) for example, which allows convenient concurrent read. Thus, new algorithmic possibilities can become available. == Broadcast == The broadcast pattern is used to distribute data from one processing unit to all processing units, which is often needed in SPMD parallel programs to dispense input or global values. Broadcast can be interpreted as an inverse version of the reduce pattern (§ Reduce). Initially only root r {\displaystyle r} with i d {\displaystyle id} 0 {\displaystyle 0} stores message m {\displaystyle m} . During broadcast m {\displaystyle m} is sent to the remaining processing units, so that eventually m {\displaystyle m} is available to all processing units. Since an implementation by means of a sequential for-loop with p − 1 {\displaystyle p-1} iterations becomes a bottleneck, divide-and-conquer approaches are common. One possibility is to utilize a binomial tree structure with the requirement that p {\displaystyle p} has to be a power of two. When a processing unit is responsible for sending m {\displaystyle m} to processing units i . . j {\displaystyle i..j} , it sends m {\displaystyle m} to processing unit ⌈ ( i + j ) / 2 ⌉ {\displaystyle \left\lceil (i+j)/2\right\rceil } and delegates responsibility for the processing units ⌈ ( i + j ) / 2 ⌉ . . j {\displaystyle \left\lceil (i+j)/2\right\rceil ..j} to it, while its own responsibility is cut down to i . . ⌈ ( i + j ) / 2 ⌉ − 1 {\displaystyle i..\left\lceil (i+j)/2\right\rceil -1} . Binomial trees have a problem with long messages m {\displaystyle m} . The receiving unit of m {\displaystyle m} can only propagate the message to other units, after it received the whole message. In the meantime, the communication network is not utilized. Therefore pipelining on binary trees is used, where m {\displaystyle m} is split into an array of k {\displaystyle k} packets of size ⌈ n / k ⌉ {\displaystyle \left\lceil n/k\right\rceil } . The packets are then broadcast one after another, so that data is distributed fast in the communication network. Pipelined broadcast on balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , whereas for the non-pipelined case it takes O ( ( α + β n ) log p ) {\displaystyle {\mathcal {O}}((\alpha +\beta n)\log p)} cost. == Reduce == The reduce pattern is used to collect data or partial results from different processing units and to combine them into a global result by a chosen operator. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by ⊗ {\displaystyle \otimes } and the result is eventually stored on p 0 {\displaystyle p_{0}} . The reduction operator ⊗ {\displaystyle \otimes } must be associative at least. Some algorithms require a commutative operator with a neutral element. Operators like s u m {\displaystyle sum} , m i n {\displaystyle min} , m a x {\displaystyle max} are common. Implementation considerations are similar to broadcast (§ Broadcast). For pipelining on binary trees the message must be representable as a vector of smaller object for component-wise reduction. Pipelined reduce on a balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} . == All-Reduce == The all-reduce pattern (also called allreduce) is used if the result of a reduce operation (§ Reduce) must be distributed to all processing units. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by an operator ⊗ {\displaystyle \otimes } and the result is eventually stored on all p i {\displaystyle p_{i}} . Analog to the reduce operation, the operator ⊗ {\displaystyle \otimes } must be at least associative. All-reduce can be interpreted as a reduce operation with a subsequent broadcast (§ Broadcast). For long messages a corresponding implementation is suitable, whereas for short messages, the latency can be reduced by using a hypercube (Hypercube (communication pattern) § All-Gather/ All-Reduce) topology, if p {\displaystyle p} is a power of two. All-reduce can also be implemented with a butterfly algorithm and achieve optimal latency and bandwidth. All-reduce is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , since reduce and broadcast are possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} with pipelining on balanced binary trees. All-reduce implemented with a butterfly algorithm achieves the same asymptotic runtime. == Prefix-Sum/Scan == The prefix-sum or scan operation is used to collect data or partial results from different processing units and to compute intermediate results by an operator, which are stored on those processing units. It can be seen as a generalization of the reduce operation (§ Reduce). Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} . The operator ⊗ {\displaystyle \otimes } must be at least associative, whereas some algorithms require also a commutative operator and a neutral element. Common operators are s u m {\displaystyle sum} , m i n {\displaystyle min} and m a x {\displaystyle max} . Eventually processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ <= i {\displaystyle \otimes _{i'<=i}} m i ′ {\displaystyle m_{i'}} . In the case of the so-called exclusive prefix sum, processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ < i {\displaystyle \otimes _{i'