Backup

Backup

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

Sorenson Squeeze

Sorenson Squeeze was a software video encoding tool used to compress and convert video and audio files on Mac OS X or Windows operating systems. It was sold as a standalone tool and has also long been bundled with Avid Media Composer. == History == Sorenson Squeeze was first announced on July 17, 2001, as the first variable bit rate (VBR) compression application for Mac OS X, and was released on October 29 of that same year. By March 2002, Sorenson Squeeze became available for Windows OS. Sorenson Squeeze was originally released as a tool for encoding videos for the Web and QuickTime playback but began adding new codecs as more versions were released. The software was discontinued by Sorenson in January 2019, and correspondingly was no longer offered as part of Avid Media Composer. == Features == Squeeze included a number of features to improve video & audio quality. Features included: GPU accelerated H.264 encoding, adaptive bitrate encoding, HD encoding and Dolby certified AC3 Audio. Intelligent encoding presets available in Squeeze included: x265 (H.265) MainConcept H.264 and MainConcept H.264 CUDA. Adaptive bitrate encoding allows for optimal bitrate and error resilience based on network conditions, resulting in a dynamic adjustment of the video bitstream being delivered. It encoded to multiple formats including QuickTime, Windows Media, Flash Video, Silverlight, WebM & WMV. It uses multiple codecs, including the Sorenson codecs SV3 Pro and Spark, H.265, H.264, H.263, VP6, VC1, MPEG2, and many others. Squeeze operates on the Apple Macintosh and Microsoft Windows operating systems. Squeeze offers native plugins to Avid, Apple Final Cut Pro and Adobe Premiere (CS4, CS5) NLEs. Each copy of Squeeze included the Dolby Certified AC3 Consumer encoder. Squeeze also included a simplified review and approval process, which allows the user to automatically send secure, password protected videos for immediate review. Instant feedback is received via Web or mobile. == Versions == Sorenson Squeeze was released on October 29, 2001. Sorenson Squeeze for Macromedia Flash MX was released on March 14, 2002. Sorenson Squeeze 3 for MPEG-4 was released in January 2003. Sorenson Squeeze 3 Compression Suite was released in January 2003. Sorenson Squeeze 5 was released on March 31, 2008. Sorenson Squeeze was updated to version 5.1 on May 11, 2009. Sorenson Squeeze 6 was released on November 3, 2009. Sorenson Squeeze 7 was released January 25, 2011. Sorenson Squeeze 11 was released August 27, 2016. == Awards == Streaming Media magazine Readers’ Choice Award for Encoding Software for 2007, 2008, 2009 and 2010. 2008 Vanguard Award from Digital Content Producer magazine == Squeeze 7 system requirements == Windows Pentium IV-based computer or greater Windows XP, Vista or 7 32- and 64-bit compatible (including AVID 64-bit update); Faster performance on 64-bit systems 512 MB RAM 120 MB available hard drive space QuickTime 7.2 or later DirectX 9.0b or later Macintosh Intel-based processor Mac OS 10.4 or later 32- and 64-bit compatible; Faster performance on 64-bit systems 512 MB RAM 120 MB available hard drive space QuickTime 7.2 or later

Energy informatics

Energy informatics is a research field covering the use of information and communication technology to address energy utilization and management challenges. Methods used for "smart" implementations often combine IoT sensors with artificial intelligence and machine learning. Energy Informatics is founded on flow networks that are the major suppliers and consumers of energy. Their efficiency can be improved by collecting and analyzing information. == Application areas == The field among other consider application areas within: Smart Buildings by developing ICT-centred solutions for improving the energy-efficiency of buildings. Smart Cities by investigating the synergies between demand patterns and supply availability of energy flows in cities and communities to improve energy efficiency, increase integration of renewable sources, and provide resilience towards system faults caused by extreme situations, like hurricanes and flooding. Smart Industries including the development of ICT-centred solutions for improving the energy efficiency and predictability of energy intensive industrial processes, without compromising process and product quality. Smart Energy Networks by developing ICT-centred solutions for coordinating the supply and demand in environmentally sustainable energy networks.

Magic state distillation

Magic state distillation is a method for creating more accurate quantum states from multiple noisy ones, which is important for building fault tolerant quantum computers. It has also been linked to quantum contextuality, a concept thought to contribute to quantum computers' power. The technique was first proposed by Emanuel Knill in 2004, and further analyzed by Sergey Bravyi and Alexei Kitaev the same year. Thanks to the Gottesman–Knill theorem, it is known that some quantum operations (operations in the Clifford group) can be perfectly simulated in polynomial time on a classical computer. In order to achieve universal quantum computation, a quantum computer must be able to perform operations outside this set. Magic state distillation achieves this, in principle, by concentrating the usefulness of imperfect resources, represented by mixed states, into states that are conducive for performing operations that are difficult to simulate classically. A variety of qubit magic state distillation routines and distillation routines for qubits with various advantages have been proposed. == Stabilizer formalism == The Clifford group consists of a set of n {\displaystyle n} -qubit operations generated by the gates {H, S, CNOT} (where H is Hadamard and S is [ 1 0 0 i ] {\displaystyle {\begin{bmatrix}1&0\\0&i\end{bmatrix}}} ) called Clifford gates. The Clifford group generates stabilizer states which can be efficiently simulated classically, as shown by the Gottesman–Knill theorem. This set of gates with a non-Clifford operation is universal for quantum computation. == Magic states == Magic states are purified from n {\displaystyle n} copies of a mixed state ρ {\displaystyle \rho } . These states are typically provided via an ancilla to the circuit. A magic state for the π / 6 {\displaystyle \pi /6} rotation operator is | M ⟩ = cos ⁡ ( β / 2 ) | 0 ⟩ + e i π 4 sin ⁡ ( β / 2 ) | 1 ⟩ {\displaystyle |M\rangle =\cos(\beta /2)|0\rangle +e^{i{\frac {\pi }{4}}}\sin(\beta /2)|1\rangle } where β = arccos ⁡ ( 1 3 ) {\displaystyle \beta =\arccos \left({\frac {1}{\sqrt {3}}}\right)} . A non-Clifford gate can be generated by combining (copies of) magic states with Clifford gates. Since a set of Clifford gates combined with a non-Clifford gate is universal for quantum computation, magic states combined with Clifford gates are also universal. == Purification algorithm for distilling |M〉 == The first magic state distillation algorithm, invented by Sergey Bravyi and Alexei Kitaev, is as follows. Input: Prepare 5 imperfect states. Output: An almost pure state having a small error probability. repeat Apply the decoding operation of the five-qubit error correcting code and measure the syndrome. If the measured syndrome is | 0000 ⟩ {\displaystyle |0000\rangle } , the distillation attempt is successful. else Get rid of the resulting state and restart the algorithm. until The states have been distilled to the desired purity.

Information school

Information school (sometimes abbreviated I-school or iSchool) is a university-level institution committed to understanding the role of information in nature and human endeavors. Synonyms include school of information, department of information studies, or information department. Information schools faculty conduct research into the fundamental aspects of information and related technologies. In addition to granting academic degrees, information schools educate information professionals, researchers, and scholars for an increasingly information-driven world. Information school can also refer, in a more restricted sense, to the members of the iSchools organization (formerly the "iSchools Project"), as governed by the iCaucus. Members of this group share a fundamental interest in the relationships between people, information, technology, and science. These schools, colleges, and departments have been either newly established or have evolved from programs focused on information systems, library science, informatics, computer science, library and information science and information science. Information schools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces like online communities, the World Wide Web, and databases to physical spaces such as libraries, museums, archives, and other repositories. Information school degree programs include course offerings in areas such as data science, information architecture, design, economics, policy, retrieval, security, and telecommunications; knowledge management, user experience design, and usability; conservation and preservation, including digital preservation; librarianship and library administration; the sociology of information; and human–computer interaction.

EM algorithm and GMM model

In statistics, EM (expectation maximization) algorithm handles latent variables, while GMM is the Gaussian mixture model. == Background == In the picture below, are shown the red blood cell hemoglobin concentration and the red blood cell volume data of two groups of people, the Anemia group and the control group (i.e. the group of people without Anemia). As expected, people with Anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without Anemia. x {\displaystyle x} is a random vector such as x := ( red blood cell volume , red blood cell hemoglobin concentration ) {\displaystyle x:={\big (}{\text{red blood cell volume}},{\text{red blood cell hemoglobin concentration}}{\big )}} , and from medical studies it is known that x {\displaystyle x} are normally distributed in each group, i.e. x ∼ N ( μ , Σ ) {\displaystyle x\sim {\mathcal {N}}(\mu ,\Sigma )} . z {\displaystyle z} is denoted as the group where x {\displaystyle x} belongs, with z i = 0 {\displaystyle z_{i}=0} when x i {\displaystyle x_{i}} belongs to the Anemia group and z i = 1 {\displaystyle z_{i}=1} when x i {\displaystyle x_{i}} belongs to the control group. Also z ∼ Categorical ⁡ ( k , ϕ ) {\displaystyle z\sim \operatorname {Categorical} (k,\phi )} where k = 2 {\displaystyle k=2} , ϕ j ≥ 0 , {\displaystyle \phi _{j}\geq 0,} and ∑ j = 1 k ϕ j = 1 {\displaystyle \sum _{j=1}^{k}\phi _{j}=1} . See Categorical distribution. The following procedure can be used to estimate ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } . A maximum likelihood estimation can be applied: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ ( p ( x ( i ) ; ϕ , μ , Σ ) ) = ∑ i = 1 m log ⁡ ∑ z ( i ) = 1 k p ( x ( i ) ∣ z ( i ) ; μ , Σ ) p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log(p(x^{(i)};\phi ,\mu ,\Sigma ))=\sum _{i=1}^{m}\log \sum _{z^{(i)}=1}^{k}p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)p(z^{(i)};\phi )} As the z i {\displaystyle z_{i}} for each x i {\displaystyle x_{i}} are known, the log likelihood function can be simplified as below: ℓ ( ϕ , μ , Σ ) = ∑ i = 1 m log ⁡ p ( x ( i ) ∣ z ( i ) ; μ , Σ ) + log ⁡ p ( z ( i ) ; ϕ ) {\displaystyle \ell (\phi ,\mu ,\Sigma )=\sum _{i=1}^{m}\log p\left(x^{(i)}\mid z^{(i)};\mu ,\Sigma \right)+\log p\left(z^{(i)};\phi \right)} Now the likelihood function can be maximized by making partial derivative over μ , Σ , ϕ {\displaystyle \mu ,\Sigma ,\phi } , obtaining: ϕ j = 1 m ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \phi _{j}={\frac {1}{m}}\sum _{i=1}^{m}1\{z^{(i)}=j\}} μ j = ∑ i = 1 m 1 { z ( i ) = j } x ( i ) ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \mu _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}x^{(i)}}{\sum _{i=1}^{m}1\left\{z^{(i)}=j\right\}}}} Σ j = ∑ i = 1 m 1 { z ( i ) = j } ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m 1 { z ( i ) = j } {\displaystyle \Sigma _{j}={\frac {\sum _{i=1}^{m}1\{z^{(i)}=j\}(x^{(i)}-\mu _{j})(x^{(i)}-\mu _{j})^{T}}{\sum _{i=1}^{m}1\{z^{(i)}=j\}}}} If z i {\displaystyle z_{i}} is known, the estimation of the parameters results to be quite simple with maximum likelihood estimation. But if z i {\displaystyle z_{i}} is unknown it is much more complicated. Being z {\displaystyle z} a latent variable (i.e. not observed), with unlabeled scenario, the expectation maximization algorithm is needed to estimate z {\displaystyle z} as well as other parameters. Generally, this problem is set as a GMM since the data in each group is normally distributed. In machine learning, the latent variable z {\displaystyle z} is considered as a latent pattern lying under the data, which the observer is not able to see very directly. x i {\displaystyle x_{i}} is the known data, while ϕ , μ , Σ {\displaystyle \phi ,\mu ,\Sigma } are the parameter of the model. With the EM algorithm, some underlying pattern z {\displaystyle z} in the data x i {\displaystyle x_{i}} can be found, along with the estimation of the parameters. The wide application of this circumstance in machine learning is what makes EM algorithm so important. == EM algorithm in GMM == The EM algorithm consists of two steps: the E-step and the M-step. Firstly, the model parameters and the z ( i ) {\displaystyle z^{(i)}} can be randomly initialized. In the E-step, the algorithm tries to guess the value of z ( i ) {\displaystyle z^{(i)}} based on the parameters, while in the M-step, the algorithm updates the value of the model parameters based on the guess of z ( i ) {\displaystyle z^{(i)}} of the E-step. These two steps are repeated until convergence is reached. The algorithm in GMM is: Repeat until convergence: 1. (E-step) For each i , j {\displaystyle i,j} , set w j ( i ) := p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) {\displaystyle w_{j}^{(i)}:=p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)} 2. (M-step) Update the parameters ϕ j := 1 m ∑ i = 1 m w j ( i ) {\displaystyle \phi _{j}:={\frac {1}{m}}\sum _{i=1}^{m}w_{j}^{(i)}} μ j := ∑ i = 1 m w j ( i ) x ( i ) ∑ i = 1 m w j ( i ) {\displaystyle \mu _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}x^{(i)}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} Σ j := ∑ i = 1 m w j ( i ) ( x ( i ) − μ j ) ( x ( i ) − μ j ) T ∑ i = 1 m w j ( i ) {\displaystyle \Sigma _{j}:={\frac {\sum _{i=1}^{m}w_{j}^{(i)}\left(x^{(i)}-\mu _{j}\right)\left(x^{(i)}-\mu _{j}\right)^{T}}{\sum _{i=1}^{m}w_{j}^{(i)}}}} With Bayes' rule, the following result is obtained by the E-step: p ( z ( i ) = j | x ( i ) ; ϕ , μ , Σ ) = p ( x ( i ) | z ( i ) = j ; μ , Σ ) p ( z ( i ) = j ; ϕ ) ∑ l = 1 k p ( x ( i ) | z ( i ) = l ; μ , Σ ) p ( z ( i ) = l ; ϕ ) {\displaystyle p\left(z^{(i)}=j|x^{(i)};\phi ,\mu ,\Sigma \right)={\frac {p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)p\left(z^{(i)}=j;\phi \right)}{\sum _{l=1}^{k}p\left(x^{(i)}|z^{(i)}=l;\mu ,\Sigma \right)p\left(z^{(i)}=l;\phi \right)}}} According to GMM setting, these following formulas are obtained: p ( x ( i ) | z ( i ) = j ; μ , Σ ) = 1 ( 2 π ) n / 2 | Σ j | 1 / 2 exp ⁡ ( − 1 2 ( x ( i ) − μ j ) T Σ j − 1 ( x ( i ) − μ j ) ) {\displaystyle p\left(x^{(i)}|z^{(i)}=j;\mu ,\Sigma \right)={\frac {1}{(2\pi )^{n/2}\left|\Sigma _{j}\right|^{1/2}}}\exp \left(-{\frac {1}{2}}\left(x^{(i)}-\mu _{j}\right)^{T}\Sigma _{j}^{-1}\left(x^{(i)}-\mu _{j}\right)\right)} p ( z ( i ) = j ; ϕ ) = ϕ j {\displaystyle p\left(z^{(i)}=j;\phi \right)=\phi _{j}} In this way, a switch between the E-step and the M-step is possible, according to the randomly initialized parameters.

Communication-avoiding algorithm

Communication-avoiding algorithms minimize movement of data within a memory hierarchy for improving its running-time and energy consumption. These minimize the total of two costs (in terms of time and energy): arithmetic and communication. Communication, in this context refers to moving data, either between levels of memory or between multiple processors over a network. It is much more expensive than arithmetic. == Formal theory == === Two-level memory model === A common computational model in analyzing communication-avoiding algorithms is the two-level memory model: There is one processor and two levels of memory. Level 1 memory is infinitely large. Level 0 memory ("cache") has size M {\displaystyle M} . In the beginning, input resides in level 1. In the end, the output resides in level 1. Processor can only operate on data in cache. The goal is to minimize data transfers between the two levels of memory. === Matrix multiplication === Corollary 6.2: More general results for other numerical linear algebra operations can be found in. The following proof is from. == Motivation == Consider the following running-time model: Measure of computation = Time per FLOP = γ Measure of communication = No. of words of data moved = β ⇒ Total running time = γ·(no. of FLOPs) + β·(no. of words) From the fact that β >> γ as measured in time and energy, communication cost dominates computation cost. Technological trends indicate that the relative cost of communication is increasing on a variety of platforms, from cloud computing to supercomputers to mobile devices. The report also predicts that gap between DRAM access time and FLOPs will increase 100× over coming decade to balance power usage between processors and DRAM. Energy consumption increases by orders of magnitude as we go higher in the memory hierarchy. United States president Barack Obama cited communication-avoiding algorithms in the FY 2012 Department of Energy budget request to Congress: New Algorithm Improves Performance and Accuracy on Extreme-Scale Computing Systems. On modern computer architectures, communication between processors takes longer than the performance of a floating-point arithmetic operation by a given processor. ASCR researchers have developed a new method, derived from commonly used linear algebra methods, to minimize communications between processors and the memory hierarchy, by reformulating the communication patterns specified within the algorithm. This method has been implemented in the TRILINOS framework, a highly-regarded suite of software, which provides functionality for researchers around the world to solve large scale, complex multi-physics problems. == Objectives == Communication-avoiding algorithms are designed with the following objectives: Reorganize algorithms to reduce communication across all memory hierarchies. Attain the lower-bound on communication when possible. The following simple example demonstrates how these are achieved. === Matrix multiplication example === Let A, B and C be square matrices of order n × n. The following naive algorithm implements C = C + A B: for i = 1 to n for j = 1 to n for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) Arithmetic cost (time-complexity): n2(2n − 1) for sufficiently large n or O(n3). Rewriting this algorithm with communication cost labelled at each step for i = 1 to n {read row i of A into fast memory} - n2 reads for j = 1 to n {read C(i,j) into fast memory} - n2 reads {read column j of B into fast memory} - n3 reads for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) {write C(i,j) back to slow memory} - n2 writes Fast memory may be defined as the local processor memory (CPU cache) of size M and slow memory may be defined as the DRAM. Communication cost (reads/writes): n3 + 3n2 or O(n3) Since total running time = γ·O(n3) + β·O(n3) and β >> γ the communication cost is dominant. The blocked (tiled) matrix multiplication algorithm reduces this dominant term: ==== Blocked (tiled) matrix multiplication ==== Consider A, B and C to be n/b-by-n/b matrices of b-by-b sub-blocks where b is called the block size; assume three b-by-b blocks fit in fast memory. for i = 1 to n/b for j = 1 to n/b {read block C(i,j) into fast memory} - b2 × (n/b)2 = n2 reads for k = 1 to n/b {read block A(i,k) into fast memory} - b2 × (n/b)3 = n3/b reads {read block B(k,j) into fast memory} - b2 × (n/b)3 = n3/b reads C(i,j) = C(i,j) + A(i,k) B(k,j) - {do a matrix multiply on blocks} {write block C(i,j) back to slow memory} - b2 × (n/b)2 = n2 writes Communication cost: 2n3/b + 2n2 reads/writes << 2n3 arithmetic cost Making b as large possible: 3b2 ≤ M we achieve the following communication lower bound: 31/2n3/M1/2 + 2n2 or Ω (no. of FLOPs / M1/2) == Previous approaches for reducing communication == Most of the approaches investigated in the past to address this problem rely on scheduling or tuning techniques that aim at overlapping communication with computation. However, this approach can lead to an improvement of at most a factor of two. Ghosting is a different technique for reducing communication, in which a processor stores and computes redundantly data from neighboring processors for future computations. Cache-oblivious algorithms represent a different approach introduced in 1999 for fast Fourier transforms, and then extended to graph algorithms, dynamic programming, etc. They were also applied to several operations in linear algebra as dense LU and QR factorizations. The design of architecture specific algorithms is another approach that can be used for reducing the communication in parallel algorithms, and there are many examples in the literature of algorithms that are adapted to a given communication topology.