Lancichinetti–Fortunato–Radicchi benchmark

Lancichinetti–Fortunato–Radicchi benchmark

Lancichinetti–Fortunato–Radicchi benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real-world networks). They have a priori known communities and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes. == The algorithm == The node degrees and the community sizes are distributed according to a power law, with different exponents. The benchmark assumes that both the degree and the community size have power law distributions with different exponents, γ {\displaystyle \gamma } and β {\displaystyle \beta } , respectively. N {\displaystyle N} is the number of nodes and the average degree is ⟨ k ⟩ {\displaystyle \langle k\rangle } . There is a mixing parameter μ {\displaystyle \mu } , which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities. Thus, it reflects the amount of noise in the network. At the extremes, when μ = 0 {\displaystyle \mu =0} all links are within community links, if μ = 1 {\displaystyle \mu =1} all links are between nodes belonging to different communities. One can generate the benchmark network using the following steps. Step 1: Generate a network with nodes following a power law distribution with exponent γ {\displaystyle \gamma } and choose extremes of the distribution k min {\displaystyle k_{\min }} and k max {\displaystyle k_{\max }} to get desired average degree is ⟨ k ⟩ {\displaystyle \langle k\rangle } . Step 2: ( 1 − μ ) {\displaystyle (1-\mu )} fraction of links of every node is with nodes of the same community, while fraction μ {\displaystyle \mu } is with the other nodes. Step 3: Generate community sizes from a power law distribution with exponent β {\displaystyle \beta } . The sum of all sizes must be equal to N {\displaystyle N} . The minimal and maximal community sizes s min {\displaystyle s_{\min }} and s max {\displaystyle s_{\max }} must satisfy the definition of community so that every non-isolated node is in at least in one community: s min > k min {\displaystyle s_{\min }>k_{\min }} s max > k max {\displaystyle s_{\max }>k_{\max }} Step 4: Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. As long as the number of neighboring nodes within the community does not exceed the community size a new node is added to the community, otherwise stays out. In the following iterations the “homeless” node is randomly assigned to some community. If that community is complete, i.e. the size is exhausted, a randomly selected node of that community must be unlinked. Stop the iteration when all the communities are complete and all the nodes belong to at least one community. Step 5: Implement rewiring of nodes keeping the same node degrees but only affecting the fraction of internal and external links such that the number of links outside the community for each node is approximately equal to the mixing parameter μ {\displaystyle \mu } . == Testing == Consider a partition into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a p ( C ) {\displaystyle p(C)} distribution that represents the probability that a randomly picked node is from the community C {\displaystyle C} . Consider a partition of the same network that was predicted by some community finding algorithm and has p ( C 2 ) {\displaystyle p(C_{2})} distribution. The benchmark partition has p ( C 1 ) {\displaystyle p(C_{1})} distribution. The joint distribution is p ( C 1 , C 2 ) {\displaystyle p(C_{1},C_{2})} . The similarity of these two partitions is captured by the normalized mutual information. I n = ∑ C 1 , C 2 p ( C 1 , C 2 ) log 2 ⁡ p ( C 1 , C 2 ) p ( C 1 ) p ( C 2 ) 1 2 H ( { p ( C 1 ) } ) + 1 2 H ( { p ( C 2 ) } ) {\displaystyle I_{n}={\frac {\sum _{C_{1},C_{2}}p(C_{1},C_{2})\log _{2}{\frac {p(C_{1},C_{2})}{p(C_{1})p(C_{2})}}}{{\frac {1}{2}}H(\{p(C_{1})\})+{\frac {1}{2}}H(\{p(C_{2})\})}}} If I n = 1 {\displaystyle I_{n}=1} the benchmark and the detected partitions are identical, and if I n = 0 {\displaystyle I_{n}=0} then they are independent of each other.

Deconvolution

In mathematics, deconvolution is the inverse of convolution. Both operations are used in signal processing and image processing. For example, it may be possible to recover the original signal after a filter (convolution) by using a deconvolution method with a certain degree of accuracy. Due to the measurement error of the recorded signal or image, it can be demonstrated that the worse the signal-to-noise ratio (SNR), the worse the reversing of a filter will be; hence, inverting a filter is not always a good solution as the error amplifies. Deconvolution offers a solution to this problem. The foundations for deconvolution and time-series analysis were largely laid by Norbert Wiener of the Massachusetts Institute of Technology in his book Extrapolation, Interpolation, and Smoothing of Stationary Time Series (1949). The book was based on work Wiener had done during World War II but that had been classified at the time. Some of the early attempts to apply these theories were in the fields of weather forecasting and economics. == Description == In general, the objective of deconvolution is to find the solution f of a convolution equation of the form: f ∗ g = h {\displaystyle fg=h\,} Usually, h is some recorded signal, and f is some signal that we wish to recover, but has been convolved with a filter or distortion function g, before we recorded it. Usually, h is a distorted version of f and the shape of f can't be easily recognized by the eye or simpler time-domain operations. The function g represents the impulse response of an instrument or a driving force that was applied to a physical system. If we know g, or at least know the form of g, then we can perform deterministic deconvolution. However, if we do not know g in advance, then we need to estimate it. This can be done using methods of statistical estimation or building the physical principles of the underlying system, such as the electrical circuit equations or diffusion equations. There are several deconvolution techniques, depending on the choice of the measurement error and deconvolution parameters: === Raw deconvolution === When the measurement error is very low (ideal case), deconvolution collapses into a filter reversing. This kind of deconvolution can be performed in the Laplace domain. By computing the Fourier transform of the recorded signal h and the system response function g, you get H and G, with G as the transfer function. Using the convolution theorem, F = H / G {\displaystyle F=H/G\,} where F is the estimated Fourier transform of f. Finally, the inverse Fourier transform of the function F is taken to find the estimated deconvolved signal f. Note that G is at the denominator and could amplify elements of the error model if present. === Deconvolution with noise === In physical measurements, the situation is usually closer to ( f ∗ g ) + ε = h {\displaystyle (fg)+\varepsilon =h\,} In this case ε is noise that has entered our recorded signal. If a noisy signal or image is assumed to be noiseless, the statistical estimate of g will be incorrect. In turn, the estimate of ƒ will also be incorrect. The lower the signal-to-noise ratio, the worse the estimate of the deconvolved signal will be. That is the reason why inverse filtering the signal (as in the "raw deconvolution" above) is usually not a good solution. However, if at least some knowledge exists of the type of noise in the data (for example, white noise), the estimate of ƒ can be improved through techniques such as Wiener deconvolution. == Applications == === Seismology === The concept of deconvolution had an early application in reflection seismology. In 1950, Enders Robinson was a graduate student at MIT. He worked with others at MIT, such as Norbert Wiener, Norman Levinson, and economist Paul Samuelson, to develop the "convolutional model" of a reflection seismogram. This model assumes that the recorded seismogram s(t) is the convolution of an Earth-reflectivity function e(t) and a seismic wavelet w(t) from a point source, where t represents recording time. Thus, our convolution equation is s ( t ) = ( e ∗ w ) ( t ) . {\displaystyle s(t)=(ew)(t).\,} The seismologist is interested in e, which contains information about the Earth's structure. By the convolution theorem, this equation may be Fourier transformed to S ( ω ) = E ( ω ) W ( ω ) {\displaystyle S(\omega )=E(\omega )W(\omega )\,} in the frequency domain, where ω {\displaystyle \omega } is the frequency variable. By assuming that the reflectivity is white, we can assume that the power spectrum of the reflectivity is constant, and that the power spectrum of the seismogram is the spectrum of the wavelet multiplied by that constant. Thus, | S ( ω ) | ≈ k | W ( ω ) | . {\displaystyle |S(\omega )|\approx k|W(\omega )|.\,} If we assume that the wavelet is minimum phase, we can recover it by calculating the minimum phase equivalent of the power spectrum we just found. The reflectivity may be recovered by designing and applying a Wiener filter that shapes the estimated wavelet to a Dirac delta function (i.e., a spike). The result may be seen as a series of scaled, shifted delta functions (although this is not mathematically rigorous): e ( t ) = ∑ i = 1 N r i δ ( t − τ i ) , {\displaystyle e(t)=\sum _{i=1}^{N}r_{i}\delta (t-\tau _{i}),} where N is the number of reflection events, r i {\displaystyle r_{i}} are the reflection coefficients, t − τ i {\displaystyle t-\tau _{i}} are the reflection times of each event, and δ {\displaystyle \delta } is the Dirac delta function. In practice, since we are dealing with noisy, finite bandwidth, finite length, discretely sampled datasets, the above procedure only yields an approximation of the filter required to deconvolve the data. However, by formulating the problem as the solution of a Toeplitz matrix and using Levinson recursion, we can relatively quickly estimate a filter with the smallest mean squared error possible. We can also do deconvolution directly in the frequency domain and get similar results. The technique is closely related to linear prediction. === Optics and other imaging === In optics and imaging, the term "deconvolution" is specifically used to refer to the process of reversing the optical distortion that takes place in an optical microscope, electron microscope, telescope, or other imaging instrument, thus creating clearer images. It is usually done in the digital domain by a software algorithm, as part of a suite of microscope image processing techniques. Deconvolution is also practical to sharpen images that suffer from fast motion or jiggles during capturing. Early Hubble Space Telescope images were distorted by a flawed mirror and were sharpened by deconvolution. The usual method is to assume that the optical path through the instrument is optically perfect, convolved with a point spread function (PSF), that is, a mathematical function that describes the distortion in terms of the pathway a theoretical point source of light (or other waves) takes through the instrument. Usually, such a point source contributes a small area of fuzziness to the final image. If this function can be determined, it is then a matter of computing its inverse or complementary function, and convolving the acquired image with that. The result is the original, undistorted image. In practice, finding the true PSF is impossible, and usually an approximation of it is used, theoretically calculated or based on some experimental estimation by using known probes. Real optics may also have different PSFs at different focal and spatial locations, and the PSF may be non-linear. The accuracy of the approximation of the PSF will dictate the final result. Different algorithms can be employed to give better results, at the price of being more computationally intensive. Since the original convolution discards data, some algorithms use additional data acquired at nearby focal points to make up some of the lost information. Regularization in iterative algorithms (as in expectation-maximization algorithms) can be applied to avoid unrealistic solutions. When the PSF is unknown, it may be possible to deduce it by systematically trying different possible PSFs and assessing whether the image has improved. This procedure is called blind deconvolution. Blind deconvolution is a well-established image restoration technique in astronomy, where the point nature of the objects photographed exposes the PSF thus making it more feasible. It is also used in fluorescence microscopy for image restoration, and in fluorescence spectral imaging for spectral separation of multiple unknown fluorophores. The most common iterative algorithm for the purpose is the Richardson–Lucy deconvolution algorithm; the Wiener deconvolution (and approximations) are the most common non-iterative algorithms. For some specific imaging systems such as laser pulsed terahertz systems, PSF can be modeled mathematically. As a result, as shown in the figure, deconvolution of the modeled PS

Gutmann method

The Gutmann method is an algorithm for securely erasing the contents of computer hard disk drives, such as files. Devised by Peter Gutmann and Colin Plumb and presented in the paper Secure Deletion of Data from Magnetic and Solid-State Memory in July 1996, it involved writing a series of 35 patterns over the region to be erased. The selection of patterns assumes that the user does not know the encoding mechanism used by the drive, so it includes patterns designed specifically for three types of drives. A user who knows which type of encoding the drive uses can choose only those patterns intended for their drive. A drive with a different encoding mechanism would need different patterns. Most of the patterns in the Gutmann method were designed for older MFM/RLL-encoded disks. Gutmann himself has noted that more modern drives no longer use these older encoding techniques, making parts of the method irrelevant. He said "In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques". Since about 2001, some ATA IDE and SATA hard drive manufacturer designs include support for the ATA Secure Erase standard, obviating the need to apply the Gutmann method when erasing an entire drive. The Gutmann method does not apply to USB sticks: a 2011 study reports that 71.7% of data remained available. On solid state drives it resulted in 0.8–4.3% recovery. == Background == The delete function in most operating systems simply marks the space occupied by the file as reusable (removes the pointer to the file) without immediately removing any of its contents. At this point the file can be fairly easily recovered by numerous recovery applications. However, once the space is overwritten with other data, there is no known way to use software to recover it. It cannot be done with software alone since the storage device only returns its current contents via its normal interface. Gutmann claims that intelligence agencies have sophisticated tools, including magnetic force microscopes, which together with image analysis, can detect the previous values of bits on the affected area of the media (for example hard disk). This claim however seems to be invalid based on the thesis "Data Reconstruction from a Hard Disk Drive using Magnetic Force Microscopy". == Method == An overwrite session consists of a lead-in of four random write patterns, followed by patterns 5 to 31 (see rows of table below), executed in a random order, and a lead-out of four more random patterns. Each of patterns 5 to 31 was designed with a specific magnetic media encoding scheme in mind, which each pattern targets. The drive is written to for all the passes even though the table below only shows the bit patterns for the passes that are specifically targeted at each encoding scheme. The result should obscure any data on the drive so that only the most advanced physical scanning (e.g., using a magnetic force microscope) of the drive is likely to be able to recover any data. The series of patterns is as follows: Encoded bits shown in bold are what should be present in the ideal pattern, although due to the encoding the complementary bit is actually present at the start of the track. == Criticism == Daniel Feenberg of the National Bureau of Economic Research, an American private nonprofit research organization, criticized Gutmann's claim that intelligence agencies are likely to be able to read overwritten data, citing a lack of evidence for such claims. He finds that Gutmann cites one non-existent source and sources that do not actually demonstrate recovery, only partially-successful observations. The definition of "random" is also quite different from the usual one used: Gutmann expects the use of pseudorandom data with sequences known to the recovering side, not an unpredictable one such as a cryptographically secure pseudorandom number generator. Nevertheless, some published government security procedures consider an overwritten disk to still be sensitive. Human factors and potential limitations in the overwriting software create a residual risk that is not considered acceptable at the highest security levels. Gutmann himself has responded to some of these criticisms and also criticized how his algorithm has been abused in an epilogue to his original paper, in which he states: In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don't understand that statement, re-read the paper). If you're using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, "A good scrubbing with random data will do about as well as can be expected". This was true in 1996, and is still true now. Gutmann's statement has been criticized for not recognizing that PRML/EPRML does not replace RLL, with critics claiming PRML/EPRML to be a signal detection method rather than a data encoding method. Polish data recovery service Kaleron has also claimed that Gutmann's publication contains further factual errors and assumptions that do not apply to actual disks.

Bottom-up and top-down approaches

Bottom-up and top-down are strategies of composition and decomposition in fields as diverse as information processing and ordering knowledge, software, humanistic and scientific theories (see systemics), time management, and organization. In practice they can be seen as a style of thinking, teaching, or leadership. A top-down approach (also known as stepwise design and stepwise refinement and in some cases used as a synonym of decomposition) is essentially the breaking down of a system to gain insight into its compositional subsystems in a reverse engineering fashion. In a top-down approach an overview of the system is formulated, specifying, but not detailing, any first-level subsystems. Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, until the entire specification is reduced to base elements. A top-down model is often specified with the assistance of black boxes, which makes it easier to manipulate. However, black boxes may fail to clarify elementary mechanisms or be detailed enough to realistically validate the model. A top-down approach starts with the big picture, then breaks down into smaller segments. A bottom-up approach is the piecing together of systems to give rise to more complex systems, thus making the original systems subsystems of the emergent system. Bottom-up processing is a type of information processing based on incoming data from the environment to form a perception. From a cognitive psychology perspective, information enters the eyes in one direction (sensory input, or the "bottom"), and is then turned into an image by the brain that can be interpreted and recognized as a perception (output that is "built up" from processing to final cognition). In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form larger subsystems, which then in turn are linked, sometimes in many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, by which the beginnings are small but eventually grow in complexity and completeness. But "organic strategies" may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization as opposed to meeting a global purpose. == Computer science == === Software development === In the software development process, the top-down and bottom-up approaches play a key role. Top-down approaches emphasize planning and a complete understanding of the system. It is inherent that no coding can begin until a sufficient level of detail has been reached in the design of at least some part of the system. Top-down approaches are implemented by attaching the stubs in place of the module. But these delay testing of the ultimate functional units of a system until significant design is complete. Bottom-up emphasizes coding and early testing, which can begin as soon as the first module has been specified. But this approach runs the risk that modules may be coded without having a clear idea of how they link to other parts of the system, and that such linking may not be as easy as first thought. Re-usability of code is one of the main benefits of a bottom-up approach. Top-down design was promoted in the 1970s by IBM researchers Harlan Mills and Niklaus Wirth. Mills developed structured programming concepts for practical use and tested them in a 1969 project to automate the New York Times morgue index. The engineering and management success of this project led to the spread of the top-down approach through IBM and the rest of the computer industry. Among other achievements, Niklaus Wirth, the developer of Pascal programming language, wrote the influential paper Program Development by Stepwise Refinement. Since Niklaus Wirth went on to develop languages such as Modula and Oberon (where one could define a module before knowing about the entire program specification), one can infer that top-down programming was not strictly what he promoted. Top-down methods were favored in software engineering until the late 1980s, and object-oriented programming assisted in demonstrating the idea that both aspects of top-down and bottom-up programming could be used. Modern software design approaches usually combine top-down and bottom-up approaches. Although an understanding of the complete system is usually considered necessary for good design—leading theoretically to a top-down approach—most software projects attempt to make use of existing code to some degree. Pre-existing modules give designs a bottom-up flavor. === Programming === Top-down is a programming style, the mainstay of traditional procedural languages, in which design begins by specifying complex pieces and then dividing them into successively smaller pieces. The technique for writing a program using top-down methods is to write a main procedure that names all the major functions it will need. Later, the programming team looks at the requirements of each of those functions and the process is repeated. These compartmentalized subroutines eventually will perform actions so simple they can be easily and concisely coded. When all the various subroutines have been coded the program is ready for testing. By defining how the application comes together at a high level, lower-level work can be self-contained. In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form larger subsystems, which in turn are linked, sometimes at many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, by which the beginnings are small, but eventually grow in complexity and completeness. Object-oriented programming (OOP) is a paradigm that uses "objects" to design applications and computer programs. In mechanical engineering with software programs such as Pro/ENGINEER, Solidworks, and Autodesk Inventor users can design products as pieces not part of the whole and later add those pieces together to form assemblies like building with Lego. Engineers call this "piece part design". === Parsing === Parsing is the process of analyzing an input sequence (such as that read from a file or a keyboard) in order to determine its grammatical structure. This method is used in the analysis of both natural languages and computer languages, as in a compiler. Bottom-up parsing is parsing strategy that recognizes the text's lowest-level small details first, before its mid-level structures, and leaves the highest-level overall structure to last. In top-down parsing, on the other hand, one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. == Natural sciences == === Nanotechnology === Top-down and bottom-up are two approaches for the manufacture of products. These terms were first applied to the field of nanotechnology by the Foresight Institute in 1989 to distinguish between molecular manufacturing (to mass-produce large atomically precise objects) and conventional manufacturing (which can mass-produce large objects that are not atomically precise). Bottom-up approaches seek to have smaller (usually molecular) components built up into more complex assemblies, while top-down approaches seek to create nanoscale devices by using larger, externally controlled ones to direct their assembly. Certain valuable nanostructures, such as Silicon nanowires, can be fabricated using either approach, with processing methods selected on the basis of targeted applications. A top-down approach often uses the traditional workshop or microfabrication methods where externally controlled tools are used to cut, mill, and shape materials into the desired shape and order. Micropatterning techniques, such as photolithography and inkjet printing belong to this category. Vapor treatment can be regarded as a new top-down secondary approaches to engineer nanostructures. Bottom-up approaches, in contrast, use the chemical properties of single molecules to cause single-molecule components to (a) self-organize or self-assemble into some useful conformation, or (b) rely on positional assembly. These approaches use the concepts of molecular self-assembly and/or molecular recognition. See also Supramolecular chemistry. Such bottom-up approaches should, broadly speaking, be able to produce devices in parallel and much cheaper than top-down methods but could potentially be overwhelmed as the size and complexity of the desired assembly increases. === Neuroscience and psychology === These terms are also employed in cognitive sciences including neuroscience, cognitive neuroscience and cognitive psychology to discuss the flow of information in processing. Typically, sensory input is considered bottom-up, and higher cognitive processes, which have more information from other sources, are considered top-down. A bottom-up proc

List of algorithms

An algorithm is a fundamental set of rules or defined procedures that are typically designed and used to be a simpler way to solve a specific problem or a broad set of problems. Simply speaking, algorithms define different processes, sets of rules and regulations, or methodologies that are to be followed through in calculations, data processing, data mining, pattern recognition, automated reasoning or other problem-solving operations. With the increasing automation of services, more and more decisions are being made by algorithms. Some general examples are risk assessments, anticipatory policing, and pattern recognition technology. The following is a list of well-known algorithms. == Automated planning == == Combinatorial algorithms == === General combinatorial algorithms === Brent's algorithm: finds a cycle in function value iterations using only two iterators Floyd's cycle-finding algorithm: finds a cycle in function value iterations Gale–Shapley algorithm: solves the stable matching problem Pseudorandom number generators (uniformly distributed—see also List of pseudorandom number generators for other PRNGs with varying degrees of convergence and varying statistical quality): ACORN generator Blum Blum Shub Lagged Fibonacci generator Linear congruential generator Mersenne Twister === Graph algorithms === Blossom algorithm: algorithm for constructing maximum-cardinality matching on graphs. Coloring algorithm: algorithms for graph (vertex or edge) coloring (subject to constraints, e.g. proper coloring or list coloring) Hopcroft–Karp algorithm: convert a bipartite graph to a maximum-cardinality matching Hungarian algorithm: algorithm for finding a perfect matching Prüfer coding: conversion between a labeled tree and its Prüfer sequence Tarjan's off-line lowest common ancestors algorithm: computes lowest common ancestors for pairs of nodes in a tree Topological sort: finds linear order of nodes (e.g. jobs) based on their dependencies. ==== Graph drawing ==== Coin graph drawing algorithms for finite connected planar graphs (approximately computing the theoretical circle-packing given by the Koebe-Andreev-Thurston theorem). See also Fáry's theorem on straight-line drawings of planar graphs. Force-based algorithms (also known as force-directed algorithms or spring-based algorithms) Spectral layout ==== Network theory ==== Network analysis Link analysis Girvan–Newman algorithm: detect communities in complex systems Web link analysis Hyperlink-Induced Topic Search (HITS) (also known as Hubs and authorities) PageRank TrustRank Flow networks Dinic's algorithm: is a strongly polynomial algorithm for computing the maximum flow in a flow network. Edmonds–Karp algorithm: implementation of Ford–Fulkerson Ford–Fulkerson algorithm: computes the maximum flow in a graph Karger's algorithm: a Monte Carlo method to compute the minimum cut of a connected graph Push–relabel algorithm: computes a maximum flow in a graph ==== Routing for graphs ==== Edmonds' algorithm (also known as Chu–Liu/Edmonds' algorithm): find maximum or minimum branchings Euclidean minimum spanning tree: algorithms for computing the minimum spanning tree of a set of points in the plane Longest path problem: find a simple path of maximum length in a given graph Minimum spanning tree Borůvka's algorithm Kruskal's algorithm Prim's algorithm Reverse-delete algorithm Nonblocking minimal spanning switch say, for a telephone exchange Shortest path problem Bellman–Ford algorithm: computes shortest paths in a weighted graph (where some of the edge weights may be negative) Dijkstra's algorithm: computes shortest paths in a graph with non-negative edge weights Floyd–Warshall algorithm: solves the all pairs shortest path problem in a weighted, directed graph Johnson's algorithm: all pairs shortest path algorithm in sparse weighted directed graph Transitive closure problem: find the transitive closure of a given binary relation Traveling salesman problem Christofides algorithm Nearest neighbour algorithm Vehicle routing problem Clarke and Wright Saving algorithm Warnsdorff's rule: a heuristic method for solving the Knight's tour problem ==== Graph search ==== A: special case of best-first search that uses heuristics to improve speed B: a best-first graph search algorithm that finds the least-cost path from a given initial node to any goal node (out of one or more possible goals) Backtracking: abandons partial solutions when they are found not to satisfy a complete solution Beam search: is a heuristic search algorithm that is an optimization of best-first search that reduces its memory requirement Beam stack search: integrates backtracking with beam search Best-first search: traverses a graph in the order of likely importance using a priority queue Bidirectional search: find the shortest path from an initial vertex to a goal vertex in a directed graph Breadth-first search: traverses a graph level by level Brute-force search: an exhaustive and reliable search method, but computationally inefficient in many applications D: an incremental heuristic search algorithm Depth-first search: traverses a graph branch by branch Dijkstra's algorithm: a special case of A for which no heuristic function is used General Problem Solver: a seminal theorem-proving algorithm intended to work as a universal problem solver machine. Iterative deepening depth-first search (IDDFS): a state space search strategy Jump point search: an optimization to A which may reduce computation time by an order of magnitude using further heuristics Lexicographic breadth-first search (also known as Lex-BFS): a linear time algorithm for ordering the vertices of a graph SSS: state space search traversing a game tree in a best-first fashion similar to that of the A search algorithm Uniform-cost search: a tree search that finds the lowest-cost route where costs vary ==== Subgraphs ==== Cliques Bron–Kerbosch algorithm: a technique for finding maximal cliques in an undirected graph MaxCliqueDyn maximum clique algorithm: find a maximum clique in an undirected graph Strongly connected components Kosaraju's algorithm Path-based strong component algorithm Tarjan's strongly connected components algorithm Subgraph isomorphism problem === Sequence algorithms === ==== Approximate sequence matching ==== Bitap algorithm: fuzzy algorithm that determines if strings are approximately equal. Phonetic algorithms Daitch–Mokotoff Soundex: a Soundex refinement which allows matching of Slavic and Germanic surnames Double Metaphone: an improvement on Metaphone Match rating approach: a phonetic algorithm developed by Western Airlines Metaphone: an algorithm for indexing words by their sound, when pronounced in English NYSIIS: phonetic algorithm, improves on Soundex Soundex: a phonetic algorithm for indexing names by sound, as pronounced in English String metrics: computes a similarity or dissimilarity (distance) score between two pairs of text strings Damerau–Levenshtein distance: computes a distance measure between two strings, improves on Levenshtein distance Dice's coefficient (also known as the Dice coefficient): a similarity measure related to the Jaccard index Hamming distance: sum number of positions which are different Jaro–Winkler distance: is a measure of similarity between two strings Levenshtein edit distance: computes a metric for the amount of difference between two sequences Trigram search: search for text when the exact syntax or spelling of the target object is not precisely known ==== Selection algorithms ==== Introselect Quickselect ==== Sequence search ==== Linear search: locates an item in an unsorted sequence Selection algorithm: finds the kth largest item in a sequence Sorted lists Binary search algorithm: locates an item in a sorted sequence Eytzinger binary search: cache friendly binary search algorithm Fibonacci search technique: search a sorted sequence using a divide and conquer algorithm that narrows down possible locations with the aid of Fibonacci numbers Jump search (or block search): linear search on a smaller subset of the sequence Predictive search: binary-like search which factors in magnitude of search term versus the high and low values in the search. Sometimes called dictionary search or interpolated search. Uniform binary search: an optimization of the classic binary search algorithm Ternary search: a technique for finding the minimum or maximum of a function that is either strictly increasing and then strictly decreasing or vice versa ==== Sequence merging ==== k-way merge algorithm Simple merge algorithm Union (merge, with elements on the output not repeated) ==== Sequence permutations ==== Fisher–Yates shuffle (also known as the Knuth shuffle): randomly shuffle a finite set Heap's permutation generation algorithm: interchange elements to generate next permutation Schensted algorithm: constructs a pair of Young tableaux from a permutation Steinhaus–Johnson–Trotter algorithm (also known as the Johnson–Trotter algorithm):

Averbis

Averbis has a focus on healthcare, pharma, automotive and intellectual property analytics. Averbis is involved in various research projects of the German Federal Ministry of Economics and Energy and the European Union such as DebugIT, EUCases, Mantra and SEMCARE. In addition to these projects, Averbis was also involved in the following projects: Greenpilot is a virtual library, which provides technical information in the fields of nutrition, environment and agriculture. Medpilot is a virtual library, which provides information about medicine and related sciences. In 2013, Averbis has been nominated for the German Founder Prize 2013. Averbis GmbH provides text analytics and text mining software to transform unstructured text into actionable information. It was founded in 2007 by IT experts after years of relevant scientific experience in the field of text mining and multilingual information retrieval. Averbis works in the field of terminology management, natural language processing, machine learning and semantic search. Its text mining software is embedded into the text mining framework UIMA.

Media aggregation platform

A Media Aggregation Platform or Media Aggregation Portal (MAP) is an over the top service for distributing web-based streaming media content from multiple sources to a large audience. MAPs consist of networks of sources who host their own content which viewers can choose and access directly from a larger variety of content to choose from than a single source can offer. The service is used by content providers, looking to extend the reach of their content. Unlike multichannel video programming distributor (MVPD) or multiple-system operators (MSO), MAPs rely on the Internet rather than cables or satellite. As more network television channels have moved online in the early 21st century, joining web-native channels like Netflix, MAPs aggregate content the way that MSOs and MVPDs have used cable, and to a lesser extent satellite and IPTV infrastructure. There are companies that offer a similar service for free, including Yidio and StreamingMoviesRight, while others charge a subscription fee like as FreeCast Inc's Rabbit TV Plus. When compared with MSOs and MVPDs, MAP networks have much lower costs due to lack of physical infrastructure. The majority of revenue from MAP services are retained by the content creators, and revenue is instead collected from advertisements, pay-per-view, and subscription-based content offerings instead of licensing and reselling content. MAP service consumers interact and purchase content directly from its source, without the markup added by a middleman.