This is a list of operating systems. Computer operating systems can be categorized by technology, ownership, licensing, working state, usage, and by many other characteristics. In practice, many of these groupings may overlap. Criteria for inclusion is notability, as shown either through an existing Wikipedia article or citation to a reliable source. == Proprietary == === Acorn Computers === Arthur ARX MOS RISC iX RISC OS === Amazon === Fire OS === Amiga Inc. === AmigaOS AmigaOS 1.0-3.9 (Motorola 68000) AmigaOS 4 (PowerPC) Amiga Unix (a.k.a. Amix) === Amstrad === AMSDOS Contiki CP/M 2.2 CP/M Plus SymbOS === Apple === Apple II Apple DOS Apple Pascal ProDOS GS/OS GNO/ME Contiki Apple III Apple SOS Apple Lisa Mac Classic Mac OS A/UX (UNIX System V with BSD extensions) Copland MkLinux Pink Rhapsody macOS (formerly Mac OS X and OS X) macOS Server (formerly Mac OS X Server and OS X Server) Apple Network Server IBM AIX (Apple-customized) Apple MessagePad Newton OS iPhone and iPod Touch iOS (formerly iPhone OS) iPad iPadOS Apple Watch watchOS Apple TV tvOS Embedded operating systems bridgeOS Apple Vision Pro visionOS Embedded operating systems A/ROSE iPod software (unnamed embedded OS for iPod) Unnamed NetBSD variant for Airport Extreme and Time Capsule === Apollo Computer, Hewlett-Packard === Domain/OS – One of the first network-based systems. Run on Apollo/Domain hardware. Later bought by Hewlett-Packard. === Atari === Atari DOS (for 8-bit computers) Atari TOS Atari MultiTOS Contiki (for 8-bit, ST, Portfolio) === BAE Systems === XTS-400 === Be Inc. === BeOS BeIA BeOS r5.1d0 magnussoft ZETA (based on BeOS r5.1d0 source code, developed by yellowTAB) === Bell Labs === Unix ("Ken's new system," for its creator (Ken Thompson), officially Unics and then Unix, the prototypic operating system created in Bell Labs in 1969 that formed the basis for the Unix family of operating systems) UNIX Time-Sharing System v1 UNIX Time-Sharing System v2 UNIX Time-Sharing System v3 UNIX Time-Sharing System v4 UNIX Time-Sharing System v5 UNIX Time-Sharing System v6 MINI-UNIX PWB/UNIX USG CB Unix UNIX Time-Sharing System v7 (It is from Version 7 Unix (and, to an extent, its descendants listed below) that almost all Unix-based and Unix-like operating systems descend.) Unix System III Unix System IV Unix System V Unix System V Releases 2.0, 3.0, 3.2, 4.0, and 4.2 UNIX Time-Sharing System v8 UNIX Time-Sharing System v9 UNIX Time-Sharing System v10 Non-Unix Operating Systems: BESYS Plan 9 from Bell Labs Inferno === Burroughs Corporation, Unisys === Burroughs MCP === CII === Siris 8 === Commodore International === GEOS AmigaOS AROS Research Operating System === Control Data Corporation === ==== Lower 3000 series ==== SCOPE (Supervisory Control Of Program Execution) ==== Upper 3000 series ==== SCOPE (Supervisory Control Of Program Execution) Drum SCOPE ==== 6x00 and related Cyber ==== Chippewa Operating System (COS) MACE (Mansfield and Cahlander Executive) Kronos (Kronographic OS) NOS (Network Operating System) NOS/VE (NOS Virtual Environment) SCOPE (Supervisory Control Of Program Execution) NOS/BE NOS Batch Environment SIPROS (Simultaneous Processing Operating System) ==== Star-100 ==== Multiple Console Time Sharing System (MCTS), from General Motors Research === CloudMosa === Puffin OS === Convergent Technologies === Convergent Technologies Operating System (CTOS) – later acquired by Unisys === Cromemco === Cromemco DOS (CDOS) – a Disk Operating system compatible with CP/M Cromix – a multitasking, multi-user, Unix-like OS for Cromemco microcomputers with Z80A and/or 68000 CPU === Data General === AOS for 16-bit Data General Eclipse computers and AOS/VS for 32-bit (MV series) Eclipses, MP/AOS for microNOVA-based computers DG/UX RDOS Real-time Disk Operating System, with variants: RTOS and DOS (not related to PC DOS, MS-DOS etc.) === Datapoint === CTOS Cassette Tape Operating System for the Datapoint 2200 DOS Disk Operating System for the Datapoint 2200, 5500, and 1100 === DDC-I, Inc. === Deos – Time & Space Partitioned RTOS, Certified to DO-178B, Level A since 1998 HeartOS – POSIX-based Hard Real-Time Operating System === Digital Research, Inc. === CP/M CP/M CP/M for Intel 8080/8085 and Zilog Z80 Personal CP/M, a refinement of CP/M CP/M Plus with BDOS 3.0 CP/M-68K CP/M for Motorola 68000 CP/M-8000 CP/M for Zilog Z8000 CP/M-86 CP/M for Intel 8088/8086 CP/M-86 Plus Personal CP/M-86 MP/M Multi-user version of CP/M-80 MP/M II MP/M-86 Multi-user version of CP/M-86 MP/M 8-16, a dual-processor variant of MP/M for 8086 and 8080 CPUs. Concurrent CP/M, the successor of CP/M-80 and MP/M-80 Concurrent CP/M-86, the successor of CP/M-86 and MP/M-86 Concurrent CP/M 8-16, a dual-processor variant of Concurrent CP/M for 8086 and 8080 CPUs. Concurrent CP/M-68K, a variant for the 68000 DOS Concurrent DOS, the successor of Concurrent CP/M-86 with PC-MODE Concurrent PC DOS, a Concurrent DOS variant for IBM compatible PCs Concurrent DOS 8-16, a dual-processor variant of Concurrent DOS for 8086 and 8080 CPUs Concurrent DOS 286 Concurrent DOS XM, a real-mode variant of Concurrent DOS with EEMS support Concurrent DOS 386 Concurrent DOS 386/MGE, a Concurrent DOS 386 variant with advanced graphics terminal capabilities Concurrent DOS 68K, a port of Concurrent DOS to Motorola 68000 CPUs with DOS source code portability capabilities FlexOS 1.0 – 2.34, a derivative of Concurrent DOS 286 FlexOS 186, a variant of FlexOS for terminals FlexOS 286, a variant of FlexOS for hosts Siemens S5-DOS/MT, an industrial control system based on FlexOS IBM 4680 OS, a POS operating system based on FlexOS IBM 4690 OS, a POS operating system based on FlexOS Toshiba 4690 OS, a POS operating system based on IBM 4690 OS and FlexOS FlexOS 386, a later variant of FlexOS for hosts IBM 4690 OS, a POS operating system based on FlexOS Toshiba 4690 OS, a POS operating system based on IBM 4690 OS and FlexOS FlexOS 68K, a derivative of Concurrent DOS 68K Multiuser DOS, the successor of Concurrent DOS 386 CCI Multiuser DOS Datapac Multiuser DOS Datapac System Manager, a derivative of Datapac Multiuser DOS IMS Multiuser DOS IMS REAL/32, a derivative of Multiuser DOS IMS REAL/NG, the successor of REAL/32 DOS Plus 1.1 – 2.1, a single-user, multi-tasking system derived from Concurrent DOS 4.1 – 5.0 DR-DOS 3.31 – 6.0, a single-user, single-tasking native DOS derived from Concurrent DOS 6.0 Novell PalmDOS 1.0 Novell "Star Trek" Novell DOS 7, a single-user, multi-tasking system derived from DR DOS Caldera OpenDOS 7.01 Caldera DR-DOS 7.02 and higher === Digital Equipment Corporation, Compaq, Hewlett-Packard, Hewlett Packard Enterprise === Batch-11/DOS-11 OS/8 RSTS/E – multi-user time-sharing OS for PDP-11s RSX-11 – multiuser, multitasking OS for PDP-11s RT-11 – single user OS for PDP-11 TOPS-10 – for the PDP-10 TENEX – an ancestor of TOPS-20 from BBN, for the PDP-10 TOPS-20 – for the PDP-10 DEC MICA – for the DEC PRISM Digital UNIX – derived from OSF/1, became HP's Tru64 UNIX Ultrix VMS – originally by DEC (now by VMS Software Inc.) for the VAX mini-computer range; later renamed OpenVMS and ported to Alpha, and subsequently ported to Intel Itanium and then to x86-64 WAITS – for the PDP-6 and PDP-10 === ENEA AB === OSE – Flexible, small footprint, high-performance RTOS for control processors === Fujitsu === Towns OS XSP OS/IV MSP MSP-EX === GEC Computers === COS DOS OS4000 === General Electric, Honeywell, Bull === Real-Time Multiprogramming Operating System GCOS Multics === Google === ChromiumOS is an open source operating system development version of ChromeOS. Both operating systems are based on the Linux kernel. ChromeOS is designed to work exclusively with web applications, though has been updated to run Android apps with full support for Google Play Store. Announced on July 7, 2009, ChromeOS is currently publicly available and was released summer 2011. The ChromeOS source code was released on November 19, 2009, under the BSD license as ChromiumOS. Container-Optimized OS (COS) is an operating system that is optimized for running Docker containers, based on ChromiumOS. Android is an operating system for mobile devices. It consists of Android Runtime (userland) with Linux (kernel), with its Linux kernel modified to add drivers for mobile device hardware and to remove unused Vanilla Linux drivers. gLinux, a Linux distribution that Google uses internally Fuchsia is a capability-based real-time operating system (RTOS) scalable to universal devices, in early development, from the tiniest embedded hardware, wristwatches, tablets to the largest personal computers. Unlike ChromeOS and Android, it is not based on the Linux kernel, but instead began on a new microkernel called "Zircon", derived from "Little Kernel". Wear OS a version of Google's Android operating system designed for smartwatches and other wearables. === Green Hills Software === INTEGRITY – Reliable Operating system INTEGRITY-178B – A DO-178B certified version of INTEGRITY. μ-
Speech segmentation
Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans, and to artificial processes of natural language processing. In the field of automatic pronunciation assessment, the process of segmenting an utterance against expected word(s) is called forced alignment. Speech segmentation is a subfield of general speech perception and an important subproblem of the technologically focused field of speech recognition, and cannot be adequately solved in isolation. As in most natural language processing problems, one must take into account context, grammar, and semantics, and even so the result is often a probabilistic division (statistically based on likelihood) rather than a categorical one. Though it seems that coarticulation—a phenomenon which may happen between adjacent words just as easily as within a single word—presents the main challenge in speech segmentation across languages, some other problems and strategies employed in solving those problems can be seen in the following sections. This problem overlaps to some extent with the problem of text segmentation that occurs in some languages which are traditionally written without inter-word spaces, like Chinese and Japanese, compared to writing systems which indicate speech segmentation between words by a word divider, such as the space. However, even for those languages, text segmentation is often much easier than speech segmentation, because the written language usually has little interference between adjacent words, and often contains additional clues not present in speech (such as the use of Chinese characters for word stems in Japanese). == Lexical recognition == In natural languages, the meaning of a complex spoken sentence can be understood by decomposing it into smaller lexical segments (roughly, the words of the language), associating a meaning to each segment, and combining those meanings according to the grammar rules of the language. Though lexical recognition is not thought to be used by infants in their first year, due to their highly limited vocabularies, it is one of the major processes involved in speech segmentation for adults. Three main models of lexical recognition exist in current research: first, whole-word access, which argues that words have a whole-word representation in the lexicon; second, decomposition, which argues that morphologically complex words are broken down into their morphemes (roots, stems, inflections, etc.) and then interpreted and; third, the view that whole-word and decomposition models are both used, but that the whole-word model provides some computational advantages and is therefore dominant in lexical recognition. To give an example, in a whole-word model, the word "cats" might be stored and searched for by letter, first "c", then "ca", "cat", and finally "cats". The same word, in a decompositional model, would likely be stored under the root word "cat" and could be searched for after removing the "s" suffix. "Falling", similarly, would be stored as "fall" and suffixed with the "ing" inflection. Though proponents of the decompositional model recognize that a morpheme-by-morpheme analysis may require significantly more computation, they argue that the unpacking of morphological information is necessary for other processes (such as syntactic structure) which may occur parallel to lexical searches. As a whole, research into systems of human lexical recognition is limited due to little experimental evidence that fully discriminates between the three main models. In any case, lexical recognition likely contributes significantly to speech segmentation through the contextual clues it provides, given that it is a heavily probabilistic system—based on the statistical likelihood of certain words or constituents occurring together. For example, one can imagine a situation where a person might say "I bought my dog at a ____ shop" and the missing word's vowel is pronounced as in "net", "sweat", or "pet". While the probability of "netshop" is extremely low, since "netshop" isn't currently a compound or phrase in English, and "sweatshop" also seems contextually improbable, "pet shop" is a good fit because it is a common phrase and is also related to the word "dog". Moreover, an utterance can have different meanings depending on how it is split into words. A popular example, often quoted in the field, is the phrase "How to wreck a nice beach", which sounds very similar to "How to recognize speech". As this example shows, proper lexical segmentation depends on context and semantics which draws on the whole of human knowledge and experience, and would thus require advanced pattern recognition and artificial intelligence technologies to be implemented on a computer. Lexical recognition is of particular value in the field of computer speech recognition, since the ability to build and search a network of semantically connected ideas would greatly increase the effectiveness of speech-recognition software. Statistical models can be used to segment and align recorded speech to words or phones. Applications include automatic lip-synch timing for cartoon animation, follow-the-bouncing-ball video sub-titling, and linguistic research. Automatic segmentation and alignment software is commercially available. == Phonotactic cues == For most spoken languages, the boundaries between lexical units are difficult to identify; phonotactics are one answer to this issue. One might expect that the inter-word spaces used by many written languages like English or Spanish would correspond to pauses in their spoken version, but that is true only in very slow speech, when the speaker deliberately inserts those pauses. In normal speech, one typically finds many consecutive words being said with no pauses between them, and often the final sounds of one word blend smoothly or fuse with the initial sounds of the next word. The notion that speech is produced like writing, as a sequence of distinct vowels and consonants, may be a relic of alphabetic heritage for some language communities. In fact, the way vowels are produced depends on the surrounding consonants just as consonants are affected by surrounding vowels; this is called coarticulation. For example, in the word "kit", the [k] is farther forward than when we say 'caught'. But also, the vowel in "kick" is phonetically different from the vowel in "kit", though we normally do not hear this. In addition, there are language-specific changes which occur in casual speech which makes it quite different from spelling. For example, in English, the phrase "hit you" could often be more appropriately spelled "hitcha". From a decompositional perspective, in many cases, phonotactics play a part in letting speakers know where to draw word boundaries. In English, the word "strawberry" is perceived by speakers as consisting (phonetically) of two parts: "straw" and "berry". Other interpretations such as "stra" and "wberry" are inhibited by English phonotactics, which does not allow the cluster "wb" word-initially. Other such examples are "day/dream" and "mile/stone" which are unlikely to be interpreted as "da/ydream" or "mil/estone" due to the phonotactic probability or improbability of certain clusters. The sentence "Five women left", which could be phonetically transcribed as [faɪvwɪmɘnlɛft], is marked since neither /vw/ in /faɪvwɪmɘn/ nor /nl/ in /wɪmɘnlɛft/ are allowed as syllable onsets or codas in English phonotactics. These phonotactic cues often allow speakers to easily distinguish the boundaries in words. Vowel harmony in languages like Finnish can also serve to provide phonotactic cues. While the system does not allow front vowels and back vowels to exist together within one morpheme, compounds allow two morphemes to maintain their own vowel harmony while coexisting in a word. Therefore, in compounds such as "selkä/ongelma" ('back problem') where vowel harmony is distinct between two constituents in a compound, the boundary will be wherever the switch in harmony takes place—between the "ä" and the "ö" in this case. Still, there are instances where phonotactics may not aid in segmentation. Words with unclear clusters or uncontrasted vowel harmony as in "opinto/uudistus" ('student reform') do not offer phonotactic clues as to how they are segmented. From the perspective of the whole-word model, however, these words are thought be stored as full words, so the constituent parts would not necessarily be relevant to lexical recognition. == In infants and non-natives == Infants are one major focus of research in speech segmentation. Since infants have not yet acquired a lexicon capable of providing extensive contextual clues or probability-based word searches within their first year, as mentioned above, they must often rely primarily upon phonotactic and rhythmic cues (with prosody being the dominant cue), all
Diffusion map
Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps are part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive. == Definition of diffusion maps == Following and , diffusion maps can be defined in four steps. === Connectivity === Diffusion maps exploit the relationship between heat diffusion and random walk Markov chain. The basic observation is that if we take a random walk on the data, walking to a nearby data-point is more likely than walking to another that is far away. Let ( X , A , μ ) {\displaystyle (X,{\mathcal {A}},\mu )} be a measure space, where X {\displaystyle X} is the data set and μ {\displaystyle \mu } represents the distribution of the points on X {\displaystyle X} . Based on this, the connectivity k {\displaystyle k} between two data points, x {\displaystyle x} and y {\displaystyle y} , can be defined as the probability of walking from x {\displaystyle x} to y {\displaystyle y} in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points: k : X × X → R {\displaystyle k:X\times X\rightarrow \mathbb {R} } . For example, the popular Gaussian kernel: k ( x , y ) = exp ( − | | x − y | | 2 ϵ ) {\displaystyle k(x,y)=\exp \left(-{\frac {||x-y||^{2}}{\epsilon }}\right)} More generally, the kernel function has the following properties k ( x , y ) = k ( y , x ) {\displaystyle k(x,y)=k(y,x)} ( k {\displaystyle k} is symmetric) k ( x , y ) ≥ 0 ∀ x , y {\displaystyle k(x,y)\geq 0\,\,\forall x,y} ( k {\displaystyle k} is positivity preserving). The kernel constitutes the prior definition of the local geometry of the data-set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference with methods such as principal component analysis, where correlations between all data points are taken into account at once. Given ( X , k ) {\displaystyle (X,k)} , we can then construct a reversible discrete-time Markov chain on X {\displaystyle X} (a process known as the normalized graph Laplacian construction): d ( x ) = ∫ X k ( x , y ) d μ ( y ) {\displaystyle d(x)=\int _{X}k(x,y)d\mu (y)} and define: p ( x , y ) = k ( x , y ) d ( x ) {\displaystyle p(x,y)={\frac {k(x,y)}{d(x)}}} Although the new normalized kernel does not inherit the symmetric property, it does inherit the positivity-preserving property and gains a conservation property: ∫ X p ( x , y ) d μ ( y ) = 1 {\displaystyle \int _{X}p(x,y)d\mu (y)=1} === Diffusion process === From p ( x , y ) {\displaystyle p(x,y)} we can construct a transition matrix of a Markov chain ( M {\displaystyle M} ) on X {\displaystyle X} . In other words, p ( x , y ) {\displaystyle p(x,y)} represents the one-step transition probability from x {\displaystyle x} to y {\displaystyle y} , and M t {\displaystyle M^{t}} gives the t-step transition matrix. We define the diffusion matrix L {\displaystyle L} (it is also a version of graph Laplacian matrix) L i , j = k ( x i , x j ) {\displaystyle L_{i,j}=k(x_{i},x_{j})\,} We then define the new kernel L i , j ( α ) = k ( α ) ( x i , x j ) = L i , j ( d ( x i ) d ( x j ) ) α {\displaystyle L_{i,j}^{(\alpha )}=k^{(\alpha )}(x_{i},x_{j})={\frac {L_{i,j}}{(d(x_{i})d(x_{j}))^{\alpha }}}\,} or equivalently, L ( α ) = D − α L D − α {\displaystyle L^{(\alpha )}=D^{-\alpha }LD^{-\alpha }\,} where D is a diagonal matrix and D i , i = ∑ j L i , j . {\displaystyle D_{i,i}=\sum _{j}L_{i,j}.} We apply the graph Laplacian normalization to this new kernel: M = ( D ( α ) ) − 1 L ( α ) , {\displaystyle M=({D}^{(\alpha )})^{-1}L^{(\alpha )},\,} where D ( α ) {\displaystyle D^{(\alpha )}} is a diagonal matrix and D i , i ( α ) = ∑ j L i , j ( α ) . {\displaystyle {D}_{i,i}^{(\alpha )}=\sum _{j}L_{i,j}^{(\alpha )}.} p ( x j , t | x i ) = M i , j t {\displaystyle p(x_{j},t|x_{i})=M_{i,j}^{t}\,} One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of M {\displaystyle M} ) reveals the geometric structure of X {\displaystyle X} at larger and larger scales (the diffusion process). Specifically, the notion of a cluster in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time t). Therefore, t not only serves as a time parameter, but it also has the dual role of scale parameter. The eigendecomposition of the matrix M t {\displaystyle M^{t}} yields M i , j t = ∑ l λ l t ψ l ( x i ) ϕ l ( x j ) {\displaystyle M_{i,j}^{t}=\sum _{l}\lambda _{l}^{t}\psi _{l}(x_{i})\phi _{l}(x_{j})\,} where { λ l } {\displaystyle \{\lambda _{l}\}} is the sequence of eigenvalues of M {\displaystyle M} and { ψ l } {\displaystyle \{\psi _{l}\}} and { ϕ l } {\displaystyle \{\phi _{l}\}} are the biorthogonal left and right eigenvectors respectively. Due to the spectrum decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum. ==== Parameter α and the diffusion operator ==== The reason to introduce the normalization step involving α {\displaystyle \alpha } is to tune the influence of the data point density on the infinitesimal transition of the diffusion. In some applications, the sampling of the data is generally not related to the geometry of the manifold we are interested in describing. In this case, we can set α = 1 {\displaystyle \alpha =1} and the diffusion operator approximates the Laplace–Beltrami operator. We then recover the Riemannian geometry of the data set regardless of the distribution of the points. To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use α = 0.5 {\displaystyle \alpha =0.5} and the resulting Markov chain approximates the Fokker–Planck diffusion. With α = 0 {\displaystyle \alpha =0} , it reduces to the classical graph Laplacian normalization. === Diffusion distance === The diffusion distance at time t {\displaystyle t} between two points can be measured as the similarity of two points in the observation space with the connectivity between them. It is given by D t ( x i , x j ) 2 = ∑ y ( p ( y , t | x i ) − p ( y , t | x j ) ) 2 ϕ 0 ( y ) {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{y}{\frac {(p(y,t|x_{i})-p(y,t|x_{j}))^{2}}{\phi _{0}(y)}}} where ϕ 0 ( y ) {\displaystyle \phi _{0}(y)} is the stationary distribution of the Markov chain, given by the first left eigenvector of M {\displaystyle M} . Explicitly: ϕ 0 ( y ) = d ( y ) ∑ z ∈ X d ( z ) {\displaystyle \phi _{0}(y)={\frac {d(y)}{\sum _{z\in X}d(z)}}} Intuitively, D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} is small if there is a large number of short paths connecting x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} . There are several interesting features associated with the diffusion distance, based on our previous discussion that t {\displaystyle t} also serves as a scale parameter: Points are closer at a given scale (as specified by D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} ) if they are highly connected in the graph, therefore emphasizing the concept of a cluster. This distance is robust to noise, since the distance between two points depends on all possible paths of length t {\displaystyle t} between the points. From a machine learning point of view, the distance takes into account all evidences linking x i {\displaystyle x_{i}} to x j {\displaystyle x_{j}} , allowing us to conclude that this distance is appropriate for the design of inference algorithms based on the majority of preponderance. === Diffusion process and low-dimensional embedding === The diffusion distance can be calculated using the eigenvectors by D t ( x i , x j ) 2 = ∑ l λ l 2 t ( ψ l ( x i ) − ψ l ( x j ) ) 2 {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{l}\lambda _{l}^{2t}(\psi _{l}(x_{i})-\psi _{l}(x_{j}))^{2}\,} So the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as: Ψ t ( x ) = ( λ 1 t ψ 1 ( x ) , λ 2 t ψ 2 ( x ) , … , λ k t ψ k ( x ) ) {\displaystyle \Psi _{t}(x)=(\lambda _{1}^{t}\psi _{1}(x),\lambda _{2}^{t}\psi _{2}(x),\ld
Jubatus
Jubatus is an open-source online machine learning and distributed computing framework developed at Nippon Telegraph and Telephone and Preferred Infrastructure. Its features include classification, recommendation, regression, anomaly detection and graph mining. It supports many client languages, including C++, Java, Ruby and Python. It uses Iterative Parameter Mixture for distributed machine learning. == Notable Features == Jubatus supports: Multi-classification algorithms: Perceptron Passive Aggressive Confidence Weighted Adaptive Regularization of Weight Vectors Normal Herd Recommendation algorithms using: Inverted index Minhash Locality-sensitive hashing Regression algorithms: Passive Aggressive feature extraction method for natural language: n-gram Text segmentation
Ilastik
ilastik is free open source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018 ilastik is further developed and maintained by Anna Kreshuk's group at European Molecular Biology Laboratory. == Features == ilastik allows user to annotate an arbitrary number of classes in images with a mouse interface. Using these user annotations and the generic (nonlinear) image features, the user can train a random forest classifier. Trained ilastik classifiers can be applied new data not included in the training set in ilastik via its batch processing functionality, or without using the graphical user interface, in headless mode. ilastik can be integrated into various related tools: Pre-trained workflows can be executed directly from ImageJ/Fiji using the ilastik-ImageJ plugin. Pre-trained ilastik Pixel Classification workflows can be run directly in Python with the ilastik Python package, which is available via conda. ilastik has a CellProfiler module to use ilastik classifiers to process images within a CellProfiler framework. == History == ilastik was first released in 2011 by scientists at the Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg. == Application == The Interactive Learning and Segmentation Toolkit Carving Cell classification and neuron classification Synapse detection Cell tracking Neural Network Classification == Resources == ilastik project is hosted on GitHub. It is a collaborative project, any contributions such as comments, bug reports, bug fixes or code contributions are welcome. The ilastik team can be contacted for user support on the image.sc forum.
PlantUML
PlantUML is an open-source tool allowing users to create diagrams from a plain text language. Besides various UML diagrams, PlantUML has support for various other software development related formats (such as Archimate, Block diagram, BPMN, C4, Computer network diagram, ERD, Gantt chart, Mind map, and WBD), as well as visualisation of JSON and YAML files. The language of PlantUML is an example of a domain-specific language. Besides its own DSL, PlantUML also understands AsciiMath, Creole, DOT, and LaTeX. It uses Graphviz software to lay out its diagrams and Tikz for LaTeX support. Images can be output as PNG, SVG, LaTeX and even ASCII art. PlantUML has also been used to allow blind people to design and read UML diagrams. == Applications that use PlantUML == There are various extensions or add-ons that incorporate PlantUML. Atom has a community maintained PlantUML syntax highlighter and viewer. Confluence wiki has a PlantUML plug-in for Confluence Server, which renders diagrams on-the-fly during a page reload. There is an additional PlantUML plug-in for Confluence Cloud. Doxygen integrates diagrams for which sources are provided after the startuml command. Eclipse has a PlantUML plug-in. Google Docs has an add-on called PlantUML Gizmo that works with the PlantUML.com server. IntelliJ IDEA can create and display diagrams embedded into Markdown (built-in) or in standalone files (using a plugin). LaTeX using the Tikz package has limited support for PlantUML. LibreOffice has Libo_PlantUML extension to use PlantUML diagrams. MediaWiki has a PlantUML plug-in which renders diagrams in pages as SVG or PNG. Microsoft Word can use PlantUML diagrams via a Word Template Add-in. There is an additional Visual Studio Tools for Office add-in called PlantUML Gizmo that works in a similar fashion. NetBeans has a PlantUML plug-in. Notepad++ has a PlantUML plug-in. Obsidian has a PlantUML plug-in. Org-mode has a PlantUML org-babel support. Rider has a PlantUML plug-in. Sublime Text has a PlantUML package called PlantUmlDiagrams for Sublime Text 2 and 3. Visual Studio Code has various PlantUML extensions on its marketplace, most popular being PlantUML by jebbs. Vnote open source notetaking markdown application has built in PlantUML support. Xcode has a community maintained Source Editor Extension to generate and view PlantUML class diagrams from Swift source code. == Text format to communicate UML at source code level == PlantUML uses well-formed and human-readable code to render the diagrams. There are other text formats for UML modelling, but PlantUML supports many diagram types, and does not need an explicit layout, though it is possible to tweak the diagrams if necessary. +--------------------------------------+ | TEDx Talks Recommendation | | System | +--------------------------------------+ | +----------------------------------+ | | | Visitor | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Authenticated User | +--------------------------------------+ | +----------------------------------+ | | | User | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | | + Save Favorite Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Admin | +--------------------------------------+ | +----------------------------------+ | | | Admin | | | +----------------------------------+ | | | + CRUD Talks | | | | + Manage Users | | | +----------------------------------+ | +--------------------------------------+
(1+ε)-approximate nearest neighbor search
(1+ε)-approximate nearest neighbor search is a variant of the nearest neighbor search problem. A solution to the (1+ε)-approximate nearest neighbor search is a point or multiple points within distance (1+ε) R from a query point, where R is the distance between the query point and its true nearest neighbor. Reasons to approximate nearest neighbor search include the space and time costs of exact solutions in high-dimensional spaces (see curse of dimensionality) and that in some domains, finding an approximate nearest neighbor is an acceptable solution. Approaches for solving (1+ε)-approximate nearest neighbor search include k-d trees, locality-sensitive hashing and brute-force search.