In vector computer graphics, CAD systems, and geographic information systems, a geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that draw the corresponding objects are called "geometric primitives" as well. The most "primitive" primitives are point and straight line segments, which were all that early vector graphics systems had. In constructive solid geometry, primitives are simple geometric shapes such as a cube, cylinder, sphere, cone, pyramid, torus. Modern 2D computer graphics systems may operate with primitives which are curves (segments of straight lines, circles and more complicated curves), as well as shapes (boxes, arbitrary polygons, circles). A common set of two-dimensional primitives includes lines, points, and polygons, although some people prefer to consider triangles primitives, because every polygon can be constructed from triangles (polygon triangulation). All other graphic elements are built up from these primitives. In three dimensions, triangles or polygons positioned in three-dimensional space can be used as primitives to model more complex 3D forms. In some cases, curves (such as Bézier curves, circles, etc.) may be considered primitives; in other cases, curves are complex forms created from many straight, primitive shapes. == Common primitives == The set of geometric primitives is based on the dimension of the region being represented: Point (0-dimensional), a single location with no height, width, or depth. Line or curve (1-dimensional), having length but no width, although a linear feature may curve through a higher-dimensional space. Planar surface or curved surface (2-dimensional), having length and width. Volumetric region or solid (3-dimensional), having length, width, and depth. In GIS, the terrain surface is often spoken of colloquially as "2 1/2 dimensional," because only the upper surface needs to be represented. Thus, elevation can be conceptualized as a scalar field property or function of two-dimensional space, affording it a number of data modeling efficiencies over true 3-dimensional objects. A shape of any of these dimensions greater than zero consists of an infinite number of distinct points. Because digital systems are finite, only a sample set of the points in a shape can be stored. Thus, vector data structures typically represent geometric primitives using a strategic sample, organized in structures that facilitate the software interpolating the remainder of the shape at the time of analysis or display, using the algorithms of Computational geometry. A Point is a single coordinate in a Cartesian coordinate system. Some data models allow for Multipoint features consisting of several disconnected points. A Polygonal chain or Polyline is an ordered list of points (termed vertices in this context). The software is expected to interpolate the intervening shape of the line between adjacent points in the list as a parametric curve, most commonly a straight line, but other types of curves are frequently available, including circular arcs, cubic splines, and Bézier curves. Some of these curves require additional points to be defined that are not on the line itself, but are used for parametric control. A Polygon is a polyline that closes at its endpoints, representing the boundary of a two-dimensional region. The software is expected to use this boundary to partition 2-dimensional space into an interior and exterior. Some data models allow for a single feature to consist of multiple polylines, which could collectively connect to form a single closed boundary, could represent a set of disjoint regions (e.g., the state of Hawaii), or could represent a region with holes (e.g., a lake with an island). A Parametric shape is a standardized two-dimensional or three-dimensional shape defined by a minimal set of parameters, such as an ellipse defined by two points at its foci, or three points at its center, vertex, and co-vertex. A Polyhedron or Polygon mesh is a set of polygon faces in three-dimensional space that are connected at their edges to completely enclose a volumetric region. In some applications, closure may not be required or may be implied, such as modeling terrain. The software is expected to use this surface to partition 3-dimensional space into an interior and exterior. A triangle mesh is a subtype of polyhedron in which all faces must be triangles, the only polygon that will always be planar, including the Triangulated irregular network (TIN) commonly used in GIS. A parametric mesh represents a three-dimensional surface by a connected set of parametric functions, similar to a spline or Bézier curve in two dimensions. The most common structure is the Non-uniform rational B-spline (NURBS), supported by most CAD and animation software. == Application in GIS == A wide variety of vector data structures and formats have been developed during the history of Geographic information systems, but they share a fundamental basis of storing a core set of geometric primitives to represent the location and extent of geographic phenomena. Locations of points are almost always measured within a standard Earth-based coordinate system, whether the spherical Geographic coordinate system (latitude/longitude), or a planar coordinate system, such as the Universal Transverse Mercator. They also share the need to store a set of attributes of each geographic feature alongside its shape; traditionally, this has been accomplished using the data models, data formats, and even software of relational databases. Early vector formats, such as POLYVRT, the ARC/INFO Coverage, and the Esri shapefile support a basic set of geometric primitives: points, polylines, and polygons, only in two dimensional space and the latter two with only straight line interpolation. TIN data structures for representing terrain surfaces as triangle meshes were also added. Since the mid 1990s, new formats have been developed that extend the range of available primitives, generally standardized by the Open Geospatial Consortium's Simple Features specification. Common geometric primitive extensions include: three-dimensional coordinates for points, lines, and polygons; a fourth "dimension" to represent a measured attribute or time; curved segments in lines and polygons; text annotation as a form of geometry; and polygon meshes for three-dimensional objects. Frequently, a representation of the shape of a real-world phenomenon may have a different (usually lower) dimension than the phenomenon being represented. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional volume of material) may be represented as a line. This dimensional generalization correlates with tendencies in spatial cognition. For example, asking the distance between two cities presumes a conceptual model of the cities as points, while giving directions involving travel "up," "down," or "along" a road imply a one-dimensional conceptual model. This is frequently done for purposes of data efficiency, visual simplicity, or cognitive efficiency, and is acceptable if the distinction between the representation and the represented is understood, but can cause confusion if information users assume that the digital shape is a perfect representation of reality (i.e., believing that roads really are lines). == In 3D modelling == In CAD software or 3D modelling, the interface may present the user with the ability to create primitives which may be further modified by edits. For example, in the practice of box modelling the user will start with a cuboid, then use extrusion and other operations to create the model. In this use the primitive is just a convenient starting point, rather than the fundamental unit of modelling. A 3D package may also include a list of extended primitives which are more complex shapes that come with the package. For example, a teapot is listed as a primitive in 3D Studio Max. == In graphics hardware == Various graphics accelerators exist with hardware acceleration for rendering specific primitives such as lines or triangles, frequently with texture mapping and shaders. Modern 3D accelerators typically accept sequences of triangles as triangle strips.
Microsoft Teams
Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt
IDistance
In pattern recognition, iDistance is an indexing and query processing technique for k-nearest neighbor queries on point data in multi-dimensional metric spaces. The kNN query is one of the hardest problems on multi-dimensional data, especially when the dimensionality of the data is high. iDistance is designed to process kNN queries in high-dimensional spaces efficiently and performs extremely well for skewed data distributions, which usually occur in real-life data sets. iDistance employs a two-phase search strategy involving an initial filtering of candidate regions and a subsequent refinement of results, an approach aligned with the Filter and Refine Principle (FRP). This means that the index first prunes the search space to eliminate unlikely candidates, then verifies the true nearest neighbors in a refinement step, following the general FRP paradigm used in database search algorithms. The iDistance index can also be augmented with machine learning models to learn data distributions for improved searching and storage of multi-dimensional data. == Indexing == Building the iDistance index has two steps: A number of reference points in the data space are chosen. There are various ways of choosing reference points. Using cluster centers as reference points is the most efficient way. The data points are partitioned into Voronoi cells based on well-chosen reference points. The distance between a data point and its closest reference point is calculated. This distance plus a scaling value is called the point's iDistance. By this means, points in a multi-dimensional space are mapped to one-dimensional values, and then a B+-tree can be adopted to index the points using the iDistance as the key. The figure on the right shows an example where three reference points (O1, O2, O3) are chosen. The data points are then mapped to a one-dimensional space and indexed in a B+-tree. Various extensions have been proposed to make the selection of reference points for effective query performance, including employing machine learning to learn the identification of reference points. == Query processing == To process a kNN query, the query is mapped to a number of one-dimensional range queries, which can be processed efficiently on a B+-tree. In the above figure, the query Q is mapped to a value in the B+-tree while the kNN search ``sphere" is mapped to a range in the B+-tree. The search sphere expands gradually until the k NNs are found. This corresponds to gradually expanding range searches in the B+-tree. The iDistance technique can be viewed as a way of accelerating the sequential scan. Instead of scanning records from the beginning to the end of the data file, the iDistance starts the scan from spots where the nearest neighbors can be obtained early with a very high probability. == Applications == The iDistance has been used in many applications including Image retrieval Video indexing Similarity search in P2P systems Mobile computing Recommender system == Historical background == The iDistance was first proposed by Cui Yu, Beng Chin Ooi, Kian-Lee Tan and H. V. Jagadish in 2001. Later, together with Rui Zhang, they improved the technique and performed a more comprehensive study on it in 2005.
Common Voice
Common Voice is a crowdsourcing project started by Mozilla to create a free and open speech corpus. The project is supported by volunteers who record sample sentences with a microphone and review recordings of other users. The transcribed sentences are collected in a voice database available under the public domain license CC0. This license ensures that developers can use the database for voice-to-text and text-to-voice applications without restrictions or costs. == Aims == Common Voice aims to provide diverse voice samples. According to Mozilla's Katharina Borchert, many existing projects took datasets from public radio or otherwise had datasets that underrepresented both women and people with pronounced accents. == Voice database == The first dataset was released in November 2017. More than 20,000 users worldwide had recorded 500 hours of English sentences. In February 2019, the first batch of languages was released for use. This included 18 languages such as English, French, German and Mandarin Chinese, but also less prevalent languages like Welsh and Kabyle. In total, this included almost 1,400 hours of recorded voice data from more than 42,000 contributors. By July 2020 the database had amassed 7,226 hours of voice recordings in 54 languages, 5,591 hours of which had been verified by volunteers. In May 2021, following the work to add Kinyarwanda, the project received a grant to add Kiswahili. At the beginning of 2022, Bengali.AI partnered with Common Voice to launch the "Bangla Speech Recognition" project that aims to make machines understand the Bangla language. 2000 hours of voice was collected. In September 2022, it was announced that the Twi language of Ghana was the 100th language to be added to the database. As of December 2025, Mozilla Common Voice collects voice data for over 250 languages, with the most hours having been collected in English, Catalan, Kinyarwanda, Belarusian and Esperanto.
Random forest
Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that works by creating a multitude of decision trees during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees. Random forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance. == History == The general method of random decision forests was first proposed by Salzberg and Heath in 1993, with a method that used a randomized decision tree algorithm to create multiple trees and then combine them using majority voting. This idea was developed further by Ho in 1995. Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected feature dimensions. A subsequent work along the same lines concluded that other splitting methods behave similarly, as long as they are randomly forced to be insensitive to some feature dimensions. This observation that a more complex classifier (a larger forest) gets more accurate nearly monotonically is in sharp contrast to the common belief that the complexity of a classifier can only grow to a certain level of accuracy before being hurt by overfitting. The explanation of the forest method's resistance to overtraining can be found in Kleinberg's theory of stochastic discrimination. The early development of Breiman's notion of random forests was influenced by the work of Amit and Geman who introduced the idea of searching over a random subset of the available decisions when splitting a node, in the context of growing a single tree. The idea of random subspace selection from Ho was also influential in the design of random forests. This method grows a forest of trees, and introduces variation among the trees by projecting the training data into a randomly chosen subspace before fitting each tree or each node. Finally, the idea of randomized node optimization, where the decision at each node is selected by a randomized procedure, rather than a deterministic optimization was first introduced by Thomas G. Dietterich. The proper introduction of random forests was made in a paper by Leo Breiman, that has become one of the world's most cited papers. This paper describes a method of building a forest of uncorrelated trees using a CART like procedure, combined with randomized node optimization and bagging. In addition, this paper combines several ingredients, some previously known and some novel, which form the basis of the modern practice of random forests, in particular: Using out-of-bag error as an estimate of the generalization error. Measuring variable importance through permutation. The report also offers the first theoretical result for random forests in the form of a bound on the generalization error which depends on the strength of the trees in the forest and their correlation. == Algorithm == === Preliminaries: decision tree learning === Decision trees are a popular method for various machine learning tasks. Tree learning is almost "an off-the-shelf procedure for data mining", say Hastie et al., "because it is invariant under scaling and various other transformations of feature values, is robust to inclusion of irrelevant features, and produces inspectable models. However, they are seldom accurate". In particular, trees that are grown very deep tend to learn highly irregular patterns: they overfit their training sets, i.e. have low bias, but very high variance. Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance in the final model. === Bagging === The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging repeatedly (B times) selects a random sample with replacement of the training set and fits trees to these samples: After training, predictions for unseen samples x' can be made by averaging the predictions from all the individual regression trees on x': f ^ = 1 B ∑ b = 1 B f b ( x ′ ) {\displaystyle {\hat {f}}={\frac {1}{B}}\sum _{b=1}^{B}f_{b}(x')} or by taking the plurality vote in the case of classification trees. This bootstrapping procedure leads to better model performance because it decreases the variance of the model, without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated. Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets. Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on x′: σ = ∑ b = 1 B ( f b ( x ′ ) − f ^ ) 2 B − 1 . {\displaystyle \sigma ={\sqrt {\frac {\sum _{b=1}^{B}(f_{b}(x')-{\hat {f}})^{2}}{B-1}}}.} The number B of samples (equivalently, of trees) is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set. B can be optimized using cross-validation, or by observing the out-of-bag error: the mean prediction error on each training sample xi, using only the trees that did not have xi in their bootstrap sample. The training and test error tend to level off after some number of trees have been fit. === From bagging to random forests === The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This process is sometimes called "feature bagging". The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are very strong predictors for the response variable (target output), these features will be selected in many of the B trees, causing them to become correlated. An analysis of how bagging and random subspace projection contribute to accuracy gains under different conditions is given by Ho. Typically, for a classification problem with p {\displaystyle p} features, p {\displaystyle {\sqrt {p}}} (rounded down) features are used in each split. For regression problems the inventors recommend p / 3 {\displaystyle p/3} (rounded down) with a minimum node size of 5 as the default. In practice, the best values for these parameters should be tuned on a case-to-case basis for every problem. === ExtraTrees === Adding one further step of randomization yields extremely randomized trees, or ExtraTrees. As with ordinary random forests, they are an ensemble of individual trees, but there are two main differences: (1) each tree is trained using the whole learning sample (rather than a bootstrap sample), and (2) the top-down splitting is randomized: for each feature under consideration, a number of random cut-points are selected, instead of computing the locally optimal cut-point (based on, e.g., information gain or the Gini impurity). The values are chosen from a uniform distribution within the feature's empirical range (in the tree's training set). Then, of all the randomly chosen splits, the split that yields the highest score is chosen to split the node. Similar to ordinary random forests, the number of randomly selected features to be considered at each node can be specified. Default values for this parameter are p {\displaystyle {\sqrt {p}}} for classification and p {\displaystyle p} for regression, where p {\displaystyle p} is the number of features in the model. === Random forests for high-dimensional data === The basic random forest procedure may
Real-Time UML
Real-Time UML (RTUML) refers to the application of the Unified Modelling Language (UML) for the analysis, design, and implementation of real-time and embedded systems, where timing constraints, concurrency, and resource management are critical. It extends standard UML with profiles, notations, and semantics to handle hard and soft real-time requirements, such as modelling predictable response times and fault tolerance. RTUML is not a separate language but a methodology leveraging UML diagrams (e.g., statecharts, sequence diagrams) for time-sensitive applications like automotive controls, avionics, and medical devices. The term is closely associated with Bruce Powel Douglass, who popularised it through his books and the Harmony process for embedded software development. As of 2025, RTUML remains relevant in industries requiring certified systems, though its adoption varies with agile methodologies and model-driven engineering tools. == Background == Real-Time UML emerged in the late 1990s as UML was standardized by the Object Management Group (OMG) in 1997, addressing the need for object-oriented modeling in real-time systems previously dominated by procedural languages like C. Traditional real-time development relied on "bare metal" programming or theoretical models, but RTUML introduced visual notations for object structure, behaviour, and timing. Bruce Powel Douglass’s 1999 book, Real-Time UML: Developing Efficient Objects for Embedded Systems, formalised the approach, emphasising statecharts for concurrency and timing constraints. Later editions (2004, 2006) incorporated UML 2.0 features like activity and timing diagrams, aligning with OMG’s Real-Time Profile (now part of MARTE—Modelling and Analysis of Real-Time and Embedded Systems). The Harmony process integrates RTUML with executable models for simulation and code generation. RTUML addresses hard real-time systems (e.g., strict deadlines in avionics) versus soft real-time (e.g., media streaming), using UML extensions for schedulability analysis. == Key concepts == RTUML adapts UML diagrams and techniques for real-time needs: Statecharts and Behaviour Modelling: Extended state machines model reactive behaviour, using and-states for concurrency, pseudostates for transitions, and timing constraints (e.g., {duration < 10ms}). Examples include cardiac pacemaker models. Sequence and Interaction Diagrams: Capture message timing, priorities, and resource allocation in multi-threaded systems. Architectural Patterns: Define logical and physical architectures with active objects for concurrency and patterns like observer or publisher-subscriber. Timing and Constraints: Use Object Constraint Language (OCL) for specifying deadlines and priorities. Profiles and Extensions: OMG’s UML Profile for Schedulability, Performance, and Time (SPT) and MARTE add stereotypes like RT::ActiveObject. These support iterative development, from requirements to deployment, often with tools like IBM Rhapsody or Enterprise Architect. == Applications == RTUML is used in: Embedded Systems: Modelling automotive ECUs or UAV controls. Avionics and Defence: DO-178C-compliant designs for fault tolerance. Medical Devices: Pacemakers or ventilators with precise timing. Industrial Automation: RTOS task visualisation via sequence diagrams. Tools like IBM Rhapsody support RTUML for model-based development and code generation in C/C++. == Criticism and adoption == RTUML’s complexity can overwhelm simple systems, and its use in agile environments is limited, where lightweight diagrams are preferred. Surveys indicate UML (including RTUML) is used in 30–50% of embedded projects, often for documentation rather than full model-driven engineering. It remains standard in academia and certified industries like aerospace.
Stochastic block model
The stochastic block model is a generative model for random graphs. This model tends to produce graphs containing communities, subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation was first introduced in 1983 in the field of social network analysis by Paul W. Holland et al. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data. == Definition == The stochastic block model takes the following parameters: The number n {\displaystyle n} of vertices; a partition of the vertex set { 1 , … , n } {\displaystyle \{1,\ldots ,n\}} into disjoint subsets C 1 , … , C r {\displaystyle C_{1},\ldots ,C_{r}} , called communities; a symmetric r × r {\displaystyle r\times r} matrix P {\displaystyle P} of edge probabilities. The edge set is then sampled at random as follows: any two vertices u ∈ C i {\displaystyle u\in C_{i}} and v ∈ C j {\displaystyle v\in C_{j}} are connected by an edge with probability P i j {\displaystyle P_{ij}} . An example problem is: given a graph with n {\displaystyle n} vertices, where the edges are sampled as described, recover the groups C 1 , … , C r {\displaystyle C_{1},\ldots ,C_{r}} . == Special cases == If the probability matrix is a constant, in the sense that P i j = p {\displaystyle P_{ij}=p} for all i , j {\displaystyle i,j} , then the result is the Erdős–Rényi model G ( n , p ) {\displaystyle G(n,p)} . This case is degenerate—the partition into communities becomes irrelevant—but it illustrates a close relationship to the Erdős–Rényi model. The planted partition model is the special case that the values of the probability matrix P {\displaystyle P} are a constant p {\displaystyle p} on the diagonal and another constant q {\displaystyle q} off the diagonal. Thus two vertices within the same community share an edge with probability p {\displaystyle p} , while two vertices in different communities share an edge with probability q {\displaystyle q} . Sometimes it is this restricted model that is called the stochastic block model. The case where p > q {\displaystyle p>q} is called an assortative model, while the case p < q {\displaystyle p P j k {\displaystyle P_{ii}>P_{jk}} whenever j ≠ k {\displaystyle j\neq k} : all diagonal entries dominate all off-diagonal entries. A model is called weakly assortative if P i i > P i j {\displaystyle P_{ii}>P_{ij}} whenever i ≠ j {\displaystyle i\neq j} : each diagonal entry is only required to dominate the rest of its own row and column. Disassortative forms of this terminology exist, by reversing all inequalities. For some algorithms, recovery might be easier for block models with assortative or disassortative conditions of this form. == Typical statistical tasks == Much of the literature on algorithmic community detection addresses three statistical tasks: detection, partial recovery, and exact recovery. === Detection === The goal of detection algorithms is simply to determine, given a sampled graph, whether the graph has latent community structure. More precisely, a graph might be generated, with some known prior probability, from a known stochastic block model, and otherwise from a similar Erdos-Renyi model. The algorithmic task is to correctly identify which of these two underlying models generated the graph. === Partial recovery === In partial recovery, the goal is to approximately determine the latent partition into communities, in the sense of finding a partition that is correlated with the true partition significantly better than a random guess. === Exact recovery === In exact recovery, the goal is to recover the latent partition into communities exactly. The community sizes and probability matrix may be known or unknown. == Statistical lower bounds and threshold behavior == Stochastic block models exhibit a sharp threshold effect reminiscent of percolation thresholds. Suppose that we allow the size n {\displaystyle n} of the graph to grow, keeping the community sizes in fixed proportions. If the probability matrix remains fixed, tasks such as partial and exact recovery become feasible for all non-degenerate parameter settings. However, if we scale down the probability matrix at a suitable rate as n {\displaystyle n} increases, we observe a sharp phase transition: for certain settings of the parameters, it will become possible to achieve recovery with probability tending to 1, whereas on the opposite side of the parameter threshold, the probability of recovery tends to 0 no matter what algorithm is used. For partial recovery, the appropriate scaling is to take P i j = P ~ i j / n {\displaystyle P_{ij}={\tilde {P}}_{ij}/n} for fixed P ~ {\displaystyle {\tilde {P}}} , resulting in graphs of constant average degree. In the case of two equal-sized communities, in the assortative planted partition model with probability matrix P = ( p ~ / n q ~ / n q ~ / n p ~ / n ) , {\displaystyle P=\left({\begin{array}{cc}{\tilde {p}}/n&{\tilde {q}}/n\\{\tilde {q}}/n&{\tilde {p}}/n\end{array}}\right),} partial recovery is feasible with probability 1 − o ( 1 ) {\displaystyle 1-o(1)} whenever ( p ~ − q ~ ) 2 > 2 ( p ~ + q ~ ) {\displaystyle ({\tilde {p}}-{\tilde {q}})^{2}>2({\tilde {p}}+{\tilde {q}})} , whereas any estimator fails partial recovery with probability 1 − o ( 1 ) {\displaystyle 1-o(1)} whenever ( p ~ − q ~ ) 2 < 2 ( p ~ + q ~ ) {\displaystyle ({\tilde {p}}-{\tilde {q}})^{2}<2({\tilde {p}}+{\tilde {q}})} . For exact recovery, the appropriate scaling is to take P i j = P ~ i j log n / n {\displaystyle P_{ij}={\tilde {P}}_{ij}\log n/n} , resulting in graphs of logarithmic average degree. Here a similar threshold exists: for the assortative planted partition model with r {\displaystyle r} equal-sized communities, the threshold lies at p ~ − q ~ = r {\displaystyle {\sqrt {\tilde {p}}}-{\sqrt {\tilde {q}}}={\sqrt {r}}} . In fact, the exact recovery threshold is known for the fully general stochastic block model. == Algorithms == In principle, exact recovery can be solved in its feasible range using maximum likelihood, but this amounts to solving a constrained or regularized cut problem such as minimum bisection that is typically NP-complete. Hence, no known efficient algorithms will correctly compute the maximum-likelihood estimate in the worst case. However, a wide variety of algorithms perform well in the average case, and many high-probability performance guarantees have been proven for algorithms in both the partial and exact recovery settings. Successful algorithms include spectral clustering of the vertices, semidefinite programming, forms of belief propagation, and community detection among others. == Variants == Several variants of the model exist. One minor tweak allocates vertices to communities randomly, according to a categorical distribution, rather than in a fixed partition. More significant variants include the degree-corrected stochastic block model, the hierarchical stochastic block model, the geometric block model, censored block model and the mixed-membership block model. == Topic models == Stochastic block model have been recognised to be a topic model on bipartite networks. In a network of documents and words, Stochastic block model can identify topics: group of words with a similar meaning. == Extensions to signed graphs == Signed graphs allow for both favorable and adverse relationships and serve as a common model choice for various data analysis applications, e.g., correlation clustering. The stochastic block model can be trivially extended to signed graphs by assigning both positive and negative edge weights or equivalently using a difference of adjacency matrices of two stochastic block models. == DARPA/MIT/AWS Graph Challenge: streaming stochastic block partition == GraphChallenge encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to enable relationships between events to be discovered as they unfold in the field. Streaming stochastic block partition is one of the challenges since 2017. Spectral clustering has demonstrated outstanding performance compared to the original and even improved base algorithm, matching its quality of clusters while being multiple orders of magnitude faster.