AI Detector Jobs

AI Detector Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Viola–Jones object detection framework

    Viola–Jones object detection framework

    The Viola–Jones object detection framework is a machine learning object detection framework proposed in 2001 by Paul Viola and Michael Jones. It was motivated primarily by the problem of face detection, although it can be adapted to the detection of other object classes. In short, it consists of a sequence of classifiers. Each classifier is a single perceptron with several binary masks (Haar features). To detect faces in an image, a sliding window is computed over the image. For each image, the classifiers are applied. If at any point, a classifier outputs "no face detected", then the window is considered to contain no face. Otherwise, if all classifiers output "face detected", then the window is considered to contain a face. The algorithm is efficient for its time, able to detect faces in 384 by 288 pixel images at 15 frames per second on a conventional 700 MHz Intel Pentium III. It is also robust, achieving high precision and recall. While it has lower accuracy than more modern methods such as convolutional neural network, its efficiency and compact size (only around 50k parameters, compared to millions of parameters for typical CNN like DeepFace) means it is still used in cases with limited computational power. For example, in the original paper, they reported that this face detector could run on the Compaq iPAQ at 2 fps (this device has a low power StrongARM without floating point hardware). == Problem description == Face detection is a binary classification problem combined with a localization problem: given a picture, decide whether it contains faces, and construct bounding boxes for the faces. To make the task more manageable, the Viola–Jones algorithm only detects full view (no occlusion), frontal (no head-turning), upright (no rotation), well-lit, full-sized (occupying most of the frame) faces in fixed-resolution images. The restrictions are not as severe as they appear, as one can normalize the picture to bring it closer to the requirements for Viola-Jones. any image can be scaled to a fixed resolution for a general picture with a face of unknown size and orientation, one can perform blob detection to discover potential faces, then scale and rotate them into the upright, full-sized position. the brightness of the image can be corrected by white balancing. the bounding boxes can be found by sliding a window across the entire picture, and marking down every window that contains a face. This would generally detect the same face multiple times, for which duplication removal methods, such as non-maximal suppression, can be used. The "frontal" requirement is non-negotiable, as there is no simple transformation on the image that can turn a face from a side view to a frontal view. However, one can train multiple Viola-Jones classifiers, one for each angle: one for frontal view, one for 3/4 view, one for profile view, a few more for the angles in-between them. Then one can at run time execute all these classifiers in parallel to detect faces at different view angles. The "full-view" requirement is also non-negotiable, and cannot be simply dealt with by training more Viola-Jones classifiers, since there are too many possible ways to occlude a face. == Components of the framework == A full presentation of the algorithm is in. Consider an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution ( M , N ) {\displaystyle (M,N)} . Our task is to make a binary decision: whether it is a photo of a standardized face (frontal, well-lit, etc) or not. Viola–Jones is essentially a boosted feature learning algorithm, trained by running a modified AdaBoost algorithm on Haar feature classifiers to find a sequence of classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . Haar feature classifiers are crude, but allows very fast computation, and the modified AdaBoost constructs a strong classifier out of many weak ones. At run time, a given image I {\displaystyle I} is tested on f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". For this reason, the Viola-Jones classifier is also called "Haar cascade classifier". === Haar feature classifiers === Consider a perceptron f w , b {\displaystyle f_{w,b}} defined by two variables w ( x , y ) , b {\displaystyle w(x,y),b} . It takes in an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution, and returns f w , b ( I ) = { 1 , if ∑ x , y w ( x , y ) I ( x , y ) + b > 0 0 , else {\displaystyle f_{w,b}(I)={\begin{cases}1,\quad {\text{if }}\sum _{x,y}w(x,y)I(x,y)+b>0\\0,\quad {\text{else}}\end{cases}}} A Haar feature classifier is a perceptron f w , b {\displaystyle f_{w,b}} with a very special kind of w {\displaystyle w} that makes it extremely cheap to calculate. Namely, if we write out the matrix w ( x , y ) {\displaystyle w(x,y)} , we find that it takes only three possible values { + 1 , − 1 , 0 } {\displaystyle \{+1,-1,0\}} , and if we color the matrix with white on + 1 {\displaystyle +1} , black on − 1 {\displaystyle -1} , and transparent on 0 {\displaystyle 0} , the matrix is in one of the 5 possible patterns shown on the right. Each pattern must also be symmetric to x-reflection and y-reflection (ignoring the color change), so for example, for the horizontal white-black feature, the two rectangles must be of the same width. For the vertical white-black-white feature, the white rectangles must be of the same height, but there is no restriction on the black rectangle's height. ==== Rationale for Haar features ==== The Haar features used in the Viola-Jones algorithm are a subset of the more general Haar basis functions, which have been used previously in the realm of image-based object detection. While crude compared to alternatives such as steerable filters, Haar features are sufficiently complex to match features of typical human faces. For example: The eye region is darker than the upper-cheeks. The nose bridge region is brighter than the eyes. Composition of properties forming matchable facial features: Location and size: eyes, mouth, bridge of nose Value: oriented gradients of pixel intensities Further, the design of Haar features allows for efficient computation of f w , b ( I ) {\displaystyle f_{w,b}(I)} using only constant number of additions and subtractions, regardless of the size of the rectangular features, using the summed-area table. === Learning and using a Viola–Jones classifier === Choose a resolution ( M , N ) {\displaystyle (M,N)} for the images to be classified. In the original paper, they recommended ( M , N ) = ( 24 , 24 ) {\displaystyle (M,N)=(24,24)} . ==== Learning ==== Collect a training set, with some containing faces, and others not containing faces. Perform a certain modified AdaBoost training on the set of all Haar feature classifiers of dimension ( M , N ) {\displaystyle (M,N)} , until a desired level of precision and recall is reached. The modified AdaBoost algorithm would output a sequence of Haar feature classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . The details of the modified AdaBoost algorithm is detailed below. ==== Using ==== To use a Viola-Jones classifier with f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} on an image I {\displaystyle I} , compute f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". === Learning algorithm === The speed with which features may be evaluated does not adequately compensate for their number, however. For example, in a standard 24x24 pixel sub-window, there are a total of M = 162336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image. Thus, the object detection framework employs a variant of the learning algorithm AdaBoost to both select the best features and to train classifiers that use them. This algorithm constructs a "strong" classifier as a linear combination of weighted simple “weak” classifiers. h ( x ) = sgn ⁡ ( ∑ j = 1 M α j h j ( x ) ) {\displaystyle h(\mathbf {x} )=\operatorname {sgn} \left(\sum _{j=1}^{M}\alpha _{j}h_{j}(\mathbf {x} )\right)} Each weak classifier is a threshold function based on the feature f j {\displaystyle f_{j}} . h j ( x ) = { − s j if f j < θ j s j otherwise {\displaystyle h_{j}(\mathbf {x} )={\begin{cases}-s_{j}&{\text{if }}f_{j}<\theta _{j}\\s_{j}&{\text{otherwise}}\end{cases}}} The threshold value θ j {\displaystyle \theta _{j}} and the polarity s j ∈ ± 1 {\displaystyle s_{j}\in \pm 1} are determined in the training, as well as the coefficients α j {\displaystyle \alpha _{j}} . Here a simplified version of the lea

    Read more →
  • Order-independent transparency

    Order-independent transparency

    Order-independent transparency (OIT) is a class of techniques in rasterisational computer graphics for rendering transparency in a 3D scene, which do not require rendering geometry in sorted order for alpha compositing. == Description == Commonly, 3D geometry with transparency is rendered by blending (using alpha compositing) all surfaces into a single buffer (think of this as a canvas). Each surface occludes existing color and adds some of its own color depending on its alpha value, a ratio of light transmittance. The order in which surfaces are blended affects the total occlusion or visibility of each surface. For a correct result, surfaces must be blended from farthest to nearest or nearest to farthest, depending on the alpha compositing operation, over or under. Ordering may be achieved by rendering the geometry in sorted order, for example sorting triangles by depth, but can take a significant amount of time, not always produce a solution (in the case of intersecting or circularly overlapping geometry) and the implementation is complex. Instead, order-independent transparency sorts geometry per-pixel, after rasterisation. For exact results this requires storing all fragments before sorting and compositing. == History == The A-buffer is a computer graphics technique introduced in 1984 which stores per-pixel lists of fragment data (including micro-polygon information) in a software rasteriser, REYES, originally designed for anti-aliasing but also supporting transparency. More recently, depth peeling in 2001 described a hardware accelerated OIT technique. With limitations in graphics hardware the scene's geometry had to be rendered many times. A number of techniques have followed, to improve on the performance of depth peeling, still with the many-pass rendering limitation. For example, Dual Depth Peeling (2008). In 2009, two significant features were introduced in GPU hardware/drivers/Graphics APIs that allowed capturing and storing fragment data in a single rendering pass of the scene, something not previously possible. These are, the ability to write to arbitrary GPU memory from shaders and atomic operations. With these features a new class of OIT techniques became possible that do not require many rendering passes of the scene's geometry. The first was storing the fragment data in a 3D array, where fragments are stored along the z dimension for each pixel x/y. In practice, most of the 3D array is unused or overflows, as a scene's depth complexity is typically uneven. To avoid overflow the 3D array requires large amounts of memory, which in many cases is impractical. Two approaches to reducing this memory overhead exist. Packing the 3D array with a prefix sum scan, or linearizing, removed the unused memory issue but requires an additional depth complexity computation rendering pass of the geometry. The "Sparsity-aware" S-Buffer, Dynamic Fragment Buffer, "deque" D-Buffer, Linearized Layered Fragment Buffer all pack fragment data with a prefix sum scan and are demonstrated with OIT. Storing fragments in per-pixel linked lists provides tight packing of this data and in late 2011, driver improvements reduced the atomic operation contention overhead making the technique very competitive. == Exact OIT == Exact, as opposed to approximate, OIT accurately computes the final color, for which all fragments must be sorted. For high depth complexity scenes, sorting becomes the bottleneck. One issue with the sorting stage is local memory limited occupancy, in this case a SIMT attribute relating to the throughput and operation latency hiding of GPUs. Backwards memory allocation (BMA) groups pixels by their depth complexity and sorts them in batches to improve the occupancy and hence performance of low depth complexity pixels in the context of a potentially high depth complexity scene. Up to a 3× overall OIT performance increase is reported. Sorting is typically performed in a local array, however performance can be improved further by making use of the GPU's memory hierarchy and sorting in registers, similarly to an external merge sort, especially in conjunction with BMA. == Approximate OIT == Approximate OIT techniques relax the constraint of exact rendering to provide faster results. Higher performance can be gained from not having to store all fragments or only partially sorting the geometry. A number of techniques also compress, or reduce, the fragment data. These include: Stochastic Transparency: draw in a higher resolution in full opacity but discard some fragments. Downsampling will then yield transparency. Adaptive Transparency, a two-pass technique where the first constructs a visibility function which compresses on the fly (this compression avoids having to fully sort the fragments) and the second uses this data to composite unordered fragments. Intel's pixel synchronization avoids the need to store all fragments, removing the unbounded memory requirement of many other OIT techniques. Weighted Blended Order-Independent Transparency replaced the over operator with a commutative approximation. Feeding depth information into the weight produces visually-acceptable occlusion. == OIT in Hardware == The Sega Dreamcast games console included hardware support for automatic OIT.

    Read more →
  • Shell Control Box

    Shell Control Box

    Shell Control Box (SCB) is a network security appliance that controls privileged access to remote IT systems, records activities in replayable audit trails, and prevents malicious actions. For example, it records as a system administrator updates a file server or a third-party network operator configures a router. The recorded audit trails can be replayed like a movie to review the events as they occurred. The content of the audit trails is indexed to make searching for events and automatic reporting possible. SCB is a Linux-based device developed by Balabit. It is an application level proxy gateway. In 2017, Balabit changed the name of the product to Privileged Session Management (PSM) and repositioned it as the core module of its Privileged Access Management solution. == Main Features == Balabit’s Privileged Session Management (PSM), Shell Control Box (SCB) is a device that controls, monitors, and audits remote administrative access to servers and network devices. It is a tool to oversee system administrators by controlling the encrypted connections used for administration. PSM (SCB) has full control over the SSH, RDP, Telnet, TN3270, TN5250, Citrix ICA, and VNC connections, providing a framework (with solid boundaries) for the work of the administrators. === Gateway Authentication === PSM (SCB) acts as an authentication gateway, enforcing strong authentication before users access IT assets. PSM can also integrate to user directories (for example, a Microsoft Active Directory) to resolve the group memberships of the users who access the protected servers. Credentials for accessing the server are retrieved transparently from PSM’s credential store or a third-party password management system by PSM impersonating the authenticated user. This automatic password retrieval protects the confidentiality of passwords as users can never access them. === Access Control === PSM controls and audits privileged access over the most wide-spread protocols such as SSH, RDP, or HTTP(s). The detailed access management helps to control who can access what and when on servers. It is also possible to control advanced features of the protocols, like the type of channels permitted. For example, unneeded channels like file transfer or file sharing can be disabled, reducing the security risk on the server. With PSM policies for privileged access can be enforced in one single system. === 4-eyes Authorization === To avoid accidental misconfiguration and other human errors, PSM supports the 4-eyes authorization principle. This is achieved by requiring an authorizer to allow administrators to access the server. The authorizer also has the possibility to monitor – and terminate - the session of the administrator in real-time, as if they were watching the same screen. === Real-time Monitoring and Session Termination === PSM can monitor the network traffic in real time, and execute various actions if a certain pattern (for example, a suspicious command, window title or text) appears on the screen. PSM can also detect specific patterns such as credit card numbers. In case of detecting a suspicious user action, PSM can send an e-mail alert or immediately terminate the connection. For example, PSM can block the connection before a destructive administrator command, such as the „rm” comes into effect. === Session Recording === PSM makes user activities traceable by recording them in tamper-proof and confidential audit trails. It records the selected sessions into encrypted, timestamped, and digitally signed audit trails. Audit trails can be browsed online, or followed real-time to monitor the activities of the users. PSM replays the recorded sessions just like a movie – actions of the users can be seen exactly as they appeared on their monitor. The Balabit Desktop Player enables fast forwarding during replays, searching for events (for example, typed commands or pressing Enter) and texts seen by the user. In the case of any problems (database manipulation, unexpected shutdown, etc.) the circumstances of the event are readily available in the trails, thus the cause of the incident can be identified. In addition to recording audit trails, transferred files can be also recorded and extracted for further analysis.

    Read more →
  • List of software palettes

    List of software palettes

    This is a list of software palettes used by computers. Systems that use a 4-bit or 8-bit pixel depth can display up to 16 or 256 colors simultaneously. Many personal computers in the early 1990s displayed at most 256 different colors, freely selected by software (either by the user or by a program) from their wider hardware's RGB color palette. Usual selections of colors in limited subsets (generally 16 or 256) of the full palette includes some RGB level arrangements commonly used with the 8-bit palettes as master palettes or universal palettes (i.e., palettes for multipurpose uses). These are some representative software palettes, but any selection can be made in such of systems. For specific hardware color palettes, see the list of monochrome and RGB palettes, list of 8-bit computer hardware graphics, the list of 16-bit computer hardware graphics and the list of video game console palettes articles. Each palette is represented by an array of color patches. A one-pixel size version appears below each palette, to make it easy to compare palette sizes. For each unique palette, an image color test chart and sample image (truecolor original follows) rendered with that palette (without dithering) are given. The test chart shows the full 8-bit, 256 levels of the red, green, and blue (RGB) primary colors and cyan, magenta, and yellow complementary colors, along with a full 8-bit, 256 levels grayscale. Gradients of RGB intermediate colors (orange, lime green, sea green, sky blue, violet and fuchsia), and a full hue spectrum are also present. Color charts are not gamma corrected. These elements illustrate the color depth and distribution of the colors of any given palette, and the sample image indicates how the color selection of such palettes could represent real-life images. == System specifics == These are selections of colors officially employed as system palettes in some popular operating systems for personal computers that support 8-bit displays. === Microsoft Windows and IBM OS/2 default 16-color palette === Used by these platforms as a roughly backward compatible palette for the CGA, EGA and VGA text modes, but with colors arranged in a different order. Also, is the default palette for 16 color icons. The corresponding indices into this palette are: === Microsoft Windows default 20-color palette === In 256-color mode, there are four additional standard Windows colors, twenty system reserved colors in total; thus the system leaves 236 palette indexes free for applications to use. The system color entries inside a 256-color palette table are the first ten plus the last ten. In any case, the additional system colors do not seem to add a sharp color richness: they are only some intermediate shades of grayish colors. Since Windows 95, these additional colors can be changed by the system when a color scheme needs custom colors, reducing their utility as static, unchanging palette entries. The complete 20-color Windows system palette is: === Apple Macintosh default 16-color palette === When Apple Computer introduced the Macintosh II in 1987, this 16-color palette was included in System 4.1. === RISC OS default palette === Acorn RISC OS 2.x and 3.x provided this 16-color palette: === Solaris default 16-color palette === Solaris OS used this color palette: == RGB arrangements == These are selections of colors based in evenly ordered RGB levels which provide complete RGB combinations, mainly used as master palettes to display any kind of image within the limitations of the 8-bit pixel depth. === 6 level RGB === Having six levels for every primary, with 6³ = 216 combinations. The index can be addressed by (36×R)+(6×G)+B, with all R, G and B values in a range from 0 to 5. Intended as homogeneous RGB cube, it gives six true grays. Also, there is room for another sorts of 40 colors, so operating systems or programs can add extra colors. Systems that use this software palette are: Web-safe colors Apple Macintosh 256 color default palette. It also contains four gradients of ten shades each for gray, red, green and blue. === 6-7-6 levels RGB === This palette is constructed with six levels for red and blue primaries and seven levels for the green primary, giving 6×7×6 = 252 combinations. The index can be addressed by (42×R)+(6×G)+B, with R and B values in a range from 0 to 5 and G in a range from 0 to 6. The same case as the former, but with an added level of green due to the greater sensibility of the normal human eye to this frequency. It does not provide true grays, but remaining indexes can be filled with four intermediate grays. In any case, there is little room for any other color. === 6-8-5 levels RGB === This palette is constructed with six levels for red, eight levels for green and five levels for the blue primaries, giving 6×8×5 = 240 combinations. The index can be addressed by (40×R)+(5×G)+B, with R ranging from 0 to 5, G from 0 to 7 and B from 0 to 4. Levels are chosen in function of sensibility of the normal human eye to every primary color. Also, it does not provide true grays. Remaining indexes can be filled with sixteen intermediate grays or other fixed colors. In fact, this is the best balanced RGB master software palette, in a compromise between the RGB arrangement based in the human eye's sensibility and a sufficient remaining palette entries for another purposes. === 8-8-4 levels RGB === The 8-8-4 level RGB use eight levels for each of the red and green color components (3+3 high order bits), and four levels (2 low order bits) for the blue component, due to the lesser sensitivity of the normal human eye to this primary color. This results in an 8×8×4 = 256-color palette as follows: This RGB software palette occupies the full 8-bit range of possible palette entries, so there is no room for other fixed colors. Software using this palette must draw their user interface elements with the same colors used to show pictures. Also again, it does not provide true grays. == Other common uses of software palettes == === Grayscale palettes === Simple palette made doing every triplet RGB primaries having equal values as a continuous gradient from black to white through the full available palette entries. Here is the 8-bit, 256 levels palette: Used to display pure grayscale TIFF or JPEG images, for example. === Color gradient palettes === Palettes made of a continuous color gradient from darkest to lightest arbitrary hues. The pixel data is treated as if it were grayscale, but the color table plays with RGB color combinations, not only gray. The relationship between the original luminance and the mapped one can vary, but the lighting scale is preserved along all the palette entries. One very common case of such palettes is the sepia tone palette, which gives an image an old fashioned and aged look (left). Another gradient example, based on blue hues, is presented here (right), but any hue or mixing of hues can be used. Many cell phones with built-in cameras have options to take colorized photos using this technique. === Adaptive palettes === Those whose whole number of available indexes are filled with RGB combinations selected from the statistical order of appearance (usually balanced) of a concrete full true color original image. There exist many algorithms to pick the colors through color quantization; one well known is the Heckbert's median-cut algorithm. Here is the 8-bit, 256 color palette used with the color test chart and the image sample above: Adaptive palettes only work well with a unique image. Trying to display different images with adaptive palettes over an 8-bit display usually results in only one image with correct colors, because the images have different palettes and only one can be displayed at a time. Here is an example of what happens when an indexed color image is displayed with any color palette that is not its own adaptive palette: === False color palettes === Arbitrary gradient color scales, usually 256 shades, with no relationship with real colors of a given image. They are employed to artificially colorize a grayscale image to reveal details and/or to map the pixel level values to amounts of some physical magnitude (potential, temperature, altitude, etc.) Note, in the example above, that new details can be seen as blue over magenta in the background's dark areas of the original photograph. Here is the 8-bit, 256 color gradient palette used with the color test chart and the image sample above: There exist many false color palettes, some of them standardized, used mainly in scientific applications: astronomy and radioastronomy, satellite land imaging, thermography, study of materials, tomography and magnetic resonance imaging in medicine, etc.

    Read more →
  • Kernel embedding of distributions

    Kernel embedding of distributions

    In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

    Read more →
  • Micro stuttering

    Micro stuttering

    Micro stuttering is a visual artifact in real-time computer graphics in which the time intervals between consecutively displayed frames are uneven, even though the average frame rate reported by benchmarking software appears adequate. Tools such as 3DMark typically compute frame rates over intervals of one second or more, which can conceal momentary drops in the instantaneous frame rate that the viewer perceives as hitching or jerking of on-screen motion. At low frame rates the effect is visible as a stutter in moving images, degrading the experience in interactive applications such as video games. In severe cases a lower but more consistent frame rate can appear smoother than a higher but more erratic one. The term gained prominence in the late 2000s in discussions of multi-GPU rendering (see History), but micro stuttering also affects single-GPU systems. Common causes on modern hardware include real-time shader compilation, asset streaming from storage, VRAM exhaustion, and driver bugs. == Causes == === Shader compilation === A common cause of micro stuttering on modern PCs is real-time shader compilation. Shaders are small programs that instruct the GPU on how to render visual effects such as lighting, shadows, and reflections. On consoles, developers can pre-compile all shaders for the known, fixed hardware. On PCs, the variety of GPU architectures means shaders must often be compiled at run time, either when the game launches or during gameplay itself. When the rendering engine encounters a shader that has not yet been compiled, the CPU must finish the compilation before the GPU can draw the affected object. This causes a spike in frame time that the player perceives as a hitch. The problem has been particularly associated with games built on Unreal Engine 4 running under DirectX 12, because DX12 shifts more shader management responsibility to the application. Several techniques exist to reduce shader compilation stutter. Pipeline State Object (PSO) pre-caching records the shader permutations used at runtime so that they can be compiled in advance on subsequent launches. Asynchronous shader compilation moves the work to background CPU threads to avoid blocking the main rendering thread. Platform-level services such as Steam's shader pre-caching distribute previously compiled shaders to users with matching GPU hardware. The Steam Deck, which contains a single fixed GPU, benefits from pre-compiled shader caches because all units share the same hardware configuration. === Other causes === Micro stuttering on single-GPU systems can have several additional causes. CPU bottlenecks or scheduling interruptions from background tasks can prevent the processor from preparing frames at regular intervals. Asset streaming during gameplay (loading textures, geometry, or audio from storage) can produce hitches sometimes called traversal stutter; the use of solid-state drives and technologies such as DirectStorage has reduced but not eliminated this. VRAM exhaustion forces data to be swapped between video memory and system memory over the PCI Express bus, which is slower. Graphics driver bugs can also introduce stutter; Nvidia released hotfix driver 551.46 in February 2024 to correct intermittent micro stuttering when V-Sync was enabled. == Measurement == Micro stuttering drew attention to the limitations of average frame rate as a performance metric. In 2013, Scott Wasson at The Tech Report published a series of articles advocating frame time analysis, in which the delivery time of every individual frame is recorded and plotted rather than collapsed into a single frames-per-second figure. This approach was adopted by other hardware review publications in the following years. GPU reviews now routinely report 1% low and 0.1% low frame rates alongside the average. The 1% low is the average frame rate of the slowest 1% of frames in a sample; it serves as an indicator of worst-case smoothness. A large gap between the average and the 1% low suggests poor frame pacing. Tools for capturing per-frame timing data include FRAPS, PresentMon, OCAT, CapFrameX, and MSI Afterburner with RivaTuner Statistics Server. == Mitigation == === Frame pacing === Frame pacing is a software technique that regulates the timing of frame delivery to produce even intervals between displayed frames. Game engines, GPU drivers, and platform libraries all implement frame pacing strategies to varying degrees. On mobile platforms, Google provides the Android Frame Pacing library (Swappy) as part of the Android Game Development Kit. In December 2025, the Khronos Group published the VK_EXT_present_timing Vulkan extension, giving developers explicit control over presentation timing in a cross-platform graphics API for the first time. === Variable refresh rate === Variable refresh rate (VRR) display technologies allow a monitor's refresh rate to change to match the GPU's frame output. Implementations include Nvidia G-Sync (2013), AMD FreeSync (2015), and the VESA Adaptive-Sync standard built into DisplayPort 1.2a and later. VRR eliminates the screen tearing that results from a mismatch between frame rate and refresh rate, and avoids the frame-holding behaviour of V-Sync that can itself cause stutter. It is effective at smoothing moderate frame rate fluctuations but cannot compensate for large sudden spikes in frame time such as those caused by shader compilation or heavy asset streaming. VRR support has become standard in gaming monitors, televisions (via HDMI 2.1), and the Xbox Series X/S and PlayStation 5 consoles. === Frame generation === Beginning with DLSS 3 on the GeForce RTX 40 series in 2022, Nvidia introduced AI-based frame generation, which uses dedicated optical flow hardware and a neural network to create new frames between traditionally rendered ones. AMD followed with FSR 3 in 2023, using an algorithmic approach, and the AI-based FSR 4 for the Radeon RX 9000 series in 2025. DLSS 4, released in January 2025 for the GeForce RTX 50 series, can generate up to three frames per rendered frame using a technique called Multi Frame Generation. Frame generation increases the displayed frame rate but introduces its own frame pacing concerns. If the underlying rendered frames are unevenly timed, the interpolated frames can make the unevenness more apparent rather than less. DLSS 4 addresses this with hardware-level flip metering on the GPU's display engine, which controls the timing of frame presentation more precisely than the CPU-based pacing used in DLSS 3. Both vendors pair frame generation with latency-reduction features (Nvidia Reflex and AMD Anti-Lag+) to offset the additional input latency that results from inserting synthetic frames into the pipeline. === Frame rate limiters === Capping the frame rate below the display's maximum refresh rate, using tools such as RivaTuner Statistics Server, in-game limiters, or driver-level settings, is a common way to improve frame pacing. Preventing the GPU from running ahead of the display reduces variability in frame delivery times and can produce a smoother result than an uncapped but more irregular frame rate. == History == === Multi-GPU configurations === Micro stuttering was first widely documented in the late 2000s as a side effect of multi-GPU configurations using Alternate Frame Rendering (AFR), in which consecutive frames are assigned to alternating GPUs. Because each GPU may take a different amount of time to complete its assigned frame — due to varying scene complexity, driver scheduling, or inter-GPU communication overhead — the resulting frame delivery is irregular even when the average frame rate is high. Both Nvidia SLI and AMD CrossFireX were affected, with dual-GPU setups exhibiting the worst frame pacing irregularities. In 2012 benchmarks using Battlefield 3, dual Radeon HD 7970 cards in CrossFire showed 85% variation in frame delivery times compared with 7% for a single card, while dual GeForce GTX 680 cards in SLI showed only 7% variation compared with 5% for a single card. Multi-GPU micro stuttering became a significant factor in the eventual decline and discontinuation of consumer multi-GPU gaming. Nvidia restricted SLI to a handful of enthusiast-class cards from the GeForce 10 series onward, then replaced it with NVLink on the GeForce RTX 20 series, which saw limited gaming adoption. AMD ceased active CrossFire development around 2017. By the mid-2020s, neither vendor's current consumer GPUs support multi-GPU rendering for games. Other factors that contributed to the decline include DirectX 12 placing multi-GPU support in the hands of game developers rather than driver authors, the incompatibility of temporal anti-aliasing and other temporal rendering techniques with AFR, and the increasing size, power draw, and cost of individual GPUs. The third-party utility RadeonPro could reduce CrossFire micro stuttering through dynamic V-Sync and frame pacing adjustments, and AMD later introduced a driver-level frame paci

    Read more →
  • Language-Theoretic Security

    Language-Theoretic Security

    Language-theoretic security, or LangSec, is an approach to software security that focuses on input handling, complexity, and program design as strategies to improve the verifiability of computer programs. It was introduced in 2005 by Robert J. Hansen and Meredith L. Patterson at BlackHat and in 2011 by Len Sassaman and Patterson. It aims to create a formal description of which software is likely to have security vulnerabilities of particular classes, and why. It considers programs to have an inherent parser component, whether or not explicit, composed of that part of the program which operates on external input before that input is fully parsed. A central hypothesis of language-theoretic security is that vulnerabilities in software increase according to the computational power of the notional input-accepting automaton equivalent to this parser, using the definitions of automata theory. The lower bound on this computational power is the input language complexity of the program. The extent to which reducing this complexity is possible is a function of the specification of the communication protocol or file format the program takes as input. == Parsing as a security mechanism == The behaviour of a program is defined with reference to its expected input. Unexpected input being used by a program is a factor in numerous security bugs, including the so-called Android master key vulnerability (CVE-2013-4787), because accepting unexpected input renders the program's specification ambiguous. In that instance, the unexpected ambiguity came in the form of a ZIP file with duplicate filenames. If a program fully parses its input and only acts on input that unambiguously meets the specification, it follows that the program will avoid these types of vulnerabilities. This is an intentional inversion of the Postel principle. Accepting only unambiguous and valid input is a more formal requirement than input validation or sanitization, and narrows the number of possible but unanticipated program states that can be induced in an application via user input. Conversely, failure to do this is associated with security vulnerabilities. Input sanitization in particular is held to be an inadequate approach to avoiding malicious input because it inherently ignores context-sensitive properties of the input; it can therefore result in paradoxical effects, such as sanitization code activating otherwise inert cross-site scripting payloads in browsers. === Parser differentials === If the language of accepted program input is sufficiently simple, it is possible to verify that two implementations parse the same input language consistently. This is advantageous because it shows no parser differential exists between the two implementations. The requisite level of simplicity is theoretically that for which there is a solution to the equivalence problem. If the two parsers involved in CVE-2013-4787 were equivalent - that is, if they rendered the same output state given the same input state - the vulnerability could not have existed. One strategy for doing this is to publish machine-readable specifications of a format or protocol, and then use a parser generator to generate the parser code. An example of a parser generator built for this purpose is DaeDaLus. The combination of Lex with any of GNU Bison, ANTLR, or Yacc also accomplishes this. However, many parser generators allow the mixing of general purpose code with the parsing definitions, which weakens the guarantees provided by parsing. === Analysis of injection attacks === Injection attacks are generally the result of differences between the serializer (or "unparser") and the corresponding parser at a layer boundary in a system; therefore, they are a special case of parser differentials. In a SQL injection attack, for example, an attacker is able to cause the application with which they are interacting to serialize a SQL query that has different semantics than intended. In the simplest case where the payload ends a string and adds new code, the payload has crossed the code-data boundary in SQL. In language-theoretic security, this is treated as a bug in the serializer of the SQL query, which should instead be written in a way that constrains its possible outputs to those within the scope of the intended query. === Parser combinators === If a parser generator is not used, it is still possible to avoid implementation bugs by using parser combinator such as Nom to implement the parser code. This has the drawback of relying on a programmer correctly translating the specification into the language of the parser generator library, though this task is still less error-prone than hand-coding a parser. == Input format complexity == Complexity in computer programs is associated with security vulnerabilities. Within the domain of language-theoretic security, complexity is described with reference to the computational power of the abstract machine necessary to implement the program, or more particularly, to implement the parser for its input language. This complexity describes whether it is possible to show that there is no unintended or undesired functionality in the program which might be exploitable by an attacker. To be bounded in complexity, the program's input must be well-defined both in terms of form and of semantics. === Weird machines === A weird machine is a model of computation in a program that exists in parallel with, but is distinct from, the intended abstract model of computation in that program. Some classes of weird machine arise from the multi-layered nature of computer programs, or the context in which the programs run; others result from the unanticipated functionality a program has due to its complexity or to software bugs. The more complex the computation model of a program, the more likely it is to implement a weird machine. Depending on context, the weird machine may or may not be concretely useful for an attacker. Since the space of weird machines in the context of some program is the universe of all possible states that are not within the program's intended states, many exploited states including remote code execution and injection attacks belong to the domain of weird machines. A reduction in weird machines is therefore a likely correlate with reduced program vulnerability. === SafeDocs project === SafeDocs is a DARPA project undertaken in 2018 to take existing file formats, create safer subsets of them, and develop programming tools to work for the safer formats. The initial test case for this was PDF. The purpose of creating safer subsets in this case is to lower the minimum bound on parser complexity so that it becomes possible to create tools that will generate correct, normative parsers for them. == Relation to programming languages == The analytic framework of language-theoretic security assumes programs to be virtual machines that execute their input. A document that is read by an application is in this sense a form of machine code, in a generalization of the data as code idea, following the automata theory description of parsers. === Type-safe programming languages === Parsing input and serializing output are operations that consume one data type and emit another. A programming language can therefore check that data is correctly parsed and contains the expected structure by checking data types, and correct serializing (or unparsing) can be implemented as operations on the data types that are relevant to the program's output. This approach can be used to show that the recognizer and unparser patterns have been implemented. It is also possible to implement type checking across a distributed system to enforce parsing and unparsing of the expected structures and to verify that the assumptions made in designing the compositional properties of a distributed system have been followed. === Memory-safe programming languages === In the general case, spatial memory correctness is undecidable. If any proof of spatial memory correctness is to be made, it is therefore necessary to bound the complexity of the code. Interpreted languages such as Java and Python effectively accomplish this via runtime bounds checking, and frameworks for runtime bounds checking also exist for C. The effect of these strategies for spatial memory correctness are to create a halt state in place of a spatial memory correctness violation; therefore, it can be shown that the program will not violate spatial memory correctness, but in exchange, it cannot be shown in the general case that programs will not have runtime bounds checking exceptions. Some programming languages, such as Rust, accomplish this using borrow checking. The borrow checker acts to assure spatial memory correctness by compile-time reference counting. Code for which spatial memory correctness cannot be shown to not be violated therefore does not compile, inherently limiting the complexity of the spatial memory correctness of the program to what is decidable. Thi

    Read more →
  • Seccomp

    Seccomp

    seccomp (short for secure computing) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely. seccomp mode is enabled via the prctl(2) system call using the PR_SET_SECCOMP argument, or (since Linux kernel 3.17) via the seccomp(2) system call. seccomp mode used to be enabled by writing to a file, /proc/self/seccomp, but this method was removed in favor of prctl(). In some kernel versions, seccomp disables the RDTSC x86 instruction, which returns the number of elapsed processor cycles since power-on, used for high-precision timing. seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on ChromeOS and Linux. (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.) Some consider seccomp comparable to OpenBSD pledge(2) and FreeBSD capsicum(4). == History == seccomp was first devised by Andrea Arcangeli in January 2005 for use in public grid computing and was originally intended as a means of safely running untrusted compute-bound programs. It was merged into the Linux kernel mainline in kernel version 2.6.12, which was released on March 8, 2005. == Software using seccomp or seccomp-bpf == Android uses a seccomp-bpf filter in the zygote since Android 8.0 Oreo. systemd's sandboxing options are based on seccomp. QEMU, the Quick Emulator, the core component to the modern virtualization together with KVM uses seccomp on the parameter --sandbox Docker – software that allows applications to run inside of isolated containers. Docker can associate a seccomp profile with the container using the --security-opt parameter. Arcangeli's CPUShare was the only known user of seccomp for a while. Writing in February 2009, Linus Torvalds expresses doubt whether seccomp is actually used by anyone. However, a Google engineer replied that Google is exploring using seccomp for sandboxing its Chrome web browser. Firejail is an open source Linux sandbox program that utilizes Linux namespaces, Seccomp, and other kernel-level security features to sandbox Linux and Wine applications. As of Chrome version 20, seccomp-bpf is used to sandbox Adobe Flash Player. As of Chrome version 23, seccomp-bpf is used to sandbox the renderers. Snap specify the shape of their application sandbox using "interfaces" which snapd translates to seccomp, AppArmor and other security constructs vsftpd uses seccomp-bpf sandboxing as of version 3.0.0. OpenSSH has supported seccomp-bpf since version 6.0. Mbox uses ptrace along with seccomp-bpf to create a secure sandbox with less overhead than ptrace alone. LXD, a Ubuntu "hypervisor" for containers Firefox and Firefox OS, which use seccomp-bpf Tor supports seccomp since 0.2.5.1-alpha Lepton, a JPEG compression tool developed by Dropbox uses seccomp Kafel is a configuration language, which converts readable policies into seccompb-bpf bytecode Subgraph OS uses seccomp-bpf Flatpak uses seccomp for process isolation Bubblewrap is a lightweight sandbox application developed from Flatpak minijail uses seccomp for process isolation SydBox uses seccomp-bpf to improve the runtime and security of the ptrace sandboxing used to sandbox package builds on Exherbo Linux distribution. File, a Unix program to determine filetypes, uses seccomp to restrict its runtime environment Zathura, a minimalistic document viewer, uses seccomp filter to implement different sandbox modes Tracker, a indexing and preview application for the GNOME desktop environment, uses seccomp to prevent automatic exploitation of parsing vulnerabilities in media files

    Read more →
  • Class activation mapping

    Class activation mapping

    Class activation mapping methods are explainable AI (XAI) techniques used to visualize the regions of an input image that are the most relevant for a particular task, especially image classification, in convolutional neural networks (CNNs). These methods generate heatmaps by weighting the feature maps from a convolutional layer according to their relevance to the target class. In the field of artificial intelligence, generically defined as "the effort to automate intellectual tasks normally performed by humans", machine learning and deep learning were created. They both use statistical and computational methods to learn patterns from data, reducing the need for manually coded rules. Machine learning models are trained on input data and the known respective answers, learning the underlying patterns or structures present in the data. Traditional Machine learning algorithms employ manually designed feature sets, posing a direct link between machine learning designers and employed features. Deep learning is a subfield of machine learning, based on the concept of successive layers of representation, in which the data is progressively unfolded in different ways, to extract relevant and informative patterns in data analysis. Deep learning algorithms are defined as feature learning algorithms automatically learning hierarchical feature representations from raw data, extracting increasingly abstract features through multiple layers. CNNs are a specific architecture of deep learning models, designed to process spatially structured data, such as images, exploiting a series of convolution, non-linear activation and pooling operations to extract relevant features, contained in the so-called feature maps from input data. CNNs have demonstrated to be highly effective in a variety of computer vision and image processing tasks. CNNs (and deep learning models more broadly) are described as black boxes due to their complex and non-transparent internal layers of representation. The need for clearer indications on its internal working and decision-making process gave birth to XAI techniques. Among the proposed XAI techniques for computer vision tasks, Class activation mapping methods can show which pixels in an input image are important to the predicted logit for a class of interest, in a classification task. Class activation mapping methods were originally developed for class-discriminative scenarios to visualize which parts of the input image influenced the classification decision, namely to visually highlight the regions of those feature maps that contribute most strongly to the prediction of a given class. More advanced versions of these methods are not limited to image classification tasks, but have been extended also to several vision-related tasks, such as object detection, image captioning, visual question answering and image segmentation. == Background == The following methods laid the groundwork for the class activation maps approaches, forming the conceptual basis of using gradients to highlight class-discriminative regions. === Class model visualization and saliency maps for convolutional neural networks === The class model visualization and image-specific saliency maps approaches have been presented in the foundational work "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman and it generalizes the deconvnet method by Zeiler and Fergus. Class model visualization synthesizes an artificial input image that strongly activates the output neurons associated with a target class. Given a trained, fixed model, this method starts with a zero-initialized image, backpropagates the gradients from the class score to the image pixels, updates the image pixels increasing the specific class scores and it repeats the pixel updating process, showing an encoded (idealized version) prototype of the class of interest. Image-specific class saliency visualization method provides a visual explanation by highlighting the most relevant pixels in an image for predicting a certain class C of interest. This is done by computing the gradient of the class score with respect to the input image, I 0 , {\displaystyle I_{0},} w = ∂ S C ∂ I | I 0 {\displaystyle w=\left.{\frac {\partial S_{C}}{\partial I}}\right|_{I_{0}}} approximating the model locally (around I 0 {\displaystyle I_{0}} ) as linear, using a first-order Taylor expansion: S C ( I ) ≈ w C T I + b {\displaystyle S_{C}(I)\approx w_{C}^{T}I+b} . The magnitude of w C {\displaystyle w_{C}} , the gradient, indicates the importancy of the pixels: larger gradients suggest greater influence on the prediction. Once the gradient is known, the saliency map is defined as the maximum absolute gradient across the color channels: M i j = m a x C | ∂ S C ∂ I i j C | {\displaystyle M_{ij}=max_{C}\left|{\frac {\partial S_{C}}{\partial I_{ij}^{C}}}\right|} resulting in an saliency map (i.e. heatmap). === Guided backpropagation === The concept of guided backpropagation can be traced for the first time in the paper by Springenberg et al. "Striving For Simplicity: The All Convolutional Net" and also this method builds upon the work by Zeiler and Fergus "Visualizing and Understanding Convolutional Networks". Guided backpropagation core is to understand what a CNN is learning, by visualizing the patterns that activate more strongly individual neurons (or filters), in architectures which do not rely on max-pooling layer. When propagating gradients back through a rectified linear unit (ReLU), guided backpropagation passes the gradient if and only if the input to the ReLU was positive (forward pass) and the output gradient is positive (backward signal), tackling both inactive neurons, negative gradients and suppressing the noise. The result displays sharper, high-resolution visualizations of what each neuron is responding to. Guided backpropagation represents a simple and practical method for model interpretability, helping understand how and where neural networks detect semantic concepts across layers. Moreover, it can be applied to any network architecture, due to its working principle. == Base versions == Class activation mapping and gradient-weighted class activation mapping are the original and most widely used methods for visual explanations in convolutional neural networks. These methods serve as the foundation for many later developments in explainable AI. Notation: In this article, the symbols i and j represent integer indices that disappear inside sums or averages, while x and y are the continuous (or up-sampled integer) coordinates of the final heat-map that is plotted. === Class activation mapping (CAM) === Class activation mapping (CAM) was the first, and the original, version of CAM methods, and it gave the name to the whole category. The approach was firstly introduced by Zhou et al. in their seminal work "Learning Deep Features for Discriminative Localization". This approach achieves class-specific heatmaps by modifying image classification CNN architectures, replacing fully-connected layers with convolutional layers and a final global average pooling layer. Its main scope is to localize and highlight discriminative regions of an input image that a CNN uses to identify a particular class, without needing explicit bounding box annotations. ==== Global average pooling (GAP) ==== Global average pooling (GAP) represents the key element in the original CAM approach. It is a dimensionality reduction technique and, similarly to other pooling layers, it allows the downsampling of the feature maps, calculating representative values for a specific region of the feature map. The particularity of GAP is that it calculates a single value for an entire feature map, significantly reducing the model dimensions. ==== Mathematical description ==== The mathematical description considers as its key the combination of convolutional and GAP layers. In CAM, it is mandatory to have the GAP layer after the last convolutional layer and before the final linear classifier layer. This last element of the architecture connects the output logits (the network predictions) y C {\displaystyle y^{C}} , to the GAP values, with its respective fine-tuned weights, w k C {\displaystyle w_{k}^{C}} . Considering A k {\displaystyle A^{k}} as the last feature maps of the last convolutional layer, GAP produces one value for each feature map, by averaging all the matrix elements (i, j) of the feature map: F k = 1 m n ∑ i = 1 m ∑ j = 1 n A i j k {\displaystyle F^{k}={\frac {1}{mn}}\sum _{i=1}^{m}\sum _{j=1}^{n}A_{ij}^{k}} with A k = [ A 11 k A 12 k ⋯ A 1 n k A 21 k A 22 k ⋯ A 2 n k ⋮ ⋮ ⋱ ⋮ A m 1 k A m 2 k ⋯ A m n k ] = { A i j k ∣ 1 ≤ i ≤ m , 1 ≤ j ≤ n } {\displaystyle A^{k}={\begin{bmatrix}A_{11}^{k}&A_{12}^{k}&\cdots &A_{1n}^{k}\\A_{21}^{k}&A_{22}^{k}&\cdots &A_{2n}^{k}\\\vdots &\vdots &\ddots &\vdots \\A_{m1}^{k}&A_{m2}^{k}&\cdots &A_{mn}^{k}\end{bmatrix}}=\left\{A_{

    Read more →
  • VieON

    VieON

    VieON is an mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. Understanding this, DatVietVAC Group has planned to research and develop an OTT application, even though the Vietnamese market already has some major players such as FPT Play and the international giant Netflix. Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

    Read more →
  • VSCO

    VSCO

    VSCO ( ), formerly known as VSCO Cam, is a photography mobile app available for iOS and Android devices. The app was created by Joel Flory and Greg Lutze. The VSCO app allows users to capture photos in the app and edit them, using preset filters and editing tools. == History == Visual Supply Company was founded by Joel Flory and Greg Lutze in California, in 2011. VSCO was launched in 2012. It raised $40 million from investors in May 2014. In 2017, VSCO launched a subscription model. As of 2018, Visual Supply Company has $90 million in funding from investors and over 2 million paying members. In 2019, VSCO acquired Rylo, a video editing startup founded by the original developer of Instagram’s Hyperlapse. Visual Supply Company has locations in Oakland, California, where it is headquartered, and Chicago, Illinois. In December 2020 VSCO acquired AI-powered video editing app Trash. In April 2018, VSCO reached over 30 million users. In September 2023, Eric Wittman was appointed as the new CEO and co-founder Joel Flory became executive chairman. == Usage == Users must register an account to use the app. Photos can be taken or imported from the camera roll, as well as short videos or animated GIFs (known in the app as DSCO; iOS only). The user can edit their photos through various preset filters, or through the "toolkit" feature which allows finer adjustments to fade, clarity, skin tone, tint, sharpness, saturation, contrast, temperature, exposure, and other properties. Users have the option of posting their photos to their profile, where they can also add captions and hashtags. Photos can also be exported back into the camera roll or shared with other social networking services. The users also have an option to edit their own videos from their camera roll with the VSCO yearly membership, but they are not able to post camera roll as VSCO Film X videos to their account on VSCO. JPEG and raw image files can be used. Research on image based social media platforms has found that engagement with posting, editing, and interacting with images can influence users' mood, self esteem, and body satisfaction. Studies also suggest that greater emotional investment in social media content is associated with increased negative psychological outcomes including stress and depressive symptoms. == In popular culture == VSCO's Oakland headquarters was a key filming location for Boots Riley's 2018 film Sorry to Bother You.

    Read more →
  • RFinder

    RFinder

    RFinder ("repeater finder") is a subscription-based website and mobile app. RFinder's main service is the World Wide Repeater Directory (WWRD), which is a directory of amateur radio repeaters. RFinder is the official repeater directory of several amateur radio associations. RFinder has listings for several amateur radio modes, including FM, D-STAR, DMR, and ATV. == World Wide Repeater Directory == Repeaters are listed in the directory along with its call sign, Maidenhead Locator System and GPS coordinates, transmit/receive offset ("split"), CTCSS and DCS squelch settings, and VoIP settings (IRLP and Echolink nodes). The directory has over 50,000 repeater listings in over 170 countries. === Website === The RFinder website has several search options including for routes. === Forums === RFinder user forums is for help and support for the app and hardware. === Mobile app === RFinder has mobile apps for Android and iOS. When using the mobile app, RFinder can display the distance to repeaters, based on the mobile device's current location. === ARRL Repeater Directory === The ARRL publishes the ARRL Repeater Directory which contains over 31,000 repeater listings for the US and Canada with listings provided by RFinder. == Subscription == RFinder requires a subscription. A one-year subscription is US$12.99. == Radio programming software == Some radio programming software applications can query RFinder and download repeater listing to program radios. Compatible software includes: CHIRP RT Systems == Radio associations == RFinder is the official repeater directory of the following associations: Amateur Radio Society Italy American Radio Relay League Cayman Amateur Radio Society Deutscher Amateur Radio Club Federacion Mexicana de Radio Experimentadores L’association Réseau des Émetteurs Français Lietuvos Radijo Mėgėjų Draugija Liga de Amadores Brasilieros de Radio Emissão Radio Amateurs of Canada Radio Society of Great Britain Rede dos Emissores Portugueses Unión de Radioaficionados Españoles

    Read more →
  • MLOps

    MLOps

    MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. It bridges the gap between machine learning development and production operations, ensuring that models are robust, scalable, and aligned with business goals. The word is a compound of "machine learning" and the continuous delivery practice (CI/CD) of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, MLOps is practiced between data scientists, DevOps, and machine learning engineers to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics. == Definition == MLOps is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products. Most of all, it is an engineering practice that leverages three contributing disciplines: machine learning, software engineering (especially DevOps), and data engineering. MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops). Essentially, MLOps aims to facilitate the creation of machine learning products by leveraging these principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model, and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops. == History == Interest in operationalizing machine learning systems began to grow in the mid-2010s as ML projects started moving from experimentation to production use. The challenges associated with sustaining such systems were highlighted in a 2015 paper. The predicted growth in machine learning included an estimated doubling of ML pilots and implementations from 2017 to 2018, and again from 2018 to 2020. Reports show a majority (up to 88%) of corporate machine learning initiatives are struggling to move beyond test stages. However, those organizations that actually put machine learning into production saw a 3–15% profit margin increases. The MLOps market size was USD 2,191.8 Million in 2024, and is projected to be USD 16,613.4 Million in 2030. == Architecture == Machine Learning systems can be categorized in eight different categories: data collection, data processing, feature engineering, data labeling, model design, model training and optimization, endpoint deployment, and endpoint monitoring. Each step in the machine learning lifecycle is built in its own system, but requires interconnection. These are the minimum systems that enterprises need to scale machine learning within their organization. == Goals == There are a number of goals enterprises want to achieve through MLOps systems successfully implementing ML across the enterprise, including: Deployment and automation Reproducibility of models and predictions Diagnostics Governance and regulatory compliance Scalability Collaboration Business uses Monitoring and management A standard practice, such as MLOps, takes into account each of the aforementioned areas, which can help enterprises optimize workflows and avoid issues during implementation. Vendors such as Adaptive ML deliver commercial reinforcement learning operations (RLOps) and MLOps-infrastructure, targeting organizations deploying large language models in production. A common architecture of an MLOps system would include data science platforms where models are constructed and the analytical engines where computations are performed, with the MLOps tool orchestrating the movement of machine learning models, data and outcomes between the systems.

    Read more →
  • Tactical NAV

    Tactical NAV

    Tactical NAV, also known as TACNAV-X, is a location-based tracking app designed for use by military personnel. The app is primarily designed to assist in pinpointing enemy fire and mapping waypoints. Tactical NAV also helps users efficiently relay critical information to tactical operations centers for prompt decision-making regarding airstrikes or medical evacuations. The TACNAV-X platform is intended to enhance situational awareness, refine navigation capabilities, and assist in tactical decision-making across various operational environments. == Overview == Tactical NAV allows users to pinpoint enemy fire. == History == Tactical NAV was designed by U.S. Army Captain Jonathan J. Springer, a Field Artillery officer serving as a Battalion Fire Support Officer (FSO) in the 101st Airborne Division. Springer conceived the idea for the app during his third tour in Afghanistan in support of Operation Enduring Freedom. On June 25, 2010, after a rocket attack by the Taliban killed two soldiers in his battalion, he was inspired to create an app that would prevent similar losses in the future, enhance situational awareness, and assist soldiers serving on combat deployments. In 2010, Springer founded TacNav Systems (formerly AppDaddy Technologies) to develop mobile applications for use by military personnel. He tested the app during combat operations in eastern Afghanistan and verified TACNAV-X's accuracy using DAGRs, AFATDS, Falcon View, CPOF, ATAK, and other approved Department of Defense (DoD) systems. As of 2012, the app had been downloaded 8,000 times.

    Read more →
  • FoundationDB

    FoundationDB

    FoundationDB is a free and open-source multi-model distributed NoSQL database owned by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers. The FoundationDB Alpha program began in January 2012 and concluded on March 4, 2013, with their public Beta release. Their 1.0 version was released for general availability on August 20, 2013. On March 24, 2015, it was reported that Apple has acquired the company. A notice on the FoundationDB web site indicated that the company has "evolved" its mission and would no longer offer downloads of the software. On April 19, 2018, Apple open sourced the software, releasing it under the Apache 2.0 license. == Main features == The main features of FoundationDB include the following: Ordered key–value store In addition to supporting standard key-based reads and writes, the ordering property enables range reads that can efficiently scan large swaths of data. Transactions Transaction processing employs multiversion concurrency control for reads and optimistic concurrency for writes. Transactions can span multiple keys stored on multiple machines. ACID properties FoundationDB guarantees serializable isolation and strong durability via redundant storage on disk before transactions are considered committed. Layers Layers map new data models, APIs, and query languages to the FoundationDB core. They employ FoundationDB's ability to update multiple data elements in a single transaction, ensuring consistency. An example is their SQL layer. Commodity clusters FoundationDB is designed for deployment on distributed clusters of commodity hardware running Linux. Replication FoundationDB stores each piece of data on multiple machines according to a configurable replication factor. Triple replication is the recommended mode for clusters of 5 or more machines. Scalability FoundationDB is designed to support horizontal scaling through the addition of machines to a cluster while automatically handling data replication and partitioning. Systems supported FoundationDB supports packages for Linux, Windows, and macOS. The Linux version supports production clusters, while the Windows and macOS versions support local operation for development purposes. Configurations on Amazon EC2 are also supported. Programming language bindings FoundationDB supports language bindings for Python, Go, Ruby, Node.js, Java, PHP, and C, all of which are made available with the product. == Design limitations == The design of FoundationDB results in several limitations: Long transactions FoundationDB does not support transactions running over five seconds. Large transactions Transaction size cannot exceed 10 MB of total written keys and values. Large keys and values Keys cannot exceed 10 kB in size. Values cannot exceed 100 kB in size. == History == FoundationDB, headquartered in Vienna, Virginia, was started in 2009 by Nick Lavezzo, Dave Rosenthal, and Dave Scherer, drawing on their experience in executive and technology roles at their previous company, Visual Sciences. In March 2015 the FoundationDB Community site was updated to state that the company had changed directions and would no longer be offering downloads of its product. The company was acquired by Apple Inc., which was confirmed March 25, 2015. On April 19, 2018, Apple open sourced the software, releasing it under the Apache 2.0 license.

    Read more →