AI App Orange Logo

AI App Orange Logo — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Adversarial stylometry

    Adversarial stylometry

    Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop. All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches to this exist, with some overlap: imitation, substituting the author's own style for another's; translation, applying machine translation with the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style to make it not resemble the author's own. Manually obscuring style is possible, but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully-automatic, could assist an author. How best to perform the task and the design of such tools is an open research question. While some approaches have been shown to be able to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses is an issue. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools. It is uncertain if the practice of adversarial stylometry is detectable in itself. Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them. == History == Rao & Rohatgi (2000), an early work in adversarial stylometry, identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges. Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts. Brennan & Greenstadt (2009) introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods; other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus. == Motivations == Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but pseudonymous authors who have not practiced adversarial stylometry in producing corpuses of thousands of words may be vulnerable. Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but only accurately matched the blog and author a fifth of the time; identification improved with the number of posts written by the author in the corpus. Even if an author is not identified, some of their characteristics may still be deduced stylometrically, or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification. Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates. Modern machine learning techniques offer powerful tools for identification; further development of corpora and computational stylometric techniques are likely to raise further privacy issues. Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry—that authors have invariant, content-independent 'style fingerprints'—is uncertain, but "the deanonymisation attack is a real privacy concern". Those interested in practicing adversarial stylometry and stylistic deception include whistleblowers avoiding retribution; journalists and activists; perpetrators of frauds and hoaxes; authors of fake reviews; literary forgers; criminals disguising their identity from investigators; and, generally, anyone with a desire for anonymity or pseudonymity. Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias). Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored. == Methods == With imitation, the author attempts to mislead stylometry by matching their style to another author's. An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry. Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from. Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can lead to texts being significantly altered, even to the point of incomprehensibility; improved translation tools reduce this risk. More simply-structured texts can be easier to machine translate without losing the original meaning. Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output. With low-quality translation tools, an author can be required to manually correct major translation errors while avoiding the hazard of re-introducing stylistic characteristics. Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, but more common with several intermediate translations—however, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal. Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed. Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be performed at the time of writing by conscious modification, or as part of a revision process with feedback from the metric being targeted as an input to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning, and likewise genres with more permissible variation allow more obfuscation. However, longer texts are harder to thoroughly obfuscate. Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style. With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles. Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or to format a document in a characteristic way. Stylometric signals vary in how simply they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation. Automated obfuscation may require large amounts of training data written by the author. Concerning automated implementations of adversarial stylometry, two possible implementations are rule-based systems for paraphrasing; and encoder–decoder architectures, where the text passes through an intermediate format that is (intended to be) style-neutral. Another division in automated methods is whether there is feedback from an identification system or not. With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem, exploring textual variants until the result is stylistically sufficiently far (in the case of obfuscation) or near (in the case of imitation), which then constitutes an adversarial example for that identification system. == Evaluation == How

    Read more →
  • FMLLR

    FMLLR

    In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

    Read more →
  • VGACAD

    VGACAD

    VGACAD was the parent of a suite of shareware graphic utilities made for the MS-DOS operating system used in the IBM PC and clones. It was popular for editing and capturing images using BSAVE (graphics image format) and provided an early graphic editing suite compatible with multiple graphic cards and resolutions, used on the IBM PC. == Usage == Written by Lawrence Gozum in 1987, it was the genesis of multiple versions and improvements over 10 years. Ran with his brother, Marvin initially helped with design ideas, strategic focus, technical support calls, and managing the early shareware business. The growth of the VGACAD suite grew quickly to preoccupy most of their time. Lawrence then focused more of his efforts on software and formed Applied Insights, to manage VGACAD and its offspring, VidFun, and Ai Picture Explorer. At its peak, its users ranged from individuals, Federal government offices, museums and major newspapers. == Features == VGACAD was a misnomer, and meant VGA-Computer Assisted Drawing, rather than computer-aided design, as CAD is commonly referred to today. Its longevity was due to its color accuracy, speed, small size, and that its suite of small utilities often worked stand-alone. One called VGACAP, for 'capture', dumped video memory into a file that could later be converted to popular graphic image formats, later made commonplace when Microsoft Windows programmed the print screen key to dump graphics into the clipboard. However, VGACAP ran insulated apart from early versions of Windows, and thus could capture screens were applications prohibited such function.

    Read more →
  • Logistics automation

    Logistics automation

    Logistics automation is the application of computer software or automated machinery to logistics operations in order to improve its efficiency. Typically this refers to operations within a warehouse or distribution center, with broader tasks undertaken by supply chain engineering systems and enterprise resource planning systems. Logistics automation systems can powerfully complement the facilities provided by these higher level computer systems. The focus on an individual node within a wider logistics network allows systems to be highly tailored to the requirements of that node. == Components == Logistics automation systems comprise a variety of hardware and software components: Fixed machinery Automated storage and retrieval systems, including: Cranes serve a rack of locations, allowing many levels of stock to be stacked vertically, and allowing for higher storage densities and better space utilization than alternatives. In systems produced by Amazon Robotics, automated guided vehicles move items to a human picker. Conveyors: Containers can enter automated conveyors in one area of the warehouse and, either through hard-coded rules or data input, be moved to a selected destination. Vertical carousels based on the paternoster lift system or using space optimization, similar to vending machines, but on a larger scale. Sortation systems: similar to conveyors but typically with higher capacity and able to divert containers more quickly. Typically used to distribute high volumes of small cartons to a large set of locations. Industrial robots: four- to six-axis industrial robots, e.g. palletizing robots, are used for palletizing, depalletizing, packaging, commissioning and order picking. Typically all of these will automatically identify and track containers using barcodes or, increasingly, RFID tags. Motion check weighers may be used to reject cases or individual products that are under or over their specified weight. They are often used in kitting conveyor lines to ensure all pieces belonging in the kit are present. Mobile technology Radio data terminals: these are handheld or truck-mounted terminals which connect by radio to logistics automation software and provide instructions to operators moving throughout the warehouse. Many also have barcode scanners to allow identification of containers more quickly and accurately than manual keyboard entry. Software Integration software: this provides overall control of the automation machinery and allows cranes to be connected to conveyors for seamless stock movements. Operational control software: provides low-level decision-making, such as where to store incoming containers, and where to retrieve them when requested. Business control software: provides higher-level functionality, such as identification of incoming deliveries/stock, scheduling order fulfillment, and assignment of stock to outgoing trailers. == Benefits == A typical warehouse or distribution center will receive stock of a variety of products from suppliers and store these until the receipt of orders from customers, whether individual buyers (e.g. mail order), retail branches (e.g. chain stores), or other companies (e.g. wholesalers). A logistics automation system may provide the following: Automated goods in processes: Incoming goods can be marked with barcodes and the automation system notified of the expected stock. On arrival, the goods can be scanned and thereby identified, and taken via conveyors, sortation systems, and automated cranes into an automatically assigned storage location. Automated goods retrieval for orders: On receipt of orders, the automation system is able to immediately locate goods and retrieve them to a pick-face location. Automated dispatch processing: Combining knowledge of all orders placed at the warehouse the automation system can assign picked goods into dispatch units and then into outbound loads. Sortation systems and conveyors can then move these onto the outgoing trailers. If needed, repackaging to ensure proper protection for further distribution or to change the package format for specific retailers/customers. A complete warehouse automation system can drastically reduce the workforce required to run a facility, with human input required only for a few tasks, such as picking units of product from a bulk packed case. Even here, assistance can be provided with equipment such as pick-to-light units. Smaller systems may only be required to handle part of the process. Examples include automated storage and retrieval systems, which simply use cranes to store and retrieve identified cases or pallets, typically into a high-bay storage system which would be unfeasible to access using fork-lift trucks or any other means. The use of Automatic Guided Vehicles maximizes the output compared to humans since they can do repetitive tasks for long hours and with least to no supervision. An AGV is built and programmed for precision and accuracy thereby reducing the chances of errors in a warehouse, especially when dealing with fragile goods. == Automation software == Software or cloud-based SaaS solutions are used for logistics automation which helps the supply chain industry in automating the workflow as well as management of the system. Knowledge @ Wharton staff writers noted in 2011 that some manufacturers and retailers were weathering the Great Recession "by signing up for pay-as-you-go logistics services available through the Internet 'cloud'". They identified the benefits and reduced costs which came from sharing information about shipments with suppliers, hauliers and end users. There is little generalized software available in this market. This is because there is no rule to generalize the system as well as work flow even though the practice is more or less the same. Most of the commercial companies do use one or the other of the custom solutions. But there are various software solutions that are being used within the departments of logistics. There are a few departments in Logistics, namely: Conventional Department, Container Department, Warehouse, Marine Engineering, Heavy Haulage, etc. Software used in these departments Conventional department : CVT software / CTMS software. Container Trucking: CTMS software Warehouse : WMS/WCS Improving Effectiveness of Logistics Management Logistical Network Information Transportation Sound Inventory Management Warehousing, Materials Handling & Packaging

    Read more →
  • Parkerian Hexad

    Parkerian Hexad

    The Parkerian Hexad is a set of six elements of information security proposed by Donn B. Parker in 1998. The Parkerian Hexad adds three additional attributes to the three classic security attributes of the CIA triad (confidentiality, integrity, availability). The Parkerian Hexad attributes are the following: Confidentiality Possession or Control Integrity Authenticity Availability Utility These attributes of information are atomic in that they are not broken down into further constituents; they are non-overlapping in that they refer to unique aspects of information. Any information security breach can be described as affecting one or more of these fundamental attributes of information. == Attributes from the CIA triad == === Confidentiality === Confidentiality refers to the "quality or state of being private or secret; known only to a limited few", or "the property that information is not made available or disclosed to unauthorized individuals, entities, or processes". For example: If an enterprise's strategic plans are leaked to competitors then this is a breach of confidentiality; If unauthorized persons gain access to an individual's financial records then that individual's confidentiality is breached. === Integrity === Integrity refers to being correct or consistent with the intended state of information. Any unauthorized modification of data, whether deliberate or accidental, is a breach of data integrity. For example: Data stored on disk are expected to be stable. If the data is changed at random by problems with a disk controller then this is a breach of integrity; Data generated by a medical device is transmitted and stored in the healthcare center but neither altered nor tampered with; Application programs are supposed to record information correctly. If the application introduces deviations from the intended values then this is a breach of integrity. "From Donn Parker: My definition of information integrity comes from the dictionaries. Integrity means that the information is whole, sound, and unimpaired (not necessarily correct). It means nothing is missing from the information it is complete and in intended good order". === Availability === Availability means having timely access to information. For example: A disk crash or denial-of-service attacks both cause a breach of availability. Any delay in response of a system that exceeds the expected service levels for that system can be described as a breach of availability. GPS jamming can lead to loss of Availability of the GPS system. == Parker's added attributes == === Authenticity === Authenticity is the "quality of being authentic or of established authority for truth and correctness". Parker defines it thus: "is the information genuine and accurate? Does it conform to reality and have validity?" and "authoritative, valid, true, real, genuine, or worthy of acceptance or belief by reason of conformity to fact and reality". === Possession or control === Possession or control refers to the loss of data by the authorized user (even if the ʺthiefʺ cannot access the data). From a control systems perspective, it is any loss of control (the ability to change settings and functions) or loss of view (the ability to monitor the system’s operation and its response to controls). Suppose a thief were to steal a sealed envelope containing a bank debit card and its personal identification number. Even if the thief did not open that envelope, it's reasonable for the victim to be concerned that the thief could do so at any time. That situation illustrates a loss of control or possession of information but does not involve the breach of confidentiality. === Utility === Utility refers to the data's usefulness. For example: Suppose someone encrypted data on disk to prevent unauthorized access or undetected modifications–and then lost the decryption key: that would be a breach of utility. The data would be confidential, controlled, integral, authentic, and available–they just wouldn't be useful in that form. The conversion of salary data from one currency into an inappropriate currency would be a breach of utility, as would the storage of data in a format inappropriate for a specific computer architecture; e.g., EBCDIC instead of ASCII or 9-track magnetic tape instead of DVD-ROM. A tabular representation of data substituted for a graph could be described as a breach of utility if the substitution made it more difficult to interpret the data. Utility is often confused with availability because breaches such as those described in these examples may also require time to work around the change in data format or presentation. However, the concept of usefulness is distinct from that of availability.

    Read more →
  • Kai's Power Tools

    Kai's Power Tools

    Kai's Power Tools (KPT) are a set of API plugins created by the German computer scientist Kai Krause in 1992 that were designed for use with Adobe Photoshop and Corel Photo-Paint. Kai's Power Tools were sold to Corel in 2000 when MetaCreations was closed. There are various versions of Kai's Power Tools. KPT 3, 5, 6, and X sets are compilations of different filters. The program interface features a reward-based function in which a bonus function is revealed as the user moves towards more complex aspects of the tool. == Filters == The KPT Convolver is a mathematics based filter; the level of precision and varying effects can be achieved by using numerical values of colour, tint, hue, saturation, contrast, brightness, luminosity, and posterize. The KPT Projector takes the current image or selection and offers a number of interactive perspective warp effects. To a large extent, with its draggable distortion handles and its moving, scaling and rotating options, this simply duplicates Adobe Photoshop's Free Transform capabilities. What is completely different is the ability to rotate the bitmap image in 3D space and to tile the results if desired. It can also animate the distortions by dragging keyframes from the preview window into an animation palette. KPT 6 will then preview the animation and output it to various sizes in avi or mov format. This animation capability is even more useful with the KPT Turbulence filter. This is another distortion filter, but one that treats the image as if it was completely liquid. The preview panel shows the animation in real time. The KPT Goo filter is used to produce a single frame freeform liquid distortion. This filter is available both with KPT 6 and the standalone version. It works by effectively turning a bitmap image into a liquid that can be interactively smeared, smudged, twirled, and pinched with the range of tools on offer. The obvious use is to distort photographic portraits into caricatures. KPT Materializer can create advanced surface textures based on bump maps that define troughs and peaks. It can use any external image for the basis of the bump map or alternatively the user can pick out the hue, saturation, luminance or red, green, or blue channel of the current image. It can then offset, scale and rotate the texture map, control its lighting, and even blend in a reflection map. The filter can be used for anything from providing an oil-painting feel to an entire image, to giving the illusion of depth to a selection. Also producing the impression of depth is the KPT Gel filter which uses various paint tools to synthesize photo-realistic 3D materials such as metals, liquids, or plastics. Gel painting is very different from traditional 2D painting as the brush strokes pool together when they touch and refract the underlying image. It can also manipulate 3D paint—once it has been added—by twirling, pinching, and carving it. The opposite is true of the Equalizer filter, which is used for applying variations on sharpening effects. The filter has three modes. The first mode, Equalizer, looks and works rather like the graphic equalizer on a stereo system, enabling adjustment of the level of pixel contrast within nine bands of different visual frequencies. The second mode, Contrast Sharpen, allows for increasing the contrast between light and dark areas in an image. The third mode, Bounded Sharpen, can sharpen an image without causing oversharpening, which can lead to halo effects. This feature is particularly useful when pulling out the detail in an image softened by resizing. KPT SceneBuilder is used for producing photorealistic 3D scenes by importing and rendering 3DS files. The main image window offers three tabs for editing in 2D and 3D mode and for setting up the object's final texture. Many users regard this filter as being the most impressive because it acts as a standalone 3D rendering tool and provides control over everything from transparency, reflection, refraction, bump mapping through to multiple light sources, and so on but without the ability to create or edit objects. The final filter, KPT SkyEffects, also has its roots in Metacreations' experience with 3D programs such as Bryce and RayDream. This filter is designed to simulate the interaction between the light from the sun or moon with no less than six atmospheric layers of haze, fog and cloud. The filter is typical of the KPT 6 collection as a whole: at times the interface is inspired and offers the ability to create beautiful reddening sunsets simply by interactively dragging the sun toward the horizon, producing realistic sunsets and moonscapes. == Other effects == Kai's Power Tools 6 features a lens flare effect for precisely managing the type of glow, halo, streaks, and reflection. The addition of a library of preset effects helps to overcome this by allowing the user to choose a standard effect and then interactively position the flare in the image preview. KPT 6 provides a new engine in the form of the KPT Reaction, which takes a reaction seed and turns it into a seamlessly tiling pattern based on a reaction diffusion process. It offers random noise, regular dots or reticulated voronoi patterns or a bitmap image itself as the seed. Corel has no plans for any updates.

    Read more →
  • Lenna

    Lenna

    Lenna (or Lena) is a standard test image used in the field of digital image processing, starting in 1973. It is a picture of the Swedish model Lena Forsén, shot by photographer Dwight Hooker and cropped from the centerfold of the November 1972 issue of Playboy magazine. Lenna has attracted controversy because of its subject matter. Starting in the mid-2010s, many journals have deemed it inappropriate and discouraged its use, while others have banned it from publication outright. Forsén herself has called for it to be retired, saying "It's time I retired from tech." The spelling "Lenna" came from the model's desire to encourage the proper pronunciation of her name. "I didn't want to be called Leena [English: ]," she explained. == History == Before Lenna, the first use of a Playboy magazine image to illustrate image processing algorithms was in 1961. Lawrence G. Roberts used two cropped six-bit grayscale facsimile scanned images from Playboy's July 1960 issue featuring Playmate Teddi Smith, in his master's thesis on image dithering at Massachusetts Institute of Technology. Lenna was originally intended for high resolution color image processing study. Its history was described in the May 2001 newsletter of the IEEE Professional Communication Society, in an article by Jamie Hutchinson: Alexander Sawchuk estimates that it was in June or July of 1973 when he, then an assistant professor of electrical engineering at the University of Southern California Signal and Image Processing Institute (SIPI), along with a graduate student and the SIPI lab manager, was hurriedly searching the lab for a good image to scan for a colleague's conference paper. They got tired of their stock of usual test images, dull stuff dating back to television standards work in the early 1960s. They wanted something glossy to ensure good output dynamic range, and they wanted a human face. Just then, somebody happened to walk in with a recent issue of Playboy. The engineers tore away the top third of the centerfold so they could wrap it around the drum of their Muirhead wirephoto scanner, which they had outfitted with analog-to-digital converters (one each for the red, green, and blue channels) and a Hewlett Packard 2100 minicomputer. The Muirhead had a fixed resolution of 100 lines per inch and the engineers wanted a 512×512 image, so they limited the scan to the top 5.12 inches of the picture, effectively cropping it at the subject's shoulders. The image's reach was limited in the 1970s and 80s, which is reflected in it initially only appearing in .org domains, but in July 1991, the image featured on the cover of Optical Engineering alongside Peppers, another popular test image. This drew the attention of Playboy to the potential copyright infringement. The peak of image hits on the internet was in 1995. The scan became one of the most used images in computer history. The use of the photo in electronic imaging has been described as "clearly one of the most important events in [its] history". The image spread to over 100 different domains, particularly .com and .edu. In a 1999 issue of IEEE Transactions on Image Processing "Lena" was used in three separate articles, and the picture continued to appear in scientific journals throughout the beginning of the 21st century. Lenna is so widely accepted in the image processing community that Forsén was a guest at the 50th annual Conference of the Society for Imaging Science and Technology (IS&T) in 1997. In 2015, Lena Forsén was also guest of honor at the banquet of IEEE ICIP 2015. After delivering a speech, she chaired the best paper award ceremony. To explain why the image became a standard in the field, David C. Munson, editor-in-chief of IEEE Transactions on Image Processing, stated that it was a good test image because of its detail, flat regions, shading, and texture. He also noted that "the Lena image is a picture of an attractive woman. It is not surprising that the (mostly male) image processing research community gravitated toward an image that they found attractive." While Playboy often cracks down on illegal uses of its material and did initially send a notice to the publisher of Optical Engineering about its unauthorized use in that publication, over time it has decided to overlook the wide use of Lena. Eileen Kent, VP of new media at Playboy, said, "We decided we should exploit this, because it is a phenomenon." == Criticism == The use of the image has produced controversy because Playboy is "seen (by some) as being degrading to women". In a 1999 essay on reasons for the male predominance in computer science, applied mathematician Dianne P. O'Leary wrote: Suggestive pictures used in lectures on image processing ... convey the message that the lecturer caters to the males only. For example, it is amazing that the "Lena" pin-up image is still used as an example in courses and published as a test image in journals today. A 2012 paper on compressed sensing used a photo of the model Fabio Lanzoni as a test image to draw attention to this issue. The use of the test image at the magnet school Thomas Jefferson High School for Science and Technology in Fairfax County, Virginia, provoked a guest editorial by a senior in The Washington Post in 2015 about its detrimental impact on aspiring female students in computer science. In 2017, the Journal of Modern Optics published an editorial titled "On alternatives to Lenna" suggesting three images (Pirate, Cameraman, and Peppers) that "are reasonably close to Lenna in feature space". In 2018, the Nature Nanotechnology journal announced that they would no longer consider articles using Lenna. In the same year SPIE, the publishers of Optical Engineering, also announced that they "strongly discourage" the use of Lenna, and would no longer consider new submissions containing the image "without convincing scientific justification for its use". They noted that aside from the copyright and ethical issues, that it was also no longer useful as a standard image: "In today's age of high-resolution digital image technology, it seems difficult to argue that a 512 × 512 image produced with a 1970s-era analog scanner is the best we have to offer as an image quality test standard". Forsén stated in the 2019 documentary film Losing Lena, "I retired from modeling a long time ago. It's time I retired from tech, too... Let's commit to losing me." The Institute of Electrical and Electronics Engineers (IEEE) announced that, starting April 1, 2024, it will no longer allow use of Lenna in its publications.

    Read more →
  • DataScene

    DataScene

    DataScene is a scientific graphing, animation, data analysis, and real-time data monitoring software package. It was developed with the Common Language Infrastructure technology and the GDI+ graphics library. With the two Common Language Runtime engines - the .Net and Mono frameworks - DataScene runs on all major operating systems. With DataScene, the user can plot 39 types 2D & 3D graphs (e.g., Area graph, Bar graph, Boxplot graph, Pie graph, Line graph, Histogram graph, Surface graph, Polar graph, Water Fall graph, etc.), manipulate, print, and export graphs to various formats (e.g., Bitmap, WMF/EMF, JPEG, PNG, GIF, TIFF, PostScript, and PDF), analyze data with different mathematical methods (fitting curves, calculating statics, FFT, etc.), create chart animations for presentations (e.g. with PowerPoint), classes, and web pages, and monitor and chart real-time data. == History == DataScene was first released (version 1.0) in March 2009 for the Windows platform and the .Net 2.0 framework. Since version 2.0, DataScene has been ported to the Mono framework 2.6 and all Linux and Unix/X11 operating systems. Cyberwit offers free licensing for the Express edition of DataScene.

    Read more →
  • Language identification

    Language identification

    In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

    Read more →
  • Robinson compass mask

    Robinson compass mask

    In image processing, a Robinson compass mask is a type of compass mask used for edge detection. It has eight major compass orientations, each will extract the edges in respect to its direction. A combined use of compass masks of different directions could detect the edges from different angles. == Technical explanation == The Robinson compass mask is defined by taking a single mask and rotating it to form eight orientations: North: [ − 1 0 1 − 2 0 2 − 1 0 1 ] {\displaystyle {\text{North:}}{\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix}}} North West: [ 0 1 2 − 1 0 1 − 2 − 1 0 ] {\displaystyle {\text{North West:}}{\begin{bmatrix}0&1&2\\-1&0&1\\-2&-1&0\end{bmatrix}}} West: [ 1 2 1 0 0 0 − 1 − 2 − 1 ] {\displaystyle {\text{West:}}{\begin{bmatrix}1&2&1\\0&0&0\\-1&-2&-1\end{bmatrix}}} South West: [ 2 1 0 1 0 − 1 0 − 1 − 2 ] {\displaystyle {\text{South West:}}{\begin{bmatrix}2&1&0\\1&0&-1\\0&-1&-2\end{bmatrix}}} South: [ 1 0 − 1 2 0 − 2 1 0 − 1 ] {\displaystyle {\text{South:}}{\begin{bmatrix}1&0&-1\\2&0&-2\\1&0&-1\end{bmatrix}}} South East: [ 0 − 1 − 2 1 0 − 1 2 1 0 ] {\displaystyle {\text{South East:}}{\begin{bmatrix}0&-1&-2\\1&0&-1\\2&1&0\end{bmatrix}}} East: [ − 1 − 2 − 1 0 0 0 1 2 1 ] {\displaystyle {\text{East:}}{\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}}} North East: [ − 2 − 1 0 − 1 0 1 0 1 2 ] {\displaystyle {\text{North East:}}{\begin{bmatrix}-2&-1&0\\-1&0&1\\0&1&2\end{bmatrix}}} The direction axis is the line of zeros in the matrix. Robinson compass mask is similar to kirsch compass masks, but is simpler to implement. Since the matrix coefficients only contains 0, 1, 2, and are symmetrical, only the results of four masks need to be calculated, the other four results are the negation of the first four results. An edge, or contour is an tiny area with neighboring distinct pixel values. The convolution of each mask with the image would create a high value output where there is a rapid change of pixel value, thus an edge point is found. All the detected edge points would line up as edges. == Example == An example of Robinson compass masks applied to the original image. Obviously, the edges in the direction of the mask is enhanced.

    Read more →
  • D/Vision Pro

    D/Vision Pro

    D/Vision Pro was one of the earliest marketed non-linear editing systems. It was released by TouchVision Systems, Inc. in the mid-1990s. The program was DOS-based and worked on either Intel's 386 or 486 processor. The system used AVI compression and worked with the Action Media II board. The system allowed users to digitize video, audio, and timecode, create an edit decision list (EDL), instantly play back the edited program, and output the finished EDL in a wide variety of formats. These cost-effective editing systems were used by numerous independent filmmakers and in low-budget productions during the mid-late 1990s. D/Vision Pro's low-quality compression led TouchVision (later renamed D/Vision Systems) to abandon it in favor of D/Vision Online, which was purchased by Discreet Logic and renamed edit. In June 2002, Discreet discontinued edit, as they did not want it to interfere with smoke sales which were more profitable. Discreet was later purchased by Autodesk.

    Read more →
  • Video editing software

    Video editing software

    Video editing software or a video editor is software used for performing the post-production video editing of digital video sequences on a non-linear editing system (NLE). It has replaced traditional flatbed celluloid film editing tools and analog video tape editing machines. Video editing software serves a lot of purposes, such as filmmaking, audio commentary, and general editing of video content. In NLE software, the user manipulates sections of video, images, and audio on a sequence. These clips can be trimmed, cut, and manipulated in many different ways. When editing is finished, the user exports the sequence as a video file. == Components == === Timeline === NLE software is typically based on a timeline interface where sections moving image video recordings, known as clips, are laid out in sequence and played back. The NLE offers a range of tools for trimming, splicing, cutting, and arranging clips across the timeline. Another kind of clip is a text clip, used to add text to a video, such as title screens or movie credits. Audio clips can additionally be mixed together, such as mixing a soundtrack with multiple sound effects. Typically, the timeline is divided into multiple rows on the y-axis for different clips playing simultaneously, whereas the x-axis represents the run time of the video. Effects such as transitions can be performed on each clip, such as a crossfade effect going from one scene to another. === Exporting === Since video editors represent a project with a file format specific to the program, one needs to export the video file in order to publish it. Once a project is complete, the editor can then export to movies in a variety of formats in a context that may range from broadcast tape formats to compressed video files for web publishing (such as on an online video platform or personal website), optical media, or saved to mobile devices. To facilitate editing, source video typically has a higher resolution than the desired output. Therefore, higher resolution video needs to be downscaled during exporting, or after exporting in a process known as transsizing. === Visual effects === As digital video editing advanced, visual effects became possible, and is part of the standard toolkit, usually found in prosumer and professional grade software. A common ability is to do compositing techniques such as chroma keying or luma keying, among others, which allow different objects to look as if they are in the same scene. A different kind of visual effects is motion capture. Software such as Blender can perform motion capture to make animated objects follow an actor's movements. === Additional features === Most professional video editors are able to do color grading, which is to manipulate visual attributes of a video such as contrast to enhance output, and improve emotional impact. Some video editors such as iMovie include stock footage available for use. == Hardware requirements == As video editing puts great demands on storage and graphics performance, especially at high resolutions such as 4K, and for videos with many visual effects, powerful hardware is often required. It is not uncommon for a computer built for video editing to have a lot of drive capacity, and a powerful graphics processing unit, which optimally has hardware accelerated video encoding. Having sufficient disk space is important since videos can take up large amounts of storage, depending on the resolution and compression format used. Each minute of a Full HD (1080p) video at 30 fps takes up 60MB of space. When visual effects are used, a server farm can be employed to speed up the rendering process. == Examples == Video editing software can be divided into consumer grade, which focuses on ease-of-use, along with professional grade software, which focuses on feature availability, and advanced editing techniques. The typical use case for the former is to edit personal videos on the go, when more advanced editing is not required. === Consumer grade === Photos (Apple) Google Photos YouTube Create === Prosumer grade === ==== Proprietary software ==== iMovie CyberLink PowerDirector === Professional grade === ==== Proprietary software ==== Final Cut Pro Adobe Premiere Pro DaVinci Resolve Vegas Pro Lightworks Camtasia Media Composer ==== Free and open source software ==== Avidemux Blender Cinelerra Flowblade Kdenlive OpenShot Shotcut While most video editing software has been separate from the operating systems, some operating systems have had a video editor installed by default, such as Windows Movie Maker in Windows XP, or as a component of the default photo viewer, such as the Photos app on iOS. Some social media platforms, such as TikTok and Instagram may include a rudimentary video editor to trim clips.

    Read more →
  • ISO/IEC JTC 1/SC 24

    ISO/IEC JTC 1/SC 24

    ISO/IEC JTC 1/SC 24 Computer graphics, image processing and environmental data representation is a standardization subcommittee of the joint subcommittee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), which develops and facilitates standards within the field of computer graphics, image processing, and environmental data representation. The international secretariat of ISO/IEC JTC 1/SC 24 is the British Standards Institute (BSI) located in the United Kingdom. == History == ISO/IEC JTC 1/SC 24 was formed in 1987 from ISO/TC 97 as a result of Resolution 21 at the ISO/IEC JTC 1 plenary. The group's origins began in computer graphics, the standardization of which was originally under ISO/IEC JTC 1/SC 21/WG 2. However, when ISO/IEC JTC 1/SC 24 was created, the standardization activity of ISO/IEC JTC 1/SC 21/WG 2 was carried over to the new subcommittee. The initial five working groups of ISO/IEC JTC 1/SC 24 were titled, “Architecture,” “Application programming interfaces,” “Metafiles and interfaces,” “Language bindings,” and “Validation, testing and registration.” The work of ISO/IEC JTC 1/SC 24 began with the Graphical Kernel System (GKS), which was adopted from ISO/IEC JTC 1/SC 21/WG 2. However, since GKS only addressed 2D functionality, attention turned to the standardization of 3D functionality. This resulted in two standards being published: GKS-3D in 1988 and PHIGS in 1989, both of which addressed 3D functionality. Since 1991, ISO/IEC JTC 1/SC 24 has held plenaries in a number of countries, including the Netherlands, Germany, United States, France, Canada, Japan, Sweden, Korea, United Kingdom, Australia, and Czech Republic. == Scope == The scope of ISO/IEC JTC 1/SC 24 is the “Standardization of interfaces for information technology based applications relating to”: Computer graphics Image processing Environmental data representation Support for the Mixed and Augmented Reality (MAR) Interaction with, and visual representation of, information Included are the following related areas: Modeling and simulation and related reference models Virtual reality with accompanying augmented reality/augmented virtuality aspects and related reference models Application program interfaces Functional specifications Representation models Interchange formats, encodings and their specifications, including metafiles Device interfaces Testing methods Registration procedures Presentation and support for creation of multimedia, hypermedia, and mixed reality documents Excluded are the following areas: Character and image coding Coding of multimedia, hypermedia, and mixed reality document interchange formats JTC 1 work in user system interfaces and document presentation ISO/TC 207 work on ISO 14000 environment management, ISO/TC 211 work on geographic information and geomatics Software environments as described by ISO/IEC JTC 1/SC 22 == Structure == ISO/IEC JTC 1/SC 24 is made up of four active working groups, each of which carries out specific tasks in standards development within the field of computer graphics, image processing and environmental data representation, together with ITU-T Study Group 16. As a response to changing standardization needs, working groups of ISO/IEC JTC 1/SC 24 can be disbanded if their area of work is no longer applicable, or established if new working areas arise. The focus of each working group is described in the group's terms of reference. Active working groups of ISO/IEC JTC 1/SC 24 are: == Collaborations == ISO/IEC JTC 1/SC 24 works in close collaboration with a number of other organizations or subcommittees, both internal and external to ISO or IEC, in order to avoid conflicting or duplicative work. Organizations internal to ISO or IEC that collaborate with or are in liaison to ISO/IEC JTC 1/SC 24 include: ISO/IEC JTC 1/WG 7, Sensor Networks ISO/IEC JTC 1/SC 29, Coding of audio, picture, multimedia and hypermedia information ISO/IEC JTC 1/SC 32, Data management and interchange ISO/TAG 14, Imagery and technology ISO/TC 130, Graphic Technology ISO/TC 184/SC 4, Industrial data ISO/TC 211, Geographic information/Geomatics ISO/TC 215, Health informatics IEC TC 100, Audio, video and multimedia system and equipment Some organizations external to ISO or IEC that collaborate with or are in liaison to ISO/IEC JTC 1/SC 24 include: Defence Geospatial Information Working Group (DGIWG) Digital Imaging and Communications in Medicine (DICOM) International Hydrographic Organization (IHO) The Khronos Group NATO - Joint Intelligence Surveillance and Reconnaissance Capability Group (JISRCG) OMG Robotics DTF Open CGM Open Geospatial Consortium (OGC) SEDRIS Organization Simulation Interoperability Standards Organization (SISO) US National Imagery Transmission Format Standard (NITFS) Technical Board (US NTB) Web3D Consortium World Intellectual Property Organization (WIPO) World Wide Web Consortium (W3C) == Member countries == Countries pay a fee to ISO to be members of subcommittees. The 11 "P" (participating) members of ISO/IEC JTC 1/SC 24 are: Australia, China, Egypt, France, India, Japan, Republic of Korea, Portugal, Russian Federation, United Kingdom, and United States. The 22 "O" (observer) members of ISO/IEC JTC 1/SC 24 are: Argentina, Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Cuba, Czech Republic, Finland, Ghana, Hungary, Iceland, Indonesia, Islamic Republic of Iran, Italy, Kazakhstan, Malaysia, Poland, Romania, Serbia, Slovakia, Switzerland, and Thailand. == Published standards == ISO/IEC JTC 1/SC 24 currently has 80 published standards under their direct responsibility within the field of computer graphics, image processing, and environmental data representation, including:

    Read more →
  • Couch to 5K

    Couch to 5K

    Couch to 5K, abbreviated C25K, is an exercise plan that gradually progresses from beginner running toward a 5 kilometre (3.1 mile) run over nine weeks. == Operations == The Couch to 5K running plan, also known as C25K, created by Josh Clark in 1996, was developed with the expectation of creating a plan for new runners to start running. The plan is aimed to have users work out for 20 to 30 minutes, three days a week. Within the program, users can be expected to perform different tasks such as intervals of running with period of short walks in between to help build endurance in the weeks up to the final goal of a 5K run. During the nine weeks leading up to the race, the runner will learn to set their own pace and where their strengths and weaknesses are within running. Often, the daily workouts start with a five-minute warm-up walk and works up to running five kilometres without a walking break within nine weeks. Users are not expected to have any experience in running and can be some of the first running that they ever do. The main goal is to turn that unexperienced runner into someone who can run a 5K. Clark started the website Kick and featured C25K on the site. In 2001, Kick merged with Cool Running, a New England–based running site. Clark later sold his stake in Cool Running and the Couch to 5K program. Cool Running was absorbed into Active.com, operated by Active Network, LLC. Active Network provides mobile apps for Couch to 5K, as well as 5K to 10K, a follow-up program. The NHS in the UK provides downloadable podcasts and a smartphone app (Android and iOS) for the plan. A mobile app, created by Zen Labs, has training plans that are based on the Couch to 5K running plan from CoolRunning.com. It is one of the highest-rated health and fitness apps available on Android and iOS. As of 2016, the C25K app has been used by over 5 million people.

    Read more →
  • Image scaling

    Image scaling

    In computer graphics and digital imaging, image scaling is the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement. When scaling a vector graphic image, the graphic primitives that make up the image can be rendered using geometric transformations at any resolution with no loss of image quality. When scaling a raster graphics image, a new image with a higher or lower number of pixels must be generated. In the case of decreasing the pixel number (scaling down), this usually results in a visible quality loss. From the standpoint of digital signal processing, the scaling of raster graphics is a two-dimensional example of sample-rate conversion, the conversion of a discrete signal from a sampling rate (in this case, the local sampling rate) to another. == Mathematical == Image scaling can be interpreted as a form of image resampling or image reconstruction from the view of the Nyquist sampling theorem. According to the theorem, downsampling to a smaller image from a higher-resolution original can only be carried out after applying a suitable 2D anti-aliasing filter to prevent aliasing artifacts. The image is reduced to the information that can be carried by the smaller image. In the case of up sampling, a reconstruction filter takes the place of the anti-aliasing filter. A more sophisticated approach to upscaling treats the problem as an inverse problem, solving the question of generating a plausible image that, when scaled down, would look like the input image. A variety of techniques have been applied for this, including optimization techniques with regularization terms and the use of machine learning from examples. == Algorithms == An image size can be changed in several ways. === Nearest-neighbor interpolation === One of the simpler ways of increasing image size is nearest-neighbor interpolation, replacing every pixel with the nearest pixel in the output; for upscaling, this means multiple pixels of the same color will be present. This can preserve sharp details but also introduce jaggedness in previously smooth images. 'Nearest' in nearest-neighbor does not have to be the mathematical nearest. One common implementation is to always round toward zero. Rounding this way produces fewer artifacts and is faster to calculate. This algorithm is often preferred for images which have little to no smooth edges. A common application of this can be found in pixel art. === Bilinear and bicubic interpolation === Bilinear interpolation works by interpolating pixel color values, introducing a continuous transition into the output even where the original material has discrete transitions. Although this is desirable for continuous-tone images, this algorithm reduces contrast (sharp edges) in a way that may be undesirable for line art. Bicubic interpolation yields substantially better results, with an increase in computational cost. === Sinc and Lanczos resampling === Sinc resampling, in theory, provides the best possible reconstruction for a perfectly bandlimited signal. In practice, the assumptions behind sinc resampling are not completely met by real-world digital images. Lanczos resampling, an approximation to the sinc method, yields better results. Bicubic interpolation can be regarded as a computationally efficient approximation to Lanczos resampling. === Box sampling === One weakness of bilinear, bicubic, and related algorithms is that they sample a specific number of pixels. When downscaling below a certain threshold, such as more than twice for all bi-sampling algorithms, the algorithms will sample non-adjacent pixels, which results in both losing data and rough results. The trivial solution to this issue is box sampling, which is to consider the target pixel a box on the original image and sample all pixels inside the box. This ensures that all input pixels contribute to the output. The major weakness of this algorithm is that it is hard to optimize. === Mipmap === Another solution to the downscale problem of bi-sampling scaling is mipmaps. A mipmap is a prescaled set of downscaled copies. When downscaling, the nearest larger mipmap is used as the origin to ensure no scaling below the useful threshold of bilinear scaling. This algorithm is fast and easy to optimize. It is standard in many frameworks, such as OpenGL. The cost is using more image memory, exactly one-third more in the standard implementation. === Fourier-transform methods === Simple interpolation based on the Fourier transform pads the frequency domain with zero components (a smooth window-based approach would reduce the ringing). Besides the good conservation (or recovery) of details, notable are the ringing and the circular bleeding of content from the left border to the right border (and the other way around). === Edge-directed interpolation === Edge-directed interpolation algorithms aim to preserve edges in the image after scaling, unlike other algorithms, which can introduce staircase artifacts. Examples of algorithms for this task include New Edge-Directed Interpolation (NEDI), Edge-Guided Image Interpolation (EGGI), Iterative Curvature-Based Interpolation (ICBI), and Directional Cubic Convolution Interpolation (DCCI). A 2013 analysis found that DCCI had the best scores in peak signal-to-noise ratio and structural similarity on a series of test images. === hqx === For magnifying computer graphics with low resolution and/or few colors (usually from 2 to 256 colors), better results can be achieved by hqx or other pixel-art scaling algorithms. These produce sharp edges and maintain a high level of detail. === Vectorization === Vector extraction, or vectorization, offers another approach. Vectorization first creates a resolution-independent vector representation of the graphic to be scaled. The resulting SVG vector file can then be exported and rendered at any required resolution without quality loss, serving directly as production-ready artwork for scalable display & printing. This technique is used by Adobe Illustrator, Live Trace, and Inkscape. Scalable Vector Graphics are well suited to simple geometric images, while photographs do not fare well with vectorization due to their complexity. === Deep convolutional neural networks === This method uses machine learning for more detailed images, such as photographs and complex artwork. Programs that use this method include waifu2x, Imglarger and Neural Enhance. Demonstration of conventional vs. waifu2x upscaling with noise reduction, using a detail of Phosphorus and Hesperus by Evelyn De Morgan. [Click image for full size] AI-driven upscaling software allows detail and sharpness to be added to historical photographs, where it is not present in the original. The availability of AI upscaling tools has led to confusion where a person believes that the upscaled version of a blurry image is genuinely showing them the subject of the original photograph. In 2025 a user of the social media site X posted an AI-upscaled version of a low resolution photo of Donald Trump that they had zoomed in on, and asked if anyone could "explain what the hell is happening to his forehead". Experts noted that the image had been distorted by the upscaling process, and that such tools "inevitably have to invent, or at least recreate, details that were or were not there". == Applications == === General === Image scaling is used in, among other applications, web browsers, image editors, image and file viewers, software magnifiers, digital zoom, the process of generating thumbnail images, and when outputting images through screens or printers. === Video === This application is the magnification of images for home theaters for HDTV-ready output devices from PAL-Resolution content, for example, from a DVD player. Upscaling is performed in real time, and the output signal is not saved. === Pixel-art scaling === As pixel-art graphics are usually low-resolution, they rely on careful placement of individual pixels, often with a limited palette of colors. This results in graphics that rely on stylized visual cues to define complex shapes with little resolution, down to individual pixels. This makes scaling pixel art a particularly difficult problem. Specialized algorithms were developed to handle pixel-art graphics, as the traditional scaling algorithms do not take perceptual cues into account. Since a typical application is to improve the appearance of fourth-generation and earlier video games on arcade and console emulators, many are designed to run in real time for small input images at 60 frames per second. On fast hardware, these algorithms are suitable for gaming and other real-time image processing. These algorithms provide sharp, crisp graphics, while minimizing blur. Scaling art algorithms have been implemented in a wide range of emulators such as HqMAME and DOSBox, as well as 2D game engines and game engine recreations such as ScummVM. They gained recognition with game

    Read more →