AI Analyst Jobs

AI Analyst Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Language model benchmark

    Language model benchmark

    A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability. == Overview == === Types === Benchmarks may be described by the following adjectives, not mutually exclusive: Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores. Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages included as annotation in the question, in which the answer appears. Closed-book QA includes no relevant passages. Closed-book QA is also called open-domain question-answering. Before the era of large language models, open-book QA was more common, and understood as testing information retrieval methods. Closed-book QA became common since GPT-2 as a method to measure knowledge stored within model parameters. Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution. Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering. Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription. Agency: These tasks are for a language-model–based software agent that operates a computer for a user, such as editing images, browsing the web, etc. Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear. Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians. The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set. In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning). === Lifecycle === Generally, the life cycle of a benchmark consists of the following steps: Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly). Growth: More papers and models use the benchmark, and the performance on the benchmark grows. Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks. Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress. === Construction === Like datasets, benchmarks are typically constructed by several methods, individually or in combination: Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming. Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task. Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest. === Evaluation === Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime. The benchmark scores are of the following kinds: For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc. pass@n: The model is given n {\displaystyle n} attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems. k@n: The model makes n {\displaystyle n} attempts to solve each problem, but only k {\displaystyle k} attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems. cons@n: The model is given n {\displaystyle n} attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting". The pass@n score can be estimated more accurately by making N > n {\displaystyle N>n} attempts, and use the unbiased estimator 1 − ( N − c n ) ( N n ) {\displaystyle 1-{\frac {\binom {N-c}{n}}{\binom {N}{n}}}} , where c {\displaystyle c} is the number of correct attempts. For less well-formed tasks, where the output can be any sentence, there are the following commonly used scores including BLEU ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE. === Issues === error: Some benchmark answers may be wrong. ambiguity: Some benchmark questions may be ambiguously worded. subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible. open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X". inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation. shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se

    Read more →
  • Network Abstraction Layer

    Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards. == Introduction == An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements. The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed in order to provide "network friendliness" to enable simple and effective customization of the use of VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as: RTP/IP for any kind of real-time wire-line and wireless Internet services. File formats, e.g., ISO MP4 for storage and MMS. H.32X for wireline and wireless conversational services. MPEG-2 systems for broadcasting services, etc. The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet formats uses of NAL units, parameter sets, and access units. A short description of these concepts is given below. == NAL units == The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream. == NAL Units in Byte-Stream Format Use == Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired. == NAL Units in Packet-Transport System Use == In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes. == VCL and Non-VCL NAL Units == NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures. Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures). == Parameter Sets == A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets: Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.) Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.) The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself. == Access Units == A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

    Read more →
  • Odor source localization

    Odor source localization

    Odor source localization (OSL) is the problem of locating the origin of an airborne or waterborne chemical plume using one or more mobile sensors, typically robots equipped with chemical sensors. The task sits at the intersection of robotics, fluid dynamics and machine olfaction. Chemical plumes in turbulent flows are intermittent and patchy, and most chemical sensors respond slowly and have limited selectivity, so the instantaneous reading available to a moving sensor is a poor proxy for the underlying time-averaged concentration field. Robotic OSL has been studied since the late 1980s and has applications including the detection of gas leaks, search and rescue after industrial accidents, and environmental monitoring of industrial emissions. == History == Robotic odor search emerged in the late 1980s and 1990s, drawing on earlier work in chemical ecology that had described how moths and other insects locate distant pheromone sources. R. A. Russell at Monash University was among the first to build mobile robots that followed chemical trails on the floor and tracked airborne odor plumes. Distributed and multi-robot odor search were investigated by Hayes, Martinoli and Goodman at the California Institute of Technology and EPFL, who studied cooperative plume-tracing on simulated and physical robot swarms. In 2007 Vergassola, Villermaux and Shraiman introduced infotaxis, an information-theoretic search strategy in which a sensor moves so as to maximize the expected information gain about source location, rather than following a chemical concentration gradient; the paper appeared in Nature and prompted substantial follow-up work in the robotics community. From the mid-2010s, multi-rotor unmanned aerial vehicles carrying lightweight chemical sensors became a common experimental platform for OSL research. == Problem formulation == OSL is generally decomposed into three sub-problems: plume detection (deciding whether a chemical signal is present), plume traversal (moving so as to remain in contact with the plume), and source declaration (deciding when the source has been reached). The mathematical difficulty depends strongly on the assumed dispersion model. In laminar or low-Reynolds number flows a Gaussian advection–diffusion model gives a smooth concentration field with a well-defined gradient. In turbulent flows, which dominate most realistic environments, the plume is filamentary: the sensor receives short, randomly spaced bursts of chemical separated by periods of zero signal, and the time-averaged field is not a useful guide on the time scales at which a robot must act. Source-term estimation, surveyed by Hutchinson and colleagues, additionally aims to recover both the position and the release rate of the source from the observed concentrations, often using probabilistic filters. == Biological inspiration == Many OSL strategies are explicitly modeled on the behavior of male moths flying upwind toward a pheromone source. As reviewed by Cardé and Willis, moths combine an upwind surge whenever they detect a filament of pheromone with a wider crosswind cast when contact is lost, producing a characteristic zig-zag trajectory that has been transposed onto mobile robots by several groups. Other biological models draw on the search behavior of dogs and of marine animals such as blue crabs and lobsters, which integrate chemical and bilateral hydrodynamic cues over much shorter ranges. == Algorithms and strategies == === Reactive strategies === Reactive strategies select the next motion as a direct function of the current sensor reading. Chemotaxis steers along the locally estimated concentration gradient, which is effective in laminar plumes but degrades severely in turbulence. Anemotaxis exploits a measured wind direction by surging upwind when chemical contact is made. The bio-inspired cast-and-surge family combines anemotaxis with a deterministic crosswind cast on contact loss, and is the dominant reactive approach for turbulent environments. === Probabilistic and information-theoretic strategies === Probabilistic methods maintain a posterior distribution over possible source locations and choose actions that improve that distribution. The infotaxis strategy of Vergassola, Villermaux and Shraiman selects the move that maximizes the expected reduction in entropy of the source-location posterior, and is effective in regimes where the spatial gradient is unusable. Bayesian source-term estimation extends this idea by inferring both source position and release rate, typically using particle filters or sequential Monte Carlo. === Map-based strategies === Map-based methods build a spatial model of the time-averaged gas distribution from sensor readings collected along the robot's trajectory and search for local maxima in that model. Lilienthal and colleagues describe a family of kernel-based gas distribution mapping techniques in which point measurements are convolved with a Gaussian kernel to produce a spatially extrapolated estimate. Such methods are most useful when the source can be assumed quasi-stationary and the robot is able to revisit locations. === Multi-robot and swarm strategies === Multiple robots searching cooperatively can shorten search times. Cooperative formations spread the sensors across the crosswind axis, making detection of an intermittent plume more likely. Swarm-based approaches, reviewed by Wang and colleagues, deploy larger numbers of simpler agents and rely on collective behavior rather than centralized planning; reported advantages include improved coverage of the search area and the possibility of locating multiple sources in parallel. == Sensors and platforms == Most OSL systems use metal-oxide semiconductor (MOX) sensors, photoionization detectors or electrochemical cells, which trade off sensitivity, selectivity, response time and power consumption. Ishida and colleagues describe how these sensors interact with airflow around the robot body, an effect that motivates careful aerodynamic design and active sampling. Mobile platforms include wheeled ground robots for indoor and structured outdoor environments, multi-rotor unmanned aerial vehicles for open spaces and elevated sources, and autonomous underwater vehicles for chemical plumes in the marine environment. == Notable systems == Among the early demonstrations, R. A. Russell's series of differential-drive robots at Monash University localized volatile sources in still and ventilated rooms during the 1990s. The Smelling Nano Aerial Vehicle reported by Burgués and colleagues used a Crazyflie nano-quadcopter (approximately 27 grams in mass and 10 cm across) carrying a custom MOX gas sensing board, and built three-dimensional gas distribution maps of indoor releases from sweeping flights of less than three minutes. The GADEN simulator, released by Monroy and colleagues, couples three-dimensional dispersion computed from an OpenFOAM CFD solver with models of MOX and photo-ionization gas sensors, and is widely used to test mobile-robot olfaction algorithms in simulation. == Applications == Reported applications include the localization of natural-gas and methane leaks in urban infrastructure, search for chemical contamination after industrial accidents, search and rescue, and environmental monitoring of industrial emissions. Drug- and explosives-detection robots are an adjacent application area, although these typically rely on close-range sniffing rather than long-range plume tracking. == Open challenges == Open challenges identified in recent reviews include the limited speed, selectivity and stability of available chemical sensors; the scarcity of standardized, large-scale benchmarks comparable to those available in computer vision; reliable handling of multi-source environments, where standard single-source assumptions fail; and the integration of OSL with other autonomous-vehicle subsystems such as obstacle avoidance and navigation in three-dimensional turbulent flow.

    Read more →
  • Spatial anti-aliasing

    Spatial anti-aliasing

    In digital signal processing, spatial anti-aliasing is a technique for minimizing the distortion artifacts (aliasing) when representing a high-resolution image at a lower resolution. Anti-aliasing is used in digital photography, computer graphics, digital audio, and many other applications. Anti-aliasing means removing signal components that have a higher frequency than is able to be properly resolved by the recording (or sampling) device. This removal is done before (re)sampling at a lower resolution. When sampling is performed without removing this part of the signal, it causes undesirable artifacts such as black-and-white noise. In signal acquisition and audio, anti-aliasing is often done using an analog anti-aliasing filter to remove the out-of-band component of the input signal prior to sampling with an analog-to-digital converter. In digital photography, optical anti-aliasing filters made of birefringent materials smooth the signal in the spatial optical domain. The anti-aliasing filter essentially blurs the image slightly in order to reduce the resolution to or below that achievable by the digital sensor (the larger the pixel pitch, the lower the achievable resolution at the sensor level). == Examples == In computer graphics, anti-aliasing improves the appearance of "jagged" polygon edges, or "jaggies", so they are smoothed out on the screen. However, it incurs a performance cost for the graphics card and uses more video memory. The level of anti-aliasing determines how smooth polygon edges are (and how much video memory it consumes). Near the top of an image with a receding checker-board pattern, the image is difficult to recognise and often not considered aesthetically pleasing. In contrast, when anti-aliased the checker-board near the top blends into grey, which is usually the desired effect when the resolution is insufficient to show the detail. Even near the bottom of the image, the edges appear much smoother in the anti-aliased image. Multiple methods exist, including the sinc filter, which is considered a better anti-aliasing algorithm. When magnified, it can be seen how anti-aliasing interpolates the brightness of the pixels at the boundaries to produce grey pixels since the space is occupied by both black and white tiles. These help make the sinc filter antialiased image appear much smoother than the original. In a simple diamond image, anti-aliasing blends the boundary pixels; this reduces the aesthetically jarring effect of the sharp, step-like boundaries that appear in the aliased graphic. Anti-aliasing is often applied in rendering text on a computer screen, to suggest smooth contours that better emulate the appearance of text produced by conventional ink-and-paper printing. Particularly with fonts displayed on typical LCD screens, it is common to use subpixel rendering techniques like ClearType. Sub-pixel rendering requires special colour-balanced anti-aliasing filters to turn what would be severe colour distortion into barely-noticeable colour fringes. Equivalent results can be had by making individual sub-pixels addressable as if they were full pixels, and supplying a hardware-based anti-aliasing filter as is done in the OLPC XO-1 laptop's display controller. Pixel geometry affects all of this, whether the anti-aliasing and sub-pixel addressing are done in software or hardware. == Simplest approach to anti-aliasing == The most basic approach to anti-aliasing a pixel is determining what percentage of the pixel is occupied by a given region in the vector graphic - in this case a pixel-sized square, possibly transposed over several pixels - and using that percentage as the colour. A Python program producing a basic plot of a single, white-on-black anti-aliased point using the method is as follows: This method is generally best suited for simple graphics, such as basic lines or curves, and applications that would otherwise have to convert absolute coordinates to pixel-constrained coordinates, such as 3D graphics. It is a fairly fast function, but it is relatively low-quality, and gets slower as the complexity of the shape increases. For purposes requiring very high-quality graphics or very complex vector shapes, this will probably not be the best approach. Note: The plot_antialiased_point routine above cannot blindly set the colour value to the percent calculated. It must add the new value to the existing value at that location up to a maximum of 1. Otherwise, the brightness of each pixel will be equal to the darkest value calculated in time for that location which produces a very bad result. For example, if one point sets a brightness level of 0.90 for a given pixel and another point calculated later barely touches that pixel and has a brightness of 0.05, the final value set for that pixel should be 0.95, not 0.05. For more sophisticated shapes, the algorithm may be generalized as rendering the shape to a pixel grid with higher resolution than the target display surface (usually a multiple that is a power of 2 to reduce distortion), then using bicubic interpolation to determine the average intensity of each real pixel on the display surface. == Signal processing approach to anti-aliasing == In this approach, the ideal image is regarded as a signal. The image displayed on the screen is taken as samples, at each (x,y) pixel position, of a filtered version of the signal. Ideally, one would understand how the human brain would process the original signal, and provide an on-screen image that will yield the most similar response by the brain. The most widely accepted analytic tool for such problems is the Fourier transform; this decomposes a signal into basis functions of different frequencies, known as frequency components, and gives us the amplitude of each frequency component in the signal. The waves are of the form: cos ⁡ ( 2 j π x ) cos ⁡ ( 2 k π y ) {\displaystyle \ \cos(2j\pi x)\cos(2k\pi y)} where j and k are arbitrary non-negative integers. There are also frequency components involving the sine functions in one or both dimensions, but for the purpose of this discussion, the cosine will suffice. The numbers j and k together are the frequency of the component: j is the frequency in the x direction, and k is the frequency in the y direction. The goal of an anti-aliasing filter is to greatly reduce frequencies above a certain limit, known as the Nyquist frequency, so that the signal will be accurately represented by its samples, or nearly so, in accordance with the sampling theorem; there are many different choices of detailed algorithm, with different filter transfer functions. Current knowledge of human visual perception is not sufficient, in general, to say what approach will look best. == Two dimensional considerations == The previous discussion assumes that the rectangular mesh sampling is the dominant part of the problem. The filter usually considered optimal is not rotationally symmetrical, as shown in this first figure; this is because the data is sampled on a square lattice, not using a continuous image. This sampling pattern is the justification for doing signal processing along each axis, as it is traditionally done on one dimensional data. Lanczos resampling is based on convolution of the data with a discrete representation of the sinc function. If the resolution is not limited by the rectangular sampling rate of either the source or target image, then one should ideally use rotationally symmetrical filter or interpolation functions, as though the data were a two dimensional function of continuous x and y. The sinc function of the radius has too long a tail to make a good filter (it is not even square-integrable). A more appropriate analog to the one-dimensional sinc is the two-dimensional Airy disc amplitude, the 2D Fourier transform of a circular region in 2D frequency space, as opposed to a square region. One might consider a Gaussian plus enough of its second derivative to flatten the top (in the frequency domain) or sharpen it up (in the spatial domain), as shown. Functions based on the Gaussian function are natural choices, because convolution with a Gaussian gives another Gaussian whether applied to x and y or to the radius. Similarly to wavelets, another of its properties is that it is halfway between being localized in the configuration (x and y) and in the spectral (j and k) representation. As an interpolation function, a Gaussian alone seems too spread out to preserve the maximum possible detail, and thus the second derivative is added. As an example, when printing a photographic negative with plentiful processing capability and on a printer with a hexagonal pattern, there is no reason to use sinc function interpolation. Such interpolation would treat diagonal lines differently from horizontal and vertical lines, which is like a weak form of aliasing. == Practical real-time anti-aliasing approximations == There are only a handful of primitives used at the lowest level in a real-time rend

    Read more →
  • Splitwise

    Splitwise

    Splitwise is an online expense-splitting application software accessible via web browser and mobile app. The app facilitates repayments of shared bills by calculating what each person in a group owes. The primary competitor to the app is Venmo, which only operates in the U.S. Splitwise allows users to create groups with friends to determine what each person owes. All expenses and allocations are added to the app, and Splitwise simplifies the transaction history to determine exactly what payments need to be made to whom to settle outstanding balances. Splitwise stores user information via cloud storage. It was developed and is owned by Splitwise Inc., based in Providence, Rhode Island, United States. == History == The app was launched in February 2011 as SplitTheRent, intended to be used for rent splitting, by Ryan Laughlin, Jon Bittner and Marshall Weir. In September 2013, Splitwise was integrated with Venmo to allow users to settle payments via Venmo. In April 2024, Splitwise partnered with Tink, a Visa payment services company, to incorporate a bank transfer feature directly in the Splitwise app. === Financing === In December 2014, the company raised $1.4 million. In October 2016, the company raised $5 million. In April 2021, Splitwise raised $20 million in funding from series A round run by Insight Partners. == Reception == A 2022 opinion piece in The Guardian by London journalist Imogen West-Knights shared the negative effects of exactly splitting bills among friends and family members. West-Knights argued that Splitwise and similar apps can "turn people into those true enemies of all that is fun and joyful in the world: accountants." However, she said the app does work better when used by couples rather than friend groups. Other reviews noted that the app makes people petty. In contrast, an article published by Condé Nast Traveler describes how Splitwise eliminated stress caused by complicated offline bill splitting, saying it "fixed such a pervasive obstacle in group travel." Coverage by The Wall Street Journal lands somewhere in between the two contrasting views, saying Splitwise and similar apps are helpful, but users need to be prepared for difficult money-related conversations that may arise. An etiquette advisor at Debrett's, said, "The less talk you can have about money on any of these occasions, the better." An editor suggested conversations as simple as asking, "We’re splitting this evenly, right?" before a meal.

    Read more →
  • BiP (software)

    BiP (software)

    BiP is a freeware instant messaging application developed by Lifecell Ventures Cooperatief U.A., a subsidiary of Turkcell incorporated in the Netherlands. It allows users to send text messages, voice messages and video calling, and it can be downloaded from the App Store, Google Play, and Huawei AppGallery. BiP has over 53 million users worldwide, and was first released in 2013. == Functions == BiP is a secure, and free communication platform. BiP allows making video and audio calls, allows sharing images, videos and location. BiP includes instant translations to 106 languages and exchange rates. President Erdoğan's Communications Office opposed WhatsApp's enforcement of its updated privacy policy and announced that Erdoğan left WhatsApp and opened an account in Telegram and BiP. The Turkish Ministry of National Defense has announced that it will move information groups to BiP for the same reason. == Others == Banglalink announced a BiP messenger partnership in Bangladesh The Communications Office of President Erdoğan opposed WhatsApp's enforcement of its updated privacy policy and announced that Erdoğan left WhatsApp and opened an account in Telegram and BiP. The Turkish Ministry of National Defense has announced that it will move information groups to BiP for the same reason. The CEO of BiP is Burak Akinci. The number of downloads of the app is 80 million globally.

    Read more →
  • OpenPipeline

    OpenPipeline

    openPipeline is an open-source plug-in for Autodesk Maya that is designed to assist in a Production Pipeline structure and Computer animation. == Development == Created in Maya Embedded Language, openPipeline was initiated at Eyebeam Atelier and further developed at Pratt Institute in the Digital Arts Lab. The initial release date was December 28, 2006. == Contributors == Rob O'Neill (Creator) Paris Mavroidis Meng-Han Ho

    Read more →
  • JasPer

    JasPer

    JasPer is a computer software project to create a reference implementation of the codec specified in the JPEG-2000 Part-1 standard (i.e. ISO/IEC 15444-1) - started in 1997 at Image Power Inc. and at the University of British Columbia. It consists of a C library and some sample applications useful for testing the codec. The copyright owner began licensing the code to the public under an MIT License-style license in 2004 in response to requests from the open-source community. As of 2011 JasPer operated as a component of many software projects, both free and proprietary, including (but not limited to) netpbm (as of release 10.12), ImageMagick and KDE (as of version 3.2). As of 22 June 2010 the GEGL graphics library supported JasPer in its latest Git versions. In a series of objective JPEG-2000-compression quality tests conducted in 2004, "JasPer was the best codec, closely followed by IrfanView and Kakadu". However, Jasper remains one of the slowest implementations of the JPEG-2000 codec, as it was designed for reference, not performance. == Etymology == The name "JasPer" has simultaneous connotations with Canada's Jasper National Park, with the semi-precious gemstone, jasper, and with "JP" as an abbreviation of the JPEG-2000 standard.

    Read more →
  • Shadow and highlight enhancement

    Shadow and highlight enhancement

    Shadow and highlight enhancement refers to an image processing technique used to correct exposure. The use of this technique has been gaining popularity, making its way onto magazine covers, digital media, and photos. It is, however, considered by some to be akin to other destructive Photoshop filters, such as the Watercolor filter, or the Mosaic filter. == Shadow recovery == A conservative application of the shadow/highlight tool can be very useful in recovering shadows, though it tends to leave a telltale halo around the boundary between highlight and shadow if used incorrectly. A way to avoid this is to use the bracketing technique, although this usually requires a tripod. == Highlight recovery == Recovering highlights with this tool, however, has mixed results, especially when using it on images with skin in them, and often makes people look like they have been "sprayed with fake tan". == Shadow brightening - manual == One way to brighten shadows in image editing software such as GIMP or Adobe Photoshop is to duplicate the background layer, invert the copy and set the blend modes of that top layer to "Soft Light". You can also use an inverted black and white copy of the image as a mask on a brightening layer, such as Curves or Levels. == Shadow brightening - automatic == Several automatic computer image processing-based shadow recovery and dynamic range compression methods can yield a similar effect. Some of these methods include the retinex method and homomorphic range compression. The retinex method is based on work from 1963 by Edwin Land, the founder of Polaroid. Shadow enhancement can also be accomplished using adaptive image processing algorithms such as adaptive histogram equalization or contrast limiting adaptive histogram equalization (CLAHE).

    Read more →
  • Captions (app)

    Captions (app)

    Mirage (formerly known as Captions) is a video-generating, video-editing and AI research company headquartered in New York City. Their first app, Captions, is available on iOS, Android, and Web and offers a suite of tools aimed at streamlining the creation and editing of videos. Their enterprise platform, Mirage Studio, generates AI actors and videos for marketing assets and video campaigns. == History == Mirage was co-founded by Gaurav Misra and Dwight Churchill. During Misra's time leading design engineering at Snap Inc., he followed the rise of a new category of video, the "talking video." In 2021, Misra left Snap to found Mirage with his former colleague Churchill. Later that year, the Captions app launched with early backing from venture capital firms Sequoia Capital and Andreessen Horowitz as well as individual investors. In 2023, the company released Lipdub, an Al dubbing app which translates any video with spoken audio into 28 languages. In October 2023, Captions shared that it maintained over 100,000 daily active users with "about a million" videos being created monthly. In November 2024, Captions acquired AlpacaML, a generative AI company that focused on art and other images. In June 2025, Captions launched Mirage Studio, for marketers and advertising agencies. In September 2025, Captions rebranded their company to Mirage. This change reflects the company's focus on developing their proprietary foundation model and future video products. == Products == The Captions app offers features to automate common production tasks including captioning, editing, dubbing, script creation, and music integration. Mirage Studio allows users to generate AI avatars and create short-form videos from prompts or audio. == Awards == In 2023, the company was recognized as part of Fast Company's "Next Big Things In Tech" series. In 2024, the company won 2 Webby Awards for Best Use of AI & Machine Learning and Creative Production.

    Read more →
  • Photoanalysis

    Photoanalysis

    Photoanalysis (or photo analysis) refers to the study of pictures to compile various types of data, for example, to measure the size distribution of virtually anything that can be captured by photo. Photoanalysis technology has changed the way mines and mills quantify fragmented material. Images are an effective way to document conditions before, after, and even during blasting activities. The technology is advancing at a high rate, and lenses, storage media memory, light sensitivity and resolution have been improving steadily. Today's digital cameras and camcorders include high-resolution optics, compact size, automatic time and date stamps, good battery life, shutters to freeze motion, and computers to autofocus and eliminate jitter using image stabilization. == Mining == Photoanalysis in mining operations can provide an automated system that forewarns a company of potential problems with materials, leading to economies and reduced damage caused from over-sized materials. It can also help determine the effectiveness of blasts. A company can use this technology to monitor materials moving on a conveyor belt in an underground environment, to measure piles left over from a blast, and even measure the amount of material being carried by dump trucks or vessels to a destination. Photoanalysis is being used on SAG mills worldwide to control the size of rock being crushed. Companies are using this technology to determine the size of particles being processed in the SAG Mill.[1] Archived 2009-05-23 at the Wayback Machine Having oversize material entering the SAG mill makes an operation less efficient, costing companies money in electrical and maintenance costs. Photoanalysis technology can eliminate unwanted material before it enters the mill, keeping rock crushing costs low. == Forestry == Wood chip size can affect the overall quality of a product. With automated photoanalysis systems, companies can remove any unwanted wrong-size particles without stopping their mill process. Photoanalysis can affect how efficiently forestry companies operate. In mills worldwide, photoanalysis technology is improving the use of lumber products, cutting back on the amount of trees being used to operate, and saving companies money through quality control optimization.[2] With the current downturn in the North American forestry industry, operators are looking at making their mills more efficient and effective when processing materials. Photoanalysis technology helps identify any weaknesses in the process by continuously monitoring different sections of an operation. == Agriculture == Agricultural companies can, using photoanalysis, monitor conveyor belts of food without contaminating the product by touching it. Other benefits of photoanalysis systems include: Automated removal of any unwanted material on food conveyor Improved quality control for the most important parts of the agricultural process Pinpoint accuracy that helps the efficiency and effectiveness of product handling techniques The importance of photoanalysis technology is being noticed by the agricultural industry as it identifies any unwanted materials going through the process. In an example, if a mouse is on a conveyor of corn, photoanalysis technology would be able to identify the unwanted object and remove it before it contaminates the whole process. == Origins of photoanalysis technology == Photoanalysis technology was created by using the Waterloo Image Enhancement Process in the 1980s. After further development of the imaging process with explosives producer DuPont, engineers Tom Palangio and Takis Katsabanis began selling photoanalysis software commercially. They later renamed the process WipFrag, standing for Waterloo Image Process Fragmentation Today, photoanalysis technology has evolved into stabilized and portable systems that can automatically capture and analyze results instantly. Thousands of these products are currently being used around the world to measure fragmented material. == Photoanalysis equipment photos == == Fragmentation analysis == Fragmentation analysis is becoming a popular term in mining, agricultural and forestry industries. With the majority of money in these industries directed towards the proper sizing of materials, companies are using fragmentation analysis to determine various factors within an operation.[3] The two main ways a company keeps track of fragmented material are through manual and automated sieving procedures. Manual sieving involves extracting a sample of material to analyze the size distribution. The results can be tabulated within two days. Automated sieving is an advanced way of sieving materials running through a process. Without having to extract the material, photoanalysis can take place, allowing for immediate results with pinpoint accuracy. == Blast Fragmentation Software == Operators are using fragmentation analysis to determine the effectiveness of various blasts. With automated sieving technology, workers can track the success of these blasts and receive instant results. Companies are using these results to determine what blasting method yielded the best results for their specific operation. The common variables associated with blast optimization are the provided Particle Size Distribution (PSD) from a shovel fragmentation system, geology including rock type and fracturing, and energy factor. By using photoanalysis the fragmented materials can be monitored, offering pinpoint accuracy and allowing mine operators to make adjustments to future blasting procedures. See Optical Granulometry to view the automated sieving process. == Pre-crushing analysis == Maintenance costs can be significantly reduced if an operation focuses on the fragmentation of the particles passing through their process. Automated sieving systems can detect and help remove any oversize material before it enters the crusher and causes maintenance problems. It also helps determine the effectiveness of the mining process prior to crushing; the sizing of material is always a critical part of operations in the mining, forestry and agricultural industries. Having an analysis taking place at every major point in an operation allows for the proper tracking of material being processed. Engineers can then determine what part of the process needs improving based solely on the size of material. == Post-crushing analysis == Measuring how effective industrial crushers are, can help save a company millions of dollars in energy costs on an annual basis. There are two components that affect a typical crusher: the size of the material inputted, and the speed at which the crusher is moving. If the user can find a perfect balance between these two components, the materials will be crushed to the right size in the shortest time possible. Meeting the material standards set by governments and large companies can be hard. Having a post-crushing analysis taking place ensures that no oversize material gets shipped; eliminating the chance of getting fined for not meeting industry specifications.

    Read more →
  • Oculus Quill

    Oculus Quill

    Quill is a painting and animation software for virtual reality. It runs on Microsoft Windows with Oculus Rift headsets. It is used to create 3D paintings and animated cartoons. Quill was released on November 29, 2016, on the Oculus Store. Theater Elsewhere(formerly Quill Theater), an application for viewing creations made in Quill, was later made available following the release of the Oculus Quest. In September 2021, Facebook, now known as Meta Platforms, and the owner of Oculus, sold Quill to its original creator, who continues to develop and support the app. == Development == Quill was originally developed by Oculus Story Studio as an internal tool for the creative needs of the studio's project Dear Angelica directed by Saschka Unseld along with its art-director Wesley Allsbrook. == Controls == The software works on Oculus Rift utilizing its 6DoF motion controllers. Users can paint in 3D space using their hands naturally, and animate those paintings with keyframes. They can also capture videos and photos of their creations. == Reception == Dear Angelica, a VR story fully painted in Quill, was nominated for an Emmy Award in 2017.

    Read more →
  • Non-native speech database

    Non-native speech database

    A non-native speech database is a speech database of non-native pronunciations of English. Such databases are used in the development of: multilingual automatic speech recognition systems, text to speech systems, pronunciation trainers, and second language learning systems. == List == The actual table with information about the different databases is shown in Table 2. === Legend === In the table of non-native databases some abbreviations for language names are used. They are listed in Table 1. Table 2 gives the following information about each corpus: The name of the corpus, the institution where the corpus can be obtained, or at least further information should be available, the language which was actually spoken by the speakers, the number of speakers, the native language of the speakers, the total amount of non-native utterances the corpus contains, the duration in hours of the non-native part, the date of the first public reference to this corpus, some free text highlighting special aspects of this database and a reference to another publication. The reference in the last field is in most cases to the paper which is especially devoted to describe this corpus by the original collectors. In some cases it was not possible to identify such a paper. In these cases a paper is referenced which is using this corpus is. Some entries are left blank and others are marked with unknown. The difference here is that blank entries refer to attributes where the value is just not known. Unknown entries, however, indicate that no information about this attribute is available in the database itself. As an example, in the Jupiter weather database no information about the origin of the speakers is given. Therefore this data would be less useful for verifying accent detection or similar issues. Where possible, the name is a standard name of the corpus, for some of the smaller corpora, however, there was no established name and hence an identifier had to be created. In such cases, a combination of the institution and the collector of the database is used. In the case where the databases contain native and non-native speech, only attributes of the non-native part of the corpus are listed. Most of the corpora are collections of read speech. If the corpus instead consists either partly or completely of spontaneous utterances, this is mentioned in the Specials column.

    Read more →
  • Paint.NET

    Paint.NET

    Paint.NET (sometimes stylized as paint.net) is a freeware general-purpose raster graphics editor program for Microsoft Windows, developed with the .NET platform. Paint.NET was originally created by Rick Brewster as a Washington State University student project, and has evolved from a simple replacement for the Microsoft Paint program into a program for editing mainly graphics, with support for plugins. == History == Paint.NET originated as a computer science senior design project by Rick Brewster during spring 2004 at Washington State University. Version 1.0 consisted of 36,000 lines of code and was written in four months. In contrast, version 3.35 has approximately 162,000 lines of code. The Paint.NET project continued over the summer and into the autumn 2004 semester for both the version 1.1 and 2.0 releases. Development continued with one programmer who worked on previous versions of Paint.NET while he was a student at WSU. As of May 2006 the program had been downloaded at least 2 million times, at a rate of about 180,000 per month. Initially, Paint.NET was released under a modified version of the MIT License, with the exclusion of the installer, text, and graphics. However, citing issues with the open source code being plagiarized by others that had rebranded the software as their own and bundled user content without their permission, the availability of the source code was restricted, in December 2007 Brewster announced his intent to restrict access to components of the program (including its installer, resources, and user interface). In November 2009, the software was made proprietary, restricting the sale or creation of derivative works of the software. Starting with version 4.0.18, Paint.NET is published in two editions: A classic edition remains freeware, similar to all other versions since 3.5. Another edition, however, is published to Microsoft Store under a trialware license and is available to purchase for US$14.99. According to the developer, this was done to enable the users to contribute to the development with more convenience, even though the old avenue of donation was not closed. In May 2026, Brewster revealed that he obtained the paint.net domain after attempting to do so for 22 years. Historically, the editor was hosted on getpaint.net, and according to Brewster, the previous owners of paint.net would not sell the domain and asked for "lots and lots of money". In December of the previous year, paint.net began hosting content that impersonated Paint.NET, therefore becoming a clear case of trademark infringement and domain squatting. Brewster stated that he was able to obtain the domain afterwards with the help of a lawyer. == Overview == Paint.NET is primarily programmed in the C# programming language. Its native image format, .PDN, is a compressed representation of the application's internal object format, which preserves layering and other information. == Plugins == Paint.NET supports plugins, which add image adjustments, effects, and support for additional file types. They can be programmed using any .NET Framework programming language, though they are most commonly written in C#. These are created by volunteer coders on the program's discussion board, the Paint.NET Forum. Though most are simply published via the discussion board, some have been included with a later release of the program. For instance, a DirectDraw Surface file type plugin, (originally by Dean Ashton) and an Ink Sketch and Soften Portrait effect (originally by David Issel) were added to Paint.NET in version 3.10. Hundreds of plugins have been produced; such as Shape3D, which renders a 2D drawing into a 3D shape. Some plugins expand on the functionality that comes with Paint.NET, such as Curves+ and Sharpen+, which extend the included tools Curves and Sharpen, respectively. Examples of file type plugins include an Animated Cursor and Icon plugin and an Adobe Photoshop file format plugin. Several of these plugins are based on existing open source software, such as a raw image format plugin that uses dcraw and a PNG optimization plugin that uses OptiPNG. == Forks == === paint-mono === Paint.NET was created exclusively for Windows and has no native support for other operating systems. Due to its former open-source licensing, the development of alternative versions was possible. In May 2007, Miguel de Icaza officially started a porting project called paint-mono. This project had partially ported Paint.NET 3.0 to Mono, an open-source implementation of the Common Language Infrastructure on which the .NET Framework is based. This allowed Paint.NET to be run on Mono-supported platforms, such as Linux. This port is no longer maintained and has not been updated since March 2009. Newer Mono runtime 6 versions are able to run original Paint.NET releases up to 3.5.11 with only minor issues. === Pinta === In 2010, developer Jonathan Pobst started a project called Pinta, describing it as a clone of Paint.NET for Mono and Gtk#. Pinta reused the adjustments and effects code from Paint.NET but otherwise is original code.

    Read more →
  • AppyStore

    AppyStore

    AppyStore is a comprehensive learning videos and games app for kids up to the age of 8 years. The platform developed by Mauj Mobile, a mobile value-added services (VAS) provider curates content to help in child development by leveraging technology. Mauj is funded by Sequoia Capital, Westbridge Capital and Intel Capital. == Background == AppyStore was launched in 2014 as a platform providing content for kids between the ages of 1.5 and 6 years. AppyStore subsequently extended its services for kids up to 8 years of age. The company operates on a subscription-based model and claims to have 5,000 learning games and videos segregated in 18 learning areas developed to help children gain optimal skills and qualities. According to an article published in Business Standard, the application is claimed to be one of the top 5 apps that help to enhance the logical and imaginative capabilities of children. AppyStore was awarded the Best app for kids by Google Play in December 2017. == Service == The company provides content via a website and an Android app. The website and android app provide learning games, rhymes, phonics, reading, stories, science, numbers, maths, logic videos comprising puzzles, worksheets, videos and fun activities and the premium subscription also includes physical worksheets which are home delivered. This content is educational and has been handpicked by teachers and experts with an understanding of the major areas of child development milestones for children up to 8 years of age. The mobile application also allows parents to track the progress of their child on the basis of the number of videos viewed.

    Read more →