AI Code Meme

AI Code Meme — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Language identification

    Language identification

    In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

    Read more →
  • EPUAP

    EPUAP

    ePUAP (Electronic Platform of Public Administration Services) is a Polish nationwide platform for communication of citizens with public administrations in a uniform and standardized way. Built as part of the ePUAP-WKP project (State Informatization Plan). Service providers are public administration units and public institutions (especially entities that perform tasks commissioned by the state). The platform provides service providers with technological infrastructure to provide services to citizens (recipients). Among the participants of ePUAP there are both central administration units and local governments, including municipal offices. Among the services offered by ePUAP is also Profil Zaufany (Trusted Profile), which enables electronic filing with legal effect without the need to use a qualified signature and SAML-based single sign-on mechanism, which enables the same ePUAP account to log on to websites of various service providers. The website www.epuap.gov.pl enables defining citizen and businesses service processes, creates channels of access to different systems of public administration and extends the package of public services provided electronically. Services available through the ePUAP platform may be accessed at the official website. Currently all administration services are available in Polish only. == Overview == It is described by the Polish government as "a coherent and systematic action program designed and developed to allow public institutions make their electronic services available to the public". The platform provides citizens, businesses and institutions with a number of services intended to ensure smooth and safe communication between: customer to administrations (C2A), business to administration (B2A), administration to administration (A2A). === Main goals === The main project objectives are to create a single, secure and electronic access channel to public services for citizens, businesses and public administration and also to reduce time and lower the costs of sharing information resources and functionalities of administration domain systems. Within the project, the following functionalities and services were delivered: Public services catalogue – a method of presenting and describing administration services, ePUAP platform – a web platform designed to provide public services on the Internet, Interoperability portal – a portal for experts working on recommendations for electronic documents and forms used within Polish administration systems to assure the uniformity of IT standards, Central Repository of Electronic Document Models – a database for valid document models and electronic forms. == History and background == The ePUAP project was carried out in the years 2005–2008. Currently, a continuation project ePUAP2 is being carried out with the following objectives: to increase the number of online services available to the public including the registry services, to widen the scale of usage of public electronic services, to integrate subsequent systems of public administration and business on ePUAP portal, to define new processes of customer and business services. === ePUAP2 === ePUAP2 is a public and administrative project that extends the set of functional services developed during the first edition of the project and is another step in the process of transforming Poland into a modern and citizen-friendly country. The implementation period for the project covers the years 2009–2013. Project financing The cost of the project “Construction of electronic Platform of Public Administration Services” – 32 million PLN was covered in 75% by the funds from the European Regional Development Fund (under the Sector Operational Programme "Supporting Competitiveness of Enterprises for the years 2004–2006"), while the remaining 25% of the cost was covered by a Polish national co-financing. Funds for the ePUAP2 project were gained from the 7th priority axis of the Innovative Economy Operational Programme and amounts to 140 million PLN (85% of eligible expenses were covered by the European Regional Development Fund, 15% were covered by a national co-financing). The trustee of ePUAP is the Polish Ministry of the Interior and Administration. == Legal regulations == According to the Polish law from 1 May 2008, public authorities are required to accept documents in electronic form (bringing applications and proposals and other activities in electronic form). ePUAP enables public institutions to meet this requirement by providing a service infrastructure to set up am electronic inbox. The ePUAP inbox meets legal requirements, in particular: issuing an official confirmation of receipt in accordance with the regulation of the Prime Minister of 29 September 2005 on the organizational and technical conditions for the delivery of electronic documents to public entities; cooperation with hardware security modules (HSM), meeting the technical requirements set out in the law; handling documents electronically in accordance with the minimum requirements set out in the Regulation of the Polish Council of Ministers of 11 October 2005 on minimum requirements for ICT systems. == Incidents == === Crashes === The ePUAP system very often happens smaller or larger failures. Because it is used to sign the application profiles trusted also in other electronic systems such as public administration. Electronic Services Platform created by ZUS, the system fault ePUAP it very difficult to settle official matters most electronically. === "Infoafera" === According to TVN and the release of TVP News from 10 April 2014, the creation of ePUAP is also associated with the so-called "Infoafera." On 10 April 2014, the Minister of Internal Affairs of Poland confirmed the information that the American technology company HP confessed to its participation in the Polish info-tour and corruption of Polish officials. By March 2014, the construction of ePUAP and its maintenance cost PLN 98.4 million. PLN 67.8 million has been used for this project. Challenged expenses only on the portal itself is approx. PLN 20 million.

    Read more →
  • System Service Descriptor Table

    System Service Descriptor Table

    The System Service Descriptor Table (SSDT) is an internal dispatch table within Microsoft Windows. == Function == The SSDT maps syscalls to kernel function addresses. When a syscall is issued by a user space application, it contains the service index as parameter to indicate which syscall is called. The SSDT is then used to resolve the address of the corresponding function within ntoskrnl.exe. In modern Windows kernels, two SSDTs are used: One for generic routines (KeServiceDescriptorTable) and a second (KeServiceDescriptorTableShadow) for graphical routines. A parameter passed by the calling userspace application determines which SSDT shall be used. == Hooking == Modification of the SSDT allows to redirect syscalls to routines outside the kernel. These routines can be either used to hide the presence of software or to act as a backdoor to allow attackers permanent code execution with kernel privileges. For both reasons, hooking SSDT calls is often used as a technique in both Windows kernel mode rootkits and antivirus software. In 2010, many computer security products which relied on hooking SSDT calls were shown to be vulnerable to exploits using race conditions to attack the products' security checks.

    Read more →
  • Trustworthy computing

    Trustworthy computing

    The term trustworthy computing (TwC) has been applied to computing systems that are inherently secure, available, and reliable. It is particularly associated with the Microsoft initiative of the same name, launched in 2002. == History == Until 1995, there were restrictions on commercial traffic over the Internet. On, May 26, 1995, Bill Gates sent the "Internet Tidal Wave" memorandum to Microsoft executives assigning "...the Internet this highest level of importance..." but Microsoft's Windows 95 was released without a web browser as Microsoft had not yet developed one. The success of the web had caught them by surprise but by mid 1995, they were testing their own web server, and on August 24, 1995, launched a major online service, The Microsoft Network (MSN). The National Research Council recognized that the rise of the Internet simultaneously increased societal reliance on computer systems while increasing the vulnerability of such systems to failure and produced an important report in 1999, "Trust in Cyberspace". This report reviews the cost of un-trustworthy systems and identifies actions required for improvement. == Microsoft and Trustworthy Computing == Bill Gates launched Microsoft's "Trustworthy Computing" initiative with a January 15, 2002 memo, referencing an internal whitepaper by Microsoft CTO and Senior Vice President Craig Mundie. The move was reportedly prompted by the fact that they "...had been under fire from some of its larger customers–government agencies, financial companies and others–about the security problems in Windows, issues that were being brought front and center by a series of self-replicating worms and embarrassing attacks." such as Code Red, Nimda, Klez and Slammer. Four areas were identified as the initiative's key areas: Security, Privacy, Reliability, and Business Integrity, and despite some initial scepticism, at its 10-year anniversary it was generally accepted as having "...made a positive impact on the industry...". The Trustworthy Computing campaign was the main reason why Easter eggs disappeared from Windows, Office and other Microsoft products.

    Read more →
  • Path tracing

    Path tracing

    Path tracing is a rendering algorithm in computer graphics that simulates how light interacts with objects and participating media to generate realistic (physically plausible) images. It is based on earlier, more limited, ray tracing algorithms. Path tracing is used to create photorealistic images for artistic purposes, and for applications such as architectural rendering and product design. It is also used to render frames for animated films, and visual effects for film and television. Because it can be very accurate and unbiased, it is commonly used to generate reference images when testing the quality of other rendering algorithms. The technique uses the Monte Carlo method to compute estimates of global illumination and simulate the ways different materials reflect (or scatter), transmit, absorb, and emit light. It can incorporate simple modeling of the effects of aperture and lens (depth of field, and bokeh) and shutter speed (motion blur), or more realistic simulation of the optical components in a camera. The algorithm works by describing illumination in a scene using the rendering equation, or light transport equation, and finding an approximate solution using Monte Carlo integration. An inefficient (but accurate) version of the algorithm can be very simple, and involves tracing a ray from the camera, allowing this ray to bounce in random directions as it hits different objects in the scene, and computing the amount of light transmitted along the path to the camera whenever the path encounters a light source. This process is repeated many times for each pixel (each repetition, with generated path and transmitted light, is called a sample), and the results are averaged. One main difference between this algorithm and standard ray tracing is that a single unbranching path is traced each time, while "Whitted-style" or "Cook-style" ray tracing recursively samples branching paths (e.g. when light is both reflected and refracted by a glass object). More practical versions incorporate improvements such as quasi-Monte Carlo methods (techniques that distribute samples more evenly), importance sampling (take more samples of paths that are likely to transport more light), and next event estimation (allow a very limited form of branching, and sample additional paths that connect to the lights more directly). Because path tracing uses random samples there is noise in the final image, which decreases as more samples are taken. Images commonly require many thousands of samples per pixel (spp) to reduce noise to an acceptable level, and denoising techniques (e.g. based on neural networks) are often used. Denoising is usually necessary when path tracing is used for real-time rendering in video games, because relatively few samples can be taken. Many alternative algorithms for path tracing have been developed, although they do not always outperform more straightforward implementations. These include bidirectional path tracing (which traces paths forwards from the light source as well as backwards from the camera), Metropolis light transport, and ways of combining path tracing with photon mapping. Video games often use biased versions of path tracing to improve performance (e.g. limiting the number of bounces in each path). A family of techniques called ReSTIR has been developed that can help real-time path tracing by sharing data between nearby pixels and consecutive frames. == History == Like all ray tracing methods, path tracing is based on ray casting, which Arthur Appel used for computer graphics rendering in the late 1960s. In 1980, John Turner Whitted published a recursive ray tracing algorithm that allows rendering images of scenes containing mirrored surfaces and refractive transparent objects. In 1984, Cook et al. described a form of ray tracing called distributed ray tracing, which uses Monte Carlo integration to render effects such as depth of field, motion blur, reflection from rough surfaces, and area lights. The same year, the radiosity method (not a ray tracing method) was published, which was the first physically based method for rendering diffuse global illumination. In 1986, Jim Kajiya published a paper exploring how to use distributed ray tracing to render physically-based global illumination, and this paper also introduced and named the method called "path tracing". Path tracing and other distributed ray tracing techniques were further refined in the late 1980s and early 1990s by researchers such as James Arvo and Peter Shirley, and by Greg Ward in the open source Radiance software. Despite being theoretically able to render any lighting, the original form of path tracing can sometimes be very inefficient (or noisy) for rendering light that is reflected or refracted before illuminating a visible surface, including diffuse global illumination where light enters an area through narrow gaps, because it traces paths only from the camera. To address this, variations of path tracing that trace paths from both the camera and from light sources, called bidirectional path tracing, were published in 1993 by Eric Lafortune and Yves Willems, and in 1997 by Eric Veach and Leonidas Guibas. In 1997 Veach and Guibas also published an alternative method called Metropolis light transport, which combines bidirectional path tracing with the Metropolis method. Veach's lengthy Ph.D. dissertation described both techniques, along with the theoretical background of path tracing; later, the book Physically Based Rendering (which won an Academy Award for Technical Achievement in 2014) helped to make information about path tracing more widely available. Path tracing requires tracing a large number of paths of light in order to produce an image with a visually acceptable amount of noise. This made path tracing very slow on computers available in the 1980s and 1990s, and noise remained a problem when trying to reproduce the style of earlier computer graphics animated films. Most animated films produced until around 2010, by studios such as Pixar, used rasterization-based rendering, with ray tracing used selectively for reflections (and later for precomputed or cached global illumination). However the speed of computers rapidly increased during the 1990s. Blue Sky Studios pioneered using Monte Carlo ray tracing for global illumination in animation, including in the 1998 short film "Bunny", but they did not disclose the precise techniques used. Path tracing gradually become more practical for film production in the early 2000s. The Arnold renderer, developed by Marcos Fajardo, was used by Sony Pictures Imageworks to produce the feature-length film Monster House, released in 2006. Pixar rewrote their RenderMan software to use path tracing, and released their first feature-length path-traced film Finding Dory in 2016. Although path tracing still had a large computational cost, animation studios discovered that less human labor was required when using it, for example because global illumination no longer needed to be faked by manually placing lights. The amount of noise present in path traced images still caused difficulties, particularly when rendering motion blur (which was used extensively by earlier animated films) but denoising techniques were developed to address this. New techniques were also needed for rendering hair and fur, and to handle the extremely large scenes sometimes required by films. Renderers such as Arnold, and Disney's Hyperion, originally only used CPUs for rendering, but as GPUs became more capable (and APIs such as CUDA, OpenCL, and OptiX were released) researchers and developers began adapting algorithms and implementations to use GPUs. GPUs can dramatically reduce rendering time: for example using a high-end GPU to accelerate portions of the rendering code can make it over 30 times faster than using only a high-end CPU. == Description == Kajiya's 1986 paper defined a recursive integral equation called the rendering equation, which describes a simplified form of light transport. Using Monte Carlo integration for the integral on the right side of the equation leads fairly directly to the path tracing algorithm: I ( x , x ′ ) = g ( x , x ′ ) [ ϵ ( x , x ′ ) + ∫ S ρ ( x , x ′ , x ″ ) I ( x ′ , x ″ ) d x ″ ] {\displaystyle I(x,x')=g(x,x')\left[\epsilon (x,x')+\int _{S}\rho (x,x',x'')I(x',x'')dx''\right]} This expresses I(x,x'), the light arriving at point x from point x', as the product of a geometry term, g(x,x'), which is 0 if there is something blocking the light between the two points and 1 otherwise, and the amount of light leaving point x' and traveling towards x. The light leaving point x' is the sum of the light emitted by the surface at x', and the integral of the light arriving at x' from all other points in the scene (the integration domain S) and being reflected towards x. The factor ρ(x,x',x''), which calculates how much light is reflected, must take into account the angles at which the light is arriving and leaving, and

    Read more →
  • Confidential computing

    Confidential computing

    Confidential computing is a security and privacy-enhancing computational technique focused on protecting data in use. Confidential computing can be used in conjunction with storage and network encryption, which protect data at rest and data in transit respectively. It is designed to address software, protocol, cryptographic, and basic physical and supply-chain attacks, although some critics have demonstrated architectural and side-channel attacks effective against the technology. The technology protects data in use by performing computations in a hardware-based trusted execution environment (TEE). Confidential data is released to the TEE only once it is assessed to be trustworthy. Different types of confidential computing define the level of data isolation used, whether virtual machine, application, or function, and the technology can be deployed in on-premise data centers, edge locations, or the public cloud. It is often compared with other privacy-enhancing computational techniques such as fully homomorphic encryption, secure multi-party computation, and Trusted Computing. Confidential computing is promoted by the Confidential Computing Consortium (CCC) industry group, whose membership includes major providers of the technology. == Properties == Trusted execution environments (TEEs) "prevent unauthorized access or modification of applications and data while they are in use, thereby increasing the security level of organizations that manage sensitive and regulated data". Trusted execution environments can be instantiated on a computer's processing components such as a central processing unit (CPU) or a graphics processing unit (GPU). In their various implementations, TEEs can provide different levels of isolation including virtual machine, individual application, or compute functions. Typically, data in use in a computer's compute components and memory exists in a decrypted state and can be vulnerable to examination or tampering by unauthorized software or administrators. According to the CCC, confidential computing protects data in use through a minimum of three properties: Data confidentiality: "Unauthorized entities cannot view data while it is in use within the TEE". Data integrity: "Unauthorized entities cannot add, remove, or alter data while it is in use within the TEE". Code integrity: "Unauthorized entities cannot add, remove, or alter code executing in the TEE". In addition to trusted execution environments, remote cryptographic attestation is an essential part of confidential computing. The attestation process assesses the trustworthiness of a system and helps ensure that confidential data is released to a TEE only after it presents verifiable evidence that it is genuine and operating with an acceptable security posture. It allows the verifying party to assess the trustworthiness of a confidential computing environment through an "authentic, accurate, and timely report about the software and data state" of that environment. "Hardware-based attestation schemes rely on a trusted hardware component and associated firmware to execute attestation routines in a secure environment". Without attestation, a compromised system could deceive others into trusting it, claim it is running certain software in a TEE, and potentially compromise the confidentiality or integrity of the data being processed or the integrity of the trusted code. == Technical approaches == Technical approaches to confidential computing may vary in which software, infrastructure and administrator elements are allowed to access confidential data. The "trust boundary," which circumscribes a trusted computing base (TCB), defines which elements have the potential to access confidential data, whether they are acting benignly or maliciously. Confidential computing implementations enforce the defined trust boundary at a specific level of data isolation. The three main types of confidential computing are: Virtual machine isolation Application isolation, also known as process isolation Function isolation, also known as library isolation Virtual machine isolation removes the elements controlled by the computer infrastructure or cloud provider, but allows potential data access by elements inside a virtual machine running on the infrastructure. Application or process isolation permits data access only by authorized software applications or processes. Function or library isolation is designed to permit data access only by authorized subroutines or modules within a larger application, blocking access by any other system element, including unauthorized code in the larger application. == Threat model == As confidential computing is concerned with the protection of data in use, only certain threat models can be addressed by this technique. Other types of attacks are better addressed by other privacy-enhancing technologies. === In scope === The following threat vectors are generally considered in scope for confidential computing: Software attacks: including attacks on the host’s software and firmware. This may include the operating system, hypervisor, BIOS, other software and workloads. Protocol attacks: including "attacks on protocols associated with attestation as well as workload and data transport". This includes vulnerabilities in the "provisioning or placement of the workload" or data that could cause a compromise. Cryptographic attacks: including "vulnerabilities found in ciphers and algorithms due to a number of factors, including mathematical breakthroughs, availability of computing power and new computing approaches such as quantum computing". The CCC notes several caveats in this threat vector, including relative difficulty of upgrading cryptographic algorithms in hardware and recommendations that software and firmware be kept up-to-date. A multi-faceted, defense-in-depth strategy is recommended as a best practice. Basic physical attacks: including cold boot attacks, bus and cache snooping and plugging attack devices into an existing port, such as a PCI Express slot or USB port. Basic upstream supply-chain attacks: including attacks that would compromise TEEs through changes such as added debugging ports. The degree and mechanism of protection against these threats varies with specific confidential computing implementations. === Out of scope === Threats generally defined as out of scope for confidential computing include: Sophisticated physical attacks: including physical attacks that "require long-term and/or invasive access to hardware" such as chip scraping techniques and electron microscope probes. Upstream hardware supply-chain attacks: including attacks on the CPU manufacturing process, CPU supply chain in key injection/generation during manufacture. Attacks on components of a host system that are not directly providing the capabilities of the trusted execution environment are also generally out-of-scope. Availability attacks: confidential computing is designed to protect the confidentiality and integrity of protected data and code. It does not address availability attacks such as Denial of Service or Distributed Denial of Service attacks. == Use cases == Confidential computing can be deployed in the public cloud, on-premise data centers, or distributed "edge" locations, including network nodes, branch offices, industrial systems and others. === Data privacy and security === Confidential computing protects the confidentiality and integrity of data and code from the infrastructure provider, unauthorized or malicious software and system administrators, and other cloud tenants, which may be a concern for organizations seeking control over sensitive or regulated data. The additional security capabilities offered by confidential computing can help accelerate the transition of more sensitive workloads to the cloud or edge locations. === Multi-party analytics === Confidential computing can enable multiple parties to engage in joint analysis using confidential or regulated data inside a TEE while preserving privacy and regulatory compliance. In this case, all parties benefit from the shared analysis, but no party's sensitive data or confidential code is exposed to the other parties or system host. Examples include multiple healthcare organizations contributing data to medical research, or multiple banks collaborating to identify financial fraud or money laundering. Oxford University researchers proposed the alternative paradigm called "Confidential Remote Computing" (CRC), which supports confidential operations in Trusted Execution Environments across endpoint computers considering multiple stakeholders as mutually distrustful data, algorithm and hardware providers. === Confidential generative AI === Confidential computing technologies can be applied to various stages of a generative AI deployments to help increase data or model privacy, security, and regulatory compliance. TEEs and remote attestation can protect the integrity of data during AI model training, keep

    Read more →
  • Site Security Handbook

    Site Security Handbook

    The Site Security Handbook, RFC 2196, is a guide on setting computer security policies and procedures for sites that have systems on the Internet (however, the information provided should also be useful to sites not yet connected to the Internet). The guide lists issues and factors that a site must consider when setting their own policies. It makes a number of recommendations and provides discussions of relevant areas. This guide is only a framework for setting security policies and procedures. In order to have an effective set of policies and procedures, a site will have to make many decisions, gain agreement, and then communicate and implement these policies. The guide is a product of the IETF SSH working group, and was published in 1997, obsoleting the earlier RFC 1244 from 1991.

    Read more →
  • Halloween Problem

    Halloween Problem

    In computing, the Halloween Problem refers to a phenomenon in databases in which an update operation causes a change in the physical location of a row, potentially allowing the row to be visited again later in the same update operation. This could even cause an infinite loop in some cases where updates continually place the updated record ahead of the scan performing the update operation. The potential for this database error was first discovered by Don Chamberlin, Pat Selinger, and Morton Astrahan in the mid-1970s, on Halloween day, while working on query optimization. They wrote a SQL query supposed to give a ten percent raise to every employee who earned less than $25,000. This query would run successfully, with no errors, but when finished all the employees in the database earned at least $25,000, because it kept giving them a raise until they reached that level. The expectation was that the query would iterate over each of the employee records with a salary less than $25,000 precisely once. In fact, because even updated records were visible to the query execution engine and so continued to match the query's criteria, salary records were matching multiple times and each time being given a 10% raise until they were all greater than $25,000. Contrary to what some believe, the name is not descriptive of the nature of the problem but rather was given due to the day it was discovered on. As recounted by Don Chamberlin: Pat and Morton discovered this problem on Halloween... I remember they came into my office and said, "Chamberlin, look at this. We have to make sure that when the optimizer is making a plan for processing an update, it doesn't use an index that is based on the field that is being updated. How are we going to do that?" It happened to be on a Friday, and we said, "Listen, we are not going to be able to solve this problem this afternoon. Let's just give it a name. We'll call it the Halloween Problem and we'll work on it next week." And it turns out it has been called that ever since.

    Read more →
  • Distributed Common Ground System

    Distributed Common Ground System

    The Distributed Common Ground System (DCGS) is a system which produces military intelligence for multiple branches of the American military. == DCGS Programs == DCGS-N - DCGS for the United States Navy DCGS-A - DCGS for the United States Army AF DCGS - DCGS for the United States Air Force DCGS-MC - DCGS for the United States Marine Corps DCGS-SOF - DCGS for the United States Special Operations Forces IS&A Support Center - DCGS-A Help Desk for the United States Army - https://dcgsahelp.max.gov/ - Max.gov sunset 15 December 2023 == Description == While in U.S. Air Force use, the system produces intelligence collected by the U-2 Dragonlady, RQ-4 Global Hawk, MQ-9 Reaper and MQ-1 Predator. The previous system of similar use was the Deployable Ground Station (DGS), which was first deployed in July 1994. Subsequent version of DGS were developed from 1995 through 2009. Although officially designated a "weapons system", it consists of computer hardware and software connected together in a computer network, devoted to processing and dissemination of information such as images. The 480th Intelligence, Surveillance and Reconnaissance Wing of the Air Combat Command operates and maintains the USAF system. A plan envisioned in 1998 was to develop interoperable systems for the Army and Navy, in addition to the Air Force. By 2006, version 10.6 was deployed by the Air Force, and a version known as DCGS-A was developed for the Army. After a 2010 report by General Michael T. Flynn, the program was intended to use cloud computing and be as easy to use as an iPad, which soldiers over a few years were commonly using. By April 2011, project manager Colonel Charles Wells announced version 3 of the Army system (code named "Griffin") was being deployed in the US war in Afghanistan. In January 2012, the United States Army Communications-Electronics Research, Development and Engineering Center hosted a meeting based on the DCGS-A early experience. It brought together technology providers in the hope of developing more integrated systems using cloud computing with open architectures, compared to previously specialized custom-built systems. A major contractor was Lockheed Martin, with computers supplied by Silicon Graphics International out of its Chippewa Falls, Wisconsin office. Software known as the Analyst's Notebook, originally developed by i2 Limited, was included in DCGS-A. IBM acquired i2 in 2011. Some US Army personnel reported using a Palantir Technologies product to improve their ability to predict locations of improvised explosive devices. An April 2012 report recommending further study after initial success. Palantir software was rated easy to use, but did not have the flexibility and wide number of data sources of DCGS-A. In July 2012, Congressman Duncan D. Hunter (from California, the state where Palantir is based) complained of US DoD obstacles to its wider use. Although a limited test in August 2011 by the Test and Evaluation Command had recommended deployment, operation problems of DCGS-A included the baseline system was "not operationally effective" with reboots on average about every 8 hours. A set of improvements was identified in November 2012. The press reported some of the shortcomings uncovered by General Genaro Dellarocco in the tests. The ambitious goal of integrating 473 data sources for 75 million reports proved to be challenging, after spending an estimated $2.3 billion on the Army system alone. In May 2013 Politico reported that Palantir lobbyists and some anonymous returning veterans continued to advocate the use of its software, despite its interoperability limits. In particular, members of special forces and US Marines were not required to use the official Army system. Similar stories appeared in other publications, with Army representatives (such as Major General Mary A. Legere) citing the limitations of various systems. Congressman Hunter was a member of the House Armed Services Committee which required a review of the program, after two other members of congress sent an open letter to Secretary of Defense Leon Panetta. The Senate Defense Appropriations Subcommittee included testimony from Army Chief of Staff General Ray Odierno. The 130th Engineer Brigade (United States) has found the system to be "unstable, slow, not friendly and a major hindrance to operations". The equivalent system for the United States Navy was planned for initial deployment by 2015, and within a shipboard network called Consolidated Afloat Networks and Enterprise Services (CANES) by 2016. Some early testing was announced in 2009 aboard the aircraft carrier USS Harry Truman. A portion of the software, a distributed data framework for the DCGS integration backbone (DIB) version 4, was submitted to an open-source software repository of the Codice Foundation on GitHub. The framework was new for DIB version 4, replacing the legacy DIB portal with an Ozone Widget Framework interface. It was written in the Java programming language. == DCGS-A == Distributed Common Ground System-Army (DCGS-A) is the United States Army's primary system to post data, process information, and disseminate Intelligence, Surveillance and Reconnaissance (ISR) information about the threat, weather, and terrain to echelons. DCGS-A provides commanders the ability to task battle-space sensors and receive intelligence information from multiple sources. === Promotion === An August 17, 2011, UPI article quoted i2 Chief Executive Officer Robert Griffin who commented on DCGS-A's best-of-breed approach to development. The article detailed the Army contracting with i2 for Analyst's Notebook software. "With its open architecture, Analyst's Notebook supports the Army's strategy to employ and integrate best-of-breed solutions from across the industry to meet the dynamic needs users face in the field on a daily basis." A February 1, 2012, article in the Army web page quoted Mark Kitz, DCGS-A technical director. DCGS-A "uses the latest in cloud technology to rapidly gather, collaborate and share intelligence data from multiple sources to deliver a common operating picture. DCGS-A is able to rapidly adapt to changing operational environments by leveraging an iterative development model and open architecture allowing for collaboration with multiple government, industry and academic partners." A July 2012 article in SIGNAL Magazine, monthly publication of the Armed Forces Communications and Electronics Association, promoted DCGS-A as taking advantage of technological environments with which young soldiers are familiar. The article quoted the DCGS-A program manager, Col. Charles Wells on the systems benefits. The article also included Lockheed Martin's DCGS-A program manager. The Milwaukee Journal Sentinel published an article May 4, 2012, about Wisconsin-located companies helping DCGS-A with cloud computing technology. The article promoted the speed when cloud computing processes intelligence and cost savings by analyzing data in the field. === The U.S. Army's 2011 Posture Statement === The U.S. Army released its 2011 Army Posture Statement March 2. It included a statement on DCGS-A: “The Distributed Common Ground System-Army (DCGS-A) is the Army's premier intelligence, surveillance, and reconnaissance (ISR) enterprise for the tasking of sensors, analysis and processing of data, exploitation of data, and dissemination of intelligence (TPED) across all echelons. It is the Army component of the larger Defense Intelligence Information Enterprise (DI2E) and interoperable with other Service DCGS programs. Under the DI2E framework, USD (I) hopes to provide COCOM Joint Intelligence Operations Centers (JIOCs) capabilities interoperable with DCGS-A through a Cloud/widget approach. DCGS-A connects tactical, operational, and theater-level commanders to hundreds of intelligence and intelligence-related data sources at all classification levels and allows them to focus efforts of the entire ISR community on their information requirements. === Comparisons === Some Ground Commanders who describe DCGS-A as "unwieldy and unreliable, hard to learn and difficult to use," supporting alternative software from Palantir Technologies. Palantir software supports small unit situational awareness, but is not sufficiently funded to support the broader role that DCGS-A fulfills. == Operators == 480th Intelligence, Surveillance and Reconnaissance Wing 9th Intelligence Squadron 13th Intelligence Squadron 548th Intelligence, Surveillance and Reconnaissance Group 548 Operational Support Squadron 48th Intelligence Squadron 101st Intelligence Squadron 113th Air Support Operations Squadron 127th Command and Control Squadron 161st Intelligence Squadron

    Read more →
  • VieON

    VieON

    VieON is an mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. Understanding this, DatVietVAC Group has planned to research and develop an OTT application, even though the Vietnamese market already has some major players such as FPT Play and the international giant Netflix. Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

    Read more →
  • Overcast (app)

    Overcast (app)

    Overcast is a podcast app for iOS that was launched in 2014 by founder and operator Marco Arment. == Founder and operator == Arment was also the Chief Technology Officer of Tumblr and founder of Instapaper before founding Overcast, and he had created his own podcasts before launching the app. In March 2023, Arment told The Vergecast how he built and maintains Overcast by himself, and that he uses ad banners promoting podcasts to cover the costs of the free app. == Features and reception == In 2014, Overcast received positive reviews from MacWorld and iMore. In 2015, The Verge and The Sweet Setup each named it the best podcast app for iOS that year. In 2017, Discover Pods gave an endorsement citing the "smart speed" feature, which shortens quiet gaps in a podcast. In April 2019, Overcast introduced a feature that allowed users to share clips from podcasts to social media. In January 2020, Overcast was updated to allow users to skip the intros and outros of podcasts.

    Read more →
  • Load file

    Load file

    A load file in the litigation community is commonly referred to as the file used to import data (coded, captured or extracted data from ESI processing) into a database; or the file used to link images. These load files carry commands, commanding the software to carry out certain functions with the data found in them. Load files are usually ASCII text files that have delimited fields of information. Such load files may have data about documents to be imported into a document management software such as Concordance or Summation. Or they may have the path or directory where images may reside so that the software can link such images to their corresponding records. Some database programs take one load file for importing images and another for importing data while others take only one load file for both pieces of information. OCR or Search-able Text which is considered "data" is also imported into most database programs via the same load files. Though some people prefer to load the OCR into their databases by running a separate command to search and find the desired text. Commonly used databases and their corresponding file extensions are: Summation (DII , CSV), Concordance (OPT, DAT), Sanction (SDT), IPRO (LFP), Ringtail (MDB) and DB/TextWorks (TXT).

    Read more →
  • FMLLR

    FMLLR

    In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

    Read more →
  • Clipmap

    Clipmap

    In computer graphics, clipmapping is a method of clipping a mipmap to a subset of data pertinent to the geometry being displayed. This is useful for loading as little data as possible when memory is limited, such as on a graphics processing unit. The technique is used for LODing in NVIDIA’s implementation of voxel cone tracing. The high-resolution levels of the mipmapped scene representation are clipped to a region near the camera, while lower resolution levels are clipped further away. == MegaTexture == MegaTexture is a clipmap implementation developed by id Software. It was introduced in their id Tech 4 engine and also appeared in id Tech 5 and id Tech 6 before being removed in id Tech 7. MegaTexture is a texture allocation technique that uses a single, extremely large texture rather than repeating multiple smaller textures. It is also featured in Splash Damage's game Enemy Territory: Quake Wars, and was developed by id Software former technical director John Carmack. MegaTexture employs a single large texture space for static terrain. The texture is stored on removable media or a computer's hard drive and streamed as needed, allowing large amounts of detail and variation over a large area with comparatively little RAM usage. Depending on the pixel resolution per square meter, covering a large area could require several gigabytes of memory. However, RAM is also filled by the rest of the game and the underlying operating system, limiting the amount available for texturing. As the player moves around the game, different sections of the MegaTexture are loaded into memory. They are then scaled to the correct size and applied to the 3D models of the terrain. Id has presented a more advanced technique that builds upon the MegaTexture idea and virtualizes both the geometry and the textures to obtain unique geometry down to the equivalent of the texel: the sparse voxel octree (SVO). It works by raycasting the geometry represented by voxels (instead of triangles) stored in an octree. The goal is to stream parts of the octree into video memory, going further down along the tree for nearby objects to give them more details, and to use higher level, larger voxels for farther objects, which give an automatic level of detail (LOD) system for both geometry and textures at the same time. The geometric detail that can be obtained using this method is nearly infinite, which removes the need for faking 3-dimensional details with techniques such as normal mapping. Despite that most voxel rendering tests use very large amounts of memory (up to several GB), Jon Olick of id Software claimed the technology is able to compress such SVO to 1.15 bits per voxel of position data. == Virtual texturing == Unlike clipmaps, which clip each mip level around a viewpoint-dependent clipcenter and therefore work best for terrain, virtual texturing preprocesses texture data into equally sized tiles that can be streamed for arbitrary textured geometry. Rage, powered by the id Tech 5 engine, uses a more advanced technique called virtual texturing. Textures can measure up to 128000×128000 pixels and are also used for in-game models and sprites, etc. and not just the terrain. Wolfenstein: The New Order and the 2016 version of Doom also use these. Carmageddon: Reincarnation also uses virtual texturing, though unlike id's virtual texturing system, which is designed for unique texture-mapping everywhere, their system is designed to use storage space sparingly while still offering good blend of texture variation and resolution.

    Read more →
  • Screen space ambient occlusion

    Screen space ambient occlusion

    Screen space ambient occlusion (SSAO) is a computer graphics technique for efficiently approximating the ambient occlusion effect in real time. It was developed by Vladimir Kajalin while working at Crytek and was used for the first time in 2007 by the video game Crysis, also developed by Crytek. == Implementation == The algorithm is implemented as a pixel shader, analyzing the scene depth buffer which is stored in a texture. For every pixel on the screen, the pixel shader samples the depth values around the current pixel and tries to compute the amount of occlusion from each of the sampled points. In its simplest implementation, the occlusion factor depends only on the depth difference between sampled point and current point. Without additional smart solutions, such a brute force method would require about 200 texture reads per pixel for good visual quality. This is not acceptable for real-time rendering on current graphics hardware. In order to get high quality results with far fewer reads, sampling is performed using a randomly rotated kernel. The kernel orientation is repeated every N screen pixels in order to have only high-frequency noise in the final picture. In the end this high frequency noise is greatly removed by a NxN post-process blurring step taking into account depth discontinuities (using methods such as comparing adjacent normals and depths). Such a solution allows a reduction in the number of depth samples per pixel to about 16 or fewer while maintaining a high quality result, and allows the use of SSAO in soft real-time applications like computer games. Compared to other ambient occlusion solutions, SSAO has the following advantages: Independent from scene complexity. No data pre-processing needed, no loading time and no memory allocations in system memory. Works with dynamic scenes. Works in the same consistent way for every pixel on the screen. No CPU usage – it can be executed completely on the GPU. May be easily integrated into any modern graphics pipeline. SSAO also has the following disadvantages: Rather local and in many cases view-dependent, as it is dependent on adjacent texel depths which may be generated by any geometry whatsoever. Hard to correctly smooth/blur out the noise without interfering with depth discontinuities, such as object edges (the occlusion should not "bleed" onto objects). Because SSAO operates only on the current depth buffer, it can miss occluding geometry that is not rasterized into the z-buffer and may produce undersampling-related artifacts.

    Read more →