AI Coding Benchmark Ranking

AI Coding Benchmark Ranking — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Structured sparsity regularization

    Structured sparsity regularization

    Structured sparsity regularization is a class of methods, and an area of research in statistical learning theory, that extend and generalize sparsity regularization learning methods. Both sparsity and structured sparsity regularization methods seek to exploit the assumption that the output variable Y {\displaystyle Y} (i.e., response, or dependent variable) to be learned can be described by a reduced number of variables in the input space X {\displaystyle X} (i.e., the domain, space of features or explanatory variables). Sparsity regularization methods focus on selecting the input variables that best describe the output. Structured sparsity regularization methods generalize and extend sparsity regularization methods, by allowing for optimal selection over structures like groups or networks of input variables in X {\displaystyle X} . Common motivation for the use of structured sparsity methods are model interpretability, high-dimensional learning (where dimensionality of X {\displaystyle X} may be higher than the number of observations n {\displaystyle n} ), and reduction of computational complexity. Moreover, structured sparsity methods allow to incorporate prior assumptions on the structure of the input variables, such as overlapping groups, non-overlapping groups, and acyclic graphs. Examples of uses of structured sparsity methods include face recognition, magnetic resonance image (MRI) processing, socio-linguistic analysis in natural language processing, and analysis of genetic expression in breast cancer. == Definition and related concepts == === Sparsity regularization === Consider the linear kernel regularized empirical risk minimization problem with a loss function V ( y i , f ( x ) ) {\displaystyle V(y_{i},f(x))} and the ℓ 0 {\displaystyle \ell _{0}} "norm" as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n V ( y i , ⟨ w , x i ⟩ ) + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},\langle w,x_{i}\rangle )+\lambda \|w\|_{0},} where x , w ∈ R d {\displaystyle x,w\in \mathbb {R^{d}} } , and ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", defined as the number of nonzero entries of the vector w {\displaystyle w} . f ( x ) = ⟨ w , x i ⟩ {\displaystyle f(x)=\langle w,x_{i}\rangle } is said to be sparse if ‖ w ‖ 0 = s < d {\displaystyle \|w\|_{0}=s 0 {\displaystyle w_{j}>0} . However, as in this case groups may overlap, we take the intersection of the complements of those groups that are not set to zero. This intersection of complements selection criteria implies the modeling choice that we allow some coefficients within a particular group g {\displaystyle g} to be set to zero, while others within the same group g {\displaystyle g} may remain positive. In other words, coefficients within a group may differ depending on the several group memberships that each variable within the group may have. ==== Union of groups: latent group Lasso ==== A different approach is to consider union of groups for variable selection. This approach captures the modeling situation where variables can be selected as long as they belong at least to one group with positive coefficients. This modeling perspective implies that we want to preserve group structure. The formulation of the union of groups approach is also referred to as latent group Lasso, and requires to modify the group ℓ 2 {\displaystyle \ell _{2}} norm considered above and introduce the following regularizer R ( w ) = i n f { ∑ g ‖ w g ‖ g : w = ∑ g = 1 G w ¯ g } {\displaystyle R(w)=inf\left\{\sum _{g}\|w_{g}\|_{g}:w=\sum _{g=1}^{G}{\bar {w}}_{g}\right\}} where w ∈ R d {\displaystyle w\in {\mathbb {R^{d}} }} , w g ∈ G g {\displaystyle w_{g}\in G_{g}} is the vector of coefficients of group g, and w ¯ g ∈ R d {\displaystyle {\bar {w}}_{g}\in {\mathbb {R^{d}} }} is a vector with coefficients w g j {\displaystyle w_{g}^{j}} for all variables j {

    Read more →
  • IPUMS

    IPUMS

    IPUMS, originally the Integrated Public Use Microdata Series, is the world's largest individual-level population database. IPUMS consists of microdata samples from United States (IPUMS-USA) and international (IPUMS-International) census records, as well as data from U.S. and international surveys. The records are converted into a consistent format and made available to researchers through a web-based data dissemination and analysis system. IPUMS is housed at the Institute for Social Research and Data Innovation (ISRDI), an interdisciplinary research center at the University of Minnesota, under the direction of Professor Steven Ruggles. == Description == IPUMS includes all persons enumerated in the United States censuses from 1850 to 1950 (though, the 1890 census is missing because it was destroyed in a fire) and from the American Community Survey since 2000 and the Current Population Survey since 1962. IPUMS includes household-level data for United States Censuses from 1790 to 1840, due to the first six censuses only including the name of the head of household, with tallied household totals following. IPUMS provides consistent variable names, coding schemes, and documentation across all the samples, facilitating the analysis of long-term change. IPUMS-International includes countries from Africa, Asia, Europe, and Latin America for 1960 forward. The database currently includes more than a billion individuals enumerated in 365 censuses from 94 countries around the world. IPUMS-International converts census microdata for multiple countries into a consistent format, allowing for comparisons across countries and time periods. Special efforts are made to simplify use of the data while losing no meaningful information. Comprehensive documentation is provided in a coherent form to facilitate comparative analyses of social and economic change. Additional databases in the IPUMS family include the: North Atlantic Population Project (NAPP) IPUMS National Historical Geographic Information System (NHGIS) IPUMS Health Surveys IPUMS Global Health IPUMS Time Use The Journal of American History described the effort as "One of the great archival projects of the past two decades." Liens Socio, the French portal for the social sciences, gave IPUMS the only “best site” designation that has gone to any non-French website, writing “IPUMS est un projet absolument extraordinaire...époustouflante [mind-blowing]!” The official motto of IPUMS is "use it for good, never for evil." All public IPUMS data and documentation are available online free of charge.

    Read more →
  • WebGPU Shading Language

    WebGPU Shading Language

    WebGPU Shading Language (WGSL, internet media type: text/wgsl) is a high-level shading language and the normative shader language for the WebGPU API on the web. WGSL's syntax is influenced by Rust and is designed with strong static validation, explicit resource binding, and portability in mind for secure execution in browsers. In web contexts, WebGPU implementations accept WGSL source and perform compilation to platform-specific intermediate forms (for example, to SPIR‑V, DXIL, or MSL via the user agent), but such backends are not exposed to web content. == History and background == Graphics on the web historically used WebGL, with shaders written in GLSL ES. As applications demanded more modern GPU features and finer control over compute and graphics pipelines, the W3C's GPU for the Web Community Group and Working Group created WebGPU and its companion shading language, WGSL, to provide a secure, portable model suitable for the web platform. WGSL was developed to be human-readable, avoid undefined behavior common in legacy shading languages, and align closely with WebGPU's resource and validation model. == Design goals == WGSL's design emphasizes: Safety and determinism suitable for web security constraints (extensive static validation and well-defined semantics). Portability across diverse GPU backends via an abstract resource model shared with WebGPU. Readability and explicitness (no preprocessor, minimal implicit conversions, explicit address spaces and bindings). Alignment with modern GPU features (compute, storage buffers, textures, atomics) while retaining a familiar C/Rust-like syntax. == Language overview == === Types and values === Core scalar types include bool, i32, u32, and f32. Vectors (e.g., vec2, vec3, vec4) and matrices (up to 4×4) are available for floating-point element types. Optional f16 (half precision) may be enabled via a WebGPU feature; availability is implementation-dependent. Atomic types (atomic, atomic) support limited atomic operations in qualified address spaces. === Variables and address spaces === Variables are declared with let (immutable), var (mutable), or const (compile-time constant). Storage classes (address spaces) include function, private, workgroup, uniform, and storage with read or read_write access as applicable. WGSL defines explicit layout and alignment rules; attributes such as @align, @size, and @stride control data layout for buffer interoperability. === Functions and control flow === Functions use explicit parameter and return types. Control flow includes if, switch, for, while, and loop constructs, with break/continue. Recursion is disallowed; entry-point call graphs must be acyclic. === Entry points and attributes === Shaders define stage entry points with @vertex, @fragment, or @compute. Attributes annotate bindings and interfaces, including @group, @binding (resource binding), @location (user-defined I/O), @builtin (stage built-ins such as position or global_invocation_id), @interpolate, and @workgroup_size. === Resources === WGSL exposes buffers (uniform, storage), textures (sampled, storage, and multisampled variants), and samplers (filtering/non-filtering/comparison). The binding model is explicit via descriptor sets called groups and bindings, matching WebGPU's pipeline layout model. == Compilation and validation == Browsers compile WGSL to platform-appropriate representations and native driver formats; the specific compilation pipeline is not observable by web content. WGSL source undergoes strict parsing and static validation, and WebGPU enforces robust resource access rules to avoid out-of-bounds memory hazards, contributing to predictable behavior across implementations. == Shader stages == WGSL supports three pipeline stages: vertex, fragment, and compute. === Vertex shaders === Vertex shaders transform per-vertex inputs and produce values for rasterization, including a clip-space position written to the position builtin. ==== Example ==== === Fragment shaders === Fragment shaders run per-fragment and compute color (and optionally depth) outputs written to color attachments. ==== Example ==== If half-precision (vec4h, shorthand for vec4) is desired, the code must be prefaced with a enable f16; statement. === Compute shaders === Compute shaders run in workgroups and are used for general-purpose GPU computations. ==== Example ==== == Differences from GLSL and HLSL == Compared with legacy shading languages, WGSL: Omits a preprocessor and requires explicit types and conversions. Uses explicit address spaces and binding annotations aligned with WebGPU's model. Enforces strict validation to avoid undefined behavior common in other shading languages. Defines a portable, web-focused feature set; 16-bit types and other features are opt-in and may depend on device capabilities.

    Read more →
  • GEPIR

    GEPIR

    GEPIR (Global Electronic Party Information Registry) was a distributed database operated and owned by GS1 that contains basic information on over 1,000,000 companies in over 100 countries. The database could be searched by Global Trade Item Number (GTIN) code (including Universal Product Code (UPC) and EAN-13 codes), container Code (Serial Shipping Container Code (SSCC)), location number (Global Location Number (GLN)), and (in some countries) the company name. A SOAP webservice existed for API access. As of end December 2023, GEPIR was replaced by a service called Verified by GS1. While it operated, GEPIR had more than 1 million members in more than 100 countries. In 2013, all GS1 111 member organisations joined GEPIR. == Access == GEPIR was accessible for free in almost all countries but the number of request per day was limited (from 20 to 30). Since October 2013, GS1 France restricts access to GEPIR to companies (registration with SIREN code was required to use it). A premium access service had been created by GS1 France in January 2010 which allows companies to use GS1 web and SOAP interface without any limit. == System architecture == GEPIR was a lookup service coordinated by the GS1 GO that provided all end users with the ability to look up information about GS1 Identification Keys. Depending on the service, systems were provided by GS1 Member Organisations (MOs) or 3rd party service providers, or both. Where a GS1 MO did not choose to provide the service directly to its end users, the GS1 Global Office provided the service for that geography. Some services involved a technical component deployed by the GS1 Global Office that coordinates the systems provided by GS1 MOs and/or 3rd party service providers. The GEPIR service was provided by systems deployed by GS1 MOs, with the GS1 GO providing a central point of coordination to federate the local systems. The GS1 GO also provides the MO-level service for MOs that could not or did not wish to deploy their own system.

    Read more →
  • ELIZA

    ELIZA

    ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but gave no response that could be considered really understanding what was being said by either party. Whereas the ELIZA program itself was written (originally) in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a Lisp-like expression. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school (in which the therapist often reflects back the patient's words to the patient), and used rules, dictated in the script, to respond with non-directional questions to user inputs. As such, ELIZA was one of the first chatbots (originally "chatterbots") and one of the first programs capable of attempting the Turing test. Weizenbaum intended the program as a method to explore communication between humans and machines. He was surprised that some people, including his secretary, attributed human-like feelings to the computer program, a phenomenon that came to be called the ELIZA effect. Many academics believed that the program would be able to positively influence the lives of many people, particularly those with psychological issues, and that it could aid doctors working on such patients' treatment. While ELIZA was capable of engaging in discourse, it could not converse with true understanding. However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary. The original ELIZA source code had been missing since its creation in the 1960s, as it was not common to publish articles that included source code at that time. However, more recently the MAD-SLIP source code was discovered in the MIT archives and published on various platforms, such as the Internet Archive. The source code is of high historical interest since it demonstrates not only the specificity of programming languages and techniques at that time, but also the beginning of software layering and abstraction as a means of achieving sophisticated software programming. == Overview == Joseph Weizenbaum's ELIZA, running the DOCTOR script, created a conversational interaction somewhat similar to what might take place in the office of "a [non-directive] psychotherapist in an initial psychiatric interview" and to "demonstrate that the communication between man and machine was superficial". While ELIZA is best known for acting in the manner of a psychotherapist, the speech patterns are due to the data and instructions supplied by the DOCTOR script. ELIZA itself examined the text for keywords, applied values to said keywords, and transformed the input into an output; the script that ELIZA ran determined the keywords, set the values of keywords, and set the rules of transformation for the output. Weizenbaum chose to make the DOCTOR script in the context of psychotherapy to "sidestep the problem of giving the program a data base of real-world knowledge", allowing it to reflect back the patient's statements to carry the conversation forward. The result was a somewhat intelligent-seeming response that reportedly deceived some early users of the program. Weizenbaum named his program ELIZA after Eliza Doolittle, a working-class character in George Bernard Shaw's Pygmalion (also appearing in the musical My Fair Lady, which was based on the play and was hugely popular at the time). According to Weizenbaum, ELIZA's ability to be "incrementally improved" by various users made it similar to Eliza Doolittle, since Eliza Doolittle was taught to speak with an upper-class accent in Shaw's play. However, unlike the human character in Shaw's play, ELIZA is incapable of learning new patterns of speech or new words through interaction alone. Edits must be made directly to ELIZA's active script in order to change the manner by which the program operates. Weizenbaum first implemented ELIZA in his own SLIP list-processing language, where, depending upon the initial entries by the user, the illusion of human intelligence could appear, or be dispelled through several interchanges. Some of ELIZA's responses were so convincing that Weizenbaum and several others have anecdotes of users becoming emotionally attached to the program, occasionally forgetting that they were conversing with a computer. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." In 1966, interactive computing (via a teletype) was new. It was 11 years before the personal computer became familiar to the general public, and three decades before most people encountered attempts at natural language processing in Internet services like Ask.com or PC help systems such as Microsoft Office Clippit. Although those programs included years of research and work, ELIZA remains a milestone because it was the first time a programmer had attempted such a human-machine interaction with the goal of creating the illusion (however brief) of human–human interaction. At the ICCC 1972, ELIZA was brought together with another early artificial-intelligence program named PARRY for a computer-only conversation. While ELIZA was built to speak as a doctor, PARRY was intended to simulate a patient with schizophrenia. == Design and implementation == Weizenbaum originally wrote ELIZA in MAD-SLIP for CTSS on an IBM 7094 as a program to make natural-language conversation possible with a computer. To accomplish this, Weizenbaum identified five "fundamental technical problems" for ELIZA to overcome: the identification of key words, the discovery of a minimal context, the choice of appropriate transformations, the generation of responses in the absence of key words, and the provision of an editing capability for ELIZA scripts. Weizenbaum solved these problems and made ELIZA such that it had no built-in contextual framework or universe of discourse. However, this required ELIZA to have a script of instructions on how to respond to inputs from users. ELIZA starts its process of responding to an input by a user by first examining the text input for a "keyword". A "keyword" is a word designated as important by the acting ELIZA script, which assigns to each keyword a precedence number, or a RANK, designed by the programmer. If such words are found, they are put into a "keystack", with the keyword of the highest RANK at the top. The input sentence is then manipulated and transformed as the rule associated with the keyword of the highest RANK directs. For example, when the DOCTOR script encounters words such as "alike" or "same", it would output a message pertaining to similarity, in this case "In what way?", as these words had high precedence number. This also demonstrates how certain words, as dictated by the script, can be manipulated regardless of contextual considerations, such as switching first-person pronouns and second-person pronouns and vice versa, as these too had high precedence numbers. Such words with high precedence numbers are deemed superior to conversational patterns and are treated independently of contextual patterns. Following the first examination, the next step of the process is to apply an appropriate transformation rule, which includes two parts: the "decomposition rule" and the "reassembly rule". First, the input is reviewed for syntactical patterns in order to establish the minimal context necessary to respond. Using the keywords and other nearby words from the input, different disassembly rules are tested until an appropriate pattern is found. Using the script's rules, the sentence is then "dismantled" and arranged into sections of the component parts as the "decomposition rule for the highest-ranking keyword" dictates. The example that Weizenbaum gives is the input "You are very helpful", which is transformed to "I are very helpful". This is then broken into (1) empty (2) "I" (3) "are" (4) "very helpful". The decomposition rule has broken the phrase into four small segments that contain both the keywords and the information in the sentence. The decomposition rule then designates a particular reassembly rule, or set of reassembly rules, to follow when reconstructing the sentence. The reassembly rule takes the fragments of the input that the decomposition rule had created, rearranges them, and adds in programmed words to create a response. Using Weizenbaum's example previously stated, such a reassembly rule would take the fragments and apply them to the phrase "What makes

    Read more →
  • Telebirr

    Telebirr

    Telebirr (Amharic: ቴሌብር) is a mobile payment service developed and was launched by Ethio telecom, the state owned telecommunication and Internet service provider in Ethiopia. It took five months to develop the end-to-end service. It facilitates the delivery of cashless transactions. The platform deployed currently has the capacity of processing up to 100 transactions per second (TPS) and can be scaled up to 1000 TPS. The service is accessible via SMS, USSD, and smartphone applications. Telebirr works in five languages. == Services == Though the service is fully accessible for any customer of Ethio telecom, the users need to register through the mobile application called Telebirr or using an authorized agent or Ethio telecom shop or Unstructured Supplementary Service Data (USSD), 127# nationally. However, Telebirr also provides a “quick registration” by using any information that already exists in Ethio telecom's system.

    Read more →
  • Linked timestamping

    Linked timestamping

    Linked timestamping is a type of trusted timestamping where issued time-stamps are related to each other. Each time-stamp would contain data that authenticates the time-stamp before it, the authentication would be authenticating the entire message, including the previous time-stamps authentication, making a chain. This makes it impossible to add a time-stamp in to the middle of the chain, as any time-stamps afterwards would be different. == Description == Linked timestamping creates time-stamp tokens which are dependent on each other, entangled in some authenticated data structure. Later modification of the issued time-stamps would invalidate this structure. The temporal order of issued time-stamps is also protected by this data structure, making backdating of the issued time-stamps impossible, even by the issuing server itself. The top of the authenticated data structure is generally published in some hard-to-modify and widely witnessed media, like printed newspaper or public blockchain. There are no (long-term) private keys in use, avoiding PKI-related risks. Suitable candidates for the authenticated data structure include: Linear hash chain Merkle tree (binary hash tree) Skip list The simplest linear hash chain-based time-stamping scheme is illustrated in the following diagram: The linking-based time-stamping authority (TSA) usually performs the following distinct functions: Aggregation For increased scalability the TSA might group time-stamping requests together which arrive within a short time-frame. These requests are aggregated together without retaining their temporal order and then assigned the same time value. Aggregation creates a cryptographic connection between all involved requests; the authenticating aggregate value will be used as input for the linking operation. Linking Linking creates a verifiable and ordered cryptographic link between the current and already issued time-stamp tokens. Publishing The TSA periodically publishes some links, so that all previously issued time-stamp tokens depend on the published link and that it is practically impossible to forge the published values. By publishing widely witnessed links, the TSA creates unforgeable verification points for validating all previously issued time-stamps. == Security == Linked timestamping is inherently more secure than the usual, public-key signature based time-stamping. All consequential time-stamps "seal" previously issued ones - hash chain (or other authenticated dictionary in use) could be built only in one way; modifying issued time-stamps is nearly as hard as finding a preimage for the used cryptographic hash function. Continuity of operation is observable by users; periodic publications in widely witnessed media provide extra transparency. Tampering with absolute time values could be detected by users, whose time-stamps are relatively comparable by system design. Absence of secret keys increases system trustworthiness. There are no keys to leak and hash algorithms are considered more future-proof than modular arithmetic based algorithms, e.g. RSA. Linked timestamping scales well - hashing is much faster than public key cryptography. There is no need for specific cryptographic hardware with its limitations. The common technology for guaranteeing long-term attestation value of the issued time-stamps (and digitally signed data) is periodic over-time-stamping of the time-stamp token. Because of missing key-related risks and of the plausible safety margin of the reasonably chosen hash function this over-time-stamping period of hash-linked token could be an order of magnitude longer than of public-key signed token. == Research == === Foundations === Stuart Haber and W. Scott Stornetta proposed in 1990 to link issued time-stamps together into linear hash-chain, using a collision-resistant hash function. The main rationale was to diminish TSA trust requirements. Tree-like schemes and operating in rounds were proposed by Benaloh and de Mare in 1991 and by Bayer, Haber and Stornetta in 1992. Benaloh and de Mare constructed a one-way accumulator in 1994 and proposed its use in time-stamping. When used for aggregation, one-way accumulator requires only one constant-time computation for round membership verification. Surety started the first commercial linked timestamping service in January 1995. Linking scheme is described and its security is analyzed in the following article by Haber and Sornetta. Buldas et al. continued with further optimization and formal analysis of binary tree and threaded tree based schemes. Skip-list based time-stamping system was implemented in 2005; related algorithms are quite efficient. === Provable security === Security proof for hash-function based time-stamping schemes was presented by Buldas, Saarepera in 2004. There is an explicit upper bound N {\displaystyle N} for the number of time stamps issued during the aggregation period; it is suggested that it is probably impossible to prove the security without this explicit bound - the so-called black-box reductions will fail in this task. Considering that all known practically relevant and efficient security proofs are black-box, this negative result is quite strong. Next, in 2005 it was shown that bounded time-stamping schemes with a trusted audit party (who periodically reviews the list of all time-stamps issued during an aggregation period) can be made universally composable - they remain secure in arbitrary environments (compositions with other protocols and other instances of the time-stamping protocol itself). Buldas, Laur showed in 2007 that bounded time-stamping schemes are secure in a very strong sense - they satisfy the so-called "knowledge-binding" condition. The security guarantee offered by Buldas, Saarepera in 2004 is improved by diminishing the security loss coefficient from N {\displaystyle N} to N {\displaystyle {\sqrt {N}}} . The hash functions used in the secure time-stamping schemes do not necessarily have to be collision-resistant or even one-way; secure time-stamping schemes are probably possible even in the presence of a universal collision-finding algorithm (i.e. universal and attacking program that is able to find collisions for any hash function). This suggests that it is possible to find even stronger proofs based on some other properties of the hash functions. At the illustration above hash tree based time-stamping system works in rounds ( t {\displaystyle t} , t + 1 {\displaystyle t+1} , t + 2 {\displaystyle t+2} , ...), with one aggregation tree per round. Capacity of the system ( N {\displaystyle N} ) is determined by the tree size ( N = 2 l {\displaystyle N=2^{l}} , where l {\displaystyle l} denotes binary tree depth). Current security proofs work on the assumption that there is a hard limit of the aggregation tree size, possibly enforced by the subtree length restriction. == Standards == ISO 18014 part 3 covers 'Mechanisms producing linked tokens'. American National Standard for Financial Services, "Trusted Timestamp Management and Security" (ANSI ASC X9.95 Standard) from June 2005 covers linking-based and hybrid time-stamping schemes. There is no IETF RFC or standard draft about linking based time-stamping. RFC 4998 (Evidence Record Syntax) encompasses hash tree and time-stamp as an integrity guarantee for long-term archiving.

    Read more →
  • Medical data breach

    Medical data breach

    Medical data, including patients' identity information, health status, disease diagnosis and treatment, and biogenetic information, not only involve patients' privacy but also have a special sensitivity and important value, which may bring physical and mental distress and property loss to patients and even negatively affect social stability and national security once leaked. However, the development and application of medical AI must rely on a large amount of medical data for algorithm training, and the larger and more diverse the amount of data, the more accurate the results of its analysis and prediction will be. However, the application of big data technologies such as data collection, analysis and processing, cloud storage, and information sharing has increased the risk of data leakage. In the United States, the rate of such breaches has increased over time, with 176 million records breached by the end of 2017. By 2024, the U.S. Department of Health and Human Services reported 725 large healthcare data breaches affecting approximately 275 million individual records in a single year, marking a significant escalation in both the frequency and scale of incidents. == Black market for health data == In February 2015 an NPR report claimed that organized crime networks had ways of selling health data in the black market. In 2015 a Beazley employee estimated that medical records could sell on the black market for US$40-50. == How data is lost == Theft, data loss, hacking, and unauthorized account access are ways in which medical data breaches happen. Among reported breaches of medical information in the United States networked information systems accounted for the largest number of records breached. There are many data breaches happening in the US health care system, among business associates of the health care providers that continuously gain access to patients' data. == List of data breaches == In February 2024, a ransomware attack on Change Healthcare, a subsidiary of UnitedHealth Group, compromised the protected health information of approximately 100 million individuals, making it the largest healthcare data breach in United States history. The attack disrupted claims processing for healthcare providers nationwide for several weeks. In May 2024, MediSecure suffered a cyberattack involving ransomware in Australia. In May 2021, the Health Service Executive in the Republic of Ireland was the victim of a cyberattack involving ransomware, in the Health Service Executive cyberattack, with admission records and test results present in a sample of the data reviewed by the Financial Times. In October 2018, the Centers for Medicare and Medicaid Services in the US reported that around 75,000 individual records had been affected by a data breach that took place through the ACA Agent and Broker Portal. In 2018, Social Indicators Research published the scientific evidence of 173,398,820 (over 173 million) individuals affected in USA from October 2008 (when the data were collected) to September 2017 (when the statistical analysis took place). In 2015, Anthem Inc. lost data for 37 million people in the Anthem medical data breach In 2014 4.5 million people using Complete Health Systems had their data stolen In 2013-14 1 million people using Montana Department of Public Health and Human Services had their data stolen In 2013 4 million people using Advocate Health and Hospitals Corporation had their data stolen In 2011 4.9 million users of Tricare services had their data stolen due to an employee error by Science Applications International Corporation In 2011 1.9 million people using Health Net had their data stolen In 2011 1 million people using Nemours Foundation had their data stolen In 2010 6800 people using New York-Presbyterian Hospital and Columbia University Medical Center had their data breached. In response, those organizations agreed to pay the United States Department of Health and Human Services a US$4.8 million dollar fine. In 2009 1 million people using BlueCross BlueShield of Tennessee had their data stolen == Regulation == In the United States, the Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Act require companies to report data breaches to affected individuals and the federal government. Under the HIPAA Breach Notification Rule, covered entities must notify affected individuals without unreasonable delay and no later than 60 days after discovering a breach of unsecured protected health information. Breaches affecting 500 or more individuals must also be reported to the HHS Secretary and to prominent media outlets serving the affected state or jurisdiction within the same timeframe; HHS publicly lists these larger breaches on its breach portal, commonly known as the "wall of shame." Breaches affecting fewer than 500 individuals are reported to HHS annually, no later than 60 days after the end of the calendar year in which they were discovered. Health Information Privacy Health Insurance Portability and Accountability Act of 1996 (HIPAA). - 45 CFR Parts 160 and 164, Standards for Privacy of Individually Identifiable Health Information and Security Standards for the Protection of Electronic Protected Health Information. HIPAA includes provisions designed to save health care businesses money by encouraging electronic transactions, as well as regulations to protect the security and confidentiality of patient information. The Privacy Rule became effective April 14, 2001, and most covered entities (health plans, health care clearinghouses, and health care providers that conduct certain financial and administrative transactions electronically) had until April 2003 to comply. This security provision became effective April 21, 2003. The Health Insurance Portability and Accountability Act (HIPAA) is the baseline set of federal regulations governing medical information. It does three things: i. i. i.Establish a structure for how personal health information is disclosed and establish the rights of individuals with respect to health information; ii.Specify security standards for the retention and transmission of electronic patient information; iii.Need a common format and data structure for the electronic exchange of health information. California-Specific Laws California’s medical privacy laws, primarily the Confidentiality of Medical Information Act (CMIA), the data breach sections of the Civil Code, and sections of the Health and Safety Code, provide HIPAA-like protections, although the terminology is different. HIPAA establishes a federal "minimum standard" that applies where there are gaps in California law, and HIPAA also specifies that stricter state laws will override or supersede HIPAA. California's health care privacy laws apply to providers who provide personal health records (PHR), while HIPAA only applies when the provider providing the PHR is a business associate of a covered entity. Federal law does not grant individuals the right to file a lawsuit in the event of a data breach (only the Attorney General can file a lawsuit), but California law does. This means that California law sets a higher standard for medical privacy, and that individuals in California enjoy stronger legal protections and more ways to hold entities that violate their medical privacy accountable. In the UK, the legal framework for how patient data is cared for and processed is the Data Protection Act 2018 (DPA), which incorporates the EU General Data Protection Regulation (GDPR) into law, and the common law duty of confidentiality (CLDC). The data protection legislation requires that the collection and processing of personal data be fair, lawful and transparent. This means that the collection and processing of data as defined by data protection legislation must always have a valid lawful basis and must also meet the requirements of the CLDC. In the China, Article 18 of the "National Health Care Big Data Standards, Security and Services Management Measures (for Trial Implementation)" (National Health Planning and Development (2018) No. 23) promulgated by the National Health Care Commission in 2018 states, "The responsible unit shall adopt measures such as data classification, important data backup, and encryption authentication to guarantee the security of health care big data." However, the scope and definition of important data are not covered. Although the "Information Security Technology-Healthcare Data Security Guide" (the "Guide") issued by the National Standardization Committee also proposes that important data should be evaluated and approved in accordance with the regulations, there is likewise no definition of the connotation and definition of important data.

    Read more →
  • Computational intelligence

    Computational intelligence

    In computer science, computational intelligence (CI) refers to concepts, paradigms, algorithms and implementations of systems that are designed to show "intelligent" behavior in complex and changing environments. These systems are aimed at mastering complex tasks in a wide variety of technical or commercial areas and offer solutions that recognize and interpret patterns, control processes, support decision-making or autonomously manoeuvre vehicles or robots in unknown environments, among other things. These concepts and paradigms are characterized by the ability to learn or adapt to new situations, to generalize, to abstract, to discover and associate. Nature-analog or nature-inspired methods play a key role in this. CI approaches primarily address those complex real-world problems for which traditional or mathematical modeling is not appropriate for various reasons: the processes cannot be described exactly with complete knowledge, the processes are too complex for mathematical reasoning, they contain some uncertainties during the process, such as unforeseen changes in the environment or in the process itself, or the processes are simply stochastic in nature. Thus, CI techniques are properly aimed at processes that are ill-defined, complex, nonlinear, time-varying and/or stochastic. A recent definition of the IEEE Computational Intelligence Societey describes CI as the theory, design, application and development of biologically and linguistically motivated computational paradigms. Traditionally the three main pillars of CI have been Neural Networks, Fuzzy Systems and Evolutionary Computation. ... CI is an evolving field and at present in addition to the three main constituents, it encompasses computing paradigms like ambient intelligence, artificial life, cultural learning, artificial endocrine networks, social reasoning, and artificial hormone networks. ... Over the last few years there has been an explosion of research on Deep Learning, in particular deep convolutional neural networks. Nowadays, deep learning has become the core method for artificial intelligence. In fact, some of the most successful AI systems are based on CI. However, as CI is an emerging and developing field there is no final definition of CI, especially in terms of the list of concepts and paradigms that belong to it. The general requirements for the development of an “intelligent system” are ultimately always the same, namely the simulation of intelligent thinking and action in a specific area of application. To do this, the knowledge about this area must be represented in a model so that it can be processed. The quality of the resulting system depends largely on how well the model was chosen in the development process. Sometimes data-driven methods are suitable for finding a good model and sometimes logic-based knowledge representations deliver better results. Hybrid models are usually used in real applications. According to actual textbooks, the following methods and paradigms, which largely complement each other, can be regarded as parts of CI: Fuzzy systems Neural networks and, in particular, convolutional neural networks Evolutionary computation and, in particular, multi-objective evolutionary optimization Swarm intelligence Bayesian networks Artificial immune systems Learning theory Probabilistic methods == Relationship between hard and soft computing and artificial and computational intelligence == Artificial intelligence (AI) is used in the media, but also by some of the scientists involved, as a kind of umbrella term for the various techniques associated with it or with CI. Craenen and Eiben state that attempts to define or at least describe CI can usually be assigned to one or more of the following groups: "Relative definition” comparing CI to AI Conceptual treatment of key notions and their roles in CI Listing of the (established) areas that belong to it The relationship between CI and AI has been a frequently discussed topic during the development of CI. While the above list implies that they are synonyms, the vast majority of AI/CI researchers working on the subject consider them to be distinct fields, where either CI is an alternative to AI AI includes CI CI includes AI The view of the first of the above three points goes back to Zadeh, the founder of the fuzzy set theory, who differentiated machine intelligence into hard and soft computing techniques, which are used in artificial intelligence on the one hand and computational intelligence on the other. In hard computing (HC) and traditional AI (e.g. expert systems), inaccuracy and uncertainty are undesirable characteristics of a system, while soft computing (SC) and thus CI focus on dealing with these characteristics. The adjacent figure illustrates this view and lists the most important CI techniques. Another frequently mentioned distinguishing feature is the representation of information in symbolic form in AI and in sub-symbolic form in CI techniques. Hard computing is a conventional computing method based on the principles of certainty and accuracy and it is deterministic. It requires a precisely stated analytical model of the task to be processed and a prewritten program, i.e. a fixed set of instructions. The models used are based on Boolean logic (also called crisp logic), where e.g. an element can be either a member of a set or not and there is nothing in between. When applied to real-world tasks, systems based on HC result in specific control actions defined by a mathematical model or algorithm. If an unforeseen situation occurs that is not included in the model or algorithm used, the action will most likely fail. Soft computing, on the other hand, is based on the fact that the human mind is capable of storing information and processing it in a goal-oriented way, even if it is imprecise and lacks certainty. SC is based on the model of the human brain with probabilistic thinking, fuzzy logic and multi-valued logic. Soft computing can process a wealth of data and perform a large number of computations, which may not be exact, in parallel. For hard problems for which no satisfying exact solutions based on HC are available, SC methods can be applied successfully. SC methods are usually stochastic in nature i.e., they are a randomly defined processes that can be analyzed statistically but not with precision. Up to now, the results of some CI methods, such as deep learning, cannot be verified and it is also not clear what they are based on. This problem represents an important scientific issue for the future. AI and CI are catchy terms, but they are also so similar that they can be confused. The meaning of both terms has developed and changed over a long period of time, with AI being used first. Bezdek describes this impressively and concludes that such buzzwords are frequently used and hyped by the scientific community, science management and (science) journalism. Not least because AI and biological intelligence are emotionally charged terms and it is still difficult to find a generally accepted definition for the basic term intelligence. == History == In 1950, Alan Turing, one of the founding fathers of computer science, developed a test for computer intelligence known as the Turing test. In this test, a person can ask questions via a keyboard and a monitor without knowing whether his counterpart is a human or a computer. A computer is considered intelligent if the interrogator cannot distinguish the computer from a human. This illustrates the discussion about intelligent computers at the beginning of the computer age. The term Computational Intelligence was first used as the title of the journal of the same name in 1985 and later by the IEEE Neural Networks Council (NNC), which was founded 1989 by a group of researchers interested in the development of biological and artificial neural networks. On November 21, 2001, the NNC became the IEEE Neural Networks Society, to become the IEEE Computational Intelligence Society two years later by including new areas of interest such as fuzzy systems and evolutionary computation. The NNC helped organize the first IEEE World Congress on Computational Intelligence in Orlando, Florida in 1994. On this conference the first clear definition of Computational Intelligence was introduced by Bezdek: A system is computationally intelligent when it: deals with only numerical (low-level) data, has pattern-recognition components, does not use knowledge in the AI sense; and additionally when it (begins to) exhibit (1) computational adaptivity; (2) computational fault tolerance; (3) speed approaching human-like turnaround and (4) error rates that approximate human performance. Today, with machine learning and deep learning in particular utilizing a breadth of supervised, unsupervised, and reinforcement learning approaches, the CI landscape has been greatly enhanced, with novell intelligent approaches. == The main algorithmic approaches of CI and their applicati

    Read more →
  • Overcast (app)

    Overcast (app)

    Overcast is a podcast app for iOS that was launched in 2014 by founder and operator Marco Arment. == Founder and operator == Arment was also the Chief Technology Officer of Tumblr and founder of Instapaper before founding Overcast, and he had created his own podcasts before launching the app. In March 2023, Arment told The Vergecast how he built and maintains Overcast by himself, and that he uses ad banners promoting podcasts to cover the costs of the free app. == Features and reception == In 2014, Overcast received positive reviews from MacWorld and iMore. In 2015, The Verge and The Sweet Setup each named it the best podcast app for iOS that year. In 2017, Discover Pods gave an endorsement citing the "smart speed" feature, which shortens quiet gaps in a podcast. In April 2019, Overcast introduced a feature that allowed users to share clips from podcasts to social media. In January 2020, Overcast was updated to allow users to skip the intros and outros of podcasts.

    Read more →
  • Snapshot isolation

    Snapshot isolation

    In databases, and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation has been adopted by several major database management systems, such as InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most of the concurrency anomalies that serializability avoids (but not all). In practice snapshot isolation is implemented within multiversion concurrency control (MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations of several last relevant versions (of each object). Snapshot isolation has been used to criticize the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI). In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle. == Definition == A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort. In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies. As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2. If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing. If the database is under snapshot isolation(MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither update conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200. Some systems built using multiversion concurrency control (MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. This is convenient because MVCC maintains a series of recent history consistent states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, MVCC systems (such as MarkLogic) will use locks to serialize writes together with MVCC to obtain some of the performance gains and still support the stronger "serializability" level of isolation. == Workarounds == Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property. Materialize the conflict Add a special conflict table, which both transactions update in order to create a direct write–write conflict. Promotion Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE). In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach violates the normal form. Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible. In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance. == Terminology == Snapshot isolation is called "serializable" mode in Oracle and PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic. == History == Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. InterBase, later owned by Borland, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985. Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions. In 2008, Cahill et al. showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions. This implementation of serializability is well-suited to multiversion concurrency control databases, and has been adopted in PostgreSQL 9.1, where it is known as Serializable Snapshot Isolation (SSI). When used consistently, this eliminates the need for the above workarounds. The downside over snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.

    Read more →
  • Mobile Passport Control

    Mobile Passport Control

    Mobile Passport Control (MPC) is a mobile app that enables eligible travelers entering the United States to submit their passport information and customs declaration form to Customs and Border Protection via smartphone or tablet and go through the inspections process using an expedited lane. It is available to "U.S. citizens, U.S. lawful permanent residents, Canadian B1/B2 citizen visitors and returning Visa Waiver Program travelers with approved ESTA". The app is available on iOS and Android devices and is operational at 34 US airports, 14 international airports offering preclearance facilities, and 4 seaports. The use of Mobile Passport Control operations have increased threefold from 2016 to 2017. == History == Mobile Passport Control operations were launched in Atlanta at the Hartsfield-Jackson International Airport in 2016 and is now available at 34 U.S. airports, 14 international airports that offer preclearance and 4 U.S. cruise ports. The Mobile Passport app is authorized by CBP and sponsored by the Airports Council International-North America, Boeing, and the Port of Everglades. Airside Mobile, Inc. secured a Series A funding of $6 million in the fall of 2017. == How it works == During the customs process at the Federal Inspection Service (FIS) area of a U.S. airport, travelers arriving from international locations typically wait in long lines before presenting passports and paperwork and verbally answering questions made by CBP officials. Eligible travelers who have downloaded the Mobile Passport app can expedite this process by submitting information regarding their passport and trip details, and a newly-taken selfie, via their mobile device to CBP officials, then access an expedited line. Mobile Passport Control users will be required to show their physical passport(s) and briefly talk to a CBP officer. == Locations == === US airports === Atlanta (ATL) Baltimore (BWI) Boston (BOS) Charlotte (CLT) Chicago (ORD) Dallas/Ft Worth (DFW) Denver (DEN) Detroit (DTW) as of 7/2024 Ft. Lauderdale (FLL) Honolulu (HNL) Houston (HOU and IAH) Kansas City (MCI) Las Vegas (LAS) Los Angeles (LAX) Miami (MIA) Minneapolis (MSP) New York (JFK) Newark (EWR) Oakland (OAK) Orlando (MCO) Palm Beach (PBI) Philadelphia (PHL) Phoenix (PHX) Pittsburgh (PIT) Portland (PDX) Sacramento (SMF) San Diego (SAN) San Francisco (SFO) San Jose (SJC) San Juan (SJU) Seattle (SEA) Tampa (TPA) Washington Dulles (IAD) === International Preclearance locations === Abu Dhabi (AUH) Aruba (AUA) Bermuda (BDA) Calgary (YYC) Dublin (DUB) Edmonton (YEG) Halifax (YHZ) Montreal (YUL) Nassau (NAS) Ottawa (YOW) Shannon (SNN) Toronto (YYZ) Vancouver (YVR) Winnipeg (YWG) Sepinggan (BPN) === Seaports === Fort Lauderdale (PEV) Miami (MSE) San Juan (PUE) West Palm Beach (WPB)

    Read more →
  • YaDICs

    YaDICs

    YaDICs is a program written to perform digital image correlation on 2D and 3D tomographic images. The program was designed to be both modular, by its plugin strategy and efficient, by it multithreading strategy. It incorporates different transformations (Global, Elastic, Local), optimizing strategy (Gauss-Newton, Steepest descent), Global and/or local shape functions (Rigid-body motions, homogeneous dilatations, flexural and Brazilian test models)... == Theoretical background == === Context === In solid mechanics, digital image correlation is a tool that allows to identify the displacement field to register a reference image (called herein fixed image) to images during an experiment (mobile image). For example, it is possible to observe the face of a specimen with a painted speckle on it in order to determine its displacement fields during a tensile test. Before the appearance of such methods, researchers usually used strain gauges to measure the mechanical state of the material but strain gauges only measure the strain on a point and don't allow to understand material with an heterogeneous behavior. One can obtain a full in plane strain tensor by derivation of the displacement fields. Many methods are based upon the optical flow. In fluid mechanics a similar method is used, called Particle Image Velocimetry (PIV); the algorithms are similar to those of DIC but it is impossible to ensure that the optical flow is conserved so a vast majority of the software used the normalized cross correlation metric. In mechanics the displacement or velocity fields are the only concern, registering images is just a side effect. There is another process called image registration using the same algorithms (on monomodal images) but where the goal is to register images and thereby identifying the displacement field is just a side effect. YaDICs uses the general principle of image registration with a particular attention to the displacement fields basis. === Image registration principle === YaDICs can be explained using the classical image registration framework: === Image registration general scheme === The common idea of image registration and digital image correlation is to find the transformation between a fixed image and a moving one for a given metric using an optimization scheme. While there are many methods to achieve such a goal, Yadics focuses on registering images with the same modality. The idea behind the creation of this software is to be able to process data that comes from a μ-tomograph; i.e.: data cube over 10003 voxels. With such a size it is not possible to use naive approach usually used in a two-dimensional context. In order to get sufficient performances OpenMP parallelism is used and data are not globally stored in memory. As an extensive description of the different algorithms is given in. === Sampling === Contrary to image registration, Digital Image Correlation targets the transformation, one wants to extracted the most accurate transformation from the two images and not just match the images. Yadics uses the whole image as a sampling grid: it is thus a total sampling. === Interpolator === It is possible to choose between bilinear interpolation and bicubic interpolation for the grey level evaluation at non integer coordinates. The bi-cubic interpolation is the recommended one. === Metrics === ==== Sum of squared differences (SSD) ==== The SSD is also known as mean squared error. The equation below defines the SSD metric: S S D ( μ , I F , I M ) = 1 | Ω F | ∑ x i ∈ Ω F ( I F ( x i ) − I M ( T μ ( x i ) ) ) 2 , {\displaystyle SSD(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})={\dfrac {1}{\left|\Omega _{F}\right|}}\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{F}}}(x_{i})-{\mathcal {I_{M}}}({T}_{\mu }(x_{i}))\right)^{2},} where I F {\displaystyle {\mathcal {I_{F}}}} is the fixed image, I M {\displaystyle {\mathcal {I_{M}}}} the moving one, Ω F {\displaystyle \Omega _{F}} the integration area | Ω F | {\displaystyle \left|\Omega _{F}\right|} the number of pi(vo)xels (cardinal) and T μ {\displaystyle {T}_{\mu }} the transformation parametrized by μ The transformation can be written as: T μ ( x ) = x + { Φ ( x ) } t { μ } . {\displaystyle T_{\mu }(x)=x+\left\{\Phi (x)\right\}^{t}\left\{\mu \right\}.} This metric is the main one used in the YaDICs as it works well with same modality images. One has to find the minimum of this metric ==== Normalized cross-correlation ==== The normalized cross-correlation (NCC) is used when one cannot assure the optical flow conservation; it happens in case of change of lighting or if particles disappear from the scene can occur in particle images velocimetry (PIV). The NCC is defined by: N C C ( μ , I F , I M ) = ∑ x i ∈ Ω F ( I F ( x i ) − I F ¯ ) ( I M ( T μ ( x i ) ) − I M ¯ ) ∑ x i ∈ Ω F ( I F ( x i ) − I F ¯ ) 2 ∑ x i ∈ Ω F ( I M ( T μ ( x i ) ) − I M ¯ ) 2 , {\displaystyle NCC(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})={\dfrac {\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{F}}}(x_{i})-{\overline {\mathcal {I_{F}}}}\right)\left({\mathcal {I_{M}}}({T}_{\mu }(x_{i}))-{\overline {\mathcal {I_{M}}}}\right)}{\sqrt {\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{F}}}(x_{i})-{\overline {\mathcal {I_{F}}}}\right)^{2}\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{M}}}({T}_{\mu }(x_{i}))-{\overline {\mathcal {I_{M}}}}\right)^{2}}}},} where I F ¯ {\displaystyle {\overline {\mathcal {I_{F}}}}} and I M ¯ {\displaystyle {\overline {\mathcal {I_{M}}}}} are the mean values of the fixed and mobile images. This metric is only used to find local translation in Yadics. This metric with translation transform can be solved using cross-correlation methods, which are non iterative and can be accelerated using Fast Fourier Transform . === Classification of transformations === There are three categories of parametrization: elastic, global and local transformation. The elastic transformations respect the partition of unity, there are no holes created or surfaces counted several times. This is commonly used in Image Registration by the use of B-Spline functions and in solid mechanics with finite element basis. The global transformations are defined on the whole picture using rigid body or affine transformation (which is equivalent to homogeneous strain transformation). More complex transformations can be defined such as mechanically based one. These transformations have been used for stress intensity factor identification by and for rod strain by. The local transformation can be considered as the same global transformation defined on several Zone Of Interest (ZOI) of the fixed image. ==== Global ==== Several global transforms have been implemented: Rigid and homogeneous (Tx,Ty,Rz in 2D; Tx,Ty,Tz,Rx,Ry,Rz,Exx,Eyy,Ezz,Eyz,Exz,Exy in 3D) Brazilian (Only in 2D), Dynamic Flexion, ==== Elastic ==== First-order quadrangular finite elements Q4P1 are used in Yadics. ===== Local ===== Every global transform can be used on a local mesh. === Optimization === The YaDICs optimization process follows a gradient descent scheme. The first step is to compute the gradient of the metric regarding the transform parameters ∂ S S D ( μ , I F , I M ) ∂ μ = 2 | Ω F | ∑ x i ∈ Ω F ( I F ( x i ) − I M ( T μ ( x i ) ) ) ∂ I M ( T μ ( x i ) ∂ μ = 2 | Ω F | ∑ x i ∈ Ω F ( I F ( x i ) − I M ( T μ ( x i ) ) ) ( ∂ T μ ( x i ) ∂ μ ) t ∂ I M ( T μ ( x i ) ) ∂ x {\displaystyle {\begin{array}{lcl}{\dfrac {\partial SSD(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})}{\partial \mu }}&=&{\dfrac {2}{\left|\Omega _{F}\right|}}\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{F}}}(x_{i})-{\mathcal {I_{M}}}({T}_{\mu }(x_{i}))\right){\dfrac {\partial {\mathcal {I_{M}}}({T}_{\mu }(x_{i})}{\partial \mu }}\\&=&{\dfrac {2}{\left|\Omega _{F}\right|}}\sum _{x_{i}\in \Omega _{F}}\left({\mathcal {I_{F}}}(x_{i})-{\mathcal {I_{M}}}({T}_{\mu }(x_{i}))\right)\left({\dfrac {\partial {T}_{\mu }(x_{i})}{\partial \mu }}\right)^{t}{\dfrac {\partial {\mathcal {I_{M}}}({T}_{\mu }(x_{i}))}{\partial x}}\\\end{array}}} ==== Gradient method ==== Once the metric gradient has been computed, one has to find an optimization strategy The gradient method principle is explained below: μ k + 1 = μ k + α k d k {\displaystyle \mu _{k+1}=\mu _{k}+\alpha _{k}d_{k}} The gradient step can be constant or updated at every iteration. d k = − γ k ∂ C ( μ , I F , I M ) ∂ μ {\displaystyle d_{k}=-\gamma _{k}{\dfrac {\partial {\mathcal {C}}(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})}{\partial \mu }}} , γ k {\displaystyle \gamma _{k}} allows one to choose between the following methods : γ k {\displaystyle \gamma _{k}} ⟹ {\displaystyle \Longrightarrow } steepest descent, γ k = [ ∂ C ( μ , I F , I M ) ∂ μ ∂ C ( μ , I F , I M ) ∂ μ t ] − 1 {\displaystyle \gamma _{k}=\left[{\dfrac {\partial {\mathcal {C}}(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})}{\partial \mu }}{\dfrac {\partial {\mathcal {C}}(\mu ,{\mathcal {I_{F}}},{\mathcal {I_{M}}})}{\partial \mu }}^{t}\right]^{-1}} ⟹ {\displaystyle \Longrightarrow } Gauss-Newto

    Read more →
  • Network eavesdropping

    Network eavesdropping

    Network eavesdropping, also known as eavesdropping attack, sniffing attack, or snooping attack, is a method that retrieves user information through the internet. This attack happens on electronic devices like computers and smartphones. This network attack typically happens under the usage of unsecured networks, such as public wifi connections or shared electronic devices. Eavesdropping attacks through the network is considered one of the most urgent threats in industries that rely on collecting and storing data. Internet users use eavesdropping via the Internet to improve information security. A typical network eavesdropper may be called a Black-hat hacker and is considered a low-level hacker as it is simple to network eavesdrop successfully. The threat of network eavesdroppers is a growing concern. Research and discussions are brought up in the public's eye, for instance, types of eavesdropping, open-source tools, and commercial tools to prevent eavesdropping. Models against network eavesdropping attempts are built and developed as privacy is increasingly valued. Sections on cases of successful network eavesdropping attempts and its laws and policies in the National Security Agency are mentioned. Some laws include the Electronic Communications Privacy Act and the Foreign Intelligence Surveillance Act. == Types of attacks == Types of network eavesdropping include intervening in the process of decryption of messages on communication systems, attempting to access documents stored in a network system, and listening on electronic devices. Types include electronic performance monitoring and control systems, keystroke logging, man-in-the-middle attacks, observing exit nodes on a network, and Skype & Type. === Electronic performance monitoring and control systems (EPMCSs) === Electronic performance monitoring and control systems are used by employees or companies and organizations to collect, store, analyze, and report actions or performances of employers when they are working. The beginning of this system is used to increase the efficiency of workers, but instances of unintentional eavesdropping can occur, for example, when employees' casual phone calls or conversations would be recorded. === Keystroke logging === Keystroke logging is a program that can oversee the writing process of the user. It can be used to analyze the user's typing activities, as keystroke logging provides detailed information on activities like typing speed, pausing, deletion of texts, and more behaviors. By monitoring the activities and sounds of the keyboard strikes, the message typed by the user can be translated. Although keystroke logging systems do not explain reasons for pauses or deletion of texts, it allows attackers to analyze text information. Keystroke logging can also be used with eye-tracking devices which monitor the movements of the user's eyes to determine patterns of the user's typing actions which can be used to explain the reasons for pauses or deletion of texts. === Man-in-the-middle attack (MitM) === A Man-in-the-middle attack is an active eavesdropping method that intrudes on the network system. It can retrieve and alter the information sent between two parties without anyone noticing. The attacker hijacks the communication systems and gains control over the transport of data, but cannot insert voice messages that sound or act like the actual users. Attackers also create independent communications through the system with the users acting as if the conversation between users is private. The "man-in-the-middle" can also be referred to as lurkers in a social context. A lurker is a person who rarely or never posts anything online, but the person stays online and observes other users' actions. Lurking can be valuable as it lets people gain knowledge from other users. However, like eavesdropping, lurking into other users' private information violates privacy and social norms. === Observing exit nodes === Distributed networks including communication networks are usually designed so that nodes can enter and exit the network freely. However, this poses a danger in which attacks can easily access the system and may cause serious consequences, for example, leakage of the user's phone number or credit card number. In many anonymous network pathways, the last node before exiting the network may contain actual information sent by users. Tor exit nodes are an example. Tor is an anonymous communication system that allows users to hide their IP addresses. It also has layers of encryption that protect information sent between users from eavesdropping attempts trying to observe the network traffic. However, Tor exit nodes are used to eavesdrop at the end of the network traffic. The last node in the network path flowing through the traffic, for instance, Tor exit nodes, can acquire original information or messages that were transmitted between different users. === Skype & Type (S&T) === Skype & Type (S&T) is a new keyboard acoustic eavesdropping attack that takes advantage of Voice-over IP (VoIP). S&T is practical and can be used in many applications in the real world, as it does not require attackers to be close to the victim and it can work with only some leaked keystrokes instead of every keystroke. With some knowledge of the victim's typing patterns, attackers can gain a 91.7% accuracy typed by the victim. Different recording devices including laptop microphones, smartphones, and headset microphones can be used for attackers to eavesdrop on the victim's style and speed of typing. It is especially dangerous when attackers know what language the victim is typing in. == Tools to prevent eavesdropping attacks == Computer programs where the source code of the system is shared with the public for free or for commercial use can be used to prevent network eavesdropping. They are often modified to cater to different network systems, and the tools are specific in what task it performs. In this case, Advanced Encryption Standard-256, Bro, Chaosreader, CommView, Firewalls, Security Agencies, Snort, Tcptrace, and Wireshark are tools that address network security and network eavesdropping. === Advanced encryption standard-256 (AES-256) === It is a cipher block chaining (CBC) mode for ciphered messages and hash-based message codes. The AES-256 contains 256 keys for identifying the actual user, and it represents the standard used for securing many layers on the internet. AES-256 is used by Zoom Phone apps that help encrypt chat messages sent by Zoom users. If this feature is used in the app, users will only see encrypted chats when they use the app, and notifications of an encrypted chat will be sent with no content involved. === Bro === Bro is a system that detects network attackers and abnormal traffic on the internet. It emerged at the University of California, Berkeley that detects invading network systems. The system does not apply to the detection of eavesdropping by default, but can be modified to an offline analyzing tool for eavesdropping attacks. Bro runs under Digital Unix, FreeBSD, IRIX, SunOS, and Solaris operating systems, with the implementation of approximately 22,000 lines of C++ and 1,900 lines of Bro. It is still in the process of development for real-world applications. === Chaosreader === Chaosreader is a simplified version of many open-source eavesdropping tools. It creates HTML pages on the content of when a network intrusion is detected. No actions are taken when an attack occurs and only information such as time, network location on which system or wall the user is trying to attack will be recorded. === CommView === CommView is specific to Windows systems which limits real-world applications because of its specific system usage. It captures network traffic and eavesdropping attempts by using packet analyzing and decoding. === Firewalls === Firewall technology filters network traffic and blocks malicious users from attacking the network system. It prevents users from intruding into private networks. Having a firewall in the entrance to a network system requires user authentications before allowing actions performed by users. There are different types of firewall technologies that can be applied to different types of networks. === Security agencies === A Secure Node Identification Agent is a mobile agent used to distinguish secure neighbor nodes and informs the Node Monitoring System (NMOA). The NMOA stays within nodes and monitors the energy exerted, and receives information about nodes including node ID, location, signal strength, hop counts, and more. It detects nodes nearby that are moving out of range by comparing signal strengths. The NMOA signals the Secure Node Identification Agent (SNIA) and updates each other on neighboring node information. The Node BlackBoard is a knowledge base that reads and updates the agents, acting as the brain of the security system. The Node Key Management agent is created when an encryption key is inserted to th

    Read more →
  • Render layers

    Render layers

    When creating computer-generated imagery, final scenes appearing in movies and television productions are usually produced by rendering more than one "layer" or "pass," which are multiple images designed to be put together through digital compositing to form a completed frame. Rendering in passes is based on a traditions in motion control photography which predate CGI. As an example, for a visual effects shot, a camera could be programmed to move past a physical model of a spaceship in one pass to film the fully lit beauty pass of the ship, and then to repeat exactly the same camera move passing the ship again to photograph additional elements such as the illuminated windows in the ship or its thrusters. Once all of the passes were filmed, they could then be optically printed together to form a completed shot. The terms render layers and render passes are sometimes used interchangeably. However, rendering in layers refers specifically to separating different objects into separate images, such as a layer each for foreground characters, sets, distant landscape, and sky. On the other hand, rendering in passes refers to separating out different aspects of the scene, such as shadows, highlights, or reflections, into separate images.

    Read more →