RevoScaleR

RevoScaleR

RevoScaleR is a machine learning package in R created by Microsoft. It is available as part of Machine Learning Server, Microsoft R Client, and Machine Learning Services in Microsoft SQL Server 2016. The package contains functions for creating linear model, logistic regression, random forest, decision tree and boosted decision tree, and K-means, in addition to some summary functions for inspecting and visualizing data. It has a Python package counterpart called revoscalepy. Another closely related package is MicrosoftML, which contains machine learning algorithms that RevoScaleR does not have, such as neural network and SVM. In June 2021, Microsoft announced to open source the RevoScaleR and revoscalepy packages, making them freely available under the MIT License. == Concepts == Many R packages are designed to analyze data that can fit in the memory of the machine and usually do not make use of parallel processing. RevoScaleR was designed to address these limitations. The functions in RevoScaleR orientate around three main abstraction concepts that users can specify to process large amount of data that might not fit in memory and exploit parallel resources to speed up the analysis. === Compute Contexts === A compute context refers to the location where the computation on the data happens. It could be "local" (on the client machine) or "remote" (on a data platform such as a SQL server, or Spark). Pushing the computation to a remote server allows people to take advantage of the greater compute resources that a remote machine may have. If the data being analyzed reside on the same machine, using a remote compute context also removes the need to pull data across the network onto the client machine. === Data source === Data source defines where the data comes from. There are various data sources available in RevoScaleR, such as text data, Xdf data, in-SQL data, and a spark dataframe. People can wrap their data in a data source object and use that as run analytics in different compute context. Different data sources are available in different compute context. For example, if the compute context is set to SQL server, then the only data source one can use would be an in-SQL data source. === Analytics === Analytic functions in RevoScaleR takes in data source object, a compute context, and the other parameters needed to build the specific model, such as formula for the logistic regression or the number of trees in a decision tree. In addition to those parameters, one can also specify the level of parallelism, such as the size of the data chunk for each process or number of processes to build the model. However, parallelism is only available in non-express edition. == Limitations == The package is mostly meant to be used with a SQL server or other remote machines. To fully leverage the abstractions it uses to process a large dataset, one needs a remote server and non-Express free edition of the package. It cannot be easily installed such as by running "install.packages("RevoScaleR")" like most open source R packages. It's available only through Microsoft R Client, a distribution of R for data science, or Microsoft Machine Learning Server (stand-alone with no SQL server attached), or Microsoft Machine Learning Services (a SQL server services). However, one can still use the analytics functions in an Express, free version of the package.

Noom

Noom is an American privately held digital health company that provides weight management and behavioral health services through a subscription-based mobile application. Founded in 2008, the company combines behavior change psychology with access to weight loss medications and dietary supplements. The platform incorporates elements of cognitive behavioral therapy (CBT) and goal-setting strategies, and its programs are designed to support users in developing healthier habits. In addition to its weight management services, Noom has expanded to offer products related to stress management and general wellness. Noom has received both praise and criticism. Supporters cite its focus on mental and behavioral aspects of health, while critics have raised concerns about the accuracy of its calorie goals, the use of algorithmically determined weight loss targets, and questions about the qualifications of some of its coaching staff. == History == Noom was founded in 2008 by friends Artem Petakov and Saeju Jeong. The company's mobile app officially launched in 2016. In 2025, Noom relocated its headquarters from New York City to Princeton, New Jersey. Petakov, a former software engineer at Google, currently leads Noom Ventures, while Jeong serves as Noom's Chairman. In 2023, Geoff Cook was appointed CEO of Noom. In 2019, Noom partnered with Novo Nordisk to offer patients prescribed the diabetes medication Saxenda one year of free access to the Noom platform. In 2020, Noom reported $400 million in revenue. As of April 2021, the company stated it employed approximately 3,000 people, including 2,700 coaches. == Services == === Noom App === The Noom app is the primary platform through which users engage with the company's services. Upon creating an account, users are prompted to provide physical information such as weight, height, and age, along with experiential data including lifestyle habits, personal goals, and perceived obstacles. Users log their meals and physical activity, and in return, the app delivers feedback through multiple channels: algorithmically generated insights, guidance from a human coach, peer interaction, educational articles, and interactive quizzes. The app has been reviewed by a range of media outlets, including newspapers such as the Chicago Tribune and USA Today; health information sources such as WebMD; and lifestyle magazines including Good Housekeeping. === Other services === In 2024, Noom launched Noom Vibe, a mobile application that encourages users to develop healthy habits by awarding "vibes"—a form of points—for activities such as walking or meeting step goals. That same year, Noom introduced a 3D body scanning feature within its app, designed to help users monitor physical changes and prevent muscle atrophy during weight loss. Also in 2024, Noom began offering a compounded GLP-1 medication as part of its weight management program. The formulation includes the same active ingredient found in the anti-obesity medications Wegovy and Ozempic. == Research == In 2016, a study published in Scientific Reports analyzed data from approximately 36,000 users of the Noom app, of whom 78% were female and 22% male. The data were collected between October 2012 and April 2014. To be included in the analysis, users had to log their weight at least twice per month over a period of six consecutive months. The study found that 78% of participants self-reported weight loss while using the app. The median duration of weight reporting was 267 days (approximately nine months). The frequency of data logging was positively correlated with weight loss. Additionally, male users had a higher average starting BMI and reported greater average weight loss compared to female users. In 2017, the Centers for Disease Control and Prevention (CDC) recognized Noom as a certified diabetes prevention program, making it the first mobile health application to receive such designation. == Criticisms == === Health programs === Noom has been criticized for promoting elements of diet culture in its advertising campaigns. The app has also faced criticism for setting calorie goals that some users and experts have deemed inappropriately low, and for employing coaches who may lack formal qualifications as registered dietitians. Coaching has been described as relying heavily on canned responses. Upon sign-up, users are prompted to complete a questionnaire consisting of over 50 questions, which is used to generate a personalized program. In 2021, the UK-based organization Privacy International alleged that Noom, along with other diet platforms, used such lengthy surveys to attract users but did not always tailor the resulting programs to the collected data. The organization claimed that many users received the same or highly similar programs regardless of their answers. It also raised concerns about the handling of potentially sensitive health data, alleging a lack of transparency regarding the sharing of such data with third parties, including Facebook, potentially in violation of the European General Data Protection Regulation (GDPR). In a follow-up investigation in 2023, Privacy International reported that Noom had made "significant positive changes" to its data handling practices. However, the organization noted that data was still being shared with Facebook and concluded that "there is still room for improvement." === Billing issues lawsuit === In August 2020, the Better Business Bureau (BBB) issued a warning to consumers regarding Noom's subscription practices. The BBB reported that numerous customers had filed complaints about difficulties canceling their subscriptions after the free trial period, as well as challenges in contacting the company to request refunds. In February 2022, Noom agreed to a $62 million settlement in a class-action lawsuit that alleged the company had used deceptive billing practices related to automatic subscription renewals. Qualifying claimants received approximately $167 each. During the case, a former senior software engineer at Noom testified that the cancellation process was intentionally designed to be difficult, with the goal of generating revenue from customers who failed to cancel in time. In response, Noom stated that it had taken steps to improve transparency around its pricing and policies, including the implementation of self-service cancellation tools.

AI content watermarking

AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence. Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use). Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications). == Background == Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise. The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns. In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content. == Formal definitions and design goals == Most modern AI watermarking schemes can be formalized as a pair of algorithms ( W m , D e t e c t ) {\displaystyle ({\mathsf {Wm}},{\mathsf {Detect}})} parameterized by a secret key k {\displaystyle k} . The embedding algorithm W m {\displaystyle {\mathsf {Wm}}} takes a generative model M {\displaystyle M} (and optionally a prompt) and returns a watermarked output x {\displaystyle x} ; the detection algorithm D e t e c t ( x , k ) {\displaystyle {\mathsf {Detect}}(x,k)} outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether x {\displaystyle x} was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria: Criteria for evaluation include imperceptibility or quality preservation, measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ. Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level. Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation. Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not. Forgery resistance or unforgeability means an adversary without the secret key should be unable to produce content that passes detection. == Techniques == AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models. Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection. A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time. === Text === Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged. The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias δ {\displaystyle \delta } to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model. ==== KGW: token-probability shifting ==== The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step t {\displaystyle t} , a pseudorandom function (PRF) keyed by a secret k {\displaystyle k} is applied to a context window of h {\displaystyle h} previous tokens to deterministically partition the vocabulary V {\displaystyle V} of size N {\displaystyle N} into a "green list" G ⊂ V {\displaystyle G\subset V} of size γ N {\displaystyle \gamma N} and its complement, the "red list" R = V ∖ G {\displaystyle R=V\setminus G} , where γ ∈ ( 0 , 1 ) {\displaystyle \gamma \in (0,1)} (typically γ = 1 / 2 {\displaystyle \gamma =1/2} ) is the green fraction. A logits processor then increments every green-list logit by a fixed bias δ > 0 {\displaystyle \delta >0} before softmax: ℓ v ′ = ℓ v + δ ⋅ 1 [ v ∈ G ] {\displaystyle \ell '_{v}=\ell _{v}+\delta \cdot \mathbf {1} [v\in G]} so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition g ( ⋅ ) {\displaystyle g(\cdot )} for each token, counts the number of green hits | G | hits {\displaystyle |G|_{\text{hits}}} in a sequence of length T {\displaystyle T} , and computes a one-proportion z-test statistic: z = | G | hits − γ T T γ ( 1 − γ ) {\displaystyle z={\frac {|G|_{\text{hits}}-\gamma T}{\sqrt {T\gamma (1-\gamma )}}}} Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean γ T {\displaystyle \gamma T} ; a large positive z {\displaystyle z} rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window

Online service provider

An online service provider (OSP) can, for example, be an Internet service provider, an email provider, a news provider (press), an entertainment provider (music, movies), a search engine, an e-commerce site, an online banking site, a health site, an official government site, social media, a wiki, or a Usenet newsgroup. In its original more limited definition, it referred only to a commercial computer communication service in which paid members could dial via a computer modem the service's private computer network and access various services and information resources such as bulletin board systems, downloadable files and programs, news articles, chat rooms, and electronic mail services. The term "online service" was also used in references to these dial-up services. The traditional dial-up online service differed from the modern Internet service provider in that they provided a large degree of content that was only accessible by those who subscribed to the online service, while ISP mostly serves to provide access to the Internet and generally provides little if any exclusive content of its own. In the U.S., the Online Copyright Infringement Liability Limitation Act (OCILLA) portion of the U.S. Digital Millennium Copyright Act has expanded the legal definition of online service in two different ways for different portions of the law. It states in section 512(k)(1): (A) As used in subsection (a), the term "service provider" means an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user's choosing, without modification to the content of the material as sent or received. (B) As used in this section, other than subsection (a), the term "service provider" means a provider of online services or network access, or the operator of facilities therefore, and includes an entity described in subparagraph (A). These broad definitions make it possible for numerous web businesses to benefit from the OCILLA. == History == The first commercial online services went live in 1969. CompuServe (owned in the 1980s and 1990s by H&R Block) and The Source (for a time owned by The Reader's Digest) are considered the first major online services created to serve the market of personal computer users. Utilizing text-based interfaces and menus, these services allowed anyone with a modem and communications software to use email, chat, news, financial and stock information, bulletin boards, special interest groups (SIGs), forums and general information. Subscribers could exchange email only with other subscribers of the same service. (For a time a service called DASnet carried mail among several online services, and CompuServe, MCI Mail, and other services experimented with X.400 protocols to exchange email until the Internet rendered these outmoded.) Other text-based online services followed such as Delphi, GEnie and MCI Mail. The 1980s also saw the rise of independent Computer Bulletin Boards, or BBSes. (Online services are not BBSes. An online service may contain an electronic bulletin board, but the term "BBS" is reserved for independent dialup, microcomputer-based services that are usually single-user systems.) The commercial services used pre-existing packet-switched (X.25) data communications networks, or the services' own networks (as with CompuServe). In either case, users dialed into local access points and were connected to remote computer centers where information and services were located. As with telephone service, subscribers paid by the minute, with separate day-time and evening/weekend rates. As the use of computers that supported color and graphics, such the Atari 8-bit computers, Commodore 64, TI-99/4A, Apple II, and early IBM PC compatibles, increased, online services gradually developed framed or partially graphical information displays. Early services such as CompuServe added increasingly sophisticated graphics-based front end software to present their information, though they continued to offer text-based access for those who needed or preferred it. In 1985 Viewtron, which began as a Videotex service requiring a dedicated terminal, introduced software allowing home computer owners access. Beginning in the mid-1980s graphics based online services such as PlayNET, Prodigy, and Quantum Link (aka Q-Link) were developed. Quantum Link, which was based on Commodore-only Playnet software, later developed AppleLink Personal Edition, PC-Link (based on Tandy's DeskMate), and Promenade (for IBM), all of which (including Q-Link) were later combined as America Online. These online services presaged the web browser that would change global online life 10 years later. Before Quantum Link, Apple computer had developed its own service, called AppleLink, which was mostly a support network targeted at Apple dealers and developers. Later, Apple offered the short-lived eWorld, targeted at Mac consumers and based on the Mac version of the America Online software. Beginning in 1992, the Internet, which had previously been limited to government, academic, and corporate research settings, was opened to commercial entities. The first online service to offer Internet access was DELPHI, which had developed TCP/IP access much earlier, in connection with an environmental group that rated Internet access. The explosion of popularity of the World Wide Web in 1994 accelerated the development of the Internet as an information and communication resource for consumers and businesses. The sudden availability of low- to no-cost email and appearance of free independent web sites broke the business model that had supported the rise of the early online service industry. CompuServe, BIX, AOL, DELPHI, and Prodigy gradually added access to Internet e-mail, Usenet newsgroups, ftp, and to web sites. At the same time, they moved from usage-based billing to monthly subscriptions. Similarly, companies that paid to have AOL host their information or early online stores began to develop their own web sites, putting further stress on the economics of the online industry. Only the largest services like AOL (which later acquired CompuServe, just as CompuServe acquired The Source) were able to make the transition to the Internet-centric world. A new class of online service provider arose to provide access to the Internet, the internet service provider or ISP. Internet-only service providers like UUNET, The Pipeline, Panix, Netcom, the World, EarthLink, and MindSpring provided no content of their own, concentrating their efforts on making it easy for nontechnical users to install the various software required to "get online" before consumer operating systems came internet-enabled out of the box. In contrast to the online services' multitiered per-minute or per-hour rates, many ISPs offered flat-fee, unlimited access plans. Independent companies sprang up to offer access and packages to compete with the big networks (eg, the-wire.com, 1994 in Toronto and bway.net 1995 in New York). These providers first offered access through telephone and modem, just as did the early online services providers. By the early 2000s, these independent ISPs had largely been supplanted by high speed and broadband access through cable and phone companies, as well as wireless access. The importance of the online services industry was vital in "paving the road" for the information superhighway. When Mosaic and Netscape were released in 1994, they had a ready audience of more than 10 million people who were able to download their first web browser through an online service. Though ISPs quickly began offering software packages with setup to their customers, this brief period gave many users their first online experience. Two online services in particular, Prodigy and AOL, are often confused with the Internet, or the origins of the Internet. Prodigy's Chief Technical Officer said in 1999: "Eleven years ago, the Internet was just an intangible dream that Prodigy brought to life. Now it is a force to be reckoned with." Despite that statement, neither service provided the back bone for the Internet, nor did either start the Internet. == Online service interfaces == The first online service used a simple text-based interface in which content was largely text only and users made choices via a command prompt. This allowed just about any computer with a modem and terminal communications program the ability to access these text-based online services. CompuServe would later offer, with the advent of the Apple Macintosh and Microsoft Windows-based PCs, a GUI interface program for their service. This provided a very rudimentary GUI interface. CompuServe continued to offer text-only access for those needing it. Online services like Prodigy and AOL developed their online service around a GUI and thus unlike CompuServe's early GUI-based software, these online services provided a more robust GUI interface. Early GUI-base

Collision detection

Collision detection is the computational problem of detecting an intersection of two or more objects in virtual space. More precisely, it deals with the questions of if, when, and where two or more objects intersect. Collision detection is a classic problem of computational geometry with applications in computer graphics, physical simulation, video games, robotics (including autonomous driving), and computational physics. Collision detection algorithms can be divided into operating on 2D or 3D spatial objects. == Overview == Collision detection is closely linked to calculating the distance between objects, as objects collide when the distance between them is less than or equal to zero. Negative distances indicate that one object has penetrated another. Performing collision detection requires more context than just the distance between the objects. Accurately identifying the points of contact on both objects' surfaces is also essential for computing a physically accurate collision response. The complexity of this task increases with the level of detail in the objects' representations: the more intricate the model, the greater the computational cost. Collision detection frequently involves dynamic objects, adding a temporal dimension to distance calculations. Instead of simply measuring distance between static objects, collision detection algorithms often aim to determine whether the objects' motion will bring them to a point in time when their distance is zero—an operation that adds significant computational overhead. In collision detection involving multiple objects, a naive approach would require detecting collisions for all pairwise combinations of objects. As the number of objects increases, the number of required comparisons grows rapidly: for n {\displaystyle n} objects, n ( n − 1 ) / 2 {n(n-1)}/{2} intersection tests are needed with a naive approach. This quadratic growth makes such an approach computationally expensive as n {\displaystyle n} increases. Due to the complexity mentioned above, collision detection is a computationally intensive process. Nevertheless, it is essential for interactive applications like video games, robotics, and real-time physics engines. To manage these computational demands, extensive efforts have gone into optimizing collision detection algorithms. A commonly used approach towards accelerating the required computations is to divide the process into two phases: the broad phase and the narrow phase. The broad phase aims to answer the question of whether objects might collide, using a conservative but efficient approach to rule out pairs that clearly do not intersect, thus avoiding unnecessary calculations. Objects that cannot be definitively separated in the broad phase are passed to the narrow phase. Here, more precise algorithms determine whether these objects actually intersect. If they do, the narrow phase often calculates the exact time and location of the intersection. == Broad phase == This phase aims at quickly finding objects or parts of objects for which it can be quickly determined that no further collision test is needed. A useful property of such approach is that it is output sensitive. In the context of collision detection this means that the time complexity of the collision detection is proportional to the number of objects that are close to each other. An early example of that is the I-COLLIDE where the number of required narrow phase collision tests was O ( n + m ) {\displaystyle O(n+m)} where n {\displaystyle n} is the number of objects and m {\displaystyle m} is the number of objects at close proximity. This is a significant improvement over the quadratic complexity of the naive approach. === Spatial partitioning === Several approaches can be grouped under the spatial partitioning umbrella, which includes octrees (for 3D), quadtrees (for 2D), binary space partitioning (or BSP trees) and other, similar approaches. If one splits space into a number of simple cells, and if two objects can be shown not to be in the same cell, then they need not be checked for intersection. Dynamic scenes and deformable objects require updating the partitioning which can add overhead. === Bounding volume hierarchy === Bounding Volume Hierarchy (BVH) is a tree structure over a set of bounding volumes. Collision is determined by doing a tree traversal starting from the root. If the bounding volume of the root doesn't intersect with the object of interest, the traversal can be stopped. If, however there is an intersection, the traversal proceeds and checks the branches for each there is an intersection. Branches for which there is no intersection with the bounding volume can be culled from further intersection test. Therefore, multiple objects can be determined to not intersect at once. BVH can be used with deformable objects such as cloth or soft-bodies but the volume hierarchy has to be adjusted as the shape deforms. For deformable objects we need to be concerned about self-collisions or self intersections. BVH can be used for that end as well. Collision between two objects is computed by computing intersection between the bounding volumes of the root of the tree as there are collision we dive into the sub-trees that intersect. Exact collisions between the actual objects, or its parts (often triangles of a triangle mesh) need to be computed only between intersecting leaves. The same approach works for pair wise collision and self-collisions. === Exploiting temporal coherence === During the broad-phase, when the objects in the world move or deform, the data-structures used to cull collisions have to be updated. In cases where the changes between two frames or time-steps are small and the objects can be approximated well with axis-aligned bounding boxes, the sweep and prune algorithm can be a suitable approach. Several key observation make the implementation efficient: Two bounding-boxes intersect if, and only if, there is overlap along all three axes; overlap can be determined, for each axis separately, by sorting the intervals for all the boxes; and lastly, between two frames updates are typically small (making sorting algorithms optimized for almost-sorted lists suitable for this application). The algorithm keeps track of currently intersecting boxes, and as objects move, re-sorting the intervals helps keep track of the status. === Pairwise pruning === Once a pair of physical bodies has been selected for further investigation, collisions need to be checked more carefully. However, in many applications, individual objects (if they are not too deformable) are described by a set of smaller primitives, mainly triangles. So there are two sets of triangles, S = S 1 , S 2 , … , S n {\displaystyle S={S_{1},S_{2},\dots ,S_{n}}} and T = T 1 , T 2 , … , T n {\displaystyle T={T_{1},T_{2},\dots ,T_{n}}} (for simplicity, each set has the same number of triangles.) The obvious thing to do is to check all triangles S j {\displaystyle S_{j}} against all triangles T k {\displaystyle T_{k}} for collisions, but this involves n 2 {\displaystyle n^{2}} comparisons, which is highly inefficient. If possible, it is desirable to use a pruning algorithm to reduce the number of pairs of triangles that need to be checked. The most widely used family of algorithms is known as the hierarchical bounding volumes method. As a preprocessing step, for each object (e.g., S {\displaystyle S} and T {\displaystyle T} ) calculates a hierarchy of bounding volumes. Then, at each time step, when collisions need to be checked between S {\displaystyle S} and T {\displaystyle T} , the hierarchical bounding volumes are used to reduce the number of pairs of triangles under consideration. For simplicity, provide an example using bounding spheres, although it has been noted that spheres are undesirable in many cases. If E {\displaystyle E} is a set of triangles, a bounding sphere is pre-calculated. B ( E ) {\displaystyle B(E)} . There are many ways of choosing B ( E ) {\displaystyle B(E)} , B ( E ) {\displaystyle B(E)} is a sphere that completely contains E {\displaystyle E} and is as small as possible. Ahead of time, B ( S ) {\displaystyle B(S)} and B ( T ) {\displaystyle B(T)} can be computed. Clearly, if these two spheres do not intersect (and that is very easy to test), then neither do S {\displaystyle S} and T {\displaystyle T} . This is not much better than an n-body pruning algorithm, however. If E = E 1 , E 2 , … , E m {\displaystyle E={E_{1},E_{2},\dots ,E_{m}}} is a set of triangles, then split it into two halves L ( E ) := E 1 , E 2 , … , E m / 2 {\displaystyle L(E):={E_{1},E_{2},\dots ,E_{m/2}}} and R ( E ) := E m / 2 + 1 , … , E m − 1 , E m {\displaystyle R(E):={E_{m/2+1},\dots ,E_{m-1},E_{m}}} . Apply this to S {\displaystyle S} and T {\displaystyle T} , and calculate (ahead of time) the bounding spheres B ( L ( S ) ) , B ( R ( S ) ) {\displaystyle B(L(S)),B(R(S))} and B ( L ( T ) ) , B ( R ( T ) ) {\displaystyle B(L(T)),B(R(T))} . T

Visual analytics

Visual analytics is a multidisciplinary science and technology field that emerged from information visualization and scientific visualization. It focuses on how analytical reasoning can be facilitated by interactive visual interfaces. == Overview == Visual analytics is "the science of analytical reasoning facilitated by interactive visual interfaces." It can address problems whose size, complexity, and need for closely coupled human and machine analysis may make them otherwise intractable. Visual analytics advances scientific and technological development across multiple domains, including analytical reasoning, human–computer interaction, data transformations, visual representation for computation and analysis, analytic reporting, and the transition of new technologies into practice. As a research agenda, visual analytics brings together several scientific and technical communities from computer science, information visualization, cognitive and perceptual sciences, interactive design, graphic design, and social sciences. Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse. The design of the tools and techniques is based on cognitive, design, and perceptual principles. This science of analytical reasoning provides the reasoning framework upon which one can build both strategic and tactical visual analytics technologies for threat analysis, prevention, and response. Analytical reasoning is central to the analyst's task of applying human judgments to reach conclusions from a combination of evidence and assumptions. Visual analytics has some overlapping goals and techniques with information visualization and scientific visualization. There is currently no clear consensus on the boundaries between these fields, but broadly speaking the three areas can be distinguished as follows: Scientific visualization deals with data that has a natural geometric structure (e.g., MRI data, wind flows). Information visualization handles abstract data structures such as trees or graphs. Visual analytics is especially concerned with coupling interactive visual representations with underlying analytical processes (e.g., statistical procedures, data mining techniques) such that high-level, complex activities can be effectively performed (e.g., sense making, reasoning, decision making). Visual analytics seeks to marry techniques from information visualization with techniques from computational transformation and analysis of data. Information visualization forms part of the direct interface between user and machine, amplifying human cognitive capabilities in six basic ways: by increasing cognitive resources, such as by using a visual resource to expand human working memory, by reducing search, such as by representing a large amount of data in a small space, by enhancing the recognition of patterns, such as when information is organized in space by its time relationships, by supporting the easy perceptual inference of relationships that are otherwise more difficult to induce, by perceptual monitoring of a large number of potential events, and by providing a manipulable medium that, unlike static diagrams, enables the exploration of a space of parameter values These capabilities of information visualization, combined with computational data analysis, can be applied to analytic reasoning to support the sense-making process. == History == As an interdisciplinary approach, visual analytics has its roots in information visualization, cognitive sciences, and computer science. The term and scope of the field was defined in the early 2000s through researchers such as Jim Thomas, Kristin A. Cook, John Stasko, Pak Chung Wong, Daniel A. Keim and David S. Ebert. As a reaction to the September 11, 2001 attacks the United States Department of Homeland Security was established in late 2002, combining dozens of previously separated government agencies. Building upon earlier work on visual data mining by Daniel A. Keim starting in the late 1990s, this simultaneously lead to the development of a research agenda for visual analytics. As part of these efforts the National Visualization and Analytics Center (NVAC) at Pacific Northwest National Laboratory was established in 2004, whose charter was to develop system to mitigate information overload after the September 11, 2001 attacks in the intelligence community. Their research work determined core challenges, posed open research questions, and positioned visual analytics as a new research domain, in particular through the 2005 research agenda Illuminating the Path. In 2006, the IEEE VIS community led by Pak Chung Wong and Daniel A. Keim launched the annual IEEE Conference on Visual Analytics Science and Technology (VAST), providing a dedicated venue for research into visual analytics, which in 2020 merged to form the IEEE Visualization conference. In 2008, scope and challenges of visual analytics were conceptually defined by Daniel A. Keim and Jim Thomas in their influential book about visual data mining. The domain was further refined as part of the European Commissions FP7 VisMaster program in the late 2000s. == Topics == === Scope === Visual analytics is a multidisciplinary field that includes the following focus areas: Analytical reasoning techniques that enable users to obtain deep insights that directly support assessment, planning, and decision making Data representations and transformations that convert all types of conflicting and dynamic data in ways that support visualization and analysis Techniques to support production, presentation, and dissemination of the results of an analysis to communicate information in the appropriate context to a variety of audiences. Visual representations and interaction techniques that take advantage of the human eye's broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. === Analytical reasoning techniques === Analytical reasoning techniques are the method by which users obtain deep insights that directly support situation assessment, planning, and decision making. Visual analytics must facilitate high-quality human judgment with a limited investment of the analysts’ time. Visual analytics tools must enable diverse analytical tasks such as: Understanding past and present situations quickly, as well as the trends and events that have produced current conditions Identifying possible alternative futures and their warning signs Monitoring current events for emergence of warning signs as well as unexpected events Determining indicators of the intent of an action or an individual Supporting the decision maker in times of crisis. These tasks will be conducted through a combination of individual and collaborative analysis, often under extreme time pressure. Visual analytics must enable hypothesis-based and scenario-based analytical techniques, providing support for the analyst to reason based on the available evidence. === Data representations === Data representations are structured forms suitable for computer-based transformations. These structures must exist in the original data or be derivable from the data themselves. They must retain the information and knowledge content and the related context within the original data to the greatest degree possible. The structures of underlying data representations are generally neither accessible nor intuitive to the user of the visual analytics tool. They are frequently more complex in nature than the original data and are not necessarily smaller in size than the original data. The structures of the data representations may contain hundreds or thousands of dimensions and be unintelligible to a person, but they must be transformable into lower-dimensional representations for visualization and analysis. === Theories of visualization === Theories of visualization include: Jacques Bertin's Semiology of Graphics (1967) Nelson Goodman's Languages of Art (1977) Jock D. Mackinlay's Automated design of optimal visualization (APT) (1986) Leland Wilkinson's Grammar of Graphics (1998) Hadley Wickham's Layered Grammar of Graphics (2010) === Visual representations === Visual representations translate data into a visible form that highlights important features, including commonalities and anomalies. These visual representations make it easy for users to perceive salient aspects of their data quickly. Augmenting the cognitive reasoning process with perceptual reasoning through visual representations permits the analytical reasoning process to become faster and more focused. == Process == The input for the data sets used in the visual analytics process are heterogeneous data sources (i.e., the internet, newspapers, books, scientific experiments, expert systems). From these rich sources, the data sets S = S1, ..., Sm are chosen, whereas each Si , i ∈ (1, ..., m) consists of attrib

Pwnie Awards

The Pwnie Awards are an annual awards ceremony that recognizes both excellence and incompetence in the field of information security, described by SecurityWeek as an event that "recognizes excellence and mocks incompetence in cybersecurity." Winners are selected by a committee of security industry professionals from nominations collected from the information security community. Nominees are announced yearly at Summercon, and the awards themselves are presented at the Black Hat Security Conference. == Origins == The name Pwnie Award is based on the word "pwn", which is hacker slang meaning to "compromise" or "control" based on the previous usage of the word "own" (and it is pronounced similarly). The name "The Pwnie Awards," pronounced as "Pony," is meant to sound like the Tony Awards, an awards ceremony for Broadway theater in New York City. == History == The Pwnie Awards were founded in 2007 by Alexander Sotirov and Dino Dai Zovi following discussions regarding Dino's discovery of a cross-platform QuickTime vulnerability (CVE-2007-2175) and Alexander's discovery of an ANI file processing vulnerability (CVE-2007-0038) in Internet Explorer. == Winners == === 2024 === Most Epic Fail: Crowdstrike for 2024 CrowdStrike incident Best Mobile Bug: Operation Triangulation Lamest Vendor Response: Xiaomi for obstructing Pwn2Own researchers from using their services Best Cryptographic Attack: GoFetch Best Desktop Bug: forcing realtime WebAudio playback in Chrome (CVE-2023-5996) Best Song: Touch Some Grass by UwU Underground Best Privilege Escalation: Windows Streaming Service UAF (CVE-2024-30089) by Valentina Palmiotti (chompie) Best Remote Code Execution: Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability (CVE-2024-30080) Most Epic Achievement: Discovery and reverse engineering of the XZ Utils backdoor Most Innovative Research: Let the Cache Cache and Let the WebAssembly Assemble: Knocking’ on Chrome’s Shell by Edouard Bochin, Tao Yan, and Bo Qu Most Underhyped Research: See No Eval: Runtime Dynamic Code Execution in Objective-C === 2023 === Best Desktop Bug: CountExposure! by RyeLv(@b2ahex) Best Cryptographic Attack: Video-based cryptanalysis: Extracting Cryptographic Keys from Video Footage of a Device’s Power LED by Ben Nassi, Etay Iluz, Or Cohen, Ofek Vayner, Dudi Nassi, Boris Zadov, Yuval Elovici Best Song: Clickin’ Most Innovative Research: Inside Apple’s Lightning: Jtagging the iPhone for Fuzzing and Profit Most Under-Hyped Research: Activation Context Cache Poisoning Best Privilege Escalation Bug: URB Excalibur: Slicing Through the Gordian Knot of VMware VM Escapes Best Remote Code Execution Bug: ClamAV RCE Lamest Vendor Response: Three Lessons From Threema: Analysis of a Secure Messenger Most Epic Fail: “Holy fucking bingle, we have the no fly list,” Epic Achievement: Clement Lecigne: 0-days hunter world champion Lifetime Achievement Award: Mudge === 2022 === Lamest Vendor Response: Google's "TAG" response team for "unilaterally shutting down a counterterrorism operation." Epic Achievement: Yuki Chen’s Windows Server-Side RCE Bugs Most Epic Fail: HackerOne Employee Caught Stealing Vulnerability Reports for Personal Gains Best Desktop Bug: Pietro Borrello, Andreas Kogler, Martin Schwarzl, Moritz Lipp, Daniel Gruss, Michael Schwarz for Architecturally Leaking Data from the Microarchitecture Most Innovative Research: Pietro Borrello, Martin Schwarzl, Moritz Lipp, Daniel Gruss, Michael Schwarz for Custom Processing Unit: Tracing and Patching Intel Atom Microcode Best Cryptographic Attack: Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 by Yingchen Wang, Riccardo Paccagnella, Elizabeth Tang He, Hovav Shacham, Christopher Fletcher, David Kohlbrenner Best Remote Code Execution Bug: KunlunLab for Windows RPC Runtime Remote Code Execution (CVE-2022-26809) Best Privilege Escalation Bug: Qidan He of Dawnslab, for Mystique in the House: The Droid Vulnerability Chain That Owns All Your Userspace Best Mobile Bug: FORCEDENTRY Most Under-Hyped Research: Yannay Livneh for Spoofing IP with IPIP Best Song: Dialed Up by Project Mammoth === 2021 === Lamest Vendor Response: Cellebrite, for their response to Moxie, the creator of Signal, reverse-engineering their UFED and accompanying software and reporting a discovered exploit. Epic Achievement: Ilfak Guilfanov, in honor of IDA's 30th Anniversary. Best Privilege Escalation Bug: Baron Samedit of Qualys, for the discovery of a 10-year-old exploit in sudo. Best Song: The Ransomware Song by Forrest Brazeal Best Server-Side Bug: Orange Tsai, for his Microsoft Exchange Server ProxyLogon attack surface discoveries. Best Cryptographic Attack: The NSA for its disclosure of a bug in the verification of signatures in Windows which breaks the certificate trust chain. Most Innovative Research: Enes Göktaş, Kaveh Razavi, Georgios Portokalidis, Herbert Bos, and Cristiano Giuffrida at VUSec for their research on the "BlindSide" Attack. Most Epic Fail: Microsoft, for their failure to fix PrintNightmare. Best Client-Side Bug: Gunnar Alendal's discovery of a buffer overflow on the Samsung Galaxy S20's secure chip. Most Under-Hyped Research: The Qualys Research Team for 21Nails, 21 vulnerabilities in Exim, the Internet's most popular mail server. === 2020 === Best Server-Side Bug: BraveStarr (CVE-2020-10188) – A Fedora 31 netkit telnetd remote exploit (Ronald Huizer') Best Privilege Escalation Bug: checkm8 – A permanent unpatchable USB bootrom exploit for a billion iOS devices. (axi0mX) Epic Achievement: "Remotely Rooting Modern Android Devices" (Guang Gong) Best Cryptographic Attack: Zerologon vulnerability (Tom Tervoort, CVE-2020-1472) Best Client-Side Bug: RCE on Samsung Phones via MMS (CVE-2020-8899 and -16747), a zero click remote execution attack. (Mateusz Jurczyk) Most Under-Hyped Research: Vulnerabilities in System Management Mode (SMM) and Trusted Execution Technology (TXT) (CVE-2019-0151 and -0152) (Gabriel Negreira Barbosa, Rodrigo Rubira Branco, Joe Cihula) Most Innovative Research: TRRespass: When Memory Vendors Tell You Their Chips Are Rowhammer-free, They Are Not. (Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, Kaveh Razavi) Most Epic Fail: Microsoft; for the implementation of Elliptic-curve signatures which allowed attackers to generate private pairs for public keys of any signer, allowing HTTPS and signed binary spoofing. (CVE-2020-0601) Best Song: Powertrace by Rebekka Aigner, Daniel Gruss, Manuel Weber, Moritz Lipp, Patrick Radkohl, Andreas Kogler, Maria Eichlseder, ElTonno, tunefish, Yuki and Kater Lamest Vendor Response: Daniel J. Bernstein (CVE-2005-1513) === 2019 === Best Server-Side Bug: Orange Tsai and Meh Chang, for their SSL VPN research. Most Innovative Research: Vectorized Emulation Brandon Falk Best Cryptographic Attack: \m/ Dr4g0nbl00d \m/ Mathy Vanhoef, Eyal Ronen Lamest Vendor Response: Bitfi Most Over-hyped Bug: Allegations of Supermicro hardware backdoors, Bloomberg Most Under-hyped Bug: Thrangrycat, (Jatin Kataria, Red Balloon Security) === 2018 === Most Innovative Research: Spectre/Meltdown (Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, Yuval Yarom) Best Privilege Escalation Bug: Spectre/Meltdown (Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, Yuval Yarom) Lifetime Achievement: Michał Zalewski Best Cryptographic Attack: ROBOT - Return Of Bleichenbacher’s Oracle Threat Hanno Böck, Juraj Somorovsky, Craig Young Lamest Vendor Response: Bitfi hardware crypto-wallet, after the "unhackable" device was hacked to extract the keys required to steal coins and rooted to play Doom. === 2017 === Epic Achievement: Federico Bento for Finally getting TIOCSTI ioctl attack fixed Most Innovative Research: ASLR on the line Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cristiano Giuffrida Best Privilege Escalation Bug: DRAMMER Victor van der Veen, Yanick Fratantonio, Martina Lindorfer, Daniel Gruss, Clementine Maurice, Giovanni Vigna, Herbert Bos, Kaveh Razavi, Cristiano Giuffrida Best Cryptographic Attack: The first collision for full SHA-1 Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, Yarik Markov Lamest Vendor Response: Lennart Poettering - for mishandling security vulnerabilities most spectacularly for multiple critical Systemd bugs Best Song: Hello (From the Other Side) - Manuel Weber, Michael Schwarz, Daniel Gruss, Moritz Lipp, Rebekka Aigner === 2016 === Most Innovative Research: Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector Erik Bosman, Kaveh Razavi, Herbert Bos, Cristiano Giuffrida Lifetime Achievement: Peiter Zatko aka Mudge Best Cryptographic Attack: DROWN attack Nimrod Aviram et al. Best Song: Cyberlier - Katie Mous