Ilya Sutskever

Ilya Sutskever (Hebrew: איליה סוצקבר; born 1986) is a computer scientist who specializes in machine learning. He has made several major contributions to the field of deep learning, including sequence-to-sequence learning, reasoning models, GPT models, and contributions to CLIP, DALL-E, and AlphaGo. With Alex Krizhevsky and Geoffrey Hinton, he co-created AlexNet, a convolutional neural network. One of the most highly cited computer scientists in history, he has won the NeurIPS Test of Time Award for his lasting impact on AI research three times in a row (2022–2024) and received the National Academy of Sciences Award for the Industrial Application of Science in 2026. Sutskever co-founded and was chief scientist at OpenAI, where he oversaw the research breakthroughs that led to large language models and to the launch of ChatGPT. He also led the research that led to reasoning models such as o1. In 2023, he was one of the members of OpenAI's board that ousted Sam Altman as its CEO; Altman was reinstated a week later, and Sutskever stepped down from the board. In June 2024, Sutskever co-founded the company Safe Superintelligence Inc., alongside Daniel Gross and Daniel Levy. Within a year, the company was valued at more than $30 billion. == Early life and education == Sutskever was born in 1986 into a Jewish family in Nizhny Novgorod, Russia (then Gorky, Russian SFSR, Soviet Union). At the age of 5, he immigrated to Israel with his family and grew up in Jerusalem. Sutskever proved to be a good student in school, and in eighth grade started taking classes at the Open University of Israel. At 16, he moved with his family to Canada, where he attended high school for a month before being admitted to the University of Toronto in Ontario as a third-year undergraduate student. At the University of Toronto, Sutskever received a bachelor's degree in mathematics in 2005, a master's degree in computer science in 2007, and a PhD in computer science in 2013. 
His doctoral advisor was Geoffrey Hinton. In 2012, Sutskever built AlexNet in collaboration with Geoffrey Hinton and Alex Krizhevsky. == Career and research == In 2012, Sutskever spent about two months as a postdoc with Andrew Ng at Stanford University. He then returned to the University of Toronto and joined Hinton's new research company DNNResearch, a spinoff of Hinton's research group. In 2013, Google acquired DNNResearch and hired Sutskever as a research scientist at Google Brain. At Google Brain, Sutskever worked with Oriol Vinyals and Quoc Viet Le to create the sequence-to-sequence learning algorithm, and worked on TensorFlow. He is also one of the AlphaGo paper's many co-authors. At the end of 2015, Sutskever left Google to become cofounder and chief scientist of the newly founded organization OpenAI. In 2022, Sutskever tweeted, "it may be that today's large neural networks are slightly conscious", which triggered debates about AI consciousness. He is considered to have played a key role in the development of ChatGPT, and later in leading the research that led to reasoning models. He is credited with establishing OpenAI’s scaling ethos. In 2023, he announced that he would co-lead OpenAI's new "Superalignment" project, which was trying to solve the alignment of superintelligences within four years. He wrote that even if superintelligence seems far off, it could happen this decade. Sutskever was formerly one of the six board members of the nonprofit entity that controlled OpenAI. In November 2023, the board fired Sam Altman, saying that "he was not consistently candid in his communications with the board". He authored a 52-page memo that relied heavily on information from Mira Murati, accusing Altman of lying, manipulating executives, and fostering internal division. Sutskever submitted the memo to the board after months of tension and dissatisfaction with Altman's leadership style, and ultimately joined the board in voting for Altman's termination. 
In an all-hands company meeting shortly after the board meeting, Sutskever said that firing Altman was "the board doing its duty", but the next week, he expressed regret at having participated in Altman's ouster. Altman's firing and OpenAI's co-founder Greg Brockman's resignation led three senior researchers to resign from OpenAI. After that, Sutskever stepped down from the OpenAI board and was absent from OpenAI's office. Some sources suggested he was leading the team remotely, while others said he no longer had access to the team's work. In May 2024, Sutskever announced his departure from OpenAI to focus on a new project that was "very personally meaningful" to him. His decision followed a turbulent period at OpenAI marked by leadership crises and internal debates about the direction of AI development and alignment protocols. Jan Leike, the other leader of the superalignment project, announced his departure hours later, citing an erosion of safety and trust in OpenAI's leadership. In June 2024, Sutskever announced Safe Superintelligence Inc., a new company he founded with Daniel Gross and Daniel Levy with offices in Palo Alto and Tel Aviv. In contrast to OpenAI, which releases revenue-generating products, Sutskever said the new company's "first product will be the safe superintelligence, and it will not do anything else up until then". In September 2024, the company announced that it had raised $1 billion from venture capital firms including Andreessen Horowitz, Sequoia Capital, DST Global, and SV Angel. In March 2025, Safe Superintelligence Inc. raised $2 billion more and reportedly reached a $32 billion valuation, notably due to Sutskever's reputation. In June 2025, SSI rejected an offer from Meta Platforms to buy the company. Sutskever became CEO of SSI shortly thereafter, after co-founder and CEO Gross left for Meta. 
In an October 2024 interview after winning the Nobel Prize in Physics, Geoffrey Hinton expressed support for Sutskever's decision to fire Altman, emphasizing concerns about AI safety. During the Musk v. Altman trial in 2026, Sutskever confirmed he had a $7 billion stake in OpenAI. === Awards and honors === In 2015, Sutskever was named in MIT Technology Review's 35 Innovators Under 35. In 2018, he was the keynote speaker at Nvidia Ntech 2018 and AI Frontiers Conference 2018. In 2022, he was elected a Fellow of the Royal Society (FRS). In 2023 and 2024, he was included in Time's list of the 100 most influential people in AI. In 2022, 2023, and 2024, he won Neural Information Processing Systems’ Test of Time award, which recognizes papers that significantly shaped the AI field over at least ten years. In 2025, he received an honorary doctorate from his alma mater, the University of Toronto. In 2026, he received the National Academy of Sciences Award for the Industrial Application of Science, presented for the first time in artificial intelligence.

Spatiotemporal reservoir resampling

Spatiotemporal reservoir resampling, commonly known as ReSTIR (from "Reservoir-based SpatioTemporal Importance Resampling"), is a collection of computer graphics techniques for reusing samples during rendering. It was developed primarily to allow more realistic lighting in real-time rendering, because relatively few rays can be traced per pixel while maintaining an acceptable frame rate. It can also be used to speed up off-line path tracing. The first ReSTIR paper, published in 2020, provided algorithms for direct lighting, allowing scenes containing thousands of lights to be rendered in real time on a high-end GPU. Researchers later proposed versions for rendering indirect lighting (and more recently, motion blur and depth of field) and built up a framework of mathematical concepts and notation conventions that help analyze such algorithms. A major focus of this work is removing or reducing the bias that could be introduced when samples from other pixels or frames are reused—or selectively allowing some bias in order to speed up rendering and reduce variance (visible as "noise" in the image). Versions for path tracing apply transformations called shift mappings to samples, typically reusing parts of paths closer to the light and modifying the portion closer to the camera. ReSTIR-related papers and talks have been presented every year at the SIGGRAPH conference since 2020. One of the first games to incorporate ReSTIR into its rendering was Cyberpunk 2077. == Overview and motivation == According to Chris Wyman, one of the co-authors of the original paper, although developers commonly thought that bias was acceptable for real-time rendering, end users (e.g. gamers) are well-aware of the artifacts caused by bias and many have a negative opinion of common sample-reuse techniques such as temporal anti-aliasing (TAA), which may cause "ghosting" when the camera moves, and denoising, which causes blurring and other artifacts. 
ReSTIR techniques can reduce or avoid these types of bias by reusing samples of the set of possible paths taken by light to reach the camera, instead of reusing rendered pixel color values (which are typically the average of multiple samples, discarding information such as the direction of the light). While other techniques reuse samples in a generic post-processing step, ReSTIR passes can test for shadowing, and reused samples are converted into pixel color values by rendering code that takes the characteristics of different materials into account (e.g. by implementing BRDFs). However, the output of ReSTIR is noisy, and a denoising pass is typically still used. Stochastic ray tracing techniques such as path tracing need to average multiple samples (produced by tracing individual rays) in order to render a visually acceptable image. When using a simple unbiased renderer based on Monte Carlo integration, halving the standard deviation of the result (apparent as "noise" in the image) requires multiplying the number of samples by four, meaning that a rapidly increasing number of samples is needed to improve quality. Standard ways to mitigate this problem include importance sampling (which requires finding improved sampling distributions for specific situations), and quasi-Monte Carlo integration (which usually still requires tracing a large number of rays). ReSTIR offers a solution that multiplies the effective number of samples while tracing a fixed number of additional rays per frame. Temporal reuse multiplies the effective sample count by the number of frames rendered. Spatial reuse multiplies the effective count by the number of neighboring pixels examined. These two types of reuse can be combined, allowing spatial reuse to be applied recursively, which appears to offer an exponentially increasing effective sample count; however, this is quickly limited by the size of the neighborhood used for spatial reuse. 
Spatial reuse is also potentially less effective near shadow and object edges, especially for objects with fine geometric detail, and temporal reuse is limited by movement of the camera and scene elements. == Variations == Many variations of ReSTIR have been proposed that generalize or improve the original technique (which builds on an earlier method called RIS), specialize it for particular types of illumination or other visual effects, or allow incorporation into rendering algorithms other than standard path tracing. Some published versions are listed below. == Algorithms == === Basic algorithm === ReSTIR uses a combination of resampled importance sampling (RIS) and weighted reservoir sampling (WRS) which the authors call streaming RIS. RIS processes samples from an initial probability distribution (e.g. a probability distribution for which a cheap sampling method exists) and generates samples in a new probability distribution (e.g. a sampling distribution that is optimal for rendering but is impractical to draw samples from directly). WRS allows this to be done while storing only a small number of samples in memory, which is especially helpful on a GPU. Information about the samples is stored in a data structure called a reservoir. WRS also allows samples from multiple reservoirs to be combined ("merged") into a single reservoir; this is crucial for sample reuse. Each pixel has a reservoir, typically containing only a single sample when ReSTIR is used for real-time rendering (some implementations use a larger number, e.g. four samples). The reservoir is typically initialized to a sample drawn using a simple method and is then updated by RIS steps and by reservoir merging, so that the pixel value produced by shading using the sample(s) currently in the reservoir, times the weight for the sample, is always an unbiased estimate of the correct pixel value. 
If appropriate resampling steps are used, the variance of this estimate (or some function of it, typically the luminance of the RGB color value) decreases with each step. A possible sequence of steps performed for each frame, suitable for computing unbiased direct illumination (DI), is:
1. Perform reservoir resampling by drawing multiple light samples and using streaming RIS to choose one, using probabilities based on a target function, e.g. the luminance of the sample's contribution to the pixel. A weight is also computed for the sample. Typically, a single visibility check is performed here, after choosing a sample, setting the weight to 0 if the light is shadowed. Resampling (combined with the visibility check) ensures that the expected value of the weight times the sample brightness is the correct (unbiased) value for the pixel.
2. (temporal reuse) For each pixel, merge the sample(s) from the previous frame into the current reservoir. Multiple importance sampling (MIS) weights are used to avoid bias due to the fact that the samples in the previous frame's reservoirs may have a different target probability distribution if the objects, lights, or camera have moved.
3. (spatial reuse) For each pixel, choose one or more neighboring pixels and merge their samples into the current pixel's reservoir. Multiple importance sampling (MIS) weights are used to avoid bias due to the fact that the samples in each pixel's reservoir have a different target probability distribution. Because computing unbiased MIS weights requires tracing additional rays (along with other work such as evaluating BRDFs), real-time rendering often uses only a single neighboring pixel.
4. Use the sample in each pixel's reservoir, along with its weight, to determine the color of the pixel for the current frame. Alternatively, multiple samples examined during the preceding steps may be averaged and used to shade the pixel instead (decoupled shading and sampling).
For direct lighting, the initial samples used in step 1 are typically drawn by importance sampling from the set of lights in a scene. The algorithm above (from the original ReSTIR paper) draws many lower-quality light samples (e.g. 32) using a fast method, without considering visibility, and chooses one using streaming RIS. Visibility is then tested for the final chosen sample. Considering visibility for each sample drawn would require tracing 32 rays, which would make it much more expensive. The intent is to reduce the number of rays traced, relying on the sample reuse in steps 2 and 3 to make up for the loss of quality caused by rejecting many of the rays due to shadowing. A large part of the initial efforts to optimize ReSTIR (to make it run in real time on available hardware) went into reducing the cost of randomly sampling the lights. Glossy surfaces may require a larger number of samples, and combining light sampling with BRDF sampling (using MIS) may increase quality. Step 2 (temporal reuse) is sometimes skipped for off-line rendering, and the output of multiple repetitions of initial sampling and spatial reuse is averaged instead; this helps avoid artifacts due to correlations. Step 3 (spatial reuse) may be repeated multiple times in a single frame.
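
The streaming RIS and reservoir merging steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of any published implementation. The names `Reservoir`, `streaming_ris`, `merge`, `LIGHTS`, `draw_light`, `source_pdf`, `target` and `estimate_total` are hypothetical. The toy "scene" is a list of four unshadowed light intensities, the source distribution is uniform over the lights, and the target function is simply the light's intensity. The `merge` function is the simple form that assumes every reservoir uses the same target function, so it omits the MIS weights and visibility checks that unbiased temporal and spatial reuse require.

```python
import random
from dataclasses import dataclass
from typing import Any


@dataclass
class Reservoir:
    sample: Any = None   # the single sample currently kept
    w_sum: float = 0.0   # running sum of resampling weights
    count: int = 0       # number of candidates seen so far (M)
    weight: float = 0.0  # contribution weight W of the kept sample

    def update(self, candidate, w, rng, count=1):
        """Weighted reservoir sampling: keep candidate with probability w / w_sum."""
        self.w_sum += w
        self.count += count
        if w > 0.0 and rng.random() * self.w_sum < w:
            self.sample = candidate

    def finalize(self, target):
        """Set W = w_sum / (M * target(sample)), or 0 if no usable sample was kept."""
        t = target(self.sample) if self.sample is not None else 0.0
        usable = t > 0.0 and self.count > 0
        self.weight = self.w_sum / (self.count * t) if usable else 0.0


def streaming_ris(draw, source_pdf, target, m, rng):
    """Draw m candidates from the source distribution and keep one,
    chosen with probability proportional to target(x) / source_pdf(x)."""
    r = Reservoir()
    for _ in range(m):
        x = draw(rng)
        r.update(x, target(x) / source_pdf(x), rng)
    r.finalize(target)
    return r


def merge(reservoirs, target, rng):
    """Combine reservoirs that share one target function (no MIS correction)."""
    merged = Reservoir()
    for r in reservoirs:
        # Each input reservoir acts as one candidate standing in for r.count samples.
        w = target(r.sample) * r.weight * r.count if r.sample is not None else 0.0
        merged.update(r.sample, w, rng, count=r.count)
    merged.finalize(target)
    return merged


# Hypothetical toy scene: intensities of four unshadowed lights.
LIGHTS = [1.0, 2.0, 3.0, 10.0]


def draw_light(rng):
    return rng.randrange(len(LIGHTS))  # uniform source distribution


def source_pdf(i):
    return 1.0 / len(LIGHTS)


def target(i):
    return LIGHTS[i]  # target function: the light's contribution


def estimate_total(m, rng):
    """One-sample estimate of the total illumination: f(sample) * W."""
    r = streaming_ris(draw_light, source_pdf, target, m, rng)
    return LIGHTS[r.sample] * r.weight
```

Averaging `estimate_total` over many independent runs should approach `sum(LIGHTS)`, which is the sense in which the sample value times its weight is an unbiased estimate. A real renderer would store one such reservoir per pixel and would add a visibility test and MIS weights when merging reservoirs from neighboring pixels or previous frames.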

Power cycling

Power cycling is the act of turning a piece of equipment, usually a computer, off and then on again. Reasons for power cycling include having an electronic device reinitialize its set of configuration parameters or recover from an unresponsive state of its mission-critical functionality, such as in a crash or hang situation. Power cycling can also be used to reset network activity inside a modem. It can also be among the first steps for troubleshooting an issue. == Overview == Power cycling can be done manually, usually using the power switch on the device, or remotely, through some type of external device connected to the power input. In the data center environment, remote control power cycling can usually be done through a power distribution unit, over the network. In the home environment, this can be done through home automation powerline communications. Most Internet service providers publish a "how-to" on their website showing their customers the correct procedure to power cycle their devices. Power cycling is a common diagnostic procedure usually performed first when a computer system freezes. However, frequently power cycling a computer can cause thermal stress. A reset has the same effect on the software but may be less problematic for the hardware, as power is not interrupted. == Historical uses == On all Apollo missions to the Moon, the landing radar was required to acquire the surface before a landing could be attempted. But on Apollo 14, the landing radar was unable to lock on. Mission control told the astronauts to cycle the power. They did, the radar locked on just in time, and the landing was completed. During the Rosetta mission to comet 67P/Churyumov–Gerasimenko, the Philae lander did not return the expected telemetry on awakening after arrival at the comet. The problem was diagnosed as "somehow a glitch in the electronics", engineers cycled the power, and the lander awoke correctly. 
During the launch of the billion-dollar AEHF-6 satellite on 26 March 2020 by an Atlas V rocket from Cape Canaveral Space Force Station in Florida, a hold was called at T-46 seconds due to a hydraulic system not responding as expected. The launch crew turned it off and back on, and the launch proceeded normally. In 2023, the Interstellar Boundary Explorer spacecraft stopped responding to commands after an anomaly. When gentler techniques failed, NASA resorted to rebooting the spacecraft with the remote equivalent of a power cycle.

Ethiopian feminists facing digital gender-based violence

Against a background of traditional views of women, rising internet use, a young population and an unsafe offline life, women and girls in Ethiopia are facing increasing amounts of digital violence. Some women, feeling endangered, have left the country as a result. Researchers, activists and lawyers have called for online content to be taken down and specific digital legislation to be drafted and enforced. == Online violence and its offline effects == Sexual violence against women and girls in Ethiopia is common. In 2023, in the Women, Peace and Security Index by Georgetown University, Ethiopia came 146 out of 177 countries. Over several years online harassment of and violence against women and girls in Ethiopia has increased. It can range from sexist remarks about appearance and women’s role in society, to revenge porn, threats of beating, acid attacks, abduction, rape or death. The real-life effect on women and girls of these attacks can include mental health problems, damaged reputations and a withdrawal from public and economic life. When the online attacks migrate to the real world, for example when online attackers find out where the targeted women and girls live, this can result in physical attacks, street harassment, threats to children and can cause victims to move house or job or even flee the country in fear of femicide. In a country that criminalises homosexuality, it can also lead to physical attacks on LGBTQI+ people in particular and indeed on anybody labelled as homosexual. == Research studies == The Centre for Information Resilience (CIR) conducted interviews with Ethiopian women holding public roles or being active online. The centre published a report on this in 2024 entitled ‘Silenced, Shamed and Threatened’. 
They found that technology-facilitated gender-based violence (TFGBV) had become “normalised to the point of invisibility.” In 2024, CIR also published an analysis of gendered hate speech on social media in Ethiopia called ‘Normalised and invisible.’ It is thought that traditional views of women, the young population, the rise in internet use and the war in Tigray, when sexual violence was used as a weapon of war by Ethiopian and Eritrean soldiers, have all helped to create an online environment in which even femicide is considered unremarkable. AFP Fact Check collaborated with Deutsche Welle Akademie to investigate the cyber harassment of women in Ethiopia, analysing misogynistic posts published on TikTok and Facebook. They discovered disparaging remarks about women’s physical appearance, threats of acid attacks and other physical violence, and the public sharing of women’s phone numbers. == Individuals affected == Women in particular jeopardy of digital gender-based violence are feminists, activists, politicians and those with a public profile. Some women are known to have fled Ethiopia fearing for their lives after online and offline threats. Yordanos Bezabih, an Ethiopian women’s rights activist, started a campaign with the hashtag #JusticeforHeaven to fight against gender-based cyberspace violence. As a result, she herself became a target. She experienced years of online threats of acid attacks, gang-rape and death. In 2025, subscribers to an online community organised a search for her address. Deepfake nude images of her were shared, she was filmed in real life, her house and online accounts were broken into, and her private photos and messages were posted on social media. When the attackers finally circulated her address, suggesting that she be executed, she left Ethiopia on a human rights defender scholarship. 
In 2023, Lella Misikir helped to start a campaign, called ‘My Whistle, My Voice’, that suggested women carry whistles and use them if they were harassed in the street. A TikTok video of the campaign became popular. Shortly after, videos of Misikir were circulated suggesting that she was gay. Her online attackers next searched for her address. In November 2024, Misikir left the country. == Legal issues == Ethiopia has some laws on online harassment and defamation, for example the Computer Crimes Proclamation. However, technology-facilitated gender-based violence (TFGBV), such as deepfakes, non-consensual image sharing, and coordinated harassment, is not explicitly recognized as a crime. In practice too, women are often not believed when reporting such violence and are not taken seriously. Police advice is often that women affected should simply leave the online space. Social media platforms can remove content when it is brought to their attention but the offenders are not banned. Users can only block them.

Matt Mullenweg

Matthew Charles Mullenweg (born January 11, 1984) is an American web developer and entrepreneur. He is known as a co-founder of the free and open-source web publishing software WordPress, and the founder of Automattic. == Early life and education == Mullenweg was born January 11, 1984, in Houston, Texas, to Chuck and Kathleen Mullenweg and grew up in the Willowbend neighborhood. His older sister was born in 1974. His father, who died in 2016, was a computer programmer who worked for Brown & Root, and encouraged his children to start using home computers at an early age. His mother was a stay-at-home mother. The Mullenwegs were raised Catholic. He attended Kinder High School for the Performing and Visual Arts, studying jazz and playing the saxophone. Mullenweg suffered from migraines as a child that forced him to miss extended periods of school. He attended the University of Houston for two years, studying philosophy and political science. He dropped out after his sophomore year in 2004 to work for CNET, which promised him that he could allocate time to the development of WordPress. == Career == Mullenweg began blogging in 2002 on the open source platform b2. B2 developer Michael Valdrighi abandoned the project and Mullenweg took it over in 2003. He and Mike Little created a b2 fork that year they called WordPress and published it under the GNU General Public License. In March 2003, he co-founded the Global Multimedia Protocols Group (GMPG) with Eric A. Meyer and Tantek Çelik. In April 2004, he helped launch Ping-O-Matic, a mechanism for notifying search engines about blog updates. In October 2004, he was hired by CNET who would allow him to develop WordPress part-time as part of his job. He dropped out of college and moved to San Francisco for the position. === Automattic === After leaving CNET in 2005, Mullenweg founded Automattic as a fully distributed company. Toni Schneider was hired as CEO so Mullenweg could learn how to manage a large organization. 
During this period, Mullenweg focused on product development while Schneider managed the company. In January 2014, Mullenweg resumed the role of CEO, replacing Schneider. He led Automattic's expansion and a series of acquisitions, including WooCommerce in 2015, The Atavist Magazine in 2018, Tumblr in 2019, Pocket Casts in 2021, and Beeper in 2024. Mullenweg received the Heinz Award for Technology, the Economy and Employment in 2016, for "helping to democratize online publishing". Automattic's valuation reached $7.5 billion in 2021. At the time, WordPress hosted 28 million websites, or 40 percent of all websites on the Internet. == Public disputes == On several occasions, Mullenweg has publicly challenged competitors to WordPress and WordPress.com. He has stated that he prefers to settle disputes in the court of public opinion and described his approach as "brinksmanship", noting that the potential cost of legal action could put Automattic in a "tough spot". In 2008, shortly before WordPress 2.5's release, Six Apart's Movable Type published "A WordPress 2.5 Upgrade Guide"—a comparison of their CMS with their rival, WordPress—as a company blog article that Mullenweg characterized as "desperate and dirty". In 2013, developers on the digital marketplace Envato were banned from speaking at WordPress events after he criticized the platform for selling WordPress themes with the graphics and CSS components under a proprietary license instead of the GPL. In 2016, Mullenweg accused Wix.com, a competitor to WordPress.com, of reusing WordPress's mobile text editor code in Wix's own mobile app without adhering to the terms of the GPL. Despite the license's requirement to publish anything built with GPL code under the GPL, Wix's CEO claimed that the company open-sourced their forked version of the component and satisfied the license's terms before the app switched to its own fork of the MIT-licensed text editor that the WordPress editor was based upon. 
The new fork added a clause to the MIT license that forbids redistribution under any other license. In 2022, Mullenweg criticized GoDaddy for not reinvesting in the WordPress project sufficiently. On January 9, 2025, the representative of the WordPress Sustainability team, Thijs Buijs, resigned via WordPress.org’s Slack channel, citing dissatisfaction with Mullenweg’s December 24, 2024, Reddit post titled “What drama should I create in 2025?” and highlighting concerns about what he described as “unsustainable leadership”. In response, Mullenweg thanked Buijs for reminding him of the existence of a sustainability team, announced its disbanding, and subsequently closed WordPress.org's #sustainability Slack channel. === Tumblr === Mullenweg began a three-month sabbatical from his role as CEO at the beginning of February 2024. During that time, Mullenweg engaged in a public feud with a transgender Tumblr user who, frustrated with the failure of Tumblr (owned by Automattic) to address transphobic harassment, posted that she wished Mullenweg would die in a comedic way. The user was subsequently banned. Responding to user uproar, Mullenweg addressed the ban in posts on his personal Tumblr blog, in which he characterized the post as a death threat, and shared private account information about the user. Mullenweg also responded to individual commenters on Tumblr in posts and direct messages, and went to Twitter to respond to the banned user's tweets about the situation. A few days later, transgender employees of Tumblr and Automattic made a post on the official Tumblr staff blog characterizing his response as "unwarranted and harmful" and stating that he did not speak on their behalf. They also said that the user's post was not a realistic threat of violence and not the reason for her ban. 
=== WP Engine dispute === == Audrey Capital == Mullenweg is a principal at angel investment firm Audrey Capital, which he co-founded in 2008 alongside Naveen Selvadurai and Audrey Kim. As of 2024, the company lists investments in companies such as CoinDesk, MakerBot, Sonos, SpaceX, Ring, as well as software companies including Calm, Chartbeat, DailyBurn, Memrise, Genius, Nord Security and Telegram. It has also funded startups that provide services to web developers including Creative Market, GitLab, NPM, SendGrid, Stripe and Typekit. From 2017 to 2019, Mullenweg also served as a board member for GitLab. Mullenweg has employed a team of contributors to WordPress through Audrey Capital since 2010, who work separately from Automattic. On the 20th anniversary of WordPress' initial release, Mullenweg announced a scholarship program aimed at the children of significant contributors to open-source projects.

Anomaly detection

In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially sought out so that they could be rejected or omitted from the data to aid statistical analysis, for example to compute the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal aids the performance of machine learning algorithms. However, in many applications anomalies themselves are of interest and are the most sought-after observations in the entire data set, which need to be identified and separated from noise or irrelevant outliers. Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of labelled data and the inherently unbalanced nature of the classes. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of the normal or anomalous data, but more often than not, the techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the model. 
Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used due to their wider and relevant application. == Definition == Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones include the following, and can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined: === Ill defined === An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data. An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. An anomaly is a point or collection of points that is relatively distant from other points in multi-dimensional space of features. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour. === Specific === Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O is greater than a pre-selected threshold if and only if O is an outlier. == History == === Intrusion detection === The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior. 
By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity.

The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures.

== Applications ==

Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement.

=== Intrusion detection ===

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection.
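The threshold-and-statistics approach mentioned above can be illustrated with the z-score rule from the Definition section. The following is a minimal sketch, not a description of any particular IDS: the function name `zscore_outliers`, the login-count data, the threshold of 3, and the use of the absolute value (a two-sided test) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(profile, observations, threshold=3.0):
    """Return the observations whose |z-score| exceeds the threshold.

    profile: historical values for one entity, assumed roughly Gaussian.
    observations: new values to score against that profile.
    """
    mu = mean(profile)
    sigma = stdev(profile)  # sample standard deviation of the profile
    if sigma == 0:
        raise ValueError("profile has zero variance; z-score is undefined")
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical profile: daily login counts for a single user account.
history = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]
today = [11, 14, 40]

flagged = zscore_outliers(history, today)
print(flagged)  # only the value far outside the historical profile is flagged
```

In practice the threshold is chosen empirically, and a profile built from means and standard deviations is only meaningful when the underlying data are close to Gaussian.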
=== Fintech fraud detection ===

Anomaly detection is vital in fintech for fraud prevention.

=== Preprocessing ===

Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.

=== Video surveillance ===

Anomaly detection has become increasingly vital in video surveillance to enhance security and safety. With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations. An important aspect of video surveillance is the development of scalable real-time frameworks. Such pipelines are required for processing multiple video streams with low computational resources.

=== IT infrastructure ===

In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. These are complex systems, composed of many interacting elements and large quantities of data, requiring methods to process and reduce this data into a human- and machine-interpretable format. Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness.
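As a small illustration of the preprocessing point made earlier in this section, the sketch below compares the mean and standard deviation of a sample before and after anomalies are removed. The article does not prescribe a removal method; this example assumes a median-based "modified z-score" rule (constant 0.6745, cutoff 3.5), and the function name `remove_outliers_mad` and the sensor readings are hypothetical.

```python
from statistics import mean, median, stdev


def remove_outliers_mad(values, cutoff=3.5):
    """Drop values whose modified z-score (median/MAD based) exceeds cutoff."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]


# Hypothetical sensor readings with one clearly anomalous value.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 95.0]
cleaned = remove_outliers_mad(readings)

print("raw:     mean=%.2f  stdev=%.2f" % (mean(readings), stdev(readings)))
print("cleaned: mean=%.2f  stdev=%.2f" % (mean(cleaned), stdev(cleaned)))
```

The median and MAD are used here instead of the mean and standard deviation because the latter are themselves distorted by the anomalies that the step is trying to remove.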
=== IoT systems ===

Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps in identifying system failures and security breaches in complex networks of IoT devices. The methods must manage real-time data, diverse device types, and scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems.

=== Petroleum industry ===

Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection.

=== Oil and gas pipeline monitoring ===

In the oil and gas sector, anomaly detection is crucial not just for maintenance and safety, but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.

Abjjad

Abjjad is an Arabic reading application that was launched in June 2012 by Eman Hylooz. Abjjad offers users the ability to download and read thousands of books offline through its iOS and Android applications. In December 2020, Abjjad had more than 1.5 million registered accounts.

== About Abjjad ==

Abjjad was founded in June 2012 by Eman Hylooz as a reader community dedicated to Arab readers, authors, and book lovers. After identifying a large gap in Arab publishing, namely legal electronic publishing, Abjjad developed into an electronic platform that makes Arabic e-books easily available to Arab readers everywhere, forming strategic partnerships with Arab publishers such as Dar Al-Shorouk, Dar Al Tanweer, Dar Al Adab, and Dar Al Saqi.

== History ==

In May 2012, Oasis500 provided Abjjad with the seed funding to launch the website. In June 2012, Abjjad was launched with a budget of 15 thousand dollars. Within the first three months, more than 10 thousand members had registered.

Abjjad has participated in various local and international forums to meet investors and entrepreneurs. In October 2012, Abjjad participated in the Global Thinkers Forum in Amman, Jordan, where Eman Hylooz, founder and CEO, presented the concept of Abjjad, its vision and its future plans. In mid-December 2012, Abjjad participated in Global Entrepreneurship in Dubai, where it was presented to investors as a start-up and a new project in the Middle East. In February 2013, Abjjad was one of ten startups from Jordan and Palestine that MENA Apps nominated to participate in Startup Turkey. In May 2013, Abjjad participated in the World Economic Forum in Amman, Jordan, and in June 2013 it participated in ArabNet in Dubai. By the end of 2013, Abjjad had won the Mohammed Bin Rashid Al Maktoum Best Arab Start-Up Business Award for 2013.
From 29 October 2013 to January 2014, Abjjad ran a crowdfunding campaign through Eureeca, raising US$161,000 in 88 days from 43 regional donors, more than US$40,000 above its initial target. By the end of 2020, Abjjad had raised a $1 million investment round led by Jordan Entrepreneurship Fund, Ramal Capital Fund, and JordInvest Fund. With the funds to be used to acquire users and e-books, Abjjad hopes to become the largest Arab electronic library as well as the largest income-generating platform for Arab authors and publishers, while also providing readers with a unique digital reading experience.

== Features ==

The ability to read an unlimited number of books from an electronic library containing thousands of Arabic and translated books. The Abjjad e-book library is constantly expanding and cooperating with new publishing houses to add more books.
Reading offline without an internet connection. The application allows the user to download books in seconds and read them anywhere.
Intuitive features, including the ability to flip the pages of the book, highlight the reader's favorite quotes, and add notes, in addition to a night reading mode and the option to modify the style and size of the font.
The ability to interact with other readers and read their book reviews. More than 1.5 million Arabic readers make up the Abjjad reader community, and the user can read and engage with their reviews, book ratings, and favorite quotes.
A virtual personal library that enables the user to rate and organize books by placing them on one of three shelves: "I will read it", "currently reading", or "read it".

Abjjad's library includes various genres and literary fields, such as reference books, novels, stories, literature, psychology, philosophy, biography, politics, history, religion, self-improvement and human development books, as well as international books translated into Arabic.
The library includes the most famous works of Arab authors such as Naguib Mahfouz, Mahmoud Darwish, Radwa Ashour, and Tayeb Salih, as well as Arabic translations of works by well-known international authors including Elif Shafak, Fyodor Dostoevsky, Mark Manson, and others.

== Statistics ==

In December 2020, Abjjad had more than 1.5 million registered accounts.

== Awards and honors ==

2013: Won the Mohammad Bin Rashid Award for Best Arabic Startup
2014: Won the Golden Award for Jawa's "Best Online Community"
2015: Won the Business Women of the Year Award by Bank al Etihad
2016: Won the Said Khoury Award for Entrepreneurs and Innovators
2016: Won the Best Application in the Arabic Region Award by His Highness Sheikh Salem Al-Ali Al-Sabah in Kuwait
2019: Won the Mohammad Bin Rashid Award for Arabic Language for the best artistic, cultural or intellectual work to serve the Arabic language

== Abjjad in the media ==

Abjjad has attracted considerable interest in Middle Eastern and Western media. Christopher M. Schroeder, the author of Startup Rising: The Entrepreneurial Revolution Remaking the Middle East, interviewed Eman Hylooz and wrote about her experience with Abjjad in his book. France24-Monte Carlo Doualiya interviewed Hylooz on its Retweet program to discuss the idea behind Abjjad and the latest statistics of the website. Sky News Arabia interviewed Hylooz about her experience with Oasis500 and Eureeca in Abjjad's crowdinvestment campaign, and Al-Aan TV interviewed her at ArabNet in Dubai in 2013.

Abjjad has been mentioned on the Oasis500 website as one of five startups that the company funded and that went on to win prizes. Wamda, Mediame and crowdfundinsider have discussed Abjjad's crowd investment experience on Eureeca. M. Lynx Qualey, an expert on Arabic literature in English, interviewed Eman Hylooz in March 2013 about Abjjad's success story, how it differs from other social networks, and its future plans. Abjjad was also featured on the "Hashtag Arabi" website when it launched its premium subscription, "Abjjad Unlimited", in 2017 with the support of the Abdul Hameed Shoman Foundation. In her interview with the Jordan Times, Hylooz also discussed her background in computer science and software development, which helped her found Abjjad.