Your AI Slop Bores Me (stylized in all lowercase) is a website and social experiment created by programmer Mihir Maroju. Serving as a parody of large language models (LLMs) like ChatGPT and Claude, all questions and image prompts posed by users are answered by other, randomly-selected human users of the site. As of March 2026, the site has reached 50 million hits and sits at 16,000 concurrent users. == Background == In an interview with Fast Company, Maroju said he was inspired to create the site by his frustration with AI proliferating the internet with AI generated content, saying the site came from "a frustration for AI art and its proliferation, making artists' lives worse and also just filling the internet with low-effort generic slop". == Overview == The site has a credit system, in which a first-time user will be given 1 credit for free. Every 10 minutes, if a user has 0 credits, they will receive 2 credits. Once the credits are used up, the user can no longer do prompts unless the user earns them. The user can earn credits by responding to other user's prompts by "larping as AI" while given a 75-second time limit. Prompts can either be for a written response, or a drawing for the other user to fulfill the prompt. The maximum amount of credits a user can have is 6 credits, and cannot exceed the maximum limit. If the prompting user activates "thinking mode", the countdown is extended to 150 seconds for the cost of 2 credits. == Reception == The site has garnered attention and praise from X users, and across many online communities. The Daily Dot's Rachel Kiley wrote that "the best part about the game is that there's really no right or wrong way to do it. Humans aren't LLMs trained on copyrighted material and the whole of the free internet, but we do retain a certain amount of the information we've learned from those things over the course of our lives, while also being capable of creativity". Chris Taylor of Mashable called the site "amateurish and charming". Aftermath's Nicole Carpenter wrote that the site reminded her of "the human touch of chaos".
DBGallery
DBGallery, short for Database Gallery, is a cloud-based Software as a Service (SaaS) and on-prem webserver for teams of various sizes. DBGallery enables users to centrally store, manage, catalog, archive, and securely share image, video, and document files. It facilitates version control, detects duplicates, and offers an intuitive and advanced search functionality, making assets easily accessible to all users. It takes advantage of current AI technologies to automatically add significant metadata to images, facilitates custom-trained AI models, and offers bespoke AI features. Additionally, DBGallery provides team management tools, workflow management, an activity audit trail, and other collaborative features that foster a productive environment for both internal and external stakeholders. == History == DBGallery's first public release was December 2007. Since then each year has seen continuous enhancements. 2013 added support for additional non-English languages in its meta-data. 2014 added support for creating custom data fields for tagging and search. In 2015 included the ability to auto-tag images using Reverse Geocoding. 2018 added artificial intelligence (AI) image recognition as a further addition to auto-tagging. March 2020 added complete image collection management via the web (e.g. file and folder drag and drop), a new collection dashboard, custom data layouts, and an improved audit trail. 2021 saw user experience improvements provided by improved styling and performance enhancements. Version 12 was released in October 2021. It added the ability to upload unlimited file sizes and made significant performance improvements for very large collections. June 2022 saw the release of a global duplicate images search. In late 2022, DBGallery began offering significantly reduced cloud storage cost, at a third of its previous prices, which played into its recent high-volume/high-capacity capabilities and its clients' subsequent demand for additional storage. 2023 saw improvements in user and role management, introduced it's mobile app (PWA), and improved custom-trained object detection. Release 14.0 in the spring of 2024 had large sharing improvements and a new find related images feature. Winter 2025's v15 release introduced AI-generated image descriptions, image-to-text, and facial recognition.
Artificial intelligence in India
The artificial intelligence (AI) market in India is projected to reach $8 billion by 2025, growing at 40% CAGR from 2020 to 2025. This growth is part of the broader AI boom, a global period of rapid technological advancements with India being pioneer starting in the early 2010s with NLP based Chatbots from Haptik, Corover.ai, Niki.ai and then gaining prominence in the early 2020s based on reinforcement learning, marked by breakthroughs such as generative AI models from Krutrim, Sarvam, CoRover, OpenAI and Alphafold by Google DeepMind. In India, the development of AI has been similarly transformative, with applications in healthcare, finance, and education, bolstered by government initiatives like NITI Aayog's 2018 National Strategy for Artificial Intelligence. Institutions such as the Indian Statistical Institute and the Indian Institute of Science published breakthrough AI research papers and patents. India's transformation to AI is primarily being driven by startups and government initiatives & policies like Digital India. By fostering technological trust through digital public infrastructure, India is tackling socioeconomic issues by taking a bottom-up approach to AI. NASSCOM and Boston Consulting Group estimate that by 2027, India's AI services might be valued at $17 billion. According to 2025 Technology and Innovation Report, by UN Trade and Development, India ranks 10th globally for private sector investments in AI. According to Mary Meeker, India has emerged as a key market for AI platforms, accounting for the largest share of ChatGPT's mobile app users and having the third-largest user base for DeepSeek in 2025. While AI presents significant opportunities for economic growth and social development in India, challenges such as data privacy concerns, skill shortages, and ethical considerations need to be addressed for responsible AI deployment. The growth of AI in India has also led to an increase in the number of cyberattacks that use AI to target organizations. == History == === Early days (1960s-1980s) === The TIFRAC (Tata Institute of Fundamental Research Automatic Calculator) was designed and developed by a team led by Rangaswamy Narasimhan between 1954 and 1960. He worked on pattern recognition from 1961 to 1964 at the University of Illinois Urbana-Champaign's Digital Computer Laboratory. In order to conduct research on database technology, computer networking, computer graphics, and systems software, he and M. G. K. Menon founded the National Centre for Software Development and Computing Techniques. In 1965, he established the Computer Society of India and supervised the initial research work on AI at Tata Institute of Fundamental Research. Jagdish Lal launched the first computer science program in 1976 at Motilal Nehru Regional Engineering College. H. K. Kesavan from the University of Waterloo and Vaidyeswaran Rajaraman from the University of Wisconsin–Madison joined the IIT Kanpur Electrical Engineering Department in 1963–1964 as Assistant Professor and Head of Department, respectively. H.N. Mahabala, who was employed at Bendix Corporation's Computer Division, joined the department in 1965. He previously worked with Marvin Minsky. The IIT Kanpur Computer Center was led by H. K. Kesavan, with Vaidyeswaran Rajaraman serving as his deputy. Kesavan informally permitted Rajaraman and Mahabala to introduce artificial intelligence into computer science classes. The computer science program was approved by IIT Kanpur in 1971 and split out from the electrical engineering department. In 1973, an IBM System/370 Model 155 was installed at IIT Madras. John McCarthy, head of the Artificial Intelligence Laboratory at Stanford University visited IIT Kanpur in 1971. He donated PDP-1 with a time-sharing operating system. During the 1970s, the balance of payments deficit in India restricted import of computers. The Department of Computer Science and Automation at the Indian Institute of Science established in 1969, played an important role in nurturing the development of data science and artificial intelligence in India. First course on AI was introduced in the 1970s by G. Krishna. B. L. Deekshatulu introduced the first course on pattern recognition in the early 1970s. === Foundation phase === ==== 1980s ==== In the 1980s, the Indian Statistical Institute's Optical Character Recognition Project was one of the country's first attempts at studying artificial intelligence and machine learning. OCR technology has benefited greatly from the work of ISI's Computer Vision and Pattern Recognition Unit, which is headed by Bidyut Baran Chaudhuri. He also contributed in the development of computer vision and digital image processing. As part of the Indian Fifth Generation Computer Systems Research Programme, the Department of Electronics, with support from the United Nations Development Programme, initiated the Knowledge Based Computer Systems Project in 1986, marking the beginning of India's first major AI research program. Prime Minister Rajiv Gandhi requested that the Department of Electronics and IISc to initiate the Parallel Processing Project in 1986–1987. The Center for Development of Advanced Computing eventually joined those efforts. IIT Madras was selected to develop system diagnosis, ISI for image processing, National Centre for Software Technology for natural language processing and TIFR for speech processing. In 1987, the proposal of N. Seshagiri, Director General of the National Informatics Centre for the prototype development of supercomputer was cleared. Negotiations for a Cray supercomputer were underway between the Reagan administration and the Rajiv Gandhi government. US Defense Secretaries Frank Carlucci and Caspar Weinberger visited New Delhi after the US approved the transfer in 1988. The sale of a lower-end XMP-14 supercomputer was permitted in lieu of the Cray XMP-24 supercomputer due to security concerns. The Center for Development of Advanced Computing was formally established in March 1988 by the Ministry of Communications and Information Technology (previously the Ministry of IT) within the Department of Information Technology (formerly the Department of Electronics) in response to a recommendation made to the Prime Minister by the Scientific Advisory Council. The National Initiative in Supercomputing, which produced the PARAM series, was led by Vijay P. Bhatkar. For the first ten years, supercomputing and Indian language computing were the two main focus areas. C-DAC has expanded its operations in order to meet the needs in a number of domains, including network and internet software, real-time systems, artificial intelligence, and NLP. Under the direction of Professor KV Ramakrishnamacharyulu from National Sanskrit University and Professor Rajeev Sangal from the International Institute of Information Technology, Hyderabad, the Akshar Bharati Research Group was established in 1984 with support from IIT Kanpur and the University of Hyderabad for computational processing of Indian languages. They focused on computational linguistics, NLP with ontological database systems, and Indian language/translation theories with linguistic tradition. ==== 1990s ==== From IIT Kanpur, Mohan Tambe joined C-DAC in the 1990s to work on Graphics and Intelligence based Script Technology (GIST), which addressed the challenge of adapting personal computer software based on Latin script to Devanagiri and a number of other Indian language scripts. He was previously working on the Machine Translation for Indian languages Project. Within C-DAC, he established the GIST group. The technology was expanded to encompass NLP, artificial intelligence-based machine-aided language learning and translation, multimedia and multilingual computing solutions, and more. GIST resulted in the creation of G-CLASS (GIST cross language search plug-ins suite), a cross-language search engine. The Applied Artificial Intelligence Group at C-DAC has developed some basic and novel applications in the field of NLP, including machine translation, information extraction/retrieval, automatic summarization, speech recognition, text-to-speech synthesis, intelligent language teaching, and natural language-based document management with Decision Support Systems. These applications are the result of the foundation laid by previous language technology activities. Software firms in the Indian private sector began looking into AI applications, mostly in the area of business process automation. In order to allow machines to read, comprehend, and interpret human languages, the Language Technologies Research Center was founded in October 1999 at the International Institute of Information Technology, Hyderabad. It focused on the advancements in semantic parsing, information extraction, natural language generation, sentiment analysis, and dialogue systems. Some of the early AI research in India was driven by societal needs. For example; Eklavya, a knowledge-based program created by I
Chandy–Misra–Haas algorithm resource model
The Chandy–Misra–Haas algorithm resource model checks for deadlock in a distributed system. It was developed by K. Mani Chandy, Jayadev Misra and Laura M. Haas. == Locally dependent == Consider the n processes P1, P2, P3, P4, P5,, ... ,Pn which are performed in a single system (controller). P1 is locally dependent on Pn, if P1 depends on P2, P2 on P3, so on and Pn−1 on Pn. That is, if P 1 → P 2 → P 3 → … → P n {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}} , then P 1 {\displaystyle P_{1}} is locally dependent on P n {\displaystyle P_{n}} . If P1 is said to be locally dependent to itself if it is locally dependent on Pn and Pn depends on P1: i.e. if P 1 → P 2 → P 3 → … → P n → P 1 {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}\rightarrow P_{1}} , then P 1 {\displaystyle P_{1}} is locally dependent on itself. == Description == The algorithm uses a message called probe(i,j,k) to transfer a message from controller of process Pj to controller of process Pk. It specifies a message started by process Pi to find whether a deadlock has occurred or not. Every process Pj maintains a boolean array dependent which contains the information about the processes that depend on it. Initially the values of each array are all "false". === Controller sending a probe === Before sending, the probe checks whether Pj is locally dependent on itself. If so, a deadlock occurs. Otherwise it checks whether Pj, and Pk are in different controllers, are locally dependent and Pj is waiting for the resource that is locked by Pk. Once all the conditions are satisfied it sends the probe. === Controller receiving a probe === On the receiving side, the controller checks whether Pk is performing a task. If so, it neglects the probe. Otherwise, it checks the responses given Pk to Pj and dependentk(i) is false. Once it is verified, it assigns true to dependentk(i). Then it checks whether k is equal to i. If both are equal, a deadlock occurs, otherwise it sends the probe to next dependent process. == Algorithm == In pseudocode, the algorithm works as follows: === Controller sending a probe === if Pj is locally dependent on itself then declare deadlock else for all Pj,Pk such that (i) Pi is locally dependent on Pj, (ii) Pj is waiting for 'Pk and (iii) Pj, Pk are on different controllers. send probe(i, j, k). to home site of Pk === Controller receiving a probe === if (i)Pk is idle / blocked (ii) dependentk(i) = false, and (iii) Pk has not replied to all requests of to Pj then begin "dependents""k"(i) = true; if k == i then declare that Pi is deadlocked else for all Pa,Pb such that (i) Pk is locally dependent on Pa, (ii) Pa is waiting for 'Pb and (iii) Pa, Pb are on different controllers. send probe(i, a, b). to home site of Pb end == Example == P1 initiates deadlock detection. C1 sends the probe saying P2 depends on P3. Once the message is received by C2, it checks whether P3 is idle. P3 is idle because it is locally dependent on P4 and updates dependent3(2) to True. As above, C2 sends probe to C3 and C3 sends probe to C1. At C1, P1 is idle so it update dependent1(1) to True. Therefore, deadlock can be declared. == Complexity == Suppose there are n {\displaystyle n} controllers and m {\displaystyle m} processes, at most m ( n − 1 ) / 2 {\displaystyle m(n-1)/2} messages need to be exchanged to detect a deadlock, with a delay of O ( n ) {\displaystyle O(n)} messages.
Geospatial metadata
Geospatial metadata (also geographic metadata) is a type of metadata applicable to geographic data and information. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog (may also be known as a data directory or data inventory). == Definition == ISO 19115:2013 "Geographic Information – Metadata" from ISO/TC 211, the industry standard for geospatial metadata, describes its scope as follows: [This standard] provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services. ISO 19115:2013 also provides for non-digital mediums: Though this part of ISO 19115 is applicable to digital data and services, its principles can be extended to many other types of resources such as maps, charts, and textual documents as well as non-geographic data. The U.S. Federal Geographic Data Committee (FGDC) describes geospatial metadata as follows: A metadata record is a file of information, usually presented as an XML document, which captures the basic characteristics of a data or information resource. It represents the who, what, when, where, why and how of the resource. Geospatial metadata commonly document geographic digital data such as Geographic Information System (GIS) files, geospatial databases, and earth imagery but can also be used to document geospatial resources including data catalogs, mapping applications, data models and related websites. Metadata records include core library catalog elements such as Title, Abstract, and Publication Data; geographic elements such as Geographic Extent and Projection Information; and database elements such as Attribute Label Definitions and Attribute Domain Values. == History == The growing appreciation of the value of geospatial metadata through the 1980s and 1990s led to the development of a number of initiatives to collect metadata according to a variety of formats either within agencies, communities of practice, or countries/groups of countries. For example, NASA's "DIF" metadata format was developed during an Earth Science and Applications Data Systems Workshop in 1987, and formally approved for adoption in 1988. Similarly, the U.S. FGDC developed its geospatial metadata standard over the period 1992–1994. The Spatial Information Council of Australia and New Zealand (ANZLIC), a combined body representing spatial data interests in Australia and New Zealand, released version 1 of its "metadata guidelines" in 1996. ISO/TC 211 undertook the task of harmonizing the range of formal and de facto standards over the approximate period 1999–2002, resulting in the release of ISO 19115 "Geographic Information – Metadata" in 2003 and a subsequent revision in 2013. As of 2011 individual countries, communities of practice, agencies, etc. have started re-casting their previously used metadata standards as "profiles" or recommended subsets of ISO 19115, occasionally with the inclusion of additional metadata elements as formal extensions to the ISO standard. The growth in popularity of Internet technologies and data formats, such as Extensible Markup Language (XML) during the 1990s led to the development of mechanisms for exchanging geographic metadata on the web. In 2004, the Open Geospatial Consortium released the current version (3.1) of Geography Markup Language (GML), an XML grammar for expressing geospatial features and corresponding metadata. With the growth of the Semantic Web in the 2000s, the geospatial community has begun to develop ontologies for representing semantic geospatial metadata. Some examples include the Hydrology and Administrative ontologies developed by the Ordnance Survey in the United Kingdom. == ISO 19115: Geographic information – Metadata == ISO 19115 is a standard of the International Organization for Standardization (ISO). The standard is part of the ISO geographic information suite of standards (19100 series). ISO 19115 and its parts define how to describe geographical information and associated services, including contents, spatial-temporal purchases, data quality, access and rights to use. The objective of this International Standard is to provide a clear procedure for the description of digital geographic data-sets so that users will be able to determine whether the data in a holding will be of use to them and how to access the data. By establishing a common set of metadata terminology, definitions and extension procedures, this standard promotes the proper use and effective retrieval of geographic data. ISO 19115 was revised in 2013 to accommodate growing use of the internet for metadata management, as well as add many new categories of metadata elements (referred to as codelists) and the ability to limit the extent of metadata use temporally or by user. == ISO 19139 Geographic information Metadata XML schema implementation == ISO 19139:2012 provides the XML implementation schema for ISO 19115 specifying the metadata record format and may be used to describe, validate, and exchange geospatial metadata prepared in XML. The standard is part of the ISO geographic information suite of standards (19100 series), and provides a spatial metadata XML (spatial metadata eXtensible Mark-up Language (smXML)) encoding, an XML schema implementation derived from ISO 19115, Geographic information – Metadata. The metadata includes information about the identification, constraint, extent, quality, spatial and temporal reference, distribution, lineage, and maintenance of the digital geographic data-set. == Metadata directories == Also known as metadata catalogues or data directories. (need discussion of, and subsections on GCMD, FGDC metadata gateway, ASDD, European and Canadian initiatives, etc. etc.) GIS Inventory – National GIS Inventory System which is maintained by the US-based National States Geographic Information Council (NSGIC) as a tool for the entire US GIS Community. Its primary purpose is to track data availability and the status of geographic information system (GIS) implementation in state and local governments to aid the planning and building of statewide spatial data infrastructures (SSDI). The Random Access Metadata for Online Nationwide Assessment (RAMONA) database is a critical component of the GIS Inventory. RAMONA moves its FGDC-compliant metadata (CSDGM Standard) for each data layer to a web folder and a Catalog Service for the Web (CSW) that can be harvested by Federal programs and others. This provides far greater opportunities for discovery of user information. The GIS Inventory website was originally created in 2006 by NSGIC under award NA04NOS4730011 from the Coastal Services Center, National Oceanic and Atmospheric Administration, U.S. Department of Commerce. The Department of Homeland Security has been the principal funding source since 2008 and they supported the development of the Version 5 during 2011/2012 under Order Number HSHQDC-11-P-00177. The Federal Emergency Management Agency and National Oceanic and Atmospheric Administration have provided additional resources to maintain and improve the GIS Inventory. Some US Federal programs require submission of CSDGM-Compliant Metadata for data created under grants and contracts that they issue. The GIS Inventory provides a very simple interface to create the required Metadata. GCMD - Global Change Master Directory's goal is to enable users to locate and obtain access to Earth science data sets and services relevant to global change and Earth science research. The GCMD database holds more than 20,000 descriptions of Earth science data sets and services covering all aspects of Earth and environmental sciences. ECHO - The EOS Clearing House (ECHO) is a spatial and temporal metadata registry, service registry, and order broker. It allows users to more efficiently search and access data and services through the Reverb Client or Application Programmer Interfaces (APIs). ECHO stores metadata from a variety of science disciplines and domains, totalling over 3400 Earth science data sets and over 118 million granule records. GoGeo - GoGeo is a service run by EDINA (University of Edinburgh) and is supported by Jisc. GoGeo allows users to conduct geographically targeted searches to discover geospatial datasets. GoGeo searches many data portals from the HE and FE community and beyond. GoGeo also allows users to create standards compliant metadata through its Geodoc metadata editor. == Geospatial metadata tools == There are many proprietary GIS or geospatial products that support metadata viewing and editing on GIS resources. For example, ESRI's ArcGIS Desktop, SOCET GXP, Autodesk's AutoCAD Map 3D 2008, Arcitecta's Mediaflux and Intergraph's Geo
Meta-Labeling
Meta-labeling, also known as corrective AI, is a machine learning (ML) technique utilized in quantitative finance to enhance the performance of investment and trading strategies, developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. The core idea is to separate the decision of trade direction (side) from the decision of trade sizing, addressing the inefficiencies of simultaneously learning both side and size predictions. The side decision involves forecasting market movements (long, short, neutral), while the size decision focuses on risk management and profitability. It serves as a secondary decision-making layer that evaluates the signals generated by a primary predictive model. By assessing the confidence and likely profitability of those signals, meta-labeling allows investors and algorithms to dynamically size positions and suppress false positives. == Motivation == Meta-labeling is designed to improve precision without sacrificing recall. As noted by López de Prado, attempting to model both the direction and the magnitude of a trade using a single algorithm can result in poor generalization. By separating these tasks, meta-labeling enables greater flexibility and robustness: Enhances control over capital allocation. Reduces overfitting by limiting model complexity. Allows the use of interpretability tools and tailored thresholds to manage risk. Enables dynamic trade suppression in unfavorable regimes. == Applications == Meta-labeling has been applied in a variety of financial ML contexts, including: Algorithmic trading: Filtering and sizing trades to reduce false positives. Portfolio optimization: Scaling exposure across multiple signals with differing confidence levels. Risk management: Dynamically disabling strategies in adverse market conditions. Model validation: Interpreting when and why a model may be underperforming due to regime shifts. == General architecture == Meta-labeling decouples two core components of systematic trading strategies: directional prediction and position sizing. The process involves training a primary model to generate trade signals (e.g., buy, sell, or hold) and then training a secondary model to determine whether each signal is likely to lead to a profitable trade. The second model outputs a probability that is interpreted as the confidence in the forecast, which can be used to adjust the position size or to filter out unreliable trades. Meta-labeling is typically implemented as a three-stage process: Primary model (M1): Predicts the direction or label of a financial outcome using features such as market prices, returns, or volatility indicators. A typical output is directional, e.g., Y ∈ {−1,0,1}, representing short, neutral, or long positions. Secondary model (M2): A binary classifier trained to predict whether the primary model's prediction will be profitable. The target variable is a binary meta-label F ∈ { 0 , 1 } {\displaystyle F\in \{0,1\}} . Inputs can include features used in the primary model, performance diagnostics, or market regime data. Position sizing algorithm (M3): Translates the output probability of the secondary model into a position size. Higher confidence scores result in larger allocations, while lower confidence leads to reduced or zero exposure. === Stage 1: Forecasting side === Primary model architecture Figure 1 Figure 1 presents the architecture of a primary model. It focuses on forecasting the side of the trade. Following the example, this model (M1) takes in input data – such as open-high-low-close data and determines the side of the position to take: a negative number is a short position, and positive number is a long position, the range is set between −1 and 1 (the closer it is to −1 or 1, the stronger the models conviction is). When training the model, the labels are −1 and 1, based on the direction of forward returns for some predefined investment horizon. The researcher may decide to apply a recall check (τ: "Tau") by setting a minimum threshold that the initial output needs to be to qualify of a short or long position (if the threshold is not met, no side forecast is predicted, leading to closing of any open positions), this leads to the primary model output which is one of three possible side forecasts: −1, 0, or 1. The primary model also generates evaluation data which can be used by the secondary model, to improve performance of size forecasts. Some examples of evaluation data include rolling accuracy, F1, recall, precision, and AUC scores. === Stage 2: Filtering out false positives === General meta-labeling architecture Figure 2 Next comes the phase of filtering out false positives, by applying a secondary machine learning model (M2), which is a binary classifier trained to determine if the trade will be profitable or not. The model takes as input four general groupings of data: General input data which is predictive of a false positive. For example the last 30 days rolling volatility of the underlying asset. Evaluation data. Market state and regime data, one may find that macro economic data or clustering the market into regimes may help as specific trading strategies are known to perform better in particular regimes. Example: momentum based strategies perform best in periods with low volatility and strong directional moves. Primary models initial input which is a value between −1 and 1. This highlights the strength of the primary models conviction. The output of the model is a value between −1 and 1 (if using a Tanh function) which will indicate the strength of the conviction that a short or long position is profitable, or it could simply be between 0 and 1 (using a sigmoid function) if one only wanted to know if it made money or not. This output allows filtering out trades that are likely to lead to losses. One could stop at this point or use the outputs of the secondary model as inputs to a position sizing algorithm (M3) which could further enhance strategy performance metrics by translating the output probability of the secondary model into a position size. Higher confidence scores result in larger allocations, while lower confidence leads to reduced or zero exposure. === Stage 3: Optimizing position sizes === ==== Position sizing methods (M3) ==== Various algorithms have been proposed for transforming predicted probabilities into trade sizes: All-or-nothing: Allocate 100% of capital if the probability exceeds a predefined threshold (e.g., 0.5); otherwise, do not trade. Model confidence: Use the probability score directly as the fraction of capital allocated. Linear scaling: Rescale the model's probabilities using min-max normalization based on the training data. Normal CDF (NCDF): Use a normal cumulative distribution function applied to a z-statistic derived from the predicted probability. Empirical CDF (ECDF): Rank probabilities based on their percentile in the training data to ensure relative allocation. Sigmoid Optimal Position Sizing (SOPS): Applies a smooth non-linear sigmoid transformation optimized to maximize risk-adjusted returns (Sharpe ratio). ==== Model calibration ==== Each machine learning algorithm used in meta-labeling tends to produce outputs with different characteristic distributions; for example, some are approximately normally distributed, whereas others exhibit a pronounced U-shape, concentrating probabilities near the extremes. Due to these varying distributions, simply summing the outputs of different models can inadvertently lead to uneven weighting of signals, biasing trade decisions. To address this, model calibration techniques are essential to adjust the predicted probabilities towards frequentist probabilities, ensuring that model outputs reflect true likelihoods more accurately. Two common calibration techniques are: Platt scaling (Sigmoid scaling): Suitable for correcting S-shaped calibration plots typically produced by models such as support vector machines (SVMs). Isotonic regression: Fits a non-decreasing step function to probabilities and is effective particularly with larger datasets, though it can sometimes lead to overfitting. Transforming predictions to frequentist probabilities is crucial as it provides probabilistic outputs that are directly interpretable as the actual likelihood of an event occurring. Such calibration significantly enhances the effectiveness of fixed position sizing methods, reducing maximum drawdowns and increasing risk-adjusted returns. However, calibration has less impact on position sizing methods that directly estimate parameters from the training data, such as ECDF and SOPS, suggesting that calibration is a critical step mainly for fixed methods that rely heavily on raw model outputs. =
Document
A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The etymology of the word "document" derives from the Latin documentum, which denotes a "teaching" or "lesson": the verb doceō denotes "to teach". Historically, the term "document" was usually used to indicate written proof useful as evidence of a truth or fact. In the Computer Age, the term "document" typically refers to a primarily textual computer file, encompassing its structural and format elements, such as fonts, colors, and images. In the contemporary era, the definition of "document" has expanded beyond its traditional medium, such as paper, to encompass electronic documents as well. History, events, examples, opinions, stories, and creativity can all be expressed in documents. "Documentation" is distinct because it has more denotations than "document". Documents are also distinguished from "realia", which are three-dimensional objects that would otherwise satisfy the definition of "document" because they memorialize or represent thought. Documents are usually considered to be two-dimensional representations. == Abstract definitions == The concept of "document" has been defined by Suzanne Briet as "any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental." An often-cited article concludes that "the evolving notion of document" among Jonathan Priest, Paul Otlet, Briet, Walter Schürmeyer, and the other documentalists increasingly emphasized whatever functioned as a document rather than traditional physical forms of documents. The shift to digital technology would seem to make this distinction even more important. David M. Levy has said that an emphasis on the technology of digital documents has impeded our understanding of digital documents as documents. A conventional document, such as a mail message or a technical report, exists physically in digital technology as a string of bits, as does everything else in a digital environment. As an object of study, it has been made into a document. It has become physical evidence by those who study it. "Document" is defined in library and information science and documentation science as a fundamental, abstract idea: the word denotes everything that may be represented or memorialized to serve as evidence. The classic example provided by Briet is an antelope: "An antelope running wild on the plains of Africa should not be considered a document[;] she rules. But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document." This opinion has been interpreted as an early expression of actor–network theory. == Kinds == A document can be structured, like tabular documents, lists, forms, or scientific charts, semi-structured like a book or a newspaper article, or unstructured like a handwritten note. Documents are sometimes classified as secret, private, or public. They may also be described as drafts or proofs. When a document is copied, the source is denominated the "original". Documents are used in numerous fields, e.g.: Academia: manuscript, thesis, paper, journal, chart, and technical drawing Media: mock-up, script, image, photography, and newspaper article Administration, law, and politics: application, brief, certificate, commission, constitutional document, form, gazette, identity document, license, manifesto, summons, census, and white paper Business: invoice, request for proposal, proposal, contract, packing slip, manifest, report (detailed and summary), spreadsheet, material safety data sheet, waybill, bill of lading, financial statement, nondisclosure agreement (NDA), mutual nondisclosure agreement, and user guide Geography and planning: topographic map, cadastre, legend, and architectural plan Such standard documents can be drafted based on a template. == Drafting == The page layout of a document is how information is graphically arranged in the space of the document, e.g., on a page. If the appearance of the document is of concern, the page layout is generally the responsibility of a graphic designer. Typography concerns the design of letter and symbol forms and their physical arrangement in the document (see typesetting). Information design concerns the effective communication of information, especially in industrial documents and public signs. Simple textual documents may not require visual design and may be drafted only by an author, clerk, or transcriber. Forms may require a visual design for their initial fields, but not to complete the forms. == Media == Traditionally, the medium of a document was paper and the information was applied to it in ink, either by handwriting (to make a manuscript) or by a mechanical process (e.g., a printing press or laser printer). Today, some short documents also may consist of sheets of paper stapled together. Historically, documents were inscribed with ink on papyrus (starting in ancient Egypt) or parchment; scratched as runes or carved on stone using a sharp tool, e.g., the Tablets of Stone described in the Bible; stamped or incised in clay and then baked to make clay tablets, e.g., in the Sumerian and other Mesopotamian civilizations. The papyrus or parchment was often rolled into a scroll or cut into sheets and bound into a codex (book). Contemporary electronic means of memorializing and displaying documents include: Monitor of a desktop computer, laptop, tablet; optionally with a printer to produce a hard copy; Personal digital assistant; Dedicated e-book device; Electronic paper, typically, using the Portable Document Format (PDF); Information appliance; Digital audio player; and Radio and television service provider. Digital documents usually require a specific file format to be presentable in a specific medium. == In law == Documents in all forms frequently serve as material evidence in criminal and civil proceedings. The forensic analysis of such a document is within the scope of questioned document examination. To catalog and manage the large number of documents that may be produced during litigation, Bates numbering is often applied to all documents in the lawsuit so that each document has a unique, arbitrary, identification number.