Density-based clustering validation

Density-based clustering validation

Density-Based Clustering Validation (DBCV) is a metric designed to assess the quality of clustering solutions, particularly for density-based clustering algorithms like DBSCAN, Mean shift, and OPTICS. This metric is particularly suited for identifying concave and nested clusters, where traditional metrics such as the Silhouette coefficient, Davies–Bouldin index, or Calinski–Harabasz index often struggle to provide meaningful evaluations. Unlike traditional validation measures, which often rely on compact and well-separated clusters, DBCV index evaluates how well clusters are defined in terms of local density variations and structural coherence. This metric was introduced in 2014 by David Moulavi and colleagues in their work. It utilizes density connectivity principles to quantify clustering structures, making it especially effective at detecting arbitrarily shaped clusters in concave datasets, where traditional metrics may be less reliable. The DBCV index has been employed for clustering analysis in bioinformatics, ecology, techno-economy, and health informatics , as well as in numerous other fields. == Definition == DBCV index evaluates clustering structures by analyzing the relationships between data points within and across clusters. Given a dataset X = x 1 , x 2 , . . . , x n {\displaystyle X={x_{1},x_{2},...,x_{n}}} , a density-based algorithm partitions it into K clusters C 1 , C 2 , . . . , C K {\displaystyle {C_{1},C_{2},...,C_{K}}} . Each point x i {\displaystyle x_{i}} belongs to a specific cluster, denoted as C c l u s t e r ( x i ) {\displaystyle C_{cluster(x_{i})}} A key concept in DBCV index is the notion of density-connected paths. Two points within the same cluster are considered density-connected if there exists a sequence of intermediate points linking them, where each consecutive pair meets a predefined density criterion. The density-based distance between two points is determined by identifying the optimal path that minimizes the maximum local reachability distance along its trajectory. DBCV index extends the Silhouette coefficient by redefining cluster cohesion and separation using density-based distances: Within-cluster density distance measures how closely a point is related to other members of its cluster: a i = 1 | C c l u s t e r ( x i ) | − 1 ∑ x j ∈ C c l u s t e r ( x i ) , y ≠ x d d e n s i t y ( x j , x i ) {\displaystyle a_{i}={\frac {1}{|C_{cluster(x_{i})}|-1}}\sum _{x_{j}\in C_{cluster(x_{i})},y\neq x}d_{density}(x_{j},x_{i})} Nearest-cluster density distance quantifies how far a point is from the closest external cluster: b i = min C ≠ C cluster ( x i ) C ∈ { C 1 , … , C K } ( 1 | C | ∑ x j ∈ C d density ( x i , x j ) ) . {\displaystyle b_{i}=\min _{C\neq C_{{\text{cluster}}(x_{i})} \atop C\in \{C_{1},\dots ,C_{K}\}}\left({\frac {1}{|C|}}\sum _{x_{j}\in C}d_{\text{density}}(x_{i},x_{j})\right).} Using these measures, the DBCV index is computed as: D B C V = 1 n ∑ i = 1 n b i − a i max ( a i , b i ) {\displaystyle DBCV={\frac {1}{n}}\sum _{i=1}^{n}{\frac {b_{i}-a_{i}}{\max(a_{i},b_{i})}}} == Explanation == DBCV index values range between −1 and +1: +1: Strongly cohesive and well-separated clusters. 0: Ambiguous clustering structure. −1: Poorly formed clusters or incorrect assignments. By leveraging density-based distances instead of traditional Euclidean measures, DBCV index provides a more robust evaluation of clustering performance in datasets with irregular or non-spherical distributions.

Concordancer

A concordancer is a computer program that automatically constructs a concordance—an alphabetised index of every occurrence of a word or phrase in a body of text, each entry displayed with its surrounding context. Concordancers are primary tools in corpus linguistics, lexicography, computer-assisted translation, and language teaching. The most common display format is the key word in context (KWIC) layout, in which each hit appears centred on a line with a fixed span of words to its left and right, enabling rapid scanning of usage patterns across many occurrences. == History == === Pre-computational concordances === The compilation of concordances predates computers by many centuries. Around 1230, the French Dominican cardinal Hugh of Saint-Cher directed a team of friars in assembling a concordance of the Latin Vulgate Bible, generally regarded as the first systematic concordance of any text. To help readers locate passages, Hugh divided each biblical chapter into lettered sections. Later milestones include a Hebrew Old Testament concordance compiled by Rabbi Mordecai Nathan (1448), Alexander Cruden's Complete Concordance to the Holy Scriptures (1737), and the manuscript Asaf ha-Mazkir, an unfinished concordance to the Babylonian Talmud compiled by Moses Rigotz around the turn of the 19th century. === First computer concordance === The first concordance produced with computing assistance was the Index Thomisticus, a comprehensive lexical index of the writings of and around Thomas Aquinas, totalling approximately 10.6 million Latin words. The Italian Jesuit priest Roberto Busa conceived the project in 1946 and secured the sponsorship of IBM in 1949 after a meeting with chairman Thomas J. Watson. Keypunch operators in Gallarate, Italy, encoded the texts onto punched cards from around 1950. IBM executive Paul Tasman developed the processing methods. The full 56-volume printed edition was completed around 1980, followed by a CD-ROM edition in 1989 and a web-accessible version in 2005. === The KWIC format === The key word in context (KWIC) display was formalised as a computational technique by Hans Peter Luhn, a researcher at IBM, in a 1960 paper in American Documentation. In KWIC output, each instance of the search term (the node word) is centred on a line with a fixed window of words to each side; sorting the resulting lines alphabetically by the immediately adjacent word reveals collocational and phraseological patterns at a glance. === COCOA === One of the first dedicated concordancing programs was COCOA (COunt and COncordance Generation on Atlas), created in 1965 by D. B. Russell at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Written in approximately 4,000 cards of FORTRAN, it processed text annotated with flat, non-hierarchical markup tags and could produce word counts and concordances in multiple languages. Within its first six months COCOA had been applied to texts in at least six languages. A second version designed for multiple mainframe platforms was distributed to British computing centres in the mid-1970s. Growing dissatisfaction with its interface and the eventual withdrawal of Atlas Laboratory support prompted British funding bodies to commission a successor program. === Oxford Concordance Program === The Oxford Concordance Program (OCP) was designed and written in FORTRAN by Susan Hockey and Ian Marriott at Oxford University Computing Services (OUCS) between 1979 and 1980 and first released in 1981. Hockey and Marriott acknowledged that OCP owed much to COCOA and the CLOC system at the University of Birmingham. OCP accepted COCOA-format markup to encode metadata such as author, act, scene, and line number, and was described by its authors as "a machine-independent text analysis program for producing word lists, indices and concordances in a variety of languages and alphabets." By the mid-1980s it had been licensed to approximately 240 institutions in 23 countries. A personal computer version, Micro-OCP, was developed for the IBM PC and sold by Oxford University Press from the late 1980s. Version 2 was rewritten in 1985–86 and documented in the same 1987 article by Hockey and co-author John Martin. === Personal computer era === The availability of affordable personal computers in the 1980s and 1990s enabled standalone concordancing applications that analysts could run locally without specialist computing facilities. MicroConcord, developed by Mike Scott and Tim Johns and published by Oxford University Press in 1993 for MS-DOS, was among the first concordancers designed specifically for classroom language teaching. WordSmith Tools, also developed by Mike Scott, was first released in 1996 and became one of the most widely used corpus analysis suites in academic linguistics research. Other tools from this era include TACT (University of Toronto, 1989), a suite of MS-DOS freeware programs for literary text analysis, and MonoConc, a Windows concordancer created by Michael Barlow. === Web-based concordancers === From the late 1990s onwards, web-based concordancers hosted on remote servers gave researchers browser access to large preloaded corpora without requiring local storage or processing. The Sketch Engine, developed by Adam Kilgarriff and Pavel Rychlý (Masaryk University), was launched commercially in July 2003 by Lexical Computing Limited and introduced word sketches—automatically generated one-page profiles of a word's typical grammatical relations and collocations. AntConc, created by Laurence Anthony at Waseda University, Tokyo, was first released in 2002 as freeware for Windows, macOS, and Linux. == Features == Modern concordancers typically offer a range of analytical functions beyond basic KWIC display. These commonly include: KWIC display with the node word centred and context words in aligned columns, sortable by the word one, two, or three positions to the left or right of the node (L1–L3 and R1–R3) Concordance plots, visualising the distribution of hits as marks along a scaled bar representing each text in the corpus Frequency and word lists, both alphabetical and ranked by frequency Collocation statistics, identifying words that co-occur with the search term more often than chance, quantified by measures such as mutual information, the t-score, or log-likelihood Keyword analysis, comparing word frequencies between a study corpus and a reference corpus to identify statistically distinctive items N-gram analysis, finding frequently recurring word sequences of a specified length Part-of-speech tagging integration, allowing searches filtered to particular grammatical categories Unicode support for multilingual text Bilingual and parallel concordancers additionally display aligned text in two or more languages side by side, enabling comparison of translation equivalents across language pairs. == Notable concordancers == === WordSmith Tools === Created by Mike Scott and first released in 1996, WordSmith Tools is a Windows corpus analysis suite that evolved from MicroConcord. Its three core modules are Concord (KWIC concordances), WordList (frequency and alphabetical word lists), and Keywords (statistical keyword identification relative to a reference corpus). Oxford University Press used WordSmith Tools for dictionary preparation work. Version 4.0 is freely available; later versions are sold by Lexical Analysis Software Limited. === AntConc === AntConc is a freeware, multiplatform concordancing toolkit created by Laurence Anthony, Professor of Applied Linguistics at Waseda University, Tokyo. First released in 2002 and formally described in a 2005 academic paper, it runs on Windows, macOS, and Linux. Its tools include a KWIC concordancer, a concordance plot for visualising distribution across texts, a collocates tool, a keyword list, and an n-gram analysis module. Because it is free and requires only plain text files, AntConc is widely used in linguistics courses and independent research worldwide. === Sketch Engine === The Sketch Engine is a corpus management and query system co-created by Adam Kilgarriff and Pavel Rychlý and launched in 2003 by Lexical Computing Limited. It provides browser-based access to over 800 corpora in more than 100 languages. Beyond concordance searching, it offers word sketches, collocation analysis, distributional thesaurus construction, keyword and terminology extraction, and diachronic analysis. It is used by major publishers including Macmillan and Oxford University Press for lexicographic research. A subset tool, SKELL (Sketch Engine for Language Learning), is freely accessible to individual learners. === Wmatrix === Wmatrix is a web-based corpus processing environment developed by Paul Rayson at the University Centre for Computer Corpus Research on Language (UCREL), Lancaster University. Alongside concordances and frequency lists, Wmatrix integrates CLAWS part-of-speech tagging and the USAS semantic tagger, enabling keyword analysis simultane

Fuzzy markup language

Fuzzy Markup Language (FML) is a specific purpose markup language based on XML, used for describing the structure and behavior of a fuzzy system independently of the hardware architecture devoted to host and run it. == Overview == FML was designed and developed by Giovanni Acampora during his Ph.D. course in Computer Science, at University of Salerno, Italy, in 2004. The original idea inspired Giovanni Acampora to create FML was the necessity of creating a cooperative fuzzy-based framework aimed at automatically controlling a living environment characterized by a plethora of heterogeneous devices whose interactions were devoted to maximize the human comfort under energy saving constraints. This framework represented one of the first concrete examples of Ambient Intelligence. Beyond this pioneering application, the major advantage of using XML to describe a fuzzy system is hardware/software interoperability. Indeed, all that is needed to read an FML file is the appropriate schema for that file, and an FML parser. This markup approach makes it much easier to exchange fuzzy systems between software: for example, a machine learning application could extract fuzzy rules which could then be read directly into a fuzzy inference engine or uploaded into a fuzzy controller. Also, with technologies like XSLT, it is possible to compile the FML into the programming language of your choice, ready for embedding into whatever application you please. As stated by Mike Watts on his popular Computational Intelligence blog: "Although Acampora's motivation for developing FML seems to be to develop embedded fuzzy controllers for ambient intelligence applications, FML could be a real boon for developers of fuzzy rule extraction algorithms: from my own experience during my PhD, I know that having to design a file format and implement the appropriate parsers for rule extraction and fuzzy inference engines can be a real pain, taking as much time as implementing the rule extraction algorithm itself. I would much rather have used something like FML for my work." A complete overview of FML and related applications can be found in the book titled On the power of Fuzzy Markup Language edited by Giovanni Acampora, Chang-Shing Lee, Vincenzo Loia and Mei-Hui Wang, and published by Springer in the series Studies on Fuzziness and Soft Computing. == Syntax, grammar and hardware synthesis == FML allows fuzzy systems to be coded through a collection of correlated semantic tags capable of modeling the different components of a classical fuzzy controller such as knowledge base, rule base, fuzzy variables and fuzzy rules. Therefore, the FML tags used to build a fuzzy controller represent the set of lexemes used to create fuzzy expressions. In order to design a well-formed XML-based language, an FML context-free grammar is defined by means of a XML schema which defines name, type and attributes characterized each XML element. However, since an FML program represents only a static view of a fuzzy logic controller, XSLT is provided to change this static view to a computable version. Indeed, XSLTs modules are able to convert the FML-based fuzzy controller in a general purpose computer language using an XSL file containing the translation description. At this level, the control is executable for the hardware. In short, FML is essentially composed by three layers: XML in order to create a new markup language for fuzzy logic control; a XML Schema in order to define the legal building blocks; eXtensible Stylesheet Language Transformations (XSLT) in order to convert a fuzzy controller description into a specific programming language. === Syntax === FML syntax is composed of XML tags and attributes which describe the different components of a fuzzy logic controller listed below: fuzzy knowledge base; fuzzy rule base; inference engine fuzzification subsystem; defuzzification subsystem. In detail, the opening tag of each FML program is which represents the fuzzy controller under modeling. This tag has two attributes: name and ip. The first attribute permits to specify the name of fuzzy controller and ip is used to define the location of controller in a computer network. The fuzzy knowledge base is defined by means of the tag which maintains the set of fuzzy concepts used to model the fuzzy rule base. In order to define the fuzzy concept related controlled system, tag uses a set of nested tags: defines the fuzzy concept; defines a linguistic term describing the fuzzy concept; a set of tags defining a shape of fuzzy sets are related to fuzzy terms. The attributes of tag are: name, scale, domainLeft, domainRight, type and, for only an output, accumulation, defuzzifier and defaultValue. The name attribute defines the name of fuzzy concept, for instance, temperature; scale is used to define the scale used to measure the fuzzy concept, for instance, Celsius degree; domainLeft and domainRight are used to model the universe of discourse of fuzzy concept, that is, the set of real values related to fuzzy concept, for instance [0°,40°] in the case of Celsius degree; the position of fuzzy concept into rule (consequent part or antecedent part) is defined by type attribute (input/output); accumulation attribute defines the method of accumulation that is a method that permits the combination of results of a variable of each rule in a final result; defuzzifier attribute defines the method used to execute the conversion from a fuzzy set, obtained after aggregation process, into a numerical value to give it in output to system; defaultValue attribute defines a real value used only when no rule has fired for the variable at issue. As for tag , it uses two attributes: name used to identify the linguistic value associate with fuzzy concept and complement, a boolean attribute that defines, if it is true, it is necessary to consider the complement of membership function defined by given parameters. Fuzzy shape tags, used to complete the definition of fuzzy concept, are: Every shaping tag uses a set of attributes which defines the real outline of corresponding fuzzy set. The number of these attributes depends on the chosen fuzzy set shape. In order to make an example, consider the Tipper Inference System described in Mathworks Matlab Fuzzy Logic Toolbox Tutorial. This Mamdani system is used to regulate the tipping in, for example, a restaurant. It has got two variables in input (food and service) and one in output (tip). FML code for modeling part of knowledge base of this fuzzy system containing variables food and tip is shown below. A special tag that can furthermore be used to define a fuzzy shape is . This tag is used to customize fuzzy shape (custom shape). The custom shape modeling is performed via a set of tags that lists the extreme points of geometric area defining the custom fuzzy shape. Obviously, the attributes used in tag are x and y coordinates. As for rule base component, FML allows to define a set of rule bases, each one of them describes a different behavior of system. The root of each rule base is modeled by tag which defines a fuzzy rule set. The tag uses five attributes: name, type, activationMethod, andMethod and orMethod. Obviously, the name attribute uniquely identifies the rule base. The type attribute permits to specify the kind of fuzzy controller (Mamdani or TSK) respect to the rule base at issue. The activationMethod attribute defines the method used to implication process; the andMethod and orMethod attribute define, respectively, the and and or algorithm to use by default. In order to define the single rule the tag is used. The attributes used by the tag are: name, connector, operator and weight. The name attribute permits to identify the rule; connector is used to define the logical operator used to connect the different clauses in antecedent part (and/or); operator defines the algorithm to use for chosen connector; weight defines the importance of rule during inference engine step. The definition of antecedent and consequent rule part is obtained by using and tags. tag is used to model the fuzzy clauses in antecedent and consequent part. This tag use the attribute modifier to describe a modification to term used in the clause. The possible values for this attribute are: above, below, extremely, intensify, more or less, norm, not, plus, slightly, somewhat, very, none. To complete the definition of fuzzy clause the nested and tags have to be used. A sequence of tags realizes a fuzzy rule base. As example, consider a Mamdani rule composed by (food is rancid) OR (servi

IJCAI Award for Research Excellence

The IJCAI Award for Research Excellence is a biannual award before given at the IJCAI conference to researcher in artificial intelligence as a recognition of excellence of their career. Beginning in 2016, the conference is held annually and so is the award. == Laureates == The recipients of this award have been: John McCarthy (1985) Allen Newell (1989) Marvin Minsky (1991) Raymond Reiter (1993) Herbert A. Simon (1995) Aravind Joshi (1997) Judea Pearl (1999) Donald Michie (2001) Nils Nilsson (2003) Geoffrey E. Hinton (2005) Alan Bundy (2007) Victor R. Lesser (2009) Robert Kowalski (2011) Hector Levesque (2013) Barbara Grosz (2015) for her pioneering research in Natural Language Processing and in theories and applications of Multiagent Collaboration. Michael I. Jordan (2016) for his groundbreaking and impactful research in both the theory and application of statistical machine learning. Andrew Barto (2017) for his pioneering work in the theory of reinforcement learning. Jitendra Malik (2018) Yoav Shoham (2019) Eugene Freuder (2020) Richard S. Sutton (2021) Stuart J. Russell (2022) Sarit Kraus (2023) for her pioneering work of the study of interactions among self-interested agents, creating the field of automated negotiation, and developing methods for coalition formation and teamwork, both as formal models and real-world implementations. == Winners of also Turing Award == John McCarthy (1971) Allen Newell (1975) Marvin Minsky (1969) Herbert A. Simon (1975) Judea Pearl (2011) Geoffrey Hinton (2018) Andrew Barto (2024) Richard S. Sutton (2024)

Pippit

Pippit (Chinese: 小云雀; pinyin: Xiǎoyúnquè) is an artificial intelligence content creation platform developed by the Chinese technology company ByteDance. The platform, powered by CapCut leverages multimodal AI technology to streamline professional-grade video and image production, specifically targeting small and medium-sized enterprisesand social media creators. == History == In May 2025, ByteDance officially launched Pippit, which is positioned as an AI video and picture creation tool. In early 2026, Pippit underwent a major architectural overhaul with the integration of the Dreamina seedance 2.0. This technical milestone introduced the "Short Drama Agent" functionality, which enables the end-to-end conversion of scripts up to 100,000 words into fully rendered video productions.

Neural scaling law

In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

The Old Axolotl

The Old Axolotl (Polish: Starość aksolotla) is a 2015 digital-only novel by Polish science-fiction author Jacek Dukaj. The novel was released in Polish on March 10, 2015, and shortly afterward, on March 24 that year, in English (translated by Stanley Bill). It has been described as "an experiment in reading (and creating) the electronic literature of the future". It is Dukaj's first novel to be published in English, though several of his short stories (The Golden Galley, 1996, The Iron General, 2010, The Apocrypha of Lem, 2011) have been translated prior to this. The novel has inspired two Netflix original series: the 2020 Belgian Into the Night, and its 2022 Turkish language spin-off Yakamoz S-245. == Plot == The novel presents a post-apocalyptic, cyberpunk vision of Earth where biological life has been wiped out, inhabited by robots and mechs, many of which are humans whose consciousness has been digitized in the wake of an extinction event. == Significance and analysis == The novel is an example of electronic literature, available only in digital formats, and has no traditional paper version. It was designed from the beginning not only to incorporate more traditional elements such as illustrations, but also hypertext, and 3D-printable models of main robotic characters designed by Alex Jaeger, the art director of Transformers films. The novel composition is layered, with the narrative layer, an encyclopedic/hyperlinked footnote layer, and a multimedia layer, including illustrations and a short promotional video by the Oscar-nominated Platige Image studio. One of the novel's central questions is: "What does it mean to be human?" Other subjects include post humanism and other "staples of cyberpunk and related genres, such as the artificial intelligence". The novel is representative of Dukaj's prose, posing philosophical questions about the future of man and technology. The author explained that: "stories such as The Old Axolotl that model an ‘escape from the body’ are born out of a sense of progress as a process of ‘de-animalising’ human beings through science. This has its origin in the pre-Enlightenment intuition of ‘liberation from nature’. For one of the last shackles of nature is corporeality itself, the limitations of our physicality." The other major element of the novel is Dukaj's attempts to introduce the reader to the new style of electronic literature. The novel was nominated for the 2016 Janusz A. Zajdel Award.