Leakage (machine learning)

In statistics and machine learning, leakage (also known as data leakage or target leakage) refers to the use of information during model training that would not be available at prediction time. This results in overly optimistic performance estimates: the model appears to perform better during evaluation than it actually would in a production environment. Leakage is often subtle and indirect, which makes it difficult to detect and eliminate. It can lead a statistician or modeler to select a suboptimal model that would be outperformed by a leakage-free alternative.

== Leakage modes ==

Leakage can occur at multiple stages of the machine learning workflow. Broadly, its sources can be divided into two categories: those arising from features and those arising from training examples.

=== Feature leakage ===

Feature or column-wise leakage is caused by the inclusion of columns that are a duplicate of the label, a proxy for the label, or the label itself. These features, known as anachronisms, will not be available when the model is used for predictions, and they result in leakage if included when the model is trained. Examples include a "MonthlySalary" column when predicting "YearlySalary", or a "MinutesLate" column when predicting "IsLate".

=== Training example leakage ===

Row-wise leakage is caused by improper sharing of information between rows of data.
Types of row-wise leakage include:

* Premature featurization: fitting pre-processing steps (such as min-max scaling or n-gram vocabularies) on the whole dataset before the cross-validation or train/test split. Such transformations must be fit on the training split only and then applied to the test set.
* Duplicate rows shared between the train, validation and test sets, for example from oversampling a dataset to pad its size before splitting, from different rotations or augmentations of a single image, from bootstrap sampling before splitting, or from duplicating rows to upsample the minority class.
* Non-independent and identically distributed (non-IID) data.
* Time leakage, for example splitting a time-series dataset randomly instead of keeping the newer data in the test set (with a chronological train/test split or rolling-origin cross-validation).
* Group leakage: not splitting on a grouping column. For example, Andrew Ng's group had 100k x-rays of 30k patients, meaning about 3 images per patient. The paper used random splitting instead of ensuring that all images of a patient were in the same split. Hence the model partially memorized the patients instead of learning to recognize pneumonia in chest x-rays.

A 2023 review found data leakage to be "a widespread failure mode in machine-learning (ML)-based science", having affected at least 294 academic publications across 17 disciplines and potentially causing a reproducibility crisis.

== Detection ==

Data leakage in machine learning can be detected through various methods, focusing on performance analysis, feature examination, data auditing, and model behavior analysis. Performance-wise, unusually high accuracy or significant discrepancies between training and test results often indicate leakage. Inconsistent cross-validation outcomes may also signal issues. Feature examination involves scrutinizing feature importance rankings and ensuring temporal integrity in time-series data. A thorough audit of the data pipeline is crucial, reviewing pre-processing steps, feature engineering, and data splitting processes.
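As a concrete illustration of the splitting and pre-processing checks above, the following Python sketch (standard library only) keeps every row of a group, such as all images of one patient, in the same split, and fits min-max scaling on the training split alone before applying it to the test split. The helper names `group_split` and `fit_min_max`, the `patient` field and the toy rows are hypothetical; in practice, libraries such as scikit-learn offer `GroupKFold` and `Pipeline` for the same purposes.

```python
import random
from collections import defaultdict


def group_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so that all rows sharing a group (e.g. a patient ID) land in the same split."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)  # shuffle whole groups, never individual rows
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])
    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


def fit_min_max(values):
    """Learn min-max scaling parameters from the training split only."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0  # guard against a constant feature
    return lambda v: (v - lo) / span


# Hypothetical data: three rows ("images") per patient, one numeric feature each.
rows = [{"patient": p, "feature": float(p * 10 + i)} for p in range(8) for i in range(3)]
train, test = group_split(rows, "patient")

# Fit the scaler on the training split, then apply it unchanged to the test split.
# Test values may fall outside [0, 1]; that is expected and is not leakage.
scale = fit_min_max([r["feature"] for r in train])
test_scaled = [scale(r["feature"]) for r in test]
```

Fitting the scaler on `rows` instead of `train` would let the test split's minimum and maximum influence the training features, which is exactly the premature featurization described above.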
Detecting duplicate entries across dataset splits is also important.

For language models, the Min-K% method can detect whether a piece of text was present in a pretraining dataset. Given a sentence suspected to be in the pretraining data, it computes the log-likelihood of each token and then averages the lowest K% of these values. If this average exceeds a threshold, the sentence is likely present in the pretraining data. The method has been improved by calibrating each token's log-likelihood against a baseline mean and variance.

Analyzing model behavior can also reveal leakage. Models relying heavily on counter-intuitive features or showing unexpected prediction patterns warrant investigation. Performance degradation over time when tested on new data may suggest that earlier metrics were inflated by leakage. Advanced techniques include backward feature elimination, where suspicious features are temporarily removed to observe performance changes. Using a separate hold-out dataset for final validation before deployment is advisable.
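The Min-K% score described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-token log-likelihoods have already been obtained from the language model being probed; here they are hand-written hypothetical values, and the default of K = 20 and the threshold are illustrative choices rather than recommended settings.

```python
import math


def min_k_percent_score(token_log_probs, k_percent=20.0):
    """Average of the lowest k% token log-likelihoods (the Min-K% score)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    n_lowest = max(1, math.ceil(len(token_log_probs) * k_percent / 100.0))
    lowest = sorted(token_log_probs)[:n_lowest]  # the least likely tokens
    return sum(lowest) / n_lowest


def likely_in_pretraining(token_log_probs, threshold, k_percent=20.0):
    # A higher (less negative) score means even the hardest tokens were predicted well,
    # which is taken as evidence that the text was seen during pretraining.
    return min_k_percent_score(token_log_probs, k_percent) > threshold


# Hypothetical per-token log-likelihoods for one suspected sentence.
log_probs = [-0.1, -0.2, -3.0, -0.05, -2.0, -0.3, -0.4, -0.1, -0.2, -0.5]
score = min_k_percent_score(log_probs)  # mean of the two lowest values
```

Averaging only the lowest tokens focuses the score on the surprising parts of a text, where a model that has memorized the text differs most from one that has not.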

Kernel density estimation

In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy.

== Definition ==

Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ be independent and identically distributed samples drawn from some univariate distribution with an unknown density $f$ at any given point $x$. We are interested in estimating the shape of this function $f$. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

where $K$ is the kernel (a non-negative function) and $h > 0$ is a smoothing parameter called the bandwidth or simply width. A kernel with subscript $h$ is called the scaled kernel and is defined as $K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right)$. Intuitively one wants to choose $h$ as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.

A range of kernel functions is commonly used: uniform, triangular, biweight, triweight, Epanechnikov (parabolic), normal, and others.
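The definition translates directly into code. The following Python sketch (standard library only) evaluates $\hat{f}_h$ for a few of the kernels listed above, each written in its standard form on the support $[-1, 1]$ (or the whole line for the normal kernel) so that it integrates to 1. The function name `kde` and the `KERNELS` table are illustrative choices, not a library API.

```python
import math

# A few of the kernels named above; each is a probability density in u.
KERNELS = {
    "uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular": lambda u: max(0.0, 1.0 - abs(u)),
    "epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0,
    "normal": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
}


def kde(samples, h, kernel="normal"):
    """Return f_hat_h(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    K = KERNELS[kernel]
    n = len(samples)

    def f_hat(x):
        return sum(K((x - xi) / h) for xi in samples) / (n * h)

    return f_hat


# Usage: an Epanechnikov estimate from three points, evaluated at x = 0.
f_hat = kde([-1.0, 0.0, 2.0], h=0.5, kernel="epanechnikov")
value_at_zero = f_hat(0.0)
```

Because each scaled kernel $K_h$ integrates to 1 and the sum is divided by $n$, the estimate itself integrates to 1 for any bandwidth.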
The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means $K(x) = \phi(x)$, where $\phi$ is the standard normal density function. The kernel density estimator then becomes

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h\sqrt{2\pi}} \exp\!\left(\frac{-(x - x_i)^2}{2h^2}\right),$$

where the bandwidth $h$ is the standard deviation of each Gaussian component centered at a sample point $x_i$.

The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location $x_i$. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map).

== Example ==

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below, based on six data points, illustrates this relationship.

For the histogram, the horizontal axis is first divided into sub-intervals or bins which cover the range of the data: in this case, six bins each of width 2. Whenever a data point falls inside a bin, a box of height 1/12 is placed there. Each box therefore has area 2 × 1/12 = 1/6, so the six boxes together have total area 1, as a density must. If more than one data point falls inside the same bin, the boxes are stacked on top of each other.

For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points $x_i$. The kernels are summed to make the kernel density estimate (solid blue curve).
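The histogram and kernel constructions in this example can be compared in a short Python sketch. The six data points below are hypothetical (they are not the values behind the diagram), and the bin edges were chosen so that six bins of width 2 cover them; each point contributes a box of height $1/12$ to the histogram and a normal kernel with standard deviation 1.5, scaled by $1/n$, to the estimate.

```python
import math


def histogram_density(samples, bin_edges):
    """Height of each bin when every sample contributes a box of area 1/n."""
    n = len(samples)
    heights = []
    for left, right in zip(bin_edges[:-1], bin_edges[1:]):
        count = sum(1 for x in samples if left <= x < right)
        heights.append(count / (n * (right - left)))  # stacked boxes of height 1/(n * width)
    return heights


def normal_kernel_kde(samples, h):
    """Sum of normal kernels with standard deviation h, one per sample, scaled by 1/n."""
    n = len(samples)
    norm = n * h * math.sqrt(2 * math.pi)

    def f_hat(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm

    return f_hat


# Hypothetical data: six points (not the values used in the diagram).
points = [-2.5, -1.0, -0.5, 2.0, 4.5, 6.5]
edges = [-4, -2, 0, 2, 4, 6, 8]  # six bins of width 2 covering the data
heights = histogram_density(points, edges)  # multiples of 1/12
f_hat = normal_kernel_kde(points, h=1.5)
```

Both constructions spread a total mass of 1 over the axis; the histogram does so with flat boxes fixed to bin positions, while the kernel estimate centers a smooth bump on each point.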
The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables.

== Bandwidth selection ==

The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted as the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed, since it contains too many spurious data artifacts arising from using a bandwidth $h = 0.05$, which is too small. The green curve is oversmoothed, since using the bandwidth $h = 2$ obscures much of the underlying structure. The black curve with a bandwidth of $h = 0.337$ is considered to be optimally smoothed, since its density estimate is close to the true density. An extreme situation is encountered in the limit $h \to 0$ (no smoothing), where the estimate is a sum of $n$ delta functions centered at the coordinates of the analyzed samples. In the other extreme limit, $h \to \infty$, the estimate retains the shape of the used kernel, centered on the mean of the samples (completely smooth).
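The effect of the bandwidth can be reproduced numerically. The following Python sketch (standard library only) draws a simulated standard normal sample, builds normal-kernel estimates with the three bandwidths mentioned above, and approximates each estimate's integrated squared error against the true density with a Riemann sum on a grid. The sample size of 200, the seed and the grid are assumptions, so the numbers will not match the figure; the expectation is only that the intermediate bandwidth gives the smallest error.

```python
import math
import random


def normal_kernel_kde(samples, h):
    """Normal-kernel density estimate with bandwidth h."""
    n = len(samples)
    norm = n * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm


def standard_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def integrated_squared_error(f_hat, lo=-5.0, hi=5.0, step=0.01):
    """Approximate the integral of (f_hat - f)^2 with a Riemann sum on a grid."""
    steps = int(round((hi - lo) / step))
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * step
        total += (f_hat(x) - standard_normal_pdf(x)) ** 2
    return total * step


rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]  # simulated, not the figure's sample
errors = {h: integrated_squared_error(normal_kernel_kde(sample, h)) for h in (0.05, 0.337, 2.0)}
```

A small bandwidth inflates the variance part of the error, and a large one inflates the bias part; the criterion introduced next formalizes this trade-off.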
The most common optimality criterion used to select this parameter is the expected $L_2$ risk function, also termed the mean integrated squared error:

$$\operatorname{MISE}(h) = \operatorname{E}\!\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right].$$

Under weak assumptions on $f$ and $K$ (where $f$ is the real density function, which is generally unknown),

$$\operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\!\left((nh)^{-1} + h^4\right),$$

where $o$ is the little-o notation and $n$ is the sample size (as above). The AMISE is the asymptotic MISE, i.e. the two leading terms,

$$\operatorname{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} m_2(K)^2 h^4 R(f''),$$

where $R(g) = \int g(x)^2\,dx$ for a function $g$, $m_2(K) = \int x^2 K(x)\,dx$, $f''$ is the second derivative of $f$, and $K$ is the kernel. The minimum of this AMISE is found by setting its derivative with respect to $h$ to zero:

$$\frac{\partial}{\partial h}\operatorname{AMISE}(h) = -\frac{R(K)}{nh^2} + m_2(K)^2 h^3 R(f'') = 0,$$

which gives

$$h_{\operatorname{AMISE}} = \frac{R(K)^{1/5}}{m_2(K)^{2/5} R(f'')^{1/5}}\, n^{-1/5} = C n^{-1/5}.$$

Neither the AMISE nor the $h_{\operatorname{AMISE}}$ formulas can be used directly, since they involve the unknown density function $f$ or its second derivative $f''$. To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth.
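As a worked check of the $h_{\operatorname{AMISE}}$ formula, consider a normal kernel $K = \phi$ and, as an assumption made only for this calculation, a normal true density $f$ with standard deviation $\sigma$. Standard integrals give $R(K) = 1/(2\sqrt{\pi})$ and $m_2(K) = 1$ for the kernel, and $R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$ for the density. Substituting these into $h_{\operatorname{AMISE}}$:

$$h_{\operatorname{AMISE}} = \left(\frac{R(K)}{m_2(K)^2\,R(f'')}\right)^{1/5} n^{-1/5} = \left(\frac{1/(2\sqrt{\pi})}{3/(8\sqrt{\pi}\,\sigma^5)}\right)^{1/5} n^{-1/5} = \left(\frac{4\sigma^5}{3n}\right)^{1/5} \approx 1.06\,\sigma\,n^{-1/5}.$$

This coincides with the rule-of-thumb bandwidth discussed below, once $\sigma$ is replaced by an estimate from the data.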
Several review studies have been undertaken to compare the efficacies of such data-based selectors, with the general consensus that the plug-in selectors and cross-validation selectors are the most useful over a wide range of data sets.

Substituting any bandwidth $h$ which has the same asymptotic order $n^{-1/5}$ as $h_{\operatorname{AMISE}}$ into the AMISE gives $\operatorname{AMISE}(h) = O(n^{-4/5})$, where $O$ is the big O notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. Note that the $n^{-4/5}$ rate is slower than the typical $n^{-1}$ convergence rate of parametric methods.

If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation.

Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult.

=== A rule-of-thumb bandwidth estimator ===

If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for $h$ (that is, the bandwidth that minimises the mean integrated squared error) is:

$$h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5},$$

where $\hat{\sigma}$ is the standard deviation of the samples. An $h$ value is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the standard deviation $\hat{\sigma}$ by the parameter $A$ below:

$$A = \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right),$$

where IQR is the interquartile range.
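The rule of thumb, including the robust substitution of $A$ for $\hat{\sigma}$, can be sketched in a few lines of Python using the standard `statistics` module. The function name and the `robust` flag are illustrative; the constant 1.06 is the approximation from the formula above, and `statistics.quantiles` is used with its default (exclusive) method, which is one of several conventions for computing the quartiles behind the IQR.

```python
import statistics


def rule_of_thumb_bandwidth(samples, robust=True):
    """h ~= 1.06 * s * n**(-1/5), with s = sigma_hat, or A = min(sigma_hat, IQR / 1.34) if robust."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma_hat = statistics.stdev(samples)  # sample standard deviation
    spread = sigma_hat
    if robust:
        q1, _, q3 = statistics.quantiles(samples, n=4)  # first and third quartiles
        spread = min(sigma_hat, (q3 - q1) / 1.34)
    return 1.06 * spread * n ** (-1 / 5)


# Usage: one outlier inflates sigma_hat, so the robust version picks IQR / 1.34 instead.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
h_plain = rule_of_thumb_bandwidth(data, robust=False)
h_robust = rule_of_thumb_bandwidth(data)
```

Taking the minimum of the two spread estimates means a single extreme value cannot inflate the bandwidth and oversmooth the bulk of the data.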

Transhuman Space

Transhuman Space (THS) is a role-playing game by David Pulver, published by Steve Jackson Games as part of the "Powered by GURPS" (Generic Universal Role-Playing System) line. The game is set in the year 2100, when humanity has begun to colonize the Solar System. The pursuit of transhumanism is now in full swing, as more and more people reach fully posthuman states. In 2002, the Transhuman Space adventure "Orbital Decay" received an Origins Award nomination for Best Role-Playing Game Adventure. Transhuman Space won the 2003 Grog d'Or Award for Best Role-playing Game, Game Line or RPG Setting.

== Setting ==

The game assumes that no cataclysm, natural or human-induced, swept Earth in the 21st century. Instead, constant developments in information technology, genetic engineering, nanotechnology and nuclear physics generally improved the condition of the average human life. Plagues of the 20th century (like cancer or AIDS) have been suppressed, the ozone layer is being restored and Earth's ecosystems are recovering (although thermal emission by fusion power plants poses an environmental threat, albeit a much lesser one than previous sources of energy). Thanks to modern medicine, humans live biblical timespans surrounded by various artificially intelligent helper applications and robots (cybershells), sensory experience broadcasts (future TV) and cyberspace telepresence. Thanks to cheap and clean fusion energy, humanity has the power to fuel all these wonders, restore and transform its home planet and finally settle on other heavenly bodies.

Human genetic engineering has advanced to the point that anyone (single individuals, same-sex couples, groups of three or more) can reproduce. The embryos can be allowed to develop naturally, or they can undergo three levels of tinkering:

1. Genefixing, which corrects defects;
2. Upgrades, which boost natural abilities (Ishtar Upgrades are slightly more attractive than usual, Metanoia Upgrades are more intelligent, etc.); and
3. Full transition to parahuman status (Nyx Parahumans only need a few hours of sleep per week, Aquamorphs can live underwater, etc.).

Another type of human genetic engineering, far more controversial, is the creation of bioroids, fully sentient slave races.

People can "upload" by recording the simulation of their brains on computer disks. The emulated individual then becomes a ghost, an infomorph very easily confused with "sapient artificial intelligence". However, this technology has several problems: the only available "brainpeeling" technique is fatal to the original biological lifeform being simulated, it has a significant failure rate, and the philosophical questions regarding personal identity remain unresolved. Any infomorph, regardless of its origin, can be plugged into a "cybershell" (a robotic or cybernetic body) or a biological body ("bioshell"). Alternatively, an individual can illegally make multiple "xoxes", or copies of themselves, and scatter them throughout the system, exponentially increasing the odds that at least one of them will live for centuries more, if not forever.

This is also a time of space colonization. First, humanity (specifically China, followed by the United States and others) colonized Mars in a fashion resembling that outlined in the Mars Direct project. The Moon, Lagrangian points, inner planets and asteroids soon followed. By the late 21st century even some of Saturn's moons have been settled as a base for that planet's Helium-3 scooping operations.

Transhuman Space's setting is neither utopia nor dystopia, however: several problems have arisen from these otherwise beneficial developments. The generation gap has become a chasm as lifespans increase. No longer do the elite fear death, and no longer can the young hope to replace them. While it seemed that outworld colonies would offer accommodation and work for the young, they are being displaced there by genetically tailored bioroids and AI-powered cybershells.
The concept of humanity is no longer clear in a world where even some animals speak of their rights and the dead haunt both cyberspace and reality (in the form of infomorph-controlled bioshells or cybershells). And the wonders of high science are not universally shared: some countries merely struggle with informatization while others suffer from nanoplagues, defective drugs, implants and software tested on their populace. In some poor countries high-tech tyrants oppress their backward people. And in outer space all sorts of modern crime thrive, barely suppressed by military forces.

== Publication history ==

After the initial set of GURPS books published using GURPS Lite, later publications such as Transhuman Space by David Pulver were labelled simply "Powered by GURPS", without using the name "GURPS" in the book title. Transhuman Space received a significant amount of supporting publications, and was the largest original background setting that Steve Jackson Games had produced in 15 years. Shannon Appelcline noted that by its inclusion of posthuman characters, the book began to show the limits of the GURPS system as it then stood, which is something that Pulver would address soon thereafter.

Steve Jackson Games has not updated the core book (GURPS Transhuman Space) to 4th edition, although the supplement Transhuman Space: Changing Times provides a path for migrating to 4th edition. It has produced several 4th edition supplements for the setting: Transhuman Space: Bioroid Bazaar, Transhuman Space: Cities on the Edge, Transhuman Space: Martial Arts 2100, Transhuman Space: Personnel Files 2-5, Transhuman Space: Shell-Tech, GURPS Spaceships 8: Transhuman Spacecraft, Transhuman Space: Transhuman Mysteries, and Transhuman Space: Wings of the Rising Sun.

== Reception ==

In a review of Transhuman Space in Black Gate, William Stoddard said "Transhuman Space was a richly detailed setting; if it had imperfections, it had enough depth to make up for them. I think it has the potential to become a classic in its field. Perhaps a campaign set in its default start year of 2100 could leave the early twenty-first century blurry enough to avoid obvious incongruities."

== Reviews ==

Review in Vol. 20, No. 1 of Prometheus, the journal of the Libertarian Futurist Society.

Vagueness

In linguistics and philosophy, a vague predicate is one which gives rise to borderline cases. For example, the English adjective "tall" is vague since it is not clearly true or false for someone of middling height. By contrast, the word "prime" is not vague since every natural number is definitively either prime or not. Vagueness is commonly diagnosed by a predicate's ability to give rise to the sorites paradox. Vagueness is distinct from ambiguity, in which an expression has multiple denotations. For instance, the word "bank" is ambiguous since it can refer either to a river bank or to a financial institution, but there are no borderline cases between the two interpretations.

Vagueness is a major topic of research in philosophical logic, where it serves as a potential challenge to classical logic. Work in formal semantics has sought to provide a compositional semantics for vague expressions in natural language. Work in philosophy of language has addressed implications of vagueness for the theory of meaning, while metaphysicians have considered whether reality itself is vague.

== Importance ==

The concept of vagueness has philosophical importance. Suppose one wants to come up with a definition of "right" in the moral sense. One wants a definition to cover actions that are clearly right and exclude actions that are clearly wrong, but what does one do with the borderline cases? Surely, there are such cases. Some philosophers say that one should try to come up with a definition that is itself unclear on just those cases. Others say that one has an interest in making one's definitions more precise than ordinary language, or one's ordinary concepts, themselves allow; they recommend advancing precising definitions.

=== In law ===

Vagueness is also a problem which arises in law, and in some cases judges have to arbitrate whether a borderline case does, or does not, satisfy a given vague concept.
Examples include disability (how much loss of vision is required before one is legally blind?), human life (at what point from conception to birth is one a legal human being, protected for instance by laws against murder?), adulthood (most familiarly reflected in legal ages for driving, drinking, voting, consensual sex, etc.), race (how to classify someone of mixed racial heritage), etc. Even apparently unambiguous concepts such as biological sex can be subject to vagueness problems, not just from transsexuals' gender transitions but also from certain genetic conditions which can give an individual mixed male and female biological traits (see intersex).

In the common law system, vagueness is a possible legal defence against by-laws and other regulations. The legal principle is that delegated power cannot be used more broadly than the delegator intended. Therefore, a regulation may not be so vague as to regulate areas beyond what the law allows. Any such regulation would be "void for vagueness" and unenforceable. This principle is sometimes used to strike down municipal by-laws that forbid "explicit" or "objectionable" content from being sold in a certain city; courts often find such expressions to be too vague, giving municipal inspectors discretion beyond what the law allows. In the US this is known as the vagueness doctrine and in Europe as the principle of legal certainty.

=== In science ===

Many scientific concepts are of necessity vague; for instance, species in biology cannot be precisely defined, owing to unclear cases such as ring species. Nonetheless, the concept of species can be clearly applied in the vast majority of cases. As this example illustrates, to say that a definition is "vague" is not necessarily a criticism. Consider those animals in Alaska that are the result of breeding huskies and wolves: are they dogs? It is not clear: they are borderline cases of dogs.
This means one's ordinary concept of doghood is not clear enough to let us rule conclusively in this case.

== Approaches ==

The philosophical question of what the best theoretical treatment of vagueness is, which is closely related to the paradox of the heap (also known as the sorites paradox), has been the subject of much philosophical debate.

=== Fuzzy logic ===

One theoretical approach is that of fuzzy logic, developed by American mathematician Lotfi Zadeh. Fuzzy logic proposes a gradual transition from "perfect falsity", for example the statement "Bill Clinton is bald", to "perfect truth", for, say, "Patrick Stewart is bald". In ordinary logics, there are only two truth-values: "true" and "false". The fuzzy perspective differs by introducing an infinite number of truth-values along a spectrum between perfect truth and perfect falsity. Perfect truth may be represented by "1", and perfect falsity by "0". Borderline cases are thought of as having a "truth-value" anywhere between 0 and 1 (for example, 0.6). Advocates of the fuzzy logic approach have included K. F. Machina (1976) and Dorothy Edgington (1993).

=== Supervaluationism ===

Another theoretical approach is known as "supervaluationism". This approach has been defended by Kit Fine and Rosanna Keefe. Fine argues that borderline applications of vague predicates are neither true nor false, but rather are instances of "truth value gaps". He defends an interesting and sophisticated system of vague semantics, based on the notion that a vague predicate might be "made precise" in many alternative ways. This system has the consequence that borderline cases of vague terms yield statements that are neither true nor false.

Given a supervaluationist semantics, one can define the predicate "supertrue" as meaning "true on all precisifications". This predicate will not change the semantics of atomic statements (e.g. "Frank is bald", where Frank is a borderline case of baldness), but does have consequences for logically complex statements. In particular, the tautologies of sentential logic, such as "Frank is bald or Frank is not bald", will turn out to be supertrue, since on any precisification of baldness, either "Frank is bald" or "Frank is not bald" will be true. Since the presence of borderline cases seems to threaten principles like this one (excluded middle), the fact that supervaluationism can "rescue" them is seen as a virtue.

=== Subvaluationism ===

Subvaluationism is the logical dual of supervaluationism, and has been defended by Dominic Hyde (2008) and Pablo Cobreros (2011). Whereas the supervaluationist characterises truth as "supertruth", the subvaluationist characterises truth as "subtruth", or "true on at least some precisifications". Subvaluationism proposes that borderline applications of vague terms are both true and false. It thus has "truth-value gluts". According to this theory, a vague statement is true if it is true on at least one precisification and false if it is false under at least one precisification. If a vague statement comes out true under one precisification and false under another, it is both true and false. Subvaluationism ultimately amounts to the claim that vagueness is a truly contradictory phenomenon. Of a borderline case of "bald man", it would be both true and false to say that he is bald, and both true and false to say that he is not bald.

=== Epistemicist view ===

A fourth approach, known as "the epistemicist view", has been defended by Timothy Williamson (1994), R. A. Sorensen (1988 and 2001), and Nicholas Rescher (2009). They maintain that vague predicates do, in fact, draw sharp boundaries, but that one cannot know where these boundaries lie. One's confusion about whether some vague word does or does not apply in a borderline case is due to one's ignorance.
For example, in the epistemicist view, there is a fact of the matter, for every person, about whether that person is old or not old; some people are ignorant of this fact.

=== As a property of objects ===

One possibility is that one's words and concepts are perfectly precise, but that objects themselves are vague. Consider Peter Unger's example of a cloud (from his famous 1980 paper, "The Problem of the Many"): it is not clear where the boundary of a cloud lies; for any given bit of water vapor, one can ask whether it is part of the cloud or not, and for many such bits, one will not know how to answer. Hence, perhaps such a term as "cloud" is not itself vague, but rather precisely denotes a vague object. This strategy has occasionally been poorly received, most notably in Gareth Evans' short paper "Can There Be Vague Objects?" (1978), wherein an argument is examined which appears to show that vague identity-statements are impossible (i.e., result in logical incoherence). David Lewis explains that the reader is intended to conclude, with Evans, that, since there clearly are, in fact, meaningful vague identities, any purported proof to the contrary cannot be right; and as the proof relies upon the premise that vague terms precisely denote vague objects, but fails under the view that vague terms reflect a merel

ICAART

The International Conference on Agents and Artificial Intelligence (ICAART) is a meeting point for researchers and others with an interest in the areas of Agents and Artificial Intelligence. ICAART has two tracks: one related to Agents and Distributed AI in general, and the other focused on Intelligent Systems and Computational Intelligence. The conference program is composed of several kinds of sessions, such as technical sessions, poster sessions, keynote lectures, tutorials, special sessions, doctoral consortiums, panels and industrial tracks. The papers presented at the conference are made available in the SCITEPRESS digital library and published in the conference proceedings, and some of the best papers are invited to a post-publication with Springer.

ICAART's first edition was in 2009, with keynote speakers including Marco Dorigo, Edward H. Shortliffe and Eduard Hovy. Since then, the conference has had several other invited speakers, such as Katia Sycara, Nick Jennings, Robert Kowalski, Boi Faltings and Tim Finin. Bart Selman is one of the names confirmed for the next edition of this conference. Since 2012 the conference has been held in conjunction with two other conferences: the International Conference on Operations Research and Enterprise Systems (ICORES) and the International Conference on Pattern Recognition Applications and Methods (ICPRAM).
== Areas ==

=== Agents ===

* Agent communication languages
* Cooperation and Coordination
* Distributed Problem Solving
* Economic Agent Models
* Emotional Intelligence
* Group Decision Making
* Intelligent Auctions and Markets
* Mobile Agents
* Multi-agent systems
* Negotiation and Interaction Protocols
* Nep News Detection
* Agent Models and Architectures
* Physical Agents at Work
* Privacy, Safety and Security
* Programming Environments and Languages
* Robot and Multi-Robot Systems
* Self Organizing Systems
* Semantic Web
* Simulation
* Swarm Intelligence
* Task Planning and Execution
* Transparency and Ethical Issues
* Agent-Oriented Software Engineering
* Web Intelligence
* Agent Platforms and Interoperability
* Autonomous systems
* Cloud Computing and Its Impact
* Cognitive robotics
* Collective Intelligence
* Conversational Agents

=== Artificial intelligence ===

* AI and Creativity
* Deep Learning
* Evolutionary Computing
* Fuzzy Systems
* Hybrid Intelligent Systems
* Industrial Applications of AI
* Intelligence and Cybersecurity
* Intelligent User Interfaces
* Knowledge Representation and Reasoning
* Knowledge-Based Systems
* Ambient Intelligence
* Machine learning
* Model-Based Reasoning
* Natural Language Processing
* Neural Networks
* Ontologies
* Planning and Scheduling
* Social Network Analysis
* Soft Computing
* State Space Search
* Bayesian Networks
* Uncertainty in AI
* Vision and Perception
* Visualization
* Big Data
* Case-Based Reasoning
* Cognitive Systems
* Constraint Satisfaction
* Data Mining
* Data Science

== Editions ==

=== ICAART 2023 – Lisbon, Portugal ===

=== ICAART 2020 – Valletta, Malta ===

=== ICAART 2019 – Prague, Czech Republic ===

* Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-350-6
* Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-350-6

=== ICAART 2018 – Funchal, Madeira, Portugal ===

* Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-275-2
* Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-275-2

=== ICAART 2017 – Porto, Portugal ===

* Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-219-6
* Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-220-2

=== ICAART 2016 – Rome, Italy ===

* Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-172-4
* Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-172-4

=== ICAART 2015 – Lisbon, Portugal ===

* Proceedings of the 7th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-073-4
* Proceedings of the 7th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-074-1

=== ICAART 2014 – ESEO, Angers, Loire Valley, France ===

* Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-015-4
* Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-016-1

=== ICAART 2013 – Barcelona, Spain ===

* Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8565-38-9
* Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8565-39-6

=== ICAART 2012 – Vilamoura, Algarve, Portugal ===

* Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8425-95-9
* Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8425-96-6

=== ICAART 2011 – Rome, Italy ===

* Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8425-40-9
* Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8425-41-6

=== ICAART 2010 – Valencia, Spain ===

* Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-674-021-4
* Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-674-022-1

=== ICAART 2009 – Porto, Portugal ===

* Proceedings of the 1st International Conference on Agents and Artificial Intelligence. ISBN 978-989-8111-66-1

Microapp

A microapp is a highly specialized application designed to perform a single task or use case, with the sole objective of doing it well. Microapps follow the single responsibility principle, which states that "a class should have one and only one reason to change." Micro applications help developers create less complex applications and reduce costs by breaking monolithic systems down into groups of independent services that act as one system. One example is Citrix microapps (https://docs.citrix.com/en-us/legacy-archive/downloads/microapps.pdf), which provide single-purpose actions from Salesforce and over 40 other applications within the Citrix workspace. == Requirements and characteristics == Microapps are usually accessible on any device, display, or operating system without installation on the viewer's device. To qualify as a microapp, the entity must: be built and deployed as an independent software module; bring together various media types into a single experience; have advanced security and compliance features; be functionally extensible; comply with granular data demands; be agnostic; and be oriented to a single use case. Microapps differ from traditional web or mobile applications in how the end user interacts with them: they can be embedded in websites or viewed online, bypassing app stores, and are typically built to provide a focused experience to the user. == Usage == Microapps are typically used for commercial purposes to reduce development costs for projects that do not require the large scope of a traditional web or mobile application. They are also often used to showcase in-depth information or to enrich marketing material with interactivity. More recently, microapps have been used to boost productivity by giving people quick tools for reusing best practices. Users have long interacted with microapps in suites such as Microsoft 365 and Google Workspace, where each end-user service could be considered a microapp.
All these microapps share a single identity manager to provide a unified user experience. == Benefits == Replacing monolithic systems with microapps provides several advantages: reduced complexity for developers and users; smaller, more cohesive, and more maintainable codebases; scalable organizations with decoupled, autonomous teams; hyper-specialization; independent deployment; and support for multiple technology stacks. == Cloud-native microapps == Technologies such as Kubernetes and OpenShift allow companies to replace monolithic and legacy systems with modular software, using microapps to reduce costs and improve reliability and security. == Microapps vs. microservices == The two concepts are often confused. Microservices is a systems-centric architectural style: it decouples the presentation and data layers using web service APIs. Microapps, by contrast, behave more like a super-architecture style (one that embraces microservices among other types) and are user-centric: they decompose the whole monolithic system into modules designed to interact with end users. Both architectural styles rely on modularity to provide high performance, scalability, and resilience. == Considerations == Developing microapps requires a different approach from traditional software development, and user experience is crucial. The following considerations are essential when switching to microapps: running multiple microapps requires a single identity management system; microservices are well suited to making microapps more powerful; apps with different levels of maturity may create an inconsistent user experience; duplicated dependencies can create security issues and inefficiencies; and the approach is best suited to well-organized teams.

Loab

Loab (pronounced "lobe") is a fictional character that artist and writer Steph Maj Swanson claimed to have discovered with a text-to-image AI model in April 2022. In a viral Twitter thread, Swanson described the images of Loab as an unexpected emergent property of the software, saying they discovered them when asking the model to produce something "as different from the prompt as possible". == History == The Sweden-based artist Steph Maj Swanson said that they first generated these images in April 2022 by using a technique called "negative prompt weights" to explore the model's latent space. The initial prompt, 'Brando::-1', requesting the opposite of actor Marlon Brando, generated a "skyline logo" with the cryptic lettering "DIGITA PNTICS". Attempting to generate the opposite of this image using the prompt "DIGITA PNTICS skyline logo::-1" yielded what Swanson described as "off-putting images, all of the same devastated-looking older woman with defined triangles of rosacea(?) on her cheeks". Swanson nicknamed the character "Loab" because one of the generated images resembled an album cover that included the printed word "loab". Swanson says that using the image as a prompt for further images produced increasingly violent and gory results, and speculated that something about the image could be "adjacent to extremely gory and macabre imagery in the distribution of the AI's world knowledge". Swanson says that when they combined images of Loab with other pictures, the results consistently included Loab, regardless of how much distortion they added to the prompts in an attempt to remove her visage. Swanson speculated that the region of the model's latent space in which Loab is located, besides lying near gruesome imagery, must be isolated enough that any combination with other images could draw only on Loab herself rather than on related images.
After enough crossbreeding of images and dilution attempts, Swanson was eventually able to generate images without Loab, but found that crossbreeding those diluted images would also eventually cause a version of Loab to reappear in the results. Swanson has said that "for various reasons" they declined to disclose the software used to create the images. Loab has been called the "first AI-generated cryptid" and has gone viral. Despite emphasizing the cryptid nature of the discovery in their wording, Swanson admitted that "Loab isn't really haunted, of course", but noted that the mythos that has sprung up around the AI-generated character has grown beyond their initial involvement. Swanson speculated that people sharing pictures and memes of Loab would lead future AI models to absorb those images into their latent spaces, making her an innate part of the internet landscape, adding, "If we want to get rid of her, it's already too late." == Response == There has been discussion of whether the Loab series of images is "a legitimate quirk of AI art software, or a cleverly disguised creepypasta." Smithsonian magazine has written that "Loab sparked some lengthy ethical conversations around visual aesthetics, art and technology," and some have criticized the labeling of a woman with rosacea as a horror image, considering this to be "stigmatizing disability". Swanson responded that if the model is combining Loab with violent imagery, that reflects a "social bias" in the data used by the image-generation software. The Atlantic writer Stephen Marche described Loab as a "form of expression that has never existed before" whose authorship is unclear and that exists as an "emanation of the collective imagistic heritage, the unconscious visual mind".
Laurens Verhagen in de Volkskrant commented that rather than showing that there are "dark horror creatures hidden deep within AI", the existence of Loab instead implies that our current "understanding of AI is limited". Mhairi Aitken of the Alan Turing Institute stated that, rather than being a "creepy" emergent property, outputs like Loab were representative of the "limitations of AI image-generator models", and said she was more concerned about the urban legends born from such "boring", innocuous things and how easily "other people take these things seriously". Carly Cassella, writing for ScienceAlert, described Loab as a "modern day tronie" (a type of Dutch painting) that represents not an actual person but a concept or idea, similar to but distinct from works such as Girl with a Pearl Earring. Wired's Joel Warner argued that Loab was only the beginning and that, as AI text generators such as ChatGPT become more commonplace, a "linguistic version of Loab" would emerge in that space as well, producing ideas, through "intentional prompts" or otherwise, as disturbing as The 120 Days of Sodom.