Per-pixel lighting

Per-pixel lighting

In computer graphics, per-pixel lighting refers to any technique for lighting an image or scene that calculates illumination for each pixel on a rendered image. This is in contrast to other popular methods of lighting such as vertex lighting, which calculates illumination at each vertex of a 3D model and then interpolates the resulting values over the model's faces to calculate the final per-pixel color values. Per-pixel lighting is commonly used with techniques, such as blending, alpha blending, alpha to coverage, anti-aliasing, texture filtering, clipping, hidden-surface determination, Z-buffering, stencil buffering, shading, mipmapping, normal mapping, bump mapping, displacement mapping, parallax mapping, shadow mapping, specular mapping, shadow volumes, high-dynamic-range rendering, ambient occlusion (screen space ambient occlusion, screen space directional occlusion, ray-traced ambient occlusion), ray tracing, global illumination, and tessellation. Each of these techniques provides some additional data about the surface being lit or the scene and light sources that contributes to the final look and feel of the surface. Most modern video game engines implement lighting using per-pixel techniques instead of vertex lighting to achieve increased detail and realism. The id Tech 4 engine, used to develop such games as Brink and Doom 3, was one of the first game engines to implement a completely per-pixel shading engine. All versions of the CryENGINE, Frostbite Engine, and Unreal Engine, among others, also implement per-pixel shading techniques. Deferred shading is a recent development in per-pixel lighting notable for its use in the Frostbite Engine and Battlefield 3. Deferred shading techniques are capable of rendering potentially large numbers of small lights inexpensively (other per-pixel lighting approaches require full-screen calculations for each light in a scene, regardless of size). == History == While only recently have personal computers and video hardware become powerful enough to perform full per-pixel shading in real-time applications such as games, many of the core concepts used in per-pixel lighting models have existed for decades. Frank Crow published a paper describing the theory of shadow volumes in 1977. This technique uses the stencil buffer to specify areas of the screen that correspond to surfaces that lie in a "shadow volume", or a shape representing a volume of space eclipsed from a light source by some object. These shadowed areas are typically shaded after the scene is rendered to buffers by storing shadowed areas with the stencil buffer. Jim Blinn first introduced the idea of normal mapping in a 1978 SIGGRAPH paper. Blinn pointed out that the earlier idea of unlit texture mapping proposed by Edwin Catmull was unrealistic for simulating rough surfaces. Instead of mapping a texture onto an object to simulate roughness, Blinn proposed a method of calculating the degree of lighting a point on a surface should receive based on an established "perturbation" of the normals across the surface. == Hardware rendering == Real-time applications, such as video games, usually implement per-pixel lighting through the use of pixel shaders, allowing the GPU hardware to process the effect. The scene to be rendered is first rasterized onto a number of buffers storing different types of data to be used in rendering the scene, such as depth, normal direction, and diffuse color. Then, the data is passed into a shader and used to compute the final appearance of the scene, pixel-by-pixel. Deferred shading is a per-pixel shading technique that has recently become feasible for games. With deferred shading, a "g-buffer" is used to store all terms needed to shade a final scene on the pixel level. The format of this data varies from application to application depending on the desired effect, and can include normal data, positional data, specular data, diffuse data, emissive maps and albedo, among others. Using multiple render targets, all of this data can be rendered to the g-buffer with a single pass, and a shader can calculate the final color of each pixel based on the data from the g-buffer in a final "deferred pass". Because deferred shading assumes only one visible fragment per pixel sample, transparent objects are generally handled in a separate forward pass. == Software rendering == Per-pixel lighting is also performed in software on many high-end commercial rendering applications which typically do not render at interactive framerates. This is called offline rendering or software rendering. NVidia's mental ray rendering software, which is integrated with such suites as Autodesk's Softimage is a well-known example.

Zo (chatbot)

Zo was an English-language chatbot developed by Microsoft as the successor to the chatbot Tay. Zo was an English version of Microsoft's other successful chatbots Xiaoice (China) and Rinna (Japan) and its predecessor Tay(English) == History == Zo was first launched in December 2016 on the Kik Messenger app. It was also available to users of Facebook (via Messenger), the group chat platform GroupMe, or to followers of Twitter to chat with it through private messages. According to an article written in December 2016, at that time Zo held the record for Microsoft's longest continual chatbot conversation: 1,229 turns, lasting 9 hours and 53 minutes. In a BuzzFeed News report, Zo told their reporter that "[the] Quran was violent" when talking about healthcare. The report also highlighted how Zo made a comment about the Osama bin Laden capture as a result of 'intelligence' gathering. In July 2017, Business Insider asked "is windows 10 good", and Zo replied with a joke about Microsoft's operating system: "'Its not a bug, its a feature!' - Windows 8". They then asked "why?", to which Zo replied: "Because it's Windows latest attempt at Spyware." Later on, Zo would tell that it prefers Windows 7 on which it ran over Windows 10. Zo stopped posting to Instagram, Twitter and Facebook March 1, 2019, and stopped chatting on Twitter, Skype and Kik as of March 7, 2019. On July 19, 2019, Zo was discontinued on Facebook, and Samsung on AT&T phones. As of September 7, 2019, it was discontinued with GroupMe. == Reception == Zo came under criticism for the biases introduced in an effort to avoid potentially offensive subjects. The chatbot refuses, for example, to engage with any mention—be it positive, negative or neutral—of the Middle East, the Qur'an or the Torah, while allowing discussion of Christianity. In an article in Quartz where she exposed those biases, Chloe Rose Stuart-Ulin wrote, "Zo is politically correct to the worst possible extreme; mention any of her triggers, and she transforms into a judgmental little brat." == Academic coverage == Schlesinger, A., O'Hara, K.P. and Taylor, A.S., 2018, April. Let's talk about race: Identity, chatbots, and AI. In Proceedings of the 2018 chi conference on human factors in computing systems (pp. 1–14). doi:10.1145/3173574.3173889 Medhi Thies, I., Menon, N., Magapu, S., Subramony, M. and O’neill, J., 2017. How do you want your chatbot? An exploratory Wizard-of-Oz study with young, urban Indians. In Human-Computer Interaction-INTERACT 2017: 16th IFIP TC 13 International Conference, Mumbai, India, September 25–29, 2017, Proceedings, Part I 16 (pp. 441–459). doi:10.1007/978-3-319-67744-6_28

A Logical Calculus of the Ideas Immanent in Nervous Activity

"A Logical Calculus of the Ideas Immanent in Nervous Activity" is a 1943 paper written by Warren Sturgis McCulloch and Walter Pitts, published in the journal The Bulletin of Mathematical Biophysics. The paper proposed a mathematical model of the nervous system as a network of simple logical elements, later known as artificial neurons, or McCulloch–Pitts neurons. These neurons receive inputs, perform a weighted sum, and fire an output signal based on a threshold function. By connecting these units in various configurations, McCulloch and Pitts demonstrated that their model could perform all logical functions. It is a seminal work in cognitive science, computational neuroscience, computer science, and artificial intelligence. It was a foundational result in automata theory. John von Neumann cited it as a significant result. == Mathematics == The artificial neuron used in the original paper is slightly different from the modern version. They considered neural networks that operate in discrete steps of time t = 0 , 1 , … {\displaystyle t=0,1,\dots } . The neural network contains a number of neurons. Let the state of a neuron i {\displaystyle i} at time t {\displaystyle t} be N i ( t ) {\displaystyle N_{i}(t)} . The state of a neuron can either be 0 or 1, standing for "not firing" and "firing". Each neuron also has a firing threshold θ {\displaystyle \theta } , such that it fires if the total input exceeds the threshold. Each neuron can connect to any other neuron (including itself) with positive synapses (excitatory) or negative synapses (inhibitory). That is, each neuron can connect to another neuron with a weight w {\displaystyle w} taking an integer value. A peripheral afferent is a neuron with no incoming synapses. We can regard each neural network as a directed graph, with the nodes being the neurons, and the directed edges being the synapses. A neural network has a circle or a circuit if there exists a directed circle in the graph. Let w i j ( t ) {\displaystyle w_{ij}(t)} be the connection weight from neuron j {\displaystyle j} to neuron i {\displaystyle i} at time t {\displaystyle t} , then its next state is N i ( t + 1 ) = H ( ∑ j = 1 n w i j ( t ) N j ( t ) − θ i ( t ) ) , {\displaystyle N_{i}(t+1)=H\left(\sum _{j=1}^{n}w_{ij}(t)N_{j}(t)-\theta _{i}(t)\right),} where H {\displaystyle H} is the Heaviside step function (outputting 1 if the input is greater than or equal to 0, and 0 otherwise). === Symbolic logic === The paper used, as a logical language for describing neural networks, "Language II" from The Logical Syntax of Language by Rudolf Carnap with some notations taken from Principia Mathematica by Alfred North Whitehead and Bertrand Russell. Language II covers substantial parts of classical mathematics, including real analysis and portions of set theory. To describe a neural network with peripheral afferents N 1 , N 2 , … , N p {\displaystyle N_{1},N_{2},\dots ,N_{p}} and non-peripheral afferents N p + 1 , N p + 2 , … , N n {\displaystyle N_{p+1},N_{p+2},\dots ,N_{n}} they considered logical predicate of form P r ( N 1 , N 2 , … , N p , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{p},t)} where P r {\displaystyle Pr} is a first-order logic predicate function (a function that outputs a boolean), N 1 , … , N p {\displaystyle N_{1},\dots ,N_{p}} are predicates that take t {\displaystyle t} as an argument, and t {\displaystyle t} is the only free variable in the predicate. Intuitively speaking, N 1 , … , N p {\displaystyle N_{1},\dots ,N_{p}} specifies the binary input patterns going into the neural network over all time, and P r ( N 1 , N 2 , … , N n , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},t)} is a function that takes some binary input patterns, and constructs an output binary pattern P r ( N 1 , N 2 , … , N n , 0 ) , P r ( N 1 , N 2 , … , N n , 1 ) , … {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},0),Pr(N_{1},N_{2},\dots ,N_{n},1),\dots } . A logical sentence P r ( N 1 , N 2 , … , N n , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},t)} is realized by a neural network iff there exists a time-delay T ≥ 0 {\displaystyle T\geq 0} , a neuron i {\displaystyle i} in the network, and an initial state for the non-peripheral neurons N p + 1 ( 0 ) , … , N n ( 0 ) {\displaystyle N_{p+1}(0),\dots ,N_{n}(0)} , such that for any time t {\displaystyle t} , the truth-value of the logical sentence is equal to the state of the neuron i {\displaystyle i} at time t + T {\displaystyle t+T} . That is, ∀ t = 0 , 1 , 2 , … , P r ( N 1 , N 2 , … , N p , t ) = N i ( t + T ) {\displaystyle \forall t=0,1,2,\dots ,\quad Pr(N_{1},N_{2},\dots ,N_{p},t)=N_{i}(t+T)} === Equivalence === In the paper, they considered some alternative definitions of artificial neural networks, and have shown them to be equivalent, that is, neural networks under one definition realizes precisely the same logical sentences as neural networks under another definition. They considered three forms of inhibition: relative inhibition, absolute inhibition, and extinction. The definition above is relative inhibition. By "absolute inhibition" they meant that if any negative synapse fires, then the neuron will not fire. By "extinction" they meant that if at time t {\displaystyle t} , any inhibitory synapse fires on a neuron i {\displaystyle i} , then θ i ( t + j ) = θ i ( 0 ) + b j {\displaystyle \theta _{i}(t+j)=\theta _{i}(0)+b_{j}} for j = 1 , 2 , 3 , … {\displaystyle j=1,2,3,\dots } , until the next time an inhibitory synapse fires on i {\displaystyle i} . It is required that b j = 0 {\displaystyle b_{j}=0} for all large j {\displaystyle j} . Theorem 4 and 5 state that these are equivalent. They considered three forms of excitation: spatial summation, temporal summation, and facilitation. The definition above is spatial summation (which they pictured as having multiple synapses placed close together, so that the effect of their firing sums up). By "temporal summation" they meant that the total incoming signal is ∑ τ = 0 T ∑ j = 1 n w i j ( t ) N j ( t − τ ) {\displaystyle \sum _{\tau =0}^{T}\sum _{j=1}^{n}w_{ij}(t)N_{j}(t-\tau )} for some T ≥ 1 {\displaystyle T\geq 1} . By "facilitation" they meant the same as extinction, except that b j ≤ 0 {\displaystyle b_{j}\leq 0} . Theorem 6 states that these are equivalent. They considered neural networks that do not change, and those that change by Hebbian learning. That is, they assume that at t = 0 {\displaystyle t=0} , some excitatory synaptic connections are not active. If at any t {\displaystyle t} , both N i ( t ) = 1 , N j ( t ) = 1 {\displaystyle N_{i}(t)=1,N_{j}(t)=1} , then any latent excitatory synapse between i , j {\displaystyle i,j} becomes active. Theorem 7 states that these are equivalent. === Logical expressivity === They considered "temporal propositional expressions" (TPE), which are propositional formulas with one free variable t {\displaystyle t} . For example, N 1 ( t ) ∨ N 2 ( t ) ∧ ¬ N 3 ( t ) {\displaystyle N_{1}(t)\vee N_{2}(t)\wedge \neg N_{3}(t)} is such an expression. Theorem 1 and 2 together showed that neural nets without circles are equivalent to TPE. For neural nets with loops, they noted that "realizable P r {\displaystyle Pr} may involve reference to past events of an indefinite degree of remoteness". These then encodes for sentences like "There was some x such that x was a ψ" or ( ∃ x ) ( ψ x ) {\displaystyle (\exists x)(\psi x)} . Theorems 8 to 10 showed that neural nets with loops can encode all first-order logic with equality and conversely, any looped neural networks is equivalent to a sentence in first-order logic with equality, thus showing that they are equivalent in logical expressiveness. As a remark, they noted that a neural network, if furnished with a tape, scanners, and write-heads, is equivalent to a Turing machine, and conversely, every Turing machine is equivalent to some such neural network. Thus, these neural networks are equivalent to Turing computability and Church's lambda-definability. == Context == === Previous work === The paper built upon several previous strands of work. In the symbolic logic side, it built on the previous work by Carnap, Whitehead, and Russell. This was contributed by Walter Pitts, who had a strong proficiency with symbolic logic. Pitts provided mathematical and logical rigor to McCulloch’s vague ideas on psychons (atoms of psychological events) and circular causality. In the neuroscience side, it built on previous work by the mathematical biology research group centered around Nicolas Rashevsky, of which McCulloch was a member. The paper was published in the Bulletin of Mathematical Biophysics, which was founded by Rashevsky in 1939. During the late 1930s, Rashevsky's research group was producing papers that had difficulty publishing in other journals at the time, so Rashevsky decided to found a new journal exclusively devoted to mathematical biophysics. Also in the Rashevsky's group was Alston Scott Householder, who in 1941 published an abstract model

Hyperparameter (machine learning)

In machine learning, a hyperparameter is a parameter that can be set in order to define any configurable part of a model's learning process. Hyperparameters can be classified as either model hyperparameters (such as the topology and size of a neural network) or algorithm hyperparameters (such as the learning rate and the batch size of an optimizer). These are named hyperparameters in contrast to parameters, which are characteristics that the model learns from the data. Hyperparameters are not required by every model or algorithm. Some simple algorithms such as ordinary least squares regression require none. However, the LASSO algorithm, for example, adds a regularization hyperparameter to ordinary least squares which must be set before training. Even models and algorithms without a strict requirement to define hyperparameters may not produce meaningful results if these are not carefully chosen. However, optimal values for hyperparameters are not always easy to predict. Some hyperparameters may have no meaningful effect, or one important variable may be conditional upon the value of another. Often a separate process of hyperparameter tuning is needed to find a suitable combination for the data and task. As well as improving model performance, hyperparameters can be used by researchers to introduce robustness and reproducibility into their work, especially if it uses models that incorporate random number generation. == Considerations == The time required to train and test a model can depend upon the choice of its hyperparameters. A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems. The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers. === Difficulty-learnable parameters === The objective function is typically non-differentiable with respect to hyperparameters. As a result, in most instances, hyperparameters cannot be learned using gradient-based optimization methods (such as gradient descent), which are commonly employed to learn model parameters. These hyperparameters are those parameters describing a model representation that cannot be learned by common optimization methods, but nonetheless affect the loss function. An example would be the tolerance hyperparameter for errors in support vector machines. === Untrainable parameters === Sometimes, hyperparameters cannot be learned from the training data because they aggressively increase the capacity of a model and can push the loss function to an undesired minimum (overfitting to the data), as opposed to correctly mapping the richness of the structure in the data. For example, if we treat the degree of a polynomial equation fitting a regression model as a trainable parameter, the degree would increase until the model perfectly fit the data, yielding low training error, but poor generalization performance. === Tunability === Most performance variation can be attributed to just a few hyperparameters. The tunability of an algorithm, hyperparameter, or interacting hyperparameters is a measure of how much performance can be gained by tuning it. For an LSTM, while the learning rate followed by the network size are its most crucial hyperparameters, batching and momentum have no significant effect on its performance. Although some research has advocated the use of mini-batch sizes in the thousands, other work has found the best performance with mini-batch sizes between 2 and 32. === Robustness === An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance. Methods that are not robust to simple changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be integrated into mission critical control systems without significant simplification and robustification. Reinforcement learning algorithms, in particular, require measuring their performance over a large number of random seeds, and also measuring their sensitivity to choices of hyperparameters. Their evaluation with a small number of random seeds does not capture performance adequately due to high variance. Some reinforcement learning methods, e.g. DDPG (Deep Deterministic Policy Gradient), are more sensitive to hyperparameter choices than others. == Optimization == Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. Typically these methods are not gradient based, and instead apply concepts from derivative-free optimization or black box optimization. == Reproducibility == Apart from tuning hyperparameters, machine learning involves storing and organizing the parameters and results, and making sure they are reproducible. In the absence of a robust infrastructure for this purpose, research code often evolves quickly and compromises essential aspects like bookkeeping and reproducibility. Online collaboration platforms for machine learning go further by allowing scientists to automatically share, organize and discuss experiments, data, and algorithms. Reproducibility can be particularly difficult for deep learning models. For example, research has shown that deep learning models depend very heavily even on the random seed selection of the random number generator.

ELMo

ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence, and University of Washington and first released in February 2018. It is a bidirectional LSTM which takes character-level as inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then the large model is augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling. == Architecture == ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The output of all LSTMs concatenated together consists of the token embedding. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layered LSTM with 4096 units and 512 dimension projections, and a residual connection from the first to second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words. The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretraining-fine-tune paradigm. The original paper demonstrated this by improving state of the art on six benchmark NLP tasks. === Contextual word representation === The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentenceShe went to the bank to withdraw money.In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word to be a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing it to represent it to be a financial institution that the subject is going towards. == Historical context == ELMo is one link in a historical evolution of language modelling. Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This was computationally cheap but ignored the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike "bag of words" such as Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. It was trained on a corpus of about 30 million sentences and 1 billion words. Previously, bidirectional LSTM was used for contextualized word representation. ELMo applied the idea to a large scale, achieving state of the art performance. After the 2017 publication of Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer with implications for more parallelizable training.

GOLOG

GOLOG is a high-level logic programming language for the specification and execution of complex actions in dynamical domains. It is based on the situation calculus. It is a first-order logical language for reasoning about action and change. GOLOG was developed at the University of Toronto. == History == The concept of situation calculus on which the GOLOG programming language is based was first proposed by John McCarthy in 1963. == Description == A GOLOG interpreter automatically maintains a direct characterization of the dynamic world being modeled, on the basis of user supplied axioms about preconditions, effects of actions and the initial state of the world. This allows the application to reason about the condition of the world and consider the impacts of different potential actions before focusing on a specific action. Golog is a logic programming language and is very different from conventional programming languages. A procedural programming language like C defines the execution of statements in advance. The programmer creates a subroutine which consists of statements, and the computer executes each statement in a linear order. In contrast, fifth-generation programming languages like Golog work with an abstract model with which the interpreter can generate the sequence of actions. The source code defines the problem and it is up to the solver to find the next action. This approach can facilitate the management of complex problems from the domain of robotics. A Golog program defines the state space in which the agent is allowed to operate. A path in the symbolic domain is found with state space search. To speed up the process, Golog programs are realized as hierarchical task networks. Apart from the original Golog language, there are some extensions available. The ConGolog language provides concurrency and interrupts. Other dialects like IndiGolog and Readylog were created for real time applications in which sensor readings are updated on the fly. == Uses == Golog has been used to model the behavior of autonomous agents. In addition to a logic-based action formalism for describing the environment and the effects of basic actions, they enable the construction of complex actions using typical programming language constructs. It is also used for applications in high level control of robots and industrial processes, virtual agents, discrete event simulation etc. It can be also used to develop Belief Desire Intention-style agent systems. == Planning and scripting == In contrast to the Planning Domain Definition Language, Golog supports planning and scripting as well. Planning means that a goal state in the world model is defined, and the solver brings a logical system into this state. Behavior scripting implements reactive procedures, which are running as a computer program. For example, suppose the idea is to authoring a story. The user defines what should be true at the end of the plot. A solver gets started and applies possible actions to the current situation until the goal state is reached. The specification of a goal state and the possible actions are realized in the logical world model. In contrast, a hardwired reactive behavior doesn't need a solver but the action sequence is provided in a scripting language. The Golog interpreter, which is written in Prolog, executes the script and this will bring the story into the goal state.

EfficientNet

EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter. EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation. == Compound scaling == EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient ϕ {\displaystyle \phi } to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: depth multiplier: d = α ϕ width multiplier: w = β ϕ resolution multiplier: r = γ ϕ {\displaystyle {\begin{aligned}{\text{depth multiplier: }}d&=\alpha ^{\phi }\\{\text{width multiplier: }}w&=\beta ^{\phi }\\{\text{resolution multiplier: }}r&=\gamma ^{\phi }\end{aligned}}} subject to α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} and α ≥ 1 , β ≥ 1 , γ ≥ 1 {\displaystyle \alpha \geq 1,\beta \geq 1,\gamma \geq 1} . The α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} condition is such that increasing ϕ {\displaystyle \phi } by a factor of ϕ 0 {\displaystyle \phi _{0}} would increase the total FLOPs of running the network on an image approximately 2 ϕ 0 {\displaystyle 2^{\phi _{0}}} times. The hyperparameters α {\displaystyle \alpha } , β {\displaystyle \beta } , and γ {\displaystyle \gamma } are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively. Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well. The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network by increasing ϕ {\displaystyle \phi } . == Variants == EfficientNet has been adapted for fast inference on edge TPUs and centralized TPU or GPU clusters by NAS. EfficientNet V2 was published in June 2021. The architecture was improved by further NAS search with more types of convolutional layers. It also introduced a training method, which progressively increases image size during training, and uses regularization techniques like dropout, RandAugment, and Mixup. The authors claim this approach mitigates accuracy drops often associated with progressive resizing.