Discrete diffusion model


In machine learning, discrete diffusion models are a class of diffusion models, which are themselves a class of latent variable generative models. Each discrete diffusion model consists of two major components: the forward jump diffusion process and the reverse jump diffusion process. Given a dataset and a forward process, the goal of diffusion modeling is to learn a model of the reverse process, so that the reverse process can generate new elements distributed similarly to the original dataset. A trained discrete diffusion model can be sampled in many ways, which trade off computational efficiency against sample quality: in general, higher-quality samples can be obtained at a higher computational cost.

In standard diffusion modeling, the diffusion process takes place over a continuous state space $\mathbb{R}^n$; in discrete diffusion, it takes place over a discrete set $S$. A discrete set is simply a set in which one cannot speak of "infinitesimally close" points: points can be more or less separated from each other, but the separation is always a finite number. In particular, this means the standard framework of continuous diffusion does not apply, since it uses gaussian noise, which is continuous. Nevertheless, an analogous theory can be developed. Discrete diffusion is usually used for language modeling. In practice, the state space $S$ is not only discrete but finite, so this is what we will assume from now on.

== Continuous time Markov process ==

In the case of a continuous state space, during the forward diffusion process, at each step $t \to t + dt$ we mix in an infinitesimal amount of gaussian noise:

$$dx_t = -\frac{1}{2}\beta(t)\, x_t\, dt + \sqrt{\beta(t)}\, dW_t.$$

This changes the probability density function, first by a convolution with the density of a gaussian, then by a scaling.
In the case of a discrete state space, the gaussian noise must be replaced by a noise that takes values in a finite set. For example, if the noise is the uniform distribution over $S$, then the probability distribution at time $t + dt$ satisfies

$$q_{t+dt}(x) = (1 - dt)\, q_t(x) + dt \left(\frac{1}{|S|} \sum_{y \in S} q_t(y)\right).$$

More succinctly,

$$\partial_t q_t(x) = -\left(1 - \frac{1}{|S|}\right) q_t(x) + \sum_{y \in S,\, y \neq x} \frac{1}{|S|}\, q_t(y).$$

In general, we do not need to convolve with uniformly distributed noise; we can use an arbitrary noise process. That is, we use an arbitrary matrix $Q_t$ such that

$$\partial_t q_t(y) = \sum_{x \in S} Q_t(y, x)\, q_t(x),$$

where $Q_t$ is called the rate matrix. Any matrix may be used as a rate matrix if it has non-negative off-diagonal entries and each column sums to 0:

$$Q_t(y, x) \geq 0 \quad \forall y \neq x, \qquad \sum_{y \in S} Q_t(y, x) = 0 \quad \forall x.$$

A continuous time Markov chain (CTMC) is defined by a continuous function $Q$ that maps any time $t \in [0, T)$ to a rate matrix $Q_t$.
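As a minimal numerical check of the uniform-noise example, the sketch below (assuming NumPy; the helper names `uniform_rate_matrix`, `is_rate_matrix` and `evolve_euler` are hypothetical) builds the rate matrix $Q = \frac{1}{|S|}\mathbf{1}\mathbf{1}^\top - I$, verifies the two rate-matrix conditions, and integrates $\partial_t q_t = Q q_t$ with explicit Euler steps. Because $\sum_y q_t(y) = 1$, the uniform equation reduces to $\partial_t q_t(x) = \frac{1}{|S|} - q_t(x)$, whose solution is $q_t = u + e^{-t}(q_0 - u)$ with $u$ the uniform distribution; the numerical result is compared against this closed form.

```python
import numpy as np


def uniform_rate_matrix(n):
    """Rate matrix of the uniform-noise process: Q(y, x) = 1/n - [y == x]."""
    return np.full((n, n), 1.0 / n) - np.eye(n)


def is_rate_matrix(Q, tol=1e-12):
    """Check non-negative off-diagonal entries and columns summing to zero."""
    off_diagonal = Q[~np.eye(Q.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal >= -tol) and np.allclose(Q.sum(axis=0), 0.0, atol=tol))


def evolve_euler(q0, Q, t, n_steps):
    """Integrate d/dt q_t = Q q_t from time 0 to t with explicit Euler steps."""
    q = q0.astype(float)
    dt = t / n_steps
    for _ in range(n_steps):
        q = q + dt * (Q @ q)
    return q


n = 5
Q = uniform_rate_matrix(n)
q0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])      # all mass on a single state
u = np.full(n, 1.0 / n)                        # uniform distribution over S
q_numeric = evolve_euler(q0, Q, t=1.0, n_steps=10_000)
q_closed = u + np.exp(-1.0) * (q0 - u)         # closed-form marginal at t = 1
```

Because the columns of $Q$ sum to zero, each Euler step preserves total probability, which is why `q_numeric` stays normalized.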
Given the function $Q$, time evolution under the CTMC is done as follows. Given the state $x_t$ at time $t$ and an infinitesimal $dt$, the state $x_{t+dt}$ at time $t + dt$ is sampled according to

$$\Pr(x_{t+dt} \mid x_t) = \begin{cases} 1 + Q_t(x_{t+dt}, x_t)\, dt & \text{if } x_{t+dt} = x_t \\ Q_t(x_{t+dt}, x_t)\, dt & \text{else.} \end{cases}$$

This implies that the probability distribution function evolves according to

$$\partial_t q_t(y) = \sum_{x \in S} Q_t(y, x)\, q_t(x),$$

which is what we previously specified.

=== Backward process ===

As in the case of continuous diffusion, in discrete diffusion there exists a backward diffusion process, with rate matrix $\bar{Q}_t$:

$$s(x, t)_y := \frac{q_t(y)}{q_t(x)}, \qquad \bar{Q}_t(y, x) := \begin{cases} s(x, t)_y\, Q_t(x, y) & \text{if } y \neq x \\ -\sum_{z \neq x} \bar{Q}_t(z, x) & \text{if } y = x \end{cases}$$

where $s(x, t)_y$ should be interpreted as the discrete score or concrete score since, abusing notation a bit, the score function is

$$\nabla \ln \rho_t(x) = \frac{1}{dx}\left(\frac{\rho_t(x + dx)}{\rho_t(x)} - 1\right).$$
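The backward rate matrix can be built directly from its definition. The following NumPy sketch (the helper name `backward_rate_matrix` is hypothetical; it assumes a strictly positive marginal $q_t$ so that the concrete score is finite) constructs $\bar{Q}_t$ for an arbitrary randomly generated rate matrix and checks two consequences: for $y \neq x$, $\bar{Q}_t(y, x)\, q_t(x) = Q_t(x, y)\, q_t(y)$, and, summing over $x$, $\sum_x \bar{Q}_t(y, x)\, q_t(x) = -\partial_t q_t(y)$, i.e. the backward chain moves the marginals backward in time.

```python
import numpy as np


def backward_rate_matrix(Q, q):
    """Reverse-time rate matrix built from the concrete score s(x)_y = q(y) / q(x).

    Qbar[y, x] = (q[y] / q[x]) * Q[x, y] for y != x; the diagonal is set so that
    every column sums to zero, as required of a rate matrix.
    """
    score = q[:, None] / q[None, :]            # score[y, x] = q(y) / q(x)
    Qbar = score * Q.T                          # Q.T[y, x] = Q[x, y]
    np.fill_diagonal(Qbar, 0.0)
    np.fill_diagonal(Qbar, -Qbar.sum(axis=0))   # columns sum to zero
    return Qbar


rng = np.random.default_rng(0)
n = 4
Q = rng.random((n, n))                          # arbitrary non-negative jump rates
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=0))             # forward rate matrix: columns sum to zero
q = rng.random(n) + 0.1
q /= q.sum()                                    # strictly positive marginal q_t
Qbar = backward_rate_matrix(Q, q)

off = ~np.eye(n, dtype=bool)
# Mass moved by the backward chain from x to y equals mass moved forward from y to x.
flows_match = np.allclose((Qbar * q[None, :])[off], (Q * q[None, :]).T[off])
# Consequently Qbar q = -(Q q) = -dq/dt.
reverses_marginals = np.allclose(Qbar @ q, -(Q @ q))
```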
If we picture the distribution $q_t$ as a collection of point masses, one per state $x \in S$, then the forward diffusion from time $t$ to $t + dt$ is performed by removing $Q_t(x, y)\, q_t(y)\, dt$ from the mass at $y$ and moving it to the mass at $x$, for each pair $x \neq y$. The process is therefore reversed in detail by the CTMC defined by $\bar{Q}$, since $\bar{Q}_t(y, x)\, q_t(x) = Q_t(x, y)\, q_t(y)$: the mass the backward chain moves from $x$ to $y$ equals the mass the forward chain moved from $y$ to $x$. Given $\bar{Q}_t$, if we have a way to sample from $q_t$, then we can sample from $q_{t-dt}$ by first sampling $x_t \sim q_t$, then sampling $x_{t-dt}$ according to

$$\Pr(x_{t-dt} \mid x_t) = \begin{cases} 1 + \bar{Q}_t(x_{t-dt}, x_t)\, dt & \text{if } x_{t-dt} = x_t \\ \bar{Q}_t(x_{t-dt}, x_t)\, dt & \text{else.} \end{cases}$$

=== Overall plan of score-matching discrete diffusion modeling ===

Similar to score-matching continuous diffusion, score-matching discrete diffusion is a method for sampling from the initial distribution $q_0$. If we have a function $s_\theta$ that approximates the true score function, $s_\theta(x, t)_y \approx s(x, t)_y$, then a corresponding $\bar{Q}^\theta$ can be defined in the same way.
If we also have a base distribution $q_\text{base}$ that is easy to sample from and approximately equal to the true terminal distribution, $q_\text{base} \approx q_T$, then we can run the backward CTMC with $\bar{Q}^\theta$ starting from $q_T^\theta := q_\text{base}$. When both approximations are good, the backward CTMC gives $q_0^\theta \approx q_0$. This is the idea of score-matching discrete diffusion modeling.

If $q_\text{data}$ is sharp, in the sense that for some $x, x'$ we have $q_\text{data}(x) \gg q_\text{data}(x')$, then the score function would diverge as $1/t$ in the $t \to 0$ limit. To avoid this in practice, it is common to use early stopping: the backward process is stopped at some time $\delta > 0$, and samples are drawn from $q_\delta^\theta$ instead of $q_0^\theta$.

=== Tractable forward processes ===

The theory of CTMCs works for any continuous choice of rate matrices $Q$. However, most choices are computationally expensive and cannot be used in practice. In the case of continuous diffusion, gaussian noise is used for the simple reason that a sum of independent gaussian random variables is still gaussian. This allows one to sample any $x_t \sim \rho_t$ by sampling a single $x_0 \sim \rho_0$ and a single gaussian noise $z \sim \mathcal{N}(0, I)$, and letting $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sigma_t z$, without needing any $x_s$ for any $0 < s < t$.
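The overall plan can be exercised end to end on a toy problem. The sketch below is an illustration under simplifying assumptions, not a training recipe: it uses the uniform-noise process, whose marginals $q_t = u + e^{-t}(q_0 - u)$ are available in closed form (a discrete counterpart of the one-shot sampling just described), so the exact concrete score stands in for a learned $s_\theta$. It starts from a uniform $q_\text{base}$ at time $T$, applies the Euler transition rule of the backward CTMC, and stops early at $\delta$. The function names and all numerical settings are hypothetical.

```python
import numpy as np


def uniform_marginal(q0, t):
    """Closed-form marginal of the uniform-noise CTMC: q_t = u + exp(-t) (q0 - u)."""
    u = np.full(q0.shape, 1.0 / q0.size)
    return u + np.exp(-t) * (q0 - u)


def backward_rate_matrix(Q, q):
    """Qbar(y, x) = (q(y) / q(x)) Q(x, y) off the diagonal; columns sum to zero."""
    Qbar = (q[:, None] / q[None, :]) * Q.T
    np.fill_diagonal(Qbar, 0.0)
    np.fill_diagonal(Qbar, -Qbar.sum(axis=0))
    return Qbar


def sample_reverse(q0, T, delta, n_steps, n_chains, rng):
    """Run the backward CTMC from time T down to the early-stopping time delta."""
    n = q0.size
    Q = np.full((n, n), 1.0 / n) - np.eye(n)       # uniform-noise rate matrix
    x = rng.integers(n, size=n_chains)            # x_T ~ q_base (uniform), approx q_T
    times = np.linspace(T, delta, n_steps + 1)
    for t, t_next in zip(times[:-1], times[1:]):
        dt = t - t_next
        # Exact concrete score from the closed-form marginal stands in for s_theta.
        Qbar = backward_rate_matrix(Q, uniform_marginal(q0, t))
        P = np.eye(n) + dt * Qbar                 # column x: law of x_{t-dt} given x_t = x
        cdf = np.cumsum(P[:, x], axis=0)          # shape (n, n_chains)
        r = rng.random(n_chains)
        # Inverse-CDF sampling; the clamp guards against round-off in the last entry.
        x = np.minimum((r[None, :] >= cdf).sum(axis=0), n - 1)
    return x


rng = np.random.default_rng(0)
q0 = np.array([0.6, 0.3, 0.1, 0.0])               # sharp "data" distribution with an empty state
samples = sample_reverse(q0, T=5.0, delta=0.05, n_steps=2000, n_chains=20000, rng=rng)
empirical = np.bincount(samples, minlength=4) / samples.size
target = uniform_marginal(q0, 0.05)               # q_delta, the early-stopped target
```

With these settings the empirical distribution should be close to $q_\delta$ rather than $q_0$: early stopping leaves a small amount of mass (about 0.012) on the state that has zero probability under $q_0$, and the step size is chosen small enough that every Euler transition probability stays non-negative.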

NRD Cyber Security

NRD Cyber Security is a Lithuanian company that provides cybersecurity solutions, consulting, and other services. The organization specializes in the creation, modernization, and training of CSIRTs and SOCs. It has helped to establish national and sectoral CSIRTs around the world, in countries such as Bangladesh, Egypt, Bhutan, Kosovo, and Malawi. NRD Cyber Security was founded in 2013 to provide quality cybersecurity services to nations and organizations. In 2018 it was included in the Deloitte Technology Fast 50 in Europe list. In 2024 it was ranked 98th in MSSP Alert's Top 250 list of the world's managed security service providers. The company is a member of various cybersecurity organizations, such as the Forum of Incident Response and Security Teams (FIRST), the Global Forum on Cyber Expertise (GFCE), and Unicorns LT. It is a strategic partner of the Global Cyber Security Capacity Centre (GCSCC) at the University of Oxford.

Digital image correlation for electronics

Digital image correlation (DIC) analyses have applications in material property characterization, displacement measurement, and strain mapping. As such, DIC is becoming an increasingly popular tool for evaluating the thermo-mechanical behavior of electronic components and systems.

== CTE measurements and glass transition temperature identification ==

The most common application of DIC in the electronics industry is the measurement of the coefficient of thermal expansion (CTE). Because it is a non-contact, full-field surface technique, DIC is ideal for measuring the effective CTE of printed circuit boards (PCBs) and of individual surfaces of electronic components. It is especially useful for characterizing complex integrated circuits, as the combined thermal expansion effects of the substrate, molding compound, and die make the effective CTE at the substrate surface difficult to estimate with other experimental methods. DIC techniques can be used to calculate the average in-plane strain over an area of interest as a function of temperature during a thermal profile. Linear curve fitting and slope calculation can then be used to estimate an effective CTE for the observed area. Because the driving factor in solder fatigue is most often the CTE mismatch between a component and the PCB it is soldered to, accurate CTE measurements are vital for calculating printed circuit board assembly (PCBA) reliability metrics.

DIC is also useful for characterizing the thermal properties of polymers. Polymers are often used in electronic assemblies as potting compounds, conformal coatings, adhesives, molding compounds, dielectrics, and underfills. Because the stiffness of such materials can vary widely, it is difficult to determine their thermal characteristics consistently with contact techniques that transfer load to the specimen, such as dynamic mechanical analysis (DMA) and thermomechanical analysis (TMA).
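A minimal sketch of the CTE workflow described above, assuming NumPy and hypothetical, noise-free data (real DIC strain data are noisy and need a sensible temperature window). Per-frame strain maps are averaged over a region of interest, and the effective CTE is the slope of a least-squares line through mean strain versus temperature. The `estimate_tg` helper, also hypothetical, anticipates the glass transition detection discussed below: it fits two lines to every split of the curve and takes their intersection as the temperature at which the slope changes.

```python
import numpy as np


def roi_mean_strain(strain_fields, roi_mask):
    """Average strain inside a boolean region-of-interest mask, one value per DIC frame."""
    return np.array([field[roi_mask].mean() for field in strain_fields])


def effective_cte(temps_c, mean_strain):
    """Effective CTE (1/°C): slope of a least-squares line through strain vs. temperature."""
    slope, _intercept = np.polyfit(temps_c, mean_strain, deg=1)
    return slope


def estimate_tg(temps_c, mean_strain, min_points=5):
    """Fit two lines to every split of the curve, keep the split with the lowest
    total squared residual, and return (tg, cte_below, cte_above), where tg is
    the temperature at which the two fitted lines intersect."""
    best = None
    for k in range(min_points, len(temps_c) - min_points + 1):
        fits, sse = [], 0.0
        for t_seg, e_seg in ((temps_c[:k], mean_strain[:k]), (temps_c[k:], mean_strain[k:])):
            coeffs = np.polyfit(t_seg, e_seg, deg=1)
            sse += np.sum((np.polyval(coeffs, t_seg) - e_seg) ** 2)
            fits.append(coeffs)
        if best is None or sse < best[0]:
            best = (sse, fits)
    (a1, b1), (a2, b2) = best[1]
    tg = (b2 - b1) / (a1 - a2)
    return tg, a1, a2


# Hypothetical polymer: 60 ppm/°C below and 180 ppm/°C above a transition at 80 °C.
temps = np.arange(20.0, 141.0)                               # 1 °C steps
true_strain = np.where(temps <= 80.0, 60e-6 * (temps - 20.0),
                       3.6e-3 + 180e-6 * (temps - 80.0))
fields = [np.full((32, 32), e) for e in true_strain]         # stand-in per-frame strain maps
roi = np.zeros((32, 32), dtype=bool)
roi[8:24, 8:24] = True
mean_strain = roi_mean_strain(fields, roi)
below = temps <= 80.0
cte_below_ppm = effective_cte(temps[below], mean_strain[below]) * 1e6
tg, cte1, cte2 = estimate_tg(temps, mean_strain)
```

On this synthetic curve the fits recover the assumed 60 and 180 ppm/°C and a transition at 80 °C. A two-segment least-squares fit is only one of several reasonable ways to locate the slope change.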
Accurate CTE measurements are important for these materials because, depending on the specific use case, their expansion and contraction can drastically affect solder joint reliability. For example, if a stiff conformal coating or other polymeric encapsulant is allowed to flow under a QFN (quad flat no-leads) package, its expansion and contraction during thermal cycling can add tensile stress to the solder joints and accelerate fatigue failure. DIC techniques also allow detection of the glass transition temperature (Tg): at the glass transition temperature, the strain-versus-temperature curve exhibits a change in slope. Determining Tg is very important for polymeric materials whose glass transition temperatures may fall within the operating temperature range of the electronic assemblies and components on which they are used. For example, the elastic modulus of some potting materials can change by a factor of 100 or more over the glass transition region. Such changes can drastically affect an electronic assembly's reliability if they are not planned for in the design process.

== Out-of-plane component warpage ==

When 3D DIC techniques are employed, out-of-plane motion can be tracked in addition to in-plane motion. Out-of-plane warpage is of particular interest at the component level of electronics packaging for quantifying solder joint reliability. Excessive warpage during reflow can contribute to defective solder joints by lifting the edges of the component away from the board and creating head-in-pillow defects in ball grid arrays (BGAs). Warpage can also shorten the fatigue life of otherwise adequate joints by adding tensile stresses to edge joints during thermal cycling.

== Thermo-mechanical strain mapping ==

When a PCBA is over-constrained, thermo-mechanical stress arising during thermal expansion can cause board strains that may degrade the reliability of individual components and of the overall assembly.
The full-field monitoring capabilities of image correlation allow strain magnitude and location to be measured on the surface of a specimen during a displacement-causing event, such as a PCBA undergoing a thermal profile. These "strain maps" allow strain levels to be compared across entire areas of interest. Many traditional discrete methods, such as extensometers and strain gauges, allow only localized measurements of strain, which limits their ability to measure strain efficiently across larger areas. DIC techniques have also been used to generate strain maps on electronic assemblies from purely mechanical events, such as drop impact tests.
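Strain maps are typically derived from the displacement fields that DIC measures. The sketch below assumes NumPy, a regular grid, and small strains; it differentiates hypothetical displacement fields with `np.gradient` to obtain the in-plane strain components and the maximum principal strain at every grid point. The function name is illustrative, and commercial DIC software may use different strain definitions and smoothing.

```python
import numpy as np


def small_strain_maps(u, v, dx, dy):
    """In-plane small-strain components from DIC displacement fields.

    u, v: x- and y-displacement arrays on a regular grid, indexed [row (y), column (x)].
    dx, dy: grid spacing in x and y (same length unit as u and v).
    Returns (exx, eyy, exy, e1), where exy is the tensorial shear strain and
    e1 is the maximum in-plane principal strain.
    """
    du_dy, du_dx = np.gradient(u, dy, dx)   # derivatives along axis 0 (y) and axis 1 (x)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    exx = du_dx
    eyy = dv_dy
    exy = 0.5 * (du_dy + dv_dx)
    # Maximum principal strain of the 2x2 strain tensor at each grid point.
    center = 0.5 * (exx + eyy)
    radius = np.sqrt((0.5 * (exx - eyy)) ** 2 + exy ** 2)
    e1 = center + radius
    return exx, eyy, exy, e1


# Hypothetical example: a 4 mm x 6 mm area of interest sampled every 0.1 mm,
# undergoing a uniform 0.17 % expansion plus a small shear.
y, x = np.mgrid[0:40, 0:60] * 0.1
u = 1.7e-3 * x + 4e-4 * y
v = 1.7e-3 * y
exx, eyy, exy, e1 = small_strain_maps(u, v, dx=0.1, dy=0.1)
```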

DVD

DVD (digital video disc or digital versatile disc) is a digital optical disc data storage format. It was invented and developed in 1995 and first released on November 1, 1996, in Japan. The medium can store any kind of digital data and has been widely used to store video programs (watched using DVD players), software and other computer files. DVDs offer significantly higher storage capacity than compact discs (CDs) while having the same dimensions. A standard single-layer DVD can store up to 4.7 GB of data, and a dual-layer DVD up to 8.5 GB. Dual-layer, double-sided DVDs can store up to 17.08 GB. Prerecorded DVDs are mass-produced using molding machines that physically stamp data onto the DVD. Such discs are a form of DVD-ROM because data can only be read, not written or erased. Blank recordable DVDs (DVD-R and DVD+R) can be recorded once using a DVD recorder and then function as a DVD-ROM. Rewritable DVDs (DVD-RW, DVD+RW, and DVD-RAM) can be recorded and erased many times. DVDs are used in the DVD-Video consumer digital video format and, less commonly, in the DVD-Audio consumer digital audio format, as well as for authoring discs written in the special AVCHD format to hold high-definition material (often in conjunction with AVCHD camcorders). DVDs containing other types of information may be referred to as DVD data discs.

== Etymology ==

The Oxford English Dictionary comments that, "In 1995, rival manufacturers of the product initially named digital video disc agreed that, in order to emphasize the flexibility of the format for multimedia applications, the preferred abbreviation DVD would be understood to denote digital versatile disc." The OED also states that in 1995, "The companies said the official name of the format will simply be DVD. Toshiba had been using the name 'digital video disc', but that was switched to 'digital versatile disc' after computer companies complained that it left out their applications."
"Digital versatile disc" is the explanation provided in a DVD Forum Primer from 2000 and in the mission statement of the DVD Forum, whose purpose is to promote broad acceptance of DVD products across the technology, entertainment, and other industries. Because DVDs became highly popular for the distribution of movies in the 2000s, the term DVD came into popular English use as a noun denoting specifically a full-length movie released on the format; for example, the phrase "to watch a DVD" describes watching a movie on DVD.

== History ==

=== Development and launch ===

Released in 1987, CD Video used analog video encoding on optical discs matching the established standard 120 mm (4.7 in) size of audio CDs. In 1993, Video CD (VCD) became one of the first formats for distributing digitally encoded films on discs of this size. In the same year, two new optical disc storage formats were being developed. One was the Multimedia Compact Disc (MMCD), backed by Philips and Sony (developers of the CD and CD-i), and the other was the Super Density (SD) disc, supported by Toshiba, Time Warner, Matsushita Electric, Hitachi, Mitsubishi Electric, Pioneer, Thomson, and JVC. By the time of the press launches for both formats in January 1995, the MMCD nomenclature had been dropped, and Philips and Sony were referring to their format as Digital Video Disc (DVD). On May 3, 1995, an ad hoc industry technical group formed from five computer companies (IBM, Apple, Compaq, Hewlett-Packard, and Microsoft) issued a press release stating that they would only accept a single format. The group voted to boycott both formats unless the two camps agreed on a single, converged standard. They recruited Lou Gerstner, president of IBM, to pressure the executives of the warring factions.
In one significant compromise, the MMCD and SD groups agreed to adopt proposal SD 9, which specified that both layers of the dual-layered disc be read from the same side, instead of proposal SD 10, which would have created a two-sided disc that users would have to turn over. Philips and Sony strongly insisted on the channel code EFMPlus, which Kees Schouhamer Immink had designed for the MMCD, because it made it possible to apply the existing CD servo technology. Its drawback was a loss of capacity from 5 GB to 4.7 GB. As a result, the DVD specification provided a storage capacity of 4.7 GB (4.38 GiB) for a single-layered, single-sided disc and 8.5 GB (7.92 GiB) for a dual-layered, single-sided disc. The DVD specification ended up similar to Toshiba and Matsushita's Super Density Disc, except for the dual-layer option. MMCD was single-sided and optionally dual-layer, whereas SD was two half-thickness, single-layer discs which were pressed separately and then glued together to form a double-sided disc. Philips and Sony decided that it was in their best interests to end the format war, and on September 15, 1995, agreed to unify with the companies backing the Super Density Disc to release a single format, with technologies from both. After other compromises between MMCD and SD, the group of computer companies won the day, and a single format was agreed upon. The computer companies also collaborated with the Optical Storage Technology Association (OSTA) on using its implementation of the ISO 13346 file system (known as Universal Disk Format) on the new DVDs. The format's details were finalized on December 8, 1995. In November 1995, Samsung announced it would start mass-producing DVDs by September 1996. The format launched on November 1, 1996, in Japan, mostly with music video releases. The first major releases from Warner Home Video arrived on December 20, 1996, with four titles available. The format's release in the U.S. was delayed multiple times, from August 1996 to October 1996 and then November 1996, before finally settling on early 1997. Players began to be produced domestically that winter, with March 24, 1997, as the U.S. launch date of the format proper in seven test markets. Approximately 32 titles were available on launch day, mainly from the Warner Bros., MGM, and New Line libraries, with the notable inclusion of the 1996 film Twister. However, the launch had been planned for the following day (March 25), so the early sales broke the street date; retailers and studios changed their distribution arrangements to prevent similar violations. The nationwide rollout of the format happened on August 22, 1997. DTS announced in late 1997 that it would be coming onto the format. The sound system company revealed details in a November 1997 online interview and said it would release discs in early 1998. However, this date was pushed back several times before DTS finally released its first titles at the 1999 Consumer Electronics Show. In 2001, blank recordable DVDs cost the equivalent of US$27.34 in 2022 dollars.

=== Adoption ===

Movie and home entertainment distributors adopted the DVD format to replace the ubiquitous VHS tape as the primary consumer video distribution format. Immediately following the formal adoption of a unified standard for DVD, two of the four leading video game console companies (Sega and The 3DO Company) said they already had plans to design a gaming console with DVDs as the source medium. Sony stated at the time that it had no plans to use DVD in its gaming systems, despite being one of the developers of the DVD format and eventually the first company to actually release a DVD-based console. Game consoles such as the PlayStation 2, Xbox, and Xbox 360 use DVDs as their source medium for games and other software. Contemporary games for Windows were also distributed on DVD. Early DVDs were mastered using DLT tape, but using DVD-R DL or +R DL eventually became common.
TV/DVD combos, combining a standard-definition CRT TV or an HD flat-panel TV with a DVD mechanism under the CRT or on the back of the flat panel, and VCR/DVD combos were also available for purchase. For consumers, DVD soon overtook VHS as the favored choice for home movie releases. In 2001, DVD players outsold VCRs for the first time in the United States. At that time, one in four American households owned a DVD player. By 2007, about 80% of Americans owned a DVD player, a share that had surpassed that of VCRs and was also higher than that of personal computers or cable television.

== Specifications ==

The DVD specifications created and updated by the DVD Forum are published as so-called DVD Books (e.g. DVD-ROM Book, DVD-Audio Book, DVD-Video Book, DVD-R Book, DVD-RW Book, DVD-RAM Book, DVD-AR (Audio Recording) Book, DVD-VR (Video Recording) Book, etc.). A DVD is made up of two discs; normally one is blank and the other contains data. Each disc is 0.6 mm thick, and the two are glued together to form a DVD. The gluing must be done carefully to make the disc as flat as possible, avoiding both birefringence and "disc tilt", in which the disc is not perfectly flat and cannot be read. Some specifications for mechanical, physical and optical characteristics of DV

The Dodo (website)

The Dodo is an American online publisher focused on animals. The website was launched in January 2014 by Izzie Lerer, the daughter of media executive Kenneth Lerer, and journalist Kerry Lauerman. The Dodo has become one of the most popular Facebook publishers, garnering 1 billion video views from the social network in November 2015. The Dodo is headquartered in New York City.

== History ==

The company, named after the first recorded species that humans drove to extinction, was founded by Lerer out of "a personal passion for the subject matter". Lerer, who has a PhD from Columbia University in animal studies with a focus on animal ethics and human relationships, launched the website after noticing the viral success of animal videos online and seeing that no one "really owned the space." The Dodo's editorial and video production staff unionized with the Writers Guild of America, East in April 2018.

Convolutional layer

In artificial neural networks, a convolutional layer is a type of network layer that applies a convolution operation to the input. Convolutional layers are among the primary building blocks of convolutional neural networks (CNNs), a class of neural networks most commonly applied to images, video, audio, and other data that have the property of uniform translational symmetry. The convolution operation in a convolutional layer involves sliding a small window (called a kernel or filter) across the input data and computing the dot product between the values in the kernel and the input at each position. This process creates a feature map that represents features detected in the input.

== Concepts ==

=== Kernel ===

Kernels, also known as filters, are small matrices of weights that are learned during training. Each kernel is responsible for detecting a specific feature in the input data. The size of the kernel is a hyperparameter that affects the network's behavior.

=== Convolution ===

For a 2D input $x$ and a 2D kernel $w$, the 2D convolution operation can be expressed as:

$$y[i, j] = \sum_{m=0}^{k_h - 1} \sum_{n=0}^{k_w - 1} x[i + m, j + n] \cdot w[m, n]$$

where $k_h$ and $k_w$ are the height and width of the kernel, respectively. Strictly speaking, because the kernel is not flipped, this is a cross-correlation, but deep learning usage calls it a convolution. This generalizes immediately to nD convolutions. Commonly used convolutions are 1D (for audio and text), 2D (for images), and 3D (for spatial objects and videos).

=== Stride ===

Stride determines how the kernel moves across the input data. A stride of 1 means the kernel shifts by one pixel at a time, while a larger stride (e.g., 2 or 3) results in less overlap between successive kernel positions and produces smaller output feature maps.

=== Padding ===

Padding involves adding extra pixels around the edges of the input data.
It serves two main purposes: preserving spatial dimensions, since without padding each convolution reduces the size of the feature map; and handling border pixels, since padding ensures that border pixels are given equal importance in the convolution process. Common padding strategies include no padding (also called valid padding), which typically causes the output to shrink; same padding, which is any method that makes the output the same size as the input; and full padding, which is any method that ensures each input entry is convolved over the same number of times. Common padding algorithms include zero padding, which adds zero entries to the borders of the input; mirror (reflect or symmetric) padding, which reflects the input array at the border; and circular padding, which cycles the input array back to the opposite border, as on a torus. The exact sizes involved in convolutions are complicated to work out; see (Dumoulin and Visin, 2018) for details.

== Variants ==

=== Standard ===

This is the basic form of convolution described above, in which each kernel is applied to the entire input volume.

=== Depthwise separable ===

Depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently, and a pointwise convolution ($1 \times 1$ convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. It was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and reduce model size.

=== Dilated ===

Dilated convolution, or atrous convolution, introduces gaps between kernel elements, allowing the network to capture a larger receptive field without increasing the kernel size.
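The convolution formula, stride, padding modes, and dilation described above can be combined in a short single-channel NumPy sketch. It ignores batches, channels and biases and uses explicit loops for readability rather than speed; the names `conv2d`, `standard_conv_params` and `depthwise_separable_params` are illustrative, not a library API. The padding algorithms map onto `np.pad` modes: `"constant"` for zero padding, `"reflect"` or `"symmetric"` for mirror padding, and `"wrap"` for circular padding. For input size $n$, padding $p$, kernel size $k$, dilation $d$ and stride $s$, the output size is $\lfloor (n + 2p - d(k - 1) - 1)/s \rfloor + 1$.

```python
import numpy as np


def conv2d(x, w, stride=1, padding=0, pad_mode="constant", dilation=1):
    """Single-channel 2D convolution as used in CNN layers (no kernel flip).

    y[i, j] = sum_{m,n} xp[i*stride + m*dilation, j*stride + n*dilation] * w[m, n],
    where xp is x padded by `padding` entries on every side using np.pad(mode=pad_mode).
    """
    if padding > 0:
        x = np.pad(x, padding, mode=pad_mode)
    kh, kw = w.shape
    # Extent covered by a dilated kernel: (dilation - 1) gaps between taps.
    span_h = dilation * (kh - 1) + 1
    span_w = dilation * (kw - 1) + 1
    out_h = (x.shape[0] - span_h) // stride + 1
    out_w = (x.shape[1] - span_w) // stride + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = x[r : r + span_h : dilation, c : c + span_w : dilation]
            y[i, j] = np.sum(patch * w)          # dot product of kernel and window
    return y


def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise layer."""
    return k * k * c_in + c_in * c_out


x = np.arange(16.0).reshape(4, 4)
valid = conv2d(x, np.ones((2, 2)))                  # 3 x 3 output, no padding
same = conv2d(x, np.ones((3, 3)), padding=1)        # 4 x 4 output, zero "same" padding
strided = conv2d(x, np.ones((2, 2)), stride=2)      # 2 x 2 output
dilated = conv2d(x, np.ones((2, 2)), dilation=2)    # 2 x 2 output, 3 x 3 receptive field
```

With a 3 × 3 kernel, 64 input channels and 128 output channels, the standard layer has 73,728 weights, while the depthwise separable factorization has 8,768, roughly 8.4 times fewer.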
=== Transposed ===

Transposed convolution, also known as deconvolution, fractionally strided convolution, or upsampling convolution, is a convolution whose output tensor is larger than its input tensor. It is often used for upsampling in encoder-decoder architectures, for example in image generation, semantic segmentation, and super-resolution tasks.

== History ==

The concept of convolution in neural networks was inspired by the visual cortex in biological brains. Early work by Hubel and Wiesel in the 1960s on the cat's visual system laid the groundwork for artificial convolutional networks. An early convolutional neural network was developed by Kunihiko Fukushima in 1969. It had mostly hand-designed kernels inspired by convolutions in mammalian vision. In 1979 he improved it into the Neocognitron, which learns all convolutional kernels by unsupervised learning (in his terminology, "self-organized by 'learning without a teacher'"). During the period from 1988 to 1998, a series of CNNs was introduced by Yann LeCun et al., ending with LeNet-5 in 1998, an early influential CNN architecture for handwritten digit recognition that was trained on the MNIST dataset and used in ATMs. (Olshausen & Field, 1996) discovered that simple cells in the mammalian primary visual cortex implement localized, oriented, bandpass receptive fields, which could be recreated by fitting sparse linear codes for natural scenes. This was later found to also occur in the lowest-level kernels of trained CNNs. The field saw a resurgence in the 2010s with the development of deeper architectures and the availability of large datasets and powerful GPUs. AlexNet, developed by Alex Krizhevsky et al. in 2012, was a catalytic event in modern deep learning. In that year's ImageNet competition, the AlexNet model achieved a 16% top-five error rate, significantly outperforming the next best entry, which had a 26% error rate.
The network used eight trainable layers, approximately 650,000 neurons, and around 60 million parameters, highlighting the impact of deeper architectures and GPU acceleration on image recognition performance. From the 2013 ImageNet competition onward, most entries adopted deep convolutional neural networks, building on the success of AlexNet. Over the following years, performance steadily improved, with the top-five error rate falling from 16% in 2012 and 12% in 2013 to below 3% by 2017, as networks grew increasingly deep.
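The transposed convolution listed under Variants can be illustrated with a small NumPy sketch (single channel, no padding; the function names are hypothetical). Each input entry adds a scaled copy of the kernel to the output, with copies placed `stride` apart, so an $n \times n$ input yields an output of size $(n - 1)\cdot\text{stride} + k$. The check at the end confirms numerically that this map is the transpose (adjoint) of the unpadded strided convolution, which is where the name comes from.

```python
import numpy as np


def conv2d_strided(z, w, stride=1):
    """Unpadded strided 2D convolution (CNN convention, no kernel flip)."""
    kh, kw = w.shape
    out_h = (z.shape[0] - kh) // stride + 1
    out_w = (z.shape[1] - kw) // stride + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(z[i * stride : i * stride + kh, j * stride : j * stride + kw] * w)
    return y


def conv_transpose2d(x, w, stride=1):
    """Transposed convolution: every input entry scatters a scaled copy of the kernel
    into the output, with copies `stride` apart; output size is (n - 1) * stride + k."""
    kh, kw = w.shape
    h, wd = x.shape
    y = np.zeros(((h - 1) * stride + kh, (wd - 1) * stride + kw))
    for i in range(h):
        for j in range(wd):
            y[i * stride : i * stride + kh, j * stride : j * stride + kw] += x[i, j] * w
    return y


rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
x = rng.standard_normal((4, 4))              # small feature map to upsample
up = conv_transpose2d(x, w, stride=2)        # shape (9, 9) = ((4 - 1) * 2 + 3,) * 2
# Adjoint check: <conv(z), x> equals <z, conv_transpose(x)> for any z of the upsampled shape.
z = rng.standard_normal(up.shape)
lhs = np.sum(conv2d_strided(z, w, stride=2) * x)
rhs = np.sum(z * up)
```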

KKday

KKday is an online travel e-commerce platform focused on connecting independent travelers with authentic, curated local experiences, tours, activities, and attraction tickets.

== History ==

KKday was founded in 2014 in Taipei, Taiwan, by CEO Ming Chen, who had previously started both Star Travel and Ezfly and led them to IPO. In March 2016, the company raised US$4.5 million in a Series A round led by AppWorks Ventures with participation from 91Capital. The funding allowed KKday to open offices and expand into Hong Kong, Japan, South Korea and Singapore by 2016. By the end of 2016, KKday offered over 6,000 travel experiences across 53 countries and 174 cities; its early international expansion included an official launch in Singapore in October 2016, accompanied by promotional campaigns to attract regional users. Expansion into Malaysia, Thailand, Vietnam and the Philippines continued throughout 2017 and into 2018, with the company opening offices in Indonesia and mainland China. KKday rapidly expanded its inventory, reaching over 10,000 experiences in more than 500 cities across 80 countries by 2018, with key markets in Taiwan, Hong Kong, and South Korea. In February 2018, KKday raised $10.5 million in a funding round led by Japanese travel giant H.I.S., allowing integration with larger travel networks and further global growth. Forbes reported that by the end of 2018, the company operated in 11 countries and regions, employed around 400 staff, and recorded over 4 million weekly website views and more than 1 million app downloads. A trade dispute between Japan and South Korea, together with the COVID-19 pandemic in 2020, led KKday to pivot quickly toward domestic staycations and local experiences, while initially raising $70m in its Series C, which was later extended to $95m. The Series C funds were partially used to accelerate and expand Rezio.
Launched in 2019, Rezio is KKday's B2B SaaS booking management platform for travel providers, allowing them to track inventory, manage reservations and sell tickets. In 2020, KKday launched FineDayClub, a personalized luxury subscription travel service for high-end clients; KKday's CFO, Jenny Tsai, moved to lead the new venture. KKday adapted to shifting travel patterns during the Covid-19 pandemic by cutting user acquisition costs by two-thirds and focusing on domestic travel experiences to drive bookings and revenue. It was particularly successful in Vietnam, where bookings increased by 2,000% through 2022 and Rezio onboarded more than 1,200 operators in the country. In 2021, KKday acquired Activity Japan, a domestically focused travel company founded by Kimiharu Obuchi in 2014. The acquisition, a key factor in KKday's rapid expansion in the Japanese market, was facilitated by H.I.S., an early investor in both companies. In 2023, KKday signed a partnership with Rail Europe to offer an all-in-one platform covering 150 rail lines across 33 European countries, with the aim of increasing ridership across Europe. In late 2024, KKday completed a $70 million Series D round, bringing its total capital raised to over $250 million. The funds are earmarked for continued global expansion, artificial intelligence integration and expanded partnerships, such as the one with Tabelog, which allows users to book reservations at 42,000 restaurants in Japan through the platform.

== Platform ==

KKday is an e-commerce online travel agency operating in 92 countries, with over 350,000 travel experiences available for booking. The company initially focused on authentic local travel experiences in the Asia-Pacific market and has since expanded to a more global focus.
KKday connects travelers with travel services and experiences such as attraction tickets, theme parks, cultural experiences, and seasonal events. It has positioned itself as an all-in-one travel super app, with booking for hotels, rental cars, flights, SIM cards, rail passes, dining and tickets.

=== Rezio ===

Rezio is a cloud-based SaaS booking management platform developed by KKday for tour operators, activity providers, and attractions in the travel industry. It is designed as an all-in-one system to help these businesses digitize their operations, particularly businesses that previously relied on offline processes. Features include a mobile app for on-the-go order management, customer information checks, and voucher scanning, as well as channel management, customer data analytics, and integrations with multiple online travel agencies (OTAs) and payment providers. Whereas KKday is a commission-based OTA marketplace that gives suppliers exposure to consumers, Rezio focuses on back-end operations for suppliers, supporting brand independence, operational efficiency, and direct customer relationships while optionally connecting to OTAs such as KKday. Rezio supports over 5,000 merchants, 30,000 experiences, and 10 million travelers worldwide, with a strong presence in Asia. One of the brand's successful deployments was at the Nikko Toshogu Shrine, where Rezio was adopted to reduce long lines and wait times caused by overtourism. The shrine used Rezio's inventory management features to enable online booking and cashless payments on site.

=== FineDayClub ===

FineDayClub is a membership-based travel concierge service launched by KKday in late 2020. It is aimed at families and organizations seeking customized travel experiences and offers one-on-one advisory services.

=== ActivityJapan ===

ActivityJapan is a comprehensive Japanese online travel site specializing in authentic Japanese travel experiences. KKday acquired it in 2021, but it continues to operate independently.