Information Networking Institute (INI) is an academic department within the College of Engineering at Carnegie Mellon University. The institute was established in 1989 as the nation's first research and education center devoted to information networking. The INI also partners with research and outreach entities to extend educational and training programs to a broad audience of people using information networking as part of their daily lives. The INI is the educational partner of Carnegie Mellon CyLab, a university-wide, multidisciplinary research center involving more than 50 faculty and 100 graduate students. == Center of Academic Excellence Designations == Through the work of the INI and CyLab, Carnegie Mellon University has been designated by the National Security Agency and the Department of Homeland Security as a National Center of Academic Excellence in Information Assurance/Cyber Defense Education (CAE-IA/CD) and a National Center of Academic Excellence in Information Assurance/Cyber Defense Research (CAE-R). It has also been designated by the NSA and the U.S. Cyber Command as a National Center of Academic Excellence in Cyber Operations (CAE-Cyber Ops). Through these designations, the INI and CyLab participate in the: Federal CyberCorps Scholarship for Service (SFS) Program - Students pursuing graduate degrees in information security (MSIS or MSISPM) are eligible for scholarships under the SFS program. Information Assurance Scholarship Program (IASP) - Students pursuing graduate degrees in information security and seeking careers with the Department of Defense may be eligible for scholarships under the IASP. Capacity Building Program for Faculty from Historically Black and Hispanic Serving Institutions - The INI and CyLab developed a month-long, in-residence summer program to help build information assurance education and research capacity at colleges and universities designated as Minority Serving Institutions – specifically, Historically Black Colleges and Universities (HBCUs) and Hispanic Serving Institutions (HSIs). This program is supported through a grant from the National Science Foundation. == Faculty and researchers == Faculty involved in teaching and advising in the INI programs are conducting research in all aspects of information networking and information security. Affiliated research centers are: Carnegie Mellon CyLab SEI's CERT Division == Alumni == The INI has graduated over 1,400 alumni who currently occupy positions in a variety of sectors across industry, government and academia.
Cloud-based design and manufacturing
Cloud-based design and manufacturing (CBDM) refers to a service-oriented networked product development model in which service consumers are able to configure products or services and reconfigure manufacturing systems through Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), Hardware-as-a-Service (HaaS), and Software-as-a-Service (SaaS). Adapted from the original cloud computing paradigm and introduced into the realm of computer-aided product development, Cloud-Based Design and Manufacturing is gaining significant momentum and attention from both academia and industry. Cloud-based design and manufacturing includes two aspects: cloud-based design and cloud-based manufacturing. Another related concept is cloud manufacturing that is more general and popular. Cloud-Based Design (CBD) refers to a networked design model that leverages cloud computing, service-oriented architecture (SOA), Web 2.0 (e.g., social network sites), and semantic web technologies to support cloud-based engineering design services in distributed and collaborative environments. Cloud-Based Manufacturing (CBM) refers to a networked manufacturing model that exploits on-demand access to a shared collection of diversified and distributed manufacturing resources to form temporary, reconfigurable production lines which enhance efficiency, reduce product lifecycle costs, and allow for optimal resource allocation in response to variable-demand customer generated tasking. The enabling technologies for Cloud-Based Design and Manufacturing include cloud computing, Web 2.0, Internet of Things (IoT), and service-oriented architecture (SOA). == History == The term cloud-based design and manufacturing (CBDM) was initially coined by Dazhong Wu, David Rosen, and Dirk Schaefer at Georgia Tech in 2012 for the purpose of articulating a new paradigm for digital manufacturing and design innovation in distributed and collaborative settings. The main objective of CBDM is to further reduce time and cost associated with maintaining information and communication technology (ICT) infrastructures for design and manufacturing, enhancing digital manufacturing and design innovation in distributed and collaborative environments, and adapting to rapidly changing market demands. In 2014, the same research group also published the worldwide first two books on the subjects of Cloud-Based Design and Manufacturing (CBDM) and Social Product Development (SPD) with Springer, edited by Dirk Schaefer. == Characteristics == CBDM exhibits the following key characteristics: Cloud-based distributed file system High performance computing Cloud-based social collaboration Ubiquitous access to distributed big data Rapid manufacturing scalability Agility On-demand self-service Semantic Web Real-time request for quotation Pay-per-use pricing model Multi-tenancy CBDM differs from traditional collaborative and distributed design and manufacturing systems such as web-based systems and agent-based systems from a number of perspectives, including (1) computing architecture, (2) data storage, (3) sourcing process, (4) information and communication technology infrastructure, (5) business model, (6) programming model, and (7) communication. == Service models == Infrastructure as a service (IaaS) Platform as a service (PaaS) Hardware as a service (HaaS) Software as a service (SaaS) Similar to cloud computing, CBDM services can be categorized into four major deployment models: the public cloud, private cloud, hybrid cloud, and community cloud. == Research progress in Academia == The Defense Advanced Research Projects Agency (DARPA) MENTOR program Engineering and Physical Sciences Research Council cloud manufacturing program European Commission's Seventh Framework Program (EC FP7)
Generative engine optimization
Generative engine optimization (GEO) is one of the names given to the practice of structuring digital content and managing online presence to improve visibility in responses generated by generative artificial intelligence (AI) systems. The practice influences the way large language models (LLMs) retrieve, summarize, and present information in response to user queries. Related terms include answer engine optimization (AEO) and artificial intelligence optimization (AIO). The concept of GEO first appeared in response to generative AI technologies being integrated into mainstream search and information retrieval systems. Tools are used to monitor how websites and brands are cited, referenced, or incorporated into responses produced by large language models. == Terminology == Several overlapping terms describe related practices, and usage varies across practitioners, vendors, and publications. No consensus definition distinguishing these terms had been established in the academic literature as of early 2026, and the terms are frequently used interchangeably in trade and practitioner contexts. Other terms for the same concept include answer engine optimization (AEO), large language model optimization (LLMO), artificial intelligence optimization (AIO), and AI SEO. In 2026, Google released documentation entitled "Optimizing your website for generative AI features on Google Search." According to this documentation, "optimizing for generative AI search is optimizing for the search experience, and thus still SEO.” This position had previously been shared at conferences, with 2026 being the first time Google released official documentation stating it. == Factors influencing generative engine optimization == By early 2026, the focus of GEO practitioners shifted from simple keyword placement to "semantic relevance", a metric driven by the integration of advertising into conversational AI. OpenAI and Google began monetizing AI search results, which is not currently considered an aspect of generative engine optimization but is adjacent.
AAAI Conference on Artificial Intelligence
The AAAI Conference on Artificial Intelligence is a leading international academic conference in artificial intelligence held annually. It ranks 4th in terms of H5 Index in Google Scholar's list of top AI publications, after ICLR, NeurIPS, and ICML. It is supported by the Association for the Advancement of Artificial Intelligence (AAAI), after which it is named. Precise dates vary from year to year, but paper submissions are generally due at the end of August to beginning of September, and the conference is generally held during the following February. The first AAAI was held in 1980 at Stanford University, Stanford California. During AAAI-20 conference, AI pioneers and 2018 Turing Award winners (often referred to as the Nobel Prize of Computing) Yann LeCun and Yoshua Bengio, among eight other researchers, were honored as the AAAI 2020 Fellows. Along with other conferences such as NeurIPS and ICML, AAAI uses an artificial-intelligence algorithm to assign papers to reviewers. == Sponsors == Many leading technology companies, including Google, Microsoft, Amazon (company), IBM, Baidu, Bytedance, and Huawei, generously sponsor and participate in AAAI to publish and showcase their latest theoretical and applied research. Sponsoring companies also actively recruit AI talents at the conference. == Locations == AAAI-2026 Singapore Expo, Singapore AAAI-2025 Pennsylvania Convention Center, Philadelphia, Pennsylvania, United States AAAI-2024 Vancouver Convention Centre, Vancouver, British Columbia, Canada AAAI-2023 Washington Convention Center, Washington, D.C., United States AAAI-2022 Virtual Conference AAAI-2021 Virtual Conference AAAI-2020 Hilton New York Midtown, New York, New York, United States AAAI-2019 Hilton Hawaiian Village, Honolulu, Hawaii, United States AAAI-2018 Hilton New Orleans Riverside, New Orleans, Louisiana, United States AAAI-2017 San Francisco, California, United States AAAI-2016 Phoenix, Arizona, United States AAAI-2015 Austin, Texas, United States AAAI-2014 Québec Convention Center, Québec City, Québec, Canada AAAI-2013 Bellevue, Washington, United States AAAI-2012 Toronto, Ontario, Canada AAAI-2011 San Francisco, California, United States AAAI-2010 Westin Peachtree Plaza, Atlanta, Georgia, United States AAAI-2008 Chicago, Illinois, United States AAAI-2007 Toronto, Ontario, Canada AAAI-2006 Boston, Massachusetts, United States AAAI-2005 Pittsburgh, Pennsylvania, United States AAAI-2004 San Jose, California, United States AAAI-2002 Shaw conference center in Edmonton, Alberta, Canada AAAI-2000 Austin, Texas, United States AAAI-1999 Orlando, Florida, United States AAAI-1998 Madison, Wisconsin, United States AAAI-1997 Providence, Rhode Island, United States AAAI-1996 Portland, Oregon, United States AAAI-1994 Seattle, Washington, United States AAAI-1993 Washington Convention Center, Washington, D.C., United States AAAI-1992 San Jose Convention Center, San Jose, California, United States AAAI-1991 Anaheim Convention Center, Anaheim, California, United States AAAI-1990 Boston, Massachusetts, United States AAAI-1988 Saint Paul, Minnesota, United States AAAI-1987 Seattle, Washington, United States AAAI-1986 Philadelphia, Pennsylvania, United States AAAI-1984 University of Texas, Austin, Texas, United States AAAI-1983 Washington, D.C., United States AAAI-1982 Carnegie Mellon University and the University of Pittsburgh, Pittsburgh, Pennsylvania, United States AAAI-1980 Stanford, California, United States
Model Context Protocol
The Model Context Protocol (MCP) is an open standard and open-source framework introduced by Anthropic in November 2024 to standardize the way artificial intelligence (AI) systems like large language models (LLMs) integrate and share data with external tools, systems, and data sources. MCP provides a standardized interface for reading files, executing functions, and handling contextual prompts. Following its announcement, the protocol was adopted by major AI providers, including OpenAI and Google DeepMind. == Background == MCP was announced by Anthropic in November 2024 as an open standard for connecting AI assistants to data systems such as content repositories, business management tools, and development environments. The protocol was created at Anthropic by engineers David Soria Parra and Justin Spahr-Summers. It aims to address the challenge of information silos and legacy systems. Before MCP, developers often had to build custom connectors for each data source or tool, resulting in what Anthropic described as an "N×M" data integration problem. Earlier stop-gap approaches—such as OpenAI's 2023 "function-calling" API and the ChatGPT plug-in framework—solved similar problems but required vendor-specific connectors. MCP re-uses the message-flow ideas of the Language Server Protocol (LSP) and is transported over JSON-RPC 2.0. In December 2025, Anthropic donated the MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, co-founded by Anthropic, Block and OpenAI, with support from other companies. == Features == The protocol was released with software development kits (SDKs) in programming languages including Python, TypeScript, C# and Java. Anthropic maintains an open-source repository of reference MCP server implementations and SDKs. MCP defines a standardized framework for integrating AI systems with external data sources and tools. It includes specifications for data ingestion and transformation, contextual metadata tagging, and AI interoperability across different platforms. The protocol also supports bidirectional connections between data sources and AI tools. MCP enables applications such as querying structured databases with plain language in the field of natural language data access. The protocol is used in AI-assisted software development tools. Integrated development environments (IDEs), coding platforms such as Replit, and code intelligence tools like Sourcegraph have adopted MCP to grant AI coding assistants real-time access to project context. MCP Apps is an official extension to the Model Context Protocol built on mcp-ui. While the base MCP specification is restricted to text and structured data, MCP Apps standardizes the delivery of interactive user interfaces—such as dashboards, forms, and data visualizations—from MCP servers to host applications like Claude and ChatGPT. == Adoption == In March 2025, OpenAI officially adopted the MCP, after having integrated the standard across its products, including the ChatGPT desktop app. In September 2025, OpenAI added support for MCP to ChatGPT apps. This allows for third-party access inside ChatGPT. MCP can be integrated with Microsoft Semantic Kernel, and Azure OpenAI. MCP servers can be deployed to Cloudflare. In April 2026, the AAIF held the MCP Dev Summit North America in New York City, drawing approximately 1,200 attendees. == Reception == The Verge reported that MCP addresses a growing demand for AI agents that are contextually aware and capable of pulling from diverse sources. In April 2025, security researchers released an analysis that concluded there are multiple outstanding security issues with MCP, including prompt injection, tool permissions that allow for combining tools to exfiltrate data, and lookalike tools that can silently replace trusted ones. MCP has been likened to OpenAPI, a similar specification that aims to describe APIs.
Genigraphics
Genigraphics is a large-format printing service bureau specializing in providing poster session services to medical and scientific conferences throughout the US and Canada. The company began in 1973 as a division of General Electric. == History == Genigraphics began as a computer graphics system, developed by General Electric in the late 1960s, for NASA to use in space flight simulation. The technologies thus developed provided a foundation for the company's expansion into the commercial market. The Computed Images System & Services division (CISS, to become Genigraphics Corporation) of GE delivered the first presentation graphics system to Amoco Oil's corporate headquarters in 1973. It was named the 100 Series, and was based on DEC's PDP 11 series of mini computer systems. The first Genigraphics systems (100 Series and 100A Series) used an array of buttons, dials, knobs and joysticks, along with a built in keyboard, as the means of user interface. The PDP-11/40 computer was housed in a tall cabinet and used random access magnetic tape drives (DECtape) for storing completed presentations. The graphics generator (Forox recorder) was capable of outputting 2,000 line resolution, suitable for 35mm and 72mm film and large sheet film positive using larger cassettes for recording. 4000 and 8000 line resolution was later achieved with duplex scanning and 4x scanning by modifying to the Forox recorder's settings menu. Subsequent models (100B,C,D,D+ and D+/GVP) replaced the knobs and dials with an on screen, text based menu system, a graphics tablet and a pen. The pen/tablet combination gave way to a mouse like device in later models, and served to provide the interface with the graphics tools. User interaction with the computer for functions such as media initialization or modem to modem data transfer required a DECwriter serial terminal. In 1982, GE divested the Genigraphics division along with a host of other "non essential" business units (Genitext, Geniponics) and Genigraphics Corporation was born. Shortly after the divestiture, the headquarters of Genigraphics was moved from Liverpool, New York to Saddle Brook, New Jersey. Major success followed as the company grew exponentially over the next few years selling both systems and slide creation services. Genigraphics film recorders produced high-resolution digital images on 35mm film. The computer-generated scenes for The Last Starfighter were calculated on a Cray X-MP supercomputer and mastered with a Genigraphics film recorder. At its peak, Genigraphics Corporation employed roughly 300 people in 24 offices worldwide, with revenues upwards of $70 million annually. By the late 1980s Genigraphics saw demand for its proprietary systems dwindle and began selling the MASTERPIECE 8770 film recorder and GRAFTIME software as a peripheral for DEC Vaxes, IBM PC AT’s, and Mac NuBus machines. But the MASTERPIECE film recorder proved too expensive to sell in volume. In 1988, the company began a partnership with Microsoft to help develop the PowerPoint software. In exchange, every copy of PowerPoint included a “Send to Genigraphics” link to have files sent to a Genigraphics service bureau to be produced as 35mm slides. This partnership continued until 2001. In 1989, after three years of flat revenue, Genigraphics sold its hardware business in order to focus on its service bureau business and partnership with Microsoft via PowerPoint. In 1994, all assets of Genigraphics, including equipment, software development, in-house artwork, trademarks, and rights to the Microsoft partnership, were sold to InFocus Corporation of Wilsonville, Oregon who continued to operate under the Genigraphics brand name. The twenty-four service bureaus were consolidated to a 20,000 square foot facility next to the FedEx hub in Memphis, Tennessee. This allowed PowerPoint slide orders to be received until 10pm and delivered across the United States by the following morning. In 1995, InFocus registered www.genigraphics.com and was among the first to offer a form of ecommerce allowing 35mm slides, color prints and transparencies, printed booklets, and digital projectors to be purchased online. In 1998, then current management bought Genigraphics from InFocus and have operated it continuously ever since as Genigraphics LLC. That same year, InFocus projector rentals were added to the “Send to Genigraphics” link in PowerPoint and Genigraphics became the rental and repair center for all InFocus national accounts. It also marked Genigraphics entry into the new industry of large format printing; leveraging their knowledge of, and access to, PowerPoint programming code to develop a proprietary printer driver to output directly to an Epson 9500 wide format printer. At the time, Genigraphics was the exclusive 35mm slide vendor for all Kinko’s stores in the United States and poster printing was added to the arrangement. In 2003, Genigraphics closed their 35mm slide E6 photo lab – one of the last high-volume commercial E6 labs in the US – and expanded their large format printing capabilities. Since 2003, Genigraphics has become a major player in the poster session market, providing printing and on-site services to medical and scientific conferences throughout the US and Canada. As of February 2019, over 150,000 medical or scientific ‘ePosters’ are made available through their ResearchPosters.com archive service. === Partnership with Microsoft and development of PowerPoint === As presentations began to be created on personal computers in the late 80’s, Genigraphics sought presentation software partners in Silicon Valley who would be interested in sending files to Genigraphics via dial-up modem to be produced on 35mm slides. In 1987, Michael Beetner, Director of Marketing Planning for Genigraphics, met with Robert Gaskins, head of Microsoft's Graphics Business Unit, who was leading the development of the newly released PowerPoint software. A joint development agreement between Microsoft and Genigraphics was agreed upon and announced at Mac World 1988. According to Erica Robles-Anderson and Patrik Svensson, "It would be hard to overestimate Genigraphics’ influence on PowerPoint. PowerPoint 2.0 was designed for Genigraphics film recorders. It shipped with Genigraphics color palettes, schemes, and the distinctively Genigraphics color-gradient backgrounds. The application contained a ‘Send to Genigraphics’ menu item that wrote the presentation to floppy disk or transmitted the order directly via modem. Within three and a half months PowerPoint orders accounted for ten percent of revenue at Genigraphics service centers. PowerPoint 3.0 was even more intimately dependent upon Genigraphics. The software incorporated a collection of clip art images and symbols that had been produced by hundreds of artists at dozens of service centers across tens of thousands of presentations. Genigraphics artists designed PowerPoint 3.0 colors, templates, and sample presentations. The software even used Genigraphics (rather than Excel) chart style. Bar charts were rendered two-dimensionally with apparent thickness added to make them seemingly recede from the axes. The technique made it easier for viewers to compare bar heights and estimate values from axis ticks and labels. Pie charts were handled analogously. Microsoft paid Genigraphics to produce more than 500 clip art drawings and symbols used in Microsoft programs.” In exchange for Genigraphics development efforts, Microsoft included a “Send to Genigraphics” link in every copy of PowerPoint through the 10.0 version (2000/2001). The arrangement came to an end when Microsoft restructured as a result of anti-trust lawsuits.
Batch normalization
In artificial neural networks, batch normalization (also known as batch norm) is a normalization technique used to make training faster and more stable by adjusting the inputs to each layer—re-centering them around zero and re-scaling them to a standard size. It was introduced by Sergey Ioffe and Christian Szegedy in 2015. Experts still debate why batch normalization works so well. It was initially thought to tackle internal covariate shift, a problem where parameter initialization and changes in the distribution of the inputs of each layer affect the learning rate of the network. However, newer research suggests it doesn’t fix this shift but instead smooths the objective function—a mathematical guide the network follows to improve—enhancing performance. In very deep networks, batch normalization can initially cause a severe gradient explosion—where updates to the network grow uncontrollably large—but this is managed with shortcuts called skip connections in residual networks. Another theory is that batch normalization adjusts data by handling its size and path separately, speeding up training. == Internal covariate shift == Each layer in a neural network has inputs that follow a specific distribution, which shifts during training due to two main factors: the random starting values of the network’s settings (parameter initialization) and the natural variation in the input data. This shifting pattern affecting the inputs to the network’s inner layers is called internal covariate shift. While a strict definition isn’t fully agreed upon, experiments show that it involves changes in the means and variances of these inputs during training. Batch normalization was first developed to address internal covariate shift. During training, as the parameters of preceding layers adjust, the distribution of inputs to the current layer changes accordingly, such that the current layer needs to constantly readjust to new distributions. This issue is particularly severe in deep networks, because small changes in shallower hidden layers will be amplified as they propagate within the network, resulting in significant shift in deeper hidden layers. Batch normalization was proposed to reduced these unwanted shifts to speed up training and produce more reliable models. Beyond possibly tackling internal covariate shift, batch normalization offers several additional advantages. It allows the network to use a higher learning rate—a setting that controls how quickly the network learns—without causing problems like vanishing or exploding gradients, where updates become too small or too large. It also appears to have a regularizing effect, improving the network’s ability to generalize to new data, reducing the need for dropout, a technique used to prevent overfitting (when a model learns the training data too well and fails on new data). Additionally, networks using batch normalization are less sensitive to the choice of starting settings or learning rates, making them more robust and adaptable. == Procedures == === Transformation === In a neural network, batch normalization is achieved through a normalization step that fixes the means and variances of each layer's inputs. Ideally, the normalization would be conducted over the entire training set, but to use this step jointly with stochastic optimization methods, it is impractical to use the global information. Thus, normalization is restrained to each mini-batch in the training process. Let us use B to denote a mini-batch of size m of the entire training set. The empirical mean and variance of B could thus be denoted as μ B = 1 m ∑ i = 1 m x i {\displaystyle \mu _{B}={\frac {1}{m}}\sum _{i=1}^{m}x_{i}} and σ B 2 = 1 m ∑ i = 1 m ( x i − μ B ) 2 {\displaystyle \sigma _{B}^{2}={\frac {1}{m}}\sum _{i=1}^{m}(x_{i}-\mu _{B})^{2}} . For a layer of the network with d-dimensional input, x = ( x ( 1 ) , . . . , x ( d ) ) {\displaystyle x=(x^{(1)},...,x^{(d)})} , each dimension of its input is then normalized (i.e. re-centered and re-scaled) separately, x ^ i ( k ) = x i ( k ) − μ B ( k ) ( σ B ( k ) ) 2 + ϵ {\displaystyle {\hat {x}}_{i}^{(k)}={\frac {x_{i}^{(k)}-\mu _{B}^{(k)}}{\sqrt {\left(\sigma _{B}^{(k)}\right)^{2}+\epsilon }}}} , where k ∈ [ 1 , d ] {\displaystyle k\in [1,d]} and i ∈ [ 1 , m ] {\displaystyle i\in [1,m]} ; μ B ( k ) {\displaystyle \mu _{B}^{(k)}} and σ B ( k ) {\displaystyle \sigma _{B}^{(k)}} are the per-dimension mean and standard deviation, respectively. ϵ {\displaystyle \epsilon } is added in the denominator for numerical stability and is an arbitrarily small positive constant. The resulting normalized activation x ^ ( k ) {\displaystyle {\hat {x}}^{(k)}} have zero mean and unit variance, if ϵ {\displaystyle \epsilon } is not taken into account. To restore the representation power of the network, a transformation step then follows as y i ( k ) = γ ( k ) x ^ i ( k ) + β ( k ) {\displaystyle y_{i}^{(k)}=\gamma ^{(k)}{\hat {x}}_{i}^{(k)}+\beta ^{(k)}} , where the parameters γ ( k ) {\displaystyle \gamma ^{(k)}} and β ( k ) {\displaystyle \beta ^{(k)}} are subsequently learned in the optimization process. Formally, the operation that implements batch normalization is a transform B N γ ( k ) , β ( k ) : x 1... m ( k ) → y 1... m ( k ) {\displaystyle BN_{\gamma ^{(k)},\beta ^{(k)}}:x_{1...m}^{(k)}\rightarrow y_{1...m}^{(k)}} called the Batch Normalizing transform. The output of the BN transform y ( k ) = B N γ ( k ) , β ( k ) ( x ( k ) ) {\displaystyle y^{(k)}=BN_{\gamma ^{(k)},\beta ^{(k)}}(x^{(k)})} is then passed to other network layers, while the normalized output x ^ i ( k ) {\displaystyle {\hat {x}}_{i}^{(k)}} remains internal to the current layer. === Backpropagation === The described BN transform is a differentiable operation, and the gradient of the loss l {\displaystyle l} with respect to the different parameters can be computed directly with the chain rule. Specifically, ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial y_{i}^{(k)}}}} depends on the choice of activation function, and the gradient against other parameters could be expressed as a function of ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial y_{i}^{(k)}}}} : ∂ l ∂ x ^ i ( k ) = ∂ l ∂ y i ( k ) γ ( k ) {\displaystyle {\frac {\partial l}{\partial {\hat {x}}_{i}^{(k)}}}={\frac {\partial l}{\partial y_{i}^{(k)}}}\gamma ^{(k)}} , ∂ l ∂ γ ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) x ^ i ( k ) {\displaystyle {\frac {\partial l}{\partial \gamma ^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}{\hat {x}}_{i}^{(k)}} , ∂ l ∂ β ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) {\displaystyle {\frac {\partial l}{\partial \beta ^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}} , ∂ l ∂ σ B ( k ) 2 = ∑ i = 1 m ∂ l ∂ y i ( k ) ( x i ( k ) − μ B ( k ) ) ( − γ ( k ) 2 ( σ B ( k ) 2 + ϵ ) − 3 / 2 ) {\displaystyle {\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}(x_{i}^{(k)}-\mu _{B}^{(k)})\left(-{\frac {\gamma ^{(k)}}{2}}(\sigma _{B}^{(k)^{2}}+\epsilon )^{-3/2}\right)} , ∂ l ∂ μ B ( k ) = ∑ i = 1 m ∂ l ∂ y i ( k ) − γ ( k ) σ B ( k ) 2 + ϵ + ∂ l ∂ σ B ( k ) 2 1 m ∑ i = 1 m ( − 2 ) ⋅ ( x i ( k ) − μ B ( k ) ) {\displaystyle {\frac {\partial l}{\partial \mu _{B}^{(k)}}}=\sum _{i=1}^{m}{\frac {\partial l}{\partial y_{i}^{(k)}}}{\frac {-\gamma ^{(k)}}{\sqrt {\sigma _{B}^{(k)^{2}}+\epsilon }}}+{\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}{\frac {1}{m}}\sum _{i=1}^{m}(-2)\cdot (x_{i}^{(k)}-\mu _{B}^{(k)})} , and ∂ l ∂ x i ( k ) = ∂ l ∂ x ^ i ( k ) 1 σ B ( k ) 2 + ϵ + ∂ l ∂ σ B ( k ) 2 2 ( x i ( k ) − μ B ( k ) ) m + ∂ l ∂ μ B ( k ) 1 m {\displaystyle {\frac {\partial l}{\partial x_{i}^{(k)}}}={\frac {\partial l}{\partial {\hat {x}}_{i}^{(k)}}}{\frac {1}{\sqrt {\sigma _{B}^{(k)^{2}}+\epsilon }}}+{\frac {\partial l}{\partial \sigma _{B}^{(k)^{2}}}}{\frac {2(x_{i}^{(k)}-\mu _{B}^{(k)})}{m}}+{\frac {\partial l}{\partial \mu _{B}^{(k)}}}{\frac {1}{m}}} . === Inference === During the training stage, the normalization steps depend on the mini-batches to ensure efficient and reliable training. However, in the inference stage, this dependence is not useful any more. Instead, the normalization step in this stage is computed with the population statistics such that the output could depend on the input in a deterministic manner. The population mean, E [ x ( k ) ] {\displaystyle E[x^{(k)}]} , and variance, Var [ x ( k ) ] {\displaystyle \operatorname {Var} [x^{(k)}]} , are computed as: E [ x ( k ) ] = E B [ μ B ( k ) ] {\displaystyle E[x^{(k)}]=E_{B}[\mu _{B}^{(k)}]} , and Var [ x ( k ) ] = m m − 1 E B [ ( σ B ( k ) ) 2 ] {\displaystyle \operatorname {Var} [x^{(k)}]={\frac {m}{m-1}}E_{B}[\left(\sigma _{B}^{(k)}\right)^{2}]} . The population statistics thus is a complete representation of the mini-batches. The BN transform in the inference step thus becomes y ( k ) = B N γ ( k ) , β ( k ) inf ( x ( k ) ) = γ ( k ) x ( k ) − E [ x ( k ) ] Var [ x ( k ) ] + ϵ + β