A pixel aspect ratio (PAR) is a mathematical ratio that describes how the width of a pixel in a digital image compares to the height of that pixel. Most digital imaging systems display an image as a grid of tiny, square pixels. However, some imaging systems, especially those that must be compatible with standard-definition television motion pictures, display an image as a grid of rectangular pixels, in which the pixel width and height are different. Pixel aspect ratio describes this difference. Use of pixel aspect ratio mostly involves pictures pertaining to standard-definition television and some other exceptional cases. Most other imaging systems, including those that comply with SMPTE standards and practices, use square pixels. PAR is also known as sample aspect ratio, abbreviated SAR, though that abbreviation can be confused with storage aspect ratio. == Introduction == The ratio of the width to the height of an image is known as the aspect ratio, or more precisely the display aspect ratio (DAR): the aspect ratio of the image as displayed. For TV, DAR was traditionally 4:3 (a.k.a. fullscreen), with 16:9 (a.k.a. widescreen) now the standard for HDTV. In digital images, DAR is distinguished from the storage aspect ratio (SAR), which is the ratio of pixel dimensions. If an image is displayed with square pixels, then these ratios agree; if not, then non-square, "rectangular" pixels are used, and these ratios disagree. The aspect ratio of the pixels themselves is known as the pixel aspect ratio (PAR); for square pixels this is 1:1. The three ratios are related by the identity: DAR = SAR × PAR. Rearranging (solving for PAR) yields: PAR = DAR / SAR. For example: A 640 × 480 VGA image has a SAR of 640/480 = 4:3, and if displayed on a 4:3 display (DAR = 4:3) has square pixels, hence a PAR of 1:1. By contrast, a 720 × 576 D-1 PAL image has a SAR of 720/576 = 5:4, but if displayed on a 4:3 display (DAR = 4:3) the PAR is 4/3 : 5/4 = 16:15 ≈ 1.067.
This means that the pixels of the PAL picture must be "stretched" horizontally by this factor to fit in the 4:3 display. In analog images such as film there is no notion of pixel, nor notion of SAR or PAR, but in the digitization of analog images the resulting digital image has pixels, hence SAR (and accordingly PAR, if displayed at the same aspect ratio as the original). Non-square pixels arise often in early digital TV standards, related to digitization of analog TV signals (whose vertical and "effective" horizontal resolutions differ and are thus best described by non-square pixels), and also in some digital video cameras and computer display modes, such as Color Graphics Adapter (CGA). Today they arise also in transcoding between resolutions with different SARs. Actual displays do not generally have non-square pixels, though digital sensors might; they are rather a mathematical abstraction used in resampling images to convert between resolutions. There are several complicating factors in understanding PAR, particularly as it pertains to digitization of analog video: First, analog video does not have pixels, but rather a raster scan, and thus has a well-defined vertical resolution (the lines of the raster), but not a well-defined horizontal resolution, since each line is an analog signal. However, given a standardized sampling rate, the effective horizontal resolution can be determined by the sampling theorem, as is done below. Second, due to overscan, some of the lines at the top and bottom of the raster are not visible, nor is some of the possible image area on the left and right (see Overscan: Analog to digital resolution issues). Also, the resolution may be rounded (DV NTSC uses 480 lines, rather than the 486 that are possible). Third, analog video signals are interlaced: each image (frame) is sent as two "fields", each with half the lines. Thus either the pixels are twice as tall as they would be without interlacing, or the image is deinterlaced.
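The relation PAR = DAR / SAR used in the examples above can be checked with exact fractions. The following is a minimal sketch; the function names `pixel_aspect_ratio` and `display_width` are illustrative choices, not part of any standard.

```python
from fractions import Fraction


def pixel_aspect_ratio(width_px: int, height_px: int, dar: Fraction) -> Fraction:
    """Return PAR = DAR / SAR for an image stored as width_px x height_px pixels."""
    sar = Fraction(width_px, height_px)  # storage aspect ratio
    return dar / sar


def display_width(width_px: int, par: Fraction) -> Fraction:
    """Width in square pixels needed to show the image without distortion."""
    return width_px * par


vga_par = pixel_aspect_ratio(640, 480, Fraction(4, 3))
pal_par = pixel_aspect_ratio(720, 576, Fraction(4, 3))

print(vga_par)                      # 1
print(pal_par)                      # 16/15
print(display_width(720, pal_par))  # 768
```

Under these assumptions, a 720 × 576 picture shown at 4:3 corresponds to 768 × 576 square pixels, which is one way to see what the 16:15 "stretch" means in practice.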
== Background == Video is presented as a sequential series of images called video frames. Historically, video frames were created and recorded in analog form. As digital display technology, digital broadcast technology, and digital video compression evolved separately, it resulted in video frame differences that must be addressed using pixel aspect ratio. Digital video frames are generally defined as a grid of pixels used to present each sequential image. The horizontal component is defined by pixels (or samples), and is known as a video line. The vertical component is defined by the number of lines, as in 480 lines. Standard-definition television standards and practices were developed as broadcast technologies and intended for terrestrial broadcasting, and were therefore not designed for digital video presentation. Such standards define an image as an array of well-defined horizontal "Lines", well-defined vertical "Line Duration" and a well-defined picture center. However, there is not a standard-definition television standard that properly defines image edges or explicitly demands a certain number of picture elements per line. Furthermore, analog video systems such as NTSC 480i and PAL 576i, instead of employing progressively displayed frames, employ fields or interlaced half-frames displayed in an interwoven manner to reduce flicker and double the image rate for smoother motion. === Analog-to-digital conversion === As a result of computers becoming powerful enough to serve as video editing tools, video digital-to-analog converters and analog-to-digital converters were made to overcome this incompatibility. To convert analog video lines into a series of square pixels, the industry adopted a default sampling rate at which luma values were extracted into pixels. The luma sampling rate for 480i pictures was 12+3⁄11 MHz and for 576i pictures was 14+3⁄4 MHz. The term pixel aspect ratio was first coined when ITU-R BT.601 (commonly known as Rec. 
601) specified that standard-definition television pictures are made of lines of exactly 720 non-square pixels. ITU-R BT.601 did not define the exact pixel aspect ratio but did provide enough information to calculate the exact pixel aspect ratio based on industry practices: The standard luma sampling rate of precisely 13+1⁄2 MHz. Based on this information: The pixel aspect ratio for 480i would be 10:11 as: $12\tfrac{3}{11}\div 13\tfrac{1}{2}=\tfrac{10}{11}$ The pixel aspect ratio for 576i would be 59:54 as: $14\tfrac{3}{4}\div 13\tfrac{1}{2}=\tfrac{59}{54}$ SMPTE RP 187 further attempted to standardize the pixel aspect ratio values for 480i and 576i. It designated 177:160 for 480i or 1035:1132 for 576i. However, due to significant difference with practices in effect by industry and the computational load that they imposed upon the involved hardware, SMPTE RP 187 was simply ignored. SMPTE RP 187 information annex A.4 further suggested the use of 10:11 for 480i. As of this writing, ITU-R BT.601-6, which is the latest edition of ITU-R BT.601, still implies that the pixel aspect ratios mentioned above are correct. === Digital video processing === As stated above, ITU-R BT.601 specified that standard-definition television pictures are made of lines of 720 non-square pixels, sampled with a precisely specified sampling rate. A simple mathematical calculation reveals that a 704 pixel width would be enough to contain a 480i or 576i standard 4:3 picture: A 4:3 480-line picture, digitized with the Rec. 601-recommended sampling rate, would be 704 non-square pixels wide. $\frac{x}{480}\times \frac{10}{11}=\frac{4}{3}\Rightarrow x=\frac{480\times 11\times 4}{10\times 3}=704$ A 4:3 576-line picture, digitized with the Rec. 601-recommended sampling rate, would be 702+54⁄59 non-square pixels wide.
$\frac{x}{576}\times \frac{59}{54}=\frac{4}{3}\Rightarrow x=\frac{576\times 54\times 4}{59\times 3}=702\tfrac{54}{59}$ Unfortunately, not all standard TV pictures are exactly 4:3: As mentioned earlier, in analog video, the center of a picture is well-defined but the edges of the picture are not standardized. As a result, some analog devices (mostly PAL devices but also some NTSC devices) generated motion pictures that were horizontally (slightly) wider. This also proportionately applies to anamorphic widescreen (16:9) pictures. Therefore, to maintain a safe margin of error, ITU-R BT.601 required sampling 16 more non-square pixels per line (8 more at each edge) to ensure saving all video data near the margins. This requirement, however, had implications for PAL motion pictures. PAL pixel aspect ratios for standard (4:3) and anamorphic wide screen (16:9), respectively 59:54 and 118:81, were awkward for digital image processing, especially for mixing PAL and NTSC video clips. Therefore, video editing products chose the almost equivalent value
Cloud management
Cloud management refers to the administration and oversight of cloud computing products and services. Public clouds are managed by cloud service providers, which operate the underlying infrastructure such as servers, storage, networking, and data center facilities. Users may also opt to manage their public cloud services with a third-party cloud management tool. Users of public cloud services can generally select from three basic cloud provisioning categories: User self-provisioning: Customers purchase cloud services directly from the provider, typically through a web form or console interface. The customer pays on a per-transaction basis. Advanced provisioning: Customers contract in advance a predetermined amount of resources, which are prepared in advance of service. The customer pays a flat fee or a monthly fee. Dynamic provisioning: The provider allocates resources when the customer needs them, then decommissions them when they are no longer needed. The customer is charged on a pay-per-use basis. Managing a private cloud requires software tools to help create a virtualized pool of compute resources, provide a self-service portal for end users and handle security, resource allocation, tracking and billing. Management tools for private clouds tend to be service driven, as opposed to resource driven, because cloud environments are typically highly virtualized and organized in terms of portable workloads. In hybrid cloud environments, compute, network and storage resources must be managed across multiple domains, so a good management strategy should start by defining what needs to be managed, and where and how to do it. Policies to help govern these domains should include configuration and installation of images, access control, and budgeting and reporting. Access control often includes the use of Single sign-on (SSO), in which a user logs in once and gains access to all systems without being prompted to log in again at each of them. 
== Characteristics of Cloud Management == Cloud management combines software and technologies in a design for managing cloud environments. Software developers have responded to the management challenges of cloud computing with a variety of cloud management platforms and tools. These tools include native tools offered by public cloud providers as well as third-party tools designed to provide consistent functionality across multiple cloud providers. Administrators must balance two competing requirements: consistency across different cloud platforms and access to the native functionality of individual cloud platforms. The growing acceptance of public cloud and increased multicloud usage are driving the need for consistent cross-platform management. Rapid adoption of cloud services is introducing a new set of management challenges for those technical professionals responsible for managing IT systems and services. Cloud-management platforms and tools should have the ability to provide minimum functionality in the following categories. Functionality can be either natively provided or orchestrated via third-party integration. Provisioning and orchestration: create, modify, and delete resources as well as orchestrate workflows and management of workloads. Automation: enable cloud consumption and deployment of app services via infrastructure-as-code and other DevOps concepts. Security and compliance: manage role-based access of cloud services and enforce security configurations. Service request: collect and fulfill requests from users to access and deploy cloud resources.
Monitoring and logging: collect performance and availability metrics as well as automate incident management and log aggregation. Inventory and classification: discover and maintain pre-existing brownfield cloud resources plus monitor and manage changes. Cost management and optimization: track and rightsize cloud spend and align capacity and performance to actual demand. Migration, backup, and DR: enable data protection, disaster recovery, and data mobility via snapshots and/or data replication. Organizations may group these criteria into key use cases including Cloud Brokerage, DevOps Automation, Governance, and Day-2 Life Cycle Operations. Enterprises with large-scale cloud implementations may require more robust cloud management tools which include specific characteristics, such as the ability to manage multiple platforms from a single point of reference, or intelligent analytics to automate processes like application lifecycle management. High-end cloud management tools should also have the ability to handle system failures automatically with capabilities such as self-monitoring, an explicit notification mechanism, failover, and self-healing. == Multi-Cloud and Hybrid Cloud Management Challenges == Legacy management infrastructures, which are based on the concept of dedicated system relationships and architecture constructs, are not well suited to cloud environments where instances are continually launched and decommissioned. Instead, the dynamic nature of cloud computing requires monitoring and management tools that are adaptable, extensible and customizable. Cloud computing presents a number of management challenges. Companies using public clouds do not have ownership of the equipment hosting the cloud environment, and because the environment is not contained within their own networks, public cloud customers do not have full visibility or control.
Users of public cloud services must also integrate with an architecture defined by the cloud provider, using its specific parameters for working with cloud components. Integration includes tying into the cloud APIs for configuring IP addresses, subnets, firewalls and data service functions for storage. Because control of these functions is based on the cloud provider’s infrastructure and services, public cloud users must integrate with the cloud infrastructure management. Capacity management is a challenge for both public and private cloud environments because end users have the ability to deploy applications using self-service portals. Applications of all sizes may appear in the environment, consume an unpredictable amount of resources, then disappear at any time. A possible solution is profiling each application's impact on computational resources. The resulting performance models allow the prediction of how resource utilization changes according to application patterns. Thus, resources can be dynamically scaled to meet the expected demand. This is critical to cloud providers that need to provision resources quickly to meet a growing demand by their applications. Charge-back, or pricing resource use on a granular basis, is a challenge for both public and private cloud environments. Charge-back is a challenge for public cloud service providers because they must price their services competitively while still creating profit. Users of public cloud services may find charge-back challenging because it is difficult for IT groups to assess actual resource costs on a granular basis due to overlapping resources within an organization that may be paid for by an individual business unit, such as electrical power. For private cloud operators, charge-back is fairly straightforward, but the challenge lies in guessing how to allocate resources as closely as possible to actual resource usage to achieve the greatest operational efficiency. Exceeding budgets can be a risk.
Hybrid cloud environments, which combine public and private cloud services, sometimes with traditional infrastructure elements, present their own set of management challenges. These include security concerns if sensitive data lands on public cloud servers, budget concerns around overuse of storage or bandwidth and proliferation of mismanaged images. Managing the information flow in a hybrid cloud environment is also a significant challenge. On-premises clouds must share information with applications hosted off-premises by public cloud providers, and this information may change constantly. Hybrid cloud environments also typically include a complex mix of policies, permissions and limits that must be managed consistently across both public and private clouds. == Cloud Management Platforms (CMP) == CMPs provide a means for a cloud service customer to manage the deployment and operation of applications and associated datasets across multiple cloud service infrastructures, including both on-premises cloud infrastructure and public cloud service provider infrastructure. In other words, CMPs provide management capabilities for hybrid cloud and multi-cloud environments. A cloud management platform (CMP) provides broad cloud management functionality atop both public cloud provider platforms and private cloud platforms. CMPs manage cloud services and resources that are distributed across multiple cloud platforms. The value of CMPs stands in delivering the maximum level of consistency between platforms without comp
Multi-label classification
In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of several (greater than or equal to two) classes. In the multi-label problem the labels are nonexclusive and there is no constraint on how many of the classes the instance can be assigned to. The formulation of multi-label learning was first introduced by Shen et al. in the context of Semantic Scene Classification, and later gained popularity across various areas of machine learning. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y; that is, it assigns a value of 0 or 1 for each element (label) in y. == Problem transformation methods == Several problem transformation methods exist for multi-label classification, and can be roughly broken down into: === Transformation into binary classification problems === The baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. Although this method of dividing the task into multiple binary tasks may resemble superficially the one-vs.-all (OvA) and one-vs.-rest (OvR) methods for multiclass classification, it is essentially different from both, because a single classifier under binary relevance deals with a single label, without any regard to other labels whatsoever. A classifier chain is an alternative method for transforming a multi-label classification problem into several binary classification problems. 
It differs from binary relevance in that labels are predicted sequentially, and the outputs of all previous classifiers (i.e. positive or negative for a particular label) are input as features to subsequent classifiers. Classifier chains have been applied, for instance, in HIV drug resistance prediction. Bayesian networks have also been applied to optimally order the classifiers in classifier chains. In case of transforming the problem to multiple binary classifications, the likelihood function reads $L=\prod_{i=1}^{n}\left(\prod_{k}\left(\prod_{j_{k}}p_{k,j_{k}}(x_{i})^{\delta_{y_{i,k},j_{k}}}\right)\right)$ where index $i$ runs over the samples, index $k$ runs over the labels, $j_{k}$ indicates the binary outcomes 0 or 1, $\delta_{a,b}$ indicates the Kronecker delta, and $y_{i,k}\in \{0,1\}$ indicates the multi-hot encoded labels of sample $i$. === Transformation into multi-class classification problem === The label powerset (LP) transformation creates a single multi-class classifier in which every label combination present in the training set is one class. For example, if possible labels for an example were A, B, and C, the label powerset representation of this problem is a multi-class classification problem with the classes [0 0 0], [1 0 0], [0 1 0], [0 0 1], [1 1 0], [1 0 1], [0 1 1], and [1 1 1] where for example [1 0 1] denotes an example where labels A and C are present and label B is absent. === Ensemble methods === A set of multi-class classifiers can be used to create a multi-label ensemble classifier. For a given example, each classifier outputs a single class (corresponding to a single label in the multi-label problem).
These predictions are then combined by an ensemble method, usually a voting scheme where every class that receives a requisite percentage of votes from individual classifiers (often referred to as the discrimination threshold) is predicted as a present label in the multi-label output. However, more complex ensemble methods exist, such as committee machines. Another variation is the random k-labelsets (RAKEL) algorithm, which uses multiple LP classifiers, each trained on a random subset of the actual labels; label prediction is then carried out by a voting scheme. A set of multi-label classifiers can be used in a similar way to create a multi-label ensemble classifier. In this case, each classifier votes once for each label it predicts rather than for a single label. == Adapted algorithms == Some classification algorithms/models have been adapted to the multi-label task, without requiring problem transformations. Examples include the following. K-nearest neighbors: the ML-kNN algorithm extends the k-NN classifier to multi-label data. Decision trees: "Clare" is an adapted C4.5 algorithm for multi-label classification; the modification involves the entropy calculations. MMC, MMDT, and SSC refined MMDT can classify multi-labeled data based on multi-valued attributes without transforming the attributes into single values. They are also named multi-valued and multi-labeled decision tree classification methods. Kernel methods for vector output. Neural networks: BP-MLL is an adaptation of the popular back-propagation algorithm for multi-label learning. == Learning paradigms == Based on learning paradigms, the existing multi-label classification techniques can be classified into batch learning and online machine learning. Batch learning algorithms require all the data samples to be available beforehand. They train the model using the entire training data and then predict the test sample using the found relationship.
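As a small batch-style illustration of the problem transformation methods described earlier, the sketch below shows only the target transformations themselves: binary relevance splits the label matrix into one binary target per label, while label powerset maps each distinct label combination to one class. The helper names and the toy label matrix are assumptions made for illustration; any binary or multi-class learner could then be trained on the resulting targets.

```python
from typing import Dict, List, Sequence, Tuple

LabelVector = Tuple[int, ...]


def binary_relevance_targets(Y: Sequence[LabelVector]) -> List[List[int]]:
    """Split a multi-hot label matrix into one binary target column per label."""
    n_labels = len(Y[0])
    return [[row[k] for row in Y] for k in range(n_labels)]


def label_powerset_targets(
    Y: Sequence[LabelVector],
) -> Tuple[List[int], Dict[int, LabelVector]]:
    """Map each distinct label combination to one class id of a multi-class problem."""
    class_of: Dict[LabelVector, int] = {}
    targets: List[int] = []
    for row in Y:
        if row not in class_of:
            class_of[row] = len(class_of)  # next unused class id
        targets.append(class_of[row])
    decode = {cid: combo for combo, cid in class_of.items()}
    return targets, decode


# Toy data with labels (A, B, C); the values are made up for illustration.
Y = [(1, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]

per_label = binary_relevance_targets(Y)
lp_targets, decode = label_powerset_targets(Y)

print(per_label)   # [[1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
print(lp_targets)  # [0, 1, 0, 2]
print(decode[2])   # (1, 1, 0)
```

Note that label powerset can only ever predict combinations seen during training, whereas binary relevance can produce unseen combinations but ignores dependencies between labels.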
The online learning algorithms, on the other hand, incrementally build their models in sequential iterations. In iteration $t$, an online algorithm receives a sample $x_t$ and predicts its label(s) $\hat{y}_t$ using the current model; the algorithm then receives $y_t$, the true label(s) of $x_t$, and updates its model based on the sample-label pair $(x_t, y_t)$. == Multi-label stream classification == Data streams are possibly infinite sequences of data that continuously and rapidly grow over time. Multi-label stream classification (MLSC) is the version of multi-label classification task that takes place in data streams. It is sometimes also called online multi-label classification. The difficulties of multi-label classification (exponential number of possible label sets, capturing dependencies between labels) are combined with difficulties of data streams (time and memory constraints, addressing infinite stream with finite means, concept drifts). Many MLSC methods resort to ensemble methods in order to increase their predictive performance and deal with concept drifts. Below are the most widely used ensemble methods in the literature: Online Bagging (OzaBagging)-based methods: Because the number of copies K of a given data point in a bootstrap sample is approximately Poisson(1)-distributed for big datasets, each incoming data instance in a data stream can be weighted by a draw from a Poisson(1) distribution to mimic bootstrapping in an online setting. This is called Online Bagging (OzaBagging). Many multi-label methods that use Online Bagging have been proposed in the literature, each of which utilizes different problem transformation methods. EBR, ECC, EPS, EBRT, EBMT, ML-Random Rules are examples of such methods. ADWIN Bagging-based methods: Online Bagging methods for MLSC are sometimes combined with explicit concept drift detection mechanisms such as ADWIN (Adaptive Window).
ADWIN keeps a variable-sized window to detect changes in the distribution of the data, and improves the ensemble by resetting the components that perform poorly when there is a drift in the incoming data. Generally, the letter 'a' is used as a subscript in the name of such ensembles to indicate the usage of ADWIN change detector. EaBR, EaCC, EaHTPS are examples of such multi-label ensembles. GOOWE-ML-based methods: Interpreting the relevance scores of each component of the ensemble as vectors in the label space and solving a least squares problem at the end of each batch, Geometrically-Optimum Online-Weighted Ensemble for Multi-label Classification (GOOWE-ML) is proposed. The ensemble tries to minimize the distance between the weighted prediction of its components and the ground truth vector for each instance over a batch. Unlike Online Bagging and ADWIN Bagging, GOOWE-ML utilizes a weighted voting scheme where better performing components of the ensemble are given more weight. The GOOWE-ML ensemble grows over time, and the lowest weight component is replaced by a new component when it is full at the end of a batch. GOBR, GOCC, GOPS, GORT are the proposed GOOWE-ML-based multi-label ensembles. Multiple Windows : Here, BR models that use a sliding window are replaced with two windows for each label, one for relevant and one for non-relevant examples. Instances are oversampled or undersampled according to a load factor that is kept
Markov blanket
In statistics and machine learning, a Markov blanket of a random variable is a set of variables that renders the variable conditionally independent of all other variables in the system. This concept is central in probabilistic graphical models and feature selection. If a Markov blanket is minimal, meaning that no variable in it can be removed without losing this conditional independence, it is called a Markov boundary. Identifying a Markov blanket or boundary allows for efficient inference and helps isolate relevant variables for prediction or causal reasoning. The terms Markov blanket and Markov boundary were coined by Judea Pearl in 1988. A Markov blanket may be derived from the structure of a probabilistic graphical model such as a Bayesian network or Markov random field. == Definition == A Markov blanket of a random variable $Y$ in a random variable set $\mathcal{S}=\{X_{1},\ldots ,X_{n}\}$ is any subset $\mathcal{S}_{1}$ of $\mathcal{S}$, conditioned on which the other variables are independent of $Y$: $Y\perp \!\!\!\perp \mathcal{S}\smallsetminus \mathcal{S}_{1}\mid \mathcal{S}_{1}$. It means that $\mathcal{S}_{1}$ contains at least all the information one needs to infer $Y$, where the variables in $\mathcal{S}\smallsetminus \mathcal{S}_{1}$ are redundant. In general, a given Markov blanket is not unique. Any subset of $\mathcal{S}$ that contains a Markov blanket is also a Markov blanket itself. In particular, $\mathcal{S}$ is a Markov blanket of $Y$ in $\mathcal{S}$. === Example === In a Bayesian network, the Markov blanket of a node consists of its parents, its children, and its children's other parents (i.e., co-parents).
Knowing the values of these nodes makes the target node conditionally independent of the rest of the network. In a Markov random field, the Markov blanket of a node is simply its immediate neighbors. == Markov condition == The concept of a Markov blanket is rooted in the Markov condition, which states that in a probabilistic graphical model, each variable is conditionally independent of its non-descendants given its parents. This condition implies the existence of a minimal separating set, the Markov blanket, that shields a variable from the rest of the network. For instance, when a person holds an object stationary against gravity, the object’s acceleration is fully determined by its direct causes, namely the upward force from the hand and the downward gravitational pull. Other variables such as air pressure or temperature are causally irrelevant. == Markov boundary == A Markov boundary of $Y$ in $\mathcal{S}$ is a subset $\mathcal{S}_{2}$ of $\mathcal{S}$, such that $\mathcal{S}_{2}$ itself is a Markov blanket of $Y$, but any proper subset of $\mathcal{S}_{2}$ is not a Markov blanket of $Y$. In other words, a Markov boundary is a minimal Markov blanket. The Markov boundary of a node $A$ in a Bayesian network is the set of nodes composed of $A$'s parents, $A$'s children, and $A$'s children's other parents. In a Markov random field, the Markov boundary for a node is the set of its neighboring nodes. In a dependency network, the Markov boundary for a node is the set of its parents. === Uniqueness of Markov boundary === The Markov boundary always exists. Under some mild conditions, the Markov boundary is unique. However, for most practical and theoretical scenarios multiple Markov boundaries may provide alternative solutions.
When there are multiple Markov boundaries, quantities measuring causal effect could fail. == In cognitive science == In the study of consciousness, brain function, and complex adaptive systems, Markov blankets are proposed as a mathematical mechanism which delimits the extent of cognitive entities, whether it be physical or causal.
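Returning to the Bayesian-network case described earlier, the graphical rule (parents, children, and children's other parents) is straightforward to compute from the graph structure. The sketch below assumes the network is given as a dictionary mapping each node to the list of its parents; the function name `markov_blanket` and the example graph are hypothetical.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], node: str) -> Set[str]:
    """Parents, children, and children's other parents of `node` in a DAG.

    `parents` maps every node to the list of its parents.
    """
    blanket: Set[str] = set(parents[node])    # parents of the node
    for child, child_parents in parents.items():
        if node in child_parents:
            blanket.add(child)                # a child of the node
            blanket.update(child_parents)     # that child's parents (co-parents)
    blanket.discard(node)                     # the node is not part of its own blanket
    return blanket


# Hypothetical network: A -> C <- B, C -> D <- E, D -> F
dag = {
    "A": [], "B": [], "E": [],
    "C": ["A", "B"],
    "D": ["C", "E"],
    "F": ["D"],
}

print(sorted(markov_blanket(dag, "C")))  # ['A', 'B', 'D', 'E']
print(sorted(markov_blanket(dag, "A")))  # ['B', 'C']
```

In this example F is outside the blanket of C: once D is known, F carries no further information about C.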
Reservoir computing
Reservoir computing is a framework for computation derived from recurrent neural network theory that maps input signals into higher dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed into the reservoir, which is treated as a "black box," a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The first key benefit of this framework is that training is performed only at the readout stage, as the reservoir dynamics are fixed. The second is that the computational power of naturally available systems, both classical and quantum mechanical, can be used to reduce the effective computational cost. == History == The first examples of reservoir neural networks demonstrated that randomly connected recurrent neural networks could be used for sensorimotor sequence learning, and simple forms of interval and speech discrimination. In these early models the memory in the network took the form of both short-term synaptic plasticity and activity mediated by recurrent connections. In other early reservoir neural network models the memory of the recent stimulus history was provided solely by the recurrent activity. Overall, the general concept of reservoir computing stems from the use of recursive connections within neural networks to create a complex dynamical system. It is a generalisation of earlier neural network architectures such as recurrent neural networks, liquid-state machines and echo-state networks. Reservoir computing also extends to physical systems that are not networks in the classical sense, but rather continuous systems in space and/or time: e.g. a literal "bucket of water" can serve as a reservoir that performs computations on inputs given as perturbations of the surface. 
The resultant complexity of such recurrent neural networks was found to be useful in solving a variety of problems including language processing and dynamic system modeling. However, training of recurrent neural networks is challenging and computationally expensive. Reservoir computing reduces those training-related challenges by fixing the dynamics of the reservoir and only training the linear output layer. A large variety of nonlinear dynamical systems can serve as a reservoir that performs computations. In recent years semiconductor lasers have attracted considerable interest as computation can be fast and energy efficient compared to electrical components. Recent advances in both AI and quantum information theory have given rise to the concept of quantum neural networks. These hold promise in quantum information processing, which is challenging for classical networks, but can also find application in solving classical problems. In 2018, a physical realization of a quantum reservoir computing architecture was demonstrated in the form of nuclear spins within a molecular solid. However, the nuclear spin experiments did not demonstrate quantum reservoir computing per se, as they did not involve processing of sequential data. Rather, the data were vector inputs, which makes this more accurately a demonstration of a quantum implementation of a random kitchen sink algorithm (also known as extreme learning machines in some communities). In 2019, another possible implementation of quantum reservoir processors was proposed in the form of two-dimensional fermionic lattices. In 2020, realization of reservoir computing on gate-based quantum computers was proposed and demonstrated on cloud-based IBM superconducting near-term quantum computers. Reservoir computers have been used for time-series analysis purposes.
In particular, some of their uses involve chaotic time-series prediction, separation of chaotic signals, and link inference of networks from their dynamics. == Classical reservoir computing == === Reservoir === The 'reservoir' in reservoir computing is the internal structure of the computer, and must have two properties: it must be made up of individual, non-linear units, and it must be capable of storing information. The non-linearity describes the response of each unit to input, which is what allows reservoir computers to solve complex problems. Reservoirs are able to store information by connecting the units in recurrent loops, where the previous input affects the next response. The change in reaction due to the past allows the computers to be trained to complete specific tasks. Reservoirs can be virtual or physical. Virtual reservoirs are typically randomly generated and are designed like neural networks. Virtual reservoirs can be designed to have non-linearity and recurrent loops, but, unlike neural networks, the connections between units are randomized and remain unchanged throughout computation. Physical reservoirs are possible because of the inherent non-linearity of certain natural systems. The interaction between ripples on the surface of water contains the nonlinear dynamics required in reservoir creation, and a pattern-recognition reservoir computer was developed by first inputting ripples with electric motors, then recording and analyzing the ripples in the readout. === Readout === The readout is a neural network layer that performs a linear transformation on the output of the reservoir. The weights of the readout layer are trained by analyzing the spatiotemporal patterns of the reservoir after excitation by known inputs, and by using a training method such as linear regression or ridge regression. As its implementation depends on spatiotemporal reservoir patterns, the details of readout methods are tailored to each type of reservoir.
For example, the readout for a reservoir computer using a container of liquid as its reservoir might entail observing spatiotemporal patterns on the surface of the liquid. === Types === ==== Context reverberation network ==== An early example of reservoir computing was the context reverberation network. In this architecture, an input layer feeds into a high dimensional dynamical system which is read out by a trainable single-layer perceptron. Two kinds of dynamical system were described: a recurrent neural network with fixed random weights, and a continuous reaction–diffusion system inspired by Alan Turing's model of morphogenesis. At the trainable layer, the perceptron associates current inputs with the signals that reverberate in the dynamical system; the latter were said to provide a dynamic "context" for the inputs. In the language of later work, the reaction–diffusion system served as the reservoir. ==== Echo state network ==== The tree echo state network (TreeESN) model represents a generalization of the reservoir computing framework to tree structured data. ==== Liquid-state machine ==== The liquid (i.e. reservoir) of a chaotic liquid state machine (CLSM), or chaotic reservoir, is made from chaotic spiking neurons that nonetheless stabilize their activity by settling on a single hypothesis that describes the trained inputs of the machine. This is in contrast to general types of reservoirs, which do not stabilize. The liquid stabilization occurs via synaptic plasticity and chaos control that govern neural connections inside the liquid. CLSM showed promising results in learning sensitive time series data. ==== Nonlinear transient computation ==== This type of information processing is most relevant when time-dependent input signals depart from the mechanism's internal dynamics. These departures cause transients, or temporary alterations, which are represented in the device's output.
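As an illustration of the reservoir and readout components described in the classical reservoir computing section, the following is a minimal sketch of a virtual reservoir in the style of an echo state network, written in plain Python. It is not taken from any particular implementation: the function names (`make_reservoir`, `run_reservoir`, `solve`, `train_readout`, `predict`), the reservoir size, the sine-wave task, and the row-sum weight scaling are all assumptions chosen for brevity. The recurrent and input weights are drawn once at random and never changed; only the linear readout is fitted, here by ridge regression.

```python
import math
import random

def make_reservoir(n_units, n_inputs, scale=0.9, seed=0):
    """Draw fixed random weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_units)]
    w = []
    for _ in range(n_units):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
        norm = sum(abs(v) for v in row)
        # Assumption: rows are scaled so their absolute sum is below 1, which
        # keeps the tanh update a contraction so old inputs fade from memory.
        w.append([scale * v / norm for v in row])
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with a sequence of input vectors; return all states."""
    state = [0.0] * len(w)
    states = []
    for u in inputs:
        state = [math.tanh(sum(a * b for a, b in zip(w_in[k], u))
                           + sum(a * b for a, b in zip(w[k], state)))
                 for k in range(len(w))]
        states.append(state)
    return states

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression: solve (S^T S + ridge * I) w_out = S^T y."""
    feats = [s + [1.0] for s in states]  # append a constant bias feature
    n = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(n)]
    return solve(gram, rhs)

def predict(w_out, state):
    """Linear readout applied to one reservoir state."""
    return sum(a * b for a, b in zip(w_out, state + [1.0]))

# Toy task (an assumption for illustration): predict the next sample of a sine wave.
signal = [math.sin(0.2 * t) for t in range(400)]
w_in, w = make_reservoir(n_units=20, n_inputs=1)
states = run_reservoir(w_in, w, [[v] for v in signal[:-1]])

washout = 50  # discard early states that still depend on the zero initial state
w_out = train_readout(states[washout:300], signal[washout + 1:301])

preds = [predict(w_out, s) for s in states[300:]]
max_err = max(abs(p - t) for p, t in zip(preds, signal[301:]))
```

Because the reservoir weights stay fixed, training reduces to a single linear solve over the recorded states, which is the cost advantage the framework relies on.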
==== Deep reservoir computing ==== The extension of the reservoir computing framework towards deep learning, with the introduction of deep reservoir computing and of the deep echo state network (DeepESN) model, makes it possible to develop efficiently trained models for hierarchical processing of temporal data, while also enabling investigation of the inherent role of layered composition in recurrent neural networks. == Quantum reservoir computing == Quantum reservoir computing may use the nonlinear nature of quantum mechanical interactions or processes to form the characteristic nonlinear reservoirs, but may also be done with linear reservoirs when the injection of the input to the reservoir creates the nonlinearity. The marriage of machine learning and quantum devices is leading to the emergence of quantum neuromorphic computing as a new research area. === Types === ==== Gaussian states of interacting quantum harmonic oscillators ==== Gaussian states are a paradigmatic class of states of continuous variable quantum systems. Although they can nowadays be created and manipulated in, e.g., state-of-the-art optical platforms, naturally robust to decoherence, it is well-known that they are not sufficient for, e.g., universal quantum computing because transformations that preserve the Gaussian nature of a state are linear. Normally, linear dynamics would not be sufficient for nontrivial reser
SIP (software)
SIP is an open source software tool used to connect computer programs or libraries written in C or C++ with the scripting language Python. It is an alternative to SWIG. SIP was originally developed in 1998 for PyQt — the Python bindings for the Qt GUI toolkit — but is suitable for generating bindings for any C or C++ library. == Concept == SIP takes a set of specification (.sip) files describing the API and generates the required C++ code. This is then compiled to produce the Python extension modules. A .sip file is essentially the class header file with some things removed (because SIP does not include a full C++ parser) and some things added (because C++ does not always provide enough information about how the API works). For PyQt v4 I use an internal tool (written using PyQt of course) called metasip. This is sort of an IDE for SIP. It uses GCC-XML to parse the latest header files and saves the relevant data, as XML, in a metasip project. metasip then does the equivalent of a diff against the previous version of the API and flags up any changes that need to be looked at. Those changes are then made through the GUI and ticked off the TODO list. Generating the .sip files is just a button click. In my subversion repository, PyQt v4 is basically just a 20M XML file. Updating PyQt v4 for a minor release of Qt v4 is about half an hours work. In terms of how the generated code works then I don't think it's very different from how any other bindings generator works. Python has a very good C API for writing extension modules - it's one of the reasons why so many 3rd party tools have Python bindings. For every C++ class, the SIP generated code creates a corresponding Python class implemented in C. 
== Notable applications that use SIP ==
PyQt, a Python port of the application framework and widget toolkit Qt
QGIS, a free and open-source cross-platform desktop geographic information system (GIS)
QtiPlot, a computer program to analyze and visualize scientific data
calibre, a free and open-source cross-platform e-book manager
Veusz, a free and open-source cross-platform program to visualize scientific data
Polynomial kernel
In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models, that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing learning of non-linear models. Intuitively, the polynomial kernel looks not only at the given features of input samples to determine their similarity, but also combinations of these. In the context of regression analysis, such combinations are known as interaction features. The (implicit) feature space of a polynomial kernel is equivalent to that of polynomial regression, but without the combinatorial blowup in the number of parameters to be learned. When the input features are binary-valued (booleans), then the features correspond to logical conjunctions of input features. == Definition == For degree-$d$ polynomials, the polynomial kernel is defined as
$$K(\mathbf{x},\mathbf{y}) = (\mathbf{x}^{\mathsf{T}}\mathbf{y} + c)^{d}$$
where $\mathbf{x}$ and $\mathbf{y}$ are vectors of size $n$ in the input space, i.e. vectors of features computed from training or test samples, and $c \geq 0$ is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When $c = 0$, the kernel is called homogeneous. (A further generalized polykernel divides $\mathbf{x}^{\mathsf{T}}\mathbf{y}$ by a user-specified scalar parameter $a$.) As a kernel, $K$ corresponds to an inner product in a feature space based on some mapping $\varphi$:
$$K(\mathbf{x},\mathbf{y}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{y}) \rangle$$
The nature of $\varphi$ can be seen from an example. Let $d = 2$, so we get the special case of the quadratic kernel.
After using the multinomial theorem (twice; the outermost application is the binomial theorem) and regrouping,
$$K(\mathbf{x},\mathbf{y}) = \left(\sum_{i=1}^{n} x_{i}y_{i} + c\right)^{2} = \sum_{i=1}^{n}\left(x_{i}^{2}\right)\left(y_{i}^{2}\right) + \sum_{i=2}^{n}\sum_{j=1}^{i-1}\left(\sqrt{2}\,x_{i}x_{j}\right)\left(\sqrt{2}\,y_{i}y_{j}\right) + \sum_{i=1}^{n}\left(\sqrt{2c}\,x_{i}\right)\left(\sqrt{2c}\,y_{i}\right) + c^{2}$$
From this it follows that the feature map is given by:
$$\varphi(\mathbf{x}) = \left(x_{n}^{2}, \ldots, x_{1}^{2}, \sqrt{2}\,x_{n}x_{n-1}, \ldots, \sqrt{2}\,x_{n}x_{1}, \sqrt{2}\,x_{n-1}x_{n-2}, \ldots, \sqrt{2}\,x_{n-1}x_{1}, \ldots, \sqrt{2}\,x_{2}x_{1}, \sqrt{2c}\,x_{n}, \ldots, \sqrt{2c}\,x_{1}, c\right)$$
Generalizing to $\left(\mathbf{x}^{\mathsf{T}}\mathbf{y} + c\right)^{d}$, where $\mathbf{x} \in \mathbb{R}^{n}$ and $\mathbf{y} \in \mathbb{R}^{n}$, and applying the multinomial theorem:
$$\begin{aligned}\left(\mathbf{x}^{\mathsf{T}}\mathbf{y} + c\right)^{d} &= \sum_{j_{1}+j_{2}+\dots+j_{n+1}=d} \frac{\sqrt{d!}}{\sqrt{j_{1}!\cdots j_{n}!\,j_{n+1}!}}\, x_{1}^{j_{1}}\cdots x_{n}^{j_{n}} \sqrt{c}^{\,j_{n+1}} \cdot \frac{\sqrt{d!}}{\sqrt{j_{1}!\cdots j_{n}!\,j_{n+1}!}}\, y_{1}^{j_{1}}\cdots y_{n}^{j_{n}} \sqrt{c}^{\,j_{n+1}} \\ &= \varphi(\mathbf{x})^{\mathsf{T}}\varphi(\mathbf{y})\end{aligned}$$
The last summation has $l_{d} = \tbinom{n+d}{d}$ elements, so that:
$$\varphi(\mathbf{x}) = \left(a_{1}, \dots, a_{l}, \dots, a_{l_{d}}\right)$$
where $l = (j_{1}, j_{2}, \ldots, j_{n}, j_{n+1})$ and
$$a_{l} = \frac{\sqrt{d!}}{\sqrt{j_{1}!\cdots j_{n}!\,j_{n+1}!}}\, x_{1}^{j_{1}}\cdots x_{n}^{j_{n}} \sqrt{c}^{\,j_{n+1}} \quad \text{with} \quad j_{1}+j_{2}+\dots+j_{n}+j_{n+1} = d$$
== Practical use == Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in natural language processing (NLP). The most common degree is $d = 2$ (quadratic), since larger degrees tend to overfit on NLP problems. Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the usual non-linear SVM training algorithms, including: full expansion of the kernel prior to training/testing with a linear SVM, i.e. full computation of the mapping $\varphi$ as in polynomial regression; basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a training set to produce an approximate expansion; inverted indexing of support vectors.
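The full-expansion approach can be checked numerically for the quadratic case. The sketch below, in plain Python, evaluates the kernel directly and also through an explicit degree-2 feature map, and the two values should agree up to floating-point error. The function names `poly_kernel` and `quadratic_feature_map` and the sample vectors are illustrative assumptions; the features are listed in a different order than in the Definition section, which does not change the inner product.

```python
import math
from itertools import combinations

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

def quadratic_feature_map(x, c=1.0):
    """Explicit feature map for d = 2.

    Components: squares x_i^2, cross terms sqrt(2) x_i x_j for i < j,
    linear terms sqrt(2c) x_i, and the constant c.
    """
    squares = [v * v for v in x]
    cross = [math.sqrt(2.0) * a * b for a, b in combinations(x, 2)]
    linear = [math.sqrt(2.0 * c) * v for v in x]
    return squares + cross + linear + [c]

# Arbitrary sample vectors (assumed for illustration).
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

k_direct = poly_kernel(x, y, c=1.0, d=2)

phi_x = quadratic_feature_map(x, c=1.0)
phi_y = quadratic_feature_map(y, c=1.0)
k_explicit = sum(a * b for a, b in zip(phi_x, phi_y))
```

For $n = 3$ and $d = 2$ the explicit map has $\tbinom{5}{2} = 10$ components, while the direct evaluation needs only one inner product of length 3. That difference is why the implicit form is preferred as $n$ and $d$ grow.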
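One problem with the polynomial kernel is that it may suffer from numerical instability: when $\mathbf{x}^{\mathsf{T}}\mathbf{y} + c < 1$, $K(\mathbf{x},\mathbf{y}) = (\mathbf{x}^{\mathsf{T}}\mathbf{y} + c)^{d}$ tends to zero with increasing $d$, whereas when $\mathbf{x}^{\mathsf{T}}\mathbf{y} + c > 1$, $K(\mathbf{x},\mathbf{y})$ tends to infinity.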
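The effect is easy to see with scalar inputs. The short sketch below uses hypothetical values chosen so that the base $\mathbf{x}^{\mathsf{T}}\mathbf{y} + c$ equals 0.5 in one case and 2 in the other, and evaluates the kernel for increasing degree.

```python
def poly_kernel(x, y, c, d):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

degrees = (1, 10, 100)

# Base x^T y + c = 0.5 * 0.5 + 0.25 = 0.5, so values shrink toward zero.
shrinking = [poly_kernel([0.5], [0.5], c=0.25, d=d) for d in degrees]

# Base x^T y + c = 1.0 * 1.0 + 1.0 = 2.0, so values grow without bound.
growing = [poly_kernel([1.0], [1.0], c=1.0, d=d) for d in degrees]
```

At degree 100 the two kernel values already differ by roughly sixty orders of magnitude, which illustrates why large degrees can be numerically problematic.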