AI Data Analytics

AI Data Analytics — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Control system

    Control system

    A control system manages, commands, directs, or regulates the behavior of other devices or systems using control loops. It can range from a single home heating controller using a thermostat controlling a domestic boiler to large industrial control systems which are used for controlling processes or machines. The control systems are designed via control engineering process. For continuously modulated control, a feedback controller is used to automatically control a process or operation. The control system compares the value or status of the process variable (PV) being controlled with the desired value or setpoint (SP), and applies the difference as a control signal to bring the process variable output of the plant to the same value as the setpoint. For sequential and combinational logic, software logic, such as in a programmable logic controller, is used. == Open-loop and closed-loop control == == Feedback control systems == == Logic control == Logic control systems for industrial and commercial machinery were historically implemented by interconnected electrical relays and cam timers using ladder logic. Today, most such systems are constructed with microcontrollers or more specialized programmable logic controllers (PLCs). The notation of ladder logic is still in use as a programming method for PLCs. Logic controllers may respond to switches and sensors and can cause the machinery to start and stop various operations through the use of actuators. Logic controllers are used to sequence mechanical operations in many applications. Examples include elevators, washing machines and other systems with interrelated operations. An automatic sequential control system may trigger a series of mechanical actuators in the correct sequence to perform a task. For example, various electric and pneumatic transducers may fold and glue a cardboard box, fill it with the product and then seal it in an automatic packaging machine. PLC software can be written in many different ways – ladder diagrams, SFC (sequential function charts) or statement lists. == On–off control == On–off control uses a feedback controller that switches abruptly between two states. A simple bi-metallic domestic thermostat can be described as an on-off controller. When the temperature in the room (PV) goes below the user setting (SP), the heater is switched on. Another example is a pressure switch on an air compressor. When the pressure (PV) drops below the setpoint (SP) the compressor is powered. Refrigerators and vacuum pumps contain similar mechanisms. Simple on–off control systems like these can be cheap and effective. == Linear control == == Fuzzy logic == Fuzzy logic is an attempt to apply the easy design of logic controllers to the control of complex continuously varying systems. Basically, a measurement in a fuzzy logic system can be partly true. The rules of the system are written in natural language and translated into fuzzy logic. For example, the design for a furnace would start with: "If the temperature is too high, reduce the fuel to the furnace. If the temperature is too low, increase the fuel to the furnace." Measurements from the real world (such as the temperature of a furnace) are fuzzified and logic is calculated arithmetic, as opposed to Boolean logic, and the outputs are de-fuzzified to control equipment. When a robust fuzzy design is reduced to a single, quick calculation, it begins to resemble a conventional feedback loop solution and it might appear that the fuzzy design was unnecessary. However, the fuzzy logic paradigm may provide scalability for large control systems where conventional methods become unwieldy or costly to derive. Fuzzy electronics is an electronic technology that uses fuzzy logic instead of the two-value logic more commonly used in digital electronics. == Physical implementation == The range of control system implementation is from compact controllers often with dedicated software for a particular machine or device, to distributed control systems for industrial process control for a large physical plant. Logic systems and feedback controllers are usually implemented with programmable logic controllers. The Broadly Reconfigurable and Expandable Automation Device (BREAD) is a recent framework that provides many open-source hardware devices which can be connected to create more complex data acquisition and control systems.

    Read more →
  • Maike Osborne

    Maike Osborne

    Maike Osborne (born Michael Osborne, 1982) is an Australian academic and scientist who serves as a professor of machine learning at University of Oxford in the Machine Learning Research Group in the Department of Engineering Science. In 2016 she co-founded Mind Foundry, an artificial intelligence company, along with fellow professor Stephen Roberts. == Education == She has a BEng in Mechanical Engineering and a BSc in both Pure Mathematics and Physics from the University of Western Australia. She has a PhD in Machine Learning from the University of Oxford. == Career == Osborne has contributed to over 100 publications, and her work has received over 24,000 citations with an h-index of 46 according to Google Scholar. and has acted as principal or co-investigator for £10.6M of research funding. Her career has focused in particular on Bayesian approaches to AI and machine learning, named after the famous British statistician Thomas Bayes. Osborne's work has contributed to Probabilistic numerics, with Osborne co-authoring the first textbook on the subject. In 2013, Osborne co-authored a paper alongside Swedish-German economist Carl Benedikt Frey called "The Future of Employment: How Susceptible are Jobs to Computerisation?". The paper has received over 13,000 citations and extensive media coverage. In 2023 Osborne gave oral evidence to the UK House of Commons Science and Technology Committee on the subject of the "Governance of Artificial Intelligence". Her testimony received significant coverage around her warnings of the threat of "rogue AI". == Honors == She is also an Official Fellow of Exeter College, and St Peter's College, Oxford, a Fellow of the ELLIS society, and a Faculty Member of the Oxford-Man Institute of Quantitative Finance. She joined the Oxford Martin School as Lead Researcher on the Oxford Martin Programme on Technology and Employment in 2015. She is a Director of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems.

    Read more →
  • Vector quantization

    Vector quantization

    Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. Developed in the early 1980s by Robert M. Gray, it was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points. The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation. Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model and to sparse coding models used in deep learning algorithms such as autoencoder. == Training == One simple training algorithm for vector quantization is: Pick a sample point at random Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance Repeat A more sophisticated algorithm reduces the bias in the density matching estimation and ensures that all points are used, by including an extra sensitivity parameter: Increase each centroid's sensitivity s i {\displaystyle s_{i}} by a small amount Pick a sample point P {\displaystyle P} at random For each quantization vector centroid c i {\displaystyle c_{i}} , let d ( P , c i ) {\displaystyle d(P,c_{i})} denote the distance of P {\displaystyle P} and c i {\displaystyle c_{i}} Find the centroid c i {\displaystyle c_{i}} for which d ( P , c i ) − s i {\displaystyle d(P,c_{i})-s_{i}} is the smallest Move c i {\displaystyle c_{i}} towards P {\displaystyle P} by a small fraction of the distance Set s i {\displaystyle s_{i}} to zero Repeat It is desirable to use a cooling schedule to produce convergence: see Simulated annealing. Another simple method is LBG, which is based on k-means. The algorithm can be iteratively updated with "live" data, rather than by picking random points from a data set, but this will introduce some bias if the data are temporally correlated over many samples. == Applications == Vector quantization is used for lossy data compression, lossy data correction, pattern recognition, density estimation and clustering. Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid. For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm). === Use in data compression === Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data has errors that are inversely proportional to density. The transformation is usually done by projection or by using a codebook. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix coded variable-length encoded value as its output. The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a k-dimensional vector [ x 1 , x 2 , . . . , x k ] {\displaystyle [x_{1},x_{2},...,x_{k}]} of amplitude levels. It is compressed by choosing the nearest matching vector from a set of n-dimensional vectors [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} , with n < k. All possible combinations of the n-dimensional vector [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} form the vector space to which all the quantized vectors belong. Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression. Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization. === Video codecs based on vector quantization === Bink video Cinepak Daala is transform-based but uses pyramid vector quantization on transformed coefficients Digital Video Interactive: Production-Level Video and Real-Time Video Indeo Microsoft Video 1 QuickTime: Apple Video (RPZA) and Graphics Codec (SMC) Sorenson SVQ1 and SVQ3 Smacker video VQA format, used in many games The usage of video codecs based on vector quantization has declined significantly in favor of those based on motion compensated prediction combined with transform coding, e.g. those defined in MPEG standards, as the low decoding complexity of vector quantization has become less relevant. === Audio codecs based on vector quantization === AMR-WB+ CELP CELT (now part of Opus) is transform-based but uses pyramid vector quantization on transformed coefficients Codec 2 DTS G.729 iLBC Ogg Vorbis TwinVQ === Use in pattern recognition === VQ was also used in the eighties for speech and speaker recognition. Recently it has also been used for efficient nearest neighbor search and on-line signature recognition. In pattern recognition applications, one codebook is constructed for each class (each class being a user in biometric applications) using acoustic vectors of this user. In the testing phase the quantization distortion of a testing signal is worked out with the whole set of codebooks obtained in the training phase. The codebook that provides the smallest vector quantization distortion indicates the identified user. The main advantage of VQ in pattern recognition is its low computational burden when compared with other techniques such as dynamic time warping (DTW) and hidden Markov model (HMM). The main drawback when compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. In order to overcome this problem a multi-section codebook approach has been proposed. The multi-section approach consists of modelling the signal with several sections (for instance, one codebook for the initial part, another one for the center and a last codebook for the ending part). === Use as clustering algorithm === As VQ is seeking for centroids as density points of nearby lying samples, it can be also directly used as a prototype-based clustering method: each centroid is then associated with one prototype. By aiming to minimize the expected squared quantization error and introducing a decreasing learning gain fulfilling the Robbins-Monro conditions, multiple iterations over the whole data set with a concrete but fixed number of prototypes converges to the solution of k-means clustering algorithm in an incremental manner. === Generative adversarial networks (GAN) === VQ has been used to quantize a feature representation layer in the discriminator of generative adversarial networks. The feature quantization (FQ) technique performs implicit feature matching. It improves the GAN training, and yields an improved performance on a variety of popular GAN models: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation.

    Read more →
  • Wasserstein GAN

    Wasserstein GAN

    The Wasserstein Generative Adversarial Network (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches". Compared with the original GAN discriminator, the Wasserstein GAN discriminator provides a better learning signal to the generator. This allows the training to be more stable when generator is learning distributions in very high dimensional spaces. == Motivation == === The GAN game === The original GAN method is based on the GAN game, a zero-sum game with 2 players: generator and discriminator. The game is defined over a probability space ( Ω , B , μ r e f ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{ref})} , The generator's strategy set is the set of all probability measures μ G {\displaystyle \mu _{G}} on ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} , and the discriminator's strategy set is the set of measurable functions D : Ω → [ 0 , 1 ] {\displaystyle D:\Omega \to [0,1]} . The objective of the game is L ( μ G , D ) := E x ∼ μ r e f [ ln ⁡ D ( x ) ] + E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] . {\displaystyle L(\mu _{G},D):=\mathbb {E} _{x\sim \mu _{ref}}[\ln D(x)]+\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))].} The generator aims to minimize it, and the discriminator aims to maximize it. A basic theorem of the GAN game states that Repeat the GAN game many times, each time with the generator moving first, and the discriminator moving second. Each time the generator μ G {\displaystyle \mu _{G}} changes, the discriminator must adapt by approaching the ideal D ∗ ( x ) = d μ r e f d ( μ r e f + μ G ) . {\displaystyle D^{}(x)={\frac {d\mu _{ref}}{d(\mu _{ref}+\mu _{G})}}.} Since we are really interested in μ r e f {\displaystyle \mu _{ref}} , the discriminator function D {\displaystyle D} is by itself rather uninteresting. It merely keeps track of the likelihood ratio between the generator distribution and the reference distribution. At equilibrium, the discriminator is just outputting 1 2 {\displaystyle {\frac {1}{2}}} constantly, having given up trying to perceive any difference. Concretely, in the GAN game, let us fix a generator μ G {\displaystyle \mu _{G}} , and improve the discriminator step-by-step, with μ D , t {\displaystyle \mu _{D,t}} being the discriminator at step t {\displaystyle t} . Then we (ideally) have L ( μ G , μ D , 1 ) ≤ L ( μ G , μ D , 2 ) ≤ ⋯ ≤ max μ D L ( μ G , μ D ) = 2 D J S ( μ r e f ‖ μ G ) − 2 ln ⁡ 2 , {\displaystyle L(\mu _{G},\mu _{D,1})\leq L(\mu _{G},\mu _{D,2})\leq \cdots \leq \max _{\mu _{D}}L(\mu _{G},\mu _{D})=2D_{JS}(\mu _{ref}\|\mu _{G})-2\ln 2,} so we see that the discriminator is actually lower-bounding D J S ( μ r e f ‖ μ G ) {\displaystyle D_{JS}(\mu _{ref}\|\mu _{G})} . === Wasserstein distance === Thus, we see that the point of the discriminator is mainly as a critic to provide feedback for the generator, about "how far it is from perfection", where "far" is defined as Jensen–Shannon divergence. Naturally, this brings the possibility of using a different criteria of farness. There are many possible divergences to choose from, such as the f-divergence family, which would give the f-GAN. The Wasserstein GAN is obtained by using the Wasserstein metric, which satisfies a "dual representation theorem" that renders it highly efficient to compute: A proof can be found in the main page on Wasserstein metric. == Definition == By the Kantorovich-Rubenstein duality, the definition of Wasserstein GAN is clear:A Wasserstein GAN game is defined by a probability space ( Ω , B , μ r e f ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{ref})} , where Ω {\displaystyle \Omega } is a metric space, and a constant K > 0 {\displaystyle K>0} . There are 2 players: generator and discriminator (also called "critic"). The generator's strategy set is the set of all probability measures μ G {\displaystyle \mu _{G}} on ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} . The discriminator's strategy set is the set of measurable functions of type D : Ω → R {\displaystyle D:\Omega \to \mathbb {R} } with bounded Lipschitz-norm: ‖ D ‖ L ≤ K {\displaystyle \|D\|_{L}\leq K} . The Wasserstein GAN game is a zero-sum game, with objective function L W G A N ( μ G , D ) := E x ∼ μ G [ D ( x ) ] − E x ∼ μ r e f [ D ( x ) ] . {\displaystyle L_{WGAN}(\mu _{G},D):=\mathbb {E} _{x\sim \mu _{G}}[D(x)]-\mathbb {E} _{x\sim \mu _{ref}}[D(x)].} The generator goes first, and the discriminator goes second. The generator aims to minimize the objective, and the discriminator aims to maximize the objective: min μ G max D L W G A N ( μ G , D ) . {\displaystyle \min _{\mu _{G}}\max _{D}L_{WGAN}(\mu _{G},D).} By the Kantorovich-Rubenstein duality, for any generator strategy μ G {\displaystyle \mu _{G}} , the optimal reply by the discriminator is D ∗ {\displaystyle D^{}} , such that L W G A N ( μ G , D ∗ ) = K ⋅ W 1 ( μ G , μ r e f ) . {\displaystyle L_{WGAN}(\mu _{G},D^{})=K\cdot W_{1}(\mu _{G},\mu _{ref}).} Consequently, if the discriminator is good, the generator would be constantly pushed to minimize W 1 ( μ G , μ r e f ) {\displaystyle W_{1}(\mu _{G},\mu _{ref})} , and the optimal strategy for the generator is just μ G = μ r e f {\displaystyle \mu _{G}=\mu _{ref}} , as it should. == Comparison with GAN == In the Wasserstein GAN game, the discriminator provides a better gradient than in the GAN game. Consider for example a game on the real line where both μ G {\displaystyle \mu _{G}} and μ r e f {\displaystyle \mu _{ref}} are Gaussian. Then the optimal Wasserstein critic D W G A N {\displaystyle D_{WGAN}} and the optimal GAN discriminator D {\displaystyle D} are plotted as below: For fixed discriminator, the generator needs to minimize the following objectives: For GAN, E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] {\displaystyle \mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))]} . For Wasserstein GAN, E x ∼ μ G [ D W G A N ( x ) ] {\displaystyle \mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)]} . Let μ G {\displaystyle \mu _{G}} be parametrized by θ {\displaystyle \theta } , then we can perform stochastic gradient descent by using two unbiased estimators of the gradient: ∇ θ E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ] = E x ∼ μ G [ ln ⁡ ( 1 − D ( x ) ) ⋅ ∇ θ ln ⁡ ρ μ G ( x ) ] {\displaystyle \nabla _{\theta }\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))]=\mathbb {E} _{x\sim \mu _{G}}[\ln(1-D(x))\cdot \nabla _{\theta }\ln \rho _{\mu _{G}}(x)]} ∇ θ E x ∼ μ G [ D W G A N ( x ) ] = E x ∼ μ G [ D W G A N ( x ) ⋅ ∇ θ ln ⁡ ρ μ G ( x ) ] {\displaystyle \nabla _{\theta }\mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)]=\mathbb {E} _{x\sim \mu _{G}}[D_{WGAN}(x)\cdot \nabla _{\theta }\ln \rho _{\mu _{G}}(x)]} where we used the reparameterization trick. As shown, the generator in GAN is motivated to let its μ G {\displaystyle \mu _{G}} "slide down the peak" of ln ⁡ ( 1 − D ( x ) ) {\displaystyle \ln(1-D(x))} . Similarly for the generator in Wasserstein GAN. For Wasserstein GAN, D W G A N {\displaystyle D_{WGAN}} has gradient 1 almost everywhere, while for GAN, ln ⁡ ( 1 − D ) {\displaystyle \ln(1-D)} has flat gradient in the middle, and steep gradient elsewhere. As a result, the variance for the estimator in GAN is usually much larger than that in Wasserstein GAN. See also Figure 3 of. The problem with D J S {\displaystyle D_{JS}} is much more severe in actual machine learning situations. Consider training a GAN to generate ImageNet, a collection of photos of size 256-by-256. The space of all such photos is R 256 2 {\displaystyle \mathbb {R} ^{256^{2}}} , and the distribution of ImageNet pictures, μ r e f {\displaystyle \mu _{ref}} , concentrates on a manifold of much lower dimension in it. Consequently, any generator strategy μ G {\displaystyle \mu _{G}} would almost surely be entirely disjoint from μ r e f {\displaystyle \mu _{ref}} , making D J S ( μ G ‖ μ r e f ) = + ∞ {\displaystyle D_{JS}(\mu _{G}\|\mu _{ref})=+\infty } . Thus, a good discriminator can almost perfectly distinguish μ r e f {\displaystyle \mu _{ref}} from μ G {\displaystyle \mu _{G}} , as well as any μ G ′ {\displaystyle \mu _{G}'} close to μ G {\displaystyle \mu _{G}} . Thus, the gradient ∇ μ G L ( μ G , D ) ≈ 0 {\displaystyle \nabla _{\mu _{G}}L(\mu _{G},D)\approx 0} , creating no learning signal for the generator. Detailed theorems can be found in. == Training Wasserstein GANs == Training the generator in Wasserstein GAN is just gradient descent, the same as in GAN (or most deep learning methods), but training the discriminator is different, as the discriminator is now restricted to have bounded Lipschitz norm. There are several methods for this. === Upper-bounding the Lipschitz norm === Let the discriminator function D {\displaystyle D} to be implemented by a multilayer perceptron: D = D n ∘ D n − 1 ∘ ⋯ ∘ D 1 {\displaystyle D=D_{n}\circ D_{n-1}\circ \cdots \circ D_{1}} where D i ( x ) = h ( W i x ) {\displaystyle D_{i}(x)=h(W_

    Read more →
  • Representation collapse

    Representation collapse

    Representation collapse is a phenomenon in machine learning and representation learning where a model maps different inputs to the same or very similar embeddings, which means it loses important information about how the data is spread out. It is frequently encountered in self-supervised learning, especially within contrastive and non-contrastive frameworks, when training objectives or model architectures do not maintain variance across representations. Collapse results in degenerate solutions characterized by uninformative learned features, significantly impairing downstream task performance. Various techniques have been proposed to mitigate representation collapse, including the use of negative samples, architectural asymmetry, stop-gradient operations, variance regularization, and redundancy reduction objectives, as seen in methods such as SimCLR, BYOL, and VICReg. Comprehending and averting representation collapse is regarded as a fundamental challenge in the advancement of stable and efficient self-supervised learning systems.

    Read more →
  • Is an AI Essay Writer Worth It in 2026?

    Is an AI Essay Writer Worth It in 2026?

    Comparing the best AI essay writer? An AI essay writer is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI essay writer slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • IBM optical mark and character readers

    IBM optical mark and character readers

    IBM designed, manufactured and sold optical mark and character readers from 1960 until 1984. The IBM 1287 is notable as being the first commercially sold scanner capable of reading handwritten numbers. == Initial development work == IBM Poughkeepsie studied machine character recognition from 1950 till 1954, developing an experimental machine that used a cathode-ray-tube attached an IBM 701 which performed the character analysis. They pursued a technique known as lakes and bays which examined different areas of dark and light where the lakes were white areas enclosed by black and the bays were partially enclosed areas. Their machine and mission was moved to IBM Endicott in 1954, where research continued. From 1955 to 1956 they then worked on the VIDOR (Visual Document Reader) program, but they could not get agreement on acceptable reject rate. The developers felt 80% recognition was acceptable (meaning 20% of documents would need to be manually processed), while product planners and IBM Marketing felt that compared to punched card, the reject rate was unacceptably high. This led to no new products being released. In 1956 the American Bankers Association chose to use Magnetic Ink Character Recognition (MICR) to automate check handling, rejecting a proposed solution generated by an IBM Poughkeepsie banking project that used optical characters formed by vertical bars and digits. IBM developed a magnetic read head to handle the new standard, releasing the IBM 1210 MICR reader/sorter in 1959. The development work for this product both with read heads and document handling, helped move optical character recognition forward, with development focusing on reading one or two lines of print from a paper document larger than an IBM punched card. The first product to be released was the IBM 1418. == IBM 123x Optical Mark Readers == The IBM 1230, IBM 1231, and IBM 1232 were optical mark readers used to input the contents of data sources such as questionnaires, test results, surveys as well as historical data that could be easily entered as marks on sheets. Educational institutes used them to score test results and they were effectively a replacement for the IBM 805 Test Scoring Machine that used electrical resistance and a mark sense pencil to score a test, rather than optical mark detection. They were developed and manufactured by IBM Rochester. They have the following features: A pneumatic input hopper that can hold approximately 600 sheets Two output stackers: the normal stacker that holds 600 sheets and the select (or reject) stacker which holds 50 sheets. Pluggable SMS printed circuit cards They can read positional marks made by a lead pencil using an optical read head that consists of photovoltaic(solar) cells and lamps The 1230 has 21 photovoltaic cells, 20 for reading the pencil marks and one to read timing marks on the right hand border of the sheet. The 1231 and 1232 have 22 photovoltaic cells, 20 to read data, one to read timing marks and one to read a special feature called a master mark. Input size is a 8+1⁄2 in × 11 in (22 cm × 28 cm) sheet called a data sheet that can have up to 1000 marked or printed positions per side. Uses electromechanical devices known as sonic delay lines to store results. === IBM 1230 Optical Mark Scoring Reader === The IBM 1230 is an offline optical mark scoring machine announced on 2 November 1962 that was designed to read and scores 1,200 answer sheets per hour. Scored results are printed via a wire matrix printer on the right margin of each answer sheet as it is processed. Two master sheets are required for the process: one that encoded the correct answers and one for the machine to record run information. Output could be sent to an IBM 534 Model 3 Card Punch as an option, which limits throughput to 750 sheets per hour when punching 80 columns of data. === IBM 1231 Optical Mark Page Reader === The IBM 1231 is an online optical mark reader that was designed to read and score 2000 test answer sheets per hour, depending on downstream operations. The correct answers for the test can either be entered using a master sheet (like the 1230) or sent to the 1231 using the optional master-mark special feature. === IBM 1232 Optical Mark Page Reader === The IBM 1232 is an offline optical mark reader that was designed to read up to 2000 marked sheets per hour. Documents can be read at up to 2000 sheets per hour, but this depends on the number of characters that need to be punched from each sheet. The IBM 1232 reads the marks and then punches them into cards using a IBM 534 Model 3 Card Punch. Together they can read up to 64,000 characters per hour or 800 fully punched cards. === Example customers === The California Test Bureau (CTB) that provided standardised achievement tests for educational institutes across the USA, began replacing their IBM 805s with IBM 1230s in 1963. They then installed two IBM 1232s in 1964. Being able to use a full 8+1⁄2 in × 11 in (22 cm × 28 cm) answer sheet rather than a 7+3⁄8 in × 3+1⁄4 in (18.7 cm × 8.3 cm) mark sense card, eliminated the need to use multiple answer cards per test per student, as well as dramatically increased the marking speed for test answers. Credit Bureau Services of Dallas used an IBM 1232 in 1966 as part of their first computerisation project. They marked credit history data onto optical scanning sheets that were fed into their IBM 1232. The attached IBM 534 then punched this data onto punched cards, which were then fed into their IBM System/360 Model 30. In 1968 the US Army Corps of Engineers Coastal Engineering Research Center (CERC) began using special log books for their coastal surveyors to record coastal survey data, which was then converted to punched cards by an IBM 1232. == IBM 2956 Optical Mark/Hole Reader == The IBM 2956 Models 2 and 3 are custom build optical mark/hole readers designed to be attached to an IBM 2740 Communications Terminal. The IBM 2956-2 can read cards that have either been hand or machine marked or that have been punched. The cards can be fed by hand or from the 400 card hopper. It has a 400 card stacker. The 2956-2 could be ordered by request for price quotation (RPQ) 843086. The IBM 2956-3 can read cards that have either been hand or machine marked or that have been punched. It can also read marked sheets up to 9 in × 14 in (230 mm × 360 mm) in size, although only a 3+1⁄4 in (83 mm) band along the side of the sheet can be read (the width of a punched card). It does not have a hopper or a stacker, so each card or sheet must be manually fed into the machine. The 2956-3 could be ordered by request for price quotation (RPQ) 843106. The 2956-3 could be attached to an IBM 3276 or IBM 3278 display station with RPQ UB9001. One use case for the IBM 2956 is to grade school tests. On completion of a learning module a student can use an optical scan-type card to record answers to up to 27 questions, with up to 5 choices per question. They are scanned by the reader and the results are then transmitted to an IBM System/360 in remote job entry mode and can also be printed on the IBM 2740. The reader can also be attached to an IBM 3735 which transmits results to an IBM System/370 and which prints results on an IBM 3286 printer. They can also be attached to an IBM System/3. Note that the IBM 2956 Model 5 (2956-5) was a banking reader/sorter. == IBM 1282 Optical Reader Card Punch == The IBM 1282 is an offline optical reader that is used to read embossed credit card receipts, a mark read field or machine printed characters in three different fonts. It then outputs this data onto a punched card. It was developed and manufactured by IBM Endicott. It proved popular and within two years of announcement 100 machines were installed or on order. === Example customer === The New York Department of Motor Vehicles reported that from 1964 until 1968 they were using an IBM 1282 to read machine printed license renewal slips that had been mailed back as part of the renewal process. They would scan the slip and then process the resulting punched card. This worked well until the DMV decided to request renewals include the drivers Social Security Number (SSN), which meant a handwritten number needed to be either manually keyed or a new scanning device procured. They switched to the IBM 1287 in 1968. == IBM 1285 Optical Reader == The IBM 1285 is an online optical reader that is used to read printed paper tapes from cash registers or adding machines. It was developed by IBM Endicott and manufactured by IBM Rochester. The IBM 1285 attaches to an IBM 1401, 1440, 1460 or System/360. It has a small round screen to display characters being read and it has a keyboard to enter header information and to optionally enter character corrections for rejected characters. It can read a 200 ft (61 m) roll or paper tape in three-and-a half minutes, reading data at speeds of up to 3000 lines per minute. It can mark the tape with a dot to indicate unreadable characters, so they can be r

    Read more →
  • Regularization perspectives on support vector machines

    Regularization perspectives on support vector machines

    Within mathematical analysis, Regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training set data in a way that minimizes the average of the hinge-loss function and L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization and in the L2 norm sense and also corresponds to minimizing the bias and variance of our estimator of the weights. Estimators with lower Mean squared error predict better or generalize better when given unseen data. Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error and constrain the Decision boundary not to be excessively complicated or overfit the training data via a L2 norm of the weights term. The training and test-set errors can be measured without bias and in a fair way using accuracy, precision, Auc-Roc, precision-recall, and other metrics. Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss for a loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity and Bayesian inference. However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss. == Theoretical background == In the statistical learning theory framework, an algorithm is a strategy for choosing a function f : X → Y {\displaystyle f\colon \mathbf {X} \to \mathbf {Y} } given a training set S = { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle S=\{(x_{1},y_{1}),\ldots ,(x_{n},y_{n})\}} of inputs x i {\displaystyle x_{i}} and their labels y i {\displaystyle y_{i}} (the labels are usually ± 1 {\displaystyle \pm 1} ). Regularization strategies avoid overfitting by choosing a function that fits the data, but is not too complex. Specifically: f = argmin f ∈ H { 1 n ∑ i = 1 n V ( y i , f ( x i ) ) + λ ‖ f ‖ H 2 } , {\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},f(x_{i}))+\lambda \|f\|_{\mathcal {H}}^{2}\right\},} where H {\displaystyle {\mathcal {H}}} is a hypothesis space of functions, V : Y × Y → R {\displaystyle V\colon \mathbf {Y} \times \mathbf {Y} \to \mathbb {R} } is the loss function, ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} is a norm on the hypothesis space of functions, and λ ∈ R {\displaystyle \lambda \in \mathbb {R} } is the regularization parameter. When H {\displaystyle {\mathcal {H}}} is a reproducing kernel Hilbert space, there exists a kernel function K : X × X → R {\displaystyle K\colon \mathbf {X} \times \mathbf {X} \to \mathbb {R} } that can be written as an n × n {\displaystyle n\times n} symmetric positive-definite matrix K {\displaystyle \mathbf {K} } . By the representer theorem, f ( x i ) = ∑ j = 1 n c j K i j , and ‖ f ‖ H 2 = ⟨ f , f ⟩ H = ∑ i = 1 n ∑ j = 1 n c i c j K ( x i , x j ) = c T K c . {\displaystyle f(x_{i})=\sum _{j=1}^{n}c_{j}\mathbf {K} _{ij},{\text{ and }}\|f\|_{\mathcal {H}}^{2}=\langle f,f\rangle _{\mathcal {H}}=\sum _{i=1}^{n}\sum _{j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})=c^{T}\mathbf {K} c.} == Special properties of the hinge loss == The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if f ( x i ) = y i {\displaystyle f(x_{i})=y_{i}} and 1 if f ( x i ) ≠ y i {\displaystyle f(x_{i})\neq y_{i}} , i.e. the Heaviside step function on − y i f ( x i ) {\displaystyle -y_{i}f(x_{i})} . However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss, V ( y i , f ( x i ) ) = ( 1 − y f ( x ) ) + {\displaystyle V{\big (}y_{i},f(x_{i}){\big )}={\big (}1-yf(x){\big )}_{+}} , where ( s ) + = max ( s , 0 ) {\displaystyle (s)_{+}=\max(s,0)} , provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data returns the Bayes-optimal solution: f b ( x ) = { 1 , p ( 1 ∣ x ) > p ( − 1 ∣ x ) , − 1 , p ( 1 ∣ x ) < p ( − 1 ∣ x ) . {\displaystyle f_{b}(x)={\begin{cases}1,&p(1\mid x)>p(-1\mid x),\\-1,&p(1\mid x) Read more →

  • Backend as a service

    Backend as a service

    Backend as a service (BaaS), sometimes also referred to as mobile backend as a service (MBaaS), is a service for providing web app and mobile app developers with a way to easily build a backend to their frontend applications. Features available include user management, push notifications, and integration with social networking services. These services are provided via the use of custom software development kits (SDKs) and application programming interfaces (APIs). BaaS is a relatively recent development in cloud computing, with most BaaS startups dating from 2011 or later. Some of the most popular service providers are AWS Amplify and Firebase. == Purpose == Web and mobile apps require a similar set of features on the backend, including notification service, integration with social networks, and cloud storage. Each of these services has its own API that must be individually incorporated into an app, a process that can be time-consuming and complicated for app developers. BaaS providers form a bridge between the frontend of an application and various cloud-based backends via a unified API and SDK. Providing a consistent way to manage backend data means that developers do not need to redevelop their own backend for each of the services that their apps need to access, potentially saving both time and money. Although similar to other cloud-computing business models, such as serverless computing, software as a service (SaaS), infrastructure as a service (IaaS), and platform as a service (PaaS), BaaS is distinct from these other services in that it specifically addresses the cloud-computing needs of web and mobile app developers by providing a unified means of connecting their apps to cloud services. == Features == BaaS providers offer different set of features and backend tools. Some of the most common features include: Database management. Most BaaS solutions provide SQL and/or NoSQL database management services for applications. Developers can store their app data without deploying and managing databases themselves. BaaS usually provides client SDKs, REST and GraphQL APIs for the frontend to interact with databases. File storage. BaaS providers often offer storage solutions for media files, user uploads, and other binary data. Applications can upload, download, and delete files through provided SDKs and APIs. Authentication and authorization. Some BaaS offer authentication and authorization services that allow developers to easily manage app users. This includes user sign-up, login, password reset, social media login integration through OAuth, user group and permission management etc. Notification service. Some BaaS providers such as Firebase and AWS Amplify have notification services that can send custom emails to users and push native notifications on mobile platforms. This is especially useful for applications that need to send messages, alerts, and reminders. Cloud functions. Some BaaS allow developers to deploy and run serverless functions. The functions are usually stateless and can be triggered by various ways including HTTP requests, SDK invocation, background server events, and cloud scheduled executions. Different providers offer runtime support for different languages, some of the popular languages are JavaScript/TypeScript (Node.js, Deno), Python, Java/Kotlin. Cloud functions extend the potential and flexibility of BaaS by allowing developers to write custom functionalities for their apps, working in a way similar to a traditional REST API backend framework. Usage analytics. Analytics data about application usage is often included in BaaS. This allows developers to monitor user behaviors and make decisions correspondingly in marketing strategies and performance optimizations. UI design. Some BaaS providers, such as AWS Amplify and Backendless, offer user interface designing tools that help developers design the frontend UI of web and mobile apps. While this may be useful for small teams and individual developers, UI design assistance may not be conventional in BaaS as it goes beyond the scope of backend infrastructure. Real-Time. Real-time features in a BaaS platform ensure that data updates and synchronizations occur instantly across all clients, making changes immediately visible to users. This is crucial for applications like live chat and collaborative tools, using technologies like WebSockets to maintain continuous server-client connections. == Service providers == BaaS providers have a broad focus, providing SDKs and APIs that work for app development on multiple platforms with different technology stacks, such as JavaScript (for Web apps), Flutter, Java/Kotlin (for Android apps), Swift/Objective-C (for iOS/MacOS/WatchOS/TvOS apps), .NET (for Windows) and others. BaaS providers also come in different types, suiting developers of different needs. === Cloud-based BaaS === Most BaaS providers host backend platforms on their cloud servers. They also manage the infrastructure, security, and scalability of the platforms. Developers can access the backend services via a web interface or the provided APIs. Some examples of cloud-based BaaS include Firebase (hosted on Google Cloud Platform), AWS Amplify (hosted on Amazon Web Services), and Microsoft Azure Mobile Apps (hosted on Microsoft Azure). === Self-hosted BaaS === Self-hosted BaaS allow developers to host backend on their own servers, providing more flexibility and potential to customization compared to cloud-based BaaS, which often is more difficult to migrate from. However, developers are also in charge of managing the infrastructure, security, and scalability of their servers. === Mobile BaaS === Mobile backend as a service (MBaaS) is a type of BaaS specifically for applications deployed in mobile systems. While some references use MBaaS interchangeably for BaaS, BaaS can have a wider variety of support such as for web apps and desktop apps. == Business model == BaaS providers generate revenue from their services in various ways, often using a freemium model. Under this model, a client receives a certain number of free active users or API calls per month, and pays a fee for each user or call over this limit. Alternatively, clients can pay a set fee for a package which allows for a greater number of calls or active users per month. There are also flat fee plans that make the pricing more predictable. Some of the providers offer the unlimited API calls inside their free plan offerings. Another business model that has been used by a lot of BaaS providers is PAYG (pay as you go), which has a flexible cost based on developers' usage of database, storage, bandwidth, function calls, user numbers etc.

    Read more →
  • Localization Industry Standards Association

    Localization Industry Standards Association

    Localization Industry Standards Association or LISA was a Swiss-based trade body concerning the translation of computer software (and associated materials) into multiple natural languages, which existed from 1990 to February 2011. It counted among its members most of the large information technology companies of the period, including Adobe, Cisco, Hewlett-Packard, IBM, McAfee, Nokia, Novell and Xerox. LISA played a significant role in representing its partners at the International Organization for Standardization (ISO), and the TermBase eXchange (TBX) standard developed by LISA was submitted to ISO in 2007 and became ISO 30042:2008. LISA also had a presence at the W3C. A number of the LISA standards are used by the OASIS Open Architecture for XML Authoring and Localization framework. LISA shut down on 28 February 2011, and its website went offline shortly afterwards. In the wake of the closure of LISA, the European Telecommunications Standards Institute started an Industry Specification Group (ISG) for localization. The ISG has five work items: Term-Base eXchange (TBX) / ISO 30042:2008 Translation Memory eXchange (TMX), with GALA Segmentation Rules eXchange (SRX) / ISO/CD 24621) Global information management Metrics eXchange – Volume (GMX-V); Another organization that was formed in response to the closure of LISA is Terminology for Large Organizations (TerminOrgs), a consortium of terminology professionals who promote terminology management best practices.

    Read more →
  • Lise Getoor

    Lise Getoor

    Lise Getoor is an American computer scientist who is a distinguished professor and Baskin Endowed chair in the Computer Science and Engineering department, at the University of California, Santa Cruz, and an adjunct professor in the Computer Science Department at the University of Maryland, College Park. Her primary research interests are in machine learning and reasoning with uncertainty, applied to graphs and structured data. She also works in data integration, social network analysis and visual analytics. She has edited a book on Statistical relational learning that is a main reference in this domain. She has published many highly cited papers in academic journals and conference proceedings. She has also served as action editor for the Machine Learning Journal, JAIR associate editor, and TKDD associate editor. She received her Ph.D. from Stanford University, her M.S. from UC Berkeley, and her B.S. from UC Santa Barbara. Prior to joining University of California, Santa Cruz, she was a professor at the University of Maryland, College Park until November 2013. == Recognition == Getoor has multiple best paper awards, an NSF Career Award, and is an Association for the Advancement of Artificial Intelligence (AAAI) Fellow. In 2019, she was elected as an ACM Fellow "for contributions to machine learning, reasoning under uncertainty, and responsible data science", was selected as a Distinguished Alumna of the UC Santa Barbara Computer Science Department, was awarded the UCSC WiSE Chancellor's Achievement Award for Diversity, and was selected to give the UC Santa Cruz Faculty Research Lecture 2018-19, one of the highest recognitions given to UC faculty. She was named an IEEE Fellow in 2021, "for contributions to machine learning and reasoning under uncertainty". In October 2022, Getoor was elected a Fellow of the American Association for the Advancement of Science (AAAS). In 2024, she was named a Fellow of the American Academy of Arts and Sciences (AAA&S). Also in 2024, she received the ACM SIGKDD Innovation Award recognizing individuals with outstanding technical innovations in the field of Knowledge Discovery and Data Mining that have had a lasting impact in advancing the theory and practice of the field. == Personal life == Getoor's father was mathematician Ronald Getoor (1929–2017).

    Read more →
  • Model inversion attack

    Model inversion attack

    Model inversion attack is a type of adversarial machine learning attack where an attacker tries to reconstruct or infer sensitive information about a model's training data by analyzing the outputs of a trained machine learning model. Instead of directly querying the underlying dataset, attackers query the model (usually via APIs or prediction interfaces), and leverage patterns in the model responses to infer properties of the original inputs. These attacks leverage the fact that machine learning models encode statistical information about their training data in their parameters and outputs, which can unintentionally leak private or proprietary information. Depending on the access level to the target model, model inversion attacks can be performed in both black-box and white-box settings. In a generic attack, an adversary makes several queries to a model and leverages the responses (e.g. confidence scores, predictions) to train a surrogate or inversion model that learns to approximate the inverse mapping from outputs to inputs. This process may enable the reconstruction of sensitive attributes, e.g., facial features, medical data, or user behavior patterns, from models trained on such data. The technique has been demonstrated against various models like deep neural networks, classification systems etc. The technique has significant privacy risks in areas like healthcare, finance, biometric identification etc. Mitigation strategies include restricting model access, reducing output granularity, using differential privacy and monitoring anomalous query patterns.

    Read more →
  • Histogram of oriented displacements

    Histogram of oriented displacements

    Histogram of oriented displacements (HOD) is a 2D trajectory descriptor. The trajectory is described using a histogram of the directions between each two consecutive points. Given a trajectory T = {P1, P2, P3, ..., Pn}, where Pt is the 2D position at time t. For each pair of positions Pt and Pt+1, calculate the direction angle θ(t, t+1). Value of θ is between 0 and 360. A histogram of the quantized values of θ is created. If the histogram is of 8 bins, the first bin represents all θs between 0 and 45. The histogram accumulates the lengths of the consecutive moves. For each θ, a specific histogram bin is determined. The length of the line between Pt and Pt+1 is then added to the specific histogram bin. To show the intuition behind the descriptor, consider the action of waving hands. At the end of the action, the hand falls down. When describing this down movement, the descriptor does not care about the position from which the hand started to fall. This fall will affect the histogram with the appropriate angles and lengths, regardless of the position where the hand started to fall. HOD records for each moving point: how much it moves in each range of directions. HOD has a clear physical interpretation. It proposes that, a simple way to describe the motion of an object, is to indicate how much distance it moves in each direction. If the movement in all directions are saved accurately, the movement can be repeated from the initial position to the final destination regardless of the displacements order. However, the temporal information will be lost, as the order of movements is not stored-this is what we solve by applying the temporal pyramid, as shown in section \ref{sec:temp-pyramid}. If the angles quantization range is small, classifiers that use the descriptor will overfit. Generalization needs some slack in directions-which can be done by increasing the quantization range.

    Read more →
  • Christopher K. I. Williams

    Christopher K. I. Williams

    Christopher Kenneth Ingle Williams (born 1960) is a professor at the School of Informatics, University of Edinburgh, working in Artificial intelligence, and particularly the areas of Machine learning and Computer vision. == Education == Williams received a BA in Physics and Theoretical Physics from the University of Cambridge in 1982, followed by Part III Mathematics (1983). He did a MSc in Water Resources at the University of Newcastle-Upon-Tyne, then worked in Lesotho on low-cost sanitation. In 1988, he studied at the Department of Computer Science of the University of Toronto under the supervision of Geoffrey Hinton. He obtained his MSc and PhD both in computer science, in 1990 and 1994, respectively. == Career and research == In 1994, Williams moved to Aston University as a Research Fellow. He became a Lecturer in August 1995. He moved to the University of Edinburgh in July 1998 and became Reader in 2000. He obtained a Personal Chair in Machine Learning in 2005 in the School of Informatics. Williams has been a Fellow of the European Laboratory for Learning and Intelligent Systems (ELLIS) since 2019. Williams' research interests are in machine learning and computer vision. He has worked on new models for understanding time-series and images, and for finding structure in data. He is best known for his work on Gaussian processes and for the book Gaussian Processes for Machine Learning, co-authored with Carl Rasmussen. The book received the 2009 DeGroot Prize of the International Society for Bayesian Analysis. Williams was an organizer of the PASCAL Visual Object Classes (VOC) project (2005–2012) along with Mark Everingham, Luc van Gool, John Winn, and Andrew Zisserman. == Awards and honours == In 2021 Williams was elected a Fellow of the Royal Society of Edinburgh (FRSE).

    Read more →
  • Roni Rosenfeld

    Roni Rosenfeld

    Roni Rosenfeld (Hebrew: רוני רוזנפלד) is an Israeli-American computer scientist and computational epidemiologist, currently serving as the head of the Machine Learning Department at Carnegie Mellon University. He is an international expert in machine learning, infectious disease forecasting, statistical language modeling and artificial intelligence. == Education == Rosenfeld received his B.Sc. in mathematics and physics from Tel Aviv University in 1985. He received his Ph.D. in computer science from Carnegie Mellon University in 1994. While a graduate student, he developed and open-sourced a statistical language-modeling toolkit to allow anyone to create statistical language models from their own corpora and experiment with and extend the toolkit's capabilities. The toolkit has been used by more than 100 NLP laboratories in more than 20 countries. Rosenfeld's Ph.D. thesis, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, was advised by Raj Reddy and Xuedong Huang and won the 2001 Computer, Speech and Language award for "Most Influential Paper in the Last 5 Years." == Career == Shortly after receiving his Ph.D., Rosenfeld joined the faculty of the Carnegie Mellon School of Computer Science as an assistant professor. He was promoted to the rank of associate professor in 1999 and received tenure in 2001. In 2005 he was promoted to professor of language technologies, machine learning computer science and computational biology in the School of Computer Science at Carnegie Mellon University. Rosenfeld also holds adjunct appointments at the University of Pittsburgh School of Medicine, department of computational and systems biology. From 2002 to 2003, Rosenfeld was a visiting professor at the University of Hong Kong. Rosenfeld is the director of Carnegie Mellon's Machine Learning for Social Good (ML4SG) program. He has held educational leadership positions in a variety of programs, including the M.S. in computational finance (1997–1999), graduate computational and statistical learning (2001–2003), M.S. in machine learning (2017) and undergraduate minor in machine learning. Rosenfeld was appointed Head of Carnegie Mellon's Machine Learning Department in 2018. == Research == Rosenfeld's research interests include epidemiological forecasting, information and communication technologies for development (ICT4D), and machine learning for social good. === Epidemiological forecasting === Rosenfeld is a world expert in epidemiological forecasting. He founded and directs the Delphi research group, which has won most of the epidemiological forecasting challenges organized by the U.S. CDC and other U.S. government agencies. In December 2016, the CDC named his group the "Most Accurate Forecaster" for 2015–2016, and in October 2017, the Delphi group's two systems took the top two spots in the 2016-2017 flu forecasting challenge. The CDC recognized Rosenfeld's Delphi group at Carnegie Mellon University as having contributed the most accurate national-, regional-, and state-level influenza-like illness forecasts and national-level hospitalization forecasts to the site. In 2019, the CDC recognized forecasts provided by the Delphi group at Carnegie Mellon as having been the most accurate for five seasons in a row, and named the Delphi group an Influenza Forecasting Center of Excellence, a five-year designation that includes $3 million in research funding. Rosenfeld describes his forecasting research goal as "to make epidemiological forecasting as universally accepted and useful as weather forecasting is today." His recent work in the area has focused on selecting high value epidemiological forecasting targets (e.g. Influenza and Dengue); creating baseline forecasting methods for them; establishing metrics for measuring and tracking forecasting accuracy; estimating the limits of forecastability for each target; and identifying new sources of data that could be helpful to the forecasting goal. == Honors and awards == 2017 Joel and Ruth Spira Teaching Award 2017 CDC Influenza Forecasting Challenge "Most Accurate Forecaster" 1992 Allen Newell Medal for Research Excellence

    Read more →