Information Harvesting

Information Harvesting

Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. Wiggins had a background in genetic algorithms and fuzzy logic. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeds to generate rules, trading off generalization against memorization, that will infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if overfitting took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of machine learning. The advantage of IH, as compared with other data mining products of its time and even later, was that it provided a mechanism for finding multiple rules that would classify the data and determining, according to set criteria, the best rules to use.

Outline of machine learning

The following outline is provided as an overview of, and topical guide to, machine learning: Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. == How can machine learning be categorized? == An academic discipline A branch of science An applied science A subfield of computer science A branch of artificial intelligence A subfield of soft computing Application of statistics === Paradigms of machine learning === Supervised learning, where the model is trained on labeled data Unsupervised learning, where the model tries to identify patterns in unlabeled data Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties. == Applications of machine learning == Applications of machine learning Bioinformatics Biomedical informatics Computer vision Customer relationship management Data mining Earth sciences Email filtering Inverted pendulum (balance and equilibrium system) Natural language processing Named Entity Recognition Automatic summarization Automatic taxonomy construction Dialog system Grammar checker Language recognition Handwriting recognition Optical character recognition Speech recognition Text to Speech Synthesis Speech Emotion Recognition Machine translation Question answering Speech synthesis Text mining Term frequency–inverse document frequency Text simplification Pattern recognition Facial recognition system Handwriting recognition Image recognition Optical character recognition Speech recognition Recommendation system Collaborative filtering Content-based filtering Hybrid recommender systems Search engine Search engine optimization Social engineering == Machine learning hardware == Graphics processing unit Tensor processing unit Vision processing unit == Machine learning tools == Comparison of machine learning software Comparison of deep learning software === Machine learning frameworks === ==== Proprietary machine learning frameworks ==== Amazon Machine Learning Microsoft Azure Machine Learning Studio DistBelief (replaced by TensorFlow) ==== Open source machine learning frameworks ==== Apache Singa Apache MXNet Caffe PyTorch mlpack TensorFlow Torch CNTK Accord.Net Jax MLJ.jl – A machine learning framework for Julia === Machine learning libraries === Deeplearning4j Theano scikit-learn Keras === Machine learning algorithms === == Machine learning methods == === Instance-based algorithm === K-nearest neighbors algorithm (KNN) Learning vector quantization (LVQ) Self-organizing map (SOM) === Regression analysis === Logistic regression Ordinary least squares regression (OLSR) Linear regression Stepwise regression Multivariate adaptive regression splines (MARS) Regularization algorithm Ridge regression Least Absolute Shrinkage and Selection Operator (LASSO) Elastic net Least-angle regression (LARS) Classifiers Probabilistic classifier Naive Bayes classifier Binary classifier Linear classifier Hierarchical classifier === Dimensionality reduction === Dimensionality reduction Canonical correlation analysis (CCA) Factor analysis Feature extraction Feature selection Independent component analysis (ICA) Linear discriminant analysis (LDA) Multidimensional scaling (MDS) Non-negative matrix factorization (NMF) Partial least squares regression (PLSR) Principal component analysis (PCA) Principal component regression (PCR) Projection pursuit Sammon mapping t-distributed stochastic neighbor embedding (t-SNE) === Ensemble learning === Ensemble learning AdaBoost Boosting Bootstrap aggregating (also "bagging" or "bootstrapping") Ensemble averaging Gradient boosted decision tree (GBDT) Gradient boosting Random Forest Stacked Generalization === Meta-learning === Meta-learning Inductive bias Metadata === Reinforcement learning === Reinforcement learning Q-learning State–action–reward–state–action (SARSA) Temporal difference learning (TD) Learning Automata === Supervised learning === Supervised learning Averaged one-dependence estimators (AODE) Artificial neural network Case-based reasoning Gaussian process regression Gene expression programming Group method of data handling (GMDH) Inductive logic programming Instance-based learning Lazy learning Learning Automata Learning Vector Quantization Logistic Model Tree Minimum message length (decision trees, decision graphs, etc.) Nearest Neighbor Algorithm Analogical modeling Probably approximately correct learning (PAC) learning Ripple down rules, a knowledge acquisition methodology Symbolic machine learning algorithms Support vector machines Random Forests Ensembles of classifiers Bootstrap aggregating (bagging) Boosting (meta-algorithm) Ordinal classification Conditional Random Field ANOVA Quadratic classifiers k-nearest neighbor Boosting SPRINT Bayesian networks Naive Bayes Hidden Markov models Hierarchical hidden Markov model ==== Bayesian ==== Bayesian statistics Bayesian knowledge base Naive Bayes Gaussian Naive Bayes Multinomial Naive Bayes Averaged One-Dependence Estimators (AODE) Bayesian Belief Network (BBN) Bayesian Network (BN) ==== Decision tree algorithms ==== Decision tree algorithm Decision tree Classification and regression tree (CART) Iterative Dichotomiser 3 (ID3) C4.5 algorithm C5.0 algorithm Chi-squared Automatic Interaction Detection (CHAID) Decision stump Conditional decision tree ID3 algorithm Random forest SLIQ ==== Linear classifier ==== Linear classifier Fisher's linear discriminant Linear regression Logistic regression Multinomial logistic regression Naive Bayes classifier Perceptron Support vector machine === Unsupervised learning === Unsupervised learning Expectation-maximization algorithm Vector Quantization Generative topographic map Information bottleneck method Association rule learning algorithms Apriori algorithm Eclat algorithm ==== Artificial neural networks ==== Artificial neural network Feedforward neural network Extreme learning machine Convolutional neural network Recurrent neural network Long short-term memory (LSTM) Logic learning machine Self-organizing map ==== Association rule learning ==== Association rule learning Apriori algorithm Eclat algorithm FP-growth algorithm ==== Hierarchical clustering ==== Hierarchical clustering Single-linkage clustering Conceptual clustering ==== Cluster analysis ==== Cluster analysis BIRCH DBSCAN Expectation–maximization (EM) Fuzzy clustering Hierarchical clustering k-means clustering k-medians Mean-shift OPTICS algorithm ==== Anomaly detection ==== Anomaly detection k-nearest neighbors algorithm (k-NN) Local outlier factor === Semi-supervised learning === Semi-supervised learning Active learning Generative models Low-density separation Graph-based methods Co-training Transduction === Deep learning === Deep learning Deep belief networks Deep Boltzmann machines Deep Convolutional neural networks Deep Recurrent neural networks Hierarchical temporal memory Generative Adversarial Network Style transfer Transformer Stacked Auto-Encoders === Other machine learning methods and problems === Anomaly detection Association rules Bias-variance dilemma Classification Multi-label classification Clustering Data Pre-processing Empirical risk minimization Feature engineering Feature learning Learning to rank Occam learning Online machine learning PAC learning Regression Reinforcement Learning Semi-supervised learning Statistical learning Structured prediction Graphical models Bayesian network Conditional random field (CRF) Hidden Markov model (HMM) Unsupervised learning VC theory == Machine learning research == List of artificial intelligence projects List of datasets for machine learning research == History of machine learning == History of machine learning Timeline of machine learning == Machine learning projects == Machine learning projects: DeepMind Google Brain OpenAI Meta AI Hugging Face == Machine learning organizations == === Machine learning conferences and workshops === Artificial Intelligence and Security (AISec) (co-located workshop with CCS) Conference on Neural Information Processing Systems (NIPS) ECML PKDD International Conference on Machine Learning (ICML) ML4ALL (Machine Learning For All) == Machine learning publications == === Books on machine learning === Mathematics for Machine Learning Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow The Hundred-Page Machine Learning Book === Machine learning journals === Machine Learning Journal of Machine Learning Research (JMLR) Neural Computation == Pe

Bandhan Tod

Bandhan Tod is a mobile app to stop child marriage in India's Bihar state through SOS button in the app. When the SOS on Bandhan Tod is activated, the nearest small NGO will attempt to resolve the issue. If the family resists, then the police gets notified. Till now so many child marriages has been cancelled through Bandhan Tod interventions. Bandhan Tod is an initiative of Gender Alliance managed by Prashanti Tiwari to support the state government's efforts to end child marriage and dowry.

Vismon

Vismon was the Bell Labs system which displayed authors' faces on one of their internal e-mail systems. The name was a pun on the sysmon program used at Bell to show the load on computer systems. It can also be interpreted as "visual monitor". The system inspired Rich Burridge to develop the similar but more widespread faces system, which spread with Unix distributions in the 1980s. This in turn inspired Steve Kinzler to develop the Picons, or personal icons, which have the goal of offering symbols and other images, as well as faces, to represent individuals and institutions in email messages. Other systems such as the faces available on the LAN email functions of the NeXTSTEP platform also seem to have been influenced by the original Vismon capabilities. The faces program in Plan 9 is the direct descendant of this system. Vismon was the work of Rob Pike and Dave Presotto. It was based on some early experiments by Luca Cardelli. Many other scientists and engineers of the Computing Science Research Center of the Murray Hill facility were also involved. All had been spurred by the introduction in 1983 of the new Blit graphics terminal developed by Pike and Bart Locanthi and marketed by Teletype Corporation of Skokie, Illinois as the DMD 5620. Pike was eager, along with his colleagues, to exploit the new graphic capabilities. Pike and company went around their Center, convincing everybody, from directors and administrative assistants to engineers and scientists, to pose as they got out a 4×5 view camera with a Polaroid back and took black-and-white photos (Polaroid type 52) of their faces. Their efforts yielded nearly 100 faces, which they digitised with a scanner from graphics colleagues. They wrote several programs to transform the faces, store them and serve them on several machines at the lab. As time went by, they added faces from outside their Center and outside Bell Labs. This database also led to the pico image editor (originally named zunk) which was used for image transformations, many of them with colleagues as the preferred target. The first programs built around vismon were used to announce incoming mail in a dedicated window, using the 48 by 48 pixel faces. Later on the faces were also used to decorate line printer banners.

ArcSoft ShowBiz

ShowBiz is a video editor by ArcSoft for the Windows operating system. It can create VCD and DVDs and can also export to the formats AVI, MPEG, WMV, and MOV. ShowBiz also contains a DVD burning and menu building feature. As of 2003, it was one of the three most dominant bundled titles. == Reception == PC Magazine reviewer Jan Ozer states: "ArcSoft's ShowBiz has evolved into a competent editor that's generally more usable than Dazzle's MovieStar program, providing more configuration controls, better preview features, and a much greater range of fun effects." John Virata, senior editor of Digital Media Online, says in his three page review of ShowBiz DVD 2, "It is an easy editor to work with and has a logically laid out interface that takes you step by step through the video creation and DVD creation process"

Security.txt

security.txt is an accepted standard for website security information that allows security researchers to report security vulnerabilities easily. The standard prescribes a text file named security.txt in the well known location, similar in syntax to robots.txt but intended to be machine and human readable, for those wishing to contact a website's owner about security issues. security.txt files have been adopted by Google, GitHub, LinkedIn, and Facebook. == History == The Internet Draft was first submitted by Edwin Foudil in September 2017. At that time it covered four directives, "Contact", "Encryption", "Disclosure" and "Acknowledgement". Foudil expected to add further directives based on feedback. In addition, web security expert Scott Helme said he had seen positive feedback from the security community while use among the top 1 million websites was "as low as expected right now". In 2019, the Cybersecurity and Infrastructure Security Agency (CISA) published a draft binding operational directive that requires all US federal agencies to publish a security.txt file within 180 days. The Internet Engineering Steering Group (IESG) issued a Last Call for security.txt in December 2019 which ended on January 6, 2020. A study in 2021 found that over ten percent of top-100 websites published a security.txt file, with the percentage of sites publishing the file decreasing as more websites were considered. The study also noted a number of discrepancies between the standard and the content of the file. In April 2022 the security.txt file has been accepted by Internet Engineering Task Force (IETF) as RFC 9116. == File format == security.txt files can be served under the /.well-known/ directory (i.e. /.well-known/security.txt) or the top-level directory (i.e. /security.txt) of a website. The file must be served over HTTPS and in plaintext format.

Anthrobotics

Anthrobotics is the science of developing and studying robots that are either entirely or in some way human-like. The term anthrobotics was originally coined by Mark Rosheim in a paper entitled "Design of An Omnidirectional Arm" presented at the IEEE International Conference on Robotics and Automation, May 13–18, 1990, pp. 2162–2167. Rosheim says he derived the term from "...Anthropomorphic and Robotics to distinguish the new generation of dexterous robots from its simple industrial robot forebears." The word gained wider recognition as a result of its use in the title of Rosheim's subsequent book Robot Evolution: The Development of Anthrobotics, which focussed on facsimiles of human physical and psychological skills and attributes. However, a wider definition of the term anthrobotics has been proposed, in which the meaning is derived from anthropology rather than anthropomorphic. This usage includes robots that respond to input in a human-like fashion, rather than simply mimicking human actions, thus theoretically being able to respond more flexibly or to adapt to unforeseen circumstances. This expanded definition also encompasses robots that are situated in social environments with the ability to respond to those environments appropriately, such as insect robots, robotic pets, and the like. Anthrobotics is now taught at some universities, encouraging students not only to design and build robots for environments beyond current industrial applications, but also to speculate on the future of robotics that are embedded in the world at large, as mobile phones and computers are today. In 2016 philosopher Luis de Miranda created the Anthrobotics Cluster at the University of Edinburgh "a platform of cross-disciplinary research that seeks to investigate some of the biggest questions that will need to be answered" on the relationship between humans, robots and intelligent systems and "a think tank on the social spread of robotics, and also how automation is part of the definition of what humans have always been". to explore the symbiotic relationship between humans and automated protocols.