Cloud-Based Secure File Transfer is a managed or hosted file transfer service that provides cloud storage that can be accessed via SSH File Transfer Protocol (SFTP). These services allow secure, reliable file transfers while offering the scalability, redundancy, and high availability of cloud infrastructure. == Technical overview == The evolution of file transfer protocols began with File Transfer Protocol (FTP) and SSH File Transfer Protocol (SFTP). SFTP offered enhanced security through the use of SSH (Secure Shell) encryption, which addressed many of the security concerns associated with traditional FTP. Over time, as businesses increasingly adopted cloud infrastructure, the demand for services that integrate secure file transfer with cloud storage led to the rise of Cloud-Based Secure File Transfer services. These services combine the benefits of secure, encrypted file transfer with the scalability and flexibility of cloud-based storage systems. Traditional on-premises SFTP typically involves setting up and managing physical or virtual servers to handle file transfers. In contrast, Cloud-Based Secure File Transfer utilizes managed cloud infrastructure, such as AWS EC2, Azure VMs, or Google Cloud, to automate scaling, ensure redundancy, and provide high availability. These cloud environments can be configured to automatically scale with demand, enabling businesses to handle large volumes of data transfers without the need for extensive physical hardware. == Features == Scalability and availability: Cloud-Based Secure File Transfer services are inherently scalable, with features like load balancing, multi-region deployments, and auto-scaling groups that adjust resources in response to traffic spikes. This ensures that the system can handle varying workloads and provides continuous availability, even during high-demand periods. Cost-effectiveness: By eliminating the need for physical infrastructure and reducing ongoing server maintenance costs, Cloud-Based Secure File Transfer services offer significant cost savings compared to traditional on-premises services. Cloud providers typically offer pay-as-you-go pricing models, where users only pay for the resources they use, further optimizing costs. Security and compliance: Cloud-Based Secure File Transfer products offer strong security measures, including end-to-end encryption, key management, detailed logging, and auditing. These services are often compliant with industry regulations such as HIPAA (Health Insurance Portability and Accountability Act), GDPR (General Data Protection Regulation), and SOC 2 (System and Organization Controls), ensuring that data transfers meet necessary security and privacy standards. == Cloud-Based Secure File Transfer providers == == Uses == Cloud-Based Secure File Transfer is used across various industries to securely transfer sensitive data and integrate into business workflows. In healthcare, Cloud-Based Secure File Transfer is essential for securely transferring electronic Protected Health Information (ePHI), ensuring compliance with regulations like HIPAA. In financial institutions, it is used to protect sensitive financial data during transfer, maintaining privacy and security. Data analytics also benefits from Cloud-Based Secure File Transfer, offering a secure and efficient method for transferring large datasets between systems or partners. Technically, Cloud-Based Secure File Transfer is often integrated into enterprise workflows through automated file transfers, using scripting or APIs. It also plays a key role in cloud backup and disaster recovery, ensuring that files are securely transferred and stored in cloud environments, which supports business continuity. However, businesses must address certain implementation challenges. Despite its secure design, Cloud-Based Secure File Transfer is not immune to risks such as misconfigured SSH keys, improper access control, or inadequate encryption. Regular security audits and careful configuration management are necessary to minimize the risk of data breaches. Additionally, integrating Cloud-Based Secure File Transfer with legacy systems can present challenges, such as incompatible APIs or outdated authentication methods. == Comparisons with related technologies == Cloud-Based Secure File Transfer differs from traditional SFTP primarily in its deployment and management model. Traditional SFTP services are typically hosted on-premises or on virtual servers, requiring manual configuration, ongoing infrastructure maintenance, and security management by in-house IT teams. In contrast, Cloud-Based Secure File Transfer is offered as a Software-as-a-Service (SaaS) service, reducing infrastructure overhead by eliminating the need for dedicated hardware or virtual machines. This model simplifies management through centralized web-based interfaces, automated updates, and built-in scalability. While Cloud-Based Secure File Transfer is focused on providing secure file transfers over the SFTP protocol, Managed File Transfer (MFT) platforms generally support a broader range of protocols, including FTP, FTPS, HTTP/S, and AS2. MFT services often include advanced features such as end-to-end encryption, extensive automation, compliance reporting, and integration with enterprise systems. Cloud-Based Secure File Transfer services may offer some of these features but are typically more lightweight and streamlined, targeting organizations seeking a secure and scalable alternative to traditional SFTP without the full suite of MFT capabilities. As such, Cloud-Based Secure File Transfer can be seen as a specialized subset within the broader managed file transfer ecosystem.
Superintelligence ban
Superintelligence ban refers to proposed legal, ethical, or policy measures intended to restrict or prohibit the development of artificial superintelligence, AI systems that would surpass human cognitive abilities in nearly all domains. The idea arises from concerns that such systems could become uncontrollable, potentially posing existential threats to humanity or causing severe social and economic disruption. == Background == The concept of limiting or banning superintelligence research has roots in early 21st-century debates on artificial general intelligence (AGI) safety. Thinkers such as Nick Bostrom and Eliezer Yudkowsky warned that self-improving AI could rapidly exceed human oversight. As advanced models like large-scale language models and autonomous agents began demonstrating complex reasoning abilities, policymakers and ethicists increasingly discussed the need for legal constraints on the creation of systems capable of recursive self-improvement. In October 2025, the Future of Life Institute published a statement calling for "a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in." This statement was signed by various public personalities, such as Richard Branson and Steve Wozniak, and AI experts, such as Yoshua Bengio and Geoffrey Hinton. == Rationale == Supporters of a superintelligence ban argue that once AI systems surpass human intelligence, traditional containment, alignment, and control methods may fail. They contend that even limited experimentation with such systems could lead to irreversible outcomes, including loss of human decision-making power or unintended global harm. Some propose international treaties modeled after the nuclear non-proliferation framework to prevent a competitive AI arms race. Opponents argue that a ban would be difficult to define and enforce, given the lack of a precise threshold distinguishing advanced AGI from superintelligence. They also warn that excessive restriction could slow scientific progress, hinder beneficial automation, and encourage unregulated underground research. == Global discussion == Although no government has enacted an explicit superintelligence ban, the idea has been debated within the European Union, United Nations, and several independent AI safety organizations. The Future of Life Institute, Center for AI Safety, and other organizations have called for international cooperation to manage risks associated with the pursuit of superintelligent systems. In 2024 and 2025, proposals for a temporary moratorium on frontier AI research were circulated among major technology firms and research institutes, reflecting growing public concern over the trajectory of AI capabilities.
Natarajan dimension
In the theory of Probably Approximately Correct Machine Learning, the Natarajan dimension characterizes the complexity of learning a set of functions, generalizing from the Vapnik–Chervonenkis dimension for boolean functions to multi-class functions. Originally introduced as the Generalized Dimension by Natarajan, it was subsequently renamed the Natarajan Dimension by Haussler and Long. == Definition == Let H {\displaystyle H} be a set of functions from a set X {\displaystyle X} to a set Y {\displaystyle Y} . H {\displaystyle H} shatters a set C ⊂ X {\displaystyle C\subset X} if there exist two functions f 0 , f 1 ∈ H {\displaystyle f_{0},f_{1}\in H} such that For every x ∈ C , f 0 ( x ) ≠ f 1 ( x ) {\displaystyle x\in C,f_{0}(x)\neq f_{1}(x)} . For every B ⊂ C {\displaystyle B\subset C} , there exists a function h ∈ H {\displaystyle h\in H} such that for all x ∈ B , h ( x ) = f 0 ( x ) {\displaystyle x\in B,h(x)=f_{0}(x)} and for all x ∈ C − B , h ( x ) = f 1 ( x ) {\displaystyle x\in C-B,h(x)=f_{1}(x)} . The Natarajan dimension of H is the maximal cardinality of a set shattered by H {\displaystyle H} . It is easy to see that if | Y | = 2 {\displaystyle |Y|=2} , the Natarajan dimension collapses to the Vapnik–Chervonenkis dimension. Shalev-Shwartz and Ben-David present comprehensive material on multi-class learning and the Natarajan dimension, including uniform convergence and learnability. Recently, Cohen et al showed that the Natarajan dimension is the dominant term governing agnostic multi-class PAC learnability.
Word2vec
Word2vec is a technique in natural language processing for obtaining vector representations of words. These vectors capture information about the meaning of the word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. Word2vec was developed by Tomáš Mikolov, Kai Chen, Greg Corrado, Ilya Sutskever and Jeff Dean at Google, and published in 2013. Word2vec represents a word as a high-dimension vector of numbers which capture relationships between words. In particular, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity. This indicates the level of semantic similarity between the words, so for example the vectors for walk and ran are nearby, as are those for "but" and "however", and "Berlin" and "Germany". == Approach == Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a mapping of the set of words to a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a vector in the space. Word2vec can use either of two model architectures to produce these distributed representations of words: continuous bag of words (CBOW) or continuously sliding skip-gram. In both architectures, word2vec considers both individual words and a sliding context window as it iterates over the corpus. The CBOW can be viewed as a 'fill in the blank' task, where the word embedding represents the way the word influences the relative probabilities of other words in the context window. Words which are semantically similar should influence these probabilities in similar ways, because semantically similar words should be used in similar contexts. The order of context words does not influence prediction (bag of words assumption). In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. According to the authors' note, CBOW is faster while skip-gram does a better job for infrequent words. After the model is trained, the learned word embeddings are positioned in the vector space such that words that share common contexts in the corpus — that is, words that are semantically and syntactically similar — are located close to one another in the space. More dissimilar words are located farther from one another in the space. == Mathematical details == This section is based on expositions. A corpus is a sequence of words. Both CBOW and skip-gram are methods to learn one vector per word appearing in the corpus. Let V {\displaystyle V} ("vocabulary") be the set of all words appearing in the corpus C {\displaystyle C} . Our goal is to learn one vector v w ∈ R d {\displaystyle v_{w}\in \mathbb {R} ^{d}} for each word w ∈ V {\displaystyle w\in V} . The idea of skip-gram is that the vector of a word should be close to the vector of each of its neighbors. The idea of CBOW is that the vector-sum of a word's neighbors should be close to the vector of the word. === Continuous bag-of-words (CBOW) === The idea of CBOW is to represent each word with a vector, such that it is possible to predict a word using the sum of the vectors of its neighbors. Specifically, for each word w i {\displaystyle w_{i}} in the corpus, the one-hot encoding of the word is used as the input to the neural network. The output of the neural network is a probability distribution over the dictionary, representing a prediction of individual words in the neighborhood of w i {\displaystyle w_{i}} . The objective of training is to maximize ∑ i ln Pr ( w i ∣ w i + j : j ∈ N ) {\displaystyle \sum _{i}\ln \Pr(w_{i}\mid w_{i+j}\colon j\in N)} where N {\displaystyle N} is a set of (non-zero) indices representing the relative locations of nearby words considered to be in w i {\displaystyle w_{i}} 's neighborhood. For example, if we want each word in the corpus to be predicted by every other word in a small span of 4 words. The set of relative indexes of neighbor words will be: N = { − 2 , − 1 , + 1 , + 2 } {\displaystyle N=\{-2,-1,+1,+2\}} , and the objective is to maximize ∑ i ln Pr ( w i ∣ w i − 2 , w i − 1 , w i + 1 , w i + 2 ) {\displaystyle \sum _{i}\ln \Pr(w_{i}\mid w_{i-2},w_{i-1},w_{i+1},w_{i+2})} . In standard bag-of-words, a word's context is represented by a word-count (aka a word histogram) of its neighboring words. For example, the "sat" in "the cat sat on the mat" is represented as {"the": 2, "cat": 1, "on": 1}. Note that the last word "mat" is not used to represent "sat", because it is outside the neighborhood N = { − 2 , − 1 , + 1 , + 2 } {\displaystyle N=\{-2,-1,+1,+2\}} . In continuous bag-of-words, the histogram is multiplied by a matrix V {\displaystyle V} to obtain a continuous representation of the word's context. The matrix V {\displaystyle V} is also called a dictionary. Its columns are the word vectors. It has D {\displaystyle D} columns, where D {\displaystyle D} is the size of the dictionary. Let d {\displaystyle d} be the length of each word vector. We have V ∈ R d × D {\displaystyle V\in \mathbb {R} ^{d\times D}} . For example, multiplying the word histogram {"the": 2, "cat": 1, "on": 1} with V {\displaystyle V} , we obtain 2 v the + v cat + v on {\displaystyle 2v_{\text{the}}+v_{\text{cat}}+v_{\text{on}}} . This is then multiplied with another matrix V ′ {\displaystyle V'} of shape R D × d {\displaystyle \mathbb {R} ^{D\times d}} . Each row of it is a word vector v ′ {\displaystyle v'} . This results in a vector of length D {\displaystyle D} , one entry per dictionary entry. Then, apply the softmax to obtain a probability distribution over the dictionary. This system can be visualized as a neural network, similar in spirit to an autoencoder, of architecture linear-linear-softmax, as depicted in the diagram. The system is trained by gradient descent to minimize the cross-entropy loss. In full formula, the cross-entropy loss is: − ∑ i ln e v w i ′ ⋅ ( ∑ j ∈ N v w j + i ) ∑ w ′ e v w ′ ′ ⋅ ( ∑ j ∈ N v w j + i ) {\displaystyle -\sum _{i}\ln {\frac {e^{v_{w_{i}}'\cdot (\sum _{j\in N}v_{w_{j+i}})}}{\sum _{w'}e^{v_{w'}'\cdot (\sum _{j\in N}v_{w_{j+i}})}}}} where the outer summation ∑ i {\displaystyle \sum _{i}} is over the words in a corpus, the quantity ∑ j ∈ N v w j + i {\displaystyle \sum _{j\in N}v_{w_{j+i}}} is the sum of a word's neighbors' vectors, etc. Once such a system is trained, we have two trained matrices V , V ′ {\displaystyle V,V'} . Either the column vectors of V {\displaystyle V} or the row vectors of V ′ {\displaystyle V'} can serve as the dictionary. For example, the word "sat" can be represented as either the "sat"-th column of V {\displaystyle V} or the "sat"-th row of V ′ {\displaystyle V'} . It is also possible to simply define V ′ = V ⊤ {\displaystyle V'=V^{\top }} , in which case there would no longer be a choice. === Skip-gram === The idea of skip-gram is to represent each word with a vector, such that it is possible to predict the vectors of its neighbors using the vector of a word. The architecture is still linear-linear-softmax, the same as CBOW, but the input and the output are switched. Specifically, for each word w i {\displaystyle w_{i}} in the corpus, the one-hot encoding of the word is used as the input to the neural network. The output of the neural network is a probability distribution over the dictionary, representing a prediction of individual words in the neighborhood of w i {\displaystyle w_{i}} . The objective of training is to maximize ∑ i ∑ j ∈ N ln Pr ( w j + i ∣ w i ) {\displaystyle \sum _{i}\sum _{j\in N}\ln \Pr(w_{j+i}\mid w_{i})} . In full formula, the loss function is − ∑ i ∑ j ∈ N ln e v w j + i ′ ⋅ v w i ∑ w ′ e v w ′ ′ ⋅ v w i {\displaystyle -\sum _{i}\sum _{j\in N}\ln {\frac {e^{v_{w_{j+i}}'\cdot v_{w_{i}}}}{\sum _{w'}e^{v_{w'}'\cdot v_{w_{i}}}}}} Same as CBOW, once such a system is trained, we have two trained matrices V , V ′ {\displaystyle V,V'} . Either the column vectors of V {\displaystyle V} or the row vectors of V ′ {\displaystyle V'} can serve as the dictionary. It is also possible to simply define V ′ = V ⊤ {\displaystyle V'=V^{\top }} , in which case there would no longer be a choice. Essentially, skip-gram and CBOW are exactly the same in architecture. They only differ in the objective function during training. == History == During the 1980s, there were some early attempts at using neural networks to represent words and concepts as vectors. In 2010, Tomáš Mikolov (then at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden
Facial recognition system
A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image. Development on similar systems began in the 1960s as a form of computer application. Since their inception, facial recognition systems have seen wider uses in recent times on smartphones and in other forms of technology, such as robotics. Because computerized facial recognition involves the measurement of a human's physiological characteristics, facial recognition systems are categorized as biometrics. Although the accuracy of facial recognition systems as a biometric technology is lower than iris recognition, fingerprint image acquisition, palm recognition or voice recognition, it is widely adopted due to its contactless process. Facial recognition systems have been deployed in advanced human–computer interaction, video surveillance, law enforcement, passenger screening, decisions on employment and housing, and automatic indexing of images. Facial recognition systems are employed throughout the world today by governments and private companies. Their effectiveness varies, and some systems have previously been scrapped because of their ineffectiveness. The use of facial recognition systems has also raised controversy, with claims that the systems violate citizens' privacy, commonly make incorrect identifications, encourage gender norms and racial profiling, and do not protect important biometric data. The appearance of synthetic media such as deepfakes has also raised concerns about its security. These claims have led to the ban of facial recognition systems in several cities in the United States. Growing societal concerns led social networking company Meta Platforms to shut down its Facebook facial recognition system in 2021, deleting the face-scan data of more than one billion users. The change represented one of the largest shifts in facial recognition usage in the technology's history. IBM also stopped offering facial recognition technology due to similar concerns. == History of facial recognition technology == Automated facial recognition was pioneered in the 1960s by Woody Bledsoe, Helen Chan Wolf, and Charles Bisson, whose work focused on teaching computers to recognize human faces. Their early facial recognition project was dubbed "man-machine" because a human first needed to establish the coordinates of facial features in a photograph before they could be used by a computer for recognition. Using a graphics tablet, a human would pinpoint facial features coordinates, such as the pupil centers, the inside and outside corners of eyes, and the widows peak in the hairline. The coordinates were used to calculate 20 individual distances, including the width of the mouth and of the eyes. A human could process about 40 pictures an hour, building a database of these computed distances. A computer would then automatically compare the distances for each photograph, calculate the difference between the distances, and return the closed records as a possible match. In 1970, Takeo Kanade publicly demonstrated a face-matching system that located anatomical features such as the chin and calculated the distance ratio between facial features without human intervention. Later tests revealed that the system could not always reliably identify facial features. Nonetheless, interest in the subject grew and in 1977 Kanade published the first detailed book on facial recognition technology. In 1993, the Defense Advanced Research Project Agency (DARPA) and the Army Research Laboratory (ARL) established the face recognition technology program FERET to develop "automatic face recognition capabilities" that could be employed in a productive real life environment "to assist security, intelligence, and law enforcement personnel in the performance of their duties." Face recognition systems that had been trialled in research labs were evaluated. The FERET tests found that while the performance of existing automated facial recognition systems varied, a handful of existing methods could viably be used to recognize faces in still images taken in a controlled environment. The FERET tests spawned three US companies that sold automated facial recognition systems. Vision Corporation and Miros Inc were founded in 1994, by researchers who used the results of the FERET tests as a selling point. Viisage Technology was established by an identification card defense contractor in 1996 to commercially exploit the rights to the facial recognition algorithm developed by Alex Pentland at MIT. Following the 1993 FERET face-recognition vendor test, the Department of Motor Vehicles (DMV) offices in West Virginia and New Mexico became the first DMV offices to use automated facial recognition systems to prevent people from obtaining multiple driving licenses using different names. Driver's licenses in the United States were at that point a commonly accepted form of photo identification. DMV offices across the United States were undergoing a technological upgrade and were in the process of establishing databases of digital ID photographs. This enabled DMV offices to deploy the facial recognition systems on the market to search photographs for new driving licenses against the existing DMV database. DMV offices became one of the first major markets for automated facial recognition technology and introduced US citizens to facial recognition as a standard method of identification. The increase of the US prison population in the 1990s prompted U.S. states to established connected and automated identification systems that incorporated digital biometric databases, in some instances this included facial recognition. In 1999, Minnesota incorporated the facial recognition system FaceIT by Visionics into a mug shot booking system that allowed police, judges and court officers to track criminals across the state. Until the 1990s, facial recognition systems were developed primarily by using photographic portraits of human faces. Research on face recognition to reliably locate a face in an image that contains other objects gained traction in the early 1990s with the principal component analysis (PCA). The PCA method of face detection is also known as Eigenface and was developed by Matthew Turk and Alex Pentland. Turk and Pentland combined the conceptual approach of the Karhunen–Loève theorem and factor analysis, to develop a linear model. Eigenfaces are determined based on global and orthogonal features in human faces. A human face is calculated as a weighted combination of a number of Eigenfaces. Because few Eigenfaces were used to encode human faces of a given population, Turk and Pentland's PCA face detection method greatly reduced the amount of data that had to be processed to detect a face. Pentland in 1994 defined Eigenface features, including eigen eyes, eigen mouths and eigen noses, to advance the use of PCA in facial recognition. In 1997, the PCA Eigenface method of face recognition was improved upon using linear discriminant analysis (LDA) to produce Fisherfaces. LDA Fisherfaces became dominantly used in PCA feature based face recognition. While Eigenfaces were also used for face reconstruction. In these approaches no global structure of the face is calculated which links the facial features or parts. Purely feature based approaches to facial recognition were overtaken in the late 1990s by the Bochum system, which used Gabor filter to record the face features and computed a grid of the face structure to link the features. Christoph von der Malsburg and his research team at the University of Bochum developed Elastic Bunch Graph Matching in the mid-1990s to extract a face out of an image using skin segmentation. By 1997, the face detection method developed by Malsburg outperformed most other facial detection systems on the market. The so-called "Bochum system" of face detection was sold commercially on the market as ZN-Face to operators of airports and other busy locations. The software was "robust enough to make identifications from less-than-perfect face views. It can also often see through such impediments to identification as mustaches, beards, changed hairstyles and glasses—even sunglasses". Real-time face detection in video footage became possible in 2001 with the Viola–Jones object detection framework for faces. Paul Viola and Michael Jones combined their face detection method with the Haar-like feature approach to object recognition in digital images to launch AdaBoost, the first real-time frontal-view face detector. By 2015, the Viola–Jones algorithm had been implemented using small low power detectors on handheld devices and embedded systems. Therefore, the Viola–Jones algorithm has not only broadened the practical application of face recognition systems but
Beauty.AI
Beauty.AI is a mobile beauty pageant for humans and a contest for programmers developing algorithms for evaluating human appearance. The mobile app and website created by Youth Laboratories that uses artificial intelligence technology to evaluate people's external appearance through certain algorithms, such as symmetry, facial blemishes, wrinkles, estimated age and age appearance, and comparisons to actors and models. The Beauty.AI 2.0 contest caused great concern over important ethical issues with deep neural networks such as age, race and gender bias and lead to the creation of the Diversity.AI think tank dedicated to developing new methods for uncovering and managing bias in artificially intelligent systems. Beauty.AI was also an attempt to find approaches on how machines can perceive human face through evaluating particular features, commonly associated with health and beauty. == Concept == The Beauty.AI app was created by Youth Laboratories, a company based out of Russia and Hong Kong that focuses on facial skin analytics. The bioinformation company Insilico Medicine assists in the Beauty.AI app by testing its deep learning techniques to the app. One goal of the app is to reduce the need for human and animal testing as well as improving people's overall health. Its first contest was started in December 2016, and the results were announced in August 2016. More than 60,000 people submitted entries into the contest. The mobile app uses artificial intelligence technology to inspect photographs for certain facial features in order to both determine a person's beauty through artificial means by multiple robots. Part of the Beauty.AI app's purpose is to collect visual and anecdotal data to improve its creator's Youth Laboratories skin analyst skills. == Accusations of racism == There were a total of 44 individuals from different age groups and genders judged as the most attractive, with 37 white entrants, six Asian entrants, and one dark-skinned entrant. The app has received criticism from social justice advocates and computer science professionals. However, Alex Zhavoronkov, PhD, chief science officer of Youth Laboratories and chief technology officer Konstantin Kiselev, both for Youth Laboratories, noted that a lack of data may have contributed to these results. Also, Kiselev added that another issue was that approximately 75% of entrants were white Europeans, whereas only 7% and 1% were from India and Africa, respectively. Kiselev stated that they would work on doing more and better outreach to these areas to improve in this area. Despite this, it was said by Dr. Zhavoronkov that the AI would discard photos of dark-skinned people if the lighting is too poor. Dr. Zhavoronkov vowed to weed out the issues for the next beauty pageant and to try to avoid a similar controversy in the future.
Weka (software)
Waikato Environment for Knowledge Analysis (Weka) is a collection of machine learning and data analysis free software licensed under the GNU General Public License. It was developed at the University of Waikato, New Zealand, and is the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques". == Description == Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. Advantages of Weka include: Free availability under the GNU General Public License. Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform. A comprehensive collection of data preprocessing and modeling techniques. Ease of use due to its graphical user interfaces. Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. Input to Weka is expected to be formatted according the Attribute-Relational File Format and with the filename bearing the .arff extension. All of Weka's techniques are predicated on the assumption that the data is available as one flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query. Weka provides access to deep learning with Deeplearning4j. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka. Another important area that is currently not covered by the algorithms included in the Weka distribution is sequence modeling. == Extension packages == In version 3.7.2, a package manager was added to allow the easier installation of extension packages. Some functionality that used to be included with Weka prior to this version has since been moved into such extension packages, but this change also makes it easier for others to contribute extensions to Weka and to maintain the software, as this modular architecture allows independent updates of the Weka core and individual extensions. == History == In 1993, the University of Waikato in New Zealand began development of the original version of Weka, which became a mix of Tcl/Tk, C, and makefiles. In 1997, the decision was made to redevelop Weka from scratch in Java, including implementations of modeling algorithms. In 2005, Weka received the SIGKDD Data Mining and Knowledge Discovery Service Award. In 2006, Pentaho Corporation acquired an exclusive licence to use Weka for business intelligence. It forms the data mining and predictive analytics component of the Pentaho business intelligence suite. Pentaho has since been acquired by Hitachi Vantara, and Weka now underpins the PMI (Plugin for Machine Intelligence) open source component. == Related tools == Auto-WEKA is an automated machine learning system for Weka. Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) is a similar project to Weka with a focus on cluster analysis, i.e., unsupervised methods. H2O.ai is an open-source data science and machine learning platform KNIME is a machine learning and data mining software implemented in Java. Massive Online Analysis (MOA) is an open-source project for large scale mining of data streams, also developed at the University of Waikato in New Zealand. Neural Designer is a data mining software based on deep learning techniques written in C++. Orange is a similar open-source project for data mining, machine learning and visualization based on scikit-learn. RapidMiner is a commercial machine learning framework implemented in Java which integrates Weka. scikit-learn is a popular machine learning library in Python.