AI For Np Students

AI For Np Students — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • List of data science software

    This is a list of software and platforms used in data science, including programming languages, programming environments, machine learning frameworks, data engineering tools, statistical software, data analysis and plotting tools, MLOps systems, and more.

    == Programming languages ==

    == Development environments ==

    These interactive notebooks, IDEs, and platforms provide specialised development environments.

    Apache Zeppelin Architect – Eclipse (software) CoCalc Dataiku Data Science Studio FreeMat GNU Octave Google Colab DataSpell Jupyter Notebook / JupyterLab Kaggle Notebooks MATLAB O-Matrix PyCharm RStudio SAS (software) and SAS Studio Spyder Visual Studio Code

    == Machine and deep learning software ==

    Machine learning and deep learning tools support development in those fields.

    == Data engineering ==

    Examples of data engineering tools:

    Apache Airflow Apache Flink Apache Hadoop Apache Kafka Apache NiFi Apache Spark Dask Data build tool (dbt)

    == Data mining ==

    Examples of data mining tools.

    === Free and open-source ===

    === Proprietary ===

    == Database management ==

    === List of RDBMS ===

    ==== Proprietary ====

    == Data warehouses ==

    Data warehouse environments include:

    Amazon Redshift Snowflake Google BigQuery Microsoft Azure Synapse Teradata Vertica

    == Data lakes ==

    Data lake environments include:

    Apache Hadoop Cloudera Databricks Delta Lake Amazon S3 Google Cloud Storage Azure Data Lake

    == Algorithms ==

    • Apriori algorithm – frequent itemset mining and association rule learning in market basket analysis
    • Backpropagation – algorithm for training artificial neural networks using gradient descent
    • Decision Trees – tree-based algorithm for classification and regression
    • Expectation–maximization algorithm – iterative procedure for maximum likelihood estimation with latent variables
    • Gradient descent – iterative optimization algorithm for minimizing a loss function
    • ID3 algorithm – used to generate a decision tree from a dataset
    • K-Means – clustering algorithm based on minimizing within-cluster distances
    • K-Nearest Neighbors (KNN) – instance-based learning and classification method
    • Linear regression – estimation method for predicting a dependent variable based on independent variables
    • Logistic regression – classification algorithm for predicting a binary outcome
    • Naive Bayes – probabilistic classifier based on Bayes' theorem
    • Ordinary least squares – estimation method for parameters in linear regression
    • PageRank – graph-based algorithm for link analysis and search ranking
    • Principal component analysis – technique to reduce high-dimensional data while preserving variance
    • Q-learning – reinforcement learning algorithm for learning optimal actions
    • Random forest – ensemble of decision trees for improved classification or regression
    • Sequential minimal optimization – solver for training support vector machines
    • Stochastic gradient descent – randomized variant of gradient descent for large-scale machine learning
    • Support Vector Machines (SVM) – algorithm for finding a hyperplane to separate classes

    == Statistical software ==

    === Open-source ===

    === Public domain ===

    CSPro Dataplot Epi Map X-13ARIMA-SEATS

    === Freeware ===

    BV4.1 MINUIT WinBUGS Winpepi

    === Proprietary ===

    == Data processing ==

    Tools for data processing and analysis:

    == Data and information visualization ==

    Software for data visualization:

    == Plotting software ==

    Software for plotting data to support processing and visualise results.

    == Maps and geospatial visualization ==

    ArcGIS Carto Epi Map GeoDA Google Earth Engine Leaflet Mapbox MountainsMap QGIS

    == Machine learning ==

    MLOps and model deployment:

    BentoML Data Version Control (DVC) Kubeflow MLflow Seldon Core Streamlit TensorFlow Serving Weights & Biases

    == Data repositories ==

    • Kaggle – platform for data science competitions, datasets, and notebooks.
    • OpenML – collaborative platform for sharing datasets, algorithms, and experiments.
    • University of California, Irvine Machine Learning Repository
    • Zenodo – open-access repository supported by CERN and the EU.

    == Educational data science software ==

    • Kaggle – online platform for data science education, competitions, datasets, and collaborative learning.
    • KNIME – open-source data analytics platform used for teaching data science, machine learning, and workflow-based analysis.
    • RapidMiner – used in academic research and education for data mining and machine learning.
    • Statistics Online Computational Resource (SOCR) – online tools and instructional resources for statistics education.
    • Tanagra (machine learning) – data mining software developed for research and teaching purposes.
    • TinkerPlots – explore and analyze data through visual modeling.

  • Online OS

    The Online Operating System was a fully multilingual, free-to-use web desktop written in JavaScript using Ajax. It was a Windows-based desktop environment with open-source applications and system utilities, built upon the reBOX web application framework by iCUBE Network Solutions, an Austrian company located in Vienna.

    == About the project ==

    OOS.cc, short for Online Operating System, was a web application platform that mimicked the look and feel of classic desktop operating systems such as Microsoft Windows, Mac OS X or KDE. It consisted of various open-source applications built upon the reBOX web application framework. Because applications could be executed in an integrated and parallel way, the OOS could be considered a web desktop or webtop. It provided basic services such as a GUI, a virtual file system, access control management and facilities to develop and deploy applications online. As the Online Operating System was executed within a web browser, it was not a real operating system but rather a portal to various web applications, offering high usability and flexibility. The project was partly funded by grants from the Internetprivatstiftung Austria (IPA). As of 1 August 2008, almost 20,000 users had joined the oos.cc community and were using the offered features and applications.

    == History ==

    Development of the web desktop was started by iCUBE Network Solutions in 2005, followed by the first beta releases in 2006. Together with YouOS and eyeOS, it can therefore be considered one of the first publicly available systems of its kind. The first full version, including core-level multi-language support, the file system and a basic set of applications, was released to the public in March 2007 on the occasion of a national exhibition (ITnT Austria) and left beta half a year later, in October 2007. The first release considered stable (1.0.0) was published in July 2007.
    The project itself and the applications it contained received several national innovation awards and gained attention mainly due to the comprehensive approach taken. OOS.cc started as a national project. The full platform, including all offered applications, is currently available in three languages (German, English and Spanish) and is receiving increasing coverage around the world. The current version is 1.3.01, dated 1 August 2008.

    == Technical overview ==

    The project is written entirely in JavaScript, exclusively using DHTML techniques, so that it runs in any web browser without additional software installation. The system implements a modern web application model, making extensive use of Ajax for communication between client components and the Java server backend in an exclusively asynchronous manner. The aim is to offer users interaction behavior that follows the desktop metaphor, which is the main idea of any web desktop. Also typical for this sort of web application is the broad use of JavaScript-on-demand techniques, cutting the complete project source into pieces and loading them when needed. On this technical basis, reBOX was the framework library on which all applications in oos.cc were built. It is a flexible and extensible API, including a GUI widget set, communication mechanisms and server services offering general and framework-specific services. The Online Operating System itself consisted of a basic framework that was able to launch any JavaScript application using the reBOX library. The user interface was based on the behavior of the Windows desktop, with a start menu, a task bar and a desktop background. All applications ran in this environment. On the server side, Java-based web services served the client processes and provided data from the relational database in the backend.
    oos.cc also provided an integrated development environment called Developer Suite, which allowed the community to build its own applications for the desktop environment based on reBOX (see development section below).

    == License ==

    All applications available in oos.cc were open source under the European Union Public Licence (EUPL). The reBOX development toolkit is free to use for developing any applications for the webtop.

    == Features ==

    As mentioned above, all applications published on oos.cc are open source under the EUPL, and can be "installed" or "uninstalled" according to the user's preferences. Besides global services like multi-language support and global theme support, as well as some minor tools and games, oos.cc offered four major services that could be used completely free of charge.

    • Integrated and fully flexible file storage (1 GB per user)
      • HTTP as well as FTP file transfer from and to the local file system
      • User-based file shares within the oos community
      • WebDAV access
      • Document management (including version control and file locking mechanisms)
    • Image publishing, organization and post-processing
      • A free subdomain (user.oos.cc) for web or image publishing, directly integrated in the desktop
    • Groupware applications, including free mail, fetchmail and contact management
    • An integrated development environment where oos applications can be created directly from within the system (see development section below)

    The next releases were planned to focus on an extensive security and privacy suite, dealing with challenges like anonymous communication (browsing as well as temporary mail addresses) and offering encrypted password and file storage and connectivity services. Since its initial stable release, OOS.cc could be accessed using HTTPS to ensure secure communication.

    == Limitations and drawbacks ==

    • Limited number of applications: no commercial applications can be hosted. Only reviewed applications are published
    • No processing of popular office formats (.doc, .odt, etc.)
    • Limited language support: only English, German and Spanish
    • Dependence on foreign infrastructure: no possibility to extend storage, no additional or guaranteed bandwidth, etc.

    == Development ==

    From the beginning, one of the team's key focuses was to offer a very flexible and comprehensive API that can be used to develop not only custom applications within oos.cc, but also stand-alone web applications, or to integrate single components into existing websites. By decoupling development from web-related "problems", the reBOX API allows web applications to be developed in a similar fashion to any Java program: elements can be positioned and can interact as in high-level object-oriented programming languages, without the developer having to deal with divs, browser-specific behavior or communication handling. The framework also offers multi-language and theme support for existing as well as newly created applications, allowing almost every aspect of the look and feel of the components to be changed according to the preferences of its users. To take advantage of this approach, one of the applications offered in the OOS was an integrated Development Suite, which allowed code to be written and executed directly, and hence new programs to be created, within the boundaries of the web computer. All applications on oos.cc were released as open source, so all existing programs could be imported, reviewed or changed and then locally deployed. Following this idea, every user was free to submit changed or newly created applications for inclusion in the globally offered application set. The last release offered features like auto-completion and an outline window.

  • Fabric Connect

    Fabric Connect, in computer networking usage, is the name used by Extreme Networks to market an extended implementation of the IEEE 802.1aq and IEEE 802.1ah-2008 standards. The Fabric Connect technology was originally developed by the Enterprise Solutions R&D department within Nortel Networks. In 2009, Avaya, Inc. acquired Nortel Networks Enterprise Business Solutions; this transaction included the Fabric Connect intellectual property together with all of the Ethernet Switching platforms that supported it. Subsequently, the Fabric Connect technology became part of the Extreme Networks portfolio by virtue of their 2017 purchase of the Avaya Networking business and assets. It was during the Avaya era that this technology was promoted as the lead element of the Virtual Enterprise Network Architecture (VENA).

    == Technologies ==

    === Fabric Connect ===

    Fabric Connect provides network-wide, end-to-end, multi-layer virtualization. A network virtualization capability based on an enhanced implementation of the IEEE 802.1aq Shortest Path Bridging (SPB) standard, Fabric Connect offers the ability to create a simplified network that can dynamically virtualize elements to efficiently provision and utilize resources, thus reducing the strain on the network and personnel. Extreme Networks base the Fabric Connect technology on the SPB standard, including support for RFC 6329, and have integrated IP Routing and IP Multicast support; this unified technology allows for the replacement of multiple conventional protocols such as Spanning Tree, RIP and/or OSPF, ECMP, and PIM.

    === Fabric Attach ===

    An adjunct to the Fabric Connect technology, Fabric Attach allows network operators to extend network virtualization directly into conventional wiring closets (using existing non-Fabric Ethernet switches) and automate the provisioning of devices to their appropriate virtual network.
    This is particularly relevant for the mass of unattended network end-points that are now appearing, such as IP phones, wireless access points, and IP cameras. Fabric Attach uses standardized protocols such as 802.1AB LLDP to exchange credentials and obtain provisioning information that allows "Client" switches to be automatically re-configured on the fly with parameters that let traffic flows map through to Fabric Connect edge switches (a "Backbone Edge Bridge" in the SPB definition) functioning as a Fabric Attach "Server" switch. This method is described by an IETF "Internet Draft", pending further standardization activity. Fabric Attach is typically used to automate wiring closet connectivity, but has the potential to be extended for use in the data center, with virtual machines being able to dynamically request VLAN/VSN (Virtual Service Network) assignment based upon application requirements.

    == Hardware products ==

    === Virtual Services Platform 9000 Series ===

    A range of modular chassis-based products, featuring a carrier-grade Linux operating system and designed for high-performance deployment scenarios that need to scale to multiple terabits of switching capacity. The products support 10 and 40 gigabit Ethernet connections and are designed eventually to support 100 gigabit Ethernet.

    === Virtual Services Platform 8000 Series ===

    A compact form-factor platform delivering high-density 10/40 gigabit Ethernet connectivity, and targeted at mid-market through to mid-size enterprise core switch applications.

    === Virtual Services Platform 7000 Series ===

    A range of high-end 10 gigabit Ethernet stackable switches that extend fabric-based networking to the data center top-of-rack. They support 40 gigabit Ethernet via the MDA slot.

    === Virtual Services Platform 4000 Series ===

    A range of high-end gigabit Ethernet stackable switches that extend fabric-based networking to branch and metro locations.
    === Ethernet Routing Switch 5000 Series ===

    A range of high-end gigabit Ethernet stackable switches that provide enterprise-class desktop features, including PoE, and offer 10 Gbit/s uplink connections. Each switch supports up to 144 Gbit/s of virtual backplane capacity, delivering up to 1.152 Tbit/s for a system of eight, creating a virtual backplane through a stacking configuration.

    === Ethernet Routing Switch 4000 Series ===

    A range of gigabit Ethernet stackable switches that provide enterprise-class desktop features, including PoE/PoE+, and offer 1/10 Gbit/s uplink connections. Each switch supports up to 48 Gbit/s of virtual backplane capacity, delivering up to 384 Gbit/s for a system of eight, creating a virtual backplane through a stacking configuration.

    === Ethernet Routing Switch 3500 Series ===

    These entry-level gigabit Ethernet stackable switches provide enterprise-class desktop features, including PoE/PoE+, and 1 Gbit/s uplink connections.

  • Dropbox Paper

    Dropbox Paper, or simply Paper, is a collaborative document-editing service developed by Dropbox. Originating from the company's acquisition of document collaboration company Hackpad in April 2014, Dropbox Paper was officially announced in October 2015 and launched in January 2017. It offers a web application, as well as mobile apps for Android and iOS. Dropbox Paper was described in the official announcement post as "a flexible workspace that brings people and ideas together. With Paper, teams can create, review, revise, manage, and organize — all in shared documents". Reception of Dropbox Paper has been mixed. Critics praised its collaboration functionality, including immediately available content, the ability to mention specific collaborators, assign tasks and write comments, as well as editing attribution and revision history. It received particular praise for its support for rich media from a variety of sources, with one reviewer noting that Paper's support for rich media exceeds the capabilities of most of its competitors. However, it was criticized for a lack of formatting options and editing features. While the user interface was liked for being minimal, reviewers cited the lack of a fixed formatting bar and missing features present in competitors' products as making Dropbox Paper seem like a "light" tool.

    == History ==

    Dropbox acquired document collaboration company Hackpad in April 2014. A year later, Dropbox launched a note-taking product, Dropbox Notes, in a beta testing phase. Dropbox Paper was officially announced on October 15, 2015, followed by an open beta and the release of mobile Android and iOS apps in August 2016. Dropbox Paper was officially released on January 30, 2017.

    == Reception ==

    In a comparison between Dropbox Paper and Evernote, PC World's Michael Ansaldo wrote that "With its emphasis on document creation, you might expect formatting to be front and center in Dropbox Paper. That's not the case."
Ansaldo noted the lack of a "fixed formatting toolbar as you'd find in Evernote or a word processor like Google Docs or Microsoft Word. Instead, the text editor appears as a floating ribbon only when you highlight selected text." The only formatting options available for emphasis were bolding, strikethrough, bulleted and numbered lists, and H1 and H2 tags. Users can also add links, convert text to checklists, and add comments. Ansaldo wrote that "Both Evernote and Dropbox Paper make it easy to add images to a document", but also noted that "Dropbox Paper doesn't support any image editing". Paper supports rich media, and users can "add rich content to your document just by pasting a link to the file. In addition to Dropbox, Paper supports media from a variety of popular services including YouTube, Spotify, Vimeo, SoundCloud, Facebook, and Google's productivity suite. Once the file appears, you can delete the link for a cleaner display." To start working with other people, Paper "allows you to invite people via email from within a document", with sharing options for who can view the link (anyone with the link or just the invited person), and action permissions (edit or only comment). Regarding collaboration, Ansaldo wrote that "Creative collaboration is Paper’s marquee feature, and it provides a variety of ways to work effectively with others in real time". Users can "make any content immediately visible and accessible to a specific collaborator with "@mentions"", and "You can also use @mentions to create and assign task lists within a document." Paper also "boasts essential collaboration tools including comments, editing attribution, and revision history." Writing for TechRadar, John Brandon wrote that Dropbox Paper "might be a 'light' tool for now without the extensive templates of Microsoft Office or the integration with other apps in the Zoho suite, but it does work well with the Dropbox storage service that's so popular with office workers these days." 
Kyle Wiggers of Digital Trends wrote that Paper is "all about minimizing distractions. Its interface is quite literally a big, blank canvas on which you tap out your agenda. You can organize notes by title and create to-do lists, but even basic formatting tools are obscured from view", noting Paper's "floating box above words and phrases highlighted by your cursor". Wiggers stated that "Paper is not a to-do organizer", but that it's "well suited to the purpose thanks to a bevy of labor-saving conveniences", highlighting that Paper "supports more media than most of its to-do and note-taking counterparts". He praised the collaboration tools, writing that they "are as extensive as you'd hope, and then some", citing its invitation system with permission controls, lists of changes and revision history, comment and chat support, and "perhaps best of all", the ability to assign tasks with a "@" mention. Business Insider's Alex Heath praised that "Paper's interface is spotless and friendly to write in. You don't feel overwhelmed with formatting options", but criticized the available features, writing that "Google Docs is much more full-featured in the formatting department, so Paper has some catching up to do if it wants to be on par with the competition". Writing for The Verge, Casey Newton praised Paper's handling of rich media, complimenting it for being "great", and added that "I imagine that creative types who work on teams will appreciate having rich media embedded in the documents they're working on rather than in a series of infinite tabs".

  • PANGU (software)

    PANGU (Planet and Asteroid Natural scene Generation Utility) is a computer graphics utility whose development was funded by ESA and carried out by the University of Dundee. It generates scenes of planets, moons, asteroids, spacecraft and rovers. The main purpose of the tool is to test and validate navigation techniques based on the processing of images coming from on-board sensors, such as a camera or imaging LIDAR on a planetary lander.

  • LiveChat

    LiveChat is AI customer service software with chatbot, online chat, help desk, and web analytics capabilities. LiveChat is used by over 76,000 companies. It was first launched in 2002 and is offered via a SaaS (software as a service) business model by Text. Organizations use LiveChat as a single point of contact to manage customer service and online sales activities with a single program.

    == Product ==

    LiveChat is proprietary software. LiveChat's website chat widget can be embedded on customers' websites as a small chat box, often displayed in the bottom right corner of the web browser. It can be used to conduct chats, share files and save transcripts. The agent application is used by company employees to respond to questions asked by the customers. It is available as a web-based application, as desktop applications, and as mobile apps. Web chat sessions can be initiated by the visiting customer, or by the agent, either manually or automatically by the LiveChat system when the visitor meets predefined criteria (e.g. searched keyword, time on website, encountered error). LiveChat's system attempts to identify the best prospects visiting a website based on data gathered from past purchasing decisions. Other features include real-time website traffic monitoring, a built-in ticketing system and analytics of agents' efficiency. LiveChat is available in 48 languages.

    == Research and reception ==

    Reviewing LiveChat's usefulness for online learning in 2020, psychologist Jaclyn Broadbent said "LiveChat occurs as a real-time conversation, it can be time-consuming for staff and disruptive to other tasks. However, using it has resulted in reduced communication traffic from other channels, such as the discussion boards or email. As a teacher, the best time to be available on LiveChat is when you are doing other administrative jobs."

    Since 2014, LiveChat has published the Customer Service Report, an annual study of customer satisfaction and analysis of online business communication trends. It includes research on thousands of companies and millions of customer service email and live support interactions.

  • Deblurring

    Deblurring is the process of removing blurring artifacts from images. Deblurring recovers a sharp image $S$ from a blurred image $B$, where $S$ is convolved with $K$ (the blur kernel) to generate $B$. Mathematically, this can be represented as $B = S * K$ (where $*$ represents convolution). While this process is sometimes known as unblurring, deblurring is the correct technical term. The blur $K$ is typically modeled as a point spread function and is convolved with a hypothetical sharp image $S$ to get $B$, where both $S$ (which is to be recovered) and the point spread function $K$ are unknown. This is an example of an inverse problem. In almost all cases, there is insufficient information in the blurred image to uniquely determine a plausible original image, making it an ill-posed problem. In addition, the blurred image contains noise, which complicates the task of determining the original image. The problem is generally addressed by using a regularization term to eliminate implausible solutions. It is analogous to echo removal in the signal processing domain. Nevertheless, when a coherent beam is used for imaging, the point spread function can be modeled mathematically. By proper deconvolution of the point spread function $K$ and the blurred image $B$, the blurred image can be deblurred and the sharp image $S$ recovered.
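As a rough illustration of the model $B = S * K$ and of regularized deconvolution, the following sketch blurs a synthetic image and restores it in the Fourier domain. It assumes the simpler non-blind case, where the kernel $K$ is known, and uses a regularized inverse filter as one possible choice of regularization. The function names, the box kernel, and the value of `reg` are illustrative assumptions, not taken from the text above.

```python
import numpy as np

def blur(sharp, kernel):
    """Circular convolution B = S * K, computed with the FFT."""
    K = np.fft.fft2(kernel, s=sharp.shape)  # zero-pad the kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(sharp) * K))

def deblur(blurred, kernel, reg=1e-3):
    """Regularized inverse filter: S_hat = conj(K) B / (|K|^2 + reg).

    `reg` damps frequencies where |K| is close to zero, which would
    otherwise be amplified without bound.
    """
    K = np.fft.fft2(kernel, s=blurred.shape)
    B = np.fft.fft2(blurred)
    S_hat = np.conj(K) * B / (np.abs(K) ** 2 + reg)
    return np.real(np.fft.ifft2(S_hat))

rng = np.random.default_rng(0)
sharp = rng.random((32, 32))        # synthetic stand-in for a sharp image
kernel = np.ones((3, 3)) / 9.0      # box blur, assumed known here
blurred = blur(sharp, kernel)
restored = deblur(blurred, kernel, reg=1e-6)

# Mean absolute error relative to the sharp image, before and after.
err_blurred = np.abs(blurred - sharp).mean()
err_restored = np.abs(restored - sharp).mean()
print(err_blurred, err_restored)
```

With noisy data a larger `reg` would normally be needed. In blind deblurring, where $K$ is unknown, the kernel has to be estimated as well, which this sketch does not attempt.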

  • Model-based clustering

    In statistics, cluster analysis is the algorithmic grouping of objects into homogeneous groups based on numerical measurements. Model-based clustering is based on a statistical model for the data, usually a mixture model. This has several advantages, including a principled statistical basis for clustering, and ways to choose the number of clusters, to choose the best clustering model, to assess the uncertainty of the clustering, and to identify outliers that do not belong to any group.

    == Model-based clustering ==

    Suppose that for each of $n$ observations we have data on $d$ variables, denoted by $y_i = (y_{i,1}, \ldots, y_{i,d})$ for observation $i$. Then model-based clustering expresses the probability density function of $y_i$ as a finite mixture, or weighted average, of $G$ component probability density functions:

    $$p(y_i) = \sum_{g=1}^{G} \tau_g f_g(y_i \mid \theta_g),$$

    where $f_g$ is a probability density function with parameter $\theta_g$, and $\tau_g$ is the corresponding mixture probability, with $\sum_{g=1}^{G} \tau_g = 1$. Then in its simplest form, model-based clustering views each component of the mixture model as a cluster, estimates the model parameters, and assigns each observation to the cluster corresponding to its most likely mixture component.

    === Gaussian mixture model ===

    The most common model for continuous data is that $f_g$ is a multivariate normal distribution with mean vector $\mu_g$ and covariance matrix $\Sigma_g$, so that $\theta_g = (\mu_g, \Sigma_g)$. This defines a Gaussian mixture model.
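The mixture density and the assignment rule described above can be sketched as follows. This is a minimal illustration that assumes the parameters $\tau_g$, $\mu_g$ and $\Sigma_g$ are already known. The function names and the two-component example values are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Multivariate normal density f_g(y_i | mu_g, Sigma_g) for each row of y."""
    d = mean.shape[0]
    diff = y - mean
    # Quadratic form (y - mu)^T Sigma^{-1} (y - mu), one value per row.
    quad = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def weighted_components(y, taus, means, covs):
    """n x G matrix with entries tau_g * f_g(y_i | theta_g)."""
    return np.stack(
        [t * gaussian_pdf(y, m, c) for t, m, c in zip(taus, means, covs)],
        axis=1,
    )

# Hypothetical two-component model in d = 2 dimensions.
taus = np.array([0.6, 0.4])
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
y = np.array([[0.1, -0.2], [4.8, 5.3], [2.0, 2.0]])

comps = weighted_components(y, taus, means, covs)
density = comps.sum(axis=1)     # p(y_i) = sum_g tau_g f_g(y_i | theta_g)
labels = comps.argmax(axis=1)   # most likely mixture component per observation
print(density, labels)
```

Dividing each row of `comps` by its sum would give the posterior membership probabilities, which is one way to express the uncertainty of the clustering.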
    The parameters of the model, $\tau_g$ and $\theta_g$ for $g = 1, \ldots, G$, are typically estimated by maximum likelihood estimation using the expectation-maximization algorithm (EM); see also EM algorithm and GMM model. Bayesian inference is also often used for inference about finite mixture models. The Bayesian approach also allows for the case where the number of components, $G$, is infinite, using a Dirichlet process prior, yielding a Dirichlet process mixture model for clustering.

    === Choosing the number of clusters ===

    An advantage of model-based clustering is that it provides statistically principled ways to choose the number of clusters. Each different choice of the number of groups $G$ corresponds to a different mixture model. Then standard statistical model selection criteria such as the Bayesian information criterion (BIC) can be used to choose $G$. The integrated completed likelihood (ICL) is a different criterion designed to choose the number of clusters rather than the number of mixture components in the model; these will often be different if highly non-Gaussian clusters are present.

    === Parsimonious Gaussian mixture model ===

    For data with high dimension, $d$, using a full covariance matrix for each mixture component requires estimation of many parameters, which can result in a loss of precision, generalizability and interpretability. Thus it is common to use more parsimonious component covariance matrices exploiting their geometric interpretation. Gaussian clusters are ellipsoidal, with their volume, shape and orientation determined by the covariance matrix.
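The estimation and model-selection steps described above (fitting by EM, then choosing $G$ by BIC) can be sketched for the one-dimensional case as follows. This is a simplified illustration, not a reference implementation: the initialization, the fixed number of iterations, the small variance floor and the synthetic data are all assumptions. BIC is written here as $-2 \log L + k \log n$, so lower values are better; some software uses the opposite sign.

```python
import numpy as np

def weighted_densities(y, tau, mu, var):
    """n x G matrix of tau_g * N(y_i | mu_g, var_g) for 1-D data."""
    diff = y[:, None] - mu
    return tau * np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(y, G, n_iter=200, seed=0):
    """Fit a 1-D Gaussian mixture with G components by EM."""
    rng = np.random.default_rng(seed)
    tau = np.full(G, 1.0 / G)
    mu = rng.choice(y, size=G, replace=False)  # start means at random data points
    var = np.full(G, y.var())
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        dens = weighted_densities(y, tau, mu, var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        tau = nk / y.size
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    loglik = np.log(weighted_densities(y, tau, mu, var).sum(axis=1)).sum()
    return tau, mu, var, loglik

def bic(loglik, G, n):
    """BIC with k = 3G - 1 free parameters (G - 1 weights, G means, G variances)."""
    return -2.0 * loglik + (3 * G - 1) * np.log(n)

# Synthetic data: two well-separated groups (illustrative only).
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(8.0, 1.0, 150)])

fits = {G: fit_gmm_1d(y, G) for G in (1, 2, 3)}
scores = {G: bic(fits[G][3], G, y.size) for G in fits}
best_G = min(scores, key=scores.get)
print(scores, best_G)
```

A practical implementation would typically monitor convergence of the log-likelihood and try several starting values, since EM can stop at a local maximum.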
    Consider the eigendecomposition of the covariance matrix

    $$\Sigma_g = \lambda_g D_g A_g D_g^T,$$

    where $D_g$ is the matrix of eigenvectors of $\Sigma_g$, $A_g = \mathrm{diag}\{A_{1,g}, \ldots, A_{d,g}\}$ is a diagonal matrix whose elements are proportional to the eigenvalues of $\Sigma_g$ in descending order, and $\lambda_g$ is the associated constant of proportionality. Then $\lambda_g$ controls the volume of the ellipsoid, $A_g$ its shape, and $D_g$ its orientation. Each of the volume, shape and orientation of the clusters can be constrained to be equal (E) or allowed to vary (V); the orientation can also be spherical, with identical eigenvalues (I). This yields 14 possible clustering models, shown in this table: It can be seen that many of these models are more parsimonious, with far fewer parameters than the unconstrained model that has 90 parameters when $G = 4$ and $d = 9$. Several of these models correspond to well-known heuristic clustering methods. For example, k-means clustering is equivalent to estimation of the EII clustering model using the classification EM algorithm. The Bayesian information criterion (BIC) can be used to choose the best clustering model as well as the number of clusters. It can also be used as the basis for a method to choose the variables in the clustering model, eliminating variables that are not useful for clustering. Different Gaussian model-based clustering methods have been developed with an eye to handling high-dimensional data. These include the pgmm method, which is based on the mixture of factor analyzers model, and the HDclassif method, based on the idea of subspace clustering.
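The volume-shape-orientation decomposition $\Sigma_g = \lambda_g D_g A_g D_g^T$ can be computed numerically as in the sketch below. The text above fixes $A_g$ only up to proportionality, so the normalization $\det(A_g) = 1$ used here is an assumption (one common convention). The function name and the example matrix are illustrative.

```python
import numpy as np

def volume_shape_orientation(cov):
    """Split a covariance matrix into (lam, D, A) with cov = lam * D @ A @ D.T.

    Assumes the normalization det(A) = 1, so lam = det(cov) ** (1 / d).
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, D = eigvals[order], eigvecs[:, order]
    d = cov.shape[0]
    lam = np.prod(eigvals) ** (1.0 / d)      # volume parameter
    A = np.diag(eigvals / lam)               # shape matrix
    return lam, D, A

cov = np.array([[4.0, 1.0],
                [1.0, 2.0]])
lam, D, A = volume_shape_orientation(cov)
reconstructed = lam * D @ A @ D.T
print(lam, np.diag(A))
```

Constraining `lam`, `A` or `D` to be shared across components, or fixing `A` or `D` to the identity, is what produces the more parsimonious models in this family.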
The mixture-of-experts framework extends model-based clustering to include covariates. == Example == We illustrate the method with a dataset consisting of three measurements (glucose, insulin, sspg) on 145 subjects for the purpose of diagnosing diabetes and the type of diabetes present. The subjects were clinically classified into three groups: normal, chemical diabetes and overt diabetes, but we use this information only for evaluating clustering methods, not for classifying subjects. The BIC plot shows the BIC values for each combination of the number of clusters, $G$, and the clustering model from the Table. Each curve corresponds to a different clustering model. The BIC favors 3 groups, which corresponds to the clinical assessment. It also favors the unconstrained covariance model, VVV. This fits the data well, because the normal patients have low values of both sspg and insulin, while the distributions of the chemical and overt diabetes groups are elongated, but in different directions. Thus the volumes, shapes and orientations of the three groups are clearly different, and so the unconstrained model is appropriate, as selected by the model-based clustering method. The classification plot shows the classification of the subjects by model-based clustering. The classification was quite accurate, with a 12% error rate as defined by the clinical classification. Other well-known clustering methods performed worse, with higher error rates: single-linkage clustering with 46%, average-linkage clustering with 30%, complete-linkage clustering also with 30%, and k-means clustering with 28%. == Outliers in clustering == An outlier in clustering is a data point that does not belong to any of the clusters. One way of modeling outliers in model-based clustering is to include an additional mixture component that is very dispersed, with for example a uniform distribution. 
Another approach is to replace the multivariate normal densities by $t$-distributions, with the idea that the long tails of the $t$-distribution would ensure robustness to outliers. However, this is not breakdown-robust. A third approach is the "tclust" or data trimming approach which excludes observations identified as outliers when estimating the model parameters. == Non-Gaussian clusters and merging == Sometimes one or more clusters deviate strongly from the Gaussian assumption. If a Gaussian mixture is fitted to such data, a strongly non-Gaussian cluster will often be represented by several mixture components rather than a single one. In that case, cluster merging can be used to find a better clustering. A different approach is to use mixtures of complex component densities to represent non-Gaussian clusters. == Non-continuous data == === Categorical data === Clustering multivariate categorical data is most often done using the latent class model. This assumes that the data arise from a finite mixture model, where within each cluster the variables are independent. === Mixed data === These arise when variables are of different types, such as continuous, categorical or ordinal data. A latent class model for mixed data assumes local independence between the variables. The location model relaxes the local independence assumption. The clustMD approach assumes that the observed variables are manifestations of underlying continuous Gaussian latent

  • Relational data mining

    Relational data mining

    Relational data mining is the data mining technique for relational databases. Unlike traditional data mining algorithms, which look for patterns in a single table (propositional patterns), relational data mining algorithms look for patterns among multiple tables (relational patterns). For most types of propositional patterns, there are corresponding relational patterns. For example, there are relational classification rules (relational classification), relational regression trees, and relational association rules. There are several approaches to relational data mining: Inductive Logic Programming (ILP), Statistical Relational Learning (SRL), graph mining, propositionalization, and multi-view learning. == Algorithms == Multi-Relation Association Rules: Multi-Relation Association Rules (MRAR) are a class of association rules in which, in contrast to primitive, simple, and even multi-relational association rules (which are usually extracted from multi-relational databases), each rule item consists of one entity but several relations. These relations indicate an indirect relationship between the entities. Consider the following MRAR, where the first item consists of three relations, live in, nearby and humid: "Those who live in a place which is nearby a city with a humid climate type and also are younger than 20 -> their health condition is good". Such association rules can be extracted from RDBMS data or semantic web data. == Software == Safarii: a data mining environment for analysing large relational databases, based on a multi-relational data mining engine. Dataconda: software, free for research and teaching purposes, that helps mine relational databases without the use of SQL. == Datasets == Relational dataset repository: a collection of publicly available relational datasets.
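The example rule above chains three relations (live in, nearby, humid) before it reaches the age condition. The sketch below evaluates such a rule over a few in-memory tables. All names, places and rows are hypothetical illustration data, and support and confidence are used as the usual association-rule measures; the text does not say how MRAR rules are scored.

```python
# Toy relations: every name and row here is hypothetical illustration data.
people = [
    {"name": "Ana", "age": 17, "lives_in": "Riverton", "health": "good"},
    {"name": "Ben", "age": 19, "lives_in": "Riverton", "health": "good"},
    {"name": "Cal", "age": 18, "lives_in": "Dryfield", "health": "poor"},
    {"name": "Dee", "age": 45, "lives_in": "Riverton", "health": "good"},
    {"name": "Eli", "age": 16, "lives_in": "Lakeside", "health": "poor"},
]
nearby = {"Riverton": "Port City", "Dryfield": "Mesa City", "Lakeside": "Port City"}
climate = {"Port City": "humid", "Mesa City": "arid"}


def antecedent(person):
    """Follow live in -> nearby -> climate, then check the age condition."""
    city = nearby.get(person["lives_in"])
    return climate.get(city) == "humid" and person["age"] < 20


def consequent(person):
    return person["health"] == "good"


def rule_metrics(rows):
    """Return (support, confidence) of antecedent -> consequent over rows."""
    matched = [p for p in rows if antecedent(p)]
    both = [p for p in matched if consequent(p)]
    support = len(both) / len(rows)
    confidence = len(both) / len(matched) if matched else 0.0
    return support, confidence


support, confidence = rule_metrics(people)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

The point of the sketch is that the first rule item is a single entity (the person) reached through several joined relations, rather than a value read from one table.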

  • CloudHealth Technologies

    CloudHealth Technologies

    CloudHealth Technologies, now CloudHealth by VMware, is a software company based in Boston, Massachusetts. The company provides cloud computing services related to cost management, governance, automation, security, and performance. == History == CloudHealth Technologies was founded by Joe Kinsella in 2012. Dan Phillips joined as CEO and co-founder in late 2012, and Dave Eicher joined as co-founder in January 2013. In May 2016, the company announced plans to expand from its Boston headquarters with branch offices in San Francisco, London, Washington, D.C., Sydney, Amsterdam, Tel Aviv, and Singapore. In spring 2018, the headquarters moved within Boston from Fort Point to 100 Summer Street, tripling its square footage. In September 2017, Tom Axbey, who was previously at Rave Mobile Safety, joined as the new CEO and President. VMware announced its intention to acquire CloudHealth Technologies on August 27, 2018. The acquisition is "part of the information technology company's continued push into cloud-based software services" according to Reuters. The deal closed on October 4, 2018, and its value was reported to be in excess of $500 million. == Technology == Delivered through a software as a service (SaaS) model, CloudHealth Technologies' platform collects and analyzes data from cloud computing services and other IT environments so clients can report on costs, inform their business models, and project future trends. CloudHealth Technologies is compatible with Amazon Web Services, Microsoft Azure, Google Cloud Platform, multicloud, and hybrid cloud environments. CloudHealth Technologies has received Amazon Web Services (AWS) Education Competency status and AWS Migration Competency status, and has achieved SOC 2 Type 2 compliance. == Funding == As of June 2017, CloudHealth Technologies had raised a total of $85.7 million through four rounds of funding. In March 2013, CloudHealth Technologies announced that it had secured $4.5 million in Series A funding. 
This round was led by .406 Ventures and Sigma Prime Ventures. In January 2015, CloudHealth Technologies secured $12 million in Series B funding. This round was led by Scale Venture Partners, .406 Ventures, and Sigma Prime Ventures, and was followed by a $3.2 million extension round. In May 2016, CloudHealth Technologies announced $20 million in Series C funding, led by Sapphire Ventures, .406 Ventures, Scale Venture Partners and Sigma Prime Ventures. In June 2017, CloudHealth Technologies secured $46 million in Series D funding led by Kleiner Perkins Caufield & Byers, with participation from Meritech Capital Partners, Sapphire Ventures, .406 Ventures, and Scale Venture Partners. == Competition == As of March 2023, CloudHealth Technologies competes with Cloudability by Apptio and CloudCheckr by NetApp.

  • Web application

    Web application

    A web application (or web app) is application software that is created with web technologies and runs via a web browser. Web applications emerged during the late 1990s and allowed the server to dynamically build a response to each request, in contrast to static web pages. Web applications are commonly distributed via a web server. There are several different tier systems that web applications use to communicate between the web browser, the client interface, and server data. Each system has its own uses, as they function in different ways. However, there are many security risks that developers must be aware of during development; proper measures to protect user data are vital. Web applications are often constructed with the use of a web application framework. Single-page applications (SPAs) and progressive web apps (PWAs) are two architectural approaches to creating web applications that provide a user experience similar to native apps, including features such as smooth navigation, offline support, and faster interactions. Web applications are often fully hosted on remote cloud services, can require a constant connection to them, and can replace conventional desktop applications for operating systems such as Microsoft Windows. This facilitates software as a service: hosting the application and its data remotely lets the developer tightly control billing based on use of the remote services, and can also create vendor lock-in. Modern browsers such as Chrome sandbox every browser tab, which improves security and restricts access to local resources. No software installation is required, as the app runs within the browser, which reduces the need to manage software installations. With remote cloud services, customers do not need to manage servers, as that can be left to the developer and the cloud service, and they can use the software on a relatively low-power, low-resource PC such as a thin client. 
The source code of the application can stay the same across operating systems and devices of users with the use of responsive web design, since it only needs to be compatible with web browsers which adhere to web standards, making the code highly portable and saving on development time. Numerous JavaScript frameworks and CSS frameworks facilitate development. == History == The concept of a "web application" was first introduced in the Java language in the Servlet Specification version 2.2, which was released in 1999. At that time, both JavaScript and XML had already been developed, but the XMLHttpRequest object had only been recently introduced on Internet Explorer 5 as an ActiveX object. Beginning around the early 2000s, applications such as "Myspace (2003), Gmail (2004), Digg (2004), [and] Google Maps (2005)," started to make their client sides more and more interactive. A web page script is able to contact the server for storing/retrieving data without downloading an entire web page. The practice became known as Ajax in 2005. Eventually this was replaced by web APIs using JSON, accessed via JavaScript asynchronously on the client side. In earlier computing models like client-server, the processing load for the application was shared between code on the server and code installed on each client locally. In other words, an application had its own pre-compiled client program which served as its user interface and had to be separately installed on each user's personal computer. An upgrade to the server-side code of the application would typically also require an upgrade to the client-side code installed on each user workstation, adding to the support cost and decreasing productivity. Additionally, both the client and server components of the application were bound tightly to a particular computer architecture and operating system, which made porting them to other systems prohibitively expensive for all but the largest applications. 
Later, in 1995, Netscape introduced the client-side scripting language called JavaScript, which allowed programmers to add dynamic elements to the user interface that ran on the client side. Essentially, instead of sending data to the server in order to generate an entire web page, the embedded scripts of the downloaded page can perform various tasks such as input validation or showing/hiding parts of the page. "Progressive web apps", the term coined by designer Frances Berriman and Google Chrome engineer Alex Russell in 2015, refers to apps taking advantage of new features supported by modern browsers, which initially run inside a web browser tab but later can run completely offline and can be launched without entering the app URL in the browser. == Structure == Traditional PC applications are typically single-tiered, residing solely on the client machine. In contrast, web applications inherently facilitate a multi-tiered architecture. Though many variations are possible, the most common structure is the three-tiered application. In its most common form, the three tiers are called presentation, application and storage. The first tier, presentation, refers to a web browser itself. The second tier refers to any engine using dynamic web content technology (such as ASP, CGI, ColdFusion, Dart, JSP/Java, Node.js, PHP, Python or Ruby on Rails). The third tier refers to a database that stores data and determines the structure of a user interface. Essentially, when using the three-tiered system, the web browser sends requests to the engine, which then services them by making queries and updates against the database and generates a user interface. The 3-tier solution may fall short when dealing with more complex applications, and may need to be replaced with the n-tiered approach; the greatest benefit of which is how business logic (which resides on the application tier) is broken down into a more fine-grained model. 
Another benefit would be to add an integration tier, which separates the data tier and provides an easy-to-use interface to access the data. For example, the client data would be accessed by calling a "list_clients()" function instead of making an SQL query directly against the client table on the database. This allows the underlying database to be replaced without making any change to the other tiers. There are some who view a web application as a two-tier architecture. This can be a "smart" client that performs all the work and queries a "dumb" server, or a "dumb" client that relies on a "smart" server. The client would handle the presentation tier, the server would have the database (storage tier), and the business logic (application tier) would be on one of them or on both. While this increases the scalability of the applications and separates the display and the database, it still does not allow for true specialization of layers, so most applications will outgrow this model. == Security == Security breaches in these kinds of applications are a major concern because they can involve both enterprise information and private customer data. Protecting these assets is an important part of any web application, and there are some key operational areas that must be included in the development process. These include processes for authentication, authorization, asset handling, input, and logging and auditing. Building security into the applications from the beginning is sometimes more effective and less disruptive in the long run. == Development == Writing web applications is simplified with the use of web application frameworks. These frameworks facilitate rapid application development by allowing a development team to focus on the parts of their application which are unique to their goals without having to resolve common development issues such as user management. 
In addition, there is potential for the development of applications on Internet operating systems, although currently there are not many viable platforms that fit this model.
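The integration tier described in the Structure section, where callers use a "list_clients()" function instead of writing SQL against the client table, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the schema, the sample rows and every function other than list_clients are assumptions made for the example.

```python
import sqlite3
from html import escape


def open_database():
    """Storage tier: an in-memory SQLite database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO client (name) VALUES (?)", [("Acme",), ("Globex",)])
    conn.commit()
    return conn


def list_clients(conn):
    """Integration tier: the only place that knows the SQL and the table name."""
    rows = conn.execute("SELECT id, name FROM client ORDER BY id").fetchall()
    return [{"id": row[0], "name": row[1]} for row in rows]


def render_client_page(conn):
    """Application tier: builds the response without touching SQL."""
    items = "".join(f"<li>{escape(c['name'])}</li>" for c in list_clients(conn))
    return f"<ul>{items}</ul>"


conn = open_database()
page = render_client_page(conn)
print(page)
```

Because only list_clients contains a query, swapping SQLite for another database would mean rewriting that one function while render_client_page stays unchanged.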

  • Availability zone

    Availability zone

    In cloud computing, an availability region is a group of data centres that are located in the same geographical region. Availability regions comprise multiple availability zones, which are groups of data centres that are located far enough from each other to prevent large-scale outages in the event of failure of a single zone, whilst still being close enough to each other to enable low-latency connections. Distributed systems spanning multiple availability zones allow for high availability, even in the event of catastrophic failure, such as natural disasters. Services offering distinct availability zones include Amazon Web Services, Microsoft Azure and Google Cloud.
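As a rough illustration of why spanning several zones raises availability, the sketch below computes the chance that at least one zone is up. It assumes that zones fail independently and that each has the same availability; both are simplifying assumptions, and the 0.99 figure is a made-up input rather than a number from any provider.

```python
def combined_availability(zone_availability, zones):
    """Probability that at least one of `zones` independent zones is up."""
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # All zones are down at once with probability (1 - p) ** n.
    return 1.0 - (1.0 - zone_availability) ** zones


for n in (1, 2, 3):
    print(n, combined_availability(0.99, n))
```

Real zones share some failure modes (for example, a region-wide control-plane fault), so the independence assumption gives an optimistic upper bound.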

  • Comparison of machine learning software

    Comparison of machine learning software

    The following tables are a comparison of machine learning software such as software frameworks, libraries, and computer programs used for machine learning. == Machine learning software == == Other comparisons == == Machine learning helper libraries and platforms == Apache OpenNLP — natural language processing toolkit CUDA — GPU computing platform used to accelerate machine learning and deep learning workloads Horovod — distributed training framework for deep learning Hugging Face Transformers — library of pretrained transformer models built on other machine learning frameworks Kubeflow — machine learning platform for Kubernetes Mallet — toolkit for natural language processing and text analysis NumPy — numerical computing library used in machine learning OpenCV — computer vision library with machine learning functions ONNX — open format for representing machine learning models pandas — data analysis and data preparation library used in machine learning PlaidML — tensor compiler and backend for machine learning frameworks Polars — Dataframe library used for machine learning data preparation and analysis PyArrow — columnar data library used in machine learning data processing ROOT (TMVA) — data analysis framework with machine learning tools SciPy — scientific computing and optimization library used in machine learning == Online development environments for machine learning == Google Colab — hosted Jupyter Notebook environment commonly used for machine learning and deep learning JupyterLab — notebook-based development environment for machine learning and data science Jupyter Notebook — interactive notebook environment used for machine learning and data science Kaggle — online data science and machine learning platform

  • Application framework

    Application framework

    In computer programming, an application framework consists of a software framework used by software developers to implement the standard structure of application software. Application frameworks became popular with the rise of graphical user interfaces (GUIs), since these tended to promote a standard structure for applications. Programmers find it much simpler to create automatic GUI creation tools when using a standard framework, since this defines the underlying code structure of the application in advance. Developers usually use object-oriented programming (OOP) techniques to implement frameworks such that the unique parts of an application can simply inherit from classes extant in the framework. == Examples == Apple Computer developed one of the first commercial application frameworks, MacApp (first release 1985), for the Macintosh. Originally written in an extended (object-oriented) version of Pascal termed Object Pascal, it was later rewritten in C++. Another notable framework for the Mac is Metrowerks' PowerPlant, based on Carbon. Cocoa for macOS offers a different approach to an application framework, based on the OpenStep framework developed at NeXT. Since the 2010s, many apps have been created with the frameworks based on Google's Chromium project. The two prominent ones are Electron and the Chromium Embedded Framework. Free and open-source software frameworks exist as part of the Mozilla, LibreOffice, GNOME, KDE, NetBeans, and Eclipse projects. Microsoft markets a framework for developing Windows applications in C++ called the Microsoft Foundation Class Library, and a similar framework for developing applications with Visual Basic or C#, named .NET Framework. Several frameworks can build cross-platform applications for Linux, Macintosh, and Windows from common source code, such as Qt, wxWidgets, Juce, Fox toolkit, or Eclipse Rich Client Platform (RCP). Oracle Application Development Framework (Oracle ADF) aids in producing Java-oriented systems. 
Silicon Laboratories offers an embedded application framework for developing wireless applications on its series of wireless chips.
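The idea that the unique parts of an application inherit from classes supplied by the framework can be sketched as follows. The class and method names are hypothetical and do not correspond to any of the frameworks named above; the base class stands in for the framework and fixes the overall structure, while the subclass supplies only the application-specific behaviour.

```python
class Application:
    """Hypothetical framework base class that fixes the application's structure."""

    def run(self, events):
        # The framework owns the control flow: start, dispatch events, quit.
        log = [self.on_start()]
        for event in events:
            log.append(self.handle_event(event))
        log.append(self.on_quit())
        return log

    def on_start(self):
        return "started"

    def on_quit(self):
        return "quit"

    def handle_event(self, event):
        raise NotImplementedError("subclasses provide the application logic")


class NotesApp(Application):
    """The application-specific part: only event handling is overridden."""

    def handle_event(self, event):
        return f"note saved: {event}"


log = NotesApp().run(["buy milk", "call Sam"])
print(log)
```

Here NotesApp never calls its own start-up or shutdown code; it inherits that structure and fills in a single method.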

  • Marq (company)

    Marq (company)

    Marq (formerly Lucidpress) is a cloud-based software platform for brand management and templated content creation. The platform integrates with digital asset management (DAM) systems, including Aprimo and Bynder, and with customer relationship management (CRM) tools such as Salesforce and HubSpot. Marq also includes AI-assisted features for brand compliance and content automation. Trade publications have described the product as a brand templating and creative automation platform. == History == In October 2013, Lucid Software, Inc. announced a public beta version of Lucidpress. Following its release, Lucidpress was featured in TechCrunch, VentureBeat and PC World, with TechCrunch noting: "I had a chance to test the app before its launch and it is indeed very easy to use. If you've ever used a desktop publishing app in the past, you'll feel right at home with Marq, as it features the same kind of standard top-bar menu and layout options as most other publishing apps. In terms of features, it can also hold its own against similar desktop-based apps." In May 2021, Lucidpress announced that it had been acquired by Charles Thayne Capital ("CTC"), a growth-oriented and technology-focused private investment firm. Following the acquisition, Lucidpress became fully independent. Owen Fuller, who had served as General Manager since 2017, was appointed Chief Executive Officer. In 2022, Lucidpress was rebranded as Marq to reflect the company’s shift toward brand templating and creative automation tools, while continuing to support its publishing features. == Features == Marq integrates with customer relationship management (CRM) platforms such as Salesforce and HubSpot, enabling the creation of personalized, on-brand sales and marketing materials. The platform also connects with multiple digital asset management (DAM) systems, including Bynder, Aprimo, MediaValet, PhotoShelter, Acquia, and Canto. 
== Investment == Lucid Software raised $1 million in seed funding in 2011, in a round led by Google Ventures. In May 2014, the company received a $5 million investment in a round led by Salt Lake-based Kickstart Seed Fund. In September 2016, the company received a $36 million investment from Spectrum Equity.
