Artificial intelligence in industry

Artificial intelligence in industry

Industrial artificial intelligence, or industrial AI, refers to the application of artificial intelligence to industrial business processes. Unlike general artificial intelligence which is a frontier research discipline to build computerized systems that perform tasks requiring human intelligence, industrial AI is more concerned with the application of such technologies to address industrial pain-points for customer value creation, productivity improvement, cost reduction, site optimization, predictive analysis and insight discovery. Artificial intelligence and machine learning have become key enablers to leverage data in production in recent years due to a number of different factors: More affordable sensors and the automated process of data acquisition; More powerful computation capability of computers to perform more complex tasks at a faster speed with lower cost; Faster connectivity infrastructure and more accessible cloud services for data management and computing power outsourcing. == Categories == Possible applications of industrial AI and machine learning in the production domain can be divided into seven application areas: Market and trend analysis Machinery and equipment Intralogistics Production process Supply chain Building Product Each application area can be further divided into specific application scenarios that describe concrete AI/ML scenarios in production. While some application areas have a direct connection to production processes, others cover production adjacent fields like logistics or the factory building. An example from the application scenario Process Design & Innovation are collaborative robots. Collaborative robotic arms are able to learn the motion and path demonstrated by human operators and perform the same task. Predictive and preventive maintenance through data-driven machine learning are application scenarios from the Machinery & Equipment application area. == Challenges == In contrast to entirely virtual systems, in which ML applications are already widespread today, real-world production processes are characterized by the interaction between the virtual and the physical world. Data is recorded using sensors and processed on computational entities and, if desired, actions and decisions are translated back into the physical world via actuators or by human operators. This poses major challenges for the application of ML in production engineering systems. These challenges are attributable to the encounter of process, data and model characteristics: The production domain's high reliability requirements, high risk and loss potential, the multitude of heterogeneous data sources and the non-transparency of ML model functionality impede a faster adoption of ML in real-world production processes. In particular, production data comprises a variety of different modalities, semantics and quality. Furthermore, production systems are dynamic, uncertain and complex, and engineering and manufacturing problems are data-rich but information-sparse. Besides that, due to the variety of use cases and data characteristics, problem-specific data sets are required, which are difficult to acquire, hindering both practitioners and academic researchers in this domain. === Process and industry characteristics === The domain of production engineering can be considered as a rather conservative industry when it comes to the adoption of advanced technology and their integration into existing processes. This is due to high demands on reliability of the production systems resulting from the potentially high economic harm of reduced process effectiveness due to e.g., additional unplanned downtime or insufficient product qualities. In addition, the specifics of machining equipment and products prevent area-wide adoptions across a variety of processes. Besides the technical reasons, the reluctant adoption of ML is fueled by a lack of IT and data science expertise across the domain. === Data characteristics === The data collected in production processes mainly stem from frequently sampling sensors to estimate the state of a product, a process, or the environment in the real world. Sensor readings are susceptible to noise and represent only an estimate of the reality under uncertainty. Production data typically comprises multiple distributed data sources resulting in various data modalities (e.g., images from visual quality control systems, time-series sensor readings, or cross-sectional job and product information). The inconsistencies in data acquisition lead to low signal-to-noise ratios, low data quality and great effort in data integration, cleaning and management. In addition, as a result from mechanical and chemical wear of production equipment, process data is subject to various forms of data drifts. === Machine learning model characteristics === ML models are considered as black-box systems given their complexity and intransparency of input-output relation. This reduces the comprehensibility of the system behavior and thus also the acceptance by plant operators. Due to the lack of transparency and the stochasticity of these models, no deterministic proof of functional correctness can be achieved, complicating the certification of production equipment. Given their inherent unrestricted prediction behavior, ML models are vulnerable against erroneous or manipulated data, further risking the reliability of the production system because of lacking robustness and safety. In addition to high development and deployment costs, the data drifts cause high maintenance costs, which is disadvantageous compared to purely deterministic programs. == Standard processes for data science in production == The development of ML applications – starting with the identification and selection of the use case and ending with the deployment and maintenance of the application – follows dedicated phases that can be organized in standard process models. The process models assist in structuring the development process and defining requirements that must be met in each phase to enter the next phase. The standard processes can be classified into generic and domain-specific ones. Generic standard processes (e.g., CRISP-DM, ASUM-DM, or knowledge discovery in databases (KDD)) describe a generally valid methodology and are thus independent of individual domains. Domain-specific processes on the other hand consider specific peculiarities and challenges of special application areas. The Machine Learning Pipeline in Production is a domain-specific data science methodology that is inspired by the CRISP-DM model and was specifically designed to be applied in fields of engineering and production technology. To address the core challenges of ML in engineering – process, data, and model characteristics – the methodology especially focuses on use-case assessment, achieving a common data and process understanding data integration, data preprocessing of real-world production data and the deployment and certification of real-world ML applications. == Industrial data sources == The foundation of most artificial intelligence and machine learning applications in industrial settings are comprehensive datasets from the respective fields. Those datasets act as the basis for training the employed models. In other domains, like computer vision, speech recognition or language models, extensive reference datasets (e.g. ImageNet, Librispeech, The People's Speech) and data scraped from the open internet are frequently used for this purpose. Such datasets rarely exist in the industrial context because of high confidentiality requirements and high specificity of the data. Industrial applications of artificial intelligence are therefore often faced with the problem of data availability. For these reasons, existing open datasets applicable to industrial applications, often originate from public institutions like governmental agencies or universities and data analysis competitions hosted by companies. In addition to this, data sharing platforms exist. However, most of these platforms have no industrial focus and offer limited filtering abilities regarding industrial data sources.

Globetrooper

Globetrooper is a free travel app known for assisting travelers in finding partners for group trips and world adventures. Globetrooper offers a free social travel platform that helps people find travel partners. == History == Globetrooper was developed and released in 2010 by a couple; Todd Sullivan and Lauren McLeod who are two travel-minded individuals that wanted to make it easier for travelers to plan a journey and see the world. With their backgrounds in business, software & design, and a love for travel, both left the corporate world and launched Globetrooper on Lauren’s birthday 28 March 2010. Globetrooper was first launched as an information portal with a view to making it more social, but after some months, the content quickly grew and changed to the ‘travel partner’ concept.

Artificial intuition

Artificial intuition is a theoretical capacity of an artificial software to function similarly to human consciousness, specifically in the capacity of human consciousness known as intuition. == Comparison of human and the theoretically artificial == Intuition is the function of the mind, the experience of which, is described as knowledge based on "a hunch", resulting (as the word itself does) from "contemplation" or "insight". Psychologist Jean Piaget showed that intuitive functioning within the normally developing human child at the Intuitive Thought Substage of the preoperational stage occurred at from four to seven years of age. In Carl Jung's concept of synchronicity, the concept of "intuitive intelligence" is described as something like a capacity that transcends ordinary-level functioning to a point where information is understood with a greater depth than is available in more simple rationally-thinking entities. Artificial intuition is theoretically (or otherwise) a sophisticated function of an artifice that is able to interpret data with depth and locate hidden factors functioning in Gestalt psychology, and that intuition in the artificial mind would, in the context described here, be a bottom-up process upon a macroscopic scale identifying something like the archetypal (see τύπος). To create artificial intuition supposes the possibility of the re-creation of a higher functioning of the human mind, with capabilities such as what might be found in semantic memory and learning. The transferral of the functioning of a biological system to synthetic functioning is based upon modeling of functioning from knowledge of cognition and the brain, for instance as applications of models of artificial neural networks from the research done within the discipline of computational neuroscience. == Application software contributing to its development == The notion of a process of a data-interpretative synthesis has already been found in a computational-linguistic software application that has been created for use in an internal security context. The software integrates computed data based specifically on objectives incorporating a paradigm described as "religious intuitive" (hermeneutic), functional to a degree that represents advances upon the performance of generic lexical data mining.

Browsing

Browsing is a kind of orienting strategy. It is supposed to identify something of relevance for the browsing organism. In context of humans, it is a metaphor taken from the animal kingdom. It is used, for example, about people browsing open shelves in libraries, window shopping, or browsing databases or the Internet. In library and information science, it is an important subject, both purely theoretically and as applied science aiming at designing interfaces which support browsing activities for the user. == Definition == In 2011, Birger Hjørland provided the following definition: "Browsing is a quick examination of the relevance of a number of objects which may or may not lead to a closer examination or acquisition/selection of (some of) these objects. It is a kind of orienting strategy that is formed by our "theories", "expectations" and "subjectivity". == Controversies == As with any kind of human psychology, browsing can be understood in biological, behavioral, or cognitive terms on the one hand or in social, historical, and cultural terms on the other hand. In 2007, Marcia Bates researched browsing from "behavioural" approaches, while Hjørland (2011a+b) defended a social view. Bates found that browsing is rooted in our history as exploratory, motile animals hunting for food and nesting opportunities. According to Hjørland (2011a), on the other hand, Marcia Bates' browsing for information about browsing is governed by her behavioral assumptions, while Hjørland's browsing for information about browsing is governed by his socio-cultural understanding of human psychology. In short: Human browsing is based on our conceptions and interests. === Is browsing a random activity? === Browsing is often understood as a random activity. Dictionary.com, for example, has this definition: "to glance at random through a book, magazine, etc.". Hjørland suggests, however, that browsing is an activity that is governed by our metatheories. We may dynamically change our theories and conceptions but when we browse, the activity is governed by the interests, conceptions, priorities and metatheories that we have at that time. Therefore, browsing is not totally random. == Browsing versus analytical search strategies == In 1997, Gary Marchionini wrote: "A fundamental distinction is made between analytical and browsing strategies [...]. Analytical strategies depend on careful planning, the recall of query terms, and iterative query reformulations and examinations of results. Browsing strategies are heuristic and opportunistic and depend on recognizing relevant information. Analytic strategies are batch oriented and half duplex (turn talking) like human conversation, whereas browsing strategies are more interactive, real-time exchanges and collaborations between the information seeker and the information system. Browsing strategies demand a lower cognitive load in advance and a steadier attentional load throughout the information-seeking process. When it comes to Browsing, giblets are amazing." == Orienting strategies == Some sociologists, such as Berger and Zelditch in 1993, Wagner in 1984, and Wagner & Berger in 1985, have used the term "orienting strategies". They find that orienting strategies should be understood as metatheories: "Consider the very large proportion of sociological theory that is in the form of metatheory. It is discussion about theory: about what concepts it should include, about how those concepts should be linked, and about how theory should be studied. Similar to Kuhn’s paradigms, theories of this sort provide guidelines or strategies for understanding social phenomena and suggest the proper orientation of the theorist to these phenomena; they are orienting strategies. Textbooks in theory frequently focus on orienting strategies such as functionalism, exchange, or ethnomethodology." Sociologists thus use metatheories as orienting strategies. We may generalize and say that all people use metatheories as orienting strategies and that this is what direct our attention and also our browsing – also when we are not conscious about it.

Scriptella

Scriptella is an open source extract transform load (ETL) and script execution tool written in Java. It allows the use of SQL or another scripting language suitable for the data source to perform required transformations. Scriptella does not offer any graphical user interface. == Typical use == Database migration. Database creation/update scripts. Cross-database ETL operations, import/export. Alternative for Ant task. Automated database schema upgrade. == Features == Simple XML syntax for scripts. Add dynamics to your existing SQL scripts by creating a thin wrapper XML file: Support for multiple datasources (or multiple connections to a single database) in an ETL file. Support for many useful JDBC features, e.g. parameters in SQL including file blobs and JDBC escaping. Performance and low memory usage are one of the primary goals. Support for evaluated expressions and properties (JEXL syntax) Support for cross-database ETL scripts by using elements Transactional execution Error handling via elements Conditional scripts/queries execution (similar to Ant if/unless attributes but more powerful) Easy-to-Use as a standalone tool or Ant task, without deployment or installation. Easy-To-Run ETL files directly from Java code. Built-in adapters for popular databases for a tight integration. Support for any database with JDBC/ODBC compliant driver. Service Provider Interface (SPI) for interoperability with non-JDBC DataSources and integration with scripting languages. Out of the box support for JSR 223 (Scripting for the Java Platform) compatible languages. Built-in CSV, TEXT, XML, LDAP, Lucene, Velocity, JEXL and Janino providers. Integration with Java EE, Spring Framework, JMX and JNDI for enterprise ready scripts.

Harris corner detector

The Harris corner detector is a corner detection operator that is commonly used in computer vision algorithms to extract corners and infer features of an image. It was first introduced by Chris Harris and Mike Stephens in 1988 upon the improvement of Moravec's corner detector. Compared to its predecessor, Harris' corner detector takes the differential of the corner score into account with reference to direction directly, instead of using shifting patches for every 45 degree angles, and has been proved to be more accurate in distinguishing between edges and corners. Since then, it has been improved and adopted in many algorithms to preprocess images for subsequent applications. == Introduction == A corner is a point whose local neighborhood stands in two dominant and different edge directions. In other words, a corner can be interpreted as the junction of two edges, where an edge is a sudden change in image brightness. Corners are the important features in the image, and they are generally termed as interest points which are invariant to translation, rotation and illumination. Although corners are only a small percentage of the image, they contain the most important features in restoring image information, and they can be used to minimize the amount of processed data for motion tracking, image stitching, building 2D mosaics, stereo vision, image representation and other related computer vision areas. In order to capture the corners from the image, researchers have proposed many different corner detectors including the Kanade-Lucas-Tomasi (KLT) operator and the Harris operator which are most simple, efficient and reliable for use in corner detection. These two popular methodologies are both closely associated with and based on the local structure matrix. Compared to the Kanade-Lucas-Tomasi corner detector, the Harris corner detector provides good repeatability under changing illumination and rotation, and therefore, it is more often used in stereo matching and image database retrieval. Although there still exist drawbacks and limitations, the Harris corner detector is still an important and fundamental technique for many computer vision applications. == Development of Harris corner detection algorithm == Source: Without loss of generality, we will assume a grayscale 2-dimensional image is used. Let this image be given by I {\displaystyle I} . Consider taking an image patch ( x , y ) ∈ W {\displaystyle (x,y)\in W} (window) and shifting it by ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} . The sum of squared differences (SSD) between these two patches, denoted f {\displaystyle f} , is given by: f ( Δ x , Δ y ) = ∑ ( x k , y k ) ∈ W ( I ( x k , y k ) − I ( x k + Δ x , y k + Δ y ) ) 2 {\displaystyle f(\Delta x,\Delta y)={\underset {(x_{k},y_{k})\in W}{\sum }}\left(I(x_{k},y_{k})-I(x_{k}+\Delta x,y_{k}+\Delta y)\right)^{2}} I ( x + Δ x , y + Δ y ) {\displaystyle I(x+\Delta x,y+\Delta y)} can be approximated by a Taylor expansion. Let I x {\displaystyle I_{x}} and I y {\displaystyle I_{y}} be the partial derivatives of I {\displaystyle I} , such that I ( x + Δ x , y + Δ y ) ≈ I ( x , y ) + I x ( x , y ) Δ x + I y ( x , y ) Δ y {\displaystyle I(x+\Delta x,y+\Delta y)\approx I(x,y)+I_{x}(x,y)\Delta x+I_{y}(x,y)\Delta y} This produces the approximation f ( Δ x , Δ y ) ≈ ∑ ( x , y ) ∈ W ( I x ( x , y ) Δ x + I y ( x , y ) Δ y ) 2 , {\displaystyle f(\Delta x,\Delta y)\approx {\underset {(x,y)\in W}{\sum }}\left(I_{x}(x,y)\Delta x+I_{y}(x,y)\Delta y\right)^{2},} which can be written in matrix form: f ( Δ x , Δ y ) ≈ ( Δ x Δ y ) M ( Δ x Δ y ) , {\displaystyle f(\Delta x,\Delta y)\approx {\begin{pmatrix}\Delta x&\Delta y\end{pmatrix}}M{\begin{pmatrix}\Delta x\\\Delta y\end{pmatrix}},} where M is the structure tensor, M = ∑ ( x , y ) ∈ W [ I x 2 I x I y I x I y I y 2 ] = [ ∑ ( x , y ) ∈ W I x 2 ∑ ( x , y ) ∈ W I x I y ∑ ( x , y ) ∈ W I x I y ∑ ( x , y ) ∈ W I y 2 ] {\displaystyle M={\underset {(x,y)\in W}{\sum }}{\begin{bmatrix}I_{x}^{2}&I_{x}I_{y}\\I_{x}I_{y}&I_{y}^{2}\end{bmatrix}}={\begin{bmatrix}{\underset {(x,y)\in W}{\sum }}I_{x}^{2}&{\underset {(x,y)\in W}{\sum }}I_{x}I_{y}\\{\underset {(x,y)\in W}{\sum }}I_{x}I_{y}&{\underset {(x,y)\in W}{\sum }}I_{y}^{2}\end{bmatrix}}} == Process of Harris corner detection algorithm == Commonly, Harris corner detector algorithm can be divided into five steps. Color to grayscale Spatial derivative calculation Structure tensor setup Harris response calculation Non-maximum suppression === Color to grayscale === If we use Harris corner detector in a color image, the first step is to convert it into a grayscale image, which will enhance the processing speed. The value of the gray scale pixel can be computed as a weighted sums of the values R, B and G of the color image, ∑ C ∈ { R , G , B } w C ⋅ C {\displaystyle \sum _{C\,\in \,\{R,G,B\}}w_{C}\cdot C} , where, e.g., w R = 0.299 , w G = 0.587 , w B = 1 − ( w R + w G ) = 0.114. {\displaystyle w_{R}=0.299,\ w_{G}=0.587,\ w_{B}=1-(w_{R}+w_{G})=0.114.} === Spatial derivative calculation === Next, we are going to find the derivative with respect to x and the derivative with respect to y, I x ( x , y ) {\displaystyle I_{x}(x,y)} and I y ( x , y ) {\displaystyle I_{y}(x,y)} . This can be approximated by applying Sobel operators. === Structure tensor setup === With I x ( x , y ) {\displaystyle I_{x}(x,y)} , I y ( x , y ) {\displaystyle I_{y}(x,y)} , we can construct the structure tensor M {\displaystyle M} . === Harris response calculation === For x ≪ y {\displaystyle x\ll y} , one has x ⋅ y x + y = x 1 1 + x / y ≈ x . {\displaystyle {\tfrac {x\cdot y}{x+y}}=x{\tfrac {1}{1+x/y}}\approx x.} In this step, we compute the smallest eigenvalue of the structure tensor using that approximation: λ min ≈ λ 1 λ 2 ( λ 1 + λ 2 ) = det ( M ) tr ⁡ ( M ) {\displaystyle \lambda _{\min }\approx {\frac {\lambda _{1}\lambda _{2}}{(\lambda _{1}+\lambda _{2})}}={\frac {\det(M)}{\operatorname {tr} (M)}}} with the trace t r ( M ) = m 11 + m 22 {\displaystyle \mathrm {tr} (M)=m_{11}+m_{22}} . Another commonly used Harris response calculation is shown as below, R = λ 1 λ 2 − k ( λ 1 + λ 2 ) 2 = det ( M ) − k tr ⁡ ( M ) 2 {\displaystyle R=\lambda _{1}\lambda _{2}-k(\lambda _{1}+\lambda _{2})^{2}=\det(M)-k\operatorname {tr} (M)^{2}} where k {\displaystyle k} is an empirically determined constant; k ∈ [ 0.04 , 0.06 ] {\displaystyle k\in [0.04,0.06]} . === Non-maximum suppression === In order to pick up the optimal values to indicate corners, we find the local maxima as corners within the window which is a 3 by 3 filter. == Improvement == Sources: Harris-Laplace Corner Detector Differential Morphological Decomposition Based Corner Detector Multi-scale Bilateral Structure Tensor Based Corner Detector == Applications == Image Alignment, Stitching and Registration 2D Mosaics Creation 3D Scene Modeling and Reconstruction Motion Detection Object Recognition Image Indexing and Content-based Retrieval Video Tracking

Emotion recognition

Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables. == Human == Humans show a great deal of variability in their abilities to recognize emotion. A key point to keep in mind when learning about automated emotion recognition is that there are several sources of "ground truth", or truth about what the real emotion is. Suppose we are trying to recognize the emotions of Alex. One source is "what would most people say that Alex is feeling?" In this case, the 'truth' may not correspond to what Alex feels, but may correspond to what most people would say it looks like Alex feels. For example, Alex may actually feel sad, but he puts on a big smile and then most people say he looks happy. If an automated method achieves the same results as a group of observers it may be considered accurate, even if it does not actually measure what Alex truly feels. Another source of 'truth' is to ask Alex what he truly feels. This works if Alex has a good sense of his internal state, and wants to tell you what it is, and is capable of putting it accurately into words or a number. However, some people are alexithymic and do not have a good sense of their internal feelings, or they are not able to communicate them accurately with words and numbers. In general, getting to the truth of what emotion is actually present can take some work, can vary depending on the criteria that are selected, and will usually involve maintaining some level of uncertainty. == Automatic == Decades of scientific research have been conducted developing and evaluating methods for automated emotion recognition. There is now an extensive literature proposing and evaluating hundreds of different kinds of methods, leveraging techniques from multiple areas, such as signal processing, machine learning, computer vision, and speech processing. Different methodologies and techniques may be employed to interpret emotion such as Bayesian networks. , Gaussian Mixture models and Hidden Markov Models and deep neural networks. === Approaches === The accuracy of emotion recognition is usually improved when it combines the analysis of human expressions from multimodal forms such as texts, physiology, audio, or video. Different emotion types are detected through the integration of information from facial expressions, body movement and gestures, and speech. The technology is said to contribute in the emergence of the so-called emotional or emotive Internet. The existing approaches in emotion recognition to classify certain emotion types can be generally classified into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. ==== Knowledge-based techniques ==== Knowledge-based techniques (sometimes referred to as lexicon-based techniques), utilize domain knowledge and the semantic and syntactic characteristics of text and potentially spoken language in order to detect certain emotion types. In this approach, it is common to use knowledge-based resources during the emotion classification process such as WordNet, SenticNet, ConceptNet, and EmotiNet, to name a few. One of the advantages of this approach is the accessibility and economy brought about by the large availability of such knowledge-based resources. A limitation of this technique on the other hand, is its inability to handle concept nuances and complex linguistic rules. Knowledge-based techniques can be mainly classified into two categories: dictionary-based and corpus-based approaches. Dictionary-based approaches find opinion or emotion seed words in a dictionary and search for their synonyms and antonyms to expand the initial list of opinions or emotions. Corpus-based approaches on the other hand, start with a seed list of opinion or emotion words, and expand the database by finding other words with context-specific characteristics in a large corpus. While corpus-based approaches take into account context, their performance still vary in different domains since a word in one domain can have a different orientation in another domain. ==== Statistical methods ==== Statistical methods commonly involve the use of different supervised machine learning algorithms in which a large set of annotated data is fed into the algorithms for the system to learn and predict the appropriate emotion types. Machine learning algorithms generally provide more reasonable classification accuracy compared to other approaches, but one of the challenges in achieving good results in the classification process, is the need to have a sufficiently large training set. Some of the most commonly used machine learning algorithms include Support Vector Machines (SVM), Naive Bayes, and Maximum Entropy. Deep learning, which is under the unsupervised family of machine learning, is also widely employed in emotion recognition. Well-known deep learning algorithms include different architectures of Artificial Neural Network (ANN) such as Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), and Extreme Learning Machine (ELM). The popularity of deep learning approaches in the domain of emotion recognition may be mainly attributed to its success in related applications such as in computer vision, speech recognition, and Natural Language Processing (NLP). ==== Hybrid approaches ==== Hybrid approaches in emotion recognition are essentially a combination of knowledge-based techniques and statistical methods, which exploit complementary characteristics from both techniques. Some of the works that have applied an ensemble of knowledge-driven linguistic elements and statistical methods include sentic computing and iFeel, both of which have adopted the concept-level knowledge-based resource SenticNet. The role of such knowledge-based resources in the implementation of hybrid approaches is highly important in the emotion classification process. Since hybrid techniques gain from the benefits offered by both knowledge-based and statistical approaches, they tend to have better classification performance as opposed to employing knowledge-based or statistical methods independently. A downside of using hybrid techniques however, is the computational complexity during the classification process. === Datasets === Data is an integral part of the existing approaches in emotion recognition and in most cases it is a challenge to obtain annotated data that is necessary to train machine learning algorithms. For the task of classifying different emotion types from multimodal sources in the form of texts, audio, videos or physiological signals, the following datasets are available: HUMAINE: provides natural clips with emotion words and context labels in multiple modalities Belfast database: provides clips with a wide range of emotions from TV programs and interview recordings SEMAINE: provides audiovisual recordings between a person and a virtual agent and contains emotion annotations such as angry, happy, fear, disgust, sadness, contempt, and amusement IEMOCAP: provides recordings of dyadic sessions between actors and contains emotion annotations such as happiness, anger, sadness, frustration, and neutral state eNTERFACE: provides audiovisual recordings of subjects from seven nationalities and contains emotion annotations such as happiness, anger, sadness, surprise, disgust, and fear DEAP: provides electroencephalography (EEG), electrocardiography (ECG), and face video recordings, as well as emotion annotations in terms of valence, arousal, and dominance of people watching film clips DREAMER: provides electroencephalography (EEG) and electrocardiography (ECG) recordings, as well as emotion annotations in terms of valence, dominance of people watching film clips MELD: is a multiparty conversational dataset where each utterance is labeled with emotion and sentiment. MELD provides conversations in video format and hence suitable for multimodal emotion recognition and sentiment analysis. MELD is useful for multimodal sentiment analysis and emotion recognition, dialogue systems and emotion recognition in conversations. MuSe: provides audiovisual recordings of natural interactions between a person and an object. It has discrete and continuous emotion annotations in terms of valence, arousal and trustworthiness as well as speech topics useful for multimodal sentiment analysis and emotion recognition. UIT-VSMEC: is a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese