
H2O.ai WIKI

Unsupervised Machine Learning

What is Unsupervised Machine Learning?

Unsupervised machine learning algorithms infer patterns from a dataset without a known or labeled outcome. By comparison, supervised machine learning algorithms are trained on data with a known, labeled outcome. This distinction explains why unsupervised machine learning methods cannot be applied directly to regression or classification problems: you don’t know what the value/answer for the output data should be, so you cannot train an algorithm the way you typically would.

Unsupervised learning can, however, be used to discover the underlying structure of the data. Take this everyday example: you’re at a grocery store and see an unlabeled fruit that you’ve never seen before. Based on your observations of the unknown fruit’s shape, size, or color, you can easily tell it apart from other fruit nearby, even though nobody has told you its name. This is roughly how unsupervised machine learning works.

What are Examples of Unsupervised Machine Learning?

Below are two examples of unsupervised machine learning.

Clustering: Finding Customer Segments

Clustering is an unsupervised machine learning technique with the goal of finding natural groups, or clusters, in a feature space and using them to interpret the input data. A common approach is to partition the data points so that each point is more similar to the other points in its own group than to points in other groups, based on a predefined similarity or distance metric in the feature space.

Clustering techniques are used to determine customer segments in marketing data. Identifying distinct segments of customers helps marketing teams tailor their approach to each segment. (Think of demographics such as gender, location, age, education, income bracket, and so on.)
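The partitioning idea described above can be sketched with k-means, one common clustering algorithm. This is a minimal illustration using only the Python standard library, not a production implementation; the function name kmeans, the Euclidean distance metric, and the customer data (age and annual income in thousands) are all assumptions made up for the example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters with Lloyd's algorithm (Euclidean distance)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers: (age, annual income in thousands)
customers = [(22, 25), (25, 30), (24, 28), (51, 95), (55, 100), (53, 90)]
labels, centroids = kmeans(customers, k=2)
print(labels)     # one cluster id per customer
print(centroids)  # the "average customer" of each segment
```

No labels are supplied at any point: the two segments emerge only from distances between the customers. In practice the features would usually be scaled first so that no single column dominates the distance.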

Dimensionality Reduction: Reducing the Complexity of a Problem

Dimensionality reduction is an unsupervised machine learning technique with the goal of reducing the number of random variables under consideration. A common use of dimensionality reduction is to reduce the complexity of a problem by projecting the feature space to a lower-dimensional space, so that a machine learning system works with fewer, less redundant variables.

Common approaches to dimensionality reduction include the PCA, t-SNE, and UMAP algorithms. They are valuable for reducing the complexity of a problem and for visualizing the data instances in a way that is easier to understand.
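As a rough illustration of projecting to a lower-dimensional space, the sketch below finds the first principal component of a small two-feature dataset and reduces each row to a single number. It uses power iteration on the covariance matrix, which is one simple way to obtain the leading component and not necessarily how a PCA library computes it. The function names and the data are hypothetical, and the sketch assumes the data is not constant.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (mean, unit direction of greatest variance) via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, direction):
    """Reduce each row to one coordinate along the given direction."""
    return [sum((x - m) * u for x, m, u in zip(row, mean, direction)) for row in rows]

# Hypothetical data: two strongly correlated features
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.8)]
mean, direction = first_principal_component(data)
scores = project(data, mean, direction)
print(direction)  # direction along which the data varies most
print(scores)     # one value per row instead of two
```

Because the two features move together, a single coordinate along the principal direction preserves almost all of the variation in the original pair.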

Why is Unsupervised Machine Learning Important?

Unsupervised machine learning is useful when you do not have data on the desired outcome. One example is determining a target market for a new product or service that a business has never sold before.

Unsupervised Machine Learning FAQs

What is unsupervised machine learning?

Unsupervised machine learning algorithms infer patterns from a dataset without a known or labeled outcome.

What is an example of unsupervised machine learning?

Clustering is an unsupervised machine learning technique with the goal of finding natural groups or clusters in a feature space and interpreting the input data.

What is the difference between supervised and unsupervised machine learning?

Unsupervised machine learning algorithms infer patterns from a dataset without a known or labeled outcome. By comparison, supervised machine learning algorithms are trained to predict a known labeled outcome.

Is machine learning supervised or unsupervised?

Both. Within the field of machine learning, there are two main types of tasks: supervised and unsupervised machine learning. There’s also semi-supervised learning for cases where you have far more unlabeled data than labeled data but still want to use the labels to improve the model.

What is the difference between machine learning and deep learning?

Machine learning is a branch of artificial intelligence and computer science that focuses on using data and algorithms to derive key insights from data. Deep learning is a sub-field of machine learning that works particularly well on unstructured data such as images or text, because it can learn abstractions of the data by itself using layered neural networks loosely inspired by the human brain.

H2O.ai and Unsupervised Machine Learning

H2O AI Cloud is a platform that helps data scientists apply unsupervised machine learning models to their datasets much faster. From H2O-3’s scalable clustering and anomaly detection methods that work on terabytes of data to H2O Driverless AI’s customizable recipes that enable unsupervised AutoML AI Apps such as H2O AutoInsights, H2O allows data scientists to get past the technology layer that changes on a daily basis and get straight to making, operating, and innovating with AI. As a result, businesses are able to innovate faster using proven AI technology. H2O.ai enables teams of data scientists, developers, machine-learning engineers, DevOps, IT professionals, and business users to work together with the same toolset toward a common goal.

Unsupervised Machine Learning vs Other Technologies & Methodologies

Unsupervised machine learning vs supervised

With supervised learning, input data is provided to the model along with the output. For unsupervised learning, only input data is provided to the model.
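The difference in what each kind of model receives can be made concrete with a small sketch. The two functions below are hypothetical illustrations, not standard library APIs: fit_line is a supervised least-squares fit that needs both inputs and outputs, while split_at_largest_gap is a deliberately simple unsupervised grouping that sees only the inputs.

```python
def fit_line(xs, ys):
    """Supervised: learn slope and intercept from inputs AND known outputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def split_at_largest_gap(xs):
    """Unsupervised: group the inputs alone by cutting at the widest gap."""
    ordered = sorted(xs)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps))
    threshold = (ordered[cut] + ordered[cut + 1]) / 2
    return [0 if x < threshold else 1 for x in xs]

inputs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
outputs = [3.0, 5.0, 7.0, 21.0, 23.0, 25.0]  # made-up labels following y = 2x + 1

slope, intercept = fit_line(inputs, outputs)  # uses inputs and outputs
groups = split_at_largest_gap(inputs)         # uses inputs only
print(slope, intercept)
print(groups)
```

The supervised function can be checked against the known outputs, whereas the unsupervised one has no answer key and can only report structure it finds in the inputs.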

Unsupervised machine learning vs neural network

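Unsupervised learning describes how a model is trained: it only has input data and no corresponding output variables. A neural network describes how a model is built: data passes through interconnected layers of nodes, with each layer extracting characteristics and information before passing its results on to the nodes in subsequent layers. The two are not alternatives, since a neural network can be trained in either a supervised or an unsupervised way.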
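The flow of data through layers of nodes can be sketched as a plain forward pass. This is a minimal illustration with hand-picked, hypothetical weights and a tanh activation chosen for the example; it shows only how results move from layer to layer, not how a network is trained.

```python
import math

def dense(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all inputs, then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass data through each layer in order, feeding its outputs to the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical weights: 3 inputs -> 2 hidden nodes -> 1 output node
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 2.0, 3.0], layers)
print(output)
```

Whether such a network counts as supervised or unsupervised depends on how its weights are learned, which this sketch leaves out.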

Algorithms vs Unsupervised machine learning

With supervised learning, the algorithm learns on a labeled dataset, which provides an answer key that the algorithm can use to evaluate its accuracy on training data. With unsupervised learning, the algorithm is given unlabeled data and tries to make sense of it by extracting features and patterns on its own.