H2O.ai WIKI

Multiclass Classification

What is Multiclass Classification?

Multiclass classification is the process of assigning each entity to one of more than two classes. Each entity is assigned to exactly one class, with no overlap. For example, given images of vegetables where each image shows a carrot, a tomato, or a zucchini, each image is placed in exactly one of the three classes. One image cannot be both a carrot and a zucchini.

 

Classifiers Used in Multiclass Classification

Various algorithms are used in multiclass classification, such as Naive Bayes, neural networks, k-nearest neighbors (kNN), and decision trees.

Naive Bayes 

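Naive Bayes classifiers are parametric algorithms based on Bayes' theorem. They rely on the "naive" assumption that features are independent of one another given the class, which simplifies learning, and they use a fixed number of parameters that does not grow with the size of the training data.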

 

Advantages

  • Easy to build and efficient for training models and making predictions

  • Requires little or no hyperparameter tuning to give good results

  • Excellent text classifier

Disadvantage

  • In its common forms it is a linear classifier, so it is poorly suited to classes that are not linearly separable
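
To make the independence assumption concrete, here is a minimal sketch of a Naive Bayes text classifier written with only the Python standard library. It is an illustration under stated assumptions, not a reference implementation: the function names `train_naive_bayes` and `predict_naive_bayes`, the toy vegetable descriptions, and the choice of add-one (Laplace) smoothing are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, words):
    """Return the class with the highest log-posterior for a list of words."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        # Log prior: fraction of training documents that belong to this class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Each word contributes independently (the "naive" assumption).
            # Add-one smoothing keeps unseen word/class pairs from scoring zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: short descriptions of three vegetables.
documents = [
    ["orange", "long", "root"],
    ["orange", "crunchy", "root"],
    ["red", "round", "juicy"],
    ["red", "round", "soft"],
    ["green", "long", "soft"],
    ["green", "long", "squash"],
]
labels = ["carrot", "carrot", "tomato", "tomato", "zucchini", "zucchini"]

model = train_naive_bayes(documents, labels)
prediction = predict_naive_bayes(model, ["red", "juicy"])
print(prediction)
```

The model stores only counts, so its size depends on the number of classes and distinct words, not on how many training documents were seen.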

K-nearest neighbor 

K-nearest neighbor (kNN), a simple and powerful supervised machine learning (ML) algorithm, is used to solve regression and classification problems. It memorizes training data instead of learning a discriminative function from the data.

Advantages

  • kNN is a non-parametric classifier, and does not make assumptions about the distribution of classes.

  • Handles multiclass problems directly, with no change to the algorithm

  • Individual outliers have limited influence when k is sufficiently large

  • It is easy to implement

Disadvantages

  • The value of k must be tuned to the data (for example, by validating on held-out data), and a good value can be difficult to find.

  • It memorizes instead of learning.

  • It measures the distance to every training data point at prediction time, making it computationally expensive.
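
The memorize-then-vote behavior described above can be sketched in a few lines of standard-library Python. This is only an illustration: the function name `knn_predict`, the (length, width) features, and the measurements are invented for the example, and Euclidean distance is one assumed choice of metric among several.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Measure the Euclidean distance from the query to every stored point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Keep the k closest points and count how often each label appears.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up features: (length in cm, width in cm) for three vegetables.
training_points = [
    (18.0, 3.0), (20.0, 2.5), (16.0, 3.5),   # carrots
    (6.0, 6.0), (7.0, 6.5), (5.5, 5.5),      # tomatoes
    (22.0, 5.0), (25.0, 6.0), (20.0, 5.5),   # zucchini
]
training_labels = ["carrot"] * 3 + ["tomato"] * 3 + ["zucchini"] * 3

prediction = knn_predict(training_points, training_labels, (6.5, 6.0), k=3)
print(prediction)
```

There is no training step here: the "model" is the stored data itself, and every prediction scans all of it, which is why the cost grows with the size of the training set.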

Decision Trees

Like flow charts, decision trees are structures that use conditional statements to guide a decision path. Because they are easy to interpret and can achieve a high level of accuracy, decision trees are one of the most frequently used supervised learning methods. Additionally, they can model both linear and non-linear relationships.

Advantages

  • Decision trees are ideal for classification.

  • Can handle binary or multiclass classification problems.

Disadvantages

  • Can create overly complex trees that overfit the training data.

  • Small variations in the data can result in an entirely different tree being created.

  • Can create biased trees if one or more classes dominate the training data.
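
The flow-chart analogy can be illustrated with a small hand-built tree. Note the assumptions: in practice a tree is learned from training data, whereas the `tree` structure, its feature names, and its thresholds below are invented purely to show how a prediction follows one conditional test after another until it reaches a leaf.

```python
# Hand-built tree: internal nodes test one feature against a threshold,
# and leaves hold a class label. All thresholds are made up for illustration.
tree = {
    "feature": "width_cm",
    "threshold": 4.0,
    "below": {"label": "carrot"},
    "above": {
        "feature": "length_cm",
        "threshold": 12.0,
        "below": {"label": "tomato"},
        "above": {"label": "zucchini"},
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, taking one branch per conditional test."""
    while "label" not in node:
        goes_below = sample[node["feature"]] <= node["threshold"]
        node = node["below"] if goes_below else node["above"]
    return node["label"]


samples = [
    {"length_cm": 18.0, "width_cm": 3.0},
    {"length_cm": 6.0, "width_cm": 6.0},
    {"length_cm": 22.0, "width_cm": 5.0},
]
predictions = [predict(tree, sample) for sample in samples]
print(predictions)
```

Because each leaf can hold any label, the same structure serves binary and multiclass problems alike; only the set of leaf labels changes.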

 

Binary vs. Multiclass Classification

The algorithms discussed above can be used in both binary and multiclass classification. However, the two kinds of classification differ in what they do and how they are used.

Binary classification is used to organize data into two classes. Examples of binary classification include: email spam detection, churn prediction, and conversion prediction.

Multiclass classification is used to organize data into three or more classes. Examples of multiclass classification include: face classification, animal species classification, and optical character recognition.

 

How is Multiclass Classification Used?

Multiclass classification has a variety of applications. It can be used to identify animals from images and sort them into categories. Cybersecurity companies can use it to sort incoming emails into more than two categories, for example legitimate, spam, and phishing, rather than simply spam or not spam. It can also be used to classify an individual's mood into more than just positive or negative, using multiple categories such as happy, sad, depressed, and excited. There is no limit to the number of classes used in multiclass classification.