Return to page

WIKI

What is a Target Variable in Machine Learning?

The target variable is the feature of a dataset that you want to understand more clearly. It is the variable that the user would want to predict using the rest of the dataset. In most situations, a supervised machine learning algorithm is used to derive the target variable. Such an algorithm uses historical data to learn patterns and uncover relationships between other parts of your dataset and the target. Target variables may vary depending on the goal and available data. 
 

Why are Target Variables Important?

In the absence of a labeled target, supervised machine learning algorithms would not be able to map available data to outcomes.

  • Example:

    A child would be incapable of figuring out that dogs are called dogs without being told a few times. Well-defined targets are important, as the only thing the algorithm does is learn a function that maps the relationship between input data and the target. The model’s outcomes mean nothing if the target isn’t well understood.
     

How Do I Determine the Target Variable? 

Determining the target variable can require running an existing suboptimal system until enough training data is collected.

  • Example:

    When building a machine learning solution for measuring customer attrition in the telecommunication industry, you need to spend significant time observing, weeks or even months, as some customers unsubscribe and others renew. Once you have enough training instances to build an accurate machine learning model, you can begin using machine learning in production.
     

Target Variable FAQs

Is the target an independent variable?

No. With data mining tools, the dependent variable is assigned a role as the target variable, while an independent variable may be given a role as the regular variable.

What type of target variable is used for a regression algorithm?

In the case of regression models, the target is real-valued, i.e. value is in real numbers. In comparison, with a classification model, the target is binary or multivalued. 

What is a binary variable in statistics?

A binary variable in statistics is a variable with only two values. Examples include:

  • 1 / 0

  • Yes / No

  • Success / Failure

  • Male / Female

  • Black / White
     

H2O.ai and Target Variables

When building supervised machine learning models at H2O, you can specify your target variable to state which feature in your dataset you want the model to predict.

  • When using H2O-3 for distributed and open source models, use the y parameter 

  • When using Driverless AI for AutoML experiments, use the Target Column parameter

  • When using Hydrogen Torch for no-code deep learning, use the Label Column(s) parameter. Please note that some Hydrogen Torch use cases, such as Text Regression, can have more than one target column or multiple labels per row. 

When using Auto Insights to explore your datasets, you can specify a Target Column to get insights focused on the data you will be predicting in your models

 

Target Variables vs Other Technologies & Methodologies

Target variable vs predictor variable

The target variable is the variable whose values are modeled and predicted by other variables. A predictor variable is a variable whose values will be used to predict the value of the target variable.