What Are Regression Trees?

In machine learning (ML), decision trees are algorithms that use a series of if-else decisions to classify input data based on the answers at each node. Regression trees are a specific form of decision tree used to predict numerical outputs rather than class labels. A regression tree is built from a data set, drawn either from historical data or from an experiment. The data set contains input variables, called predictors, and an output variable that the user wants to predict. The tree is then used to predict the outcome for future inputs.
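
The following is a minimal sketch of this idea using scikit-learn's DecisionTreeRegressor. The predictors (dosage, age) and the output values are purely illustrative, not from any real data set.

# A minimal sketch of fitting and using a regression tree with scikit-learn.
# The predictors (dosage, age) and target values below are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Predictors: hypothetical dosage and age columns
X = np.array([[10, 25], [20, 30], [30, 35], [40, 40],
              [50, 45], [60, 50], [70, 55], [80, 60]])
# Output variable we want to predict, e.g., a measured effectiveness
y = np.array([5.0, 20.0, 60.0, 85.0, 90.0, 70.0, 40.0, 15.0])

tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# Predict the numeric outcome for a new, unseen input
print(tree.predict([[35, 38]]))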

How to Build A Regression Tree

Building a regression tree begins with careful analysis of the data set. The data is plotted against each predictor, much as it would be for a regression. Then, for each predictor, candidate split points are evaluated by computing the sum of squared residuals (RSS): the data are divided into the points below and above the candidate split, and the squared differences between each observation and the mean of its group are summed. A higher RSS represents greater variance around the group averages. The value that produces the minimum RSS for each predictor is selected as that predictor's threshold. The predictor with the lowest of these minimum RSS values becomes the root of the tree, and the tree is then built downward by repeating the same RSS-based split search on each resulting subset.
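
This split search can be sketched as follows for a single numeric predictor. The rss_for_split and best_split helpers and the sample data are hypothetical illustrations of the calculation described above, not a reference implementation.

# Sketch of the RSS-based split search for one numeric predictor.
# For each candidate threshold, the data are split into the points below
# and above it, and the RSS sums the squared differences between each
# observation and the mean of its own group.
import numpy as np

def rss_for_split(x, y, threshold):
    left, right = y[x <= threshold], y[x > threshold]
    rss = 0.0
    for group in (left, right):
        if group.size:
            rss += np.sum((group - group.mean()) ** 2)
    return rss

def best_split(x, y):
    # Candidate thresholds: midpoints between consecutive unique values
    xs = np.unique(x)
    candidates = (xs[:-1] + xs[1:]) / 2
    scores = [(rss_for_split(x, y, t), t) for t in candidates]
    return min(scores)  # (lowest RSS, corresponding threshold)

# Illustrative data for one predictor
x = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
y = np.array([5, 20, 60, 85, 90, 70, 40, 15], dtype=float)
print(best_split(x, y))

Repeating this search for every predictor and taking the predictor with the lowest minimum RSS gives the root node; recursing on the two resulting subsets grows the tree downward.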

How to Prune A Regression Tree

When first created, a new regression tree is prone to overfitting. Regression tree pruning is the process of optimizing a regression tree by removing decision nodes in order to balance the bias and variance of the output. Pruning modifies the tree to create different variations and finds the tree that performs best on a validation data set. Pruning is often performed in reverse order, meaning the last node generated is the first to be considered for removal. The cost-complexity pruning algorithm can help guide this process. Other methods allow multiple tree variants to work together to provide a more accurate and better-supported prediction; these include bagging, boosting, and the random forest method.
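
As one concrete illustration, the sketch below uses scikit-learn's cost-complexity pruning support (the cost_complexity_pruning_path method and the ccp_alpha parameter) and keeps the pruned tree that scores best on a held-out validation set. The synthetic data and the train/validation split are assumptions for the example.

# Sketch of cost-complexity pruning with scikit-learn, assuming a training
# set and a separate validation set are available. The data here are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)  # illustrative data

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Compute the pruning path of a fully grown tree: each alpha corresponds
# to a progressively smaller pruned tree.
full_tree = DecisionTreeRegressor(random_state=0)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = None, -np.inf
for alpha in path.ccp_alphas:
    pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    score = pruned.score(X_val, y_val)  # R^2 on the validation set
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)

The tree pruned with the selected alpha trades a small increase in training error for better performance on data it has not seen, which is the goal of pruning.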