H2O.ai WIKI

Regression

What is regression?

Regression is a statistical technique used to study the relationship between independent and dependent variables. In machine learning, regression analysis refers to a set of methods that predict a continuous outcome variable (y) from the values of one or more predictor variables (x).

As a result, it helps establish a relationship between the variables by estimating how one variable affects the other.
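As an illustration of estimating how one variable affects another, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The function name `fit_line` and the house data are made up for this example and do not come from any H2O product.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation of x with y, and variation of x around its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data: house size (square meters) and price (thousands).
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 250, 300, 350]

intercept, slope = fit_line(sizes, prices)
predicted = intercept + slope * 100  # predicted price for a 100 square meter house

print(f"price = {intercept:.1f} + {slope:.2f} * size")
print(f"predicted price for 100 square meters: {predicted:.1f}")
```

The slope is the estimated change in the outcome (price) for a one-unit change in the predictor (size), which is the "how one variable affects the other" part of the definition.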

Examples of regression

Regression models are widely used today. For example, regression analysis can predict a house's price given its features, predict the impact of SAT/GRE scores on college admissions, predict sales based on input parameters, and predict the weather.

Furthermore, there are different types of regression algorithms, such as linear regression, regression trees, lasso regression, and multivariate regression, that can help with the following:

  • Predicting a person's age
  • Predicting the future stock price of a company

Predicting a categorical label, such as a person's nationality or whether a stock price will increase, is a classification task rather than a regression task.

Why is regression important?

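Regression matters because it quantifies how predictor variables affect a continuous outcome and lets that outcome be predicted for new data, such as a house's price or a sales figure. It should not be confused with text classification: with customer sentiment, the goal is to predict the class label of a given piece of text, like a tweet or a product review. That task is widely used in the e-commerce industry to help determine whether customers have made negative comments, and it calls for a classification model rather than a regression model.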

H2O AI Cloud and Regression

H2O-3 provides a variety of metrics that can be used for evaluating supervised and unsupervised models. 

These include evaluation metrics for regression models. (Note that H2O-3 also calculates regression metrics for classification problems.)
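This page does not list the individual metrics, so the sketch below is a generic illustration rather than the H2O-3 API. It computes four widely used regression metrics (MSE, RMSE, MAE, and R squared) in plain Python. The function name `regression_metrics` and the sample values are assumptions made for the example.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e ** 2 for e in errors)              # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,                                   # mean squared error
        "rmse": math.sqrt(mse),                       # root mean squared error
        "mae": sum(abs(e) for e in errors) / n,       # mean absolute error
        "r2": 1.0 - ss_res / ss_tot,                  # coefficient of determination
    }


# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MSE, RMSE, and MAE indicate predictions closer to the actual values, while R squared closer to 1 indicates that the model explains more of the variation in the outcome.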

Regression vs. Other Technologies & Methodologies

Regression vs. classification

Classification and regression are both examples of supervised learning. Both work with labeled datasets and are used for prediction. They differ, however, in the machine learning tasks they are applied to.

The critical difference between regression and classification is that regression predicts a continuous quantity, whereas classification predicts discrete class labels. The two types of machine learning algorithms also overlap in some cases:

  • A regression algorithm can predict a discrete value when it takes the form of an integer quantity
  • A classification algorithm can predict a continuous value when it takes the form of a probability for a class label

In essence, the output variable is numerical (continuous) for regression and categorical (discrete) for classification.
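The difference in outputs can be sketched in a few lines. The coefficients, the threshold, and the function names below are hypothetical and chosen only to show the shape of each output. They are not trained models.

```python
import math


def predict_price(size_m2):
    """Regression: return a continuous quantity (hypothetical coefficients)."""
    return 25.0 + 2.5 * size_m2


def negative_probability(score):
    """Logistic function: map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def classify_review(score, threshold=0.5):
    """Classification: turn the probability into a discrete class label."""
    return "negative" if negative_probability(score) >= threshold else "positive"


price = predict_price(100)               # continuous output
probability = negative_probability(2.0)  # continuous, but describes a class
label = classify_review(2.0)             # discrete output

print(price, round(probability, 3), label)
```

The classifier still computes a continuous number internally (the probability), but what it reports is a label from a fixed set, whereas the regression function reports the number itself.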

Regression vs. machine learning

A machine learning algorithm is assessed by testing its accuracy against a set of data, typically data held out from training. A statistical model, by contrast, is appraised through confidence intervals, significance tests, and other tests that analyze the parameters of the regression. Machine learning is judged on results, much like an employee who is measured solely by the quality of their performance. Statistical modeling is primarily concerned with finding correlations between variables, assessing their significance, and then making predictions.

Regression vs. correlation

Correlation and regression describe the relationship between two variables in different ways. Correlation analysis indicates the strength and direction of the linear relationship between two variables. In contrast, simple linear regression analysis estimates the parameters of a linear equation that can predict the values of one variable from the other. In short, correlation produces a single statistic, whereas regression produces an entire equation.
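To make the contrast concrete, the sketch below computes both results from the same made-up data: the Pearson correlation coefficient (one number) and the least squares line (an intercept and a slope). The function name `correlation_and_line` is an assumption for this example.

```python
import math


def correlation_and_line(xs, ys):
    """Return (r, intercept, slope) for paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)        # correlation: a single statistic
    slope = sxy / sxx                     # regression: parameters of an equation
    intercept = mean_y - slope * mean_x
    return r, intercept, slope


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, intercept, slope = correlation_and_line(xs, ys)
print(f"correlation r = {r:.4f}")
print(f"regression line: y = {intercept:.2f} + {slope:.2f} * x")
```

The two are related: in simple linear regression the slope equals r multiplied by the ratio of the standard deviation of y to that of x, so the sign of the slope always matches the sign of the correlation.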

Regression vs. ANOVA

Regression analysis is done with variables that are fixed or independent in nature. It can use a single independent variable or multiple independent variables. ANOVA, by contrast, predicts a continuous outcome based on one or more categorical predictor variables. The main difference between linear regression and ANOVA is the way commonly used statistical software reports the results.
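A minimal sketch of a one-way ANOVA, assuming a single categorical predictor with three groups, shows how closely the two are connected. The function name `one_way_anova` and the data are made up for illustration. The R squared value is included because a regression on indicator (dummy) variables for the groups predicts each group's mean, so it would report the same split of variation.

```python
def one_way_anova(groups):
    """Return (F statistic, R squared) for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation left over inside each group.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Made-up outcome values for three categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, r_squared = one_way_anova(groups)
print(f"F = {f_stat:.2f}, R squared = {r_squared:.4f}")
```

ANOVA output usually emphasizes the F statistic for each factor, while regression output usually emphasizes coefficients and R squared, even though both come from the same sums of squares.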