H2O.ai WIKI

Linear Regression

What is Linear Regression?

Linear regression is a way to model the relationship between a response variable and one or more explanatory variables. In linear regression, the data is modeled by a linear function.
 

Examples of Linear Regression

Linear regression can be used in business to evaluate trends and make estimates or forecasts. For example, if a company's sales have increased steadily every month for the past few years, the company could fit a linear regression to its monthly sales data and use it to forecast sales in the coming months. Consider the example below.

Advertising and revenue: Businesses use linear regression to better understand the relationships between advertising spending and revenue.

For example, they might fit a simple linear regression model using advertising spending as the predictor variable and revenue as the response variable. The regression model would take the following form: revenue = β0 + β1(ad spending).

The coefficient β0 would represent the expected revenue when ad spending is zero. The coefficient β1 would represent the average change in revenue when ad spending is increased by one unit (for example, one dollar).

If β1 is negative, it would mean that more ad spending is associated with less revenue. If β1 is close to zero, it would mean that ad spending has little or no linear association with revenue. And if β1 is positive, it would mean that more ad spending is associated with more revenue.

Why is Linear Regression Important?

Linear regression models are an established and widely used way to predict outcomes. Because linear regression is a long-established statistical procedure, the properties of linear regression models are well understood, and the models can be trained very quickly.

Linear Regression FAQs

How do you calculate linear regression?

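Consider the linear regression equation Y = a + bX, where Y is the dependent variable (plotted on the y-axis), X is the independent variable (plotted on the x-axis), b is the slope of the line, and a is the y-intercept. The most common way to calculate a and b is ordinary least squares, which chooses the values that minimize the sum of squared differences between the observed and predicted values of Y. This gives b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, where x̄ and ȳ are the sample means of X and Y.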
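To make the calculation concrete, here is a minimal sketch of an ordinary least squares fit for Y = a + bX, written by hand so that each step is visible. The function name `least_squares_line` and the sample points are assumptions chosen for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + b*X that minimizes squared error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of products of deviations, and sum of squared deviations of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept: the line passes through the means
    return a, b


# Illustrative points that lie exactly on Y = 1 + 2X.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)
print(f"Y = {a} + {b} * X")
```

Because the sample points lie exactly on a line, the fit recovers that line. With noisy data the same formulas return the line with the smallest total squared error.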

What are some benefits of using linear regression?

  • Ease of use. The model is simple to implement. It does not require a lot of engineering overhead, either before launch or during maintenance.
  • Interpretability. Linear regression is straightforward to interpret: each coefficient states how much the prediction changes when one input increases by one unit.
  • Scalability. The algorithm is not computationally heavy, which makes linear regression well suited to use cases where scaling is expected.
  • Performs well online. Due to the ease of computation, linear regression can be used in online settings, meaning that the model can be retrained with each new example and generate predictions in near real-time.
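The online setting mentioned in the last point can be sketched with a per-example gradient update. This is one common approach (stochastic gradient descent on squared error), shown under the assumption of a single input and a small fixed learning rate. The class name `OnlineLinearRegression` and the replayed stream are hypothetical, and this is not a description of any particular product's implementation.

```python
class OnlineLinearRegression:
    """Simple linear regression Y = a + b*X updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.a = 0.0  # intercept
        self.b = 0.0  # slope
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.a + self.b * x

    def update(self, x, y):
        """Take one gradient step on the squared error of a single example."""
        error = self.predict(x) - y
        self.a -= self.learning_rate * error
        self.b -= self.learning_rate * error * x


model = OnlineLinearRegression()

# Hypothetical stream: noise-free examples of Y = 1 + 2X with X in [0, 1].
stream = [(x, 1 + 2 * x) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
for _ in range(500):        # replay the stream to mimic a long feed
    for x, y in stream:
        model.update(x, y)  # the model can serve predictions after every update

print(round(model.predict(0.5), 3))
```

Each update costs a handful of arithmetic operations, which is why predictions can stay close to real time. The learning rate has to be small relative to the scale of X; with large unscaled inputs this kind of update can diverge.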

What is the difference between simple linear regression and multiple linear regression?

The difference lies in the number of independent variables that they take as inputs. Simple linear regression takes a single independent variable, while multiple linear regression takes two or more.
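The difference shows up in the shape of the input. Below is a minimal sketch of multiple linear regression that solves the normal equations with plain Python, assuming a small invented dataset with two independent variables. The helper names `solve_linear_system` and `fit_multiple_linear_regression` are hypothetical; in practice a numerical library would normally do this step.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augment A with b so row operations apply to both sides at once.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit_multiple_linear_regression(rows, ys):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Illustrative data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
ys = [6, 8, 9, 13, 14]
coefficients = fit_multiple_linear_regression(rows, ys)
print([round(c, 6) for c in coefficients])
```

Passing rows with a single value each reduces this to simple linear regression, so the two models differ only in how many columns the input has.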

H2O.ai and Linear Regression

H2O AI Cloud is a platform that helps data scientists apply linear regression models to their datasets much faster. The AI Cloud allows data scientists to get past the technology layer that changes daily and get straight to making, operating, and innovating with AI. As a result, businesses can innovate faster using proven AI technology. H2O.ai enables teams of data scientists, developers, machine-learning engineers, DevOps, IT professionals, and business users to work together with the same toolset toward a common goal.

Linear Regression vs Other Technologies & Methodologies

Linear Regression vs Logistic Regression

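Linear regression is used for regression problems, where the response is a continuous value such as revenue. Logistic regression is used for classification problems, where the model predicts the probability that an observation belongs to a class.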
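The contrast can be seen in what each model outputs for the same linear combination of inputs. This is a minimal sketch with made-up coefficients; the function names are illustrative, and no fitting is performed here.

```python
import math


def linear_prediction(intercept, slope, x):
    """Linear regression output: an unbounded real number."""
    return intercept + slope * x


def logistic_prediction(intercept, slope, x):
    """Logistic regression output: the same linear score passed through
    the sigmoid function, giving a probability between 0 and 1."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


# Made-up coefficients, used only to compare the two output ranges.
for x in (-4, 0, 4):
    print(x, linear_prediction(0.0, 1.5, x), round(logistic_prediction(0.0, 1.5, x), 4))
```

The linear output grows without bound as x grows, while the logistic output stays between 0 and 1, so it can be thresholded (for example at 0.5) to assign a class.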

Linear Regression vs Multiple Regression

Here, "linear regression" refers to simple linear regression, which uses a single predictor, while multiple linear regression uses two or more. When the response depends mainly on one variable, simple linear regression can capture the relationship between the two variables. When the response depends on several variables at once, multiple linear regression is more useful, because each coefficient estimates the contribution of one predictor while the others are held constant.

Linear Regression vs Correlation

Regression is often used to build models/equations to predict a key response, Y, from a set of predictor (X) variables. Correlation is often used to quickly summarize the direction and strength of the relationships between a set of 2 or more numeric variables.
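The two ideas are closely related for a single predictor. This is a minimal sketch, assuming a small invented sample, that computes the Pearson correlation coefficient and the regression slope from the same sums. The helper `centered_sums` is a hypothetical name.

```python
import math


def centered_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxx, syy, sxy


# Illustrative sample, not real data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sxx, syy, sxy = centered_sums(xs, ys)

# Correlation: unitless, symmetric in X and Y, always between -1 and 1.
r = sxy / math.sqrt(sxx * syy)

# Regression slopes: in the units of the response, and direction matters.
slope_y_on_x = sxy / sxx
slope_x_on_y = sxy / syy

print(round(r, 4), slope_y_on_x, slope_x_on_y)
```

The correlation is the same whichever variable is treated as the response, while the slope changes with the direction of prediction. The slope of Y on X equals r multiplied by the ratio of the standard deviations of Y and X.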

Linear Regression vs ANOVA

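Regression is most often used when the predictors are continuous, to estimate how the response changes as a predictor changes. ANOVA (analysis of variance) is most often used when the predictors are categorical, to test whether the mean of the response differs between groups. Both are special cases of the general linear model.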
