What is Linear Regression?

Linear Regression is a statistical approach to model linear relationships between one or more explanatory variables (the independent variables) and a continuous target variable.

A scatterplot to check for linear relationship. Image courtesy: Laerd Statistics

Linear Regression can be of two distinct types:

  1. Simple Linear Regression: one explanatory variable (independent variable)
  2. Multiple Linear Regression: more than one explanatory variable

How does Linear Regression work?

Linear Regression models the relationship between the explanatory variables and the target variable as a linear equation.

Let y be the target variable and the xᵢ’s be the explanatory variables. Let there be n such explanatory variables. Then, by assuming a linear relationship, we can say:

y = w₀ + w₁x₁ + … wnxn

The wᵢ’s are called coefficients. The optimal value of these coefficients have to be learnt using the available training data so that the difference between the resulting y from this equation is closest to the true y. This difference is also known as the error (also known as residual errors).

The equation with the optimal set of coefficients is also known as the best fit line. The best fit line is the one for which total prediction error (considering all data points) is as small as possible.

A linear regression model with the best fit line and errors. Image courtesy: STADA

What are the advantages of Linear Regression?

Linear Regression is one of the most fundamental statistical models. It is easy to interpret since the best coefficients associated with the explanatory variable show the relevance of that variable to the final output. This helps explain the model’s predictions to business users.

Linear Regression is extremely quick to train and does not require much data to offer a best fit line.

What are the drawbacks of Linear Regression?

Linear Regression can only model simple linear relationships. If the relationship between the explanatory and target variables is not linear, then Linear Regression fails. More often than not, it ends up being too simple for real world data which is rarely linear.

Linear Regression is also extremely sensitive to outliers and has a number of assumptions. One such assumption is that each explanatory variable is independent of the others, which is violated in most real world scenarios.

Think we're missing something? 🧐 Help us update this article by sending us your suggestions here. 🙏

See also

Articles you might be interested in

  1. Linear Regression — detailed view
  2. Sklearn Linear Regression documentation
  3. How to implement Simple Linear Regression from scratch with Python