What is Linear Regression?
Linear Regression is a statistical approach to model linear relationships between one or more explanatory variables (the independent variables) and a continuous target variable.
Linear Regression can be of two distinct types:
- Simple Linear Regression: one explanatory variable (independent variable)
- Multiple Linear Regression: more than one explanatory variable
How does Linear Regression work?
Linear Regression models the relationship between the explanatory variables and the target variable as a linear equation.
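Let y be the target variable and the xᵢ’s be the explanatory variables. Let there be n such explanatory variables. Then, by assuming a linear relationship, we can say:
y = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ
Here w₀ is the intercept, that is, the value of y when every xᵢ is zero.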
The wᵢ’s are called coefficients. Their optimal values have to be learnt from the available training data so that the y predicted by this equation is as close as possible to the true y. The difference between the true y and the predicted y is known as the error (or residual).
The equation with the optimal set of coefficients is also known as the best fit line. The best fit line is the one for which the total prediction error over all data points, typically measured as the sum of squared errors, is as small as possible.
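To make the idea of a best fit line concrete, here is a minimal sketch for the simple (one-variable) case, assuming the total error is measured as the sum of squared errors. The function names and the toy data are hypothetical and chosen only for illustration; real projects would normally rely on a library instead.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w0, w1) minimising the sum of squared errors for y = w0 + w1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx              # slope
    w0 = mean_y - w1 * mean_x   # intercept
    return w0, w1


def sum_squared_error(xs, ys, w0, w1):
    """Total prediction error of the line y = w0 + w1 * x over all data points."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data that is roughly, but not exactly, linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

w0, w1 = fit_simple_linear_regression(xs, ys)
best_error = sum_squared_error(xs, ys, w0, w1)
print(f"intercept={w0:.2f}, slope={w1:.2f}, total squared error={best_error:.3f}")
```

Nudging either coefficient away from the fitted value can only increase the total squared error, which is what makes this particular line the best fit under that error measure.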
What are the advantages of Linear Regression?
Linear Regression is one of the most fundamental statistical models. It is easy to interpret: each learnt coefficient shows how much the predicted output changes when the associated explanatory variable increases by one unit, with the other variables held fixed. This helps explain the model’s predictions to business users.
Linear Regression is also quick to train, since the least squares coefficients can be computed directly, and it can produce a best fit line from relatively little data.
What are the drawbacks of Linear Regression?
Linear Regression can only model linear relationships. If the relationship between the explanatory and target variables is not linear, the fitted line describes the data poorly. More often than not, it ends up being too simple for real world data, which is rarely linear.
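As a rough illustration of this limitation, the sketch below fits a straight line to data generated from y = x², a deliberately non-linear relationship. The helper functions and the data are hypothetical, and least squares is assumed as the fitting criterion.

```python
def fit_line(xs, ys):
    """Least squares fit of y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


def r_squared(xs, ys, w0, w1):
    """Share of the variation in y explained by the line (1 is perfect, 0 is none)."""
    mean_y = sum(ys) / len(ys)
    residual = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - residual / total


# Hypothetical data following y = x squared, symmetric around x = 0.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

w0, w1 = fit_line(xs, ys)
score = r_squared(xs, ys, w0, w1)
print(f"intercept={w0:.2f}, slope={w1:.2f}, R^2={score:.2f}")
```

On this data the fitted line is flat (slope 0) and explains none of the variation in y, even though y is completely determined by x.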
Linear Regression is also sensitive to outliers and relies on a number of assumptions. One such assumption is that the explanatory variables are not strongly correlated with one another (no multicollinearity), which is often violated in real world scenarios.
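The sensitivity to outliers can be seen in a small experiment. The sketch below, which assumes a least squares fit and uses hypothetical data, fits the same five points twice: once as they are, and once with a single target value replaced by an extreme one.

```python
def fit_line(xs, ys):
    """Least squares fit of y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Clean data lying exactly on y = 2x.
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]

# Same data, but the last target value is replaced by an outlier.
outlier_ys = [2.0, 4.0, 6.0, 8.0, 30.0]

clean_w0, clean_w1 = fit_line(xs, clean_ys)
outlier_w0, outlier_w1 = fit_line(xs, outlier_ys)

print(f"clean fit:   intercept={clean_w0:.2f}, slope={clean_w1:.2f}")
print(f"outlier fit: intercept={outlier_w0:.2f}, slope={outlier_w1:.2f}")
```

One corrupted point out of five is enough to triple the slope here, because squaring the errors gives large deviations a disproportionate influence on the fit.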