Multivariable linear regression is a common machine learning algorithm. If you are getting started with machine learning and have already covered simple linear regression, it is a great topic to dive into next. If you haven’t read the previous article about Simple Linear Regression, I would recommend it, because that is the best place to start.
What is Multivariable Linear Regression?
In simple terms, multivariable linear regression is a statistical way of measuring the relationship between several input variables and a single output value. A simple relationship of this kind is: as time increases, so does cost.
Why does linear regression matter? In real life, a value is rarely predicted by just one variable; more often, several variables together predict it. Simply put, a fitted model lets you estimate values you haven’t observed yet, such as a future price.
Variable vs Feature
In machine learning, you may hear the term “feature” used often. The terms “feature” and “variable” are often used interchangeably. Let’s look at an example. Take an apple: what are the basic features of this apple?
The apple is:
- Has a stem
In reality, feature selection is nearly a field of its own. It is the process of choosing the features that best predict the y value. Here are a few tips when trying to select features:
- The less correlated the features are with each other, the better – the correlation coefficient can be used to check this
- Features must describe the predicted value
- Features must be related to the predicted value
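As a rough illustration of the first tip, the sketch below computes the Pearson correlation coefficient between two feature columns. The function name `pearson_correlation` and the sample numbers are made up for this example and do not come from the article's dataset. A result close to 1 or -1 suggests the two features carry much of the same information.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each list
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical feature columns, invented for illustration only
square_footage = [1000, 1500, 2000, 2500]
bedrooms = [2, 3, 3, 4]

r = pearson_correlation(square_footage, bedrooms)
print(f"correlation between the two features: {r:.2f}")
```

This version assumes neither list is constant; a constant list would make the denominator zero.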
If you recall the simple linear regression formula, y = mx + b, you may notice that the multivariable formula is similar. With two features, the basic formula is:
y = m1x1 + m2x2 + b
or another way to write this is:
y = w1x1 + w2x2 + b
- y – the predicted value
- w1x1 – the first feature (x1) multiplied by its weight (w1)
- w2x2 – the second feature (x2) multiplied by its weight (w2)
- b – the bias
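The formula above can be sketched as a small function. The name `predict` and the sample numbers are assumptions chosen for illustration. The function accepts any number of features, as long as each one has a matching weight.

```python
def predict(weights, features, bias):
    """Compute y = w1*x1 + w2*x2 + ... + b for a single example."""
    if len(weights) != len(features):
        raise ValueError("each feature needs exactly one weight")
    # Multiply each feature by its weight, add the products, then add the bias
    return sum(w * x for w, x in zip(weights, features)) + bias

# Illustrative numbers only: y = 2*10 + 3*4 + 1
y = predict(weights=[2.0, 3.0], features=[10.0, 4.0], bias=1.0)
print(y)  # 33.0
```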
Implement the Math
Let’s say that we are given the following dataset:
| House Value (y) | Square Footage (x1) | Number of Bedrooms (x2) |
Let’s also say that we have a house with:
- 3 bedrooms
- 2,005 square feet
What is the house value?
First, we figure out the slope between feature one, the square footage, and the house value, y.
The slope is $112.50 per square foot.
Next, we figure out the slope between feature two, the number of bedrooms, and the house value, y.
The slope is $10,500 per bedroom added to the house.
Plug in the Values
Using the same formula as above, y = w1x1 + w2x2 + b, we now plug the values into the formula.
- Plug in feature one – the square footage slope ($112.50) and the 2,005 square feet
- y = $112.5 * 2,005 + w2x2 + b
- Plug in feature two – the per-bedroom slope ($10,500) and the 3 bedrooms
- y = $112.5 * 2,005 + $10,500 * 3 + b
- Finally, plug in the bias – which in our case is $0
- y = $112.5 * 2,005 + $10,500 * 3 + 0
- Complete the math
- y = $257,062.50
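The same arithmetic can be checked with a few lines of Python. The variable names are chosen for this sketch; the numbers are the slopes, bias, and house details used in the steps above.

```python
# Weights (slopes) and bias from the worked example
weight_sqft = 112.50      # dollars per square foot
weight_bedroom = 10_500   # dollars per bedroom
bias = 0

# The house we want to value
square_feet = 2_005
bedrooms = 3

sqft_term = weight_sqft * square_feet      # w1 * x1
bedroom_term = weight_bedroom * bedrooms   # w2 * x2
house_value = sqft_term + bedroom_term + bias

print(f"${house_value:,.2f}")  # $257,062.50
```

Keeping the two terms separate makes it easy to see how much each feature contributes: $225,562.50 comes from the square footage and $31,500 from the bedrooms.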
In this article and video, you learned what multivariable linear regression is, what the math looks like, and how to apply it to a simple problem. Please leave any comments that would help improve this post or video for future learners.