What is Feature Scaling? Why is it crucial for Machine Learning?
Brain Dump
Sep 16, 2024
87 views
This note is Machine Learning 101: an explanation of feature scaling.
Let's start with a story from a school.
Imagine you're comparing different aspects of your classmates:
Let's say you want to use a computer to find similarities between classmates. The computer doesn't understand that these numbers represent different things. It just sees numbers.
Without feature scaling:
Feature scaling is like converting everything to the same scale, so the computer can compare them fairly. It's similar to how in school, all subjects are usually graded on the same scale of (0 - 100) even though they're very different.
After scaling, the computer can compare all these aspects fairly, without giving too much importance to one just because its numbers were bigger.
In machine learning, this helps the computer make better decisions and find patterns more accurately, just like how having all your school subjects on the same grading scale helps you compare your performance across different classes.
A visual example of converting features/input data into the same scale, so that it can be compared fairly.
Another example: Feature Scaling Example Dataset
## Why Feature Scaling is Important for This Dataset?
1. Age vs. Salary:
- Age ranges from 28 to 52 (difference of 24)
- Salary ranges from $75,000 to $150,000 (difference of $75,000)
- Without scaling, salary differences would dominate age differences
2. Height vs. Weight:
- Height ranges from 160 to 180 cm (difference of 20)
- Weight ranges from 55 to 85 kg (difference of 30)
- The scales are closer, but still not directly comparable
3. Years of Experience vs. Other Features:
- Ranges from 5 to 30 years (difference of 25)
- Much smaller scale than salary, and about the same spread as age
4. Salary vs. Years of Experience:
- Both tend to increase with career progression
- But salary is in tens of thousands while experience is in single/double digits
Impact on Machine Learning Models
Without scaling:
After scaling:
This example illustrates why feature scaling is crucial across various types of data, ensuring that all features are considered proportionally in machine learning models, regardless of their original units or scales.
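To make the impact concrete, here is a small sketch in plain Python. The three people `a`, `b`, and `c` are hypothetical (they are not rows from the dataset above); their values were simply picked inside the ranges listed earlier. The sketch compares Euclidean distances before and after rescaling every feature to [0, 1] with min-max normalization.

```python
import math

# Hypothetical people, values chosen inside the ranges listed above.
# Order: age (years), salary ($), height (cm), weight (kg), experience (years)
a = [28, 75_000, 160, 55, 5]
b = [52, 76_000, 180, 85, 30]   # differs from a in almost everything except salary
c = [29, 150_000, 161, 56, 6]   # almost identical to a, except for salary

# Minimum and maximum of each feature, taken from the ranges above
mins = [28, 75_000, 160, 55, 5]
maxs = [52, 150_000, 180, 85, 30]

def min_max(row):
    """Rescale each feature of one row to [0, 1] using the known ranges."""
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(row, mins, maxs)]

# Distances on the raw numbers
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

# Distances after scaling
scaled_ab = math.dist(min_max(a), min_max(b))
scaled_ac = math.dist(min_max(a), min_max(c))

print("raw:   ", round(raw_ab, 1), round(raw_ac, 1))
print("scaled:", round(scaled_ab, 3), round(scaled_ac, 3))
```

On the raw numbers, `a` looks far closer to `b` (roughly 1,001) than to `c` (roughly 75,000), purely because salary is measured in tens of thousands. After scaling, the order flips: `c`, who differs from `a` only in salary, is now the closer one.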
Feature scaling is absolutely compulsory for deep learning. There are two common ways to do it.
1. Normalization:
Find the minimum of a column, subtract that minimum from every value in the column, then divide by the difference between the column's maximum and minimum.
X' = (X - Xmin) / (Xmax - Xmin)
We end up with X' (a new, adjusted column) whose values all lie in the range [0, 1].
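A minimal sketch of the normalization formula in plain Python. The function name `min_max_normalize` and the sample `ages` list are made up for illustration; the three ages are just the low end, middle, and high end of the age range used earlier.

```python
def min_max_normalize(values):
    """Apply X' = (X - Xmin) / (Xmax - Xmin) to a list of numbers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is no range to scale by
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

ages = [28, 40, 52]  # illustrative values only
normalized_ages = min_max_normalize(ages)
print(normalized_ages)  # [0.0, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, which also means a single extreme outlier will squeeze all the other values into a narrow band.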
In the example of a school:
Normalization is like converting all scores to a 0-100 scale. For example, with height:
2. Standardization:
Subtract the column's average (mean) from every value in the column, then divide by the column's standard deviation.
X' = (X - mean) / s.d.
As a result, the column has a mean of 0 and a standard deviation of 1, and for roughly bell-shaped data almost all values fall in [-3, +3]. Outliers or extreme values can still end up outside that range.
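A minimal sketch of standardization in plain Python. The function name `standardize` and the sample `ages` are made up for illustration, and the sketch assumes the population standard deviation (`statistics.pstdev`), since the note does not say which variant is meant.

```python
import statistics

def standardize(values):
    """Apply X' = (X - mean) / s.d. to a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        # Every value is identical, so there is no spread to scale by
        return [0.0 for _ in values]
    return [(x - mean) / sd for x in values]

ages = [28, 40, 52]  # illustrative values only
z_scores = standardize(ages)
print([round(z, 3) for z in z_scores])  # [-1.225, 0.0, 1.225]
```

Unlike normalization, the result is not pinned to a fixed interval: values are expressed as "how many standard deviations away from the mean", so an extreme value simply gets a large positive or negative score.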
In the example of a school:
Standardization is like grading on a curve, where most people are in the middle and a few are at the extremes.
Whenever you build an artificial neural network, you have to apply feature scaling.
It is so fundamental for deep learning that we apply feature scaling to all our features, even the ones that already take values of 0 and 1. We simply scale everything.
We will take our data processing toolkit
Let's assume we have a trained ANN model and want to predict whether a customer with the following information will leave the credit card company.
| Feature | Value |
|---|---|
| Geography | France |
| Credit Score | 600 |
| Gender | Male |
| Age | 40 years |
| Tenure | 3 years |
| Balance | $60,000 |
| Number of Products | 2 |
| Has Credit Card | Yes |
| Is Active Member | Yes |
| Estimated Salary | $50,000 |
Do we pass these feature values to the model as they are, for example $50,000 or $60,000?
No, we apply feature scaling first.
This is how we do feature scaling using scikit-learn in Python.
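A minimal sketch of that step, assuming standardization with scikit-learn's `StandardScaler`. The rows in `X_train` are hypothetical stand-ins for the data the model was trained on (they are not from this note), and only five numeric features of the customer are used here; categorical features such as Geography and Gender would need to be encoded as numbers before scaling, which is left out of this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training rows (not from this note).
# Columns: credit score, age, tenure, balance, estimated salary
X_train = np.array([
    [620.0, 35.0, 2.0,  45000.0, 40000.0],
    [710.0, 50.0, 8.0,  90000.0, 85000.0],
    [580.0, 28.0, 1.0,      0.0, 30000.0],
    [650.0, 42.0, 5.0, 120000.0, 60000.0],
])

# Learn each column's mean and standard deviation from the training data,
# then scale the training data with them
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# The customer from the table above, in the same column order
customer = np.array([[600.0, 40.0, 3.0, 60000.0, 50000.0]])

# Reuse the statistics learned from the training data: transform, not fit_transform
customer_scaled = scaler.transform(customer)
print(customer_scaled)
```

The key design choice is that the scaler is fitted once on the training data and then reused for every new customer. Fitting a fresh scaler on the single new row would produce values the trained model has never seen the like of.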
#featureScaling #inputLayers #machinglearning #features #inputs #scalling #cscs598