Why do we need to standardize the data in the first place?


I am trying to understand how standardizing the data helps us. Can we also standardize a distribution that is not normal?

1 Answer

365 Team

Hi George!

Standardization is a common technique for dealing with several different problems in the data, the biggest one being scale.

When our variables are on completely different scales, like this:

One variable takes values in the range: 500,000 to 1,000,000.
Another takes values in the range: 0.001 to 0.005

Then the two are not really comparable. This causes many problems when we use machine learning models (covered later in the course).

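A simple solution is to standardize all variables: subtract each variable's mean and divide by its standard deviation, so that every variable ends up with mean 0 and standard deviation 1. Once the variables share the same scale, many models work much better, because no variable dominates simply because its numbers are larger.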
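If it helps to see this in action, here is a minimal sketch using only Python's standard library. The sample values and the helper name `standardize` are made up for illustration; they just fall in the two ranges mentioned above. It assumes the usual z-score formula, (x - mean) / standard deviation.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / std for x in values]

# Made-up sample values in the two ranges from the example above
large = [500_000, 650_000, 800_000, 1_000_000]
small = [0.001, 0.002, 0.004, 0.005]

large_z = standardize(large)
small_z = standardize(small)

# After standardizing, both variables have mean 0 and standard deviation 1,
# so their values are on the same scale.
print([round(z, 2) for z in large_z])
print([round(z, 2) for z in small_z])
```

Both printed lists should contain values of similar magnitude, even though the original variables differed by many orders of magnitude.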

Note that all these topics will be explored later in the course, so you need not worry about them for now.

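Finally, to answer your second question: yes, you can standardize any distribution, normal or not. Standardization only shifts and rescales the values; it does not change the shape of the distribution, so a non-normal distribution stays non-normal afterwards. Separately, there are mathematical transformations (a logarithm, for example) that can bring some non-normal distributions closer to normal. You will also come across the term ‘normalization’, which is related to ‘standardization’ but typically uses a different formula.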
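To make the difference concrete, here is a rough sketch with made-up, right-skewed data. The helper names (`standardize`, `skewness`, `min_max`) are hypothetical, and `min_max` assumes one common meaning of ‘normalization’: rescaling values to the range 0 to 1. Skewness is used here only as a simple measure of how far the shape is from symmetric.

```python
import math
import statistics

def standardize(values):
    """Z-scores: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return statistics.fmean([z ** 3 for z in standardize(values)])

def min_max(values):
    """Rescale to [0, 1] (one common meaning of 'normalization')."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Made-up, strongly right-skewed sample (clearly not normal)
skewed = [1, 2, 2, 3, 3, 4, 10, 50, 200]

skew_raw = skewness(skewed)
skew_std = skewness(standardize(skewed))             # shape is unchanged
skew_log = skewness([math.log(x) for x in skewed])   # shape does change

scaled = min_max(skewed)

print(skew_raw, skew_std, skew_log)
print(min(scaled), max(scaled))
```

The skewness is the same before and after standardizing, while the log transform reduces it for this sample. That is the sense in which standardizing works on any distribution but does not make it normal.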

Everything will become clear around the end of the course, promise!
365 Team
