The word standardization may sound a little weird at first, but understanding it in the context of statistics is not brain surgery. It has to do with distributions. In fact, any distribution with a finite mean and a finite, nonzero variance can be standardized. Say the mean and the variance of a variable are mu and sigma squared, respectively. Standardization is the process of transforming that variable into one with a mean of 0 and a standard deviation of 1.
The notation is simple: mu is the mean, sigma is the standard deviation, and the formula that allows us to standardize a value x is (x - mu) / sigma.
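If symbols are easier to read than words, here is a short sketch of the formula and of why it produces a mean of 0 and a standard deviation of 1. It assumes X is the original variable with mean mu and standard deviation sigma greater than 0, and Z is the standardized variable. The capital letters and the E and Var notation are our own choices for this sketch.

```latex
Z = \frac{X - \mu}{\sigma},
\qquad
\mathrm{E}[Z] = \frac{\mathrm{E}[X] - \mu}{\sigma} = 0,
\qquad
\mathrm{Var}(Z) = \frac{\mathrm{Var}(X)}{\sigma^{2}} = \frac{\sigma^{2}}{\sigma^{2}} = 1
```

Subtracting mu moves the center to 0 without touching the spread, and dividing by sigma rescales the spread to 1 without moving the center.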
What’s a Standard Normal Distribution?
Logically, a normal distribution can also be standardized. The result is called a standard normal distribution.
You may be wondering how the standardization goes down here. Well, all we need to do is subtract the mean, mu, from every value and then divide the result by the standard deviation, sigma.
We use the letter Z to denote the standard normal distribution. As we already mentioned, its mean is 0 and its standard deviation is 1.
The standardized variable is called a z-score. It is equal to the original variable, minus its mean, divided by its standard deviation.
A Case in Point
Let’s take an approximately normally distributed set of numbers: 1, 2, 2, 3, 3, 3, 4, 4, and 5.
Its mean is 3. Subtracting the mean from each data point, we get a new data set of: -2, -1, -1, 0, 0, 0, 1, 1, and 2.
The new mean is 0, exactly as we anticipated.
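If you would like to check this first step yourself, here is a minimal Python sketch using only the standard library. The variable names (data, centered) are ours, chosen just for illustration.

```python
from statistics import fmean

# The example data set from the text
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Step 1: compute the mean and subtract it from every data point
mean = fmean(data)
centered = [x - mean for x in data]

print(mean)             # 3.0
print(centered)         # [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
print(fmean(centered))  # 0.0
```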
The Next Step of the Standardization
So far, we have a new distribution. It is still normal, but with a mean of 0 and a standard deviation of 1.22. The next step of the standardization is to divide all data points by the standard deviation. This will drive the standard deviation of the new data set to 1.
Let’s go back to our example.
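The original dataset has a standard deviation of 1.22. This is the sample standard deviation, computed with n - 1 in the denominator. The same goes for the dataset which we obtained after subtracting the mean from each data point, because shifting every value by the same amount does not change the spread.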
Now, let's divide each data point by the standard deviation, which is about 1.22. We get: -1.63, -0.82, -0.82, 0, 0, 0, 0.82, 0.82, and 1.63.
And the mean is still 0!
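Both steps together can be sketched in a few lines of Python. This is only an illustration under one assumption: we use the sample standard deviation (statistics.stdev, which divides by n - 1), because that is what gives the 1.22 used in this example. The variable names are ours.

```python
from statistics import fmean, stdev

# The example data set from the text
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = fmean(data)   # 3.0
sd = stdev(data)     # sample standard deviation, about 1.22

# Subtract the mean, then divide by the standard deviation
z_scores = [(x - mean) / sd for x in data]

print(round(sd, 2))                       # 1.22
print([round(z, 2) for z in z_scores])    # [-1.63, -0.82, -0.82, 0.0, 0.0, 0.0, 0.82, 0.82, 1.63]

# The standardized data has mean 0 and standard deviation 1 (up to floating-point rounding)
print(abs(fmean(z_scores)) < 1e-12)       # True
print(abs(stdev(z_scores) - 1) < 1e-12)   # True
```

Note that the division uses the unrounded standard deviation. Dividing by the rounded 1.22 would give slightly different second decimals.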
What Are the Benefits of Standardization?
This is how we can obtain a standard normal distribution from any normally distributed data set.
Using it makes predictions and inferences much easier. This is exactly what will help us a great deal in the next tutorials. So, if you want to use the knowledge you gained here, feel free to jump into the linked tutorial.
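To make the "easier inferences" point a bit more concrete, here is a small Python sketch. The numbers (a mean of 100, a standard deviation of 15, and a value of 130) are hypothetical and chosen only for illustration. The idea is that a question about any normal distribution can be answered with the standard normal one, once the value is turned into a z-score.

```python
from math import isclose
from statistics import NormalDist

# Hypothetical normal distribution and value, for illustration only
mu, sigma = 100.0, 15.0
x = 130.0

# Standardize the value
z = (x - mu) / sigma

# Probability of observing a value below x, computed two ways
p_original = NormalDist(mu, sigma).cdf(x)   # using the original distribution
p_standard = NormalDist().cdf(z)            # using the standard normal (mean 0, sd 1)

print(z)                                # 2.0
print(round(p_standard, 4))             # 0.9772
print(isclose(p_original, p_standard))  # True
```

Both routes give the same probability, so a single standard normal table (or function) is enough for every normal distribution.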