I’ve been learning and teaching math for decades and I STILL don’t understand why we have formulas in statistics that use the population standard deviation or variance to calculate other things.
If you have the population variance, didn’t you have to have the population data? And if you have the population data, don’t you have just about everything and there’s no reason to be doing other calculations?
Am I missing something?
Thanks for reaching out!
You are indeed correct.
The whole field of statistics exists because we almost never have population data.
Even when we do have population data, we may not be able to analyze it all at once, because there may simply be too much of it to process in one go.
Here’s an example.
Think about people using the Internet. The data Google has approximates population data, but even theirs falls short of it. There are people who are not part of the Google ecosystem in any way: they use other browsers (Opera, Safari, etc.), other search engines such as Bing or DuckDuckGo, video providers other than YouTube, incognito browsing, and so on. They are all part of the population of people using the Internet, but Google doesn’t have much data on them.
So if Google wants to target Google ads to these people, they will basically be using some sample data, lookalikes, etc. to guess their preferences.
The point is that even the company that has the most data doesn’t have population data.
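Now, population formulas exist because for some metrics, such as variance and standard deviation, the calculation changes when you have all the data: the population variance divides the sum of squared deviations by N, while the sample variance divides by n − 1 to correct for the fact that the mean was itself estimated from the sample. So, if you do have the whole population, you use the population formula instead.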
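To make the difference concrete, here is a minimal sketch using Python’s standard statistics module. The numbers are made up purely for illustration, and the variable names are our own; the only thing it shows is that the same eight values give different results depending on whether you treat them as the whole population or as a sample.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as the ENTIRE population: divide by N.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Treat the same values as a SAMPLE from a larger population: divide by n - 1.
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

print("population variance:", pop_var)
print("sample variance:    ", sample_var)
print("population std dev: ", pop_std)
print("sample std dev:     ", sample_std)
```

The sum of squared deviations from the mean is 32 here, so the population variance is 32 / 8 = 4, while the sample variance is 32 / 7, roughly 4.57. The sample figure is always the larger of the two, and the gap shrinks as the number of observations grows.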
Having population data also doesn’t mean ‘you know everything’. You have all the data so far, but you can still use statistics to make inferences about the future. For example, ‘based on all past data, how likely is it for future observations to…’.
Hope this helps!
The 365 Team