How can you have a known population variance without having all the population data?


I’ve been learning and teaching math for decades and I STILL don’t understand why we have formulas in statistics that use the population standard deviation or variance to calculate other things. 
If you have the population variance, didn’t you have to have the population data? And if you have the population data, don’t you have just about everything and there’s no reason to be doing other calculations?
Am I missing something?
Please help.
~Bon Crowder

1 Answer

365 Team

Hi Bon,
Thanks for reaching out!
You are, in fact, correct.
The whole field of statistics exists because we almost never have population data. 
Even when we do have population data, we may not be able to analyze it all (there may be so much of it that using it all at once is impractical).
Here’s an example.
Think about people using the Internet. The data Google has approximates population data, but even Google's data is not population data. Some people are not part of the Google ecosystem in any way. They can stay outside it by using other browsers (Opera, Safari, etc.), other search engines such as Bing or DuckDuckGo, video providers other than YouTube, incognito browsing, etc. They are all part of the population of people using the Internet, but Google doesn’t have much data on them.
So if Google wants to target ads to these people, it basically has to rely on sample data, lookalikes, etc. to guess their preferences.
Point being – even the company that has the most data… doesn’t have population data.
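To make this concrete, here is a minimal Python sketch (not part of the original answer) that simulates a hypothetical population and then pretends we can only see a random sample of it. The population values, its size, and the sample size are made-up assumptions. The point is only that the sample mean lands close to the population mean, which in practice we could not compute directly.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 200,000 simulated values (e.g. minutes online per day).
# In real life we would not have this full list.
population = [random.gauss(120, 30) for _ in range(200_000)]

# What we can actually observe: a random sample of 1,000 people.
sample = random.sample(population, 1_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)  # our estimate of the population mean

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```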
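Population formulas exist because, for some metrics such as variance and standard deviation, the calculation differs depending on whether you have all the data or only a sample. The population variance divides the sum of squared deviations from the mean by N, while the sample variance divides by n − 1 to correct for the bias that comes from estimating with a sample. So when you do have all the data, you use the population formula instead.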
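A minimal Python sketch of the two formulas follows. The helper names `population_variance` and `sample_variance` are illustrative, not from the original answer. The second half is a small simulation with made-up parameters: it draws many samples of size 5 from a simulated population and compares the average of each kind of estimate with the true population variance.

```python
import random
import statistics

def population_variance(data):
    """Divide by N: appropriate when `data` is the entire population."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def sample_variance(data):
    """Divide by n - 1 (Bessel's correction): appropriate when `data` is a sample."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0 (sum of squared deviations 32, divided by 8)
print(sample_variance(data))      # 32 / 7, about 4.571
# The standard library provides the same two formulas:
# statistics.pvariance (divides by N) and statistics.variance (divides by n - 1).

# Simulation with made-up parameters: many samples of size 5 from a simulated population.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]
true_var = population_variance(population)

trials = 20_000
divide_by_n = statistics.fmean(
    population_variance(random.sample(population, 5)) for _ in range(trials)
)
divide_by_n_minus_1 = statistics.fmean(
    sample_variance(random.sample(population, 5)) for _ in range(trials)
)

print(f"population variance:        {true_var:.1f}")
print(f"mean of /n estimates:       {divide_by_n:.1f}")          # systematically too low
print(f"mean of /(n - 1) estimates: {divide_by_n_minus_1:.1f}")  # close to the true value
```

For samples of size n, the divide-by-n estimates average out to roughly (n − 1)/n of the true variance (about 4/5 here), which is the bias the n − 1 denominator removes.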
Even with population data, it is not as if ‘you know everything’. You have all the data, but you can still use statistics to make inferences about the future. For example: ‘based on all past data, how likely is it for future observations to…’.
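As a minimal illustration of that kind of inference, the Python sketch below treats a short, entirely hypothetical list as the complete record of past daily sign-ups. It uses the relative frequency of days above a threshold as an estimate of the probability that a future day will exceed it. That estimate rests on the assumption that future days behave like past ones.

```python
# Hypothetical complete record of daily sign-ups (every day so far).
past_signups = [12, 15, 9, 20, 14, 11, 18, 22, 13, 16]
threshold = 15

# Relative frequency of past days above the threshold, used as an estimate of
# the probability that a future day will exceed it (assumes the future
# resembles the past).
days_above = sum(1 for x in past_signups if x > threshold)
p_future = days_above / len(past_signups)

print(p_future)  # 0.4
```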
Hope this helps!
The 365 Team