
Sample or population?


Excellent course, love it so far! I was wondering about something that still confuses me: sample versus population. I completely understand the examples given in the lecture; here is what I’m confused about in more detail:
Let’s say, for example, that a scientist in a biomedical field is doing research using animals: some animals are given treatment A, some are given treatment B, and the rest are not given any treatment (control group). Each group has 10 animals. Are the animals in each group a sample or a population?
This is where I get confused: I think they should be a population, because we account for all animals in group A and all animals in group B. But can they also be samples? I mean, we are only using 30 animals in total (10 for each group), not every animal that lives on the planet… right? Apologies if this sounds stupid. Hope someone can help me clear this up!

1 Answer

365 Team

Hello, Lal!

We really appreciate your feedback! Now, for the question.

In your setup we’ve got 3 samples of 10 animals each. In real life, by the way, we would usually draw one sample of 30 and then randomly divide it into 3 groups. For all practical purposes these two scenarios are the same, but nobody is going to perform the sampling 3 separate times (10 animals each).
Once this is done, we have 3 groups of animals. Ideally, we want each of those 3 samples to be: random and representative.
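The "draw 30, then randomly divide into 3" step could look roughly like this in Python. This is only a sketch: the function name `draw_and_split`, the population of 5000 numbered dogs, and the seed are all made up for illustration.

```python
import random

def draw_and_split(population, sample_size=30, n_groups=3, seed=None):
    """Draw one random sample, then split it at random into equal groups."""
    rng = random.Random(seed)
    # Sampling without replacement: each animal appears at most once.
    sample = rng.sample(population, sample_size)
    # rng.sample returns the animals in random selection order,
    # so cutting the list into consecutive chunks is a random assignment.
    group_size = sample_size // n_groups
    return [sample[i * group_size:(i + 1) * group_size] for i in range(n_groups)]

# Hypothetical population: 5000 dogs identified by a number.
all_dogs = list(range(5000))
treatment_a, treatment_b, control = draw_and_split(all_dogs, seed=42)
print(len(treatment_a), len(treatment_b), len(control))  # 10 10 10
```

No dog ends up in more than one group, and which group a dog lands in depends only on chance.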
Let’s explore an example.
Imagine you are trying to cure all animals, and you’ve picked 10 dogs.
That would be highly uninformative.
What about horses, pigs, cows, cats, elephants? Given the same treatment, elephants will very likely react differently from dogs.
This points us to two potential problems:
1) Animals may be too different.
That’s why treatments that work on rats don’t always work on people and vice versa.
2) Given that there are lots and lots of different animals, a sample of 30 (10 per group) won’t be enough: most species would not be represented at all.


So let’s simplify to one species: dogs.
You get 30 dogs.
You test A on 10 of them, test B on 10 of them, and monitor the other 10.
These 30 dogs are a sample of the population of all dogs. They are not the population itself.
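To make the distinction concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the "population" is 5000 simulated dog weights (mean 20 kg, standard deviation 5 kg), not real data. The point is only that the sample mean is an estimate of the population mean, not the same number.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: simulated weights (kg) of 5000 dogs.
population = [rng.gauss(20, 5) for _ in range(5000)]

# The sample: 30 dogs drawn at random, without replacement.
sample = rng.sample(population, 30)

# Parameter: describes the whole population (usually unknown in practice).
population_mean = statistics.mean(population)
# Statistic: computed from the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two numbers will typically be close but not identical, which is exactly why we care about the sample being random and representative.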
We are using samples for several important reasons:
1) time efficiency -> it is much easier to draw a sample of 30 dogs than … all dogs (which is also impossible).
2) monetary cost -> it is much cheaper to work with 30 dogs. Think about how many people must be involved to test a treatment on 5000 dogs. Where are you going to do this? Who is going to feed them, take care of them, walk them, etc.?
3) safety -> what if treatment A kills 9 out of 10 dogs? We don’t want this to happen to thousands of dogs. That’s why we test the treatment on only a sample and not on the whole population.
Let’s think about a case where the 30 dogs would be the population.
Imagine that there is a village where some unknown disease is killing the dogs. We find that only 30 dogs are infected.
We take them away from the others so they don’t spread the disease and decide to find a cure. We proceed as you indicated in the question.
These 30 dogs are the whole population of dogs which have this disease.
If we take 10 dogs out of the 30, those 10 would be a sample.
All 3 samples taken together make up the population.
Hope this helps!
365 Team

