Techniques and intuition for binning continuous variables (fine classing)

Super Learner

Hello, 
I was wondering how the number of bins affects the model's performance. At the beginning of the video, it states, "you can visualize everything with a hundred or fewer categories and still make sense of it without getting lost…" Then 50 bins are chosen when using pandas' cut method. Let's say I used Sturges' rule, a rule for choosing the number of bins when creating a histogram: I would end up with around 43 bins rather than the chosen 50. I was wondering:

  1. How was 50 chosen as the number of bins for cutting?
  2. What sort of impact would 43 bins have compared to 50? I assume it doesn't matter much, but I am trying to get an intuition for how the number of bins affects the accuracy of the model.

Thanks,
John 
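A minimal sketch of the comparison described in the question, assuming Python with NumPy and pandas. The data is synthetic (a hypothetical skewed variable, not the course dataset), and sturges_bins and fine_class are illustrative helper names. Sturges' rule gives k = ⌈log₂ n⌉ + 1 bins, and pd.cut with an integer splits the variable's range into that many equal-width intervals. Printing the smallest and largest bin counts gives one rough view of what changes between 43 and 50 bins: more bins mean fewer observations per interval, so any statistic computed per bin becomes noisier.

```python
import numpy as np
import pandas as pd


def sturges_bins(n: int) -> int:
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for n observations."""
    return int(np.ceil(np.log2(n))) + 1


def fine_class(values: pd.Series, n_bins: int) -> pd.Series:
    """Split a continuous variable into n_bins equal-width intervals (fine classing)."""
    return pd.cut(values, n_bins)


# Hypothetical data: a skewed continuous variable standing in for a real feature.
rng = np.random.default_rng(seed=0)
x = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=10_000))

for k in (sturges_bins(len(x)), 43, 50):
    # value_counts on a categorical keeps empty intervals (count 0)
    counts = fine_class(x, k).value_counts(sort=False)
    # Equal-width bins on skewed data leave the tail intervals sparse;
    # per-bin estimates (e.g. default rates) are noisiest there.
    print(f"{k} bins: smallest bin has {counts.min()} obs, largest has {counts.max()}")
```

Whether sparser bins actually hurt accuracy depends on what is done with them afterwards (for instance, whether adjacent fine classes are later merged), so this illustrates the trade-off rather than measuring model performance.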


365 Team

Hi John, 
Thanks for reaching out! Choosing the appropriate number of bins for a histogram is a complex topic. Even though there are formal rules, such as Sturges' rule, they rarely work well in practice: they rest on assumptions about the data's distribution (Sturges' rule, for instance, assumes roughly normally distributed data), while real data is often discrete and noisy.
If you'd like to develop more intuition on the matter, check out the chapter on histograms in the Data Visualization course.
It includes a lecture dedicated specifically to choosing the appropriate number of bins. I hope you find it instructive:
https://365datascience.teachable.com/courses/1045353/lectures/22114500
 
Best, 
365 Eli
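To see how much standard rules can disagree, NumPy's np.histogram_bin_edges accepts a rule name as its bins argument and returns the resulting bin edges. The sketch below applies several rules to synthetic, rounded (and therefore discrete) gamma-distributed data; the data and its parameters are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical noisy, discrete-valued data (values rounded to whole numbers).
rng = np.random.default_rng(seed=1)
data = np.round(rng.gamma(shape=2.0, scale=15.0, size=5_000))

bin_counts = {}
for rule in ("sturges", "rice", "sqrt", "scott", "fd", "doane", "auto"):
    edges = np.histogram_bin_edges(data, bins=rule)
    # n edges delimit n - 1 bins
    bin_counts[rule] = len(edges) - 1
    print(f"{rule:>8}: {bin_counts[rule]} bins")
```

On the same 5,000 points the rules differ several-fold (Sturges' rule gives 14 bins, the square-root rule 71), which is one concrete reason to treat a rule's output as a starting point rather than a final answer.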