Techniques and intuition for binning continuous variables (fine classing)

I was wondering how the number of bins will affect the model’s performance. At the beginning of the video it states, “you can visualize everything with a hundred or fewer categories and still make sense of it without getting lost…” Then 50 is chosen as the number of bins when using pandas’ cut method. Suppose I used Sturges’ rule instead, a rule for choosing the number of bins when creating a histogram. I would end up with around 43 bins rather than the chosen 50. I was wondering:

  1. How was it decided to use 50 bins for cutting?
  2. What impact would 43 bins have compared to 50? I assume it doesn’t matter much, but I am trying to get an intuition for how the number of bins affects the accuracy of the model.
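To make the comparison concrete, here is a minimal sketch of how the two bin counts could be set up side by side. It assumes Sturges' rule in its common form, k = ceil(log2(n)) + 1, and uses synthetic data: the `values` series, the `sturges_bins` helper, and the sample size are all hypothetical and not taken from the course, so the bin count it produces will not match the course dataset.

```python
import math

import numpy as np
import pandas as pd


def sturges_bins(n_obs: int) -> int:
    """Number of bins from Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n_obs)) + 1


# Synthetic stand-in for a continuous variable (an assumption, not course data).
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(loc=0.0, scale=1.0, size=1000), name="x")

# Bin count suggested by Sturges' rule for this sample size.
k_sturges = sturges_bins(len(values))

# Equal-width binning with the rule-based count and with a fixed 50 bins.
binned_sturges = pd.cut(values, bins=k_sturges)
binned_50 = pd.cut(values, bins=50)

# Observations per bin; sparse bins are the ones that usually get merged later.
counts_sturges = binned_sturges.value_counts().sort_index()
counts_50 = binned_50.value_counts().sort_index()

print(f"Sturges bins: {k_sturges}, smallest bin holds {counts_sturges.min()} rows")
print(f"Fixed bins: 50, smallest bin holds {counts_50.min()} rows")
```

Comparing the per-bin counts is one way to build intuition: more bins give a finer view of the variable but leave fewer observations in each bin, which matters if neighbouring bins are merged into coarser categories afterwards.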


No answers so far.