I was wondering how the number of bins will affect the model’s performance. At the beginning of the video it states, “you can visualize everything with a hundred or fewer categories and still make sense of it without getting lost…” Then 50 is chosen as the number of bins when using pandas’ cut function. Let’s say I used Sturges’ rule instead, which is a rule for choosing the number of bins when creating a histogram. I would end up with around 43 bins rather than the chosen 50. I was wondering:
- How was it decided to use 50 bins for cutting?
- What sort of impact would 43 bins have compared to 50? I assume it doesn’t matter much, but I am trying to get an intuition for how the number of bins would affect the accuracy of the model.
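
For context, here is a minimal sketch of how the two bin counts could be produced side by side. It assumes the common form of Sturges' rule, k = ceil(log2(n)) + 1, and uses synthetic data; `sturges_bins` and `values` are hypothetical names for illustration, not anything from the video.

```python
import math

import numpy as np
import pandas as pd


def sturges_bins(n: int) -> int:
    """Number of bins under Sturges' rule, assuming k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


# Synthetic stand-in for the real column (assumption: any numeric Series works).
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=1000))

# Bin count suggested by the rule for this many rows.
k = sturges_bins(len(values))

# Equal-width binning with the rule-based count and with the fixed 50.
binned_sturges = pd.cut(values, bins=k)
binned_50 = pd.cut(values, bins=50)

print(k, binned_sturges.nunique(), binned_50.nunique())
```

The count the rule gives depends only on the number of rows, so it will differ for the real dataset. Fitting the same model on each binned version and comparing validation scores would be one way to see how much the choice matters.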