Resolved: Dendrogram when categorical variable has many categories
In the dendrogram example, we see just 6 countries, so the dendrogram is easily readable and we can understand it well. But what if we had more countries, say 50-60 or more? The dendrogram will become huge and the labels will be smushed together. What do we do in such a situation? What would be the best approach?
Thank you in advance!
Thank you for your question!
Indeed, just as mentioned at the end of the video, scalability is a big problem for dendrograms. It is reasonable to use them for a small number of observations. For a larger number, K-means is certainly the better approach. Another possibility is to combine the dendrogram with a heatmap, as done in the following video.
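To illustrate the heatmap option, here is a minimal sketch using seaborn's `clustermap`, which draws a heatmap with the dendrogram on its margin. The data is synthetic, and the country names, feature names, and figure size are placeholders rather than anything from the course material.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real data: 60 hypothetical "countries", 4 numeric features.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(60, 4)),
    index=[f"country_{i:02d}" for i in range(60)],
    columns=["feat_a", "feat_b", "feat_c", "feat_d"],
)

# Cluster the rows only. A tall figure plus yticklabels=True gives each of the
# 60 row labels its own line next to the heatmap.
grid = sns.clustermap(
    data,
    method="ward",
    col_cluster=False,
    figsize=(6, 14),
    yticklabels=True,
)

# Row positions in the order the dendrogram arranged them.
row_order = grid.dendrogram_row.reordered_ind
```

The labels sit beside the heatmap rows instead of under the tree, so making the figure taller is usually enough to keep them readable.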
Hope this helps!
Thank you for your answer. I was also wondering whether it is possible to just show certain parts of the dendrogram when it is very large, instead of visualizing the whole thing?
Hm, I'm not sure if there's a straightforward way of showing parts of the diagram, apart from a trivial zooming-in on the figure. For an easier zoom-in, you can use the magic command
before the plotting, so that you have it opened in a new window. To go back to inline display, type
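The two commands are missing from the post as it appears here. The following is an assumption, not a recovery of the original text: in a Jupyter notebook with a Qt binding installed, a common pairing looks like this. The `qt` backend in particular is a guess; other interactive backends exist.

```python
# Jupyter/IPython cell, not plain Python. The backend name (qt) is an assumption.

# Open figures in a separate, zoomable window:
%matplotlib qt

# ... plotting code goes here ...

# Switch back to figures rendered inside the notebook:
%matplotlib inline
```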
Going back to your question, I've gone through the seaborn documentation for the method and didn't find any parameter or option that allows for a cleaner visualization. I suppose the only way is to manually cut the relevant parts of your dataset and put only them in your dendrogram. Or maybe there is another way I am not aware of :)
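Outside seaborn, SciPy's `scipy.cluster.hierarchy.dendrogram` accepts `truncate_mode` and `p`, which condense the tree to its last few merges. The sketch below shows that option, plus the manual approach of redrawing one cluster by itself. The data, labels, and the choice of 4 clusters are made up for illustration. Note that re-clustering a subset is a simplification: the resulting tree is not guaranteed to match the corresponding branch of the full tree.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in: 60 hypothetical "countries" with 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
labels = np.array([f"country_{i:02d}" for i in range(60)])

Z = linkage(X, method="ward")
fig, (ax_top, ax_part) = plt.subplots(1, 2, figsize=(12, 5))

# Option 1: show only the last p merges. Collapsed branches appear as a
# single leaf labelled with the number of observations in parentheses.
top = dendrogram(Z, truncate_mode="lastp", p=10, ax=ax_top)

# Option 2: cut the tree into at most 4 flat clusters (an arbitrary choice),
# pick the largest one, and draw a dendrogram for its members only.
cluster_ids = fcluster(Z, t=4, criterion="maxclust")
largest = np.bincount(cluster_ids).argmax()
members = np.flatnonzero(cluster_ids == largest)
Z_part = linkage(X[members], method="ward")
part = dendrogram(Z_part, labels=labels[members], ax=ax_part)
```

Option 1 keeps the overall shape of the hierarchy at the cost of hiding individual labels, while Option 2 keeps the labels but only for one group at a time.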