Super learner
This user is a Super Learner. To become a Super Learner, you need to reach Level 8.
Last answered:

18 Apr 2022

Posted on:

15 Apr 2022

Resolved: Dendrogram when categorical variable has many categories

Hello,

In the dendrogram example, we see just 6 countries, so the dendrogram is easily readable and we can understand it well. But what if we had more countries, say 50-60 or more? The dendrogram will become huge and the labels will be smushed together. What do we do in such a situation? What would be the best approach?

Thank you in advance!

Kind regards,
Desislava Hristova

4 answers (2 marked as helpful)
Instructor
Posted on:

18 Apr 2022

Hey Desislava,

Thank you for your question!

Indeed, as mentioned at the end of the video, scalability is a big problem for dendrograms. They work well for a small number of observations, where every leaf label stays readable. For a larger number, K-means is usually the more practical approach, since it summarizes the data as a few clusters instead of drawing one leaf per observation. Another possibility is to combine the dendrogram with a heatmap, as done in the following video.
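
As a rough illustration of the K-means route, here is a minimal sketch using SciPy's `kmeans2`. The random data, the choice of 5 clusters, and all variable names are assumptions made for the example and do not come from the course material.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical data: 60 observations (e.g. countries) with 4 numeric features.
rng = np.random.default_rng(0)
data = rng.normal(size=(60, 4))

# Pick 5 distinct observations as starting centroids (k=5 is an arbitrary choice).
k = 5
init = data[rng.choice(len(data), size=k, replace=False)]

# minit="matrix" tells kmeans2 to use `init` as the initial centroids.
centroids, cluster_ids = kmeans2(data, init, minit="matrix")

# One cluster id per observation; count how many fall in each cluster.
sizes = np.bincount(cluster_ids, minlength=k)
print(sizes)
```

The result is one cluster id per observation, so 60 countries are summarized as a handful of groups rather than 60 crowded leaves.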

Hope this helps!

Kind regards,
365 Hristina

Super learner
Posted on:

18 Apr 2022

Hello Hristina,

Thank you for your answer. I was also wondering whether it is possible to show only certain parts of a very large dendrogram, instead of visualizing the whole thing?

Kind regards,
Desislava Hristova

Instructor
Posted on:

18 Apr 2022

Hey,

Hm, I'm not sure if there's a straightforward way of showing only parts of the dendrogram, apart from simply zooming in on the figure. For an easier zoom-in, you can use the magic command

%matplotlib

before plotting, so that the figure opens in a separate, interactive window. To go back to inline display, type

%matplotlib inline

before plotting.

Going back to your question, I've gone through the seaborn documentation for the method and didn't find any parameter or method that allows for a cleaner visualization. I suppose the only way is to manually select the relevant subset of your dataset and plot only that subset in your dendrogram. Or maybe there is another way I am not aware of :)
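
One option that may be worth checking lies outside seaborn: SciPy's `scipy.cluster.hierarchy.dendrogram` has `truncate_mode` and `p` parameters that collapse the lower levels of the tree. The sketch below is only an illustration under assumed inputs. The random data, the "Country i" labels, Ward linkage, and `p=10` are all hypothetical choices, not part of the course example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical data: 60 "countries" with 4 numeric features each.
rng = np.random.default_rng(0)
data = rng.normal(size=(60, 4))
labels = [f"Country {i}" for i in range(60)]

# Build the hierarchy once on the full dataset (Ward linkage is an assumption).
Z = linkage(data, method="ward")

# Full tree: one leaf per observation.
# no_plot=True returns the layout as a dict without drawing anything.
full = dendrogram(Z, labels=labels, no_plot=True)

# Truncated tree: keep only the last p=10 merged clusters as leaves;
# everything below them is collapsed.
truncated = dendrogram(Z, truncate_mode="lastp", p=10, labels=labels, no_plot=True)

print(len(full["ivl"]), len(truncated["ivl"]))
```

To actually draw the truncated tree, drop `no_plot=True` and call the function with matplotlib available. Collapsed leaves are labeled with the number of observations they contain, shown in parentheses.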

Kind regards,
365 Hristina

Super learner
Posted on:

18 Apr 2022

Hello,

Thank you so much again for the detailed answer. It was very helpful.

Hope you have a great day!

Kind regards,
Desislava Hristova
