Is it wise to run PCA on every segmentation data set before clustering it?
Hi Joseph,
One of the main uses of PCA is to reduce the dimensionality of the data set. That’s especially worthwhile when the data has many features and comparatively few data points. That combination leads to the phenomenon known as the ‘curse of dimensionality’, which techniques such as PCA can help mitigate. That’s why we perform PCA before clustering.
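To make that concrete, here is a minimal sketch of running PCA ahead of K-means with scikit-learn. The synthetic data, the 7 features, the 4 components, and the 3 clusters are all assumptions chosen purely for illustration; they are not taken from any particular data set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a segmentation data set: 200 customers, 7 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))

# Standardize first so that no single feature dominates the components.
X_std = StandardScaler().fit_transform(X)

# Keep 4 principal components (an arbitrary choice for this sketch).
pca = PCA(n_components=4)
scores = pca.fit_transform(X_std)

# Every observation is kept: the row count is unchanged,
# only the number of columns shrinks from 7 to 4.
print(X_std.shape, scores.shape)

# The loadings matrix has one row per component and one column per original feature.
print(pca.components_.shape)

# Cluster on the component scores instead of the original features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(scores)
```

In practice, the number of components is usually picked by looking at `pca.explained_variance_ratio_` rather than fixed in advance.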
Best,
Eli
Thank you 😀
Reference: “How to Combine PCA and K-means in Python?”
Why is the output in the principal component data frame reduced to a 4 X W matrix? Are the rows in my original data being dropped?
-Jade