In the customer analytics course we perform segmentation on customers entering an FMCG store. When I run df.describe(), I see that the education data is skewed to the left. If I drop this column, will it make my segmentation clearer?
Thanks for reaching out!
First, keep in mind that we standardize our features before segmenting, so every feature is on the same scale when the clustering runs.
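To make the standardization step concrete, here is a minimal sketch. It assumes scikit-learn's StandardScaler and uses randomly generated stand-in data; the column names (Age, Income, Education) and the values are hypothetical, not the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical columns and values, not the course dataset)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=200),
    "Income": rng.normal(120_000, 30_000, size=200),
    # Higher education levels are more frequent, which gives a left-skewed column
    "Education": rng.choice([0, 1, 2, 3], size=200, p=[0.1, 0.2, 0.3, 0.4]),
})

# Standardize: each column ends up with mean 0 and (population) standard deviation 1
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled.mean().round(3))
print(scaled.std(ddof=0).round(3))

# Skewness before and after standardization, for comparison
print(df.skew().round(3))
print(scaled.skew().round(3))
```

Standardization only shifts and rescales each column, so the skewness values printed before and after should match: the features become comparable in scale, but the shape of the education distribution stays the same.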
Generally, if we have prior knowledge about our data or a specific feature, and know that the feature is compromised, removing it could improve our results. So, you could try segmenting the data without the education column and compare the results against the full feature set. We’d certainly be happy to see what you come up with. 🙂
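If you want to try that comparison, a rough sketch could look like the one below. It assumes K-means with 4 clusters and the silhouette score as the comparison metric; both are illustrative choices, and the data, the column names, and the `segment` helper are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical columns and values, not the course dataset)
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=n),
    "Income": rng.normal(120_000, 30_000, size=n),
    "Education": rng.choice([0, 1, 2, 3], size=n, p=[0.1, 0.2, 0.3, 0.4]),
})


def segment(frame, n_clusters=4, seed=42):
    """Standardize the features, run K-means, return labels and silhouette score."""
    X = StandardScaler().fit_transform(frame)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return labels, silhouette_score(X, labels)


# Segment once with every feature and once without the education column
labels_all, score_all = segment(df)
labels_sub, score_sub = segment(df.drop(columns=["Education"]))

print(f"Silhouette with Education:    {score_all:.3f}")
print(f"Silhouette without Education: {score_sub:.3f}")

# Agreement between the two sets of cluster assignments (1.0 = identical partitions)
print(f"Adjusted Rand index: {adjusted_rand_score(labels_all, labels_sub):.3f}")
```

Treat the two silhouette scores as a rough guide only: they are computed in different feature spaces, so a higher score after dropping a column does not by itself mean the segments are more useful. It is worth checking whether the resulting segments are still easy to interpret as well.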
Lastly, because this is unsupervised learning, there are no true labels to check the segments against, so it’s difficult to define a notion of ‘best algorithm’. Unfortunately, that’s one of the main challenges of unsupervised learning, and it’s always something to keep in mind.
Thank you 😀