Last answered:

29 Sept 2022

Posted on:

28 Sept 2022


Do seeds always start at random points?

Good morning. I was thinking that, if seeds always start at random points, clustering could fail. For instance: suppose we have points with two features, latitude and longitude, and we want two clusters. It could happen that, from the beginning, all the points are close to the random position of seed 1 and no point is close to the random position of seed 2. The algorithm then stops after a single iteration and the clustering fails. And if randomly placing the seeds is not an effective technique, how can we choose good starting positions for the seeds instead?
I also think that we should define a particular limited area (with two features; a volume with three features, and so on) within which the seeds can be placed.
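The failure mode I mean can be reproduced with a minimal NumPy sketch (my own illustration of plain Lloyd's-algorithm k-means, not code from the course): two tight, well-separated clusters, with seed 2 placed far from all the data, so that every point takes the color of seed 1 and seed 2 never receives a single point.

```python
import numpy as np

# Toy data: two well-separated clusters in (latitude, longitude) space.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=(10.0, 10.0), scale=0.1, size=(50, 2))
points = np.vstack([cluster_a, cluster_b])

def kmeans(points, seeds, n_iter=20):
    """Plain Lloyd's algorithm starting from the given seed positions."""
    centers = np.asarray(seeds, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        # (a center with no assigned points simply stays where it is).
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return centers, labels

# Seed 1 sits between the two clusters; seed 2 is far from everything,
# so it is never the nearest center for any point and ends up empty.
bad_seeds = [(5.0, 5.0), (100.0, 100.0)]
centers, labels = kmeans(points, bad_seeds)
print(np.bincount(labels, minlength=2))  # cluster sizes: seed 2 gets nothing
```

Here the data clearly contains two clusters, yet all 100 points are assigned to seed 1.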
Thank you

2 answers ( 0 marked as helpful)
Instructor
Posted on:

29 Sept 2022


Hi Alessandro,
Thanks for reaching out!
On the question of random seeds, keep in mind that only the initial seeds are placed at random points; at each iteration, the seeds move to the center of their class. So, even though there is randomness at the beginning of the algorithm, with each iteration we get closer and closer to determining the centers of the classes.
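As a rough illustration of this convergence (a hand-rolled sketch, not course material), the distance the centers move at each iteration shrinks toward zero as the assignments stabilize:

```python
import numpy as np

# Two blobs; seeds start at random points, then each iteration moves
# every seed to the mean of the points currently assigned to it.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal((0, 0), 0.5, size=(100, 2)),
    rng.normal((5, 5), 0.5, size=(100, 2)),
])

centers = rng.uniform(points.min(), points.max(), size=(2, 2))  # random start
for i in range(10):
    dists = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = np.array([
        points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
        for k in range(2)
    ])
    shift = np.linalg.norm(new_centers - centers)
    centers = new_centers
    print(f"iteration {i + 1}: centers moved by {shift:.4f}")
```

The printed movement drops to zero once the assignments stop changing, which is the convergence described above.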
Hope this helps!

Best,
365 Eli

Posted on:

29 Sept 2022


Good morning. The problem I described is that starting with random seeds is not always a good way to cluster. There are cases where the seeds do change position at each iteration and yet the clustering still fails, as in my example above: if seed 2 starts too far from the points, it can happen that all the points take the color of seed 1, even when there are two very distinct clusters. In the meantime I have found the solution to my problem on Wikipedia (https://en.wikipedia.org/wiki/K-means%2B%2B): the method I was searching for is "k-means++". That page explains why plain k-means is not always a good procedure and what I mean (see the "Example of sub-optimal clustering", for instance; it is different from my example, but both are solved by the k-means++ method).
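For reference, the k-means++ seeding rule can be sketched in a few lines of NumPy (an illustrative implementation of the D² sampling described on that Wikipedia page, not production code):

```python
import numpy as np

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first seed is a data point chosen uniformly
    at random; each later seed is a data point drawn with probability
    proportional to its squared distance to the nearest seed so far."""
    seeds = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d2 = np.min(
            [np.sum((points - s) ** 2, axis=1) for s in seeds], axis=0
        )
        seeds.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(seeds)

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal((0, 0), 0.1, size=(50, 2)),
    rng.normal((10, 10), 0.1, size=(50, 2)),
])
seeds = kmeans_pp_init(points, 2, rng)
print(seeds)  # with high probability, one seed from each distant blob
```

Because the seeds are chosen among the data points themselves, they always lie inside the region the data occupies, which also addresses the "limited area" idea from my question. In scikit-learn, `KMeans` uses `init="k-means++"` by default.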
Thank you
