Resolved: knn and outliers

Question

How is KNN affected by outliers? Outliers should be far away from the other points, so their distances are large, and the algorithm should naturally ignore them when making a decision, since these points will never be among the k shortest distances. It might be computationally costly, but how exactly does it affect decisions?

Answer 1

Hey Doaa,

Thank you for reaching out!

KNN is a distance-based algorithm, and while it's generally true that outliers are far away from the majority of data points, their presence can still impact KNN in certain ways.

For example, outliers can influence the decision boundary. In classification problems, KNN assigns the class based on the majority of the nearest neighbors. If an outlier is close enough (in terms of distance) to a query point and happens to belong to a different class, it might affect the classification result.

This is more likely when the data is sparse, when clusters overlap, or when the class regions have complex shapes. An outlier is far from the bulk of its own class, but that does not mean it is far from everything: it can land inside or next to another class's region, and then it becomes one of the nearest neighbors for query points in that region.
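To make this concrete, here is a minimal sketch in plain Python. The toy points, the labels "A" and "B", and the helper name `knn_predict` are all made up for illustration. It shows a single class-B point that sits inside the class-A cluster deciding the vote when k = 1, and being outvoted when k = 5.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters plus one outlier.
train = [
    ((0.0, 0.0), "A"), ((0.2, 0.9), "A"), ((1.0, 0.1), "A"), ((0.9, 1.0), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B"), ((5.4, 5.1), "B"),
    ((0.5, 0.5), "B"),  # outlier: far from the B cluster, inside the A cluster
]

query = (0.6, 0.6)  # clearly in the A region

pred_k1 = knn_predict(train, query, k=1)  # the outlier is the single nearest point
pred_k5 = knn_predict(train, query, k=5)  # four A points outvote the outlier

print(pred_k1, pred_k5)  # B A
```

The outlier is far from its own class, yet it is the closest point to this particular query, which is exactly the case where it does make it into the shortest distances.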

And finally, as you pointed out, even when an outlier is too far away to affect the decision, a basic (brute-force) KNN still computes the distance between the query point and every point in the dataset, outliers included. Each extra point therefore adds to the computational load.

Several strategies can reduce the impact of outliers on KNN: increasing the value of K, weighting neighbors by distance so that closer ones count more, or preprocessing the data to remove outliers using statistical methods. These steps can make the predictions more robust, and removing points also reduces the number of distances to compute.
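As a rough illustration of the preprocessing option, the sketch below drops every training point whose label disagrees with the majority of its own k nearest neighbors. This is one simple neighbor-based filter (close in spirit to what is often called edited nearest neighbors), chosen here only as an example; it is not necessarily the statistical method the answer has in mind. The data and the helper names `majority_label` and `drop_disagreeing_points` are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def majority_label(points, query, k):
    """Return the most common label among the k points nearest to `query`."""
    nearest = sorted(points, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def drop_disagreeing_points(train, k=3):
    """Keep only points whose label matches the vote of their k nearest neighbors."""
    kept = []
    for i, (point, label) in enumerate(train):
        others = train[:i] + train[i + 1:]  # leave the point itself out
        if majority_label(others, point, k) == label:
            kept.append((point, label))
    return kept


# Hypothetical toy data: two clusters plus one class-B point inside the A cluster.
train = [
    ((0.0, 0.0), "A"), ((0.2, 0.9), "A"), ((1.0, 0.1), "A"), ((0.9, 1.0), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B"), ((5.4, 5.1), "B"),
    ((0.5, 0.5), "B"),  # outlier
]

cleaned = drop_disagreeing_points(train, k=3)

query = (0.6, 0.6)
pred_before = majority_label(train, query, k=1)    # decided by the outlier
pred_after = majority_label(cleaned, query, k=1)   # outlier no longer present

print(len(train), len(cleaned), pred_before, pred_after)  # 9 8 B A
```

On this toy data only the outlier is removed, so even a 1-nearest-neighbor vote returns "A" afterwards. On real data a filter like this can also delete legitimate points near a class boundary, so it is worth checking what gets dropped.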

Hope this helps!

Kind regards,
365 Hristina
