How can one treat outliers for multiple columns at the same time?
In this course, I have learned how to treat outliers for a single column. But when I tried applying the same method to multiple columns, I either got an error message or most of the columns in the dataset ended up filled with NaN. How can one treat outliers for multiple columns successfully?
Hey arowosegbe! Unfortunately, if you drop outliers for multiple columns, you will end up throwing away a ton of data, because a row gets removed as soon as any one of its columns has an outlier.
To treat multiple columns, I would recommend winsorizing your data. Basically, for each column, you cap the values at that column's own 95th or 99th percentile (or whichever percentile you'd like), and you can do the same at the low end if needed. Every row stays in the dataset, so you don't throw away any data.
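
Here is a rough sketch of what that could look like in pandas, in case it helps. The function name `winsorize_columns`, the default 1st/99th percentile cutoffs, and the toy data are all just assumptions for illustration, so adjust them to your dataset.

```python
import numpy as np
import pandas as pd


def winsorize_columns(df, columns=None, lower=0.01, upper=0.99):
    """Cap each chosen column at its own lower/upper percentile.

    Hypothetical helper: the name and default cutoffs are illustrative.
    """
    out = df.copy()
    if columns is None:
        # Default to every numeric column in the DataFrame.
        columns = out.select_dtypes(include="number").columns
    for col in columns:
        lo = out[col].quantile(lower)  # per-column lower cap
        hi = out[col].quantile(upper)  # per-column upper cap
        # clip replaces values outside [lo, hi] with the nearest cap;
        # no rows are dropped and existing NaN values stay as NaN.
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Toy data with one extreme value planted in each column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
df.loc[0, "a"] = 1000.0
df.loc[1, "b"] = -1000.0

capped = winsorize_columns(df, lower=0.01, upper=0.99)
print(capped.describe())
```

On the NaN problem: one common cause (I can't be sure it is yours) is filtering the whole DataFrame with a boolean DataFrame, e.g. `df[df < cap]`. That keeps every row and replaces the cells that fail the condition with NaN instead of dropping or capping them. Clipping column by column, as above, avoids that.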