On what basis are the quantiles selected, and how can we be confident that the outliers are removed? Usually, when we plot a box plot, we can directly observe the outliers.
Thank you for reaching out.
Trimming at a percentile is a common and statistically reasonable way to remove extreme values, especially in large datasets. The 99th percentile is a popular cutoff because it retains the bulk of the data while removing only the most extreme 1% of values. Confidence in this approach comes from three sources: statistical principles, visual assessment (for example, the box plot you mention), and empirical evaluation based on model performance.
That said, I agree that the approach is somewhat heuristic, but it serves the purpose of the course, which is to demonstrate one way of dealing with outliers in the data. You are welcome to perform further analysis and check whether removing data above the 99th percentile actually gives the best fit.
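If it helps, here is a minimal sketch of how you might compare the 99th percentile cutoff with the upper whisker of a standard box plot (Q3 + 1.5 × IQR). This is not the course's code: the synthetic lognormal data and the helper names `percentile_cutoff` and `boxplot_upper_fence` are assumptions made purely for illustration, so swap in your own column.

```python
import numpy as np


def percentile_cutoff(values, q=99.0):
    """Upper cutoff at the q-th percentile (hypothetical helper)."""
    return float(np.percentile(values, q))


def boxplot_upper_fence(values, k=1.5):
    """Upper whisker limit of a standard box plot: Q3 + k * IQR (hypothetical helper)."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(q3 + k * (q3 - q1))


# Synthetic right-skewed data, used only for illustration (an assumption,
# not the course dataset).
rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

p99 = percentile_cutoff(values)
fence = boxplot_upper_fence(values)

# Keep only the rows at or below each cutoff.
kept_p99 = values[values <= p99]
kept_fence = values[values <= fence]

print(f"99th percentile cutoff: {p99:.2f}, rows removed: {values.size - kept_p99.size}")
print(f"Box plot upper fence:   {fence:.2f}, rows removed: {values.size - kept_fence.size}")
```

On right-skewed data like this sample, the box plot fence tends to sit below the 99th percentile, so the two rules can disagree about which points count as outliers. Refitting the model on each trimmed version and comparing validation error is one way to judge which cutoff works better for your data.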