Oversampling before cross-validation
If you over-sample before CV, as we do manually in the notebook, we get a decent accuracy/F1 score. However, based on what I have read and tested, this approach leads to data leakage and overfitting. Instead, the over-sampling should be included in a pipeline that can be passed to `cross_val_score`, so that over-sampling happens only on the training folds as part of model fitting (see the sketch below).
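For reference, here is a minimal sketch of what I mean, assuming imbalanced-learn is installed; the synthetic dataset, SMOTE, and the random forest are just placeholders, not the exact setup from the notebook:

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Hypothetical imbalanced dataset for illustration only.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Because the over-sampler lives inside the pipeline, it is fitted and applied
# only to the training folds during cross-validation; the validation folds
# stay untouched, so there is no leakage of synthetic samples.
pipeline = Pipeline([
    ("oversample", SMOTE(random_state=42)),
    ("model", RandomForestClassifier(random_state=42)),
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print(scores.mean())
```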
When I did this, I got a much worse accuracy/F1 score, which I found out is explained by the major overfitting that was happening with the notebook's over-sample-first approach.