Oversampling before cross-validation
If you over-sample before cross-validation, as we do manually in the notebook, we get a decent accuracy/F1 score. However, based on what I have read and tested, this approach leads to data leakage and overfitting: the resampled copies of a minority-class sample can end up in both the training and validation folds. Instead, the over-sampling should be wrapped in a pipeline that is passed to `cross_val_score`, so that over-sampling happens inside each fold as part of model training.
When I did this, the accuracy/F1 score dropped sharply, which showed that the earlier good score was explained by the major overfitting occurring in the notebook.
