Posted on:

13 Dec 2023


Oversampling before cross validation

If you over-sample before cross-validation (CV), as we do manually in the notebook, you get a decent accuracy/F1 score. However, based on what I have read and tested, this approach leads to data leakage and overfitting, because over-sampled copies of the same observations can end up in both the training and the validation folds. Instead, the over-sampling should be part of a pipeline that is passed to cross_val_score, so that it is applied only to the training folds as part of model training.


I did this and got a terrible accuracy/F1 score. From what I found, the difference is explained by the major overfitting in the notebook, whose score was inflated by the leakage.
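To make the effect concrete, here is a minimal sketch that uses only the Python standard library, not the notebook's code. Everything in it is an assumption chosen for illustration: the pure-noise data, the 1-nearest-neighbour classifier, the helper names, and minority-class recall standing in for accuracy/F1. It compares over-sampling once before the split with over-sampling inside each training fold. With scikit-learn, the second variant roughly corresponds to placing a sampler in an imbalanced-learn Pipeline and passing that to cross_val_score.

```python
import random


def make_noise_data(n_major=200, n_minor=20, n_features=5, seed=0):
    """Imbalanced data whose features are pure noise (no real signal)."""
    rng = random.Random(seed)
    n = n_major + n_minor
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [0] * n_major + [1] * n_minor
    return X, y


def oversample(X, y, rng):
    """Random over-sampling: duplicate minority rows until classes balance."""
    minority = [i for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def predict_1nn(X_train, y_train, x):
    """Label of the training row closest to x (squared Euclidean distance)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(X_train[i], x))
    return y_train[min(range(len(X_train)), key=dist)]


def minority_recall_cv(X, y, k, seed, oversample_in_fold):
    """k-fold CV; returns the share of minority test rows predicted as minority."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    hits = total = 0
    for test_idx in folds:
        held_out = set(test_idx)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        if oversample_in_fold:
            # Resample the training fold only; the test fold stays untouched.
            X_tr, y_tr = oversample(X_tr, y_tr, rng)
        for i in test_idx:
            if y[i] == 1:
                total += 1
                hits += int(predict_1nn(X_tr, y_tr, X[i]) == 1)
    return hits / total


X, y = make_noise_data()

# Leaky: over-sample first, then split. Copies of the same minority row
# land in both the training and the test folds.
X_leaky, y_leaky = oversample(X, y, random.Random(1))
leaky = minority_recall_cv(X_leaky, y_leaky, k=5, seed=2, oversample_in_fold=False)

# Leak-free: split first, then over-sample inside each training fold.
clean = minority_recall_cv(X, y, k=5, seed=2, oversample_in_fold=True)

print(f"minority recall, over-sampling before CV: {leaky:.2f}")
print(f"minority recall, over-sampling inside CV: {clean:.2f}")
```

Because the features carry no signal, any high score in the first variant can only come from duplicated rows being scored against their own copies, which is the kind of inflated result the notebook appears to show.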


