Last answered:

16 Jun 2025

Posted on:

29 Jan 2025

Resolved: performance test

Random forest has its own built-in cross-validation, using the remaining ~37% of the data that is left out. So why do we need to perform another round of testing?
1 answer (1 marked as helpful)
Instructor
Posted on:

16 Jun 2025

Hi,

You are correct that random forests have a built-in form of cross-validation (the out-of-bag estimate). However, cross-validation and the test dataset serve different purposes.
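For reference, the roughly 37% in the question is the share of rows that a bootstrap sample of the same size as the training set leaves out. Here is a minimal sketch in plain Python that checks the figure; the function name `out_of_bag_fraction` and the row count are made up for illustration:

```python
import math
import random

def out_of_bag_fraction(n_rows: int, seed: int = 0) -> float:
    """Draw one bootstrap sample and return the share of rows never picked."""
    rng = random.Random(seed)
    # Sample n_rows row indices with replacement, keeping the distinct ones.
    picked = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(picked) / n_rows

n_rows = 10_000
analytic = (1 - 1 / n_rows) ** n_rows   # chance a given row is never drawn
simulated = out_of_bag_fraction(n_rows)
limit = math.exp(-1)                    # large-sample limit, about 0.368

print(f"analytic:  {analytic:.4f}")
print(f"simulated: {simulated:.4f}")
print(f"1/e:       {limit:.4f}")
```

Each tree gets its own bootstrap sample, so each tree has a different set of left-out rows to be scored on.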

Cross-validation is used during training to guard against overfitting and during hyperparameter optimization to compare different configurations. Because of this, it can't be used as the final testing benchmark: even though a tree is never trained on its left-out rows, the scores on those rows have already guided our choices about the model. You could say that we, as the data scientists, are the ones overfitting to that data.
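To make that last point concrete, here is a small toy simulation in plain Python (not a random forest, and every name and number in it is made up for illustration). The labels are coin flips, so no configuration can truly beat 50% accuracy, yet picking the best of many configurations on the validation data still produces a flattering validation score:

```python
import random

N_VAL, N_TEST, N_CONFIGS = 100, 10_000, 200

label_rng = random.Random(12345)
# Labels are coin flips: the true accuracy of any model is 50%.
y_val = [label_rng.randint(0, 1) for _ in range(N_VAL)]
y_test = [label_rng.randint(0, 1) for _ in range(N_TEST)]

def predictions(config_seed: int, n: int) -> list:
    """A 'configuration' here is just a random guesser with a fixed seed."""
    rng = random.Random(config_seed)
    return [rng.randint(0, 1) for _ in range(n)]

def accuracy(preds, labels) -> float:
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# "Hyperparameter search": keep whichever configuration scores best on validation.
best_val_acc, best_seed = -1.0, None
for seed in range(N_CONFIGS):
    acc = accuracy(predictions(seed, N_VAL), y_val)
    if acc > best_val_acc:
        best_val_acc, best_seed = acc, seed

# Score the winner on data that played no part in the selection.
test_preds = predictions(best_seed, N_VAL + N_TEST)[N_VAL:]
test_acc = accuracy(test_preds, y_test)

print(f"validation accuracy of the winner: {best_val_acc:.3f}")
print(f"test accuracy of the winner:       {test_acc:.3f}")
```

The winner looks better than chance on the validation data only because it was selected on that data; on the untouched test data it falls back to roughly 50%.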

That's why the test set exists: to have a final dataset that was truly never seen. We use this dataset to get a final benchmark of the accuracy (or any other metric) of our model.
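In practice this could look something like the sketch below, which assumes scikit-learn and a synthetic dataset from `make_classification`; the dataset, split size, and forest settings are placeholders, not the course's own setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Set the test split aside before any training or tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# oob_score=True asks the forest to score each training row using only
# the trees that did not see that row in their bootstrap sample.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)

oob_accuracy = forest.oob_score_              # used while building and tuning
test_accuracy = forest.score(X_test, y_test)  # reported once, at the very end

print(f"out-of-bag accuracy: {oob_accuracy:.3f}")
print(f"test accuracy:       {test_accuracy:.3f}")
```

The out-of-bag score is the one to look at while comparing settings such as the number of trees or the maximum depth; the test score is computed only once the configuration is fixed.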

Hope this answers your question.

Best,
Nikola, 365 Team