Last answered: 18 Apr 2021

Posted on: 17 Apr 2021


Resolved: Splitting Training, Validation and Testing Data

I would like to ask how advisable it is to use sklearn's train_test_split() method to split a dataset into training, validation and test sets by calling it twice with adjusted test_size values, like so:

from sklearn.model_selection import train_test_split

# Hold out 10% of the data as the test set
train_inputs_temp, test_inputs, train_targets_temp, test_targets = train_test_split(
    shuffled_scaled_inputs, shuffled_targets, test_size=0.1, random_state=365)

# Split the remaining 90% into training and validation sets
train_inputs, validation_inputs, train_targets, validation_targets = train_test_split(
    train_inputs_temp, train_targets_temp, test_size=0.1, random_state=365)

I tried both train_test_split() and the splitting method explained in the video, and got almost the same results.

Values obtained using the splitting method explained in the video:

710.0 3579 0.19837943559653534

84.0 447 0.18791946308724833

93.0 1342 0.06929955290611028

Values obtained using train_test_split():

706.0 3579 0.19726180497345627

87.0 447 0.19463087248322147

94.0 1342 0.07004470938897168

Is it advisable to use train_test_split() as shown above to split data for deep learning?

1 answer (1 marked as helpful)
Posted on: 18 Apr 2021


Hi Vardama,
Thanks for reaching out!
It is perfectly fine to use train_test_split() twice to split your data into training, validation and test sets. Good job on thinking of it and sharing it with us here in the Q&A!
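
One thing worth keeping in mind when calling train_test_split() twice: the second test_size applies to what is left after the first split, not to the full dataset. The small sketch below works out the resulting proportions in plain Python. The helper names effective_fractions and second_split_size are made up for illustration and are not part of sklearn.

```python
def effective_fractions(first_test_size, second_test_size):
    """Return (train, validation, test) as fractions of the FULL dataset
    after two successive hold-out splits."""
    test = first_test_size
    remaining = 1.0 - first_test_size           # what the second split sees
    validation = remaining * second_test_size   # held out from the remainder
    train = remaining - validation
    return train, validation, test


def second_split_size(validation_fraction, first_test_size):
    """test_size for the second split so that the validation set ends up
    being `validation_fraction` of the full dataset."""
    return validation_fraction / (1.0 - first_test_size)


# Using test_size=0.1 in both calls, as in the question:
train, validation, test = effective_fractions(0.1, 0.1)
print(f"{train:.2f} / {validation:.2f} / {test:.2f}")  # 0.81 / 0.09 / 0.10

# For a 10% validation set (an 80/10/10 split), the second call would need:
adjusted = second_split_size(0.1, 0.1)
print(f"{adjusted:.4f}")  # 0.1111
```

So the code in the question gives roughly an 81/9/10 split. If an exact 80/10/10 split is wanted, passing the adjusted value as test_size in the second call should get close; the actual set sizes can differ by a sample or so because of rounding.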
