Resolved: Splitting Training, Validation and Testing Data
Hello.
I would like to ask how advisable it is to use sklearn's train_test_split() method to split a dataset into training, validation and test sets by running train_test_split() twice with adjusted test_size values to get the respective chunks of the dataset, like so:
from sklearn.model_selection import train_test_split
train_inputs_temp, test_inputs, train_targets_temp, test_targets = train_test_split(shuffled_scaled_inputs, shuffled_targets, test_size=0.1, random_state=365) # Hold out 10% of the data as test_inputs and test_targets
train_inputs, validation_inputs, train_targets, validation_targets = train_test_split(train_inputs_temp, train_targets_temp, test_size=0.1, random_state=365) # Split the remaining 90% into training and validation sets (validation is 10% of the remainder, i.e. about 9% of the full dataset)
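A self-contained sketch of the same two-call approach, using randomly generated stand-in data (inputs and targets here are hypothetical, not the course dataset). Because the second call's test_size is a fraction of what is left after the first call, this sketch rescales it so the validation set is roughly 10% of the full dataset. Passing an integer test_size (an absolute number of samples) would be another option.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 samples, 10 features, binary targets
rng = np.random.default_rng(0)
inputs = rng.normal(size=(1000, 10))
targets = rng.integers(0, 2, size=1000)

test_fraction = 0.1        # share of the FULL dataset held out for testing
validation_fraction = 0.1  # share of the FULL dataset used for validation

# First call: hold out the test set
train_inputs_temp, test_inputs, train_targets_temp, test_targets = train_test_split(
    inputs, targets, test_size=test_fraction, random_state=365)

# Second call: test_size is relative to the remaining samples, so rescale it
relative_validation = validation_fraction / (1 - test_fraction)
train_inputs, validation_inputs, train_targets, validation_targets = train_test_split(
    train_inputs_temp, train_targets_temp, test_size=relative_validation, random_state=365)

print(len(train_inputs), len(validation_inputs), len(test_inputs))
```

With test_size=0.1 in both calls, as in the code above, the split is instead roughly 81% training, 9% validation and 10% test.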
I have used both train_test_split() and the splitting method explained in the video, and I get almost the same results.
Values obtained using the splitting method explained in the video (number of positive targets, number of samples, proportion of positives):
710.0 3579 0.19837943559653534
84.0 447 0.18791946308724833
93.0 1342 0.06929955290611028
Values obtained using train_test_split() (number of positive targets, number of samples, proportion of positives):
706.0 3579 0.19726180497345627
87.0 447 0.19463087248322147
94.0 1342 0.07004470938897168
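Each line above lists the number of positive targets (the 1s), the number of samples, and their ratio; for example, 710.0 / 3579 is about 0.198. A minimal sketch of computing those three numbers for one split with NumPy, assuming 0/1 targets stored in a NumPy array (describe_split is a hypothetical helper name, not from the course). The ".0" in values such as 710.0 suggests the targets were stored as floats.

```python
import numpy as np

def describe_split(targets):
    """Return (number of positive targets, number of samples, share of positives)."""
    positives = np.sum(targets)   # counts the 1s in a 0/1 target array
    total = targets.shape[0]      # number of samples in this split
    return positives, total, positives / total

# Hypothetical example: 2 positives out of 5 samples
print(*describe_split(np.array([1, 0, 0, 1, 0])))
```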
Is it advisable to use train_test_split() as shown above to split data for deep learning?
Hi Vardama,
Thanks for reaching out!
It is perfectly fine to use train_test_split() twice to split your data into training, validation and test sets. Good job on thinking of it and sharing it with us here in the Q&A!
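The proportion of positive targets differs slightly between the two sets of values in the question (for example 0.188 vs 0.195 for the 447-sample split), which is expected from random sampling. If keeping class proportions consistent across the three sets matters, train_test_split also accepts a stratify argument. A minimal sketch with hypothetical, randomly generated data (about 20% positives):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: roughly 20% positive targets
rng = np.random.default_rng(0)
inputs = rng.normal(size=(1000, 5))
targets = (rng.random(1000) < 0.2).astype(int)

# stratify keeps each class's share nearly the same in both outputs of a call
inputs_temp, test_inputs, targets_temp, test_targets = train_test_split(
    inputs, targets, test_size=0.1, random_state=365, stratify=targets)
train_inputs, validation_inputs, train_targets, validation_targets = train_test_split(
    inputs_temp, targets_temp, test_size=0.1 / 0.9, random_state=365, stratify=targets_temp)

for name, y in [("train", train_targets), ("validation", validation_targets), ("test", test_targets)]:
    print(name, y.sum(), y.shape[0], y.mean())
```

Stratifying the second call on targets_temp (not on the full targets) matters, because that call only sees the remaining samples.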
Best,
Lyubo