Last answered:

30 Jan 2023

Posted on:

15 Oct 2022

Why did you scale before train test split?

Why did you use StandardScaler before train test split? Isn't doing this a form of data leakage because the train data was fit and transformed using enough on the test (unseen) data?

2 answers (0 marked as helpful)
Instructor
Posted on:

20 Oct 2022

Hi Garrett!

Thanks for reaching out.

Can you please clarify what you mean by saying "using enough"?
In general, we want to normalize or standardize the data before proceeding with further statistical or analytical steps. Which of the two to use depends on how many outliers the data contains.
At the same time, please keep in mind that we use .fit() to compute the mean and variance of each feature. Only then can we use .transform(), which is the method we use to standardize the data.
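
To make the .fit() / .transform() distinction concrete with respect to the leakage question, here is a minimal sketch in which the scaler is fitted on the training split only and then applied to both splits. It is not the course's code: the synthetic data, the variable names, and the 80/20 split are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only (not the course dataset).
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen while the scaler is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
scaler.fit(X_train)  # computes per-feature mean and variance from the training rows only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics

print(scaler.mean_)  # per-feature means learned from X_train
print(scaler.var_)   # per-feature variances learned from X_train
```

With this ordering, the scaled training features have mean 0 and unit variance by construction, while the scaled test features will usually be close to that but not exact, because no test-set statistics were used.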

Hope this helps.
Kind regards,
Martin

Posted on:

30 Jan 2023

Hi, why use just two sets rather than three (train, validation, and test) to get a more accurate model? I understand you use the random_state parameter, but wouldn't the model be more accurate with three sets?
