Why did you scale before train test split?
Why did you use StandardScaler before train test split? Isn't doing this a form of data leakage because the train data was fit and transformed using enough on the test (unseen) data?
Thanks for reaching out.
Can you please clarify what you mean by saying "using enough"?
In general, we want to normalize or standardize the data before proceeding with further statistical/analytical steps. Which of the two to use depends on the data: min-max normalization is sensitive to outliers, so standardization is usually the safer choice when the data contains many of them.
At the same time, please keep in mind that we use .fit() to compute the mean and variance of each feature. Only then can we use .transform(), which is the method we use to standardize the data.
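Because .fit() is the step that learns the statistics, a common way to avoid the leakage you describe is to split first and call .fit() on the training portion only, then call .transform() on both portions. Here is a minimal sketch of that order. The synthetic data and the variable names are assumptions made for illustration and are not taken from the course notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data purely for illustration (an assumption, not the course dataset).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen while the scaler is fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
scaler.fit(X_train)  # mean and variance are computed from the training rows only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics

print(X_train_scaled.mean(axis=0))  # close to 0 for every feature
print(X_test_scaled.mean(axis=0))   # usually near 0, but not exactly 0
```

The test set is scaled with the training mean and variance, so its own scaled mean will typically be slightly off zero. That is expected and is a sign that no test information went into the fit.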
Hope this helps.