The 365 Data Science team is proud to invite you to our own community forum: a well-built system for your queries and questions that also gives you the chance to show your knowledge and help others on their path to becoming data science specialists.

Question about Scaler.fit(x) and Scaler.transform()

In the lecture, we talked about using a variable called scaler to hold an empty StandardScaler object. I am a bit confused about the relationship between StandardScaler(), fit() and transform().
Here is my understanding of the material so far; please let me know if I got a concept wrong:
We use the StandardScaler class to standardize the feature data (x).
   –> Because standardization needs the mean and the standard deviation, we call fit() so the scaler can calculate the mean and std dev. of x.
      –> scaler.transform(x) is the line that actually transforms (standardizes) the original data, using the mean and std dev we got from scaler.fit(x).
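The two steps described above can be checked by hand. This is a minimal plain-Python sketch, not StandardScaler itself; the toy values are made up for illustration, and it assumes (as StandardScaler does) the population standard deviation, hence `pstdev` rather than `stdev`.

```python
from statistics import fmean, pstdev

# Toy feature column (illustrative values only)
x_train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# "fit": learn the two numbers that standardization needs
mean = fmean(x_train)   # 5.0
std = pstdev(x_train)   # 2.0 (population std, ddof=0)

# "transform": subtract the learned mean and divide by the learned std
x_train_std = [(v - mean) / std for v in x_train]
print(x_train_std)
```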
If my understanding is correct, does that mean:
When we get new data, do we need to fit the scaler to x again before we can proceed to the transform() line?
I am a bit confused by the last part of the video, starting at 5:10, where the narrator talks about getting new data.
 
Thank you for answering!

1 Answer

365 Team

Hi Kam,
To answer your question: when you have new data, you only need to call transform() on it. You don't fit the scaler again.

  1. We fit the scaler to the data, which determines the mean and standard deviation of each variable.
  2. We transform our data, that is, we subtract the mean and divide by the standard deviation. So we ‘standardize’ our data.
  3. When new data comes in, we transform it (or ‘standardize’ it) with the standard scaler we’ve already fitted.
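The three steps above map onto scikit-learn calls roughly as follows. This is a sketch under the assumption that NumPy and scikit-learn are installed; the toy values are illustrative, and the feature matrix is 2-D (samples by features) because that is the shape StandardScaler expects.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Training features: one column, shape (n_samples, n_features)
x_train = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

scaler = StandardScaler()

# Step 1: fit learns mean_ and scale_ (the std) from the training data only
scaler.fit(x_train)

# Step 2: transform the training data with those learned values
x_train_std = scaler.transform(x_train)

# Step 3: new data arrives -> transform only; do NOT call fit again,
# so the new points are scaled with the training mean and std
x_new = np.array([[3.0], [11.0]])
x_new_std = scaler.transform(x_new)
print(x_new_std.ravel())
```

With these toy values the scaler learns a mean of 5 and a standard deviation of 2, so the new points 3 and 11 become -1 and 3. Calling `fit` on `x_new` instead would compute a different mean and std from just those two points, and the new data would no longer be on the same scale as the data the model was trained on.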

You can think of it the same way as fitting a model. We fit the model on the training data, which determines the coefficients, and then we apply it to our data. (There are a few additional steps here, like testing, cross-validation, etc.)
Now, whenever we obtain new data, we don’t fit the model again, since we’ve already trained it. We simply take the already-trained model and apply it to the new data, just as we apply the already-fitted scaler’s transform() to new data.
Hope this makes things a bit clearer! This is a confusing topic, especially if you’re seeing it for the first time, so don’t worry if it takes a little time to settle in. 🙂

Best,
Eli