Resolved: Standardising the features
Previously we used sklearn's StandardScaler to standardise our features, while in this lesson we used sklearn's preprocessing module. Is there a benefit to using one over the other, or is it purely preference?
Hey Ryan,
Thanks for reaching out.
The sklearn.preprocessing.StandardScaler class and the sklearn.preprocessing.scale function both serve the purpose of standardizing features by removing the mean and scaling to unit variance. However, they differ in how they are used and integrated into the code.
StandardScaler is a class that implements the Transformer API - it fits to data and then transforms it. A StandardScaler object, therefore, comes with fit(), transform(), and fit_transform() methods. Among other benefits, instances of the StandardScaler class can be easily saved and loaded, allowing the scaling parameters to be reused.
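For illustration, here is a minimal sketch of the typical workflow (the X_train/X_test arrays and the scaler.joblib filename are just placeholders):

import numpy as np
import joblib
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # placeholder data
X_test = np.array([[1.5, 15.0]])

scaler = StandardScaler()
scaler.fit(X_train)                       # learns the mean and std of X_train
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics

joblib.dump(scaler, "scaler.joblib")      # persist the fitted scaler
scaler = joblib.load("scaler.joblib")     # reload it later with the same parameters

Note that the test set is transformed with the statistics learned from the training set, which is exactly the kind of state a plain function cannot carry between calls.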
On the other hand, scale() is a function that performs the standardization directly, computing the mean and standard deviation and applying the transformation in one step. It does not maintain any internal state, so each time it is used, it recalculates the mean and standard deviation from the data it is given. It does not have the fit(), transform(), and fit_transform() methods of a transformer class. It's useful for quick and straightforward scaling without the need to fit a scaler object, especially in simple scripts (such as the one used in the lecture).
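A quick sketch of the function-based approach, again with placeholder data:

import numpy as np
from sklearn.preprocessing import scale

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # placeholder data
X_scaled = scale(X)  # standardizes each column in one call; nothing is stored for reuse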
Hope this helps!
Kind regards,
365 Hristina