Posted on: 29 Jun 2021

Clarification on the interpretation of the shapes of y and X

In the 2.9 notebook for this lesson, the specification for the class LinearRegression says the following:
class LinearRegression(linear_model.LinearRegression):
    """
    LinearRegression class after sklearn's, but calculate t-statistics
    and p-values for model coefficients (betas).
    Additional attributes available after .fit()
    are `t` and `p`, which are of the shape (y.shape[1], X.shape[1]),
    which is (n_features, n_coefs).
    This class sets the intercept to 0 by default, since usually we include it
    in X.
    """

I don't understand why it says that y.shape[1] = n_features and X.shape[1] = n_coefs.

First of all, I thought that y is just a one-dimensional vector, so y.shape would be a one-element tuple whose single entry is the # of training examples (aka # of samples), not the # of features. Please correct me if I'm wrong about that.
Moreover, when I run print(y.shape[1]) in my notebook, it gives me IndexError: tuple index out of range, so I don't understand why the specification says that y.shape[1] = n_features, since it seems that y.shape[1] doesn't even exist.
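For what it's worth, here is a minimal NumPy sketch (with made-up target values) showing why the IndexError happens for a 1-D y, and how a reshaped 2-D column target does have a second axis:

```python
import numpy as np

# Hypothetical 1-D target vector; any 1-D array behaves the same way.
y = np.array([3.0, 5.0, 7.0])
print(y.shape)    # (3,) -- a one-element tuple, so y.shape[1] raises IndexError

# Reshaping to a column vector adds a second axis:
y2 = y.reshape(-1, 1)
print(y2.shape)   # (3, 1) -- now y2.shape[1] exists and equals 1
```

So the docstring's y.shape[1] only makes sense if y is passed as a 2-D array (n_samples, n_targets), in which case y.shape[1] is the number of target columns, not features.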

Second of all, isn't it true that the # of features = # of coefficients? In our case, X.shape[1] = 2, which I understand to be the # of coefficients in our linear regression model, equal to the # of features in our model (i.e. 'SAT' and 'Rand 1,2,3'). Am I understanding this correctly?
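To check this intuition, here is a quick sketch on made-up data (the feature values and coefficients below are invented for illustration) showing that sklearn fits exactly one coefficient per column of X:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))           # 10 samples, 2 features (like 'SAT' and 'Rand 1,2,3')
y = X @ np.array([1.5, -0.5]) + 2.0    # made-up linear target

reg = linear_model.LinearRegression().fit(X, y)
print(X.shape[1], reg.coef_.shape)     # 2 (2,) -- one coefficient per feature
```

So yes, with the intercept handled separately (or included as an extra column of ones in X), the number of coefficients equals X.shape[1].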

Thank you! Any help would be appreciated!

0 answers (0 marked as helpful)