Clarification on the interpretation of the shapes of y and X
In the 2.9 notebook for this lesson, the specification for the class LinearRegression says the following:
class LinearRegression(linear_model.LinearRegression):
    """
    LinearRegression class after sklearn's, but calculate t-statistics
    and p-values for model coefficients (betas).
    Additional attributes available after .fit() are `t` and `p`,
    which are of the shape (y.shape[1], X.shape[1]),
    which is (n_features, n_coefs).
    This class sets the intercept to 0 by default, since usually we include it
    in X.
    """
I don't understand why it says that y.shape[1] = n_features and X.shape[1] = n_coefs.
First of all, I thought that y is just a one-dimensional vector, so y.shape[0] would equal the # of training examples (aka the # of samples), not the # of features. Please correct me if I'm wrong about that.
Moreover, when I run print(y.shape[1]) in my notebook, it gives me IndexError: tuple index out of range, so I don't understand why the specification says that y.shape[1] = n_features, since it seems that y.shape[1] doesn't even exist.
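To show what I mean, here is a minimal sketch (with a made-up array, not the lesson's actual data) of why y.shape[1] fails for a 1-D target but works once y is reshaped into a column:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])  # 1-D target: shape is (n_samples,)
print(y.shape)                 # (3,) -- only one axis, so y.shape[1] raises IndexError

y_2d = y.reshape(-1, 1)        # column vector: shape (n_samples, 1)
print(y_2d.shape)              # (3, 1) -- now y_2d.shape[1] exists and equals 1
```

So as far as I can tell, y.shape[1] only makes sense if y is stored as a 2-D array.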
Second of all, isn't it true that the # of features = # of coefficients? In our case, X.shape[1] = 2, which I understand as the # of coefficients in our linear regression model, which equals the # of features in our model (i.e. 'SAT' and 'Rand 1,2,3'). Am I understanding this correctly?
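Here is a quick check of that understanding, using random stand-in data (not the notebook's 'SAT' and 'Rand 1,2,3' columns): sklearn fits one coefficient per column of X, so coef_ has length X.shape[1].

```python
import numpy as np
from sklearn import linear_model

X = np.random.rand(10, 2)  # (n_samples, n_features); X.shape[1] == 2
y = np.random.rand(10)     # 1-D target of length n_samples

reg = linear_model.LinearRegression().fit(X, y)
print(reg.coef_.shape)     # (2,) -- one coefficient per feature, i.e. X.shape[1] of them
```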
Thank you! Any help would be appreciated!