Posted on: 14 Dec 2020

# Code for Multivariate P-values

Hello!
I have run into a problem with the code provided for calculating multivariate p-values in the Machine Learning course. I downloaded the code from the linked course and copied and pasted it directly. Everything runs, and I get an output for all the p-values.

However, when I did the Multiple Linear Regression - Exercise (using sklearn), I got p-values very different from those in the solutions. They are also very different from what statsmodels.api or R gives on the same data set.

I think there may be a problem with how the standard error or the SSE is calculated in the downloadable Python code (sklearn - How to properly include p-values.ipynb). I don't know a whole lot about Python (or linear algebra, honestly), but when I run the same analysis in R I get very different standard errors for the coefficients.
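For reference, here is a minimal sketch of the textbook OLS standard-error calculation as I understand it, assuming the design matrix includes an intercept column and the residual variance is SSE / (n - k). The function name `ols_standard_errors` and the synthetic data are my own illustrative assumptions; this is not the course code or the course data set.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return (coefficients, standard errors, t-values) for OLS with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Prepend a column of ones so the intercept is part of the design matrix.
    X_design = np.column_stack([np.ones(n), X])
    k = X_design.shape[1]  # number of estimated parameters, intercept included
    # Least-squares coefficients.
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    residuals = y - X_design @ beta
    # Unbiased residual variance: SSE / (n - k).
    sigma2 = residuals @ residuals / (n - k)
    # Covariance of the estimates: sigma^2 * (X'X)^-1.
    cov = sigma2 * np.linalg.inv(X_design.T @ X_design)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic stand-in for the exercise data (NOT the real data set).
rng = np.random.default_rng(0)
size = rng.uniform(500, 1500, 100)
year = rng.integers(2006, 2019, 100).astype(float)
price = -5_000_000 + 200 * size + 3000 * year + rng.normal(0, 20000, 100)

beta, se, t = ols_standard_errors(np.column_stack([size, year]), price)
print(se)  # three standard errors: intercept, size, year
print(t)   # matching t-values
```

With an intercept in the design matrix this returns three standard errors, one per row of R's coefficient table, which makes a line-by-line comparison with `lm` easier.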

Maybe I am doing something wrong, but I don't have any idea what it might be.

Any help would be appreciated. If I didn't explain well enough, please let me know!

Minimal reproducible example:

The Python code is all downloadable from your course. Looking at the SE, t-values, and p-values, I got:

`print(reg_with_pvalues.se)`

`[[12.34242586  5.53812016]]`

`print(reg_with_pvalues.t)`

`[[ 18.44863049 526.67425796]]`

`print(reg_with_pvalues.p)`

`[0. 0.]`

In R:

`y <- data$price`

`x1 <- data$size`

`x2 <- data$year`

`model <- lm(y ~ x1 + x2)`

`summary(model)`

Coefficients:

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -5.772e+06 | 1.583e+06 | -3.647 | 0.000429 |
| x1 | 2.277e+02 | 1.247e+01 | 18.254 | < 2e-16 |
| x2 | 2.917e+03 | 7.859e+02 | 3.711 | 0.000344 |

statsmodels.api gives the same values as R.
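A quick arithmetic check on the numbers I posted, assuming both tools define the t-value as coefficient / standard error (the usual definition; I have not confirmed this in the course code). Multiplying the Python t-values by the Python standard errors should recover the coefficients the Python code used:

```python
# Values copied from the outputs above.
py_se = [12.34242586, 5.53812016]    # Python: size, year
py_t = [18.44863049, 526.67425796]
r_est = [2.277e+02, 2.917e+03]       # R: x1 (size), x2 (year)
r_se = [1.247e+01, 7.859e+02]

# Implied Python coefficients, assuming t = coefficient / SE.
py_coef = [t * s for t, s in zip(py_t, py_se)]
print(py_coef)  # about 227.7 and 2916.8, close to R's estimates

# Ratio of R's standard error to Python's for each predictor.
se_ratio = [r / p for r, p in zip(r_se, py_se)]
print(se_ratio)  # close to 1 for size, above 100 for year
```

If that assumption holds, the coefficients agree between Python and R, and the standard error for size is nearly the same. The disagreement seems to be concentrated in the standard error for year and in the missing intercept row.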