
Code for Multivariate P-values


Hello!
I have run into a problem with the code provided in the Machine Learning Course for calculating multivariate p-values. I downloaded the code from the linked course and copied and pasted it directly. Everything runs, and I get an output for all the p-values.

However, when I did the Multiple Linear Regression – Exercise (using sklearn), I got very different p-values from those given in the solutions. I also get very different results from the same data set when I use statsmodels.api or R.

I think there may be a problem with how the standard error or the SSE is calculated in the downloadable Python code (sklearn – How to properly include p-values.ipynb). I don’t know a whole lot about Python (or linear algebra, honestly), but when I run the same analysis in R I get very different standard errors for the coefficients.

Maybe I am doing something wrong, but I don’t have any idea what it might be.

Any help would be appreciated. If I didn’t explain well enough, please let me know!

 

Minimal reproducible example:

The Python code is all downloadable from your course. Looking at the SE, t-values, and p-values, I got:

print(reg_with_pvalues.se)

[[12.34242586  5.53812016]]

print(reg_with_pvalues.t)

[[ 18.44863049 526.67425796]]

print(reg_with_pvalues.p)

[0. 0.]

 

In R:

library(readr)
data <- read_csv('real_estate_price_size_year.csv')
y <- data$price
x1 <- data$size
x2 <- data$year
model <- lm(y ~ x1 + x2)
summary(model)  # prints the coefficient table below

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -5.772e+06   1.583e+06   -3.647  0.000429
x1           2.277e+02   1.247e+01   18.254   < 2e-16
x2           2.917e+03   7.859e+02    3.711  0.000344

statsmodels.api gives the same values as R.
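
One observation that may help narrow this down: multiplying each t-value from the Python output by its SE gives roughly 227.7 and 2917, which matches the R estimates for x1 and x2. So the coefficients seem to agree and only the standard errors differ. A possible cause (an assumption, since I have not traced the course code line by line) is that the SE is computed from the raw feature matrix without a column of ones for the intercept. Below is a minimal sketch of the textbook OLS calculation on top of sklearn for comparison. The function name `ols_inference` is my own and is not part of the course material or of sklearn.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def ols_inference(X, y):
    """Fit OLS with sklearn and return (params, se, t, p).

    Index 0 of every returned array refers to the intercept.
    Hypothetical helper for illustration, not the course's class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    reg = LinearRegression().fit(X, y)
    residuals = y - reg.predict(X)

    # Residual degrees of freedom: one is lost for each slope
    # and one more for the intercept.
    dof = n - k - 1
    sigma2 = residuals @ residuals / dof

    # The design matrix must include the column of ones for the intercept.
    X_design = np.column_stack([np.ones(n), X])

    # (X'X)^-1 computed as pinv(X) @ pinv(X).T, which is more stable than
    # inverting X'X directly when a column (e.g. year) has a large mean.
    pinv_X = np.linalg.pinv(X_design)
    cov = sigma2 * (pinv_X @ pinv_X.T)
    se = np.sqrt(np.diag(cov))

    params = np.concatenate([[reg.intercept_], reg.coef_])
    t = params / se
    p = 2 * stats.t.sf(np.abs(t), dof)  # two-sided p-values
    return params, se, t, p
```

Assuming the CSV has the columns `price`, `size` and `year` (as in the R code above), it could be called as `ols_inference(data[['size', 'year']].values, data['price'].values)` after reading the file with pandas. If the intercept column really is the issue, the SEs from this sketch should line up with the R and statsmodels output.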
