Last answered:

30 Nov 2020

Posted on:

19 Nov 2020


Properly including p values with sklearn using real estate data produces unexpected results

I am doing the multiple linear regression exercise in the Machine Learning with Python module. The exercise uses the provided Real Estate dataset, which contains price, size and year variables, together with the sklearn package. As well as the univariate p values, I thought I would also calculate the p values properly with sklearn, copying the example given earlier in the course (which used the SATS dataset). Unfortunately, the "proper" p values come out as zero, while the univariate ones come out as I would expect.
reg_with_pvalues.p gives

array([0., 0.])

but the univariate p values come out as
array([0.   , 0.357])
The other "proper" regression values (intercept, coefficients and R-squared) come out as expected and match the solution given, but the p values do not.

Could you give me some advice on troubleshooting this issue?


1 answer (0 marked as helpful)
Posted on:

30 Nov 2020

Hi Simon, Thanks for reaching out. There is actually no way around this "issue": univariate p-values will, in general, differ from the multivariate ones, because each univariate test looks at one feature on its own, while the multivariate test assesses that feature with the other features already in the model. Note that the multivariate p-values are the preferable ones. If you are dealing with a simple linear regression, though, the univariate p-value will equal the multivariate one (as there is a single variable). Best, The 365 Team
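To make the difference concrete, here is a minimal sketch rather than the course's own code. It assumes the univariate p values come from sklearn's f_regression and that the "proper" ones are two-sided t-test p values on the fitted coefficients. The helper multivariate_p_values and the synthetic size/year/price data are hypothetical stand-ins, not the course class or the real Real Estate dataset.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression


def multivariate_p_values(X, y):
    """Two-sided t-test p-values for the slopes of an OLS fit with intercept.

    Hypothetical helper for illustration; not the class used in the course.
    """
    n, k = X.shape
    reg = LinearRegression().fit(X, y)
    residuals = y - reg.predict(X)
    # Residual degrees of freedom: n observations minus k slopes and 1 intercept.
    dof = n - k - 1
    sigma2 = residuals @ residuals / dof
    # Centring the features accounts for the intercept in the slope covariance.
    Xc = X - X.mean(axis=0)
    cov = sigma2 * np.linalg.inv(Xc.T @ Xc)
    t = reg.coef_ / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t), dof)


# Synthetic stand-in data (assumption: NOT the real Real Estate dataset).
rng = np.random.default_rng(0)
n = 100
size = rng.normal(850, 300, n)
year = rng.choice([2006, 2009, 2012, 2015, 2018], n).astype(float)
price = 200 * size + 3000 * (year - 2006) + rng.normal(0, 10000, n)
X = np.column_stack([size, year])

# Univariate: each feature is tested against the target on its own.
p_uni = f_regression(X, price)[1]
# Multivariate: each feature is tested with the other one in the model.
p_multi = multivariate_p_values(X, price)
print("univariate  :", p_uni)
print("multivariate:", p_multi)

# With a single predictor the two approaches should agree.
p_uni_single = f_regression(X[:, [1]], price)[1]
p_multi_single = multivariate_p_values(X[:, [1]], price)
print("single predictor agrees:", np.allclose(p_uni_single, p_multi_single))
```

Printing the arrays without rounding is deliberate: a very small p value shows as 0. once it is rounded to three decimals, so a displayed zero does not necessarily mean the value is exactly zero.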
