Properly including p values with sklearn using real estate data produces unexpected results

Question

Hello,
I am doing the multiple linear regression exercise in the Machine Learning with Python module: https://learn.365datascience.com/courses/machine-learning-in-python/multiple-linear-regression

This exercise uses the provided Real Estate dataset, which contains price, size and year variables, and the sklearn package. In addition to the univariate p values, I wanted to calculate the p values properly with sklearn by copying the example given earlier in the course (which used the SATS dataset): https://learn.365datascience.com/courses/machine-learning-in-python/a-note-on-calculation-of-p-values-with-sklearn

Unfortunately, the "proper" p values come out as zero, while the univariate ones come out as I would expect.

reg_with_pvalues.p gives

array([0., 0.])

but the univariate p values come out as
p_values.round(3)

array([0.   , 0.357])

The other "proper" regression values (intercept, coefficients and R-squared) come out as expected and match the solution given, but the p values do not.

Could you give me some advice on troubleshooting this issue?

Thanks!

Answer 1

Hi Simon,

Thanks for reaching out. In fact, there is no way around this "issue". Univariate p-values will generally differ from the multivariate ones. Note that multivariate p-values are preferable in all situations. For a simple linear regression, though, the univariate p-value equals the multivariate one, as there is only a single variable.

Best,
The 365 Team
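To illustrate the difference between the two kinds of p values, here is a minimal sketch on synthetic data. It is not the course's code and does not use the Real Estate dataset: the data, the variable names and the helper multivariate_p_values are made up for illustration. The univariate p values come from sklearn's f_regression, which tests each feature on its own. The multivariate ones come from the usual OLS t-test on each coefficient with all features in the model. The sketch uses scipy's survival function stats.t.sf rather than 1 - cdf, since that subtraction can round a very small p value to exactly 0 in floating point.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression


def multivariate_p_values(X, y):
    """Two-sided t-test p values for each slope of an OLS fit with intercept.

    Hypothetical helper written for this sketch, not part of sklearn.
    """
    n, k = X.shape
    reg = LinearRegression().fit(X, y)
    resid = y - reg.predict(X)
    dof = n - k - 1                      # observations - features - intercept
    sigma2 = resid @ resid / dof         # unbiased residual variance
    # Centering X accounts for the intercept and keeps the inverse well conditioned.
    Xc = X - X.mean(axis=0)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    t = reg.coef_ / se
    return 2 * stats.t.sf(np.abs(t), dof)


# Synthetic stand-in data (assumed values, NOT the Real Estate dataset).
rng = np.random.default_rng(0)
n = 100
size = rng.normal(850, 300, n)
year = rng.choice([2006, 2009, 2012, 2015, 2018], n).astype(float)
price = 100_000 + 220 * size + 3_000 * (year - 2006) + rng.normal(0, 20_000, n)

X = np.column_stack([size, year])
y = price

# Univariate: each feature is tested alone against the target.
_, p_uni = f_regression(X, y)

# Multivariate: each coefficient is tested with the other feature in the model.
p_multi = multivariate_p_values(X, y)

print("univariate  :", p_uni)
print("multivariate:", p_multi)
```

With a single feature the two approaches should agree, which can be checked by passing X[:, [0]] or X[:, [1]] to both f_regression and multivariate_p_values.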
