I am doing the multiple linear regression exercise in the Machine Learning with Python module.
This exercise uses the provided Real Estate dataset, which contains price, size and year variables, together with the sklearn package. As well as the univariate p-values, I thought I would also calculate the p-values properly with sklearn, copying the example given earlier in the course (which used the SATS dataset):
Unfortunately, the “proper” p-values come out as zero, while the univariate p-values come out as I would expect:
array([0. , 0.357])
The other "proper" regression values (intercept, coefficients and R-squared) come out as expected and match the solution given, but the p-values do not.
Could you give me some advice on troubleshooting this issue?
Thanks for reaching out.
In fact, there is no way around this. Univariate p-values are computed by testing each feature against the target on its own, so in general they differ from the p-values of the multivariate model.
Note that the multivariate p-values are generally preferable, because each one accounts for the other variables in the model. If you are dealing with a simple linear regression, though, the univariate p-value will equal the multivariate one (as there is a single variable).
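To illustrate the difference, here is a minimal sketch that computes both kinds of p-values side by side. It rests on several assumptions: the data is synthetic (a stand-in for the Real Estate dataset, not the real file), the helper `coefficient_p_values` is a hypothetical name, and the multivariate p-values are obtained from standard OLS t-tests with `scipy`, which is not necessarily how the course example does it.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the exercise data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n = 100
size = rng.uniform(300, 1500, n)
year = rng.choice([2006, 2009, 2012, 2015, 2018], n).astype(float)
price = 100_000 + 230 * size + 2_500 * (year - 2006) + rng.normal(0, 20_000, n)

X = np.column_stack([size, year])
y = price


def coefficient_p_values(X, y):
    """Two-sided t-test p-values for the slopes of an OLS fit with intercept.

    Hypothetical helper written for this illustration.
    """
    n_samples, n_features = X.shape
    reg = LinearRegression().fit(X, y)
    residuals = y - reg.predict(X)
    dof = n_samples - n_features - 1           # residual degrees of freedom
    sigma2 = residuals @ residuals / dof       # estimated noise variance
    X_centered = X - X.mean(axis=0)            # centering accounts for the intercept
    cov = sigma2 * np.linalg.inv(X_centered.T @ X_centered)
    t_stats = reg.coef_ / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof)


# Univariate: each feature is tested against the target on its own.
_, p_univariate = f_regression(X, y)

# Multivariate: each coefficient is tested within the joint model.
p_multivariate = coefficient_p_values(X, y)

print("univariate  :", p_univariate)
print("multivariate:", p_multivariate)

# With a single feature the two approaches coincide.
X_single = X[:, [0]]
_, p_single_univariate = f_regression(X_single, y)
p_single_multivariate = coefficient_p_values(X_single, y)
print("single feature:", p_single_univariate, p_single_multivariate)
```

With two features the two printed arrays generally differ, while the single-feature check shows the univariate and multivariate p-values agreeing, since the F-test on one regressor is equivalent to the t-test on its coefficient.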
The 365 Team