Last answered:

26 Oct 2021

Posted on:

26 Oct 2021

Adjusted R-squared is very low: how do I get the best fit with a linear regression model?

In the regression results below, the R-squared is very low, but the p-values for some explanatory variables are significant. Is this something to be worried about? What steps should I follow to improve the R-squared and get a better fit?

OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     40.25
Date:                Tue, 26 Oct 2021   Prob (F-statistic):           5.93e-57
Time:                        17:28:47   Log-Likelihood:            -7.5394e+05
No. Observations:               87864   AIC:                         1.508e+06
Df Residuals:                   87856   BIC:                         1.508e+06
Df Model:                           7
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.296e+04   1334.938      9.706      0.000    1.03e+04    1.56e+04
Item_W                   0.5884      1.000      0.588      0.556      -1.372       2.549
Item_Type               -1.2032      0.930     -1.293      0.196      -3.027       0.620
Item_MRP                 0.2944      0.073      4.027      0.000       0.151       0.438
Outlet_Year             -5.5148      0.668     -8.260      0.000      -6.823      -4.206
Outlet_Size             20.0983      6.631      3.031      0.002       7.101      33.096
Outlet_Location_Type   -69.7235      5.810    -12.001      0.000     -81.111     -58.336
Outlet_ID               -3.5718      3.764     -0.949      0.343     -10.950       3.806
==============================================================================
Omnibus:                    14857.798   Durbin-Watson:                   1.996
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            24973.048
Skew:                           1.128   Prob(JB):                         0.00
Kurtosis:                       4.317   Cond. No.                     6.16e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.16e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
1 answer (0 marked as helpful)
Instructor
Posted on:

26 Oct 2021

Hey Aadhavan,

Thank you for your question!

What I can see from your results is that the condition number at the bottom is very large (6.16e+05), which, as the warning notes, suggests multicollinearity, i.e., two or more of your explanatory variables are highly correlated. Therefore, you should most likely revisit your dataset.
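One common way to check which variables are involved is the variance inflation factor (VIF). The snippet below is only a minimal sketch using NumPy on synthetic data: the columns `x1`, `x2`, `x3` and the helper `vif_scores` are made up for illustration and are not your actual variables. It relies on the fact that the VIF of each predictor is the corresponding diagonal entry of the inverse correlation matrix.

```python
import numpy as np


def vif_scores(X):
    """Return the variance inflation factor of each column of X.

    X has shape (n_samples, n_features). The VIF of feature j equals
    the j-th diagonal entry of the inverse of the correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic data, for illustration only: x3 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
x3 = x1 + 0.05 * rng.normal(size=1000)
X = np.column_stack([x1, x2, x3])

vifs = vif_scores(X)
for name, v in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {v:.1f}")
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. In this toy example `x1` and `x3` should come out far above that range, while `x2` stays close to 1. To try it on your own data, you would pass your explanatory variables (without the constant) instead of `X`.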

You can find more on this issue and possible solutions in our video on multicollinearity, which is part of the Machine Learning in Python course.

Hope this helps!

Kind regards,
365 Hristina
