Adjusted R-squared is very low. How do I get the best fit using a linear regression model?
In the regression results below, R-squared is very low, but the p-values for some explanatory variables are significant. Is this something to worry about? What steps should I follow to improve R-squared and get a better fit?
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.003
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     40.25
Date:                Tue, 26 Oct 2021   Prob (F-statistic):           5.93e-57
Time:                        17:28:47   Log-Likelihood:            -7.5394e+05
No. Observations:               87864   AIC:                         1.508e+06
Df Residuals:                   87856   BIC:                         1.508e+06
Df Model:                           7
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.296e+04   1334.938      9.706      0.000    1.03e+04    1.56e+04
Item_W                   0.5884      1.000      0.588      0.556      -1.372       2.549
Item_Type               -1.2032      0.930     -1.293      0.196      -3.027       0.620
Item_MRP                 0.2944      0.073      4.027      0.000       0.151       0.438
Outlet_Year             -5.5148      0.668     -8.260      0.000      -6.823      -4.206
Outlet_Size             20.0983      6.631      3.031      0.002       7.101      33.096
Outlet_Location_Type   -69.7235      5.810    -12.001      0.000     -81.111     -58.336
Outlet_ID               -3.5718      3.764     -0.949      0.343     -10.950       3.806
==============================================================================
Omnibus:                    14857.798   Durbin-Watson:                   1.996
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            24973.048
Skew:                           1.128   Prob(JB):                         0.00
Kurtosis:                       4.317   Cond. No.                     6.16e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.16e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
1 answer (0 marked as helpful)
Hey Aadhavan,
Thank you for your question!
From the results you posted, I can see that the condition number at the bottom is very large. As the warning notes, this may indicate multicollinearity, i.e., two or more of your explanatory variables are highly correlated. You should therefore revisit your dataset and check how the predictors relate to one another.
You can find more on this issue and possible solutions in our video on multicollinearity, which is part of the Machine Learning in Python course.
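One common way to check for multicollinearity is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R²). The sketch below is only an illustration under stated assumptions. It uses NumPy alone, runs on synthetic data rather than your dataset, and the `vif` helper and the column names `a`, `b`, `c` are hypothetical. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign of trouble, but that threshold is a convention, not a hard rule.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic predictors (NOT the data from the question):
# "c" is almost a copy of "a", while "b" is independent of both.
rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.05 * rng.normal(size=n)

X = np.column_stack([a, b, c])
vifs = vif(X)
for name, value in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

Here `a` and `c` should come out with very large VIFs and `b` with a value close to 1. To try this on your own data, you could pass the predictor columns (without the constant) to the same helper. statsmodels also ships a `variance_inflation_factor` function in `statsmodels.stats.outliers_influence` that serves the same purpose.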
Hope this helps!
Kind regards,
365 Hristina