Resolved: VIF vs correlation in multicollinearity
Dear Team,
I understand that these two methods measure different things, but can they be used interchangeably when checking for multicollinearity?
Which one is more reliable?
VIF has trouble with dummies, while pd.DataFrame.corr() doesn't.
Thanks,
Peter
Hey Peter,
Thank you for the good question!
You can definitely apply both approaches to search for possible collinearity between the features in a dataset. The correlation matrix gives you the pairwise correlation between predictors. However, collinearity among more than two predictors is also possible (and often the case). What is worse, three or more features can be jointly related, for example when one is close to a linear combination of the others, while no single pair shows a high correlation. That is multicollinearity, and it is where the VIF comes in handy: it measures how well each predictor is explained by all the other predictors together.
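To make this concrete, here is a minimal sketch on synthetic data. It is my own illustration, not something taken from the course material. The column names x1 to x5, the noise scale, and the seed are all made up. Instead of a library VIF routine, it relies on the identity that each predictor's VIF equals the matching diagonal entry of the inverse correlation matrix, which corresponds to regressing each predictor on the others with an intercept.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 1000

# Four independent predictors (synthetic, for illustration only)
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x1", "x2", "x3", "x4"])
# A fifth predictor that is almost exactly the sum of the other four
df["x5"] = df[["x1", "x2", "x3", "x4"]].sum(axis=1) + rng.normal(scale=0.1, size=n)

corr = df.corr()

# Largest absolute pairwise correlation, ignoring the diagonal of ones
off_diag = corr.to_numpy().copy()
np.fill_diagonal(off_diag, 0.0)
max_pairwise = np.abs(off_diag).max()

# VIF of each predictor = matching diagonal entry of the inverse correlation matrix
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=corr.columns)

print(f"largest pairwise |r|: {max_pairwise:.2f}")
print(vif.round(1))
```

In this setup the pairwise correlations stay moderate (around 0.5 between x5 and each of the others), so the correlation matrix alone would not raise a flag, while the VIFs land far above the usual rule-of-thumb cutoffs of 5 or 10.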
I can't really say that correlation matrices can replace VIFs and vice versa. Rather, these two approaches complement each other. Nevertheless, VIFs would most often be used when searching for possible multicollinearity in the data.
There is some excellent literature one can study on the topic. A good starting point is An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. You could also check A Handbook of Statistical Analyses using SPSS by Landau and Everitt.
Hope this answers your question, at least to some extent :)
Kind regards,
365 Hristina
Thank you for your answer. I checked the net before, but your answer was the one that cleared up the haze.
If I may take the question further, I will open another thread about control variables.