Resolved: VIF vs correlation in multicollinearity
I understand that these two methods measure different things, but can they be used interchangeably to detect multicollinearity?
Which one is more reliable?
VIF has trouble with dummies, while pd.DataFrame.corr() doesn't.
Thank you for the good question!
You can definitely apply both approaches to search for possible collinearity between the features in a dataset. The correlation matrix gives you the pairwise correlation between predictors. However, collinearity can also involve more than two predictors (and often does). What is worse, one predictor can be close to a linear combination of several others while no single pair of predictors shows a high correlation; that is multicollinearity. This is where the VIF comes in handy: it measures how well each predictor is explained by all the other predictors together, not just by one of them.
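To make this concrete, here is a minimal sketch on synthetic data (the data, the seed, and all variable names are made up for illustration, not taken from your dataset). Four independent features are generated, and a fifth is built as their sum plus a little noise. The VIFs are computed as the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing each feature on all the others. With real data in a DataFrame, `pd.DataFrame.corr()` would give you the same correlation matrix as `np.corrcoef` does here.

```python
import numpy as np

# Synthetic example (assumption: all data and names here are illustrative only)
rng = np.random.default_rng(0)
n = 1000

# Four independent features
base = rng.normal(size=(n, 4))

# A fifth feature that is almost an exact linear combination of the other four
combo = base.sum(axis=1) + rng.normal(scale=0.1, size=n)
X = np.column_stack([base, combo])

# Pairwise (Pearson) correlation matrix of the five features
corr = np.corrcoef(X, rowvar=False)

# Largest absolute pairwise correlation, ignoring the diagonal of ones
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
max_pairwise = np.abs(off_diag).max()

# VIF of feature j is the j-th diagonal element of the inverse correlation matrix
vif = np.diag(np.linalg.inv(corr))

print("max |pairwise correlation|:", round(float(max_pairwise), 3))
print("VIFs:", np.round(vif, 1))
```

In this setup the pairwise correlations stay moderate (roughly 0.5 at most), so the correlation matrix alone would not raise a flag, while every VIF ends up far above the common rule-of-thumb threshold of 10.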
I can't really say that correlation matrices can replace VIFs and vice versa. Rather, these two approaches complement each other. Nevertheless, VIFs would most often be used when searching for possible multicollinearity in the data.
There is some excellent literature one can study on the topic. A good starting point is An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. You could also check A Handbook of Statistical Analyses using SPSS by Landau and Everitt.
Hope this answers your question, at least to some extent :)