Last answered:

15 Feb 2022

Posted on:

14 Feb 2022

Resolved: VIF vs correlation in multicollinearity

Dear Team,

I understand that these two methods measure different things, but can they be used interchangeably to detect multicollinearity?

Which one is more reliable?

VIF has trouble with dummies, while pd.DataFrame.corr() doesn't.

Thanks,
Peter

2 answers (1 marked as helpful)
Instructor
Posted on:

15 Feb 2022

Hey Peter,

Thank you for the good question!

You can definitely apply both approaches to search for possible collinearity between the features in a dataset. The correlation matrix gives you the pairwise correlation between predictors. However, collinearity can also involve three or more predictors, and often does. What is worse, several features can be jointly collinear while no pair of them is highly correlated - hence, multicollinearity. A correlation matrix will not reveal that. That is where the VIF comes in handy: it measures how well each predictor is explained by all the other predictors together.
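To make this concrete, here is a minimal sketch on synthetic data, assuming NumPy is available. The data, the noise level and the `vif` helper are illustrative assumptions of mine, not something from the course material. Ten independent features are generated, plus an eleventh that is almost exactly their sum. The helper computes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10

# Ten independent predictors plus one that is (almost) their sum.
X_indep = rng.normal(size=(n, k))
x_sum = X_indep.sum(axis=1) + rng.normal(scale=0.1, size=n)
X = np.column_stack([X_indep, x_sum])

# Pairwise view: largest absolute off-diagonal correlation.
corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(k + 1, dtype=bool)]
max_pairwise = np.abs(off_diag).max()


def vif(X):
    """Illustrative helper: VIF_j = 1 / (1 - R_j^2) for each column j."""
    n_rows, n_cols = X.shape
    out = np.empty(n_cols)
    for j in range(n_cols):
        y = X[:, j]
        # Regress column j on an intercept and all remaining columns.
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


vifs = vif(X)
print("max |pairwise correlation|:", round(max_pairwise, 3))
print("VIFs:", np.round(vifs, 1))
```

In this setup no pair of features is strongly correlated, yet every VIF comes out far above the common rule-of-thumb threshold of 10, because each feature is almost perfectly explained by the other ten taken together.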

I can't really say that correlation matrices can replace VIFs and vice versa. Rather, these two approaches complement each other. Nevertheless, VIFs would most often be used when searching for possible multicollinearity in the data.

There is some excellent literature one can study on the topic. A good starting point is An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. You could also check A Handbook of Statistical Analyses using SPSS by Landau and Everitt.

Hope this answers your question, at least to some extent :)

Kind regards,
365 Hristina

Posted on:

15 Feb 2022

Thank you for your answer. I checked the net before, but your answer was the one that cleared up the haze.

If I may take the question further, I will open another thread about control variables.
