Resolved: No Multicollinearity and No endogeneity!
Hi. As you explained in 'No endogeneity', when we don't include a variable, it goes into the error term, and 'no endogeneity' means there is no correlation between the error term and the explanatory variables. On the other hand, 'no multicollinearity' encourages us to use only one of two variables when they are highly correlated. So one assumption seems to rule out the other! Am I missing something here?
Thanks for reaching out and for your observation. Let's clarify the interplay between these two OLS assumptions.
The "no endogeneity" assumption states that the error term should not be correlated with any of the explanatory variables. A common cause of a violation is omitted variable bias: a relevant variable is left out of the model, its effect is absorbed by the error term, and, if it is correlated with the included variables, the error term becomes correlated with them too.
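To make the omitted-variable mechanism concrete, here is a minimal Python simulation. It assumes only NumPy; the data-generating process (y = 1 + 2*x1 + 3*x2 + noise, with corr(x1, x2) = 0.8) and the function name simulate_omitted_variable are made up for illustration. Fitting y on x1 alone pushes 3*x2 into the error term, so that error is correlated with x1 and the x1 slope is biased.

```python
import numpy as np


def simulate_omitted_variable(n=100_000, seed=0):
    """Compare the x1 slope with and without a correlated, relevant x2.

    Assumed data-generating process (illustrative only):
        y = 1 + 2*x1 + 3*x2 + noise,  corr(x1, x2) = 0.8
    Returns (full-model x1 slope, short-model x1 slope,
             correlation between the short model's error term and x1).
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # unit variance, corr(x1, x2) = 0.8
    noise = rng.normal(size=n)
    y = 1 + 2 * x1 + 3 * x2 + noise

    ones = np.ones(n)
    beta_full, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
    beta_short, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)

    # Once x2 is omitted, its effect 3*x2 becomes part of the error term.
    short_error = 3 * x2 + noise
    corr_error_x1 = np.corrcoef(short_error, x1)[0, 1]
    return beta_full[1], beta_short[1], corr_error_x1


if __name__ == "__main__":
    b_full, b_short, corr = simulate_omitted_variable()
    print(f"full model x1 slope:  {b_full:.3f}")
    print(f"short model x1 slope: {b_short:.3f}")
    print(f"corr(error, x1) in short model: {corr:.3f}")
```

With this setup the full-model slope should come out close to the true value 2, while the short-model slope should land near 2 + 3 * 0.8 = 4.4, because x1 picks up part of x2's effect through the correlated error term.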
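Multicollinearity, on the other hand, arises when two or more explanatory variables in the model are highly correlated, meaning one can be linearly predicted from the others with a substantial degree of accuracy. The "no multicollinearity" condition therefore assumes the absence of such correlated predictors. Strictly speaking, OLS only requires that there be no perfect multicollinearity; high but imperfect correlation does not bias the estimates, but it inflates their standard errors and makes the individual coefficients unstable.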
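A common way to put a number on multicollinearity is the variance inflation factor (VIF), VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes it with plain NumPy; the helper name variance_inflation_factors and the simulated data are illustrative assumptions, and the thresholds often quoted (5 or 10) are rules of thumb rather than hard cutoffs.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = observations, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on an intercept plus all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1 - resid.var() / target.var()
        vifs[j] = 1 / (1 - r_squared)
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 10_000
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # corr about 0.95
    x3 = rng.normal(size=n)  # independent of x1 and x2
    print(variance_inflation_factors(np.column_stack([x1, x2, x3])))
```

Here x1 and x2 have a correlation of about 0.95, so their VIFs should be near 1 / (1 - 0.95^2), roughly 10, while the independent x3 should stay close to 1.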
As I understand it, the conflict you're observing is this: on the one hand, we're concerned about omitting variables (which leads to endogeneity), but on the other hand, we're cautioned against including variables that are highly correlated.
The key is balance.
In practice, if two variables are highly correlated, you might consider including only one. But you should be confident that the variable you drop does not itself affect the outcome. If it does, its effect goes into the error term, and because it is correlated with the variable you kept, you're back to the endogeneity issue.
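The trade-off can be seen in a small Monte Carlo sketch. This is an illustration under an assumed data-generating process (y = 2*x1 + beta2*x2 + noise, with corr(x1, x2) = 0.95), and the function name compare_full_vs_dropped is hypothetical. Keeping both variables gives an unbiased but noisy x1 slope; dropping x2 shrinks the spread, but the result is only unbiased when x2 has no effect of its own (beta2 = 0).

```python
import numpy as np


def compare_full_vs_dropped(beta2, reps=2000, n=200, rho=0.95, seed=1):
    """Monte Carlo estimate of the x1 slope with and without the correlated x2.

    Assumed data-generating process (illustrative only):
        y = 2*x1 + beta2*x2 + noise,  corr(x1, x2) = rho
    Returns ((mean, std) of the full-model x1 slope,
             (mean, std) of the reduced-model x1 slope).
    """
    rng = np.random.default_rng(seed)
    full, dropped = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 2 * x1 + beta2 * x2 + rng.normal(size=n)
        ones = np.ones(n)
        b_full, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
        b_drop, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
        full[r], dropped[r] = b_full[1], b_drop[1]
    return (full.mean(), full.std()), (dropped.mean(), dropped.std())


if __name__ == "__main__":
    for beta2 in (0.0, 1.0):
        (fm, fs), (dm, ds) = compare_full_vs_dropped(beta2)
        print(f"beta2={beta2}: full mean={fm:.2f} sd={fs:.2f} | "
              f"dropped mean={dm:.2f} sd={ds:.2f}")
```

With beta2 = 0 both models should center on 2, and the reduced model should have the clearly smaller spread. With beta2 = 1 the reduced model's average should drift toward 2 + 0.95 = 2.95, which is the endogeneity problem reappearing.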
Another approach is to use techniques that can handle multicollinearity, such as principal component analysis (PCA) or regularization methods like ridge regression. You could also derive new variables that capture the information shared by the correlated ones. You might therefore be interested in our course on the topic, Linear Algebra and Feature Selection.
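As one example of a regularization method, here is a minimal closed-form ridge regression in NumPy. It solves (Xc'Xc + alpha*I) b = Xc'yc on centered data so that the intercept is not penalized. The function name ridge_fit, the values of alpha and the simulated data are assumptions for this sketch; in practice predictors are usually standardized first and alpha is tuned, for example by cross-validation.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept.

    Centers X and y, then solves (Xc'Xc + alpha*I) b = Xc'yc.
    Assumes the columns of X are already on comparable scales
    (in practice they are usually standardized first).
    Returns (intercept, slopes).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    slopes = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(k), Xc.T @ yc)
    intercept = y_mean - x_mean @ slopes
    return intercept, slopes


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 50
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
    y = 1 + 2 * x1 + 2 * x2 + rng.normal(size=n)
    X = np.column_stack([x1, x2])
    for alpha in (0.0, 1.0, 10.0):
        intercept, slopes = ridge_fit(X, y, alpha)
        print(f"alpha={alpha:>4}: intercept={intercept:.2f}, slopes={np.round(slopes, 2)}")
```

Setting alpha = 0 reproduces ordinary least squares; any positive alpha shrinks the coefficient vector, trading a little bias for lower variance when the predictors are nearly collinear.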
In essence, the goal is to make sure you're capturing all relevant information without introducing the issues that come with correlated predictors.
Hope this helps!