Resolved: Adjusted R-Square
So, according to this lecture, should we accept the model with an additional variable only when the adjusted R-Square value stays the same and the R-Square value increases?
Thanks for reaching out.
Let's first recap what each of the two metrics represents:
- R²: Measures the proportion of variance in the dependent variable that is explained by the independent variables in the model. It's a measure of the model's goodness of fit, and a higher R² means more of the variance is explained. However, adding a variable to a model never decreases R² on the data used to fit it, and in practice almost always increases it, even if the variable is not truly related to the response. This is because the fit can always reproduce the smaller model by giving the new variable a coefficient of zero, so any other coefficient it chooses can only explain the same amount of variance or more, even when it is just fitting noise.
- Adjusted R²: Adjusts the R² value based on the number of predictors in the model. It penalizes the addition of variables that do not improve the model significantly. This makes it a more robust metric for model comparison, especially when models have a different number of predictors.
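For reference, the usual definition of adjusted R² can be written as below. The symbols are my own notation rather than the lecture's: n is the number of observations and p is the number of predictors, and the model is assumed to include an intercept.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

The factor (n - 1)/(n - p - 1) grows as p grows, so a new variable has to raise R² by enough to offset it before the adjusted value goes up.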
To answer your question, a model should not be accepted based solely on an increase in R² when a new variable is added, because R² will increase (or at least stay the same) with any new variable, regardless of whether it is meaningful. The adjusted R² is the more informative metric here, so the criterion is the reverse of the one in your question: what supports keeping the new variable is an increase in the adjusted R², not an unchanged adjusted R² alongside a higher R². If the adjusted R² increases, it suggests the new variable contributes more than the small gain in fit that any extra variable would be expected to produce. If it stays the same or drops, the rise in R² is not enough to justify the extra variable.
Of course, when deciding whether to include a new variable in the analysis, it's also important to consider:
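To make this concrete, here is a small sketch in Python using the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The function name and all the numbers (50 observations, the R² values) are made up for illustration and are not taken from the lecture.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with an intercept.

    r2: ordinary R-squared, n: number of observations, p: number of predictors.
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 50  # hypothetical sample size

# Baseline model: 3 predictors, R-squared of 0.800.
base = adjusted_r2(0.800, n, p=3)

# A 4th variable that barely moves R-squared (0.800 -> 0.801).
with_noise = adjusted_r2(0.801, n, p=4)

# A 4th variable that clearly improves R-squared (0.800 -> 0.830).
with_signal = adjusted_r2(0.830, n, p=4)

print(round(base, 3))         # 0.787
print(round(with_noise, 3))   # 0.783
print(round(with_signal, 3))  # 0.815
```

In both cases R² goes up, but only the second added variable raises the adjusted R² above the baseline; the first one lowers it.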
- its statistical significance,
- its domain relevance (the variable should make sense in the context of the problem you're trying to solve),
- the increasing complexity of the model (be wary of overfitting),
- and the model's cross-validation performance (look at how the model performs on unseen data, not just on the data used to fit the model).
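The last point can be checked with a few lines of code. Below is a minimal sketch of k-fold cross-validation for an ordinary least squares model, written with NumPy only. The helper name `cv_mse`, the synthetic data, and the choice of 5 folds are all assumptions made for illustration; in a real project you would more likely use a library routine.

```python
import numpy as np

def cv_mse(X, y, k=5, seed=0):
    """Mean squared prediction error of OLS, estimated by k-fold cross-validation."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Add an intercept column, then fit on the training folds only.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        # Score on the held-out fold, which the fit has not seen.
        errors.append(np.mean((y[test_idx] - A_test @ coef) ** 2))
    return float(np.mean(errors))

# Synthetic data (hypothetical): y depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

mse_small = cv_mse(x1.reshape(-1, 1), y)
mse_big = cv_mse(np.column_stack([x1, x2]), y)
print(f"CV MSE without x2: {mse_small:.3f}")
print(f"CV MSE with x2:    {mse_big:.3f}")
```

Unlike R² on the training data, the cross-validated error is not guaranteed to improve when a variable is added. If it does not go down with the new variable, that is a sign the variable is not helping on unseen data.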