The skewness of the F stat
Hi Chitra,
The F-statistic in regressions is a comparison between our regression model and a model that has no independent variables.
If you create a regression with no independent variables (so you don't have any Xs), just Y = some constant, say Y = 5, the best such a model can do is predict the mean every time (sometimes that guess will land close to the actual value, no doubt, but it explains none of the variability). If you fit a regression like this, the only coefficient will be the intercept, and it will be equal to the mean!
So the prediction is always: Y = the mean.
If you check your formula for SSR (the sum of the squared differences between the predictions and the mean), you will realize that if the prediction is always the mean,
then every term in that sum is zero, so SSR = 0 (no explained variability).
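To make this concrete, here is a tiny sketch in Python (the numbers are made up, purely for illustration): a model with no Xs can only predict one constant, the least-squares choice for that constant is the mean, and the explained sum of squares then comes out to exactly zero.

```python
import numpy as np

# Hypothetical outcome values, just for illustration
y = np.array([3.0, 5.0, 4.0, 7.0, 6.0])

# A regression with no Xs can only predict a single constant;
# the least-squares choice for that constant is the mean of y
y_hat = np.full_like(y, y.mean())

# SSR = explained (regression) sum of squares: sum of (prediction - mean)^2
ssr = np.sum((y_hat - y.mean()) ** 2)
print(ssr)  # 0.0 -- the intercept-only model explains none of the variability
```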
***
Now, the F-stat is a ratio.
You can think of it as comparing our model against that no-X model, which explained nothing (SSR = 0). Formally, it is the explained variability divided by the unexplained variability, each adjusted for its degrees of freedom.
So what the F-stat shows us is:
How much better is our current model than a model that has no explanatory power whatsoever?
We would usually see F-stats above 50, or 300, or 2000, and those are all normal.
However, an F-stat of 2 would imply that our model is only about 2 times better than simply saying: the answer is always the mean. We want something much more dramatic than 2, right?
And that is exactly what the F-stat represents.
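Here is a minimal sketch of how that ratio is computed, assuming a simple one-variable regression with made-up data (the numbers and variable names are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical data for a one-variable regression (made-up numbers)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

# Fit y = b0 + b1*x by ordinary least squares
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
sse = np.sum((y - y_hat) ** 2)         # unexplained (residual) sum of squares

k = 1                                  # number of independent variables
n = len(y)
f_stat = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f_stat, p_value)
```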
***
The F-stat follows the F-distribution, which is:
1) always non-negative (for regressions, it is a ratio of sums of squares, so it is 0 or above)
2) right-skewed: it is bounded below by 0 but has no upper bound, so most of its mass sits at small values, with a long tail stretching to the right
Why does the F-stat follow the F-distribution (and is therefore right-skewed, too)?
Most of the time, the F-stat lands at small values like 2, 3, 4, 5, or 6, meaning our model is not much better than the one with no explanatory power; very large values are rare, and that is what gives the distribution its long right tail.
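If you want to see that skewness for yourself, here is a short simulation sketch; the degrees of freedom below are arbitrary, hypothetical choices, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many values from an F-distribution with hypothetical degrees of freedom
# (3 in the numerator, 60 in the denominator)
samples = rng.f(3, 60, size=100_000)

print(np.median(samples))    # the bulk of the values are small (close to 1)
print(samples.mean())        # the mean is pulled up by the long right tail
print((samples > 5).mean())  # only a tiny fraction of the values are large
```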
Usually, there is a critical value that acts as the cut-off line, say 3. If the F-stat is above 3, then the model has some merit.
Of course, there is an F-table you can consult to find that critical value!
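Instead of a printed F-table, you can also look the critical value up in software. Here is a tiny sketch using scipy, with an assumed 5% significance level and hypothetical degrees of freedom:

```python
from scipy import stats

# Assumed 5% significance level; hypothetical setup with
# 2 independent variables and 50 observations
k, n = 2, 50
critical_value = stats.f.ppf(0.95, k, n - k - 1)
print(critical_value)  # if the F-stat exceeds this cut-off, the model has some merit
```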
Hope this helps!
Best,
The 365 Team