The skewness of the F stat

The F-distro looks right skewed. Am I reading that right? Also, if it is right skewed, could you further explain how, if at all, skewness affects regression and H test? Thanks.

1 Answer

365 Team

Hi Chitra,

The F-statistic in regressions is a comparison between our regression model and a model that has no independent variables.

Suppose you create a regression with no independent variables (so you don’t have Xs), just Y = some constant, say Y = 5. Sometimes that constant will land close to the actual value, no doubt, but the model cannot explain why Y changes from one observation to the next. If you estimate this model by least squares, the only coefficient will be the intercept, and it will be equal to the mean of Y!

So Y = the mean.

If you check your formula for SSR (the sum of squared differences between each prediction and the mean), you will realize that if the prediction is always the mean, then SSR = 0 (no explained variability).
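
If you want to check this yourself, here is a minimal sketch. The numbers in `y` are made up for illustration and are not from any real dataset. It follows the naming used above, with SSR as the explained part, and adds SSE (unexplained) and SST (total) for comparison.

```python
from statistics import mean

# Hypothetical toy data, chosen only for illustration
y = [3.0, 5.0, 4.0, 8.0, 10.0]

# With no Xs, the least-squares intercept is simply the sample mean
y_bar = mean(y)
predictions = [y_bar] * len(y)

# SSR: explained variability (prediction minus mean, squared)
ssr = sum((p - y_bar) ** 2 for p in predictions)
# SSE: unexplained variability (observed minus prediction, squared)
sse = sum((yi - p) ** 2 for yi, p in zip(y, predictions))
# SST: total variability (observed minus mean, squared)
sst = sum((yi - y_bar) ** 2 for yi in y)

print("intercept:", y_bar)
print("SSR:", ssr, "SSE:", sse, "SST:", sst)
```

Every prediction equals the mean, so SSR is 0 and all of the variability ends up in SSE.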

***

Now, the F-stat is a ratio.

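It compares the variability our model explains with the variability it leaves unexplained, each divided by its degrees of freedom: F = (SSR / k) / (SSE / (n - k - 1)), where k is the number of Xs and n is the number of observations. The baseline is the model with no Xs, which had SSR = 0.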

So what the F-stat shows us is:

How much better is our current model than a model that has no explanatory power whatsoever?
For a model with real explanatory power, we would usually have F-stats >50, or >300, or >2000. And those are all normal.
However, an F-stat = 2 would imply that the variability our model explains (per X) is just 2 times the variability it leaves unexplained (per degree of freedom). That is not much better than saying: the answer is always the mean. We want something much more dramatic than 2, right?
And that’s what the F-stat consists of.
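
To make the ratio concrete, here is a rough sketch that computes the F-stat by hand for a regression with a single X. The data are invented for illustration, and the formula used is the standard overall F-statistic, F = (SSR / k) / (SSE / (n - k - 1)), with k Xs and n observations.

```python
# Hypothetical toy data (not from the original question)
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

n = len(y)  # number of observations
k = 1       # number of Xs in the model

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for a single X
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)                # total

# Explained variability per X over unexplained variability per degree of freedom
f_stat = (ssr / k) / (sse / (n - k - 1))
print("F-stat:", round(f_stat, 2))
```

In practice a statistics package reports this number for you. The point of the sketch is only to show which two quantities are being compared.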

***

The F-stat follows the F-distribution, which is:

1) always non-negative (for regressions, it is a ratio of two sums of squares, each divided by its degrees of freedom, so it is 0 or above)
2) right-skewed: it cannot go below 0 but has no upper limit, so its long tail stretches to the right
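
Both properties can be checked with a small simulation. This is only a sketch: it builds F-distributed draws as a ratio of two chi-square variables, each divided by its degrees of freedom, and the degrees of freedom (3 and 20), the seed, and the number of draws are arbitrary choices for illustration.

```python
import random
from statistics import mean, median, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def chi_square(df):
    # A chi-square draw is a sum of df squared standard normal draws
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

def f_draw(d1, d2):
    # Ratio of two chi-square draws, each divided by its degrees of freedom
    return (chi_square(d1) / d1) / (chi_square(d2) / d2)

d1, d2 = 3, 20  # illustrative degrees of freedom
draws = [f_draw(d1, d2) for _ in range(10_000)]

m = mean(draws)
s = pstdev(draws)
# Sample skewness: average cubed standardized deviation
skewness = mean(((v - m) / s) ** 3 for v in draws)

print("smallest draw is non-negative:", min(draws) >= 0)
print("mean above median:", m > median(draws))
print("skewness is positive:", skewness > 0)
```

A mean that sits above the median and a positive skewness are both signs of a long right tail.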

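Why does the F-stat follow the F-distribution (and why is it therefore right-skewed, too)?

If the Xs really have no explanatory power (this is the null hypothesis of the F-test) and the errors are normally distributed, the F-stat is a ratio of two scaled chi-square variables, which is how the F-distribution is defined. In that case the F-stat is small most of the time, somewhere around 1, and large values show up only rarely. Those rare large values form the long right tail.
Usually, there is a critical value, which depends on the significance level and the degrees of freedom, and it says: okay, the cut-off line is, say, 3. If the F-stat is >3, then this model has some merit. Because only large F-stats count as evidence for the model, the test looks only at the right tail.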

Of course, there is an F-table you can consult about that!
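
If you are curious where such a cut-off comes from, one way to approximate it without a table is to simulate many regressions in which Y has nothing to do with X and see how large the F-stat gets by chance. The sketch below assumes one X, 20 observations, and a 5% significance level; `observed_f` is a hypothetical value, and all of these are illustrative choices rather than anything from the question.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def f_statistic(x, y):
    # Overall F-stat for a least-squares regression with a single X
    n, k = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (ssr / k) / (sse / (n - k - 1))

n = 20
x = list(range(n))

# Y is pure noise here, so the model truly has no explanatory power
null_f = sorted(
    f_statistic(x, [random.gauss(0, 1) for _ in range(n)])
    for _ in range(5_000)
)

# Value that 95% of the chance-only F-stats stay below
critical = null_f[int(0.95 * len(null_f))]

observed_f = 6.0  # hypothetical F-stat from some fitted model
print("approximate 5% critical value:", round(critical, 2))
print("model has some merit:", observed_f > critical)
```

The simulated cut-off should land close to the F-table entry for 1 and 18 degrees of freedom at the 5% level. The table (or a statistics package) remains the more precise source.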

Hope this helps!
Best,
The 365 Team