
Confused With R squared


Hi Teacher,
When I finished this chapter, R squared was pretty clear to me. However, I have now moved on to learning the principles of data science, where R squared is defined as 1 - SSR/SST, which is different from the definition in the lecture.
Could you please explain?

1 Answer

365 Team

Hey Sarala,

As far as I understand, the concept is clear to you, but in our lecture we define R-squared as SSR / SST, while according to another source it is 1 – SSR / SST, correct?

That’s a valid question.

In both cases, what is meant is that R-squared = Variability explained / Total variability.

1. Now, according to our framework, SST = Sum of Squares Total; SSR = Sum of Squares Regression; SSE = Sum of Squares Error
In that case, R-squared = SSR / SST, or equivalently R-squared = 1 – SSE / SST, because SST = SSR + SSE.
2. Unfortunately, there is a different notation in some books that you may come across. Some sources have the abbreviations as:

TSS (SST) = Total Sum of Squares; RSS (SSR) = Residual Sum of Squares; ESS (SSE) = Explained Sum of Squares (sometimes called the Estimated Sum of Squares)

You can see how this is a problem: residuals (conceptually) mean error, while the explained or estimated part (at least in the case of regression analysis) means regression.

Using this notation, you can state: R-squared = Variability explained / Total variability = ESS / TSS or, as you saw it, 1 – RSS / TSS (with RSS abbreviated as SSR in your source).

3. Some people even define SSR = Sum of Squares Regression Error, stating that SSR stands for the sum of errors. This third abbreviation is, in my opinion, the most misleading of them all. Do you even need to say the word ‘regression’ here? That to me is basically saying: ‘You can’t make me use your notation. I prefer my own.’

Conceptually, the three notations have the same meaning. Unfortunately, the abbreviations SSR and SSE swap roles between them.
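If it helps to see this with numbers, here is a minimal sketch in plain Python. The data points and the helper names (`fit_line`, `sums_of_squares`) are made up for illustration and are not taken from the lecture. It fits a simple least-squares line, computes the three sums of squares, and checks that "explained / total" and "1 – unexplained / total" give the same R-squared, whatever abbreviations you attach to them.

```python
# Illustrative data only (assumed for this sketch, not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x (with intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sums_of_squares(x, y):
    """Return (total, explained, unexplained) sums of squares."""
    b0, b1 = fit_line(x, y)
    y_mean = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    total = sum((yi - y_mean) ** 2 for yi in y)                    # SST / TSS
    explained = sum((fi - y_mean) ** 2 for fi in y_hat)            # SSR (lecture) / ESS
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # SSE (lecture) / RSS
    return total, explained, unexplained


total, explained, unexplained = sums_of_squares(x, y)

# Lecture notation:  R-squared = SSR / SST
r2_from_explained = explained / total
# Other notation:    R-squared = 1 - RSS / TSS
r2_from_unexplained = 1 - unexplained / total

print("explained + unexplained:", explained + unexplained, "total:", total)
print("R-squared (explained / total):      ", r2_from_explained)
print("R-squared (1 - unexplained / total):", r2_from_unexplained)
```

The two printed R-squared values agree up to floating-point rounding. That only holds because the line is fitted by least squares with an intercept, which is what makes total = explained + unexplained.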

I have seen the first notation (the one from our lectures) used much more often than the others. When creating the R-squared lecture, I put in the extra effort to research the usage of each of them, as I anticipated some confusion. Most sources were using the first notation, so I stuck with it. I like the second one, but only when the author uses TSS, RSS, and ESS as abbreviations. That makes it clear which framework is being used.

In any case, now you know about this ridiculous confusion in statistics. In the material you are using, just assume that SSR and SSE have switched places. Everything else should be the same.

Best,
The 365 Team
