Sum of Squares Total, Sum of Squares Regression and Sum of Squares Error

Iliya Valchanov 20 Oct 2021 5 min read

You may be wondering what all of those sums of squares are about. Maybe that’s what got you here in the first place. Well, they are the determinants of a good linear regression. This tutorial is based on the ANOVA framework, which you may have heard of before.

Before reading it, though, make sure you are not mistaking regression for correlation. If you’ve got this checked, we can get straight into the action.

A quick side-note: Want to learn more about linear regression? Check out our explainer videos The Linear Regression Model. Geometrical Representation and The Simple Linear Regression Model.

SST, SSR, SSE: Definition and Formulas

There are three terms we must define: the sum of squares total, the sum of squares regression, and the sum of squares error.

What is the SST?

The sum of squares total, denoted SST, is the sum of the squared differences between the observed values of the dependent variable and their mean. You can think of this as the dispersion of the observed values around the mean, much like the variance in descriptive statistics.

Sum of squares total

It is a measure of the total variability of the dataset.

Side note: There is another notation for the SST. It is TSS or total sum of squares.
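To make the definition concrete, here is a minimal sketch in Python with NumPy. The five observations are made up purely for illustration and do not come from any real dataset.

```python
import numpy as np

# Hypothetical observations of the dependent variable (illustrative values only).
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

y_mean = y.mean()                  # mean of the observed values
sst = np.sum((y - y_mean) ** 2)    # sum of squared deviations from the mean

print(sst)                         # approximately 6.0

# Dividing the SST by n - 1 gives the sample variance, which is why the
# two behave so similarly as measures of dispersion.
print(sst / (len(y) - 1), np.var(y, ddof=1))
```

Note that the SST depends only on the observed values. No regression line is needed to compute it.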

What is the SSR?

The second term is the sum of squares due to regression, or SSR. It is the sum of the squared differences between the predicted values and the mean of the dependent variable. Think of it as a measure that describes how well our line fits the data.

Sum of squares regression

If the SSR is equal to the sum of squares total, our regression model captures all the observed variability and is perfect. Once again, we have to mention another common notation: ESS, or explained sum of squares.
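As a rough sketch, the SSR can be computed like this. The data are hypothetical, and the line is fitted with NumPy's `np.polyfit`, which is just one convenient way to get a least-squares line.

```python
import numpy as np

# Hypothetical data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line by least squares; polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept          # predicted values

# Squared differences between the predictions and the mean of the observed values.
ssr = np.sum((y_hat - y.mean()) ** 2)

print(ssr)                             # approximately 3.6
```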

What is the SSE?

The last term is the sum of squares error, or SSE. The error is the difference between the observed value and the predicted value, and the SSE is the sum of these squared differences.

Sum of squares error

We usually want to minimize the error. The smaller the error, the better the estimation power of the regression. Finally, I should add that it is also known as RSS or residual sum of squares. Residual as in: remaining or unexplained.
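Here is a minimal sketch of the SSE in Python, again with made-up data and a line fitted through `np.polyfit` as an assumption of this example.

```python
import numpy as np

# Hypothetical data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least-squares line; polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept      # predicted values

residuals = y - y_hat              # observed minus predicted
sse = np.sum(residuals ** 2)       # sum of squared residuals

print(sse)                         # approximately 2.4
```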

The Confusion between the Different Abbreviations

It becomes really confusing because some people denote the sum of squared residuals as SSR. This makes it unclear whether we are talking about the sum of squares due to regression or the sum of squared residuals.

In any case, neither of these notations is universally adopted, so the confusion remains and we’ll have to live with it.

Simply remember that the two sets of notation are SST, SSR, SSE and TSS, ESS, RSS.

There’s a conflict regarding the abbreviations, but not about the concept and its application. So, let’s focus on that.  

How Are They Related?

Mathematically, SST = SSR + SSE.
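Written out with symbols, the three definitions and their relationship look like this. The notation is an assumption of this sketch: y_i is an observed value, \bar{y} is the mean of the observed values, \hat{y}_i is the value predicted by the regression, and n is the number of observations.

```latex
\begin{align*}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
\text{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{SST} &= \text{SSR} + \text{SSE}
\end{align*}
```

The last line holds when the regression is estimated by ordinary least squares and includes an intercept.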


The rationale is the following: the total variability of the data set is equal to the variability explained by the regression line plus the unexplained variability, known as error.


Given a constant total variability, a lower error means a better regression. Conversely, a higher error means a less powerful regression. And that’s what you must remember, no matter the notation.
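If you would like to check the relationship numerically, here is a small sketch. The helper `sums_of_squares` is a hypothetical function written for this example, not part of any library, and the data are made up.

```python
import numpy as np

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a least-squares line with an intercept.

    Hypothetical helper for illustration only.
    """
    slope, intercept = np.polyfit(x, y, deg=1)   # [slope, intercept]
    y_hat = slope * x + intercept                # predicted values
    y_mean = y.mean()
    sst = np.sum((y - y_mean) ** 2)      # total variability
    ssr = np.sum((y_hat - y_mean) ** 2)  # variability explained by the line
    sse = np.sum((y - y_hat) ** 2)       # unexplained variability
    return sst, ssr, sse

# Hypothetical data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

sst, ssr, sse = sums_of_squares(x, y)
print(sst, ssr, sse)
print(np.isclose(sst, ssr + sse))   # the decomposition holds up to rounding
```

The check relies on the line being fitted by least squares with an intercept. For an arbitrary line drawn through the data, the three quantities can still be computed, but they will generally not add up this way.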

Next Step: The R-squared

Well, if you are not sure why we need all those sums of squares, we have just the right tool for you. The R-squared. Care to learn more? Just dive into the linked tutorial where you will understand how it measures the explanatory power of a linear regression!



Next Tutorial: Measuring Variability with the R-squared

Iliya Valchanov

Co-founder of 365 Data Science

Iliya is a Finance Graduate from Bocconi University with expertise in mathematics, statistics, programming, machine learning, and deep learning. His passion for teaching inspired him to create some of the most popular courses in our program: Introduction to Data and Data Science, Introduction to R Programming, Statistics, Mathematics, Deep Learning with TensorFlow, Deep Learning with TensorFlow 2, and Machine Learning in Python.