# Linear regression

x1 = data[['SAT','Attendance']] combines the two independent variables into x1. But why is x = sm.add_constant(x1) needed?

Hey Josh,

We're doing it because we assume in our model that:

*GPA = c + SAT * a_1 + Attendance * a_2 + e*

In this case, we have a constant term, SAT score, Attendance, and a residual. sm.OLS does not include a constant (intercept) by default, so before we regress GPA on SAT and Attendance, we need to add one to the exogenous variables ourselves. That is what sm.add_constant(x1) does: it adds a column of ones to x1. The regression then estimates the constant term (c) as well as the coefficients for SAT and Attendance (a_1 and a_2).

The idea is that there is **often** some baseline GPA value that can't be explained by changes in SAT or Attendance. Hence, we add the constant term here. Of course, we say *often*, rather than always, because it is possible for the constant to be non-significant (i.e., not statistically distinguishable from 0). That being said, we prefer to assume a constant exists and find out it's not significant, rather than assume there is no constant and have its effect wrongly attributed to the other coefficients (a_1 and a_2).
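If you want to see that last point in numbers, here is a rough sketch. It uses plain NumPy least squares (np.linalg.lstsq) instead of statsmodels, just to keep it short. The data is again synthetic, and the "true" values c = 1.0, a_1 = 0.001, a_2 = 0.2 are assumptions chosen for the demo.

```python
import numpy as np

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(1)
n = 500
sat = rng.uniform(1400, 2100, n)
attendance = rng.integers(0, 2, n).astype(float)  # assumed 0/1 dummy

# Assumed "true" model: GPA = 1.0 + 0.001 * SAT + 0.2 * Attendance + e
gpa = 1.0 + 0.001 * sat + 0.2 * attendance + rng.normal(0, 0.1, n)

# Design matrix without a constant, and one with a column of ones.
X_no_const = np.column_stack([sat, attendance])
X_const = np.column_stack([np.ones(n), sat, attendance])

# Ordinary least squares fits for both designs.
coef_no_const, *_ = np.linalg.lstsq(X_no_const, gpa, rcond=None)
coef_const, *_ = np.linalg.lstsq(X_const, gpa, rcond=None)

print("without constant (a_1, a_2):   ", coef_no_const)
print("with constant    (c, a_1, a_2):", coef_const)
```

In the fit without the constant, the SAT coefficient should come out noticeably larger than the assumed 0.001, because it has to absorb the missing baseline. The fit with the constant should land close to the assumed values.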

Hope this helps!

Best,

365 Vik