Linear regression


I am just starting on linear regression and I'm not too sure about this part.
I get
y = data['GPA']
x1 = data[['SAT', 'Attendance']]
which combines the two independent variables into x1.
But why do we need the following line?
x = sm.add_constant(x1)

1 Answer

365 Team

Hey Josh,
We’re doing it because we assume in our model that:

GPA = c + SAT * a_1 + Attendance * a_2 + e

In this case, we have a constant term, SAT score, Attendance and a residual (e). Hence, before we regress GPA on SAT and Attendance, we need to make sure we include a constant factor among the exogenous variables. That is what sm.add_constant(x1) does: it adds a column of ones to x1, because statsmodels' OLS does not include an intercept on its own. The regression then estimates the constant term (c) as well as the coefficients for SAT and Attendance (a_1 and a_2).

The idea is that there is often some baseline GPA value that can't be explained by changes in SAT or Attendance. Hence, we add the constant term here. Of course, we say often, rather than always, because it is possible for the constant to be non-significant (i.e. not statistically distinguishable from 0). That being said, we prefer to assume there is a constant and find out it's not significant, rather than assume there is no constant and have the regression attribute that baseline shift to the other coefficients (a_1 and a_2).
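To make this concrete, here is a minimal sketch of the same idea using plain NumPy instead of statsmodels. The numbers are made up for illustration (they are not the course dataset), and the "true" values c = 1.0, a_1 = 0.001 and a_2 = 0.2 are assumptions chosen so the effect is easy to see. The column of ones built by hand below plays the role of the column that sm.add_constant(x1) adds.

```python
import numpy as np

# Made-up illustrative data (not the course dataset).
sat = np.array([1700.0, 1650.0, 1800.0, 1900.0, 1750.0, 1600.0])
attendance = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Assumed "true" model with no noise: GPA = c + SAT * a_1 + Attendance * a_2
gpa = 1.0 + 0.001 * sat + 0.2 * attendance

# With a constant: a column of ones next to the regressors,
# which is the job sm.add_constant(x1) does in the statsmodels workflow.
X_with = np.column_stack([np.ones(len(sat)), sat, attendance])
coef_with, *_ = np.linalg.lstsq(X_with, gpa, rcond=None)

# Without a constant: the fit is forced through the origin,
# so the baseline gets pushed into the SAT and Attendance coefficients.
X_without = np.column_stack([sat, attendance])
coef_without, *_ = np.linalg.lstsq(X_without, gpa, rcond=None)

print("with constant    (c, a_1, a_2):", coef_with)
print("without constant (a_1, a_2):   ", coef_without)
```

With the column of ones, the fit should recover the assumed c, a_1 and a_2. Without it, the coefficients for SAT and Attendance shift to absorb the missing baseline, which is the distortion described above.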

Hope this helps!
365 Vik
