'Regression itself' part in the code
Sorry, but I don't understand why we created a new variable called x and used the add_constant function, and therefore I don't understand the OLS function either.
thanks in advance
Thank you for your question!
Here is how you could think about it. In general, the regression equation looks as follows:
Y = b0 + b1 * x1 + ...
In our case, this would be modified to
GPA = b0 + b1 * SAT,
where b0 and b1 are the coefficients that statsmodels needs to estimate. They serve as the bias and the slope of this function, respectively. Notice how we haven't multiplied b0 by x0 - that is because x0 is always 1. However, statsmodels doesn't know that - we need to manually put in an additional column full of ones in the table, such that statsmodels can recognize the variable x0.
We can show this in terms of code. In the notebook provided, try modifying the code under Regression itself in the following way:
results = sm.OLS(y,x1).fit()
results.summary()
where we have skipped the definition of the variable x and used x1 inside the parentheses instead. Run the code. What you will see in the coefficients table is that we only have a value for the coefficient in front of SAT, namely b1 = 0.0018. We know the slope of the function, but we don't know what the bias is - we don't know where our plot should start from along the y-axis.
Now, run the original version of the code, namely:
x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()
You now see that we have both b0 = const = 0.2750 and SAT = b1 = 0.0017. This is because we have added this additional column of ones into our model.
If you are still unsure about the role of the add_constant() method and how it works, I advise you to visit the official statsmodels documentation, search for add_constant, and examine the method's documentation.
Hope this helps!