This question is based on the exercise from ‘Dealing with Categorical Data – Dummy Variables Exercise’.
I decided to test the linearity assumption by plotting the dependent variable (price) against each independent variable (size, year, view), as seen below. The price vs size graph clearly shows that a straight line can be fitted through the observations, but this doesn’t appear to be the case for the price vs year and price vs view graphs. Since year and view take only a few distinct values, the points in both graphs fall into separate vertical strips. Are we only looking for linearity within each strip (does it pass the linearity assumption because, for each year and each view, the points lie on a vertical line)?
Thanks for reaching out.
First of all, price-size is obviously alright, so I won’t comment on it.
Second, when plotting you can set the ‘alpha’ (transparency) parameter to a number between 0 and 1, e.g. alpha = 0.5. Overlapping points then appear darker, so you can see the actual density of the points.
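As a rough illustration of the alpha tip, here is a minimal sketch using matplotlib's scatter. The data is synthetic and the year values are made up for the example; they are not the exercise's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): a handful of distinct years, random prices
year = rng.choice([2006, 2009, 2012, 2015, 2018], size=100)
price = rng.normal(300_000, 50_000, size=100)

fig, ax = plt.subplots()
# alpha=0.5 makes each marker half transparent, so stacked points look darker
points = ax.scatter(year, price, alpha=0.5)
ax.set_xlabel("year")
ax.set_ylabel("price")
```

Call plt.show() (in an interactive session) or fig.savefig(...) to view the result. Lower alpha values help when many points overlap.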
Third, we have the relationship between price and view. Since view is a dummy variable, the linearity assumption does not need to be checked for it. A dummy takes only two values (0 and 1), so its coefficient simply shifts the predicted price between the two groups, and a straight line through two group averages always fits. That is why it is called a ‘dummy’ or indicator: it is not a real numerical variable and does not need to be treated like one. So linearity is fine.
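To see why linearity is not an issue for a dummy, here is a small sketch with made-up data (the coding 0 = no view, 1 = view and the price figures are assumptions for illustration). In a regression of price on a single dummy, the intercept equals the mean price of the 0 group and the coefficient equals the difference between the two group means.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: view is 0 or 1, price is shifted upward when view == 1
view = rng.integers(0, 2, size=200)
price = 250_000 + 60_000 * view + rng.normal(0, 20_000, size=200)

# Ordinary least squares with an intercept column and the dummy
X = np.column_stack([np.ones_like(view, dtype=float), view])
intercept, slope = np.linalg.lstsq(X, price, rcond=None)[0]

mean_no_view = price[view == 0].mean()
mean_view = price[view == 1].mean()

print(intercept, mean_no_view)          # the two numbers agree
print(slope, mean_view - mean_no_view)  # the two numbers agree
```

With only two possible x-values, the fitted line passes exactly through both group means, so there is nothing nonlinear for it to miss.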
Finally, we’ve got year. You have correctly identified that it behaves strangely. It behaves more like a dummy than a continuous variable, right? And that is precisely the case.
If the data covered a long interval, say from 1900 to 2010, the relationship would most likely have looked linear. However, if you have just a couple of distinct years, as in this example, you can treat year as a categorical variable (and create several dummies, one for each year except a baseline). This is not an uncommon practice!
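One possible way to create those dummies is pandas' get_dummies. This is only a sketch: the small DataFrame below is invented for illustration and the column names are assumed to match the exercise.

```python
import pandas as pd

# Invented sample rows (assumption); replace with the real dataset
df = pd.DataFrame({
    "price": [230_000, 310_000, 280_000, 295_000, 350_000, 240_000],
    "size": [640, 820, 750, 790, 910, 660],
    "year": [2006, 2009, 2012, 2009, 2015, 2006],
})

# One dummy per distinct year; drop_first=True drops the earliest year,
# which becomes the baseline and avoids perfect multicollinearity
data_with_dummies = pd.get_dummies(df, columns=["year"], drop_first=True, dtype=int)

print(data_with_dummies.columns.tolist())
```

Each year coefficient is then read as the price difference relative to the dropped baseline year, holding size fixed.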
Hope this helps!