How do dummy variables meet the linearity assumption?
Hi Varun,
Thanks for reaching out.
First of all, the price-size relationship looks fine, so I won't comment on it.
Second, when plotting you can set the 'alpha' (transparency) parameter to a number between 0 and 1, e.g. alpha = 0.5. Overlapping points then appear darker, so you can see the actual density of the points.
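To make the alpha tip concrete, here is a minimal sketch using matplotlib. The data are synthetic and only stand in for the price and size columns from the exercise, so the numbers themselves are an assumption, not the course dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the real data (values are made up for illustration)
rng = np.random.default_rng(42)
size = rng.normal(850, 250, 1000)
price = 100_000 + 220 * size + rng.normal(0, 40_000, 1000)

fig, ax = plt.subplots()
# alpha=0.5 makes each point half-transparent, so dense regions look darker
points = ax.scatter(size, price, alpha=0.5)
ax.set_xlabel("size")
ax.set_ylabel("price")
# plt.show()  # uncomment when running interactively
```

With alpha = 1 (the default) a cloud of 1,000 points looks like one solid blob; lowering it shows where most of the observations actually sit.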
Third, consider a relationship like price and view. Since view is a dummy variable, the linearity assumption does not need to be checked for it. A dummy takes only two values (0 and 1), so the regression simply estimates the difference in average price between the two groups, and a straight line through two group means always fits. That is why it is called a 'dummy' or indicator: it is not a real numerical variable and does not need to be treated like one. So linearity is fine.
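If it helps, here is a small sketch of why a dummy cannot violate linearity. It fits price = b0 + b1 * view by ordinary least squares with NumPy on made-up numbers (the prices and the 0/1 coding are assumptions for illustration only).

```python
import numpy as np

# Made-up data: view = 1 if the property has a view, 0 otherwise
view = np.array([0, 0, 0, 1, 1, 1])
price = np.array([200.0, 220.0, 240.0, 300.0, 310.0, 320.0])

# Design matrix with an intercept column, then an OLS fit
X = np.column_stack([np.ones_like(view, dtype=float), view])
(b0, b1), *_ = np.linalg.lstsq(X, price, rcond=None)

mean_no_view = price[view == 0].mean()
mean_view = price[view == 1].mean()

# The intercept is the mean of the 0 group; the slope is the gap between group means
print(b0, mean_no_view)
print(b1, mean_view - mean_no_view)
```

The fitted line passes exactly through the two group averages, so there is no curve for it to miss.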
Finally, we've got year. You have correctly identified that it behaves strangely. It behaves more like a dummy than a continuous variable, right? And that is precisely the case.
If we had the whole interval from 1900 to 2010, it would probably have looked linear. However, when the data contain just a couple of distinct years, as in this example, you can treat year as a categorical variable and create several dummies (one per year, leaving one year out as the baseline). This is not an uncommon practice!
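A rough sketch of turning year into dummies with pandas is below. The DataFrame, its column names and the particular years are hypothetical placeholders, so swap in your own data.

```python
import pandas as pd

# Hypothetical data with only a few distinct years
df = pd.DataFrame({
    "price": [200.0, 210.0, 250.0, 260.0, 300.0],
    "year": [2006, 2006, 2009, 2009, 2018],
})

# One 0/1 column per year; drop_first=True leaves the earliest year out as the baseline
year_dummies = pd.get_dummies(df["year"], prefix="year", drop_first=True, dtype=int)

# Replace the original year column with its dummies
df_with_dummies = pd.concat([df.drop(columns="year"), year_dummies], axis=1)
print(df_with_dummies)
```

Each dummy coefficient is then read as the average difference in price relative to the baseline year, rather than as a slope per year.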
Hope this helps!
Iliya