It is said that one of the most crucial conditions which should not be violated in Linear Regression is linearity. Thus, to make sure that the condition is not violated, the dependent variable should be individually plotted against the independent variables.
In Machine Learning in Python course, when it comes to Linear Multiple Regression, if the Attendance is plotted against the GPA the condition is violated. How this issue should be treated?
thanks so much for reaching out! To your question:
There are 3 options when we see issue with the linearity condition:
- run a non-linear regression
- use exponential transformation
- use log transformation
In the section ‘Linear Regression-Practical Example’ in the course Machine Learning in Python in the second lecture entitled ‘Practical example(part2)’ we see the application of the log transformation. Namely, we plot ‘Price’ feature versus ‘Year’, ‘Engine Volume’ etc. and we spot that there is an issue exactly as yours-we take the log of ‘Price’ and create another column called ‘log_price’ then we plot ‘Year’ and ‘Engine volume’ against the newly created ‘log_price’ and we see a linear dependence.
Hope this helps!