Why exactly do we need to drop the dummy variable? Is it for mathematical reasons, or because reason 0 doesn’t give us any information? Do we need to do this for every other data frame we work with, or just on this occasion?
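The reason is that we are trying to avoid multicollinearity. If we keep a dummy for every category, the dummies sum to 1 in each row, so together they are perfectly collinear with the intercept and the model cannot estimate the coefficients uniquely. Dropping one dummy (here, reason 0) makes that category the baseline the others are compared against. So it is a mathematical requirement rather than something specific to this data set: it applies whenever you encode a categorical variable as dummies in a regression model that includes an intercept.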
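To see the multicollinearity concretely, here is a minimal sketch in Python. The toy data, the column name `reason`, and the helper `design_rank` are made up for illustration and are not taken from the course notebook. It compares the rank of the design matrix (intercept plus dummies) with and without the first dummy dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical column with four levels (0 to 3).
df = pd.DataFrame({"reason": [0, 1, 2, 3, 1, 2, 0, 3]})

# One dummy per level: every row contains exactly one 1.
all_dummies = pd.get_dummies(df["reason"], prefix="reason", dtype=float)

# drop_first=True removes the dummy for level 0, making it the baseline.
reduced_dummies = pd.get_dummies(
    df["reason"], prefix="reason", drop_first=True, dtype=float
)

def design_rank(dummies):
    """Return (rank, number of columns) of the matrix [intercept | dummies]."""
    X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])
    return np.linalg.matrix_rank(X), X.shape[1]

rank_all, cols_all = design_rank(all_dummies)
rank_reduced, cols_reduced = design_rank(reduced_dummies)

print(f"all dummies:  rank {rank_all} of {cols_all} columns")
print(f"one dropped:  rank {rank_reduced} of {cols_reduced} columns")
```

With all dummies kept, the rank is lower than the number of columns, which means one column is redundant. After dropping one dummy, the matrix has full column rank.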
We have this topic covered in one of our other courses (the Data Science Course).
Alternatively, feel free to get familiar with the mathematics of it here (written by your instructor, Iliya).
The 365 Team