Just a review of my model. Please leave your suggestions below.
So I made a few changes. First, I added the model column back and removed more outliers. Removing more outliers certainly helped a lot. However, creating dummies for the models made things worse: there were so many of them, and the VIFs were drastically high. Up to the 3rd quartile the percentage difference between predictions and targets was 34%, with an improved R² of 0.82, but above the 75th percentile most of the predictions were infinite. My assumption is that some models were so different in how they determine the price that computing their VIF amounted to a division by 0.
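On the VIFs: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors plus an intercept. A VIF becomes infinite exactly when R²_j = 1, i.e. when a column is an exact linear combination of the others. One common way dummies cause this is keeping a dummy for every category alongside the intercept (the dummy-variable trap): the dummies then add up to the constant column. The minimal NumPy sketch below uses a made-up three-model example (the `vif` helper and all data are hypothetical); the full dummy set gives infinite VIFs, while dropping one category, as `pd.get_dummies(..., drop_first=True)` does, brings them back to finite values. It shows one possible cause, not a diagnosis of the model above.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X must not contain a constant column; an intercept is added inside.
    VIF_j = 1 / (1 - R²_j); if column j is an exact linear combination of
    the other columns, R²_j = 1 and the VIF is reported as infinite.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # handles rank deficiency
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return out

# Hypothetical data: a car "model" category and a numeric mileage column.
models = np.array(["A", "B", "C", "A", "B", "C", "A", "B"])
mileage = np.array([120.0, 80.0, 200.0, 95.0, 150.0, 60.0, 175.0, 110.0])
cats = np.unique(models)                                  # ['A' 'B' 'C']
dummies = (models[:, None] == cats[None, :]).astype(float)

X_full = np.column_stack([mileage, dummies])              # one dummy per model
X_drop = np.column_stack([mileage, dummies[:, 1:]])       # first model dropped

print(vif(X_full))  # dummy columns come out as inf
print(vif(X_drop))  # all finite
```

With hundreds of model dummies, the same check can also reveal other exact dependencies, for example two dummies that are only ever 1 on the same rows.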
Using the actual prices instead of log_prices didn't help at all. When I plotted the training dataset, the data was more dispersed at first, and later the plot of the predicted values was empty.
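The percentage-difference-by-quartile check from the first post can be made explicit, and non-finite predictions counted separately so they do not silently turn a summary into inf or NaN. A minimal sketch, assuming the model was fitted on log prices and that predictions and targets are NumPy arrays; the function name `pct_difference_summary` and the numbers are hypothetical.

```python
import numpy as np

def pct_difference_summary(log_pred, price_true):
    """Back-transform log-price predictions and summarise the absolute
    percentage difference from the true prices.

    Predictions that become non-finite after exp() are counted and
    excluded before the quantiles are computed.
    """
    log_pred = np.asarray(log_pred, dtype=float)
    price_true = np.asarray(price_true, dtype=float)
    with np.errstate(over="ignore"):
        price_pred = np.exp(log_pred)  # undo the log transform
    finite = np.isfinite(price_pred)
    diff_pct = (np.abs(price_pred[finite] - price_true[finite])
                / price_true[finite] * 100)
    q25, median, q75, worst = np.percentile(diff_pct, [25, 50, 75, 100])
    return {"n_nonfinite": int((~finite).sum()),
            "q25": q25, "median": median, "q75": q75, "max": worst}

# Hypothetical example: the last prediction overflows to inf after exp().
price_true = np.array([10000.0, 20000.0, 15000.0, 8000.0])
log_pred = np.array([np.log(11000.0), np.log(18000.0), np.log(15000.0), 800.0])
summary = pct_difference_summary(log_pred, price_true)
print(summary)
```

Filtering with `np.isfinite` before plotting the predicted values is a cheap way to see whether a handful of extreme rows is what makes a plot look empty.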
If anyone has suggestions for improving the model, please leave them below. I'd really appreciate it.
I have the same problem. I also used the "model" column, and I discretized the numeric values in the "EngineV" column (https://scikit-learn.org/stable/modules/preprocessing.html#discretization). My final dataframe has 319 columns and an R² of 0.95 on the training data. However, for a very few values the model predicts extremely high, even infinite, prices. I don't know why.
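Regarding the infinite prices, assuming the target is log price as in the first post: float64 `exp` overflows to `inf` once its argument exceeds about 709.78 (the log of the largest float64), so an infinite price means the linear prediction on the log scale was itself enormous for those rows. With hundreds of dummy or bin columns, one plausible but unverified cause is a few very large coefficients on rarely populated columns. The sketch below shows the overflow and a crude diagnostic that flags and clips log predictions falling outside the range of log prices seen in training. The function name `safe_back_transform`, the clipping rule, and the numbers are assumptions for illustration; clipping hides the symptom rather than fixing the model, so the flagged rows are the ones worth inspecting.

```python
import numpy as np

# Largest x for which np.exp(x) is still finite in float64 (about 709.78).
LOG_MAX = np.log(np.finfo(np.float64).max)

def safe_back_transform(log_pred, train_log_price):
    """Exponentiate log-price predictions after clipping them to the range
    of log prices seen in training (a crude safeguard, not a fix).

    Returns the prices and a boolean mask of the rows that were clipped.
    """
    log_pred = np.asarray(log_pred, dtype=float)
    lo, hi = np.min(train_log_price), np.max(train_log_price)
    out_of_range = (log_pred < lo) | (log_pred > hi)
    return np.exp(np.clip(log_pred, lo, hi)), out_of_range

# Hypothetical training targets and predictions.
train_log_price = np.log([5000.0, 12000.0, 40000.0])
log_pred = np.array([9.0, 10.0, 750.0])

with np.errstate(over="ignore"):
    raw = np.exp(log_pred)          # last entry overflows to inf

prices, flagged = safe_back_transform(log_pred, train_log_price)
print(raw)       # [8103.08..., 22026.46..., inf]
print(flagged)   # [False False  True]
```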