Improving the Model
So it seems in all these years no one has attempted to improve the model. That's a bummer. I was hoping to see what others had tried.
What I came up with as an improvement, which is still nowhere near the 90%+ accuracy that Ilya was able to achieve, combines a change to the preprocessing step with a parameter of the fit() method that we didn't learn about in the course.
I did not undersample the data. I left all of it intact and unbalanced. Then I used a parameter of the fit method called class_weight, which lets me apply a higher penalty to the minority class. That way, an error on the minority class is penalized more heavily than an error on the majority class. This keeps the entire dataset, so no information is thrown away, while still producing a balancing effect. I had to use Gemini to help me figure this out. There's no way I would have come up with it on my own, since I'm still largely unaware of everything that's possible here.
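To make the "higher penalty for the minority class" idea concrete, here is a minimal sketch in plain Python (so it runs without TensorFlow). It assumes binary 0/1 targets and uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count), which is what scikit-learn's `compute_class_weight(class_weight="balanced", ...)` computes; the post doesn't say which weights were actually used. `balanced_class_weights` and `weighted_bce` are hypothetical helpers for illustration, not course or library code, and the toy labels are made up.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count).

    Rarer classes get larger weights, so their errors cost more in the loss.
    Keys are cast to int because Keras expects integer class indices.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {int(cls): n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_bce(y_true, p_pred, class_weight):
    """Hypothetical helper: mean binary cross-entropy with each sample's loss
    scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weight[int(y)] * loss
    return total / len(y_true)


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2  # toy 80/20 imbalance (made-up data)
    weights = balanced_class_weights(labels)
    print(weights)  # {0: 0.625, 1: 2.5}

    # The same size of mistake (10% probability on the true class) costs
    # 4x more when the true class is the minority one.
    minority_err = weighted_bce([1], [0.1], weights)
    majority_err = weighted_bce([0], [0.9], weights)
    print(minority_err / majority_err)

    # With a Keras model, the dict would be passed to fit(), e.g.
    # (variable names are placeholders):
    # model.fit(train_inputs, train_targets, class_weight=weights, ...)
```

Keras handles the per-sample scaling internally once `class_weight` is passed to `fit()`, so the loss function above only illustrates the effect; you wouldn't need to write it yourself.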
I was able to go from 81.25% accuracy with the undersampled dataset to 85.88% accuracy using class weights on the full dataset. I think that's pretty solid. Let me know what you think! Is there an even better way to approach this?
-Justin
