Which evaluation metric to choose?
If we had to choose between precision and recall, which metric would be prioritized more for this particular use-case? I think we would want a model with higher recall as we don't want a real comment to get removed just because it was falsely classified as spam. I'd love to hear your feedback on this!
Thank you for reaching out!
Treating spam as the positive class, prioritizing precision means you'd want to minimize the cases where true ham messages are incorrectly classified as spam (have as few false positives as possible). Prioritizing recall, on the other hand, means you'd want to minimize the cases where true spam messages are incorrectly classified as ham (have as few false negatives as possible).
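To make the two definitions concrete, here is a minimal sketch in plain Python that counts true positives, false positives, and false negatives with spam as the positive class. The labels are made up for illustration and the helper name `precision_recall` is my own, not something from the course material.

```python
def precision_recall(y_true, y_pred, positive="spam"):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 4 spam and 6 ham comments.
y_true = ["spam"] * 4 + ["ham"] * 6
y_pred = ["spam", "spam", "spam", "ham",              # 3 spam caught, 1 missed
          "spam", "spam", "ham", "ham", "ham", "ham"]  # 2 ham wrongly flagged

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.60 recall=0.75
```

Here the two wrongly flagged ham comments pull precision down to 3/5, while the one missed spam comment pulls recall down to 3/4.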
Indeed, we'd want to minimize the false positives, so that as few true ham messages as possible get flagged as spam. Note that this means maximizing the precision of the model, not its recall. This can come at the cost of a lower recall, where the model misses some of the spam messages and leaves them in the comment section. These can be reported manually by either the content creator or the viewers.
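One common way to favor precision, assuming the model outputs a spam probability, is to raise the decision threshold. The sketch below uses made-up scores and labels, and the helper `precision_recall_at` is hypothetical, just to show how precision goes up while recall goes down.

```python
def precision_recall_at(threshold, scores, is_spam):
    """Precision and recall when comments scoring >= threshold are flagged as spam."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, is_spam))
    fp = sum(f and not y for f, y in zip(flagged, is_spam))
    fn = sum((not f) and y for f, y in zip(flagged, is_spam))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores (probability of spam) and true labels.
scores  = [0.95, 0.92, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
is_spam = [True, True, True, False, True, False, True, False, False, False]

p_lo, r_lo = precision_recall_at(0.5, scores, is_spam)
p_hi, r_hi = precision_recall_at(0.8, scores, is_spam)
print(f"threshold 0.5: precision={p_lo:.2f} recall={r_lo:.2f}")
print(f"threshold 0.8: precision={p_hi:.2f} recall={r_hi:.2f}")
# threshold 0.5: precision=0.67 recall=0.80
# threshold 0.8: precision=1.00 recall=0.60
```

At the stricter threshold no ham comment gets removed, but two spam comments slip through, which is the kind of miss that manual reporting can cover.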