Resolved: Ordinal encoding
Hi, loving the course so far!
Can ordinal encoding introduce bias into the model you train? It seems that with ordinal encoding we build in our own subjective understanding of the categories, which could influence the model differently from one-hot encoding, where every value is represented by 0s and 1s.
I apologize if my question is a bit unclear.
Hello Ryo Sato,
Good to hear from you. Your question is quite clear and touches on an important aspect of data preprocessing in machine learning.
Ordinal encoding is generally suitable for ordinal data, where the categories have a meaningful order or ranking. For example, ratings like 'poor', 'average', 'good', and 'excellent' have a natural order. However, using ordinal encoding on nominal data (where categories do not have an inherent order, like countries, colors, or brand names) can introduce bias. This is because the model might incorrectly assume an inherent order in the data that doesn't exist.
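To make the difference concrete, here is a minimal sketch in plain Python. The helper functions and the `colors` list are hypothetical, written only for illustration; in practice you would more likely use a library encoder. It shows how integer codes put a nominal feature on a scale, while one-hot vectors keep every pair of categories equally far apart.

```python
def ordinal_encode(values, categories):
    """Map each value to the position of its category in `categories`."""
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]


def one_hot_encode(values, categories):
    """Map each value to a 0/1 vector with a single 1 at its category."""
    return [[1 if value == category else 0 for category in categories]
            for value in values]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical nominal feature: colors have no natural order.
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))  # alphabetical, which is an arbitrary choice

ordinal_codes = ordinal_encode(colors, categories)
one_hot_rows = one_hot_encode(colors, categories)

# Look up one code and one vector per color.
code = dict(zip(colors, ordinal_codes))
vec = dict(zip(colors, one_hot_rows))

# With ordinal codes, red ends up farther from blue than green is.
ordinal_gaps = {
    "blue-green": abs(code["blue"] - code["green"]),
    "blue-red": abs(code["blue"] - code["red"]),
}

# With one-hot vectors, every pair of distinct colors is equally distant.
one_hot_gaps = {
    "blue-green": squared_distance(vec["blue"], vec["green"]),
    "blue-red": squared_distance(vec["blue"], vec["red"]),
}

print(ordinal_gaps)
print(one_hot_gaps)
```

A model that treats the feature as numeric (a linear model, for example) would read the ordinal gaps as real differences in magnitude, even though they come only from the alphabetical ordering.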
In addition, when applying ordinal encoding, the order in which categories are encoded can be subjective and potentially misleading. For instance, if you encode educational levels as 0 for 'high school', 1 for 'bachelor's degree', and 2 for 'master's degree', you're imposing a hierarchy with equally spaced steps between levels, which might not be appropriate for all analyses.
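As a small illustration of how much the chosen order matters, the sketch below encodes the same three education levels twice: once in alphabetical order (an arbitrary default I am assuming here for the sake of the example) and once in an explicitly chosen order. The variable names are hypothetical.

```python
levels = ["high school", "bachelor's degree", "master's degree"]

# Alphabetical order: an arbitrary default that ignores the meaning of the levels.
alphabetical = {level: i for i, level in enumerate(sorted(levels))}

# Explicit order: our own judgement about how the levels rank.
explicit = {level: i for i, level in enumerate(levels)}

# The integer codes also imply that each step up is the same size.
steps = [explicit[b] - explicit[a] for a, b in zip(levels, levels[1:])]

print(alphabetical)
print(explicit)
print(steps)
```

In the alphabetical mapping, 'bachelor's degree' gets a lower code than 'high school', so the order has to be stated explicitly if it is meant to carry meaning. Even then, the equal steps between codes are an assumption of the encoding, not something the data tells us.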