Last answered:

15 Jun 2024

Posted on:

17 Apr 2022


Encoding features with one hot encoding vs ordinal encoding in Data Preprocessing

Why choose the ordinal encoder rather than the one-hot encoder when preprocessing our data? I'm guessing the choice affects the performance of our model. In practice, do we have to try both and select the encoding with the better performance?

2 answers ( 0 marked as helpful)
Posted on:

19 Apr 2022


Hi Nehita,
thanks for reaching out! In practice, the two types of encoders will likely give similar results (though I have not yet tried the one-hot encoder here). The reason for choosing the ordinal encoder in this lesson is that we show the one-hot encoder in another course, and I wanted to show our students other possibilities. In my experience, both techniques lead to very similar results. If you do find a significant difference, I'd be happy if you shared it here in the hub.
Hope this helps!

365 Eli

Posted on:

15 Jun 2024


Hi Eli, could you point me to the one-hot encoder course you're referring to?

Regarding the differences between one-hot and ordinal encoding, I see an extremely different result.

# Imports needed for this snippet
from sklearn import svm
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

enc_i = OneHotEncoder()   # one-hot encoding for the input features
enc_t = LabelEncoder()    # label encoding for the target

# Fit on the training set only, then reuse the fitted encoders on the test set
x_train_transf = enc_i.fit_transform(x_train)
x_test_transf = enc_i.transform(x_test)

y_train_transf = enc_t.fit_transform(y_train)
y_test_transf = enc_t.transform(y_test)

C = 1.0
clf = svm.SVC(C=C, kernel='linear').fit(x_train_transf, y_train_transf)

y_test_pred = clf.predict(x_test_transf)
ConfusionMatrixDisplay.from_predictions(
    y_test_transf, y_test_pred, display_labels=enc_t.classes_.tolist()
)

The result with the one-hot encoder is 100% accuracy in the mushroom case.

Please check my code; did I perhaps do something wrong?
