Last answered: 30 May 2024

Posted on: 16 Jan 2023

Day of the Week and Month Variable type

Shouldn't the 'Day of the Week' and 'Month Value' columns be represented as categorical or string variables, so the model doesn't infer an ordinal or ratio relationship (e.g., June isn't half of December, and Thursday isn't greater than Monday)? Both of my variables appear as integers.

1 answer (0 marked as helpful)
Instructor
Posted on: 30 May 2024

Hi Justin!
Thanks for reaching out.

Representing 'Day of the Week' and 'Month Value' as categorical variables is important to prevent misleading interpretations in your model. Here’s why:

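1) Nominal Nature: For modelling purposes, days of the week and months are best treated as nominal categories. They follow a cyclical order, but the integers used to code them carry no magnitude: Thursday is not "greater" than Monday, June is not half of December, and December (12) and January (1) are neighbours rather than 11 units apart.
2) Incorrect Inferences: With integer codes, a model such as linear or logistic regression fits a single coefficient per column, which forces every one-step increase (one more day, one more month) to have the same effect. That relationship doesn't exist in the data, so it can skew both predictions and the interpretation of coefficients.
3) Proper Encoding: Encoding these variables as categorical ensures the model treats each value as a distinct category. One-hot (dummy) encoding is the usual choice for linear and distance-based models. Label (integer) encoding reintroduces an arbitrary order, so it is generally only appropriate for tree-based models, which split on values rather than assume a linear effect.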
Converting these variables to categorical or string types is a standard practice in data preprocessing to ensure accurate modeling and interpretation.
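A minimal pandas sketch of this preprocessing follows. It assumes the data is in a pandas DataFrame with columns named as in the question; the sample values are made up purely for illustration. It first converts both columns to the `category` dtype, then one-hot encodes them with `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical sample data: column names mirror the question, values are made up.
df = pd.DataFrame({
    "Day of the Week": [1, 4, 4, 5, 2],
    "Month Value": [6, 12, 1, 6, 3],
})

# Step 1: mark both columns as categorical so pandas no longer treats them as numbers.
for col in ["Day of the Week", "Month Value"]:
    df[col] = df[col].astype("category")

print(df.dtypes)

# Step 2: one-hot encode. drop_first=True drops one dummy per variable (here, the
# lowest code) to avoid perfect multicollinearity (the "dummy variable trap")
# in regression models that include an intercept.
dummies = pd.get_dummies(
    df,
    columns=["Day of the Week", "Month Value"],
    drop_first=True,
    dtype=int,
)
print(dummies)
```

`drop_first=True` matters mainly for regression models with an intercept; tree-based models generally don't need it. If some days or months are missing from a particular sample, you may want to declare the full set of categories explicitly (for example with `pd.Categorical(..., categories=...)`) so the dummy columns stay consistent between training data and new data.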


Hope this helps.
Best,
Martin
