In this data set there are many variables that deal with months, and I am trying to figure out which function to use on a months column. In the video, ‘mnths_since_issue_d’ is calculated by simply subtracting months from a particular date and, after fine classing, is passed to the woe_ordered_continuous function. Since it is a count and always a whole number, shouldn’t we treat it as a discrete variable? Thanks for the help.
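For context, here is a minimal sketch of how a months-since column of this kind might be derived and then fine classed. The dates, the reference date, and the variable names are made up for illustration and are not taken from the course notebook.

```python
import pandas as pd

# Hypothetical issue dates and reference date, chosen only for illustration.
issue_dates = pd.Series(pd.to_datetime(["2015-01-01", "2016-06-01", "2017-11-01"]))
reference_date = pd.Timestamp("2017-12-01")

# Whole calendar months elapsed between each issue date and the reference date.
months_since_issue = (
    (reference_date.year - issue_dates.dt.year) * 12
    + (reference_date.month - issue_dates.dt.month)
)

# The values are whole numbers, but the column can still be cut into
# intervals (fine classing) in the same way as any other numeric column.
months_factor = pd.cut(months_since_issue, 2)

print(months_since_issue.tolist())
print(months_factor.value_counts().sort_index())
```

On a real data set such a column can take a large number of distinct whole-number values, which is one possible reason to bin it before computing weight of evidence.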
P.S. The homework “Data Preparation. Preprocessing continuous variables: creating dummies. Homework” has a typo that is underlined:
Your task in this homework is to examine weight of evidence and determine the categories that should be created as dummy variables for the probability of default (PD) model for two continuous variables: ‘dti’ (debt to income ratio) and ‘mths_since_last_record’.
For each of the two variables:
1. Determine whether you should do fine classing of the variable, that is, cut it into categories. If yes, use the ‘pd.cut’ method to do the fine classing.
2. Run the ‘woe_discrete’ function with the following arguments: the ‘df_inputs_prepr’ dataframe, the respective independent variable, and the ‘df_targets_prepr’ dataframe. Display the dataframe with results.
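The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy ‘dti’ values, the target coding (1 = good, 0 = default), and the helper `woe_by_category_sketch` are all made up for this example, and the helper is not the course's ‘woe_discrete’ or ‘woe_ordered_continuous’ function.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration (assumed coding: 1 = good, 0 = default).
df_inputs = pd.DataFrame(
    {"dti": [2.0, 5.0, 8.0, 12.0, 15.0, 18.0, 22.0, 25.0, 28.0, 30.0, 33.0, 38.0]}
)
df_targets = pd.Series([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0], name="good_bad")

# Step 1: fine classing, i.e. cutting the variable into equal-width intervals.
df_inputs["dti_factor"] = pd.cut(df_inputs["dti"], 3)


def woe_by_category_sketch(categories, target):
    """Hypothetical helper: weight of evidence per category of a binned variable."""
    df = pd.concat([categories, target], axis=1)
    df.columns = ["category", "good"]
    grouped = df.groupby("category", observed=False)["good"].agg(
        n_obs="count", prop_good="mean"
    )
    n_good = grouped["prop_good"] * grouped["n_obs"]
    n_bad = (1 - grouped["prop_good"]) * grouped["n_obs"]
    # WoE = ln(share of all good borrowers / share of all bad borrowers).
    grouped["woe"] = np.log((n_good / n_good.sum()) / (n_bad / n_bad.sum()))
    return grouped.reset_index()


# Step 2: compute and display weight of evidence for each interval.
woe_table = woe_by_category_sketch(df_inputs["dti_factor"], df_targets)
print(woe_table)
```

Because ‘pd.cut’ returns ordered intervals, the rows come out sorted by interval rather than by weight of evidence. Note that the sketch does not handle categories with no good or no bad observations, where the logarithm is undefined.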
Thanks for bringing this to our attention!
To your question: months is of type date. As such, it is neither discrete nor continuous, since it has specific properties related to time series data.
Hope this helps!