29 Apr 2024

Posted on:

27 Apr 2024

# Adjusting the dates where "mths_since_earliest_cr_line" is negative

If you want to restore the original date instead of replacing the negative values with the highest positive value, use the following:

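```python
import pandas as pd

REFERENCE_DATE = pd.to_datetime("2017-12-01")

def earliest_cr_line_date(df):
    # "%b-%y" parses two-digit years, so a year such as "62" can be read as 2062 instead of 1962.
    df["earliest_cr_line_date"] = pd.to_datetime(df["earliest_cr_line"], format="%b-%y")

    def months_since(dates):
        # Whole calendar months up to the reference date. This avoids dividing by
        # np.timedelta64(1, 'M'), which newer pandas versions may reject.
        return (REFERENCE_DATE.year - dates.dt.year) * 12 + (REFERENCE_DATE.month - dates.dt.month)

    df["mths_since_earliest_cr_line"] = months_since(df["earliest_cr_line_date"])

    # A negative value means the date was parsed into the future: move it back 100 years.
    in_future = df["mths_since_earliest_cr_line"] < 0
    df.loc[in_future, "earliest_cr_line_date"] = (
        df.loc[in_future, "earliest_cr_line_date"] - pd.DateOffset(years=100)
    )
    df.loc[in_future, "mths_since_earliest_cr_line"] = months_since(
        df.loc[in_future, "earliest_cr_line_date"]
    )
    return df

df = earliest_cr_line_date(df)
```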

This took me a while to work out, but it was worth it. Here is my df["mths_since_earliest_cr_line"].describe() after running this code:
Hope this helps someone.
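
For anyone wondering why the values go negative in the first place, here is a small sketch on made-up data. The three sample strings are hypothetical, not taken from the real dataset, and I am assuming the same "%b-%y" format and the same 2017-12-01 reference date as above. Two-digit years from 00 to 68 get parsed as 2000 to 2068, so an old credit line like "Jan-62" ends up in the future until it is shifted back by a century.

```python
import pandas as pd

# Hypothetical sample values in the same "Mon-yy" format as earliest_cr_line.
sample = pd.DataFrame({"earliest_cr_line": ["Jan-62", "Aug-05", "Dec-17"]})

parsed = pd.to_datetime(sample["earliest_cr_line"], format="%b-%y")
reference = pd.Timestamp("2017-12-01")

# "Jan-62" is parsed as 2062-01-01, which lies after the reference date.
in_future = parsed > reference

# Keep dates that are already in the past; shift the future ones back 100 years.
fixed = parsed.where(~in_future, parsed - pd.DateOffset(years=100))

# Whole calendar months between each corrected date and the reference date.
months = (reference.year - fixed.dt.year) * 12 + (reference.month - fixed.dt.month)

print(fixed.dt.year.tolist())  # [1962, 2005, 2017]
print(months.tolist())         # [671, 148, 0]
```

Only the first row is touched by the correction; the other two already fall on or before the reference date.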