Adjusting the dates where "mths_since_earliest_cr_line" is negative
If you want to restore the original date instead of replacing the negative values with the highest positive value, use the following:
import pandas as pd

def earliest_cr_line_date(df):
    ref_date = pd.to_datetime("2017-12-01")
    # Average month length (30.436875 days), the same span as np.timedelta64(1, "M"),
    # which newer pandas versions may reject as a divisor.
    month = pd.Timedelta(days=30.436875)
    df["earliest_cr_line_date"] = pd.to_datetime(df["earliest_cr_line"], format="%b-%y")
    df["mths_since_earliest_cr_line"] = ((ref_date - df["earliest_cr_line_date"]) / month).round()
    # Two-digit years such as "62" are parsed as 2062, which gives negative months.
    # Move those dates back by a century, then recompute the months for the same rows.
    negative = df["mths_since_earliest_cr_line"] < 0
    df.loc[negative, "earliest_cr_line_date"] = df["earliest_cr_line_date"] - pd.DateOffset(years=100)
    df.loc[negative, "mths_since_earliest_cr_line"] = ((ref_date - df["earliest_cr_line_date"]) / month).round()
    return df

df = earliest_cr_line_date(df)
This took me a while to work out, but it was worth it. Here is my df["mths_since_earliest_cr_line"].describe() after running this code:
Hope this helps someone.
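For anyone who wants to try the idea without the full dataset, here is a minimal sketch on a few made-up rows. The names `sample` and `months_since` are hypothetical and only for illustration. It also swaps in a different month calculation: whole calendar months from year and month differences, rather than dividing by an average month length. Since both the parsed dates and the reference date fall on the first of a month, this should agree with the rounded version, but treat that as an assumption to check on your own data.

```python
import pandas as pd


def months_since(dates, ref_date):
    """Whole calendar months from each date in `dates` up to `ref_date`."""
    return (ref_date.year - dates.dt.year) * 12 + (ref_date.month - dates.dt.month)


# Toy rows, made up for illustration (not taken from the course data)
sample = pd.DataFrame({"earliest_cr_line": ["Jan-85", "Dec-62", "Nov-17", "Aug-01"]})
ref_date = pd.Timestamp("2017-12-01")

parsed = pd.to_datetime(sample["earliest_cr_line"], format="%b-%y")

# A date after the reference date can only come from the two-digit-year
# pivot ("62" read as 2062), so move those rows back by 100 years.
in_future = parsed > ref_date
parsed = parsed.mask(in_future, parsed - pd.DateOffset(years=100))

sample["earliest_cr_line_date"] = parsed
sample["mths_since_earliest_cr_line"] = months_since(parsed, ref_date)
print(sample)
```

On these four rows, "Dec-62" ends up as 1962-12-01 and the month counts come out as 395, 660, 1 and 196, so nothing is negative any more.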
1 answer (0 marked as helpful)
Hey Jonathan, thanks so much for your contribution! Much appreciated.