Last answered: 29 Apr 2024

Posted on: 27 Apr 2024

Adjusting the dates where "mths_since_earliest_cr_line" is negative

If you want to restore the original date instead of replacing the negative values with the highest positive value, use the following:

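import pandas as pd

def earliest_cr_line_date(df):
    reference_date = pd.to_datetime("2017-12-01")
    # Average month length (365.2425 / 12 days), standing in for np.timedelta64(1, 'M'),
    # which newer pandas versions may reject as an ambiguous unit.
    one_month = pd.Timedelta(days=365.2425 / 12)

    df["earliest_cr_line_date"] = pd.to_datetime(df["earliest_cr_line"], format="%b-%y")
    df["mths_since_earliest_cr_line"] = ((reference_date - df["earliest_cr_line_date"]) / one_month).round()

    # Two-digit years parsed into the future (e.g. "62" read as 2062) give negative values;
    # move those dates back by 100 years and recompute the months.
    negative = df["mths_since_earliest_cr_line"] < 0
    df.loc[negative, "earliest_cr_line_date"] = df["earliest_cr_line_date"] - pd.DateOffset(years=100)
    df.loc[negative, "mths_since_earliest_cr_line"] = (
        (reference_date - df["earliest_cr_line_date"]) / one_month
    ).round()

    return df

df = earliest_cr_line_date(df)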
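
For anyone wondering where the negative values come from: as far as I can tell, the "%y" format code reads two-digit years 00-68 as 2000-2068 and 69-99 as 1969-1999, so a credit line opened in 1962 ends up in 2062. Here is a minimal sketch of that behaviour and of the 100-year shift. The sample values are made up for illustration and are not taken from the dataset, and it assumes an English locale for the month abbreviations.

```python
import pandas as pd

# Hypothetical sample values in the same "Mon-YY" format as earliest_cr_line.
sample = pd.Series(["Jan-62", "Mar-99", "Aug-05"])
parsed = pd.to_datetime(sample, format="%b-%y")

# Same reference date as in the function above.
reference_date = pd.Timestamp("2017-12-01")
in_future = parsed > reference_date

# Keep dates that are already in the past; shift only the future ones back a century.
corrected = parsed.where(~in_future, parsed - pd.DateOffset(years=100))

print(parsed.dt.year.tolist())
print(corrected.dt.year.tolist())
```

Only the first value is parsed into the future here, so it is the only one that gets moved.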

This took me a while to work out, but it was worth it. Here is my df["mths_since_earliest_cr_line"].describe() after running this code:
Hope this helps someone.

1 answer (0 marked as helpful)
Instructor
Posted on: 29 Apr 2024

Hey Jonathan, thanks so much for your contribution! Much appreciated.
