Problem with NaN values in the dataset in "Credit Risk Modeling" - "PD model estimation"
Dear 365 Team,
I am referring to the module "Credit Risk Modeling", the course "PD model estimation", and the video "PD model estimation".
If I use the code

inputs_train_with_ref_cat = loan_data_inputs_train.loc[: , ['grade:A', 'grade:B', 'grade:C', … ]]

then I get the error message:

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
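For reference, here is a minimal sketch that I believe reproduces the behavior (the DataFrame and column names are made up, not the course data):

import pandas as pd

# toy DataFrame with two dummy columns
df = pd.DataFrame({'grade:A': [1, 0], 'grade:B': [0, 1]})

# 'grade:Z' is not a column of df, so on recent pandas versions
# this raises a KeyError instead of merely warning as older versions did
df.loc[:, ['grade:A', 'grade:Z']]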
So I followed the recommendation to replace .loc with .reindex (source: https://365datascience.com/question/credir-risk-modeling-in-python-section-pd-model-6-2/):

inputs_train_with_ref_cat = loan_data_inputs_train.reindex(['grade:A', 'grade:B', 'grade:C', …], axis = 1)
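If I understand the pandas documentation correctly, .reindex does not raise an error for missing labels; it silently creates them as new columns filled with NaN. A small sketch, again with made-up names, of what I think happens:

import pandas as pd

df = pd.DataFrame({'grade:A': [1, 0], 'grade:B': [0, 1]})

# 'grade:Z' does not exist, so .reindex creates it and fills it with NaN
out = df.reindex(['grade:A', 'grade:Z'], axis = 1)
print(out.isnull().sum())
# grade:A    0
# grade:Z    2
# dtype: int64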
But now there is a new problem: when I run this code

inputs_train = inputs_train_with_ref_cat.drop(ref_categories, axis = 1)
inputs_train.head()

I get many NaN values in the dataset inputs_train.
Here is the output of inputs_train.isnull().sum(), which shows where the NaN values are:
grade:A 0
grade:B 0
grade:C 0
grade:D 0
grade:E 0
grade:F 0
home_ownership:OWN 0
home_ownership:MORTGAGE 0
addr_state:NM_VA 373028
addr_state:NY 0
addr_state:OK_TN_MO_LA_MD_NC 373028
addr_state:CA 0
addr_state:UT_KY_AZ_NJ 373028
addr_state:AR_MI_PA_OH_MN 373028
addr_state:RI_MA_DE_SD_IN 373028
addr_state:GA_WA_OR 373028
addr_state:WI_MT 373028
addr_state:TX 0
addr_state:IL_CT 373028
addr_state:KS_SC_CO_VT_AK_MS 373028
addr_state:WV_NH_WY_DC_ME_ID 373028
verification_status:Not Verified 0
verification_status:Source Verified 0
purpose:credit_card 0
purpose:debt_consolidation 0
purpose:oth__med__vacation 373028
purpose:major_purch__car__home_impr 373028
initial_list_status:w 0
term:36 373028
emp_length:1 373028
emp_length:2-4 373028
emp_length:5-6 373028
emp_length:7-9 373028
emp_length:10 373028
mths_since_issue_d:<38 373028
mths_since_issue_d:38-39 373028
mths_since_issue_d:40-41 373028
mths_since_issue_d:42-48 373028
mths_since_issue_d:49-52 373028
mths_since_issue_d:53-64 373028
mths_since_issue_d:65-84 373028
int_rate:<9.548 373028
int_rate:9.548-12.025 373028
int_rate:12.025-15.74 373028
int_rate:15.74-20.281 373028
mths_since_earliest_cr_line:141-164 373028
mths_since_earliest_cr_line:165-247 373028
mths_since_earliest_cr_line:248-270 373028
mths_since_earliest_cr_line:271-352 373028
mths_since_earliest_cr_line:>352 373028
delinq_2yrs:0 373028
delinq_2yrs:1-3 373028
inq_last_6mths:0 373028
inq_last_6mths:1-2 373028
inq_last_6mths:3-6 373028
open_acc:1-3 373028
open_acc:4-12 373028
open_acc:13-17 373028
open_acc:18-22 373028
open_acc:23-25 373028
open_acc:26-30 373028
open_acc:>=31 373028
pub_rec:3-4 373028
pub_rec:>=5 373028
total_acc:28-51 373028
total_acc:>=52 373028
acc_now_delinq:>=1 373028
total_rev_hi_lim:5K-10K 373028
total_rev_hi_lim:10K-20K 373028
total_rev_hi_lim:20K-30K 373028
total_rev_hi_lim:30K-40K 373028
total_rev_hi_lim:40K-55K 373028
total_rev_hi_lim:55K-95K 373028
total_rev_hi_lim:>95K 373028
annual_inc:20K-30K 373028
annual_inc:30K-40K 373028
annual_inc:40K-50K 373028
annual_inc:50K-60K 373028
annual_inc:60K-70K 373028
annual_inc:70K-80K 373028
annual_inc:80K-90K 373028
annual_inc:90K-100K 373028
annual_inc:100K-120K 373028
annual_inc:120K-140K 373028
annual_inc:>140K 373028
dti:<=1.4 373028
dti:1.4-3.5 373028
dti:3.5-7.7 373028
dti:7.7-10.5 373028
dti:10.5-16.1 373028
dti:16.1-20.3 373028
dti:20.3-21.7 373028
dti:21.7-22.4 373028
dti:22.4-35 373028
mths_since_last_delinq:Missing 373028
mths_since_last_delinq:4-30 373028
mths_since_last_delinq:31-56 373028
mths_since_last_delinq:>=57 373028
mths_since_last_record:Missing 373028
mths_since_last_record:3-20 373028
mths_since_last_record:21-31 373028
mths_since_last_record:32-80 373028
mths_since_last_record:81-86 373028
mths_since_last_record:>=86 373028
dtype: int64
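Every affected column contains exactly 373028 NaN values, which looks like the whole column is NaN in every row. So I suspect that these are labels which do not exist in loan_data_inputs_train under exactly these names and were therefore created empty by .reindex. If that reasoning is right, something like the following should confirm it:

# columns of the reindexed DataFrame that are NaN in every row
all_nan_cols = inputs_train_with_ref_cat.columns[inputs_train_with_ref_cat.isnull().all()]
print(list(all_nan_cols))

# if these columns were created by .reindex, none of them should
# appear among the columns of the source DataFrame (empty list expected)
print([col for col in all_nan_cols if col in loan_data_inputs_train.columns])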
So I decided to look into the datasets
loan_data_inputs_train.csv
loan_data_targets_train.csv
loan_data_inputs_test.csv
loan_data_targets_test.csv
and tested each of them with .isnull().sum(). There are no NaN values in any of these datasets.
I now have no clue what I should do. Where is my mistake? I hope I have described the problem clearly.
Best regards,
Volkmar