Dear 365Team,
I am referring to the module “Credit Risk Modeling”, the course “PD model estimation” and the video “PD model estimation”.
If I use this code:
inputs_train_with_ref_cat = loan_data_inputs_train.loc[:, ['grade:A', 'grade:B', 'grade:C', …]]
then I get this error message:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
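The behavior behind this message can be reproduced on a toy DataFrame. This is a minimal sketch assuming pandas 1.0 or later; the column names are hypothetical, and `grade:Z` stands in for any requested label that does not exist. `.loc` raises a `KeyError` when the list contains a missing label, and `Index.difference` lists which requested labels are absent.

```python
import pandas as pd

# Toy stand-in for loan_data_inputs_train (hypothetical columns and values)
df = pd.DataFrame({"grade:A": [1, 0], "grade:B": [0, 1]})

# "grade:Z" is deliberately absent from df
wanted = ["grade:A", "grade:B", "grade:Z"]

try:
    df.loc[:, wanted]
    raised = False
except KeyError as err:
    # pandas >= 1.0 refuses list-likes that contain labels not present in the frame
    raised = True
    print("KeyError:", err)

# Requested labels that do not exist as columns
missing = pd.Index(wanted).difference(df.columns)
print(list(missing))
```

Running the same `difference` check with the real column list and `loan_data_inputs_train.columns` would show exactly which labels `.loc` is complaining about.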
So I followed the recommendation to replace .loc with .reindex
(Source: https://365datascience.com/question/credir-risk-modeling-in-python-section-pd-model-6-2/)
inputs_train_with_ref_cat = loan_data_inputs_train.reindex(['grade:A', 'grade:B', 'grade:C', …], axis = 1)
But the problem is that when I then run this code
inputs_train = inputs_train_with_ref_cat.drop(ref_categories, axis = 1)
inputs_train.head()
I get many NaN values in the dataset inputs_train.
Here is where the NaN values appear, according to inputs_train.isnull().sum():
grade:A 0
grade:B 0
grade:C 0
grade:D 0
grade:E 0
grade:F 0
home_ownership:OWN 0
home_ownership:MORTGAGE 0
addr_state:NM_VA 373028
addr_state:NY 0
addr_state:OK_TN_MO_LA_MD_NC 373028
addr_state:CA 0
addr_state:UT_KY_AZ_NJ 373028
addr_state:AR_MI_PA_OH_MN 373028
addr_state:RI_MA_DE_SD_IN 373028
addr_state:GA_WA_OR 373028
addr_state:WI_MT 373028
addr_state:TX 0
addr_state:IL_CT 373028
addr_state:KS_SC_CO_VT_AK_MS 373028
addr_state:WV_NH_WY_DC_ME_ID 373028
verification_status:Not Verified 0
verification_status:Source Verified 0
purpose:credit_card 0
purpose:debt_consolidation 0
purpose:oth__med__vacation 373028
purpose:major_purch__car__home_impr 373028
initial_list_status:w 0
term:36 373028
emp_length:1 373028
emp_length:2-4 373028
emp_length:5-6 373028
emp_length:7-9 373028
emp_length:10 373028
mths_since_issue_d:<38 373028
mths_since_issue_d:38-39 373028
mths_since_issue_d:40-41 373028
mths_since_issue_d:42-48 373028
mths_since_issue_d:49-52 373028
mths_since_issue_d:53-64 373028
mths_since_issue_d:65-84 373028
int_rate:<9.548 373028
int_rate:9.548-12.025 373028
int_rate:12.025-15.74 373028
int_rate:15.74-20.281 373028
mths_since_earliest_cr_line:141-164 373028
mths_since_earliest_cr_line:165-247 373028
mths_since_earliest_cr_line:248-270 373028
mths_since_earliest_cr_line:271-352 373028
mths_since_earliest_cr_line:>352 373028
delinq_2yrs:0 373028
delinq_2yrs:1-3 373028
inq_last_6mths:0 373028
inq_last_6mths:1-2 373028
inq_last_6mths:3-6 373028
open_acc:1-3 373028
open_acc:4-12 373028
open_acc:13-17 373028
open_acc:18-22 373028
open_acc:23-25 373028
open_acc:26-30 373028
open_acc:>=31 373028
pub_rec:3-4 373028
pub_rec:>=5 373028
total_acc:28-51 373028
total_acc:>=52 373028
acc_now_delinq:>=1 373028
total_rev_hi_lim:5K-10K 373028
total_rev_hi_lim:10K-20K 373028
total_rev_hi_lim:20K-30K 373028
total_rev_hi_lim:30K-40K 373028
total_rev_hi_lim:40K-55K 373028
total_rev_hi_lim:55K-95K 373028
total_rev_hi_lim:>95K 373028
annual_inc:20K-30K 373028
annual_inc:30K-40K 373028
annual_inc:40K-50K 373028
annual_inc:50K-60K 373028
annual_inc:60K-70K 373028
annual_inc:70K-80K 373028
annual_inc:80K-90K 373028
annual_inc:90K-100K 373028
annual_inc:100K-120K 373028
annual_inc:120K-140K 373028
annual_inc:>140K 373028
dti:<=1.4 373028
dti:1.4-3.5 373028
dti:3.5-7.7 373028
dti:7.7-10.5 373028
dti:10.5-16.1 373028
dti:16.1-20.3 373028
dti:20.3-21.7 373028
dti:21.7-22.4 373028
dti:22.4-35 373028
mths_since_last_delinq:Missing 373028
mths_since_last_delinq:4-30 373028
mths_since_last_delinq:31-56 373028
mths_since_last_delinq:>=57 373028
mths_since_last_record:Missing 373028
mths_since_last_record:3-20 373028
mths_since_last_record:21-31 373028
mths_since_last_record:32-80 373028
mths_since_last_record:81-86 373028
mths_since_last_record:>=86 373028
dtype: int64
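Every affected column shows the same count, 373028, which is presumably the number of rows in the training set (an assumption; `len(inputs_train)` would confirm it). That pattern suggests those columns are NaN in every row, which is what `.reindex` produces for a requested label that does not exist: instead of raising like `.loc`, it creates the column and fills it with NaN. A minimal sketch on a toy frame, with hypothetical column names:

```python
import pandas as pd

# Toy frame containing only two of the three requested dummy columns (hypothetical names)
df = pd.DataFrame({"grade:A": [1, 0, 1], "grade:B": [0, 1, 0]})
wanted = ["grade:A", "grade:B", "term:36"]

# reindex does not raise for the absent "term:36"; it adds it as an all-NaN column
out = df.reindex(wanted, axis=1)
nan_counts = out.isnull().sum()
print(nan_counts)

# Columns that are NaN in every row were most likely absent from df to begin with
all_nan = out.columns[out.isnull().all().to_numpy()].tolist()
print(all_nan)
```

So the NaN values are most likely not in the data itself; they point to requested column names that the DataFrame does not contain.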
So I decided to look at the datasets
loan_data_inputs_train.csv
loan_data_targets_train.csv
loan_data_inputs_test.csv
loan_data_targets_test.csv
and test each of them with .isnull().sum(). There are no NaN values in any of them.
I have no clue what to do now. Where is my mistake? I hope I have described the problem clearly.
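Running `.isnull().sum()` on the saved files only inspects columns that exist, so it cannot reveal dummy columns that were never written to the CSV. One way to check for that is to compare the requested names with `df.columns`. This sketch simulates the CSV with `io.StringIO` so it runs on its own; it assumes the file was saved with the index as its first column (hence `index_col=0`), and `wanted` is a hypothetical, shortened version of the full list passed to `.reindex`.

```python
import io
import pandas as pd

# Simulated file contents; in practice this would be
# pd.read_csv("loan_data_inputs_train.csv", index_col=0)  (index_col=0 is an assumption)
csv_text = "id,grade:A,grade:B,addr_state:NY\n0,1,0,1\n1,0,1,0\n"
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Illustrative subset of the names passed to .reindex
wanted = ["grade:A", "grade:B", "addr_state:NY", "addr_state:NM_VA", "term:36"]

# The file itself contains no NaN values ...
total_nan = df.isnull().sum().sum()
print(total_nan)

# ... but some requested columns simply do not exist in it
missing = [c for c in wanted if c not in df.columns]
print(missing)
```

If `missing` is non-empty for the real file, the cause is probably upstream (the dummy variables not being created or saved under those exact names) rather than in the `.reindex` call.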
Best regards
Volkmar
I'm having exactly the same problem; however, the NaN values appear only in the last column, 'mths_since_last_record:>=86'.
The dataset seems fine: .isna().sum() shows no NaN values before reindexing, but after applying .reindex, NaN values appear.
Did you find a workaround or a way of solving this issue?
Passing a list that contains missing labels to .loc is no longer supported (.iloc itself is still supported).
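For reference, a small check assuming pandas 1.0 or later: `.iloc` still accepts a list of integer positions, and `.loc` still accepts a list of labels as long as every label exists. The restriction applies only to lists that contain missing labels. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"grade:A": [1, 0], "grade:B": [0, 1], "grade:C": [0, 0]})

# Positional selection with .iloc still works
by_position = df.iloc[:, [0, 2]]

# Label selection with .loc works when every requested label is present
by_label = df.loc[:, ["grade:A", "grade:C"]]

print(by_position.equals(by_label))
```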