Problem with NaN in dataset in “Credit Risk Modeling” – “PD model estimation”

Dear 365Team,
 
I am referring to the module “Credit Risk Modeling”, the course “PD model estimation”, and the video “PD model estimation”.
 
When I run this code
inputs_train_with_ref_cat = loan_data_inputs_train.loc[:, ['grade:A', 'grade:B', 'grade:C', …]]
I get this error message:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
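A minimal sketch that reproduces this error on a hypothetical toy DataFrame (the frame and the `wanted` list are made up for illustration). Since pandas 1.0, `.loc` raises a KeyError when any label in the requested list is not an existing column, which suggests that at least one of the listed dummy columns is absent from loan_data_inputs_train.

```python
import pandas as pd

# Hypothetical toy frame standing in for loan_data_inputs_train
df = pd.DataFrame({"grade:A": [1, 0], "grade:B": [0, 1]})

# 'term:36' is deliberately left out of df to mimic a dummy column that was never created
wanted = ["grade:A", "grade:B", "term:36"]

raised = False
try:
    df.loc[:, wanted]
except KeyError as err:
    # pandas >= 1.0 raises KeyError when any requested label is missing
    raised = True
    print("KeyError:", err)
```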
 
So I followed the recommendation to replace .loc with .reindex
(Source: https://365datascience.com/question/credir-risk-modeling-in-python-section-pd-model-6-2/)
inputs_train_with_ref_cat = loan_data_inputs_train.reindex(['grade:A', 'grade:B', 'grade:C', …], axis = 1)
 
The problem is that when I then run this code
inputs_train = inputs_train_with_ref_cat.drop(ref_categories, axis = 1)
inputs_train.head()
I get many NaN values in inputs_train.
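A minimal sketch, on the same kind of hypothetical toy DataFrame, of how `.reindex(..., axis=1)` differs from `.loc`. It does not raise for a label that is not an existing column. Instead it adds that label as a new column filled entirely with NaN, so NaN values like the ones above could be created by the reindex step itself rather than coming from the data.

```python
import pandas as pd

# Hypothetical toy frame: 'term:36' is not one of its columns
df = pd.DataFrame({"grade:A": [1, 0], "grade:B": [0, 1]})
wanted = ["grade:A", "grade:B", "term:36"]

# reindex does not raise; a label not already present becomes a new all-NaN column
reindexed = df.reindex(wanted, axis=1)

nan_counts = reindexed.isnull().sum()
print(nan_counts)  # term:36 has 2 NaN (one per row); grade:A and grade:B have 0
```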
 
Here is where the NaN values are, according to inputs_train.isnull().sum():

grade:A                                     0
grade:B                                     0
grade:C                                     0
grade:D                                     0
grade:E                                     0
grade:F                                     0
home_ownership:OWN                          0
home_ownership:MORTGAGE                     0
addr_state:NM_VA                       373028
addr_state:NY                               0
addr_state:OK_TN_MO_LA_MD_NC           373028
addr_state:CA                               0
addr_state:UT_KY_AZ_NJ                 373028
addr_state:AR_MI_PA_OH_MN              373028
addr_state:RI_MA_DE_SD_IN              373028
addr_state:GA_WA_OR                    373028
addr_state:WI_MT                       373028
addr_state:TX                               0
addr_state:IL_CT                       373028
addr_state:KS_SC_CO_VT_AK_MS           373028
addr_state:WV_NH_WY_DC_ME_ID           373028
verification_status:Not Verified            0
verification_status:Source Verified         0
purpose:credit_card                         0
purpose:debt_consolidation                  0
purpose:oth__med__vacation             373028
purpose:major_purch__car__home_impr    373028
initial_list_status:w                       0
term:36                                373028
emp_length:1                           373028
emp_length:2-4                         373028
emp_length:5-6                         373028
emp_length:7-9                         373028
emp_length:10                          373028
mths_since_issue_d:<38                 373028
mths_since_issue_d:38-39               373028
mths_since_issue_d:40-41               373028
mths_since_issue_d:42-48               373028
mths_since_issue_d:49-52               373028
mths_since_issue_d:53-64               373028
mths_since_issue_d:65-84               373028
int_rate:<9.548                        373028
int_rate:9.548-12.025                  373028
int_rate:12.025-15.74                  373028
int_rate:15.74-20.281                  373028
mths_since_earliest_cr_line:141-164    373028
mths_since_earliest_cr_line:165-247    373028
mths_since_earliest_cr_line:248-270    373028
mths_since_earliest_cr_line:271-352    373028
mths_since_earliest_cr_line:>352       373028
delinq_2yrs:0                          373028
delinq_2yrs:1-3                        373028
inq_last_6mths:0                       373028
inq_last_6mths:1-2                     373028
inq_last_6mths:3-6                     373028
open_acc:1-3                           373028
open_acc:4-12                          373028
open_acc:13-17                         373028
open_acc:18-22                         373028
open_acc:23-25                         373028
open_acc:26-30                         373028
open_acc:>=31                          373028
pub_rec:3-4                            373028
pub_rec:>=5                            373028
total_acc:28-51                        373028
total_acc:>=52                         373028
acc_now_delinq:>=1                     373028
total_rev_hi_lim:5K-10K                373028
total_rev_hi_lim:10K-20K               373028
total_rev_hi_lim:20K-30K               373028
total_rev_hi_lim:30K-40K               373028
total_rev_hi_lim:40K-55K               373028
total_rev_hi_lim:55K-95K               373028
total_rev_hi_lim:>95K                  373028
annual_inc:20K-30K                     373028
annual_inc:30K-40K                     373028
annual_inc:40K-50K                     373028
annual_inc:50K-60K                     373028
annual_inc:60K-70K                     373028
annual_inc:70K-80K                     373028
annual_inc:80K-90K                     373028
annual_inc:90K-100K                    373028
annual_inc:100K-120K                   373028
annual_inc:120K-140K                   373028
annual_inc:>140K                       373028
dti:<=1.4                              373028
dti:1.4-3.5                            373028
dti:3.5-7.7                            373028
dti:7.7-10.5                           373028
dti:10.5-16.1                          373028
dti:16.1-20.3                          373028
dti:20.3-21.7                          373028
dti:21.7-22.4                          373028
dti:22.4-35                            373028
mths_since_last_delinq:Missing         373028
mths_since_last_delinq:4-30            373028
mths_since_last_delinq:31-56           373028
mths_since_last_delinq:>=57            373028
mths_since_last_record:Missing         373028
mths_since_last_record:3-20            373028
mths_since_last_record:21-31           373028
mths_since_last_record:32-80           373028
mths_since_last_record:81-86           373028
mths_since_last_record:>=86            373028
dtype: int64
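Every affected column above reports the same count, 373028, which is presumably the number of rows in inputs_train. That pattern points to whole columns being empty rather than scattered missing values. A minimal sketch for listing such fully empty columns, using a hypothetical three-row frame in place of inputs_train:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for inputs_train: one column is entirely NaN
inputs_train = pd.DataFrame({
    "grade:A": [1, 0, 1],
    "addr_state:NY": [0, 1, 0],
    "addr_state:NM_VA": [np.nan, np.nan, np.nan],
})

# Boolean Series: True for each column in which every value is NaN
all_nan = inputs_train.isnull().all()
empty_columns = all_nan[all_nan].index.tolist()
print(empty_columns)  # ['addr_state:NM_VA']
```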

 
So I decided to look at the saved datasets
loan_data_inputs_train.csv
loan_data_targets_train.csv
loan_data_inputs_test.csv
loan_data_targets_test.csv
and checked each of them with .isnull().sum(). None of them contains any NaN.
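`.isnull().sum()` only reports on columns that actually exist, so a file that simply lacks a dummy column still shows zero NaN. A minimal sketch, using hypothetical in-memory CSV text in place of loan_data_inputs_train.csv and a hypothetical `expected` list in place of the full column list from the lecture, of checking which expected columns are absent:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for loan_data_inputs_train.csv
csv_text = "grade:A,addr_state:NY\n1,0\n0,1\n"
loaded = pd.read_csv(io.StringIO(csv_text))

# No NaN among the columns that are actually in the file
total_nan = loaded.isnull().sum().sum()
print(total_nan)  # 0

# Hypothetical subset of the column names the model code expects to find
expected = ["grade:A", "addr_state:NY", "addr_state:NM_VA"]
absent = [col for col in expected if col not in loaded.columns]
print(absent)  # ['addr_state:NM_VA']
```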
 
Now I have no idea what to do. Where did I make a mistake? I hope I have described the problem clearly.
Best regards
Volkmar
