Credit Risk Modeling in Python

This course teaches the programming behind how banks decide who should get a loan. You will learn risk modeling theory and advance your Python modeling skills.


Course description

Credit risk modeling is where data science and fintech meet. It is one of the most important activities a bank conducts, and the one that has received the most attention since the recession. This course is the only comprehensive credit risk modeling course in Python available right now. It covers the complete credit risk modeling picture: preprocessing, then probability of default (PD), loss given default (LGD) and exposure at default (EAD) modeling, and finally the calculation of expected loss (EL).

Setting up the environment

Here you will learn how to set up Python 3 and launch Jupyter. We'll also show you what the Anaconda Prompt is and how you can use it to install new packages, which you can then import in your notebooks.


Dataset description

Our example focuses on consumer loans. Since there are more than 100 potential features, we've devoted a complete section to explaining why some features are chosen over others.


General preprocessing

Every raw dataset has its drawbacks. While most preprocessing is model-specific, some steps (such as missing-value imputation) can be generalized across models.
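
As an illustration of generic missing-value imputation, here is a minimal pandas sketch. The column names and values are hypothetical and not taken from the course dataset, and the choice of mean imputation versus a flag-and-fill approach is an assumption made for the example.

```python
import pandas as pd

# Hypothetical toy loan data; the column names are illustrative assumptions.
loans = pd.DataFrame({
    "annual_income": [52000.0, None, 61000.0, 48000.0],
    "months_since_last_delinquency": [12.0, None, None, 30.0],
})

# Numeric column: fill gaps with the column mean.
loans["annual_income"] = loans["annual_income"].fillna(loans["annual_income"].mean())

# Column where "missing" may carry meaning (no delinquency on record):
# keep that information as a flag, then fill the gap with zero.
loans["no_delinquency_record"] = loans["months_since_last_delinquency"].isna().astype(int)
loans["months_since_last_delinquency"] = loans["months_since_last_delinquency"].fillna(0)

print(loans)
```

Which strategy suits a given column depends on why the values are missing, so a sketch like this is a starting point rather than a rule.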


PD model: data preparation

Once we have completed all general preprocessing, we dive into model-specific preprocessing. To prepare the data for the probability of default model, we employ fine classing, coarse classing, weight of evidence (WoE) and the information value (IV) criterion. By convention, we turn all variables into dummy indicators prior to modeling.
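
To make weight of evidence and information value concrete, the sketch below computes both for a single categorical feature. It assumes the common definitions WoE = ln(share of goods / share of bads) per category and IV = sum of (share of goods - share of bads) x WoE. The data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical feature and a good/bad flag
# (1 = good borrower, 0 = defaulted). Names are illustrative assumptions.
df = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "good":  [1,   1,   1,   0,   1,   1,   0,   0,   1,   0,   0,   0],
})

grouped = df.groupby("grade")["good"].agg(n_obs="count", n_good="sum")
grouped["n_bad"] = grouped["n_obs"] - grouped["n_good"]

# Share of all goods and of all bads that fall into each category.
grouped["pct_good"] = grouped["n_good"] / grouped["n_good"].sum()
grouped["pct_bad"] = grouped["n_bad"] / grouped["n_bad"].sum()

# Weight of evidence per category and information value for the feature.
grouped["woe"] = np.log(grouped["pct_good"] / grouped["pct_bad"])
information_value = ((grouped["pct_good"] - grouped["pct_bad"]) * grouped["woe"]).sum()

print(grouped[["n_good", "n_bad", "woe"]])
print("IV:", round(information_value, 4))
```

Categories with similar WoE are natural candidates for merging during coarse classing. A category with no goods or no bads would need special handling, since its WoE is undefined.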


PD model validation (test)

Since any model tends to overfit the training data, it is crucial to test the results on out-of-sample observations. To do so, we compute the model's accuracy, its area under the curve (AUC), the Gini coefficient and the Kolmogorov-Smirnov statistic.
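
The four validation measures can be sketched with NumPy alone. The outcomes and predicted probabilities below are made up, the 0.5 cut-off is an assumption, and a library such as scikit-learn could be used instead of the hand-written AUC.

```python
import numpy as np

# Hypothetical out-of-sample results: actual outcome (1 = good, 0 = default)
# and the model's predicted probability of being good.
y_true = np.array([1, 1, 1, 1, 0, 1, 0, 0])
p_good = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.20])

# Accuracy at an assumed 0.5 cut-off.
accuracy = np.mean((p_good >= 0.5).astype(int) == y_true)

# AUC as the probability that a random good is scored above a random bad
# (ties count one half).
goods = p_good[y_true == 1]
bads = p_good[y_true == 0]
diff = goods[:, None] - bads[None, :]
auc = np.mean((diff > 0) + 0.5 * (diff == 0))

# The Gini coefficient follows directly from AUC.
gini = 2 * auc - 1

# Kolmogorov-Smirnov statistic: the largest gap between the cumulative
# score distributions of bads and goods.
thresholds = np.sort(p_good)
cdf_good = np.array([np.mean(goods <= t) for t in thresholds])
cdf_bad = np.array([np.mean(bads <= t) for t in thresholds])
ks = np.max(np.abs(cdf_bad - cdf_good))

print(accuracy, auc, gini, ks)
```

Accuracy depends on the chosen cut-off, while AUC, Gini and KS summarize how well the scores separate goods from bads across all cut-offs.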


Applying the PD model for decision making

In practice, banks don't really want a complicated Python-implemented model. Instead, they prefer a simple scorecard that contains only yes-or-no questions and can be used by any bank employee. In this section, we learn how to create one.
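
One common way to build such a scorecard is to rescale the coefficients of a model fitted on dummy variables linearly so that the worst possible applicant gets the minimum score and the best possible applicant gets the maximum. The sketch below assumes that approach. The coefficients, the 300 to 850 range and the `score` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical coefficients of a PD model fitted on dummy variables.
# Reference categories have a coefficient of 0; all values are made up.
coefs = pd.DataFrame({
    "feature":  ["intercept", "grade", "grade", "grade", "home", "home"],
    "category": ["intercept", "A", "B", "C", "own", "rent"],
    "coef":     [-1.0, 1.4, 0.6, 0.0, 0.8, 0.0],
})

min_score, max_score = 300, 850  # assumed target score range

# Lowest and highest possible sum of coefficients: the worst and best
# category of every feature (the intercept is its own single "category").
min_sum = coefs.groupby("feature")["coef"].min().sum()
max_sum = coefs.groupby("feature")["coef"].max().sum()

# Rescale each coefficient linearly so the totals span the score range.
factor = (max_score - min_score) / (max_sum - min_sum)
coefs["points"] = coefs["coef"] * factor
is_intercept = coefs["feature"] == "intercept"
coefs.loc[is_intercept, "points"] = (
    (coefs.loc[is_intercept, "coef"] - min_sum) * factor + min_score
)
coefs["points"] = coefs["points"].round().astype(int)

def score(grade: str, home: str) -> int:
    """Sum the base points and the points of every category that applies."""
    picked = coefs[is_intercept | coefs["category"].isin([grade, home])]
    return int(picked["points"].sum())

print(coefs[["feature", "category", "points"]])
print(score("B", "rent"))
```

Each row of the resulting table is one yes-or-no question: if the category applies, the applicant earns its points.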


PD model monitoring

Model estimation is extremely important, but model maintenance is an often-neglected step. A common approach is to monitor the stability of the population over time using the population stability index (PSI) and to revisit the model if needed.
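
A minimal sketch of the PSI calculation, assuming the usual definition PSI = sum over bands of (actual share - expected share) x ln(actual share / expected share). The band shares and the function name `population_stability_index` are hypothetical.

```python
import numpy as np

# Hypothetical share of borrowers per score band: at model build time
# ("expected") and in a more recent sample ("actual"). Each sums to 1.
expected = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
actual = np.array([0.15, 0.25, 0.35, 0.15, 0.10])

def population_stability_index(expected, actual, eps=1e-6):
    """PSI = sum((actual - expected) * ln(actual / expected)) over bands."""
    # Clip to avoid division by zero or log(0) for empty bands.
    e = np.clip(expected, eps, None)
    a = np.clip(actual, eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

psi = population_stability_index(expected, actual)
print(round(psi, 4))
```

A frequently cited rule of thumb treats a PSI below 0.1 as little change, 0.1 to 0.25 as a moderate shift and above 0.25 as a large shift, though the thresholds a given bank applies may differ.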


LGD and EAD models

To calculate the final expected loss, we need three ingredients: probability of default (PD), loss given default (LGD) and exposure at default (EAD). In this section, we preprocess the data so that we can estimate the LGD and EAD models.


EAD model

Exposure at default (EAD) modeling is very similar to LGD modeling. In this section, we use linear regression to estimate EAD.
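
As a rough sketch of a linear-regression EAD model, the example below fits ordinary least squares with NumPy. It assumes the regression target is the share of the funded amount still outstanding at default, which is then multiplied by the funded amount to get EAD. The features, figures and the `predict_ead` helper are hypothetical.

```python
import numpy as np

# Hypothetical defaulted loans. The target is assumed to be the share of
# the funded amount still outstanding at default; all numbers are made up.
interest_rate = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
term_months = np.array([36.0, 36.0, 60.0, 60.0, 36.0])
share_outstanding = np.array([0.55, 0.60, 0.78, 0.83, 0.75])

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(interest_rate), interest_rate, term_months])
beta, *_ = np.linalg.lstsq(X, share_outstanding, rcond=None)

def predict_ead(funded_amount, rate, term):
    """EAD = funded amount x predicted outstanding share, clipped to [0, 1]."""
    share = float(np.clip(beta @ np.array([1.0, rate, term]), 0.0, 1.0))
    return funded_amount * share

ead = predict_ead(10_000, 11.0, 36.0)
print(beta, ead)
```

Clipping the predicted share to the range 0 to 1 keeps the estimate from exceeding the funded amount, which an unconstrained linear model could otherwise do.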


Calculating expected loss

After having calculated PD, LGD, and EAD, we reach the final step: computing expected loss (EL). This is the number of greatest interest to C-level executives and the finale of the credit risk modeling process.
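
The final calculation is the product EL = PD x LGD x EAD for each loan, summed over the portfolio. A short worked example with made-up figures and a hypothetical `expected_loss` helper:

```python
# Hypothetical per-loan model outputs; all figures are made up.
loans = [
    {"pd": 0.02, "lgd": 0.40, "ead": 10_000.0},
    {"pd": 0.10, "lgd": 0.50, "ead": 5_000.0},
    {"pd": 0.25, "lgd": 0.60, "ead": 2_000.0},
]

def expected_loss(pd_, lgd, ead):
    """EL = PD x LGD x EAD for a single loan."""
    return pd_ * lgd * ead

# Expected loss per loan, for the whole portfolio, and as a share of exposure.
loan_el = [expected_loss(l["pd"], l["lgd"], l["ead"]) for l in loans]
portfolio_el = sum(loan_el)
portfolio_el_ratio = portfolio_el / sum(l["ead"] for l in loans)

print(loan_el, portfolio_el, round(portfolio_el_ratio, 4))
```

Here the three loans contribute roughly 80, 250 and 300 in expected loss, about 630 in total on an exposure of 17,000.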