Credit Risk Modeling in Python

Credit risk modeling is where data science and fintech meet. It is one of the most important activities conducted in a bank, and the one that has drawn the most attention since the Great Recession. This course is the only comprehensive credit risk modeling course in Python available right now. It shows the complete credit risk modeling picture: from preprocessing, through probability of default (PD), loss given default (LGD), and exposure at default (EAD) modeling, to calculating expected loss (EL).

Sign up to preview the course for FREE!

Create a free account and start learning data science today.

Our graduates work at exciting places: Walmart, Tesla, PayPal, Citibank, Booking.com.

Section 1

Introduction

We start by explaining why credit risk is important for financial institutions. We also define the ground-zero terms: expected loss (EL), probability of default (PD), loss given default (LGD), and exposure at default (EAD). A quick numeric illustration follows the lesson list below.

What is credit risk and why is it important?
Expected loss (EL) and its components: PD, LGD and EAD
Capital adequacy, regulations, and the Basel II accord
Basel II approaches: SA, F-IRB, and A-IRB
Different facility types (asset classes) and credit risk modeling approaches
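
As a quick illustration of how these components combine (a minimal sketch with made-up figures, not numbers from the course dataset):

```python
# Expected loss is the product of its three components: EL = PD * LGD * EAD.
pd_ = 0.02      # probability of default: 2% chance the borrower defaults
lgd = 0.40      # loss given default: 40% of the exposure is lost if default occurs
ead = 100_000   # exposure at default: $100,000 outstanding at the moment of default

expected_loss = pd_ * lgd * ead
print(f"Expected loss: ${expected_loss:,.0f}")  # Expected loss: $800
```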

Section 2

Dataset description

Our example focuses on consumer loans. Since there are more than 100 potential features, we've devoted a complete section to explaining why some features are chosen over others.

Our example: consumer loans. A first look at the dataset
Dependent variables and independent variables

Section 3

General preprocessing

Every raw dataset has its drawbacks. While most preprocessing is model-specific, some steps, such as missing-value imputation, can be generalized across models. A short pandas sketch follows the lesson list below.

Importing the data into Python
Preprocessing a few continuous variables
Preprocessing a few discrete variables
Checking for missing values and cleaning
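
A minimal pandas sketch of this kind of general cleaning. The file name and the columns (`emp_length`, `annual_inc`) are placeholder assumptions, not necessarily the course's exact dataset:

```python
import pandas as pd

# Load the raw loan data (hypothetical file name).
loan_data = pd.read_csv('loan_data.csv')

# A continuous variable stored as text: pull out the numeric part,
# e.g. '10+ years' -> 10.0.
loan_data['emp_length_int'] = (
    loan_data['emp_length'].str.extract(r'(\d+)', expand=False).astype(float)
)

# Inspect missing values across all columns.
print(loan_data.isnull().sum().sort_values(ascending=False).head(10))

# Generalized imputation: mean for income, zero for employment length.
loan_data['annual_inc'] = loan_data['annual_inc'].fillna(loan_data['annual_inc'].mean())
loan_data['emp_length_int'] = loan_data['emp_length_int'].fillna(0)
```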

Section 4

PD model: data preparation

Once all general preprocessing is complete, we dive into model-specific preprocessing. We employ fine classing, coarse classing, weight of evidence (WoE), and the information value (IV) criterion to prepare the inputs for the probability of default model. Conventionally, all variables are turned into dummy indicators prior to modeling. A condensed WoE/IV sketch follows the lesson list below.

What is the PD model going to look like?
Dependent variable: good/bad (default) definition
Fine classing, weight of evidence, coarse classing, information value
Data preparation. Splitting data
Data preparation. Preprocessing discrete variables: automating calculations
Data preparation. Preprocessing discrete variables: visualizing results
Data preparation. Preprocessing discrete variables: creating dummies
Data preparation. Preprocessing continuous variables: automating calculations
Data preparation. Preprocessing continuous variables: creating dummies
Data preparation. Preprocessing the test dataset
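
A condensed sketch of the weight-of-evidence and information-value calculation for one discrete variable. The column names (`grade`, and `good_bad` with 1 = good, 0 = default) are assumptions:

```python
import numpy as np

def woe_iv_discrete(df, feature, target='good_bad'):
    """Weight of evidence and information value for one discrete feature.
    Assumes the target is 1 for good borrowers and 0 for defaults."""
    grouped = df.groupby(feature)[target].agg(['count', 'mean'])
    grouped['n_good'] = grouped['count'] * grouped['mean']
    grouped['n_bad'] = grouped['count'] * (1 - grouped['mean'])
    grouped['prop_good'] = grouped['n_good'] / grouped['n_good'].sum()
    grouped['prop_bad'] = grouped['n_bad'] / grouped['n_bad'].sum()
    # WoE is undefined for bins with zero goods or zero bads - in practice
    # such bins are merged during coarse classing.
    grouped['woe'] = np.log(grouped['prop_good'] / grouped['prop_bad'])
    grouped['iv'] = (grouped['prop_good'] - grouped['prop_bad']) * grouped['woe']
    return grouped[['woe']], grouped['iv'].sum()

# woe_table, iv = woe_iv_discrete(loan_data, 'grade')
# Common IV rules of thumb: < 0.02 not predictive, 0.1-0.3 medium,
# > 0.5 suspiciously strong (check for leakage).
```

Categories with similar WoE are then merged (coarse classing), and each resulting bin becomes a dummy variable, e.g. via `pd.get_dummies`.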

Section 5

PD model estimation

Having set up all variables as dummies, we estimate the probability of default. The most intuitive and widely accepted approach is to employ a logistic regression, sketched after the lesson list below.

The PD model. Logistic regression with dummy variables
Loading the data and selecting the features
PD model estimation
Building a logistic regression model with p-values
Interpreting the coefficients in the PD model
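
A minimal estimation sketch. The course may organize this differently; statsmodels is used here because it reports p-values out of the box. `X_train` (the dummy variables) and `y_train` (the good/bad flag) are assumed to come from the previous section:

```python
import statsmodels.api as sm

# Add an intercept column, then fit a logistic regression.
X_train_const = sm.add_constant(X_train)
pd_model = sm.Logit(y_train, X_train_const).fit()

# Coefficients, p-values, and the full summary table.
print(pd_model.params.head())
print(pd_model.pvalues.head())
print(pd_model.summary())

# Interpretation: exp(coefficient) is the odds multiplier of being 'good'
# for a dummy category relative to its omitted reference category.
```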

Section 6

PD model validation (test)

Since every model fits its training data more closely than unseen data, it is crucial to test the results on out-of-sample observations. Consequently, we measure the model's accuracy, area under the curve (AUC), Gini coefficient, and Kolmogorov-Smirnov statistic; a sketch of all four metrics follows the lesson list below.

Out-of-sample validation (test)
Evaluation of model performance: accuracy and area under the curve (AUC)
Evaluation of model performance: Gini and Kolmogorov-Smirnov
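
A sketch of the four metrics, continuing from the estimation sketch above (`pd_model`, `X_test`, and `y_test` are assumed):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# Predicted probabilities of being 'good' on the held-out test set.
y_prob = pd_model.predict(sm.add_constant(X_test))

# Accuracy at a 0.5 threshold (thresholds are revisited when setting cut-offs).
accuracy = accuracy_score(y_test, y_prob > 0.5)

# AUC, and the Gini coefficient derived from it.
auc = roc_auc_score(y_test, y_prob)
gini = 2 * auc - 1

# Kolmogorov-Smirnov statistic: the maximum vertical distance between the
# cumulative distributions of goods and bads, read off the ROC curve.
fpr, tpr, _ = roc_curve(y_test, y_prob)
ks = np.max(tpr - fpr)

print(f"accuracy={accuracy:.3f}, AUC={auc:.3f}, Gini={gini:.3f}, KS={ks:.3f}")
```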

Section 7

Applying the PD model for decision making

In practice, banks don't really want a complicated model implemented in Python. Instead, they prefer a simple scorecard, containing only yes/no questions, that any bank employee can apply. In this section, we learn how to create one. A sketch of the score scaling follows the lesson list below.

Calculating probability of default for a single customer
Creating a scorecard
Calculating credit score
From credit score to PD
Setting cut-offs
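
A sketch of the usual scaling from logistic-regression coefficients to a points-based score. The 300-850 range is the familiar FICO-style convention; the min/max coefficient sums are illustrative placeholders, not values from the course:

```python
import numpy as np

min_score, max_score = 300, 850

# Lowest and highest achievable sums of model coefficients across all dummy
# combinations (intercept included) - illustrative numbers only.
min_sum_coef, max_sum_coef = -2.5, 4.0

def credit_score(sum_of_coefs):
    """Linearly rescale an applicant's coefficient sum onto the score range."""
    return (min_score + (sum_of_coefs - min_sum_coef)
            * (max_score - min_score) / (max_sum_coef - min_sum_coef))

def score_to_pd(score):
    """Invert the scaling back to a probability of default."""
    sum_of_coefs = (min_sum_coef + (score - min_score)
                    * (max_sum_coef - min_sum_coef) / (max_score - min_score))
    prob_good = 1 / (1 + np.exp(-sum_of_coefs))  # logistic function
    return 1 - prob_good                         # PD = 1 - P(good)

print(credit_score(1.2))               # ~613 points
print(score_to_pd(credit_score(1.2)))  # ~0.23 probability of default
```

A cut-off is then just a score threshold: applicants above it are approved, and moving the threshold trades approval rate against expected default rate.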

Section 8

PD model monitoring

Model estimation is extremely important, but model maintenance is an often-neglected step. A common approach is to monitor population stability over time using the population stability index (PSI) and to revisit the model if needed; a PSI sketch follows the lesson list below.

PD model monitoring via assessing population stability
Population stability index: preprocessing
Population stability index: calculation and interpretation
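
A sketch of the PSI calculation between a baseline population (e.g. training-time scores) and a newer one (e.g. recent applicants); decile binning is one common choice:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between a baseline ('expected') and a new
    population ('actual') of scores or model inputs."""
    # Bin edges taken from the baseline distribution (deciles here).
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values

    expected_pct = np.histogram(expected, edges)[0] / len(expected)
    actual_pct = np.histogram(actual, edges)[0] / len(actual)

    # A small floor avoids division by zero and log(0) in empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 investigate, > 0.25 re-develop.
```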

Section 9

LGD and EAD models

To calculate the final expected loss, we need three ingredients: probability of default (PD), loss given default (LGD), and exposure at default (EAD). In this section, we preprocess the data so that the LGD and EAD models can be estimated; both dependent variables are sketched after the lesson list below.

LGD and EAD models: independent variables
LGD and EAD models: dependent variables
LGD and EAD models: distribution of recovery rates and credit conversion factors
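
A sketch of the two dependent variables. The column names (`recoveries`, `funded_amnt`, `total_rec_prncp`) follow LendingClub-style naming and are assumptions here, as is the `good_bad` flag from earlier:

```python
# LGD and EAD models are trained on defaulted loans only.
defaults = loan_data[loan_data['good_bad'] == 0].copy()

# Recovery rate: share of the funded amount recovered after default.
defaults['recovery_rate'] = (defaults['recoveries']
                             / defaults['funded_amnt']).clip(0, 1)

# Credit conversion factor: share of the funded amount still outstanding
# at default (funded amount minus repaid principal, over funded amount).
defaults['ccf'] = ((defaults['funded_amnt'] - defaults['total_rec_prncp'])
                   / defaults['funded_amnt'])

# LGD = 1 - recovery rate; EAD = CCF * funded amount.
```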

Section 10

LGD model

LGD models are often estimated using beta regression. To keep the modeling simpler, we employ a two-step model that approximates a beta regression: we combine the predictions of a logistic regression (is anything recovered at all?) with those of a linear regression (how much is recovered?) to estimate loss given default. The combination is sketched after the lesson list below.

LGD model: preparing the inputs
LGD model: testing the model
LGD model: estimating the accuracy of the model
LGD model: saving the model
LGD model: stage 2 - linear regression
LGD model: stage 2 - linear regression evaluation
LGD model: combining stage 1 and stage 2
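
A sketch of the two-stage combination; `X_lgd` is an assumed feature matrix aligned with the `defaults` frame from the previous section:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Stage 1: is anything recovered at all? (binary target)
recovered = (defaults['recovery_rate'] > 0).astype(int)
stage1 = LogisticRegression(max_iter=1000).fit(X_lgd, recovered)

# Stage 2: among loans with a positive recovery, how much? (linear target)
mask = defaults['recovery_rate'] > 0
stage2 = LinearRegression().fit(X_lgd[mask], defaults.loc[mask, 'recovery_rate'])

# Combined: stage-1 prediction (a 0/1 label here; the predicted probability
# is a common variant) times the stage-2 recovery rate, clipped to [0, 1].
recovery_pred = np.clip(stage1.predict(X_lgd) * stage2.predict(X_lgd), 0, 1)

# Loss given default is the complement of the recovery rate.
lgd_pred = 1 - recovery_pred
```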

Section 11

EAD model

Exposure at default (EAD) modeling is very similar to LGD modeling. In this section, we take advantage of a linear regression to calculate EAD; a brief sketch follows the lesson list below.

EAD model estimation and interpretation
EAD model validation
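
A brief sketch, mirroring the stage-2 regression above; `X_ead` is an assumed feature matrix aligned with `defaults`:

```python
from sklearn.linear_model import LinearRegression

# Regress the credit conversion factor on the features.
ead_model = LinearRegression().fit(X_ead, defaults['ccf'])

# Predicted CCF, kept within [0, 1] and scaled back to a money amount.
ccf_pred = ead_model.predict(X_ead).clip(0, 1)
defaults['ead_pred'] = ccf_pred * defaults['funded_amnt']
```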

Section 12

Calculating expected loss

Having calculated PD, LGD, and EAD, we reach the final step: computing expected loss (EL). This is the number of most interest to C-level executives and the finale of the credit risk modeling process; a closing sketch follows the lesson below.

Calculating expected loss
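
The closing sketch: with per-loan predictions from the three models in one frame (the `loans` frame and its `pd_pred`, `lgd_pred`, and `ead_pred` columns are assumptions about upstream naming), expected loss is a simple product:

```python
# Expected loss per loan: EL = PD * LGD * EAD.
loans['expected_loss'] = loans['pd_pred'] * loans['lgd_pred'] * loans['ead_pred']

# Portfolio-level expected loss, also as a share of total exposure.
total_el = loans['expected_loss'].sum()
el_ratio = total_el / loans['funded_amnt'].sum()
print(f"Portfolio EL: ${total_el:,.0f} ({el_ratio:.2%} of total exposure)")
```
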
Module 4

Advanced Specialization

This course is part of Module 4 of the 365 Data Science Program. The complete training consists of four modules, each building on the knowledge from the previous one. Module 4 focuses on developing a specialized, industry-relevant skillset, and students are encouraged to complete Modules 1, 2, and 3 before starting this part of the training. Here you will learn how to perform Credit Risk Modeling for banks, Customer Analytics for retail and other commercial companies, and Time Series Analysis for finance and stock data.


Trusted by 276,000 students

Ready to start?
Sign up today for FREE!

Whether you want to advance your career or transition into a new field, data science is the number one skillset employers look for. Grow your analytics expertise and get hired as a data scientist!