# Credit Risk Modeling in Python

Credit risk modeling is where data science and fintech meet. It is one of the most important activities conducted in a bank, and the one that has received the most attention since the Great Recession. This course is the only comprehensive credit risk modeling course in Python available right now. It shows the complete credit risk modeling picture: from preprocessing, through probability of default (PD), loss given default (LGD), and exposure at default (EAD) modeling, to finally calculating expected loss (EL).

## Introduction

We start by explaining why credit risk is important for financial institutions. We also define foundational terms such as expected loss, probability of default, loss given default, and exposure at default.

- What is credit risk and why is it important?
- Expected loss (EL) and its components: PD, LGD, and EAD
- Capital adequacy, regulations, and the Basel II accord
- Basel II approaches: SA, F-IRB, and A-IRB
- Different facility types (asset classes) and credit risk modeling approaches

## Dataset description

Our example focuses on consumer loans. Since there are more than 100 potential features, we've devoted a complete section to explaining why some features are chosen over others.

- Our example: consumer loans
- A first look at the dataset
- Dependent variables and independent variables

## General preprocessing

Every raw dataset has its drawbacks. While most preprocessing is model-specific, in some cases (like missing-value imputation) we can generalize the data preparation.

- Importing the data into Python
- Preprocessing a few continuous variables
- Preprocessing a few discrete variables
- Checking for missing values and cleaning
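
The general preprocessing steps above can be sketched with pandas. The column names (`annual_inc`, `emp_length`) mirror typical consumer-loan fields but are assumptions for illustration, not the course's exact dataset:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw loan data (column names are hypothetical).
loan_data = pd.DataFrame({
    'annual_inc': [45000.0, np.nan, 62000.0, np.nan],
    'emp_length': ['10+ years', '< 1 year', None, '3 years'],
})

# Continuous variable: impute missing income with the column mean.
loan_data['annual_inc'] = loan_data['annual_inc'].fillna(loan_data['annual_inc'].mean())

# Discrete variable: strip the text so only the number of years remains,
# treating '< 1 year' as 0 and '10+ years' as 10; missing becomes 0.
loan_data['emp_length_int'] = (
    loan_data['emp_length']
    .str.replace('+ years', '', regex=False)
    .str.replace('< 1 year', '0', regex=False)
    .str.replace(' years', '', regex=False)
    .astype(float)
    .fillna(0)
)
```

The same pattern (string cleanup, type conversion, imputation) applies to any of the other raw columns.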

## PD model: data preparation

Once we have completed all general preprocessing, we dive into model-specific preprocessing. We employ fine classing, coarse classing, weight of evidence, and the information value criterion to prepare the data for the probability of default model. Conventionally, we should turn all variables into dummy indicators prior to modeling.

- What is the PD model going to look like?
- Dependent variable: good/bad (default) definition
- Fine classing, weight of evidence, coarse classing, information value
- Data preparation: splitting the data
- Data preparation: preprocessing discrete variables: automating calculations
- Data preparation: preprocessing discrete variables: visualizing results
- Data preparation: preprocessing discrete variables: creating dummies
- Data preparation: preprocessing continuous variables: automating calculations
- Data preparation: preprocessing continuous variables: creating dummies
- Data preparation: preprocessing the test dataset
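
Weight of evidence and information value can be computed per category as follows. This is a minimal sketch on toy data (the `grade` categories and good/bad flags are made up), assuming the usual definitions WoE = ln(%good / %bad) and IV = Σ (%good − %bad) × WoE:

```python
import numpy as np
import pandas as pd

# Toy data: one discrete feature and a good/bad flag (1 = good, 0 = default).
df = pd.DataFrame({
    'grade': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'good':  [1, 1, 1, 0,  1, 1, 0, 0,  1, 0, 0, 0],
})

# Counts of goods and bads per category.
grp = df.groupby('grade')['good'].agg(n='count', n_good='sum')
grp['n_bad'] = grp['n'] - grp['n_good']

# Shares of all goods / all bads falling in each category.
grp['prop_good'] = grp['n_good'] / grp['n_good'].sum()
grp['prop_bad'] = grp['n_bad'] / grp['n_bad'].sum()

# Weight of evidence per category: ln(%good / %bad).
grp['WoE'] = np.log(grp['prop_good'] / grp['prop_bad'])

# Information value of the whole variable: sum of (%good - %bad) * WoE.
iv = ((grp['prop_good'] - grp['prop_bad']) * grp['WoE']).sum()
```

Categories with similar WoE are then merged during coarse classing, and variables with negligible IV are dropped.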

## PD model estimation

Having set up all variables as dummies, we estimate the probability of default. The most intuitive and widely accepted approach is to employ a logistic regression.

- The PD model: logistic regression with dummy variables
- Loading the data and selecting the features
- PD model estimation
- Building a logistic regression model with p-values
- Interpreting the coefficients of the PD model
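
A minimal sketch of the estimation step, on simulated dummy data (the design matrix and coefficients are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical dummy design matrix: each column is a 0/1 indicator
# (e.g. grade_A, home_owner); the reference category is the dropped dummy.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(500, 2))

# Simulated good/bad flag whose odds of "good" depend on the dummies.
logits = -1.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1]
y = rng.random(500) < 1 / (1 + np.exp(-logits))

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of being a good borrower for one applicant.
pd_good = model.predict_proba([[1, 0]])[0, 1]
```

Note that scikit-learn does not report p-values out of the box; obtaining them requires either `statsmodels`' `Logit` or a custom wrapper around `LogisticRegression`, which is why the course devotes a lesson to building one.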

## PD model validation (test)

Since every model overfits its training data to some degree, it is crucial to test the results on out-of-sample observations. Accordingly, we compute the model's accuracy, area under the curve (AUC), Gini coefficient, and Kolmogorov-Smirnov statistic.

- Out-of-sample validation (test)
- Evaluation of model performance: accuracy and area under the curve (AUC)
- Evaluation of model performance: Gini and Kolmogorov-Smirnov
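
The three ranking metrics can be computed in a few lines. The labels and predicted probabilities below are made-up test data; the Gini coefficient is just a linear rescaling of AUC, and KS is the maximum distance between the score distributions of goods and bads:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

# Hypothetical out-of-sample labels (1 = good) and predicted probabilities.
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.8, 0.75, 0.3, 0.6, 0.65, 0.2, 0.85, 0.35, 0.7])

auc = roc_auc_score(y_true, y_prob)

# Gini coefficient: 2 * AUC - 1.
gini = 2 * auc - 1

# Kolmogorov-Smirnov: max distance between the empirical CDFs of the
# predicted scores of goods and bads.
ks = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic
```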

## Applying the PD model for decision making

In practice, banks don't really want a complicated Python-implemented model. Instead, they prefer a simple scorecard, containing only yes/no questions, that any bank employee can apply. In this section we learn how to create one.

- Calculating the probability of default for a single customer
- Creating a scorecard
- Calculating a credit score
- From credit score to PD
- Setting cut-offs
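 
The mapping between PD and a scorecard score can be sketched as below. The scaling constants (600 points at 50:1 odds, 20 points to double the odds) are a common industry convention used here as an assumption, not the course's exact numbers:

```python
import numpy as np

# Assumed "points to double the odds" scaling: 600 points correspond to
# good:bad odds of 50:1, and every 20 points doubles the odds.
pdo, base_score, base_odds = 20, 600, 50

factor = pdo / np.log(2)
offset = base_score - factor * np.log(base_odds)

def credit_score(pd_default):
    """Map a probability of default to a scorecard score."""
    odds_good = (1 - pd_default) / pd_default
    return offset + factor * np.log(odds_good)

def pd_from_score(score):
    """Invert the mapping: recover the PD implied by a score."""
    odds_good = np.exp((score - offset) / factor)
    return 1 / (1 + odds_good)
```

A cut-off is then just a score threshold: applicants scoring below it are rejected, and moving the threshold trades approval rate against expected default rate.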

## PD model monitoring

Model estimation is extremely important, but an often-neglected step is model maintenance. A common approach is to monitor the population stability over time using the population stability index (PSI) and revisit our model if needed.

- PD model monitoring via assessing population stability
- Population stability index: preprocessing
- Population stability index: calculation and interpretation
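
The PSI calculation itself is short. The bucket shares below are invented for illustration; the interpretation thresholds in the comment are the usual rules of thumb:

```python
import numpy as np

# Share of applicants per bucket at model development time ("expected")
# versus today ("observed"). Values here are hypothetical.
expected = np.array([0.40, 0.35, 0.25])
observed = np.array([0.30, 0.40, 0.30])

# Population stability index: sum of (observed - expected) * ln(observed / expected).
psi = np.sum((observed - expected) * np.log(observed / expected))

# Rules of thumb: PSI < 0.1 stable; 0.1-0.25 some shift, investigate;
# > 0.25 significant shift, consider re-estimating the model.
```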

## LGD and EAD models: data preparation

To calculate the final expected loss, we need three ingredients: probability of default (PD), loss given default (LGD), and exposure at default (EAD). In this section we preprocess our data to be able to estimate the LGD and EAD models.

- LGD and EAD models: independent variables
- LGD and EAD models: dependent variables
- LGD and EAD models: distribution of recovery rates and credit conversion factors
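
The two dependent variables can be derived from the defaulted accounts roughly as follows. The column names (`funded_amnt`, `recoveries`, `total_rec_prncp`) are typical consumer-loan fields used here as assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical defaulted accounts with typical consumer-loan columns.
defaults = pd.DataFrame({
    'funded_amnt':     [10000.0, 8000.0, 5000.0],
    'recoveries':      [ 2500.0,    0.0, 6000.0],
    'total_rec_prncp': [ 4000.0, 7000.0,  500.0],
})

# Recovery rate (the complement of LGD), capped to the [0, 1] interval.
defaults['recovery_rate'] = (defaults['recoveries']
                             / defaults['funded_amnt']).clip(0, 1)

# Credit conversion factor: share of the funded amount still outstanding
# at the moment of default.
defaults['CCF'] = ((defaults['funded_amnt'] - defaults['total_rec_prncp'])
                   / defaults['funded_amnt'])
```

Plotting the distributions of `recovery_rate` and `CCF` then motivates the modeling choices in the next sections.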

## LGD model

LGD models are often estimated using a beta regression. To keep the modeling simpler, we employ a two-step regression model that approximates a beta regression: we combine the predictions from a logistic regression with those from a linear regression to estimate the loss given default.

- LGD model: preparing the inputs
- LGD model: testing the model
- LGD model: estimating the accuracy of the model
- LGD model: saving the model
- LGD model: stage 2 - linear regression
- LGD model: stage 2 - linear regression evaluation
- LGD model: combining stage 1 and stage 2
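
The two-stage idea can be sketched on simulated data (the features and targets below are invented): stage 1 predicts whether any amount is recovered at all, stage 2 predicts how much is recovered given that something is, and the product of the two gives the expected recovery rate:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))  # hypothetical features of defaulted accounts

# Stage 1 target: was anything recovered at all (recovery rate > 0)?
any_recovery = (X[:, 0] + rng.normal(size=300)) > 0

# Stage 2 target: the recovery rate, meaningful where something was recovered.
rr = np.clip(0.4 + 0.2 * X[:, 1] + 0.1 * rng.normal(size=300), 0, 1)

stage1 = LogisticRegression().fit(X, any_recovery)
stage2 = LinearRegression().fit(X[any_recovery], rr[any_recovery])

# Combined recovery rate = P(recovery > 0) * E[recovery rate | recovery > 0];
# LGD is its complement.
recovery_hat = np.clip(stage1.predict_proba(X)[:, 1] * stage2.predict(X), 0, 1)
lgd_hat = 1 - recovery_hat
```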

## EAD model

Exposure at default (EAD) modeling is very similar to LGD modeling. In this section we take advantage of a linear regression to calculate EAD.

- EAD model estimation and interpretation
- EAD model validation
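
A minimal sketch on simulated data: a linear regression predicts the credit conversion factor, which is then multiplied by the funded amount to obtain EAD (all inputs below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))  # hypothetical account features

# Simulated credit conversion factor target, kept inside [0, 1].
ccf = np.clip(0.7 + 0.1 * X[:, 0] + 0.05 * rng.normal(size=200), 0, 1)

reg = LinearRegression().fit(X, ccf)

# EAD = predicted CCF (clipped back to [0, 1]) times the funded amount.
funded_amnt = np.full(200, 10000.0)
ead_hat = np.clip(reg.predict(X), 0, 1) * funded_amnt
```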

## Calculating expected loss

Having calculated PD, LGD, and EAD, we reach the final step: computing expected loss (EL). This is also the number of most interest to C-level executives, and it is the finale of the credit risk modeling process.

- Calculating expected loss
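
The final step is a single product per account, EL = PD × LGD × EAD, summed over the portfolio. The figures below are invented for illustration:

```python
import numpy as np

# Hypothetical per-account model outputs.
pd_ = np.array([0.02, 0.10, 0.05])        # probability of default
lgd = np.array([0.45, 0.60, 0.50])        # loss given default
ead = np.array([10000., 5000., 20000.])   # exposure at default

el = pd_ * lgd * ead        # expected loss per account
portfolio_el = el.sum()     # portfolio-level expected loss
```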