  1. Model Risk – sources and some examples. Tony Bellotti, Department of Mathematics, Imperial College London.

  2. Model development A highly simplified model development framework (diagram): Model development → Model → Use. In this framework, once the model is developed, we then think of it as correct. However, the model is only an approximation to reality.

  3. Thinking about model risk Do you factor in the uncertainty of your model when you use it? (Diagram: Model development → Model → Assessment → Use, with a step to measure model risk feeding the assessment.) • Firstly, we need to understand the sources of model risk and how to measure those risks. • Secondly, the consequences of using the model need to be assessed in light of the model risks, prior to use.

  4. Does model risk matter? But… does model risk really matter? Does it make a substantial difference in the real world? “The reliance on models to handle risk carries its own risk” * • In securities markets, where complex pricing models are used, there is such a thing as model arbitrage, where a trader will take advantage of known errors in model structure or implementation to make money. So there is a genuine effect. • If this happens in retail credit, perhaps it could lead to adverse selection (eg pricing a loan below the true risk level of the borrower). * Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative Strategies Research Notes

  5. What about model risk in retail credit? But retail credit employs relatively simple models, so perhaps there is no problem?… But model complexity is not the only source of model risk (although it is an important one for pricing models). In the following slides I will consider several possible sources of model risk. • Note: this is not an exhaustive list, and there is also some overlap between the various categories. Later, I give some examples from retail finance to illustrate when there could be model risk issues.

  6. Sources of model risk Statistical:- • Model misspecification • Model efficiency/inefficiency • Data problems and selection bias • Robustness over time • Inappropriate use Other/management:- • Model development resources (analysts/time) • Publication, implementation and software error We consider only the statistical sources of model risk.

  7. Model misspecification (1) • Model structure • Do we have the correct general model structure to model the data? • In the past, it was common to use OLS. Now it is standard to use logistic regression. Perhaps now we can ask if logit is the correct link function? • Is the basic linear scorecard correct? Is a nonlinear structure more appropriate? • Model assumptions: what are they and are we breaking them? • Distributions on error terms (eg normality for OLS). • Independence for observations in standard logistic regression. Is this really true in retail credit?
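To make the link-function question concrete, here is a minimal Python sketch (not from the talk; the data are simulated and all names are illustrative) of fitting the same linear predictor under two candidate link functions with statsmodels and comparing fit by AIC:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000
    X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept + two predictors
    true_beta = np.array([-2.0, 1.0, -0.5])
    p = 1 / (1 + np.exp(-X @ true_beta))           # data generated under a logit link
    y = rng.binomial(1, p)

    # Fit the same linear predictor under two candidate link functions.
    logit_fit = sm.GLM(y, X, family=sm.families.Binomial(sm.families.links.Logit())).fit()
    probit_fit = sm.GLM(y, X, family=sm.families.Binomial(sm.families.links.Probit())).fit()

    # A large AIC gap would suggest the choice of link matters for these data.
    print("logit AIC: ", logit_fit.aic)
    print("probit AIC:", probit_fit.aic)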

  8. Model misspecification (2) • Inclusion of variables. • Too few variables may lead to biased estimates. • Too many will lead to less efficient estimates and, hence, less robust models. • Variable transformations (to log or not to log?). • With some variables, like income, it is “standard” to take logs. • What about others, eg age? • Some modellers transform all variables to weights of evidence – is this appropriate? • Multicollinearity. • Where predictor variables are themselves highly correlated, this can lead to inefficient or wrong estimates (in particular, it can lead to the wrong sign).
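As an illustration of the multicollinearity point, here is a minimal sketch (simulated data; column names invented) of screening predictors with variance inflation factors (VIFs), where a value above roughly 10 is a common rule of thumb for trouble:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(1)
    n = 1000
    income = rng.lognormal(10, 0.5, n)
    # A near-duplicate of income, to induce collinearity deliberately.
    disposable = 0.8 * income + rng.normal(0, 0.05 * income.std(), n)
    age = rng.uniform(18, 75, n)

    X = sm.add_constant(pd.DataFrame({"income": income,
                                      "disposable": disposable,
                                      "age": age}))
    for i, col in enumerate(X.columns):
        print(col, variance_inflation_factor(X.values, i))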

  9. Model efficiency/inefficiency Every model is inaccurate and every estimate is just that: an estimate. Fortunately, most statistical models provide a measure of the accuracy of estimates (ie the standard errors). • This is not true of all models (eg standard linear discriminant analysis and machine learning algorithms) – although it is always possible to bootstrap. • Remember, though, that the standard errors themselves can be suspect, since their accuracy depends on the model assumptions holding (or on the model being robust to their violation).
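On the bootstrap point, here is a minimal sketch (simulated data) of obtaining standard errors for a model that does not report them directly, in this case scikit-learn's linear discriminant analysis:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(2)
    n = 2000
    X = rng.normal(size=(n, 3))
    y = (X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n) > 0).astype(int)

    B = 500
    coefs = np.empty((B, 3))
    for b in range(B):
        idx = rng.integers(0, n, n)     # resample rows with replacement
        coefs[b] = LinearDiscriminantAnalysis().fit(X[idx], y[idx]).coef_[0]

    # Bootstrap standard errors of the discriminant coefficients.
    print("bootstrap SEs:", coefs.std(axis=0, ddof=1))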

  10. Data problems and selection bias • Is the data appropriate for the modelling task? • Reliability in data collection; eg how reliable is a self-assessment of income? • Or, eg, using an existing portfolio of predominantly older customers to build a model for a card targeting young customers. • Or using a data set of accepted loan applications to build a scorecard for all new applications. • Of course, the last example is the problem of selection bias. • It is a fairly well understood model risk issue in retail credit. • There are several reject inference techniques to handle it, eg parcelling and augmentation.

  11. Robustness over time (1) • There are some problem domains where risk factors and distributions on variables are stable over time. • In such domains, models remain stable. • For example, mortality scoring models based on the physiology of hospital in-patients (eg APACHE III) are stable, since human physiology does not change much over time. • However, consumer credit does not remain stable over time. • Credit risk changes over the business cycle. • Credit usage behaviour changes over time. • Banks’ risk appetite changes over time. • Innovations in technology and product development change risk. • All of these time-varying factors affect the applicability of credit risk models over time.

  12. Robustness over time (2) • Changes in the effect size of risk factors will have an obvious effect on the applicability of a model. • Population drift: changes in the distribution of predictor or outcome variables can also affect the robustness of the model. • Slow versus sudden change (eg an economic crisis) can have different effects on the applicability of a model. • Possible approaches to dealing with this problem:- • Rebuild models regularly, within a champion/challenger environment. • Dynamic models (ie including time-varying factors in the risk model). • Adaptive models.

  13. Model robustness, in general • The problem of model robustness over time generalizes to other domains, eg geography or product type. • For example, if we have a credit card product operating in the UK, does the same scorecard model apply in Ireland? • How different will it be?

  14. Inappropriate use “In terms of risk control, you’re worse off thinking you have a model and relying on it than in simply realizing there isn’t one.” * A model may be built correctly. However, it may be used for the wrong task. For example, using a default model as the basis of a strategy on customer retention… it would be better to build a new model focussed on retention. * Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative Strategies Research Notes

  15. Consequences of model risk (1) What are the consequences of model risk? We need to measure the effect of model risk on model use:- (1) Explanatory model • If it is important that the model is used as an explanatory model, then bias and inefficiency in model estimation will be important. • Eg for discussion with management and regulators. (2) Forecasting • Individual / account level; • Aggregate / loss forecasting; • Does the flat maximum effect provide some robustness against model bias and inefficiency?

  16. Consequences of model risk (2) (3) Stress testing • Predictions of outcome for extreme values. • Typically, value-at-risk, expected shortfall, or scenarios. • Effects of model risk on stress testing are likely to be different to the effect on standard forecasts. I now give some quick examples of model risk, looking at usage, measurement issues and consequences….

  17. Example 1: Misspecification / Misapplication Performance of models for extreme cases * Models work well at estimating expected values for “typical” cases from the population. However, how do they fare when predicting default rates (DR) for extreme cases? In this experiment, a logistic regression model is built for credit card data. DR is then predicted for an independent test set of extreme cases (with respect to variables such as age and job) and compared with observed DR. * Work conducted by Alice Wang as part of her third year undergraduate project.

  18. Example 1: Results We see that these models tend to under- or over-estimate DR for extreme cases. Interestingly, the parsimonious model gives better forecast results. Note: all extreme criteria represent 2% of the test data (N = 600).
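The experiment itself is not reproduced here, but the following sketch (simulated data; the nonlinearity in age and the 2% extreme slice are illustrative assumptions) shows the kind of check involved: fit a linear-in-age logit to data whose true risk is nonlinear in age, then compare predicted and observed DR on an extreme-age slice of a held-out set:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 30000
    age = rng.uniform(18, 80, n)
    lin = -4.0 + 0.02 * age + 0.0008 * (age - 50) ** 2   # true log-odds, nonlinear in age
    y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

    train = rng.random(n) < 0.8
    X = sm.add_constant(age)                  # fitted model is linear in age
    fit = sm.GLM(y[train], X[train], family=sm.families.Binomial()).fit()

    extreme = ~train & (age > np.quantile(age, 0.98))    # extreme-age slice (~2%)
    print("predicted DR:", fit.predict(X[extreme]).mean())
    print("observed DR: ", y[extreme].mean())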

  19. Example 2: Selection bias Simulation study The problem of selection bias in application models is well known, and several reject inference methods have been proposed. Unfortunately, in a real-world context it is not usually possible to accurately evaluate the extent of the bias, or the effectiveness of a reject inference method, since outcomes for rejects are unknown. However, simulation studies can be used to show the effect, and are valuable for demonstrating the extent of the problem. Here is the result of a simulation study using an augmentation method. • In a nutshell, augmentation is a method that weights observations from the accepted sample, usually according to how likely they were to be accepted, based on an accept–reject model.

  20. Example 2: Results Suppose we simulate 25,000 applications with two variables, income (x1) and number of previous delinquencies (x2), and outcome good/bad. Reject 40% of applications using a scorecard. Build an unbiased model S1 on all applications: • Score = −2.05 + 1.47·x1 − 0.64·x2 • (remember, in the real world we could not build S1 since we do not have outcomes for rejects) Now build a biased model S2 based on just the 60% accepted cases:- • Score = −2.08 + 1.43·x1 − 0.32·x2 Notice the difference in the coefficient estimate on x2 (number of delinquencies). Why does this happen?

  21. Example 2 continued This graph shows that the distribution of the number of delinquencies is not the same for the accepted population compared to all applications. Those with high numbers of delinquencies are under-represented. This affects the model estimation.

  22. Example 2 continued A model using augmentation, S3, uses only the sample of accepts, like S2, but weights observations with high delinquency more heavily in the accepted sample. Hence the model estimate is closer to the unbiased model: • Score = −2.05 + 1.59·x1 − 0.46·x2 The new model also gives better results on an independent test set:- One lesson here is that simulation studies are of value to give insight into aspects of model risk that are not immediately measurable in the real-world setting.
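A minimal end-to-end sketch of this kind of simulation follows (the coefficients, acceptance rule and soft-information term are illustrative assumptions, not those of the study). Acceptance depends partly on information correlated with the outcome, which is what biases the accepts-only model S2; augmentation then reweights the accepts by the inverse acceptance probability from an accept–reject model:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 25000
    income = rng.normal(2.0, 1.0, n)              # x1
    delinq = rng.poisson(1.0, n)                  # x2
    lin = -2.0 + 1.5 * income - 0.6 * delinq      # true log-odds of "good"
    e = rng.logistic(size=n)                      # latent noise, so P(good | x) is logistic
    good = (lin + e > 0).astype(int)

    # Acceptance partly reflects soft information correlated with the outcome.
    signal = income - 0.5 * delinq + 0.5 * e + rng.normal(0, 0.5, n)
    accept = signal > np.quantile(signal, 0.4)    # accept 60% of applications

    X = sm.add_constant(np.column_stack([income, delinq]))
    s1 = sm.GLM(good, X, family=sm.families.Binomial()).fit()                  # all applications
    s2 = sm.GLM(good[accept], X[accept], family=sm.families.Binomial()).fit()  # accepts only

    # Augmentation: weight each accept by 1 / P(accept | x).
    ar = sm.GLM(accept.astype(float), X, family=sm.families.Binomial()).fit()
    w = 1.0 / ar.predict(X[accept])
    s3 = sm.GLM(good[accept], X[accept], family=sm.families.Binomial(),
                freq_weights=w).fit()

    print("S1 (all):      ", s1.params)
    print("S2 (accepts):  ", s2.params)
    print("S3 (augmented):", s3.params)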

  23. Example 3: Model estimation error Incorporating model estimation error in loss forecasts Take the log-odds score s from a scorecard and use it to build a univariate logistic regression model. • Of course, the coefficient estimate on s is approximately 1, since s is already on the log-odds scale. • However, the estimate has a standard error, which allows us to construct a CI for the coefficient. What consequences does this have in a real example? Experiments with 50,000 credit cards where default rate = 0.2: • The estimation error has only a small, modest effect on estimates of PD: if the PD estimate at the coefficient’s point estimate is 0.2, then the 99% CI gives (0.193, 0.207).
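A small sketch of how the coefficient CI propagates to a PD interval (the standard error value is an assumption, chosen so the output roughly matches the interval quoted above):

    import numpy as np
    from scipy.stats import norm

    beta_hat = 1.0          # coefficient on the log-odds score is approximately 1
    se = 0.012              # assumed standard error (illustrative)
    z = norm.ppf(0.995)     # two-sided 99% interval

    s = np.log(0.2 / 0.8)   # score whose PD is 0.2 at the point estimate
    for b in (beta_hat - z * se, beta_hat, beta_hat + z * se):
        print(f"beta = {b:.4f}  PD = {1 / (1 + np.exp(-b * s)):.4f}")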

  24. Example 3 continued Effect on expected loss, EL = PD × LGD × EAD. However, if we look at the value-at-risk (VaR) of EL, then the small variation in the model has a bigger impact. Using Monte Carlo simulation of EL, either (A) with the coefficient fixed at its point estimate, or (B) with coefficient values generated from its estimated distribution: • At the 99% level, VaR for simulation study (B) is 4% higher than for study (A). Based on Bellotti (2011), A simulation study of Basel II expected loss distributions for a portfolio of credit cards, Journal of Financial Services Marketing.
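A minimal sketch of this kind of Monte Carlo comparison (portfolio size from the slide; the score distribution, flat LGD/EAD and standard error are illustrative assumptions). In study (B) one coefficient value is drawn per scenario and shared across all accounts, which is what creates the systematic, non-diversifiable effect on the loss tail:

    import numpy as np

    rng = np.random.default_rng(5)
    n_accounts, n_sims = 50000, 2000
    beta_hat, se = 1.0, 0.012
    scores = rng.normal(np.log(0.2 / 0.8), 0.8, n_accounts)  # account-level log-odds scores
    lgd = ead = 1.0                                          # flat LGD/EAD for simplicity

    def simulate_loss(beta):
        pd_ = 1 / (1 + np.exp(-beta * scores))
        return (rng.random(n_accounts) < pd_).sum() * lgd * ead

    loss_a = np.array([simulate_loss(beta_hat) for _ in range(n_sims)])
    loss_b = np.array([simulate_loss(rng.normal(beta_hat, se)) for _ in range(n_sims)])

    var_a = np.quantile(loss_a, 0.99)    # (A) fixed coefficient
    var_b = np.quantile(loss_b, 0.99)    # (B) coefficient drawn per scenario
    print("99% VaR:", var_a, "(fixed) vs", var_b, "(random coefficient)")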

  25. Example 4: Misspecification Using logit versus Poisson link functions In the context of large defaultable bond portfolios, Lucas and Verhoef* experiment with logit and Poisson link functions. • Note: there is a good rationale for using a Poisson link function, since default time can be modelled as a Poisson process. How do the models perform in estimating expected loss? * Lucas A and Verhoef B (2012), Aggregating Credit and Market Risk: the Impact of Model Specification, working paper, Tinbergen Institute, VU University Amsterdam

  26. Example 4 continued For two segments, they report these results:- Hardly any model misspecification problem for Expected Loss estimates… But, importantly, for VaR, Logit underestimates (relative to Poisson). “model specification matters … This is surprising, as the shape of the link function is deemed to be less important for computing capital requirements.” *

  27. Example 5: Robustness over time Use of time-varying risk factors for loss forecasting One approach to dealing with changing risk levels over time is to include macroeconomic time series. Survival models are a good way to do this, since macroeconomic and behavioural data can be included as time-varying covariates (TVCs). Model time to default as a failure event. Experiment on a portfolio of UK credit card data:* • Training data: 400,000 credit cards over the period 1999 to 2004. • Forecast for 150,000 credit cards from 2005 to mid-2006. * Bellotti and Crook (2009), Forecasting and stress testing credit card default using dynamic models, working paper, Credit Research Centre, Edinburgh
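A minimal sketch of a default-time survival model with TVCs (not the paper's model; the long-format data, hazard and column names are simulated assumptions), using the lifelines library's CoxTimeVaryingFitter:

    import numpy as np
    import pandas as pd
    from lifelines import CoxTimeVaryingFitter

    rng = np.random.default_rng(6)
    rows = []
    for acct in range(500):
        vintage = int(rng.integers(0, 12))    # origination month, so calendar time varies
        util = rng.uniform(0, 1)              # behavioural covariate (held fixed here)
        for t in range(24):                   # monthly account age
            rate = 4.0 + 0.1 * (vintage + t)  # assumed macroeconomic series (interest rate)
            hazard = 0.002 * np.exp(0.3 * (rate - 4.0) + 1.0 * util)
            event = rng.random() < hazard
            rows.append((acct, t, t + 1, rate, util, int(event)))
            if event:
                break

    df = pd.DataFrame(rows, columns=["id", "start", "stop",
                                     "interest_rate", "utilisation", "event"])

    # One row per account per period; covariates vary between rows.
    ctv = CoxTimeVaryingFitter()
    ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
    ctv.print_summary()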

  28. Example 5: Results Inclusion of the interest rate and the unemployment rate is statistically significant. We compare default rate (DR) forecasts between models with application variables (AV) only (eg age, income, employment status, housing status at application), behavioural variables (BV) and macroeconomic variables (MV). MAD = mean absolute difference between estimated and observed DR. This shows an improvement in aggregate forecasts when macroeconomic data is included in the model.

  29. Conclusion There is a genuine problem of model risk. • We have seen some suggestive examples. We need to understand the sources of model risk. We need to know the consequences of model risk and how to measure it. We need to find ways to manage model risk: • Develop methods to reduce or control it, and • Incorporate model risk in our decision making.
