Risk Analysis & Modelling

Risk Analysis & Modelling Lecture 10: Extreme Value Theory

http://www.angelfire.com/linux/lecturenotes

What we will learn in this lecture
• We will look at a method for dealing specifically with infrequent, extreme events: EVT (Extreme Value Theory)
• EVT can be used to describe the tails of almost any distribution
• EVT can be used to calculate a distribution-independent measure of VaR
• The limitations of EVT
• More advanced programming techniques in VBA!

The Tails Of A Distribution & Risk
• In our look at Value at Risk we estimated the likely loss by describing the complete behaviour of a random variable
• From this distribution we took the 5% or 1% lower tail as our estimate of a serious but possible loss
• We had to make some strict assumptions about the distribution in order to estimate the position of these losses
• Since we are only interested in the tails of the distribution, can't we make fewer assumptions and just focus on the tails?
• The answer is yes, and the method is EVT!

Tail Risks From VaR
• We make a lot of strict assumptions about the random variable in order to estimate the whole distribution
• All we are interested in is the lower tail
• We do not need a complete description of the distribution, just the lower tail. Isn't this inefficient?

Probability Distribution Recap
• Before we discuss EVT we will recap some statistics
• Imagine we have a probability distribution F which describes a continuous random variable
• The definition of a probability distribution is that it describes the chance of observing a value less than or equal to a given level
• It is sometimes called the cumulative distribution function (CDF)
• So F(0.5) would give the probability that our random variable will take a value less than or equal to 0.5
• We will assume we are dealing with variables that can take any value between -infinity and +infinity
• F(-infinity) is 0, F(+infinity) is 1
• If the upper bound is +infinity then we are certain the random variable will be below it!

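Probability Distribution Example For Random Variable X
[Figure: the probability distribution of X, with X on the horizontal axis (from -X to +X) and probability from 0.0 to 1.0 on the vertical axis; the height of the curve at the level C is P(X <= C)]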

Peak Over Threshold Distribution
• The Peak Over Threshold Distribution (POTD) describes the distribution of a random variable given that we know it has exceeded a given boundary
• An example of such a distribution would be the distribution describing daily returns greater than 5%
• This distribution is obviously related to the distribution describing the complete behaviour of the random variable
• More specifically, the POTD is the probability of observing a value of X that is less than or equal to y+u, given that X is above u

If we know the probability distribution F describing a random variable X, then we can express the POTD in terms of F, u and y:
• F(y+u) is the probability that X will be less than or equal to y+u
• F(u) is the probability that X will be less than or equal to u
• F(y+u) - F(u) is the probability that X will be greater than u but less than or equal to y+u
• 1 - F(u) is the probability that X will be greater than u (since F(+infinity) = 1)
• Since our probability distribution is conditional on the fact that we are above the boundary u, we divide by the probability 1 - F(u) (ie we rescale our probabilities)
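The formula these bullets build up to can be written out as follows. This is a reconstruction from the terms listed above; the symbol F_u for the POTD is notation assumed here, not taken from the slides.

```latex
\[
F_u(y) \;=\; P\bigl(X \le y + u \,\big|\, X > u\bigr)
       \;=\; \frac{F(y+u) - F(u)}{1 - F(u)}, \qquad y \ge 0
\]
```

The numerator is the probability of landing between u and y+u, and the denominator rescales it so that F_u runs from 0 at y = 0 up to 1 as y grows.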

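POTD Interpretation
[Figure: the probability density of X with the threshold u and the point u+y marked on the horizontal axis; the area to the right of u is the probability that X is greater than the threshold value u]
• POTD measures the probability that X will be greater than u and less than or equal to u+y, given that we know X is greater than u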

A Very Important Result
• It can be shown that, for almost any probability distribution F of the variable, the POTD approaches a Generalised Pareto Distribution (GPD), G, as the threshold u increases
• The GPD has two parameters: ξ is the shape parameter and β is a scaling parameter (like σ scales the normal distribution)
• When ξ > 0 we say the GPD has 'heavy' tails
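For reference, the limit result and the standard form of the GPD, with shape parameter ξ and scale parameter β, can be written as below. The notation F_u for the POTD is an assumption made here for illustration.

```latex
\[
F_u(y) \;\approx\; G_{\xi,\beta}(y) \quad \text{for a high threshold } u,
\qquad
G_{\xi,\beta}(y) \;=\;
\begin{cases}
1 - \left(1 + \dfrac{\xi\, y}{\beta}\right)^{-1/\xi}, & \xi \neq 0 \\[2ex]
1 - e^{-y/\beta}, & \xi = 0
\end{cases}
\]
```

Here y >= 0 is the amount by which the observation exceeds the threshold, and β must be positive.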

Use Of POTD Limit
• Re-expressing this result, for all values of x greater than u we can say that F(x) = (1 - F(u)) G(x - u) + F(u), approximately
• This means we can estimate the probability distribution F(x) for the tail (x > u) in terms of the Generalised Pareto Distribution
• All we have to do is find a way of estimating G's parameters!

Peak Over Threshold Approach
• If we have a set of observations of a random variable X, then EVT tells us that if we set a high enough boundary (u), the points above this boundary will follow a Generalised Pareto Distribution
[Figure: observations of X plotted against the threshold u; the random points over the threshold u are described by the GPD]

We can describe the tail of the distribution by fitting a GPD to the data points above the boundary
• If 10% of the points are above the boundary then we say that F(u) is 90% (ie 90% of the points are below the boundary u; this is our estimate for the specific value F(u), remember we do not know what F looks like!)
• We estimate F(u) from the number of points above the boundary (Nu) relative to the total number of points (N) in our data set: F(u) = 1 - (Nu/N)
• For this boundary, F(x) for x above u follows from G and this estimate of F(u)
• F(x) describes the probability distribution for x above the boundary u
• We can estimate the parameters of G (ξ, β) using maximum likelihood
• What values for ξ, β are most likely to produce the points we observe above the boundary?
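Putting the estimate F(u) = 1 - (Nu/N) into the POTD limit gives the tail estimator in explicit form. The expression below is a reconstruction from the definitions on these slides, assuming the standard GPD form with shape ξ not equal to zero and scale β.

```latex
\[
F(x) \;\approx\; \bigl(1 - F(u)\bigr)\, G_{\xi,\beta}(x - u) + F(u)
     \;=\; 1 - \frac{N_u}{N}\left(1 + \frac{\xi\,(x - u)}{\beta}\right)^{-1/\xi},
\qquad x > u
\]
```

As a check, setting x = u gives F(u) = 1 - Nu/N, which matches the empirical estimate at the boundary.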

Maximum Likelihood Estimator
• We have a set of data points S = (A1, A2, A3, ...) that we observe above the boundary u
• What is the probability of observing this set of results? Given they are independent, it is the product of the probabilities of observing each individually: P(S) = P(A1) × P(A2) × P(A3) × ...
• We want to select the GPD which maximises P(S)
• We can also express this problem as selecting the GPD to maximise the log likelihood, ln(P(S)), which is often a simpler problem to solve

We still have to work out how to calculate the probability of observing a point above the boundary
• This is given by the probability density function (pdf) of the GPD
• Note this is just the derivative of G with respect to y
• Taking the log of the pdf turns the product in P(S) into a sum of log densities
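The density and its log can be written out as follows. This is the standard GPD density for shape ξ not equal to zero, reconstructed here rather than copied from the slides; y_i = A_i - u denotes the amount by which each observation exceeds the threshold, which is an assumption about how the A_i enter the formula.

```latex
\[
g(y) = \frac{dG}{dy} = \frac{1}{\beta}\left(1 + \frac{\xi\, y}{\beta}\right)^{-\frac{1}{\xi} - 1},
\qquad
\ln g(y) = -\ln\beta - \left(\frac{1}{\xi} + 1\right)\ln\!\left(1 + \frac{\xi\, y}{\beta}\right)
\]
\[
\ln P(S) = \sum_{i=1}^{N_u} \ln g(y_i)
         = -N_u \ln\beta - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{N_u}\ln\!\left(1 + \frac{\xi\, y_i}{\beta}\right)
\]
```

The log likelihood is only defined when β > 0 and every term 1 + ξ y_i / β is positive.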

So to find the GPD that is most likely to have produced the data points we have observed above the boundary, we simply have to find the values of ξ, β that maximise the log likelihood: Max ln(P(S)) by changing ξ, β
• Where Ai is the set of observations above the boundary we set
• We cannot use Solver to find these values!
• We need to use a grid searching algorithm because the likelihood can have multiple peaks
• Once we have ξ, β we have a GPD tail distribution that we can use to calculate Value at Risk!
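A grid search of this kind could be sketched in VBA roughly as below. This is a minimal illustration, not the lecture's own code: the names GPDLogLikelihood and FitGPDByGridSearch, the range of shape values tried, and the choice to scale the β grid by the mean exceedance are all assumptions. The input array is assumed to hold the exceedances (each observation minus the threshold u).

```vba
Option Explicit

' Log likelihood of a GPD with shape Xi and scale Beta for an array of
' exceedances (observation minus threshold). Returns a very large negative
' number when the parameters are not valid for the data.
Public Function GPDLogLikelihood(Exceedances() As Double, _
                                 ByVal Xi As Double, _
                                 ByVal Beta As Double) As Double
    Const Invalid As Double = -1E+300
    Dim I As Long
    Dim Z As Double
    Dim Total As Double

    If Beta <= 0 Then
        GPDLogLikelihood = Invalid
        Exit Function
    End If

    For I = LBound(Exceedances) To UBound(Exceedances)
        If Abs(Xi) < 0.000000001 Then
            ' Limiting case Xi = 0: the GPD becomes an exponential distribution
            Total = Total - Log(Beta) - Exceedances(I) / Beta
        Else
            Z = 1 + Xi * Exceedances(I) / Beta
            If Z <= 0 Then
                GPDLogLikelihood = Invalid
                Exit Function
            End If
            Total = Total - Log(Beta) - (1 / Xi + 1) * Log(Z)
        End If
    Next I

    GPDLogLikelihood = Total
End Function

' Tries every (Xi, Beta) pair on a rectangular grid and returns, through the
' ByRef arguments, the pair with the highest log likelihood.
' The grid limits below are illustrative assumptions, not recommendations.
Public Sub FitGPDByGridSearch(Exceedances() As Double, _
                              ByRef BestXi As Double, _
                              ByRef BestBeta As Double)
    Const XiMin As Double = -0.5
    Const XiMax As Double = 1
    Const Steps As Long = 200
    Dim I As Long, J As Long
    Dim Xi As Double, Beta As Double
    Dim LogLik As Double, BestLogLik As Double
    Dim MeanExcess As Double

    ' Use the mean exceedance to pick a sensible scale for the Beta grid
    For I = LBound(Exceedances) To UBound(Exceedances)
        MeanExcess = MeanExcess + Exceedances(I)
    Next I
    MeanExcess = MeanExcess / (UBound(Exceedances) - LBound(Exceedances) + 1)

    BestLogLik = -1E+300
    For I = 0 To Steps
        Xi = XiMin + (XiMax - XiMin) * I / Steps
        For J = 1 To Steps
            Beta = 3 * MeanExcess * J / Steps
            LogLik = GPDLogLikelihood(Exceedances, Xi, Beta)
            If LogLik > BestLogLik Then
                BestLogLik = LogLik
                BestXi = Xi
                BestBeta = Beta
            End If
        Next J
    Next I
End Sub
```

Because every grid point is evaluated, the search cannot get stuck on a local peak the way a hill-climbing optimiser can; the price is that the answer is only as precise as the grid spacing.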

Using EVT To Calculate VaR
• VaR tells us how far into the tail of a distribution we have to go to be sure only 5% or 1% of possible outcomes will be below that point
• Since EVT describes the tails we should be able to use it to calculate VaR
• We want to use the tail distribution to ask: at what level of loss can we say that only a given percentage of losses will be greater than that loss level?
• There are 3 problems that must be solved before we can calculate VaR using EVT

Problem 1: EVT Deals With The Upper Tail
• The EVT model we have looked at deals exclusively with the upper tail of the distribution, while VaR deals with the lower tail
• The solution to this is fairly simple: instead of measuring returns on the portfolio we measure losses. A positive loss (L) is a negative return (R), and a positive return is a negative loss, ie L = -R
• Using this definition, the problem of finding the maximum loss becomes one of describing the upper tail of the distribution of L

Problem 2: Selecting The Upper Boundary
• To use the Peak Over Threshold approach we need to set an upper boundary on the level of loss and only look at points above that line
• Since the tail distribution is only valid above this threshold, we must select a threshold which is not above the level of VaR we wish to calculate
• For example, if we set our threshold so that only 3% of the distribution is above it, we cannot then use this tail distribution to estimate the 5% tail

Diagram of the relationship between the threshold and the VaR level we can estimate
[Figure: the distribution of X with the threshold u marked; we only estimate the distribution above our boundary]
• The level of the threshold determines how much of the tail our GPD estimates
• The VaR confidence level must be contained by our tail estimation!

Problem 3: Inverting the Tail Estimator
• Our tail estimator for F(x) is built from the fitted GPD and the estimate F(u) = 1 - (Nu/N)
• F(x) tells us the probability of observing a value less than or equal to x (loss)
• We are interested in finding the level of x (loss) for which there is only a probability P of observing losses greater than x
• For example, we want to find the loss level which only 1% of losses will be greater than
• We need to rearrange the tail estimator to get x in terms of the probability, rather than the probability in terms of x

This comes down to rearranging the tail estimator
• We observe that F(x) measures the probability that the loss L is less than or equal to some level x (L <= x)
• We want the probability V(x) that the loss is greater than some level x, which is simply V(x) = 1 - F(x)
• After some work we can express x in terms of V(x)
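The rearrangement can be sketched as follows, assuming the standard GPD form with shape ξ not equal to zero and scale β, and the estimate 1 - F(u) = Nu/N. This is a reconstruction consistent with the terms defined on these slides rather than a copy of the original formula.

```latex
\[
V(x) = 1 - F(x) = \frac{N_u}{N}\left(1 + \frac{\xi\,(x - u)}{\beta}\right)^{-1/\xi}
\]
\[
\left(\frac{N}{N_u}\, V(x)\right)^{-\xi} = 1 + \frac{\xi\,(x - u)}{\beta}
\quad\Longrightarrow\quad
x = u + \frac{\beta}{\xi}\left[\left(\frac{N}{N_u}\, V(x)\right)^{-\xi} - 1\right]
\]
```

When V(x) equals Nu/N the bracket is zero and x = u, so the formula starts at the threshold and moves further into the tail as V(x) gets smaller.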

The Terms Of EVT VaR
• V(x) is the VaR confidence level, such as 5% for 5% VaR
• x is the upper boundary on loss that we only expect to be above V(x) % of the time
• u is the level of the threshold we set (the loss level) to estimate the tail
• N is the total number of observations for losses in our dataset and Nu is the number of those observations above u; our estimate for 1 - F(u) is therefore Nu/N
• ξ, β are terms we estimate using maximum likelihood from the points over the threshold in our dataset of losses
• This is only valid for risk estimates above u: our tail estimator is only valid above the threshold we set!
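As an illustration of how these terms fit together, the EVT VaR calculation could be wrapped in a VBA function along the following lines. The function name EVTVaR and its argument names are hypothetical, and the formula used is the standard inversion of the GPD tail estimator, which is assumed to match the one intended on the slides.

```vba
Option Explicit

' Loss level that is exceeded with probability TailProb (eg 0.01 for 1% VaR).
'   U        : threshold (expressed as a loss) used to fit the tail
'   Xi, Beta : GPD shape and scale estimated from the points over U
'   N, Nu    : total number of losses and number of losses above U
Public Function EVTVaR(ByVal TailProb As Double, ByVal U As Double, _
                       ByVal Xi As Double, ByVal Beta As Double, _
                       ByVal N As Long, ByVal Nu As Long) As Double
    If N <= 0 Or Nu <= 0 Then
        Err.Raise vbObjectError + 1, "EVTVaR", "N and Nu must be positive"
    End If

    ' The tail estimator is only valid above the threshold, so the
    ' requested tail probability cannot be larger than Nu / N.
    If TailProb <= 0 Or TailProb > Nu / N Then
        Err.Raise vbObjectError + 2, "EVTVaR", _
                  "TailProb must be greater than 0 and at most Nu / N"
    End If

    If Abs(Xi) < 0.000000001 Then
        ' Limiting case Xi = 0 (exponential tail)
        EVTVaR = U - Beta * Log(N / Nu * TailProb)
    Else
        EVTVaR = U + (Beta / Xi) * ((N / Nu * TailProb) ^ (-Xi) - 1)
    End If
End Function
```

With purely illustrative inputs, EVTVaR(0.01, 0.02, 0.2, 0.01, 1000, 100) asks for the 1% VaR when 100 of 1000 losses lie above a threshold loss of 2%. Asking for a 20% tail with the same data would raise an error, because only 10% of the observations lie above the threshold.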

The Advantages of EVT
• The advantage of EVT is that it focuses only on the tail of the distribution
• Using the GPD, this method can estimate the fat-tailed losses we observe in financial instruments and insurance liabilities
• The calculation is not excessively complex or computationally intensive

The Problem With EVT
• The problem with EVT is that we need a lot of data to estimate the GPD of the tail
• The higher we set the boundary for the Peak Over Threshold, the closer our distribution will be to the GPD
• Unfortunately, the higher we set the boundary, the less data we have to estimate the GPD
• EVT is still under development; we will have to wait and see what people come up with!

Part 2: Arrays, Objects & More VBA Tricks

A Recap of Last Week
• Last week we introduced the key concepts of variables and statements
• Variables were like boxes that store a single piece of data
• Statements were instructions to the computer
• We looked at If statements and Loops
• This week we will look at a special type of variable called an array
• We will look at how to create our own variable types with objects!

An Array
• An array is a variable that can store more than one value
• Last week we looked at variables as boxes that only contain one value
• An array is a variable that can contain many variables
• Arrays are important because we often want to store lists or blocks of things of variable length (such as a list of all the students in the class)
• Each element in the array is identified by a number

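Creating an Array
• Let us say we want to create an Array of 10 strings, we would write:
    Dim StudentNames(1 To 10) As String
• If we wanted to create an Array of 500 Daily Returns on a stock:
    Dim DailyReturns(1 To 500) As Double
• Note that writing Dim StudentNames(10) As String would by default number the elements from 0 to 10, giving 11 elements
• If we wanted to create a list of student ages where ClassSize is an integer variable, we cannot use a variable in Dim; we declare the array without a size and then size it with ReDim:
    Dim StudentAges() As Integer
    ReDim StudentAges(1 To ClassSize)
• If the variable ClassSize contained 35 then the Array StudentAges would contain 35 variables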

Accessing The Elements Of An Array
• Just like a Variable, the Array is initially blank
• Let us say we wanted to assign the name Frank Bloggs to the first element in the StudentNames array, we would write:
    StudentNames(1) = "Frank Bloggs"
• StudentNames(1) would now contain the string "Frank Bloggs"
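The declaration and assignment steps can be combined into one short procedure. This is a small sketch only: the procedure name ArrayExample and the message text are made up for illustration, and the arrays are given explicit lower bounds of 1 so that the element counts are unambiguous.

```vba
Option Explicit

Public Sub ArrayExample()
    Dim ClassSize As Integer
    Dim StudentNames(1 To 10) As String   ' fixed size: exactly 10 strings
    Dim StudentAges() As Integer          ' dynamic array: size is set later

    ClassSize = 35
    ReDim StudentAges(1 To ClassSize)     ' now holds 35 integers, all zero

    ' Each element is accessed by its number in brackets
    StudentNames(1) = "Frank Bloggs"
    StudentAges(1) = 32

    ' UBound returns the highest element number in the array
    Call MsgBox(StudentNames(1) & " is " & StudentAges(1) & _
                " and the class list has room for " & _
                UBound(StudentAges) & " students")
End Sub
```

ReDim is needed whenever the size comes from a variable, because Dim only accepts constant sizes.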

The For Loop
• The For loop is a different type of loop
• It is especially designed for the case where we use an integer variable to count the number of loops
• If we wanted to count the cells in the first 100 rows of column A which have a value greater than 0.5 we would write:
    Dim I As Integer
    Dim CellCount As Integer
    Dim CellValue As Double
    For I = 1 To 100
        CellValue = Cells(I, 1)
        If CellValue > 0.5 Then
            CellCount = CellCount + 1
        End If
    Next I
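The same counting pattern, combined with an array, gives the fraction of observations above a threshold, which is the quantity Nu/N used in Part 1. The sketch below is an illustration under stated assumptions: the function name FractionOverThreshold is hypothetical, and the losses are assumed to be numbers stored in the first 100 rows of column A of the active sheet.

```vba
Option Explicit

' Reads 100 losses from column A into an array, counts how many are above
' Threshold (Nu) and returns Nu / N, the estimate of 1 - F(u).
Public Function FractionOverThreshold(ByVal Threshold As Double) As Double
    Const N As Long = 100
    Dim Losses() As Double
    Dim I As Long
    Dim Nu As Long

    ReDim Losses(1 To N)

    ' First loop: copy the worksheet values into the array
    For I = 1 To N
        Losses(I) = Cells(I, 1).Value
    Next I

    ' Second loop: count the elements above the threshold
    For I = 1 To N
        If Losses(I) > Threshold Then
            Nu = Nu + 1
        End If
    Next I

    FractionOverThreshold = Nu / N
End Function
```

Copying the data into an array first is a design choice: later steps, such as fitting the GPD, can then work on the array without going back to the worksheet.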

If..Then..Else..End If
• Last week we looked at If Then Blocks
• There is an extension to this called If Then Else Blocks:
    If MyNumber > 0.5 Then
        Call MsgBox("MyNumber is Greater Than 0.5")
    Else
        Call MsgBox("MyNumber is Not Greater Than 0.5")
    End If
• This is useful when we want to say to the computer: "if this is true then do this, else do something else", rather than just saying "if this is true do this"

Introduction To Objects
• Using objects we can create our own variable types
• So we could say:
    Dim TopStudent As New Student
• This is very useful and is the basis for object oriented programming (OO)
• The type Student is known as a class (the type of the variable) and TopStudent is an object of type Student (an instance or example of that class)

Class Modules
• To create our own variable types or "Classes" we have to create a Class Module
• The name we give the Class Module is the name of the new variable type we create
• Suppose there is a Class Module called Student containing the following:
    Public StudentName As String
    Public StudentAge As Integer
    Public StudentGrade As Double
• Every object of type Student will have 3 sub-variables or members: StudentName, StudentAge and StudentGrade
• They are declared as Public so we can access them outside the Class Module

Using Objects
• Here is some example code using an object of type Student:
    Dim TopStudent As New Student
    TopStudent.StudentName = "Frank Bloggs"
    TopStudent.StudentAge = 32
    TopStudent.StudentGrade = 72.1
• Notice how we access the members of the object using a '.'
• Objects make the code readable
• Objects have many uses in advanced programming techniques but those are for you to discover!
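Arrays and objects can be combined, for example to search a class list for the student with the highest grade. The sketch below assumes a Class Module named Student with the Public members StudentName, StudentAge and StudentGrade; the procedure name FindTopStudent and the second student's details are invented purely for illustration.

```vba
Option Explicit

' Assumes a Class Module called Student with Public members
' StudentName, StudentAge and StudentGrade.
Public Sub FindTopStudent()
    Dim ClassList(1 To 2) As Student   ' an array whose elements are objects
    Dim TopStudent As Student
    Dim I As Integer

    ' Each element must be created with New before it can be used
    Set ClassList(1) = New Student
    ClassList(1).StudentName = "Frank Bloggs"
    ClassList(1).StudentAge = 32
    ClassList(1).StudentGrade = 72.1

    Set ClassList(2) = New Student
    ClassList(2).StudentName = "Jane Doe"   ' made-up example data
    ClassList(2).StudentAge = 28
    ClassList(2).StudentGrade = 81.5

    ' Loop through the array, keeping a reference to the best so far
    Set TopStudent = ClassList(1)
    For I = 2 To UBound(ClassList)
        If ClassList(I).StudentGrade > TopStudent.StudentGrade Then
            Set TopStudent = ClassList(I)
        End If
    Next I

    Call MsgBox(TopStudent.StudentName & " has the top grade")
End Sub
```

Note the Set keyword: VBA requires it whenever an object reference, rather than an ordinary value, is assigned to a variable.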

THE END
