Risk Analysis & Modelling


  1. Risk Analysis & Modelling Lecture 4: The Frequency-Severity Model

  2. www.angelfire.com/linux/riskanalysis RiskCourseHQ@Hotmail.com

  3. Recap of Last Week's Class • In last week's class we looked at the concept of continuous random variables, which have a limitless number of possible outcomes • For continuous random variables we cannot measure the probability of any one outcome occurring since it is zero • We can instead measure the probability of a continuous random variable being at or below a given value using the Cumulative Distribution Function (CDF) • We looked at the Exponential Distribution, whose CDF is described by a mathematical equation…

  4. Exponential Distribution Function • The Cumulative Distribution Function (CDF) gives the probability of a random variable being less than or equal to some value x • The formula for the CDF of an Exponentially Distributed random variable is: $F(x) = 1 - e^{-x/m}$ • Where m is the average of the random variable and x is the value we wish to calculate the probability of the random variable being less than or equal to • For example, if we wanted to calculate the probability of an Exponentially Distributed random variable with an average of 3 being less than or equal to 5: $F(5) = 1 - e^{-5/3} \approx 0.811$

  5. Exponential CDF where m = 3 • The probability that this exponentially distributed random variable is less than or equal to 5 is 81.1%: $1 - e^{-5/3} \approx 0.811$

  6. EXPCDF • Excel has no built-in function called EXPCDF (its own EXPONDIST function expects the rate, 1/Average, rather than the average), but we can write our own using VBA:
      Public Function EXPCDF(X, Average)
          ' Probability that the random variable is less than or equal to X
          EXPCDF = 1 - Exp(-1 * X / Average)
      End Function
     • X is the outcome we wish to calculate the probability of the random variable being less than or equal to, and Average is the average of the random variable

  7. Inverse Cumulative Distribution Function (ICDF) • The Cumulative Distribution Function (CDF) calculates the probability of a random variable being less than or equal to some value • The Inverse Cumulative Distribution Function (ICDF) calculates the outcome given the probability of the random variable being less than or equal to that outcome • The ICDF can be calculated by inverting the CDF, that is, by using it in reverse…

  8. Reading Probabilities from an Exponential CDF of Losses with an Average of 500 • EXPCDF(2000,500) = 98.16% (an Exponentially Distributed loss with an average of 500 will be less than or equal to 2000 98.16% of the time)

  9. Reading Outcomes from the ICDF of Exponentially Distributed Losses with an Average of 500 • What loss will 98.16% of losses be less than or equal to, given that losses are exponentially distributed with an average of 500? • This is a use of the ICDF: getting the outcome from the probability.

  10. Inverting the CDF Function • The CDF function for the Exponential Distribution gives us the probability (P) for an outcome (x) and its formula is: $P = 1 - e^{-x/m}$ • For the inverse transform method we need a function that will give us the outcome for a probability (the inverse) • By inverting the CDF we obtain the ICDF formula: $x = -m \ln(1 - P)$ • When using the ICDF we specify the probability (P) and we obtain the value (x) that the random variable will be less than or equal to with that probability

  11. EXPICDF • This is the code for EXPICDF:
      Public Function EXPICDF(Probability, Average)
          ' Log is VBA's natural logarithm
          EXPICDF = -1 * Average * Log(1 - Probability)
      End Function
     • Probability is the probability of observing values less than or equal to the outcome we wish to calculate, and Average is the average of the random variable
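The two VBA functions above can also be sketched outside Excel. The following is a minimal Python version, assuming only the standard library; the names `exp_cdf` and `exp_icdf` are illustrative and not part of the lecture's spreadsheet. It reads a probability from the CDF and then recovers the original outcome through the ICDF.

```python
import math


def exp_cdf(x, average):
    """Probability that an exponential random variable is <= x."""
    return 1.0 - math.exp(-x / average)


def exp_icdf(probability, average):
    """Outcome x such that the random variable is <= x with this probability."""
    return -average * math.log(1.0 - probability)


# Losses with an average of 500: read the probability for a loss of 2000,
# then use the ICDF to read the same point back in reverse.
p = exp_cdf(2000, 500)
x = exp_icdf(p, 500)
print(p, x)
```

Feeding the output of one function into the other should return the starting value, which is a quick way to check that the inversion has been done correctly.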

  12. ICDF Review Questions • If the loss is Exponentially Distributed with an average of 2000 calculate the loss that 95% of losses will be less than or equal to (PML.95) • If the loss is Exponentially Distributed with an average of 2000 calculate the loss such that 99% of losses will be less than or equal to (PML.99)

  13. Probable Maximum Loss (PML) • Probable Maximum Loss (PML) gives a measure of the worst loss that is likely to occur at some level of statistical significance • The PML expresses the severity of a loss in terms of the probability of observing smaller losses • For example, the PML.95 is the loss such that 95% of all losses will be less than or equal to that amount • The PML.99 is the loss such that 99% of all losses will be less than or equal to that amount • Once we have a distribution for losses, the calculation of the PML is simply a matter of using the ICDF • PML is one of the most important measures of Underwriting Risk
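As an illustration of "PML is simply the ICDF", here is a small Python sketch under the assumption that losses are exponentially distributed. The function name `exponential_pml` and the average of 500 are illustrative choices, not values prescribed by the lecture.

```python
import math


def exponential_pml(confidence, average):
    """Loss that a fraction `confidence` of exponential losses will not exceed."""
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be at least 0 and below 1")
    # The PML is the exponential ICDF evaluated at the confidence level.
    return -average * math.log(1.0 - confidence)


# PML.95 and PML.99 for losses with an assumed average of 500.
for level in (0.95, 0.99):
    print(f"PML at {level:.0%}: {exponential_pml(level, 500):.2f}")
```

For any other loss distribution the same idea applies: replace the body of the function with that distribution's ICDF.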

  14. Inverse Transform Method • If we wish to use distributions like the Exponential Distribution in Risk Models based on Monte Carlo Simulations we will have to learn how to generate random outcomes from them… • Conceptually, the method by which uniform random numbers can be transformed into random numbers representing samples from any distribution is extremely simple, although there are often complications in practice! • The inverse transform method comes from the observation that the cumulative probability described by the CDF is always between 0 and 1 • We can generate a uniform random number between 0 and 1 (which represents the probability of an outcome) and use the inverse CDF to obtain the outcome relating to that probability • Note the Inverse CDF relates the probability to the outcome (unlike the CDF, which relates the outcome to the probability)
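The inverse transform method can be sketched in a few lines of Python. This is an illustrative version using the exponential ICDF; the function names, the sample size and the seed are assumptions made for the example.

```python
import math
import random


def exp_icdf(probability, average):
    """Inverse CDF of the exponential distribution."""
    return -average * math.log(1.0 - probability)


def sample_exponential(average, n, seed=None):
    """Draw n exponential samples by transforming uniform random numbers."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1) and plays the role of the probability.
    return [exp_icdf(rng.random(), average) for _ in range(n)]


samples = sample_exponential(average=5, n=100_000, seed=42)
print(sum(samples) / len(samples))  # should be close to the average of 5
```

The sample average settling near the chosen average is a simple sanity check that the transformed numbers follow the intended distribution.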

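  15. Inverse Transform Method: Exponential Distribution [Figure: uniformly distributed random numbers 0.25 and 0.91, on a probability axis running from 0 to 1, are mapped through the Exponential CDF to the exponentially distributed random numbers 1.43 and 12.03]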

  16. Inverse Transform Method: Normal Distribution • The computer generates a random number 0.91 from the uniform distribution • We use the inverse of the CDF to calculate the value relating to the probability of 0.91, which is 1.54 • The transformed random variable 1.54 is normally distributed

  17. Exponentially Distributed Random Numbers, m = 5 • We generate the random number 0.91 from the uniform distribution using Rand() • Using the formula for the inverse CDF of the exponentially distributed random variable, we transform this uniformly distributed random number into an exponentially distributed random number: $-5 \ln(1 - 0.91)$, or EXPICDF(0.91,5) = 12.03

  18. Exponentially Distributed Random Variable Question • The amount of time an insurance company will have to wait until the next claim is a random variable • We will assume the amount of time until the next accident insured by an insurance company is an Exponentially Distributed random variable with an average of 0.25 hours (15 minutes) • What is the probability that the next loss will be incurred within the next hour, that is, of the waiting time being less than 1? • Estimate with 95% certainty how long the company will have to wait until the next loss: the value that an exponentially distributed random variable with a mean of 0.25 will be less than 95% of the time • On the “Exponential Time Sheet” simulate the total time the insurance company will have to wait to receive a total of 5 Claims

  19. Modelling Underwriting Risk with Frequency-Severity Models • Insurance companies face two uncertainties on their underwriting portfolios: how many claims will occur (frequency) and how large those claims will be (severity) • Actuarial Science Models use statistical distributions to describe the number (or frequency) of claims and the size (or severity) of claims • We will start by looking at two distributions commonly used by non-life insurance companies to model the severity or size of claims • We will then move on to look at how Actuarial Science models the number or frequency of claims using the Poisson Process and Distribution, which is closely related to the Exponential Distribution • Finally, we will look at how the severity and frequency models can be combined to simulate the aggregate or total level of claims experienced

  20. Frequency & Severity of Underwriting Risk • Frequency: How many claims will the insurer experience? • Severity: How large will the claims be? [Diagram labels: Underwriting Portfolio; Policies Experiencing Claims; Claim Severity Distribution]

  21. Severity Distribution 1: Pareto Distribution • The Pareto distribution is named after the Italian economist Vilfredo Pareto • The Pareto Distribution is a continuous distribution • Alongside the Gamma Distribution, it is probably the most commonly used severity distribution • It models a pattern in which most claims are small but there is the potential for very large losses (heavy tails) • It is particularly prevalent in modelling the claims severity experienced in Liability Insurance, Excess of Loss Reinsurance, Marine Insurance or any loss where the majority of losses are small but there is potential for large losses • The Pareto Distribution is also widely used in Catastrophe or CAT models to model losses due to extreme events like exceptional floods or windstorms

  22. Pareto Distribution Formula • The CDF (Cumulative Distribution Function) for a Pareto Distributed random variable is: $F(x) = 1 - \left(\frac{M}{x}\right)^{a}$ for $x \ge M$ • Where M is the minimum value of the Pareto random variable, which must be greater than zero, and a is a number greater than zero defining the shape (or alpha) of the distribution • The PDF (Probability Density Function) for a Pareto random variable is: $f(x) = \frac{a M^{a}}{x^{a+1}}$ for $x \ge M$

  23. The inverse CDF for a Pareto random variable is: $x = \frac{M}{(1 - P)^{1/a}}$ • The average or expected value of a Pareto random variable is equal to (when the shape is greater than 1, see appendix): $m = \frac{a M}{a - 1}$ • We can invert this function to get the shape parameter in terms of the average and minimum values of the Pareto random variable (this is useful if we want to fit the distribution to a dataset): $a = \frac{m}{m - M}$

  24. Pareto CDF and Inverse CDF in VBA • Excel does not have any built-in support for the Pareto Distribution, however we can easily add our own functions:
      ' CDF parameterised directly by the shape
      Public Function ParetoCDF2(X, Shape, Min)
          ParetoCDF2 = 1 - (Min / X) ^ Shape
      End Function

      ' CDF parameterised by the average; the shape is implied by Average and Min
      Public Function ParetoCDF(X, Average, Min)
          Shape = Average / (Average - Min)
          ParetoCDF = 1 - (Min / X) ^ Shape
      End Function

      ' Inverse CDF: the outcome for a given probability P
      Public Function ParetoICDF(P, Average, Min)
          Shape = Average / (Average - Min)
          ParetoICDF = Min / (1 - P) ^ (1 / Shape)
      End Function
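For readers working outside Excel, the Pareto functions above might be sketched in Python as follows. The names are illustrative, and the guards against an average at or below the minimum and against outcomes below the minimum are additions that the VBA versions do not have.

```python
def pareto_shape(average, minimum):
    """Shape implied by the average and minimum: a = m / (m - M)."""
    if average <= minimum:
        raise ValueError("the average must exceed the minimum")
    return average / (average - minimum)


def pareto_cdf(x, average, minimum):
    """Probability that a Pareto random variable is <= x."""
    if x < minimum:
        return 0.0  # no outcome falls below the minimum
    return 1.0 - (minimum / x) ** pareto_shape(average, minimum)


def pareto_icdf(probability, average, minimum):
    """Outcome that the random variable is <= with the given probability."""
    shape = pareto_shape(average, minimum)
    return minimum / (1.0 - probability) ** (1.0 / shape)


# Same inputs as =ParetoCDF(5,5,3) and =ParetoICDF(0.95,5,3).
print(pareto_cdf(5, 5, 3), pareto_icdf(0.95, 5, 3))
```

With an average of 5 and a minimum of 3 the implied shape is 5 / (5 - 3) = 2.5.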

  25. Pareto CDF: m = 5 and min = 3 [Figure: Pareto CDF with example readings =ParetoCDF(5,5,3) and =ParetoICDF(0.95,5,3)]

  26. Pareto Distributed Random Numbers: m = 5 and min = 3 • Uniform random number: =Rand() • Pareto distributed random number: =ParetoICDF(Rand(),5,3)

  27. Interpretation of the Minimum • One feature of Pareto distributed random variables is that they have a minimum value which they do not go below • This minimum has two interpretations as far as the modelling of Claim Severities is concerned • One interpretation of this minimum is that it represents the deductible on a policy, so claims on losses worth less than this minimum are never made • Another interpretation is that, due to the nature of insured losses, there is a natural lower boundary to their size (for example, you will never have £100 of damage caused by a fire on an offshore platform)

  28. Pareto Distributions and Repeating Proportionalities • The Pareto distribution describes a phenomenon we frequently see in the world about us: repeating proportional patterns in populations • One of the most famous examples of this was the observation by the Italian economist Vilfredo Pareto that 80% of all the wealth was owned by the wealthiest 20% of the population • He also observed that within that wealthiest 20%, the top 20% (the top 4% of the whole population, 20% * 20%) owned 80% of that 80%, or 64% (80% * 80%) of all the wealth, and so on… • This pattern can lead to some very wealthy people!

  29. These repeating proportional patterns are also exhibited by the Pareto Distribution • To illustrate this, assume losses are Pareto Distributed and have a minimum of 100 and an average of 500 • What proportion of losses is greater than 120 (100 * 120%, or 20% above the minimum): (1 - ParetoCDF(120,500,100)) • What proportion of losses is greater than 144 (120 * 120%, or 20% above 120): (1 - ParetoCDF(144,500,100)) • What proportion of the losses above 120 are also above 144? (Divide the probability of a loss being greater than 144 by the probability of it being greater than 120 to see that it is the same as the proportion of all losses above 120) • This pattern can lead to some very large losses!
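The repeating proportion described above can be checked numerically. The following Python sketch uses the slide's minimum of 100 and average of 500; the function name `pareto_exceedance` is illustrative.

```python
def pareto_exceedance(x, average, minimum):
    """Probability that a Pareto loss is greater than x (for x >= minimum)."""
    shape = average / (average - minimum)
    return (minimum / x) ** shape


above_120 = pareto_exceedance(120, 500, 100)  # losses 20% above the minimum
above_144 = pareto_exceedance(144, 500, 100)  # losses 20% above 120

# Of the losses above 120, the share that is also above 144.
conditional = above_144 / above_120
print(above_120, conditional)
```

The two printed numbers agree because $(100/144)^{a} / (100/120)^{a} = (120/144)^{a} = (100/120)^{a}$: the proportion depends only on the ratio of the two thresholds.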

  30. Splitting Loss Between Multiple Reinsurers Into Layers • The loss distribution follows a Pareto Distribution with some shape parameter a • Each reinsurer also has a Pareto loss distribution with shape a, but with a different minimum equal to the deductible of their layer

  31. Fitting the Severity Distribution • Even if we know the type of distribution we wish to use to describe a real world phenomenon (like the size of claims) we still need to fit the distribution • The fitting process involves selecting appropriate parameters for the distribution • For the Pareto distribution this can simply involve taking the average and minimum loss from historical loss data (this is called the Method of Moments) • The loss data needs to be divided into groups which are believed to have the same level of risk or severity distribution • This means splitting the loss data into groups with the same Claim Type and the same Risk Factors
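The fitting step for the Pareto distribution can be sketched as below. This is a minimal Python illustration, assuming the loss data have already been split into a homogeneous group; the function name `fit_pareto` and the loss figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_pareto(losses):
    """Fit a Pareto distribution from the sample average and minimum.

    Returns (shape, minimum), using shape = average / (average - minimum).
    """
    if not losses:
        raise ValueError("at least one loss is required")
    minimum = min(losses)
    average = sum(losses) / len(losses)
    if average <= minimum:
        raise ValueError("the losses must not all be equal")
    return average / (average - minimum), minimum


# Hypothetical historical losses for one Claim Type and set of Risk Factors.
historical_losses = [100, 120, 150, 210, 340, 800, 1780]
shape, minimum = fit_pareto(historical_losses)
print(shape, minimum)
```

Here the losses average 500 with a minimum of 100, so the fitted shape is 500 / 400 = 1.25.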

  32. Risk Factors and Claim Types [Diagram: an Insurance Policy is classified by Risk Factors (Risk Factor 1: High Risk or Low Risk; Risk Factor 2: Low Risk or High Risk; …) and by Claim Type (Claim Type 1, Claim Type 2, …)] • Policies with the same Risk Factors are of equivalent risk in that their Claim Types are of similar frequency and severity • Different Claim Types can have different Frequency and Severity distributions so also have to be separated

  33. Splitting Out Loss Data • Losses are split by Risk Factors and Claim Type so that they are statistically homogeneous (i.e. can be described by the same Frequency and Severity Distributions) [Table for Claim Type 1: a grid of Factor 1 (Low Risk, High Risk) against Factor 2 (Low Risk, High Risk)] • One cell of the grid would represent homogeneous data for Claim Type 1 with High Risk for Factor 1 and Low Risk for Factor 2

  34. Example of Division of Risk Factors and Claim Types for Motor Insurance • Risk Factors of a Motor Insurance Policy: Vehicle Age, Vehicle Type, Driver Age, Year of No Claims Discount, District / Area • Claim Types: Accidental Damage, Fire & Theft, Windscreen, Third Party Bodily Injury, Third Party Property Damage • Some Risk Factors can have a large number of Categories, such as 50 Vehicle Types

  35. Why Loss Distributions are Useful • The loss distribution compresses our description of how something behaves down to an equation and a couple of parameters, which is convenient when building a model • They allow us to estimate the chance of very large losses we have not yet observed in our data (extrapolation) • They allow us to see what is happening between the losses we have observed (interpolation) • If we do not have any relevant data regarding the behaviour of a loss we can fit a distribution by just estimating one or two parameters, for example the minimum and average if we decide the loss follows a Pareto Distribution

  36. Severity Distribution 2: The Gamma Distribution • The Gamma Distribution is another distribution widely used in the modelling of claim severities • It is generally used for classes of insurance which do not exhibit very large losses, such as property insurance or vehicle damage in motor insurance • It is a flexible distribution that can fit a wide variety of random patterns • The Gamma Distribution is related to the Exponential Distribution in that the sum of independent Exponentially Distributed random variables with the same average gives a Gamma Distributed random variable • Its PDF and CDF formulas are mathematically complex…

  37. Gamma Distribution Formula • The formula for the CDF of the Gamma distribution is: $F(x) = \frac{\gamma\left(a, \frac{x - M}{b}\right)}{\Gamma(a)}$ for $x \ge M$ • Where γ is the (lower) incomplete gamma function and Γ is the gamma function; these are special mathematical functions (see appendix) • a is called the shape parameter, b is the scale parameter and M is the minimum value for the gamma random variable • The PDF of the Gamma distribution is: $f(x) = \frac{(x - M)^{a - 1} e^{-(x - M)/b}}{b^{a} \, \Gamma(a)}$ for $x \ge M$

  38. The Average (m) and Variance ($s^2$) of a Gamma distributed random variable can be calculated as follows: $m = M + a b$ and $s^2 = a b^2$ • These formulas can be inverted to get the shape and scale in terms of the average, variance and minimum: $a = \frac{(m - M)^2}{s^2}$ and $b = \frac{s^2}{m - M}$

  39. Gamma CDF and Inverse CDF in VBA • Excel has limited built-in support for the Gamma Distribution; we will create our own functions in VBA on top of it:
      Public Function GammaCDF(X, Average, Variance, Min)
          ' Shape and scale implied by the average, variance and minimum
          ShapeParam = (Average - Min) ^ 2 / Variance
          ScaleParam = Variance / (Average - Min)
          GammaCDF = Application.WorksheetFunction.GammaDist(X - Min, ShapeParam, ScaleParam, True)
      End Function

      Public Function GammaICDF(Probability, Average, Variance, Min)
          ShapeParam = (Average - Min) ^ 2 / Variance
          ScaleParam = Variance / (Average - Min)
          ' Invert the unshifted Gamma, then add the minimum back on
          GammaICDF = Application.WorksheetFunction.GammaInv(Probability, ShapeParam, ScaleParam) + Min
      End Function

  40. Gamma CDF and PDF where m = 7, $s^2$ = 6 and Min = 3 [Figure: Gamma CDF and PDF with example readings =GammaCDF(6,7,6,3) and =GammaICDF(0.95,7,6,3)]

  41. Gamma Distributed Random Numbers where m = 7, $s^2$ = 6 and Min = 3 • Uniform random number: =Rand() • Gamma distributed random number: =GammaICDF(Rand(),7,6,3)
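A comparable experiment can be run in Python. Note the swap in technique: this sketch does not invert the CDF as GammaICDF does, but relies on the standard library's `random.gammavariate(alpha, beta)`, whose arguments are the shape and the scale. The function names and the seed are illustrative.

```python
import random


def gamma_params(average, variance, minimum):
    """Shape and scale implied by the average, variance and minimum."""
    shape = (average - minimum) ** 2 / variance
    scale = variance / (average - minimum)
    return shape, scale


def sample_gamma(average, variance, minimum, n, seed=None):
    """Draw n shifted Gamma samples with the requested average and variance."""
    rng = random.Random(seed)
    shape, scale = gamma_params(average, variance, minimum)
    # gammavariate samples the unshifted Gamma; the minimum is added back on.
    return [minimum + rng.gammavariate(shape, scale) for _ in range(n)]


samples = sample_gamma(average=7, variance=6, minimum=3, n=100_000, seed=1)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
print(sample_mean, sample_var)  # should be close to 7 and 6
```

With m = 7, a variance of 6 and a minimum of 3 the implied shape is 16 / 6 and the implied scale is 1.5.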

  42. Alternative to the CDF: Cantelli’s Inequality • Cantelli’s Inequality is an important result from statistics that places an upper bound on the probability of a random variable being less than or equal to some value • The bound can be calculated just from the mean m and the variance $s^2$ of the random variable: $P(X \le x) \le \frac{s^2}{s^2 + (m - x)^2}$ • Note this formula is only valid for values of x BELOW the average m • The true probability is likely to be less than this estimate, BUT the inequality has the advantage that we do not have to know the distribution of the random variable, just its mean and variance!

  43. Cantelli’s Inequality Example • Imagine we have a portfolio whose average return is 8% (0.08) and whose variance is 0.003 • We can use Cantelli’s inequality to estimate the probability of losing more than 5% of the portfolio’s value (the random return R on the portfolio being less than -5%): $P(R \le -0.05) \le \frac{0.003}{0.003 + (0.08 - (-0.05))^2} = \frac{0.003}{0.0199} \approx 0.1507$ • So Cantelli’s inequality tells us that the probability of losing more than 5% of the portfolio is at most 15.07% (0.1507) • This is a worst case estimate; the true probability will be less than or equal to this
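The portfolio example can be reproduced with a short Python function. This is a sketch of the one-sided bound $s^2 / (s^2 + (m - x)^2)$ for values of x below the mean; the function name `cantelli_upper_bound` is illustrative.

```python
def cantelli_upper_bound(x, mean, variance):
    """Upper bound on P(X <= x), valid only for x below the mean."""
    if x >= mean:
        raise ValueError("the bound is only valid for x below the mean")
    if variance < 0:
        raise ValueError("variance cannot be negative")
    gap = mean - x  # distance of x below the mean
    return variance / (variance + gap * gap)


# Average return of 8%, variance of 0.003, loss threshold of -5%.
bound = cantelli_upper_bound(-0.05, 0.08, 0.003)
print(bound)
```

Because only the mean and variance enter the calculation, the same function can be applied to any random variable, whatever its distribution.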

  44. Claim Frequency • So far we have focused on distributions modelling the severity or size of claims • Another uncertainty faced by an insurance company when it underwrites a risk is whether losses will arise and the timing and number of losses • A common way to model this uncertainty is to assign a distribution to the waiting time between claims • A standard distribution used to describe this elapsed waiting time is the Exponential Distribution

  45. Simulating Claim Frequency [Figure: a timeline of Claim 1 to Claim 5, separated by random waiting times between claims sampled from an exponential distribution; the waiting times add up to the Total Time]

  46. Poisson Process • If we assume the waiting time between events is Exponentially Distributed, we have a randomly spaced sequence (or process) across time called a Poisson Process • The Poisson Process is an example of a Stochastic Process • Stochastic Processes are phenomena that change randomly across time (such as the price of assets, the number of claims received, the spread of a virus in a population)
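A realisation of a Poisson Process can be sketched by accumulating exponential waiting times. The Python below is illustrative: the function name, the average waiting time of 0.1 and the seed are assumptions for the example, and `random.expovariate` takes the rate, which is 1 divided by the average waiting time.

```python
import random


def simulate_claim_times(average_wait, n_claims, seed=None):
    """Arrival times of the first n_claims claims of a Poisson Process."""
    rng = random.Random(seed)
    clock = 0.0
    times = []
    for _ in range(n_claims):
        # Each waiting time is an independent exponential draw.
        clock += rng.expovariate(1.0 / average_wait)
        times.append(clock)
    return times


claim_times = simulate_claim_times(average_wait=0.1, n_claims=5, seed=1)
print(claim_times)       # time of each claim
print(claim_times[-1])   # total time waited for all five claims
```

The last arrival time is the total time waited for all of the claims, the quantity simulated when waiting for a fixed number of claims.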

  47. A Poisson Process Realisation [Figure: a realisation of a Poisson Process in which the waiting time between claims is Exponentially Distributed with an average of 0.1 days]

  48. Frequency-Severity Model • The Insurance Company is not just interested in the number of claims that occur (the frequency) but also in their size (the severity) • We will assume that the size of each claim that occurs has a Gamma Distribution with m = 160, $s^2$ = 300 and min = 100 • If we graph the total claims against time we have an example of a Compound Poisson Process, which is probably one of the most important statistical models in Actuarial Science
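Combining the two ingredients gives a Compound Poisson Process. The Python sketch below uses the Gamma severity parameters from this slide (m = 160, a variance of 300, min = 100) together with an average waiting time of 0.1 days; the 10-day horizon, the seed, the function name and the use of `random.gammavariate` instead of GammaICDF are assumptions made for the illustration.

```python
import random


def simulate_total_claims(horizon, average_wait, average, variance, minimum, seed=None):
    """Return (time, running total) pairs for claims arriving up to `horizon`."""
    rng = random.Random(seed)
    shape = (average - minimum) ** 2 / variance
    scale = variance / (average - minimum)
    clock, total, path = 0.0, 0.0, []
    while True:
        clock += rng.expovariate(1.0 / average_wait)  # wait for the next claim
        if clock > horizon:
            return path
        total += minimum + rng.gammavariate(shape, scale)  # claim severity
        path.append((clock, total))


path = simulate_total_claims(10.0, 0.1, 160, 300, 100, seed=7)
print(len(path), path[-1])  # number of claims and the final running total
```

Plotting the running total against time gives the staircase pattern of a Compound Poisson Process: flat between claims, with a jump of random size at each claim.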

  49. Compound Poisson Process [Figure: total claims against time; the size of each claim is Gamma Distributed and the waiting time between claims is Exponentially Distributed with an average of 0.1 days]

  50. Simplifying the Poisson Process: The Poisson Distribution • The Poisson Process is too detailed and abstract for most of the Risk Analysis carried out by Insurance Companies • Normally we are interested in the number of losses or events that occur over a time period, not the exact timing of those events • Also, we do not think in terms of the average time between events but the average number of events that occur • Converting average waiting time to average frequency is very simple • We can also use the Poisson Process to count the number of events that occur over a period to obtain the Poisson Distribution
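The link between average waiting time and average frequency can be illustrated by counting the events of a simulated Poisson Process. In this Python sketch the average frequency over a period is taken to be the length of the period divided by the average waiting time; the function name, the average waiting time of 0.1, the number of trials and the seed are assumptions for the example.

```python
import random


def count_claims(period, average_wait, rng):
    """Number of claims of a Poisson Process that arrive within `period`."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(1.0 / average_wait)
        if clock > period:
            return count
        count += 1


average_wait = 0.1
expected_frequency = 1.0 / average_wait  # average number of claims per unit of time

rng = random.Random(2024)
counts = [count_claims(1.0, average_wait, rng) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)
print(expected_frequency, mean_count)  # the two should be close
```

The list of counts is a sample from the distribution of the number of claims per period, which is the Poisson Distribution referred to above.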
