Toward a unified approach to fitting loss models


Jacques Rioux and Stuart Klugman, for presentation at the IAC, Feb. 9, 2004. Handout/slides: e-mail me at [email protected]



Overview
• What problem is being addressed?
• The general idea
• The specific ideas
• Models to consider
• Recording the data
• Representing the data
• Testing a model
• Selecting a model
The problem
• Too many models
• Two books – 26 distributions!
• Can mix or splice to get even more
• Data can be confusing
• Deductibles, limits
• Too many tests and plots
• Chi-square, K-S, A-D, p-p, q-q, D
The general idea
• Limited number of distributions
• Standard way to present data
• Retain flexibility on testing and selection
Distributions
• Should be
• Familiar
• Few
• Flexible
A few familiar distributions
• Exponential
• Only one parameter
• Gamma
• Two parameters, a mode if α > 1.
• Lognormal
• Two parameters, a mode
• Pareto
• Two parameters, a heavy right tail
Flexible
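• Flexibility comes from mixing the familiar distributions. That is,

$$F(x) = \sum_{j=1}^{k} a_j F_j(x),$$

where

$$a_1 + a_2 + \cdots + a_k = 1$$

and all $a_j > 0$.
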
• Some restrictions:
• Only the exponential can be used more than once.
• Cannot use both the gamma and lognormal.
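A mixture of this kind can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function names and the parameter values in the usage lines are hypothetical, the weights are assumed to be positive and to sum to one, and the exponential and lognormal cdfs use their standard parameterizations.

```python
import math

def exponential_cdf(x, theta):
    """Exponential cdf with mean theta."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def lognormal_cdf(x, mu, sigma):
    """Lognormal cdf, where ln X is normal with mean mu and sd sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_cdf(x, weights, component_cdfs):
    """Weighted sum of component cdfs (weights positive, summing to 1)."""
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return sum(w * cdf(x) for w, cdf in zip(weights, component_cdfs))

# Hypothetical lognormal/exponential mixture (illustrative parameters only).
components = [
    lambda x: lognormal_cdf(x, mu=6.0, sigma=1.0),
    lambda x: exponential_cdf(x, theta=2000.0),
]
weights = [0.6, 0.4]
print(mixture_cdf(1000.0, weights, components))
```

Passing the components as callables keeps the mixture function independent of which familiar distributions are combined.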
Why mixtures?
• Allows different shape at beginning and end (e.g. mode from lognormal, tail from Pareto).
• By using several exponentials, the mixture can have almost any tail weight (see Keatinge).
Estimating parameters
• Use only maximum likelihood
• Asymptotically optimal
• Can be applied in all settings, regardless of the nature of the data
• Likelihood value can be used to compare different models
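To illustrate why maximum likelihood copes with deductibles and limits, here is a minimal sketch for the exponential model only. It assumes each observation is recorded as a hypothetical triple `(d, x, censored)`, where `d` is the truncation point (deductible), `x` is the ground-up loss or the censoring point, and `censored` marks losses capped by a limit. The data values are made up for illustration.

```python
import math

def exp_loglik(theta, data):
    """Exponential loglikelihood with left truncation and right censoring.

    Uncensored: ln f(x) - ln S(d) = -ln(theta) - (x - d) / theta
    Censored:   ln S(x) - ln S(d) = -(x - d) / theta
    """
    total = 0.0
    for d, x, censored in data:
        total -= (x - d) / theta
        if not censored:
            total -= math.log(theta)
    return total

def exp_mle(data):
    """Closed-form maximizer: total excess over deductibles / uncensored count."""
    exposure = sum(x - d for d, x, _ in data)
    n_uncensored = sum(1 for _, _, censored in data if not censored)
    if n_uncensored == 0:
        raise ValueError("need at least one uncensored observation")
    return exposure / n_uncensored

# Made-up observations: (deductible, loss or censoring point, censored?)
data = [(100, 350, False), (100, 1100, True), (250, 700, False), (500, 900, False)]
theta_hat = exp_mle(data)
print(theta_hat, exp_loglik(theta_hat, data))
```

For the other distributions there is generally no closed form, so the same loglikelihood structure would be handed to a numerical optimizer.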
Representing the data
• Why do we care?
• Graphical tests require a graph of the empirical density or distribution function.
• Hypothesis tests require the functions themselves.
What is the issue?
• None if:
• All observations are discrete or grouped
• There is no truncation or censoring
• But if there is truncation or censoring:
• For discrete data, the Kaplan-Meier product-limit estimator provides the empirical distribution function (and is the nonparametric MLE as well).
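A minimal sketch of the product-limit estimator follows. It assumes the same hypothetical `(d, x, censored)` layout (truncation point, observed value, censoring flag) and the common convention that an observation is at risk at `y` when `d < y <= x`. With truncation, the result is conditional on survival past the smallest truncation point.

```python
def kaplan_meier_cdf(data):
    """Return [(y_j, F(y_j))] at each distinct uncensored value y_j.

    data: iterable of (d, x, censored) with d < x.
    """
    data = list(data)
    if any(d >= x for d, x, _ in data):
        raise ValueError("each observation needs d < x")
    event_points = sorted({x for _, x, censored in data if not censored})
    survival = 1.0
    steps = []
    for y in event_points:
        # Risk set: entered before y and not yet observed or censored.
        at_risk = sum(1 for d, x, _ in data if d < y <= x)
        events = sum(1 for _, x, censored in data if not censored and x == y)
        survival *= 1.0 - events / at_risk
        steps.append((y, 1.0 - survival))
    return steps

# With no truncation or censoring this reduces to the usual empirical cdf.
print(kaplan_meier_cdf([(0, 1, False), (0, 2, False), (0, 3, False), (0, 4, False)]))
```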
Issue – grouped data
• For grouped data,
• If completely grouped, the histogram represents the pdf, the ogive the cdf.
• If some observations are grouped and some are not, or there are multiple deductibles and limits, our suggestion is to replace the observations in each interval with that many equally spaced points.
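The replacement step can be sketched as below. The slide does not say exactly where the equally spaced points sit, so this sketch assumes they are the midpoints of `n` equal-width subintervals; the function name is hypothetical.

```python
def spread_points(lower, upper, n):
    """Replace n grouped observations in (lower, upper) with n equally
    spaced points (assumed here: midpoints of n equal subintervals)."""
    if n < 1 or upper <= lower:
        raise ValueError("need n >= 1 and lower < upper")
    width = upper - lower
    return [lower + width * (2 * j - 1) / (2 * n) for j in range(1, n + 1)]

# Four grouped observations in the interval (100, 200).
print(spread_points(100.0, 200.0, 4))
```

The resulting points can then be treated as individual observations when building the empirical distribution function.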
Review
• Given a data set, we have the following:
• A way to represent the data.
• A limited set of models to consider.
• Parameter estimates for each model.
• Decide which models are acceptable.
• Decide which model to use.
Example
• The paper has two examples; we will look only at the second one.
• Data are individual payments, but the policies that produced them had different deductibles (100, 250, 500) and different maximum payments (1,000, 3,000, 5,000).
• There are 100 observations.
Distribution function plot
• Plot the empirical and model cdfs together. Note, because in this example the smallest deductible is 100, the empirical cdf begins there.
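• To be comparable, the model cdf is calculated conditional on the loss exceeding 100:

$$F^*(x) = \frac{F(x) - F(100)}{1 - F(100)}, \quad x \ge 100.$$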
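A minimal sketch of a model cdf that starts at the smallest deductible, assuming the usual left-truncation adjustment (condition on the loss exceeding the truncation point). The exponential model and its parameter are placeholders, not the fitted model from the example.

```python
import math

def truncated_cdf(model_cdf, x, d):
    """Model cdf conditional on exceeding d: (F(x) - F(d)) / (1 - F(d))."""
    if x <= d:
        return 0.0
    f_d = model_cdf(d)
    return (model_cdf(x) - f_d) / (1.0 - f_d)

def exp_cdf(x, theta=1000.0):
    """Placeholder model: exponential with an illustrative mean of 1000."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

# The adjusted cdf is 0 at the deductible and rises from there.
print(truncated_cdf(exp_cdf, 100.0, 100.0), truncated_cdf(exp_cdf, 1100.0, 100.0))
```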
Example model
• All plots and tests that follow are for a mixture of a lognormal and exponential distribution. The parameters are
Confidence bands
• It is possible to create 95% confidence bands. That is, we are 95% confident that the true distribution is completely within these bands.
• Formulas adapted from Klein and Moeschberger with a modification for multiple truncation points (their formula allows only multiple censoring points).
Other CDF pictures
• Any function of the cdf, such as the limited expected value, could be plotted.
• The only one shown here is the difference plot, which magnifies the previous plot by plotting the difference of the two distribution functions.
Histogram plot
• Plot a histogram of the data against the density function of the model.
• For data that were not grouped, use the empirical cdf to get cell probabilities.
Hypothesis tests
• Null: the model fits
• Alternative: it doesn’t
• Three tests
• Kolmogorov-Smirnov
• Anderson-Darling
• Chi-square
Kolmogorov-Smirnov
• Test statistic is maximum difference between the empirical and model cdfs. Each difference is multiplied by a scaling factor related to the sample size at that point.
• Critical values are way off when parameters are estimated from the data.
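As a hedged illustration, the sketch below computes the plain Kolmogorov-Smirnov distance for complete individual data. It deliberately omits the sample-size scaling factor and the truncation handling described on the slide, so it is the textbook statistic rather than the paper's modified one.

```python
from bisect import bisect_left, bisect_right

def ks_statistic(sample, model_cdf):
    """Largest gap between the empirical and model cdfs, checked just
    before and at each distinct data point (where the empirical cdf jumps)."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    largest = 0.0
    for x in sorted(set(xs)):
        m = model_cdf(x)
        before = bisect_left(xs, x) / n   # empirical cdf just below x
        at = bisect_right(xs, x) / n      # empirical cdf at x
        largest = max(largest, abs(before - m), abs(at - m))
    return largest

# Toy check against a uniform(0, 1) model.
print(ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0)))
```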
Anderson-Darling
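• Test statistic looks complex. In its standard form, with t the lowest truncation point and u the highest censoring point:

$$A^2 = n \int_t^u \frac{[F_e(x) - F_m(x)]^2}{F_m(x)\,[1 - F_m(x)]}\, f_m(x)\, dx$$

• where e is empirical and m is model.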
• The paper shows how to turn this into a sum.
• More emphasis on fit in tails than for K-S test.
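The paper's own sum, which handles truncation and censoring, is not reproduced here. As a hedged illustration of how the integral collapses to a sum, the sketch below uses the well-known computing formula for complete, untruncated individual data: A² = −n − (1/n) Σ (2i − 1)[ln F(x₍ᵢ₎) + ln(1 − F(x₍ₙ₊₁₋ᵢ₎))].

```python
import math

def anderson_darling(sample, model_cdf):
    """Anderson-Darling statistic for complete individual data.

    Requires 0 < model_cdf(x) < 1 at every data point.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    cdf_vals = [model_cdf(x) for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        lower = cdf_vals[i - 1]   # F at the i-th smallest value
        upper = cdf_vals[n - i]   # F at the i-th largest value
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n

# Toy check against a uniform(0, 1) model.
print(anderson_darling([0.25, 0.5, 0.75], lambda x: x))
```

The logarithms of F and 1 − F are what give the tails more weight than in the Kolmogorov-Smirnov statistic.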
Chi-square test
• You have seen this one before.
• It is the only one with an adjustment for estimating parameters.
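A minimal sketch of the chi-square computation, including the degrees-of-freedom adjustment for estimated parameters (cells minus one minus the number of estimated parameters). The cell counts are made up, and the p-value lookup is left out because the Python standard library has no chi-square distribution function.

```python
def chi_square_statistic(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_df(num_cells, num_estimated_params):
    """Degrees of freedom: cells - 1 - number of estimated parameters."""
    return num_cells - 1 - num_estimated_params

# Made-up counts for three cells and a one-parameter model.
stat = chi_square_statistic([30, 50, 20], [25.0, 50.0, 25.0])
df = chi_square_df(num_cells=3, num_estimated_params=1)
print(stat, df)
```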
Results
• K-S: 0.5829
• A-D: 0.2570
• Chi-square p-value of 0.5608
• The model is clearly acceptable. A simulation study is needed to get p-values for the K-S and A-D tests; simulation indicates that the p-values are over 0.9.
Comparing models
• Good picture
• Better test numbers
• Likelihood criterion such as Schwarz Bayesian. The SBC is the loglikelihood minus (r/2)ln(n) where r is the number of parameters and n is the sample size.
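The SBC as defined on this slide is simple to compute. In the sketch below the loglikelihood values and parameter counts are hypothetical placeholders, not the results from the paper; under this sign convention a larger SBC is better.

```python
import math

def sbc(loglik, r, n):
    """Schwarz Bayesian criterion: loglikelihood minus (r / 2) * ln(n)."""
    return loglik - (r / 2.0) * math.log(n)

# Hypothetical candidates: (name, loglikelihood, number of parameters).
candidates = [("model A", -628.0, 1), ("model B", -626.5, 4)]
n = 100
scores = {name: sbc(ll, r, n) for name, ll, r in candidates}
best = max(scores, key=scores.get)
print(scores, best)
```

With these made-up numbers the extra parameters of model B are not worth the penalty, which is the kind of trade-off the criterion formalizes.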
Which is the winner?
• Referee A – loglikelihood rules – pick gamma/exp/exp mixture
• This is a world of one big model, and the best is the best; simplicity is never an issue.
• Referee B – SBC rules – pick exponential
• Parsimony is most important, pay a penalty for extra parameters.
• Me – lognormal/exp. Great pictures, better numbers than the exponential, but simpler than the three-component mixture.
Can this be automated?
• We are working on software