
On Predictive Modeling for Claim Severity


Presentation Transcript


  1. On Predictive Modeling for Claim Severity Glenn Meyers ISO Innovative Analytics CARe Seminar June 6-7, 2005

  2. Problems with Experience Rating for Excess of Loss Reinsurance
  • Use submission claim severity data
    • Relevant, but
    • Not credible
    • Not developed
  • Use industry distributions
    • Credible, but
    • Not relevant (???)

  3. General Problems with Fitting Claim Severity Distributions
  • Parameter uncertainty: fitted parameters of the chosen model are estimates subject to sampling error.
  • Model uncertainty: we might choose the wrong model. There is no particular reason that the models we choose are appropriate.
  • Loss development: complete claim settlement data is not always available.

  4. Outline of Remainder of Talk
  • Quantifying parameter uncertainty
    • Likelihood ratio test
  • Incorporating model uncertainty
    • Use Bayesian estimation with likelihood functions
    • Uncertainty in excess layer loss estimates
  • Bayesian estimation with prior models based on data reported to a statistical agent
    • Reflects insurer heterogeneity
    • Develops losses

  5. How the Paper is Organized
  • Start with classical hypothesis testing.
    • Likelihood ratio test
  • Calculate a confidence region for parameters.
  • Calculate a confidence interval for a function of the parameters.
    • For example, the expected loss in a layer
  • Introduce a prior distribution of parameters.
  • Calculate the predictive mean for a function of the parameters (written out below).
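In symbols (a standard Bayesian statement, not quoted from the paper): for a function h(θ, α) of the parameters, such as the expected loss in a layer, the predictive mean is

$$ \mathrm{E}\left[h(\theta,\alpha) \mid \text{data}\right] \;=\; \int h(\theta,\alpha)\, p(\theta,\alpha \mid \text{data})\, d\theta\, d\alpha, $$

where p(θ, α | data) is the posterior density of the parameters.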

  6. The Likelihood Ratio Test

  7. The Likelihood Ratio Test
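These two slides carried the statement of the test itself. As a standard reconstruction consistent with the χ² values used on slides 9 and 10 (two free parameters, hence 2 degrees of freedom):

$$ \text{lnLR} \;=\; 2\left[\ln L(\hat{\theta},\hat{\alpha}) - \ln L(\theta_0,\alpha_0)\right] \;\sim\; \chi^2_2 \ \text{under } H_0, $$

where $(\hat{\theta},\hat{\alpha})$ is the maximum likelihood estimate and $(\theta_0,\alpha_0)$ the hypothesized pair; reject $H_0$ when lnLR exceeds the critical value (5.991 at the 5% level).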

  8. An Example – The Pareto Distribution
  • Simulate a random sample of size 1000 with α = 2.000, θ = 10,000
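A minimal sketch of this simulation (my own code, not the paper's), assuming the two-parameter Pareto with survival function S(x) = (θ/(x+θ))^α and using inverse-transform sampling:

```python
import numpy as np

# Sketch of slide 8: simulate 1,000 claims from a two-parameter Pareto,
# S(x) = (theta / (x + theta))**alpha, with alpha = 2.0 and theta = 10,000.
rng = np.random.default_rng(seed=0)
alpha, theta = 2.0, 10_000.0

u = rng.uniform(size=1_000)                       # u plays the role of S(x)
claims = theta * (u ** (-1.0 / alpha) - 1.0)      # invert the survival function

def pareto_loglik(theta, alpha, x):
    """Log-likelihood of the two-parameter Pareto: f(x) = a*t^a / (x+t)^(a+1)."""
    return np.sum(np.log(alpha) + alpha * np.log(theta)
                  - (alpha + 1.0) * np.log(x + theta))

print(pareto_loglik(theta, alpha, claims))        # comparable in scale to the slides' values
```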

  9. Hypothesis Testing Example
  • Significance level = 5%; χ² critical value = 5.991
  • H0: (θ, α) = (10000, 2)
  • H1: (θ, α) ≠ (10000, 2)
  • lnLR = 2(-10034.660 + 10035.623) = 1.926
  • Accept H0

  10. Hypothesis Testing Example
  • Significance level = 5%; χ² critical value = 5.991
  • H0: (θ, α) = (10000, 1.7)
  • H1: (θ, α) ≠ (10000, 1.7)
  • lnLR = 2(-10034.660 + 10045.975) = 22.631
  • Reject H0
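A sketch of the test mechanics (illustrative code, plugging in the log-likelihood values reported on the two slides):

```python
from scipy.stats import chi2

# Likelihood ratio test with 2 degrees of freedom (two free Pareto parameters).
critical = chi2.ppf(0.95, df=2)                   # 5.991 at the 5% level

def lr_test(loglik_mle, loglik_h0):
    lnlr = 2.0 * (loglik_mle - loglik_h0)
    return lnlr, ("Reject H0" if lnlr > critical else "Accept H0")

print(lr_test(-10034.660, -10035.623))            # slide 9: accept H0
print(lr_test(-10034.660, -10045.975))            # slide 10: 22.631, reject H0
```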

  11. Confidence Region
  • The X% confidence region corresponds to the hypothesis test at the (100-X)% significance level.
  • It is the set of all parameters (θ, α) that fail to reject the corresponding H0.
  • For the 95% confidence region:
    • (10000, 2.0) is in.
    • (10000, 1.7) is out.

  12. Confidence Region: Outer Ring 95%, Inner Ring 50%
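One way to trace these rings (an assumed approach, not necessarily the paper's): keep every (θ, α) pair whose likelihood ratio statistic against the grid maximum stays below the χ² critical value.

```python
import numpy as np
from scipy.stats import chi2

def confidence_region(x, thetas, alphas, level=0.95):
    """Boolean mask over a (theta, alpha) grid: True = inside the region."""
    def loglik(theta, alpha):
        return np.sum(np.log(alpha) + alpha * np.log(theta)
                      - (alpha + 1.0) * np.log(x + theta))
    ll = np.array([[loglik(t, a) for a in alphas] for t in thetas])
    lnlr = 2.0 * (ll.max() - ll)                  # grid maximum stands in for the MLE
    return lnlr <= chi2.ppf(level, df=2)

# e.g. confidence_region(claims, np.linspace(6e3, 2e4, 60), np.linspace(1.5, 2.7, 60))
```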

  13. Grouped Data
  • Data grouped into four intervals:
    • 562 under 5,000
    • 181 between 5,000 and 10,000
    • 134 between 10,000 and 20,000
    • 123 over 20,000
  • Same data as before, only less information is given.
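With grouped data the likelihood becomes multinomial, with cell probabilities read off the fitted Pareto CDF. A sketch (my code; the four cells are slide 13's intervals):

```python
import numpy as np

counts = np.array([562, 181, 134, 123])                    # slide 13's four intervals
bounds = np.array([0.0, 5_000.0, 10_000.0, 20_000.0, np.inf])

def grouped_loglik(theta, alpha):
    cdf = 1.0 - (theta / (bounds + theta)) ** alpha        # Pareto CDF at the boundaries
    return np.sum(counts * np.log(np.diff(cdf)))           # multinomial log-likelihood

print(grouped_loglik(10_000.0, 2.0))
```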

  14. Confidence Region for Grouped Data: Outer Ring 95%, Inner Ring 50%

  15. Confidence Region for Ungrouped Data: Outer Ring 95%, Inner Ring 50%

  16. Estimation with Model Uncertainty: COTOR Challenge – November 2004
  • COTOR published 250 claims.
  • The distributional form was not revealed to participants.
  • Participants were challenged to estimate the cost of a $5M x $5M layer.
  • Estimate a confidence interval for the pure premium.
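For reference, the expected loss per ground-up claim in a layer is the integral of the survival function across the layer. A sketch with an illustrative (not the challenge's) severity distribution:

```python
from scipy.integrate import quad

# Expected loss per claim in a layer of width `limit` attaching at `attachment`:
# E[min(X, a+l)] - E[min(X, a)] = integral of S(x) from a to a+l.
def layer_pure_premium(survival, attachment=5e6, limit=5e6):
    value, _ = quad(survival, attachment, attachment + limit)
    return value

# Illustrative Pareto severity (parameters chosen arbitrarily):
alpha, theta = 1.5, 50_000.0
print(layer_pure_premium(lambda x: (theta / (x + theta)) ** alpha))
```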

  17. You want to fit a distribution to 250 claims • Knee-jerk first reaction: plot a histogram.

  18. This will not do! Take logs, and fit some standard distributions.

  19. Still looks skewed. Take double logs, and fit some standard distributions.

  20. Still looks skewed. Take triple logs. • Still some skewness. • The lognormal and gamma fits look somewhat better (see the sketch below).
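A sketch of the transform-and-fit step as I read it (illustrative data standing in for the 250 COTOR claims, which are not reproduced here):

```python
import numpy as np
from scipy import stats

# Illustrative claims only: a triple-lognormal sample of size 250.
rng = np.random.default_rng(seed=1)
claims = np.exp(np.exp(np.exp(rng.normal(0.8, 0.05, size=250))))

def repeated_log(x, times):
    """Apply np.log `times` times ("double log", "triple log", ...)."""
    for _ in range(times):
        x = np.log(x)
    return x

transformed = repeated_log(claims, times=3)       # "triple log"
mu, sigma = stats.norm.fit(transformed)           # triple lognormal candidate
a, loc, scale = stats.gamma.fit(transformed)      # triple loggamma candidate
```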

  21. Candidate #1: Quadruple lognormal

  22. Candidate #2: Triple loggamma

  23. Candidate #3: Triple lognormal

  24. All three CDFs are within the confidence interval for the quadruple lognormal.

  25. Elements of Solution
  • Three candidate models:
    • Quadruple lognormal
    • Triple loggamma
    • Triple lognormal
  • Parameter uncertainty within each model
  • Construct a series of models, each consisting of:
    • One of the three candidate models
    • Parameters within a broad confidence interval for that model
  • 7803 possible models in all

  26. Steps in Solution
  • Calculate the likelihood (given the data) for each model.
  • Use Bayes' Theorem to calculate the posterior probability of each model.
    • Each model has equal prior probability.

  27. Steps in Solution
  • Calculate the layer pure premium of the 5 x 5 layer for each model.
  • The expected pure premium is the posterior-probability-weighted average of the model layer pure premiums.
  • The second moment of the pure premium is the posterior-probability-weighted average of the squared model layer pure premiums.
  • (A sketch of these two steps appears below.)
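A minimal sketch of these steps (assumed implementation; `logliks` and `layer_pps` are hypothetical arrays holding each candidate model's log-likelihood and layer pure premium):

```python
import numpy as np

def predictive_moments(logliks, layer_pps):
    """Posterior-weighted mean and standard deviation of the layer pure premium."""
    logliks = np.asarray(logliks, dtype=float)
    layer_pps = np.asarray(layer_pps, dtype=float)
    weights = np.exp(logliks - logliks.max())     # stabilized; equal priors cancel
    posterior = weights / weights.sum()           # Bayes' Theorem with equal priors
    mean = np.sum(posterior * layer_pps)
    second_moment = np.sum(posterior * layer_pps ** 2)
    return mean, np.sqrt(second_moment - mean ** 2)
```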

  28. CDF of Layer Pure Premium • The probability that the layer pure premium is ≤ x equals the sum of the posterior probabilities of the models whose layer pure premium is ≤ x (in symbols below).
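In symbols, with $PP_m$ denoting model m's layer pure premium:

$$ \Pr\{PP \le x\} \;=\; \sum_{m:\, PP_m \le x} \Pr\{\text{model } m \mid \text{data}\} $$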

  29. Numerical Results

  30. Histogram of Predictive Pure Premium

  31. Example with Insurance Data
  • Continue with Bayesian estimation.
  • Liability insurance claim severity data.
  • Prior distributions derived from models based on individual insurer data.
  • Prior models reflect the maturity of the claim data used in the estimation.

  32. Initial Insurer Models
  • Selected 20 insurers
    • Claim counts in the thousands
  • Fit a mixed exponential distribution to the data of each insurer (density written out below)
  • Initial fits had volatile tails
  • Truncation issues
    • Do small claims predict the likelihood of large claims?
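For reference, the mixed exponential density in its standard textbook form (the slide does not spell out the parametrization):

$$ f(x) \;=\; \sum_{i=1}^{k} w_i \, \frac{1}{\mu_i}\, e^{-x/\mu_i}, \qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1 $$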

  33. Initial Insurer Models

  34. Low Truncation Point

  35. High Truncation Point

  36. Selections Made
  • Truncation point = $100,000
  • A family of CDFs that has the "correct" behavior
    • Admittedly the definition of "correct" is debatable, but
    • The choices are transparent!

  37. Selected Insurer Models

  38. Selected Insurer Models

  39. Each model consists of:
  1. The claim severity distribution for all claims settled within 1 year
  2. The claim severity distribution for all claims settled within 2 years
  3. The claim severity distribution for all claims settled within 3 years
  4. The ultimate claim severity distribution for all claims
  5. The ultimate limited average severity curve

  40. Three Sample Insurers: Small, Medium and Large
  • Each has three years of data.
  • Calculate likelihood functions:
    • Most recent year against #1 on the prior slide
    • 2nd most recent year against #2 on the prior slide
    • 3rd most recent year against #3 on the prior slide
  • Use Bayes' Theorem to calculate the posterior probability of each model.

  41. Formulas for Posterior Probabilities
  • Model m: cell probabilities $p_i^{(m)}$; number of claims observed in cell i is $n_i$
  • Likelihood: $L(m) = \prod_i \bigl(p_i^{(m)}\bigr)^{n_i}$
  • Using Bayes' Theorem (equal prior probabilities): $\Pr\{m \mid \text{data}\} = L(m) \big/ \sum_k L(k)$
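A minimal numeric sketch of these formulas (my code; `cell_probs_by_model` is a hypothetical list holding each model's cell probabilities at the relevant maturity):

```python
import numpy as np

def posterior_probabilities(cell_probs_by_model, counts):
    """Multinomial log-likelihood per model, then Bayes with equal priors."""
    counts = np.asarray(counts, dtype=float)
    logliks = np.array([np.sum(counts * np.log(np.asarray(p)))
                        for p in cell_probs_by_model])
    weights = np.exp(logliks - logliks.max())     # guard against underflow
    return weights / weights.sum()
```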

  42. Results (taken from the paper).

  43. Formulas for Ultimate Layer Pure Premium • Use item #5 on the model slide (slide 39), the ultimate limited average severity curve, to calculate the ultimate layer pure premium, as written out below.
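Written out (a standard identity; the symbols are mine): if $\mathrm{LAS}_m(x)$ is model m's ultimate limited average severity at limit x, a layer of width l attaching at a has, per claim,

$$ PP_m \;=\; \mathrm{LAS}_m(a+l) - \mathrm{LAS}_m(a), \qquad PP \;=\; \sum_m \Pr\{m \mid \text{data}\}\, PP_m. $$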

  44. Results
  • All insurers were simulated from the same population.
  • The posterior standard deviation decreases with insurer size.

  45. Possible Extensions
  • Obtain a model for individual insurers.
  • Obtain data for the insurer of interest.
  • Calculate the likelihood, Pr{data | model}, for each insurer's model.
  • Use Bayes' Theorem to calculate the posterior probability of each model.
  • Calculate the statistic of choice using the models and posterior probabilities.
    • e.g., loss reserves
