Distinguishing the forest from the trees 2006 cas ratemaking seminar
This presentation is the property of its rightful owner.
Sponsored Links
1 / 26

Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar PowerPoint PPT Presentation


  • 45 Views
  • Uploaded on
  • Presentation posted in: General

Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar. Richard Derrig, PhD, Opal Consulting www.opalconsulting.com Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com. Data Mining.

Download Presentation

Distinguishing the Forest from the Trees 2006 CAS Ratemaking Seminar

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Distinguishing the forest from the trees 2006 cas ratemaking seminar

Distinguishing the Forest from the Trees2006 CAS Ratemaking Seminar

Richard Derrig, PhD,

Opal Consulting

www.opalconsulting.com

Louise Francis, FCAS, MAAA

Francis Analytics and Actuarial Data Mining, Inc.

www.data-mines.com


Data mining

Data Mining

  • Data Mining, also known as Knowledge-Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition.

    • www.wikipedia.org


The problem nonlinear functions

The Problem: Nonlinear Functions


An insurance nonlinear function provider bill vs probability of independent medical exam

An Insurance Nonlinear Function:Provider Bill vs. Probability of Independent Medical Exam


Common links for glms

Common Links for GLMs

The identity link: h(Y) = Y

The log link: h(Y) = ln(Y)

The inverse link: h(Y) =

The logit link: h(Y) =

The probit link: h(Y) =


Nonlinear example data

Nonlinear Example Data


Desirable features of a data mining method

Desirable Features of a Data Mining Method:

  • Any nonlinear relationship can be approximated

  • A method that works when the form of the nonlinearity is unknown

  • The effect of interactions can be easily determined and incorporated into the model

  • The method generalizes well on out-of sample data


Decision trees

Decision Trees

  • In decision theory (for example risk management), a decision tree is a graph of decisions and their possible consequences, (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure.

    • www.wikipedia.org


Different kinds of decision trees

Different Kinds of Decision Trees

  • Single Trees (CART, CHAID)

  • Ensemble Trees, a more recent development (TREENET, RANDOM FOREST)

    • A composite or weighted average of many trees (perhaps 100 or more)

    • There are many methods to fit the trees and prevent overfitting

      • Boosting: Iminer Ensemble and Treenet

      • Bagging: Random Forest


The methods and software evaluated

The Methods and Software Evaluated

1) TREENET5) Iminer Ensemble

2) Iminer Tree6) Random Forest

3) SPLUS Tree7) Naïve Bayes (Baseline)

4) CART8) Logistic (Baseline)


Cart example of parent and children nodes total paid as a function of provider 2 bill

CART Example of Parent and Children NodesTotal Paid as a Function of Provider 2 Bill


Cart example with seven nodes ime proportion as a function of provider 2 bill

CART Example with Seven Nodes IME Proportion as a Function of Provider 2 Bill


Cart example with seven step functions ime proportions as a function of provider 2 bill

CART Example with Seven Step FunctionsIME Proportions as a Function of Provider 2 Bill


Ensemble prediction of total paid

Ensemble Prediction of Total Paid


Ensemble prediction of ime requested

Ensemble Prediction of IME Requested


Naive bayes

Naive Bayes

  • Naive Bayes assumes conditional independence

  • Probability that an observation will have a specific set of values for the independent variables is the product of the conditional probabilities of observing each of the values given category cj


Bayes predicted probability ime requested vs quintile of provider 2 bill

Bayes Predicted Probability IME Requested vs. Quintile of Provider 2 Bill


Na ve bayes predicted ime vs provider 2 bill

Naïve Bayes Predicted IME vs. Provider 2 Bill


The fraud surrogates used as dependent variables

The Fraud Surrogates used as Dependent Variables

  • Independent Medical Exam (IME) requested

  • Special Investigation Unit (SIU) referral

  • IME successful

  • SIU successful

  • DATA: Detailed Auto Injury Claim Database for Massachusetts

  • Accident Years (1995-1997)


Results for ime requested

Results for IME Requested


Treenet roc curve ime auroc 0 701

TREENET ROC Curve – IMEAUROC = 0.701


Ranking of methods software 1 st two surrogates

Ranking of Methods/Software – 1st Two Surrogates


Ranking of methods software last two surrogates

Ranking of Methods/Software – Last Two Surrogates


Plot of auroc for siu vs ime decision

Plot of AUROC for SIU vs. IME Decision


Plot of auroc for siu vs ime favorable

Plot of AUROC for SIU vs. IME Favorable


References

References

  • Brieman, L., J. Freidman, R. Olshen, and C. Stone, Classification and Regression Trees, Chapman Hall, 1993.

  • Friedman, J., Greedy Function Approximation: The Gradient Boosting Machine, Annals of Statistics, 2001.

  • Hastie, T., R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, New York., 2001.

  • Kantardzic, M., Data Mining, John Wiley and Sons, 2003.


  • Login