
CANE 2007 Spring Meeting Visualizing Predictive Modeling Results


Presentation Transcript


  1. CANE 2007 Spring Meeting Visualizing Predictive Modeling Results Chuck Boucek (312) 879-3859

  2. Agenda • Data Validation • Hypothesis Building • Model Building • Model Testing • Monitoring • Visualization as a Diagnostic Tool

  3. Data Validation • Goals • Validate reasonableness of data • Understand key patterns in data • Understand changes in data and underlying business through time

  4. Data Validation • Histogram is a simple tool for reasonability testing of the modeling database
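The reasonability check a histogram supports can be sketched in code; the field name and values below are hypothetical, and the pure-Python binning is a minimal stand-in for a plotting tool:

```python
from collections import Counter

def histogram(values, bin_width):
    """Count observations falling in each bin of the given width."""
    return Counter(int(v // bin_width) * bin_width for v in values)

# Hypothetical policy-level field: annual payroll in $000s.
payroll = [12, 14, 15, 48, 51, 53, 55, 980]
counts = histogram(payroll, bin_width=10)

# A value of 980 among payrolls clustered near 10-60 stands out
# immediately -- the kind of data error a histogram surfaces.
for b in sorted(counts):
    print(f"{b:>4}-{b + 9:<4} {'#' * counts[b]}")
```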

  5. Data Validation • Mosaic Plot shows the distribution of predictors in two dimensions
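A mosaic plot is drawn from the two-way joint distribution of two categorical predictors; a minimal sketch of that underlying table, with hypothetical territory and credit fields:

```python
from collections import Counter

# Hypothetical categorical predictors on a policy file:
# (territory, credit_flag) for each record.
records = [("Urban", "Yes"), ("Urban", "Yes"), ("Urban", "No"),
           ("Rural", "No"), ("Rural", "No"), ("Rural", "Yes")]

pairs = Counter(records)
total = sum(pairs.values())

# The mosaic plot draws each cell with area proportional to its
# share of the data; the table below holds those shares.
for (terr, credit), n in sorted(pairs.items()):
    print(f"territory={terr:5s} credit={credit:3s} share={n / total:.2f}")
```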

  6. Data Validation • Missing Data plot shows the relationship of missing data elements
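The relationship a missing-data plot displays is the rate at which fields are missing together; a small sketch, with hypothetical records using None for missing values:

```python
# Hypothetical records with None marking missing fields.
records = [
    {"credit": 650,  "prior_claims": 1,    "tenure": 4},
    {"credit": None, "prior_claims": None, "tenure": 2},
    {"credit": None, "prior_claims": None, "tenure": None},
    {"credit": 700,  "prior_claims": 0,    "tenure": None},
]
fields = ["credit", "prior_claims", "tenure"]

# Fraction of records where each pair of fields is missing together --
# the relationship a missing-data plot makes visible.
n = len(records)
co_missing = {}
for i, a in enumerate(fields):
    for b in fields[i + 1:]:
        both = sum(r[a] is None and r[b] is None for r in records)
        co_missing[(a, b)] = both / n
        print(f"{a} & {b} both missing in {both / n:.0%} of records")
```

Here `credit` and `prior_claims` go missing together far more often than either does with `tenure`, which is what such a plot would highlight.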

  7. Data Validation • Time series plots identify consistency of data over time [Figure: Claims Match to Exposure ratio (0.0 to 1.0) by year, 1993 to 2004, for Company 1, Company 2, and Company 3]

  8. Hypothesis Building • Goals • Perform initial analysis of potential predictor variables • Limit the list of predictor variables to be employed in subsequent phases of model building • Further reasonability testing of data

  9. Hypothesis Building [Figure: six-panel exhibit for Demographic Variable 1: frequency, loss ratio, pure premium, severity, premium ($MM), and exposure ($MM) across the variable's range]

  10. Hypothesis Building • Quantile-Quantile plots help identify needed transformations of data
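A Q-Q plot pairs sorted sample values with theoretical quantiles; one possible sketch using Python's `statistics.NormalDist`, with hypothetical right-skewed severity data (a log transform is the kind of change such a plot suggests):

```python
from statistics import NormalDist
from math import log

def qq_points(sample):
    """Pair standard-normal quantiles with sorted sample values."""
    n = len(sample)
    nd = NormalDist()
    return [(nd.inv_cdf((i + 0.5) / n), v)
            for i, v in enumerate(sorted(sample))]

# Hypothetical right-skewed severity data.
severity = [1_000, 1_500, 2_000, 3_000, 5_000, 9_000, 20_000, 80_000]

# Raw data bends away from a straight line on the Q-Q plot; after a
# log transform the points fall much closer to linear, suggesting
# the transformation to apply before modeling.
raw = qq_points(severity)
logged = qq_points([log(v) for v in severity])
```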

  11. Hypothesis Building • Correlation Web concisely summarizes a correlation matrix
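The correlation web's edges come from an ordinary correlation matrix; a sketch, with hypothetical predictor columns and a 0.7 threshold chosen purely for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical predictor columns.
data = {
    "vehicle_age": [1, 3, 5, 7, 9],
    "annual_miles": [12, 11, 9, 8, 6],
    "driver_age": [25, 40, 33, 52, 47],
}

# The correlation web draws an edge for each pair whose |r| exceeds
# a threshold; here we simply list those edges.
names = list(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(data[a], data[b])
        if abs(r) > 0.7:
            print(f"{a} -- {b}: r = {r:+.2f}")
```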

  12. Model Building • Model building is an iterative process • Understanding patterns and relationships throughout this process is critical

  13. Model Building • Partial Plots are a key tool to visualize predictor variables throughout the model building process • What is a “Partial Plot”? Linear Predictor = k + b1*X1 + b2*X2 + b3*X3 + b4*X4, Predicted value = (e^k) x (e^(b1*X1)) x (e^(b2*X2)) x (e^(b3*X3)) x (e^(b4*X4)) • Partial Plot demonstrates an individual predictor variable's contribution to the final prediction
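The multiplicative decomposition above can be checked numerically; the intercept and coefficients below are hypothetical:

```python
from math import exp

# Hypothetical fitted coefficients for a log-link model:
# Predicted value = e^k * e^(b1*X1) * e^(b2*X2) * e^(b3*X3)
k = 0.1
betas = [0.05, -0.20, 0.30]
x = [2.0, 1.0, 0.5]

# Multiplicative factor contributed by each predictor -- the quantity
# a partial plot traces as one X varies with the others held fixed.
factors = [exp(b * xi) for b, xi in zip(betas, x)]
predicted = exp(k)
for f in factors:
    predicted *= f

# Equivalent to exponentiating the full linear predictor.
linear = k + sum(b * xi for b, xi in zip(betas, x))
assert abs(predicted - exp(linear)) < 1e-12
```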

  14. Model Building • Partial Plot demonstrates an individual predictor variable's contribution to final prediction [Figure: partial plot and exposure histogram over predictor values 0 to 30]

  15. Model Building • Partial Plot with modified scatter plot of variable [Figure: partial plot over predictor values 0 to 30]

  16. Model Building • Time Consistency plot is a critical tool for numeric predictors [Figure: year-by-year partial plots, 1997 to 2000, over predictor values 0 to 25]

  17. Model Building • Partial Plot for a factor variable [Figure: partial plot factors for Credit Level 1 and Credit Level 2, ranging roughly 0.70 to 1.30]

  18. Model Building [Figure: six-panel exhibit for Credit Variable 1 (Yes/No): frequency, loss ratio, pure premium, severity, premium ($MM), and exposure (pred. count)]

  19. Model Testing • Likely the most critical visualizations in predictive modeling work • Management’s perception of a project’s success will likely depend on these visualizations • Holdout tests • Cross validation tests

  20. Model Testing • Lift Chart shows overall model performance [Figure: Loss Ratio Lift Chart, Holdout Sample, with predicted and actual loss ratios by group]
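A loss-ratio lift chart sorts holdout policies by the model's prediction, splits them into groups, and compares average predicted against average actual in each group; a sketch with hypothetical holdout data (quartiles instead of deciles for brevity):

```python
# Hypothetical holdout policies: (predicted loss ratio, actual loss ratio).
policies = [(0.40, 0.35), (0.45, 0.50), (0.55, 0.52), (0.60, 0.70),
            (0.70, 0.65), (0.80, 0.90), (0.90, 0.85), (1.00, 1.10)]

# Sort by the model's prediction, split into equal groups, and compare
# average predicted vs. actual in each -- a rising actual line is lift.
policies.sort(key=lambda p: p[0])
n_groups = 4
size = len(policies) // n_groups
group_actual = []
for g in range(n_groups):
    chunk = policies[g * size:(g + 1) * size]
    pred = sum(p for p, _ in chunk) / len(chunk)
    act = sum(a for _, a in chunk) / len(chunk)
    group_actual.append(act)
    print(f"group {g + 1}: predicted {pred:.2f}  actual {act:.2f}")
```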

  21. Model Testing • ROC Curve shows overall model performance [Figure: Holdout Sample ROC Curve; reference points: Null 0, Perfect 1; areas: prem 0.51, pred.loss 0.56]
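The area under an ROC curve equals the probability that a randomly chosen positive case outranks a randomly chosen negative case, which is why competing rankings (premium alone vs. predicted loss, as on the slide) can be compared by their areas; a sketch with hypothetical claim labels and scores:

```python
def roc_auc(scores, labels):
    """AUC computed as the probability a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical claim indicator (1 = claim) and model scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_score = [0.9, 0.2, 0.35, 0.4, 0.3, 0.8, 0.5, 0.1]

# A higher area means better discrimination between claimants and
# non-claimants; 0.5 is no better than random ordering.
print(f"AUC = {roc_auc(model_score, labels):.3f}")
```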

  22. Model Testing • Classical Cross Validation exhibit [Figure: prediction error vs. number of predictors (5 to 25), with in-sample error declining steadily and out-of-sample error marking the number of variables in the final model]
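A classical cross-validation exhibit rests on k-fold splits: each candidate model size is fit on the training folds and scored on the held-out fold, and while in-sample error keeps falling as predictors are added, out-of-sample error eventually turns up. A sketch of the index bookkeeping (assignment of records to folds by stride is one simple choice):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# For each candidate number of predictors, one would fit on `train`,
# score on `test`, and average the k out-of-sample errors -- the
# out-of-sample curve in the exhibit above.
splits = list(k_fold_indices(10, 5))
```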

  23. Monitoring Model Results • The work does not end when the lift chart looks good • Monitoring tools • Decile management • Exception analysis • Model vs. Actual Results

  24. Monitoring Model Results • Decile Management • Retention • Loss Ratio • Rate Action • Tier/Schedule Mod

  25. Monitoring Model Results • Average score over time

  26. Monitoring Model Results • Loss ratio of model exceptions

  27. Visualization as Diagnostic Tool • Frequency and severity models have been developed • Model is underperforming in predicting loss ratio • Likely cause of underperformance is severity model

  28. Visualization as Diagnostic Tool

  29. Visualization as Diagnostic Tool

  30. Visualization as Diagnostic Tool

  31. Visualization as Diagnostic Tool • Two different visualizations of the same model tell a very different story!
