Data Mining – Best Practices, Part #2. Richard Derrig, PhD, Opal Consulting LLC. CAS Spring Meeting, June 16-18, 2008.
AGENDA
Predictive vs. Explanatory Models
Discussion of Methods
Example: Explanatory Models for Decision to Investigate Claims
The “Importance” of Explanatory and Predictive Variables
An Eight-Step Program for Building a Successful Model
Supervised learning
  Most common situation
  Target variable
    Frequency
    Loss ratio
    Fraud/no fraud
  Some methods
    Regression
    Decision trees
    Some neural networks
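As a minimal sketch of the supervised case, the following fits a regression-type model to a binary fraud/no fraud target. It uses plain logistic regression trained by gradient descent, chosen here only for illustration; the two features, the toy records, and the function names `fit_logistic` and `predict_proba` are assumptions and do not come from the presentation.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Made-up toy records (hypothetical features):
# [claim amount in $1000s, policy age at time of claim in years]
X = [[1.0, 3.0], [1.5, 2.5], [2.0, 3.5], [6.0, 0.2], [7.0, 0.1], [8.0, 0.3]]
y = [0, 0, 0, 1, 1, 1]  # target variable: 1 = fraud, 0 = no fraud

w = fit_logistic(X, y)
scores = [predict_proba(w, x) for x in X]
for x, s in zip(X, scores):
    print(x, round(s, 3))
```

The defining feature of the supervised setting is the labeled target `y`: the model is fit to reproduce it, and the fitted scores can then be compared against the known outcomes.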
Unsupervised learning
  No target variable
  Group like records together (clustering)
    Claims with similar characteristics are likely to carry a similar risk of loss
    Example: territory assignment
  Some methods
    PRIDIT
    K-means clustering
    Kohonen neural networks
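To illustrate the unsupervised case, here is a small sketch of k-means clustering (Lloyd's algorithm) applied to a territory-style grouping. The records, the two features, the starting centroids, and the function name `kmeans` are hypothetical and serve only to show the mechanics; no target variable is used anywhere.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned records.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up toy records, one per zip code (hypothetical features):
# (claim frequency per exposure, average severity in $1000s)
points = [(0.05, 1.0), (0.06, 1.2), (0.04, 0.9),
          (0.15, 3.0), (0.16, 3.2), (0.14, 2.9)]

# Assumption: start from two of the records as initial centroids.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

In practice, features on very different scales would normally be standardized before computing distances, and the number of clusters is a modeling choice rather than something the algorithm determines.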
1) TREENET
2) Iminer Tree
3) S-PLUS Tree
4) CART
5) S-PLUS Neural
6) Iminer Neural
7) Iminer Ensemble
8) MARS
9) Random Forest
10) Exhaustive CHAID
11) Naïve Bayes (Baseline)
12) Logistic Regression (Baseline)
[Figure: cube size proportional to annual Medicaid revenues. © 1999 Intelligent Technologies Corporation]