Data mining in health insurance

Introduction

  • Rob Konijn, [email protected]

    • VU University Amsterdam

    • Leiden Institute of Advanced Computer Science (LIACS)

    • Achmea Health Insurance

      • Currently working here

      • Delivering leads for other departments to follow up

        • Fraud, abuse

  • Research topic keywords: data mining / unsupervised learning / fraud detection


Outline

  • Intro Application

    • Health Insurance

    • Fraud detection

  • Part 1: Subgroup discovery

  • Part 2: Anomaly detection (slides partly by Z. Slavik, VU)


Intro Application

  • Health Insurance Data

  • Health Insurance in NL

    • Mandatory

    • Only private insurance companies

    • About 100 euro/month (everyone) + 170 euro income-dependent contribution

    • Premium increase of 5-12% each year

      Achmea: about 6 million customers


Funding of Health Insurance Costs in the Netherlands

  [Flow diagram (original labels in Dutch): the risk-equalization fund (vereveningsfonds, 18 bln) receives a government contribution for insured under 18 (rijksbijdrage, 2 bln) and the income-dependent contribution from employers (inkomensafhankelijke bijdrage, 17 bln), and pays an equalization contribution (vereveningsbijdrage) to the health insurers (zorgverzekeraars). Insured adults (18+) pay the insurer a nominal premium: a base premium (rekenpremie, ~€947 per insured, 12 bln) plus a surcharge (opslag, ~€150 per insured, 2 bln). Total healthcare expenditure (zorguitgaven): 30 bln.]


Risk-Equalization Model (Vereveningsmodel)

  • By population characteristics

    • Age

    • Gender

    • Income, social class

    • Type of work

  • Calculation afterwards

    • High-cost compensation (> 15,000 euro)

  Amounts per insured (euro) by age group and gender:

    Age group        Men     Women
    0-4 yr           1,400   1,210
    5-9 yr           1,026     936
    10-14 yr           907     918
    15-17 yr           964   1,062
    18-24 yr           892   1,214
    25-29 yr           870   1,768
    30-34 yr           905   1,876
    35-39 yr           980   1,476
    40-44 yr         1,044   1,232
    45-49 yr         1,183   1,366
    50-54 yr         1,354   1,532
    55-59 yr         1,639   1,713
    60-64 yr         1,885   1,905
    65-69 yr         2,394   2,201
    70-74 yr         2,826   2,560
    75-79 yr         3,244   2,886
    80-84 yr         3,349   3,018
    85-89 yr         3,424   3,034
    90 yr and older  3,464   3,014



Introduction Application: The Data

  • Transactional data

    • Records of an event

    • Visit to a medical practitioner

  • Charged directly by the medical practitioner

  • Patient is not involved

  • Risk of fraud


Transactional Data

  • Transactions: Facts

    • Achmea: About 200 mln transactions per year

  • Info of customers and practitioners: dimensions


Different levels of hierarchy

  • Records represent events

  • However, for fraud detection for example, we are interested in customers or medical practitioners

  • See examples on the next pages

  • Groups of records: Subgroup Discovery

  • Individual patients/practitioners: outlier detection


Different types of fraud hierarchy

  • On a patient level, or on a hospital level:


Handling the different hierarchy levels

  • Creating profiles from transactional data

  • Aggregating costs over a time period

    • Each record: one patient

      • Each attribute i = 1 to n: cost spent on treatment i

  • Feature construction, for example

    • The ratio of long/short consults (G.P.)

    • The ratio of 3-way to 2-way fillings (dentist)

    • Usually used for one-way analysis (a profile-building sketch follows below)
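  A minimal sketch of this profile construction in Python with pandas, using hypothetical treatment codes and toy data (not the actual Achmea data model):

  import pandas as pd

  # Transactional data: one row per claimed treatment (hypothetical example)
  transactions = pd.DataFrame({
      "patient_id":     [1, 1, 2, 2, 2],
      "treatment_code": ["C11", "C12", "C11", "V11", "V12"],
      "cost":           [21.0, 18.0, 21.0, 42.0, 55.0],
  })

  # Profile: one row per patient, one column per treatment, costs summed over the period
  profiles = transactions.pivot_table(
      index="patient_id", columns="treatment_code",
      values="cost", aggfunc="sum", fill_value=0.0,
  )

  # Example constructed ratio feature: 2-sided versus 1-sided filling costs (V12 / V11)
  profiles["ratio_V12_V11"] = profiles.get("V12", 0) / (profiles.get("V11", 0) + 1e-9)
  print(profiles)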


Different types of fraud detection

  • Supervised

    • A labeled fraud set

    • A labeled non-fraud set

    • Credit cards, debit cards

  • Unsupervised

    • No labels

    • Health insurance, cargo, telecom, tax, etc.


Unsupervised learning in Health Insurance Data

  • Anomaly Detection (outlier detection)

    • Finding individual deviating points

  • Subgroup Discovery

    • Finding (descriptions of) deviating groups

  • Focus on differences and uncommon behavior

    • In contrast to other unsupervised learning methods

      • Clustering

      • Frequent Pattern mining


Subgroup Discovery

  • Goal: Find differences in claim behavior of medical practitioners

  • To detect inefficient claim behavior

    • Actions:

      • A visit from the account manager

      • To include in contract negotiations

    • In the extreme case: fraud

      • Investigation by the fraud detection department

  • By describing deviations of a practitioner from its peers

    • Subgroups


Patient-level Subgroup Discovery

  • Subgroup (orange): group of patients

  • Target (red)

    • Indicates whether a patient visited a practitioner (1), or not (0)


Subgroup Discovery: Quality Measures

  • Target dentist: 1,672 patients

    • Compare with a peer group of 100,000 patients in total

  • Subgroup V11 > 42 euro: 10,347 patients

    • V11: one-sided filling

  • Cross table


The cross table

  • Cross table in the data (counts reconstructed from the numbers on the next slide):

    • subgroup ∩ target: 871, subgroup ∩ rest: 9,476, rest ∩ target: 801, rest ∩ rest: 88,852 (total: 100,000)

  • Cross table expected, assuming independence:

    • expected subgroup ∩ target: 10,347 × 1,672 / 100,000 ≈ 173


Calculating WRAcc and Lift

  • Size of the subgroup = P(S) = 0.10347, size of the target dentist = P(T) = 0.01672

  • Weighted Relative ACCuracy (WRAcc) = P(ST) – P(S)P(T) = (871 – 173)/100,000 = 698/100,000 ≈ 0.0070

  • Lift = P(ST) / (P(S)P(T)) = 871/173 ≈ 5.03 (a computation sketch follows below)
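  A minimal sketch of the same computation in Python, using the counts from the cross table above:

  # Counts: 10,347 subgroup patients, 1,672 target patients, 871 in both, 100,000 in total
  n_total, n_subgroup, n_target, n_both = 100_000, 10_347, 1_672, 871

  p_s  = n_subgroup / n_total        # P(S)  = 0.10347
  p_t  = n_target / n_total          # P(T)  = 0.01672
  p_st = n_both / n_total            # P(ST) = 0.00871

  wracc = p_st - p_s * p_t           # ~0.00698  (698 per 100,000)
  lift  = p_st / (p_s * p_t)         # ~5.03
  print(wracc, lift)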


Example dentistry, at depth 1, one target dentist



Making SD more useful: adding prior knowledge

  • Adding prior knowledge

    • Background variables of the patient (age, gender, etc.)

    • Specialism of the practitioner

    • For dentistry: choice of insurance

  • Adding already known differences

    • Already detected by domain experts themselves

    • Already detected during a previous data mining run





Quality Measures

  • Ratio (Lift)

  • Difference (WRAcc)

  • Squared sum (Chi-square statistic)


Example: iterative approach

  • Idea: add the found subgroup to the prior knowledge iteratively

  • Target = single pharmacy

  • Patients that visited the hospital in the last 3 years are removed from the data

  • Compare with a peer group (400,000 patients); 2,929 patients of the target pharmacy

  • Top subgroup: “B03XA01 (erythropoietin) > 0 euro”

  [Cross table: target pharmacy vs. rest × subgroup “B03XA01 > 0” vs. rest]


Next iteration

  • Add “B03XA01 (EPO) >0 euro” to prior knowledge

  • Next best subgroup: “N05AX08 (Risperdal)>= 500 euro”


Figure describing subgroup: N05AX08 > 500

Left: target pharmacy, right: other pharmacies


Addition: adding costs to the quality measure

  • M55: dental cleaning

  • V11: 1-way filling

  • V21: polishing

  • Cost of the treatments in the subgroup: about 370 euro on average

  • 791 more patients than expected

  • Total quality: 791 × ~370 euro ≈ 292,469 euro


    Iterative approach, top 3 subgroups

    • V12: 2-sided filling

    • V21: polishing

    • V60: indirect pulp capping

  • V21 and V60 are not allowed on the same day

  • Claim back (from all dentists): 1.3 million euro



    Other target types: double binary target

    • Target 1: the year (2009 or 2008)

    • Target 2: target practitioner

    • Pattern:

      • M59: extensive (expensive) dental cleaning

      • C12: second consult in one year

    • Cross table:


    Other target types: Multiclass target

    • Subgroup (orange): group of patients

    • Target (red) is now a multi-value column, with one value per dentist



    Anomaly Detection

    The example above contains a contextual anomaly...


    Outline Anomaly Detection

    • Anomalies

      • Definition

      • Types

      • Technique categories

      • Examples

    • Lecture based on

      • Chandola et al. (2009). Anomaly Detection: A Survey

      • Paper in BB



    Definition

    • “Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior”

    • Anomalies, a.k.a.:

      • Outliers

      • Discordant observations

      • Exceptions

      • Aberrations

      • Surprises

      • Peculiarities

      • Contaminants


    Anomaly types

    Point anomalies

    • A data point is anomalous with respect to the rest of the data


    Not covered today

    • Other types of anomalies:

      • Collective anomalies

      • Contextual anomalies

    • Other detection approaches:

      • Supervised learning

      • Semi-supervised

        • Assume training data is from normal class

        • Use to detect anomalies in the future


    We focus on outlier scores

    • Scores

      • You get a ranked list of anomalies

      • “We investigate the top 10”

      • “An anomaly has a score of at least 134”

      • Leads followed by fraud investigators

    • Labels



    Detection method categorisation

    • Model based

    • Depth based

    • Distance Based

    • Information theory related (not covered)

    • Spectral theory related (not covered)


    Model based

    • Build a (statistical) model of the data

    • Normal data instances occur in high-probability regions of the stochastic model, while anomalies occur in low-probability regions

    • Or: data instances that have a high distance to the model are outliers

    • Or: data instances that have a high influence on the model are outliers


    Example: one-way outlier detection

    • Pharmacy records

    • Records represent patients

    • One attribute at a time:

      • This example: the attribute describing the costs spent on fertility medication (gonadotropin) in a year

    • We could use such one-way detection for each attribute in the data



    Example: model = non-parametric distribution

    • Left: kernel density estimate

    • Right: boxplot
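    A minimal sketch of both ideas on one cost attribute, using synthetic data rather than the actual pharmacy records: a kernel density estimate plus the boxplot/IQR rule.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    # Synthetic yearly costs for one attribute, with two extreme values appended
    costs = np.concatenate([rng.gamma(2.0, 50.0, 1000), [2500.0, 3100.0]])

    # Kernel density estimate: points in low-density regions are suspicious
    kde = gaussian_kde(costs)
    density = kde(costs)

    # Boxplot rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] counts as an outlier
    q1, q3 = np.percentile(costs, [25, 75])
    iqr = q3 - q1
    is_outlier = (costs < q1 - 1.5 * iqr) | (costs > q3 + 1.5 * iqr)
    print(costs[is_outlier].max(), density.min())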



    Other models possible

    • Probabilistic

      • Bayesian networks

    • Regression models

      • Regression trees / random forests

      • Neural networks

    • Outlier score = prediction error (residual)
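    A minimal sketch of "outlier score = residual" on synthetic data; a random forest is used here as one of the regression models listed above, and out-of-bag predictions are used so the residuals are not shrunk by overfitting:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.gamma(2.0, 50.0, size=(500, 5))          # costs on 5 other treatments
    y = 0.5 * X[:, 0] + rng.normal(0.0, 10.0, 500)   # cost attribute to predict
    y[:3] += 400.0                                   # a few anomalous patients

    model = RandomForestRegressor(n_estimators=200, oob_score=True,
                                  random_state=0).fit(X, y)
    outlier_score = np.abs(y - model.oob_prediction_)   # prediction error as score
    print(np.argsort(outlier_score)[-5:])               # indices of the top-5 anomalies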


    Depth-based methods

    • Applied on 1-4 dimensional datasets

      • Or 1-4 attributes at a time

    • Objects that have a high distance to the “center of the data” are considered outliers

    • Example Pharmacy:

      • Records represent patients

      • 2 attributes:

        • Costs spent on diabetes medication

        • Costs spent on diabetes testing material
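    A minimal sketch of the "distance to the center of the data" idea on two such cost attributes, using synthetic data; the Mahalanobis distance is used here as a simple stand-in for a depth measure (an assumption, not necessarily what the original slides used):

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic (diabetes medication cost, diabetes testing material cost) pairs
    X = rng.multivariate_normal([300.0, 120.0],
                                [[900.0, 350.0], [350.0, 400.0]], size=500)

    center = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - center
    # Mahalanobis distance of every record to the center of the data
    mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    print(np.argsort(mahal)[-5:])   # the 5 patients farthest from the center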



    Distance based (nearest-neighbour based)

    • Assumption:

      • Normal data instances occur in dense neighbourhoods, while anomalies occur far from their closest neighbours


    Similarity/distance

    • You need a similarity measure between two data points

      • Numeric attributes: Euclidean distance, etc.

      • Nominal attributes: simple matching is often enough

      • Multivariate:

        • Distance using all attributes

        • Distance between attribute values, then combine (a combined-distance sketch follows below)
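    A minimal sketch of such a combined distance, with hypothetical attributes and arbitrary equal weights:

    import numpy as np

    def distance(a_num, a_nom, b_num, b_nom, w_num=1.0, w_nom=1.0):
        # Euclidean distance on the numeric attributes
        d_num = np.linalg.norm(np.asarray(a_num) - np.asarray(b_num))
        # Simple matching distance on the nominal attributes (fraction of mismatches)
        d_nom = sum(x != y for x, y in zip(a_nom, b_nom)) / max(len(a_nom), 1)
        return w_num * d_num + w_nom * d_nom

    # Two patients: (costs per category), (gender, insurance type)
    print(distance([120.0, 0.0], ["F", "basic"], [80.0, 35.0], ["F", "extended"]))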


    Example: dentistry data

    • Records represent dentists

    • Attributes are 14 cost categories

      • Denote the percentage of patients that received a claim from the category


    Option 1: Distance to the kth neighbour as anomaly score
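    A minimal sketch of this score, assuming a numeric profile matrix X (random stand-in data here rather than the real dentist profiles):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.random((200, 14))          # e.g. 14 cost-category percentages per dentist

    k = 10
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own nearest neighbour
    distances, _ = nn.kneighbors(X)
    score = distances[:, k]            # distance to the k-th real neighbour
    print(np.argsort(score)[-5:])      # the 5 most anomalous records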


    Option 2: Use relative densities of neighbourhoods

    • The density of the neighbourhood is estimated for each instance

    • Instances in low-density neighbourhoods are anomalous, the others are normal

    • Note:

      • The distance to the kth neighbour is an estimate of the inverse of the density (large distance → low density)

      • But this estimates outliers badly in neighbourhoods of varying density


    LOF

    • Local Outlier Factor:

      LOF = (average local density of the k nearest neighbours) / (local density of the instance)

    • Local density:

      • k divided by the volume of the smallest hyper-sphere centred around the instance, containing its k neighbours

    • Anomalous instance:

      • Its local density will be lower than that of its k nearest neighbours (a scikit-learn sketch follows below)
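    A minimal sketch of LOF via scikit-learn on the same kind of stand-in profile matrix as before; scores well above 1 indicate a lower local density than the neighbours:

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    X = rng.random((200, 14))

    lof = LocalOutlierFactor(n_neighbors=20)
    lof.fit(X)
    score = -lof.negative_outlier_factor_   # the LOF itself (larger = more anomalous)
    print(np.argsort(score)[-5:])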



    Clustering-based anomaly detection techniques

    • 3 possibilities:

      1. Normal data instances belong to a cluster in the data, while anomalies do not belong to any cluster

        • Use clustering methods that do not force all instances to belong to a cluster

          • DBSCAN, ROCK, SNN

      2. Distance to the cluster center = outlier score (sketched below)

      3. Clusters with too few points are outlying clusters
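    A minimal sketch of possibility 2: k-means on a stand-in profile matrix, with the distance to the assigned cluster center as the outlier score:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.random((200, 14))

    km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
    assigned_centers = km.cluster_centers_[km.labels_]
    score = np.linalg.norm(X - assigned_centers, axis=1)   # distance to own cluster center
    print(np.argsort(score)[-5:])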


    K-means with 6 clusters: centers of the dentistry data set

    • Attributes: percentage of patients that received a claim from each cost category

    • Clusters correspond to specialisms:

      • Dentist

      • Orthodontist

      • Orthodontist (charged by dentist)

      • Dentist

      • Dentist

      • Dental hygienist


    Combining Subgroup Discovery and Outlier Detection

    • Describe regions with outliers using SD

    • Identify suspicious medical practitioners

    • 2 or 3 step approach to describe outliers:

      • Calculate outlier score

      • Use subgroup discovery to describe regions with outliers.

      • (optional) identify the involved medical practitioners
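    A minimal sketch of steps 1 and 2 on hypothetical cost attributes; LOF is used as a stand-in for LOCI, and only depth-1 subgroups (single-attribute conditions) are scored by their mean outlier score:

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.gamma(2.0, 50.0, size=(1000, 3)),
                      columns=["P30", "V11", "M55"])    # hypothetical cost attributes

    # Step 1: outlier score per patient
    lof = LocalOutlierFactor(n_neighbors=20).fit(df.values)
    score = -lof.negative_outlier_factor_

    # Step 2: depth-1 subgroup discovery on the outlier score
    results = []
    for col in df.columns:
        for q in (0.5, 0.75, 0.9):
            threshold = df[col].quantile(q)
            inside = score[(df[col] > threshold).to_numpy()]
            if inside.size:
                quality = inside.mean() - score.mean()  # mean score inside vs. overall
                results.append((f"{col} > {threshold:.0f}", inside.size, quality))

    for description, size, quality in sorted(results, key=lambda r: -r[2])[:3]:
        print(description, size, round(quality, 3))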


    Example output

    • Look at patients with ‘P30>1050 euro’ for practitioner number 221

    • Left: all data, right: practitioner 221


    Descriptions of outliers: LOCI outlier score

    • 1. Calculate outlier score

      • LOCI is a density based outlier score

    • 2. Describe outlying regions

    • Result top subgroup:

      • Orthodontics (dentist) 0.044 ∧ Orthodontics 0.78

      • Group of 9 dentists with an average score of 3.9


    Conclusions

    • Health insurance: Interesting application domain

      • Very relevant

    • Outlier Detection and Subgroup discovery are useful

