

CS490D: Introduction to Data Mining

Prof. Chris Clifton

April 14, 2004

Fraud and Misuse Detection


What is Fraud Detection?

  • Identify wrongful actions

    • Is right and wrong universal?

    • If so, why not just prevent wrong actions?

  • Identify actions by the wrong people

  • Identify suspect actions

    • Legal

    • But probably not right


In Data Mining terms…

  • Classification?

    • Classify into fraudulent and non-fraudulent behavior

    • What do we need to do this?

  • Outlier Detection

    • Assume non-fraudulent behavior is normal

    • Find the exceptions

  • Problems?


Solution: Differential Profiling

  • Determine individual behavior

    • What is normal for the individual

    • What separates one individual from another

  • Gives profile of individual behavior

  • How do we do this?

[Diagram labels: Profile + Profile + Profile; Classification Mining]
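The idea of judging each individual against their own history can be sketched as follows. This is a minimal illustration, not the method from the slides: the per-user mean and standard deviation of transaction amounts stand in for a real profile, and the names `build_profiles` and `deviation` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_profiles(history):
    """history: iterable of (user_id, amount) pairs.
    Returns {user_id: (mean, stdev)}; users with fewer than 2 records are skipped."""
    per_user = defaultdict(list)
    for user, amount in history:
        per_user[user].append(amount)
    return {u: (mean(v), stdev(v)) for u, v in per_user.items() if len(v) >= 2}

def deviation(profiles, user, amount):
    """How many of the user's own standard deviations `amount` lies from their mean."""
    mu, sigma = profiles[user]
    if sigma == 0:
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma

# Illustrative data (assumed): the same amount is routine for one user, unusual for another.
history = [("alice", 10), ("alice", 12), ("alice", 11), ("alice", 9),
           ("bob", 900), ("bob", 1100), ("bob", 1000), ("bob", 950)]
profiles = build_profiles(history)
print(deviation(profiles, "alice", 1000))  # large: far from alice's normal
print(deviation(profiles, "bob", 1000))    # small: normal for bob
```

A single global threshold on the amount would treat both users the same way; scoring against the individual profile is what separates them.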


Has this been done? Intrusion Detection (Lane & Brodley)

  • Profiled computer users based on command sequences

    • Command

    • Some (but not all) argument information

    • Sequence information
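One simple way to turn command sequences into a profile is to record which short command runs a user has produced before. The sketch below is an assumption for illustration only and is not Lane and Brodley's similarity measure: it uses command bigrams and ignores arguments, and all function names are hypothetical.

```python
def ngrams(commands, n=2):
    """All consecutive runs of n commands, as tuples."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]

def build_profile(history, n=2):
    """The set of command n-grams seen in a user's history."""
    return set(ngrams(history, n))

def similarity(profile, session, n=2):
    """Fraction of the session's n-grams that already appear in the profile."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0
    return sum(g in profile for g in grams) / len(grams)

# Illustrative command history (assumed data).
profile = build_profile(["ls", "cd", "ls", "vi", "make", "ls", "cd", "vi"])
print(similarity(profile, ["cd", "ls", "vi", "make"]))    # 1.0: every bigram is familiar
print(similarity(profile, ["wget", "chmod", "sh", "rm"])) # 0.0: nothing matches
```

A low score over a window of recent commands would raise an alarm; the window length trades accuracy against time to alarm.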


Results: Accuracy and Time to Alarm


Scaling Issues

  • What happens with millions of users?

    • Credit card

    • Cell phone

  • What about new users?

  • Ideas?


Multi-user profiles

  • Cluster users

  • Develop profiles for clusters

    • E.g., differential profiling

  • Old customers: Do they match profile for their cluster?

    • Allows wider range of acceptable behavior

  • New customer: Do they match any profile?
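The cluster-then-profile idea can be sketched with a small k-means. This is a minimal illustration under stated assumptions: each user is reduced to two made-up features (average spend, transactions per month), the cluster centroid serves as the cluster profile, and a fixed `radius` decides what counts as a match. None of these choices come from the slides.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means from given starting centroids; returns the final centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def matches_any_profile(point, centroids, radius):
    """New customer check: is the behavior close to at least one cluster profile?"""
    return min(dist(point, c) for c in centroids) <= radius

# Illustrative users (assumed data): (average spend, transactions per month).
users = [(10, 1), (12, 2), (11, 1), (200, 30), (210, 28), (190, 32)]
profiles = kmeans(users, centroids=[users[0], users[3]])
print(matches_any_profile((11.5, 1.5), profiles, radius=15))  # True
print(matches_any_profile((100, 15), profiles, radius=15))    # False
```

For an existing customer the same distance test would be applied only to the centroid of their own cluster.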


Data mining for detection and prevention


Matching known fraud/non-compliance

  • Which new cases are similar to known cases?

  • How can we define similarity?

  • How can we rate or score similarity?
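One possible answer to the similarity questions above, offered as a sketch rather than as the approach used in the course: describe each case as a set of categorical indicators and score a new case by its best Jaccard overlap with any known fraud case. The indicator names below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_against_known(case, known_cases):
    """Score a new case by its closest match among the known fraud cases."""
    return max((jaccard(case, k) for k in known_cases), default=0.0)

# Hypothetical indicator sets for known fraud cases.
known = [
    {"duplicate_invoice", "new_vendor", "round_amount"},
    {"po_box_address", "weekend_payment"},
]
print(score_against_known({"new_vendor", "round_amount", "weekend_payment"}, known))  # 0.5
print(score_against_known({"approved_vendor"}, known))                                # 0.0
```

Jaccard suits yes/no indicators; numeric attributes would call for a distance such as Euclidean or cosine instead.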


Anomalies and irregularities

  • How can we detect anomalous or unusual behavior?

  • What do we mean by usual?

  • Can we rate or score cases on their degree of anomaly?
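A degree-of-anomaly score can be sketched with the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). This is one common choice and an assumption here, not something the slides prescribe; the constant 0.6745 and a cutoff around 3.5 are the conventional values for this score.

```python
from statistics import median

def anomaly_scores(values):
    """Modified z-score per value: 0.6745 * |x - median| / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification: with no spread around the median, report no anomalies.
        return [0.0 for _ in values]
    return [0.6745 * abs(v - med) / mad for v in values]

# Illustrative amounts (assumed data) with one extreme value.
amounts = [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 500]
scores = anomaly_scores(amounts)
ranked = sorted(zip(scores, amounts), reverse=True)
print(ranked[0])  # the 500 entry gets by far the highest score
```

Using the median and MAD rather than the mean and standard deviation keeps the definition of "usual" from being distorted by the very outliers being sought.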


Techniques used to identify fraud

Predict and Classify

  • Regression algorithms (predict numeric outcome): neural networks, CART, regression, GLM

  • Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression

Group and Find Associations

  • Clustering/grouping algorithms: K-means, Kohonen, 2Step, factor analysis

  • Association algorithms: Apriori, GRI, Capri, Sequence


Techniques for finding fraud:

  • Predict the expected value for a claim, compare that with the actual value of the claim.

  • Those cases that fall far outside the expected range should be evaluated more closely
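The predict-and-compare approach can be sketched with a one-variable least-squares line. This is a minimal illustration under assumptions: a single made-up predictor (length of stay) explains the claim amount, the history is taken to be clean, and "far outside the expected range" is read as more than `k` residual standard deviations.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def flag_claims(history, new_claims, k=3.0):
    """Return (x, actual, expected) for new claims more than k residual
    standard deviations away from the value predicted from history."""
    xs, ys = zip(*history)
    slope, intercept = fit_line(xs, ys)
    spread = stdev([y - (slope * x + intercept) for x, y in history])
    flagged = []
    for x, actual in new_claims:
        expected = slope * x + intercept
        if abs(actual - expected) > k * spread:
            flagged.append((x, actual, expected))
    return flagged

# Illustrative data (assumed): (days in hospital, claim amount).
history = [(1, 105), (2, 195), (3, 310), (4, 400), (5, 490)]
print(flag_claims(history, [(3, 305), (2, 900)]))  # only the (2, 900) claim is flagged
```

In practice the regression algorithms listed earlier (neural networks, CART, GLM) would replace the straight line, but the compare-to-expected step stays the same.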


Techniques for finding fraud:

Decision Trees and Rules

  • Build a profile of the characteristics of fraudulent behavior.

  • Pull out the cases that meet the historical characteristics of fraud.
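A decision tree is built from repeated single-attribute splits, so the smallest version of this technique is a one-split "stump". The sketch below is a simplification and an assumption, not CART or C5.0: it only considers rules of the form "attribute greater than threshold means fraud" and picks the one with the fewest errors on labelled history.

```python
def learn_stump(rows, labels):
    """Return (feature_index, threshold) for the rule
    `row[feature] > threshold => fraud` with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] > t) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def apply_rule(rule, rows):
    """Pull out the cases that meet the learned rule."""
    f, t = rule
    return [r for r in rows if r[f] > t]

# Illustrative labelled history (assumed data): (amount, items per claim), fraud flag.
rows = [(100, 1), (120, 2), (90, 1), (5000, 1), (4500, 3), (110, 9)]
labels = [False, False, False, True, True, False]
rule = learn_stump(rows, labels)
print(rule)                                       # (0, 120): amount > 120
print(apply_rule(rule, [(130, 1), (80, 2)]))      # [(130, 1)]
```

A full tree learner would recurse on each side of the split and use an impurity measure instead of the raw error count.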


Techniques for finding fraud:

Clustering and Associations

  • Group behavior using a clustering algorithm

  • Find groups of events using the association algorithms

  • Identify outliers and investigate


Fraud detection using CRISP-DM

  • Provides a systematic way to detect fraud and abuse

  • Ensures auditing and investigative efforts are maximized

  • Continually assesses and updates models to identify new emerging fraud patterns

  • Leads to higher recoupments



Payment Error Prevention

The US Health Care Financing Administration needed to isolate the likely causes of payment error. It developed a profile of acceptable billing practices and used this information to focus its auditing effort.


Payment error prevention solution

  • Clementine™

  • Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission

  • Matched new cases

  • Cases not matching are audited


Payment error prevention results

  • Detected 50% of past incorrect payments – resulting in significant recovery of funding lost to payment errors

  • PRO analysts able to use resultant Clementine models to prevent future error


Billing and payment fraud

The US Defense Finance and Accounting Service needed to find fraud in millions of Department of Defense transactions, and it identified suspicious cases to focus investigations.


Billing and payment fraud solution

  • Clementine

  • Detection models based on known fraud patterns

  • Analyzed all transactions – scored based on similarity to these known patterns

  • High scoring transactions flagged for investigation


Billing and payment fraud results

  • Identified over 1,200 payments for further investigation

  • Integrated the detection process

  • Anomaly detection methods (e.g., clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns


Audit selection

The Washington State Department of Revenue needed to detect erroneous tax returns, and it focused audit investigations on the cases with the highest likely adjustments.


Audit selection solution

  • Clementine

  • Using previously audited returns

  • Model adjustment (recovery) per auditor hour based on return information

  • Models will then score future returns showing highest potential adjustment
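Once a model has scored each return, the selection step can be sketched as ranking by predicted adjustment per auditor hour and filling the available hours greedily. The field names (`predicted_adjustment`, `estimated_hours`) and the greedy budget rule are assumptions for illustration; the slides do not describe how cases were actually chosen.

```python
def select_audits(returns, hours_available):
    """Pick returns in descending order of predicted adjustment per auditor
    hour, skipping any that no longer fit in the remaining hours."""
    ranked = sorted(
        returns,
        key=lambda r: r["predicted_adjustment"] / r["estimated_hours"],
        reverse=True,
    )
    selected, used = [], 0.0
    for r in ranked:
        if used + r["estimated_hours"] <= hours_available:
            selected.append(r["id"])
            used += r["estimated_hours"]
    return selected

# Hypothetical model output for four returns.
scored = [
    {"id": "A", "predicted_adjustment": 5000, "estimated_hours": 10},  # 500 per hour
    {"id": "B", "predicted_adjustment": 1200, "estimated_hours": 2},   # 600 per hour
    {"id": "C", "predicted_adjustment": 300,  "estimated_hours": 3},   # 100 per hour
    {"id": "D", "predicted_adjustment": 9000, "estimated_hours": 30},  # 300 per hour
]
print(select_audits(scored, hours_available=15))  # ['B', 'A', 'C']
```

Return D has the largest predicted adjustment but is skipped because it does not fit in the 15 available hours, which is why the ranking uses the per-hour rate rather than the raw amount.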


Audit selection results

  • Maximizes auditors’ time by focusing on cases likely to yield the highest return

  • Closes the ‘tax gap’


Data mining - key to detecting and preventing fraud, waste and abuse

  • Learn from the past

    • High quality, evidence based decisions

  • Predict

    • Prevent future instances

  • React to changing circumstances

    • Models kept current, from latest data

