Data Mining For Credit Card Fraud: A Comparative Study

Xxxxxxxx

DSCI 5240 | Dr. Nick Evangelopoulos

Graduate Presentation

Overview
  • Credit Card Fraud
  • Data Mining Techniques
  • Data
  • Experimental Setup
  • Results

Graduate Presentation | DSCI 5240 | Xxxxxxx

Credit Card Fraud
  • Two types:
    • Application Fraud
      • Obtaining new cards using false information
    • Behavioral Fraud
      • Mail theft
      • Stolen/lost card
      • Counterfeit card

Credit Card Fraud
  • Online revenue loss due to fraud (source: cybersource.com)

Data Mining Techniques
  • Logistic Regression
    • Predicts the outcome of a categorical dependent variable
    • The fraud variable is binary (fraudulent or legitimate)
  • Support Vector Machines
  • Random Forest
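As a minimal sketch of how a fitted logistic regression turns transaction attributes into a fraud probability: the weights, bias, 0.5 threshold, and the two example attributes below are hypothetical, not values reported in this study.

```python
import math


def logistic_fraud_probability(features, weights, bias):
    """Return P(fraud | features) for an already-fitted logistic regression.

    weights and bias are hypothetical coefficients for illustration only.
    """
    # Linear combination of the input attributes
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Logistic (sigmoid) function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(features, weights, bias, threshold=0.5):
    """Binary fraud label: 1 = fraud, 0 = legitimate."""
    return int(logistic_fraud_probability(features, weights, bias) >= threshold)


# Hypothetical attributes, e.g. scaled amount and recent transaction count
weights, bias = [1.2, 0.8], -3.0
print(logistic_fraud_probability([0.5, 1.0], weights, bias))  # ≈ 0.168
print(classify([2.0, 2.0], weights, bias))                    # 1
```

Because the fraud variable is binary, the probability can also be used directly as a score for ranking transactions rather than thresholded.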

Support Vector Machines (SVM)
  • Supervised learning models that learn a decision boundary from labeled examples
  • Linear classifiers operating in a high-dimensional feature space that is a non-linear mapping of the input space
  • Two key properties of SVMs:
    • Kernel representation
    • Margin optimization (maximizing the margin between the two classes)
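Kernel representation lets the SVM compute inner products in the high-dimensional feature space without building the non-linear mapping explicitly. The sketch below, assuming the Gaussian RBF kernel named in the experimental setup, evaluates the standard kernel SVM decision function; the support vectors, coefficients (`alphas`), bias, and `gamma` are hypothetical stand-ins for what training (margin optimization) would produce.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, labels, alphas, bias, gamma):
    """Kernel SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    labels are +1 (fraud) or -1 (legitimate). All model parameters here are
    hypothetical; a trained SVM would supply them.
    """
    return sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias


def svm_predict(x, support_vectors, labels, alphas, bias, gamma):
    """The sign of the decision value gives the predicted class."""
    value = svm_decision(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if value >= 0 else -1
```

With two hypothetical support vectors at `[0, 0]` (legitimate) and `[2, 2]` (fraud), a point near `[2, 2]` receives a positive decision value and is classified as fraud.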

Random Forest (RF)
  • Ensemble of classification trees
  • Performs well when the individual trees are dissimilar from one another
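Dissimilar members are usually obtained by training each tree on a bootstrap sample of the data and restricting which attributes it may split on. The toy sketch below uses depth-1 trees (decision stumps) instead of full classification trees to stay short; `fit_stump`, `fit_forest`, and the 0.5 majority-vote threshold are illustrative assumptions, not the configuration used in the study.

```python
import random


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: pick the (feature, threshold) split among
    feature_ids that misclassifies the fewest training rows. The stump
    predicts fraud (1) when row[feature] > threshold."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            errors = sum((row[f] > t) != bool(label) for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda row: int(row[f] > t)


def fit_forest(X, y, n_trees, n_features_per_tree, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random
    attribute subset (a stump has one node, so per-tree = per-node)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]         # rows with replacement
        feats = rng.sample(range(d), n_features_per_tree)  # random attribute subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def forest_score(trees, row):
    """Fraction of trees voting 'fraud'; usable as a ranking score."""
    return sum(tree(row) for tree in trees) / len(trees)


def forest_predict(trees, row):
    """Majority vote: 1 (fraud) if at least half the trees say so."""
    return int(forest_score(trees, row) >= 0.5)
```

The number of trees (`n_trees`) and the number of attributes considered per split (`n_features_per_tree`) are the two knobs that control how dissimilar the members are.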

Data: Datasets
  • 13 months of data (Jan 2006 – Jan 2007)
  • 50 million credit card transactions on 1 million credit cards
  • 2,420 known fraudulent transactions on 506 credit cards
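The arithmetic below uses only the counts on this slide (the 50 million and 1 million figures are rounded) to show how rare fraud is in this data.

```python
# Counts taken from the "Data: Datasets" slide (50M and 1M are rounded).
total_transactions = 50_000_000
total_cards = 1_000_000
fraud_transactions = 2_420
fraud_cards = 506

transaction_fraud_rate = fraud_transactions / total_transactions
card_fraud_rate = fraud_cards / total_cards
fraud_tx_per_card = fraud_transactions / fraud_cards
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

print(f"Fraudulent transactions: {transaction_fraud_rate:.5%}")  # 0.00484%
print(f"Cards with fraud:        {card_fraud_rate:.4%}")         # 0.0506%
print(f"Fraud txns per card:     {fraud_tx_per_card:.1f}")       # 4.8
print(f"Legit txns per fraud:    {legit_per_fraud:,.0f}")        # 20,660
```

An imbalance of roughly 20,660 legitimate transactions per fraudulent one is the usual motivation for undersampling the majority class before training.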

Percentage of Transactions by Transaction Type

Data Selection

Primary Attributes in the Dataset

Derived Attributes

Experimental Setup
  • For SVM, the Gaussian radial basis function (RBF) was used as the kernel function
  • For Random Forest, the number of attributes considered at each node and the number of trees were set as parameters
  • Training data were sampled at different fraud rates by random undersampling of the majority (legitimate) class
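A minimal sketch of random undersampling of the majority class to reach a chosen fraud rate in the training sample. The function name, the 0/1 label encoding, and any target rates passed to it are assumptions for illustration; the slide does not list the rates used.

```python
import random


def undersample_majority(rows, labels, target_fraud_rate, seed=0):
    """Randomly drop legitimate (label 0) rows so that fraud (label 1)
    makes up about target_fraud_rate of the returned sample.
    Every fraud row is kept."""
    if not 0 < target_fraud_rate < 1:
        raise ValueError("target_fraud_rate must be in (0, 1)")
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_fraud / (n_fraud + n_legit) = r for n_legit
    n_legit = round(len(fraud) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit))  # cannot keep more rows than exist
    keep = fraud + rng.sample(legit, n_legit)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Because all fraud rows are retained, lowering the target rate only adds more legitimate rows to the sample; training sets built at different rates differ only in how much of the majority class they keep.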

Training and testing data

Results

Proportion of fraud captured at different depths
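Assuming "depth" means the top fraction of test transactions, ranked by the model's fraud score, that would be reviewed, the proportion of fraud captured at a given depth can be computed as below. The function name and inputs are hypothetical.

```python
def fraud_captured_at_depth(scores, labels, depth):
    """Proportion of all fraud cases (label 1) found in the top `depth`
    fraction of transactions when ranked by score, highest first."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("no fraud cases in labels")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    # Number of top-ranked transactions inspected at this depth (at least 1)
    k = max(1, round(depth * len(ranked)))
    captured = sum(label for _, label in ranked[:k])
    return captured / total_fraud
```

Evaluating this at several small depths (for example the top 1% or 5%) for each model gives curves of the kind summarized on this slide.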

Fraud Capture Rate with Different Fraud Rates in Training Data

Conclusion
  • Examined the performance of two data mining techniques
    • SVM and RF, together with logistic regression
  • Used a real-life dataset from Jan 2006 – Jan 2007
  • Used random undersampling of the majority class to build training samples
  • Random forest showed much higher performance at the upper file depths
  • SVM performance at the upper file depths tended to improve as the proportion of fraud in the training data decreased
  • Random forest demonstrated better overall performance

Questions
