
Electronic Commerce Technology (IT60104) 3-0-0 Spring 2010-11

Dr. Shamik Sural, School of Information Technology, IIT Kharagpur. Lecture Set 5: Credit Card Fraud Detection. Defining the Problem, Traditional Approaches, Recent Advances.


Presentation Transcript


  1. Electronic Commerce Technology (IT60104) 3-0-0 Spring 2010-11 Dr. Shamik Sural, School of Information Technology, IIT Kharagpur

  2. Lecture Set 5 • Credit Card Fraud Detection – Defining the Problem • Traditional Approaches • Recent Advances • Computational Intelligence Techniques in Credit Card Fraud Detection (CCFD) • Two Stage CCFD using Sequence Alignment • BLAST-SSAHA Hybridization in CCFD • CCFD using Dempster-Shafer theory and Bayesian Inferencing • Future Directions • References

  3. Lecture Set 5 Credit Card Fraud Detection

  4. Credit Card Fraud In 2005, over 40 million card accounts were exposed to potential fraud (13.9 million MasterCard cards, 22 million Visa cards). The total amount of credit card fraud in the USA alone was reported to be $2.7 billion in 2005. The estimated credit card fraud in the UK was more than £200m in 2005.

  5. Types of Credit Card Fraud Physical card is lost or stolen and is used by a fraudster. Card number is stolen and used in indirect shopping. Credit card skimming, where the data from a card's magnetic stripe is electronically copied onto another card.

  6. How does an Adversary get my Credit Card? Lost card: I just leave it somewhere. Card theft: gone along with my wallet. Non-receipt: does not even reach the cardholder. Virtual loss: wiretapping during a non-secure communication; copy made at unscrupulous merchant establishments; shoulder surfing, dumpster diving, etc.

  7. Credit Card Fraud Detection From millions of credit card transactions, identify the ones which are fraudulent. Detect all misuses. Do not wrongly identify any genuine transaction as fraud. Parties concerned: cardholder, fraudster, merchant's bank, card issuing bank, laws of the land.

  8. Challenges Behavior of fraudster Is it predictable? Is he under any kind of time pressure? What are his motives? Is he part of an organized gang? Behavior of cardholder Does he have a spending behavior? What kind of purchases does he make? Does he deviate a lot from a set pattern?

  9. Further Challenges How accurate should the CCFD system be? Number of misses Number of false alarms Base rate fallacy Inconvenience to cardholder Denial of service Design Choices Any attempt to reduce false alarms lowers correct detection rate as well
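
The base rate fallacy mentioned on this slide can be made concrete with a small calculation. The sketch below applies Bayes' rule to purely hypothetical figures (a 0.1% fraud rate, 90% true positive rate and 1% false positive rate are assumptions chosen for illustration, not numbers from the lecture).

```python
def alarm_precision(fraud_rate: float, tp_rate: float, fp_rate: float) -> float:
    """Probability that a flagged transaction is really fraudulent (Bayes' rule)."""
    flagged_fraud = fraud_rate * tp_rate              # fraudulent and flagged
    flagged_genuine = (1.0 - fraud_rate) * fp_rate    # genuine but flagged
    return flagged_fraud / (flagged_fraud + flagged_genuine)


# Hypothetical figures: 1 in 1000 transactions is fraudulent.
precision = alarm_precision(fraud_rate=0.001, tp_rate=0.90, fp_rate=0.01)
print(f"{precision:.3f}")
```

With these assumed rates only about 8% of alarms correspond to real fraud, because genuine transactions vastly outnumber fraudulent ones.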

  10. Existing Approaches Neural network based Ghosh and Reilly (1994) Aleskerov et al. (1997) Syeda et al. (2002) Meta-learning based Stolfo et al. (1997) Data mining based Chan et al. (1999) Brause et al. (1999) Chiu et al. (2004) Other approaches Chen et al. (2005)

  11. Limitations • Misuse-based fraud detection systems (FDS) cannot detect new fraud patterns • Anomaly-based FDSs often raise a large number of false alarms • Deviation in spending behavior due to emergency or accidental causes raises false alarms • Main limitation • Majority of transactions flagged as fraudulent are actually genuine • Fixed set of rules

  12. Recent Advances Parallel granular neural networks by Syeda et al. (2002) Commercially available Maxmind (2005) Fraud Detection Suite (2005) Game theoretic approach Liu and Li (2002) Vatsa, Sural and Majumdar (2005, 2007) Sequence alignment based approach Kundu, Sural and Majumdar (2006) HMM based approach Srivastava, Kundu, Sural and Majumdar (2008) Combining multiple evidences and learning Panigrahi, Sural and Majumdar (2008)

  13. Overall Objectives Achieve high fraud detection rate (True positive) with low false alarm (False positive) Integration of misuse-based and anomaly-based detection techniques Make detection process fast enough to achieve a reasonable level of customer satisfaction Use of sequence alignment tool for detecting fraudulent activities Develop an FDS that Combines multiple evidences including behavioral patterns of cardholders and fraudsters Dynamically learns and adapts to the changing behavior

  14. Approaches to be covered in the rest of the lecture A Sequence Alignment (SA) based Credit Fraud Detection technique Purchase amounts form a sequence Uses BLAST for sequence alignment Combination of misuse and anomaly detection Enhancement of the basic SA approach BLAST-SSAHA hybridization Faster speed Uses amount-time as the symbol Combining evidences and learning Use of Dempster-Shafer theory Belief update

  15. Two Stage CCFD using Sequence Alignment (TCCFDS)

  16. Notion of Spending Behavior Product choice, brand choice, dealer choice, income group and amount of purchase. Weekend effect in the first half and second half of a month. Significance of day of the week as a decision to go for shopping. Purchase behavior forms a sequence.

  17. Sequence Alignment (SA) Basics Let S1 and S2 be two given sequences: S1 = <ABCDE> and S2 = <BRXDE>. Subsequences of S1 = {<A>, <B>, …, <AB>, …, <DE>, <ABC>, <ABD>, …, <BDE>, …, <ABCD>, <BCDE>}. Common subsequences of S1 and S2 = {<B>, <D>, <E>, <BD>, <DE>, <BE>, <BDE>}. Longest common subsequence of S1 and S2 = <BDE>. Sequence alignment is a technique used to arrange two or more sequences in order to find a common subsequence of the longest possible length.
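
The longest common subsequence in this example can be computed with the textbook dynamic-programming method. The sketch below is a generic illustration, not the BLAST-based alignment the lecture goes on to use; the function name is chosen here for illustration.

```python
def longest_common_subsequence(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2."""
    rows, cols = len(s1), len(s2)
    # length[i][j] = LCS length of the suffixes s1[i:] and s2[j:]
    length = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if s1[i] == s2[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # Walk the table from the start to recover the subsequence itself.
    i = j = 0
    result = []
    while i < rows and j < cols:
        if s1[i] == s2[j]:
            result.append(s1[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(result)


print(longest_common_subsequence("ABCDE", "BRXDE"))
```

Applied to S1 = ABCDE and S2 = BRXDE, the function recovers the subsequence BDE given on the slide.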

  18. Sequence Alignment for TCCFDS User profile can be represented using a sequence of tasks (transactions) Incoming transactions can be treated as a mixed sequence of good and bad transactions Good-sequence = <HLLLHLL> Mixed-sequence = <LLHHHHL> Longest common subsequence = <LLHL> <HHH> may be caused by fraudulent transactions

  19. Two-stage CCFD First stage: Align mixed sequence with good sequences Second stage: Align probable bad sequence with past fraud sequences

  20. Application of SA for CCFD Architecture of a two-stage fraud detection system

  21. Application of SA for CCFD Let L be the length of a Q-sequence and N be the number of matches with the aligned GD-sequence. G-score = N × (match weight) − (L − N) × (mismatch weight). Let M be the number of matches with the aligned BD-sequence and R-sequence. B-score = M × (match weight) − ((L − N) − M) × (mismatch weight). T-score = G-score − B-score. An alarm is raised if T-score < T over a window W.
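
The scoring on this slide can be sketched as follows. The two weights are treated as assumed parameters named `match_weight` and `mismatch_weight` (hypothetical names and default values), and the handling of the window W is left out; only the score arithmetic is shown.

```python
def g_score(L: int, N: int, match_weight: float, mismatch_weight: float) -> float:
    """Reward N matches with the good sequence, penalise the L - N mismatches."""
    return N * match_weight - (L - N) * mismatch_weight


def b_score(L: int, N: int, M: int, match_weight: float, mismatch_weight: float) -> float:
    """Score the L - N unmatched symbols against past fraud sequences (M matches)."""
    return M * match_weight - ((L - N) - M) * mismatch_weight


def t_score(L: int, N: int, M: int,
            match_weight: float = 1.0, mismatch_weight: float = 1.0) -> float:
    """T-score = G-score - B-score; the default weights are assumptions."""
    return (g_score(L, N, match_weight, mismatch_weight)
            - b_score(L, N, M, match_weight, mismatch_weight))


def raise_alarm(score: float, threshold: float) -> bool:
    """An alarm is raised when the T-score falls below the threshold T."""
    return score < threshold


score = t_score(L=10, N=7, M=2)
print(score, raise_alarm(score, threshold=4.0))
```

A low G-score (few matches with the cardholder's good sequences) together with a high B-score (many matches with known fraud sequences) pushes the T-score down and towards an alarm.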

  22. Performance Analysis

  23. Performance Analysis True Positive (TP) is the percentage of fraudulent transactions identified as fraudulent False Positive (FP) is the percentage of genuine transactions identified as fraudulent
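
As a minimal sketch of these two definitions, the function below computes TP and FP as percentages from labelled transactions. The input format (two parallel lists of booleans) is an assumption made for illustration.

```python
def tp_fp_rates(actual_fraud: list, flagged: list) -> tuple:
    """Return (TP %, FP %) as defined above.

    actual_fraud[i] is True if transaction i is really fraudulent;
    flagged[i] is True if the detector identified it as fraudulent.
    """
    frauds = sum(1 for a in actual_fraud if a)
    genuine = len(actual_fraud) - frauds
    caught = sum(1 for a, f in zip(actual_fraud, flagged) if a and f)
    false_alarms = sum(1 for a, f in zip(actual_fraud, flagged) if not a and f)
    tp = 100.0 * caught / frauds if frauds else 0.0
    fp = 100.0 * false_alarms / genuine if genuine else 0.0
    return tp, fp


actual = [True, True, False, False, False, False]
flagged = [True, False, True, False, False, False]
print(tp_fp_rates(actual, flagged))
```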

  24. Performance Analysis Variation of TP/FP for High spending user and Low risk-loving thief (a) with T and (b) with W

  25. Performance Analysis Comparative study of TP/FP for TCCFDS and CARDWATCH

  26. Limitations of TCCFDS Uses only purchase amount as the source of information. Temporal spending behavior is not considered at all. Takes a substantially long time in the alignment process. Hence: use time of purchase and develop a faster method.

  27. BLAST-SSAHA Hybridization for Credit Card Fraud Detection (BLAHFDS)

  28. Improvements Consider time dimension along with transaction amount Propose a hybrid algorithm named BLAH which combines the advantages of BLAST and SSAHA algorithms

  29. SSAHA A hash table is constructed from sequences in the database in the first stage Query words are searched appropriately from the hash table in the second stage

  30. SSAHA S = <CCACACCBBBAA> is a DB sequence and <CABCA> is a query sequence. The overlapping 2-symbol words of the query are {<CA>, <AB>, <BC>, <CA>}.
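
The two SSAHA stages can be sketched on this example as below. This is a simplified illustration, not the full algorithm: the function names are hypothetical, and whether the table indexes overlapping or non-overlapping database words is left to the assumed `step` parameter (set to 1 here so that every position is indexed).

```python
from collections import defaultdict


def build_hash_table(db_sequence: str, k: int = 2, step: int = 1) -> dict:
    """Stage 1: map each k-symbol word of the database to its start positions."""
    table = defaultdict(list)
    for pos in range(0, len(db_sequence) - k + 1, step):
        table[db_sequence[pos:pos + k]].append(pos)
    return table


def query_words(query: str, k: int = 2) -> list:
    """Split the query into overlapping k-symbol words."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def find_hits(table: dict, query: str, k: int = 2) -> list:
    """Stage 2: look up each query word; return (query offset, db position) pairs."""
    hits = []
    for offset, word in enumerate(query_words(query, k)):
        for pos in table.get(word, []):
            hits.append((offset, pos))
    return hits


table = build_hash_table("CCACACCBBBAA")
print(query_words("CABCA"))
print(find_hits(table, "CABCA"))
```

Only the word CA occurs in the database sequence, so all hits come from the first and last query words.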

  31. Modified SSAHA

  32. Time-amount Dimension in FDS Consider monthly spending sequence in BLAHFDS A month is divided into four weeks Each week is divided into two slots weekday and weekend Simple yet effective way of visualizing the time dimension Time dimension TD = [0,1,2,3,4,5,6,7] is represented by eight time slots
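
One way to map a purchase date onto the eight slots is sketched below. The slide does not say how slots are numbered or how days 29 to 31 are treated, so the numbering (weekday slots even, weekend slots odd) and folding the last days into week four are assumptions of this sketch.

```python
from datetime import date


def time_slot(purchase_date: date) -> int:
    """Map a date to one of the eight slots in TD = [0, ..., 7]."""
    # Four weeks per month; days 29-31 are folded into the fourth week (assumption).
    week = min((purchase_date.day - 1) // 7, 3)
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    is_weekend = purchase_date.weekday() >= 5
    return 2 * week + (1 if is_weekend else 0)


print(time_slot(date(2011, 1, 3)))   # a Monday in the first week
print(time_slot(date(2011, 1, 30)))  # a Sunday at the end of the month
```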

  33. Architecture of BLAHFDS

  34. Performance Analysis Performance of BLAHFDS for different T and W

  35. Performance Analysis Variation of TP/FP with Genuine Profile Size

  36. Performance Analysis Variation of Alignment Time with Genuine Profile Size

  37. Performance Analysis Comparative study of TP/FP for BLAHFDS and CARDWATCH

  38. Limitations of BLAHFDS Uses a single rule. Cannot learn and adapt to the changing behavior of users. Further scope for reducing false alarms? Combine multiple rules. Introduce a belief update component.

  39. CCFD using Dempster-Shafer Theory and Bayesian Learning

  40. Overview Uses a number of rules to monitor behavioral pattern of users Each rule measures deviation of an incoming transaction from user’s normal profile by assigning basic probabilities to it Basic probabilities are combined using an extension of Dempster-Shafer theory to obtain an initial belief 6/7/2014 40

  41. Overview (contd.) Two preset thresholds are used: a lower threshold (θLT) and an upper threshold (θUT). If initial belief < θLT, the transaction is genuine. If initial belief > θUT, the transaction is intrusive. If θLT ≤ initial belief ≤ θUT, the transaction is suspicious. For a suspicious transaction, its initial belief is updated based on similarity with genuine or intrusive transaction history using Bayesian learning.
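
The three-way decision on this slide amounts to a simple threshold test. In the sketch below the threshold values 0.3 and 0.7 are placeholders chosen for illustration; the lecture does not state the values used.

```python
def classify(initial_belief: float, lower: float, upper: float) -> str:
    """Classify a transaction from its initial belief and the two thresholds."""
    if initial_belief < lower:
        return "genuine"
    if initial_belief > upper:
        return "intrusive"
    # lower <= belief <= upper: pass on for belief update using history
    return "suspicious"


for belief in (0.1, 0.5, 0.9):
    print(belief, classify(belief, lower=0.3, upper=0.7))
```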

  42. FDS Components FDS has the following four components: Rule-based Filter, Dempster-Shafer Adder, Transaction History Database, Bayesian Learner

  43. Rule-Based Filter (RBF) RBF consists of generic as well as customer-specific rules which classify an incoming transaction as fraudulent with a certain probability. Rules at the RBF: Address Mismatch (R1), Outlier Detection (R2). Other rules may be added as required.

  44. Outlier Detection DBSCAN (Density Based Spatial Clustering of Applications with Noise) algorithm is applied for outlier detection ‘Transaction amount’ is used as the attribute for generating outliers
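
To illustrate how DBSCAN marks a transaction amount as an outlier, the sketch below implements only its noise test on one-dimensional data: a point is noise if it is not a core point and has no core point within ε. It does not build the clusters themselves, and counting a point as its own neighbour is a convention assumed here.

```python
def dbscan_outliers(amounts: list, eps: float, min_pts: int) -> list:
    """Return the amounts that DBSCAN would label as noise (outliers)."""
    n = len(amounts)
    # eps-neighbourhood of each point (the point itself is included)
    neighbours = [
        [j for j in range(n) if abs(amounts[i] - amounts[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its neighbourhood.
    is_core = [len(nb) >= min_pts for nb in neighbours]
    # Noise: not core, and not within eps of any core point.
    return [
        amounts[i]
        for i in range(n)
        if not is_core[i] and not any(is_core[j] for j in neighbours[i])
    ]


history = [20.0, 22.0, 25.0, 21.0, 24.0, 500.0]
print(dbscan_outliers(history, eps=5.0, min_pts=3))
```

The amounts and parameter values are made up for the example; a production system would more likely rely on an existing DBSCAN implementation.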

  45. Degree of Outlierness Deviation of an incoming transaction is measured by its degree of outlierness (doutlier), defined in terms of: MinPts, the minimum number of points required in the ε-neighborhood of each point to form a cluster; ε, the maximum radius of the neighborhood; vavg, the average distance of amount p of an outlier transaction from the set of existing clusters

  46. Dempster-Shafer Adder (DSA) DSA uses the Dempster-Shafer theory (DST) to compute an initial belief for each transaction by combining evidences from R1 and R2. DST is a mathematical theory of evidence to combine multiple evidences from independent sources. It assumes a Universe of Discourse U (Frame of Discernment): a set of mutually exclusive and exhaustive possibilities.

  47. Dempster-Shafer Adder (contd.) For the CCFD problem, U consists of two possible values for any suspected transaction: U = {fraud, ¬fraud}. The power set of U has three non-empty elements: h = {fraud} implies the transaction is fraudulent; h’ = {¬fraud} implies the transaction is not fraudulent (genuine); U implies the transaction is suspicious.

  48. Basic Probability Assignment DST is applied for CCFD by assigning probabilities to evidences from R1 and R2. The basic probability assignment (BPA) of R1 and R2 is given by: BPA for R1: m1(h) = 0.6, m1(h’) = 0, m1(U) = 0.4. BPA for R2: m2(h) = doutlier, m2(h’) = 0, m2(U) = 1 - doutlier.

  49. Dempster’s Rule for Combination Initial belief P(h) for a transaction is now computed by combining m1 and m2 with Dempster’s rule. Based on P(h), a transaction can be initially classified as genuine, fraudulent or suspicious.
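
A minimal sketch of the standard Dempster rule of combination on the frame {h, h', U}, applied to the BPAs of R1 and R2 given on the previous slide. The value doutlier = 0.5 is a made-up input, and the lecture's system may use an extended form of the rule, so this shows only the textbook version.

```python
from itertools import product

H, NOT_H, U = "h", "h'", "U"


def intersect(x: str, y: str):
    """Set intersection on the frame: U acts as the whole set."""
    if x == U:
        return y
    if y == U:
        return x
    return x if x == y else None  # h and h' are disjoint


def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: multiply masses, pool by intersection, renormalise."""
    combined = {H: 0.0, NOT_H: 0.0, U: 0.0}
    conflict = 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = intersect(x, y)
        if z is None:
            conflict += mass_x * mass_y   # mass assigned to the empty set
        else:
            combined[z] += mass_x * mass_y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


d_outlier = 0.5  # hypothetical degree of outlierness
m1 = {H: 0.6, NOT_H: 0.0, U: 0.4}                  # BPA for R1
m2 = {H: d_outlier, NOT_H: 0.0, U: 1 - d_outlier}  # BPA for R2
print(combine(m1, m2)[H])
```

Because neither rule assigns mass to h', there is no conflict here and the combined belief reduces to P(h) = 0.6 + 0.4 × doutlier.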

  50. Transaction History Database (THD) Transactions labeled as suspicious by DSA are passed to THD for further strengthening or weakening the initial belief. Each history transaction is represented by a set of attributes: card number, transaction amount and time since last purchase (time gap). Past spending frequency on a card is accumulated and analyzed at the THD.
