
Chapter 5 Mining Association Rules with FP Tree

Dr. Bernard Chen Ph.D.

University of Central Arkansas

Fall 2010

Mining Frequent Itemsets without Candidate Generation
  • In many cases, the Apriori candidate generate-and-test method significantly reduces the size of the candidate sets, leading to good performance gains.
  • However, it suffers from two nontrivial costs:
    • It may generate a huge number of candidates (for example, 10^4 frequent 1-itemsets yield more than 10^7 candidate 2-itemsets, as the check below confirms)
    • It may need to scan the database many times
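As a quick check on that figure (a sketch, not from the slides), the number of candidate 2-itemsets formed from 10^4 frequent 1-itemsets is the binomial coefficient C(10^4, 2):

```python
from math import comb

# Candidate 2-itemsets that Apriori's join step would produce
# from 10**4 frequent 1-itemsets: C(10^4, 2)
print(comb(10**4, 2))   # 49995000, i.e. roughly 5 * 10^7 > 10^7
```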
Bottleneck of Frequent-pattern Mining
  • Multiple database scans are costly
  • Mining long patterns needs many passes of scanning and generates lots of candidates
    • To find the frequent itemset i1 i2 … i100:
      • # of scans: 100
      • # of candidates: C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27×10^30 (verified below)
  • Bottleneck: candidate-generation-and-test
  • Can we avoid candidate generation?
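The candidate count for the single length-100 pattern can be verified the same way:

```python
from math import comb

# C(100,1) + C(100,2) + ... + C(100,100) = 2**100 - 1
total = sum(comb(100, k) for k in range(1, 101))
assert total == 2**100 - 1
print(float(total))   # ~1.27e+30 candidates for one length-100 pattern
```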
Mining Frequent Patterns Without Candidate Generation
  • Grow long patterns from short ones using local frequent items
    • “abc” is a frequent pattern
    • Get all transactions having “abc”: DB|abc
    • “d” is a local frequent item in DB|abc → “abcd” is a frequent pattern (see the sketch below)
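A minimal sketch of this projection idea; the helper names, the toy transactions, and min_sup = 2 are illustrative assumptions, not from the slides:

```python
from collections import Counter

def project(db, pattern):
    """DB|pattern: keep only the transactions that contain every item of `pattern`."""
    return [t for t in db if pattern <= t]

def local_frequent_items(db, pattern, min_sup):
    """Items frequent inside DB|pattern; each one extends `pattern` to a longer frequent pattern."""
    counts = Counter(item for t in project(db, pattern) for item in t - pattern)
    return {item for item, c in counts.items() if c >= min_sup}

# Toy usage: "abc" is frequent, and "d" is locally frequent in DB|abc, so "abcd" is frequent.
db = [{"a", "b", "c", "d"}, {"a", "b", "c", "d"}, {"a", "b", "c"}, {"b", "c"}]
print(local_frequent_items(db, {"a", "b", "c"}, min_sup=2))   # {'d'}
```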
Process of FP growth
  • Scan DB once, find frequent 1-itemset (single item pattern)
  • Sort frequent items in frequency descending order
  • Scan DB again, construct FP-tree
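A rough sketch of the first two steps in Python, run against the nine transactions shown on the next slide; min_sup = 2 is an assumption for illustration:

```python
from collections import Counter

def frequent_items_in_order(db, min_sup):
    """Scan 1: count single items, drop the infrequent ones,
    and return the survivors in descending frequency order."""
    counts = Counter(item for t in db for item in t)
    keep = {i: c for i, c in counts.items() if c >= min_sup}
    return sorted(keep, key=lambda i: (-keep[i], i))

db = [["1", "2", "5"], ["2", "4"], ["2", "3"], ["1", "2", "4"], ["1", "3"],
      ["2", "3"], ["1", "3"], ["1", "2", "3", "5"], ["1", "2", "3"]]
print(frequent_items_in_order(db, min_sup=2))   # ['2', '1', '3', '4', '5']
```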
Association Rules
  • Let’s have an example
    • T100 1,2,5
    • T200 2,4
    • T300 2,3
    • T400 1,2,4
    • T500 1,3
    • T600 2,3
    • T700 1,3
    • T800 1,2,3,5
    • T900 1,2,3
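A compact FP-tree construction sketch for these nine transactions; the FPNode layout and min_sup = 2 are illustrative assumptions, and the header table with node-links is omitted for brevity:

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}

def build_fp_tree(db, min_sup):
    # Scan 1: global counts; keep frequent items, ranked by descending frequency.
    counts = Counter(i for t in db for i in t)
    frequent = [i for i, c in counts.items() if c >= min_sup]
    rank = {i: r for r, i in enumerate(sorted(frequent, key=lambda i: (-counts[i], i)))}
    root = FPNode(None)
    # Scan 2: insert each transaction along a shared prefix path, bumping counts.
    for t in db:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root

db = [["1", "2", "5"], ["2", "4"], ["2", "3"], ["1", "2", "4"], ["1", "3"],
      ["2", "3"], ["1", "3"], ["1", "2", "3", "5"], ["1", "2", "3"]]
root = build_fp_tree(db, min_sup=2)
print({item: child.count for item, child in root.children.items()})   # {'2': 7, '1': 2}
```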
Benefits of the FP-tree Structure
  • Completeness
    • Preserve complete information for frequent pattern mining
    • Never break a long pattern of any transaction
  • Compactness
    • Reduce irrelevant info—infrequent items are gone
    • Items in frequency descending order: the more frequently occurring, the more likely to be shared
    • Never larger than the original database (not counting node-links and count fields)
    • For Connect-4 DB, compression ratio could be over 100
Exercise
  • A dataset has five transactions; let min_support = 60% (a support count of 3) and min_confidence = 80%
  • Find all frequent itemsets using the FP-tree
Association Rules with Apriori

  • Frequent 1-itemsets with counts (min support count = 3): K:5, E:4, M:3, O:3, Y:3
  • Candidate 2-itemsets with counts: KE:4, KM:3, KO:3, KY:3, EM:2, EO:3, EY:2, MO:1, MY:2, OY:2
  • Frequent 2-itemsets: KE, KM, KO, KY, EO
  • Frequent 3-itemset: KEO
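For comparison, a bare-bones generate-and-test Apriori sketch; the function shape and min_sup = 2 are assumptions, and the demo reuses the nine-transaction example from earlier in the chapter:

```python
from itertools import combinations

def apriori(db, min_sup):
    """Level-wise generate-and-test: candidate (k+1)-itemsets are joined from
    frequent k-itemsets, then counted in one full pass over the database."""
    db = [frozenset(t) for t in db]
    candidates = {frozenset([i]) for t in db for i in t}   # C1: all single items
    freq = {}
    while candidates:
        # Test step: one scan of the database per level.
        counts = {c: sum(c <= t for t in db) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        freq.update(level)
        # Join step: unions of two frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
    return freq

db = [["1", "2", "5"], ["2", "4"], ["2", "3"], ["1", "2", "4"], ["1", "3"],
      ["2", "3"], ["1", "3"], ["1", "2", "3", "5"], ["1", "2", "3"]]
for itemset, count in sorted(apriori(db, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```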

Association Rules with FP Tree

  • Y: conditional pattern base {KEMO:1, KEO:1, KM:1} → conditional FP-tree K:3 → frequent pattern KY
  • O: conditional pattern base {KEM:1, KE:2} → conditional FP-tree K:3, E:3 → frequent patterns KO, EO, KEO
  • M: conditional pattern base {KE:2, K:1} → conditional FP-tree K:3 → frequent pattern KM
  • E: conditional pattern base {K:4} → conditional FP-tree K:4 → frequent pattern KE
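A simplified recursive FP-growth sketch that mirrors this table. To stay short it operates on (prefix path, count) pairs with items already in descending global frequency order, rather than on a linked tree with a header table; that representation is an assumption for illustration:

```python
from collections import Counter

def fp_growth(pattern_base, min_sup, suffix=frozenset()):
    """Mine all frequent itemsets ending in `suffix` from a conditional
    pattern base given as (ordered path, count) pairs."""
    counts = Counter()
    for path, n in pattern_base:
        for item in path:
            counts[item] += n
    found = {}
    for item, support in counts.items():
        if support < min_sup:
            continue
        pattern = suffix | {item}
        found[pattern] = support
        # Conditional pattern base of `item`: the part of each path before `item`.
        conditional = [(path[:path.index(item)], n)
                       for path, n in pattern_base
                       if item in path and path.index(item) > 0]
        found.update(fp_growth(conditional, min_sup, pattern))
    return found

# Usage on the nine transactions from before, items ordered 2, 1, 3, 4, 5:
base = [(("2", "1", "5"), 1), (("2", "4"), 1), (("2", "3"), 1), (("2", "1", "4"), 1),
        (("1", "3"), 1), (("2", "3"), 1), (("1", "3"), 1),
        (("2", "1", "3", "5"), 1), (("2", "1", "3"), 1)]
for pattern, support in sorted(fp_growth(base, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), support)
```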

Why Is FP-Growth the Winner?
  • Divide-and-conquer:
    • decompose both the mining task and DB according to the frequent patterns obtained so far
    • leads to focused search of smaller databases
  • Other factors
    • no candidate generation, no candidate test
    • compressed database: FP-tree structure
    • no repeated scan of entire database
    • basic operations are counting local frequent items and building sub-FP-trees; no pattern search and matching

Strong Association Rules Are Not Necessarily Interesting

Dr. Bernard Chen Ph.D.

University of Central Arkansas

Fall 2010

Example 5.8 Misleading “Strong” Association Rule
  • Of the 10,000 transactions analyzed, the data show that
    • 6,000 of the customer transactions included computer games,
    • 7,500 included videos,
    • and 4,000 included both computer games and videos
Misleading “Strong” Association Rule
  • For this example:
    • Support (Game & Video) = 4,000 / 10,000 = 40%
    • Confidence (Game => Video) = 4,000 / 6,000 ≈ 66%
    • Suppose both pass our minimum support and confidence thresholds (30% and 60%, respectively)
Misleading “Strong” Association Rule
  • However, the truth is: “computer games and videos are negatively associated”
  • This means that the purchase of one of these items actually decreases the likelihood of purchasing the other.
  • (How do we reach this conclusion?)
Misleading “Strong” Association Rule
  • If the purchases of games and videos were independent:
    • 60% of customers buy the game
    • 75% of customers buy the video
    • Therefore, we would expect 60% * 75% = 45% of customers to buy both
    • That equals 4,500 transactions, which is more than the 4,000 actually observed
From Association Analysis to Correlation Analysis
  • Lift is a simple correlation measure that is given as follows
    • The occurrence of itemset A is independent of the occurrence of itemset B if P(A∪B) = P(A)P(B)
    • Otherwise, itemsets A and B are dependent and correlated as events
  • Lift(A, B) = P(A∪B) / (P(A)P(B))
    • If the value is less than 1, the occurrence of A is negatively correlated with the occurrence of B
    • If the value is greater than 1, then A and B are positively correlated
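Plugging the Example 5.8 numbers into these formulas, with Python used as a calculator:

```python
# Example 5.8: 10,000 transactions; 6,000 contain games, 7,500 contain videos, 4,000 contain both.
p_game, p_video, p_both = 6000 / 10000, 7500 / 10000, 4000 / 10000

support = p_both                      # 0.40
confidence = p_both / p_game          # ~0.667
lift = p_both / (p_game * p_video)    # ~0.889 < 1, so games and videos are negatively correlated
expected_both = p_game * p_video      # 0.45 -> 4,500 expected joint purchases vs. 4,000 observed
print(support, round(confidence, 3), round(lift, 3), expected_both)
```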
Mining Multiple-Level Association Rules
  • Items often form hierarchies
Mining Multiple-Level Association Rules
  • Flexible support settings
    • Items at the lower level are expected to have lower support
    • Uniform support: the same minimum support at every level (Level 1 min_sup = 5%, Level 2 min_sup = 5%)
    • Reduced support: a lower minimum support at lower levels (Level 1 min_sup = 5%, Level 2 min_sup = 3%)
    • Figure example: Milk [support = 10%], 2% Milk [support = 6%], Skim Milk [support = 4%]; with uniform support, Skim Milk (4% < 5%) is pruned, while with reduced support both 2% Milk and Skim Milk pass the Level-2 threshold
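A tiny check of the figure's numbers under the two settings; the dictionary below just restates the slide's support values:

```python
# Level-2 items and supports from the slide figure.
level2 = {"2% Milk": 0.06, "Skim Milk": 0.04}

uniform_min_sup = 0.05   # same threshold as Level 1
reduced_min_sup = 0.03   # relaxed threshold for Level 2

print(sorted(i for i, s in level2.items() if s >= uniform_min_sup))   # ['2% Milk']
print(sorted(i for i, s in level2.items() if s >= reduced_min_sup))   # ['2% Milk', 'Skim Milk']
```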
Multi-level Association: Redundancy Filtering
  • Some rules may be redundant due to “ancestor” relationships between items.
  • Example
    • milk => wheat bread [support = 8%, confidence = 70%]
    • 2% milk => wheat bread [support = 2%, confidence = 72%]
  • We say the first rule is an ancestor of the second rule.
  • The second rule is redundant if its support and confidence are close to their “expected” values given the ancestor rule; for instance, if roughly a quarter of the milk sold is 2% milk, then the 2% support of the second rule is just what the first rule already predicts.
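A hedged sketch of that redundancy test: compare the specialized rule's support with what the ancestor rule predicts. The 25% share of 2% milk within milk sales and the 20% tolerance are illustrative assumptions, not numbers from the slides:

```python
ancestor_support = 0.08   # milk => wheat bread
child_support = 0.02      # 2% milk => wheat bread
milk_share_2pct = 0.25    # assumed fraction of milk purchases that are 2% milk

expected_child_support = ancestor_support * milk_share_2pct   # 0.02
redundant = abs(child_support - expected_child_support) <= 0.2 * expected_child_support
print(expected_child_support, redundant)   # 0.02 True
```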