The FP-Growth/Apriori Debate

Jeffrey R. Ellis

CSE 300 – 01

April 11, 2002

Presentation Overview
  • FP-Growth Algorithm
    • Refresher & Example
    • Motivation
  • FP-Growth Complexity
    • Vs. Apriori Complexity
    • Saving calculation or hiding work?
  • Real World Application
    • Datasets are not created equal
    • Results of real-world implementation
FP-Growth Algorithm
  • Association Rule Mining
    • Generate Frequent Itemsets
      • Apriori generates candidate sets
      • FP-Growth uses specialized data structures (no candidate sets)
    • Find Association Rules
      • Outside the scope of both FP-Growth & Apriori
  • Therefore, FP-Growth is a competitor to Apriori
FP-Growth Example

TID   Items bought               (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
200   {a, b, c, f, l, m, o}      {f, c, a, b, m}
300   {b, f, h, j, o}            {f, b}
400   {b, c, k, s, p}            {c, b, p}
500   {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3

[FP-tree figure: root {} with main path f:4 – c:3 – a:3 – m:2 – p:2; side branches b:1 – m:1 under a:3 and b:1 under f:4; a second root branch c:1 – b:1 – p:1]

Conditional pattern bases:

item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

FP-Growth Example

Item   Conditional pattern base    Conditional FP-tree
p      {(fcam:2), (cb:1)}          {(c:3)}|p
m      {(fca:2), (fcab:1)}         {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}     Empty
a      {(fc:3)}                    {(f:3, c:3)}|a
c      {(f:3)}                     {(f:3)}|c
f      Empty                       Empty
FP-Growth Algorithm: FP-CreateTree

Input: DB, min_support
Output: FP-Tree

Scan DB & count all frequent items.
Create null root & set as current node.
For each transaction T
    Sort T's frequent items by descending frequency.
    For each sorted item I
        Insert I into the tree as a child of the current node (or increment the count of an existing child).
        Connect each new tree node to its header list.

Only two passes through the DB are needed, and tree creation work is proportional to the number of items in the DB, so the complexity of CreateTree is O(|DB|).
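As a concrete sketch of FP-CreateTree, the two passes can be written in a few lines of Python (the names `FPNode` and `create_tree` are mine, not from the paper):

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}  # item -> FPNode

def create_tree(transactions, min_support):
    # Pass 1: count supports and keep only frequent items.
    counts = Counter(i for t in transactions for i in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    rank = {i: r for r, i in enumerate(
        sorted(frequent, key=lambda i: -frequent[i]))}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    # Pass 2: insert each transaction's frequent items in frequency order.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# The five transactions from the example slide, min_support = 3.
db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
root, header = create_tree([list(t) for t in db], 3)
# Summing counts along a header chain recovers each item's support
# (f:4, c:4, a:3, b:3, m:3, p:3).
print({i: sum(n.count for n in header[i]) for i in "fcabmp"})
```

Ties in frequency are broken arbitrarily here, so the tree shape can differ slightly from the slide's figure, but the item supports are identical.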
FP-Growth Algorithm: FP-Growth

Input: FP-Tree, f_is (frequent itemset)
Output: All freq_patterns

if FP-Tree contains a single path P
then for each combination of nodes in P
        generate pattern f_is ∪ combination
else for each item i in header
        generate pattern f_is ∪ i
        construct the pattern's conditional cFP-Tree
        if (cFP-Tree ≠ null)
        then call FP-Growth(cFP-Tree, pattern)

The recursive algorithm creates FP-Tree structures and calls FP-Growth on them. It claims no candidate generation.
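A minimal, self-contained Python sketch of the recursion above (my own naming; it omits the single-path shortcut, which affects only speed, not output):

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(weighted, min_support):
    """Build an FP-tree from (itemset, count) pairs."""
    counts = Counter()
    for items, n in weighted:
        for i in items:
            counts[i] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    order = sorted(frequent, key=lambda i: -frequent[i])
    rank = {i: r for r, i in enumerate(order)}
    root, header = FPNode(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for item in sorted((i for i in items if i in frequent), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return root, header, order

def fp_growth(weighted, min_support, suffix=frozenset()):
    """Return {frequent itemset: support} mined from weighted transactions."""
    _, header, order = build_tree(weighted, min_support)
    patterns = {}
    for item in reversed(order):  # least frequent header item first
        pattern = suffix | {item}
        patterns[frozenset(pattern)] = sum(n.count for n in header[item])
        # Conditional pattern base: the prefix path above every `item` node.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_support, pattern))  # recurse
    return patterns

db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
found = fp_growth([(list(t), 1) for t in db], 3)
print(found[frozenset("fcam")])  # {f, c, a, m} has support 3
```

Run on the slide's five transactions with min_support = 3, this reproduces the frequent itemsets of the worked example.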
Two-Part Algorithm (single-path case)

[Figure: a single-path FP-tree A – B – C – D]

if FP-Tree contains a single path P
then for each combination of nodes in P
        generate pattern f_is ∪ combination

  • e.g., path { A B C D } with suffix D:
    • p = |pattern| = 4
    • AD, BD, CD, ABD, ACD, BCD, ABCD
  • Number of patterns: Σ (n = 1 to p−1) C(p−1, n) = 2^(p−1) − 1 = 7
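The single-path enumeration above can be checked in a few lines (the variable names are illustrative):

```python
from itertools import combinations

# Single path A-B-C-D mined for suffix D: every combination of the
# prefix nodes {A, B, C}, each joined with D.
prefix, suffix = ["A", "B", "C"], "D"
patterns = [frozenset(c) | {suffix}
            for n in range(1, len(prefix) + 1)
            for c in combinations(prefix, n)]
print(len(patterns))  # 2^(p-1) - 1 = 7
```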
Two-Part Algorithm (multi-path case)

[Figure: FP-tree rooted at A with branches A–B–C–D–E, A–C–D–E, and A–D–E]

else for each item i in header
        generate pattern f_is ∪ i
        construct the pattern's conditional cFP-Tree
        if (cFP-Tree ≠ null)
        then call FP-Growth(cFP-Tree, pattern)

  • e.g., { A B C D E } for f_is = D
    • i = A, p_base = (ABCD), (ACD), (AD)
    • i = B, p_base = (ABCD)
    • i = C, p_base = (ABCD), (ACD)
    • i = E, p_base = null
  • For each header item, pattern bases are generated by following the header link to every tree node holding that item, then walking each node's path up to the tree root.
FP-Growth Complexity
  • Each path in the tree is at least partially traversed once per header item, and a traversal can touch every item on that path (the depth of the path).
  • Searching through all paths is therefore bounded by O(header_count² × tree depth).
  • On top of this, a new cFP-Tree must be created for each conditional pattern.
Sample Data FP-Tree

[Figure: a larger sample FP-tree rooted at null, with branches over the items A, B, C, D, E, J, K, L, M; most paths terminate in runs of K, L, and M nodes]
Algorithm Results (in English)
  • Candidate-generation sets are exchanged for conditional FP-Trees.
    • You MUST take into account every path that contains an item set with a test item.
    • You CANNOT determine, before a conditional FP-Tree is created, whether new frequent item sets will occur.
    • Trivial examples hide these costs, leading to a belief that FP-Growth operates more efficiently.
FP-Growth Mining Example

Transactions: A (49), B (44), F (39), G (34), H (30), I (20), plus ABGM, BGIM, FGIM, GHIM

Header (frequency order): A, B, F, G, H, I, M

[FP-tree figure: root null with children A:50, B:45, F:40, G:35, H:30, I:20, and branch paths A – B:1 – G:1 – M:1, B – G:1 – I:1 – M:1, F – G:1 – I:1 – M:1, G – H:1 – I:1 – M:1]
FP-Growth Mining Example

freq_itemset = { M }, i = A: pattern = { A M }, support = 1, pattern base gives cFP-Tree = null

FP-Growth Mining Example

freq_itemset = { M }, i = B: pattern = { B M }, support = 2, pattern base gives a non-empty cFP-Tree

FP-Growth Mining Example

freq_itemset = { B M }
[Conditional FP-tree figure; header: G, B, M; nodes G:2, B:2, M:2]
Patterns mined: BM, GM, BGM
All patterns: BM, GM, BGM

FP-Growth Mining Example

freq_itemset = { M }, i = F: pattern = { F M }, support = 1, pattern base gives cFP-Tree = null

FP-Growth Mining Example

freq_itemset = { M }, i = G: pattern = { G M }, support = 4, pattern base gives a non-empty cFP-Tree

FP-Growth Mining Example

freq_itemset = { G M }, i = I: pattern = { I G M }, support = 3, pattern base gives a non-empty cFP-Tree
[Conditional FP-tree figure; header: I, B, G, M; nodes I:3, B:1, B:1, G:2, G:1, G:1, M:2, M:1, M:1]

FP-Growth Mining Example

freq_itemset = { I G M }
[Conditional FP-tree figure; header: I, G, M; nodes I:3, G:3, M:3]
Patterns mined: IM, GM, IGM
All patterns: BM, GM, BGM, IM, GM, IGM

FP-Growth Mining Example

freq_itemset = { G M }, i = B: pattern = { B G M }, support = 2, pattern base gives a non-empty cFP-Tree
[Conditional FP-tree figure; header: I, B, G, M]

FP-Growth Mining Example

freq_itemset = { B G M }
[Conditional FP-tree figure; header: B, G, M; nodes B:2, G:2, M:2]
Patterns mined: BM, GM, BGM
All patterns: BM, GM, BGM, IM, GM, IGM, BM, GM, BGM

FP-Growth Mining Example

freq_itemset = { M }, i = H: pattern = { H M }, support = 1, pattern base gives cFP-Tree = null

FP-Growth Mining Example
  • Complete pass for { M }
  • Move on to { I }, { H }, etc.
  • Final Frequent Sets:
    • L2 : { BM, GM, IM, GM, BM, GM, IM, GM, GI, BG }
    • L3 : { BGM, GIM, BGM, GIM }
    • L4 : None
FP-Growth Redundancy v. Apriori Candidate Sets
  • FP-Growth (support=2) generates:
    • L2 : 10 sets (5 distinct)
    • L3 : 4 sets (2 distinct)
    • Total : 14 sets
  • Apriori (support=2) generates:
    • C2 : 21 sets
    • C3 : 2 sets
    • Total : 23 sets
  • What about support=1?
    • Apriori : 23 sets
    • FP-Growth : 28 sets in {M} with i = A, B, F alone!
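The Apriori candidate counts above are easy to verify: with the mining example's seven frequent 1-items (A, B, F, G, H, I, M), C2 is every unordered pair of them.

```python
from math import comb

# Seven frequent 1-itemsets from the mining example.
items = "ABFGHIM"
print(comb(len(items), 2))  # C(7, 2) = 21 candidate 2-itemsets
```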
FP-Growth vs. Apriori
  • Apriori visits each transaction when generating a new candidate set; FP-Growth does not
    • Can use data structures to reduce transaction list
  • FP-Growth traces the set of concurrent items; Apriori generates candidate sets
  • FP-Growth uses more complicated data structures & mining techniques
Algorithm Analysis Results
  • FP-Growth IS NOT inherently faster than Apriori
    • Intuitively, it appears to condense data
    • Its mining scheme introduces new work to replace candidate set generation
    • Recursion obscures the additional effort
  • FP-Growth may run faster than Apriori in some circumstances
  • Complexity analysis alone cannot tell us which algorithm will be more efficient
Improvements to FP-Growth
  • None currently reported beyond MLFPT
  • MLFPT
    • Multiple Local Frequent Pattern Tree
    • A new algorithm based on FP-Growth
    • Distributes FP-Trees among processors
  • No complexity analysis or accuracy comparison against FP-Growth has been reported
Real World Applications
  • Zheng, Kohavi, Mason – “Real World Performance of Association Rule Algorithms”
    • Collected implementations of Apriori, FP-Growth, CLOSET, CHARM, MagnumOpus
    • Tested implementations against 1 artificial and 3 real data sets
    • Time-based comparisons generated
Apriori & FP-Growth
  • Apriori
    • Christian Borgelt's implementation (GNU General Public License)
    • C implementation
    • Entire dataset loaded into memory
  • FP-Growth
    • Implementation from creators Han & Pei
    • Version – February 5, 2001
Other Algorithms
  • CHARM
    • Based on the concept of the Closed Itemset (an itemset is closed when no proper superset has the same support)
    • e.g., { A, B, C } – ABC, AB, AC, BC, etc.
  • CLOSET
    • Han, Pei implementation of Closed Itemset
  • MagnumOpus
    • Generates rules directly through search-and-prune technique
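To make the closed-itemset idea concrete, here is a tiny brute-force check over a hypothetical three-transaction database (the data is illustrative, not from the study):

```python
from itertools import combinations

# An itemset is *closed* if no proper superset has the same support.
db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}]  # hypothetical transactions

def support(s):
    return sum(s <= t for t in db)

itemsets = [frozenset(c) for n in (1, 2, 3)
            for c in combinations("ABC", n)]
closed = [s for s in itemsets
          if not any(s < t and support(s) == support(t) for t in itemsets)]
print(sorted("".join(sorted(s)) for s in closed))  # ['A', 'AB', 'ABC', 'AC']
```

Here B is not closed because AB has the same support (2), so a closed-itemset miner reports AB and drops B.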
Datasets
  • IBM-Artificial
    • Generated at IBM Almaden (T10I4D100K)
    • Often used in association rule mining studies
  • BMS-POS
    • Years of point-of-sale data from a retailer
  • BMS-WebView-1 & BMS-WebView-2
    • Months of clickstream traffic from e-commerce web sites
Experimental Considerations
  • Hardware Specifications
    • Dual 550MHz Pentium III Xeon processors
    • 1GB Memory
  • Support { 1.00%, 0.80%, 0.60%, 0.40%, 0.20%, 0.10%, 0.08%, 0.06%, 0.04%, 0.02%, 0.01% }
  • Confidence = 0%
  • No other applications running (second processor handles system processes)
[Performance charts: BMS-WebView-1 and BMS-WebView-2]

Study Results – Real Data
  • At support ≥ 0.20%, Apriori performs as fast as or better than FP-Growth
  • At support < 0.20%, Apriori completes whenever FP-Growth completes
    • Exception – BMS-WebView-2 @ 0.01%
  • When 2 million rules are generated, Apriori finishes in 10 minutes or less
    • Proposed – Bottleneck is NOT the rule algorithm, but rule analysis
Study Results – Artificial Data
  • At support < 0.40%, FP-Growth performs MUCH faster than Apriori
  • At support ≥ 0.40%, FP-Growth and Apriori are comparable
Real-World Study Conclusions
  • FP-Growth (and other non-Apriori algorithms) perform better on artificial data
  • On all data sets, Apriori performs sufficiently well in reasonable time periods for reasonable result sets
  • FP-Growth may be suitable when low support, large result count, fast generation are needed
  • Future research may best be directed toward analyzing association rules
Research Conclusions
  • FP-Growth does not have better complexity than Apriori
    • Common sense suggests it will run faster
  • FP-Growth does not always have a better running time than Apriori
    • Support level and dataset appear more influential
  • FP-Trees are very complex structures (Apriori is simple)
  • Location of data (memory vs. disk) is non-factor in comparison of algorithms
To Use or Not To Use?
  • Question: Should the FP-Growth be used in favor of Apriori?
    • Difficulty to code
    • High performance at extreme cases
    • Personal preference
  • More relevant questions
    • What kind of data is it?
    • What kind of results do I want?
    • How will I analyze the resulting rules?
References
  • Han, Jiawei, Pei, Jian, and Yin, Yiwen, "Mining Frequent Patterns without Candidate Generation". In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1-12, Dallas, Texas, USA, 2000.
  • Orlando, Salvatore, "High Performance Mining of Short and Long Patterns". 2001.
  • Pei, Jian, Han, Jiawei, and Mao, Runying, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets". In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.
  • Webb, Geoffrey L., "Efficient search for association rules". In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 99-107, 2000.
  • Zaiane, Osmar, El-Hajj, Mohammad, and Lu, Paul, "Fast Parallel Association Rule Mining Without Candidacy Generation". In Proc. of the IEEE 2001 International Conference on Data Mining (ICDM'2001), San Jose, CA, USA, November 29-December 2, 2001.
  • Zaki, Mohammed J., "Generating Non-Redundant Association Rules". In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2000.
  • Zheng, Zijian, Kohavi, Ron, and Mason, Llew, "Real World Performance of Association Rule Algorithms". In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, August 2001.