Association analysis 3
This presentation is the property of its rightful owner.
Sponsored Links
1 / 26

Association Analysis (3) PowerPoint PPT Presentation


  • 95 Views
  • Uploaded on
  • Presentation posted in: General

Association Analysis (3). FP-Tree/FP-Growth Algorithm. Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets. Building the FP-Tree

Download Presentation

Association Analysis (3)

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Association analysis 3

Association Analysis (3)


Fp tree fp growth algorithm

FP-Tree/FP-Growth Algorithm

  • Use a compressed representation of the database using an FP-tree

  • Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets.

    Building the FP-Tree

  • Scan data to determine the support count of each item.

    Infrequent items are discarded, while the frequent items are sorted in decreasing support counts.

  • Make a second pass over the data to construct the FP­tree.

    As the transactions are read, before being processed, their items are sorted according to the above order.


First scan determine frequent 1 itemsets then build header

First scan – determine frequent 1-itemsets, then build header


Fp tree construction

null

B:1

A:1

null

B:2

C:1

A:1

D:1

FP-tree construction

After reading TID=1:

After reading TID=2:


Fp tree construction1

null

B:8

A:2

A:5

C:3

C:1

D:1

C:3

D:1

D:1

E:1

D:1

E:1

D:1

E:1

FP-Tree Construction

Transaction Database

Header table

Chain pointers help in quickly finding all the paths of the tree containing some given item.


Fp tree size

FP-Tree size

  • The size of an FP­tree is typically smaller than the size of the uncompressed data because many transactions often share a few items in common.

  • Best­case scenario:

    • All transactions have the same set of items, and the FP­tree contains only a single branch of nodes.

  • Worst­case scenario:

    • Every transaction has a unique set of items.

    • As none of the transactions have any items in common, the size of the FP­tree is effectively the same as the size of the original data.

  • The size of an FP­tree also depends on how the items are ordered.

    • If the ordering scheme in the preceding example is reversed,

      • i.e., from lowest to highest support item, the resulting FP­tree probably is denser (shown in next slide).

      • Not always though…ordering is just a heuristic.


Association analysis 3

An FP­tree representation for the data set with a different item ordering scheme.


Fp growth i

FP-Growth (I)

  • FP­growth generates frequent itemsets from an FP­tree by exploring the tree in a bottom­up fashion.

  • Given the example tree, the algorithm looks for frequent itemsetsending in E first, followed by D, C, A, and finally, B.

  • Since every transaction is mapped onto a path in the FP­tree, we can derive the frequent itemsets ending with a particular item, say, E, by examining only the paths containing node E.

  • These paths can be accessed rapidly using the pointers associated with node E.


Association analysis 3

null

null

B:3

B:8

A:2

A:2

A:5

C:3

C:3

C:1

C:1

D:1

D:1

C:3

D:1

D:1

E:1

E:1

D:1

D:1

E:1

E:1

D:1

E:1

E:1

Paths containing node E


Conditional fp tree for e

Conditional FP-Tree for E

  • We now need to build a conditional FP-Tree for E, which is the tree of itemsets ending in E.

  • It is not the tree obtained in previous slide as result of deleting nodes from the original tree.

  • Why? Because the order of the items change.

    • In this example, C has a higher count than B.


Conditional fp tree for e1

Header table

null

The conditional FP-Tree for E

B:3

A:2

C:3

C:1

D:1

null

The new header

C:3

C:1

A:1

E:1

D:1

E:1

B:3

E:1

A:1

D:1

The set of paths containing E.

Insert each path (after truncating E) into a new tree.

D:1

Conditional FP-Tree for E

Adding up the counts for D we get 2, so {E,D} is frequent itemset.

We continue recursively.

Base of recursion: When the tree has a single path only.


Fp tree another example

FP-Tree Another Example

Transactions

Transactions with items sorted based on frequencies, and ignoring the infrequent items.

Freq. 1-Itemsets.

Supp. Count 2

A B C E F O

A C G

E I

A C D E G

A C E G L

E J

A B C E F P

A C D

A C E G M

A C E G N

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G


Association analysis 3

FP-Tree after reading 1st transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:1

C:1

E:1

B:1

F:1


Association analysis 3

FP-Tree after reading 2nd transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:2

C:2

G:1

E:1

B:1

F:1


Association analysis 3

FP-Tree after reading 3rd transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:2

E:1

C:2

G:1

E:1

B:1

F:1


Association analysis 3

FP-Tree after reading 4th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:3

E:1

C:3

G:1

E:2

G:1

B:1

F:1

D:1


Association analysis 3

FP-Tree after reading 5th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:4

E:1

C:4

G:1

E:3

G:2

B:1

F:1

D:1


Association analysis 3

FP-Tree after reading 6th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:4

E:2

C:4

G:1

E:3

G:2

B:1

F:1

D:1


Association analysis 3

FP-Tree after reading 7th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:5

E:2

C:5

G:1

E:4

G:2

B:2

F:2

D:1


Association analysis 3

FP-Tree after reading 8th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:6

E:2

C:6

G:1

D:1

E:4

G:2

B:2

F:2

D:1


Association analysis 3

FP-Tree after reading 9th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:7

E:2

C:7

G:1

D:1

E:5

G:3

B:2

F:2

D:1


Association analysis 3

FP-Tree after reading 10th transaction

A C E B F

A C G

E

A C E G D

A C E G

E

A C E B F

A C D

A C E G

A C E G

Header

null

A:8

E:2

C:8

G:1

D:1

E:6

G:4

B:2

F:2

D:1


Conditional fp trees

Conditional FP-Trees

Build the conditional FP-Tree for each of the items.

For this:

  • Find the paths containing on focus item. With those paths we build the conditional FP-Tree for the item.

  • Read again the tree to determine the new counts of the items along those paths. Build a new header.

  • Insert the paths in the conditional FP-Tree according to the new order.


Conditional fp tree for f

Conditional FP-Tree for F

null

Header

null

New Header

A:2

A:8

C:2

C:8

E:6

E:2

B:2

B:2

F:2

There is only a single path containing F


Recursion

Recursion

  • We continue recursively on the conditional FP-Tree for F.

  • However, when the tree is just a single path it is the base case for the recursion.

  • So, we just produce all the subsets of the items on this path merged with F.

    {F} {A,F} {C,F} {E,F} {B,F} {A,C,F}, …,

    {A,C,E,F}

null

New Header

A:2

C:2

E:2

B:2


Conditional fp tree for d

null

A:8

C:8

D:1

E:6

G:4

D:1

Paths containing D after updating the counts

Conditional FP-Tree for D

null

New Header

A:2

C:2

The other items are removed as infrequent.

  • The tree is just a single path; it is the base case for the recursion.

  • So, we just produce all the subsets of the items on this path merged with D.

  • {D} {A,D} {C,D} {A,C,D}

  • Exercise: Complete the example.


  • Login