FPtree/FPGrowth

FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm. Use a compressed representation of the database using an FP-tree. Then use a recursive divide-and-conquer approach to mine the frequent itemsets. Building the FP-Tree. Scan data to determine the support count of each item.

Presentation Transcript


  1. FPtree/FPGrowth

  2. FP-Tree/FP-Growth Algorithm • Build a compressed representation of the database as an FP-tree. • Then use a recursive divide-and-conquer approach to mine the frequent itemsets.
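
The recursive divide-and-conquer idea can be sketched in a few lines. This is a simplified illustration, not the slides' implementation: it skips the tree compression and keeps each conditional pattern base as a plain list of (items, count) pairs. The function name `fp_growth` and the toy database are assumptions made for the example.

```python
from collections import Counter

def fp_growth(pattern_base, min_support, suffix=()):
    """Yield (itemset, support) pairs for every frequent itemset ending in `suffix`.

    pattern_base is a list of (items, count) pairs. Hypothetical helper:
    no FP-tree is built, only the divide-and-conquer recursion is shown.
    """
    # Count the support of each item in this (conditional) database.
    counts = Counter()
    for items, count in pattern_base:
        for item in set(items):
            counts[item] += count
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Decreasing support, ties broken alphabetically.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Drop infrequent items and sort every transaction by the new order.
    ordered = [
        (sorted((i for i in set(items) if i in rank), key=rank.get), count)
        for items, count in pattern_base
    ]
    # Bottom-up: start with the least frequent item.
    for item in reversed(order):
        new_suffix = (item,) + suffix
        yield new_suffix, frequent[item]
        # Conditional pattern base: the prefix in front of `item` in each path.
        conditional = [
            (path[:path.index(item)], count)
            for path, count in ordered
            if item in path
        ]
        yield from fp_growth(conditional, min_support, new_suffix)

# Toy database (assumed for illustration, not the slides' data).
toy_db = [(["A", "B"], 1), (["B", "C", "D"], 1), (["A", "B", "C"], 1), (["B", "E"], 1)]
for itemset, support in fp_growth(toy_db, min_support=2):
    print(itemset, support)
```

Each recursive call works on a smaller database restricted to one suffix, which is the divide-and-conquer step; a full FP-Growth implementation would store each of these databases as an FP-tree.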

  3. Building the FP-Tree • Scan the data to determine the support count of each item. Infrequent items are discarded, while the frequent items are sorted in decreasing order of support count. • Make a second pass over the data to construct the FP-tree. As each transaction is read, and before it is inserted, its items are sorted according to that frequency order.
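
The two passes described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `Node` class, the `build_fp_tree` function, the alphabetical tie-break, and the toy transactions are all hypothetical and are not taken from the slides.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Pass 1: support count of each item; discard infrequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    # Decreasing support count; ties broken alphabetically (an assumption).
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}

    root = Node(None, None)
    header = {item: [] for item in ordered}  # item -> chain of its nodes

    # Pass 2: sort each transaction by the frequency order, then insert it.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent

# Toy data, assumed for illustration only.
transactions = [["A", "B"], ["B", "C", "D"], ["A", "B", "C"], ["B", "E"]]
root, header, frequent = build_fp_tree(transactions, min_support=2)
print(frequent)
print(root.children["B"].count)
```

Transactions that share a prefix in the sorted order reuse the same nodes, which is where the compression comes from.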

  4. First scan – determine frequent 1-itemsets, then build header

  5. FP-tree construction [Figure: the tree after reading TID=1 and after reading TID=2. Node labels in extraction order: null, B:1, A:1; null, B:2, C:1, A:1, D:1.]

  6. FP-Tree Construction [Figure: transaction database, header table, and the complete FP-tree. Node labels in extraction order: null, B:8, A:2, A:5, C:3, C:1, D:1, C:3, D:1, D:1, E:1, D:1, E:1, D:1, E:1.] • Chain pointers help in quickly finding all the paths of the tree containing some given item.

  7. FP-Tree size • The size of an FP-tree is typically smaller than the size of the uncompressed data, because many transactions often share a few items in common. • Best-case scenario: all transactions have the same set of items, and the FP-tree contains only a single branch of nodes. • Worst-case scenario: every transaction has a unique set of items. As none of the transactions have any items in common, the size of the FP-tree is effectively the same as the size of the original data. • The size of an FP-tree also depends on how the items are ordered. If the ordering scheme in the preceding example is reversed, i.e., from lowest to highest support item, the resulting FP-tree is denser.
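
The size claims above can be checked with a small node counter. This is a sketch with assumed names (`count_nodes`) and toy transactions that do not come from the slides; it inserts transactions into a plain prefix tree and counts the nodes created.

```python
def count_nodes(transactions, order):
    """Insert each transaction, sorted by `order`, into a prefix tree; return the node count."""
    root = {}
    nodes = 0
    for t in transactions:
        node = root
        for item in sorted(t, key=order.index):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes

# Best case: identical transactions collapse into a single branch.
print(count_nodes([["A", "B", "C"]] * 4, ["A", "B", "C"]))

# Worst case: no shared items, so there is one node per item occurrence.
print(count_nodes([["A", "B"], ["C", "D"], ["E", "F"]], list("ABCDEF")))

# Effect of ordering: A is the most frequent item in this toy data.
shared = [["A", "B"], ["A", "C"], ["A", "D"]]
print(count_nodes(shared, list("ABCD")))  # highest support first
print(count_nodes(shared, list("DCBA")))  # lowest support first
```

In this toy data, putting the most frequent item first lets all three transactions share one node for A, while the reversed order gives every transaction its own branch.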

  8. FP-Growth • FP-growth generates frequent itemsets by exploring the FP-tree in a bottom-up fashion. • It starts with the least frequent item, in this example, E. • The algorithm looks for frequent itemsets ending in E first, followed by D, C, A, and finally, B. • We can derive the frequent itemsets ending with E by examining only the paths containing node E. • These paths can be accessed rapidly using the pointers associated with node E.
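
Following the pointers for an item and walking up to the root gives the paths that contain it. The sketch below assumes a hypothetical `Node` class with a parent link and a hand-built toy tree; neither is taken from the slides.

```python
class Node:
    """Minimal tree node with a parent link (hypothetical structure)."""
    def __init__(self, item, count, parent=None):
        self.item = item
        self.count = count
        self.parent = parent

def prefix_paths(chain):
    """For each node in an item's pointer chain, return (path from root, node count)."""
    paths = []
    for node in chain:
        path = []
        current = node.parent
        # Walk upward until the root, whose item is None.
        while current is not None and current.item is not None:
            path.append(current.item)
            current = current.parent
        paths.append((path[::-1], node.count))
    return paths

# Toy tree: null -> B:3 -> A:2 -> E:1, and null -> C:1 -> E:1.
root = Node(None, 0)
b = Node("B", 3, root)
a = Node("A", 2, b)
c = Node("C", 1, root)
e_chain = [Node("E", 1, a), Node("E", 1, c)]  # the header-table chain for E

print(prefix_paths(e_chain))
```

Only the nodes on the chain and their ancestors are visited, so the rest of the tree is never scanned.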

  9. Paths containing node E [Figure: the full FP-tree next to the subtree formed by the paths that contain node E.]

  10. Conditional FP-Tree for E • FP-Growth builds a conditional FP-Tree for E, which is the tree of itemsets ending in E. • It is not the tree obtained in the previous slide as a result of deleting nodes from the original tree. Why? • Because the order of the items can change. • Now, C has a higher count than B.

  11. Suffix E • The set of paths ending in E. Insert each path (after truncating E) into a new tree. • B doesn't survive because its support is 1, which is lower than minsupport of 2. • We continue recursively. Base of recursion: when the tree has a single path only. • FI: E [Figure: paths ending in E, the new header table, and the conditional FP-Tree for suffix E. Node labels in extraction order: null, B:1, A:2, C:1, C:1, D:1, E:1, D:1, E:1, E:1; null, A:2, C:1, C:1, D:1, D:1.]

  12. Steps of Building Conditional FP-Trees • Find the paths containing the focus item. • Read the tree to determine the new counts of the items along those paths. Build a new header. • Read the tree again. Insert the paths in the conditional FP-Tree according to the new order.
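
The recounting and reordering steps can be sketched as below. The function name `conditional_paths` and the alphabetical tie-break are assumptions. The input paths for suffix E are read off the figure residue of the transcript, so treat them as an assumption as well.

```python
from collections import Counter

def conditional_paths(prefix_paths, min_support):
    """Recount items along the prefix paths, drop infrequent ones, and reorder each path.

    prefix_paths is a list of (items, count) pairs with the focus item already truncated.
    Returns the new header order and the paths ready for insertion into a new tree.
    """
    # New counts of the items along the paths.
    counts = Counter()
    for path, count in prefix_paths:
        for item in path:
            counts[item] += count
    # New header: frequent items in decreasing count (ties alphabetical, an assumption).
    keep = {i: c for i, c in counts.items() if c >= min_support}
    header = sorted(keep, key=lambda i: (-keep[i], i))
    rank = {item: r for r, item in enumerate(header)}
    # Second read: reorder every path according to the new header.
    reordered = [
        (sorted((i for i in path if i in rank), key=rank.get), count)
        for path, count in prefix_paths
    ]
    return header, reordered

# Paths ending in E with E truncated (assumed from the transcript's figure residue).
e_paths = [(["B", "C"], 1), (["A", "C", "D"], 1), (["A", "D"], 1)]
header, paths = conditional_paths(e_paths, min_support=2)
print(header)
print(paths)
```

With a minimum support of 2, B is dropped because it occurs only once along these paths, which agrees with the remark on the slide for suffix E.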

  13. Suffix DE • The set of paths, from the E-conditional FP-Tree, ending in D. Insert each path (after truncating D) into a new tree. • We have reached the base of recursion. • FI: DE, ADE [Figure: paths ending in D, the new header table, and the conditional FP-Tree for suffix DE. Node labels in extraction order: null, A:2, C:1, D:1; null, D:1, A:2.]

  14. Base of Recursion • We continue recursively on the conditional FP-Tree. • Base case of recursion: when the tree is just a single path. • Then, we just produce all the subsets of the items on this path merged with the corresponding suffix.
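
The base case can be written directly with `itertools.combinations`. The function name `single_path_itemsets` is an assumption; the example input mirrors the suffix DE slide, where the conditional tree is the single path A and the reported itemsets are DE and ADE.

```python
from itertools import combinations

def single_path_itemsets(path_items, suffix):
    """Return every subset of the items on a single path, each merged with the suffix."""
    itemsets = []
    for size in range(len(path_items) + 1):
        for subset in combinations(path_items, size):
            itemsets.append(tuple(subset) + tuple(suffix))
    return itemsets

# Conditional FP-Tree for suffix DE is the single path A.
print(single_path_itemsets(["A"], ("D", "E")))
```

The empty subset yields the suffix itself, so a tree that holds only the root still produces one itemset.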

  15. Suffix CE • The set of paths, from the E-conditional FP-Tree, ending in C. Insert each path (after truncating C) into a new tree. • We have reached the base of recursion. • FI: CE [Figure: paths ending in C, the new header table, and the conditional FP-Tree for suffix CE. Node labels in extraction order: null, A:1, C:1, C:1; null.]

  16. Suffix AE • The set of paths, from the E-conditional FP-Tree, ending in A. Insert each path (after truncating A) into a new tree. • We have reached the base of recursion. • FI: AE [Figure: paths ending in A, the new header table, and the conditional FP-Tree for suffix AE. Node labels in extraction order: null; null, A:2.]

  17. Suffix D • The set of paths ending in D. Insert each path (after truncating D) into a new tree. • We continue recursively. Base of recursion: when the tree has a single path only. • FI: D [Figure: paths ending in D, the new header table, and the conditional FP-Tree for suffix D. Node labels in extraction order: null, B:3, A:2, A:2, C:1, C:1, D:1, C:1, D:1, D:1, D:1, D:1; null, A:4, B:1, B:2, C:1, C:1, C:1.]

  18. Suffix CD • The set of paths, from the D-conditional FP-Tree, ending in C. Insert each path (after truncating C) into a new tree. • We continue recursively. Base of recursion: when the tree has a single path only. • FI: CD [Figure: paths ending in C, the new header table, and the conditional FP-Tree for suffix CD. Node labels in extraction order: null, A:4, B:1, B:2, C:1; null, C:1, C:1, A:2, B:1, B:1.]

  19. Suffix BCD • The set of paths, from the CD-conditional FP-Tree, ending in B. Insert each path (after truncating B) into a new tree. • We have reached the base of recursion. • FI: BCD [Figure: paths ending in B, the new header table, and the conditional FP-Tree for suffix BCD. Node labels in extraction order: null, A:2, B:1; null, B:1.]

  20. Suffix ACD • The set of paths, from the CD-conditional FP-Tree, ending in A. Insert each path (after truncating A) into a new tree. • We have reached the base of recursion. • FI: ACD [Figure: the new header table and the conditional FP-Tree for suffix ACD. Node labels in extraction order: null; null.]

  21. Suffix C • The set of paths ending in C. Insert each path (after truncating C) into a new tree. • We continue recursively. Base of recursion: when the tree has a single path only. • FI: C [Figure: paths ending in C, the new header table, and the conditional FP-Tree for suffix C. Node labels in extraction order: null, B:6, A:1, A:3, C:3, C:1; null, C:3, B:6, A:1, A:3.]

  22. Suffix AC • The set of paths, from the C-conditional FP-Tree, ending in A. Insert each path (after truncating A) into a new tree. • We have reached the base of recursion. • FI: AC, BAC [Figure: paths ending in A, the new header table, and the conditional FP-Tree for suffix AC. Node labels in extraction order: null, B:6, A:1; null, A:3, B:3.]

  23. Suffix BC • The set of paths, from the C-conditional FP-Tree, ending in B. Insert each path (after truncating B) into a new tree. • We have reached the base of recursion. • FI: BC [Figure: paths ending in B, the new header table, and the conditional FP-Tree for suffix BC. Node labels in extraction order: null, B:6; null.]

  24. Suffix A • The set of paths ending in A. Insert each path (after truncating A) into a new tree. • We have reached the base of recursion. • FI: A, BA [Figure: paths ending in A, the new header table, and the conditional FP-Tree for suffix A. Node labels in extraction order: null, B:5, A:2, A:5; null, B:5.]

  25. Suffix B • The set of paths ending in B. Insert each path (after truncating B) into a new tree. • We have reached the base of recursion. • FI: B [Figure: paths ending in B, the new header table, and the conditional FP-Tree for suffix B. Node labels in extraction order: null; null, B:8.]
