
Mohammad El-Hajj

Inverted Matrix : Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. KDD 2003. Mohammad El-Hajj. Osmar R. Zaïane. Department of Computing Science University of Alberta, Canada. Introduction Pre-processing Mining Phase Experiments Conclusion.


Presentation Transcript


  1. Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. KDD 2003. Mohammad El-Hajj, Osmar R. Zaïane. Department of Computing Science, University of Alberta, Canada.

  2. Outline • Introduction • Pre-Processing Phase: Transactional Layouts • Mining Phase: Building COFI-trees, Mining COFI-trees • Experimental Studies • Conclusion and Future Work

  3. Association Rule Mining: Association rule mining is crucial in many applications and plays an essential role in many important mining tasks. A rule has the form Antecedent → Consequent (Body → Head). Two steps: (1) Frequent Itemset Mining (FIM), (2) Association Rules Generation.

  4. Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, I/O readings (superfluous processing) 3. Non-interactive mining. Expensive candidacy generation step OR huge memory-based data structures.

  5. Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, I/O readings (superfluous processing) 3. Non-interactive mining. Support > 4: frequent 1-itemsets {A, B, C, D, E, F}; non-frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}.

  6. Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, I/O readings (superfluous processing) 3. Non-interactive mining. Support > 9: frequent 1-itemsets {A, B, C}; non-frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}.
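
The two support levels on slides 5 and 6 can be reproduced with a short sketch. It assumes the 18 example transactions that the later slides use to build the Inverted Matrix, and it reads "Support > n" as a strict inequality; the function name `frequent_items` is illustrative and not from the presentation.

```python
from collections import Counter

# The 18 example transactions from the slides, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def frequent_items(transactions, min_support):
    """Return the items whose support is strictly greater than min_support."""
    # Every call rescans all transactions, which is the repeated work
    # the slides call superfluous processing.
    counts = Counter(item for t in transactions for item in t)
    return {item for item, n in counts.items() if n > min_support}


print(sorted(frequent_items(TRANSACTIONS, 4)))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(sorted(frequent_items(TRANSACTIONS, 9)))  # ['A', 'B', 'C']
```

Changing the threshold from 4 to 9 here means counting everything again from scratch, which is the cost that interactive mining tries to avoid.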

  7. Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, I/O readings (superfluous processing) 3. Non-interactive mining. Changing the support level means expensive steps (the whole process is redone). [Figure: knowledge discovery process: Databases, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge]

  8. Motivation • New association rule mining algorithm that has the following features: 1. Low memory dependency 2. Removes superfluous processing 3. Interactive-mining ready. Without compromising scalability.

  9. Transactional Layouts • Horizontal Layout: Candidacy generation can be removed (FP-Growth), but superfluous processing remains.

  10. Transactional Layouts • Vertical Layout: Minimizes superfluous processing, but candidacy generation is required.

  11. Suggested Layout • Inverted Matrix Layout: Combines the horizontal and vertical layouts. 2 I/O passes.

  12. Transactional Layouts • Inverted Matrix Layout: Pass 1 generates the sorted item list (based on frequency).
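
A minimal sketch of Pass 1, assuming the 18 example transactions shown on slide 13. Items are sorted by ascending frequency; the tie-break (reverse alphabetical order) is an assumption chosen only so that the result matches the index printed on the slides. The name `build_index` is hypothetical.

```python
from collections import Counter

# The 18 example transactions from the slides, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def build_index(transactions):
    """Return [(location, item, frequency)], least frequent item first."""
    counts = Counter(item for t in transactions for item in t)
    # Ascending frequency; ties broken in reverse alphabetical order
    # (an assumption that reproduces the order seen on the slides).
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], -ord(kv[0])))
    return [(loc, item, freq) for loc, (item, freq) in enumerate(ordered, start=1)]


for loc, item, freq in build_index(TRANSACTIONS):
    print(loc, item, freq)
```

No support threshold appears anywhere in this pass: every item gets a location, frequent or not.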

  13. Transactional Layouts • Inverted Matrix Layout: Pass 2, generate the transactional array of the IM.

      T#   Items        T#   Items        T#   Items
      T1   A G D C B    T7   A C H I G    T13  M D C G O
      T2   B C H E D    T8   L E F K B    T14  C F P Q J
      T3   B D E A M    T9   A F M N O    T15  B D E F I
      T4   C E F A N    T10  C F P J R    T16  J E B A D
      T5   A B N O P    T11  A D B H I    T17  A K E F C
      T6   A C Q R G    T12  D E B K L    T18  C D L B A

      Loc  Index   Transactional Array (columns 1 to 11)
      1    R  2
      2    Q  2
      3    P  3
      4    O  3
      5    N  3
      6    M  3
      7    L  3
      8    K  3
      9    J  3
      10   I  3
      11   H  3
      12   G  4    (15,1)
      13   F  7
      14   E  8
      15   D  9    (16,1)
      16   C  10   (17,1)
      17   B  10   (18,1)
      18   A  11   (¤,¤)

  14. Transactional Layouts • Inverted Matrix Layout (same transactions and index as slide 13; rows 1 to 10 have no entries yet).

      Loc  Index   Transactional Array (columns 1 to 11)
      11   H  3    (14,1)
      12   G  4    (15,1)
      13   F  7
      14   E  8    (15,2)
      15   D  9    (16,1) (16,2)
      16   C  10   (17,1) (17,2)
      17   B  10   (18,1) (¤,¤)
      18   A  11   (¤,¤)

  15. Transactional Layouts • Inverted Matrix Layout. There is no minimum support involved in building the Inverted Matrix.

      Loc  Index   Transactional Array (columns 1 to 11)
      1    R  2    (2,1) (3,2)
      2    Q  2    (12,2) (3,3)
      3    P  3    (4,1) (9,1) (9,2)
      4    O  3    (5,2) (5,3) (6,3)
      5    N  3    (13,1) (17,4) (6,2)
      6    M  3    (14,2) (13,3) (12,4)
      7    L  3    (8,1) (8,2) (15,9)
      8    K  3    (13,2) (14,5) (13,7)
      9    J  3    (13,4) (13,5) (14,7)
      10   I  3    (11,2) (11,3) (13,6)
      11   H  3    (14,1) (12,3) (15,4)
      12   G  4    (15,1) (16,4) (16,5) (15,6)
      13   F  7    (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)
      14   E  8    (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)
      15   D  9    (16,1) (16,2) (17,3) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)
      16   C  10   (17,1) (17,2) (18,3) (18,5) (18,6) (¤,¤) (¤,¤) (¤,¤) (18,10) (17,10)
      17   B  10   (18,1) (¤,¤) (18,2) (18,4) (¤,¤) (18,8) (¤,¤) (¤,¤) (18,9) (18,11)
      18   A  11   (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤)
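
The following sketch shows one way Pass 2 could be written, under stated assumptions: each transaction is reordered by index location (least frequent item first), every item occurrence takes the next free column in that item's row, and the slot stores the (row, column) of the next item in the same transaction. `None` stands in for the slides' (¤,¤) marker. The function name and the in-memory dictionary layout are illustrative choices; the presentation does not prescribe them, and the tie-break in the sort is assumed.

```python
from collections import Counter

# The 18 example transactions from the slides, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def build_inverted_matrix(transactions):
    """Return (loc, rows): item -> index location, item -> list of pointers."""
    # Pass 1: ascending frequency; reverse-alphabetical tie-break is an assumption.
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (counts[item], -ord(item)))
    loc = {item: i for i, item in enumerate(ordered, start=1)}

    # Pass 2: one slot per item occurrence, pointing at the next item's slot.
    rows = {item: [] for item in ordered}
    for t in transactions:
        items = sorted(t, key=loc.get)       # least frequent item first
        col = {}
        for item in items:
            rows[item].append(None)          # None plays the role of (¤,¤)
            col[item] = len(rows[item])      # 1-based column of the new slot
        for cur, nxt in zip(items, items[1:]):
            rows[cur][col[cur] - 1] = (loc[nxt], col[nxt])
    return loc, rows


loc, rows = build_inverted_matrix(TRANSACTIONS)
for item in "RGF":
    print(loc[item], item, rows[item])
```

The sketch assumes no item repeats within a transaction, which holds for the example data.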

  16. Transactional Layouts • Inverted Matrix Layout: Support > 4. Border Support.

  17. Transactional Layouts • Inverted Matrix Layout

  18. Transactional Layouts • Inverted Matrix Layout

  19. Sub-transactions generated from the IM: frequent sub-transactions with item E, with item F, with item D, with item C, and with item B.
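
As a rough illustration of how sub-transactions can be read back from the matrix, the sketch below rebuilds the example matrix and then follows the pointer chain from every slot in one item's row. Because rows are ordered by ascending frequency, a chain that starts at a frequent item only visits items at least as frequent. The helper names are hypothetical, the matrix layout is the same assumed one as before, and `None` again stands in for (¤,¤).

```python
from collections import Counter

# The 18 example transactions from the slides, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def build_inverted_matrix(transactions):
    """Return (loc, rows) for the assumed Inverted Matrix layout."""
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (counts[item], -ord(item)))
    loc = {item: i for i, item in enumerate(ordered, start=1)}
    rows = {item: [] for item in ordered}
    for t in transactions:
        items = sorted(t, key=loc.get)
        col = {}
        for item in items:
            rows[item].append(None)
            col[item] = len(rows[item])
        for cur, nxt in zip(items, items[1:]):
            rows[cur][col[cur] - 1] = (loc[nxt], col[nxt])
    return loc, rows


def sub_transactions(loc, rows, item):
    """Follow the pointer chain from every slot of item's row."""
    item_at = {location: name for name, location in loc.items()}
    result = []
    for pointer in rows[item]:
        chain = [item]
        while pointer is not None:
            next_loc, next_col = pointer
            next_item = item_at[next_loc]
            chain.append(next_item)
            pointer = rows[next_item][next_col - 1]
        result.append("".join(chain))
    return result


loc, rows = build_inverted_matrix(TRANSACTIONS)
print(sub_transactions(loc, rows, "F"))
# ['FECA', 'FEB', 'FA', 'FC', 'FC', 'FEDB', 'FECA']
```

With support > 4 the border sits at F (location 13), so only rows 13 to 18 need to be read, and infrequent items such as N or K never appear in the chains.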

  20. Co-Occurrences Frequent Item tree: Building F-COFI-tree. Node labels: Participation Count, Frequency Count.

  21. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  22. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  23. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  24. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  25. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  26. Co-Occurrences Frequent Item tree: Building F-COFI-tree

  27. Co-Occurrences Frequent Item tree
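
The transcript does not preserve the tree diagrams, so the following is only a simplified sketch in the spirit of the F-COFI-tree: a plain prefix tree over the sub-transactions that contain F, where each node carries a frequency count and a participation count. The branch ordering (the order in which items are read from the Inverted Matrix), the absence of header links, and all names are assumptions. The participation counts stay at zero because the mining step that updates them is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str
    frequency: int = 0       # sub-transactions passing through this node
    participation: int = 0   # left at 0; mining is not part of this sketch
    children: dict = field(default_factory=dict)


def build_tree(root_item, sub_transactions):
    """Insert each sub-transaction as a path below a root labelled root_item."""
    root = Node(root_item)
    for sub in sub_transactions:
        root.frequency += 1
        node = root
        for item in sub[1:]:             # sub[0] is the root item itself
            node = node.children.setdefault(item, Node(item))
            node.frequency += 1
    return root


def dump(node, depth=0):
    """Print the tree as item (frequency, participation), one node per line."""
    print("  " * depth + f"{node.item} ({node.frequency}, {node.participation})")
    for child in node.children.values():
        dump(child, depth + 1)


# Sub-transactions with item F at support > 4, derived from the example
# transactions by keeping only F and the items more frequent than F.
F_SUBS = ["FECA", "FEB", "FA", "FC", "FC", "FEDB", "FECA"]

tree = build_tree("F", F_SUBS)
dump(tree)
```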

  28. Mining COFI-trees: E-COFI-tree

  29. Mining COFI-trees: E-COFI-tree. Support = Frequency count - Participation count

  30. Mining COFI-trees: E-COFI-tree

  31. Mining COFI-trees: E-COFI-tree

  32. Mining COFI-trees: E-COFI-tree

  33. Mining COFI-trees: D-COFI-tree: DBA:5, DA:5, DB:8. B-COFI-tree: BA:6. C-COFI-tree: CA:6.
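
The supports listed on slide 33 can be cross-checked by brute force against the 18 example transactions. This is plain subset counting, not the COFI-tree mining procedure, and the function name `support` is illustrative.

```python
# The 18 example transactions from the slides, one letter per item.
TRANSACTIONS = [set(t) for t in (
    "AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
    "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()]


def support(itemset):
    """Count the transactions that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for t in TRANSACTIONS if wanted <= t)


for pattern in ("DBA", "DA", "DB", "BA", "CA"):
    print(pattern, support(pattern))
# DBA 5
# DA 5
# DB 8
# BA 6
# CA 6
```

All five patterns clear the support > 4 threshold used in the example, in agreement with the counts on the slide.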

  34. Experimental Studies: Time needed to mine 1M transactions with different support levels. Pentium 700 MHz with 256 MB of RAM.

  35. Experimental Studies: Accumulated time needed to mine 1M transactions using 4 different support levels. Time needed in seconds to mine different transaction sizes. Pentium 700 MHz with 256 MB of RAM.

  36. Conclusion and Future Work: New AR algorithm • Low memory dependency • No superfluous processing • Interactive-mining ready • Scalable. Future work: updateable Inverted Matrix for native storage of transactions; compressing the size of the Inverted Matrix; parallelizing the mining process as well as the construction of the Inverted Matrix.
