estMax: Tracing Maximal Frequent Item Sets Instantly over Online Transactional Data Streams

Presentation Transcript


    1. estMax: Tracing Maximal Frequent Item Sets Instantly over Online Transactional Data Streams. Source: IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 10, October 2009. Authors: Ho Jin Woo, Won Suk Lee. Reporter: Cheng-Ting Hsieh.

    2. Outline: Abstract; Introduction; Related work; Preliminaries; estMax method; Experiments; Conclusions.

    3. Abstract (1/2): The number of frequent item sets in a typical data set is very large. Solution: frequent item sets need to be represented in a more compact notation, namely maximal frequent item sets (MFIs) or closed frequent item sets (CFIs). Finding such item sets over online transactional data streams is not easy. Solution: the estMax method, which traces the set of MFIs instantly over an online data stream.
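
To make the compact notation concrete, here is a minimal brute-force sketch of the standard definition: an item set is frequent if its support reaches a minimum support (called `s_min` here), and a frequent item set is maximal if no proper superset of it is also frequent. The function names and the toy transactions are illustrative assumptions, not part of the slides. The sketch uses exactly the explicit superset checking that estMax is said to avoid, so it illustrates the definition rather than the method.

```python
from itertools import combinations


def frequent_item_sets(transactions, s_min):
    """Return {item set: support} for every item set with support >= s_min."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= s_min:
                frequent[cand] = count / n
    return frequent


def maximal_item_sets(frequent):
    """Keep only the frequent item sets that have no frequent proper superset."""
    return {e for e in frequent if not any(e < other for other in frequent)}


# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("d")]
frequent = frequent_item_sets(transactions, s_min=0.5)
mfis = maximal_item_sets(frequent)
```

With this toy data, seven item sets are frequent but a single MFI, {a, b, c}, represents all of them, since every subset of an MFI is itself frequent.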

    4. Abstract (2/2): Advantages of the estMax method: it needs no superset/subset checking mechanism; it can extract MFIs at any moment over an online data stream.

    5. Introduction: The estMax method is based on the estDec method; the underlying node structure is a prefix tree.

    6. Related Work: MOMENT; Closed Enumeration Tree (CET); Direct Update tree (DIU); INSTANT; single-phase algorithm for finding MFIs over a data stream.

    7. Preliminaries (1/5): Prefix tree: the root node has a null value; each node has two fields, item-id and cnt; an n-item set e = (i1, i2, …, in) corresponds to the path nroot → i1 → i2 → … → in.
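
The prefix tree described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `children` dictionary, the lexicographic item order, and the helper names `insert` and `find` are assumptions, and the immediate insertion used here ignores the delayed insertion that estDec applies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    item_id: Optional[str] = None   # None marks the root (null value)
    cnt: int = 0                    # count of the item set ending at this node
    children: Dict[str, "Node"] = field(default_factory=dict)  # assumed field


def insert(root, item_set):
    """Walk root -> i1 -> ... -> in, creating nodes as needed, then bump cnt."""
    node = root
    for item in sorted(item_set):   # fixed item order keeps one path per item set
        node = node.children.setdefault(item, Node(item_id=item))
    node.cnt += 1
    return node


def find(root, item_set):
    """Return the node for item_set, or None if its path is not in the tree."""
    node = root
    for item in sorted(item_set):
        node = node.children.get(item)
        if node is None:
            return None
    return node


root = Node()
insert(root, {"a", "b"})
insert(root, {"b", "a"})            # same item set, same path
```

Sorting the items before walking the tree is what guarantees that an item set maps to exactly one path regardless of the order in which its items arrive.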

    8. Preliminaries (2/5): The estDec method examines each item set generated by a new transaction. Operations of the estDec method: delayed insertion; pruning.

    9. Preliminaries (3/5): Delayed insertion, Case 1: a new 1-item set appears in a newly generated transaction.

    10. Preliminaries (4/5): Delayed insertion, Case 2: an n-item set e (n ≥ 2) that used to be insignificant has just become significant, so e is treated as significant.

    11. Preliminaries (5/5): Pruning operation: applied when the current support of an n-item set (n ≥ 2) becomes less than Ssig.

    12. estMax method (1/8): ML: maximal lifetime. IS_MAX: true if the item set e is an MFI. cnt: count Ck(e). err: estimation error e(e). tid: identifier of the latest transaction that contains the item set e.

    13. estMax method (2/8): Top-q t-max.

    14. estMax method (3/8): Error reduction.

    15. estMax method (4/8): Phases of the estMax method: parameter updating phase; count updating phase; item set insertion phase; MFI selection phase.

    16. estMax method (5/8): Parameter updating phase: the total number of transactions in the current data stream, |Dk|, is updated.

    17. estMax method (6/8): Count updating phase: if v.cnt < Ssig * |Dk| → prune this node and all of its descendant nodes. If v.cnt ≥ Ssig * |Dk| and v.err ≤ Serr * |Dk| → the item set e is a new MFI → v.IS_MAX = True.
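
The two threshold rules of the count updating phase can be sketched for a single visited node as below. This is a literal reading of the slide under stated assumptions, not the published algorithm: the `Node` class, the `children` dictionary, and the function name are hypothetical, the comparison on `err` is assumed to be "less than or equal" because the operator is not legible in the transcript, and how IS_MAX is cleared again when a superset becomes significant is not shown on the slide and is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    cnt: float = 0.0        # count of the item set
    err: float = 0.0        # estimation error of the count
    is_max: bool = False    # IS_MAX flag
    children: Dict[str, "Node"] = field(default_factory=dict)  # assumed field


def apply_count_rules(parent, item, d_k, s_sig, s_err):
    """Apply the slide's two rules to the child `item` of `parent`.

    d_k is |Dk|, the number of transactions seen so far.
    Returns the surviving node, or None if it was pruned.
    """
    v = parent.children[item]
    if v.cnt < s_sig * d_k:
        # Removing the child reference drops v and all of its descendants.
        del parent.children[item]
        return None
    if v.err <= s_err * d_k:    # assumed direction of the comparison
        v.is_max = True
    return v


# Hypothetical tree: "a" is significant, "b" (with a descendant) is not.
root = Node()
root.children["a"] = Node(cnt=5, err=0.5)
root.children["b"] = Node(cnt=1, children={"c": Node(cnt=1)})
kept = apply_count_rules(root, "a", d_k=10, s_sig=0.2, s_err=0.1)
pruned = apply_count_rules(root, "b", d_k=10, s_sig=0.2, s_err=0.1)
```

Because pruning removes a whole subtree through one deleted reference, the cost of discarding an insignificant item set does not depend on how many descendants it has.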

    18. estMax method (7/8): Item set insertion phase: (i) new 1-item sets (not contained in Pk-1) → inserted into Pk-1 with ML = k and IS_MAX = True; (ii) Tk is filtered with Ssig, keeping only significant item sets; (iii) new significant item sets are found and inserted into Pk-1.
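
Step (i) of the insertion phase is specific enough to sketch; steps (ii) and (iii) are not detailed on the slide and are not attempted here. In this minimal sketch the `Node` class, the function name, and the initial count of 1 for a newly inserted item are assumptions; only ML = k and IS_MAX = True come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    cnt: float = 0.0
    ml: int = 0             # ML, maximal lifetime
    is_max: bool = False    # IS_MAX flag
    children: Dict[str, "Node"] = field(default_factory=dict)  # assumed field


def insert_new_items(root, transaction, k):
    """Insert every item of the k-th transaction that has no 1-item node yet.

    Returns the sorted list of items that were newly inserted.
    """
    inserted = []
    for item in sorted(transaction):
        if item not in root.children:
            # cnt=1 is an assumed starting value; ML=k and IS_MAX=True follow the slide.
            root.children[item] = Node(cnt=1, ml=k, is_max=True)
            inserted.append(item)
    return inserted


# Hypothetical state: item "a" is already tracked, item "b" is new in transaction 7.
root = Node()
root.children["a"] = Node(cnt=3, ml=1, is_max=False)
inserted = insert_new_items(root, {"a", "b"}, k=7)
```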

    19. estMax method (8/8): MFI selection phase: retraverse the prefix tree Pk; if Sk(e) ≥ Smin and IS_MAX(e) = True, then e is an MFI.
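
The selection rule above amounts to one traversal of the tree with a per-node test and no comparison between item sets. The sketch below assumes that the support Sk(e) is the node count divided by |Dk|; the `Node` class and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    cnt: float = 0.0
    is_max: bool = False    # IS_MAX flag
    children: Dict[str, "Node"] = field(default_factory=dict)  # assumed field


def select_mfis(root, d_k, s_min):
    """Collect every item set e with cnt / |Dk| >= s_min and IS_MAX set."""
    result = []
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for item, child in node.children.items():
            item_set = prefix + (item,)
            # Per-node test only: no superset/subset check against other item sets.
            if child.is_max and child.cnt / d_k >= s_min:
                result.append(item_set)
            stack.append((child, item_set))
    return sorted(result)


# Hypothetical tree: {a} is frequent but not flagged, {a, b} is flagged and frequent,
# {c} is flagged but below the minimum support.
root = Node()
root.children["a"] = Node(cnt=6, is_max=False, children={"b": Node(cnt=5, is_max=True)})
root.children["c"] = Node(cnt=1, is_max=True)
mfis = select_mfis(root, d_k=10, s_min=0.3)
```

The flag does the work that pairwise superset checks would otherwise do, which is why the selection cost is linear in the number of tree nodes.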

    20. Experiments (1/6)

    21. Experiments (2/6)

    22. Experiments (3/6): False negative errors; false positive errors.

    23. Experiments (4/6)

    24. Experiments (5/6)

    25. Experiments (6/6)

    26. Conclusions: The two parameters ML and IS_MAX allow MFIs to be traced without superset/subset checking. Several predefined thresholds diminish false positive and false negative errors: Serr controls the accuracy of the MFIs, and top-q Tk-maxes provide a trade-off between accuracy and processing time.

    27. Thank you for your attention.
