
Efficient Algorithms for Mining Share-Frequent Itemsets





  1. Efficient Algorithms for Mining Share-Frequent Itemsets
     • Authors: Y. C. Li, J. S. Yeh and C. C. Chang
     • Speaker: Yu-Chiang Li
     • Date: July 28, 2005

  2. Outline
     • Introduction
     • Related Work
     • Enhanced Fast Share Measure (EFSM) Algorithm
     • Support-Counted Fast Share Measure (SuFSM) Algorithm
     • Share-Counted Fast Share Measure (ShFSM) Algorithm
     • Experimental Results
     • Conclusions

  3. Introduction (1/2)
     • Goal: discovering the buying patterns of customers
     • Itemset: a group of items (products) bought together in a transaction
     • Support: the ratio of the number of transactions containing the itemset to the total number of transactions (provides only limited information)
     • Share: the ratio of the total count of items in the itemset to the total count of items in the database

  4. Introduction (2/2)
     • Share-confidence framework: provides useful information about numerical values associated with transaction items (Carter et al., 1997)
     • Share-frequent (SH-frequent) itemset: usually includes some infrequent subsets
     • The Fast Share Measure (FSM) algorithm discovers share-frequent itemsets efficiently on small datasets
     • This study proposes Enhanced FSM (EFSM), SuFSM and ShFSM to discover share-frequent itemsets more efficiently than FSM

  5. Related Work
     • Support-Confidence Framework (Agrawal et al., 1993)
       • Each item is a binary variable denoting whether the item was purchased
       • Apriori (Agrawal & Srikant, 1994) and Apriori-like algorithms
       • Pattern-growth algorithms (Han et al., 2000; Han et al., 2004)
     • Share-Confidence Framework (Carter et al., 1997)
       • The support-confidence framework does not analyze the exact number of products purchased
       • The support count method does not measure the profit or cost of an itemset
       • Exhaustive search algorithm (Carter et al., 2000)
       • FSM algorithm (Li et al., 2005)

  6. Related Work
     • Apriori algorithm (Agrawal and Srikant, 1994)
     • minSup = 40%

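  7. Share-Confidence Framework
     • Measure value: $mv(i_p, T_q)$
       • mv({D}, T01) = 1
       • mv({C}, T03) = 3
     • Transaction measure value: $tmv(T_q) = \sum_{i_p \in T_q} mv(i_p, T_q)$
       • tmv(T02) = 9
     • Total measure value: $Tmv(DB) = \sum_{T_q \in DB} tmv(T_q)$
       • Tmv(DB) = 44
     • Itemset measure value: $imv(X, T_q) = \sum_{i_p \in X} mv(i_p, T_q)$, for $X \subseteq T_q$
       • imv({A, E}, T02) = 4
     • Local measure value: $lmv(X) = \sum_{T_q \in db_X} imv(X, T_q)$, where $db_X$ is the set of transactions containing X
       • lmv({BC}) = 2+4+5 = 11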

  8. Itemset share
     • $SH(X) = lmv(X) / Tmv(DB)$
     • SH({BC}) = 11/44 = 25%
     • SH-frequent: if SH(X) >= minShare, X is a share-frequent (SH-frequent) itemset
     • minShare = 30%
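
The definitions on slides 7 and 8 can be sketched in Python. The database below is hypothetical (the example table shown on the slides is not part of this transcript), and all function names are illustrative, so the numbers do not match the slide examples.

```python
# Hypothetical database: transaction id -> {item: measure value, e.g. quantity}.
# This is NOT the example database from the slides.
DB = {
    "T01": {"A": 1, "B": 2, "C": 1},
    "T02": {"B": 3, "C": 2, "D": 1},
    "T03": {"A": 2, "D": 4},
    "T04": {"B": 1, "C": 3},
}

def mv(item, tid):
    """Measure value of one item in one transaction."""
    return DB[tid][item]

def tmv(tid):
    """Transaction measure value: sum of mv over the items of the transaction."""
    return sum(DB[tid].values())

def total_mv():
    """Total measure value Tmv(DB): sum of tmv over all transactions."""
    return sum(tmv(tid) for tid in DB)

def imv(itemset, tid):
    """Itemset measure value; 0 if the transaction does not contain the itemset."""
    if not set(itemset).issubset(DB[tid]):
        return 0
    return sum(mv(item, tid) for item in itemset)

def lmv(itemset):
    """Local measure value: sum of imv over the transactions containing the itemset."""
    return sum(imv(itemset, tid) for tid in DB)

def share(itemset):
    """Itemset share SH(X) = lmv(X) / Tmv(DB)."""
    return lmv(itemset) / total_mv()

def is_sh_frequent(itemset, min_share):
    return share(itemset) >= min_share

print(total_mv())                   # 20
print(lmv({"B", "C"}))              # 12
print(share({"B", "C"}))            # 0.6
print(is_sh_frequent({"A"}, 0.30))  # False, since SH({A}) = 3/20
```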

  9. Existing algorithms
     • ZP (Zero Pruning), ZSP (Zero Subset Pruning)
       • Variants of exhaustive search
       • Prune the candidate itemsets whose local measure values are exactly zero
     • FSM (Fast Share Measure) (Li et al., 2005)
       • Fast on a small dataset
       • Generates too many candidates
     • Existing algorithms are inefficient on large datasets

  10. ZP Algorithm

  11. ZSP Algorithm

  12. FSM: Fast Share Measure Algorithm
      • ML: maximum transaction length in DB
      • MV: maximum measure value in DB
      • Let min_lmv = minShare × Tmv
      • Let $CF_{FSM}(X)$ = lmv(X) + (lmv(X)/k) × MV × (ML - k), where k is the length of X
      • If $CF_{FSM}(X)$ < min_lmv, all supersets of X are infrequent

  13. FSM: Fast Share Measure Algorithm
      • minShare = 30%, ML = 6, MV = 3, Tmv = 44
      • min_lmv = 14 (30% × 44 = 13.2, rounded up)
      • Prune X if $CF_{FSM}(X)$ < min_lmv
      • Let X = {A B C}
      • $CF_{FSM}(X)$ = 3 + (3/3) × 3 × (6 - 3) = 12 < 14 = min_lmv
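
The FSM pruning test can be checked numerically with the figures from slide 13. This is a minimal sketch: the function names are illustrative, and rounding min_lmv up to an integer is an assumption that matches the slide's value of 14.

```python
import math

def min_lmv(min_share, total_mv):
    """Smallest local measure value that still reaches minShare.
    Rounding up is an assumption; it reproduces min_lmv = 14 on the slide."""
    return math.ceil(min_share * total_mv)

def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM critical function: lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + (lmv_x / k) * mv_max * (ml - k)

threshold = min_lmv(0.30, 44)                 # 14
cf = cf_fsm(lmv_x=3, k=3, mv_max=3, ml=6)     # 12.0 for X = {A B C}
print(threshold, cf, cf < threshold)          # 14 12.0 True -> supersets of X are pruned
```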

  14. Enhanced FSM (EFSM) Algorithm
      • Instead of joining two arbitrary itemsets in $RC_{k-1}$, EFSM joins each itemset in $RC_{k-1}$ with a single item in $RC_1$ to generate $C_k$ efficiently
      • Reduces the time complexity of the join from $O(n^{2k-2})$ to $O(n^k)$
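
The join step described on this slide might look as follows. This is only a sketch of the idea of extending each (k-1)-itemset by one item; the function name and the sample sets are hypothetical, and the published algorithm may apply further conditions that the slide does not show.

```python
def efsm_style_join(rc_prev, rc1):
    """Extend every itemset in RC_{k-1} by one item from RC_1 to form C_k.

    rc_prev: set of frozensets, each of length k-1
    rc1:     set of single items
    """
    candidates = set()
    for itemset in rc_prev:
        for item in rc1:
            if item not in itemset:
                candidates.add(itemset | {item})
    return candidates

# Hypothetical remaining candidates after the previous pass.
rc2 = {frozenset("AB"), frozenset("BC")}
rc1 = {"A", "B", "C", "D"}

c3 = efsm_style_join(rc2, rc1)
print(sorted("".join(sorted(c)) for c in c3))  # ['ABC', 'ABD', 'BCD']
```

Each itemset is paired with at most n single items, so the loop performs |RC_{k-1}| × |RC_1| steps rather than comparing every pair of (k-1)-itemsets.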

  15. SuFSM (Support-counted FSM)
      • $X^{k+1}$: an arbitrary superset of X with length k+1 in DB
      • $S(X^{k+1})$: the set that contains all $X^{k+1}$ in DB
      • $dbS(X^{k+1})$: the set of transactions that each contain at least one $X^{k+1}$
      • SuFSM and ShFSM are derived from EFSM and prune the candidates more efficiently than FSM
      • SuFSM (Support-counted FSM):
        • Theorem 1. If lmv(X) + $Sup(S(X^{k+1}))$ × MV × (ML - k) < min_lmv, all supersets of X are infrequent
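
The left-hand side of Theorem 1 can be computed as in the sketch below. It assumes that Sup(S(X^{k+1})) is the number of transactions that contain X plus at least one further item, which is how the definitions on this slide read; the database and function names are hypothetical.

```python
# Hypothetical database: one {item: measure value} dict per transaction.
DB = [
    {"A": 1, "B": 2, "C": 1},
    {"B": 3, "C": 2, "D": 1},
    {"A": 2, "D": 4},
    {"B": 1, "C": 3},
]
ML = max(len(t) for t in DB)                  # maximum transaction length: 3
MV = max(v for t in DB for v in t.values())   # maximum measure value: 4

def lmv(x):
    """Local measure value of itemset x."""
    return sum(sum(t[i] for i in x) for t in DB if x.issubset(t))

def sup(x):
    """Support count: transactions containing x."""
    return sum(1 for t in DB if x.issubset(t))

def sup_supersets(x):
    """Assumed reading of Sup(S(X^{k+1})): transactions containing x
    together with at least one more item."""
    return sum(1 for t in DB if x.issubset(t) and len(t) > len(x))

def cf_sufsm(x):
    """lmv(X) + Sup(S(X^{k+1})) * MV * (ML - k)."""
    k = len(x)
    return lmv(x) + sup_supersets(x) * MV * (ML - k)

x = frozenset({"B", "C"})
print(lmv(x), sup(x), sup_supersets(x))  # 12 3 2
print(cf_sufsm(x))                       # 20
```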

  16. SuFSM (Support-counted FSM)
      • $lmv(X)/k \ge Sup(X) \ge Sup(S(X^{k+1}))$
      • Ex. lmv({BCD})/k = 15/3 = 5, Sup({BCD}) = 3, Sup(S($\{BCD\}^{k+1}$)) = 2
      • If any of the following three inequalities holds, no superset of X is an SH-frequent itemset:
        • lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
        • lmv(X) + Sup(X) × MV × (ML - k) < min_lmv
        • lmv(X) + $Sup(S(X^{k+1}))$ × MV × (ML - k) < min_lmv

  17. ShFSM (Share-counted FSM)
      • $dbS(X^{k+1})$: the set of transactions that each contain at least one $X^{k+1}$
      • ShFSM (Share-counted FSM):
        • Theorem 2. If $Tmv(dbS(X^{k+1}))$ < min_lmv, all supersets of X are infrequent
      • FSM: lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
      • SuFSM: lmv(X) + $Sup(S(X^{k+1}))$ × MV × (ML - k) < min_lmv
      • ShFSM: $Tmv(dbS(X^{k+1}))$ < min_lmv
      • $CF_{FSM}(X) \ge CF_{SuFSM}(X) \ge CF_{ShFSM}(X)$

  18. Comparison of the critical functions
      • FSM: lmv(X) + (lmv(X)/k) × MV × (ML - k) < min_lmv
      • SuFSM: lmv(X) + $Sup(S(X^{k+1}))$ × MV × (ML - k) < min_lmv
      • ShFSM: $Tmv(dbS(X^{k+1}))$ < min_lmv
      • Ex. X = {BCD}
        • $CF_{FSM}(X)$ = 9 + (9/3) × 3 × (6 - 3) = 36
        • $CF_{SuFSM}(X)$ = 9 + 2 × 3 × (6 - 3) = 27
        • $CF_{ShFSM}(X)$ = 6 + 8 = 14
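
The three bounds for X = {BCD} can be evaluated side by side using the inputs given on this slide (lmv = 9, k = 3, MV = 3, ML = 6, Sup(S(X^{k+1})) = 2, and transaction measure values 6 and 8). The function names are illustrative.

```python
def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM: lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + (lmv_x / k) * mv_max * (ml - k)

def cf_sufsm(lmv_x, sup_s, k, mv_max, ml):
    """SuFSM: lmv(X) + Sup(S(X^{k+1})) * MV * (ML - k)."""
    return lmv_x + sup_s * mv_max * (ml - k)

def cf_shfsm(tmvs_of_dbs):
    """ShFSM: Tmv(dbS(X^{k+1})), the summed tmv of the transactions in dbS."""
    return sum(tmvs_of_dbs)

fsm = cf_fsm(9, 3, 3, 6)         # 9 + 3 * 3 * 3
sufsm = cf_sufsm(9, 2, 3, 3, 6)  # 9 + 2 * 3 * 3
shfsm = cf_shfsm([6, 8])

print(fsm, sufsm, shfsm)         # 36.0 27 14
print(fsm >= sufsm >= shfsm)     # True
```

A smaller bound falls below min_lmv more often, which is why the ordering matters for pruning.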

  19. ShFSM (Share-counted FSM)
      • Ex. X = {AB}
      • $Tmv(dbS(X^{k+1}))$ = tmv(T01) + tmv(T05) = 6 + 6 = 12 < 14 = min_lmv
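
The ShFSM test of Theorem 2 can be sketched on a small database. The database, the 25% threshold, and the function names below are hypothetical; dbS(X^{k+1}) is taken to be the transactions that contain X plus at least one further item, and min_lmv is rounded up as an assumption.

```python
import math

# Hypothetical database: one {item: measure value} dict per transaction.
DB = [
    {"A": 1, "B": 2, "C": 1},
    {"B": 3, "C": 2, "D": 1},
    {"A": 2, "D": 4},
    {"B": 1, "C": 3},
]

def tmv(t):
    """Transaction measure value."""
    return sum(t.values())

def tmv_dbs(x):
    """Tmv(dbS(X^{k+1})): total measure value of the transactions that
    contain x and at least one more item (assumed reading of dbS)."""
    return sum(tmv(t) for t in DB if x.issubset(t) and len(t) > len(x))

total = sum(tmv(t) for t in DB)      # Tmv(DB) = 20
min_lmv = math.ceil(0.25 * total)    # hypothetical minShare of 25% -> 5

x = frozenset({"A", "B"})
bound = tmv_dbs(x)                   # only the first transaction qualifies -> 4
print(bound, min_lmv, bound < min_lmv)  # 4 5 True -> supersets of {A, B} are pruned
```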

  20. Experimental Results (1/3)
      • PC: Pentium IV 1.5 GHz, 1.5 GB SDRAM, running Windows XP Professional
      • All algorithms were coded in VC++ 6.0
      • Figure 1, Figure 2

  21. Experimental Results (2/3)
      • minShare = 0.1%
      • Figure 3, Figure 4

  22. Experimental Results (3/3)
      • Dataset: T6.I4.D100k.N200.S10
      • minShare = 0.1%
      • ML = 20, MV = 10
      • Tmv = 2,302,443

  23. Conclusions
      • This study proposes the Enhanced FSM (EFSM) algorithm, which reduces the time complexity of the join step
      • We have also developed SuFSM and ShFSM from EFSM
      • SuFSM and ShFSM prune the candidates efficiently and significantly improve performance
      • The experimental results indicate that ShFSM has the best performance
      • In the future, we plan to develop more advanced algorithms to accelerate the identification of all share-frequent itemsets

  24. Thank You
