Takeaki Uno Ken Satoh National Institute of Informatics, JAPAN　 19/Nov/2003 FIMI 2003

Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization
Irredundant Border Enumerator

Outline of This Talk
- Explanation of our algorithm (an improved version of the algorithm of Gunopulos et al.)
- Algorithmic techniques that exploit sparseness
- Computational experiments on datasets

Algorithm of Gunopulos et al.
[Figure: lattice of itemsets, from 00…0 to 11…1]
1. Find minimal sets by dualization
2. If one of them is frequent, then find a maximal frequent set including it, and go to 1.

Algorithm of Gunopulos et al.
[Figure: lattice of itemsets, from 00…0 to 11…1]
1. Find minimal sets by dualization
2. If one of them is frequent, then find a maximal frequent set including it, and go to 1.
- solves dualization many times
- finds the same minimal set many times
Our algorithm dualizes and finds maximal elements simultaneously: Irredundant Dualization
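The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Gunopulos et al.: the dualization step is replaced by a brute-force enumeration of the minimal itemsets not included in any maximal set found so far, and all function names and the toy dataset are hypothetical. The counter shows that the dualization is restarted once per loop iteration.

```python
from itertools import combinations

def is_frequent(itemset, transactions, minsup):
    """True if at least `minsup` transactions contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t) >= minsup

def extend_to_maximal(itemset, items, transactions, minsup):
    """Greedily grow a frequent set; one pass suffices by anti-monotonicity."""
    current = set(itemset)
    for e in sorted(items):
        if e not in current and is_frequent(current | {e}, transactions, minsup):
            current.add(e)
    return frozenset(current)

def minimal_uncovered_sets(max_sets, items):
    """Brute-force stand-in for dualization (assumption, exponential time):
    minimal itemsets that are not included in any set of `max_sets`."""
    found = []
    for k in range(len(items) + 1):          # increasing size gives minimality
        for cand in combinations(sorted(items), k):
            c = frozenset(cand)
            if any(c <= m for m in max_sets):
                continue                     # still covered by a maximal set
            if any(f <= c for f in found):
                continue                     # contains a smaller uncovered set
            found.append(c)
    return found

def dualize_and_advance(items, transactions, minsup):
    max_sets, dualizations = [], 0
    while True:
        dualizations += 1                    # step 1: dualize from scratch
        minimal = minimal_uncovered_sets(max_sets, items)
        frequent = [h for h in minimal if is_frequent(h, transactions, minsup)]
        if not frequent:                     # all minimal sets are infrequent
            return max_sets, minimal, dualizations
        # step 2: grow one frequent minimal set to a maximal frequent set
        max_sets.append(extend_to_maximal(frequent[0], items, transactions, minsup))

# Hypothetical toy data: 4 items, 5 transactions, minimum support 2.
items = {1, 2, 3, 4}
transactions = [frozenset(t) for t in ({1, 2, 3}, {1, 2, 3}, {3, 4}, {3, 4}, {1, 4})]
max_sets, min_infrequent, dualizations = dualize_and_advance(items, transactions, 2)
print(max_sets, min_infrequent, dualizations)
```

When the loop stops, the minimal sets returned by the last dualization are exactly the minimal infrequent sets.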

Our Algorithm
[Figure: lattice of itemsets, from 00…0 to 11…1]
1. When a frequent minimal set is found during dualization, find a maximal frequent set including it, and add it to the current set of maximal frequent sets.

Our Algorithm
[Figure: lattice of itemsets, from 00…0 to 11…1]
1. When a frequent minimal set is found during dualization, find a maximal frequent set including it, and add it to the current set of maximal frequent sets.
- finds each minimal set once
- solves one dualization
- dualization can accept additional input
Incremental dualization by Kavvadias and Stavropoulos, or by Uno
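A minimal sketch of this idea, assuming a depth-first incremental dualization in the style the slide attributes to Kavvadias and Stavropoulos and to Uno: a node at depth i is a minimal set not included in any of the first i maximal sets, and a new maximal set is appended to the input whenever the search reaches a frequent set. The names, the naive minimality test, and the toy data are assumptions made for illustration; this is not the authors' code.

```python
def is_frequent(itemset, transactions, minsup):
    """True if at least `minsup` transactions contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t) >= minsup

def extend_to_maximal(itemset, items, transactions, minsup):
    """Greedily grow a frequent set to a maximal frequent set."""
    current = set(itemset)
    for e in sorted(items):
        if e not in current and is_frequent(current | {e}, transactions, minsup):
            current.add(e)
    return frozenset(current)

def irredundant_border(items, transactions, minsup):
    items = frozenset(items)
    max_sets = []        # maximal frequent sets; grows while the search runs
    min_infrequent = []  # minimal infrequent sets, each reported once

    def is_minimal(h, upto):
        # Naive check: every e in h must be the only element of h that lies
        # in the complement of some maximal set among the first `upto`.
        return all(any((items - m) & h == {e} for m in max_sets[:upto])
                   for e in h)

    def search(h, i):
        if i == len(max_sets):               # h escapes every known max set
            if not is_frequent(h, transactions, minsup):
                min_infrequent.append(h)
                return
            # Frequent: add a maximal frequent set containing h as new input.
            max_sets.append(extend_to_maximal(h, items, transactions, minsup))
        m = max_sets[i]
        if not h <= m:                       # h already escapes max_sets[i]
            search(h, i + 1)
            return
        for e in sorted(items - m):          # add one item outside max_sets[i]
            child = h | {e}
            if is_minimal(child, i + 1):
                search(child, i + 1)

    search(frozenset(), 0)
    return max_sets, min_infrequent

# Hypothetical toy data: 4 items, 5 transactions, minimum support 2.
items = {1, 2, 3, 4}
transactions = [frozenset(t) for t in ({1, 2, 3}, {1, 2, 3}, {3, 4}, {3, 4}, {1, 4})]
max_sets, min_infrequent = irredundant_border(items, transactions, 2)
print(max_sets, min_infrequent)
```

Because a node at depth i depends only on the first i maximal sets, appending new sets never invalidates the part of the search already done, so a single traversal suffices. The recursion depth equals the number of maximal sets, which is acceptable for a small example but would need an explicit stack for large inputs.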

Incremental Dualization
[Figure: example of incremental dualization starting from φ; labels: C, B, ABC, CE, CD, BCD, CE, ACD, CDE, BCDE, ACDE, AE, CDE]
- Algorithms of Kavvadias and Stavropoulos, and of Uno (!! in terms of dualization, the input sets are the complements)
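The remark about complements can be checked on a small example. The sketch below uses hypothetical data (it is not a reconstruction of the figure) and verifies that an itemset is not included in any maximal set exactly when it hits the complement of every maximal set, which is why the dualization is run on the complements.

```python
from itertools import combinations

items = frozenset("ABCDE")
max_sets = [frozenset("AB"), frozenset("BCD")]   # hypothetical maximal sets
hyperedges = [items - m for m in max_sets]       # their complements: CDE, AE

def not_included_in_any(h, sets):
    """h is not a subset of any set in `sets`."""
    return all(not h <= s for s in sets)

def hits_every(h, edges):
    """h shares at least one item with every set in `edges`."""
    return all(h & e for e in edges)

# Compare the two conditions on every subset of the items.
subsets = [frozenset(c) for k in range(len(items) + 1)
           for c in combinations(sorted(items), k)]
agree = all(not_included_in_any(h, max_sets) == hits_every(h, hyperedges)
            for h in subsets)
print(agree)
```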

Algorithm Technique: crit
items: itemset
|max sets|: # max. sets
- The algorithms of Kavvadias and Stavropoulos, and of Uno, check minimality many times (each check takes O(|max sets|×|items|) time)
- The algorithm of Uno checks it by using "crit" (critical elements): crit(e,H) ≠ φ for every e ∈ H ⇔ H is minimal
- crit can be updated for H∪{e} in O(|max sets|) time
improving factor = O(|items|)
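A minimal sketch of the crit bookkeeping, under the assumption that crit(e,H) is the set of input sets F of the dualization (complements of maximal sets) with F ∩ H = {e}, and that the sets not yet hit by H are tracked alongside. The function names and the example data are hypothetical. Each input set belongs to at most one crit set or to the uncovered set, so one update touches each input set at most once.

```python
def crit_from_scratch(h, edges):
    """crit[e] = indices of edges whose only common element with h is e."""
    return {e: {j for j, f in enumerate(edges) if f & h == {e}} for e in h}

def add_item(h, crit, uncov, e, edges):
    """Update (h, crit, uncov) for h ∪ {e} without rescanning h."""
    # An edge stays critical for f only if the new item e is not in it.
    new_crit = {f: {j for j in js if e not in edges[j]} for f, js in crit.items()}
    # Edges that were not hit so far and contain e become critical for e.
    new_crit[e] = {j for j in uncov if e in edges[j]}
    return h | {e}, new_crit, uncov - new_crit[e]

def no_redundant_item(crit):
    """Every item of H has at least one critical edge."""
    return all(crit.values())

# Hypothetical input: complements of max sets {1,2,3} and {3,4} over items 1..4.
edges = [frozenset({4}), frozenset({1, 2})]
h, crit, uncov = frozenset(), {}, {0, 1}
h, crit, uncov = add_item(h, crit, uncov, 4, edges)
h, crit, uncov = add_item(h, crit, uncov, 1, edges)
same_as_scratch = crit == crit_from_scratch(h, edges)
minimal_14 = no_redundant_item(crit)

h3, crit3, uncov3 = add_item(h, crit, uncov, 2, edges)
minimal_124 = no_redundant_item(crit3)   # items 1 and 2 lose their critical edge
print(same_as_scratch, minimal_14, minimal_124)
```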

Using Sparseness
- Checking minimality for all H∪{e} takes O(|max. sets|×|items|) time
- Checking them by tracing each max. set
- |items| → ave. size of max. sets
[Figure labels: remains max. sets; e1 e2 e3 e4 e5 e6; crit(*,H∪*)]
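One possible reading of "tracing each max. set", sketched under assumptions: since the input sets of the dualization are complements of maximal sets, a critical set of f survives the addition of e exactly when e belongs to the corresponding maximal set. Scanning the items of each critical maximal set therefore tests all candidates e at once, at a cost proportional to the sizes of the maximal sets rather than to |items|. All names and data below are hypothetical.

```python
def crit_indices(h, max_sets):
    """crit[f] = indices j such that f is the only item of h outside max_sets[j]."""
    return {f: {j for j, m in enumerate(max_sets) if h - m == {f}} for f in h}

def addable_items(h, crit, max_sets, candidates):
    """Candidates e for which every f in h keeps a critical set in h ∪ {e}."""
    survivors = {e: 0 for e in candidates}   # number of f that stay critical
    for f in h:
        seen = set()                         # count each e once per f
        for j in crit[f]:
            for e in max_sets[j]:            # trace the max set, not all items
                if e in survivors and e not in seen:
                    seen.add(e)
                    survivors[e] += 1
    return {e for e, count in survivors.items() if count == len(h)}

# Hypothetical data: two maximal sets over items 1..4.
max_sets = [frozenset({1, 2, 3}), frozenset({3, 4})]

h = frozenset({4})
first = addable_items(h, crit_indices(h, max_sets), max_sets, {1, 2})

h2 = frozenset({1, 4})
second = addable_items(h2, crit_indices(h2, max_sets), max_sets, {2, 3})
print(first, second)
```

The function only checks that the items already in H stay critical; whether the new item e itself has a critical set is a separate test.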

Comparison to Bottom Up
- Computation time depends on:
  Bottom-up approach (e.g. apriori): #frequent sets, #closed sets
  Our algorithm: #max. frequent sets, #min. infrequent sets
For instances with few minimal infrequent sets, our algorithm performs well

Experiments

Conclusion
- We improved the algorithm of Gunopulos et al. by irredundant dualization and sparse algorithms
- The computation time depends on #max. frequent sets and #min. infrequent sets (reduced to size of max sets / |items|^2 |max sets|)
For further improvements
- Speed up dualization by pruning unnecessary items
- Speed up updating occurrences by usual techniques
