310 likes | 323 Views
Mining Free Itemsets under Constraints. By Jean-Francois Boulicaut and Baptiste Jeudy International Database Engineering and Application Symposium. Content. Introduction Constrained itemset mining Apriori revisit Anti-monotone constrains Monotone constrains Generic algorithm
E N D
Mining Free Itemsets under Constraints By Jean-Francois Boulicaut and Baptiste Jeudy International Database Engineering and Application Symposium
Content • Introduction • Constrained itemset mining • Apriori revisit • Anti-monotone constrains • Monotone constrains • Generic algorithm • Frequent closed itemset mining • CLOSE algorithm • Incorporating constraints into Apriori • Conclusion
Introduction • Frequent itemset mining • A set of items is referred to as itemset • X is an item(or itemset), • Support is bounded by a threshold r • A frequent itemset is an itemset with a support larger than the minimum support • Given a database, find all the frequent itemsets
Introduction • Problems with frequent itemset mining algorithms • The computation may be intractable for a user-given frequency threshold: the number of frequent itemsets may explode • Lack of focus leads to huge output of frequent itemsets
Introduction • Two issues to tackle these problems • Constraint-based extraction of the frequent itemsets: only a subset of the collection of frequent itemsets is interesting. • Condensed representation of frequent itemsets: extract a subset of the frequent patterns and regenerate the whole collection when necessary
Introduction • Constraint-based extraction of frequent itemsets • Syntactic constraints • an item must not appear in the itemsets • Constraints related to objective measures of interestingness • the itemsets must be frequent • Push constraint checking into algorithms • Anti-monotone constraints • Monotone constraints • Decrease the size of output • Improve user guidance
Introduction • Condensed representation of frequent itemsets • Extract a particular subset of the frequent itemset collection • The condensed subset is much smaller than the original collection • Can be extracted efficiently • The whole frequent itemsets can be regenerated
Introduction • Main idea of the paper • Combine the above two approaches into one algorithm • This algorithm is based on the structure of Apriori
Content • Introduction • Constrained itemset mining • Apriori revisit • Anti-monotone constrains • Monotone constrains • Generic algorithm • Frequent closed itemset mining • CLOSE algorithm • Incorporating constraints into Apriori • Conclusion
Summary of paper • Definition of constraints • : transactional database • : set of all itemsets • : constraint • : itemset, • : subset of • S satisfies C in T ( , T ) = • (I)={ , satisfies } • denotes
Summary of paper : an itemset must be at least frequent. = 0,6 and , then ,
Summary of paper • Constrained itemset mining • : transactional database • : constraint • Computation of the collection of itemsets that satisfy together with their frequecies • Use Apriori for constrained itemset mining where is
Summary of paper - Apriori The completeness of Apriori relies on the anti-monotonicity of the constraint • Apriori Algorithm • 1. • 2. • 3.while do • 4. safe-pruning-on( ) • 5. • 6. • 7. • 8. Phase 1 – Candidate safe pruning Eliminate candidates for which a subset of length k is not frequent Phase 2- frequency constraint (database scan) Phase 3 – candidate generation for level k+1, fuse two elements that share the same k-1 first items where A and B share the k-1 first items(in lexicographic order)}
Anti-monotone constraints • Definition: an anti-monotone constraint is a constraint C such that for all itemsets S, S’: satisfy satisfy • If S does not satisfy , every superset of S does not satisfy • Example: • A disjunction or conjunction of anti-monotone constraints is an anti-monotone constraint
Anti-monotone constraints • Apriori can be changed: • Let be an anti-monotone constraint. Step 5 of Apriori is replaced by it is still correct and complete. • Apriori can be used to mine constrained itemsets when the given constraint is anti-monotone What about monotone constraints?
Monotone Constraints • Definition • is true is true • Example • Given a monotone constraint , simply replacing Step 5 in Apriori with leads to the loss of the completeness of Apriori.
The generation step in Apriori must be complete: i.e., it must not miss any itemset satisfying C Monotone Constraints • Example • Assume Itemset ABC should be generated by from AB and AC but since ACB is not generated whereas • Assume Itemset ABC is correctly generated by from AB and AC but since ACB is incorrectly pruned whereas The pruning step (Phase 1) must be correct, i.e., it must not prune an itemset that verify C The generation step and pruning step need to be modified in order to include monotone constraints
Monotone Constraints • Some definition in modified generation procedure • Negative border: If denotes an anti-monotone constraint, is the collection of the minimal itemsets that do not satisfy • denotes a monotone constraint, it is the negation of , so equals to
Monotone Constraints • denotes an anti-monotone constraint • denotes a monotone constraint • denotes the collection of the minimal itemsets that do not satisfy • Generation procedure • and B is a 1-itemset • A,B • Assumeand For , • If k<ms, • If k=ms, • If k>ms, • This generation procedure is complete and ensures that every candidate itemset verifies ( ) We do not need to verify the monotone constraint after this generation procedure
Monotone Constraints • Pruning procedure • For all and for all such that |S’|=k do if and then delete S from • is correct and complete The algorithm is correct because it does not prune any itemset that verify .Its completeness means that if an itemset is not pruned then every proper subset of that itemset verify .
Generic Algorithm • For a constraint , the generic algorithm uses the structure of Apriori and the procedures and • 1. • 2. k:=1 • 3. while do • 4. Phase 1 – candidate safe pruning • 5. Phase 2 - anti-monotone constraint checking • 6. Phase 3 – candidate generation for level k+1 • 7. k:=k+1 • 8. output Apriori Algorithm 1. 2. k:=1 3. while do 4. Phase 1–candidate safe pruning 5. Phase 2-frequency constraint 6. Phase 3-candidate generate 7. k:=k+1 8. Output
Generic Algorithm-example • Constraints: • and B is 1-itemset} • A,B } • k<ms, • k=ms, • k<ms, For all and for all such that |S’|=k do if and then delete S from {B,BC,BD,BCD}
Content • Introduction • Constrained itemset mining • Apriori revisit • Anti-monotone constrains • Monotone constrains • Generic algorithm • Frequent closed itemset mining • CLOSE algorithm • Incorporating constraints into Apriori • Conclusion
CLOSE algorithm-frequent closed itemset mining • The closure of an itemset S(closure(S)) is the maximal superset of S which has the same support as S. • A closed itemset is an itemset that is equal to its closure • The set of closed itemset is a lattice called the closed itemset lattice
CLOSE algorithm We can use this constraint in our generic algorithm together with other constraints to achieve constrained free-set mining • We can consider CLOSE as an exploration of the classical itemset lattice with a new constraint • A constraint for CLOSE • Free itemsets: itemsets that are not included in any closure of their proper sub-set. Equivalently, free itemsets are itemsets that verify
CLOSE algorithm The closure of an itemset S(closure(S)) is the maximal superset of S which has the same support as S • Example • Closure(AB): items A and B are simultaneously in transactions 1,4,6. Item C is the only other item that is also present in these three transactions, thus closure(AB)=ABC. • Closure(A)=AC, Closure(B)=BC, and . Therefore is true. • If frequency threshold r = ½, where means that
CLOSE algorithm • The constraint is anti-monotone, it needs a database pass to be checked • Checking this constraint seems expensive if the closure of every subset of S has to be computed • We can use an equivalent constraint • The equivalence means that is true iff is true. • We need the closure of every subset of S of size |S|-1, then check if
Incorporating constraints into Apriori • Directly using causes two problems • The closures of some candidates of level k are not computed => impossible to check at level k+1 • will no longer enables to compute
Incoporating constraints • Assume we replace • with • with • Then: the constraints and are equivalent and anti-monotone. The set can be efficiently computed using the same method as in CLOSE using i.e., the output of the generic algorithm with the constraint Now we can find free-itemsets that verify conjunctions of anti-monotone and monotone constraints
Content • Introduction • Constrained itemset mining • Apriori revisit • Anti-monotone constrains • Monotone constrains • Generic algorithm • Frequent closed itemset mining • CLOSE algorithm • Incorporating constraints into Apriori • Conclusion
Conclusion • Frequent itemset mining can be intractable for a given support threshold and a particular database • Two issues to address this problem: constraint-based itemset mining and condensed representation of frequent itemsets • The generic algorithm can be used to achieve constrained free-set mining when