Tutorial 4. Association rule mining
Association rule mining
• Goal: find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
• Assume all data are categorical.
• No good algorithm exists for numeric data; numeric attributes are typically discretized into categories first.
• Initially used for market basket analysis, to find how items purchased by customers are related.
Association rule
• IF A → B
• Support(A → B) = (# of tuples containing both A and B) / (total # of tuples)
• Confidence(A → B) = (# of tuples containing both A and B) / (total # of tuples containing A)
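The two ratios above can be computed directly from a list of transactions. The sketch below is only an illustration: the transactions are a hypothetical toy set (not the supermarket table used later), and the function names `support` and `confidence` are chosen for this example.

```python
# Hypothetical toy transactions, NOT the tutorial's supermarket data.
transactions = [
    {"Bread", "Jam"},
    {"Bread", "Jam", "Tea"},
    {"Bread", "Tea"},
    {"Jam"},
]

def support(a, b, transactions):
    """Fraction of all transactions that contain every item of A and B."""
    both = a | b
    return sum(both <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Among transactions containing A, the fraction that also contain B."""
    both = a | b
    n_a = sum(a <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    return n_both / n_a if n_a else 0.0

# Rule: IF Bread -> Jam
print(support({"Bread"}, {"Jam"}, transactions))     # 2 of 4 transactions
print(confidence({"Bread"}, {"Jam"}, transactions))  # 2 of the 3 with Bread
```

Note that support is symmetric in A and B, while confidence is not: swapping the two sides changes the denominator.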
The Apriori algorithm
• The best-known algorithm.
• Two steps:
  1. Find all itemsets that have minimum support (frequent itemsets, also called large itemsets).
  2. Use the frequent itemsets to generate rules.
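Step 1 can be sketched as a level-wise search: frequent 1-itemsets first, then 2-itemsets built from them, and so on until no level survives. This is a minimal sketch under assumptions, not a tuned implementation: the function name `frequent_itemsets` and the toy transactions are hypothetical, and candidates are filtered by support counting only.

```python
def frequent_itemsets(transactions, minsup):
    """Level-wise search for all itemsets with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    # Level 1: frequent single items.
    level = {frozenset({i}) for i in items if sup(frozenset({i})) >= minsup}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = sup(s)
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep only candidates that reach the minimum support.
        level = {c for c in candidates if sup(c) >= minsup}
    return frequent

# Hypothetical toy transactions, NOT the tutorial's supermarket data.
toy = [{"Bread", "Jam"}, {"Bread", "Jam", "Tea"}, {"Bread", "Tea"}, {"Jam"}]
result = frequent_itemsets(toy, minsup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Building level k only from level k-1 is what keeps the search tractable: an itemset cannot be frequent unless all of its subsets are.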
Example • Five transactions from a supermarket
Minimum support • Minimum support = 2/5 = 40%, i.e. an itemset must appear in at least 2 of the 5 transactions.
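Example • Joining {Egg, Milk} and {Egg, Butter} gives the candidate {Egg, Milk, Butter}. • After that, check that every pair contained in the candidate is in L2 (the set of frequent 2-itemsets): {Egg, Milk} ok; {Egg, Butter} ok; {Milk, Butter} no. • {Milk, Butter} is not frequent, so remove the candidate.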
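The join-then-check step in this example can be sketched as follows. The function name `apriori_gen` is borrowed from common descriptions of Apriori and is an assumption here, as is the content of `L2_pairs`: it holds only the two pairs named in the example, while the tutorial's full L2 may contain more.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into size-k candidates, then drop any
    candidate that has a (k-1)-subset which is not frequent."""
    joined = {a | b for a in prev_frequent for b in prev_frequent
              if len(a | b) == k}
    kept, pruned = set(), set()
    for c in joined:
        subsets = (frozenset(s) for s in combinations(c, k - 1))
        if all(s in prev_frequent for s in subsets):
            kept.add(c)
        else:
            pruned.add(c)
    return kept, pruned

# Assumption: only the two pairs named in the example.
L2_pairs = {frozenset({"Egg", "Milk"}), frozenset({"Egg", "Butter"})}

kept, pruned = apriori_gen(L2_pairs, 3)
print("kept:", [sorted(c) for c in kept])
print("pruned:", [sorted(c) for c in pruned])
```

The candidate is discarded without counting its support in the data, because {Milk, Butter} is missing from the frequent pairs.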
Example (cont.) • Minimum support = 2/5 = 40% • Minimum confidence = 70%
Results
• Egg → Butter: support 60%, confidence 75%
• Butter → Egg: support 60%, confidence 75%
• Milk → Egg: support 40%, confidence 100%
• Baby Powder → Butter: support 40%, confidence 100%
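The confidence check behind these results can be replayed in a few lines. The counts below are not copied from the transaction table; they are derived from the supports and confidences listed above (for example, support 60% with confidence 75% implies Egg appears in 4 of the 5 transactions), so treat them as an assumption. Other frequent itemsets in the original data are not covered.

```python
from itertools import permutations

N = 5           # transactions in the example
MINCONF = 0.70  # minimum confidence from the example

# Counts implied by the listed supports and confidences (derived values).
count = {
    frozenset({"Egg"}): 4,
    frozenset({"Butter"}): 4,
    frozenset({"Milk"}): 2,
    frozenset({"Baby Powder"}): 2,
    frozenset({"Egg", "Butter"}): 3,
    frozenset({"Milk", "Egg"}): 2,
    frozenset({"Baby Powder", "Butter"}): 2,
}

rules = []
for itemset, n_both in count.items():
    if len(itemset) != 2:
        continue
    # Try both directions of each frequent pair.
    for a, b in permutations(sorted(itemset)):
        conf = n_both / count[frozenset({a})]
        if conf >= MINCONF:
            rules.append((a, b, n_both / N, conf))

for a, b, sup, conf in rules:
    print(f"{a} -> {b}  support={sup:.0%}  confidence={conf:.0%}")
```

Under these counts the reverse rules Egg → Milk and Butter → Baby Powder reach only 50% confidence, which would explain why they are absent from the results.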
Run the same example in Weka • Load marketing-list.csv into Weka and run the same example.
Reference: • “Association Rules Apriori Algorithm”, https://dspace.ist.utl.pt/bitstream/2295/55704/1/licao_9.pdf