Association Mining

Association Mining Data Mining Spring 2012

Transactional Database • Transactional Database • Transaction – A row in the database • i.e.: {Eggs, Cheese, Milk}

Items and Itemsets • Item = {Milk}, {Cheese}, {Bread}, etc. • Itemset = {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk} • Doesn’t have to be in the dataset • Can be of size 1 – n

The Support Measure

Support Examples Support({Eggs}) = 3/5 = 60% Support({Eggs, Milk}) = 2/5 = 40%

Minimum Support Minsup– The minimum support threshold for an itemset to be considered frequent (User defined) Frequent itemset – an itemset in a database whose support is greater than or equal to minsup. Support(X) >minsup = frequent Support(X) < minsup = infrequent

Minimum Support Examples Minimum support = 50% Support({Eggs}) = 3/5 = 60%  Pass Support({Eggs, Milk}) = 2/5 = 40%  Fail

Association Rules

Confidence Example 1 {Eggs} => {Bread} Confidence = sup({Eggs, Bread})/Sup({Eggs}) Confidence = (1/5)/(3/5) = 33%

Confidence Example 2 {Milk} => {Eggs, Cheese} Confidence = sup({Milk, Eggs, Cheese})/sup({Milk}) Confidence = (2/5)/(3/5) = 66%

Strong Association Rules Minimum Confidence– A user defined minimum bound on confidence. (Minconf) Strong association rule – a rule X=>Y whose conf>minconf. - this is a potentially interesting rule for the user. Conf(X=>Y) >minconf = strong Conf(X=>Y) < minconf = uninteresting

Minimum Confidence Example Minconf = 50% {Eggs} => {Bread} Confidence = (1/5)/(3/5) = 33% Fail {Milk} => {Eggs, Cheese} Confidence = (2/5)/(3/5) = 66%  Pass

Association Mining Association Mining: - Finds strong rules contained in a dataset from frequent itemsets. Can be divided into two major subtasks: 1. Finding frequent itemsets 2. Rule generation

Transactional Database Revisited • Some algorithms change items into letters or numbers • Numbers are more compact • Easier to make comparisons

Basic Set Logic Subset – a subset itemset X is contained in an itemset Y. Superset – a superset itemset Y contains an itemset X. example: X = {1,2} Y = {1,2,3,5} Y X

Apriori • Arranges database into a temporary lattice structure to find associations • Apriori principle – 1. itemsets in the lattice with support < minsupwill only produce supersets with support < minsup. 2. the subsets of frequent itemsets are always frequent. • Prunes lattice structure of non-frequent itemsets using minsup. • Reduces the number of comparisons • Reduces the number of candidate itemsets

Monotonicity Monotone (upward closed) - if X is a subset of Y, then support(X) cannot exceed support(Y). Anti-Monotone (downward closed) - if X is a subset of Y, then support(Y) cannot exceed support(X). Apriori is anti-monotone. - uses this property to prune the lattice structure.

Itemset Lattice

Lattice Pruning

Lattice Example Count occurrences of each 1-itemset in the database and compute their support: Support = #occurrences/#rows in db Prune anything less than minsup = 30%

Lattice Example Count occurrences of each 2-itemset in the database and compute their support Prune anything less than minsup = 30%

Lattice Example Count occurrences of the last 3-itemset in the database and compute its support. Prune anything less than minsup = 30%

Example - Results Frequent itemsets: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}

Apriori Algorithm

Frequent Itemset Generation • Minsup = 70% • Generate all 1-itemsets • Calculate the support for each itemset • Determine whether or not the itemsets are frequent

Frequent Itemset Generation Generate all 2-itemsets, minsup = 70% {1} U {3} = {1,3} , {1} U {5} = {1,5} {3} U {5} = {3,5}

Frequent Itemset Generation Generate all 3-itemsets, minsup = 70% {1,3} U {1,5} = {1,3,5}

Frequent Itemset Results All frequent itemsets generated are output: {1} , {3} , {5} {1,3} , {1,5} , {3,5} {1,3,5}

Apriori Rule Mining

Apriori Rule Mining Rule Combinations: 1. {1,2} 2-itemsets {1}=>{2} {2}=>{1} 2. {1,2,3} 3-itemsets {1}=>{2,3} {2,3}=>{1} {1,2}=>{3} {3}=>{1,2} {1,3}=>{2} {2}=>{1,3}

Strong Rule Generation • I = {{1}, {3}, {5}} • Rules = X => Y • Minconf = 80%

Strong Rules Results All strong rules generated are output: {1}=>{5} {3}=>{5} {2}=>{3,5} {2,3}=>{5} {2,5}=>{3}

Other Frequent Itemsets Closed Frequent Itemset – a frequent itemset X who has no immediate supersets with the same support count as X. Maximal Frequent Itemset – a frequent itemset whom none of its immediate supersets are frequent.

Itemset Relationships Frequent Itemsets Closed Frequent Itemsets Maximal Frequent Itemsets

Targeted Association Mining

Targeted Association Mining * Users may only be interested in specific results * Potential to get smaller, faster, and more focused results * Examples: 1. User wants to know how often only bread and garlic cloves occur together. 2. User wants to know what items occur with toilet paper.

Itemset Trees * Itemset Tree: - A data structure which aids in users querying for a specific itemset and it’s support. * Items within a transaction are mapped to integer values and ordered such that each transaction is in lexical order. {Bread, Onion, Garlic} = {1, 2, 3} * Why use numbers? - make the tree more compact - numbers follow ordering easily

Itemset Trees An Itemset Tree T contains: * A root pair (I, f(I)), where I is an itemset and f(I) is its count. * A (possibly empty) set {T1, T2, . . . , Tk} each element of which is an itemset tree. * If Ij is in the root, then it will also be in The root’s children * If Ij is not in the root, then it might be in the root’s children if: first_item(I) < first_item(Ij) and last_item(I) < last_item(Ij)

Building an Itemset Tree Let ci be a node in the itemset tree. Let I be a transaction from the dataset Loop: Case 1: ci = I Case 2: ci is a child of I - make I the parent node of ci Case 3: ci and I contain a common lexical overlap i.e. {1,2,4} vs. {1,2,6} - make a node for the overlap - make I and ci it’s children. Case 4: ci is a parent of I - Loop to check ci’s children - make I a child of ci Note: {2,6} and {1,2,6} do not have a Lexical overlap

Itemset Trees - Creation

Itemset Trees - Creation Child node.

Itemset Trees - Creation Lexical overlap

Itemset Trees - Creation Parent node.

Itemset Trees - Creation Child node.

Itemset Trees – Querying Let I be an itemset, Let cibe a node in the tree Let totalSup be the total count for I in the tree For all s.t. first_item(ci) < first_item(I): Case 1: If I is contained in ci. - Add support to totalSup. Case 2: If I is not contained and last_item(ci) < last_item(I) - proceed down the tree

Example 1

Itemset Trees - Querying • Querying Example 1: • Query: {2} • totalSup = 0

Association Mining

Association Mining

Presentation Transcript

Data Mining: Association

Mining Association Rules

Association Rule Mining Constraint Based Association Rule Mining

Mining Association Rules

Association Rule Mining

Mining Association Rules

Association Rule Mining

Association Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association Rule Mining

Association rule mining

Association Rule Mining

Association Rule Mining

Mining Association Rules

Data Mining: Association Rule Mining