Association Rules
This presentation is the property of its rightful owner.
Sponsored Links
1 / 27

Association Rules PowerPoint PPT Presentation


  • 146 Views
  • Uploaded on
  • Presentation posted in: General

Association Rules. Evgueni Smirnov. Overview. Association Rule Problem Applications The Apriori Algorithm Discovering Association Rules Techniques to Improve Efficiency of Association Rule Mining Measures for Association Rules. Association Rule Problem.

Download Presentation

Association Rules

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Association rules

Association Rules

Evgueni Smirnov


Association rules

Overview

  • Association Rule Problem

  • Applications

  • The Apriori Algorithm

  • Discovering Association Rules

  • Techniques to Improve Efficiency of Association Rule Mining

  • Measures for Association Rules


Association rule problem

Association Rule Problem

  • Given a database of transactions:

  • Find all the association rules:


Applications

Applications

  • Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items the goal is to find groups of items which are frequently purchased together.

  • Telecommunication (each customer is a transaction containing the set of phone calls)

  • Credit Cards/ Banking Services (each card/account is a transaction containing the set of customer’s payments)

  • Medical Treatments (each patient is represented as a transaction containing the ordered set of diseases)

  • Basketball-Game Analysis (each game is represented as a transaction containing the ordered set of ball passes)


Association rules

Association Rule Definitions

  • I={i1, i2, ..., in}: a set of all the items

  • Transaction T: a set of items such that TI

  • Transaction DatabaseD: a set of transactions

  • A transaction TIcontains a set XI of some items, if XT

  • An Association Rule: is an implication of the form XY, where X, YI


Association rules

Association Rule Definitions

  • A set of items is referred as an itemset. A itemset that contains k items is a k-itemset.

  • The support s of an itemset Xis the percentage of transactions in the transaction databaseDthat contain X.

  • The support of the rule XY in the transaction databaseDis the support of the items set XYin D.

  • The confidence of the rule XY in the transaction databaseDis the ratio of the number of transactions in D that contain XY to the number of transactions that contain X in D.


Association rules

Example

  • Given a database of transactions:

  • Find all the association rules:


Association rules

Association Rule Problem

  • Given:

    • a set I of all the items;

    • a database D of transactions;

    • minimum support s;

    • minimum confidence c;

  • Find:

    • all association rules XY with a minimum support s and confidence c.


Association rules

Problem Decomposition

1. Find all sets of items that have minimum support (frequent itemsets)

2. Use the frequent itemsets to generate the desired rules


Association rules

Problem Decomposition

If the minimum support is 50%, then {Shoes,Jacket} is the only 2- itemset that satisfies the minimum support.

If the minimum confidence is 50%, then the only two rules generated from this 2-itemset, that have confidence greater than 50%, are:

Shoes  Jacket Support=50%, Confidence=66%

Jacket  Shoes Support=50%, Confidence=100%


Association rules

The Apriori Algorithm

  • Frequent Itemset Property:

    Any subset of a frequent itemset is frequent.

  • Contrapositive:

    If an itemset is not frequent, none of its supersets are frequent.


Association rules

Frequent Itemset Property


Association rules

The Apriori Algorithm

  • Lk: Set of frequent itemsets of size k (with min support)

  • Ck: Set of candidate itemset of size k (potentially frequent itemsets)

    L1 = {frequent items};

    for (k = 1; Lk !=; k++) do

    Ck+1 = candidates generated from Lk;

    for each transaction t in database do

    increment the count of all candidates in Ck+1 that are contained in t

    Lk+1 = candidates in Ck+1 with min_support

    returnkLk;


Association rules

Database D

L1

C1

Scan D

C2

C2

L2

Scan D

L3

C3

Scan D

The Apriori Algorithm — Example

Min support =50%


Association rules

How to Generate Candidates

Input: Li-1:set of frequent itemsets of size i-1

Output: Ci :set of candidate itemsets of size i

Ci= empty set;

for each itemset J in Li-1do

for each itemset K in Li-1s.t. K<> J do

ifi-2 of the elements in J and K are equal then

if all subsets of {K  J} are in Li-1then

Ci = Ci {K  J}

return Ci;


Association rules

Example of Generating Candidates

  • L3={abc, abd, acd, ace, bcd}

  • Generating C4 from L3

    • abcd from abc and abd

    • acde from acd and ace

  • Pruning:

    • acde is removed because ade is not in L3

  • C4={abcd}


Association rules

Example of Discovering Rules

Let use consider the 3-itemset {I1, I2, I5}:

I1  I2  I5

I1  I5  I2

I2  I5  I1

I1  I2  I5

I2  I1  I5

I5  I1  I2


Association rules

Discovering Rules

for each frequent itemset Ido

for each subset C of Ido

if (support(I) / support(I - C) >= minconf) then

output the rule (I - C) C,

with confidence = support(I) / support (I - C)

and support = support(I)


Association rules

Example of Discovering Rules

Let use consider the 3-itemset {I1, I2, I5} with support of 0.22(2)%. Let generate all the association rules from this itemset:

I1  I2  I5 confidence= 2/4 = 50%

I1  I5  I2 confidence= 2/2 = 100%

I2  I5  I1 confidence= 2/2 = 100%

I1  I2  I5 confidence= 2/6 = 33%

I2  I1  I5 confidence= 2/7 = 29%

I5  I1  I2 confidence= 2/2 = 100%


Association rules

Apriori Advantages/Disadvantages

  • Advantages:

    • Uses large itemset property.

    • Easily parallelized

    • Easy to implement.

  • Disadvantages:

    • Assumes transaction database is memory resident.

    • Requires many database scans.


Association rules

Transaction reduction

A transaction that does not contain any frequent k-itemset will not contain frequent l-itemset for l >k! Thus, itis useless in subsequent scans!


Partitioning

Partitioning


Sampling

Sampling

Mining on a subset of given data, lower support threshold + a method to determine the completeness


Alternative measures for association rules

Alternative Measures for Association Rules

  • The confidence of XYin database D is the ratio of the number of transactions containing XY to the number of transactions that contain X. In other words the confidence is:

  • But, when Y is independent of X: p(Y) = p(Y | X). In this case if p(Y) is high we’ll have a rule with high confidence that associate independent itemsets! For example, if p(“buy milk”) = 80% and “buy milk” is independent from “buy salmon”, then the rule “buy salmon” “buy milk” will have confidence 80%!


Alternative measures for association rules1

The lift measure indicates the departure from independence of X and Y. The lift of XYis :

But, the lift measure is symmetric; i.e., it does not take into account the direction of implications!

Alternative Measures for Association Rules


Alternative measures for association rules2

The conviction measure indicates the departure from independence of X and Y taking into account the implication direction. The conviction of XYis :

Alternative Measures for Association Rules


Association rules

Summary

  • Association Rules form an very applied data mining approach.

  • Association Rules are derived from frequent itemsets.

  • The Apriori algorithm is an efficient algorithm for finding all frequent itemsets.

  • The Apriori algorithm implements level-wise search using frequent irem property.

  • The Apriori algorithm can be additionally optimised.

  • There are many measures for association rules.


  • Login