Efficient Mining of Both Positive and Negative Association Rules

Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+)

(*) University of Vermont, USA

(+) University of Technology Sydney, Australia

xwu@cs.uvm.edu

Presenter: Mike Tripp


Outline

  • Association Analysis

    • Exceptions

    • Problems

    • Rules

    • Examples

  • Pruning Strategy

    • Frequent Items of Potential Interest

    • Infrequent Items of Potential Interest

  • Procedure AllItemsOfInterest

  • Extracting Positive and Negative Rules

    • CPIR

  • PositiveAndNegativeAssociations

  • Effectiveness and Efficiency

  • Experimental Results

  • Related Work

  • Exam Questions


Exceptions of Rules

  • A deviational pattern with respect to a well-known fact, exhibiting unexpectedness.

  • Also known as surprising patterns.

  • Example:

    • While bird(x) => flies(x), exception:

    • bird(x), penguin(x) => −flies(x)

  • Interesting fact: A => B being a valid rule does not imply that −B => −A is a valid rule.


Key Problems in Negative Association Rule Mining

  • How to effectively search for interesting itemsets.

  • How to effectively identify negative association rules of interest.


Association Analysis

  • Generate all large itemsets: All itemsets that have a support greater than or equal to the user specified minimum support are generated.

  • Generate all the rules that have minimum confidence in the following naive way: for every large itemset X and any B ⊂ X, let A = X − B. If the rule A => B has minimum confidence (i.e., supp(X)/supp(A) ≥ mc), then it is a valid rule.
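A minimal sketch of this naive second step in Python (the helper names and the set-of-items transaction representation are illustrative, not from the paper):

```python
from itertools import combinations

def supp(itemset, db):
    """Fraction of transactions (sets of items) containing `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def naive_rules(large_itemsets, db, mc):
    """For every large itemset X and every nonempty B ⊂ X, let A = X − B;
    emit A => B when supp(X)/supp(A) >= mc."""
    rules = []
    for X in large_itemsets:
        for r in range(1, len(X)):
            for B in combinations(sorted(X), r):
                A = X - frozenset(B)
                if supp(X, db) / supp(A, db) >= mc:
                    rules.append((sorted(A), sorted(B)))
    return rules
```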


Negation/Types of Rules

  • The negation of an itemset A is indicated by −A. The support of −A is supp(−A) = 1 − supp(A).

  • In particular, for an itemset i1 −i2 i3, its support is supp(i1 −i2 i3) = supp(i1 i3) − supp(i1 i2 i3). (These identities are sketched in code after this list.)

  • Positive rule: A => B

  • Negative rules:

    • A => −B

    • −A => B

    • −A => −B
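The support of each negative form needs no extra database pass; it follows from positive supports via the identities above. A minimal sketch (the inclusion-exclusion identity for supp(−A U −B) is a standard fact, not stated on the slide):

```python
def negative_rule_supports(sA, sB, sAB):
    """Supports of the negated combinations, given supp(A) = sA,
    supp(B) = sB, and supp(A U B) = sAB."""
    return {
        "supp(A U -B)":  sA - sAB,             # identity from the slide
        "supp(-A U B)":  sB - sAB,             # symmetric case
        "supp(-A U -B)": 1 - sA - sB + sAB,    # inclusion-exclusion
    }

# e.g. negative_rule_supports(0.4, 0.6, 0.05)["supp(A U -B)"] == 0.35
```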


Negative Association Rules

  • Still Difficult: exponential growth of infrequent itemsets

    • TD={(A,B,D);(B,C,D);(B,D);(B,C,D,E);(A,B,D,F)}

    • Even such a simple database contains 49 infrequent itemsets.


Define Negative Association Rules

  • Two cases:

    • If both A and B are frequent but A U B is infrequent, is A => −B a valid rule?

    • If A is frequent and B is infrequent, is A => −B a valid rule? Maybe, but it is not of interest here.

  • Heuristic: A => −B is considered only if both A and B are frequent.


Negative Association Example

Consider supp(c) = 0.6, supp(t) = 0.4, supp(t U c) = 0.05, mc = 0.52

  • Confidence of t => c is supp(t U c)/supp(t) = 0.05/0.4 = 0.125, which is < mc = 0.52; supp(t U c) = 0.05 is also low.

    • This indicates that t U c is an infrequent itemset and that t => c cannot be a valid rule.

  • However...

  • supp(t U −c) = supp(t) − supp(t U c) = 0.4 − 0.05 = 0.35, which is high, and the confidence of t => −c is supp(t U −c)/supp(t) = 0.35/0.4 = 0.875, which is > mc.

    • Therefore, t => −c is a valid rule.


Identifying Interesting Itemsets

  • Because of the exponential number of infrequent itemsets in a database, pruning is critical to efficient search for interesting itemsets.


Pruning Strategy

  • We want to find interesting itemsets when pruning.

  • Let’s define an interestingness function interest(X, Y) = |supp(X U Y) – supp(X)supp(Y)| and a threshold mi.

    • If interest(X, Y) ≥ mi, then the rule X => Y is of potential interest, and X U Y is referred to as a potentially interesting itemset (see the sketch after this list).

  • Using this approach, we can establish an effective pruning strategy for efficiently identifying all frequent itemsets of potential interest in a database.

  • The pruning strategy ensures we can use an Apriori-like algorithm, generating infrequent k-itemsets from frequent (k−1)-itemsets.
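As a sketch, the interest test is a one-liner (threshold name mi as on the slide):

```python
def interest(s_x, s_y, s_xy):
    """interest(X, Y) = |supp(X U Y) - supp(X) * supp(Y)|."""
    return abs(s_xy - s_x * s_y)

def potentially_interesting(s_x, s_y, s_xy, mi):
    """X => Y is of potential interest when interest(X, Y) >= mi."""
    return interest(s_x, s_y, s_xy) >= mi
```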


Frequent Itemset of Potential Interest
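The defining condition survives only as a slide image; a plausible reconstruction, with the natural reading of f() as bundling the disjointness and interest requirements (an assumption, not confirmed by the transcript):

```latex
X \cup Y \text{ is a frequent itemset of potential interest (fipi) iff }
supp(X \cup Y) \ge ms \;\wedge\; f(X, Y, ms, mc, mi) = 1,
\quad \text{e.g. } f = \bigl[\, X \cap Y = \emptyset \,\wedge\, interest(X, Y) \ge mi \,\bigr]
```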

Where f() is a constraint function concerning the support, confidence, and interestingness of X => Y


Infrequent Itemset of Potential Interest
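Likewise an image; a plausible reconstruction following the earlier heuristic that both halves must themselves be frequent (the division of labor between the explicit conditions and g() is an assumption):

```latex
X \cup Y \text{ is an infrequent itemset of potential interest (iipi) iff }
supp(X \cup Y) < ms \;\wedge\; supp(X) \ge ms \;\wedge\; supp(Y) \ge ms
\;\wedge\; g(X, Y, ms, mc, mi) = 1
```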

Where g() is a constraint function concerning f() and the support, confidence, and interestingness of X => Y


Bringing them together

  • Using the fipi and iipi mechanisms for both positive and negative rule discovery, our search is constrained to seeking interesting rules on certain measures, and pruning removes all uninteresting branches that cannot lead to an interesting rule satisfying those constraints.


Procedure AllItemsOfInterest

Input: D (a database); minsupp; mininterest

Output: PL (frequent itemsets); NL (infrequent itemsets)
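The procedure's pseudocode appears only as images in this transcript. The following Python sketch captures the level-wise structure described above, using an Apriori-style join and the interest-based conditions reconstructed earlier (the paper's exact f()/g() constraints may differ):

```python
from itertools import combinations

def all_items_of_interest(db, ms, mi):
    """Level-wise search: PL collects frequent itemsets of interest,
    NL collects infrequent ones. A sketch, not the paper's exact code."""
    n = len(db)
    supp = lambda s: sum(1 for t in db if s <= t) / n

    def of_interest(c):
        # of potential interest if some split (X, Y) of c satisfies
        # |supp(X U Y) - supp(X)supp(Y)| >= mi
        return any(
            abs(supp(c) - supp(frozenset(X)) * supp(c - frozenset(X))) >= mi
            for r in range(1, len(c))
            for X in combinations(sorted(c), r))

    items = sorted({i for t in db for i in t})
    frontier = {frozenset([i]) for i in items if supp(frozenset([i])) >= ms}
    PL, NL = set(frontier), set()
    while frontier:
        prev = [tuple(sorted(s)) for s in frontier]
        cands = {frozenset(a) | frozenset(b)          # Apriori join step:
                 for a in prev for b in prev          # join (k-1)-itemsets
                 if a < b and a[:-1] == b[:-1]}       # sharing a prefix
        frontier = set()
        for c in cands:
            if supp(c) >= ms and of_interest(c):
                PL.add(c); frontier.add(c)
            elif supp(c) < ms and of_interest(c):
                NL.add(c)
    return PL, NL

# Example: the 10-transaction database from the next slide
db = [frozenset(s) for s in
      ["ABD", "ABCD", "BD", "BCDE", "AE", "BDF", "AEF", "CF", "BCF", "ABCDF"]]
PL, NL = all_items_of_interest(db, ms=0.3, mi=0.05)
```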


Procedure AllItemsOfInterest

  • Example run of the algorithm (ms = 0.3, mi = 0.05)

TID   Items bought
T1    {A,B,D}
T2    {A,B,C,D}
T3    {B,D}
T4    {B,C,D,E}
T5    {A,E}
T6    {B,D,F}
T7    {A,E,F}
T8    {C,F}
T9    {B,C,F}
T10   {A,B,C,D,F}


Procedure AllItemsOfInterest

  • Generate frequent and infrequent 2-itemsets of interest.

    • When ms = 0.3, L2 = {AB, AD, BC, BD, BF, CD, CF}, N2 = {AC, AE, AF, BE, CE, DE, DF, EF}

  • Use the interest measure to prune.


Procedure AllItemsOfInterest

  • So AD and CD are not of interest; they are removed from L2.
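    • Concretely, from the database above: interest(A, D) = |supp(AD) − supp(A)supp(D)| = |0.3 − 0.5 × 0.6| = 0 < mi = 0.05, and identically interest(C, D) = |0.3 − 0.5 × 0.6| = 0.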


Procedure AllItemsOfInterest

  • So the resulting frequent 2-itemsets are: L2 = {AB, BC, BD, BF, CF}.


Procedure AllItemsOfInterest

  • Generate infrequent 2-itemsets using the iipi measure.

    • Very similar to frequent 2-itemsets.


Extracting Positive and Negative Rules

  • Continue like this to get all the itemsets.

Algorithm iteration:

Frequent 1-itemsets: A, B, C, D, E, F
Frequent 2-itemsets: AB, BC, BD, BF, CF
Infrequent 2-itemsets: AC, AE, AF, BE, CE, DE, DF, EF
Frequent 3-itemsets: BCD
Infrequent 3-itemsets: BCF, BDF



Extracting Positive and Negative Rules

  • Pruning strategy for rule generation: Piatetsky-Shapiro’s argument.

  • If Dependence(X,Y) = 1, X and Y are independent.

  • If Dependence(X,Y) > 1, Y is positively dependent on X.

    • The bigger the ratio (p(Y | X) – p(Y))/(1 – p(Y)), the higher the positive dependence.

  • If Dependence(X,Y) < 1, Y is negatively dependent on X (~Y is positively dependent on X).

    • The bigger the ratio (p(Y | X) – p(Y))/(−p(Y)), the higher the negative dependence.
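The slide leaves Dependence(X, Y) undefined; presumably it is the standard ratio underlying Piatetsky-Shapiro's measure:

```latex
Dependence(X, Y) = \frac{p(X \cup Y)}{p(X)\, p(Y)} = \frac{p(Y \mid X)}{p(Y)}
```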


Extracting Both Types of Rules

  • Conditional probability increment ratio.

  • Used to measure the correlation between X and Y.

    • When CPIR(X|Y) = 0, X and Y are independent.

    • When it is 1, they are perfectly correlated.

    • When it is -1, they are perfectly negatively correlated.


Extracting Both Types of Rules

  • Because p(−A) = 1 − p(A), we only need the first half of the previous equation.

  • This value is used as the confidence value.
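The CPIR equations themselves were slide images. Reconstructed from the surrounding discussion (the piecewise form and its support version are an assumption consistent with the rule conditions below), the half that is needed reads:

```latex
CPIR(B \mid A)
  = \frac{p(B \mid A) - p(B)}{1 - p(B)}
  = \frac{supp(A \cup B) - supp(A)\,supp(B)}{supp(A)\,\bigl(1 - supp(B)\bigr)},
  \qquad p(B \mid A) \ge p(B),\; p(B) \ne 1
```

For negatively dependent pairs, the same form is applied to −B, using p(−B) = 1 − p(B).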


Association Rules of Interest

  • Let I be the set of items in a database D, i = A U B ⊆ I be an itemset, A ∩ B = Ø, supp(A) ≠ 0, supp(B) ≠ 0, and ms, mc, and mi > 0 be given by the user. Then,

    • If supp(A U B) ≥ ms, interest(A, B) ≥ mi, and CPIR(B|A) ≥ mc, then A => B is a positive rule of interest

    • If supp(A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(A, −B) ≥ mi, and CPIR(−B|A) ≥ mc, then A => −B is a negative rule of interest

    • If supp(−A U B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, B) ≥ mi, and CPIR(B|−A) ≥ mc, then −A => B is a negative rule of interest

    • If supp(−A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, −B) ≥ mi, and CPIR(−B|−A) ≥ mc, then −A => −B is a negative rule of interest
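A sketch of these four checks in Python, reusing the reconstructed CPIR (assumes all supports lie strictly between 0 and 1):

```python
def rules_of_interest(sA, sB, sAB, ms, mc, mi):
    """Return which of the four rule forms between A and B are of
    interest, given supp(A), supp(B), supp(A U B) and thresholds."""
    def cpir(s_xy, s_x, s_y):
        # reconstructed CPIR(Y|X) in support form
        return (s_xy - s_x * s_y) / (s_x * (1 - s_y))

    def interesting(s_xy, s_x, s_y):
        return abs(s_xy - s_x * s_y) >= mi

    rules = []
    if sAB >= ms and interesting(sAB, sA, sB) and cpir(sAB, sA, sB) >= mc:
        rules.append("A => B")
    if sA >= ms and sB >= ms:
        cases = [("A => -B",  sA - sAB,          sA,     1 - sB),
                 ("-A => B",  sB - sAB,          1 - sA, sB),
                 ("-A => -B", 1 - sA - sB + sAB, 1 - sA, 1 - sB)]
        for name, s_xy, s_x, s_y in cases:
            if s_xy >= ms and interesting(s_xy, s_x, s_y) \
               and cpir(s_xy, s_x, s_y) >= mc:
                rules.append(name)
    return rules
```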


Example with CPIR

  • For itemset B U D in PL

  • B => D can be a valid positive rule of interest
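  • From the example database: supp(B U D) = 0.6, supp(B) = 0.7, supp(D) = 0.6; interest(B, D) = |0.6 − 0.7 × 0.6| = 0.18 ≥ mi, and with the reconstructed CPIR, CPIR(D|B) = (0.6 − 0.42)/(0.7 × (1 − 0.6)) = 0.18/0.28 ≈ 0.64, so the rule passes for any mc up to about 0.64.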


Extracting Rules

  • One snapshot of an iteration in the algorithm

  • The result: B => −E is a valid rule.
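  • Plausible arithmetic behind that snapshot, from the example database: supp(B) = 0.7, supp(E) = 0.3, supp(B U E) = 0.1, so supp(B U −E) = 0.7 − 0.1 = 0.6 ≥ ms; interest(B, −E) = |0.6 − 0.7 × 0.7| = 0.11 ≥ mi; and CPIR(−E|B) = 0.11/(0.7 × 0.3) ≈ 0.52, so the rule holds for mc up to about 0.52.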


Algorithm Design

  • Generate the set PL of frequent itemsets and the set NL of infrequent itemsets

  • Extract positive rules of the form A => B in PL, and negative rules of the forms A => −B, −A => B, −A => −B in NL


PositiveAndNegativeAssociations
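The two pseudocode slides for this procedure survive only as images. A compact Python sketch of the two phases from the Algorithm Design slide, reusing all_items_of_interest() and rules_of_interest() from the sketches above:

```python
from itertools import combinations

def splits(itemset):
    """All ordered partitions of `itemset` into nonempty (A, B)."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for A in combinations(items, r):
            yield frozenset(A), itemset - frozenset(A)

def positive_and_negative_associations(db, ms, mc, mi):
    """Phase 1: split itemsets of interest into PL and NL.
    Phase 2: extract A => B from PL and the negative forms from NL."""
    PL, NL = all_items_of_interest(db, ms, mi)
    supp = lambda s: sum(1 for t in db if s <= t) / len(db)
    out = []
    for pool, positive in ((PL, True), (NL, False)):
        for itemset in (i for i in pool if len(i) >= 2):
            for A, B in splits(itemset):
                for form in rules_of_interest(supp(A), supp(B),
                                              supp(itemset), ms, mc, mi):
                    if (form == "A => B") == positive:
                        out.append((sorted(A), sorted(B), form))
    return out
```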


Effectiveness and Efficiency

  • Aggregated test data from KDD Cup 2000 (data and questions)

  • Available at: http://www.ecn.purdue.edu/KDDCUP/

  • Implemented on: Dell Workstation PWS650 with a 2 GHz CPU and 2 GB of memory

  • Language: C++


Experimental Results (1)

  • A comparison with an Apriori-like algorithm without pruning

    • MBP = Mining By Pruning

    • MNP = Mining with No-Pruning


Experimental Results (2)

  • A comparison with no pruning


Experimental Results

  • Effectiveness of pruning

PII = Positive Items of Interest

NII = Negative Items of Interest


Related Work

  • Negative relationships between frequent itemsets, but not how to find negative rules (Brin, Motwani and Silverstein 1997)

  • Strong negative association mining using domain knowledge (Savasere, Omiecinski and Navathe 1998)


Conclusions

  • Negative rules are useful

  • Pruning is essential to find frequent and infrequent itemsets.

  • Pruning is important to find negative association rules.

  • Different constraint conditions could yield additional negative association rules.


Exam Questions

  • What are the three types of dependency when Dependence(X, Y) is equal to, greater than, or less than 1?

    • If Dependence(X,Y) = 1, X and Y are independent.

    • If Dependence(X,Y) > 1, Y is positively dependent on X.

    • If Dependence(X,Y) < 1, Y is negatively dependent on X (−Y is positively dependent on X).


Exam Questions

  • Give an example of a rule exception, or surprising pattern.

    • While bird(x) => flies(x), exception:

    • bird(x), penguin(x) => −flies(x)


Exam Questions

  • What does CPIR(X|Y) tell us?

    • When CPIR(X|Y) = 0, X and Y are independent.

    • When it is 1, they are perfectly correlated.

    • When it is -1, they are perfectly negatively correlated.

