Mining Quantitative Association Rules in Large Relational Tables

ACM SIGMOD Conference 1996

Authors: R. Srikant and R. Agrawal

Presented by: Biyu Liang

March 29, 2006


Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Association Rule

  • Itemsets X and Y

  • Rule X => Y

  • Support = Pr(X ∪ Y)

  • Confidence = Pr(Y | X) = Pr(X ∪ Y) / Pr(X)

  • Find all rules that meet MinSup and MinConf (see the sketch below)
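A minimal sketch of these definitions in Python (the data and function names are illustrative, not from the paper):

```python
# Support and confidence of a rule over a list of transactions (sets of items).
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Conf(X => Y) = Pr(X u Y) / Pr(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
print(support({"bread", "milk"}, transactions))       # 0.666...
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666...
```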


Boolean Association Rules

  • Each item becomes a boolean attribute whose value is “1” if the transaction contains the corresponding item and “0” otherwise, as in the sketch below.
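A one-line illustration of this encoding (item names are made up):

```python
# Boolean encoding: one column per item, 1 if the transaction contains it.
items = ["bread", "milk", "butter"]
transactions = [{"bread", "milk"}, {"butter"}]
rows = [[1 if item in t else 0 for item in items] for t in transactions]
print(rows)  # [[1, 1, 0], [0, 0, 1]]
```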


Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Quantitative Association Rules

  • <Age: 30..39> and <Married: Yes> => <NumCars: 2>

  • Support = 40%, Conf = 100%


Mapping to the Boolean Association Rules Problem

  • Use each <attribute: value> pair as a new attribute, which takes only boolean values (a sketch follows)
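A sketch of this direct mapping, assuming small attribute domains (record contents and names are illustrative):

```python
# Each <attribute: value> combination seen in the data becomes a boolean attribute.
records = [{"Age": 23, "Married": "No",  "NumCars": 1},
           {"Age": 34, "Married": "Yes", "NumCars": 2}]

new_attrs = sorted({(a, v) for r in records for a, v in r.items()}, key=str)
rows = [[1 if r.get(a) == v else 0 for (a, v) in new_attrs] for r in records]
print(new_attrs)  # [('Age', 23), ('Age', 34), ('Married', 'No'), ...]
```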


Problems with Direct Mapping

  • MinSup: if the number of intervals is large, the support of a single interval can fall below MinSup

  • MinConf: information is lost when values are partitioned into intervals; as the number of intervals gets smaller (and intervals get wider), confidence can fall below MinConf


The Tradeoff

  • Increase the number of intervals (to reduce the information lost) while combining adjacent ones (to increase support)

  • ExecTime: execution time blows up as the number of items per record increases

  • ManyRules: the number of rules also blows up, and many of them are not interesting


The Proposed Approach

  • Fine-partition quantitative attribute values and combine adjacent partitions as necessary

  • A partial completeness measure for deciding the partitions

  • An interest measure (pruning) to address the ManyRules problem

  • An extension of the Apriori algorithm


5 Steps of the Proposed Approach

  • Determine the number of partitions for each quantitative attribute

  • Map values/ranges to consecutive integers such that the order is preserved

  • Find the support of each value of each attribute, combining adjacent values/intervals as long as their support is less than MaxSup; then find the frequent itemsets, whose support is at least MinSup

  • Use the frequent itemsets to generate association rules

  • Prune uninteresting rules (steps 1–3 are sketched below)
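A rough sketch of steps 1–3 for a single attribute, using equi-depth partitioning and order-preserving integer ids (illustrative code, not the paper's implementation):

```python
def equi_depth_intervals(values, num_intervals):
    """Steps 1-2: cut the sorted values into roughly equal-depth intervals;
    the index of an interval is its order-preserving integer id."""
    vs = sorted(values)
    n = len(vs)
    bounds = [vs[(i * n) // num_intervals] for i in range(num_intervals)]
    bounds.append(vs[-1])
    return list(zip(bounds, bounds[1:]))  # interval i is [lo_i, hi_i]

def interval_id(v, intervals):
    """Step 2: map a raw value to the integer id of its interval."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= v <= hi:
            return i
    raise ValueError("value outside all intervals")

ages = [22, 23, 25, 28, 31, 34, 37, 41, 45, 52]
intervals = equi_depth_intervals(ages, 5)
ids = [interval_id(a, intervals) for a in ages]
# Step 3 would then scan supports and merge adjacent ids while their
# combined support stays below MaxSup.
```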



Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Partial Completeness

  • R: the set of rules obtained before partitioning

  • R': the set of rules obtained after partitioning

  • Partial completeness measures the maximum distance between a rule in R and its closest generalization in R'

  • X̂ is a generalization of itemset X if both have the same attributes and the interval of each attribute in X is contained in the corresponding interval in X̂

  • The distance is defined by the ratio of supports, support(X̂)/support(X)


K-Complete

  • C: the set of frequent itemsets

  • For any K ≥ 1, P is K-complete w.r.t. C if:

    • P ⊆ C

    • For any itemset X (or subset of X) in C, there exists a generalization in P whose support is no more than K times that of X (or of its subset)

  • The smaller K is, the less information is lost


Theoretical Results

  • Lemma 1: if P is a K-complete set w.r.t. C, then any rule R obtained from C has a generalization R' obtainable from P such that conf(R') is bounded by [conf(R)/K, K × conf(R)]

  • For a given partial completeness level K, equi-depth partitioning satisfies that level with the minimum number of intervals, 2n/(m(K−1)), where n is the number of quantitative attributes and m is the minimum support (as a fraction); the MaxSup for each interval is then m(K−1)/(2n). A worked example follows.
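As a quick worked example (the numbers are illustrative, not from the slides): with n = 2 quantitative attributes, minimum support m = 5%, and K = 1.5, the minimum number of intervals is 2 × 2 / (0.05 × 0.5) = 160 per attribute, and MaxSup = 0.05 × 0.5 / (2 × 2) = 0.625%, consistent with equi-depth intervals each holding 1/160 of the records.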


Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Example of an Uninteresting Rule

  • Suppose a quarter of people in age group 20..30 are in the age group 20..25

    • <Age: 20..30> => <Cars: 1..2>, with 8% sup, 70% conf

    • <Age: 20..25> => <Cars: 1..2>, with 2% sup, 70% conf

  • The second rule gives no additional information: a quarter of the first rule's 8% support is exactly the 2% observed, the confidence is unchanged, and the rule is less general than the first


Expected Values Based on Generalization

  • Itemset Z = {<z1, l1, u1>, …, <zn, ln, un>}, with generalization Ẑ = {<z1, l1', u1'>, …, <zn, ln', un'>}

  • The expected support of Z based on the support of its generalization Ẑ is defined as

    E[Pr(Z)] = Pr(<z1, l1, u1>)/Pr(<z1, l1', u1'>) × … × Pr(<zn, ln, un>)/Pr(<zn, ln', un'>) × Pr(Ẑ)


Expected Values Based on Generalization (cont.)

  • The expected confidence of the rule X => Y based on the confidence of its generalization X̂ => Ŷ is defined analogously, with the product taken over the items <y, l, u> of Y:

    E[Conf(X => Y)] = Pr(<y1, l1, u1>)/Pr(<y1, l1', u1'>) × … × Pr(<yn, ln, un>)/Pr(<yn, ln', un'>) × Conf(X̂ => Ŷ)


Interest Measure

  • Itemset X is R-interesting w.r.t. its generalization X̂ if

    • The support of X is no less than R times the expected support based on X̂, and

    • For any specialization X' of X̂, the itemset X − X' is R-interesting w.r.t. X̂

  • Rule X => Y is R-interesting w.r.t. its generalization X̂ => Ŷ if the support or confidence is at least R times that of X̂ => Ŷ, and the itemset X ∪ Y is R-interesting w.r.t. X̂ ∪ Ŷ (a sketch of the support check follows)
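A minimal sketch of the expected-support calculation and the first condition; the numbers reuse the earlier <Age>/<Cars> example, everything else is illustrative:

```python
def expected_support(item_sups, gen_item_sups, gen_support):
    """E[Pr(Z)] based on Zhat: product of per-item support ratios, times Pr(Zhat)."""
    e = gen_support
    for p, p_hat in zip(item_sups, gen_item_sups):
        e *= p / p_hat
    return e

# <Age:20..25> has support 0.10; its generalization <Age:20..30> has 0.40.
# Zhat = {<Age:20..30>, <Cars:1..2>} has support 0.08 (8%).
# (The <Cars> item is unchanged, so its ratio is 1 and is omitted.)
e = expected_support([0.10], [0.40], 0.08)  # 0.02, i.e. exactly the observed 2%
R = 1.5
print(0.02 >= R * e)  # False: the specialized itemset is not R-interesting
```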


Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Candidate Generation

  • Given the set Lk-1 of all frequent (k−1)-itemsets, generate the candidate set Ck of k-itemsets

  • The process has three parts:

    • Join Phase

    • Subset Prune Phase

    • Interest Prune Phase


Join Phase

  • Lk-1 is joined with itself

  • Join condition: the first k−2 items are the same, and the remaining items have different attributes (a sketch follows the example below)

  • Example, L2:

    • {<Married:Yes> <Age:20..24>}

    • {<Married:Yes> <Age:20..29>}

    • {<Married:Yes> <Cars:0..1>}

    • {<Age:20..29> <Cars:0..1>}

  • Result of self-join, C3:

    • {<Married:Yes> <Age:20..24><Cars:0..1>}

    • {<Married:Yes> <Age:20..29><Cars:0..1>}
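A sketch of the join, representing an itemset as a tuple of (attribute, lo, hi) items kept in a fixed attribute order, with the categorical <Married:Yes> written as a degenerate range (illustrative, not the paper's data structures):

```python
def join(l_prev):
    """Join Lk-1 with itself: first k-2 items equal, final items on
    different attributes (the attribute name is field 0 of an item)."""
    out = []
    for a in l_prev:
        for b in l_prev:
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                out.append(a + (b[-1],))
    return out

L2 = [(("Married", "Yes", "Yes"), ("Age", 20, 24)),
      (("Married", "Yes", "Yes"), ("Age", 20, 29)),
      (("Married", "Yes", "Yes"), ("Cars", 0, 1)),
      (("Age", 20, 29), ("Cars", 0, 1))]
C3 = join(L2)  # exactly the two candidates listed above
```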


Subset Prune Phase

  • Delete any candidate that has a (k−1)-subset not in Lk-1

  • Example, L2:

    • {<Married:Yes> <Age:20..24>}

    • {<Married:Yes> <Age:20..29>}

    • {<Married:Yes> <Cars:0..1>}

    • {<Age:20..29> <Cars:0..1>}

  • Result of self-join, C3:

    • {<Married:Yes> <Age:20..24><Cars:0..1>}

    • {<Married:Yes> <Age:20..29><Cars:0..1>}

  • Delete the first itemset in C3, since <Age:20..24><Cars:0..1> is not in L2 (see the sketch below)
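A sketch of the subset prune over the same illustrative representation as the join sketch above:

```python
from itertools import combinations

L2 = {(("Married", "Yes", "Yes"), ("Age", 20, 24)),
      (("Married", "Yes", "Yes"), ("Age", 20, 29)),
      (("Married", "Yes", "Yes"), ("Cars", 0, 1)),
      (("Age", 20, 29), ("Cars", 0, 1))}
C3 = [(("Married", "Yes", "Yes"), ("Age", 20, 24), ("Cars", 0, 1)),
      (("Married", "Yes", "Yes"), ("Age", 20, 29), ("Cars", 0, 1))]

def subset_prune(candidates, l_prev):
    """Keep a candidate only if every (k-1)-subset of it is frequent."""
    return [c for c in candidates
            if all(s in l_prev for s in combinations(c, len(c) - 1))]

print(subset_prune(C3, L2))  # the <Age:20..24> candidate is pruned
```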


Interest Prune Phase

  • Given a user-specified interest level R

  • Delete any itemset that contains an item with support greater than 1/R

  • Lemma 5 guarantees that such itemsets cannot be R-interesting w.r.t. their generalizations


Outline

  • Review of Association Analysis

  • Introducing Quantitative AR Problem

  • Partitioning Quantitative Attributes

  • Identifying the Interesting Rules

  • Extending the Apriori Algorithm

  • Conclusions


Conclusions

  • This paper introduced the problem of mining quantitative association rules in large relational tables

  • It dealt with quantitative attributes by fine-partitioning their values and combining adjacent partitions as necessary

  • Partial completeness quantifies the information lost in partitioning and helps decide the number of partitions

  • The interest measure identifies the interesting rules and prunes the rest



Final Exam Questions

  • What is partial completeness?

  • Determine the number of intervals when there are 3 quantitative attributes, a minimum support of 0.70, and a partial completeness level of 1.5.

  • If intervals are too large, rules may not reach MinConf; if they are too small, rules may not reach MinSup. How do you go about solving this catch-22 problem?

