Dynamic itemset counting and implication rules for market basket data




Abstract

  • new algorithm

    • fewer passes

    • fewer candidate itemsets

  • implication rules

    • normalized based on both the antecedent and the consequent

    • true implications (not mere co-occurrence)

    • more useful, intuitive results


Apriori vs. DIC

  • Apriori

    • level-wise

    • many passes

  • DIC

    • reduce the number of passes

    • fewer candidate itemsets than sampling

  • example: 40,000 transactions, M = 10,000 (M is the number of transactions read between checkpoints)
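The example can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, assuming the best case in which counting of k-itemsets can begin at checkpoint k-1 and the largest itemsets have size 3; the function names and the maximum size are illustrative assumptions, not part of the slides.

```python
def dic_passes(num_transactions: int, interval: int, max_itemset_size: int) -> float:
    """Best-case number of passes for DIC (assumes k-itemsets start at checkpoint k-1)."""
    # The largest itemsets start (max_itemset_size - 1) intervals into the file,
    # and every itemset must then be counted over one full sweep of the data.
    last_start = (max_itemset_size - 1) * interval
    return (last_start + num_transactions) / num_transactions


def apriori_passes(max_itemset_size: int) -> int:
    """Level-wise Apriori: one full pass for each itemset size that is counted."""
    return max_itemset_size


print(dic_passes(40_000, 10_000, 3))  # 1.5
print(apriori_passes(3))              # 3
```

Under these assumptions DIC finishes after 1.5 passes where the level-wise approach needs 3. If large itemsets are discovered later than the best case, DIC needs correspondingly more of the file.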


Fig 1. Apriori and DIC


Counting large itemsets

  • Itemsets : a large lattice

  • count just the minimal small itemsets

    • the itemsets that do not include any other small itemsets

  • mark itemset

    • Solid box - confirmed large itemset

    • Solid circle - confirmed small itemset

    • Dashed box - suspected large itemset

    • Dashed circle - suspected small itemset


Fig 2. An itemsets lattice


DIC algorithm

  • The empty itemset is marked with a solid box. All the 1-itemsets are marked with dashed circles. All other itemsets are unmarked.

  • Read M transactions. For each transaction, increment the respective counters for the itemsets marked with dashes.

  • If a dashed circle has a count that exceeds the support threshold, turn it into a dashed box. If any immediate superset of it has all of its subsets as solid or dashed boxes, add a new counter for it and make it a dashed circle.

  • If a dashed itemset has been counted through all the transactions, make it solid and stop counting it.

  • If we are at the end of the transaction file, rewind to the beginning.

  • If any dashed itemsets remain, go to step 2 (read the next M transactions).
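The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it keeps counters in plain dictionaries instead of a hash tree, treats the support threshold as an absolute count reached with `>=`, and assumes `interval` (M) is at least 1. The names `dic`, `min_count`, `seen`, and `boxes` are hypothetical.

```python
from itertools import combinations


def dic(transactions, min_count, interval):
    """Return {itemset: count} for the large itemsets (transactions are frozensets)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    # Step 1: every 1-itemset starts as a dashed circle (counted, suspected small).
    count = {frozenset([i]): 0 for i in items}
    seen = {s: 0 for s in count}   # how many transactions each itemset has been counted over
    dashed = set(count)            # itemsets still being counted
    boxes = set()                  # suspected or confirmed large itemsets
    pos = 0
    while dashed:                  # step 6: repeat while dashed itemsets remain
        # Step 2: read the next M transactions, rewinding at end of file (step 5).
        for _ in range(min(interval, n)):
            t = transactions[pos]
            pos = (pos + 1) % n
            for s in dashed:
                if seen[s] < n:
                    seen[s] += 1
                    if s <= t:
                        count[s] += 1
        # Step 3: promote dashed circles that reached the threshold to dashed boxes,
        # then start counting immediate supersets whose subsets are all boxes.
        newly_large = [s for s in dashed if s not in boxes and count[s] >= min_count]
        boxes.update(newly_large)
        for s in newly_large:
            for i in items:
                sup = s | {i}
                if i in s or sup in count:
                    continue
                if all(frozenset(sub) in boxes for sub in combinations(sup, len(s))):
                    count[sup] = 0
                    seen[sup] = 0
                    dashed.add(sup)
        # Step 4: an itemset counted over every transaction becomes solid.
        dashed = {s for s in dashed if seen[s] < n}
    return {s: count[s] for s in boxes}
```

Because each itemset is counted over exactly one full sweep of the file, starting wherever its counter was created, the final counts match what a level-wise algorithm would report.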


Fig 3. Start of DIC algorithm


Fig 4. After M transactions


Fig 5. After 2M transactions


Fig 6. After one pass


Data structure

  • like the hash tree used in Apriori with a little extra information

  • Every node stores

    • the last item in the itemset

    • counter, marker, its state

    • its branches if it is an interior node
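A node with these fields might look like the sketch below. It uses a simple trie keyed by item rather than real hashing, and it interprets "marker" as the file position at which counting of the itemset began; both are assumptions made for illustration, as are the names `Node`, `insert`, and `find`.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    item: Optional[str] = None   # last item of the itemset; None for the root (empty itemset)
    count: int = 0               # counter
    marker: int = 0              # assumed meaning: file position where counting started
    state: str = "unmarked"      # e.g. "dashed_circle", "dashed_box", "solid_circle", "solid_box"
    children: Dict[str, "Node"] = field(default_factory=dict)  # branches of an interior node


def insert(root: Node, itemset: Iterable[str], start_pos: int) -> Node:
    """Add a counter for `itemset`, walking its items in sorted order."""
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, Node(item=item))
    node.marker = start_pos
    node.state = "dashed_circle"  # new counters start as suspected small
    return node


def find(root: Node, itemset: Iterable[str]) -> Optional[Node]:
    """Return the node for `itemset`, or None if it has no counter."""
    node: Optional[Node] = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None
    return node
```

Storing only the last item in each node works because the path from the root spells out the rest of the itemset.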


Fig 7. Hash Tree Data Structure


Implication rules

  • conviction

    • more useful and intuitive measure

    • unlike confidence,

      • normalized based on both the antecedent and the consequent

    • unlike interest,

      • directional

      • actual implication as opposed to co-occurrence


Implication rules

  • support : P(A, B)

  • confidence : P(B|A) = P(A, B)/P(A)

  • interest : P(A, B)/(P(A)P(B))

  • conviction : P(A)P(¬B)/P(A, ¬B)

