GenMax

From: “Efficiently Mining Maximal Frequent Itemsets”

By: Karam Gouda & Mohammed J. Zaki

Zeev Dvir – [email protected]


The Problem

  • Given a large database of item transactions, find all frequent itemsets

  • A frequent itemset is a set of items that occurs in at least a user-specified percentage of the transactions in the database

  • We call this percentage min_sup (for minimum support).
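The definition above can be sketched in a few lines of Python. This is only an illustration: the seven-transaction database is a toy one chosen to agree with the tidsets on the Vertical Database slide, the function names are hypothetical, and min_sup is taken as an absolute transaction count (as on the Example slide) rather than a percentage.

```python
from typing import FrozenSet, Iterable, List

# Toy database (an assumption for illustration); transaction IDs are 1..7.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("ABC"),   # tid 1
    frozenset("CD"),    # tid 2
    frozenset("ABC"),   # tid 3
    frozenset("ABCD"),  # tid 4
    frozenset("A"),     # tid 5
    frozenset("BD"),    # tid 6
    frozenset("C"),     # tid 7
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Count the transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def is_frequent(itemset: Iterable[str],
                transactions: List[FrozenSet[str]],
                min_sup: int) -> bool:
    """An itemset is frequent if its support reaches min_sup (absolute count)."""
    return support(itemset, transactions) >= min_sup


print(support("AB", TRANSACTIONS))         # AB occurs in tids 1, 3, 4
print(is_frequent("AD", TRANSACTIONS, 3))  # AD occurs only in tid 4
```

If min_sup is given as a percentage, it can be converted to a count by multiplying by the number of transactions and rounding up.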

  • A maximal frequent itemset is a frequent itemset that has no frequent proper superset

  • FI := the set of frequent itemsets

    MFI := the set of maximal frequent itemsets

  • Fact: typically |MFI| << |FI|

  • GenMax is an algorithm that finds the exact MFI
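To make the relation between FI and MFI concrete, here is a brute-force sketch. It is not GenMax: it enumerates every candidate itemset, which is exponential in the number of items, and only serves to illustrate the two definitions. The toy database and all names are assumptions made for this example.

```python
from itertools import combinations

# Toy database (an assumption for illustration) and an absolute min_sup.
TRANSACTIONS = [set("ABC"), set("CD"), set("ABC"), set("ABCD"),
                set("A"), set("BD"), set("C")]
MIN_SUP = 3


def frequent_itemsets(transactions, min_sup):
    """Enumerate every non-empty itemset and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                result.append(candidate)
    return result


def maximal_only(frequent):
    """Keep the itemsets that have no proper superset among the frequent ones."""
    return [s for s in frequent if not any(s < other for other in frequent)]


FI = frequent_itemsets(TRANSACTIONS, MIN_SUP)
MFI = maximal_only(FI)
print(len(FI), [sorted(m) for m in MFI])
```

On this toy data FI contains A, B, C, D, AB, AC, BC and ABC, while MFI contains only ABC and D.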

Example

Min_sup = 3

ABCD

ABC ABD ACD BCD

AB AC AD BC BD CD

A B C D

Some Useful Definitions

  • The combine-set of an itemset I is the set of items that can be added to I to create a frequent itemset.

  • For example, in the previous example, the combine-set of the itemset {A} is {B, C}.

  • The combine-set of the empty itemset is called F1; it is the set of frequent items, that is, the frequent itemsets of size 1.
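A minimal sketch of the combine-set definition follows. The toy database is an assumption chosen so that the combine-set of {A} comes out as {B, C} with min_sup = 3, and the helper names are hypothetical.

```python
# Toy database (an assumption for illustration) and an absolute min_sup.
TRANSACTIONS = [frozenset("ABC"), frozenset("CD"), frozenset("ABC"),
                frozenset("ABCD"), frozenset("A"), frozenset("BD"),
                frozenset("C")]
MIN_SUP = 3
ALL_ITEMS = ["A", "B", "C", "D"]


def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def combine_set(itemset, candidates=ALL_ITEMS, min_sup=MIN_SUP):
    """Items x (not already in itemset) such that itemset + {x} is frequent."""
    base = frozenset(itemset)
    return [x for x in candidates
            if x not in base and support(base | {x}) >= min_sup]


print(combine_set("A"))  # items that extend {A} to a frequent itemset
print(combine_set(""))   # F1: the frequent single items
```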

Improvement

  • At each level, sort the combine-set (C) in increasing order of support

  • An itemset with low support has a smaller chance of producing a large combine-set at the next level

  • The sooner we prune the tree, the more work we save

  • This heuristic was first used in MaxMiner
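The sketch below shows how a backtracking search for maximal itemsets can apply this ordering. It is a simplified illustration under stated assumptions, not the authors' implementation: support is recomputed by scanning a toy database, superset checks are linear scans over the MFI found so far, and every name is hypothetical.

```python
# Toy database (an assumption for illustration) and an absolute min_sup.
TRANSACTIONS = [frozenset("ABC"), frozenset("CD"), frozenset("ABC"),
                frozenset("ABCD"), frozenset("A"), frozenset("BD"),
                frozenset("C")]
MIN_SUP = 3


def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def by_increasing_support(prefix, items):
    """Order extensions so that the least frequent ones are tried first."""
    return sorted(items, key=lambda y: support(prefix | {y}))


def mfi_backtrack(itemset, combine, mfi):
    """Extend itemset with each item of its combine-set, depth first."""
    for i, x in enumerate(combine):
        new_itemset = itemset | {x}
        possible = combine[i + 1:]  # items that may still be added
        # If even the largest possible extension is already covered,
        # this branch and all later ones at this level are redundant.
        if any((new_itemset | frozenset(possible)) <= m for m in mfi):
            return
        new_combine = [y for y in possible
                       if support(new_itemset | {y}) >= MIN_SUP]
        new_combine = by_increasing_support(new_itemset, new_combine)
        if not new_combine:
            # No frequent extension: record it unless a superset is known.
            if not any(new_itemset <= m for m in mfi):
                mfi.append(new_itemset)
        else:
            mfi_backtrack(new_itemset, new_combine, mfi)


f1 = [x for x in "ABCD" if support(frozenset(x)) >= MIN_SUP]
f1 = by_increasing_support(frozenset(), f1)
mfi = []
mfi_backtrack(frozenset(), f1, mfi)
print(f1, [sorted(m) for m in mfi])
```

On this toy data the low-support item D is tried first and is immediately found to be maximal, because none of its extensions are frequent.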

Bottlenecks

  1. Superset checking:

    The best algorithms for superset checking give an amortized bound of […] per operation.

    That is costly if the MFI contains many itemsets.

  2. Frequency testing:

    How can we make frequency testing faster?

Optimizing Superset Checking

  • A technique called “Progressive Focusing” is used to narrow down the group of potential supersets, as the recursive calls are made

  • LMFI := Local MFI

  • Before each recursive call, we construct the LMFI for the next call, based on the current LMFI and the new item added.
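One way to picture this construction is the following sketch. It assumes that the LMFI for the next call consists of the itemsets in the current LMFI that contain the newly added item; the itemsets in the example are hypothetical, and the step that passes newly found maximal itemsets back to the caller is omitted.

```python
def focus(lmfi, new_item):
    """LMFI for the next call: keep only the itemsets containing new_item."""
    return [m for m in lmfi if new_item in m]


def covered(possible, lmfi):
    """Superset check against a focused LMFI.

    Every itemset in the focused list already contains the current itemset,
    so only the items that may still be added need to be tested.
    """
    remaining = frozenset(possible)
    return any(remaining <= m for m in lmfi)


# Hypothetical maximal itemsets found so far.
mfi_so_far = [frozenset("ABC"), frozenset("ABD"), frozenset("CD")]

lmfi_a = focus(mfi_so_far, "A")   # candidates for supersets of {A}
lmfi_ab = focus(lmfi_a, "B")      # candidates for supersets of {A, B}

print([sorted(m) for m in lmfi_a])
print(covered("C", lmfi_ab))      # is {A, B} + {C} already covered?
print(covered("CD", lmfi_ab))     # is {A, B} + {C, D} already covered?
```

The list of potential supersets shrinks as the recursion deepens, so each check scans fewer itemsets than a check against the full MFI would.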

LMFI Example

FGHI FGHJ …

FGH FGI …

FG …

Frequency Testing Optimization

  • GenMax uses a “vertical database format”: for each item, we store the set of all transactions that contain it.

  • This set is called a tidset (transaction ID set).

  • This format makes support computations cheaper, because we don’t have to scan the entire database.

Vertical Database

A {1, 3, 4, 5}

B {1, 3, 4, 6}

C {1, 2, 3, 4, 7}

D {2, 4, 6}

t(A) = {1, 3, 4, 5}

t(AC) = {1, 3, 4}

supp(I) = |t(I)|
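The tidset of an itemset is the intersection of the tidsets of its items, which is how t(AC) is obtained from t(A) and t(C). A short sketch using the tidsets listed above (the function names are hypothetical, and the itemset is assumed to be non-empty):

```python
# Vertical database from the slide: item -> tidset.
TIDSETS = {
    "A": {1, 3, 4, 5},
    "B": {1, 3, 4, 6},
    "C": {1, 2, 3, 4, 7},
    "D": {2, 4, 6},
}


def tidset(itemset, tidsets=TIDSETS):
    """t(I): intersect the tidsets of all items in a non-empty itemset."""
    items = list(itemset)
    result = set(tidsets[items[0]])  # copy so the stored tidset is untouched
    for item in items[1:]:
        result &= tidsets[item]
    return result


def supp(itemset, tidsets=TIDSETS):
    """supp(I) = |t(I)|."""
    return len(tidset(itemset, tidsets))


print(tidset("AC"), supp("AC"))
print(tidset("AD"), supp("AD"))
```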

ABC ABD ABE

AB …

Combine-set of AB = { C , E }, stored with the tidsets t(ABC) and t(ABE)

Each item y in the combine-set of an itemset I actually represents the itemset I ∪ {y}, and stores the tidset associated with it.

Additional Optimization

  • Diffsets: don’t store the entire tidsets, only the differences between tidsets (described in “Fast Vertical Mining Using Diffsets”)
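As a rough illustration of the idea, the sketch below uses the diffset relations as I understand them from the diffsets technique: for a prefix P and items X and Y, d(PX) = t(P) − t(PX), d(PXY) = d(PY) − d(PX), and supp(PXY) = supp(PX) − |d(PXY)|. The function names are hypothetical and the data are the tidsets from the Vertical Database slide, with the empty itemset as the initial prefix.

```python
ALL_TIDS = set(range(1, 8))  # t(P) for the empty prefix: every transaction
TIDSETS = {
    "A": {1, 3, 4, 5},
    "B": {1, 3, 4, 6},
    "C": {1, 2, 3, 4, 7},
    "D": {2, 4, 6},
}


def initial_diffset(item):
    """d(X) = t(empty prefix) - t(X): the transactions that lack X."""
    return ALL_TIDS - TIDSETS[item]


def extend(d_px, sup_px, d_py):
    """Return d(PXY) and supp(PXY) from d(PX), supp(PX) and d(PY)."""
    d_pxy = d_py - d_px
    return d_pxy, sup_px - len(d_pxy)


d_a, sup_a = initial_diffset("A"), len(TIDSETS["A"])
d_ab, sup_ab = extend(d_a, sup_a, initial_diffset("B"))
d_ac, sup_ac = extend(d_a, sup_a, initial_diffset("C"))
d_abc, sup_abc = extend(d_ab, sup_ab, d_ac)  # prefix A, X = B, Y = C

print(d_ab, sup_ab)
print(d_abc, sup_abc)
```

Here d(AB) holds a single transaction ID, whereas t(AB) holds three; the saving grows as itemsets get longer and their tidsets overlap more.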

Experimental Results

  • GenMax is compared with MaxMiner, MAFIA, and MAFIA-PP

  • MaxMiner & MAFIA-PP give the exact MFI, while MAFIA gives a superset of the MFI

  • The databases used in the experiments are grouped according to their MFI length distribution

Type I Datasets

Type II Datasets

Type III Datasets

Type IV Datasets

The End
