Algorithms for Mining Maximal Frequent Itemsets -- A Survey

Chaojun Lu
  • Introduction
  • Frequent Itemset Extension Tree
  • Common Techniques
  • Some MFI-Mining Algorithms
  • Concluding Remarks
Introduction
  • Terminology and Notations
  • Problem
  • Solution

Terminology and Notations

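set of items: $I = \{i_1, i_2, \dots, i_n\}$

set of transactions: $DB = \{T_1, T_2, \dots, T_m\}$, with $T_i \subseteq I$

(k-)itemset: $N \subseteq I$ ($|N| = k$)

support of itemset N: $\mathrm{supp}(N) = |\{T \in DB : N \subseteq T\}|$, the number of transactions in DB that contain N

frequent itemset (fi): an itemset N with $\mathrm{supp}(N) \ge minsup$, for a user-given minimum support minsup

maximal frequent itemset (mfi): a frequent itemset none of whose proper supersets is frequent

set of all frequent (k-)itemsets: $FI$, $FI_k$

set of all mfi: $MFI$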


Problem

Discover all maximal frequent itemsets in a given transaction database

Solution

Traverse the search space (the subset lattice of I) and count the support of itemsets in DB.


Solution(cont.)

  • Traversing the search space by --
  • Brute force: enumerate all $2^{|I|}$ itemsets
  • Clever use of the basic property of itemsets:
  • $A \subseteq B \Rightarrow \mathrm{supp}(A) \ge \mathrm{supp}(B)$
  • BP1: All subsets of a known frequent itemset are also frequent.
  • BP2: All supersets of a known infrequent itemset are also infrequent.
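For reference, the brute-force traversal can be sketched in a few lines of Python. This is a minimal sketch, not code from the survey: the toy database `DB`, the absolute threshold `MINSUP`, and the helper names `supp` and `brute_force_mfi` are hypothetical. It counts every one of the $2^{|I|}$ itemsets and makes no use of BP1/BP2, which is exactly the cost the techniques below try to avoid.

```python
from itertools import combinations

# Hypothetical toy database (not from the survey); items are small ints.
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]
MINSUP = 2  # assumed absolute minimum support

def supp(itemset, db=DB):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)

def brute_force_mfi(db=DB, minsup=MINSUP):
    items = sorted(set().union(*db))
    # Count every non-empty subset of I and keep the frequent ones.
    fi = [frozenset(c)
          for k in range(1, len(items) + 1)
          for c in combinations(items, k)
          if supp(c, db) >= minsup]
    # An fi is maximal if no other fi is a proper superset of it.
    return [n for n in fi if not any(n < m for m in fi)]

print(sorted(sorted(m) for m in brute_force_mfi()))
```

On this toy DB the maximal frequent itemsets are {1,2,4}, {2,3} and {3,4}.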
Frequent Itemset eXtension Tree
  • Purpose
  • Idea
  • Description
  • Problem Re-formulated

Purpose

To provide a general framework for analyzing and comparing existing MFI-mining algorithms.

Idea

Larger frequent itemsets are generated by extending known smaller frequent itemsets with suitable items.

FIXTree captures and illustrates this extension process.


Description of FIXTree

  • Root: ∅ (the empty itemset)
  • Nodes: frequent itemsets
  • Each node N is associated with its candidate extensions CX(N) and frequent extensions FX(N), defined as:
  • $CX(N) = \{x \mid x \in I \text{ and } N \cup \{x\} \text{ may be frequent}\}$
  • $FX(N) = \{x \mid x \in CX(N) \text{ and } N \cup \{x\} \text{ is frequent}\}$
  • Parent-child P → C: C is a frequent extension of P, i.e. $C = P \cup \{x\}$ for some $x \in FX(P)$.

Example (each node is shown as N (CX(N)/FX(N))):

∅ ({1,2,3,4,5}/{1,2,3,4})
    1 ({2,3,4}/{2,4})
        12 ({4}/{4})
            124 (∅/∅)
        14 (∅/∅)
    2 ({3,4}/{3,4})
        23 ({4}/∅)
        24 (∅/∅)
    3 …
    4 …

Problem Re-formulated

Generate a FIXTree that contains all of MFI and is as small as possible, while searching the subset lattice of I.

Common Techniques
  • Search Strategies
  • Pruning Strategies
  • Dynamic Reordering
  • Data Representation for Fast Support Counting
  • Frequency Determination

Search Strategies

  • We can generate the FIXTree via:
  • Breadth-first
  • Depth-first
  • Hybrid
  • For MFI mining, it is unnecessary to generate and count all nodes. Instead, we try to generate as few nodes of the FIXTree as possible, as long as MFI can still be identified.

Pruning Strategies

BasicPS1:

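Prune the subtree of each infrequent extension of node N: if $N \cup \{x\}$ is infrequent, then by BP2 so is every itemset in its subtree.

1 ({2,3,4}/{2,4})
    12 ({4}/{4})
    13 (infrequent: subtree pruned)
    14 (∅/∅)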

Note: This strategy greatly improves a PURE DFS algorithm for mining long patterns.


Pruning Strategies(cont.)

BasicPS2:

Node N's CX(N) is taken from its parent P's FX(P).

Let $N = P \cup \{x\}$ with $x \in FX(P)$; then

$CX(N) = \{y \mid y \in FX(P) \text{ and } y > x\}$

1 ({2,3,4}/{2,4})
    12 ({4}/…)
    14 (∅/…)


Pruning Strategies (cont.)

MaxPS1:

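At node N, if $N \cup CX(N) \subseteq M$ for a known fi/mfi M, then the N-subtree may be pruned.

MaxPS2:

At node N, if $N \cup CX(N)$ is found to be frequent by support counting, then all of N's children may be pruned (and $N \cup CX(N)$ is a possible new mfi).

Look-ahead example: at node 1 ({2,3,4}/…), count 1234 = 1 ∪ CX(1) first; if it is frequent, the nodes 12, 13, 14, 123 and 124 below node 1 need not be generated.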
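Both tests only look at the itemset $N \cup CX(N)$. A minimal Python sketch, assuming a hypothetical toy `DB`, an absolute `MINSUP`, and made-up helper names `maxps1` and `maxps2`:

```python
# Hypothetical toy DB and absolute minsup (assumptions, not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]
MINSUP = 2

def supp(itemset):
    return sum(1 for t in DB if itemset <= t)

def maxps1(node, cx, known_fi):
    """MaxPS1: N's subtree can be pruned if N ∪ CX(N) lies inside a known fi/mfi."""
    head_and_tail = node | set(cx)
    return any(head_and_tail <= m for m in known_fi)

def maxps2(node, cx):
    """MaxPS2 (look-ahead): count N ∪ CX(N); if frequent, return it as a possible mfi."""
    head_and_tail = node | set(cx)
    return head_and_tail if supp(head_and_tail) >= MINSUP else None

# Look-ahead at node 1 with CX = {2,3,4} fails here (1234 is infrequent) ...
print(maxps2(frozenset({1}), [2, 3, 4]))
# ... but at node 12 with CX = {4} it succeeds, so 12's children are skipped.
print(maxps2(frozenset({1, 2}), [4]))
```

A non-None result from `maxps2` can then feed `known_fi` for later `maxps1` checks, which is how the two strategies usually work together.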


Pruning Strategies(cont.)

MaxPS3:

At node N, if $N \cup CX(N)$ is frequent, then all of N's right-hand-side siblings may be pruned. (Every itemset in those branches is a subset of $N \cup CX(N)$, so those branches won't produce a new mfi.)

∅ ({1,2,3,4,5}/{1,2,3,4})
    1 …
    2 ({3,4}/…)
    3 …
    4 …


Pruning Strategies(cont.)

DFMaxPS:

In DFS, AFTER the recursive call DFS(Ni), check whether the leftmost path $N \cup \{i, \dots, n\}$ is frequent. If yes, then Ni's right-hand-side siblings may be pruned. (These won't produce a new mfi.)

N (…/{1,2,…,n})
    N1
    …
    Ni ({i+1,…,n})
    N(i+1)
    …
    Nn


Pruning Strategies(cont.)

EquivPS:

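At node N, if for some $x \in CX(N)$, $\mathrm{supp}(N \cup \{x\}) = \mathrm{supp}(N)$, then N can be replaced by $N \cup \{x\}$, with $CX(N \cup \{x\}) = CX(N) - \{x\}$.

Before: N ({x,y,z}/…), with subtree Nx…, Ny…, Nz…, Nxy…, Nxz…

After: Nx ({y,z}/…), with subtree Nxy…, Nxz…

Itemsets containing N but not x cannot be mfi: every transaction that contains N also contains x, so for any such itemset S, $\mathrm{supp}(S \cup \{x\}) = \mathrm{supp}(S)$, and a frequent S always has the frequent proper superset $S \cup \{x\}$.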
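The equality test is cheap with tid-sets: $\mathrm{supp}(N \cup \{x\}) = \mathrm{supp}(N)$ exactly when $t(N) \subseteq t(\{x\})$. The sketch below uses a hypothetical toy `DB` and made-up helpers `tids` and `apply_equivps`. It moves all equivalent items at once, which is safe because $t(N) \subseteq t(x)$ and $t(N) \subseteq t(y)$ together give $t(N \cup \{x, y\}) = t(N)$.

```python
# Hypothetical toy DB (not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def tids(itemset):
    """t(N): ids of the transactions that contain N."""
    return {j for j, t in enumerate(DB) if set(itemset) <= t}

def apply_equivps(node, cx):
    """EquivPS: move every x with supp(N ∪ {x}) == supp(N) from CX(N) into N."""
    t_node = tids(node)
    # supp(N ∪ {x}) == supp(N) exactly when t(N) ⊆ t({x}).
    equiv = [x for x in cx if t_node <= tids({x})]
    return node | set(equiv), [x for x in cx if x not in equiv]

print(apply_equivps(frozenset({1}), [2, 3, 4]))
```

Here every transaction containing 1 also contains 2 and 4, so node 1 ({2,3,4}/…) collapses to node 124 with CX = {3}.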


Dynamic Reordering

  • The order in which items are used to extend itemsets greatly affects the performance of MFI-mining algorithms.
  • Two heuristics:
  • DR1: At node N, reorder all $x \in FX(N)$ in increasing order of $\mathrm{supp}(N \cup \{x\})$.

Example (at node 12, the frequent extensions are reordered as {4,3}):

1 {2,3,4}
    12 {4,3}
        124 {3}
            1243
        123
    13 {4}
        134
    14
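DR1 is a sort by the support of the extended itemset, after which children are generated from the reordered list. A minimal sketch with a hypothetical toy `DB` and made-up helper names `dr1_order` and `children`; ties keep their input order because Python's `sorted` is stable.

```python
# Hypothetical toy DB (not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def supp(itemset):
    return sum(1 for t in DB if itemset <= t)

def dr1_order(node, fx):
    """DR1: order N's frequent extensions by increasing supp(N ∪ {x})."""
    return sorted(fx, key=lambda x: supp(node | {x}))

def children(node, fx):
    """Children N ∪ {x}, each with CX = the items after x in the DR1 order."""
    order = dr1_order(node, fx)
    return [(node | {x}, order[i + 1:]) for i, x in enumerate(order)]

print(children(frozenset(), [2, 3, 4, 1]))
```

Item 1 (support 2) moves to the front, so it receives the longest candidate list, while the more frequent items 2, 3, 4 appear in the CX of most nodes.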


Dynamic Reordering(cont.)

  • DR2: Reorder the items of FX(∅) (i.e. $FI_1$) in decreasing order of $|IF(x)|$, $x \in FI_1$, where
  • $IF(x) = \{y \mid y \in FI_1 \text{ and } \{x, y\} \text{ is infrequent}\}$.
  • Notes:
  • $|M(x)| \le |FI_1| - |IF(x)|$, where M(x) is the longest mfi containing x
  • DR2 + DR1 for $FI_1$.
  • Compute $FI_1$ and $FI_2$ before using DR2.
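Computing IF(x) needs the support of every pair of frequent items, which is why $FI_1$ and $FI_2$ come first. The sketch below is an illustration only: `DB`, `MINSUP` and the names `inf_with`, `dr2_order` and `bound` are hypothetical, and breaking ties by increasing single-item support is our assumed reading of "DR2 + DR1".

```python
from itertools import combinations

# Hypothetical toy DB and absolute minsup (assumptions, not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]
MINSUP = 2

def supp(itemset):
    return sum(1 for t in DB if set(itemset) <= t)

fi1 = sorted(x for x in set().union(*DB) if supp({x}) >= MINSUP)

# IF(x): frequent items y such that {x, y} is infrequent (uses all pair supports).
inf_with = {x: set() for x in fi1}
for x, y in combinations(fi1, 2):
    if supp({x, y}) < MINSUP:
        inf_with[x].add(y)
        inf_with[y].add(x)

# DR2: decreasing |IF(x)|; ties broken by increasing supp({x}) (assumed DR1 tie-break).
dr2_order = sorted(fi1, key=lambda x: (-len(inf_with[x]), supp({x})))

# Upper bound on the size of the longest mfi containing x.
bound = {x: len(fi1) - len(inf_with[x]) for x in fi1}
print(dr2_order, bound)
```

Items 1 and 3 never co-occur, so each has |IF| = 1 and moves to the front; the bound for item 1 is 3, which the mfi {1,2,4} attains.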

Data Representation

  • Data representation
  • transaction
  • set of items
  • bitstring
  • tid-list for each item(set)
  • FP-tree
  • vertical bitmap for each item(set)
  • diffset
  • Count support on the entire DB or sub-DB?
  • Counting techniques
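As an illustration of the vertical tid-list representation, the sketch below converts a horizontal database into one tid-list per item and counts $\mathrm{supp}(N)$ as the size of the intersection of its items' tid-lists. The toy `DB` and the names `vertical`, `TIDS` and `supp_vertical` are hypothetical.

```python
# Hypothetical toy DB (not from the survey), stored horizontally.
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def vertical(db):
    """Convert horizontal transactions into a tid-list (here a set of tids) per item."""
    tidlists = {}
    for tid, t in enumerate(db):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists

TIDS = vertical(DB)

def supp_vertical(itemset):
    """supp(N) = |t(N)|, t(N) being the intersection of the items' tid-lists (N non-empty)."""
    return len(set.intersection(*(TIDS[x] for x in itemset)))

print(supp_vertical({1, 2, 4}), supp_vertical({1, 3}))
```

Once the tid-lists are built, counting never rescans the transactions; the bitmap and diffset variants used by MAFIA and GenMax refine the same idea.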

Frequency Determination

  • We can determine that an itemset N is frequent via:
  • Directly counting supp(N) in DB
  • A known frequent superset of N (BP1)
  • A lower bound on supp(N) that is at least minsup

Lower Bound Technique

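  • Obtain a lower bound on the support of an itemset from the support information of its subsets, using $drop(N, x) = \mathrm{supp}(N) - \mathrm{supp}(N \cup \{x\})$:
  • $\mathrm{supp}(N \cup \{x\}) = \mathrm{supp}(N) - drop(N, x) \ge \mathrm{supp}(N) - drop(M, x)$, where $M \subseteq N$ (every transaction containing N but not x also contains M, so $drop(N, x) \le drop(M, x)$)
  • $\mathrm{supp}(N \cup X) \ge \mathrm{supp}(N) - \sum_{x \in X} drop(M, x)$, where $M \subseteq N$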

Lower Bound Technique(cont.)

  • LB-PS
  • We already have supp(N), supp(N1), supp(N2) and supp(N3), so we can compute
  • $\mathrm{supp}(N123) \ge \mathrm{supp}(N) - drop(N,1) - drop(N,2) - drop(N,3)$ and check whether this bound is $\ge minsup$.
  • If yes, then N123 = N1 ∪ CX(N1) is frequent, so the N2 and N3 branches can be pruned. (cf. MaxPS3)

N (…/{1,2,3})
    N1 ({2,3}/…)
    N2 ({3}/…)
    N3
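The bound can be checked numerically. In this sketch the toy `DB` and the helper names `drop` and `support_lower_bound` are hypothetical; it evaluates $\mathrm{supp}(N) - \sum_{x \in X} drop(M, x)$ for a chosen $M \subseteq N$ (defaulting to M = N, the tightest choice).

```python
# Hypothetical toy DB (not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def supp(itemset):
    return sum(1 for t in DB if set(itemset) <= t)

def drop(m, x):
    """drop(M, x) = supp(M) - supp(M ∪ {x}): transactions with M but without x."""
    return supp(m) - supp(set(m) | {x})

def support_lower_bound(n, extension, m=None):
    """supp(N) - Σ_{x∈X} drop(M, x), a lower bound on supp(N ∪ X) for M ⊆ N."""
    m = n if m is None else m
    return supp(n) - sum(drop(m, x) for x in extension)

# With M = N = {1}: drop(1, 2) = drop(1, 4) = 0, so supp(124) >= 2 without counting it.
print(support_lower_bound({1}, [2, 4]), supp({1, 2, 4}))
# With M = ∅ the drops are larger and the bound is much weaker.
print(support_lower_bound({1}, [2, 4], m=set()))
```

If minsup were 2, the first bound alone would establish that 124 is frequent, which is the situation in which LB-PS prunes the right-hand siblings.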

Some MFI-Mining Algorithms
  • Apriori
  • Pincer-Search
  • FP-growth
  • Max-Miner
  • DepthProject
  • MAFIA
  • GenMax

Apriori

Breadth-first

Key steps:

Given $FI_k$:
  • Generate $C_{k+1}$
  • Join (extending $FI_k$ using BasicPS2)
  • Prune (BP2)
  • Support counting on $C_{k+1}$ to obtain $FI_{k+1}$
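A compact level-wise sketch of these key steps in Python. It is an illustration under stated assumptions (hypothetical toy `DB`, absolute `MINSUP`, itemsets kept as sorted tuples), not Agrawal's original implementation: the join pairs two k-itemsets that share their first k-1 items, which is BasicPS2 at the level of the FIXTree, and the prune step applies BP2.

```python
from itertools import combinations

# Hypothetical toy DB and absolute minsup (assumptions, not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]
MINSUP = 2

def supp(itemset):
    return sum(1 for t in DB if itemset <= t)

def apriori(items, minsup=MINSUP):
    """Return all frequent itemsets as sorted tuples, level by level."""
    level = [(x,) for x in sorted(items) if supp({x}) >= minsup]  # FI_1
    all_fi = list(level)
    while level:
        known = set(level)
        # Join: a and b share their first k-1 items; append b's last (larger) item.
        cands = [a + (b[-1],) for a, b in combinations(level, 2) if a[:-1] == b[:-1]]
        # Prune (BP2): every k-subset of a candidate must be frequent.
        cands = [c for c in cands
                 if all(s in known for s in combinations(c, len(c) - 1))]
        # Support counting on C_{k+1} gives FI_{k+1}.
        level = [c for c in cands if supp(set(c)) >= minsup]
        all_fi.extend(level)
    return all_fi

fi = apriori(set().union(*DB))
mfi = [c for c in fi if not any(set(c) < set(d) for d in fi)]
print(fi, mfi)
```

Because each level is kept in lexicographic order, joined candidates are again sorted tuples, so the subset lookup in the prune step is a plain tuple membership test.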


Apriori(cont.)

Symmetry of the FI-mining problem: frequent vs. infrequent, extension-based vs. reduction-based

  • $FI_k$ ↔ $IF_k$
  • extension, count $C_{k+1}$ ↔ reduction, count $C_k$
  • $FI_{k+1}$ ↔ $IF_{k+1}$
  • The reduction-based (top-down) side starts from {1,2,…,n}.


Pincer-Search

Hybrid search (top-down + bottom-up)

Key steps, with CMFI (the candidate mfi) initially set to {I}:
  • Given $FI_{k-1}$, $C_k$, CMFI and MFI
  • Count $C_k$ and CMFI to obtain $FI_k$, $IFI_k$ and new MFI
  • Use MFI to prune $FI_k$ (BP1, MaxPS)
  • Use $IFI_k$ to update CMFI
  • Generate $C_{k+1}$:
  • Join (extending $FI_k$ using BasicPS2)
  • Recover missing candidates
  • Prune (BP2)


Pincer-Search(cont.)

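Figure: Pincer-Search on items 1 to 5, with a bottom-up search over 1, 2, 3, 4, 5, 12, 13, 14, 23, 24, 34 and a top-down search from 12345 to 1234; some candidates are pruned.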


FP-Growth

FP-tree: a compact form of DB/sub-DB

Key steps: FP-growth(N, N-tree)
  • if N-tree is a single path over items {x, y, z}
  • then a possible mfi is found: $N \cup \{x, y, z\}$
  • else, for each $x \in FX(N)$: construct the Nx-tree and call FP-growth($N \cup \{x\}$, Nx-tree)


FP-Growth(cont.)

Figure: FP-tree of the DB, with header table f, c, a, b, m, p (items in decreasing support order).

Node p ({m,b,a,c,f}/{c}): p's subDB: fcam, fcam, cb; p's FP-tree: c, giving cp.

Node m ({b,a,c,f}/{a,c,f}): m's subDB: fca, fca, fcab; m's FP-tree: fca.


FP-Growth(cont.)

Depth-first

MaxPS (if used for MFI-mining)

Dynamic Reordering

Projected subDB

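Without candidate generation? In FIXTree terms:
  • Constructing the subDB for N corresponds to computing CX(N)
  • A single path corresponds to MaxPS
  • Mining the frequent 1-itemsets in the subDB corresponds to computing FX(N)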


MaxMiner

Breadth-first + Pruning

Key steps, at node N with CX(N):
  • Count $N \cup CX(N)$, and $N \cup \{x\}$ for each $x \in CX(N)$, to get FX(N)
  • If $N \cup CX(N)$ is frequent, prune using MaxPS2
  • Reorder FX(N) using DR1
  • Generate N's children $N \cup \{x\}$ for $x \in FX(N)$, with $CX(N \cup \{x\}) = \{y \mid y \in FX(N) \text{ and } y > x\}$
  • MaxPS3 + LB-PS
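The key steps can be strung together into a small breadth-first search. This is a simplified sketch, not Bayardo's implementation: `DB`, `MINSUP` and `max_miner_sketch` are hypothetical; it scans DB once per itemset instead of batching all counts of a level into one database pass; and it omits MaxPS3 and LB-PS, keeping only MaxPS1, MaxPS2 (look-ahead), DR1 and the CX rule above.

```python
from collections import deque

# Hypothetical toy DB and absolute minsup (assumptions, not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]
MINSUP = 2

def supp(itemset):
    return sum(1 for t in DB if itemset <= t)

def max_miner_sketch(items):
    """Breadth-first FIXTree search with MaxPS1, MaxPS2 (look-ahead) and DR1."""
    queue = deque([(frozenset(), sorted(items))])  # (N, CX(N)); the root is ∅
    candidates = []                                # possible mfi found so far
    while queue:
        node, cx = queue.popleft()
        head_tail = node | set(cx)
        # MaxPS1: N ∪ CX(N) already lies inside a found frequent itemset.
        if any(head_tail <= m for m in candidates):
            continue
        # MaxPS2 (look-ahead): N ∪ CX(N) is frequent, so N's children are skipped.
        if cx and supp(head_tail) >= MINSUP:
            candidates.append(head_tail)
            continue
        fx = [x for x in cx if supp(node | {x}) >= MINSUP]
        if not fx:
            if node:                               # a leaf of the FIXTree
                candidates.append(node)
            continue
        fx.sort(key=lambda x: supp(node | {x}))    # DR1
        for i, x in enumerate(fx):
            queue.append((node | {x}, fx[i + 1:]))  # CX(N ∪ {x}) = later items of FX(N)
    # Drop candidates that have a proper superset among the candidates.
    found = set(candidates)
    return [c for c in found if not any(c < d for d in found)]

print(sorted(sorted(m) for m in max_miner_sketch(set().union(*DB))))
```

All candidates are frequent, so the final subset filter leaves exactly the maximal ones; on this toy DB the look-ahead finds 34 and 124 directly, and 23 is found as a leaf.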


DepthProject

Depth-first + Pruning

Key steps: at node N with CX(N), call DP(N, DB):
  • Count $N \cup \{x\}$ in DB to obtain FX(N)
  • Prune using DFMaxPS and MaxPS1
  • Project DB to obtain subDB (if necessary)
  • Reorder FX(N) using DR1
  • For each $x \in FX(N)$: DP($N \cup \{x\}$, subDB)

Output: a superset of MFI


DepthProject(cont.)

Projected DB (example)

DB: abc, acd, abe, bd

Node a, with FX(a) = {b, c}; its children are ab and ac.

Projected DB for {a}: bc, c, b

Bitstrings: [101], [1010]
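The projection step can be reproduced on the slide's four-transaction DB. In this sketch minsup = 2 is an assumption (it yields FX(a) = {b, c} as on the slide), items are single characters ordered alphabetically, and `frequent_extensions`, `project` and `bitstring` are hypothetical helper names rather than DepthProject's own routines.

```python
# DB from the slide's example; MINSUP = 2 is an assumption.
DB = ["abc", "acd", "abe", "bd"]
MINSUP = 2

def frequent_extensions(node, db):
    """FX(N): items after N's last item (alphabetical order) whose extension is frequent."""
    last = max(node)
    counts = {}
    for t in db:
        if set(node) <= set(t):
            for x in set(t) - set(node):
                if x > last:
                    counts[x] = counts.get(x, 0) + 1
    return sorted(x for x, c in counts.items() if c >= MINSUP)

def project(node, fx, db):
    """Keep only transactions containing N, restricted to the items of FX(N)."""
    return ["".join(x for x in t if x in fx) for t in db if set(node) <= set(t)]

def bitstring(item, db):
    """One bit per transaction of db: 1 if the transaction contains the item."""
    return "".join("1" if item in t else "0" for t in db)

fx_a = frequent_extensions("a", DB)
proj = project("a", fx_a, DB)
print(fx_a, proj, bitstring("b", proj))
```

The children ab and ac are then counted against the three short projected transactions instead of the full DB.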


DepthProject(cont.)

Project DB for some nodes on a path

Bitstring representation

Byte Counting

Bucket Counting


MAFIA

Depth-first + Pruning

Key steps: at node N, call MAFIA(N, MFI):
  • If $N \cup CX(N) \subseteq M$ for some $M \in MFI$, then prune using MaxPS1
  • Count $N \cup \{x\}$ to obtain FX(N), using EquivPS
  • Reorder FX(N) using DR1
  • For each $x \in FX(N)$: MAFIA($N \cup \{x\}$, MFI)
  • If on the leftmost path, prune using DFMaxPS


MAFIA(cont.)

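Data representation: vertical bitmap and byte counting
  • Bitmap of item(set) N: bmp(N), with one bit per transaction; bit j is 1 if transaction j contains N, and 0 otherwise
  • $t(N \cup \{x\}) = t(N) \cap t(x)$, i.e. bmp($N \cup \{x\}$) = bmp(N) AND bmp(x)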
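Python's arbitrary-size integers make a convenient stand-in for vertical bitmaps. The sketch assumes a hypothetical toy `DB` and made-up names `item_bitmap`, `BMP`, `extend` and `support`; `bin(...).count("1")` replaces MAFIA's byte-counting lookup tables as the popcount. It also shows that the EquivPS test becomes a bitmap comparison.

```python
# Hypothetical toy DB (not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def item_bitmap(item):
    """Vertical bitmap: bit j is set iff transaction j contains the item."""
    bits = 0
    for j, t in enumerate(DB):
        if item in t:
            bits |= 1 << j
    return bits

BMP = {x: item_bitmap(x) for x in set().union(*DB)}

def extend(bmp_n, x):
    """bmp(N ∪ {x}) = bmp(N) AND bmp(x)."""
    return bmp_n & BMP[x]

def support(bmp):
    """Number of set bits (a stand-in for byte counting)."""
    return bin(bmp).count("1")

b1 = BMP[1]
b12 = extend(b1, 2)
print(support(b1), support(b12), support(extend(b12, 3)))
# EquivPS as a bitmap test: supp(N ∪ {x}) == supp(N) iff the AND leaves bmp(N) unchanged.
print(extend(b1, 4) == b1)
```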


GenMax

Depth-first + Pruning

Key steps:
  • Compute $FI_1$ and $FI_2$
  • Reorder $FI_1$ using DR2 + DR1
  • MFI = ∅ (used for MaxPS1)
  • LMFI(∅, $FI_1$, MFI) // uses diffsets
  • Return MFI


GenMax(cont.)

MFI-subset check: progressive focusing

LMFI(N, FX(N), LMFI):
  • For each $x \in FX(N)$:
  • Generate $Nx = N \cup \{x\}$ with its candidate extensions CX(Nx)
  • If $Nx \cup CX(Nx) \subseteq M$ for some M in LMFI // MaxPS1
  • then return
  • Count CX(Nx) to obtain FX(Nx)
  • Update LMFI to obtain newLMFI
  • LMFI(Nx, FX(Nx), newLMFI)


GenMax(cont.)

MFI-subset check optimization: check for local MFI

DR2

Data Representation: diffsets
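Diffsets store, for each member of a prefix class, only the tids lost relative to the prefix. For a prefix P with members PX and PY, $d(PX) = t(P) - t(X)$, $d(PXY) = d(PY) - d(PX)$ and $\mathrm{supp}(PXY) = \mathrm{supp}(PX) - |d(PXY)|$. The sketch checks these identities against plain tid-sets; the toy `DB` and the names `tids` and `diffset_support` are hypothetical.

```python
# Hypothetical toy DB (not from the survey).
DB = [{1, 2, 4}, {1, 2, 4}, {2, 3}, {2, 3, 5}, {3, 4}, {3, 4}]

def tids(itemset):
    """t(N): ids of the transactions that contain N."""
    return {j for j, t in enumerate(DB) if set(itemset) <= t}

def diffset_support(prefix, x, y):
    """Return (supp(PXY), d(PXY)) computed from diffsets of the class with prefix P."""
    t_p = tids(prefix)
    d_px = t_p - tids({x})              # d(PX) = t(P) - t(X)
    d_py = t_p - tids({y})              # d(PY) = t(P) - t(Y)
    supp_px = len(t_p) - len(d_px)      # supp(PX) = supp(P) - |d(PX)|
    d_pxy = d_py - d_px                 # d(PXY) = d(PY) - d(PX)
    return supp_px - len(d_pxy), d_pxy  # supp(PXY) = supp(PX) - |d(PXY)|

print(diffset_support({2}, 3, 4))  # class of prefix 2: members 23 and 24
print(diffset_support({1}, 2, 4))  # class of prefix 1: members 12 and 14
```

In the class of prefix 1 both diffsets are empty, which is also the EquivPS situation; diffsets tend to stay small exactly where tid-lists are large.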

Concluding Remarks
  • Independent components can fit together nicely
  • Search strategy: hybrid
  • Pruning strategy and dynamic reordering
  • Data projection, bitmap representation, fast counting, compression
  • Different algorithms perform well under different MFI distributions
  • MAFIA and GenMax: the current state of the art

References

R. C. Agarwal, et al. Depth first generation of long patterns.

R. J. Bayardo. Efficiently mining long patterns from databases.

D. Burdick, et al. MAFIA: a maximal frequent itemset algorithm for transactional databases.

K. Gouda, et al. Efficiently mining maximal frequent itemsets.

J. Han, et al. Mining frequent patterns without candidate generation.

D.-I. Lin, et al. Pincer-search: an efficient algorithm for discovering the maximum frequent set.