Presentation Transcript
Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

How to find good features from semi-structured raw data for classification

Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure

Feature Construction
  • Most data mining and machine learning models assume the following structured data:
    • (x1, x2, ..., xk) -> y
    • where the xi's are independent variables
    • and y is the dependent variable.
      • y drawn from a discrete set: classification
      • y drawn from a continuous range: regression
  • When the feature vectors are good, the differences in accuracy among learners are small.
  • Question: where do good features come from?
Frequent Pattern-Based Feature Extraction
  • Data not in pre-defined feature vectors:
    • Transactions
    • Biological sequences
    • Graph databases
  • Frequent patterns are good candidates for discriminative features. So, how do we mine them? A small illustration of the idea follows.
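To make the idea concrete, here is a minimal, hypothetical sketch: the transaction data and the brute-force miner below are illustrative assumptions, not the mining algorithm used in the paper. Every itemset that appears in at least a min_support fraction of the examples is a candidate feature.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only)
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def mine_frequent(transactions, min_support):
    """Brute-force frequent itemset mining: return every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= min_support:
                frequent[frozenset(cand)] = support
    return frequent

# Each frequent itemset is a candidate (not yet discriminative) feature.
print(mine_frequent(transactions, min_support=0.6))
```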

FP: Sub-graph

[Figure: a discovered sub-graph pattern shared by the molecular graphs of compounds NSC 4960, NSC 699181, NSC 40773, NSC 164863, and NSC 191370 (example borrowed from a George Karypis presentation).]
Frequent Pattern Feature Vector Representation

[Figure: the mined patterns become binary features that can be fed to any classifier you can name, e.g., a decision tree (the iris tree splitting on Petal.Length < 2.45 and Petal.Width < 1.75 into setosa, versicolor, virginica), NN, SVM, LR.]

        P1  P2  P3
Data1    1   1   0
Data2    1   0   1
Data3    1   1   0
Data4    0   0   1
...

Mining these predictive features is an NP-hard problem: even 100 examples can yield up to 10^10 patterns, and most of them are useless. A minimal sketch of the representation itself follows.
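A minimal sketch of the representation, assuming a hypothetical set of already-mined patterns: each example becomes a 0/1 vector whose j-th entry records whether the example contains pattern j.

```python
# Hypothetical examples and mined patterns (illustration only)
examples = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"butter"},
]
patterns = [frozenset({"milk", "bread"}),    # P1
            frozenset({"bread", "butter"}),  # P2
            frozenset({"butter"})]           # P3

def to_feature_vector(example, patterns):
    """1 if the example contains the pattern, 0 otherwise."""
    return [int(p <= example) for p in patterns]

feature_matrix = [to_feature_vector(e, patterns) for e in examples]
# Each row is a binary vector like the table above and can be fed
# to any classifier (DT, NN, SVM, LR, ...).
for row in feature_matrix:
    print(row)
```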

Example
  • 192 examples
    • At 12% support (at least 12% of the examples contain the pattern), itemset mining returns 8,600 patterns
      • 192 examples vs. 8,600 patterns?
    • At 4% support, 92,000 patterns
      • 192 examples vs. 92,000 patterns??
    • Most of these patterns have no predictive power and cannot be used to construct features.
  • Our algorithm
    • Finds only 20 highly predictive patterns
    • Can construct a decision tree with about 90% accuracy
Data in a “Bad” Feature Space
  • Discriminative patterns are a non-linear combination of single features
  • They increase the expressive and discriminative power of the feature space
  • An example:

[Figure: an XOR-like toy dataset plotted in the (x, y) plane with 0/1 coordinates; the data is non-linearly separable in (x, y).]

New Feature Space
  • Map the data to a different space to solve the problem

[Figure: mine & transform. A new feature F is constructed from the frequent itemset {x=0, y=0} (equivalently the association rule x=0 => y=0); appending F maps the data from (x, y) into (x, y, F).]

  • The data is linearly separable in (x, y, F). A minimal sketch of this mine-and-transform step follows.
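A minimal sketch of the idea with a hypothetical XOR-style dataset (the specific points and the separating weights are assumptions for illustration): the pattern-derived feature F = [x=0 and y=0] makes the classes linearly separable.

```python
# Hypothetical XOR-style data, not linearly separable in (x, y)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def add_pattern_feature(x, y):
    """Append the pattern-derived feature F for the itemset {x=0, y=0}."""
    F = int(x == 0 and y == 0)
    return (x, y, F)

transformed = [(add_pattern_feature(*xy), label) for xy, label in data]

# In (x, y, F) a single hyperplane separates the classes:
# x + y + 2*F - 1.5 is positive exactly for the label-0 points.
for (x, y, F), label in transformed:
    print((x, y, F), label, x + y + 2 * F - 1.5 > 0)
```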
Computational Issues
  • Frequent patterns are measured by their "frequency" or support.
    • E.g., frequent subgraphs with sup ≥ 10%, i.e., at least 10% of the examples contain these patterns
  • "Ordered" enumeration: cannot enumerate patterns at "sup = 10%" without first enumerating all patterns with support > 10%.
  • NP-hard problem: easily up to 10^10 patterns for a realistic problem.
  • Most patterns are non-discriminative.
  • Low-support patterns can have high "discriminative power". Bad!
  • Random sampling does not work, since it is not exhaustive.
    • Most patterns are useless, so randomly sampling patterns (or blindly enumerating them without considering frequency) is useless.
    • Small number of examples:
      • If only a subset of the vocabulary is used, the search is incomplete.
      • If the complete vocabulary is used, it does not help much but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns.
Conventional Procedure: Two-Step Batch Method

[Figure: dataset -> mine -> frequent patterns (1..7) -> select -> mined discriminative patterns (e.g., 1, 2, 4) -> represent -> binary feature table (F1, F2, F4 over Data1..Data4) -> any classifier you can name (DT, NN, SVM, LR, ...).]

Feature construction and selection:
  • Mine frequent patterns (support > sup);
  • Select the most discriminative patterns;
  • Represent the data in the feature space defined by these patterns;
  • Build classification models.

A minimal sketch of this two-step batch pipeline follows.
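The sketch below is a simplification under stated assumptions, not the authors' implementation: it reuses the hypothetical mine_frequent helper from the first sketch, scores each mined pattern with a hand-rolled information gain, and keeps the top k patterns as features.

```python
import math

def entropy(labels):
    """Binary entropy of a list of 0/1 labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(0), labels.count(1)) if c)

def info_gain(column, labels):
    """Information gain of one binary pattern-feature w.r.t. the labels."""
    total = entropy(labels)
    cond = 0.0
    for v in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == v]
        if subset:
            cond += len(subset) / len(labels) * entropy(subset)
    return total - cond

def two_step_batch(transactions, labels, min_support, k):
    """Step 1: mine all frequent patterns (> min_support).
    Step 2: select the top-k by information gain on the COMPLETE dataset,
    then represent the data as binary feature vectors."""
    patterns = list(mine_frequent(transactions, min_support))  # from the first sketch
    matrix = [[int(p <= t) for p in patterns] for t in transactions]
    ranked = sorted(range(len(patterns)),
                    key=lambda j: info_gain([row[j] for row in matrix], labels),
                    reverse=True)[:k]
    selected = [patterns[j] for j in ranked]
    features = [[row[j] for j in ranked] for row in matrix]
    return selected, features  # `features` can feed any classifier
```

The two problems called out on the next slides map directly onto these two steps: the mining step explodes combinatorially, and the selection step evaluates information gain only against the complete dataset.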

Two Problems

[Figure: dataset -> mine -> frequent patterns (1..7).]

  • Mine step: combinatorial explosion
    1. Exponential explosion of the pattern space
    2. Patterns are not considered at all if min-support is not small enough

Two Problems (cont.)

[Figure: frequent patterns -> select -> mined discriminative patterns (1, 2, 4).]

  • Select step: issue of discriminative power
    3. Information gain is evaluated against the complete dataset, NOT on subsets of examples
    4. Correlation among patterns is not directly evaluated on their joint predictability

Direct Mining & Selection via Model-based Search Tree
  • Basic Flow (divide-and-conquer based frequent pattern mining)

[Figure: the model-based search tree. At the root node, mine frequent patterns on the full dataset with local support P = 20% and select the most discriminative feature F based on information gain; split the data on F (Y/N branches) and recurse on each child (nodes 1-7), again mining and selecting at P = 20% of that node's data, until a node has too few examples ("Few Data") and becomes a leaf (+). The tree is simultaneously a feature miner and a classifier, and the node features form a compact set of highly discriminative patterns (1, 2, 3, ...).]

  • Because the local support is taken on ever-smaller subsets, the effective global support of a mined pattern can be extremely low, e.g., 10 * 20% / 10000 = 0.02%.
  • Output: the mined discriminative patterns. A minimal sketch of this flow is given below.
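A minimal sketch of the basic flow, reusing the hypothetical mine_frequent and info_gain helpers from the earlier sketches; the local support value, stopping rule, and tie handling here are simplifying assumptions rather than a faithful reimplementation of the authors' MbT algorithm.

```python
def mbt(transactions, labels, local_support=0.2, min_node_size=5, mined=None):
    """Model-based search tree sketch: at each node, mine frequent patterns
    on THIS node's data only, keep the single pattern with the highest
    information gain, split the node on it, and recurse.  Returns the
    compact list of selected discriminative patterns."""
    if mined is None:
        mined = []
    # Stop when the node is too small or already pure ("Few Data" leaves).
    if len(transactions) < min_node_size or len(set(labels)) < 2:
        return mined
    patterns = list(mine_frequent(transactions, local_support))  # local support!
    if not patterns:
        return mined
    gains = [info_gain([int(p <= t) for t in transactions], labels) for p in patterns]
    best_idx = max(range(len(patterns)), key=gains.__getitem__)
    if gains[best_idx] <= 0:
        return mined  # no pattern separates this node's classes
    best = patterns[best_idx]
    mined.append(best)
    # Split on the selected pattern (Y/N branches) and recurse on each child.
    yes = [(t, y) for t, y in zip(transactions, labels) if best <= t]
    no = [(t, y) for t, y in zip(transactions, labels) if not best <= t]
    for branch in (yes, no):
        if branch:
            ts, ys = zip(*branch)
            mbt(list(ts), list(ys), local_support, min_node_size, mined)
    return mined
```

Because each node re-mines with a 20% local support on an ever-smaller subset of examples, a pattern kept deep in the tree may have a tiny global support (e.g., 10 * 20% / 10000 = 0.02%), which is exactly what the divide-and-conquer design is after.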

Analyses (I)
  • Scalability (Theorem 1)
    • Upper bound
    • "Scale-down" ratio to obtain extremely low-support patterns
  • Bound on the number of returned features (Theorem 2)
Analyses (II)
  • Subspaces are important for discriminative patterns
    • On the original (complete) set, a pattern α may have no information gain (the condition is reconstructed below), where
      • C1 and C0: number of examples belonging to class 1 and class 0
      • P1: number of examples in C1 that contain the pattern α
      • P0: number of examples in C0 that contain the same pattern α
    • Subsets of the examples could still have information gain
  • Non-overfitting
  • Optimality under exhaustive search
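The slide shows the zero-information-gain condition as an image; a plausible reconstruction from the definitions above (my assumption, not copied from the slide) is that a pattern α carries no information gain on the complete dataset exactly when it occurs at the same rate in both classes:

```latex
% Reconstruction of the zero-information-gain condition (assumption)
IG(\alpha) = 0
\quad\Longleftrightarrow\quad
\frac{P_1}{C_1} = \frac{P_0}{C_0}
```

Such a pattern can still have positive information gain on a subset of the examples where the two rates differ, which is why mining inside subspaces matters.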
Experimental Studies: Itemset Mining (I)
  • Scalability comparison

[Figure: scalability comparison charts (not recoverable from this transcript); the slide repeats the model-based search tree diagram shown earlier.]
Experimental Studies: Itemset Mining (II)
  • Accuracy of mined itemsets

[Figure: accuracy comparison annotated "4 wins, 1 loss", with a much smaller number of patterns returned.]
Experimental Studies: Itemset Mining (III)
  • Convergence
Experimental Studies: Graph Mining (I)
  • 9 NCI anti-cancer screen datasets
    • The PubChem Project. URL: pubchem.ncbi.nlm.nih.gov.
    • Active (positive) class: around 1% - 8.3%
  • 2 AIDS anti-viral screen datasets
    • URL: http://dtp.nci.nih.gov.
    • H1: CM+CA – 3.5%
    • H2: CA – 1%
Experimental Studies: Graph Mining (II)
  • Scalability

[Figure: scalability results (not recoverable from this transcript); the slide repeats the model-based search tree diagram shown earlier.]
Experimental Studies: Graph Mining (III)
  • AUC and accuracy

[Figure: AUC results annotated "11 wins"; accuracy results annotated "10 wins, 1 loss".]
Experimental Studies: Graph Mining (IV)
  • AUC of MbT and DT-MbT vs. benchmarks

[Figure: results annotated "7 wins, 4 losses".]
Summary
  • Model-based search tree
    • Integrated feature mining and construction
    • Dynamic support
    • Can mine patterns with extremely small support
    • Is both a feature constructor and a classifier
    • Not limited to one type of frequent pattern: plug-and-play
  • Experimental results
    • Itemset mining
    • Graph mining
  • Software and datasets available from:
    • www.cs.columbia.edu/~wfan