
Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree


Presentation Transcript


  1. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree How to find good features from semi-structured raw data for classification Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure

  2. Feature Construction • Most data mining and machine learning models assume the following structured data: • (x1, x2, ..., xk) -> y • where the xi's are independent variables • y is the dependent variable • y drawn from a discrete set: classification • y drawn from a continuous range: regression • When the feature vectors are good, differences in accuracy among learners are small. • Question: where do good features come from?
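As a minimal sketch of the structured-data assumption above (scikit-learn is only an illustrative choice of learner, not something the slides prescribe):

```python
# Minimal sketch: once data is in (x1, ..., xk) -> y form,
# any off-the-shelf learner applies directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[5.1, 3.5, 1.4, 0.2],   # feature vectors (x1, ..., xk)
              [7.0, 3.2, 4.7, 1.4],
              [6.3, 3.3, 6.0, 2.5]])
y = np.array([0, 1, 2])               # y from a discrete set -> classification

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[5.0, 3.4, 1.5, 0.2]]))
```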

  3. Frequent Pattern-Based Feature Extraction • Data not in pre-defined feature vectors • Transactions • Biological sequences • Graph databases • Frequent patterns are good candidates for discriminative features. So, how do we mine them?

  4. A Discovered Pattern • [Figure: five compounds (NSC 4960, NSC 699181, NSC 40773, NSC 164863, NSC 191370) sharing a frequent sub-graph pattern (FP); example borrowed from a presentation by George Karypis]

  5. Frequent Pattern Feature Vector Representation • Each mined pattern becomes a binary feature, and any classifier you can name (NN, DT, SVM, LR, ...) can then be trained on the resulting vectors:
        P1  P2  P3
  Data1  1   1   0
  Data2  1   0   1
  Data3  1   1   0
  Data4  0   0   1
  ...
  • Mining these predictive features is an NP-hard problem: 100 examples can yield up to 10^10 patterns, and most of them are useless.
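A small sketch of this representation, assuming itemset patterns over set-valued transactions (the transaction and pattern contents below are made up for illustration); it reproduces the P1/P2/P3 table above:

```python
# Sketch (illustrative data): represent each example by binary
# indicators of whether it contains each mined pattern P1, P2, P3.
transactions = [
    {"a", "b", "c"},   # Data1
    {"a", "c", "d"},   # Data2
    {"a", "b", "e"},   # Data3
    {"c", "d", "e"},   # Data4
]
patterns = [{"a"}, {"a", "b"}, {"c", "d"}]   # P1, P2, P3 (itemset patterns)

feature_vectors = [
    [1 if p.issubset(t) else 0 for p in patterns]
    for t in transactions
]
for row in feature_vectors:
    print(row)   # e.g. Data1 -> [1, 1, 0]
```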

  6. Example • 192 examples • At 12% support (at least 12% of examples contain the pattern), frequent itemset mining returns 8,600 patterns • 192 examples vs. 8,600 patterns? • At 4% support, 92,000 patterns • 192 vs. 92,000?? • Most patterns have no predictive power and cannot be used to construct features. • Our algorithm • Finds only 20 highly predictive patterns • Can construct a decision tree with about 90% accuracy

  7. Data in a "Bad" Feature Space • Discriminative patterns are a non-linear combination of single feature(s) • They increase the expressive and discriminative power of the feature space • An example: [Figure: XOR-style points in the (x, y) plane] • The data is not linearly separable in (x, y)

  8. New Feature Space: Map Data to a Different Space • Solving the problem: • ItemSet F: {x=0, y=0} • Association rule F: x=0 => y=0 • Mine & transform: [Figure: the same points mapped into (x, y, F)] • The data is linearly separable in (x, y, F)
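A sketch of the mapping just described, assuming an XOR-style labeling; the weight vector is one hand-picked separator, shown only to demonstrate that the data becomes linearly separable once the pattern feature F = [x=0 AND y=0] is added:

```python
# Sketch: XOR-like data is not linearly separable in (x, y) alone,
# but adding the mined itemset feature F = [x=0 AND y=0] fixes it.
import numpy as np

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
label = np.array([0, 0, 1, 1])                 # XOR-style classes

F = ((X[:, 0] == 0) & (X[:, 1] == 0)).astype(int)
X_new = np.column_stack([X, F])                # map data into (x, y, F)

# One linear separator in the new space (hand-picked for illustration):
w, b = np.array([-1.5, -1.5, -3.0]), 2.0
print(np.sign(X_new @ w + b))                  # [-1 -1 1 1] matches the classes
```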

  9. Computational Issues • A pattern is measured by its "frequency" or support • E.g. frequent subgraphs with sup >= 10%: at least 10% of examples contain the pattern • "Ordered" enumeration: cannot enumerate patterns with sup = 10% without first enumerating all patterns with sup > 10% • NP-hard problem: easily up to 10^10 patterns for a realistic problem • Most patterns are non-discriminative • Low-support patterns can have high discriminative power. Bad! • Random sampling does not work since it is not exhaustive • Most patterns are useless, so randomly sampling patterns (or blindly enumerating without considering frequency) is useless • Small number of examples • If only a subset of the pattern vocabulary is mined, the search is incomplete • If the complete vocabulary is mined, it won't help much but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns
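To make the support-based enumeration concrete, here is a brute-force level-wise support-counting sketch on toy data (illustrative only, not the paper's miner); the set of frequent patterns grows combinatorially as min_sup is lowered, which is why low-support mining is so expensive:

```python
# Sketch: level-wise enumeration of frequent itemsets by support.
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "b"}, {"b", "c", "d"}]
items = sorted(set().union(*transactions))
min_sup = 0.5   # fraction of transactions that must contain the pattern

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

frequent = []
for k in range(1, len(items) + 1):
    level = [frozenset(c) for c in combinations(items, k)
             if support(frozenset(c)) >= min_sup]
    if not level:       # no frequent pattern of size k -> none of size k+1 either
        break
    frequent.extend(level)

print(frequent)   # the list explodes as min_sup drops toward low support
```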

  10. Conventional Procedure: Two-Step Batch Method (Feature Construction and Selection) • 1. Mine frequent patterns (support > sup) from the dataset (e.g. patterns 1–7) • 2. Select the most discriminative patterns (e.g. patterns 1, 2, 4) • 3. Represent the data in the feature space using the selected patterns (binary columns F1, F2, F4 per example) • 4. Build classification models: NN, DT, SVM, LR, any classifier you can name
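A hedged sketch of this two-step batch procedure (helper names such as mine_frequent are placeholders, not the authors' code; scikit-learn stands in for "any classifier you can name"):

```python
import numpy as np
from math import log2
from sklearn.tree import DecisionTreeClassifier

def entropy(y):
    p = np.bincount(y) / len(y)
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def info_gain(has_pattern, y):
    gain = entropy(y)
    for v in (0, 1):
        mask = has_pattern == v
        if mask.any():
            gain -= mask.mean() * entropy(y[mask])
    return gain

def two_step_batch(transactions, y, mine_frequent, top_k=20):
    y = np.asarray(y)
    # Step 1: mine all patterns above a global support threshold.
    patterns = mine_frequent(transactions)
    # Step 2: score each pattern by InfoGain on the FULL dataset, keep top_k.
    X = np.array([[int(p <= t) for p in patterns] for t in transactions])
    scores = [info_gain(X[:, j], y) for j in range(X.shape[1])]
    top = np.argsort(scores)[::-1][:top_k]
    # Represent the data with the selected patterns and build the classifier.
    model = DecisionTreeClassifier().fit(X[:, top], y)
    return [patterns[j] for j in top], model
```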

  11. Two Problems (I): the Mine Step (dataset -> mine -> frequent patterns 1–7) • 1. Exponential / combinatorial explosion of candidate patterns • 2. Patterns are never considered at all if min support isn't small enough

  12. Two Problems (II): the Select Step (frequent patterns 1–7 -> select -> mined discriminative patterns 1, 2, 4) • Issue of discriminative power • 3. InfoGain is evaluated against the complete dataset, NOT on subsets of examples • 4. Correlation among patterns is not directly evaluated on their joint predictability

  13. dataset Mine & SelectP: 20% Most discriminative F based on IG 1 N Y Mine & Select P:20% Mine & SelectP: 20% 5 2 N N Y Y Mine & Select P:20% Mine & Select P:20% 3 6 7 4 N N Y … Y N Y + + Few Data … Direct Mining & Selection via Model-based Search Tree Feature Miner Classifier Compact set of highly discriminative patterns 1 2 3 4 5 6 7 . . . • Basic Flow Global Support: 10*20%/10000=0.02% Divide-and-Conquer Based Frequent Pattern Mining Mined Discriminative Patterns

  14. Analyses (I) • Scalability (Theorem 1) • Upper bound • "Scale down" ratio to obtain extremely low-support patterns • Bound on the number of returned features (Theorem 2)

  15. Analyses (II) • Subspace is important for discriminative patterns • On the original (full) set, a pattern α carries no information gain if it is spread across the classes in proportion to their sizes, i.e. P1/C1 = P0/C0, where • C1 and C0: number of examples belonging to class 1 and class 0 • P1: number of examples in C1 that contain a pattern α • P0: number of examples in C0 that contain the same pattern α • Subsets of the examples could still have information gain for α • Non-overfitting • Optimality under exhaustive search
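For reference, the standard information-gain computation the slide relies on can be written out as follows (my reconstruction of the omitted formulas, using the C1, C0, P1, P0 notation above):

```latex
% IG of the binary "contains pattern \alpha" feature against the class label c.
% N = C_0 + C_1 is the total number of examples.
\begin{align*}
IG(c \mid \alpha) &= H(c) - H(c \mid \alpha), \qquad
H(c) = -\sum_{i \in \{0,1\}} \tfrac{C_i}{N} \log \tfrac{C_i}{N},\\
H(c \mid \alpha) &= \tfrac{P_0 + P_1}{N}\, H(c \mid \alpha\ \text{present})
                 + \tfrac{N - P_0 - P_1}{N}\, H(c \mid \alpha\ \text{absent}),\\
IG(c \mid \alpha) &= 0 \iff \tfrac{P_1}{C_1} = \tfrac{P_0}{C_0}
  \quad \text{(on the full set); a subset of examples can still give } IG > 0.
\end{align*}
```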

  16. Experimental Studies: Itemset Mining (I) • Scalability Comparison • [Figure: the model-based search tree flow from slide 13, shown alongside the scalability results]

  17. Experimental Studies: Itemset Mining (II) • Accuracy of Mined Itemsets: 4 wins, 1 loss, with a much smaller number of patterns

  18. Experimental Studies: Itemset Mining (III) • Convergence

  19. Experimental Studies: Graph Mining (I) • 9 NCI anti-cancer screen datasets • The PubChem Project. URL: pubchem.ncbi.nlm.nih.gov • Active (positive) class: around 1% - 8.3% • 2 AIDS anti-viral screen datasets • URL: http://dtp.nci.nih.gov • H1: CM+CA - 3.5% • H2: CA - 1%

  20. Experimental Studies: Graph Mining (II) • Scalability • [Figure: the model-based search tree flow from slide 13, shown alongside the scalability results]

  21. Experimental Studies: Graph Mining (III) • AUC and Accuracy • AUC: 11 wins • Accuracy: 10 wins, 1 loss

  22. Experimental Studies: Graph Mining (IV) • AUC of MbT and DT • MbT vs. benchmarks: 7 wins, 4 losses

  23. Summary • Model-based Search Tree • Integrated feature mining and construction • Dynamic support: can mine patterns with extremely small support • Both a feature constructor and a classifier • Not limited to one type of frequent pattern: plug-and-play • Experimental Results • Itemset mining • Graph mining • Software and datasets available from: • www.cs.columbia.edu/~wfan
