
A fast ensemble pruning algorithm based on pattern mining process



Presentation Transcript


  1. A fast ensemble pruning algorithm based on pattern mining process 17 July 2009 Springer Science+Business Media, LLC 2009 69821514 洪佳瑜 69821516 蔣克欽

  2. Outline • Motive • Introduction • Method • Experiment • Conclusion

  3. Motive • Most ensemble pruning methods in the literature need much pruning time and are mainly applied in domains where time can be sacrificed to improve accuracy. This makes them unsuitable for applications that require a fast learning process, such as on-line network intrusion detection.

  4. Introduction • Pattern mining based ensemble pruning (PMEP) • The algorithm converts an ensemble pruning problem into a special pattern mining problem, which enables an FP-Tree to store the prediction results of all base classifiers; a new pattern mining method then selects the base classifiers. • The final output of our PMEP approach is the pruned ensemble with the highest correct value.
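A minimal sketch (not the paper's implementation; the function and variable names are assumptions) of the first step this slide describes: turning the base classifiers' prediction results into a transaction database, where each training example becomes a transaction listing the classifiers that predict it correctly.

```python
# Hypothetical helper: build the transaction database PMEP mines.
# predictions[i][j] is classifier j's prediction on example i.
def build_transactions(predictions, labels):
    transactions = []
    for i, y in enumerate(labels):
        # The transaction for example i holds the indices of all base
        # classifiers whose prediction matches the true label.
        transactions.append({j for j, p in enumerate(predictions[i]) if p == y})
    return transactions
```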

  5. Properties of PMEP (1/2) • Firstly, it uses a transaction database to represent the prediction results of all base classifiers. This representation enables an FP-Tree to compact the results, and the ensemble pruning process becomes a pattern mining problem. • Secondly, PMEP uses the majority voting principle to decide the candidate classifiers before the pattern mining process. For a given k, PMEP only considers the paths of length ⌊k/2⌋ + 1 in the FP-Tree (see the sketch below).
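Why ⌊k/2⌋ + 1: under majority voting, an example is correctly classified by a pruned ensemble of size k only if at least ⌊k/2⌋ + 1 of the k selected classifiers get it right, so shorter patterns cannot contribute a correct vote. A one-line sketch of this cutoff:

```python
# Majority-voting cutoff: the only FP-Tree path length PMEP must mine.
def required_pattern_length(k):
    return k // 2 + 1  # floor(k/2) + 1

assert required_pattern_length(5) == 3  # matches the k = 5 example on slide 10
```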

  6. Properties of PMEP (2/2) • Thirdly, the pattern mining method greedily selects a set of classifiers, instead of a single classifier, in each iteration, which further saves pruning time.

  7. Method (1/7)

  8. Method (2/7) For any i (1 ≤ i ≤ n), if we have Li = L or Li = 0, we delete the corresponding rows from table T to reduce computational cost.
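A hedged sketch of this row-deletion step, assuming Li is the number of base classifiers that predict example i correctly and L is the total number of classifiers: examples that every classifier gets right (Li = L) or that no classifier gets right (Li = 0) cannot discriminate between classifier subsets, so their rows are dropped before mining.

```python
# Hypothetical filter: drop uninformative rows from the transaction table.
def prune_trivial_rows(transactions, n_classifiers):
    # Keep only examples where some, but not all, classifiers are correct.
    return [t for t in transactions if 0 < len(t) < n_classifiers]
```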

  9. Method FP-Tree (3/7)
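A minimal FP-Tree construction sketch to accompany this slide (an assumed structure in the spirit of FP-Growth, not the paper's code): classifier indices are the items, each transaction is inserted in descending global frequency, and node counts record how many examples share each prefix path.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions):
    freq = Counter(item for t in transactions for item in t)
    root = FPNode(None, None)
    for t in transactions:
        node = root
        # Insert items in descending global frequency, as in FP-Growth.
        for item in sorted(t, key=lambda i: (-freq[i], i)):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root
```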

  10. Method (4/7) • Suppose k = 5; we have:
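A sketch of how the Path-Table shown here could be enumerated (names and structure are assumptions, continuing the FPNode sketch above): collect every root-to-node path of length ⌊k/2⌋ + 1 together with its count, i.e., the number of examples that path covers.

```python
def build_path_table(root, k):
    target, table = k // 2 + 1, []
    def walk(node, prefix):
        path = prefix + [node.item]
        if len(path) == target:
            table.append((set(path), node.count))
            return  # longer paths are not needed under majority voting
        for child in node.children.values():
            walk(child, path)
    for child in root.children.values():
        walk(child, [])
    return table
```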

  11. Method (5/7) • The first entry has the largest count value, and its classifier set is {h2, h5, h6}. We add these three classifiers into S.set and set S.correct = 3. • S.set = {h2, h5, h6}, S.correct = 3. • Then we delete the selected entry from the Path-Table.

  12. Method (6/7) • The first row has the maximum count value, so the base classifier h7 is selected. • S.set = {h2, h5, h6, h7}, S.correct = 7.

  13. Method (7/7) • The classifier sets {h1} and {h4} have the same count value. Considering that the path of h1 is constructed earlier in the Path-Table than that of h4, we add h1 into S.set. • S.set = {h2, h5, h6, h7, h1}, and S.correct = 8.
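Pulling slides 11–13 together, a hedged sketch of the greedy selection loop (S.set and S.correct follow the slides' notation; everything else is an assumption): repeatedly take the Path-Table entry with the highest count, merge its classifiers into S.set, accumulate its count into S.correct, and stop once k classifiers are chosen. Python's max returns the earliest maximal entry, which reproduces the tie-break in favor of h1 above.

```python
def pmep_select(path_table, k):
    # path_table: list of (classifier_set, count) pairs, in construction order.
    selected, correct = set(), 0
    while len(selected) < k and path_table:
        entry = max(path_table, key=lambda e: e[1])  # highest count, earliest first
        path_table.remove(entry)
        classifiers, count = entry
        new = classifiers - selected
        if new and len(selected) + len(new) <= k:
            selected |= new
            correct += count
    return selected, correct

# Usage with the counts from slides 11-13; the final result matches
# slide 13 (S.correct = 8), though the real algorithm re-mines the
# FP-Tree between picks rather than working from one fixed table.
table = [({"h2", "h5", "h6"}, 3), ({"h7"}, 4), ({"h1"}, 1), ({"h4"}, 1)]
print(pmep_select(table, 5))  # -> ({'h1','h2','h5','h6','h7'}, 8)
```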

  14. Advantages • Classifiers with a negative effect on the ensemble have a low probability of being selected, because of their low count values. • The selected classifiers come from multiple paths, which gives them low error correlation.

  15. Experiment We compared the performance of our approach, PMEP, against Bagging (Breiman 1996), GASEN (Zhou et al. 2002), and Forward Selection (FS) (Caruana et al. 2004) in our empirical study. Test platform: AMD 4000+ CPU, 2 GB RAM, C programming language, Linux operating system.

  16. All the tests are performed on 15 data sets from the UCI machine learning repository.

  17. Results of prediction accuracy

  18. Sizes of pruned ensembles for each data set; the last row is the average over all 15 data sets. Avg: 20, 7.43, 3.77, 5.70

  19. Results of pruning time (s)

  20. Conclusion • The experimental results show that the proposed PMEP achieves the highest prediction accuracy and costs much less pruning time than GASEN and Forward Selection. • The design of our PMEP algorithm is aimed at the majority voting method; how to extend the algorithm to other combination strategies is another of our future works.

  21. THANK YOU

  22. Algorithm
