
A fast ensemble pruning algorithm based on pattern mining process



Presentation Transcript


  1. A fast ensemble pruning algorithm based on pattern mining process 17 July 2009 Springer Science+Business Media, LLC 2009 69821514 洪佳瑜 69821516 蔣克欽

  2. Outline • Motive • Introduction • Method • Experiment • Conclusion

  3. Motive • Most ensemble pruning methods in the literature need much pruning time and are mainly applied in domains where time can be sacrificed to improve accuracy. This makes them unsuitable for applications that require a fast learning process, such as on-line network intrusion detection.

  4. Introduction • Pattern mining based ensemble pruning (PMEP) • The algorithm converts an ensemble pruning problem into a special pattern mining problem, which enables an FP-Tree to store the prediction results of all base classifiers; a new pattern mining method then selects the base classifiers. • The final output of our PMEP approach is the pruned ensemble with the highest correct value.
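A minimal sketch (not the paper's implementation; the function and variable names are assumptions) of the first step this slide describes: turning the base classifiers' prediction results into a transaction database, where each training example becomes a transaction listing the classifiers that predict it correctly.

```python
# Hypothetical helper: build the transaction database PMEP mines.
# predictions[i][j] is classifier j's prediction on example i.
def build_transactions(predictions, labels):
    transactions = []
    for i, y in enumerate(labels):
        # The transaction for example i holds the indices of all base
        # classifiers whose prediction matches the true label.
        transactions.append({j for j, p in enumerate(predictions[i]) if p == y})
    return transactions
```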

  5. Properties of PMEP (1/2) • Firstly, it uses a transaction database to represent the prediction results of all base classifiers. This representation enables an FP-Tree to compact the results, and the ensemble pruning process becomes a pattern mining problem. • Secondly, PMEP uses the majority voting principle to decide the candidate classifiers before the pattern mining process. For a given k, PMEP only considers the paths of length ⌊k/2⌋ + 1 in the FP-Tree (see the sketch below).
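Why ⌊k/2⌋ + 1: under majority voting, an example is correctly classified by a pruned ensemble of size k only if at least ⌊k/2⌋ + 1 of the k selected classifiers get it right, so shorter patterns cannot contribute a correct vote. A one-line sketch of this cutoff:

```python
# Majority-voting cutoff: the only FP-Tree path length PMEP must mine.
def required_pattern_length(k):
    return k // 2 + 1  # floor(k/2) + 1

assert required_pattern_length(5) == 3  # matches the k = 5 example on slide 10
```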

  6. Properties of PMEP (2/2) • Thirdly, the pattern mining method greedily selects a set of classifiers, instead of a single classifier, in each iteration, which further saves pruning time.

  7. Method (1/7)

  8. Method (2/7) For any i (1 ≤ i ≤ n), if we have Li = L or Li = 0, we delete the corresponding rows from table T to reduce computational cost.
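A hedged sketch of this row-deletion step, assuming Li is the number of base classifiers that predict example i correctly and L is the total number of classifiers: examples that every classifier gets right (Li = L) or that no classifier gets right (Li = 0) cannot discriminate between classifier subsets, so their rows are dropped before mining.

```python
# Hypothetical filter: drop uninformative rows from the transaction table.
def prune_trivial_rows(transactions, n_classifiers):
    # Keep only examples where some, but not all, classifiers are correct.
    return [t for t in transactions if 0 < len(t) < n_classifiers]
```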

  9. Method FP-Tree (3/7)
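A minimal FP-Tree construction sketch to accompany this slide (an assumed structure in the spirit of FP-Growth, not the paper's code): classifier indices are the items, each transaction is inserted in descending global frequency, and node counts record how many examples share each prefix path.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions):
    freq = Counter(item for t in transactions for item in t)
    root = FPNode(None, None)
    for t in transactions:
        node = root
        # Insert items in descending global frequency, as in FP-Growth.
        for item in sorted(t, key=lambda i: (-freq[i], i)):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root
```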

  10. Method (4/7) • Suppose k = 5; we have:
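A sketch of how the Path-Table shown here could be enumerated (names and structure are assumptions, continuing the FPNode sketch above): collect every root-to-node path of length ⌊k/2⌋ + 1 together with its count, i.e., the number of examples that path covers.

```python
def build_path_table(root, k):
    target, table = k // 2 + 1, []
    def walk(node, prefix):
        path = prefix + [node.item]
        if len(path) == target:
            table.append((set(path), node.count))
            return  # longer paths are not needed under majority voting
        for child in node.children.values():
            walk(child, path)
    for child in root.children.values():
        walk(child, [])
    return table
```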

  11. Method (5/7) • The first entry has the largest count value, and its classifier set is {h2, h5, h6}. We add these three classifiers into S.set and set S.correct = 3. • S.set = {h2, h5, h6}, S.correct = 3. • Then we delete the selected entry from the Path-Table.

  12. Method (6/7) • The first row has the maximum count value, so the base classifier h7 is selected. • S.set = {h2, h5, h6, h7}, S.correct = 7.

  13. Method (7/7) • The classifier sets {h1} and {h4} have the same count value. Considering that the path of h1 is constructed earlier in the Path-Table than that of h4, we add h1 into S.set. • S.set = {h2, h5, h6, h7, h1}, and S.correct = 8.
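Pulling slides 11–13 together, a hedged sketch of the greedy selection loop (S.set and S.correct follow the slides' notation; everything else is an assumption): repeatedly take the Path-Table entry with the highest count, merge its classifiers into S.set, accumulate its count into S.correct, and stop once k classifiers are chosen. Python's max returns the earliest maximal entry, which reproduces the tie-break in favor of h1 above.

```python
def pmep_select(path_table, k):
    # path_table: list of (classifier_set, count) pairs, in construction order.
    selected, correct = set(), 0
    while len(selected) < k and path_table:
        entry = max(path_table, key=lambda e: e[1])  # highest count, earliest first
        path_table.remove(entry)
        classifiers, count = entry
        new = classifiers - selected
        if new and len(selected) + len(new) <= k:
            selected |= new
            correct += count
    return selected, correct

# Usage with the counts from slides 11-13; the final result matches
# slide 13 (S.correct = 8), though the real algorithm re-mines the
# FP-Tree between picks rather than working from one fixed table.
table = [({"h2", "h5", "h6"}, 3), ({"h7"}, 4), ({"h1"}, 1), ({"h4"}, 1)]
print(pmep_select(table, 5))  # -> ({'h1','h2','h5','h6','h7'}, 8)
```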

  14. Advantages • Classifiers with a negative effect on the ensemble have a low probability of being selected, because of their low count values. • The selected classifiers come from multiple paths, which gives them low error correlation.

  15. Experiment We compared the performance of our approach, PMEP, against Bagging (Breiman 1996), GASEN (Zhou et al. 2002), and Forward Selection (FS) (Caruana et al. 2004) in our empirical study. Test platform: AMD 4000+ CPU, 2 GB RAM, C programming language, Linux operating system.

  16. All the tests are performed on 15 data sets from the UCI machine learning repository.

  17. Results of prediction accuracy

  18. Sizes of pruned ensembles for each data set; the last row is the average over all 15 data sets. Avg: 20, 7.43, 3.77, 5.70

  19. Results of pruning time (s)

  20. Conclusion • The experimental results show that the proposed PMEP achieves the highest prediction accuracy and costs much less pruning time than GASEN and Forward Selection. • The design of our PMEP algorithm is aimed at the majority voting method; how to extend the algorithm to other combination strategies is another of our future works.

  21. THANK YOU

  22. Algorithm
