
  1. Combining Ensemble Technique of Support Vector Machines with the Optimal Kernel Method for High Dimensional Data Classification I-Ling Chen1, Bor-Chen Kuo1, Chen-Hsuan Li2, Chih-Cheng Hung3 1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C. 2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C. 3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.

  2. Outline • Introduction • Statement of problems • The Objective • Literature Review • Support Vector Machines • Kernel method • Multiple Classifier System • Random subspace method, Dynamic subspace method • An Optimal Kernel Method for Selecting the RBF Kernel Parameter • Optimal Kernel-based Dynamic Subspace Method • Experimental Design and Results • Conclusion and Future Work

  3. INTRODUCTION

  4. Hughes Phenomenon (Hughes, 1968), also called the curse of dimensionality or the peaking phenomenon: with a small training sample size N, classification performance degrades once the dimensionality d becomes too high.

  5. Support Vector Machines (SVM) • Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998). • SVM is robust and effective against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006). • SVM combines two ingredients: • the kernel trick • support vector learning

  6. The Goal of the Kernel Method for Classification • Samples from the same class should be mapped into the same area. • Samples from different classes should be mapped into different areas.

  7. Support Vector Learning • SV learning tries to learn a linear separating hyperplane for a two-class classification problem from a given training set. [Figure: illustration of SV learning with the kernel trick (a nonlinear feature mapping), showing the optimal hyperplane, the margins, and the support vectors.]
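
As a concrete illustration of SV learning with the RBF kernel trick (not code from the paper), the sketch below trains a two-class RBF SVM on toy data with scikit-learn; the values of C and gamma are arbitrary assumptions.

```python
# Minimal sketch: SV learning with the RBF kernel trick on a toy two-class
# problem that is not linearly separable in the input space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)   # class = outside the unit circle

# In scikit-learn, gamma = 1 / (2 * sigma^2) for k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
clf = SVC(kernel="rbf", C=10.0, gamma=0.5)
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))
```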

  8. Multiple Classifier System • There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets (Ho, 1998; Yang, Kuo, Yu, & Chuang, 2010). • [Figure: approaches to building classifier ensembles, from Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.]

  9. THE FRAMEWORK OF RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (HO, 1998) Given the learning algorithm, SVM, and the ensemble size, S.
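
A rough sketch of this RSM framework with SVM base classifiers is given below; the subspace size w, the ensemble size S, and the SVM settings are illustrative assumptions rather than values from Ho (1998) or this paper.

```python
# Sketch of the random subspace method (RSM) with SVM members: each member is
# trained on w features chosen uniformly at random, and predictions are fused
# by majority voting (integer class labels assumed).
import numpy as np
from sklearn.svm import SVC

def rsm_svm_fit(X, y, S=10, w=20, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(S):
        feats = rng.choice(X.shape[1], size=w, replace=False)
        clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X[:, feats], y)
        ensemble.append((feats, clf))
    return ensemble

def rsm_svm_predict(ensemble, X):
    votes = np.stack([clf.predict(X[:, feats]) for feats, clf in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```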

  10. THE INADEQUACIES OF RSM Given the learning algorithm, SVM, and the ensemble size, S, RSM selects w features at random for every member. • Implicit number: how should a suitable subspace dimensionality w be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier. • Irregular rule: each individual feature potentially possesses a different discriminative power for classification, and a randomized feature-selection strategy is unable to distinguish informative features from redundant ones.

  11. DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010) [Figure: density (%) of the importance distribution over the 191 features for the ML, kNN, SVM, and BCC base classifiers.] • Two importance distributions: • the importance distribution of feature weight (W distribution), which models the selection probability of each feature and is built from the class separability of LDA for each feature or from the re-substitution accuracy for each feature; • the importance distribution of subspace dimensionality (R distribution), which automatically determines a suitable subspace size and is initialized as R0 and updated with kernel smoothing.

  12. THE FRAMEWORK OF DSM BASED ON SVM Given the learning algorithm, SVM, and the ensemble size, S.
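
Under the reading of the previous slide, one DSM-style training step might look like the sketch below: the subspace size is drawn from the R distribution and the features are drawn according to the W distribution. The distributions, ensemble size, and SVM settings are placeholders, not the authors' implementation.

```python
# Sketch of drawing a dynamic subspace and training an SVM member on it.
# W: length-d vector of feature-selection probabilities (sums to 1).
# R: length-d vector of probabilities over subspace sizes 1..d (sums to 1).
import numpy as np
from sklearn.svm import SVC

def dsm_draw_subspace(W, R, rng):
    d = len(W)
    size = rng.choice(np.arange(1, d + 1), p=R)            # subspace size from R
    return rng.choice(d, size=size, replace=False, p=W)    # features from W

def dsm_svm_fit(X, y, W, R, S=10, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(S):
        feats = dsm_draw_subspace(W, R, rng)
        clf = SVC(kernel="rbf", gamma="scale").fit(X[:, feats], y)
        ensemble.append((feats, clf))
    return ensemble
```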

  13. INADEQUACIES OF DSM Given the learning algorithm, SVM, and the ensemble size, S. • Time-consuming: choosing a proper kernel function or a good kernel parameter for the SVM is quite important yet ordinarily time-consuming; in particular, the updated R distribution in DSM is obtained from the re-substitution accuracy. • Kernel function: the SVM algorithm provides an effective way to perform supervised classification, but the choice of kernel function critically influences the performance of the SVM.

  14. An Optimal Kernel Method for Selecting the RBF Kernel Parameter • The performance of SVM depends on choosing a proper kernel function and proper parameters for it. • Li, Lin, Kuo, and Chu (2010) present a novel criterion to choose a proper parameter σ of the RBF kernel function automatically. Gaussian radial basis function (RBF) kernel: k(xi, xj) = exp(−‖xi − xj‖² / (2σ²)). • In the feature space determined by the RBF kernel, the norm of every sample is one and the kernel values are positive; hence, the samples are mapped onto the surface of a hypersphere.
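
The slide does not reproduce the criterion of Li, Lin, Kuo, and Chu (2010), so the sketch below substitutes a generic kernel-separability score purely for illustration: each candidate σ is scored by the gap between the mean within-class and the mean between-class RBF kernel values, and the maximizer is kept. This is an assumption, not necessarily the authors' criterion.

```python
# Hedged sketch of automatic sigma selection for the RBF kernel using a
# generic within-class vs. between-class kernel-value gap (illustrative only).
import numpy as np

def rbf_gram(X, sigma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def select_sigma(X, y, candidates):
    eye = np.eye(len(y), dtype=bool)
    same = (y[:, None] == y[None, :])
    within, between = same & ~eye, ~same
    best_sigma, best_score = None, -np.inf
    for sigma in candidates:
        K = rbf_gram(X, sigma)
        score = K[within].mean() - K[between].mean()   # separability gap
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma
```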

  15. Kernel-based Dynamic Subspace Method (KDSM)

  16. THE FRAMEWORK OF KDSM [Flowchart: the original dataset X is mapped by the optimal RBF kernel algorithm into a kernel space (L dimensions); a kernel-based W distribution is built from the separability of each feature (band), giving the kernel-based feature-selection distribution Mdist; subspace sizes are obtained by the optimal RBF kernel algorithm plus kernel smoothing; subspaces drawn from the pool (reduced datasets) train multiple classifiers, whose outputs are combined by decision fusion (majority voting); the loop repeats until the performance of classification is stable.]
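
One way to read the "until the performance of classification is stable" loop in the diagram is sketched below: ensemble members are added one at a time and the majority-vote accuracy on a held-out set is monitored until it stops changing. The callables draw_subspace and train_member, the tolerance, and the patience are hypothetical placeholders, not part of the paper.

```python
# Hedged sketch of a stability stopping rule for growing the ensemble.
import numpy as np

def grow_until_stable(X_tr, y_tr, X_val, y_val, draw_subspace, train_member,
                      tol=1e-3, patience=3, max_members=50):
    members, preds, prev_acc, still = [], [], 0.0, 0
    for _ in range(max_members):
        feats = draw_subspace()                       # e.g. dsm_draw_subspace above
        clf = train_member(X_tr[:, feats], y_tr)
        members.append((feats, clf))
        preds.append(clf.predict(X_val[:, feats]))
        votes = np.stack(preds)
        fused = np.array([np.bincount(col).argmax() for col in votes.T])
        acc = (fused == y_val).mean()
        still = still + 1 if abs(acc - prev_acc) < tol else 0   # count stable rounds
        prev_acc = acc
        if still >= patience:                         # performance considered stable
            break
    return members
```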

  17. Experiment Design OP: the optimal method for choosing the RBF kernel parameter; CV: 5-fold cross-validation. We use a grid search within the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper value of the RBF kernel parameter (2σ²), and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper value of the slack (regularization) parameter that controls the margins.
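
The grid search described above could look roughly like the following scikit-learn sketch; the number of grid points for 2σ² is an assumption (the slide only gives the range), and scikit-learn parameterizes the RBF kernel by gamma = 1/(2σ²).

```python
# Sketch of the 5-fold cross-validated grid search over (2*sigma^2, C).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

two_sigma_sq = np.linspace(0.01, 10, 20)          # assumed grid density over [0.01, 10]
param_grid = {
    "gamma": 1.0 / two_sigma_sq,                  # gamma = 1 / (2 * sigma^2)
    "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: the hyperspectral training split
# print(search.best_params_)
```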

  18. EXPERIMENTAL DATASET

  19. Experimental Results • There are three cases in the Washington, DC Mall dataset (case 1, case 2, and case 3), defined by the number of training samples used, where Ni denotes the number of training samples in class i and N denotes the number of all training samples.

  20. Experiment Results in Washington, DC Mall The classification outcomes obtained with the various multiple classifier systems:

  21. Classification Maps with Ni = 20 in Washington, DC Mall [Maps for SVM_OP, SVM_CV, DSM_WACC, DSM_WLDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

  22. Classification Maps (roof) with Ni = 40 [Maps for SVM_OP, SVM_CV, DSM_WACC, DSM_WLDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

  23. Classification Maps with Ni = 300 in Washington, DC Mall [Maps for SVM_OP, SVM_CV, DSM_WACC, DSM_WLDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

  24. Conclusions • In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the RBF kernel parameter and the dynamic subspace method within a subspace-selection-based MCS, in order to improve classification results on high dimensional datasets. • The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in every case of the Washington, DC Mall dataset. • Moreover, compared with DSM, KDSM not only obtains more accurate classification results but also saves computation time.

  25. Thank You
