
Sample-Separation-Margin Based Minimum Classification Error Training of Pattern Classifiers with Quadratic Discriminant


Presentation Transcript


  1. Sample-Separation-Margin Based Minimum Classification Error Training of Pattern Classifiers with Quadratic Discriminant Functions. Yongqiang Wang (1,2), Qiang Huo (1); (1) Microsoft Research Asia, Beijing, China; (2) The University of Hong Kong, Hong Kong, China (qianghuo@microsoft.com). ICASSP-2010, Dallas, Texas, U.S.A., March 14-19, 2010

  2. Outline • Background • What’s our new approach • How does it work • Conclusions

  3. Background of Minimum Classification Error (MCE) Formulation for Pattern Classification • Pioneered by Amari and Tsypkin in the late 1960s • S. Amari, “A theory of adaptive pattern classifiers,” IEEE Trans. on Electronic Computers, Vol. EC-16, No. 3, pp. 299-307, 1967. • Y. Z. Tsypkin, Adaptation and Learning in Automatic Systems, 1971. • Y. Z. Tsypkin, Foundations of the Theory of Learning Systems, 1973. • Proposed originally for supervised online adaptation of a pattern classifier • to minimize the expected risk (cost) • via a sequential probabilistic descent (PD) algorithm • Extended by Juang and Katagiri in the early 1990s • B.-H. Juang and S. Katagiri, “Discriminative learning for minimum error classification,” IEEE Trans. on Signal Processing, Vol. 40, No. 12, pp. 3043-3054, 1992.

  4. MCE Formulation by Juang and Katagiri (1) • Define a proper discriminant function of an observation for each pattern class • To enable a maximum discriminant decision rule for pattern classification • Largely an art and application dependent
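In symbols (standard MCE notation, assumed here rather than copied from the slide), the maximum discriminant decision rule assigns an observation to the class whose discriminant function is largest:

```latex
% Maximum discriminant decision rule: x is assigned to the class whose
% discriminant function g_j(x; \Lambda) is largest.
C(x) = \arg\max_{j} \; g_j(x; \Lambda)
```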

  5. MCE Formulation by Juang and Katagiri (2) • Define a misclassification measure for each observation • to embed the decision process in the overall MCE formulation • to characterize the degree of confidence (or margin) in making a decision for this observation • a differentiable function of the classifier parameters • A popular choice: d_k(x; Λ) = −g_k(x; Λ) + log [ (1/(M−1)) Σ_{j≠k} exp(η g_j(x; Λ)) ]^{1/η}, where g_j(x; Λ) is the discriminant function of class j, M is the number of classes, and η > 0 controls how heavily the most competing classes are weighted • Many possible ways => which one is better? => an open problem!

  6. MCE Formulation by Juang and Katagiri (3) • Define a loss (cost) function for each observation • a differentiable and monotonically increasing function of the misclassification measure • many possibilities => sigmoid function most popular for approximating MCE • MCE training via minimizing • empirical average loss (cost) by an appropriate optimization procedure, e.g., gradient descent (GD), Quickprop, Rprop, etc., or • expected loss (cost) by a sequential probabilistic descent (PD) algorithm (a.k.a. GPD)
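As a concrete example of this step (a minimal sketch using the standard choices from the MCE literature; the slope α and offset β are generic smoothing hyper-parameters, not values given on the slide), the sigmoid loss and the empirical average loss can be written as:

```latex
% Sigmoid loss for observation x with true class k: a smooth, monotonically
% increasing function of the misclassification measure d_k(x; \Lambda).
\ell_k(x;\Lambda) = \frac{1}{1 + e^{-\alpha\, d_k(x;\Lambda) + \beta}}, \qquad \alpha > 0

% Empirical average loss over training data \{(x_n, k_n)\}_{n=1}^{N},
% minimized by gradient descent, Quickprop, Rprop, etc.
L(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} \ell_{k_n}(x_n;\Lambda)
```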

  7. Some Remarks • Combinations of different choices for each of the previous three steps and optimization methods lead to various MCE training algorithms. • The power of MCE training has been demonstrated by many research groups for different pattern classifiers in different applications. • How to improve the generalization capability of an MCE-trained classifier?

  8. One Possible Solution: SSM-based MCE Training • Sample Separation Margin (SSM) • Defined as the smallest distance of an observation to the classification boundary formed by the true class and the most competing class (see the sketch after this slide) • There is a closed-form solution for piecewise linear classifiers • Define the misclassification measure as the negative SSM • Other parts of the formulation are the same as in “traditional” MCE • A happy result • Minimized empirical error rate, and • Improved generalization • Correctly recognized training samples have a large margin from the decision boundaries! • For more info: • T. He and Q. Huo, “A study of a new misclassification measure for minimum classification error training of prototype-based pattern classifiers,” in Proc. ICPR-2008.
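To make the definition concrete (a sketch in notation assumed for this write-up, not taken from the slide), the SSM of a sample x with true class k and most competing class r is its distance to the decision boundary between the two classes; for linear discriminants this reduces to the familiar point-to-hyperplane distance:

```latex
% SSM: distance from x to the decision boundary between the true class k
% and the most competing class r.
\mathrm{SSM}(x) = \min_{y} \; \|x - y\| \quad
  \text{s.t.} \quad g_k(y;\Lambda) = g_r(y;\Lambda)

% For linear discriminants g_i(y) = w_i^\top y + b_i the boundary is a
% hyperplane and the minimum has a closed form:
\mathrm{SSM}(x) = \frac{\bigl| (w_k - w_r)^\top x + (b_k - b_r) \bigr|}{\|w_k - w_r\|}
```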

  9. What’s New in This Study? • Extend SSM-based MCE training to pattern classifiers with quadratic discriminant functions (QDFs) • No closed-form solution to calculate the SSM • Demonstrate its effectiveness on a large-scale Chinese handwriting recognition task • Modified QDF (MQDF) is widely used in state-of-the-art Chinese handwriting recognition systems (one common form of MQDF is sketched below)
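For reference, a common form of the MQDF discriminant (following Kimura et al.; the exact parameterization used in the paper may differ in details such as sign conventions) is:

```latex
% MQDF discriminant for class k: \mu_k is the class mean, \lambda_{ki} and
% \phi_{ki} are the i-th leading eigenvalue/eigenvector of the class
% covariance, K is the number of retained eigenvectors, D the feature
% dimension, and \delta_k the constant replacing the minor eigenvalues.
g_k(x) = -\sum_{i=1}^{K} \frac{\left[\phi_{ki}^\top (x-\mu_k)\right]^2}{\lambda_{ki}}
         -\frac{1}{\delta_k}\Bigl( \|x-\mu_k\|^2
              - \sum_{i=1}^{K}\left[\phi_{ki}^\top (x-\mu_k)\right]^2 \Bigr)
         -\sum_{i=1}^{K}\log\lambda_{ki} - (D-K)\log\delta_k
```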

  10. Two Technical Issues • How to calculate the SSM efficiently? • Formulated as a nonlinear programming problem • Can be solved efficiently because it is a quadratically constrained quadratic programming (QCQP) problem with a very special structure: • A convex objective function with one quadratic equality constraint • How to calculate the derivative of the SSM? • Using a technique known as sensitivity analysis in nonlinear programming • Calculated by using the solution to the problem in Eq. (1) • Please refer to our paper for details
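For intuition only, here is a toy sketch of the SSM computation as a constrained minimization over the decision boundary. It uses a general-purpose SciPy solver and a generic quadratic discriminant; the paper instead exploits the special QCQP structure (convex objective, one quadratic equality constraint) and the MQDF parameterization, so everything below (function names, parameter values) is illustrative:

```python
import numpy as np
from scipy.optimize import minimize


def quadratic_discriminant(y, A, b, c):
    """Generic quadratic discriminant g(y) = -0.5 * y^T A y + b^T y + c."""
    return -0.5 * y @ A @ y + b @ y + c


def sample_separation_margin(x, params_true, params_comp):
    """Distance from x to the boundary {y : g_true(y) = g_comp(y)}."""
    def boundary_gap(y):
        return (quadratic_discriminant(y, *params_true)
                - quadratic_discriminant(y, *params_comp))

    result = minimize(
        fun=lambda y: np.sum((y - x) ** 2),              # squared distance to x
        x0=np.asarray(x, dtype=float),                   # start at the sample itself
        constraints=[{"type": "eq", "fun": boundary_gap}],
        method="SLSQP",
    )
    return np.sqrt(result.fun), result.x


if __name__ == "__main__":
    # Toy 2-D example with two classes of different curvature.
    p_true = (np.eye(2), np.array([1.0, 0.0]), 0.0)
    p_comp = (2.0 * np.eye(2), np.array([-1.0, 0.0]), 0.0)
    margin, nearest = sample_separation_margin(np.array([0.5, 0.2]), p_true, p_comp)
    print("SSM:", margin, "nearest boundary point:", nearest)
```

The derivative of the SSM with respect to the classifier parameters is not shown here; as the slide notes, it can be obtained from the solution of this problem via sensitivity analysis.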

  11. Experimental Setup • Vocabulary: • 6763 simplified Chinese characters • Dataset: • Training: 9,447,328 character samples • # of samples per class: 952 – 5,600 • Testing: 614,369 character samples • Feature extraction: • 512 “8-directional features” • Use LDA to reduce dimension to 128 • Use MQDF for each character class • # of retained eigenvectors: 5 and 10 • SSM-based MCE Training • Use maximum likelihood (ML) trained model as seed model • Update mean vectors only in MCE training • Optimize MCE objective function by batch-mode Quickprop (20 epochs) (Figure: distribution of writing styles in the testing data)
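Since the slide names batch-mode Quickprop as the optimizer, here is a minimal sketch of one Quickprop update step (Fahlman's rule with a growth-factor cap; the hyper-parameters eta and mu are illustrative, not the paper's settings):

```python
import numpy as np


def quickprop_step(w, grad, prev_grad, prev_step, eta=0.01, mu=1.75):
    """One batch Quickprop update on parameter vector w (e.g. stacked mean
    vectors); returns (new_w, step) so the caller can keep the history."""
    step = np.zeros_like(w)
    denom = prev_grad - grad
    usable = (prev_step != 0.0) & (np.abs(denom) > 1e-12)

    # Secant ("quadratic") step where the gradient history is usable.
    step[usable] = grad[usable] / denom[usable] * prev_step[usable]
    # Limit the growth of each step to mu times the previous step.
    cap = mu * np.abs(prev_step)
    step = np.clip(step, -cap, cap)
    # Plain gradient-descent step where there is no usable history.
    step[~usable] = -eta * grad[~usable]
    return w + step, step
```

In the experiments described here, only the mean vectors are updated, so w would hold the stacked class means and grad the gradient of the MCE objective with respect to them, applied for 20 epochs.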

  12. Experimental Results (1) • MQDF, K=5

  13. Experimental Results (2) • MQDF, K=10

  14. Experimental Results (3) • Histogram of SSMs on the training set • SSM-based MCE-trained classifier vs. conventional MCE-trained one • Training samples are pushed away from the decision boundaries • The bigger the SSM, the better the generalization

  15. Conclusion and Discussions • SSM-based MCE training offers an implicit way of minimizing the empirical error rate and maximizing the sample separation margin simultaneously • Verified for quadratic classifiers in this study • Verified for piecewise linear classifiers previously (He & Huo, ICPR-2008) • Ongoing and future work • SSM-based MCE training for discriminative feature extraction • SSM-based MCE training for more flexible classifiers based on GMMs and HMMs • Searching for other (hopefully better) methods to combine MCE training and maximum margin training
