
Discriminative Training and Machine Learning Approaches



  1. Discriminative Training and Machine Learning Approaches. Chih-Pin Liao, Machine Learning Lab, Dept. of CSIE, NCKU

  2. Discriminative Training

  3. Our Concerns • Feature extraction and HMM modeling should be jointly performed. • A common objective function should be considered. • To alleviate model confusion and improve recognition performance, we should estimate HMMs using a discriminative criterion built from statistical theory. • Model parameters should be calculated rapidly, without applying a gradient descent algorithm.

  4. Minimum Classification Error (MCE) • MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications. • Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors. • A gradient descent algorithm is used to estimate the HMM parameters.

  5. MCE Training Procedure • Procedure for training discriminative models from observations X • Discriminant function • Anti-discriminant function • Misclassification measure (standard forms of all three are sketched below)
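
The formulas for these quantities did not survive the transcript. As a sketch in the standard MCE notation of Juang and Katagiri (the symbols η and Λ are assumptions, not recovered from the slide), the three quantities for an M-class problem are:

```latex
% Discriminant function for class j (e.g., the HMM log-likelihood)
g_j(X; \Lambda) = \log p(X \mid \Lambda_j)

% Anti-discriminant function: a smoothed maximum over the competing classes
G_j(X; \Lambda) = \log \Bigl[ \frac{1}{M-1} \sum_{k \neq j} e^{\eta\, g_k(X; \Lambda)} \Bigr]^{1/\eta}

% Misclassification measure: positive when X is misclassified
d_j(X; \Lambda) = -\,g_j(X; \Lambda) + G_j(X; \Lambda)
```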

  6. Expected Loss • The loss function is calculated by mapping the misclassification measure into a range between zero and one through a sigmoid function (see the sketch below). • Minimize the expected loss, or classification error, to find the discriminative model.
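
Continuing the sketch above (γ, θ, and the indicator notation are assumptions, not recovered from the slide), the sigmoid loss and the expected loss are typically written as:

```latex
% Smooth zero-one loss: sigmoid with slope \gamma and offset \theta
\ell_j(X; \Lambda) = \frac{1}{1 + e^{-\gamma\, d_j(X; \Lambda) + \theta}}

% Expected loss (classification error) to be minimized
L(\Lambda) = E_X \Bigl[ \sum_{j=1}^{M} \ell_j(X; \Lambda)\, \mathbf{1}(X \in C_j) \Bigr]
```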

  7. Hypothesis Test

  8. Likelihood Ratio Test • A new training criterion was derived from hypothesis test theory. • We test the null hypothesis against the alternative hypothesis. • The optimal solution is obtained by a likelihood ratio test, according to the Neyman–Pearson lemma (sketched below). • A higher likelihood ratio implies stronger confidence toward accepting the null hypothesis.
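
A minimal statement of the test, in standard Neyman–Pearson notation (the threshold τ is an assumed symbol):

```latex
% Likelihood ratio test: accept H_0 when the ratio exceeds a threshold \tau
\mathrm{LR}(X) = \frac{p(X \mid H_0)}{p(X \mid H_1)} \;\gtrless\; \tau
```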

  9. Hypotheses in HMM Training • Null and alternative hypotheses: H0: observations X are from target HMM state j; H1: observations X are not from target HMM state j. • We develop discriminative HMM parameters for the target state against the non-target states. • The problem turns out to be verifying the goodness of the data alignment to the corresponding HMM states.

  10. Maximum Confidence Hidden Markov Model

  11. Maximum Confidence HMM • The MCHMM is estimated by maximizing the log likelihood ratio, or confidence measure (sketched below), where the parameter set consists of the HMM parameters and the transformation matrix.
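
The slide's estimation formula was not recoverable; as a sketch consistent with the hypotheses above (W denotes the assumed feature transformation matrix), the maximum confidence estimate takes the form:

```latex
% Maximize the log likelihood ratio (confidence measure) over the joint
% parameter set: HMM parameters plus the feature transformation matrix W
\hat{\Lambda} = \arg\max_{\Lambda} \; \log \frac{p(X \mid H_0, \Lambda)}{p(X \mid H_1, \Lambda)},
\qquad \Lambda = \{\lambda_{\mathrm{HMM}}, W\}
```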

  12. Hybrid Parameter Estimation • The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation. • E-step (an auxiliary-function sketch follows).
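
The E-step expression itself was lost. As an assumption-labeled sketch, an EM auxiliary function for the confidence criterion would take the usual form, with S the hidden state sequence:

```latex
% E-step: expectation of the complete-data log confidence measure,
% taken under the posterior of the hidden states S given old parameters
Q\bigl(\Lambda \mid \Lambda^{(t)}\bigr) =
E_S \Bigl[ \log \frac{p(X, S \mid H_0, \Lambda)}{p(X, S \mid H_1, \Lambda)} \;\Bigm|\; X, \Lambda^{(t)} \Bigr]
```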

  13. Expectation Function

  14. MC Estimates of HMM Parameters

  15. MC Estimates of HMM Parameters

  16. MC Estimate of Transformation Matrix

  17. MC Classification Rule • Let Y denote input test image data. We apply the same criterion to identify the most likely category corresponding to Y (sketched below).
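
As a sketch (Λ_c denotes the assumed model of category c), the classification rule reuses the confidence measure:

```latex
% Pick the category whose model assigns the highest confidence to Y
\hat{c} = \arg\max_{c} \; \log \frac{p(Y \mid H_0, \Lambda_c)}{p(Y \mid H_1, \Lambda_c)}
```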

  18. Summary • A new maximum confidence HMM framework was proposed. • The hypothesis test principle was used to build the training criterion. • Discriminative feature extraction and HMM modeling were performed under the same criterion. • Jen-Tzung Chien and Chih-Pin Liao, “Maximum Confidence Hidden Markov Modeling for Face Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606–616, April 2008.

  19. Machine Learning Approaches

  20. Introduction • Conditional Random Fields (CRFs) • relax the usual conditional independence assumption of the likelihood model • enforce the homogeneity of the labeling variables conditioned on the observation • Due to the weak assumptions of the CRF model and its discriminative nature, it • allows arbitrary relationships among the data • may require fewer resources to train its parameters

  21. CRF models have shown better performance than the Hidden Markov Model (HMM) and Maximum Entropy Markov Models (MEMMs) on • language and text processing problems • object recognition problems • image and video segmentation • tracking problems in video sequences

  22. Generative & Discriminative Model

  23. Two Classes of Models • Generative model (HMM): models the distribution of states • Direct model (MEMM and CRF): models the posterior probability directly • (Figure: graphical structures of the MEMM and the CRF)

  24. Comparisons of the Two Kinds of Model • Generative model: HMM • Uses the Bayes rule approximation • Assumes that observations are independent • Multiple overlapping features are not modeled • The model is estimated through the recursive Viterbi algorithm

  25. Direct model - MEMM and CRF • Direct modeling of the posterior probability • Dependencies among observations are flexibly modeled • The model is estimated through the recursive Viterbi algorithm (a decoding sketch follows)
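
To make the shared Viterbi recursion concrete, here is a minimal decoding sketch in Python with NumPy; the function and variable names are illustrative, not taken from the slides:

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely state path for an HMM (all inputs in log domain).

    log_init:  (S,)   log p(s_1)
    log_trans: (S, S) log p(s_t = j | s_{t-1} = i)
    log_obs:   (T, S) log p(o_t | s_t), evaluated per frame
    """
    T, S = log_obs.shape
    delta = log_init + log_obs[0]        # best score ending in each state
    back = np.zeros((T, S), dtype=int)   # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (prev state, next state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]         # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The MEMM and CRF reuse the same recursion, with the local log-scores replaced by the logs of their conditional or potential functions.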

  26. Hidden Markov Model & Maximum Entropy Markov Model

  27. HMM for Human Motion Recognition • An HMM is defined by • the transition probability • the observation probability (standard definitions are sketched below)
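
The probability definitions were lost from this slide; in standard HMM notation (the symbols a, b, s, and o are assumptions), they read:

```latex
% Transition and observation probabilities
a_{ij} = p(s_t = j \mid s_{t-1} = i), \qquad b_j(o_t) = p(o_t \mid s_t = j)

% Joint likelihood of an observation sequence O and state sequence S
p(O, S) = p(s_1)\, b_{s_1}(o_1) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, b_{s_t}(o_t)
```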

  28. Maximum Entropy Markov Model • The MEMM is defined by a single conditional distribution that is used to replace the transition and observation probabilities of the HMM (sketched below).
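
In the standard MEMM formulation (a sketch; the feature functions f_a and weights λ_a follow the maximum entropy slides below):

```latex
% One conditional distribution replaces a_{ij} and b_j(o_t)
p(s_t \mid s_{t-1}, o_t) = \frac{1}{Z(o_t, s_{t-1})}
\exp\Bigl( \sum_a \lambda_a f_a(o_t, s_t) \Bigr)
```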

  29. Maximum Entropy Criterion • Definition of the feature functions • Constrained optimization problem that matches the empirical expectation to the model expectation (both are sketched below)
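
A sketch of the constrained problem in standard maximum entropy notation (p̃ denotes the empirical distribution; an assumption, not recovered from the slide):

```latex
% Feature functions: indicators on (observation, state) pairs
f_a(o, s) \in \{0, 1\}

% Maximize entropy subject to matching expectations for every feature a
\max_p H(p) \quad \text{s.t.} \quad
\underbrace{\sum_{o,s} \tilde{p}(o, s)\, f_a(o, s)}_{\text{empirical expectation}}
= \underbrace{\sum_{o,s} \tilde{p}(o)\, p(s \mid o)\, f_a(o, s)}_{\text{model expectation}}
```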

  30. Solution of MEMM • Lagrange multipliers are used for the constrained optimization; the multipliers are the model parameters. • The solution is obtained in exponential form (sketched below).
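
The resulting log-linear form, sketched in the same assumed notation:

```latex
% Exponential solution with Lagrange multipliers \lambda_a as parameters
p_{\lambda}(s \mid o) = \frac{1}{Z_{\lambda}(o)} \exp\Bigl( \sum_a \lambda_a f_a(o, s) \Bigr),
\qquad
Z_{\lambda}(o) = \sum_{s} \exp\Bigl( \sum_a \lambda_a f_a(o, s) \Bigr)
```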

  31. GIS Algorithm • Optimizes the Maximum Mutual Information (MMI) criterion • Step 1: Calculate the empirical expectation • Step 2: Start from an initial value • Step 3: Calculate the model expectation • Step 4: Update the model parameters • Repeat steps 3 and 4 until convergence (a sketch follows this list)
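
Here is a minimal GIS sketch in Python for a simple maximum entropy classifier; the setup (binary features, a per-class weight matrix, the smoothing constants) is an illustrative assumption, not the slides' exact formulation:

```python
import numpy as np

def gis(features, labels, n_classes, n_iter=100):
    """Generalized Iterative Scaling for a toy maxent classifier.

    features: (N, D) binary feature matrix
    labels:   (N,)   gold class index per observation
    Joint features f_a(o, s) are taken as feature d firing with class s.
    """
    N, D = features.shape
    C = features.sum(axis=1).max()           # GIS slack constant
    lam = np.zeros((n_classes, D))           # initial parameters (step 2)
    emp = np.zeros((n_classes, D))           # empirical expectation (step 1)
    for x, y in zip(features, labels):
        emp[y] += x
    for _ in range(n_iter):
        scores = features @ lam.T            # (N, n_classes)
        p = np.exp(scores - scores.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)    # model posterior p(s | o)
        model = p.T @ features               # model expectation (step 3)
        lam += np.log((emp + 1e-10) / (model + 1e-10)) / C   # update (step 4)
    return lam
```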

  32. Conditional Random Field

  33. Conditional Random Field • Definition: Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph; then (X, Y) is a conditional random field (stated formally below).
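
Stated formally, following Lafferty, McCallum, and Pereira (2001), whose definition this slide paraphrases (w ∼ v means w and v are neighbors in G):

```latex
\text{Let } G = (V, E) \text{ and } Y = (Y_v)_{v \in V}.
\text{ Then } (X, Y) \text{ is a conditional random field if, conditioned on } X,
p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v).
```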

  34. CRF Model Parameters • The undirected graphical structure can be used to factorize the conditional distribution into a normalized product of potential functions. • Consider the graph as a linear-chain structure (sketched below). • Model parameter set • Feature function set
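
For the linear chain, the standard factorization (a sketch in the same assumed notation as the MEMM slides), with model parameter set λ = {λ_a} and feature function set {f_a}:

```latex
% Normalized product of potentials along the chain
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)}
\exp\Bigl( \sum_{t=1}^{T} \sum_a \lambda_a f_a(y_{t-1}, y_t, x, t) \Bigr)
```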

  35. CRF Parameter Estimation • We can rewrite and maximize the posterior probability. • The log posterior probability is given by the expression sketched below.
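
A sketch of the log posterior over training pairs (x^(i), y^(i)), with the assumed shorthand F_a(y, x) = Σ_t f_a(y_{t-1}, y_t, x, t):

```latex
L(\lambda) = \sum_i \log p_{\lambda}\bigl(y^{(i)} \mid x^{(i)}\bigr)
= \sum_i \Bigl[ \sum_a \lambda_a F_a\bigl(y^{(i)}, x^{(i)}\bigr) - \log Z_{\lambda}\bigl(x^{(i)}\bigr) \Bigr]
```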

  36. Parameter Updating by GIS Algorithm • Differentiate the log posterior probability with respect to each parameter (sketched below). • Setting this derivative to zero yields the constraint of the maximum entropy model. • This estimation has no closed-form solution, so we can use the GIS algorithm.
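
The derivative takes the familiar empirical-minus-model form:

```latex
% Setting the gradient to zero recovers the maximum entropy constraint
\frac{\partial L(\lambda)}{\partial \lambda_a}
= \tilde{E}[F_a] - E_{p_{\lambda}}[F_a] = 0
```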

  37. Summary and Future Works • We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm is applied. • In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability. • The posterior probability can be calculated directly by an approximation approach. • Chih-Pin Liao and Jen-Tzung Chien, “Graphical Modeling of Conditional Random Fields for Human Motion Recognition,” ICASSP 2008, March 31 - April 4, 2008, pp. 1969–1972.

  38. Thanks for your attention and discussion
