
Hybrids of generative and discriminative methods for machine learning

MSRC Summer School, 30/06/2009, Cambridge, UK.


Presentation Transcript


  1. MSRC Summer School - 30/06/2009 Hybrids of generative and discriminative methods for machine learning Cambridge – UK

  2. Motivation • Generative models: incorporate prior knowledge, handle missing data such as labels • Discriminative models: perform well at classification • However, there is no straightforward way to combine them

  3. Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data

  4. Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data

  5. Generative methods • Answer: “what does a cat look like? and a dog?” => joint distribution of data and labels, $p(x, c \mid \theta)$ • $x$: data, $c$: label, $\theta$: parameters

  6. Generative methods • Objective function: $G(\theta) = p(\theta)\, p(X, C \mid \theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$ • 1 reusable model per class, can deal with incomplete data • Example: GMMs
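As a rough illustration of the generative objective above, the following sketch evaluates $\log G(\theta)$ for a model with one 1-D Gaussian per class. The parameter layout (`theta` as a dictionary of class prior, mean, variance), the flat parameter prior and the toy data are assumptions made for the example, not taken from the slides.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_joint(x, c, theta):
    """log p(x, c | theta) = log p(c) + log p(x | c), one Gaussian per class."""
    class_prior, mean, var = theta[c]
    return math.log(class_prior) + log_gauss(x, mean, var)

def log_G(xs, cs, theta, log_prior=0.0):
    """log G(theta) = log p(theta) + sum_n log p(x_n, c_n | theta).

    log_prior defaults to 0.0, i.e. a flat prior p(theta) (an assumption).
    """
    return log_prior + sum(log_joint(x, c, theta) for x, c in zip(xs, cs))

# Hypothetical toy parameters: class -> (class prior, mean, variance).
theta = {0: (0.5, -1.0, 1.0), 1: (0.5, 1.0, 1.0)}
xs = [-1.2, -0.8, 0.9, 1.1]
cs = [0, 0, 1, 1]
print(log_G(xs, cs, theta))
```

Because the objective is a sum of per-class terms, each class model can be fitted and reused on its own, which is the "1 reusable model per class" property.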

  7. Example of generative model

  8. Discriminative methods • Answer: “is it a cat or a dog?” => posterior distribution of the labels, $p(c \mid x, \theta)$ • $x$: data, $c$: label, $\theta$: parameters

  9. Discriminative methods • The objective function is $D(\theta) = p(\theta)\, p(C \mid X, \theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$ • Focus on regions of ambiguity, make faster predictions • Example: neural networks, SVMs
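To make the discriminative objective concrete, here is a minimal sketch that evaluates $\log D(\theta)$ for a two-class logistic model on 1-D inputs. Logistic regression stands in for the neural networks and SVMs named on the slide; the parameterisation `theta = (w, b)`, the flat prior and the data are assumptions for illustration.

```python
import math

def log_sigmoid(z):
    """Numerically stable log of the logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_label_posterior(x, c, theta):
    """log p(c | x, theta) for labels c in {0, 1}, with theta = (w, b)."""
    w, b = theta
    z = w * x + b
    return log_sigmoid(z) if c == 1 else log_sigmoid(-z)

def log_D(xs, cs, theta, log_prior=0.0):
    """log D(theta) = log p(theta) + sum_n log p(c_n | x_n, theta)."""
    return log_prior + sum(
        log_label_posterior(x, c, theta) for x, c in zip(xs, cs)
    )

# Hypothetical toy data: negative inputs labelled 0, positive inputs labelled 1.
xs = [-1.2, -0.8, 0.9, 1.1]
cs = [0, 0, 1, 1]
print(log_D(xs, cs, (2.0, 0.0)))   # weights that agree with the labels
print(log_D(xs, cs, (-2.0, 0.0)))  # weights that contradict the labels
```

Note that nothing in this objective models the distribution of `xs` itself; only the labels given the inputs are scored.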

  10. Example of discriminative model SVMs / NNs

  11. Generative versus discriminative • No effect of the double mode on the decision boundary

  12. Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data

  13. Semi-supervised learning • Few labelled data / lots of unlabelled data • Discriminative methods overfit; generative models only help classification if they are “good” • Need the modelling power of generative models while still discriminating well => hybrid models

  14. p(xn, cn|) c p(xn, c|) Discriminative trainingBach et al, ICASSP 05 • Discriminative objective function: D() = p() n p(cn|xn, ) • Using a generative model: D() = p() n [ p(xn, cn|) / p(xn|) ] D() = p() n

  15. Convex combination (Bouchard et al, COMPSTAT 04) • Generative objective function: $G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$ • Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$ • Convex combination: $\log L(\theta) = \lambda \log D(\theta) + (1 - \lambda) \log G(\theta)$, $\lambda \in [0, 1]$
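A minimal sketch of the convex combination, assuming the two log-objectives have already been evaluated at the same parameters. The function name `convex_combination` and the numeric values are hypothetical; only the blending rule comes from the slide.

```python
def convex_combination(log_d, log_g, weight):
    """Blend of log D(theta) and log G(theta) with a mixing weight in [0, 1].

    weight = 1 recovers the discriminative objective, weight = 0 the
    generative one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * log_d + (1.0 - weight) * log_g

# Hypothetical objective values at some fixed theta.
log_d, log_g = -2.0, -10.0
for weight in (0.0, 0.5, 1.0):
    print(weight, convex_combination(log_d, log_g, weight))
```

Both terms share a single parameter vector here, so the weight trades off two objectives rather than two models.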

  16. A principled hybrid model

  17. A principled hybrid model

  18. A principled hybrid model

  19. A principled hybrid model

  20. A principled hybrid model • $\theta$: posterior distribution of the labels • $\theta'$: marginal distribution of the data • $\theta$ and $\theta'$ communicate through a prior • Hybrid objective function: $L(\theta, \theta') = p(\theta, \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$

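  21. A principled hybrid model • $\theta = \theta'$ => $p(\theta, \theta') = p(\theta)\, \delta(\theta - \theta')$, so $L(\theta, \theta') = p(\theta)\, \delta(\theta - \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$ and $L(\theta) = G(\theta)$: generative case • $\theta \perp \theta'$ (independent) => $p(\theta, \theta') = p(\theta)\, p(\theta')$, so $L(\theta, \theta') = \left[ p(\theta) \prod_n p(c_n \mid x_n, \theta) \right] \times \left[ p(\theta') \prod_n p(x_n \mid \theta') \right] = D(\theta) \times f(\theta')$: discriminative case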
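The generative limit on this slide follows in one step from the product rule. A short derivation, using only the symbols already defined and assuming the delta prior is applied by substituting $\theta' = \theta$:

```latex
\begin{align}
L(\theta)
  &= p(\theta) \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta)
     && \text{(delta prior: } \theta' = \theta \text{)} \\
  &= p(\theta) \prod_n p(c_n \mid x_n, \theta)\, p(x_n \mid \theta) \\
  &= p(\theta) \prod_n p(x_n, c_n \mid \theta)
     && \text{(product rule)} \\
  &= G(\theta).
\end{align}
```

In the independent case no such recombination is possible: the factor $f(\theta') = p(\theta') \prod_n p(x_n \mid \theta')$ does not involve $\theta$, so maximising over $\theta$ is the same as maximising $D(\theta)$.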

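  22. A principled hybrid model • Anything in between: hybrid case • Choice of prior: $p(\theta, \theta') = p(\theta)\, \mathcal{N}(\theta' \mid \theta, \sigma(a))$ • $a \to 0$ => $\sigma \to 0$ => $\theta = \theta'$ • $a \to 1$ => $\sigma \to \infty$ => $\theta \perp \theta'$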
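The hybrid objective with this Gaussian coupling prior can be sketched as below. Several things are assumptions for the example and not given on the slide: the mapping `sigma_of_a(a) = (a / (1 - a))**2`, chosen only because it satisfies the two stated limits; treating $\sigma$ as a variance shared by all parameters; a flat $p(\theta)$; and a toy model whose parameters are just the two class means of unit-variance Gaussians with equal class priors.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def sigma_of_a(a):
    """Assumed mapping with sigma -> 0 as a -> 0 and sigma -> inf as a -> 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    return (a / (1.0 - a)) ** 2

def log_joint(x, c, means):
    """log p(x, c | means): equal class priors, unit-variance Gaussians."""
    return math.log(1.0 / len(means)) + log_gauss(x, means[c], 1.0)

def log_marginal(x, means):
    """log p(x | means) = log sum_c p(x, c | means), via log-sum-exp."""
    logs = [log_joint(x, c, means) for c in range(len(means))]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def log_coupling_prior(theta, theta_prime, sigma):
    """log N(theta' | theta, sigma), sigma used as a per-coordinate variance."""
    return sum(log_gauss(tp, t, sigma) for t, tp in zip(theta, theta_prime))

def log_hybrid(xs, cs, theta, theta_prime, a):
    """log L(theta, theta') with a flat p(theta) (an assumption)."""
    total = log_coupling_prior(theta, theta_prime, sigma_of_a(a))
    for x, c in zip(xs, cs):
        total += log_joint(x, c, theta) - log_marginal(x, theta)  # p(c|x,theta)
        total += log_marginal(x, theta_prime)                     # p(x|theta')
    return total

xs, cs = [-1.2, -0.8, 0.9, 1.1], [0, 0, 1, 1]
theta = (-1.0, 1.0)
for a in (0.1, 0.5, 0.9):
    tied = log_hybrid(xs, cs, theta, theta, a)
    apart = log_hybrid(xs, cs, theta, (-3.0, 3.0), a)
    print(a, tied, apart)
```

For small `a` the coupling term heavily penalises any gap between the two parameter sets, so the objective behaves like the generative one; for `a` close to 1 the penalty is weak and the two sets are nearly free to differ.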

  23. Why principled? • Consistent with the likelihood of graphical models => one way to train a system • Everything can now be modelled => potential to be Bayesian • Potential to learn a

  24. Learning • EM / Laplace approximation / MCMC: either intractable or too slow • Conjugate gradients: flexible and easy to check, but slow and sensitive to initialisation • Variational inference

  25. Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data

  26. Toy example

  27. Toy example • 2 elongated distributions • Only spherical Gaussians allowed => wrong model • 2 labelled points per class => strong risk of overfitting
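The transcript does not give the actual toy data, so the following is only a sketch of a comparable setup: it samples from an elongated 2-D Gaussian and fits a spherical Gaussian (one shared variance for both axes) by maximum likelihood. The standard deviations, sample sizes, seed and function names are all hypothetical.

```python
import random

def sample_elongated(center, n, rng, short_std=0.2, long_std=2.0):
    """Draw n points from an axis-aligned Gaussian stretched along the y axis."""
    return [
        (center[0] + rng.gauss(0.0, short_std), center[1] + rng.gauss(0.0, long_std))
        for _ in range(n)
    ]

def fit_spherical(points):
    """Maximum-likelihood spherical Gaussian: a mean and one shared variance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var = sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2 for p in points) / (2 * n)
    return (mean_x, mean_y), var

rng = random.Random(0)
cloud = sample_elongated((0.0, 0.0), 1000, rng)
mean, var = fit_spherical(cloud)
print(mean, var)  # one variance has to stand in for both 0.2**2 and 2.0**2

# With only 2 labelled points the fit depends entirely on those two points.
print(fit_spherical(cloud[:2]))
```

The single fitted variance lands between the true short-axis and long-axis variances, so the spherical model is wrong along both directions, which is the mismatch the toy example relies on.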

  28. Toy example

  29. Decision boundaries

  30. Content • Generative and discriminative methods • A principled hybrid framework • Study of the properties on a toy example • Influence of the amount of labelled data

  31. A real example • Images are a special case, as they contain several features each • 2 levels of supervision: at the image level, and at the feature level • Image label only => weakly labelled • Image label + segmentation => fully labelled

  32. The underlying generative model • (Figure labels: multinomial, multinomial, Gaussian)

  33. The underlying generative model • (Figure labels: weakly labelled, fully labelled)

  34. Experimental set-up • 3 classes: bikes, cows, sheep • 1 Gaussian per class => poor generative model • 75 training images for each category

  35. HF framework

  36. HF versus CC

  37. Results • When increasing the proportion of fully labelled data, the trend is: generative → hybrid → discriminative • Weakly labelled data has little influence on the trend • With sufficient fully labelled data, HF tends to perform better than CC

  38. Experimental set-up • 3 classes: lions, tigers and cheetahs • 1 Gaussian per class => poor generative model • 75 training images for each category

  39. HF framework

  40. HF versus CC

  41. Results • Hybrid models consistently perform better • However, generative and discriminative models haven’t reached saturation • No clear difference between HF and CC

  42. Conclusion • Principled hybrid framework • Possibility to learn the best trade-off • Helps for ambiguous datasets when labelled data is scarce • Problem of optimisation

  43. Future avenues • Bayesian version (posterior distribution of $\sigma$) under study • Replace $\sigma$ by a diagonal matrix $\Sigma$ to allow flexibility => need for the Bayesian version • Choice of priors

  44. Thank you!
