

  1. A Comparative Analysis of Bayesian Nonparametric Variational Inference Algorithms for Speech Recognition
John Steinberg, Institute for Signal and Information Processing, Temple University, Philadelphia, Pennsylvania, USA

  2. Introduction

  3. The Motivating Problem • A set of data is generated from multiple distributions, but the number of distributions is unknown.
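To make the problem concrete, here is a minimal Python sketch (the three Gaussians and their weights are illustrative values, not from the talk): the observer sees only the pooled samples, and nothing in them announces that the true number of components is three.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hidden Gaussians; an observer sees only the pooled samples
# and does not know that K = 3 (illustrative values, not from the talk).
means, stds, weights = [-4.0, 0.0, 3.0], [1.0, 0.5, 1.5], [0.5, 0.2, 0.3]

components = rng.choice(3, size=1000, p=weights)  # hidden component labels
data = rng.normal(np.take(means, components), np.take(stds, components))

# The modeling question: given only `data`, how many components generated it?
```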

  4. Goals • Use a nonparametric Bayesian approach to learn the underlying structure of speech data • Investigate viability of three variational inference algorithms for acoustic modeling: • Accelerated Variational Dirichlet Process Mixtures (AVDPM) • Collapsed Variational Stick Breaking (CVSB) • Collapsed Dirichlet Priors (CDP) • Assess Performance: • Compare error rates to parametric GMM models • Understand computational complexity

  5. Goals: An Explanation • Why use Dirichlet process mixtures (DPMs)? • Goal: Automatically determine an optimal number of mixture components for each phoneme model • DPMs generate the priors needed to solve this problem! • What is "stick breaking"? Start with a stick of length 1. • Step 1: Let p1 = θ1. The remaining stick now has length 1 − θ1. • Step 2: Break off a fraction θ2 of the remaining stick. Now p2 = θ2(1 − θ1), and the remaining stick has length (1 − θ1)(1 − θ2). • If this is repeated k times, the remaining stick has length (1 − θ1)(1 − θ2)⋯(1 − θk) and the corresponding weight is pk = θk · (1 − θ1)⋯(1 − θk−1). [Figure: a unit stick broken into segments θ1, θ2, θ3, …]
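The construction above fits in a few lines of Python. This is a generic sketch of Dirichlet process stick breaking, assuming the standard θi ~ Beta(1, α) draws for the stick proportions; the concentration parameter α and the helper name stick_breaking are illustrative, not from the slides.

```python
import numpy as np

def stick_breaking(alpha, k, rng):
    """Draw the first k stick-breaking weights of a DP with concentration alpha."""
    theta = rng.beta(1.0, alpha, size=k)         # stick proportions (assumed Beta(1, alpha))
    remaining = np.cumprod(1.0 - theta)          # stick left after each break
    weights = theta * np.concatenate(([1.0], remaining[:-1]))  # p_k = theta_k * prod(1 - theta_i)
    return weights, remaining[-1]                # weights and leftover stick length

rng = np.random.default_rng(0)
w, leftover = stick_breaking(alpha=1.0, k=10, rng=rng)
print(w.round(3), leftover)  # weights decay quickly; leftover length shrinks toward 0
```

With small α most of the stick is used up in the first few breaks, which is exactly the behavior that lets a DPM favor a small, data-driven number of mixture components.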

  6. Background

  7. Inference: An Approximation • Inference: Estimating probabilities in statistically meaningful ways • Parameter estimation is computationally difficult • Distributions over distributions require an infinite number of parameters • Posteriors, p(y|x), cannot be solved analytically • Variational Inference • Uses independence assumptions to create simpler variational distributions, q(y), that approximate p(y|x) • Optimize q over Q = {q1, q2, …, qm} using an objective function, e.g., Kullback–Leibler divergence • Constraints can be added to Q to improve computational efficiency
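As a toy illustration of this recipe, the sketch below searches a small variational family Q, single Gaussians over a grid of means and standard deviations (the bimodal target and the grid are illustrative choices, not from the talk), for the member that minimizes KL(q ‖ p):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

# Stand-in for an intractable posterior p(y|x): a two-component mixture.
p = 0.6 * norm.pdf(x, -2, 1) + 0.4 * norm.pdf(x, 3, 1.5)

def kl(q, p):
    """Discretized KL(q || p) on the grid, skipping zero-mass points."""
    mask = q > 1e-12
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx

# Family Q: single Gaussians indexed by (mean, std) on a coarse grid.
candidates = [(m, s) for m in np.linspace(-4, 4, 33)
                     for s in np.linspace(0.5, 5.0, 19)]
best = min(candidates, key=lambda ms: kl(norm.pdf(x, *ms), p))
print("best q in Q: mean=%.2f, std=%.2f" % best)
```

Because the objective is the reverse KL, the winning q tends to lock onto a single mode of p rather than spread across both, the usual trade-off of this choice of divergence.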

  8. Variational Inference Algorithms • Accelerated Variational Dirichlet Process Mixtures (AVDPM) • Incorporates kd-trees to improve efficiency • Complexity: O(N log N) + O(2^depth) • Collapsed Variational Stick Breaking (CVSB) & Collapsed Dirichlet Priors (CDP) • Truncate the DPM to a maximum of K clusters and marginalize out the mixture weights • Complexity: O(TN) [Figures: kd-tree partitioning; CVSB and CDP truncations]
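The following is a minimal sketch of why a kd-tree helps (an illustration of the idea, not Kurihara et al.'s AVDPM implementation): each node caches sufficient statistics for its subtree, so an update can be driven by the O(2^depth) nodes at a chosen depth instead of by all N data points.

```python
import numpy as np

class Node:
    """kd-tree node caching sufficient statistics (count, sum) for its subtree."""
    def __init__(self, points, depth=0, max_depth=4):
        self.n = len(points)                 # cached count
        self.sum = points.sum(axis=0)        # cached sum (enough for mean updates)
        self.left = self.right = None
        if depth < max_depth and self.n > 1:
            axis = depth % points.shape[1]   # cycle through dimensions
            order = points[:, axis].argsort()
            mid = self.n // 2
            self.left = Node(points[order[:mid]], depth + 1, max_depth)
            self.right = Node(points[order[mid:]], depth + 1, max_depth)

def frontier_stats(node, depth):
    """Collect cached stats from the O(2^depth) nodes at a fixed depth."""
    if depth == 0 or node.left is None:
        return [(node.n, node.sum)]
    return frontier_stats(node.left, depth - 1) + frontier_stats(node.right, depth - 1)

rng = np.random.default_rng(0)
tree = Node(rng.normal(size=(10000, 2)))
stats = frontier_stats(tree, depth=3)        # 8 nodes summarize 10,000 points
print(len(stats), sum(n for n, _ in stats))  # -> 8 10000
```

Expanding the tree to a greater depth trades computation for a finer approximation, which is the accuracy/speed knob the O(2^depth) term on this slide refers to.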

  9. Experimental Setup

  10. Experimental Design & Data • Phoneme recognition on TIMIT, CALLHOME English (CH-E), and CALLHOME Mandarin (CH-M) • Acoustic models trained for phoneme alignment • Phoneme alignments generated using HTK

  11. Evaluation: Error Rate Comparison

  12. Computational Complexity: Training Samples (TIMIT)

  13. Conclusions and Future Work • Conclusions • AVDPM, CVSB, and CDP yield error rates comparable to GMM models • AVDPM, CVSB, and CDP use far fewer mixture components per phoneme label than standard GMMs • AVDPM is much better suited to large corpora, since the kd-tree significantly reduces training time without substantially degrading error rates • The performance gap between CH-E and CH-M can be attributed to the number of phoneme labels • Future Work • Investigate methods to improve covariance estimation • Apply AVDPM to HDP-HMM systems to move toward a complete Bayesian nonparametric speech recognizer

  14. Acknowledgements • Thanks to my committee for all of the help and support: Dr. Iyad Obeid, Dr. Joseph Picone, Dr. Marc Sobel, Dr. Chang-Hee Won, and Dr. Alexander Yates • Thanks to my research group for all of their patience and support: Amir Harati and Shuang Lu • The Linguistic Data Consortium (LDC), for awarding a data scholarship to this project and providing the lexicon and transcripts for CALLHOME Mandarin • Owlsnest¹
¹This research was supported in part by the National Science Foundation through Major Research Instrumentation Grant No. CNS-09-58854.

  15. Brief Bibliography of Related Research
Bussgang, J. (2012). Seeing Both Sides. Retrieved November 27, 2012 from http://bostonvcblog.typepad.com/vc/2012/05/forget-plastics-its-all-about-machine-learning.html
Ng, A. (2012). Machine Learning. Retrieved November 27, 2012 from https://www.coursera.org/course/ml
Kurihara, K., Welling, M., & Vlassis, N. (2007). Accelerated variational Dirichlet process mixtures. Advances in Neural Information Processing Systems (B. Schölkopf, J. Platt, & T. Hofmann, Eds.). Cambridge, Massachusetts, USA: MIT Press.
Kurihara, K., Welling, M., & Teh, Y. W. (2007). Collapsed variational Dirichlet process mixture models. Proceedings of the 20th International Joint Conference on Artificial Intelligence. Hyderabad, India.
Steinberg, J., & Picone, J. (2012). HTK Tutorials. Retrieved from http://www.isip.piconepress.com/projects/htk_tutorials/
Harati, A., Picone, J., & Sobel, M. (2012). Applications of Dirichlet Process Mixtures to Speaker Adaptation. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4321–4324). Kyoto, Japan. doi:10.1109/ICASSP.2012.6288875
Frigyik, B., Kapila, A., & Gupta, M. (2010). Introduction to the Dirichlet Distribution and Related Processes. Seattle, Washington, USA. Retrieved from https://www.ee.washington.edu/techsite/papers/refer/UWEETR-2010-0006.html
Blei, D. M., & Jordan, M. I. (2005). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1, 121–144.
Zografos, V. Wikipedia. Retrieved November 27, 2012 from http://en.wikipedia.org/wiki/File:3dRosenbrock.png
Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257–286. doi:10.1109/5.18626
Quintana, F. A., & Muller, P. (2004). Nonparametric Bayesian Data Analysis. Statistical Science, 19(1), 95–110.
Harati, A., & Picone, J. (2012). Applications of Dirichlet Process Models to Speech Processing and Machine Learning. IEEE Section Meeting, Northern Virginia. Fairfax, Virginia, USA. Retrieved from http://www.isip.piconepress.com/publications/presentations_invited/2012/ieee_nova/dpm/
Teh, Y. W. (2007). Dirichlet Processes: Tutorial and Practical Course. Retrieved November 30, 2012 from http://videolectures.net/mlss07_teh_dp/
