
LING 439/539: Statistical Methods in Speech and Language Processing


Presentation Transcript


  1. LING 439/539: Statistical Methods in Speech and Language Processing Ying Lin Department of Linguistics University of Arizona

  2. Welcome! • Get the syllabus • Fill out and return the information sheet • Email: yinglin@email.arizona.edu • Office: Douglass 224 • OH: MW 2:00-3:00 or by appointment (also teaching another undergrad class) • Course webpage: see syllabus • Listserv coming soon.

  3. 438/538 and 439/539 • LING 438/538 (Computational Linguistics): • Symbolic representations (mostly syntax), e.g. FSA, CFG. • Focus on logic • Simple probabilistic models, e.g. N-grams.

  4. 438/538 and 439/539 • This class complements 438/538: • Numerical representations (speech signals): need digital signal processing • Focus on statistics/learning • More sophisticated probabilistic models, e.g. HMM, PCFG

  5. Main reference texts (!) • Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall. • Manning and Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press. • Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall. • Duda, Hart and Stork (2001). Pattern Classification (2nd ed.). John Wiley & Sons. • Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall. • Hastie, Tibshirani and Friedman (2001). The Elements of Statistical Learning. Springer.

  6. Guidelines for course reading • There is no single book that covers all of our material. • Most books are written for an EE or CS audience only. • A few chapters are selected from each book (see the reading list). Lecture notes will summarize the reading. • Expect a rough ride -- this is the first time the course is being taught, and feedback is greatly appreciated!

  7. Three skills for this class • 1. Linguistics: understanding the sources of particular patterns. • 2. Math/Statistics: the principles underlying the models. • 3. Programming: implementation. • This class emphasizes (2), for two reasons: • Models are based on simple structures • Programming skills require much practice

  8. What is the “statistical approach”? • Narrow sense: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference • Compare to logic: deductive inference • Broad sense: any work that uses a quantitative measure of success • Relevant to both language engineering and linguistic science


  10. Language engineering: speech recognition • Tasks: increasing level of difficulty • [Chart: word error rate across tasks of increasing difficulty]

  11. A brief history of speech recognition • 1950s: U.S. government started funding research on automatic recognition of speech • 1960s-70s: isolated words, digit strings • Debate: rules vs. statistics • Dynamic time warping • 1980s-now: continuous speech, speech understanding, spoken dialog • Hidden Markov models dominate

  12. Why didn’t the rules work? • Completely bottom-up approach: rules are hand-coded by experts • Problem: variability in speech • Sophisticated symbolic rules are not flexible enough to handle continuous speech • [Diagram: “How are you?” mapped to the phone sequence h A U A j o U via phonological and phonetic rules]

  13. The rise of statistical methods in speech • Initial solution: hire many linguists to continually improve the rule system • This turned out to be costly and slow, failing to meet the high expectations • Advantages of statistical models: • Allow training on different data: flexible, scalable • Computing power is much cheaper than experts • This drove the move to less and less constrained tasks • Bitterness: “Every time I fire a linguist, the word error rate goes up” -- F. Jelinek (IBM)

  14. The rise of statistics in NLP • A very similar scenario played out in NLP: • E.g. tagging, parsing, machine translation • “Old” NLP: deductive systems, hand-coded • “New” NLP: broad-coverage, corpus-based, emphasizing training and evaluation • Speech is now merging with NLP • Many tools originated in speech, then were adopted by NLP • New tasks keep emerging: the web as an (unstructured) data source

  15. Basic architecture of today’s ASR system • [Diagram: audio speech → feature extraction → features X; acoustic modeling computes likelihoods p(X|M1), p(X|M2); the language model supplies priors p(M1), p(M2); scoring ranks the hypotheses → ANSWER] • Model parameters are trained offline, e.g. M1 = “I recognize speech”, M2 = “I wreck a nice beach”, … • A sketch of the scoring step follows below.
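
A minimal sketch of the scoring step, in Python, with made-up log-probability values standing in for the acoustic likelihoods p(X|M) and language-model priors p(M):

```python
# Hypothetical log-probabilities; real values would come from the
# acoustic model (p(X|M)) and the language model (p(M)).
hypotheses = {
    "I recognize speech":   {"log_p_x_given_m": -120.0, "log_p_m": -8.1},
    "I wreck a nice beach": {"log_p_x_given_m": -118.5, "log_p_m": -14.3},
}

# Score each hypothesis by log p(X|M) + log p(M) and rank.
ranked = sorted(
    hypotheses.items(),
    key=lambda kv: kv[1]["log_p_x_given_m"] + kv[1]["log_p_m"],
    reverse=True,
)
print(ranked[0][0])  # "I recognize speech": -128.1 beats -132.8
```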

  16. Component 1: signal processing / feature extraction • First 1/3 of the course (also useful for understanding speech synthesis)

  17. Examples of some common features • [Figures of example features omitted from the transcript] • One very simple feature is sketched below.
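
As a concrete illustration (not one of the slide’s figures), here is a sketch of one of the simplest acoustic features, short-time log energy, assuming NumPy and a mono signal sampled at 16 kHz:

```python
import numpy as np

def log_energy_features(signal, frame_len=400, hop=160):
    """Short-time log energy per frame.
    frame_len=400, hop=160 give 25 ms windows every 10 ms at 16 kHz."""
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        feats.append(np.log(np.sum(frame ** 2) + 1e-10))  # guard against log(0)
    return np.array(feats)

# Toy usage: one second of synthetic noise standing in for speech.
x = np.random.randn(16000)
print(log_energy_features(x).shape)  # (98,)
```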

  18. Component 2: Acoustic models • Mixture of Gaussians: p(o_t | q_i) = Σ_k c_ik N(o_t; μ_ik, Σ_ik) • Dimension reduction: principal component analysis, linear discriminant analysis, parameter tying • A sketch of the mixture density follows below.
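
A sketch of evaluating this mixture density, assuming diagonal covariances (a common simplification in ASR) and NumPy:

```python
import numpy as np

def gmm_loglik(o, weights, means, variances):
    """log p(o | q) for a K-component, diagonal-covariance Gaussian
    mixture. o: (D,); weights: (K,); means, variances: (K, D)."""
    d = o.shape[0]
    # Per-component log density, kept in the log domain for stability.
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((o - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp
    m = comp.max()
    return m + np.log(np.sum(np.exp(comp - m)))  # log-sum-exp over components

# Toy usage: a 2-component mixture over 3-dimensional features.
o = np.zeros(3)
w = np.array([0.6, 0.4])
mu = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
var = np.ones((2, 3))
print(gmm_loglik(o, w, mu, var))
```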

  19. Component 3: Pronunciation modeling • Model for different pronunciations of “you” in continuous speech • [Diagram: pronunciation network with states start, j, ou, a, end] • Other types of units: triphones, syllables • Each unit is an HMM • A toy version of this network is sketched below.
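
A toy version of the pronunciation network, with made-up transition probabilities (the slide’s diagram gives only the topology):

```python
# Transition structure mirroring the diagram: start -> j -> (ou | a) -> end.
# Probabilities are invented for illustration.
network = {
    "start": [("j", 1.0)],
    "j":     [("ou", 0.7), ("a", 0.3)],  # full form vs. reduced "ya"
    "ou":    [("end", 1.0)],
    "a":     [("end", 1.0)],
}

def paths(state="start", prob=1.0, trail=()):
    """Enumerate pronunciations with their path probabilities."""
    if state == "end":
        yield trail, prob
        return
    for nxt, p in network[state]:
        yield from paths(nxt, prob * p, trail + (nxt,) if nxt != "end" else trail)

for phones, p in paths():
    print(" ".join(phones), p)  # "j ou" 0.7, "j a" 0.3
```

In a full system, each phone node would itself expand into an HMM whose states emit acoustic feature vectors.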

  20. Component 4: Language model • Provides the probability p(M) of a word sequence M, to combine with the acoustic model p(X|M) • Common: N-grams with smoothing and backoff -- a very hard, specialized business • Parsing is just starting to be integrated • Fundamental equation: M* = argmax_M p(M|X) = argmax_M p(X|M) p(M) • Search: Viterbi, beam, A*, N-best • A toy smoothed bigram model follows below.
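
A toy bigram model with add-one smoothing (real systems use more refined schemes such as Kneser-Ney with backoff, as the slide hints):

```python
from collections import Counter

def train_bigram(sentences):
    """Collect unigram (context) and bigram counts, with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed p(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

# Toy usage on a two-sentence corpus.
corpus = [["i", "recognize", "speech"], ["i", "wreck", "a", "nice", "beach"]]
uni, bi = train_bigram(corpus)
V = len({w for s in corpus for w in s}) + 2  # include <s> and </s>
print(bigram_prob("i", "recognize", uni, bi, V))  # (1+1)/(2+9)
```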

  21. ASR: example of a generative model • Components 2+3+4 provide an instance of a generative model: • The language model M generates word sequences • The word sequence generates pronunciations • The pronunciation generates acoustic features • Unsupervised learning/training: • Maximum likelihood estimation • Expectation-Maximization algorithm (in its different incarnations) • Main focus of this class • A toy EM example follows below.
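
To make the EM idea concrete, here is a bare-bones EM for a two-component 1-D Gaussian mixture -- a toy stand-in for the (much larger) EM training used in ASR:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (maximum likelihood)."""
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters by responsibility-weighted averages.
        n_k = resp.sum(axis=0)
        w = n_k / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
    return w, mu, var

# Toy usage: data drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
print(em_gmm_1d(x))  # roughly (0.3, 0.7), (-2, 3), (1, 1)
```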

  22. Other models to look at • Descriptive/maximum entropy models • Started in vision, then copied to speech, then NLP • Discriminative models: directly using data to construct classifiers, with weak assumptions about the probability distribution • Supervised learning, focusing on the perspective of classification • [Diagram: the “machine learning approach to NLP”: input string → count → feature vector → classifier → output labels] • A minimal classifier of this kind is sketched below.
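
A minimal example of the pipeline in the diagram -- counts as features, a perceptron as the classifier. This is just one simple discriminative method, chosen for brevity:

```python
from collections import Counter

def featurize(text):
    """Bag-of-words counts: the "count -> feature vector" step."""
    return Counter(text.lower().split())

def perceptron_train(data, n_epochs=10):
    """Binary perceptron; data is a list of (text, label), label in {+1, -1}."""
    w = Counter()  # sparse weight vector
    for _ in range(n_epochs):
        for text, y in data:
            f = featurize(text)
            score = sum(w[k] * v for k, v in f.items())
            if y * score <= 0:  # mistake-driven update
                for k, v in f.items():
                    w[k] += y * v
    return w

# Toy usage: classify utterances as greetings (+1) or commands (-1).
train = [("how are you", 1), ("open the pod bay doors", -1)]
w = perceptron_train(train)
print(sum(w[k] * v for k, v in featurize("how are you doing").items()))  # > 0
```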

  23. Problem solved? • No -- improvements are mostly due to larger training sets and speed-ups • Driven by Moore’s law?

  24. Challenges • Environmental distortion (microphone, noise, cocktail party) breaks feature extraction • Acoustic condition mismatch • Between- and within-speaker variability breaks pronunciation and acoustic modeling • Conversational speech breaks the language model • Understanding these problems is crucial for improving the performance of ASR

  25. Dreaming • “2001: A Space Odyssey” (1968) • Dave: “Open the pod bay doors, HAL.” • HAL 9000: “I’m sorry, Dave. I’m afraid I can’t do that.”

  26. The reality, before the problem is solved • Speech is used as a user interface only when people can’t use their hands • Driving a car (use speech to drive?) • Device too small (cellphone) • Customer service (who will tolerate touch-tone menus?) • Dictation (how many people actually use it?)

  27. For next time • We will start with signal processing • It uses engineering math, including power series (and their convergence), trigonometric functions, integration, and the representation of complex numbers • If you have forgotten or never learned this material, please find references and study it before class • Two identities from this toolkit are sketched below as a refresher.
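
As a quick refresher (added here, not part of the original slides), two identities from this toolkit that recur throughout digital signal processing:

```latex
% Geometric series, convergent for |r| < 1:
\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r}, \qquad |r| < 1

% Euler's formula, linking complex exponentials to trigonometric
% functions (engineers write j for the imaginary unit):
e^{j\omega} = \cos\omega + j\,\sin\omega
```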
