
Midterm Review

Midterm Review. Spoken Language Processing, Prof. Andrew Rosenberg. Lecture 1 - Overview. Applications: speech recognition, speech synthesis, other applications (indexing, language id, etc.). Information in speech: words, speaker identity, speaker state, discourse acts.


Presentation Transcript


  1. Midterm Review Spoken Language Processing Prof. Andrew Rosenberg

  2. Lecture 1 - Overview • Applications • speech recognition • speech synthesis • other applications: indexing, language id, etc. • Information in speech • words • speaker identity • speaker state • discourse acts

  3. Lecture 2 – From Sounds to Language • Differences between orthography and sounds • Phonetic symbol sets • e.g. IPA, ARPAbet. • Vocal organs • articulators • Classes of sounds • Coarticulation

  4. Lecture 3 – Spoken Dialog Systems • Maxims of Conversational Implicature • Dialog System Architecture • Speech Recognition • Dialog Management • Response Generation • Speech Synthesis • Dialog Strategies

  5. Lecture 4 – Acoustics of Speech • Phone Recognition • Prosody • Speech Waveforms • Analog to Digital Conversion • Nyquist Rate • Pitch Doubling and Halving
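As a rough illustration of the Nyquist Rate item above: a signal whose highest frequency is f_max needs a sampling rate above 2 * f_max, and a tone above half the sampling rate folds back to a lower apparent frequency. The sketch below assumes pure tones and ideal sampling; the function names are hypothetical and not taken from the lecture.

```python
def nyquist_rate(max_freq_hz: float) -> float:
    """Twice the highest frequency present; sampling must exceed this rate."""
    return 2.0 * max_freq_hz


def aliased_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling.

    Tones at or below sample_rate_hz / 2 are unchanged; higher tones
    fold back into the range [0, sample_rate_hz / 2].
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)
```

For example, at an 8 kHz sampling rate a 5 kHz tone is indistinguishable from a 3 kHz tone, which is why audio is low-pass filtered before analog to digital conversion.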

  6. Lecture 5 – Speech Recognition Overview • History of Speech Recognition • Rule based recognition • Dynamic Time Warping • Statistical Modeling • What are qualities that make speech recognition difficult? • Noisy Channel Model • Training and Test Corpora • Word Error Rate
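Word Error Rate is usually computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and equal cost for all three error types; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.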

  7. Lecture 6 – Fast Fourier Transform • Multiplying Polynomials • Divide-and-Conquer for multiplying polynomials • Relationship between multiplying polynomials and cosine transform • Complex roots of unity
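The divide-and-conquer idea can be sketched as follows: evaluate both polynomials at the complex roots of unity with a recursive FFT, multiply the values pointwise, then invert the transform to recover the product's coefficients. This is a minimal textbook-style sketch, assuming integer coefficients listed lowest degree first; it is not necessarily the formulation used in the lecture.

```python
import cmath


def fft(coeffs, invert=False):
    """Recursive radix-2 FFT; len(coeffs) must be a power of two."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    # Split into even- and odd-indexed coefficients and recurse.
    even = fft(coeffs[0::2], invert)
    odd = fft(coeffs[1::2], invert)
    sign = -1 if invert else 1
    result = [0j] * n
    for k in range(n // 2):
        # k-th power of the principal n-th root of unity (or its inverse).
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
    return result


def multiply_polynomials(a, b):
    """Multiply two integer-coefficient polynomials, lowest degree first."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs a 1/size scale; round off float error.
    return [round((c / size).real) for c in product[:out_len]]
```

Calling `multiply_polynomials([1, 2], [3, 4])` multiplies (1 + 2x) by (3 + 4x). The recursion does O(n log n) work, compared with O(n^2) for term-by-term multiplication.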

  8. Lecture 7 - MFCC • What is the MFCC used for? • Overlapping Windows • Mel Frequency • Spectrogram
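Two of the items above lend themselves to a short sketch: slicing a signal into overlapping windows, and mapping frequency in Hz onto the mel scale. The mel formula below is one commonly used variant (2595 * log10(1 + f / 700)); other variants exist, and the lecture may have used a different one. Function names are hypothetical.

```python
import math


def frame_starts(num_samples: int, frame_len: int, hop_len: int) -> list:
    """Start indices of full frames; frames overlap when hop_len < frame_len."""
    if num_samples < frame_len:
        return []
    return list(range(0, num_samples - frame_len + 1, hop_len))


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mels using a common log-based formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With a frame length of 4 samples and a hop of 2, consecutive frames share half their samples. Under this formula 1000 Hz maps to roughly 1000 mels, and equal steps in mels cover progressively wider ranges in Hz.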

  9. Lecture 8 – Statistical Modeling • Probabilities • Bayes Rule • Bayesians vs. Frequentists • Maximum Likelihood Estimation • Multinomial Distribution • Bernoulli Distribution • Gaussian Distribution • Multidimensional Gaussian • Difference between Classification, Clustering, Regression • Black Swans and the Long Tail
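As a small worked example of Maximum Likelihood Estimation with a Gaussian Distribution: for one-dimensional data, the likelihood is maximized by the sample mean and the (biased, divide-by-n) sample variance. The sketch below assumes a non-empty list of real-valued samples; the function names are hypothetical.

```python
import math


def gaussian_mle(samples):
    """Maximum likelihood mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # The MLE divides by n, not n - 1, so it is a biased variance estimate.
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)
```

If every sample is identical the estimated variance is zero and the density is undefined, which is the one-dimensional version of the singularity problem that arises when fitting mixture models.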

  10. Lecture 9 – Acoustic Modeling • What does an Acoustic Model do? • Gaussian Mixture Model • Potential Problems • Inconsistent Numbers of Gaussians • Singularities • Training Acoustic Models.

  11. Lecture 10 – Hidden Markov Model • The Markov Assumption • Difference between states and observations • Finite State Automata • Decoding using Viterbi • Forced Alignment • Flat Start • Silence
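Viterbi decoding finds the single most likely hidden state sequence for a sequence of observations by keeping, for each time step and state, only the best-scoring path that ends there. The sketch below assumes a discrete HMM stored in plain dictionaries and multiplies raw probabilities for readability; practical recognizers work in log space to avoid underflow. All parameter names are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, probability of that path)."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    backpointer = [{}]

    for t in range(1, len(observations)):
        best.append({})
        backpointer.append({})
        for s in states:
            # Markov assumption: only the previous state matters.
            prev_state = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = (
                best[t - 1][prev_state]
                * trans_p[prev_state][s]
                * emit_p[s][observations[t]]
            )
            backpointer[t][s] = prev_state

    # Trace the backpointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

Forced alignment can be viewed as the same search with the state sequence constrained to follow a known transcript, so that only the timing of the states is free.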

  12. Lecture 11 - Pronunciation Modeling • Dictionary • Finite State Automata • Use in speech recognition • Using morphology for pronunciation modeling • Grapheme to Phoneme Conversion • Letter to Sound rules • Machine Learning for G-to-P

  13. Lecture 12 – Language Modeling • Using a Context Free Grammar to define a set of recognized sequences of words. • Terminals, non-terminals, start state • N-Gram models • Mathematical underpinnings • Theoretical background • How a “word” is defined. • Learning n-gram statistics • Terminology
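Learning n-gram statistics can be illustrated with a bigram model estimated by relative frequency: P(w2 | w1) = count(w1 w2) / count(w1). The sketch below assumes a "word" is any whitespace-separated token and adds hypothetical `<s>` and `</s>` sentence boundary markers; it applies no smoothing, so unseen bigrams are simply absent from the result.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Maximum likelihood bigram probabilities P(w2 | w1)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Assumption: words are whitespace-separated tokens.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        (w1, w2): count / context_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }
```

Changing the tokenization (for example, splitting off punctuation or lowercasing) changes both the vocabulary and the resulting probabilities, which is why the definition of a "word" matters for language modeling.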

  14. Next Class • Midterm Exam • Reading: J&M Chapter 4
