

EE 516 Lecture 1

Geoffrey Zweig

Microsoft Research

4/2/2009

Our Topics

Introducing today!

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Topic Coverage By Day
  • Data Representations and Models (4/23)
    • Vector Quantization
    • Gaussian Mixtures
    • The EM Algorithm
  • Speaker Identification (5/7)
  • Language Identification (5/7)
  • Hidden Markov Models (5/14)
    • Dynamic Programming
  • Building a Speech Recognizer (5/14)
Language Identification – Why Do It?
  • Multi-lingual society
    • Applications should be able to deal with anyone
  • Businesses
    • Automated help systems
    • Reservations, account access, etc.
    • Travel
      • Airport Kiosks
      • Train stations
  • Government
    • Funds research to identify languages
    • Runs evaluations of language identification systems
How Do You Do It?

English Acoustic Model

French Acoustic Model

Output Likeliest

Tamil Acoustic Model

Gaussian Mixture Models - 4/23
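
The "one acoustic model per language, output the likeliest" idea can be sketched in a few lines. This is only a toy illustration, not the course's implementation: the 2-D feature vectors, the diagonal-covariance GMM parameters in `MODELS`, and the function names are all made up for the example. Real systems would use trained models over cepstral features.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under a GMM of (weight, mean, var) tuples."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total

def identify_language(frames, models):
    """Score the utterance with every language model and return the likeliest."""
    scores = {lang: gmm_log_likelihood(frames, gmm) for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models: two mixture components per language, 2-D features.
MODELS = {
    "English": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (1.0, 1.0), (1.0, 1.0))],
    "French": [(0.5, (4.0, 4.0), (1.0, 1.0)), (0.5, (5.0, 5.0), (1.0, 1.0))],
    "Tamil": [(0.5, (-4.0, 4.0), (1.0, 1.0)), (0.5, (-5.0, 5.0), (1.0, 1.0))],
}

utterance = [(4.2, 4.1), (4.9, 5.3), (4.5, 4.4)]  # made-up feature frames
best, scores = identify_language(utterance, MODELS)
print(best)
```

Frames are treated as independent, so the utterance score is just the sum of per-frame log-likelihoods; that simplification is what lets a GMM stand in for a full acoustic model here.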

How Do You Do It? (2)

“p ih n s” – probably English…

“k r p s t” – probably Czech…

Simple HMMs – 5/14

Language Models – 4/30

After Zissman 1996
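
As a rough sketch of the second approach, a recognized phone string can be scored by a phone-sequence language model for each language, and the best-scoring language wins. The example below assumes add-one-smoothed bigram models; the tiny training strings, the `PHONES` inventory, and the function names are invented for illustration and are not taken from the lecture or from Zissman's system.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Return a scorer for an add-one-smoothed phone bigram model."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + seq.split() + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    size = len(vocab) + 1  # possible next symbols: every phone plus </s>

    def log_prob(seq):
        padded = ["<s>"] + seq.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + size))
            for a, b in zip(padded, padded[1:])
        )

    return log_prob

# Hypothetical toy phone strings standing in for phone-recognizer output.
TRAIN = {
    "English": ["p ih n s", "s p ih n", "t ih n", "p ih t s"],
    "Czech": ["k r k", "s t r k", "p r s t", "k r p"],
}
PHONES = {p for seqs in TRAIN.values() for s in seqs for p in s.split()}
LMS = {lang: train_bigram(seqs, PHONES) for lang, seqs in TRAIN.items()}

def guess_language(phone_string):
    """Pick the language whose bigram model gives the phone string the highest score."""
    return max(LMS, key=lambda lang: LMS[lang](phone_string))

print(guess_language("p ih n s"))
print(guess_language("k r p s t"))
```

With this toy data the consonant cluster "k r p s t" scores higher under the Czech model because its bigrams ("k r", "s t") were seen in the Czech training strings and never in the English ones.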

How Do You Do It? (3)

Same methods multiple times

Acero et al., Chapter 4

4/23

After Zissman 1996

How Do You Do It? (4)

Run a complete speech recognizer in each language

And we will see several other ways, and combinations!

After Zissman 1996

Gauging Progress – The NIST Evaluations
  • National Institute of Standards and Technology
  • Has sponsored benchmark tests in multiple language processing areas for over a decade
    • Topic Detection & Tracking
    • Content Extraction
    • Video Analysis
    • Speech Recognition
    • Language Identification
    • Speaker Identification
    • Machine Translation
    • http://www.itl.nist.gov/iad/mig/tests/
  • Coordinated with site funding from the Defense Advanced Research Projects Agency (DARPA)
  • Along with business interest, these evaluations are the driving force in advancing the state of the art
How Well Can It Be Done? – Testing Conditions
  • 26 languages and dialects
  • Telephone speech
  • Multiple duration conditions
    • 3, 10, 30 seconds
  • Detection Error Tradeoff (DET) Curves used to measure performance
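
A DET curve plots the miss probability against the false-alarm probability as the decision threshold is swept. The sketch below computes those operating points from two made-up score lists. The function names and scores are illustrative assumptions, and plotting on the usual normal-deviate axes is left out.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, P_miss, P_false_alarm) for each candidate threshold.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_miss, p_fa))
    return points

def approx_equal_error_rate(points):
    """Error rate at the sampled threshold where miss and false-alarm rates are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2.0

# Hypothetical detector scores: higher means "more likely the target language".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]

points = det_points(targets, nontargets)
eer = approx_equal_error_rate(points)
for threshold, p_miss, p_fa in points:
    print(threshold, p_miss, p_fa)
```

Raising the threshold trades false alarms for misses, which is the tradeoff the curve makes visible; the equal error rate is one common single-number summary of it.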
Language Identification Project
  • Build a language ID system with the CallFriend data set
  • Implement several of the main techniques
  • Set up a demo on your laptop that will recognize someone’s language
Flavors of Speaker Recognition

Our Focus!

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Speaker Recognition – Why Do It?
  • Personal Applications
    • Voice-print passwords
    • Voicemail transcription – who left that message?
  • Business Applications
    • Calling your bank
  • Government
    • Is that Osama calling from Pakistan?
    • Prison call monitoring
    • Automated parolee calling – is he where you think?
How Do You Do It?

The most basic approach:

Gaussian Mixture Models - 4/23

More recently:

Support vector machines operating on GMMs (!)
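
One common way to use GMMs for speaker verification, assumed here because the slide does not spell out the details, is to compare a claimed speaker's GMM against a background GMM with an average log-likelihood ratio. The 1-D features, model parameters, threshold, and function names below are hypothetical, and the SVM variant is not shown.

```python
import math

def gmm_logpdf(x, gmm):
    """Log density of scalar x under a 1-D GMM of (weight, mean, variance) tuples."""
    return math.log(sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in gmm
    ))

def average_llr(frames, speaker_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: speaker model vs. background model."""
    return sum(
        gmm_logpdf(x, speaker_gmm) - gmm_logpdf(x, background_gmm) for x in frames
    ) / len(frames)

def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Accept the identity claim when the average ratio clears the threshold."""
    return average_llr(frames, speaker_gmm, background_gmm) >= threshold

# Hypothetical models: a broad background GMM and a narrower claimed-speaker GMM.
BACKGROUND = [(0.5, -2.0, 4.0), (0.5, 2.0, 4.0)]
SPEAKER = [(0.6, 1.0, 0.5), (0.4, 2.0, 0.5)]

genuine = [1.1, 1.8, 0.9, 2.2]   # made-up frames from the claimed speaker
impostor = [-2.5, -1.0, -3.0]    # made-up frames from someone else

print(verify(genuine, SPEAKER, BACKGROUND))
print(verify(impostor, SPEAKER, BACKGROUND))
```

Averaging over frames keeps the score comparable across utterances of different lengths, so a single threshold can be applied to all trials.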

How Do You Do It? (2)

Also use high-level information!

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

How Well Can It Be Done – Who Salutes?

From NIST 2008 SRE Presentation, Martin & Greenberg

More Salutes

From NIST 2008 SRE Presentation, Martin & Greenberg

From Europe

From NIST 2008 SRE Presentation, Martin & Greenberg

More From Europe

From NIST 2008 SRE Presentation, Martin & Greenberg

U.S. Entries

From NIST 2008 SRE Presentation, Martin & Greenberg

How Well Can It Be Done? – Testing Conditions
  • Conditions for different amounts of data
    • 10 sec.
    • 3-5 minutes
    • 8 minutes
    • Separate channel and summed channel conditions
  • English-speakers, non-English speakers, multilingual speakers
Speaker Verification Project
  • Implement a Speaker-ID system
    • Template based
    • GMM based
    • SVM based
    • Vector space model
  • Demonstrate it:
    • NIST data, e.g. 2001 Evaluation
    • Your own voice – implement on laptop
Speech Recognition Project
  • Implement an HMM based recognition system
  • Use, e.g., the Phonebook isolated-word data set or the Aurora digit set
  • Write features with existing front-end
  • Build your own HMM trainer/decoder
  • Set it up on your laptop for online word recognition (?!)
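
For the decoder half of the project, the core is Viterbi dynamic programming over HMM states. The sketch below is a minimal log-domain version for a discrete-observation HMM. The two-state silence/speech model, its probabilities, and the "low"/"high" energy symbols are invented for illustration; a real recognizer would use word or phone HMMs over acoustic feature vectors.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log probability."""
    # best[t][s]: best log score of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-state HMM over quantized frame energy.
STATES = ["sil", "speech"]
LOG_START = logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {
    "sil": logs({"sil": 0.7, "speech": 0.3}),
    "speech": logs({"sil": 0.2, "speech": 0.8}),
}
LOG_EMIT = {
    "sil": logs({"low": 0.9, "high": 0.1}),
    "speech": logs({"low": 0.2, "high": 0.8}),
}

obs = ["low", "low", "high", "high", "low", "low"]
path, log_score = viterbi(obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)
```

Working in log probabilities avoids numerical underflow on long utterances, and the back-pointer table is what lets the best path be recovered after the forward pass.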
Highlights of Syllabus
  • Required Texts:
    • Huang, Acero, Hon: Spoken Language Processing
    • Deng and O’Shaughnessy, Speech Processing
    • EE516 Reader, at Professional Copy ‘n Print, 4200 University Way
  • Grading:
    • Projects: 50%
    • Final Exam: 30%
    • Homework: 20%
  • Projects:
    • Small team or individual
      • Teams are self-forming
    • Presentation times TBD
    • Read ahead & pick an area!!!
      • Talk to relevant instructor
    • Suggest deciding no later than 4/30
  • Office Hours at end of class and by appointment
  • Please sign in on email list!