The 1980’s

  • Collection of large standard corpora

  • Front ends: auditory models, dynamics

  • Engineering: scaling to large vocabulary continuous speech

  • Second major (D)ARPA ASR project

  • HMMs become ready for prime time


Standard Corpora Collection

  • Before 1984, chaos

  • TIMIT

  • RM (later WSJ)

  • ATIS

  • NIST, ARPA, LDC


Front Ends in the 1980’s

  • Mel cepstrum (Bridle, Mermelstein)

  • PLP (Hermansky)

  • Delta cepstrum (Furui)

  • Auditory models (Seneff, Ghitza, others)


Mel Frequency Scale
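
The mel scale warps frequency so that equal steps correspond roughly to equal perceived pitch steps. A minimal sketch follows, assuming one widely used analytic approximation (2595 log10(1 + f/700)); the slide does not say which formula its plot used, and the function names here are illustrative only.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (assumed approximation, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced_centers(f_low, f_high, n):
    """Return n center frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n)]
```

With this approximation 1000 Hz maps to about 1000 mel, and centers returned by `mel_spaced_centers` grow further apart in Hz as frequency rises.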


Spectral vs Temporal Processing

[Figure, two panels. Spectral processing: analysis (e.g., cepstral) across frequency. Temporal processing: processing (e.g., mean removal) across time.]
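
Mean removal, the temporal-processing example named on this slide, can be sketched as follows. This is a minimal illustration that assumes the features are a list of frames, each a list of cepstral coefficients, and that the mean is taken over the whole utterance; `remove_mean` is a hypothetical name.

```python
def remove_mean(frames):
    """Subtract each coefficient's mean over time from every frame."""
    n = len(frames)
    dim = len(frames[0])
    # One mean per coefficient index, computed along the time axis.
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

Each coefficient is processed independently along time, which is what distinguishes this from spectral processing within a single frame.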


Dynamic Speech Features

  • temporal dynamics useful for ASR

  • local time derivatives of cepstra

  • “delta” features estimated over multiple frames (typically 5)

  • usually augments static features

  • can be viewed as a temporal filter
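
A delta feature over 5 frames can be sketched as a linear-regression slope across the window. The slide does not give a formula, so the regression form, the edge handling (repeating the first and last frame), and the name `delta` are all assumptions made for illustration.

```python
def delta(frames, half_width=2):
    """Regression-style time derivative over 2*half_width+1 frames."""
    n = len(frames)
    dim = len(frames[0])
    # Normalizer: sum of k^2 over the window (10 when half_width is 2).
    denom = sum(k * k for k in range(-half_width, half_width + 1))
    out = []
    for t in range(n):
        d = [0.0] * dim
        for k in range(-half_width, half_width + 1):
            # Assumed edge handling: clamp to the first/last frame.
            idx = min(max(t + k, 0), n - 1)
            for j in range(dim):
                d[j] += k * frames[idx][j]
        out.append([v / denom for v in d])
    return out

# Feeding a unit impulse shows the temporal-filter view of the operation.
impulse_response = delta([[0.0], [0.0], [1.0], [0.0], [0.0]])
```

For the unit impulse, `impulse_response` is 0.2, 0.1, 0.0, -0.1, -0.2, which matches the values on the “Delta” impulse response slide. In practice the delta vector is appended to the static features rather than replacing them.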


“Delta” impulse response

[Figure: impulse response of the “delta” filter; horizontal axis in frames from -2 to 2, vertical axis from -.2 to .2.]


HMMs for Continuous Speech

  • Using dynamic programming for continuous speech (Vintsyuk, Bridle, Sakoe, Ney, ...)

  • Application of Baker-Jelinek ideas to continuous speech (IBM, BBN, Philips, ...)

  • Multiple groups developing major HMM systems (CMU, SRI, Lincoln, BBN, AT&T)

  • Engineering development - coping with data, fast computers
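
The dynamic-programming idea behind HMM decoding can be sketched with a small Viterbi search. This is a toy illustration, not any of the named groups' systems: the two-state left-to-right model, its probabilities, and all names are assumptions chosen to keep the example short.

```python
import math

NEG_INF = float("-inf")

def viterbi(obs_loglik, log_trans, log_init):
    """Return (best log score, best state path).

    obs_loglik[t][s] is the log-likelihood of frame t under state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_score, bp = [], []
        for s in range(n_states):
            # Keep only the best predecessor for each state.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + obs_loglik[t][s])
            bp.append(best_prev)
        score = new_score
        backptrs.append(bp)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return score[last], path

# Toy model (assumed values): start in state 0, may move to state 1.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
obs_loglik = [[math.log(0.9), math.log(0.1)],
              [math.log(0.8), math.log(0.2)],
              [math.log(0.1), math.log(0.9)]]
best, path = viterbi(obs_loglik, log_trans, log_init)
```

Working in log probabilities avoids underflow on long utterances, and the cost grows linearly with the number of frames rather than exponentially with the number of possible state sequences.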


2nd (D)ARPA Project

  • Common task

  • Frequent evaluations

  • Convergence to good, but similar, systems

  • Lots of engineering development: now up to 60,000-word recognition, in real time, on a workstation, with less than 10% word error

  • Competition inspired others outside the project: Cambridge developed HTK, now widely distributed
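
Word error is conventionally the number of substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below assumes whitespace-separated words and a non-empty reference; the function name and the example sentences are illustrative, not taken from the evaluations.

```python
def word_error_rate(reference, hypothesis):
    """Return word error rate in percent via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            substitute = prev[j - 1] + (r != h)
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur.append(min(substitute, delete, insert))
        prev = cur
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, this measure can exceed 100% when the hypothesis is much longer than the reference.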


Knowledge vs. Ignorance

  • Using acoustic-phonetic knowledge in explicit rules

  • Ignorance represented statistically

  • Ignorance-based approaches (HMMs) “won”, but

  • Knowledge (e.g., segments) becoming statistical

  • Statistics incorporating knowledge


Some 1990’s Issues

  • Independence from long-term spectrum

  • Adaptation

  • Effects of spontaneous speech

  • Information retrieval/extraction with broadcast material

  • Query-style systems (e.g., ATIS)

  • Applying ASR technology to related areas (language ID, speaker verification)


Where Pierce Letter Applies

  • We still need science

  • Need language, intelligence

  • Acoustic robustness still poor

  • Perceptual research, models

  • Fundamentals of statistical pattern recognition for sequences

  • Robustness to accent, stress, rate of speech, ...


Progress in 25 Years

  • From digits to 60,000 words

  • From single speakers to many

  • From isolated words to continuous speech

  • From no products to many products, some systems actually saving LOTS of money


Real Uses

  • Telephone: phone company services (collect versus credit card)

  • Telephone: call centers for query information (e.g., stock quotes, parcel tracking)

  • Dictation products: continuous recognition, speaker dependent/adaptive


But:

  • Still <97% accurate on “yes” for telephone

  • Unexpected rate of speech causes doubling or tripling of error rate

  • Unexpected accent hurts badly

  • Accuracy on unrestricted speech at 60%

  • Don’t know when we know

  • Few advances in basic understanding


Confusion Matrix for Digit Recognition

Class     1    2    3    4    5    6    7    8    9    0   Error Rate (%)
1       191    0    0    5    1    0    1    0    2    0    4.5
2         0  188    2    0    0    1    3    0    0    6    6.0
3         0    3  191    0    1    0    2    0    3    0    4.5
4         8    0    0  187    4    0    1    0    0    0    6.5
5         0    0    0    0  193    0    0    0    7    0    3.5
6         0    0    0    0    1  196    0    2    0    1    2.0
7         2    2    0    2    0    1  190    0    1    2    5.0
8         0    1    0    0    1    2    2  196    0    0    2.0
9         5    0    2    0    8    0    3    0  179    3   10.5
0         1    4    0    0    0    1    1    0    1  192    4.5

Overall error rate 4.85%
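
A per-class error rate can be read off one row of a confusion matrix as the share of tokens that fall off the diagonal. The sketch below assumes each row holds the counts for one spoken digit, with columns ordered 1 through 9 and then 0 as on the slide; the function name is illustrative, and the two rows are the slide's counts for digits 1 and 9.

```python
def class_error_rate(row, correct_index):
    """Percent of tokens in this row not recognized as the correct class."""
    total = sum(row)
    return 100.0 * (total - row[correct_index]) / total

# Counts from the slide; columns are 1, 2, ..., 9, 0.
row_digit_1 = [191, 0, 0, 5, 1, 0, 1, 0, 2, 0]
row_digit_9 = [5, 0, 2, 0, 8, 0, 3, 0, 179, 3]

rate_1 = class_error_rate(row_digit_1, 0)  # digit 1 is the first column
rate_9 = class_error_rate(row_digit_9, 8)  # digit 9 is the ninth column
```

These give 4.5 and 10.5, the values listed for digits 1 and 9. The off-diagonal entries also show which confusions dominate, for example 9 being recognized as 5 eight times.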


Large Vocabulary CSR

[Figure: error rate (%) by year, ’88 to ’94. Legend: RM (1K words, PP 60); WSJ0, WSJ1 (5K, 20-60K words, PP 100).]

