The 1980’s

PowerPoint Slideshow about ' The 1980’s' - gary-dean



The 1980’s

  • Collection of large standard corpora

  • Front ends: auditory models, dynamics

  • Engineering: scaling to large vocabulary continuous speech

  • Second major (D)ARPA ASR project

  • HMMs become ready for prime time


Standard Corpora Collection

  • Before 1984, chaos

  • TIMIT

  • RM (later WSJ)

  • ATIS

  • NIST, ARPA, LDC


Front Ends in the 1980’s

  • Mel cepstrum (Bridle, Mermelstein)

  • PLP (Hermansky)

  • Delta cepstrum (Furui)

  • Auditory models (Seneff, Ghitza, others)



Spectral vs Temporal Processing

[Figure: two panels with time and frequency axes. Spectral processing: analysis (e.g., cepstral). Temporal processing: processing (e.g., mean removal).]

Dynamic Speech Features

  • temporal dynamics useful for ASR

  • local time derivatives of cepstra

  • “delta” features estimated over multiple frames (typically 5)

  • usually augments static features

  • can be viewed as a temporal filter
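A delta estimate over 5 frames can be sketched as a linear regression slope across the window. This is a minimal sketch of one common form, assumed here rather than taken from the slides; the names `delta_features` and `augment`, and the choice to repeat the first and last frame at the edges, are illustrative assumptions.

```python
def delta_features(frames, half_window=2):
    """Estimate local time derivatives of per-frame feature vectors.

    frames: list of equal-length lists, one cepstral vector per frame.
    half_window=2 gives a 5-frame window.
    Assumption: edge frames are handled by repeating the first/last frame.
    """
    if not frames:
        return []
    # Normalizer of the regression slope: 2 * (1^2 + 2^2) = 10 for half_window=2.
    denom = 2 * sum(k * k for k in range(1, half_window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, half_window + 1):
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def augment(frames, half_window=2):
    """Append the delta vector to each static vector."""
    return [s + d for s, d in zip(frames, delta_features(frames, half_window))]


ramp = [[float(t)] for t in range(6)]  # one coefficient rising by 1 per frame
print(delta_features(ramp)[2])         # slope at an interior frame
print(augment(ramp)[2])                # static value followed by its delta
```

With `half_window=2` the weights applied to frames t-2 through t+2 are -0.2, -0.1, 0, 0.1, 0.2, which is why the operation can be read as a fixed temporal filter.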


“Delta” impulse response

[Figure: impulse response of the delta filter. Horizontal axis: frames, -2 to 2. Vertical axis: -.2 to .2.]


HMMs for Continuous Speech

  • Using dynamic programming for continuous speech (Vintsyuk, Bridle, Sakoe, Ney, ...)

  • Application of Baker-Jelinek ideas to continuous speech (IBM, BBN, Philips, ...)

  • Multiple groups developing major HMM systems (CMU, SRI, Lincoln, BBN, AT&T)

  • Engineering development - coping with data, fast computers
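The dynamic-programming search behind HMM decoding can be sketched with a Viterbi pass over a toy discrete model. This is a minimal illustration only: the two states, the "lo"/"hi" observation symbols, and every probability below are invented for the example and do not come from any of the systems named above.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_state_path, best_log_score) for a discrete HMM."""
    if not observations:
        raise ValueError("need at least one observation")
    # best[s]: best log score of any state path ending in s at the current frame.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointer = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            new_best[s] = best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointer[s] = prev
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final state.
    final = max(states, key=lambda s: best[s])
    path = [final]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, best[final]


def logs(probs):
    """Convert a dict of probabilities to log probabilities."""
    return {key: math.log(p) for key, p in probs.items()}


# Hypothetical toy model: silence vs. speech, low vs. high frame energy.
STATES = ("sil", "speech")
LOG_START = logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {"sil": logs({"sil": 0.7, "speech": 0.3}),
             "speech": logs({"sil": 0.2, "speech": 0.8})}
LOG_EMIT = {"sil": logs({"lo": 0.9, "hi": 0.1}),
            "speech": logs({"lo": 0.2, "hi": 0.8})}

path, score = viterbi(["lo", "hi", "hi"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)  # ['sil', 'speech', 'speech']
```

Working in log probabilities keeps long utterances from underflowing, and the cost grows linearly with the number of frames, which is part of what made scaling to continuous speech practical.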


2nd (D)ARPA Project

  • Common task

  • Frequent evaluations

  • Convergence to good, but similar, systems

  • Lots of engineering development - now up to 60,000 word recognition, in real time, on a workstation, with less than 10% word error

  • Competition inspired others not in project - Cambridge did HTK, now widely distributed
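Word error in evaluations of this kind is usually reported as the number of substitutions, deletions, and insertions divided by the number of reference words. The sketch below assumes that standard definition and plain whitespace tokenization; the function name and the example sentences are hypothetical, and real scoring tools apply additional text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One deleted word ("me") and one substituted word out of five reference words.
print(word_error_rate("show me flights to boston", "show flights to austin"))
```

Because insertions count as errors, this measure can exceed 100% when the hypothesis is much longer than the reference.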


Knowledge vs. Ignorance

  • Using acoustic-phonetic knowledge in explicit rules

  • Ignorance represented statistically

  • Ignorance-based approaches (HMMs) “won”, but

  • Knowledge (e.g., segments) becoming statistical

  • Statistics incorporating knowledge


Some 1990’s Issues

  • Independence from long-term spectrum

  • Adaptation

  • Effects of spontaneous speech

  • Information retrieval/extraction with broadcast material

  • Query-style systems (e.g., ATIS)

  • Applying ASR technology to related areas (language ID, speaker verification)


Where Pierce Letter Applies

  • We still need science

  • Need language, intelligence

  • Acoustic robustness still poor

  • Perceptual research, models

  • Fundamentals of statistical pattern recognition for sequences

  • Robustness to accent, stress, rate of speech, ...


Progress in 25 Years

  • From digits to 60,000 words

  • From single speakers to many

  • From isolated words to continuous speech

  • From no products to many products, some systems actually saving LOTS of money


Real Uses

  • Telephone: phone company services (collect versus credit card)

  • Telephone: call centers for query information (e.g., stock quotes, parcel tracking)

  • Dictation products: continuous recognition, speaker dependent/adaptive


But:

  • Still <97% accurate on “yes” for telephone

  • Unexpected rate of speech causes doubling or tripling of error rate

  • Unexpected accent hurts badly

  • Accuracy on unrestricted speech at 60%

  • Don’t know when we know

  • Few advances in basic understanding


Confusion Matrix for Digit Recognition

Class    1    2    3    4    5    6    7    8    9    0   Error rate (%)
1      191    0    0    5    1    0    1    0    2    0    4.5
2        0  188    2    0    0    1    3    0    0    6    6.0
3        0    3  191    0    1    0    2    0    3    0    4.5
4        8    0    0  187    4    0    1    0    0    0    6.5
5        0    0    0    0  193    0    0    0    7    0    3.5
6        0    0    0    0    1  196    0    2    0    1    2.0
7        2    2    0    2    0    1  190    0    1    2    5.0
8        0    1    0    0    1    2    2  196    0    0    2.0
9        5    0    2    0    8    0    3    0  179    3   10.5
0        1    4    0    0    0    1    1    0    1  192    4.5

Overall error rate 4.85%
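The per-class error rates in the matrix follow from each row: the off-diagonal counts divided by the row total. The sketch below checks this for the rows of classes 1, 2, and 3 as printed; the function name `per_class_error_rates` is hypothetical, and the code assumes row i has its correct count in column i.

```python
def per_class_error_rates(labels, rows):
    """Percent of each row's count that falls off the diagonal.

    Assumption: rows[i][i] holds the correct count for labels[i].
    """
    rates = {}
    for i, (label, row) in enumerate(zip(labels, rows)):
        total = sum(row)
        correct = row[i]
        rates[label] = 100.0 * (total - correct) / total
    return rates


# Rows for classes 1, 2, and 3; columns in the order 1 2 3 4 5 6 7 8 9 0.
labels = ["1", "2", "3"]
rows = [
    [191, 0, 0, 5, 1, 0, 1, 0, 2, 0],
    [0, 188, 2, 0, 0, 1, 3, 0, 0, 6],
    [0, 3, 191, 0, 1, 0, 2, 0, 3, 0],
]
print(per_class_error_rates(labels, rows))  # {'1': 4.5, '2': 6.0, '3': 4.5}
```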


Large Vocabulary CSR

[Figure: error rate (%) versus year, ‘88 to ‘94. Dashed line: RM (1K words, PP 60). Solid line: WSJ0, WSJ1 (5K, 20-60K words, PP 100).]

