Early error detection on word level

Gabriel Skantze and Jens Edlund

{gabriel,[email protected]

Centre for Speech Technology

Department of Speech, Music and Hearing

KTH, Sweden


Overview

  • How do we handle errors in conversational human-computer dialogue?

  • Which features are useful for error detection in ASR results?

  • Two studies on selected features:

    • Machine learning

    • Human subjects’ judgement

Error detection

  • Early error detection

    • Detect if a given recognition result contains errors

    • e.g. Litman, D. J., Hirschberg, J., & Swerts, M. (2000).

  • Late error detection

    • Feed back the interpretation of the utterance to the user (grounding)

    • Based on the user’s reaction to that feedback, detect errors in the original utterance

    • e.g. Krahmer, E., Swerts, M., Theune, M., & Weegels, M. E. (2001).

  • Error prediction

    • Detect that errors may occur later on in the dialogue

    • e.g. Walker, M. A., Langkilde-Geary, I., Wright Hastie, H., Wright, J., & Gorin, A. (2002).

Why early error detection?

  • ASR errors reflect errors in acoustic and language models. Why not fix them there?

    • Post-processing can compensate for systematic errors in the models that arise from a mismatch between training and usage conditions.

    • Post-processing may help to pinpoint the actual problems in the models.

    • Post-processing can include factors not considered by the ASR, such as:

      • Prosody

      • Semantics

      • Dialogue history

Corpus collection









Transcription: I have the lawn on my right and a house with number two on my left

ASR result: i have the lawn on right is and a house with from two on left
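
Word-level error detection needs a correct/incorrect label for every recognised word. The slides do not say how these labels were produced. The following is a minimal sketch, assuming the labels come from aligning the ASR result with the transcription; it uses Python's standard `difflib`, and the function name `label_asr_words` is hypothetical.

```python
from difflib import SequenceMatcher


def label_asr_words(reference: str, hypothesis: str) -> list[tuple[str, bool]]:
    """Pair each ASR word with True if it aligns to the same word in the transcription."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    correct = [False] * len(hyp)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Every word inside a matching block is aligned to an identical reference word.
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            correct[j] = True
    return list(zip(hyp, correct))


reference = "I have the lawn on my right and a house with number two on my left"
hypothesis = "i have the lawn on right is and a house with from two on left"

labels = label_asr_words(reference, hypothesis)
errors = [word for word, ok in labels if not ok]
print(errors)  # ['is', 'from']
```

Words that the recogniser dropped (here both occurrences of "my") have no counterpart in the ASR result, so they receive no label under this scheme.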

Study I: Machine learning

  • 4470 words

  • 73.2% correct (baseline)

  • 4/5 training data, 1/5 test data

  • Two ML algorithms tested

    • Transformation-based learning (µ-TBL)

      • Learn a cascade of rules that transforms the classification

    • Memory-based learning (TiMBL)

      • Simply store each training instance in memory

      • Compare the test instance to the stored instances and find the closest match


  • Content words:

    • Baseline: 69.8%, µ-TBL: 87.7%, TiMBL: 87.0%
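
The memory-based approach and the majority-class baseline on this slide can be sketched as follows. This is an illustration only, not TiMBL itself: the feature set (word, confidence bin, previous word), the toy instances, and the unweighted overlap distance are all assumptions made for the example.

```python
from collections import Counter

# Toy instances, invented for illustration only: (features, label).
# Features: (word, confidence bin, previous word); label: is the word correct?
train = [
    (("lawn", "high", "the"), True),
    (("house", "high", "a"), True),
    (("from", "low", "with"), False),
    (("is", "low", "right"), False),
    (("two", "high", "number"), True),
    (("left", "mid", "my"), True),
]


def overlap_distance(a, b):
    """Number of feature positions on which two instances disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, memory):
    """Return the label of the closest stored instance (first one wins ties)."""
    _, label = min(memory, key=lambda item: overlap_distance(instance, item[0]))
    return label


def majority_baseline(memory):
    """Label that a classifier always guessing the most common class would give."""
    return Counter(label for _, label in memory).most_common(1)[0][0]


test_instance = ("from", "low", "number")
print(classify(test_instance, train))  # False
print(majority_baseline(train))        # True
```

The baseline figures on the slide correspond to the second function: always guessing "correct" is right as often as words are correct in the data.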

Study II: Human error detection

  • First 15 user utterances from 4 dialogues with high WER

  • 50% of the words correct (baseline)

  • 8 judges

  • Features were varied for each utterance:

    • ASR information

    • Context information
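
The dialogues were selected by word error rate (WER). As a reminder of what that measures, here is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words), applied to the example utterance pair from the corpus slide. The function name is hypothetical, and nothing here is taken from the authors' tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "I have the lawn on my right and a house with number two on my left"
hypothesis = "i have the lawn on right is and a house with from two on left"
print(word_error_rate(reference, hypothesis))  # 0.25
```

Four edits against sixteen reference words give a WER of 25% for this utterance.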

The judges’ interface

  • Correction field

  • Dialogue so far

  • 5-best list

  • Grey scale reflects word confidence

  • Utterance confidence

Conclusions & Discussion

  • ML can be used for early error detection on word level, especially for content words.

  • Word confidence scores have some use.

  • Utterance context and lexical information improve the ML performance.

  • A rule-learning algorithm such as transformation-based learning can be used to pinpoint the specific problems.

  • N-best lists are useful for human subjects. How do we operationalise them for ML?
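
To illustrate why a rule-learning algorithm helps pinpoint problems, here is a heavily simplified sketch in the spirit of transformation-based learning. It is not µ-TBL: the single-condition rule template, the feature names (`conf`, `content`), and the toy data are all assumptions made for the example. Every word starts with the majority label, and the learner greedily adds the rule that fixes the most errors on balance.

```python
from collections import Counter

# Toy data, invented for illustration: ({feature: value}, is_correct)
data = [
    ({"conf": "high", "content": True}, True),
    ({"conf": "high", "content": False}, True),
    ({"conf": "low", "content": True}, False),
    ({"conf": "low", "content": True}, False),
    ({"conf": "low", "content": False}, True),
    ({"conf": "high", "content": True}, True),
]


def score(rule, data, current):
    """Errors fixed minus errors introduced if the rule were applied now."""
    feature, value, new_label = rule
    net = 0
    for (features, gold), label in zip(data, current):
        if features[feature] == value and label != new_label:
            net += 1 if new_label == gold else -1
    return net


def learn_rules(data, max_rules=5):
    """Return the initial label and a cascade of (feature, value, new_label) rules."""
    gold = [label for _, label in data]
    majority = Counter(gold).most_common(1)[0][0]
    current = [majority] * len(data)  # initial classification
    candidates = sorted(
        {(f, v, lab) for feats, _ in data for f, v in feats.items()
         for lab in (True, False)},
        key=repr,  # fixed order so ties are broken deterministically
    )
    rules = []
    for _ in range(max_rules):
        best = max(candidates, key=lambda rule: score(rule, data, current))
        if score(best, data, current) <= 0:
            break  # no rule improves the classification any further
        rules.append(best)
        feature, value, new_label = best
        current = [new_label if feats[feature] == value else label
                   for (feats, _), label in zip(data, current)]
    return majority, rules


print(learn_rules(data))
# (True, [('conf', 'low', False), ('content', False, True)])
```

The learned cascade is readable as a diagnosis: low-confidence words are relabelled as incorrect, and a second rule then restores function words to correct, which shows where the first rule over-generalised.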

Conclusions & Discussion

  • ML performance improved only slightly when discourse context was added.

    • Further work on operationalising context for ML should focus on the previous utterance.

  • The classifier should be tested together with a parser or keyword spotter to see if it can improve performance.

  • Other features should be investigated, such as prosody. These may improve performance further.


The End

Thank you for your attention!