Contextual Information for Improved Speech Recognition in Dialog Systems

Using Information State to Improve Dialog Move Identification in a Spoken Dialog System
Hua Ai, Antonio Roque, Anton Leuski, David Traum
ISSP, University of Pittsburgh; ICT, University of Southern California

System Overview

Radiobot-CFF is a dialog agent built for Call for Fire (CFF) radio dialogs. In a CFF, a Forward Observer (FO) team (played by a trainee) identifies a target to be attacked by indirect artillery fire. A Fire Direction Center (FDC) (played by a computer dialog agent) sends an artillery mission. The FO and the FDC coordinate the attack.

[Figure: System Architecture. Labels: Training Corpora, Human Voice (raw sound data), Training Algorithms, Testing Corpora, Speech Recognizer, CRF, J48, DM, DP, Word, Transcription, Interpreter, ASR Output, Simulator, Dialog Manager (Receive command, Display explosion), Generation, Voice Over Radio. Example shown in the figure: the words "steel one nine this is gator nine" are interpreted as Identification with fdc-id = steel one nine and fo-id = gator nine; the reply is Confirm Identification with fo-id = gator nine one and fdc-id = steel one nine.]

Background

Different types of contextual information are used in different ways:
Use other words surrounding the utterance of interest as context (e.g., Taylor et al., 1998).
Use dialog state as context (e.g., Bohus and Rudnicky, 2003).
Use dialog moves and dialog parameters as context in dialog managers that employ the information state approach (e.g., Traum and Larsson, 2003).
Use context to provide expectations of likely utterances (Smith and Hipp, 1994).

Our Approach

Use 9 information state features as contextual information, and treat that contextual information as features that the interpreter can use.
Examples of contextual features: Has target location, Phase, Method of fire, etc.

Example Dialog

FO: steel one niner this is gator niner one adjust fire polar over
FDC: gator nine one this is steel one nine adjust fire polar out
…

Experiments

[Table: Predicted Classes]

Hypothesis

Adding contextual information can improve the performance of the dialog move and dialog parameter taggers and can help identify ASR quality (c: correct, s: substituted, i: inserted, d: deleted).

Results

The results shown here use the best model: a CRF tagger trained on ASR output. Refer to the paper for other results.

[Figures: Corpus Evaluation Tagging Accuracies; Online Evaluation Tagging Accuracies. Chart labels: Transcription, ASR Output, w/o context, w context.]

Conclusion

Using contextual information improves the performance of the Dialog Move and Dialog Parameter taggers.
It is possible to recover from ASR errors using a tagger with contextual information.
The Conditional Random Field tagger outperforms the J48 decision tree on DM, DP, and word prediction.
Taggers trained on ASR output outperform taggers trained on transcriptions.
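The idea of treating information state as extra tagger features can be illustrated with a small sketch. This is not the authors' implementation: the function names, the word-window features, the `ctx:` prefix, and the feature values are all assumptions made for illustration. The poster names only three of the nine contextual features (Has target location, Phase, Method of fire). The output is a list of per-token feature dictionaries, a shape that sequence taggers such as CRFs commonly accept.

```python
from typing import Dict, List

# Hypothetical information-state snapshot. Only the three feature names come
# from the poster; the values used below are invented for illustration.
InfoState = Dict[str, str]


def token_features(tokens: List[str], i: int, context: InfoState,
                   use_context: bool = True) -> Dict[str, str]:
    """Build features for token i: a small word window, plus (optionally)
    the same dialog-level context features copied onto every token."""
    feats = {
        "word": tokens[i],
        "prev_word": tokens[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "</s>",
    }
    if use_context:
        for name, value in context.items():
            feats["ctx:" + name] = value
    return feats


def utterance_features(tokens: List[str], context: InfoState,
                       use_context: bool = True) -> List[Dict[str, str]]:
    """One feature dictionary per token of the utterance."""
    return [token_features(tokens, i, context, use_context)
            for i in range(len(tokens))]


# ASR output string taken from the poster's architecture figure.
asr_output = "steel one nine this is gator nine".split()
state = {  # invented values
    "has_target_location": "no",
    "phase": "identification",
    "method_of_fire": "none",
}

with_ctx = utterance_features(asr_output, state)
without_ctx = utterance_features(asr_output, state, use_context=False)
print(with_ctx[0])
```

Building both feature sets from the same utterance mirrors the with-context versus without-context comparison on the poster: the same tagger can be trained on each set and the tagging accuracies compared.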

