
MAI Internship April-May 2002


mcgurk

Presentation Transcript


  1. MAI Internship April-May 2002

  2. What?
     • The AST Project promotes the development of speech technology for the official languages of South Africa
     • South African English (SAE), Afrikaans, Zulu, Xhosa, Sesotho
     • Create reusable databases and software
     • Prototype: hotel booking dialogue system
     • 2000-2003

  3. AST dialogue system: basics
     [Diagram with components: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Database, Speech Synthesis]

  4. AST Speech Database
     • Use?
       → input ASR: acoustic training
       → output ASR: dictionary
     • Start from scratch, even for SAE
     • Telephone data based on SpeechDat
     • Datasheet utterances
     • Hierarchical recruiting method
     • Labeling tool: PRAAT

  5. Language spoken                 Code   No. of speakers
     1 English (E)                          1500-2000
       Speech varieties:
         Mother-tongue English       EE     300-400
         Black English               BE     300-400
         Coloured English            CE     300-400
         Asian English               ASE    300-400
         Afrikaans English           AE     300-400
     2 isiXhosa (X)                  XX     300-400
     3 Sesotho (S)                   SS     300-400
     4 isiZulu (Z)                   ZZ     300-400
     5 Afrikaans (A)                        900-1200
       Speech varieties:
         Mother-tongue Afrikaans     AA     300-400
         Black Afrikaans             BA     300-400
         Coloured Afrikaans          CA     300-400

  6. AST Speech Database
     Acoustic signal
       → (manual labour) → Orthographic annotation
       → (rules & dictionary: Patana) → Phonemic transcription
       → (forced alignment: HTK) → Phonetic alignment

  7. AST Speech Recognition
     • Difficult:
     • Speaker independent, noisy conditions
     • Medium-size vocabulary (10,000 words)
     • Training data sparse
     • Not so difficult:
     • Dialogue Manager helps
     • Phoneme-based HMMs → diphones in future
     • Finite-state language model
     • Pitch and clicks of African languages ignored

  8. AST Natural Language Understanding
     • Same finite-state network as the recogniser's language model
     • +: all utterances ‘understood’
     • -: FSGs (finite-state grammars) are limited
     • Makes no sense to recognise more than we can understand
     • Semantic labels are activated
     • Alternative: robust parsing (Phoenix, ATIS)
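The idea of one finite-state network serving as both language model and understanding component can be illustrated with a toy grammar whose arcs carry semantic labels. This is a minimal sketch only: the grammar, the state names, and the label format (`action=book`, `room=double`) are hypothetical and not taken from the AST system.

```python
from typing import List, Optional

# Hypothetical toy grammar (not the AST grammar):
# state -> {word: (next_state, semantic_label_or_None)}
GRAMMAR = {
    "start": {"i": ("want", None), "book": ("obj", "action=book")},
    "want": {"want": ("to", None)},
    "to": {"to": ("verb", None)},
    "verb": {"book": ("obj", "action=book"), "cancel": ("obj", "action=cancel")},
    "obj": {
        "a": ("obj", None),
        "single": ("room", "room=single"),
        "double": ("room", "room=double"),
    },
    "room": {"room": ("end", None)},
}
FINAL_STATES = {"end"}


def understand(utterance: str) -> Optional[List[str]]:
    """Walk the network word by word and collect the labels on the arcs taken.

    Returns the activated labels if the utterance is accepted,
    or None if the grammar does not cover it.
    """
    state = "start"
    labels: List[str] = []
    for word in utterance.lower().split():
        arcs = GRAMMAR.get(state, {})
        if word not in arcs:
            return None  # outside the grammar: neither recognised nor understood
        state, label = arcs[word]
        if label is not None:
            labels.append(label)
    return labels if state in FINAL_STATES else None


if __name__ == "__main__":
    print(understand("I want to book a double room"))
    print(understand("I would like a room"))
```

Every utterance the network accepts comes with its labels, which is the "+" on the slide; anything outside the network is rejected outright, which is the "-".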

  9. AST Natural Language Understanding
     [Diagram with components: Speech Recognition, NLU, Dialogue Manager, FSG; labels: Recognised utterance, Meaning, Grammar ID (twice)]

  10. AST Natural Language Understanding
      • Embedded semantic tags:
        ‘drie honderd duisend agt en neëntig’ (Afrikaans for 300,098) → 3 0 0 0 9 8
        t1=3 t2=0 t3=0
        V6=3 V5=0 V4=0 V3=0 V2=9 V1=8
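The mapping from the spoken number to the digit slots V6 to V1 can be reproduced with a small converter. This is a sketch under assumptions: the AST system embeds its tags in the grammar itself, whereas the code below simply computes the value from the words and then splits it into digits. The function names and the reduced vocabulary (no teens, nothing above thousands) are hypothetical, and the t1 to t3 tags are not modelled.

```python
import unicodedata

# Reduced Afrikaans number vocabulary, for illustration only.
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 20, "dertig": 30, "veertig": 40, "vyftig": 50,
        "sestig": 60, "sewentig": 70, "tagtig": 80, "neëntig": 90}


def words_to_value(phrase: str) -> int:
    """Convert a spoken Afrikaans number (limited vocabulary) to an integer."""
    total, current = 0, 0
    # Normalise so that 'ë' compares equal however it was encoded.
    for word in unicodedata.normalize("NFC", phrase).lower().split():
        if word == "en":            # 'agt en neëntig' = eight and ninety = 98
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "honderd":
            current = max(current, 1) * 100
        elif word == "duisend":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"word not in vocabulary: {word!r}")
    return total + current


def digit_slots(value: int, width: int = 6) -> dict:
    """Split a value into slots V<width>..V1, most significant digit first."""
    if not 0 <= value < 10 ** width:
        raise ValueError("value does not fit in the available slots")
    digits = f"{value:0{width}d}"
    return {f"V{width - i}": int(d) for i, d in enumerate(digits)}


if __name__ == "__main__":
    value = words_to_value("drie honderd duisend agt en neëntig")
    print(value, digit_slots(value))
```

For the slide's example the sketch yields 300098 and the slots V6=3, V5=0, V4=0, V3=0, V2=9, V1=8.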

  11. AST Dialogue Manager
      • Trade-off: naturalness vs. response restriction
      • System-directed: predictable user utterances, simple dialogues
      • Mixed-initiative: shorter dialogues, more recognition errors
      • User-initiative: unpopular

  12. AST Dialogue Manager
      • Design:
      • Early focus on users and task
      • Wizard-of-Oz: pay no attention to the man behind the curtain
      • System-in-the-loop
      • Finite-state structure because of simplicity and functionality
      • Possible frame-based approach in future
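A finite-state, system-directed dialogue manager of the kind described can be sketched as a table of states, each with one prompt, a restricted set of accepted answers, and a fixed successor. The states, prompts, and slot names below are hypothetical and are not taken from the AST hotel booking prototype.

```python
# Hypothetical dialogue table: each state asks one question, accepts a
# restricted set of answers, fills one slot, and has one successor.
STATES = {
    "ask_room": {
        "prompt": "Would you like a single or a double room?",
        "slot": "room",
        "accept": {"single", "double"},
        "next": "ask_nights",
    },
    "ask_nights": {
        "prompt": "For how many nights?",
        "slot": "nights",
        "accept": {"one", "two", "three"},
        "next": "confirm",
    },
    "confirm": {
        "prompt": "Shall I make the booking?",
        "slot": "confirmed",
        "accept": {"yes", "no"},
        "next": "done",
    },
}


def step(state: str, answer: str, slots: dict) -> str:
    """Process one user answer; stay in the same state if it is not accepted."""
    spec = STATES[state]
    answer = answer.strip().lower()
    if answer not in spec["accept"]:
        return state                 # re-prompt with the same question
    slots[spec["slot"]] = answer
    return spec["next"]


def run_dialogue(answers, start: str = "ask_room"):
    """Feed a list of user answers through the dialogue; return final state and slots."""
    state, slots = start, {}
    for answer in answers:
        if state == "done":
            break
        state = step(state, answer, slots)
    return state, slots


if __name__ == "__main__":
    print(run_dialogue(["suite", "double", "two", "yes"]))
```

Because each state accepts only a few answers, user utterances stay predictable, at the cost of naturalness. A frame-based design would instead let the user fill several slots in one utterance.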

  13. AST Speech Synthesis
      • Fixed machine utterances: pre-recorded speech
      • Database queries: limited-domain synthesis (Festival platform)

  14. Conclusion
      • Finite-state approach in:
      • Recogniser
      • NLU component
      • Dialogue manager
      • Workable prototype
      • New funding: 2003
