Mandarin-English Information (MEI) Johns Hopkins University Summer Workshop 2000

Presented at the TDT-3 Workshop, February 28, 2000, by Helen Meng (The Chinese University of Hong Kong), Sanjeev Khudanpur (Johns Hopkins University) and Douglas W. Oard (University of Maryland).

Presentation Transcript


  1. Mandarin-English Information (MEI)
     Johns Hopkins University Summer Workshop 2000
     Presented at the TDT-3 Workshop, February 28, 2000
     Helen Meng, The Chinese University of Hong Kong
     Sanjeev Khudanpur, Johns Hopkins University
     Douglas W. Oard, University of Maryland
     Hsin-Min Wang, Academia Sinica, Taiwan

  2. Outline
     • Background
     • The MEI Project
     • Multi-scale Retrieval
     • Multi-scale Translation
     • Using the TDT-3 Collection
     • Schedule

  3. Motivation
     • Emerging speech retrieval applications
       • E.g., http://speechbot.research.compaq.com
     • Increasing need for translingual audio search
       • 1896 Internet-accessible radio and TV stations
       • 529 of these (28%) are not in English
       • Source: www.real.com

  4. The Big Picture
     [Figure: MEI covers Translingual Audio Search, shown alongside Translingual Audio Browsing and Speech to Speech Translation; other labels: English Query, Select, Examine, English Audio]

  5. Related Work
     • TREC Spoken Document Retrieval
       • Close coupling of recognition and retrieval
     • TREC Cross-Language Retrieval
       • Close coupling of translation and retrieval
     • TDT-3
       • Coupling recognition, translation and retrieval
       • Using baseline recognizer transcripts

  6. The MEI Project
     • Closely coupling recognition and translation
       • For the purpose of retrieval
     • English text queries, Mandarin news audio
     • Specific research issues:
       • Multi-scale retrieval
       • Multi-scale translation

  7. Multi-scale Analysis of Mandarin
     [Figure: the syllable /jiang/ decomposed at several scales]
     • Initial/Final: /j/ /iang/
     • Preme/Core Final: /ji/ /ang/
     • Preme/Toneme: /j/ /i/ /a/ /ng/
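
The Initial/Final and Preme/Core Final splits of slide 7 can be sketched for toneless pinyin input. This is a hypothetical illustration, not the workshop's tool: the names `initial_final` and `preme_core` are invented here, and the medial rule is a naive assumption (it does not undo pinyin abbreviations such as iu, ui and un, and it treats syllables spelled with y/w as having no initial).

```python
# Hypothetical helpers: split a toneless pinyin syllable into the
# Initial/Final and Preme/Core Final units of slide 7.
INITIALS = ["zh", "ch", "sh",  # two-letter initials must be tried first
            "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]
MEDIALS = {"i", "u", "v"}  # "v" stands in for ü
VOWELS = set("aeiouv")


def initial_final(syllable: str) -> tuple[str, str]:
    """Split a syllable into (initial, final); the initial may be empty."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # zero-initial syllable (includes y/w spellings)


def preme_core(syllable: str) -> tuple[str, str]:
    """Split into (preme, core final), where preme = initial + optional medial.

    Assumed rule: a leading i/u/v in the final is a medial only when another
    vowel follows it, so "jiang" -> ("ji", "ang") but "jin" -> ("j", "in").
    """
    ini, fin = initial_final(syllable)
    if len(fin) > 1 and fin[0] in MEDIALS and fin[1] in VOWELS:
        return ini + fin[0], fin[1:]
    return ini, fin


if __name__ == "__main__":
    for syl in ["jiang", "zhuang", "ke", "sou"]:
        print(syl, initial_final(syl), preme_core(syl))
```

For the slide's example, `initial_final("jiang")` returns `("j", "iang")` and `preme_core("jiang")` returns `("ji", "ang")`.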

  8. Multi-scale Retrieval
     • Subword-scale
       • Syllable lattice matching [Chen, Wang & Lee, 2000]
       • Overlapping syllable n-grams [Meng et al., 1999]
       • Skipped syllable pairs [Chen, Wang & Lee, 2000]
       • Syllable confusion matrix [Meng et al., 1999]
     • Word-scale
       • Structured queries [Pirkola, 1998]
     • Multi-scale
       • Unified retrieval using a merged feature set
       • Scale-optimized retrieval with result-set merging
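
Two of the subword-scale units listed in slide 8, overlapping syllable n-grams and skipped syllable pairs, are easy to illustrate on a syllable sequence. The sketch below is an illustration under stated assumptions rather than the cited systems: it works on toneless syllable strings, joins units with `_`, and the hypothetical `index_terms` simply pools unigrams, overlapping bigrams and pairs that skip one syllable into one merged feature set.

```python
from collections import Counter


def syllable_ngrams(syllables: list[str], n: int) -> list[str]:
    """Overlapping n-grams: every window of n consecutive syllables."""
    return ["_".join(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def skipped_pairs(syllables: list[str], skip: int = 1) -> list[str]:
    """Pairs of syllables with `skip` syllables between them."""
    gap = skip + 1
    return [f"{syllables[i]}_{syllables[i + gap]}"
            for i in range(len(syllables) - gap)]


def index_terms(syllables: list[str]) -> Counter:
    """Hypothetical merged feature set: unigrams, bigrams and skip-1 pairs."""
    terms = Counter(syllable_ngrams(syllables, 1))
    terms.update(syllable_ngrams(syllables, 2))
    # Prefix skipped pairs so they never collide with adjacent bigrams.
    terms.update("skip:" + pair for pair in skipped_pairs(syllables, 1))
    return terms


if __name__ == "__main__":
    kosovo = ["ke", "sou", "wo"]  # toneless syllables of one Kosovo transliteration
    print(syllable_ngrams(kosovo, 2))  # ['ke_sou', 'sou_wo']
    print(skipped_pairs(kosovo))       # ['ke_wo']
    print(index_terms(kosovo))
```

One motivation for skipped pairs is that `ke_wo` can still match when a recognition error corrupts the middle syllable, whereas both adjacent bigrams would be lost.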

  9. Why Multi-scale Retrieval?
     • Word-based retrieval exploits lexical knowledge
       • Enhances precision
     • Subword units achieve complete phonological coverage
       • Enhances recall
     • Combining evidence from both scales may outperform either alone
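
The claim that combined evidence may beat either scale alone can be made concrete with a simple result-set merge. The sketch below uses CombSUM over min-max-normalized scores, a standard fusion method swapped in here for illustration; the slides do not say which merging rule MEI used, and the scores in the example are made up.

```python
from typing import Dict, List, Optional, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so runs on different scales can be added."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs: List[Dict[str, float]],
             weights: Optional[List[float]] = None) -> List[Tuple[str, float]]:
    """CombSUM: weighted sum of normalized scores; a story missing from a run adds 0."""
    if weights is None:
        weights = [1.0] * len(runs)
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    word_run = {"s1": 12.0, "s2": 8.0, "s3": 4.0}      # made-up word-scale scores
    syllable_run = {"s2": 10.0, "s3": 6.0, "s4": 2.0}  # made-up syllable-scale scores
    print(comb_sum([word_run, syllable_run]))
    # [('s2', 1.5), ('s1', 1.0), ('s3', 0.5), ('s4', 0.0)]
```

Story s2 ranks first because both runs retrieve it, even though the word-scale run alone ranks s1 first. The optional `weights` would be one place to favor whichever scale works better for a given query type.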

  10. Multi-scale Translation
     • Word-scale
       • Dictionary-based [Levow & Oard, 2000]
       • Parallel corpora [Nie, 1999]
       • Comparable corpora [Fung, 1998]
     • Subword-scale
       • Cross-language phonetic map [Knight & Graehl, 1997]
       • /bei2 ai4 er3 lan2/
       • Kosovo (/ke1-sou3-wo4/, /ke1-sou3-fo2/, /ke1-sou3-fu1/, /ke1-sou3-fu2/)
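
Dictionary lookup and transliteration both yield several Mandarin alternatives for one English term, as with the four Kosovo renderings. One way to use them is the structured-query idea attributed to Pirkola in slide 8: pool the alternatives into a single synonym set, so that term frequency is summed over its members and document frequency counts stories containing any member. The sketch below is hypothetical: `structured_score` is an invented name, documents are pre-tokenized lists, and the plain tf-idf weighting is an assumption standing in for a real ranking function.

```python
import math
from collections import Counter


def structured_score(query: dict[str, set[str]],
                     docs: dict[str, list[str]]) -> dict[str, float]:
    """Score documents, treating each English term's translations as one synonym set."""
    n_docs = len(docs)
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    scores = {doc_id: 0.0 for doc_id in docs}
    for alternatives in query.values():
        # df of the set: documents containing at least one alternative
        df = sum(1 for c in counts.values() if any(c[a] > 0 for a in alternatives))
        if df == 0:
            continue  # no document mentions this concept in any form
        idf = math.log(n_docs / df)  # assumed weighting, not Pirkola's exact formula
        for doc_id, c in counts.items():
            tf = sum(c[a] for a in alternatives)  # pooled term frequency
            scores[doc_id] += tf * idf
    return scores


if __name__ == "__main__":
    query = {"Kosovo": {"ke1-sou3-wo4", "ke1-sou3-fo2", "ke1-sou3-fu1", "ke1-sou3-fu2"}}
    docs = {  # toy stories, tokenized into syllable words
        "d1": ["ke1-sou3-wo4", "zhan4-zheng1", "ke1-sou3-fo2"],
        "d2": ["ke1-sou3-fu1", "xin1-wen2"],
        "d3": ["tian1-qi4"],
    }
    print(structured_score(query, docs))
```

If each spelling were instead an independent query term, a rare spelling would receive a high idf of its own; pooling gives the concept a single idf based on how many stories mention Kosovo in any spelling.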

  11. Using the TDT-3 Collection
     • English queries formed from topic descriptions
       • 2-4 words (simulated Web search)
       • Full topic description (simulated routing profile)
     • Mandarin broadcast news audio (121 hours)
       • Story-boundary-known condition (4624 stories)
       • Baseline recognizer transcripts provide words
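
For a rough sense of story length, assume the 121 hours are divided evenly among the 4624 stories (an overestimate if the audio also contains non-story material); the mean story then lasts about a minute and a half:

```latex
\bar{d} \approx \frac{121\ \text{h} \times 3600\ \text{s/h}}{4624\ \text{stories}}
        = \frac{435\,600\ \text{s}}{4624\ \text{stories}}
        \approx 94\ \text{s per story}
```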

  12. Schedule
     [Timeline, December to August: First MEI Team Planning Meeting; Second MEI Team Planning Meeting; Summer Workshop Planning Meeting; Summer Workshop (six weeks)]

  13. Things We Need
     • Ideas
       • To sharpen our focus
     • Connections
       • To build a community of interest
     • Resources
       • To build on what others have done
