2005. 11. 3. Myoung-Wan Koo †‡ and Du-Seong Chang † KT † /KAIT ‡

Position Paper for W3C Workshop on Internationalizing SSML: The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML

Presentation Transcript


  1. Position Paper for W3C Workshop on Internationalizing SSML: The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML
     2005. 11. 3. Myoung-Wan Koo†‡ and Du-Seong Chang† KT†/KAIT‡

  2. Introduction
     • Multiple pronunciation problem
       • Same word but different pronunciations
         • Newton: /nju:tən/ vs. /nu:tən/
       • Same spelling but different pronunciations (homograph)
         • refuse: /rɪ'fju:z/ vs. /'refju:s/

     <?xml version="1.0" encoding="UTF-8"?>
     <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
       <lexeme>
         <grapheme>Newton</grapheme>
         <phoneme>nju:tən</phoneme>
         <phoneme>nu:tən</phoneme>
       </lexeme>
       <lexeme>
         <grapheme>refuse</grapheme>
         <phoneme>rɪ'fju:z</phoneme>
         <phoneme>'refju:s</phoneme>
       </lexeme>
     </lexicon>
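To make the problem concrete, here is a minimal Python sketch (an illustration, not code from the paper) that loads the lexicon above with the standard xml.etree.ElementTree module and flags every grapheme carrying more than one phoneme. The function names pronunciations and ambiguous are hypothetical. Note that the free variation of "Newton" and the homograph "refuse" come out looking identical: the markup gives an engine nothing to choose between them.

```python
import xml.etree.ElementTree as ET

# Namespace used by the lexicons on these slides.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def pronunciations(lexicon_xml: str) -> dict:
    """Map each grapheme to every phoneme string listed for it, in document order."""
    root = ET.fromstring(lexicon_xml)
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            table.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return table


def ambiguous(table: dict) -> list:
    """Graphemes for which the lexicon alone cannot single out one pronunciation."""
    return sorted(word for word, variants in table.items() if len(variants) > 1)
```

Both "Newton" and "refuse" are reported as ambiguous; nothing in the lexicon records why they are ambiguous.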

  3. Multiple pronunciations in SSML & PLS
     • SSML
       • The Speech Synthesis Markup Language Specification Version 1.0
       • Pronunciation information in SSML
         • Phoneme element
         • Lexicon element
     • PLS
       • Pronunciation Lexicon Specification Version 1.0
       • Pronunciation information in PLS
         • Phoneme element
         • Prefer attribute
     • Neither fully supports pronunciation lexicons with multiple pronunciations or for agglutinative languages.
       ⇒ Part-Of-Speech information is needed
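The SSML phoneme element can pin down a pronunciation, but only for the single occurrence of text it encloses, so the author must disambiguate every occurrence by hand. A minimal Python sketch of that per-occurrence workaround, building an SSML 1.0 document with xml.etree.ElementTree; the helper name speak_with_phoneme is hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize the SSML namespace as the default namespace (no "ns0:" prefix).
ET.register_namespace("", SSML_NS)


def speak_with_phoneme(before: str, word: str, ipa: str, after: str) -> str:
    """Build an SSML 1.0 document that fixes the pronunciation of one occurrence of word."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: "en-US"})
    speak.text = before
    # <phoneme alphabet="ipa" ph="..."> applies only to the text it encloses.
    phoneme = ET.SubElement(speak, f"{{{SSML_NS}}}phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word
    phoneme.tail = after
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(speak_with_phoneme("Please take out the ", "refuse", "'refju:s", "."))
```

Every other occurrence of "refuse" in the same document would still need its own phoneme element, which is the gap the POS proposal targets.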

  4. Pronunciation information in PLS (1/2)
     • Pronunciation Lexicon Specification
       • Version 1.0, Feb. 2005, W3C Voice Browser Working Group
       • It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
       • It is expected to handle multiple pronunciations.
     • Example of PLS

     <?xml version="1.0" encoding="UTF-8"?>
     <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
       <lexeme>
         <grapheme>tomato</grapheme>
         <phoneme>təmei̥ɾou</phoneme>
       </lexeme>
     </lexicon>

  5. Pronunciation information in PLS (2/2)
     • Prefer attribute of phoneme element
       • Gives one pronunciation priority over the other pronunciation candidates.
       • Effective in speech synthesis
         • Only for multiple pronunciations of the same orthography
         • Not for the homograph problem
           • refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
       • Provides no information for ASR systems.

     <?xml version="1.0" encoding="UTF-8"?>
     <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
       <lexeme>
         <grapheme>Newton</grapheme>
         <phoneme prefer="true">nju:tən</phoneme>
         <phoneme>nu:tən</phoneme>
       </lexeme>
     </lexicon>
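Why prefer cannot solve the homograph case can be seen from a selection routine like the following minimal Python sketch. It picks the phoneme marked prefer="true" and otherwise falls back to the first phoneme in document order; that fallback, and the function name tts_pronunciation, are assumptions of this sketch rather than quotations from the specification. The choice is fixed per lexeme, so the same string is spoken for noun and verb "refuse".

```python
import xml.etree.ElementTree as ET
from typing import Optional

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def tts_pronunciation(lexicon_xml: str, word: str) -> Optional[str]:
    """Return the single pronunciation a synthesizer would speak for word.

    A phoneme marked prefer="true" wins; otherwise (assumption of this sketch)
    the first phoneme in document order is used. Context plays no role.
    """
    root = ET.fromstring(lexicon_xml)
    for lexeme in root.iter(f"{PLS}lexeme"):
        graphemes = [g.text.strip() for g in lexeme.iter(f"{PLS}grapheme")]
        if word not in graphemes:
            continue
        phonemes = list(lexeme.iter(f"{PLS}phoneme"))
        preferred = [p for p in phonemes if p.get("prefer") == "true"]
        chosen = preferred[0] if preferred else phonemes[0]
        return chosen.text.strip()
    return None  # word not in the lexicon
```

For "Newton" this yields the preferred variant; for "refuse" it always yields the verb form, even in "take out the refuse".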

  6. Typical Korean TTS system structure
     Text → Morphological Analyzer → [Morphemes, POS] → Grapheme-to-Phoneme → [Phonemes, POS] → Prosody Analysis → [Phonemes, Prosody] → Waveform production → Speech
     (Bracketed items: structural information passed between stages)
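The first two stages of this pipeline are where POS pays off: the analyzer emits (morpheme, POS) pairs and grapheme-to-phoneme conversion looks pronunciations up by both. Below is a minimal Python sketch under loud assumptions: an English toy example stands in for Korean, the rule-based tagger and the dictionaries CLOSED_CLASS and G2P_DICT are hypothetical, and prosody analysis and waveform production are omitted.

```python
from dataclasses import dataclass


@dataclass
class Token:
    morpheme: str
    pos: str            # part-of-speech tag produced by the analyzer
    phonemes: str = ""  # filled in by grapheme-to-phoneme conversion


# Hypothetical tags for a few closed-class words.
CLOSED_CLASS = {"the": "det", "a": "det", "an": "det", "they": "pron", "we": "pron"}

# Hypothetical POS-aware pronunciation dictionary: (morpheme, POS) -> IPA.
G2P_DICT = {
    ("refuse", "noun"): "'refju:s",
    ("refuse", "verb"): "rɪ'fju:z",
}


def morphological_analyzer(text: str) -> list[Token]:
    """Toy tagger: closed-class lookup, noun after a determiner, verb otherwise."""
    tokens, prev_pos = [], None
    for word in text.lower().split():
        if word in CLOSED_CLASS:
            pos = CLOSED_CLASS[word]
        elif prev_pos == "det":
            pos = "noun"
        else:
            pos = "verb"
        tokens.append(Token(word, pos))
        prev_pos = pos
    return tokens


def grapheme_to_phoneme(tokens: list[Token]) -> list[Token]:
    """Look each morpheme up together with its POS, so homographs split cleanly."""
    for tok in tokens:
        # Fall back to the spelling itself when the toy dictionary has no entry.
        tok.phonemes = G2P_DICT.get((tok.morpheme, tok.pos), tok.morpheme)
    return tokens


def front_end(text: str) -> list[str]:
    """Text -> (morphemes, POS) -> (phonemes, POS); later stages are not modeled."""
    return [tok.phonemes for tok in grapheme_to_phoneme(morphological_analyzer(text))]


if __name__ == "__main__":
    print(front_end("They refuse the refuse"))
```

Because POS is already attached when G2P runs, the two occurrences of "refuse" receive different pronunciations without any extra pass.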

  7. POS for resolving multiple pronunciations
     • POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
     • The word "refuse" has two different pronunciations depending on its POS.
     • Proposal: POS attribute

     <?xml version="1.0" encoding="UTF-8"?>
     <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
       <lexeme>
         <grapheme>refuse</grapheme>
         <phoneme pos="verb">rɪ'fju:z</phoneme>
       </lexeme>
       <lexeme>
         <grapheme>refuse</grapheme>
         <phoneme pos="noun">'refju:s</phoneme>
       </lexeme>
     </lexicon>
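A lookup that honours the proposed pos attribute could look like this minimal Python sketch. The attribute is the paper's proposed extension, not part of PLS; treating phonemes without a pos attribute as valid for every POS is a design assumption of this sketch, chosen so that existing lexicons keep working unchanged. The function name lookup is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Proposal 1: a pos attribute on <phoneme> (proposed extension, not in PLS).
LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def lookup(lexicon_xml: str, word: str, pos: Optional[str] = None) -> list:
    """Return the pronunciations of word, narrowed to pos when one is given.

    Phonemes without a pos attribute match every POS (assumption of this sketch).
    """
    root = ET.fromstring(lexicon_xml)
    result = []
    for lexeme in root.iter(f"{PLS}lexeme"):
        if word not in [g.text.strip() for g in lexeme.iter(f"{PLS}grapheme")]:
            continue
        for ph in lexeme.iter(f"{PLS}phoneme"):
            tag = ph.get("pos")
            if pos is None or tag is None or tag == pos:
                result.append(ph.text.strip())
    return result
```

A TTS engine that knows the POS gets exactly one candidate; an ASR engine without POS context can still call lookup without it and receive both.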

  8. POS information for LVCSR
     • Large vocabulary continuous speech recognition of agglutinative languages
       • The basic unit is the morpheme (or pseudo-morpheme), which reduces the vocabulary size.
       • The recognition dictionary contains many homographs.
     • POS information helps the system select the proper pronunciation from the dictionary as well as resolve multiple pronunciations of some words.
     • It reduces search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
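The search-time argument can be illustrated with a toy left-to-right expansion over a morpheme lattice in which every segment has two POS readings. This is a minimal Python sketch, not a real decoder: the lattice entries m1 to m4 are placeholders, the connection table ALLOWED is invented (its tag names only follow the style of the Sejong tag set: NNG common noun, JKS subject particle, VV verb, EF sentence-final ending), and acoustic and language-model scores are ignored.

```python
# Hypothetical POS connection table: which tag may directly follow which.
ALLOWED = {("<s>", "NNG"), ("NNG", "JKS"), ("JKS", "VV"), ("VV", "EF")}

# Hypothetical lattice: each segment offers one surface form (placeholder
# m1..m4) under two competing POS readings.
LATTICE = [
    [("m1", "NNG"), ("m1", "VV")],
    [("m2", "JKS"), ("m2", "EF")],
    [("m3", "VV"), ("m3", "NNG")],
    [("m4", "EF"), ("m4", "JKS")],
]

START = ("<s>", "<s>")


def expand(lattice, allowed=None):
    """Extend hypotheses segment by segment.

    With a connection table, a partial hypothesis is dropped as soon as its
    last two POS tags form a disallowed pair, so it is never extended again.
    With allowed=None every connection survives.
    Returns (complete hypotheses, number of extensions tried).
    """
    hyps = [(START,)]
    tried = 0
    for candidates in lattice:
        extended = []
        for hyp in hyps:
            prev_tag = hyp[-1][1]
            for morpheme, tag in candidates:
                tried += 1
                if allowed is None or (prev_tag, tag) in allowed:
                    extended.append(hyp + ((morpheme, tag),))
        hyps = extended
    return [hyp[1:] for hyp in hyps], tried
```

On this toy lattice the unconstrained run tries 30 extensions and ends with 16 hypotheses, while the POS-constrained run tries 8 and keeps one, because wrong connections are cut at the first stage instead of being carried forward.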

  9. Proposals
     • Proposal 1: POS attribute of phoneme element
       • Optional attribute
     • Proposal 2: POS element
       • The lexeme element contains optional POS elements.
     • POS values: language-specific
       • Type: allow vendor-specific POS types?
       • Well-known POS sets: Penn Treebank, Sejong project (Korean)

     <?xml version="1.0" encoding="UTF-8"?>
     <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
       <lexeme>
         <grapheme>refuse</grapheme>
         <phoneme>rɪ'fju:z</phoneme>
         <pos>verb</pos>
       </lexeme>
     </lexicon>
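For Proposal 2, a processor could index pronunciations by (grapheme, POS), as in this minimal Python sketch. The pos element is the paper's proposed extension; the second lexeme (the noun reading) is added here for illustration, and mapping lexemes without any pos element to the key None is an assumption of this sketch. The function name index_by_pos is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Proposal 2: optional <pos> children of <lexeme> (proposed extension, not in PLS).
LEXICON = """<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>"""


def index_by_pos(lexicon_xml: str) -> dict:
    """Index pronunciations by (grapheme, POS); lexemes without <pos> use None."""
    root = ET.fromstring(lexicon_xml)
    index = {}
    for lexeme in root.iter(f"{PLS}lexeme"):
        graphemes = [g.text.strip() for g in lexeme.iter(f"{PLS}grapheme")]
        phonemes = [p.text.strip() for p in lexeme.iter(f"{PLS}phoneme")]
        # Every POS listed on the lexeme applies to all of its phonemes.
        tags = [t.text.strip() for t in lexeme.iter(f"{PLS}pos")] or [None]
        for grapheme in graphemes:
            for tag in tags:
                index.setdefault((grapheme, tag), []).extend(phonemes)
    return index
```

Unlike the attribute form, the element form attaches POS to the whole lexeme, so one lexeme can carry several POS values that all share its pronunciations.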

  10. Conclusion
     • No element or attribute for resolving multiple pronunciations
       • In current SSML and PLS
     • POS information
       • can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
       • can reduce the search time in a large vocabulary recognition system.
       • can be effective for agglutinative languages.
     • Proposals
       • POS element
       • POS attribute
