1 / 27

Pronunciation Extraction Through Cross-Lingual Word-to-Phoneme Alignment

Pronunciation Extraction Through Cross-Lingual Word-to-Phoneme Alignment. Felix Stahlberg, Tim Schlippe , Stephan Vogel, Tanja Schultz. Outline. Motivation Word Segmentation Word Pronunciation Extraction Experiments Corpus Evaluation Measures Which Translation Is Favorable?

jadyn
Download Presentation

Pronunciation Extraction Through Cross-Lingual Word-to-Phoneme Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pronunciation Extraction Through Cross-Lingual Word-to-Phoneme Alignment Felix Stahlberg, Tim Schlippe, Stephan Vogel, Tanja Schultz

  2. Outline • Motivation • Word Segmentation • Word Pronunciation Extraction • Experiments • Corpus • Evaluation Measures • Which Translation Is Favorable? • Combining Multiple Translations • Analysis of the Results – Common errors • Conclusion and Future Work Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  3. Scenario Say “I am sick.” in your mother tongue. /b/ /o/ /l/ /e/ /s/ /t/ /a/ /n/ /s/ /a/ /m/ /z/ /d/ /r/ /a/ /v/ /s/ /a/ /m/ Say “I am healthy.” in your mother tongue. • /s/ /a/ /m/ seems to be a word (meaning I am) • /b/ /o/ /l/ /e/ /s/ /t/ /a/ /n/ seems to be a word (meaning sick) • /z/ /d/ /r/ /a/ /v/ seems to be a word (meaning healthy) Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  4. Long Term Goal • We obtain • Transcribed audio data (in terms of IDs) • Pronunciation dictionary • Language model 1 7 3 5 4 6 /l/ /ae/ /ng/ /w/ /ah/ /jh/ /v/ /er/ /s/ /ae/ /n/ /d/ /th/ /ich/ /ng/ /k/ /s/ /f/ /er/ /y/ /uw/ Train ASR System (future work) 2 8 2 9 2 10 1 7 3 5 4 6 Pronunciation Extraction Through Multilingual Word-to-Phoneme Alignment

  5. Applications • http://www.fotopedia.com/items/_avPIZmqM3w-6716j3F1J-U Dialects Speech processingfor non-writtenandunder-resourcedlanguages Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  6. Roadmap • Need phonetictranscriptionofwhatissaid • Usuallyphonemerecognizer • In thiswork:Perfectphonetictranscriptions • Focus todefineandevaluatesteps for extracting a pronunciation dictionary from the phoneme sequences Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  7. Roadmap • Howcanwe find wordboundariesandsegmentphonemesequencesintowordunits? • Inprovedsegmentationwithcross-lingualinformation • Alignmentbetweenwordunits in writtentranslationandphonemesequencesoftargetlanguage Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  8. Word-Segmentation – Word-to-Phoneme Alignments Sentence: Sprache die für dichtet und denkt dich German (Source Language) English (Target Language) Phoneme sequence: l ae ng g w ah jh v er s ae n d th ih ng k s f er y uw Audio: (Besacier et. al., 2006) (Stükerand Waibel, 2008) (StükerandBesacier, 2009) (Stahlberg et. al., 2012) Phoneme Recognizer Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  9. Word-Segmentation – Results (Stahlberg et. al., 2012) http://code.google.com/p/pisa/ Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  10. Roadmap Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  11. Word-PronunciationExtraction (Stahlberg et. al, 2013) Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  12. Experiments – Corpus • Parallel data from the Christian Bible (30.6k verses,14 written translations) • Variety of linguistic approaches to Bible translation (dynamic equivalence, formal equivalence, and idiomatic translation) • English as “under-resourced target language” (deeper insight in strengths and weaknesses of our algorithm)  ESV Bible • “Perfect phoneme recognizer”:Replaced words in ESV Bible and removed word boundaries Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  13. Evaluation Measures (1) Set of word IDs Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  14. Evaluation Measures (2) Out-Of-Vocabulary Rate (OOV-Rate) Set of word IDs Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  15. Evaluation Measures (3) Phoneme Error Rate (PER) Set of word IDs Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  16. Evaluation Measures (4) Hypo/Ref Ratio Set of word IDs Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  17. Which Translation Is Favorable? –Distribution ofeditdistances Numberofextractedvocabularyentries # entries Phoneme Error Rate (PER) Distribution oftheeditdistancesbetweentheextractedpronunciationsandthenearestentryin thereferencedictionaryfor all 14 sourcetranslations Numberofextractedvocabularyentriescloseto real targetlanguagewords (<0.1 editdistance) Edit distancesofextractedvocabularyentriestothenextreferencevocabularyentry Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  18. Which Translation Is Favorable? – Impact of 4 factors to our evaluation measures • ∆ vocabulary size:Difference between vocabulary size of the source translation and size of the ESV Bible • ∆ average number of words per verse:Difference between average verse length in the source translation and in the ESV Bible • ∆ average word frequency:Difference between the average number of word repetitions in the source translation and in the ESV Bible • IBM-4 PPL:To measure the general correspondence of the translation to IBM-Model based alignment models, we run GIZA++ with default configuration at the word level and use the final perplexity of IBM-Model 4 Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  19. Which Translation Is Favorable? – Correlationofevaluationmeasures Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  20. Combining multiple translations • Concatenate pronunciations and remove homophones • Combining all 14 translations results in a dictionary with only 7.9% OOV rate, • But more than 9 of 10 dictionary entries are extracted unnecessarily (Hypo/Ref ratio 10.7:1) Evaluation measuresoverthenumberofcombinedsourcetranslations Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  21. Common Errors (1) • Off-by-onealignmenterrors Contextinformationmaybehelpful Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  22. Common Errors (2) • Different words with the same stem are merged together Clustering issue Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  23. Common Errors (3) • Missing word boundaries between words often occurring in the same context Cross-lingualinformationof multiple languagesmayhelp Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  24. Summary • Speech processing in non-written and under-resourced languages or dialects • Cross-lingual information helps to find word boundaries • Proposed steps for extracting a pronunciation dictionary with word IDs from these segmentations and alignments • Pronunciation quality is still not good enough for productive use • Need better compensation for alignment and phoneme recognition errors when extracting pronunciations • Initial approach for combining dictionaries from multiple translations drops OOV rate, but increases number of unnecessary entries Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  25. Possible Next Steps • Iterative extraction • Better clustering • Analysis for different cluster algorithms • Add contextual information • Use information from multiple source languages • Integrate monolingual word and syllable segmentation • Real phoneme recognizer • How to bootstrap the phoneme recognizer? – maybe multilingual voting and adaptation techniques based on confidence score Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  26. ¡Muchas gracias!¡Moltesgràcies! Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

  27. References Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment

More Related