W3C Workshop on Internationalizing SSML: SSML Extension for Korean

Workshop: 2005/11/02 (Wed). W3C Workshop on Internationalizing SSML: SSML Extension for Korean. Sang-Jin Kim, sangjin@icu.ac.kr. Contents: Characteristic of Korean; SSML Extension for Chinese Characters in Korean; SSML Extension for Homograph Words in Korean; Conclusion.

Presentation Transcript


  1. Workshop: 2005/11/02 (Wed) • W3C Workshop on Internationalizing SSML: SSML Extension for Korean • Sang-Jin Kim, sangjin@icu.ac.kr

  2. Contents • Characteristic of Korean • SSML Extension for Chinese Characters in Korean • SSML Extension for Homograph Words in Korean • Conclusion

  3. Characteristic of Korean • Hangul, the Korean script • Consists of forty letters: 21 vowels (including 13 diphthongs) and 19 consonants • Syllable structures: V, CV, VC, and CVC (C: consonant, V: vowel) • The eojeol, the Korean word phrase, is different from a phrase in English • Korean is completely different from Japanese except for its grammatical structure • Korean is completely different from Chinese, although it has borrowed many Chinese words and some Chinese characters
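
The letter counts and syllable shapes on this slide can be checked against the Unicode encoding of precomposed Hangul syllables, which is built from 19 initial consonants, 21 vowels, and 28 final positions (27 final forms plus "no final"). The following is a minimal sketch, not part of the presentation: the function names are hypothetical, and treating a written final cluster as a single C is a simplifying assumption.

```python
# Unicode layout of precomposed Hangul syllables (U+AC00..U+D7A3).
HANGUL_BASE = 0xAC00
NUM_INITIALS = 19    # initial consonants
NUM_VOWELS = 21      # medial vowels
NUM_FINALS = 28      # 27 final consonant forms plus "no final"
SILENT_INITIAL = 11  # index of the placeholder initial 'ㅇ', which is not pronounced


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (initial, vowel, final) indices for one precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_INITIALS * NUM_VOWELS * NUM_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    return initial, vowel, final


def syllable_shape(syllable: str) -> str:
    """Classify a syllable as V, CV, VC, or CVC (assumption: a final cluster counts as one C)."""
    initial, _, final = decompose(syllable)
    shape = "" if initial == SILENT_INITIAL else "C"
    shape += "V"
    if final != 0:
        shape += "C"
    return shape


if __name__ == "__main__":
    for s in "아가안한":
        print(s, syllable_shape(s))
```

The sample syllables 아, 가, 안, and 한 cover the four shapes listed on the slide.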

  4. Characteristic of Korean • Vowels in Hangul, The Korean Character • Monophthong vowels classified according to tongue position and height

  5. Characteristic of Korean • Consonants in Hangul, The Korean Character • Consonants classified according to place and manner of articulation

  6. SSML Extension for Chinese Characters in Korean • Chinese Characters in Korean • Present-day Korean and Japanese both use many Chinese characters • But the pronunciation of the characters differs between the languages • The same character is also written differently depending on the country • The simplified characters are not used in Korea

  7. SSML Extension for Chinese Characters in Korean • Chinese Characters in Korean • Korean text can be written with Korean characters only • It is not unusual to use Chinese characters as well • The pronunciation of the two written forms is exactly the same

  8. SSML Extension for Chinese Characters in Korean • Chinese Characters in Korean TTS • The input text for a text-to-speech (TTS) system has to be converted into a phonetic list • If Chinese characters are mixed with Korean characters, they have to be converted to Korean characters first • Not all Chinese characters are used in Korean; the Korean government recommends a list of frequently used Chinese characters, and its size is 2000 • A Korean TTS system needs this list and its Korean pronunciations, since they differ from the Chinese and Japanese pronunciations
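
The substitution step described on this slide can be sketched as a table lookup. This is an illustrative sketch only: the four-entry table and the function name `substitute_hanja` are hypothetical stand-ins for the government list, and a one-reading-per-character table is a simplifying assumption (some characters have more than one Korean reading).

```python
# Hypothetical excerpt of a Korean-reading table for Chinese characters.
# A real system would load the full frequently-used-character list instead.
HANJA_TO_HANGUL = {
    "韓": "한",
    "國": "국",
    "大": "대",
    "學": "학",
}


def substitute_hanja(text: str, readings: dict[str, str]) -> str:
    """Replace each Chinese character with its Korean reading before phonetic conversion."""
    out = []
    for ch in text:
        if ch in readings:
            out.append(readings[ch])
        elif "\u4e00" <= ch <= "\u9fff":
            # A CJK ideograph that the table does not cover: fail loudly
            # rather than pass an unpronounceable symbol to the TTS back end.
            raise KeyError(f"no Korean reading for {ch!r}")
        else:
            out.append(ch)  # Hangul, spaces, and punctuation pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    print(substitute_hanja("韓國의 大學", HANJA_TO_HANGUL))
```

Raising on an uncovered character is one possible design choice; a production system might instead fall back to a larger secondary lexicon.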

  9. SSML Extension for Chinese Characters in Korean • SSML Extension for Chinese Characters in Korean • The same Chinese characters are pronounced differently depending on the country, so each lexicon is tagged with a language: <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/> <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/> <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/> <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/> <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
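
To illustrate how a processor might use the proposed attribute, the sketch below parses an SSML document and groups lexicon URIs by their xml:lang value. It is an assumption-laden sketch: the wrapping speak element, the function name `lexicons_by_language`, and the idea of grouping by exact tag are not from the presentation, and xml:lang on lexicon is the proposed extension rather than standard SSML 1.0 behaviour. The language tags and URIs are copied from the slide as written.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: the lexicon lines come from the slide, the <speak> wrapper is assumed.
SSML_DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko">
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
  <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
  <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
  韓國
</speak>"""


def lexicons_by_language(ssml: str) -> dict[str, list[str]]:
    """Group lexicon URIs by the (proposed) xml:lang attribute, keeping document order."""
    root = ET.fromstring(ssml)
    grouped = defaultdict(list)
    for lexicon in root.iter(f"{{{SSML_NS}}}lexicon"):
        lang = lexicon.get(XML_LANG, "")  # empty key when the attribute is absent
        grouped[lang].append(lexicon.get("uri"))
    return dict(grouped)


if __name__ == "__main__":
    for lang, uris in lexicons_by_language(SSML_DOC).items():
        print(lang, uris)
```

With the grouping in hand, a Korean TTS front end could consult the "ko-CN" lexicons when it meets a Chinese character in Korean text.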

  10. SSML Extension for Homograph Words in Korean • Homograph Words in Korean • Same written word, different pronunciation, different meaning • The difference is the duration

  11. SSML Extension for Homograph Words in Korean • SSML Extension for Homograph Words in Korean • The only difference between these words is the duration of the pronunciation • A TTS system therefore needs duration information for these kinds of words • The SSML recommendation supports the “say-as” and “sub” elements, but these elements cannot handle the above problem successfully

  12. SSML Extension for Homograph Words in Korean • SSML Extension for Homograph Words in Korean • We suggest a “tone” element for this problem • The attribute values ‘long’, ‘short’ and ‘default’ would be enough for Korean
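
As a rough illustration of the proposal, the sketch below reads text marked up with the suggested tone element and returns the text segments with their duration class. Everything about the markup beyond the element name and the three values is assumed: the attribute name `type` follows the conclusion slide, the sample sentence uses 눈 (long vowel: "snow", short vowel: "eye"), the function name `tone_segments` is hypothetical, and the SSML namespace and nested elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

ALLOWED_TONES = {"long", "short", "default"}

# Hypothetical input: 눈 with a long vowel means "snow", with a short vowel "eye".
SAMPLE = (
    '<speak><tone type="long">눈</tone>이 내리고, '
    '<tone type="short">눈</tone>이 시리다.</speak>'
)


def tone_segments(ssml: str) -> list[tuple[str, str]]:
    """Split the document into (text, tone) pairs; unmarked text gets 'default'."""
    root = ET.fromstring(ssml)
    segments = []
    if root.text:
        segments.append((root.text, "default"))
    for child in root:
        if child.tag == "tone":
            tone = child.get("type", "default")
            if tone not in ALLOWED_TONES:
                raise ValueError(f"unsupported tone type: {tone!r}")
        else:
            tone = "default"  # any other element is treated as unmarked text
        text = "".join(child.itertext())
        if text:
            segments.append((text, tone))
        if child.tail:
            segments.append((child.tail, "default"))
    return segments


if __name__ == "__main__":
    for text, tone in tone_segments(SAMPLE):
        print(repr(text), tone)
```

A synthesizer could then map 'long' and 'short' to vowel durations of its own choosing; the presentation does not specify those values.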

  13. Conclusion • SSML Extension for Chinese Characters in Korean • The lexicon element does not support the “xml:lang” attribute • We suggest allowing xml:lang="ko", xml:lang="ko-CN", xml:lang="ja-KR", and xml:lang="cn-KR" on it • SSML Extension for Homograph Words in Korean • The “say-as” and “sub” elements cannot handle the homograph problem successfully • We suggest a “tone” element • The attribute values type="long", type="short", and type="default" would be enough for Korean