Annotation of speech from the phonetics/phonology perspective

Bettina Braun & Jürgen Trouvain. Fachrichtung 4.7, Institut für Phonetik. 15.02.2002.

Presentation Transcript


  1. Annotation of speech from the phonetics/phonology perspective
     Bettina Braun & Jürgen Trouvain
     Fachrichtung 4.7, Institut für Phonetik
     15.02.2002

  2. Manipulating text vs. speech [1]
     • text file manipulation
     • "vowels-only" version: replace every consonant letter with a space, so that only the vowels are left
     • e ea e o e a o o o o : a e ou y i e o i i a eu y e i e a e oo .
     Annotation of speech

  3. Manipulating text vs. speech [2]
     • text file manipulation
     • "consonants-only" version: replace every vowel letter with a space, so that only the consonants are left
     • Th w th r f r c st f r t m rr w: r th r cl d n th m n ng w th f ws nn sp lls n th ft n n.
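The two text manipulations can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and treating "y" as a vowel letter is an assumption read off the "vowels-only" output above, where the "y" of "cloudy" survives.

```python
# Assumption: a, e, i, o, u and y count as vowel letters ("y" is kept in the
# slide's vowels-only output for "cloudy"). Function names are hypothetical.
VOWELS = set("aeiouyAEIOUY")


def vowels_only(text: str) -> str:
    """Replace every consonant letter with a space; keep vowels and punctuation."""
    return "".join(
        ch if (not ch.isalpha() or ch in VOWELS) else " " for ch in text
    )


def consonants_only(text: str) -> str:
    """Replace every vowel letter with a space; keep consonants and punctuation."""
    return "".join(" " if ch in VOWELS else ch for ch in text)


def squeeze(text: str) -> str:
    """Collapse runs of spaces, as in the compact display on the slides."""
    return " ".join(text.split())


print(squeeze(vowels_only("The weather forecast for tomorrow:")))
print(squeeze(consonants_only("The weather forecast for tomorrow:")))
```

Replacing letters with spaces rather than deleting them keeps every remaining letter at its original position in the string.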

  4. Manipulating text vs. speech [3]
     • The weather forecast for tomorrow: rather cloudy in the morning with a few sunny spells in the afternoon.
     • speech file manipulation
       • original recording, not manipulated
       • "consonants-only" version: vowel segments replaced with silence
       • "vowels-only" version: consonant segments replaced with silence
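The speech-file counterpart needs a segmentation of the signal into consonant and vowel stretches. The sketch below assumes such a segmentation is already available as (start, end, class) triples in samples; the data layout, the class codes "C" and "V" and the function name are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Tuple

# (start sample, end sample exclusive, class), where class is "C" or "V".
# This layout is an assumption made for the illustration.
Segment = Tuple[int, int, str]


def silence_class(
    samples: List[float], segments: List[Segment], silenced: str
) -> List[float]:
    """Return a copy of `samples` with all segments of class `silenced` zeroed."""
    out = list(samples)
    for start, end, kind in segments:
        if kind == silenced:
            end = min(end, len(out))  # never write past the end of the signal
            for i in range(start, end):
                out[i] = 0.0
    return out


# Toy signal: six samples, labelled consonant / vowel / consonant.
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [(0, 2, "C"), (2, 5, "V"), (5, 6, "C")]

consonants_only_signal = silence_class(signal, labels, silenced="V")
vowels_only_signal = silence_class(signal, labels, silenced="C")
print(consonants_only_signal)
print(vowels_only_signal)
```

Zeroing the samples instead of cutting them out leaves the overall duration unchanged.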

  5. Coarticulation
     • articulating means:
       • articulators are in motion, not in fixed positions
       • articulators move continuously, not discretely
       • articulatory movements overlap in time

  6. • original
     • vowels only
     • vowels only, without silences

  7. Timing
     • information in consonant durations: silence is more than nothing

  8. Speech melody
     • information about fundamental frequency (F0) in the voiced vowel segments
       • with F0 variation
       • without any F0 variation (monotonous)
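The monotonous version can be thought of as flattening the F0 contour. The sketch below only shows the contour manipulation on a list of F0 values, with None for unvoiced frames; using the mean of the voiced frames as the flat value is an assumption, and the resynthesis of audio from the flattened contour is not shown.

```python
from typing import List, Optional


def monotonise(f0: List[Optional[float]]) -> List[Optional[float]]:
    """Set every voiced frame to the mean F0; leave unvoiced frames (None) alone."""
    voiced = [value for value in f0 if value is not None]
    if not voiced:
        return list(f0)
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if value is not None else None for value in f0]


# Toy contour in Hz, one value per analysis frame (values are made up).
contour = [None, 100.0, 120.0, None, 140.0]
print(monotonise(contour))
```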

  9. Annotation of sound segments: discreteness in mind & in physics
     • "Es ist 8 Uhr morgens."
     • [Figure: the word "morgens" at three levels]
         graphemes:  m o r g e n s
         phonemes:   m O r g @ n s
         phones:     m O6 N s

  10. Annotation of sound segments: discrete units?
     • "Die Nacht haben Maiers gut geschlafen."
     • "…………… haben Maier ……………………."
       • phonemic: h a: b @ n m aI @ r s
       • acoustic-phonetic: h a: b m aI 6 s
       • articulatory-phonetic: h a: b n m aI 6 s (possibly)

  11. Segmentation of sound segments: degree of discreteness
     • "Wer möchte noch Milch?"
     • clear segmentation:
       • closure and closure release of [t] in "möch t e"
     • unclear segmentation:
       • [I l] in "M il ch"

  12. Kiel Corpus: read & spontaneous speech
     • orthography
     • phonemic (canonical) form
     • realised form
     • word & sentence boundaries
     • manually labelled

  13. From sounds to syllables: how many syllables?
     • semi-vowels: syllabic or not?
       • Studie: Stu - di - e vs. Stu - die
       • Piano: Pi - a - no vs. Pia - no
     • size of auditory window
       • "… mit mir diese Dienstreise zu unternehmen, …"
       • rei - se - zu - un - ter
       • zu - un - ter
       • zu - un

  14. From sounds to syllables: where is the syllable boundary?
     • ambisyllabic consonants & onset principles
       • Mitte: /m I - t @/ vs. /m I _t @/
       • Adler: /a: t - l @ r/ vs. /a: - d l @ r/
       • Fenster: /f E n s - t E r/ vs. /f E n - s t E r/
     • resyllabification
       • "Wenn es Ihnen da 5 Tage lang irgendwo passen würde."
       • /v E n - E s/ vs. [v E _ n E s]

  15. Controlled elicitation of spontaneous speech
     • Monologues
       • Narrative (Erzählung)
       • Picture description (Bildbeschreibung)
     • Dialogues: task-oriented data collection
       • Map Task
       • Appointment-making
     • Degree of naturalness?
     • Controlled elicitation

  16. Controlled elicitation of spontaneous speech

  17. Problems for annotation: non-speech in speech
     • Many non-linguistic signal portions:
       • swallowing
       • lip-smacking
       • breathing
       • unfilled and filled pauses
       • laughter
       • hesitational lengthening
     • These partly overlap with speech

  18. Functions of prosody
     • Generally: features above the segmental level → suprasegmental

  19. Phonetic encoding of prosody
     • perceived pitch over time
     • duration
     • intensity
     • spectral quality

  20. Prosodic annotation: signal-oriented
     • Tilt model (Taylor 2000)
     • intonational "events"
     • continuous parameters (tilt parameters):
       • amplitude: sum of the magnitudes of rise and fall
       • duration: sum of rise and fall durations
       • tilt: shape of the event
     • [Figure labels: 1.0, 0.5, 0]
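A worked example of the three parameters, following the definitions in Taylor (2000): amplitude and duration are the sums named above, and tilt is the average of an amplitude ratio and a duration ratio, ranging from +1 (pure rise) through 0 (equal rise and fall) to -1 (pure fall). The function and class names and the input values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TiltEvent:
    amplitude: float  # |A_rise| + |A_fall|, e.g. in Hz
    duration: float   # D_rise + D_fall, in seconds
    tilt: float       # shape: +1 pure rise, 0 symmetric rise-fall, -1 pure fall


def tilt_parameters(
    a_rise: float, a_fall: float, d_rise: float, d_fall: float
) -> TiltEvent:
    """Compute the Tilt parameters of one event from its rise and fall parts."""
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    if amplitude == 0 or duration == 0:
        raise ValueError("an event needs a non-zero amplitude and duration")
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude
    tilt_dur = (d_rise - d_fall) / duration
    return TiltEvent(amplitude, duration, 0.5 * (tilt_amp + tilt_dur))


# Made-up events: a pure rise, a symmetric rise-fall, and a pure fall.
print(tilt_parameters(20.0, 0.0, 0.2, 0.0))
print(tilt_parameters(20.0, -20.0, 0.1, 0.1))
print(tilt_parameters(0.0, -20.0, 0.0, 0.2))
```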

  21. Prosodic annotation: autosegmental, phonological
     • GToBI (Grice et al.)
     • Tonal tier, break tier
     • Two pitch levels (L, H)
     • Simple and complex pitch accents
     • Association with word stress marked by *
     • Exact temporal alignment
     • Boundary tones marked by %
     • Strength of prosodic breaks (3, 4)

  22. Prosodic annotation: example
     • [Figure: annotated example with the tiers tonal, orth., break, misc]

  23. GToBI label files
     • orthographic
         46.836392 113 also
         46.958899 113 ich
         47.171623 113 bin
         47.555335 113 genau
         48.180049 113 waagerecht
         48.468170 113 rechts
         48.613576 113 von
         48.726670 113 der
         49.246344 113 Goldmine
     • tones
         47.469173 115 L+H*
         47.555339 115 H-
         47.768061 115 H*
         47.851534 115 <
         48.320061 115 !H*
         48.812822 115 !H*
         49.240958 115 L-%
     • breaks
         47.555339 123 3
         49.249036 123 4
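Label files of this kind are easy to query with a short script. The sketch below parses the three-field lines and looks up the word in which each pitch accent (a tone label containing "*") falls. It rests on two assumptions that the slides do not state: that the first field is a time in seconds and that each word's time stamp marks the end of that word. The middle field is read but not interpreted, and all function names are made up.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Label = Tuple[float, int, str]  # (time, numeric code, label text)

WORDS = """\
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine
"""

TONES = """\
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%
"""


def parse_labels(text: str) -> List[Label]:
    """Parse lines of the form '<time> <code> <label>'."""
    labels = []
    for line in text.splitlines():
        if not line.strip():
            continue
        time, code, label = line.split(None, 2)
        labels.append((float(time), int(code), label.strip()))
    return labels


def word_at(words: List[Label], time: float) -> Optional[str]:
    """Return the first word whose end time is at or after `time` (assumption:
    word time stamps are end times and the list is sorted by time)."""
    ends = [end for end, _, _ in words]
    index = bisect_left(ends, time)
    return words[index][2] if index < len(words) else None


words = parse_labels(WORDS)
tones = parse_labels(TONES)
accented = [(tone, word_at(words, time)) for time, _, tone in tones if "*" in tone]
print(accented)
```

Under these assumptions the four pitch accents fall on "genau", "waagerecht", "rechts" and "Goldmine".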

  24. Prosodic annotation: phonological, single-layer
     • KIM (Kohler 1995)
     • no suprasegmental tiers => efficient analysis of segment-prosody interaction
     • prosodic labels are distinguished from segmental labels by special diacritics
     • time marks for prosodic events are anchored to word boundaries
     • Example:

  25. 13 #c: 0.0007500
      13 #&2 0.0007500
      13 ##v: 0.0007500
      13 $Q- 0.0007500
      13 $E: 0.0007500
      2147 $m 0.1341250
      4787 #&PGn 0.2991250
      4787 #&2( 0.2991250
      4787 ##d 0.2991250
      6243 $-h 0.3901250
      6619 $'i: 0.4136250
      7569 $n 0.4730000
      8265 $s 0.5165000
      9202 $t 0.5750625
      9527 $-h 0.5953750
      9995 $a: 0.6246250
      10648 $k-x 0.6654375
      11405 #&0 0.7127500
      11405 ##v 0.7127500
      12528 $Y6 0.7829375
      13946 $d 0.8715625
      14275 $@+ 0.8921250
      14721 #&0 0.9200000
      14721 ##m 0.9200000
      16051 $i:6+ 1.0031250
      16935 #&0 1.0583750
      16935 ##g 1.0583750
      18093 $-h 1.1307500
      18564 $'u: 1.1601875
      19314 $t 1.2070625
      19981 $-h 1.2487500
      20336 #&0. 1.2709375
      20336 #&2) 1.2709375
      20336 ##p 1.2709375
      21501 $-h 1.3437500
      22440 $'a 1.4024375
      23700 $s 1.4811875
      25408 $@- 1.5879375
      25408 $n 1.5879375
      28935 #, 1.8083750
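The first and third columns of the excerpt are related: for every row, the third value equals (first value - 1) / 16000. This suggests a 1-based sample index and a time in seconds at a 16 kHz sampling rate, which is an inference from the numbers, not something the slides state. The sketch below checks that relation on a few rows and pulls out the labels that start with "#&"; treating that prefix as the marker of prosodic labels is likewise an assumption based on the excerpt.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # Hz; assumption inferred from the excerpt, not stated in it

# A few rows copied from the excerpt: sample index, label, time.
ROWS = """\
13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 ##d 0.2991250
6619 $'i: 0.4136250
20336 #&0. 1.2709375
28935 #, 1.8083750
"""

Row = Tuple[int, str, float]


def parse_rows(text: str) -> List[Row]:
    """Parse lines of the form '<sample> <label> <time>'."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sample, label, time = line.split()
        rows.append((int(sample), label, float(time)))
    return rows


def sample_to_time(sample: int) -> float:
    """Convert a 1-based sample index to seconds (assumed convention)."""
    return (sample - 1) / SAMPLE_RATE


rows = parse_rows(ROWS)
consistent = all(abs(sample_to_time(s) - t) < 1e-9 for s, _, t in rows)
prosodic = [label for _, label, _ in rows if label.startswith("#&")]
print(consistent)
print(prosodic)
```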

  30. Data structures and retrieval
     • Mostly plain text files, time-aligned to the signal
     • "Retrieval" using scripting languages
     • (GToBI in EMU format)
     • XML formats
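As an illustration of the step from plain text labels to an XML representation, the sketch below writes one tier of time-stamped labels as XML with Python's standard library. The element and attribute names ("tier", "label", "name", "time") are invented for this example and do not correspond to EMU or to any particular annotation schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def labels_to_xml(tier_name: str, labels: List[Tuple[float, str]]) -> str:
    """Serialise (time, label) pairs as one hypothetical <tier> element."""
    tier = ET.Element("tier", name=tier_name)
    for time, text in labels:
        label = ET.SubElement(tier, "label", time=f"{time:.6f}")
        label.text = text
    return ET.tostring(tier, encoding="unicode")


# Two tone labels taken from the GToBI label file example.
xml_text = labels_to_xml("tones", [(47.469173, "L+H*"), (47.851534, "<")])
print(xml_text)

# Reading the XML back recovers the labels, including the escaped "<".
restored = [(float(el.get("time")), el.text) for el in ET.fromstring(xml_text)]
print(restored)
```

The "<" label shows one practical difference from plain text files: characters with a special meaning in XML are escaped on writing and restored on reading.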

  31. What for?
     • Basic research
       • Rhythmic patterns
       • Speech rate measurements (units, domains)
       • Temporal alignment & scaling of pitch accents
       • Differentiated analysis of pitch range
     • Speech technology
       • Modelling accentuation in ASR
       • Speech rate in ASR
       • Intonation and timing for synthesis

  32. Bibliography
     • Alwan, A., H. Bourlard and S. Furui (eds) (2001). Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.
     • Grice, M., S. Baumann and R. Benzmüller (to appear). German ToBI. In: S. Jun (ed). Prosodic Typology.
     • Grice, M. et al. (2000). Representation and annotation of dialogue. In: Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.
     • Kohler, K.J. (ed) (1995). Kieler Arbeitsberichte 29.
     • Taylor, P. (2000). Analysis and Synthesis of Intonation Using the Tilt Model. In: JASA 107(3), pp. 1697-1714.
