Analyzing Lexical Tone, Sentence Intonation and Discourse Prosody in the F0 Signal of Continuous Mandarin Speech

Phonetics Lab, Institute of Linguistics, Academia Sinica. Chao-yu Su (蘇昭宇), morison@gate.sinica.edu.tw, http://phslab.ling.sinica.edu.tw/

Presentation Transcript


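  1. Analyzing Lexical Tone, Sentence Intonation and Discourse Prosody in the F0 Signal of Continuous Mandarin Speech. Phonetics Lab, Institute of Linguistics, Academia Sinica. Chao-yu Su (蘇昭宇), morison@gate.sinica.edu.tw, http://phslab.ling.sinica.edu.tw/ NeGSST 2008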

  2. Outline
     • What is fluent speech prosody? Tones and intonation?
     • Why the HPG framework (Tseng, 2004-2008)?
     • How to decompose F0 contours?
     • The nature of the Fujisaki model
     • Auto-extracting of the Fujisaki parameters
     • Calculating layered contribution by the HPG
     • Modeling tones, intonation and additional components
     • Investigating prosodic style variation
     • Further evidence of the HPG framework
     • Domains and units of boundaries (acoustic features) in fluent speech
     • Pause or no pause for boundaries?
     • Boundary effects and the relativeness of supra-segmental signals

  3. The HPG framework: Speech Paragraph (context of cross-over & adjacency) (Tseng 2005)
     [Figure: HPG hierarchy tree with node labels Discourse, PG, BG, PPh, DM/PF, PW and SYL, annotated with adjacency and cross-over relations.]
     Prosodic units: Syllables (SYL), Prosodic Words (PW), Prosodic Phrases (PPh), Breath Groups (BG) and Prosodic phrase Groups (PG), with corresponding Boundary Breaks B1, B2, B3, B4 and B5, where SYL/B1 < PW/B2 < PPh/B3 < BG/B4 < PG/B5.
     Output prosody is super-positional and cumulative (Tseng et al., 2004, 2005, 2006).

  4. Prosodic Units and Boundaries in the Framework
     [Figure: nested units and their boundaries, labeled Prosodic Group (B5), Breath Group (B4), PG-Initial / PG-Medial / PG-Final (B3) and PW (B2).]

  6. Fujisaki Model (Fujisaki 1984)
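
     For reference, the Fujisaki model is usually written as a superposition in the log-F0 domain: a baseline Fb, phrase components (amplitude Ap) and accent or tone components (amplitude Aa). The sketch below is a minimal Python rendering of that standard formulation, not code from the talk. The function names and the default constants alpha, beta and gamma are illustrative assumptions.

```python
import math

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (T0, Ap) impulse commands.
    accent_cmds: list of (T1, T2, Aa) step commands (onset, offset, amplitude).
    The default alpha, beta, gamma are illustrative values (assumption).
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

     With no commands the function returns the baseline Fb. Each phrase command adds a slowly decaying rise, and each accent command adds a local plateau between T1 and T2.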

  8. Auto-extraction based on Mixdorff's method (2000, 2003): the original F0 contour is passed through a high-pass filter (stop frequency at 0.5 Hz), which separates it into a high-frequency contour (HFC) and a low-frequency contour (LFC).
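
     A rough sketch of the HFC/LFC split described on this slide. The filter design is not given in the transcript, so the code substitutes a one-pole low-pass run forward and backward (an assumption, not Mixdorff's filter). It takes the smoothed output as the LFC and the remainder as the HFC. The 100 frames/s rate and the name split_contour are also assumptions, and the input is expected to be a non-empty, already interpolated log-F0 track.

```python
import math

def split_contour(log_f0, frame_rate=100.0, cutoff_hz=0.5):
    """Split a log-F0 track into (lfc, hfc) such that lfc + hfc equals the input."""
    dt = 1.0 / frame_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)  # smoothing factor of a one-pole low-pass (stand-in filter)

    def smooth(xs):
        out, y = [], xs[0]
        for x in xs:
            y += a * (x - y)
            out.append(y)
        return out

    # Forward then backward pass, so the smoothed contour is not delayed.
    forward = smooth(log_f0)
    lfc = smooth(forward[::-1])[::-1]
    hfc = [x - low for x, low in zip(log_f0, lfc)]
    return lfc, hfc
```

     In the Fujisaki setting, the LFC is the part fitted with phrase commands (Ap) and the HFC the part fitted with accent or tone commands (Aa).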

  9. The decision of commands
     [Figure labels: B3, optimization, B3; low-frequency contour (LFC) from Mixdorff's filter.]
     Optimization criteria: Min

  10. Results of auto-extraction based on Mixdorff's method for Mandarin (Phonetics Lab, Academia Sinica)

  12. Model additional component of F0 contour by HPG
     • Tone (SYL) & above (PW): single tone model (Tone1, Tone3), PW information, PW model
     • Prosodic phrase (PPh) & above (BG & PG): single phrase model, PG information, PG model (multiple phrase)
     • Final output of F0 contour by HPG

  13. Examples
     [Figure: two panels, without PG-effect and with PG-effect.]
     A single expected cell mean cannot approximate the LFC well, especially at PG-initial and PG-final positions.

  14. Model additional component of F0 contour by HPG: tone (SYL) & above (PW); prosodic phrase (PPh) & above (BG & PG)

  15. Layered Contributions: linear regression
     [Figure: HPG layers SYL, PW, PPh and BG, with residues passed from each layer to the next.]
     • Predict syllable duration from the SYL category at the bottom of the HPG.
     • The error between the prediction and the real value is regarded as the effect of the PW layer rather than as unpredicted variation.
     • The same prediction is repeated at each HPG layer, from SYL upward to PG.
     • The final output is the sum of the predictions at each prosodic layer, and the prediction accuracy of each layer is regarded as its layered contribution.
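
     The layer-by-layer procedure on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's implementation: each layer is fitted with per-category cell means (equivalent to a linear regression on category indicators), and "accuracy" is taken to be the cumulative share of variance explained. The name layered_fit and the label format are hypothetical.

```python
from collections import defaultdict

def layered_fit(values, layer_labels):
    """Fit residuals layer by layer, bottom-up.

    values: one observation per syllable (e.g. duration in ms).
    layer_labels: list of (layer_name, labels), ordered from SYL upward,
                  with one category label per syllable in each layer.
    Returns (prediction, [(layer_name, cumulative_variance_explained), ...]).
    """
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)
    residual = list(values)
    prediction = [0.0] * n
    report = []
    for name, labels in layer_labels:
        groups = defaultdict(list)
        for r, lab in zip(residual, labels):
            groups[lab].append(r)
        cell_mean = {lab: sum(rs) / len(rs) for lab, rs in groups.items()}
        for i, lab in enumerate(labels):
            prediction[i] += cell_mean[lab]  # final output: sum over layers
            residual[i] -= cell_mean[lab]    # remainder goes to the next layer
        rss = sum(r * r for r in residual)
        # Assumption: accuracy = cumulative share of variance explained.
        report.append((name, 1.0 - rss / total_ss if total_ss else 1.0))
    return prediction, report
```

     The increase in the cumulative figure from one layer to the next is what this sketch treats as that layer's contribution.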

  16. Speech material
     • Two types of Mandarin speech corpus (read speech)
       - (1) Plain text of 26 discourse pieces, read by one male (M051) and one female (F051) speaker
       - (2) Three rhyme formats of Chinese Classics, read by one male (M056) and one female (F054) speaker
     • Pre-analysis annotation
       - Automatically labeled segmental identities with the HTK toolkit
       - Subsequent manual tagging with the Sinica COSPRO Toolkit
       - Spot-checking of annotated segments
     • Table 1: Summary of speech data by corpus type

  18. Aa prediction & Tone Model
     [Figure panels: cumulative accuracy of Aa prediction; tone model.]

  19. Boundary effect above PPh for Aa prediction
     [Figure: cumulative accuracy of Aa prediction.]

  20. PW model

  21. PW model

  22. Ap & Aa Cumulative Accuracy
     [Figure panels: Ap cumulative accuracy; final accuracy of total F0 contour prediction.]
     The average of the Aa and Ap prediction accuracies was used as the final accuracy of the total F0 contour prediction.

  23. PG Model for Ap & average duration
     Figure caption: An example of average Ap by PG-position for two adjacent PGs, by speaker and by speech data type. The horizontal axis represents the PG-position index; the vertical axis represents the average Ap values.
     Figure caption: The tempo of the same examples used in Figure 1 is plotted by speaker and by speech data type. The horizontal axis represents the PG-position index; the vertical axis represents the mean syllable duration values.

  24. Significance test of PPh features
     Table caption: Comparison of Ap by pairs of PG positions (initial/medial, initial/final and medial/final) and by speaker. The asterisk * denotes statistically significant differences.
     Table caption: Comparison of mean syllable duration by pairs of PG positions (initial/medial, initial/final and medial/final) and by speaker. The asterisk * denotes statistically significant differences.

  26. Classification of Stylistic Variations: regular, semi-regular, irregular

  27. Relationship between prosodic styles & HPG contribution distribution
     • Higher-level contribution varies by prosodic style, as shown from R, SMR and IR to WIR.
     • Various prosodic styles can be explained systematically by the HPG framework.
     • The more regular the prosodic style, the larger the prosodic domain and the greater the contribution from higher-level information.

  29. Layered Contributions of Duration, Intensity and Boundary Pause (Tseng, 2004, 2005)
     [Figure: duration, pause and intensity patterns of the PW and PPh layers for speaker F051P.]

  30. Layered Contributions of Duration, Intensity and Boundary Pause (Tseng, 2004, 2005)
     [Figure: duration, pause and intensity patterns of the PW and PPh layers for speaker F051P. Panels: duration pattern, intensity pattern, pause pattern; BG layer.]

  31. Comparison of Cumulative Predictions and Speech Data
     [Figure: comparison between speech data and predictions for M051.]

  33. Previous and Revised Models: Duration Patterns at PPh Layer
     • The PPh duration patterns derived from the revised model are more contrastive than those from the previous model.

  34. Previous and Revised Models: Intensity Patterns at PPh Layer
     • The PPh patterns from the revised model decay more drastically toward the boundary and thus match the tendency of intensity attenuation in PPh-final weakening, especially for M051P.

  36. Discourse boundary discrimination
     • Boundary properties and respective discourse identities
       - Only from pause duration?
       - Pre-boundary syllable lengthening? (as in other intonation studies)
       - Relative acoustic features?
     • Boundary discrimination
       - Detect topic & discourse organization
     • Discourse prosody context
       - Cross-over & adjacency

  37. Experiment 1
     Q: Is the syllable domain alone enough to discriminate discourse boundary identities?
     Goal: Determine whether singular or relative acoustic factors in the syllable layer are sufficient to differentiate B3, B4 & B5.
     Acoustic features:
     • Singular acoustic factors: (1) boundary pause (BP), (2) pre-boundary syllable duration (PrDu) and (3) pre-boundary syllable intensity (PrIn)
     • Relative acoustic factors: (4) between-boundary syllable duration contrast (DuCon) and (5) between-boundary syllable intensity contrast (InCon)
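
     To make the singular/relative distinction concrete, here is a small sketch that assembles the five features for one boundary. The slide does not define how the contrasts are computed, so taking each contrast as the pre-boundary value minus the post-boundary value is an assumption. The Syllable record and the function name are also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    duration: float   # seconds
    intensity: float  # dB

def boundary_features(pre, post, pause):
    """Features for one boundary between syllables `pre` and `post`."""
    return {
        # Singular factors: measured on the boundary or the pre-boundary syllable.
        "BP": pause,
        "PrDu": pre.duration,
        "PrIn": pre.intensity,
        # Relative factors. Assumption: contrast = pre-boundary minus post-boundary.
        "DuCon": pre.duration - post.duration,
        "InCon": pre.intensity - post.intensity,
    }
```

     For example, boundary_features(Syllable(0.30, 62.0), Syllable(0.18, 70.0), 0.45) describes a lengthened, weakened pre-boundary syllable followed by a 0.45 s pause.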

  38. Results of Experiment 1 (1/2)
     Figure caption: Cross-boundary discrimination by single acoustic features. Each panel shows one acoustic feature. The horizontal axis shows the prosodic boundary index; the vertical axis shows the coefficient of normalized values of boundary pause (BP), pre-boundary duration (PrDu) and pre-boundary intensity (PrIn), respectively.

  39. Results of Experiment 1 (2/2)
     Figure caption: Cross-boundary discrimination by single contrastive factors. Each panel shows one contrastive feature. The horizontal axis shows the prosodic boundary index; the vertical axis shows the coefficient of normalized values of between-boundary duration contrast (DuCon) and between-boundary intensity contrast (InCon).

  40. Experiment 2
     Q: What scale of boundary context is needed to discriminate discourse boundary identities?
     Goal: Account for boundary context in the acoustic signals by discourse specifications.
     • Acoustic features: average duration of prosodic units at different scales
       - Syllable
       - Prosodic word
       - Prosodic phrase

  41. Results of Experiment 2 (1/2): Lengthening Patterns by Discourse Units
     Figure caption: Cross-boundary comparison of duration patterns by prosodic unit: the syllable (SYL), the PW and the PPh. The horizontal axis represents indexes of the speech data and speaker; the vertical axis denotes the normalized average duration of prosodic units.

  42. Results of Experiment 2 (2/2): Lengthening Patterns by Discourse Units
     Figure caption: Cross-boundary duration patterns by boundary break. Each panel shows the result for one prosodic unit, and each curve represents one speech data set. The horizontal axis represents the prosodic boundary index; the vertical axis denotes the normalized average duration for that prosodic unit.

  43. Conclusions of Fluent Narrative Speech
     • Prosody and prosody context: more information beyond tones and intonation
       - Adjacency and cross-over associations
     • Tone and intonation variation can be explained systematically by the HPG framework
       - Tone variation: lower level by HPG
       - Various prosodic styles: higher level by HPG
     • Prosody context boundaries across fluent speech
       - Discourse specified
     • Possible application to ASR
       - Topic & discourse organization detection
