
Context in Multilingual Tone and Pitch Accent Recognition

Gina-Anne Levow

University of Chicago

September 7, 2005


Roadmap

  • Motivating Context

  • Data Collections & Processing

  • Modeling Context for Tone and Pitch Accent

  • Context in Recognition

  • Conclusion


Challenges

  • Tone and Pitch Accent Recognition

    • Key component of language understanding

      • Lexical tone carries word meaning

      • Pitch accent carries semantic, pragmatic, discourse meaning

    • Non-canonical form (Shen 90, Shih 00, Xu 01)

      • Tonal coarticulation modifies surface realization

        • In extreme cases, fall becomes rise

    • Tone is relative

      • To speaker range

        • High for male may be low for female

      • To phrase range, other tones

        • E.g. downstep


Strategy

  • Common model across languages, SVM classifier

    • Acoustic-prosodic model: no word label, POS, lexical stress info

      • No explicit tone label sequence model

    • English, Mandarin Chinese (also Cantonese)

  • Exploit contextual information

    • Features from adjacent syllables

      • Height, shape: direct, relative

    • Compensate for phrase contour

  • Analyze impact of

    • Context position, context encoding, context type

    • > 20% relative error reduction over no context

      • Preceding context gives greater enhancement than following context


Data Collection & Processing

  • English (Ostendorf et al., 95):

    • Boston University Radio News Corpus, f2b

    • Manually ToBI annotated, aligned, syllabified

    • Pitch accent aligned to syllables

      • Unaccented, High, Downstepped High, Low

        • (Sun 02, Ross & Ostendorf 95)

  • Mandarin:

    • TDT2 Voice of America Mandarin Broadcast News

    • Automatically force aligned to anchor scripts (CUSonic)

    • Tones: High, Mid-rising, Low, High-falling, Neutral


Local Feature Extraction

  • Uniform representation for tone, pitch accent

    • Motivated by Pitch Target Approximation Model

      • Tone/pitch accent target exponentially approached

        • Linear target: height, slope (Xu et al, 99)

  • Scalar features:

    • Pitch, Intensity max, mean (Praat, speaker normalized)

    • Pitch at 5 points across voiced region

    • Duration

    • Initial, final in phrase

  • Slope:

    • Linear fit to last half of pitch contour
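
The feature list above can be sketched roughly as follows. This is a minimal illustration, not the code used in the talk: the z-score form of speaker normalization, the function names, and the feature names are all assumptions, and the pitch and intensity tracks are taken to be per-frame values over the voiced region (for example, as exported from Praat).

```python
from statistics import mean, pstdev

def zscore_normalize(values, speaker_values):
    """Normalize by a speaker's mean and std. dev. (assumed normalization scheme)."""
    mu = mean(speaker_values)
    sigma = pstdev(speaker_values) or 1.0  # guard against a constant track
    return [(v - mu) / sigma for v in values]

def sample_points(contour, n=5):
    """Pick n evenly spaced samples across the voiced region."""
    if len(contour) == 1:
        return [contour[0]] * n
    last = len(contour) - 1
    return [contour[round(i * last / (n - 1))] for i in range(n)]

def linear_slope(values):
    """Least-squares slope of values against their frame index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def local_features(pitch, intensity, duration, phrase_initial, phrase_final):
    """Scalar and slope features for one syllable (inputs already normalized)."""
    feats = {
        "pitch_max": max(pitch),
        "pitch_mean": mean(pitch),
        "int_max": max(intensity),
        "int_mean": mean(intensity),
        "duration": duration,
        "phrase_initial": float(phrase_initial),
        "phrase_final": float(phrase_final),
        # Slope: linear fit to the last half of the pitch contour.
        "slope": linear_slope(pitch[len(pitch) // 2:]),
    }
    for i, p in enumerate(sample_points(pitch)):
        feats[f"pitch_pt{i}"] = p
    return feats
```

Fitting the slope only to the second half of the contour follows the slide; the intuition from the target approximation model is that the later part of the syllable lies closer to the underlying target.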


Context Features

  • Local context:

    • Extended features

      • Pitch max, mean, adjacent points of preceding, following syllables

    • Difference features

      • Difference between

        • Pitch max, mean, mid, slope

        • Intensity max, mean

      • Of preceding, following and current syllable

  • Phrasal context:

    • Compute collection average phrase slope

    • Compute scalar pitch values, adjusted for slope
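
The two local-context encodings might look like the sketch below. It assumes each syllable is a dictionary of already-extracted features; the key names (`pitch_mid`, `pitch_first`, `pitch_last`, and so on), the zero fill at phrase edges, and the function itself are hypothetical and only illustrate the idea of extended versus difference features.

```python
EXTENDED_KEYS = ("pitch_max", "pitch_mean")
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "int_max", "int_mean")

def context_features(prev, cur, nxt, encoding="difference", position="both"):
    """Augment the current syllable's features with neighbour context.

    prev / nxt are feature dicts for the adjacent syllables, or None when
    there is no neighbour (missing values are filled with 0.0, an assumption).
    """
    feats = dict(cur)
    neighbours = []
    if position in ("preceding", "both"):
        # The adjacent point of the preceding syllable is its last pitch point.
        neighbours.append(("prev", prev, "pitch_last"))
    if position in ("following", "both"):
        # The adjacent point of the following syllable is its first pitch point.
        neighbours.append(("next", nxt, "pitch_first"))
    for name, syl, adjacent_key in neighbours:
        if encoding == "extended":
            # Extended: copy the neighbour's own values into the vector.
            for k in EXTENDED_KEYS + (adjacent_key,):
                feats[f"{name}_{k}"] = syl[k] if syl is not None else 0.0
        elif encoding == "difference":
            # Difference: current value minus the neighbour's value.
            for k in DIFF_KEYS:
                feats[f"{name}_d_{k}"] = (cur[k] - syl[k]) if syl is not None else 0.0
        else:
            raise ValueError(f"unknown encoding: {encoding}")
    return feats
```

Setting `position` to `"preceding"` or `"following"` gives the one-sided conditions compared in the experiments; leaving the context out entirely is just the unmodified `cur` dictionary.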


Classification Experiments

  • Classifier: Support Vector Machine

    • Linear kernel

    • Multiclass formulation

      • SVMlight (Joachims), LibSVM (Chang & Lin 01)

    • 4:1 training / test splits

  • Experiments: Effects of

    • Context position: preceding, following, none, both

    • Context encoding: Extended/Difference

    • Context type: local, phrasal
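
A rough sketch of the experimental grid and the 4:1 partitioning is given below. The slide does not say how the splits were drawn, so rotating the held-out fifth (five-fold style), the shuffling seed, and the function names are assumptions; the classifier call itself is omitted.

```python
from itertools import product
import random

POSITIONS = ("none", "preceding", "following", "both")
ENCODINGS = ("extended", "difference")

def conditions():
    """Yield (position, encoding) pairs; encoding is moot when there is no context."""
    yield ("none", None)
    for pos, enc in product(POSITIONS[1:], ENCODINGS):
        yield (pos, enc)

def four_to_one_splits(n_items, seed=0):
    """Yield (train_idx, test_idx) pairs, each holding out one fifth of the data."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for held_out in range(5):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

Each condition would then be trained and scored on every split with a linear-kernel multiclass SVM, and the accuracies averaged; the phrasal-context condition can be layered on top as a separate on/off switch.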


Results: Local Context


Discussion: Local Context

  • Any context information improves over none

    • Preceding context information consistently improves over none or following context information

      • English: Generally more context features are better

      • Mandarin: Following context can degrade

    • Little difference between encodings (Extended vs Difference)

  • Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory


Results & Discussion: Phrasal Context

  • Phrase contour compensation enhances recognition

    • Simple strategy

    • Non-linear slope compensation may improve results further
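
The simple linear strategy can be sketched as follows: fit a line to each phrase's pitch track, average the slopes over the collection, and subtract the average trend from every pitch value. The function names and the choice to measure time from the phrase start are assumptions made for illustration.

```python
from statistics import mean

def fit_slope(times, values):
    """Least-squares slope of pitch against time within one phrase."""
    t_mean, v_mean = mean(times), mean(values)
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        return 0.0
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return num / den

def average_phrase_slope(phrases):
    """phrases: list of (times, pitch_values), times measured from phrase start."""
    return mean([fit_slope(t, v) for t, v in phrases])

def compensate(times, values, avg_slope):
    """Remove the collection-average phrase trend from one phrase's pitch values."""
    return [v - avg_slope * t for t, v in zip(times, values)]
```

With a negative average slope (declining phrase contour), pitch values late in the phrase are raised relative to early ones, so a high tone near the phrase end is not mistaken for a lower one.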


Conclusion

  • Employ common acoustic representation

    • Tone (Mandarin), pitch accent (English)

      • Cantonese, recent experiments

  • SVM classifiers - linear kernel: 76%, 81%

  • Local context effects:

    • Up to > 20% relative reduction in error

    • Preceding context greatest contribution

      • Carryover vs anticipatory

  • Phrasal context effects:

    • Compensation for phrasal contour improves recognition
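
For reference, "relative reduction in error" is computed on error rates, not on accuracy. The small helper below shows the arithmetic; the accuracies in the usage note are made-up illustrative numbers, not results from the talk.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline error removed by the improved system."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err <= 0:
        raise ValueError("baseline has no error to reduce")
    improved_err = 1.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err
```

For example, a hypothetical move from 70% to 76% accuracy cuts the error from 30% to 24%, which is a 20% relative reduction even though accuracy rose by only 6 points.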


Current & Future Work

  • Application of model to different languages

    • Cantonese, Dschang (Bantu family)

      • Cantonese: ~65% with acoustic features only, 85% with segmental information

  • Integration of additional contextual influence

    • Topic, turn, discourse structure

    • HMSVM, GHMM models

  • http://people.cs.uchicago.edu/~levow/projects/tai

    • Supported by NSF Grant #: 0414919


Confusion Matrix (English)


Confusion Matrix (Mandarin)


Related Work

  • Tonal coarticulation:

    • Xu & Sun 02; Xu 97; Shih & Kochanski 00

  • English pitch accent:

    • X. Sun 02; Hasegawa-Johnson et al. 04; Ross & Ostendorf 95

  • Lexical tone recognition:

    • SVM recognition of Thai tone: Thubthong 01

    • Context-dependent tone models

      • Wang & Seneff 00, Zhou et al 04


Pitch Target Approximation Model

  • Pitch target:

    • Linear model:

    • Exponentially approximated:

    • In practice, assume target well-approximated by mid-point (Sun, 02)
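
As a sketch of the two forms named above, the model is usually written as follows in the pitch target approximation literature. The symbols are assumptions, since none appear on the slide: $T(t)$ is the underlying linear target with slope $a$ and height $b$, $y(t)$ is the realized pitch, $\beta$ is the initial distance from the target, and $\lambda > 0$ is the rate of approach.

```latex
\begin{align}
T(t) &= a t + b \\
y(t) &= \beta e^{-\lambda t} + a t + b
\end{align}
```

Because the exponential term decays over the syllable, $y(t)$ converges on $T(t)$, which is why height and slope measured late in the syllable serve as estimates of the target.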

