Building a sentential model for automatic prosody evaluation

Building a sentential model for automatic prosody evaluation, Part A
Kyuchul Yoon
School of English Language & Literature, Yeungnam University
2009.06.19, Korea University

Introduction: English pronunciation evaluation
• English pronunciation proficiency evaluation
  • Ultimate goals: evaluation at
    • the segmental level
    • the suprasegmental level
  • Current goals: evaluation at
    • the suprasegmental level

Introduction: English pronunciation evaluation
• The goal of the present study
  • Prosody evaluation of a single target utterance produced by a Korean student, given
    • an English target sentence
    • a sentential model for prosody evaluation

Introduction: Manual vs. automatic
• Problems of manual evaluation
  • What to evaluate
  • How to evaluate
  • Consistency
• Problems of automatic evaluation
  • How to reflect human knowledge

Introduction: Manual vs. automatic
• A possible solution?
  • Avoid knowledge-based abstraction
  • Compare a target utterance with native speakers’ utterances
    • Use multiple utterances for comparison: multiple “good” utterances from native speakers
  • Adopt raw values
  • Calculate difference values between the target and the “good” utterances in terms of the three prosodic aspects (F0, intensity, durations) → 3D coordinates

Introduction: How to build the model
• Use multivariate statistical analysis: a discriminant analysis
• The components of the model (with the segmental proficiency scores controlled)
  • The manual prosody evaluation scores (response)
  • The automatic prosody evaluation scores (factors)
• The requirements of the model
  • Correlation between the two levels: manual scores vs. automatic scores

Introduction: How to build the model
• The manual prosody scores (an ideal case), on a scale of 1 (worst) to 5 (best)
  • The “good” utterance versions (score 5) by many native speakers of English
  • The utterance versions by Korean students whose prosodic proficiencies are
    • high (score 5)
    • intermediate (score 3)
    • low (score 1)

Introduction: How to build the model
• The automatic prosody scores
  • Use of Praat scripts
  • Comparison between a single target utterance and multiple native speakers’ utterances yields scores, in the form of 3D coordinates (x, y, z) = (F0, Int, Dur), for
    • the F0 difference
    • the intensity difference
    • the duration difference
  • One utterance yields as many coordinates as there are “good” native speakers
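The score-array idea above can be sketched as follows. This is a minimal Python sketch, not the author's Praat scripts: the `Utterance` container, its units, and the assumption that all contours are already time-aligned and sampled at the same points are hypothetical, and any normalization step is left out.

```python
import math
from typing import NamedTuple, Sequence


class Utterance(NamedTuple):
    """Hypothetical container for one utterance's prosodic measurements.

    Assumes F0 and intensity contours are already sampled at the same
    number of points for every utterance, and durations are listed per
    segment in the same segment order.
    """
    f0: Sequence[float]         # Hz, one value per sample point
    intensity: Sequence[float]  # dB, one value per sample point
    durations: Sequence[float]  # ms, one value per segment


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def score_array(target: Utterance,
                natives: Sequence[Utterance]) -> list[tuple[float, float, float]]:
    """Return one (F0, Int, Dur) coordinate per native model utterance."""
    return [
        (euclidean(target.f0, n.f0),
         euclidean(target.intensity, n.intensity),
         euclidean(target.durations, n.durations))
        for n in natives
    ]
```

Compared against 10 native model utterances, `score_array` returns 10 coordinates, which is the shape of the K5.U1 score array shown in the Methods.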

Introduction: How to build the model
• Evaluation by comparisons

Introduction: A 3D sentential model for prosody evaluation
• A 3D model
  • 3D axes: F0, intensity, durations, i.e. (F0, Int, Dur) coordinates = (x, y, z)
  • Automatic scores as scatterplot points
  • Manually evaluated scores group the points

Introduction: A 3D sentential model for prosody evaluation
• Validity of the model
  • Sufficient separation of groups with different manual scores
  • Figure legend: colors mark manual scores; arrowheads mark automatic scores

Methods: Sentential prosody evaluation [7]
• Before & after duration manipulation
[Figure: native utterance; learner utterance before and after duration manipulation]

Methods: Sentential prosody evaluation [7]
• F0: point-to-point comparison between native and learner after normalization
[Figure: native utterance; learner utterance after duration manipulation; automatic score (F0, Int, Dur) = (x, y, z)]

Methods: Sentential prosody evaluation [7]
• Intensity: point-to-point comparison between native and learner after normalization
[Figure: native utterance; learner utterance after duration manipulation; automatic score (F0, Int, Dur) = (x, y, z)]
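A point-to-point contour comparison, usable for either the F0 or the intensity contour, might look like the sketch below. The slides do not say which normalization is applied, so per-utterance z-scoring is an assumption here, as are linear resampling to a fixed number of points (the hypothetical `n_points`) and the requirement that unvoiced F0 gaps have already been interpolated.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def resample(contour: Sequence[float], n_points: int) -> list[float]:
    """Linearly interpolate a contour onto n_points equally spaced positions."""
    if len(contour) < 2 or n_points < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(contour) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the contour
        i = min(int(pos), last - 1)      # left neighbour (keeps i + 1 in range)
        frac = pos - i
        out.append(contour[i] * (1 - frac) + contour[i + 1] * frac)
    return out


def z_normalize(values: Sequence[float]) -> list[float]:
    """Assumed normalization: subtract the mean, divide by the std deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def contour_distance(native: Sequence[float], learner: Sequence[float],
                     n_points: int = 100) -> float:
    """Euclidean distance between two normalized, equally sampled contours."""
    a = z_normalize(resample(native, n_points))
    b = z_normalize(resample(learner, n_points))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Under this choice, a learner contour that has the native shape but a different pitch range or recording level scores close to 0, while an inverted contour scores high.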

Methods: Sentential prosody evaluation [7]
• Duration: segment-to-segment comparison between native and learner
[Figure: native utterance; learner utterance before duration manipulation; automatic score (F0, Int, Dur) = (x, y, z)]
• Euclidean distance metric as the evaluation measure: for P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space,
  $d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
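The segment-to-segment duration comparison with the Euclidean metric d(P, Q) can be sketched as follows. The segment labels are hypothetical, durations are raw milliseconds in line with the "adopt raw values" principle, and mismatched segment sequences are simply rejected, since handling redundant or missing segments is listed as future work in the Discussion.

```python
import math
from typing import Sequence, Tuple

Segment = Tuple[str, float]  # (hypothetical segment label, duration in ms)


def duration_distance(native: Sequence[Segment],
                      learner: Sequence[Segment]) -> float:
    """Euclidean distance between two segment-duration vectors.

    Assumes both utterances were segmented into the same sequence of
    segment labels; redundant or missing segments are not handled.
    """
    if [lab for lab, _ in native] != [lab for lab, _ in learner]:
        raise ValueError("segment sequences differ; align them first")
    return math.sqrt(sum((p - q) ** 2
                         for (_, p), (_, q) in zip(native, learner)))
```

For example, native durations (50, 40, 60) ms and learner durations (80, 80, 60) ms for the same three segments give sqrt(30² + 40² + 0²) = 50 ms.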

Methods: Manual evaluation of sentential prosody
• Manual scores for Set B utterances: “The dancing queen likes only the apple pies”

Methods: Sentential prosody evaluation [7]
• A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances
• Automatic prosody score for K5.U1 = {(899, 142, 408), (360, 92, 190), …, (716, 178, 183)}

Results: A prosody evaluation model by a Korean phonetician
• Korean phonetician’s model

Results: A sample prosody evaluation with a discriminant analysis

Discussion: To make this fully automatic
• For manual evaluation of the training model
  • The number of Korean learners: the more the better
  • The levels of English proficiency: the more diverse the better (scores 1 through 5)
• For automatic evaluation of the trainees
  • Need automatic segmentation (ASR)
  • Need to deal with redundant/missing segments

Building a sentential model for automatic evaluation of pronunciation proficiency, Part B
What about segmental evaluation?

Methods: Segmental evaluation by spectral comparison
• Sex/age controlled (no normalization was used)
  • Adult male (native/Korean) speakers were selected
• Spectral comparison
  • Three equally spaced spectral slices were used for each matching segment
  • A Euclidean distance measure was computed for each pair of matching spectral envelopes
• Four coordinates for pronunciation proficiency evaluation
  • Segments, F0, intensity, durations
  • Each (w, x, y, z) becomes one element of the score array
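A spectral comparison of one matching segment pair might be sketched with NumPy as below. The slide specifies three equally spaced slices and a Euclidean distance between spectral envelopes; the slice positions (1/4, 1/2 and 3/4 of the segment), the 512-sample Hann window, the use of the dB log-magnitude spectrum as a stand-in for the envelope, and averaging the three slice distances are all assumptions. How per-segment distances are combined into the single w coordinate is not stated in the slides.

```python
import numpy as np


def slice_spectra(segment, n_fft=512, n_slices=3):
    """Log-magnitude spectra (dB) at n_slices equally spaced points in a segment.

    Hypothetical choices: slices centred at k/(n_slices + 1) of the segment,
    Hann window of n_fft samples, no level normalization (speakers are
    sex/age controlled instead).
    """
    segment = np.asarray(segment, dtype=float)
    if len(segment) < n_fft:
        raise ValueError("segment shorter than the analysis window")
    half = n_fft // 2
    window = np.hanning(n_fft)
    spectra = []
    for k in range(1, n_slices + 1):
        centre = round(k * len(segment) / (n_slices + 1))
        start = min(max(centre - half, 0), len(segment) - n_fft)
        frame = segment[start:start + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        spectra.append(20 * np.log10(mag + 1e-10))  # epsilon avoids log(0)
    return np.array(spectra)


def spectral_distance(native_seg, learner_seg, n_fft=512):
    """Euclidean distance per slice pair, averaged over the slices."""
    a = slice_spectra(native_seg, n_fft)
    b = slice_spectra(learner_seg, n_fft)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```

Identical segments give a distance of 0; segments with different spectral shapes (for instance, different vowel qualities) give larger values.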

Methods: Manual evaluation of overall proficiency
• Manual scores for Set C utterances: “Put your toys away right now”
<Table 4> The overall scores of the 34 utterances for the Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all adult males.

Results: A pronunciation proficiency evaluation model by a Korean phonetician
• Korean phonetician’s models (intensity axis not shown)

Results: A prosody evaluation model by a Korean phonetician
• Korean phonetician’s model

Results: A discriminant analysis
<Table 5> The classification table from the discriminant analysis of a single test case. Each cell gives the probability of the automatic pronunciation proficiency score being classified into the predicted group.
<Table 6> The confusion matrix for the classification table.

Results: Discriminant analyses with leave-one-out cross-validation
• Testing for score 4: 6 out of 9 correct
• Testing for score 2: 12 out of 15 correct

Results: Discriminant analyses with leave-one-out cross-validation
• For the N4 and K2 groups, evaluation models were built using discriminant analysis with leave-one-out cross-validation
• The number of models (built by discriminant analyses) was 24, one per held-out subject
  • Group N4: 9 subjects
  • Group K2: 15 subjects
• Success rate
  • Group N4: 6 out of 9 predicted correctly (66.7%)
  • Group K2: 12 out of 15 predicted correctly (80%)
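The leave-one-out procedure above can be sketched as follows, holding out one subject per model so that 24 subjects give 24 models. The discriminant analysis here is linear discriminant analysis with equal priors, implemented as nearest class mean under the Mahalanobis distance with a pooled within-class covariance; it is not necessarily the statistics package the author used. Assigning a held-out subject to the group that wins the majority of that subject's score-array coordinates is a hypothetical choice, since the slides do not say how the coordinates of one utterance are combined.

```python
from collections import Counter

import numpy as np


def fit_lda(X, y):
    """Class means and inverse pooled within-class covariance."""
    classes = sorted(set(y))
    means = {c: X[y == c].mean(axis=0) for c in classes}
    pooled = sum((X[y == c] - means[c]).T @ (X[y == c] - means[c])
                 for c in classes)
    pooled = pooled / (len(X) - len(classes))
    return means, np.linalg.inv(pooled)


def predict(means, inv_cov, X):
    """Assign each row to the class with the smallest Mahalanobis distance."""
    labels = list(means)
    d = np.stack([np.einsum("ij,jk,ik->i", X - means[c], inv_cov, X - means[c])
                  for c in labels], axis=1)
    return [labels[i] for i in d.argmin(axis=1)]


def leave_one_subject_out(score_arrays, groups):
    """score_arrays: one (n_natives, n_dims) array per subject.
    groups: manual score group of each subject.
    Returns how many held-out subjects were classified correctly."""
    correct = 0
    for i in range(len(score_arrays)):
        train = [(a, g) for j, (a, g) in enumerate(zip(score_arrays, groups))
                 if j != i]
        train_X = np.vstack([a for a, _ in train])
        train_y = np.concatenate([[g] * len(a) for a, g in train])
        means, inv_cov = fit_lda(train_X, train_y)
        votes = Counter(predict(means, inv_cov, score_arrays[i]))
        if votes.most_common(1)[0][0] == groups[i]:  # hypothetical majority vote
            correct += 1
    return correct
```

With real data, each entry of `score_arrays` would be one learner's score array: (F0, Int, Dur) coordinates in Part A, or (w, x, y, z) in Part B; the code does not depend on the number of dimensions.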

Discussion: Automatic evaluation of pronunciation proficiency
• Viability of sentential models for the evaluation of
  • segmental proficiency: spectral comparison
  • prosodic proficiency: F0/intensity/durations
  in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z)
• Comparison seems to work
  • A target utterance vs. multiple model native utterances
• Better models can be built with
  • more (controlled) utterances
  • more score resolution
    • Current: score 2 (bad) – score 4 (good)
    • Future: score 1 (worst) – score 3 (fair) – score 5 (best)

References
[1] Boersma, P., “Praat, a system for doing phonetics by computer”, Glot International 5(9/10), pp.341-345, 2001.
[2] Mahalanobis, P.C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp.49-55, 1936.
[3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9, pp.453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73, pp.265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46, pp.159-174, 2003.
[6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4), pp.197-215, 2007.
[7] Yoon, K., “Synthesis and evaluation of prosodically exaggerated utterances”, unpublished manuscript, 2008.
