An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

YAO YAO

UC BERKELEY

[email protected]

http://linguistics.berkeley.edu/~yaoyao

JULY 25, 2008


Overview

  • Background

  • Data

  • Methodology

    • Algorithm

    • Tuning the model

    • Testing

  • Results

  • General Discussion


Background

  • Purpose of the study

    • To find the point of burst in a word initial voiceless stop (i.e. [p], [t], [k])

Figure: waveform schematic of a voiceless stop (closure, release, vowel onset).

  • Existing approach

    • Detecting the point of maximal energy change (cf. Niyogi and Ramesh, 1998; Liu, 1996)


Background

  • Our approach

    • Compare the spectrogram of the target token at each point against that of fricatives and silence

    • Assess how “fricative-like” and “silence-like” the spectrogram is at each time point

    • Find the point where “fricative-ness” suddenly rises and “silence-ness” suddenly drops → the point of burst


Background

  • Our approach (cont’d)

    • What do we need?

      • Spectral features of a given time frame

      • Spectral templates of fricatives and silence

        • Specific to the speaker and the recording environment

      • Measure and compare fricative-ness and silence-ness

      • An algorithm to find the most likely point for release

    • Advantage

      • Easy to implement

      • Robust to changes in the recording environment and to individual differences


Data

  • Buckeye corpus (Pitt, M. et al. 2005)

  • 40 speakers

    • All residents of Columbus, Ohio

    • Balanced in gender and age

    • One-hour interview

    • Transcribed at word and phone level

    • 19 speakers used in the current study

  • Target tokens

    • Transcribed word-initial voiceless stops (i.e. [p], [t], [k])


Methodology: spectral measures

  • Spectral vector

    • 20ms Hamming window

    • Mel scale

    • 1 × 60 array

  • Spectral template

    • Speaker-specific, phone-specific

    • Ignore tokens shorter than the speaker’s average duration for that phone

    • For the remaining tokens

      • Calculate a spectral vector for the middle 20ms window

      • Average over the spectral vectors


Methodology: spectral templates

Figure: spectral templates of [a], [f], and silence for speaker F01.


Methodology: similarity scores

  • Similarity between spectral vectors x and u

    • Dx,u = the spectral distance between x and u

    • Sx,u = e^(−0.005 · Dx,u)

  • Compare the given acoustic data against the spectral templates of that speaker

    • Step size = 5ms
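In code, the score might look like the sketch below. Only the exponential mapping S = e^(−0.005 · D) is taken from the slide; the distance formula itself appeared as an image and did not survive transcription, so Euclidean distance between the mel vectors is an assumption here.

```python
import numpy as np

def similarity(x, u):
    """Similarity score between a spectral vector x and a template u."""
    # D: spectral distance between x and u.
    # Euclidean distance is an assumption; the slide's formula was an image.
    d = np.linalg.norm(np.asarray(x, float) - np.asarray(u, float))
    # S = e^(-0.005 * D): 1.0 for identical vectors, decaying toward 0
    # as the vectors diverge.
    return np.exp(-0.005 * d)
```

Computed every 5ms, this yields one similarity trace per template for each token, e.g. an [sh] trace and a &lt;sil&gt; trace.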


Similarity scores

Figure: [s] score and <sil> score traces over time for a sample token (Sx,t = e^(−0.005 · Dx,t), step size = 5ms).


Methodology: finding the release point

  • Basic idea

    • Near the release point

      • Fricative similarity score rises

      • Silence similarity score drops

Figure: waveform schematic of a voiceless stop (closure, release, vowel onset).

Q1: Which fricative to use?

Q2: Which period of rise or drop to pick?


Methodology: finding the release point

  • Slope is a better predictor than the absolute score value

  • The end point of a period with maximal slope → the release point

  • Which fricative?

    • The [sh] score is more consistent than the other fricatives

Figure: similarity score traces for [h], [s], [sh], and <sil>.


Methodology: finding the release point

Figure: similarity score traces ([h], [s], [sh], <sil>) for an initial [t] in “doing” and an initial [k] in “countries”.


Methodology: finding the release point

  • Original algorithm

    • Find the end point of a period of fastest increase in <sh> score

    • Find the end point of a period of fastest decrease in <sil> score

    • Return the middle point of the two end points as the point of release

    • If either or both end points cannot be found within the duration of the stop, return NULL.


Methodology: finding the release point

  • Select two speakers’ data to tune the model

    • Hand-tag the release point for all tokens in the tuning set.

    • If the stop doesn’t appear to have a release point on the spectrogram, mark it as a problematic case, and take the end point of the stop as the release point, for calculating error.


Methodology: problematic cases

Figure: [sh] and <sil> score traces for three problematic cases: no burst, no closure, and a weak and double release (??).


Methodology: finding the release point

  • Calculate the difference between the hand-tagged release point and the estimated one (i.e. the error) for each case.

  • RMS (Root Mean Square) of error is used to measure the performance of the algorithm.
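The error metric is straightforward; a small sketch (the function name is illustrative):

```python
import numpy as np

def rms_error(hand_tagged_ms, estimated_ms):
    """Root mean square of the per-token error
    (hand-tagged release time minus estimated release time, in ms)."""
    err = np.asarray(hand_tagged_ms, float) - np.asarray(estimated_ms, float)
    return float(np.sqrt(np.mean(err ** 2)))
```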


Methodology: error analysis

Figure: error histograms (real release − estimate) for F07 (n = 231 tokens) and M08 (n = 261 tokens).

  • F07: RMS = 7.22ms; adding 5ms to the estimation gives RMS(+5ms) = 4.85ms

  • M08: RMS = 13.11ms; RMS(+5ms) = 14ms


Methodology: tuning the algorithm

  • 1st Rejection Rule

    -- A target token will be rejected if the changes in the scores are not drastic enough.

Figure: [sh] and <sil> score traces for an example token with an insignificant rise → Reject!


Methodology: tuning the algorithm

  • Applying the 1st Rejection Rule

    • Rejects 4 cases in F07

      • RMS(+5ms) = 4.19ms

    • Rejects 28 cases in M08

      • covering most of the problematic cases

      • RMS(+5ms) = 9.27ms

Figure: error analysis in M08 after the 1st rejection rule (RMS(+5ms) drops from 14ms to 9.27ms).


Methodology: tuning the algorithm

Still a problem…

  • Multiple releases

    • Each might correspond to a rise/drop in the scores

Figure: [sh] and <sil> score traces for the initial [k] in “cause” of M08.


Methodology: tuning the algorithm

  • 2nd Rejection Rule

    -- A target token will be dropped if the points found in the <sh> and <sil> scores are too far apart (>20ms).

    • Partly solves the multiple-release problem

    • The ideal way would be to identify all candidate release points and return the first one.


Methodology: tuning the algorithm

  • Applying the 2nd Rejection Rule

    • Rejects 3 cases in F07

      • RMS(+5ms) = 3.22ms

    • Rejects 20 cases in M08

      • Only 2 problematic cases remain

      • RMS(+5ms) = 3.44ms

Figure: error analysis in M08 after the 2nd rejection rule (RMS(+5ms) drops from 9.26ms to 3.44ms).

Compare: the optimal error is 2.5ms, given the 5ms step size…


Methodology: tuning the algorithm

  • Rejection rate in F07: 3.03%

  • Rejection rate in M08: 15.05%


Methodology: testing the algorithm

  • Select a random sample of 50 tokens from all speakers

    • Hand-tag the release point

    • Use the current algorithm, together with the two rejection rules, to find the estimated release

    • Compare the hand-tagged point with the estimated one

    • 4 rejected by the 1st rule (3 were legitimate)

    • 3 rejected by the 2nd rule (2 were legitimate)

    • 43 accepted cases: RMS(error) < 5ms


Methodology: summary

  1. Calculate the <silence> score and the <sh> score.

  2. Calculate the slope of the <silence> score and the <sh> score.

  3. In a labeled voiceless stop span, (i) find the time point of the largest positive slope in the <sh> score and store it in p1; (ii) find the time point of the smallest negative slope in the <silence> score and store it in p2.

  4. If p1 = null or p2 = null → reject the case.

  5. If slope(p1) < 0.02 and slope(p2) > −0.04 → reject the case (1st rejection rule: changes not drastic enough).

  6. If |p1 − p2| >= 0.02 s → reject the case (2nd rejection rule: points too far apart).

  7. Otherwise, return (p1 + p2)/2 + 0.005.
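The summary decision procedure can be sketched in Python. This is an illustrative reconstruction, not the authors’ code: scores are assumed to be sampled every 5ms, “slope” is taken as the per-step score difference, and the 0.04 threshold is read as −0.04 so that it tests the steepness of a drop, matching the description of p2 as the smallest negative slope.

```python
import numpy as np

def find_release(sh_scores, sil_scores, step=0.005):
    """Return the estimated release time in seconds, or None if rejected."""
    sh_slope = np.diff(sh_scores)    # per-5ms change in the <sh> score
    sil_slope = np.diff(sil_scores)  # per-5ms change in the <sil> score
    if len(sh_slope) == 0 or len(sil_slope) == 0:
        return None
    p1 = int(np.argmax(sh_slope))   # largest positive slope in <sh>
    p2 = int(np.argmin(sil_slope))  # smallest negative slope in <sil>
    # p1 / p2 are "null" if <sh> never rises or <sil> never drops
    if sh_slope[p1] <= 0 or sil_slope[p2] >= 0:
        return None
    # 1st rejection rule: the changes in the scores are not drastic enough
    if sh_slope[p1] < 0.02 and sil_slope[p2] > -0.04:
        return None
    t1 = (p1 + 1) * step  # end point of the steepest rise in <sh>
    t2 = (p2 + 1) * step  # end point of the steepest drop in <sil>
    # 2nd rejection rule: the two end points are too far apart (>= 20 ms)
    if abs(t1 - t2) >= 0.02:
        return None
    # midpoint of the two end points, plus the tuned +5 ms correction
    return (t1 + t2) / 2 + 0.005
```

A flat score trace is rejected at the “null” check, and tokens whose <sh> rise and <sil> drop disagree in time fall to the 2nd rule, as in the flowchart.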


Results: grand means

  • Rejection rates (2 rules combined)

    • Vary from 3.03% to 30.5% (mean = 13.3%, sd = 8.6%) across speakers

  • VOT and closure duration


Results: VOT by speaker


General Discussion

  • Echoing previous findings

    • Byrd (1993): Closure duration and VOT in read speech

    • Shattuck-Hufnagel & Veilleux (2007): 13% of landmarks missing in spontaneous speech


General Discussion

  • Future work

    • Fine-tune the 2nd rejection rule

    • Generalize the exemplar-based method to other automatic phonetic processing problems?


Acknowledgements

  • Anonymous speakers

  • Buckeye corpus developers

  • Prof. Keith Johnson

  • Members of the phonology lab at UC Berkeley

    Thank you!

    Any comments are welcome.


References

  • Byrd, D. (1993). 54,000 American stops. UCLA Working Papers in Phonetics, 83, 97-116.

  • Johnson, K. (2006). Acoustic attribute scoring: A preliminary report.

  • Liu, S. (1996). Landmark detection for distinctive feature-based speech recognition. Journal of the Acoustical Society of America, 100, 3417-3430.

  • Niyogi, P., & Ramesh, P. (1998). Incorporating voice onset time to improve letter recognition accuracies. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Vol. 1, 13-16.

  • Pitt, M., et al. (2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45, 90-95.

  • Shattuck-Hufnagel, S., & Veilleux, N. M. (2007). Robustness of acoustic landmarks in spontaneously-spoken American English. Proceedings of the International Congress of Phonetic Sciences 2007, Saarbrücken.

  • Zue, V. W. (1976). Acoustic characteristics of stop consonants: A controlled study. Sc.D. thesis, MIT, Cambridge, MA.

