An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Yao Yao
UC Berkeley
[email protected]
http://linguistics.berkeley.edu/~yaoyao
July 25, 2008

Overview
  • Background
  • Data
  • Methodology
    • Algorithm
    • Tuning the model
    • Testing
  • Results
  • General Discussion
Background
  • Purpose of the study
    • To find the point of burst in a word initial voiceless stop (i.e. [p], [t], [k])

[Figure: schematic of a voiceless stop: closure → release → vowel onset]

  • Existing approach
    • Detecting the point of maximal energy change (cf. Niyogi and Ramesh, 1998; Liu, 1996)
Background
  • Our approach
    • Compare the spectrogram of the target token at each point against that of fricatives and silence
    • Assess how “fricative-like” and “silence-like” the spectrogram is at each time point
    • Find the point where “fricative-ness” suddenly rises and “silence-ness” suddenly drops → the point of burst
Background
  • Our approach (cont’d)
    • What do we need?
      • Spectral features of a given time frame
      • Spectral templates of fricatives and silence
        • Specific to speaker and the recording environment
      • Measure and compare fricative-ness and silence-ness
      • An algorithm to find the most likely point for release
    • Advantages
      • Easy to implement
      • Robust to changes in the recording environment and to individual speaker differences
Data
  • Buckeye corpus (Pitt, M. et al. 2005)
  • 40 speakers
    • All residents of Columbus, Ohio
    • Balanced in gender and age
    • One-hour interview per speaker
    • Transcribed at the word and phone level
    • 19 speakers used in the current study
  • Target tokens
    • Transcribed word-initial voiceless stops (i.e. [p], [t], [k])
Methodology: spectral measures
  • Spectral vector
    • 20ms Hamming window
    • Mel scale
    • 1 × 60 array
  • Spectral template
    • Speaker-specific and phone-specific
    • Ignore tokens shorter than the speaker’s average duration for that phone
    • For the remaining tokens:
      • Calculate a spectral vector for the middle 20 ms window
      • Average over the spectral vectors
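Since the slides do not show the implementation, here is a minimal Python sketch of the spectral measures under stated assumptions: librosa's mel filterbank stands in for whatever mel transform was actually used, the vectors are raw power (not log) spectra, and the function names and the `intervals` argument (a list of (start, end) times in seconds for one phone of one speaker) are hypothetical.

```python
import numpy as np
import librosa

def spectral_vector(y, sr, t, win=0.020, n_mels=60):
    """Mel-scale spectral vector (1 x 60) for the 20 ms Hamming window
    centered at time t (seconds). Power spectrum assumed; the slides do
    not say whether log compression was applied."""
    n = int(win * sr)
    start = max(0, int(t * sr) - n // 2)
    frame = y[start:start + n]
    frame = np.pad(frame, (0, n - len(frame))) * np.hamming(n)
    power = np.abs(np.fft.rfft(frame)) ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n, n_mels=n_mels)
    return mel_fb @ power

def spectral_template(y, sr, intervals):
    """Speaker- and phone-specific template: average the mid-token spectral
    vectors, skipping tokens shorter than the mean duration of that phone."""
    mean_dur = np.mean([e - s for s, e in intervals])
    vecs = [spectral_vector(y, sr, (s + e) / 2)
            for s, e in intervals if (e - s) >= mean_dur]
    return np.mean(vecs, axis=0)
```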
Methodology: spectral template

[Figure: spectral templates for speaker F01: [a], [f], and silence]

Methodology: similarity scores
  • Similarity between spectral vectors x and u
    • Distance: D_{x,u} = ‖x − u‖ (Euclidean distance between the two vectors)
    • Similarity: S_{x,u} = e^{−0.005 · D_{x,u}}
  • Compare the acoustic data at each time point against the spectral templates of that speaker
    • Step size = 5 ms
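A sketch of the scoring step, reusing `spectral_vector` from the sketch above and assuming the Euclidean-distance reading of D; `score_track` and its arguments are hypothetical names.

```python
import numpy as np

def similarity(x, u):
    """S_{x,u} = exp(-0.005 * D_{x,u}), with D the Euclidean distance."""
    return np.exp(-0.005 * np.linalg.norm(x - u))

def score_track(y, sr, t_start, t_end, template, step=0.005):
    """Similarity against one template every 5 ms over [t_start, t_end]."""
    times = np.arange(t_start, t_end, step)
    scores = np.array([similarity(spectral_vector(y, sr, t), template)
                       for t in times])
    return times, scores
```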
Similarity scores

[Figure: [s] score and <sil> score tracks over a token, computed every 5 ms]

Methodology: finding the release point
  • Basic idea
    • Near the release point
      • the fricative similarity score rises
      • the silence similarity score drops

[Figure: schematic of closure, release, and vowel onset aligned with the score tracks]

Q1: Which fricative to use?

Q2: Which period of rise or drop to pick?

Methodology: finding the release point
  • Slope is a better predictor than the absolute score value
    • The end point of a period with maximal slope → the release point
  • Which fricative?
    • The [sh] score is more consistent than the other fricative scores

[Figure: similarity score tracks for [h], [s], [sh], and <sil>]

Methodology: finding the release point

[Figure: score tracks ([h], [s], [sh], <sil>) for the initial [t] in “doing” and the initial [k] in “countries”]
Methodology: finding the release point
  • Original algorithm
    • Find the end point of the period of fastest increase in the <sh> score
    • Find the end point of the period of fastest decrease in the <sil> score
    • Return the midpoint of the two end points as the point of release
    • If either or both end points cannot be found within the duration of the stop, return NULL
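A sketch of this original algorithm, treating the slope as the per-step (5 ms) score change since the slides do not state the slope's units; the NULL/rejection branches appear in the summary pipeline later.

```python
import numpy as np

def find_release(times, sh_scores, sil_scores):
    """Original algorithm (sketch): end point of the fastest rise in the
    <sh> score, end point of the fastest drop in the <sil> score, midpoint
    of the two as the estimated release."""
    d_sh = np.diff(sh_scores)    # per-step (5 ms) change in the <sh> track
    d_sil = np.diff(sil_scores)  # per-step change in the <sil> track
    p1 = times[np.argmax(d_sh) + 1]   # end of the steepest rise in <sh>
    p2 = times[np.argmin(d_sil) + 1]  # end of the steepest drop in <sil>
    return (p1 + p2) / 2
```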
Methodology: finding the release point
  • Select two speakers’ data to tune the model
    • Hand-tag the release point for all tokens from these two speakers
    • If the stop does not appear to have a release point on the spectrogram, mark it as a problematic case and take the end point of the stop as the release point for calculating error
Methodology: problematic cases

[Figure: [sh] and <sil> score tracks for three problematic cases: no burst; no closure; weak and double release]
Methodology: finding the release point
  • Calculate the difference between the hand-tagged release point and the estimated one (i.e. the error) for each case.
  • The RMS (root mean square) of the error is used to measure the performance of the algorithm.
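The error metric in a short sketch; the function name and array arguments are hypothetical.

```python
import numpy as np

def rms_error(hand_tagged, estimated):
    """RMS of (hand-tagged minus estimated) release times, in seconds."""
    err = np.asarray(hand_tagged) - np.asarray(estimated)
    return np.sqrt(np.mean(err ** 2))
```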
Methodology: error analysis

[Figure: histograms of error (real release − estimate) for F07 (n = 231 tokens) and M08 (n = 261 tokens)]
  • F07: RMS = 7.22 ms; adding 5 ms to the estimate gives RMS(+5ms) = 4.85 ms
  • M08: RMS = 13.11 ms; RMS(+5ms) = 14 ms
Methodology: tuning the algorithm
  • 1st Rejection Rule
    • A target token will be rejected if the changes in the scores are not drastic enough

[Figure: example [sh] and <sil> score tracks; insignificant rise → Reject!]
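A sketch of the rule as a predicate. The 0.02 and 0.04 thresholds are taken from the summary flowchart at the end of this section; the minus sign on the drop threshold is my reading of the sign convention (the <sil> slope at a drop is negative), and the function name is hypothetical.

```python
def insignificant_change(max_rise, min_drop,
                         rise_thresh=0.02, drop_thresh=-0.04):
    """1st rejection rule (sketch): True when the steepest <sh> rise and
    the steepest <sil> drop are both too shallow to mark a burst."""
    return max_rise < rise_thresh and min_drop > drop_thresh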

Methodology: tuning the algorithm
  • Applying the 1st Rejection Rule
    • Rejects 4 cases in F07
      • RMS(+5ms) = 4.19 ms
    • Rejects 28 cases in M08, covering most of the problematic cases
      • RMS(+5ms) = 9.27 ms

[Figure: error histogram for M08 after the 1st rejection rule; RMS(+5ms) falls from 14 ms to 9.27 ms]

Methodology: tuning the algorithm
  • Still a problem: multiple releases
    • Each release may correspond to a rise/drop in the scores

[Figure: [sh] and <sil> score tracks for the initial [k] in “cause” (speaker M08), showing multiple releases]
Methodology: tuning the algorithm
  • 2nd Rejection Rule
    • A target token will be dropped if the points found in the <sh> and <sil> scores are too far apart (> 20 ms)
    • Partly solves the multiple-release problem
    • The ideal way would be to identify all candidate release points and return the first one
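The corresponding predicate, again with a hypothetical name; the slide says > 20 ms while the summary flowchart says ≥ 0.02 s, and the flowchart's version is used here.

```python
def too_far_apart(p1, p2, max_gap=0.020):
    """2nd rejection rule (sketch): True when the <sh> and <sil> end points
    (in seconds) disagree by 20 ms or more."""
    return abs(p1 - p2) >= max_gap
```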
Methodology: tuning the algorithm
  • Applying the 2nd Rejection Rule
    • Rejects 3 cases in F07
      • RMS(+5ms) = 3.22 ms
    • Rejects 20 cases in M08
      • Only 2 problematic cases remain
      • RMS(+5ms) = 3.44 ms

[Figure: error histogram for M08 after the 2nd rejection rule; RMS(+5ms) falls from 9.26 ms to 3.44 ms]

Compare: the optimal error is 2.5 ms, given the 5 ms step size.
Methodology: tuning the algorithm

[Figure: final error histograms for F07 and M08]
  • F07 rejection rate: 3.03%
  • M08 rejection rate: 15.05%

Methodology: testing the algorithm
  • Select a random sample of 50 tokens from all speakers
    • Hand-tag the release point
    • Use the current algorithm, together with the two rejection rules, to find the estimated release
    • Compare the hand-tagged point with the estimated one
  • Results
    • 4 tokens rejected by the 1st rule (3 were legitimate rejections)
    • 3 tokens rejected by the 2nd rule (2 were legitimate rejections)
    • 43 accepted cases: RMS(error) < 5 ms
Methodology: summary
  1. Calculate the <sil> score and the <sh> score for the token.
  2. Calculate the slopes of the <sil> and <sh> score tracks.
  3. Within a labeled voiceless stop span: (i) find the time point of the largest positive slope in the <sh> score, and store it in p1; (ii) find the time point of the steepest negative slope in the <sil> score, and store it in p2.
  4. If p1 = null or p2 = null → reject the case.
  5. If slope(p1) < 0.02 and slope(p2) > −0.04 (the changes in the scores are not drastic enough; 1st rejection rule) → reject the case.
  6. If |p1 − p2| ≥ 0.02 s (2nd rejection rule) → reject the case.
  7. Otherwise, return (p1 + p2)/2 + 0.005 s as the estimated release point.
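Putting the steps together, a minimal sketch of the whole pipeline; it reuses the two rejection-rule predicates from the sketches above, treats slopes as per-step score changes (the units are an assumption), and returns None wherever the flowchart says to reject.

```python
import numpy as np

def detect_burst(times, sh_scores, sil_scores):
    """Summary pipeline (sketch). `times`, `sh_scores`, `sil_scores` cover
    one labeled voiceless stop span at a 5 ms step. Returns the estimated
    release time in seconds, or None if the token is rejected."""
    if len(times) < 2:
        return None                       # p1 or p2 cannot be found
    d_sh = np.diff(sh_scores)             # per-step slope of <sh>
    d_sil = np.diff(sil_scores)           # per-step slope of <sil>
    i1, i2 = int(np.argmax(d_sh)), int(np.argmin(d_sil))
    p1, p2 = times[i1 + 1], times[i2 + 1] # ends of steepest rise / drop
    if insignificant_change(d_sh[i1], d_sil[i2]):  # 1st rejection rule
        return None
    if too_far_apart(p1, p2):                      # 2nd rejection rule
        return None
    return (p1 + p2) / 2 + 0.005  # +5 ms correction from the error analysis
```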

Results: grand means
  • Rejection rates (2 rules combined)
    • Vary from 3.03% to 30.5% across speakers (mean = 13.3%, sd = 8.6%)
  • VOT and closure duration
General Discussion
  • Echoing previous findings
    • Byrd (1993): Closure duration and VOT in read speech
    • Shattuck-Hufnagel & Veilleux (2007): 13% of landmarks missing in spontaneous speech
General Discussion
  • Future work
    • Fine-tune the 2nd rejection rule
    • Generalize the exemplar-based method to other automatic phonetic processing problems?
Acknowledgement
  • Anonymous speakers
  • Buckeye corpus developers
  • Prof. Keith Johnson
  • Members of the Phonology Lab at UC Berkeley

Thank you!

Any comments are welcome.

References
  • Byrd, D. (1993). 54,000 American stops. UCLA Working Papers in Phonetics, No. 83, pp. 97-116.
  • Johnson, K. (2006). Acoustic attribute scoring: A preliminary report.
  • Liu, S. (1996). Landmark detection for distinctive feature-based speech recognition. Journal of the Acoustical Society of America, Vol. 100, pp. 3417-3430.
  • Niyogi, P., & Ramesh, P. (1998). Incorporating voice onset time to improve letter recognition accuracies. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Vol. 1, pp. 13-16.
  • Pitt, M., et al. (2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, Vol. 45, pp. 90-95.
  • Shattuck-Hufnagel, S., & Veilleux, N.M. (2007). Robustness of acoustic landmarks in spontaneously-spoken American English. Proceedings of the International Congress of Phonetic Sciences 2007, Saarbrücken, August 2007.
  • Zue, V.W. (1976). Acoustic characteristics of stop consonants: A controlled study. Sc.D. thesis, MIT, Cambridge, MA.