
Pitch Estimation



Presentation Transcript


  1. Pitch Estimation By Chih-Ti Shih

  2. Objective Determine the fundamental frequency of a speech waveform automatically.

  3. Automatic Extraction of Fundamental Frequency Methods • Cepstrum-based F0 determinator (CFD) • Harmonic product spectrum (HPS) • Feature-based F0 tracker (FBFT) • Parallel processing method (PP) • Integrated F0 tracking algorithm (IFTA) • Super resolution F0 determinator (SRFD)

  4. eSRFD eSRFD: Enhanced Super Resolution F0 Determinator. 1. Pass the samples through a low-pass filter to simplify the temporal structure of the waveform. 2. Pass the sample frames through a silence detector to identify unvoiced/silent frames; no analysis is done for these frames. Equation 1: a frame is classified as silent if max(|x_Nmin|, |x_Nmax|) + max(|y_Nmin|, |y_Nmax|) < T_srfd.
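
A minimal sketch of steps 1 and 2, assuming NumPy/SciPy; the filter order, cutoff frequency, and threshold value are placeholders, not values from the presentation:

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass(signal, fs, cutoff_hz=1000.0):
    """Step 1: low-pass filter the samples to simplify the temporal
    structure of the waveform (the cutoff is an assumed placeholder)."""
    b, a = butter(4, cutoff_hz / (fs / 2.0), btype="low")
    return lfilter(b, a, signal)

def is_silent(x_seg, y_seg, t_srfd=0.01):
    """Step 2 / Equation 1 sketch: the frame is silent when the peak
    absolute amplitudes of its first two segments sum below T_srfd
    (the numeric threshold here is an assumption)."""
    return (max(abs(x_seg.min()), abs(x_seg.max())) +
            max(abs(y_seg.min()), abs(y_seg.max()))) < t_srfd
```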

  5. eSRFD Each frame is subdivided into three consecutive segments, x_n, y_n and z_n.

  6. eSRFD

  7. eSRFD Cross-correlation 3. For a 'voiced' frame, the first normalized cross-correlation coefficient ρx,y(n) of the frame is determined (normalization given by Equation 2).
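
As a sketch of the normalized cross-correlation of Equation 2, the coefficients for both segment pairs might be computed as below; the function names are mine, and the frame is assumed to hold at least three segments of the largest candidate length:

```python
import numpy as np

def rho(frame, n, offset):
    """Normalized cross-correlation between two adjacent length-n segments
    of `frame`, the second starting n samples after the first."""
    x = frame[offset:offset + n]
    y = frame[offset + n:offset + 2 * n]
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(np.dot(x, y) / denom) if denom > 0 else 0.0

def rho_xy(frame, n):
    """First coefficient (step 3): correlate x_n with y_n."""
    return rho(frame, n, 0)

def rho_yz(frame, n):
    """Second coefficient (step 5): correlate y_n with z_n."""
    return rho(frame, n, n)
```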

  8. eSRFD 4. Candidate values of the fundamental period are obtained by locating the peaks of the normalized cross-correlation coefficient for which the value ρx,y(n) exceeds a threshold T_srfd. If no candidates are found in the frame, the frame is classified as 'unvoiced'.
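
A sketch of the candidate search in step 4, reusing rho_xy from the block above; the period search range and the threshold value are assumed placeholders:

```python
def candidate_periods(frame, n_min, n_max, t_srfd=0.5):
    """Step 4 sketch: local maxima of rho_xy(n) that exceed T_srfd.
    An empty result means the frame is classified as 'unvoiced'."""
    values = {n: rho_xy(frame, n) for n in range(n_min, n_max + 1)}
    peaks = []
    for n in range(n_min + 1, n_max):
        if (values[n] > t_srfd
                and values[n] >= values[n - 1]
                and values[n] >= values[n + 1]):
            peaks.append(n)
    return peaks
```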

  9. eSRFD 5. For the voiced frame (ρx,y(n) > T_srfd), the second normalized cross-correlation coefficient ρy,z(n) is determined.

  10. eSRFD 6. Candidates for which both ρx,y(n) and ρy,z(n) exceed the threshold T_srfd are given a score of 2; the others are given a score of 1. Note: if there are one or more candidates with a score of 2, then all those with a score of 1 are removed from the list of candidates.
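
The scoring and pruning rule of step 6 could be sketched as follows, again reusing rho_xy/rho_yz from the earlier block (the threshold value is an assumption):

```python
def score_candidates(frame, candidates, t_srfd=0.5):
    """Step 6 sketch: score 2 if both coefficients exceed T_srfd, else 1;
    if any candidate scores 2, drop all score-1 candidates."""
    scored = [(n, 2 if rho_xy(frame, n) > t_srfd and rho_yz(frame, n) > t_srfd
                  else 1)
              for n in candidates]
    if any(score == 2 for _, score in scored):
        scored = [(n, score) for n, score in scored if score == 2]
    return scored
```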

  11. eSRFD If there is only one candidate with a score of 1 or 2, that candidate is taken as the best estimate of the fundamental period of the frame. Otherwise, an optimal fundamental period is sought from the set of remaining candidates. The candidate at the end of this list represents a fundamental period n_M, and the m'th candidate represents a period n_m.

  12. eSRFD 7. Then calculate q(n_m), a normalized cross-correlation coefficient between sections of length n_M spaced n_m apart. q(n_m) is defined as:

  13. eSRFD (equation defining q(n_m))

  14. eSRFD The first coefficient q(n_1) is assumed to be the optimal value. If a subsequent q(n_m) × 0.77 exceeds the current optimal value, that q(n_m) becomes the new optimal value.
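
Putting step 7 and the 0.77 rule together, one possible reading is the sketch below. The exact definition of q(n_m) is given as an equation on slide 13 and is not reproduced in this transcript, so the correlation here follows only the verbal description (sections of length n_M spaced n_m apart) and should be treated as an assumption:

```python
import numpy as np

def q_coefficient(frame, n_m, n_M):
    """Assumed reading of q(n_m): normalized cross-correlation between two
    sections of length n_M whose starting points are n_m samples apart."""
    a = frame[:n_M]
    b = frame[n_m:n_m + n_M]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def select_period(frame, candidates):
    """Take q(n_1) as the initial optimum; a later candidate replaces it
    only when 0.77 * q(n_m) exceeds the current optimal value."""
    periods = sorted(candidates)
    n_M = periods[-1]
    best_n, best_q = periods[0], q_coefficient(frame, periods[0], n_M)
    for n in periods[1:]:
        q = q_coefficient(frame, n, n_M)
        if q * 0.77 > best_q:
            best_n, best_q = n, q
    return best_n
```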

  15. eSRFD If there is only one candidate, with a score of 1 and no candidate with a score of 2, and the previous frame was 'unvoiced': the current value is held and the decision depends on the next frame. If the next frame is also unvoiced, the current frame is considered 'unvoiced'. Otherwise, the current frame is considered 'voiced' and the currently held F0 is taken as the estimate for the current frame.
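
The hold rule above might be expressed as a post-pass over per-frame results, as in this sketch; the dictionary keys ('held', 'voiced', 'f0') are an assumed illustration, not the author's data structure:

```python
def resolve_held_frames(frames):
    """Confirm or discard frames whose F0 was only tentatively held
    (single score-1 candidate, previous frame unvoiced): keep the held F0
    if the next frame is voiced, otherwise mark the frame unvoiced."""
    for i, frame in enumerate(frames):
        if not frame.get("held"):
            continue
        next_is_voiced = i + 1 < len(frames) and frames[i + 1]["voiced"]
        if next_is_voiced:
            frame["voiced"] = True          # held F0 becomes the estimate
        else:
            frame["voiced"] = False
            frame["f0"] = None
    return frames
```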

  16. eSRFD • These changes reduce the occurrence of doubling and halving in the F0 contour. However, they increase the chance of a voiced region being misclassified as unvoiced.

  17. eSRFD 8. Biasing is applied to ρx,y(n) and ρy,z(n) if: 1. the two previous frames were 'voiced'; 2. the F0 value of the previous frame is not being temporarily held; 3. the F0 of the previous frame is less than 7/4 × (F0 of the current frame) and greater than 5/8 × (F0 of the current frame). However, the biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as 'voiced'. If the unbiased coefficient ρx,y(n) does not exceed T_srfd but this candidate is believed to be the best estimate for the frame, the F0 of this candidate is held until the state of the subsequent frame is known. If the next frame is silent, the current frame is re-classified as silent.
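
The three gating conditions for biasing in step 8 might look like the test below; the amount of bias actually added to the coefficients is not given on the slide, so only the gating check is sketched, and the argument names are mine:

```python
def should_bias(prev2_voiced, prev_voiced, prev_f0_held, f0_prev, f0_current):
    """Apply biasing only if the two previous frames were voiced, the
    previous F0 is not temporarily held, and the previous F0 lies between
    5/8 and 7/4 of the current frame's F0 candidate."""
    return (prev2_voiced
            and prev_voiced
            and not prev_f0_held
            and f0_prev is not None and f0_current is not None
            and (5.0 / 8.0) * f0_current < f0_prev < (7.0 / 4.0) * f0_current)
```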

  18. eSRFD • 9. The fundamental period for the frame is refined by calculating r_x,y(n) for n in the region −L < n < L around the selected estimate; the maximum within this range corresponds to a more accurate value of the fundamental period.
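
A sketch of the refinement in step 9, interpreting r_x,y(n) as the same normalized coefficient used earlier and the region as a window of ±L lags around the selected period; both interpretations are assumptions:

```python
def refine_period(frame, n_selected, L):
    """Step 9 sketch: evaluate the coefficient over the lags
    n_selected - L < n < n_selected + L and keep the lag with the
    maximum value as the refined fundamental period."""
    lags = range(max(2, n_selected - L + 1), n_selected + L)
    return max(lags, key=lambda n: rho_xy(frame, n))
```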

  19. Comparison of asynchronous frequency contours • Compare Fx, which is generated from the laryngograph, with the F0 contours generated by eSRFD. • Fx_reference refers to the reference value from the laryngograph. • F0 refers to the value from eSRFD.

  20. Comparison of asynchronous frequency contours • Fx_reference and F0 are both zero: both describe a silent or unvoiced region of the utterance and no error results. • F0 is non-zero but Fx_reference is zero: the region is incorrectly classified as voiced by eSRFD. • Fx_reference is non-zero but F0 is zero: the voiced region is incorrectly classified as unvoiced by eSRFD. • Fx_reference and F0 are both non-zero: both correctly classify the region as voiced. In this case, calculate the ratio of the two values:

  21. Gross error / halving error / doubling error / acceptable accuracy
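
The per-frame comparison described on slides 20 and 21 could be sketched as a small classifier like the one below; the accuracy band `tol` and the halving/doubling windows are assumed placeholders, since the transcript does not give the numeric criteria:

```python
def classify_frame(fx_reference, f0_estimate, tol=0.2):
    """Compare one laryngograph reference value with one eSRFD estimate."""
    if fx_reference == 0 and f0_estimate == 0:
        return "no error"                      # both silent/unvoiced
    if fx_reference == 0:
        return "unvoiced misclassified as voiced"
    if f0_estimate == 0:
        return "voiced misclassified as unvoiced"
    ratio = f0_estimate / fx_reference         # both voiced: check the ratio
    if abs(ratio - 1.0) <= tol:
        return "acceptable accuracy"
    if abs(ratio - 0.5) <= 0.5 * tol:
        return "halving error"
    if abs(ratio - 2.0) <= 2.0 * tol:
        return "doubling error"
    return "gross error"
```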

  22. Comparison of asynchronous frequency contours (female)

  23. Comparison of asynchronous frequency contours (female)

  24. Comparison of asynchronous frequency contours (male)

  25. Comparison of asynchronous frequency contours (male)

  26. Comparison of asynchronous frequency contours (laryngograph vs. eSRFD)

  27. Comparison of asynchronous frequency contours

  28. Comparison of asynchronous frequency contours

  29. Questions?
