
Endpoint Detection (端點偵測)

Endpoint Detection (端點偵測). Jyh-Shing Roger Jang (張智星), http://mirlab.org/jang, MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan. Intro. to Endpoint Detection: endpoint detection (EPD, 端點偵測). Goal: determine the start and end of voice activity. Also known as voice activity detection (VAD).



Presentation Transcript


  1. Endpoint Detection (端點偵測) Jyh-Shing Roger Jang (張智星) http://mirlab.org/jang MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

  2. Intro. to Endpoint Detection • Endpoint detection (EPD, 端點偵測) • Goal: Determine the start and end of voice activity • Also known as voice activity detection (VAD) • Importance • Acts as preprocessing for many recognition tasks • Should require as little computing power as possible • Operation scenarios for speech recognition • Off-line for “push to talk” • On-line for “continuously listening”

  3. Two Types of Approaches to EPD • Time-domain methods • Volume • ZCR (zero crossing rate) • HOD (high-order difference) • Frequency-domain methods • Variance of spectrum • Entropy of spectrum

  4. Typical Frameworks for EPD • Thresholding • Simple thresholding • Compute a feature (e.g., volume) from each frame • Select a threshold vth • Any frame with a volume higher than vth is considered positive • Combined thresholding • Use two features (e.g., volume and ZCR) for more sophisticated decision making • Classification • Take more than one feature • Perform binary classification • Negative → silence or noise • Positive → sound activity • Sequence alignment • Use hidden Markov models (HMM) for sequence alignment
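
The simple-thresholding framework above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and volume is taken here as the sum of absolute sample values after removing the frame mean, which is one common definition and an assumption rather than something the slide specifies.

```python
def frame_volumes(signal, frame_size, hop):
    """Volume of each frame: sum of absolute values after removing the
    frame mean (an assumed definition; others, such as log energy, exist)."""
    volumes = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        mean = sum(frame) / frame_size
        volumes.append(sum(abs(x - mean) for x in frame))
    return volumes


def simple_threshold(volumes, vth):
    """Mark a frame positive (sound activity) when its volume exceeds vth."""
    return [v > vth for v in volumes]


# Toy signal: silence, a short burst of activity, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
volumes = frame_volumes(signal, frame_size=4, hop=4)
labels = simple_threshold(volumes, vth=1.0)
print(volumes)  # one volume per non-overlapping frame
print(labels)   # True where the frame is considered positive
```

The threshold value 1.0 is arbitrary here; choosing it well is the subject of the following slides.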

  5. Performance Evaluation for EPD • Types of errors • False rejection: positive → negative • False acceptance: negative → positive • Performance evaluation • Start & end position accuracy • Frame-based accuracy
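
As a minimal sketch of frame-based evaluation, the two error types can be counted by comparing per-frame ground-truth labels with predicted labels. The function name and the toy labels are hypothetical.

```python
def frame_errors(truth, predicted):
    """Return (false_rejections, false_acceptances, frame_accuracy).

    truth and predicted are equal-length lists of booleans,
    where True means the frame contains sound activity."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # False rejection: a positive frame labeled negative.
    false_rejections = sum(1 for t, p in zip(truth, predicted) if t and not p)
    # False acceptance: a negative frame labeled positive.
    false_acceptances = sum(1 for t, p in zip(truth, predicted) if p and not t)
    accuracy = 1 - (false_rejections + false_acceptances) / len(truth)
    return false_rejections, false_acceptances, accuracy


truth = [False, False, True, True, True, False]
predicted = [False, True, True, True, False, False]
fr, fa, acc = frame_errors(truth, predicted)
print(fr, fa, acc)
```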

  6. EPD by Volume Only • The simplest method for EPD • Four intuitive ways to select vth • vth = vmax*α • vth = vmedian*β • vth = vmin*γ • vth = v1*δ
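
The four threshold rules on this slide can be computed side by side, as sketched below. The multiplier values are made up for illustration and would need tuning on real data; the function and key names are hypothetical.

```python
import statistics


def candidate_thresholds(volumes, alpha, beta, gamma, delta):
    """Four candidate values of vth, one per rule on the slide."""
    return {
        "max": max(volumes) * alpha,                 # vth = vmax * multiplier
        "median": statistics.median(volumes) * beta, # vth = vmedian * multiplier
        "min": min(volumes) * gamma,                 # vth = vmin * multiplier
        "first": volumes[0] * delta,                 # vth = v1 * multiplier
    }


# Toy per-frame volumes: quiet frames around a louder stretch.
volumes = [2.0, 1.0, 3.0, 40.0, 50.0, 45.0, 2.0]
th = candidate_thresholds(volumes, alpha=0.1, beta=2.0, gamma=10.0, delta=5.0)
print(th)
```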

  7. EPD by Volume Only (II) • Unfortunately… • Most of the thresholds fail one way or another. • Dataset-based fine-tuning of α, β, γ, δ is always advisable. • Under what situations do they fail? • Plosive sounds • Silence too long • Total-zero frame • Unstable first frame

  8. EPD by Volume Only (III) • A presumably better way to select vth • vlower = 3rd percentile of the volumes sorted in ascending order • vupper = 97th percentile of the volumes sorted in ascending order • vth = (vupper-vlower)*k+vlower • Why do we need to use percentiles? • To deal with plosive sounds • To deal with total-zero frames • Does it fail? Yes, still, in certain situations…
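
A minimal sketch of the percentile-based rule follows. It uses a nearest-rank percentile, which is only one of several conventions, and the value of k is arbitrary; both are assumptions, as are the function names.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of a list already sorted in ascending order."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def percentile_threshold(volumes, k, low_pct=3, high_pct=97):
    """vth = (vupper - vlower) * k + vlower, using the 3rd and 97th percentiles."""
    ordered = sorted(volumes)
    v_lower = percentile(ordered, low_pct)
    v_upper = percentile(ordered, high_pct)
    return (v_upper - v_lower) * k + v_lower


# 100 toy frames: one total-zero frame, one plosive-like spike,
# 49 background frames (volume 2) and 49 speech frames (volume 50).
volumes = [0.0] + [2.0] * 49 + [50.0] * 49 + [1000.0]
vth = percentile_threshold(volumes, k=0.1)
vth_from_max = max(volumes) * 0.1  # the spike pushes this above every speech frame
print(vth, vth_from_max)
```

In this toy case the percentile-based threshold lands between the background and speech volumes, while a threshold derived from the maximum is inflated by the single spike and would reject all speech frames.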

  9. Example: EPD by Volume • epdByVol01.m

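  10. EPD by Volume and ZCR • Determine the initial endpoints using the upper volume threshold (tu) • Extend the endpoints outward, in both directions, to where the volume falls to the lower threshold (tl) • Then extend the endpoints further outward to where the zero crossing rate reaches its threshold (tzc)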
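
The three-step procedure on this slide (initial endpoints from the upper volume threshold tu, extension to the lower volume threshold tl, then extension by the ZCR threshold tzc) can be sketched as follows. This is a simplified illustration that assumes a single sound segment per recording and treats a frame as part of the extension while its ZCR stays above tzc; the function names and toy values are hypothetical.

```python
def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples in a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def epd_vol_zcr(volumes, zcrs, tu, tl, tzc):
    """Return (start, end) frame indices of one sound segment, or None."""
    above = [i for i, v in enumerate(volumes) if v > tu]
    if not above:
        return None
    # Step 1: initial endpoints from the upper volume threshold.
    start, end = above[0], above[-1]
    # Step 2: extend outward while the volume stays above the lower threshold.
    while start > 0 and volumes[start - 1] > tl:
        start -= 1
    while end < len(volumes) - 1 and volumes[end + 1] > tl:
        end += 1
    # Step 3: extend further while the ZCR stays above its threshold,
    # to pick up low-volume, high-ZCR (typically unvoiced) frames.
    while start > 0 and zcrs[start - 1] > tzc:
        start -= 1
    while end < len(volumes) - 1 and zcrs[end + 1] > tzc:
        end += 1
    return start, end


volumes = [1, 1, 2, 6, 20, 30, 25, 7, 2, 1]
zcrs = [2, 3, 15, 4, 5, 5, 4, 3, 2, 1]
segment = epd_vol_zcr(volumes, zcrs, tu=15, tl=5, tzc=10)
print(segment)
```

In the toy data, frame 2 has low volume but a high ZCR, so only the third step brings it into the detected segment.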

  11. Example: EPD by Volume and ZCR • epdByVolZcr01.m

  12. EPD by Volume and HOD • How to detect unvoiced sounds reliably? • ZCR • High order difference • Order-1 HOD = sum(abs(diff(s))) • Order-2 HOD = sum(abs(diff(diff(s)))) • Order-3 HOD = sum(abs(diff(diff(diff(s))))) • …
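
The HOD expressions above translate directly into Python. The sketch below mirrors `sum(abs(diff(...)))` with plain lists; the function names and toy frames are hypothetical.

```python
def diff(samples):
    """First-order difference between consecutive samples."""
    return [b - a for a, b in zip(samples, samples[1:])]


def high_order_difference(samples, order):
    """Apply diff() `order` times, then sum the absolute values."""
    for _ in range(order):
        samples = diff(samples)
    return sum(abs(x) for x in samples)


s = [0, 1, 0, -1, 0, 1]
hod = [high_order_difference(s, n) for n in (1, 2, 3)]
print(hod)

# Two toy frames with the same sum of absolute values (8):
# a rapidly alternating, noise-like frame and a slowly varying one.
fast = [1, -1] * 4
slow = [1, 1, 1, 1, -1, -1, -1, -1]
print(high_order_difference(fast, 1), high_order_difference(slow, 1))
```

The rapidly alternating frame gets a much larger HOD than the slowly varying one despite equal volume, which is why HOD can help pick up unvoiced sounds that a volume threshold alone would miss.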

  13. Example: EPD by Vol. and HOD • highOrderDiff01.m

  14. Example: EPD by Vol. and HOD (II) • epdByVolHod01.m

  15. Example: EPD by Vol. and HOD (III) • A hard example: epdByVolHod02.m

  16. EPD by Spectrum • epdBySpectrum01.m

  17. How to Aggregate Spectrum? • To aggregate spectrum for EPD • Entropy function • Geometric mean over arithmetic mean
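
As a sketch of the second aggregation option, the ratio of geometric mean to arithmetic mean reduces a magnitude spectrum to one number between 0 and 1. The function name, the small floor used to avoid log(0), and the toy spectra are assumptions for illustration.

```python
import math


def geometric_over_arithmetic(magnitudes, floor=1e-12):
    """Geometric mean divided by arithmetic mean of a magnitude spectrum."""
    values = [max(m, floor) for m in magnitudes]  # guard against log(0)
    geometric = math.exp(sum(math.log(v) for v in values) / len(values))
    arithmetic = sum(values) / len(values)
    return geometric / arithmetic


flat = geometric_over_arithmetic([1.0, 1.0, 1.0, 1.0])     # flat spectrum
peaky = geometric_over_arithmetic([4.0, 0.01, 0.01, 0.01])  # one dominant bin
print(flat, peaky)
```

A flat spectrum gives a ratio of 1 and a spectrum dominated by a few bins gives a ratio near 0, so the ratio measures how evenly energy is spread across frequency.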

  18. Spectral Entropy • PDF: • Normalization • Spectral entropy:
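
The formulas on this slide did not survive in the transcript. The block below gives the standard definition of spectral entropy as a sketch, not a recovery of the original slide; the symbols $s(f_i)$ (spectral magnitude or energy at frequency bin $f_i$) and $N$ (number of bins) are assumed notation.

```latex
% PDF: normalize the spectrum so that it sums to one
p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \qquad i = 1, \dots, N

% Normalization property
\sum_{i=1}^{N} p_i = 1

% Spectral entropy
H = -\sum_{i=1}^{N} p_i \log p_i
```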

  19. Properties of Entropy • entropyPlot.m (plots for N=2 and N=3)
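
The properties this slide presumably illustrates can be checked numerically. The sketch below is not a port of entropyPlot.m; it evaluates the entropy at a few points for N=2 and N=3 using the natural logarithm, with hypothetical names.

```python
import math


def entropy(probabilities):
    """H = -sum(p * log(p)), treating 0 * log(0) as 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


h_uniform_2 = entropy([0.5, 0.5])        # maximum for N=2
h_skewed_2 = entropy([0.9, 0.1])         # lower than the uniform case
h_certain_2 = entropy([1.0, 0.0])        # minimum: no uncertainty
h_uniform_3 = entropy([1 / 3, 1 / 3, 1 / 3])  # maximum for N=3
print(h_uniform_2, h_skewed_2, h_certain_2, h_uniform_3)
```

Entropy is zero when all probability sits in one bin and reaches its maximum, log N, when the distribution is uniform.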

  20. References • References for EPD • Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments”, International Conference on Spoken Language Processing, Sydney, 1998
