DTW for Speech Recognition
J.-S. Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
MIR Lab (多媒體資訊檢索實驗室)
CS, Tsing Hua Univ. (清華大學 資工系)
Dynamic Time Warping (DTW)
• Characteristics:
  • Pattern-matching-based approach
  • Requires less memory and computation
  • Suitable for speaker-dependent recognition
  • Suitable for small to medium vocabularies
  • Suitable for microprocessor/chip implementation
• Applications:
  • Speaker identification & verification for surveillance
  • Voice commands for mobile phones and toys
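Dynamic Time Warping: Type 1
• t: input MFCC matrix (each column is a frame’s feature vector)
• r: reference MFCC matrix
• Local paths: 27-45-63 degrees
• DTW recurrence: D(i, j) = |t(i) - r(j)| + min{ D(i-1, j-2), D(i-1, j-1), D(i-2, j-1) }, where D(i, j) is the minimum accumulated distance from the start up to t(i) and r(j)
(Figure: DTW grid with input frames t(i-1), t(i) along the horizontal axis i and reference frames r(j-1), r(j) along the vertical axis j.)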
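The type-1 local paths can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes that cell (i, j) is reached from (i-1, j-2), (i-1, j-1), or (i-2, j-1), that both corners are fixed, that frames are stored as a list of feature vectors rather than matrix columns, and that the frame distance is Euclidean. The names `frame_dist` and `dtw_type1` are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1(t, r):
    """Type-1 DTW (27-45-63 degree local paths), both corners fixed.

    t, r: non-empty lists of frames; each frame is a list of floats.
    Returns the minimum total distance, or math.inf if no path exists.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                   # 45-degree move
            if j >= 2:
                best = min(best, D[i - 1][j - 2])    # 63-degree move
            if i >= 2:
                best = min(best, D[i - 2][j - 1])    # 27-degree move
            if best < math.inf:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]

if __name__ == "__main__":
    t = [[0.0], [1.0], [2.0]]
    r = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(dtw_type1(t, r))
```

Cells that no allowed move can reach keep the value `math.inf`, so a pair of sequences whose lengths differ by more than a factor of two (in steps) yields an infinite distance.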
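Dynamic Time Warping: Type 2
• t: input MFCC matrix (each row is a frame’s feature vector)
• r: reference MFCC matrix
• Local paths: 0-45-90 degrees
• DTW recurrence: D(i, j) = |t(i) - r(j)| + min{ D(i-1, j), D(i-1, j-1), D(i, j-1) }, where D(i, j) is the minimum accumulated distance from the start up to t(i) and r(j)
(Figure: DTW grid with input frames t(i-1), t(i) along the horizontal axis i and reference frames r(j-1), r(j) along the vertical axis j.)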
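A comparable sketch for the type-2 local paths follows. It assumes that cell (i, j) is reached from (i-1, j), (i-1, j-1), or (i, j-1), that both corners are fixed, that frames are stored as a list of feature vectors, and that the frame distance is Euclidean. The names `frame_dist` and `dtw_type2` are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 degree local paths), both corners fixed.

    t, r: non-empty lists of frames; each frame is a list of floats.
    Returns the minimum total distance along a warping path.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = frame_dist(t[i], r[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = math.inf
            if i > 0:
                best = min(best, D[i - 1][j])        # 0-degree move
            if j > 0:
                best = min(best, D[i][j - 1])        # 90-degree move
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])    # 45-degree move
            D[i][j] = d + best
    return D[m - 1][n - 1]

if __name__ == "__main__":
    print(dtw_type2([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]))
```

Unlike the type-1 sketch, every cell is reachable here, so the result is always finite; the trade-off is that one frame may be matched to arbitrarily many frames of the other sequence.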
Local Path Constraints
• Type 1: 27-45-63 local paths
• Type 2: 0-45-90 local paths
Path Penalty for Type-1 DTW
• Path penalty
  • No penalty for the 45-degree path
  • Some penalty for paths that deviate from 45 degrees
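One way to realize such a penalty is to add a constant to the accumulated distance whenever a 27- or 63-degree move is taken. The sketch below does this under several assumptions: the penalty is additive, its value is chosen by the caller (the slides do not give one), frames are lists of floats, and the frame distance is Euclidean. The names `frame_dist`, `dtw_type1_penalized`, and `penalty` are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type1_penalized(t, r, penalty=0.0):
    """Type-1 DTW with an additive penalty on non-45-degree moves.

    penalty: hypothetical constant added for each 27- or 63-degree move.
    Returns the minimum penalized distance, or math.inf if no path exists.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = frame_dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                            # 45 degrees: free
            if j >= 2:
                best = min(best, D[i - 1][j - 2] + penalty)   # 63 degrees
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + penalty)   # 27 degrees
            if best < math.inf:
                D[i][j] = frame_dist(t[i], r[j]) + best
    return D[m - 1][n - 1]

if __name__ == "__main__":
    t = [[0.0], [1.0], [2.0]]
    r = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(dtw_type1_penalized(t, r, penalty=0.5))
```

In the usage example the only feasible path takes two 63-degree moves, so the frame distances sum to zero and the result consists entirely of the two penalties.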
DTW Paths of “Match Corners”
• We assume the speed of a user’s acoustic input is between 1/2 and 2 times that of the intended sentence.
• Both corners are fixed. (End-point detection is critical.)
• Suitable for voice command applications
(Figure: DTW paths on the i-j grid.)
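With 27-45-63 local paths, each move advances one axis by one frame and the other by one or two frames, so a corner-to-corner path exists only when the two sequence lengths are within a factor of two of each other (counted in steps). The sketch below checks that condition before running DTW; the function name `corners_reachable` is hypothetical.

```python
def corners_reachable(m, n):
    """Return True if a type-1 path can join cell (0, 0) to cell (m-1, n-1).

    m, n: number of frames in the input and the reference.
    Moves allowed per step: (+1, +1), (+1, +2), or (+2, +1).
    """
    if m < 1 or n < 1:
        return False
    steps_i, steps_j = m - 1, n - 1
    # The overall slope of any path lies between 1/2 and 2.
    return max(steps_i, steps_j) <= 2 * min(steps_i, steps_j)

if __name__ == "__main__":
    print(corners_reachable(3, 5))   # 2 steps vs 4 steps: reachable
    print(corners_reachable(2, 5))   # 1 step vs 4 steps: not reachable
```

A check like this lets a recognizer skip reference templates that cannot match the input at all, before any frame distances are computed.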
DTW Paths of “Match Anywhere”
• No fixed anchored positions
• Suitable for retrieval of personal spoken documents
(Figure: DTW paths on the i-j grid.)
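One common reading of "match anywhere" is subsequence matching: a short query may start and end at any position of a longer reference. The sketch below illustrates that reading and is an assumption, not necessarily the variant used in the slides. It uses 0-45-90 local paths, a Euclidean frame distance, and hypothetical names `frame_dist` and `dtw_match_anywhere`.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_match_anywhere(t, r):
    """Match all of query t against any contiguous part of reference r.

    Returns (best_distance, end_index), where end_index is the reference
    frame at which the best-matching segment ends.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for j in range(n):
        # Free start: the first query frame may align with any reference frame.
        D[0][j] = frame_dist(t[0], r[j])
    for i in range(1, m):
        for j in range(n):
            best = D[i - 1][j]
            if j > 0:
                best = min(best, D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = frame_dist(t[i], r[j]) + best
    # Free end: pick the cheapest cell in the last query row.
    end = min(range(n), key=lambda j: D[m - 1][j])
    return D[m - 1][end], end

if __name__ == "__main__":
    query = [[1.0], [2.0], [3.0]]
    reference = [[9.0], [9.0], [1.0], [2.0], [3.0], [9.0]]
    print(dtw_match_anywhere(query, reference))
```

Only the initialization of the first row and the final minimum differ from the fixed-corner case, which is why this variant suits retrieval tasks where the position of the query within a recording is unknown.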
Other Variants
• Local constraints
• Start/ending area
Implementation Issues
• To save memory
  • Use a 2-column table for type-1 DTW
  • Use a 1-column table for type-2 DTW
• To avoid too many if-then statements
  • Pad type-1 DTW with two-layer padding
  • Pad type-2 DTW with one-layer padding
• To find a suitable path
  • Minimizing total distance
  • Minimizing average distance
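The 1-column table for type-2 DTW can be sketched as follows. The idea is that one array holds column i-1 before each update and column i after it, with a single temporary keeping the diagonal neighbor. This is a minimal illustration assuming 0-45-90 local paths, fixed corners, a Euclidean frame distance, and total (not average) distance; the names `frame_dist` and `dtw_type2_one_column` are hypothetical.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_type2_one_column(t, r):
    """Type-2 DTW total distance using O(len(r)) memory."""
    n = len(r)
    col = [0.0] * n
    # Column for the first input frame: only moves along r are possible.
    acc = 0.0
    for j in range(n):
        acc += frame_dist(t[0], r[j])
        col[j] = acc
    for i in range(1, len(t)):
        diag = col[0]                                  # D(i-1, 0)
        col[0] = frame_dist(t[i], r[0]) + col[0]       # D(i, 0)
        for j in range(1, n):
            prev = col[j]                              # D(i-1, j)
            # col[j-1] already holds D(i, j-1); diag holds D(i-1, j-1).
            col[j] = frame_dist(t[i], r[j]) + min(prev, col[j - 1], diag)
            diag = prev
    return col[-1]

if __name__ == "__main__":
    print(dtw_type2_one_column([[0.0], [1.0], [2.0]],
                               [[0.0], [1.0], [1.0], [2.0]]))
```

The same trick extends to type-1 DTW, except that its recurrence looks back two columns, which is consistent with the 2-column table mentioned above. Keeping only columns means the optimal path itself cannot be backtracked; only its distance is available.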
DTW for Spoken Document Retrieval
• Applications
  • Voice-based audio/video retrieval
• Issues in SDR using DTW
  • Speaker normalization
    • Vocal tract length normalization (VTLN)
    • Frequency warping
  • Efficiency
DTW for Speaker-independent Voice Command Recognition
• Applications
  • Digit recognition
• Technical highlights
  • Extensive recordings
  • Clustering within each command
  • Some indexing methods for DTW
• Suitable for small-vocabulary applications