DTW for Speech Recognition

J.-S. Roger Jang (張智星)
jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/~jang
MIR Lab (Multimedia Information Retrieval Lab), Dept. of CS, Tsing Hua Univ.

Dynamic Time Warping (DTW)
• Characteristics
  • Pattern-matching-based approach
  • Requires less memory/computation
  • Suitable for speaker-dependent recognition
  • Suitable for small to medium vocabulary
  • Suitable for microprocessor/chip implementation
• Applications
  • Speaker identification & verification for surveillance
  • Voice commands for mobile phones, toys

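Dynamic Time Warping: Type 1
• t: input MFCC matrix (Each column is a frame’s feature.)
• r: reference MFCC matrix
• Local paths: 27-45-63 degrees
• DTW recurrence: $D(i, j) = d(t(i), r(j)) + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$, where $d$ is the local distance between frames $t(i)$ and $r(j)$
• Figure: DTW grid with input frames t(i-1), t(i) along the i axis and reference frames r(j-1), r(j) along the j axis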
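A minimal sketch of type-1 DTW with both corners fixed, assuming Euclidean distance between frames and the predecessor cells implied by the 27-45-63 degree local paths. The function name `dtw_type1` and the list-of-frames input layout are illustrative choices, not taken from the slides.

```python
import math

def dtw_type1(t, r):
    """Type-1 DTW (27-45-63 local paths), both corners fixed.

    t, r: lists of frames, each frame a list of floats (e.g. MFCCs).
    Returns the minimum total distance, or math.inf if no path exists.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = math.dist(t[0], r[0])          # path must start at the corner
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]           # 45-degree step
            if j >= 2:
                best = min(best, D[i - 1][j - 2])   # 63-degree step
            if i >= 2:
                best = min(best, D[i - 2][j - 1])   # 27-degree step
            if best < math.inf:
                D[i][j] = math.dist(t[i], r[j]) + best
    return D[m - 1][n - 1]                   # path must end at the corner
```

The explicit `if` checks on the indices are what the two-layer padding mentioned later in the slides would remove.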

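Dynamic Time Warping: Type 2
• t: input MFCC matrix (Each row is a frame’s feature.)
• r: reference MFCC matrix
• Local paths: 0-45-90 degrees
• DTW recurrence: $D(i, j) = d(t(i), r(j)) + \min\{D(i-1, j),\ D(i-1, j-1),\ D(i, j-1)\}$, where $d$ is the local distance between frames $t(i)$ and $r(j)$
• Figure: DTW grid with input frames t(i-1), t(i) along the i axis and reference frames r(j-1), r(j) along the j axis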
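A minimal sketch of type-2 DTW with both corners fixed, assuming Euclidean frame distance and the predecessor cells implied by the 0-45-90 degree local paths. The name `dtw_type2` and the list-of-frames layout are illustrative assumptions.

```python
import math

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 local paths), both corners fixed.

    t, r: lists of frames, each frame a list of floats.
    Returns the minimum total distance along the warping path.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(t[i], r[j])
            if i == 0 and j == 0:
                D[i][j] = d                  # start corner
                continue
            best = math.inf
            if i > 0:
                best = min(best, D[i - 1][j])        # 0-degree step
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1])    # 45-degree step
            if j > 0:
                best = min(best, D[i][j - 1])        # 90-degree step
            D[i][j] = d + best
    return D[m - 1][n - 1]
```

Unlike type 1, these local paths let one frame align with arbitrarily many frames of the other sequence, so a path between the corners always exists.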

Local Path Constraints
• Type 1: 27-45-63 local paths
• Type 2: 0-45-90 local paths

Path Penalty for Type-1 DTW
• Path penalty
  • No penalty for the 45-degree path
  • Some penalty for paths that deviate from 45 degrees
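One way to sketch the penalty is to add a constant to the 27-degree and 63-degree steps and nothing to the 45-degree step. The parameter name `penalty` and its additive form are assumptions for illustration; the slides do not state the exact penalty used.

```python
import math

def dtw_type1_penalized(t, r, penalty=0.0):
    """Type-1 DTW where non-45-degree steps cost an extra `penalty`.

    t, r: lists of frames, each frame a list of floats.
    penalty: hypothetical additive cost for 27- and 63-degree steps.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = math.dist(t[0], r[0])
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]                             # 45 degrees, free
            if j >= 2:
                best = min(best, D[i - 1][j - 2] + penalty)    # 63 degrees
            if i >= 2:
                best = min(best, D[i - 2][j - 1] + penalty)    # 27 degrees
            if best < math.inf:
                D[i][j] = math.dist(t[i], r[j]) + best
    return D[m - 1][n - 1]
```

With `penalty=0.0` this reduces to plain type-1 DTW; larger values bias the alignment toward the diagonal.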

DTW Paths of “Match Corners”
• We assume the speed of a user’s acoustic input is between 1/2 and 2 times that of the intended sentence.
• Both corners are fixed. (End-point detection is critical.)
• Suitable for voice command applications
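The speed assumption can be checked before running DTW. With 27-45-63 local paths, every step advances one or two frames on each axis, so a corner-to-corner path exists only when neither sequence spans more than twice the frame steps of the other. The helper name `corners_reachable` is hypothetical.

```python
def corners_reachable(num_input_frames, num_ref_frames):
    """Return True if type-1 local paths can join both fixed corners.

    Both arguments must be at least 1. The check is on frame steps
    (frames minus one), since the path starts on the first frame pair.
    """
    di = num_input_frames - 1   # steps needed along the input axis
    dj = num_ref_frames - 1     # steps needed along the reference axis
    return max(di, dj) <= 2 * min(di, dj)
```

For example, a 5-frame input can be matched against a 9-frame reference, but not against a 10-frame one.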

DTW Paths of “Match Anywhere”
• No fixed anchor positions
• Suitable for retrieval of personal spoken documents
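A minimal sketch of one common reading of "match anywhere": the whole input is aligned against any contiguous stretch of a longer reference, so the path may start and end at any reference frame. This uses 0-45-90 local paths and Euclidean frame distance as assumptions; the function name `dtw_match_anywhere` is illustrative.

```python
import math

def dtw_match_anywhere(t, r):
    """Align all of t against the best-matching stretch of r.

    t: query frames, r: reference frames (lists of float lists).
    Returns (best total distance, index of the reference frame where
    the best path ends).
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for j in range(n):
        D[0][j] = math.dist(t[0], r[j])      # free start: any reference frame
    for i in range(1, m):
        for j in range(n):
            best = D[i - 1][j]
            if j > 0:
                best = min(best, D[i - 1][j - 1], D[i][j - 1])
            D[i][j] = math.dist(t[i], r[j]) + best
    # free end: take the cheapest cell in the last query row
    end_j = min(range(n), key=lambda j: D[m - 1][j])
    return D[m - 1][end_j], end_j
```

Backtracking from the returned end index would recover where the matched stretch begins.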

Other Variants
• Local constraints
• Start/ending area

Implementation Issues
• To save memory
  • Use 2-column table for type-1 DTW
  • Use 1-column table for type-2 DTW
• To avoid too many if-then statements
  • Pad type-1 DTW with two-layer padding
  • Pad type-2 DTW with one-layer padding
• To find a suitable path
  • Minimizing total distance
  • Minimizing average distance
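The memory-saving idea for type-2 DTW can be sketched with a single column that is overwritten in place, plus one scalar that remembers the diagonal predecessor. This is an illustrative sketch assuming Euclidean frame distance; it returns only the total distance, since the full table needed for backtracking is not kept.

```python
import math

def dtw_type2_one_column(t, r):
    """Type-2 DTW total distance using O(len(r)) memory."""
    n = len(r)
    col = [math.inf] * n        # col[j] is D(i-1, j) before update, D(i, j) after
    for i, frame in enumerate(t):
        diag = math.inf         # holds D(i-1, j-1) for the current j
        for j in range(n):
            up = col[j]         # D(i-1, j), not yet overwritten
            if i == 0 and j == 0:
                best = 0.0      # start corner has no predecessor
            else:
                left = col[j - 1] if j > 0 else math.inf   # D(i, j-1)
                best = min(up, diag, left)
            diag = up           # becomes the diagonal for j + 1
            col[j] = math.dist(frame, r[j]) + best
    return col[n - 1]
```

The same idea extends to type-1 DTW, which looks back two input frames and therefore needs more than one stored column.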

DTW Path of “Match Corners”

DTW Path of “Match Anywhere”

DTW for Spoken Document Retrieval
• Applications
  • Voice-based audio/video retrieval
• Issues in SDR using DTW
  • Speaker normalization
    • Vocal tract length normalization (VTLN)
    • Frequency warping
  • Efficiency

DTW for Speaker-independent Voice Command Recognition
• Applications
  • Digit recognition
• Technical highlights
  • Extensive recordings
  • Clustering within each command
  • Some indexing methods for DTW
• Suitable for small-vocabulary applications
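A minimal sketch of how several reference templates per command might be used at recognition time: the input is compared with every template by DTW, and the label of the nearest one is returned. The `templates` dictionary layout and the function names are assumptions for illustration; the templates would typically be representatives chosen by clustering the recordings of each command. The DTW here is type 2 with one-layer padding.

```python
import math

def dtw_distance(t, r):
    """Type-2 (0-45-90) DTW total distance, using one-layer padding."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]   # row/column 0 are padding
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = math.dist(t[i - 1], r[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[m][n]

def recognize(query, templates):
    """Return (label, distance) of the template nearest to the query.

    templates: dict mapping a command label to a list of reference
    utterances, each a list of frames.
    """
    best_label, best_dist = None, math.inf
    for label, refs in templates.items():
        for ref in refs:
            dist = dtw_distance(query, ref)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist
```

Because every template costs one DTW run, the number of templates per command directly trades accuracy against recognition time.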
