
HTK tutorial

HTK tutorial. Speaker: ricer. Date: 2005.08.26. Outline: data preparation (corpora: label and speech data); three models for Automatic Speech Recognition: acoustic model (feature extraction, HMM (Hidden Markov Model)), pronunciation dictionary, and searching net (free-syllable net).


Presentation Transcript


  1. HTK tutorial. Speaker: ricer. Date: 2005.08.26

  2. Outline
     • Data preparation
       • Corpora: label and speech data
     • Three models for Automatic Speech Recognition
       • Acoustic model
         • Feature extraction
         • HMM (Hidden Markov Model)
       • Pronunciation dictionary
       • Searching net
         • Free-syllable net
         • Large vocabulary
     • Recognizer evaluation

  3. Data preparation
     • We have
       • Wave files and the corresponding labels
       • File names such as MD01M0P0000 (fields: Mandarin, year, female speaker, ID number)
     • We want
       • A list of all files (function: file2list)
           J:/sp12/wav/1.wav, …*.lab
           J:/sp12/wav/2.wav, …*.lab
       • All labels in one file, the Master Label File (function: lab2mlf)
           #!MLF!#
           "*/1.lab" sia tian .
           "*/2.lab" sia yu .
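The lab2mlf step can be sketched in Python as below. The function name `labels_to_mlf` and its dictionary input are assumptions made for illustration; the tutorial's own lab2mlf helper is not shown in the slides. In an actual MLF each label sits on its own line and each entry ends with a line containing a single period.

```python
from pathlib import PurePosixPath


def labels_to_mlf(labels_by_file):
    """Render {label file name: [labels]} as Master Label File text.

    Hypothetical helper: the input layout is an assumption, not the
    interface of the tutorial's lab2mlf function.
    """
    lines = ["#!MLF!#"]
    for name in sorted(labels_by_file):
        stem = PurePosixPath(name).stem
        lines.append(f'"*/{stem}.lab"')      # pattern matching any directory
        lines.extend(labels_by_file[name])   # one label per line
        lines.append(".")                    # end-of-entry marker
    return "\n".join(lines) + "\n"


mlf_text = labels_to_mlf({"1.lab": ["sia", "tian"], "2.lab": ["sia", "yu"]})
print(mlf_text, end="")
```

Sorting the file names keeps the output deterministic, which makes the MLF easier to compare across runs.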

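  4. Transcription example: xxx1000 夏天下雨 ("it rains in summer"), transcribed as sia4_tieN1_sia4_Y3
     • Mandarin speech waveform: 夏天 (sia4 tieN1) 下雨 (sia4 Y3)
     • Acoustic units: 夏 sia4, 天 tieN1, 下 sia4, 雨 Y3
     • Toneless syllables: sia sp tieN sia Y
     • Toneless phonemes: s i a sp t i e N s i a Y
     • Within-syllable right-context units: s+i i+a a sp t+i i+e e+N N s+i i+a a Y
     • Within-syllable left-and-right-context units: s+i s-i+a i-a sp t+i t-i+e i-e+N e-N s+i s-i+a i-a Y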

  5. Model lists and Master Label Files (function: hled)
     • Model lists
         Tonal syllable   Mono-syllable   Mono-phone   Tri-phone
         syl_tone.mod     syl_sp.mod      phn.mod      tri.mod
         sia4             sia             s            s+i
         tian1            tian            i            s-i+a
         Yu3              yu              a            i-a
                          sp              t            t+i
         …                                             ….
     • Master Label File, tonal syllables
         #!MLF!#
         "*/1.lab" sia4 tian1 .
         "*/2.lab" sia4 yu3 .
     • Master Label File, mono-syllables
         #!MLF!#
         "*/1.lab" sia sp tian .
         "*/2.lab" sia sp yu .
     • Master Label File, mono-phones
         #!MLF!#
         "*/1.lab" s i a sp t i a n .
         "*/2.lab" s i a sp y u .
     • Master Label File, tri-phones
         #!MLF!#
         "*/1.lab" s+i s-i+a i-a sp t+i t-i+a i-a+n a-n .
         "*/2.lab" s+i s-i+a i-a sp y+u y-u .

  6. Data preparation: mono-phone MLF → hled → tri-phone MLF
       hled3("syl.mlf", "ex.led", "syl2tri.dic", "tri.mlf", "tri.mod")
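The within-syllable tri-phone naming used in these label files (left context after "-", right context after "+", no context across syllable boundaries) can be illustrated with a short Python sketch. The function names and the `SYL2PHN` table are hypothetical stand-ins; in the tutorial the conversion is done by the hled wrapper together with a dictionary such as syl2tri.dic.

```python
def within_syllable_triphones(phones):
    """Name each phone with its left/right neighbours inside one syllable."""
    names = []
    last = len(phones) - 1
    for k, phone in enumerate(phones):
        left = f"{phones[k - 1]}-" if k > 0 else ""      # no left context at syllable start
        right = f"+{phones[k + 1]}" if k < last else ""  # no right context at syllable end
        names.append(left + phone + right)
    return names


def expand_sentence(syllables, syl2phn, sep="sp"):
    """Expand a syllable sequence, inserting `sep` between syllables."""
    out = []
    for k, syllable in enumerate(syllables):
        if k > 0:
            out.append(sep)
        out.extend(within_syllable_triphones(syl2phn[syllable]))
    return out


# Hypothetical syllable-to-phone table covering the tutorial's examples.
SYL2PHN = {"sia": ["s", "i", "a"], "tian": ["t", "i", "a", "n"], "yu": ["y", "u"]}

print(" ".join(expand_sentence(["sia", "tian"], SYL2PHN)))
```

A one-phone syllable keeps its plain name because it has no neighbours inside the syllable.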

  7. Feature extraction: Mel-Frequency Cepstral Coefficients (MFCC)
     • See vip/eda.cfg
         NATURALREADORDER = TRUE
         SOURCEFORMAT = WAV
         TARGETKIND = MFCC_E_D_A
         TARGETRATE = 100000.0
         WINDOWSIZE = 200000.0
         USEHAMMING = TRUE
         PREEMCOEF = 0.97
         NUMCHANS = 26
         NUMCEPS = 12
         ENORMALISE = TRUE
         DELTAWINDOW = 2
         ACCWINDOW = 2
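HTK expresses times in units of 100 ns, so the configuration above corresponds to a 10 ms frame shift and a 20 ms analysis window. The vector size of 39 used by the later HMM prototype follows from 12 cepstra plus energy, each with delta and acceleration coefficients. A small Python sketch of that arithmetic (the helper names are illustrative, not part of HTK):

```python
HTK_UNITS_PER_MS = 10_000  # 1 ms = 10,000 units of 100 ns


def htk_time_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value / HTK_UNITS_PER_MS


def feature_dim(num_ceps, target_kind):
    """Vector size implied by NUMCEPS and the TARGETKIND qualifiers."""
    _base, *qualifiers = target_kind.split("_")
    # _E (energy) and _0 (zeroth cepstrum) each add one static coefficient.
    static = num_ceps + sum(q in ("E", "0") for q in qualifiers)
    # _D appends deltas, _A appends accelerations.
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    return static * blocks


print(htk_time_to_ms(100000.0))        # frame shift (TARGETRATE)
print(htk_time_to_ms(200000.0))        # window length (WINDOWSIZE)
print(feature_dim(12, "MFCC_E_D_A"))   # feature vector size
```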

  8. Creating mono-phone HMM (function: hcompv)
     • Topology: left-to-right, three emitting states (state1, state2, state3) with self-loops a11, a22, a33 and forward transitions a01, a12, a23, a34
     • Transition matrix
         0.000e+0 1.000e+0 0.000e+0 0.000e+0 0.000e+0
         0.000e+0 5.000e-1 5.000e-1 0.000e+0 0.000e+0
         0.000e+0 0.000e+0 5.000e-1 5.000e-1 0.000e+0
         0.000e+0 0.000e+0 0.000e+0 5.000e-1 5.000e-1
         0.000e+0 0.000e+0 0.000e+0 0.000e+0 0.000e+0
     • See vip/3n3s1m.pro
         ~o <VecSize> 39 <MFCC_E_D_A> <DiagC>
         <BeginHMM>
         <NumStates> 5
         <StreamInfo> 3 13 13 13
         <State> 2
         <SWeights> 3 1.000000e+000 1.000000e+000 1.000000e+000
         <Stream> 1
         <Mean> 13
           0 0 0 0 0 0 0 0 0 0 0 0 0
         <Variance> 13
           1 1 1 1 1 1 1 1 1 1 1 1 1
         <Stream> 2
         <Mean> 13
           0 0 0 0 0 0 0 0 0 0 0 0 0
         <Variance> 13
           1 1 1 1 1 1 1 1 1 1 1 1 1
         <Stream> 3
         <Mean> 13
           0 0 0 0 0 0 0 0 0 0 0 0 0
         <Variance> 13
           1 1 1 1 1 1 1 1 1 1 1 1 1
     • Call
         hcompv("3n3s1m.pro","phn.mod","mfc.lst","phn.mlf")
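The prototype's transition matrix follows a simple pattern: the non-emitting entry state always moves to the first emitting state, every emitting state either stays or advances with probability 0.5, and the non-emitting exit state has no outgoing transitions. A minimal Python sketch that builds such a matrix (the function name and the `self_loop` parameter are illustrative assumptions):

```python
def left_to_right_transitions(num_states, self_loop=0.5):
    """Build a left-to-right HMM transition matrix, HTK-style.

    num_states counts the non-emitting entry and exit states too,
    so num_states=5 means three emitting states.
    """
    a = [[0.0] * num_states for _ in range(num_states)]
    a[0][1] = 1.0                        # entry state -> first emitting state
    for i in range(1, num_states - 1):   # emitting states
        a[i][i] = self_loop              # stay in the same state
        a[i][i + 1] = 1.0 - self_loop    # advance to the next state
    return a                             # last row stays all zero (exit state)


for row in left_to_right_transitions(5):
    print(" ".join(f"{p:.3e}" for p in row))
```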

  9. Creating mono-phone HMM
       erest(0,"mfc.lst", "phn.mlf", "phn.mod", 4)
     • Initially, all Gaussians have the same mean and variance
     • Re-estimation refines each Gaussian to fit its own data
     • Figure: Gaussians for the phones a, i, t, s

  10. Creating mono-phone HMM
      • State occupation statistics (*.sts)
          No.  model name  counts  state1        state2        state3
          1    "A"         592     4905.791992   2986.182129   2671.728271
          2    "C"         340     2879.937256   1012.802734   1034.989014
          3    "E"         124     1856.104492   837.523865    670.962036
          4    "G"         2082    12491.683594  13483.448242  7560.445313
          5    "I"         580     5163.432617   2224.220703   2649.229248
          6    "J"         358     1926.428955   990.966858    1072.944946
      • Call
          hhed(5, "ssp.hed", "phn_sp.mod", 6)
      • ssp.hed
          AT 2 4 0.2 {sil.transP}
          AT 4 2 0.2 {sil.transP}
          AT 1 3 0.3 {sp.transP}
          TI ssp {sil.state[3],sp.state[2]}

  11. Creating tri-phone HMM: mono-phone HMM → hhed → tri-phone HMM
        hhed(10, "tri.hed", "phn_sp.mod", 11)
      • Mono-phone HMM
          ~h "i"
          <BEGINHMM>
          <NUMSTATES> 5
          <STATE> 2
          <SWEIGHTS> 3 1.000000e+000 1.000000e+000 1.000000e+000
          <STREAM> 1
          <MEAN> 13
            -1.231087e+001 -9.749413e-001 9.766034e+000 …
          <VARIANCE> 13
            2.357146e+001 6.214857e+001 6.707030e+001 …
          <GCONST> 6.855833e+001
          <STREAM> 2
          <MEAN> 13
            -4.161490e-002 -3.644128e-001 2.665605e-002 …
          <VARIANCE> 13
            9.070483e-001 3.527379e+000 4.040065e+000 …
          <GCONST> 3.090531e+001
          <STREAM> 3
          <MEAN> 13
            -2.207233e-002 2.695016e-002 -4.460607e-001 …
          <VARIANCE> 13
            1.712317e-001 4.176200e-001 3.959162e-001 …
          <GCONST> 6.822828e+000
          ……
      • Tri-phone HMMs
          ~h "s-i+a"
          <BEGINHMM>
          <NUMSTATES> 5
          <STATE> 2
          <SWEIGHTS> 3 1.000000e+000 1.000000e+000 1.000000e+000
          <STREAM> 1
          <MEAN> 13
            -1.231087e+001 -9.749413e-001 9.766034e+000 …
          <VARIANCE> 13
            2.357146e+001 6.214857e+001 6.707030e+001 …
          <GCONST> 6.855833e+001
          …...
          ~h "t-i+a"
          <BEGINHMM>
          <NUMSTATES> 5
          <STATE> 2
          <SWEIGHTS> 3 1.000000e+000 1.000000e+000 1.000000e+000
          <STREAM> 1
          <MEAN> 13
            -1.231087e+001 -9.749413e-001 9.766034e+000 …
          <VARIANCE> 13
            2.357146e+001 6.214857e+001 6.707030e+001 …
          <GCONST> 6.855833e+001
          …...

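  12. Creating tri-phone HMM
      • Each tri-phone starts with the same Gaussian (same mean and variance) as its mono-phone
      • Figure: mono-phone i → tri-phones s-i+a, t-i+a; mono-phone a → tri-phones i-a+n, i-a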

  13. Creating tri-phone HMM
        hhed(15, "mix2.hed", "tri.mod", 16)
      • Before: a single Gaussian for each model (figure: i-a+n, s-i+a, t-i+a, i-a)
      • After: Gaussian mixtures with two Gaussians each (figure: i-a+n, s-i+a, t-i+a, i-a)
      • The edit script is generated with
          behhed(2,"hmm/hmm15/15.sts","hed/mix2.hed")
      • mix2.hed
          MU 2 {s-i+a.state[2].stream[1-3].mix}
          MU 2 {t-i+a.state[3].stream[1-3].mix}
          ……
      • Re-estimate after splitting
          erest(16,"mfc.lst", "tri.mlf", "tri.mod", 20)
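Going from one Gaussian to two is usually done by copying the component, halving the weights, and moving the two means apart by a fraction of the standard deviation; re-estimation then lets the copies settle on different parts of the data. The Python sketch below illustrates that idea for a diagonal-covariance Gaussian. The 0.2 standard-deviation offset is stated here as an assumption about typical behavior, and the function is an illustration, not HHEd's implementation.

```python
import math


def split_gaussian(weight, mean, variance, perturb=0.2):
    """Split one diagonal Gaussian into two perturbed copies.

    Returns two (weight, mean, variance) tuples. `perturb` is the
    offset in standard deviations (an assumed value, see lead-in).
    """
    offset = [perturb * math.sqrt(v) for v in variance]
    upper = (weight / 2, [m + o for m, o in zip(mean, offset)], list(variance))
    lower = (weight / 2, [m - o for m, o in zip(mean, offset)], list(variance))
    return [upper, lower]


mixture = split_gaussian(1.0, [0.0, 1.0], [1.0, 4.0])
for w, mu, var in mixture:
    print(w, mu, var)
```

The weights still sum to the original weight, so the state's output distribution remains normalized after the split.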

  14. Creating tri-phone HMM: training data
      • Figure: training data with the mixture components s+i (mix1, mix2), i-a (mix1, mix2, mix3), and s-i+a (mix1, mix2)

  15. Pronunciation dictionary (function: hdman)
      • Dictionary entries
          Character  Pronunciation probability  Pronunciation  HMM models
          一         0.16134                    i2             i sp
          一         0.26218                    i4             i sp
          一         0.57647                    i1             i sp
          乙         1.00000                    i3             i sp
          丁         1.00000                    ding1          d+i d-i+n i-n+g n-g sp
          七         1.00000                    ci1            c+i c-i sp
      • Syl2phn.dic
          sia   s i a
          tian  t i a n
          Yu    y u
          sp    sp
          sil   sil
      • Syl2tri.dic
          sia   s+i s-i+a i-a sp
          tian  t+i t-i+a i-a+n a-n sp
          Yu    y+u y-u sp
          sp    [] sp
          sil   [] sil
      • Call
          hdman("syl.mod", "syl2phn.dic", "syl2rcd.dic","man1.log","man2.log");

  16. Searching net
      • Three network types (figure): linear net, free Hanzi (Chinese character) net, tree-structured net
      • Characters shown in the figure: 台 北 市 政 府 中 縣 廳

  17. Searching net
        Hparse("free_syl.grm", "free_syl.net")
      • free_syl.grm
          $free_syl= sia | tian | yu;
          (<$free_syl>)
      • free_syl.net
          VERSION=1.0
          N=6 L=10
          I=0 W=yu
          I=1 W=!NULL
          I=2 W=tian
          I=3 W=sia
          I=4 W=!NULL
          I=5 W=!NULL
          J=0 S=1 E=0
          J=1 S=5 E=0
          J=2 S=0 E=1
          J=3 S=2 E=1
          J=4 S=3 E=1
          J=5 S=1 E=2
          J=6 S=5 E=2
          J=7 S=1 E=3
          J=8 S=5 E=3
          J=9 S=1 E=4
      • Figure: the network drawn with nodes i0 to i5 (words yu, tian, sia and three !NULL nodes) and arcs j0 to j9
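In the network file, each `I=` line defines a node with its word, and each `J=` line defines an arc from node `S` to node `E`. A small Python reader for exactly the fields that appear on this slide makes the structure explicit. The function `parse_slf` is a hypothetical helper written for this example and handles only these fields, not the full lattice format.

```python
FREE_SYL_NET = """\
VERSION=1.0
N=6 L=10
I=0 W=yu
I=1 W=!NULL
I=2 W=tian
I=3 W=sia
I=4 W=!NULL
I=5 W=!NULL
J=0 S=1 E=0
J=1 S=5 E=0
J=2 S=0 E=1
J=3 S=2 E=1
J=4 S=3 E=1
J=5 S=1 E=2
J=6 S=5 E=2
J=7 S=1 E=3
J=8 S=5 E=3
J=9 S=1 E=4
"""


def parse_slf(text):
    """Return ({node id: word}, [(start node, end node), ...])."""
    nodes, arcs = {}, []
    for line in text.splitlines():
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "I" in fields:                      # node definition
            nodes[int(fields["I"])] = fields.get("W")
        elif "J" in fields:                    # arc definition
            arcs.append((int(fields["S"]), int(fields["E"])))
    return nodes, arcs


nodes, arcs = parse_slf(FREE_SYL_NET)
words = sorted(w for w in nodes.values() if w != "!NULL")
print(len(nodes), len(arcs), words)
```

Every syllable node has an arc back to node 1, which is what lets the grammar `<$free_syl>` repeat syllables in any order.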

  18. Recognizer evaluation: testing data
      • Figure: test data matched against the models s+i, i-a, s-i+a
      • Call
          Vite("mfc.lst", 25, "tri.mod", "syl2tri.dic", "freesyl.net", "rec_freesyl.mlf","rec_freesyl.log" )

  19. Recognizer evaluation: Mandarin syllable network
      • Figure labels: Mandarin syllable network, syllable HMM, bi-lingual dictionary, acoustic HMM (models "t", "ah", "ai"), network states s1 to s10
      • Syllables shown in the network: da, xuei, sh, le, tai, bei, bah, wan, qi, dai, hah, liau
      • Bi-lingual dictionary entries: tai → t t+ai; wan → wan; da → d d+a; xuei → x x+uei; dai → d d+ai; bah → b b+ah; …
      • Tri-phone HMMs: s+i s-i+a i-a t+i t-i+a i-a+n a-n y+u y-u
      • Layers 1 to 3 find the best syllable sequence from the acoustic characteristics
      • The best path of Chinese characters is then found from the best syllable sequences
        • "tai wan" can map to 台灣 (Taiwan) or 太晚 (too late), two different meanings
        • "tai bei" can map to 台北 (Taipei) or 泰北 (northern Thailand), two different locations

  20. Recognizer evaluation
        Result("rec_freesyl.mlf", "syl.mlf", "syl.mod", "tri.rec" )
      • syl.mlf (reference)
          #!MLF!#
          "*/1.lab" sia tian .
          "*/2.lab" sia yu .
      • rec_freesyl.mlf (recognized)
          #!MLF!#
          "*/1.lab" sia yu yu .
          "*/2.lab" sia yu .
      • tri.rec (result)
          Aligned transcription:I:/…
          LAB: sia tian
          REC: sia yu yu
          Aligned transcription:I:/…
          LAB: sia yu
          REC: sia yu
          WORD: %Correct=50 [H=1, S=1, N=2]
          SYLL: %Corr=75, Acc=50 [H=3, D=0, S=1, I=1, N=4]
      • Acc = (H - I) / N = (3 - 1) / 4 = 50%
      • H = hits, D = deletions, S = substitutions, I = insertions, N = number of reference labels
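The syllable-level figures can be reproduced by aligning each recognized sequence with its reference and counting hits (H), deletions (D), substitutions (S) and insertions (I). The Python sketch below uses a plain edit-distance alignment with unit costs, which is an assumption made for simplicity; the scoring tool may weight the error types differently, although the counts for this example come out the same.

```python
def align_counts(ref, rec):
    """Return (H, D, S, I) for the minimum-cost alignment of rec to ref."""
    # Each cell holds (cost, hits, deletions, substitutions, insertions).
    rows, cols = len(ref) + 1, len(rec) + 1
    t = [[None] * cols for _ in range(rows)]
    t[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, rows):                      # only deletions
        c = t[i - 1][0]
        t[i][0] = (c[0] + 1, c[1], c[2] + 1, c[3], c[4])
    for j in range(1, cols):                      # only insertions
        c = t[0][j - 1]
        t[0][j] = (c[0] + 1, c[1], c[2], c[3], c[4] + 1)
    for i in range(1, rows):
        for j in range(1, cols):
            d, u, l = t[i - 1][j - 1], t[i - 1][j], t[i][j - 1]
            if ref[i - 1] == rec[j - 1]:
                diag = (d[0], d[1] + 1, d[2], d[3], d[4])      # hit
            else:
                diag = (d[0] + 1, d[1], d[2], d[3] + 1, d[4])  # substitution
            dele = (u[0] + 1, u[1], u[2] + 1, u[3], u[4])
            ins = (l[0] + 1, l[1], l[2], l[3], l[4] + 1)
            t[i][j] = min(diag, dele, ins, key=lambda cell: cell[0])
    return t[-1][-1][1:]


pairs = [(["sia", "tian"], ["sia", "yu", "yu"]), (["sia", "yu"], ["sia", "yu"])]
H = D = S = I = 0
for ref, rec in pairs:
    h, d, s, i = align_counts(ref, rec)
    H, D, S, I = H + h, D + d, S + s, I + i
N = sum(len(ref) for ref, _ in pairs)
corr = 100.0 * H / N          # percent correct
acc = 100.0 * (H - I) / N     # accuracy also penalizes insertions
print(H, D, S, I, N, corr, acc)
```

Accuracy is lower than percent correct here because the extra "yu" in the first utterance counts as an insertion.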

  21. Homework ('01 corpus)
      • CGU
        • (tri-phone, free-syllable net)
      • g1
        • Taiwanese: MDXXXX
        • Mandarin: TWXXXX
        • Male: XXM1XX
        • Female: XXM0XX
      • Due: in two weeks
      • Data: download
