Outline

zayit

Presentation Transcript


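  1. Outline • Isolated-word recognition: process and data-flow diagram • Isolated-word recognition experiment demo • Displaying speech data waveforms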

  2. Introduction to HTK • The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research. • HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED), where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK, and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999, when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. • HTK is distributed in C source form.

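  3. [Figure: data-flow diagram of the isolated-word recognition process. Nodes: keyword set, .wav, word-level transcription, dictionary, Train.list, Test.list, HLEd, HCopy, HMM list, phone-level transcription, Train.fea, Test.fea, HMM prototype, HCompV, hmm.init, HHEd, HMM1, HERest ×5, HMM6, HVite, Result, word network, HParse, grammar rules]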

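  6. .wav • Each file contains the pronunciation of a single keyword. • Speech from multiple speakers is required. Each speaker's recordings must cover every keyword in the set, and each word is repeated several times. • For example, 30 speakers each read the digits 0 to 9 four times, and every digit utterance is saved as a separate file.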

  7. Word-level transcription (an HTK master label file, MLF)
    #!MLF!#
    "*/10001-0a.lab"
    sil
    零
    sil
    .
    "*/10001-1a.lab"
    sil
    一
    sil
    .
    "*/10001-2a.lab"
    sil
    二
    sil
    .
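A word-level MLF like the one above is usually generated from the file names rather than typed by hand. The minimal sketch below assumes utterance IDs of the form `<speaker>-<digit><repetition>` (as in `10001-0a`). It maps the digit to the Chinese word label. The helper name `word_mlf` and the `DIGIT_WORDS` table are hypothetical.

```python
# Hypothetical helper: rebuild a word-level MLF from utterance IDs.
# Assumes IDs of the form <speaker>-<digit><repetition>, e.g. "10001-0a".
DIGIT_WORDS = "零一二三四五六七八九"  # index = spoken digit


def word_mlf(utt_ids):
    """Return MLF text where every utterance is labelled: sil <digit word> sil."""
    lines = ["#!MLF!#"]
    for utt in utt_ids:
        digit = int(utt.split("-")[1][0])  # first character after the dash
        lines.append(f'"*/{utt}.lab"')
        lines.extend(["sil", DIGIT_WORDS[digit], "sil", "."])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(word_mlf(["10001-0a", "10001-1a"]), end="")
```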

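  8. Dictionary (two variants side by side: each word mapped to a whole-word HMM, and each word mapped to a phone sequence)
    Whole-word dictionary    Phone-level dictionary
    八     ba                八     b a
    二     er                二     er
    九     jiu               九     j iu
    零     ling              零     l ing
    六     liu               六     l iu
    七     qi                七     q i
    三     san               三     s an
    四     si                四     s i
    sil    sil               sil    sil
    五     wu                五     w u
    一     yi                一     y i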

  9. Train.list / Test.list
    Train.list    Test.list
    10001-0a      10022-0a
    10001-0b      10022-0b
    10001-0c      10022-0c
    10001-0d      10022-0d
    10001-1a      10022-1a
    10001-1b      10022-1b
    10001-1c      10022-1c
    10001-1d      10022-1d
    10001-2a      10022-2a
    10001-2b      10022-2b
    10001-2c      10022-2c
    10001-2d      10022-2d
    10001-3a      10022-3a
    ....          ......
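The two lists appear to split the data by speaker (10001 for training, 10022 for testing), so no test speaker is seen during training. The sketch below is one way to produce such lists. It assumes the ordering shown above (speaker, then digit, then repetition a to d). The function names are hypothetical, and the speaker IDs in the usage example are purely illustrative.

```python
# Hypothetical helpers for producing Train.list / Test.list.
# Assumes 4 repetitions (a-d) of each digit per speaker, as in the example lists.
def utterance_ids(speakers, digits=range(10), reps="abcd"):
    """IDs ordered speaker -> digit -> repetition, e.g. 10001-0a, 10001-0b, ..."""
    return [f"{spk}-{d}{r}" for spk in speakers for d in digits for r in reps]


def write_list(path, ids):
    """Write one utterance ID per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(ids) + "\n")


if __name__ == "__main__":
    train = utterance_ids(["10001"])  # training speakers (illustrative)
    test = utterance_ids(["10022"])   # held-out speakers (illustrative)
    print(train[:5], len(train), len(test))
```

With 30 speakers, `utterance_ids` yields 30 × 10 × 4 = 1200 IDs, one per recorded file.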

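  10. HLEd • HLEd -n <HMM list file> -d <dictionary file> -l * -i <phone-level transcription> <HLEd script> <word-level transcription file> • HLEd can do much more, but here it is used only for two things. It converts the word-level transcription into a phone-level (model-level) transcription using the dictionary, and it collects the set of HMMs that occur in the current corpus.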
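The dictionary expansion HLEd performs here is typically requested with an `EX` command in the HLEd script. The sketch below imitates only that step in plain Python, under simplifying assumptions:
- only the first pronunciation of each word is used;
- dictionary extras such as output symbols in brackets are ignored.

`parse_dict` and `expand_mlf` are hypothetical names. This is not HLEd's implementation.

```python
# Sketch of dictionary expansion of an MLF (word labels -> model labels).
# Simplified: first pronunciation only; no [output symbol] or probability fields.
def parse_dict(lines):
    """Parse 'WORD MODEL...' lines into {word: [model, ...]}."""
    d = {}
    for line in lines:
        fields = line.split()
        if fields:
            d.setdefault(fields[0], fields[1:])  # keep the first pronunciation
    return d


def expand_mlf(mlf_text, dictionary):
    """Replace word labels by their pronunciations; also return the sorted model set."""
    out, models = [], set()
    for line in mlf_text.splitlines():
        s = line.strip()
        if not s:
            continue
        if s in ("#!MLF!#", ".") or s.startswith('"'):
            out.append(s)  # header, label-file names and terminators pass through
            continue
        pron = dictionary[s]  # KeyError if a word is missing from the dictionary
        out.extend(pron)
        models.update(pron)
    return "\n".join(out) + "\n", sorted(models)


if __name__ == "__main__":
    d = parse_dict(["零 ling", "一 yi", "sil sil"])
    new_mlf, model_list = expand_mlf('#!MLF!#\n"*/10001-0a.lab"\nsil\n零\nsil\n.\n', d)
    print(new_mlf, model_list)
```

The collected model set contains every label that appears after expansion, so `sil` is included alongside the digit models.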

  11. HMM list • The set of HMMs that occur in the corpus. Because the recordings cover the entire keyword set, this set is the same as the keyword set. • The list is: ling yi er san si wu liu qi ba jiu

  12. Phone-level transcription
    #!MLF!#
    "*/10001-0a.lab"
    sil
    ling
    sil
    .
    "*/10001-1a.lab"
    sil
    yi
    sil
    .
    "*/10001-2a.lab"
    sil
    er
    sil
    .

  13. HCopy • HCopy -C mfcc39.cfg -S Output\wav2fea.scp • mfcc39.cfg contains:
    SOURCEKIND = WAVEFORM
    SOURCEFORMAT = WAV
    TARGETKIND = MFCC_E_D_A_Z
    TARGETRATE = 100000.0    # one vector every 10 ms
    WINDOWSIZE = 200000.0    # 20 ms analysis window
    PREEMCOEF = 0.975
    NUMCHANS = 26            # 26 filterbank channels for the Mel spectrum
    CEPLIFTER = 22
    NUMCEPS = 12             # 12 Mel cepstral coefficients
    USEHAMMING = T
    DELTAWINDOW = 2
    ACCWINDOW = 2
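The script file passed with `-S` gives HCopy one "source target" pair per line. HTK times such as TARGETRATE and WINDOWSIZE are in units of 100 ns. The sketch below builds the script lines and converts those units. The directory names `wav`/`fea` and the `.mfc` extension are assumptions, not taken from the slides.

```python
# Hypothetical helpers: lines of an HCopy -S script, and HTK time-unit conversion.
def hcopy_scp(utt_ids, wav_dir="wav", fea_dir="fea", fea_ext=".mfc"):
    """One 'source target' pair per line (directory names are assumptions)."""
    return [f"{wav_dir}/{u}.wav {fea_dir}/{u}{fea_ext}" for u in utt_ids]


def htk_units_to_ms(value):
    """HTK expresses times in units of 100 ns, so 10,000 units = 1 ms."""
    return value / 10_000


if __name__ == "__main__":
    print("\n".join(hcopy_scp(["10001-0a", "10001-0b"])))
    print(htk_units_to_ms(100000.0), htk_units_to_ms(200000.0))  # 10.0 20.0
```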

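  14. Train.fea Test.fea • These files contain MFCCs (Mel-frequency cepstral coefficients). • Each frame is a 39-dimensional vector: (12 Mel cepstral coefficients + 1 energy term) × 3, i.e. static values plus their first (delta) and second (acceleration) derivatives. • In the MFCC computation, the triangular filtering step imitates the auditory characteristics of the human ear. The FFT spectrum is passed through a bank of twenty-odd triangular filters (the number is not fixed; 26 here), which gives the spectrum on the Mel scale.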
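The Mel filterbank and cepstrum steps described above can be sketched as follows. The code uses the Mel formula 1127·ln(1 + f/700) and an HTK-style DCT of the log filterbank energies. Assumptions:
- 16 kHz audio, so the upper band edge is 8000 Hz;
- triangles are defined in Hz between Mel-spaced edges, which may differ in detail from HTK's own filter construction;
- liftering (CEPLIFTER) and energy are left out.

All function names are hypothetical.

```python
import math

# Feature dimension from the slide: (12 cepstra + 1 energy) x (static, delta, acceleration)
FEATURE_DIM = (12 + 1) * 3


def hz_to_mel(f):
    """Mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def filter_edges(num_chans=26, lo_hz=0.0, hi_hz=8000.0):
    """num_chans + 2 edge frequencies (Hz), equally spaced on the Mel scale.

    Filter k spans edges[k] .. edges[k + 2] and peaks at edges[k + 1].
    hi_hz = 8000 assumes 16 kHz audio (Nyquist frequency).
    """
    lo, hi = hz_to_mel(lo_hz), hz_to_mel(hi_hz)
    step = (hi - lo) / (num_chans + 1)
    return [mel_to_hz(lo + step * i) for i in range(num_chans + 2)]


def triangle(f, left, centre, right):
    """Weight of one triangular filter at frequency f (Hz)."""
    if f <= left or f >= right:
        return 0.0
    if f <= centre:
        return (f - left) / (centre - left)
    return (right - f) / (right - centre)


def cepstra(log_energies, num_ceps=12):
    """DCT of log filterbank energies -> c1..c_num_ceps, scaled by sqrt(2/N)."""
    n_chan = len(log_energies)
    return [
        math.sqrt(2.0 / n_chan)
        * sum(m * math.cos(math.pi * n * (j + 0.5) / n_chan) for j, m in enumerate(log_energies))
        for n in range(1, num_ceps + 1)
    ]
```

A flat log spectrum gives cepstra that are all close to zero, because the cosine terms cancel for every n from 1 to 12.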

  15. HMM prototype
    ~o <VecSize> 39 <MFCC_E_D_A_Z>
    ~h "proto"
    <BeginHMM>
    <NUMSTATES> 5
    <STATE> 2
    <MEAN> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <VARIANCE> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    ……
    <TRANSP> 5
      0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
      0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00 0.000000e+00
      0.000000e+00 0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00
      0.000000e+00 0.000000e+00 0.000000e+00 6.000000e-01 4.000000e-01
      0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
    <ENDHMM>
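A prototype like this can be generated rather than written by hand. The sketch below emits the same layout:
- zero means and unit variances in each emitting state (states 2 to N-1);
- a left-to-right transition matrix with self-loop 0.6 and forward probability 0.4;
- non-emitting entry and exit states.

The function name `make_proto` and its defaults are hypothetical. The keyword spellings are copied from the prototype above.

```python
# Hypothetical generator for a flat-start HMM prototype in the layout shown above.
def make_proto(name="proto", vec_size=39, num_states=5, kind="MFCC_E_D_A_Z", self_loop=0.6):
    lines = [f"~o <VecSize> {vec_size} <{kind}>", f'~h "{name}"', "<BeginHMM>",
             f"<NUMSTATES> {num_states}"]
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    for s in range(2, num_states):  # emitting states 2 .. N-1
        lines += [f"<STATE> {s}", f"<MEAN> {vec_size}", zeros, f"<VARIANCE> {vec_size}", ones]
    lines.append(f"<TRANSP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                 # entry state always moves to the first emitting state
        elif i < num_states - 1:
            row[i] = self_loop           # stay in the same state
            row[i + 1] = 1.0 - self_loop # move one state to the right
        lines.append(" ".join(f"{p:e}" for p in row))  # exit state row stays all zero
    lines.append("<ENDHMM>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_proto(), end="")
```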

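  16. HCompV • HCompV is the first step of flat-start initialization. It reads all of train.fea and computes the global mean and variance of the training data. These values then initialize the prototype HMM, so every state starts with the same Gaussian.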
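The global statistics behind a flat start are just per-dimension means and variances pooled over every frame of every training file. The minimal sketch below computes them in pure Python. `global_mean_var` is a hypothetical name, and this is not HCompV's code. It uses a single pass of sums and sums of squares, which is simple but less numerically robust than Welford's method.

```python
# Sketch: pooled per-dimension mean and variance for a flat start.
def global_mean_var(files):
    """`files` is a list of utterances, each a list of equal-length feature vectors.

    Returns (mean, variance) per dimension; variance is the biased (divide-by-N) estimate.
    """
    n, sums, sqs = 0, None, None
    for frames in files:
        for x in frames:
            if sums is None:
                sums, sqs = [0.0] * len(x), [0.0] * len(x)
            n += 1
            for i, v in enumerate(x):
                sums[i] += v
                sqs[i] += v * v
    mean = [s / n for s in sums]
    var = [q / n - m * m for q, m in zip(sqs, mean)]
    return mean, var


if __name__ == "__main__":
    print(global_mean_var([[[1.0, 2.0], [3.0, 2.0]], [[5.0, 2.0]]]))
```

In a flat start, these two vectors would be copied into every emitting state of every model.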

  17. hmm.init
    <BEGINHMM>
    <NUMSTATES> 5
    <STATE> 2
    <MEAN> 39
      -3.452563e-008 1.197302e-008 1.949034e-008 1.896097e-009 2.614947e-009 1.110050e-008 5.698710e-009 3.141062e-009 4.583971e-009 -5.636201e-009 -5.568483e-009 -1.328831e-008 4.742789e-001
      1.069706e-002 1.051066e-004 -1.669424e-003 -5.334170e-004 -3.474722e-003 -1.687546e-003 -2.144248e-003 -4.614734e-003 -4.277431e-003 -3.187123e-003 -1.875205e-003 -3.291089e-003 -4.547900e-004
      7.004798e-004 1.116028e-004 -1.051262e-004 -7.312075e-005 -3.629404e-004 6.134512e-004 3.128700e-004 8.426207e-004 1.094497e-004 8.713897e-004 9.952679e-004 3.674021e-004 3.289685e-005
    <VARIANCE> 39
      3.121075e+001 2.072707e+001 3.332254e+001 4.022979e+001 3.308155e+001 2.775239e+001 3.611903e+001 2.532878e+001 2.914802e+001 2.716696e+001 2.552222e+001 2.256501e+001 8.785716e-002
      1.521960e+000 8.724220e-001 1.098581e+000 1.433422e+000 1.527411e+000 1.644578e+000 1.744107e+000 1.755233e+000 1.794381e+000 1.701270e+000 1.600709e+000 1.405881e+000 9.507918e-004
      2.811392e-001 1.511097e-001 1.878311e-001 2.512953e-001 2.779487e-001 3.066817e-001 3.242885e-001 3.386956e-001 3.427569e-001 3.260452e-001 3.065307e-001 2.679282e-001 1.465851e-004
    <GCONST> 8.293417e+001
    ……
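The `<GCONST>` value is a precomputed normalizing constant for a diagonal Gaussian: n·ln(2π) + Σ ln σᵢ². With it, a log density only needs the Mahalanobis term at run time. The sketch below shows the formula. `gconst` and `log_gaussian` are hypothetical names.

```python
import math


def gconst(variances):
    """Normalizing constant of a diagonal Gaussian: n*ln(2*pi) + sum_i ln(var_i)."""
    return len(variances) * math.log(2 * math.pi) + sum(math.log(v) for v in variances)


def log_gaussian(x, mean, variances):
    """Log density of x: -0.5 * (gconst + sum_i (x_i - mean_i)^2 / var_i)."""
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    return -0.5 * (gconst(variances) + maha)
```

Applied to the 39 variances of hmm.init, `gconst` comes out at about 82.934, in agreement with the `<GCONST> 8.293417e+001` line.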

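  18. HHEd • HHEd -H HMM1 mxup.scp <HMM list file> • mxup.scp is the HHEd edit script. It contains a single command, MU 4 {*.state[2-4].mix}, which splits the output distribution of states 2 to 4 of every HMM into 4 mixture components. • HHEd can also tie HMM parameters (for example states or transition matrices, transP) and cluster HMMs, merging identical items.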
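The HTK Book describes mixture splitting (MU) as follows:
1. repeatedly pick the heaviest component;
2. copy it and halve both weights;
3. perturb the two means by plus and minus 0.2 standard deviations.

The sketch below imitates that idea for one state. `split_mixtures` is a hypothetical name, and the order in which the new components are stored is an assumption.

```python
import math


def split_mixtures(mixes, target):
    """Grow a list of (weight, mean, variance) components to `target` components.

    Repeatedly splits the heaviest component (the first one on ties): both copies
    get half the weight and their means move +/- 0.2 standard deviations.
    """
    mixes = [(w, list(m), list(v)) for w, m, v in mixes]
    while len(mixes) < target:
        k = max(range(len(mixes)), key=lambda i: mixes[i][0])  # heaviest component
        w, mean, var = mixes[k]
        delta = [0.2 * math.sqrt(x) for x in var]  # 0.2 standard deviations per dimension
        mixes[k] = (w / 2, [m + d for m, d in zip(mean, delta)], var)
        mixes.append((w / 2, [m - d for m, d in zip(mean, delta)], list(var)))
    return mixes


if __name__ == "__main__":
    for w, m, v in split_mixtures([(1.0, [0.0], [25.0])], 4):
        print(w, m, v)
```

Splitting one component into four this way leaves all weights at 0.25. The mean of the first component moves by 0.4σ. Applied to hmm.init's first dimension (variance 3.121075e+001), that gives about 2.234663, consistent with the first mean listed for mixture 1 in HMM1.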

  19. HMM1
    <VECSIZE> 39<NULLD><MFCC_E_D_A_Z><DIAGC>
    ~h "i"
    <BEGINHMM>
    <NUMSTATES> 5
    <STATE> 2
    <NUMMIXES> 4
    <MIXTURE> 1 2.500000e-001
    <MEAN> 39
      2.234663e+000 1.821080e+000 2.309027e+000 ……
    <VARIANCE> 39
      3.121075e+001 2.072707e+001 3.332254e+001 ……
    <GCONST> 8.293417e+001
    <MIXTURE> 2 2.500000e-001
    <MEAN> 39
      -9.006241e-008 3.079538e-008 -1.535603e-008 ……
    <VARIANCE> 39
      3.121075e+001 2.072707e+001 3.332254e+001 ……
    <GCONST> 8.293417e+001
    <MIXTURE> 3 2.500000e-001
    ……

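  20. HERest • HERest -H HMM1 -M HMM2 -I <phone-level transcription> -S train.fea <HMM list file> • The transcription passed with -I must contain HMM names (ling, yi, ...), which is the phone-level MLF produced by HLEd. • HERest looks up every HMM in the model list and concatenates the HMMs for each utterance in the order given by its transcription. It then trains them on the corresponding features in train.fea (embedded re-estimation). This updates the mean and variance of each HMM, together with the probabilities estimated during training (mixture weights and transition probabilities).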
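Embedded re-estimation works on one composite HMM per utterance, built by chaining the models named in its transcription (for example sil, ling, sil). The sketch below shows only the transition-matrix part of that chaining. It assumes:
- every model has non-emitting entry and exit states, as in the prototype;
- no model has a direct entry-to-exit ("tee") transition.

`concat_hmms` is a hypothetical name; this is not HERest's code.

```python
def concat_hmms(transps):
    """Chain left-to-right HMM transition matrices into one composite matrix.

    Each matrix is N x N with a non-emitting entry state (0) and exit state (N-1).
    The composite keeps one entry state, all emitting states in order, and one exit state.
    """
    emit = [len(t) - 2 for t in transps]
    total = sum(emit) + 2
    A = [[0.0] * total for _ in range(total)]
    offsets, o = [], 1
    for e in emit:
        offsets.append(o)
        o += e

    def g(k, i):  # global index of local emitting state i (1-based) of model k
        return offsets[k] + i - 1

    first = transps[0]
    for j in range(1, len(first) - 1):
        A[0][g(0, j)] = first[0][j]  # composite entry = first model's entry
    for k, t in enumerate(transps):
        n = len(t)
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                A[g(k, i)][g(k, j)] += t[i][j]  # transitions inside model k
            exit_p = t[i][n - 1]
            if k == len(transps) - 1:
                A[g(k, i)][total - 1] += exit_p  # last model leaves the composite
            else:
                nxt = transps[k + 1]  # otherwise jump into the next model's first states
                for j in range(1, len(nxt) - 1):
                    A[g(k, i)][g(k + 1, j)] += exit_p * nxt[0][j]
    return A
```

With three copies of the prototype's 5-state matrix, the result is an 11 × 11 matrix with 9 emitting states. The 0.4 exit probability of each model's last emitting state becomes a transition into the next model's first emitting state.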

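  21. HMM6 • Five passes of HERest turn the HMM set in the HMM1 folder into the new HMM set in the HMM6 folder. • In this example the five re-estimation passes do not change the HMM structure (number of states, streams, mixtures, transP, etc.), so the new HMMs differ from the old ones only in their parameter values.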
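The five passes are usually scripted rather than typed one by one. The sketch below builds the five HERest command lines, reading from HMM{i} and writing to HMM{i+1}, and optionally runs them. Assumptions:
- the MMF inside each folder is called `hmmdefs`;
- the phone-level MLF, the feature script and the HMM list are `phone.mlf`, `train.scp` and `hmmlist`.

All of these names are hypothetical. The flags -H, -M, -I and -S are the ones used on the HERest slide.

```python
import os
import subprocess


def herest_commands(first=1, passes=5, mlf="phone.mlf", scp="train.scp", hmmlist="hmmlist"):
    """Command lines for `passes` rounds of HERest, HMM{i} -> HMM{i+1}."""
    return [
        ["HERest", "-H", f"HMM{i}/hmmdefs", "-M", f"HMM{i + 1}", "-I", mlf, "-S", scp, hmmlist]
        for i in range(first, first + passes)
    ]


def run(cmds, dry_run=True):
    """Print the commands, or create each output folder and run HERest (requires HTK)."""
    for c in cmds:
        if dry_run:
            print(" ".join(c))
            continue
        os.makedirs(c[c.index("-M") + 1], exist_ok=True)  # output folder for this pass
        subprocess.run(c, check=True)


if __name__ == "__main__":
    run(herest_commands())
```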

  22. HVite • HVite is the HTK tool that recognizes the input test.fea with the trained HMM set. The command is: • HVite -H HMM6 -l * -S test.fea -i result_test.mlf -w <word network> <dictionary> <HMM list file>
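HTK's own scoring tool for comparing result_test.mlf with a reference transcription is HResults. As a rough illustration of what the comparison means for isolated words, the sketch below does the following:
1. reads both MLFs;
2. drops times, scores and `sil`;
3. counts an utterance as correct when the remaining labels match.

Assumptions: HVite output lines have the form "start end label score", and the reference and result use the same label set. `read_mlf` and `isolated_word_accuracy` are hypothetical names.

```python
def read_mlf(text):
    """Map each label file (name without directory or extension) to its list of labels."""
    result, current = {}, None
    for line in text.splitlines():
        s = line.strip()
        if not s or s == "#!MLF!#":
            continue
        if s.startswith('"'):
            name = s.strip('"').split("/")[-1].rsplit(".", 1)[0]  # "*/10022-0a.rec" -> "10022-0a"
            current = result.setdefault(name, [])
        elif s == ".":
            current = None
        else:
            fields = s.split()
            # recognizer output: "start end label score"; reference: just "label"
            current.append(fields[2] if len(fields) >= 3 else fields[0])
    return result


def isolated_word_accuracy(ref, hyp, ignore=("sil",)):
    """Fraction of reference utterances whose non-silence labels are recognized exactly."""
    correct = 0
    for utt, words in ref.items():
        r = [w for w in words if w not in ignore]
        h = [w for w in hyp.get(utt, []) if w not in ignore]
        correct += r == h
    return correct / len(ref) if ref else 0.0
```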

  23. Word network
    Fields: N = number of nodes, L = number of links, I = node id, W = word, J = link id, S = start node, E = end node
    VERSION=1.0
    N=15 L=23
    I=0 W=!NULL
    I=1 W=!NULL
    I=2 W=sil
    I=3 W=ling
    I=4 W=!NULL
    I=5 W=i
    I=6 W=er
    I=7 W=san
    I=8 W=si
    I=9 W=wu
    I=10 W=liou
    I=11 W=qi
    I=12 W=ba
    I=13 W=jiou
    I=14 W=sil
    J=0 S=14 E=1
    J=1 S=0 E=2
    J=2 S=2 E=3
    J=3 S=3 E=4
    J=4 S=5 E=4
    J=5 S=6 E=4
    J=6 S=7 E=4
    J=7 S=8 E=4
    J=8 S=9 E=4
    J=9 S=10 E=4
    J=10 S=11 E=4
    J=11 S=12 E=4
    J=12 S=13 E=4
    J=13 S=2 E=5
    J=14 S=2 E=6
    J=15 S=2 E=7
    J=16 S=2 E=8
    J=17 S=2 E=9
    J=18 S=2 E=10
    J=19 S=2 E=11
    J=20 S=2 E=12
    J=21 S=2 E=13
    J=22 S=4 E=14
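To check what a word network like this accepts, one can parse its node (I=) and link (J=) lines and enumerate every path from the start node to the end node, skipping !NULL nodes. The sketch below does that for small acyclic networks only. `parse_slf` and `word_sequences` are hypothetical names. Optional SLF fields such as probabilities are ignored.

```python
def parse_slf(text):
    """Return ({node_id: word}, [(start, end), ...]) from SLF node and link lines."""
    nodes, links = {}, []
    for line in text.splitlines():
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "I" in fields:
            nodes[int(fields["I"])] = fields.get("W", "!NULL")
        elif "J" in fields:
            links.append((int(fields["S"]), int(fields["E"])))
    return nodes, links


def word_sequences(nodes, links):
    """All word sequences from the node with no predecessor to nodes with no successor."""
    succ = {n: [] for n in nodes}
    has_pred = set()
    for s, e in links:
        succ[s].append(e)
        has_pred.add(e)
    start = next(n for n in sorted(nodes) if n not in has_pred)
    paths = []

    def walk(n, words):  # depth-first search; assumes the network has no cycles
        w = nodes[n]
        words = words + ([w] if w != "!NULL" else [])
        if not succ[n]:
            paths.append(tuple(words))
        for nxt in succ[n]:
            walk(nxt, words)

    walk(start, [])
    return paths
```

For the 15-node network above, this yields ten sequences of the form (sil, word, sil), one per digit.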

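  24. HParse • Converts the grammar rules from their EBNF-style notation into a word network (an HTK Standard Lattice Format file) that HVite loads with -w. Usage: HParse <grammar file> <word network file>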

  25. Grammar rules • The grammar that the recognized speech content must follow. Constraining the recognizer with it helps recognition. • Example: $syl = ( 零 | 一 | 二 | 三 | 四 | 五 | 六 | 七 | 八 | 九 ); ( sil $syl sil )
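The grammar ( sil $syl sil ) describes a simple network: silence, then any one digit, then silence. The sketch below writes an equivalent SLF network directly. It is not HParse's algorithm, and its numbering differs from HParse's output. The network in slide 23 has an extra !NULL node (15 nodes, 23 links), while this one has 14 nodes and 22 links for ten words. `grammar_to_slf` is a hypothetical name.

```python
def grammar_to_slf(words, sil="sil"):
    """SLF text for the grammar: $syl = ( w1 | w2 | ... ); ( sil $syl sil )."""
    k = len(words)
    labels = ["!NULL", sil] + list(words) + [sil, "!NULL"]  # start, sil, words, sil, end
    last_sil, end = k + 2, k + 3
    links = [(0, 1)]                                   # start -> first sil
    links += [(1, 2 + i) for i in range(k)]            # first sil -> each word
    links += [(2 + i, last_sil) for i in range(k)]     # each word -> second sil
    links.append((last_sil, end))                      # second sil -> end
    out = ["VERSION=1.0", f"N={len(labels)} L={len(links)}"]
    out += [f"I={i} W={w}" for i, w in enumerate(labels)]
    out += [f"J={j} S={s} E={e}" for j, (s, e) in enumerate(links)]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(grammar_to_slf(list("零一二三四五六七八九")), end="")
```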

More Related