
Chapter 3 Time Domain Analysis of Speech Signal


tyler


  1. Chapter 3 Time Domain Analysis of Speech Signal

  2. 3.1 Short-time windowing signal (1)
     • Three types of windows:
     • Rectangular window
       $h_r[n] = u[n] - u[n-N]$
       $H_r(e^{j\omega}) = \dfrac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$
     • General Hamming window
       $h_h[n] = (1-\alpha) - \alpha\cos(2\pi n/N), \quad 0 \le n < N$
       $h_h[n] = (1-\alpha)\,h_r[n] - \alpha\,h_r[n]\cos(2\pi n/N)$
       $H_h(e^{j\omega}) = (1-\alpha)\,H_r(e^{j\omega}) - \dfrac{\alpha}{2}\,H_r(e^{j(\omega-2\pi/N)}) - \dfrac{\alpha}{2}\,H_r(e^{j(\omega+2\pi/N)})$
     • $\alpha = 0.5$ gives the Hanning window; $\alpha = 0.46$ gives the Hamming window
     • The windowed signal is $x_w(n) = x(n)\,w(n)$
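The window definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck; the function names are hypothetical, and the formula follows the slide literally (division by N rather than N-1).

```python
import math

def rectangular_window(N):
    """h_r[n] = u[n] - u[n-N]: N samples equal to one."""
    return [1.0] * N

def generalized_hamming_window(N, alpha):
    """h_h[n] = (1 - alpha) - alpha*cos(2*pi*n/N) for 0 <= n < N."""
    return [(1.0 - alpha) - alpha * math.cos(2.0 * math.pi * n / N)
            for n in range(N)]

def apply_window(x, w):
    """x_w(n) = x(n) * w(n), sample by sample."""
    return [xi * wi for xi, wi in zip(x, w)]

hanning = generalized_hamming_window(8, 0.5)    # alpha = 0.5
hamming = generalized_hamming_window(8, 0.46)   # alpha = 0.46
```

With these definitions the Hanning window starts at 0 and the Hamming window at 0.08, and both reach 1 at n = N/2.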

  3. Short-time windowing signal (2)
     • $Q_n = \sum_{m=n-N+1}^{n} T[x(m)]\, w(n-m)$
     • This is another representation for analysis. Because the window length is limited, $Q_n$ is a sequence of local weighted averages of the sequence $T[x(m)]$.
     • $T[\,\cdot\,]$ is a linear or nonlinear transformation.
     • $Q_n$ describes the short-time properties of the speech signal.
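As a rough sketch of the general form above, the following Python function evaluates Q_n for one value of n. The name `short_time` is hypothetical, and treating samples outside the signal as zero is an assumption the slide does not state.

```python
def short_time(x, w, T, n):
    """Q_n = sum over m = n-N+1 .. n of T[x(m)] * w(n-m), with N = len(w).

    Assumption: samples with index outside x are treated as zero.
    """
    N = len(w)
    total = 0.0
    for m in range(n - N + 1, n + 1):
        if 0 <= m < len(x):
            total += T(x[m]) * w[n - m]
    return total

# T[x] = x^2 with a rectangular window of length 2 gives a local energy
q3 = short_time([1, 2, 3, 4], [1.0, 1.0], lambda v: v * v, 3)
```

Choosing T as squaring, absolute value, or a sign-difference operator leads to the energy, magnitude, and zero-crossing measures of section 3.2.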

  4. 3.2 Time domain parameters (1)
     • 3.2.1 Short-time energy and short-time average amplitude
     • $E_n = \sum_{m=n}^{n+N-1} x_w^2(m)$ (using a rectangular window; the summation runs from $n$ to $n+N-1$)
     • For a voiced segment (or frame) $E_n$ is large; for an unvoiced segment it is small
     • $E_n$ is too sensitive to large signal levels, because the samples are squared
     • $M_n = \dfrac{1}{N}\sum_{m=n}^{n+N-1} |x_w(m)|$
     • $M_n$ also describes the average intensity of the signal
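A minimal sketch of both measures for a rectangular window, so that x_w(m) = x(m) inside the frame. The function names are hypothetical, and the frame is assumed to lie entirely inside the signal.

```python
def short_time_energy(x, n, N):
    """E_n = sum of x(m)^2 for m = n .. n+N-1 (rectangular window)."""
    return sum(v * v for v in x[n:n + N])

def short_time_magnitude(x, n, N):
    """M_n = (1/N) * sum of |x(m)| for m = n .. n+N-1."""
    return sum(abs(v) for v in x[n:n + N]) / N

frame = [1, -2, 3, -4]
loud = [2 * v for v in frame]   # same frame at twice the amplitude
```

Doubling the amplitude multiplies E_n by four but M_n only by two, which is the sensitivity difference noted on the slide.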

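  5. Time domain parameters (2)
     • 3.2.2 Short-time average zero-crossing rate
     • $Z_n = \sum_{m=n}^{n+N-1} \left|\operatorname{sgn}[x_w(m)] - \operatorname{sgn}[x_w(m-1)]\right|$
     • where $\operatorname{sgn}(x) = 1$ for $x \ge 0$ and $\operatorname{sgn}(x) = -1$ for $x < 0$
     • $Z_n$ gives a rough estimate of the frequency of the signal
     • Multiple thresholds for zero-crossing:
     • $Z_n^i = \sum_{m=n}^{n+N-1} \left\{ \left|\operatorname{sgn}[x_w(m)-T_i] - \operatorname{sgn}[x_w(m-1)-T_i]\right| + \left|\operatorname{sgn}[x_w(m)+T_i] - \operatorname{sgn}[x_w(m-1)+T_i]\right| \right\}, \quad i = 1, 2, 3, \dots$
     • It has some ability to reject low-frequency interference. Random noise whose magnitude stays below $T_i$ does not contribute to $Z_n^i$.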
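The two counting formulas can be sketched directly. The function names are hypothetical; as written on the slide there is no factor of 1/2, so each sign change adds 2 to the sum. The frame must start at n >= 1 so that x(m-1) exists.

```python
def sgn(v):
    """sgn(x) = 1 for x >= 0, -1 for x < 0, as defined on the slide."""
    return 1 if v >= 0 else -1

def zero_crossing_count(x, n, N):
    """Z_n over m = n .. n+N-1; requires n >= 1. Each crossing adds 2."""
    return sum(abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(n, n + N))

def threshold_crossing_count(x, n, N, T):
    """Z_n^i for one threshold T: crossings of the levels +T and -T."""
    return sum(abs(sgn(x[m] - T) - sgn(x[m - 1] - T))
               + abs(sgn(x[m] + T) - sgn(x[m - 1] + T))
               for m in range(n, n + N))

alternating = [1, -1, 1, -1, 1]
```

For the alternating sequence, a threshold larger than the signal amplitude yields a count of zero, which illustrates why small-amplitude noise does not contribute.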

  6. Time domain parameters (3)
     • 3.2.3 Short-time auto-correlation function
     • $R_w(k) = \sum_{m=0}^{N-k-1} x_w(m)\, x_w(m+k)$
     • $R_w(k) = R_w(-k) = \sum_{m=k}^{N-1} x_w(m)\, x_w(m-k)$
     • $R_w(k) = 0$ for $k < -N+1$ or $k > N-1$
     • $R_w(0) = \sum_{m=0}^{N-1} x_w^2(m) \ge R_w(k)$
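A small sketch of the short-time auto-correlation of one windowed frame, using the symmetry and support properties listed above. The function name is hypothetical.

```python
def autocorr(xw, k):
    """R_w(k) = sum over m = 0 .. N-|k|-1 of xw(m) * xw(m+|k|).

    Uses R_w(k) = R_w(-k) and returns 0 for |k| > N-1.
    """
    N = len(xw)
    k = abs(k)
    if k > N - 1:
        return 0.0
    return sum(xw[m] * xw[m + k] for m in range(N - k))

xw = [1, 2, 3]
lags = [autocorr(xw, k) for k in range(-3, 4)]
```

For this frame the values are symmetric around k = 0, where the maximum (the frame energy, 14) is reached.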

  7. Time domain parameters (4)
     • 3.2.4 Short-time frequency spectrum and power spectrum
     • $X_w(e^{j\omega}) = \sum_{n=0}^{N-1} x_w(n)\, e^{-j\omega n}$ is the short-time frequency spectrum
     • $|X_w(e^{j\omega})|^2$ is called the short-time power spectrum density
     • $|X_w(e^{j\omega})|^2 = \sum_{n=-N+1}^{N-1} R_w(n)\, e^{-j\omega n}$
     • The short-time auto-correlation function and the short-time power spectrum form an important pair of parameters
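The relation between the power spectrum and the auto-correlation can be checked numerically. The sketch below evaluates both sides at a single frequency; the function names are hypothetical and the auto-correlation helper simply restates the definition from 3.2.3.

```python
import cmath

def autocorr(xw, k):
    """R_w(k) as in 3.2.3, symmetric in k and zero for |k| > N-1."""
    N = len(xw)
    k = abs(k)
    if k > N - 1:
        return 0.0
    return sum(xw[m] * xw[m + k] for m in range(N - k))

def power_spectrum_direct(xw, omega):
    """|X_w(e^{j omega})|^2 computed from the frame itself."""
    X = sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(xw))
    return abs(X) ** 2

def power_spectrum_from_autocorr(xw, omega):
    """Sum over n = -N+1 .. N-1 of R_w(n) * e^{-j omega n}."""
    N = len(xw)
    S = sum(autocorr(xw, n) * cmath.exp(-1j * omega * n)
            for n in range(-N + 1, N))
    return S.real  # the imaginary part cancels because R_w is symmetric
```

Both functions should agree up to rounding error for any frame and any frequency, for example for `[1.0, 2.0, 3.0]` at omega = 0.7.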

  8. Time domain parameters (5)
     • 3.2.5 Short-time Average Magnitude Difference Function (AMDF)
     • $r_w(k) = \sum_{m=0}^{N-k-1} |x_w(m+k) - x_w(m)|$
     • The AMDF is implemented with subtraction, addition, and absolute-value operations, in contrast to the addition and multiplication operations needed for the auto-correlation function.
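A minimal sketch of the AMDF for one frame, with a hypothetical function name. For a periodic signal the function drops to zero at lags equal to the period.

```python
def amdf(xw, k):
    """r_w(k) = sum over m = 0 .. N-k-1 of |xw(m+k) - xw(m)|, for k >= 0."""
    N = len(xw)
    return sum(abs(xw[m + k] - xw[m]) for m in range(N - k))

# A frame with period 4 samples
periodic = [0, 1, 0, -1] * 4
```

Here `amdf(periodic, 4)` is zero while `amdf(periodic, 2)` is large, since a shift of half a period puts the signal in antiphase with itself.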

  9. 3.3 S/U/V detection
     • S (silence), U (unvoiced) and V (voiced) are the three basic speech states
     • S, U and V are random; their parameters have different distributions (each close to a normal distribution)
     • For voiced: M is max, Z is min (20/160)
     • For unvoiced: Z is max (70/160), M is mid
     • For silence: M is min, Z is mid

  10. 3.4 Endpoint detection
     • 3.4.1 Double-threshold beginning detection
     • Set two thresholds $T_h$ and $T_l$ on $E_n$ or $M_n$ to find the true starting and ending points; for unvoiced speech, $Z_n$ is used to distinguish the starting point from silence
     • 3.4.2 Multi-threshold zero-crossing beginning detection
     • Set $T_1 < T_2 < T_3$; for every frame find $Z_1, Z_2, Z_3$ and $Z = W_1 Z_1 + W_2 Z_2 + W_3 Z_3$
     • If $Z > Z_0$ the frame is voiced, otherwise unvoiced
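The slide does not spell out how the two thresholds are combined. One common reading, assumed here, is to locate the frames that exceed the high threshold and then extend the segment outward while the energy stays above the low threshold. The function name and the sample values are hypothetical, and the zero-crossing refinement for unvoiced onsets is left out.

```python
def double_threshold_endpoints(E, T_h, T_l):
    """Return (start, end) frame indices, or None if no frame exceeds T_h.

    Assumed procedure: anchor on frames with E > T_h, then extend
    backward and forward while neighbouring frames have E > T_l.
    """
    above = [i for i, e in enumerate(E) if e > T_h]
    if not above:
        return None
    start, end = above[0], above[-1]
    while start > 0 and E[start - 1] > T_l:
        start -= 1
    while end < len(E) - 1 and E[end + 1] > T_l:
        end += 1
    return start, end

energies = [0.1, 0.2, 1.5, 5.0, 6.0, 4.0, 1.2, 0.3, 0.1]
segment = double_threshold_endpoints(energies, T_h=3.0, T_l=1.0)
```

In this example the frames above T_h are 3 to 5, and the low threshold widens the segment to frames 2 to 6.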

  11. 3.5 Pitch period (Tp) estimation (1)
     • 3.5.1 Preprocessing
     • 1. Center clipping:
       $y(n) = C[x(n)] = \begin{cases} x(n) - C_L, & x(n) > C_L \\ 0, & |x(n)| \le C_L \\ x(n) + C_L, & x(n) < -C_L \end{cases}$
     • 2. Low-pass filter (900 Hz) with linear phase
     • 3. Three-level clipping:
       $y'(n) = C'[y(n)] = \begin{cases} 1, & y(n) > 0 \\ 0, & y(n) = 0 \\ -1, & y(n) < 0 \end{cases}$
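The two clipping operations can be sketched as follows; the function names are hypothetical and the low-pass filter of step 2 is not shown.

```python
def center_clip(x, CL):
    """y(n) = x(n)-CL if x(n) > CL; x(n)+CL if x(n) < -CL; else 0."""
    out = []
    for v in x:
        if v > CL:
            out.append(v - CL)
        elif v < -CL:
            out.append(v + CL)
        else:
            out.append(0.0)
    return out

def three_level_clip(y):
    """y'(n) = 1, 0 or -1 according to the sign of y(n)."""
    return [1 if v > 0 else (-1 if v < 0 else 0) for v in y]

clipped = center_clip([3.0, 1.0, -0.5, -4.0], 1.0)
levels = three_level_clip(clipped)
```

Only the samples whose magnitude exceeds C_L survive, so the clipped signal keeps the strong pitch peaks and discards the lower-amplitude detail between them.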

  12. Pitch period (Tp) estimation (2)
     • 3.5.2 Pitch detection by auto-correlation function
     • 1. Apply 900 Hz low-pass filtering and discard the first 20 samples: $\{x(n)\} \rightarrow \{x'(n)\}$
     • 2. $C_L = 0.68 \max\{x'(n)\}$
     • 3. $y(n) = C[x'(n)]$ and $y'(n) = C'[y(n)]$ for $20 < n < 300$
     • 4. $R(k) = \sum_n y(n)\, y'(n+k)$, $k = 0, 20, 21, \dots, 150$
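The following sketch strings steps 2 to 4 together and applies the voiced/unvoiced decision given in steps 5 and 6 of this method (a peak below 0.25 R(0) means unvoiced). Several points are assumptions, not taken from the slides: the input is taken to be already low-pass filtered, the clipping level uses the maximum absolute value, the sum runs over all n for which n+k stays inside the frame, and the result is a lag in samples rather than a time. All function names are hypothetical.

```python
def center_clip(x, CL):
    """Center clipping C[.] from 3.5.1."""
    return [v - CL if v > CL else (v + CL if v < -CL else 0.0) for v in x]

def three_level_clip(y):
    """Three-level clipping C'[.] from 3.5.1."""
    return [1 if v > 0 else (-1 if v < 0 else 0) for v in y]

def pitch_by_autocorr(xf, k_min=20, k_max=150,
                      clip_ratio=0.68, voiced_ratio=0.25):
    """Return the pitch lag in samples, or 0 for an unvoiced frame.

    xf is assumed to be the low-pass filtered frame x'(n).
    """
    CL = clip_ratio * max(abs(v) for v in xf)   # assumption: max of |x'|
    y = center_clip(xf, CL)
    y3 = three_level_clip(y)

    def R(k):
        # assumption: sum over every n with n+k inside the frame
        return sum(y[n] * y3[n + k] for n in range(len(xf) - k))

    R0 = R(0)
    lags = range(k_min, min(k_max, len(xf) - 1) + 1)
    if R0 == 0 or not lags:
        return 0
    best = max(lags, key=R)           # first lag with the largest R(k)
    if R(best) < voiced_ratio * R0:
        return 0                      # unvoiced
    return best

# Pulse train with a period of 50 samples
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(300)]
```

For the pulse train the detected lag is 50 samples; multiplying by the sampling period would convert it to seconds.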

  13. Pitch period (Tp) estimation (3)
     • 5. $R_{\max} = \max\{R(20), \dots, R(150)\}$
     • 6. If $R_{\max} < 0.25\,R(0)$ then $T_p = 0$ (unvoiced); otherwise $T_p = \left[\arg\max_{20<k<150} R(k)\right] \times T$ (voiced), where $T$ is presumably the sampling period
     • 3.5.3 Pitch detection by average magnitude difference
     • 1. Same as above: 900 Hz low-pass filtering
     • 2. $r(k) = \dfrac{1}{140}\sum_m |x'(n+m) - x'(n+m-k)|$, $k = 21, 22, \dots, 140$

  14. Pitch period (Tp) estimation (3)
     • 3. $T_p' = \arg\min_k r(k)$, $r_{\min} = \min_k r(k)$
     • 4. If $r_{\min}/M > a_1$ then $T_p = 0$ (unvoiced); if $r_{\min}/M < a_1$ the frame is voiced, where $M = \sum_n |x'(n)|/280$
     • 5. If $r(T_p'/2)/M < a_2$ then $T_p = T_p'/2$; else if $r(T_p'/3)/M < a_2$ then $T_p = T_p'/3$
     • The thresholds $a_i$ are determined by experimental statistics
     • If there are $I$ frames and $p_i$ is the correct pitch estimate of frame $i$, then $a_2 = \min_i r_i(p_i)/M_i$
     • $a_2 < a_1$ (the threshold for unvoiced)
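A sketch of the AMDF-based method under stated assumptions: the frame holds 280 filtered samples, r(k) averages over the last 140 of them so that every lag up to 140 stays inside the frame, the unvoiced test compares r_min/M with a1, and the half and third period checks are applied only when the shorter lag is still at least 21. The function name and the threshold values used in the example are hypothetical; the slides say only that a1 and a2 come from experimental statistics.

```python
def pitch_by_amdf(xf, a1, a2, k_min=21, k_max=140, span=140):
    """Return the pitch lag in samples, or 0 for an unvoiced frame."""
    if len(xf) < k_max + span:
        raise ValueError("frame too short for the requested lags")

    def r(k):
        # average |x'(n) - x'(n-k)| over `span` samples (assumed range)
        return sum(abs(xf[n] - xf[n - k])
                   for n in range(k_max, k_max + span)) / span

    M = sum(abs(v) for v in xf) / len(xf)   # mean magnitude of the frame
    if M == 0:
        return 0
    tp = min(range(k_min, k_max + 1), key=r)  # first lag with smallest r(k)
    if r(tp) / M > a1:
        return 0                              # unvoiced
    for divisor in (2, 3):                    # guard against period multiples
        shorter = tp // divisor
        if shorter >= k_min and r(shorter) / M < a2:
            return shorter
    return tp

# Sawtooth with a period of 40 samples, 280 samples long
sawtooth = [n % 40 for n in range(280)]
lag = pitch_by_amdf(sawtooth, a1=0.5, a2=0.1)
```

The half and third period checks matter when rounding noise makes a multiple of the true period the global minimum of r(k).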

  15. Pitch period (Tp) estimation (4)
     • 3.5.4 Post-processing of pitch detection
     • Smoothing by median filtering: $y(n) = \operatorname{median}\{x(m) : n-L \le m \le n+L\}$
     • Linear smoothing: $y(n) = \sum_{m=-L}^{L} x(n-m)\, w(m)$, with $\sum_{m=-L}^{L} w(m) = 1$
     • Smoothing by dynamic programming: the pitch sequence $p_1, p_2, \dots, p_N$ is to be smoothed
     • Define the cost function ($B > 0$): $C(i,j) = \left|\dfrac{P_i - P_j}{i - j}\right| - B$ for $i \ne j$, and $C(i,j) = -B$ for $i = j$
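The median and linear smoothers can be sketched as below. The function names are hypothetical, and the edge handling (a truncated window for the median, repeated end values for the linear filter) is an assumption, since the slide does not say what happens at the ends of the pitch track.

```python
import statistics

def median_smooth(p, L):
    """y(n) = median of p[n-L .. n+L]; the window is truncated at the edges."""
    return [statistics.median(p[max(0, n - L):n + L + 1])
            for n in range(len(p))]

def linear_smooth(p, w):
    """y(n) = sum over m = -L .. L of p(n-m) * w(m), with len(w) = 2L+1.

    w is expected to sum to 1; indices beyond the ends reuse the end value.
    """
    L = len(w) // 2
    last = len(p) - 1
    out = []
    for n in range(len(p)):
        acc = 0.0
        for m in range(-L, L + 1):
            idx = min(max(n - m, 0), last)
            acc += p[idx] * w[m + L]
        out.append(acc)
    return out

track = [100, 100, 200, 100, 100]          # one isolated pitch error
by_median = median_smooth(track, 1)
by_linear = linear_smooth(track, [0.25, 0.5, 0.25])
```

The median filter removes the isolated error entirely, while the linear filter only spreads it over the neighbouring frames.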

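  16. Pitch period (Tp) estimation (5)
     • $D(i)$ is the cost of the $i$-th step. The tracking steps are:
     • 1. Set $i = 1$ and $D(j) = 0$ for $j = 1, \dots, N$
     • 2. Calculate $C(i,j)$ for $j = 1, \dots, i$
     • 3. $d(i,j) = D(j) + C(i,j)$ for $j = 1, \dots, i$
     • 4. Find the optimal path: $D(i) = \min_{1 \le j \le i} d(i,j)$, $J(i) = \arg\min_{1 \le j \le i} d(i,j)$
     • 5. If $i < N$, increment $i$ and go to step 2
     • 6. Smoothed result: $P_i = P_{J(i)}$, $i = 1, \dots, N$
     • $C(i,j)$ is the cost of replacing $P_i$ with $P_j$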
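The six steps can be transcribed almost literally. The sketch below is one reading of the slide, with a hypothetical function name and two assumptions: d(i,i) uses the initial value D(i) = 0, and the replacement in step 6 reads from the original, unsmoothed sequence. Ties are resolved in favour of the smallest j.

```python
def dp_smooth(P, B):
    """Smooth a pitch track with the cost C(i,j) = |(P_i-P_j)/(i-j)| - B."""
    N = len(P)
    D = [0.0] * N            # step 1: D(j) = 0 for all j
    J = [0] * N
    for i in range(N):       # steps 2 to 5, with 0-based indices
        best_j, best_d = None, None
        for j in range(i + 1):
            if i == j:
                C = -B
            else:
                C = abs((P[i] - P[j]) / (i - j)) - B
            d = D[j] + C     # step 3
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        D[i] = best_d        # step 4
        J[i] = best_j
    return [P[J[i]] for i in range(N)]   # step 6, from the original values

track = [100, 100, 100, 200, 100, 100]
smoothed = dp_smooth(track, B=50)
```

In this example B = 50 replaces the outlier at 200 with 100, whereas B = 10 leaves it in place, so in this reading B sets how large a pitch jump is tolerated before a value is replaced.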
