
Speech Recognition Chapter 3


Presentation Transcript


  1. Speech Recognition, Chapter 3

  2. Speech Front-Ends • Linear Prediction Analysis • Linear-Prediction Based Processing • Cepstral Analysis • Auditory Signal Processing

  3. Linear Prediction Analysis • Introduction • Linear Prediction Model • Linear Prediction Coefficients Computation • Linear Prediction for Automatic Speech Recognition • Linear Prediction in Speech Processing • How Good is the LP Model?

  4. Signal Processing Front End • Converts the speech waveform into some type of parametric representation. • [Block diagram: the speech samples s_k are passed either to a filter-bank front end or to a linear-prediction front end, producing the observation sequence O = o(1) o(2) … o(T).]

  5. Introduction • Over short intervals, linear prediction provides a good model of the speech signal. • Mathematically precise and simple. • Easy to implement in software or hardware. • Works well for recognition applications. • It also has applications in formant and pitch estimation, speech coding and synthesis.

  6. Linear Prediction Model • Basic idea: each speech sample is approximated as a linear combination of the M previous samples. • The weights a_k of that combination are called the LP (Linear Prediction) coefficients. • By including the excitation signal, we obtain the complete model, where u(n) is the normalised excitation and G is the gain of the excitation (see the equations below).
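In standard form, with M the prediction order, the model described above is:

```latex
\hat{s}(n) = \sum_{k=1}^{M} a_k\, s(n-k)
\qquad\text{and, including the excitation,}\qquad
s(n) = \sum_{k=1}^{M} a_k\, s(n-k) + G\,u(n)
```

where u(n) is the normalised excitation and G is its gain.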

  7. In the z-domain (Sec. 1.1.4, p. 15, Deller), the model leads to the all-pole transfer function shown below (Fig. 3.27).
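Written out, the resulting all-pole transfer function is:

```latex
H(z) \;=\; \frac{S(z)}{U(z)} \;=\; \frac{G}{\,1 - \sum_{k=1}^{M} a_k z^{-k}\,} \;=\; \frac{G}{A(z)}
```

with A(z) the inverse (prediction-error) filter that reappears in later slides.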

  8. The LP model retains the spectral magnitude, but it is minimum phase (Sec. 1.1.7, Deller). • However, in practice, phase is not very important for speech perception. • Observation: H(z) also models the glottal filter G(z) and the lip radiation R(z).

  9. Linear Prediction Coefficients Computation • Introduction • Methodologies

  10. Linear Prediction Coefficients Computation • The LP coefficients can be obtained by solving the following system of equations (Sec. 3.3.2 gives the proof), shown below.
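Assuming the usual minimum-mean-squared-error derivation, the system in question consists of the normal equations

```latex
\sum_{k=1}^{M} a_k\,\phi(i,k) \;=\; \phi(i,0), \qquad i = 1,\dots,M
```

where φ(i,k) is a correlation of the (possibly windowed) frame; the two ways of estimating φ give rise to the autocorrelation and covariance methods of the next slide.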

  11. Methodologies • Autocorrelation Method • Covariance Method (not commonly used in speech recognition)

  12. Autocorrelation Method • Assumption: each frame is independent (Fig. 3.29). • Solution (Juang, Sec. 3.3.3, pp. 105-106): the system reduces to equation (2) below, written in terms of the frame's autocorrelation sequence. • These equations are known as the Yule-Walker equations.
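In standard notation, with r(k) the short-time autocorrelation of the N-sample windowed frame, equation (2) reads:

```latex
\sum_{k=1}^{M} a_k\, r(|i-k|) \;=\; r(i), \qquad i = 1,\dots,M,
\qquad\text{where}\qquad
r(k) = \sum_{n=0}^{N-1-k} s(n)\, s(n+k)
```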

  13. Using matrix notation, the Yule-Walker equations can be written as a single linear system, R a = r, shown below.
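Written out, the system R a = r has the structure discussed on the next two slides:

```latex
\begin{bmatrix}
r(0)   & r(1)   & \cdots & r(M-1)\\
r(1)   & r(0)   & \cdots & r(M-2)\\
\vdots & \vdots & \ddots & \vdots\\
r(M-1) & r(M-2) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_M \end{bmatrix}
=
\begin{bmatrix} r(1)\\ r(2)\\ \vdots\\ r(M) \end{bmatrix}
```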

  14. Features of the matrix R: • Symmetric. • All elements along each diagonal are equal. • It is a Toeplitz matrix.

  15. This matrix is known as a Toeplitz matrix. A linear system with such a matrix can be solved very efficiently (e.g., by Durbin's recursion). • Examples (Figs. 3.32 and 3.33) • Example (Fig. 3.34) • Example (Fig. 3.35) • Example (Fig. 3.36)

  16. Linear Prediction for Automatic Speech Recognition • [Block diagram of the LPC front end: preemphasis (flattens the spectrum) → frame blocking → windowing (to minimise signal discontinuity) → autocorrelation analysis, equation (2), usually M = 8 → Durbin algorithm → conversion to cepstral coefficients → parameter weighting (to minimise noise sensitivity) → temporal derivatives (incorporate signal dynamics).]

  17. Preemphasis • The transfer function of the glottis can be modelled as a two-pole low-pass filter. • The radiation effect at the lips can be modelled as a single-zero high-pass filter (see below).
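The usual textbook models assumed here (the exact pole and zero locations on the slide are not reproduced) are a two-pole glottal filter and a single-zero radiation filter:

```latex
G(z) \;\approx\; \frac{1}{\left(1 - c\,z^{-1}\right)^{2}}, \qquad
R(z) \;\approx\; 1 - z^{-1}, \qquad c \approx 1
```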

  18. Hence, to obtain the transfer function of the vocal tract, the remaining glottal pole must be cancelled; this is done with a first-order preemphasis filter of the form P(z) = 1 - a z^{-1}, with a close to 1.

  19. Preemphasis should be applied only to sonorant sounds. The process can be automated by choosing the preemphasis coefficient adaptively, a common choice being a = r(1)/r(0), where r(·) is the autocorrelation function of the frame.
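A minimal sketch of this adaptive preemphasis, assuming `x` is a NumPy array holding one frame and using the a = r(1)/r(0) rule above (the function name is illustrative):

```python
import numpy as np

def adaptive_preemphasis(x: np.ndarray) -> np.ndarray:
    """Apply first-order preemphasis y[n] = x[n] - a*x[n-1],
    choosing the coefficient from the frame's autocorrelation."""
    r0 = np.dot(x, x)                   # r(0): frame energy
    r1 = np.dot(x[:-1], x[1:])          # r(1): lag-1 autocorrelation
    a = r1 / r0 if r0 > 0 else 0.0      # adaptive preemphasis coefficient
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]          # FIR filter 1 - a z^{-1}
    return y
```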

  20. Frame blocking: frames of N samples, with a shift of M samples between consecutive frames.

  21. Windowing • Minimises signal discontinuities at the edges of the frames. • A typical choice is the Hamming window (see the sketch below).
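A minimal sketch of frame blocking and Hamming windowing under the N/M convention of the previous slide (the default values N = 240 and M = 80 are illustrative, not taken from the slides):

```python
import numpy as np

def frame_and_window(signal: np.ndarray, N: int = 240, M: int = 80) -> np.ndarray:
    """Split the signal into frames of N samples shifted by M samples,
    applying a Hamming window to each frame."""
    num_frames = 1 + (len(signal) - N) // M
    window = np.hamming(N)              # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.empty((num_frames, N))
    for t in range(num_frames):
        frames[t] = signal[t * M : t * M + N] * window
    return frames
```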

  22. LPC Analysis • Converts the autocorrelation coefficients into an LPC “parameter set”. • Possible LPC parameter sets: • LPC coefficients • Reflection (PARCOR) coefficients • Log area ratio coefficients • The formal method used to obtain the LPC parameter set is known as Durbin’s method.

  23. Durbin’s method (sketched below).
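A minimal sketch of the Levinson-Durbin recursion that solves the Yule-Walker system of equation (2); `r` holds the autocorrelation values r(0)…r(M), and the variable names are illustrative:

```python
import numpy as np

def levinson_durbin(r: np.ndarray, M: int):
    """Solve the Yule-Walker equations for the LP coefficients a[1..M]
    and the reflection (PARCOR) coefficients k[1..M]."""
    a = np.zeros(M + 1)
    k = np.zeros(M + 1)
    E = r[0]                                    # prediction error E^(0) = r(0)
    for i in range(1, M + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = acc / E                          # reflection coefficient k_i
        a_new = a.copy()
        a_new[i] = k[i]
        for j in range(1, i):                   # a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1)
            a_new[j] = a[j] - k[i] * a[i - j]
        a = a_new
        E *= 1.0 - k[i] ** 2                    # updated prediction error
    return a[1:], k[1:], E
```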

  24. LPC (Typical values)

  25. LPC Parameter Conversion • Conversion to cepstral coefficients. • The cepstral coefficients are a robust feature set for speech recognition. • Algorithm: the recursion sketched below.
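A sketch of the standard LPC-to-cepstrum recursion (here `a` holds a_1…a_M from the Durbin sketch above, Q is the number of cepstral coefficients, and the gain term c_0 = ln G² is omitted):

```python
import numpy as np

def lpc_to_cepstrum(a: np.ndarray, Q: int) -> np.ndarray:
    """Convert LP coefficients a[1..M] into cepstral coefficients c[1..Q]."""
    M = len(a)
    a = np.concatenate(([0.0], a))      # 1-based indexing for readability
    c = np.zeros(Q + 1)
    for m in range(1, Q + 1):
        if m <= M:
            # c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}
            c[m] = a[m] + sum((k / m) * c[k] * a[m - k] for k in range(1, m))
        else:
            # c_m = sum_{k=m-M}^{m-1} (k/m) c_k a_{m-k}
            c[m] = sum((k / m) * c[k] * a[m - k] for k in range(m - M, m))
    return c[1:]
```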

  26. Parameter Weighting • Low-order cepstral coefficients are highly sensitive to noise, so the cepstral coefficients are usually weighted before being used (see below).
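A commonly used weighting is a raised-sine lifter; assuming the slide's weighting is of this standard form, with Q cepstral coefficients:

```latex
w_m \;=\; 1 + \frac{Q}{2}\,\sin\!\left(\frac{\pi m}{Q}\right),
\qquad
\hat{c}_m \;=\; w_m\, c_m, \qquad 1 \le m \le Q
```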

  27. Temporal Cepstral Derivative • First- or second-order derivatives are enough. • The derivative can be approximated as shown below.
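The usual approximation is a first-order (regression) fit over a window of 2K + 1 frames, with μ a normalising constant:

```latex
\Delta c_m(t) \;\approx\; \mu \sum_{k=-K}^{K} k\, c_m(t+k)
```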

  28. Given the example frame shown below (figure):

  29. Hamming-windowed frame: large prediction errors occur at the beginning of the frame, since the speech there is predicted from previous samples that are arbitrarily set to zero.

  30. Again, large prediction errors occur where the speech is predicted from previous samples that are arbitrarily set to zero.

  31. Unvoiced signals are not position sensitive: the error does not show any special effect at the frame edges.

  32. Observe the “whitening” phenomenon in the error spectrum.

  33. Observe the “whitening” phenomenon in the error spectrum.

  34. Observe the periodicity of the error waveform; this behaviour is taken as the basis for pitch estimators.

  35. Observe that a sharp decrease in the prediction error is obtained for small values of M (M = 1…4). Also observe that the unvoiced signal has a higher RMS error.

  36. Observe the all-pole model’s ability to match the spectrum.

  37. Linear Prediction in Speech Processing • LPC for Vocal Tract Shape Estimation • LPC for Pitch Detection • LPC for Formant Prediction

  38. LPC for Vocal Tract Shape Estimation • [Block diagram, analogous to the recognition front end of slide 16: preemphasis (so that the signal is free of glottis and radiation effects) → frame blocking and windowing (to minimise signal discontinuity) → parameter calculation → vocal tract shape estimation.]

  39. Parameter Calculation • Durbin’s method (as in speech recognition). • If this method is used, the autocorrelation analysis must be performed first. • Lattice filter methods.

  40. Lattice Filter • The reflection coefficients are obtained directly from the signal, avoiding the autocorrelation analysis. • Methods: • Itakura-Saito (PARCOR) • Burg • New forms • Advantage: easier to implement in hardware. • Disadvantage: needs around 5 times more computation.

  41. Itakura-Saito (PARCOR) method: the reflection coefficient of each lattice stage is computed from forward and backward prediction errors accumulated over time (n). It can be shown that the PARCOR coefficients obtained with the Itakura-Saito method are exactly the same as the reflection coefficients obtained by the Levinson-Durbin algorithm. Example.

  42. Burg method: each reflection coefficient is chosen to minimise the sum of the forward and backward prediction-error energies (a sketch follows). Example.
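A minimal sketch of the Burg recursion, using the same sign convention as the Levinson-Durbin sketch above (the exact formulation on the slide is not reproduced; names are illustrative):

```python
import numpy as np

def burg_reflection_coefficients(x: np.ndarray, M: int) -> np.ndarray:
    """Estimate M reflection coefficients directly from the frame x,
    without first computing the autocorrelation sequence."""
    f = x.astype(float)                 # forward prediction error
    b = x.astype(float)                 # backward prediction error
    k = np.zeros(M)
    for m in range(M):
        f_cur = f[m + 1:]               # forward error, n = m+1 .. N-1
        b_prev = b[m:-1]                # backward error delayed one sample
        num = 2.0 * np.dot(f_cur, b_prev)
        den = np.dot(f_cur, f_cur) + np.dot(b_prev, b_prev)
        k[m] = num / den if den > 0 else 0.0
        # Lattice update of the error signals (compute both before storing)
        f_new = f_cur - k[m] * b_prev
        b_new = b_prev - k[m] * f_cur
        f[m + 1:] = f_new
        b[m + 1:] = b_new
    return k
```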

  43. Example: comparison of the Itakura-Saito and Burg methods (figure).

  44. New Forms • Strobach, “New forms of Levinson and Schur algorithms”, IEEE Signal Processing Magazine, pp. 12-36, 1991.

  45. Vocal Tract Shape Estimation • From the relation between the reflection coefficients and the cross-sectional areas of an acoustic-tube model of the vocal tract, we obtain the ratio of the areas of adjacent sections (see below). • Therefore, by setting the lip area to an arbitrary value, we can obtain the vocal tract configuration relative to that initial condition. • This technique has been successfully used to train deaf persons.
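The relation assumed here is the standard acoustic-tube result linking each reflection coefficient to the cross-sectional areas of adjacent tube sections (the sign convention differs between texts):

```latex
k_m \;=\; \frac{A_{m+1} - A_m}{A_{m+1} + A_m}
\qquad\Longrightarrow\qquad
A_m \;=\; A_{m+1}\,\frac{1 - k_m}{1 + k_m}
```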

  46. LPC for Pitch Detection • [Block diagram: speech sampled at 10 kHz → low-pass filter at 800 Hz → 5:1 down-sampler → LPC analysis → inverse filtering with A(z) → autocorrelation of the residual → peak finding → voiced/unvoiced decision and pitch value.]
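A rough sketch of that pipeline: a simplified SIFT-style detector in which the filter design, the LPC order of 4, the 60-400 Hz search range and the voicing threshold are illustrative assumptions, and `levinson_durbin` from the Durbin sketch above is reused:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def lpc_pitch_detect(frame: np.ndarray, fs: int = 10_000, voicing_threshold: float = 0.3) -> float:
    """Estimate the pitch of one frame by LPC inverse filtering of the
    down-sampled signal, following the block diagram above."""
    # 1) Low-pass filter at 800 Hz and down-sample 5:1 (10 kHz -> 2 kHz).
    b, a = butter(4, 800 / (fs / 2))
    x = filtfilt(b, a, frame)
    x = decimate(x, 5)
    fs_d = fs // 5
    # 2) Low-order LPC analysis, then inverse filtering with A(z) to get the residual.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(5)])
    lp, _, _ = levinson_durbin(r, 4)
    residual = x.copy()
    for k, ak in enumerate(lp, start=1):
        residual[k:] -= ak * x[:-k]
    # 3) Autocorrelation of the residual and peak picking in the pitch range.
    ac = np.correlate(residual, residual, mode="full")[len(residual) - 1:]
    lo, hi = int(fs_d / 400), int(fs_d / 60)     # lags for 400 Hz .. 60 Hz
    lag = lo + int(np.argmax(ac[lo:hi]))
    voiced = ac[lag] / (ac[0] + 1e-12) > voicing_threshold
    return fs_d / lag if voiced else 0.0         # pitch in Hz, 0.0 if unvoiced
```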
