
# Chapter 20 Speech Encoding by Parameters



## PowerPoint Slideshow about ' Chapter 20 Speech Encoding by Parameters' - dawn-austin

Presentation Transcript

• 20.1 Linear Predictive Coding (LPC)

• 20.2 Linear Predictive Vocoder

• 20.3 Code Excited Linear Prediction (CELP)

• Two kinds of correlation

• Short-time correlation between samples

• Long-time correlation between adjacent pitch periods.

• Linear prediction removes these correlations, leaving the residual signal.

• Short-time Prediction of Speech

• The all-pole model: $H(z) = \dfrac{1}{A(z)} = \dfrac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}}$

• where $\{a_i\}$ are the prediction coefficients and $P$ is the order of the filter. $H(z)$ is called the LP synthesis filter.

• $A(z)$ is the LP analysis filter, or inverse filter. $P(z) = \sum_{i=1}^{P} a_i z^{-i}$ is the predictor of order $P$.

• For an 8 kHz sampling rate, $P$ is typically 8 to 12. The $\{a_i\}$ are computed for every frame and updated 30 to 100 times per second (a frame shift of 10 to 33 ms).
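The per-frame computation of $\{a_i\}$ can be sketched with the autocorrelation method and the Levinson-Durbin recursion, followed by inverse filtering with $A(z)$ to obtain the residual. This is a minimal pure-Python sketch, not the slides' own method: the function names, the unwindowed frame, and the synthetic 2nd-order test signal (240 samples, i.e. one 30 ms frame at 8 kHz) are all illustrative assumptions.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Autocorrelation R[0..max_lag] of one frame (autocorrelation method, no window)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a_1..a_P in A(z) = 1 - sum_{i=1}^{P} a_i z^-i.

    Returns (a, err): a[i-1] holds a_i, err is the final prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient of stage i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a, err


def lp_residual(frame, a):
    """Inverse-filter with A(z): e[n] = s[n] - sum_i a_i s[n-i] (zero history)."""
    p = len(a)
    return [
        frame[n] - sum(a[i] * frame[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        for n in range(len(frame))
    ]


# Hypothetical test signal: a stable 2nd-order all-pole process driven by noise.
random.seed(0)
s = [0.0, 0.0]
for _ in range(340):
    s.append(1.3 * s[-1] - 0.6 * s[-2] + random.gauss(0.0, 1.0))
frame = s[-240:]  # one 30 ms frame at 8 kHz

P = 10
a, err = levinson_durbin(autocorrelation(frame, P), P)
e = lp_residual(frame, a)
print("a_1, a_2 =", round(a[0], 3), round(a[1], 3))  # should land near 1.3 and -0.6
gain_db = 10 * math.log10(sum(v * v for v in frame) / sum(v * v for v in e))
print("prediction gain (dB):", round(gain_db, 1))
```

Even though $P = 10$ is larger than the true order of the test signal, the extra coefficients come out close to zero, and the residual energy is well below the signal energy.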

• Long-time Prediction

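• The filter $1/P(z)$ represents the long-time correlations (here $P(z)$ denotes the long-time analysis filter, not the short-time predictor defined above). Its general form is $\dfrac{1}{P(z)} = \dfrac{1}{1 - \sum_{i=-q}^{r} b_i z^{-(D+i)}}$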

• The delay parameter $D$ equals the pitch period, and $\{b_i\}$ are the long-time prediction coefficients.

• Typically there are 1 ($q = r = 0$) to 3 ($q = r = 1$) coefficients $b_i$.

• $D$ and $\{b_i\}$ can be extracted either from the speech signal or from the residual obtained after removing the short-time correlation. These coefficients are updated 50 to 200 times per second.

• Some coders have no long-time prediction: only short-time prediction is performed, and the long-time correlation is instead built into the LPC excitation model.
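For the single-tap case ($q = r = 0$), estimating $D$ and $b_0$ reduces to a lag search: for each candidate delay the optimal tap has a closed form, and the delay that removes the most squared error wins. The sketch below assumes a lag range of 20 to 147 samples (about 54 to 400 Hz at 8 kHz) and a hypothetical periodic test signal; the function name and range are illustrative, and the 3-tap case is not shown.

```python
import math


def long_term_predictor(x, d_min=20, d_max=147):
    """Single-tap (q = r = 0) long-time predictor search.

    For each candidate delay D the optimal tap is
        b = sum x[n] x[n-D] / sum x[n-D]^2,
    and the squared error drops by (sum x[n] x[n-D])^2 / sum x[n-D]^2.
    All sums run over n in [d_max, len(x)) so every candidate is scored
    on the same samples. Returns (D, b), or (None, 0.0) if nothing fits.
    """
    best_d, best_b, best_score = None, 0.0, -1.0
    for d in range(d_min, d_max + 1):
        num = sum(x[n] * x[n - d] for n in range(d_max, len(x)))
        den = sum(x[n - d] * x[n - d] for n in range(d_max, len(x)))
        if den <= 0.0:
            continue
        score = num * num / den
        # Only positively correlated lags are accepted as pitch candidates.
        if num > 0.0 and score > best_score:
            best_d, best_b, best_score = d, num / den, score
    return best_d, best_b


# Hypothetical voiced-like signal: a decaying oscillation repeated every
# 80 samples (100 Hz pitch at 8 kHz).
period = 80
shape = [0.85 ** k * math.cos(0.5 * k) for k in range(period)]
x = [shape[n % period] for n in range(400)]
D, b = long_term_predictor(x)
print(D, b)
```

Because the test signal repeats exactly every 80 samples and no multiple of 80 other than 80 itself falls inside the lag range, the search picks $D = 80$ with a tap of 1. On real speech the same search would usually run on the short-time residual.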

• Excitation Signal Source

• When the speech signal is passed through $A(z)$ and $P(z)$, the short-time and long-time correlations are removed and a noise-like signal, the LP residual, is obtained.

• For voiced speech, the residual contains peak pulses that repeat at the pitch period.

• The spectrum of the LP residual is much flatter than that of the speech, so the residual can be encoded at a low bit rate.

• In general, the lower the bit rate, the worse the speech quality or the greater the complexity.

• In summary, at the sending side the LPC method encodes and outputs the prediction coefficients (side information) and the excitation signal; at the receiving side it decodes them and synthesizes the speech signal.
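The encode/decode split can be illustrated with the two filters alone: the sender inverse-filters with $A(z)$, and the receiver rebuilds the signal by passing the excitation through $1/A(z)$ using the same $\{a_i\}$. In this minimal sketch the excitation is sent unquantized, so the round trip is lossless; the coefficients and the test frame are hypothetical values chosen only for illustration.

```python
import math


def analysis_filter(s, a):
    """A(z): e[n] = s[n] - sum_{i=1}^{P} a_i s[n-i], zero initial state."""
    return [s[n] - sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(s))]


def synthesis_filter(e, a):
    """1/A(z): y[n] = e[n] + sum_{i=1}^{P} a_i y[n-i], zero initial state."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[i] * y[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0))
    return y


# Hypothetical side information and input frame.
a = [1.3, -0.6]
s = [math.sin(0.2 * n) + 0.3 * math.sin(1.1 * n) for n in range(160)]

e = analysis_filter(s, a)   # sending side: excitation (here the exact residual)
y = synthesis_filter(e, a)  # receiving side: rebuild speech from {a_i} and e
print(max(abs(u - v) for u, v in zip(s, y)))  # only floating-point rounding remains
```

In a real coder the excitation is quantized or replaced by a model, so the output no longer matches the input exactly; that loss is where the rate/quality trade-off described above comes from.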

• A system that encodes speech using analysis and synthesis is called a vocoder.

• In an LPC vocoder, every frame ($N$ samples) is described by $P+3$ parameters: $\{a_i, i = 1, \dots, P\}$, the RMS gain, the voiced/unvoiced decision, and the pitch period for voiced frames. This allows low-rate encoding at 2.4 kb/s or less.
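A decoder for these $P+3$ parameters can be sketched as follows: it generates an impulse train at the pitch period (voiced) or white noise (unvoiced), passes it through $1/A(z)$, and scales the frame to the transmitted RMS gain. This is a simplified, hypothetical decoder; the function name, the frame length of 180 samples, and the plain impulse excitation are assumptions, and a practical decoder would also keep filter memory and smooth parameters across frames.

```python
import math
import random


def synthesize_frame(a, gain_rms, voiced, pitch, n=180, seed=0):
    """Hypothetical one-frame LPC vocoder decoder.

    a        : predictor coefficients a_1..a_P of A(z) = 1 - sum a_i z^-i
    gain_rms : RMS value the output frame should have
    voiced   : voiced/unvoiced decision
    pitch    : pitch period in samples (used only when voiced)
    n        : frame length N in samples (180 is an assumed value)
    """
    if voiced:
        # Impulse train repeating at the pitch period.
        excitation = [1.0 if k % pitch == 0 else 0.0 for k in range(n)]
    else:
        # White Gaussian noise for unvoiced frames.
        rng = random.Random(seed)
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # All-pole synthesis filter 1/A(z), zero initial state for simplicity.
    y = []
    for k in range(n):
        past = sum(a[i] * y[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        y.append(excitation[k] + past)

    # Scale so the frame's RMS matches the transmitted gain.
    rms = math.sqrt(sum(v * v for v in y) / n)
    return [gain_rms * v / rms for v in y]


voiced_frame = synthesize_frame([1.3, -0.6], gain_rms=0.2, voiced=True, pitch=50)
unvoiced_frame = synthesize_frame([1.3, -0.6], gain_rms=0.05, voiced=False, pitch=0)
```

Only the $P$ coefficients, one gain, one voicing flag, and one pitch value cross the channel per frame, which is why the bit rate can be so low; the price is that the waveform detail of the original residual is lost.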

• LPC-10 Vocoder

• (1) Encoder (Fig.8-3)

• (2) RC (reflection coefficient) calculation

• (3) RMS calculation (average energy)

• (4) Pitch period extraction and unvoiced/voiced detection

• (5) Parameter encoding and decoding

• (6) Decoder at the receiving side

• (7) Comparison between the synthesized speech and the original speech

• (8) Problems of LPC-10 Vocoder
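Reflection coefficients $k_1, \dots, k_P$ are the per-stage values produced by the Levinson-Durbin recursion, and they map one-to-one to $\{a_i\}$. The "step-up" and "step-down" recursions below convert between the two forms, and the step-down pass also checks stability ($1/A(z)$ is stable exactly when every $|k_i| < 1$). This sketch uses the $A(z) = 1 - \sum a_i z^{-i}$ sign convention of these slides; other texts, and the LPC-10 standard's own quantization tables, may use the opposite sign for $k_i$. The function names and example values are illustrative.

```python
def step_up(k):
    """Reflection coefficients k_1..k_P -> predictor a_1..a_P (Levinson step-up)."""
    a = []
    for m, km in enumerate(k, start=1):
        # a_j^(m) = a_j^(m-1) - k_m a_{m-j}^(m-1) for j < m, then a_m^(m) = k_m.
        a = [a[j] - km * a[m - 2 - j] for j in range(m - 1)] + [km]
    return a


def step_down(a):
    """Predictor a_1..a_P -> reflection coefficients k_1..k_P (step-down)."""
    cur = list(a)
    k = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        km = cur[m - 1]
        k[m - 1] = km
        if abs(km) >= 1.0:
            raise ValueError("|k_%d| >= 1: synthesis filter 1/A(z) is unstable" % m)
        # a_j^(m-1) = (a_j^(m) + k_m a_{m-j}^(m)) / (1 - k_m^2)
        cur = [(cur[j] + km * cur[m - 2 - j]) / (1.0 - km * km) for j in range(m - 1)]
    return k


a = step_up([0.5, -0.3, 0.2])
print(a)             # [0.71, -0.43, 0.2] up to rounding
print(step_down(a))  # recovers [0.5, -0.3, 0.2] up to rounding
```

Reflection coefficients are convenient to transmit because the stability condition is a simple bound on each value, whereas a quantization error in the direct-form $a_i$ can push a pole outside the unit circle.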

• CELP was initially proposed in 1985 and has since grown into a family of algorithms. It offers high quality at low bit rates (4.8 kb/s to 16 kb/s).

• CELP encodes speech frame by frame, with a frame length of 20 to 30 ms. The encoding is based on an analysis-by-synthesis (A-B-S) search, perceptually weighted vector quantization (VQ), and LPC.

• The encoder is shown in Fig. 8-15. The key step is searching for the optimal code vector and gain that minimize the perceptually weighted squared error between the original signal and the synthesized signal.
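The A-B-S search can be sketched as an exhaustive loop: each code vector is filtered, its optimal gain is computed in closed form, and the vector with the smallest weighted squared error is kept. This minimal sketch assumes the perceptual weighting is already folded into the target `x` and the impulse response `h`, shows only a single fixed-codebook stage, and uses a hypothetical random codebook, a 40-sample subframe, and illustrative function names; real coders add further stages and use structured codebooks to speed up the search.

```python
import random


def filter_fir(h, c):
    """Zero-state convolution y = h * c, truncated to len(c) samples."""
    return [sum(h[j] * c[n - j] for j in range(min(len(h), n + 1))) for n in range(len(c))]


def celp_search(x, h, codebook):
    """Exhaustive analysis-by-synthesis codebook search.

    x        : perceptually weighted target for one subframe
    h        : impulse response of the weighted synthesis filter
    codebook : list of candidate excitation vectors

    For each code vector c, y = h * c, the optimal gain is g = <x,y>/<y,y>,
    and the weighted squared error is <x,x> - <x,y>^2/<y,y>.
    Returns (index, gain, error) of the best entry.
    """
    best = (None, 0.0, float("inf"))
    xx = sum(v * v for v in x)
    for idx, c in enumerate(codebook):
        y = filter_fir(h, c)
        xy = sum(u * v for u, v in zip(x, y))
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue  # an all-zero filtered vector cannot match anything
        err = xx - xy * xy / yy
        if err < best[2]:
            best = (idx, xy / yy, err)
    return best


rng = random.Random(1)
L = 40  # assumed subframe length
codebook = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(64)]
h = [0.8 ** n for n in range(L)]                    # hypothetical weighted impulse response
x = [0.7 * v for v in filter_fir(h, codebook[3])]  # target built from entry 3
index, gain, err = celp_search(x, h, codebook)
print(index, round(gain, 6))  # 3 0.7
```

Minimizing $\langle x,x\rangle - \langle x,y\rangle^2/\langle y,y\rangle$ is the same as maximizing $\langle x,y\rangle^2/\langle y,y\rangle$, so the gain never has to be quantized inside the inner loop; it is computed once for the winning index.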

• CELP search algorithm

• US Standard FED-STD-1016