Chapter 20 Speech Encoding by Parameters

- 20.1 Linear Predictive Coding (LPC)
- 20.2 Linear Predictive Vocoder
- 20.3 Code Excited Linear Prediction (CELP)

- Two kinds of correlation
- Short-time correlation between samples
- Long-time correlation between adjacent pitch periods.
- Linear prediction removes (de-correlates) both kinds of correlation, leaving the residual signal.

- Short-time Prediction of Speech
- The all-pole model: $H(Z) = \dfrac{1}{A(Z)} = \dfrac{1}{1 - \sum_{i=1}^{P} a_i Z^{-i}}$
- where $\{a_i\}$ are the prediction coefficients and $P$ is the order of the filter. $H(Z)$ is the LP synthesis filter.
- $A(Z)$ is the LP analysis filter, or inverse filter. $P(Z) = \sum_{i=1}^{P} a_i Z^{-i}$ is the predictor of order $P$.
- For an 8 kHz sampling rate, $P$ is typically 8 to 12. The $\{a_i\}$ are computed for every frame and updated 30 to 100 times per second (a frame shift of 10 to 33 ms).
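
As an illustration of how the $\{a_i\}$ of one frame might be computed, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. The slides do not say which method is used, so this choice is an assumption, and the function names are hypothetical. The sign convention follows $A(Z) = 1 - \sum a_i Z^{-i}$; the recursion also yields the reflection coefficients as a by-product.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame (plain sum, no windowing)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a_1..a_P.

    Uses the convention A(Z) = 1 - sum_i a_i Z^-i.
    Returns (a, reflection_coefficients, prediction_error_energy).
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err                # m-th reflection coefficient
        reflection.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)         # error shrinks at every order
    return a[1:], reflection, err


# Autocorrelation of an ideal first-order process, r[k] = 0.5**k.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, k, err)   # a = [0.5, 0.0], err = 0.75
```

In practice the frame is usually windowed before `autocorrelation` is applied, and `order` would be the 8 to 12 mentioned above.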

- Long-time Prediction
- The filter $1/P(Z)$ represents the long-time correlations. Its general form is $\dfrac{1}{P(Z)} = \dfrac{1}{1 - \sum_{i=-q}^{r} b_i Z^{-(D+i)}}$ (here $P(Z)$ denotes the long-time inverse filter, not the short-time predictor of order $P$).
- The delay parameter $D$ equals the pitch period; $\{b_i\}$ are the prediction coefficients for the long-time correlation.
- The number of coefficients $b_i$ usually ranges from 1 ($q = r = 0$) to 3 ($q = r = 1$).
- $D$ and $\{b_i\}$ can be extracted from the speech signal or from the residual signal obtained after removing the short-time correlation. These coefficients are updated 50 to 200 times per second.
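
For the single-tap case ($q = r = 0$), the delay $D$ and the coefficient $b_0$ can be estimated by an exhaustive search over candidate lags. The following is a minimal sketch under stated assumptions: the function name and the lag range 20 to 147 samples are illustrative choices, not taken from the slides.

```python
def one_tap_pitch_predictor(signal, min_lag, max_lag):
    """Pick D and b for the one-tap long-time filter 1 - b * Z^-D.

    For each lag the optimal gain is b = cross / energy, and the
    error energy drops by cross**2 / energy, so the lag with the
    largest drop is chosen. Lags with negative correlation are skipped.
    """
    n = len(signal)
    best_lag, best_gain, best_score = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(signal[t] * signal[t - lag] for t in range(lag, n))
        energy = sum(signal[t - lag] ** 2 for t in range(lag, n))
        if energy <= 0.0 or cross <= 0.0:
            continue
        score = cross * cross / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, cross / energy, score
    return best_lag, best_gain


# Idealized voiced residual: one unit pulse every 40 samples.
pulses = [1.0 if t % 40 == 0 else 0.0 for t in range(240)]
print(one_tap_pitch_predictor(pulses, 20, 147))   # (40, 1.0)
```

With real speech the search is often run on the short-time residual, as the last bullet above allows, and a three-tap predictor would solve a small linear system around the chosen lag instead of a single division.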

- In some cases no long-time prediction is performed and only short-time prediction is done; the long-time correlation is then introduced through the LPC excitation model.
- Excitation Signal Source
- If the speech signal is passed through $A(Z)$ and $P(Z)$, the short-time and long-time correlations are removed, and the noise-like signal that remains is the LP residual signal.
- If the speech is voiced, the residual contains peak pulses that repeat at the pitch period.
- The spectrum of the LP residual fluctuates much less than that of the speech, so the residual can be encoded at a low rate.
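
The relation between speech, residual, and the two short-time filters can be sketched as a round trip: filtering with $A(Z)$ gives the residual, and feeding that residual into $1/A(Z)$ gives the signal back. The function names below are hypothetical and the long-time filter is left out for brevity.

```python
def lp_residual(x, a):
    """Analysis filter A(Z): e[n] = x[n] - sum_i a_i * x[n - i]."""
    p = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(min(p, n)))
            for n in range(len(x))]


def lp_synthesis(e, a):
    """Synthesis filter 1/A(Z): x[n] = e[n] + sum_i a_i * x[n - i]."""
    p = len(a)
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(a[i] * x[n - 1 - i] for i in range(min(p, n))))
    return x


# A single excitation pulse shaped by a first-order synthesis filter.
speech = lp_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])
print(speech)                      # [1.0, 0.5, 0.25, 0.125]
print(lp_residual(speech, [0.5]))  # [1.0, 0.0, 0.0, 0.0]
```

The decaying output is strongly correlated from sample to sample, while the residual is a single pulse, which is the sense in which the residual is easier to encode.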

- In general, the lower the rate, the worse the speech quality or the larger the complexity.
- In summary, the LPC method encodes the prediction coefficients (side information) and the excitation signal and transmits them at the sending side; the receiving side decodes them and synthesizes the speech signal.

- A system that uses analysis and synthesis to encode speech is called a vocoder.
- In an LPC vocoder, every frame (N samples) has P+3 parameters: $\{a_i\}$ for $i = 1, \dots, P$, the gain (RMS), the voicing decision, and the pitch period for voiced frames. It can achieve low-rate encoding at 2.4 kb/s or less.
- LPC-10 Vocoder
- (1) Encoder (Fig.8-3)
- (2) RC(Reflection Coefficients) calculation
- (3) RMS calculation (average energy)

- (4) Pitch period extraction and unvoiced/voiced detection
- (5) Parameter encoding and decoding
- (6) Decoder at the receiving side
- (7) Comparison between the synthesized speech and the original speech
- (8) Problems of LPC-10 Vocoder
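
To make the P+3 parameters per frame concrete, here is a rough sketch of a vocoder-style frame decoder: a pulse train (voiced) or noise (unvoiced) drives the LP synthesis filter and the result is scaled to the transmitted RMS. This is not the LPC-10 decoder itself; the `LpcFrame` type and function names are hypothetical, direct-form coefficients are used instead of reflection coefficients, and filter state is not carried across frames.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class LpcFrame:
    a: List[float]   # prediction coefficients a_1..a_P
    rms: float       # gain, as the target RMS of the frame
    voiced: bool     # voicing decision
    pitch: int       # pitch period in samples (used only when voiced)


def make_excitation(frame, n_samples, rng):
    """Pulse train for voiced frames, white noise for unvoiced ones."""
    if frame.voiced:
        return [1.0 if t % frame.pitch == 0 else 0.0
                for t in range(n_samples)]
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def synthesize_frame(frame, n_samples, rng=None):
    """Run the excitation through 1/A(Z) and scale to the target RMS."""
    rng = rng or random.Random(0)
    exc = make_excitation(frame, n_samples, rng)
    out = []
    for n in range(n_samples):
        pred = sum(frame.a[i] * out[n - 1 - i]
                   for i in range(min(len(frame.a), n)))
        out.append(exc[n] + pred)
    energy = sum(v * v for v in out) / n_samples
    scale = frame.rms / math.sqrt(energy) if energy > 0.0 else 0.0
    return [v * scale for v in out]


frame = LpcFrame(a=[0.5], rms=0.1, voiced=True, pitch=40)
samples = synthesize_frame(frame, n_samples=180)
print(len(samples))   # 180
```

Because the excitation is fully described by voicing, pitch, and gain, no residual samples need to be transmitted, which is what allows the very low bit rate.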

- CELP was initially proposed in 1985, and there is now a family of algorithms based on it. It features high quality at low rates (4.8 kb/s to 16 kb/s).
- It encodes speech frame by frame, with a frame length of 20 to 30 ms. The encoding technique is based on an analysis-by-synthesis (A-b-S) search, perceptually weighted vector quantization (VQ), and LPC.
- The encoder is shown in Fig. 8-15. The key point is to search for the optimal code vector and gain that minimize the perceptually weighted squared error between the original signal and the synthesized signal.
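
The analysis-by-synthesis idea can be sketched as an exhaustive codebook search: each code vector is passed through the synthesis filter, the best gain is computed in closed form, and the entry with the smallest squared error wins. This is a simplified sketch, not the FED-STD-1016 procedure: the names are hypothetical, the codebook is random, there is no adaptive (pitch) codebook, and the perceptual weighting filter is omitted, so the error here is unweighted.

```python
import random


def synthesize(code, a):
    """Zero-state response of 1/A(Z) to one code vector."""
    out = []
    for n in range(len(code)):
        pred = sum(a[i] * out[n - 1 - i] for i in range(min(len(a), n)))
        out.append(code[n] + pred)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the best code vector.

    For a synthesized vector y the optimal gain is <t, y> / <y, y>,
    and the remaining squared error is <t, t> - <t, y>**2 / <y, y>.
    """
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    target_energy = sum(t * t for t in target)
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        cross = sum(t * v for t, v in zip(target, y))
        error = target_energy - cross * cross / energy
        if error < best_error:
            best_index, best_gain, best_error = index, cross / energy, error
    return best_index, best_gain, best_error


rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
a = [0.9]
# Build a target that is exactly entry 3 scaled by 2.5, so the search
# should recover index 3 with a gain close to 2.5.
target = [2.5 * v for v in synthesize(codebook[3], a)]
index, gain, error = search_codebook(target, codebook, a)
print(index, round(gain, 6))
```

In a perceptually weighted search, both the target and each synthesized vector would first pass through the weighting filter; the gain and error formulas stay the same.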

- CELP search algorithm
- US Standard FED-STD-1016