Chapter 20 Speech Encoding by Parameters

Chapter 20 Speech Encoding by Parameters • 20.1 Linear Predictive Coding (LPC) • 20.2 Linear Predictive Vocoder • 20.3 Code Excited Linear Prediction (CELP)

20.1 Linear Predictive Coding (1) • Speech samples exhibit two kinds of correlation: • Short-time correlation between neighboring samples • Long-time correlation between adjacent pitch periods • Linear prediction removes both correlations; what remains is the residual signal.

Linear Predictive Coding (2) • Short-time prediction of speech • The all-pole model: $H(Z) = 1/A(Z) = 1/\left[1 - \sum_{i=1}^{P} a_i Z^{-i}\right]$ • where $\{a_i\}$ are the predictor coefficients and $P$ is the order of the filter. $H(Z)$ is called the LP synthesis filter. • $A(Z)$ is the LP analysis filter, or inverse filter. $P(Z) = \sum_{i=1}^{P} a_i Z^{-i}$ is the predictor of order $P$. • For an 8 kHz sampling rate, $P$ is typically 8 to 12. The $\{a_i\}$ are computed for every frame and updated 30 to 100 times per second (frame shift of 10 to 33 ms).
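One common way to obtain the $\{a_i\}$ for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method is used, so the sketch below is an assumption, and the function names (`autocorr`, `levinson_durbin`, `lp_residual`) and the toy frame are purely illustrative. The toy frame is the impulse response of a known 2nd-order all-pole filter, so the recovered coefficients can be checked by eye.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no windowing, for brevity)."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations so that x[n] is predicted by
    sum_{i=1..order} a_i * x[n - i]. Returns (a, reflection coeffs, error)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    ks = []                          # reflection coefficients, one per stage
    err = r[0]                       # prediction error energy
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= (1.0 - k * k)
        ks.append(k)
    return a[1:], ks, err


def lp_residual(x, a):
    """Pass x through the analysis filter A(Z) = 1 - sum_i a_i Z^-i."""
    p = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
            for n in range(len(x))]


# Toy frame (assumption): impulse response of 1 / (1 - 1.3 Z^-1 + 0.6 Z^-2).
frame = [1.0]
for n in range(1, 200):
    older = frame[n - 2] if n >= 2 else 0.0
    frame.append(1.3 * frame[n - 1] - 0.6 * older)

coeffs, refl, err = levinson_durbin(autocorr(frame, 2), 2)
residual = lp_residual(frame, coeffs)
print(coeffs)   # close to [1.3, -0.6]
```

For this toy input the residual collapses to the single driving impulse, which is the de-correlation effect the slide describes. Real speech frames would normally be windowed first and would use P = 8 to 12.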

Linear Predictive Coding (3) • Long-time prediction • The filter $1/P(Z)$ represents the long-time correlation (here $P(Z)$ denotes the long-time prediction filter, not the short-time predictor). Its general form is $1/P(Z) = 1/\left[1 - \sum_{i=-q}^{r} b_i Z^{-(D+i)}\right]$ • The delay parameter $D$ equals the pitch period, and $\{b_i\}$ are the long-time predictor coefficients. • The number of coefficients $b_i$ typically ranges from 1 ($q=r=0$) to 3 ($q=r=1$). • $D$ and $\{b_i\}$ can be extracted from the speech signal or from the residual signal that remains after the short-time correlation has been removed. These parameters are updated 50 to 200 times per second.
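For the single-tap case ($q=r=0$), $D$ and $b$ can be estimated by trying every candidate delay and keeping the one with the largest normalized correlation. This is a minimal sketch of that idea, not necessarily the procedure the course has in mind; the function name `estimate_ltp`, the search range of 20 to 120 samples, and the pulse-train test signal are assumptions made for illustration.

```python
def estimate_ltp(e, d_min, d_max):
    """Single-tap long-time predictor: find delay D and gain b that
    minimize sum_n (e[n] - b * e[n - D])^2 over n >= D."""
    best_d, best_b, best_score = None, 0.0, -1.0
    for d in range(d_min, d_max + 1):
        num = sum(e[n] * e[n - d] for n in range(d, len(e)))
        den = sum(e[n - d] ** 2 for n in range(d, len(e)))
        if den == 0.0:
            continue
        score = num * num / den      # error reduction achieved by this delay
        if score > best_score:
            best_d, best_b, best_score = d, num / den, score
    return best_d, best_b


def ltp_residual(e, d, b):
    """Pass e through P(Z) = 1 - b Z^-D (samples before n = D are kept as-is)."""
    return [e[n] - (b * e[n - d] if n >= d else 0.0) for n in range(len(e))]


# Toy short-time residual (assumption): unit pulses every 40 samples.
pitch = 40
short_res = [1.0 if n % pitch == 0 else 0.0 for n in range(240)]

D, b = estimate_ltp(short_res, 20, 120)
long_res = ltp_residual(short_res, D, b)
print(D, b)
```

On the toy signal the search recovers the 40-sample period, and the long-time residual keeps only the first pulse, since every later pulse is predicted from the one a pitch period earlier.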

Linear Predictive Coding (4) • In some coders no long-time prediction is performed; only short-time prediction is done, and the long-time correlation is instead introduced through the LPC excitation model. • Excitation signal source • If the speech signal is passed through A(Z) and P(Z), the short-time and long-time correlations are removed and a noise-like signal is obtained: the LP residual signal. • If the speech is voiced, the residual contains peak pulses that repeat at the pitch period. • The spectrum of the LP residual fluctuates much less than that of the speech, so the residual can be encoded at a low rate.

Linear Predictive Coding (5) • In general, the lower the bit rate, the worse the speech quality or the greater the complexity. • In summary, the LPC method encodes the predictor coefficients (side information) and the excitation signal and transmits them from the sending side; the receiving side decodes them and synthesizes the speech signal.

20.2 Linear Predictive Vocoder (1) • A system that encodes speech by analysis and synthesis is called a vocoder. • In an LPC vocoder, every frame (N samples) is described by P+3 parameters: {ai, i=1~P}, the RMS gain, the voicing decision, and the pitch period (for voiced frames). This allows low-rate encoding at 2.4 kb/s or less. • LPC-10 Vocoder • (1) Encoder (Fig. 8-3) • (2) RC (reflection coefficient) calculation • (3) RMS calculation (average energy)

Linear Predictive Vocoder (2) • (4) Pitch period extraction and voiced/unvoiced detection • (5) Parameter encoding and decoding • (6) Decoder at the receiving side • (7) Comparison between the synthesized speech and the original speech • (8) Problems of the LPC-10 vocoder
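The decoder side of an LPC vocoder can be illustrated from the P+3 frame parameters alone: build an excitation (a pulse train at the pitch period for voiced frames, noise for unvoiced ones), pass it through the synthesis filter 1/A(Z), and scale the result to the transmitted RMS. The sketch below is a deliberately simplified assumption, not the LPC-10 decoder itself: it uses predictor coefficients directly rather than decoded reflection coefficients, resets the filter state at each frame, and the names `synthesize_frame` and `frame_rms` are hypothetical.

```python
import math
import random


def frame_rms(x):
    """Root-mean-square value of a frame."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def synthesize_frame(a, rms, voiced, pitch, n, rng=None):
    """Rebuild one frame of n samples from vocoder parameters.
    a: predictor coefficients, rms: target gain, pitch: period in samples."""
    if rng is None:
        rng = random.Random(0)       # fixed seed so the sketch is repeatable
    if voiced:
        exc = [1.0 if t % pitch == 0 else 0.0 for t in range(n)]
    else:
        exc = [rng.gauss(0.0, 1.0) for _ in range(n)]
    out = []
    for t in range(n):               # all-pole synthesis filter 1/A(Z)
        past = sum(a[i] * out[t - 1 - i]
                   for i in range(len(a)) if t - 1 - i >= 0)
        out.append(exc[t] + past)
    level = frame_rms(out)
    scale = rms / level if level > 0.0 else 0.0
    return [v * scale for v in out]  # match the transmitted RMS


# Illustrative parameters (assumptions): a 2nd-order filter, 180-sample frames.
a = [1.3, -0.6]
voiced_frame = synthesize_frame(a, 0.5, True, 40, 180)
unvoiced_frame = synthesize_frame(a, 0.1, False, 0, 180)
voiced_rms = frame_rms(voiced_frame)
unvoiced_rms = frame_rms(unvoiced_frame)
```

Because the excitation is reduced to a binary voiced/unvoiced choice plus one pitch value, the synthesized waveform generally differs from the original even when the spectral envelope is matched well.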

20.3 Code Excited Linear Prediction (CELP) (1) • Initially proposed in 1985; there is now a family of algorithms based on it. It offers high quality at low bit rates (4.8 kb/s to 16 kb/s). • It encodes speech frame by frame, with a frame length of 20~30 ms. The encoding technique combines an analysis-by-synthesis (A-b-S) search, perceptually weighted vector quantization (VQ), and LPC. • The encoder is shown in Fig. 8-15. The key point is to search for the code vector and gain that minimize the perceptually weighted squared error between the original signal and the synthesized signal.
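The analysis-by-synthesis search can be sketched as follows: each code vector is passed through the synthesis filter, the best gain for it is computed in closed form, and the vector giving the smallest squared error against the target is kept. To stay short, this sketch omits the perceptual weighting filter and the long-time predictor, so it minimizes the plain squared error rather than the weighted one; the codebook contents, sizes, and the names `synth` and `search_codebook` are assumptions for illustration.

```python
import random


def synth(a, exc):
    """Zero-state response of the synthesis filter 1/A(Z) to an excitation."""
    out = []
    for t in range(len(exc)):
        past = sum(a[i] * out[t - 1 - i]
                   for i in range(len(a)) if t - 1 - i >= 0)
        out.append(exc[t] + past)
    return out


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def search_codebook(target, codebook, a):
    """Return (index, gain) minimizing |target - gain * synth(code)|^2.
    For a fixed code vector the best gain is <t, y> / <y, y>, and the
    remaining error is <t, t> - <t, y>^2 / <y, y>."""
    best_index, best_gain, best_score = None, 0.0, -1.0
    for k, code in enumerate(codebook):
        y = synth(a, code)
        yy = dot(y, y)
        if yy == 0.0:
            continue
        ty = dot(target, y)
        score = ty * ty / yy         # larger score means smaller error
        if score > best_score:
            best_index, best_gain, best_score = k, ty / yy, score
    return best_index, best_gain


# Toy setup (assumptions): 8 random code vectors of 10 samples each.
rng = random.Random(1)
codebook = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(8)]
a = [1.3, -0.6]
target = [0.7 * v for v in synth(a, codebook[2])]   # built from entry 2

best_index, best_gain = search_codebook(target, codebook, a)
print(best_index, best_gain)
```

Only the index and the quantized gain need to be transmitted, because the decoder holds the same codebook. In an actual CELP coder both the target and the filtered code vectors would be perceptually weighted before the comparison.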

Code Excited Linear Prediction (CELP) (2) • CELP search algorithm • US Standard FED-STD-1016
