Speech Processing: Speech Coding
Speech Coding • Definition: • Speech coding is a process that leads to the representation of analog waveforms with sequences of binary digits. • Even though the availability of high-bandwidth communication channels has increased, speech coding for bit-rate reduction has retained its importance: • Reduced bit-rate transmission is required for cellular networks • Voice over IP • Coded speech • Is less sensitive than analog signals to transmission noise • Is easier to: • Protect against (bit) errors • Encrypt • Multiplex, and • Packetize • Typical scenario depicted in the next slide (Figure 12.1) Veton Këpuska
Digital Telephone Communication System
Categorization of Speech Coders • Waveform Coders: • Quantize speech samples directly and operate at high bit rates, in the range of 16-64 kbps (bps – bits per second) • Hybrid Coders: • Are partly waveform coders and partly speech-model-based coders, and operate in the mid bit-rate range of 2.4-16 kbps. • Vocoders: • Largely model-based and operate in a low bit-rate range of 1.2-4.8 kbps. • Tend to be of lower quality than waveform and hybrid coders.
Quality Measurements • Quality of coding is viewed as the closeness of the processed speech to the original speech or some other desired speech waveform: • Naturalness • Degree of background artifacts • Intelligibility • Speaker identifiability • Etc.
Quality Measurements • Subjective Measurements: • Diagnostic Rhyme Test (DRT) measures intelligibility. • Diagnostic Acceptability Measure and Mean Opinion Score (MOS) tests provide a more complete quality judgment. • Objective Measurements: • Segmental Signal-to-Noise Ratio (SNR) – average SNR over short-time segments • Articulation Index – relies on an average SNR across frequency bands.
Quality Measurements • A more complete list and definition of subjective and objective measures can be found in: • J.R. Deller, J.G. Proakis, and J.H.L. Hansen, “Discrete-Time Processing of Speech Signals”, Macmillan Publishing Co., New York, NY, 1993 • S.R. Quackenbush, T.P. Barnwell, and M.A. Clements, “Objective Measures of Speech Quality”, Prentice Hall, Englewood Cliffs, NJ, 1988
Statistical Models • The speech waveform is viewed as a random process. • Various estimates are important from this statistical perspective: • Probability density • Mean, variance, and autocorrelation • One approach to estimating the probability density function (pdf) of x[n] is the histogram: • Count up the number of occurrences of the value of each speech sample in different ranges, for many speech samples over a long time duration. • Normalize the area of the resulting curve to unity.
Statistical Models • The histogram of speech (Davenport, Paez & Glisson) was shown to approximate a gamma density: • px(x) = [√3 / (8π·σx·|x|)]^(1/2) · exp(−√3·|x| / (2σx)) • where σx is the standard deviation of the pdf. • A simpler approximation is given by the Laplacian pdf of the form: • px(x) = [1 / (√2·σx)] · exp(−√2·|x| / σx)
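The histogram-based pdf estimation described above can be illustrated numerically. The sketch below (function and variable names are my own, not from the text) builds a normalized histogram from synthetic unit-variance Laplacian-distributed samples and compares bin heights against the Laplacian formula:

```python
import math
import random

def laplacian_pdf(x, sigma=1.0):
    """Laplacian approximation of the speech pdf:
    p(x) = (1 / (sqrt(2)*sigma)) * exp(-sqrt(2)*|x| / sigma)."""
    return math.exp(-math.sqrt(2) * abs(x) / sigma) / (math.sqrt(2) * sigma)

# Histogram-based pdf estimate: count occurrences in bins, normalize area to 1.
random.seed(0)
samples = [random.choice([-1, 1]) * random.expovariate(math.sqrt(2))
           for _ in range(100_000)]          # unit-variance Laplacian samples
width = 0.2
for center in (0.1, 0.5, 1.0, 2.0):
    inside = sum(center - width / 2 <= v < center + width / 2 for v in samples)
    estimate = inside / (len(samples) * width)   # normalized histogram height
    print(f"x={center}: histogram ~ {estimate:.3f}, model {laplacian_pdf(center):.3f}")
```

With enough samples the normalized histogram tracks the model closely, which is the sense in which the Laplacian "approximates" the long-term speech pdf.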
PDF of Speech
Scalar Quantization • Assume that a sequence x[n] was obtained from a speech waveform that has been lowpass-filtered and sampled at a suitable rate with infinite amplitude precision. • The samples x[n] are quantized to a finite set of amplitudes denoted by x̂[n]. • Associated with the quantizer is a quantization step size Δ. • Quantization allows the amplitudes to be represented by a finite set of bit patterns – symbols. • Encoding: • Mapping of x̂[n] to a finite set of symbols. • This mapping yields a sequence of codewords denoted by c[n] (Figure 12.3a). • Decoding – the inverse process, whereby a transmitted sequence of codewords c’[n] is transformed back to a sequence of quantized samples x̂’[n] (Figure 12.3b).
Scalar Quantization
Fundamentals • Assume a signal amplitude is quantized into M levels. • The quantizer operator is denoted by Q(x); thus • x̂[n] = Q(x[n]) = x̂i • where x̂i, 1≤i≤M, denotes the M possible reconstruction levels – quantization levels, and • xi, 0≤i≤M, denotes the M+1 possible decision levels. • If xi−1 < x[n] ≤ xi, then x[n] is quantized to the reconstruction level x̂i. • x̂[n] is the quantized sample of x[n].
Fundamentals • Scalar Quantization Example: • Assume there are M=4 reconstruction levels. • The amplitude of the input signal x[n] falls in the range [0,1]. • Decision levels and reconstruction levels are equally spaced: • Decision levels are [0, 1/4, 1/2, 3/4, 1] • Reconstruction levels are assumed to be [1/8, 3/8, 5/8, 7/8] • Figure 12.4 in the next slide.
Example of Uniform 2-bit Quantizer
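The M=4 example above can be sketched directly in code (a minimal sketch; the function name and the clamping choice at the upper edge of the range are my own assumptions, not from the text):

```python
def quantize_2bit(x):
    """Uniform 2-bit quantizer for x in [0, 1]:
    decision levels [0, 1/4, 1/2, 3/4, 1],
    reconstruction levels [1/8, 3/8, 5/8, 7/8]."""
    delta = 0.25                       # step size
    i = min(int(x / delta), 3)         # decision-interval index, clamped at x = 1
    return i, (i + 0.5) * delta        # (codeword index, reconstruction level)

for x in (0.1, 0.3, 0.6, 0.99):
    code, xh = quantize_2bit(x)
    print(f"x={x} -> codeword {code:02b}, reconstruction {xh}")
```

Note that every sample lands within Δ/2 = 1/8 of its reconstruction level, which is the granular-error bound discussed later.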
Uniform Quantizer • A uniform quantizer is one whose decision and reconstruction levels are uniformly spaced. Specifically: • xi − xi−1 = Δ, 1≤i≤M • x̂i − x̂i−1 = Δ, 2≤i≤M • Δ is the step size, equal to the spacing between two consecutive decision levels, which is the same as the spacing between two consecutive reconstruction levels (Exercise 12.1). • Each reconstruction level is attached a symbol – the codeword. Binary numbers are typically used to represent the quantized samples (Figure 12.4).
Uniform Quantizer • Codebook: collection of codewords. • In general, with a B-bit binary codebook there are 2^B different quantization (or reconstruction) levels. • Bit rate is defined as the number of bits B per sample multiplied by the sampling rate fs: I = B·fs • The decoder inverts the coder operation, taking the codeword back to a quantized amplitude value (e.g., codeword 01 maps back to its reconstruction level). • Often the goal of speech coding/decoding is to keep the bit rate as low as possible while maintaining a required level of quality. • Because the sampling rate is fixed for most applications, this goal implies that the bit rate be reduced by decreasing the number of bits per sample.
Uniform Quantizer • Designing a uniform scalar quantizer requires knowledge of the maximum value of the sequence. • Typically the range of the speech signal is expressed in terms of the standard deviation σx of the signal. • Specifically, it is often assumed that −4σx ≤ x[n] ≤ 4σx. • Under the assumption that speech samples obey a Laplacian pdf, approximately 0.35% of the speech samples fall outside the range −4σx ≤ x[n] ≤ 4σx. • Assume a B-bit binary codebook ⇒ 2^B quantization levels. • Maximum signal value xmax = 4σx.
Uniform Quantizer • For the uniform quantization step size we get: • Δ = 2xmax / 2^B = 8σx / 2^B • The quantization step size relates directly to the notion of quantization noise.
Quantization Noise • Two classes of quantization noise: • Granular distortion • Overload distortion • Granular Distortion • x̂[n] = x[n] + e[n], where x[n] is the unquantized signal and e[n] is the quantization noise. • For a given step size Δ, the magnitude of the quantization noise e[n] can be no greater than Δ/2, that is: • −Δ/2 < e[n] ≤ Δ/2 • Figure 12.5 depicts this property.
Quantization Noise
Quantization Noise • Overload Distortion • Maximum-value constraint: • xmax = 4σx (−4σx ≤ x[n] ≤ 4σx) • For a Laplacian pdf, 0.35% of the speech samples fall outside the range of the quantizer. • Clipped samples incur a quantization error in excess of Δ/2. • Due to the small number of clipped samples, it is common to neglect these infrequent large errors in theoretical calculations.
Quantization Noise • Statistical Model of Quantization Noise • The desired approach for analyzing the quantization error in numerous applications. • The quantization error is considered an ergodic white-noise random process. • The autocorrelation function of such a process is expressed as: • re[m] = E(e[n]·e[n+m]) = σe²·δ[m]
Quantization Error • The previous expression states that the process is uncorrelated. • Furthermore, it is also assumed that the quantization noise and the input signal are uncorrelated, i.e., • E(x[n]·e[n+m]) = 0, for all m. • The final assumption is that the pdf of the quantization noise is uniform over the quantization interval: • pe(e) = 1/Δ for −Δ/2 < e ≤ Δ/2, and 0 otherwise.
Quantization Error • The stated assumptions are not always valid. • Consider a slowly varying (e.g., linearly varying) signal ⇒ then e[n] also changes linearly and is signal-dependent (see Figure 12.5 in the previous slide). • Correlated quantization noise can be annoying. • When the quantization step is small, the assumptions that the noise is uncorrelated with itself and with the signal are roughly valid when the signal fluctuates rapidly among all quantization levels. • The quantization error then approaches a white-noise process with an impulsive autocorrelation and flat spectrum. • One can force e[n] to be white noise and uncorrelated with x[n] by adding white noise to x[n] prior to quantization.
Quantization Error • This process of adding white noise is known as dithering. • This decorrelation technique was shown to be useful not only in improving the perceptual quality of the quantization noise of speech but also with image signals. • Signal-to-Noise Ratio (SNR) • A measure to quantify the severity of the quantization noise. • Relates the strength of the signal to the strength of the quantization noise.
Quantization Error • SNR is defined as: • SNR = σx² / σe² • Given the assumptions for the • Quantizer range: 2xmax, • Quantization interval: Δ = 2xmax / 2^B for a B-bit quantizer, and • Uniform pdf of the noise, it can be shown that (see Exercise 12.2): • σe² = Δ²/12 = xmax² / (3·2^(2B))
Quantization Error • Thus SNR can be expressed as: • SNR = σx²/σe² = 3·2^(2B)·(σx/xmax)² • Or in decibels (dB) as: • SNR(dB) ≈ 6B + 4.77 − 20·log10(xmax/σx) • Because xmax = 4σx, SNR(dB) ≈ 6B − 7.2
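The 6-dB-per-bit rule can be verified empirically. The sketch below (function names are illustrative) quantizes unit-variance Laplacian samples clipped to ±4σx, so that only granular distortion remains, matching the text's assumption of neglecting overload errors:

```python
import math
import random

def uniform_quantize(x, B, xmax):
    """B-bit uniform quantizer over [-xmax, xmax] with midpoint reconstruction."""
    delta = 2 * xmax / 2 ** B
    i = min(max(int((x + xmax) / delta), 0), 2 ** B - 1)
    return -xmax + (i + 0.5) * delta

random.seed(0)
sigma = 1.0
xmax = 4 * sigma
# Unit-variance Laplacian samples, clipped to the quantizer range
# so that overload distortion is excluded (as assumed in the analysis).
xs = [max(min(random.choice([-1, 1]) * random.expovariate(math.sqrt(2) / sigma),
             xmax), -xmax) for _ in range(100_000)]

snrs = {}
for B in (8, 10, 12):
    err = [x - uniform_quantize(x, B, xmax) for x in xs]
    snrs[B] = 10 * math.log10(sum(x * x for x in xs) / sum(e * e for e in err))
    print(f"B={B}: measured {snrs[B]:.1f} dB, predicted 6B-7.2 = {6 * B - 7.2:.1f} dB")
```

Each extra bit buys roughly 6 dB of SNR, in agreement with the formula.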
Quantization Error • The presented quantization scheme is called pulse code modulation (PCM). • B bits per sample are transmitted as a codeword. • Advantages of this scheme: • It is instantaneous (no coding delay) • Independent of the signal content (voice, music, etc.) • Disadvantages: • It requires a minimum of 11 bits per sample to achieve “toll quality” (equivalent to typical telephone quality) • For a 10000 Hz sampling rate, the required bit rate is: • I = (11 bits/sample) × (10000 samples/sec) = 110,000 bps = 110 kbps • For a CD-quality signal with a sampling rate of 20000 Hz and 16 bits/sample, SNR(dB) = 96 − 7.2 = 88.8 dB and the bit rate is 320 kbps.
Nonuniform Quantization • Uniform quantization may not be optimal (the quantization error may not be as small as possible for a given number of decision and reconstruction levels). • Consider, for example, a speech signal for which x[n] is much more likely to be in one particular region than in others (low values occurring much more often than high values). • This implies that decision and reconstruction levels are not being utilized effectively with uniform intervals over xmax. • A nonuniform quantization that is optimal (in a least-squared-error sense) for a particular pdf is referred to as the Max quantizer. • An example of a nonuniform quantizer is given in the figure in the next slide.
Nonuniform Quantization
Nonuniform Quantization • Max Quantizer • Problem definition: for a random variable x with a known pdf, find the set of M quantizer levels that minimizes the quantization error. • That is, find the decision levels xi and reconstruction levels x̂i that minimize the mean-squared error (MSE) distortion measure: • D = E[(x − x̂)²] • E denotes expected value and x̂ is the quantized version of x. • It turns out that the optimal decision level xk lies midway between its adjacent reconstruction levels: • xk = (x̂k + x̂k+1)/2, 1≤k≤M−1
Nonuniform Quantization • Max Quantizer (cont.) • The optimal reconstruction level x̂k is the centroid of px(x) over the interval xk−1 ≤ x ≤ xk: • x̂k = [ ∫ from xk−1 to xk of x·px(x) dx ] / [ ∫ from xk−1 to xk of px(x) dx ] • It is interpreted as the mean value of x over the interval xk−1 ≤ x ≤ xk for the normalized pdf p̃(x). • Solving the last two equations for xk and x̂k is a nonlinear problem in these two variables. • It is solved iteratively, which requires obtaining the pdf (can be difficult).
Nonuniform Quantization
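The iterative solution can be sketched by replacing the pdf integrals with empirical averages over a training set of samples (a minimal sketch of the two alternating optimality conditions; the initialization and iteration count are my own choices):

```python
import random

def lloyd_max(samples, M, iters=50):
    """Alternate the two Max-quantizer conditions: decision levels at
    midpoints of adjacent reconstruction levels, reconstruction levels
    at the centroid (empirical mean) of each decision cell."""
    samples = sorted(samples)
    lo, hi = samples[0], samples[-1]
    levels = [lo + (hi - lo) * (i + 0.5) / M for i in range(M)]   # uniform start
    for _ in range(iters):
        # decision levels: midpoints between adjacent reconstruction levels
        bounds = [(levels[i] + levels[i + 1]) / 2 for i in range(M - 1)]
        cells = [[] for _ in range(M)]
        for x in samples:
            cells[sum(x > b for b in bounds)].append(x)
        # reconstruction levels: centroid of each nonempty cell
        levels = [sum(c) / len(c) if c else levels[k]
                  for k, c in enumerate(cells)]
    return levels

random.seed(1)
xs = [random.choice([-1, 1]) * random.expovariate(2 ** 0.5) for _ in range(50_000)]
levels = lloyd_max(xs, 4)
print([round(v, 2) for v in levels])
```

For Laplacian-distributed data the resulting levels cluster toward zero, where the pdf peaks, rather than being uniformly spaced.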
Companding • An alternative to the nonuniform quantizer is companding. • It is based on the fact that a uniform quantizer is optimal for a uniform pdf. • Thus, if a nonlinearity is applied to the waveform x[n] to form a new sequence g[n] whose pdf is uniform, then • a uniform quantizer can be applied to g[n] to obtain ĝ[n], as depicted in Figure 12.10 in the next slide.
Companding
Companding • In practice, a number of nonlinear approximations to the transformation that achieves a uniform density are used, which do not require a pdf measurement. • Specifically, A-law and μ-law companding. • μ-law coding is given by: • T(x[n]) = xmax · [log(1 + μ|x[n]|/xmax) / log(1 + μ)] · sign(x[n]) • The CCITT international standard coder at 64 kbps is an example application of μ-law coding. • A μ-law transformation followed by 7-bit uniform quantization gives toll-quality speech. • Equivalent quality with straight uniform quantization requires 11 bits.
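A sketch of μ-law compression and its inverse, with μ = 255 as in the North American PCM standard (function names are illustrative, not from the text):

```python
import math

MU = 255.0   # value used in the North American standard

def mu_compress(x, xmax=1.0, mu=MU):
    """T(x) = xmax * log(1 + mu*|x|/xmax) / log(1 + mu) * sign(x)."""
    y = xmax * math.log(1 + mu * abs(x) / xmax) / math.log(1 + mu)
    return math.copysign(y, x)

def mu_expand(y, xmax=1.0, mu=MU):
    """Inverse transformation, applied after the uniform decoder."""
    x = (xmax / mu) * ((1 + mu) ** (abs(y) / xmax) - 1)
    return math.copysign(x, y)

# Small amplitudes are stretched toward a more uniform density:
for x in (0.01, 0.1, 0.5, 1.0):
    print(f"{x} -> {mu_compress(x):.3f}")
```

Note how an input of 0.01 maps to roughly a fifth of full scale: the logarithmic curve spends most of its output range on the small amplitudes that dominate speech, which is why a uniform quantizer after T(·) works well.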
Adaptive Coding • Nonuniform quantizers are optimal for the long-term pdf of the speech signal. • However, considering that speech is a highly time-varying signal, one has to question whether a single pdf derived from a long-time speech waveform is a reasonable assumption. • Changes in the speech waveform: • Temporal and spectral variations due to transitions from unvoiced to voiced speech, • Rapid volume changes. • Approach: • Estimate a short-time pdf derived over 20-40 msec intervals. • Short-time pdf estimates are more accurately described by a Gaussian pdf, regardless of the speech class.
Adaptive Coding • A pdf derived from a short-time speech segment more accurately represents the speech nonstationarity. • One approach is to assume a pdf of a specific shape, in particular a Gaussian with unknown variance σ². • Measure the local variance, then adapt a nonuniform quantizer to the resulting local pdf. • This approach is referred to as adaptive quantization. • For a Gaussian we have: • px(x) = [1 / √(2πσx²)] · exp(−x² / (2σx²))
Adaptive Coding • Measure the variance σx² of the sequence x[n] and use the resulting pdf to design an optimal Max quantizer. • Note that a change in the variance simply scales the time signal: • If E(x²[n]) = σx², then E[(βx[n])²] = β²σx² • So one need design only one nonuniform quantizer with unity variance, and scale its decision and reconstruction levels according to the particular variance. • Alternatively, fix the quantizer and apply a time-varying gain to the signal according to the estimated variance (scale the signal to match the quantizer).
Adaptive Coding
Adaptive Coding • There are two possible approaches for estimating a time-varying variance σ²[n]: • Feed-forward method (shown in Figure 12.11), where the variance (or gain) estimate is obtained from the input. • Feedback method, where the estimate is obtained from the quantizer output. • Advantage – no need to transmit extra side information (quantized variance). • Disadvantage – additional sensitivity to transmission errors in codewords. • Adaptive quantizers can achieve higher SNR than μ-law companding. • μ-law companding is generally preferred for high-rate waveform coding because of its lower background noise when the transmission channel is idle. • Adaptive quantization is useful in a variety of other coding schemes.
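A feed-forward sketch: estimate the gain frame by frame from the input, set the range of an otherwise fixed uniform quantizer accordingly, then reconstruct. The frame length, bit depth, and 4σ loading factor below are illustrative assumptions, not values from the text:

```python
import math
import random

def adaptive_quantize(x, B=4, frame=160):
    """Feed-forward adaptive quantization: a per-frame standard-deviation
    estimate sets the range of a fixed B-bit uniform quantizer."""
    out = []
    nlevels = 2 ** B
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        sigma = max(math.sqrt(sum(v * v for v in seg) / len(seg)), 1e-12)
        xmax = 4 * sigma                    # range adapted to the local variance
        delta = 2 * xmax / nlevels
        for v in seg:
            i = min(max(int((v + xmax) / delta), 0), nlevels - 1)
            out.append(-xmax + (i + 0.5) * delta)
    return out

# Quiet segment followed by a loud one: adaptation rescues the quiet part.
random.seed(2)
x = [0.01 * random.gauss(0, 1) for _ in range(1600)] + \
    [random.gauss(0, 1) for _ in range(1600)]
mse = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
print("fixed   :", mse(x, adaptive_quantize(x, frame=len(x))))
print("adaptive:", mse(x, adaptive_quantize(x, frame=160)))
```

With a single global gain (frame = whole signal) the quiet segment is buried in quantization noise; per-frame adaptation shrinks the step size there and lowers the overall error.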
Differential and Residual Quantization • The methods presented so far are examples of instantaneous quantization. • Those approaches do not take advantage of the fact that speech is a highly correlated signal: • Short-time (10-15 samples), as well as • Long-time (over a pitch period). • In this section, methods that exploit the short-time correlation are investigated.
Differential and Residual Quantization • Short-time correlation: • Neighboring samples are “self-similar”, that is, not changing too rapidly from one another. • The difference of adjacent samples should have a lower variance than the variance of the signal itself. • This difference, thus, would make more effective use of the quantization levels: • Higher SNR for a fixed number of quantization levels. • The next sample can be predicted from previous ones (finding the best prediction coefficients to yield a minimum mean-squared prediction error – the same methodology as in LPC of Chapter 5). Two approaches: • Use a fixed prediction filter to reflect the average local correlation of the signal. • Allow the predictor to adapt over short time intervals to the signal’s local correlation. • The latter requires transmission of the quantized prediction coefficients as well as the prediction error.
Differential and Residual Quantization • An illustration of a particular error encoding scheme is presented in Figure 12.12 of the next slide. • In this scheme the following sequences are required: • x̃[n] – prediction of the input sample x[n]; this is the output of the predictor P(z), whose input is the quantized version of the input signal, i.e., x̂[n] • r[n] – prediction error signal (residual): r[n] = x[n] − x̃[n] • r̂[n] – quantized prediction error signal. • This approach is sometimes referred to as residual coding.
Differential and Residual Quantization
Differential and Residual Quantization • The quantizer in the previous scheme can be of any type: • Fixed • Adaptive • Uniform • Nonuniform • Whatever the case, the parameters of the quantizer are determined so as to match the variance of r[n]. • Differential quantization can also be applied to: • The speech signal itself • Parameters that represent speech: • LPC – linear prediction coefficients • Cepstral coefficients obtained from homomorphic filtering • Sinewave parameters, etc.
Differential and Residual Quantization • Consider the quantization error of the quantized residual: • er[n] = r̂[n] − r[n] • From Figure 12.12 we express the quantized input x̂[n] as: • x̂[n] = x̃[n] + r̂[n] = x̃[n] + r[n] + er[n] = x[n] + er[n]
Differential and Residual Quantization • The quantized signal samples differ from the input only by the quantization error er[n]. • Since er[n] is the quantization error of the residual: • ⇒ if the prediction of the signal is accurate, then the variance of r[n] will be smaller than the variance of x[n] • ⇒ a quantizer with a given number of levels can be adjusted to give a smaller quantization error than would be possible when quantizing the signal directly.
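A minimal differential-coding sketch with a fixed first-order predictor and a uniform residual quantizer (the predictor coefficient and step size are illustrative assumptions, not from the text). Because the predictor runs on quantized samples at both ends, the decoder tracks the encoder exactly, and the reconstruction satisfies x̂[n] = x[n] + er[n] with |er[n]| ≤ Δ/2:

```python
import math

def dpcm_encode(x, a=0.9, delta=0.05):
    """Predict from the previously quantized sample, quantize the residual."""
    xhat = 0.0
    codes = []
    for v in x:
        pred = a * xhat                 # prediction from the quantized past
        r = v - pred                    # residual r[n]
        c = round(r / delta)            # codeword: quantized residual index
        xhat = pred + c * delta         # quantized sample = prediction + r̂[n]
        codes.append(c)
    return codes

def dpcm_decode(codes, a=0.9, delta=0.05):
    """Run the same predictor on the received quantized residuals."""
    xhat = 0.0
    out = []
    for c in codes:
        xhat = a * xhat + c * delta
        out.append(xhat)
    return out

x = [math.sin(0.05 * n) for n in range(500)]
y = dpcm_decode(dpcm_encode(x))
print("max |x - xhat|:", max(abs(u - v) for u, v in zip(x, y)))
```

Feeding the predictor the quantized (rather than the true) past is the key design choice: it prevents the quantization error from accumulating, since encoder and decoder predict from identical values.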
Differential and Residual Quantization • The differential coder of Figure 12.12 is referred to as: • Differential PCM (DPCM) when used with • a fixed predictor and • fixed quantization. • Adaptive Differential PCM (ADPCM) when used with • adaptive prediction (i.e., adapting the predictor to the local correlation) • adaptive quantization (i.e., adapting the quantizer to the local variance of r[n]) • ADPCM yields the greatest gains in SNR for a fixed bit rate. • The international coding standard CCITT G.721, with toll-quality speech at 32 kbps (8000 samples/sec × 4 bits/sample), has been designed based on ADPCM techniques. • To achieve higher quality at lower rates it is necessary to: • Rely on speech model-based techniques, and • Exploit long-time prediction, as well as • Short-time prediction.