
Waveform Speech Coding Algorithms: An Overview

Waveform Speech Coding Algorithms: An Overview. June 20th, 2012. Adel Zaalouk. Outline: Introduction; Concepts (Quantization, PCM, DPCM, ADPCM); Standards & Applications (G711, G726); Performance Comparison & Examples; Summary & Conclusion.


Presentation Transcript


  1. Waveform Speech Coding Algorithms: An Overview June 20th, 2012 Adel Zaalouk

  2. Outline • Introduction • Concepts • Quantization • PCM • DPCM • ADPCM • Standards & Applications • G711 • G726 • Performance Comparison & Examples • Summary & Conclusion Technical Presentation Page 2

  3. Introduction Motivation What is speech coding? It is the procedure of representing a digitized speech signal as efficiently as possible while maintaining a reasonable level of speech quality. Why would we want to do that? To answer this, let’s have a look at the structure of the coding system.

  4. Introduction Motivation Filtering & Sampling (1)

  5. Introduction Motivation Filtering & Sampling (2)

  6. Introduction Motivation Filtering & Sampling (3)

  7. Introduction Motivation Filtering & Sampling (4) • Most of the speech content lies between 300 and 3400 Hz. • According to the Nyquist theorem, Fs >= 2 fm (to avoid aliasing). • A value of 8 kHz is selected (8 >= 2 * 3.4). • For good quality, 16 bits are used to represent each sample. • Input bit rate = 8 kHz * 16 bits = 128 kbps. • The input rate can be even higher: Skype, for example, uses a 16 kHz sampling frequency, which at 16 bits per sample results in an input rate of 256 kbit/s. This is a waste of bandwidth that could instead be used by other services and applications, which motivates source coding (speech coding in this context). [1]
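The sampling and input-rate arithmetic on this slide can be checked with a short sketch. The helper name `pcm_bit_rate` and the variable names are illustrative assumptions, not part of the presentation.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


# Highest speech frequency considered on the slide (telephone band).
max_speech_freq_hz = 3_400

# Nyquist: the sampling rate must be at least twice the highest frequency.
min_sample_rate_hz = 2 * max_speech_freq_hz

# 8 kHz is the rate chosen on the slide; it satisfies the Nyquist bound.
sample_rate_hz = 8_000
nyquist_ok = sample_rate_hz >= min_sample_rate_hz

# 16 bits per sample at 8 kHz gives the uncompressed input rate.
input_rate_bps = pcm_bit_rate(sample_rate_hz, 16)

print(f"minimum sampling rate: {min_sample_rate_hz} Hz")
print(f"input rate: {input_rate_bps // 1000} kbps")
```

The same helper makes it easy to see how quickly the raw rate grows when either the sampling rate or the word length is increased.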

  8. Introduction Motivation Desirable Properties of a Speech Coder • Low bit-rate: A lower bit-rate needs a smaller transmission bandwidth, leaving room for other services and applications. • High speech quality: Speech quality is the rival of low bit-rate. The decoded speech quality must be acceptable for the target application. • Low coding delay: Speech coding introduces extra delay, which might affect applications that have real-time requirements. [1]

  9. Introduction Speech Coding Categories What are the different categories of speech coding? • Speech coding is divided into three different categories: • Waveform codecs (PCM, DM, APCM, DPCM, ADPCM) • Vocoders (LPC, homomorphic, etc.) • Hybrid codecs (CELP, SELP, RELP, APC, SBC, etc.) [2]

  10. Concepts Quantization What is quantization? Quantization is the process of transforming the sample amplitude of a message into a discrete amplitude from a finite set of possible amplitudes. [3] Each sampled value is approximated with a quantized pulse; with a step size of q, the approximation results in an error no larger than q/2 in the positive direction or -q/2 in the negative direction.

  11. Concepts Quantization Understanding Quantization To understand quantization a bit more let’s have a look at the following Example:

  12. Concepts Quantization Classification of the Quantization Process • The quantization process is classified as follows: • Uniform quantization: The representation levels are equally (uniformly) spaced • Midtread type • Midrise type • Non-uniform quantization: The representation levels have variable spacing from one another. [4] But why do we need such a classification?
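The two uniform quantizer types can be sketched in a few lines. This is a minimal illustration using the usual textbook definitions (a midtread quantizer has a representation level at zero, a midrise quantizer has a decision threshold at zero); the function names and the step size `q` are assumptions made for the example.

```python
import math


def midtread(x, q):
    """Uniform midtread quantizer: levels at ..., -q, 0, q, ..."""
    return q * math.floor(x / q + 0.5)


def midrise(x, q):
    """Uniform midrise quantizer: levels at ..., -q/2, q/2, 3q/2, ..."""
    return q * (math.floor(x / q) + 0.5)


q = 1.0
for x in (-1.3, -0.2, 0.2, 0.7, 1.3):
    # In both cases the error magnitude never exceeds q / 2.
    print(x, midtread(x, q), midrise(x, q))
```

A small input such as 0.2 maps to 0 in the midtread case but to q/2 in the midrise case, which is the practical difference between the two types.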

  13. Concepts Quantization Human Speech – Excursion & Recap (1) • Speech can be broken into two different categories: • Voiced (zzzzz) • Unvoiced (sssss) • Naturally occurring speech signals are composed of a combination of the above categories. Take the word “Goat” for example: [4] “Goat” contains two voiced signals followed by a partial closure of the vocal tract and then an unvoiced signal. These occur at 3400-3900, 3900-5400, and 6300-6900, respectively.

  14. Concepts Quantization - Why do we need such a classification? (1) Human Speech – Excursion & Recap (2) • It should be noted that: • The peak-to-peak amplitude of voiced signals is approximately ten times that of unvoiced signals. • Unvoiced signals contain more information, and thus have higher entropy, than voiced signals. • The telephone system must therefore provide higher resolution for lower-amplitude signals. • Statistics of speech signals: probability of occurrence versus amplitude of speech signals. [6] [3]

  15. Concepts Quantization - Why do we need such a classification? (2) Quantization Noise • The quantization process is lossy (erroneous). • The error E, defined as the difference between the input signal M and the output signal V, is called the quantization noise. • Consider the simple example: • M = (3.117, 4.56, 2.31, 7.82, 1) • V = (3, 3, 2, 7, 2) • E = M - V = (0.117, 1.56, 0.31, 0.82, -1) • How do we calculate the noise power? • Consider an input m of continuous amplitude in the range (-M_max, M_max). • Assuming a uniform quantizer, how do we get the quantization noise power?
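One way to answer the noise-power question numerically is sketched below. It relies on the standard textbook result, stated here as an assumption rather than taken from the slide: for a uniform quantizer with L levels over (-M_max, M_max), the step size is q = 2 * M_max / L, and if the error is uniformly distributed over (-q/2, q/2), the noise power is q^2 / 12. The quantizer type (midrise), the value L = 16 and all names are choices made for the example.

```python
import math


def midrise_quantize(m, m_max, levels):
    """Uniform midrise quantizer with `levels` levels over (-m_max, m_max)."""
    q = 2 * m_max / levels
    index = math.floor(m / q)
    # Clamp to the valid cell range so out-of-range inputs saturate.
    index = min(max(index, -(levels // 2)), levels // 2 - 1)
    return q * (index + 0.5)


m_max, levels = 1.0, 16
q = 2 * m_max / levels

# Deterministic, evenly spaced inputs that cover (-m_max, m_max).
n = 16_000
inputs = [-m_max + 2 * m_max * (k + 0.5) / n for k in range(n)]

# Measured noise power: mean squared difference between input and output.
measured = sum((m - midrise_quantize(m, m_max, levels)) ** 2 for m in inputs) / n

# Predicted noise power under the uniform-error assumption.
predicted = q ** 2 / 12

print(f"measured  = {measured:.6e}")
print(f"predicted = {predicted:.6e}")
```

Because q shrinks as the number of levels grows, each extra bit of resolution halves q and reduces the noise power by a factor of four.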

  16. Concepts Quantization - Why do we need such a classification? (3) Comparison – Uniform vs. Non-Uniform Usage • Speech signals don’t require high quantization resolution for high amplitudes (50% vs. 15%). • Is it wasteful to use a uniform quantizer? • The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones. • Maybe use a non-uniform quantizer? [3]

  17. Concepts Quantization More About Non-Uniform Quantizers (Companding) • Non-uniform quantizer = use more levels where you need them. • The human ear follows a logarithmic process in which high-amplitude sounds don’t require the same resolution as low-amplitude sounds. • One way to achieve non-uniform quantization is to use what is called “companding”. • Companding = “compression + expanding”: compressor function → uniform quantization → expander function (the inverse of the compressor).

  18. Concepts Quantization What is the purpose of a compander? • The purpose of a compander is to equalize the histogram of speech signals so that the reconstruction levels tend to be equally used. [6] • There are two well-known companding techniques, each following its own encoding law: • A-Law companding • µ-Law companding

  19. Concepts Quantization A-Law Encoding µ-Law Encoding [3]
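The two encoding laws can be sketched as continuous compressor and expander functions on a normalized input in [-1, 1]. The formulas below are the usual textbook forms, given here as an assumption since the transcript does not carry the slide's own equations: µ-law is F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ), and A-law is F(x) = sgn(x) A|x| / (1 + ln A) for |x| < 1/A and sgn(x) (1 + ln(A|x|)) / (1 + ln A) otherwise. The constants µ = 255 and A = 87.6 are the values commonly associated with G.711; all function names are illustrative.

```python
import math

MU = 255.0   # assumed: the usual mu-law constant
A = 87.6     # assumed: the usual A-law constant


def mu_law_compress(x, mu=MU):
    """mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def a_law_compress(x, a=A):
    """A-law compressor: linear below 1/A, logarithmic above."""
    ax, k = abs(x), 1.0 + math.log(a)
    y = a * ax / k if ax < 1.0 / a else (1.0 + math.log(a * ax)) / k
    return math.copysign(y, x)


def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay, k = abs(y), 1.0 + math.log(a)
    x = ay * k / a if ay < 1.0 / k else math.exp(ay * k - 1.0) / a
    return math.copysign(x, y)


for x in (0.001, 0.01, 0.1, 1.0):
    # Small amplitudes are boosted strongly before uniform quantization.
    print(x, round(mu_law_compress(x), 4), round(a_law_compress(x), 4))
```

Quantizing the compressed value uniformly and then applying the expander gives the non-uniform quantizer: fine steps near zero, coarse steps near full scale.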

  20. Concepts Quantization Companding Approximation • Logarithmic functions are slow to compute, so why not approximate them? • 3 bits select one of 8 segments (chords) that approximate the curve • P is the sign bit of the output • The S bits are the segment code • The Q bits are the quantization code [3]

  21. Concepts Quantization Companding Approximation – Algorithm • Encoding • Add a bias of 33 to the absolute value of the input sample • Determine the bit position of the most significant 1 among bits 5 to 12 of the biased value • Subtract 5 from that position; the result is the segment code • Finally, the 4-bit quantization code is set to the 4 bits that follow the most significant 1 • Decoding • Multiply the quantization code by 2 and add the bias of 33 to the result • Multiply the result by 2 raised to the power of the segment code • Decrement the result by the bias • Use the P bit to determine the sign of the result • Example?! [3]

  22. Concepts Quantization µ-Law Encoding - Example • Example input: -656; output bits: P S2 S1 S0 Q3 Q2 Q1 Q0 • The sample is negative, so bit P becomes 1 • Add 33 to the absolute value to bias high input values (due to wrapping) • The result after adding is 689 = 0001010110001 in binary • The most significant 1 bit among positions 5 to 12 is at position 9 • Subtracting 5 from the position yields 4, the segment code (100) • Finally, the 4 bits after that position (0101) are inserted as the quantization code

  23. Concepts Quantization µ-Law Decoding - Example • Example input: -656; output bits: P S2 S1 S0 Q3 Q2 Q1 Q0 • The quantization code is 0101 = 5, so 5 * 2 + 33 = 43 • The segment code is 100 = 4, so 43 * 2^4 = 688 • Decrement by the bias: 688 - 33 = 655 • But P is 1, so the final result is -655 • The quantization noise is 1 (very small)
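The encoding and decoding steps from the last three slides can be sketched as follows. This follows the slides' simplified description only: it returns the sign, segment and quantization fields separately and does not apply the bit inversion or byte packing that a real G.711 implementation performs. The function names and the clipping of the magnitude to 13 bits are assumptions made for the example.

```python
BIAS = 33
MAX_MAGNITUDE = 8191 - BIAS  # keep the biased value within 13 bits


def mu_law_segment_encode(sample):
    """Return (sign bit P, 3-bit segment code, 4-bit quantization code)."""
    sign = 1 if sample < 0 else 0
    biased = min(abs(sample), MAX_MAGNITUDE) + BIAS
    # Position of the most significant 1; always between 5 and 12 here.
    position = biased.bit_length() - 1
    segment = position - 5
    # The 4 bits immediately below the most significant 1.
    quantization = (biased >> (position - 4)) & 0x0F
    return sign, segment, quantization


def mu_law_segment_decode(sign, segment, quantization):
    """Reverse the steps: (2 * Q + 33) * 2**S - 33, then apply the sign."""
    magnitude = ((2 * quantization + BIAS) << segment) - BIAS
    return -magnitude if sign else magnitude


code = mu_law_segment_encode(-656)
decoded = mu_law_segment_decode(*code)
print(code, decoded)
```

For the slide's input of -656 this reproduces the segment code 4, the quantization code 5 and the decoded value -655.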

  24. Concepts Quantization µ-Law Encoding • Approximately linear for small input values and logarithmic for high input values • The value of µ used in practice is 255 • Used for speech signals • Used for PCM telephone systems in the US, Canada and Japan A-Law Encoding • Linear segments for low-level inputs and a logarithmic segment for high-level inputs • The value of A used in practice is 87.6 • Used for PCM telephone systems in Europe

  25. Concepts Pulse Code Modulation (PCM) PCM Description • Sampling results in PAM • PCM uniformly quantizes PAM • The results of PCM are PCM words • Each PCM word is l = log2(L) bits long, where L is the number of quantization levels [3]
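The relation between the number of levels L and the word length l can be sketched with a tiny encoder that turns one PAM sample into a PCM word. The function name, the input range [-v_max, v_max) and the restriction to a power-of-two number of levels are assumptions made for the example.

```python
def pcm_encode(sample, levels, v_max=1.0):
    """Map a sample in [-v_max, v_max) to an l-bit PCM word, l = log2(levels)."""
    bits = levels.bit_length() - 1      # exact log2 for a power of two
    step = 2 * v_max / levels           # uniform step size
    index = int((sample + v_max) / step)
    index = min(max(index, 0), levels - 1)  # saturate out-of-range samples
    return format(index, f"0{bits}b")


for sample in (-1.0, -0.3, 0.0, 0.3, 0.99):
    print(sample, pcm_encode(sample, levels=8))
```

With L = 8 every word is 3 bits long; with L = 256 every word is 8 bits long, which at 8 kHz sampling gives 64 kbps.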

  26. Concepts Differential Pulse Code Modulation (DPCM) DPCM Description • Signals that are sampled at a high rate have highly correlated samples. • The difference between adjacent samples will not be large. • Instead of quantizing each sample, why not quantize the difference? • This results in a quantizer that needs far fewer bits. [7] • This is the simple first-order form, in which the prediction is the previous sample. • More than one past sample can be used in the prediction (N-th order). • Problems with this approach?

  27. Concepts Differential Pulse Code Modulation (DPCM) DPCM Example [7] • It is clear from the table that the error adds up to produce an output signal that is completely different from the original one.

  28. Concepts Differential Pulse Code Modulation (DPCM) DPCM Prediction • Previously, the input to the predictor in the encoder was different from the one in the decoder. • The difference between the predictors led to a reconstruction error e[n] = x[n] - x’[n]. • To solve this problem completely, the same predictor that is used in the decoder is also used in the encoder. • The reconstruction error at the decoder output is then the same as the quantization error at the encoder. • The quantization error no longer accumulates.
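The accumulation problem and its fix can be reproduced with a small first-order DPCM sketch. In the first variant the encoder predicts from the original previous sample, while the decoder can only use its own reconstructed output; in the second variant the encoder predicts from the reconstructed sample, exactly as the decoder does. The uniform quantizer, the step size, the ramp test signal and all names are assumptions made for the example.

```python
def quantize(value, step):
    """Uniform quantizer applied to the prediction difference."""
    return step * round(value / step)


def dpcm_mismatched(signal, step):
    """Encoder predicts from the original sample; decoder from its own output."""
    reconstructed, prev_input, prev_output = [], 0.0, 0.0
    for x in signal:
        difference = quantize(x - prev_input, step)
        prev_output += difference       # decoder side
        reconstructed.append(prev_output)
        prev_input = x                  # encoder side uses the clean input
    return reconstructed


def dpcm_matched(signal, step):
    """Encoder and decoder both predict from the reconstructed sample."""
    reconstructed, prev_output = [], 0.0
    for x in signal:
        difference = quantize(x - prev_output, step)
        prev_output += difference
        reconstructed.append(prev_output)
    return reconstructed


def max_error(signal, reconstructed):
    return max(abs(x - y) for x, y in zip(signal, reconstructed))


ramp = [0.4 * n for n in range(20)]
mismatched_error = max_error(ramp, dpcm_mismatched(ramp, step=1.0))
matched_error = max_error(ramp, dpcm_matched(ramp, step=1.0))
print(mismatched_error, matched_error)
```

On the slow ramp every difference of 0.4 is quantized to 0 in the mismatched variant, so the decoder output never moves and the error keeps growing. In the matched variant the error stays within half a quantizer step.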

  29. Concepts Adaptive Differential Pulse Code Modulation (ADPCM) ADPCM Description • As can be inferred from the name, ADPCM combines PCM and DPCM and adds adaptation • The “A” in ADPCM stands for “Adaptive” • In DPCM, the difference between x[k] and x[k-1] is transmitted instead of x[k] • To further reduce the number of bits per sample, ADPCM adapts the quantization levels to the characteristics of the analog signal. • The original 32 kbps ADPCM used 4 bits per sample. [9]
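The idea of adapting the quantizer to the signal can be sketched with a toy step-size rule: when the transmitted code lands in the outer half of the code range the step grows, otherwise it shrinks. This is a deliberately simplified illustration and not the adaptation algorithm of G.726 or any other standard; the multipliers, the first-order predictor and all names are assumptions.

```python
def adpcm_roundtrip(signal, bits=4, step=1.0, grow=1.5, shrink=0.9,
                    min_step=0.01):
    """Encode and decode `signal`; return the codes and the reconstruction."""
    max_code = 2 ** (bits - 1) - 1      # codes -7..7 for 4 bits
    prediction, codes, reconstructed = 0.0, [], []
    for x in signal:
        # Quantize the prediction difference with the current step size.
        code = round((x - prediction) / step)
        code = max(-max_code, min(max_code, code))
        # The decoder performs the same update from the code alone.
        prediction += code * step
        codes.append(code)
        reconstructed.append(prediction)
        # Adaptation uses only the code, so the decoder can mirror it.
        if abs(code) > max_code // 2:
            step *= grow
        else:
            step = max(min_step, step * shrink)
    return codes, reconstructed


signal = [0.0] * 5 + [100.0] * 30
codes, reconstructed = adpcm_roundtrip(signal)
print(codes)
print(round(reconstructed[-1], 2))
```

Because the step size is derived from the transmitted codes only, no side information is needed: the decoder can track the same step size on its own.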

  30. Standards, Examples & Applications G711 G711 Description • A waveform codec that was released in 1972 • Its formal name is Pulse Code Modulation (PCM), since it uses PCM in its encoding • G711 achieves a 64 kbps bit rate (8 kHz sampling frequency x 8 bits per sample) • G711 defines two main compression algorithms • µ-Law (used in North America & Japan) • A-Law (used in Europe and the rest of the world) • µ-law and A-law take 14-bit and 13-bit signed linear PCM samples, respectively, as input and compress them to 8-bit samples • Applications • Public Switched Telephone Network (PSTN) • WiFi phones (VoWLAN) • Wideband IP telephony • Audio & video conferencing • H.320 & H.323 specifications

  31. Standards, Examples & Applications G726 G726 Description • G726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24 or 16 kbps channel. • The conversion is applied to the raw PCM using the ADPCM encoding technique • Different rates are achieved by adapting the number of quantization levels • 4 levels (2 bits and 16 kbps) • 7 levels (3 bits and 24 kbps) • 15 levels (4 bits and 32 kbps) • 31 levels (5 bits and 40 kbps) • Includes G721 and G723 [12]
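The four channel rates follow directly from the 8 kHz sampling rate and the number of bits per ADPCM sample, as the short sketch below shows. The dictionary layout and the names are assumptions made for the example; the level counts are the ones listed on the slide.

```python
SAMPLE_RATE_HZ = 8_000
PCM_RATE_KBPS = 64  # the A-law or mu-law PCM channel being converted

# bits per ADPCM sample -> number of quantizer levels (from the slide)
G726_MODES = {2: 4, 3: 7, 4: 15, 5: 31}

# Channel rate in kbps for each mode: sampling rate times bits per sample.
rates_kbps = {bits: SAMPLE_RATE_HZ * bits // 1000 for bits in G726_MODES}

for bits, levels in G726_MODES.items():
    rate = rates_kbps[bits]
    print(f"{bits} bits, {levels} levels: {rate} kbps "
          f"({PCM_RATE_KBPS / rate:.1f}x smaller than 64 kbps PCM)")
```

The 4-bit mode, for instance, halves the 64 kbps PCM rate to 32 kbps.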

  32. Performance Comparison [1]

  33. Summary & Conclusion Summary • We talked about quantization concepts in all their flavors • We discussed the category of waveform coding (PCM, DPCM and ADPCM) • We presented the ITU standards (G711 and G726) and mentioned some examples and applications • Finally, we compared the most prominent speech codecs out there. Conclusion • Speech coding is an important concept that is required to use the existing bandwidth efficiently • There are many important metrics to keep in mind when doing speech coding, and a good speech coder must balance them. The most important ones are • Data rate • Speech quality • Delay • Waveform codecs achieve the best speech quality as well as low delays. • Vocoders achieve a low data rate, but at the cost of delay and speech quality. • Hybrid coders achieve acceptable speech quality with acceptable delay and data rate.

  34. References [1] Wai C. Chu: Speech Coding Algorithms: Foundation & Evolution of Standardized Coders [2] Speech Coding: http://www-mobile.ecs.soton.ac.uk/speech_codecs/ [3] Sklar: Digital Communication Fundamentals and Applications [4] A-Law and mu-Law Companding Implementations Using the TMS320C54x [5] Michael Langer: Data Compression – Introduction to Lossy Compression [6] Signal Quantization and Compression Overview: http://www.ee.ucla.edu/~dsplab/sqc/over.html [7] Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod., Lecture 25 [8] Yuli You: Audio Coding: Theory and Applications [9] Tarmo Anttalainen: Introduction to Telecommunication Networks Engineering [10] Wikipedia G711: http://en.wikipedia.org/wiki/G.711 [11] David Salomon: Data Communication the Complete Reference [12] ITU CCITT Recommendation G.726 ADPCM

  35. Questions & Discussion Thank you!!