Embedded Audio Coder - PowerPoint PPT Presentation

Presentation Transcript

  1. Embedded Audio Coder • Jin Li

  2. Outline • Introduction • Embedded audio coder - Algorithm • MLT with window switching • Quantizer • Entropy coder • Bitstream assembly • Modular software design • Experimental results & demos • Conclusion

  3. Introduction

  4. Introduction – Audio Compression • [Figure: an audio waveform is compressed into a bitstream]

  5. EAC vs. Other Compression Schemes • Existing audio compression schemes • MP3, AAC, MPEG-4 audio, WMA, RealAudio, … • Why research a new audio codec?

  6. Media vs. File Compression • File compression • Every bit is important and must be compressed losslessly • Media compression • Exact bits/values are not important; some distortion is tolerable • The amount of media is huge, so a high compression ratio is required • Media needs adaptation

  7. Key Features of EAC • Not only good compression performance • But also a flexible bitstream syntax • The compressed bitstream may be manipulated for • Different bitrates • Different numbers of audio channels • Different audio sampling rates • Versatile • Lossless • Low delay • Streaming/storage applications

  8. EAC Encoder • [Figure: the encoder produces a master bitstream and a companion file]

  9. Parser • [Figure: the parser reads the master bitstream and the companion file and produces an application bitstream] • Except for the header, the application bitstream is a subset of the master bitstream, so parsing is fast • The application bitstream may be changed according to the required bitrate, number of audio channels, and audio sampling rate

  10. EAC Decoder • [Figure: the decoder turns the bitstream into a .wav file or plays it through the speaker via DirectSound]

  11. Embedded Audio Coder – Algorithm Description

  12. Framework – Encoder • [Figure: the L+R (or mono) signal and the L-R signal each pass through an audio transform, an entropy coder, and bitstream assembly to form the bitstream]
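For the lossless mode, the L+R / L-R split itself has to be integer-reversible. A minimal sketch, assuming a common lifting form (side = L - R, mid = floor((L + R) / 2)); the slide only shows L+R and L-R, so this rounding variant and the function names are hypothetical, not EAC's confirmed method.

```python
def stereo_to_mid_side(left, right):
    """Hypothetical reversible mid/side split (lifting form, assumed, not confirmed for EAC)."""
    side = [l - r for l, r in zip(left, right)]          # L - R, exact in integers
    mid = [r + (s >> 1) for r, s in zip(right, side)]    # equals floor((L + R) / 2)
    return mid, side


def mid_side_to_stereo(mid, side):
    """Exact inverse: recompute the same rounded term and undo each step."""
    right = [m - (s >> 1) for m, s in zip(mid, side)]
    left = [r + s for r, s in zip(right, side)]
    return left, right
```

Halving L+R keeps the mid channel in the input's range, and because `>>` floors consistently for negative integers, the decoder recovers L and R exactly.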

  13. Audio Transform • Input: audio samples • Output: transform coefficients • Goal: convert audio from the time domain to the frequency domain • Compact the energy • Better match psychoacoustic characteristics • Enable changes of the audio sampling rate

  14. Lossy vs. Lossless Mode • Lossy mode: audio → MLT (SW) → quantization • Lossless mode: audio → reversible MLT (SW)

  15. Lossy (Float) Pass

  16. MLT – Modulated Lapped Transform • [Figure: frequency-domain and spatial (time-domain) response of the MLT]

  17. MLT with Window Switching • Features • Basic window size: 2048 samples • Short window size: 256 samples • Switching criterion • A frame (2048 samples) is switched to short windows if and only if • Its energy is larger than a certain threshold • The energies of its 8 subframes (256 samples each) differ by more than Ta • There are two neighboring subframes where the energy of the former exceeds that of the latter by Tb
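The three switching conditions can be sketched as a single test per frame. This is a hypothetical reading: the slide does not say how "differ by more than Ta" or "exceeds by Tb" are measured, so both are treated as energy ratios here, and the function name and parameters are illustrative.

```python
def use_short_windows(frame, energy_threshold, ta, tb, n_sub=8):
    """Hypothetical window-switching test (ratio interpretation of Ta and Tb is assumed)."""
    if len(frame) % n_sub:
        raise ValueError("frame length must be a multiple of n_sub")
    sub_len = len(frame) // n_sub                       # 2048 / 8 = 256 samples
    energies = [sum(x * x for x in frame[i * sub_len:(i + 1) * sub_len])
                for i in range(n_sub)]
    # Condition 1: the frame energy must exceed the threshold.
    if sum(energies) <= energy_threshold:
        return False
    # Condition 2: subframe energies must differ by more than a factor ta.
    eps = 1e-12                                         # guards against silent subframes
    if max(energies) / (min(energies) + eps) <= ta:
        return False
    # Condition 3: some subframe's energy exceeds its successor's by more than a factor tb.
    return any(energies[i] > tb * energies[i + 1] for i in range(n_sub - 1))
```

Requiring all three conditions keeps quiet or stationary frames on the long 2048-sample window, so the short 256-sample windows are used only for frames with a strong energy change.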

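  18. Band Separation • [Figure: 44.1 kHz audio passes through the MLT with window switching, and the coefficients are separated into the bands 0–0.125π, 0.125π–0.25π, 0.25π–0.5π, and 0.5π–π]

  19. Synthesis (Half Sampling) • [Figure: only the bands below 0.5π (0–0.125π, 0.125π–0.25π, 0.25π–0.5π) pass through the MLT with window switching, producing audio at 22.05 kHz sampling]

  20. Synthesis (Quarter Sampling) • [Figure: only the bands below 0.25π (0–0.125π, 0.125π–0.25π) pass through the MLT with window switching, producing audio at 11.025 kHz sampling]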
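Because MLT coefficient k of an N-point frame covers roughly the frequencies kπ/N to (k+1)π/N, lowering the sampling rate amounts to keeping only the lower bands before synthesis. A minimal sketch, assuming 2048-coefficient frames and a hypothetical helper `select_bands`; the gain normalization a smaller inverse MLT may need is not shown.

```python
def select_bands(coeffs, divisor, sample_rate=44100.0):
    """Hypothetical helper: keep the MLT coefficients covering 0..pi/divisor."""
    if divisor not in (1, 2, 4):
        raise ValueError("the slides show full, half, and quarter sampling only")
    n_keep = len(coeffs) // divisor          # e.g. 2048 -> 1024 (0..0.5*pi) or 512 (0..0.25*pi)
    return coeffs[:n_keep], sample_rate / divisor
```

An inverse MLT of length `n_keep` then synthesizes audio at the reduced rate directly, without a separate resampling filter.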

  21. Quantizer • Input: coefficients • Output: quantized coefficients • Goal: convert coefficients from floating point to integers • Reduce the number of signal levels • Enable a fast implementation of the entropy coder

  22. Quantizer • Scalar quantizer with a deadzone d around 0 • [Figure: quantizer staircase; each coefficient is represented by a quantized magnitude and a sign]
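The deadzone quantizer can be sketched as follows. The slide shows a deadzone d but gives neither its width nor the reconstruction rule, so the parameter `deadzone` (standing for d) and the midpoint reconstruction are assumptions.

```python
import math


def quantize(x, step, deadzone):
    """Deadzone scalar quantizer: |x| < deadzone maps to 0, then uniform cells of width step."""
    sign = -1 if x < 0 else 1
    if abs(x) < deadzone:
        return 0, sign
    return math.floor((abs(x) - deadzone) / step) + 1, sign


def dequantize(mag, sign, step, deadzone):
    """Reconstruct at the midpoint of the decoded cell (a common choice, assumed here)."""
    if mag == 0:
        return 0.0
    return sign * (deadzone + (mag - 0.5) * step)
```

With `deadzone == step`, the zero cell (-step, step) is twice as wide as every other cell. Many small coefficients then quantize to 0, and the magnitude/sign split matches what the bitplane coder consumes.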

  23. Lossless (Integer) Pass

  24. Key to Achieving Lossless Coding • Break the MLT into small steps • Make every step reversible • Definition of a reversible transform • Integer input, integer output • The transform should have a determinant of 1 (it does not expand the data volume)

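  25. MLT Framework • Forward MLT: window → pre-rotation → complex FFT → post-rotation • Inverse MLT: inverse post-rotation → inverse complex FFT → inverse pre-rotation → inverse window • The window stage forms the lapped transform; pre-rotation, complex FFT, and post-rotation together form the DCT-IV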

  26. Window Operation • [Figure: each pair of samples x(n) and x(-n-1) is combined by a complex rotation]

  27. Pre-Rotation • [Figure: the windowed samples xw(0)…xw(7) are grouped into four complex values and rotated by -π/32, -5π/32, -9π/32, and -13π/32, giving xp(0)…xp(7)]

  28. FFT (4-Point Complex) • [Figure: butterfly network that takes the pre-rotated values xp(0)…xp(7) as four complex inputs xc(0)…xc(3) and produces four complex outputs yc(0)…yc(3), i.e. yp(0)…yp(7), using additions, subtractions, and the twiddle factor e^(-jπ/2)]

  29. Post-Rotation • [Figure: yp(0)…yp(7) pass through conjugate rotations by -0, -π/8, -2π/8, and -3π/8, giving the outputs y(0)…y(7)]
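Pre-rotation, a complex FFT of half length, and post-rotation together compute a DCT-IV. The floating-point sketch below uses the standard factorization that pairs x[2i] with x[N-1-2i] into one complex value. For N = 8 its pre-rotation angles are -π/32, -5π/32, -9π/32, -13π/32 and its post-rotation angles are 0, -π/8, -2π/8, -3π/8, consistent with slides 27 and 29. Several parts are assumptions: the pairing, the naive recursive FFT, and the helper names. The window (lapped) stage is omitted, and `dct4_direct` is only a reference for checking.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 complex FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct4_direct(x):
    """Reference DCT-IV: X[k] = sum_n x[n] cos(pi/N (n + 1/2)(k + 1/2))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5)) for i in range(n))
            for k in range(n)]


def dct4_fast(x):
    """DCT-IV of length N via pre-rotation, N/2-point complex FFT, post-rotation."""
    n = len(x)
    m = n // 2
    # Pre-rotation: complex value x[2i] + j*x[N-1-2i], rotated by -pi*(4i+1)/(4N).
    z = [complex(x[2 * i], x[n - 1 - 2 * i]) * cmath.exp(-1j * math.pi * (4 * i + 1) / (4 * n))
         for i in range(m)]
    zf = fft(z)
    out = [0.0] * n
    for k in range(m):
        y = zf[k] * cmath.exp(-1j * math.pi * k / n)    # post-rotation by -pi*k/N
        out[2 * k] = y.real                             # even outputs from the real part
        out[n - 1 - 2 * k] = -y.imag                    # odd outputs from the conjugated part
    return out
```

The fast path replaces the O(N²) sum with one N/2-point FFT plus two sets of rotations, and it is exactly these rotations and FFT butterflies that the reversible MLT later makes integer-to-integer.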

  30. Reversible MLT • Make the following operations reversible • Butterfly operation • Complex rotation • Conjugate rotation

  31. Reversible Unit Transform
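A rotation has determinant 1, and it can be made integer-to-integer and exactly reversible by splitting it into three lifting (shear) steps and rounding inside each step. The sketch below uses this well-known three-step lifting factorization. The slides do not say whether EAC uses exactly this factorization, and the function names are hypothetical.

```python
import math


def _lifting_coeffs(theta):
    """Factors of R(theta) = [[1, p], [0, 1]] @ [[1, 0], [s, 1]] @ [[1, p], [0, 1]]."""
    s = math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("theta must not be a multiple of pi in this sketch")
    return (math.cos(theta) - 1.0) / s, s


def rotate_int(x, y, theta):
    """Integer-to-integer approximation of rotating (x, y) by theta, via three lifting steps."""
    p, s = _lifting_coeffs(theta)
    x += round(p * y)
    y += round(s * x)
    x += round(p * y)
    return x, y


def unrotate_int(x, y, theta):
    """Exact inverse of rotate_int: undo the same rounded steps in reverse order."""
    p, s = _lifting_coeffs(theta)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y
```

Each step changes only one coordinate by a rounded function of the other, so the decoder can subtract exactly the same integer. A conjugate rotation can reuse this with an exact sign flip of one component. For angles near ±π the factor p grows large; one common remedy (an assumption here) is to apply an exact negation first.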

  32. Entropy Coder • Input: • quantized coefficients • Output: • an embedded bitstream with its rate-distortion (R-D) performance curve • Goal: • compression • an embedded bitstream for later manipulation

  33. Frame Grouping • [Figure: frames arranged over time slots 1 to 8]

  34. Entropy Coder • [Figure: the entropy coder outputs a bitstream and its R-D curve, distortion D versus rate R]

  35. Entropy Coder • Embedded coding • Implicit psychoacoustic masking • Context modeling • Arithmetic coding • Implementation concerns

  36. A Block of Coefficients (8 × 8, listed in groups of 8)
      45 0 0 0 0 0 0 0
      -74 -13 0 0 3 0 4 0
      21 0 4 0 0 3 5 0
      14 0 23 23 0 0 0 0
      -4 5 0 0 0 1 -1 0
      -18 0 0 19 -4 33 0 -1
      4 0 23 0 0 0 1 0
      -1 0 0 0 0 0 0 0

  37. Bits of Coefficients • Each coefficient is written as magnitude bits b1 (MSB) to b7 (LSB) plus a sign:
      w0   45: 0 1 0 1 1 0 1  +
      w1  -74: 1 0 0 1 0 1 0  -
      w2   21: 0 0 1 0 1 0 1  +
      w3   14: 0 0 0 1 1 1 0  +
      w4   -4: 0 0 0 0 1 0 0  -
      w5  -18: 0 0 1 0 0 1 0  -
      w6    4: 0 0 0 0 1 0 0  +
      w7   -1: 0 0 0 0 0 0 1  -

  38. Conventional Coding • All bits and the sign of one coefficient are coded before moving to the next: first w0, second w1, third w2, … • [Figure: the bit table of w0…w7; after the first, second, and third steps the decoder has recovered 46, -74, and 22, while the remaining coefficients are still 0]

  39. Embedded Coding • Coefficients are coded bitplane by bitplane: b1 of every coefficient first, then b2, then b3, … • [Figure: value range of each coefficient after the first, second, and third passes; e.g. w0 lies in 32..47 (reconstructed as 40), w1 in -79..-64 (-72), and w2 in 16..31 (24), while coefficients that are not yet significant are reconstructed as 0]
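The bit table and the embedded value ranges can be reproduced with a few lines. A minimal sketch, assuming 7 magnitude bits and reconstruction at the middle of the known range; the function names are hypothetical.

```python
def magnitude_bits(c, nbits=7):
    """Magnitude bits b1 (MSB) .. b_nbits (LSB) of an integer coefficient."""
    return [(abs(c) >> (nbits - 1 - i)) & 1 for i in range(nbits)]


def reconstruct_after(coeffs, passes, nbits=7):
    """Decoder estimate after the first `passes` bitplanes of every coefficient."""
    shift = nbits - passes
    out = []
    for c in coeffs:
        msb = abs(c) >> shift                  # magnitude bits decoded so far
        if msb == 0:
            out.append(0)                      # not yet significant
            continue
        mag = msb << shift                     # lower end of the known range
        if shift > 0:
            mag += 1 << (shift - 1)            # middle of [msb*2^shift, (msb+1)*2^shift)
        out.append(-mag if c < 0 else mag)
    return out
```

Applied to w0…w7 after three passes, the midpoint rule gives 40, -72, and 24 for w0 to w2, as on slide 39. After one pass it gives -96 for w1. Truncating the bitstream after any pass still yields a usable, coarser reconstruction, which is what makes the bitstream embedded.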

  40. Audio Masking • [Figure: level versus frequency, showing a signal, its masking threshold, the maximum mask within a critical band and its neighboring band, the noise level, the signal-to-mask ratio, and the noise-to-mask ratio]

  41. Psychoacoustic Masking • Traditional approach (explicit masking, used by all existing approaches) • Calculate the mask • Transmit the mask • Modify the transform coefficients (or the coding approach) according to the mask • Encode the transform coefficients • Note • The mask modifies the coded content

  42. Implicit Psychoacoustic Masking • Key • The mask modifies only the coding order; the coded content is the same • Implicit masking • Calculate the static mask (Fletcher-Munson threshold) • Encode the MSBs of the transform coefficients • Calculate the mask from the MSBs of the coefficients • Modify the coding order • Encode the next most important part of the coefficients • Repeat the process
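The key point of implicit masking is that the mask is computed only from bits that have already been coded, so the decoder can repeat the computation and no mask is transmitted. A minimal sketch of that idea follows. The mask formula (static floor, `alpha` times band RMS) and the "lowest mask first" ordering rule are hypothetical illustrations, not EAC's actual psychoacoustic model.

```python
import math


def band_masks(recon, bands, static_mask, alpha):
    """Hypothetical per-band mask from values both encoder and decoder already know."""
    masks = []
    for (lo, hi), floor_mask in zip(bands, static_mask):
        rms = math.sqrt(sum(v * v for v in recon[lo:hi]) / (hi - lo))
        masks.append(max(floor_mask, alpha * rms))   # static threshold as a lower bound
    return masks


def coding_order(recon, bands, static_mask, alpha):
    """Bands in the order they receive their next bitplane: lowest mask (most audible noise) first."""
    masks = band_masks(recon, bands, static_mask, alpha)
    return sorted(range(len(bands)), key=lambda b: (masks[b], b))
```

Both sides call `coding_order` on the same partial reconstruction (for example the output of the embedded decoder after a pass), so they agree on the next coding order without any side information.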

  43. Embedded Coding with Implicit Psychoacoustic Masking (First Pass) • [Figure: bits b1…b7, signs, and value ranges of w0…w7, each coefficient marked significant or insignificant, with the mask shown; after the first pass w1 is significant in -127..-64 (reconstructed as -96), some coefficients are narrowed to -63..63, and the others remain in -127..127]

  44. Embedded Coding with Implicit Psychoacoustic Masking (Second Pass) • [Figure: after the second pass w0 becomes significant in 32..63 (reconstructed as 48), w1 remains in -127..-64 (-96), and the other coefficients lie in -31..31 or -63..63]

  45. Context Modeling • Contexts • Zero coding: significance status of neighboring coefficients • Refinement: whether it is the first refinement pass; significance status of neighboring coefficients • Sign: signs of neighboring coefficients
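The three context types on slide 45 can be sketched over the 8 × 8 coefficient block. This sketch assumes an 8-neighbor neighborhood, a 4-context refinement model, and a 9-context sign model built from the left and upper neighbors. All three definitions and the function names are hypothetical, so the numbers will not match the Ctx values on slide 46.

```python
def zero_coding_context(significant, row, col):
    """Hypothetical zero-coding context: number of significant 8-neighbors (0..8)."""
    rows, cols = len(significant), len(significant[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr, dc) != (0, 0) and 0 <= r < rows and 0 <= c < cols:
                count += significant[r][c]
    return count


def refinement_context(first_refinement, significant, row, col):
    """Hypothetical refinement context: first pass or not, any significant neighbor or not."""
    has_neighbor = zero_coding_context(significant, row, col) > 0
    return 2 * int(first_refinement) + int(has_neighbor)


def sign_context(signs, row, col):
    """Hypothetical sign context from left and upper neighbor signs (-1, 0, +1): 9 contexts."""
    left = signs[row][col - 1] if col > 0 else 0
    up = signs[row - 1][col] if row > 0 else 0
    return 3 * (left + 1) + (up + 1)
```

Each context selects a separate adaptive probability in the arithmetic coder, so a bit whose neighbors are already significant (and therefore likely to be 1) is coded more cheaply.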

  46. After Implicit Psychoacoustic Masking & Context Modeling • [Figure: the coefficient block of slide 36] • Bits to be encoded, with their automatically generated contexts: • Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 … • Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 …

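  47. Arithmetic Coding – Illustration (QM Coder Used) • What is arithmetic coding? • [Figure: the unit interval is split successively for the symbols S0 = 0, S1 = 1, S2 = 0 using the probabilities P0, P1, P2, giving a final interval A; the coder emits the shortest binary bitstream whose interval (B, C), from B = 0.100 0000000… to C = 0.100 1111111…, lies inside A] • Coding result: 1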
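The interval-splitting idea on slide 47 can be sketched with exact fractions. This is an idealized, infinite-precision binary arithmetic coder for illustration, not the QM coder EAC uses. The function names are hypothetical, and the per-symbol probabilities of 0 must be passed as `Fraction` values.

```python
import math
from fractions import Fraction


def arith_encode(symbols, p0s):
    """Ideal binary arithmetic coder: narrow [low, low+width), then emit the shortest code."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p0s):
        if s == 0:
            width *= p0                       # symbol 0 takes the lower sub-interval
        else:
            low += width * p0                 # symbol 1 takes the upper sub-interval
            width *= 1 - p0
    high = low + width
    k = 0
    while True:                               # shortest [m/2^k, (m+1)/2^k) inside [low, high)
        m = math.ceil(low * 2 ** k)
        if Fraction(m + 1, 2 ** k) <= high:
            return format(m, f"0{k}b") if k else ""
        k += 1


def arith_decode(bits, p0s):
    """Repeat the same splits and follow the sub-interval that contains the code value."""
    v = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for p0 in p0s:
        split = low + width * p0
        if v < split:
            out.append(0)
            width *= p0
        else:
            out.append(1)
            low = split
            width *= 1 - p0
    return out
```

For example, with P(0) = 3/4 for every symbol, the sequence 0, 1, 0 narrows the interval to [36/64, 45/64), and the shortest dyadic interval inside it is [9/16, 10/16), coded as `1001`. A practical coder such as QM replaces the fractions with fixed-precision registers and adaptive probability estimates.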

  48. Entropy Coder (Summary) • [Figure: the entropy coder outputs a bitstream and its R-D curve, distortion D versus rate R]

  49. Speed-Up Issues • Context modeling • Use stored contexts • Update the contexts when a coefficient becomes significant • Implicit masking • Fast calculation of the energy in a critical band • Lookup table to convert energy to mask • R-D curve calculation • Lookup-table calculation of distortion • Context entropy coder • QM coder • Run-length Rice coder
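The slide names a run-length Rice coder without details. A minimal sketch follows, assuming a fixed Rice parameter `k` and runs defined as the number of zeros before each 1 (plus a trailing run). An actual coder may adapt `k`, and all function names are hypothetical.

```python
def rice_encode(n, k):
    """Golomb-Rice code of n >= 0: quotient in unary (q ones, then a zero), remainder in k bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def rice_decode(code, pos, k):
    """Decode one value starting at code[pos]; returns (value, next position)."""
    q = 0
    while code[pos] == "1":
        q += 1
        pos += 1
    pos += 1                                    # skip the terminating zero
    r = int(code[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k


def encode_zero_runs(bits, k):
    """Run-length step: zero-run length before each 1, plus the trailing run, each Rice coded."""
    runs, run = [], 0
    for b in bits:
        if b:
            runs.append(run)
            run = 0
        else:
            run += 1
    runs.append(run)
    return "".join(rice_encode(r, k) for r in runs)


def decode_zero_runs(code, k):
    """Inverse of encode_zero_runs: read runs until the code is exhausted."""
    runs, pos = [], 0
    while pos < len(code):
        value, pos = rice_decode(code, pos, k)
        runs.append(value)
    bits = []
    for r in runs[:-1]:
        bits += [0] * r + [1]
    return bits + [0] * runs[-1]
```

Insignificant bitplanes are dominated by long zero runs, so coding run lengths with a simple Rice code needs no probability state and is much faster than coding each bit arithmetically.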

  50. Bitstream Assembly • Input: • Bitstream • R-D curve • Output: • Assembled bitstream • Companion file • [Figure: bitstream assembling]
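One way an assembler (or the parser) could use the R-D curves is to cut each frame's embedded bitstream where the distortion reduction per bit is steepest, until a bit budget is used up. This is similar in spirit to post-compression R-D optimization in JPEG 2000. It is an assumption: the slides do not describe EAC's actual rule, and `allocate` is a hypothetical helper that expects convex R-D curves given as (rate, distortion) truncation points starting at (0, D0).

```python
import heapq


def allocate(curves, budget):
    """Greedy truncation: repeatedly take the step with the largest distortion drop per bit."""
    idx = [0] * len(curves)                  # chosen truncation point per frame
    used = 0
    heap = []

    def push(f):
        i = idx[f]
        if i + 1 < len(curves[f]):
            (r0, d0), (r1, d1) = curves[f][i], curves[f][i + 1]
            slope = (d0 - d1) / (r1 - r0)    # distortion reduction per bit
            heapq.heappush(heap, (-slope, f, r1 - r0))

    for f in range(len(curves)):
        push(f)
    while heap:
        _, f, dr = heapq.heappop(heap)
        if used + dr > budget:               # steepest remaining step no longer fits
            break
        used += dr
        idx[f] += 1
        push(f)
    return idx, used
```

With convex curves the steps are taken in order of decreasing slope, so a smaller budget always selects a prefix of what a larger budget selects. That property is what lets a lower-rate application bitstream be a subset of the master bitstream.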