
    Presentation Transcript
    1. Embedded Audio Coder Jin Li

    2. Outline • Introduction • Embedded audio coder - Algorithm • MLT with window switching • Quantizer • Entropy coder • Bitstream assembly • Modular software design • Experimental results & demos • Conclusion

    3. Introduction

    4. Introduction – Audio Compression Audio Waveform . . . Bitstream

    5. EAC vs. Other Compression • Existing audio compression schemes • MP3, AAC, MPEG-4 audio, WMA, Real Audio, … • Why research a new audio codec?

    6. Media vs. File Compression • File compression • Every bit is important, has to be compressed losslessly • Media compression • Exact bit/value is not important, distortion is tolerable • Amount of media is huge, high compression ratio is required • Media needs adaptation

    7. Key Features of EAC • Not only good compression performance • But also flexible bitstream syntax • The compressed bitstream may be manipulated for • Different bitrate • Different # of audio channels • Different audio sampling rate • Versatile • Lossless • Low delay • Streaming/storage application

    8. EAC Encoder (diagram: the Encoder produces a Master Bitstream and a Companion File)

    9. Parser (diagram: the Parser reads the Master Bitstream and the Companion File and produces an Application Bitstream) • Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast) • May be changed according to the required bitrate, # of audio channels, and audio sampling rate

    10. EAC Decoder .wav file Encoder . . . Bitstream Speaker (Direct Sound)

    11. Embedded Audio Coder- Algorithm Description

    12. Framework - Encoder (diagram: the audio is split into L+R (or mono) and L-R; each path goes through Transform, Entropy coder and Bitstream Assembly to form the Bitstream)

    13. Audio Transform • Input: audio samples • Output: transform coefficients • Goal: convert audio from the time domain to the frequency domain • Compact energy • Better match with psychoacoustic characteristics • Enable audio sampling rate change

    14. Lossy vs Lossless Mode • Lossy mode: Audio → MLT (SW) → Quantization • Lossless mode: Audio → Reversible MLT (SW)

    15. Lossy (Float) Pass

    16. MLT - Modulated Lapped Transform (figure panels: frequency-domain response, time-domain response)

    17. MLT with Window Switching • Features • Basic window size 2048 • Short window size 256 • Switching criterion • A frame (2048 samples) is switched to short windows if and only if • Its energy is greater than a certain threshold • The energies of its 8 subframes (256 samples each) differ by more than Ta • There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter by Tb
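The three-part switching criterion on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not say whether Ta and Tb are ratios or differences, so the sketch treats them as energy ratios, and the function names and threshold values are hypothetical. The direction of the third test (former subframe stronger than the latter) follows the slide wording literally.

```python
def subframe_energies(frame, n_sub=8):
    """Energy (sum of squares) of each of the n_sub equal subframes."""
    sub_len = len(frame) // n_sub
    return [sum(s * s for s in frame[i * sub_len:(i + 1) * sub_len])
            for i in range(n_sub)]


def use_short_windows(frame, t_energy, t_a, t_b, n_sub=8):
    """Return True if all three conditions from the slide hold.

    t_energy, t_a and t_b are hypothetical thresholds; t_a and t_b are
    interpreted as energy ratios (an assumption, not stated in the source).
    """
    e = subframe_energies(frame, n_sub)
    # 1. Frame energy must exceed a threshold.
    if sum(e) <= t_energy:
        return False
    # 2. Subframe energies must differ by more than Ta (max/min ratio here).
    if max(e) <= t_a * min(e):
        return False
    # 3. Some pair of neighbors where the former exceeds the latter by Tb.
    return any(e[i] > t_b * e[i + 1] for i in range(n_sub - 1))


steady = [1.0] * 2048                      # constant energy in every subframe
transient = [10.0] * 256 + [0.1] * 1792    # energy concentrated in subframe 0
print(use_short_windows(steady, 100.0, 4.0, 4.0))
print(use_short_windows(transient, 100.0, 4.0, 4.0))
```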

    18. Band Separation (diagram: Audio (44.1 kHz sampling) → MLT with window switching → Band separation into the bands 0 to 0.125π, 0.125π to 0.25π, 0.25π to 0.5π, and 0.5π to π)

    19. Synthesis (Half Sampling) (diagram: the bands from 0 to 0.5π → MLT with window switching → Audio (22.05 kHz sampling))

    20. Synthesis (Quarter Sampling) (diagram: the bands from 0 to 0.25π → MLT with window switching → Audio (11.025 kHz sampling))
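The band layout on slides 18 to 20 suggests that a lower sampling rate can be obtained by keeping only the low-frequency coefficients. A minimal sketch of that idea, assuming the coefficients of a frame are ordered from low to high frequency (the function names are hypothetical and the slides do not give the actual data layout):

```python
# Band edges as fractions of pi, taken from the slides.
BAND_EDGES = [0.0, 0.125, 0.25, 0.5, 1.0]


def split_bands(coeffs):
    """Split one frame of coefficients into the four bands of slide 18."""
    n = len(coeffs)
    idx = [int(edge * n) for edge in BAND_EDGES]
    return [coeffs[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]


def coefficients_for_rate(coeffs, divisor):
    """Keep the coefficients needed for full (1), half (2) or quarter (4) rate."""
    return coeffs[:len(coeffs) // divisor]


frame = list(range(2048))
bands = split_bands(frame)
print([len(b) for b in bands])
print(len(coefficients_for_rate(frame, 2)), len(coefficients_for_rate(frame, 4)))
```

Under this assumption, half-rate synthesis uses the first three bands and quarter-rate synthesis uses the first two.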

    21. Quantizer • Input: coefficient • Output: quantized coefficient • Goal: convert coefficient from float to integer • Reduce signal levels • Fast implementation of entropy coding

    22. Quantizer • Scalar quantizer with a deadzone (figure: step size δ with a deadzone around 0; the output is a quantized magnitude and a sign)
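A deadzone scalar quantizer of the kind named on this slide can be sketched as below. The step size `delta`, the truncation toward zero (which makes the zero bin twice as wide as the others) and the mid-bin reconstruction offset of 0.5 are assumptions for illustration; the slide does not give these details.

```python
def quantize(x, delta):
    """Deadzone quantizer: returns (magnitude, sign).

    Truncating |x| / delta toward zero maps every x in (-delta, delta) to
    magnitude 0, so the zero bin (the deadzone) is twice as wide as the rest.
    """
    sign = -1 if x < 0 else 1
    return int(abs(x) / delta), sign


def dequantize(magnitude, sign, delta):
    """Reconstruct at the middle of the bin (assumed offset of 0.5)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * delta


print(quantize(45.7, 1.0))
print(quantize(-7.4, 2.0), dequantize(3, -1, 2.0))
```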

    23. Lossless (Integer) Pass

    24. Key to Achieve Lossless • Break the MLT into small steps • Make every step reversible • Definition of reversible transform • Integer input, integer output • The transform should have a determinant of 1 (do not expand data volume)

    25. MLT Framework • Forward MLT (lapped transform): Window → DCT-IV, where the DCT-IV consists of Pre-Rotate → Complex FFT → Post Rotation • Inverse MLT: Post Rotation⁻¹ → Complex FFT⁻¹ → Pre-Rotate⁻¹ → Inv Window⁻¹

    26. Window Operation x(-n-1) x(n) Complex Rotate

    27. Pre-Rotation (diagram: inputs xw(0) … xw(7), outputs xp(0) … xp(7); four complex rotations by -π/32, -5π/32, -9π/32 and -13π/32)

    28. FFT (4 Point Complex) (diagram: inputs xp(0) … xp(7) taken as complex values xc(0) … xc(3); outputs yc(0) … yc(3), i.e. yp(0) … yp(7); butterflies with twiddle factor e^(-jπ/2))

    29. Post-Rotation (diagram: inputs yp(0) … yp(7), outputs y(0) … y(7); four conjugate rotations by -0, -π/8, -2π/8 and -3π/8)

    30. Reversible MLT • Make the following operation reversible • Butterfly operation • Complex rotation • Conjugate rotation
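One standard way to make a plane rotation exactly reversible on integers is to factor it into three lifting (shear) steps with rounding; each step has determinant 1 and can be undone exactly. The sketch below uses that technique as an assumption: the slides do not state which factorization the coder actually uses, and the function names are hypothetical.

```python
import math


def _round(v):
    """Round to the nearest integer (ties go up)."""
    return math.floor(v + 0.5)


def rotate_forward(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta.

    Uses [1 p; 0 1] [1 0; s 1] [1 p; 0 1] with p = -tan(theta/2) and
    s = sin(theta). Not suitable for theta near +/- pi, where p blows up.
    """
    p = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x = x + _round(p * y)
    y = y + _round(s * x)
    x = x + _round(p * y)
    return x, y


def rotate_inverse(x, y, theta):
    """Undo rotate_forward exactly by reversing the three steps."""
    p = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x = x - _round(p * y)
    y = y - _round(s * x)
    x = x - _round(p * y)
    return x, y


fx, fy = rotate_forward(100, 0, math.pi / 8)
print((fx, fy), rotate_inverse(fx, fy, math.pi / 8))
```

Because each step only adds a rounded function of the other variable, the inverse subtracts exactly the same quantity, so no information is lost even though the rotation itself is only approximated.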

    31. Reversible Unit Transform

    32. Entropy Coder • Input: • quantized coefficients • Output: • embedded coded bitstream with R-D performance curve • Goal: • Compression • Embedded bitstream for future manipulation

    33. Frame Grouping Time slot 1 2 3 4 5 6 7 8 Frame

    34. Entropy Coder (diagram: the entropy coder outputs the Bitstream together with its R-D curve, distortion D versus rate R)

    35. Entropy Coder • Embedded coding • Implicit psychoacoustic masking • Context modeling • Arithmetic coding • Implementation concerns

    36. A block of coefficients
             45    0    0    0    0    0    0    0
            -74  -13    0    0    3    0    4    0
             21    0    4    0    0    3    5    0
             14    0   23   23    0    0    0    0
             -4    5    0    0    0    1   -1    0
            -18    0    0   19   -4   33    0   -1
              4    0   23    0    0    0    1    0
             -1    0    0    0    0    0    0    0

    37. Bits of Coefficients
            Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
            w0             45     0  1  0  1  1  0  1    +
            w1            -74     1  0  0  1  0  1  0    -
            w2             21     0  0  1  0  1  0  1    +
            w3             14     0  0  0  1  1  1  0    +
            w4             -4     0  0  0  0  1  0  0    -
            w5            -18     0  0  1  0  0  1  0    -
            w6              4     0  0  0  0  1  0  0    +
            w7             -1     0  0  0  0  0  0  1    -
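The sign-and-magnitude bit decomposition on this slide can be reproduced with a few lines. This is a minimal sketch: the helper name is hypothetical, and 7 magnitude bits (b1 most significant) are assumed because the slide labels the bits b1 to b7.

```python
def to_bitplanes(value, n_bits=7):
    """Return ([b1, ..., bn], sign) for the magnitude of value, MSB first."""
    magnitude = abs(value)
    bits = [(magnitude >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
    sign = '-' if value < 0 else '+'
    return bits, sign


# First column of the coefficient block on the preceding slide.
for w in [45, -74, 21, 14, -4, -18, 4, -1]:
    bits, sign = to_bitplanes(w)
    print(w, bits, sign)
```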

    38. Conventional Coding (figure: the bits b1 … b7 and the sign of w0 … w7 are coded coefficient by coefficient: first all bits of w0, second w1, third w2, and so on; the values shown for the first three coefficients are 46, -74 and 22, and the remaining five are shown as 0)

    39. Embedded Coding (figure: the bits are coded bitplane by bitplane: first b1 of all coefficients, second b2, third b3; value ranges and decoded values as shown on the slide)
            Coefficient   Value range   Decoded value
            w0              32..47          40
            w1             -79..-64        -72
            w2              16..31          24
            w3             -31..31           0
            w4             -31..31           0
            w5             -31..31         -24
            w6             -31..31           0
            w7             -31..31           0
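The value ranges in the embedded-coding figure follow from knowing only the first few bitplanes of each coefficient. A minimal sketch of that bookkeeping, assuming 7 magnitude bits and reconstruction at the middle of the known interval (the helper name is hypothetical, and the midpoint rule is inferred from values such as 40, -72 and 24 on the slide):

```python
def partial_decode(value, planes_coded, n_bits=7):
    """Range and reconstruction when only the top planes_coded bits are known.

    Returns ((low, high), reconstructed_value).
    """
    shift = n_bits - planes_coded
    width = 1 << shift                    # size of the remaining uncertainty
    known = (abs(value) >> shift) << shift
    if known == 0:
        # Still insignificant: sign unknown, so the range is symmetric.
        return (-(width - 1), width - 1), 0
    low, high = known, known + width - 1
    recon = known + width // 2
    if value < 0:
        return (-high, -low), -recon
    return (low, high), recon


print(partial_decode(45, 3))
print(partial_decode(-74, 3))
print(partial_decode(14, 2))
```

Each extra bitplane halves the uncertainty interval, which is why truncating an embedded bitstream still yields a usable, coarser reconstruction.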

    40. Audio Masking (figure labels: Signal, Signal-to-mask ratio, Masking Threshold, Maximum Mask, Noise-to-mask ratio, Noise Level; frequency axis with the Critical Band and the Neighboring Band marked)

    41. Psychoacoustic Masking • Traditional approach (explicit masking, all existing approaches) • Calculate the mask • Transmit the mask • Modify transform coefficients (or coding approach) according to the masking • Encode the transform coefficients • Note • Mask modifies the coding content

    42. Implicit Psychoacoustic Masking • Key • Mask modifies the coding order, the content is the same • Implicit masking • Calculate the static masking (Fletcher-Munson threshold) • Encode the MSB of the transform coefficients • Calculate the mask based on the MSB of the coefficients • Modify coding order • Encode the next most important part of the coefficients • Repeat the process

    43. Embedded Coding with Implicit Psychoacoustic Masking (figure, after the first pass; coefficients are marked as significant or insignificant, and w6 and w7, marked "Mask", have no bits coded yet)
            Coefficient   Value range   Decoded value
            w0             -63..63           0
            w1            -127..-64        -96
            w2             -63..63           0
            w3             -63..63           0
            w4             -63..63           0
            w5             -63..63           0
            w6            -127..127          0
            w7            -127..127          0

    44. Embedded Coding with Implicit Psychoacoustic Masking (figure, after the first and second passes; value ranges and decoded values as shown on the slide)
            Coefficient   Value range   Decoded value
            w0              32..63          48
            w1            -127..-64        -96
            w2             -31..31           0
            w3             -31..31           0
            w4             -31..31           0
            w5             -31..31           0
            w6             -63..63           0
            w7             -63..63           0

    45. Context Modeling • Context • Zero coding • Significance status of neighboring coefficients • Refinement • Whether it is the 1st refinement pass • Significance status of neighboring coefficients • Sign • Signs of neighboring coefficients

    46. After Implicit Psychoacoustic Masking & Context Modeling
             45    0    0    0    0    0    0    0
            -74  -13    0    0    3    0    4    0
             21    0    4    0    0    3    5    0
             14    0   23   23    0    0    0    0
             -4    5    0    0    0    1   -1    0
            -18    0    0   19   -4   33    0   -1
              4    0   23    0    0    0    1    0
             -1    0    0    0    0    0    0    0
        To be encoded (automatically generated):
            Bit: 0 1 1 0 0 0 0 0 0 1  0 0 0 0 0 0 0 0 0 ……
            Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……

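    47. Arithmetic Coding – Illustration (QM Coder used) • What is arithmetic coding (figure: the interval from 0 to 1 is subdivided in turn by the probabilities P0 / 1-P0, P1 / 1-P1 and P2 / 1-P2 for the symbols S0=0, S1=1, S2=0, giving the final interval A; the point 0.100 is marked) • The shortest binary bitstream ensures that the interval (B, C), from B = 0.100 0000000 to C = 0.100 1111111, lies within A • Coding result: 1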
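The idea illustrated on this slide, narrowing an interval symbol by symbol and then emitting the shortest bit string whose whole range of continuations stays inside it, can be sketched with exact fractions. This is a textbook-style illustration under assumptions, not the QM coder the slide mentions: symbol 0 is arbitrarily given the lower part of each interval, the probabilities in the example are made up, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def narrow(symbols, p_zero):
    """Return the final interval [low, high) for a binary symbol sequence.

    p_zero[i] is the probability that symbol i is 0; symbol 0 takes the
    lower part of the current interval (a convention assumed here).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0
        else:
            low = low + width * p0
            width = width * (1 - p0)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b with [0.b000..., 0.b111...] inside [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = math.ceil(low * 2 ** n)       # first n-bit prefix at or above low
        if (k + 1) * step <= high:        # all continuations stay below high
            return format(k, '0{}b'.format(n))
        n += 1


half = Fraction(1, 2)
low, high = narrow([0, 1, 0], [half, half, half])
print(low, high, shortest_code(low, high))
```

With skewed probabilities the final interval is wider and the code shorter, which is where the compression comes from.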

    48. Entropy Coder (Summary) Bitstream D R R-D curve

    49. Speed Up Issues • Context Modeling • Use stored context • Update context when a coefficient becomes significant • Implicit Masking • Fast calculation of energy in a critical band • Lookup table to convert energy to mask • R-D curve calculation • Lookup table calculation of distortion • Context entropy coder • QM coder • Run-length Rice coder

    50. Bitstream Assembly • Input : • Bitstream • R-D curve • Output : • Assembled bitstream • Companion file . . . Bitstream assembling