CS 414 – Multimedia Systems Design Lecture 11 – MP3 Audio & Introduction to MPEG-4 (Part 6)

  1. CS 414 – Multimedia Systems Design, Lecture 11 – MP3 Audio & Introduction to MPEG-4 (Part 6) Klara Nahrstedt, Spring 2011 CS 414 - Spring 2011

  2. Administrative • MP1 – deadline February 18

  3. Outline • MP3 Audio Encoding • MPEG-2 and Intro to MPEG-4 • Reading: • Media Coding book, Sections 7.7.2–7.7.5 • Recommended paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 60-74, 1995 • Recommended book on JPEG/MPEG audio/video fundamentals: • Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996

  4. MPEG-1 Audio Encoding • Characteristics • Precision: 16 bits • Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz • 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3) • Layer 3: 32–320 kbps, target 64 kbps • Layer 2: 32–384 kbps, target 128 kbps • Layer 1: 32–448 kbps, target 192 kbps
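
The target bit rates are easier to compare as compression ratios. This is a small sketch under an assumption not stated on the slide: each target applies to a single 16-bit channel sampled at 44.1 kHz. Under that assumption uncompressed PCM runs at 705.6 kbps, so the Layer 1, 2 and 3 targets correspond to roughly 3.7:1, 5.5:1 and 11:1.

```python
# Sketch: compression ratio of each MPEG-1 audio layer at its target bit rate.
# ASSUMPTION: the target applies to one 16-bit channel sampled at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16

# Target bit rates from the slide, in kbps.
TARGET_KBPS = {"Layer 1": 192, "Layer 2": 128, "Layer 3": 64}


def pcm_bitrate_kbps(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bits_per_sample: int = BITS_PER_SAMPLE,
                     channels: int = 1) -> float:
    """Bit rate of uncompressed PCM audio, in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(target_kbps: float) -> float:
    """How many times smaller the coded stream is than the mono PCM input."""
    return pcm_bitrate_kbps() / target_kbps


if __name__ == "__main__":
    print(f"PCM: {pcm_bitrate_kbps():.1f} kbps")
    for layer, kbps in TARGET_KBPS.items():
        print(f"{layer}: {kbps} kbps, about {compression_ratio(kbps):.1f}:1")
```

Higher layers trade encoder complexity for a higher compression ratio at comparable quality, which is why Layer 3 has the lowest target.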

  5. MPEG Audio Encoding Steps

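  6. MPEG Audio Filter Bank • The filter bank divides the input into multiple sub-bands (32 equal-width frequency sub-bands) • Sub-band i is defined as

$$s_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k]\,\bigl(C[k+64j]\,x[k+64j]\bigr), \qquad M[i][k] = \cos\!\left[\frac{(2i+1)(k-16)\pi}{64}\right]$$

where s_t[i] is the filter output sample for sub-band i at time t, C[n] is one of the 512 coefficients of the analysis window, x[n] is an audio input sample from the 512-sample buffer, and M[i][k] are the analysis matrix coefficients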
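
The polyphase filter bank computes s_t[i] = Σ_{k=0..63} Σ_{j=0..7} M[i][k]·C[k+64j]·x[k+64j], with M[i][k] = cos((2i+1)(k−16)π/64), the form given in Pan's tutorial. A minimal sketch of one time step follows: window and fold the buffer, then apply the 32 × 64 matrix. The real 512 window coefficients C[n] are tabulated in the standard and are not reproduced here, so any window passed in the example is a hypothetical placeholder.

```python
from __future__ import annotations

import math

NUM_SUBBANDS = 32
WINDOW_LEN = 512


def analysis_matrix() -> list[list[float]]:
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64) for i in 0..31, k in 0..63."""
    return [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
            for i in range(NUM_SUBBANDS)]


def subband_samples(x: list[float], C: list[float],
                    M: list[list[float]] | None = None) -> list[float]:
    """Return s_t[0..31] for one time step t.

    x: the 512-sample input buffer (the encoder shifts in 32 new samples per step).
    C: the 512 analysis-window coefficients; the standard's table is NOT included,
       so callers must supply their own values.
    """
    if len(x) != WINDOW_LEN or len(C) != WINDOW_LEN:
        raise ValueError("x and C must each hold 512 values")
    if M is None:
        M = analysis_matrix()
    # Window the buffer and fold it into 64 partial sums:
    # y[k] = sum over j of C[k + 64j] * x[k + 64j].
    y = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]
    # Matrixing: s_t[i] = sum over k of M[i][k] * y[k].
    return [sum(M[i][k] * y[k] for k in range(64)) for i in range(NUM_SUBBANDS)]
```

As a quick indexing check, a toy window with C[16] = 1 (all other coefficients 0) and an all-ones buffer makes every sub-band output equal M[i][16] = cos 0 = 1; this verifies the arithmetic, not the filter's frequency response.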

  7. MPEG Audio Psycho-acoustic Model • MPEG audio compresses by removing acoustically irrelevant parts of the audio signal • It takes advantage of the human auditory system's inability to hear quantization noise under auditory masking • Auditory masking: occurs whenever the presence of a strong audio signal makes weaker audio signals in its temporal or spectral neighborhood imperceptible

  8. MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands, then quantizes each sub-band according to the audibility of quantization noise within that band
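
To see how loosely 32 equal-width sub-bands approximate critical bands, the sketch below measures each sub-band's width on the Bark (critical-band) scale. It swaps in Zwicker and Terhardt's analytic Hz-to-Bark approximation, which is not part of these slides, so treat the numbers as indicative.

```python
from __future__ import annotations

import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker and Terhardt's approximation of the critical-band rate, in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def subband_bark_spans(sample_rate_hz: float, num_subbands: int = 32) -> list[float]:
    """Width in Bark of each equal-width sub-band between 0 Hz and Nyquist."""
    width_hz = sample_rate_hz / 2 / num_subbands
    return [hz_to_bark((i + 1) * width_hz) - hz_to_bark(i * width_hz)
            for i in range(num_subbands)]
```

At 44.1 kHz each sub-band is 689.0625 Hz wide; the lowest sub-band spans more than 6 Bark (several critical bands), while the highest spans only a small fraction of one Bark. The equal-width split is therefore much coarser than hearing at low frequencies.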

  9. MPEG Audio Bit Allocation • This process determines the number of code bits allocated to each sub-band, based on information from the psycho-acoustic model • Algorithm: • Compute the mask-to-noise ratio (in dB): MNR = SNR - SMR, where SMR is the signal-to-mask ratio from the psycho-acoustic model • The standard provides tables that estimate the SNR resulting from quantizing to a given number of quantizer levels • Get the MNR for each sub-band • Search for the sub-band with the lowest MNR • Allocate code bits to this sub-band • After a sub-band receives more code bits, look up the new SNR estimate, recompute its MNR, and repeat the search until no more code bits can be allocated
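
The allocation loop above can be sketched as a greedy search. Two simplifying assumptions, both hypothetical: the standard's SNR tables are replaced by the rule of thumb of about 6.02 dB per bit, and the bit pool is counted in bits per sample per sub-band rather than in the frame's real bit budget. The cap of 15 bits and the SMR values in the usage example are also made up.

```python
from __future__ import annotations

SNR_DB_PER_BIT = 6.02  # ASSUMPTION: stands in for the standard's SNR tables


def allocate_bits(smr_db: list[float], bit_pool: int, max_bits: int = 15) -> list[int]:
    """Greedy allocation: repeatedly give one more bit to the lowest-MNR sub-band."""
    bits = [0] * len(smr_db)

    def mnr(i: int) -> float:
        snr = SNR_DB_PER_BIT * bits[i]  # estimated SNR for the current allocation
        return snr - smr_db[i]           # MNR = SNR - SMR

    while bit_pool > 0:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every sub-band is at the cap
        worst = min(candidates, key=mnr)  # sub-band whose noise is most audible
        bits[worst] += 1
        bit_pool -= 1
    return bits
```

For example, `allocate_bits([20.0, 5.0, -10.0], 4)` returns `[3, 1, 0]`: the first sub-band keeps winning the search until its MNR (about -1.94 dB) rises above the second sub-band's (-5 dB), and the third sub-band, already masked, receives nothing.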

  10. MP3 Audio Format Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg
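
An MP3 file is a sequence of frames, each starting with a 4-byte header that can be decoded with shifts and masks. This sketch covers only MPEG-1 Layer III headers, assumes the standard 32-bit layout (11-bit sync, version, layer, protection bit, bit-rate index, sampling-rate index, padding, private bit, channel mode, and so on), and ignores free-format streams, MPEG-2/2.5 extensions and ID3 tags. The header bytes in the usage example are hand-built.

```python
from __future__ import annotations

# MPEG-1 Layer III lookup tables (bit-rate index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mp3_header(data: bytes) -> dict:
    """Decode the 4-byte header of an MPEG-1 Layer III frame."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("missing 11-bit frame sync")
    version = (h >> 19) & 0b11
    layer = (h >> 17) & 0b11
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved bit rate / sampling rate")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # 1152 samples per frame / 8 bits per byte = 144; padding adds one byte.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For the header bytes `FF FB 90 64`, this reports 128 kbps, 44.1 kHz, joint stereo, no CRC and a 417-byte frame (418 bytes when the padding bit is set).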

  11. MPEG Audio Comments • A precision of 16 bits per sample is needed to get a good signal-to-noise ratio (SNR) • The noise we get is quantization noise from the digitization process • Each added bit improves the SNR by about 6 dB • The masking effect means that we can raise the noise floor around a strong sound, because the noise will be masked • Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
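
The 6 dB-per-bit rule can be checked numerically. This sketch quantizes a full-scale sine wave with a uniform mid-tread quantizer and compares the measured SNR with the standard approximation for a full-scale sine, SNR ≈ 6.02·N + 1.76 dB for N bits. The test frequency and sampling rate are arbitrary choices.

```python
from __future__ import annotations

import math


def theoretical_snr_db(bits: int) -> float:
    """Approximation for a full-scale sine: about 6.02 dB per bit plus 1.76 dB."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits: int, freq_hz: float = 1000.3,
                    sample_rate_hz: float = 48_000.0, n: int = 48_000) -> float:
    """Quantize a full-scale sine with a uniform quantizer and measure its SNR."""
    step = 2.0 / (2 ** bits)  # quantizer step for the range [-1, 1]
    signal_power = noise_power = 0.0
    for t in range(n):
        x = math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
        q = round(x / step) * step  # mid-tread rounding; clipping is ignored
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)
```

Each extra bit halves the quantizer step, which cuts the noise power by a factor of 4 (about 6 dB). Conversely, spending fewer bits in a masked band raises its noise floor by about 6 dB per bit removed.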

  12. MPEG-2 Standard Extension • MPEG-1 was optimized for CD-ROM and applications at about 1.5 Mbps (video strictly non-interlaced) • MPEG-2 adds to MPEG-1: • More chroma sampling formats: 4:2:2, 4:4:4 • Progressive and interlaced frame coding • Four scalable modes • spatial scalability, data partitioning, SNR scalability, temporal scalability
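
The chroma sampling formats differ in how many colour samples accompany each luma sample. This sketch counts samples per frame for 4:2:0 (the format MPEG-1 uses) and the two formats MPEG-2 adds; the 720 × 576 frame size is only an illustrative choice.

```python
from __future__ import annotations

# Horizontal and vertical chroma subsampling factors for each format.
CHROMA_FORMATS = {
    "4:2:0": (2, 2),  # used by MPEG-1
    "4:2:2": (2, 1),  # added by MPEG-2
    "4:4:4": (1, 1),  # added by MPEG-2
}


def samples_per_frame(width: int, height: int, fmt: str) -> int:
    """Luma samples plus both chroma planes (Cb and Cr) for one frame."""
    sx, sy = CHROMA_FORMATS[fmt]
    luma = width * height
    chroma = 2 * (width // sx) * (height // sy)
    return luma + chroma
```

For 720 × 576 this gives 622,080 samples in 4:2:0, 829,440 in 4:2:2 and 1,244,160 in 4:4:4, i.e. 1.5, 2 and 3 times the luma count, so the richer formats cost considerably more raw data per frame.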

  13. MPEG-1/MPEG-2 • Pixel-based representations of content • composition takes place at the encoder • Lack support for content manipulation • e.g., remove a date stamp from a video • turn off the “current score” visual in a live game • Manipulation and interaction need video that is aware of its own content

  14. MPEG-4 Compression

  15. Interact With Visual Content

  16. Original MPEG-4 • Conceived in 1992 to address very-low-bit-rate audio and video (64 kbps) • Required quantum leaps in compression • beyond statistical- and DCT-based techniques • the committee felt this was possible within 5 years • The quantum leap did not happen

  17. The “New” MPEG-4 • Support object-based features for content • Enable dynamic rendering of content • defer composition until decoding • Support convergence among digital video, synthetic environments, and the Internet

  18. MPEG-4 Components • Systems – defines the architecture, multiplexing structure and syntax • Video – defines video coding algorithms for animation of synthetic and natural hybrid video (Synthetic/Natural Hybrid Coding) • Audio – defines audio/speech coding and Synthetic/Natural Hybrid Coding, such as MIDI and text-to-speech synthesis integration • Conformance Testing – defines compliance requirements for MPEG-4 bitstreams and decoders • Technical report • DSM-CC Multimedia Integration Framework – defines the Digital Storage Media Command and Control (DSM-CC) Multimedia Integration Format; specifies the merging of broadcast, interactive and conversational multimedia for set-top boxes and mobile stations

  19. MPEG-4 Example ISO N3536 MPEG4

  20. MPEG-4 Example ISO N3536 MPEG4

  21. MPEG-4 Example Daras, P. MPEG-4 Authoring Tool, J. Applied Signal Processing, 9, 1-18, 2003

  22. Interactive Drama http://www.interactivestory.net

  23. MPEG-4 Characteristics and Applications

  24. Media Objects • An object is called a media object • real and synthetic images; natural and synthetic audio; animated faces; interaction • Compose media objects into a hierarchical representation • form compound, dynamic scenes

  25. Composition (figure: scene tree with nodes scene, character, furniture, sprite, voice, desk, globe) ISO N3536 MPEG4

  26. Video Syntax Structure New MPEG-4 Aspect: Object-based layered syntactic structure

  27. Content-based Interactivity • Achieves different qualities for different objects, with fine granularity in spatial resolution, temporal resolution and decoding complexity • Needs coding of video objects with arbitrary shapes • Scalability • Spatial and temporal scalability • Needs more than one layer of information (base and enhancement layers)

  28. Examples of Base and Enhancement Layers

  29. Coding of Objects • Each VOP (video object plane) corresponds to an entity that, after being coded, is added to the bit stream • Together with each VOP, the encoder sends • composition information: where and when the VOP is to be displayed • Users can change the composition of the entire displayed scene by interacting with the composition information

  30. Spatial Scalability • A VOP that is temporally coincident with an I-VOP in the base layer is encoded as a P-VOP in the enhancement layer • A VOP that is temporally coincident with a P-VOP in the base layer is encoded as a B-VOP in the enhancement layer
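
The base/enhancement split behind spatial scalability can be illustrated on a toy frame. This sketch is not the MPEG-4 coding toolchain (no DCT, quantization or motion compensation): the base layer is a 2 × 2 average of the frame, and the enhancement layer carries the residual against the upsampled base, which is the sense in which an enhancement VOP is predicted from its temporally coincident base VOP. The `ENHANCEMENT_TYPE` table encodes only the two rules stated on the slide; the frame values and all function names are hypothetical.

```python
from __future__ import annotations

Frame = list[list[int]]

# Rule from the slide: base-layer VOP type -> enhancement-layer VOP type.
ENHANCEMENT_TYPE = {"I": "P", "P": "B"}


def downsample(frame: Frame) -> Frame:
    """Base layer: integer average of each 2x2 block (even dimensions assumed)."""
    return [[(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, len(frame[0]), 2)]
            for y in range(0, len(frame), 2)]


def upsample(frame: Frame) -> Frame:
    """Nearest-neighbour 2x upsampling: repeat every sample and every row."""
    return [[v for v in row for _ in range(2)] for row in frame for _ in range(2)]


def encode_layers(frame: Frame) -> tuple[Frame, Frame]:
    """Return (base layer, enhancement residual)."""
    base = downsample(frame)
    prediction = upsample(base)
    residual = [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(frame, prediction)]
    return base, residual


def decode_layers(base: Frame, residual: Frame | None = None) -> Frame:
    """Base alone gives a coarse picture; adding the residual restores full detail."""
    prediction = upsample(base)
    if residual is None:
        return prediction
    return [[p + r for p, r in zip(r1, r2)] for r1, r2 in zip(prediction, residual)]
```

A decoder that receives only the base layer shows a blocky half-resolution picture; adding the enhancement layer restores the original exactly, since nothing is quantized in this sketch.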

  31. Temporal Scalability

  32. Composition (cont.) • Encode objects in separate channels • encode each using the most efficient mechanism • transmit each object in a separate stream • Composition takes place at the decoder, rather than at the encoder • requires a binary scene description (BIFS, Binary Format for Scenes) • BIFS is a low-level language for describing: • hierarchical, spatial, and temporal relations

  33. MPEG-4 Rendering ISO N3536 MPEG4

  34. Interaction as Objects • Change colors of objects • Toggle visibility of objects • Navigate to different content sections • Select from multiple camera views • change current camera angle • Standardizes content and interaction • e.g., broadcast HDTV and stored DVD

  35. Hierarchical Model • Each MPEG-4 movie is composed of tracks • each track is composed of media elements (one reserved for BIFS information) • each media element is an object • each object is an audio, video, sprite, etc. • Each object specifies its: • spatial information relative to a parent • temporal information relative to the global timeline
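
Parent-relative spatial information can be resolved into scene coordinates with a depth-first walk of the hierarchy. The `MediaObject` class, its fields, and all offsets below are hypothetical; the node names echo the composition example (character, sprite, voice, furniture, desk, globe), but the grouping under parents is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MediaObject:
    """Hypothetical node of an MPEG-4-style scene hierarchy."""
    name: str
    kind: str                                  # e.g. "video", "audio", "sprite"
    offset: tuple[float, float] = (0.0, 0.0)   # position relative to the parent
    start_s: float = 0.0                       # start time on the global timeline
    children: list[MediaObject] = field(default_factory=list)


def absolute_positions(obj: MediaObject, origin=(0.0, 0.0), out=None) -> dict:
    """Add each node's offset to its parent's absolute position (depth-first)."""
    if out is None:
        out = {}
    x, y = origin[0] + obj.offset[0], origin[1] + obj.offset[1]
    out[obj.name] = (x, y)
    for child in obj.children:
        absolute_positions(child, (x, y), out)
    return out


# Hypothetical example scene.
SCENE = MediaObject("scene", "scene", children=[
    MediaObject("character", "group", (100.0, 50.0), children=[
        MediaObject("sprite", "sprite", (10.0, 5.0)),
        MediaObject("voice", "audio", start_s=1.5),
    ]),
    MediaObject("furniture", "group", (300.0, 200.0), children=[
        MediaObject("desk", "video", (0.0, 20.0)),
        MediaObject("globe", "video", (40.0, -30.0)),
    ]),
])
```

The sprite's absolute position (110, 55) is its own offset plus the character's, so moving the character node moves everything beneath it; this is what a hierarchical description buys when composition is deferred to the decoder.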

  36. Synchronization • Global timeline (high-resolution units) • e.g., 600 units/sec • Each continuous track specifies its relation to the timeline • e.g., if a video is 30 fps, a frame should be displayed every 33.3 ms (every 20 units at 600 units/sec) • Other tracks specify start/end times
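
The timeline arithmetic can be sketched as follows, using the slide's example of 600 units per second. The function names are hypothetical, and the sketch only handles integer frame rates that divide the timescale exactly.

```python
from __future__ import annotations

TIMESCALE = 600  # global timeline units per second (the slide's example value)


def frame_duration_units(fps: int, timescale: int = TIMESCALE) -> int:
    """Units between frames of a constant-rate track; must divide evenly."""
    if timescale % fps:
        raise ValueError(f"{fps} fps does not fit {timescale} units/s exactly")
    return timescale // fps


def frame_times_units(fps: int, n_frames: int, start_units: int = 0,
                      timescale: int = TIMESCALE) -> list[int]:
    """Presentation time of each frame on the global timeline, in units."""
    step = frame_duration_units(fps, timescale)
    return [start_units + k * step for k in range(n_frames)]


def units_to_ms(units: int, timescale: int = TIMESCALE) -> float:
    """Convert timeline units to milliseconds."""
    return 1000.0 * units / timescale
```

At 30 fps a frame is due every 20 units (about 33.3 ms). Because 600 is divisible by 24, 25 and 30, all three common frame rates map to whole numbers of units (25, 24 and 20 respectively), so frame times never drift from rounding.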

  37. MPEG-4 parts • MPEG-4 Part 2 • Includes the Advanced Simple Profile, used by codecs such as QuickTime 6 • MPEG-4 Part 10 • MPEG-4 AVC/H.264 (Advanced Video Coding) • Used by encoders such as QuickTime 7 • Used by high-definition video media like Blu-ray Disc
