Multiplexing H.264/AVC Video with MPEG-AAC Audio
Harishankar Murugan
University of Texas at Arlington
Outline : • Multiplexing: Areas of applications • Why H.264 and AAC? • Multiplexing • De-multiplexing • Synchronization and Playback • Results • Conclusions • Future work • References
Multiplexing : Areas of application • DVB : DVB-C, DVB-T • ATSC • IPTV
Why H.264 Video? • Up to 50% bit rate savings compared to H.263v2 (H.263+) or MPEG-2 Simple Profile. • High quality video: H.264 offers consistently good video quality at both high and low bit rates. • Error resilience: H.264 provides the tools needed to deal with packet loss in packet networks and bit errors in error-prone wireless networks. • Wide range of applications: streaming, mobile TV, HDTV, and storage for home users.
Important features of H.264 • IDR (Instantaneous Decoder Refresh) picture: an anchor picture containing only I-slices; no picture that follows it in decoding order references a picture before it, so decoding can start at an IDR picture. • Sequence parameter set: • profile and level indicator • decoding or playback order • number of reference frames • aspect ratio or color space details • Picture parameter set: • entropy coding mode used • slice data partitioning and macroblock reordering • flags indicating the use of weighted (bi-)prediction • quantization parameter details
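The de-multiplexer later searches the video buffer for IDR pictures, and the parameter sets travel as PSI, so it helps to see how these units are recognized in a coded stream. The sketch below is a minimal illustration, not the author's implementation: it assumes an H.264 Annex B byte stream (NAL units separated by 0x000001 or 0x00000001 start codes) and reads nal_unit_type from the low five bits of each NAL header byte, where 5 is an IDR slice, 7 an SPS and 8 a PPS. The function names are hypothetical.

```python
IDR, SPS, PPS = 5, 7, 8  # H.264 nal_unit_type values


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an H.264 Annex B byte stream into NAL units without start codes."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        starts.append(pos + 3)  # first byte after the 3-byte start code
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    units = []
    for k, begin in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to the next 4-byte start code (or to
        # zero padding), not to this NAL unit.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def nal_unit_type(nal: bytes) -> int:
    """nal_unit_type is the low 5 bits of the one-byte NAL unit header."""
    return nal[0] & 0x1F


def find_first_idr(units: list[bytes]) -> int:
    """Index of the first IDR NAL unit in buffer order, or -1 if there is none."""
    for i, nal in enumerate(units):
        if nal and nal_unit_type(nal) == IDR:
            return i
    return -1


# Hand-made stream: SPS, PPS, IDR slice, non-IDR slice (payload bytes are dummies).
stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
          b"\x00\x00\x01\x68\xce\x38\x80"
          b"\x00\x00\x01\x65\x88\x84"
          b"\x00\x00\x01\x41\x9a")
units = split_annexb(stream)
print([nal_unit_type(u) for u in units])  # [7, 8, 5, 1]
print(find_first_idr(units))              # 2
```

Working on the NAL header byte alone is enough here because emulation prevention guarantees that 0x000001 never occurs inside a NAL unit.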
AAC Audio • Advanced Audio Coding is a standardized, lossy compression scheme for audio. [Figure: Block diagram of the AAC encoder]
AAC Audio • Profiles : • Low Complexity (LC): the simplest and most widely used • Main Profile (MAIN): LC profile with backward-adaptive prediction • Scalable Sample Rate (SSR): LC profile with a gain control tool • Bit stream formats : • ADIF (Audio Data Interchange Format): a single header at the beginning of the file, followed by raw data blocks • ADTS (Audio Data Transport Stream): a separate header for each frame, so decoding can start from any frame
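Because each ADTS frame carries its own header, a packetizer can cut an AAC stream into one access unit per frame by reading the 13-bit aac_frame_length field of each header. The following is a sketch, not the author's code: it assumes well-formed ADTS input, does not verify the optional CRC (the frame length already includes it), and the function names are hypothetical.

```python
# ADTS sampling_frequency_index table (indices 13-15 are reserved).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(buf: bytes, pos: int = 0) -> dict:
    """Decode selected fields of the 7-byte ADTS header starting at buf[pos]."""
    h = buf[pos:pos + 7]
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError(f"no ADTS syncword at offset {pos}")
    sf_index = (h[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError(f"reserved sampling_frequency_index {sf_index}")
    return {
        "protection_absent": h[1] & 0x01,  # 0 means a 2-byte CRC follows the header
        "profile": (h[2] >> 6) & 0x03,     # audio object type - 1 (1 = AAC LC)
        "sampling_rate": SAMPLE_RATES[sf_index],
        "channels": ((h[2] & 0x01) << 2) | (h[3] >> 6),
        # 13-bit frame length, header included
        "frame_length": ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5),
    }


def split_adts(buf: bytes) -> list[bytes]:
    """Cut an ADTS stream into whole frames (header + raw data)."""
    frames, pos = [], 0
    while pos < len(buf):
        length = parse_adts_header(buf, pos)["frame_length"]
        if length < 7 or pos + length > len(buf):
            raise ValueError(f"truncated or corrupt ADTS frame at offset {pos}")
        frames.append(buf[pos:pos + length])
        pos += length
    return frames


# Hand-built header: AAC LC, 48 kHz, 2 channels, no CRC, frame_length = 10 bytes.
header = bytes([0xFF, 0xF1, 0x4C, 0x80, 0x01, 0x5F, 0xFC])
frame = header + b"\x01\x02\x03"
print(parse_adts_header(frame))
print(len(split_adts(frame * 2)))  # 2
```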
Why AAC Audio? • Supports sampling frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz) • Higher coding efficiency and a simpler filter bank (pure MDCT) than MP3 (hybrid filter bank) • Improved compression gives higher-quality audio at lower bit rates • Superior performance at bit rates above 64 kbps and at bit rates as low as 16 kbps
Factors to be considered for Multiplexing and Transmission • Split the video and audio coded bit streams into smaller data packets • Multiplex with equal priority given to all elementary streams • Detect packet losses and errors • Additional information to help synchronize audio and video
Packetization [Figure: video source → H.264 encoder → packetizer; audio source → AAC encoder → packetizer; optional data source → packetizer; the packetizers feed the multiplexer, which outputs the MPEG-encoded transport stream] • Two layers of packetization : • PES (Packetized Elementary Stream) • Transport stream packets, which carry the PES
Packetized Elementary stream (PES) • Elementary streams (ES): • Encoded video stream • Encoded audio stream • Data stream (Optional) • PES contains access units that are sequentially separated and packetized • PES headers distinguish different ES and contain timestamp information • Packet size varies with the size of access units
Packetized Elementary stream (PES) [Figure: an audio or video elementary stream divided into successive PES packets, each consisting of a PES header followed by a payload]
PES Header Description • 3 bytes of start code – 0x000001 • 1 byte of stream ID • 2 bytes of packet length • 2 bytes of time stamp (Frame number)
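This is the author's simplified 8-byte PES header, not the full MPEG-2 one (which carries 33-bit PTS/DTS values). A minimal sketch of building and parsing it follows. It assumes big-endian fields, assumes the 2-byte length counts everything after the length field (frame number plus payload, mirroring MPEG-2's PES_packet_length), and borrows the MPEG-2 stream IDs 0xE0 (video) and 0xC0 (audio); all of these, and the function names, are assumptions.

```python
import struct

VIDEO_STREAM_ID = 0xE0  # assumption: MPEG-2 ID for video stream 0
AUDIO_STREAM_ID = 0xC0  # assumption: MPEG-2 ID for audio stream 0
START_CODE = b"\x00\x00\x01"


def build_pes(stream_id: int, frame_number: int, access_unit: bytes) -> bytes:
    """8-byte header (start code, stream ID, length, frame number) + one access unit."""
    length = 2 + len(access_unit)  # assumption: counts the frame number and payload
    if length > 0xFFFF:
        raise ValueError("access unit too large for a 16-bit length field")
    # The 16-bit frame number wraps around after 65535.
    fields = struct.pack(">BHH", stream_id, length, frame_number & 0xFFFF)
    return START_CODE + fields + access_unit


def parse_pes(pes: bytes) -> tuple[int, int, bytes]:
    """Return (stream_id, frame_number, access_unit) from a PES built as above."""
    if pes[:3] != START_CODE:
        raise ValueError("missing PES start code")
    stream_id, length, frame_number = struct.unpack(">BHH", pes[3:8])
    return stream_id, frame_number, pes[8:8 + length - 2]


pes = build_pes(VIDEO_STREAM_ID, 7, b"abc")
print(pes.hex())       # 000001e000050007616263
print(parse_pes(pes))  # (224, 7, b'abc')
```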
Frame number as time stamp • Video frame rate : constant (e.g. 25 or 30 fps), so time = frame number / fps • Audio sampling rate : constant (8-96 kHz), with 1024 samples per AAC frame, so time = 1024 × frame number / sampling rate
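Both formulas can be evaluated directly from a frame number. The sketch below uses exact rational arithmetic (Python's fractions.Fraction) so that an audio time and a video time that should coincide compare equal without floating-point rounding; that choice, and the function names, are assumptions rather than part of the original scheme.

```python
from fractions import Fraction

AAC_SAMPLES_PER_FRAME = 1024  # samples per AAC frame, as stated above


def video_time(frame_number: int, fps) -> Fraction:
    """Playback time in seconds of a video frame at a constant frame rate."""
    return Fraction(frame_number) / Fraction(fps)


def audio_time(frame_number: int, sampling_rate: int) -> Fraction:
    """Playback time in seconds of an AAC frame at a constant sampling rate."""
    return Fraction(AAC_SAMPLES_PER_FRAME * frame_number, sampling_rate)


# With the test conditions (25 fps, 48 kHz): a video frame lasts 40 ms and an
# AAC frame 1024/48000 s (about 21.33 ms), so video frame 8 and audio frame 15
# both start at 0.32 s.
print(video_time(8, 25), audio_time(15, 48000))  # 8/25 8/25
print(float(audio_time(1, 48000)))
```

A non-integer rate such as 29.97 fps can be passed as Fraction(30000, 1001) and still gives exact times.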
Advantages over methods that use clock samples as time stamps • Saves the extra header bytes needed to send program clock reference (PCR) information periodically • No synchronization problems due to clock jitter • No propagation of delay between audio and video • Less complex and better suited to software implementation
Transport Packets • PES packets from the various elementary streams are broken into smaller packets called transport packets • Transport packets have a fixed length of 188 bytes • Constraints : • Each packet can carry data from only one PES • A PES header must start at the first byte of the transport packet payload • Stuffing bytes fill the rest of a packet when the remaining PES data cannot fill it
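Transport stream [Figure: a PES (PES header + PES payload) split across transport stream packets; each packet begins with a transport header, and stuffing bytes fill the unused part of the last packet]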
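The packetization constraints can be sketched as follows. The 185-byte payload per 188-byte packet is an assumption inferred from the 570/185 multiplexing example in these slides (implying a 3-byte transport header); the function name and the (start flag, data, stuffing count) return shape are hypothetical, and writing the stuffing bytes into the adaptation field is left to the caller.

```python
TS_PAYLOAD_SIZE = 185  # assumption: 188-byte packet minus a 3-byte transport header


def split_pes(pes: bytes) -> list[tuple[bool, bytes, int]]:
    """Split one PES into transport packet payloads.

    Each entry is (payload_unit_start, pes_data, stuffing_len). Only this PES
    goes into the packets, its header lands at the first payload byte of the
    first packet, and len(pes_data) + stuffing_len always equals TS_PAYLOAD_SIZE.
    """
    packets = []
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        data = pes[offset:offset + TS_PAYLOAD_SIZE]
        packets.append((offset == 0, data, TS_PAYLOAD_SIZE - len(data)))
    return packets


chunks = split_pes(bytes(570))
print([(start, len(data), stuffing) for start, data, stuffing in chunks])
# [(True, 185, 0), (False, 185, 0), (False, 185, 0), (False, 15, 170)]
```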
Packet Header • PID (Packet identifier) : each elementary stream has a unique PID; some PIDs are reserved for NULL packets and PSI (Program Specific Information). • PSI (Program Specific Information) : the sequence parameter set and picture parameter set are sent as PSI at frequent intervals. • Payload unit start indicator : 1-bit flag indicating that a PES header is present in the payload. • Adaptation field control : 1-bit flag indicating that the payload contains data other than PES data.
Packet Header • Continuity counter : 4-bit rolling counter incremented by 1 for each consecutive TS packet with the same PID; used to detect packet loss. • Payload byte offset : if the adaptation field control bit is '1', this field gives the byte offset of the start of the payload, i.e. the length of the adaptation field. • Adaptation field : • Stuffing bytes, if the remaining PES data is smaller than the TS payload • Additional header information
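Loss detection with the continuity counter can be sketched as below. The sketch assumes packets arrive as (PID, counter) pairs already read from the transport header and counts, per PID, how many counter values were skipped modulo 16. As with any 4-bit counter, a burst of exactly 16 (or any multiple of 16) lost packets goes unnoticed, and duplicated packets are not treated specially; the function name is hypothetical.

```python
def count_lost_packets(packets) -> dict[int, int]:
    """packets: iterable of (pid, continuity_counter). Returns lost-packet counts per PID."""
    last_cc: dict[int, int] = {}
    lost: dict[int, int] = {}
    for pid, cc in packets:
        if pid in last_cc:
            # The expected counter is last + 1 (mod 16); anything skipped was lost.
            lost[pid] += (cc - last_cc[pid] - 1) % 16
        else:
            lost[pid] = 0  # first packet seen on this PID
        last_cc[pid] = cc
    return lost


print(count_lost_packets([(15, 0), (16, 14), (15, 1), (16, 15), (16, 0), (15, 3)]))
# {15: 1, 16: 0}
```

The PID 16 sequence 14, 15, 0 shows the counter rolling over without being reported as a loss.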
Multiplexing method adopted • The multiplexing method affects buffer fullness at the de-multiplexer and, in turn, playback • Video and audio timing counters are used to ensure proper multiplexing • Each timing counter is incremented by the playback time of every packet multiplexed • The PES with the lowest timing counter value always gets preference during packet allocation
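Multiplexing method adopted (example) • fps = 25, so one video PES (one frame) covers 1/25 s = 40 ms • Video PES length = 570 bytes, so # of TS packets = ceil(570/185) = 4 • Each of these 4 TS packets advances the video timing counter by 40/4 = 10 ms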
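The timing-counter rule can be sketched as below; this is an illustration, not the author's code. Each TS packet advances its stream's counter by the PES playback time divided by the number of packets in that PES, as in the 40 ms / 4 = 10 ms example, and the stream with the smallest counter always gets the next slot. Null packets are not modeled, ties go to the first stream listed, the 185-byte payload is taken from the example, and the PES sizes in the usage are invented for illustration.

```python
from collections import deque
from fractions import Fraction

TS_PAYLOAD_SIZE = 185  # assumption, taken from the 570/185 example


def ts_packet_count(pes_length: int) -> int:
    """Number of transport packets needed for one PES (integer ceiling)."""
    return (pes_length + TS_PAYLOAD_SIZE - 1) // TS_PAYLOAD_SIZE


def multiplex(streams: dict[str, tuple[Fraction, list[int]]]) -> list[tuple[str, int, int]]:
    """streams maps a name to (playback time of one PES in s, [PES lengths]).

    Returns the transport packet order as (name, pes_index, packet_index).
    """
    queues: dict[str, deque] = {}
    for name, (pes_duration, pes_lengths) in streams.items():
        queue = deque()
        for pes_index, length in enumerate(pes_lengths):
            n = ts_packet_count(length)
            # Every TS packet of this PES accounts for 1/n of its playback time.
            for packet_index in range(n):
                queue.append((pes_index, packet_index, pes_duration / n))
        queues[name] = queue

    counters = {name: Fraction(0) for name in streams}
    order = []
    while any(queues.values()):
        # The stream with the smallest timing counter gets the next packet;
        # min() returns the first such name, so ties go to dict order.
        name = min((s for s in queues if queues[s]), key=lambda s: counters[s])
        pes_index, packet_index, step = queues[name].popleft()
        counters[name] += step
        order.append((name, pes_index, packet_index))
    return order


# Example sizes are made up: 25 fps video and 48 kHz AAC (1024 samples per PES).
streams = {
    "video": (Fraction(1, 25), [570, 370]),
    "audio": (Fraction(1024, 48000), [300, 300, 300]),
}
print(ts_packet_count(570))  # 4
print(" ".join(name[0].upper() for name, _, _ in multiplex(streams)))
# V A V A V A V A V A A V
```

Using exact fractions keeps the counters drift-free, so the 10 ms video steps and the roughly 10.67 ms audio steps interleave deterministically.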
Multiplexed transport stream [Figure: video PES packets (PID 15), audio PES packets (PID 16) and null packets (N, PID 1024) interleaved into one transport stream, in the order P1 V 0x2, P1 A 0x4, P1 A 0x5, P1 A 0x6, P1 V 0x3, N, N, P1 A 0x7]
Synchronization and playback • During playback, data is loaded from the buffers • The IDR frame is searched for from the top of the video buffer • The frame number of the IDR frame is extracted • The corresponding audio frame number is calculated as: Aframe number = (Vframe number × sampling rate) / (1024 × fps)
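The formula follows from equating the playback times of a video frame and an audio frame given earlier. A short derivation, with a worked case for the test conditions (25 fps, 48 kHz), is:

```latex
% N_V: video frame number, N_A: audio frame number, f_s: sampling rate
t_V = \frac{N_V}{\mathrm{fps}}, \qquad t_A = \frac{1024\, N_A}{f_s}

t_A = t_V \;\Longrightarrow\; N_A = \frac{N_V \, f_s}{1024\, \mathrm{fps}}

% Test conditions: f_s = 48000 Hz, fps = 25
N_A = \frac{48000}{1024 \cdot 25}\, N_V = 1.875\, N_V
\quad\Rightarrow\quad N_V = 8 \mapsto N_A = 15, \qquad N_V = 1 \mapsto N_A = 1.875 \approx 2

% Rounding N_A to the nearest integer leaves an audio/video offset of at most half an AAC frame:
|t_A - t_V| \le \frac{1}{2} \cdot \frac{1024}{f_s} \approx 10.7\ \mathrm{ms} \text{ at } 48\ \mathrm{kHz}
```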
Synchronization and playback • If the result is not an integer, it is rounded off and the corresponding audio frame is searched for. • The audio and video contents from the corresponding frame numbers are decoded using the PSI and played back. • The audio and video buffers are then emptied, incoming data is buffered, and the process continues. • If the corresponding audio frame is not found, the next IDR frame is searched for and the same process is repeated.
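The search described above can be sketched as follows. Buffers are modeled as plain lists of frame numbers (plus an IDR flag for video), which is an assumption; in the real system these numbers come from the PES headers. Rounding is half-up to match "rounded off", frame-number wraparound is ignored, and the function names are hypothetical.

```python
import math
from fractions import Fraction

AAC_SAMPLES_PER_FRAME = 1024


def audio_frame_for(video_frame: int, sampling_rate: int, fps) -> int:
    """Aframe = Vframe * sampling rate / (1024 * fps), rounded half-up."""
    exact = Fraction(video_frame * sampling_rate) / (AAC_SAMPLES_PER_FRAME * Fraction(fps))
    return math.floor(exact + Fraction(1, 2))


def find_sync_point(video_frames, audio_frames, sampling_rate, fps):
    """video_frames: [(frame_number, is_idr), ...] in buffer order.
    audio_frames: [frame_number, ...] in buffer order.
    Returns (video_index, audio_index) where playback starts, or None."""
    audio_index = {}
    for i, number in enumerate(audio_frames):
        audio_index.setdefault(number, i)  # keep the first occurrence
    for v_index, (v_number, is_idr) in enumerate(video_frames):
        if not is_idr:
            continue  # decoding can only start at an IDR frame
        a_number = audio_frame_for(v_number, sampling_rate, fps)
        if a_number in audio_index:
            return v_index, audio_index[a_number]
        # Matching audio frame not buffered: move on to the next IDR frame.
    return None


# Buffers that start mid-stream: IDR frame 6 maps to audio frame 11 (11.25 rounded),
# which is not buffered, so the search moves on to IDR frame 12 -> audio frame 23
# (22.5 rounded up), found at audio buffer index 10.
video = [(5, False), (6, True), (7, False), (12, True)]
audio = list(range(13, 30))
print(find_sync_point(video, audio, 48000, 25))  # (3, 10)
```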
Conclusions • Audio and video are synchronized even when de-multiplexing starts from an arbitrary TS packet. • No visible lag between video and audio was observed. • The bit rate can be changed using the rate control module of the H.264 encoder.
Test Conditions • A single-program transport stream is generated • Input raw video : YUV format • Input raw audio : WAVE format • Profiles used : • H.264 : Main profile • AAC : Low Complexity profile (ADTS format) • GOP : IBBPBB (IDR forced) • Video frame rate : 25 fps • Audio sampling frequency : 48 kHz
Future work • Extension of the algorithm to multiplex multiple program streams • Error correction method • Reduce initial buffering time
References Books and Papers: • MPEG-2 advanced audio coding, AAC. International Standard IS 13818-7, ISO/IEC JTC1/SC29 WG11, 1997. • MPEG. Information technology - Generic coding of moving pictures and associated audio information, Part 3: Audio. International Standard IS 13818-3, ISO/IEC JTC1/SC29 WG11, 1994. • MPEG. Information technology - Generic coding of moving pictures and associated audio information, Part 4: Conformance testing. International Standard IS 13818-4, ISO/IEC JTC1/SC29 WG11, 1998. • Information technology - Generic coding of moving pictures and associated audio - Part 1: Systems, ISO/IEC 13818-1:2005, International Telecommunications Union. • MPEG-4: ISO/IEC JTC1/SC29 14496-10: Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding, ISO/IEC, 2005. • P. V. Rangan, S. S. Kumar, and S. Rajan, “Continuity and Synchronization in MPEG,” IEEE Journal on Selected Areas in Communications, vol. 14, pp. 52-60, Jan. 1996. • B. J. Lechner et al., “The ATSC Transport Layer, Including Program and System Information Protocol (PSIP),” Proc. of the IEEE, vol. 94, no. 1, pp. 77-101, Jan. 2006.
References • H. Kalva et al., “Implementing Multiplexing, Streaming, and Server Interaction for MPEG-4,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1299-1311, Dec. 1999. • M. Bosi and M. Goldberg, “Introduction to digital audio coding and standards,” Boston: Kluwer Academic Publishers, c2003. • D. K. Fibush, “Timing and Synchronization Using MPEG-2 Transport Streams,” SMPTE Journal, pp. 395-400, July 1996. • K. Brandenburg, “MP3 and AAC Explained,” AES 17th International Conference, Florence, Italy, Sept. 1999. • S.-K. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4 Part 10,” J. Visual Communication and Image Representation, vol. 17, pp. 183-552, April 2006. • A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard,” Signal Processing: Image Communication, vol. 19, issue 9, pp. 793-849, Oct. 2004. • T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard,” IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
References • R. Hopkins, “United States digital advanced television broadcasting standard,” SPIE/IS&T, Photonics West, vol. CR61, pp. 220-226, San Jose, CA, Feb. 1996. • Z. Cai et al., “A RISC Implementation of MPEG-2 TS Packetization,” in Proceedings of the IEEE HPC Conference, pp. 688-691, May 2000. • M. Fiedler, “Implementation of basic H.264/AVC Decoder,” seminar paper, Chemnitz University of Technology, June 2004. • R. Linneman, “Advanced audio coding on FPGA,” BS honours thesis, School of Information Technology, Brisbane, Oct. 2002. • J. Watkinson, “The MPEG Handbook,” Second Edition, Oxford; Burlington, MA: Elsevier/Focal Press, 2004. • I. E. G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia,” John Wiley & Sons, 2003. • Proceedings of the IEEE, Special issue on Global Digital Television: Technology and Emerging Services, vol. 94, pp. 5-7, Jan. 2006. • P. D. Symes, “Digital video compression,” McGraw-Hill, c2004. • C. Wootton, “Practical guide to video and audio compression: from sprockets and rasters to macro blocks,” Oxford: Focal, 2005.
References • FAAC and FAAD AAC software, website www.audiocoding.com • MPEG official website www.mpeg.org • Alternative AAC software from http://www.psytel-research.co.yu • H.264 software JM (10.2) from http://iphome.hhi.de/suehring/tml/ • G. Bauvigne, “MPEG-2/MPEG-4 AAC,” MP3 Tech website, www.mp3-tech.org • R. Whittle, “Comparing AAC and MP3,” website http://www.firstpr.com.au/audiocomp/aac-mp3-vq.html • Public discussion forum for A/V containers: http://forum.doom9.org/forumdisplay.php?s=c68a3cd483892abb630cf026aa06d3c5&f • JVT documents: http://www.dspr.com/www/technology/JVT-G050.pdf • Audio test files: http://www.rme-audio.com/english/download/audtest.htm • H.264 reference: http://www.vcodex.com/h264.html
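[Figure: De-multiplexer and playback. The transport stream enters the demultiplexer, which fills a video buffer and an audio buffer and passes on timestamp information; the H.264 decoder and AAC decoder read from the buffers for synchronized playback.]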
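The de-multiplexer in this diagram can be sketched as a PID router that rebuilds PES packets and drops null packets. The sketch assumes transport packets are already parsed into (PID, payload unit start indicator, PES data without stuffing) tuples; the PID values 15, 16 and 1024 follow the multiplexed-stream example in the slides and are otherwise assumptions, as is the function name. Data that arrives before the first PES start on a PID is discarded, which is what happens when de-multiplexing begins at an arbitrary TS packet.

```python
NULL_PID = 1024  # assumption: null-packet PID from the multiplexed-stream example


def demultiplex(packets, null_pid: int = NULL_PID) -> dict[int, list[bytes]]:
    """packets: iterable of (pid, payload_unit_start, pes_data).
    Returns the reassembled PES packets per PID, in arrival order."""
    complete: dict[int, list[bytes]] = {}
    partial: dict[int, bytearray] = {}
    for pid, unit_start, data in packets:
        if pid == null_pid:
            continue  # null packets carry no elementary-stream data
        if unit_start:
            if pid in partial:
                # A new PES header closes the previous PES on this PID.
                complete.setdefault(pid, []).append(bytes(partial[pid]))
            partial[pid] = bytearray(data)
        elif pid in partial:
            partial[pid] += data
        # else: continuation of a PES whose start was never seen; drop it.
    # Flush the last PES on each PID (its completeness is not checked here).
    for pid, buf in partial.items():
        complete.setdefault(pid, []).append(bytes(buf))
    return complete


packets = [(15, False, b"tail"), (15, True, b"V0a"), (16, True, b"A0"),
           (15, False, b"V0b"), (1024, False, b""), (16, True, b"A1"),
           (15, True, b"V1")]
print(demultiplex(packets))  # {16: [b'A0', b'A1'], 15: [b'V0aV0b', b'V1']}
```

The reassembled PES packets per PID then feed the video and audio buffers, from which the synchronization step picks its starting IDR frame.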