### A high-level simulator for the H.264/AVC decoding process in multi-core systems

Florian H. Seitner, Ralf M. Schreier, Michael Bleyer, Margrit Gelautz

Vienna University of Technology, Austria

SPIE IS&T Electronic Imaging Conference, Multimedia on Mobile Devices 2008

Outline

- Introduction
- Multi-processor decoding
- High-level simulator
- Simulation result
- Conclusion

Introduction

- H.264 as a new-generation video coding algorithm is becoming increasingly important for international broadcasting standards such as DVB-H and DMB.
- H.264 achieves high compression efficiency at the cost of increased computational complexity.
- Mobile devices (embedded processors) have
  - low processing (computation) capability
  - limited energy (power)

- Multi-core systems provide an elegant and power-efficient solution to overcome the performance limitation.

Introduction

- Efficiently distributing the video algorithm among multiple processors is a non-trivial task.
  - The decoding load should be distributed equally
  - Data dependency
  - Synchronization
- It requires detailed knowledge about the algorithmic complexity and inter-dependencies between functional blocks.
- The objective of this paper is an investigation of
  - the dynamic behavior of the H.264 decoding process
  - the interaction between the main decoding tasks in multi-core environments

Figure 1. Dynamic variations in the execution times of individual macroblocks in the H.264 decoding process. Histograms are shown for six IPB-coded sequences of 100 frames each, with a Group of Pictures (GOP) size of 13.

- Histogram bins plot the number of macroblocks having similar runtimes.
- It is observed that the runtimes of macroblocks significantly vary within a sequence due to different image content.
- The overall runtime of the decoder strongly depends on the content of the encoded video material.
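The binning behind a histogram like Figure 1 can be sketched in a few lines. This is only an illustration: the function name `runtime_histogram`, the bin width, and the runtime values are hypothetical and are not taken from the paper's profiling data.

```python
from collections import Counter

def runtime_histogram(runtimes, bin_width):
    """Count macroblocks per runtime bin.

    Bin k covers the half-open interval [k * bin_width, (k + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return Counter(int(t // bin_width) for t in runtimes)

# Hypothetical per-macroblock runtimes in cycles (not measured data).
example_runtimes = [1200, 1350, 1900, 2100, 2150, 4800]
hist = runtime_histogram(example_runtimes, bin_width=1000)

for k in sorted(hist):
    print(f"[{k * 1000}, {(k + 1) * 1000}) cycles: {hist[k]} macroblocks")
```

A wide spread of occupied bins corresponds to the strong runtime variation between macroblocks noted above.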

We present a high-level simulator for multi-core implementations of the H.264 decoder in this paper.

Figure 2. Concept of the simulator. (a) Profiling data. (b) Underlying hardware. (c) Simulation of a splitting that maps f1 and f2 to the first core and f3 to the second one.
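The splitting in Figure 2(c) can be illustrated with a minimal sketch that sums profiled function runtimes per core. The data layout, the function name `core_loads`, and all numbers are assumptions made for illustration; the slides do not describe the simulator's actual data structures.

```python
def core_loads(profile, mapping, num_cores):
    """Sum profiled function runtimes per core for each macroblock.

    profile: list of dicts {function_name: runtime}, one dict per macroblock
    mapping: dict {function_name: core_index}
    Returns one list per macroblock holding the runtime total of each core.
    """
    loads = []
    for macroblock in profile:
        per_core = [0] * num_cores
        for func, runtime in macroblock.items():
            per_core[mapping[func]] += runtime
        loads.append(per_core)
    return loads

# Hypothetical profiling data in cycles for two macroblocks.
profile = [
    {"f1": 300, "f2": 500, "f3": 700},
    {"f1": 250, "f2": 900, "f3": 400},
]
# f1 and f2 run on the first core (index 0), f3 on the second (index 1).
mapping = {"f1": 0, "f2": 0, "f3": 1}

loads = core_loads(profile, mapping, num_cores=2)
totals = [sum(mb[core] for mb in loads) for core in range(2)]
print(loads)   # [[800, 700], [1150, 400]]
print(totals)  # [1950, 1100]
```

This sketch ignores data dependencies and synchronization between the cores; it only shows how a mapping turns profiling data into per-core load.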

Parallel H.264 Decoding

The H.264 Decoder

[Figure: block diagram of the H.264 decoding process (source: http://www.powercam.cc/slide/1580). Labeled blocks: Encoded Bitstream, Stream Parsing, Entropy Decoder, Inverse Quantization, Inverse DCT, Spatial Prediction, Motion Compensation, Reference Frames, Deblocking. The blocks are grouped into a Parser and a Reconstructor; a further label reads "Data-Parallel Processing".]

Multi-processor decoding

- The parser processor (1st CPU) performs all functions related to bitstream parsing:
  - Entropy Decoding: the basic entropy decoding of picture data such as motion vectors and DCT residuals
  - Context Calculation: the prediction step of context-adaptive VLC coding for residuals and motion vector prediction
  - Init: the memory initialization of macroblock data structures

Figure 3. Partitioning the H.264 decoder on a dual-core system.

Multi-processor decoding

- The reconstructor processor (2nd CPU) handles all pixel-based operations:
  - Intra/Inter Prediction: the intra and inter prediction routines
  - IDCT: the inverse residual transformation, which is based on multiples of the 4 × 4 pixel block size
  - Strength Calculation: the filter strength coefficients for the deblocking process are calculated before the deblocking filter is applied
  - Deblocking: the deblocking filter is applied as the last step in the macroblock decoding process

High-level simulator (Austrochip 2008, Invited Poster)

CHILI Vector Processor

CHILI Design

- CHILI Core with 32bit / 4 Slots / 8 SIMD
- High performance for signal processing and control code
- Compiler-friendly instruction set
- Fully programmable (C / Assembler)
- C compiler (LLVM, GCC) and instruction set simulator available

CHILI Processor Features

- Separate instruction and data path
- 16-bit SIMD operands
- 64 32-bit general purpose registers
- 128-bit core memory interface
- 64 KB instruction cache
- 64 KB data SRAM (core memory)
- 64-channel data load and store DMA controller
- 1.92 GMAC 16-bit operations (@ 240 MHz)
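As a quick sanity check on the last figure, the stated peak rate and clock frequency imply the number of 16-bit MAC operations per clock cycle. The calculation below uses only the two numbers from the feature list; how those operations are distributed over the four slots and the SIMD lanes is not stated in the slides.

```python
clock_hz = 240_000_000           # 240 MHz, from the feature list
peak_macs_per_s = 1_920_000_000  # 1.92 GMAC/s, from the feature list

# Integer division is exact here because the peak rate is a multiple of the clock.
macs_per_cycle = peak_macs_per_s // clock_hz
print(macs_per_cycle)  # 8
```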

SVENm Multimedia Engine

- Video / multimedia companion
- Targets H.264 encoding / decoding at SD resolution

Simulation result

- 6 test sequences
  - Foreman, Flowergarden, Barcelona, Paris, Bus, Mobile
- Parameters
  - Test sequences are encoded in H.264 main profile using the JM12.2 encoder
  - GOP size = 13 frames
  - CIF, IPB, VLC, deblocking active, all prediction modes allowed
  - Search range (SR) = ±16 pixels
  - 3 reference frames
  - 1 slice per frame
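To get a feel for the amount of work these parameters imply, the number of macroblocks per CIF frame can be computed directly. CIF luma resolution (352 × 288) and the 16 × 16 macroblock size are standard values; the 100-frame sequence length is taken from the Figure 1 caption and is assumed to apply to these simulations as well.

```python
import math

MB_SIZE = 16                      # an H.264 macroblock covers 16x16 luma samples
CIF_WIDTH, CIF_HEIGHT = 352, 288  # CIF luma resolution

def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of macroblocks needed to cover a frame of the given size."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

mbs_per_frame = macroblocks_per_frame(CIF_WIDTH, CIF_HEIGHT)
num_frames = 100  # assumption: sequence length from the Figure 1 caption
total_mbs = mbs_per_frame * num_frames

print(mbs_per_frame, total_mbs)  # 396 39600
```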

Simulation result (1): Variation of partitioning

Figure 5. Two methods for partitioning the H.264 decoder on a dual-core system.

(a) Scenario 1: The function Strength Calc. is part of the parsing module.

(b) Scenario 2: The function Strength Calc. is part of the reconstructor.

Figure 6. The Foreman sequence at different bitrates. Macroblocks are classified based on the percentage of the overall time that is spent in the parsing module of the decoder.

A percentage of 80 means that 80% of the runtime is spent in the parser, while 20% is consumed in the reconstructor. A value of 50% indicates a perfect balance.
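The classification used in Figure 6 can be sketched as follows. The function names, the bin width of 10 percentage points, and the example runtimes are illustrative assumptions, not values from the paper.

```python
def parser_share(parser_time, reconstructor_time):
    """Percentage of a macroblock's total runtime spent in the parser."""
    total = parser_time + reconstructor_time
    if total <= 0:
        raise ValueError("total runtime must be positive")
    return 100.0 * parser_time / total

def classify(shares, bin_width=10):
    """Count macroblocks per percentage bin; 100% falls into the last bin."""
    bins = [0] * (100 // bin_width)
    for share in shares:
        index = min(int(share // bin_width), len(bins) - 1)
        bins[index] += 1
    return bins

# Hypothetical (parser, reconstructor) runtimes in cycles for three macroblocks.
macroblocks = [(800, 200), (500, 500), (300, 700)]
shares = [parser_share(p, r) for p, r in macroblocks]

print(shares)            # [80.0, 50.0, 30.0]
print(classify(shares))  # [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
```

A distribution concentrated around the 50% bin would indicate a well-balanced partitioning.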

Figure 6(b) indicates that the workload balance between the two processors is significantly improved.

- Figure 7(a): the reconstructor processor's idle time is approximately 40% in all three test sequences and for all data rates.
- Figure 7(b): the reconstructor idle time can be reduced below 15%.

Figure 7. Idle time for parser and reconstructor core for three test sequences. (a) Filter strength calculation is done at the parser side. (b) Filter strength calculation is performed at the reconstructor side.
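The effect of moving the filter strength calculation between cores can be illustrated with an idealized load-balance model. The runtimes below are hypothetical averages chosen for illustration, and the model (idle fraction relative to the more heavily loaded core, ignoring buffering and dependencies) is a simplification, not the simulator described in the paper.

```python
# Hypothetical average per-macroblock runtimes in cycles (illustrative only).
avg_runtime = {
    "entropy_decoding": 450,
    "context_calculation": 250,
    "init": 100,
    "strength_calculation": 200,
    "prediction": 350,
    "idct": 150,
    "deblocking": 200,
}

PARSER_BASE = ["entropy_decoding", "context_calculation", "init"]
RECON_BASE = ["prediction", "idct", "deblocking"]

def idle_fractions(parser_funcs, recon_funcs, runtimes):
    """Return (parser_idle, reconstructor_idle) relative to the busier core."""
    parser_load = sum(runtimes[f] for f in parser_funcs)
    recon_load = sum(runtimes[f] for f in recon_funcs)
    bottleneck = max(parser_load, recon_load)
    return (1 - parser_load / bottleneck, 1 - recon_load / bottleneck)

# Scenario 1: strength calculation on the parser core.
scenario_1 = idle_fractions(PARSER_BASE + ["strength_calculation"], RECON_BASE, avg_runtime)
# Scenario 2: strength calculation on the reconstructor core.
scenario_2 = idle_fractions(PARSER_BASE, RECON_BASE + ["strength_calculation"], avg_runtime)

for name, (parser_idle, recon_idle) in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(f"{name}: parser idle {parser_idle:.1%}, reconstructor idle {recon_idle:.1%}")
```

With these made-up numbers, shifting one function changes which core is the bottleneck, which is the kind of trade-off the two scenarios explore.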

Simulation result (2): Variation of buffers

Figure 8. Average idle times of all system cores while decoding (a) intra-coded I-frames. For the simulations, the calculation of the filter strength was assigned to the reconstructor core [Figure 5(b)].

- Increasing the PSNR value (and the bitrate) mainly raises the macroblock processing complexity at the parsing core and decreases performance.
- At a buffer size of one macroblock (1 MB), the Foreman sequence performs best at 35 dB.
- At a buffer size of five macroblocks (5 MB), a continuous performance decrease with increasing PSNR values can be observed for the Foreman sequence.
- For the Flowergarden and the Barcelona sequences, the higher parsing complexity typically results in higher idle times and smaller performance improvements at larger buffer sizes.

Figure 8. Average idle times of all system cores while decoding (b) inter-coded P- and (c) inter-coded B-frames of three test sequences.
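How the buffer size between the two cores influences idle time can be sketched with a small two-stage pipeline model. The buffer semantics are an assumption: the parser needs a free slot before it starts a macroblock, and the reconstructor frees a slot when it starts processing a macroblock. The slides do not specify the simulator's actual buffer model, and the runtimes are hypothetical.

```python
def simulate_pipeline(parse_times, recon_times, buffer_size):
    """Simulate parser -> buffer -> reconstructor for a list of macroblocks.

    Returns (makespan, parser_idle, reconstructor_idle) in time units.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    if len(parse_times) != len(recon_times):
        raise ValueError("both runtime lists must have the same length")

    n = len(parse_times)
    parse_end = [0] * n
    recon_start = [0] * n
    recon_end = [0] * n

    for i in range(n):
        # The parser continues after its previous macroblock ...
        start = parse_end[i - 1] if i > 0 else 0
        # ... but must wait until the slot used by macroblock i - buffer_size is free.
        if i >= buffer_size:
            start = max(start, recon_start[i - buffer_size])
        parse_end[i] = start + parse_times[i]

        # The reconstructor needs the parsed data and must be done with macroblock i - 1.
        previous_end = recon_end[i - 1] if i > 0 else 0
        recon_start[i] = max(parse_end[i], previous_end)
        recon_end[i] = recon_start[i] + recon_times[i]

    makespan = recon_end[-1] if n else 0
    return makespan, makespan - sum(parse_times), makespan - sum(recon_times)

# Hypothetical runtimes: parsing cost varies strongly, reconstruction is constant.
parse_times = [4, 1, 1, 4]
recon_times = [2, 2, 2, 2]

result_b1 = simulate_pipeline(parse_times, recon_times, buffer_size=1)
result_b2 = simulate_pipeline(parse_times, recon_times, buffer_size=2)
print(result_b1)  # (14, 4, 6)
print(result_b2)  # (12, 2, 4)
```

In this toy example the larger buffer lets the parser work ahead during cheap macroblocks, which reduces idle time on both cores.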

Conclusion

- A simulator for mapping the H.264 decoding process onto a hardware architecture has been introduced.
- We have demonstrated the simulator's ability to analyze the efficiency of a multi-core architecture under various conditions.

References

- [4] T.-T. Shih, C.-L. Yang, and Y.-S. Tung, “Workload characterization of the H.264/AVC decoder,” in Proc. of the 5th IEEE Pacific-Rim Conference on Multimedia, pp. 957–966, 2004.
- [6] F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, “A macroblock-level analysis on the dynamic behaviour of an H.264 decoder,” in Proc. of ISCE 2007, (Dallas), June 2007.
- [7] E. B. van der Tol, E. G. Jaspers, and R. H. Gelderblom, “Mapping of H.264 decoding on a multiprocessor architecture,” in Proc. of the SPIE, 5022, pp. 707–718, May 2003.
- F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, “Evaluation of data-parallel splitting approaches for H.264 decoding,” in Proc. of the 6th International Conference on Advances in Mobile Computing and Multimedia, Linz, November 2008. http://www.powercam.cc/slide/1580
- Florian Seitner, Josef Meser, Gerold Schedelberger, Andreas Wasserbauer, Michael Bleyer, Margrit Gelautz, Markus Schutti, Ralf Schreier, Premysl Vaclavik, Gerald Krottendorfer, Günther Truhlar, Thomas Bauernfeind, Philipp Beham, “Design Methodology for the SVENm Multimedia Engine,” Austrochip 2008, Invited Poster.

FFmpeg H.264 decoder

- H.264 benchmarks
  - JM Reference Codec
  - x264 encoder
  - FFmpeg H.264 decoder
- FFmpeg H.264 decoder
  - FFmpeg includes an H.264/AVC decoder that implements most of the features of the main and high profiles of the standard.
  - The code is highly optimized and includes MMX/SSE and AltiVec SIMD instructions for the most time-consuming kernels.
  - It is widely used in free multimedia players such as MPlayer, VLC media player (VideoLAN), and xine.
  - http://ffmpeg.org/
