A high-level simulator for the H.264/AVC decoding process in multi-core systems
Florian H. Seitner, Ralf M. Schreier, Michael Bleyer, Margrit Gelautz
Vienna University of Technology, Austria
SPIE IS&T Electronic Imaging Conference, Multimedia on Mobile Devices 2008
Figure 1. Dynamic variations in the execution times of individual macroblocks in the H.264 decoding process. Histograms are shown for six IPB-coded sequences of 100 frames each, with a Group of Pictures (GOP) size of 13.
Table 1. Six test sequences with normalization to 35 dB.
In this paper, we present a high-level simulator for multi-core implementations of the H.264 decoder.
Figure 2. Concept of the simulator. (a) Profiling data. (b) Underlying hardware.
(c) Simulation of a splitting that maps f1 and f2 to the first core and f3 to the second one.
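The idea in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator from the paper: the cycle counts are made up, `simulate_split` is a hypothetical helper name, and data dependencies between the cores are ignored, so the result is just the accumulated load per core.

```python
def simulate_split(profile, mapping, n_cores):
    """Accumulate profiled cycles per core for a function-to-core mapping.

    profile: list of dicts, one per macroblock, mapping function name -> cycles
    mapping: dict mapping function name -> core index
    """
    load = [0] * n_cores
    for mb_times in profile:
        for func, cycles in mb_times.items():
            load[mapping[func]] += cycles
    return load


# Hypothetical profiling data for three macroblocks (cycles per function).
profile = [
    {"f1": 120, "f2": 80, "f3": 210},
    {"f1": 100, "f2": 60, "f3": 150},
    {"f1": 140, "f2": 90, "f3": 260},
]

# The splitting from Figure 2(c): f1 and f2 on core 0, f3 on core 1.
mapping = {"f1": 0, "f2": 0, "f3": 1}

loads = simulate_split(profile, mapping, n_cores=2)
print(loads)
```

Changing only the `mapping` dictionary is enough to try a different splitting on the same profiling data.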
The H.264 decoding process http://www.powercam.cc/slide/1580
Figure 3. Partitioning the H.264 decoder on a dual-core system.
High-level simulator - Austrochip 2008, Invited Poster
CHILI Vector Processor
• CHILI Core with 32-bit / 4 Slots / 8 SIMD
• High performance for signal processing and control code
• Compiler friendly instruction set
• Fully programmable (C / Assembler)
• C-Compiler (LLVM, GCC) and instruction set simulator available
CHILI Processor Features
• Separate instruction and data path
• 16-bit SIMD operands
• 64 32-bit general purpose registers
• 128-bit core memory interface
• 64 KB instruction cache
• 64 KB data SRAM (core memory)
• 64-channel data load and store DMA controller
• 1.92 GMAC 16-bit operations (@ 240 MHz)
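The peak figure in the last bullet can be checked with simple arithmetic. The sketch below assumes that "8 SIMD" means eight 16-bit multiply-accumulate operations per clock cycle; that reading is an assumption, not something the slides state, and `peak_macs_per_second` is a hypothetical helper name.

```python
def peak_macs_per_second(macs_per_cycle, clock_hz):
    """Peak MAC rate = MAC operations per cycle times clock frequency."""
    return macs_per_cycle * clock_hz


# Assumption: 8 parallel 16-bit MACs per cycle at the stated 240 MHz clock.
rate = peak_macs_per_second(macs_per_cycle=8, clock_hz=240_000_000)
gmac = rate / 1e9
print(f"{gmac:.2f} GMAC/s")
```

Under this assumption the product is consistent with the 1.92 GMAC figure quoted above.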
SVENm Multimedia Engine
• Video / multimedia companion
• Targets H.264 encoding / decoding at SD resolution
Figure 5. Two methods for partitioning the H.264 decoder on a dual-core system.
(a) Scenario 1: The function Strength Calc. is part of the parsing module.
(b) Scenario 2: The function Strength Calc. is part of the reconstructor.
Figure 6. The Foreman sequence at different bitrates. Macroblocks are classified based on the percentage of the overall time that is spent in the parsing module of the decoder.
A value of 80% means that 80% of the runtime is spent in the parser, while the remaining 20% is consumed in the reconstructor. A value of 50% indicates a perfect balance.
Figure 6(b) indicates that the workload balance between the two processors is significantly improved.
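The classification used in Figure 6 can be written down as a short sketch. The function names and the 10% bin width are assumptions chosen for illustration; the slides do not specify how the percentages were binned.

```python
def parser_share(parse_cycles, recon_cycles):
    """Percentage of a macroblock's total decoding time spent in the parser."""
    total = parse_cycles + recon_cycles
    if total <= 0:
        raise ValueError("macroblock has no recorded runtime")
    return 100.0 * parse_cycles / total


def share_bin(share, bin_width=10):
    """Lower edge of the bin a share falls into; 100% goes into the top bin."""
    n_bins = 100 // bin_width
    index = min(int(share // bin_width), n_bins - 1)
    return index * bin_width


# 80 cycles in the parser and 20 in the reconstructor: parser-dominated.
print(parser_share(80, 20), share_bin(parser_share(80, 20)))
# Equal time in both modules: perfectly balanced.
print(parser_share(50, 50), share_bin(parser_share(50, 50)))
```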
Figure 7. Idle times of the parser and reconstructor cores for three test sequences. (a) Filter strength calculation is done at the parser side. (b) Filter strength calculation is performed at the reconstructor side.
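How moving the filter strength calculation shifts idle time between the two cores can be illustrated with a toy pipeline model. Everything here is an assumption made for the sketch: the cycle counts are invented, the buffer between the cores is unbounded, and the reconstructor may start a macroblock only after the parser has finished it. It does not reproduce the measurements in Figure 7.

```python
def pipeline_idle(parse, recon):
    """Return (parser_idle, reconstructor_idle) for a two-stage pipeline.

    parse, recon: per-macroblock cycle counts for each core.
    Idle time is the makespan minus the time a core spends working.
    """
    t_parse = 0  # time at which the parser finishes the current macroblock
    t_recon = 0  # time at which the reconstructor finishes it
    for p, r in zip(parse, recon):
        t_parse += p
        # The reconstructor waits for the parser output and its own previous job.
        t_recon = max(t_recon, t_parse) + r
    makespan = t_recon
    return makespan - sum(parse), makespan - sum(recon)


# Hypothetical per-macroblock cycles.
base_parse = [60, 50, 70]
base_recon = [40, 45, 50]
strength = [30, 30, 30]  # filter strength calculation

# Scenario (a): strength calculation runs on the parser core.
s1 = pipeline_idle([p + s for p, s in zip(base_parse, strength)], base_recon)
# Scenario (b): strength calculation runs on the reconstructor core.
s2 = pipeline_idle(base_parse, [r + s for r, s in zip(base_recon, strength)])

print("scenario (a):", s1)
print("scenario (b):", s2)
```

With these made-up numbers, moving the calculation to the reconstructor shortens the overall makespan and reduces the reconstructor's idle time, at the cost of more idle time on the parser core.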
Figure 8. Average idle times of all system cores while decoding (a) intra-coded I-frames. For the simulations, the calculation of the filter strength was assigned to the reconstructor core [Figure 5(b)].
Figure 8. Average idle times of all system cores while decoding (b) inter-coded P- and (c) inter-coded B-frames of three test sequences.