
Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection

Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection. Bongsoo Jung, Byeungwoo Jeon. Journal of Visual Communication and Image Representation 2008. Outline. Introduction Complexity Analysis Method Pre Macroblock Mode Selection



Presentation Transcript


  1. Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection • Bongsoo Jung, Byeungwoo Jeon • Journal of Visual Communication and Image Representation, 2008

  2. Outline • Introduction • Complexity Analysis • Method • Pre Macroblock Mode Selection • Adaptive Slice-level Parallelism • Experimental Results • Conclusions

  3. Introduction • H.264/AVC achieves high coding efficiency • Variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc. • High computational complexity • Complexity reduction algorithms • Parallel processing

  4. Introduction • GOP level: simple, but high latency • Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability • Slice level: slices are encoded independently, but coding efficiency is lower • Macroblock level: high dependency

  5. Introduction • MBs in a slice may not have similar computational complexity • This causes unnecessary extra waiting time in some threads • [Figure: encoding time of slices 0 to 7 on processing units PU0 to PU7]

  6. Main Purpose • Objective • Use a parallel algorithm to speed up the H.264/AVC encoder • Maximize parallel efficiency by distributing the workload equally • Method • Preprocessing: fast MB mode selection • Adaptive slice-level parallelism

  7. Complexity Analysis • Inter prediction modes of MBs in H.264 • Intra prediction modes: 4*4, 16*16

  8. Complexity Analysis • The run-time complexity of the H.264/AVC encoder • Pentium IV 2.4GHz • Foreman_CIF with IPPP structure

  9. Pre Macroblock Mode Selection: Overview • Why? • ME with variable block sizes has high computational complexity • Remove unnecessary ME block sizes and RD calculations of intra prediction modes • This removal leads to • Complexity reduction • Workload balancing among slices

  10. Pre Macroblock Mode Selection: Inter MB mode selection • MC block sizes in a video sequence • Foreground region: 8*8 or smaller • Non-moving region: 16*16 • High temporal correlation • Check the consistency history of block size 16*16 and zero MV • Two measurements • Zero motion consistency (ZMC) • Large block consistency (LBC)

  11. Pre Macroblock Mode Selection: Inter MB mode selection • Zero Motion Consistency (ZMC) • Indicates for how many consecutive frames a specified block has had a zero MV • When a block is encoded in intra mode, ZMC is set to 0 • t: frame index; ZMC_0 = 0; (n,m;i,j) denotes the 4*4 block at (n,m) within MB (i,j) • A high ZMC value implies a high probability of belonging to a background region

  12. Pre Macroblock Mode Selection: Inter MB mode selection • Zero Motion Consistency Score (ZMCS) • Indicates how likely an MB is to be a stationary region • T_MOTION: a threshold value
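The ZMC counter and its score can be sketched as follows. This is a minimal illustration, not the authors' formula, which did not survive in the transcript. The function names are hypothetical, T_MOTION = 4 is the value quoted on a later slide, and the rule for aggregating the sixteen 4*4 blocks of an MB (all blocks must reach the threshold) is an assumption.

```python
from typing import List, Tuple

T_MOTION = 4  # threshold value quoted on a later slide


def update_zmc(prev_zmc: int, mv: Tuple[int, int], is_intra: bool) -> int:
    """Return the zero-motion run length of one 4x4 block after coding a frame."""
    if is_intra or mv != (0, 0):
        return 0          # intra coding or any motion breaks the run
    return prev_zmc + 1   # one more consecutive frame with a zero MV


def zmcs(block_zmcs: List[int], t_motion: int = T_MOTION) -> str:
    """Score an MB from the ZMC values of its 4x4 blocks.

    Assumed aggregation: the MB is 'High' only if every block has been
    stationary for at least t_motion frames.
    """
    return "High" if min(block_zmcs) >= t_motion else "Low"
```

For example, an MB whose sixteen blocks all have ZMC = 4 scores High, while a single block with ZMC = 2 makes the MB score Low under this assumed rule.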

  13. Pre Macroblock Mode Selection: Inter MB mode selection • Large Block Consistency (LBC) • Indicates the number of consecutive frames having a 16*16 MC block size at the (i,j)-th MB • When a block is encoded in intra mode, LBC is set to 0 • bestMode_t(i,j): the best MB mode of the (i,j)-th MB in the t-th frame • LBC_0 = 0

  14. Pre Macroblock Mode Selection: Inter MB mode selection • Large Block Consistency Score (LBCS) • Indicates how likely an MB is to be partitioned as 16*16 • T_MODE1, T_MODE2: threshold values used to assess the LBC
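A corresponding sketch for LBC and LBCS is given below. The slide equations are not recoverable, so several points are assumptions: the mode names, counting SKIP together with P16x16 as a "16*16" mode, and mapping the two thresholds to the High, Medium and Low levels that appear on a later slide. T_MODE1 = 1 and T_MODE2 = 4 are the values quoted there.

```python
T_MODE1, T_MODE2 = 1, 4  # threshold values quoted on a later slide

# Assumption: modes that count as a 16x16 MC block size.
LARGE_BLOCK_MODES = {"SKIP", "P16x16"}


def update_lbc(prev_lbc: int, best_mode: str) -> int:
    """Return the run length of consecutive frames coded with a 16x16 block."""
    if best_mode in LARGE_BLOCK_MODES:
        return prev_lbc + 1
    return 0  # intra modes and smaller partitions reset the counter


def lbcs(lbc: int, t_mode1: int = T_MODE1, t_mode2: int = T_MODE2) -> str:
    """Map an LBC value to a three-level score (assumed threshold rule)."""
    if lbc >= t_mode2:
        return "High"
    if lbc >= t_mode1:
        return "Medium"
    return "Low"
```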

  15. Pre Macroblock Mode Selection: Inter MB mode selection • An illustration of LBCS

  16. Pre Macroblock Mode Selection: Inter MB mode selection • Conditional probability of MB modes given ZMCS = High • The other block sizes are very unlikely to appear (probability less than about 0.04) • SKIP and P16*16 modes can therefore be detected early • T_MOTION = 4

  17. Pre Macroblock Mode Selection: Inter MB mode selection • Joint conditional probability of MB modes given LBCS, with ZMCS = Low • T_MODE1 = 1, T_MODE2 = 4 • A: LBCS = High, B: LBCS = Medium, C: LBCS = Low

  18. Pre Macroblock Mode Selection: Pre selective intra mode selection • Computing the RD costs of intra modes is a high computational load • Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding

  19. Pre Macroblock Mode Selection: Selective intra mode selection • Mean Absolute Temporal Difference (MATD) • Mean Absolute Spatial Difference (MASD) • c_{x,y}: pixel value at location (x,y) of the MB in the current frame • r_{x,y}: pixel value at location (x,y) of the MB in the previous frame • X, Y: horizontal and vertical dimensions of an MB • MASD_H: the MASD between horizontally neighboring pixels • MASD_V: the MASD between vertically neighboring pixels

  20. Pre Macroblock Mode Selection: Selective intra mode selection • Compare MATD and MASD to decide whether the RD costs of intra modes should be calculated for the current MB • An MB that is more temporally correlated than spatially correlated skips the intra mode search • A larger w makes skipping the intra mode search easier • A smaller QP leads to more intra modes than a larger QP • w: weighting factor, currently set to 0.6
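The comparison of MATD and MASD can be sketched as below. The definitions follow the symbol descriptions on the slides, but the exact equations are missing from the transcript, so the normalization, the averaging of MASD_H and MASD_V, and the decision rule MATD < w * MASD are assumptions. The rule was chosen so that a larger w makes skipping easier, as the slide states. The QP dependence is not modeled.

```python
from typing import List

Block = List[List[int]]  # pixel rows of one macroblock


def matd(cur: Block, ref: Block) -> float:
    """Mean absolute difference between co-located pixels of two MBs."""
    h, w = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - ref[y][x]) for y in range(h) for x in range(w))
    return total / (h * w)


def masd_h(cur: Block) -> float:
    """Mean absolute difference between horizontally neighboring pixels."""
    h, w = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y][x + 1]) for y in range(h) for x in range(w - 1))
    return total / (h * (w - 1))


def masd_v(cur: Block) -> float:
    """Mean absolute difference between vertically neighboring pixels."""
    h, w = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y + 1][x]) for y in range(h - 1) for x in range(w))
    return total / ((h - 1) * w)


def skip_intra_search(cur: Block, ref: Block, w: float = 0.6) -> bool:
    """Return True when the intra RD search can be skipped (assumed rule)."""
    masd = (masd_h(cur) + masd_v(cur)) / 2  # assumption: average of both directions
    return matd(cur, ref) < w * masd
```

With a 4*4 block holding a horizontal ramp (0, 10, 20, 30 in every row) and an identical reference block, MATD is 0 and MASD is 5, so the intra search is skipped.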

  21. Pre Macroblock Mode Selection: MB mode classification • Decision table of candidate MB modes • A block diagram of MB mode selection

  22. Adaptive Slice-level Parallelism: Overview • Characteristics • Easy to implement • Low overhead of communication among processor units • Good scalability • Increases bitrate • A slice boundary is defined on the basis of a fixed number of MBs or a fixed number of bits, so it is hard to decide a slice boundary prior to encoding

  23. Adaptive Slice-level Parallelism: Fixed MB assignment • The number of consecutive MBs in each slice • L: the number of processor units on a multi-core system • M: the total number of MBs in a frame • i: slice index • Example: with L = 8 processing units and a CIF sequence (352*288), M = 22*18 = 396, so about 49 MBs are assigned to each slice (396/8 = 49.5)
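The CIF example can be checked with a short sketch. The slide's formula for the per-slice MB count is not recoverable, so giving the leftover MBs to the first slices is an assumption, and the function name is hypothetical.

```python
from typing import List


def fixed_mb_assignment(total_mbs: int, num_slices: int) -> List[int]:
    """Split total_mbs into num_slices nearly equal runs of consecutive MBs."""
    base, extra = divmod(total_mbs, num_slices)
    # Assumption: the first `extra` slices take one additional MB each.
    return [base + 1 if i < extra else base for i in range(num_slices)]


mbs_per_frame = (352 // 16) * (288 // 16)   # 22 * 18 = 396 MBs in a CIF frame
print(fixed_mb_assignment(mbs_per_frame, 8))
# prints [50, 50, 50, 50, 49, 49, 49, 49]
```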

  24. Adaptive Slice-level Parallelism: Fixed MB assignment • The scheduling of slice-level parallelism on eight processor units • [Figure: encoding time of slices 0 to 7 on PU0 to PU7 in the ideal case and in the practical case, where one slice is marked as the bottleneck]

  25. Adaptive Slice-level Parallelism: Fixed MB assignment • The imbalance of computational load distribution • [Figure panels: Fast ME / Fast Mode Search; Exhaustive Search Method]

  26. Adaptive Slice-level Parallelism: Fixed MB assignment • Computational load for encoding one frame in slice-level parallelism • Computational load of the t-th frame on a single-processor system • C^t_slice(i): the computational load of the i-th slice in the t-th frame • L: number of slices in a frame

  27. Adaptive Slice-level Parallelism: Fixed MB assignment • The speedup of a multiprocessor system over a single-processor system • To achieve the maximum speedup, the computational loads of the slices should be as similar as possible, which motivates an adaptive slice partition method
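The effect of load imbalance on speedup can be illustrated with a small sketch. It assumes, in line with the slide definitions, that a single processor needs the sum of all slice loads while the parallel encoder needs the load of its slowest slice. Synchronization and communication overhead are ignored, and the load values are made-up numbers for illustration only.

```python
from typing import List


def speedup(slice_loads: List[float]) -> float:
    """Speedup of slice-parallel encoding over one processor (overhead ignored)."""
    return sum(slice_loads) / max(slice_loads)


balanced = [10.0] * 8             # eight slices with identical load
unbalanced = [10.0] * 7 + [20.0]  # one slice takes twice as long

print(speedup(balanced))    # prints 8.0
print(speedup(unbalanced))  # prints 4.5
```

A single slow slice nearly halves the achievable speedup in this example, which is why equalizing slice loads matters.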

  28. Adaptive Slice-level Parallelism: Complexity estimation model • A simple estimation method that uses the result of fast MB mode selection • Define the group value g corresponding to the candidate MB modes

  29. Adaptive Slice-level Parallelism: Complexity estimation model • Complexity model • C_{k,CHKIntra}(g): complexity cost of the k-th MB • g: group index • e_inter: estimated complexity cost of inter mode in g = 1 • e_intra: complexity cost according to the intra mode check in g = 1 • α1, α2, α3, β1, β2, β3: weighting values of complexity cost

  30. Adaptive Slice-level Parallelism: Complexity estimation model • Relative computational load • CHK_intra = 0 (assume e_inter = 1, e_intra = 0): α1 = 2.42, α2 = 3.12, α3 = 5.28 • CHK_intra = 1 (assume e_inter = 1, e_intra = 3.97): β1 = 0.82, β2 = 0.83, β3 = 0.84

  31. Adaptive Slice-level Parallelism: Adaptive MB assignment • The total computational load of the t-th frame • Ideal computational load of each slice for a uniform workload distribution

  32. Adaptive Slice-level Parallelism: Adaptive MB assignment • MB assignment to slices • Balances the load much better than a fixed MB assignment in each slice
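One way to realize the adaptive assignment is sketched below. It is not the authors' exact rule, which is missing from the transcript. It assumes that each MB already has an estimated complexity cost (for example from the complexity model), that the frame has at least as many MBs as slices, and that slice i ends where the cumulative cost is closest to i times the ideal per-slice load. The function name and the sample costs are hypothetical.

```python
from typing import List


def adaptive_mb_assignment(mb_costs: List[float], num_slices: int) -> List[int]:
    """Return the number of consecutive MBs per slice for near-equal loads."""
    ideal = sum(mb_costs) / num_slices   # ideal load of each slice
    counts = []
    acc = 0.0    # cumulative cost of all MBs assigned so far
    start = 0    # index of the first MB of the current slice
    k = 0        # index of the next unassigned MB
    for i in range(1, num_slices):
        target = i * ideal
        # Keep at least one MB for each of the remaining slices.
        limit = len(mb_costs) - (num_slices - i)
        # Take an MB if the slice is still empty, or if the midpoint of
        # its cost lies before the target boundary.
        while k < limit and (k == start or acc + mb_costs[k] / 2 <= target):
            acc += mb_costs[k]
            k += 1
        counts.append(k - start)
        start = k
    counts.append(len(mb_costs) - start)  # the last slice takes the rest
    return counts


costs = [1.0] * 8 + [4.0] * 4    # eight cheap MBs followed by four expensive ones
print(adaptive_mb_assignment(costs, 3))
# prints [8, 2, 2]
```

In this toy frame each adaptive slice carries a load of 8, whereas a fixed split of four MBs per slice would give loads of 4, 4 and 16.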

  33. Adaptive Slice-level Parallelism: Adaptive MB assignment • Entire block diagram

  34. Experimental Results: Overview • Performance comparison between the proposed MB mode decision and the conventional method • Comparison of adaptive slice-level parallelism with fixed slice-level parallelism

  35. Experimental Results: MB mode selection • Average encoding time saving AST [%] • BDPSNR and BDBR are used to measure the performance against FULL_1Slice • FULL_1Slice: exhaustive method • FMD_1Slice: fast MB mode search method

  36. Experimental Results: Rate-distortion curves

  37. Experimental Results • R-D performance compared to one slice per frame (FMD_1Slice)

  38. Experimental Results: Rate-distortion curves

  39. Experimental Results: Slice-level parallelism • Comparing adaptive and fixed slice-level parallelism • Speedup = (encoding time of one slice per frame on a single-processor system) / (longest encoding time of a slice), evaluated with the longest slice time under fixed mode and under adaptive mode

  40. Experimental Results: Speedup

  41. Conclusions • Proposed a fast MB mode selection using the consistency history of the 16*16 block size and zero MV • Proposed an intra mode selection that compares temporal and spatial correlation • Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder

  42. References • Z. Chen, P. Zhou, Y. He, Fast motion estimation for JVT, JVT Doc. JVT-G016, March 2003. • B. Jeon, J. Lee, Fast mode decision for H.264, JVT-J003, ISO/IEC MPEG and ITU-T VCEG Joint Video Team (Waikoloa, HI), December 2003. • I. Choi, J. Lee, B. Jeon, Fast coding mode selection with rate-distortion optimization for MPEG-4 Part-10 AVC/H.264, IEEE Trans. Circuits Syst. Video Technol. 16 (12) (2006) 1557–1561.
