
Efficient Prediction Structure for Multi-view Video Coding

Efficient Prediction Structure for Multi-view Video Coding. Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand. CSVT 2007. Outline: multi-view video coding (MVC) introduction; requirements and test conditions for MVC; prediction structures; experimental results; conclusion.


Presentation Transcript


  1. Efficient Prediction Structure for Multi-view Video Coding • Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand • CSVT 2007

  2. Outline • Multi-view video coding (MVC) introduction • Requirements and test conditions for MVC • Prediction structures • Experimental results • Conclusion

  3. MVC Introduction • MVC: Multi-view Video Coding • Multi-view video (MVV): video captured by a system that uses multiple camera views of the same scene. • Usage: 3DTV, free viewpoint video (FVV), etc.

  4. Requirements for MVC • Temporal random access • View random access • Scalability • Backward compatibility • Quality consistency • Parallel processing

  5. Temporal and inter-view correlation • [Figure: prediction modes for a block: temporal, inter-view, and temporal/inter-view mixed mode]

  6. Temporal and inter-view correlation analysis • An H.264/AVC encoder was used with the following settings: • Motion compensation block size of 16×16 • Search range of ±32 pixels • Lagrange parameter (λ) of 29.5 • [symbol missing in transcript] denotes the decrease of the average in comparison to temporal prediction only.

  7. Temporal and inter-view correlation analysis (cont’d) • Simply including temporal and inter-view prediction modes

  8. Lagrangian cost function • Lagrangian cost function: J = D + λR, where λ is the Lagrange parameter. • D denotes the distortion. • R denotes the number of bits needed to transmit all components of the motion vector. • For each block in a picture, the algorithm chooses the motion vector (MV) within a search range that minimizes J. • The distortion in the subject macroblock B is calculated by: [equation not preserved in the transcript]
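The search described on this slide can be sketched as a small full-search block matcher. This is a minimal illustration, not the encoder used in the paper: it assumes the common form J = D + λR, uses the sum of absolute differences as the distortion measure (the slide's own distortion formula is not preserved), and estimates R with a signed Exp-Golomb code length per vector component. The function names `sad`, `mv_bits` and `best_vector` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and the
    block displaced by (dx, dy) in `ref`. Assumed distortion measure."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def mv_bits(dx, dy):
    """Hypothetical rate model: signed Exp-Golomb code length per component."""
    def se_len(v):
        code_num = 2 * abs(v) - (1 if v > 0 else 0)
        return 2 * (code_num + 1).bit_length() - 1
    return se_len(dx) + se_len(dy)


def best_vector(cur, ref, bx, by, size, search, lam):
    """Full search over [-search, search]^2; returns (J, dx, dy) with minimal
    J = D + lam * R. Candidates that leave the reference picture are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height and
                      0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size) + lam * mv_bits(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The same routine applies to temporal and inter-view prediction alike: only the choice of `ref` changes (an earlier picture of the same view, or the picture of a neighbouring view at the same time instance). A larger `lam` favours short vectors even when a longer one gives a slightly lower distortion.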

  9. Test data and test conditions • 1D camera arrangements: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arched) • 2D camera arrangements: Flamenco2 (cross), AkkoKayo (array) • 5 to 16 camera views are used • The target is high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.

  10. Knowledge – hierarchical B picture, QP cascading • Hierarchical B pictures, key pictures, non-key pictures [1] • QP cascading [1] • [Figure: hierarchical B picture structure between two key pictures] • [1] “Analysis of hierarchical B pictures and MCTF”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006
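The hierarchy between two key pictures can be sketched as follows, assuming a dyadic GOP whose length is a power of two. Key pictures sit at temporal level 0 and each further level halves the picture distance. The QP rule shown (a constant step per level) is only a placeholder, since the slide does not preserve the actual cascading values; `temporal_level` and `cascaded_qp` are hypothetical names.

```python
def temporal_level(poc, gop_size):
    """Temporal level of picture `poc` in a dyadic hierarchical-B GOP.
    Level 0 is a key picture; the highest level holds the odd pictures."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = poc % gop_size
    if pos == 0:
        return 0
    level = gop_size.bit_length() - 1  # log2(gop_size)
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


def cascaded_qp(base_qp, level, step=1):
    """Illustrative QP cascading: coarser quantization at higher levels.
    The constant `step` per level is an assumption, not the paper's rule."""
    return base_qp + step * level
```

For a GOP of 8, pictures 0 and 8 are key pictures, picture 4 is at level 1, pictures 2 and 6 at level 2, and the odd pictures at level 3.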

  11. Knowledge – DPB size • Decoded Picture Buffer (DPB) size is increased to: [formula not preserved in the transcript] [2] • Memory-efficient reordering of multi-view input for compression • [2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006

  12. Two tasks • To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets. • To adapt the prediction structures to the random access specification.

  13. Prediction structure • Simulcast coding structure • To allow synchronization and random access, all key pictures are coded in intra mode.

  14. Prediction structure (cont’d) • The first view is called the base view; its key pictures remain I frames.

  15. Prediction structure (cont’d) • Alternative inter-view prediction structures for key pictures: KS_IPP, KS_PIP, KS_IBP • [Figure: the three structures shown for a linear camera arrangement and for a 2D camera array]
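One way to read the three structure names for a linear camera arrangement is as the sequence of key-picture types across views. The sketch below follows that reading and is an assumption: in KS_IPP the first view is intra-coded, in KS_PIP the middle view is, and in KS_IBP views alternate between B and P after the intra-coded first view, with a trailing view that has no right-hand neighbour coded as P. The function name `key_picture_types` is hypothetical, and the 2D array case is not covered.

```python
def key_picture_types(num_views, scheme):
    """Key-picture type per view for a linear camera arrangement.
    The mapping from scheme name to types is an assumed interpretation."""
    if num_views < 1:
        raise ValueError("num_views must be positive")
    if scheme == "KS_IPP":
        return ["I"] + ["P"] * (num_views - 1)
    if scheme == "KS_PIP":
        types = ["P"] * num_views
        types[num_views // 2] = "I"  # intra-coded view in the middle
        return types
    if scheme == "KS_IBP":
        types = []
        for view in range(num_views):
            if view == 0:
                types.append("I")
            elif view % 2 == 1 and view < num_views - 1:
                types.append("B")  # predicted from both neighbouring views
            else:
                types.append("P")
        return types
    raise ValueError("unknown scheme: " + scheme)
```

Under this reading, placing the I picture in the middle (KS_PIP) shortens the longest prediction chain, which matters for view random access, while KS_IBP exploits both neighbouring views for every second camera.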

  16. Prediction structure (cont’d) • Inter-view prediction for key and non-key pictures: AS_IPP mode

  17. Experimental results – objective evaluation • [Table: average coding gains compared with anchor coding] • [Figure: Ballroom test result]

  18. Experimental results – subjective evaluation • Different bit-rates were selected for the different data sets. • [Figures: Ballroom test result, Race1 test result]

  19. Experimental results – subjective evaluation • AS_IBP outperforms the anchors significantly. • The gain decreases slightly with higher bit-rates. • [Figure: average results over all test sequences]

  20. Influence of camera density • The Rena sequence is used, which consists of 16 linearly arranged cameras with a 5 cm distance between adjacent cameras. • The experiment is repeated for each shifted set of 9 adjacent cameras. • The structures are applied to every time instance of the MVV sequence without temporal prediction.

  21. Results of experiments on camera density • Coding gain increases with decreasing camera distance and decreasing reconstruction quality.

  22. Results of experiments on camera density (cont’d) • Average per-camera rate relative to the one-camera case • A larger QP value leads to a larger coding gain.

  23. Conclusion • The resulting multi-view prediction structures achieve significant coding gains and are highly flexible. • Parallel processing is supported by the presented sequential processing approach. • Problems: • Large disparities between the different views of multi-view video sequences • Illumination and color inconsistencies across views
