
Low Complexity H.264 Encoder using Machine Learning.

Presentation Transcript


  1. Low Complexity H.264 Encoder using Machine Learning. THEJASWINI PURUSHOTHAM, Electrical Engineering Graduate Student, The University of Texas at Arlington. Advisor: Dr. K. R. Rao, EE Dept., UTA. 8 September 2010

  2. Agenda Introduction. H.264/AVC. Machine learning. C4.5. Weka. Thesis Approach. Results. Conclusions.

  3. Video compression and standardization [Figure: timeline of video coding standards from 1992 to 2010 (MPEG-1, MPEG-2, H.263, MPEG-4 in 1999, H.264 in 2003, VC-1 in 2005, SVC, and H.265/HEVC / NGVC in 2010), plotted against coding efficiency, network awareness and complexity, with applications ranging from video conferencing and mobile phones to hand-held PCs, mobile TV and HDTV] • Need for standardization • Ensures interoperability • Importance of video • Need for compression • High bandwidth requirements • Remove inherent redundancy

  4. Motivation for the research

  5. Motivation for a low complexity H.264 encoder: H.264 can achieve considerably higher coding efficiency than previous standards. Motion estimation, the in-loop deblocking filter, sub-pel interpolation and mode decision bring in the complexity. The high computational complexity of H.264 and the real-time requirements of video systems are the main challenges.

  6. Overview of H.264/AVC

  7. Design Features Highlights • Features for enhancement of prediction • Directional spatial prediction for intra coding • 9 intra 4x4 modes + 4 intra 16x16 modes + 9 intra 8x8 modes • Variable block-size motion compensation with small block size • 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 • Quarter-sample-accurate motion compensation • Multiple reference picture motion compensation • In-the-loop deblocking filtering to remove blocky artifacts • Features for improved coding efficiency • Small block-size transform – 4x4 and 8x8 integer DCT • Exact-match inverse transform • Short word-length transform • Hierarchical block transform • Arithmetic entropy coding • Context-adaptive entropy coding

  8. H.264 - Encoder

  9. H.264 Decoder

  10. H.264 Decoder

  11. Overview of machine learning

  12. Machine learning is a subfield of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn. The machine learning method in this thesis extracts rules and patterns from massive data sets. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.

  13. C4.5 classifier

  14. C4.5 was developed by Ross Quinlan. C4.5 (known as J48 in Weka, which implements it in Java) is a system that constructs classifiers, one of the most commonly used tools in data mining. Such systems take as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes. From these cases the classifier learns to predict the class to which a new case belongs. C4.5 builds a decision tree by splitting the data on the attribute with the highest information gain, normalized as the gain ratio.
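The attribute-selection step can be sketched in Python as below. This is a simplified illustration, not the J48 source. It scores binary threshold tests on numeric attributes, chooses each attribute's threshold by information gain, and then chooses the attribute with the highest gain ratio. C4.5's pruning, its average-gain filter and its handling of missing values are all omitted. The function names, and the attribute names and mode labels in the test data, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_scores(values: Sequence[float], labels: Sequence[str],
                 threshold: float) -> Tuple[float, float]:
    """Information gain and gain ratio of the binary test `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    parts = [p for p in (left, right) if p]
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    # Split information penalises tests that fragment the data.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def best_split(attributes: Dict[str, List[float]],
               labels: List[str]) -> Optional[Tuple[str, float, float, float]]:
    """Return (attribute, threshold, gain, gain ratio) of the best single test."""
    best = None
    for name, values in attributes.items():
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
        if not candidates:
            continue
        (gain, ratio), threshold = max(
            ((split_scores(values, labels, t), t) for t in candidates),
            key=lambda s: s[0][0],  # threshold chosen by information gain
        )
        if best is None or ratio > best[3]:  # attribute chosen by gain ratio
            best = (name, threshold, gain, ratio)
    return best
```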

  15. Illustration of C4.5 classification

  16. Decision tree

  17. WEKA: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules and visualization. It is also well suited for developing new machine learning schemes [25].

  18. Complexity in the H.264 encoder

  19. Figure 1: Multi-frame motion estimation.

  20. The most computationally expensive process in H.264 is motion estimation. For example, assuming full search (FS) with P block types, Q reference frames and a search range of M×N, M×N×P×Q computations are needed.
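A minimal full-search block-matching sketch makes the M×N×P×Q count concrete. It makes several assumptions:

- integer-pel search of a single square block type (P = 1);
- frames stored as lists of rows of luma samples;
- the sum of absolute differences (SAD) as the matching cost.

Candidates that would fall outside the reference frame are skipped rather than padded, so fewer than M×N positions are evaluated near frame borders. The helper names are hypothetical; this is not the JM search code. As a worked example under assumed settings, a ±16 search range (M = N = 33) with 7 block types and 5 reference frames gives 33 × 33 × 7 × 5 = 38,115 evaluations in the slide's formula.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """SAD between the size x size block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur: Frame, refs: Sequence[Frame], bx: int, by: int,
                size: int, search: int) -> Tuple[int, int, Tuple[int, int], int]:
    """Full-search integer-pel motion estimation of one block over all reference frames.

    Returns (best SAD, reference index, (dx, dy), number of SADs evaluated)."""
    height, width = len(cur), len(cur[0])
    best = None
    evaluations = 0
    for ref_idx, ref in enumerate(refs):              # Q reference frames
        for dy in range(-search, search + 1):         # N vertical positions
            for dx in range(-search, search + 1):     # M horizontal positions
                # Skip candidates that would read outside the reference frame.
                if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                    continue
                evaluations += 1
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dx, dy))
    if best is None:
        raise ValueError("no candidate position lies inside the reference frames")
    return best[0], best[1], best[2], evaluations


def full_search_count(m: int, n: int, p: int, q: int) -> int:
    """Slide formula: M x N positions, P block types, Q reference frames."""
    return m * n * p * q
```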

  21. Approach in this thesis

  22. Approach: J48 (the Weka implementation of C4.5) is used to reduce the complexity of mode decision. Statistics are calculated for each 16x16 macroblock of the first four frames of the video sequence: the mean, the variance, the variance of means for all sub-macroblock sizes in the macroblock, the mean of the adjacent macroblocks, the variance of the adjacent macroblocks and the variance of means for all sub-macroblock sizes in the adjacent macroblocks.
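The per-macroblock statistics can be sketched as follows. The sketch rests on three assumptions:

- "Variance of means" for a given partition shape is the population variance of the sub-block means inside the 16x16 macroblock.
- "Adjacent macroblocks" are the left and top neighbours.
- Neighbour statistics are averaged over whichever of those neighbours exist.

All function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List

Frame = List[List[int]]

# Partition shapes (width, height) inside a 16x16 macroblock.
SHAPES = [(16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def macroblock(frame: Frame, mbx: int, mby: int) -> Frame:
    """Extract the 16x16 macroblock at macroblock coordinates (mbx, mby)."""
    return [row[16 * mbx:16 * mbx + 16] for row in frame[16 * mby:16 * mby + 16]]


def sub_blocks(mb: Frame, w: int, h: int) -> List[List[int]]:
    """Flattened, non-overlapping w x h blocks of a 16x16 macroblock."""
    return [
        [mb[y + j][x + i] for j in range(h) for i in range(w)]
        for y in range(0, 16, h)
        for x in range(0, 16, w)
    ]


def mb_statistics(mb: Frame) -> Dict[str, float]:
    """Mean, variance and variance of sub-block means for every partition shape."""
    pixels = [p for row in mb for p in row]
    stats = {"mean": mean(pixels), "variance": pvariance(pixels)}
    for w, h in SHAPES:
        stats[f"var_of_means_{w}x{h}"] = pvariance([mean(b) for b in sub_blocks(mb, w, h)])
    return stats


def neighbour_statistics(frame: Frame, mbx: int, mby: int) -> Dict[str, float]:
    """Statistics averaged over the available left and top neighbours (assumed choice)."""
    neighbours = [macroblock(frame, x, y)
                  for x, y in ((mbx - 1, mby), (mbx, mby - 1)) if x >= 0 and y >= 0]
    if not neighbours:
        return {}
    per_mb = [mb_statistics(nb) for nb in neighbours]
    return {f"adj_{k}": mean(s[k] for s in per_mb) for k in per_mb[0]}
```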

  23. Figure 2: Flow chart of the process followed to achieve the low complexity encoder.

  24. The modes for the same first four frames of the video sequences are determined by the H.264 encoder in the JM 16.2 software. These modes and the computed statistics are given together to the WEKA tool as training attributes. This is an offline process. WEKA uses the C4.5 (J48) classifier algorithm to derive the mode decision tree. A universal tree, intended to give relatively accurate mode decisions for any video sequence, is developed.
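WEKA reads training data in its ARFF text format. The sketch below writes one row per macroblock, with the numeric statistics as attributes and the JM-chosen mode as the nominal class. The relation name, attribute names and class labels are illustrative assumptions, not the thesis's actual training files.

```python
from typing import Sequence


def to_arff(relation: str, attributes: Sequence[str], classes: Sequence[str],
            rows: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Render numeric attributes plus a nominal 'mode' class column as WEKA ARFF text."""
    if len(rows) != len(labels):
        raise ValueError("one label per row is required")
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    # The class attribute comes last, which is where WEKA looks for it by default.
    lines.append("@ATTRIBUTE mode {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        if len(row) != len(attributes) or label not in classes:
            raise ValueError(f"bad row: {row!r}, {label!r}")
        lines.append(",".join(repr(float(v)) for v in row) + "," + label)
    return "\n".join(lines) + "\n"
```

Save the returned text to a `.arff` file and open it in the Weka Explorer. J48 is then available on the Classify tab under trees.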

  25. Different combinations of video sequences are used for training and then testing the mode decision trees; Table 1 summarizes the results. The attributes most commonly selected across all entries in the table are used to build the universal mode decision tree. This tree is implemented as if-else statements in the motion estimation block of JM 16.2, so the mode decision process is reduced to a sequence of if-else statements.
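Once trained, a decision tree is just nested threshold tests, which is why it can be hard-coded as if-else statements. The sketch below shows the idea with a hypothetical tree. Its attributes, thresholds and modes are placeholders, not the tree derived in the thesis. `walk` evaluates the tree from a data structure. `submb_mode_decision_sketch` is the same tree flattened into if-else statements, so the two must always agree.

```python
from typing import Dict, Union

# HYPOTHETICAL tree: (attribute, threshold, subtree if <= threshold, subtree if > threshold).
# Leaves are sub-macroblock modes. Thresholds are placeholders, not trained values.
Tree = Union[str, tuple]
TREE: Tree = ("var_8x8", 25.0, "8x8",
              ("var_of_means_4x4", 40.0,
               ("residual_var", 60.0, "8x4", "4x8"),
               "4x4"))


def walk(tree: Tree, attrs: Dict[str, float]) -> str:
    """Evaluate a tree stored as nested tuples."""
    while not isinstance(tree, str):
        name, threshold, if_le, if_gt = tree
        tree = if_le if attrs[name] <= threshold else if_gt
    return tree


def submb_mode_decision_sketch(attrs: Dict[str, float]) -> str:
    """The same hypothetical tree hard-coded as if-else statements."""
    if attrs["var_8x8"] <= 25.0:
        return "8x8"
    if attrs["var_of_means_4x4"] <= 40.0:
        if attrs["residual_var"] <= 60.0:
            return "8x4"
        return "4x8"
    return "4x4"
```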

  26. Attributes in the thesis: The metrics used in the decision trees are the mean, variance, variance of means, residual absolute sum, residual mean, residual variance, residual variance of means and mean of variances. These metrics were calculated for the main block sizes 16x16, 8x8 and 4x4.
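The residual metrics can be sketched in the same way. The sketch assumes the following:

- The residual is the current block minus its prediction.
- "Residual variance of means" is the variance of the residual's sub-block means.
- "Mean of variances" is the mean of the sub-block variances.

The slides do not say which signal the mean of variances uses; the sketch computes it on the residual. The function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List

Block = List[List[int]]


def residual_metrics(cur: Block, pred: Block, size: int) -> Dict[str, float]:
    """Residual statistics of a square block split into size x size sub-blocks."""
    n = len(cur)
    if n % size:
        raise ValueError("sub-block size must divide the block size")
    res = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
    flat = [r for row in res for r in row]
    subs = [
        [res[y + j][x + i] for j in range(size) for i in range(size)]
        for y in range(0, n, size)
        for x in range(0, n, size)
    ]
    return {
        "residual_abs_sum": sum(abs(r) for r in flat),
        "residual_mean": mean(flat),
        "residual_variance": pvariance(flat),
        f"residual_var_of_means_{size}x{size}": pvariance([mean(s) for s in subs]),
        f"mean_of_variances_{size}x{size}": mean([pvariance(s) for s in subs]),
    }
```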

  27. Decision Tree for mode decision

  28. Table 1: Classification rule accuracy

  29. Table 1 summarizes the WEKA tool results: the accuracy of the modes determined by the classification rules.

  30. Table 2: Results obtained using JM 16.2 and JM with machine learning for 4 frames.

  31. Table 3: Speed-up in encoding time and motion estimation time for 4 frames using machine learning compared to the JM 16.2 encoder.

  32. Motion estimation time for 4 frames for sequences in Table 3.

  33. Table 4: Comparison of compressed file sizes for four frames for sequences in Table 2.

  34. Compressed file sizes using machine learning for four frames for sequences in Table 4.

  35. Table 5: Comparison of PSNR and MSE for four frames.

  36. Comparison of PSNR and MSE for four frames in Table 5.

  37. Table 6: SSIM comparison for four frames.

  38. Comparison of SSIM for four frames in Table 6.

  39. CONCLUSIONS: A single universal mode decision tree was observed to fail in terms of video fidelity when all the ME/MC modes were used in the machine learning algorithm. This thesis therefore uses only the sub-macroblock modes, i.e. 8x8, 8x4, 4x8 and 4x4, for machine learning. The function 'submacroblock_mode_decision' in JM 16.2 was replaced by the if-else statements. The results are tabulated in Tables 7 through 11. From Table 8, the average speed-up in encoding time is 28.5% and the average speed-up in motion estimation time is 42.846%. From Table 9, the average decrease in compressed file size is 0.36%. From Table 11, the average decrease in SSIM is less than 0.0107%. When 100 frames are encoded, the average speed-up in encoding time is 8.5%, the average speed-up in motion estimation time is 18.346%, and the average decrease in SSIM is less than 0.0109%.
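The percentages above compare the machine-learning encoder with the unmodified JM 16.2 encoder. The slides do not give the exact definition. The sketch assumes each figure is the relative saving per sequence, averaged over the sequences, and the same helper then applies to encoding time, motion estimation time, file size and SSIM. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_reduction(reference: float, proposed: float) -> float:
    """Relative saving of `proposed` with respect to `reference`, in percent (assumed definition)."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return (reference - proposed) / reference * 100.0


def average_reduction(reference: Sequence[float], proposed: Sequence[float]) -> float:
    """Mean of the per-sequence percentage reductions."""
    if not reference or len(reference) != len(proposed):
        raise ValueError("need one proposed value per reference value")
    return mean(percent_reduction(r, p) for r, p in zip(reference, proposed))
```

Averaging per-sequence percentages weights every sequence equally. Dividing total saved time by total reference time would instead let long sequences dominate.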

  40. REFERENCES
      [1] JM software: http://iphome.hhi.de/suehring/tml/
      [2] S.-K. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
      [3] H.264 reference: http://www.vcodex.com/files/h264_overview_orig.pdf
      [4] JM reference software documentation manual (JVT-AE010): http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20(JVT-AE010).pdf
      [5] G. A. Davidson et al., “ATSC video and audio coding”, Proc. IEEE, vol. 94, pp. 60-76, Jan. 2006.
      [6] CIF and QCIF formats: http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml
      [7] M. Fiedler, “Implementation of basic H.264/AVC decoder”, seminar paper, Chemnitz University of Technology, June 2004.
      [8] A. Puri, X. Chen and A. Luthra, “Video coding using H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
      [9] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.

  41. REFERENCES (continued)
      [10] T. Wiegand and G. J. Sullivan, “The H.264 video coding standard”, IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
      [11] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
      [12] R. Schäfer, T. Wiegand and H. Schwarz, “The emerging H.264/AVC standard”, EBU Technical Review, Jan. 2003.
      [13] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html
      [14] Z. Wang et al., “Image quality assessment: From error visibility to structural similarity”, IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.
      [15] Z. Wang, L. Lu and A. C. Bovik, “Video quality assessment based on structural distortion measurement”, Signal Processing: Image Communication, Special Issue on Objective Video Quality Metrics, vol. 19, pp. 122-124, Jan. 2004.
      [16] Z. Wang, H. R. Sheikh and A. C. Bovik, “Objective video quality assessment”, in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041-1078, CRC Press, Sept. 2003.
      [17] T. K. Tan, G. Sullivan and T. Wedi, “Recommended simulation conditions for coding efficiency experiments”, ITU-T SG16/Q6, 34th VCEG Meeting, Antalya, Turkey, Jan. 2008, Doc. VCEG-AH10r3.
      [18] P. Carrillo, H. Kalva and T. Pin, “Low complexity H.264 video encoding”, Applications of Digital Image Processing, Proc. SPIE, vol. 7443, 74430A, Sept. 2009.

  42. REFERENCES (continued)
      [19] G. Sullivan and T. Wiegand, “Video compression – From concepts to the H.264/AVC standard”, Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
      [20] H.264 codec reference: http://www.apple.com/quicktime/technologies/h264/
      [21] D. Kumar, P. Shastry and A. Basu, “Overview of the H.264/AVC”, 8th Texas Instruments Developer Conference India, Bangalore, 30 Nov. – 1 Dec. 2005.
      [22] Motion prediction: http://wiki.multimedia.cx/index.php?title=Motion_Prediction
      [23] Z.-Y. Mai et al., “A new rate-distortion optimization using structural information in H.264 I-frame encoder”, ACIVS 2005, LNCS 3708, pp. 435-441, 2005.
      [24] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Synthesis Lectures on Image, Video and Multimedia Processing, Morgan and Claypool, 2006.
      [25] WEKA tool download: http://www.cs.waikato.ac.nz/ml/weka/
      [26] I. Richardson, “The H.264 Advanced Video Compression Standard”, Wiley, 2006.
      [27] I. E. Richardson, “The H.264 Advanced Video Compression Standard”, 2nd edition, Wiley, 2010.
      [28] JM reference software: http://iphome.hhi.de/suehring/tml/download/
      [29] Video sequences: http://trace.eas.asu.edu/yuv/index.html
      [30] E. Peixoto, R. L. de Queiroz and D. Mukherjee, “Mobile video communications using a Wyner-Ziv transcoder”, Proc. SPIE 6822, VCIP, 68220R, Jan. 2008.
      [31] A. Aaron, D. Varodayan and B. Girod, “Wyner-Ziv residual coding of video”, Proc. International Picture Coding Symposium, Beijing, P. R. China, April 2006.

  43. THANK YOU

  44. H.264 - Profiles

  45. Design Features Highlights • Features for enhancement of prediction • Directional spatial prediction for intra coding • Variable block-size motion compensation with small block size • Quarter-sample-accurate motion compensation • Motion vectors over picture boundaries • Multiple reference picture motion compensation • Decoupling of referencing order from display order • Decoupling of picture representation methods from picture referencing capability • Weighted prediction • Improved “skipped” and “direct” motion inference • In-the-loop deblocking filtering

  46. Features for improved coding efficiency • Small block-size transform • Exact-match inverse transform • Short word-length transform • Hierarchical block transform • Arithmetic entropy coding • Context-adaptive entropy coding

  47. Features for robustness to data errors/losses • Parameter set structure • NAL unit syntax structure • Flexible slice size • Flexible macroblock ordering (FMO) • Arbitrary slice ordering (ASO) • Redundant pictures • Data Partitioning • SP/SI synchronization/switching pictures

  48. Directional spatial prediction for intra coding: Intra prediction predicts the texture of the current block from the pixel samples of neighboring blocks. Intra prediction for 4x4 blocks (9 modes) and 16x16 blocks (4 modes) is supported in all H.264 profiles. Intra prediction for 8x8 blocks (9 modes) is supported in the High profiles.
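The sketch below makes directional intra prediction concrete. It builds three of the nine 4x4 luma prediction modes (0: vertical, 1: horizontal, 2: DC) from the reconstructed samples above and to the left of the block. It then picks the mode with the lowest SAD. It assumes all neighbouring samples are available. The standard's rules for unavailable neighbours, the six diagonal modes and rate-distortion optimised mode selection are omitted. The function names are hypothetical.

```python
from typing import List, Sequence

Block = List[List[int]]


def predict_4x4(mode: int, top: Sequence[int], left: Sequence[int]) -> Block:
    """4x4 luma prediction from the 4 samples above (top) and the 4 to the left.

    Only modes 0 (vertical), 1 (horizontal) and 2 (DC) are sketched."""
    if mode == 0:  # vertical: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: copy the left column across
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbours (both sides assumed available)
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")


def block_sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_intra_4x4_mode(block: Block, top: Sequence[int], left: Sequence[int]) -> int:
    """Mode (0-2) whose prediction has the smallest SAD against the block."""
    costs = {m: block_sad(block, predict_4x4(m, top, left)) for m in (0, 1, 2)}
    return min(costs, key=costs.get)
```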

  49. Luma prediction modes in H.264

  50. Variable block-size motion compensation • Partitioned in 2 stages • In the 1st stage, determine the first 4 modes • 16x16 • 16x8 • 8x16 • 8x8 • If mode 4 (8x8) is chosen, further partition every 8x8 block into smaller blocks • 8x4 • 4x8 • 4x4 • At most 16 motion vectors may be transmitted for a 16x16 macroblock • Sub-pixel accuracy • Large computational complexity to determine the modes, but efficient encoding
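The two-stage partition hierarchy can be enumerated directly. The sketch counts the motion vectors a P macroblock carries for a given partitioning, assuming one motion vector per partition (single reference list, no bi-prediction). It confirms the slide's maximum of 16 when every 8x8 sub-macroblock uses 4x4 blocks. It also counts the distinct partitionings of one macroblock, 3 + 4^4 = 259, ignoring skip, intra and reference-frame choices. The names are hypothetical.

```python
from itertools import product
from typing import Iterator, Optional, Sequence, Tuple

# Partitions per macroblock mode and per 8x8 sub-macroblock mode (width x height).
MB_MODES = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_MB_MODES = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode: str, sub_modes: Optional[Sequence[str]] = None) -> int:
    """Motion vectors for one P macroblock, assuming one MV per partition."""
    if mb_mode != "8x8":
        return MB_MODES[mb_mode]
    if sub_modes is None or len(sub_modes) != 4:
        raise ValueError("8x8 mode needs one sub-macroblock mode per 8x8 block")
    return sum(SUB_MB_MODES[m] for m in sub_modes)


def all_partitionings() -> Iterator[Tuple[str, Optional[Tuple[str, ...]]]]:
    """Yield (mb_mode, sub_modes) for every distinct partitioning of a macroblock."""
    for mode in ("16x16", "16x8", "8x16"):   # stage 1 modes without sub-partitioning
        yield mode, None
    for subs in product(SUB_MB_MODES, repeat=4):  # stage 2: one sub-mode per 8x8 block
        yield "8x8", subs
```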
