
Performance Enhancement of Video Compression Algorithms using SIMD



Presentation Transcript


  1. Performance Enhancement of Video Compression Algorithms using SIMD • Valia, Shamik • Jamkar, Saket

  2. Motivation • Understand the SSE architecture • Understand the video compression algorithm and identify its bottlenecks • Improve the performance of the video compression algorithm using the SSE platform

  3. Components of Video Compression Algorithm • Motion Estimation • Motion Compensation and Image Subtraction • Discrete Cosine Transform • Quantization • Run Length Encoding • Huffman Coding

  4. Bottleneck • Motion Estimation • The process of calculating motion vectors by searching for image blocks from a reference image in a new target image • DCT • Technique to transform from the spatial domain to the spatial-frequency domain • Highest energy compaction after the KLT (Karhunen-Loève transform)
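The block search described on this slide can be sketched in plain C. This is a minimal illustration, not the authors' code: it assumes an exhaustive (full) search with the sum of absolute differences (SAD) as the matching cost, and the names `MotionVector`, `block_sad`, and `full_search` are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Displacement of the best match plus its SAD cost (hypothetical type). */
typedef struct { int dx, dy; unsigned sad; } MotionVector;

/* Sum of absolute differences between two n x n blocks of 8-bit pixels. */
static unsigned block_sad(const uint8_t *a, const uint8_t *b, int stride, int n)
{
    unsigned sum = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Exhaustive search: take the n x n block at (bx, by) in the reference
 * frame and look for its best match in the target frame, trying every
 * displacement within +/- range pixels. Assumes the block itself lies
 * inside the frame, so the zero displacement is always a candidate. */
static MotionVector full_search(const uint8_t *ref, const uint8_t *tgt,
                                int width, int height,
                                int bx, int by, int n, int range)
{
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int tx = bx + dx, ty = by + dy;
            /* Skip candidates that fall outside the target frame. */
            if (tx < 0 || ty < 0 || tx + n > width || ty + n > height)
                continue;
            unsigned sad = block_sad(ref + by * width + bx,
                                     tgt + ty * width + tx, width, n);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The inner SAD loop runs once per candidate position, which is why motion estimation tends to dominate the run time and is a natural target for SIMD.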

  5. SSE2 Specifics • Intel C/C++ Compiler 8 • 3 coding styles • Intrinsics • Assembly • Vector Ops • Use of Intrinsics • _mm_sad_epu8 for the __m128i datatype • _m_psadbw for the __m64 datatype
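As an illustration of the `_mm_sad_epu8` intrinsic named on this slide, the sketch below computes the SAD of a 16 x 16 block, one 16-pixel row per instruction. It assumes an x86 target with SSE2 and 8-bit pixels; the function name `sad16x16_sse2` and the row-by-row layout are assumptions, not taken from the presentation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SAD of two 16 x 16 blocks of 8-bit pixels; stride is the distance in
 * bytes between consecutive rows (hypothetical helper). */
static unsigned sad16x16_sse2(const uint8_t *a, const uint8_t *b, int stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        /* Unaligned loads, since block positions are arbitrary. */
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        /* _mm_sad_epu8 produces two partial sums, one in each 64-bit
         * half of the result, each covering 8 bytes. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Fold the upper partial sum onto the lower one. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
}
```

The largest possible result is 16 * 16 * 255 = 65280, so the low 32 bits read by `_mm_cvtsi128_si32` are sufficient.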

  6. SSE2 platform for Motion Estimation

  7. Original Frame from Video

  8. Part of Frames 4 and 5

  9. Motion Compensated frames • 16 x 16 • 8 x 8

  10. Discrete Cosine Transform • The 2-D DCT is used extensively in the JPEG compression algorithm. • Highly computationally intensive. • FOCUS • Exploring DCT implementation on SSE2. • Identify the DCT algorithm that scales with the SIMD architecture

  11. DCT hardware Accelerator • Distributed Arithmetic • Choice of DA implementation of DCT • Scalable with SSE platform. • 2-D 8x8 DCT operations can be performed as • Preprocessing • 1-D DCT (Using DA) • Transpose • 1-D DCT (Using DA) • Post Processing
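The row-column structure on this slide (1-D DCT, transpose, 1-D DCT) can be sketched as follows. This is a floating-point reference version only: it uses a direct orthonormal DCT-II rather than distributed arithmetic, it omits the preprocessing and post-processing steps because the slide does not define them, and it adds a final transpose to restore the original orientation. All names (`COS16`, `cos_pi16`, `dct8`, `transpose8`, `dct8x8`) are hypothetical.

```c
/* cos(m * pi / 16) for m = 0..8; other angles follow by symmetry. */
static const double COS16[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.1950903220161283, 0.0
};

/* cos(m * pi / 16) for any non-negative integer m. */
static double cos_pi16(int m)
{
    m %= 32;                      /* period of 2*pi */
    if (m > 16)
        m = 32 - m;               /* cos(2*pi - t) = cos(t) */
    return (m > 8) ? -COS16[16 - m] : COS16[m];  /* cos(pi - t) = -cos(t) */
}

/* Direct 8-point orthonormal DCT-II. */
static void dct8(const double in[8], double out[8])
{
    for (int k = 0; k < 8; k++) {
        double sum = 0.0;
        for (int n = 0; n < 8; n++)
            sum += in[n] * cos_pi16((2 * n + 1) * k);
        /* Scale factors: sqrt(1/8) for k = 0, sqrt(2/8) otherwise. */
        out[k] = sum * (k == 0 ? 0.35355339059327373 : 0.5);
    }
}

/* In-place transpose of an 8 x 8 matrix. */
static void transpose8(double m[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = i + 1; j < 8; j++) {
            double t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
}

/* 2-D DCT: 1-D DCT on every row, transpose, 1-D DCT on every row again
 * (these are now the original columns), transpose back. */
static void dct8x8(double block[8][8])
{
    double tmp[8];
    for (int pass = 0; pass < 2; pass++) {
        for (int r = 0; r < 8; r++) {
            dct8(block[r], tmp);
            for (int c = 0; c < 8; c++)
                block[r][c] = tmp[c];
        }
        transpose8(block);
    }
}
```

Because both passes reuse the same row routine, a single 1-D DCT unit serves the whole 2-D transform, which is what makes the decomposition attractive for a fixed-function or SIMD datapath.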

  12. 1-D DCT on SSE2 using DA [Block diagram: the sums x0+x7, x1+x6, x2+x5, x3+x4 and the differences x0-x7, x1-x6, x2-x5, x3-x4 feed ROM-based DAP units that produce the outputs X0 to X7] • Total of 8 DAP structures. • Each DAP completes operations in 8 cycles • Scalable on various datapaths: 16, 32, 64, 128. • DAP subword dest,source
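Distributed arithmetic (DA) computes an inner product with fixed coefficients by looking up precomputed coefficient sums in a ROM and shift-accumulating them, so no multiplier is needed. The sketch below is a generic software model under stated assumptions, not the DAP unit from the slide: it takes 4 inputs, uses 16-bit two's complement values and integer coefficients, and processes one bit per step (the slide's 8-cycle figure suggests the hardware consumes more than one bit per cycle, which is not modeled). The names `da_build_rom` and `da_dot` are hypothetical.

```c
#include <stdint.h>

#define DA_INPUTS 4    /* inputs per unit (assumption) */
#define DA_BITS   16   /* input word width in bits */

/* Fill the ROM: entry addr holds the sum of the coefficients whose index
 * bit is set in addr. */
static void da_build_rom(const int32_t coeff[DA_INPUTS],
                         int32_t rom[1 << DA_INPUTS])
{
    for (unsigned addr = 0; addr < (1u << DA_INPUTS); addr++) {
        int32_t sum = 0;
        for (int i = 0; i < DA_INPUTS; i++)
            if (addr & (1u << i))
                sum += coeff[i];
        rom[addr] = sum;
    }
}

/* Bit-serial evaluation of sum(coeff[i] * x[i]) using ROM lookups only. */
static int64_t da_dot(const int32_t rom[1 << DA_INPUTS],
                      const int16_t x[DA_INPUTS])
{
    int64_t acc = 0;
    for (int bit = 0; bit < DA_BITS; bit++) {
        /* Gather bit number `bit` of every input into a ROM address. */
        unsigned addr = 0;
        for (int i = 0; i < DA_INPUTS; i++)
            addr |= (((uint16_t)x[i] >> bit) & 1u) << i;
        /* Weight the lookup by 2^bit. In hardware this is a shift; it is
         * written as a multiply by a power of two here to avoid
         * left-shifting a negative value in C. */
        int64_t term = (int64_t)rom[addr] * ((int64_t)1 << bit);
        /* In two's complement the sign bit carries negative weight. */
        if (bit == DA_BITS - 1)
            acc -= term;
        else
            acc += term;
    }
    return acc;
}
```

In a DCT setting the coefficients would be fixed-point cosine values and each output would have its own ROM contents; small integers are used here only so the result can be checked exactly against a direct multiply-accumulate.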

  13. Work done • Accomplished • Motion Estimation coding and analysis • DCT hardware accelerator in Verilog • ISA extension for DCT implementation. • To be done • Synthesis to get delay and area estimate • Assembly code with SSE-DCT enhancements and its performance analysis

  14. Questions
