Dynamically Parameterized Architectures for Power Aware Video Coding: Motion Estimation and DCT

Wayne Burleson (burleson@ecs.umass.edu), Prashant Jain (pjain@ecs.umass.edu), Subramanian Venkatraman (svenkatr@ecs.umass.edu). Dept. of Electrical and Computer Engineering

Presentation Transcript


  1. Dynamically Parameterized Architectures for Power Aware Video Coding: Motion Estimation and DCT Wayne Burleson (burleson@ecs.umass.edu) Prashant Jain (pjain@ecs.umass.edu) Subramanian Venkatraman (svenkatr@ecs.umass.edu) Dept. of Electrical and Computer Engineering University of Massachusetts Amherst This work was partially supported by NSF-9988238

  2. Outline • Introduction • Video Content Variation • Dynamic Parameterization to achieve Power-Aware Video Coding • Motion Estimation & DCT • On-Going Work

  3. Introduction • Video content and processing are non-uniform in space and time. • Video processing can degrade gracefully in power-constrained environments. • This exploits perceptual tolerance. • MPEG-4. • High-level algorithm changes affect power efficiency the most.

  4. Recent Work • Configurable FPGA-based architectures [Villasenor ’95]. • Heterogeneous architecture with programmable processors [Kneip ’98]. • Heterogeneous configurable architecture with on-chip low-power FPGA [Zhang ’00]. • Drawbacks of FPGAs: slow; high power dissipation.

  5. Adaptive System-on-a-Chip (aSOC) • Partially predefined configuration architecture. • Heterogeneous tiles with statically scheduled interconnection switches. • Tiles can be reconfigured internally as well as from an external source. [Figure: aSOC tile array containing FPGA, µP, RISC, SRAM, ME/DCT core, DSP and memory/RAM tiles, connected through switches] Ref. J. Liang et al., “aSOC: A Scalable, Single-Chip Communications Architecture,” in Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques, 2000.

  6. Outline • Introduction • Video Content Variation • Dynamic Parameterization • Motion Estimation & DCT • On-Going Work

  7. Content Variation across sequences

  8. Content Variation in Time [Figure: horizontal component of the motion vectors]

  9. Content Variation in Space [Figure: background regions show little variation, while other regions show high variation]

  10. Outline • Introduction • Content Variation • Dynamic Parameterization • Motion Estimation & DCT • On-Going Work

  11. Dynamic Parameterization • Functional parameters vary the output of a computation. • Architectural parameters allow trade-offs in area, performance, power and reliability. • Parameters can be bound at varying stages. [Figure: binding-time axis running from Standard Time, IP Time, Design Time, Compile/Boot Time and Config. Time to Run-Time, annotated with time scales of years, months, seconds and milliseconds]

  12. Dynamic Parameter Adjustment [Figure: a predictor receives system requirements and constraints (area, speed, power), signal statistics from the input signals, and algorithm and architecture statistics from post-processing of the input signals; it outputs architectural and functional parameters to the signal processing system (algorithm plus architecture), which trades precision, quality and compression of the signals against area, latency and power]
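The predictor appears only as a block diagram. Below is a minimal sketch of one possible rule-based predictor, assuming the signal statistic is the recent history of horizontal motion-vector components (as plotted on slide 8) and the system constraint is a single power budget. The function `predict_params`, the `Params` fields, the algorithm names, the milliwatt unit and every threshold are hypothetical choices for illustration, not values from the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Params:
    algorithm: str     # functional parameter: ME search algorithm (hypothetical names)
    search_range: int  # functional parameter: search window is +/- search_range pixels
    subsampling: int   # functional parameter: 1 = none, 2 = 2:1, 4 = 4:1


def predict_params(mv_x_history, power_budget_mw):
    """Hypothetical rule-based predictor: signal statistics + constraint -> parameters.

    mv_x_history:    recent horizontal motion-vector components (signal statistics)
    power_budget_mw: current power constraint (system requirement)
    All thresholds are illustrative assumptions.
    """
    activity = mean(abs(v) for v in mv_x_history) if mv_x_history else 0.0
    peak = max((abs(v) for v in mv_x_history), default=0)
    # Size the window to cover the largest recent motion plus a 2-pixel margin,
    # clamped to [4, 16].
    search_range = min(16, max(4, peak + 2))
    if power_budget_mw >= 100:
        # Ample power: exhaustive search on every pixel.
        return Params("full", search_range, 1)
    if activity > 4:
        # Tight power, high motion: fast search over a wide area.
        return Params("three_step", search_range, 2)
    # Tight power, low motion: a spiral search tends to stop early.
    return Params("spiral", search_range, 4)


print(predict_params([1, -2, 0, 3], power_budget_mw=150))
print(predict_params([10, -12, 8], power_budget_mw=50))
```

In a real system the rules would be replaced by whatever statistics-to-parameter mapping the designers calibrate; the point is only that functional parameters are chosen at run time from signal statistics and constraints.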

  13. Functional Parameter Adjustment: Algorithms [Figure: Full Search versus Logarithmic Search]

  14. Functional Parameter Adjustment: Search Space • A larger search space improves the chances of a good match. • Increasing the search space is effective only up to a point. • A larger search space increases computation. [Figure: bits per pixel (bpp) versus search space for a specific sequence; lower bpp indicates higher compression]
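The computational side of this trade-off is easy to quantify. Assuming full search over a ±p window with 16x16 blocks (p is our notation), there are (2p+1)² candidate positions, each costing 256 absolute differences, so the work grows quadratically with the search range:

```python
def full_search_cost(p, block=16):
    """Return (candidate positions, absolute differences) per block for a +/-p window."""
    candidates = (2 * p + 1) ** 2
    abs_diffs = candidates * block * block
    return candidates, abs_diffs


for p in (4, 8, 16):
    c, ops = full_search_cost(p)
    print(f"p = +/-{p:2d}: {c:5d} candidates, {ops:7d} absolute differences per block")
```

Going from ±8 to ±16 multiplies the work by about 3.8 (1089/289), while, per the slide, the matching benefit is limited beyond a point.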

  15. Power versus Search Area • Memories are major contributors to power dissipation. • The algorithms presented reduce memory accesses and computations. Our novel architecture reconfigures to different algorithms with reduced memory accesses and computations, thus saving power.

  16. Power Consumption in Video Coding [Figure: share of computation (%) for ME, DCT, IDCT, VLC, etc.] Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”

  17. Outline • Introduction • Content Variation • Dynamic Parameterization • Motion Estimation & DCT • On-Going Work

  18. Functional Parameter: Full Search Selects the best-matching block from an exhaustive set of candidate blocks within a search window.
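A straightforward software model of full search, using SAD as the cost (the criterion named on slide 24). The function names, the ±p window, the frame-as-list-of-rows layout and the synthetic test frame are assumptions for illustration; the proposed hardware evaluates candidates in a PE array rather than in nested loops.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the reference block
    displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, p=7):
    """Evaluate every displacement in [-p, p]^2 that keeps the candidate inside
    the reference frame; return (minimum SAD, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue  # candidate block would leave the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic test: the current frame is the reference shifted so that the
# block at (8, 8) matches the reference block displaced by (3, -2).
random.seed(0)
H = W = 32
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y - 2][x + 3] if 0 <= y - 2 < H and 0 <= x + 3 < W else 0
        for x in range(W)] for y in range(H)]
print(full_search(cur, ref, bx=8, by=8, n=8, p=4))
```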

  19. Functional Parameter: Spiral Search Searches for the matching block along a spiral path starting from the center of the search window; the amount of computation is data dependent at run time.
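A sketch of spiral search with early termination. Assumptions: candidates are visited ring by ring around the zero vector (a square-ring approximation of a spiral), and the search stops once the SAD falls to a hypothetical `threshold`. Because the stopping point depends on the picture content, so does the number of SAD evaluations, which is what makes the algorithm data dependent at run time.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block at (bx, by) and the reference block at (bx+dx, by+dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def spiral_order(p):
    """Displacements in [-p, p]^2, visited ring by ring outward from (0, 0)."""
    order = [(0, 0)]
    for r in range(1, p + 1):
        order += [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                  if max(abs(dx), abs(dy)) == r]
    return order


def spiral_search(cur, ref, bx, by, n=16, p=7, threshold=0):
    """Return ((best SAD, (dx, dy)), number of candidates evaluated).
    Stops as soon as a candidate's SAD is <= threshold."""
    h, w = len(ref), len(ref[0])
    best, evaluated = None, 0
    for dx, dy in spiral_order(p):
        if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
            continue  # candidate block would leave the frame
        cost = sad(cur, ref, bx, by, dx, dy, n)
        evaluated += 1
        if best is None or cost < best[0]:
            best = (cost, (dx, dy))
        if cost <= threshold:
            break  # good-enough match: the amount of work depends on the data
    return best, evaluated


random.seed(1)
H = W = 32
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
# Current frame shifted so that the true motion vector is (1, 0).
cur = [[ref[y][x + 1] if x + 1 < W else 0 for x in range(W)] for y in range(H)]
print(spiral_search(cur, ref, bx=8, by=8, n=8, p=4))
```

For this small motion the match is found after 6 candidates instead of all 81 in a ±4 window.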

  20. Functional Parameter: 3-Step Search
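A sketch of the classic three-step search, assuming the usual step sizes of 4, 2 and 1 pixels (a ±7 window). At most 1 + 3 × 8 = 25 candidates are evaluated, against 225 for full search over the same window; the price is that the search can settle on a local minimum when the error surface is not unimodal. Function names are illustrative, and the synthetic test uses a displacement of (4, 4), which lies on the first step's grid.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the current block at (bx, by) and the reference block at (bx+dx, by+dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def three_step_search(cur, ref, bx, by, n=16, step=4):
    """Classic three-step search (steps 4, 2, 1 cover a +/-7 window).
    Returns (best SAD, (dx, dy), number of candidates evaluated)."""
    h, w = len(ref), len(ref[0])

    def inside(dx, dy):
        return 0 <= by + dy <= h - n and 0 <= bx + dx <= w - n

    cx, cy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    evaluated = 1
    while step >= 1:
        best_pos = (cx, cy)
        # Test the 8 neighbors of the current center at distance `step`.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if (dx, dy) == (0, 0) or not inside(x, y):
                    continue
                cost = sad(cur, ref, bx, by, x, y, n)
                evaluated += 1
                if cost < best_cost:
                    best_cost, best_pos = cost, (x, y)
        cx, cy = best_pos  # recenter on the best candidate, then halve the step
        step //= 2
    return best_cost, (cx, cy), evaluated


random.seed(2)
H = W = 32
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
# Current frame shifted so that the true motion vector is (4, 4).
cur = [[ref[y + 4][x + 4] if y + 4 < H and x + 4 < W else 0 for x in range(W)]
       for y in range(H)]
print(three_step_search(cur, ref, bx=8, by=8, n=8))
```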

  21. Functional Parameter: Pel Subsampling [Figure: 16x16 pixel array with 2:1 and 4:1 subsampling patterns]
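A sketch of SAD with pel subsampling. The exact patterns are only drawn on the slide, so two common choices are assumed here: 2:1 keeps a checkerboard (quincunx) pattern and 4:1 keeps every second pixel in both directions. The function name and the `ratio` parameter are hypothetical.

```python
def subsampled_sad(cur_blk, ref_blk, ratio=1):
    """SAD over an n x n block using only the pixels kept by the subsampling pattern.
    Returns (SAD, number of pixel pairs compared)."""
    n = len(cur_blk)
    total = count = 0
    for y in range(n):
        for x in range(n):
            if ratio == 1:
                keep = True
            elif ratio == 2:
                keep = (x + y) % 2 == 0            # checkerboard: half the pixels
            elif ratio == 4:
                keep = x % 2 == 0 and y % 2 == 0   # every 2nd row and column: a quarter
            else:
                raise ValueError("ratio must be 1, 2 or 4")
            if keep:
                total += abs(cur_blk[y][x] - ref_blk[y][x])
                count += 1
    return total, count


ones = [[1] * 16 for _ in range(16)]
zeros = [[0] * 16 for _ in range(16)]
for r in (1, 2, 4):
    print(r, subsampled_sad(ones, zeros, r))
```

Both memory reads and |c − p| operations shrink in proportion to the ratio, at the cost of a noisier matching criterion.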

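  22. Functional Parameter: Half-Pel ME • Current and previous block data can be filtered to half-pel resolution. With integer pixels A, B (top row) and C, D (bottom row), the half-pel samples are $a = (A+B+C+D)/4$ at the center of the four pixels, $b = (B+D)/2$ midway between B and D, and $c = (C+D)/2$ midway between C and D. Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”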
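A sketch of bilinear half-pel interpolation over a whole frame, producing a (2h−1) × (2w−1) grid in which even coordinates hold integer pixels and odd coordinates hold half-pel samples. Two-pixel samples average their neighbors and the center sample averages four pixels, so its sum is divided by 4. The round-half-up offsets (+1 and +2) and the function name are assumptions. In the slide's labels, a is a center sample, b a vertical half-pel between B and D, and c a horizontal half-pel between C and D.

```python
def half_pel_upsample(frame):
    """Return a (2h-1) x (2w-1) grid: even (row, col) = integer pixels,
    odd coordinates = bilinear half-pel samples."""
    h, w = len(frame), len(frame[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(h):
        for x in range(w):
            A = frame[y][x]
            out[2 * y][2 * x] = A                  # integer pel
            if x + 1 < w:                          # between A and its right neighbor
                out[2 * y][2 * x + 1] = (A + frame[y][x + 1] + 1) // 2
            if y + 1 < h:                          # between A and the pixel below
                out[2 * y + 1][2 * x] = (A + frame[y + 1][x] + 1) // 2
            if x + 1 < w and y + 1 < h:            # center of a 2 x 2 group
                out[2 * y + 1][2 * x + 1] = (A + frame[y][x + 1]
                                             + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
    return out


for row in half_pel_upsample([[10, 20], [30, 40]]):
    print(row)
```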

  23. I/O Re-use • Successive candidate blocks differ by a single row of pixels. • The previously loaded rows can therefore be reused. • Previous rows are stored in FIFOs. [Figure: current block and overlapping candidate blocks]
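A small model of the row-reuse idea, assuming candidates are scanned down one column of a ±p window (2p + 1 vertical positions) and a FIFO holds the n most recently loaded reference rows. Counting reference-pixel reads with and without reuse shows the saving; the function name and scan order are assumptions.

```python
from collections import deque


def column_scan_loads(n=16, p=7, reuse=True):
    """Count reference-pixel reads while a candidate block slides down
    2p + 1 vertical positions, one row at a time."""
    fifo = deque(maxlen=n)  # row indices of the n most recently loaded rows
    loads = 0
    for top in range(2 * p + 1):              # `top` = first reference row of this candidate
        if reuse and len(fifo) == n:
            fifo.append(top + n - 1)          # fetch only the new bottom row; the oldest drops out
            loads += n
        else:
            fifo.clear()
            fifo.extend(range(top, top + n))  # fetch the whole n x n block
            loads += n * n
    return loads


print(column_scan_loads(reuse=False), column_scan_loads(reuse=True))
```

For 16x16 blocks and a ±7 range this is 3,840 versus 480 reads, an 8× reduction in reference-memory traffic for that column.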

  24. Matching Criterion • The matching criterion used is the Sum of Absolute Differences (SAD). Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”
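For reference, the SAD criterion for an N × N block at (x, y) and a candidate displacement (d_x, d_y) can be written as follows, where W is the search window, c denotes current-frame pixels and p previous-frame pixels (mirroring the |c − p| label in the PE); this notation is ours:

```latex
\mathrm{SAD}(d_x, d_y) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
  \left| c(x+i,\, y+j) - p(x+i+d_x,\, y+j+d_y) \right|,
\qquad
(d_x^{*}, d_y^{*}) = \operatorname*{arg\,min}_{(d_x, d_y) \in \mathcal{W}} \mathrm{SAD}(d_x, d_y)
```

For N = 16 each evaluation needs 256 absolute differences and 255 additions.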

  25. Proposed Architecture for Dynamically Parameterized ME [Figure: a 16x16 PE array with a summing block and PE control; SRAM external to the PE array provides 307,200 bytes/frame of storage; a memory block with an address generator unit supplies RAM addresses]

  26. Architecture: Processing Element (PE) [Figure: PE containing a 256-byte FIFO, current-pixel and half-pel inputs, an absolute-difference unit |c − p|, a sum-of-absolute-differences accumulator, and local control]

  27. Outline • Introduction • Content Variation • Dynamic Parameterization • Motion Estimation & DCT • On-Going Work

  28. Discrete Cosine Transform • Integral part of still-image and video compression systems. • Computationally intensive, second only to motion estimation. • Amenable to VLSI implementation through its decomposition property and distributed arithmetic.

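  29. Decomposition Property • 1D DCT in matrix notation: $\mathbf{y} = C\mathbf{x}$, where $C$ is the $N \times N$ DCT coefficient matrix. • 2D DCT as two 1D DCTs (rows, then columns): $Y = C X C^{T}$. Ref. W.H. Chen et al., “A Fast Computational Algorithm for the Discrete Cosine Transform”, IEEE Trans. Commun.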
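A numerical check of the decomposition property, assuming the orthonormal DCT-II scaling (the slide does not state the normalization). Computing Y = C X Cᵀ as a row pass followed by a column pass gives the same result as the direct double sum, while needing 2N³ instead of N⁴ multiply-accumulates (1,024 versus 4,096 for an 8 × 8 block). Function names are illustrative.

```python
import math


def dct_matrix(N=8):
    """Orthonormal N x N DCT-II matrix: C[k][n] = c_k * cos((2n+1) k pi / 2N)."""
    return [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
             * math.cos((2 * n + 1) * k * math.pi / (2 * N))
             for n in range(N)] for k in range(N)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def dct2_separable(X):
    """2D DCT as two passes of 1D DCTs: Y = C X C^T (2N^3 multiply-accumulates)."""
    C = dct_matrix(len(X))
    return matmul(matmul(C, X), transpose(C))


def dct2_direct(X):
    """Direct 2D DCT from the double-sum definition (N^4 terms)."""
    N = len(X)
    C = dct_matrix(N)
    return [[sum(C[u][m] * C[v][n] * X[m][n] for m in range(N) for n in range(N))
             for v in range(N)] for u in range(N)]


X = [[(3 * m + 5 * n) % 17 for n in range(8)] for m in range(8)]
Ys, Yd = dct2_separable(X), dct2_direct(X)
print(max(abs(Ys[u][v] - Yd[u][v]) for u in range(8) for v in range(8)))
```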

  30. Distributed Arithmetic • Inner product computation of coefficient vector A and input vector X. • Bit-serial arithmetic using a Read Accumulate Computation (RAC) unit. • Facilitates variable-precision processing. [Figure: the bits X00 to X33 of four inputs address, through a 4-to-16 address decoder, a table of precomputed coefficient sums (e.g. A1+A0, A3+A2+A1, A3+A2+A1+A0); the selected entries are shift-accumulated (×2) to form the result] Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000
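A bit-level sketch of distributed arithmetic for a 4-term inner product, matching the 4-to-16 decoder on the slide: the 16 possible sums of the coefficients are precomputed, and each bit position of the inputs selects one entry to accumulate. Inputs are assumed to be two's-complement integers of a fixed width; coefficient values and function names are illustrative. Because the cost is one lookup-and-accumulate cycle per input bit, the number of cycles scales directly with the precision processed.

```python
def build_lut(A):
    """Table of all 2^K partial sums of the coefficients (what the ROM stores).
    Entry `addr` is the sum of A[k] for every bit k set in addr."""
    K = len(A)
    return [sum(A[k] for k in range(K) if (addr >> k) & 1) for addr in range(1 << K)]


def da_inner_product(A, X, bits=8):
    """Compute sum(A[k] * X[k]) for `bits`-bit two's-complement inputs X,
    one table lookup per bit position (bit-serial, LSB first)."""
    lut = build_lut(A)
    acc = 0
    for b in range(bits):
        # Bit b of every input together forms the table address.
        addr = 0
        for k, x in enumerate(X):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] * (1 << b)  # weight 2^b; hardware shifts the accumulator instead
        # The MSB of a two's-complement number carries negative weight.
        acc += -term if b == bits - 1 else term
    return acc


A = [3, -5, 7, 2]          # coefficient vector (illustrative values)
X = [12, -7, 100, -128]    # 8-bit two's-complement inputs
print(da_inner_product(A, X), sum(a * x for a, x in zip(A, X)))
```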

  31. Exploiting Content Variation • Most Significant Bit Rejection (MSBR): RAC operation is disabled in the presence of spatial correlation. • Row Column Classification (RCC): overall arithmetic activity is reduced by imposing an upper bound on RAC cycles. • Replication of Arithmetic Units (RAU): the RAC units are replicated, trading power against performance.
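A sketch of the MSBR idea under our reading of it: when neighboring pixels are correlated, the values entering the RAC are small, so their upper bits merely replicate the sign bit. Running the distributed-arithmetic loop only over the effective width gives the same result with fewer RAC cycles (here 3 instead of 8). The sign-extension test and the function names are assumptions, not the cited core's exact circuit.

```python
def build_lut(A):
    """All 2^K partial sums of the coefficients, indexed by a K-bit address."""
    return [sum(a for k, a in enumerate(A) if (addr >> k) & 1) for addr in range(1 << len(A))]


def da_inner_product(A, X, bits):
    """Distributed-arithmetic inner product; one RAC cycle per bit of the inputs."""
    lut = build_lut(A)
    acc = 0
    for b in range(bits):
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(X))
        term = lut[addr] * (1 << b)
        acc += -term if b == bits - 1 else term  # sign bit has negative weight
    return acc


def effective_bits(X):
    """Smallest two's-complement width that still represents every input;
    the bits above it are copies of the sign bit and can be rejected."""
    w = 1
    while not all(-(1 << (w - 1)) <= x < (1 << (w - 1)) for x in X):
        w += 1
    return w


A = [3, -5, 7, 2]
X = [3, -2, 1, 0]   # small values, e.g. differences between correlated pixels
w = effective_bits(X)
print(w, da_inner_product(A, X, bits=w), da_inner_product(A, X, bits=8))
```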

  32. Energy Efficiency Comparison Among DCT/IDCT Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000

  33. Architecture of DCT Core Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000

  34. Outline • Introduction • Video Content Variation • Dynamic Parameterization to achieve Power-Aware Video Coding • Motion Estimation & DCT • On-Going Work

  35. On-Going Work • Implementations at the RTL, netlist and physical levels. • Power estimation at each of these levels. • Techniques for statistically tracking content variation. • Full prototyping on actual video workloads using a logic emulator from IKOS Systems. • Extensions to other parameterized multimedia computations (e.g. 3D graphics, natural and synthetic audio).

  36. Conclusions • Content variation and dynamic parameterization can be exploited to achieve power-aware video coding. • The proposed Motion Estimation and DCT architectures are to be implemented to achieve this.

  37. Dynamically Parameterized Architectures for Power Aware Video Coding: Motion Estimation and DCT http://vsp2.ecs.umass.edu/vspg/publication.html Wayne Burleson (burleson@ecs.umass.edu) Prashant Jain (pjain@ecs.umass.edu) Subramanian Venkatraman (svenkatr@ecs.umass.edu) Dept. of Electrical and Computer Engineering University of Massachusetts Amherst This work was partially supported by NSF-9988238
