
Accelerating Image Processing Pipelines in a Hardware/Software Environment




  1. Accelerating Image Processing Pipelines in a Hardware/Software Environment • Heather Quinn, Dr. Miriam Leeser, Northeastern University • Dr. Laurie Smith King, College of the Holy Cross

  2. Outline • Background • Image processing and hardware • The cost of codesign systems • Image processing pipelines • An example • The Pipeline Assignment Problem • Solving Pipeline Assignment

  3. Goal of Project • Accelerate image processing tasks through efficient use of FPGAs • Combine already designed components at runtime to implement series of transformations (pipelines)

  4. Image Processing Tasks • Good candidates for FPGAs • Algorithms are often explicitly parallel • Input and output data large but individual pixels are small • Data access regular

  5. Hardware Systems • Using hardware incurs execution costs not present in software systems • hardware initialization • communicating image • reprogramming

  6. Efficient Use of FPGAs • Software algorithm’s runtime for small images less than the hardware costs • Profiling the hardware and software runtimes for different image sizes determines the crossover point • Deciding at runtime to execute in software or hardware is simple for one algorithm processing one image
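The crossover idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming both runtimes are linear in the pixel count; the function names and the linear model are assumptions for the example, not the profiling method used in the talk.

```python
import math


def crossover_pixels(sw_ms_per_pixel, hw_fixed_ms, hw_ms_per_pixel):
    """Smallest pixel count at which hardware is no slower than software.

    Assumed (hypothetical) linear models:
      software: sw_ms_per_pixel * n
      hardware: hw_fixed_ms + hw_ms_per_pixel * n
    Returns None when hardware never catches up.
    """
    gain_per_pixel = sw_ms_per_pixel - hw_ms_per_pixel
    if gain_per_pixel <= 0:
        return None
    return math.ceil(hw_fixed_ms / gain_per_pixel)


def choose_target(n_pixels, sw_ms_per_pixel, hw_fixed_ms, hw_ms_per_pixel):
    """Runtime decision for one algorithm and one image: 'sw' or 'hw'."""
    threshold = crossover_pixels(sw_ms_per_pixel, hw_fixed_ms, hw_ms_per_pixel)
    if threshold is None or n_pixels < threshold:
        return "sw"
    return "hw"
```

With measured profiles instead of a linear model, the same decision reduces to comparing two looked-up runtimes for the image size at hand.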

  7. Image Processing Pipelines • Series of image processing algorithms applied to an image • Each algorithm has a software and hardware implementation • Finding the crossover point for a pipeline is complicated • Exponential number of implementations • Reprogramming costs • Need a strategy to find a fast pipeline implementation at runtime

  8. An Example: Median Filter & Edge Detection • [Figure: pipeline timeline with stages Start App, HW Init, Send Data, Median, Repgm, Edge Det, Get Data, Display; costs labeled in the figure: 300 ms, .00105 ms per pixel, 70 ms, .00105 ms per pixel] • Need pipeline implementations that minimize reprogramming and communication costs

  9. Possible Implementations • Blue boxes are hw/sw boundaries • Red boxes are fixing image edges • Green Boxes are reprogramming

  10. Median Filter and Edge Detection Profiles

  11. Median FilterEdge Detection Profiles

  12. Problem Statement • Inputs: a profiled library of image processing components, a pipeline, and an image • Output: an assignment of each component to a hardware or software implementation

  13. The Library of Components • Each component has two implementations: hardware and software • Each implementation has known runtimes for a set of images • Interpolation used for rest of images • Each hardware implementation has a known area size • Each component interface is image in/image out
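The slide says only that interpolation fills in runtimes for image sizes that were not profiled. A minimal sketch, assuming piecewise-linear interpolation over a sorted list of (pixels, runtime) samples; the function name and data layout are hypothetical.

```python
from bisect import bisect_left


def interpolate_runtime(profile, n_pixels):
    """Estimate a runtime in ms for an image size that was not profiled.

    profile: list of (pixels, runtime_ms) pairs sorted by pixels,
             with at least two entries (assumed layout).
    Sizes outside the profiled range are extrapolated from the
    nearest segment.
    """
    sizes = [pixels for pixels, _ in profile]
    i = bisect_left(sizes, n_pixels)
    # Clamp so that (i - 1, i) is always a valid segment.
    i = min(max(i, 1), len(profile) - 1)
    (x0, y0), (x1, y1) = profile[i - 1], profile[i]
    return y0 + (y1 - y0) * (n_pixels - x0) / (x1 - x0)
```

Each component would carry two such profiles, one per implementation, so a lookup for any image size costs a binary search plus one multiplication.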

  14. Assumptions • Reprogramming and communication costs incurred at sw/hw boundaries • Might need to fix image edge in between components • Problem sizes of 20 or fewer stages • 500 ms to make a decision
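To make the boundary costs concrete, here is one possible cost model for a given assignment. It is a simplified sketch: it assumes the image starts and ends in software, charges a flat communication cost at every sw/hw boundary and a flat reprogramming cost on each entry into hardware, and ignores image-edge fixing and area. All names and the flat-cost simplification are assumptions.

```python
def pipeline_cost(assign, sw_ms, hw_ms, comm_ms, reprogram_ms):
    """Total runtime in ms of a pipeline under a given assignment.

    assign: sequence of "sw"/"hw", one entry per stage.
    sw_ms, hw_ms: per-stage runtimes for the current image size.
    comm_ms: assumed flat cost of moving the image across a boundary.
    reprogram_ms: assumed flat cost charged when entering hardware.
    """
    total = 0.0
    prev = "sw"  # the image is assumed to start in software
    for target, s, h in zip(assign, sw_ms, hw_ms):
        if target == "hw":
            if prev == "sw":
                total += comm_ms + reprogram_ms  # send image, load design
            total += h
        else:
            if prev == "hw":
                total += comm_ms  # bring the image back to software
            total += s
        prev = target
    if prev == "hw":
        total += comm_ms  # the final result returns to software
    return total
```

Under this model, alternating between hardware and software is penalized at every switch, which is why grouping adjacent stages on the same side tends to pay off.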

  15. Solving Pipeline Assignment • Exhaustively • ILP • Greedy • Local Search • Experiments

  16. Related Problems • Codesign partitioning problem • ILP: Niemann and Marwedel • Simulated Annealing: Cosyma • Iterative: Ptolemy • Parallel computing scheduling • Local Search: UNM

  17. Exhaustive • Find: Optimal solutions • How: Search entire problem space • Algorithm Runtime: O(2^N), where N is the number of pipeline stages
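An exhaustive search simply enumerates all 2^N sw/hw assignments and keeps the cheapest. The sketch below takes the cost function as a parameter; the toy cost and its numbers are invented for illustration only.

```python
from itertools import product

# Hypothetical toy data: per-stage runtimes and a flat boundary penalty.
SW_MS = [4, 50, 4]
HW_MS = [6, 5, 6]
SWITCH_MS = 10


def toy_cost(assign):
    """Stage runtimes plus a penalty per sw/hw boundary.

    The image is assumed to start and end in software.
    """
    path = ("sw",) + tuple(assign) + ("sw",)
    switches = sum(a != b for a, b in zip(path, path[1:]))
    run = sum(SW_MS[i] if t == "sw" else HW_MS[i] for i, t in enumerate(assign))
    return run + SWITCH_MS * switches


def exhaustive_assignment(n_stages, cost):
    """Try every sw/hw assignment; return the cheapest and its cost."""
    best = min(product(("sw", "hw"), repeat=n_stages), key=cost)
    return best, cost(best)
```

For the toy data, only the slow middle stage is worth moving to hardware, because the two boundary penalties outweigh any gain on the outer stages.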

  18. ILP • Find: Optimal solutions • How: AMPL model running on CPLEX • Need: ILP formulation of the problem statement • Algorithm Runtime: Unknown
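The slides do not show the ILP formulation itself. As an illustration only, a simplified 0-1 program for the same kind of problem might look as follows, where x_i = 1 places stage i in hardware, s_i and h_i are the profiled software and hardware runtimes, c is an assumed flat cost per sw/hw boundary, and b_i marks a boundary between stages i-1 and i. Area limits and image-edge fixing are left out, and the symbols are not taken from the authors' AMPL model.

```latex
\begin{aligned}
\min \quad & \sum_{i=1}^{N} \bigl( s_i (1 - x_i) + h_i x_i \bigr) + c \sum_{i=1}^{N+1} b_i \\
\text{s.t.} \quad & b_i \ge x_i - x_{i-1}, && i = 1, \dots, N+1 \\
& b_i \ge x_{i-1} - x_i, && i = 1, \dots, N+1 \\
& x_0 = x_{N+1} = 0 \\
& x_i, b_i \in \{0, 1\}
\end{aligned}
```

The two inequalities force b_i to 1 whenever adjacent stages sit on different sides, and minimization keeps b_i at 0 otherwise.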

  19. Greedy • Find: Sub-optimal solutions • How: Make optimal decisions for each pipeline stage based on hardware area usage and speedup values • Algorithm Runtime: O(N), where N is the number of pipeline stages
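A single-pass greedy heuristic in the spirit of this slide could look like the sketch below. The exact decision rule in the talk is not given; this version assumes a stage goes to hardware when its hardware runtime beats software and its area still fits a cumulative area budget. The budget model and all names are assumptions.

```python
def greedy_assignment(sw_ms, hw_ms, area, area_budget):
    """One pass over the pipeline, deciding each stage locally: O(N).

    sw_ms, hw_ms: per-stage runtimes for the current image size.
    area: hardware area of each stage's hardware implementation.
    area_budget: assumed total area available across all hw stages.
    """
    assign = []
    used = 0
    for s, h, a in zip(sw_ms, hw_ms, area):
        if h < s and used + a <= area_budget:
            assign.append("hw")
            used += a
        else:
            assign.append("sw")
    return assign
```

Because each decision ignores boundary costs, the result can be sub-optimal, which matches the slide's description.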

  20. Local Search • Find: Sub-optimal solutions • How: Improve upon initial solutions (found through greedy or randomly) • Algorithm Runtime: Runs for user supplied amount of time
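One simple form of local search is first-improvement hill climbing over single-stage flips, stopped by a time limit. The sketch below is an assumed variant, not necessarily the neighborhood or stopping rule used in the talk; the cost function is supplied by the caller.

```python
import time


def local_search(initial, cost, time_limit_s):
    """Improve an assignment by flipping one stage at a time.

    initial: starting list of "sw"/"hw" (e.g. greedy or random).
    cost: callable mapping an assignment to its total runtime.
    time_limit_s: budget in seconds, checked once per pass.
    Stops at a local optimum or when the budget is spent.
    """
    deadline = time.monotonic() + time_limit_s
    best = list(initial)
    best_cost = cost(best)
    improved = True
    while improved and time.monotonic() < deadline:
        improved = False
        for i in range(len(best)):
            candidate = best.copy()
            candidate[i] = "hw" if candidate[i] == "sw" else "sw"
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                # Accept the first improving flip and restart the pass.
                best, best_cost, improved = candidate, candidate_cost, True
                break
    return best, best_cost
```

Starting from a greedy solution usually leaves fewer flips to try than starting from a random one, at the risk of settling in the same local optimum every time.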

  21. Experiments • Synthetic components arranged into pipelines of length 1 to 20 • Exhaustive algorithm run to completion • Used as a baseline for solution quality • Timed to find 500 ms boundary • ILP solver constrained to 500 ms • Ability to solve dependent on components • Local Search returns best solution found within time limit

  22. Results • Optimal solutions in 500 ms • Exhaustive: up to 11 stages • ILP: all pipelines up to 13 stages, some pipelines up to 18 stages, and none larger • Sub-optimal solutions in 500 ms • Greedy and local search: all problem sizes • Strategy • Exhaustive and ILP up to 13 stages • Greedy or local search for more than 13 stages
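The strategy on this slide amounts to dispatching on pipeline length. A minimal sketch, assuming one reading of the results: exhaustive search up to 11 stages, ILP for 12 and 13, and local search beyond that. The thresholds are copied from the slide, but the exact split between exhaustive and ILP is an interpretation, and the function name is hypothetical.

```python
def pick_solver(n_stages):
    """Choose a solver name from the pipeline length (500 ms budget)."""
    if n_stages <= 11:
        return "exhaustive"  # reported optimal within 500 ms up to 11 stages
    if n_stages <= 13:
        return "ilp"  # reported to solve all pipelines up to 13 stages
    return "local_search"  # sub-optimal, but returns for all problem sizes
```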

  23. Optimal Solutions in 500 ms

  24. Conclusion • Defined pipeline assignment • Introduced 4 possible ways to solve the problem at runtime • Found 3 algorithms that efficiently solve different problem sizes

  25. Future Work • ADAPT: Algorithm that calls exhaustive, ILP, and local search algorithms to solve the pipeline assignment problem based on problem size • Decision Time: Study how the amount of time allotted affects ADAPT results • Virtex II Pro: Add support for using embedded PowerPC cores

  26. References • [1] R. Niemann and P. Marwedel, Hardware/Software Partitioning using Integer Programming, Proceedings of the European Design and Test Conference, Paris, France, 1996, pp. 473-480. • [2] J. Henkel, R. Ernst, U. Holtmann, and T. Benner, Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-synthesis, Proceedings of International Conference on Computer-Aided Design, 1994. • [3] A. Kalavade and E. A. Lee, A Global Criticality/Local Phase Driven Algorithm for the Constrained Hardware/Software Partitioning Problem, Proceedings of CODES/CASHE 1994, Third International Workshop on Hardware/Software Codesign, Grenoble, France, Sept. 22-24, 1994, pp. 42-48. • [4] M.-Y. Wu, W. Shu, and J. Gu, Efficient Local Search for DAG Scheduling, IEEE Transactions on Parallel and Distributed Systems, vol 12, num 6
