
Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths

Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths. Zhining Huang and Sharad Malik, DAC 2002. Outline: Introduction; Methodology overview; Datapaths for kernel loops; Reconfigurable Datapath; Benchmark studies; Conclusions and future work.




  1. Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths • Zhining Huang and Sharad Malik • DAC 2002

  2. Outline • Introduction • Methodology overview • Datapaths for kernel loops • Reconfigurable Datapath • Benchmark studies • Conclusions and future work

  3. Introduction • Programmable platforms • Bit-level programmable • Coarse-grained FPGA • Word-level programmable • Dynamically reconfigurable co-processor • Fixed hardware blocks • Programmable interconnect • Accelerate loops

  4. Methodology overview • Master processor • Reconfigurable co-processor • Reconfigurable datapath • ASIC-like function units • Reconfigurable interconnections • Control logic • State machine • Control datapath execution

  5. Datapaths for kernel loops • Extract kernel loops • Direct mapping of kernel loop datapath • Branch condition transforms • Pipelining the execution • Estimation of the pipeline execution time

  6. Extract kernel loops • Use IMPACT compiler • Profiling • Loop detection (only inner loops) • Data dependence analysis • Register live-in/out • Data dependence between instructions • Within a loop • In different loops

  7. Direct mapping of kernel loop datapath

  8. Branch condition transforms • Into different datapaths • Selected by multiplexer
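The branch transform on slide 8 can be illustrated with a minimal sketch. It assumes the usual reading of "into different datapaths, selected by multiplexer": both sides of the branch are computed unconditionally and the condition only picks which result is kept. The function names and the example loop body (an absolute difference) are hypothetical and are not taken from the presentation.

```python
def mux(select: bool, if_true: int, if_false: int) -> int:
    """Two-input multiplexer: forwards one of two already-computed values."""
    return if_true if select else if_false


def abs_diff_branch(x: int, y: int) -> int:
    """Original loop body with a control-flow branch (hypothetical example)."""
    if x > y:
        return x - y
    return y - x


def abs_diff_muxed(x: int, y: int) -> int:
    """Same body after the transform: no branch, only data flow."""
    cond = x > y       # comparison unit produces the select signal
    taken = x - y      # datapath for the "then" side, always evaluated
    not_taken = y - x  # datapath for the "else" side, always evaluated
    return mux(cond, taken, not_taken)
```

Because both sides always execute, a transform like this only applies directly when neither side has side effects that must be suppressed, such as a store.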

  9. Pipelining the execution • Insert registers in the datapath • Data dependence • Delay or bypass • From registers or memory operations • Four data dependence cases and solutions • T_store < T_load: delay store • T_store1 < T_store2: eliminate store1 • Two loads: do nothing • T_load < T_store: delay or bypass
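The four cases on slide 9 amount to a small lookup table, sketched below. The sketch only restates the slide: it assumes the two memory operations are ordered by their T value (the operation with the smaller T comes first) and does not model how the delay or bypass is implemented. The function and table names are hypothetical.

```python
# Keys are (operation with smaller T, operation with larger T).
RESOLUTION = {
    ("store", "load"): "delay store",       # T_store < T_load
    ("store", "store"): "eliminate store1",  # T_store1 < T_store2
    ("load", "load"): "do nothing",          # two loads
    ("load", "store"): "delay or bypass",    # T_load < T_store
}


def resolve_dependence(first_op: str, second_op: str) -> str:
    """Return the slide's listed solution for a pair of memory operations."""
    try:
        return RESOLUTION[(first_op, second_op)]
    except KeyError:
        raise ValueError("operations must be 'load' or 'store'") from None
```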

  10. Estimation of the pipeline execution time • T = [S + D*(N-1)] + O + W cycles • S: total number of pipeline stages of a datapath • D: delay between consecutive loop iterations • N: number of loop iterations • O: switch overhead • W: write-back cycles
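The estimate on slide 10 can be evaluated directly. The sketch below is a plain transcription of the formula; the function name and the sample numbers are illustrative assumptions, not figures from the presentation.

```python
def pipeline_cycles(S: int, D: int, N: int, O: int, W: int) -> int:
    """Estimated execution time in cycles: T = [S + D*(N-1)] + O + W.

    S: pipeline stages of the datapath
    D: delay between consecutive loop iterations
    N: number of loop iterations (at least 1)
    O: switch overhead
    W: write-back cycles
    """
    if N < 1:
        raise ValueError("N must be at least 1")
    return (S + D * (N - 1)) + O + W


# Hypothetical loop: 5 stages, a new iteration every cycle, 100 iterations,
# 4 cycles of switch overhead and 2 write-back cycles.
print(pipeline_cycles(S=5, D=1, N=100, O=4, W=2))  # 110
```

The first iteration pays the full pipeline depth S and each later iteration adds only D cycles, so for large N the total is dominated by D*(N-1) and the fixed costs O and W are amortized.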

  11. Reconfigurable Datapath • Datapath mapping • Routing box • Critical path and clock speed • Reconfiguration overhead

  12. Datapath mapping

  13. Routing box

  14. Critical path and clock speed • T_routing box + T_function unit + T_wire delay • Sophisticated control and function unit • Two or more cycles for longer timing
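One way to read slide 14 is that a path's delay is the sum of its routing-box, function-unit, and wire delays, and that a path longer than the clock period is given two or more cycles. The sketch below follows that reading; the function names and all delay values (in picoseconds) are hypothetical.

```python
def path_delay_ps(t_routing_box: int, t_function_unit: int, t_wire: int) -> int:
    """Delay of one path: routing box + function unit + wire."""
    return t_routing_box + t_function_unit + t_wire


def cycles_needed(delay_ps: int, clock_period_ps: int) -> int:
    """Whole clock cycles required to cover a path delay (ceiling division)."""
    if delay_ps <= 0 or clock_period_ps <= 0:
        raise ValueError("delays must be positive")
    return -(-delay_ps // clock_period_ps)


CLOCK_PS = 4000  # assumed clock period

fast_path = path_delay_ps(500, 2500, 500)  # e.g. a simple function unit
slow_path = path_delay_ps(500, 6500, 500)  # e.g. a slower function unit

print(cycles_needed(fast_path, CLOCK_PS))  # 1
print(cycles_needed(slow_path, CLOCK_PS))  # 2
```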

  15. Reconfiguration overhead • Overhead • Reconfiguration context switch • Execution switching to the datapath • Execution switching back • Number of loops selected • 8 or 16 • Control bits are stored in distributed co-processor • Register file bandwidth in master processor

  16. Benchmark studies

  17. Benchmark studies

  18. Benchmark studies

  19. Conclusions and future work • Methodology • Dynamically reconfigurable datapath • For a specific application • The co-processor can be viewed as a VLIW • Loop restructuring techniques • Reduce data dependencies
