
A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units

A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units. João M. P. Cardoso. April 30, 2001. IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA. Faculty of Sciences and Technology, University of Algarve, Faro, Portugal.


Presentation Transcript


  1. A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units • João M. P. Cardoso • April 30, 2001 • IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA • Faculty of Sciences and Technology, University of Algarve, Faro, Portugal

  2. Index • Introduction • Temporal Partitioning • Problem Definition • New vs Previous Approach • Algorithm Working Through an Example • Experimental Results • Related Work • Conclusions • Future Work

  3. Introduction • “Virtual Hardware”: • Reuse of devices • Save silicon area • View “unlimited resources” • Enabled by the dynamically reconfigurable FPGAs • Two concepts: • Context switching among functionalities • Allowing a large “function” to be executed • FPGA devices allowing virtualization: • off-chip configurations • on-chip configurations • Several research efforts…

  4. Introduction [Figure: dataflow graph over dx, u, x, y with operations including << 1, +, and -, producing x_1, y_1, u_1] • Size larger than the available reconfigware area? • Answers: • Temporal Partitioning • Sharing of Functional Units • Goal: combining the two...

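  5. Temporal Partitioning [Figure: first part of the dataflow graph (dx, u, x, y; operations including << 1 and +) producing x_1, y_1, and aux1, drawn along a time axis]

  6. Temporal Partitioning [Figure: second part of the dataflow graph (dx, y, u, aux1; operations including << 1, +, and -) producing u_1, drawn along a time axis]

  7. Temporal Partitioning [Figure: both parts of the dataflow graph in sequence along the time axis, linked by aux1]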

  8. Temporal Partitioning • Create temporal partitions to be executed by time-sharing the device • Netlist level (structural) • Difficulties when dealing with feedbacks • Loss of Information • Flat structure • Intricate for exploiting sharing of functional units • Behavioral level (functional) • Loops can be explicitly represented • Better design decisions • “A must” for compilers for reconfigurable computing

  9. Problem Definition But what if we decrease the needed area by sharing functional units? • Simultaneous Temporal Partitioning and sharing of Functional Units THE PROBLEM: • Given a dataflow graph (representing a behavioral description), a library of components,... • Map the dataflow graph onto the available resources of the FPGA device: • Considering sharing of Functional Units • Considering Temporal Partitioning • Decreasing the overall execution latency

  10. New vs Previous Approach • Previous: DFG/CDFG and constraints → Temporal Partitioning → High-Level Synthesis (with component library) → Circuit generation, Logic Synthesis • New: DFG/CDFG, constraints, and component library → Simultaneous Temporal Partitioning and High-Level Synthesis → Circuit generation, Logic Synthesis

  11. Algorithm Working Through an Example Suppose the following dataflow graph [Figure: dataflow graph with nodes 0 to 5] • Consider: • Area(+) = 1 cell • Area(×) = 2 cells • Delay(+) = 1 control step (cs) • Delay(×) = 2 cs • Total area of the DFG: 8 cells • Available area: 3 cells
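The stated total of 8 cells pins down the operation mix. As a quick check, assuming the six nodes are only adders and multipliers (the symbols a and m are introduced here for the number of adders and multipliers; they do not appear on the slides):

```latex
\begin{aligned}
a + m &= 6 \\
1 \cdot a + 2 \cdot m &= 8 \\
\Rightarrow\quad m &= 2, \qquad a = 4
\end{aligned}
```

With only 3 cells available, at most one multiplier and one adder fit on the device at the same time, so the 8-cell graph cannot be mapped without temporal partitioning, sharing, or both.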

  12. Algorithm Working Through an Example Calculate ASAP and ALAP values
      Node  0  1  2  3  4  5
      ASAP  0  0  1  0  2  3
      ALAP  1  1  2  0  2  3
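The ASAP and ALAP values can be reproduced with a short sketch. The edges of the example graph did not survive in this transcript, so the `EDGES` list below is an assumption: it is one edge set that is consistent with the table. The operation types in `OPS` are likewise inferred (from the 8-cell total and the delays implied by the table), not stated on the slide.

```python
from collections import deque

DELAY = {"+": 1, "*": 2}  # control steps, as given on slide 11
# Inferred operation types (assumption, not stated on the slides).
OPS = {0: "+", 1: "+", 2: "*", 3: "*", 4: "+", 5: "+"}
# ASSUMED edges: one edge set consistent with the ASAP/ALAP table.
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]


def asap_alap(ops, edges, delay):
    """Return (asap, alap) start steps for every node of a DAG."""
    nodes = sorted(ops)
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    d = {n: delay[ops[n]] for n in nodes}

    # Topological order (Kahn's algorithm).
    indeg = {n: len(preds[n]) for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)

    # ASAP: start as soon as every predecessor has finished.
    asap = {}
    for n in order:
        asap[n] = max((asap[p] + d[p] for p in preds[n]), default=0)
    latency = max(asap[n] + d[n] for n in nodes)

    # ALAP: start as late as possible without delaying any successor.
    alap = {}
    for n in reversed(order):
        alap[n] = min((alap[s] for s in succs[n]), default=latency) - d[n]
    return asap, alap


asap, alap = asap_alap(OPS, EDGES, DELAY)
print("ASAP", [asap[n] for n in sorted(OPS)])  # ASAP [0, 0, 1, 0, 2, 3]
print("ALAP", [alap[n] for n in sorted(OPS)])  # ALAP [1, 1, 2, 0, 2, 3]
```

Other edge sets may match the table as well; only the resulting values are checked against the slide.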

  13. Algorithm Working Through an Example Identify the critical path [Figure: dataflow graph with nodes 0 to 5]
      Node  0  1  2  3  4  5
      ASAP  0  0  1  0  2  3
      ALAP  1  1  2  0  2  3
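One common way to read the critical path off such a table is to pick the nodes whose slack (ALAP minus ASAP) is zero. The sketch below uses only the table values from the slide; the variable names are illustrative.

```python
# Values copied from the slide's table, indexed by node number.
ASAP = [0, 0, 1, 0, 2, 3]
ALAP = [1, 1, 2, 0, 2, 3]

# Slack: how many control steps a node can slip without adding latency.
slack = [late - early for early, late in zip(ASAP, ALAP)]

# Zero-slack nodes, ordered by start step, form the critical path.
critical = sorted((n for n, s in enumerate(slack) if s == 0),
                  key=lambda n: ASAP[n])

print(slack)     # [1, 1, 1, 0, 0, 0]
print(critical)  # [3, 4, 5]
```

Nodes 3, 4, and 5 are the ones the later slides place first, one per temporal partition.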

  14. Algorithm Working Through an Example Create an initial number of TPs: suppose 3 [Figure: dataflow graph split into temporal partitions 1, 2, 3, beside empty Area and MAXCS columns]

  15. Algorithm Working Through an Example Map each node of the critical path on each temporal partition [Figure: critical-path nodes 3, 4, 5 placed in TP 1, 2, 3] • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs
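A minimal sketch of this seeding step, assuming each critical-path node opens its own temporal partition and that the initial MAXCS of a partition is the delay of its seed node. The operation types are inferred from the table rather than stated on the slide, and the dictionary layout is hypothetical.

```python
DELAY = {"+": 1, "*": 2}           # control steps, from slide 11
OPS = {3: "*", 4: "+", 5: "+"}     # inferred types of the critical-path nodes
critical_path = [3, 4, 5]

# One temporal partition per critical-path node; MAXCS starts at the
# delay of that node, the smallest time slot that can hold it.
partitions = [
    {"tp": i + 1, "nodes": [n], "maxcs": DELAY[OPS[n]]}
    for i, n in enumerate(critical_path)
]

for p in partitions:
    print(p["tp"], p["nodes"], p["maxcs"], "cs")
# 1 [3] 2 cs
# 2 [4] 1 cs
# 3 [5] 1 cs
```

The resulting MAXCS values (2, 1, 1) agree with those shown on the slide.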

  16. Algorithm Working Through an Example Try to map nodes in each temporal partition (1) [Figure: nodes mapped so far: 3, 4, 5] • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  17. Algorithm Working Through an Example Try to map nodes in each temporal partition (1) [Figure: nodes mapped so far: 0, 3, 4, 5] • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  18. Algorithm Working Through an Example Try to map nodes in each temporal partition (1) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  19. Algorithm Working Through an Example Try to map nodes in each temporal partition (1) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • Area: TP 1 = 3 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  20. Algorithm Working Through an Example Try to map nodes in each temporal partition (2) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • Area: TP 2 = 2 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  21. Algorithm Working Through an Example Try to map nodes in each temporal partition (3) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • Area: TP 2 = 2 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  22. Algorithm Working Through an Example Relax: add 1 clock step to MAXCS [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • MAXCS before relaxing: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  23. Algorithm Working Through an Example Try to map nodes in each temporal partition (1) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • Area: TP 1 = 3 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  24. Algorithm Working Through an Example Try to map nodes in each temporal partition (2) [Figure: nodes mapped so far: 0, 1, 3, 4, 5] • Area: TP 2 = 2 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs

  25. Algorithm Working Through an Example Try to map nodes in each temporal partition (2) [Figure: all nodes mapped: 0, 1, 2, 3, 4, 5] • Area: TP 2 = 2 • MAXCS: TP 1 = 2 cs, TP 2 = 1 cs, TP 3 = 1 cs
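The map-then-relax steps can be illustrated with a feasibility check: does a set of nodes fit in a partition within the area budget and the MAXCS time slot if functional units may be shared? The sketch below is not the algorithm from the talk. It swaps in plain resource-constrained list scheduling, and the edges and operation types are assumptions (the graph's edges did not survive in this transcript). The helper names `latency` and `fits` are hypothetical.

```python
AREA = {"+": 1, "*": 2}    # cells, from slide 11
DELAY = {"+": 1, "*": 2}   # control steps, from slide 11
OPS = {0: "+", 1: "+", 2: "*", 3: "*", 4: "+", 5: "+"}   # inferred types
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]                 # ASSUMED edges


def latency(nodes, units):
    """List-schedule `nodes` on `units` (count per op type); return steps used."""
    if any(units.get(OPS[n], 0) == 0 for n in nodes):
        return None  # some operation has no unit to run on
    free_at = {op: [0] * k for op, k in units.items()}
    finish = {}
    step = 0
    while len(finish) < len(nodes):
        for n in sorted(nodes):
            if n in finish:
                continue
            # Predecessors outside this partition ran in an earlier one.
            preds = [u for u, v in EDGES if v == n and u in nodes]
            if any(finish.get(p, step + 1) > step for p in preds):
                continue  # a predecessor has not finished yet
            pool = free_at[OPS[n]]
            idle = [i for i, t in enumerate(pool) if t <= step]
            if idle:  # share an existing unit that is free at this step
                pool[idle[0]] = finish[n] = step + DELAY[OPS[n]]
        step += 1
    return max(finish.values(), default=0)


def fits(nodes, maxcs, area_limit):
    """True if some unit mix within `area_limit` schedules `nodes` in `maxcs`."""
    for n_add in range(area_limit // AREA["+"] + 1):
        for n_mul in range(area_limit // AREA["*"] + 1):
            units = {"+": n_add, "*": n_mul}
            if sum(AREA[op] * k for op, k in units.items()) > area_limit:
                continue
            used = latency(nodes, units)
            if used is not None and used <= maxcs:
                return True
    return False


print(fits({0, 1, 3}, maxcs=2, area_limit=3))  # True: nodes 0 and 1 share one adder
print(fits({2, 4}, maxcs=1, area_limit=3))     # False: node 2 needs 2 cs
print(fits({2, 4}, maxcs=2, area_limit=3))     # True after relaxing MAXCS by 1
```

Under these assumptions the check behaves like the slides: nodes 0 and 1 join node 3 in TP 1 at 3 cells, node 2 cannot join TP 2 while its MAXCS is 1 cs, and it fits once MAXCS is relaxed to 2 cs.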

  26. Algorithm Working Through an Example Merge Operation (1) [Figure: all nodes mapped: 0, 1, 2, 3, 4, 5] • MAXCS: TP 1 = 2 cs, TP 2 = 2 cs, TP 3 = 1 cs

  27. Algorithm Working Through an Example Merge Operation (1) [Figure: TP 1 and TP 2 merged] • MAXCS: TP 1,2 = 4 cs, TP 3 = 1 cs

  28. Algorithm Working Through an Example Merge Operation (2) [Figure: TP 1 and TP 2 merged] • MAXCS: TP 1,2 = 4 cs, TP 3 = 1 cs

  29. Algorithm Working Through an Example Merge Operation (2) [Figure: TP 1, TP 2, and TP 3 merged] • MAXCS: TP 1,2,3 = 4 cs
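The 4 cs shown for the fully merged partition can be cross-checked with a simple lower-bound argument, sketched below. It assumes the inferred operation types (four adders, two multipliers) and one adder plus one multiplier, the only mix containing both unit types that fits in 3 cells. It does not reproduce the merge procedure itself.

```python
from math import ceil

AREA = {"+": 1, "*": 2}    # cells, from slide 11
DELAY = {"+": 1, "*": 2}   # control steps, from slide 11
OPS = {0: "+", 1: "+", 2: "*", 3: "*", 4: "+", 5: "+"}  # inferred types
ASAP = [0, 0, 1, 0, 2, 3]  # from the slide's table

units = {"+": 1, "*": 1}   # assumed unit mix for the single merged partition
area = sum(AREA[op] * k for op, k in units.items())

# Resource bound: total busy time per unit type divided by the unit count.
work = {op: sum(DELAY[o] for o in OPS.values() if o == op) for op in units}
resource_bound = max(ceil(work[op] / units[op]) for op in units)

# Critical-path bound: latest ASAP start plus that node's delay.
critical_bound = max(ASAP[n] + DELAY[OPS[n]] for n in OPS)

lower_bound = max(resource_bound, critical_bound)
print(area, work, lower_bound)  # 3 {'+': 4, '*': 4} 4
```

Both bounds equal 4 cs, so under these assumptions no single-partition schedule in 3 cells can beat the 4 cs shown on the slide.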

  30. Experimental Results Near-optimal w/o sharing vs sharing [Charts: EX1, SEHWA, HAL, EWF]

  31. Experimental Results Near-optimal w/o sharing vs sharing [Charts: FIR, MAT4x4; data labels 72 and 37]

  32. Experimental Results Performance vs No. of Temporal Partitions • Mult4x4, RMAX=10 (no sharing of adders)

  33. Experimental Results Is the algorithm good for scheduling? • Comparison to some optimum results [Charts: EWF, SEHWA]

  34. Related Work • List-Scheduling considering dynamic reconfiguration [Vasilko et al., FPL’96] • ASAP [GajjalaPurna et al., IEEE Trans. on Comp., 1999] • Minimize latency taking into account communication costs [Cardoso et al., VLSI’99]: • Enhanced Static-List Scheduling • Iterative approach (Simulated Annealing) • ILP formulation [SPARCs, DATE’98; RAW’98] • Enhanced Force-Directed List Scheduling [Pandey et al., SPIE’99] • And others [see the Related Work section]

  35. Conclusions • Novel algorithm simultaneously doing temporal partitioning and sharing of functional units • Low complexity • Heuristic approach • Based on gradual enlargement of time slots • Allows exploiting the duality between the number of temporal partitions and resource sharing • Close-to-optimum results with some examples • Results show that the algorithm is not weak when performing scheduling

  36. Future Work • Enhancements to the algorithm: • consider functional units with pipelining • consider pipelining between execution and reconfiguration • Study the possibility of taking into account communication and reconfiguration costs • Test results with a reconfigurable computing system (commercial board)

  37. Contact Author João M. P. Cardoso jmpc@acm.org http://w3.ualg.pt/~jmcardo THANK YOU!
