
Compiling Application-Specific Hardware


Presentation Transcript


  1. Compiling Application-Specific Hardware Mihai Budiu and Seth Copen Goldstein, Carnegie Mellon University

  2. Resources

  3. Problems • Complexity • Power • Global signals • Limited issue window => limited ILP. We propose a scalable architecture.

  4. Outline • Introduction • ASH: Application Specific Hardware • Compiling for ASH • Conclusions

  5. Application-Specific Hardware [diagram: a C program goes through the compiler into a dataflow IR and becomes a dataflow machine implemented on reconfigurable hardware]

  6. Our Solution • General: applicable to today's software (programming languages and applications) • Automatic: compiler-driven • Scalable: at run time with clock speed and hardware, at compile time with program size • Parallelism: exploits application parallelism

  7. Asynchronous Computation [diagram: an operator (e.g., an adder) with data wires accompanied by data-valid and ack handshake signals]
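
  The handshake the diagram shows can be sketched at the software level. A minimal C sketch, assuming a single-slot channel and my own struct and field names (not from the deck): the producer drives data and raises data-valid; the consumer latches the value and answers with ack, freeing the slot.

    /* Hypothetical single-slot channel modelling the data / data-valid /
       ack handshake between two asynchronous operators.                  */
    typedef struct {
        int data;
        int valid;   /* producer: "data is meaningful"    */
        int ack;     /* consumer: "I have taken the data" */
    } channel_t;

    /* Producer side: may only drive new data once the previous
       transfer has been acknowledged.                                     */
    int produce(channel_t *ch, int value) {
        if (ch->valid) return 0;      /* still waiting for ack             */
        ch->data  = value;
        ch->valid = 1;
        ch->ack   = 0;
        return 1;
    }

    /* Consumer side: latches the data and acknowledges it.                */
    int consume(channel_t *ch, int *out) {
        if (!ch->valid) return 0;     /* nothing to take yet               */
        *out      = ch->data;
        ch->ack   = 1;                /* acknowledge ...                   */
        ch->valid = 0;                /* ... which lets the producer go on */
        return 1;
    }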

  8. New • Entire C applications • Dynamically scheduled circuits • Custom dataflow machines: application-specific, direct execution (no interpretation), spatial computation

  9. Outline • Scalability • Application Specific Hardware • CASH: Compiling for ASH • Conclusions

  10. CASH: Compiling for ASH [diagram: a C program is compiled by CASH into circuits, memory partitioning, and an interconnection net, targeting reconfigurable hardware (RH)]

  11. Primitives [diagram: the dataflow primitives: arithmetic/logic operators; multiplexors, whose data inputs are selected by predicates; merge; eta (gateway), which passes its data input only when its predicate is true; and memory operations (ld/st)]

  12. Forward Branches if (x > 0) y = -x; else y = b*x; [diagram: inputs b and x feed a dataflow graph in which both -x and b*x are computed and the x > 0 comparison drives a decoded mux that selects y] Conditionals => speculation
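
  At the C level the transformation can be sketched roughly as follows (a hedged sketch with my own function name, not compiler output): both branch sides are evaluated unconditionally, i.e., speculatively, and the branch predicate drives the selection that the decoded mux performs in hardware.

    /* Speculative evaluation of:  if (x > 0) y = -x; else y = b*x;    */
    int forward_branch(int b, int x) {
        int p      = (x > 0);         /* branch predicate              */
        int y_then = -x;              /* both sides are computed       */
        int y_else = b * x;           /*   regardless of the predicate */
        return p ? y_then : y_else;   /* the decoded mux selects y     */
    }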

  13. Critical Paths if (x > 0) y = -x; else y = b*x; [diagram: the same graph with the longer b*x path highlighted as the critical path]

  14. Lenient Operations if (x > 0) y = -x; else y = b*x; [diagram: the same graph with a lenient mux] Lenient operations solve the problem of unbalanced paths.
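
  What "lenient" means can be sketched with explicit availability flags (a simulation-level sketch; the struct and names are mine): the mux emits its result as soon as the predicate and the selected input have arrived, so a late value on the unselected, longer path no longer delays y.

    typedef struct { int avail; int value; } wire_t;   /* hypothetical */

    /* Lenient mux: does not wait for the input that the predicate
       rules out.                                                      */
    wire_t lenient_mux(wire_t pred, wire_t a, wire_t b) {
        wire_t out = { 0, 0 };
        if (!pred.avail) return out;          /* need the predicate    */
        if (pred.value && a.avail) {          /* selected input: a     */
            out.avail = 1; out.value = a.value;
        } else if (!pred.value && b.avail) {  /* selected input: b     */
            out.avail = 1; out.value = b.value;
        }
        return out;   /* never blocks on the unselected input          */
    }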

  15. Loops int sum=0, i; for (i=0; i < 100; i++) sum += i*i; return sum; [diagram: loop dataflow graph in which i and sum circulate through merge nodes initialized to 0; i*i and +1 compute the next values, and the i < 100 test steers values back into the loop or out to ret] Control flow => data flow
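
  A hedged C-level restructuring of the same loop (my own naming, not the compiler's output) that mirrors the dataflow form: the loop-carried values circulate explicitly, and the loop predicate steers them either back around (eta into the loop) or out to ret (eta out of the loop).

    int sum_of_squares(void) {
        int i = 0, sum = 0;              /* merge nodes: initial values    */
        for (;;) {
            int p = (i < 100);           /* loop predicate                 */
            if (!p)                      /* eta out of the loop:           */
                return sum;              /*   sum flows to ret             */
            int sum_next = sum + i * i;  /* loop-body dataflow             */
            int i_next   = i + 1;
            sum = sum_next;              /* eta back into the loop:        */
            i   = i_next;                /*   next values reach the merges */
        }
    }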

  16. Compilation • Translate C to dataflow machines • Optimizations: software-, hardware-, and dataflow-specific • Expose parallelism: predication, speculation, localized synchronization, pipelining

  17. Pipelining [diagram: the sum-of-squares loop with a pipelined multiplier computing i*i]

  18. Pipelining [animation frame of the same diagram: values advance through the multiplier pipeline]

  19. Pipelining [animation frame of the same diagram]

  20. Pipelining [animation frame of the same diagram]

  21. Pipelining [diagram: i's loop and sum's loop, connected by the long-latency multiplier pipe]

  22. Pipelining [animation frame of the same diagram]

  23. Pipelining [diagram: the loop predicate travels from i's loop, alongside the long-latency pipe, to sum's loop]

  24. Pipelining [diagram: critical path highlighted] The predicate ack edge is on the critical path.

  25. Pipelining [diagram: a decoupling FIFO inserted on the predicate edge between i's loop and sum's loop]

  26. Pipelining [diagram: the critical path after the decoupling FIFO is inserted; i's loop is decoupled from sum's loop]
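
  As a software analogy of the decoupling FIFO (a sketch under my own assumptions, not the hardware's actual implementation): i's loop acts as a producer that may run several iterations ahead, depositing the long-latency products into a bounded FIFO that sum's loop drains at its own pace, so the predicate acknowledgement no longer couples the two loops cycle by cycle.

    #include <stdio.h>

    #define FIFO_DEPTH 8          /* hypothetical decoupling depth        */

    int main(void) {
        int fifo[FIFO_DEPTH];
        int head = 0, tail = 0, count = 0;
        int i = 0, sum = 0, producing = 1;

        while (producing || count > 0) {
            /* i's loop (producer): runs whenever the FIFO has room   */
            if (producing && count < FIFO_DEPTH) {
                fifo[tail] = i * i;            /* long-latency multiply */
                tail = (tail + 1) % FIFO_DEPTH;
                count++;
                if (++i == 100) producing = 0;
            }
            /* sum's loop (consumer): runs whenever data is available */
            if (count > 0) {
                sum += fifo[head];
                head = (head + 1) % FIFO_DEPTH;
                count--;
            }
        }
        printf("sum = %d\n", sum);             /* 328350 */
        return 0;
    }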

  27. ASH Features • What you code is what you get: no hidden control logic, lean hardware (no CAMs, multi-ported register files, etc.), no global signals • Compiler has complete control • Dynamic scheduling => latency tolerant • Natural ILP and loop pipelining

  28. Conclusions • ASH: compiler-synthesized hardware from HLL • Exposes program parallelism • Dataflow techniques applied to hardware • ASH promises to scale with circuit speed, transistors, and program size

  29. Backup slides • Hyperblocks • Predication • Speculation • Memory access • Procedure calls • Recursive calls • Resources • Performance

  30. Hyperblocks [diagram: a procedure's control-flow graph partitioned into hyperblocks] back

  31. Predication [diagram: inside a hyperblock, each operation is guarded by a predicate: code under if (p) runs under predicate p, code under if (!p) runs under predicate !p, and nested conditions produce combined predicates such as q] back
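
  A hedged C sketch of predication on operations with side effects (my own example; C has no predicated stores, so the guards below stand in for the predicate inputs of the hardware operations): the branch structure disappears and every operation simply carries the predicate under which it is allowed to take effect.

    /* Original control flow:
           if (p) *x = 1;
           else   *y = 2;
       Predicated hyperblock form: straight-line code, each store
       guarded by its predicate.                                      */
    void predicated_stores(int p, int *x, int *y) {
        int pt = (p != 0);        /* predicate  p                     */
        int pf = !pt;             /* predicate !p                     */
        if (pt) *x = 1;           /* store executes only under  p     */
        if (pf) *y = 2;           /* store executes only under !p     */
    }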

  32. Speculation [diagram: operations without side effects have their guarding predicates removed and execute speculatively; operations with side effects remain predicated (q)] back
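
  The contrast with the previous slide can be sketched in C (a hedged example of mine): side-effect-free work loses its predicate and runs speculatively on every execution of the hyperblock, while the store, which has a side effect, keeps its guard.

    void speculated(int p, int a, int b, int *out) {
        int t = a * b + 7;    /* no side effect: predicate removed,
                                 computed speculatively every time    */
        if (p)                /* side effect: the store stays         */
            *out = t;         /*   guarded by its predicate           */
    }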

  33. Memory Access [diagram: load and store nodes take a predicate, an address, and a token; requests travel over the interconnection network to a load-store queue in front of memory; loads return data plus a token, stores return a token, and tokens serialize dependent accesses] back
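
  A simulation-level sketch of the token discipline (types and names are mine, not the deck's interface): each memory operation consumes a token from the access it must be ordered after and produces a token for the accesses that depend on it; the token carries no data, only ordering.

    typedef int token_t;   /* hypothetical: carries ordering only, no data */

    /* Predicated store: waits for t_in, performs the store if its
       predicate holds, then releases a token to dependent accesses.    */
    token_t mem_store(int *addr, int val, int pred, token_t t_in) {
        (void)t_in;                 /* ordering dependence only          */
        if (pred) *addr = val;
        return 1;                   /* token out                         */
    }

    /* Predicated load: ordered after the store that produced t_in.     */
    int mem_load(const int *addr, int pred, token_t t_in, token_t *t_out) {
        (void)t_in;
        *t_out = 1;                 /* token out                         */
        return pred ? *addr : 0;
    }

    /* Usage sketch: the load is serialized after the store through
       the token, not through any global ordering.                      */
    int ordered_access(int *a) {
        token_t t = mem_store(a, 42, 1, /* initial token */ 1);
        token_t t2;
        return mem_load(a, 1, t, &t2);  /* sees the stored 42            */
    }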

  34. Procedure calls [diagram: the caller sends the call and its args over the interconnection network to procedure P, which extracts the args and returns the result to the caller's ret node] back
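
  A software analogy of the call mechanism (my own struct and function names): the caller packs its arguments into a message, sends it across the interconnection network to procedure P's circuit, and P's result travels back to the caller's ret node.

    typedef struct { int a; int b; } call_msg_t;   /* args sent to P      */
    typedef struct { int value;    } ret_msg_t;    /* result sent back    */

    /* "Procedure P": extracts the args from the message, computes,
       and returns the result message.                                    */
    static ret_msg_t procedure_P(call_msg_t m) {
        ret_msg_t r = { m.a + m.b };
        return r;
    }

    /* Caller: the function call stands in for sending the message
       over the interconnection network and waiting at the ret node.     */
    int caller(void) {
        call_msg_t args = { 3, 4 };
        ret_msg_t  res  = procedure_P(args);
        return res.value;            /* 7 */
    }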

  35. Recursion [diagram: the hyperblock containing a recursive call saves its live values to a stack before the call and restores them when the call returns] back
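
  A hedged C sketch of the save/restore idea (my own example; C's call stack already does this implicitly, the explicit array below just makes visible what the single physical hyperblock circuit must do): live values are pushed onto a stack before the recursive call re-enters the same circuit and popped when it returns.

    #define MAX_DEPTH 64                 /* hypothetical stack bound     */
    static int live_stack[MAX_DEPTH];    /* saved live values            */
    static int sp = 0;

    int factorial(int n) {
        if (n <= 1) return 1;
        live_stack[sp++] = n;            /* save live value(s)           */
        int r = factorial(n - 1);        /* recursive call re-enters the
                                            same hyperblock circuit      */
        int saved_n = live_stack[--sp];  /* restore live value(s)        */
        return saved_n * r;
    }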

  36. Resources • Estimated resource requirements for SPECint95 and Mediabench • Average < 100 bit-operations per line of code • Routing resources are harder to estimate • Detailed data in the paper back

  37. Performance • Preliminary comparison with a 4-wide out-of-order (OOO) processor • Assumed the same functional-unit latencies • Speed-up on kernels from Mediabench back
