
Computing Without Processors Thesis Proposal

Mihai Budiu, July 30, 2001. Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems). This presentation uses TeXPoint by George Necula.


Presentation Transcript


  1. Computing Without Processors: Thesis Proposal. Mihai Budiu, July 30, 2001. Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems). This presentation uses TeXPoint by George Necula.

  2. Four Types of Research • Solve nonexistent problems • Solve past problems • Solve current problems • Solve future problems

  3. The Law (source: Intel)

  4. The Crossover Phenomenon [Figure axes: technology, time]

  5. Example Crossover [Figure: access speed (ns) over time for CPU and DRAM; marks at 200 ns and 1980; regions labeled "no caches" and "caches"]

  6. Trouble Ahead for Microarchitecture

  7. Signal Propagation [Figure: die size and distance covered in 1 clock (mm) over time; marks at 20 mm and "now"]

  8. Reliability & Yield [Figure: defects/chip over time, occurring vs. tolerable; marks at "new process" and "now"]

  9. Energy [Figure: power over time, CPU consumption vs. thermal dissipation; marks at 100 W and "now"]

  10. Instruction-Level Parallelism (ILP) [Figure: instructions over time, fetch vs. commit; mark at "now"]

  11. Premises of this Research • We will have lots of gates • Moore’s law continues • Nanotechnology • Contemporary architectures do not scale

  12. Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work

  13. ASH: Application-Specific Hardware [Diagram labels: HLL program, Compiler, Circuit, Reconfigurable hardware]

  14. ASH: A Scalable Architecture (Thesis Statement) • Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture. • We can provide scalable compilers for translating high-level languages into hardware.

  15. Example int f(void) { int i=0, j = 0; for (; i < 10; i++) j += i; return j; }
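
The example on slide 15 can be run as ordinary C. The sketch below repeats it and, as an illustration only, adds a hypothetical register-style model of the same loop (the names `loop_state`, `step`, and `f_stepped` are assumptions, not from the slides): the loop-carried values `i` and `j` are held in a state record that is updated once per step, roughly the way a circuit would update registers.

```c
#include <stdbool.h>

/* The function from slide 15, behavior unchanged. */
int f(void) {
    int i = 0, j = 0;
    for (; i < 10; i++)
        j += i;
    return j;
}

/* Hypothetical model: the loop-carried values held as "registers". */
struct loop_state {
    int i;
    int j;
    bool done;
};

/* One step of the loop: compute the next register values from the current ones. */
struct loop_state step(struct loop_state s) {
    struct loop_state next = s;
    if (s.i < 10) {
        next.j = s.j + s.i;
        next.i = s.i + 1;
    } else {
        next.done = true;   /* loop exit condition reached */
    }
    return next;
}

/* Run the stepped model to completion; it should agree with f(). */
int f_stepped(void) {
    struct loop_state s = {0, 0, false};
    while (!s.done)
        s = step(s);
    return s.j;
}
```

Both `f()` and `f_stepped()` sum the integers 0 through 9.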

  16. Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work

  17. ASH and Nanotechnology • Build reconfigurable hardware using nanotechnology • Low power: 10^10 gates use less than 2 W • Low cost: nanocents/gate • High density: 10^5x over CMOS [Figure: nano-RAM cell; in yellow, a CMOS RAM cell, labeled "Huge structures"]

  18. A Limit Study of Performance. A graph of the whole program execution: [Legend: basic block, control-flow transfer, memory word, memory read, memory write]

  19. Typical Program Graph (g721_e) [Figure labels: memcpy, memory reads, control flow transfer, 100% code cluster, 100% memory cluster]

  20. Program Graph After Inlining memcpy memcpy

  21. Application Slowdown

  22. How Time Is Spent • No caches: reads expensive • No speculation

  23. Lesson The spatial model of computation has different properties.

  24. Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Future work

  25. CASH: Compiling for ASH • Program to circuits • Memory partitioning • Interconnection net

  26. Compilation 1. Program: int reverse(int x) { int k, r = 0; for (k = 0; k < 32; k++) { r |= x & 1; x = x >> 1; r = r << 1; } return r; } 2. Split-phase Abstract Machines: computations & local storage, unknown-latency ops. 3. Configurations placed independently 4. Placement on chip [Side label: Reliability]
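
As transcribed, the `reverse` loop on slide 26 shifts `r` left once more after the final OR, which would push the last bit out of a 32-bit value. The sketch below assumes the intent is a full 32-bit bit reversal; the name `reverse_bits` and the use of `uint32_t` (to avoid shifting signed values) are choices made here, not taken from the slides.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word (bit 0 becomes bit 31, and so on).
 * Assumption: this is what the slide's reverse() is meant to compute. */
uint32_t reverse_bits(uint32_t x) {
    uint32_t r = 0;
    for (int k = 0; k < 32; k++) {
        /* Make room first, then bring in the lowest remaining bit of x,
         * so the bit added in the last iteration is not shifted away. */
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

Applying `reverse_bits` twice returns the original word, which is a quick sanity check for the loop order.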

  27. Split-phase Abstract Machines [Diagram: CFG with regions SAM 1, SAM 2, SAM 3; side label: Power]

  28. Hyperblock => SAM • Single-entry, multiple exit • May contain loops

  29. SAM => FSM [Diagram labels: Start, Loop, Exit, Exit, Local memory, Remote memory]

  30. Implementing SAMs (interesting details)

  31. The SAM FSM [Diagram labels: start, exit, args, results, Register, Computation, Predicates (control), Combinational logic]

  32. Computation = Dataflow. Programs: x = a & 7; ... y = x >> 2; Circuits: a and 7 feed &, producing x; x and 2 feed >>. [Figure label: Signals] • Variables => wires + tokens • No token store; no token matching • Local communication only
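
The "variables => wires + tokens" idea on slide 32 can be sketched in C. This is an illustrative model only, not the circuit the compiler generates: each wire carries a data value plus a valid flag standing in for the token, and an operator produces a valid output only when all of its inputs are valid. The names `wire`, `constant`, `empty_wire`, `and_op`, `shr_op`, and `slide_example` are hypothetical.

```c
#include <stdbool.h>

/* A wire: a data value plus a token saying whether the value is present. */
struct wire {
    int data;
    bool valid;
};

/* A constant input is always valid. */
struct wire constant(int v) {
    return (struct wire){v, true};
}

/* A wire that has not received its token yet. */
struct wire empty_wire(void) {
    return (struct wire){0, false};
}

/* Bitwise AND operator: fires only when both input tokens are present. */
struct wire and_op(struct wire a, struct wire b) {
    if (!(a.valid && b.valid))
        return empty_wire();
    return (struct wire){a.data & b.data, true};
}

/* Right-shift operator: fires only when both input tokens are present. */
struct wire shr_op(struct wire a, struct wire b) {
    if (!(a.valid && b.valid))
        return empty_wire();
    return (struct wire){a.data >> b.data, true};
}

/* The slide's fragment: x = a & 7; y = x >> 2; wired operator to operator. */
struct wire slide_example(struct wire a) {
    struct wire x = and_op(a, constant(7));
    struct wire y = shr_op(x, constant(2));
    return y;
}
```

Because each operator talks only to its direct producers and consumers, the model needs no central token store, which matches the "local communication only" bullet.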

  33. Tokens & Synchronization • Tokens signal operation completion • Possible implementations: Local, Global, Static [Figure signals: data, valid, ack, reset]

  34. ILP and Eager Muxes. Code: if (x > 0) y = -x; else y = b*x; [Figure labels: Computation, Predicates, Speculation, slow; operators -, >, *, !; inputs b, x, 0; output y] Static-Single Assignment implemented in hardware
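
The speculation on slide 34 can be approximated in C: both arms of the conditional are computed unconditionally, and a predicate selects the result, like a multiplexer. This is a sketch under the assumption that both arms are free of side effects; the function names `branching` and `eager_mux` are hypothetical.

```c
/* The slide's code as an ordinary branch. */
int branching(int x, int b) {
    int y;
    if (x > 0)
        y = -x;
    else
        y = b * x;
    return y;
}

/* Speculative form: evaluate both arms, then select with the predicate. */
int eager_mux(int x, int b) {
    int pred = x > 0;        /* predicate computed alongside the data */
    int then_val = -x;       /* speculatively computed "then" arm */
    int else_val = b * x;    /* speculatively computed "else" arm */
    return pred ? then_val : else_val;   /* the mux */
}
```

In this C model the work is still sequential; the point is only that the two forms agree, so hardware is free to evaluate the negation, the multiply, and the comparison in parallel.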

  35. Predicates • Guard side-effects: memory access, procedure calls (e.g. *q = 2;) • Control looping • Decide exit branch • Select variable definition (x=... x=... ...=x)

  36. Computing Predicates [Figure nodes: s, t, b] • Correct for irreducible graphs • Correct even when speculatively computed • Can be eagerly computed

  37. Loops + Dataflow = Pipelining. Code: for (i=0; i < 10; i++) a[i] += i; [Figure labels: 0, i, 1, &a[0], +, load, store, a[0] through a[3]]
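
A rough C model of the pipelining on slide 37, offered as a sketch rather than the actual hardware schedule: the loop body is split into load, add, and store stages, and in each simulated cycle the three stages work on three different iterations. The three-stage split, the cycle loop, and the names `sequential` and `pipelined` are assumptions made for illustration.

```c
#define N 10

/* The loop from the slide, run one iteration at a time. */
void sequential(int a[N]) {
    for (int i = 0; i < N; i++)
        a[i] += i;
}

/* Pipelined model: in cycle c, store handles iteration c-2,
 * add handles iteration c-1, and load handles iteration c. */
void pipelined(int a[N]) {
    int loaded = 0;   /* register between load and add */
    int summed = 0;   /* register between add and store */
    for (int cycle = 0; cycle < N + 2; cycle++) {
        /* Stages run back to front so each one reads the value
         * its predecessor produced in the previous cycle. */
        if (cycle >= 2)
            a[cycle - 2] = summed;               /* store stage */
        if (cycle >= 1 && cycle - 1 < N)
            summed = loaded + (cycle - 1);       /* add stage */
        if (cycle < N)
            loaded = a[cycle];                   /* load stage */
    }
}
```

The overlap is safe here because each iteration touches a different element of `a`; the model finishes in N + 2 simulated cycles instead of the 3N stage-steps a fully serialized version would take.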

  38. Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work

  39. Evolutionary Path [Figure labels: Microprocessors, ASH] The problem with ASH: Resources

  40. Virtualization

  41. CPU+ASH [Diagram: CPU (support computation + OS + VM), ASH (core computation), Memory]

  42. Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work

  43. ASH Benefits

  44. Scalable Performance [Figure: performance over time, ASH vs. CPU; mark at "now"]

  45. Summary • Contemporary CPU architecture faces lots of problems • Application-Specific Hardware (ASH) provides a scalable technology • Compiling HLL into hardware dataflow machines is an effective solution

  46. Timeline [Axis: 06/01, 09/01, 12/01, 04/02, 06/02, 09/02, 12/02; marker: now] • CASH core • Explore architectural/compiler trade-offs • Hw/sw partitioning (ASH + CPU) • Loop parallelization • Memory partitioning • Cost models • ASH simulation • Write thesis

  47. Extras • Related work • Reconfigurable hardware • Other cross-over phenomena • A CPU + ASH study • More about predicates

  48. Related Work • Hardware synthesis from HLL • Reconfigurable hardware • Predicated execution • Dataflow machines • Speculative execution • Predicated SSA

  49. Reconfigurable Hardware [Diagram labels: Interconnection network, Universal gates and/or storage elements, Programmable switches]

  50. Main RH Ingredient: RAM Cell • Universal gate = RAM [Figure labels: contents 0 0 0 1; a0, a1, a2; &; data] • Switch controlled by a 1-bit RAM cell [Figure labels: data in, control, 0]
