Scaling to the End of Silicon with EDGE Architectures: The TRIPS Architecture

  1. Scaling to the End of Silicon with EDGE Architectures: The TRIPS Architecture Doug Burger et al., University of Texas at Austin Presented by Jason D. Bakos

  2. EDGE Architecture • The most popular clichés in computer architecture: “[Microarchitectures | ISAs] are always driven by [capabilities | challenges] of the available fabrication technology.” “We are reaching a fundamental limit in [transistor size | clock speed | pipeline depth | voltage reduction]!” “Future architectures will be (on-chip) communication-bound.” “Future architectures must strike a balance between static, dynamic, and programmer-specified parallelism.” “Future architectures will be constrained by a power budget.”

  3. EDGE Architecture • EDGE (Explicit Data-Graph Execution) addresses weaknesses in modern architectures: • Traditional architectures exploit ILP at run-time via: • Dynamic out-of-order schedulers • Multiple-issue units • Branch predictors • Register renaming/bypass networks • Reorder buffers • Large multi-ported register files • This is inefficient: • The compiler already does much of this analysis (AST, CFG, DFG) but has no way to communicate it to the microarchitecture • Could we get more parallelism if it could? • These structures are power hungry • Shared structures become bottlenecks as issue width grows

  4. EDGE Architecture • Goals: • Develop an ISA that allows the compiler to convey the DFG directly to the architecture • Use aggressive compiler techniques to map large DFGs onto the execution array • Achieving good utilization requires extracting large DFGs from the code • Problem: DFGs model data dependence, not control • Must merge the CFG into the DFG • Involves flattening control structures (unrolling, inlining) and executing predicated instructions in parallel • Use direct instruction-to-instruction communication to make parallelism explicit and eliminate the global register file for temporary values
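
The CFG-into-DFG merge on this slide can be illustrated with if-conversion (predication). The sketch below is only an illustration of the idea, not the TRIPS toolchain: both arms of a branch become data-dependent on a predicate, so a dataflow machine can issue them in parallel and select the result at the end.

```python
# Sketch: merging control flow into a dataflow graph via predication.
# The if/else below has two control paths; a predicating compiler
# flattens it so both arms are computed unconditionally and the
# predicate selects the result. Names here are illustrative.

def with_branch(a, b):
    # Original CFG: single entry, two paths, one merge point.
    if a > b:
        r = a - b
    else:
        r = b - a
    return r

def predicated(a, b):
    # Flattened DFG: both arms computed, predicate selects the result.
    # The predicate and both arms are mutually independent, so a
    # dataflow machine can fire all three as soon as operands arrive.
    p = a > b              # predicate
    t = a - b              # "true" arm
    f = b - a              # "false" arm
    return t if p else f   # predicated select (like a cmov)

assert with_branch(7, 3) == predicated(7, 3) == 4
assert with_branch(3, 7) == predicated(3, 7) == 4
```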

  5. TRIPS • TRIPS: Tera-Op Reliable Intelligently-adaptive Processing System • Concurrency: • Scalable issue width and window size • Array of concurrent ALUs • Temporary register values never written to the register file • Control structures encapsulated/flattened within an instruction block (single-entry, multiple-exit) • Power: • Amortizes the overheads of sequential instruction execution over blocks of up to 128 instructions • Communication delays: • Compile-time placement of instructions onto ALUs • Flexibility: • Reconfigurable memory banks

  6. Compilation • For compiling C code: • Block-atomic execution of up to 128 instructions • Each block maps to a 4x4 ALU array (8 instructions per ALU) • TRIPS instructions don’t encode source registers; each encodes the consumers of its result (target form) • Example: • RISC: ADD R1, R2, R3 • TRIPS: ADD T1, T2 (T1/T2 are the consumer instructions that receive the sum) • Instructions fire when their operands arrive • Eliminates the need to go through shared structures, such as a scheduler or register file (except for loads/stores)
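
The "fire when operands arrive" execution model above can be sketched as a tiny simulator. This is a minimal illustration of target-form (consumer-encoded) dataflow execution under assumed encodings of my own, not the actual TRIPS microarchitecture: each instruction lists its consumers, fires when all its operands have arrived, and forwards its result directly to consumer slots with no shared register file in between.

```python
# Sketch of consumer-encoded dataflow execution. Instruction format
# (illustrative): index -> (opcode, arity, list of consumer targets),
# where a target is (instruction_index, operand_slot).
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_block(instrs, inputs):
    """instrs: index -> (op, arity, targets); inputs: seed operands
    keyed by (instruction_index, operand_slot)."""
    operands = {i: {} for i in instrs}           # operands that have arrived
    for (i, slot), value in inputs.items():
        operands[i][slot] = value
    results = {}
    # An instruction is ready once all of its operands are present.
    ready = [i for i in instrs if len(operands[i]) == instrs[i][1]]
    while ready:                                 # fire in dataflow order
        i = ready.pop()
        op, arity, targets = instrs[i]
        args = [operands[i][s] for s in range(arity)]
        results[i] = OPS[op](*args)
        for (j, slot) in targets:                # forward result to consumers
            operands[j][slot] = results[i]
            if len(operands[j]) == instrs[j][1]:
                ready.append(j)
    return results

# (a + b) * (a - b): instruction 2 consumes the results of 0 and 1,
# so 0 and 1 name 2 as their target rather than naming a register.
instrs = {
    0: ("add", 2, [(2, 0)]),
    1: ("sub", 2, [(2, 1)]),
    2: ("mul", 2, []),
}
res = run_block(instrs, {(0, 0): 5, (0, 1): 3, (1, 0): 5, (1, 1): 3})
assert res[2] == (5 + 3) * (5 - 3)   # 16
```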

  7. TRIPS Prototype Architecture

  8. Block Compilation • Schedules instructions along the critical path onto the same ALUs • Branches produce boolean (predicate) signals • The core is re-mapped when all of a block’s writes complete

  9. Compiling for TRIPS • Superscalar architectures spend too much power resolving parallelism at run-time • VLIW architectures rely too heavily on the compiler and can’t exploit additional parallelism discovered at run-time: • They don’t schedule across control instructions • One stalled slot stalls the entire instruction word

  10. Compiling for TRIPS • The compiler forms large instruction blocks with no internal control flow • Instructions within each block are scheduled to minimize inter-ALU communication latency • Each block holds up to 128 instructions – 8 at each of the 16 ALU nodes
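
The scheduling goal on this slide can be sketched with a greedy placer: put each instruction on the free node of a 4x4 grid closest (in Manhattan distance, a stand-in for hop latency) to its producers. This is an assumed toy heuristic to illustrate the objective, not the actual TRIPS scheduler.

```python
# Sketch: greedy placement of a block's DFG onto a 4x4 ALU grid,
# keeping producer->consumer hop counts short. Illustrative only.
from itertools import product

GRID = list(product(range(4), range(4)))     # 16 ALU nodes as (row, col)

def place(dfg_order, producers):
    """dfg_order: instructions in topological order.
    producers: instr -> list of instrs whose results it consumes."""
    placement, free = {}, set(GRID)
    for instr in dfg_order:
        srcs = [placement[p] for p in producers.get(instr, [])]
        # Pick the free node minimizing total Manhattan distance to
        # producers (ties broken by grid order); an instruction with
        # no producers lands at the first free node.
        best = min(free, key=lambda n: (sum(abs(n[0] - r) + abs(n[1] - c)
                                            for r, c in srcs), n))
        placement[instr] = best
        free.remove(best)
    return placement

# A dependence chain a -> b -> c lands on adjacent nodes, so each
# result travels only one hop to its consumer.
p = place(["a", "b", "c"], {"b": ["a"], "c": ["b"]})
dist = abs(p["b"][0] - p["a"][0]) + abs(p["b"][1] - p["a"][1])
assert dist == 1
```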

  11. Other Issues • Utilization of ALUs? Placement of instructions in ALU buffers (FIFO)? • Size limitations for forming merged CFG/DFGs? • Thread-level parallelism with TRIPS? • Dual-purpose memory banks? • “Scratchpads” • Dynamic code translation?
