
An Instruction Set and Microarchitecture for Instruction Level Distributed Processing

Ho-Seop Kim and James E. Smith. Presented by Haiying Qu, Electrical and Computer Engineering, University of Alberta.


Presentation Transcript


  1. An Instruction Set and Microarchitecture for Instruction Level Distributed Processing (Ho-Seop Kim and James E. Smith), presented by Haiying Qu, Electrical and Computer Engineering, University of Alberta

  2. Introduction 1 • ILP (Instruction Level Parallelism) has achieved significant performance gains • ILDP: Instruction Level Distributed Processing • Technology trend: growing global wire latencies favor distributed designs

  3. Introduction 2 • Proposed microarchitecture • Short pipelines • Distributed processing elements: in-order instruction processing within each element enables out-of-order execution overall • Strand: a chain of dependent instructions • Accumulator: inter-instruction communication within a strand

  4. Instruction Set • 64 general-purpose registers: R0-R63, used as source or destination • 8 accumulators: A0-A7 • Dead accumulator

  5. Load/Store Instructions • Operands: at most one accumulator value and one GPR • Size: one parcel • Ai <- mem(Aj) • Ai <- mem(Rj) • mem(Ai) <- Rj • mem(Rj) <- Ai

  6. Register Instructions • Operands: an accumulator and a GPR or immediate • Result: accumulator or GPR • Ai <- Ai op Rj • Ai <- Ai op immed • Ai <- Rj op immed • Rj <- Ai • Rj <- Ai op immed

  7. Branch/Jump Instructions • Conditional branch: compares Ai with 0 or with a GPR (all the usual predicates) • Program counter: P • Indirect jump: target in Ai or in a GPR • Return address: saved in a GPR • P <- P + immed if Ai pred Rj • P <- P + immed if Ai pred 0 • P <- Ai • P <- Rj • P <- Ai; Rj <- P++ (call, return address in Rj)

  8. Example Code

  9. Strands • Figure 3. Types of values and associated registers

  10. Two strands intersect: one of them is copied to a GPR • Output is a static global register • Figure 4. Issue timing (labels: new strand, strand ends)
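The strand idea can be illustrated with a simplified greedy partitioner. This is a sketch under stated assumptions, not the paper's strand-formation algorithm: a value with exactly one consumer is treated as local (it can stay in an accumulator), a value with several consumers is treated as global (it lives in a GPR and does not extend a strand), and when an instruction has local inputs from two strands, one strand continues and each other one costs an extra copy into a GPR. The function name `form_strands` and the trace format are hypothetical.

```python
from collections import Counter


def form_strands(srcs):
    """Greedily group a dependence trace into strands.

    srcs[k] lists the indices of earlier instructions whose results instruction k reads.
    Returns (strand id per instruction, number of extra GPR copies at intersections).
    """
    uses = Counter(p for s in srcs for p in s)      # consumer count per produced value
    strand_of, copies, next_id = [], 0, 0
    for s in srcs:
        local = [p for p in s if uses[p] == 1]      # single-consumer inputs: accumulator candidates
        if local:
            strand_of.append(strand_of[local[0]])   # continue the first local producer's strand
            copies += len(local) - 1                # each other intersecting strand goes via a GPR
        else:
            strand_of.append(next_id)               # only global inputs (or none): start a new strand
            next_id += 1
    return strand_of, copies


# a = ld x; b = ld y; c = a + b; d = c + 1; e = ld z; f = d * e; g = f + 1; h = f + 2
trace = [[], [], [0, 1], [2], [], [3, 4], [5], [5]]
strands, copies = form_strands(trace)
print(strands, copies)                    # [0, 1, 0, 0, 2, 0, 3, 4] 2
print(len(trace) / (max(strands) + 1))    # 1.6 (average strand length)
```

Here `f` has two consumers, so it is global and both `g` and `h` start new strands; the last line computes the kind of statistic the slides report as average strand length, for this toy trace only.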

  11. Pipeline Stages • Fetch: 4 words, which can hold more than 4 instructions • Parceling: break the fetched words into individual instructions • Renaming: GPRs • Steering: place instructions into PE FIFOs according to their accumulators
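Parceling and steering can be sketched as two small functions. Everything about the encoding is assumed for illustration: 32-bit words, 16-bit parcels, high parcel first, and a hypothetical "top bit set means two parcels" length marker. The steering rule (an instruction follows the PE that holds the accumulator it reads; one that reads no accumulator starts a strand on the next PE in round-robin order) is likewise a simplification, not the paper's exact policy, and GPR renaming is omitted.

```python
def parcel_instructions(words):
    """Split 32-bit fetch words into 16-bit parcels and group them into instructions.

    Hypothetical encoding: if the top bit of an instruction's first parcel is set,
    the instruction is two parcels long; otherwise it is one parcel.
    """
    parcels = []
    for w in words:
        parcels += [(w >> 16) & 0xFFFF, w & 0xFFFF]   # high parcel first (assumed order)
    instrs, k = [], 0
    while k < len(parcels):
        size = 2 if parcels[k] & 0x8000 else 1
        # A two-parcel instruction straddling the end of the fetch group would be
        # truncated here; a real front end would carry the parcel over.
        instrs.append(tuple(parcels[k:k + size]))
        k += size
    return instrs


def steer(instrs, num_pes=8):
    """Assign each instruction to a PE FIFO by the accumulator it uses.

    instrs: list of (reads_acc, writes_acc), each a logical accumulator number or None.
    Round-robin selection for new strands is an assumption of this sketch.
    """
    acc_to_pe = {}                           # logical accumulator -> PE currently holding it
    fifos = [[] for _ in range(num_pes)]
    next_pe = 0
    for n, (reads, writes) in enumerate(instrs):
        if reads is not None and reads in acc_to_pe:
            pe = acc_to_pe[reads]            # continue the strand on the same PE
        else:
            pe = next_pe                     # no accumulator input: start a new strand
            next_pe = (next_pe + 1) % num_pes
        fifos[pe].append(n)
        if writes is not None:
            acc_to_pe[writes] = pe           # later readers of this accumulator follow it
    return fifos


fetched = [0x00010002, 0x00030004, 0x80050006, 0x00070008]
print(len(parcel_instructions(fetched)))     # 7 instructions from 4 words

# (reads_acc, writes_acc) for six instructions
print(steer([(None, 0), (0, 0), (None, 1), (1, 1), (0, 0), (None, 0)], num_pes=4))
# [[0, 1, 4], [2, 3], [5], []]
```

The last instruction writes A0 without reading it, so it begins a new strand on a fresh PE even though A0 was previously mapped to PE 0.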

  12. Figure 5. ILDP Processor Block Diagram

  13. Some Concepts • PE: Processing Element • IR: Issue Register, a single reservation station • ICN: Interconnection Network
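The effect of a FIFO plus a single issue register per PE can be seen in a small cycle-level sketch: each PE may issue only the head of its FIFO, yet PEs proceed independently of one another. The one-cycle execution latency and the two-cycle inter-PE communication latency are assumed parameters of this sketch, not values taken from the slides, and the function name `simulate_issue` is hypothetical.

```python
def simulate_issue(fifos, deps, comm_latency=2, latency=1, max_cycles=10_000):
    """Cycle-level sketch of issue from per-PE FIFOs.

    fifos[pe]: instruction ids in program order; deps[i]: ids whose results i reads.
    Only the FIFO head can issue (in order within a PE, like a single issue register),
    but different PEs issue independently, so the program as a whole runs out of order.
    A result reaches another PE comm_latency cycles after it is produced (assumed).
    """
    pe_of = {i: pe for pe, q in enumerate(fifos) for i in q}
    done = {}                  # id -> cycle its result is available on its own PE
    issue = {}                 # id -> cycle it issued
    heads = [0] * len(fifos)
    remaining = sum(len(q) for q in fifos)

    def ready(d, pe, cycle):
        if d not in done:
            return False
        extra = 0 if pe_of[d] == pe else comm_latency
        return done[d] + extra <= cycle

    for cycle in range(max_cycles):
        for pe, q in enumerate(fifos):
            if heads[pe] < len(q):
                i = q[heads[pe]]                  # only the FIFO head is examined
                if all(ready(d, pe, cycle) for d in deps.get(i, ())):
                    issue[i] = cycle
                    done[i] = cycle + latency
                    heads[pe] += 1
                    remaining -= 1
        if remaining == 0:
            return issue
    raise RuntimeError("did not finish; check that deps follow program order")


fifos = [[0, 1, 4], [2, 3], [5], []]
deps = {1: [0], 3: [2], 4: [1, 3]}
print(simulate_issue(fifos, deps))
# {0: 0, 2: 0, 5: 0, 1: 1, 3: 1, 4: 4}
```

Instruction 5, last in program order, issues in cycle 0, while instruction 4 waits until cycle 4 for a value produced on another PE: in-order issue within each PE still yields out-of-order execution overall.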

  14. Figure 6. Microarchitecture

  15. Table 1. Complexity Comparison. Note: the ILDP figures are for a single PE.

  16. Table 2. Benchmark Program Properties

  17. Evaluation 1 • Figure 7. Types of register values • Figure 8. Average strand length

  18. Evaluation 2 • Figure 9. Strand end • Figure 10. Instruction size

  19. Evaluation 3 • Figure 11. Cumulative strand reuse • Figure 12. IPC

  20. Evaluation 4 • Figure 13. Global register rename map read/write bandwidth

  21. Table 3. Simulator Configurations

  22. Discussion
