
CS 7960-4 Lecture 14

CS 7960-4 Lecture 14. Delaying Physical Register Allocation Through Virtual-Physical Registers T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals Proceedings of MICRO-32 November 1999. Register File Design Considerations. Number of ports = 3 x issue width



  1. CS 7960-4 Lecture 14 Delaying Physical Register Allocation Through Virtual-Physical Registers T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals Proceedings of MICRO-32 November 1999

  2. Register File Design Considerations • Number of ports = 3 x issue width • Number of entries = window size + logical-regs • Multiple threads → more registers (more power) • Wire delays, clock speeds → multiple cycle access • Pipelining a RAM structure is hard
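The two sizing rules on this slide can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, the reading of "3 x" as two source reads plus one result write per issued instruction is an assumption, and so is the idea that each extra thread adds its own set of logical registers.

```python
def regfile_ports(issue_width: int) -> int:
    """Ports = 3 x issue width (slide formula).

    Assumption: the factor 3 stands for two reads and one write
    per issued instruction.
    """
    return 3 * issue_width


def regfile_entries(window_size: int, logical_regs: int, threads: int = 1) -> int:
    """Entries = window size + logical-regs (slide formula).

    Assumption: with multiple threads, each thread needs its own
    architected (logical) register state.
    """
    return window_size + threads * logical_regs


if __name__ == "__main__":
    print(regfile_ports(4))                       # 12 ports for a 4-wide machine
    print(regfile_entries(128, 32))               # 160 entries, single thread
    print(regfile_entries(128, 32, threads=2))    # 192 entries, two threads
```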

  3. Register Allocation • Pipeline stages: Fetch, Rename, Issue, Complete, Wake-up, Commit • Timeline (cycles 4, 15, 30, 50, 80): assign pr7 at cycle 4, write pr7 at cycle 30, read pr7 at cycle 50, release pr7 at cycle 80 • No result: 26 cycles • Useful time: 20 cycles • No activity: 30 cycles
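The three intervals on this slide follow directly from the cycle numbers. A minimal sketch of that arithmetic is given here; the function name is hypothetical, and treating the read at cycle 50 as the last read of pr7 is an assumption.

```python
def lifetime_breakdown(assign: int, write: int, last_read: int, release: int) -> dict:
    """Split a physical register's allocated lifetime into three phases."""
    no_result = write - assign        # allocated, but holds no value yet
    useful = last_read - write        # holds a value that is still being read
    no_activity = release - last_read # value is dead, register not yet freed
    total = release - assign
    return {
        "no_result": no_result,
        "useful": useful,
        "no_activity": no_activity,
        "total": total,
        "useful_fraction": useful / total,
    }


if __name__ == "__main__":
    # Cycle numbers from the slide: assign 4, write 30, read 50, release 80.
    print(lifetime_breakdown(4, 30, 50, 80))
```

With the slide's numbers the register is held for 76 cycles, of which only 20 are useful; delaying allocation until the result is produced targets the first 26.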

  4. Two-Level Register File (figure: base regfile vs. two-level regfile)

  5. Virtual-Physical Registers • Register map table: lr3 → vr7 • Instruction tags: vr7 (destination), vr7 (source) • Virtual map table: vr7 → (empty)

  6. Virtual-Physical Registers • Register map table: lr3 → vr7 • Instruction tags: vr7 (destination), vr7 (source) • Instruction issues • Virtual map table: vr7 → (empty)

  7. Virtual-Physical Registers • Register map table: lr3 → vr7, pr9 • Instruction completes, is assigned pr9 (vr7, pr9) • Dependent source: vr7 (pr9) • Virtual map table: vr7 → pr9

  8. Virtual-Physical Registers • Register map table: lr3 → vr7, pr9 • Sources: vr7 (pr9), pr9 • Virtual map table: vr7 → pr9
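The two-table scheme on slides 5 to 8 can be sketched as follows. This is a simplified model under stated assumptions, not the paper's hardware design: the class and method names are hypothetical, the physical register is allocated at completion as the slides show, and freeing of registers at commit is left out.

```python
class VPRenamer:
    """Toy model: logical -> virtual at rename, virtual -> physical at completion."""

    def __init__(self, num_physical: int):
        self.free_physical = list(range(num_physical))
        self.logical_to_virtual = {}   # the register map table
        self.virtual_to_physical = {}  # the virtual map table
        self.next_virtual = 0

    def rename_dest(self, logical: str) -> int:
        """Rename stage: hand out a virtual tag only, no physical register."""
        vr = self.next_virtual
        self.next_virtual += 1
        self.logical_to_virtual[logical] = vr
        self.virtual_to_physical[vr] = None
        return vr

    def complete(self, vr: int):
        """Completion: try to bind a physical register to the virtual tag.

        Returns the physical register, or None if none is free
        (the instruction would then have to re-execute later).
        """
        if not self.free_physical:
            return None
        pr = self.free_physical.pop(0)
        self.virtual_to_physical[vr] = pr
        return pr

    def lookup(self, logical: str):
        """Return (virtual tag, physical register or None) for a logical register."""
        vr = self.logical_to_virtual[logical]
        return vr, self.virtual_to_physical[vr]


if __name__ == "__main__":
    r = VPRenamer(num_physical=1)
    vr = r.rename_dest("lr3")
    print(r.lookup("lr3"))   # virtual tag assigned, physical still None
    r.complete(vr)
    print(r.lookup("lr3"))   # both tags known after completion
```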

  9. Lack of Registers • Finishes, has no register, keeps re-executing (figure: in-flight window; legend: has physical register / has no physical register)

  10. Lack of Registers • Cycle t: finishes, has no register, keeps re-executing • Cycle t+1: an instr commits, the waiting instr gets reg (figure: in-flight window; legend: has physical register / has no physical register)

  11. Deadlock • Finishes, has no register, keeps re-executing • Who will generate a register for this instr? • Solution: Reserve a register for the oldest instruction (figure: in-flight window; legend: has physical register / has no physical register)

  12. Sequential Execution • Oldest instr has reserved register (figure: in-flight window; legend: has physical register / has no physical register)

  13. Sequential Execution • Instr commits, releases another reg, that is then reserved for the new oldest instr (figure: in-flight window; legend: has physical register / has no physical register)

  14. Sequential Execution • Behaves like an in-order processor • Instr commits, releases another reg, that is then reserved for the new oldest instr (figure: in-flight window; legend: has physical register / has no physical register)

  15. Reserving All Registers • Allows quick progress, but almost behaves like a conventional processor (figure legend: has physical register / has no physical register)
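The reservation idea from the deadlock and sequential-execution slides can be sketched as an allocation check. This is an illustrative policy only: the function name and the `reserved` parameter are hypothetical, and the slides do not say how the hardware tracks the reservation.

```python
def can_allocate(free_regs: int, is_oldest: bool, reserved: int = 1) -> bool:
    """Decide whether a finishing instruction may take a physical register.

    Assumption: `reserved` free registers are held back for the oldest
    in-flight instruction, so it can always complete and commit.
    """
    if is_oldest:
        return free_regs > 0
    # Younger instructions may only use registers beyond the reserved ones.
    return free_regs > reserved


if __name__ == "__main__":
    print(can_allocate(free_regs=1, is_oldest=True))    # True: uses the reserved reg
    print(can_allocate(free_regs=1, is_oldest=False))   # False: must re-execute later
    print(can_allocate(free_regs=2, is_oldest=False))   # True: one reg stays reserved
```

When registers are scarce, only the oldest instruction makes progress with `reserved=1`, which is the in-order behavior the slides describe.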

  16. Register Stealing • Instr finishes; steals register from the youngest finished instr • No reservation of regs • The younger instrs may have to execute twice • Note the pre-execution effect (figure: in-flight window; legend: has physical register / has no physical register)
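A minimal sketch of the stealing step follows. The function name and the data layout are hypothetical; restricting victims to instructions younger than the thief is an assumption drawn from the remark that the younger instrs may have to execute twice.

```python
def steal_from_youngest(thief_seq: int, holders: dict):
    """Take a register from the youngest finished instruction that holds one.

    holders maps a sequence number (larger = younger) to the physical
    register held by that finished instruction. Returns (victim_seq, reg),
    or None if no younger holder exists.
    """
    younger = [seq for seq in holders if seq > thief_seq]
    if not younger:
        return None
    victim = max(younger)        # youngest finished instruction
    reg = holders.pop(victim)    # victim loses its register, must re-execute
    holders[thief_seq] = reg     # thief now owns the register
    return victim, reg


if __name__ == "__main__":
    holders = {5: "pr1", 9: "pr2"}
    print(steal_from_youngest(3, holders))  # (9, 'pr2')
    print(holders)                          # {5: 'pr1', 3: 'pr2'}
```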

  17. Implementation • Finished instructions have to remain in issueq in case they have to re-execute • Issued dependents of the victim instruction need not re-execute • The VP tag of the victim has to be broadcast so that unissued dependents can reset the ready bit • Can benefit from an instruction reuse buffer? • Pre-execution without explicitly attempting it
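The broadcast described in the third bullet can be sketched as a pass over the issue queue. The entry layout (dictionaries with `issued`, `sources`, `tag`, `ready`) is a hypothetical stand-in for the hardware structures, which the slides do not detail.

```python
def broadcast_victim_tag(victim_tag: str, issue_queue: list) -> int:
    """Reset the ready bit of unissued dependents of a stolen-from instruction.

    Returns the number of source operands whose ready bit was cleared.
    """
    reset = 0
    for entry in issue_queue:
        if entry["issued"]:
            # Issued dependents already read the value; they need not re-execute.
            continue
        for src in entry["sources"]:
            if src["tag"] == victim_tag and src["ready"]:
                src["ready"] = False
                reset += 1
    return reset


if __name__ == "__main__":
    queue = [
        {"issued": True, "sources": [{"tag": "vr7", "ready": True}]},
        {"issued": False, "sources": [{"tag": "vr7", "ready": True},
                                      {"tag": "vr2", "ready": True}]},
    ]
    print(broadcast_victim_tag("vr7", queue))  # 1
```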

  18. Results • Improves the base case by 5% (Int programs) and 24% (FP programs) • FP programs have more ILP, better branch prediction, and are more limited by cache misses • Re-executions: 10% (int), 58% (fp) • Steals: 5% (int), 12% (fp) • For the same IPC, VP registers employ 25% fewer registers

  19. Next Week’s Paper • “Pipeline Gating: Speculation Control for Energy Reduction”, S. Manne, A. Klauser, D. Grunwald, Proceedings of ISCA-25, June 1998

  20. Harmonic and Arithmetic Means • HM of IPC = N / (1/IPCa + 1/IPCb + 1/IPCc) = N / (CPIa + CPIb + CPIc) = 1 / AM of CPI • Weight each benchmark as if they all execute one instruction • If you want to assume each benchmark executes for the same time, HM of CPI or AM of IPC is appropriate
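The identity HM of IPC = 1 / AM of CPI can be checked numerically. The IPC values below are made up for illustration and are not results from the paper.

```python
def harmonic_mean(values):
    """N divided by the sum of reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def arithmetic_mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    ipcs = [2.0, 1.0, 0.5]               # hypothetical benchmarks a, b, c
    cpis = [1.0 / ipc for ipc in ipcs]   # CPI is the reciprocal of IPC

    hm_ipc = harmonic_mean(ipcs)
    am_cpi = arithmetic_mean(cpis)

    # Both expressions give the same number, as the slide states.
    print(hm_ipc, 1.0 / am_cpi)
    # AM of IPC is the other summary mentioned on the slide.
    print(arithmetic_mean(ipcs))
```

The harmonic mean of IPC comes out lower than the arithmetic mean because the slow benchmark contributes the most cycles per instruction.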

