Delaying Physical Register Allocation in Computer Architecture

CS 7960-4 Lecture 14
Delaying Physical Register Allocation Through Virtual-Physical Registers
T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals
Proceedings of MICRO-32, November 1999

Register File Design Considerations
• Number of ports = 3 x issue width
• Number of entries = window size + logical-regs
• Multiple threads → more registers (more power)
• Wire delays, clock speeds → multiple cycle access
• Pipelining a RAM structure is hard
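As a rough illustration of the two sizing rules above, the following sketch plugs in example numbers. The function name and the sample values (4-wide issue, 128-entry window, 32 logical registers) are assumptions chosen for illustration, and reading "3 x issue width" as two read ports plus one write port per issued instruction is also an assumption, not something the slide states.

```python
def regfile_size(issue_width: int, window_size: int, logical_regs: int) -> dict:
    """Rule-of-thumb register file dimensions, following the slide's two formulas."""
    return {
        # Assumed reading: each issued instruction needs 2 read ports + 1 write port.
        "ports": 3 * issue_width,
        # One entry per in-flight instruction plus one per logical register.
        "entries": window_size + logical_regs,
    }


# Hypothetical machine parameters, for illustration only.
example = regfile_size(issue_width=4, window_size=128, logical_regs=32)
print(example)
```

Doubling the issue width doubles the port count under this rule, which is one reason wide machines find a monolithic register file expensive.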

Register Allocation
Pipeline stages: Fetch, Rename, Issue, Complete, Wake-up, Commit
Timeline: assign pr7 (cycle 4), cycle 15, write pr7 (cycle 30), read pr7 (cycle 50), release pr7 (cycle 80)
• No result: 26 cyc (assign to write)
• Useful time: 20 cyc (write to read)
• No activity: 30 cyc (read to release)
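The three intervals on this slide follow directly from the four cycle numbers attached to pr7. A minimal sketch of that arithmetic is below; the dictionary keys and function name are illustrative, and the cycle values are the ones shown on the slide.

```python
# Cycle numbers for pr7 as given on the slide.
events = {"assign": 4, "write": 30, "read": 50, "release": 80}


def lifetime_breakdown(ev):
    """Split a physical register's lifetime into the slide's three phases."""
    no_result = ev["write"] - ev["assign"]    # allocated, but holds no value yet
    useful = ev["read"] - ev["write"]         # holds a value that is still read
    no_activity = ev["release"] - ev["read"]  # value is dead, register not yet freed
    total = ev["release"] - ev["assign"]
    return no_result, useful, no_activity, total


no_result, useful, no_activity, total = lifetime_breakdown(events)
useful_fraction = useful / total
print(no_result, useful, no_activity, total, round(useful_fraction, 2))
```

In this example the register is occupied for 76 cycles but does useful work for only 20 of them, which is the waste that delaying allocation targets.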

Two-Level Register File
[Figure: base regfile compared with two-level regfile]

Virtual-Physical Registers
Register map table: lr3 → vr7
Virtual map table: vr7 → (no physical register yet)

Virtual-Physical Registers
Instruction issues
Register map table: lr3 → vr7
Virtual map table: vr7 → (no physical register yet)

Virtual-Physical Registers
Instruction completes and is assigned pr9
Register map table: lr3 → vr7, pr9
Virtual map table: vr7 → pr9
Source operand tags: vr7 (pr9)

Virtual-Physical Registers
Register map table: lr3 → vr7, pr9
Virtual map table: vr7 → pr9
Source operand tags: vr7 (pr9)
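The sequence of slides above can be sketched as two small tables: one mapping a logical register to a virtual-physical tag at rename, and one binding that tag to a physical register only when the instruction completes. This is a minimal sketch, not the paper's hardware design; the class name, method names, and constructor parameters are hypothetical, and the numbers (lr3, vr7, pr9) are chosen to mirror the slides.

```python
class VPRenamer:
    """Toy model of virtual-physical renaming (names are illustrative)."""

    def __init__(self, free_physical, first_vp=0):
        self.free_physical = list(free_physical)  # unallocated physical registers
        self.next_vp = first_vp                   # next virtual-physical tag to hand out
        self.register_map = {}                    # logical reg -> virtual-physical tag
        self.virtual_map = {}                     # virtual-physical tag -> physical reg or None

    def rename_dest(self, logical):
        """At rename: assign only a virtual-physical tag, no storage."""
        vp = self.next_vp
        self.next_vp += 1
        self.register_map[logical] = vp
        self.virtual_map[vp] = None
        return vp

    def complete(self, vp):
        """At completion: bind the tag to a physical register if one is free.

        Returns None when no register is free; the instruction would then
        have to re-execute later.
        """
        if not self.free_physical:
            return None
        pr = self.free_physical.pop(0)
        self.virtual_map[vp] = pr
        return pr

    def lookup(self, logical):
        vp = self.register_map[logical]
        return vp, self.virtual_map[vp]


renamer = VPRenamer(free_physical=[9, 10], first_vp=7)
vp = renamer.rename_dest(3)          # lr3 -> vr7
before = renamer.lookup(3)           # (7, None): no physical register yet
pr = renamer.complete(vp)            # instruction completes, gets pr9
after = renamer.lookup(3)            # (7, 9)
print(before, after)
```

Because dependents are tracked by the virtual-physical tag, they can be renamed and queued before any physical storage exists for the value.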

Lack of Registers
• Cycle t: an instruction finishes, has no register, and keeps re-executing
• Cycle t+1: an instruction commits and the waiting instruction gets a reg
[Figure: in-flight window; legend: has physical register, has no physical register]

Deadlock
• An instruction finishes, has no register, and keeps re-executing
• Who will generate a register for this instr?
• Solution: reserve a register for the oldest instruction
[Figure: in-flight window; legend: has physical register, has no physical register]
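One way to read the reservation rule is as an allocation check: a younger instruction may take a free register only if enough registers remain for the oldest instructions that are still waiting. The sketch below is an assumption about how such a check could look, not the paper's mechanism; the function name, the boolean-list window model, and the `reserved` parameter are all hypothetical.

```python
def may_take_register(window, idx, free_regs, reserved=1):
    """Decide whether instruction `idx` may allocate a physical register.

    window:    list of bools, oldest instruction first; True means the
               instruction already holds a physical register.
    free_regs: number of currently free physical registers.
    reserved:  how many of the oldest instructions have a register set aside.
    """
    if window[idx]:
        return True  # already holds a register, nothing to allocate
    # Reserved (oldest) instructions, other than idx itself, still waiting.
    waiting_reserved = sum(
        1 for i, has_reg in enumerate(window[:reserved]) if not has_reg and i != idx
    )
    # Allocation must leave one register for each waiting reserved instruction.
    return free_regs > waiting_reserved


window = [False, False, False]       # nobody holds a register yet
oldest_ok = may_take_register(window, 0, free_regs=1)
young_ok = may_take_register(window, 2, free_regs=1)
print(oldest_ok, young_ok)
```

With one free register and the oldest instruction still waiting, the youngest instruction is refused, so the oldest can always complete and commit.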

Sequential Execution
• The oldest instr has the reserved register
• The instr commits and releases another reg, which is then reserved for the new oldest instr
• Behaves like an in-order processor
[Figure: in-flight window; legend: has physical register, has no physical register]

Reserving All Registers
• Allows quick progress, but behaves almost like a conventional processor
[Figure legend: has physical register, has no physical register]

Register Stealing
• An instr finishes and steals a register from the youngest finished instr
• No reservation of regs
• The younger instrs may have to execute twice
• Note the pre-execution effect
[Figure: in-flight window; legend: has physical register, has no physical register]
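The stealing policy above can be sketched as follows. This is a minimal model under stated assumptions: the `Instr` record and `finish` function are hypothetical names, and restricting victims to instructions younger than the stealer is an assumption based on the bullet about younger instructions executing twice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    age: int                      # smaller value = older instruction
    finished: bool = False
    preg: Optional[int] = None    # physical register held, if any


def finish(instr: Instr, window: List[Instr], free_regs: List[int]) -> Optional[Instr]:
    """Handle an instruction finishing execution; return the victim, if any."""
    instr.finished = True
    if free_regs:
        instr.preg = free_regs.pop()
        return None
    # Assumed policy: only younger, finished instructions holding a register can be victims.
    candidates = [
        i for i in window
        if i.finished and i.preg is not None and i.age > instr.age
    ]
    if not candidates:
        instr.finished = False    # no register available: must re-execute later
        return None
    victim = max(candidates, key=lambda i: i.age)  # youngest finished instruction
    instr.preg, victim.preg = victim.preg, None
    victim.finished = False       # victim loses its result and must execute again
    return victim


window = [Instr(0), Instr(1, True, 5), Instr(2, True, 6)]
victim = finish(window[0], window, free_regs=[])
print(victim.age, window[0].preg, window[2].finished)
```

The victim's first execution is not entirely wasted: any loads it issued may already have warmed the cache, which is the pre-execution effect the slide mentions.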

Implementation
• Finished instructions have to remain in issueq in case they have to re-execute
• Issued dependents of the victim instruction need not re-execute
• The VP tag of the victim has to be broadcast so that unissued dependents can reset the ready bit
• Can benefit from an instruction reuse buffer?
• Pre-execution without explicitly attempting it

Results
• Improves the base case by 5% (Int programs) and 24% (FP programs)
• FP programs have more ILP, better branch prediction, and are more limited by cache misses
• Re-executions: 10% (int), 58% (fp)
• Steals: 5% (int), 12% (fp)
• For the same IPC, VP registers employ 25% fewer registers

Next Week’s Paper
• “Pipeline Gating: Speculation Control for Energy Reduction”, S. Manne, A. Klauser, D. Grunwald, Proceedings of ISCA-25, June 1998

Harmonic and Arithmetic Means
• HM of IPC = N / (1/IPCa + 1/IPCb + 1/IPCc)
  = N / (CPIa + CPIb + CPIc)
  = 1 / AM of CPI
• Weight each benchmark as if they all execute one instruction
• If you want to assume each benchmark executes for the same time, HM of CPI or AM of IPC is appropriate
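The identities above can be checked numerically. The IPC values below are made up for illustration and do not come from the paper; the sketch uses only the Python standard library.

```python
from statistics import harmonic_mean, mean

# Hypothetical IPC values for three benchmarks a, b, c.
ipc = [2.0, 1.0, 0.5]
cpi = [1 / x for x in ipc]

hm_ipc = harmonic_mean(ipc)   # N / (1/IPCa + 1/IPCb + 1/IPCc)
am_cpi = mean(cpi)            # (CPIa + CPIb + CPIc) / N
am_ipc = mean(ipc)
hm_cpi = harmonic_mean(cpi)

# Equal instruction counts: total instructions / total cycles equals HM of IPC.
instrs_each = 1
total_cycles = sum(instrs_each * c for c in cpi)
ipc_equal_instrs = (instrs_each * len(ipc)) / total_cycles

print(hm_ipc, 1 / am_cpi, ipc_equal_instrs)
print(am_ipc, 1 / hm_cpi)
```

The first printed line shows three equal values (HM of IPC, 1 / AM of CPI, and the aggregate IPC under equal instruction counts); the second shows that AM of IPC and 1 / HM of CPI agree, which is the equal-time weighting.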
