Reducing the Complexity of the Register File in Dynamic Superscalar Processors

Nathir Rawashdeh, University of Massachusetts, Amherst. Low Power Architecture, Professor Moritz.


Presentation Transcript


  1. Reducing the Complexity of the Register File in Dynamic Superscalar Processors. Nathir Rawashdeh, University of Massachusetts, Amherst. Low Power Architecture, Professor Moritz. Note: This presentation is, to a large extent, a reproduction of slides created by the School of Electrical Engineering at Korea University. I have altered them and added new slides to better suit my audience. Nathir Rawashdeh (3 November 2003). Paper: Rajeev Balasubramonian, Sandhya Dwarkadas, and David H. Albonesi. In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 237–248, MICRO 2001.

  2. Contents • Motivation • Reduce register file size -> Two-Level Register File (1st Technique) • Reduce port complexity -> Banked Organization (2nd Technique) • Evaluation • Two-Level Register File Evaluation • Banked Register File Evaluation • Combining the Two Techniques

  3. Motivation • Modern high-performance processors use an out-of-order superscalar core to dynamically extract instruction-level parallelism (ILP) from running applications. • The core examines a large window of in-flight instructions to find and issue multiple ready, independent instructions every cycle. • A larger instruction window: • Achieves better ILP • Requires a larger register file, issue queue, and reorder buffer. • A large multi-ported register file can potentially compromise clock cycle time in future wire-limited technologies. • Two methods proposed in the paper: • Two-Level Register File Organization to reduce register file size requirements. • Banked Organization to reduce port complexity.

  4. Motivation • Conventional Register File Organization • Logical registers are renamed to physical registers • At 1 and 2: lr5 is renamed to pr18 • The branch at 3 is predicted not taken -> pr18 must be kept in case of a misprediction. lr5 at 5 must be allocated a new register, pr27 • pr18 can only be released to the free list after 5 commits. Then lr5 at 5 will be remapped to pr27
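
The renaming behaviour on this slide can be sketched in a few lines of Python. This is only an illustrative toy model, not the hardware described in the paper: the class name `RenameTable`, its methods, and the register counts are all hypothetical, and the register numbers it produces are arbitrary rather than the pr18/pr27 of the slide's figure.

```python
from collections import deque


class RenameTable:
    """Toy rename map (hypothetical): logical register -> physical register."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially maps to physical register i.
        self.map = {lr: lr for lr in range(num_logical)}
        self.free_list = deque(range(num_logical, num_physical))

    def rename_dest(self, lr):
        """Allocate a new physical register for a write to logical register lr.

        Returns (new_pr, old_pr). old_pr stays allocated until the
        instruction performing this write commits.
        """
        old_pr = self.map[lr]
        new_pr = self.free_list.popleft()
        self.map[lr] = new_pr
        return new_pr, old_pr

    def commit(self, old_pr):
        """When the overwriting instruction commits, release the old mapping."""
        self.free_list.append(old_pr)


table = RenameTable(num_logical=8, num_physical=12)
first_pr, _ = table.rename_dest(5)            # first write to lr5
second_pr, to_release = table.rename_dest(5)  # later write to lr5
# The first physical register is not free yet; it is released only on commit.
table.commit(to_release)
```

The point of the sketch is the gap between `rename_dest` and `commit`: during that interval the old physical register is occupied even if nothing will read it again, which is the waste the two-level organization targets.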

  5. Two-Level Register File (1st Technique) • Level One (L1) Register File: Holds register values that have potential readers. • Level Two (L2) Register File: Keeps the other register values, which are waiting to be released after their instructions commit. • Effects: • Reduced register file access time, because only a smaller portion (L1) of the register file is on the critical path. • More energy is needed to copy register contents between L1 and L2.

  6. Two-Level Register File • Microarchitectural Changes • Assumption: 8-way issue processor • During rename, registers are mapped only to L1 physical registers; L2 registers are hidden from the rename process.

  7. Two-Level Register File • Usage Table • Monitors the usage statistics for each L1 physical register. • Maintaining Information • Pending consumer counter: keeps track of the number of pending consumers of that value. • Increment: during rename, an instruction that sources the register increments the counter. • Decrement: the same instruction decrements the counter when it issues, or when it is squashed after a mispredict. • Overwrite bit (single bit) • Set when the physical register is no longer the latest mapping for its logical register (the lr's mapping changed to a different pr). • Result-written bit (single bit) • Indicates whether a result has been written into the physical register. • Sequence number counter (sequence number 1) • For the branch immediately following the instruction that writes to this physical register. • Sequence number counter (sequence number 2) • For the branch immediately preceding the next instruction that writes to the same logical register. • Sequence number counter size: log2(ROB size) bits.
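
A minimal Python sketch of one usage-table entry may help make the bookkeeping concrete. The names (`UsageEntry`, `is_copy_candidate`, `sequence_counter_bits`) are hypothetical, and the copy condition shown (result written, mapping overwritten, no pending consumers) is an assumption inferred from the fields listed on the slide, not a rule quoted from it. The two branch sequence numbers are left out here for brevity.

```python
from dataclasses import dataclass


@dataclass
class UsageEntry:
    """Hypothetical model of the per-register state in the usage table."""
    pending_consumers: int = 0
    overwritten: bool = False
    result_written: bool = False

    def on_rename_source(self):
        # An instruction that sources this register was renamed.
        self.pending_consumers += 1

    def on_issue_or_squash(self):
        # That instruction issued, or was squashed after a mispredict.
        self.pending_consumers -= 1

    def on_remap(self):
        # The logical register now maps to a different physical register.
        self.overwritten = True

    def on_writeback(self):
        self.result_written = True

    def is_copy_candidate(self):
        # Assumed condition for moving the value from L1 to L2.
        return (self.result_written and self.overwritten
                and self.pending_consumers == 0)


def sequence_counter_bits(rob_size):
    """Bits needed to number rob_size entries, i.e. ceil(log2(rob_size))."""
    return (rob_size - 1).bit_length()


entry = UsageEntry()
entry.on_rename_source()    # one consumer is waiting
entry.on_writeback()
entry.on_remap()
waiting = entry.is_copy_candidate()    # False: a consumer is still pending
entry.on_issue_or_squash()
ready = entry.is_copy_candidate()      # True: nothing left to read the value
```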

  8. Two-Level Register File • Single L2 ID valid bit • Added to each ROB entry. • Indicates that the destination register ID in that entry corresponds to an L2 register.

  9. Two-Level Register File • Copy List • Keeps track of L1-L2 copies for recovery from a branch mispredict. • Maintaining Information for each L2 entry: • The L1 physical register name that had earlier contained the value. • The sequence number for the branch immediately following the instruction that writes to this physical register. • The sequence number for the branch immediately preceding the next instruction that writes to the same logical register. • The two stored branch sequence numbers indicate the live period of a physical register value: the period during which instructions sourcing this value are dispatched.
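
The live-period idea can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the names `CopyListEntry` and `entries_to_restore` are hypothetical, the interval is treated as inclusive at both ends, sequence-number wraparound is ignored, and the recovery rule (restore a value when the mispredicted branch falls inside its live period) is inferred from the slide rather than stated by it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyListEntry:
    """Hypothetical model of one copy-list entry for a value held in L2."""
    l1_reg: int            # L1 physical register that earlier held the value
    first_branch_seq: int  # branch right after the instruction writing the value
    last_branch_seq: int   # branch right before the next write to the same lr

    def live_at(self, branch_seq):
        # Assumption: inclusive bounds, no counter wraparound.
        return self.first_branch_seq <= branch_seq <= self.last_branch_seq


def entries_to_restore(copy_list, mispredicted_branch_seq):
    """Entries whose value may be sourced again after this mispredict."""
    return [e for e in copy_list if e.live_at(mispredicted_branch_seq)]


copy_list = [
    CopyListEntry(l1_reg=18, first_branch_seq=3, last_branch_seq=6),
    CopyListEntry(l1_reg=22, first_branch_seq=8, last_branch_seq=9),
]
needed = entries_to_restore(copy_list, mispredicted_branch_seq=4)
# Only the first entry is live at branch 4 under these assumptions.
```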

  10. Minimally-Ported Banked Register File (2nd Technique) • Motivation • The large number of register file ports (in a wide-issue processor) • Increases complexity -> more power consumption • Increases reg. file access time -> will limit clock speed in future wire-limited technologies. • The number of ports required on average is far lower than the actual port count (which supports the worst case). Reasons: • Many operands are read off the bypass network, not from the reg. file. • Many instructions have only a single register operand. • A number of instructions produce results that are not written to the register file (branches, stores, the effective address computation part of a load or store).

  11. Minimally-Ported Banked Register File

  12. Evaluation • Metrics used to evaluate the Two-Level Register File Organization and the Banked Register File Organization. • IPC: instructions per cycle • IPS: instructions per second = IPC / access time • If register file access time is assumed to be the bottleneck (so that it determines the cycle time), IPS is a better measure than IPC.
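
The relation IPS = IPC / access time can be worked through in Python. The helper name `ips` and both sets of numbers below are made up purely to illustrate the trade-off; they are not measurements from the paper.

```python
def ips(ipc, access_time_s):
    """Instructions per second, assuming cycle time equals access time."""
    return ipc / access_time_s


# Hypothetical designs: a large register file with slightly higher IPC but a
# slower access, and a small one with slightly lower IPC but a faster access.
large_rf = ips(ipc=1.65, access_time_s=1.0e-9)
small_rf = ips(ipc=1.60, access_time_s=0.8e-9)

# The smaller file wins on IPS despite losing on IPC.
small_is_faster = small_rf > large_rf
```

This is why the evaluation slides report IPS alongside IPC: a design that gives up a little IPC can still come out ahead once its shorter access time is taken into account.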

  13. Two-Level Register File Evaluation • IPC (single vs. two-level reg. file) • Gap between the two lines: the addition of the L2 frees up more L1 registers. • The two-level organization reaches an IPC of 1.67 with just 80 L1 registers (and 80 L2). • The single-level organization requires as many as 140 registers to attain an IPC of 1.65. • Out of 140 physical registers, only about 80 are active at any given time. The remaining 60 have no consumers unless there is a misprediction or exception, so they can be moved to the L2. • Figure annotations: 1.63, 1.65.

  14. Two-Level Register File Evaluation • IPS (single vs. two-level reg. file) • For the single-level register file, IPS peaks at a 100-entry register file. • For the two-level register file, the peak IPS is seen with a 60-entry L1. • The optimal IPS with the two-level organization is 17% better than the optimal IPS with a single-level register file (due to the better access time of the two-level design). • Figure annotations mark the maximum of each curve.

  15. Two-Level Register File Evaluation • IPS on individual applications. • The 100-L1 has the longest access time, but its IPS is not always worse than that of the 60-L1. In those cases, the 100-L1's IPC outweighs the access time penalty. • The two-level organization achieves the best IPS because it maintains a low access time and an IPC comparable (within 1%) to the single-level 100-L1 design.

  16. Banked Register File Evaluation • Reg. file with a single read port and a single write port per bank, with N banks. • Base case: "Single bank, 4rd, 4wr" is within 2% of the 24-ported case. • Third bar: penalty from read port conflicts, 1% IPC degradation. • Fourth bar: additional penalty from write port conflicts, 5% IPC degradation. • Port contention is worst for apps with high ILP.

  17. Banked Register File Evaluation • Reducing conflicts -> move from 4 to 8 banks • With 8 banks -> almost no IPC degradation due to read/write port conflicts (compared to 4 banks in the previous figure) • Still a 2% IPC loss relative to the 24-ported design
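
Why more banks mean fewer conflicts can be seen in a small Python sketch. The function `read_conflicts` is hypothetical, and the mapping of registers to banks by `register number % num_banks` is an assumption chosen for simplicity; the slides do not say how registers are assigned to banks. The sketch also treats two reads of the same register as two separate port requests.

```python
from collections import Counter


def read_conflicts(source_regs, num_banks, read_ports_per_bank=1):
    """Count the reads in one cycle that exceed a bank's read-port budget."""
    # Assumption: registers are interleaved across banks by register number.
    reads_per_bank = Counter(reg % num_banks for reg in source_regs)
    return sum(max(0, n - read_ports_per_bank)
               for n in reads_per_bank.values())


# Hypothetical set of registers read in the same cycle.
reads = [3, 7, 12, 16, 21]
with_4_banks = read_conflicts(reads, num_banks=4)  # 2 reads must wait
with_8_banks = read_conflicts(reads, num_banks=8)  # no conflicts
```

With four banks, registers 3 and 7 share a bank, as do 12 and 16, so two reads are delayed; with eight banks every read lands in a different bank.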

  18. Combining the Two Techniques

  19. Summary of Various Organizations • The two-level organization has slightly lower IPC than the single-level one, but 17% better IPS due to shorter L1 access times. It pays an energy penalty for copying between L1 and L2. • The banked (single port per bank) reg. file has a shorter access time (by more than a factor of 2) and needs 18 times less energy than a conventional organization. • The choice of technique depends on the design goals.