
Transparent Control Independence (TCI)

Transparent Control Independence (TCI). Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*. Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC. *Digital Enterprise Group, Intel Corporation, Hillsboro, OR.


Presentation Transcript


  1. Transparent Control Independence (TCI). Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*. Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC. *Digital Enterprise Group, Intel Corporation, Hillsboro, OR.

  2. Effect of branch mispredictions • Branch misprediction rate of 5%-10% is still a problem • Each misprediction squashes 100s of instructions • Reduces performance: limits window size • Increases power: useless speculative work ISCA 34

  3. Control independence basics

  7. Four steps for exploiting CI

  8. Four steps for exploiting CI • Identify reconv. point

  9. Four steps for exploiting CI • Identify reconv. point • Remove/Insert CD inst.

  10. Four steps for exploiting CI • Identify reconv. point • Remove/Insert CD inst. • Identify CIDD inst.

  11. Four steps for exploiting CI • Identify reconv. point • Remove/Insert CD inst. • Identify CIDD inst. • Repair CIDD inst.: fix data dependencies; re-execute CIDD inst.
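
As a rough illustration of the "Identify CIDD inst." step, the Python sketch below labels instructions after the reconvergence point as CIDD or CIDI by propagating a set of influenced registers. It is a simplified model under stated assumptions, not the hardware mechanism from the talk: the `Inst` record, the `classify` function, and the example registers are hypothetical, and only registers written by the predicted path's CD region seed the set (a real design would also have to account for registers that the other path could write).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    # Hypothetical instruction record: one destination, any number of sources.
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def classify(cd_region, ci_region):
    """Label each instruction as CD, CIDD, or CIDI.

    cd_region: control-dependent instructions between the branch and the
               reconvergence point (predicted path only, by assumption).
    ci_region: control-independent instructions after the reconvergence point.
    """
    # Registers whose values may change if the branch was mispredicted.
    influenced = {i.dest for i in cd_region if i.dest is not None}
    labels = {i.name: "CD" for i in cd_region}

    for inst in ci_region:
        if any(src in influenced for src in inst.srcs):
            # Reads an influenced value, so its result is influenced too.
            labels[inst.name] = "CIDD"
            if inst.dest is not None:
                influenced.add(inst.dest)
        else:
            # Independent of the branch; overwriting a register clears it.
            labels[inst.name] = "CIDI"
            influenced.discard(inst.dest)
    return labels


cd = [Inst("i1", "r1", ("r2",))]
ci = [
    Inst("i2", "r3", ("r1",)),       # reads r1 written in the CD region
    Inst("i3", "r4", ("r5",)),       # touches no influenced register
    Inst("i4", "r1", ("r5",)),       # overwrites r1 with an independent value
    Inst("i5", "r6", ("r1", "r3")),  # r1 is clean again, but r3 is influenced
]
labels = classify(cd, ci)
```

In this example `i4` removes `r1` from the influenced set, yet `i5` is still data dependent through `r3`, which is why the set has to be updated instruction by instruction rather than computed once.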

  12. Conventional CI misprediction recovery • Identify wrong CD inst. and CIDD inst. • Squash wrong CD instructions • Insert correct CD instructions in middle of the window: repair program order • Re-execute CIDD instructions: re-reference values from CIDI instructions (Figure labels: CD inst., CI inst., wrong CD instructions, CIDD instructions, CIDI-supplied source value, R)

  13. Conventional CI limitations • Program order between CD & CI inst.: fine-grain retirement using ROB requires reordering the correct CD inst. with the CI inst. • Dependence order between CIDD & CIDI inst.: re-executing CIDD instructions requires preserving referenced CIDI instructions • Goal of selective misprediction recovery: fully decouple CIDI instructions from CD & CIDD instructions

  14. TCI misprediction recovery • Repair program state using self-sufficient recovery program while relaxing program order • No need to identify wrong CD and CIDD instructions • Insert correct CD instructions like any new instructions • Insert duplicate CIDD instructions like any new instructions (Figure labels: CD inst., CI inst., recovery program, correct CD inst., duplicate CIDD inst., R)

  15. TCI misprediction recovery • Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR): completed instructions free their resources • In-order retirement is not possible when instructions are out of program order • Leverage branch checkpoint for correct CD instructions • Exploit coarse-grain checkpoint-based retirement to relax ordering constraints • Leverage checkpointed source values to mimic the effect of program order (Figure labels: branch checkpoint, Checkpoint 1, Checkpoint 2, CIDD instructions, recovery program, correct CD inst., duplicate CIDD inst., checkpoint CIDI-supplied source values, R)

  16. Transparent Control Independence • TCI repairs program state, not program order • TCI pipeline is recovery-free: transparent recovery by fetching additional instructions with checkpointed source values • TCI pipeline is free-flowing: leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently; completed instructions free their resources

  17. TCI microarchitecture • Add repair rename map • Add selective re-execution buffer (RXB)

  18. Predict the branch • Instructions execute and leave the pipeline when done

  19. Construct recovery program • Copy duplicate of CIDD inst. with their source values into RXB

  20. Insert correct CD instructions • Load branch checkpoint into repair rename map, then fetch correct CD inst.

  21. Repair & re-execute CIDD instructions • Inject duplicate CIDD inst. with their checkpointed source values
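
The recovery-program idea (copy CIDD instructions with their source values, then inject them again after the correct CD instructions) can be sketched in Python as follows. This is a toy value-level model under assumptions, not the TCI hardware: `RxbEntry`, `capture`, `replay`, and all register names and values are hypothetical, and a real RXB would hold instructions to be re-renamed rather than tuples interpreted in software.

```python
import operator
from dataclasses import dataclass

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class RxbEntry:
    op: str
    dest: str
    # Each source is ("reg", name) if it comes from a CD/CIDD producer and
    # must be re-read on replay, or ("val", value) if it was CIDI-supplied
    # and checkpointed when the instruction first executed.
    srcs: tuple


def capture(op, dest, srcs, regs, influenced):
    """Copy a CIDD instruction into the RXB with checkpointed CIDI values."""
    captured = tuple(
        ("reg", s) if s in influenced else ("val", regs[s]) for s in srcs
    )
    return RxbEntry(op, dest, captured)


def replay(rxb, repair_regs):
    """Re-execute the buffered CIDD instructions against repaired state."""
    for e in rxb:
        vals = [repair_regs[x] if kind == "reg" else x for kind, x in e.srcs]
        repair_regs[e.dest] = OPS[e.op](*vals)
    return repair_regs


# Speculative pass: r1 = 10 was produced by the (wrong) predicted CD path.
regs = {"r1": 10, "r2": 5, "r4": 3}
influenced = {"r1"}
rxb = []

# i2: r3 = r1 + r2   CIDD (reads r1); r2 is CIDI-supplied, so its value is saved.
rxb.append(capture("add", "r3", ("r1", "r2"), regs, influenced))
regs["r3"] = regs["r1"] + regs["r2"]
influenced.add("r3")

# i3: r2 = r4 * r4   CIDI: executes once, is not buffered, and overwrites r2.
regs["r2"] = regs["r4"] * regs["r4"]

# i4: r5 = r3 - r2   CIDD (reads r3); r2 now comes from i3 and is checkpointed.
rxb.append(capture("sub", "r5", ("r3", "r2"), regs, influenced))
regs["r5"] = regs["r3"] - regs["r2"]
influenced.add("r5")

# Misprediction detected: the correct CD path writes r1 = 20 instead.
repair_regs = {"r1": 20}
replay(rxb, repair_regs)
```

The replay of `i2` uses the saved value 5 for `r2` even though `i3` has since overwritten that register, which is the sense in which checkpointed source values stand in for program order; `i3` itself is never re-executed.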

  22. Merge repair & spec. rename maps • Copy corrected register mappings from repair map to spec. map
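
One way to picture the merge step is the small Python sketch below. It rests on an assumption that the slide does not spell out: a mapping is copied from the repair map only for logical registers whose latest producer in the window is a CD or CIDD instruction, while registers last written by a CIDI instruction keep their speculative mapping. The function name `merge_maps`, the `still_influenced` set, and the physical register names are hypothetical.

```python
def merge_maps(spec_map, repair_map, still_influenced):
    """Return the speculative rename map after merging in repaired mappings.

    spec_map / repair_map: logical register -> physical register.
    still_influenced: logical registers whose newest value came from a CD or
                      CIDD instruction (assumed to be tracked elsewhere).
    """
    merged = dict(spec_map)  # leave the caller's map untouched
    for reg in still_influenced:
        # The repair map started from the branch checkpoint, so it holds a
        # mapping for every logical register, including ones the correct
        # path never wrote.
        merged[reg] = repair_map[reg]
    return merged


spec_map = {"r1": "p7", "r2": "p9", "r3": "p8", "r5": "p10"}
repair_map = {"r1": "p20", "r2": "p2", "r3": "p21", "r5": "p22"}
merged = merge_maps(spec_map, repair_map, {"r1", "r3", "r5"})
```

Here `r2` keeps its speculative mapping `p9` because its newest value came from a CIDI instruction that did not need repair.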

  23. TCI implementation details • Identifying CIDD instructions: control-flow stack (CFS) detects nested reconv. points; influenced register set (IRS) and branch-sets • RXB reconstruction: CIDD inst. of multiple branches are co-mingled; a misprediction may require repairing RXB • Renaming partial programs: re-rename recovery program despite its CIDI gaps; merging repair/speculative rename maps

  24. Example: construct the RXB • B1 & B2 are branches • R1 & R2 are reconvergent points • Rectangular inst. are CIDD on B1 • Oval inst. are CIDD on B2

  25. Example: reconstructing the RXB • Objective of this example: inject recovery program for B2; reconstruct RXB for B1 • Rollback RXB tail, like complete squash • Initiate RXB pre-read pointer • Start fetching correct CD • Fetch correct CD: 11 and 12 • Meanwhile pre-read 16 to Temp Buffer • Dispatch 11 • Don't insert 11 into the RXB: CIDI w.r.t. B1 & B2 • Dispatch 12 • Insert 12 into the RXB: CIDD w.r.t. B1

  26. Example: reconstructing the RXB • Fetch correct CD: 13 and 14 • Meanwhile pre-read 18 to Temp Buffer • Dispatch 13 • Don't insert 13 into the RXB: CIDI w.r.t. B1 & B2 • Dispatch 14 • Insert 14 into the RXB: CIDD w.r.t. B1 • Reconvergence point detected • Correct CD complete

  27. Example: reconstructing the RXB • Begin renaming CIDD instructions from Temp Buffer • Meanwhile pre-read 20 into Temp Buffer • Don't dispatch 16: not CIDD w.r.t. B2 • Insert 16 into the RXB: CIDD w.r.t. B1 • Dispatch 18: CIDD w.r.t. B2 • Don't insert 18 into the RXB: not CIDD w.r.t. B1 • Dispatch 20: CIDD w.r.t. B2 • Insert 20 into the RXB: CIDD w.r.t. B1 • B2 recovery program injection complete • B1 recovery program is maintained and compressed

  28. Simulation methodology • Baseline: checkpoint-based superscalar processor; issue width: 4; perceptron branch predictor; register file: 256 registers; branch checkpoints: 16; load store queue: 512 entries; L1 I & L1 D: 64KB 4-way (hit: 1 cycle); L2: 2MB 8-way (hit: 10 cycles, miss: 200 cycles) • Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT • SimPoint: 10M inst. warm-up + 100M inst. simulated

  29. CIDD inst. re-renaming models • Seq CIDD (TCI): only CIDD inst. are re-renamed and re-executed • Seq CI [Akkary et al.] [Chou et al.] [Rotenberg et al.]: all CI inst. are re-renamed, but only CIDD inst. re-execute • Proxy [Cher et al.] [Gandhi et al.]: uses proxy move instructions to insulate CIDD inst. from source name changes; only proxies are re-renamed; both proxies and CIDD inst. re-execute by holding issue queue entries • All models have relaxed order through checkpoint-based substrate

  30. Results for 32- & 64-entry issue queues • TCI average %IPC improvement is 16% (16%) • TCI maximum %IPC improvement is 61% (64%) • Proxy average %IPC improvement is 6% (11%) • Proxy can degrade performance • Seq CI can degrade performance

  31. Varying the issue queue size • Proxy is bandwidth efficient, but resource inefficient • Seq CI is bandwidth inefficient, but resource efficient • TCI is both bandwidth and resource efficient

  32. Varying the RXB size • In Seq CI, the RXB limits the window size • TCI overcomes this problem by only buffering CIDD inst.

  33. Conclusion • Recover program state, not program order • Transparent branch misprediction recovery using fully decoupled recovery program • Resource efficient: all instructions execute, drain, and free resources quickly based on conventional speculation • Bandwidth efficient: TCI only re-sequences CIDD instructions

  34. Questions
