
Dynamic Removal of Redundant Computations

ICS'99, Rhodes (Greece), June 20-25, 1999. Carlos Molina, Antonio González and Jordi Tubella, Universitat Politècnica de Catalunya, Barcelona. {cmolina,antonio,jordit}@ac.upc.es


Presentation Transcript


  1. ICS'99, Rhodes (Greece), June 20-25, 1999. Dynamic Removal of Redundant Computations. Carlos Molina, Antonio González and Jordi Tubella, Universitat Politècnica de Catalunya, Barcelona. {cmolina,antonio,jordit}@ac.upc.es

  2. Motivation
     • Quasi-common subexpression: . . . R = S / T ; . . . X = S / U ; . . .
     • Quasi-invariant: for (i=0; i<N; i++) A[i] = B[i]+C[i];

  3. Outline • Instruction Reuse • Related Work • Redundant Computation Buffer • Performance Results • Conclusions

  4. Instruction Reuse [Figure: pipeline with Fetch, Decode & Rename, OOO Execution and Commit, plus a Reuse Mechanism; label: index]

  5. Related Work
     • Instruction Reuse
       - Value Cache for the Tree Machine (Harbison 82)
       - Result Cache (Richardson 92, Oberman et al. 95)
       - Reuse Buffer (Sodani and Sohi 97)
       - Physical Register Reuse (Jourdan et al. 98)
     • Trace Reuse
       - Basic blocks (Huang and Lilja 99)
       - General traces (González et al. 99)

  6. Related Work
     • Result Cache
       - Richardson 92, Oberman & Flynn 95
       - Special purpose (long latency operations)
       - Indexed by operand values
       - No reuse chaining
       - Can reuse dynamic instances of other static instructions
     • Reuse Buffer
       - Sodani & Sohi 97
       - General purpose
       - Indexed by PC
       - Reuse chaining
       - Only reuse dynamic instances of same static instructions
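The two indexing policies contrasted on this slide can be sketched in C. This is only an illustration under stated assumptions, not the hardware of the cited schemes: the table size `N_ENTRIES`, the direct-mapped organization, the hash in `rc_index`, the value-comparing reuse test, and every type and function name are invented for the example, and reuse chaining is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_ENTRIES 64u /* assumed table size, not taken from the slides */

/* One table entry; the same layout is used for both sketches. */
typedef struct {
    bool     valid;
    uint64_t pc;      /* meaningful only in the PC-indexed table */
    int      opcode;
    int64_t  opnd1, opnd2;
    int64_t  result;
} entry_t;

typedef struct { entry_t e[N_ENTRIES]; } table_t;

/* Hypothetical hash of opcode and operand values. */
static unsigned rc_index(int opcode, int64_t a, int64_t b)
{
    uint64_t h = (uint64_t)a * 31u + (uint64_t)b * 17u + (uint64_t)opcode;
    return (unsigned)(h % N_ENTRIES);
}

/* Result-cache style: indexed by operand values, so any static
   instruction with the same opcode and operands can hit. */
bool rc_lookup(const table_t *t, int opcode, int64_t a, int64_t b, int64_t *out)
{
    assert(t != NULL && out != NULL);
    const entry_t *e = &t->e[rc_index(opcode, a, b)];
    if (e->valid && e->opcode == opcode && e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void rc_insert(table_t *t, int opcode, int64_t a, int64_t b, int64_t result)
{
    assert(t != NULL);
    t->e[rc_index(opcode, a, b)] = (entry_t){ true, 0, opcode, a, b, result };
}

/* Reuse-buffer style: indexed by PC, so only earlier instances of the
   same static instruction can hit. The reuse test here compares stored
   operand values, which is one possible test among several. */
bool rb_lookup(const table_t *t, uint64_t pc, int64_t a, int64_t b, int64_t *out)
{
    assert(t != NULL && out != NULL);
    const entry_t *e = &t->e[pc % N_ENTRIES];
    if (e->valid && e->pc == pc && e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void rb_insert(table_t *t, uint64_t pc, int opcode, int64_t a, int64_t b, int64_t result)
{
    assert(t != NULL);
    t->e[pc % N_ENTRIES] = (entry_t){ true, pc, opcode, a, b, result };
}
```

With a division of 8 by 2 recorded at PC 10 in both tables, a later division of 8 by 2 at PC 20 hits in the operand-indexed table but misses in the PC-indexed one, which is the limitation the last bullet of each list describes.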

  7. Redundant Computation Buffer [Figure: RCB organization with Atable, Vtable and Mtable. Field labels: opcode, result/address, opnd1, opnd2, pointer; address tag, result. Outputs: Reuse Test, Reused Value, Reused Memory Value]

  8. RCB (Working Example)
     Code: while (cond) { r = s / t ; ...... x = s / u ; }
     I1: 8 / 2 = 4
     [Figure: Vtable and Atable; entry "div 8 2 4 nil"; label "10:"; value 4]

  9. RCB (Working Example)
     Code: while (cond) { r = s / t ; ...... x = s / u ; }
     I2: 8 / 2 = 4
     [Figure: Vtable and Atable; entry "div 8 2 4 nil"; labels "10:" and "20:"; value 4]

  10. RCB (Working Example)
     Code: while (cond) { r = s / t ; ...... x = s / u ; }
     I2: 8 / 2 = 4
     [Figure: Vtable and Atable; entry "div 8 2 4 nil"; labels "10:" and "20:"; value 4]

  11. RCB (Working Example)
     Code: while (cond) { r = s / t ; ...... x = s / u ; }
     I1: 9 / 3 = 3
     I2: 9 / 3 = 3
     [Figure: Vtable and Atable; entries "div 8 2 4 nil" and "div 9 3 3 nil"; labels "10:" and "20:"; values 4 and 3]
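The trace in the working example can be replayed with a small C model. This is a hedged sketch, not the RCB hardware: it does not reproduce the Vtable/Atable/Mtable organization or the pointer field shown in the figures, it uses a linear search where hardware would index by operand values, and the capacities `V_ENTRIES` and `A_ENTRIES` as well as all names (`rcb_sketch_t`, `rcb_div`, and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define V_ENTRIES 16 /* assumed number of stored computations */
#define A_ENTRIES 32 /* assumed number of PC-indexed slots */

typedef struct {
    bool valid;
    char op;                    /* '/' for div in this example */
    long opnd1, opnd2, result;
} ventry_t;

typedef struct {
    ventry_t v[V_ENTRIES];      /* computations, matched by operand values */
    int      by_pc[A_ENTRIES];  /* per-PC index into v, or -1 for nil */
    size_t   next;              /* round-robin replacement cursor */
} rcb_sketch_t;

void rcb_init(rcb_sketch_t *r)
{
    for (size_t i = 0; i < V_ENTRIES; i++) r->v[i].valid = false;
    for (size_t i = 0; i < A_ENTRIES; i++) r->by_pc[i] = -1;
    r->next = 0;
}

static bool matches(const ventry_t *e, long a, long b)
{
    return e->valid && e->op == '/' && e->opnd1 == a && e->opnd2 == b;
}

/* Divide a by b for the instruction at `pc`. Returns true if the result
   was reused from the table, false if the division had to be executed. */
bool rcb_div(rcb_sketch_t *r, unsigned pc, long a, long b, long *out)
{
    assert(r != NULL && out != NULL && b != 0);
    unsigned slot = pc % A_ENTRIES;

    /* 1) Same static instruction, same operands: quasi-invariant. */
    int i = r->by_pc[slot];
    if (i >= 0 && matches(&r->v[i], a, b)) {
        *out = r->v[i].result;
        return true;
    }

    /* 2) Another instruction already computed these operands:
          quasi-common subexpression. */
    for (i = 0; i < V_ENTRIES; i++) {
        if (matches(&r->v[i], a, b)) {
            r->by_pc[slot] = i;
            *out = r->v[i].result;
            return true;
        }
    }

    /* 3) Miss: execute the division and record it. */
    i = (int)r->next;
    r->next = (r->next + 1) % V_ENTRIES;
    r->v[i] = (ventry_t){ true, '/', a, b, a / b };
    r->by_pc[slot] = i;
    *out = r->v[i].result;
    return false;
}
```

Replaying the slides' sequence on a freshly initialized table (PC 10 and then PC 20 with operands 8 and 2, followed by both PCs with operands 9 and 3) executes the division at PC 10 each time and reuses the result at PC 20 each time, matching I1 and I2 in the example.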

  12. Enhancements to Other Schemes
     • Enhanced Result Cache [Figure: Atable and Mtable; index label: Operands]
     • Enhanced Reuse Buffer [Figure: Atable and Mtable; index label: PC]
     Field labels: opcode, result/address, opnd1, opnd2; address tag, result

  13. Timing Considerations
     Pipeline Stages: fetch, decode & rename, opnd read & dispatch, issue, execute, write back, commit
     - Latency of the Reuse Buffer: Atable lookup, reuse test
     - Latency of the RCB: 1st Atable lookup, 2nd Atable lookup, reuse test
     - Latency of the Result Cache: Atable lookup, reuse test

  14. Experimental Framework
     • Simulator: Alpha version of the SimpleScalar Toolset
     • Benchmarks: Spec95
     • Maximum Optimization Level: DEC C & F77 compilers with -non_shared -O5
     • Statistics: collected for 125 million instructions, skipping initializations

  15. Basic Reuse Statistics
     • We evaluate different schemes
       - Enhanced Result Cache (ERC)
       - Enhanced Reuse Buffer (ERB)
       - Redundant Computation Buffer (RCB)
     • We find the best configuration for each scheme
       - Number of entries
       - History depth
     • Best configurations will be evaluated
       - Percentage of reuse
       - Speedup

  16. Quasi-Common Subexpressions 32 KB

  17. Study of Reuse (ERB) [Chart: x-axis Size in Kbytes, 8 to 4096]

  18. Study of Reuse (RCB) [Chart: x-axis Size in Kbytes, 8 to 4096]

  19. Study of Reuse (Comparative) [Chart: x-axis Size in Kbytes, 8 to 4096]

  20. Performance Evaluation
     • Two different capacities are evaluated
       - 32 KB
       - 200 KB
     • Best configuration has been chosen for each reuse scheme
     • We present a performance evaluation for a superscalar processor
       - Speedup
       - Percentage of reuse

  21. Base Microarchitecture

  22. Speedup (32 KB) [Chart: y-axis 1.00 to 1.20]

  23. Speedup (200 KB) [Chart: y-axis 1.00 to 1.25]

  24. Reuse (32 KB) Ops ready

  25. Reuse (200 KB) Ops ready

  26. Reuse by Instruction Category [Chart: categories Load Value, Memory Address, Arithmetic, Cond Branch]

  27. Hybrid Scheme [Figure: Atables with index labels PC and Opnds; entry fields: opcode, result/addr, opnd1, opnd2, pointer]

  28. Speedup (Hybrid Scheme) [Chart: y-axis 1.00 to 1.20]

  29. Reuse (Hybrid Scheme)

  30. Speedup (Perfect Reuse Engine) [Chart: y-axis 1.00 to 2.20]

  31. Conclusions • Redundant Computation Buffer • Quasi-invariants • Quasi-common subexpressions • High reuse coverage and low latency • 30% reuse • 10% speedup • Outperforms previous schemes
