
On Tuning Microarchitecture for Programs


Presentation Transcript


  1. On Tuning Microarchitecture for Programs Daniel Crowell, Wenbin Fang, and Evan Samanas

  2. Outline Goal: Adapt the µArch to meet a program’s performance/energy requirements at runtime • Motivation • An ideal framework for µArch adaptivity • Feasibility study on different adaptive components • Case study on adaptive caches (selective-way/selective-set) • Evaluation of the adaptive cache • Conclusion

  3. Motivation • Optimizing for all is optimizing for nothing • Software is increasingly complex, and much of it is closed source • S/W and H/W co-design is infeasible for legacy software

  4. Three Questions for Microarchitecture Adaptivity • When to adapt? => Policy • Interval? Context switch? Function boundary? • What goal(s)? => Policy • Performance first? Performance-power ratio first? • How to adapt? => Mechanism • What techniques allow reconfiguration at runtime? Reference: Lee and Brooks [1] and Albonesi et al. [7]

  5. Adaptivity Framework Reference: Lee and Brooks [1] and Albonesi et al. [7]

  6. Policy • Instruction 1: adapt_advise [% perf] • Inspired by the “madvise” OS system call • When to adapt: when this instruction is executed • What goal: given by an operand (how much performance?) • Instruction 2: adapt_setup [user_prog|os|both|none] • Privileged, used only by the OS • Operand: whether user programs are allowed to use adapt_advise Reference: Ipek [5] and Clark [6]. Adding new instructions to SimpleScalar: http://ce.et.tudelft.nl/~demid/SSIAT/
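
The slide only names the two proposed instructions, so here is a minimal, hypothetical sketch of how a simulator's decode path could treat them. The opcode values, the adapt_state structure, and request_reconfiguration() are illustrative assumptions, not part of SimpleScalar or of the authors' design.

    /* Hypothetical sketch of how a simulator's decode loop might treat the
     * two proposed instructions.  Opcodes, the adapt_state struct, and
     * request_reconfiguration() are illustrative, not real SimpleScalar code. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_ADAPT_ADVISE = 0xF0, OP_ADAPT_SETUP = 0xF1 };
    enum adapt_scope { ADAPT_NONE, ADAPT_USER, ADAPT_OS, ADAPT_BOTH };

    struct adapt_state {
        enum adapt_scope scope;   /* who may issue adapt_advise (set by adapt_setup) */
        int target_perf_pct;      /* performance target requested by adapt_advise   */
    };

    static void request_reconfiguration(struct adapt_state *st)
    {
        /* placeholder: the policy would pick a new cache/TLB/predictor config here */
        printf("reconfigure toward %d%% of peak performance\n", st->target_perf_pct);
    }

    static void decode_adapt(struct adapt_state *st, uint8_t opcode,
                             int operand, int in_user_mode)
    {
        switch (opcode) {
        case OP_ADAPT_SETUP:
            if (!in_user_mode)                     /* privileged: OS only */
                st->scope = (enum adapt_scope)operand;
            break;
        case OP_ADAPT_ADVISE: {
            int allowed = in_user_mode
                ? (st->scope == ADAPT_USER || st->scope == ADAPT_BOTH)
                : (st->scope == ADAPT_OS   || st->scope == ADAPT_BOTH);
            if (allowed) {
                st->target_perf_pct = operand;     /* "what goal" comes from the operand */
                request_reconfiguration(st);       /* "when to adapt" = now              */
            }
            break;
        }
        }
    }

    int main(void)
    {
        struct adapt_state st = { ADAPT_NONE, 100 };
        decode_adapt(&st, OP_ADAPT_SETUP, ADAPT_BOTH, 0);  /* OS enables adaptivity  */
        decode_adapt(&st, OP_ADAPT_ADVISE, 80, 1);         /* user asks for 80% perf */
        return 0;
    }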

  7. Policy • Application boundary (OS) [3] • Time interval (OS) [1][2] • Context switching (OS) [4] • User program (Compiler / User program)

  8. Feasibility study • Backs up the motivation: what should be configurable? • The ideal configuration differs by workload • Components studied: L1 data cache, TLB, branch predictor • Tools: SimpleScalar, Wattch • Workload: 6 programs from SPEC2000Int

  9. Feasibility Study (TLB)

  10. Feasibility Study (TLB)

  11. Feasibility Study (Branch Predictor)

  12. Feasibility Study (Cache)

  13. Feasibility Study (Cache)

  14. What We Learned • TLB • Variability with # of entries • Fully associative is better • Branch predictor: the combined predictor is better • Cache: variability in both size and associativity • Size variability > associativity variability • The cache is the most interesting component • Lots of literature

  15. Selective sets (Yang et al., 2001) • Adjusts the size (# of sets) of the L1 cache • Either doubles the size or shrinks it by half • Goal: Decrease static power by reducing leakage • When to adjust: Time interval • Miss-rate threshold • Lower bound on size • Focus on the I-cache
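
As a rough illustration of the interval-based resize decision described above, the sketch below doubles the set count when the interval's miss rate is high and halves it when the miss rate is comfortably low. The thresholds, bounds, and function names are assumptions for illustration, not the published algorithm.

    /* Minimal sketch of an interval-based selective-sets policy: double the
     * number of sets when the miss rate exceeds a threshold, halve it when
     * the miss rate is low.  All names, thresholds, and bounds are assumed. */
    #include <stdio.h>

    #define MIN_SETS 64            /* lower bound: never shrink below this    */
    #define MAX_SETS 1024          /* full cache size                         */
    #define GROW_THRESHOLD 0.02    /* miss rate above this -> double the sets */
    #define SHRINK_THRESHOLD 0.005 /* miss rate below this -> halve the sets  */

    /* Called once per time interval with that interval's access/miss counts. */
    static int resize_sets(int cur_sets, long accesses, long misses)
    {
        double miss_rate = accesses ? (double)misses / (double)accesses : 0.0;

        if (miss_rate > GROW_THRESHOLD && cur_sets < MAX_SETS)
            return cur_sets * 2;        /* re-enable half of the disabled sets */
        if (miss_rate < SHRINK_THRESHOLD && cur_sets > MIN_SETS)
            return cur_sets / 2;        /* power down half of the active sets  */
        return cur_sets;                /* otherwise leave the cache alone     */
    }

    int main(void)
    {
        int sets = MAX_SETS;
        sets = resize_sets(sets, 1000000, 3000);   /* 0.3% misses -> shrink */
        printf("sets after interval: %d\n", sets); /* prints 512            */
        return 0;
    }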

  16. Selective ways (Albonesi, 1999) • Disables “unneeded” cache ways • Goal: Reduce dynamic power • Decreases cache switching activity • When to adjust: • Disable: extend the ISA? • Enable: performance-degradation threshold
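
To make the mechanism concrete, here is a hedged sketch of a lookup that probes only the ways enabled by a bitmask, so disabled ways contribute no switching activity. The cache layout, line size, and mask handling are simplified assumptions rather than the paper's actual hardware design.

    /* Minimal sketch of a selective-way lookup: only ways whose bit is set in
     * an enable mask are probed.  Layout and tag handling are simplified. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_WAYS 4
    #define NUM_SETS 256

    struct cache_line { uint32_t tag; int valid; };
    static struct cache_line cache[NUM_SETS][NUM_WAYS];

    /* Bitmask of enabled ways; e.g. 0x3 leaves only ways 0 and 1 powered. */
    static unsigned way_mask = 0x3;

    static int lookup(uint32_t addr)
    {
        uint32_t set = (addr / 64) % NUM_SETS;   /* 64-byte lines (assumption) */
        uint32_t tag = addr / (64 * NUM_SETS);

        for (int w = 0; w < NUM_WAYS; w++) {
            if (!(way_mask & (1u << w)))
                continue;                        /* disabled way: never probed */
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return 1;                        /* hit in an enabled way      */
        }
        return 0;                                /* miss                       */
    }

    int main(void)
    {
        printf("hit=%d\n", lookup(0x1000));      /* cold cache: prints hit=0 */
        return 0;
    }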

  17. Evaluation • Simulator • SimpleScalar 3.0 • Wattch • Workload • 6 programs from SPEC2000Int • Case study: Adaptive Cache

  18. Simulation Methodology Two methods used: • SimpleScalar implementation of selective sets • Uses a timer with a miss counter to determine how many sets to disable • Powers down portions of the cache and selectively flushes dirty data • Scripting-based method • The same design can be used for both selective sets and selective ways • Completely replaces the cache when it is resized, flushing all values at each interval
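
The "selectively flush dirty data" step can be pictured as follows: before a region of sets is powered off, any dirty lines living there are written back and invalidated. The data structures and the writeback hook below are assumptions for illustration, not the actual simulator code.

    /* Illustrative sketch of selectively flushing dirty data before a range
     * of sets is powered down by a downsize.  Structures are assumptions. */
    #include <stdio.h>

    #define NUM_SETS 1024
    #define NUM_WAYS 4

    struct line { int valid, dirty; unsigned tag; };
    static struct line cache[NUM_SETS][NUM_WAYS];

    static void writeback(unsigned set, unsigned way)
    {
        /* placeholder for the memory write a real simulator would model */
        printf("writeback set=%u way=%u\n", set, way);
    }

    /* Flush and invalidate every line in sets [first_dead, NUM_SETS) before
     * those sets are powered off. */
    static void flush_disabled_sets(unsigned first_dead)
    {
        for (unsigned s = first_dead; s < NUM_SETS; s++)
            for (unsigned w = 0; w < NUM_WAYS; w++) {
                if (cache[s][w].valid && cache[s][w].dirty)
                    writeback(s, w);
                cache[s][w].valid = cache[s][w].dirty = 0;
            }
    }

    int main(void)
    {
        cache[900][1] = (struct line){ 1, 1, 0xabc };
        flush_disabled_sets(512);   /* halving 1024 sets -> 512 remain */
        return 0;
    }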

  19. Application-boundary policy • Configuration is set at the start of the program and then remains unchanged • [Graphs: selective-set cache results, one per benchmark] • Instructions per cycle (IPC) vs. energy delay • IPC: considers only performance (higher is better) • Energy delay: considers both performance and power (lower is better) • With a smaller cache, energy delay decreases at first but rises later • Want to choose the point where it is smallest
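
For readers unfamiliar with the two metrics, the sketch below computes IPC and the energy-delay product from raw counts. The input numbers and variable names are illustrative; real values would come from SimpleScalar/Wattch statistics.

    /* Sketch of the two metrics compared on this slide.  Inputs are made up;
     * a real study would take them from SimpleScalar/Wattch output. */
    #include <stdio.h>

    static void report_metrics(double insns, double cycles,
                               double clock_hz, double avg_power_w)
    {
        double ipc    = insns / cycles;       /* performance only: higher is better */
        double delay  = cycles / clock_hz;    /* execution time in seconds          */
        double energy = avg_power_w * delay;  /* joules, from average power         */
        double ed     = energy * delay;       /* energy-delay product: lower better */
        printf("IPC = %.2f, energy-delay = %.3e J*s\n", ipc, ed);
    }

    int main(void)
    {
        /* purely illustrative: 1B instructions, 1.5B cycles, 2 GHz, 20 W */
        report_metrics(1.0e9, 1.5e9, 2.0e9, 20.0);
        return 0;
    }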

  20. Application-boundary policy • Selective-way cache • Similar tradeoffs in IPC and power to selective sets • Fewer choices: SimpleScalar limits associativity to powers of two • Unlike the cache set count, a power-of-two limit is not normally necessary for associativity

  21. Time-interval policy • Reconfigurations occur every so many CPU cycles (~1 million, for example) • Why? • Good if program behavior is not known before execution • A program may need less or more cached data later in its execution • For our cache study: relies on the cache miss rate to determine when to reconfigure • Performance hit for changing too frequently • May oscillate between two roughly equivalent states • Reconfiguration requires temporarily halting and possibly flushing values from the cache
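
One simple way to damp the oscillation and reconfiguration overhead mentioned above is to impose a cooldown after every resize, as in the hedged sketch below. The cooldown length, thresholds, and decide_resize() name are illustrative assumptions, not something the slides prescribe.

    /* Sketch of a cooldown that suppresses back-to-back resizes: after any
     * reconfiguration, the next few decision intervals are skipped. */
    #include <stdio.h>

    #define COOLDOWN_INTERVALS 4   /* intervals to wait after a reconfiguration */

    static int cooldown = 0;

    /* decide_resize() stands in for the miss-rate-based policy sketched earlier. */
    static int decide_resize(int cur_sets, double miss_rate)
    {
        if (cooldown > 0) {                     /* recently resized: do nothing */
            cooldown--;
            return cur_sets;
        }
        int next = cur_sets;
        if (miss_rate > 0.02)       next = cur_sets * 2;
        else if (miss_rate < 0.005) next = cur_sets / 2;
        if (next != cur_sets)
            cooldown = COOLDOWN_INTERVALS;      /* pay the flush cost less often */
        return next;
    }

    int main(void)
    {
        int sets = 512;
        sets = decide_resize(sets, 0.03);       /* grows to 1024           */
        sets = decide_resize(sets, 0.004);      /* cooldown: stays at 1024 */
        printf("sets=%d cooldown=%d\n", sets, cooldown);
        return 0;
    }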

  22. Time-interval policy • [Graphs: cache miss rate and energy delay over time, selective-set cache] • What miss-rate threshold should be allowed? (1%, 2%, 3%, 4%? – a policy choice) • Notice the positive energy delay in the right-hand graph (not good!) • The cache never resizes down, since the miss rate is always higher than 1% • So, under those circumstances, all that adaptivity adds is overhead

  23. Time-interval policy • [Graphs: cache miss rate and energy delay over time, selective-way cache] • Again, similar to selective sets • Differences depend on the program being executed

  24. Conclusion • Adaptivity is useful • Tune for different program requirements • Save power • An ideal adaptivity framework needs both a mechanism and a policy • The cache is just one of many areas where adaptivity is useful

  25. References
  [1] B. C. Lee and D. Brooks. Efficiency trends and limits from comprehensive microarchitectural adaptivity. In ASPLOS, 2008.
  [2] S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. N. Vijaykumar. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In HPCA, 2001.
  [3] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. In JILP, 2000.
  [4] M. C. Huang, J. Renau, and J. Torrellas. Positional adaptation of processors: Application to energy reduction. In ISCA, 2003.
  [5] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: Accommodating software diversity in chip multiprocessors. In ISCA, 2007.
  [6] M. Clark and L. K. John. Performance evaluation of configurable hardware features on the AMD-K5. In ICCD, 1999.
  [7] D. H. Albonesi, R. Balasubramonian, S. G. Dropsho, S. Dwarkadas, E. G. Friedman, M. C. Huang, V. Kursun, G. Magklis, M. L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. W. Cook, and S. E. Schuster. Dynamically tuning processor resources with adaptive processing. IEEE Computer, 2003.

  26. Questions?
