
On Tuning Microarchitecture for Programs

On Tuning Microarchitecture for Programs. Daniel Crowell, Wenbin Fang, and Evan Samanas. Outline. Adapt µArch to meet program’s performance/energy requirement during runtime A flexible framework for µArch adaptivity Case study on adaptive cache (selective-way/set)


Presentation Transcript


  1. On Tuning Microarchitecture for Programs. Daniel Crowell, Wenbin Fang, and Evan Samanas

  2. Outline • Adapt the µArch to meet a program’s performance/energy requirements at runtime • A flexible framework for µArch adaptivity • Case study on adaptive cache (selective-way/set) • Evaluation of the adaptive cache

  3. Motivation • Optimizing for all is optimizing for nothing • Software is increasingly complex, and much of it is closed source • S/W and H/W co-design is infeasible for legacy software

  4. Three Questions for Microarchitecture Adaptivity • When to adapt? => Policy (Interval? Context switch? Function boundary?) • What goal(s)? => Policy (Performance first? Performance-power ratio first?) • How to adapt? => Mechanism (Which technique allows reconfiguration at runtime?) Reference: Lee and Brooks [1] and Albonesi et al. [7]

  5. Adaptivity Framework Reference: Lee and Brooks [1] and Albonesi et al. [7]

  6. Policy • Instruction 1: adapt_advise (inspired by “madvise” among OS system calls) • When to adapt: when this instruction is executed • What goal: given as an operand (performance? energy? both?) • Instruction 2: adapt_setup • Privileged, used only by the OS • Operand: whether user programs are allowed to use adapt_advise Reference: Ipek [5] and Clark [6]. Adding new instructions to SimpleScalar: http://ce.et.tudelft.nl/~demid/SSIAT/
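The two instructions on this slide can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the class name `AdaptiveCore`, the `Goal` encoding, the `privileged` flag, and the choice to treat a disallowed `adapt_advise` as a no-op are all hypothetical; the slides give only the instruction names and their operands.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical encoding of the adapt_advise operand."""
    PERFORMANCE = 0
    ENERGY = 1
    BOTH = 2


class AdaptiveCore:
    """Toy model of a core that exposes adapt_setup and adapt_advise."""

    def __init__(self):
        self.user_advice_allowed = False  # set by the OS via adapt_setup
        self.goal = None                  # last accepted adaptation goal

    def adapt_setup(self, allow_user_advice, privileged):
        # adapt_setup is privileged: only the OS may execute it.
        if not privileged:
            raise PermissionError("adapt_setup is a privileged instruction")
        self.user_advice_allowed = allow_user_advice

    def adapt_advise(self, goal, privileged=False):
        # Assumption: advice from a user program that has not been
        # enabled by the OS is silently ignored (returns False).
        if not privileged and not self.user_advice_allowed:
            return False
        self.goal = goal
        return True


core = AdaptiveCore()
print(core.adapt_advise(Goal.ENERGY))     # ignored until the OS allows it
core.adapt_setup(True, privileged=True)   # OS enables user-level advice
print(core.adapt_advise(Goal.ENERGY), core.goal)
```

As with madvise, the advice here is a hint: the model records the requested goal, and what the hardware does with it is left to the mechanism.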

  7. Policy • Application boundary (OS) [3] • Time interval (OS) [1][2] • Context switching (OS) [4] • User program (compiler / user program)

  8. Feasibility study • To back our motivation for this project • To support our decision to do the case study on the adaptive cache rather than on other components • Wait for Evan’s figure

  9. Feasibility study (Cont.) • You may have more figures to show …

  10. Case study: Adaptive Cache • According to our experimental results, the cache is more interesting than other components …

  11. Selective set • What is selective set (may need more than one slide)

  12. Selective way • What is selective way (may need more than one slide)

  13. Selective set vs Selective way • Pros and cons?
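As a rough illustration of the difference between the two schemes, the sketch below computes how cache capacity and address breakdown change when sets or ways are disabled. It relies only on the standard relation capacity = sets × ways × block size. The function name `cache_geometry`, the 32-bit address width, and the example sizes are assumptions, not values from the slides.

```python
def cache_geometry(num_sets, ways, block_bytes, addr_bits=32):
    """Return (capacity_bytes, index_bits, tag_bits) for a set-associative cache.

    The number of sets and the block size must be powers of two because
    they map directly onto address bits; the number of ways need not be.
    """
    for name, value in (("num_sets", num_sets), ("block_bytes", block_bytes)):
        if value <= 0 or (value & (value - 1)) != 0:
            raise ValueError(f"{name} must be a power of two")
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return num_sets * ways * block_bytes, index_bits, tag_bits


full = cache_geometry(256, 4, 32)       # baseline: 32 KB, 4-way
half_sets = cache_geometry(128, 4, 32)  # selective set: half the sets
half_ways = cache_geometry(256, 2, 32)  # selective way: half the ways
print(full, half_sets, half_ways)
```

Both reductions halve the capacity, but disabling sets removes an index bit and lengthens the tag by one bit, while disabling ways leaves the address breakdown unchanged.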

  14. Evaluation • Simulator: SimpleScalar 3.0, Wattch • Workload: 6 programs from SPEC 2000 • Case study: Adaptive Cache

  15. SimpleScalar changes Two methods used: • SimpleScalar implementation of selective sets (uses a timer with a miss counter to determine which sets to disable; powers down portions of the cache and selectively flushes dirty data) • Script-based method (the same design can be used for both selective sets and selective ways; completely replaces the cache when resized and flushes all values at each interval)

  16. Application-boundary policy • Configuration is set at the start of the program, then remains unchanged [Figures: selective-set cache] • Instructions per cycle (IPC) vs. energy delay • IPC: considers only performance (higher is better) • Energy delay: considers both performance and power (lower is better) • As the cache size shrinks, energy delay decreases at first but rises later • Want to choose the point where it is smallest
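Choosing the point where energy delay is smallest can be sketched as follows. The sketch assumes that "energy delay" on the slide means the energy-delay product (energy × execution time); the configuration names and all numbers are made up for illustration and are not the presentation's results.

```python
def energy_delay(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def best_config(measurements):
    """Return the configuration name with the smallest energy-delay product.

    measurements maps a configuration name to (energy_joules, delay_seconds).
    """
    return min(measurements, key=lambda name: energy_delay(*measurements[name]))


# Hypothetical per-configuration measurements for one program:
# shrinking the cache saves energy but eventually costs too much time.
measurements = {
    "64KB": (10.0, 1.00),
    "32KB": (8.0, 1.05),
    "16KB": (7.0, 1.30),
    "8KB": (6.5, 1.80),
}
print(best_config(measurements))
```

Under an application-boundary policy this selection would be made once, before the program starts, and the chosen configuration would then stay fixed.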

  17. Application-boundary policy [Figure: selective-way cache] • Similar tradeoffs in IPC and power to selective sets • Fewer choices: SimpleScalar limits associativity to powers of two • Unlike the number of sets, associativity does not normally need to be a power of two

  18. Time-interval policy • Reconfigurations occur at intervals of a set number of CPU cycles • Why? Good if program behavior is not known before execution; a program may need less or more cached data later in execution • For our cache study: relies on the cache miss rate (%) to decide on reconfiguration • Reconfiguring too frequently hurts performance • May oscillate between two roughly equivalent states • Reconfiguration requires temporarily halting and possibly flushing values from the cache
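A time-interval policy driven by miss rate might look like the sketch below. It is not the presenters' SimpleScalar implementation: the function names, the 1% default threshold, and in particular the dead band (`shrink_factor`), added here as one way to damp oscillation between neighbouring sizes, are assumptions. The miss rates are a fixed hypothetical trace; in a real run they would depend on the current cache size.

```python
def next_size(current, miss_rate, sizes, threshold=0.01, shrink_factor=0.5):
    """Pick the cache size for the next interval.

    sizes is sorted ascending. Grow when the miss rate exceeds the
    threshold; shrink only when it falls below threshold * shrink_factor,
    so rates between the two bounds leave the size unchanged.
    """
    i = sizes.index(current)
    if miss_rate > threshold and i + 1 < len(sizes):
        return sizes[i + 1]
    if miss_rate < threshold * shrink_factor and i > 0:
        return sizes[i - 1]
    return current


def run(miss_rates, sizes, start, **kwargs):
    """Apply next_size once per interval and return the size history."""
    size, history = start, []
    for rate in miss_rates:
        size = next_size(size, rate, sizes, **kwargs)
        history.append(size)
    return history


sizes_kb = [16, 32, 64]
print(run([0.002, 0.002, 0.03, 0.008], sizes_kb, start=64))
# A program whose miss rate never drops below 1% never shrinks the cache.
print(run([0.02, 0.03, 0.015], sizes_kb, start=64))
```

The second trace shows a case where the cache stays at full size for the whole run, so the adaptive machinery contributes only overhead.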

  19. Time-interval policy [Figures: two graphs for the selective-set cache; axis label: cache miss rate] • What is the minimum allowed cache miss rate? (1%, 2%, 3%, 4%? This is a policy choice) • Notice the positive energy delay on the right graph (not good!): the cache never resizes down, since the miss rate is always higher than 1% • So all adaptivity adds is overhead under those circumstances

  20. Time-interval policy [Figures: two graphs for the selective-way cache; axis label: cache miss rate] • Again, similar to selective sets • Differences depend on the program being executed

  21. Cache miss rate • Decreasing the number of ways or sets almost always increases the miss rate • Problem mentioned earlier: the miss rates of Gzip and Vpr are always higher than 1%, which does not work well with a < 1% dynamic reconfiguration level

  22. Conclusion • Adaptivity is useful: tune for different program requirements, save power • A flexible adaptivity framework: mechanism and policy • The cache is just one of many areas where this is useful

  23. References
  [1] B. C. Lee and D. Brooks. Efficiency trends and limits from comprehensive microarchitectural adaptivity. In ASPLOS, 2008.
  [2] S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. Vijaykumar. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In HPCA, 2001.
  [3] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. In JILP, 2000.
  [4] M. C. Huang, J. Renau, and J. Torrellas. Positional adaptation of processors: Application to energy reduction. In ISCA, 2003.
  [5] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: Accommodating software diversity in chip multiprocessors. In ISCA, 2007.
  [6] M. Clark and L. K. John. Performance evaluation of configurable hardware features on the AMD-K5. In ICCD, 1999.
  [7] D. H. Albonesi, R. Balasubramonian, S. G. Dropsho, S. Dwarkadas, E. G. Friedman, M. C. Huang, V. Kursun, G. Magklis, M. L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. W. Cook, and S. E. Schuster. Dynamically tuning processor resources with adaptive processing. In Computer, 2003.

  24. Questions?
