
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction



Presentation Transcript


  1. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction • Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, Dean M. Tullsen • Presenter: Borys Bradel

  2. Introduction • Different programs have different requirements (e.g. ILP) • Extends to phases of a single program • Heterogeneous cores • Use core that matches the requirements • Reuse existing cores • Use multiple generations of the same family of processors

  3. Outline • Methodology • Hardware • Assumptions • Power • Experiments • Optimal – energy/energy delay product • Heuristic based – static/dynamic • Related Work • Conclusion

  4. Single ISA Multi-Core Benefits • Small area overhead because of the growth in core sizes between generations • Clock frequencies of older cores would scale with technology • P3 1 GHz = P4 1.4 GHz • Pipeline depth was increased precisely because clock frequency could not scale

  5. Hardware – Alpha Family • 2 in-order cores • EV4 = 21064 • EV5 = 21164 • 2 out-of-order cores • EV6 = 21264 • EV8- = 21464 with multithreading support removed

  6. Hardware Size • 15% more area than just using 21464

  7. Assumptions • Can switch cores dynamically • Private L1 caches and a common L2 cache • All cores use 0.10 micron technology • A single process executes on a single core at any one time • 2.1 GHz clock (the 21264's 600 MHz at 0.35 micron, scaled to 0.10 micron) • Input voltage 1.2 V • Cores shut down when idle • 1000-cycle restart cost (power-up is staged; the phase-locked loop is left running) • 150 ns memory access • Cache stall cycles derived from CACTI
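The 2.1 GHz figure on slide 7 is consistent with scaling the 21264's clock inversely with feature size. The sketch below shows that arithmetic; the linear-scaling rule and the function name `scaled_clock_mhz` are assumptions made for illustration, not something the slides state.

```python
def scaled_clock_mhz(base_mhz, base_feature_um, target_feature_um):
    """Scale a clock frequency inversely with feature size.

    Assumption: frequency scales linearly with the process shrink.
    """
    return base_mhz * base_feature_um / target_feature_um


# 21264 (EV6): 600 MHz at 0.35 micron, moved to 0.10 micron.
ev6_scaled_mhz = scaled_clock_mhz(600, 0.35, 0.10)
print(round(ev6_scaled_mhz))  # 2100 MHz, i.e. 2.1 GHz

# At that clock, the 1000-cycle restart cost is under half a microsecond.
restart_us = 1000 / ev6_scaled_mhz
print(round(restart_us, 3))
```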

  8. Core Configurations

  9. Power Model • Use Wattch to account for activity-based dissipation • Use scaling and offset factors to account for other factors • This hybrid model is closer to the manufacturers' data points • Peak power: data-sheet values minus L2 cache and output pin power • Typical power: scaled based on data for Intel chips
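One way to read the hybrid model on slide 9 is as a linear correction applied to the activity-based estimate. The sketch below assumes that linear form; the function name `hybrid_power` and every number in it are hypothetical and are not taken from the paper.

```python
def hybrid_power(wattch_power_w, scale, offset_w):
    """Adjust an activity-based power estimate with a scale and an offset.

    Assumption: the correction is linear, scale * estimate + offset.
    """
    return scale * wattch_power_w + offset_w


# Hypothetical values, for illustration only.
estimate_w = hybrid_power(wattch_power_w=10.0, scale=0.5, offset_w=2.0)
print(estimate_w)  # 7.0
```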

  10. Power and Area Statistics

  11. Performance Modeling • Use SMTSIM, a cycle-accurate simulator • SimPoint is used to identify a representative portion of each program and how many instructions need to be fast-forwarded to reach it

  12. Varying Performance Ratio

  13. Varying Energy Efficiency Ratio

  14. Oracle Switching for Energy • Performance always within 10% of EV8-

  15. Oracle Switching for Energy

  16. Oracle Switching for Energy Delay Product • Performance always within 50% of EV8-

  17. Oracle Switching for Energy Delay Product
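The oracle selection on slides 14 to 17 can be sketched as a per-interval choice: among the cores whose delay stays within a bound of EV8-, pick the one with the lowest energy or the lowest energy-delay product (energy times delay). This is a minimal sketch under stated assumptions. Applying the bound to every interval is a simplification, the name `oracle_pick` is hypothetical, and the per-core numbers are made up for illustration.

```python
def oracle_pick(stats, metric, max_slowdown, baseline="EV8-"):
    """Pick the best core for one interval.

    stats maps core name -> (energy, delay) for that interval.
    Only cores within max_slowdown of the baseline's delay are eligible.
    """
    limit = stats[baseline][1] * (1.0 + max_slowdown)
    eligible = {c: ed for c, ed in stats.items() if ed[1] <= limit}

    def cost(core):
        energy, delay = eligible[core]
        return energy if metric == "energy" else energy * delay

    return min(eligible, key=cost)


# Hypothetical (energy, delay) pairs for a single interval.
interval = {
    "EV4": (1.0, 2.0),
    "EV5": (2.0, 1.4),
    "EV6": (4.0, 1.05),
    "EV8-": (10.0, 1.0),
}

print(oracle_pick(interval, "energy", max_slowdown=0.10))  # EV6
print(oracle_pick(interval, "edp", max_slowdown=0.50))     # EV5
```

The baseline is always eligible, so the selection never comes back empty.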

  18. Others • Voltage/frequency scaling – not as good • Static core selection • Only EV6 and EV8- are used • Dynamic heuristic • Running average performance kept within 10% • Every 100 intervals (100 million instructions, so 1 million instructions per interval), cores are sampled for 5 intervals • The best core is selected based on the samples
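The dynamic heuristic on slide 18 can be sketched as a loop that alternates a 100-interval steady phase with a sampling phase of 5 intervals per core. This is a simplified sketch: it samples every core, which is an assumption, it omits the 10% running-average performance bound, and `dynamic_schedule` and the `cost` callback are hypothetical names.

```python
SAMPLE_PERIOD = 100  # intervals between sampling phases (from the slide)
SAMPLE_LENGTH = 5    # intervals each core is sampled for (from the slide)


def dynamic_schedule(num_intervals, cores, cost, start_core):
    """Return the core used in each interval.

    cost(core, t) is a hypothetical callback giving the measured cost
    (lower is better) of running `core` during interval t.
    """
    current = start_core
    schedule = []
    while len(schedule) < num_intervals:
        # Steady phase: stay on the currently selected core.
        steady = min(SAMPLE_PERIOD, num_intervals - len(schedule))
        schedule.extend([current] * steady)

        # Sampling phase: try each core for SAMPLE_LENGTH intervals.
        totals = {}
        for core in cores:
            if num_intervals - len(schedule) < SAMPLE_LENGTH:
                break  # not enough intervals left for a full sample
            total = 0.0
            for _ in range(SAMPLE_LENGTH):
                total += cost(core, len(schedule))
                schedule.append(core)
            totals[core] = total

        # Switch to the core with the lowest sampled cost, if any.
        if totals:
            current = min(totals, key=totals.get)
    return schedule


# Hypothetical cost function: EV6 is always cheaper than EV8-.
cores = ["EV6", "EV8-"]
schedule = dynamic_schedule(
    250, cores, lambda core, t: 1.0 if core == "EV6" else 2.0, "EV8-"
)
print(schedule[0], schedule[100], schedule[105], schedule[110])
```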

  19. Results for Heuristics

  20. Results for Heuristics/Static Core

  21. Related Work • Gating based power optimization • Cannot gate at a fine enough granularity • May still have leakage • This could be thought of as gating to reduce capabilities of different units • Voltage and frequency scaling • Chip wide – one size does not fit all • Fine grained – granularity problems

  22. Conclusions • Heterogeneous multi core architectures reduce the energy-delay product • More fine grained than other approaches • Using several cores from the same family is good • Reduces development/testing costs • Is it scalable? • Just use EV6??
