Performance, Energy and Thermal Considerations for SMT and CMP Architectures

  1. Performance, Energy and Thermal Considerations for SMT and CMP Architectures • Erkan Çetiner

  2. Outline • Introduction • Related Works • Modeling Methodology • Baseline Results • DTM Techniques • Conclusions

  3. INTRODUCTION

  4. SMT (Simultaneous Multithreading) • Allows instructions from multiple threads to be fetched and executed simultaneously in the same pipeline • Amortizes the hardware cost by achieving higher IPC (instructions per cycle) • Although SMT has proven energy-efficient for most workloads, the significant boost in IPC increases power dissipation and may increase power density • Thermal behavior and cooling costs are therefore major concerns

  5. CMP (Chip Multiprocessors) • Instantiates multiple processor "cores" on a single die • Each core has private branch predictors and first-level caches, and the cores share a second-level, on-chip cache • For multiprogrammed workloads, it amortizes the cost of the die by allowing data sharing within a common L2 cache • Like SMT, CMP promises a boost in throughput • Because cores are replicated, the area and power overhead of supporting extra threads is much greater with CMP than with SMT • For a given die size, a single-core SMT chip will therefore support a larger L2 cache than a multi-core chip • Side effect for CMP: each added core increases power dissipation, so thermal behavior and cooling costs are also major concerns for CMP

  6. Why Compare Them? • Both paradigms target increased throughput for multithreaded and multiprogrammed workloads, so it is worth comparing their performance, energy and thermal behavior

  7. RELATED WORKS

  8. Research Areas • Area overhead and energy efficiency of SMT • Energy efficiency and several power-aware optimizations for a multithreaded Alpha processor • Energy efficiency of SMT and CMP for multimedia workloads • Hybrid systems that combine SMT and CMP

  9. Modeling Methodology

  10. Microarchitecture & Performance Modeling • Turandot/PowerTimer is used to model an out-of-order, superscalar processor with a resource configuration similar to current-generation microprocessors

  11. Microarchitecture & Performance Modeling • SMT is modeled by duplicating the data structures that correspond to duplicated resources and by increasing the sizes of shared critical resources such as the register file • A round-robin policy is used at various pipeline stages to decide which thread proceeds • Comparing the performance of different CMP and SMT configurations is difficult, so a baseline is needed
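The round-robin thread selection described above can be sketched as follows. This is a minimal illustration, assuming a single pipeline stage that grants one thread per cycle and skips threads that are not ready; the class and method names (`RoundRobinArbiter`, `select`) are hypothetical and do not come from Turandot.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical per-stage arbiter: grants one ready thread per cycle in rotating order."""

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads
        self.last = num_threads - 1  # start so that thread 0 is considered first

    def select(self, ready: List[bool]) -> Optional[int]:
        """Return the first ready thread after the last granted one, or None if none is ready."""
        for offset in range(1, self.num_threads + 1):
            tid = (self.last + offset) % self.num_threads
            if ready[tid]:
                self.last = tid
                return tid
        return None


arbiter = RoundRobinArbiter(num_threads=2)
grants = [arbiter.select([True, True]) for _ in range(4)]
print(grants)  # [0, 1, 0, 1]
```

When one thread stalls, the arbiter keeps granting the other, so a stalled thread does not waste the stage's slot.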

  12. Benchmarks • 15 single-thread SPEC2000 benchmarks are used • The SimPoint toolset is used to obtain representative simulation points of 500 million instructions • A trace generation tool produces the final static traces by skipping the number of instructions given by SimPoint • Finally, 500 million instructions are simulated and captured • Pairs of single-thread benchmarks form the dual-thread SMT and CMP benchmarks • Categorization of benchmarks: • High IPC (> 0.9) • Low IPC (< 0.9) • High temperature (peak temperature > 82°C) • Low temperature (peak temperature < 82°C) • Floating-point benchmarks • Integer benchmarks
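A minimal sketch of the benchmark categorization, using the thresholds from the slide (IPC 0.9, peak temperature 82°C). The `BenchmarkResult` type and `categorize` function are hypothetical, and the handling of values exactly at a threshold is an assumption, since the slide gives only strict inequalities.

```python
from dataclasses import dataclass
from typing import Tuple

IPC_THRESHOLD = 0.9           # slide: high IPC means IPC > 0.9
PEAK_TEMP_THRESHOLD_C = 82.0  # slide: high temperature means peak > 82 °C


@dataclass
class BenchmarkResult:
    """Hypothetical record of one single-thread benchmark's baseline results."""
    name: str
    ipc: float
    peak_temp_c: float
    is_floating_point: bool


def categorize(b: BenchmarkResult) -> Tuple[str, str, str]:
    # Values exactly at a threshold fall into the "low" class (an assumption).
    ipc_class = "high-IPC" if b.ipc > IPC_THRESHOLD else "low-IPC"
    temp_class = "high-temp" if b.peak_temp_c > PEAK_TEMP_THRESHOLD_C else "low-temp"
    kind = "FP" if b.is_floating_point else "INT"
    return ipc_class, temp_class, kind


print(categorize(BenchmarkResult("bench_a", ipc=1.2, peak_temp_c=85.0, is_floating_point=False)))
# ('high-IPC', 'high-temp', 'INT')
```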

  13. Power Model • Base energy models are derived from circuit-level power analysis • In this research, the analysis is performed at the macro level • Assumption: uniform leakage power density for all units on the chip that are at the same temperature (more accurate leakage power models resulted in more accurate conclusions)
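Under the uniform-leakage-density assumption, a unit's leakage depends only on its area and its temperature. The sketch below makes that concrete. The exponential temperature dependence and every constant in it are hypothetical placeholders and not the study's leakage model.

```python
import math
from typing import Dict, Tuple

# Hypothetical constants for illustration only; they are not the study's values.
LEAKAGE_DENSITY_REF_W_PER_MM2 = 0.05  # leakage power density at the reference temperature
REF_TEMP_C = 85.0
TEMP_COEFF_PER_C = 0.02               # assumed exponential sensitivity to temperature


def leakage_density(temp_c: float) -> float:
    """One density for every unit at a given temperature (the uniform-density assumption)."""
    return LEAKAGE_DENSITY_REF_W_PER_MM2 * math.exp(TEMP_COEFF_PER_C * (temp_c - REF_TEMP_C))


def chip_leakage_w(units: Dict[str, Tuple[float, float]]) -> float:
    """Sum leakage over units given as name -> (area in mm^2, temperature in °C)."""
    return sum(area * leakage_density(temp) for area, temp in units.values())


units = {"regfile": (4.0, 88.0), "l2_cache": (70.0, 70.0)}  # hypothetical floorplan values
print(round(chip_leakage_w(units), 3))
```

Two units at the same temperature leak in proportion to their areas. A hotter unit leaks more per mm² than a cooler one.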

  14. Temperature Model • HotSpot 2.0 is used: it models temperature with a circuit of thermal resistances and capacitances derived from the layout of the microarchitecture units • Assumption: at least one temperature sensor is provided for each microarchitecture block in the floorplan
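The next sketch illustrates the RC idea behind HotSpot. It uses a deliberately tiny two-block thermal network and integrates it with forward Euler. This is not HotSpot: the block count, resistances, capacitances and powers are hypothetical, and HotSpot's real network also models the heat spreader, heat sink and package. Each block's temperature stands in for that block's sensor reading.

```python
from typing import List, Tuple

# Hypothetical two-block floorplan: block 0 is a hot unit, block 1 a cooler neighbor.
Edge = Tuple[int, int, float]  # (block i, block j, lateral thermal resistance in K/W)


def step(temps: List[float], powers: List[float], caps: List[float],
         r_vertical: List[float], edges: List[Edge],
         t_ambient: float, dt: float) -> List[float]:
    """Advance block temperatures by one forward-Euler step of C dT/dt = P - heat_out."""
    # Net heat into each block: its power minus the vertical flow toward ambient.
    net = [p - (t - t_ambient) / r for p, t, r in zip(powers, temps, r_vertical)]
    # Lateral heat flow between adjacent blocks, from hotter to cooler.
    for i, j, r in edges:
        q = (temps[i] - temps[j]) / r
        net[i] -= q
        net[j] += q
    return [t + dt * n / c for t, n, c in zip(temps, net, caps)]


def simulate(steps: int, dt: float = 1e-3) -> List[float]:
    t_ambient = 45.0
    temps = [t_ambient, t_ambient]
    for _ in range(steps):
        temps = step(temps, powers=[10.0, 2.0], caps=[0.05, 0.05],
                     r_vertical=[2.0, 2.0], edges=[(0, 1, 4.0)],
                     t_ambient=t_ambient, dt=dt)
    return temps


print(simulate(3000))  # approaches a steady state of about [61.0, 53.0]
```

The hot block also warms its neighbor through the lateral resistance. A layout-derived RC network captures this kind of spatial coupling.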

  15. Chip Die Area & L2 Cache Size Selection • Choosing an appropriate L2 cache size is very important • Core area stays fixed in the experiment • The number of cores and the L2 cache size determine the total chip die area • Because CMP requires additional chip area for the second core, its L2 cache must be smaller to achieve an equivalent die area
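The equal-area trade-off can be expressed as a small calculation: whatever the cores leave of the die budget goes to the L2 cache. This sketch assumes L2 sizes are powers of two and ignores SMT's extra core area and any other on-die structures. The function name and all areas are hypothetical and were chosen only to show the trend.

```python
def max_l2_mb(num_cores: int, core_area_mm2: float,
              l2_area_per_mb_mm2: float, die_budget_mm2: float,
              max_mb: int = 64) -> int:
    """Largest power-of-two L2 size (MB) that fits in the die budget, or 0 if none fits."""
    remaining = die_budget_mm2 - num_cores * core_area_mm2  # area left after the cores
    best = 0
    size = 1
    while size <= max_mb and size * l2_area_per_mb_mm2 <= remaining:
        best = size
        size *= 2
    return best


# Hypothetical areas, not the study's floorplan.
print(max_l2_mb(1, 40.0, 30.0, 110.0))  # single core (ST or SMT): 2
print(max_l2_mb(2, 40.0, 30.0, 110.0))  # two cores (CMP): 1
```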

  16. Baseline Results

  17. Some Statistics • Chip area: 210 mm² • L2 cache sizes: • ST: 2 MB • SMT: 2 MB • CMP: 1 MB

  18. Performance & Energy • CMP outperforms SMT for workloads with low L2 cache miss rates (87% to 26%) • SMT outperforms CMP for workloads with high L2 cache miss rates (42% to 22%)

  19. Performance & Energy • Power overhead of SMT: 38% to 46% • Main reasons for the power growth: the additional resources SMT requires, and the increased utilization due to additional simultaneous instruction throughput • Power overhead of CMP: 93% to 71% • Main reason: the addition of an entire second processor core • Judging by these metrics: • CMP is the most energy-efficient for benchmarks with low L2 cache miss rates • SMT is the most energy-efficient for benchmarks with high L2 cache miss rates
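The energy-efficiency reasoning on this slide follows from energy = power × time. A design with a large power overhead can still use less energy per unit of work if its throughput rises by more. The sketch below computes energy per unit of work, and the energy-delay-squared product (a common combined metric), relative to the single-thread baseline. The function names and the inputs are hypothetical and are not the study's measurements.

```python
def relative_energy(power_overhead: float, throughput_gain: float) -> float:
    """Energy per unit of work relative to the single-thread (ST) baseline.

    Both arguments are fractions over ST (0.5 means +50%). Energy = power * time,
    and the time to finish a fixed amount of work scales as 1 / throughput.
    """
    return (1.0 + power_overhead) / (1.0 + throughput_gain)


def relative_ed2(power_overhead: float, throughput_gain: float) -> float:
    """Energy-delay-squared product relative to ST, with delay = 1 / (1 + gain)."""
    delay = 1.0 / (1.0 + throughput_gain)
    return relative_energy(power_overhead, throughput_gain) * delay ** 2


# Hypothetical design points, not results from the study.
print(relative_energy(0.5, 0.25))  # about 1.2: +50% power, +25% throughput costs energy
print(relative_energy(0.9, 1.0))   # about 0.95: +90% power, +100% throughput saves energy
```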

  20. Performance & Energy • With a smaller L2 cache and a high cache miss ratio, the program is memory-bound, so SMT is better in terms of both performance and energy • With a larger L2 cache and a low cache miss ratio, the program is not memory-bound, so CMP is better

  21. Temperature • SMT and CMP show relatively similar temperatures

  22. Temperature • So why does the temperature increase for both of them? • SMT processor: the temperature hotspots are largely due to the higher utilization of certain structures, such as the integer register file • CMP processor: with two integrated cores, the total power of the chip nearly doubles, and hence the total amount of heat generated nearly doubles

  23. DTM (DYNAMIC THERMAL MANAGEMENT) TECHNIQUES

  24. DTM-Constrained Techniques • Reduce packaging costs • Provide cooling that sustains the thermal requirements of typical workloads • Engage DTM techniques when the temperature exceeds the design set point

  25. DTM Techniques • Dynamic Voltage Scaling • Fetch-Throttling • Rename-Throttling • Register-File Occupancy Throttling

  26. Dynamic Voltage Scaling • Cuts voltage and frequency in response to thermal violations • Restores the high voltage and frequency when the temperature drops below the trigger threshold
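The DVS policy above can be sketched as a small controller. It follows the slide: it drops to a reduced operating point when the hottest sensor exceeds the trigger, and it restores full voltage and frequency once the temperature falls below the trigger. The controller name, the 0.85 operating point and the use of the maximum sensor reading (a global policy) are hypothetical choices. The `V^2 * f` helper uses the standard approximation for CMOS dynamic power.

```python
from dataclasses import dataclass
from typing import List, Tuple

OperatingPoint = Tuple[float, float]  # (relative voltage, relative frequency)


@dataclass
class DvsController:
    """Hypothetical global DVS controller driven by the hottest on-chip sensor."""
    trigger_c: float
    high: OperatingPoint = (1.0, 1.0)
    low: OperatingPoint = (0.85, 0.85)  # hypothetical reduced operating point
    throttled: bool = False

    def update(self, sensor_temps_c: List[float]) -> OperatingPoint:
        hottest = max(sensor_temps_c)
        if hottest > self.trigger_c:
            self.throttled = True    # thermal violation: cut voltage and frequency
        elif hottest < self.trigger_c:
            self.throttled = False   # cooled below the trigger: restore full speed
        return self.low if self.throttled else self.high


def relative_dynamic_power(v: float, f: float) -> float:
    """Dynamic CMOS power scales roughly as V^2 * f (capacitance and activity held fixed)."""
    return v * v * f


ctrl = DvsController(trigger_c=82.0)
for temps in ([70.0, 80.0], [71.0, 83.5], [70.0, 81.0]):
    v, f = ctrl.update(temps)
    print(temps, (v, f), round(relative_dynamic_power(v, f), 3))
```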

  27. Fetch-throttling • Limits how often the fetch stage is allowed to proceed • Reduces activity factors throughout the pipeline • Rename-throttling • Limits the number of instructions renamed each cycle
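The two throttling policies can be sketched as per-cycle gating rules. This is a sketch under two assumptions: fetch-throttling is modeled as a fixed duty cycle, with fetch allowed once every `fetch_period` cycles, and rename-throttling is modeled as a per-cycle cap below the normal rename width. The function and parameter names are hypothetical.

```python
def fetch_allowed(cycle: int, fetch_period: int) -> bool:
    """Fetch-throttling sketch: the fetch stage proceeds only once every `fetch_period` cycles."""
    return cycle % fetch_period == 0


def renamed_this_cycle(waiting: int, rename_width: int, rename_limit: int) -> int:
    """Rename-throttling sketch: rename at most `rename_limit` instructions (below full width)."""
    return min(waiting, rename_width, rename_limit)


# Hypothetical 8-cycle window: fetch allowed every 2nd cycle, rename capped at 2 of 4.
fetch_cycles = [c for c in range(8) if fetch_allowed(c, fetch_period=2)]
print(fetch_cycles)                 # [0, 2, 4, 6]
print(renamed_this_cycle(6, 4, 2))  # 2
```

Both rules reduce how much work enters the pipeline per cycle, which lowers activity factors downstream.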

  28. Register-File Occupancy-throttling • The register file is the hottest spot on the chip • Its power is proportional to its occupancy • To reduce register-file power, limit the number of register entries to a fraction of the full size • All these techniques share a common property: by limiting the resources available to the processor, they slow it down, so it consumes less power and eventually cools below the thermal trigger level
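Register-file occupancy throttling can be sketched as an allocation cap. This sketch follows the slide's assumption that register-file power is proportional to occupancy. The class name, the per-entry power and the 80-entry size are hypothetical. Lowering the cap below the current occupancy does not evict entries here; occupancy drains as entries are released, and new allocations stall until it does.

```python
import math


class ThrottledRegisterFile:
    """Hypothetical model: power proportional to occupancy, with an occupancy cap."""

    def __init__(self, total_entries: int, power_per_entry_w: float) -> None:
        self.total_entries = total_entries
        self.limit = total_entries          # unthrottled: the full register file
        self.occupied = 0
        self.power_per_entry_w = power_per_entry_w

    def set_occupancy_fraction(self, fraction: float) -> None:
        """Cap usable entries at a fraction of the full size (at least one entry)."""
        self.limit = max(1, math.floor(self.total_entries * fraction))

    def try_allocate(self) -> bool:
        """Allocate one entry; returning False models a rename stall that slows the core."""
        if self.occupied >= self.limit:
            return False
        self.occupied += 1
        return True

    def release(self) -> None:
        if self.occupied > 0:
            self.occupied -= 1

    def power_w(self) -> float:
        return self.occupied * self.power_per_entry_w


rf = ThrottledRegisterFile(total_entries=80, power_per_entry_w=0.01)  # hypothetical sizes
rf.set_occupancy_fraction(0.5)
granted = sum(rf.try_allocate() for _ in range(100))
print(granted, rf.power_w())  # 40 allocations succeed; power is capped at 40 entries' worth
```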

  29. Performance of DTM • For workloads with low or moderate miss ratios, CMP always gives the best performance regardless of the DTM technique • For memory-bound workloads, SMT always gives better performance

  30. Performance of DTM • For CMP • Register-file throttling and fetch-throttling work equally well • For SMT • Register-file throttling is the best technique, followed by rename-throttling and then global fetch-throttling

  31. Energy of DTM • Energy consumption is a critical design criterion for: • Battery life • Energy utility costs (e.g. high-performance mobile laptops, and servers in throughput-oriented data centers such as the Google cluster architecture) • The dominant trend is that global DTM techniques tend to be more energy-efficient than local techniques in most configurations • Because a global DTM mechanism acts on the whole chip, a larger portion of the chip is cooled, resulting in larger savings

  32. The SMT architecture is superior to the ST architecture for all DTM techniques except rename-throttling

  33. For CMP • At low L2 miss rates, CMP is always superior to SMT for all DTM configurations

  34. CONCLUSIONS

  35. Conclusions • Both exhibit similar operating temperatures in current-generation process technologies, but their heating behaviors differ: • SMT: heating is caused by localized heating within certain key microarchitectural structures, such as the register file, due to increased utilization • CMP: heating is primarily caused by the global impact of increased energy output • CMP machines offer significantly more throughput than SMT machines for CPU-bound applications, and this yields significant energy-efficiency savings despite a substantial increase in power dissipation

  36. Conclusions • In an equal-area comparison, the smaller L2 cache hurts CMP's performance for L2-bound applications • CMP and SMT cores tend to perform better with different DTM techniques • In performance-oriented systems, localized DTM techniques work better for SMT cores and global DTM techniques work better for CMP cores • In energy-oriented systems, the global DVS thermal management technique offers significant energy savings

  37. REFERENCES • Yingmin Li, K. Skadron, D. Brooks, Zhigang Hu (Dept. of Computer Science, University of Virginia, Charlottesville, VA, USA), "Performance, Energy, and Thermal Considerations for SMT and CMP Architectures" • Venkatesan Packirisamy, Yangchun Luo, Wei-Lung Hung, Antonia Zhai, Pen-Chung Yew, "Efficiency of Thread-Level Speculation in SMT and CMP Architectures: Performance, Power and Thermal Perspective"

  38. THANK YOU 
