
Combining Software and Hardware Monitoring for Improved Power and Performance Tuning



Presentation Transcript


  1. Combining Software and Hardware Monitoring for Improved Power and Performance Tuning. Eric Chi, A. Michael Salem, and R. Iris Bahar (Brown University, Division of Engineering); Richard Weiss (Hampshire College, School of Cognitive Science)

  2. Motivation
     • Performance drives high-end processor design
     • Include many complex architectural features
     • Resources may not always be optimally utilized
     • Resources dissipate some power regardless of utilization
     • Dynamic schemes allow processor to reconfigure resources according to program’s needs
     • Some means of monitoring program is needed to drive reconfiguration
     BARC January 30, 2003

  3. Monitoring Options
     • Hardware monitoring
       • Relatively easy to implement
       • Can easily adjust to changing patterns
       • Must first recognize pattern before reacting
       • Restricted to fixed-sized sampling windows
     • Software profiling
       • Reconfiguration occurs in anticipation of changing needs
       • Sampling ranges are adaptable
       • Requires instruction annotation and initial sampling overhead
       • Only applicable to instructions with very deterministic behavior

  4. Why Not Combine?
     • Each has its particular benefits
     • If hardware and software techniques can be combined, can we improve the control policies driving processor reconfiguration?
     • Combining them could potentially lead to better energy savings and higher overall performance

  5. Our Goal
     • Have HW and SW profiling work together to better identify program behavior
     • Allow processor to react more quickly to strongly deterministic behavior
     • Allow HW monitoring to assist with hard-to-predict cases with hints from software profiling

  6. Low Power Configurations
     • We consider 2 different configurations separately:
     • Reducing issue width and ALUs
       • Save power in issue queue arbitration logic
       • Save power from underutilized ALUs
     • Fetch Halting
       • Triggered by a critical load missing to main memory
       • Fetching is disabled for the duration of the miss
       • Reduces occupancy rates in fetch and issue queues
       • Reduces number of wrong path instructions fetched

  7. Pipeline Organization
     [Figure: pipeline block diagram. Labeled blocks: Fetch Unit, Instruction Cache, Branch Predictor, Instruction Decoder, Annotation Decoder, Instruction Scheduler, Register File, Integer ALU Clusters 1 and 2, Floating Point ALU Clusters 1 and 2, three Load/Store Units, Data Cache, and Low-Power State Logic. Low-power control labels: "Disable Fetch Unit" and "Disable auxiliary ALU cluster and reduce issue width".]

  8. Adjusting Issue Width
     • Adjust issue width between 8 and 4 and disable second integer ALU cluster
     • SW approach profiles IPC from train dataset
       • Annotates blocks with low IPC
       • Decoding start of block triggers entry to LP mode
     • HW approach uses built-in counters to monitor IPC
       • Use fixed 256-cycle window
       • If integer IPC < threshold, enter LP mode
     • Combined approach
       • SW steers blocks with consistent behavior
       • HW handles remaining blocks
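
The hardware and combined policies on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `IpcWindowMonitor`, the function `combined_mode`, and the threshold value are hypothetical, and the slides do not say how low-power mode is exited, so the sketch simply re-evaluates the mode at the end of every 256-cycle window. It also assumes that only low-IPC blocks carry a software annotation, as the slide suggests.

```python
WINDOW_CYCLES = 256  # fixed hardware sampling window, as stated on the slide


class IpcWindowMonitor:
    """Hypothetical HW monitor: integer IPC over fixed-size cycle windows."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed value; none is given in the slides
        self.cycles = 0
        self.insts = 0
        self.low_power = False

    def tick(self, int_insts_this_cycle):
        """Advance one cycle and return the current low-power decision."""
        self.cycles += 1
        self.insts += int_insts_this_cycle
        if self.cycles == WINDOW_CYCLES:
            # The decision only changes at a window boundary, so the
            # monitor must observe a pattern before it can react to it.
            ipc = self.insts / WINDOW_CYCLES
            self.low_power = ipc < self.threshold
            self.cycles = 0
            self.insts = 0
        return self.low_power


def combined_mode(sw_annotated_low_ipc, hw_low_power):
    """Assumed combination rule: SW steers annotated blocks, HW the rest."""
    if sw_annotated_low_ipc:
        return True  # annotated block: enter low-power mode at decode
    return hw_low_power  # unannotated block: defer to the HW monitor


monitor = IpcWindowMonitor(threshold=2.0)
for _ in range(WINDOW_CYCLES):
    mode = monitor.tick(1)  # one integer instruction per cycle
print(mode)
```

Under these assumptions the software annotation takes effect as soon as the block is decoded, while the hardware path lags by up to one full window, which is the trade-off listed under Monitoring Options.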

  9. Results for Reduced Issue Width
     • SW and HW results are comparable
     • Combined results show that SW + HW methods identify different opportunities for saving power

  10. Results for Reduced Issue Width
     • SW performance is more consistent because thresholds can be tuned on a per-application basis

  11. Fetch Halting
     • Requires a combination of SW and HW monitoring:
     • SW profiling:
       • Identify critical loads that miss to main memory
       • IPC, occupancy rates, dead cycles, “miss stride”
     • HW monitoring:
       • Using annotations from SW profiling, HW tracks miss behavior only for “promising” load instructions
       • Miss stride from annotations is compared to miss counter in HW to capture dynamic miss behavior
       • For now we simulate a perfect miss-predictor
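
One way the comparison between the annotated miss stride and a hardware miss counter might work is sketched below. This is an assumption-heavy illustration: the slides do not define "miss stride", so the sketch interprets it as the number of executions of a load between consecutive misses, and the class `MissStridePredictor` and its methods are hypothetical. The slides state that the authors currently simulate a perfect miss predictor, so this is not their mechanism.

```python
class MissStridePredictor:
    """Hypothetical per-load counter compared against an annotated stride."""

    def __init__(self, strides):
        # strides maps load PC -> miss stride taken from SW annotations.
        # Only annotated ("promising") loads are tracked at all.
        self.strides = dict(strides)
        self.since_miss = {pc: 0 for pc in self.strides}

    def predict_miss(self, pc):
        """Return True if this execution of the load is expected to miss."""
        if pc not in self.strides:
            return False  # unannotated loads never trigger fetch halting
        return self.since_miss[pc] + 1 >= self.strides[pc]

    def update(self, pc, missed):
        """Record the actual outcome so the counter follows dynamic behavior."""
        if pc not in self.strides:
            return
        if missed:
            self.since_miss[pc] = 0
        else:
            self.since_miss[pc] += 1


predictor = MissStridePredictor({0x40: 3})  # load at 0x40 misses every 3rd time
outcomes = []
for actual_miss in (False, False, True, False):
    outcomes.append(predictor.predict_miss(0x40))
    predictor.update(0x40, actual_miss)
print(outcomes)
```

A prediction of `True` would be the point at which fetch is halted for the duration of the miss. Resetting the counter on every observed miss lets the hardware resynchronize when the run-time miss pattern drifts from the profiled one.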

  12. Fetch Halting Potential
     • Memory access rates show that the fetch halting potential varies from benchmark to benchmark

  13. Results for Fetch Halting
     • Restricting fetch halting based on criticality information benefits performance

  14. Fetch Halting and RUU Occupancy
     • Perfect + crit results in an average 10% drop in RUU occupancy

  15. Conclusions and Future Work
     • HW and SW predict different low power events and can be combined, offering greater power saving potential
     • Future work:
       • Improve HW/SW combination scheme
       • Improve criticality predictor
       • Currently working on HW miss predictor
       • Adjust the halt period
