
A Fine-grained Component-level Power Measurement Method

Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen. Institute of Computing Technology, Chinese Academy of Sciences. July 28, 2011. Outline: Motivation; Design & Implementation; Experiments; Conclusion & Work in Progress.


Presentation Transcript


  1. A Fine-grained Component-level Power Measurement Method • Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen • Institute of Computing Technology, Chinese Academy of Sciences • July 28, 2011

  2. Outline • Motivation • Design & Implementation • Experiments • Conclusion & Work in Progress

  3. Outline • Motivation • Design & Implementation • Experiments • Conclusion & Work in Progress

  4. Background • Watts per server [source: The Problem of Power Consumption in Servers, Intel, 2009] • The CPU no longer dominates system power. [source: Barroso et al., The Datacenter as a Computer, 2009]

  5. Motivation • Measurement is the basis for low-power work in hardware, in modeling, and in software.

  6. Existing Measurement Method • Component level: ATX-based method • Measures components that are directly powered through ATX wires; most modern motherboards have dedicated ATX wires for the processor. • Accuracy: readings include VRM (Voltage Regulator Module) loss. • Component power is usually deduced from multiple ATX wires, which is platform dependent.

  7. Outline • Motivation • Design & Implementation • Experiments • Conclusion & Work in Progress

  8. Our Solution: A Hybrid Way • Disk & CPU • Similar to other ATX-based methods • Memory & add-in card devices • Wrapper-based method • Advantages • Accurate: direct measurement • Easy to use: no deduction needed • Portable: works on multiple platforms • [Diagram labels: Disk, CPU, Memory with wrapper, Power Supply, Current Sensor]

  9. Implementation • Prototype • Disk power • CPU power • Memory power

  10. Outline • Motivation • Design & Implementation • Experiments • Conclusion & Work in Progress

  11. Experimental Setup

  12. An Example • 401.bzip2 from SPEC CPU2006

  13. Time Granularity • The more frequently we sample power, the more detail we can see. • Observation: 5,000 samples/s is an appropriate sampling frequency at the component level.
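
To illustrate why the sampling rate matters, the sketch below block-averages a power trace down to a coarser rate and integrates energy from it. It assumes a trace is a plain list of watt readings taken at a fixed rate; the function names and the synthetic 1 ms spike are hypothetical and are not measurements from these slides.

```python
def downsample(samples, factor):
    """Average each run of `factor` consecutive samples (a partial tail is dropped)."""
    n = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def energy_joules(samples_w, rate_hz):
    """Rectangle-rule energy: each sample stands for 1 / rate_hz seconds."""
    return sum(samples_w) / rate_hz


# Hypothetical 1 s trace at 5,000 samples/s: 10 W baseline plus a 1 ms, 30 W spike.
RATE_HZ = 5000
trace = [10.0] * RATE_HZ
trace[2000:2005] = [30.0] * 5

coarse = downsample(trace, 100)  # 50 samples/s
print(max(trace), max(coarse))  # 30.0 11.0
print(energy_joules(trace, RATE_HZ), energy_joules(coarse, RATE_HZ / 100))
```

Both rates report the same energy, but at 50 samples/s the spike appears as an 11 W bump instead of a 30 W peak; that short-lived detail is what a higher sampling rate preserves.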

  14. Graph BFS (Breadth-First Search) • Higher BW, but lower power • Lower BW, but higher power

  15. Microbenchmark • malloc 512 MB • Access it with different strides • Time: 6.5 times longer • Power: slightly lower • Energy: 5.9 times higher • Two causes • Row conflicts → increase the row buffer hit rate • Many TLB misses → large pages may be more efficient • What is the relationship between performance and power?
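
Since energy is average power multiplied by runtime, the slide's two ratios already determine how much lower the power is. A quick check using only those ratios (the function name is illustrative):

```python
def implied_power_ratio(time_ratio, energy_ratio):
    """Average-power ratio implied by E = P * t: (E_new / E_old) / (t_new / t_old)."""
    return energy_ratio / time_ratio


# Ratios reported on the slide: 6.5x the time, 5.9x the energy.
r = implied_power_ratio(time_ratio=6.5, energy_ratio=5.9)
print(f"power ratio: {r:.3f}, i.e. {(1 - r) * 100:.1f}% lower")  # power ratio: 0.908, i.e. 9.2% lower
```

So "slightly lower" corresponds to roughly 9% lower average power, while the 6.5x longer runtime dominates the energy.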

  16. Random vs. Sequential • 64 MB of memory • Random vs. sequential access • Jump at least 64 B to eliminate cache hits • Large pages (2 MB) to eliminate TLB misses • Load/Store Unit % = LSU_stall_time / CPU_Cycle • Observation: It seems that DRAM power is already proportional to bandwidth. But the fact is that …
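
The access patterns and the LSU metric on this slide can be sketched as follows. This is a hypothetical illustration, not the authors' benchmark: Python only generates the order of byte offsets and computes the percentage. An actual measurement kernel would be written in a compiled language and run on 2 MB pages, and which performance counter supplies the stall cycles is left as an input.

```python
import random

CACHE_LINE = 64             # bytes: jump at least 64 B so no access hits in the cache
REGION = 64 * 1024 * 1024   # 64 MB buffer, as on the slide


def access_offsets(pattern, region=REGION, stride=CACHE_LINE, seed=0):
    """Byte offsets that touch every `stride`-sized block of the region exactly once."""
    offsets = list(range(0, region, stride))
    if pattern == "random":
        random.Random(seed).shuffle(offsets)  # the seed selects one random permutation
    elif pattern != "sequential":
        raise ValueError(f"unknown pattern: {pattern!r}")
    return offsets


def lsu_percent(lsu_stall_cycles, cpu_cycles):
    """Load/Store Unit % = LSU_stall_time / CPU_Cycle, as a percentage."""
    return 100.0 * lsu_stall_cycles / cpu_cycles


# Small region for illustration; a real run would walk the full 64 MB.
print(access_offsets("sequential", region=512))  # [0, 64, 128, 192, 256, 320, 384, 448]
print(access_offsets("random", region=512, seed=1))
print(lsu_percent(lsu_stall_cycles=3_000_000, cpu_cycles=10_000_000))  # 30.0
```

Shuffling a full list of line-aligned offsets guarantees every line is visited exactly once, so random and sequential runs move the same amount of data.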

  17. Random Access • Use different seeds to generate different random access patterns • Power varies by less than 1.1%. • Observation: • DRAM power is highly correlated with two factors • Load/Store Unit utilization • Sequential vs. random access • We can build memory power models based on these two factors rather than on bandwidth.
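
The slide proposes modeling memory power from LSU utilization and access pattern instead of bandwidth. One simple, hypothetical form is a separate least-squares line of DRAM power against LSU % for each pattern. The spread metric, the model form, and all function names below are assumptions for illustration, not the authors' model, and no measured data is included.

```python
from statistics import mean


def relative_spread(powers):
    """(max - min) / mean: one way to express run-to-run power variation across seeds."""
    return (max(powers) - min(powers)) / mean(powers)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_two_factor_model(samples):
    """samples: (lsu_percent, pattern, dram_watts) with pattern "sequential" or "random".

    Fits one line of DRAM power against LSU utilization per access pattern.
    """
    model = {}
    for pattern in ("sequential", "random"):
        xs = [u for u, pat, _ in samples if pat == pattern]
        ys = [w for _, pat, w in samples if pat == pattern]
        model[pattern] = fit_line(xs, ys)
    return model


def predict(model, lsu_percent, pattern):
    """Predicted DRAM power (W) for a given LSU % and access pattern."""
    a, b = model[pattern]
    return a + b * lsu_percent
```

Fed with measured (LSU %, pattern, DRAM watts) triples, the residuals of this fit could be compared against a bandwidth-only fit; a single regression with a 0/1 pattern indicator is an alternative if one shared slope is enough.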

  18. Outline • Motivation • Design & Implementation • Experiments • Conclusion & Work in Progress

  19. Takeaway Messages (Conclusions) • We use a hybrid approach • ATX-based: CPU/disk • Wrapper card: DRAM/… • 5 kHz is an appropriate sampling frequency to disclose fine-grained power behavior. • DRAM power is highly correlated with Load/Store Unit utilization rather than with bandwidth.

  20. Work in Progress • Upgrade the current system • Support DDR3 • Support large memory capacity • Support 40 simultaneous measurement channels • Use an FPGA to collect measured data • Correlate the measured power data with high-level semantic information

  21. Thanks! Questions?

  22. Backup

  23. Wrapper Card Design • The wrapper card already exists • We made only a few small modifications • [Diagram labels: Current Sensor, Power Supply, Signals]

  24. Memory Capacity Limitation • Normal setup: a DIMM (Dual In-line Memory Module) plugs into the DIMM slot on the motherboard

  25. Memory Capacity Limitation • With our initial wrapper card: the DIMM plugs into the wrapper card, which plugs into the DIMM slot on the motherboard

  26. Inside a DRAM Device • I/O circuitry • Runs at bus speed • Clock sync/distribution • Bus drivers and receivers • Buffering/queueing • Banks • Independent arrays • Asynchronous: independent of memory bus speed • ODT (On-Die Termination) • Required by the bus's electrical characteristics for reliable operation • Resistive element that dissipates power when the bus is active • [Diagram labels: Row Decoder, Bank 0, Sense Amps, Column Decoder, Write FIFO, Registers, Receivers, Drivers, ODT] [Source: H. David et al., Memory Power Management via Dynamic Voltage/Frequency Scaling, ICAC, 2011]

  27. DRAM Power • Can be approximately divided into: • Background power • Considered stable • Bank power • Activate/precharge • Related to the frequency of row operations • I/O power • Bursts • Proportional to bandwidth • Termination power • Termination resistors • Proportional to bandwidth
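
The four-part decomposition can be written as an additive model. This sketch takes the slide's proportionalities literally (bank power linear in row operations per second, I/O and termination power linear in bandwidth); the class and coefficient names are hypothetical, and the coefficients would have to be fitted from measurements.

```python
from dataclasses import dataclass


@dataclass
class DramPowerModel:
    """Additive DRAM power model; every coefficient is a hypothetical input, not a slide value."""
    background_w: float           # background power, treated as constant
    energy_per_activate_j: float  # bank power scales with row operations per second
    io_w_per_gbps: float          # I/O (burst) power scales with bandwidth
    term_w_per_gbps: float        # termination power scales with bandwidth

    def breakdown(self, activates_per_s, bandwidth_gbps):
        """Per-component power in watts, plus their sum under "total"."""
        parts = {
            "background": self.background_w,
            "bank": self.energy_per_activate_j * activates_per_s,
            "io": self.io_w_per_gbps * bandwidth_gbps,
            "termination": self.term_w_per_gbps * bandwidth_gbps,
        }
        parts["total"] = sum(parts.values())
        return parts


# Arbitrary illustrative coefficients and operating point.
model = DramPowerModel(1.0, 2e-9, 0.5, 0.25)
print(model.breakdown(activates_per_s=1e8, bandwidth_gbps=4.0))
```

Keeping bank power tied to row activations rather than to bandwidth is what lets this form express the slide's point that power and bandwidth need not move together.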

  28. Current Sensor • P = U × I • The supply voltage U does not fluctuate much: less than 2% on our platform. • Signal chain: DC current → CSA (Current-Sense Amplifier) → DC voltage → ADC or DMM → data collector (PC)
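
Converting a CSA reading back into power can be sketched as below. It assumes a shunt-resistor sensor whose amplifier outputs V_out = gain × I × R_shunt; the slides do not name the CSA part, gain, or shunt value, so those numbers and the function names are hypothetical.

```python
def csa_current_a(v_out, gain, r_shunt_ohm):
    """Current through the shunt, assuming the CSA outputs V_out = gain * I * R_shunt."""
    return v_out / (gain * r_shunt_ohm)


def rail_power_w(v_rail, v_out, gain, r_shunt_ohm):
    """P = U * I for one supply rail."""
    return v_rail * csa_current_a(v_out, gain, r_shunt_ohm)


def mean_power_w(v_rail, v_out_samples, gain, r_shunt_ohm):
    """Average power over a window of CSA readings taken by the ADC or DMM."""
    total = sum(rail_power_w(v_rail, v, gain, r_shunt_ohm) for v in v_out_samples)
    return total / len(v_out_samples)


# Hypothetical 12 V rail, 5 mOhm shunt, gain-50 CSA reading 1.0 V -> 4 A.
print(f"{rail_power_w(12.0, 1.0, gain=50, r_shunt_ohm=0.005):.1f} W")  # 48.0 W
```

Passing a fixed `v_rail` relies on the slide's observation that U varies by less than 2%; sampling U alongside I would remove that error.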

  29. DRAM Power • Possible reason why random-access power is not proportional to bandwidth in slide 17: • When bandwidth is low, auto-precharge (caused by refresh) means every access needs an ACTIVE, so bank power is proportional to bandwidth. • When bandwidth is high, some accesses may hit in the row buffer and need no ACTIVE, so bank power rises with a lower slope than before.
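
The hypothesis above can be made concrete with a toy calculation: if bank power is proportional to the ACTIVE rate, and only row-buffer misses need an ACTIVE, then a rising hit rate flattens the bank-power slope. The 64 B access size, the energy per ACTIVE, and the hit rates below are assumptions chosen for illustration, not measured values.

```python
ACCESS_BYTES = 64  # assumption: each DRAM access transfers one 64 B cache line


def activates_per_s(bandwidth_bytes_per_s, row_hit_rate):
    """Row-buffer misses need an ACTIVE; hits reuse the already-open row."""
    accesses_per_s = bandwidth_bytes_per_s / ACCESS_BYTES
    return accesses_per_s * (1.0 - row_hit_rate)


def bank_power_w(bandwidth_bytes_per_s, row_hit_rate, energy_per_activate_j):
    """Bank power taken as proportional to the ACTIVE rate."""
    return energy_per_activate_j * activates_per_s(bandwidth_bytes_per_s, row_hit_rate)


# Doubling bandwidth doubles bank power only while the hit rate stays at zero.
E_ACT = 2e-9  # hypothetical energy per ACTIVE, in joules
low = bank_power_w(1e9, 0.0, E_ACT)
high_no_hits = bank_power_w(2e9, 0.0, E_ACT)
high_some_hits = bank_power_w(2e9, 0.2, E_ACT)
print(high_no_hits / low, high_some_hits / low)
```

With a 20% hit rate at the higher bandwidth, doubling bandwidth raises bank power by only 1.6x, which matches the "lower slope" described on the slide.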
