Scalably verifiable dynamic power management

  1. Scalably verifiable dynamic power management Opeoluwa Matthews, Meng Zhang, and Daniel J. Sorin 20th International Symposium on High Performance Computer Architecture (HPCA) Orlando, Florida, February 17-19, 2014 Presented by Krishnaprasad K and Yashas Krishna

  2. Some Background • Power management is one of the biggest problems in today's computer systems • It covers how much power each component gets, when that power is given, how the system gets power when needed, etc. • Two approaches to power management: • Static power management: pre-allocate power to each component • Dynamic power management (DPM): allocate power when needed, e.g., dynamic voltage/frequency scaling (DVFS)

  3. Problems with DPM • Designing DPM is difficult because of the increasing scale of computer systems • Cores per processor are increasing • Processors per system are increasing • Challenges for efficient DPM: • Scalability: must scale to large systems • Verifiability: must verify correctness in all situations • Scalability hurts verifiability, yet there are no automated methods to verify DPM at scale

  4. Important Factors in DPM • Scalability • Scale drives power consumption: a high-scale system has a high power requirement, a low-scale system a low one • Verification of DPM and its benefits • Finds bugs in the DPM scheme • Proves the DPM scheme correct • If verification is not done: components can overheat, leading to system failure and damage • So a scalably verifiable DPM is needed

  5. Contents • Existing system model and its issues • Introducing a new DPM system: fractal DPM • How is verification possible in the new system? • Fractal DPM vs. performance: tradeoffs • Evaluation of the new system • Implementation strategy • Comparison to prior work • Conclusion

  6. Initial System Model • DPM model • Dynamically allocate power to each component Ci • Power allotted is proportional to the current performance Xi, where Xi = f(current power allocation Pi, current unconstrained performance Xmax_i) • Initial setting: set a power budget, then allot power to the components so as to maximize the Xi while satisfying the budget, Sum(Pi) <= Budget (a sketch follows below) • Power/performance model: 5 possible power settings for each Ci • Low (L) • Medium-Low (ML) • Medium (M) • Medium-High (MH) • High (H)
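A minimal sketch of this abstract model, assuming hypothetical integer weights for the five settings and a simple greedy allocator (the slides specify neither; both are illustrative):

```python
# Illustrative sketch of the abstract DPM model; weights and policy are assumptions.
SETTINGS = ["L", "ML", "M", "MH", "H"]
POWER = {"L": 1, "ML": 2, "M": 3, "MH": 4, "H": 5}  # hypothetical weights

def perf(setting, xmax):
    """X_i = f(P_i, Xmax_i): more power helps until the CR's unconstrained perf."""
    return min(POWER[setting], xmax)

def allocate(xmaxes, budget):
    """Raise settings one step at a time while the budget allows."""
    alloc = ["L"] * len(xmaxes)               # start at the lowest setting
    used = sum(POWER[s] for s in alloc)
    changed = True
    while changed:
        changed = False
        for i, s in enumerate(alloc):
            nxt = SETTINGS.index(s) + 1       # one setting higher costs one unit
            if nxt < len(SETTINGS) and used + 1 <= budget:
                alloc[i], used, changed = SETTINGS[nxt], used + 1, True
    return alloc

settings = allocate([5, 3, 2], budget=10)
print(settings, sum(perf(s, x) for s, x in zip(settings, [5, 3, 2])))
```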

  7. Initial Model: Issues • Design using existing tools • Fully automated formal verification methodology • Tool: the MurΦ model checker • Exhaustive state-space search • Checks whether the invariant is satisfied or not • Issue: state-space explosion • As the number of components Ci grows, the number of states grows exponentially, making it infeasible to traverse them all • E.g., 5 components with 5 settings each already yield 5^5 combinations • Typical workaround: check a small-scale system and, if it is satisfied, assume the large-scale system is also satisfied, which need not always be true
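A quick illustration of the blow-up (power settings only; a real model-checker state also multiplies in controller and message state, so these counts are lower bounds):

```python
# Lower bound on the model checker's state count: 5 power settings per component.
for n_components in (2, 5, 10, 20):
    print(n_components, "components ->", 5 ** n_components, "combinations")
```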

  8. Fractal DPM Design • Fractal design: a design in which the system behaves the same at every scale • This makes inductive verification possible • Base case: verify that the minimum system satisfies its power constraints • Inductive step: verify that larger systems are equivalent to smaller systems • Both steps are done using MurΦ
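One way to write the inductive argument down (a paraphrase of the slide, with S_k a system of scale k, I the power invariant, and ≈ observational equivalence):

```latex
% Base case: the minimum system satisfies the invariant.
S_{\min} \models I
% Inductive step: a system one scale larger is observationally
% equivalent to the smaller one, so the invariant carries over.
S_{k+1} \approx S_k \;\Longrightarrow\; \left( S_k \models I \implies S_{k+1} \models I \right)
```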

  9. Fractal System Organization • Hierarchical structure: a binary tree • Leaves: computing resources (CRs) • Intermediate nodes: DPM controllers (DPMCs) • Record the power states of their child nodes • Handle the power requests of the CRs • Power requests • A CR can request more power by sending a request to its DPM controller (its parent) • The DPM controller responds either directly or by passing the request up to its own parent controller • A DPM controller and its two children are treated as a single "node", just like a single CR • Each such node has a combined power setting: the average of its children's settings L:R

  10. Fractal System Organization • The L:R notation gives the power setting of the left child : the power setting of the right child • E.g., if the children are at H and L, the average is MH
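A minimal sketch of this hierarchy; the `Node` class is illustrative, and the exact rounding rule for the "average" of a pair is an assumption (only the H/L example above is given in the source):

```python
import math

# Leaves are computing resources (CRs); interior nodes are DPM controllers.
# A controller plus its two children acts as one node whose combined
# setting summarizes the pair written L:R.
SETTINGS = ["L", "ML", "M", "MH", "H"]

class Node:
    def __init__(self, setting=None, left=None, right=None):
        self.setting, self.left, self.right = setting, left, right

    def combined(self):
        if self.setting is not None:              # leaf CR
            return self.setting
        l = SETTINGS.index(self.left.combined())
        r = SETTINGS.index(self.right.combined())
        return SETTINGS[math.ceil((l + r) / 2)]   # assumed round-up average

pair = Node(left=Node("H"), right=Node("MH"))
print(pair.combined())  # an H:MH pair presents itself as H to its parent
```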

  11. Fractal System Organization (figure)

  12. Fractal Power Invariant • The invariant itself must be fractal, i.e., applicable at every scale of the system • This is the plus point of fractal DPM that makes it unique among DPM schemes • The fractal invariant: it is impossible for both children of a DPM controller to be at the High power setting at the same time • Why? It guards the cases where Sum(Pi) would exceed the budget, and it limits system-wide power consumption • Limitation: other invariants are not considered or compared; that is future work
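The invariant is easy to state as a recursive check over the tree (reusing the `Node` sketch above; this is an illustration, not the MurΦ model):

```python
def invariant_holds(node):
    """Fractal invariant: no DPM controller has both children at H."""
    if node.setting is not None:          # leaf CR: nothing to check
        return True
    if node.left.combined() == "H" and node.right.combined() == "H":
        return False                      # H:H is forbidden at every scale
    return invariant_holds(node.left) and invariant_holds(node.right)

print(invariant_holds(Node(left=Node("H"), right=Node("H"))))  # False
```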

  13. Fractal DPM: Specification • Table-based specification method • Each entry in the table corresponds to a state/event combination, and the entry specifies what happens in that situation (a sketch follows after the next slide)

  14. Specification Continued • Special states: • Pend-*: a family of pending states in which the computing resource has requested a new power state and is waiting for a response • Block-*: a family including states such as block-L:ML, in which the DPM controller has granted or denied a request to a child and is blocked waiting on the Ack from the child, after which it goes to state L:ML • Specification of the root DPMC • Same as a non-root DPMC, except the root has no parent DPMC to request power from • So it has no pending states, only block states • A non-root DPMC handles a request by itself when its node state stays unchanged; otherwise it passes the request to its parent DPMC • 4 exceptions: cases where the invariant would not be satisfied
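To make the table idea concrete, here is a hypothetical fragment of such a state/event table; the specific states, events, and entries are illustrative, not copied from the paper:

```python
# Hypothetical fragment: (state, event) -> (action, next state).
SPEC = {
    ("L:ML", "req_M_from_left"):     ("grant", "block-M:ML"),
    ("block-M:ML", "ack_from_left"): ("none", "M:ML"),
    ("MH:H", "req_H_from_left"):     ("deny", "block-MH:H"),  # would create H:H
}

def step(state, event):
    """Table-driven controller: look up what happens in this situation."""
    return SPEC[(state, event)]

print(step("L:ML", "req_M_from_left"))  # ('grant', 'block-M:ML')
```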

  15. Fractal DPM: Scalability Issues • At high scale the tree height increases, so requests from the leaves to the root take more time (more hops), raising latency • Possible solution: a multi-degree tree, which reduces the height of the tree • Problem: MurΦ doesn't support this, so it couldn't be verified • Scalability issues are no big concern in practice: • The latency of DPM itself is not critical • Many requests can be satisfied without traveling far up the tree • Experimental results on a real, modestly sized system (16 computing resources) show the latencies are reasonable
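The latency concern is just tree height: a worst-case request crosses on the order of log2(N) controller levels. A quick illustration:

```python
import math

# Worst-case hops from a leaf CR up to the root of a binary DPM tree.
for n_crs in (2, 16, 256, 4096):
    print(n_crs, "CRs ->", int(math.log2(n_crs)), "controller levels")
```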

  16. Verification of Fractal DPM • Scalable verification: the verification effort is independent of the number of CRs • Steps: • Base-case verification • Inductive-step verification • Base case: minimum-system verification • The base system must be complete, i.e., include all basic components • An incomplete base system, in which some elements are not considered, gives incomplete verification and spurious actions • MurΦ verifies whether the invariant is satisfied

  17. Base Case: Minimum System Verification (figure)

  18. Verification of Fractal DPM • Inductive step: equivalence verification • Observational equivalence verification is chosen • Only the outside behavior of systems of different scales is considered; internal actions are ignored • Only how the system reacts to inputs matters • Two perspectives: • Looking down: when the system is scaled downwards • Looking up: when the system is scaled upwards • In both cases, verify that the larger system behaves the same as the subsystem • Tool: MurΦ is used; using the same tool for both steps decreases translation errors • On-the-fly mode: no extra state space is stored
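A toy sketch of the idea behind observational equivalence: feed identical external inputs to two black-box systems and compare only their visible outputs (the systems and events below are stand-ins, not how MurΦ actually checks equivalence):

```python
def observationally_equivalent(sys_a, sys_b, event_sequences):
    """Feed identical external inputs to both systems and compare only
    the externally visible outputs; internal actions are ignored."""
    for events in event_sequences:
        if [sys_a(e) for e in events] != [sys_b(e) for e in events]:
            return False
    return True

# Stand-in 'systems' that differ internally but respond identically.
small = lambda e: "deny" if e == "req_H" else "grant"
large = lambda e: "grant" if e != "req_H" else "deny"
print(observationally_equivalent(small, large, [["req_M", "req_H"]]))  # True
```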

  19. Equivalence Verification (figure)

  20. Power Management Efficiency • System-wide power consumption is upper-bounded • Max power consumed: (C-1)·MH + H • As C approaches infinity, the max average power per CR approaches MH • Fractal DPM allows all CRs to be at MH • It does not permit certain allocations, which causes some inefficiency; that is the tradeoff for the fractal invariant • But such cases are rare and the inefficiency caused is small • Another inefficiency: fractal DPM can force one CR down from H to MH
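Taking the slide's formula at face value, the limit is simple arithmetic; with hypothetical numeric weights (MH = 4, H = 5):

```python
MH, H = 4, 5  # hypothetical weights for the two highest settings

for c in (2, 8, 64, 1024):
    max_power = (c - 1) * MH + H
    print(c, "CRs: max average power per CR =", round(max_power / c, 3))
# The average tends to MH (= 4) as C grows.
```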

  21. Evaluation of the System • Goal: does fractal DPM actually do its job well, i.e., allocate power to the CRs dynamically and efficiently? • Simulation methodology (sketched below): • Dynamically set Xmax_i for all CRs, changing it at each time step • Give numeric weights to the power settings • Model the behavior of the CRs and DPMCs with the specification tables • Compute the performance of each CR per time step as a function of the power it is granted by the DPM
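A sketch of that simulation loop; the Xmax distribution, the step count, and the `allocate`/`perf` callbacks are placeholders for the paper's actual tables and functions:

```python
import random

def simulate(n_crs, n_steps, allocate, perf):
    """Per time step: redraw each CR's unconstrained performance, rerun
    the DPM allocator, and score each CR by the power it was granted."""
    total = 0.0
    for _ in range(n_steps):
        xmaxes = [random.randint(1, 5) for _ in range(n_crs)]  # assumed range
        settings = allocate(xmaxes)        # e.g. the fractal DPM policy
        total += sum(perf(s, x) for s, x in zip(settings, xmaxes))
    return total / n_steps                 # mean system performance per step

# Trivial stand-ins: everyone at M, performance capped at min(3, Xmax).
print(simulate(8, 100, lambda xs: ["M"] * len(xs), lambda s, x: min(3, x)))
```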

  22. Performance Modeling • How to determine the performance of a given CR at a given power setting? • Each CR can use power in a different way and may achieve different performance at the same setting • Abstractly: performance is a function of Pi and Xmax_i • Two functions: • Perf1: decreasing marginal performance benefit • E.g., using more power to enable a faster core clock frequency helps performance, but eventually performance becomes memory-bound • Perf2: linear performance benefit • E.g., ideal voltage/frequency scaling
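The two shapes are easy to write down; the exact curves below (a saturating exponential for perf1, a straight line for perf2) are assumptions consistent with the slide's descriptions, not the paper's formulas:

```python
POWER = {"L": 1, "ML": 2, "M": 3, "MH": 4, "H": 5}  # hypothetical weights

def perf1(setting, xmax):
    """Diminishing returns: each extra unit of power helps less, and
    performance saturates toward the CR's unconstrained perf."""
    return xmax * (1 - 0.5 ** POWER[setting])

def perf2(setting, xmax):
    """Linear benefit, as with ideal voltage/frequency scaling."""
    return xmax * POWER[setting] / POWER["H"]

print(perf1("H", 10), perf2("H", 10))  # 9.6875 vs 10.0
```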

  23. Performance Comparison and Results • Compared against an idealized oracle DPM that makes the best possible allocations, even H:H allocations • Results (with #CRs = 8): • In the majority of time steps (>72%), performance(FDPM) = performance(Oracle) • The performance gap is never more than 37% for perf1 and 46% for perf2 • The gap is larger for perf2 because perf2 models greater performance at higher power states, so being held at a lower power state (to maintain the fractal invariant) is somewhat more costly • Thus the amount of performance sacrificed is small

  24. Implementation Strategy • Dynamic voltage/frequency scaling (DVFS) is the power-adjustment strategy • V/F is adjusted at core-pair granularity, which is possible because of the fractal structure • CRs and DPMCs are implemented as Linux daemons communicating through sockets • Optimization, OptiFDPM: the CR re-requests the next lower power setting if its current request is rejected • The optimized version retains the scalable verifiability of FDPM
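The OptiFDPM retry is a simple client-side loop; `send_request` below is a hypothetical stand-in for one socket round trip to the parent DPMC daemon:

```python
SETTINGS = ["L", "ML", "M", "MH", "H"]

def opti_request(send_request, wanted):
    """OptiFDPM-style retry: walk down from the wanted setting until a
    request is granted. send_request(setting) -> bool stands in for one
    socket round trip to the parent DPMC daemon (hypothetical interface)."""
    i = SETTINGS.index(wanted)
    while i > 0:
        if send_request(SETTINGS[i]):
            return SETTINGS[i]    # granted at this level
        i -= 1                    # rejected: re-request the next lower setting
    return "L"                    # the lowest setting needs no grant

# Example: a controller that refuses H (say its sibling already holds H).
print(opti_request(lambda s: s != "H", "H"))  # MH
```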

  25. Evaluation of the Implementation • Compare the power and performance of fractal DPM against an un-implementable oracle DPM scheme that always assigns the optimal power levels to core pairs • Compare the power and performance of fractal DPM against a provably correct power management scheme that statically sets all cores to a given power level • Determine the latency to service requests for new power levels

  26. Evaluation of the Implementation • Comparison to oracle power management (figure)

  27. Evaluation of the Implementation • Comparison to static power management (figure)

  28. Evaluation of the Implementation • Latency (figure)

  29. Comparison to Previous Works • Lungu et al.'s research on verifiable DPM for multicore processors [9] • Observed that DPM schemes cannot be verified at large scale; showed the state-space explosion • Zhang et al.'s work on fractal coherence [14] • The idea of fractal design is derived from it; this is the first time it is used for DPM • Other works on DPM [6][8][10] • Did not use verification

  30. Conclusion • A design for scalably verifiable DPM • Uses fractal design for verifiability • Costs only a small amount of performance and efficiency • On par with the oracle model

  31. Reference • [1] D. Bergamini, N. Descoubes, C. Joubert, and R. Mateescu, “BISIMULATOR: A Modular Tool for On-the-Fly Equivalence Checking,” in Proceedings of TACAS’05, volume 3440 of LNCS, 2005, pp. 581–585. • [2] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2008. • [3] C.-T. Chou, P. Mannava, and S. Park, “A Simple Method for Parameterized Verification of Cache Coherence Protocols,” in Formal Methods in Computer-Aided Design, 2004, pp. 382–398. • [4] G. Dhiman, K. K. Pusukuri, and T. Rosing, “Analysis of Dynamic Voltage Scaling for System Level Energy Management,” in Proceedings of the 2008 Conference on Power Aware Computing and Systems, 2008. • [5] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang, “Protocol Verification as a Hardware Design Aid,” in IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1992, pp. 522–525.

  32. Reference • [6] A. Efthymiou and J. D. Garside, “Adaptive Pipeline Depth Control for Processor Power-Management,” in Proceedings of the IEEE International Conference on Computer Design, 2002. • [7] J.-C. Fernandez, H. Garavel, A. Kerbrat, L. Mounier, R. Mateescu, and M. Sighireanu, “CADP - A Protocol Validation and Verification Toolbox,” in Proceedings of the 8th International Conference on Computer Aided Verification, 1996, pp. 437–440. • [8] C. Isci, A. Buyuktosunoglu, C.-Y. Cher, P. Bose, and M. Martonosi, “An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. • [9] A. Lungu, P. Bose, D. J. Sorin, S. German, and G. Janssen, “Multicore Power Management: Ensuring Robustness via Early-Stage Formal Verification,” in Proceedings of the Seventh ACM-IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), 2009. • [10] R. Maro, Y. Bai, and R. I. Bahar, “Dynamically Reconfiguring Processor Resources to Reduce Power Consumption in High-Performance Processors,” in Proceedings of the Workshop on Power-Aware Computer Systems, pp. 97–111, Nov. 2000.

  33. Reference • [11] S. Park, S. Das, and D. L. Dill, “Automatic Checking of Aggregation Abstractions Through State Enumeration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1202–1210, Nov. 2006. • [12] S. Park and D. L. Dill, “Verification of FLASH Cache Coherence Protocol by Aggregation of Distributed Transactions,” in Proceedings of the Eighth ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 288–296. • [13] D. J. Sorin, M. Plakal, M. D. Hill, A. E. Condon, M. M. K. Martin, and D. A. Wood, “Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol,” IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 6, pp. 556–578, Jun. 2002. • [14] M. Zhang, A. R. Lebeck, and D. J. Sorin, “Fractal Coherence: Scalably Verifiable Cache Coherence,” in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010.
