Presentation Transcript


  1. Scalably Verifiable Dynamic Power Management Opeoluwa (Luwa) Matthews, Meng Zhang, and Daniel J. Sorin Duke University HPCA-20 Orlando, FL, February 19, 2014

  2. Executive Summary • Dynamic Power Management (DPM) is used to improve power-efficiency at several levels of the computing stack • -within a multicore chip, across servers in a datacenter, etc. • Deploying a DPM scheme is risky if not fully verified • -difficult to verify a scheme for large-scale systems • Our contribution: Fractal DPM • -a framework for designing scalably verifiable DPM • -implement Fractal DPM on a 2-chip (16-core) system • -experimental evaluation on a real system HPCA-20 Orlando, FL, February 19, 2014

  3. Dynamic Power Management • DPM aims to: • -dynamically allocate power to computing resources (e.g. cores, chips, servers, etc.) • -attain the best performance at a given power budget • -achieve the lowest power consumption for a desired performance • [Diagram: n cores in a CMP send "Request Power" messages to a DPM controller, which grants or denies each request] HPCA-20 Orlando, FL, February 19, 2014

  4. Dynamic Power Management • DPM aims to: • -dynamically allocate power to computing resources (e.g. cores, chips, servers, etc.) • -attain the best performance at a given power budget • -achieve the lowest power consumption for a desired performance • [Diagram: n machines in a datacenter send "Request Power" messages to a DPM controller, which grants or denies each request] HPCA-20 Orlando, FL, February 19, 2014

  5. Case for Dynamic Power Management • Chips have hit the power density ceiling [Hennessy and Patterson, Computer Architecture] HPCA-20 Orlando, FL, February 19, 2014

  6. Case for Dynamic Power Management • Datacenters consume increasing amounts of power • Reducing cloud electricity consumption by half would save as much electricity as the UK consumes [hp.com] • [Figure: map of the UK compared with the cloud's electricity footprint] HPCA-20 Orlando, FL, February 19, 2014

  7. Case for Verifiable DPM • DPM can greatly improve energy efficiency • Unverified DPM could • -overshoot the power budget → system damage • -underutilize resources • -deadlock • Want formal verification • - prove correctness for all possible DPM allocations • - guarantee safety of the DPM scheme HPCA-20 Orlando, FL, February 19, 2014

  8. Why Scalably Verifiable DPM is Hard • CMPs and datacenters have many computing resources • n computing resources (CRs), S power states per CR → Sⁿ possible DPM states • Checking Sⁿ states is intractable for typical values of S and n HPCA-20 Orlando, FL, February 19, 2014
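
To make the blow-up concrete (a back-of-the-envelope illustration added to this transcript, not a number from the slides), here is the count for the five power states used later in this deck and a 16-resource system, e.g. one CR per core of the 16-core system evaluated at the end:

```python
# Illustrative only: state-space growth for monolithic DPM verification.
# Assumes S = 5 power states per computing resource (L, LM, M, MH, H, as in the
# Fractal DPM design) and n = 16 computing resources.
S, n = 5, 16
print(f"S^n = {S**n:,} power-state combinations")  # 152,587,890,625
```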

  9. Hypothesis and Assumptions Problem: verification of existing DPM protocols is unscalable Hypothesis: We can design DPM such that it is scalably verifiable -key idea: design DPM amenable to inductive verification -change architecture to match verification methodologies Approach: -abstract away details of computing resources -abstract power states – e.g. Medium power -focus on decision policy (not mechanism e.g. DVFS) HPCA-20 Orlando, FL, February 19, 2014

  10. Outline • Background and Motivation • Fractal DPM • Experimental Evaluation • Conclusions HPCA-20 Orlando, FL, February 19, 2014

  11. Our Inductive Approach • Induction is key to scalable verification → can prove DPM correct for an arbitrary number of computing resources • Base case: a small-scale system with few CRs is correct • - small enough that it's easy to verify with existing tools • Inductive step: the system behaves the same at every scale → fractal behavior • Prove base case + prove inductive step → DPM scheme is correct for any number of CRs • Approach is more general than DPM; borrowed from prior work on coherence protocols [Zhang 2010] HPCA-20 Orlando, FL, February 19, 2014
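
Restated schematically (this transcript's paraphrase, not notation from the paper): let P(k) mean "the DPM protocol over k computing resources satisfies its invariant"; the argument is ordinary induction over system size.

```latex
% Schematic form of the argument (paraphrase added to this transcript)
\underbrace{P(k_0)}_{\text{base case, model checked}}
\;\land\;
\underbrace{\big(\forall k \ge k_0:\ P(k) \Rightarrow P(k+1)\big)}_{\text{inductive step, via fractal behavior}}
\;\Longrightarrow\;
\forall n \ge k_0:\ P(n)
```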

  12. Attaining Scalable Verification - base case of induction • CRs request power from the DPM controller • The DPM controller grants or denies each request • Few states → easy to verify that DPM is correct • note: over-simplified base case for now • [Diagram: two CRs send "Request Power" to a DPM-C, which replies Grant/Deny] HPCA-20 Orlando, FL, February 19, 2014

  13. Attaining Scalable Verification - base case of induction • Base Case • -Refine our base case a little • -Need all types of structures: CR, DPM-C, Root DPM-C • [Diagram: a Root DPM-C at the top, a DPM-C beneath it, and CRs as leaves] HPCA-20 Orlando, FL, February 19, 2014

  14. Attaining Scalable Verification - inductive step • Behavior must be fractal • [Diagram: a DPM-C and two CRs exchanging Request Power and Grant/Deny messages] HPCA-20 Orlando, FL, February 19, 2014

  15. Attaining Scalable Verification - inductive step • Can scale the system by replacing a CR with a larger system • {DPM-C + 2 CRs} "behaves just like" 1 CR • observational equivalence • [Diagram: one CR is replaced by a DPM-C with two CRs; the Request Power and Grant/Deny interface is unchanged] HPCA-20 Orlando, FL, February 19, 2014

  16. Attaining Scalable Verification - observational equivalence • Inductive Step – Two Observational Equivalences • 1) "Looking-down" equivalence check • [Diagram: subsystem A in the small system and subsystem A' in the large system, each observed from P1] • Observed externally from P1, A and A' behave the same HPCA-20 Orlando, FL, February 19, 2014

  17. Attaining Scalable Verification - observational equivalence • Inductive Step – Two Observational Equivalences • 2) "Looking-up" equivalence check • [Diagram: subsystem B in the small system and subsystem B' in the large system, each observed from P2] • Observed externally from P2, B and B' behave the same • By induction, the protocol is correct for all scales HPCA-20 Orlando, FL, February 19, 2014

  18. Fractal DPM Design • A CR can be in 1 of 5 power states: L(ow), LM, M(ed), MH and H(igh) • DPM controller state is <Left Child State>:<Right Child State> • A parent DPM controller "sees" a child DPM controller in its averaged state • Example: Avg(H:L) = M • [Diagram: a child controller in H:L (CRs in H and L) is seen by the root as M; the root's state is M:L] HPCA-20 Orlando, FL, February 19, 2014

  19. Fractal DPM Design • A CR can be in 1 of 5 power states: L(ow), LM, M(ed), MH and H(igh) • DPM controller state is <Left Child State>:<Right Child State> • A parent DPM controller "sees" a child DPM controller in its averaged state • Example: Avg(MH:H) = H • [Diagram: a child controller in MH:H (CRs in MH and H) is seen by the root as H; the root's state is H:L] HPCA-20 Orlando, FL, February 19, 2014
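
A minimal sketch of the averaging rule implied by these two examples. The exact tie-breaking rule for odd sums is not stated on the slides; rounding up is assumed here because it reproduces Avg(MH:H) = H.

```python
# Sketch only: child-state averaging with the five states treated as evenly
# spaced and half-way averages rounded up (assumption; matches Avg(MH:H) = H).
from math import ceil

STATES = ["L", "LM", "M", "MH", "H"]
INDEX = {s: i for i, s in enumerate(STATES)}

def avg(left: str, right: str) -> str:
    """Averaged state a parent controller 'sees' for a child controller."""
    return STATES[ceil((INDEX[left] + INDEX[right]) / 2)]

assert avg("H", "L") == "M"    # slide 18
assert avg("MH", "H") == "H"   # slide 19
```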

  20. Fractal DPM Design - fractal invariant • Fractal design + inductive proof → the invariant must also be fractal • - The invariant must apply at every scale of the system • - Not OK to specify, e.g., "< 75% of all CRs are in the H state" • Our fractal invariant: the children of a DPM controller are never both in H • [Diagram: two ILLEGAL examples: a controller whose two CRs are both in H, and a root that sees a CR in H and a child controller in H:MH (averaged to H)] HPCA-20 Orlando, FL, February 19, 2014
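
A minimal sketch of the invariant as a recursive check, building on the avg() helper from the previous sketch. The tree encoding (a CR as a state string, a controller as a pair of subtrees) is hypothetical, purely for illustration.

```python
def observed_state(node):
    """State a parent sees: a CR's own state, or a controller's averaged state."""
    if isinstance(node, str):                 # a CR
        return node
    left, right = node                        # a controller with two subtrees
    return avg(observed_state(left), observed_state(right))

def fractal_invariant_holds(node) -> bool:
    """Fractal invariant: no controller sees both of its children in H."""
    if isinstance(node, str):
        return True
    left, right = node
    if observed_state(left) == "H" and observed_state(right) == "H":
        return False
    return fractal_invariant_holds(left) and fractal_invariant_holds(right)

assert not fractal_invariant_holds((("H", "H"), "L"))   # a controller sees H:H
assert not fractal_invariant_holds(("H", ("H", "MH")))  # root sees H and Avg(H,MH) = H
assert fractal_invariant_holds((("H", "L"), "H"))       # H:L is seen as M, so this is legal
```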

  21. Translating the Fractal Invariant to a System-Wide Cap • We must have a fractal invariant for a fractal design • But most people are interested in system-wide invariants • We prove (not shown) that our fractal invariant implies a system-wide power cap • Max power for n CRs is (n-1)·MH + H • i.e., (n-1) CRs in state MH and one CR in state H HPCA-20 Orlando, FL, February 19, 2014
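
A numeric reading of the cap, added for illustration; the per-CR power values below are hypothetical and the paper's proof is not reproduced here.

```latex
% System-wide cap restated: (n-1) CRs at MH plus one CR at H.
P_{\text{cap}}(n) \;=\; (n-1)\,P_{\mathrm{MH}} \;+\; P_{\mathrm{H}}
% Example with hypothetical per-CR values (P_MH = 20 W, P_H = 25 W, n = 16):
%   P_cap(16) = 15 * 20 W + 25 W = 325 W
```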

  22. Fractal DPM Design - illustration • A CR requests MH • [Diagram: root controller in M:L; its children are a CR in L and a child controller in H:L (seen as M) whose CRs are in L and H; the CR in H sends "Req. MH" to its controller] HPCA-20 Orlando, FL, February 19, 2014

  23. Fractal DPM Design - illustration • A CR requests MH • Granting the request doesn't change the controller's averaged state: Avg(H:L) = Avg(MH:L) = M • Request granted; it doesn't violate the invariant • The controller blocks waiting for an ack • [Diagram: the controller sends "Grant MH" to the CR and blocks; the root stays in M:L] HPCA-20 Orlando, FL, February 19, 2014

  24. Fractal DPM Design - illustration • The CR sets its state • The CR sends an ack to its controller • [Diagram: the CR moves to MH and sends an ack; its controller, now in MH:L, is still blocked; the root stays in M:L] HPCA-20 Orlando, FL, February 19, 2014

  25. Fractal DPM Design - illustration • The controller unblocks • [Tree diagram: controller and CR states after the request completes] HPCA-20 Orlando, FL, February 19, 2014

  26. Fractal DPM Design - illustration • A Computing Resource requests H • [Diagram: every CR is in L; the root and the child controller are both in L:L; one CR sends "Req. H" to its controller] HPCA-20 Orlando, FL, February 19, 2014

  27. Fractal DPM Design - illustration • The CR requests H from its controller • The controller defers the request to its parent • -the new request is M (not H) because Avg(H:L) = M • [Diagram: the child controller sends "Req. M" up to the root] HPCA-20 Orlando, FL, February 19, 2014

  28. Fractal DPM Design - illustration • The root grants the request to the controller and blocks • [Diagram: the root sends "Grant M" down to the child controller and blocks] HPCA-20 Orlando, FL, February 19, 2014

  29. Fractal DPM Design - illustration • The controller grants the request to the CR and blocks • [Diagram: the child controller sends "Grant H" to the CR and blocks; the root remains blocked] HPCA-20 Orlando, FL, February 19, 2014

  30. Fractal DPM Design - illustration • Acks percolate up the tree from the CR • [Diagram: the CR moves to H and sends an ack to its (blocked) controller, now in H:L] HPCA-20 Orlando, FL, February 19, 2014

  31. Fractal DPM Design - illustration • Acks percolate up the tree from the CR • Controllers unblock upon receiving an ack • [Diagram: the child controller (H:L) unblocks and forwards an ack up to the root, which is still blocked] HPCA-20 Orlando, FL, February 19, 2014

  32. Fractal DPM Design - illustration • Acks percolate up the tree from the CR • Controllers unblock upon receiving an ack • [Diagram: all controllers unblocked; root in M:L, child controller in H:L, CRs in L and H] HPCA-20 Orlando, FL, February 19, 2014
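
Pulling slides 22 through 32 together, here is a heavily simplified, single-threaded sketch of a controller's decision logic, reusing the avg() helper from the earlier sketch. The Controller class, its fields, and the grant/deny criterion shown are this transcript's illustration of the walkthrough, not the paper's implementation; the real controllers exchange messages and block until acks percolate up.

```python
class Controller:
    """Toy controller with two children (CRs or child controllers).
    self.children holds the two child states this controller currently observes."""
    def __init__(self, left_state, right_state, parent=None, side=None):
        self.children = [left_state, right_state]
        self.parent, self.side = parent, side   # side: this controller's index in its parent

    def observed(self):
        return avg(*self.children)              # what the parent sees for this controller

    def handle_request(self, idx, new_state):
        """A child at position idx asks to move to new_state; return 'grant' or 'deny'."""
        proposed = list(self.children)
        proposed[idx] = new_state
        if proposed.count("H") == 2:            # would violate the fractal invariant
            return "deny"
        if self.parent is not None and avg(*proposed) != self.observed():
            # Our averaged state would change, so defer upward, asking for the new average
            # (slide 27: the deferred request is M, not H, because Avg(H:L) = M).
            if self.parent.handle_request(self.side, avg(*proposed)) == "deny":
                return "deny"
        self.children = proposed                # real protocol: only after the ack arrives
        return "grant"                          # real protocol: controller blocks until then

# Slides 22-25: the H CR asks for MH; the averaged state stays M, so the grant is local.
root = Controller("M", "L")
child = Controller("H", "L", parent=root, side=0)
assert child.handle_request(0, "MH") == "grant"

# Slides 26-32: in an all-L tree, a CR asks for H; the controller defers "M" to the root.
root = Controller("L", "L")
child = Controller("L", "L", parent=root, side=0)
assert child.handle_request(0, "H") == "grant" and root.children == ["M", "L"]
```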

  33. Verification Procedure • Use a model checker to verify the base case • - we use the well-known, automated Murphi model checker • Use the same model checker to verify the observational equivalences • - use a prior aggregation method for the equivalence check (Park, TCAD 2000) HPCA-20 Orlando, FL, February 19, 2014

  34. Outline • Background and Motivation • Fractal DPM • Experimental Evaluation • Conclusions HPCA-20 Orlando, FL, February 19, 2014

  35. Experimental Evaluation - fractal inefficiency: cost of fractal behavior • Our fractal invariant implies a system-wide cap greater than n·MH • Example with 4 CRs, two allocations with the same total power of 4·MH: • - Legal: all four CRs in MH (every controller sees MH:MH) • - Illegal: CRs in M, M, H, H (one controller sees H:H, violating the fractal invariant, even though the total power does not overshoot the system-wide cap) • Such situations are few and don't significantly degrade performance HPCA-20 Orlando, FL, February 19, 2014
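
With the tree encoding and checker sketched after slide 20, the two allocations on this slide compare as follows (illustrative only):

```python
legal   = (("MH", "MH"), ("MH", "MH"))  # all four CRs at MH: every controller sees MH:MH
illegal = (("M", "M"), ("H", "H"))      # same total power, but one controller sees H:H

assert fractal_invariant_holds(legal)
assert not fractal_invariant_holds(illegal)
```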

  36. Experimental Evaluation - system model • Implemented Fractal DPM on a 16-core Linux system with 2 sockets • -every 2 cores act as one CR • -controllers communicate through UDP across sockets HPCA-20 Orlando, FL, February 19, 2014
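
The slides only say that controllers communicate through UDP across sockets; as a hedged illustration (the message format, port, and helper name are all made up for this sketch), a request/reply exchange could look like this:

```python
import json, socket

DPM_PORT = 9400  # hypothetical port

def send_power_request(controller_addr: str, child_id: int, state: str) -> str:
    """Send one power-state request over UDP and wait for 'grant' or 'deny'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)  # slide 41: most requests are serviced within about 1 ms
    sock.sendto(json.dumps({"child": child_id, "state": state}).encode(),
                (controller_addr, DPM_PORT))
    reply, _ = sock.recvfrom(256)
    return reply.decode()
```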

  37. Experimental Evaluation - experimental setup • [Table: power mode to DVFS mappings] • Entire system plugged into a power meter (Watts up?) HPCA-20 Orlando, FL, February 19, 2014

  38. Experimental Evaluation - comparison schemes • Static Scheme: • - no DPM; set all CRs to the same power state (e.g. MH) • - trivially correct, poor energy efficiency • Oracle DPM: • - allocates for optimal energy efficiency (ED²) under the budget • - the oracle doesn't scale and is unimplementable • Optimized Fractal DPM (OptFractal): • - CRs re-request a lower power state when denied • - no change to the Fractal DPM decision algorithm HPCA-20 Orlando, FL, February 19, 2014
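
For reference (standard definition, not a formula taken from the slides), the ED² metric used in the evaluation is the energy-delay-squared product:

```python
def ed2(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product; lower is better."""
    return energy_joules * delay_seconds ** 2
```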

  39. Experimental Evaluation • Benchmarks: Details in the paper. HPCA-20 Orlando, FL, February 19, 2014

  40. Results - compared to static scheme • OptFractal DPM is within 2% of Oracle DPM's ED² savings • Fractal DPM is within 8% of Oracle DPM's ED² savings HPCA-20 Orlando, FL, February 19, 2014

  41. Results - response latency • Most power requests are serviced within 1 ms • - UDP packet round trip ~0.6 ms HPCA-20 Orlando, FL, February 19, 2014

  42. Conclusions • We show how a scalably verifiable DPM can be built • Fractal behavior enables one-time verification for all scales • The entire verification is fully automated in a model checker • Fractal DPM achieves energy efficiency close to the optimal allocator HPCA-20 Orlando, FL, February 19, 2014

  43. Scalably Verifiable Dynamic Power Management Opeoluwa (Luwa) Matthews, Meng Zhang, and Daniel J. Sorin Duke University HPCA-20 Orlando, FL, February 19, 2014

  44. Benchmarks • Important: experiments must stress all Fractal DPM power modes • Each CR repeatedly launches bodytrack (from the PARSEC benchmark suite) under a range of predetermined duty cycles • Under a given duty cycle, CRs request the power state that minimizes ED² • Why rely on duty cycle, not just different benchmarks or phases? • Stressing all Fractal DPM power modes → stressing DVFS states • Without varying the duty cycle, the optimal ED² is always at the highest frequency for all benchmarks tried [Dhiman 2008] • Predetermined set of duty cycles for launching bodytrack that maps directly to the set of power modes (or DVFS states) • An experiment consists of running a sequence of bodytrack jobs, randomly selecting duty cycles from the predetermined set HPCA-20 Orlando, FL, February 19, 2014

  45. Results • Millions of time steps simulated • For each time step, system performance is measured and the % performance loss relative to Oracle DPM is computed • [CDF plot: % CDF vs. % system performance loss] HPCA-20 Orlando, FL, February 19, 2014

  46. Results • Millions of time steps simulated • For each time step, system performance is measured and the % performance loss relative to Oracle DPM is computed • On 72.6% of time steps, Fractal DPM ≡ Oracle DPM • [CDF plot: % CDF vs. % system performance loss] HPCA-20 Orlando, FL, February 19, 2014

  47. Results • Millions of time steps simulated • For each time step, system performance is measured and the % performance loss relative to Oracle DPM is computed • On 99.9% of time steps, Fractal DPM is < 20% off from Oracle • [CDF plot: % CDF vs. % system performance loss] HPCA-20 Orlando, FL, February 19, 2014

  48. Results • Millions of time steps simulated • For each time step, system performance is measured and the % performance loss relative to Oracle DPM is computed • Worst case: Fractal DPM is < 36.4% off from Oracle • [CDF plot: % CDF vs. % system performance loss] HPCA-20 Orlando, FL, February 19, 2014
