Power-aware Design - Part II: Reduction & Management

EE202A (Fall 2004): Lecture #9

Presentation Transcript
Reading List for This Lecture
  • Required
    • Sandy Irani, Sandeep Shukla, and Rajesh Gupta. Online Strategies for Dynamic Power Management in Systems with Multiple Power Saving States. ACM Transactions on Embedded Computing Systems, August 2003. http://portal.acm.org/citation.cfm?id=860180&jmp=cit&dl=GUIDE&dl=ACM
    • V. Raghunathan, C. Pereira, M.B. Srivastava, and R.K. Gupta. Energy-aware Wireless Systems with Adaptive Power-Fidelity Trade-offs. Accepted for IEEE Transactions on VLSI Systems. http://www.ee.ucla.edu/~vijay/files/tvlsi04_dvs.pdf
    • V. Raghunathan, S. Ganeriwal, C. Schurgers, and M.B. Srivastava. Energy Efficient Wireless Packet Scheduling and Fair Queuing. ACM Transactions on Embedded Computing Systems, February 2004. http://www.ee.ucla.edu/~vijay/files/tecs04_wfq.pdf
  • Recommended
    • C. Schurgers, V. Raghunathan, and M.B. Srivastava. Power Management for Energy-aware Communication Systems. ACM Transactions on Embedded Computing Systems, August 2003. http://www.ee.ucla.edu/~vijay/files/tecs03_dpm.pdf
    • Yao, F.; Demers, A.; Shenker, S. A scheduling model for reduced CPU energy. Proceedings of IEEE 36th Annual Foundations of Computer Science, Milwaukee, WI, USA, 23-25 Oct. 1995. p.374-82.
    • Gruian, F. Hard real-time scheduling for low-energy using stochastic data and DVS processors. Proceedings of the 2001 ACM International Symposium on Low Power Electronics and Design, August 2001. p.46-51.
    • M. Anand, E. Nightingale, and J. Flinn. Self-tuning Wireless Network Power Management. ACM MobiCom 2003. http://portal.acm.org/citation.cfm?id=939004&jmp=indexterms&coll=portal&dl=GUIDE
    • Yung-Hsiang Lu, Luca Benini, Giovanni De Micheli. Power-Aware Operating Systems for Interactive Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 2, April 2002. http://citeseer.nj.nec.com/lu02poweraware.html
  • Others
    • None
Power Consumption in CMOS Digital Logic
  • Dynamic power consumption
      • charging and discharging capacitors
  • Short circuit currents
      • short circuit path between supply rails during switching
  • Leakage
      • leaking diodes and transistors
      • problem even when in standby!
Power Consumption in CMOS Digital Logic (contd.)

P = A·C·V²·f + A·I_sw·V·f + I_leak·V

where

A = activity factor (probability of a 0→1 transition)

C = total chip capacitance

V = voltage swing, usually near the power supply voltage

f = clock frequency

I_sw = short-circuit current while a logic level changes

I_leak = leakage current in diodes and transistors
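As a sanity check on the formula, the three components can be evaluated directly. The parameter values below are purely illustrative, not taken from the lecture or any datasheet:

```python
# Rough CMOS power estimate from the slide's formula:
#   P = A*C*V^2*f + A*Isw*V*f + Ileak*V
# All parameter values are illustrative only.

def cmos_power(A, C, V, f, Isw, Ileak):
    dynamic = A * C * V**2 * f          # charging/discharging capacitance
    short_circuit = A * Isw * V * f     # both transistors briefly conduct
    leakage = Ileak * V                 # present even in standby
    return dynamic, short_circuit, leakage

dyn, sc, leak = cmos_power(A=0.1, C=1e-9, V=1.2, f=100e6, Isw=1e-12, Ileak=1e-3)
print(f"dynamic={dyn:.4f} W, short-circuit={sc:.2e} W, leakage={leak:.2e} W")
```

Note how, with these particular numbers, the dynamic term dominates but leakage is far from negligible — consistent with the "problem even when in standby" bullet above.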

Why not simply lower V?
  • Total P can be minimized by lowering V
    • lower supply voltages are a natural result of smaller feature sizes
  • But… transistor speeds decrease dramatically as V is reduced close to the “threshold voltage”
    • performance goals may not be met
    • t_d = C·V / (k·(V − V_t)^α), where α is between 1 and 2
  • Why not lower this “threshold voltage”?
    • makes noise margin and Ileak worse!
  • Need to do smarter voltage scaling!
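To see why delay blows up near the threshold voltage, the delay model above can be tabulated; the constants (k = 1, V_t = 0.7 V, α = 1.5) are arbitrary values chosen only to show the trend:

```python
# Gate delay vs. supply voltage, t_d = C*V / (k * (V - Vt)**alpha).
# C, k, Vt, alpha are illustrative constants, not device data.
C, k, Vt, alpha = 1.0, 1.0, 0.7, 1.5

def delay(V):
    return C * V / (k * (V - Vt) ** alpha)

for V in (3.3, 2.5, 1.5, 1.0, 0.8):
    print(f"V={V:.1f}  normalized delay={delay(V) / delay(3.3):.2f}")
```

Delay grows slowly at first, then explodes as V approaches V_t — which is exactly why naive voltage lowering breaks performance goals.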
Approaches to Energy Efficiency

P = α·C·V²·f

  • “Event-Driven”: latency is important (burst throughput)
    • Make f low or 0; shutdown when inactive
    • e.g., X Display Server, Disk I/O, Communication
  • “Continuous”: only throughput is important
    • Reduce V; increase h/w and algorithmic concurrency
    • e.g., Speech Coding, Video Compression
  • In either case, reduce C and α
    • Energy-efficient s/w, system partitioning, efficient circuits & layouts
Speed vs. Voltage

[Figure: normalized delay (1.0–7.0) vs. supply voltage V (1.0–3.0); delay rises sharply as the supply approaches the threshold voltage]

Reducing the Supply Voltage: an Architectural Approach
  • Operate at reduced voltage at lower speed
  • Use architecture optimization to compensate for slower operation
    • e.g. concurrency, pipelining via compiler techniques
  • Architecture bottlenecks limit voltage reduction
    • degradation of speed-up
    • interconnect overheads
  • Similar idea for memory: slower and parallel

Trade-off AREA for lower POWER

Example: Voltage-Parallelism Trade-off

[Figure: speedup (1.0–7.0) vs. parallelism N (1–8) against the ideal speedup line, and normalized delay (1.0–7.0) vs. supply voltage V (1.0–3.0)]

Example: Reference Datapath

  • Critical path delay: T_adder + T_comparator = 25 ns
  • Frequency: f_ref = 40 MHz
  • Total switched capacitance = C_ref
  • V_dd = V_ref = 5 V
  • Power for reference datapath: P_ref = C_ref·V_ref²·f_ref

from “Digital Integrated Circuits” by Rabaey

Parallel Datapath

  • The clock rate can be reduced by 2× with the same throughput: f_par = f_ref/2 = 20 MHz
  • Total switched capacitance = C_par = 2.15·C_ref
  • V_par = V_ref/1.7
  • P_par = (2.15·C_ref)·(V_ref/1.7)²·(f_ref/2) ≈ 0.36·P_ref

from “Digital Integrated Circuits” by Rabaey

Pipelined Datapath

  • f_pipe = f_ref, C_pipe = 1.1·C_ref, V_pipe = V_ref/1.7
  • Voltage can be dropped while maintaining the original throughput
  • P_pipe = C_pipe·V_pipe²·f_pipe = (1.1·C_ref)·(V_ref/1.7)²·f_ref ≈ 0.37·P_ref

from “Digital Integrated Circuits” by Rabaey
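The two power ratios can be checked numerically. The 1.7 voltage divisor in the slides appears to be a rounded value for scaling the 5 V supply down to roughly 2.9 V (5/2.9 ≈ 1.72); that assumption is used below:

```python
# Check the parallel and pipelined power ratios from the Rabaey example.
# P = C * V**2 * f, everything normalized to the reference datapath.
v_ratio = 2.9 / 5.0        # ~= 1/1.7: supply scaled from 5 V to ~2.9 V

p_parallel = 2.15 * v_ratio**2 * 0.5   # 2.15x capacitance, half the clock
p_pipelined = 1.10 * v_ratio**2 * 1.0  # 1.1x capacitance, same clock

print(f"P_par  = {p_parallel:.2f} * P_ref")   # ~0.36
print(f"P_pipe = {p_pipelined:.2f} * P_ref")  # ~0.37
```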

Example of Voltage Scaling

[Figure: for fixed throughput, ideal vs. actual speedup and % communication overhead vs. number of processors N (1–8); the allowable supply voltage falls as N grows, and normalized power drops accordingly — about a 3.3× reduction]

Low-power Software
  • Wireless industry → constantly evolving standards
  • Systems have to be flexible and adaptable
    • Significant portion of system functionality is implemented as software running on a programmable processor
  • Software drives the underlying hardware
    • Hence, it can significantly impact system power consumption
  • Significant energy savings can be obtained by clever software design.
Low-power Software Strategies

[Diagram: CPU – Cache – Memory hierarchy]

  • Code running on the CPU
    • Code optimizations for low power
  • Code accessing memory objects
    • SW optimizations for memory
  • Data flowing on the buses
    • I/O coding for low power
  • Compiler-controlled power management

Code Optimizations for Low Power
  • High-level operations (e.g. C statement) can be compiled into different instruction sequences
      • different instructions & ordering have different power
  • Instruction Selection
    • Select a minimum-power instruction mix for executing a piece of high level code
  • Instruction Packing & Dual Memory Loads
    • Two on-chip memory banks
      • Dual load vs. two single loads
      • Almost 50% energy savings
Code Optimizations for Low Power (contd.)
  • Reorder instructions to reduce switching effect at functional units and I/O buses
    • E.g. Cold scheduling minimizes instruction bus transitions [Su94]
  • Operand swapping
    • Swap the operands at the input of multiplier
    • Result is unaltered, but power changes significantly!
  • Other standard compiler optimizations
    • Intermediate level: Software pipelining, dead code elimination, redundancy elimination
    • Low level: Register allocation and other machine specific optimizations
  • Use processor-specific instruction styles
    • e.g. on ARM the default int type is ~ 20% more efficient than char or short as the latter result in sign or zero extension
    • e.g. on ARM the conditional instructions can be used instead of branches
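Cold scheduling, mentioned above, can be phrased as a greedy ordering problem: among the ready instructions, pick next the one whose encoding is closest (in Hamming distance) to the previous one, so fewer instruction-bus lines toggle. The toy below uses made-up 8-bit "opcodes" and assumes all instructions are independent — a real scheduler must also respect data dependences:

```python
# Toy cold scheduler: greedily order independent instructions to
# minimize Hamming distance (bit flips) between consecutive encodings.
def hamming(a, b):
    return bin(a ^ b).count("1")

def cold_schedule(encodings):
    remaining = list(encodings)
    order = [remaining.pop(0)]            # keep the first instruction first
    while remaining:
        nxt = min(remaining, key=lambda e: hamming(order[-1], e))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def total_transitions(seq):
    return sum(hamming(a, b) for a, b in zip(seq, seq[1:]))

insns = [0b10101010, 0b10101011, 0b01010101, 0b11110000, 0b10100010]
scheduled = cold_schedule(insns)
print(total_transitions(insns), "->", total_transitions(scheduled))
```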
Minimizing Memory Access Costs
  • Reduce memory accesses; make better use of registers
    • A register access consumes much less power than a memory access
  • Straightforward way: minimize the number of read/write operations
  • Cache optimizations
    • Reorder memory accesses to improve cache hit rates
  • Can use existing techniques for high-performance code generation
Minimizing Memory Access Costs (contd.)
  • Loop optimizations such as loop unrolling, loop fusion also reduce memory power consumption
  • More effective: explicitly target minimization of switching activity on I/O busses and exploiting memory hierarchy
    • Data allocation to minimize I/O bus transitions
      • e.g. mapping large arrays with known access patterns to main memory to minimize address bus transitions
      • works in conjunction with coding of address busses
    • Exploiting memory hierarchy
      • e.g. organizing video and DSP data to maximize the higher levels (lower power) of memory hierarchy
Energy Efficient I/O Encoding
  • The C of system busses is >> the C inside chips
      • a large amount of power goes to I/O interfaces
          • 10-15% in μPs, 25-50% in FPGAs, 50-80% in logic
      • encoding bus data can reduce the power significantly
          • but need to handle encoding/decoding cost (power, latency)

[Diagram: Subsystem #1 and Subsystem #2 connected by a bus, each with ENC/DEC blocks and a shared control line]

Examples
  • Compression to remove redundancy
  • Gray code on address busses
      • addresses usually increment sequentially by 1
      • modified code that increments by 4 or 8 for word-oriented CPUs
  • T0 code for address busses
      • add redundant INC line
        • INC=0 : address is equal to the bus lines
        • INC=1 : Tx freezes the other bus lines, and Rx increments the previously transmitted address by a pre-agreed stride
      • Better than Gray code: asymptotically zero transitions for sequences
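The T0 idea fits in a few lines. The sketch below uses a stride of 1 for simplicity; a word-oriented bus would use 4 or 8 as noted above:

```python
# Toy T0 address-bus code: an extra INC line; when INC=1 the bus lines
# are frozen and the receiver adds a pre-agreed stride itself.
STRIDE = 1

def t0_encode(addresses):
    lines, prev = [], None
    for a in addresses:
        if prev is not None and a == prev + STRIDE:
            lines.append((1, lines[-1][1]))   # INC=1, bus lines frozen
        else:
            lines.append((0, a))              # INC=0, address on the bus
        prev = a
    return lines

def t0_decode(lines):
    out, prev = [], None
    for inc, bus in lines:
        prev = prev + STRIDE if inc else bus
        out.append(prev)
    return out

addrs = [4, 5, 6, 7, 100, 101, 102]
encoded = t0_encode(addrs)
assert t0_decode(encoded) == addrs
```

For the seven sequential-run addresses above, the bus value itself changes only once (at the jump to 100) — the "asymptotically zero transitions" property for sequential streams.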
Examples (contd.)
  • Bus-Invert Coding
      • transmit D or invert(D), whichever results in fewer transitions from the previously transmitted code
      • an extra signal indicates polarity
      • performance
        • at most N/2 lines switch
        • average: code is optimal among 1-bit redundancy codes
        • better for small N (25% for N=2, 18.2% for N=8, 14.6% for N=16)
          • partition into k sub-busses with k polarity bits
          • but, no longer optimal among redundant codes
  • Encode based on statistical analysis of bus traces
      • calculate spatio-temporal correlation (on-line or off-line)
Examples (contd.)
  • Mixed bus encoding T0_BI
    • Use two redundant lines: INC and INV
    • Good for shared address/data busses
    • Use SEL line of the bus to distinguish data and address
      • Use T0 when SEL indicates address, BI otherwise
  • Choice depends on the type of bus
      • data busses: patterns resemble random white noise
      • address busses: spatio-temporal correlations
Shutdown for Energy Saving

[Diagram: the system alternates between Active (“On”) and Blocked (“Off”) states]

  • Subsystems may have small duty factors
      • CPU, disk, wireless interface are often idle
  • Huge difference between “on” & “off” power
      • Some low-power CPUs:
        • StrongARM: 400 mW (active) / 50 mW (idle) / 0.16 mW (sleep)
      • 2.5” hard disk [Harris95]:
        • 1.35 W (idle spinning) / 0.4 W (standby) / 0.2 W (sleep) / 4.7 W (start-up)
  • With blocked time T_block and active time T_active, the ideal improvement = 1 + T_block/T_active

Potential CPU Power Reduction in a Wireless X Terminal
  • 96-98% time spent in the blocked state
  • Average time in the blocked state is short (<< a second)
Generic Power-managed System

[Diagram: a Power Manager observes a Service Requestor and a Service Provider (with a request Queue between them) and issues commands (on, off) to the provider]

  • An abstract & flexible interface between power-manageable components (chips, disk driver, display driver etc.) & the power manager
      • but need insight on how & when to power manage
        • power management policy
      • Essentially the PM is a controller that needs to be synthesized
  • Components (service providers) with several internal states
      • corresponding to power and service levels
      • can be abstracted as a power state machine
        • power and service annotation on states
        • power and delay annotation on edges
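The power state machine abstraction can be captured in a few lines: power annotations on states, (energy, delay) annotations on edges. The numbers below are illustrative placeholders (loosely patterned on a StrongARM-class CPU), not from any datasheet:

```python
# Minimal power state machine: power annotations on states,
# (energy, delay) annotations on transitions. Illustrative values only.
states = {"RUN": 0.4, "IDLE": 0.05, "SLEEP": 0.00016}   # watts
edges = {                                                # (joules, seconds)
    ("RUN", "IDLE"): (0.0, 10e-6),
    ("IDLE", "RUN"): (0.0, 10e-6),
    ("RUN", "SLEEP"): (0.0, 90e-6),
    ("SLEEP", "RUN"): (0.0, 160e-3),
}

def idle_energy(state, t_idle):
    """Energy spent if we sit out an idle period of t_idle in `state`
    and then return to RUN (transition delays billed at RUN power)."""
    down = edges.get(("RUN", state), (0.0, 0.0))
    up = edges.get((state, "RUN"), (0.0, 0.0))
    dissipation = states[state] * t_idle
    overhead = down[0] + up[0] + (down[1] + up[1]) * states["RUN"]
    return dissipation + overhead

for s in states:
    print(s, idle_energy(s, t_idle=1.0))
```

With these values a 1-second idle period is best spent in IDLE (the 160 ms wakeup makes SLEEP too costly), while a 10-second one favors SLEEP — exactly the kind of decision a power management policy must make.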

Example: SA-1100 CPU

  • RUN: 400 mW
  • IDLE: 50 mW
    • CPU stopped when not in use
    • Monitoring for interrupts
  • SLEEP: 0.16 mW
    • Shutdown of on-chip activity

[Power state machine: RUN ↔ IDLE transitions take ~10 μs each; RUN/IDLE → SLEEP ~90 μs; SLEEP → RUN ~160 ms]

Example: Fujitsu MHF 2043 AT

[Power state machine: Working 2.2 W (spinning + I/O) ↔ Idle 0.95 W (spinning), on read/write and I/O-done events; Idle → Sleep 0.13 W (spindle stopped) via shutdown (0.36 J, 0.67 s); Sleep → Idle via spin-up (4.4 J, 1.6 s)]

When is DPM useful?

[Diagram: Active (“On”) ↔ Blocked (“Off”), with transition power P_tr and transition time T_tr on the edges]

  • If T_tr = 0 and P_tr = 0, then the DPM policy is trivial
    • Stop a component whenever it is not needed
  • If, as is usual, T_tr ≠ 0 and P_tr ≠ 0
    • shutdown only when the idleness is going to be long enough to make it worthwhile
    • Complex decision if the time spent in a state is not deterministic

Problems in Shutdown
  • Cost of restarting: latency vs. power trade-off
      • increase in latency (response time)
          • e.g. time to save/restore CPU state, spin up disk
      • increase in power consumption
          • e.g. higher start-up current in disks
  • When to Shutdown

Optimal     vs.     Idle Time Threshold     vs.     Predictive

  • When to Wakeup

Optimal     vs.     On-demand     vs.     Predictive

  • Cross-over point for shutdown to be effective
Conventional Reactive Approach

[Timeline: alternating RUN and BLOCK periods T_run[i], T_block[i]; in each blocked period the system first waits out an idle time-out (overhead), then enters the reduced power mode, and pays a wakeup overhead when the next run period begins]

“Go to Reduced Power Mode after the user has been idle for a few seconds/minutes, and restart on demand”

Predictive Shutdown Approach

“Use computation history to predict whether T_block[i] is large enough (T_block[i] ≥ T_cost)”

  • Example of a heuristic:
    • predict T_block[i] ≥ T_cost whenever T_run[i] ≤ T_on_threshold
      • up to 20× power reduction with 3% slowdown on X server traces
          • compared to 2× with non-predictive
      • Eliminates power wasted while waiting for the time-out
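The heuristic can be phrased in a few lines: at the start of each idle period, shut down immediately if the preceding run burst was short. The thresholds and the synthetic trace below are illustrative, not from the X server study:

```python
# Predictive shutdown: decide at the *start* of each idle period,
# using the length of the preceding run burst as the predictor.
# Thresholds and the trace are illustrative.
T_ON_THRESHOLD = 10e-3    # "short run => long block ahead" cutoff, seconds
T_COST = 50e-3            # idle time needed to amortize a shutdown

def predict_shutdown(t_run_prev):
    return t_run_prev <= T_ON_THRESHOLD

# (t_run, t_block) pairs of a synthetic trace
trace = [(2e-3, 200e-3), (5e-3, 90e-3), (40e-3, 8e-3), (1e-3, 300e-3)]
correct = sum(
    predict_shutdown(t_run) == (t_block >= T_COST)
    for t_run, t_block in trace
)
print(f"{correct}/{len(trace)} idle periods predicted correctly")
```

Unlike a time-out policy, the decision is made at the moment the idle period begins, so no energy is burned waiting to be sure.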
Pre-wakeup

  • System wakeup takes time, adversely hurting performance
  • One could pre-wake the system by predicting the occurrence of the next wakeup signal

[Timeline: Run–Idle–Run with on-demand wakeup incurs a delay after each idle period; with pre-wakeup, the sleep and wakeup transitions are scheduled inside the idle period so the system is ready when the next request arrives]
Breakeven Point
  • Breakeven point: the minimum idle time that makes a shutdown worthwhile
  • DPM is worthwhile when T_BE < average T_idle
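T_BE follows from an energy balance between staying on for the whole idle period and paying the transition cost. The power numbers below are hypothetical:

```python
# Breakeven time: idle length at which shutting down costs exactly as
# much energy as staying on. All power/time values are illustrative.
P_on, P_off = 1.0, 0.1     # watts in the on and off states
P_tr, T_tr = 1.5, 2.0      # transition power (W) and total transition time (s)

def e_stay_on(t_idle):
    return P_on * t_idle

def e_shutdown(t_idle):
    return P_tr * T_tr + P_off * (t_idle - T_tr)

# Setting the two energies equal and solving for t_idle:
T_BE = T_tr * (P_tr - P_off) / (P_on - P_off)
print(f"T_BE = {T_BE:.2f} s")
```

Any idle period longer than T_BE saves energy by shutting down; shorter ones lose energy to the transition overhead.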
DPM Approaches: Predictive
  • Exploit correlation between the recent past & future
  • Predict idle time and schedule shutdown and/or wakeup accordingly
  • Static techniques
    • E.g. fixed timeout Tthreshold with on-demand wakeup
    • P(Tidle > Tthreshold + TBE | Tidle > Tthreshold) ≈ 1
    • Tthreshold = TBE yields energy consumption not more than 2x worse than ideal oracle policy
      • Worst case when point activities are separated by Tidle = 2TBE
  • Adaptive techniques
    • E.g. maintain set of time out values to figure out how successful it would have been
    • E.g. weighted timeouts where weights based on performance relative to oracle policy
    • E.g. increase and decrease timeout based on its performance
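The last adaptive flavor — grow the timeout after a shutdown that didn't pay off, shrink it after one that did — fits in a few lines. The multiplicative update rule and constants here are illustrative, one of several options:

```python
# Adaptive timeout: multiplicatively adjust the shutdown timeout based
# on whether the last idle period would have justified shutting down.
T_BE = 0.5                       # breakeven time (given), seconds

def adapt(timeout, idle_periods, grow=2.0, shrink=0.5,
          t_min=0.01, t_max=10.0):
    history = []
    for t_idle in idle_periods:
        history.append(timeout)
        if t_idle < timeout + T_BE:      # shutdown would not have paid off
            timeout = min(timeout * grow, t_max)
        else:                            # it would have: get more aggressive
            timeout = max(timeout * shrink, t_min)
    return timeout, history

final, hist = adapt(1.0, [0.2, 0.3, 5.0, 6.0, 4.0])
print(hist, "->", final)
```

Short idle periods push the timeout up (avoiding wasteful shutdowns), long ones pull it down (capturing more of the idle time asleep).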
DPM Approaches: Stochastic
  • Predictive approaches handle workload uncertainty
    • But assume deterministic response and transition time
    • Abstract system model introduces uncertainty
  • Predictive algorithms based on 2-state model of system
    • Real-life systems have multiple power states
    • Decide not only when to change state but also to which state
  • Stochastic approaches formulate problem of DPM policy as an optimization under uncertainty
    • Service requestor (SR): a Markov chain with state set R which models the arrival of service requests
    • Service provider (SP): a controlled Markov chain with S states that models the system. The states represent modes of operation of the system and transitions are probabilistic. The probabilities are controlled by the power manager.
    • Power manager (PM) which implements a function f: S × R → A from the state sets of SP and SR to a set of possible commands A. Each function represents a decision process: the PM observes the state of the system and the workload, takes a decision, and issues a command to control the future state of the system
    • Cost metrics which associate power and performance values with each system state–command pair in S × R × A
  • Captures global view of the system with possibly multiple inactive states and resources
  • Performance and power are expected values
Competitive Analysis
  • DPM is an inherently on-line problem
    • Make decisions without seeing the entire input
    • E.g. no way of knowing the length of an idle period until it ends
  • Competitive ratio: a way to characterize solutions to such problems
    • Compares the cost of an on-line algorithm with that of an optimal off-line one that knows the input in advance (the “oracle” solution)
    • An algorithm is c-competitive if for any input the cost of the on-line approach is bounded by c times the cost of the optimal off-line approach for the same input
    • The Competitive Ratio (CR) of an algorithm is the infimum over all c such that the algorithm is c-competitive
  • Competitive analysis done by case analysis of various adversarial scenarios or via formal theorem proving
  • Provides assurance about worst case performance
    • But this can be quite pessimistic!
Classical Results on CR
  • The best CR achieved by any deterministic on-line algorithm is 2
    • So, the fixed timeout Tthreshold = TBE is optimal in this sense
    • Methods exist to determine an on-line DPM algorithm for a given idle-period distribution such that for any distribution the corresponding DPM algorithm is within a factor of e/(e−1) ≈ 1.58 of the optimal off-line algorithm
    • The result is tight: there is at least one distribution for which the ratio is exactly e/(e-1)
Multi-state DPM with Optimal CR

  • Let there be k+1 states
    • Let state k be the shut-down state and state 0 be the active state
    • Let α_i be the energy dissipation rate (power) in state i
    • Let β_i be the total energy dissipated to move from state i back to state 0
    • States are ordered such that α_(i+1) ≤ α_i
    • α_k = 0 and β_0 = 0 (without loss of generality)
    • Power-down energy cost can be incorporated into the power-up cost for analysis (if additive)

Now formulate an optimization problem to determine the state transition thresholds.

Lower Envelope Idea

[Figure: for each state i, plot the energy cost of an idle period of length t, E_i(t) = β_i + α_i·t, where α_i is the power in state i and β_i the energy to return to the active state; the lower envelope of the lines for States 1–4 yields the transition thresholds t1, t2, t3]

  • LEA can be deterministic or probabilistic
  • DLEA is 2-competitive while PLEA is e/(e−1)-competitive
  • Learn p(t): On-line Probability Based Algorithm (OPBA)
    • Histogram of the previous w idle intervals, with thresholds calculated from it
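The deterministic thresholds are just the intersection points of adjacent per-state cost lines E_i(t) = β_i + α_i·t. The α/β values below are made up, ordered so that deeper states have lower power and higher return cost (the computed thresholds form the envelope only when they come out increasing, as they do here):

```python
# Lower-envelope thresholds: state i costs E_i(t) = beta_i + alpha_i * t
# over an idle period of length t; descend to state i+1 where its line
# crosses below state i's. All values are illustrative.
alpha = [1.0, 0.5, 0.2, 0.0]     # power in states 0..3 (state 3 = off)
beta = [0.0, 0.3, 0.8, 2.0]      # energy to return to the active state

def thresholds(alpha, beta):
    ts = []
    for i in range(len(alpha) - 1):
        # solve beta_i + alpha_i * t = beta_{i+1} + alpha_{i+1} * t
        t = (beta[i + 1] - beta[i]) / (alpha[i] - alpha[i + 1])
        ts.append(t)
    return ts

print(thresholds(alpha, beta))
```

The policy then simply walks down the state list as the idle period crosses each threshold.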

Implementing DPM
  • Clock gating
  • Supply shutdown
  • Display shutdown
  • Motor shutdown
Voltage Reduction is Better
  • Example: task with a 100 ms deadline requires 50 ms of CPU time at full speed
    • normal system gives 50 ms computation, 50 ms idle/stopped time
    • half speed/voltage system gives 100 ms computation, 0 ms idle
    • same number of CPU cycles, but roughly 1/4 the energy (E ∝ V²)

[Figure: tasks T1 and T2 on a speed-vs.-time plot; at full speed they finish early and leave idle time, while stretched to fill the available time they do the same work at lower energy]

Problem with Voltage Reduction
  • Voltage gets dictated by the tightest (critical) timing constraint
      • not a problem if latency not important
          • throughput can always be improved by pipelining, parallelism etc.
      • but, real systems have bursty throughput and latency critical tasks

Solution: dynamically vary the voltage!

Varying the Supply Voltage

  • Fixed supply: the active part of each frame T_frame runs at full voltage and the rest is idle; E_fixed ∝ 1/2·C·V_dd²
  • Variable supply: stretch the active period to fill the frame at half the voltage; E_var ∝ 1/2·C·(V_dd/2)² = 1/4·E_fixed

[Figure: normalized power vs. normalized workload (0–1.0); the variable-supply curve lies well below the fixed-supply line — from [Gutnik96] (VLSI Symposium)]

Dynamically Variable Voltage
  • Use voltage to control the operating point on the power vs. speed curve
      • power and clock frequency are functions of voltage
  • Technology exists
      • efficient variable voltage DC-DC regulators available commercially
      • most CMOS chips operate over a range of voltage
  • Main problem is algorithmic:
      • one has to schedule the voltage variation as well!
          • via compiler or OS or hardware
Intel’s Xscale (StrongARM-2)
  • Ultra-low power + high performance
    • via 7-stage pipeline (“Superpipeline”)
  • Compliant with ARM 5TE ISA
    • 16-bit Thumb instructions, additional DSP instructions
  • Power-awareness features
    • Dynamic voltage and frequency scaling on the fly
      • energy per op dynamic range of ~6x in SA-2 vs. ~2x for SA-1
        • SA2: 1 mW (standby); 40-mW/185-MIPS @ 150-MHz/0.75-V ; 450-mW/750-MIPS @ 600-MHz/1.3-V; 900-mW/1000-MIPS @ 800-MHz/1.6-V(source: Intel’s web site)
        • SA1: 33-mW @ 59-MHz/0.8-V to 360-mW @ 206-MHz/1.5V in 11 discrete steps

(source: estimated from Anantha Chandrakasan’s publications)

      • 30 μs PLL re-lock vs. 150 μs for SA-1
        • regulator transition time may be the bottleneck?
    • Idle, sleep, and quick wakeup modes
      • 100 μW drowsy state
      • Functional block powered up only when needed
  • MAC coprocessor for signal processing
How to Exploit Dynamic Voltage Scaling?

  • Two observations:
    • bursty traffic or variable workload
    • workload averaging helps due to the convex shape of the power-speed curve
  • Generic system architecture
    • many examples in hardware and software

[Diagram: FIFO input buffer → variable power-speed system, with a workload filter driving the power-speed control knob]

Approaches to Dynamically Schedule the System Voltage
  • Observation:
      • the most energy-efficient way to execute N instructions in an interval T is to use a constant voltage & frequency
      • can’t change V & f instantaneously: takes several μs to ms
          • a dynamical system
  • Ideal goal:
      • schedule voltage changes to minimize the energy consumed while meeting all the task deadlines
  • Approaches:
      • Heuristics for general purpose interactive OSs
          • [Weiser94], [Govil95]
      • Optimal approaches under deadline constraints
          • [Yao95], [Hong98a], [Hong98b], [Hong98c]
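The constant-speed observation is a consequence of convexity (Jensen's inequality): running at the average speed never costs more than splitting the interval between two speeds. A numerical check with an idealized cubic power-speed curve (P ∝ s³, the V ∝ f case):

```python
# Constant speed beats any two-speed mix for convex power curves.
# Execute N cycles in time T either at constant speed N/T, or half the
# time at speed s1 and half at s2 with (s1 + s2)/2 = N/T.
def power(s):
    return s ** 3          # idealized: power proportional to speed cubed

N, T = 100.0, 1.0
s_avg = N / T
e_const = power(s_avg) * T

for delta in (10.0, 30.0, 60.0):
    s1, s2 = s_avg - delta, s_avg + delta
    e_split = power(s1) * T / 2 + power(s2) * T / 2
    print(f"split +/-{delta:>4}: {e_split:.0f} >= constant {e_const:.0f}")
```

The more unbalanced the split, the larger the penalty — which is why voltage schedules try to run flat whenever deadlines allow it.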
Workload Averaging Helps

  • Without averaging (frame #1 at full workload, frame #2 at 1/4 workload, each frame run at the voltage its own rate requires):
    • E_var = C·V_dd² + (1/4)·C·(V_dd/4)² = (65/64)·C·V_dd²
  • With averaging (both frames run at the average rate, 5/8 of full speed, at voltage 5/8·V_dd):
    • E_avg = (5/4)·C·(5/8·V_dd)² = (125/256)·C·V_dd² ≈ 0.5·E_var
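The two frame energies can be verified directly, modeling energy per frame as workload × C·V² with voltage proportional to speed and C = V_dd = 1:

```python
# Check the workload-averaging arithmetic: frame 1 at full workload,
# frame 2 at quarter workload; E = workload * C * V^2, C = Vdd = 1.
C = Vdd = 1.0

# Without averaging: each frame at the voltage its own rate requires.
e_var = 1.0 * C * Vdd**2 + 0.25 * C * (Vdd / 4) ** 2

# With averaging: both frames at the average rate 5/8 (voltage 5/8).
e_avg = 1.25 * C * (0.625 * Vdd) ** 2

print(f"E_var = {e_var:.4f} (= 65/64), E_avg = {e_avg:.4f} (= 125/256)")
print(f"ratio = {e_avg / e_var:.2f}")
```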
DVS in General-purpose OSs using Workload Averaging
  • Approach #1: [Weiser94]
      • time divided into 10-50 ms intervals
      • f & V raised or lowered at the beginning of the interval based on CPU utilization during the previous interval
          • 50% savings for a processor in the range 3.3V-5V
          • 70% savings for a processor in the range 2.2V-5V
  • Approach #2: [Govil95]
      • predicts CPU cycles needed in the next interval
      • sets f & V accordingly
      • many prediction strategies: some did well, others not
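The interval-based policy amounts to a small control loop. The sketch below uses a simple threshold-and-step rule with illustrative constants; the papers explore several such rules:

```python
# Interval-based DVS in the spirit of [Weiser94]: every interval,
# nudge the speed toward the CPU utilization observed in the previous
# interval. Thresholds, step size, and bounds are illustrative.
S_MIN, S_MAX, STEP = 0.2, 1.0, 0.2

def next_speed(speed, utilization):
    if utilization > 0.9:                 # falling behind: speed up
        return min(speed + STEP, S_MAX)
    if utilization < 0.5:                 # mostly idle: slow down
        return max(speed - STEP, S_MIN)
    return speed

speed = 1.0
for util in [0.3, 0.2, 0.4, 0.95, 0.7, 0.1]:
    speed = next_speed(speed, util)
    print(f"util={util:.2f} -> speed={speed:.1f}")
```

Because the decision is based only on the previous interval, the loop tracks slowly varying workloads well but can mispredict bursty ones — the weakness the deadline-based schemes later in the lecture address.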
Adaptive Voltage Scaling in Asynchronous Systems

from [Nielsen94]

  • Exploit data dependent computation time to vary the supply voltage
Adaptive Voltage Scaling in Synchronous Systems

[Diagram: Data_In enters a FIFO input buffer feeding a generic synchronous DSP (Data_Out); a workload filter samples the buffer occupancy and drives a rate controller; an actively damped ring oscillator and switching supply generate Var_VDD and Var_CLK from Ref_CLK — from Gutnik & Chandrakasan (1996 VLSI Circuits Symposium)]
Problem with Workload Averaging Approach
  • Can handle average throughput constraints
    • e.g. DSP, general-purpose workstation
  • But, can’t ensure deadline constraints are met
  • Deadlines are often critical
    • protocols, target tracking

FIFO scheduling with workload averaging is not good for RTOSs!

Power Management Opportunities in RTOS Task Scheduling
  • Low processor utilization factor (U)
    • E.g., CNC task set: U = 0.48

Result: Long intervals with processor being idle (power wastage)

  • Instance to instance task exec. time varies, and is usually much lower than worst case exec. time
    • E.g., JPEG decoding: WCET / BCET = 9.7

Reasons: (a) Program control flow depends on inputs, (b) cache effects, and (c) floating point ops have variable latency

Result: Additional processor idle intervals (power wastage)

Choice of Power Management Strategy
  • Shutdown the processor whenever idle
    • Power savings ∝ shutdown duration
      • Power savings increase linearly with shutdown time
  • Slowdown the processor (i.e., reduce the clock frequency) and scale the supply voltage (DVS) to just meet the instantaneous computational load
    • Power consumption ∝ (supply voltage)²
      • Quadratic savings (better than shutdown)
    • Shutdown becomes the secondary strategy, and is used to augment voltage scaling when further DVS is not possible

Task scheduling problem also becomes a voltage scheduling one

Example

[Timeline: rate-monotonic schedule of T1, T2, T3 over one hyperperiod (0–120), with idle intervals at 74–80 and 107–120]
  • Consider task set (period, WCET, deadline)
    • { T1(30, 10, 30), T2(40, 17, 40), T3(120, 10, 120)}
  • CPU utilization = 10/30 + 17/40 + 10/120 = 84.17%
  • Shutdown when idle → 15.83% power savings
  • Slowdown by 15.83%?? → NO! Task T2 misses its deadline
How much static slowdown is possible while guaranteeing satisfaction of timing constraints?
Global Static Slowdown
  • Key idea: Slow down the CPU so that the task set is just schedulable, and any further CPU slowdown would make the task set non-schedulable
  • Option 1: Slow down until U reaches the bound n·(2^(1/n) − 1)

Problem: Very pessimistic. In the above example,

U = 0.8417 > 3·(2^(1/3) − 1) ≈ 0.78 → no slowdown possible!

  • Option 2: Perform RM analysis (e.g., response time)
      • Use the schedulability test to find the maximum slowdown S such that the task set {(T1, S·C1), (T2, S·C2), … (Tn, S·Cn)} is schedulable
      • Shutdown whenever the CPU is idle
    • For the above example, S = 30/27 = 1.11 → U = 93.43%
    • Power savings = 27.8%
    • Is this the best we can do? Not really…
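Option 2 can be sketched with standard rate-monotonic response-time analysis plus a binary search for the largest slowdown factor S; on the example task set it recovers S ≈ 30/27 (deadlines taken equal to periods):

```python
# Max global slowdown via RM response-time analysis.
# Tasks: (period, wcet), deadlines equal to periods, RM priorities.
import math

tasks = [(30, 10), (40, 17), (120, 10)]   # T1, T2, T3

def schedulable(tasks, s):
    for i, (t_i, c_i) in enumerate(tasks):
        r = s * c_i                        # fixed-point iteration on R_i
        while True:
            r_new = s * c_i + sum(
                math.ceil(r / t_j) * s * c_j for t_j, c_j in tasks[:i]
            )
            if r_new > t_i:
                return False               # deadline (= period) missed
            if r_new == r:
                break
            r = r_new
    return True

lo, hi = 1.0, 4.0                          # binary search for max slowdown
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if schedulable(tasks, mid) else (lo, mid)

print(f"max slowdown S = {lo:.4f}  (30/27 = {30/27:.4f})")
```

T2 is the binding constraint: stretching its response time past 30 pulls in a second instance of T1, which immediately pushes T2 past its 40 ms deadline.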
Per-task Static Slowdown
  • Consider the previous example… during static slowdown, T2 was the bottleneck that decided the global slowdown factor S = 1.11
  • Can we slow down T2 beyond a factor of 1.11?
    • Obviously not, because it is the critical task
  • How about T1?
    • T1 has higher priority than T2 → it executes before T2. Slowing down T1 also delays T2
    • Therefore, the answer is NO.
  • How about T3?
    • T2 executes before T3. Therefore, slowing down T3 should not affect response time of T2. YES!
Per-task Static Slowdown (contd.)

[Timeline: with slowdown factors of 1.11 for T1/T2 and 1.9 for T3, the previous idle intervals are absorbed and the CPU stays busy through the hyperperiod]
  • Key idea: Using a slowdown factor of 1.11 for T1 and T2, slowdown T3 further till it becomes just schedulable
    • Final slowdown factors: T1 = 1.11, T2 = 1.11, T3 = 1.9
    • U = 1.00 (CPU completely utilized!)
    • Power savings: 31.42%
  • This is the optimal static solution!
Static Slowdown Algorithm

Repeat until no more tasks are left to schedule:

Compute the set of critical tasks (i.e., bottlenecks), and the slowdown factor

Set the slowdown factors so that the task set is just schedulable

Remove all tasks with priority greater than the lowest-priority critical task and continue

Observation #1: Execution-time Variation
  • Significant variation in execution time of real-time tasks
    • WCET:BCET often >> 1
    • e.g. on a test run, MPEG decoder time range [0.003s, 0.15s] with average = 0.035s
    • e.g. compressed speech playout task has different time for talkspurt vs. silence
  • But, the variation is not random, due to correlation in the underlying signal (speech, sensor etc.)
Observation #2: Applications Tolerant to Deadline Misses
  • E.g. sensor networks
  • Computation deadline misses lead to data loss
  • Packet loss common in wireless links
    • e.g. a wireless link of 1E-4 BER means packet loss rate of 4% for small 50 byte packets
    • radio links in sensor networks often worse
  • Significant probability of error in sensor signals
    • noisy sensor channels
  • Applications designed to tolerate noisy/bad data by exploiting spatio-temporal redundancy
    • high transient losses acceptable if localized in time or space

If the communication is noisy, and applications are loss tolerant, is it worthwhile to strive for perfect noise-free computing?

Exploiting Execution-time Variation and Tolerance to Deadlines
  • Idea: predict execution time of task instance and dynamically scale voltage so as to minimize shutdown
  • Execution time prediction
    • learn distribution of execution times (pdf)
    • Tasks with distinct modes can help the OS by providing a hint after starting
      • E.g. MPEG decode can tell the OS, once it has examined the frame, whether it is an I, P, or B frame
  • But, some deadlines are missed!
  • Adaptive control loop to keep missed deadlines < limit
  • Provides adaptive power-fidelity trade-off
Dynamic Slowdown Algorithm

Set dynamic slowdown factor by stretching the adaptive predicted completion time to the worst-case completion time.

Predict the execution time of next instance based on past history of execution times

Adaptive control mechanism to vary the aggressiveness of the prediction scheme. More deadline misses cause prediction to turn conservative.
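A common concrete form of this predictor is an exponentially weighted average of past instance times, biased toward the WCET when deadlines are being missed. The weights and the aggressiveness step below are illustrative, not the paper's exact constants:

```python
# Execution-time predictor for dynamic slowdown: exponential average of
# past instance times, nudged toward WCET after a deadline miss.
# alpha and the bias steps are illustrative constants.
WCET = 10.0

class Predictor:
    def __init__(self, alpha=0.7):
        self.alpha = alpha        # weight on history
        self.avg = WCET           # start conservative
        self.k = 0.0              # conservativeness bias in [0, 1]

    def predict(self):
        # blend the running average with WCET; k=1 is fully conservative
        return self.avg + self.k * (WCET - self.avg)

    def update(self, actual, missed_deadline):
        self.avg = self.alpha * self.avg + (1 - self.alpha) * actual
        if missed_deadline:
            self.k = min(1.0, self.k + 0.25)   # back off toward WCET
        else:
            self.k = max(0.0, self.k - 0.05)   # slowly regain aggressiveness

p = Predictor()
for t in [4.0, 5.0, 4.5, 4.8]:
    p.update(t, missed_deadline=False)
print(f"prediction = {p.predict():.2f} (WCET = {WCET})")
```

The slowdown factor is then chosen to stretch this predicted completion time out to the worst-case completion time, and the miss-driven bias k implements the power-fidelity control loop.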

Dynamic Slowdown
  • Consider the previous example: if the first instances of T1 and T2 have execution times of 0.5 times their respective WCETs
    • Power savings: 46.76%
  • Further enhancement: monitor whether the currently executing task is the only request in the system. If so, slow down until it reaches the arrival time of the next request, or its deadline, whichever is earlier.
Performance of Predictive DVS for Adaptive Power-Fidelity Tradeoff

[Figure: normalized energy vs. average execution time / worst-case execution time]

  • Result: up to 75% reduction in energy over worst-case-based voltage scheduling, with negligible loss in fidelity (up to 4% deadline misses), on a variety of multimedia and signal processing tasks
Extensions to EDF
  • Assuming Di = Ti, the per-task static slowdown algorithm reduces to the global static slowdown algorithm.
  • Dynamic slowdown algorithm does not change
Energy in Radio: Recap….

[Figure: wireless link block diagram: sender (Tx) with transmit electronics and power amplifier, channel, receiver (Rx) with receive electronics]

  • Wireless communication subsystem consists of three components with substantially different characteristics
  • Their relative importance depends on the transmission power of the radio

Examples

[Figure: energy per bit (nJ/bit) at three ranges: ~10 m (Mote), ~50 m (WLAN), ~1 km (GSM)]

  • The RF energy increases with transmission range
  • The electronics energy for transmit and receive are typically comparable

Energy Consumption of the Sender
  • Parameter of interest:
    • energy consumption per bit

[Figure: sender energy vs. transmission time in two regimes: RF dominates at short transmission times, electronics dominates at long ones]

Effect of Transmission Range

[Figure: energy vs. transmission time for short-, medium-, and long-range radios]

[Figure: power vs. time and energy vs. transmission time, with and without shutdown, within the allowed time]

Radio Energy Management #1: Shutdown
  • Principle
    • Operate at a fixed speed and power level
    • Shut down the radio after the transmission
    • No superfluous energy consumption
  • Gotcha
    • When and how to wake up?
    • Paging radio, wakeup protocol
[Figure: power vs. time: transmission time stretched toward the available time]

Radio Energy Management #2: Scaling along the Performance-Energy Curve

Principle

  • Vary radio ‘control knobs’ such as modulation and error coding
  • Trade off energy versus transmission time

[Figure: energy vs. transmission time for modulation scaling (fewer bits per symbol) and code scaling (more heavily coded)]

When to Scale?

[Figure: convex energy vs. transmission time curve with minimum Emin at t*; RF dominates to the left (scaling beneficial), electronics dominates to the right (scaling not beneficial)]

  • Scaling results in a convex curve with an energy minimum Emin
  • It only makes sense to slow down to transmission time t* corresponding to this energy minimum
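The convex curve and its minimum t* can be illustrated with a toy first-order model: for B bits sent in time t, the electronics term grows linearly with airtime while the RF term explodes as the implied bits-per-symbol rises. All constants and the functional form here are illustrative assumptions, not values from the lecture:

```python
# Toy model (assumed): E(t) = P_elec * t + C_rf * t * (2**b - 1),
# where b = B / (Rs * t) is the bits/symbol needed to finish in time t.
P_elec = 0.30    # W, transmit/receive electronics (illustrative)
C_rf = 1e-3      # W per unit of (2^b - 1), illustrative RF scale
Rs = 1e6         # symbols/s
B = 1e6          # bits to send

def energy(t):
    b = B / (Rs * t)                       # modulation level implied by t
    return P_elec * t + C_rf * t * (2 ** b - 1)

times = [0.05 * i for i in range(2, 41)]   # candidate transmission times (s)
t_star = min(times, key=energy)            # around 0.15 s for these constants
print(t_star, energy(t_star))
```

Slowing down past t* only raises energy again, which is exactly why the next slides combine scaling (up to t*) with shutdown (beyond it).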
[Figure: power vs. time for three allowed times: transmit slower as the allowed time grows, but only down to transmission time t*, then hold at t* and shut down]

Scaling vs. Shutdown


  • Use scaling while it reduces the energy
  • If more time is allowed, scale down to the minimum energy point and subsequently use shutdown

[Figure: energy vs. time: region of scaling up to (t*, Emin), region of shutdown beyond]


Long-range System
  • The shape of the curve depends on the relative importance of RF and electronics
  • This is a function of the transmission range
  • Long-range systems have an operational region where they benefit from scaling

[Figure: energy vs. transmission time: the realizable region overlaps the region of scaling, up to t*]

Short-range Systems


  • Short-range systems have an operational region where scaling is not beneficial
  • Best strategy is to transmit as fast as possible and shut down

[Figure: energy vs. transmission time: the realizable region lies in the region of shutdown, beyond t*]

Sensor Node Radio Power Management Summary

Short-range links

  • Shutdown based
  • Turn-off sender and receiver
  • Topology management schemes exploit this, e.g. Schurgers et al. @ ACM MOBIHOC ‘02


Long-range links

  • Scaling based
  • Slow down transmissions
  • Energy-aware packet schedulers exploit this, e.g. Raghunathan et al. @ ACM ISLPED ‘02



Another Issue: Start-up Time

Ref: Shih et al., Mobicom 2001


Wasted Energy

  • Fixed cost of communication: startup time
    • High energy per bit for small packets

Ref: Shih et al., Mobicom 2001


Communication Energy Model

[Figure: communication energy model: Tx digital processing and electronics, power amp (with its efficiency), path loss over the channel, Rx electronics, and protocol/MAC/link processing on both ends]

Radio parameters:
  • Estart: startup (turn-on) energy
  • n: path loss exponent
  • P1m: attenuation over one meter
  • Pout: output power from the power amp
  • PrxElec: receiver static power
  • PtxElec: transmitter static power

Digital processing parameters:
  • Cbit: switched capacitance per bit
  • Ileak: leakage current
  • Tbit: processing time per bit
  • VDD: supply voltage
  • VTH: transistor threshold voltage

Min et al., Mobicom 2002 (Poster)


Simplified Model

[Figure: energy vs. distance d: static power and digital processing set the floor; the power amp and receiver sensitivity contribute the distance-dependent part]

  • Myth: communication energy scales with d^n
  • Reality: hardware terms dominate

Min et al., Mobicom 2002 (Poster)


Framework for Energy-Performance Scaling

  • Energy model
    • Take circuit energy into account
  • Knobs for energy-performance scaling
    • Coding rate
    • Packet length
    • Transmit power
    • Modulation level
  • Non-intuitive results (relative to what communication theory says) manifest at short distances
    • E.g. Single hop may be better than multiple hops
[Figure: BER vs. SNR for 4-, 16-, 64-, and 256-QAM; as channel SNR varies over time, traditional adaptive modulation picks the densest constellation that still meets the BER target, or stops transmitting]

Exploiting the Modulation Knob
  • Fixed transmit power
  • Target performance (BER)
  • Adapt modulation
    • Varying channel SNR
    • Maximize throughput

Traditional adaptive modulation: maximize throughput

[Figure: required bit rate Rb varies over time; the modulation level (4-, 16-, or 64-QAM, or no transmission) tracks the load under a slowly varying SNR]

Alternative: Exploiting the Modulation Knob for Energy

Ebit(J)

  • Slow varying SNR
  • Target performance (BER)
  • Adapt modulation
    • Varying load Rb
    • Minimize energy Ebit

[Figure: Ebit (J) vs. Rb (Mbit/s) for 4-, 16-, and 64-QAM at SNR = 10, 16, 22, 28 dB]

Modulation scaling: minimize energy

[Figure: Ebit vs. Tbit for modulation levels b = 2, 4, 6: a convex energy-delay curve]

Modulation Scaling

  • Shutdown corresponds to b = 0; slowdown moves along the curve (packet energy L·Ebit vs. packet time L·Tbit)
  • The energy-delay curve is convex
    • Slowing down is more energy efficient than shutting down
  • For energy efficiency, operate as slowly as possible

Energy Consumption of Transmitting a Packet

  • Ptransmit: power consumed by the power amplifier; depends on the required performance and the wireless channel (distance, fading, etc.)
  • Pelectronics: power consumed by the electronic circuitry for filtering, upconverting, modulation, frequency synthesis, etc.
  • Eoverhead: energy consumption that is independent of the packet size and modulation scheme (startup cost, fixed encoded header, etc.)
  • Tbit: time to transmit one bit (depends on modulation and symbol rate Rs)
  • L: size of packet payload
  • H: size of packet header
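The slide's equation itself did not survive this transcript; under the definitions above, these parameters typically combine as (a hedged reconstruction, not the original formula):

```latex
E_{packet} = \left(P_{transmit} + P_{electronics}\right)\,(L + H)\,T_{bit} + E_{overhead}
```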

Energy Consumption of Transmitting a Bit

The slide annotates the terms of the per-bit energy expression:
  • Overhead and header terms: minimize overhead and header size; independent of modulation
  • RF term: optimize modulation (modulation scaling)
  • Electronics term: a function of the target performance, only very weakly dependent on b
  • Symbol-rate factor: equals 1 when there is no variable symbol rate provision

Operate at Max Symbol Rate

It is preferable to operate at the maximum symbol rate that can be implemented efficiently (i.e. without severe penalty)

The energy is a function of the modulation level: there is an optimum value of b, which depends on the parameters of the system
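The optimum b can be shown with a first-order per-bit model in which the RF term grows like 2^b − 1 while the fixed electronics cost is amortized over b bits per symbol. The constants and the exact form are illustrative assumptions, not the lecture's parameters:

```python
# Per-bit energy vs. modulation level b (assumed first-order model):
#   Ebit(b) = (C_rf * (2**b - 1) + E_elec) / b
# The electronics term favors large b, the RF term small b, so an
# interior optimum b* appears.
C_rf, E_elec = 1.0, 20.0   # illustrative constants (nJ)

def ebit(b):
    return (C_rf * (2 ** b - 1) + E_elec) / b

levels = [2, 4, 6, 8]
b_star = min(levels, key=ebit)
print(b_star)  # b = 4 for these constants
```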

Energy per Bit

[Figure: energy per bit vs. symbol rate Rs (Mbaud) and modulation level b (bits/symbol), with the region of modulation scaling marked]

[Figure: energy-delay trade-off: b = 6 needs Ptransmit = 36 mW, b = 4 needs 9 mW, b = 2 needs 2.25 mW]

Energy-Delay Trade-off
[Figure: the analogy: voltage scaling tunes V and f, modulation scaling tunes b and Rb; convex Ebit (J) vs. Tbit (s) curve for b = 2, 4, 6]

Analogy between Dynamic Voltage and Modulation Scaling
  • Scaling modulation on the fly results in energy awareness
  • Strong analogy between modulation scaling and voltage scaling
    • Low power techniques, like parallelism
    • Packet scheduling like task scheduling
    • Other power management techniques
Controlling the Modulation Scaling Knob
  • Who controls the modulation scaling knob?
  • One possibility: packet scheduler
  • Normally wireless packet schedulers decide
    • Which node transmits
    • What packet
    • At what time
  • With modulation scaling, the scheduler decides
    • Which node transmits
    • What packet
    • At what time
    • What modulation setting
[Figure: processor feeds a packet queue; R-DPM sets the radio’s operating point, trading average energy Eav (J) against average delay Tav (s)]

Exploiting DMS for Energy-aware Packet Scheduling
  • Example: adapt the modulation setting based on the number of packets in the queue
  • Different {queue, b} settings result in different points on the energy-delay curve
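A queue-driven policy of this kind might look like the following sketch. The thresholds and levels are assumptions for illustration, not values from the R-DPM work:

```python
# Illustrative R-DPM-style policy: deeper queues pick a faster, costlier
# modulation; a near-empty queue crawls along the energy-delay curve.
# Thresholds are assumptions for this sketch, not values from the paper.
def modulation_for_queue(depth):
    if depth == 0:
        return 0      # nothing to send: shut the radio down (b = 0)
    if depth <= 2:
        return 2      # light backlog: lowest-energy constellation
    if depth <= 6:
        return 4
    return 6          # deep backlog: spend energy to drain quickly

print([modulation_for_queue(d) for d in (0, 1, 4, 9)])
```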
E2WFQ: Combining DMS & WFQ [Raghunathan @ ISLPED-2002]
  • Intuitively, what does fairness mean?
    • Each connection gets no more than what it wants (subject to a max.)
    • The excess, if any, is equally shared
  • In E2WFQ, excess is not distributed; it is used to save energy instead
  • Energy saving opportunities
    • Average input rate of a stream is lower than its guaranteed rate
    • Packet lengths may be variable (often shorter than worst case)
    • Low, time varying link utilization
Energy-aware Real-time Packet Scheduling
  • Analogous to RTOS task scheduling
  • Exploit variation in packet length to perform aggressive DMS

[Figure: energy savings (%) vs. Lavg/Lmax for three schemes: static, static+dynamic, and static+dynamic+stretch]

  • Up to 69% reduction in transmission energy (source: work @ UCLA)
Another Energy vs. Delay Knob in Radios: Coding

[Figure: energy vs. delay for optimal coding and for antipodal signaling with block FEC]

Source: [Prabhakar01] @ Infocom 2001

Putting it All Together: Power-aware Sensor Node

[Figure: power-aware sensor node]
  • CPU: dynamic voltage & frequency scaling
  • Radio: frequency, power, modulation, & code scaling
  • Sensors: scalable sensor processing
  • Coordinated power management across the hardware
  • PA-APIs for communication, computation, & sensing
  • Energy-aware RTOS, protocols, & middleware

Energy-Quality Scaling for Communications

Metrics:
  • Range d
  • Packet error PM
  • Packet delay Ttot
  • Throughput Rtot
  • Energy per bit Etot

Knobs (x1, x2, …):
  • Transmitter’s radiated power Prad
  • FEC rate RC
  • Number of bits per packet N
  • Supply voltage VDD
  • Modulation level b
Where to do the Power Management?
  • Choices: H/W, Firmware, OS, Application, User
  • Hardware & firmware don’t know the global state and application-specific knowledge
  • Users don’t know component characteristics, and can’t make frequent decisions
  • Applications operate independently, and the OS hides machine information from them
  • OS is the most reasonable place, but…
    • OS should incorporate application information in power management
    • OS should expose power state and events to applications for them to adapt
Power Management of Wireless NICs
  • Power modes: transmit, receive, idle, sleep, off
    • typically idle mode (ready but neither receiving nor transmitting) draws power similar to receive mode
    • transmit power in WLANs is 2-3x the receive power
      • difference larger in WWANs (RF power dominates)
      • RF transmit power can often be varied, and with it the NIC’s transmit-mode power, but at the cost of varying BER
    • transition times are significant [Lorch98]
      • HP’s HSDL-1001 IR transceiver takes 10 s to enter sleep mode, and 40 s to wake up
      • Wavelan takes about 100ms to wake up
      • Metricom’s Ricochet takes about 5s to wake up
Power Management Strategies for Wireless NICs
  • Shutdown strategies similar to disks and CPUs
    • sleep/wakeup transition times are much smaller than in disks
    • need to worry about data destined to the system
    • could be done by MAC protocols e.g. 802.11
  • Reduce load in NIC
    • header compression
    • stop data transmission during bad channels
    • use proxies to reduce data fidelity, size, resolution
    • repartition the application
  • Reduce the EIRP (transmitted RF power)
    • impact on BER of self & others, and system capacity
Help from Upper Layers in Power Management of Wireless NIC
  • Minimizing idle time matters the most
    • other factors secondary, such as specific protocol
  • Transport: don’t leave the receiver idle while there is congestion in the network
  • Data scheduling: coordinate data delivery to receiver in bursts
  • S/W control of NI for application-level optimizations
Power Management inside the Radio
  • Digital part can be managed
  • However, managing the RF transceiver is not trivial
  • Powering down sections of the transceiver (e.g. buffers, amplifier stages, oscillator) leads to problems
    • stability
    • turn-around time (e.g. time for PLL to settle etc.)
Mechanisms to Support Power Management
  • Hardware- and firmware-based power management is problem prone
      • transparent to applications
          • e.g. screen goes blank during a slide show
  • Solution: incorporate application knowledge
      • make power management decision based on application’s knowledge of user’s usage pattern
  • OS is the logical place for doing system power management decision making and coordination
      • e.g. Microsoft’s OnNow architecture and API extensions for Windows98 and Windows 2000
Case for Power Management Support in OS
  • Power is a critical, limited, & shared resource
    • OS plays a major role in managing such resources
  • Applications hold an important key
    • application-specific constraints and opportunities for saving energy that can be known only at that level
  • Needs of applications are driving force for OS power management functions & power-based API
    • collaboration between applications and the OS in setting “energy use policy”
      • OS helps resolve conflicts and promote cooperation
Microsoft’s OnNow
  • Win32 API extension allows applications to
    • affect the power management decision making
    • adapt to power state
      • find out if running on batteries so as to reduce processing
      • discover disk state & postpone low-priority I/O, e.g. paging
  • Requires changes in hardware, firmware (BIOS), OS, and application software
    • bus & device power management standards for h/w
    • interface standard between OS & hardware
      • ACPI (Intel & Toshiba)
    • integration of power management into app control
Advanced Configuration and Power Interface (ACPI)
  • Standard way for the system to describe its device config. & power control h/w interface to the OS
    • register interface for common functions
      • system control events, processor power and clock control, thermal management, and resume handling
  • Info on devices, resources, & control mechanisms
    • Description Tables, linked in a "table of tables"
    • description data for each device:
      • Power management capabilities and requirements
      • Methods for setting and getting the power state
      • Hardware resource settings
      • Methods for setting hardware resources
OnNow Components

Ref.: Microsoft’s “OnNow Power Management Architecture for Applications”

OnNow Architecture
  • User’s view: system is either on or off
  • Reality: system transitions among a number of “power states” according to OS’s power policy
  • Global power states
    • working: apps are executing
    • sleep: software is not executing, & CPU is stopped
      • OS tracks user’s activities & application execution states to decide when to enter sleep (monitors user input and hints from applications)
      • wake-up is time-based or device-based
    • off: system has shutdown and must reboot
OnNow Architecture (contd.)

OnNow global power states:
  • Working (full on): device and processor power conservation occurring according to system usage
  • Sleeping (appears off): processor stopped, devices off; wake-up events and timed wake-up enabled; entered when idle, exited on wake-up
  • Off (SoftOff): able to turn on electronically; entered and exited via the power switch

Ref.: Microsoft’s “OnNow Power Management Architecture for Applications”

OnNow Architecture (contd.)
  • Power states for individual devices
    • managed by device drivers while system is “working”
      • function of application needs, device capabilities, and OS information
    • e.g. shutdown serial port if not in use
  • Power states for CPU
    • OS transitions CPU between its various low-power states based on CPU usage
      • function of power source, processing time, user preferences etc.
  • API mechanisms
    • for apps to learn about power events & status from OS
    • for apps to tell their requirements to the OS
Self-Tuning Power Management (Anand et al., MobiCom 2003)
  • Linux-based power management system for mobile devices

User or Energy Aware OS


802.11 Power Management

Network interface may be continuously-active (CAM)

  • Large power cost (~1.5 Watts)
  • May halve battery lifetime of a handheld

Alternatively, can use power-saving mode (PSM)

  • If no packets at access point, client interface sleeps
  • Wakes up periodically (beacon every 100 ms)
  • Reduces network power usage 70-80%

Effect of Power Management on NFS

Time to list a directory on handheld with Cisco 350 card

  • PSM-static: 16-32x slower, 17x more energy
  • PSM-adaptive: up to 26x slower, 12x more energy

What’s Going On?

NFS issues RPCs one at a time…

[Figure: mobile client, access point, and NFS server; with 100 ms beacons, each RPC request/response pair waits for the next beacon]

  • Each RPC delayed 100ms – cumulative delay is large
    • Affects apps with sequential request/response pairs
    • Examples: file systems, remote X, CORBA, Java RMI…
Know Application Intent
  • Not enough network traffic to switch to CAM
  • Data rate is dependent on the power management mode

Application: NFS file access

Best policy: use CAM during the activity period

[Figure: beacon-period timeline under PSM vs. CAM]


Know Application Intent

Application: stock ticker receiving 10 packets per second

Best policy: use PSM

  • Data rate is not dependent on power management

[Figure: beacon-period timeline under PSM vs. CAM]

  • STPM allows applications to disclose hints about:
    • when data transfers are occurring
    • how much data will be transferred (optional)
    • max delay on incoming packets

Be Proactive

Transition cost of changing power mode: 200-600 ms.

Large transfers: use a reactive strategy

- If transfer large enough, should switch to CAM

- Break-even point depends on card characteristics

- STPM calculates this dynamically

Many applications (like NFS) only make short transfers: be proactive

- Benefit of being in CAM small for each transfer

- But if many transfers, can amortize transition cost

- STPM builds empirical distribution of network transfers

- Switches to CAM when it predicts many transfers likely in future


Respect the Critical Path

Many applications are latency sensitive

- NFS file accesses

- Interactive applications

- Performance and Energy critical

Other applications are less sensitive to latency

- Prefetching, asynchronous write-back (Coda DFS)

- Multimedia applications (with client buffering)

- Only energy conservation critical

Applications disclose the nature of transfer: foreground or background


Embrace Performance vs. Energy Tradeoff

Inherent tradeoff exists between performance and energy conservation

STPM lets user specify relative priorities using a tunable knob


Adapt to the operating environment

Must consider base power of the mobile computer

Consider mode that reduces network power from 2W to 1W

- Delays interactive application by 10%

On handheld with base power of 2 Watts:

- Reduces power 25% (from 4W to 3W)

- Energy reduced 17.5% (still pretty good)

On laptop with base power of 15 Watts:

- Reduces power by only 5.9%

- Increases energy usage by 3.5%

- Battery lasts longer, user gets less work done
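The slide's arithmetic can be verified directly: with energy = power x time, a mode that cuts radio power from 2 W to 1 W but stretches the task by 10% saves energy on a low-base-power handheld yet costs energy on a laptop:

```python
# Verify the handheld-vs-laptop numbers from the slide.
def savings(base_w, slowdown=1.10, radio_before=2.0, radio_after=1.0):
    p0 = base_w + radio_before
    p1 = base_w + radio_after
    power_cut = 1 - p1 / p0                # fraction of total power saved
    energy_cut = 1 - (p1 * slowdown) / p0  # energy = power x time
    return power_cut, energy_cut

print(savings(2.0))    # handheld: 25% power and 17.5% energy saved
print(savings(15.0))   # laptop: 5.9% power saved, but energy rises 3.5%
```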


Transition to CAM

STPM switches from PSM to CAM when:

  • Application specifies max delay < beacon period
  • Disclosed transfer size > break-even size
  • Many forthcoming transfers are likely

To predict forthcoming transfers, STPM builds an empirical distribution of run lengths

[Figure: transfers separated by gaps of more than 150 ms are grouped into runs]


Intuition: Using the Run-Length History

A good time to switch

Switch when expected # of transfers remaining in run is high

Expected Time to complete a Run

The expected time to complete a run sums three terms:
  • expected time to execute transfers in PSM mode
  • expected time to execute the rest of the transfers in CAM mode
  • time penalty for making a PSM-to-CAM switch
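The run-length estimate feeding these expectations might be sketched as follows; the history values are illustrative, and this is only the conditional-mean piece of STPM's calculation, not the full algorithm:

```python
# Expected transfers remaining in the current run, given that k transfers
# have already been seen, from an empirical history of run lengths.
def expected_remaining(history, k):
    longer = [r for r in history if r > k]   # runs that survived past k
    if not longer:
        return 0.0
    return sum(r - k for r in longer) / len(longer)

history = [1, 1, 2, 8, 9, 10]          # observed run lengths (transfers)
print(expected_remaining(history, 2))  # runs that get past 2 tend to run long
```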

Expected Energy to complete a Run
  • Energy calculation includes base power
Performance and Energy Tradeoff

Calculate expected time and energy to switch after each # of transfers

  • What if these goals conflict?
  • Refer to knob value for relative priority of each goal!
Results for Coda Distributed File System

Workload: 45 minute interactive software development activity

[Figure: energy (Joules) and time (minutes) per power management strategy]

STPM: 21% less energy, 80% less time than 802.11b power mgmt.

Results for Coda on IBM T20 Laptop

Same workload as before: effect of base power on power mgmt strategies

[Figure: time (minutes) and energy (Joules) per strategy on the laptop]

PSM-Static and PSM-Adaptive use more energy than CAM!

Results for XMMS Streaming Audio

Workload: 128Kb/s streaming MP3 audio from an Internet server

Effect of knowing application intent

[Figure: average power (Watts) per strategy]

  • XMMS buffers data on client:
  • App not latency sensitive
  • PSM uses least power

STPM: 2% more power usage than PSM-static, with no dropped packets

Beyond Energy Efficiency: Battery-aware Design
  • Theoretical capacity of battery is decided by the amount of the active material in the cell
    • batteries often modeled as buckets of constant energy
      • e.g. halving the power by halving the clock frequency is assumed to double the computation time while maintaining constant computation per battery life
  • In reality, delivered or nominal capacity depends on how the battery is discharged
    • discharge rate (load current)
    • discharge profile and duty cycle
    • operating voltage and power level drained
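The discharge-rate dependence can be illustrated with Peukert's law, a standard first-order battery model (named here as a stand-in; it is not the electrochemical model the later slides use):

```python
# Rate-capacity effect via Peukert's law: runtime t = C / I**alpha,
# so the delivered charge I*t shrinks as the discharge current I grows.
def delivered_capacity(i_amps, c=1.0, alpha=1.2):
    t_hours = c / i_amps ** alpha
    return i_amps * t_hours    # amp-hours actually delivered

print(delivered_capacity(0.1), delivered_capacity(2.0))  # light vs. heavy load
```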
Problem Formulation
  • Due to battery properties, the way in which a battery is discharged affects its actual lifetime
  • Static approach: fix U(t) to a constant, and find the optimal U that maximizes S
  • Dynamic approach: use a heuristic to find the optimal U(t)
Example Scenario

Goal: transfer as much sensor data as possible with the given battery capacity

Resource: Communication

Protocol: similar to 802.11 Ad-hoc Power Savings Mode

  • Simulation Setup
  • Power Number: Based on Measurement of Medusa (Mote)
  • Battery Simulator: battery model from John Newman’s group at the UC Berkeley Chemistry Dept.; exhibits both the rate-capacity effect and the relaxation effect
[Figure: one radio cycle (TX at a 19.2 kbps data rate, RX, sleep, RX) and its variable power profile]

Radio Throughput vs. Power

Radio bandwidth RAPP: effective data rate

Battery Capacity Curve
  • Battery model shows the Rate Capacity Effect
Static Battery State Aware Approach
  • T=CAPactual(P)/P
  • SAPP=RAPP(P)·T

Actual output when the non-battery-aware approach is applied

Can we do better than Static Approach?

Discharge current toward the end of battery life increases rapidly due to the DC/DC converter.

Dynamic Battery State Aware Approach- Based on Voltage Slope
  • Make a linear projection based on voltage slope change in past intervals
Dynamic Battery State Aware Approach- Based on Voltage Slope
  • 720% improvement over non-battery aware approach
  • 20% improvement over the static approach
  • Strengths
    • makes use of battery voltage information
    • responds to battery state
Result
  • 75% improvement over non-battery aware approach
  • 18% improvement over the static approach (ranges from 12% to 25%)
Beyond Reduction: Energy Harvesting
  • Extract energy from the environment and store it in a capacitor or battery
    • Wind
    • Solar
    • Vibration/Motion
    • Chemical
  • Challenge: how to manage energy harvesting?
    • Variation in harvesting opportunities
      • E.g. light level is a function of node location
    • How to extract maximum performance?

Prototypes from IASL, UWE, Bristol.

[Figure: network-level mechanisms: topology control, routing, clustering]

Harvesting-aware Network-level Tasking
  • Tasking aware of battery status & harvesting opportunities
    • Richer nodes take more load
    • Looking at the battery status is not enough
  • Learn the energy environment

[Figure: control loop: learn local energy characteristics, predict future energy opportunity, and learn consumption statistics; these feed a distributed decision for scheduling]
Example: Solar Harvesting Aware Routing

[Figure: routes chosen in the morning vs. in the afternoon as solar availability shifts]

Simulation using light traces from James Reserve HelioMote Platform