
Integrated Management of Power Aware Computing & Communication Technologies

Kickoff review meeting

Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi, University of California, Irvine, ECE Dept.

DARPA Contract F33615-00-1-1719

September 27, 2000



Agenda

  • Introduction and overview

  • Management status: financial, milestones, schedule

  • Technical presentation

    • Task progress

      • Architecture

      • Applications

      • CAD

    • Lessons learned, challenges, issues.

  • Questions and action-item review



Outline

  • Introduction

    • Program goals

    • Project overview

  • Management status

    • Personnel and teaming plans

    • Plans and milestones

    • Financial information

  • Technical presentation

    • Background

    • Technical approach

    • Status and accomplishments

    • Current detailed schedule

  • Program impact and anticipated transitions



Introduction



Program Goals

  • Power-aware system-level design

    • Enhance mission success (time, task)

    • Rapid customization for different missions

  • Design tool

    • Exploration & evaluation

    • Optimization & specialization

    • Technique integration

  • System architecture

    • Statically configurable

    • Dynamically adaptive

    • Use COTS parts & protocols



Technical approach

  • High-level specification

    • Separate behavior from architecture

    • Explicit constraints (timing, power)

    • Library characterization

  • System synthesis tool

    • Source-aware power usage scheduling

    • Bus topology transformation and communication scheduling

  • Configurable architecture

    • Task migration & selective shutdown

    • Bus segmentation and voltage scaling

  • Domain knowledge

    • Encompass mechanical / thermal power

    • Aware of power supply model


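Quad Chart

[Figure: IMPAC2T flow diagram. Behavior side: behavioral system model built from high-level components and composition operators, with high-level simulation and functional partitioning & scheduling. Architecture side: system architecture built from parameterizable components, buses, and protocols. The two meet in mapping, system integration & synthesis, static configuration, and dynamic power management.]

Innovations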

  • Component-based power-aware design

    • Exploit off-the-shelf components & protocols

    • Best price/performance, reliable, cheap to replace

  • CAD tool for global power policy optimization

    • Optimal partitioning, scheduling, configuration

    • Manage entire system, including mechanical & thermal

  • Power-aware reconfigurable architectures

    • Reusable platform for many missions

    • Bus segmentation, voltage / frequency scaling

Schedule (timeline marks: kickoff, 2Q 00, 2Q 01, 2Q 02)

Year 1

  • Static & hybrid optimizations

    • partitioning / allocation

    • scheduling

    • bus segmentation

    • voltage scaling

  • COTS component library

  • FireWire and I2C bus models

  • Static composition authoring

  • Architecture definition

  • High-level simulation

  • Benchmark identification

Year 2

  • Dynamic optimizations

    • task migration

    • processor shutdown

    • bus segmentation

    • frequency scaling

  • Parameterizable components library

  • Generalized bus models

  • Dynamic reconfiguration authoring

  • Architecture reconfiguration

  • Low-level simulation

  • System benchmarking

Impact

  • Enhanced mission success

    • More tasks for the same power

    • Dramatic reduction in mission completion time

  • Cost saving over a variety of missions

    • Reusable platform & design techniques

    • Fast turnaround time by configuration, not redesign

  • Confidence in complex design points

    • Provably correct functional/power constraints

    • Retargetable optimization to eliminate overdesign

    • Power protocol for massive scale



Innovations

  • Component-based power-aware design

    • Exploit off-the-shelf components & protocols

    • COTS offer best price/performance, reliable, cheap to replace

  • CAD tool for global power policy optimization

    • Optimal partitioning, scheduling, configuration

    • Manage entire system, including mechanical & thermal

  • Power-aware reconfigurable architectures

    • Reusable platform for many missions

    • Bus segmentation, voltage / frequency scaling



Impact

  • Enhanced mission success

    • More tasks for the same power

    • Dramatic reduction in mission completion time

  • Cost saving over a variety of missions

    • Reusable platform & design techniques

    • Fast turnaround time by configuration, not redesign

  • Confidence in complex design points

    • Provably correct functional/power constraints

    • Retargetable optimization to eliminate overdesign

    • Power protocol for massive scale



Management Status



Personnel & teaming plans

  • UC Irvine, co-PIs: design tools

    • Nader Bagherzadeh

    • Pai Chou

    • Fadi Kurdahi

  • UC Irvine, research assistants

    • Dexin Li

    • Jinfeng Liu

    • Afshin Niktash

  • USC: component power optimization

    • Jean-Luc Gaudiot

    • Seong-Won Lee

  • JPL: applications and benchmarking

    • Nazeeh Aranki

    • Nikzad “Benny” Toomarian



Previous work

  • Design tools

    • System level: the Chinook HW/SW codesign tool

    • Architectural synthesis (w/ physical design considerations)

  • Components

    • Reconfigurable computing: the MorphoSys chip

    • Parameterizable components: PCL

    • Simultaneous multithreading vs. chip multiprocessing

  • Architectural platform

    • Segmented bus: X-2000, Mars Pathfinder

    • Configurable SMP



Responsibilities

  • Bagherzadeh, Chou, Kurdahi -- co-PIs

    • Oversee project operation

    • Integration into curriculum and related research efforts

  • Li, Liu, Niktash -- RAs

    • Development of CAD tools

    • Modeling of demonstrator examples

    • Authoring of component / protocol library

  • JPL

    • Furnish example specifications

    • Co-develop optimization techniques

  • USC

    • Supporting link to low-level technologies



External collaborations

  • JPL

    • X-2000 multi-mission architecture

    • Mars Pathfinder as baseline

    • JPL to provide COTS testbed

    • JPL to evaluate IMPACCT optimizations

  • USC

    • Parameterizable components

    • Low-level power estimation

  • Consystant Design Technologies (Seattle, WA)

    • Framework for component-based design

    • IMPACCT plugins to support power management



Technical Background



Background: MorphoSys project

  • Reconfigurable processor array

  • MIPS-like RISC processor

  • High-bandwidth data interface

  • 100 MHz clock

  • 0.35 µm, 4-metal CMOS

  • Software support

  • Platform for dynamic power management

[Figure: MorphoSys block diagram: advanced RISC processor, MorphoSys reconfigurable processor array, instruction/data cache (L1), high-bandwidth data interface, system bus, and external memory (e.g. SDRAM, RDRAM)]


RC Array and Context Memory

[Figure: 8 x 8 array of reconfigurable cells (RC), with the context memory split into a column block and a row block]

  • Context memory

    • 2 blocks

    • 8 sets in each block

    • A set controls 1 row or column (SIMD)

    • 16 contexts in 1 set

  • Possible to overlap context broadcast with context reloading
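A toy model can make the context-memory organization concrete. This sketch assumes the M1 layout of an 8 x 8 RC array, with set i of the row block driving row i and set j of the column block driving column j; the class and method names (`ContextMemory`, `load`, `broadcast`) are hypothetical, and context words are plain integers rather than real MorphoSys encodings.

```python
from dataclasses import dataclass, field

ROWS = COLS = 8           # assumed 8 x 8 RC array
SETS_PER_BLOCK = 8        # one set per row (row block) or per column (column block)
CONTEXTS_PER_SET = 16     # 16 contexts in 1 set


def _empty_block():
    return [[0] * CONTEXTS_PER_SET for _ in range(SETS_PER_BLOCK)]


@dataclass
class ContextMemory:
    # Two blocks, each holding 8 sets of 16 context words
    blocks: dict = field(default_factory=lambda: {"row": _empty_block(),
                                                  "column": _empty_block()})

    def load(self, block, set_idx, ctx_idx, word):
        """Write one context word (in hardware this can overlap with a broadcast)."""
        self.blocks[block][set_idx][ctx_idx] = word

    def broadcast(self, block, ctx_idx):
        """Return the 8 x 8 grid of context words executed by the RCs."""
        grid = [[None] * COLS for _ in range(ROWS)]
        for s in range(SETS_PER_BLOCK):
            word = self.blocks[block][s][ctx_idx]
            for k in range(ROWS):
                if block == "row":
                    grid[s][k] = word    # set s drives every RC in row s (SIMD)
                else:
                    grid[k][s] = word    # set s drives every RC in column s (SIMD)
        return grid


cm = ContextMemory()
cm.load("row", 2, 5, 0xAB)
row_grid = cm.broadcast("row", 5)   # row 2 executes 0xAB, all other rows execute 0
```

Because one set feeds a whole row or column, 2 x 8 x 16 = 256 context words are enough to configure all 64 cells.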



The M1 chip layout



M1 chip test fixture


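Software environment

[Figure: MorphoSys software environment. Application C code (TR_app, e.g. a = b + c; p = a + 1) and RC array functions from the context library (e.g. Z=RC_F(X), W=RC_F(Y)) are compiled by mcc into configuration contexts (binary context words); mSched produces the executable, which targets the MorphoSys chip or the MuLate and MorphoSim simulators (C++ and VHDL models of TinyRISC and the RC array); the tools mLoad and mView are also shown.]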



Background on USC's SMT work

  • High performance processors

    • Superscalar processor (SSP)

    • Single chip multiprocessor (CMP)

    • Very long instruction word (VLIW)

    • Simultaneous multithreading (SMT)

  • Performance and power dissipation

    • High performance requires high power consumption

  • Recent applications need low-power, high-performance processors



Microarchitectural tradeoffs

  • Power tradeoffs between different architectures

    • SMT vs. SSP:

      • SMT has more modules than SSP

      • SMT has better performance and consumes more power

    • SMT vs. CMP:

      • SMT has better utilization

      • They have similar performance, but SMT consumes less power

    • SMT vs. VLIW:

      • SMT consumes more power

      • SMT is compatible with conventional architectures

  • Design of simple SMT

    • A simplified SMT may consume less power and still have the advantage of TLP

  • Analysis of architectural features

    • Power drain of modern processors (control vs. data path)



SMT design methodology

  • Measuring power consumption of a processor

    • Checking transitions of signals and module operations

    • Hardware implementation of the processor simulator

  • Measuring performance of modules

    • The contribution of each module to the total performance

    • Performance-power ratio of each module

  • Comparison between architectures

  • Design of a low power processor



Measuring performance

  • Finding the performance per power of each module

    • Simulate and measure the performance without a module

    • Calculate the performance per power for each module

    • Group modules that cooperate with each other and evaluate each group as a unit

  • Find a low-power, high-performance processor design
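The ablation procedure above (remove a module, re-simulate, divide the lost performance by the module's power) reduces to a few lines. In this sketch the module names and numbers are hypothetical stand-ins for simulator output, and `perf_per_power` is an illustrative helper rather than part of the USC tool flow.

```python
def perf_per_power(full_perf, perf_without, module_power):
    """Rank modules by performance contribution per watt.

    full_perf       -- performance of the complete processor (e.g. IPC from simulation)
    perf_without[m] -- performance measured with module m removed
    module_power[m] -- power attributed to module m, in watts
    """
    ratios = {}
    for module, perf in perf_without.items():
        contribution = full_perf - perf            # performance lost without the module
        ratios[module] = contribution / module_power[module]
    # Highest performance-per-watt first; low-ratio modules are candidates to simplify
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical simulation results for three modules
ranking = perf_per_power(
    full_perf=2.0,
    perf_without={"branch_predictor": 1.6, "second_alu": 1.9, "prefetcher": 1.98},
    module_power={"branch_predictor": 0.8, "second_alu": 1.0, "prefetcher": 0.4},
)
print(ranking)
```

Modules at the bottom of the ranking are the first candidates to drop when designing a simplified, lower-power SMT core.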



Background: Chinook project

  • Component-based HW/SW codesign framework

    • Specification, simulation, synthesis

    • Motivated by IP reuse, system integration

  • Problem: IP reuse forces modification

    • Reason: components have hardwired coordination protocols

  • Approach

    • Adaptable components

    • Separate coordination protocols from components

  • Benefits

    • Reuse without modification

    • Enable system-level optimizations


Example protocol: Subsumption

[Figure: subsumption-based robot controller. Sensors (joystick, bumper, sonar) feed decision modules (override, escape, avoid); decision composition by subsuming and yielding drives the wheels (actuators). Each decision module is in state s (subsuming), y (yielding), or i (idle).]

  • Must handle three cases:

    • Subsuming, yielding, idle

    • Hardwired protocol

  • Generalization:

    • Adaptable components (by mode mapping)

    • Separate protocols & components

[Figure: the generalized subsumption interface (modes idle, subsuming, yielding) and a Bumper process adapted to it by mode mapping; the process reacts to bump and release events and includes a 45d turn and a 2 s timer (T).]
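Separating the protocol from the components can be sketched as a single arbiter that knows only the three subsumption modes, while each decision module merely reports whether it wants control. The module names follow the robot example above; the commands and the `arbitrate` function are hypothetical, and in Chinook the coordination logic would be synthesized rather than hand-written.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    SUBSUMING = "subsuming"
    YIELDING = "yielding"


def arbitrate(requests):
    """Resolve one control step of a subsumption hierarchy.

    requests -- list of (module_name, wants_control, command), ordered from highest
                to lowest priority. Components only report whether they want control;
                the coordination protocol lives here, not inside the components.
    Returns (modes, command): the highest-priority requester subsumes and its command
    reaches the actuators, other requesters yield, and the rest are idle.
    """
    modes, winner, command = {}, None, None
    for name, wants_control, cmd in requests:
        if not wants_control:
            modes[name] = Mode.IDLE
        elif winner is None:
            winner, command = name, cmd
            modes[name] = Mode.SUBSUMING
        else:
            modes[name] = Mode.YIELDING
    return modes, command


# The bumper's escape behavior subsumes sonar avoidance; joystick override is idle
modes, cmd = arbitrate([("override", False, None),
                        ("escape", True, "turn_45"),
                        ("avoid", True, "veer_left")])
```

Swapping the priority order or adding a module changes only the `requests` list, not the components themselves, which is the point of keeping the protocol separate.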


Architectural mapping

[Figure: a mode manager coordinating modal processes]

  • Single processor or multiple processors

  • Multiple mappings to an architecture



Distributed mode managers

[Figure: the mode manager partitioned among processors, each part coordinating its local modal processes]

  • Automatically partitioned among processors

    • Synthesized control communication

    • Comm. tradeoffs: synchronization, replication




Technical Presentation



Past missions – Mars Pathfinder

“Sojourner”

The Mars Pathfinder Microrover Flight Experiment

Alpha Proton X-ray Spectrometer (APXS)



Application requirements

  • System specification

    • 6 wheel motors

    • 4 steering motors

    • System health check

    • Hazard detection

  • Power supply

    • Battery (non-rechargeable)

    • Solar panel

  • Power consumption

    • Digital

      • Computation, imaging, communication, control

    • Mechanical

      • Driving, steering

    • Thermal

      • Motors must be heated in low-temperature environment


System-level power budget

Function | Time and calculation | Energy required
--- | --- | ---
motor heating: 1 motor at a time | 7.51 W x 1 hr | 7.51 W-hr
motor heating: 2 motors at a time | 11.26 W x 0.5 hr | 5.63 W-hr
driving (extreme terrain @ -80 °C) | 13.85 W x 0.5 hr | 6.92 W-hr
hazard detection | 7.33 W x 0.25 hr | 1.83 W-hr
imaging (3 images @ 2 min/image) | 4.5 W x 0.1 hr | 0.45 W-hr
image compression (compress 3 images @ 6 min/image) | 3.7 W x 0.3 hr | 1.2 W-hr
6 Mbit communication @ 50 min/sol | 6.27 W x 0.8 hr | 5.2 W-hr
42 health checks of 10 s each during the day | 6.27 W x 0.1 hr | 0.63 W-hr
remainder of 7 hr daytime CPU operation | 3.7 W x 4 hr | 15.0 W-hr
WEB heating (as needed) | | 50 W-hr
Total | | 95 W-hr
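Recomputing each row as power x time is a quick consistency check on the budget. The sketch copies the figures from the table; the helper names are illustrative. Most rows match, but image compression (1.11 W-hr), communication (about 5.02 W-hr) and the remaining CPU time (14.8 W-hr) come out slightly below the listed 1.2, 5.2 and 15.0 W-hr, and the total is about 93.9 W-hr against the listed 95 W-hr.

```python
# (function, power in W, duration in hr) as listed in the power budget table
BUDGET_ROWS = [
    ("motor heating: 1 motor at a time", 7.51, 1.0),
    ("motor heating: 2 motors at a time", 11.26, 0.5),
    ("driving (extreme terrain)", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging", 4.5, 0.1),
    ("image compression", 3.7, 0.3),
    ("6 Mbit communication", 6.27, 0.8),
    ("42 health checks of 10 s", 6.27, 0.1),
    ("remainder of daytime CPU operation", 3.7, 4.0),
]
WEB_HEATING_WH = 50.0   # listed directly as an energy, not as power x time


def row_energy_wh(power_w, hours):
    return power_w * hours


def budget_total_wh(rows, extra_wh=0.0):
    return sum(row_energy_wh(p, h) for _, p, h in rows) + extra_wh


for name, p, h in BUDGET_ROWS:
    print(f"{name:36s} {p:6.2f} W x {h:4.2f} hr = {row_energy_wh(p, h):6.2f} W-hr")
print(f"total incl. WEB heating: {budget_total_wh(BUDGET_ROWS, WEB_HEATING_WH):.1f} W-hr")
```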



Design issues

  • Timing constraints

    • System health check: 10 s every 10 min

    • Heating motor for 5s, 50s prior to driving

    • Hazard detection 10s – steering 5s – driving 10s

  • Power management

    • Low-power electronics alone cannot yield significant power savings

    • No system-level management tool available

  • Conservative hand-crafted schedule

    • Serialize all operations to avoid power surge

    • Long execution time

    • Solar power wasted



Present missions – Athena/Mars ’03 Rover configuration

  • Pancam/Mini-TES

  • Instrument Arm Cluster:

    • Raman Spectrometer

    • Alpha-Proton-X-Ray Spectrometer (APXS)

    • Mössbauer Spectrometer

    • Microscopic Imager

    • Mini-Corer


Athena/Mars ’03 Rovers - power subsystem

  • Power utilization:

    • 38 W = 19 W (CPU&I/O) + 9 W (accel and gyro) + 10 W (wheel motors) for driving.

    • 75 W = 19 W (CPU&I/O) + 55 W (transmission) for orbiter communication

    • 30 W = 19 W (CPU&I/O) + 10 W (transmission) for lander relay communication

    • 55 W = 19 W (CPU&I/O) + 33 W (peak motor) for drilling

    • 29 W = 23 W (CPU&I/O) + 6 W (cameras) required for imaging

    • 11 W Raman, 1.4 W APXS, and 2.3 W for nighttime spectrometer operation

    • 141 Whr daily for engineering housekeeping

    • 75 Whr limit for nighttime operations



Present missions – MUSES-CN Asteroid NanoRover

  • Completely solar powered

    • Requires only 1 watt, including an RF telecommunications system that links the rover to a lander or small-body orbiter for relay to Earth

  • Power source

    • 500 grams of commercial, non-rechargeable, replaceable lithium batteries, with energy density of 750 joules per gram.
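The battery figures translate into a run time with one line of arithmetic. This sketch assumes a constant 1 W draw and ignores solar input, conversion losses and temperature effects; the constant names are illustrative.

```python
BATTERY_MASS_G = 500            # grams of primary lithium cells
ENERGY_DENSITY_J_PER_G = 750    # joules per gram
LOAD_W = 1.0                    # stated total power requirement

energy_j = BATTERY_MASS_G * ENERGY_DENSITY_J_PER_G   # 375,000 J
energy_wh = energy_j / 3600                          # convert J to W-hr
runtime_h = energy_wh / LOAD_W                       # hours at a constant 1 W draw

print(f"{energy_j} J = {energy_wh:.1f} W-hr, about {runtime_h:.0f} h "
      f"({runtime_h / 24:.1f} days) at {LOAD_W} W")
```

The pack holds about 104 W-hr, so on batteries alone the rover could run at 1 W for a little over four days.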



Power-aware designs

  • Subsume low power as a special case

    • Minimize power consumption

    • Minimal application-specific knowledge, limited reconfiguration space

    • Conservative

  • Make best use of available power

    • Use maximum solar power while it is available

    • Increase parallelism, perform more tasks, reduce mission time

    • Both MIN and MAX power constraints

  • Application-specific knowledge

    • Multiple mission requirements

    • Adapt to run-time power supply, operating environment



System-level power management

  • Amdahl's law -- extended to power

    • Component-level improvements must be scaled by % contributions

    • Synergy between inter-component interactions

  • Scope of system power model

    • Digital, mechanical, thermal

    • Battery model - control power surge

    • Renewable source: solar panel, etc.

  • Mission-driven tradeoffs

    • Execution time vs. power saving

    • Adapt to operating environment
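Amdahl's law carries over to power directly: a component-level saving counts only in proportion to that component's share of total system power. In this sketch the split between digital, mechanical and thermal loads is hypothetical, and `system_power_saving` is an illustrative helper.

```python
def system_power_saving(shares, savings):
    """Fraction of total system power saved (Amdahl's law applied to power).

    shares[c]  -- fraction of total system power drawn by subsystem c (must sum to 1)
    savings[c] -- fractional power reduction achieved on subsystem c (0..1);
                  subsystems missing from `savings` are left unchanged
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("power shares must sum to 1")
    return sum(share * savings.get(c, 0.0) for c, share in shares.items())


# Hypothetical split of rover power between digital, mechanical and thermal loads
shares = {"digital": 0.2, "mechanical": 0.5, "thermal": 0.3}
half = system_power_saving(shares, {"digital": 0.5})      # halve digital power
tenfold = system_power_saving(shares, {"digital": 0.9})   # cut digital power 10x
```

With these shares, halving digital power saves only 10% of system power, and even a 10x digital reduction saves 18%, which is why the system model must also cover mechanical and thermal consumers.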



What's needed?

  • Reconfigurable system architecture

    • Statically configurable for different missions

    • Reconfiguration for dynamic power management

    • Support state-of-the-art power management policies

  • System-level design tool

    • Support design space exploration

    • Take full advantage of COTS components

    • Optimize mission-specific system configuration

    • Synthesize system-level power manager

    • Support simulation for early validation



X2000 avionics system architecture

  • Symmetric COTS multiprocessors

    • Low cost component with strong commercial support

    • Widely accepted specification, design, application and testing

    • Reduced development cost

  • Dual system bus architecture

    • High speed data rate with moderate power

    • Low speed control with low power

  • Industry standard bus protocols

    • FireWire (IEEE 1394) bus

    • I2C bus

    • Reconfigurable bus topology


PA system architecture

[Figure: The NASA X2000 Avionics System: symmetric multiprocessor modules, reconfigurable hardware blocks, high-rate input (camera), and communication module (CDMA) on a high-speed bus (e.g. IEEE 1394); low-speed bus (e.g. I2C) with a bus power controller; microcontroller-directed subnet (power regulation & control, analog telemetry sensors, safety inhibits, valve & pyro drive); altimeter subnet]



Applicable power optimizations

  • Application level

    • Scheduling under timing and power constraints

    • Task partitioning, allocation, migration

    • Algorithm selection

  • Architecture level

    • Bus segmentation / clustering

    • Communication scheduling

  • Component level

    • Voltage / frequency scaling

    • Power down

  • X-2000 goals

    • Digital electronics power: 10x decrease

    • Analog electronics power: 2x decrease

    • Computer performance: 10 to 20x increase

(both static & dynamic versions)



The need for a system-level CAD tool

  • Avoid pitfalls with manual design

    • Overdesign (too conservative)

    • Hardwired assumptions in implementation (hard to change/adapt)

    • System integration (bottleneck in projects)

  • Scalable methodology

    • Specification: separation of concerns

      • Behavior vs. architecture

      • Policy vs. mechanism

      • Constraint vs. implementation

    • Exploration

      • Framework for technique integration

      • Rapid feedback

    • Manage complexity

      • Knowledge base for component/bus details

      • Consistent knowledge propagation through design stages



Design tool

  • Library

    • Components and bus protocols

    • Provides power estimation

    • Defines configuration space

  • Authoring

    • Behavioral description, architecture description

    • Mapping from behavior to architecture

  • Synthesis

    • Scheduling, partitioning

    • Bus segmentation, voltage scaling

    • Synthesis of power manager with task scheduler

  • Simulation

    • High-level: explore design space

    • Detailed-level: power/performance for a given design point


IMPAC2T overview

[Figure: IMPAC2T flow diagram. Behavior side: behavioral system model built from high-level components and composition operators, with high-level simulation and functional partitioning & scheduling. Architecture side: system architecture built from parameterizable components, buses, and protocols. The two meet in mapping, system integration & synthesis, static configuration, and dynamic power management.]


Library: low-level components

[Figure: VHDL code for a parameterizable component, shown with bus width = 8 and bus width = 16]

  • Supported components

    • COTS

    • Parameterizable

  • Levels of abstraction

    • Parameterizable

    • Simulatable

    • Synthesizable

    • Reconfigurable



Library: component definition

  • Component interface

    • Physical: pin interface

    • Functional: data and control interface

    • Power, current, voltage

  • Power/mode characterization

    • Mode governs power usage

    • Restrictions on mode changes allowed

    • High-level yet refined power estimation

  • Aggregation

    • Smaller components combined into larger ones

    • New external parameters, interfaces, modes



Example components

  • Processor:

    • PowerPC, ARM, Pentium, MIPS

  • Microcontroller

    • StrongARM, Intel 8051, Motorola 68HC11, 68332

  • Bus controller/transceiver:

    • FireWire controller & transceiver

    • I2C bus controller, GPIB

  • Memory

    • SRAM

    • DRAM

    • Flash memory



Example component definition

  • FireWire bus transceiver: National Semi CS4103

    • Working voltage: 3.3 V

    • Power modes

      • Full-on (400mW)

      • PHY-on (150mW)

      • Standby (50mW)

      • CLK-disable (21mW)

      • Crystal-disable (16mW)

  • FireWire bus controller: National Semi CS4210

    • Working voltage: 3.3 V

    • Power modes

      • Full-on (300mW)

      • Standby (17mW)

  • Aggregated bus transceiver/controller

    • Up to ten combined working modes

    • Flexibility in power management
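The "up to ten" figure follows from combining the two parts' modes: 5 transceiver modes x 2 controller modes. The sketch enumerates the combinations with their summed power, assuming every combination is allowed and that power simply adds; a real library entry would also record which mode changes are permitted. `aggregate_modes` is a hypothetical helper.

```python
from itertools import product

# Power modes in mW, from the component definitions above
CS4103_TRANSCEIVER = {"full-on": 400, "PHY-on": 150, "standby": 50,
                      "CLK-disable": 21, "crystal-disable": 16}
CS4210_CONTROLLER = {"full-on": 300, "standby": 17}


def aggregate_modes(*components):
    """Combine per-component modes into the modes of an aggregated component.

    Each aggregated mode is a tuple of sub-mode names; its power is the sum
    of the sub-mode powers.
    """
    combined = {}
    for combo in product(*(c.items() for c in components)):
        names = tuple(name for name, _ in combo)
        combined[names] = sum(mw for _, mw in combo)
    return combined


modes = aggregate_modes(CS4103_TRANSCEIVER, CS4210_CONTROLLER)
print(len(modes))                                 # 10
print(max(modes.values()), min(modes.values()))   # 700 33
```

The aggregated part spans 33 mW to 700 mW, a range a power manager can exploit by choosing among the ten combined modes.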



Library: bus protocols

  • Architecture

    • Parallelism (parallel or serial)

    • Topology (serial, tree, ring)

    • Service layers (physical, link, transaction, application)

  • Communication

    • Data transfer mode (asynchronous, isochronous)

    • Data transfer speed

    • Response mode (need acknowledgement or not)

    • Arbitration mode

  • Configuration

    • Configuration process (deterministic or random)

    • Reconfigurability (static, hybrid, dynamic)

  • Power

    • Power mode (full-on, standby, deep-sleep, shutdown)

    • Media (cable, wireless, backplane)



Bus protocols exploration

  • Explore bus protocol dimensions

    • Protocol simulation

      • Input: bus protocol model

      • Output: sequence of events

    • Map events into relative power quantities

    • Compare and tradeoff between different design points

  • Example: simulating FireWire bus configuration

    • Event-driven simulator

    • Compare two designs with different topology

      • Pure tree topology (acyclic)

      • Tree topology with bus segmentation

    • Tree-ID process, 9 nodes

      • Pure tree: 37 events

      • Segmented tree: 24 events



Bus optimization

  • Bus: a significant power consumer

    • Up to 30% to 50% of the total system power consumption [Mehra97]

    • Bus power consumption determined by

      • Capacitance (load C and bus C, proportional to bus length)

      • Voltage (bus supply voltage and swing voltage)

      • Bus access frequency

      • Bus signal switching activity

  • Why bus power optimization?

    • System performance requirements

    • Power constraints

    • Adapt to execution time variations

    • Bus segmentation for increased bandwidth

    • Enable other novel power management techniques
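The four factors listed above combine in the standard CMOS dynamic-power model, P = a · C · Vdd · Vswing · f per line. The sketch applies it to a hypothetical 8-bit bus to show why a short segment saves power; all capacitance, voltage and frequency values are assumed rather than taken from the slides, and `bus_power_w` is an illustrative helper.

```python
def bus_power_w(activity, wire_cap_per_m, length_m, load_cap, vdd, vswing, freq_hz, width):
    """Dynamic bus power: P = a * C * Vdd * Vswing * f per line, times the bus width.

    activity       -- average switching probability of a line per bus access (0..1)
    wire_cap_per_m -- wire capacitance per meter (F/m); wire C grows with bus length
    load_cap       -- capacitance of the attached loads (F)
    vdd, vswing    -- supply voltage and signal swing (V); full swing means vswing == vdd
    freq_hz        -- bus access frequency
    """
    c_total = wire_cap_per_m * length_m + load_cap
    return width * activity * c_total * vdd * vswing * freq_hz


# Hypothetical 8-bit bus: one long global bus vs. a short segment with fewer loads
full = bus_power_w(0.5, 100e-12, 1.0, 50e-12, 3.3, 3.3, 50e6, width=8)
segment = bus_power_w(0.5, 100e-12, 0.25, 20e-12, 3.3, 3.3, 50e6, width=8)
```

Here the segment draws about 0.098 W against 0.327 W for the global bus, purely because its capacitance drops from 150 pF to 45 pF.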



Bus-level optimizations

  • Bus encoding [Shin98][Benini97][Nakase98]

    • Minimize switching activity on bus

    • Makes sense mostly for parallel buses

    • Gray code, bus-invert code, T0 code and Beach code

    • Bus driver design

  • Bus clustering (segmentation) [Mehra97][Zhang98]

    • Optimize bus topology by grouping components

    • Divide the global bus into multiple segments

    • Benefits:

      • Reduced bus capacitance (power saving)

      • Shorter bus latency, higher throughput, increased flexibility

  • Partitioning [Hauck95][Yang94][Cong93]

    • Divide tasks among components

    • Minimize inter-cluster traffic

    • Clustering before partitioning
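Bus-invert coding, one of the encodings listed above, is simple enough to show in full. This is a simplified variant (the invert decision counts only data-line toggles, not the toggle on the extra INV line), and the function names are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")


def bus_invert_encode(words, width):
    """Simplified bus-invert coding.

    If driving `word` would toggle more than half of the `width` data lines
    relative to the value currently on the bus, drive its complement instead
    and set the extra INV line. Returns a list of (bus_value, inv_bit).
    """
    mask = (1 << width) - 1
    prev = 0                        # value currently on the data lines
    encoded = []
    for word in words:
        word &= mask
        if popcount(word ^ prev) > width // 2:
            bus, inv = ~word & mask, 1
        else:
            bus, inv = word, 0
        encoded.append((bus, inv))
        prev = bus
    return encoded


def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def transitions(values):
    """Total line toggles for a sequence of bus values, starting from 0."""
    prev, total = 0, 0
    for v in values:
        total += popcount(prev ^ v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(words, 8)
raw = transitions(words)                                               # uncoded toggles
coded = transitions([b for b, _ in enc]) + transitions([i for _, i in enc])
```

For this worst-case pattern, the uncoded bus toggles 24 times, while the coded bus toggles only the INV line, 3 times. As the slide notes, the benefit is mostly for wide parallel buses.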



FireWire (IEEE 1394)

  • High speed serial bus

    • 100, 200, 400 Mbps in 1394a

    • 800 Mbps, 1.6 Gbps in 1394b

  • Advantages

    • Low power

    • Real-time bandwidth guarantee (important for media applications)

    • Isochronous and asynchronous transfer modes

    • Hot-pluggable, self-reconfiguring

    • Supports bus segmentation



X2000 architecture mapping

  • Map Mars Rover application onto X2000 architecture

Legend: CAM: camera; MC: microcontroller; HD: hard drive; NVM: non-volatile memory; SCI: scientific equipment; RF modem: radio-frequency modem. The I2C bus is omitted from this diagram.

[Figure: CPU1, CPU2 (bus controller), HD/NVM, SCI1, SCI2, MC1, MC2, MC3, RF modem, and CAM on a single FireWire (IEEE 1394) bus]

Tasks:

  • Capture picture, compress in CPU1, and send data to RF Modem

  • MCs are responsible for sensing, drive control, and steering control

  • SCIs carry out scientific experiments and send data to CPU2

  • After analysis, CPU2 stores data in HD/NVM



Bottlenecks in an unsegmented architecture

  • Contention for bus bandwidth

    • Camera, RF modem, hard disk

    • Forces serialization of communication globally

  • All nodes must be kept awake

    • Prevents component shutdown

    • Global overhead for bus reconfiguration

  • Long routing path

    • Power overhead on routing controllers


Segmentation example

[Figure: the rover components (CPU1/DSP, CPU2/bus controller, HD, RF modem, CAM, MC1 to MC3, SCI1, SCI2) grouped into three bus segments]

Suppose the bus bandwidth is 100 Mbps, each image is 20 Mb, there are 20 pictures to work on, and the SCI data volume is 16 kbps x 10 ks x 2 (4 hrs a day).

Power numbers:

  • CPU1: 4.0 W

  • CPU2: 240 mW

  • RF modem: 1.7 W

  • Camera: 2.6 W

  • SCI1: 0.8 W

  • SCI2: 3.2 W

Power number details cover these tasks:

  • Imaging (CAM): picture capture, image compression, RF transmission

  • SCI: scientific experiment

  • MC: sensing, drive control, steering control


Bus segmentation with FireWire

  • Without segmentation, blue nodes can't be disabled

    • All nodes' PHY layers must remain active

    • Request packets are broadcast to all nodes

  • With segmentation, gray nodes can be safely disabled

    • They are in different segments from the active ones

    • Request packets are broadcast only to active nodes


Throughput improvement

[Figure: the rover bus before and after segmentation into three segments; one segment is annotated "No useful traffic"]

  • Single bus: 100 Mbps bandwidth, 9 s transfer time

  • Segmented bus: 300 Mbps, 5 s transfer time

  • Bus segmentation helps improve bus bandwidth


Bandwidth-enabled voltage scaling

[Figure: the segmented rover bus (CPU1/DSP, CPU2/bus controller, HD, MC1 to MC3, RF modem, CAM, SCI1, SCI2)]

  • Bandwidth could be 300 Mbps; keep it at 100 Mbps

  • Use voltage scaling and clock scaling to decrease component power

  • Power consumption = 12.3 W; after voltage scaling = 9.2 W


Power/latency reduction

  • Before: power consumption = 12.3 W, data transfer time = 9 s, energy consumption = 111 J

  • After voltage scaling: power consumption = 9.2 W, data transfer time = 5 s, energy consumption = 46 J

  • Power saving 25%, energy saving 58%

Note: bus configuration power not counted
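The savings follow from energy = power x time. This sketch uses the slide's figures and, like the slide, leaves out bus configuration power; the dictionary layout is illustrative. Note that 12.3 W x 9 s is 110.7 J, which the slide rounds to 111 J.

```python
# Figures from the power/latency slide (bus configuration power not counted)
before = {"power_w": 12.3, "time_s": 9.0}   # single 100 Mbps bus, nominal voltage
after = {"power_w": 9.2, "time_s": 5.0}     # segmented bus, voltage/clock scaled


def energy_j(cfg):
    return cfg["power_w"] * cfg["time_s"]


power_saving = 1 - after["power_w"] / before["power_w"]
energy_saving = 1 - energy_j(after) / energy_j(before)
print(f"energy {energy_j(before):.1f} J -> {energy_j(after):.1f} J")
print(f"power saving {power_saving:.0%}, energy saving {energy_saving:.0%}")
```

The energy saving exceeds the power saving because segmentation also shortens the transfer, so the lower power is drawn for less time.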


Segmentation-enabled shutdown

Task sequence: drive control (10 min), picture capture (6 min), drive control (20 min), science experiment (20 min)

  • Without segmentation: all components' bus interfaces are active, and the entire bus is hot

  • With segmentation: non-operating bus segments and non-operating components are disabled, so bus power is saved


Combined energy savings from static techniques

  • Not shutting down inactive nodes: the bus transceiver is active all the time

    • Transceiver: National Semi CS4103, PHY-on mode only

    • Transceiver energy: 150 mW x 10 x 3360 s = 5040 J

  • Shutting down inactive nodes:

    • Only 11 bus configurations instead of 27 global bus configurations

    • Configuration energy << 165 J

    • Transceiver energy: 1962 J

    • Configuration energy + transceiver energy < 1962 + 165 = 2127 J

  • 2.4x energy reduction!
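The always-on figure can be rebuilt from the transceiver power, the node count and the task timeline. This sketch assumes the 3360 s period is the shutdown slide's task sequence (10 + 6 + 20 + 20 min, which matches exactly) and takes the 1962 J transceiver energy with shutdown as given, since the slides do not show how it was derived; constant names are illustrative.

```python
TRANSCEIVER_W = 0.150               # CS4103 in PHY-on mode
NODES = 10
# Assumed period: the task sequence on the shutdown slide, 10 + 6 + 20 + 20 min
MISSION_S = (10 + 6 + 20 + 20) * 60     # 3360 s

always_on_j = TRANSCEIVER_W * NODES * MISSION_S      # every transceiver stays on
TRANSCEIVER_WITH_SHUTDOWN_J = 1962                   # taken from the slide as given
CONFIG_BOUND_J = 165                                 # slide: config energy << 165 J
shutdown_bound_j = TRANSCEIVER_WITH_SHUTDOWN_J + CONFIG_BOUND_J

print(f"{always_on_j:.0f} J vs < {shutdown_bound_j} J: at least "
      f"{always_on_j / shutdown_bound_j:.2f}x reduction")
```

Because 165 J is only an upper bound on configuration energy, 5040 / 2127 (about 2.37x) is a lower bound on the reduction, consistent with the slide's 2.4x.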


Dynamic bus reconfiguration

  • New task: send data from HD to RF modem (continuing from the previous task)

  • Phase: science experiments (20 min) plus radio-frequency data transfer (60 min), SCI+RF (20+60 min)

  • Solution: dynamically change the bus topology

[Figure: rover bus segments (CPU1/DSP, CPU2/bus controller, HD, CAM, MC1 to MC3, SCI1, SCI2, RF modem) before and after the topology change]


Energy savings from dynamic bus reconfiguration

Power number list:

  • Local configuration: 12.7 W

  • Global configuration: 23.7 W

  • Active transceiver: 150 mW

  • Segmentation: software support

  • Bus segment: proportional to bus length

With dynamic reconfiguration:

  • Local configurations: none; global configurations: 1; re-segmentations: 1

  • Active transceivers: 3 + 2; active bus segments: 1

  • Energy: 23.7 x 1 x 1 + 0.15 x 3 x 4800 + 0.05 x 2 x 4800 = 2664 J

Without dynamic reconfiguration:

  • Local configurations: 3; global configurations: none; re-segmentations: none

  • Active transceivers: 7; active bus segments: 2

  • Energy: 12.7 x 3 x 1 + 0.15 x 7 x 4800 = 5078 J

1.9x energy reduction!
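The two energy expressions can be reproduced with one helper. This sketch makes two assumptions not spelled out on the slide: the trailing "x 1" in each configuration term is a 1 s configuration time, and the 0.05 W term is two transceivers in the CS4103's 50 mW standby mode. The 4800 s phase is the 20 + 60 min SCI+RF period; `phase_energy_j` is a hypothetical helper.

```python
GLOBAL_CONFIG_W = 23.7
LOCAL_CONFIG_W = 12.7
ACTIVE_W = 0.150                # transceiver active
STANDBY_W = 0.050               # assumed: transceiver in standby
PHASE_S = (20 + 60) * 60        # SCI + RF phase: 4800 s


def phase_energy_j(configs, active, standby=0, config_s=1.0):
    """configs: list of (power_w, count) bus configurations, each assumed to last config_s;
    active/standby: number of transceivers in each mode for the whole phase."""
    config_j = sum(p * n * config_s for p, n in configs)
    return config_j + (ACTIVE_W * active + STANDBY_W * standby) * PHASE_S


dynamic = phase_energy_j([(GLOBAL_CONFIG_W, 1)], active=3, standby=2)
fixed = phase_energy_j([(LOCAL_CONFIG_W, 3)], active=7)
print(f"dynamic {dynamic:.0f} J, fixed {fixed:.0f} J, ratio {fixed / dynamic:.1f}x")
```

The one-time cost of a global reconfiguration (about 24 J) is tiny next to the 80-minute transceiver energy, which is where the saving comes from.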



Summary of architecture optimization

  • Towards loose coupling

    • Reduced bus contention

    • Increased parallel bandwidth

    • Enabling voltage/frequency scaling

  • Application-driven clustering

    • Communication bandwidth requirements between processes

    • Knowledge from high-level behavioral model

  • Static optimization: 2.4x energy reduction

    • Bus segmentation

    • Cluster shutdown

  • Dynamic reclustering: 1.9x energy reduction



Power management & optimization

  • Behavioral modeling

    • Extract power related attributes of all objects

  • Architecture modeling

    • Use low-power devices or devices that can operate in low-power modes

  • Partitioning

    • Migration – merge computations from under-utilized processors onto one processor to improve utilization

    • Segmentation – separate tightly coupled computations into clusters to localize communication

  • Scheduling

    • Arrange operation sequences on multiple processors / power consumers to meet both performance and power requirements



Behavioral model

  • Application specific knowledge

    • Input, output and function

    • Dependency and precedence

    • Control and data flow

    • Timing and sequence

  • Software architecture

    • Operating system features – real-time, centralized, distributed, etc.

    • Execution model – event driven, interrupt, distributed agent, client-server, etc.

    • Communication model – protocol stack and specification

  • Power related attributes

    • Data rate, execution time, CPU speed, memory size, communication path, etc.



Allocation

  • Map behavioral objects to hardware

    • Group related OS, communication, control and application objects into processing nodes

    • Extract data objects into storage nodes

    • Allocate components/packages for each processing node

    • Arrange data storage for data nodes and optimize storage location to reduce communication

  • Map communication paths to busses

    • Set up the working mode of each component/package to fit the behavioral requirements

    • Extract the attributes of each structure

      • Function – computation, control, communication

      • CPU utilization

      • Bus traffic

      • Power consumption


Scheduling

  • Mapping of tasks to time slots

    • Computation

    • Communication

  • Mapping of power usage to time slots

    • Mechanical devices

    • Thermal subsystems

    • Other electronics subsystems

  • Constraints

    • Real-time deadlines, periods, min/max separation

    • Power budget, power surge (min/max)

    • Potentially scenario-driven


Scheduling techniques

  • Deadline-based real-time scheduling on multiprocessors

    • Rate-monotonic scheduling – extend existing RM scheduling to multiprocessors

    • Timing constraint graph scheduling – multiple serializable sequences in a single heartbeat
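For the rate-monotonic branch, the classic single-processor starting point is the Liu and Layland utilization bound. One common way to extend RM to multiprocessors is to partition the tasks and then apply a single-processor test on each processor. The sketch below shows only that per-processor test, and it is a sufficient condition rather than an exact one. The task parameters in the usage lines are made up for illustration.

```python
from typing import List, Tuple

def rm_utilization_test(tasks: List[Tuple[float, float]]) -> bool:
    """Liu & Layland sufficient test for rate-monotonic scheduling on one processor.

    tasks: (worst-case execution time C, period T) pairs, with deadline = period.
    True means the set is guaranteed schedulable under RM; False only means the
    bound is exceeded (an exact response-time analysis might still accept it).
    """
    n = len(tasks)
    if n == 0:
        return True
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

print(rm_utilization_test([(1, 4), (1, 5), (2, 10)]))  # True  (U = 0.65, bound ~0.78)
print(rm_utilization_test([(2, 4), (2, 5)]))           # False (U = 0.90, bound ~0.83)
```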


Novel IMPACCT scheduler

  • A novel graphical tool

    • Timing and power constraint visualization

    • Transforms them into graph problems

    • Gives designers visibility into power surges at run time

  • Complete system-level model

    • All power sources

    • All power consumers

  • Power-aware scheduling

    • Schedule operations based on power source output

    • Meet both performance requirements and power constraints

    • Regulate power surge

    • Optimize for power efficiency and reduce execution time


IMPACCT scheduler

[Figure: IMPACCT bin model. Power level (vertical) vs. time (horizontal); each bin spans a starting time and an ending time, and its area is the energy consumption.]

  • Extended Gantt chart from real-time scheduling, for a single processor

    • Events – bins

      • Timing – horizontal size

      • Power – vertical size

      • Energy – area of the bin

    • Power surge – compacting bins downward

Demo
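The bin geometry maps directly onto a small data type: width is duration, height is power, and area is energy. The class and field names in this minimal sketch are illustrative and are not taken from the IMPACCT tool. The usage line uses the 7.5W, 5s single-motor heating figure from the Mars Rover example.

```python
from dataclasses import dataclass

@dataclass
class Bin:
    """One event in the extended Gantt chart (illustrative names)."""
    name: str
    start: float     # s, left edge of the bin
    duration: float  # s, horizontal size
    power: float     # W, vertical size

    @property
    def end(self) -> float:
        return self.start + self.duration

    @property
    def energy(self) -> float:
        # Energy (J) is the area of the bin: power x duration.
        return self.power * self.duration

heat = Bin("heat 1 motor", start=0.0, duration=5.0, power=7.5)
print(heat.end, heat.energy)  # 5.0 37.5
```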


IMPACCT scheduler

[Figure: power vs. time chart with constant task A, periodic task B, periodic task C, and task D following B; bins stack vertically where they overlap.]

  • Scheduling chart for multiple processors and multiple power consumers

    • Events can overlap vertically

      • Multi-processor

      • Multiple power consumers – electronics, mechanical, thermal

    • Power awareness – min and max power supply

Demo
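When bins overlap vertically, the instantaneous draw is the sum of the heights of all bins active at that moment. The min/max supply lines then become bounds on that sum. The following is a minimal discrete-time sketch, assuming 1 s slots. Treating a draw below the minimum as "supply left unused" is an assumption. The task list reuses the CPU (3.7W), health check (6.3W, 10s), and single-motor heating (7.5W, 5s) figures from the Mars Rover example, with start times chosen arbitrarily. The 11W minimum is hypothetical.

```python
from typing import List, Tuple

Task = Tuple[str, int, int, float]  # (name, start slot, duration in slots, power W)

def power_profile(tasks: List[Task], horizon: int) -> List[float]:
    """Stacked power in each 1 s slot: the sum of all bins active in that slot."""
    profile = [0.0] * horizon
    for _, start, duration, power in tasks:
        for t in range(max(start, 0), min(start + duration, horizon)):
            profile[t] += power
    return profile

def out_of_bounds(profile: List[float], p_min: float, p_max: float) -> Tuple[List[int], List[int]]:
    """Slots above the maximum supply (a surge) and below the minimum (supply left unused)."""
    over = [t for t, p in enumerate(profile) if p > p_max]
    under = [t for t, p in enumerate(profile) if p < p_min]
    return over, under

tasks = [("cpu", 0, 10, 3.7), ("health check", 0, 10, 6.3), ("heat", 5, 5, 7.5)]
profile = power_profile(tasks, horizon=10)
over, under = out_of_bounds(profile, p_min=11.0, p_max=14.9)
print(over)   # [5, 6, 7, 8, 9]
print(under)  # [0, 1, 2, 3, 4]
```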


IMPACCT scheduler

[Figure: power vs. time chart for tasks A-D. Bins are slid within their timing space and squeezed/extended to fit available time slots; D is bounded by its min and max timing constraints, and B and C by their deadlines.]

  • Timing constraints – bin packing problem to satisfy horizontal constraints

    • Independent tasks – moving bins horizontally

    • Dependent tasks – moving grouped bins horizontally

    • Power/voltage/clock scaling – extending/squeezing bins

Demo


IMPACCT scheduler

[Figure: two power vs. time charts for tasks A-D. (1) Automated global scheduling to meet min-max power: attack spikes, improve utilization. (2) Manual scheduling while monitoring power surge.]

  • Power constraints – bin packing problem to satisfy vertical constraints

    • Automatic optimization – let the tool do everything

    • Manual optimization – visualizing power in manual scheduling

Demo
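Automatic optimization can be approximated by a simple greedy pass. It takes the bins one at a time and slides each one to the earliest start inside its timing window where the stacked power stays under the maximum. This is a sketch under stated assumptions: integer 1 s slots, a constant background load, jobs placed in list order, and no backtracking. It is not the IMPACCT algorithm. Durations and powers come from the Mars Rover figures, the 40 s windows are arbitrary, and the 20W budget in the second call is hypothetical.

```python
from typing import Dict, List, Tuple

# (name, duration in slots, power W, release slot, deadline slot); illustrative only.
Job = Tuple[str, int, float, int, int]

def greedy_schedule(jobs: List[Job], p_max: float, base: float = 0.0) -> Dict[str, int]:
    """Place each job at the earliest start in [release, deadline - duration]
    that keeps stacked power <= p_max in every slot it covers.

    base is a constant background load (e.g. an always-on CPU).
    Raises ValueError if some job cannot be placed.
    """
    if not jobs:
        return {}
    horizon = max(job[4] for job in jobs)
    load = [base] * horizon
    starts: Dict[str, int] = {}
    for name, duration, power, release, deadline in jobs:
        for s in range(release, deadline - duration + 1):
            if all(load[t] + power <= p_max for t in range(s, s + duration)):
                for t in range(s, s + duration):
                    load[t] += power  # commit the bin to the profile
                starts[name] = s
                break
        else:
            raise ValueError(f"cannot place {name} under {p_max} W")
    return starts

jobs = [("health check", 10, 6.3, 0, 40), ("heat", 5, 7.5, 0, 40), ("steer", 5, 6.8, 0, 40)]
print(greedy_schedule(jobs, p_max=14.9, base=3.7))  # {'health check': 0, 'heat': 10, 'steer': 15}
print(greedy_schedule(jobs, p_max=20.0, base=3.7))  # {'health check': 0, 'heat': 0, 'steer': 5}
```

At 14.9W the operations serialize. A larger budget lets heating overlap the health check, which mirrors the difference between the conservative and aggressive Mars Rover schedules.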


Example revisited – Mars Rover

  • System specification

    • 6 wheel motors

    • 4 steering motors

    • System health check

    • Hazard detection

  • Power supply

    • Battery (non-rechargeable)

    • Solar panel

  • Power consumption

    • Digital

      • Computation, imaging, communication, control

    • Mechanical

      • Driving, steering

    • Thermal

      • Motors must be heated in a low-temperature environment


Timing constraints – Mars Rover


Scheduling method

  • Constraint graph construction

    • Nodes: operations

    • Edges: precedence relationship between operations

  • Resource specification

    • Resource: an executing unit that can perform operations independently

      • Six thermal resources for wheel heating

      • Four thermal resources for steer motor heating

      • One mechanical resource for driving

      • One mechanical resource for steering

      • One computation resource for control

    • Operations on one resource must be serialized

  • Scheduling

    • Primary resource selection

    • Schedule the primary resource by applying graph algorithms

    • Auxiliary resources and power requirements are treated as scheduling constraints
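The negative edge weights in the constraint graphs below suggest the usual min/max-separation encoding. This minimal sketch assumes an edge (u, v, w) means start(v) >= start(u) + w. Under that encoding, a maximum separation d from u to v becomes the backward edge (v, u, -d). Earliest start times are then longest paths from a source, computed here with Bellman-Ford, and a positive cycle means the timing constraints are inconsistent. The 5s heating time comes from the Mars Rover figures; the 20s maximum separation is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (u, v, w): start[v] >= start[u] + w
NEG_INF = float("-inf")

def earliest_starts(nodes: List[str], edges: List[Edge], source: str) -> Dict[str, float]:
    """Longest-path start times from source (Bellman-Ford relaxation).

    Raises ValueError on a positive cycle, i.e. unsatisfiable min/max constraints.
    """
    start = {n: NEG_INF for n in nodes}
    start[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if start[u] > NEG_INF and start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            break
    # Any edge that can still be relaxed lies on a positive cycle.
    for u, v, w in edges:
        if start[u] > NEG_INF and start[u] + w > start[v]:
            raise ValueError("inconsistent timing constraints (positive cycle)")
    return start

nodes = ["source", "heat wheel", "drive"]
edges = [
    ("source", "heat wheel", 0.0),
    ("heat wheel", "drive", 5.0),    # drive starts after 5 s of heating
    ("drive", "heat wheel", -20.0),  # ...but no more than 20 s after heating starts
]
print(earliest_starts(nodes, edges, "source"))  # {'source': 0.0, 'heat wheel': 0.0, 'drive': 5.0}
```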


Constraint graph

[Figure: constraint graph. Nodes (operation / duration): Hazard detection / Thd, System health check / Thc, Heat steer 1-4 / Ths, Steer / Ts, Heat wheel 1-6 / Thw, Drive / Td. Edge weights: thc, -(thc + Thc), -ths, -thw.]


Resource specification

[Figure: resource specification. Each operation is split across computation (C), mechanical (M), and thermal (T) resources, labeled operation / duration / power: Hazard detection (C) / Thc / Phc_C; Health check (C) / Thc / Phc_C; Steer (C) / Ts_C / Ps_C and Steer (M) / Ts_M / Ps_M; Heat steer i (C) / Ths_C / Phs_C and Heat steer i (T) / Ths_T / Phs_T; Drive (C) / Td_C / Pd_C and Drive (M) / Td_M / Pd_M; Heat wheel j (C) / Thw_C / Phw_C and Heat wheel j (T) / Thw_T / Phw_T. Edge weights: thc, -(thc + Thc), -ths + Ths_E, -thw + Thw_E.]


Scheduling graph

[Figure: scheduling graph. Primary resource: computation; auxiliary resources: thermal and mechanical. Nodes: Health check (C) / Thc / Phc_C, Hazard detection (C) / Thc / Phc_C, Steer (C) / Ts_C / Ps_C, Steer (M) / Ts_M / Ps_M, Heat steer i (C) / Ths_E / Phs_E, Heat steer i (T) / Ths_T / Phs_T, Drive (C) / Td_C / Pd_C, Drive (M) / Td_M / Pd_M, Heat wheel j (C) / Thw_E / Phw_E, Heat wheel j (T) / Thw_T / Phw_T. Edge weights: thc, -(thc + Thc), -ths, -ths + Ths_E, -Ts_C + Ts_M, -thw, -thw + Thw_E.]


Example – Mars Rover

  • Power constraints

    • Solar power supply varies over time

    • Power consumption varies with temperature/time


Previous solution by JPL

  • Over-constrained, conservative

    • Serialize every operation to satisfy power constraint

    • Longer execution time and under-utilization of solar power

    • No scheduling tool is used – manual scheduling

  • Not power-aware

    • Scheduling without considering power sources and consumers

[Figure: system heartbeat, moving two steps: (a) beginning with a health check, (b) without a health check.]


Solution 1: high solar power (14.9W)

  • Max solar power: 14.9W at noon

    • Improved utilization of solar power

    • Automated scheduling – use scheduling tools

  • Aggressive – do as much as possible

    • Heat motors while performing other operations

    • Fastest moving speed – no waiting on heating

[Figure: system heartbeat, moving two steps: (a) beginning with a health check, (b) without a health check.]


Solution 2: typical solar power (12W)

  • Moderate solar power output – 12W

    • Improved utilization of solar power

    • Automated scheduling – use scheduling tools

  • Moderately aggressive – avoid exceeding power limit

    • Relaxed constraint – heat motors while performing other operations

    • Faster moving speed – some waiting time on heating

[Figure: system heartbeat, moving two steps: (a) beginning with a health check, (b) without a health check.]


Solution 3: low solar power (9W)

  • Minimum solar power output – 9W

    • Restricted constraint – serialize operations

    • Automated scheduling – use scheduling tools

  • Conservative – same as JPL solution

    • Slow moving speed

    • Full utilization of low solar power

[Figure: system heartbeat, moving two steps: (a) beginning with a health check, (b) without a health check.]


Comparison

  • JPL's previous solution

    • Conservative – long execution time, low solar power utilization

    • Not power aware – same schedule for all cases

    • Not intended to use battery energy

  • Our solution

    • Adaptive – speeds up when solar power supply is high

    • Power-aware – smart scheduling on different power supply/consumption

    • Use battery energy when necessary


Application-level evaluation

  • Mission description

    • Target location – 48 distance steps away from the current location

  • Power condition

    • 14.9W solar power for first 10 minutes, 12W for next 10 minutes, 9W thereafter

  • Metrics

    • Execution time

    • Total energy drawn from battery
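The power condition above is a piecewise supply profile. Execution time follows from how long each step takes under the supply available when that step starts. This minimal sketch assumes a step's duration is fixed by the supply at its start. The per-step durations in the usage lines are hypothetical placeholders, not measured results for either schedule.

```python
from typing import Dict

def solar_power(t: float) -> float:
    """Supply profile from the mission description (t in seconds)."""
    if t < 600:       # first 10 minutes
        return 14.9
    if t < 1200:      # next 10 minutes
        return 12.0
    return 9.0        # thereafter

def mission_time(steps: int, step_seconds: Dict[float, float]) -> float:
    """Total time for `steps` steps when each step's duration depends on the supply level."""
    t = 0.0
    for _ in range(steps):
        t += step_seconds[solar_power(t)]
    return t

adaptive = {14.9: 20.0, 12.0: 30.0, 9.0: 40.0}  # hypothetical s/step, faster when supply is high
fixed = {14.9: 40.0, 12.0: 40.0, 9.0: 40.0}     # hypothetical: same schedule at every level
print(mission_time(48, adaptive), mission_time(48, fixed))  # 1140.0 1920.0
```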


Application-level evaluation

  • Power-awareness

    • Execution speed scales with power condition adaptively

  • Smart schedule

    • Maximize best case

    • Avoid worst case

  • Tradeoff

    • Power vs. performance

    • Energy renewability

  • Application-specific

    • Application-level knowledge

    • Working mode parameters of components


Program plans and milestones


Development plans

  • Web-based CAD tool

    • Perl/CGI scripts for configuration

    • Java applets for interactive scheduling UI

    • Interface with database engine

  • Interface with commercial CAD backend

    • Detailed power estimation tools

    • Functional simulation with proprietary models

  • Rationale

    • No software installation needed by end user

    • Ready to use by everyone on the Internet

    • Open source with all publicly available development tools


Status & accomplishments to date


IMPACCT schedule

[Figure: IMPACCT schedule, July 2000 to January 2001, showing core-tool and UI status (planned or in progress) for Library, Authoring, Partitioning, Scheduling, Segmentation, Voltage scaling, and Simulation.]


Original schedule

[Figure: original schedule with markers Kickoff, 2Q 00, 2Q 01, 2Q 02, network, option. Tracks: system modeling, coordination synthesis, architecture definition, static partitioning, component partitioning, component simulator, PCL benchmarking, synthesizable components, system benchmarking.]

  • Authoring tool v1.0

  • Dynamic partitioning

  • Simulator v1.0

  • Component partitioning

  • Power aware design techniques

  • PCL definition

  • Simulatable components

  • Benchmark Identification


Updated schedule

[Figure: updated schedule spanning Year 1 and Year 2 (markers: Kickoff, 2Q 00, 2Q 01, 2Q 02, option). Tracks: static & hybrid optimizations (partitioning / allocation, scheduling, bus segmentation, voltage scaling); library (COTS components, FireWire and I2C bus models); static composition authoring; high-level simulation; benchmark identification; architecture definition.]

  • Dynamic optimizations

    • Task migration

    • Processor shutdown

    • Bus segmentation

    • Frequency scaling

  • Library

    • Parameterizable components

    • Parameterizable bus models

  • Reconfiguration authoring

  • Architecture reconfiguration

  • Low-level simulation

  • System benchmarking


Quarterly schedule

[Figure: quarterly milestone chart, 3Q 2000 through 2Q 2002.]

Milestones shown in the chart:

  • FireWire and I2C bus models

  • Static bus segmentation

  • Architecture definition

  • Low-level simulation

  • System benchmarking

  • Frequency scaling

  • COTS components library

  • Static scheduling

  • Benchmark identification

  • Parameterizable components

  • Dynamic scheduling

  • Parameterizable bus models

  • Hybrid bus segmentation

  • Architecture reconfiguration

  • Dynamic task migration

  • Static partitioning / allocation

  • Hybrid scheduling

  • Static composition authoring

  • Dynamic processor shutdown

  • Dynamic bus segmentation

  • Dynamic reconfig. authoring

  • High-level simulation

  • Hybrid partitioning / allocation

  • Voltage scaling


Financial information


IMPACCT budget

  • Months 1-6: $180,000

  • Months 7-12: $180,000

  • Second year: $400,000


Budget distribution


http://www.ece.uci.edu/impacct/


Bibliography

  • [Mehra97] R. Mehra et al., "A partitioning scheme for optimizing interconnect power", IEEE Journal of Solid-State Circuits, vol. 32, no. 3, March 1997.

  • [Shin98] Y. Shin et al., "Reduction of bus transitions with partial bus-invert coding", Electronics Letters, vol. 34, no. 7, IEE, 2 April 1998, pp. 642-643.

  • [Benini97] L. Benini et al., "Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems", Proceedings of the Great Lakes Symposium on VLSI, Los Alamitos, CA, USA: IEEE Computer Society Press, 1997, pp. 77-82.

  • [Nakase98] Y. Nakase et al., "Complementary half-swing bus architecture and its application for wide band SRAM macros", IEE Proceedings - Circuits, Devices and Systems, vol. 145, no. 5, IEE, Oct. 1998, pp. 337-342.

  • [Zhang98] Y. Zhang et al., "An alternative architecture for on-chip global interconnect: segmented bus power modeling", Thirty-Second Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1-4 Nov. 1998.

  • [Kernighan70] B. Kernighan et al., "An efficient heuristic procedure for partitioning graphs", Bell System Technical Journal, vol. 49, no. 2, Feb. 1970, pp. 291-307.

  • [Hauck95] S. Hauck et al., "Logic partition orderings for multi-FPGA systems", International Symposium on Field-Programmable Gate Arrays, 1995.


Program Goals

  • Evaluation, exploration

    • power usage, performance, cost

    • alternative configurations, algorithms

  • Optimization

    • achieve most effective power usage

    • high-level, global knowledge

  • Tool integration

    • many point tools, independent techniques

  • Specialization

    • configurable platform

  • Reuse

    • take advantage of a rich collection of COTS components

    • avoid redesigning from scratch


Technical approach

  • High-level abstraction

    • component vs. composition

    • Separate models for architecture and behavior

  • Synthesis and optimization of the power manager

    • Architecture reconfiguration

    • Scheduling for optimal power usage

    • adaptable to different power management policies

  • Aggressive, domain-knowledge driven

    • Encompass mechanical / thermal power

    • Aware of power supply model


System level modeling

  • Architectural modeling

    • COTS components

    • component encapsulation

    • bus architecture

    • system interconnect

  • Behavioral modeling

    • Application specific knowledge

    • Software architecture

    • Mission goals

    • High level constraints


Power-aware coordination

  • Protocols

    • Coordinate power usage

      • e.g. peak power, resource arbitration

    • Multiple versions of given algorithm

  • Components

    • Adaptable to different power management policies, not hardwired

    • Usable in new applications even if not designed to be power aware!

  • Synthesis

    • Coordination controller (“mode manager”)

    • Optimization to minimize control dependency

    • Optimality depends on architectural mapping


Measuring power consumption (1)

  • Different levels of analysis, by:

    • # of operations:

      • (+) easy to implement

      • (-) ignores differences in module size

      • Appropriate for comparing two different architectures with similar modules

    • # of lines of code:

      • (+) approximates the size of the hardware to be implemented

      • (-) may be too simple to estimate power consumption

      • Combined with the number of operations, gives an indication of the power consumption of each module

    • # of F/F:

      • (+) more accurate measure

      • (-) requires finding the relationship between # of F/F and # of lines of code

      • The number of F/F is the lowest-level hardware characteristic available in the high-level simulator

      • Control unit and data path have different power dissipation patterns even with the same number of gates


Measuring power consumption (2)

  • # of gates:

    • (+) Makes accurate power estimation possible

    • (-) needs a register-transfer level (RTL) description and power analysis tools

    • To get accurate hardware information, we have to implement RTL modules

    • Input/output statistics of each module are also necessary
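Gate-level estimators generally build on the standard CMOS dynamic-power relation. The input/output statistics mentioned above supply the switching activity of each module. Summed over modules $i$:

$$
P_{\text{dyn}} = \sum_i \alpha_i \, C_i \, V_{DD}^{2} \, f
$$

Here $\alpha_i$ is the switching activity and $C_i$ the switched capacitance of module $i$, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency. Static (leakage) power is not included in this term.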


USC's Work in Progress

  • Select a processor simulator

  • Analyze the hardware description of each module

  • Estimate the power consumption of each module

  • Find the performance-power ratio

  • Design a minimum power processor model


Program impact & transitions

  • Productivity

    • Fully exploit off-the-shelf components

    • Rapid architecture turnaround time

  • Massive Scalability

    • Protocol based power management

    • System architecture platform

  • Robust methodology

    • Unified functional/power correctness

    • Confidence in complex design points


Bus Architecture Perspectives (X)

  • Parallelism

    • Parallel:

      • high cost, high throughput, enable design exploration

    • Serial:

      • low cost, constrained throughput, simple bus interface

  • Locality

    • Functional

    • Spatial

  • Adaptivity

    • Adaptive

    • Deterministic


FireWire (IEEE 1394) bus

  • Service model

    • Physical layer

    • Link layer

    • Transaction layer

  • Communication model

    • asynchronous transfer

    • isochronous transfer

  • Arbitration model

    • Fair gap arbitration

    • Priority arbitration

  • Configuration model

    • Bus initialization

    • Tree identification

    • Self identification


Architectural Model

  • Component – parameterized COTS

    • Type – processor, memory, I/O, DSP, bus, etc.

    • Interface – how the components can be connected to each other

    • Modes – operation-mode parameters: voltage, clock speed, bandwidth, power consumption, etc.

  • Package – a bundle of connected components that performs certain operation

    • A set of connected components

    • Internal/external interface – how components are connected

    • Modes – configuration space of the collected components, specified by each component's working mode and collective attributes, e.g., voltage, speed, power, etc.


Approach: system-level modeling

  • High-level abstractions

    • Employ application specific knowledge in system models

    • Encompass multiple domains – electronics, mechanical, thermal

  • System modeling

    • Behavioral modeling – software architecture, application specific knowledge

    • Architectural modeling – hardware platform built on top of parameterized components

    • Partitioning – mapping behavioral objects to architectural structures

    • Scheduling – a valid sequence of concurrent/parallel operations on multiple processors that satisfies real-time requirements


Scheduling example – Mars Rover

  • Power constraints

    • Solar panel: 14.9W peak power @ noon, 11W for 6hr/sol

    • Battery: 10W max power output, 150 Wh energy storage

    • CPU: 3.7W, constant for 4h/sol

    • Health check: 6.3W, 10s

    • Hazard detection: 7.3W, 10s

    • Heating: 7.5W (1 motor) or 11.3W (2 motors), 5s

    • Steering: 6.8W, 5s (7º/s)

    • Driving: 12.4W, 10s (7cm)

  • Existing solution

    • Serialize each operation to satisfy power constraint

    • Conservative – longer execution time and under-utilization of solar power

    • No scheduling tool is used
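The figures above already show why the existing solution serializes. This minimal sketch assumes two things: the CPU's 3.7W is drawn throughout, and any demand above the current solar output must come from the battery, capped at its 10W maximum. The operation keys are illustrative names.

```python
from typing import Iterable

# Power figures (W) from the slide; treating the CPU as always on is an assumption.
CPU_W = 3.7
OPS_W = {
    "health_check": 6.3,
    "hazard_detection": 7.3,
    "heat_1_motor": 7.5,
    "heat_2_motors": 11.3,
    "steer": 6.8,
    "drive": 12.4,
}
BATTERY_MAX_W = 10.0

def demand(ops: Iterable[str]) -> float:
    """Total instantaneous draw (W): CPU plus the given concurrent operations."""
    return CPU_W + sum(OPS_W[op] for op in ops)

def battery_draw(ops: Iterable[str], solar_w: float) -> float:
    """Power the battery must supply (W); raises if it exceeds the 10 W battery limit."""
    shortfall = max(0.0, demand(ops) - solar_w)
    if shortfall > BATTERY_MAX_W:
        raise ValueError("exceeds battery output limit")
    return shortfall

print(round(demand(["drive"]), 2))              # 16.1
print(round(battery_draw(["drive"], 14.9), 2))  # 1.2
print(round(battery_draw(["drive"], 9.0), 2))   # 7.1
```

Driving together with two-motor heating (27.4W) would need 12.5W from the battery even at the 14.9W solar peak, which is beyond the 10W limit, so `battery_draw` raises for that combination.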


Scheduling techniques

  • Constraint logic solving

    • Translate all constraints into a pure mathematical form

    • Use tools to solve the problem in the mathematical domain

  • Example – CLP(R)

    • Constraints

      • C1 > 3, C1 < 5, C2 > 2, C2 < 4 # two power consumers

      • C1 + C2 < S, S > 6, S < 12 # one power source

    • Inputs

      • C1 = 4.5, S = 7

    • Results

      • C2 < 2.5

      • 2 < C2
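The CLP(R) result can be checked by hand with interval reasoning. The sketch below is not a constraint solver and does not use CLP(R). It only propagates bounds for this specific example, treating all inequalities as strict (open intervals).

```python
from typing import Tuple

Interval = Tuple[float, float]  # open interval (lo, hi)

def intersect(a: Interval, b: Interval) -> Interval:
    """Intersection of two open intervals; empty means the constraints conflict."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo >= hi:
        raise ValueError("constraints are unsatisfiable")
    return lo, hi

def bounds_for_c2(c1: float, s: float) -> Interval:
    """Propagate 2 < C2 < 4 and C1 + C2 < S once C1 and S are fixed."""
    if not (3 < c1 < 5 and 6 < s < 12):
        raise ValueError("input outside declared ranges")
    # C1 + C2 < S  =>  C2 < S - C1
    return intersect((2.0, 4.0), (float("-inf"), s - c1))

print(bounds_for_c2(4.5, 7.0))  # (2.0, 2.5)
```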


Evaluation

  • Application level evaluation

    • Metrics based on overall mission objectives

    • Constraint-driven solutions

  • Power related scenario

    • Various power constraints (supply/consumption) over different stages of the application

    • Power-aware adaptive scheduling for different stages

