
Peter M. Kogge: CSE Dept. University of Notre Dame kogge@cse.nd Kanad Ghose: CS Dept.

Morphable Computer Architectures for Highly Energy Aware Systems: PACC Program Review: Nov. 1-3; Annapolis, MD. Peter M. Kogge: CSE Dept. University of Notre Dame kogge@cse.nd.edu Kanad Ghose: CS Dept. SUNY-Binghamton; ghose@cs.binghamton.edu Nikzad “Benny” Toomarian:



Presentation Transcript


  1. Morphable Computer Architectures for Highly Energy Aware Systems • PACC Program Review: Nov. 1-3; Annapolis, MD • Peter M. Kogge: CSE Dept., University of Notre Dame; kogge@cse.nd.edu • Kanad Ghose: CS Dept., SUNY-Binghamton; ghose@cs.binghamton.edu • Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab; benny@cism.jpl.nasa.gov

  2. Outline • Quad Chart • “Gear-Shifting” Simplified • The Morph Program • The Morph Architecture • Test Bed & Benchmarks

  3. MORPH: Dynamic Low Energy Architectures • MORPH Adds An “Energy Gear” to Dynamically Configurable Embedded Systems • New Ideas: • Morphable microarchitecture to allow dynamic changes in energy expended per cycle • Energy efficient morphable memory hierarchies • Energy efficient ISA extensions to process data more energy efficiently • Adaptive algorithms to select best configuration • Energy aware run-time which can reconfigure system • Impact: • Focus on energy, not just power, management • Develops suite of widely applicable energy-reducing architectural techniques • Adds extra technology-independent degrees of freedom to dynamic energy control • Provides an overall inherently more energy efficient embedded computing system • Designed for transfer to real missions • Schedule (timeline marks 5/00, 11/00, 5/01, 11/01, 5/02): Profiles; Baseline; Morphable Node; Data Placement; Adaptive Algorithms; Run-time; Demo & Eval

  4. What is “Gear-Shifting” all about? • Definitions: • IPC = Instructions per Cycle • EPC = Energy per Cycle • C = Cycles per Second • Performance = “Instructions/second” = $IPC \times C$ • Power = “Energy/second” = $EPC \times C$ • M = performance required during some mode (instructions/second) • Real world: performance needs change very dramatically • Observations on Conventional Designs: • Conventional designs fix IPC at some $IPC_{max}$ to meet peak need • In such designs $EPC = K \times IPC^{a}$, where “a” can range to almost 4 • Assume arbitrary clock selection (up to a maximum clock $C_{max}$) • Ignore Vdd changes for now • Power @ M = $K \times IPC_{max}^{a} \times (M / IPC_{max}) = K \times M \times IPC_{max}^{a-1}$ • Dependent on clock only thru M
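The conventional-design observation above can be checked numerically. The following is a minimal sketch, assuming the slide's model $EPC = K \times IPC^{a}$ and a freely selectable clock; the function name and the default values of `k` and `a` are illustrative and do not come from the presentation.

```python
def conventional_power(m, ipc_max, k=1.0, a=2.0):
    """Power of a fixed-IPC design that must deliver m instructions/second.

    Assumed model (from the slide): EPC = K * IPC^a and Power = EPC * C.
    k and a are placeholder values chosen only for illustration.
    """
    clock = m / ipc_max       # cycles/second needed: C = M / IPCmax
    epc = k * ipc_max ** a    # energy per cycle at the fixed IPCmax
    return epc * clock        # equals K * M * IPCmax^(a-1)
```

Because the clock only enters through `m / ipc_max`, the result depends on the clock only through M, as the slide notes.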

  5. Some Simplified Gear Equations • Assume IPC smoothly changeable from $IPC_{min}$ to $IPC_{max}$ • Let $R = IPC_{max} / IPC_{min}$ = “dynamic ratio” of performance range • Let g be a gear setting, ranging from 0 to 1 to change IPC • $IPC(g) = IPC_{min} + (IPC_{max} - IPC_{min}) g = IPC_{max} [1/R + (1 - 1/R) g]$ • $EPC(g) = K \times \{IPC_{max} [1/R + (1 - 1/R) g]\}^{a}$ • $Power(g, C) = K \times \{IPC_{max} [1/R + (1 - 1/R) g]\}^{a} \times C$ • [Figure label: GEARS] • Large R: OUR CHALLENGE
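The gear equations translate directly into code. This is a sketch under the slide's assumptions (IPC varies linearly with the gear setting g in [0, 1]); the function names and the default values of `k` and `a` are hypothetical.

```python
def ipc(g, ipc_max, r):
    """IPC(g) = IPCmax * [1/R + (1 - 1/R) * g], with g in [0, 1]."""
    return ipc_max * (1.0 / r + (1.0 - 1.0 / r) * g)


def epc(g, ipc_max, r, k=1.0, a=2.0):
    """EPC(g) = K * IPC(g)^a; k and a are illustrative placeholders."""
    return k * ipc(g, ipc_max, r) ** a


def power(g, clock, ipc_max, r, k=1.0, a=2.0):
    """Power(g, C) = EPC(g) * C."""
    return epc(g, ipc_max, r, k, a) * clock
```

At g = 0 the IPC is IPCmax / R (that is, IPCmin), and at g = 1 it is IPCmax.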

  6. A Gear-Shifting Strategy • To minimize power as we vary performance requirement M: • Use most efficient $IPC_{min}$ as long as possible (until clock at maximum), with g = 0 • Then smoothly vary g while using $C_{max}$ • [Figure: clock C (0 to $C_{max}$) and gear g (0 to 1), each plotted against Performance Rqmt from 0, with ticks at $I_{min} \times C_{max}$ and $I_{max} \times C_{max}$]
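The two-phase strategy can be sketched as a small selection function. It assumes the linear gear model from the previous slide; `choose_gear` and its parameter names are hypothetical, not part of the presentation.

```python
def choose_gear(m, ipc_min, ipc_max, c_max):
    """Pick (g, clock) for a performance requirement m (instructions/second).

    Phase 1: stay at g = 0 (IPCmin) and scale the clock up to c_max.
    Phase 2: hold the clock at c_max and raise g until IPCmax is reached.
    """
    if not (0 <= m <= ipc_max * c_max):
        raise ValueError("requirement outside the achievable range")
    if m <= ipc_min * c_max:
        return 0.0, m / ipc_min                # clock alone meets the need
    needed_ipc = m / c_max                     # IPC required at full clock
    g = (needed_ipc - ipc_min) / (ipc_max - ipc_min)
    return g, c_max
```

For example, with `ipc_min=2`, `ipc_max=8`, and `c_max=100`, a requirement of 100 instructions/second stays in phase 1, while a requirement of 500 needs phase 2.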

  7. The Result • Ratio of power under optimal gear change to conventional fixed IPC • Potentially huge for large R • Huge savings if applications spend most time here (a region marked on the plot) • And we can still use all the other tricks to lower peak power! • [Figure: Power Savings Factor (axis from 0 to 1, with a level at $(1/R)^{a-1}$) against Performance Rqmt M, with ticks at 0, $I_{min} C_{max}$, and $I_{max} C_{max}$]
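The level $(1/R)^{a-1}$ on the plot follows from the earlier definitions. The derivation below is a sketch that assumes $EPC = K \times IPC^{a}$, Power $= EPC \times C$, and the two-phase strategy (run at $IPC_{min}$ until the clock reaches $C_{max}$, then raise IPC at $C_{max}$); the names $P_{conv}$ and $P_{gear}$ are introduced here for illustration.

```latex
\begin{align*}
P_{conv}(M) &= K \, M \, IPC_{max}^{\,a-1} \\[4pt]
P_{gear}(M) &=
  \begin{cases}
    K \, M \, IPC_{min}^{\,a-1}, & 0 \le M \le IPC_{min} C_{max} \\
    K \, C_{max} \left( M / C_{max} \right)^{a}, & IPC_{min} C_{max} \le M \le IPC_{max} C_{max}
  \end{cases} \\[4pt]
\frac{P_{gear}(M)}{P_{conv}(M)} &=
  \begin{cases}
    \left( 1/R \right)^{a-1}, & 0 \le M \le IPC_{min} C_{max} \\
    \left( \dfrac{M}{IPC_{max} C_{max}} \right)^{a-1}, & IPC_{min} C_{max} \le M \le IPC_{max} C_{max}
  \end{cases}
\end{align*}
```

The ratio is flat at $(1/R)^{a-1}$ in the low-performance range and rises to 1 at $M = IPC_{max} C_{max}$. With, say, R = 4 and a = 3, the flat level is 1/16.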

  8. The Morph Program • Develop a microarchitecture with a large dynamic R • “Multi-cluster” superscalar CPU • Intelligent placement of data within mixed memory type hierarchy • Inherently low energy caches • Low energy ISA extensions • Define & use a realistic embedded benchmark suite • Drawn from deep-space processing needs - initially rovers • Include other DARPA benchmarks such as from DIS • Baseline on variety of systems • Develop real-time algorithms for reconfiguration • Demonstrate potential gains via simulation • SimpleScalar + energy models • Technology transfer to potential future JPL missions

  9. The Team • Overall Goals: • Architectures with variable IPC, EPC • Tools & S/W to manage morphing • Realistic demonstrations • University of Notre Dame (Peter Kogge, Vincent Freeh, Jay Brockman): • Morphable multi-cluster architecture • “At the sense amps” ISA extension • Runtime with hooks for dynamic morphing control • SUNY-Binghamton (Kanad Ghose): • Energy Aware Data Placement • Morphable Caches, RFs • Dynamic Bit Slicing • Energy Eff VLIW archs • Supporting compiler techniques • Jet Propulsion Laboratory (Nikzad Toomarian, Mohammed Mojarradi, Savio Chau): • Scenarios & benchmarks • Baseline characterizations • Runtime adaptation algorithms

  10. Starting A Solution: Multi Cluster Architecture • Problem: single large centralized register files with many ports • $EPC/IPC \sim (IW)^{k}$, k as high as 1.9 • Solution: multiple smaller register files with few ports • w clusters, each of issue width IW/w: $w (IW/w)^{k} \ll (IW)^{k}$ • [Figure: (a) Simple Pipeline; (b) Classical Superscalar with Issue Width (IW); (c) New Multi Cluster with w Clusters of width IW/w]
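The inequality on this slide can be explored with a short calculation. This is a sketch that assumes the energy term scales exactly as (issue width)^k with a proportionality constant of 1; the function names are hypothetical and the default k = 1.9 is the upper value quoted on the slide.

```python
def centralized_term(iw, k=1.9):
    """Energy term for one centralized structure of issue width iw: iw^k."""
    return iw ** k


def clustered_term(iw, w, k=1.9):
    """Energy term for w clusters, each of issue width iw / w: w * (iw/w)^k."""
    return w * (iw / w) ** k


def cluster_ratio(iw, w, k=1.9):
    """Clustered term divided by centralized term; algebraically w^(1 - k)."""
    return clustered_term(iw, w, k) / centralized_term(iw, k)
```

Since the ratio simplifies to $w^{1-k}$, it shrinks as the number of clusters grows whenever k > 1, independent of the total issue width.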

  11. Target Morph Configuration • Variable multi-cluster microarchitecture • Dynamic issue width • Dynamic ALU width • Dynamic data path width • Low energy caches • Selective substrate bias • Alternative ISA features • Energy-aware data placement • Embedded+external memory • Memory types shown: EEPROM, FLASH, DRAM, SRAM

  12. Evaluation Methodology • [Figure: scatter plot of PACC Benchmarks, with axes EPC: Energy per Cycle and IPC: Instructions per Cycle, contrasting Today’s Performance Only Design Point with the Energy Efficient Family]

  13. Multi-Cluster vs Conventional Results • Morph: dynamically change the cluster size & ride the EPC/IPC Savings • Up to 1/2 the energy at same IPC, or 20% better IPC at same energy • [Figure: Conventional design compared with cluster configurations 1x8, 2x6, 4x4, 1x6, 2x4, 1x4, 4x2]

  14. On-chip Caches: Addressing Dynamic & Static Leakage • On-chip caches dissipate 25% to 45% of total energy • Likely to increase because of leakage • Added line buffers (4 to 16) reduce dynamic energy dissipation by 40% to 65+%, with no penalty in access time and with 4% to 6% area penalty • Dynamic activation of recently-accessed L2 cache areas reduces the dynamic dissipation component by 40% to 80% • Only selected areas of L2 in active mode, rest in standby • Size of bit-cell groups controlled is critical • Additional L2 area penalty of approx. 8% • Heuristics for controlling transitions between active & standby modes

  15. Addressing Dynamic & Static Dissipations in Caches

  16. Exploiting Bit-Slice Inactivity in Datapaths • Expectation: Higher-order data bits likely to be insignificant at least some of the time • Opportunity: exploit byte slice inactivity over transfer paths, within storage devices (register files, caches) & function units • [Figure panels: “For SPECfp95 DP”; “For integers from SPECfp95”; a circuit to provide read-enables in RFs to avoid energy dissipation on access]

  17. Deep Space: The Ultimate Power-Constrained Embedded System • Limited energy/power sources • Renewable variable power: Solar cells • Constant power: RPGs • Fixed energy: batteries • Multiple operational modes, all compute/energy constrained • Cruise • Communication: compression vs transmission • Data gathering vs analysis • Movement: collision avoidance • Today: • “Pre-canned” power management by serialized operations • Morph Initial Focus: Rovers

  18. Pathfinder Sojourner • Columns: Energy Required | Function | Time and Calculation
      • 7.51W-hr | motor heating: 1 motor at a time | = 7.51W x 1hr
      • 5.63W-hr | motor heating: 2 motors at a time | = 11.26W x 0.5hr
      • 6.92W-hr | driving (extreme terrain @ -80degC) | = 13.85W x 0.5hr
      • 1.83W-hr | hazard detection | = 7.33W x 0.25hr
      • 0.45W-hr | imaging (3 images @ 2 min/image) | = 4.5W x 0.1hr
      • 1.2W-hr | image compression (compress 3 images @ 6 min/image) | = 3.7W x 0.3hr
      • 5.2W-hr | 6Mbit communication @ 50min/sol | = 6.27W x 0.8hr
      • 0.63W-hr | 42, 10 sec health checks during day | = 6.27W x 0.1hr
      • 15.0W-hr | remainder of 7 hr daytime CPU operation | = 3.7W x 4hr
      • 50W-hr | WEB heating (as needed) | = 50W-hr
      • 95W-hr | (final entry in the Energy Required column; no function is paired with it)
      • vs peak 15 W-hr Solar Cells + 150 W-hr non-rechargeable battery
      • Effects on application code: • Many actions sequential, not simultaneous • No dynamic scheduling, no autonomy • Not even CPU-clock management • Nowhere near enough CPU performance • Designed to limit worst case power • Dump excess power into heaters
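The “Time and Calculation” column is simply power multiplied by duration. The sketch below recomputes those products from the power and time figures on the slide so they can be compared against the listed “Energy Required” values. The names `TASKS`, `energy_wh`, and `budget` are hypothetical, and WEB heating is left out because the slide gives it only as an energy figure.

```python
# (function, power in W, duration in hours), copied from the slide's
# "Time and Calculation" column. WEB heating is omitted: it is listed
# only as 50 W-hr, with no power x time breakdown.
TASKS = [
    ("motor heating: 1 motor at a time", 7.51, 1.0),
    ("motor heating: 2 motors at a time", 11.26, 0.5),
    ("driving (extreme terrain @ -80degC)", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging (3 images @ 2 min/image)", 4.5, 0.1),
    ("image compression (3 images @ 6 min/image)", 3.7, 0.3),
    ("6Mbit communication @ 50min/sol", 6.27, 0.8),
    ("42, 10 sec health checks during day", 6.27, 0.1),
    ("remainder of 7 hr daytime CPU operation", 3.7, 4.0),
]


def energy_wh(power_w, hours):
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_w * hours


def budget(tasks):
    """Map each function name to its computed energy in W-hr."""
    return {name: energy_wh(power_w, hours) for name, power_w, hours in tasks}
```

Calling `budget(TASKS)` gives one computed energy per function; some slide entries appear to be rounded, so small differences from the listed column are to be expected.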

  19. Athena/Mars ’03 Rovers • 3 Hrs/day of solar @ 50 W • 5 amp hr 16V batteries • More complex communication • More complex on-board eqpt • Still statically scheduled • [Figure: Rover Configuration, with labels Pancam/Mini-TES; Instrument Arm Cluster: Raman Spectrometer, Alpha-Proton-X-Ray Spectrometer (APXS), Mössbauer Spectrometer, Microscopic Imager; Mini-Corer]

  20. MUSES-CN Asteroid NanoRover • Solar powered @ 1 watt • including RF telecommunications system for communications to lander or small-body orbiter for relay to Earth • Clock-adjustable CPU speed • Still “sequential” operation • To run a command: • Determine available solar power • Minimum required power = device + CPU power • If available power < minimum required: • if parameter enables re-orienting, re-orient to maximize solar power • if still not enough and parameter enables waiting, wait up to parameter limit for solar power • if still not enough, abort command • Set CPU speed to maximum allowable based on (power available) - (minimum needed for devices) • Perform command: during command execution, if power drops significantly (or load shed indication?...): • CPU speed is reduced to minimum required • Operate motors one-at-a-time • Return CPU speed to parameter-specified idle
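The pre-execution part of the command procedure above can be sketched as a single decision function. This is an illustrative model only, not the flight software: `plan_command`, its parameters, and the idea of passing the post-reorientation power and a bounded sequence of later power readings as inputs are all assumptions made so the example is self-contained. The in-flight reaction to power drops during execution is not modeled.

```python
def plan_command(available_w, device_w, cpu_min_w, *,
                 can_reorient=False, reoriented_w=None,
                 can_wait=False, wait_samples=()):
    """Decide whether a command may run and how much power the CPU may use.

    Returns ("abort", None) or ("run", cpu_budget_w).
    reoriented_w : hypothetical solar power available after re-orienting.
    wait_samples : hypothetical later power readings; its length stands in
                   for the parameter-specified wait limit.
    """
    minimum = device_w + cpu_min_w          # device + minimum CPU power

    # Step 1: re-orient to maximize solar power, if the parameter allows it.
    if available_w < minimum and can_reorient and reoriented_w is not None:
        available_w = max(available_w, reoriented_w)

    # Step 2: wait for more solar power, up to the allowed limit.
    if available_w < minimum and can_wait:
        for sample in wait_samples:
            available_w = sample
            if available_w >= minimum:
                break

    # Step 3: give up if there is still not enough power.
    if available_w < minimum:
        return ("abort", None)

    # CPU speed is set from what remains after the devices are powered.
    return ("run", available_w - device_w)
```

A caller would map the returned CPU power budget to the highest clock setting that fits within it, then restore the parameter-specified idle speed once the command completes.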

  21. Some Morph Test Beds • Different PowerPC configurations • Microarchitecture • Clock rates • ISA extensions • Run rover/PACC application code • Measure time/power • Use as input to SimpleScalar simulation • PACC-Blue • 400MHz PPC 7400 • Enhanced superscalar + Altivec • Linux • PACC-Gold • 400MHz PPC 750 • Linux • JPL PPC-SBC • 200 MHz 750 • VxWorks • [Figure: Oscilloscope, Logic Analyzer, PowerPC 750, NT Box, Ethernet]

  22. The NASA X2000 Avionics System • Design for 10-20X reduction in power, at 10-20X performance increase • With long-term survivability & technology scaling • Application-specific adaptive configuration to match run-time power supply constraints • [Figure labels: high-rate input (camera); symmetric multiprocessor modules; reconfigurable hardware blocks; communication module (CDMA); high-speed bus (e.g. IEEE 1394); low-speed bus (e.g. I2C); bus power controller; microcontroller-directed subnet (power regulations & control, analog telemetry sensors, safety inhibits, valve & pyro drive); altimeter subnet]

  23. X2000 FD Testbed with Power Awareness • [Figure: block diagram of several cPCI bus (6U chassis) units, each with a Built-In Power Supply and a Current Meter, populated with PPC 750 (Synergy) boards, 1394a I/F (Saderta), Dual I2C I/F (JPL), PMC, and Empty Slots; other labels: PCI Bus analyzer, Micro Gyro, PMC EPP Adapter, GPIB, FPGA Rapid Prototype, Hard Drives, Terminal Server, SUN Ultra 10 Workstations, SUN E3500 Workstation (35 GB HD), Pentium III w/1394a analyzer (Saderta)] • Legend: Ethernet; RS232; COTS IEEE 1394; I2C; SCSI; IEEE 488; COTS Support Equipment; JPL In-House Product; Outlets for power measurement

  24. Near Term Activities • Extract Rover application code • Run on SBC & Apples for baseline data • Continue microarchitectural design and simulation • Continue activities not mentioned here • Instruction annotation for energy-aware data access • Benchmark analysis for data placement • ISA extensions
