Peter M. Kogge: CSE Dept. University of Notre Dame kogge@cse.nd Kanad Ghose: CS Dept.

Morphable Computer Architectures for Highly Energy Aware Systems
PACC Kickoff: May 23, 24, 2000; Scottsdale, AZ
Peter M. Kogge: CSE Dept., University of Notre Dame; kogge@cse.nd.edu
Kanad Ghose: CS Dept., SUNY-Binghamton; ghose@cs.binghamton.edu
Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab; benny@cism.jpl.nasa.gov

MORPH: Dynamic Low Energy Architectures
MORPH Adds An “Energy Gear” to Embedded Systems
• New Ideas
  • Multi-cluster microarchitecture to allow dynamic changes in energy expended per cycle
  • Energy efficient ISA extensions to process data more energy efficiently
  • Energy efficient morphable memory hierarchies
  • Adaptive algorithms to select best configuration
  • Energy aware run-time which can reconfigure system
• IMPACT
  • Changes focus to energy, not power, management
  • Adds extra degrees of freedom to dynamic energy control
  • Provides an inherently more energy efficient architecture
  • Designed with real embedded missions in mind
[Figure: schedule from 0 to 2 yr (marks at 6 mo, 1 yr, 18 mo) covering Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval]

Why is PACC Important?
• Real world: limited energy sources
  • Renewable energy: 12-15 watts at high noon
  • Fixed capacity batteries for off-peak sunlight or emergencies in shade
• Multiple operational modes, all compute/energy constrained
  • Movement: collision avoidance
  • Spectroscopy: data gathering vs analysis
  • Communication: compression vs transmission
• Today:
  • Select computers for peak performance needs
  • Limited ability to “downshift”

The Future at the Low End: Microexplorers
Extremely limited energy sources => Peak computing only when absolutely necessary
[Figure: microexplorer mass roadmap: 10 kg (1997), 1 kg (2002?), 100 gm (2007?), 10 gm (2012?); subsystem labels: sensors, communication, temperature control, structure, advanced mobility, computing, power, navigation]

“Larger” Systems Have More Diverse Energy/Performance Profiles
[Figure: example systems: RLV, Hydrobot, Nano-Spacecraft, Integrated Inflatable Sailcraft, Atmospheric Probes, Nano-Rovers, Distributed Sensors, Penetrators]

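Recasting The Classical Power Equation
Power = $\tfrac{1}{2} \times C \times T \times V^2$, where Power is energy/sec and $T$ is logic transitions/sec
Power = energy/cycle x cycles/sec, and $T$ = transitions/cycle x cycles/sec
EnergyPerCycle (EPC) = $\tfrac{1}{2} \times C \times N_a \times V^2$, where $N_a$ is transitions/cycle
EPC is independent of clock rate!
Lowering EPC is our focus!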
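The recasting above can be written out as a short derivation. This is only a sketch: it assumes that $T$ factors as $N_a$ (transitions per cycle) times $F$ (cycles per second), with the symbol $F$ borrowed from the following slide.

```latex
\begin{align*}
\text{Power} &= \tfrac{1}{2}\, C\, T\, V^2 \\
             &= \tfrac{1}{2}\, C\, (N_a \times F)\, V^2
                && \text{since } T = N_a \times F \\
             &= \underbrace{\tfrac{1}{2}\, C\, N_a\, V^2}_{\text{EPC}} \times F
\end{align*}
```

The clock rate $F$ appears only as the final multiplier, which is why EPC itself does not depend on it.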

Why is This Important?
• Power = EPC x F, where F = cycles/second
• Performance = IPC x F
• Today’s designs: Performance/Power = IPC/EPC
  • EPC & IPC are fixed at design time (other than voltage scaling)
  • THUS: Ratio is fixed at design time
  • Only runtime “knobs” are V and F
• Real embedded scenarios:
  • Short periods of very high peak performance need => high IPC
  • Followed by long periods of much lower performance need
• Result: long periods of lower performance still running at inefficient EPC!!
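A small numeric check may help show why F alone does not change efficiency. The sketch below simply encodes Power = EPC x F and Performance = IPC x F; the IPC, EPC, and clock values are illustrative assumptions, not measurements from the project.

```python
import math

def power_watts(epc_joules: float, f_hz: float) -> float:
    """Power = EPC x F."""
    return epc_joules * f_hz

def performance_ips(ipc: float, f_hz: float) -> float:
    """Performance (instructions/second) = IPC x F."""
    return ipc * f_hz

def perf_per_watt(ipc: float, epc_joules: float, f_hz: float) -> float:
    """Performance/Power; F cancels, leaving IPC/EPC."""
    return performance_ips(ipc, f_hz) / power_watts(epc_joules, f_hz)

# Assumed, illustrative design point: IPC = 2, EPC = 4 nJ per cycle.
IPC, EPC = 2.0, 4e-9
slow = perf_per_watt(IPC, EPC, 100e6)   # 100 MHz
fast = perf_per_watt(IPC, EPC, 400e6)   # 400 MHz
print(math.isclose(slow, fast))         # same efficiency at both clocks
```

Slowing the clock lowers power and performance together, so a design with fixed IPC and EPC does no better per instruction when it is idling along than when it is running flat out.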

This Project: A “Morphable” System Architecture
• Today’s microarchitectures: EPC = $IPC^k$, where $k \gg 1$
• Our approach:
  • Inherently lower EPC (lower k)
  • With variable IPC (in turn varying EPC)
  • Thus IPC/EPC can be varied dynamically
  • Lowering IPC lowers EPC even more
• Result: additional runtime “knobs” for run-time energy management
  • Adjust configuration so IPC x F matches performance needs
  • Reap energy savings of lower EPC
Allow systems to change the “Energy Gear” on demand!
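To see what the model EPC = IPC^k implies, the sketch below computes normalized energy per instruction, EPC/IPC = IPC^(k-1), for a few IPC settings. The units are arbitrary, and the value k = 1.9 is taken from the upper figure quoted later in the deck; both the helper names and the chosen IPC values are assumptions made for illustration.

```python
import math

def epc(ipc: float, k: float) -> float:
    """Normalized energy per cycle under the model EPC = IPC**k."""
    return ipc ** k

def energy_per_instruction(ipc: float, k: float) -> float:
    """EPC / IPC, which simplifies to IPC**(k - 1)."""
    return epc(ipc, k) / ipc

def frequency_for(target_ips: float, ipc: float) -> float:
    """Clock rate F needed so that IPC x F meets a performance target."""
    return target_ips / ipc

K = 1.9  # assumed exponent
for ipc in (4.0, 2.0, 1.0):
    print(f"IPC={ipc:.0f}  energy/instruction={energy_per_instruction(ipc, K):.2f}  "
          f"F for 200 MIPS={frequency_for(200e6, ipc) / 1e6:.0f} MHz")
```

Under this model, whenever k > 1 a lower IPC setting spends less energy on each instruction, which is the saving a lower “gear” is meant to capture during periods of low performance need.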

The Team
• Overall Goals:
  • Architectures with variable IPC, EPC
  • Tools & S/W to manage morphing
  • Realistic demonstrations
• UNIVERSITY OF NOTRE DAME (Peter Kogge, Vincent Freeh, Jay Brockman)
  • Morphable multi-cluster architecture
  • “At the sense amps” ISA extension
  • Runtime with hooks for dynamic morphing control
• SUNY-BINGHAMTON (Kanad Ghose)
  • Morphable Caches, RFs
  • Energy Eff VLIW archs
  • Supporting compiler techniques
• JET PROPULSION LABORATORY (Nikzad Toomarian, Mohammed Mojarradi, Savio Chau)
  • Scenarios & benchmarks
  • Baseline characterizations
  • Runtime adaptation algorithms
• Energy Aware Data Placement

Project Components
• Morphable, inherently low EPC design
• Memory system allowing both width and placement shaping
• Dynamic algorithms to select best “shape” for current energy/performance profile
• Augmented run-time to allow dynamic reconfiguration

Our Background
• NSF MIPS: Inherently Low Power Architectures
  • The Multi-cluster microarchitecture
  • Cache-In-Memory
  • Energy Efficient Caches
• IEEC Binghamton: Reducing power on interconnects
• DARPA Processing-In-Memory Projects: HTMT & DIVA
  • Utilizing wide bandwidth on-chip storage macros
  • Data placement in deep memory hierarchies
  • Multi-threading
• NASA
  • X2000: highly scalable low power systems for deep space missions
  • Evolvable Computing Program: adaptive algorithms to select system parameters to meet some mission objective

How Power Explodes with Conventional Designs

Starting A Solution: Multi Cluster Architecture
[Figure: (a) Simple Pipeline; (b) Classical Superscalar with Issue Width (IW); (c) New Multi Cluster with w clusters, each of issue width IW/w]
• Problem: single large centralized register files with many ports
• Solution: multiple smaller register files with few ports
• EPC/IPC ~ $(IW)^k$, with $k$ as high as 1.9
• $w\,(IW/w)^k \ll (IW)^k$
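The inequality on this slide can be checked numerically. The sketch below evaluates both sides for an issue width of 8 and k = 1.9; the function names and the choice of IW = 8 are assumptions for illustration, and the cost is a unitless scaling term, not a measured energy.

```python
import math

def centralized_cost(issue_width: int, k: float) -> float:
    """Scaling term (IW)**k for one wide, centralized design."""
    return issue_width ** k

def clustered_cost(issue_width: int, clusters: int, k: float) -> float:
    """Scaling term w * (IW/w)**k for w clusters of width IW/w."""
    return clusters * (issue_width / clusters) ** k

IW, K = 8, 1.9  # assumed example values
for w in (1, 2, 4):
    ratio = clustered_cost(IW, w, K) / centralized_cost(IW, K)
    print(f"w={w}: clustered/centralized = {ratio:.3f}")
```

Algebraically the ratio is w^(1-k), so for any k > 1 the clustered term shrinks as the same total issue width is split across more clusters.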

Multi-Cluster vs Conventional Results
[Figure: results plot; legend entries: Conventional, 1x8, 2x6, 4x4, 1x6, 2x4, 1x4, 4x2]
Up to 1/2 the energy at same IPC, or 20% better IPC at same energy

Insertion into PACC
• Implement CPU as nominal 4 cluster configuration
• Modify Instruction Issue to target variable # of clusters
  • Equivalent need for separating memory disambiguation units
  • Make this a runtime settable parameter
  • Unused clusters turned off
• Additional CPU options
  • Implement selected subset of “wide word” & VLIW-like operations within a cluster
  • Utilize unused clusters for additional concurrent threads

Another Starting Point: Low Energy Caches & Register Files
• Approach: exploit locality to reduce energy requirements of on-chip storage resources
• Example: multiple line buffers
[Figure: multiple line buffers]
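One way to picture the line-buffer idea is a handful of small buffers that hold recently used cache lines, so that repeated accesses to the same line do not have to activate the full cache array. The sketch below counts buffer hits against array accesses for an address trace. The class name, the 32-byte line size, the number of buffers, and the LRU replacement policy are all assumptions for illustration, not details taken from the slides.

```python
from collections import OrderedDict

class LineBuffers:
    """Toy model: a few line buffers checked before the cache array."""

    def __init__(self, num_buffers: int = 4, line_bytes: int = 32):
        self.num_buffers = num_buffers
        self.line_bytes = line_bytes
        self.buffers = OrderedDict()   # line number -> True, in LRU order
        self.buffer_hits = 0           # served without touching the array
        self.array_accesses = 0        # full (more costly) array accesses

    def access(self, address: int) -> bool:
        """Return True if the access was served from a line buffer."""
        line = address // self.line_bytes
        if line in self.buffers:
            self.buffers.move_to_end(line)      # mark as most recently used
            self.buffer_hits += 1
            return True
        self.array_accesses += 1
        self.buffers[line] = True
        if len(self.buffers) > self.num_buffers:
            self.buffers.popitem(last=False)    # evict least recently used
        return False

lb = LineBuffers(num_buffers=2)
for addr in (0, 4, 8, 32, 36, 0, 64, 32):
    lb.access(addr)
print(lb.buffer_hits, lb.array_accesses)
```

The more locality a trace has, the larger the share of accesses that stay in the buffers, which is the effect the approach relies on.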

Storage System Morphs
• Exploit locality to reduce dynamic AND static energy dissipations of on chip storage resources:
  • Selective substrate biasing to reduce leakage – reverse body bias removed when storage component is accessed
  • Clustered data placement to maximize access to each partition within on-chip and off-chip RAMs
  • Compiler/OS prefetching to avoid/reduce turn-on delay
• Changeable Widths of Interconnect & Storage Resources
  • Sub-banking for caches and on-chip/off-chip RAM
  • FU-driven selection of activation width of dispatch buffer and reservation stations, data register files
  • Operand-width driven activation of FU slices
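As a rough illustration of operand-width driven activation, the sketch below estimates how many slices of a functional unit would need to be switched on for given operand values. The 8-bit slice size, the 32-bit datapath, two's-complement operands, and the function names are assumptions for illustration; a real design would also have to handle results that carry into a higher slice, which this sketch ignores.

```python
def slices_needed(value: int, slice_bits: int = 8, total_bits: int = 32) -> int:
    """Slices required to hold `value` in two's complement."""
    # Significant bits, plus one sign bit.
    magnitude = value if value >= 0 else ~value
    bits = min(magnitude.bit_length() + 1, total_bits)
    return -(-bits // slice_bits)          # ceiling division

def active_slices(*operands: int) -> int:
    """Slices to activate for an operation: the widest operand decides."""
    return max(slices_needed(v) for v in operands)

for a, b in ((5, 100), (127, 128), (-129, 3), (70000, 1)):
    print(f"operands {a}, {b}: {active_slices(a, b)} of 4 slices active")
```

When most operands are narrow, as is common for loop counters and small constants, most slices could stay inactive for most operations.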

ISA Extensions with Energy Reduction Potential
• VLIW-like multiple move instructions
  • Use compiler to optimize number of moves/energy
  • Useful for many signal processing loops, numerical computations
• “Wide word” multiple operation per instruction
  • Utilize existing bandwidth more completely
• Inclusion of simultaneous multi-threading extensions
  • Allow for pipelines without costly hazard detection/forwarding

Run-Time Considerations
• Application must have freedom to provide
  • expected energy/performance of code
  • requests for levels of service
• But, only run-time sees global picture
  • All currently running applications & their requests
  • Existing energy/power resources and mission profiles
  • Measurements on current activities
• Run-time modifications: changing the “energy gear”
  • Number of clusters per thread
  • Number of threads
  • Active width of on-chip storage resources & substrate biases
  • Active width of off-chip memory & interfaces
  • Placement of data within hierarchy
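A run-time gear change could, in its simplest form, amount to picking the lowest-power configuration that still meets the requested level of service. The sketch below shows that selection over a small table of gears. The `Gear` type, the `select_gear` function, and every IPC, EPC, and clock figure are hypothetical placeholders, not part of any API or data from the project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Gear:
    name: str
    ipc: float       # instructions per cycle in this configuration
    epc_nj: float    # energy per cycle, nanojoules
    f_mhz: float     # clock rate, MHz

    @property
    def mips(self) -> float:
        return self.ipc * self.f_mhz        # Performance = IPC x F

    @property
    def power_mw(self) -> float:
        return self.epc_nj * self.f_mhz     # Power = EPC x F (nJ x MHz = mW)

def select_gear(gears: Sequence[Gear], required_mips: float) -> Optional[Gear]:
    """Lowest-power gear that meets the request, or None if none can."""
    feasible = [g for g in gears if g.mips >= required_mips]
    return min(feasible, key=lambda g: g.power_mw) if feasible else None

# Made-up gear table for illustration only.
GEARS = (
    Gear("1 cluster", ipc=1.0, epc_nj=1.0, f_mhz=200),
    Gear("2 clusters", ipc=1.8, epc_nj=2.2, f_mhz=200),
    Gear("4 clusters", ipc=3.0, epc_nj=5.0, f_mhz=200),
)

for need in (150, 300, 500):
    print(need, "MIPS ->", select_gear(GEARS, need).name)
```

A fuller run-time would weigh requests from all running applications against the available energy budget, as the slide notes; this sketch handles a single request only.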

Determining the Gear: Reconfiguration Algorithms
Outgrowth of JPL’s Evolvable Computing Program
• Objective:
  • Develop reconfigurable computing capability which will allow:
    • Self-reconfiguration and adaptation to unforeseen conditions
    • Faster, cheaper development cycles
• Approach:
  • Use powerful parallel searches (e.g. genetic algorithms, neural nets, etc.), possibly including hardware, to determine the optimal performance.
• Payoff:
  • Achieve high autonomy on-board spacecraft
  • The best schedule for highest science return with lowest power consumption
  • Maintain functionality under changes in operating conditions
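As a toy illustration of the genetic-algorithm style of search mentioned above, the sketch below evolves a (clusters, storage width, clock) configuration that meets a performance need at low power. The configuration space, the analytic model in `evaluate`, and all GA parameters are made-up assumptions; this is not the JPL algorithm, only a minimal instance of the general technique.

```python
import random

# Hypothetical configuration space: every value below is a placeholder.
CHOICES = (
    (1, 2, 4),        # active clusters
    (8, 16, 32),      # active storage width, bits
    (50, 100, 200),   # clock, MHz
)

def evaluate(cfg):
    """Return (mips, power_mw) from a made-up analytic model."""
    clusters, width, f_mhz = cfg
    ipc = 0.75 * clusters * (0.6 + 0.4 * width / 32)
    epc_nj = clusters ** 1.3 * (0.5 + 0.5 * width / 32)
    return ipc * f_mhz, epc_nj * f_mhz

def cost(cfg, required_mips):
    """Power, plus a large penalty if the performance need is missed."""
    mips, power_mw = evaluate(cfg)
    return power_mw + (0.0 if mips >= required_mips else 1e9)

def random_cfg(rng):
    return tuple(rng.choice(options) for options in CHOICES)

def mutate(cfg, rng):
    """Replace one randomly chosen gene with a random legal value."""
    genes = list(cfg)
    i = rng.randrange(len(genes))
    genes[i] = rng.choice(CHOICES[i])
    return tuple(genes)

def search(required_mips, generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    # Seed the population with the top gear so a feasible point exists
    # whenever the request is achievable at all.
    top_gear = tuple(options[-1] for options in CHOICES)
    population = [top_gear] + [random_cfg(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda c: cost(c, required_mips))
        parents = population[: pop_size // 2]    # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))  # uniform crossover
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: cost(c, required_mips))

best = search(required_mips=150)
mips, power_mw = evaluate(best)
print(best, round(mips, 1), round(power_mw, 1))
```

Because the best half of each generation is carried forward unchanged, the returned configuration is never worse than the top gear it started from. With only 27 configurations an exhaustive scan would do equally well here; a search of this kind becomes relevant as the number of knobs grows.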

Program Plan
[Figure: schedule from 0 to 2 yr (marks at 6 mo, 1 yr, 18 mo) covering Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval]
Optional 3rd year:
• high level design & demo on FPLA or MOSIS
• prototype of run-time
• investigation of needed program development environment
• demo in JPL test bed
• analysis for insertion into real JPL mission

Expected Deliverables
• Benchmark suite & corresponding mission energy profiles
• Detailed morphable architecture
• System simulator with energy & performance projections & evaluation against profiles
• Demonstration of data placement & architectural adaptation algorithms
• Specification of energy aware run-time & API

Some Recent References
• Zyuban, Victor and Peter M. Kogge, “Inherently Lower-Power High-Performance Superscalar Architectures,” submitted to IEEE Trans. on Computers.
• Zyuban, Victor and Peter M. Kogge, “Optimization of High-Performance Super-Scalar Architectures for Energy-Delay Product,” accepted for ISLPED 2000.
• K. Ghose, “Reducing Energy Requirements for Instruction Issue and Dispatch in Superscalar Processors,” accepted for ISLPED 2000.
• K. Ghose and M. B. Kamble, “Reducing Power in Superscalar Caches Using Subbanking, Multiple Line Buffers and Bit-Line Segmentation,” ISLPED’99, pp. 70-75.
• Zyuban, Victor and Peter M. Kogge, “The Energy Complexity of Register Files,” ISLPED’98, pp. 305-310.
• K. Ghose and M. B. Kamble, “Energy-efficient Cache Organizations for Superscalar Processors,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• Zyuban, Victor and Peter M. Kogge, “Split Register File Architecture for Inherently Lower Power Architectures,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• Zawodny, Jason T., Jay B. Brockman, Peter M. Kogge, Eric Johnson, “Cache-In-Memory: A Lower Power Alternative,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• M. B. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97, pp. 143-148.
• M. B. Kamble and K. Ghose, “Energy-Efficiency of VLSI Caches: A Comparative Study,” IEEE 10th Int’l. Conf. on VLSI Design, Jan. 1997, pp. 261-267.

“Just enough energy”
