From Adaptive to Self-Tuning Systems

Sudhakar Yalamanchili, Subramanian Ramaswamy and Gregory Diamos
School of Electrical and Computer Engineering

Single Thread Performance
Architectural Challenges
• Negative returns with power
• Increasing inefficiencies due to
  • speculation
  • control flow
[Figure: power and ILP for in-order, OOO, and aggressive OOO pipelines]
Power Wall
• Leakage current increases 7.5X with each generation [3]
Frequency Wall
• Not much headroom left in the stage-to-stage times (currently 8-12 FO4 delays) [4]
Memory Wall
• Cache area
  • 80% of transistor budget → 50% of total area [1]
  • Defects in cache affect processor yield
  • Significant power consumers (e.g. > 40% of total power in StrongARM) [2]
• On-chip-DRAM gap continues to grow
[Image source: http://techreport.com/reviews/2005q2/opteron-x75/dualcore-chip.jpg]
Economic Wall
• Costs of developing next generation processors
• Design & manufacturing costs
• Extreme device variability
[1] P. Ranganathan, S. Adve, N. Jouppi, "Reconfigurable Caches and their Application to Media Processing," ISCA 2000
[2] Michael Zhang, Krste Asanovic, "Fine-Grain CAM-Tag Cache Resizing Using Miss Tags," ISLPED 02
[3] S. Borkar, "Design Challenges of Technology Scaling," Micro 1999
[4] Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger, "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures," ISCA 2000

System View
[Figure: large-scale, many-core, heterogeneous system built from processors (P) and memories (M)]
1. Capture and adapt to intrinsic application behavior
  • Dynamic, on-line, evolutionary behaviors
  • Static, off-line characterizations
2. Device-level variations reduce architecture yield
Solution: Systems are self-tuning

The Space of Solutions
[Figure: solution space spanning structured to ill-structured workloads on one axis and rigid HW/SW boundaries to evolutionary or self-tuning systems on the other, with the state of the practice marked]
• Traditional architectures (fixed)
• Ability to customize architectures before application deployment
• Architectures change at SW-determined points of execution
• Architectures continuously and autonomously evolve and adapt

From Adaptive to Self Tuning
• Where do we make future investments in transistors and software?
• Hardware/software co-design for continuous monitoring and/or tuning
• Expose and (dynamically) eliminate design redundancies
• Two examples
  • Cache memory hierarchy
  • On-chip networks

Generational Behavior of Caches
[Figure: memory lines over time; a miss starts a new generation, followed by hits and then an idle interval]
1. Kaxiras, S., Hu, Z. and Martonosi, M., "Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power," ISCA 2001
2. Jaume Abella, Antonio González, Xavier Vera, Michael F. P. O'Boyle, "IATAC: A Smart Predictor to Turn-off L2 Cache Lines," TACO 2005
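The idle interval at the end of each generation is what a decay policy in the spirit of the Cache Decay reference exploits: a line that has been idle for longer than some threshold is treated as switched off. The sketch below is a minimal software model of that idea, not the hardware mechanism from either reference. The class name `DecayCache`, the `decay_interval` parameter, the direct-mapped organization, and the use of an abstract tick counter for time are all assumptions made for illustration.

```python
class DecayCache:
    """Direct-mapped cache model whose lines switch off after an idle interval.

    Hypothetical model for illustration only; time is an abstract tick count.
    """

    def __init__(self, num_sets, decay_interval, line_bytes=64):
        self.num_sets = num_sets
        self.decay_interval = decay_interval
        self.line_bytes = line_bytes
        self.tags = [None] * num_sets
        self.last_access = [0] * num_sets

    def _is_live(self, index, now):
        # A line is live if it holds data and has not been idle too long.
        return (self.tags[index] is not None
                and now - self.last_access[index] <= self.decay_interval)

    def access(self, addr, now):
        """Return True on a hit; a miss starts a new generation for the line."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        hit = self._is_live(index, now) and self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        self.last_access[index] = now
        return hit

    def powered_lines(self, now):
        """Count lines that would still be powered at time `now`."""
        return sum(self._is_live(i, now) for i in range(self.num_sets))
```

A short decay interval saves more leakage but turns some late hits into misses, so the interval is the tuning knob in this model.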

Cache Tuning: Conceptual Model
• Remap memory into the cache → shape the cache
• Match the program footprint → resize the cache
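The two operations in the conceptual model can be pictured as changes to the cache placement function. The sketch below is a minimal illustration under stated assumptions: `TunablePlacement`, `resize`, and `remap_set` are hypothetical names, resizing is modeled as reducing the number of active sets used by modulo placement, and remapping is modeled as a small table that redirects one set index to another.

```python
class TunablePlacement:
    """Hypothetical placement function that supports resizing and remapping."""

    def __init__(self, total_sets, line_bytes=64):
        self.total_sets = total_sets
        self.active_sets = total_sets
        self.line_bytes = line_bytes
        self.remap = {}  # set index -> replacement set index

    def resize(self, active_sets):
        """Match the program footprint by changing how many sets are in use."""
        if not 1 <= active_sets <= self.total_sets:
            raise ValueError("active_sets out of range")
        self.active_sets = active_sets
        # Drop remap entries that refer to sets that are no longer active.
        self.remap = {s: d for s, d in self.remap.items()
                      if s < active_sets and d < active_sets}

    def remap_set(self, src, dst):
        """Shape the cache by redirecting lines of set `src` into set `dst`."""
        if not (0 <= src < self.active_sets and 0 <= dst < self.active_sets):
            raise ValueError("set index out of range")
        self.remap[src] = dst

    def set_index(self, addr):
        """Return the set that the line containing `addr` is placed in."""
        index = (addr // self.line_bytes) % self.active_sets
        return self.remap.get(index, index)
```

For example, after `resize(4)` on an 8-set cache, line 6 maps to set 2, and a subsequent `remap_set(2, 0)` sends it to set 0 instead.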

Cache Tuning: System Model & Opportunities
[Figure: program with Placement( B[][], param ) remapping directives placed before and inside a loop over Region A; memory hierarchy of P, L1, L2, and M shared by Thread 1 and Thread 2; axes x, y, z]
• Static analysis or programmer supplied: remapping directives for structured accesses
• Profile-based insertion of directives
• Run-time tuning
• Alternative implementations: AT logic, LUT

Static Tuning: Scientific Applications
• Targeted to programs with predictable access patterns
• Compiler can both resize and remap
• Advanced compiler optimizations made possible

Dynamic Tuning: Folding Heuristics
• Find and utilize redundancies in the design
• Miss folding: fold misses by remapping memory lines into the same cache set
[Figure: comparisons shown for a 256KB L2 cache]
S. Ramaswamy, S. Yalamanchili, "Improving Cache Efficiency via Resizing + Remapping," ICCD 2007
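One way to picture folding is as a greedy pass over per-set miss counters: the set with the fewest misses is folded into the next least-missed set, so that both share one set and the emptied set becomes a candidate for switching off. The sketch below illustrates only that greedy idea. It is not the heuristic from the cited ICCD 2007 paper, and the function name `fold_lowest_miss_sets` and its inputs are hypothetical.

```python
def fold_lowest_miss_sets(miss_counts, folds):
    """Greedily fold low-miss sets together (illustrative heuristic only).

    miss_counts: misses observed per set over a monitoring interval.
    folds: how many sets to fold away.
    Returns a dict mapping each folded set index to the set it now shares.
    """
    remap = {}
    counts = dict(enumerate(miss_counts))  # live sets and their miss totals
    for _ in range(folds):
        if len(counts) < 2:
            break
        # Order live sets by miss count, breaking ties by set index.
        ordered = sorted(counts, key=lambda s: (counts[s], s))
        src, dst = ordered[0], ordered[1]
        counts[dst] += counts.pop(src)
        # Sets that were previously folded into src must follow it to dst.
        for s in list(remap):
            if remap[s] == src:
                remap[s] = dst
        remap[src] = dst
    return remap
```

With miss counts `[50, 2, 40, 3]`, one fold merges set 1 into set 3, and a second fold moves both into set 2.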

Tuning for Yield: Decreasing Defect Sensitivity*
• Performance yield → yield at a given performance (e.g. AMAT) for 1000 units
• Up to four times greater than modulo placement
• Exploiting redundancies → application to power management
Recovering Design Inefficiencies
* S. Ramaswamy, S. Yalamanchili, "Customizable Fault Tolerant Caches for Embedded Processors," ICCD 2006
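Performance yield can be made concrete with the standard definition of average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below counts the fraction of manufactured units whose AMAT meets a target. The function names and the sample numbers in the usage note are illustrative assumptions, not data from the cited study.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time."""
    return hit_time + miss_rate * miss_penalty


def performance_yield(unit_amats, target_amat):
    """Fraction of units whose AMAT is no worse than target_amat."""
    if not unit_amats:
        raise ValueError("need at least one unit")
    passing = sum(1 for a in unit_amats if a <= target_amat)
    return passing / len(unit_amats)
```

For instance, if four units have AMATs of 5.0, 6.0, 7.5, and 9.0 cycles and the target is 7.5 cycles, three of the four units pass.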

Opportunities
• Voltage scaling
  • Combine voltage scaling and remapping for program-phase-dependent power management
• Compiler-directed hardware optimizations
  • For example, concurrent data layout + cache placement
• Application to multi-threaded and multi-core domains
  • Cache sharing across threads
  • Challenge: coherency traffic

The On-Chip Network
• The network is in the critical path (performance)
  • Operand networks
  • Cache hierarchy
  • System on Chip
• Increasing impact of wire (channel) delays
  • Wire delays must be actively managed
• On-demand resource management
  • Initial studies: link tuning
• Reference: research at EPFL & Stanford on robust link design

A System for Tuning and Actively Reconfiguring SoC Links (STARS)
[Figure: timing diagrams for three cases (too fast, well tuned, too slow), showing whether Latch 1, Latch 2, and Latch 3 capture Value 1 or Value 2 over time]
• Variable delays and cascaded registers measure link delay
• Digital PLL tunes the clock to match the link delay
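The measurement idea can be sketched as a classification over what the cascaded latches capture. The interpretation below is an assumption, since the slide gives only the figure labels: the latches sample the link at staggered instants, earliest first; if every latch still sees the old value the clock is too fast, if every latch already sees the new value there is unused slack and the clock is too slow, and a mix means the transition falls inside the sampling window. The function name `classify_link` is hypothetical.

```python
def classify_link(samples, old_value, new_value):
    """Classify link timing from latch samples (assumed interpretation).

    samples: values captured by the cascaded latches, earliest first.
    """
    for s in samples:
        if s not in (old_value, new_value):
            raise ValueError("sample is neither the old nor the new value")
    if all(s == old_value for s in samples):
        return "too fast"    # no latch saw the transition arrive
    if all(s == new_value for s in samples):
        return "too slow"    # transition arrived before the earliest sample
    return "well tuned"      # transition lies inside the sampling window
```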

FPGA Tests
• Monitoring
  • Find start of link transition
  • Find end of link transition
• Tuning
  • Determine slack in the link
  • Adjust clock frequency
• Low-speed tests to validate the control strategy
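The monitoring and tuning steps form a feedback loop, which can be sketched in software as follows. This is a simplified model under stated assumptions, not the control logic used in the FPGA tests: the measured link delay is passed in as a number rather than obtained from latches, the clock period moves in fixed steps, and the names `tune_clock_period`, `step_ps`, and `margin_ps` are hypothetical.

```python
def tune_clock_period(link_delay_ps, period_ps, step_ps, margin_ps,
                      max_iters=1000):
    """Step the clock period until slack lies in [0, margin_ps].

    Slack is the clock period minus the link delay. Requiring
    step_ps <= margin_ps guarantees the loop cannot oscillate around
    the target window.
    """
    if step_ps <= 0 or step_ps > margin_ps:
        raise ValueError("need 0 < step_ps <= margin_ps")
    for _ in range(max_iters):
        slack = period_ps - link_delay_ps      # determine slack in the link
        if slack < 0:
            period_ps += step_ps               # too fast: lengthen the period
        elif slack > margin_ps:
            period_ps -= step_ps               # too slow: shorten the period
        else:
            return period_ps                   # well tuned
    raise RuntimeError("did not converge within max_iters")
```

Starting from a 2970 ps period with a 1000 ps link delay, a 10 ps step, and a 20 ps margin, the loop settles at 1020 ps.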

Prototyping: 180nm
• Variable Delay Elements (VDE)
  • Variable delay from 118ps to 1.47ns
  • 10 bits of resolution
  • 502 transistors
• Digitally Controlled Oscillator (DCO)
  • Clock period from 240ps to 2.97ns
  • 10 bits of resolution
  • 528 transistors
• Digital Clock Divider (DCD)
  • Min input clock period 480ps
  • 8 bits of resolution
  • 1127 transistors
• Allows tuning links up to 2.083 GHz
  • From reference clock of 8.13MHz
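The headline figures can be cross-checked with a little arithmetic. A 480 ps minimum period corresponds to 1/480 ps ≈ 2.083 GHz, and dividing that by 2^8 gives about 8.14 MHz, which lines up with the quoted reference clock. Reading the 8-bit divider as a divide-by-256 relationship is an inference from the numbers, not something the slide states. The step-size helper additionally assumes that the 10 bits of resolution are spread linearly across the range, which is also an assumption.

```python
def period_ps_to_ghz(period_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


def linear_step_ps(min_ps, max_ps, bits):
    """Step size if `bits` of resolution are spread linearly (assumption)."""
    return (max_ps - min_ps) / (2 ** bits - 1)


# Maximum link frequency from the divider's 480 ps minimum input period.
max_link_ghz = period_ps_to_ghz(480.0)

# Reference clock implied by dividing the maximum frequency by 2**8.
reference_mhz = max_link_ghz * 1000.0 / 2 ** 8

# Approximate step sizes for the DCO and VDE under the linear assumption.
dco_step_ps = linear_step_ps(240.0, 2970.0, 10)
vde_step_ps = linear_step_ps(118.0, 1470.0, 10)
```

Under the linear assumption, the DCO and VDE steps both come out at a few picoseconds.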

Extensions
• Modulate link widths
• Modulate buffer organizations
  • Channels/depth
• Feedback between local congestion detection and link and buffer resources

Summary
• Application demands will be time varying
• Technology will introduce time-varying hardware characteristics
• Continuous cooperative HW/SW tuning provides a methodology for addressing these concerns
  • Need the support of abstractions for tuning
• Influence of prior applications to datapaths (Razor-UMich), communication systems (Vizor-GT), and reliable links (Stanford/EPFL)
• Build on existing research in cache performance & power management
