VIFI-CMP: Variability-Tolerant Chip-Multiprocessors for Throughput & Power Wan-Yu Lee Iris H.-R. Jiang GLSVLSI-09 20

VIFI-CMP: Variability-Tolerant Chip-Multiprocessors for Throughput & Power • Wan-Yu Lee • Iris H.-R. Jiang • GLSVLSI-09 • 2009/05/06

Outline • Introduction • Chip-multiprocessor architectures • Models • Monte Carlo analysis • Experimental results • Conclusion

Introduction

Process Variability • Process variability grows as technology advances • Transistor level • Unexpected deviations in • Threshold voltage Vth • Effective channel length Leff • System level • Unwanted degradation of throughput and power

Chip-Multiprocessors (CMPs) • Chip-multiprocessors (CMPs) • Integrate multiple microprocessors (cores) into a single chip • The number of cores increases as technology scales • Multi-threaded multi-core design • Process variability in CMPs? • The impact of variation is confined within each core • Process variations therefore cause core-to-core variations • [Figure: cores with different frequencies sharing an L2 cache]

Target • Modeling process variability in CMPs • Analytical model • Monte Carlo analysis • Mitigating the impact of process variability • Reducing the throughput degradation caused by process variation • Reducing dynamic power consumption

CMP Architectures • Globally-clocked (GC-CMP) • Frequency island (FI-CMP) • Voltage island (VI-CMP) • Voltage island frequency island (VIFI-CMP)

Globally-Clocked (GC-CMP) • A single global clock drives all cores • The clock frequency is limited by the slowest core • [Figure: cores sharing one clock and an L2 cache]
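A minimal sketch of the GC-CMP clocking rule, assuming hypothetical per-core feasible frequencies: the shared clock is simply the minimum FF over all cores, and the loss is measured against the nominal frequency.

```python
def gc_clock(feasible_freqs_ghz: list[float]) -> float:
    """Clock of a globally-clocked CMP: every core must run at the slowest core's speed."""
    if not feasible_freqs_ghz:
        raise ValueError("need at least one core")
    return min(feasible_freqs_ghz)


def gc_clock_loss(feasible_freqs_ghz: list[float], nominal_ghz: float) -> float:
    """Fractional clock loss of the GC-CMP relative to the nominal design."""
    return 1.0 - gc_clock(feasible_freqs_ghz) / nominal_ghz


if __name__ == "__main__":
    ff = [4.1, 3.7, 3.9, 4.0]  # hypothetical feasible frequencies (GHz) of four cores
    print(gc_clock(ff))  # 3.7
    print(round(gc_clock_loss(ff, 4.0), 3))  # 0.075
```

One slow core drags the whole chip down, which is the degradation that the island-based architectures try to avoid.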

Frequency Island (FI-CMP) • Each core has its own frequency island • The granularity of frequency bins is 10% of the nominal value • Feasible frequency (FF) • The highest frequency a core can actually reach under variation • Operating frequency (OF) • The core frequency assigned by its frequency island • OF ≦ FF • Example: a 3.7GHz core with a nominal value of 4GHz is assigned a 3.6GHz frequency island
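The binning rule can be sketched as rounding FF down to the nearest bin. This assumes the bins are integer multiples of 10% of the nominal frequency (the slide gives only the granularity), which guarantees OF ≦ FF.

```python
import math

BIN_FRACTION = 0.10  # frequency-bin granularity: 10% of the nominal frequency


def operating_frequency(ff_ghz: float, nominal_ghz: float) -> float:
    """Round a core's feasible frequency (FF) down to a frequency-island bin (OF).

    Assumption: bins are integer multiples of 10% of the nominal frequency,
    so OF <= FF always holds.
    """
    step = BIN_FRACTION * nominal_ghz
    # The tiny epsilon keeps an FF that lies exactly on a bin (e.g. 3.6 GHz)
    # from slipping into the lower bin because of floating-point rounding.
    k = math.floor(ff_ghz / step + 1e-9)
    return k * step


if __name__ == "__main__":
    print(round(operating_frequency(3.7, 4.0), 2))  # 3.6, as in the slide example
```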

Voltage Island (VI-CMP) • Each core has its own voltage island • Voltage scaling lets every core meet the frequency of the nominal design • Nominal design (Nom): the design without process variation • The granularity of voltage bins is 10% of the nominal value • Example: for a 3.7GHz core to reach its nominal 4GHz, its VI raises the supply voltage from 0.9V to 0.99V (one bin)
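A sketch of how a VI could pick its supply bin. It assumes the alpha-power law with α = 1.3 and Leff fixed, that FF is measured at 0.9V, that Vth does not change with VDD, and a hypothetical nominal Vth of 0.3V (not a value from the slides). Under these assumptions, the 3.7GHz example lands on the 0.99V bin.

```python
ALPHA = 1.3    # alpha-power-law exponent from the experimental setting
V_NOM = 0.9    # nominal supply voltage (V), from the slide example
V_STEP = 0.10  # voltage-bin granularity: 10% of the nominal supply


def relative_speed(vdd: float, vth: float, alpha: float = ALPHA) -> float:
    """Alpha-power law with Leff fixed: gate speed ~ (Vdd - Vth)^alpha / Vdd."""
    return (vdd - vth) ** alpha / vdd


def voltage_island_vdd(ff_ghz: float, nominal_ghz: float, vth: float,
                       max_steps: int = 5) -> float:
    """Lowest bin V_NOM * (1 + 0.1k) at which a core's scaled frequency reaches nominal.

    Assumes FF was measured at V_NOM and that Vth does not change with Vdd.
    """
    base = relative_speed(V_NOM, vth)
    for k in range(max_steps + 1):
        vdd = V_NOM * (1 + V_STEP * k)
        if ff_ghz * relative_speed(vdd, vth) / base >= nominal_ghz:
            return vdd
    raise ValueError("nominal frequency not reachable within max_steps voltage bins")


if __name__ == "__main__":
    # Hypothetical Vth = 0.3 V (not a value from the slides).
    print(round(voltage_island_vdd(3.7, 4.0, vth=0.3), 2))  # 0.99
```

A core that is already at or above nominal keeps V_NOM. A lower assumed Vth makes speed less sensitive to VDD, so more bins may be needed.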

Voltage Island Frequency Island (VIFI-CMP) • Each core has both a voltage island and a frequency island • Takes advantage of the slack between the FF and OF of each core • If FF > OF, the supply voltage can be lowered • Naturally collaborates with dynamic voltage and frequency scaling (DVFS)
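The slack idea can be sketched as lowering VDD bin by bin while the core still sustains its OF. The sketch uses the same hedged assumptions as a plain alpha-power model: α = 1.3, Leff fixed, FF measured at 0.9V, Vth independent of VDD, and a hypothetical Vth of 0.3V. With 10% bins, the slack pays off only when it is large enough. A 3.59GHz core binned to 3.2GHz can drop to 0.81V, but a 3.7GHz core binned to 3.6GHz cannot.

```python
ALPHA = 1.3    # alpha-power-law exponent from the experimental setting
V_NOM = 0.9    # nominal supply voltage (V)
V_STEP = 0.10  # voltage-bin granularity: 10% of the nominal supply


def relative_speed(vdd: float, vth: float, alpha: float = ALPHA) -> float:
    """Alpha-power law with Leff fixed: gate speed ~ (Vdd - Vth)^alpha / Vdd."""
    return (vdd - vth) ** alpha / vdd


def vifi_vdd(ff_ghz: float, of_ghz: float, vth: float, min_vdd: float = 0.5) -> float:
    """Lowest bin V_NOM * (1 - 0.1k) at which a core still sustains its OF.

    Speed rises monotonically with Vdd above Vth, so the search stops at the
    first bin that is too slow.
    """
    base = relative_speed(V_NOM, vth)
    best = V_NOM
    k = 1
    while True:
        vdd = V_NOM * (1 - V_STEP * k)
        if vdd <= vth or vdd < min_vdd:
            break  # stay above Vth and a hypothetical minimum supply
        if ff_ghz * relative_speed(vdd, vth) / base < of_ghz:
            break  # this bin can no longer sustain OF
        best = vdd
        k += 1
    return best


if __name__ == "__main__":
    print(round(vifi_vdd(3.59, 3.2, vth=0.3), 2))  # 0.81
    print(vifi_vdd(3.7, 3.6, vth=0.3))             # 0.9
```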

Models • Transistor model • Process variations • Throughput model • Power model

Transistor Model • Alpha-power law model • Drain saturation current • Transistor delay • The transistor delay depends on Vth and Leff.
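The slide does not show the equations. A common statement of the alpha-power law gives I_Dsat ∝ (W/Leff)(VDD − Vth)^α. The delay form below, Tg ∝ VDD·Leff/(VDD − Vth)^α with carrier mobility held constant, is an assumption rather than the paper's exact expression. Both return arbitrary units, so only ratios are meaningful.

```python
ALPHA = 1.3  # velocity-saturation index used in the experiments


def drain_saturation_current(vdd: float, vth: float, leff: float,
                             width: float = 1.0, alpha: float = ALPHA) -> float:
    """Alpha-power law (arbitrary units): I_Dsat ~ (W / Leff) * (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return (width / leff) * (vdd - vth) ** alpha


def gate_delay(vdd: float, vth: float, leff: float, alpha: float = ALPHA) -> float:
    """Relative gate delay (arbitrary units): Tg ~ Vdd * Leff / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return vdd * leff / (vdd - vth) ** alpha


if __name__ == "__main__":
    # Hypothetical values: a 5% higher Vth slows the gate down.
    print(gate_delay(0.9, 0.315, 22.0) / gate_delay(0.9, 0.3, 22.0))
```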

Process Variations (1/2) • Within-die (WID) • Systematic components • Caused by lithography issues • Modeled by a normal distribution with spherical spatial correlation • Random components • Caused by random dopant fluctuation • Modeled by a normal distribution • Die-to-die (D2D) • An offset added to the within-die variation

Process Variations (2/2) • Spatial correlation • The chip is divided into n×m grids • ITRS projects that the variation of Leff is half that of Vth • Leff can therefore be obtained from Vth
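A sketch of the two ingredients on this slide. The first is the standard spherical correlation function, with distance r and range φ expressed as fractions of the chip width (φ = 0.5 in the experimental setting). The second is a hypothetical helper that derives Leff from Vth. It assumes Leff's relative deviation is exactly half of Vth's, which is one reading of the ITRS ratio and not necessarily the paper's construction.

```python
def spherical_correlation(r: float, phi: float = 0.5) -> float:
    """Correlation between two grid points at distance r (fraction of chip width).

    rho(r) = 1 - 1.5*(r/phi) + 0.5*(r/phi)**3 for r < phi, and 0 for r >= phi.
    """
    if r < 0:
        raise ValueError("distance must be non-negative")
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def leff_from_vth(vth: float, vth_nom: float, leff_nom: float) -> float:
    """Hypothetical helper: Leff deviates from nominal by half of Vth's relative deviation."""
    return leff_nom * (1.0 + 0.5 * (vth - vth_nom) / vth_nom)


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.25, 0.5):
        print(r, round(spherical_correlation(r), 3))
```

Correlation falls from 1 at zero distance to 0 at φ, so grids more than half a chip width apart vary independently.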

Throughput Model • TP(Nthreads): total system throughput • ML2(SL2): the L2 cache miss rate at effective cache size SL2 • SL2: the effective L2 cache size • Ls: the unloaded service latency of the L2 miss path • : the frequency of thread i • : stall time of the link component • tdram: stall time of DRAM

Power Model • Pdyn(Nthreads): total dynamic power • : dynamic power of a computation component • pdram and : power penalty when an L2 cache miss occurs • ci: self-loading capacitance
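The slide's full power model also includes DRAM and miss penalties. This sketch covers only the switching power of one computation component. It assumes the textbook form P = C·VDD²·f, with activity folded into C. It compares hypothetical operating points of a single core against the nominal 0.9V/4GHz design and helps explain why raising VDD (VI) costs power while lowering f (FI) saves it.

```python
def dynamic_power(c_eff: float, vdd: float, freq_ghz: float) -> float:
    """Switching power of one component: P = C_eff * Vdd^2 * f (activity folded into C_eff)."""
    return c_eff * vdd ** 2 * freq_ghz


def relative_power(vdd: float, freq_ghz: float,
                   vdd_nom: float = 0.9, freq_nom_ghz: float = 4.0) -> float:
    """Power of one core relative to the nominal design (same C_eff)."""
    return dynamic_power(1.0, vdd, freq_ghz) / dynamic_power(1.0, vdd_nom, freq_nom_ghz)


if __name__ == "__main__":
    # Hypothetical operating points for one slow core (nominal: 0.9 V, 4 GHz).
    for name, vdd, f in [("Nom", 0.9, 4.0), ("VI", 0.99, 4.0), ("FI", 0.9, 3.6)]:
        print(name, round(relative_power(vdd, f), 3))
```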

Monte Carlo Analysis

Monte Carlo Methods • Computational algorithms • Rely on repeated random sampling to compute their results • Used when it is infeasible or impossible to compute an exact result with a deterministic algorithm • Suited to calculation by a computer • Rely on repeated computation and random or pseudo-random numbers
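A small Monte Carlo illustration of repeated random sampling, tied to the GC-CMP rule that the slowest core sets the clock. It assumes each core's FF is an independent normal sample with a hypothetical 5% spread, ignoring spatial correlation. The estimate averages the chip clock over 1000 random chips.

```python
import random
import statistics


def expected_gc_clock(n_cores: int, nominal_ghz: float = 4.0, sigma_frac: float = 0.05,
                      trials: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected GC-CMP clock, E[min over cores of FF].

    Assumption: each core's FF is an independent normal sample around the nominal
    frequency (spatial correlation is ignored); sigma_frac is hypothetical.
    """
    rng = random.Random(seed)
    clocks = []
    for _ in range(trials):
        ffs = [rng.gauss(nominal_ghz, sigma_frac * nominal_ghz) for _ in range(n_cores)]
        clocks.append(min(ffs))  # the slowest core sets the shared clock
    return statistics.mean(clocks)


if __name__ == "__main__":
    for cores in (1, 4, 16):
        print(cores, round(expected_gc_clock(cores), 3))
```

The expected clock drops as the core count grows, because the minimum of more samples is lower.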

Box-Muller Transform • Generates normally distributed random numbers from uniformly distributed ones • Take two samples U1, U2 from the uniform distribution on the interval (0, 1] • One sets the radius R = √(−2 ln U1), the other sets the angle θ = 2πU2 • Produces two independent standard normal samples • x = R cos θ, y = R sin θ
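A direct sketch of the transform using only Python's standard library. `1.0 - rng.random()` maps the generator's [0, 1) output onto (0, 1], so the logarithm is always finite. `normal_samples` is a hypothetical convenience wrapper that rescales the standard normals to a given mean and standard deviation.

```python
import math
import random


def box_muller(rng: random.Random) -> tuple[float, float]:
    """Return two independent standard normal samples built from two uniform samples."""
    u1 = 1.0 - rng.random()              # uniform on (0, 1], so log(u1) is finite
    u2 = rng.random()                    # uniform on [0, 1)
    r = math.sqrt(-2.0 * math.log(u1))   # radius
    theta = 2.0 * math.pi * u2           # angle
    return r * math.cos(theta), r * math.sin(theta)


def normal_samples(count: int, mean: float = 0.0, std: float = 1.0, seed: int = 0) -> list[float]:
    """Hypothetical helper: draw `count` normal samples via repeated Box-Muller pairs."""
    rng = random.Random(seed)
    out: list[float] = []
    while len(out) < count:
        x, y = box_muller(rng)
        out.extend((mean + std * x, mean + std * y))
    return out[:count]


if __name__ == "__main__":
    s = normal_samples(100000, seed=42)
    m = sum(s) / len(s)
    print(round(m, 3), round(math.sqrt(sum((v - m) ** 2 for v in s) / len(s)), 3))
```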

Monte Carlo Analysis • Generating 1000 variation maps of Vth and Leff • Flow • Use the Box-Muller transform to generate n×m normal random numbers of and • Add spatial correlation • From Vth and Leff, compute the gate delay Tg of each grid • Partition the grids into the corresponding cores • The FF of each core is the inverse of its slowest (largest) grid delay • [Figure: grid map partitioned into cores]
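A pure-Python sketch of one pass of this flow on a small 8×8 grid with 2×2 cores, instead of 100×100. Every numeric value is a hypothetical placeholder, and the D2D offset is omitted. The slides do not say how spatial correlation is imposed. This sketch swaps in a Cholesky factorization of the spherical correlation matrix applied to i.i.d. normals, drawn here with `random.gauss`, though Box-Muller samples would work the same way. The sketch also assumes Leff tracks Vth with half its relative deviation and uses Tg ∝ VDD·Leff/(VDD − Vth)^α.

```python
import math
import random

# All numeric values below are hypothetical placeholders, not the paper's settings.
ALPHA = 1.3        # alpha-power-law exponent
VDD = 0.9          # supply voltage (V)
VTH_NOM = 0.3      # hypothetical nominal threshold voltage (V)
LEFF_NOM = 22.0    # hypothetical nominal effective channel length (nm)
SIGMA_SYS = 0.045  # hypothetical sigma/mu of the systematic Vth component
SIGMA_RAN = 0.045  # hypothetical sigma/mu of the random Vth component
PHI = 0.5          # correlation range as a fraction of the chip width


def spherical(r, phi=PHI):
    """Spherical correlation: 1 - 1.5(r/phi) + 0.5(r/phi)^3 inside phi, else 0."""
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def vth_map(n, m, rng):
    """One Vth map over n*m grids (row-major): correlated systematic part + random part."""
    pts = [((i + 0.5) / n, (j + 0.5) / m) for i in range(n) for j in range(m)]
    corr = [[spherical(math.dist(p, q)) for q in pts] for p in pts]
    for i in range(len(pts)):
        corr[i][i] += 1e-9  # tiny nugget for numerical safety in the factorization
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in pts]  # i.i.d. standard normals
    systematic = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(pts))]
    return [VTH_NOM * (1.0 + SIGMA_SYS * s + SIGMA_RAN * rng.gauss(0.0, 1.0))
            for s in systematic]


def gate_delay(vth):
    """Alpha-power-law delay; Leff tracks Vth with half its relative deviation (assumed)."""
    leff = LEFF_NOM * (1.0 + 0.5 * (vth - VTH_NOM) / VTH_NOM)
    return VDD * leff / (VDD - vth) ** ALPHA


def core_frequencies(n, m, cores_x, cores_y, rng):
    """Relative FF of each core: nominal delay divided by its slowest grid delay."""
    tg = [gate_delay(v) for v in vth_map(n, m, rng)]
    t_nom = gate_delay(VTH_NOM)
    bx, by = n // cores_x, m // cores_y  # grids per core along each axis
    ffs = []
    for cx in range(cores_x):
        for cy in range(cores_y):
            worst = max(tg[i * m + j]
                        for i in range(cx * bx, (cx + 1) * bx)
                        for j in range(cy * by, (cy + 1) * by))
            ffs.append(t_nom / worst)
    return ffs


if __name__ == "__main__":
    rng = random.Random(2009)
    maps = [core_frequencies(8, 8, 2, 2, rng) for _ in range(20)]  # 1000 in the slides
    print(min(min(ffs) for ffs in maps))  # worst relative FF seen across all maps
```

For a full study the correlation matrix would be factored once and reused across all maps. Repeating the loop 1000 times yields the FF distribution that feeds the throughput and power models.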

Experimental Results

Experimental Setting (1/2) • Benchmark • 4 core types: Small (S), Medium (Me), Large (L), Monolithic (Mo) • 16MB shared L2 cache • Nthreads ranges from 1 to 70.

Experimental Setting (2/2) • Technology: 22nm • Operating point • α=1.3 • =22nm, =0.9V, =0.094V • Granularity • Frequency bins: 10% of the nominal Fclk • Voltage bins: 10% of the nominal VDD • Variability • =6.4%, =0.0% • φ=0.5, n×m=100×100

Experimental Results—Vth & Leff • The distributions of Vth and Fclk are generated by Monte Carlo analysis • Over 1000 variation maps • Vth→Frequency→Throughput

Experimental Results—Throughput (1/2)

Experimental Results—Throughput (2/2) • Throughput saturates when Nthreads > #cores • Throughput degradation by core type: Mo > L > Me > S • The small core has the best variability tolerance

Experimental Results—Power • Dynamic power by core type: Mo > L > Me > S • The small core has the best power efficiency • Dynamic power by CMP architecture: VI > Nom > GC > FI > VIFI • VIFI-CMP has the best power efficiency

Tradeoff between Throughput and Power • The best tradeoff between throughput and power occurs when Nthreads = #cores • At Nthreads = 70, S/VIFI shows only 0.06% throughput degradation while saving 36.27% dynamic power • VIFI-CMP has the best variability tolerance • Same throughput as FI-CMP • Lower power consumption than Nom

Conclusion • This work characterized the impact of process variability on throughput and power for CMPs. • Results showed that • The small core has the best variability tolerance. • VIFI-CMP has the best variability tolerance for throughput and power and can naturally collaborate with DVFS. • VIFI-CMP can even have lower power consumption than the nominal design.

References • "Monte Carlo method," Wikipedia. Available: http://en.wikipedia.org/wiki/Monte_Carlo_method. • J. S. Liu, "Monte Carlo Strategies in Scientific Computing," Springer, 2008. • S. Herbert and D. Marculescu, "Characterizing Chip-Multiprocessor Variability-Tolerance," in Proceedings of the Design Automation Conference, pp. 313-318, 2008. • S. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, "VARIUS: a model of process variation and resulting timing errors for microarchitects," IEEE Trans. on Semiconductor Manufacturing, 21(1), pp. 3-13, 2008. • The International Technology Roadmap for Semiconductors (ITRS), 2007. Available: http://www.itrs.net/. • G. E. P. Box and M. E. Muller, "A note on the generation of random normal deviates," Ann. Mathematical Statistics, 29(2), pp. 610-611, 1958.
