
VIFI-CMP: Variability-Tolerant Chip-Multiprocessors for Throughput & Power. Wan-Yu Lee, Iris H.-R. Jiang. GLSVLSI-09, 2009/05/06




  1. VIFI-CMP: Variability-Tolerant Chip-Multiprocessors for Throughput & Power • Wan-Yu Lee • Iris H.-R. Jiang • GLSVLSI-09 • 2009/05/06

  2. Outline • Introduction • Chip-multiprocessor architectures • Models • Monte Carlo analysis • Experimental results • Conclusion

  3. Introduction

  4. Process Variability • Process variability grows as technology advances • Transistor level • Unexpected deviations in • Threshold voltage Vth • Effective channel length Leff • System level • Unwanted degradation of throughput and power

  5. Chip-Multiprocessors (CMPs) • Chip-multiprocessors (CMPs) • Integrate multiple microprocessors (cores) into a single chip • The # of cores increases as technology scales • Multi-threaded multi-core design • Process variability in CMPs? • The impact is restricted to within each core • Process variations cause core-to-core variations • Frequency

  6. Target • Modeling process variability in CMPs • Analytical model • Monte Carlo analysis • Mitigating the impact of process variability • Reducing the throughput degradation caused by process variation • Reducing dynamic power consumption

  7. CMP Architectures • Globally-clocked • Frequency island • Voltage island • Voltage island frequency island

  8. Globally-Clocked (GC-CMP) • Using one global clock for all cores • Clock frequency is dominated by the slowest core.

  9. Frequency Island (FI-CMP) • Adding a frequency island to each core • The granularity of frequency bins is 10% of the nominal value. • Feasible frequency (FF) • Original frequency of a core • Operating frequency (OF) • Core frequency with frequency island • OF ≤ FF • Example: a 3.7GHz core with a nominal value of 4GHz will be assigned a 3.6GHz frequency island
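The binning rule on slide 9 can be sketched in Python as follows. This is a minimal sketch, assuming that the bins sit at integer multiples of 10% of the nominal frequency and that a core is rounded down to the nearest bin; the function name `operating_frequency` and its parameters are hypothetical and do not come from the slides.

```python
import math

def operating_frequency(feasible_ghz, nominal_ghz, granularity=0.10):
    """Round a core's feasible frequency (FF) down to a frequency bin.

    Assumption: bins are integer multiples of granularity * nominal_ghz.
    """
    bin_width = granularity * nominal_ghz
    # The small epsilon guards against floating-point error when FF
    # already sits exactly on a bin boundary.
    bins = math.floor(feasible_ghz / bin_width + 1e-9)
    return bins * bin_width

# Slide example: a 3.7 GHz core with a 4 GHz nominal value.
print(round(operating_frequency(3.7, 4.0), 3))  # 3.6
```

Rounding down keeps OF ≤ FF, so the core never runs faster than its own critical path allows.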

  10. Voltage Island (VI-CMP) • Adding a voltage island to each core • Meeting the frequency of the nominal design by voltage scaling • Nominal design (Nom): design without process variation • The granularity of voltage bins is 10% of the nominal value. • Example: a 3.7GHz core reaches its nominal value of 4GHz by adding a VI that raises the supply voltage from 0.9V to 0.99V

  11. Voltage Island Frequency Island (VIFI-CMP) • Adding both voltage and frequency islands • Taking advantage of the slack between FF and OF of each core • If FF > OF, the supply voltage can be lowered. • Naturally collaborating with dynamic voltage and frequency scaling (DVFS)

  12. Models • Transistor model • Process variations • Throughput model • Power model

  13. Transistor Model • Alpha-power law model • Drain saturation current • Transistor delay • The transistor delay is related to Vth and Leff.
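A common proportional form of the alpha-power law is sketched below for orientation. It is the textbook form and may differ from the exact expressions on slide 13; the symbols W (transistor width) and C_L (load capacitance) are assumptions introduced here, while Vth, Leff, α and the gate delay Tg follow the slides.

```latex
I_{D,\mathrm{sat}} \propto \frac{W}{L_{\mathrm{eff}}}\,(V_{DD} - V_{th})^{\alpha},
\qquad
T_g \propto \frac{C_L\,V_{DD}}{I_{D,\mathrm{sat}}}
    \propto \frac{L_{\mathrm{eff}}\,V_{DD}}{(V_{DD} - V_{th})^{\alpha}}
```

Under this form, a higher Vth or a longer Leff both increase Tg, which is why deviations in either parameter slow a core down.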

  14. Process Variations (1/2) • Within-die (WID) • Systematic components • Caused by lithography issues • Modeled by a normal distribution with spherical spatial correlation • Random components • Random dopant fluctuation • Modeled by a normal distribution • Die-to-die (D2D) • Offset of within-die variation

  15. Process Variations (2/2) • Spatial correlation • Chip is divided into n×m grids • ITRS projects the variation of Leff to be half that of Vth • Leff can be obtained from Vth
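The two ideas on slides 14 and 15 can be illustrated with a short Python sketch. It assumes the standard spherical correlation function, with distances and the range φ normalized to the chip width, and it reads "Leff can be obtained from Vth" as a fixed factor of one half between the relative deviations. Both function names are hypothetical.

```python
def spherical_correlation(r, phi=0.5):
    """Correlation between two grid points a distance r apart.

    Assumption: r and phi are both normalized to the chip width, and the
    correlation falls to zero once r reaches the range phi.
    """
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3

def leff_deviation(vth_deviation):
    """Relative Leff deviation taken as half the relative Vth deviation.

    Assumption: the two deviations are perfectly tied, which is one reading
    of "Leff can be obtained from Vth".
    """
    return 0.5 * vth_deviation

print(spherical_correlation(0.0))   # 1.0
print(spherical_correlation(0.25))  # 0.3125
print(spherical_correlation(0.6))   # 0.0
```

Nearby grid points are therefore strongly correlated, and points farther apart than φ are treated as independent.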

  16. Throughput Model • TP(Nthreads): total system throughput • ML2(SL2): the miss rate of L2 cache • SL2 : the effective L2 cache size • Ls: the unloaded service latency of the L2 miss path • : the frequency of thread i • : stall time of link component • tdram: stall time of DRAM

  17. Power Model • Pdyn(Nthreads): total dynamic power • : dynamic power of a computation component • pdram and : power penalty when L2 cache miss occurs • ci : self loading capacitance

  18. Monte Carlo Analysis

  19. Monte Carlo Methods • Computational algorithms • Rely on repeated random sampling to compute their results • Used when it is infeasible or impossible to compute an exact result with a deterministic algorithm • Suited to calculation by a computer • Rely on repeated computation and random or pseudo-random numbers

  20. Box Muller Transform • Generating normally distributed random numbers from uniformly distributed ones • Taking two samples from the uniform distribution on the interval (0, 1] • One is used for the radius, the other for the angle • Generating two independent normally distributed random numbers • x=Rcosθ, y=Rsinθ
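The transform on slide 20 can be written in a few lines of Python. This is a minimal sketch using the basic (trigonometric) form of Box and Muller's method, with R = sqrt(-2 ln u1) and θ = 2π·u2; the function name `box_muller` is hypothetical.

```python
import math
import random

def box_muller(rng=random):
    """Return two independent standard-normal samples."""
    # random() returns a value in [0, 1); 1 - random() maps it to (0, 1]
    # so that the logarithm is always defined.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))  # R, from the first sample
    theta = 2.0 * math.pi * u2               # angle, from the second sample
    return radius * math.cos(theta), radius * math.sin(theta)

rng = random.Random(42)
x, y = box_muller(rng)
print(x, y)
```

A sample with mean μ and standard deviation σ is obtained by scaling: μ + σ·x.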

  21. Monte Carlo Analysis • Generating 1000 variation maps of Vth and Leff • Flow • Using the Box Muller transform to generate n×m normal random numbers for the Vth and Leff variations • Adding spatial correlation • From Vth and Leff, Tg of each grid can be obtained • Partitioning the grids among the corresponding # of cores • FF of each core is the inverse of its slowest delay
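The flow above can be sketched as a simplified Python script. It is not the authors' implementation: it leaves out spatial correlation and die-to-die offsets, uses `random.gauss` in place of an explicit Box Muller step, uses a small 20×20 grid with 2×2 cores, and assumes the textbook alpha-power delay form. The constants are taken from the experimental-setting slide, but mapping them to VDD, nominal Vth, and the relative sigma of Vth is an assumption. All names are hypothetical.

```python
import math
import random

ALPHA = 1.3        # alpha-power-law exponent (slide 24)
VDD = 0.9          # volts; assumed to be the supply voltage
VTH_NOM = 0.094    # volts; assumed to be the nominal threshold voltage
SIGMA_VTH = 0.064  # assumed relative sigma of Vth (6.4%)

def gate_delay(vth, leff_rel):
    """Relative gate delay from the alpha-power law (arbitrary units)."""
    return leff_rel * VDD / (VDD - vth) ** ALPHA

def feasible_frequencies(n, m, cores_per_side, rng):
    """Return each core's FF relative to the variation-free design.

    Assumption: n and m are multiples of cores_per_side.
    """
    nominal_delay = gate_delay(VTH_NOM, 1.0)
    rows_per_core = n // cores_per_side
    cols_per_core = m // cores_per_side
    slowest = [0.0] * (cores_per_side * cores_per_side)
    for r in range(n):
        for c in range(m):
            dv = rng.gauss(0.0, SIGMA_VTH)   # relative Vth deviation
            vth = VTH_NOM * (1.0 + dv)
            leff_rel = 1.0 + 0.5 * dv        # Leff variation is half of Vth's
            core = (r // rows_per_core) * cores_per_side + (c // cols_per_core)
            slowest[core] = max(slowest[core], gate_delay(vth, leff_rel))
    # FF is the inverse of the slowest delay, normalized to the nominal design.
    return [nominal_delay / d for d in slowest]

rng = random.Random(2009)
maps = [feasible_frequencies(20, 20, 2, rng) for _ in range(1000)]
gc_clock = [min(ff) for ff in maps]  # a GC-CMP runs at its slowest core
print(sum(gc_clock) / len(gc_clock))
```

The printed value is the mean globally-clocked frequency over 1000 maps, as a fraction of the nominal frequency; per-core FF values from `maps` are what frequency or voltage binning would then act on.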

  22. Experimental Results

  23. Experimental Setting (1/2) • Benchmark • 4 core types: Small (S), Medium (Me), Large (L), Monolithic (Mo) • 16MB shared L2 cache • Nthreads ranges from 1 to 70.

  24. Experimental Setting (2/2) • Technology: 22nm • Operating point • α=1.3 • =22nm, =0.9V, =0.094V • Granularity • Fclk=10% Fclk • VDD=10%VDD • Variability • =6.4%, =0.0% • φ=0.5, n×m=100×100

  25. Experimental Results—Vth & Leff • The distributions of Vth and Fclk are generated by Monte Carlo analysis • Over 1000 variation maps • Vth→Frequency→Throughput

  26. Experimental Results—Throughput (1/2)

  27. Experimental Results—Throughput (2/2) • Saturated when Nthreads> #cores • Throughput degradation • Mo>L>Me>S • Small core has the best variability-tolerance

  28. Experimental Results—Power • Core type: Mo>L>Me>S • Small core has the best power efficiency. • CMP: VI>Nom>GC>FI>VIFI • VIFI-CMP has the best power efficiency.

  29. Tradeoff between Throughput and Power • Best tradeoff between throughput and power occurs when Nthreads=#cores • At Nthreads=70, S/VIFI has 0.06% throughput degradation and saves 36.27% dynamic power • VIFI-CMP has the best variability-tolerance • Same throughput as FI-CMP • Lower power consumption than Nom

  30. Conclusion • This work characterized the impact of process variability on throughput and power for CMPs. • Results showed that • Small core has the best variability-tolerance. • VIFI-CMP has the best variability-tolerance on throughput and power and can naturally collaborate with DVFS. • VIFI-CMP can even have lower power consumption than the nominal design.

  31. References • Monte Carlo method, Wikipedia. Available: http://en.wikipedia.org/wiki/Monte_Carlo_method. • J. S. Liu, “Monte Carlo Strategies in Scientific Computing,” Springer, 2008. • S. Herbert and D. Marculescu, “Characterizing Chip-Multiprocessor Variability-Tolerance,” in Proceedings of the Design Automation Conference, pp. 313-318, 2008. • S. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, “VARIUS: a model of process variation and resulting timing errors for microarchitects,” IEEE Trans. on Semiconductor Manufacturing, 21(1), pp. 3-13, 2008. • The International Technology Roadmap for Semiconductors (ITRS), 2007. Available: http://www.itrs.net/. • G. E. P. Box and M. E. Muller, “A note on the generation of random normal deviates,” Ann. Mathematical Statistics, 29(2), pp. 610-611, 1958.
