
Leveraging PVT Variations to Reduce Timing Emergencies in Multi-Core Processors

This presentation discusses how PVT (process, voltage, and temperature) variations can be leveraged to mitigate timing emergencies in multi-core processors. It analyzes the complementary effect of these variations in both the time and frequency domains and proposes an implementation scheme based on delay sensors. Experimental results show improved throughput and fairness.





Presentation Transcript


  1. Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors • Guihai Yan (1), Xiaoyao Liang (2), Yinhe Han (1), and Xiaowei Li (1) • (1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) • (2) NVIDIA Corporation • Jun. 23, 2010

  2. Outline • Introduction to PVT variations • Analyzing “complementary effect” • Timing domain • Frequency domain • Implementation challenges & solutions • Experimental results

  3. Introduction to variations • Variation sources • Process variation: random dopant fluctuation, sub-wavelength lithography • Voltage variation: parasitic power delivery networks, application variability, inductive noise, IR-drop • Temperature variation: imbalanced activity, hotspots • We focus on the primary manifestation: performance variation

  4. Process variation • Sub-wavelength lithography: “What you get is not what you want”; systematic • Random dopant fluctuations: Vth variation; random • [Borkar, DAC’09] [Teodorescu, ISCA’08] • Maximum frequency can differ by 20%! • P variation is time-independent: the “DC component” [Aitken, ATS’07]

  5. Temperature variation • Figure: measured Pentium M processor temperatures • Application-specific • Slow-varying: milliseconds • Typical thermal time constant: 2 ms [Donald, ISCA’06] • T variation is slow-varying: the “low-frequency component”

  6. Voltage variation • Fast-changing • Inductive noise, a.k.a. the L(di/dt) problem • IR-drop • Why is it harder to keep a constant voltage level? • Example: power budget 100 W, working voltage 1 V, current 100 A; to keep the voltage fluctuation within ±5%, R_PDN < 0.5 mOhm • Hierarchical PDN • V variation is fast-changing: the “high-frequency component”
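
The arithmetic behind the 0.5 mOhm figure on this slide can be checked with a short Python sketch. It assumes a purely resistive IR-drop (inductive noise is ignored), and the function name `max_pdn_resistance` is hypothetical, chosen only for this illustration.

```python
def max_pdn_resistance(power_w: float, voltage_v: float, tolerance: float) -> float:
    """Largest PDN resistance (ohms) that keeps the IR-drop within `tolerance`.

    Assumption: the drop is purely resistive; L(di/dt) noise is not modeled.
    """
    current_a = power_w / voltage_v          # I = P / V  -> 100 A on the slide
    allowed_drop_v = tolerance * voltage_v   # 5% of 1 V  -> 50 mV
    return allowed_drop_v / current_a        # R = dV / I


r_max = max_pdn_resistance(power_w=100.0, voltage_v=1.0, tolerance=0.05)
print(f"{r_max * 1e3:.2f} mOhm")  # prints 0.50 mOhm
```

The takeaway is that the resistance budget shrinks as the current grows at a fixed power budget and lower supply voltage, which is why a constant voltage level is hard to maintain.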

  7. Resultant impact of PVT variations • Timing (delay) variation ranges between two extremes: fast cores, mild apps., and low temp. on one side; slow cores, violent apps., and high temp. on the other

  8. Prior solutions • Strive to compensate for P, V, and T variations individually • Mitigate P variation: ReCycle [ISCA’06], Body Bias [Micro’07], ReVIVal [ISCA’08], etc. • Stabilize V variation: Pipeline damping [ISCA’03], DeCoR [HPCA’08], etc. • Balance T variation: Hotspot [ISCA’03], DVFS + Activity Migration [ISCA’03, HPCA’01, TODAES’07], etc. • Other timing-oriented solutions: Razor [JSSC’06], EVAL [Micro’08], Tribeca [Micro’09], etc.

  9. Our perspective • Focus on the essential timing issue • Process, voltage, and temperature variations all manifest as delay variation • Design goal: minimize delay variation • The variations do not necessarily aggregate; in some cases they can cancel each other out. Hence, “complementary”

  10. Some terms • Timing emergency (TE) • Emergency level (EL): the “density” of TEs • Define: EL = number of TEs per 100 million cycles • Violent vs. mild • Voltage: large fluctuation = violent, small fluctuation = mild • Temperature: “hot” = violent, “cool” = mild • Process: slow corner = violent, fast corner = mild • Figure labels: delay vs. time with a threshold marking timing emergencies; voltage traces, violent vs. mild
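
As a minimal sketch of the EL definition above, the following Python snippet counts timing emergencies in a delay trace and normalizes the count to 100 million cycles. It assumes that a timing emergency is any cycle whose delay exceeds the threshold, which is how the slide's figure appears to draw it; the function names and the sample values are hypothetical.

```python
from typing import Sequence

CYCLES_PER_EL_UNIT = 100_000_000  # EL is defined per 100 million cycles


def count_timing_emergencies(delays: Sequence[float], threshold: float) -> int:
    """Count cycles whose delay exceeds the threshold (assumed TE definition)."""
    return sum(1 for d in delays if d > threshold)


def emergency_level(num_emergencies: int, num_cycles: int) -> float:
    """Normalize a TE count to emergencies per 100 million cycles."""
    if num_cycles <= 0:
        raise ValueError("num_cycles must be positive")
    return num_emergencies * CYCLES_PER_EL_UNIT / num_cycles


trace = [0.90, 1.02, 0.95, 1.10, 0.99]               # made-up normalized delays
print(count_timing_emergencies(trace, threshold=1.0))  # prints 2
print(emergency_level(3, 2_000_000))                   # prints 150.0
```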

  11. How do PVT variations complement each other? • Observation in the time domain • Core1 (T mild; V mild or violent): delay stays below the threshold with excessive headroom; large margin, low EL • Core2 (T violent; V mild or violent): delay crosses the threshold and causes emergencies when V is violent; little margin, high EL • What if we exchange the threads on Core1 and Core2? Mild + violent

  12. Frequency domain analysis • Y(f) = FFT(D(t)) • Sample interval: 5 ns • Span of analysis: 1 ms • DC component: “P” • Low-frequency component: “T” • High-frequency component: “V”
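
The decomposition on this slide can be sketched in Python on a small synthetic delay trace. This is an illustration under stated assumptions, not the authors' tooling: a naive DFT stands in for a real FFT so the block stays dependency-free, the trace has only 256 samples instead of the 200,000 that a 1 ms span at 5 ns would produce, and the 1 MHz cutoff between the low and high bands as well as the names `dft` and `band_strengths` are hypothetical.

```python
import cmath
import math
from typing import Dict, List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def band_strengths(delays: Sequence[float], sample_interval_s: float,
                   low_cutoff_hz: float) -> Dict[str, float]:
    """Split a delay trace into DC, low-frequency, and high-frequency strength."""
    n = len(delays)
    spectrum = dft(delays)
    resolution_hz = 1.0 / (n * sample_interval_s)   # spacing between DFT bins
    low_sq = high_sq = 0.0
    for k in range(1, n // 2):                      # positive bins below Nyquist
        amplitude = 2.0 * abs(spectrum[k]) / n      # single-sided amplitude
        if k * resolution_hz < low_cutoff_hz:
            low_sq += amplitude ** 2
        else:
            high_sq += amplitude ** 2
    return {"dc": abs(spectrum[0]) / n,
            "low": math.sqrt(low_sq),
            "high": math.sqrt(high_sq)}


# Synthetic trace: constant offset ("P"), slow ripple ("T"), fast ripple ("V").
N = 256
trace = [1.0 + 0.05 * math.sin(2 * math.pi * 1 * t / N)
             + 0.02 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
result = band_strengths(trace, sample_interval_s=5e-9, low_cutoff_hz=1e6)
print({name: round(value, 3) for name, value in result.items()})
# approximately {'dc': 1.0, 'low': 0.05, 'high': 0.02}
```

With 256 samples at 5 ns the bin spacing is about 781 kHz, so only the first bin falls below the assumed 1 MHz cutoff; a full 1 ms trace would resolve the low band far more finely.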

  13. The strength of each component of PVT variations • Figure labels: P, T, PT • Migrating threads = “grafting” the V component

  14. Frequency domain analysis (cont.) • Relative frequency spectrum deviations on a 2 GHz quad-core processor • P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz • Potential: Core3 and Core4 are mild • Strategy: exchange the threads on Core1 and Core4, and on Core2 and Core3

  15. How to exploit such a “complementary effect”? • Straightforward approach: product test for the P component, voltage sensor for the V component, temperature sensor for the T component, plus aging sensor, xyz sensor • Pros: conceptually simple • Cons: slow (V and T sensors are slow); not comprehensive (e.g., what about aging?) • Our approach: delay sensor-based scheme, where the delay sensor captures the (P+T) component and the V component • Pros: fast; comprehensive (timing) • Cons: needs a little trick

  16. Implementation (cont.) • What we have known: delay variation, from delay sensors • What we need to know: the strength of the PT and V components • How to bridge the gap? • Three challenges • Infer PVT components from delay values • On-the-fly thread migration decision-making • On-the-fly variation prediction

  17. Top view of architecture • Timing Emergency Aware + Thread Migration = TEA-TM

  18. Infer PVT component from delay values • Use the mean delay to infer the PT component (< 1 MHz) • This simplification greatly facilitates a cost-efficient implementation of TEA-TM • Then, how about the “V component”?
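
One way to picture the mean-delay idea is a windowed average over delay-sensor samples, sketched below in Python. The window length, the sample values, and the function name `mean_delay_per_window` are assumptions for illustration; the slides do not specify how the averaging is implemented in hardware.

```python
from statistics import fmean
from typing import List, Sequence


def mean_delay_per_window(samples: Sequence[float], window: int) -> List[float]:
    """Average delay-sensor samples over consecutive, non-overlapping windows.

    Each window mean serves as a proxy for the slow-moving PT component,
    because fast (V-induced) fluctuations largely average out.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        fmean(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]


# Made-up normalized delays: fast ripple around 1.0, then around 1.1.
samples = [1.0, 1.2, 0.8, 1.0, 1.1, 1.3, 0.9, 1.1]
print(mean_delay_per_window(samples, window=4))  # roughly [1.0, 1.1]
```

A longer window suppresses more of the high-frequency content but reacts more slowly to temperature drift, so the window length is a tuning choice.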

  19. Urgent First Policy (UFP) • On-the-fly TEA-TM decision making • Do NOT directly rely on an accurate V component • EL = PT “+” V • Basic idea: migrate the threads running on the highest-EL core to the core with the smallest PT component. Always right, but may not be optimal! • Figure labels: emergency level, PT component, TM between Core1 and Core2 • Refer to our paper for the more sophisticated “DUFP” heuristic
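
The basic idea of UFP can be sketched as a small selection function in Python. This is a simplified reading of the slide, not the authors' implementation: it only picks the source and target cores and leaves the actual migration (or thread exchange) to the caller, and the name `urgent_first_policy`, the dictionary layout, and the sample numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def urgent_first_policy(emergency_level: Dict[str, float],
                        pt_component: Dict[str, float]) -> Optional[Tuple[str, str]]:
    """Pick (source, target): the highest-EL core and the smallest-PT core.

    Returns None when the most urgent core already has the smallest
    PT component, in which case no migration is suggested.
    """
    source = max(emergency_level, key=emergency_level.get)  # most urgent core
    target = min(pt_component, key=pt_component.get)        # mildest PT core
    if source == target:
        return None
    return source, target


el = {"core1": 900.0, "core2": 120.0, "core3": 40.0, "core4": 15.0}
pt = {"core1": 1.08, "core2": 1.03, "core3": 0.97, "core4": 0.95}
print(urgent_first_policy(el, pt))  # prints ('core1', 'core4')
```

Note that the decision uses only the measured EL and the PT estimate, so no accurate V component is needed, which matches the slide's first point.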

  20. On-the-fly variation prediction • Objective: reduce the emergency level in the future • Predicted quantities: emergency level, PT component • Linear prediction mechanism • Figure: EL prediction result
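
The slide names a linear prediction mechanism without giving details. As one possible interpretation, the Python sketch below fits a least-squares line to the most recent observations and extrapolates one interval ahead; the fitting method, the history length, and the name `predict_next` are assumptions, not the scheme from the paper.

```python
from typing import Sequence


def predict_next(history: Sequence[float]) -> float:
    """Extrapolate the next value from a least-squares line through `history`.

    Samples are assumed to be taken at equally spaced intervals 0, 1, 2, ...
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return float(history[0])                   # no trend information yet
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(history))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)           # value of the line at x = n


print(predict_next([100.0, 120.0, 140.0, 160.0]))  # prints 180.0
```

The same function could be applied to either the emergency level or the PT component of a core, since both are listed as predicted quantities.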

  21. Experiments • Methodology: trace-based evaluation • Modeled processor: quad-core, superscalar, 2 GHz • PDN: similar to the Intel Xeon 5500 quad-core microprocessor; 130 W (peak 150 W) • Workload

  22. Metrics • Relative throughput loss (definition not preserved in this transcript) • Relative fairness (definition not preserved in this transcript)

  23. Perf. overhead & EL reduction • Impact of TM interval on average EL reduction (no migration overhead accounted) • When the migration penalty is taken into account, overall throughput is bounded by a large migration penalty on one side and a large emergency rate on the other • 1 ms at 2 GHz: migration overhead is negligible • 0.3 ms at 2 GHz: migration overhead < 15% • Minimal TM interval
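
To get a feel for how the TM interval relates to migration overhead, the Python sketch below converts an interval into clock cycles and expresses a per-migration penalty as a fraction of that interval. The overhead model (penalty divided by interval length) and the 60,000-cycle penalty are hypothetical, chosen only for illustration; the slides do not state the actual penalty.

```python
def cycles_in_interval(interval_s: float, clock_hz: float) -> int:
    """Number of clock cycles in one thread-migration interval."""
    return round(interval_s * clock_hz)


def overhead_fraction(penalty_cycles: int, interval_s: float, clock_hz: float) -> float:
    """Fraction of an interval spent on migration (assumed simple model)."""
    return penalty_cycles / cycles_in_interval(interval_s, clock_hz)


CLOCK_HZ = 2e9               # 2 GHz, as on the slide
PENALTY_CYCLES = 60_000      # hypothetical per-migration penalty

print(cycles_in_interval(0.3e-3, CLOCK_HZ))                 # prints 600000
print(overhead_fraction(PENALTY_CYCLES, 0.3e-3, CLOCK_HZ))  # about 0.10
print(overhead_fraction(PENALTY_CYCLES, 1e-3, CLOCK_HZ))    # about 0.03
```

Under this model the overhead grows as the interval shrinks, which is consistent with the slide's point that there is a minimal practical TM interval.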

  24. Reduction in Relative Throughput Loss • TM Interval: 0.2ms, Accuracy: 90% • Developing more sophisticated heuristics

  25. Fairness Improvement • 80% fairness improvement

  26. Conclusion • Analyzed the complementary effect in both the time and frequency domains • Presented a delay sensor-based scheme (TEA-TM) to exploit the complementary effect: simple, cost-efficient • The experimental results show improved throughput and improved fairness

  27. Thanks!
