
Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs

This paper proposes a dynamic voltage scaling (DVS) technique for reducing FPGA power consumption while maintaining performance. By exploiting FPGA programmability, the proposed method performs design and chip-specific calibration to find the minimum VDD that guarantees operation at the required speed. The technique is evaluated through testing and results show significant reductions in power consumption without compromising performance.



Presentation Transcript


  1. Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs Ibrahim Ahmed, Shuze Zhao, Olivier Trescases and Vaughn Betz Email: ibrahim@ece.utoronto.ca

  2. FPGA Power Consumption Challenge

  3. FPGA Power Consumption Challenge VDD not scaling

  4. FPGA Power Consumption Challenge • Obstacle against entering emerging low power/mobile market (IoT) • Must show superior perf/W to compete in Data centers • Need innovation to bring power down “The future of continued scaling is dependent on adaptive power management and voltage scaling”, IEEE Fellow Kevin Zhang, VP of Intel's Technology and Manufacturing Group

  5. Worst-case Modelling is Wasteful • Devices have different delay -> Variation !!

  6. Worst-case Modelling is Wasteful • Delay is temperature dependent High Temperature

  7. Worst-case Modelling is Wasteful • Delay is affected by VDD Lower VDD

  8. Worst-case Modelling is Wasteful • Aging also affects delay End-of-life

  9. Worst-case Modelling is Wasteful • Aging also affects delay Static timing analysis (STA) accommodates the tail End-of-life

  10. Worst-case Modelling is Wasteful • Aging also affects delay • Timing models add margins for: • Slow device • Worst temperature • Worst voltage droop • End-of-life effects • Guard-bands for noise, etc. End-of-life

  11. How significant are the added margins ?

  12. How significant are the added margins ? > 20 % reduction in VDD without reducing Fmax

  13. How significant are the added margins ? Dynamic Voltage Scaling (DVS) > 20 % reduction in VDD without reducing Fmax

  14. Dynamic Voltage Scaling • Find minimum VDD that guarantees operation at required speed • Lowering VDD reduces both dynamic and static power • DVS has been commercially adopted by CPUs, but not FPGAs • FPGA’s programmability → unknown critical path at fabrication time • This work: exploit programmability to perform design & chip-specific calibration P_dynamic ∝ VDD² • Static power drops even faster
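
As a rough illustration of the quadratic relationship on this slide, the sketch below computes the relative dynamic power after a 20% VDD reduction (the figure quoted on slide 13). It assumes clock frequency, switching activity and capacitance stay unchanged; the function names are hypothetical and static power is not modelled.

```python
def dynamic_power_ratio(vdd_scale: float) -> float:
    """Relative dynamic power when VDD is multiplied by vdd_scale.

    Assumes P_dynamic is proportional to VDD^2 with frequency, switching
    activity and capacitance held constant.
    """
    if vdd_scale <= 0:
        raise ValueError("vdd_scale must be positive")
    return vdd_scale ** 2


def dynamic_power_saving(vdd_scale: float) -> float:
    """Fraction of dynamic power saved relative to nominal VDD."""
    return 1.0 - dynamic_power_ratio(vdd_scale)


# A 20% VDD reduction corresponds to vdd_scale = 0.8.
print(f"dynamic power saving: {dynamic_power_saving(0.8):.0%}")  # prints "dynamic power saving: 36%"
```

Under these assumptions, the 20% voltage reduction translates to roughly a 36% cut in dynamic power; the slide notes that static power falls faster still.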

  15. Outline • DVS proposal • Testing Procedure • FRoC • Results • Summary & Future work

  16. Outline • DVS proposal • Testing Procedure • FRoC • Results • Summary & Future work

  17. Conventional Design Cycle One Measurement by STA Application HDL Passes timing  FPGA Application bit-stream Program & run application with nominal VDD

  18. DVS Proposal Overview 1st measurement by conventional STA (once per application) CAD System Application HDL FPGA FPGA Calibration bit-stream Application bit-stream Replicated critical path Critical path Heaters

  19. DVS Proposal Overview CAD System Application HDL FPGA FPGA VDD Power stage Calibration bit-stream Application bit-stream Critical path Program & generate calibration table (CT) 2nd measurement by on-chip calibration (repeated for each FPGA)

  20. DVS Proposal Overview CAD System Application HDL FPGA FPGA VDD Calibration bit-stream Application bit-stream Power stage Program & generate calibration table (CT) CT Program & run application with DVS

  21. DVS Proposal Overview CAD System Today’s talk Application HDL FPGA FPGA Calibration bit-stream Application bit-stream Program & generate calibration table (CT) CT Program & run application with DVS

  22. Generating the Calibration Bit-stream • Performed on each FPGA at least once • For aging effects, calibration with every power up • Capture all speed-limiting paths • Invisible to FPGA users Fast Robust Automated Calibration FRoC CAD tool
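
One way to picture the "generate calibration table (CT)" step is a downward voltage sweep at each temperature, sketched below. This is a minimal sketch, not the authors' procedure: the `passes_timing` callback stands in for running the calibration bit-stream and reading its error flag, and the nominal voltage, floor, step size, guard-band and the toy pass/fail model are all illustrative assumptions. Voltages are integers in millivolts to avoid floating-point comparisons.

```python
from typing import Callable, Dict, Iterable


def build_calibration_table(
    passes_timing: Callable[[int, int], bool],  # hypothetical: (vdd_mv, temp_c) -> no error flagged
    temperatures_c: Iterable[int],
    vdd_nominal_mv: int = 1200,  # assumed nominal core voltage
    vdd_floor_mv: int = 800,     # assumed lowest voltage the sweep will try
    step_mv: int = 10,           # assumed sweep resolution
    guard_band_mv: int = 20,     # assumed safety margin added to the result
) -> Dict[int, int]:
    """Map each temperature to the lowest passing VDD plus a guard-band."""
    table: Dict[int, int] = {}
    for temp in temperatures_c:
        lowest_pass = None
        # Sweep downward from nominal and stop at the first failure.
        for vdd in range(vdd_nominal_mv, vdd_floor_mv - 1, -step_mv):
            if not passes_timing(vdd, temp):
                break
            lowest_pass = vdd
        if lowest_pass is None:
            raise RuntimeError(f"design fails timing at nominal VDD, {temp} C")
        table[temp] = min(vdd_nominal_mv, lowest_pass + guard_band_mv)
    return table


def toy_passes_timing(vdd_mv: int, temp_c: int) -> bool:
    """Made-up chip model: the minimum working VDD rises by 1 mV per degree C."""
    return vdd_mv >= 900 + (temp_c - 25)


print(build_calibration_table(toy_passes_timing, [25, 85]))  # {25: 920, 85: 980}
```

At run time, a controller could look up the table entry for the current temperature and set the power stage accordingly.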

  23. Outline • Motivation • DVS proposal • Testing Procedure • FRoC • Results • Summary & Future work

  24. How to measure Fmax • Stimulate with random inputs and check output? • Does not guarantee exercising the critical path (CP) • To robustly measure the delay of a path: • Off-path inputs must have a steady non-controlling value Tested path LUT Steady 1/0

  25. How to measure Fmax • Stimulate with random inputs and check output? • Does not guarantee exercising the critical path (CP) • To robustly measure the delay of a path: • Off-path inputs must have a steady non-controlling value • Control over the edge transition from input → output Tested path LUT / Edge 1/0
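
The two requirements above can be checked mechanically for any LUT. The sketch below is a hypothetical helper (not part of any tool named in the talk) that enumerates the steady off-path values for which a transition on the tested input reaches the LUT output, and reports whether the edge passes through non-inverted. It assumes bit `i` of the LUT mask holds the output for input pattern `i`, with input 0 as the least significant bit.

```python
from itertools import product
from typing import List, Tuple


def lut_output(mask: int, inputs: Tuple[int, ...]) -> int:
    """Evaluate a K-LUT; inputs[0] is the least significant index bit."""
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return (mask >> index) & 1


def sensitizing_assignments(
    mask: int, k: int, on_path: int
) -> List[Tuple[Tuple[int, ...], bool]]:
    """Return (off_path_values, non_inverting) pairs that propagate the on-path edge."""
    if not 0 <= on_path < k:
        raise ValueError("on_path must be a valid input index")
    results = []
    for off in product((0, 1), repeat=k - 1):
        def pattern(on_value: int) -> Tuple[int, ...]:
            bits = list(off)
            bits.insert(on_path, on_value)  # place the tested input at its pin position
            return tuple(bits)

        out0 = lut_output(mask, pattern(0))
        out1 = lut_output(mask, pattern(1))
        if out0 != out1:  # output follows the tested input, so the path is sensitized
            results.append((off, out1 == 1))
    return results


print(sensitizing_assignments(0b1000, 2, 0))  # 2-input AND: [((1,), True)]
print(sensitizing_assignments(0b0110, 2, 0))  # 2-input XOR: [((0,), True), ((1,), False)]
```

For the AND mask, only the non-controlling value 1 on the other input lets the edge through. For the XOR mask, either value works and the other input selects whether the edge is inverted, which is the kind of edge control the slide asks for.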

  26. Measuring the Delay of a Single Path Application FF FF FF FF FF FF LUT LUT LUT Critical path (CP) Replicate LUT LUT LUT FF FF FF

  27. Measuring the Delay of a Single Path Application FF FF FF FF FF FF LUT LUT LUT Critical path (CP) Replicate LUT LUT LUT FF FF FF

  28. Measuring the Delay of a Single Path Application FF FF FF FF FF FF Change LUT mask LUT LUT XOR Critical path (CP) LUT LUT XOR FF FF FF

  29. Measuring the Delay of a Single Path Application FF FF FF FF FF FF Edge1 Control edge transition LUT LUT XOR Critical path (CP) Edge2 LUT LUT XOR FF FF FF

  30. Measuring the Delay of a Single Path Input stimulus Application FF FF FF FF FF FF Edge1 Error detection FF Detect timing faults LUT LUT XOR Critical path (CP) XNOR Edge2 LUT LUT XOR FF FF Error FF FF
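
The build-up across these slides (replicate the path, turn each LUT into an XOR with an edge-control input, toggle the launch flip-flop, compare the captured value against the expected one) can be mimicked with a small behavioural model. The sketch below is a simplification under stated assumptions, not the circuit from the talk: each LUT is modelled as `input XOR edge`, setup time and clock skew are ignored, and a path that is too slow is assumed to deliver the previous cycle's value to the capture flip-flop.

```python
from typing import Sequence


def propagate(value: int, edge_controls: Sequence[int]) -> int:
    """Pass a bit through a chain of LUTs, each acting as XOR(input, edge)."""
    for edge in edge_controls:
        value ^= edge
    return value


def run_path_test(
    path_delay_ns: float,
    clock_period_ns: float,
    edge_controls: Sequence[int],
    n_cycles: int = 8,
) -> bool:
    """Return True if the error flag would be set during the test."""
    launch = 0
    previous_output = propagate(launch, edge_controls)
    error = False
    for _ in range(n_cycles):
        launch ^= 1  # input stimulus: toggle the launch flip-flop every cycle
        expected = propagate(launch, edge_controls)
        # A late path leaves the stale value at the capture flip-flop.
        captured = expected if path_delay_ns <= clock_period_ns else previous_output
        if captured != expected:  # role of the XNOR comparison feeding the Error FF
            error = True
        previous_output = expected
    return error


print(run_path_test(4.0, 5.0, [0, 1, 0]))  # False: path meets the clock period
print(run_path_test(6.0, 5.0, [0, 1, 0]))  # True: timing fault detected
```

Because every XOR stage passes a toggle regardless of its edge setting, the launched transition always reaches the capture flip-flop, so a slow path is flagged on the first cycle in this model.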

  31. A Single Path Delay is Not Robust • Many paths have delay close to the CP • Within-die variation may cause some other paths to be more critical • Varying VDD affects the delay of different FPGA elements differently Robust: measure delay of many near critical paths Fast: use 1 calibration bit-stream

  32. Testing Disjoint Paths • Testing many disjoint paths is mostly easy • Repeat the same procedure for single path testing Application FF FF FF FF

  33. Testing Disjoint Paths • Testing many disjoint paths is mostly easy • Repeat the same procedure for single path testing Application Calibration FF FF FF FF Error FF FF FF FF Error

  34. ..but What to Do with Overlapping Paths? • Paths sharing a LUT through different inputs Path1 LUT A FF S1 LUT C FF LUT B FF S2 Path2

  35. ..but What to Do with Overlapping Paths? • Paths sharing a LUT through different inputs • To test Path1, fix off-path input at C Path1 LUT A FF S1 LUT C FF LUT B FF S2 Path2

  36. ..but What to Do with Overlapping Paths? • Paths sharing a LUT through different inputs • To test Path1, fix off-path input at C • Path1 & Path2 can’t be tested together Path1 LUT A FF S1 LUT C FF LUT B FF S2 Path2

  37. ..but What to Do with Overlapping Paths? • Paths sharing a LUT through different inputs • To test Path1, fix off-path input at C • Path1 & Path2 can’t be tested together • Need 2 separate test phases Path1 LUT A FF S1 LUT C FF LUT B FF S2 Path2

  38. ..but What to Do with Overlapping Paths? • Paths sharing a LUT through different inputs • To test Path1, fix off-path input at C • Path1 & Path2 can’t be tested together • Need 2 separate test phases FixA Path1 LUT A FF S1 LUT C FF LUT B FF S2 Path2 -Add Fix control signals to keep LUT output constant -Test controller cycles through test phases sequentially FixB
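
Assigning overlapping paths to separate test phases resembles a graph-colouring problem. The sketch below uses a simple greedy first-fit pass, which is an assumption for illustration and not necessarily the grouping algorithm used in FRoC. A path is described as a list of `(lut_name, input_pin)` hops, each path is assumed to visit a LUT at most once, and two paths conflict only when they enter the same LUT through different inputs, as on the slide. The pin numbers and the extra `Path3` are made up.

```python
from typing import Dict, List, Tuple

Path = List[Tuple[str, int]]  # sequence of (LUT name, input pin used)


def conflicts(p: Path, q: Path) -> bool:
    """True if the two paths enter a shared LUT through different inputs."""
    pins_p = dict(p)
    return any(lut in pins_p and pins_p[lut] != pin for lut, pin in q)


def group_into_phases(paths: Dict[str, Path]) -> List[List[str]]:
    """Greedy first-fit: put each path in the first phase where it has no conflict."""
    phases: List[List[str]] = []
    for name, hops in paths.items():
        for phase in phases:
            if not any(conflicts(hops, paths[other]) for other in phase):
                phase.append(name)
                break
        else:
            phases.append([name])  # no compatible phase found, open a new one
    return phases


example = {
    "Path1": [("A", 0), ("C", 0)],  # reaches LUT C through one input
    "Path2": [("B", 0), ("C", 1)],  # reaches LUT C through a different input
    "Path3": [("D", 0)],            # disjoint from the others
}
print(group_into_phases(example))  # [['Path1', 'Path3'], ['Path2']]
```

A test controller would then step through the phases in order, asserting the Fix signals of the paths that are not under test in the current phase.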

  39. LUT Masks for Testing • only added when required • Developed more LUT masks to test Cyclone IV carry-chains with the same controllability K-LUT Fix off-path inputs Break re-convergent fan-outs Control edge transition

  40. Can’t Test Everything with 1 Bit-stream P1 P2 • One or two LUT inputs used as control signals LUT P3 P4

  41. Can’t Test Everything with 1 Bit-stream P1 P2 • One or two LUT inputs used as control signals LUT Edge Fix

  42. Can’t Test Everything with 1 Bit-stream P1 P2 • One or two LUT inputs used as control signals • Fixing LUT output does not break all re-convergent fan-outs LUT Edge Fix LUT B Path2 LUT A LUT C Path1

  43. Can’t Test Everything with 1 Bit-stream P1 P2 • One or two LUT inputs used as control signals • Fixing LUT output does not break all re-convergent fan-outs • LAB inputs constraint • Carry-chains constraints LUT Edge Fix LUT B Path2 LUT A LUT C Path1

  44. Outline • Motivation • DVS proposal • Testing Procedure • FRoC • Results • Summary & Future work

  45. CAD System with FRoC Proposed CAD system Calibration HDL Calibration bit-stream Quartus STA FRoC Quartus P&R Quartus Application HDL Location & Routing Constraints Application bit-stream 1) Paths selection 2) Paths replication 3) Grouping replicated paths 4) Test controller generation

  46. 1) Path selection Application circuit FF FF FF FF LUT LUT LUT FF

  47. 1) Path selection • Extract near critical paths from STA • {P1, P2, P3, P4, P5} Application circuit P5 P4 P1 P3 P2 FF FF FF FF 4-LUT 4-LUT 4-LUT FF

  48. 1) Path selection • Extract near critical paths from STA • {P1, P2, P3, P4, P5} • Select which paths to test • Can’t test {P2,P3,P4} in 1 bit-stream Application circuit P5 P4 P1 P3 P2 FF FF FF FF 4-LUT 4-LUT Two inputs reserved for control signals (Fix , Edge) 4-LUT FF

  49. 1) Path selection • Extract near critical paths from STA • {P1, P2, P3, P4, P5} • Select which paths to test • Can’t test {P2,P3,P4} in 1 bit-stream • Select the more critical paths • {P1, P2, P3, P5} Application circuit P5 P1 P3 P2 FF FF FF FF 4-LUT 4-LUT 4-LUT FF
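
The selection rule illustrated here can be sketched as a greedy pass over paths sorted by slack. This is a hedged illustration only: the path topology, pin numbers and slack values below are invented to reproduce the slide's outcome, and the function is hypothetical rather than FRoC's actual selection step. The one constraint modelled is that a K-LUT with two inputs reserved for Fix and Edge can carry tested paths on at most K-2 distinct inputs.

```python
from typing import Dict, List, Set, Tuple

Path = List[Tuple[str, int]]  # sequence of (LUT name, input pin used)


def select_paths(
    paths: Dict[str, Path],
    slack_ns: Dict[str, float],
    k: int = 4,
    reserved: int = 2,
) -> List[str]:
    """Keep the most critical paths that fit within K - reserved inputs per LUT."""
    limit = k - reserved
    used_pins: Dict[str, Set[int]] = {}
    selected: List[str] = []
    # Smallest slack first, so more critical paths claim LUT inputs first.
    for name in sorted(paths, key=lambda n: slack_ns[n]):
        hops = paths[name]
        fits = all(len(used_pins.get(lut, set()) | {pin}) <= limit for lut, pin in hops)
        if fits:
            for lut, pin in hops:
                used_pins.setdefault(lut, set()).add(pin)
            selected.append(name)
    return selected


paths = {
    "P1": [("L1", 0)],
    "P2": [("L2", 0)],
    "P3": [("L2", 1)],
    "P4": [("L2", 2)],  # third distinct input on the same 4-LUT
    "P5": [("L3", 0)],
}
slack_ns = {"P1": 0.1, "P2": 0.2, "P3": 0.3, "P4": 0.5, "P5": 0.4}
print(select_paths(paths, slack_ns))  # ['P1', 'P2', 'P3', 'P5']
```

With these made-up slacks, P4 is the least critical of the three paths through the shared 4-LUT, so it is the one dropped, matching the set {P1, P2, P3, P5} on the slide.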

  50. 2) Path replication Application circuit P5 P1 P3 P2 FF FF FF FF 4-LUT 4-LUT Replication + Control Signals 4-LUT FF
