
ECE 636 Reconfigurable Computing Lecture 16 Power Reductions Techniques for FPGAs


garrick

Presentation Transcript


  1. ECE 636 Reconfigurable Computing Lecture 16 Power Reductions Techniques for FPGAs

  2. Overview • FPGAs generally considered power hungry compared to ASIC and processor counterparts • Mostly due to unused interconnect • Recent area of extensive research • Device techniques • Voltage scaling • Sleep mode • Software techniques • Reduced switching • Reduced capacitance

  3. Dynamic Power • Dynamic power is required to charge and discharge load capacitances when transistors switch. • One cycle involves a rising and falling output. • On rising output, charge Q = C * VDD is required • On falling output, charge is dumped to GND [Figure labels: short circuit current, charge/discharge current. Courtesy: Harris]

  4. Dynamic Power • Short circuit power is less than 10% of dynamic power

  5. FPGA Static Power Consumption • Junction leakage • Gate oxide leakage • Subthreshold leakage

  6. FPGA Static Power Consumption • Junction leakage • Small fraction of leakage • Gate oxide leakage • Increases exponentially as Vgs increases • Subthreshold leakage • When Vgs < Vt still some source-drain current • Increases exponentially as Vt decreases • Decreases exponentially as Vgs decreases [Figure: technology trend. Courtesy: Nowak]
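The exponential dependence of the source-drain leakage that flows when Vgs < Vt can be illustrated with a common first-order model, I = I0 * exp((Vgs - Vt) / (n * vT)). This is a minimal sketch: the model form is a textbook approximation rather than something stated on the slide, and I0, n, the thermal voltage, and both threshold values are assumed numbers chosen only for illustration.

```python
import math

def subthreshold_current(vgs, vt, i0=1e-7, n=1.5, thermal_v=0.026):
    """First-order leakage model for Vgs < Vt (all parameters are assumed)."""
    return i0 * math.exp((vgs - vt) / (n * thermal_v))

# Same off-state gate voltage, two hypothetical threshold voltages.
leak_high_vt = subthreshold_current(vgs=0.0, vt=0.4)
leak_low_vt = subthreshold_current(vgs=0.0, vt=0.3)

# Lowering Vt by 100 mV multiplies the leakage by exp(0.1 / (n * vT)).
increase = leak_low_vt / leak_high_vt
print(f"leakage grows by a factor of {increase:.1f}")

# Driving Vgs below zero reduces the leakage further.
leak_negative_vgs = subthreshold_current(vgs=-0.1, vt=0.3)
```

Under these assumptions a 100 mV lower threshold raises leakage by roughly an order of magnitude, which is why high-Vt transistors are attractive for cells that do not need to switch quickly.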

  7. FPGA Power Reduction Goals • Dynamic power goals • Reduce Vdd along non-critical paths • Low swing signalling • Use CAD approaches to limit long high-toggle paths • Pdynamic = 0.5 * C * Vdd^2 * f • Static power goals • Cut-off Vdd for unused transistors • Use high Vt transistors for SRAM cells • Various other voltage biasing techniques
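As a rough illustration of the dynamic power formula on this slide, the sketch below evaluates it for two supply voltages. The capacitance, frequency, and voltage values are assumptions picked for the example, not figures from the lecture.

```python
def dynamic_power(c_farads, vdd_volts, freq_hz):
    """Pdynamic = 0.5 * C * Vdd^2 * f, as given on the slide."""
    return 0.5 * c_farads * vdd_volts ** 2 * freq_hz

# Hypothetical values chosen only for illustration.
C = 10e-12   # 10 pF of switched capacitance (assumed)
F = 100e6    # 100 MHz toggle rate (assumed)

p_nominal = dynamic_power(C, 1.5, F)
p_scaled = dynamic_power(C, 1.2, F)

# The ratio depends only on the voltages: (1.2 / 1.5) ** 2.
ratio = p_scaled / p_nominal
print(f"{p_nominal * 1e3:.3f} mW at 1.5 V, {p_scaled * 1e3:.3f} mW at 1.2 V")
print(f"power ratio = {ratio:.2f}")
```

Because the voltage term is squared, a 20% supply reduction removes about 36% of the dynamic power in this example, which is the motivation for lowering Vdd on non-critical paths.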

  8. Traditional Routing Switch [Figure: routing switch with level-restoring buffer. Courtesy: Anderson]

  9. Proposed Switch Designs: Anderson • Based on 3 observations: • Routing switch inputs tolerant to weak-1 signals (level-restoring buffers). • Considerable slack in FPGA designs, so many switches can be slowed down. • Most routing switches feed other routing switches. • Can produce weak-1 logic signals.

  10. “Basic” Switch Design • Mode operation: • high-speed: MNX & MPX ON • low-power: MNX ON, MPX OFF • sleep: MNX OFF, MPX OFF [Figure label: VVD]

  11. High-Speed Mode • VVD = VDD • Output swing: rail-to-rail • Mode operation: • high-speed: MNX & MPX ON • low-power: MNX ON, MPX OFF • sleep: MNX OFF, MPX OFF

  12. Low-Power Mode • VVD = VDD - VTH • Output swing: GND-to-(VDD-VTH) • Mode operation: • high-speed: MNX & MPX ON • low-power: MNX ON, MPX OFF • sleep: MNX OFF, MPX OFF

  13. Sleep Mode • Mode operation: • high-speed: MNX & MPX ON • low-power: MNX ON, MPX OFF • sleep: MNX OFF, MPX OFF [Figure label: VVD]
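The three operating modes on slides 10 to 13 can be summarized as a small lookup, sketched below. The transistor states and the VVD values for the high-speed and low-power modes come from the slides. The function name, the example voltages, and the choice to return None for sleep mode (the slides give no VVD value for it) are assumptions.

```python
# Transistor states per mode, as listed under "MODE OPERATION" on the slides.
MODES = {
    "high-speed": {"MNX": True, "MPX": True},
    "low-power": {"MNX": True, "MPX": False},
    "sleep": {"MNX": False, "MPX": False},
}

def virtual_supply(mode, vdd, vth):
    """Return the virtual supply VVD for a mode, or None when unspecified."""
    state = MODES[mode]
    if state["MPX"]:
        return vdd          # high-speed: VVD = VDD, rail-to-rail swing
    if state["MNX"]:
        return vdd - vth    # low-power: VVD = VDD - VTH, reduced swing
    return None             # sleep: both off; the slides give no VVD value

# Hypothetical supply and threshold voltages, for illustration only.
for mode in MODES:
    print(mode, virtual_supply(mode, vdd=1.5, vth=0.4))
```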

  14. Leakage Power Results: Anderson [Bar chart: % leakage power reduction vs. high-speed mode (axis 0 to 70). Category labels as extracted: Basic; LP mode; Sleep mode; LP mode (+unused fanout); LP mode (+used fanout); Traditional switch. Bar values as extracted: 60.8, 39.7, 38.7, 36, 0.3]

  15. Region Constrained Placement • Rather than just focusing on routing, consider constraining logic • Most circuits exhibit locality • Gayasen: FPGA’2004

  16. Region Constrained Placement • Several issues to consider • Size of sleep transistor • Too large: increases leakage, area • Too small: affects logic performance • Size of region • Too large: possibly unused resources, complicates placement • Too small: Sleep transistors take up too much room

  17. Experimental Flow: RCP • Different region sizes considered for flow • Area constraints for portions of design determined by hand • May encourage designers to create granular designs

  18. Power Savings: RCP • Note significant reduction in leakage power savings as region size increases • Bottom curve primarily due to luck

  19. Performance Limitation: RCP • Performance limited by use of regions • Nearly 10% clock frequency reduction for many designs

  20. Low-swing Signalling • Techniques we have examined so far look at tinkering with supply voltage • Also possible to modify wire signalling to reduce voltage swing • Most of FPGA is made up of interconnect • Approach targets dynamic power consumption (George and Rabaey, 1997)

  21. Low-swing Signalling • Interconnect swing is at 0.8V while rest of circuit operates at 1.5V • Cascode circuitry used at sink to overcome slow speed issues • 50% energy savings at cost of 25% delay

  22. Alternate approach: Modifying FPGA CAD • FPGA architecture modifications impact all designs, even those that don’t care about power • Can placement and routing be modified to consider dynamic power? • Need to know which signals are high toggle • Attempt to minimize length of high-toggle wires • Minimize impact on performance and area • Techniques fit well into our previous work on placement and routing (Lamoreaux and Wilton)

  23. Modifying FPGA CAD Placement • Previous cost metrics for annealing considered bounding box wire length and timing costs • Include additional term which considers signal switching activity

  24. FPGA Placement for Power • Previous cost metrics for annealing considered bounding box wire length and timing costs • Include additional term which considers signal switching activity • Post-route energy reduced by 3.0%. Power decreased by 7% but delay increased by 4%
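A minimal sketch of how an activity-weighted term might enter an annealing cost is shown below. The slides do not give the exact cost function, so the form used here (half-perimeter wire length plus a weight gamma times wire length multiplied by activity, with the timing term omitted) and all names and numbers are assumptions made for illustration.

```python
def half_perimeter(pins):
    """Bounding-box half-perimeter of a net given its (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def placement_cost(nets, gamma=1.0):
    """Wire-length cost plus an activity-weighted power term (timing omitted)."""
    wiring = sum(half_perimeter(n["pins"]) for n in nets)
    power = sum(half_perimeter(n["pins"]) * n["activity"] for n in nets)
    return wiring + gamma * power

# Two hypothetical placements with the same total wire length (5 units).
placement_a = [
    {"pins": [(0, 0), (4, 0)], "activity": 0.9},  # high-toggle net is long
    {"pins": [(0, 1), (1, 1)], "activity": 0.1},
]
placement_b = [
    {"pins": [(0, 0), (1, 0)], "activity": 0.9},  # high-toggle net is short
    {"pins": [(0, 1), (4, 1)], "activity": 0.1},
]

cost_a = placement_cost(placement_a)
cost_b = placement_cost(placement_b)
print(cost_a, cost_b)
```

A wire-length-only metric cannot tell these two placements apart. The added term favors the one that keeps the high-toggle net short.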

  25. FPGA Routing Modifications for Power • Original routing cost function takes congestion b(n) and delay(n) into account • Augment with factor that takes net activity into account • Minimize length of most active nets, even in the presence of congestion.
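One way such an augmented routing cost could look is sketched below. The slide names the ingredients (congestion b(n), delay(n), and net activity) but not the formula, so the blend used here, the capacitance input, and the two weights crit and act are all hypothetical.

```python
def routing_node_cost(delay, congestion, capacitance, crit, act):
    """Hypothetical node cost; crit and act are weights in [0, 1].

    crit: timing criticality of the connection being routed.
    act:  normalized switching activity of the net.
    """
    power_or_congestion = act * capacitance + (1.0 - act) * congestion
    return crit * delay + (1.0 - crit) * power_or_congestion

# Two candidate routing nodes with made-up properties.
short_congested = {"delay": 1.0, "congestion": 3.0, "capacitance": 1.0}
long_uncongested = {"delay": 1.0, "congestion": 1.0, "capacitance": 3.0}

def pick(act):
    """Return which candidate is cheaper for a non-critical net."""
    costs = {
        "short_congested": routing_node_cost(crit=0.0, act=act, **short_congested),
        "long_uncongested": routing_node_cost(crit=0.0, act=act, **long_uncongested),
    }
    return min(costs, key=costs.get)

print(pick(act=0.1))  # low-activity net avoids congestion
print(pick(act=0.9))  # high-activity net takes the low-capacitance node
```

With these assumed weights, a high-activity net accepts a congested node to stay short, while a low-activity net is steered around the congestion.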

  26. FPGA Routing for Power Results • Potential benefits somewhat limited by placement • Note that most nets have low activity • Power is decreased by 6% but delay increased by 4%. Energy savings of about 3%
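The quoted power, delay, and energy figures can be sanity-checked with a short calculation. The sketch assumes energy per operation scales as power times critical-path delay. The slides do not say how energy was measured, so the results are only expected to land near the reported values.

```python
def energy_change(power_change, delay_change):
    """Fractional energy change, assuming energy ~ power * delay."""
    return (1.0 + power_change) * (1.0 + delay_change) - 1.0

# Placement (slide 24): power -7%, delay +4%.
placement = energy_change(-0.07, 0.04)
# Routing (slide 26): power -6%, delay +4%.
routing = energy_change(-0.06, 0.04)

print(f"placement energy change: {placement * 100:.2f}%")
print(f"routing energy change:   {routing * 100:.2f}%")
```

Under this assumption the placement change works out to roughly a 3.3% energy reduction and the routing change to roughly 2.2%, both in the neighborhood of the 3% savings reported on the slides.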

  27. FPGA Embedded Memory Blocks • Embedded memory blocks (EMBs) are important parts of FPGAs • Consume roughly 14% of Altera Stratix II dynamic power (source: Stratix II Low Power Applications Note, 2005) • Increasing in recent designs

  28. Embedded Memory Block Port Internal View • Reducing clocking saves dynamic power [Figure labels: Clk, Clk Enable, MClk, Bit Line Pre-charge, BIT lines, RAM cell, Row Decode, Column Mux, Write Buffers, Sense Amps, Latch, Address, Write Enable, Read Enable, Read Data, Write Data]

  29. Power Optimization #1 • Convert EMB read enable/write enable signals to associated read/write clock enable signals • Limitations • Each port has read or write enable control signal • Embedded memory block has read enable input [Figure: before and after schematics. Labels: Data, Q, Wr clk enable, Rd clk enable, Wren, Rden, Write enable, Read enable, Read Address, Write Address, Clock, Vcc]

  30. Implementation • Conversion mode • Ties off R/W enable to RAM clock enables • Doesn’t make transform if CE already present on port • Combining mode • AND user RAM clock enables with derived R/W clock • Could impact performance [Figure labels: Write Enable, User-defined Write Clk Enable, Combined Write Clk Enable]
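The two modes can be modeled at the logic level as in the sketch below. The function name and the tuple return value are hypothetical, and tying the original enable pin high after the transform is an assumption based on the Vcc labels in the slide 29 figure.

```python
def transform_port(rw_enable, user_ce=None, mode="conversion"):
    """Return (clock_enable, enable_pin) driving one EMB port.

    rw_enable: the design's read or write enable signal.
    user_ce:   a user-defined clock enable, or None if the port has none.
    """
    if user_ce is None:
        # No CE on the port: move the R/W enable onto the clock enable
        # and tie the enable pin high (assumed, per the Vcc labels).
        return rw_enable, True
    if mode == "combining":
        # AND the user clock enable with the R/W enable.
        return (user_ce and rw_enable), True
    # Conversion mode leaves ports that already have a CE untouched.
    return user_ce, rw_enable

# The port is active only when both returned signals are high, so the
# transform must not change when an access happens.
for rw in (False, True):
    for ce in (None, False, True):
        for mode in ("conversion", "combining"):
            clk_en, pin = transform_port(rw, ce, mode)
            original = rw and (True if ce is None else ce)
            assert (clk_en and pin) == original
```

The point of the transform is that the memory clock is gated whenever no access is needed, while the loop above checks that the set of cycles in which an access occurs stays the same.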

  31. FPGA RAM Processing • FIFOs and shift registers converted into logical RAMs • Logical RAMs mapped to RAM blocks [Flow diagram: FIFO, shift register, RAM specification -> Create Logical Memory -> logical RAMs/logic -> Logical-to-physical RAM processing -> RAM blocks/logic -> Memory/logic placement -> placed memory]

  32. Mapping RAM to EMBs • Implementation choice can impact design area, performance, and power. • Some mappings may require multiple EMBs [Figure: user-defined (logical) memory of 16K bits, 4k deep x 4 wide; physical (EMB) memory options shown: four M4K blocks of 4K bits each, and a 512K MRAM]

  33. Memory Organization • Each EMB can be configured to have different depth and width (e.g. Stratix II M4K) • All hold 4K bits • Slightly lower power consumption for wider EMB configurations (not including routing) [Figure: example configurations: 4K words deep x 1 bit wide, 512 words deep x 8 bits wide, 128 words deep x 32 bits wide]

  34. Area and Delay Optimal Mapping (Vertical Slicing) • Configure each EMB to be as deep as possible • Number of address bits on each EMB same as on logical memory • Area and performance efficient: no external logic needed • Power inefficient: All EMBs must be active during each logical RAM access [Figure: logical memory 4k words deep and 4 bits wide built from EMBs configured 4k words deep and 1 bit wide (4 times); Addr[0:11], Data[0:3]; 4 EMBs active during access]

  35. Alternative Mapping (Horizontal Slicing) • Configure EMB to have width of logical RAM (e.g. 1Kx4) • Allows shutdown of some RAMs each cycle • But adds some logic • Saves RAM power, adds combinational logic and register power [Figure: more power-efficient mapping: logical memory 4k words deep and 4 bits wide built from EMBs configured 1K deep x 4 wide (4 times); Addr[0:9] to the EMBs, Addr[10:11] to an address decoder; Data[0:3]; 1 EMB active during access]
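The difference between the vertical and horizontal mappings can be counted directly, as in the sketch below for the 4K x 4 logical memory and 4K-bit EMBs used on the slides. The function and field names are hypothetical, and the model assumes every EMB in the selected row is active during an access.

```python
EMB_BITS = 4096  # capacity of one M4K block, per the slides

def slice_mapping(logical_depth, logical_width, emb_depth):
    """Count EMBs used and EMBs active per access for one EMB configuration."""
    emb_width = EMB_BITS // emb_depth
    columns = -(-logical_width // emb_width)   # ceiling division
    rows = -(-logical_depth // emb_depth)
    return {
        "emb_config": (emb_depth, emb_width),
        "total_embs": rows * columns,
        "active_per_access": columns,          # only one row is selected
        "decoder_addr_bits": (rows - 1).bit_length(),
    }

vertical = slice_mapping(4096, 4, emb_depth=4096)    # EMBs configured 4K x 1
horizontal = slice_mapping(4096, 4, emb_depth=1024)  # EMBs configured 1K x 4

print(vertical)
print(horizontal)
```

Both mappings use four EMBs, but the vertical one keeps all four active on every access while the horizontal one activates a single EMB and spends two address bits (Addr[10:11] in the figure) on decode logic.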

  36. RAM Slicing - Example • Power reduction available with different slicing [Chart: 4kx32 dynamic power; y-axis: dynamic power (mW), 0 to 140; x-axis: maximum depth (128, 256, 512, 1k, 2k, 4k); annotations: "Multiplexer Power Increasing", "EMB Power Increasing", "Best range"]

  37. Power Optimization #2: Power-aware RAM Partitioning • Algorithm considers possible logical to physical RAM mappings [Flow diagram labels: FIFO, Shift Register; Create Logical Memory; Power-aware Physical RAM processing; Power Library; Insert Decode and Mux Logic; Memory/Logic Placement; Completed placement]
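A minimal sketch of the selection step is given below: enumerate the candidate EMB depths, estimate power for each mapping, and keep the cheapest. The power model and its two coefficients stand in for the power library and are invented for illustration, so the numbers are not the ones behind the slide 36 chart.

```python
EMB_BITS = 4096
EMB_DEPTHS = [128, 256, 512, 1024, 2048, 4096]  # depths on the slide 36 axis

def estimate_power(logical_depth, logical_width, emb_depth,
                   emb_mw=5.0, mux_mw_per_bit=0.2):
    """Hypothetical power estimate (mW) for one logical-to-physical mapping."""
    emb_width = EMB_BITS // emb_depth
    columns = -(-logical_width // emb_width)   # EMBs active per access
    rows = -(-logical_depth // emb_depth)      # EMB rows to multiplex between
    emb_power = columns * emb_mw
    mux_power = (rows - 1) * logical_width * mux_mw_per_bit
    return emb_power + mux_power

def best_depth(logical_depth, logical_width):
    """Pick the EMB depth with the lowest estimated power."""
    return min(EMB_DEPTHS,
               key=lambda d: estimate_power(logical_depth, logical_width, d))

for depth in EMB_DEPTHS:
    print(depth, round(estimate_power(4096, 32, depth), 1))
print("chosen depth:", best_depth(4096, 32))
```

With these made-up coefficients the estimate is high at both ends of the depth range, from multiplexing at shallow depths and from the number of active EMBs at large depths, so the minimum falls in between.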

  38. Experimental Approach • 40 designs evaluated • Quartus 5.1 • Mapped to smallest possible device and target max frequency • Simulation with test vectors • Power analysis with PowerPlay

  39. Memory Power • 21.0% average reduction for all techniques (9.7% with convert/combine)

  40. Overall Core Dynamic Power • 6.8% average power reduction for all techniques (2.6% with convert/combine) [Chart: % dynamic power reduction (axis -5 to 35) per design; series: enable convert/combine, and enable convert/combine + mem partition]

  41. Design Performance • 1.0% average performance loss for all techniques (0.1% for enable convert/combine) [Chart: average design clock frequency; % frequency improvement (axis -30 to 10) per design; series: enable convert/combine, and enable convert/combine + mem partition]

  42. Results Summary • Almost 7% core dynamic power reduction across all designs • Some designs benefit more than others • Minimal clock frequency hit for most designs

  43. Impact of Multiple Embedded Memory Blocks • Rerun 40 designs but only allow one type of target EMB for each mapping • All designs targeted to Stratix II EP2S180 • Significant power impact for most designs versus EP2S180 target with no restrictions

  44. Summary • Key to reducing RAM power is keeping clocks disabled. • Movement of read/write enables to clock enables limits dynamic activity • Power-aware RAM partitioner attempts to select power-optimal mapping – combined with clock enable enhancement • Overall • About 21% average memory power reduction • 10% enable convert/combine • About 7% average dynamic power reduction • 3% enable convert/combine • Diversity of EMBs reduces power by 33%

  45. Summary • FPGA power consumption under consideration at numerous levels: architecture, circuit, CAD, and physical • FPGA companies just now embracing power-aware CAD, power-aware architectures on the way • Many circuit-level techniques still possible • RTL CAD synthesis techniques provide a promising area for exploration
