Literature Review: Measuring the Gap Between FPGAs and ASICs. Ian Kuon, Jonathan Rose, University of Toronto. IEEE TCAD/ICAS, February 2007. Henry Chen, February 26, 2010.

Literature Review

Measuring the Gap Between FPGAs and ASICs

Ian Kuon, Jonathan Rose

University of Toronto

IEEE TCAD/ICAS

February 2007

Henry Chen

February 26, 2010


Introduction

  • Trade-offs between FPGAs and standard-cell ASICs

    • Decreased NRE, design time

    • Increased silicon area, power; decreased performance

  • FPGA inefficiencies known and accepted, but largely unquantified


Previous Comparisons

  • Jones et al. (1986): MPGAs to standard cells

    • 1.5‒2.6x area, ~1.1x delay

    • Estimates based on only 5 circuits

  • Brown et al. (1992): FPGAs to MPGAs

    • 8‒12x area, ~3x delay

    • Optimistic FPGA gate counting?

    • Anecdotal evidence

    • Doesn’t consider “hard” macros (multipliers, memories)

  • Combine for FPGAs to standard cells

    • 12‒38x area, ~3.4x delay

    • Dated; based on (questionable?) extractions


Previous Comparisons (2000s)

  • Zuchowski et al. (2002): LUT to ASIC gate (0.25μm‒90nm)

    • ~1/45 gate density, 12‒14x delay, ~500x dynamic power

    • Unexplained process-dependent density/power variation

    • Dependent on gates implemented per LUT

  • Wilton et al. (2005): Partial programmable replacement

    • 88x area, 2x delay

    • Single logic module

  • Compton & Hauck (2007): FPGA apps. to standard-cell

    • Avg 7.2x area

    • Scaled FPGA 0.15μm to 0.18μm standard-cell


Methodology

  • Implement in both FPGA and standard-cell

    • Altera Stratix II FPGA: TSMC 90nm multi-Vt, 1.2V

    • Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V

  • Empirical results from 23 benchmarks

    • Rejected if different synthesis tools resulted in >5% register count deviation

    • Mix of logic, memory, DSP

  • Analyze gains from FPGA’s DSP and memory blocks

  • Exclude I/Os

  • Have device data from Altera
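
The register-count screen above can be sketched as follows. This is only an illustration: the function names are hypothetical, and measuring the deviation relative to the larger of the two counts is an assumption, since the slides do not state the baseline.

```python
def register_count_deviation(fpga_regs: int, asic_regs: int) -> float:
    """Relative difference between the register counts of the two flows."""
    largest = max(fpga_regs, asic_regs)
    if largest == 0:
        # No registers in either netlist: nothing to disagree about.
        return 0.0
    # Assumption: deviation is taken relative to the larger count.
    return abs(fpga_regs - asic_regs) / largest


def accept_benchmark(fpga_regs: int, asic_regs: int, limit: float = 0.05) -> bool:
    """Keep a benchmark only if the two flows agree within the limit (5% here)."""
    return register_count_deviation(fpga_regs, asic_regs) <= limit
```

A benchmark whose two netlists differ by more than the limit would be dropped, on the reasoning that the tools did not implement comparable circuits.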


Implementations

  • FPGA

    • Altera-provided CAD flow

    • Balanced speed/area optimization: optimize critical paths for performance, everything else for area

    • Automatic DSP, memory block inference

    • Set to mimic effects of high resource utilization

  • ASIC

    • Synopsys/Cadence synthesis/PAR flow

    • Free to choose from high/standard-Vt cells

    • Timing-driven placement; target 75‒85% utilization

    • Emphasized performance in compiled memories


Area Comparison

  • ASIC

    • Post-PAR core area

    • Include memory macros

  • FPGA

    • Count only silicon area for used resources

    • Include surrounding routing resources

    • Count full block area even if only partially used

    • Area data from Altera
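
The FPGA accounting rule above (charge only used resources, but charge a whole block even when it is partially used) can be sketched as below. The resource names, the area units, and the `fpga_area` helper are hypothetical; the actual per-block areas came from Altera and are not given in the slides.

```python
import math
from typing import Mapping


def fpga_area(used_blocks: Mapping[str, float],
              block_area: Mapping[str, float]) -> float:
    """Total silicon area charged to a design on the FPGA.

    used_blocks: number of blocks of each kind the design occupies
                 (fractional when a block is only partially used).
    block_area:  area of one block of each kind, assumed to already
                 include its surrounding routing.
    """
    total = 0.0
    for kind, amount in used_blocks.items():
        # A partially used block is charged its full area.
        total += math.ceil(amount) * block_area[kind]
    return total


def area_gap(fpga_total: float, asic_core: float) -> float:
    """FPGA-to-ASIC area ratio for one benchmark."""
    return fpga_total / asic_core
```

For example, a design occupying 2.3 logic blocks is charged for 3, which is the source of the pessimism noted in the caveats.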


Area Comparison Results

  • Logic only: 35x avg (17‒54x)

  • Logic + DSP: 25x avg (12‒58x)

  • Logic + Memory: 33x avg (19‒70x)

  • Logic + Memory + DSP: 18x avg (9.5‒26x)


Impact of Hard Macros on Area

  • Smaller area penalty for designs using hard macros

    • Hard macro close to ASIC implementation (plus programmable interface & routing)


Area Comparison Caveats

  • Pessimistic FPGA area estimation; count full resource area even if only partially used (~5‒10% reduction)

  • ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs


Delay Comparison

  • Altera Quartus II / Synopsys PrimeTime SI

  • Static timing analysis to extract max. clock frequency

  • Compare for different FPGA speed grades

    • FPGAs are binned for performance

    • ASICs tend to be designed for worst-case


Delay Comparison Results (Fastest Speed Grade)

  • Logic only: 3.4x avg (1.9‒5.0x)

  • Logic + DSP: 3.5x avg (2.4‒4.7x)

  • Logic + Memory: 3.5x avg (2.8‒4.3x)

  • Logic + Memory + DSP: 3.0x avg (2.6‒3.5x)


Delay Comparison Results (Slowest Speed Grade)

  • Logic only: 4.6x avg (2.5‒6.7x)

  • Logic + DSP: 4.6x avg (3.0‒6.3x)

  • Logic + Memory: 4.8x avg (3.8‒5.7x)

  • Logic + Memory + DSP: 4.1x avg (3.8‒4.7x)


Impact of Hard Macros on Delay

  • Almost no benefit—sometimes penalty!

    • Fixed positions in FPGA; extra routing to use

    • Fixed architecture; some apps. may not use efficiently


Power Comparison

  • Altera Quartus II Power Analyzer / Synopsys PrimePower

  • Compare power, not energy consumption

    • FPGAs slower; need more time or parallelism

    • Implement for highest speed possible

    • Simulate at same operating frequency, voltage

  • Measure only core power

  • Assume constant toggle rates for all nets in design

    • Meaningful test vectors not available for all designs

  • FPGA static power consumption scaled by used fraction
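
The static-power scaling in the last bullet can be illustrated with a short sketch. It assumes the "used fraction" is the fraction of core area occupied by the design; the slides do not spell out the measure, and the function name and units are hypothetical.

```python
def scaled_static_power(full_chip_static_w: float,
                        used_area: float,
                        total_area: float) -> float:
    """Static power attributed to a design that occupies part of the FPGA.

    Assumption: leakage is spread uniformly over the core, so the design is
    charged in proportion to the area it uses.
    """
    if not 0 < used_area <= total_area:
        raise ValueError("used_area must be positive and at most total_area")
    return full_chip_static_w * used_area / total_area
```

Under this assumption, a design using a quarter of the core is charged a quarter of the device's static power, which avoids penalizing the FPGA for unused capacity.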


Power Comparison Results

  • Logic only: 14x avg (5.7‒52x)

  • Logic + DSP: 12x avg (7.5‒16x)

  • Logic + Memory: 14x avg (12‒16x)

  • Logic + Memory + DSP: 7.1x avg (5.3‒8.3x)


Impact of Hard Macros on Power

  • Slight benefit—primarily from area savings?

    • Less area and interconnect


Power Consumption Caveats

  • May be disproportionate power in FPGA clock network

    • “Overdesigned” for tested circuits

    • Could have small incremental power increase

  • ASIC clock network would have to grow with designs


Static Power Comparison

  • Unable to draw useful conclusions about static power

    • 87x for typical silicon, typical temp. (25°C)

    • 5.4x for worst-case silicon, worst-case temp. (85°C)

  • Had to scale worst-case silicon temp. characterization

  • Subthreshold leakage is process-dependent

    • Little information on leakage estimate factors

    • Different processes from different foundries

  • Some correlation between static power and area gap (correlation coefficient ~0.8)

    • Hard macros likely reduced static power penalty


Conclusions

  • Disparity hard to quantify—very application dependent

    • Avg. gap gap 3x; gap gap range 1.3‒9.1x

  • All-LUT designs avg. 35x area, 3.4‒4.6x delay, 14x power

    • 119x area, 47.6x power gap for equal performance (assuming ideal parallelization)

  • Hard macros reduce area and power, but have little performance benefit

    • Avg. 18x area, 3‒4.1x delay, 7.1x power

    • 54x area, 21.3x power for equal performance
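
The equal-performance figures above follow from multiplying each gap by the delay gap: under ideal parallelization, matching ASIC throughput takes as many FPGA copies as the FPGA is slower. A minimal sketch of that arithmetic (the function name is hypothetical); the slide's numbers are reproduced when the fastest-speed-grade delay gaps, 3.4x and 3.0x, are used:

```python
def equal_performance_gap(gap: float, delay_gap: float) -> float:
    """Area or power gap once the FPGA is replicated to match ASIC throughput.

    Assumes ideal parallelization: a design that is delay_gap times slower
    needs delay_gap parallel copies, each costing the original gap.
    """
    return gap * delay_gap


# All-LUT designs: 35x area, 14x power, 3.4x delay (fastest speed grade)
lut_area = equal_performance_gap(35, 3.4)     # about 119
lut_power = equal_performance_gap(14, 3.4)    # about 47.6

# Designs using hard macros: 18x area, 7.1x power, 3.0x delay
macro_area = equal_performance_gap(18, 3.0)   # 54
macro_power = equal_performance_gap(7.1, 3.0) # about 21.3
```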


References

  • Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., “A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System,” Proc. IEEE CICC, 1986, pp. 228‒232

  • Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992

  • Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., “A Hybrid ASIC and FPGA Architecture,” Proc. ICCAD, Nov. 2002, pp. 187‒194

  • Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken’Ova, V., Saleh, R., “Design Considerations for Soft Embedded Programmable Logic Cores,” IEEE JSSC, vol. 40, no. 2, pp. 485‒497, Feb. 2005

  • Compton, K., Hauck, S., “Automatic Design of Area-Efficient Configurable ASIC Cores,” IEEE Trans. Comp., vol. 56, no. 5, pp. 662‒672, May 2007

