Why do so many chips fail?

Ira Chayut, Verification Architect

(opinions are my own and do not necessarily represent the opinion of my employer)


Failure rate of first silicon is rising

  • “… research by Collett International revealed that 52% of complex application specific integrated circuits (ASICs) required a respin and the reason was largely due to functional errors.” (http://www.techonline.com/community/ed_resource/feature_article/36655)

  • Who is to blame? (There must be someone to blame!)

    • Management – they didn’t provide enough resources

    • HW Engineering – they created the functional errors

    • Verification – they didn’t catch the functional errors

    • Architecture – they didn’t focus on testability

    • Marketing – they kept changing the specs


People don’t kill chips, complexity kills chips

Source: http://www.cs.utexas.edu/users/dburger/teaching/cs395t-s99/papers/2_src.pdf (1999). The projected numbers are a bit lower than current reality: a dual-core AMD Opteron has 233 million transistors and the Intel Itanium 2 has 592 million.


Complexity increases exponentially

  • Chip component count increases exponentially over time (Moore’s law)

  • Interactions increase super-exponentially

  • IP reuse and parallel design teams allow more functions per chip, with fewer HW engineers per function

  • Verification effort gets combinatorially more difficult as functions are added
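A rough worked model of the growth claims above. It assumes, for illustration only, that the component count doubles every period $T$ starting from $n_0$ at $t = 0$. The symbols $n_0$, $T$ and $n(t)$ are introduced here and are not from the slides.

```latex
\begin{aligned}
\text{components:} \quad & n(t) = n_0 \, 2^{t/T} \\
\text{pairwise interactions:} \quad & \binom{n(t)}{2} = \frac{n(t)\,\bigl(n(t)-1\bigr)}{2} \approx \tfrac{1}{2}\, n_0^{2}\, 2^{2t/T} \\
\text{subsets of components that could interact:} \quad & 2^{\,n(t)} = 2^{\,n_0 \, 2^{t/T}}
\end{aligned}
```

Under this model, pairwise interactions on their own still grow exponentially, just at twice the rate of the component count. The count of possible combinations of components, or of features to be verified together, grows doubly exponentially. That is one way to read "super-exponentially" above.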


Why verification is not able to keep up

  • Verification effort gets combinatorially more difficult as functions are added

    BUT

  • Verification staffing/time cannot be made combinatorially larger to compensate

    AND

  • Chip lifetimes are too short to allow for complete testing

    THUS

  • Chips will continue to have ever more functional errors as they get more complex
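A back-of-the-envelope calculation makes the testing-time point concrete. Take exhaustive simulation of a single 32-bit adder: it has two 32-bit operands and therefore 2^64 input combinations. The rate of 10^9 test vectors per second is an optimistic assumption chosen for illustration, not a measured figure. `exhaustive_test_years` is a hypothetical helper.

```python
# Back-of-the-envelope: time to exhaustively test one combinational block.
# ASSUMPTION: 1e9 test vectors per second (an optimistic, illustrative rate).

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def exhaustive_test_years(input_bits: int, vectors_per_second: float = 1e9) -> float:
    """Years needed to apply every one of 2**input_bits input combinations."""
    return (2 ** input_bits) / vectors_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # A 32-bit adder has two 32-bit operands: 64 input bits in total.
    for bits in (16, 32, 64):
        print(f"{bits:>2} input bits: {exhaustive_test_years(bits):.3g} years")
```

At the assumed rate, the 64-bit case comes to roughly 585 years, and that is for one small combinational block. Sequential logic is far worse, because its behavior also depends on internal state.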


Limiting the number of architectural and functional errors

  • Thorough unit-level verification testing

    • Small simulations run faster

    • Avoids combinatorial explosion of interactions

  • Well-defined interfaces between blocks, with assertions and formal verification techniques, to reduce inter-block problems

  • Emulation or FPGA prototyping to accelerate testing
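The sketch below illustrates an interface assertion. It checks a cycle-by-cycle trace of a hypothetical valid/ready handshake between two blocks. The rules it encodes are assumed for the example and are not taken from the slides: once `valid` is raised, it must stay high, and `data` must stay stable until `ready` accepts the transfer. These rules are similar to the VALID/READY rule in Arm's AMBA AXI specification. In practice such checks are usually written as SystemVerilog assertions or proven with formal tools; Python is used here only to show the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cycle:
    """One clock cycle of a hypothetical valid/ready interface between two blocks."""
    valid: bool
    ready: bool
    data: int


def check_handshake(trace: List[Cycle]) -> List[str]:
    """Return the assertion failures found in the trace (an empty list means clean).

    Assumed rules: while a transfer is pending (valid high, ready low),
    the sender must keep valid high and hold data stable on the next cycle.
    """
    failures = []
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        pending = prev.valid and not prev.ready  # offered but not yet accepted
        if pending and not cur.valid:
            failures.append(f"cycle {i}: valid dropped before transfer was accepted")
        elif pending and cur.data != prev.data:
            failures.append(f"cycle {i}: data changed while waiting for ready")
    return failures


if __name__ == "__main__":
    good = [Cycle(True, False, 7), Cycle(True, True, 7), Cycle(False, False, 0)]
    bad = [Cycle(True, False, 7), Cycle(True, False, 8)]
    print(check_handshake(good))  # no failures
    print(check_handshake(bad))   # reports the data change on cycle 1
```

A checker like this sits at a block boundary and is reusable. It catches inter-block protocol violations in unit-level simulation, emulation, or FPGA prototyping, without waiting for a full-chip test that happens to expose them.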


How to live with functional errors

  • Successful companies have learned how to ship chips with functional and architectural errors: time-to-market pressures and chip complexity force the delivery of chips that are not perfect (even if perfection were possible). How can this be done better?

  • For a long time, DRAMs have been built with extra (redundant) components so that a less-than-perfect chip can still provide full device function and ship

  • How can we do the same with architectural features? How can full device function exist in the presence of architectural or implementation omissions or errors?
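The DRAM approach can be pictured as a remapping table. Rows found defective during manufacturing test are redirected to spare rows, so the part still presents its full advertised capacity. This is a simplified software model with hypothetical names. Real DRAM repair uses programmed fuses and address-comparison logic in the array periphery.

```python
from typing import Dict, Iterable


class RepairableArray:
    """Hypothetical model of a memory array with spare rows for redundancy repair."""

    def __init__(self, rows: int, spare_rows: int) -> None:
        self.rows = rows                                          # rows visible to the user
        self.free_spares = list(range(rows, rows + spare_rows))   # physical spare rows
        self.remap: Dict[int, int] = {}                           # logical row -> spare row

    def repair(self, defective_rows: Iterable[int]) -> bool:
        """Map each defective row to a spare. Return False if the spares run out."""
        for row in defective_rows:
            if row in self.remap:
                continue
            if not self.free_spares:
                return False  # not enough redundancy: this part cannot ship
            self.remap[row] = self.free_spares.pop(0)
        return True

    def physical_row(self, logical_row: int) -> int:
        """Translate a user-visible row address to the physical row actually used."""
        if not 0 <= logical_row < self.rows:
            raise IndexError(logical_row)
        return self.remap.get(logical_row, logical_row)


if __name__ == "__main__":
    array = RepairableArray(rows=1024, spare_rows=4)
    print(array.repair([17, 300]))                    # repair succeeds
    print(array.physical_row(17), array.physical_row(18))
```

The user-visible address space is unchanged after repair. Only the translation from logical to physical rows differs, and that translation is the property the slide asks for at the architectural level.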


Architecture support

  • Embrace Perl’s motto, “There's More Than One Way to Do It”: allow multiple ways of accomplishing every critical specified function

  • Analogous to Design for Test (DFT) and Design for Verification (DFV), we should start thinking about Architect for Verification (AFV)

    [Thanks to Dave Whipp for the AFV phrase and acronym]

  • In some problem domains, such as networking, upper-layer protocols can recover from some silicon errors, though at a performance penalty whenever that recovery is used
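The networking case in the last point can be sketched as an end-to-end check with retransmission. The upper layer verifies a checksum and resends on a mismatch, so an occasional silicon error costs a retry (the performance penalty) rather than corrupted data. The faulty hardware path is simulated, and `flaky_hw_copy` and `send_with_retry` are hypothetical names. `zlib.crc32` is the real Python standard-library CRC-32 function.

```python
import random
import zlib
from typing import Callable, Tuple


def flaky_hw_copy(payload: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical stand-in for a buggy hardware data path that sometimes flips a bit."""
    if payload and rng.random() < error_rate:
        data = bytearray(payload)
        data[rng.randrange(len(data))] ^= 0x01  # single-bit corruption
        return bytes(data)
    return payload


def send_with_retry(payload: bytes, path: Callable[[bytes], bytes],
                    max_attempts: int = 8) -> Tuple[bytes, int]:
    """Upper-layer recovery: compare CRC-32 end to end and retransmit on mismatch.

    Returns (received_payload, attempts_used). Raises if every attempt fails.
    """
    expected = zlib.crc32(payload)              # computed by the sender
    for attempt in range(1, max_attempts + 1):
        received = path(payload)
        if zlib.crc32(received) == expected:    # checked by the receiver
            return received, attempt
    raise RuntimeError("link unusable: retries exhausted")


if __name__ == "__main__":
    rng = random.Random(1)
    data, tries = send_with_retry(b"hello", lambda p: flaky_hw_copy(p, rng, 0.3))
    print(data, tries)
```

CRC-32 detects every single-bit error, so in this model a corrupted copy is never accepted. The hardware bug shows up only as extra attempts.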


Architecture support, continued

  • A programmable abstraction layer between the real hardware and the user’s API can hide functional warts: the hardware catches specific operations and either directs them to one of multiple hardware implementations or signals a software trap

    • Pyramid minicomputers hid the assembly language from users, so the compiler could work around problems

    • Transmeta maps standard machine language onto a hidden processor architecture, so the translation software can work around problems

  • Soft (reprogrammable) hardware can allow chip redesign after silicon is frozen (and shipped!)
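The abstraction-layer idea can be sketched as a programmable dispatch table. Each operation is routed either to one of several hardware implementations or to a software trap handler, and the table can be rewritten after tape-out once a bug is found. All names and the simulated erratum are hypothetical. This models the concept only and is not how Pyramid or Transmeta actually implemented it.

```python
from typing import Callable, Dict

Handler = Callable[[int, int], int]
MASK32 = 0xFFFFFFFF


def hw_multiply_fast(a: int, b: int) -> int:
    """Hypothetical fast multiplier with a silicon bug discovered after tape-out."""
    result = (a * b) & MASK32
    if a == 0xFFFF and b == 0xFFFF:  # the simulated erratum
        result ^= 0x1
    return result


def hw_multiply_slow(a: int, b: int) -> int:
    """Hypothetical alternate (slower) hardware path: shift-and-add, modulo 2**32."""
    result = 0
    while b:
        if b & 1:
            result = (result + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return result


def sw_multiply_trap(a: int, b: int) -> int:
    """Software trap handler: emulate the operation entirely in software."""
    return (a * b) & MASK32


class DispatchTable:
    """Programmable layer between the user-visible operation and the hardware."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {"MUL": hw_multiply_fast}

    def reroute(self, op: str, handler: Handler) -> None:
        """Field update: send an operation to a different implementation."""
        self.routes[op] = handler

    def execute(self, op: str, a: int, b: int) -> int:
        return self.routes[op](a, b)


if __name__ == "__main__":
    table = DispatchTable()
    print(hex(table.execute("MUL", 0xFFFF, 0xFFFF)))  # wrong result from buggy path
    table.reroute("MUL", hw_multiply_slow)              # work around the erratum
    print(hex(table.execute("MUL", 0xFFFF, 0xFFFF)))  # correct result
```

A finer-grained variant would trap only the operand patterns known to trigger the bug and keep the fast path for everything else. That matches "hardware catches specific operations" more closely, at the cost of a more complex match rule.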


Summary

  • Ever-increasing chip complexity prevents total testing before tape-out (or even before shipping)

  • AFV techniques can free chip verification from combinatorial explosion

  • We have to accept that there will be architectural and functional failures in every advanced chip that is built

  • Architecture support is needed to allow failures to be worked around or fixed post-silicon

