
A Billion Cycles a Day: Industrial Verification

A Billion Cycles a Day: Industrial Verification. Matthew Heath. Presentation to Synthesis & Verification Class, May 8, 2003. Based on “Validating the Intel Pentium 4 Microprocessor” by Bob Bentley, DAC 2001.


Presentation Transcript


  1. A Billion Cycles a Day: Industrial Verification • Matthew Heath • Presentation to Synthesis & Verification Class, May 8, 2003 • Based on “Validating the Intel Pentium 4 Microprocessor” by Bob Bentley, DAC 2001

  2. How do you verify a design with... • 42 million transistors • 1 million lines of RTL code • 600 – 1000 people working on it • A 3-year design time • Daily design changes

  3. How do you verify a design which has bugs like this? • The FMUL instruction, when the rounding mode is set to “round up”, incorrectly sets the sticky bit when the source operands are: src1[67:0] = $X \cdot 2^{i+15} + 1 \cdot 2^{i}$ and src2[67:0] = $Y \cdot 2^{j+15} + 1 \cdot 2^{j}$, where $i + j = 54$ and {X,Y} are integers
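
To get a feel for how narrow this operand pattern is, the following Python sketch checks whether a pair of 68-bit values matches it and counts the matching pairs exactly. The names `matches`, `is_trigger_pair`, `count_matching`, and `trigger_fraction` are illustrative. The assumptions that X, Y, i, and j are non-negative and that operands are drawn uniformly at random are ours, not the slide's, and the rounding-mode condition is not modeled.

```python
from fractions import Fraction

WIDTH = 68       # operand width from the slide: src[67:0]
GAP = 15         # X (or Y) starts 15 bits above the isolated 1 bit
SHIFT_SUM = 54   # i + j from the slide


def matches(src: int, shift: int) -> bool:
    """True if src == X * 2**(shift + GAP) + 2**shift for some integer X >= 0."""
    if not 0 <= src < (1 << WIDTH):
        return False
    low_mask = (1 << (shift + GAP)) - 1
    # Bit `shift` must be set and every other bit below shift + GAP must be clear.
    return (src & low_mask) == (1 << shift)


def is_trigger_pair(src1: int, src2: int) -> bool:
    """True if (src1, src2) fits the pattern for some i, with j = 54 - i."""
    return any(
        matches(src1, i) and matches(src2, SHIFT_SUM - i)
        for i in range(SHIFT_SUM + 1)
    )


def count_matching(shift: int) -> int:
    """Number of WIDTH-bit values that match the pattern for a given shift."""
    return 1 << max(0, WIDTH - (shift + GAP))


def trigger_fraction() -> Fraction:
    """Fraction of all operand pairs matching the pattern (uniform operands assumed)."""
    pairs = sum(
        count_matching(i) * count_matching(SHIFT_SUM - i)
        for i in range(SHIFT_SUM + 1)
    )
    return Fraction(pairs, 1 << (2 * WIDTH))
```

Under those assumptions the fraction is 57/2^84, roughly 3 in 10^24 operand pairs, which suggests why random simulation alone would be unlikely to reach this case.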

  4. And the answer is... • Hire 70+ validation engineers • Buy several thousand compute servers • Write 12,000 validation tests • Run up to 1 billion simulation cycles per day for 200 days • Check 2,750,000 manually-defined properties • Find, diagnose, track, and resolve 7,855 bugs • Apply formal verification with 10,000 proofs to the instruction decoder and FP units • This found that obscure FMUL bug!

  5. We know why validation is hard for tools. Why is it hard for the people who run them? • To meet an aggressive tapeout schedule, design and validation must occur in parallel without one blocking the other. • Validation starts before the design is done • Design changes occur while validation tests are running • Both design and validation must continue in the presence of known, unfixed bugs

  6. The design team • 300 designers write RTL code • Refer to architectural spec, textbooks, research papers, conversations • Start with basic functionality and progressively add features according to project staging plan • Do simple self-checks along the way

  7. The validation team • 100 validators write RTL tests • Refer to same sources as designers, plus the RTL implementation itself • Write functional tests to exercise features as they’re implemented • Run tests on RTL simulator • Diagnose failures • File bug reports in central database

  8. The management • Collect and analyze data • Pass/fail status of tests • Bug database statistics (counts, priority, age, discovery rate, fix rate, etc.) • RTL feature implementation progress • Compare trends with project schedule • Respond if necessary • Re-allocate resources to high-risk areas • Prioritize work

  9. SRTL = “Structural” RTL • Boolean equations; no behavioral syntax • State-accurate • RTL state maps directly to schematic state • High-level constructs supported • Macros, constants, loops, vectors • Design hierarchy • Full-chip has 6 clusters • Each cluster has several units • Each unit has tens of functional blocks • Each block has $O(10^4)$ transistors • Each designer owns several functional blocks

  10. SRTL models • Cluster and full-chip level • Full-chip models consume ~1GB of disk space • Compiled, executable SRTL code • Source code • Test environments • Include emulation of external logic • Direct control over interface signals • Pre-defined sets of signals commonly selected for tracing during test debug • Library of useful test fragments

  11. Most design work at cluster level • Decouples cluster and full-chip validation • Designers “graft” to latest cluster models • Check-out and edit selected source files • Incremental model build • Run validation tests • Revision control system • Designers check-in edited source files • Log messages include change descriptions, author, timestamp

  12. Cluster model release process • Designers periodically turn-in selected checked-in versions of source files • Coordinated turn-ins sometimes necessary • Cluster model builders process turn-ins • Merge changes from different versions of the same source file included in multiple turn-ins • Compile an executable cluster SRTL model • Run tests provided by the validators • Report test failures to validators and designers for debug • Acceptable models released to design team for future grafts
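
The release steps on this slide can be sketched as a small pipeline. This is only an illustration: `TurnIn`, `merge_turn_ins`, and `release_cluster_model` are hypothetical names, merging is reduced to a set union of change ids per file, and the acceptance rule (release unless a test outside an allowed-failure list fails) is an assumption about what "acceptable" means.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


@dataclass
class TurnIn:
    """One designer turn-in: file name -> ids of the changes it contains."""
    designer: str
    files: Dict[str, Set[str]]


def merge_turn_ins(turn_ins: Iterable[TurnIn]) -> Dict[str, Set[str]]:
    """Combine changes to the same source file from several turn-ins."""
    merged: Dict[str, Set[str]] = {}
    for turn_in in turn_ins:
        for name, changes in turn_in.files.items():
            merged.setdefault(name, set()).update(changes)
    return merged


def release_cluster_model(
    turn_ins: Iterable[TurnIn],
    build: Callable[[Dict[str, Set[str]]], object],
    tests: Dict[str, Callable[[object], bool]],
    allowed_failures: FrozenSet[str] = frozenset(),
) -> dict:
    """Merge, build, run the validators' tests, and decide whether to release."""
    model = build(merge_turn_ins(turn_ins))
    failures: List[str] = [name for name, test in tests.items() if not test(model)]
    blocking = [name for name in failures if name not in allowed_failures]
    # All failures are reported for debug; only blocking ones stop the release.
    return {"model": model, "failures": failures, "released": not blocking}
```

A caller would pass its own `build` and `tests` callables; the returned `failures` list corresponds to the report sent to validators and designers.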

  13. Full-chip model release process • Same process, different hierarchy • Cluster model builders don designer’s hat • Graft to full-chip model • Edit based on changes to recent cluster models • Incremental full-chip model build • Run full-chip validation tests • Debug failures, full-chip turn-in • Now full-chip model builders take over... • Process turn-ins from all clusters • Run full-chip validation tests again! • Release full-chip models to design team

  14. Netbatch • $10^9$ simulation cycles / day = 10 Hz × $10^5$ sec/day × $10^3$ computers • Netbatch manages compute server workload • For a given SRTL model and set of tests, create a job file and send it to netbatch • Each sub-team has a netbatch allocation • Jobs exceeding allocation enter wait queue • Wait times of 24+ hrs not uncommon • Test results • Pass/fail statistics • Failure time and meaningful error message • Traces of user-selected system state
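
The throughput estimate on this slide is a back-of-the-envelope product, reproduced below as a short Python sketch. The function name `cycles_per_day` is ours; the 86,400-second figure is simply the exact length of a day, included to show how much the slide's rounding to 10^5 matters.

```python
def cycles_per_day(sim_hz: float, seconds_per_day: float, computers: int) -> float:
    """Aggregate simulated cycles per day across all compute servers."""
    return sim_hz * seconds_per_day * computers


# Slide's rounded figures: 10 Hz per machine, ~10^5 s/day, ~10^3 machines.
slide_estimate = cycles_per_day(10, 1e5, 1000)

# Same estimate with the exact number of seconds in a day.
exact_day = cycles_per_day(10, 86_400, 1000)

# Upper bound on total cycles if the peak rate were sustained for 200 days.
total_upper_bound = slide_estimate * 200
```

With the exact day length the product is 864 million cycles, so the "billion cycles a day" figure is an order-of-magnitude estimate rather than a measured rate.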

  15. Efficiency improvements • An SRTL change made by a designer... • Appears in a cluster model 1 week later • Appears in a full-chip model 2 weeks later • Validators find bugs in released models which the designer has already fixed • “Onion peeling” vs. “whack-a-mole” debug • Temporarily disabling failing properties • Releasing models which fail some tests • System state capture and restore

  16. Central bug database • Released model version • Failing validation test & symptoms • Root cause • Requested design change • Priority • Log of discussion among designers, validators, and managers • Status / disposition • New, ETA, test fixed, design fixed (& version), validated, dropped
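
The fields listed on this slide map naturally onto a record type. The Python sketch below is one possible shape, not the actual database schema: the class and field names are illustrative, the integer priority scale is an assumption, and treating "validated" and "dropped" as the closed states is our reading of the status list.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Optional


class Status(Enum):
    NEW = "new"
    ETA = "ETA"
    TEST_FIXED = "test fixed"
    DESIGN_FIXED = "design fixed"
    VALIDATED = "validated"
    DROPPED = "dropped"


# Assumption: these two states mean no further work is expected.
CLOSED = {Status.VALIDATED, Status.DROPPED}


@dataclass
class BugRecord:
    model_version: str                      # released model the bug was found in
    failing_test: str
    symptoms: str
    priority: int                           # assumed scale: 1 = most urgent
    root_cause: Optional[str] = None
    requested_change: Optional[str] = None
    discussion: List[str] = field(default_factory=list)
    status: Status = Status.NEW
    fixed_in_version: Optional[str] = None  # set when status is DESIGN_FIXED


def open_bugs_by_priority(bugs: Iterable[BugRecord]) -> Counter:
    """Count bugs that are not yet validated or dropped, grouped by priority."""
    return Counter(b.priority for b in bugs if b.status not in CLOSED)
```

A summary like `open_bugs_by_priority` is the kind of statistic a manager could track against the schedule.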

  17. Bug root causes

  18. Schematic formal verification • Use formal techniques because schematic simulation takes too long • Schematic design starts long before SRTL design is done • Bottom-up • Verify SRTL macros vs. library cells first • Black-box macrocells & verify block • Because SRTL is state-accurate, verification is combinational only!
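
Because every SRTL state element corresponds to a schematic state element, the check reduces to comparing the combinational logic that feeds each one. The Python sketch below shows that idea by brute-force enumeration of inputs. It is an illustration only: production equivalence checkers rely on techniques such as BDDs or SAT rather than enumeration, and the NAND-plus-inverter "schematic" is a made-up example.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def srtl_next_state(x: int, y: int) -> int:
    """SRTL equation feeding the flop: Z = X & Y."""
    return x & y


def schematic_next_state(x: int, y: int) -> int:
    """Hypothetical schematic for the same logic: a NAND followed by an inverter."""
    nand = 1 - (x & y)
    return 1 - nand


def combinationally_equivalent(
    f: Callable[..., int], g: Callable[..., int], n_inputs: int
) -> Tuple[bool, Optional[Tuple[int, ...]]]:
    """Compare f and g on every input vector; return a counterexample on mismatch."""
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) != g(*bits):
            return False, bits
    return True, None
```

Calling `combinationally_equivalent(srtl_next_state, schematic_next_state, 2)` compares the two functions on all four input vectors; no sequential reasoning is needed because the state elements already match one-to-one.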

  19. One SRTL state may map to multiple functionally equivalent schematic states • SRTL: Z = X & Y; MSFF (Z, W, CLK) • [Figure: flip-flop schematics over signals X, Y, Z, W, CLK, including versions with duplicated signals Z1/Z2 and W1/W2]

  20. Retiming must be back-annotated into SRTL • Exception: Inverters • [Figure: schematic with signals X, Y, Z1, Z2, W, CLK, shown with SRTL Z = X & Y; MSFF (Z, W, CLK) • schematic with signals X, Y, Z, CLK, shown with SRTL MSFF (X, Y, CLK); Z = ~Y]
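
One way to read the inverter exception is that moving an inverter across a flip-flop only complements the stored bit, so the state correspondence survives up to inversion. The Python sketch below simulates that reading. It is our interpretation, since the slide's figure did not survive extraction: the schematic variant, the function names, and the complementary initial values are all assumptions.

```python
from typing import Dict, List, Sequence


def simulate_srtl(xs: Sequence[int], init: int = 0) -> List[Dict[str, int]]:
    """SRTL view: MSFF (X, Y, CLK); Z = ~Y. The flop stores X, inverter follows."""
    y, trace = init, []
    for x in xs:
        trace.append({"state": y, "Z": 1 - y})
        y = x  # flop captures X on the clock edge
    return trace


def simulate_schematic(xs: Sequence[int], init: int = 1) -> List[Dict[str, int]]:
    """Hypothetical schematic: inverter moved ahead of the flop, which stores ~X."""
    s, trace = init, []
    for x in xs:
        trace.append({"state": s, "Z": s})
        s = 1 - x  # flop captures the inverted input
    return trace
```

Given complementary initial values, the two models produce the same Z on every cycle while their stored bits are always inverses of each other.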

  21. Conclusion • Efficient verification of large-scale designs is a daunting management challenge • Design and validation are concurrent, not iterative • Possible with adequate resources and powerful tools to use the resources efficiently • Methodology constraints keep the problem tractable • Clear communication among team • Careful documentation • Progress tracking is key to staying on schedule • Motto: “If it hasn’t been verified, it doesn’t work.”

  22. How NOT to do verification... Arnold was unhappily aware that the complete Jurassic Park program contained more than half a million lines of code, most of it undocumented, without explanation... “What are you doing, John?” “Checking the code.” “By inspection? That’ll take forever.” - Michael Crichton, Jurassic Park
