This paper presents "BackSpace," a novel solution for post-silicon debugging aimed at addressing the critical challenges faced when chips return from fabrication with functional bugs. It reviews current debugging practices and highlights their limitations, demonstrating the effectiveness of BackSpace through proof-of-concept experiments. By leveraging efficient formal analysis techniques, BackSpace aims to avoid guesswork, reduce system interference, and provide accurate traces to facilitate bug identification. The paper concludes with insights into future developments and the potential of BackSpace in advanced debugging processes.
BackSpace: Formal Analysis for Post-Silicon Debug Flavio M. de Paula*, Marcel Gort*, Alan J. Hu*, Steve Wilton*, Jin Yang+ (* University of British Columbia, + Intel Corporation)
Outline • Motivation • Current Practices • BackSpace – The Intuition • Proof-of-Concept Experimental Results • (Recent Experiments) • Conclusions and Future Work
Motivation • Chip is back from fab! • Screened out chips w/ manufacturing defects • A bring-up procedure follows: • Run diagnostics w/o problems, everything looks fine! • But the system becomes unresponsive while running the real application… • Every single chip fails in the same way (1M DPM: Func. bugs) • What do we do now?
Current Practices Inputs Scan-out buggy state But the cause is not obvious!
Current Practices Guess when to stop and single step Inputs ? ? ? Scan-out
Current Practices Guess when to stop and single step Inputs ? Problems: Single-stepping interference; Non-determinism; Too early/late to stop? Non-buggy path
Current Practices • Leveraging additional debugging support: • Trace buffer of the internal state • Provides only a narrow view of the design, e.g., program counter, address/data fetches • Record all I/O and replay • Solves the non-determinism problem, but… • Requires highly specialized bring-up systems • Just having additional hardware does NOT solve the problem
A Better Solution: BackSpace • Goal: • Avoid guesswork • Avoid interfering with the system • Run at speed • Portable debug support • Compute an accurate trace to the bug
A Better Solution: BackSpace • Requires: • Hardware: • Existing test infrastructure and scan-chains; • Breakpoint circuit; • Good signature scheme; • Software: • Efficient SAT solver; • BackSpace Manager
A Better Solution: BackSpace Inputs 1. Run at speed until the buggy state is hit Non-buggy path
A Better Solution: BackSpace Inputs 2. Scan-out buggy state and history of signatures
A Better Solution: BackSpace Inputs Off-Chip Formal Analysis Formal Engine
A Better Solution: BackSpace Inputs Off-Chip Formal Analysis • Compute pre-image Formal Engine
A Better Solution: BackSpace Inputs Pick candidate state and load breakpoint circuit Formal Engine
A Better Solution: BackSpace Inputs Run until the breakpoint is hit Formal Engine
A Better Solution: BackSpace Inputs Pick another state Formal Engine
A Better Solution: BackSpace Inputs Run until the breakpoint is hit Formal Engine
A Better Solution: BackSpace Inputs Computed trace of length 2
A Better Solution: BackSpace Inputs Iterate Formal Engine
A Better Solution: BackSpace Inputs BackSpace trace
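The walk-through above (run to the bug, scan out, compute the pre-image, confirm a candidate with the breakpoint circuit, iterate) can be sketched in software. This is a minimal illustration under loud assumptions, not the authors' implementation: the 4-bit design (`next_state`), the 1-bit `signature`, the recorded `STIMULUS`, and the cycle `deadline` used to model the breakpoint run are all hypothetical, and a brute-force enumeration stands in for the SAT solver.

```python
RESET = 2
STIMULUS = [0, 1, 0, 0, 0]   # hypothetical inputs applied during the failing run

def next_state(s, i):
    """Hypothetical many-to-one next-state logic of a toy 4-bit design."""
    return (5 * (s // 2) + i) % 16

def signature(s):
    """Hypothetical 1-bit signature of a state (bit 1 of the state)."""
    return (s >> 1) & 1

def run_chip(breakpoint=None, deadline=None):
    """Replay the run on the toy 'chip'.

    With no breakpoint, run to the end (the buggy state). With a breakpoint,
    stop at the first cycle before `deadline` whose state matches it.
    Returns (state, cycle, signature of the previous state), or None on a miss.
    """
    end = len(STIMULUS) if deadline is None else deadline - 1
    state, prev_sig = RESET, None
    for cycle in range(end + 1):
        if state == breakpoint or (breakpoint is None and cycle == end):
            return state, cycle, prev_sig
        prev_sig = signature(state)
        state = next_state(state, STIMULUS[cycle])
    return None

def pre_image(target, sig):
    """Brute-force stand-in for the SAT query: predecessors matching the signature."""
    return [s for s in range(16)
            if signature(s) == sig
            and any(next_state(s, i) == target for i in (0, 1))]

def backspace(max_steps=500):
    state, cycle, prev_sig = run_chip()          # run to the bug and scan out
    trace = [state]
    for _ in range(max_steps):
        if cycle == 0:                           # reached the reset state
            break
        for candidate in pre_image(state, prev_sig):
            hit = run_chip(breakpoint=candidate, deadline=cycle)
            if hit is not None:                  # breakpoint hit: candidate confirmed
                state, cycle, prev_sig = hit
                trace.insert(0, state)
                break
        else:
            break                                # no candidate could be confirmed
    return trace

print(backspace())
```

In this toy run several candidates miss the breakpoint before the right one is found, which mirrors the "pick another state" step on the slides.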
Outline • Motivation • Current Practices • BackSpace – The Intuition • Proof-of-Concept Experimental Results • Recent Experiments • Future Work
Proof-of-Concept Experimental Results Chip on Silicon BackSpace Manager SAT Solver
Proof-of-Concept Experimental Results Logic Simulator BackSpace Manager SAT Solver
Proof-of-Concept Experimental Results • Setup: • OpenCores’ designs: • 68HC05: 109 latches • oc8051: 702 latches • Run real applications
Proof-of-Concept Experimental Results • Can we find a signature that reduces the size of the pre-image? • Experiment: • Select 10 arbitrary ‘crash’ states on 68HC05; • Try different signatures
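The kind of comparison this experiment makes can be sketched on a toy design. Everything here is an assumption for illustration: the 6-bit `next_state` function, the three candidate signatures, and the brute-force pre-image that stands in for the SAT solver. It does not reproduce the 68HC05 measurements.

```python
STATES = range(64)      # hypothetical 6-bit toy design
INPUTS = (0, 1)         # hypothetical 1-bit input

def next_state(s, i):
    """Hypothetical many-to-one next-state logic (discards the low two bits)."""
    return (5 * (s // 4) + i) % 64

def pre_image(target, sig, sig_value):
    """All states that can step to `target` and match the recorded signature."""
    return [s for s in STATES
            if sig(s) == sig_value
            and any(next_state(s, i) == target for i in INPUTS)]

def average_pre_image_size(sig):
    """Average pre-image size over every (predecessor, input) pair."""
    sizes = [len(pre_image(next_state(p, i), sig, sig(p)))
             for p in STATES for i in INPUTS]
    return sum(sizes) / len(sizes)

# Candidate signatures to compare (all hypothetical).
SIGNATURES = {
    "no signature": lambda s: 0,
    "top two bits": lambda s: s >> 4,
    "low two bits": lambda s: s & 3,
}

results = {name: average_pre_image_size(sig) for name, sig in SIGNATURES.items()}
for name, size in results.items():
    print(f"{name}: {size:.2f} states on average")
```

In this toy, the low two bits are exactly the information the next-state logic throws away, so recording them prunes the pre-image the most. A good signature on a real design would need to capture the analogous "lost" information.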
Proof-of-Concept Experimental Results • How far can we go back? • Experiment: • Select arbitrary ‘crash’ states: • 10 each for 68HC05 and oc8051; • Set limit to 500 cycles of BackSpacing; • Set limit on size of pre-image to 300 states; • Compare the two best types of signature: • Hand-picked • Universal hashing of entire state
Proof-of-Concept Experimental Results • Results • Signature: Universal Hashing • Small size of pre-images • All 20 cases successfully BackSpaced to limit
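The slides do not say which universal hash family was used. One standard family multiplies the state by a random binary matrix over GF(2), so each signature bit is the parity (XOR) of a random subset of latches. The sketch below assumes that family; `make_universal_hash`, the 16-bit signature width, and the seed are hypothetical, while the 109-latch state width is the 68HC05 figure from the setup slide.

```python
import random

def make_universal_hash(n_bits, m_bits, seed=0):
    """Draw one hash from the family h(x) = A*x over GF(2), with A a random m x n bit matrix."""
    rng = random.Random(seed)
    rows = [rng.getrandbits(n_bits) for _ in range(m_bits)]  # one bit mask per output bit

    def h(state):
        sig = 0
        for j, row in enumerate(rows):
            parity = bin(row & state).count("1") & 1  # XOR of the selected latches
            sig |= parity << j
        return sig

    return h

N_LATCHES = 109   # 68HC05 latch count from the setup slide
SIG_BITS = 16     # hypothetical signature width

h = make_universal_hash(N_LATCHES, SIG_BITS)
state = random.Random(1).getrandbits(N_LATCHES)   # arbitrary example state
print(f"signature = {h(state):#06x}")
```

Each output bit is an XOR tree over roughly half of the latches, which gives some intuition for why a naive hardware implementation of such a hash is expensive in area.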
Proof-of-Concept Experimental Results • Breakpoint Circuitry • 40-50% area overhead. • Signature Computation • A naïve implementation of universal hashing results in 150% area overhead.
Recent Experiments • OpenRISC 1200: • 32-bit RISC processor; • Harvard micro-architecture; • 5-stage integer pipeline; • Virtual memory support; • Total of 3k+ latches • BackSpace implemented in HW/SW • AMIRIX AP1000 FPGA board (provided by CMC) • Board mimics bring-up systems • Host-PC: off-chip formal analysis
Recent Experiments • BackSpacing OpenRISC 1200: • Running a simple software application • BackSpaced for hundreds of cycles • Demonstrated robustness in the presence of nondeterminism
Conclusions & Future Work • Introduced BackSpace: a new paradigm for post-silicon debug • Demonstrated that it works • Main challenges: • Find hardware-friendly & SAT-friendly signatures • Minimize breakpoint circuitry overhead