
Enhancing Post-Silicon Processor Debug with Incremental Cache State Dumping


Presentation Transcript


  1. Enhancing Post-Silicon Processor Debug with Incremental Cache State Dumping Preeti Ranjan Panda, Anant Vishnoi, and M. Balakrishnan Proceedings of the IEEE 18th VLSI System on Chip Conference (VLSI-SoC 2010) Sept. 2010 Presenter: Chun-Hung Lai

  2. Abstract • During post-silicon validation/debug of processors, it is common to alternate between two phases: processor execution and state dump. The state dump, where the entire processor state is dumped off-chip to a logic analyzer for further processing, is a major bottleneck. We present a technique for improving debug efficiency by reducing the volume of cache data dumped off-chip, while still capturing the complete state. • The reduction is achieved by introducing hardware mechanisms to transmit only the portion of the cache that was updated since the last dump. We propose two design alternatives based on whether or not the processor is permitted to continue execution during the dump: Blocking Incremental Cache Dumping (BICD) and Non-blocking Incremental Cache Dumping (NICD). We observe a 64% reduction in overall cache lines dumped, and the dump time reduces to an average of 16.8% and 0.0002% for BICD and NICD respectively.

  3. What’s the Problem • The state dump is a major bottleneck during post-silicon debug of processors • The processor state must be dumped off-chip, and the last-level cache forms the majority of that state: a large cache means a large dump size and a long dump duration • To improve debug efficiency: reduce the volume of cache data dumped while still capturing the complete state

  4. Related Works • Design for debug • Scan-based debug for physical / logic probing [17][20]: halts real-time execution • Collection of selected signal traces • Trace compression [6][10][18]: reduces area overhead and dump time; decompression is performed off-line • Trace signal selection [9][11][13][15]: expands a few traced signals to restore the untraced signals • Compression of specific memory/cache data • For performance / energy: conservative compression [1][4][12][21], with decompression that does not impact μp execution • For debug: aggressive compression [14][18]; the achievable compression is limited • Iterative silicon debug with signatures [2]: captures only error data, zooming into the interval of the error signature; only for repeatable errors • Online cache dump for μp debug [19]: dumps simultaneously with μp execution • This paper: incremental cache state dumping, to reduce the dump size

  5. Incremental Cache Dumping • Goal: reduce the total amount of cache data to be transferred off-chip • Dump only the cache lines that were updated since the last dump, instead of dumping the whole cache every time • Use an Update History Table (UHT) to track all cache updates between two consecutive dumps (timeline: μp execution updates the cache; a full dump transfers every line, an incremental dump transfers only the updated ones)
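
A minimal software sketch of the UHT idea may help; NUM_LINES and the emit_line callback are illustrative assumptions, and the paper realizes this as a hardware table, not software:

    /* Update History Table (UHT) sketch: one bit per cache line, set on
     * every cache write and cleared once the line has been dumped. */
    #include <stdbool.h>

    #define NUM_LINES 1024                 /* assumed number of cache lines */

    static bool uht[NUM_LINES];            /* updated-since-last-dump bits */

    void on_cache_write(int line)          /* invoked on every cache update */
    {
        uht[line] = true;
    }

    void dump_state(void (*emit_line)(int))
    {
        for (int line = 0; line < NUM_LINES; line++) {
            if (uht[line]) {               /* transmit only updated lines */
                emit_line(line);
                uht[line] = false;         /* reset history for the next interval */
            }
        }
    }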

  6. Two Methodologies for Incremental Cache Dumping – 1st BICD • Blocking Incremental Cache Dumping (BICD) • The processor is halted during the cache dump • Dump the lines whose UHT entry is set • Cost vs. dump-time trade-off: each UHT bit can represent more than one cache line (a window), which shrinks the UHT but may lead to extra dumps of lines that were not updated • In the slide’s example, BICD reduces the dump size by 56%
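
A sketch of the BICD window trade-off, under the same illustrative assumptions, where one UHT bit covers WINDOW_SIZE consecutive lines:

    /* BICD window sketch: a smaller UHT is traded against possible extra
     * dumps, because an update to any line marks its whole window.
     * NUM_LINES, WINDOW_SIZE and emit_line are illustrative assumptions. */
    #include <stdbool.h>

    #define NUM_LINES   1024
    #define WINDOW_SIZE 4                          /* cache lines per UHT bit */
    #define NUM_WINDOWS (NUM_LINES / WINDOW_SIZE)

    static bool uht_win[NUM_WINDOWS];

    void on_cache_write(int line)
    {
        uht_win[line / WINDOW_SIZE] = true;        /* mark the whole window */
    }

    void dump_state(void (*emit_line)(int))        /* processor halted (blocking) */
    {
        for (int w = 0; w < NUM_WINDOWS; w++) {
            if (!uht_win[w])
                continue;
            /* every line of a marked window is dumped, including lines that
             * were never written: this is the "extra dump" overhead */
            for (int i = 0; i < WINDOW_SIZE; i++)
                emit_line(w * WINDOW_SIZE + i);
            uht_win[w] = false;
        }
    }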

  7. Two Methodologies for Incremental Cache Dumping – 2nd NICD • Non-Blocking Incremental Cache Dumping (NICD) • The cache dump is performed simultaneously with μp execution • Two challenges with NICD • (1) The cache state may be corrupted by the executing processor before a line is dumped • Solution: dump a cache line before the cache attempts to update it, and reset the corresponding UHT entry after dumping (the slide’s figure shows UHT entries marking dumped, non-dumped, and updated lines during the dump)

  8. Two Methodologies for Incremental Cache Dumping – 2nd NICD (Cont.) • Two challenges with NICD • (2) Maintenance of the Update History Table (UHT): a single UHT would be incorrectly updated by both the ongoing cache dump and the executing μp • Solution: use two UHTs • UHT-P (previous): records cache updates since the last dump and indicates which lines to dump • UHT-C (current): records cache updates made during the dump interval, without affecting the current dump • The two tables swap roles at the start of the next dump (the slide’s figure shows UHT snapshots at times T, T+1, T+2 illustrating “dump before update” and “update without affecting the current dump”); a software sketch follows
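
A software sketch of the dual-UHT scheme, combining both NICD solutions, under the same illustrative assumptions as the sketches above (real hardware would switch table roles rather than copy entries):

    /* NICD sketch with two UHTs: uht_p (UHT-P) marks the lines to dump in
     * the current pass, uht_c (UHT-C) collects updates made while the dump
     * runs. A write to a line still pending in uht_p first dumps the old
     * contents, keeping the off-chip image consistent.
     * NUM_LINES and emit_line are illustrative assumptions. */
    #include <stdbool.h>

    #define NUM_LINES 1024

    static bool uht_p[NUM_LINES];      /* UHT-P: updates since the last dump */
    static bool uht_c[NUM_LINES];      /* UHT-C: updates during this dump */

    void on_cache_write(int line, void (*emit_line)(int))
    {
        if (uht_p[line]) {             /* challenge (1): dump before update */
            emit_line(line);
            uht_p[line] = false;
        }
        uht_c[line] = true;            /* challenge (2): record the update in
                                          UHT-C; the current dump is unaffected */
    }

    void dump_pass(void (*emit_line)(int))    /* runs alongside μp execution */
    {
        for (int line = 0; line < NUM_LINES; line++) {
            if (uht_p[line]) {
                emit_line(line);
                uht_p[line] = false;
            }
        }
        for (int line = 0; line < NUM_LINES; line++) {
            uht_p[line] = uht_c[line];         /* swap roles for the next dump */
            uht_c[line] = false;
        }
    }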

  9. Illustration of Non-Blocking Incremental Cache Dumping (NICD) – 1 • UHT-P indicates the lines to be dumped; each line is dumped and its UHT-P entry is then reset • Example: line F is updated while line B is being dumped • F is dumped first and its UHT-P entry is reset • F is then updated and its UHT-C entry is set

  10. Illustration of Non-Blocking Incremental Cache Dumping (NICD) – 2 • Lines C and H are updated during the dump, but this does not affect the current dump: their updates are recorded in UHT-C only • When the dump pass reaches them, entries with UHT-P = ‘0’ are not dumped; F is also skipped (its UHT-P entry reads ‘0’) since it has already been dumped due to its update • Ready for the next dump: the tables swap roles, with UHT-P capturing further updates and UHT-C indicating the lines to be dumped

  11. Hardware Implementation – NICD Architecture • Counter: holds the index of the line being dumped • Mask: holds the address of the updated window • Two UHTs track cache updates: one is used for updates, the other for the dump • Signals exported from the cache: W_sel (the updated way), Write (a cache update), Dump (line ready for dump)

  12. Hardware Implementation – Operation Flow • (1) Dump_S starts the dump • (2) When Valid & Dump are sensed, the data lines are moved to the buffer (the Valid signal comes from the UHT, the way from W_sel) • (3) On a cache update (Write): if the line’s UHT entry is ‘1’, the line is dumped in advance, before the update

  13. Experimental Results – Lines Dumped at Various Dump Intervals / Window Sizes • For CHESS: the number of lines dumped increases with both the window size and the dump interval • For HMMER: the difference with respect to window size is minimal; lines dumped still increase with the dump interval • For window size 1, only 36% of the total lines are dumped on average

  14. Experimental Results – Processor Stalls with NICD • A cache update during the dumping of a window causes a stall; stalls increase with the window size • For CHESS: an average 0.0005% stall overhead for window size 2 • For HMMER: an average 0.0001% stall overhead for window size 2 • Memory requests are spread over time, with infrequent updates

  15. Experimental Results – Dump Time Overhead for NICD • Total dump time overhead = processor stall overhead + dumping overhead (the bus is busy during the dump), expressed as a percentage of the original dump time • For CHESS: 0.0002% dump time for all dump intervals (window size 1) • For HMMER: 0.0009% ~ 0.003% dump time for all dump intervals (window size 1) • The overall dump time follows the trend of the processor stalls: it increases with the window size

  16. Experimental Results – Area / Access Time • Additional area / timing overhead, measured with a 180 nm synthesis technology • For BICD: requires one UHT (window size varied between 1 and 16) • Area: 0.24 ~ 0.03 mm2 • Timing: no overhead (the UHT access time is smaller than the cache access time) • For NICD: requires two UHTs • Dump logic: area is twice that of BICD, with no extra timing overhead • Cache modification for online dumping: the area difference is 0.0002 mm2, with no extra timing overhead

  17. Conclusions • This paper proposed incremental cache state dumping • Goal: reduce the transfer time and the logic analyzer space requirement • Two hardware mechanisms: Blocking Incremental Cache Dumping (BICD) and Non-blocking Incremental Cache Dumping (NICD) • The results show that incremental dumping reduces the lines dumped by 64%, BICD reduces the dump time to 16.2% of the original dump time, and NICD reduces the dump time to 0.0002% of the original dump time

  18. Comments for This Paper • Good points • Shows how cache dumping can be used for debug • Signature-based debugging approach: maps a sequence of events into a cache state dump • Identifies the factors behind the dump time overhead • Things that could be improved • It is unclear why the dump line’s index is not fed into the UHT • From the architecture, the UHT appears to use a single-port SRAM; how are a “cache line dump” and a “normal cache access” achieved simultaneously? • The environment for transferring data from the dump logic to the logic analyzer is not described
