1 / 23

Dynamic Verification of Sequential Consistency

Dynamic Verification of Sequential Consistency. Albert Meixner and Daniel J. Sorin. Presented by Peter Gilbert ECE259 Spring 2008. Introduction. Problem: How to detect errors in multithreaded memory systems?

duyen
Download Presentation

Dynamic Verification of Sequential Consistency

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dynamic Verification of Sequential Consistency Albert Meixner and Daniel J. Sorin Presented by Peter Gilbert ECE259 Spring 2008

  2. Introduction • Problem: How to detect errors in multithreaded memory systems? • Transient physical faults increasingly problematic as memory systems become more complex • Errors possible in many components: caches, memories, cache/memory controllers, interconnect

  3. Potential approach • Tailored detection mechanisms for each component and each type of error • Example: for “single-bit-stuck-at-x” model for system bus, add a parity bit • Problems: • Lots of components… • Requires understanding error models and how they interact • Cannot detect design bugs • Some errors difficult to detect with localized mechanisms

  4. Dynamic verification • Monitor system execution • Verify high-level invariants rather than considering individual components • Can detect transient faults, design bugs, fabrication defects • Any error that affects the high-level invariants • End-to-end correctness

  5. DVSC • Verifying memory consistency == verifying memory system correctness • Goal of DVSC: verify that SC is enforced in a shared memory multiprocessor • SC defines end-to-end correctness

  6. DVSC error model • Caches and memories • Bit corruption • Cache/memory controllers • Corruption of state or outputs • Interconnect • Messages corrupted, dropped, replicated, misrouted, reordered

  7. First idea: DVSC-Direct • Dynamically construct total order of loads and stores • Verify that total order satisfies SC • Trigger system recovery when error is detected

  8. DVSC-Direct Design • For every load and store: • Processor informs block’s home memory node • Inform message: • <address, load/store, data value, logical time> • Logical time: if A causes B, A has smaller logical time • Replay accesses in logical time order at home node • Uses priority queue for Informs and shadow copies of memory blocks • Verify that load gets value from most recent store

  9. Cost of DVSC-Direct • Inform bandwidth proportional to the number of loads and stores • 8-53 times more bandwidth than unprotected system • Uses bandwidth like a system without caches!

  10. Alternative: DVSC-Indirect • Verify sub-invariants proven to be equivalent to SC • Proof due to Plakal et al. • Terminology • coherence epoch - interval of logical time during which a processor has Shared or Exclusive access to a block • A memory access is bound to a coherence transaction T if permission is obtained via T

  11. Constructing SC from sub-invariants • Fact 1: load of block B bound to T receives either: • Most recent store of B bound to T • Value of B received in response to T • Lemmas • Exclusive epochs for block B do not overlap with other Exclusive or Shared epochs for B • Every load or store occurs in some epoch and is bound to the transaction that epoch • Each word w of B received at the start of an epoch equals the most recent store to w

  12. DVSC-Indirect approach • DIVA to verify that memory operations occur in program order (Fact 1) • Recall from Architecture I: DIVA dynamically verifies a speculative core • DIVA presents in-order abstraction to memory system • ECC added to each cache and memory line to detect silent corruptions • Hardware for verifying epoch invariants

  13. Cache controller • Cache controller maintains Cache Epoch Table • CET entry (per-block): • Check that every load and store is performed in appropriate epoch (Lemma 2) • When epoch ends, send info to home memory controller in Inform-Epoch message (CET + end time and end data hash) S/E DRB Logical time at start Hash of data block at start of epoch 1 bit 1 bit 16 bits 16 bits S/E - type of epoch: shared or exclusive DRB - data ready bit

  14. Memory controller • Memory controller maintains directory-like Memory Epoch Table • MET entry (per-block): • When Inform-Epoch is received from cache controller: • Sort in priority queue (VWB) • Process in logical time order, checking: • This epoch does not overlap with other epochs • Correct block data is transferred from epoch to epoch Latest end time of any S epoch Latest end time of any E epoch Hash of data block from latest E epoch 16 bits 16 bits 16 bits

  15. DVSC-Indirect implementation CPU CPU CPU DIVA DIVA DIVA Cache CET Cache CET Cache CET Interconnect Memory VWB Memory VWB Memory VWB Verifier Verifier Verifier MET MET MET

  16. DVSC-Indirect snooping example

  17. DVSC-Indirect summary • Verifies SC through sub-invariants • Costs: • DIVA • Storage structures • CET at each cache controller, MET and VWB at each memory controller, ECC on cache and memory lines • Not large or complicated • Bandwidth usage • Proportional to coherence traffic (as opposed to number of loads and stores for DVSC-Direct)

  18. Evaluation • Can DVSC detect the errors from the error model? • Corrupted, dropped, misrouted, reordered, duplicated messages • Corrupted cache and memory blocks • Don’t consider errors in processor core • DIVA handles this • How much does DVSC increase bandwidth usage? • How does it affect error-free performance?

  19. Methodology • Full-system simulation with Simics • 8-node multiprocessor • Each processor implements SC, speculates for higher performance • Two levels of cache • Support for backward error recovery with SafetyNet at each node • DVSC-Indirect costs • Each CET 68 KB • Each MET 102 KB • VWB: 1024 entries • SPARC V9 running Solaris 8 • Interconnect: 2.5 GB/s links • Benchmarks • Four commercial workloads + barnes-hut from SPLASH-2

  20. Error coverage • DVSC-Direct and DVSC-Indirect detected all injected errors in simulation • Small probability of false negatives in DVSC-Indirect • ECC fails to detect a bit error • hash collisions • False positives also possible in DVSC-Indirect • When VWB not large enough to prevent out-of-order processing of Inform-Epochs

  21. Bottleneck link bandwidth • DVSC-Direct: uses 8-53 times more bandwidth • DVSC-Indirect directory: 8-25% increase • DVSC-Indirect snooping: 0-15% increase Bandwidth on most-utilized link - directory

  22. DVSC-Indirect error-free performance • Performance impact minimal: usually equivalent to that of SafetyNet by itself • Similar results for snooping Runtime with DVSC-Indirect compared to unprotected system - directory

  23. Conclusions • DVSC-Indirect is effective at detecting all memory system errors injected in the simulations • False negatives and false positives will occur with small probability • DVSC-Indirect imposes small error-free performance overhead • Bottleneck bandwidth usage with DVSC-Indirect is only 8-25% greater than unprotected case • Is the hardware cost justified? • Probably depends on the application reliability requirements

More Related