
Memory State Compressors for Gigascale Checkpoint/Restore


Presentation Transcript


  1. Memory State Compressors for Gigascale Checkpoint/Restore
  www.eecg.toronto.edu/aenao
  Andreas Moshovos, moshovos@eecg.toronto.edu

  2. Gigascale Checkpoint/Restore
  • Several Potential Uses:
    • Debugging
    • Runtime Checking
    • Reliability
    • Gigascale Speculation
  [Figure: an instruction stream with a checkpoint followed, many instructions later, by a restore trigger]

  3. Key Issues & This Study
  • Track and Restore Memory State
  • I/O?
  • This Work: Memory State Compression
  • Goals:
    • Minimize On-Chip Resources
    • Minimize Performance Impact
  • Contributions:
    • Used Value Prediction to simplify compression hardware
    • Fast, Simple and Inexpensive
    • Beneficial whether used alone or combined with other compressors

  4. Outline
  • Gigascale Checkpoint/Restore
  • Compressor Architecture: Challenges
  • Value-Prediction-Based Compressors
  • Evaluation

  5. Our Approach to Gigascale CR (GCR)
  • Checkpoint: the blocks that were written into
  • Current Memory State + Checkpoint = Previous Memory State
  • Checkpoints can be large (Mbytes) and we may want many
  [Figure: timeline of a checkpoint interval: (1) checkpoint begins, (2) each memory block is checkpointed on its first write, (3) a restore trigger occurs, (4) all checkpointed memory blocks are restored]
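  To make checkpoint-on-first-write concrete, here is a minimal software sketch in C. The paper implements this in hardware; the block size, table capacity, and function names (gcr_begin_checkpoint, gcr_on_first_write, gcr_restore) are illustrative assumptions, not the paper's interface.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64      /* assumed cache-block granularity */
#define MAX_BLOCKS  4096    /* assumed checkpoint capacity     */

/* One log entry: a block's address and its pre-write contents. */
typedef struct {
    uintptr_t addr;
    uint8_t   old_data[BLOCK_BYTES];
} ckpt_entry_t;

static ckpt_entry_t log_buf[MAX_BLOCKS];
static int          log_len = 0;

void gcr_begin_checkpoint(void) { log_len = 0; }

/* Call before the FIRST write to a block in the interval (step 2):
 * save the block's old contents so the write can be undone. */
void gcr_on_first_write(void *block) {
    ckpt_entry_t *e = &log_buf[log_len++];
    e->addr = (uintptr_t)block;
    memcpy(e->old_data, block, BLOCK_BYTES);
}

/* On a restore trigger (steps 3-4), copy every logged block back.
 * Current memory state + checkpoint = previous memory state. */
void gcr_restore(void) {
    for (int i = 0; i < log_len; ++i)
        memcpy((void *)log_buf[i].addr, log_buf[i].old_data, BLOCK_BYTES);
    log_len = 0;
}
```

  Because only the first write to each block is logged, the checkpoint grows with the write footprint of the interval, which is what makes compression of the logged blocks attractive.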

  6. Checkpoint Storage Requirements
  [Figure: maximum checkpoint size in bytes (1K to 1G, log scale) as a function of checkpoint interval in instructions]

  7. Architecture of a GCR Compressor
  [Figure: datapath from the L1 Data Cache through an in-buffer, the compressor, an alignment network, and an out-buffer to Main Memory; the buffer sizes determine both resources and performance]
  • Previous work: Compressor = Dictionary-Based
    • Relatively slow, complex alignment, on the order of 10K transistors
    • 64K in-buffer → ~3.7% avg. slowdown

  8. Our Compression Architecture
  • Standalone: comparable compression, fewer resources
  • In combination: fewer resources (smaller in-buffer), better compression, better performance
  [Figure: the VP compressor, with its simple alignment, forms an optional VP stage in front of the in-buffer, dictionary compressor, alignment network, and out-buffer on the path from the L1 Data Cache to Main Memory]

  9. Value-Predictor-Based Compression
  • Each value in the input stream is run through a value predictor
  • Predicted correctly: emit a single '1' bit to the output stream
  • Mispredicted: emit a '0' bit followed by the value itself

  10. Example (inputs over time: 0, 22, 22)
  • Input 0: the (initially empty) predictor mispredicts → emit "0 0"
  • Input 22: the predictor (last value 0) mispredicts → emit "0 22"
  • Input 22: the predictor (last value 22) predicts correctly → emit "1"
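  A minimal C sketch of this single-entry last-outcome scheme, reproducing the trace above. The struct and function names are mine, and the real hardware emits packed bits rather than whole words; this is a sketch of the technique, not the paper's design.

```c
#include <stdint.h>
#include <stdio.h>

/* Single-entry last-outcome predictor; the first access always misses. */
typedef struct { uint32_t last; int valid; } vp_t;

/* Compressed symbol: a hit bit, plus the raw value only on a miss. */
typedef struct { int hit; uint32_t value; } vp_sym_t;

vp_sym_t vp_compress(vp_t *p, uint32_t v) {
    vp_sym_t s = { .hit = p->valid && p->last == v, .value = v };
    p->last  = v;     /* the predictor always updates */
    p->valid = 1;
    return s;
}

/* Decompression mirrors compression with its own predictor copy:
 * a hit regenerates the value from the predictor state. */
uint32_t vp_decompress(vp_t *p, vp_sym_t s) {
    uint32_t v = s.hit ? p->last : s.value;
    p->last  = v;
    p->valid = 1;
    return v;
}

int main(void) {
    vp_t c = {0}, d = {0};
    uint32_t in[] = { 0, 22, 22 };  /* the stream from the slide */
    for (int i = 0; i < 3; ++i) {
        vp_sym_t s = vp_compress(&c, in[i]);
        if (s.hit) printf("1      -> %u\n", vp_decompress(&d, s));
        else       printf("0 %-4u -> %u\n", s.value, vp_decompress(&d, s));
    }
    return 0;
}
```

  Running this prints "0 0", "0 22", then "1", each decompressing back to the original value, matching the slide's trace.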

  11. Block VP-Based Compressor
  • Shown is the Last-Outcome Predictor
  • Studied others (four combinations per word)
  • Each 16-word cache block uses a single-entry predictor per word (word 0 through word 15)
  • Output per block: a one-word header holding the block address and the per-word prediction bits, followed by the values of the mispredicted words
  • Only half-word alignment is needed (see the sketch below)
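  A C sketch of the per-block scheme under the layout described above. For clarity it emits the address and the 16-bit hit mask as two separate output words, whereas the slide packs them into a single header word; the names and packing are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define WORDS_PER_BLOCK 16

/* One single-entry last-outcome predictor per word position. */
typedef struct {
    uint32_t last[WORDS_PER_BLOCK];
    int      valid[WORDS_PER_BLOCK];
} blk_vp_t;

/* Compress one cache block.  Assumed output layout:
 *   out[0] = block address, out[1] = 16-bit hit mask,
 *   then one word per mispredicted position, in word order.
 * Returns the number of output words written. */
size_t blk_compress(blk_vp_t *p, uint32_t addr,
                    const uint32_t blk[WORDS_PER_BLOCK], uint32_t *out) {
    uint32_t mask = 0;
    size_t n = 2;
    for (int w = 0; w < WORDS_PER_BLOCK; ++w) {
        if (p->valid[w] && p->last[w] == blk[w])
            mask |= 1u << w;       /* predicted: one bit in the mask only */
        else
            out[n++] = blk[w];     /* mispredicted: append the raw value  */
        p->last[w]  = blk[w];      /* every per-word predictor updates    */
        p->valid[w] = 1;
    }
    out[0] = addr;
    out[1] = mask;
    return n;
}
```

  Decompression walks the mask in word order: a set bit regenerates that word from its per-word predictor, a clear bit consumes the next value word from the stream. Since a block shrinks by whole words plus a fixed header, alignment stays simple compared with a bit-granular dictionary output.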

  12. Evaluation
  • Compression Rates
    • Compared with LZW
  • Performance
    • As a function of in-buffer size

  13. Methodology
  • Simplescalar v3
  • SPEC CPU 2000 with reference inputs
  • Ignore the first checkpoint to avoid artificially skewing the results
  • Simulated up to:
    • 80 billion instructions (compression rates)
    • 5 billion instructions (performance)
  • 8-way OOO superscalar
  • 64K L1D and L1I, 1M unified L2

  14. Compression Rate vs. LZW
  [Figure: compression rate compared with LZW at a 256M-instruction checkpoint interval; higher is better]

  15. Performance Degradation
  • LZW + 64K buffer → ~3.7% slowdown
  • LZW + Last-Outcome (LO) stage + 1K buffer → 1.6% slowdown
  [Figure: slowdown comparison; lower is better]

  16. Summary
  • Memory State Compression for Gigascale CR
    • Many potential applications
  • Used simple value-prediction compressors
    • Few resources, low complexity, fast
  • Can be used alone
  • Can be combined with dictionary-based compressors
    • Reduced on-chip buffering
    • Better performance
  • Main memory compression?
