Speculative Sequential Consistency with Little Custom Storage

Chris Gniady and Babak Falsafi. Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University. http://www.ece.cmu.edu/~puma2


Presentation Transcript


  1. Speculative Sequential Consistency with Little Custom Storage
     Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University
     http://www.ece.cmu.edu/~puma2
     Chris Gniady and Babak Falsafi

  2. Distributed Shared Memory (DSM)
     [Diagram labels: CPU, Cache, Memory Bus, Network, Memory, DSM Hardware]
     Logically shared but physically distributed memory
     • Shared-memory programming
     • Scalable
     • Long shared memory access can be a bottleneck!

  3. Programming DSM
     To achieve high performance:
     • Release Consistency (RC)
     • Relaxes memory order
     • Software annotation
     What programmers want:
     • Sequential Consistency (SC)
     • Intuitive
     • Memory order enforced → slow
     Prior work: Speculative SC (SC++) [ISCA’99]
     • Hardware speculatively relaxes order
     • High performance & intuitive
     • Large custom “speculative history” queue

  4. This Talk’s Contributions
     • Characterize history size across apps
     • Varies from 16 to 8K entries!
     • Bursty: Over 85% of time empty
     • Propose SC++Lite
     • Allocates history in memory hierarchy
     • Enhances scalability across apps & systems
     • Reduces custom storage from 51 KB to 2 KB
     Result → Speculative SC (almost) for Free!

  5. Outline
     • Overview
     • Memory Ordering in RC
     • Memory Ordering in SC++
     • SC++Lite: SC++ with Little Custom Storage
     • Results
     • Conclusions

  6. Memory Ordering in RC
     [Diagram labels: LD/ST Queue (ST X Miss, LD Y Miss, LD Z Miss); Reorder Buffer (ST X, LD A, ST A, LD Y, LD Z); ALU; Out of order; Retired]
     • “LD A” & “ST A” retire out of order
     • Overlaps “ST X”, “LD Y” & “LD Z” misses
     • Software guarantees overlap is ok!

  7. SC++: Hardware Relaxes Memory Order [ISCA’99]
     [Diagram labels: LD/ST Queue (ST X Miss, LD Y Miss, LD Z Miss); Reorder Buffer (ST X, LD A, ST A, LD Y, LD Z); ALU; Speculative Retirement; Speculative History Queue; Coherence Messages; Look up for potential rollback]
     • Speculatively retires instructions in hardware
     • Rolls back when coherence messages hit in history
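The mechanism on this slide can be pictured with a small software model. This is only a sketch under stated assumptions, not the SC++ hardware design: the class and method names (`SpeculativeHistoryQueue`, `retire`, `commit_through`, `coherence_lookup`) and the entry layout are hypothetical, and the model tracks only block addresses, not the state a real design would need to undo a rollback.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """One speculatively retired memory instruction (hypothetical layout)."""
    seq: int         # program-order sequence number
    block_addr: int  # cache-block address the instruction touched


class SpeculativeHistoryQueue:
    """Toy model of a bounded speculative history queue."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()  # oldest entry on the left

    def retire(self, entry: HistoryEntry) -> bool:
        """Log a speculative retirement; False means the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def commit_through(self, seq: int) -> None:
        """Drop entries up to and including seq once they are no longer speculative."""
        while self.entries and self.entries[0].seq <= seq:
            self.entries.popleft()

    def coherence_lookup(self, block_addr: int):
        """On a coherence message, return the seq to roll back to, or None on a miss."""
        hit = next((e.seq for e in self.entries if e.block_addr == block_addr), None)
        if hit is None:
            return None
        # Squash the matching entry and everything younger than it.
        while self.entries and self.entries[-1].seq >= hit:
            self.entries.pop()
        return hit
```

The fixed `capacity` is what makes the on-chip queue costly: it has to be chosen at design time, while the number of live entries depends on the application and system.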

  8. SC++’s Implementation Overhead
     Speculative History Queue:
     • On-chip custom storage
     • Grows up to subsequent missing load
     • Size is application & system dependent
     • Must assume worst-case size at design!
     Can we (virtually) eliminate custom storage in SC++?

  9. SC++Lite: SC++ with Little Custom Storage
     Store history into memory hierarchy!
     • Queue allocated at boot time in physical memory
     • Use block buffer to pack history, ship to L2
     • Store ack updates head pointer (in LD/ST queue)
     • ROB retirement updates tail pointer
     • “Dead” history is not written back!
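The bullets above describe a queue with a head and a tail pointer whose entries are packed into cache blocks. A minimal sketch of that bookkeeping follows. It is an illustration under assumptions, not the SC++Lite implementation: the class name `MemoryBackedHistory`, its fields, and the choice to log one block address per entry are all hypothetical.

```python
from collections import deque


class MemoryBackedHistory:
    """Toy model of a history queue kept in the memory hierarchy."""

    def __init__(self, capacity_entries: int, entries_per_block: int):
        self.capacity = capacity_entries
        self.entries_per_block = entries_per_block
        self.head = 0             # oldest live entry; advanced by store acks
        self.tail = 0             # next free slot; advanced by ROB retirement
        self.block_buffer = []    # small on-chip buffer being packed
        self.l2_blocks = deque()  # (first_index, entries) blocks shipped to L2
        self.blocks_shipped = 0
        self.blocks_discarded = 0

    def retire(self, block_addr: int) -> bool:
        """Log one retired instruction; False means the region is full."""
        if self.tail - self.head >= self.capacity:
            return False
        self.block_buffer.append(block_addr)
        self.tail += 1
        if len(self.block_buffer) == self.entries_per_block:
            # A full block of history is shipped to L2 in one transfer.
            first = self.tail - self.entries_per_block
            self.l2_blocks.append((first, self.block_buffer))
            self.block_buffer = []
            self.blocks_shipped += 1
        return True

    def store_ack(self, new_head: int) -> None:
        """Advance the head; blocks that are entirely dead are dropped, not written back."""
        self.head = max(self.head, min(new_head, self.tail))
        while self.l2_blocks and self.l2_blocks[0][0] + self.entries_per_block <= self.head:
            self.l2_blocks.popleft()
            self.blocks_discarded += 1

    def live_entries(self) -> int:
        return self.tail - self.head
```

For example, with four entries per block, retiring six instructions ships one block and leaves two entries in the buffer; a store ack that moves the head to index 4 makes the shipped block dead, so it is discarded rather than written back.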

  10. Memory Ordering in SC++Lite
     [Diagram labels: LD/ST Queue (ST X Miss, ST Z Miss, LD Y Miss, LD Z Miss; Head, Index, ROB); Reorder Buffer (LD A, ST A, LD Y, LD Z); ALU; Speculative Retirement; Speculative Block Buffer; Cache block to L2; Location in L2; Coherence Messages; Look up for potential rollback]
     • Only history burst retires into L2
     • History in L2 typically discarded

  11. SC++Lite Design Requirements
     Avoid perturbing application’s critical path!
     SBB (Speculative Block Buffer):
     • Size depends on L2 latency & retirement rate
     • Large enough to filter store hits into L2
     L2:
     • Retirement rate proportional to required bandwidth
     • Large blocks help
     • Small blocks may need multiporting
     • Head & tail registers reduce history traffic
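The two sizing dependencies on this slide can be written as back-of-the-envelope formulas. The sketch below is an assumption, not the authors' sizing method: it takes the buffer to need roughly one entry for each instruction retired during one L2 access, and the history bandwidth to be retirement rate times entry size times clock rate. The function names and all numeric inputs are hypothetical.

```python
import math


def min_sbb_entries(l2_latency_cycles: float, retire_per_cycle: float) -> int:
    """Entries needed to absorb retirements while one L2 access is in flight."""
    return math.ceil(l2_latency_cycles * retire_per_cycle)


def history_bandwidth_gb_per_s(retire_per_cycle: float,
                               entry_bytes: float,
                               clock_ghz: float) -> float:
    """History traffic into L2: entries/cycle x bytes/entry x 10^9 cycles/s."""
    return retire_per_cycle * entry_bytes * clock_ghz


# Illustrative inputs only; none of these values are taken from the talk.
print(min_sbb_entries(l2_latency_cycles=12, retire_per_cycle=2))
print(history_bandwidth_gb_per_s(retire_per_cycle=2, entry_bytes=16, clock_ghz=1.0))
```

Both quantities grow linearly with the retirement rate, which is why the slide ties the L2 bandwidth requirement to it.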

  12. Outline
     • Overview
     • Memory Ordering in RC
     • Memory Ordering in SC++
     • SC++Lite: SC++ with Little Custom Storage
     • Results
     • Conclusions

  13. Experimental Methodology
     Using RSIM:
     • 16 nodes with 1 GHz, 8-issue CPU
     • 128-entry ROB & LD/ST queue
     • Average remote-to-local access ratio of ~2
     • 32-Kbyte, direct-mapped L1 cache
     • 512-Kbyte, 8-way L2 cache, 64 GB/s
     • 256-entry Lookup Table
     • 32-entry SBB

  14. History Size Characterization
     • System & application dependent: varies 16–4K
     • History is bursty: non-empty < 15% time

  15. Base RC, SC++ & SC++Lite
     • Up to 80% gap between SC & RC
     • 31% average speedup for SC++, 28% for SC++Lite

  16. Sensitivity to 4x Network Latency
     • SC++ requires 2x queue size to perform best
     • SC++Lite’s performance remains stable

  17. Custom Storage Requirements
     SC++:
     • ~51 KB of custom storage
     • Doubles for 4x network latency
     • Radix shows worst-case history
     SC++Lite:
     • ~2 KB of custom storage for all apps
     • Performance insensitive to network latency

  18. Conclusions
     Previously showed [ISCA’99]:
     • Speculative SC achieves RC’s performance
     This talk:
     • Proposed SC++Lite
     • Allocates history in memory hierarchy
     • Enhances scalability across apps & systems
     Result → Speculative SC (almost) for Free!

  19. For More Information
     Please visit our web site at:
     Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University
     http://www.ece.cmu.edu/~puma2