1 / 24

Design Alternatives for SAS: The Beauty of Mobile Homes

This article explores alternative design options for SAS to overcome capacity limitations and improve performance in mobile homes. It discusses communication elimination, hardware support reduction, weak consistency exploitation, and cache-only memory architecture.

heal
Download Presentation

Design Alternatives for SAS: The Beauty of Mobile Homes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Design Alternatives for SAS:The Beauty of Mobile Homes CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley

  2. Some Questions you might ask • Can all unnecessary communication be eliminated? • capacity-related communication? • false-sharing? • How much hardware support can be eliminated? • Can weak consistency models be exploited to reduce communication? • Can we simplify hardware coherence mechanisms while avoiding capacity-related communication? CS258 S99

  3. Overcoming Capacity Limitations • Seen big remote caches • 32 MB on NUMA-Q • What about using region of local mem as remote cache? • basic operation is to access mem. and check tag state • dispatch to specific protocol action Rmt $ dir M Rmt $ CA CA dir M $ $ P P CS258 S99

  4. Well? • Does it eliminate communication on capacity misses? • How much of memory is wasted? • Do we need a home memory at all? CS258 S99

  5. Cache-Only Memory Arch (COMA) • View entire local memory as ‘attraction memory’ • Associate tag with every memory block • Whenever a block as access and brought into cache, also bring it into local AMem. • keep cache consistent with AMem • keep Amem’s consistent • Location decoupled from physical address • What new issues arise? • Recall: basic components • find state information • find copies • communication with copies CS258 S99

  6. Finding a block on a miss? • Hierarchical Snooping COMA? • Hierarchical Directory COMA? • Flat Directory COMA? CS258 S99

  7. Last Copy Replacement Problem • How to avoid discarding the only copy of a block? • Hierarchical? • Flat? CS258 S99

  8. Hardware/Software Tradeoffs? • What aspects of SW are simplified? • What extra HW support is required? • memory tags and comparators • memory for replication • Why is HW support more complicated? • discover location on miss • no space preallocated CS258 S99

  9. Performance Trade-Offs • Artrifactual Communiation? • Remote access latency? • Local memory access latency? CS258 S99

  10. Which access patterns win/lose • Miss rate low? • Miss rate ~low • mostly coherence misses • true sharing • false sharing • mostly capacity misses or poor initial data placement • course grained • blocking • fine grained • unpredictable CS258 S99

  11. Plenty of replication memory • Why is it important for performance? CS258 S99

  12. Hardware support • Four fundamental components of the comm assist • access control checks • per-block tags and state used in the check • protocol processing • network interface • Case for tight integration • access control check must see every load/store miss to shared data • requires checking per mem. block tag • protocol processing involves update cache/dir state • move small data items to/from NI • Case against • L1 and L2 cache controller’s on chip, closest you can get is the memory controller CS258 S99

  13. Decoupled Assist Options CS258 S99

  14. M M M $ $ $ P P P SAS w/o hardware support? • Treat memory as fully-associative cache for global shared virtual address space • Unit of coherence: page • Basic components • Access control? • Tag and state check? • Protocol processing? • Communication? • Problems? Shared Virtual Address Space Same virtual address represented at different physical addresses on each processor! - what needs to be invalidated? Inclusion?? CS258 S99

  15. Exploiting Weak Consistency • So far in HW approaches • changes when invalidations must be processed • avoid stalling processor while invalidations processed • still propagate invalidations ASAP • Can invalidations be avoided? SC x and y on same page! RC P0 P1 ... W(x) ... R(y) ... W(x) ... R(y) barrier barrier ... W(x) ... R(y) P0 P1 ... W(x) ... R(y) ... W(x) ... R(y) barrier barrier ... W(x) ... R(y) CS258 S99 propagate inv. at synch points!

  16. When should inv be propagated? • Eager Release Consistency • At release propagate invalidations due to writes and wait for acks before proceding past the release point • conservative, actually! • need to propagate before another processor does an acquire CS258 S99

  17. Lazy Release Consistency • Associate invalidations with release • On acquire, process invalidations for logically preceeding releases • apply them to relavant pages • Respect program order and causal order CS258 S99

  18. Causal Order CS258 S99

  19. Relationship to HW coherence • Is LRC coherent? • writes not propagated unless synchronization occurs • different processes may see writes through different synchronization chains • LRC and RC are different consistency models! CS258 S99

  20. Multiple Writer Protocols • How do we avoid false sharing traffic when two processors write to different locations in a page? CS258 S99

  21. Options • Treadmarks: • protect page till first write • make twin at first write from processor • at release, compute diff. • propagate at release, when demanded at acquire, ... • on page fault, collect and merge diffs from all copies • storage reclamation for diffs? • Home based protocol • propagate diffs into home at release • on fault, get full page • write-thru to other pages (Shrimp, Memory channel) CS258 S99

  22. Further Weakening • Entry consistency • associated regions with each synch variable • only propagate invalidations for associated region • Jade • similar, but language construct • Scope consistency CS258 S99

  23. Middle Ground: Simple-COMA, Stache • automatic migration at page level controlled in software • fine grain access control in hardware • page fault: • allocate page in local memory, but leave all blocks invalid • page hit, cache miss: • access tag in parallel with memory access • can be separate memory • physical address valid (not uniform) • on protocol transactions, reverse translate to shared virtual address • No HW tag comparison. (just state) • No local/remote check! $ P CS258 S99

  24. Conclusions • Memory is a binding of names to values • Modern shared address space designs deeply separate NAMES from LOCATIONS • Share virtual addresses • COMA has uniform virtual->’physical’, but physical can migrate • SVM use distinct virtual->physical to achieve page-level sharing • must exploit weak consistency models so artifactual communication doesn’t dominate • Simple-COMA uses mapping to simplify HW while providing migration! CS258 S99

More Related