
Architecture and Design of AlphaServer GS320

Presented by Vijeta Johri, 02/13/04.


Presentation Transcript


  1. Architecture and Design of AlphaServer GS320 • Presented by Vijeta Johri, 02/13/04

  2. Motivation • Much higher demand for small- and medium-scale multiprocessors than for larger servers • Scarcity of scalable applications and operating systems • High reliability and fault containment are hard to achieve • GS320 targets medium-scale multiprocessing • Takes advantage of the smaller system size • Eliminates inefficiencies of directory-based protocols

  3. AlphaServer GS320 Architecture • Hierarchical shared-memory multiprocessor • 8 QBBs (quad-processor building blocks), each with: • 10-port local switch • 4 Alpha 21264 processors, each with separate on-chip I and D caches and an external cache • 4 memory modules • 1-8GB of SDRAM • IO interface supporting 8 PCI buses

  4. AlphaServer GS320 Architecture • QBBs (contd.) • DIR (directory) • 14-bit entry per 64-byte memory line • 6-bit owner field • 8-bit coarse vector at QBB granularity (one bit per QBB) • Dirty sharing supported • DTAG (duplicate tag store) • Functions as a centralized full-map directory • Maintains coherence within the QBB • TTT (transactions-in-transit table) • 48-entry associative table • Global switch • Supports virtual lanes and multicast
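The DIR field widths on this slide can be illustrated with a small packing sketch. This is only an illustration under stated assumptions: the slide gives the widths (6-bit owner, 8-bit coarse vector, 64-byte line), but the bit layout (owner in the high bits), the owner encoding, and the function names `pack_entry`, `unpack_entry` and `directory_overhead` are hypothetical.

```python
OWNER_BITS = 6     # from the slide: 6-bit owner field
VECTOR_BITS = 8    # from the slide: 8-bit coarse vector, one bit per QBB
LINE_BYTES = 64    # from the slide: one entry per 64-byte memory line


def pack_entry(owner, sharer_qbbs):
    """Pack an owner id and a set of sharing QBB numbers into 14 bits.

    Assumed layout (not from the slide): owner in the high 6 bits,
    coarse vector in the low 8 bits.
    """
    if not 0 <= owner < (1 << OWNER_BITS):
        raise ValueError("owner does not fit in 6 bits")
    vector = 0
    for qbb in sharer_qbbs:
        if not 0 <= qbb < VECTOR_BITS:
            raise ValueError("QBB number must be in 0..7")
        vector |= 1 << qbb
    return (owner << VECTOR_BITS) | vector


def unpack_entry(entry):
    """Return (owner, set of sharing QBB numbers) for a packed entry."""
    owner = entry >> VECTOR_BITS
    vector = entry & ((1 << VECTOR_BITS) - 1)
    sharers = {qbb for qbb in range(VECTOR_BITS) if (vector >> qbb) & 1}
    return owner, sharers


def directory_overhead():
    """Directory bits per data bit: 14 bits for every 64-byte line."""
    return (OWNER_BITS + VECTOR_BITS) / (LINE_BYTES * 8)
```

With these widths the directory costs 14 bits per 512 data bits, or roughly 2.7% of memory. Because the vector tracks QBBs rather than individual processors, an invalidation sent to a marked QBB still has to be resolved to specific caches inside that QBB.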

  5. Cache Coherence • Goals • Make common transactions efficient • Exploit the small system size and interconnect ordering properties to reduce: • Protocol messages • Resource occupancy • Invalidation-based protocol • 4 request types • Read • Read-exclusive • Exclusive • Exclusive-without-data • Reply forwarding from remote owners • Eager exclusive replies

  6. Cache Coherence • Handles corner cases without NAKs/retries and without blocking at the home directory • Guarantees that the owner node always services a forwarded request • All transactions complete with at most 1 message to the home • Directory controller implemented as a simple pipelined state machine • Eliminates livelock and starvation • Virtual lanes • Q0: processor to home (point-to-point order) • Q1: home/memory to processors (total order) • Q2: replies from a third-party node or processor to the requestor
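The three virtual lanes can be summarized as a lookup from sender role to lane. This is a minimal sketch: the lane names and the Q0/Q1 ordering properties come from the slide, while the role strings, the `Lane` enum and the `lane_for_sender` helper are hypothetical names introduced for illustration. The slide does not state an ordering for Q2, so none is recorded here.

```python
from enum import Enum


class Lane(Enum):
    Q0 = "Q0"  # processor to home
    Q1 = "Q1"  # home / memory to processors
    Q2 = "Q2"  # third-party node or processor to requestor


# Ordering guarantee per lane as given on the slide; None = not stated there.
LANE_ORDERING = {
    Lane.Q0: "point-to-point",
    Lane.Q1: "total",
    Lane.Q2: None,
}

# The four request types from the protocol; all travel from a processor
# to the home, i.e. on Q0.
REQUEST_TYPES = ("read", "read-exclusive", "exclusive", "exclusive-without-data")


def lane_for_sender(sender_role):
    """Pick the lane for a message based on who sends it.

    sender_role is one of the hypothetical labels
    'processor' (request to home), 'home', or 'third_party'.
    """
    table = {
        "processor": Lane.Q0,
        "home": Lane.Q1,
        "third_party": Lane.Q2,
    }
    return table[sender_role]
```

Keeping requests, home-originated messages and third-party replies on separate lanes means a reply never has to queue behind a request.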

  7. Cache Coherence • Dealing with the late request race • 2-level mechanism • Wait for the victim signal before discarding a line from the victim buffer • For a writeback to a remote home, the TTT keeps a copy • Dealing with the early request race • Delay the forwarded request on Q1 until the data arrives on Q2 • Allow transactions to be served within a node • No invalidate acknowledgement messages • Multicast is used to send Q1 messages to multiple nodes
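The early request race handling (delay a forwarded request on Q1 until the data arrives on Q2) can be sketched as a toy model of the requesting node. This is an illustration only, assuming a node that simply queues forwarded requests per line; the class `RequesterNode` and its method names are hypothetical and do not describe the actual hardware.

```python
from collections import defaultdict, deque


class RequesterNode:
    """Toy model of a node that may receive a forwarded request early."""

    def __init__(self):
        self.pending = set()                 # lines with a miss outstanding
        self.cache = {}                      # line -> data held by this node
        self.delayed = defaultdict(deque)    # line -> waiting forwarded requests
        self.serviced = []                   # (requestor, line, data) log

    def issue_miss(self, line):
        """Record that this node has requested the line from the home."""
        self.pending.add(line)

    def on_q1_forward(self, line, requestor):
        """A forwarded request arrives on Q1 from the home."""
        if line in self.pending:
            # Early request race: our own data has not arrived yet, so
            # hold the forwarded request instead of NAKing it.
            self.delayed[line].append(requestor)
        else:
            self._service(line, requestor)

    def on_q2_data(self, line, data):
        """The data reply arrives on Q2; release any delayed requests."""
        self.cache[line] = data
        self.pending.discard(line)
        while self.delayed[line]:
            self._service(line, self.delayed[line].popleft())

    def _service(self, line, requestor):
        # Assumes this node holds the line at this point.
        self.serviced.append((requestor, line, self.cache[line]))
```

For example, after `issue_miss(0x40)` a call to `on_q1_forward(0x40, "B")` services nothing; once `on_q2_data(0x40, "payload")` runs, the request from `"B"` is serviced with that data. The forwarded request is always serviced eventually, which matches the guarantee that the owner node always services a forwarded request.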

  8. Cache Coherence • R: Requestor • H: Home • O: Owner • S: Sharer • Dirty sharing • No sharing writeback • Marker message • Allows the requestor node to disambiguate the order of requests

  9. Memory Consistency Optimizations • The Alpha memory model is supported • Barrier instructions impose memory ordering • To acknowledge invalidations early but safely, each reply message is split into • A data component, needed to service the request • A commit component, used for ordering • Early commit components are generated for read and read-exclusive requests
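The split between data and commit components can be sketched as two independent sets of outstanding events at the processor. This is a simplified model under stated assumptions: the data component lets the requesting access proceed, and a barrier waits only for outstanding commit components. The class `Processor` and its method names are hypothetical, and the real design has further conditions not modeled here.

```python
class Processor:
    """Toy model tracking data and commit components separately."""

    def __init__(self):
        self.awaiting_data = set()     # transactions whose data has not arrived
        self.awaiting_commit = set()   # transactions whose commit has not arrived

    def issue(self, txn):
        """Start a read or read-exclusive transaction."""
        self.awaiting_data.add(txn)
        self.awaiting_commit.add(txn)

    def on_data(self, txn):
        """Data component: enough to service the request itself."""
        self.awaiting_data.discard(txn)

    def on_commit(self, txn):
        """Commit component: used only for ordering."""
        self.awaiting_commit.discard(txn)

    def can_use_data(self, txn):
        return txn not in self.awaiting_data

    def barrier_may_complete(self):
        # Assumption: a barrier waits until every earlier transaction
        # has received its commit component.
        return not self.awaiting_commit
```

In this model the two components may arrive in either order: the access can use its data as soon as the data component arrives, while a later barrier is held back only by missing commit components.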

  10. Performance Evaluation • Relatively high back-to-back read latency • Effective latency is smaller for independent read misses • Smaller L2 hit latency than snoopy systems • The latency impact of sending invalidations is small and independent of the number of sharers • Local-home writes with remote sharers take longer than writes with no sharers when a barrier is involved • Conflicting writes to the same line have approximately the same latency as a 1-hop write

  11. Questions • Do you think that the AlphaServer GS320 can completely replace snoopy systems, and if not, why? • What are the major disadvantages of the AlphaServer GS320?
