The Alpha 21364 Network Architecture

Mukherjee, Bannon, Lang, Spink, and Webb. Summary slides by Fred Bower, ECE 259, Spring 2004.


  1. The Alpha 21364 Network Architecture • Mukherjee, Bannon, Lang, Spink, and Webb • Summary Slides by Fred Bower • ECE 259, Spring 2004

  2. It’s A Small Paper… • …Packed With Detail • Overview At High Level • 21364 Chip Features and Built-In MP Constructs • Network, Routing, and Router Basics • More Depth • Routing Policies • Deadlock Avoidance Via Routing Policies • What’s In A Router? • Discussion

  3. 21364 Overview • 21264 Core With MP Additions • MC = Memory Controller • Router • Directory-Based CC • Runs at Core Clock • Buffering Capability • 1.75 MB L2 Cache
     Figure 1: The Alpha 21364 Floorplan

  4. The 21364 Network Topology • 2-d Torus • Limited Support for Imperfect Tori • Allows Fault Remapping • Virtual Cut-Through • 316* Packet Router Buffer • Simple, Adaptive Routing • Constrained Within Minimum Rectangle
     Figure 2: A 12-Processor 21364 Network Configuration
     *316 Total Packets of Buffer Capacity, Divided Unevenly Amongst Classes and Ports

  5. Packet Classes • Seven Packet Classes • Request (3 Flits) • Forward (3 Flits) • Block Response (18 or 19 Flits) • Non-Block Response (2 or 3 Flits) • Write I/O (19 Flits) • Read I/O (3 Flits) • Special (1 or 3 Flits) • Flits Are 32 Bits Data Plus 7 Bits ECC
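The flit counts on this slide can be tabulated to see how many bits each packet class occupies on a link. The following is a minimal sketch, not taken from the paper: the names `PACKET_CLASSES`, `FLIT_BITS`, and `packet_bits` are hypothetical, and the only data used are the flit counts and the 32 + 7 bit flit layout stated above.

```python
# Flit counts per packet class as listed on the slide: (min_flits, max_flits).
# The dictionary and function names are illustrative, not from the paper.
PACKET_CLASSES = {
    "Request": (3, 3),
    "Forward": (3, 3),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19, 19),
    "Read I/O": (3, 3),
    "Special": (1, 3),
}

DATA_BITS_PER_FLIT = 32
ECC_BITS_PER_FLIT = 7
FLIT_BITS = DATA_BITS_PER_FLIT + ECC_BITS_PER_FLIT  # 39 bits per flit


def packet_bits(packet_class, longest=True):
    """Total link bits (data plus ECC) for one packet of the given class."""
    min_flits, max_flits = PACKET_CLASSES[packet_class]
    flits = max_flits if longest else min_flits
    return flits * FLIT_BITS


if __name__ == "__main__":
    for name, (lo, hi) in PACKET_CLASSES.items():
        print(f"{name}: {lo}-{hi} flits, up to {packet_bits(name)} bits")
```

Under these numbers, a 19-flit block response carries 19 × 32 = 608 data bits, of which a 64-byte cache block would account for 512; the remainder being header information is an inference, not something the slide states.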

  6. Routing Policies: Minimum Rectangle • Four Rectangles With Current and Destination At Diagonals • Recall 2-d Torus – All Edges Wrap • Constrain Adaptive Routing To Minimum Rectangle (Center of Figure 3)
     Figure 3: Routing Rectangles
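The minimum-rectangle constraint can be illustrated by computing, for each dimension of the torus, the shorter way around the ring. This is a sketch under stated assumptions, not the 21364's routing logic: the coordinate system, the direction names, the tie-breaking rule when both ways around are equally long, and the function names `minimal_offset` and `productive_directions` are all hypothetical.

```python
def minimal_offset(src, dst, size):
    """Signed shortest offset from src to dst on a ring of `size` nodes.

    Positive means travel in the increasing direction, negative means
    travel the other way through the wrap-around edge. Ties are broken
    toward the positive direction (an arbitrary choice for this sketch).
    """
    forward = (dst - src) % size   # hops going "up" the ring
    backward = forward - size      # same destination going "down" (<= 0)
    return forward if forward <= -backward else backward


def productive_directions(cur, dst, width, height):
    """Directions that keep a packet inside the minimum rectangle.

    `cur` and `dst` are (x, y) coordinates on a width x height torus.
    An adaptive router would pick among the returned directions.
    """
    dx = minimal_offset(cur[0], dst[0], width)
    dy = minimal_offset(cur[1], dst[1], height)
    directions = []
    if dx > 0:
        directions.append("East")
    elif dx < 0:
        directions.append("West")
    if dy > 0:
        directions.append("North")
    elif dy < 0:
        directions.append("South")
    return directions


if __name__ == "__main__":
    # On a 4 x 3 torus, going from (0, 0) to (3, 1) is shorter via the
    # wrap-around edge in x, so West and North are the productive choices.
    print(productive_directions((0, 0), (3, 1), 4, 3))
```

Each hop in a returned direction reduces the remaining distance, so a packet routed this way never leaves the smallest of the four rectangles.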

  7. Routing Basics • Decode Of Packet Determines Routing • Use Of Lookup Tables For Destination Resolution, Virtual Channel Assignments, and Broadcast Invalidation Clusters • First Flit Has Routing And Packet Information • ECC Checked/Corrected At Each Router • Routers May Rewrite ECC • Routers Send Feedback About Buffer Availability

  8. Avoiding Coherence Deadlocks • Virtual Channels Break Cyclic Dependence • Separate Channel For Each Packet Class • Guarantees Independence of Class Traffic • Additional Ordering Constraint Amongst Classes of Packets • Additional Measures To Preserve I/O Consistency • Force Same-Class Requests To Arrive In-Order Using Deadlock-Free Virtual Channels • Allow I/O Writes To Pass I/O Reads Using Separate Virtual Channels For Reads and Writes • Prevent I/O Reads From Passing I/O Writes To Preserve Ordering Rules

  9. Avoiding Routing Deadlocks • 19 Virtual Channels • 3 Networks For Each of 6 Packet Classes Plus 1 Special • Adaptive, VC0, and VC1 • Adaptive Is First Choice • VC0 and VC1 Provide Guaranteed Drain If Adaptive Blocked • Careful Selection of Rules To Break Deadlocks Within Dimensions and Across Dimensions
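The count of 19 virtual channels follows from the breakdown on this slide: three networks for each of six packet classes, plus one for the Special class. A minimal sketch that enumerates them is shown here; the tuple representation and the names `ROUTED_CLASSES`, `SUBNETWORKS`, and `virtual_channels` are assumptions made for illustration.

```python
from itertools import product

# The six packet classes that each get three virtual channels
# (class names taken from the Packet Classes slide).
ROUTED_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
]

# Adaptive is tried first; VC0 and VC1 provide the guaranteed drain path.
SUBNETWORKS = ["Adaptive", "VC0", "VC1"]


def virtual_channels():
    """Enumerate (packet_class, subnetwork) pairs, one per virtual channel."""
    channels = list(product(ROUTED_CLASSES, SUBNETWORKS))
    # The Special class gets a single channel; representing its
    # subnetwork as None is a choice made for this sketch.
    channels.append(("Special", None))
    return channels


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 * 3 + 1
```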

  10. Internals Of The Router • Pipelined Design • 9 Pipeline Types Based Upon Input X Output Mapping • Input/Output Either Local, Interprocessor, or I/O • 13 Cycle In To Out Latency • Key To Performance (Smaller Better) • Recall Chip-Side At 1.2 GHz • Network-Side Speed At 800 MHz • Clock Sent With Outgoing Packets
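The 13-cycle figure can be converted to wall-clock time. This sketch assumes the 13 cycles are counted at the 1.2 GHz chip-side clock, which the slides suggest (the router runs at the core clock) but do not state outright; the function name `router_latency_ns` is hypothetical.

```python
CORE_CLOCK_HZ = 1.2e9   # chip-side clock from the slide
LINK_CLOCK_HZ = 0.8e9   # network-side clock from the slide
ROUTER_CYCLES = 13      # in-to-out latency in router cycles


def router_latency_ns(cycles=ROUTER_CYCLES, clock_hz=CORE_CLOCK_HZ):
    """Per-router latency in nanoseconds for a given cycle count and clock."""
    return cycles / clock_hz * 1e9


if __name__ == "__main__":
    print(f"Per-hop router latency: {router_latency_ns():.2f} ns")
    print(f"Core-to-link clock ratio: {CORE_CLOCK_HZ / LINK_CLOCK_HZ:.2f}")
```

Under that assumption, each router traversal costs a little under 11 ns, before any time spent on the links themselves.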

  11. Brief Conclusions • Even With Moderate Constraints, Jelly-Bean MP Is Challenging • Correctness, Deadlock-Avoidance, Buffering, Arbitration, and Performance Require Careful Consideration In Design • This Paper Illustrates Where Network Latency Comes From • Even A Fast Network Seems Slow Compared To Local Access

  12. Discussion • Was 2-d Torus the Right Shape For This Design? • What Are the Limitations Imposed? • How Is the 1.2 GHz Internal/800 MHz External Clock Discrepancy OK? • Is MP Capability Better Than More Aggressive Core Optimizations For the Transistor Cost? • What About SMT, CMP?