The Alpha 21364 Network Architecture
By Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb
Compaq Computer Corporation
Presented by Luis Alfredo Campos
Alpha 21364 Goals
• Support communication-intensive server applications
• High-performance technical computing
• Database servers
• Web servers
• Telecommunication applications
• Achieve:
• Extremely low latency
• Enormous bandwidth
• Support directory cache coherence
• Improve:
• Reliability
• Availability
Overview
• Alpha 21264 core with enhancements
• Tightly coupled multiprocessor network
• Connects up to 128 processors
• Two-dimensional torus network
• Integrated L2 cache
• Integrated memory controller
• Router
• Directory-based cache coherence (CC)
• Separate virtual channels
• Packet classes
Network Packet Classes
• Seven packet classes
• Request (3 flits)
• Forward (3 flits)
• Block Response (18 or 19 flits)
• Non-Block Response (2 or 3 flits)
• Write I/O (19 flits)
• Read I/O (3 flits)
• Special (1 or 3 flits)
• Each flit carries 32 bits of data plus 7 bits of ECC (39 bits in total)
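
The flit counts above translate directly into on-wire packet sizes. The following is a minimal sketch of that arithmetic; the dictionary layout and the helper names `packet_bits` and `data_bytes` are illustrative assumptions, not part of the 21364 design.

```python
# Flit layout as stated on the slide: 32 data bits + 7 ECC bits.
FLIT_DATA_BITS = 32
FLIT_ECC_BITS = 7
FLIT_BITS = FLIT_DATA_BITS + FLIT_ECC_BITS  # 39 bits per flit

# Allowed packet lengths (in flits) per class, copied from the slide.
PACKET_CLASS_FLITS = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}


def packet_bits(flits: int) -> int:
    """Total bits on the link for a packet, ECC included (hypothetical helper)."""
    return flits * FLIT_BITS


def data_bytes(flits: int) -> int:
    """Bytes carried in the 32-bit data fields, ECC excluded (hypothetical helper)."""
    return flits * FLIT_DATA_BITS // 8


if __name__ == "__main__":
    for name, lengths in PACKET_CLASS_FLITS.items():
        sizes = ", ".join(
            f"{n} flits = {packet_bits(n)} bits ({data_bytes(n)} data bytes)"
            for n in lengths
        )
        print(f"{name}: {sizes}")
```

The data-byte figure counts every 32-bit data field, so it includes whatever header information the packet carries in addition to its payload.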
Network Architecture
• Two-dimensional torus
• Limited support for imperfect tori
• Allows fault remapping
• Virtual cut-through routing
• Buffer space for 316 packets
Adaptive Routing
• In a torus, four rectangles have the current and destination routers at diagonal corners
• Packets route within the minimum rectangle
• Maximizes the bandwidth between source and destination
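
One way to picture minimum-rectangle routing is to compute, for each dimension, which direction takes the shorter way around the torus. The sketch below does only that. The coordinate convention, the direction labels, and the function name `productive_directions` are assumptions made for illustration, and ties (equal distance both ways around) are resolved arbitrarily toward "E" and "N", which the real router need not do.

```python
def productive_directions(cur, dst, width, height):
    """Return the directions that keep a packet inside the minimum rectangle.

    cur, dst: (x, y) router coordinates on a width x height torus.
    "E"/"W" move along +x/-x and "N"/"S" along +y/-y (an assumed convention).
    At most two directions are returned, one per dimension.
    """
    dirs = []
    dx = (dst[0] - cur[0]) % width   # hops needed going in +x
    dy = (dst[1] - cur[1]) % height  # hops needed going in +y
    if dx != 0:
        # Going +x costs dx hops; going -x costs width - dx hops.
        dirs.append("E" if dx <= width - dx else "W")
    if dy != 0:
        dirs.append("N" if dy <= height - dy else "S")
    return dirs


if __name__ == "__main__":
    # On a 4 x 4 torus, the wraparound link makes (3, 0) one hop west of (0, 0).
    print(productive_directions((0, 0), (3, 0), 4, 4))
    print(productive_directions((0, 0), (1, 1), 4, 4))
```

When two directions are returned, an adaptive router is free to pick either one, for example based on which output port is less congested.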
Avoiding Deadlocks in Adaptive Routing
• “Adaptive routing will not deadlock a network as long as packets can drain via a deadlock-free path”
• 19 virtual channels
• Three virtual channels per packet class, except the Special class (only one channel)
• Adaptive, VC0, and VC1
• Adaptive is the first choice
• The VC0 and VC1 combination creates a deadlock-free network
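
The figure of 19 virtual channels follows from the seven packet classes: six classes with three channels each, plus one channel for the Special class. A short sketch that enumerates them, assuming the class names from the packet-class slide (the tuple representation and the function name `virtual_channels` are illustrative only):

```python
PACKET_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
    "Special",
]

# Channel names per class as given on the slide.
CHANNELS_PER_CLASS = ("Adaptive", "VC0", "VC1")


def virtual_channels():
    """List every (packet class, channel) pair in the network."""
    channels = []
    for cls in PACKET_CLASSES:
        if cls == "Special":
            # The Special class has a single channel; the slide does not name it.
            channels.append((cls, None))
        else:
            channels.extend((cls, vc) for vc in CHANNELS_PER_CLASS)
    return channels


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 * 3 + 1
```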
Router Architecture
• 9 pipeline types
• Input and output: local, interprocessor, and I/O
• Pin-to-pin latency of 13 cycles
• Running at 1.2 GHz
• Network links run 33% slower
• Running at 0.8 GHz
• Synchronous with outgoing links
• Asynchronous with incoming links
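
The clock figures above can be checked with a few lines of arithmetic. This sketch assumes the 13-cycle pin-to-pin latency is counted in 1.2 GHz router cycles; the variable names are illustrative.

```python
ROUTER_CLOCK_GHZ = 1.2  # router clock, from the slide
LINK_CLOCK_GHZ = 0.8    # network link clock, from the slide
PIN_TO_PIN_CYCLES = 13  # assumed to be router-clock cycles

# Cycles divided by GHz gives nanoseconds.
latency_ns = PIN_TO_PIN_CYCLES / ROUTER_CLOCK_GHZ

# Fraction by which the links are slower than the router.
link_slowdown = 1 - LINK_CLOCK_GHZ / ROUTER_CLOCK_GHZ

if __name__ == "__main__":
    print(f"pin-to-pin latency: {latency_ns:.1f} ns")
    print(f"links run {link_slowdown:.0%} slower than the router")
```

Under that assumption the pin-to-pin latency is about 10.8 ns, and the 0.8 GHz links are one third slower than the router, which matches the 33% on the slide.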
Arbitration
• Needs to avoid a central bottleneck
• 16 local arbiters
• 7 global arbiters
• Least Recently Selected (LRS) scheme
• Local arbiters
• Classes
• Virtual channel
• Global arbiters
• Input ports
• Rotary Rule mode
• Priority to oldest packets
• Coherence Dependence Priority (CDP) Rule mode
• Priority depending on class ordering
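
A Least Recently Selected policy can be sketched as a priority list in which the winner of each round moves to the back. The class below is a software illustration of that idea only; the name `LRSArbiter` and its interface are assumptions, and the sketch does not model the 21364's hardware arbiters or the Rotary and CDP rule modes.

```python
class LRSArbiter:
    """Grant the requester that was selected least recently."""

    def __init__(self, num_requesters: int):
        # Front of the list = least recently selected = highest priority.
        self.order = list(range(num_requesters))

    def grant(self, requests):
        """Pick one requester from the iterable `requests`, or None if empty."""
        pending = set(requests)
        for requester in self.order:
            if requester in pending:
                # The winner becomes the most recently selected.
                self.order.remove(requester)
                self.order.append(requester)
                return requester
        return None


if __name__ == "__main__":
    arb = LRSArbiter(3)
    for reqs in ({0, 1, 2}, {0, 1, 2}, {0, 2}, {0, 1}):
        print(sorted(reqs), "->", arb.grant(reqs))
```

Because a granted requester drops to the lowest priority, no requester that keeps asking can be passed over indefinitely.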
Questions
• How is the 1.2 GHz internal / 800 MHz external clock split workable?
• Why a 2-D torus?
• What limitations does it impose?