
ATOLL

ATOLL. ATOLL - Performance And Cost Optimization of a SAN Interconnect. Dipl.-Inf. Patrick R. Schulz schulz@uni-mannheim.de Computer Architecture Group University of Mannheim, Germany. Presentation Outline. Design Considerations and Goals Basic Architecture of ATOLL


Presentation Transcript


  1. ATOLL
     ATOLL - Performance And Cost Optimization of a SAN Interconnect
     Dipl.-Inf. Patrick R. Schulz, schulz@uni-mannheim.de
     Computer Architecture Group, University of Mannheim, Germany
     Nov. 4th PDCS2002

  2. Presentation Outline
     • Design Considerations and Goals
     • Basic Architecture of ATOLL
     • Optimization for Performance and Cost
     • Special features of ATOLL
     • Performance results
     • Future Developments and Conclusion

  3. ATOLL SAN
     Design considerations for ATOLL:
     • Design for highest performance and lowest cost
     • Minimization of communication latency
     • Optimization of bandwidth for small and large messages
     • Realization of basic communication functions in hardware
     • Simplification of program access to the NIC
     • Avoiding software overhead

  4. ATOLL NIC
     Design goals of ATOLL:
     • Integration of all network components into a single chip
     • The external switch moves onto the NIC
     • Provides 4 replicated, independent NI devices on the host side to serve 2/4-way SMP nodes without OS intervention
     • 4 bidirectional Link Ports to the SAN
     • User level communication
     • Hardware message handler
     • Many support functions for parallel processing (atomic message startup, thread synchronization, ...)

  5. ATOLL Basic Architecture
     ATOLL-Chip:
     • 4.5 million transistors
     • 0.18 µm CMOS process
     • 5.7 x 5.7 mm chip
     • Fastest and second biggest design of a European university

  6. ATOLL HW Architecture
     PCI Interface:
     • 64-bit PCI-X 1.0 compliant at 66, 100, or 133 MHz
     • also runs as a 32-bit/33 MHz PCI interface (3.3 V)
     • master (DMA) and slave (PIO) functionality
     • capable of combining several transactions into one burst where applicable
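
The bus modes listed above translate into theoretical peak bandwidths as bus width times clock rate. The sketch below works this out with the nominal clock values from the slide. The function name is illustrative, and sustained throughput on a real bus is lower because of arbitration and protocol overhead.

```python
# Theoretical peak PCI / PCI-X bandwidth: bus width in bytes times clock rate.
# Nominal clock values from the slide; real throughput is lower in practice.

def peak_bandwidth_mbytes(bus_width_bits: int, clock_mhz: float) -> float:
    """Return the theoretical peak bandwidth in MByte/s (1 MByte = 10**6 bytes)."""
    return bus_width_bits / 8 * clock_mhz

# (bus width in bits, clock in MHz) for the modes named on the slide
PCI_MODES = [(64, 133), (64, 100), (64, 66), (32, 33)]

for width, clock in PCI_MODES:
    rate = peak_bandwidth_mbytes(width, clock)
    print(f"{width}-bit @ {clock} MHz: {rate:.0f} MByte/s peak")
```

Only the 64-bit/133 MHz mode has a nominal peak (about 1 GByte/s) above the 250 MByte/s of a single link direction by a wide margin; the 32-bit/33 MHz fallback peaks at 132 MByte/s.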

  7. ATOLL HW Architecture
     Host Port (Network Interface):
     • four fully featured devices
     • running at 250 MHz
     • PIO mode for efficient send/receive of small messages, utilizing write-combining and read-prefetching
     • DMA engines for autonomous transfer of large messages
     • small NI context of two cache lines, fully loadable (virtual interfaces)

  8. ATOLL HW Architecture
     4 x 4 bi-directional Crossbar:
     • fully integrated network switch on-chip
     • running at 250 MHz
     • 2 GBytes/s bisection bandwidth
     • fully pipelined, wormhole routing
     • fall-through latency of 6 cycles (24 ns)
     • reverse flow control through crossbar
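
The crossbar figures can be checked with a little arithmetic. The sketch below converts the 6-cycle fall-through latency into nanoseconds and shows one reading that reproduces the 2 GBytes/s figure. That reading is an assumption not stated in the slides: a byte-wide datapath per port, counted over 4 ports and both directions.

```python
CLOCK_MHZ = 250          # crossbar clock, from the slide
FALL_THROUGH_CYCLES = 6  # fall-through latency in cycles, from the slide
PORT_WIDTH_BYTES = 1     # assumption: byte-wide datapath per port
PORTS = 4                # 4 x 4 crossbar
DIRECTIONS = 2           # bi-directional ports

def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1000 / clock_mhz

def port_bandwidth_mbytes(width_bytes: int, clock_mhz: float) -> float:
    """One transfer of width_bytes per clock cycle, in MByte/s."""
    return width_bytes * clock_mhz

fall_through_ns = cycles_to_ns(FALL_THROUGH_CYCLES, CLOCK_MHZ)
per_direction = port_bandwidth_mbytes(PORT_WIDTH_BYTES, CLOCK_MHZ)
aggregate = PORTS * DIRECTIONS * per_direction  # assumed way of counting

print(f"fall-through latency: {fall_through_ns:.0f} ns")
print(f"per port and direction: {per_direction:.0f} MByte/s")
print(f"4 ports x 2 directions: {aggregate:.0f} MByte/s")
```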

  9. ATOLL HW Architecture
     Link Interface:
     • bidirectional byte-wide LVDS links (2 x 250 MBytes/s)
     • running at 250 MHz
     • reverse flow control characters are exchanged to prevent buffer overflow
     • CRC protection and automatic retransmission for 64 byte link packets
     • guaranteed message delivery after injection into the network
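
The idea behind CRC protection with automatic retransmission can be sketched in software, although ATOLL does this in hardware on the link. Everything below is an illustrative assumption: the slides do not name the CRC polynomial (CRC-32 from Python's `zlib` stands in for it), the 64 bytes are treated as payload size, and `noisy_link` is a hypothetical channel model.

```python
import random
import zlib

PACKET_BYTES = 64  # link packet size from the slide, treated here as payload size

def make_packets(message: bytes) -> list:
    """Split a message into 64-byte payloads, each followed by a 4-byte CRC-32."""
    packets = []
    for i in range(0, len(message), PACKET_BYTES):
        payload = message[i:i + PACKET_BYTES]
        packets.append(payload + zlib.crc32(payload).to_bytes(4, "big"))
    return packets

def crc_ok(packet: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer

def noisy_link(packet: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical channel: flip one bit with probability error_rate."""
    if rng.random() < error_rate:
        corrupted = bytearray(packet)
        corrupted[rng.randrange(len(corrupted))] ^= 0x01
        return bytes(corrupted)
    return packet

def send_reliably(message: bytes, error_rate: float = 0.2, seed: int = 0):
    """Resend each packet until its CRC checks out; return data and retry count."""
    rng = random.Random(seed)
    received = bytearray()
    retransmissions = 0
    for packet in make_packets(message):
        while True:
            arrived = noisy_link(packet, rng, error_rate)
            if crc_ok(arrived):
                received += arrived[:-4]  # strip the CRC trailer
                break
            retransmissions += 1
    return bytes(received), retransmissions

message = bytes(range(256)) * 3
data, retries = send_reliably(message)
print(f"delivered {len(data)} bytes intact after {retries} retransmissions")
```

Because every packet is retried until it passes the check, the receiver always ends up with the original message, which mirrors the guaranteed delivery claimed on the slide.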

  10. ATOLL 2D Torus Topology Example
      Figure label: Node with an ATOLL NIC
      All topologies fitting the 4 interconnects are supported ...
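
A 2D torus uses exactly four links per node, which is why it maps onto the four link ports of one NIC. The sketch below shows the wrap-around neighbour calculation and the minimal hop count between two nodes. The coordinate scheme and function names are illustrative assumptions, not part of ATOLL.

```python
def torus_neighbors(x: int, y: int, width: int, height: int) -> dict:
    """Return the four neighbours of node (x, y); edges wrap around."""
    return {
        "east": ((x + 1) % width, y),
        "west": ((x - 1) % width, y),
        "south": (x, (y + 1) % height),
        "north": (x, (y - 1) % height),
    }

def torus_hops(src: tuple, dst: tuple, width: int, height: int) -> int:
    """Minimal hop count, taking the shorter way around in each dimension."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)

# Corner node of a 4 x 4 torus: its west and north links wrap to the far edges.
print(torus_neighbors(0, 0, 4, 4))
print(torus_hops((0, 0), (3, 3), 4, 4))
```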

  11. ATOLL Tree Topology Example
      Figure labels: NIC, NIC, NIC

  12. Optimization for Performance and Cost
      regarding cost:
      • wormhole philosophy eliminates memory on the NIC
      • link cables and connectors (HD-68pin), PCB, chip package (custom BGA) are highly optimized for routability => ONLY 2+2 layer PCB, single layer package
      • LVDS signalling => high speed, low power, low EMI
      • I/O cells (LVDS, PCI-X) designed by partner university
      • free standard cell lib (VST, 0.18um)
      • low cost backend service, wire-length driven, traditional design flow

  13. Optimization for Performance and Cost
      regarding performance:
      • Hardware retransmission => low software overhead
      • PCI-X => high performance node interface
      • User-level communication (multiple devices) => low latency
      • High clock frequency (250MHz) => high bandwidth (2GB/s)
      • Low latency (3 clock cycles for xbar arbitration)
      • NO kernel traps or IRQs when accessing the device, and NO polling on the PCI bus
      • mirroring important status registers in main memory using cache coherence

  14. Optimization for Performance and Cost

  15. Special Hardware Features
      regarding performance and cost:
      • programmable clock period (14MHz steps) => speed grades
      • cables with controlled impedance and low skew => transmission line characteristics => wave pipelining
      • double pumped data on the cables => only one frequency, no phase shift

  16. ATOLL Bandwidth
      Plot: link utilization [%] versus message size [bytes]
      Annotations: ~225 MByte/s; link utilization 100% = 250 MByte/s; >100 MByte/s

  17. ATOLL Latency
      Test system: P3-1000 (Serverworks), PCI 66 MHz/64-bit, ATOLL @ 245 MHz
      ONLY 27 clock cycles (~100 ns) latency per hop.
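
The per-hop figure can be turned into a rough network latency for a multi-hop path. The sketch below uses the 27 cycles and the 245 MHz clock from the slide. At that clock, 27 cycles come to about 110 ns, which the slide rounds to ~100 ns. The function names are illustrative, and the result covers the network hops only, not host interface or software time.

```python
CYCLES_PER_HOP = 27  # per-hop latency in clock cycles, from the slide
CLOCK_MHZ = 245      # ATOLL clock in the test system, from the slide

def hop_latency_ns(cycles: int = CYCLES_PER_HOP, clock_mhz: float = CLOCK_MHZ) -> float:
    """Latency of one hop in nanoseconds."""
    return cycles * 1000 / clock_mhz

def path_latency_ns(hops: int) -> float:
    """Network latency of a path, assuming every hop costs the same."""
    return hops * hop_latency_ns()

print(f"one hop:    {hop_latency_ns():.1f} ns")
print(f"three hops: {path_latency_ns(3):.1f} ns")
```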

  18. Cost Comparison
      Chart: performance and cost bars for Fast-Ethernet, ATOLL and Myrinet 2000
      Chart labels: 16Gb/s 2GB/s; 1xNIC + 1x 4 port Switch ~ $2700; 4x; 0.3x; 1GB/s; 1xNIC ~ $900; 4Gb/s 0.5GB/s; 4xNIC + 1x 4 port Switch ~ $540; $1000; 100Mb/s 12MB/s
      ATOLL: Cost-effectiveness of 4 x (1/0.3) = 12 x of Myrinet
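
The cost-effectiveness claim is a ratio of performance per dollar. The sketch below recomputes it under one reading of the flattened chart, which is an assumption: ATOLL at 2 GB/s for ~$900 (one NIC, switch on chip) and Myrinet 2000 at 0.5 GB/s for ~$2700 (one NIC plus a 4 port switch). With these prices the cost ratio is exactly 1/3, which the chart rounds to 0.3x, and the result is 12.

```python
def cost_effectiveness_ratio(perf_a: float, cost_a: float,
                             perf_b: float, cost_b: float) -> float:
    """How many times more performance per dollar A delivers than B."""
    return (perf_a / cost_a) / (perf_b / cost_b)

# Assumed assignment of the chart labels to the two products.
ATOLL_GBYTES, ATOLL_USD = 2.0, 900
MYRINET_GBYTES, MYRINET_USD = 0.5, 2700

performance_ratio = ATOLL_GBYTES / MYRINET_GBYTES  # 4x
cost_ratio = ATOLL_USD / MYRINET_USD               # 1/3, shown as 0.3x
ratio = cost_effectiveness_ratio(ATOLL_GBYTES, ATOLL_USD,
                                 MYRINET_GBYTES, MYRINET_USD)

print(f"performance ratio: {performance_ratio:.0f}x")
print(f"cost ratio: {cost_ratio:.2f}x")
print(f"cost-effectiveness: {ratio:.0f}x")
```

Using the rounded 0.3 literally would give 4 / 0.3 ≈ 13.3, so the 12 x on the slide corresponds to the unrounded price ratio.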

  19. ATOLL-Team
      Uni Mannheim, LS Rechnerarchitektur (Chair of Computer Architecture):
      • Ulrich Brüning, Lambert Schälicke, Patrick R. Schulz, Holger Fröning, Lars Rzymianowicz
      • Roles listed: Basic Architecture, HW Implementation, Architectural Enhancements
      Thanks to:
      • Uni Kaiserslautern, LS Schaltungstechnik (Chair of Circuit Design): Prof. Tielert, Mark Wegener (I/O Cells)
      • IMEC Belgium: Carl Das (Layout Backend Service)
      • SUN Microsystems
      • Synopsys

  20. Future Development
      Future of ATOLL hardware development:
      • Optical link interconnect
        • based on a high performance SERDES chip (2 x 250 MB/s to 2.5 Gb/s)
        • short distance (up to 100 m) serial optical interconnect
        • plug compatible with the electrical interface
        • very cost effective implementation
      • ATOLL 2
        • 500 MHz clock
        • higher dimensional crossbar for multidimensional IN structures
        • multithreaded cached host interface
        • memory management support
        • command extension for direct memory operations (put, get, …) => MPI-2

  21. Conclusion
      • A radically new design approach leads to a single chip solution integrating a whole network on a chip.
      • Low budget design, implemented from the architecture down to the chip.
      • It’s now reality (we are lucky: it was right the first time)

  22. ATOLL: A New Contender in the System Area Network Market
      Thank you for your attention! Questions?
      Further information: www.atoll-net.de, schulz@uni-mannheim.de

  23. Chip Photo

  24. Interconnect
