
Mellanox Technologies Maximize World-Class Cluster Performance




Presentation Transcript

  1. Mellanox Technologies: Maximize World-Class Cluster Performance • April 2008 • Gilad Shainer – Director of Technical Marketing

  2. Mellanox Technologies • Fabless semiconductor supplier founded in 1999 • Business in California • R&D and Operations in Israel • Global sales offices and support • 250+ employees worldwide • Leading server and storage interconnect products • InfiniBand and Ethernet leadership • Shipped over 2.8M 10 & 20Gb/s ports as of Dec 2007 • $106M raised in Feb 2007 IPO on NASDAQ (MLNX) • Dual Listed on Tel Aviv Stock Exchange (TASE: MLNX) • Profitable since 2005 • Revenues: FY06=$48.5M, FY07=$84.1M • 73% yearly growth • 1Q08 guidance ~$24.8M • Customers include Cisco, Dawning, Dell, Fujitsu, Fujitsu-Siemens, HP, IBM, NEC, NetApp, Sun, Voltaire

  3. Interconnect: A Competitive Advantage • Providing end-to-end products • Adapter ICs & Cards • Switch ICs • Cables • Reference Designs • Software • End-to-End Validation [Diagram: blade/rack servers and storage connected through adapters and a switch]

  4. InfiniBand in the TOP500 • InfiniBand shows the highest yearly growth, 52% compared to Nov '06 • InfiniBand strengthens its leadership as the high-speed interconnect of choice • 4+ times the total number of all other high-speed interconnects (1Gb/s+) • InfiniBand 20Gb/s connects 40% of the InfiniBand clusters • Reflects the ever-growing performance demands • InfiniBand makes the most powerful clusters • 3 of the top 5 (#3, #4, #5) and 7 of the Top20 (adding #14, #16, #18, #20) • The leading interconnect for the Top100 [Chart: Number of Clusters in Top500]

  5. TOP500 – InfiniBand Performance Trends • Ever-growing demand for compute resources • Explosive growth of IB 20Gb/s • IB 40Gb/s anticipated in the Nov '08 list [Chart annotations: 180% CAGR, 220% CAGR] • InfiniBand: the optimized interconnect for multi-core environments • Maximum scalability, efficiency and performance
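The CAGR figures quoted on this slide follow the standard compound-growth formula; a minimal sketch in Python (the sample numbers below are illustrative, not taken from the TOP500 list):

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.52 means 52%)."""
    return (end / start) ** (1.0 / years) - 1.0

# Illustrative only: a metric that quadruples over two years
# corresponds to a 100% CAGR.
growth = cagr(100, 400, 2)  # 1.0
```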

  6. ConnectX: Leading IB and 10GigE Adapters • Server and storage interconnect • Highest InfiniBand and 10GigE performance • Single chip or slot optimizes cost, power, footprint, reliability • One device for 40Gb/s IB, FCoIB, 10GigE, CEE, FCoE • One SW stack for offload, virtualization, RDMA, storage [Diagram: dual-port 10/20/40Gb/s IB and 10GigE adapters, PCIe Gen1/2, hardware IOV, FCoE/CEE, connecting to InfiniBand and Ethernet switches]

  7. ConnectX Multi-core MPI Scalability • Scalability to 64+ cores per node • Scalability to 20K+ nodes per cluster • Guarantees the same low latency regardless of the number of cores • Guarantees linear scalability for real applications

  8. ConnectX EN 10GigE Performance • World-leading 10GigE performance • TCP, UDP, CPU utilization [Charts: performance testing on the same HW platform; bandwidth with an MTU of 9600B; CPU utilization for Rx]
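The bandwidth results were measured with a 9600B jumbo MTU; a rough sketch of why jumbo frames help, assuming standard Ethernet per-frame overhead (14B header + 4B FCS + 8B preamble + 12B inter-frame gap = 38B) and ignoring IP/TCP headers inside the MTU:

```python
def wire_efficiency(mtu, per_frame_overhead=38):
    """Fraction of the wire rate available to payload, given one
    fixed overhead chunk per Ethernet frame (a simplification)."""
    return mtu / (mtu + per_frame_overhead)

jumbo = wire_efficiency(9600)     # ~99.6% of the 10GigE wire rate
standard = wire_efficiency(1500)  # ~97.5%
```

Fewer, larger frames also mean fewer per-packet interrupts and protocol-stack traversals, which is where the CPU-utilization advantage comes from.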

  9. InfiniScale IV: Unprecedented Scalability • Up to 36 ports at 40Gb/s or 12 ports at 120Gb/s InfiniBand • 60-70ns switching latency • Adaptive routing and congestion control • Systems available in the latter part of 2008 • ~3 terabits per second of switching capacity in a single silicon device!
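The "~3 terabits per second" figure follows from simple port arithmetic, counting both directions of each full-duplex port:

```python
ports = 36        # 40Gb/s ports on one InfiniScale IV chip
rate_gbps = 40
capacity_gbps = ports * rate_gbps * 2  # x2 for full-duplex traffic
# 2880 Gb/s, i.e. ~2.88 Tb/s -- the slide's "~3 terabits per second"
```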

  10. InfiniBand Addresses the Needs of Petascale Computing • Balanced random network streaming • “One to One” random streaming • Solution: Dynamic routing (InfiniScale IV) • Balanced known network streaming • “One to One” known streaming • Solution: Static routing (Now) • Un-balanced network streaming • “Many to one” streaming • Solution: Congestion control (Now) • Faster network streaming propagation • Network speed capabilities • Solution: InfiniBand QDR (InfiniScale IV) • 40/80/120Gb/s IB designed to handle all communications in HW

  11. Hardware Congestion Control • Congestion spots → catastrophic loss of throughput • Old techniques are not adequate today • Over-provisioning – applications demand high throughput • Algorithmic predictability – virtualization drives multiple simultaneous algorithms • InfiniBand HW congestion control • No a priori network assumptions needed • Automatic hot-spot discovery • Data traffic adjustments • No bandwidth oscillation or other stability side effects • Ensures maximum effective bandwidth [Charts: simulation results, 32-port 3-stage fat-tree network, high input load, large hotspot degree, before and after congestion control] • “Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control” – IBM Research; IBM Systems and Technology Group; Technical University of Valencia, Spain
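The slide does not give the control law, but the behavior it describes (throttle sources feeding a discovered hot spot, then recover without oscillation) can be sketched as a toy rate controller; the cut/recover factors below are assumptions for illustration, not InfiniBand's actual congestion-control parameters:

```python
def adjust_rate(rate, congestion_seen, cut=0.5, recover=1.1, line_rate=1.0):
    """Toy source-rate control: multiplicative decrease when a
    congestion notification arrives, gradual recovery otherwise."""
    if congestion_seen:
        return rate * cut
    return min(line_rate, rate * recover)

rate = 1.0
rate = adjust_rate(rate, congestion_seen=True)   # hotspot: cut to 0.5
rate = adjust_rate(rate, congestion_seen=False)  # recover toward line rate
```

Capping recovery at the line rate and reacting only to explicit notifications is what keeps a scheme like this from the bandwidth oscillation the slide warns about.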

  12. InfiniBand Adaptive Routing • Dynamically re-routes traffic to alleviate congested ports • Fast-path modifications • No throughput overhead • Maximum flexibility for routing algorithms • Random-based decision • Least-loaded-based decision • Greedy random-based solution – least loaded out of a random set • Maximizes “One to One” random-traffic network efficiency [Chart: simulation model (Mellanox), 972-node cases, hot-spot traffic]
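The "least loaded out of a random set" policy can be sketched in a few lines; the port loads and the sample size k below are hypothetical, chosen only to illustrate the greedy random-based decision the slide describes:

```python
import random

def pick_output_port(loads, k=2, rng=random):
    """Sample k candidate output ports at random and take the one
    with the shortest queue -- a toy version of the greedy
    random-based routing decision."""
    candidates = rng.sample(range(len(loads)), k)
    return min(candidates, key=lambda p: loads[p])

# With only two ports and k=2 the choice is fully determined:
port = pick_output_port([5, 0], k=2)  # -> 1 (the empty queue)
```

Sampling a small random set rather than scanning every port keeps the decision cheap enough for the switch fast path while still steering flows away from congested outputs.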

  13. InfiniBand QDR 40Gb/s Technology • Superior performance for HPC applications • Highest bandwidth, 1us node-to-node latency • Low CPU overhead, MPI offloads • Designed for current and future multi-core environments • Addresses the needs for Petascale Computing • Adaptive routing, congestion control, large-scale switching • Fast network streaming propagation • Consolidated I/O for server and storage • Optimize cost and reduce power consumption • Maximizing cluster productivity • Efficiency, scalability and reliability

  14. Commercial HPC Demands High Bandwidth • Scalability mandates InfiniBand QDR and beyond • A 4-node InfiniBand cluster demonstrates higher performance than a GigE cluster of any size

  15. Mellanox Cluster Center • http://www.mellanox.com/applications/clustercenter.php • Neptune cluster • 32 nodes • Dual core AMD Opteron CPUs • Helios cluster • 32 nodes • Quad core Intel Clovertown CPUs • Vulcan cluster • 32 nodes • Quad core AMD Barcelona CPUs • Utilizing “Fat Tree” network architecture (CBB) • Non-blocking switch topology • Non-blocking bandwidth • InfiniBand 20Gb/s (40Gb/s May 2008) • InfiniBand based storage • NFS over RDMA, SRP • GlusterFS cluster file system (Z Research)
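The non-blocking "Fat Tree" (CBB) wiring used above scales predictably with switch radix; a sketch of the standard two-level sizing rule (the radix values are examples, not necessarily the Cluster Center's actual switch models):

```python
def fat_tree_max_nodes(radix, levels=2):
    """Maximum end-nodes in a non-blocking fat tree built from
    switches with `radix` ports: each leaf switch dedicates half
    its ports to hosts and half to uplinks, so a two-level tree
    supports radix * (radix // 2) nodes."""
    return radix * (radix // 2) ** (levels - 1)

nodes_24 = fat_tree_max_nodes(24)  # 288 nodes from 24-port switches
nodes_36 = fat_tree_max_nodes(36)  # 648 nodes from 36-port switches
```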

  16. Summary • Market-wide adoption of InfiniBand • Servers/blades, storage and switch systems • Data centers, high-performance computing, embedded • Performance, price, power, reliability, efficiency, scalability • Mature software ecosystem • 4th-generation adapter extends connectivity to IB and Eth • Market-leading performance, capabilities and flexibility • Multiple 1000+ node clusters already deployed • 4th-generation switch available May '08 • 40Gb/s server and storage connections, 120Gb/s switch-to-switch links • 60-70ns latency, 36 ports in a single switch chip, 3 Terabits/second • Driving key trends in the market • Clustering/blades, low latency, I/O consolidation, multi-core CPUs and virtualization

  17. Thank You • Gilad Shainer • shainer@mellanox.com

  18. Accelerating HPC Applications
