
Protocols Working with 10 Gigabit Ethernet



Presentation Transcript


  1. Protocols Working with 10 Gigabit Ethernet
  Richard Hughes-Jones, The University of Manchester
  www.hep.man.ac.uk/~rich/ then “Talks”
  ESLEA Closing Conference, Edinburgh, March 2007

  2. Introduction
  • 10 GigE on SuperMicro X7DBE
  • 10 GigE on SuperMicro X5DPE-G2
  • 10 GigE and TCP – monitor with web100, disk writes
  • 10 GigE and Constant Bit Rate transfers
  • UDP + memory access
  • GÉANT 4 Gigabit tests

  3. Udpmon: Latency & Throughput Measurements
  • UDP/IP packets sent between back-to-back systems
  • Similar processing to TCP/IP but no flow control & congestion avoidance algorithms
  • Latency
  • Round trip times using Request-Response UDP frames (see the sketch below)
  • Latency as a function of frame size
  • Slope s given by: mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s)
  • Intercept indicates processing times + HW latencies
  • Histograms of ‘singleton’ measurements
  • Tells us about: behavior of the IP stack, the way the HW operates, interrupt coalescence
  • UDP Throughput
  • Send a controlled stream of UDP frames spaced at regular intervals
  • Vary the frame size and the frame transmit spacing & measure:
  • The time of first and last frames received
  • The number of packets received, lost, & out of order
  • Histogram of inter-packet spacing of received packets
  • Packet loss pattern
  • 1-way delay
  • CPU load
  • Number of interrupts
  • Tells us about: behavior of the IP stack, the way the HW operates, capacity & available throughput of the LAN / MAN / WAN
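The request-response latency measurement can be pictured with a short sketch. This is not udpmon itself (udpmon is a C tool with its own wire protocol); it is a minimal Python illustration, assuming a responder that echoes each datagram straight back, and the peer address, port and frame sizes are hypothetical.

# Minimal request-response UDP latency probe (illustrative sketch, not udpmon).
# Assumes a peer at PEER:PORT that echoes each datagram back unchanged.
import socket, time, statistics

PEER, PORT = "192.168.0.2", 5001   # hypothetical responder address
N_PROBES = 1000

def rtt_samples(frame_size, timeout=1.0):
    """Send N_PROBES request frames of frame_size bytes; return RTTs in microseconds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    payload = bytes(frame_size)
    samples = []
    for _ in range(N_PROBES):
        t0 = time.perf_counter()
        s.sendto(payload, (PEER, PORT))
        s.recvfrom(65535)                      # wait for the echoed frame
        samples.append((time.perf_counter() - t0) * 1e6)
    s.close()
    return samples

if __name__ == "__main__":
    # Latency as a function of frame size: the slope of mean RTT vs bytes reflects
    # the per-byte copy/PCI/Ethernet costs, the intercept the fixed overheads.
    for size in (64, 1472, 8972):
        r = rtt_samples(size)
        print(f"{size:5d} bytes: mean {statistics.mean(r):.1f} us, "
              f"median {statistics.median(r):.1f} us")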

  4. Throughput Measurements
  • UDP Throughput with udpmon
  • Send a controlled stream of UDP frames spaced at regular intervals
  [Diagram: sender-receiver exchange. The sender zeroes the remote statistics, sends n-byte data frames at a regular wait-time interval, signals the end of test, then collects the remote statistics: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay. The receiver histograms the inter-packet arrival time; the times to send and to receive give the throughput.]
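The sending side of such a test is essentially a paced UDP transmitter. The sketch below is illustrative rather than the real udpmon sender; the peer address, port, frame size and spacing are assumed test parameters.

# Paced UDP frame transmitter (illustrative sketch of a udpmon-style sender).
# PEER, PORT, FRAME_BYTES and SPACING_US are hypothetical test parameters.
import socket, time

PEER, PORT = "192.168.0.2", 5001
FRAME_BYTES = 8972          # jumbo-frame UDP payload
SPACING_US  = 8.0           # requested inter-frame transmit spacing
N_FRAMES    = 1_000_000

def paced_send():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(FRAME_BYTES)
    spacing = SPACING_US * 1e-6
    t_start = time.perf_counter()
    t_next = t_start
    for _ in range(N_FRAMES):
        s.sendto(payload, (PEER, PORT))
        t_next += spacing
        while time.perf_counter() < t_next:   # busy-wait to hold the requested spacing
            pass
    elapsed = time.perf_counter() - t_start
    rate_gbps = N_FRAMES * FRAME_BYTES * 8 / elapsed / 1e9
    print(f"sent {N_FRAMES} frames in {elapsed:.2f} s -> {rate_gbps:.2f} Gbit/s user data")

if __name__ == "__main__":
    paced_send()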

  5. High-end Server PCs for 10 Gigabit
  • Boston/Supermicro X7DBE
  • Two Dual Core Intel Xeon Woodcrest 5130, 2 GHz
  • Independent 1.33 GHz front-side buses
  • 530 MHz FD Memory (serial), parallel access to 4 banks
  • Chipsets: Intel 5000P MCH – PCIe & memory; ESB2 – PCI-X, GE etc.
  • PCI: 3 × 8-lane PCIe buses, 3 × 133 MHz PCI-X
  • 2 Gigabit Ethernet
  • SATA

  6. 10 GigE Back2Back: UDP Latency
  • Motherboard: Supermicro X7DBE
  • Chipset: Intel 5000P MCH
  • CPU: 2 Dual Intel Xeon 5130 2 GHz with 4096k L2 cache
  • Mem bus: 2 independent 1.33 GHz
  • PCI-e 8 lane
  • Linux Kernel 2.6.20-web100_pktd-plus
  • Myricom NIC 10G-PCIE-8A-R Fibre
  • myri10ge v1.2.0 + firmware v1.4.10
  • rx-usecs=0, coalescence OFF; MSI=1; checksums ON; tx_boundary=4096
  • MTU 9000 bytes
  • Latency 22 µs & very well behaved; histogram FWHM ~1-2 µs
  • Latency slope 0.0028 µs/byte
  • B2B expect 0.00268 µs/byte (checked below): mem 0.0004 + PCI-e 0.00054 + 10GigE 0.0008 + PCI-e 0.00054 + mem 0.0004
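As a quick check, the expected back-to-back slope is just the sum of the per-byte costs of each stage the frame crosses; the snippet below reproduces the 0.00268 µs/byte figure from the component values on the slide.

# Expected per-byte latency slope for a back-to-back frame: the data crosses
# memory, PCI-e and the 10GigE link on the send side and the same stages again
# on the receive side (values in microseconds per byte, taken from the slide).
stages = {
    "mem copy (send)": 0.0004,
    "PCI-e (send)":    0.00054,
    "10GigE link":     0.0008,
    "PCI-e (recv)":    0.00054,
    "mem copy (recv)": 0.0004,
}
expected = sum(stages.values())
print(f"expected slope: {expected:.5f} us/byte")   # 0.00268 us/byte
print("measured slope: 0.00280 us/byte")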

  7. 10 GigE Back2Back: UDP Throughput
  • Kernel 2.6.20-web100_pktd-plus
  • Myricom 10G-PCIE-8A-R Fibre
  • rx-usecs=25, coalescence ON
  • MTU 9000 bytes
  • Max throughput 9.4 Gbit/s; notice the rate for 8972 byte packets (see the wire-rate sketch below)
  • ~0.002% packet loss in 10M packets in the receiving host
  • Sending host: 3 CPUs idle; for packet spacing <8 µs, 1 CPU is >90% in kernel mode inc. ~10% soft int
  • Receiving host: 3 CPUs idle; for packet spacing <8 µs, 1 CPU is 70-80% in kernel mode inc. ~15% soft int
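For context, the theoretical user-data rate for an 8972-byte UDP payload on 10 Gigabit Ethernet follows from the per-frame overheads. The snippet below is a back-of-the-envelope model using standard Ethernet framing figures, not a measurement from these tests.

# Back-of-the-envelope user-data rate for jumbo UDP frames on 10GE.
# The 8972-byte payload is an MTU of 9000 minus 20 bytes IP and 8 bytes UDP header.
LINE_RATE = 10e9            # bit/s
payload   = 8972            # UDP payload bytes
overhead  = 8 + 20 + 14 + 4 + 8 + 12   # UDP + IP + Eth header + FCS + preamble/SFD + IFG
on_wire   = payload + overhead
user_rate = LINE_RATE * payload / on_wire
print(f"{on_wire} bytes on the wire per frame -> {user_rate/1e9:.2f} Gbit/s user data")
# ~9.93 Gbit/s theoretical ceiling, against the ~9.4 Gbit/s measured on the slide.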

  8. 10 GigE UDP Throughput vs packet size
  • Motherboard: Supermicro X7DBE
  • Linux Kernel 2.6.20-web100_pktd-plus
  • Myricom NIC 10G-PCIE-8A-R Fibre
  • myri10ge v1.2.0 + firmware v1.4.10
  • rx-usecs=0, coalescence ON; MSI=1; checksums ON; tx_boundary=4096
  • Steps at 4060 and 8160 bytes, within 36 bytes of 2^n boundaries
  • Model the data transfer time as t = C + m*Bytes, where C includes the time to set up transfers (a fit sketch follows below)
  • Fit is reasonable: C = 1.67 µs, m = 5.4e-4 µs/byte
  • Steps consistent with C increasing by 0.6 µs
  • The Myricom driver segments the transfers, limiting the DMA to 4096 bytes – PCI-e chipset dependent!
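The linear model on this slide can be fitted directly to (packet size, transfer time) points with an ordinary least-squares fit, as sketched below. The sample points are hypothetical, chosen only so the fit lands near the slide's C and m; they are not the measured udpmon data.

# Least-squares fit of the transfer-time model t = C + m * bytes.
# The (bytes, time_us) pairs are illustrative placeholders; substitute the
# measured udpmon results to reproduce the real fit.
def fit_linear(points):
    """Return (C, m) minimising sum((t - (C + m*b))^2) over (b, t) points."""
    n = len(points)
    sb = sum(b for b, _ in points)
    st = sum(t for _, t in points)
    sbb = sum(b * b for b, _ in points)
    sbt = sum(b * t for b, t in points)
    m = (n * sbt - sb * st) / (n * sbb - sb * sb)
    c = (st - m * sb) / n
    return c, m

samples = [(1000, 2.2), (2000, 2.8), (4000, 3.8), (8000, 6.0)]  # hypothetical points
C, m = fit_linear(samples)
print(f"C = {C:.2f} us (setup), m = {m:.2e} us/byte (per-byte cost)")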

  9. 10 GigE via Cisco 7600: UDP Latency
  • Motherboard: Supermicro X7DBE
  • PCI-e 8 lane
  • Linux Kernel 2.6.20 SMP
  • Myricom NIC 10G-PCIE-8A-R Fibre
  • myri10ge v1.2.0 + firmware v1.4.10
  • rx-usecs=0, coalescence OFF; MSI=1; checksums ON
  • MTU 9000 bytes
  • Latency 36.6 µs & very well behaved
  • Switch latency 14.66 µs (the increase over the 22 µs back-to-back latency)
  • Per-byte components: switch internal 0.0011 µs/byte, PCI-e 0.00054, 10GigE 0.0008

  10. The “SC05” Server PCs
  • Boston/Supermicro X7DBE
  • Two Intel Xeon Nocona, 3.2 GHz, 2048k cache
  • Shared 800 MHz FSBus
  • DDR2-400 Memory
  • Chipset: Intel 7520 Lindenhurst
  • PCI: 2 × 8-lane PCIe buses, 1 × 4-lane PCIe bus, 3 × 133 MHz PCI-X
  • 2 Gigabit Ethernet

  11. 10 GigE X7DBEX6DHE: UDP Throughput • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • myri10ge v1.2.0 + firmware v1.4.10 • rx-usecs=25 Coalescence ON • MTU 9000 bytes • Max throughput 6.3 Gbit/s • Packet loss ~ 40-60 % in receiving host • Sending host, 3 CPUs idle • 1 CPU is >90% in kernel mode • Receiving host3 CPUs idle • For <8 µs packets, 1 CPU is 70-80% in kernel modeinc ~15% soft int ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester

  12. So now we can run at 9.4 Gbit/s. Can we do any work?

  13. 10 GigE X7DBEX7DBE: TCP iperf Web100 plots of TCP parameters • No packet loss • MTU 9000 • TCP buffer 256k BDP=~330k • Cwnd • SlowStart then slow growth • Limited by sender ! • Duplicate ACKs • One event of 3 DupACKs • Packets Re-Transmitted • Throughput Mbit/s • Iperf throughput 7.77 Gbit/s ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester

  14. 10 GigE X7DBEX7DBE: TCP iperf Web100 plots of TCP parameters • Packet loss 1: 50,000 -recv-kernel patch • MTU 9000 • TCP buffer 256k BDP=~330k • Cwnd • SlowStart then slow growth • Limited by sender ! • Duplicate ACKs • ~10 DupACKs every lost packet • Packets Re-Transmitted • One per lost packet • Throughput Mbit/s • Iperf throughput 7.84 Gbit/s ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester

  15. 10 GigE X7DBEX7DBE: CBR/TCP Web100 plots of TCP parameters • Packet loss 1: 50,000 -recv-kernel patch • tcpdelay message 8120bytes • Wait 7 µs • RTT 36 µs • TCP buffer 256k BDP=~330k • Cwnd • Dips as expected • Duplicate ACKs • ~15 DupACKs every lost packet • Packets Re-Transmitted • One per lost packet • Throughput Mbit/s • tcpdelay throughput 7.33 Gbit/s ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester

  16. B2B UDP with memory access
  • Send UDP traffic B2B with 10GE
  • On the receiver, run an independent memory write task (see the sketch below)
  • L2 cache 4096 kByte; 8000 kByte blocks; 100% user mode
  • Achievable UDP throughput: mean 9.39 Gbit/s sigma 106; mean 9.21 Gbit/s sigma 37; mean 9.2 sigma 30
  • Packet loss: mean 0.04%; mean 1.4%; mean 1.8%
  • CPU load:
  Cpu0: 6.0% us, 74.7% sy, 0.0% ni, 0.3% id, 0.0% wa, 1.3% hi, 17.7% si, 0.0% st
  Cpu1: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
  Cpu2: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
  Cpu3: 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
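The "independent memory write task" is essentially a CPU-bound loop writing blocks much larger than the L2 cache, so every pass misses cache and stresses the memory system. A minimal sketch, assuming an 8000 kByte block as on the slide (the real task was presumably a compiled program; a Python loop carries interpreter overhead):

# CPU-bound memory-write load generator (minimal sketch of the "memory access"
# task run alongside the UDP receiver).  The 8000 kByte block is deliberately
# larger than the 4096 kByte L2 cache so writes miss cache.
import time

BLOCK_BYTES = 8000 * 1024

def memory_write_task(duration_s=10.0):
    block = bytearray(BLOCK_BYTES)
    stripe = bytes(4096)
    t_end = time.time() + duration_s
    passes = 0
    while time.time() < t_end:
        # Sweep through the whole block, overwriting it 4 KiB at a time.
        for off in range(0, BLOCK_BYTES, 4096):
            block[off:off + 4096] = stripe
        passes += 1
    rate = passes * BLOCK_BYTES / duration_s / 1e9
    print(f"{passes} passes, ~{rate:.1f} GB/s written (interpreter overhead dominates)")

if __name__ == "__main__":
    memory_write_task()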

  17. ESLEA-FABRIC: 4 Gbit flows over GÉANT
  • Set up a 4 Gigabit lightpath between GÉANT PoPs
  • Collaboration with Dante
  • GÉANT Development Network London – London or London – Amsterdam, and GÉANT Lightpath service CERN – Poznan
  • PCs in their PoPs with 10 Gigabit NICs
  • VLBI tests:
  • UDP performance: throughput, jitter, packet loss, 1-way delay, stability
  • Continuous (days) data flows – VLBI_UDP, and multi-Gigabit TCP performance with current kernels
  • Experience for FPGA Ethernet packet systems
  • Dante interests:
  • Multi-Gigabit TCP performance
  • The effect of (Alcatel) buffer size on bursty TCP using BW-limited lightpaths

  18. Options Using the GÉANT Development Network
  • 10 Gigabit SDH backbone
  • Alcatel 1678 MCC
  • Node locations: London, Amsterdam, Paris, Prague, Frankfurt
  • Can do traffic routing, so can make long-RTT paths
  • Available now (2007)
  • Less pressure for long-term tests

  19. Options Using the GÉANT LightPaths
  • 10 Gigabit SDH backbone
  • Alcatel 1678 MCC
  • Node locations: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
  • Can do traffic routing, so can make long-RTT paths
  • Ideal: London – Copenhagen
  • Set up a 4 Gigabit lightpath between GÉANT PoPs
  • Collaboration with Dante
  • PCs in Dante PoPs

  20. Any Questions?

  21. Backup Slides

  22. 10 Gigabit Ethernet: UDP Throughput
  • 1500 byte MTU gives ~2 Gbit/s
  • Used 16144 byte MTU, max user length 16080
  • DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire rate throughput of 2.9 Gbit/s
  • CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.7 Gbit/s
  • SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.4 Gbit/s

  23. 10 Gigabit Ethernet: Tuning PCI-X
  [Logic-analyser traces of the PCI-X sequence (CSR access, data transfer, interrupt & CSR update) for mmrbc = 512, 1024, 2048 and 4096 bytes; the 4096-byte setting reaches 5.7 Gbit/s.]
  • 16080 byte packets every 200 µs
  • Intel PRO/10GbE LR Adapter
  • PCI-X bus occupancy vs mmrbc
  • Measured times; times based on PCI-X times from the logic analyser
  • Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s (see the occupancy sketch below)
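The benefit of a larger mmrbc (maximum memory read byte count) can be illustrated with a toy bus-occupancy model: each packet is fetched as ceil(packet/mmrbc) DMA bursts, and each burst pays a fixed arbitration/setup cost on top of its data phase. The per-burst overhead and ideal data rate below are assumptions chosen only so the trend resembles the slide's figures; they are not the values measured with the logic analyser.

# Toy PCI-X bus-occupancy model: a packet of P bytes is read from host memory
# as ceil(P / mmrbc) bursts, each paying a fixed per-burst overhead on top of
# the data phase.  OVERHEAD_US and the 133 MHz x 8-byte ideal data rate are
# illustrative assumptions, not measured values.
import math

PKT_BYTES   = 16080
BUS_BYTES_S = 133e6 * 8      # 64-bit PCI-X at 133 MHz, ideal data phase
OVERHEAD_US = 1.0            # assumed per-burst setup/arbitration cost

def occupancy_us(mmrbc):
    bursts = math.ceil(PKT_BYTES / mmrbc)
    data_us = PKT_BYTES / BUS_BYTES_S * 1e6
    return data_us + bursts * OVERHEAD_US

for mmrbc in (512, 1024, 2048, 4096):
    t = occupancy_us(mmrbc)
    print(f"mmrbc {mmrbc:4d}: {t:5.1f} us on the bus -> "
          f"{PKT_BYTES * 8 / (t * 1e-6) / 1e9:.1f} Gbit/s ceiling")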

  24. 10 Gigabit Ethernet: TCP Data transfer on PCI-X
  [Logic-analyser trace: data transfer and CSR access phases.]
  • Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
  • Connected via 6509
  • XFrame II NIC
  • PCI-X mmrbc 4096 bytes, 66 MHz
  • Two 9000 byte packets b2b
  • Ave rate 2.87 Gbit/s
  • Burst of packets, length 646.8 µs
  • Gap between bursts 343 µs
  • 2 interrupts / burst

  25. 10 Gigabit Ethernet: UDP Data transfer on PCI-X
  [Logic-analyser trace: data transfer and CSR access phases; CSR access takes 2.8 µs.]
  • Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
  • Connected via 6509
  • XFrame II NIC
  • PCI-X mmrbc 2048 bytes, 66 MHz
  • One 8000 byte packet
  • 2.8 µs for CSRs
  • 24.2 µs data transfer; effective rate 2.6 Gbit/s (checked below)
  • 2000 byte packet, wait 0 µs: ~200 ms pauses
  • 8000 byte packet, wait 0 µs: ~15 ms between data blocks
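The effective rate follows directly from dividing the payload by the data-transfer time on the bus; a one-line check using the slide's own figures:

# Effective PCI-X transfer rate from the slide's figures: 8000 bytes moved in
# 24.2 us of data-transfer time, with CSR access adding a further 2.8 us.
payload_bytes, data_us, csr_us = 8000, 24.2, 2.8
print(f"data phase only : {payload_bytes * 8 / (data_us * 1e-6) / 1e9:.2f} Gbit/s")            # ~2.64
print(f"incl. CSR access: {payload_bytes * 8 / ((data_us + csr_us) * 1e-6) / 1e9:.2f} Gbit/s")  # ~2.37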

  26. 10 Gigabit Ethernet: Neterion NIC Results
  • X5DPE-G2 Supermicro PCs B2B
  • Dual 2.2 GHz Xeon CPU
  • FSB 533 MHz
  • XFrame II NIC
  • PCI-X mmrbc 4096 bytes
  • Low UDP rates ~2.5 Gbit/s; large packet loss
  • TCP: one iperf TCP data stream 4 Gbit/s
  • Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s

  27. SC|05 Seattle-SLAC 10 Gigabit Ethernet
  • 2 lightpaths: routed over ESnet; Layer 2 over Ultra Science Net
  • 6 Sun V20Z systems per λ
  • dCache remote disk data access
  • 100 processes per node
  • Each node sends or receives
  • One data stream: 20-30 Mbit/s
  • Used Neterion NICs & Chelsio TOE
  • Data also sent to StorCloud using fibre channel links
  • Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk
