
Investigating Network Performance of Remote Real-Time Computing Farms for ATLAS Trigger DAQ

This study explores the potential use of remote real-time computing farms in the ATLAS TDAQ system, investigating network performance and connectivity. It covers the behaviour of end hosts and NICs, TCP congestion control and buffer tuning, and the ATLAS application protocol.

Presentation Transcript


  1. Investigating the Network Performance of Remote Real-Time Computing Farms for ATLAS Trigger DAQ. Richard Hughes-Jones, University of Manchester. In collaboration with: Bryan Caron, University of Alberta; Krzysztof Korcyl, IFJ PAN Krakow; Catalin Meirosu, Politehnica University of Bucuresti & CERN; Jakob Langgard Nielsen, Niels Bohr Institute. IEEE Real Time 2005, Stockholm, 4-10 June.

  2. Introduction. See also the poster: "On the Potential Use of Remote Computing Farms in the ATLAS TDAQ System".

  3. ATLAS Computing Model. [Diagram: tiered computing model; text recovered from the figure.] • Trigger & Event Builder at ~PByte/s; Event Filter ~7.5 MSI2k at 320 MByte/s (one PC in 2004 = ~1 kSpecInt2k). • Tier 0 at the CERN Computer Centre: ~5 PByte/year, no simulation; PBytes of disk plus tape robot (Castor, ~75 MB/s per Tier 1 for ATLAS). • Tier 1 regional centres (UK at RAL, US, Dutch, French, Northern Tier ~200 kSI2k): ~2 PByte/year each, with mass storage. • Tier 2 centres (~200 kSI2k each, e.g. Lancaster ~0.25 TIPS, Liverpool, Manchester, Sheffield): 622 Mbit/s - 1 Gbit/s links, ~200 TByte/year each, with a physics data cache feeding desktops over 100 - 1000 MB/s links. • Remote institutes perform filtering, calibration and monitoring. • Key ingredients: high-bandwidth network, many processors, experts at remote sites.

  4. Remote Computing Concepts. [Diagram: the ATLAS detectors feed the Level 1 Trigger; ROBs feed the Level 2 Trigger (L2PUs) and, via the Data Collection Network, the Event Builders (SFIs); the Back End Network connects SFIs to local Event Processing Farms (PFs and SFOs) at CERN B513 with mass storage in the experimental area; remote Event Processing Farms in Copenhagen, Edmonton, Krakow and Manchester are reached over GÉANT lightpaths.]

  5. ATLAS Remote Farms – Network Connectivity

  6. SFI and SFO: the ATLAS Application Protocol. [Sequence diagram: the Event Filter Daemon (EFD) requests an event, the SFI sends the event data, the event is processed, the EFD requests buffer space from the SFO, the SFO sends OK, and the EFD sends the processed event; the request-response time is histogrammed.] • Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes). • The event is processed. • Return of the computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation. • tcpmon: an instrumented TCP request-response program that emulates the Event Filter's EFD-to-SFI communication.
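tcpmon itself is an instrumented C program; purely as an illustration of the request-response pattern it emulates, the sketch below times EFD-style event requests over a plain TCP socket. The fixed-size framing and the responder it talks to are assumptions for illustration, not the real EFD/SFI wire format.

```python
# Minimal sketch of a tcpmon-style request-response probe.
# Assumption: a responder that returns RESPONSE_SIZE bytes per request;
# this is NOT the real ATLAS EFD/SFI protocol, just its traffic pattern.
import socket
import time

REQUEST_SIZE = 64                # small "event request", as in the slides
RESPONSE_SIZE = 2 * 1024 * 1024  # ~2 Mbyte event

def request_response_times(host, port, n_events=100):
    """Return per-event request-response latencies in seconds."""
    latencies = []
    with socket.create_connection((host, port)) as sock:
        for _ in range(n_events):
            t0 = time.perf_counter()
            sock.sendall(b"R" * REQUEST_SIZE)      # "EFD requests an event"
            remaining = RESPONSE_SIZE
            while remaining > 0:                   # "SFI replies with the event"
                chunk = sock.recv(min(65536, remaining))
                if not chunk:
                    raise ConnectionError("peer closed mid-event")
                remaining -= len(chunk)
            latencies.append(time.perf_counter() - t0)
    return latencies
```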

  7. Networks and End Hosts

  8. End Hosts & NICs: CERN-NAT-Manchester. [Plots: throughput, packet loss, re-ordering, and request-response latency.] • UDP packets are used to characterise the host, NIC and network. • Test hosts: SuperMicro P4DP8 motherboard, dual 2.2 GHz Xeon CPUs, 400 MHz system bus, 64-bit 66 MHz PCI / 133 MHz PCI-X bus. • The network can sustain 1 Gbit/s of UDP traffic. • The average server can lose smaller packets: the loss is caused by lack of processing power in the PC receiving the traffic. • Out-of-order packets are due to WAN routers; lightpaths look like extended LANs and show no re-ordering.
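The actual measurements were made with UDPmon; as a rough sketch of the technique, the pair of functions below sends a sequence-numbered UDP stream and counts loss and re-ordering at the far end. The 4-byte sequence-number framing is an assumption for illustration, not UDPmon's real format.

```python
# Sketch of a UDPmon-like probe: numbered UDP packets reveal loss and
# re-ordering between two hosts.
import socket

def send_burst(host, port, n_packets=10_000, size=1472):
    """Send n_packets sequence-numbered UDP datagrams of `size` bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytearray(size)
    for seq in range(n_packets):
        payload[:4] = seq.to_bytes(4, "big")   # hypothetical framing
        sock.sendto(payload, (host, port))

def receive_burst(port, n_expected, timeout_s=5.0):
    """Count received, lost and re-ordered packets from one burst."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(timeout_s)
    received = reordered = 0
    highest = -1
    try:
        while received < n_expected:
            data, _ = sock.recvfrom(65536)
            seq = int.from_bytes(data[:4], "big")
            if seq < highest:                  # arrived after a later packet
                reordered += 1
            highest = max(highest, seq)
            received += 1
    except socket.timeout:
        pass                                   # tail of the burst was lost
    return {"received": received,
            "lost": n_expected - received,
            "reordered": reordered}
```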

  9. Using Web100 TCP Stack Instrumentation to analyse the application protocol - tcpmon

  10. tcpmon: TCP Activity, Manchester-CERN Request-Response. • Round-trip time 20 ms. • 64-byte request (green), 1 Mbyte response (blue). • TCP is in slow start: the 1st event takes 19 rtt, or ~380 ms. • The TCP congestion window gets reset on each request, a TCP stack implementation detail that reduces Cwnd after inactivity. • Even after 10 s, each response takes 13 rtt, or ~260 ms. • Transfer achievable throughput: 120 Mbit/s.
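This Cwnd reduction after inactivity is the congestion-window validation behaviour of RFC 2861. On Linux kernels newer than the 2005-era stacks measured here it can be switched off; a minimal sketch, assuming such a kernel and root privileges:

```python
# Sketch: disable RFC 2861 cwnd decay after idle on a modern Linux host,
# so a request-response application keeps its opened congestion window.
# Requires root; this sysctl does not exist on 2005-era kernels.
from pathlib import Path

SYSCTL = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")

def keep_cwnd_across_idle():
    SYSCTL.write_text("0\n")   # 0 = do not shrink cwnd after an idle period
```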

  11. tcpmon: TCP Activity, Manchester-CERN Request-Response, TCP stack tuned. • Round-trip time 20 ms. • 64-byte request (green), 1 Mbyte response (blue). • TCP starts in slow start: the 1st event takes 19 rtt, or ~380 ms. • The TCP congestion window grows nicely; responses take 2 rtt after ~1.5 s. • Rate ~10/s (with a 50 ms wait). • Transfer achievable throughput grows to 800 Mbit/s.

  12. tcpmon: TCP Activity, Alberta-CERN Request-Response, TCP stack tuned. • Round-trip time 150 ms. • 64-byte request (green), 1 Mbyte response (blue). • TCP starts in slow start: the 1st event takes 11 rtt, or ~1.67 s. • The TCP congestion window is in slow start until ~1.8 s, then in congestion avoidance. • Responses take 2 rtt after ~2.5 s. • Rate 2.2/s (with a 50 ms wait). • Transfer achievable throughput grows slowly from 250 to 800 Mbit/s.
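The "tuned" in both runs above means socket buffers at least as large as the bandwidth-delay product (BDP), so the 1 Mbyte response is never throttled by the receive window. A sketch of that sizing; note the kernel may silently clamp the request to net.core.rmem_max / wmem_max, which is the limit slide 16 later runs into:

```python
# Sketch: size TCP socket buffers to the bandwidth-delay product.
# For the Alberta-CERN path (1 Gbit/s, rtt 150 ms) the BDP is ~18.8 Mbyte.
import socket

def bdp_bytes(rate_bit_s, rtt_s):
    return int(rate_bit_s * rtt_s / 8)

def tuned_socket(rate_bit_s=1e9, rtt_s=0.150):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    bdp = bdp_bytes(rate_bit_s, rtt_s)
    # Set before connect() so a large enough window scale is negotiated;
    # the kernel may clamp to net.core.{r,w}mem_max.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
    return sock
```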

  13. SC2004 Disk-Disk bbftp. • The bbftp file transfer program uses TCP/IP. • UKLight path: London-Chicago-London; PCs: SuperMicro + 3Ware RAID0. • MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off. • Moving a 2 Gbyte file. • Web100 plots: standard TCP averages 825 Mbit/s (bbcp: 670 Mbit/s); Scalable TCP averages 875 Mbit/s (bbcp: 701 Mbit/s, with ~4.5 s of overhead). • Disk-TCP-disk at 1 Gbit/s is here!

  14. Time Series of Request-Response Latency. • Manchester-CERN: round-trip time 20 ms, 1 Mbyte of data returned; stable for ~18 s at ~42.5 ms, then alternating points at 29 and 42.5 ms. • Alberta-CERN: round-trip time 150 ms, 1 Mbyte of data returned; stable for ~150 s at 300 ms, then falls to 160 ms with ~80 μs variation.

  15. Using the Trigger DAQ Application

  16. Time Series of T/DAQ Event Rate. • Manchester-CERN: round-trip time 20 ms, 1 Mbyte of data returned. • Configurations: 3 nodes (one Gigabit Ethernet node plus two 100 Mbit nodes), 2 nodes (two 100 Mbit nodes), 1 node (one 100 Mbit node). • Event rate: using the tcpmon transfer time of ~42.5 ms and adding the time to return the data gives ~95 ms per event, so the expected rate is 10.5/s. • Only ~6/s is observed for the gigabit node. • Reason: the TCP buffers could not be set large enough in the T/DAQ application.
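The expected rate follows directly from the per-event time budget; a quick check of the slide's arithmetic (the split of the 95 ms cycle between transfer and return is implied rather than stated):

```python
# The slide's event-rate budget: ~42.5 ms tcpmon transfer time plus the
# time to return the processed data, ~95 ms per event in total.
transfer_s = 0.0425
return_s = 0.095 - transfer_s          # implied by the 95 ms total
rate = 1.0 / (transfer_s + return_s)
print(f"expected rate: {rate:.1f} events/s")   # -> 10.5 events/s
```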

  17. Tcpdump of the Trigger DAQ Application

  18. tcpdump of the T/DAQ Dataflow at the SFI (1). • CERN-Manchester, 1.0 Mbyte event. • The remote EFD requests an event from the SFI: the incoming event request is followed by an ACK. • The SFI sends the event in 1448-byte packets, limited by the TCP receive buffer; as TCP ACKs arrive, more data is sent. • Time: 115 ms (~4 events/s).
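Such tcpdump traces can also be replayed offline; a hedged sketch using scapy (the capture file name is hypothetical), picking out the ACK-clocked bursts and rtt-long stalls described above:

```python
# Sketch: inspect a tcpdump capture of the SFI dataflow offline with scapy.
# "sfi.pcap" is a hypothetical file, recorded e.g. with:
#   tcpdump -i eth0 -w sfi.pcap host <sfi> and port <daq_port>
from scapy.all import rdpcap, TCP

packets = rdpcap("sfi.pcap")
times = [float(p.time) for p in packets if p.haslayer(TCP)]
gaps = [t1 - t0 for t0, t1 in zip(times, times[1:])]
# Runs of near-zero gaps are the data bursts; gaps of ~1 rtt are the
# stalls where the sender waits for ACKs (or the receive window).
print(f"{len(times)} TCP packets, longest stall {max(gaps) * 1e3:.1f} ms")
```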

  19. tcpdump of TCP Slow Start at the SFI (2). • CERN-Manchester, 1.0 Mbyte event. • The remote EFD makes its first event request to the SFI. • The SFI sends the event in 1448-byte packets, limited by TCP slow start; as ACKs arrive, more data is sent. • Time: 320 ms.

  20. tcpdump of the T/DAQ Dataflow for SFI & SFO. • CERN-Manchester, another test run, 1.0 Mbyte event. • The remote EFD requests events from the SFI and sends the computation back to the SFO. • Link setup and TCP slow start are visible; the links are closed by the application.

  21. Some First Conclusions. • The TCP protocol dynamics strongly influence the behaviour of the application. • Care is required with the application design, e.g. the use of timeouts. • With the correct TCP buffer sizes: it is not throughput but the round-trip nature of the application protocol that determines performance; requesting the 1-2 Mbytes of data takes 1 or 2 round trips; TCP slow start (the opening of Cwnd) considerably lengthens the time for the first block of data; implementation "improvements" (Cwnd reduction) kill performance! • When the TCP buffer sizes are too small (the default): the amount of data sent on each rtt is limited; data is sent and arrives in bursts; it takes many round trips to send 1 or 2 Mbytes. • The end hosts themselves: CPU power is required for the TCP/IP stack as well as the application; packets can be lost in the IP stack due to lack of processing power.

  22. Summary. • We are investigating the technical feasibility of remote real-time computing for ATLAS. • We have exercised multiple 1 Gbit/s connections between CERN and universities in Canada, Denmark, Poland and the UK. • Network providers are very helpful and interested in our experiments. • We developed a set of tests for characterising the network connections. • Network behaviour is generally good, e.g. little packet loss observed; backbones tend to be over-provisioned, but access links and campus LANs need care. • Properly configured end nodes are essential for getting good results with real applications. • Collaboration between the experts from the application and network teams is progressing well and is required to achieve performance. • Although the application is ATLAS-specific, the information presented on the network interactions is applicable to other areas, including remote iSCSI, remote database access, and real-time Grid computing, e.g. real-time interactive medical image processing.

  23. Thanks to all who helped, including: • National research networks: Canarie, Dante, DARENET, Netera, PSNC and UKERNA. • "ATLAS remote farms": J. Beck Hansen, R. Moore, R. Soluk, G. Fairey, T. Bold, A. Waananen, S. Wheeler, C. Bee. • "ATLAS online and dataflow software": S. Kolos, S. Gadomski, A. Negri, A. Kazarov, M. Dobson, M. Caprini, P. Conde, C. Haeberli, M. Wiesmann, E. Pasqualucci, A. Radu.

  24. More Information - Some URLs. • Real-Time Remote Farm site: http://csr.phys.ualberta.ca/real-time • UKLight web site: http://www.uklight.ac.uk • DataTAG project web site: http://www.datatag.org/ • UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/ (Software & Tools) • Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/ ; "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue 2004: http://www.hep.man.ac.uk/~rich/ (Publications) • TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html • TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004: http://www.hep.man.ac.uk/~rich/ (Publications) • PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/ • Dante PERT: http://www.geant2.net/server/show/nav.00d00h002

  25. Any Questions?

  26. Backup Slides

  27. End Hosts & NICs: CERN-Manchester. [Plots: throughput, packet loss, re-ordering, and request-response latency.] • UDP packets are used to characterise the host and NIC. • Test hosts: SuperMicro P4DP8 motherboard, dual 2.2 GHz Xeon CPUs, 400 MHz system bus, 64-bit 66 MHz PCI bus.

  28. TCP (Reno) – Details. • The time for TCP to recover its throughput after one lost packet is given by τ = C · rtt² / (2 · MSS), where C is the path capacity and MSS the maximum segment size. • [Plot: recovery time versus rtt, with typical round-trip times marked: UK 6 ms, Europe 20 ms, USA 150 ms.] • For an rtt of ~200 ms the recovery takes ~2 minutes.
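A worked evaluation of the recovery-time formula, assuming a 1 Gbit/s path and a 1460-byte MSS (neither capacity nor MSS is fixed by the slide):

```python
# tau = C * rtt^2 / (2 * MSS): after one loss, Reno halves cwnd and grows
# it back by one MSS per rtt, so recovery time scales with rtt squared.
MSS_BITS = 1460 * 8   # assumed 1460-byte segments

def recovery_time_s(capacity_bit_s, rtt_s):
    return capacity_bit_s * rtt_s ** 2 / (2 * MSS_BITS)

for label, rtt in [("UK", 0.006), ("Europe", 0.020), ("USA", 0.150)]:
    print(f"{label}: {recovery_time_s(1e9, rtt):.1f} s")
# -> UK ~1.5 s, Europe ~17 s, USA ~963 s (~16 min) at 1 Gbit/s
```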
