
ATLAS Networking & T2UK




  1. ATLAS Networking & T2UK
     Richard Hughes-Jones, The University of Manchester
     www.hep.man.ac.uk/~rich/ then “Talks”
     T2UK RAL 15 Mar 2006, R. Hughes-Jones Manchester

  2. Remote Computing Farms
     • Discussion at CERN to establish a work-plan for 2006
     • Valuable for Monitoring and Calibration
     • MOU: Alberta, CERN, Krakow, Manchester
     • New Network Topology with all links carried by GÉANT and NRNs
     • Planned Investigations
       • Characterise the new network links and end host performance
         • Tools: iperf, udpmon, thrulay, yatm
       • Measure the ATLAS request-response behaviour
         • Tools: tcpmon, web100, tcpdump
       • Set up the WAN emulator with the measured conditions
       • Compare network and ATLAS traffic observations
       • Install and test ATLAS application gateway (as used at the pit)
       • Test deployment of Online TDAQ HLT releases
       • Measure performance of Online TDAQ HLT releases
       • Consider how to link Real-Time T/DAQ to remote Grid farms
     • First draft of Work Plan document circulated

  3. Network Operation & Performance
     • Analysis of Fault Tolerance in ATLAS T/DAQ Networks
       • Document the action of the switches
       • Fate of the packets
       • Effect on T/DAQ applications
     • Networks Considered:
       • Front End (DataFlow) Network
       • BackEnd Network
       • Controls Network (Run control, services, some monitoring)
     • Consider questions like:
       • “Failure of a link between the ROS and the ROS Concentrator Switch”
     • Draft Document being discussed
     • Performance tests discussed
       • The PCI-e 4 × 1GE PEG4 NIC from Silicom
         • Simple and trunking throughput
       • ROS SuperMicro Motherboard
         • 6 PCI, one 4-lane PCI-e, one 3.4 GHz Xeon (dual socket)

  4. Network Monitoring in ATLAS T/DAQ
     • Levels of Monitoring
       • SNMP Statistics: MRTG, RRD, YATM (higher sample rate)
         • Traffic patterns, bytes, packets; NOT dropped packets
       • Network test programs: udpmon, iperf
         • Throughput, loss, 1-way delay, rtt
       • Standalone ATLAS test programs speaking the TDAQ application protocol (Richard)
       • ATLAS test programs speaking the TDAQ application protocol using TDAQ APIs (Stefan)
       • Monitoring by the TDAQ application itself
     • Integration of Message Passing Libraries
       • DataFlow (Reiner) and EF (Mario): main difference in substantiation of buffers
       • Integrate over common thin shim over the socket calls
     • Idea to put monitoring into (common) message passing layer
       • What can be observed?
       • Question of keeping state: Application would be the best place!
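The SNMP-statistics level on slide 4 is typically built on interface octet counters that are polled periodically and turned into rates. The following is a minimal sketch of that arithmetic, not the code used by MRTG, RRD or YATM. The function name and the sample values are hypothetical; the only outside fact assumed is that the standard IF-MIB `ifInOctets` / `ifOutOctets` counters are 32 bits wide and wrap to zero.

```python
def counter_rate_bits_per_s(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Average bit rate between two readings of a wrapping octet counter.

    Hypothetical helper for illustration; handles at most one wrap
    between the two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction recovers the true delta across a single wrap.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s


def wrap_time_s(link_bits_per_s, counter_bits=32):
    """Time for an octet counter to wrap on a fully loaded link."""
    return (1 << counter_bits) / (link_bits_per_s / 8)


if __name__ == "__main__":
    # Two made-up readings, one second apart, straddling a counter wrap.
    print(counter_rate_bits_per_s(4294967000, 704, 1.0))
    # A 32-bit octet counter on a saturated 1 Gbit/s link wraps in about 34 s,
    # so polling must be more frequent than that to avoid ambiguity.
    print(wrap_time_s(1e9))
```

Counters of this kind report bytes and packets only, which matches the slide's remark that this level does not show where packets were dropped.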

  5. Related Work: RAID, ATLAS Grid
     • RAID0 and RAID5 tests
       • 4th Year MPhys project last semester
       • Throughput and CPU load
     • Different RAID parameters
       • Number of disks
       • Stripe size
       • User read / write size
     • Different file systems
       • ext2, ext3, XFS
     • Sequential File Write, Read
     • Sequential File Write, Read with continuous background read or write
     • Status
       • Need to check some results & document
       • Independent RAID controller tests planned

  6. ESLEA: ATLAS Grid on UKLight
     • Demonstration of benefits of Dedicated links
     • 1 Gbit Lightpath Lancaster-Manchester
     • Disk 2 Disk Transfers
     • Storage Element with SRM using distributed disk pools: dCache & xrootd

  7. Check out the end host: bbftp
     • What is the end-host doing with your application protocol?
     • Transatlantic bbftp over TCP/IP
     • Look at the PCI-X buses
       • 3Ware 9000 controller RAID0
       • 1 Gbit Ethernet link
       • 2.4 GHz dual Xeon
       • ~660 Mbit/s
     [Figure: PCI-X bus traces. PCI-X bus with RAID Controller: read from disk for 44 ms every 100 ms. PCI-X bus with Ethernet NIC: write to network for 72 ms.]
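One way to relate the figures on slide 7 is a duty-cycle estimate. The sketch below rests on assumptions that the slide does not state: that the 72 ms network write recurs once per 100 ms period, that the link runs at line rate during the burst, and that frames carry 1448 payload bytes per 1538 bytes on the wire (1500-byte MTU with TCP timestamps, plus Ethernet framing, preamble and inter-frame gap). All names are hypothetical.

```python
LINK_MBIT_S = 1000.0  # 1 Gbit Ethernet link (from the slide)
PERIOD_MS = 100.0     # disk read cycle quoted on the slide
WRITE_MS = 72.0       # network write time; ASSUMED to recur every 100 ms

# ASSUMED framing: 1448 TCP payload bytes per 1538 bytes on the wire.
PAYLOAD_FRACTION = 1448 / 1538


def average_rate_mbit_s(busy_ms, period_ms, burst_rate_mbit_s):
    """Mean rate of a transfer that bursts at burst_rate for busy_ms per period."""
    if not 0 <= busy_ms <= period_ms:
        raise ValueError("busy_ms must lie within the period")
    return burst_rate_mbit_s * busy_ms / period_ms


wire_rate = average_rate_mbit_s(WRITE_MS, PERIOD_MS, LINK_MBIT_S)
goodput = wire_rate * PAYLOAD_FRACTION

if __name__ == "__main__":
    print(f"average wire rate: {wire_rate:.0f} Mbit/s")
    print(f"estimated user-data rate: {goodput:.0f} Mbit/s")
```

Under these assumptions the estimate falls in the same range as the ~660 Mbit/s quoted on the slide, but the slide itself does not give this derivation.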

  8. Any Questions?

  9. Backup Slides

  10. TCP Stacks & CPU Load
      • Real User problem!
      • End host TCP flow at 960 Mbit/s with rtt 1 ms falls to 770 Mbit/s when rtt 15 ms
      • 1.2 GHz PIII, rtt 1 ms
        • TCP iperf 980 Mbit/s
        • Kernel mode 95%, Idle 1.3%
        • CPULoad with nice priority
        • Throughput falls as priority increases
        • No Loss, No Timeouts
        • Not enough CPU power
      • 2.8 GHz Xeon, rtt 1 ms
        • TCP iperf 916 Mbit/s
        • Kernel mode 43%, Idle 55%
        • CPULoad with nice priority
        • Throughput constant as priority increases
        • No Loss, No Timeouts
      • Kernel mode includes TCP stack and Ethernet driver
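Slide 10 reports an end-host TCP flow dropping from 960 Mbit/s at 1 ms rtt to 770 Mbit/s at 15 ms rtt. The slide does not attribute the drop to a specific cause. One quantity commonly checked in this situation is the bandwidth-delay product, the amount of data that must be in flight to keep the link full. The sketch below only does that arithmetic; the function names are hypothetical and the 65535-byte window is an illustrative value (the largest TCP window without window scaling), not a measured setting from the talk.

```python
def bdp_bytes(link_mbit_s, rtt_ms):
    """Bandwidth-delay product in bytes: data in flight needed to fill the link."""
    # Mbit/s * ms = kbit; 1 kbit = 125 bytes.
    return link_mbit_s * rtt_ms * 125


def window_limited_mbit_s(window_bytes, rtt_ms):
    """Upper bound on TCP throughput when at most one window is sent per rtt."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 / (rtt_ms * 1000)


if __name__ == "__main__":
    for rtt_ms in (1, 15):
        print(f"rtt {rtt_ms} ms: BDP at 1 Gbit/s = {bdp_bytes(1000, rtt_ms)} bytes")
    # Illustrative only: an unscaled 65535-byte window at 15 ms rtt.
    print(f"{window_limited_mbit_s(65535, 15):.1f} Mbit/s")
```

At 1 Gbit/s the required in-flight data grows from 125,000 bytes at 1 ms to 1,875,000 bytes at 15 ms, so socket buffers and window settings that are ample on a LAN may become a limit on longer paths.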

  11. A Few Items for Discussion
      • Achievable Throughput
      • Sharing link Capacity (OK, what is sharing?)
      • Convergence time
      • Responsiveness
      • rtt fairness (OK, what is fairness?)
      • mtu fairness
      • TCP friendliness
      • Link utilisation (by this flow or all flows)
      • Stability of Achievable Throughput
      • Burst behaviour
      • Packet loss behaviour
      • Packet re-ordering behaviour
      • Topology: maybe some “simple” setups
      • Background or cross traffic: how realistic does it need to be? What protocol mix?
      • Reverse traffic
      • Impact on the end host: CPU load, bus utilisation, Offload
      • Methodology: simulation, emulation and Real links ALL help
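Slide 11 asks "what is fairness?" without settling on a definition. One widely used measure, offered here only as an example and not as the definition the talk had in mind, is Jain's fairness index: for n flows with throughputs x_i it is (sum x_i)^2 / (n * sum x_i^2), which equals 1 when all flows get the same rate and 1/n when a single flow takes everything. The function name and sample throughputs below are hypothetical.

```python
def jain_index(throughputs):
    """Jain's fairness index for a list of non-negative per-flow throughputs."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    if any(x < 0 for x in throughputs):
        raise ValueError("throughputs must be non-negative")
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("at least one flow must have non-zero throughput")
    total = sum(throughputs)
    return total * total / (n * sum_sq)


if __name__ == "__main__":
    # Made-up throughputs in Mbit/s for two flows sharing a 1 Gbit/s link.
    print(jain_index([500, 500]))  # equal split
    print(jain_index([900, 100]))  # one flow dominates
    print(jain_index([1000, 0]))   # one flow takes everything
```

The same index can be applied to flows that differ in rtt or mtu to put a number on the "rtt fairness" and "mtu fairness" items, though it says nothing about convergence time or stability.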

  12. More Information: Some URLs 1
      • UKLight web site: http://www.uklight.ac.uk
      • MB-NG project web site: http://www.mb-ng.net/
      • DataTAG project web site: http://www.datatag.org/
      • UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
      • Motherboard and NIC Tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
        “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
      • TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
      • TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
      • PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
      • Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
