Lessons Learned in Grid Networking, or How do we get end-2-end performance to Real Users?

Presentation Transcript


  1. Lessons Learned in Grid Networking, or How do we get end-2-end performance to Real Users?
     Richard Hughes-Jones (Manchester)
     GNEW2004, CERN, March 2004

  2. Network Monitoring is Essential
     • Detect or X-check problem reports
     • Isolate / determine a performance issue
     • Capacity planning
     • Publication of data: network “cost” for middleware
        • RBs for optimized matchmaking
        • WP2 Replica Manager
     • Capacity planning
     • SLA verification
     • Isolate / determine throughput bottleneck – work with real user problems
     • Test conditions for Protocol/HW investigations
     • Protocol performance / development
     • Hardware performance / development
     • Application analysis
     • Input to middleware – e.g. gridftp throughput
     • Isolate / determine a (user) performance issue
     • Hardware / protocol investigations
     • End2End Time Series
        • Throughput UDP/TCP
        • RTT
        • Packet loss
     • Passive Monitoring
        • Routers, Switches: SNMP, MRTG
        • Historical MRTG
     • Packet/Protocol Dynamics
        • tcpdump
        • web100
        • Output from Application tools
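
The passive-monitoring entry (router and switch SNMP counters graphed by MRTG) comes down to differencing interface octet counters between polls. The sketch below is a minimal illustration, assuming 32-bit `ifInOctets`-style counters that wrap at 2^32 and at most one wrap per polling interval; the function names are illustrative, not MRTG's own code.

```python
# Sketch: turning two SNMP interface octet-counter readings into a link rate,
# the way an MRTG-style poller does. Assumes a 32-bit counter (Counter32)
# that wraps at most once between polls; function names are illustrative.

COUNTER32_MODULUS = 2 ** 32


def octet_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two polls; the modulo absorbs a single wrap."""
    return (curr - prev) % modulus


def link_rate_bps(prev: int, curr: int, interval_s: float,
                  modulus: int = COUNTER32_MODULUS) -> float:
    """Average bit rate over one polling interval."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return octet_delta(prev, curr, modulus) * 8 / interval_s


def utilisation(rate_bps: float, if_speed_bps: float) -> float:
    """Fraction of the interface speed in use."""
    return rate_bps / if_speed_bps


if __name__ == "__main__":
    # Illustrative readings only: the counter wrapped between the two polls.
    prev, curr = 4_294_000_000, 1_000_000
    rate = link_rate_bps(prev, curr, interval_s=30.0)
    print(f"{rate / 1e6:.3f} Mbit/s = {utilisation(rate, 1e9):.3%} of 1 Gbit/s")
```

A 32-bit octet counter on a fully loaded 1 Gbit/s link wraps after about 34 s (2^32 × 8 bits / 10^9 bit/s), so with multi-minute polling the single-wrap assumption breaks; the 64-bit `ifHCInOctets` counters from IF-MIB avoid this.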

  3. Multi-Gigabit transfers are possible and stable
     10 GigEthernet at SC2003 BW Challenge
     • Three server systems with 10 GigEthernet NICs
     • Used the DataTAG altAIMD stack, 9000 byte MTU
     • Sent memory-to-memory iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
        • Palo Alto PAIX
           • rtt 17 ms, window 30 MB
           • Shared with Caltech booth
           • 4.37 Gbit/s HighSpeed TCP (hstcp), I=5%
           • Then 2.87 Gbit/s, I=16%
           • Fall corresponds to 10 Gbit/s on link
           • 3.3 Gbit/s Scalable TCP, I=8%
           • Tested 2 flows: sum 1.9 Gbit/s, I=39%
        • Chicago Starlight
           • rtt 65 ms, window 60 MB
           • Phoenix CPU 2.2 GHz
           • 3.1 Gbit/s hstcp, I=1.6%
        • Amsterdam SARA
           • rtt 175 ms, window 200 MB
           • Phoenix CPU 2.2 GHz
           • 4.35 Gbit/s hstcp, I=6.9%
           • Very stable
        • Both used Abilene to Chicago
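
The window sizes on this slide track the bandwidth-delay product (BDP): to keep a path full, TCP needs roughly rate × RTT bytes in flight, and a fixed window W caps throughput at W / RTT. A small worked check using the slide's RTTs and windows, assuming "MB" means 10^6 bytes and a 10 Gbit/s target rate (both assumptions, not stated on the slide):

```python
# Sketch: bandwidth-delay product and the window-limited throughput bound
# for the SC2003 paths. MB is taken as 10**6 bytes and the target rate as
# 10 Gbit/s; both are assumptions for illustration.


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# (rtt in seconds, window in bytes) taken from the slide.
PATHS = {
    "Palo Alto PAIX": (0.017, 30e6),
    "Chicago Starlight": (0.065, 60e6),
    "Amsterdam SARA": (0.175, 200e6),
}

if __name__ == "__main__":
    for name, (rtt, window) in PATHS.items():
        need_mb = bdp_bytes(10e9, rtt) / 1e6
        cap_gbps = window_limited_rate_bps(window, rtt) / 1e9
        print(f"{name}: BDP at 10 Gbit/s = {need_mb:.1f} MB, "
              f"window caps TCP at {cap_gbps:.2f} Gbit/s")
```

By this simple bound the 60 MB and 200 MB windows sit below the 10 Gbit/s BDP for Chicago and Amsterdam, yet still allow about 7.4 and 9.1 Gbit/s, above the 3.1 and 4.35 Gbit/s measured, so the window alone would not explain those rates.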

  4. The performance of the end host / disks is really important
     BaBar Case Study: RAID Throughput & PCI Activity
     • 3Ware 7500-8 RAID5, parallel EIDE
     • 3Ware forces PCI bus to 33 MHz
     • BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
     • Disk-to-disk throughput with bbcp: 40-45 Mbytes/s (320-360 Mbit/s)
     • PCI bus effectively full!
     • User throughput ~250 Mbit/s
     • User surprised!!
     [Plots: Read from RAID5 disks; Write to RAID5 disks]

  5. MB-NG Application design – Throughput + Web100
     • 2 Gbyte file transferred, RAID0 disks
     • Web100 output every 10 ms
     • Gridftp
        • Throughput alternates between 600/800 Mbit/s and zero
     • Apache web server + curl-based client
        • Steady 720 Mbit/s
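
Web100 exposes per-connection TCP counters, which the slide samples every 10 ms. The sketch below assumes you already hold a list of cumulative bytes-sent readings taken at a fixed 10 ms interval (reading them from Web100 is not shown, and the function names are hypothetical). It turns those readings into a per-interval throughput series and separates the average rate from the rate while active, which is what distinguishes an on/off trace from a steady one.

```python
# Sketch: per-interval throughput from cumulative byte counts sampled at a
# fixed period (10 ms here, as on the slide). Reading the counts from Web100
# is not shown; the input list and function names are hypothetical.
from statistics import mean
from typing import Sequence


def interval_rates_mbps(cum_bytes: Sequence[int], dt_s: float = 0.010) -> list[float]:
    """Mbit/s achieved in each sampling interval."""
    return [(b - a) * 8 / dt_s / 1e6 for a, b in zip(cum_bytes, cum_bytes[1:])]


def summarise(rates: Sequence[float], idle_mbps: float = 1.0) -> dict[str, float]:
    """Mean rate, fraction of idle intervals, and mean rate while active."""
    if not rates:
        raise ValueError("need at least one interval")
    active = [r for r in rates if r > idle_mbps]
    return {
        "mean_mbps": mean(rates),
        "idle_fraction": 1 - len(active) / len(rates),
        "mean_active_mbps": mean(active) if active else 0.0,
    }


if __name__ == "__main__":
    # Synthetic on/off trace: 1 MB per 10 ms (800 Mbit/s), then nothing.
    cum = [0, 1_000_000, 1_000_000, 2_000_000, 2_000_000]
    print(summarise(interval_rates_mbps(cum)))
```

An on/off trace reports its full rate while active, but its average is scaled down by the fraction of intervals spent idle; comparing `mean_mbps` with `mean_active_mbps` separates the two effects.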

  6. Network Monitoring is vital
     • Development of new TCP stacks and non-TCP protocols is required
     • Multi-Gigabit transfers are possible and stable on current networks
     • Complementary provision of packet IP & λ-Networks is needed
     • The performance of the end host / disks is really important
     • Application design can determine Perceived Network Performance
     • Helping Real Users is a must – can be harder than herding cats
     • Cooperation between Network providers, Network Researchers, and Network Users has been impressive
     • Standards (e.g. GGF / IETF) are the way forward
     • Many grid projects just assume the network will work!!!
     • It takes lots of co-operation to put all the components together

  8. Tuning PCI-X: Variation of mmrbc (maximum memory read byte count), IA32
     • 16080 byte packets every 200 µs
     • Intel PRO/10GbE LR Adapter
     • PCI-X bus occupancy vs mmrbc
     • Plot:
        • Measured times
        • Times based on PCI-X times from the logic analyser
        • Expected throughput
     [Logic-analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes; phases marked: CSR Access, PCI-X Sequence, Data Transfer, Interrupt & CSR Update]
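
Each PCI-X memory read the NIC issues can fetch at most mmrbc bytes, so a 16080-byte packet needs fewer, longer transactions as mmrbc grows, and every transaction carries some fixed protocol overhead. The sketch below counts transactions per packet and estimates bus time, assuming a 64-bit, 133 MHz PCI-X bus and a made-up per-transaction overhead; it is an illustration of the trend, not the model behind the slide's "expected throughput" curve.

```python
# Sketch: how mmrbc (maximum memory read byte count) changes the number of
# PCI-X read transactions a NIC needs per packet. Bus width (64-bit) and clock
# (133 MHz) are assumptions, as is the per-transaction overhead in the demo.
import math

PACKET_BYTES = 16080        # packet size from the slide
PACKET_INTERVAL_S = 200e-6  # one packet every 200 µs (from the slide)


def offered_rate_bps(packet_bytes: int = PACKET_BYTES,
                     interval_s: float = PACKET_INTERVAL_S) -> float:
    """Data rate the sender asks the bus and NIC to carry."""
    return packet_bytes * 8 / interval_s


def bursts_per_packet(packet_bytes: int, mmrbc: int) -> int:
    """Read transactions needed when each may fetch at most mmrbc bytes."""
    return math.ceil(packet_bytes / mmrbc)


def bus_time_s(packet_bytes: int, mmrbc: int, overhead_clocks: int,
               bytes_per_clock: int = 8, bus_hz: float = 133e6) -> float:
    """Payload clocks plus a fixed overhead per transaction, in seconds."""
    payload_clocks = math.ceil(packet_bytes / bytes_per_clock)
    overhead = bursts_per_packet(packet_bytes, mmrbc) * overhead_clocks
    return (payload_clocks + overhead) / bus_hz


if __name__ == "__main__":
    HYPOTHETICAL_OVERHEAD_CLOCKS = 10  # illustrative only, not measured
    print(f"offered load {offered_rate_bps() / 1e6:.1f} Mbit/s")
    for mmrbc in (512, 1024, 2048, 4096):
        n = bursts_per_packet(PACKET_BYTES, mmrbc)
        t = bus_time_s(PACKET_BYTES, mmrbc, HYPOTHETICAL_OVERHEAD_CLOCKS)
        print(f"mmrbc {mmrbc:4d}: {n:2d} transactions, ~{t * 1e6:.1f} µs on the bus")
```

With the slide's packet size, going from mmrbc 512 to 4096 bytes cuts the transactions per packet from 32 to 4, so whatever the real per-transaction cost is, it is paid eight times less often.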

  9. GGF: Hierarchy Characteristics Document
     • “A Hierarchy of Network Performance Characteristics for Grid Applications and Services”
     • Document defines terms & relations:
        • Network characteristics
        • Measurement methodologies
        • Observation
     • Discusses Nodes & Paths
     • For each Characteristic:
        • Defines the meaning
        • Attributes that SHOULD be included
        • Issues to consider when making an observation
     • Status:
        • Originally submitted to GFSG as Community Practice Document draft-ggf-nmwg-hierarchy-00.pdf, Jul 2003
        • Revised to Proposed Recommendation http://www-didc.lbl.gov/NMWG/docs/draft-ggf-nmwg-hierarchy-02.pdf, 7 Jan 04
        • Now in 60 day Public comment from 28 Jan 04 – 18 days to go.
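
The draft separates a network characteristic, the methodology used to measure it, and an observation made on a path between nodes. The record type below is a hypothetical illustration of that separation, not the schema the GGF draft defines: the field names and the dotted characteristic names (used here to mimic a hierarchy) are assumptions.

```python
# Sketch: a record type separating the three ideas the GGF draft defines:
# a network characteristic, the methodology used to measure it, and an
# observation on a path between nodes. Field names and the dotted
# characteristic names are hypothetical, not the draft's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Path:
    src: str  # source node
    dst: str  # destination node


@dataclass
class Observation:
    characteristic: str  # position in the hierarchy, e.g. "delay.roundtrip"
    methodology: str     # how it was measured, e.g. "ping"
    path: Path
    value: float
    unit: str
    when: datetime
    attributes: dict[str, str] = field(default_factory=dict)  # extra context


def select(observations: list[Observation], prefix: str) -> list[Observation]:
    """Observations of a characteristic or any of its children in the hierarchy."""
    return [o for o in observations
            if o.characteristic == prefix or o.characteristic.startswith(prefix + ".")]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    p = Path("host-a.example", "host-b.example")  # hypothetical nodes
    obs = [
        Observation("delay.roundtrip", "ping", p, 17.0, "ms", now),
        Observation("bandwidth.achievable", "iperf", p, 4.37, "Gbit/s", now,
                    {"protocol": "TCP"}),
    ]
    print([o.methodology for o in select(obs, "delay")])
```

Keeping the methodology and the attributes on every observation, rather than only the value, is what lets a consumer such as a resource broker judge whether two measurements of the same characteristic are comparable.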