
Network Monitoring in the BaBar Experiment



Presentation Transcript


  1. Network Monitoring in the BaBar Experiment S. Luitz, D. Millsom, D. Salomoni CHEP2000 - Padova, February 2000

  2. Summary
     • The BaBar Data Acquisition Network
     • A Typical Scenario...
     • Traffic Monitoring and Recording
     • Traffic Dump Analysis Tools
     • Real-Time Analysis of Traffic
     • Conclusions and Outlook

  3. The BaBar Data Acquisition Network (1)
     • ca. 200 VME single-board computers (VxWorks): 100 Mbit/s full-duplex Ethernet
     • 78 Sun Ultra 5 "farm" workstations for Level-3 trigger and fast monitoring: 2 x 100 Mbit/s full-duplex Ethernet each ("dual-homed")
     • 5 Sun Ultra 60 application servers (e.g. run control): 100 Mbit/s full-duplex Ethernet
     • 15 Sun Ultra 5 display console machines: 10 or 100 Mbit/s Ethernet

  4. The BaBar Data Acquisition Network (2)
     • 1 Sun E 450 (4 CPU, 780 Gbyte RAID) central boot/NFS/database/data buffer server: 2 x 1 Gbit/s Ethernet
     • various development and user workstations
     • 3 Cisco Cat 5500 switches
     • 2 VLANs / IP subnets:
       • dedicated real-time DAQ network (35-40 MByte/s)
       • general-purpose / data transfer network

  5. [Figure; no text on slide]

  6. A Typical Scenario
     • Problem:
       • Shift crew reports at 23:50: "Run control server problem ca. 45 min ago"
       • A look at the system logs shows NFS timeouts at 23:08 but no network-related events (like spanning tree reconfigurations)
       • Central network monitoring shows "normal" traffic
     • What was going on? Did someone/something overload the NFS server? Database access? ...?
     • Server-based performance monitoring is very poor!
     • Wouldn't it be nice to be able to have a close look at the network traffic around 23:05?

  7. Traffic Monitoring and Recording (1)
     • We can! Even with free software tools!
     • Configure the switch to forward all traffic in the BaBar general-purpose VLAN/subnet to a monitoring port (SPAN)
     • Standard protocol analyzers are no good: their buffers are small, and what would we trigger on?
     • Sun E 250 with 72 Gbyte disk and Gigabit Ethernet as traffic recorder and protocol analyzer
     • Record packet headers into a "circular" disk buffer

  8. Traffic Monitoring and Recording (2)
     • Use tcpdump (ftp://ftp.ee.lbl.gov) to capture packet headers and write them to files
     • In our environment:
       • We can't monitor the real-time network: the switch backplane capacity could be exceeded at peak
       • We have 3 switches, but at present we only monitor the switch to which the file server is connected
     • Typical captured data rate during normal operation: 4 Gbytes / hour
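The slides give a 72 Gbyte recorder disk and a typical capture rate of 4 Gbytes / hour, which would imply roughly 18 hours of history if the whole disk were used for the buffer. The sketch below works through that arithmetic and shows one possible way to keep a directory of dump files "circular" by deleting the oldest files first. The function names, the one-directory layout, and the prune-by-modification-time policy are assumptions made for illustration; the slides do not describe how the buffer was actually managed.

```python
import os
from typing import List


def retention_hours(disk_gbytes: float, rate_gbytes_per_hour: float) -> float:
    """Hours of traffic history that fit on the disk at a constant capture rate."""
    return disk_gbytes / rate_gbytes_per_hour


def prune_oldest(directory: str, max_bytes: int) -> List[str]:
    """Delete the oldest files in `directory` until their total size is <= max_bytes.

    Hypothetical helper: "oldest" means smallest modification time.
    Returns the names of the files that were deleted, oldest first.
    """
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, name, st.st_size))
    entries.sort()  # oldest first (ties broken by name)

    total = sum(size for _, _, size in entries)
    removed = []
    for _, name, size in entries:
        if total <= max_bytes:
            break
        os.remove(os.path.join(directory, name))
        total -= size
        removed.append(name)
    return removed


if __name__ == "__main__":
    # 72 Gbyte disk at 4 Gbytes / hour (figures from the slides)
    print(retention_hours(72, 4), "hours of history")
```

Run periodically next to the capture process, a pruning step like this would keep the most recent traffic on disk while bounding disk usage.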

  9. Analysis Tools (1)
     • How to look at Gigabytes of recorded network data?
     • Use tcpdump to filter the dump file (e.g. "host bbr-srv02 and host bbr-srv03 and port nfs") into a smaller file
     • Use tcpslice (ftp://ftp.ee.lbl.gov) to isolate time intervals from the dump files
     • Use tcptrace (http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html) to automatically analyze TCP connections and plot throughput graphs
     • Look at low-rate events directly with tcpdump
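To make the "isolate time intervals" step concrete, here is a minimal sketch of time-based slicing of a dump file. It is not how tcpslice is implemented; it only assumes the classic libpcap savefile layout (a 24-byte global header followed by records with a 16-byte header of seconds, microseconds, captured length, and original length). The function name `slice_pcap` is hypothetical, and newer formats such as pcapng or nanosecond-resolution files are not handled.

```python
import struct

PCAP_GLOBAL_HDR_LEN = 24  # magic, version, timezone, sigfigs, snaplen, linktype
PCAP_RECORD_HDR_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len


def slice_pcap(src, dst, start: float, end: float) -> int:
    """Copy packets with start <= timestamp < end from src to dst.

    src and dst are binary file objects; timestamps are seconds since the epoch.
    Returns the number of packets copied.
    """
    global_hdr = src.read(PCAP_GLOBAL_HDR_LEN)
    if len(global_hdr) < PCAP_GLOBAL_HDR_LEN:
        raise ValueError("truncated pcap global header")

    # The magic number 0xa1b2c3d4 is written in the byte order of the capturing host.
    magic = global_hdr[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    dst.write(global_hdr)
    copied = 0
    while True:
        rec_hdr = src.read(PCAP_RECORD_HDR_LEN)
        if len(rec_hdr) < PCAP_RECORD_HDR_LEN:
            break  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec_hdr)
        data = src.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final record
        timestamp = ts_sec + ts_usec / 1e6
        if start <= timestamp < end:
            dst.write(rec_hdr)
            dst.write(data)
            copied += 1
    return copied
```

Because only the record headers are inspected, a pass like this streams through a multi-gigabyte file without loading it into memory.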

  10. Analysis Tools (2)
      • Sample tcptrace output for a connection (NFS); port 2049 on host h is the NFS port on the server:

        TCP connection 4:
          host g:        BBR-SRV03.SLAC.Stanford.EDU:32769
          host h:        BBR-SRV02.SLAC.Stanford.EDU:2049
          complete conn: yes
          first packet:  Fri Jan 28 23:24:35.019938 2000
          last packet:   Fri Jan 28 23:24:35.027876 2000
          elapsed time:  0:00:00.007938
          total packets: 11
          filename:      srv02srv03.dump

          g->h:                              h->g:
            total packets:      6              total packets:      5
            ack pkts sent:      5              ack pkts sent:      5
            pure acks sent:     3              pure acks sent:     2
            unique bytes sent:  44             unique bytes sent:  28
            actual data pkts:   1              actual data pkts:   1
            actual data bytes:  44             actual data bytes:  28
            data xmit time:     0.000 secs     data xmit time:     0.000 secs
            idletime max:       4.4 ms         idletime max:       4.1 ms
            throughput:         5543 Bps       throughput:         3527 Bps

      • Not much happened! Much more info is available; the output was edited to fit.
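The throughput figures in the sample output are consistent with dividing the unique bytes sent in each direction by the elapsed time of the connection. The short check below reproduces them from the first and last packet timestamps; it assumes that definition and rounding to the nearest integer, which matches the numbers shown but is not taken from the tcptrace documentation. The helper name `throughput_bps` is hypothetical.

```python
from datetime import datetime


def throughput_bps(unique_bytes: int, elapsed_seconds: float) -> int:
    """Bytes per second, rounded to the nearest integer."""
    return round(unique_bytes / elapsed_seconds)


# Timestamps copied from the sample output (microsecond resolution).
first_packet = datetime(2000, 1, 28, 23, 24, 35, 19938)
last_packet = datetime(2000, 1, 28, 23, 24, 35, 27876)
elapsed = (last_packet - first_packet).total_seconds()  # 0.007938 s

print("elapsed:", elapsed)
print("g->h:", throughput_bps(44, elapsed), "Bps")  # 44 unique bytes sent
print("h->g:", throughput_bps(28, elapsed), "Bps")  # 28 unique bytes sent
```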

  11. Analysis Tools (3)
      • Figure: throughput between two hosts
        • Yellow dots: instantaneous rate; the quantization is due to the time resolution of the packet timestamps (GBit!)
        • Red line: averaged rates
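The two curves on the throughput plot can be illustrated with a small sketch: an "instantaneous" rate computed from each packet and the gap to the previous one, and an averaged rate over fixed time windows. This is an illustration of the idea only, not tcptrace's actual algorithm; the function names and the `(timestamp_seconds, size_bytes)` input format are assumptions.

```python
from collections import defaultdict


def instantaneous_rates(packets):
    """Per-packet rate in bytes/s: packet size divided by the gap to the previous packet.

    `packets` is a time-ordered list of (timestamp_seconds, size_bytes) tuples.
    Packets with a zero gap (identical timestamps) are skipped.
    """
    rates = []
    for (t_prev, _), (t, size) in zip(packets, packets[1:]):
        gap = t - t_prev
        if gap > 0:
            rates.append((t, size / gap))
    return rates


def averaged_rates(packets, window):
    """Bytes/s averaged over consecutive windows of `window` seconds, keyed by window start."""
    totals = defaultdict(int)
    for t, size in packets:
        totals[int(t // window)] += size
    return {index * window: total / window for index, total in sorted(totals.items())}


if __name__ == "__main__":
    sample = [(0.0, 100), (0.5, 100), (1.0, 200), (1.5, 200)]
    print(instantaneous_rates(sample))
    print(averaged_rates(sample, window=1.0))
```

When timestamps have a fixed resolution and packets arrive only a few ticks apart, the gap in the first function can take only a handful of values, which is one way to picture the quantization of the instantaneous rate mentioned on the slide.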

  12. Analysis Tools (4)
      • The network dump can answer, for example, the following questions (and many more):
        • Who (UID, GID) has read the 25 Gbyte data file over NFS?
        • Were NFS timeouts correlated with a high NFS transaction volume/rate?
        • Which hosts were accessing the file server?
        • Do we have hosts/software with configuration problems? (Wrong subnet masks, applications using incorrect subnet broadcast addresses)
      • However, the analysis of the files is complicated; we'd like to have better tools!
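Two of the questions above ("which hosts were accessing the file server?" and "was the transaction rate high around the time of the timeouts?") reduce to simple counting once packet headers have been parsed into records. The sketch below assumes such records are already available as `(timestamp_seconds, source_host, destination_host)` tuples; that record format, the function names, and the host names in the example are hypothetical, and the parsing step itself is not shown.

```python
from collections import Counter


def clients_of(records, server):
    """Count packets sent to `server`, per source host."""
    return Counter(src for _, src, dst in records if dst == server)


def packets_per_minute(records, server):
    """Count packets sent to `server` in each one-minute bin.

    The bin key is the timestamp in whole seconds divided by 60.
    """
    return Counter(int(ts) // 60 for ts, _, dst in records if dst == server)


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        (0.2, "client-a", "fileserver"),
        (10.7, "client-b", "fileserver"),
        (61.0, "client-a", "fileserver"),
        (62.5, "fileserver", "client-a"),
    ]
    print(clients_of(records, "fileserver").most_common())
    print(sorted(packets_per_minute(records, "fileserver").items()))
```

Comparing the per-minute counts against the time of a logged NFS timeout gives a first indication of whether the timeout coincided with unusually high load.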

  13. Real-Time Analysis of Traffic
      • A very interesting and promising free tool is NTOP (www.ntop.org)
      • Captures packets, analyzes the protocol headers in real time and dynamically generates web pages, e.g.:
        • Protocols and their distribution
        • Hosts, host info, data sources and destinations
        • Throughput graphs
        • Traffic matrix
      • Still in development, not perfectly stable yet
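As a rough illustration of two of the views listed above, the following sketch builds a traffic matrix and a protocol distribution from already-parsed records. It says nothing about how NTOP computes these internally; the `(source, destination, protocol, nbytes)` record format and the function names are assumptions for the example.

```python
from collections import defaultdict


def traffic_matrix(records):
    """Sum bytes per (source, destination) pair.

    Returns a nested dict: matrix[source][destination] -> total bytes.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for src, dst, _proto, nbytes in records:
        matrix[src][dst] += nbytes
    return {src: dict(row) for src, row in matrix.items()}


def protocol_shares(records):
    """Fraction of the total byte count carried by each protocol."""
    totals = defaultdict(int)
    for _src, _dst, proto, nbytes in records:
        totals[proto] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {proto: count / grand_total for proto, count in totals.items()}


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        ("a", "b", "nfs", 300),
        ("a", "b", "nfs", 100),
        ("b", "a", "tcp", 100),
        ("a", "c", "tcp", 500),
    ]
    print(traffic_matrix(records))
    print(protocol_shares(records))
```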

  14. Real-Time Monitoring: NTOP example [figure]

  15. Conclusions and Outlook
      • Network traffic recording and analysis
        • is feasible (with some restrictions) even in high-performance switched network environments
          • we are looking forward to the next generation of switches and workstations capable of monitoring at gigabit speeds
        • has proven very helpful in understanding host and network performance problems and in troubleshooting the computing infrastructure
      • Powerful free software tools are available
        • but having multiple command-line programs makes the analysis of network traffic log files quite a complicated procedure
      • The ultimate tool would be a PAW-like program for networks that allows filtering and plotting with a simple command language
