
GridNM Network Monitoring Architecture (and a bit about my phd)

Yee-Ting Li, <ytl@hep.ucl.ac.uk>. 1st Year Report @ UCL, 17th June 2002.


Presentation Transcript


  1. GridNM Network Monitoring Architecture (and a bit about my phd). Yee-Ting Li, <ytl@hep.ucl.ac.uk>. 1st Year Report @ UCL, 17th June 2002

  2. What the GRID is
     • Distributed System
     • Interconnected with networks
     • Balancing processors, storage and network utilisation
     • Like the SETI project on steroids
     • Networking is important to make GRID work
     GridNM - Yee-Ting Li

  3. Networking Important!
     • Only way two grid nodes can communicate with each other
     • Need ways of determining how ‘efficiently’ they talk
     • Focus on:
         • Characterising how they talk
         • The language they use to talk

  4. Part 1
     • Network Metrics and Measurement
     • GridNM
     • Case studies

  5. Network Metrics / Characteristics
     • Metric: ‘several quantities related to the performance and reliability of the Internet that we'd like to know the value of. When such a quantity is carefully specified, we term the quantity a metric.’
     • Can be empirical or derived
     • Singleton, sample and statistical metrics

  6. Example Metrics
     • Connectivity
     • One-way delay
     • Two-way delay
     • Throughput / goodput
     • Network path
     • Loss
     • Jitter
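
As a rough illustration of three of the metrics listed above, the sketch below computes loss, jitter and goodput from raw probe results. The function names are hypothetical, and jitter is taken here as the mean absolute change between consecutive delay samples, which is only one of several definitions in use.

```python
from statistics import mean
from typing import Sequence


def loss_fraction(sent: int, received: int) -> float:
    """Fraction of probe packets that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


def mean_jitter(delays_ms: Sequence[float]) -> float:
    """Mean absolute change between consecutive delay samples (a simplification)."""
    if len(delays_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


def goodput_mbps(payload_bytes: int, seconds: float) -> float:
    """Application payload delivered per second, in megabits per second."""
    return payload_bytes * 8 / seconds / 1e6
```

Goodput counts only the application payload, so it is never higher than the raw throughput seen on the wire.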

  7. Metrics Example
     • Video Conferencing
         • Needs predictable bit rate
         • Doesn’t usually matter if bit rate changes too much
         • Needs constant jitter
         • Low one-way delay preferable
     • FTP
         • Needs reliable transport
         • Throughput depends on urgency of data
         • Jitter and delay don’t matter

  8. Measurement Methodology
     • How to get the metrics
     • Must be repeatable – need to define methodology carefully
     • Direct measurement of a performance metric using injected test traffic
     • Projection of a metric from lower-level measurements
     • Estimation of a constituent metric from a set of aggregated measurements
     • Estimation of a given metric at one time from a set of related metrics at other times

  9. Measurement Example
     • ‘ping’ measures round-trip time (rtt) – a direct measurement
     • Sending a single ‘ping’ would give a singleton – empirical
     • Sending 10 pings (a sample) out and taking the average gives a statistical metric – derived
     • Using a set of measurements over time, we can derive an estimate of the rtt
     • Projection would be if we had the one-way delay (owd) from each router to the next – add them all up to get the path owd
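
The distinctions on this slide can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the exponentially weighted moving average (with a gain of 0.125, the value TCP uses for its smoothed rtt) is just one possible way to estimate rtt from earlier measurements.

```python
from statistics import mean


def statistical_rtt(sample_ms):
    """Derived statistical metric: the average of a sample of rtt singletons."""
    return mean(sample_ms)


def ewma_estimate(history_ms, alpha=0.125):
    """Estimate the current rtt from measurements taken at earlier times."""
    estimate = history_ms[0]
    for rtt in history_ms[1:]:
        # Each new measurement nudges the estimate by a fraction alpha.
        estimate = (1 - alpha) * estimate + alpha * rtt
    return estimate


def projected_path_owd(per_hop_owd_ms):
    """Projection: path one-way delay as the sum of the per-hop one-way delays."""
    return sum(per_hop_owd_ms)
```

A single element of `sample_ms` is a singleton, the list is the sample, and the value returned by `statistical_rtt` is the statistical metric.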

  10. Network Monitoring Uses
      • Monitoring is measuring over long periods of time
      • Gives an indication of network performance over time – a baseline
      • Allows comparison of different tools for analysis
      • Allows analysis of how different protocols behave in different conditions – in real life
      • Allows ‘tuning’ of existing protocols to make the most out of the network

  11. GridNM
      • Architecture for monitoring the network
      • Backend – collects data for presentation
      • Logs metrics in ASCII log files on a single host
      • Allows mesh measurements – all nodes perform measurements to all other nodes
      • Uses standard UNIX infrastructure – ssh
      • Should be easily adaptable to using Globus certificates once interactive processing is introduced in EDG
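
A minimal sketch of what a mesh schedule and an ASCII log record could look like. The host names, field order and function names are assumptions made for illustration; the slides do not describe GridNM's actual log format.

```python
from itertools import permutations


def mesh_pairs(hosts):
    """Every ordered (source, destination) pair; a host never tests itself."""
    return list(permutations(hosts, 2))


def log_line(timestamp, src, dst, tool, metric, value):
    """One whitespace-separated ASCII record per measurement (hypothetical layout)."""
    return f"{timestamp} {src} {dst} {tool} {metric} {value}"


if __name__ == "__main__":
    for src, dst in mesh_pairs(["node-a", "node-b", "node-c"]):
        print(log_line(1024304400, src, dst, "ping", "rtt_ms", "n/a"))
```

With n hosts a full mesh needs n × (n − 1) directed measurements, so 6 sites give 30 source-destination pairs per tool.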

  12. GridNM (cont…)
      • Uses existing (and future) tools to collect metrics
      • Modular – uses XML to describe available resources
          • Hosts
          • Tools
      • Locks hosts if under measurement – prevents other tests affecting metrics
      • Currently monitoring 6 sites around Europe using 5 tools
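
The sketch below shows one way an XML resource description and host locking could fit together. The element names, attribute names and host names are invented for this example and are not GridNM's real schema; the lock is an in-memory set, whereas a real deployment would need a lock shared between processes.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description: schema and names are assumptions.
RESOURCES_XML = """
<resources>
  <host name="node-a.example.org"/>
  <host name="node-b.example.org"/>
  <tool name="ping" needs_server="no"/>
  <tool name="iperf" needs_server="yes"/>
</resources>
"""


def load_resources(xml_text):
    """Return (host names, {tool name: whether it needs a remote server})."""
    root = ET.fromstring(xml_text.strip())
    hosts = [h.get("name") for h in root.findall("host")]
    tools = {t.get("name"): t.get("needs_server") == "yes"
             for t in root.findall("tool")}
    return hosts, tools


class HostLocks:
    """Tracks which hosts are under measurement so tests do not overlap."""

    def __init__(self):
        self._busy = set()

    def acquire(self, *hosts):
        """Lock all hosts, or none if any of them is already busy."""
        if any(h in self._busy for h in hosts):
            return False
        self._busy.update(hosts)
        return True

    def release(self, *hosts):
        self._busy.difference_update(hosts)
```

Acquiring both endpoints together means a test never starts with only one of its two hosts locked.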

  13. GridNM ‘plot’

  14. Security
      • As secure as SSH
      • But requires automatic logon
      • Denial of Service Attacks
      • Certain tools (e.g. iperf) require servers to be run
      • GridNM runs the server (unless told not to) on the remote host before each test
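
To make the "automatic logon" and "server before each test" points concrete, here is a sketch that only builds the command lines and does not execute anything. It is not GridNM's actual invocation: the helper names and host names are hypothetical. The flags themselves are standard: `BatchMode=yes` makes ssh fail instead of prompting for a password, and `iperf -s -D` / `iperf -c HOST -t SECONDS` start a daemonised server and a timed client.

```python
def ssh_command(host, remote_args):
    """Argument list for a non-interactive ssh call to run remote_args on host."""
    return ["ssh", "-o", "BatchMode=yes", host, *remote_args]


def iperf_test_commands(client_host, server_host, seconds=10):
    """Commands to start an iperf server remotely, then run a client against it."""
    start_server = ssh_command(server_host, ["iperf", "-s", "-D"])
    run_client = ssh_command(
        client_host, ["iperf", "-c", server_host, "-t", str(seconds)]
    )
    return start_server, run_client
```

The lists could be passed to `subprocess.run`. Because the server is left listening until it is stopped, it is a potential target for the denial-of-service concern raised on this slide.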

  15. Tool Examples

  16. UDP versus TCP

  17. Rtt – good network

  18. Rtt – periodicity

  19. Rtt – bad network

  20. Rtt – bad network, loss

  21. TCP / Iperf Throughput

  22. TCP Performance

  23. TCP Performance

  24. What does TCP do?
      [Diagram labels: Socket buffer size; TCP Protocol; Network]
      • Tap is independent of tank size
      • Tank filled by application
      • Valve opening (data rate) determined by feedback from network
      • Small tanks mean small data rate
      • Large tanks mean larger data rate
      • Even larger tanks mean smaller data rate?!
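
The first two observations in the tank analogy match the standard window limit on TCP: at most one socket buffer's worth of data can be in flight per round trip, so throughput cannot exceed buffer size divided by rtt. The sketch below works through that bound; the function names and example figures are illustrative. It does not explain the third observation, where an even larger buffer gave a lower rate.

```python
def window_limited_throughput_bps(socket_buffer_bytes, rtt_seconds):
    """Upper bound on TCP throughput when one buffer can be in flight per rtt."""
    return socket_buffer_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps, rtt_seconds):
    """Buffer size needed to keep a link of the given rate full."""
    return link_bps * rtt_seconds / 8


def expected_throughput_bps(socket_buffer_bytes, rtt_seconds, link_bps):
    """The smaller of the link rate and the window-limited rate."""
    return min(link_bps,
               window_limited_throughput_bps(socket_buffer_bytes, rtt_seconds))
```

For example, a 64 KiB buffer over a 20 ms path is capped at roughly 26 Mbit/s regardless of link speed, while filling a 1 Gbit/s link at the same rtt needs about 2.5 MB of buffer.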

  25. Investigation
      • Possible explanation:
          • Rate of tank filling < rate of water flow out
          • i.e. application not fast enough to fill socket buffer past threshold
      • BUT – needs further investigation
          • Back-to-back lab tests with PCs and routers
          • Comparison to other TCP-based tools

  26. Part 2
      • Network Communication Languages
      • Known as transport protocols – they determine how applications put traffic into the network
      • Sit on top of IP – the common language of the internet

  27. Transport Level Protocols
      • TCP (HTTP, FTP, GridFTP) used for file transfer
          • Gives guarantee on delivery
          • All data is copied precisely
          • Performance can be poor
          • Respects other internet users
      • UDP (Real, H323) used for video conferencing
          • Gives no guarantees on delivery
          • Data may be incomplete
          • Performance good
          • Doesn’t respect other internet users

  28. UDP versus TCP performance at high speeds

  29. Measuring Performance of Transport Level Protocols
      • Need to identify what we want to measure – the metrics
      • Dependent on the use of the transport protocol. Need to analyse application level usage
      • For Grid:
          • Movement of ‘transient’ data
              • File Transfer and Replication
              • Process jobs or ‘sandboxes’
          • Movement of Real-Time Data
              • Video Conferencing – Access Grid
              • Real-Time applications

  30. Transport Protocols ‘NG’

  31. Tools to Measure Grid Traffic
      • E.g. TCP
          • Can use web100 – allows analysis of TCP traffic via fundamental variables important to TCP/IP
          • GridFTP allows logging of transfer information
      • UDP (UDP Blast, Tsunami)
          • Need either transport level recording (like web100) or application monitoring
      • PGM / CC
          • Need application to be built to use transport protocol
      • General Solution
          • Gather SNMP data from nodes along network
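
For the general SNMP approach, link load is usually derived from two readings of an interface octet counter such as `ifInOctets`. The sketch below assumes the two readings have already been collected (the polling itself is not shown) and that the counter is the 32-bit variant, which wraps at 2^32. The function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets counted between two readings, allowing for a single counter wrap."""
    return (current - previous) % modulus


def link_rate_bps(prev_octets, curr_octets, interval_seconds):
    """Average bit rate on the interface over the polling interval."""
    return counter_delta(prev_octets, curr_octets) * 8 / interval_seconds
```

On fast links a 32-bit counter can wrap more than once between polls, in which case this calculation under-reports the rate; shorter polling intervals or 64-bit counters avoid that.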

  32. Future Directions (the phd bit)
      • Provisional title in field of:
          • Providing Advanced Transport Protocols for Grid Applications
      • Aim: use GridNM infrastructure to analyse performance of different transport protocols
      • Implement findings into Grid infrastructure, e.g. GridFTP, to improve grid processes (processing jobs, file transfer, file replication, Access Grid…)

  33. Conclusion
      • Created a flexible infrastructure to monitor and analyse internet traffic
      • Shown metrics for different scenarios
      • Given performance overview of current transport protocols
      • Identified future areas of research into transport protocols for the grid
