1 / 38

Grid Network Monitoring in the European DataGRID project

Grid Network Monitoring in the European DataGRID project. Pascale VICAT-BLANC/PRIMET Pascale.Primet@ens-lyon.fr INRIA-RESO Université de Lyon1 - Ecole Normale Supérieure de Lyon France. Grid High Performance Networking. INRIA RESO & UREC teams at LIP/ ENS-Lyon

conway
Download Presentation

Grid Network Monitoring in the European DataGRID project

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid Network Monitoring in the European DataGRID project Pascale VICAT-BLANC/PRIMET Pascale.Primet@ens-lyon.fr INRIA-RESO Université de Lyon1 - Ecole Normale Supérieure de Lyon France

  2. Grid High Performance Networking INRIA RESO & UREC teams at LIP/ ENS-Lyon http://www.ens-lyon.fr/LIP/RESO Manage the Network workpackage of the EU DataGRID project : http://ccwp7.in2p3.fr http://eu-datagrid.web.cern.ch/eu-datagrid Contact : Pascale.Primet@ens-lyon.fr French National Grid High Performance Platform over the DWDM VTHD network (2.5Gb/s) thee-toile project Connected through the EU DataTAG 1Gb/s link to Chigago P.VICAT-BLANC/PRIMET

  3. Starlight Chicago 1Gb/s experimental EDT link EDF/Clamart PRISM CERN SUN CEA/SACLAY ENS RESO/LIP New router New VTHD site VTHD site P.VICAT-BLANC/PRIMET

  4. Grid High Performance Networking • Performances measurement and monitoring • for Resource Broker Services (optimisation). • to identify bottlenecks and points of unreliability • High Performance transport protocol • TCP optimizations • Non TCP High Performance Transport Protocol • Reliable multicast • IP Differentiated Services models • Alternative Differentiated Services Models (EDS, QBSS..) • Premium service experiments/evaluation with Bio apps • Active Grid concept • Convergence of Grid and Active Network Technology • Deployment of Active Nodes over the e-toile National Grid High Performance testbed P.VICAT-BLANC/PRIMET

  5. Outline • Grid and Network issues. • The DataGRID testbed • Grid Network monitoring architecture • MapCenter tool dedicated to Grid status visualisation P.VICAT-BLANC/PRIMET

  6. Grid concept • « Anatomical View »: complex collection of shared ressources to create an integrated virtual environment • « Physiological View» : complex collection of distributed services • Main objectives: • Functional : • study new and larger problems • Performances : • diminish the global processing time P.VICAT-BLANC/PRIMET

  7. « Physical » view of a Grid Network Public Network No security No predictable performances No control on the traffic The flat INTERNET Resource = CE (computing element) or Resource = SE (storage element) P.VICAT-BLANC/PRIMET

  8. Network Performance impact • The IP Technology • long distance connectivity • but • Time= f(Tcomputation,Ttransfer) • and • Ttransfer = Volume/Rate • Stability and predictability of network performances are key factors P.VICAT-BLANC/PRIMET

  9. Data transfer : typical scenario Processing center Input Data Output Data Network Job execution User Storage center P.VICAT-BLANC/PRIMET

  10. Where is the bottleneck? Network File Transfer Time (Ttrans) Wide Area Network Local System Local System Local Area Network Local Area Network Where is the bottleneck? P.VICAT-BLANC/PRIMET

  11. European DataGRID project • 7 applications distributed among 6 virtual organisations • 11 organisations over 15 countries • 40 sites in Europe • Based on the European GEANT backbone, National NREN’s (SuperJanet, GARR, Renater, SurfNet…) and local area network (CERN, IN2P3…) • more informations :http://ccwp7.in2p3.fr P.VICAT-BLANC/PRIMET

  12. EDG Network Infrastructure P.VICAT-BLANC/PRIMET

  13. European DataGRID testbed P.VICAT-BLANC/PRIMET

  14. Grid High Performance Networking • Collaboration with providers • Specific requirements studies • Available infrastructure and services studies • Enhanced Network services tests • Collaboration with users • End to end monitoring • E2E performances problems identification • Network cost functions for scheduling • Transport protocols studies and optimisation P.VICAT-BLANC/PRIMET

  15. CERN TIER 0 TIER 1 INFN RAL … IN2P3 … TIER 2 Lab 1 Lab 2 Lab n … Departments a b c Desktops … Application requirements studies P.VICAT-BLANC/PRIMET

  16. Network performances measurement For Provisioning: • To be available, via visualization to human observer (user, network/system administrators) • To provide tools for network performances measurement, problems identification and resolution (bottlenecks, point of unreliability, quality of service needs, topology…) • To achieve network performance forecast and optimization – Capacity planning P.VICAT-BLANC/PRIMET

  17. Network performances measurement For scheduler/ resource manager…: • Network performance parameters are used for optimizing resource allocation (replication, MPI, Remote file access…) • Network performance metrics must • be published to the Grid Information System • Be accessible through aggregated functions called by Grid resource broker services (computing and data storage). P.VICAT-BLANC/PRIMET

  18. Architectural design (GMA) • four functional units : • monitoring tools or sensors • a repository for collected data; • the means for data analysis to generate network metrics; • the means to access and to use the derived metrics. • See our D7.2 document on WP7 EDG site P.VICAT-BLANC/PRIMET

  19. Network Monitoring Architecture Network managers Resource Broker P_RTPL MapCenter P_NWS Middleware Publication LDAP Forecaster Analysis Data processor Raw Repository Data Collector Sensors PingEr IPerf GridFTP SNMP RTPL … P.VICAT-BLANC/PRIMET

  20. Schema and LDAP backend • Grid applications/mw are able to access network monitoring metrics via LDAP services according to a defined LDAP schema. • LDAP back end makes measurements visible through the Globus GIIS/GRIS . • Current metric information stored in local network monitoring stations. • R-GMA is tested as an alternative solution to Globus MDS http://ccwp7.in2p3.fr P.VICAT-BLANC/PRIMET

  21. P.VICAT-BLANC/PRIMET

  22. Measurement methods • Active methods • Injection of traffic inside the network for testing performances between two points • problem: may be intrusive (TCP/UDP throughput) • Passive methods • Collect traffic informations in one point of the network : router, switch, dedicated passive host, computing element or storage element (GRIDftp logs)… • Problem : give network usage, not capacity P.VICAT-BLANC/PRIMET

  23. Two way metrics • One way metrics Tested system PingER GPS Time RIPE 2 RIPE 1 P.VICAT-BLANC/PRIMET

  24. Passive measurement Passive measures at one point P.VICAT-BLANC/PRIMET

  25. Metrics and tools • Round Trip Delay => PinGER • Packet Loss => PinGER • TCP throughput => IPerfER • UDP throughput => UDPMon • site connectivity => MapCenter • service availability => MapCenter • OneWay metrics => RIPEncc test boxes P.VICAT-BLANC/PRIMET

  26. PingER Output: Daresbury and CERN P.VICAT-BLANC/PRIMET

  27. RIPEncc Output: Daresbury and CERN P.VICAT-BLANC/PRIMET

  28. Some results • from testbed sites to CERN • On 1 day or on 1 month • Pinger RTT: Average: 25ms • Pinger Loss Rate : 3% • OWD: average: 9ms • OWL: average 0 to 0,3% • TCP Throughput : from 0 to 350Mb/S P.VICAT-BLANC/PRIMET

  29. Heterogeneous E2E performances P.VICAT-BLANC/PRIMET

  30. InsufficientE2E performances Mesures sur Liaison CERN – Lyon à 155 Mb/s (06/ 2002) with UDP loss l =20% rate r =80Mb/s With TCP rate r <20Mb/s With ICMP Delay d = 18ms P.VICAT-BLANC/PRIMET

  31. EDG Grid Network Element Concept • Network Element Concept : • Shared ressource • Set of virtual links P.VICAT-BLANC/PRIMET

  32. Network Cost functions Network metrics published in LDAP repositories are used by resource brokers and replica managers through network cost functions : TransferTime = networkCost (SE1, SE2, filesize) Computed from • TCP throughput measurements (aggregated) • GridFTP logs • RTT Measurements (aggregated) P.VICAT-BLANC/PRIMET

  33. MAPCenter et TopoGRID • Provide logical and up to date information on the actual accessibility of the Grid resources and services. • A flexible presentation layer of services and applications status of the grid. • MapCenter polls periodically computing elements and services with different methods and displays various views of the grid status on a Web site. • ccwp7.in2p3.fr/mapcenter P.VICAT-BLANC/PRIMET

  34. P.VICAT-BLANC/PRIMET

  35. P.VICAT-BLANC/PRIMET

  36. P.VICAT-BLANC/PRIMET

  37. Map Center features • Multi-threaded • HTML pages generation as fixed rate • Configurable • A configuration file on the server defines timeouts, numbers of threads, length of the history and the frequency of HTML pages generation • Flexibility • This model is very flexible, and any representation (geographical, virtual organization, graphical...) can be easily done by modifying the data file • Can be distributed (per country, per VO…) P.VICAT-BLANC/PRIMET

  38. TopoGRID • To dynamically discover the topology of your GridNetwork…. P.VICAT-BLANC/PRIMET

More Related