Grid Monitoring Services

Grid Monitoring Services. Robin Middleton RAL/PPD 24-May-01. Overview. What is Monitoring ? GGF Perf-WG DataGrid WP3 Example : Netlogger Summary. Introduction. Information Services part dealt with separately today DataGrid WorkPackage 3 (WP3) UK leadership / responsibility

Presentation Transcript


  1. Grid Monitoring Services Robin Middleton RAL/PPD 24-May-01 GridPP Collaboration meeting - R.P.Middleton (RAL/PPD)

  2. Overview
     • What is Monitoring ?
     • GGF Perf-WG
     • DataGrid WP3
     • Example : Netlogger
     • Summary

  3. Introduction
     • Information Services part dealt with separately today
     • DataGrid WorkPackage 3 (WP3)
     • UK leadership / responsibility
     • WP3 = Grid Monitoring AND Information Services
     • Global Grid Forum - Perf Mon Workgroup
     • http://www-didc.lbl.gov/GridPerf/

  4. What is Monitoring ?
     • Application performance
     • Fabric availability
     • Network availability / performance
     • Event / Alert
     • Archives
     • Forecasting (e.g. NWS)
     • Issues
     • update/read frequency
     • information streaming
     • hierarchical vs. relational
     • relaxed coherence; timestamps
     • scalable; non-invasive
     • non-repeatable
     • Monitoring vs. Monitoring & Information ?

  5. Boundaries
     (Diagram relating Monitoring to its neighbours. Labels: Mass Storage, Computing Fabric, Application, Network, Workload Mgt, End-Users, DataMan, Sys/Grid-Admin)

  6. GGF : Perf-WG
     • “The Grid Performance working group is focused on defining standards and best practices for the gathering, representation, storage, distribution, and query of performance information about Grid resources and applications.”
     • Four Projects (!)
       1. Define a schema for data formats for performance monitoring. This would be a common interchange format that tools could use to interoperate.
       2. Taxonomy / classification of performance monitoring and analysis tools.
       3. Survey of existing tools classified by the above taxonomy.
       4. Recommendations on the aspects of grid applications, services and resources that should be monitored.
       5. The development of performance monitoring tools based upon the survey of tools.

  7. GGF Perf-WG : Use Cases
     1: Instrumented library for performance measurement (e.g. I/O system)
     2: Netlogger/DPSS monitoring streams to log file
     3: JAMM (Java) sensors stream data to a GUI
     4: JAMM/Port Monitor
     5: Fault detection & analysis
     6: Job progress monitoring
     7: Distributed system performance analysis
     8: Network-aware, self-tuning applications
     9: Data replication (choice of “best” location)
     10: Scheduling & prediction services
     11: Auditing systems
     12: Configuration monitoring
     13: User application monitoring
     14: Application self-tuning
     15: Real-time adaptive simulation & presentation

  8. DataGrid : WorkPackage 3
     The aim of this workpackage is to specify, develop, integrate and test tools and infrastructure to enable end-user and administrator access to status and error information in a Grid environment, and to provide an environment in which application monitoring can be carried out. This will permit both job performance optimisation and problem tracing, and is crucial to facilitating high performance Grid computing.

  9. Architecture (GGF : Perf-WG)
     (Diagram. Components: Consumer, Directory Service, a Producer on Host A and a Producer on Host B, each with three Sensors. Arrow labels: Discovery, Subscribe, Publish)
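The producer / consumer / directory arrangement named on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not the GGF specification or any WP3 code: the class names, method names, the `cpu_load` event type and the host names are all assumptions made for the example. It assumes that producers publish their existence to the directory service, that a consumer discovers producers there, and that the consumer then subscribes to each producer directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]            # hypothetical event record
Callback = Callable[[Event], None]   # what a consumer hands to a producer


@dataclass
class Producer:
    """Gathers sensor readings on one host and pushes them to subscribers."""
    host: str
    event_type: str
    subscribers: List[Callback] = field(default_factory=list)

    def subscribe(self, callback: Callback) -> None:
        self.subscribers.append(callback)

    def emit(self, value: float) -> None:
        # Stand-in for a sensor reading arriving at the producer.
        event = {"host": self.host, "type": self.event_type, "value": value}
        for callback in self.subscribers:
            callback(event)


class DirectoryService:
    """Stores producer registrations only; events do not pass through it here."""

    def __init__(self) -> None:
        self._producers: List[Producer] = []

    def publish(self, producer: Producer) -> None:
        self._producers.append(producer)

    def discover(self, event_type: str) -> List[Producer]:
        return [p for p in self._producers if p.event_type == event_type]


directory = DirectoryService()
host_a = Producer("host-a", "cpu_load")
host_b = Producer("host-b", "cpu_load")
directory.publish(host_a)            # Publish
directory.publish(host_b)

received: List[Event] = []
for producer in directory.discover("cpu_load"):   # Discovery
    producer.subscribe(received.append)            # Subscribe

host_a.emit(0.42)
host_b.emit(0.87)
print(received)
```

In this sketch the directory is consulted once, at discovery time; afterwards data flows from producer to consumer without involving it.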

  10. WP3 : Tasks
     • Umbrellas
     • Task 3.1: Requirements & Design (month 1-12)
     • Task 3.2: Current Technology (month 1-12)
     • Task 3.3: Infrastructure (month 7-24)
     • Task 3.4: Analysis & Presentation (month 7-24)
     • Task 3.5: Test & Refinement (month 19-36)

  11. WP3 : Deliverables (as in the TA)
     D3.1 (Report) Month 12: Evaluation Report of current technology
     D3.2 (Report) Month 9: Detailed architectural design report and evaluation criteria (also input to WP12 architecture deliverable)
     D3.3 (Prototype) Month 9: Components and documentation for the First Project Release (see WP 6)
     D3.4 (Prototype) Month 21: Components and documentation for the Second Project Release (see WP 6)
     D3.5 (Prototype) Month 33: Components and documentation for the Final Project Release (see WP 6)
     D3.6 (Report) Month 36: Final evaluation report

  12. WP3 : Milestones (as in the TA)
     M3.1 Month 6: Decide baseline architecture & technologies.
     M3.2 Month 9: Provide requirements for collation by Project Architect.
     M3.3 Month 9: Prototype components integrated into First Project Release (see WP 6)
     M3.4 Month 21: Interim components integrated into Second Project Release (see WP 6)
     M3.5 Month 33: Final components integrated into Final Project Release (see WP 6)

  13. WP3 : First Release (PM9)
     • Information services based on a new version of the Globus MDS (soon to be in alpha release).
     • Rudimentary implementation of a relational approach to information services.
     • A set of APIs in support of both MDS and GMA approaches.
     • Basic presentation of performance monitoring data based around Netlogger.

  14. WP3 : Effort

                       Funded   Unfunded   Total
     PPARC             3.0      1.83       4.83
     SZTAKI (HU)       2.08     0.92       3.0
     INFN (IT)         0.0      1.16       1.16
     IBM-UK            1.0      0.0        1.0
     Total             6.08     3.91       10.0

     + Trinity College Dublin
     (NB : for both Monitoring and Information Services)

  15. WP3 : Use Cases
     • Fault Detection & Analysis, Heartbeats [5]
     • Job Status & Progress Monitoring [6]
     • Application Performance Monitoring [1,13]
     • Performance Analysis of Distributed Systems [7]
     • Scheduling Services and Self Tuning Applications [8,10,14,(15)]
     • Data Replication Services [9]
     • Accounting & Auditing [11]
     • Configuration monitoring [12]
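The bracketed numbers on this slide appear to refer to the numbered GGF Perf-WG use cases on slide 7; that reading is an inference, not something the slides state. Under that assumption, a short Python cross-reference shows which GGF use cases the WP3 list picks up and which it leaves out. The dictionary names are made up for the example, and the parenthesised (15) is counted as covered.

```python
# GGF Perf-WG use cases, numbered as on slide 7.
GGF_USE_CASES = {
    1: "Instrumented library for performance measurement",
    2: "Netlogger/DPSS monitoring streams to log file",
    3: "JAMM (Java) sensors stream data to a GUI",
    4: "JAMM/Port Monitor",
    5: "Fault detection & analysis",
    6: "Job progress monitoring",
    7: "Distributed system performance analysis",
    8: "Network-aware, self-tuning applications",
    9: "Data replication",
    10: "Scheduling & prediction services",
    11: "Auditing systems",
    12: "Configuration monitoring",
    13: "User application monitoring",
    14: "Application self-tuning",
    15: "Real-time adaptive simulation & presentation",
}

# WP3 use cases with their bracketed references (assumed to be GGF numbers).
WP3_USE_CASES = {
    "Fault Detection & Analysis, Heartbeats": [5],
    "Job Status & Progress Monitoring": [6],
    "Application Performance Monitoring": [1, 13],
    "Performance Analysis of Distributed Systems": [7],
    "Scheduling Services and Self Tuning Applications": [8, 10, 14, 15],
    "Data Replication Services": [9],
    "Accounting & Auditing": [11],
    "Configuration monitoring": [12],
}

covered = sorted({n for refs in WP3_USE_CASES.values() for n in refs})
uncovered = [n for n in GGF_USE_CASES if n not in covered]

for number in uncovered:
    print(number, GGF_USE_CASES[number])
```

On this reading, the GGF cases without a WP3 counterpart are the three tool-specific ones (Netlogger/DPSS and the two JAMM cases).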

  16. WP3 : Decisions (end 2000)
     • Try to track standards & best practice from Global Grid Forum
     • evaluate, steer, adopt, …
     • Other WPs should provide the majority of sensors
     • network, fabric, mass-storage
     • WP3 will provide the instrumentation API
     • Key deliverables will be
     • Performance Services
     • Error / Alert Services
     • Status / Parameter Services
     • Logging / Archival Services
     • (forecasting) - information to enable other WPs to do this
     • WP3 subcontracts archival services (in terms of the data management aspects) ?

  17. Netlogger
     (Diagram. Components: Supervisor, Processing Node, Readout Buffer)
     Acknowledgement : Weidong Li

  18. Netlogger
     (Diagram. Components: Supervisor, Processing Node, Readout Buffer)
     Acknowledgement : Weidong Li

  19. Sequence Diagram
     (Diagram. Participants: Readout Buffer, Supervisor, Processing Node. Messages along the time axis: Request, Fetch Data, Return data, Result. Numbered points: 1 to 7)
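The numbered points on the sequence diagram are presumably the places where a timestamped log entry is written. The Python sketch below shows the general idea of that kind of instrumentation under several assumptions: it does not use the real Netlogger library, the `KEY=value` line layout and field names are invented for the example, and the assignment of points 1 to 7 to components and event names is a guess, since the slide gives only the numbers and the message labels.

```python
import time
from typing import Dict, List, Tuple

# Hypothetical mapping of the seven numbered points to a component and an
# event name. The slide does not say which component logs which point.
POINTS: List[Tuple[int, str, str]] = [
    (1, "supervisor", "REQUEST_SENT"),
    (2, "processing-node", "REQUEST_RECEIVED"),
    (3, "processing-node", "FETCH_SENT"),
    (4, "readout-buffer", "FETCH_RECEIVED"),
    (5, "readout-buffer", "DATA_RETURNED"),
    (6, "processing-node", "RESULT_SENT"),
    (7, "supervisor", "RESULT_RECEIVED"),
]


def format_event(timestamp: float, component: str, point: int, name: str) -> str:
    """Build one log line of space-separated KEY=value fields (assumed layout)."""
    return f"TS={timestamp:.6f} PROG={component} POINT={point} EVNT={name}"


def parse_event(line: str) -> Dict[str, str]:
    """Split a log line back into its fields."""
    return dict(item.split("=", 1) for item in line.split())


def trace_one_request() -> List[str]:
    """Simulate one request in a single process, logging at each point."""
    return [
        format_event(time.time(), component, point, name)
        for point, component, name in POINTS
    ]


log = trace_one_request()
events = [parse_event(line) for line in log]
for line in log:
    print(line)
```

In a real deployment each component would write its own entries on its own host, and the lines would be merged afterwards by timestamp to reconstruct the lifeline of a request.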

  20. Results
     (Plot of the events at points 1 to 7. X : secs, Y : “count”)
     Acknowledgement : Weidong Li

  21. Netlogger Summary
     • Example deployment
     • Time resolution
     • NTP (~5 ms)
     • Custom h/w (~50 µs)
     • Thread safety ?
     • Variety of visualisation methods
     • “non-invasive” ?
     • Moving towards the GMA
     • e.g. integration of directory service
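The time-resolution point matters when log entries from different hosts are compared: an interval shorter than the clock synchronisation error cannot be trusted. The following Python sketch illustrates this with the ~5 ms NTP figure from the slide. The function name, the sample timestamps and the simple single-threshold rule are assumptions for illustration; a real analysis would treat clock error more carefully.

```python
from typing import List, Tuple

NTP_UNCERTAINTY_S = 0.005  # ~5 ms, the NTP figure quoted on the slide


def intervals(
    events: List[Tuple[str, float]], uncertainty: float
) -> List[Tuple[str, str, float, bool]]:
    """Return (from, to, delta_seconds, resolvable) for consecutive events.

    An interval is flagged as resolvable only if it exceeds the assumed
    clock uncertainty between the hosts that logged the two events.
    """
    rows = []
    for (name_a, t_a), (name_b, t_b) in zip(events, events[1:]):
        delta = t_b - t_a
        rows.append((name_a, name_b, delta, abs(delta) > uncertainty))
    return rows


# Made-up timestamps in seconds, one event per host hop.
sample = [
    ("request", 10.000),
    ("fetch", 10.002),
    ("return", 10.030),
    ("result", 10.031),
]

for name_a, name_b, delta, ok in intervals(sample, NTP_UNCERTAINTY_S):
    status = "ok" if ok else "below clock resolution"
    print(f"{name_a} -> {name_b}: {delta * 1000:.1f} ms ({status})")
```

With these sample values only the 28 ms interval stands clear of the NTP error; the 1 ms and 2 ms hops would need better-synchronised clocks to be measured meaningfully.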

  22. Summary
     • Information Service is KEY to Monitoring
     • …and nature of service to be determined !
     • Unified Information Architecture is important
     • …otherwise duplication and inconsistencies
     • Align with Global Grid Forum for “standards”, etc.
     • Starting point is Netlogger
     • DataGrid deliverable details are testbed “driven”
     • Cross-DataGrid WP - service to many areas