
GridView - A Grid Monitoring and Visualization Tool

Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand, Computer Division, BARC, India; James Casey, Zdenek Sekera, IT Department, CERN.


Presentation Transcript


  1. GridView - A Grid Monitoring and Visualization Tool
     Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand, Computer Division, BARC, India
     James Casey, Zdenek Sekera, IT Department, CERN
     TIFR, Mumbai, India, Feb 13-17, 2006

  2. Project Goal
     • Provide a high-level view of the various Grid resources and functional aspects of the LCG
     • Central archival, analysis, summarization, graphical presentation and pictorial visualization of data from various LCG sites and monitoring tools
     • Useful in GOCs/ROCs and to site admins/VO admins

  3. Gridview Architecture
     • Loosely coupled design with independent sensor, transport, archival, analysis and visualization components
     • Sensors are the various LCG information providers and monitoring tools at sites
     • Transport used is R-GMA
     • Gridview provides archival, analysis and visualization

  4. Data Sources (LCG Sites and Monitoring Tools)
     • LCG-2 Information Providers
     • GridFTP Logs (Service Challenge Throughput Tests)
     • RB Job State
     • WN Job State
     • Site Functional Test (SFT)
     • GIIS Monitor (GStat)
     • LCG-2 Certificate Lifetime
     • LCG-2 Job Submission Tests

  5. R-GMA Transport
     • Monitoring data is generated at grid sites by different monitoring tools
     • Gridview collects this data for archival in a central Oracle database at CERN
     • R-GMA is used as the transport mechanism; Gridview is a major consumer of R-GMA tuples
     • Many monitoring tools publish data to R-GMA

  6. GridView Architecture

  7. Archiver Module
     • Collects R-GMA tuples containing monitoring information published by data sources
     • Archives them into the central Oracle database at CERN
     • Implemented in Java
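
The archiving step can be illustrated with a minimal sketch. This is not Gridview's Java archiver: it uses Python's built-in sqlite3 module as a stand-in for the Oracle database, it does not talk to R-GMA at all, and the table name and tuple layout (site, metric, value, timestamp) are hypothetical.

```python
import sqlite3

# Hypothetical tuple layout: (site, metric, value, unix_timestamp).
def create_archive(conn):
    """Create the (hypothetical) archive table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monitoring_tuples ("
        "site TEXT, metric TEXT, value REAL, ts INTEGER)"
    )

def archive_tuples(conn, tuples):
    """Insert a batch of monitoring tuples; return how many were stored."""
    rows = list(tuples)
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO monitoring_tuples (site, metric, value, ts) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Batching the inserts inside one transaction means a failed batch leaves the archive unchanged, which seems a reasonable default for a central store.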

  8. Summarization Module
     • Analyses the monitoring data collected by the archiver module and generates summary information
     • Usage and performance figures
     • Detects fault situations and user-defined events
     • Summary information is stored back in the database
     • Filters duplicate R-GMA tuples
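
The duplicate-filtering step could look roughly like the sketch below. How Gridview actually decides that two tuples are duplicates is not stated in the slides; this example assumes that a duplicate is an exact repeat of an earlier tuple, and the function name is hypothetical.

```python
def filter_duplicates(tuples):
    """Return tuples in first-seen order, dropping exact repeats.

    Assumption: each tuple is hashable and a duplicate is field-for-field
    identical to one already seen.
    """
    seen = set()
    unique = []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            unique.append(t)
    return unique
```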

  9. Presentation Module
     • Presents current and historical information (summaries created by the Summarization module)
     • Conventional bar graphs, histograms and pie charts

  10. GUI and Visualization Module
     • Dashboard showing all grid sites on a map
     • Current site status information and fault notification displayed using 3D graphics
     • Information generated by the summarization module is read from the database
     • Hooks to invoke the presentation module to view historical information

  11. Current Implementation: File Transfer Monitoring
     • Analysis of GridFTP logs
     • GridFTP transfers are logged and published in R-GMA by lcg-mon-gridftp and archived by Gridview
     • After analysis, the following summaries are created:
        • Hour-wise, day-wise average throughput per site
        • Hour-wise, day-wise aggregate data transfer per site
        • Hour-wise average throughput and aggregate data transfer per VO
        • Host-wise data transfer details
     • In production use during SC3
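
The hour-wise per-site summaries can be sketched as follows. The record layout (site, end timestamp, bytes transferred) is hypothetical, and the slides do not define "average throughput"; this sketch assumes it means the bytes completed in an hour divided by the 3600 seconds of that hour, with each transfer attributed to the hour in which it finished.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600

def hourly_site_summary(transfers):
    """Summarize transfers per (site, hour).

    transfers: iterable of (site, end_ts, nbytes), end_ts in Unix seconds.
    Returns {(site, hour_start): (total_bytes, avg_bytes_per_second)}.
    """
    totals = defaultdict(int)
    for site, end_ts, nbytes in transfers:
        hour_start = end_ts - end_ts % SECONDS_PER_HOUR  # floor to the hour
        totals[(site, hour_start)] += nbytes
    return {key: (b, b / SECONDS_PER_HOUR) for key, b in totals.items()}
```

Day-wise or per-VO summaries would follow the same pattern with a different bucket size or grouping key.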


  13. Current Implementation: Job Monitoring
     • Job status logs are published by LB (Logging and Bookkeeping) servers at various RBs
     • Gridview generates the following periodic (hourly/daily/weekly/monthly) summary information:
        • Total number of jobs in different states at different grid sites
        • VO-wise and RB-wise job distribution
        • Metrics such as site-wise job success rate
        • Resource utilization by different VOs, etc.
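
A site-wise job success rate could be computed along these lines. The slides do not give the formula; this sketch assumes success rate = successful jobs / finished jobs, ignores jobs that are still running, and uses illustrative state names ("Done", "Aborted") that are assumptions rather than Gridview's actual classification.

```python
from collections import Counter

# Assumed classification of final job states (illustrative only).
SUCCESS_STATES = {"Done"}
FAILURE_STATES = {"Aborted"}

def site_success_rates(jobs):
    """jobs: iterable of (site, state). Returns {site: success_fraction}."""
    succeeded = Counter()
    finished = Counter()
    for site, state in jobs:
        if state in SUCCESS_STATES:
            succeeded[site] += 1
            finished[site] += 1
        elif state in FAILURE_STATES:
            finished[site] += 1
        # any other state (e.g. still running) is not counted
    return {site: succeeded[site] / n for site, n in finished.items()}
```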


  15. Current Implementation: GUI and Visualization
     • Java 3D based application showing different grid sites, their status summaries and fault conditions
     • Some of the status summaries to be shown:
        • CPU Status – Total, Free, Busy
        • Storage Status – Total, Used, Free
        • Job Status – Total, Running, Queued-up
        • Service Status – Ok, Stopped, Degraded
        • Network Traffic Status
           • Total bandwidth
           • Long-term average bandwidth used
           • Currently (last hour) used bandwidth

  16. Fault Notification (Fault and Alarm Types)
     • CEs, SEs, WNs down
     • Site Functional Test (SFT) failure
     • Sanity check failure (GIIS Monitor)
     • SE storage space full
     • Many jobs piled up in the queue
     • Broken network link
     • Host certificate expiry
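
Several of the fault types listed above amount to simple rule checks on a site's status. The sketch below illustrates the idea only: the status fields, the function name and both thresholds are hypothetical, since the slides do not say how Gridview evaluates these conditions.

```python
def detect_faults(status, max_queued=1000, cert_warn_days=7):
    """Return the fault labels that apply to one site status snapshot.

    status: dict with the hypothetical keys 'sft_passed' (bool),
    'se_free_bytes' (int), 'queued_jobs' (int), 'cert_days_left' (int).
    """
    faults = []
    if not status["sft_passed"]:
        faults.append("SFT failure")
    if status["se_free_bytes"] == 0:
        faults.append("SE storage space full")
    if status["queued_jobs"] > max_queued:
        faults.append("Many jobs piled up in the queue")
    if status["cert_days_left"] <= cert_warn_days:
        faults.append("Host certificate expiry")
    return faults
```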


  20. Ongoing Work in Gridview
     • Service Availability Monitoring
        • Being interfaced with SFT (Site Functional Tests) for monitoring availability of various services such as CE, SE, RB, BDII, etc.
        • Rating of sites according to average resource availability and acceptable thresholds
        • Service availability metrics such as MTTR, uptime and failure rate to be computed and visualised
     • gLite FTS
        • Gridview to be adapted to monitor file transfer statistics such as successful transfers and failure rates for FTS channels across grid sites
     • Enhancement of the GUI and Visualisation module to function as a full-fledged dashboard for LCG
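
The availability metrics named above can be derived from a list of outage intervals. This is a sketch under stated assumptions, not the planned Gridview computation: outages are given as non-overlapping (down, up) timestamps inside the observation window, uptime is reported as a fraction of the window, MTTR is the mean outage duration, and failure rate is the number of outages per hour of uptime. The function name and input format are hypothetical.

```python
def availability_metrics(outages, window_start, window_end):
    """Compute uptime fraction, MTTR and failure rate for one service.

    outages: list of (down_ts, up_ts) in seconds, non-overlapping and
    fully inside [window_start, window_end].
    """
    window = window_end - window_start
    downtime = sum(up - down for down, up in outages)
    uptime = window - downtime
    n = len(outages)
    return {
        "uptime_fraction": uptime / window,
        # Mean time to repair: average length of an outage.
        "mttr_seconds": downtime / n if n else 0.0,
        # Failures per hour of uptime; None if the service was never up.
        "failures_per_uptime_hour": n * 3600 / uptime if uptime > 0 else None,
    }
```

Rating sites against acceptable thresholds would then be a comparison of these values with per-metric limits.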

  21. In Conclusion
     • Gridview is a useful tool for high-level visualization of grid status
     • File transfer monitoring was in production use during the SC3 tests
     • Job status monitoring is to be released for production use
     • Work on Service Availability Monitoring (integrating SFT etc.) is in progress

  22. Thank You
