Simply monitor a grid site with Nagios

  1. Simply monitor a grid site with Nagios
     J. Casey, CERN
     E. Imamagic, SRCE
     ISGC 2008

  2. Overview
     • Nagios
     • Nagios-based grid monitoring
     • Site monitoring prototype
     • Demo
     • Current status
     • Future work
     • Conclusions
     ISGC 2008 / Simply monitor a grid site with Nagios

  3. Nagios
     • Open source monitoring framework
       • widely used & actively developed
     • Host and service problem detection and recovery
     • Provides a wide set of basic sensors
       • easy to develop custom sensors
     • Centralized vs. distributed deployment
     • High configurability
       • service dependencies, fine-grained notification options
     • Web interface
       • status view, administration
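To illustrate why custom sensors are easy to develop: a Nagios plugin is any executable that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of that convention, not a probe from the talk; the disk-usage check, the thresholds and the function names are assumptions chosen for illustration.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, status_line) pair.

    The thresholds are illustrative defaults, not values from the talk.
    """
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    line = "DISK %s - %.1f%% used | used=%.1f%%;%.0f;%.0f" % (
        LABELS[code], used_percent, used_percent, warn, crit)
    return code, line


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        print("DISK UNKNOWN - %s" % exc)
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total)
    print(line)
    return code

# A real plugin script would finish with: sys.exit(main())
```

Nagios reads only the exit code and the printed line, so the same shape works for any check, including the grid probes described later.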

  4. Nagios-based Grid Monitoring
     • Monitoring CRO-GRID Infrastructure (2004-2006)
       • Globus Toolkit Pre-WS & WS, UNICORE, other services
       • active recovery of services
       • http://www.cro-ngi.hr
     • Monitoring EGEE resources in Central Europe (CE)
       • core services since mid 2006
       • all CE sites for 1st line support since September 2006
       • http://nagios.ce-egee.org
     • Grid Services Monitoring (GSM) WG
       • site monitoring prototype, mid 2007
       • http://crnjak.srce.hr/nagios (egee.srce.hr)
       • https://pps-monitoring.cern.ch/nagios (CERN-PPS)

  5. Site Monitoring Prototype (architecture diagram; labels as extracted)
     • Components: Site admins; Monitoring server; MyProxy; …; Site nodes (…, CE, SE, LFC, Site BDII)
     • Flow labels: Issue alarms; Get site status; Get Nagios results; Get remote results; Get VOMS proxy; Get site’s & nodes information; Refresh proxy; Probe descriptions; Live node checks; Get nodes information; Service checks

  6. Grid Probes
     • Provided by SRCE, CERN, OSG
     • Security facilities & services
       • CA distribution, Certificate lifetime, MyProxy
     • Monitoring & information services
       • R-GMA, BDII, MDS, GridICE
     • Job management services
       • Globus Gatekeeper, RB, WMS, WMProxy, Job matching
     • File management services
       • GridFTP, SRM, DPNS, LFC, FTS
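As an illustration of what a probe such as "Certificate lifetime" has to decide, the sketch below maps a certificate's expiry time to a Nagios status. It is not the SRCE, CERN or OSG probe; the 30-day and 7-day thresholds and the function name are assumptions, and extracting the expiry time from the certificate itself is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_cert_lifetime(not_after, now=None, warn_days=30, crit_days=7):
    """Return (exit_code, message) for a certificate expiring at not_after.

    not_after and now must be timezone-aware datetimes. The thresholds
    are illustrative assumptions, not values taken from the talk.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = not_after - now
    if remaining <= timedelta(0):
        return CRITICAL, "CRITICAL - certificate expired on %s" % not_after.date()
    if remaining < timedelta(days=crit_days):
        return CRITICAL, "CRITICAL - certificate expires in %d day(s)" % remaining.days
    if remaining < timedelta(days=warn_days):
        return WARNING, "WARNING - certificate expires in %d day(s)" % remaining.days
    return OK, "OK - certificate valid for %d more day(s)" % remaining.days
```

Passing `now` explicitly keeps the decision logic testable without a real certificate or the system clock.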

  7. Standard Components
     • Specifications defined by GSM WG
     • Probe wrapper
       • enables integration of standardized probes
       • Grid Monitoring Probes Specification
       • https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification
     • Publisher & remote gatherers
       • integration with other tools
       • Grid Monitoring Data Exchange Standard
       • https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard

  8. Nagios Config Generator
     • Uses multiple information sources
       • SAM, BDII, active heuristic checks
     • Modular approach
       • plugging in additional information sources
       • integration with other monitoring systems (e.g. LEMON)
     • User-defined rules
       • configuration tuning for non-standard grid sites
     • Standalone configuration
       • integration with existing Nagios server
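The idea behind a config generator can be sketched as follows: take a site topology (which host runs which grid service) and emit Nagios `define host` / `define service` objects. This is a simplified illustration, not the NCG implementation; the topology dictionary, the check command names and the `generic-host` / `generic-service` templates are all assumptions.

```python
# Hypothetical mapping from grid service type to a Nagios check command name.
CHECKS = {
    "CE": "check_gatekeeper",
    "SE": "check_gridftp",
    "LFC": "check_lfc",
    "Site-BDII": "check_bdii",
}


def render_object(kind, fields):
    """Render one Nagios object definition (host, service, ...) as text."""
    width = max(len(key) for key in fields)
    lines = ["define %s {" % kind]
    for key, value in fields.items():
        # Pad directive names so the values line up in a column.
        lines.append("    %-*s  %s" % (width, key, value))
    lines.append("}")
    return "\n".join(lines)


def generate_config(topology):
    """Build Nagios configuration text from {hostname: [service types]}."""
    blocks = []
    for host in sorted(topology):
        blocks.append(render_object("host", {
            "use": "generic-host",        # assumed template name
            "host_name": host,
            "address": host,
        }))
        for service in topology[host]:
            command = CHECKS.get(service)
            if command is None:
                continue  # unknown type; a user-defined rule could handle it
            blocks.append(render_object("service", {
                "use": "generic-service",  # assumed template name
                "host_name": host,
                "service_description": service,
                "check_command": command,
            }))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Example site with placeholder host names.
    print(generate_config({
        "ce.example.org": ["CE"],
        "se.example.org": ["SE", "LFC"],
    }))
```

In the real tool the topology would come from the information sources listed on the slide (SAM, BDII, active heuristic checks) rather than from a hard-coded dictionary.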

  9. Remote gLite UI
     • Avoid installation of grid middleware on Nagios server
       • execute grid probes on existing gLite UI
       • use Nagios Remote Plugin Executor (NRPE)
     • Diagram labels: Service checks; Site nodes (…, CE, SE, LFC, Site BDII)
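A rough sketch of how the Nagios server could run a probe on the remote gLite UI through NRPE: it invokes the `check_nrpe` client, which asks the NRPE daemon on the UI to run a named command and passes back that command's output and exit code. The install path of `check_nrpe`, the host name and the remote command name below are assumptions; 5666 is the default NRPE port.

```python
import subprocess


def nrpe_argv(ui_host, command, port=5666, timeout=60,
              check_nrpe="/usr/lib/nagios/plugins/check_nrpe"):
    """Build the check_nrpe command line for a command defined on ui_host.

    The check_nrpe path is installation dependent (assumed here).
    """
    return [check_nrpe, "-H", ui_host, "-p", str(port),
            "-t", str(timeout), "-c", command]


def run_remote_probe(ui_host, command, **kwargs):
    """Run the probe via NRPE and return (exit_code, first_output_line)."""
    proc = subprocess.run(nrpe_argv(ui_host, command, **kwargs),
                          capture_output=True, text=True)
    output = proc.stdout.strip().splitlines()
    return proc.returncode, output[0] if output else ""
```

For example, `run_remote_probe("ui.example.org", "check_srm_probe")` would execute the (hypothetical) `check_srm_probe` command configured in the NRPE daemon on that UI, so the grid middleware only needs to exist there.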

  13. (diagram labels) SAM • Standard probes • NPM

  14. Current Status
     • Three sets of standard probes integrated
       • SRCE, CERN, OSG
     • Two external monitoring systems
       • SAM, ENOC DownCollector
     • Several deployments
       • CERN-PPS, SRCE, NIKHEF, PIC, IN2P3, ScotGrid
     • RPMs in apt and yum repository
     • Installation and configuration manual
     • More info: https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo

  15. Future Work
     • NCG development
       • providing configuration for multiple sites (regional monitoring)
       • providing configuration for multiple VOs
     • Integration with global monitoring systems
       • ActiveMQ messaging system
       • Operations Automation Team mandate
     • Enabling “on-host” checks via NRPE
       • processes, logs, ports, files, etc.
     • Probe description & site topology databases definition

  16. Conclusions
     • Nagios
       • highly configurable monitoring framework with notifications, service dependencies, …
       • widely used by site admins
     • Grid extensions
       • integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
       • probes for key grid services
     • Implementation of GSM WG specifications
       • probe wrapper, publisher & remote gatherers
       • easy integration with existing probes and monitoring systems

  17. Thank You! Questions?
