Simply monitor a grid site with Nagios
J. Casey, CERN
E. Imamagic, SRCE
ISGC 2008
Overview
• Nagios
• Nagios-based grid monitoring
• Site monitoring prototype
• Demo
• Current status
• Future work
• Conclusions
ISGC 2008 / Simply monitor a grid site with Nagios
Nagios
• Open source monitoring framework
  • widely used & actively developed
• Detection of and recovery from host and service problems
• Provides a wide set of basic sensors
  • easy to develop custom sensors
• Centralized vs. distributed deployment
• High configurability
  • service dependencies, fine-grained notification options
• Web interface
  • status view, administration
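The "easy to develop custom sensors" point can be illustrated with a minimal sketch of a custom check. It relies on the standard Nagios plugin convention of one status line on stdout plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The disk-usage check, the function names and the thresholds are illustrative assumptions, not part of the talk.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_disk_usage(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_percent = 100.0 * usage.used / usage.total
    code = classify_disk_usage(used_percent, warn, crit)
    return code, f"DISK {STATUS_NAMES[code]} - {used_percent:.1f}% used on {path}"


code, line = check_disk("/")
print(line)
# A real plugin would end with sys.exit(code) so that Nagios sees the status.
```

Nagios reads only the exit code and the printed line, which is why a custom sensor can be written in any language.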
Nagios-based Grid Monitoring
• Monitoring CRO-GRID Infrastructure (2004-2006)
  • Globus Toolkit Pre-WS & WS, UNICORE, other services
  • active recovery of services
  • http://www.cro-ngi.hr
• Monitoring EGEE resources in Central Europe (CE)
  • core services since mid 2006
  • all CE sites for 1st line support since September 2006
  • http://nagios.ce-egee.org
• Grid Services Monitoring (GSM) WG
  • site monitoring prototype, mid 2007
  • http://crnjak.srce.hr/nagios (egee.srce.hr)
  • https://pps-monitoring.cern.ch/nagios (CERN-PPS)
Site Monitoring Prototype
[Architecture diagram. Components: Site admins; Monitoring server; Probe descriptions; MyProxy; Site nodes (CE, SE, LFC, Site BDII, …). Labelled flows: Issue alarms; Get site status; Get Nagios results; Get remote results; Get VOMS proxy; Refresh proxy; Get site’s & nodes information; Get nodes information; Live node checks; Service checks.]
Grid Probes
• Provided by SRCE, CERN, OSG
• Security facilities & services
  • CA distribution, Certificate lifetime, MyProxy
• Monitoring & information services
  • R-GMA, BDII, MDS, GridICE
• Job management services
  • Globus Gatekeeper, RB, WMS, WMProxy, Job matching
• File management services
  • GridFTP, SRM, DPNS, LFC, FTS
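As an illustration of what a probe such as "Certificate lifetime" has to decide, the sketch below maps the time left before a certificate expires to a Nagios status. It is not the SRCE, CERN or OSG probe: the function name, the 30-day and 7-day thresholds and the message format are assumptions, and reading the expiry date from an actual certificate is left out.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_cert_lifetime(not_after, now, warn_days=30, crit_days=7):
    """Return (exit_code, status_line) for a certificate expiring at `not_after`.

    Both arguments are timezone-aware datetimes; thresholds are illustrative.
    """
    remaining = not_after - now
    if remaining <= timedelta(0):
        return CRITICAL, "CERT CRITICAL - certificate has expired"
    if remaining < timedelta(days=crit_days):
        code = CRITICAL
    elif remaining < timedelta(days=warn_days):
        code = WARNING
    else:
        code = OK
    return code, f"CERT {STATUS_NAMES[code]} - {remaining.days} day(s) of validity left"


# Example with fixed dates so the result does not depend on the clock.
now = datetime(2008, 4, 1, tzinfo=timezone.utc)
print(check_cert_lifetime(datetime(2008, 4, 15, tzinfo=timezone.utc), now))
```

Passing `now` explicitly keeps the decision logic testable without touching a real certificate or the system clock.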
Standard Components
• Specifications defined by GSM WG
• Probe wrapper
  • enables integration of standardized probes
  • Grid Monitoring Probes Specification
  • https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification
• Publisher & remote gatherers
  • integration with other tools
  • Grid Monitoring Data Exchange Standard
  • https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard
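The role of a probe wrapper can be sketched roughly as follows: run a standardized probe, read its structured output, and translate it into the status line and exit code Nagios expects. The key-value layout, the `EOT` terminator and the field names (`metricStatus`, `summaryData`, and so on) used here are assumptions made for illustration; the linked Grid Monitoring Probes Specification is the authority on the real format.

```python
# Standard Nagios plugin exit codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Hypothetical probe output; field names and layout are assumed, not quoted.
SAMPLE = """\
serviceURI: ce01.example.org:2119
metricName: example-metric
metricStatus: CRITICAL
summaryData: connection refused
EOT
"""


def parse_probe_output(text):
    """Collect 'key: value' lines into a dict, stopping at the EOT marker."""
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "EOT":
            break
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


def to_nagios(record):
    """Translate a parsed record into (exit_code, status_line)."""
    status = record.get("metricStatus", "UNKNOWN").upper()
    if status not in NAGIOS_CODES:
        status = "UNKNOWN"
    summary = record.get("summaryData", "no summary")
    return NAGIOS_CODES[status], f"{status} - {summary}"


print(to_nagios(parse_probe_output(SAMPLE)))
```

Keeping parsing and translation separate means the same parsed record could also be handed to a publisher, which is the other half of this slide.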
Nagios Config Generator (NCG)
• Uses multiple information sources
  • SAM, BDII, active heuristic checks
• Modular approach
  • plugging in additional information sources
  • integration with other monitoring systems (e.g. LEMON)
• User-defined rules
  • configuration tuning for non-standard grid sites
• Standalone configuration
  • integration with existing Nagios server
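The idea behind a config generator can be sketched as a function that turns a list of site nodes into Nagios `define host` and `define service` objects. This is not the actual generator: the hard-coded node list stands in for the information sources named on the slide, the check command names are hypothetical, and `generic-host` / `generic-service` are assumed to exist as templates on the target Nagios server.

```python
# Hypothetical mapping from grid service type to (description, check command).
SERVICE_CHECKS = {
    "CE": [("Gatekeeper", "check_gatekeeper")],
    "SE": [("SRM", "check_srm")],
    "LFC": [("LFC", "check_lfc")],
}

# Stand-in for data that would come from an information source such as a BDII.
NODES = [
    {"host": "ce01.example.org", "services": ["CE"]},
    {"host": "se01.example.org", "services": ["SE", "LFC"]},
]


def render_object(kind, attrs):
    """Render one Nagios object definition with aligned attribute names."""
    width = max(len(key) for key in attrs)
    lines = [f"define {kind} {{"]
    for key, value in attrs.items():
        lines.append(f"    {key.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


def generate_config(nodes):
    """Return Nagios configuration text for the given nodes."""
    blocks = []
    for node in nodes:
        blocks.append(render_object("host", {
            "use": "generic-host",  # assumed template name
            "host_name": node["host"],
            "address": node["host"],
        }))
        for service_type in node["services"]:
            for description, command in SERVICE_CHECKS.get(service_type, []):
                blocks.append(render_object("service", {
                    "use": "generic-service",  # assumed template name
                    "host_name": node["host"],
                    "service_description": description,
                    "check_command": command,
                }))
    return "\n\n".join(blocks) + "\n"


print(generate_config(NODES))
```

User-defined rules for non-standard sites would fit naturally as edits to `SERVICE_CHECKS` or as a filter applied to `nodes` before rendering.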
Remote gLite UI
• Avoid installation of grid middleware on Nagios server
  • execute grid probes on existing gLite UI
  • use Nagios Remote Plugin Executor (NRPE)
[Diagram labels: Service checks; Site nodes (CE, SE, LFC, Site BDII, …)]
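The NRPE arrangement has two halves: the Nagios server calls `check_nrpe` against the UI host, and the NRPE daemon on the UI maps a command name to a local probe through a `command[name]=...` line in its configuration. The sketch below builds both strings. Host names, the probe path and the command name are hypothetical; only the `-H`, `-c` and `-t` options of `check_nrpe` and the `command[...]` syntax are taken as given.

```python
import shlex


def nrpe_command_line(ui_host, command_name, timeout=60, check_nrpe="check_nrpe"):
    """Argument list the Nagios server would run to trigger a probe on the UI."""
    return [check_nrpe, "-H", ui_host, "-c", command_name, "-t", str(timeout)]


def nrpe_cfg_entry(command_name, probe_path, args=()):
    """Line for the NRPE daemon's configuration on the UI host."""
    command = " ".join(shlex.quote(part) for part in (probe_path, *args))
    return f"command[{command_name}]={command}"


# Hypothetical names: the UI host, the probe path and the target SE are made up.
print(" ".join(nrpe_command_line("ui.example.org", "check_srm")))
print(nrpe_cfg_entry("check_srm", "/opt/probes/check_srm", ["-H", "se01.example.org"]))
```

With this split, grid middleware and credentials stay on the UI, and the Nagios server needs only the `check_nrpe` plugin.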
[Diagram labels: SAM; Standard probes; NPM]
Current Status
• Three sets of standard probes integrated
  • SRCE, CERN, OSG
• Two external monitoring systems
  • SAM, ENOC DownCollector
• Several deployments
  • CERN-PPS, SRCE, NIKHEF, PIC, IN2P3, ScotGrid
• RPMs in apt and yum repositories
• Installation and configuration manual
• More info: https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo
Future Work
• Nagios Config Generator (NCG) development
  • providing configuration for multiple sites (regional monitoring)
  • providing configuration for multiple VOs
• Integration with global monitoring systems
  • ActiveMQ messaging system
  • Operations Automation Team mandate
• Enabling “on-host” checks via NRPE
  • processes, logs, ports, files, etc.
• Definition of probe description & site topology databases
Conclusions
• Nagios
  • highly configurable monitoring framework with notifications, service dependencies, …
  • widely used by site admins
• Grid extensions
  • integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
  • probes for key grid services
• Implementation of GSM WG specifications
  • probe wrapper, publisher & remote gatherers
  • easy integration with existing probes and monitoring systems
Thank You! Questions?