Performance and Exception Monitoring Project

Presentation Transcript


  1. Performance and Exception Monitoring Project
     Tim Smith, CERN/IT

  2. Overview
     • Motivation
     • Objectives
     • Analysis and Design
     • Prototyping
     • Perspective and Future
     Tim Smith: HEPiX @ JLab

  3. Motivation
     [Diagram: a monitoring system surrounded by separate tools for alarms, recovery actions, local and remote monitoring, a process killer, console, resource planning, accounting, security and inventory]
     • Independent systems
     • No single overview
     • Duplicated collection
     • Host based: want service based
     • Perceived problems not real
     • Scalability

  4. Motivation
     [Diagram: the same landscape of tools (alarm, recovery action, local and remote monitoring, console, resource planning, accounting, security, inventory) around the monitoring system]
     • Configuration
     • Collection
     • Transport
     • Repository mgmt
     • Display

  5. Objectives
     • To provide tools in which the alarms and displays are orientated to the overall service provided:
        • User end-to-end views, quality of service views
        • Managerial views of resource usage / evolution / failure rates
        • Service provider views, and detailed machine views
        • Link the alarms to both the monitoring and corrective actions
     • To provide service level metrics
     • To provide a uniform monitoring infrastructure:
        • Coordinated central repositories + common logging format
        • Averaging and archiving of logged information
        • Correlations between logged information
        • Multiple input routes; extensible monitoring clients
        • Modular tools; demonstrated scalability

  6. Process
     • Analysis
        • User Requirements Document
        • Current tools survey
           • Enterprise/cluster mgmt, public domain, other labs, building blocks, DAQ, Run Control, Slow Control
        • Goal / Question / Metric formalism
        • System Requirements Document
     • Design
        • Interfaces Document
     • Prototyping

  7. Goal / Question / Metric
     • Goal: ensure quality of Interactive Service
        • Sufficient nodes?
        • Low enough load?
        • Slow to respond to commands?
        • Contactable via network?
           • Network daemons alive
           • No nologin
           • Free ptys
           • Connection test from remote node
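
The Goal / Question / Metric breakdown above can be modelled as a small tree. This is a minimal sketch, assuming each metric reduces to a pass/fail check and that a goal is met only when every question is answered positively. The thresholds (10 nodes, a load of 1.5 per CPU) and all class names are hypothetical, not taken from the PEM documents.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Sketch only: thresholds and metric names are hypothetical.
class GqmDemo {
    public static void main(String[] args) {
        Goal goal = interactiveServiceGoal(12, 0.8, true);
        System.out.println(goal.name() + " met: " + goal.isMet());
    }

    // Builds the goal/question/metric tree for the interactive service;
    // the arguments stand in for real measurements.
    static Goal interactiveServiceGoal(int nodesUp, double loadPerCpu, boolean daemonsAlive) {
        return new Goal("Ensure quality of Interactive Service", List.of(
                new Question("Sufficient nodes?",
                        List.of(new Metric("nodes up >= 10", () -> nodesUp >= 10))),
                new Question("Low enough load?",
                        List.of(new Metric("load per CPU < 1.5", () -> loadPerCpu < 1.5))),
                new Question("Contactable via network?",
                        List.of(new Metric("network daemons alive", () -> daemonsAlive)))));
    }
}

// A metric is a named pass/fail check.
record Metric(String name, BooleanSupplier check) {
    boolean passes() { return check.getAsBoolean(); }
}

// A question is answered positively when all of its metrics pass.
record Question(String text, List<Metric> metrics) {
    boolean answeredOk() { return metrics.stream().allMatch(Metric::passes); }
}

// A goal is met when every question is answered positively.
record Goal(String name, List<Question> questions) {
    boolean isMet() { return questions.stream().allMatch(Question::answeredOk); }
}
```

Keeping metrics as suppliers means the same tree can be re-evaluated each time fresh measurements arrive.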

  8. PEM Architecture
     [Diagram: PEM components linked with 1 or 1..n multiplicities: Monitoring Agent, Monitoring Broker, Measurement Repository, Configuration Repository, Correlation Engine, User Interface and Access Server, with a boundary marking what lies outside PEM]
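
The data path in the architecture can be sketched in memory. This assumes one broker per repository and several agents per broker, which is only one plausible reading of the garbled 1 / 1..n labels. The classes are hypothetical stand-ins; the real components communicate remotely and store into a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

class ArchitectureDemo {
    public static void main(String[] args) {
        MeasurementRepository repository = new MeasurementRepository();
        MonitoringBroker broker = new MonitoringBroker(repository);
        broker.attach(new MonitoringAgent("node01", () -> 0.4));
        broker.attach(new MonitoringAgent("node02", () -> 1.7));
        broker.pollOnce();
        repository.all().forEach(System.out::println);
    }
}

record Measurement(String host, String metric, double value) {}

// Samples one metric on one host; the probe stands in for a real sensor.
class MonitoringAgent {
    private final String host;
    private final DoubleSupplier probe;

    MonitoringAgent(String host, DoubleSupplier probe) {
        this.host = host;
        this.probe = probe;
    }

    Measurement sample() {
        return new Measurement(host, "load", probe.getAsDouble());
    }
}

// Polls its agents and forwards every sample to its repository.
class MonitoringBroker {
    private final List<MonitoringAgent> agents = new ArrayList<>();
    private final MeasurementRepository repository;

    MonitoringBroker(MeasurementRepository repository) {
        this.repository = repository;
    }

    void attach(MonitoringAgent agent) {
        agents.add(agent);
    }

    void pollOnce() {
        for (MonitoringAgent agent : agents) {
            repository.store(agent.sample());
        }
    }
}

// In-memory stand-in for the Measurement Repository (no database).
class MeasurementRepository {
    private final List<Measurement> rows = new ArrayList<>();

    void store(Measurement m) {
        rows.add(m);
    }

    List<Measurement> all() {
        return List.copyOf(rows);
    }
}
```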

  9. Configuration Repository: Loading the DB
     [Diagram: XML configuration documents → Parser → XML-DBMS → (jdbc) → RDBMS, with viewers on the XML]
     • XML Schema: Host, Host type, Metrics, Services
     • XML-DBMS freeware (tried XSU from Oracle)
     • Parser: Xerces from Apache
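
Before loading, a configuration document can be checked against the XML Schema. The sketch below shows only that validation step, using the JAXP validation API shipped with the JDK (whose built-in implementation is derived from Apache Xerces); it does not use XML-DBMS or JDBC. The schema, element and attribute names are hypothetical, loosely following the entities named on the slide.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ConfigSchemaDemo {
    // Hypothetical schema: hosts with a host type and service, each listing metrics.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="configuration">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="host" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="metric" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
                    </xs:sequence>
                    <xs:attribute name="name" type="xs:string" use="required"/>
                    <xs:attribute name="type" type="xs:string" use="required"/>
                    <xs:attribute name="service" type="xs:string"/>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Returns true if the document conforms; a null ErrorHandler makes validate() throw on errors.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.err.println("rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<configuration><host name='lxplus001' type='interactive' service='lxplus'>"
                + "<metric>load</metric></host></configuration>";
        String bad = "<configuration><host name='lxbatch001'/></configuration>"; // no type attribute
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false
    }
}
```

Rejecting malformed documents before they reach the relational mapping keeps bad rows out of the repository.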

  10. Configuration Repository: Querying the DB
     [Diagram: the RDBMS is read over jdbc, through XML-DBMS and the parser, to produce configuration items as Java objects]
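
The last step of the query path, turning an XML document into configuration items as Java objects, can be sketched with the JDK's DOM parser. This assumes the document has already been extracted from the database; the JDBC and XML-DBMS stages are omitted, and the element names and the `ConfigItem` fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigQueryDemo {
    // Parses an XML document (as extracted from the repository) into configuration items.
    static List<ConfigItem> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<ConfigItem> items = new ArrayList<>();
        NodeList hosts = doc.getElementsByTagName("host");
        for (int i = 0; i < hosts.getLength(); i++) {
            Element host = (Element) hosts.item(i);
            List<String> metrics = new ArrayList<>();
            NodeList metricNodes = host.getElementsByTagName("metric");
            for (int j = 0; j < metricNodes.getLength(); j++) {
                metrics.add(metricNodes.item(j).getTextContent().trim());
            }
            items.add(new ConfigItem(host.getAttribute("name"), host.getAttribute("type"), List.copyOf(metrics)));
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<host name='lxplus001' type='interactive'><metric>load</metric><metric>ptys</metric></host>"
                + "<host name='lxbatch001' type='batch'><metric>load</metric></host>"
                + "</configuration>";
        parse(xml).forEach(System.out::println);
    }
}

// A configuration item as a plain Java object (hypothetical fields).
record ConfigItem(String host, String hostType, List<String> metrics) {}
```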

  11. Correlation Engine
     • To correlate metrics from the MRS according to configuration in the CRS
     • Metric collections: trends + multiple machines
     • Samplings: union for read efficiency from MRS
     • Example Java classes:
        • Correlation coordinator
        • Sampling cache
        • Evaluators
        • Timers
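
The class list above suggests a shape for the engine: a sampling cache read once per cycle, several evaluators, and a coordinator that a timer triggers. The sketch below assumes that division of work, with one evaluator correlating across machines (mean of the latest samples) and one over time (a strictly rising run of samples). All class names beyond those on the slide, the thresholds, and the in-memory cache in place of the MRS are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CorrelationDemo {
    public static void main(String[] args) {
        SamplingCache cache = new SamplingCache(5);
        double[][] loads = {{0.5, 0.9, 1.4}, {2.0, 2.2, 2.6}}; // hypothetical samples
        for (int t = 0; t < 3; t++) {
            cache.add("node01", loads[0][t]);
            cache.add("node02", loads[1][t]);
        }
        CorrelationCoordinator coordinator = new CorrelationCoordinator(cache,
                List.<Evaluator>of(new ClusterAverageEvaluator(1.5), new RisingTrendEvaluator(3)));
        coordinator.evaluateOnce().forEach(System.out::println);
    }
}

// Keeps the most recent samples per host so all evaluators share one read.
class SamplingCache {
    private final int capacity;
    private final Map<String, Deque<Double>> samples = new HashMap<>();

    SamplingCache(int capacity) { this.capacity = capacity; }

    void add(String host, double value) {
        Deque<Double> d = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        if (d.size() == capacity) d.removeFirst(); // drop the oldest sample
        d.addLast(value);
    }

    Map<String, List<Double>> snapshot() {
        Map<String, List<Double>> copy = new HashMap<>();
        samples.forEach((h, d) -> copy.put(h, List.copyOf(d)));
        return copy;
    }
}

interface Evaluator {
    Optional<String> evaluate(Map<String, List<Double>> samples);
}

// Across machines: flags the cluster when the mean of the latest samples exceeds a threshold.
record ClusterAverageEvaluator(double threshold) implements Evaluator {
    public Optional<String> evaluate(Map<String, List<Double>> samples) {
        double mean = samples.values().stream()
                .filter(s -> !s.isEmpty())
                .mapToDouble(s -> s.get(s.size() - 1))
                .average().orElse(0.0);
        return mean > threshold
                ? Optional.of(String.format(Locale.ROOT, "cluster load %.2f above %.2f", mean, threshold))
                : Optional.empty();
    }
}

// Over time: flags hosts whose last k samples strictly increase.
record RisingTrendEvaluator(int k) implements Evaluator {
    public Optional<String> evaluate(Map<String, List<Double>> samples) {
        List<String> rising = new ArrayList<>();
        samples.forEach((host, s) -> {
            if (s.size() < k) return;
            List<Double> tail = s.subList(s.size() - k, s.size());
            boolean up = true;
            for (int i = 1; i < tail.size(); i++) up &= tail.get(i) > tail.get(i - 1);
            if (up) rising.add(host);
        });
        rising.sort(null);
        return rising.isEmpty() ? Optional.empty() : Optional.of("rising load on " + rising);
    }
}

// A timer would call evaluateOnce() periodically.
class CorrelationCoordinator {
    private final SamplingCache cache;
    private final List<Evaluator> evaluators;

    CorrelationCoordinator(SamplingCache cache, List<Evaluator> evaluators) {
        this.cache = cache;
        this.evaluators = evaluators;
    }

    List<String> evaluateOnce() {
        Map<String, List<Double>> snap = cache.snapshot(); // one read for all evaluators
        List<String> exceptions = new ArrayList<>();
        for (Evaluator e : evaluators) e.evaluate(snap).ifPresent(exceptions::add);
        return exceptions;
    }
}
```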

  12. Events
     • Publish / Subscribe: Java RMI
     • Interfaces Document
     [Diagram: event flow between Monitoring Agent, Monitoring Broker, Measurement Repository, Configuration Repository, Correlation Engine, User Interface and Access Server; event types: metricstream, metricvalue, exception, configuration]
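
The publish/subscribe scheme can be sketched with interfaces shaped for Java RMI (extending `java.rmi.Remote`, methods declaring `RemoteException`), but called in-process here, so no registry or exported objects are involved. The topic names come from the slide; the interface and class names are hypothetical and do not reproduce the PEM Interfaces Document.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class EventBusDemo {
    public static void main(String[] args) throws RemoteException {
        EventChannel channel = new LocalEventChannel();
        List<String> seen = new ArrayList<>();
        channel.subscribe("exception", (topic, payload) -> seen.add(topic + ": " + payload));
        channel.publish("metricvalue", "node01 load=0.7"); // no subscriber: dropped
        channel.publish("exception", "node02 unreachable");
        System.out.println(seen); // [exception: node02 unreachable]
    }
}

// RMI-style callback; in a distributed setup this object would be exported remotely.
interface EventSubscriber extends Remote {
    void onEvent(String topic, String payload) throws RemoteException;
}

interface EventChannel extends Remote {
    void subscribe(String topic, EventSubscriber subscriber) throws RemoteException;
    void publish(String topic, String payload) throws RemoteException;
}

// In-process channel: delivers each event to the subscribers of its topic.
class LocalEventChannel implements EventChannel {
    private final Map<String, List<EventSubscriber>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String topic, EventSubscriber subscriber) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(subscriber);
    }

    public void publish(String topic, String payload) throws RemoteException {
        for (EventSubscriber s : subscribers.getOrDefault(topic, List.of())) {
            s.onEvent(topic, payload);
        }
    }
}
```

Declaring `RemoteException` on the interfaces from the start means callers already handle remote failures if the channel is later exported over RMI.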

  13. Monitoring Agent/Broker I
     • SNMP
        • Extended existing infrastructure
        • Multithreaded broker loading DB
     • JMX / JDMK
        • JMX public specification: managed resources
        • Pluggable agents
        • Reported several important bugs
        • Demo at JavaOne conference
        • Linux/NT remote reset
        • Netlogger instrumentation
        • Opened up license negotiations
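
A JMX managed resource exposing one metric can be sketched with the `javax.management` API in the JDK; JDMK is not used. This version implements `DynamicMBean` directly so the management interface is spelled out in `getMBeanInfo()`. The `LoadAverage` attribute, the object name and the class names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

class NodeLoadAgentDemo {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("pem.demo:type=NodeLoad,host=lxplus001");
        NodeLoad load = new NodeLoad();
        server.registerMBean(load, name);
        load.update(1.25); // a real agent would refresh this from the OS
        System.out.println("LoadAverage = " + server.getAttribute(name, "LoadAverage"));
    }
}

// Managed resource with a single read-only attribute.
class NodeLoad implements DynamicMBean {
    private volatile double loadAverage;

    void update(double value) { loadAverage = value; }

    public Object getAttribute(String attribute) throws AttributeNotFoundException {
        if ("LoadAverage".equals(attribute)) return loadAverage;
        throw new AttributeNotFoundException(attribute);
    }

    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    public AttributeList getAttributes(String[] attributes) {
        AttributeList list = new AttributeList();
        for (String a : attributes) {
            try {
                list.add(new Attribute(a, getAttribute(a)));
            } catch (AttributeNotFoundException ignored) {
                // unknown attributes are simply omitted
            }
        }
        return list;
    }

    public AttributeList setAttributes(AttributeList attributes) { return new AttributeList(); }

    public Object invoke(String actionName, Object[] params, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(actionName));
    }

    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo attr = new MBeanAttributeInfo(
                "LoadAverage", "double", "load average (hypothetical metric)", true, false, false);
        return new MBeanInfo(getClass().getName(), "PEM-style load metric",
                new MBeanAttributeInfo[] {attr}, null, null, null);
    }
}
```

Any JMX client, local or connected remotely, can then read the metric by object name and attribute, which is what makes agents pluggable behind one interface.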

  14. Monitoring Agent/Broker II
     • C: low overhead
     • Not yet … DMTF
        • DMI, CIM
     [Diagram: SNMP, /proc, netlogger and script inputs → Monitoring Process → Spool → Spool Manager → Monitoring Broker]
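
The spool hand-off in the diagram decouples a cheap monitoring process from delivery to the broker. The slide's low-overhead agent is written in C; this Java sketch only illustrates the pattern, assuming a line-per-record spool file and a spool manager that forwards lines it has not yet sent. The file format and class names are hypothetical, and a real spool would also need rotation and protection against reading a half-written record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

class SpoolDemo {
    public static void main(String[] args) throws IOException {
        Path spool = Files.createTempFile("pem-spool", ".txt");
        SpoolManager manager = new SpoolManager(spool, line -> System.out.println("to broker: " + line));
        appendSample(spool, "node01 load 0.42");
        appendSample(spool, "node01 ptys_free 37");
        System.out.println(manager.drain()); // 2
        appendSample(spool, "node01 load 0.51");
        System.out.println(manager.drain()); // 1
        Files.delete(spool);
    }

    // Monitoring-process side: append one record and return immediately.
    static void appendSample(Path spool, String record) throws IOException {
        Files.writeString(spool, record + "\n", StandardCharsets.UTF_8, StandardOpenOption.APPEND);
    }
}

// Forwards spooled records that have not been forwarded yet.
class SpoolManager {
    private final Path spool;
    private final Consumer<String> broker;
    private int forwarded = 0; // number of lines already sent

    SpoolManager(Path spool, Consumer<String> broker) {
        this.spool = spool;
        this.broker = broker;
    }

    // Re-reads the whole file for simplicity and returns how many new lines were sent.
    int drain() throws IOException {
        List<String> lines = Files.readAllLines(spool, StandardCharsets.UTF_8);
        List<String> fresh = lines.subList(forwarded, lines.size());
        fresh.forEach(broker);
        forwarded = lines.size();
        return fresh.size();
    }
}
```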

  15. PEM Futures
     • Today: CERN CC needs it
        • Prototype for ALICE MDC III in January
     • Tomorrow: Tier-0 RC / GRID node need it
        • More complete management solutions
        • Integrate into the Fabric Management WP
        • ‘GRIDification’
     • Rapidly evolving technologies
        • Lots of middleware
        • Lots of companies wanting collaboration
        • Still need framework

  16. PEM in Perspective
     [Diagram: PEM in relation to Configuration Management, Monitoring, Alarm, Recovery Actions, Inventory, Resource Planning, Security, Application Inst/Update, OS Configuration/Update, OS Installation/Update, Power Mgmt/Remote Reset, Console Mgmt and PC Hardware]