Cluster Monitoring with EPICS and SNMP

  1. Cluster Monitoring with EPICS and SNMP CBM Conference 2006 Cluster Monitoring with EPICS and SNMP

  2. Motivation • We wish to monitor the ALICE HLT analysis cluster (500 PCs) • Analysing the data obtained from the ALICE experiment will take a long time, so a stable analysis cluster is needed • To ensure stability, this cluster must be constantly monitored • Using the EPICS architecture with SNMP support, it is possible to monitor such a PC cluster

  3. Contents • Cluster Management • SNMP • MIB Trees • SNMP Operations • Using data from SNMP • EPICS • Overview • Channel Access • Record Display • Device Support • devSNMP • Management Possibilities • Test Implementation • Overview • Software • Monitored Resources • Example Implementation • Extended Implementation • Extension Possibilities • Current State • Summary

  4. Cluster Management • PC clusters are now widely used for data analysis in many settings, such as physics experiments and commercial organisations • These clusters often consist of hundreds to thousands of individual PCs (nodes) • To maintain a healthy, efficient cluster, key resources of the nodes must be monitored, e.g.: • Hard disk usage • Processor usage • Running processes, etc. • What is the best way of obtaining this information from the nodes? • Self monitoring? • Operating system logging? • SNMP?

  5. Simple Network Management Protocol • Simple Network Management Protocol (SNMP) is a management protocol for gathering statistical data about network/host traffic and the behaviour of network components • It is a telecom industry standard protocol, so most standards organizations and major vendors support it • An SNMP agent exposes an extensive Management Information Base (MIB) on the host system, which is a database of information useful for network management • MIB objects are organised in a tree structure that includes public (standard) and private branches • These MIBs contain key system resource information which can be used for monitoring purposes

  6. MIB Tree - Graphical View • The MIB tree can be referred to symbolically or numerically • E.g.: iso.org.dod.internet.mgmt.mib-2.system.sysUpTime = 1.3.6.1.2.1.1.3 • [Figure: MIB tree with the nodes iso = 1, org = 3, dod = 6, internet = 1, mgmt = 2, private = 4, MIB-2 = 1, enterprises = 1, system = 1, ucdavis = 2021, sysDescr = 1, sysUpTime = 3, dskTable = 9, dskEntry = 1, dskTotal = 6, dskAvail = 7]
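The symbolic-to-numeric translation on this slide can be sketched in a few lines of Python. This is only an illustration: the node numbers are the ones shown on the slide, while the nested-dictionary layout and the `to_numeric` helper are assumptions made for the example and are not part of NET-SNMP or EPICS.

```python
# Node numbers are taken from the MIB tree slide.
# Each entry maps a symbolic name to (number, children); this layout is illustrative.
MIB_TREE = {
    "iso": (1, {
        "org": (3, {
            "dod": (6, {
                "internet": (1, {
                    "mgmt": (2, {
                        "mib-2": (1, {
                            "system": (1, {
                                "sysDescr": (1, {}),
                                "sysUpTime": (3, {}),
                            }),
                        }),
                    }),
                    "private": (4, {
                        "enterprises": (1, {
                            "ucdavis": (2021, {
                                "dskTable": (9, {
                                    "dskEntry": (1, {
                                        "dskTotal": (6, {}),
                                        "dskAvail": (7, {}),
                                    }),
                                }),
                            }),
                        }),
                    }),
                }),
            }),
        }),
    }),
}


def to_numeric(symbolic: str) -> str:
    """Walk the tree one name at a time and collect the node numbers."""
    level = MIB_TREE
    numbers = []
    for name in symbolic.split("."):
        number, level = level[name]  # KeyError if the name is not in the sketch
        numbers.append(str(number))
    return ".".join(numbers)


print(to_numeric("iso.org.dod.internet.mgmt.mib-2.system.sysUpTime"))
print(to_numeric("iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskAvail"))
```

The first call reproduces the numeric identifier quoted on the slide; the second follows the private branch down to the disk table.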

  7. MIB Tree - Output View
     +--iso(1)
        |
        +--org(3)
           |
           +--dod(6)
              |
              +--internet(1)
                 |
                 +--directory(1)
                 |
                 +--mgmt(2)
                 |  |
                 |  +--mib-2(1)
                 |     |
                 |     +--system(1)
                 |     |  |
                 |     |  +-- -R-- String    sysDescr(1)
                 |     |  |        Textual Convention: DisplayString
                 |     |  |        Size: 0..255
                 |     |  +-- -R-- ObjID     sysObjectID(2)
                 |     |  +-- -R-- TimeTicks sysUpTime(3)
                 |     |  +-- -RW- String    sysContact(4)
                 |     |  |        Textual Convention: DisplayString
                 |     |  |        Size: 0..255
                 |     |  +-- -RW- String    sysName(5)
                 |     |  |        Textual Convention: DisplayString
                 |     |  |        Size: 0..255
                 |     |  +-- -RW- String    sysLocation(6)
                 |     |  |        Textual Convention: DisplayString
                 |     |  |        Size: 0..255
                 |     |  +-- -R-- INTEGER   sysServices(7)
                 |     |  |        Range: 0..127
                 |     |  +-- -R-- TimeTicks sysORLastChange(8)
                 |     |  |        Textual Convention: TimeStamp
                 |     |  |
                 |     |  +--sysORTable(9)
                 |     |     |
                 |     |     +--sysOREntry(1)
                 |     |        |  Index: sysORIndex
                 |     |        |
                 |     |        +-- ---- INTEGER   sysORIndex(1)
                 |     |        |        Range: 1..2147483647
                 |     |        +-- -R-- ObjID     sysORID(2)
                 |     |        +-- -R-- String    sysORDescr(3)
                 |     |        |        Textual Convention: DisplayString
                 |     |        |        Size: 0..255
                 |     |        +-- -R-- TimeTicks sysORUpTime(4)
                 |     |                 Textual Convention: TimeStamp
                 |     |
                 |     +--interfaces(2)
                 |     |  |
                 |     |  +-- -R-- Integer32 ifNumber(1)
                 |     |  |
                 |     |  +--ifTable(2)
                 |     |     |
                 |     |     +--ifEntry(1)
                 |     |        |  Index: ifIndex
                 |     |        |
                 |     |        +-- -R-- Integer32 ifIndex(1)
                 |     |        |        Textual Convention: InterfaceIndex
                 |     |        |        Range: 1..2147483647
                 |     |        +-- -R-- String    ifDescr(2)
                 |     |        |        Textual Convention: DisplayString
                 |     |        |        Size: 0..255
                 |     |        +-- -R-- EnumVal   ifType(3)

  8. SNMP Operations - Overview • SNMP has simple client-server interactions with few operations to access information held in the MIB tree: • {Get} {Set} {GetNext} {Walk} {Table} {Trap} {Translate} • These operations can query local MIB trees, or those of networked machines • [Figure: an SNMP operation sent across the network to SNMP agents on managed devices, each with its own MIB]

  9. SNMP Operations - Command Struct. • Typical SNMP {get} command structure: Operation, Community, PC to query, MIB object to query • Output: MIB object queried, Object type, Object value
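The command and output structure described on this slide can be sketched in Python as follows. This is a hedged illustration, not the command shown in the original presentation: it assumes the NET-SNMP `snmpget` tool with SNMP version 2c, the `public` community, and the `system.sysUpTime.0` object used in the record definition later in this talk. The helper names and the sample output line are made up for the example; the sample follows the general `object = TYPE: value` layout rather than being captured from a real agent.

```python
import shlex


def build_snmpget(community: str, host: str, mib_object: str) -> list[str]:
    """Assemble the command: operation, community, PC to query, MIB object."""
    return ["snmpget", "-v", "2c", "-c", community, host, mib_object]


def parse_snmpget_line(line: str) -> tuple[str, str, str]:
    """Split one output line into (MIB object, object type, object value)."""
    mib_object, _, rest = line.partition(" = ")
    object_type, _, value = rest.partition(": ")
    return mib_object.strip(), object_type.strip(), value.strip()


command = build_snmpget("public", "localhost", "system.sysUpTime.0")
print(shlex.join(command))

# Illustrative output line (assumed format, not captured from a real agent).
sample = "SNMPv2-MIB::sysUpTime.0 = Timeticks: (12345) 0:02:03.45"
print(parse_snmpget_line(sample))
```

On a machine with NET-SNMP installed, the list returned by `build_snmpget` could be passed to `subprocess.run` to perform the actual query.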

  10. Using Data from SNMP • Once the information has been obtained from the MIB trees it must be fed into a control system for it to be useful in a management context • This might process the information, store it for later analysis, or simply display it using a Graphical User Interface (GUI) • Many systems currently exist: • EPICS • Ganglia • Lemon

  11. EPICS - Overview • One such system is the Experimental Physics and Industrial Control System (EPICS) • www.aps.anl.gov/epics • It is currently in use in over 12 organizations to control devices in major projects such as particle accelerators, telescopes, and large experiments • GSI, SLAC, ANL, DESY, LANL, ... • It therefore has a large support and knowledge base • It is based on a client/server network model, with servers holding information in Records which can be accessed by the clients

  12. EPICS - Architecture • [Figure: EPICS clients connected over the network to EPICS servers; each server holds records made up of fields (Field 1: x, Field 2: y, Field 3: z)]

  13. EPICS - Channel Access • Remote access to EPICS records is achieved through the Channel Access (CA) protocol • This requires a CA server to be running on the EPICS server, and a CA client to be running on the EPICS client • These are usually already integrated into EPICS clients/servers when they are created

  14. EPICS - Architecture • [Figure: EPICS clients, each with a CA client, connected over the network to EPICS servers, each with a CA server and records made up of fields (Field 1: x, Field 2: y, Field 3: z)]

  15. EPICS - Record Display • The information from EPICS records can be displayed by a GUI: MEDM

  16. EPICS - Record Display • [Figure: GumTree]

  17. EPICS - Device Support • Records can be interfaced to numerous devices • These devices can be hardware or software • Interfacing allows information from a device to be input into EPICS records • This interfacing is known as device support

  18. EPICS - Architecture • [Figure: EPICS clients, each with a CA client, connected over the network to EPICS servers, each with a CA server, records made up of fields (Field 1: x, Field 2: y, Field 3: z), and device support attached]

  19. Device Support for SNMP - devSNMP • devSNMP is the device support for SNMP • Allows the input of data from SNMP into EPICS records • Sets input field of a record to an SNMP {get} operation • It is configured for the open source product, NET-SNMP • This is simply one particular implementation of SNMP • www.net-snmp.org

  20. Device Support for SNMP - devSNMP • SNMP {get} command: • Record definition file:
       record (stringin, "System_Description") {
           field (DTYP,"Snmp")
           field (INP,"@localhost public system.sysUpTime.0 STRING:100")
           field (SCAN,"5 second")
       }
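For a cluster with many nodes, records of this shape would typically be generated rather than written by hand. The Python sketch below shows one possible way to do that. It copies the field layout (`DTYP`, `INP`, `SCAN`) from the record on this slide; the node names, the `<node>:UpTime` record naming scheme, and the `make_record` helper are hypothetical and only serve the example.

```python
# Template mirroring the record definition on the slide.
# Doubled braces are literal braces in str.format.
RECORD_TEMPLATE = '''record (stringin, "{name}") {{
    field (DTYP,"Snmp")
    field (INP,"@{host} {community} {mib_object} STRING:100")
    field (SCAN,"{scan}")
}}
'''


def make_record(name: str, host: str, mib_object: str,
                community: str = "public", scan: str = "5 second") -> str:
    """Return one record definition as text."""
    return RECORD_TEMPLATE.format(name=name, host=host, community=community,
                                  mib_object=mib_object, scan=scan)


# Hypothetical node names; a real cluster would supply its own host list.
nodes = ["node01", "node02"]
db_text = "".join(
    make_record(f"{node}:UpTime", node, "system.sysUpTime.0") for node in nodes
)
print(db_text)
```

The resulting text could be written to a database file and loaded by a soft IOC, assuming devSNMP is built into that IOC.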

  21. Management Possibilities • EPICS records are capable of carrying out simple calculations and conditional logic, but nothing very complicated • The data from SNMP can therefore be used to control other devices interfaced with EPICS records • One possible reaction is an SNMP {set} operation, which writes values to a MIB • However, the current release of devSNMP supports only the {get} operation • Support for other SNMP commands is planned for the future

  22. Test Implementation - Overview • Carried out on the Linux PC cluster at the Kirchhoff Institute for Physics, University of Heidelberg • 32 PCs running SuSE 9 Linux OS

  23. Test Implementation - Software • EPICS Servers: • 30 cluster nodes (2.4 and 2.6 kernels) running EPICS soft IOCs with devSNMP • NET-SNMP tool set and libraries installed on each node • EPICS Clients: • Two cluster nodes (2.6 kernel) running an installation of Motif Editor and Display Manager (MEDM) on an EPICS base

  24. Test Implementation - Architecture • [Figure: two MEDM displays, each with a CA client, connected over the network to three CA servers; each CA server holds a record with an SNMP input (Inp: SNMP), backed by devSNMP, an SNMP agent, and its MIB]

  25. Test Implementation - Info. Flow • [Figure: information flow between the records (Inp: SNMP), the CA servers, and the CA clients of the two MEDM displays]

  26. Test Implementation - Mon. Resources • Some resources monitored: • Hard disk partition usage (total, available, used, percentage used, alarm limit) • Average CPU usage over 1 min • System up time (from SNMP daemon start) • Inbound packet errors • Unicast outbound packets • SNMP daemon process check
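The disk partition figures in this list (used, percentage used, alarm) can be derived from the `dskTotal` and `dskAvail` objects shown on the MIB tree slide. The Python sketch below illustrates the arithmetic only. It assumes that used space is simply total minus available (a filesystem with reserved blocks may report a different used value), that both values are in kilobytes as in UCD-SNMP-MIB, and that the alarm limit is 90 percent; the limit and the `disk_usage` helper are hypothetical, not values from the test implementation.

```python
def disk_usage(total_kb: int, avail_kb: int, alarm_percent: float = 90.0) -> dict:
    """Derive used space, percentage used, and an alarm flag for one partition."""
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    used_kb = total_kb - avail_kb                    # assumption: used = total - available
    percent_used = 100.0 * used_kb / total_kb
    return {
        "used_kb": used_kb,
        "percent_used": percent_used,
        "alarm": percent_used >= alarm_percent,      # hypothetical alarm limit
    }


# Example values (made up): a 1,000,000 kB partition with 250,000 kB free.
print(disk_usage(1_000_000, 250_000))
```

In an EPICS setup the same calculation could be carried out by a calculation record with alarm limits, which matches the "simple calculations" role described for records in this talk.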

  27. Example Implementation - DESY • Currently EPICS with devSNMP is being used at DESY to monitor key switches and routers • Network Traffic • Status • Solaris and Linux PC clusters to be monitored in the future • In total around 25 managed devices, but this is increasing all the time • More information on EPICS/devSNMP at DESY: • http://www-mks2.desy.de/content/e4/e40/e41/e12212/index_ger.html

  28. Extension Possibilities • EPICS has limitations as a management system: • EPICS is a static system • Records have limited analysis and reaction capabilities; in particular, there are no rule-based events • For dynamic management we can forward information from EPICS records to an expert management system, SysMES (Camilo Lara, et al.) • This allows complex analysis of, and reaction to, the data obtained from SNMP • The management system must have a CA client to communicate with EPICS records

  29. Current State • The interface between the CA client and SysMES has been written • Interfaces to the cluster monitoring systems LEMON and Ganglia have been defined and are currently being implemented

  30. Current State - Architecture • [Figure: the test implementation architecture with a SysMES client added alongside the two MEDM displays; the SysMES client reaches the records through an interface and its own CA client. Below the network: three CA servers with records (Inp: SNMP), devSNMP, SNMP agents, and MIBs]

  31. Summary • SNMP: • Is the standard for network management in almost all modern networked devices (e.g. PCs, workstations, bridges, switches, routers, ...) • Widely implemented protocol with a large knowledge base • Very low system resource usage • A lot of system information is stored in node MIB trees (which SNMP can access) • EPICS: • Widely implemented control system with a huge support base • Allows input and output to a vast array of devices • Through device support for SNMP, these can be combined to create a monitoring system • This can be extended by forwarding the monitoring data to an expert management system (such as SysMES)

  32. Thanks • Many thanks to all who have helped, but especially: • Camilo Lara (Coordinator, KIP) • Albert Kagarmanov (devSNMP at DESY)

  33. The End • Thank you for your attention • Any questions?
