Lemon

Computer Monitoring at CERN

Miroslav Siket

CERN-IT/FIO-FS

Outline
  • Lemon – what it is?
  • Structure
  • Functionality
  • Metrics
  • Alarms
  • Web visualization

Sysadmin Introduction at CERN

Lemon – LHC Era Monitoring
  • Lemon is a software package containing tools for monitoring the status and performance of computers (currently limited to Linux and Solaris)
  • It contains the following components:
    • Sensors (measure the values of individual metrics)
    • MSA (Monitoring Sensor Agent)
    • Monitoring Repository (a daemon that receives the metrics)
    • Monitoring Repository Backend (storage)
    • LRF (Lemon RRD tool framework – caching and web presentation tools)
    • Correlation Engines
    • Lemon Client (tool for retrieving data)
    • LAG (Laser Alarm Gateway – tool for passing alarms to the Laser system)
  • See http://cern.ch/lemon for more info


Lemon - schema

[Figure: Lemon - schema. Labels shown: Nodes (Sensors, Monitoring Agent), Monitoring Repository, Repository backend (SQL), Correlation Engines, Lemon CLI, RRDTool / PHP on apache, Web browser, User. Connections are labelled TCP/UDP, SOAP and HTTP.]

Sensor (MS) and Sensor Agent (MSA)
  • The sensor measures data on request from the MSA
  • The MSA receives the data from the sensor through a pipe
  • The MSA sends the data to the Monitoring Repository (MR) through a UDP socket
  • Typical communication between the two:
    • MSA forks the sensor process
    • MSA: INI 1 LoadAvg
    • MSA: GET 1
    • Sensor: PUT 1 0.42
    • MSA: sends UDP packet to MR
  • The MSA controls the sampling frequency and status of each sensor (it typically manages several)
  • You can write your own sensors (in bash, C++, Perl, …)
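As an illustration of how small a sensor can be, here is a minimal Python sketch that answers the three-message exchange shown above. It assumes the requests arrive as newline-terminated text lines on the sensor's stdin pipe and that replies go back on stdout; it handles only `INI <id> <metric>` and `GET <id>`. The real Lemon sensor protocol has more to it, and the `ERR` replies and the `Sensor`/`run` names are hypothetical.

```python
import os
from typing import Callable, Dict, Iterable, Optional, TextIO

# Hypothetical metric table: metric name -> function returning the current value.
# "LoadAvg" matches the example exchange; the mapping itself is an assumption.
DEFAULT_METRICS: Dict[str, Callable[[], float]] = {
    "LoadAvg": lambda: os.getloadavg()[0],  # 1-minute load average (Unix only)
}


class Sensor:
    """Toy sensor answering the INI/GET/PUT exchange shown on the slide."""

    def __init__(self, metrics: Dict[str, Callable[[], float]]) -> None:
        self.metrics = metrics
        self.instances: Dict[str, str] = {}  # instance id -> metric name

    def handle(self, line: str) -> Optional[str]:
        """Process one request line; return the reply, or None if no reply is due."""
        parts = line.split()
        if not parts:
            return None
        cmd, args = parts[0], parts[1:]
        if cmd == "INI" and len(args) == 2:
            instance_id, metric = args
            if metric not in self.metrics:
                return f"ERR {instance_id} unknown metric {metric}"  # hypothetical error reply
            self.instances[instance_id] = metric
            return None  # the slide shows no reply to INI
        if cmd == "GET" and len(args) == 1:
            instance_id = args[0]
            metric = self.instances.get(instance_id)
            if metric is None:
                return f"ERR {instance_id} not initialised"  # hypothetical error reply
            return f"PUT {instance_id} {self.metrics[metric]():.2f}"
        return f"ERR unknown request {cmd}"  # hypothetical error reply


def run(requests: Iterable[str], out: TextIO,
        metrics: Dict[str, Callable[[], float]] = DEFAULT_METRICS) -> None:
    """Read requests from the agent (e.g. sys.stdin) and write replies to `out`."""
    sensor = Sensor(metrics)
    for line in requests:
        reply = sensor.handle(line)
        if reply is not None:
            out.write(reply + "\n")
            out.flush()  # the agent is waiting on the pipe, so do not buffer replies
```

As a standalone sensor you would call `run(sys.stdin, sys.stdout)` from the script's entry point. The agent side (forking the sensor and forwarding values to the MR over UDP) is not shown.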

Metrics
  • Measured metrics (about 255):
    • Status: OS, disk DMA, RPMs OK?, Ethernet link, …
    • Daemons alive: sshd, ntpd, syslogd, friod, …
    • File sizes: /etc/nologin, /afs/cern.ch, …
    • Security: sshd MD5 checksum, …
    • Performance: CPU utilization, memory utilization, network bandwidth use, …
    • Misc: number of jobs per virtual organization, SMART status, temperature, …

(see the list at http://cern.ch/lemon-status/metric_descriptions.php)

  • The status of the MSA can be checked in /var/log/edg-fmon-agent.log on each machine (the log file of the edg-fmon-agent daemon)
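Two of the metric families listed above, daemon liveness and file size, are simple enough to sketch. The Python functions below are illustrative only: they assume a Linux `/proc` filesystem and are not the actual Lemon sensor code; `count_processes` and `file_size` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Union


def count_processes(name: str, proc_root: Union[str, Path] = "/proc") -> int:
    """Count running processes whose command name equals `name` (e.g. "sshd").

    Reads /proc/<pid>/comm on Linux; note the kernel truncates comm to 15
    characters. `proc_root` is a parameter only so a fake tree can be used in tests.
    """
    root = Path(proc_root)
    if not root.is_dir():
        return 0
    count = 0
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited between listing and reading
        if comm == name:
            count += 1
    return count


def file_size(path: Union[str, Path]) -> int:
    """Size of a file in bytes, or -1 if it does not exist (e.g. /etc/nologin)."""
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return -1
```

A value of `count_processes("sshd")` below 1 is exactly the kind of condition an alarm rule can then test.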

Lemon at CERN
  • Lemon monitors about 2100 computers in 100 clusters
  • On average it collects about 70 metrics from each host
  • Part of ELFms
  • Integrated with the Sure alarm system
  • Collects about 1 GB of data per day
  • Integrated with CDB (the configuration database)
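As a rough scale check, the figures above can be combined. This assumes 1 GB means $10^9$ bytes and that the daily volume is spread evenly over hosts and metrics, neither of which the slide states.

```latex
\begin{align*}
N &= 2100~\text{hosts} \times 70~\text{metrics/host} = 147\,000~\text{monitored metric instances}\\
\frac{10^9~\text{B/day}}{2100~\text{hosts}} &\approx 4.8 \times 10^5~\text{B per host per day}\\
\frac{10^9~\text{B/day}}{147\,000} &\approx 6.8 \times 10^3~\text{B per metric instance per day}
\end{align*}
```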

[Figure: ELFms diagram; labels shown: Node Configuration Management, Node Management]

Sure system
  • The Sure sensor compares the values of individual metrics against reference values and raises an alarm when a condition is met
  • Examples:
    • Loadavg > 20 – raises Load_high alarm
    • # of sshd daemons < 1 – raises sshd_dead alarm
    • # of SMART failures in /var/log/messages > 0 – raises smart_failure alarm
  • Alarms are sent to the Sure servers
  • Operators acknowledge and log alarms and, if they cannot resolve them, notify the responsible person
  • Sysadmins receive ITCM tickets; for each alarm there is a procedure describing how to handle it
  • Special case – NO_CONTACT alarm
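The threshold checks above amount to a small rule table. Below is a minimal Python sketch, assuming each metric arrives as a number under the hypothetical names `loadavg`, `sshd_count` and `smart_failures`; it is not Sure's actual rule syntax.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    """One threshold condition: raise `alarm` when op(value, threshold) is true."""
    metric: str  # hypothetical metric name
    op: Callable[[float, float], bool]
    threshold: float
    alarm: str


# The three example conditions from the slide.
RULES: Sequence[Rule] = (
    Rule("loadavg", operator.gt, 20, "Load_high"),
    Rule("sshd_count", operator.lt, 1, "sshd_dead"),
    Rule("smart_failures", operator.gt, 0, "smart_failure"),
)


def evaluate(values: Dict[str, float], rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the alarms raised by the given metric values, in rule order.

    Metrics absent from `values` are skipped; how missing data is really
    handled (compare the NO_CONTACT case) is outside this sketch.
    """
    return [
        rule.alarm
        for rule in rules
        if rule.metric in values and rule.op(values[rule.metric], rule.threshold)
    ]
```

For example, `evaluate({"loadavg": 23.5, "sshd_count": 0, "smart_failures": 0})` returns `["Load_high", "sshd_dead"]`. Keeping the conditions as data rather than code means new alarms can be added without touching the evaluation loop.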

Web visualization and framework
  • LRF pre-processes part of the data from the Monitoring Repository and stores it in RRD files for fast visualization
  • It groups the logical units (nodes) into clusters based on:
    • CDB [configuration database] definition
    • user defined clusters
    • HW type
    • Racks
  • A PHP-based web interface displays the preprocessed data on demand and, combined with CDB and status information, gives a general overview
  • Check it at http://cern.ch/lemon-status
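The grouping step can be illustrated with a short Python sketch that groups nodes by one attribute and averages a metric per cluster. The node records, attribute names such as `rack`, and the function names are hypothetical, and the RRD storage itself is not modelled.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Mapping


def group_by(nodes: List[Mapping[str, str]], key: str) -> Dict[str, List[str]]:
    """Group node names into clusters by one attribute, e.g. 'rack' or 'hw_type'.

    Nodes lacking the attribute land in an 'unknown' cluster.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        clusters[node.get(key, "unknown")].append(node["name"])
    return dict(clusters)


def cluster_average(clusters: Mapping[str, List[str]],
                    samples: Mapping[str, float]) -> Dict[str, float]:
    """Average one metric per cluster, skipping nodes that have no sample."""
    result: Dict[str, float] = {}
    for cluster, members in clusters.items():
        values = [samples[name] for name in members if name in samples]
        if values:  # a cluster with no data gets no entry rather than a fake 0
            result[cluster] = mean(values)
    return result
```

Grouping the same node list by a different key (say `hw_type` instead of `rack`) yields a different cluster view of the same data, which mirrors the several grouping criteria listed above.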

Summary
  • Lemon provides monitoring information about the computers in the Computer Center at CERN
  • Its integration with Sure (the alarm system) allows problems to be identified and repaired quickly and easily
  • Combined with CDB, it gives an easier overview of services and visualization of their performance
  • Combined with Remedy (ITCM), it gives an overview of the problems for a given service
