
Monitoring, Configuration and Control of the LHCb Trigger Farm

Gianluca Peco

On behalf of the Bologna Group

Trigger Meeting, 21/9/04


Monitoring, Configuration and Control

  • Monitoring

    • Display of relevant parameters concerning the status of the farm elements

    • Induce an FSM transition to an alarm state when the monitored parameters indicate error or warning conditions

  • Configuration

    • Define the farm running conditions

      • Farm elements and kernel version to be used

  • Control

    • Action execution (reboot, ready, start, stop) triggered by manual command or by FSM transition
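The interplay between monitoring and control described above can be sketched as a small state machine. This is only an illustration in Python: the state names, the threshold values and the choice of "stop" as the reaction to an alarm are assumptions, since the slides do not specify them.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    ALARM = "alarm"


# Hypothetical limits; the slides do not give actual values.
LIMITS = {"temperature_c": 70.0, "memory_used_fraction": 0.95}


def violations(readings):
    """Return the names of monitored parameters that exceed their limit.

    Parameters missing from the readings are treated as within limits.
    """
    return sorted(
        name for name, limit in LIMITS.items()
        if readings.get(name, 0.0) > limit
    )


def next_state(state, readings):
    """Move to ALARM when any monitored parameter is out of range."""
    return State.ALARM if violations(readings) else state


def action_for(old, new):
    """Pick a control action for a transition (illustrative mapping only)."""
    if new is State.ALARM and old is not State.ALARM:
        return "stop"
    return None
```

A manual command would bypass `next_state` and request the action directly; the sketch covers only the path where an FSM transition triggers the action.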


Monitoring

  • Each node runs a few lightweight processes that:

    • monitor sensors;

    • command actuators.

  • DIM is the network communication layer between control units and farm elements

    • It allows bi-directional communication.

  • PVSS is interfaced to the farm nodes

    • to receive monitor data;

    • to issue commands to the nodes;

    • to set node configuration.


PVSS and DIM

DIM is based on the client/server paradigm

  • Servers "publish" their services by registering them with the name server (normally once, at startup).

  • Clients "subscribe" to services by asking the name server which server provides the service and then contacting the server directly, providing the type of service and the type of update as parameters.

  • The name server keeps an up-to-date directory of all the servers and services available in the system.
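The publish/subscribe flow above can be illustrated with a toy in-memory model. This is not the DIM API: the class and method names are invented for the sketch, there is no networking, and the "type of service" and "type of update" parameters are left out for brevity.

```python
class NameServer:
    """Directory mapping each service name to the server that provides it."""

    def __init__(self):
        self._directory = {}

    def register(self, service_name, server):
        self._directory[service_name] = server

    def lookup(self, service_name):
        return self._directory[service_name]


class Server:
    """Publishes services and pushes updates to subscribed callbacks."""

    def __init__(self, name, name_server):
        self.name = name
        self._name_server = name_server
        self._values = {}
        self._subscribers = {}

    def publish(self, service_name, initial_value):
        # Registration happens once, when the service is first published.
        self._values[service_name] = initial_value
        self._subscribers[service_name] = []
        self._name_server.register(service_name, self)

    def subscribe(self, service_name, callback):
        self._subscribers[service_name].append(callback)
        callback(self._values[service_name])  # deliver the current value

    def update(self, service_name, value):
        self._values[service_name] = value
        for callback in self._subscribers[service_name]:
            callback(value)


class Client:
    """Asks the name server for the provider, then talks to it directly."""

    def __init__(self, name_server):
        self._name_server = name_server

    def subscribe(self, service_name, callback):
        server = self._name_server.lookup(service_name)
        server.subscribe(service_name, callback)
        return server


# Usage with a hypothetical service name.
dns = NameServer()
node = Server("Node_001_01", dns)
node.publish("Node_001_01/temperature", 41.5)

received = []
Client(dns).subscribe("Node_001_01/temperature", received.append)
node.update("Node_001_01/temperature", 43.0)
```

The name server is only consulted at subscription time; after that, updates flow from the server to the client without passing through it.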

[Diagram: a DIM server runs on a farm element; labels: PVSS, sensor, actuator]


PVSS and DIM

  • PVSS provides a runtime DB, alarm generation and graphical panels.

  • A key PVSS concept is the data point. A data point type is somewhat analogous to an object-oriented class (a collection of attributes that supports inheritance).

  • PVSS communicates with DIM via a configurable PVSS-DIM API Manager.

  • PVSS can behave as a DIM client (i.e. receive information from or send commands to DIM servers) or as a DIM server (i.e. send information to or receive commands from DIM clients).

[Diagram labels: PVSS Data Base, Data point, DIM]


Sensors

  • Written in C, the sensors collect relevant information from the /proc and /sys kernel filesystems and publish it through DIM calls.

  • The following sensors are ready and tested:

    • Temperature and fan speeds

    • CPU states, including irq and softirq

    • Hardware interrupt rates

    • Memory usage

    • Network interface card

    • TCP/IP stack

    • Process status

      • The process list is obtained through calls to the libproc-3.2.3.so library (to cope with changes in kernel version).
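As an illustration of what a memory-usage sensor has to do, the sketch below parses text in the format of /proc/meminfo. The original sensors are C programs that publish through DIM; this Python version only shows the parsing step, and the function names and sample values are made up for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(values):
    """Fraction of total memory that is not free."""
    total = values["MemTotal"]
    return (total - values["MemFree"]) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Made-up sample so the parser can be exercised without a Linux host.
SAMPLE = """MemTotal:        1024000 kB
MemFree:          256000 kB
Buffers:           10000 kB
"""
```

A real sensor would call something like `read_meminfo()` periodically and hand the result to the communication layer instead of returning it.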


Data Point Structure

  • Each sensor corresponds to a DPe in PVSS (the service is mapped to a DPT).

  • A sensor that registers with the DNS is automatically detected by a PVSS ctrl script and subscribed in the corresponding DP structure.

  • A missing sensor is detected, and its absence is shown in the corresponding control panel.

[Diagram: data points. Each sensor DPT (Sensor-1 DPT, Sensor-2 DPT, ..., Sensor-n DPT) holds data points Node_001_01, Node_001_02, ..., Node_100_20]
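The automatic detection of new and missing sensors amounts to comparing the services currently registered with the name server against the data points already known. The sketch below shows that comparison in Python; it is not the PVSS ctrl script, and the function names and service naming scheme are assumptions.

```python
def reconcile(known, registered):
    """Compare known data points with currently registered services.

    Returns (to_subscribe, missing): services that still need a data point
    and a subscription, and data points whose service has disappeared.
    """
    known, registered = set(known), set(registered)
    return sorted(registered - known), sorted(known - registered)


def update_connected(connected, registered):
    """Update a {service: connected_flag} mapping in place and return it."""
    registered = set(registered)
    for service in registered:
        connected[service] = True  # new or still-present sensor
    for service in connected:
        if service not in registered:
            connected[service] = False  # sensor vanished; panel shows it
    return connected
```

Running this periodically against the name server's directory would give the control panel an up-to-date connected flag per sensor.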


DIMConfig ClientServices


Data Point Structure (II)

DpType Structure

  • SFNode DpT

    • Name: SFN_xxx_yy

    • Reference DpT of Sensors DpT

  • Sensor DpT

    • Name: Stxxxxx

    • settings

    • readings

    • info

    • connected (bool)
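Since a data point type is compared to a class, the DpType structure can be mirrored with two small Python dataclasses. This is only an analogy, not PVSS code: the field types for settings, readings and info are assumptions, because the slide gives only the element names, and the example names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class SensorDp:
    """Analogue of the Sensor DpT: one instance per sensor on a node."""
    name: str
    settings: dict = field(default_factory=dict)  # assumed type
    readings: dict = field(default_factory=dict)  # assumed type
    info: str = ""                                # assumed type
    connected: bool = False


@dataclass
class SFNodeDp:
    """Analogue of the SFNode DpT: references the sensor data points."""
    name: str
    sensors: dict = field(default_factory=dict)

    def add_sensor(self, sensor):
        self.sensors[sensor.name] = sensor

    def missing_sensors(self):
        """Names of sensors whose service is currently not connected."""
        return sorted(n for n, s in self.sensors.items() if not s.connected)
```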


Development Testbed

14 Linux boxes

2 Windows boxes running the PVSS Distributed System

1 Linux box as development platform running DIM (sensors, actuators)

[Diagram labels: LHCBPLUS, PC1, PC2, 3Com, DNS, Sensors/Actuators, PVSS Dist1, PVSS Dist7]


Display Architecture

[Diagram: display panels linked by click events. Labels: Farm Display Panel (subfarm, subfarm, subfarm); SubFarm Display Panel (SubFarm_001); Node Display Panel (Node_001_12); Sensor Display Panel; ssh terminal; Missing service: DP doesn't exist]


Display

[Screenshot labels: Main Display Panel; Nodes; Process list (on click)]


Display (II)


Process Control

  • The basic mechanism to start and stop a process is ready (the DIM server publishes DIMCMD).

  • When a process is started by DIMCMD, an arbitrary Unique Thread Group Identifier (UTGID) is assigned to it. No more than one process can be started with the same UTGID.

  • The process can then be traced and killed using the UTGID commands.

  • The UTGID mechanism is implemented by setting an additional environment variable.
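The UTGID idea can be sketched in Python as follows. This is not the actual uStart, uLs or uKill implementation: the environment variable name `UTGID` is an assumption (the slides say only that an additional environment variable is set), and the in-memory registry stands in for whatever bookkeeping the real tools use.

```python
import os
import subprocess


class UtgidRegistry:
    """Start, list and kill child processes by a unique identifier."""

    def __init__(self):
        self._processes = {}

    def _reap(self):
        # Forget processes that have already exited.
        finished = [u for u, p in self._processes.items() if p.poll() is not None]
        for utgid in finished:
            del self._processes[utgid]

    def start(self, utgid, argv):
        """Start argv with the identifier in its environment; return the pid."""
        self._reap()
        if utgid in self._processes:
            raise ValueError(f"UTGID {utgid!r} is already in use")
        env = dict(os.environ, UTGID=utgid)  # assumed variable name
        process = subprocess.Popen(argv, env=env)
        self._processes[utgid] = process
        return process.pid

    def running(self):
        """Map each live identifier to its pid."""
        self._reap()
        return {u: p.pid for u, p in self._processes.items()}

    def kill(self, utgid):
        """Terminate the process registered under the identifier."""
        process = self._processes.pop(utgid)
        process.terminate()
        process.wait()
```

Because the identifier travels in the environment, it is inherited by any children the process spawns, which is what makes it usable for tracing a whole group rather than a single pid.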


UTGID uStart

uStart: starts a process with a UTGID

uStart: cannot start two processes with the same UTGID


UTGID uLs, uKill

uLs: shows the processes that have a UTGID

uKill: stops a process by UTGID

