Monitoring, Configuration and Control of the LHCb Trigger Farm

Gianluca Peco, on behalf of the Bologna Group. Trigger Meeting, 21/9/04.



Monitoring, Configuration and Control

  • Monitoring

    • Display of relevant parameters concerning the status of the farm elements

    • Induce an FSM transition to an alarm state when the monitored parameters indicate error or warning conditions

  • Configuration

    • Define the farm running conditions

      • Farm elements and kernel version to be used

  • Control

    • Action execution (reboot, ready, start, stop) triggered by manual command or by FSM transition
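
The interplay between monitoring and control described above can be sketched as a small finite-state machine. This is only an illustration: the four action names come from the slide, while the state names, the transition table and the temperature limit are assumptions made for the example.

```python
from dataclasses import dataclass

# Allowed transitions per action. The state names are hypothetical; only the
# action names (reboot, ready, start, stop) come from the slides.
TRANSITIONS = {
    ("NOT_READY", "ready"): "READY",
    ("READY", "start"): "RUNNING",
    ("RUNNING", "stop"): "READY",
    ("ALARM", "reboot"): "NOT_READY",
}


@dataclass
class NodeFsm:
    state: str = "NOT_READY"

    def command(self, action: str) -> str:
        """Apply a manual command; reject it if not allowed in this state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

    def on_reading(self, temperature_c: float, limit_c: float = 70.0) -> str:
        """Monitoring hook: an out-of-range reading forces the alarm state."""
        if temperature_c > limit_c:
            self.state = "ALARM"
        return self.state


node = NodeFsm()
node.command("ready")
node.command("start")
node.on_reading(85.0)    # exceeds the assumed limit, so the node goes to ALARM
node.command("reboot")   # control action taken from the alarm state
```

Keeping the allowed transitions in one table makes it easy to see which commands are legal in which state, and the same table can serve manual commands and FSM-triggered actions alike.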


Monitoring

  • Each node runs a few lightweight processes:

    • monitor sensors;

    • command actuators.

  • DIM is the network communication layer between control units and farm elements

    • It allows bi-directional communication.

  • PVSS is interfaced to the farm nodes

    • to receive monitor data;

    • to issue commands to the nodes;

    • to set node configuration.


PVSS and DIM

DIM is based on the client/server paradigm

  • Servers "publish" their services by registering them with the name server (normally once, at startup).

  • Clients "subscribe" to services by asking the name server which server provides the service and then contacting the server directly, providing the type of service and the type of update as parameters.

  • The name server keeps an up-to-date directory of all the servers and services available in the system.
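
The publish/subscribe flow described above can be illustrated with a toy, in-process model. This is not the DIM API: every class, method and service name below is hypothetical and only mirrors the three roles (name server, server, client).

```python
from typing import Callable, Dict, List


class NameServer:
    """Directory mapping each service name to the server that provides it."""

    def __init__(self) -> None:
        self.directory: Dict[str, "Server"] = {}

    def register(self, service: str, server: "Server") -> None:
        self.directory[service] = server

    def lookup(self, service: str) -> "Server":
        return self.directory[service]


class Server:
    """Publishes services and pushes updates directly to its subscribers."""

    def __init__(self, name_server: NameServer) -> None:
        self.name_server = name_server
        self.values: Dict[str, object] = {}
        self.subscribers: Dict[str, List[Callable[[object], None]]] = {}

    def publish(self, service: str, value: object) -> None:
        self.values[service] = value
        self.subscribers.setdefault(service, [])
        self.name_server.register(service, self)  # normally once, at startup

    def update(self, service: str, value: object) -> None:
        self.values[service] = value
        for callback in self.subscribers[service]:
            callback(value)

    def add_subscriber(self, service: str, callback: Callable[[object], None]) -> None:
        self.subscribers[service].append(callback)
        callback(self.values[service])  # deliver the current value immediately


def subscribe(name_server: NameServer, service: str,
              callback: Callable[[object], None]) -> None:
    """Client side: ask the name server who serves it, then contact that server."""
    name_server.lookup(service).add_subscriber(service, callback)


dns = NameServer()
node = Server(dns)
node.publish("Node_001_01/temperature", 41.5)  # hypothetical service name

received = []
subscribe(dns, "Node_001_01/temperature", received.append)
node.update("Node_001_01/temperature", 43.0)
```

The point of the model is that the name server is consulted only for the lookup; updates then flow from the server to the client without passing through it.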

Diagram: a DIM server, with its sensor and actuator, runs on a farm element and communicates with PVSS.


PVSS and DIM

  • PVSS provides a runtime DB, alarm generation, and graphical panels

  • A key PVSS concept is the data point. A data point type is somewhat analogous to an object-oriented class (a collection of attributes that supports inheritance).

  • PVSS communicates with DIM via a configurable PVSS-DIM API Manager

  • PVSS can behave as a DIM Client (i.e. receive information from or send commands to DIM servers) or as a DIM Server (i.e. send information to or receive commands from DIM clients)

Diagram labels: PVSS Data Base, Data point, DIM.


Sensors

  • Built as C programs, they collect relevant information from the /proc and /sys kernel filesystems and publish it through DIM calls.

  • The following sensors are ready and tested:

    • Temperature and fan speeds

    • CPU states, including irq and softirq

    • Hardware interrupt rates

    • Memory usage

    • Network interface card

    • TCP/IP stack

    • Process status

      • The process list is obtained through calls to the libproc-3.2.3.so library (to cope with changes in kernel version).
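
As an illustration of what a memory-usage sensor does, the sketch below parses text in the /proc/meminfo format and derives a usage figure. The real sensors are C programs that publish through DIM; this Python version omits the publishing step, the function names are hypothetical, and the sample values are made up so that the example runs on any platform.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style lines ("Key:   value kB") into {key: kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_usage_percent(meminfo: dict) -> float:
    """Share of total memory that is not free, in percent."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemFree"]) / total


# Illustrative sample; on a Linux node the text would come from
# open("/proc/meminfo").read() instead.
SAMPLE = """MemTotal:        1024000 kB
MemFree:          256000 kB
Buffers:           51200 kB
"""

info = parse_meminfo(SAMPLE)
usage = memory_usage_percent(info)
```

Separating the parsing from the file access keeps the sensor logic testable without a live /proc filesystem.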


Data Point Structure

Diagram: each sensor has its own data point type (Sensor-1 DPT, Sensor-2 DPT, up to Sensor-n DPT), and each type holds one data point per node (Node_001_01, Node_001_02, up to Node_100_20).

  • Each sensor corresponds to a data point element (DPe) in PVSS; each service is mapped to a data point type (DPT)

  • A sensor that registers with the DIM name server (DNS) is automatically detected by a PVSS ctrl script and subscribed to in the corresponding DP structure

  • A missing sensor is detected and its absence is shown in the corresponding control panel
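
The detection logic in the bullets above can be sketched as a periodic reconciliation between the services currently known to the name server and the data points already created. This is not PVSS ctrl code: the function name and the "node/sensor" service naming are assumptions made for the example.

```python
def sync_data_points(connected: dict, published: set) -> list:
    """Reconcile data points with the currently published services.

    `connected` maps a data point name to its connection flag and is updated
    in place. Returns the sorted names of data points whose sensor is missing.
    """
    for name in published:
        connected[name] = True          # new or still-present sensor
    missing = sorted(name for name in connected if name not in published)
    for name in missing:
        connected[name] = False         # keep the data point, flag it as absent
    return missing


connected = {}
# First scan: two sensors are registered with the name server.
first = sync_data_points(connected, {"Node_001_01/memory", "Node_001_02/memory"})
# Second scan: the sensor on Node_001_02 has disappeared.
second = sync_data_points(connected, {"Node_001_01/memory"})
```

Keeping the data point of a vanished sensor, rather than deleting it, is what lets a panel show the absence explicitly.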




Data Point Structure (II)

DpType Structure

  • SFNode DpT (name: SFN_xxx_yy): references the Sensor DpTs

  • Sensor DpT (name: Stxxxxx), with the elements:

    • settings

    • readings

    • info

    • connected (bool)
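
The DpType structure listed above can be mirrored with nested record types, which may help readers unfamiliar with PVSS. This is a sketch only: the class names are hypothetical, placing the four elements side by side is an assumption, and so is the reading of SFN_xxx_yy as subfarm number followed by node number.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SensorDp:
    """One sensor data point with the four elements named on the slide."""
    settings: dict = field(default_factory=dict)
    readings: dict = field(default_factory=dict)
    info: dict = field(default_factory=dict)
    connected: bool = False


@dataclass
class SFNodeDp:
    """One farm node, referencing its sensor data points by sensor name."""
    name: str
    sensors: Dict[str, SensorDp] = field(default_factory=dict)


def node_name(subfarm: int, node: int) -> str:
    """Build an SFN_xxx_yy name (assumed: xxx = subfarm, yy = node)."""
    return f"SFN_{subfarm:03d}_{node:02d}"


node = SFNodeDp(name=node_name(1, 12))
node.sensors["memory"] = SensorDp(readings={"used_percent": 75.0}, connected=True)
```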


Development Testbed

  • 14 Linux boxes

  • 2 Windows boxes running the PVSS Distributed System

  • 1 Linux box used as the development platform, running DIM (sensors, actuators)

Diagram labels: LHCBPLUS, PC1, PC2, 3Com, DNS, Sensors/Actuators, PVSS Dist1, PVSS Dist7.


Display Architecture

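Diagram: a click event on an element opens the next panel in the hierarchy: Farm Display Panel (subfarms), SubFarm Display Panel (e.g. SubFarm_001), Node Display Panel (e.g. Node_001_12), Sensor Display Panel. An ssh terminal can also be opened. A missing service is indicated when the DP doesn't exist.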


Display

Screenshot labels: Main Display Panel, Nodes, Process list (on click).



Process Control

  • The basic mechanism to start and stop a process is ready (the DIM server publishes DIMCMD).

  • When a process is started by DIMCMD, an arbitrary Unique Thread Group Identifier (UTGID) is assigned to it. (No more than one process can be started with the same UTGID.)

  • The process can then be traced and killed using the UTGID commands.

  • The UTGID mechanism is implemented by setting an additional environment variable.
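
The idea of tagging a process through its environment can be sketched as follows. This is not the actual implementation: the variable name `UTGID`, the `UtgidRegistry` class and its methods are assumptions, and the bookkeeping here is an in-memory table rather than whatever the real commands use to trace processes.

```python
import os
import subprocess
import sys


class UtgidRegistry:
    """Tracks processes by UTGID; an in-memory stand-in for the real tools."""

    def __init__(self) -> None:
        self.processes = {}

    def start(self, utgid: str, argv: list) -> subprocess.Popen:
        """Start a process tagged with `utgid`; refuse a duplicate tag."""
        proc = self.processes.get(utgid)
        if proc is not None and proc.poll() is None:
            raise RuntimeError(f"UTGID {utgid!r} is already in use")
        env = dict(os.environ, UTGID=utgid)  # the tag travels in the environment
        self.processes[utgid] = subprocess.Popen(argv, env=env)
        return self.processes[utgid]

    def running(self) -> list:
        """Return the UTGIDs of the processes that are still alive."""
        return sorted(u for u, p in self.processes.items() if p.poll() is None)

    def kill(self, utgid: str) -> None:
        """Stop the process carrying `utgid` and forget it."""
        proc = self.processes.pop(utgid)
        proc.kill()
        proc.wait()


registry = UtgidRegistry()
sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
registry.start("worker_01", sleeper)
try:
    registry.start("worker_01", sleeper)   # same UTGID: rejected
except RuntimeError as error:
    duplicate_error = str(error)
running = registry.running()
registry.kill("worker_01")
```

Because child processes inherit the environment, anything the tagged process spawns carries the same identifier, which is one plausible reason for choosing an environment variable as the carrier.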


UTGID uStart

uStart: starts a process with a given UTGID.

uStart cannot start two processes with the same UTGID.


UTGID uLs, uKill

uLs: shows the processes that carry a UTGID.

uKill: stops a process by UTGID.

