
Detector Monitoring proposals (V. Dattilo for the EGO Operations Group)

VIRGO Collaboration Meeting, Cascina, Dec 5th-7th, 2005. Outline: Introduction, Current status, Proposals.


Presentation Transcript


  1. Detector Monitoring proposals (V. Dattilo for the EGO Operations Group) • Introduction • Current status • Proposals • VIRGO Collaboration Meeting, Cascina, Dec 5th-7th, 2005

  2. Introduction 1/2 “In any complex and sophisticated apparatus, scientific or industrial, an on-line status-monitoring system (for brevity: Detector Monitoring) is fundamental to the assessment of proper operation and to raise an alert in case of non-standard or potentially dangerous malfunctioning.” For this reason, it would also be particularly useful to the operator’s work. Notwithstanding the above statements and the work performed up to now, a complete on-line status-monitoring system is still not finalized in Virgo.

  3. Introduction 2/2 What do we mean by the on-line status-monitoring system in the case of Virgo? • A centralized system that displays and stores information related to the status and proper (or improper) functioning of: • Processes/servers/daemons • Machines (RIOs, WKs, Disks, other CPUs, …) • Control loops of the ITF • Working points of ITF items • Infrastructure plants (air conditioning, UPS, …) • … • When necessary, it raises an efficient and appropriate alert/alarm; • It provides a hyperlink from the event to the recovery procedure; • It is flexible for upgrades and easily re-configurable, according to the evolution of Virgo.

  4. Current status The work on the definition and implementation of monitoring tools started a few years ago and has been boosted during the last year. It has been carried out by a few groups (LAL, LAPP, EGO ITF Dept.). The outcome has been a set of monitoring tools, each specialized in monitoring one part of Virgo:

  5. Current status existing tools BIG BROTHER: monitoring of machines, servers and daemons (EGO Software Group)

  6. Current status existing tools ERROR LOGGER: detailed monitoring of servers (LAL Group)

  7. Current status existing tools Qc-MONI: monitoring of control loops and working points of sub-systems (the monitoring of servers is a recent feature, currently under implementation) (LAPP Group, with contributions by the EGO Operations Group for the configuration)

  8. Current status existing tools IMMS: monitoring of air conditioning plant, UPS, … (in progress) (EGO Software Group)

  9. Current status existing tools These 4 existing monitoring tools complement one another, with slight overlaps. [Diagram] Monitored items: Machines (RIOs, WKs, Disks, other CPUs, …) • Daemons • Servers • Control loops of the ITF • Working points of ITF items • Infrastructure plants (air conditioning, UPS, …). Tools overlaid: BigBrother.

  10. Current status existing tools These 4 existing monitoring tools complement one another, with slight overlaps. [Diagram] Monitored items: Machines (RIOs, WKs, Disks, other CPUs, …) • Daemons • Servers • Control loops of the ITF • Working points of ITF items • Infrastructure plants (air conditioning, UPS, …). Tools overlaid: BigBrother, Error Logger.

  11. Current status existing tools These 4 existing monitoring tools complement one another, with slight overlaps. [Diagram] Monitored items: Machines (RIOs, WKs, Disks, other CPUs, …) • Daemons • Servers • Control loops of the ITF • Working points of ITF items • Infrastructure plants (air conditioning, UPS, …). Tools overlaid: BigBrother, Error Logger, QcMoni.

  12. Current status existing tools These 4 existing monitoring tools complement one another, with slight overlaps. [Diagram] Monitored items: Machines (RIOs, WKs, Disks, other CPUs, …) • Daemons • Servers • Control loops of the ITF • Working points of ITF items • Infrastructure plants (air conditioning, UPS, …). Tools overlaid: BigBrother, Error Logger, QcMoni, IMMS.

  13. Current status recent work Recent work: • a hyperlink from event to recovery procedure has been implemented in QcMoni; • the syntax for criteria definition in QcMoni has been upgraded: it now also allows dynamic criteria (i.e. criteria depending on the locking/alignment state, when needed); • development of IMMS; • a warning level, in addition to the normal and error levels, has been added in QcMoni; • the QcMoni configurations of a few sub-systems have been completed; • the feature to detect a server state has been added to QcMoni (since this feature is based on the data sent by the server to the DAQ, it requires a modification of those servers not yet providing data to the DAQ).
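The dynamic criteria and the three levels (normal, warning, error) mentioned above can be illustrated with a minimal sketch. This is not the QcMoni criteria syntax, which is not shown in the slides; the `Band` structure, the state names and the numeric limits are all hypothetical and only show the idea of limits that depend on the interferometer state.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    NORMAL = 0
    WARNING = 1
    ERROR = 2


@dataclass(frozen=True)
class Band:
    """Symmetric limits around a nominal value (hypothetical structure)."""
    nominal: float
    warn: float   # |value - nominal| above this gives WARNING
    error: float  # |value - nominal| above this gives ERROR


def evaluate(value: float, bands: dict[str, Band], state: str) -> Level:
    """Select the band for the current state, then classify the value."""
    band = bands.get(state, bands["default"])
    deviation = abs(value - band.nominal)
    if deviation > band.error:
        return Level.ERROR
    if deviation > band.warn:
        return Level.WARNING
    return Level.NORMAL


# Hypothetical criterion: tighter limits once the interferometer is locked.
power_bands = {
    "default": Band(nominal=1.0, warn=0.5, error=0.8),
    "locked": Band(nominal=1.0, warn=0.1, error=0.2),
}

print(evaluate(1.3, power_bands, "unlocked").name)  # falls back to "default"
print(evaluate(1.3, power_bands, "locked").name)    # uses the tighter band
```

With these assumed numbers the same reading is acceptable while unlocked and an error once locked, which is the behaviour a state-dependent criterion is meant to capture.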

  14. Current status limitations Notwithstanding the work and efforts on the above-mentioned Virgo monitoring tools, they are not extensively used by operators as part of an on-line monitoring system. The main identified reasons for this are: • The lack of a centralized monitoring GUI scatters the operator’s attention; • The configurations of a few sub-systems are still missing or incomplete (a few sections to be added to QcMoni, not all the servers are declared to ErrorLogger, …); • In some cases, the threshold tuning is still not complete: consequently, the related flags/messages may denote an error even though the monitored item is in its standard state. This reduces the attention paid to error events; • In some cases, instructions defining the operator’s action for a given warning / error event are missing; • Because there is no efficient warning / error notification appropriate to the event gravity, the operator sometimes notices the displayed event late (or not at all).

  15. Proposals #1 After consultation with the Operators and preliminary discussions with a few of the people involved, the following proposals have been defined: Proposal #1 Centralization of the monitoring tools The option of designing and implementing a monitoring system from scratch, or a new one based on the existing ones, was quickly discarded, because of the large amount of work required and the wish to take advantage of the work already done on the existing tools. Since centralization at the level of the GUI and of the alarm notification is enough, one choice could be to use the QcMoni tool to rapidly provide operators with a single monitoring interface. This proposal requires implementing in BB, EL and IMMS the feature to export the information they generate to QcMoni (IMMS is already designed to forward data to the DAQ). Since QcMoni takes its information from the DAQ, this means that these tools should send data (SM data are enough) to the DAQ. cont.

  16. Proposals #1 • In this way, additional flags, denoting the status of machines, infrastructure plants, daemons and servers, will be added to the QcMoni interface. When one of these additional flags denotes a warning or an error, clicking on it should follow a hyperlink to the relevant page of the tool (BB, EL or IMMS) that generated the alert, where further information on the event is available. • This also implies: • QcMoni has to allow not only a hyperlink to a text file containing the recovery instructions, but also a hyperlink to a generic web page; • EL and IMMS should also provide a web interface (like BB). An example: 1st level of QcMoni Suspensions monitoring, CURRENTLY: Control loops • Working points

  17. Proposals #1 • In this way, additional flags, denoting the status of machines, infrastructure plants, daemons and servers, will be added to the QcMoni interface. When one of these additional flags denotes a warning or an error, clicking on it should follow a hyperlink to the relevant page of the tool (BB, EL or IMMS) that generated the alert, where further information on the event is available. • This also implies: • QcMoni has to allow not only a hyperlink to a text file containing the recovery instructions, but also a hyperlink to a generic web page; • EL and IMMS should also provide a web interface (like BB). An example: 1st level of QcMoni Suspensions monitoring, AFTER: machines • servers • Control loops • Working points, with the added flags PR_machines, PR_servers, BS_machines, BS_servers, NI_machines, NI_servers, NE_machines, NE_servers, WI_machines, WI_servers, WE_machines, WE_servers
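A minimal sketch of how the additional flags of Proposal #1 could be represented on the centralized interface, assuming each flag carries the tool that generated it and a page to open. The `Flag` structure and the URLs are hypothetical; only the flag names (PR_machines, PR_servers) and the tool abbreviations (BB, EL, IMMS) come from the slides.

```python
from dataclasses import dataclass

LEVELS = ("normal", "warning", "error")  # ordered by increasing severity


@dataclass(frozen=True)
class Flag:
    name: str    # e.g. "PR_servers"
    source: str  # tool that generated the status: "BB", "EL" or "IMMS"
    level: str   # one of LEVELS
    url: str     # hypothetical page of the source tool


def worst(flags):
    """Return the most severe level among the given flags."""
    return max((f.level for f in flags), key=LEVELS.index, default="normal")


def link_for(flag):
    """A flag is clickable only when it denotes a warning or an error."""
    return flag.url if flag.level != "normal" else None


# Hypothetical snapshot for the PR suspension.
pr_flags = [
    Flag("PR_machines", "BB", "normal", "https://example.invalid/bb/PR"),
    Flag("PR_servers", "EL", "error", "https://example.invalid/el/PR"),
]

print(worst(pr_flags))                      # summary level for the group
print([link_for(f) for f in pr_flags])      # None for the normal flag
```

Keeping the link inside the flag record means the central interface does not need to know anything about the internals of BB, EL or IMMS, which matches the choice of centralizing only the GUI and the notification.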

  18. Proposals #2 • Proposal #2 • Warning / alarm notification • A centralized warning / alarm notification of the events should be implemented. • Indeed, besides the display on the screen, other efficient channels should exist, such as a sound alarm or a message to ALP, depending on the event level and the recovery action required. Sending an SMS to the on-call person or an e-mail to the expert concerned can also be envisaged. • This is very important for 2 reasons: • during shifts: to prevent the operator from failing to notice an event in due time, which would delay the triggering of some action; • during nights and weekends: depending on the level of the event, to request some recovery action from ALP or from the on-call person. • The most critical alarms will also continue to be managed by the individual tools, with the proper notification policy.
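The notification idea of Proposal #2 can be sketched as a small routing function. The policy below is purely an assumption made for illustration (the slides do not define which channel applies to which level); the channel names are hypothetical labels for the screen display, sound alarm, message to ALP, SMS and e-mail mentioned above.

```python
def channels(level: str, operator_on_shift: bool) -> list[str]:
    """Hypothetical policy: which notification channels an event triggers."""
    if level == "normal":
        return []
    selected = ["screen"]
    if operator_on_shift:
        # During shifts: make sure the operator notices the event in time.
        selected.append("sound")
    else:
        # Nights and weekends: ask ALP to take action.
        selected.append("alp_message")
    if level == "error":
        selected.append("expert_email")
        if not operator_on_shift:
            selected.append("oncall_sms")
    return selected


print(channels("warning", operator_on_shift=True))
print(channels("error", operator_on_shift=False))
```

Expressing the policy as one function of the event level and the shift situation keeps it in a single place, so it can be retuned without touching the tools that generate the events.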

  19. Proposals #3 • Proposal #3 • Monitoring web interface • A good part of the evolution of the monitoring system is expected to concern web display and notification, driven also by the future needs that will arise from its use in the Control Room. • A strong and prompt interaction with the EGO Operations Group is therefore fundamental. • Involving the Operations Group directly in the work on that software should boost the evolution process (currently the group works only on the configuration files). • In that case, other new web technologies could be advantageously adopted to build the monitoring web interface, similarly to what has been done for the new Virgo logbook (more in Gary’s talk). This would also make it possible, for instance, to solve the still-present problem of the QcMoni web page refresh.

  20. Proposals misc. • Proposal (miscellaneous) • To implement a way to easily retrieve and consult the flag content from the stored data. Currently the bits associated with each flag are converted into a decimal number (up to 32 bits) and stored in the trend data. The inverse process is not straightforward… • Implementation of the hyperlink feature in those monitoring tools where it is not yet available (BB, EL, IMMS) • Sub-systems are invited to collaborate in the configuration of the relevant monitoring section, by providing criteria, thresholds and recovery action instructions to the Operations Group. • …. • Coffee after recovery action.
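The inverse process mentioned in the first miscellaneous proposal, going from the stored decimal number back to the individual flag bits, can be sketched as follows. The bit ordering (bit 0 is the first condition) and the condition names are assumptions; the slides only state that up to 32 bits are packed into one decimal number.

```python
FLAG_BITS = 32  # the slides state "up to 32 bits" per flag


def pack(bits):
    """Pack booleans into one integer; bits[0] becomes bit 0 (assumed order)."""
    if len(bits) > FLAG_BITS:
        raise ValueError("at most 32 bits per flag")
    return sum(1 << i for i, bit in enumerate(bits) if bit)


def unpack(value, width=FLAG_BITS):
    """Recover the individual bits from the stored decimal number."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return [bool((value >> i) & 1) for i in range(width)]


def active(value, names):
    """List the names of the conditions whose bit is set."""
    return [name for i, name in enumerate(names) if (value >> i) & 1]


# Hypothetical condition names for one flag.
names = ["loop_open", "saturation", "offset"]
stored = pack([True, False, True])

print(stored)
print(unpack(stored, width=3))
print(active(stored, names))
```

If the trend data store the number as a floating-point value, it would need to be converted to an integer first, for instance with `int(value)`; a 32-bit pattern is represented exactly by a double-precision float.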
