Commissioning

Commissioning. Post Mortem analysis of commissioning week 31/3/2008 – 4/4/2008. Monitoring Tasks. Currently 9 tasks Rich (U.Kerzel): 1 RichDAQMon Calo (O.Dechamps): 3 CaloDAQCalib, CaloDAQDisplay, CaloDAQMon L0Calo (O.Dechamps): 3 GlobalDAQMon, L0DUDAQMon, L0CaloDAQMon

Presentation Transcript


  1. Commissioning
     Post-mortem analysis of the commissioning week 31/3/2008 – 4/4/2008
     M.Frank, CERN/LHCb

  2. Monitoring Tasks
     • Currently 9 tasks:
        • Rich (U.Kerzel): 1 task (RichDAQMon)
        • Calo (O.Dechamps): 3 tasks (CaloDAQCalib, CaloDAQDisplay, CaloDAQMon)
        • L0Calo (O.Dechamps): 3 tasks (GlobalDAQMon, L0DUDAQMon, L0CaloDAQMon)
        • MUON (G.Graziani): 1 task (MuonDAQMon)
        • Online (B.Jost): 1 task (RawSizeONLMon)
     • To come:
        • L0Muon (J.Cogan): 1 task (L0MuonDAQMon)
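As a quick cross-check of the count on this slide, the task list can be held in a small per-area table. The task names and groupings come from the slide; the Python names (`MONITORING_TASKS`, `PLANNED_TASKS`, `count_tasks`) are hypothetical and only illustrate the bookkeeping.

```python
# Monitoring tasks from the slide, grouped by responsible area.
# The dict layout itself is an illustrative choice, not an existing config.
MONITORING_TASKS: dict[str, list[str]] = {
    "Rich": ["RichDAQMon"],
    "Calo": ["CaloDAQCalib", "CaloDAQDisplay", "CaloDAQMon"],
    "L0Calo": ["GlobalDAQMon", "L0DUDAQMon", "L0CaloDAQMon"],
    "MUON": ["MuonDAQMon"],
    "Online": ["RawSizeONLMon"],
}

# Announced on the slide but not yet deployed.
PLANNED_TASKS: dict[str, list[str]] = {
    "L0Muon": ["L0MuonDAQMon"],
}


def count_tasks(tasks: dict[str, list[str]]) -> int:
    """Return the total number of tasks summed over all areas."""
    return sum(len(names) for names in tasks.values())


if __name__ == "__main__":
    print(count_tasks(MONITORING_TASKS))  # 9
    print(count_tasks(MONITORING_TASKS) + count_tasks(PLANNED_TASKS))  # 10
```

Keying the table by area also makes the open question from the next slide visible: the keys would have to change from "Calo" to a specific subdetector such as "ECAL" once tasks are assigned.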

  3. General Problems of Monitoring Tasks
     • No common task execution environment
        • Setup is done with a “frozen” CMT setup script
        • Nearly every task runs its own script (at least each subdetector does)
     • The conditions database using SQLite requires a “real” temporary disk
        • No NFS mounts of /tmp or /var/tmp
        • Requires a RAM disk
     • Tasks still need to be assigned to a specific subdetector
        • e.g. “ECAL” rather than “CALO”
        • Needs to be sorted out
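Since the SQLite conditions database must not sit on an NFS-mounted /tmp or /var/tmp, a node could check where its temporary directory actually lives before starting a task. The sketch below assumes a Linux node and the standard /proc/mounts line format; the function names are hypothetical and the "network" set is limited to NFS, as on the slide.

```python
import os

# Filesystem types treated as network mounts (assumption: NFS only, as on the slide).
NETWORK_FS_TYPES = {"nfs", "nfs4"}


def fs_type_for_path(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains *path*.

    *mounts_text* is expected in the Linux /proc/mounts line format:
    "device mountpoint fstype options dump pass". Escaped characters in
    mount points (e.g. \\040 for a space) are not decoded in this sketch.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # The mount covers the path if it is the path itself or a parent directory.
        covers = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # ">=" lets a later entry for the same mount point win, as over-mounting does.
        if covers and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_local_scratch(path: str, mounts_text: str) -> bool:
    """True if *path* lives on a known, non-network filesystem (e.g. tmpfs)."""
    fs_type = fs_type_for_path(path, mounts_text)
    return fs_type != "" and fs_type not in NETWORK_FS_TYPES
```

On a real node the mount table would be read with `open("/proc/mounts").read()`; a RAM disk shows up as `tmpfs` and passes the check, while an NFS-mounted /tmp fails it.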

  4. Storage
     • 1 hiccup on Monday before the start
        • Maybe due to some debugging during the previous week
     • Problem: store02::writerd sometimes needs a restart
        • Under investigation
        • May take some time to resolve (occurrence ~1/week)
     • If disks disappear, the storage is unhappy
        • Requires a complete restart (including manual intervention on store02)
     • Behaved pretty well throughout the entire week
        • Including dynamic partitioning (RICH1, RICH2, HCAL, TRG, MUON)

  5. RunInfo Datapoint
     • After every upgrade of the RunInfo datapoint, specific configurations disappear
     • The RunInfo DP has its own instance (and definition) in each partition (different PVSS system)
        • Monitoring tasks
        • Storage configuration
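One way to catch configurations that vanish after a RunInfo upgrade is to snapshot each partition's datapoint before and after, then diff the two. The sketch below assumes the snapshots have already been exported as plain name-to-value dictionaries; it does not use any PVSS API, and the element names in the usage example are hypothetical.

```python
# Hypothetical snapshot: datapoint element name -> value (both as strings).
Snapshot = dict[str, str]


def diff_snapshot(before: Snapshot, after: Snapshot) -> dict[str, list[str]]:
    """List elements that vanished or changed value between two snapshots."""
    lost = sorted(k for k in before if k not in after)
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    return {"lost": lost, "changed": changed}


def audit_partitions(
    before: dict[str, Snapshot], after: dict[str, Snapshot]
) -> dict[str, dict[str, list[str]]]:
    """Diff each partition's RunInfo snapshot; report only partitions with differences."""
    report: dict[str, dict[str, list[str]]] = {}
    for partition, old in sorted(before.items()):
        # A partition missing entirely after the upgrade counts as everything lost.
        diff = diff_snapshot(old, after.get(partition, {}))
        if diff["lost"] or diff["changed"]:
            report[partition] = diff
    return report
```

Because each partition is a separate PVSS system with its own RunInfo instance, the audit loops over partitions rather than assuming a single shared definition. For example, a RICH1 snapshot that loses a hypothetical `MonTasks` element is reported as `{"RICH1": {"lost": ["MonTasks"], "changed": []}}`.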

  6. Booting of Nodes
     • No verification procedure that a node has booted properly
        • Each reboot needs manual re-configuration
     • Boot startup tasks start properly only on controls PCs
        • PVSS projects are started
        • FSM does not always start properly; sub-trees sometimes stay dead
     • On Farm/Monitoring/Storage nodes the boot startup does not always work
        • Requires the Controls PC to be up and running (task manager)
        • Needs investigation
        • Tasks seem to be restarted regularly
     • FMC task manager inconsistencies on the Controls PC
        • During boot all tasks are started properly
        • Starting tasks later fails or makes tmSrv hang
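The missing boot verification could start as a simple comparison of the tasks each node is expected to run against what it reports as running. This is a sketch under stated assumptions: how the running-task lists are collected is left open (it is not tied to the FMC task manager's interface), and the role names, node names, and the `MonTask` process in the example are hypothetical.

```python
# Placeholder recorded for nodes that reported no task list at all.
NO_REPORT = "<no report: node down or unreachable>"


def verify_boot(
    expected_by_role: dict[str, set[str]],
    node_roles: dict[str, str],
    running_by_node: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Return, per node, the expected tasks that are not running.

    Nodes that are fully up do not appear in the result.
    """
    problems: dict[str, list[str]] = {}
    for node, role in sorted(node_roles.items()):
        running = running_by_node.get(node)
        if running is None:
            # No report at all: treat the node as not booted.
            problems[node] = [NO_REPORT]
            continue
        missing = sorted(expected_by_role.get(role, set()) - running)
        if missing:
            problems[node] = missing
    return problems
```

An empty result would mean every node came up with its full task set; anything else lists exactly which node needs the manual re-configuration the slide describes.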

  7. General Observations for Farm/Monitoring/Storage Operation
     • We are flying completely blindfolded
        • If things work, all is fine
        • If they don't, it is difficult to find out why
     • There are no tools that give coherent diagnostics/reports of what is going wrong in a few panels/windows
        • Yes, there is the logViewer: many messages, one per subfarm
        • mbm utilities to monitor subfarms, monitoring, storage, nodes
        • Still, for expert use only
     • Nothing allows simple, in-depth investigation if anything fails
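A first step toward a coherent overview could be to condense per-subfarm log messages into counts by severity and rank subfarms by how badly they are doing. The line format `<subfarm> <node> <severity> <text>` is a hypothetical assumption, not the logViewer's actual output, and the subfarm and node names in the test data are invented.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable

# Most severe first; used to rank subfarms.
SEVERITY_ORDER = ["FATAL", "ERROR", "WARNING", "INFO"]


def summarize(lines: Iterable[str]) -> dict[str, dict[str, int]]:
    """Count messages per subfarm and severity.

    Assumed (hypothetical) line format: "<subfarm> <node> <severity> <text>".
    """
    counts = defaultdict(Counter)
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        subfarm, severity = parts[0], parts[2].upper()
        counts[subfarm][severity] += 1
    return {subfarm: dict(c) for subfarm, c in counts.items()}


def worst_first(summary: dict[str, dict[str, int]]) -> list[str]:
    """Order subfarms by FATAL count, then ERROR, WARNING, INFO (all descending)."""

    def rank(item: tuple[str, dict[str, int]]) -> tuple:
        subfarm, c = item
        return tuple(-c.get(s, 0) for s in SEVERITY_ORDER) + (subfarm,)

    return [subfarm for subfarm, _ in sorted(summary.items(), key=rank)]
```

Ranking lexicographically by severity means a single FATAL outranks any number of ERRORs, which puts the subfarm most likely to need an expert at the top of one short list instead of scattered across many message streams.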