
Availability Reports based on Nagios James Casey, David Collados


Presentation Transcript


  1. Availability Reports based on Nagios James Casey, David Collados

  2. Recap of March reports
     • March availability reports for WLCG
       • Official SAM reports: http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/201003/
       • Unofficial Nagios report: http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/201003-nagios/
     • 171 sites in the report
       • Sites whose availability increased in Nagios by ≥ 10%: 8 (4.7%)
       • Sites whose availability decreased in Nagios by ≥ 10%: 16 (9.4%)
       • Sites whose availability changed in Nagios by ≥ 10%: 24 (14.0%)
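The counts on this slide can be reproduced with a short comparison of the two reports. The following is a minimal sketch, not the GridView implementation: the `SiteAvailability` record and the `summarize_changes` function are hypothetical names, and it assumes that "≥ 10%" means a difference of at least 10 percentage points between the monthly SAM and Nagios availability figures.

```python
from dataclasses import dataclass


@dataclass
class SiteAvailability:
    """Hypothetical record: one site's monthly availability in percent."""
    name: str
    sam: float
    nagios: float


def summarize_changes(sites, threshold=10.0):
    """Count sites whose Nagios availability differs from SAM by >= threshold.

    Assumption: the threshold is in percentage points, not a relative change.
    Returns a dict mapping each category to (count, share of all sites in %).
    """
    total = len(sites)
    increased = sum(1 for s in sites if s.nagios - s.sam >= threshold)
    decreased = sum(1 for s in sites if s.sam - s.nagios >= threshold)

    def share(count):
        # Share of all sites, rounded to one decimal as on the slide.
        return round(100.0 * count / total, 1) if total else 0.0

    changed = increased + decreased
    return {
        "increased": (increased, share(increased)),
        "decreased": (decreased, share(decreased)),
        "changed": (changed, share(changed)),
    }
```

With 171 sites, of which 8 increased and 16 decreased by at least 10 points, this yields the 4.7%, 9.4% and 14.0% shares quoted above.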

  3. April – WLCG reports
     • April availability reports for WLCG
       • Official SAM reports: http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/201004/wlcg/
       • Unofficial Nagios report: http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/201004-nagios/wlcg/
     • 171 sites in the report
       • Sites whose availability increased in Nagios by ≥ 10%: 5 (2.9%)
       • Sites whose availability decreased in Nagios by ≥ 10%: 3 (1.8%)
       • Sites whose availability changed in Nagios by ≥ 10%: 8 (4.7%)

  4. April – WLCG reports
     • Sites whose availability decreased in Nagios by ≥ 10%: 3 (1.8%)
       • VICTORIA-LCG2: fails org.gstat.SanityCheck
       • GSI-LCG2: lfc-mkdir fails on WN during org.sam.WN-Rep test
       • NO-NORGRID-T2: fails org.bdii.Entries test
     • Sites whose availability increased in Nagios by ≥ 10%: 5 (2.9%)
       • UNI-FREIBURG: lcg-cp connection times out in SAM srm-put test
       • Wuppertalprod: failed sBDII-performance test in SAM (now fixed)
       • T2_Estonia: fails ce-sft-lcg-rm-cr test in SAM
       • RRC-KI: fails ce-sft-lcg-rm test in SAM (SE not defined in BDII)
       • UKI-SOUTHGRID-BRIS-HEP: sBDII sanity test times out in SAM
     • All failures attributed to grid site problems

  5. April – WLCG reports
     • IT ROC evaluation of availability based on SAM vs Nagios
     • Nagios availability is higher because it responds differently to a failure:
       • In SAM, each failure is counted
       • Nagios has soft states and increases the frequency of checks when there are failures
       • sBDII tests are executed from ROC Nagios instances instead of GStat in Taiwan
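The effect of soft states on the reported availability can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the actual SAM or Nagios availability computation: the function names are hypothetical, all check results are weighted equally (ignoring the shorter retry interval Nagios uses after a failure), and a failure is reported only once it becomes a hard state after `max_check_attempts` consecutive failed checks, with `3` chosen as an illustrative value.

```python
def sam_style_status(results):
    """SAM-like accounting: every failed check result counts as a failure."""
    return list(results)


def nagios_style_status(results, max_check_attempts=3):
    """Nagios-like accounting with soft states (simplified).

    A failed check first puts the service in a SOFT state, during which the
    last hard state (OK) is still reported. Only after max_check_attempts
    consecutive failures does the state become HARD and count as a failure.
    A single OK result resets the counter.
    """
    reported = []
    consecutive_failures = 0
    for ok in results:
        if ok:
            consecutive_failures = 0
            reported.append(True)
        else:
            consecutive_failures += 1
            # Still reported as OK while the failure is only a soft state.
            reported.append(consecutive_failures < max_check_attempts)
    return reported


def availability(statuses):
    """Percentage of reported statuses that are OK."""
    return 100.0 * sum(statuses) / len(statuses)
```

For a sequence with one isolated failure and one run of three failures in ten checks, the SAM-style accounting counts four failures while the Nagios-style accounting counts one, which is the direction of the difference observed by the IT ROC.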

  6. May Reports - Summary
     • We consider the Nagios infrastructure ready to replace SAM OPS submissions
     • Proposal: generate the official OPS reports for May based on Nagios
       • SAM OPS tests keep running during May
     • Proposal: stop SAM OPS submissions on 15th June, after computation and validation of the May reports
