FAX update

  1. FAX update, 26th August 2013

  2. Content
     • Running issues
     • FAX failover
     • Moving to new AMQ server
     • Informing on endpoint status
     • Monitoring developments
     • Monitoring validation
     • dCache monitor 5.0.0
     • Collector
     • Dashboard
     • 50 shades of green
     Ilija Vukotic ivukotic@uchicago.edu

  3. Running issues
     • Dead endpoints:
       • Frascati, Manchester, LAL
     • cmsd services are dead at:
       • Taiwan-lcg2, LPSC, Protvino, SWT2_CPB
     • /atlas/dq2/user/gangarbtlookups
       • Made half of the federation endpoints inaccessible from upstream redirectors.
       • Johannes will explain this in more detail.
     • Remaining issues with x509
       • Communicating our wish to get it turned on
       • BU, DESY-HH, DESY-ZN, FZK, LRZ-LMU, MPPMU, Freiburg, Wuppertal, Geogrid

  4. Running issues
     • Rather green, considering it’s August!
     • Quite a bit of traffic, considering it’s August!
     • As far as I know, the new functional HC tests should not contribute much.

  5. FAX failover
     • FAX failover works: http://pandamon.cern.ch/fax/failover
     • Developments:
       • Cloud is now shown and queue names are corrected
       • Side menu
     • In the works:
       • Filtering on user
       • Graphing
     • To ponder:
       • Site admins are not aware of this possibility. How do we communicate to them that it is in their best interest to turn it on?

  6. FAX failover
     • Production jobs failing over to FAX
     • FAX dedicated submenu
     • PANDA-brokered job statistics will be added here

  7. Moving to new AMQ server
     • All FAX-related info was sent to pilot.msg.cern.ch
       • There was no authentication
     • Moved to the Dashboard test broker
       • Consumer now uses STOMP+SSL
       • Required a change to the new stomp version
     • This week we will move to the production server
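The move to STOMP over SSL on slide 7 can be illustrated with the Python standard library alone. This is only a sketch under stated assumptions, not the consumer actually used: the helper names `stomp_frame` and `open_stomp_ssl`, the client-certificate authentication, and the topic name are all hypothetical, and the real consumer presumably relies on a STOMP client library.

```python
import socket
import ssl


def stomp_frame(command, headers=None, body=""):
    """Serialize one STOMP frame: command, header lines, blank line, body, NUL.

    Header values are not escaped here, so this sketch is only suitable
    for simple values without colons or newlines.
    """
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append(f"{key}:{value}")
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def open_stomp_ssl(host, port, cert_file, key_file):
    """Open a TLS connection and send a STOMP 1.1 CONNECT frame.

    Assumption: the broker authenticates clients by certificate.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    raw = socket.create_connection((host, port), timeout=10)
    conn = context.wrap_socket(raw, server_hostname=host)
    conn.sendall(stomp_frame("CONNECT", {"accept-version": "1.1", "host": host}))
    return conn


# Hypothetical subscription frame; the destination name is made up.
subscribe = stomp_frame(
    "SUBSCRIBE",
    {"id": "0", "destination": "/topic/example.fax", "ack": "auto"},
)
```

Building the frames separately from the socket code keeps the protocol part easy to check without a broker.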

  8. Informing on endpoint status
     • Mailing from SSB works and gives results.
     • Do we want SAM updates too?
       • What would it take?
       • Who would do it?

  9. Monitoring developments
     • There is a need to remotely check whether cmsd works.
       • We had (and still have) sites showing as green for direct access and red for downstream redirection.
       • Investigation shows that the cmsd services are actually dead or not responding.
       • We need a way to probe the cmsd services directly.
       • Andy will look at ways to do it.
     • New columns to develop for SSB:
       • xRootD version
       • Rucio support
       • Monitoring status
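As a rough illustration of the "is anything answering at all" part of slide 9, the sketch below tries a plain TCP connection to a given host and port. It is an assumption, not the probe Andy is looking into: a successful connect only shows that some process is listening, not that the cmsd takes part in redirection correctly. The function name `port_answers` is hypothetical, and the port must be supplied by the caller.

```python
import socket


def port_answers(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established.

    This distinguishes a dead service from a listening one, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A dead service would show up as `False`; a hung but still listening service would not, which is why a protocol-level probe is still needed.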

  10. Monitoring validation
      • The first step is to validate that the results shown by Matevz’s collector are correct.
      • I was sending xrootd summary messages to the collector and checking what I see in the plots. The messages arrive and get shown, but something is wrong in how the summaries are calculated or plotted.

  12. dCache monitor 5.0.0
      • dCache monitor mostly rewritten:
        • dCache compatible logging
        • UDP messaging from same ports
        • Sends “=” stream
        • Sends more data (substitutes DN \CN with username etc.)
        • Made compatible with collector
      • Tested at MWT2. Very good results.
      • At the end of the week, an RPM will be produced and placed in the WLCG repository. CMS will be informed about the new version.

  13. Collector
      • New version being prepared by Matevz
        • New AMQ version
      • BIG ISSUE:
        • Some CMS sites are sending info to our collector. This will be raised with Brian B.

  14. dCache monitor 5.0.0
      • Now gives really important and actionable information. Just during debugging I noticed:
        • Files opened, a small percentage read, and then kept open for hours.
        • The same file opened twice in the same session (?!)
        • Rather little use of vector reads.

  15. In dashboard
      • Why is there a difference between the table and the plots?
      • What is the idea of the “Site history” tab?
      • Need to investigate why CMS sites appear here (CERN-CMSTEST)

  16. PANDA re-brokering
      • Discussed at the last CERN S&C week
      • We agreed to provide PANDA with an estimate of the cost of moving data over the WAN, so that it could re-broker jobs from very long queues to sites with free slots that have a good connection to the data.
      • The cost matrix exists in SSB.
      • Code that reads it from SSB and applies exponential decay smoothing is running and sends the info to AGIS.
      • Have to check the scalability of the AGIS bulk update.
      • Waiting for Artem to code moving the data from AGIS to schedconfig.
      • The next step is for Tadashi to make use of that table from schedconfig and actually re-broker.
      • Finally, we’ll have to monitor it the same way we do with Failover.
      • No developments
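The "exponential decay smoothing" of the cost matrix mentioned on slide 16 could look roughly like the sketch below. The slides do not give the formula or its parameters, so the standard exponential moving average, the weight `alpha`, the function name `smooth_costs`, and the numeric values are all assumptions; the site names are taken from earlier slides purely as labels.

```python
def smooth_costs(previous, latest, alpha=0.3):
    """Blend new cost measurements into the running smoothed values.

    previous, latest: dicts mapping (source, destination) to a cost.
    alpha: weight of the newest measurement (assumed value, 0 < alpha <= 1).
    """
    smoothed = dict(previous)
    for link, value in latest.items():
        if link in smoothed:
            # Older measurements decay by a factor (1 - alpha) per update.
            smoothed[link] = alpha * value + (1 - alpha) * smoothed[link]
        else:
            # First measurement for this link: nothing to smooth against.
            smoothed[link] = value
    return smoothed


# Made-up numbers for illustration only.
history = {("MWT2", "SWT2_CPB"): 10.0}
update = {("MWT2", "SWT2_CPB"): 20.0, ("MWT2", "LRZ-LMU"): 5.0}
result = smooth_costs(history, update, alpha=0.5)
```

With `alpha=0.5` the existing link moves halfway from 10.0 towards 20.0, and the new link is taken as measured.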

  17. 50 shades of green
      • Green color in any of the FAX SSB monitor metrics is based on one and the same file.
        • This involves a lot of cached information.
      • Need to find out the percentage of successfully obtained files from a much larger file pool while avoiding caching effects.
      • Simple code developed to test all endpoints having FDR datasets. It does _file0->ls() on each of the ~800 files, sequentially.
        • Currently run by hand.
        • You may find it in FAXtools/FAXtestsFDR of our CERN FAX git repo.
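The per-endpoint success percentage described on slide 17 can be sketched as follows. This is not the code in FAXtools/FAXtestsFDR: the function name `success_rates`, the injected `can_open` probe, and the example URLs are hypothetical. In practice the probe would wrap whatever actually opens the file remotely; here it is a stub so the counting logic can be shown on its own.

```python
def success_rates(urls_by_endpoint, can_open):
    """Return the percentage of successfully opened files per endpoint.

    urls_by_endpoint: dict mapping endpoint name to a list of file URLs.
    can_open: callable taking a URL and returning True on success.
    Files are tried one after another, as on the slide.
    """
    rates = {}
    for endpoint, urls in urls_by_endpoint.items():
        if not urls:
            continue  # no test files known for this endpoint
        opened = sum(1 for url in urls if can_open(url))
        rates[endpoint] = 100.0 * opened / len(urls)
    return rates


# Made-up endpoints and files; one file is pretended to be unreadable.
files = {
    "endpoint-a": [f"root://endpoint-a//fdr/file{i}" for i in range(2)],
    "endpoint-b": [f"root://endpoint-b//fdr/file{i}" for i in range(4)],
}
broken = {"root://endpoint-b//fdr/file3"}
rates = success_rates(files, lambda url: url not in broken)
```

A percentage per endpoint gives the graded "shades" that a single-file green/red test cannot.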
