
Monitoring of LHC Computing Activities

Get key results and recent developments on data transfer monitoring, job monitoring, and monitoring of sites and services for the LHC computing activities.

Presentation Transcript


  1. Monitoring of the LHC Computing Activities
     Key Results from the Services for HUC (SA3)
     EGI Technical Forum 2011
     J. Andreeva, M. Cinquilli, P. Dhara, E. Karavakis (CERN & SA3), P. Karhula, M. Kenyon, L. Kokoszkiewicz, E. Lanciotti, M. Nowotka, G. Ro, P. Saiz, L. Sargsyan, D. Tuckett
     CERN IT-ES

  2. Outline
     • Importance of monitoring
     • Experiment Dashboard
     • Key Results and Recent Development on:
       • Data Transfer Monitoring
       • Job Monitoring
       • Monitoring of Sites and Services
     • Summary
     Monitoring of the LHC Computing Activities 2 22/09/2011

  3. Importance of monitoring
     • WLCG integrates more than 140 computing centres in 35 countries
     • Reliable monitoring is complicated due to the diversity of the infrastructure
     • Powerful and flexible monitoring systems are required in order to maintain and improve a highly distributed system
     • Monitoring the computing activities for a given VO is essential in order to estimate the quality of the infrastructure and to detect any problems or inefficiencies

  4. Experiment Dashboard
     • Not coupled to a specific Workload or Data Management System
     • Covers the full range of the experiments’ computing activities:
       • Job Monitoring, Data Transfers, Sites and Services
     • Provides common solutions focused on different user categories
     • Heavily used by the four main LHC experiments
     • More than 4000 unique visitors monthly just for CMS
     • Can be easily adapted to the needs of new VOs, but each VO must decide what it wishes to monitor and implement or extend the monitoring system accordingly

  5. Dashboard for Monitoring the Computing Activities of the LHC
     [Diagram labels: Data transfer; Data access; WLCG GoogleEarth Dashboard; Analysis + Production; Real time and Historical Views; Site Status Board; Site usability; SiteView]

  6. Recent Development on Data Management Monitoring
     • Monitors dataset and file movement
     • Used 24/7 by shifters to identify failures and alert sites
     • ~1k unique visitors monthly / ~15k page views daily
     • New ATLAS DDM Dashboard UI provides an interactive matrix and high-quality plots of transfer statistics with flexible filtering and grouping
     • Implemented with AJAX / jQuery

  7. Recent Development on Data Transfers Monitoring
     • Currently there is no tool that can provide an overall view of data transfers at WLCG scope (across LHC experiments, across the various technologies used, for example FTS and xrootd, across multiple local FTS instances, etc.)
     • A prototype WLCG Transfer Dashboard consumes FTS transfer events, generates statistics and exposes the data via a generic version of the DDM Dashboard user interface. Initially for ATLAS, CMS and LHCb; support for other file transfer protocols is planned
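The "consume events, generate statistics" step described on this slide can be illustrated with a minimal Python sketch. It is not the dashboard's actual code: the `TransferEvent` fields, the `build_matrix` and `efficiency` helpers and the site names are all hypothetical, chosen only to show how per-transfer events might be folded into a source/destination matrix of the kind the DDM Dashboard UI displays.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEvent:
    """Hypothetical shape of one transfer event (not the real FTS message format)."""
    vo: str
    source: str
    destination: str
    succeeded: bool
    bytes_moved: int


def build_matrix(events, vo=None):
    """Aggregate events into {(source, destination): counters}, optionally for one VO."""
    cells = defaultdict(lambda: {"ok": 0, "failed": 0, "bytes": 0})
    for ev in events:
        if vo is not None and ev.vo != vo:
            continue
        cell = cells[(ev.source, ev.destination)]
        if ev.succeeded:
            cell["ok"] += 1
            cell["bytes"] += ev.bytes_moved
        else:
            cell["failed"] += 1
    return dict(cells)


def efficiency(cell):
    """Fraction of successful transfers in a matrix cell, or None if it is empty."""
    total = cell["ok"] + cell["failed"]
    return cell["ok"] / total if total else None


# Made-up events for illustration only.
events = [
    TransferEvent("atlas", "SITE_A", "SITE_B", True, 1000),
    TransferEvent("atlas", "SITE_A", "SITE_B", False, 0),
    TransferEvent("cms", "SITE_A", "SITE_C", True, 500),
]
matrix = build_matrix(events)
```

Keeping the aggregation independent of where the events come from is one way to leave room for other transfer technologies later, which matches the stated plan to support protocols beyond FTS.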

  8. Recent Development on Job Monitoring
     • Aimed at different types of users: individual scientists, user support teams, site admins and VO managers
     • Works transparently across different middleware, submission methods and execution backends. Used heavily within CMS and ATLAS
     • Common DB schema and common applications
     • Improvements to the information collectors for job monitoring data
     • Speed improvements and optimisations for all the different user interfaces
     • Added functionality and flexibility to the Historical Views job accounting application
     • New versions of User Analysis Task Monitoring and Production Task Monitoring using a common framework (hbrowse) implemented in jQuery

  9. User Analysis Task Monitoring
     • User / User-support perspective with a wide selection of plots
     • ATLAS version in production, based on ‘hbrowse’ (also used in ganga/diane monitoring)
     • Will be adopted by CMS as well

  10. Job Summary & Historical Views
      Job Summary
      • Shifter, Expert, Site perspective
      • Real time job metrics by site, activity, …
      • Significant code refactoring and speed improvements
      Historical Views
      • Site, Management perspective
      • Job metrics as a function of time
      • Significant code refactoring
      • Added flexibility: 8 filtering options and 11 group-by options
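To make "job metrics as a function of time" with filtering and grouping concrete, here is a small Python sketch. It only illustrates the general idea and assumes a hypothetical job record layout (a dict with `site`, `activity`, `status` and `finished` keys); the function name `historical_view` and its parameters are invented for this example and do not come from the Dashboard code.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def historical_view(jobs, group_by, bin_size=timedelta(hours=1), **filters):
    """Count jobs per (time bin, group value), keeping only jobs that match all filters."""
    counts = Counter()
    for job in jobs:
        # Skip jobs that do not match every requested filter value.
        if any(job.get(key) != value for key, value in filters.items()):
            continue
        # Floor the finish time to the start of its time bin.
        bin_index = (job["finished"] - EPOCH) // bin_size
        bin_start = EPOCH + bin_index * bin_size
        counts[(bin_start, job[group_by])] += 1
    return counts


# Made-up job records for illustration only.
jobs = [
    {"site": "SITE_A", "activity": "analysis", "status": "done",
     "finished": datetime(2011, 9, 22, 10, 15)},
    {"site": "SITE_A", "activity": "production", "status": "failed",
     "finished": datetime(2011, 9, 22, 10, 45)},
    {"site": "SITE_B", "activity": "analysis", "status": "done",
     "finished": datetime(2011, 9, 22, 11, 5)},
]

# Successful jobs per hour, grouped by site.
view = historical_view(jobs, group_by="site", status="done")
```

Passing the grouping attribute and the filters as parameters, rather than hard-coding them, is what lets a single function serve several views of the same job data.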

  11. Recent Development on Monitoring of Sites and Services
      • Site Usability Monitoring
        • An interface to the Nagios tests used by the LHC VOs for the validation of sites and services
        • Collaboration with SAM, Nagios and Grid View teams. Strong contribution from BARC in India
        • Under validation by ATLAS and CMS

  12. Recent Development on Site Commissioning (cont.)
      • Site Status Board
        • Used by ATLAS and CMS for distributed computing shifts and for site commissioning
        • Presents the status of all sites in a VO
        • According to VO-defined metrics
        • Easy to add/combine metrics
        • Different views (shifter, site commissioning, transfers...)
        • More than 200 metrics were added by the LHC VOs
        • Alarms and error monitoring added to the collectors
        • Many improvements to the database layout and better graphics at the UI level using jQuery and Highcharts
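As an illustration of combining VO-defined metrics into a per-site status for a given view, the following Python sketch uses a simple "worst status wins" rule. This rule, the status names, the metric names and the helper functions are all assumptions made for the example; the slides do not say how the Site Status Board actually combines metrics.

```python
# Assumed status levels, ordered from best to worst (hypothetical).
SEVERITY = {"ok": 0, "warning": 1, "error": 2}


def combine(statuses):
    """Combine metric statuses with a worst-status-wins rule; 'unknown' if none given."""
    if not statuses:
        return "unknown"
    return max(statuses, key=lambda status: SEVERITY[status])


def site_board(metrics, view):
    """Build {site: combined status} from the metrics selected by a view.

    metrics: {site: {metric_name: status}}
    view:    list of metric names that the view takes into account
    """
    board = {}
    for site, values in metrics.items():
        selected = [values[name] for name in view if name in values]
        board[site] = combine(selected)
    return board


# Made-up metric values for illustration only.
metrics = {
    "SITE_A": {"job_success": "ok", "transfer_quality": "warning"},
    "SITE_B": {"job_success": "error", "transfer_quality": "ok"},
    "SITE_C": {},
}

shifter_view = site_board(metrics, ["job_success", "transfer_quality"])
transfers_view = site_board(metrics, ["transfer_quality"])
```

Expressing a view as nothing more than a list of metric names keeps it cheap to define new views or add metrics without touching the combination logic.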

  13. Site Status Board

  14. Recent Development on Publicity and Dissemination
      • WLCG Google Earth Dashboard: global, cross-VO, real-time view of the LHC computing activities
      • Improved stability of the collectors

  15. Common Monitoring Solutions

  16. Summary
      • Many recent improvements to our monitoring apps due to:
        • Common framework leveraging the latest web technologies
          • Applications are built on the same framework to reduce development and maintenance overhead
        • Loose coupling to data sources, adding flexibility to the system
          • Applications can be easily adapted to different data sources within a VO or to different VOs
          • UI is agnostic of the data storage implementation
