Using xrootd monitoring flows in various contexts


Presentation Transcript


  1. Using xrootd monitoring flows in various contexts Artem Petrosyan, Sergey Belov, Danila Oleynik, Sergey Mitsyn, Julia Andreeva

  2. Use cases for xrootd monitoring • Integration of xrootd transfers into the WLCG transfer Dashboard, which currently tracks only FTS transfers: http://dashb-wlcg-transfers.cern.ch/ui/ • Data popularity • Xrootd federation monitoring

  3. Data flow for the WLCG data transfer Dashboard and Popularity application • For the first two use cases, monitoring reports for accomplished transfers generated from the detailed xrootd data flow are enough. • Per-file monitoring reports are generated by the UCSD collector, which consumes UDP packets from the xrootd servers (detailed data flow) • A separate component consumes the generated per-file reports (UDP) and publishes them to ActiveMQ • This info is then consumed by the Dashboard and Popularity collectors
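
For illustration, a per-file transfer report published to the message bus could carry roughly the fields listed later on slide 10. This is only a sketch of the message shape; the field names, types and values below are assumptions, not the production schema.

```python
# Illustrative shape of one per-file transfer report as it might be published
# to ActiveMQ. All field names and values here are assumptions for illustration.
report = {
    'domain_from': 'cern.ch',  'host_from': 'xrootd01.cern.ch', 'ip_from': '128.142.0.1',
    'domain_to':   'jinr.ru',  'host_to':   'wn042.jinr.ru',    'ip_to':   '159.93.0.2',
    'user':          'atlas001',
    'file':          '/atlas/data/file.root',
    'size':          1234567890,   # bytes
    'bytes_read':    456789012,
    'bytes_written': 0,
    'started':       '2012-06-21T10:15:00Z',
    'finished':      '2012-06-21T10:32:10Z',
}
```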

  4. Data flow for the WLCG data transfer Dashboard and Popularity application [Diagram: xrootd servers send UDP packets to the UCSD collector; the ActiveMQ publisher forwards per-file reports to the ActiveMQ message broker at CERN, from which the Dashboard DB and the Popularity DB are filled]

  5. Data flow for the WLCG data transfer Dashboard and Popularity application. Possible improvement [Diagram: xrootd servers send UDP packets to the UCSD collector, which publishes directly to the ActiveMQ message broker at CERN without the separate ActiveMQ publisher; the Dashboard DB and the Popularity DB are filled from the broker]

  6. Federation monitoring • This work is a spin-off of the ATLAS T3Mon project. It is being performed by the JINR (Dubna) team, whose monitoring task was agreed with the xrootd consortium. The architecture described on the following slides has been presented at the xrootd meeting in Lyon, at several ATLAS meetings and at the CHEP WLCG workshop • The current implementation uses only monitoring information about accomplished transfers retrieved from the detailed flow. • The UI is similar to the UIs of the ATLAS DDM Dashboard and the WLCG Transfer Dashboard and is therefore familiar to the ATLAS community • First prototype running on a simulated data flow. Take a look: • New URL: http://xrdfedmon-dev.jinr.ru/ui//#date.from=201206210000&date.interval=0&date.to=201206220000&grouping.dst=%28host%29&grouping.src=%28site%29&m.content=%28efficiency,successes,throughput%29 • Old URL: http://fizmat-work.dyndns.org/ui/#date.from=201205140000&date.interval=0&date.to=201205150000&tab=src_plots

  7. Architectural principles • Asynchronous communication through ActiveMQ between information sources and components of the system at various hierarchical levels (possibility of hierarchical aggregation: collecting-processing-publishing-collecting-processing-publishing…) • A common way of handling the summary and detailed flows of the native xrootd monitoring (in contrast to the current CMS approach) • Flexibility in terms of deployment scenarios, which can depend on the size of the federation and on the requirements regarding the number of metrics and the aggregation granularity. This allows constructing a scalable system but, in contrast to the ALICE approach, does not necessarily require per-site deployment of any components • Technologies which can be deployed at the federation level (no Oracle, for example) but which scale under heavy load (Hadoop/HBase for persistency, MapReduce for data processing) • A common monitoring UI at all levels (federation, VO, WLCG global), but with different levels of detail and with additional views at the federation level • A common naming convention (with a choice between VO-specific and GOCDB/OIM), which should ideally be provided by the experiment topology system

  8. Data flow for the xrootd federation monitoring [Diagram: xrootd servers send UDP packets to a UDP collector (currently two implementations: 1) UCSD collector + ActiveMQ publisher, 2) T3Mon collector_publisher; both provide the needed functionality); the collector publishes to the ActiveMQ message bus, the AMQ2Hadoop collector stores the messages in HBase, MapReduce processing prepares summaries, and the result is served as JSON to an ATLAS DDM Dashboard-like UI]

  9. Data flow for the xrootd federation monitoring [Diagram: same chain as on slide 8, ending in an ATLAS DDM Dashboard-like federation UI, with the possibility to republish aggregated metrics for further consumption at the higher level]

  10. T3Mon UDP messages collector • Can be installed anywhere, implemented as a Linux daemon • Listens on a UDP port • Extracts transfer info from several messages and composes a complete file transfer message • Sends the complete transfer message to ActiveMQ • Message data: • Source domain, host and IP address • Destination domain, host and IP address • User • File, size • Bytes read/written • Date the transfer started/finished • Functionality is similar to the UCSD collector; either of the two implementations can be used
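
A minimal sketch of the collector logic described above, assuming Python with the stomp.py STOMP client. The real xrootd detailed stream is a binary protocol; the JSON datagrams, the transfer_id correlation key, the port, broker, queue name and credentials below are all illustrative assumptions, not the T3Mon implementation.

```python
# Sketch: correlate several UDP messages about one transfer into a single
# complete transfer message and publish it to ActiveMQ.
# Assumptions: JSON datagrams with a 'transfer_id' field; stomp.py as client;
# placeholder broker, queue and credentials.
import json
import socket
import stomp

pending = {}  # transfer id -> partially assembled transfer record

REQUIRED = {'domain_from', 'host_from', 'ip_from', 'domain_to', 'host_to', 'ip_to',
            'user', 'file', 'size', 'bytes_read', 'bytes_written',
            'started', 'finished'}

def run():
    conn = stomp.Connection([('mq.example.org', 61613)])      # placeholder broker
    conn.connect('t3mon', 'secret', wait=True)                # placeholder credentials

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', 9930))                                     # UDP port the daemon listens on

    while True:
        datagram, _addr = sock.recvfrom(65535)
        try:
            msg = json.loads(datagram.decode('utf-8'))
        except ValueError:
            continue                                          # skip malformed datagrams
        record = pending.setdefault(msg['transfer_id'], {})
        record.update(msg)                                    # merge fields from this message
        if REQUIRED.issubset(record):                         # all fields collected
            conn.send(destination='/queue/t3mon.xrootd.transfers',  # placeholder queue
                      body=json.dumps(record))
            del pending[msg['transfer_id']]

if __name__ == '__main__':
    run()
```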

  11. AMQ2Hadoop collector • Can be installed anywhere, implemented as a Linux daemon • Listens to an ActiveMQ queue • Extracts the messages • Inserts them into the raw table in HBase
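
A sketch of the same idea, assuming Python with stomp.py for the ActiveMQ subscription and happybase for writing to HBase through its Thrift gateway. Broker, queue, table, column family and row-key scheme are placeholders, not the T3Mon configuration.

```python
# Sketch: subscribe to the ActiveMQ queue and write each message into a raw
# HBase table. All host, queue, table and column names are placeholders.
import json
import time
import happybase
import stomp

class RawWriter(stomp.ConnectionListener):
    def __init__(self, table):
        self.table = table

    def on_message(self, frame):                              # stomp.py >= 7 passes a Frame
        record = json.loads(frame.body)
        row_key = '{}-{}'.format(record.get('finished', ''), record.get('file', ''))
        self.table.put(row_key.encode('utf-8'),
                       {b'raw:json': frame.body.encode('utf-8')})

def run():
    hbase = happybase.Connection('hbase-thrift.example.org')  # placeholder Thrift host
    table = hbase.table('xrootd_raw')                          # placeholder raw table

    conn = stomp.Connection([('mq.example.org', 61613)])
    conn.set_listener('raw-writer', RawWriter(table))
    conn.connect('t3mon', 'secret', wait=True)
    conn.subscribe(destination='/queue/t3mon.xrootd.transfers', id=1, ack='auto')

    while True:
        time.sleep(60)                                         # keep the daemon alive

if __name__ == '__main__':
    run()
```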

  12. Hadoop processing • In the prototype stage, routines are executed manually • Reads the raw table • Prepares a summary: 10-minute stats as from:to:read:written • Inserts the summary data into the summary table in HBase • MapReduce: we use Java, Pig routines are in preparation
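
The production routines are Java MapReduce (with Pig versions in preparation). As a language-neutral illustration of the same 10-minute aggregation, a Hadoop-streaming-style mapper/reducer could look as follows; the input line format is an assumption.

```python
# Hadoop-streaming-style sketch of the 10-minute summary (from:to:read:written).
# Assumed input lines: <epoch_finished> <src_domain> <dst_domain> <read> <written>
import sys

BIN = 600  # seconds in a 10-minute bin

def mapper():
    for line in sys.stdin:
        ts, src, dst, read, written = line.split()
        bin_start = int(ts) // BIN * BIN                 # align to the 10-minute bin
        print('{}\t{}\t{}\t{}\t{}'.format(bin_start, src, dst, read, written))

def reducer():
    totals = {}                                          # (bin, src, dst) -> (read, written)
    for line in sys.stdin:
        bin_start, src, dst, read, written = line.rstrip('\n').split('\t')
        key = (bin_start, src, dst)
        r, w = totals.get(key, (0, 0))
        totals[key] = (r + int(read), w + int(written))
    for (bin_start, src, dst), (r, w) in sorted(totals.items()):
        print('{}\t{}\t{}\t{}\t{}'.format(bin_start, src, dst, r, w))

if __name__ == '__main__':
    mapper() if sys.argv[1:] == ['map'] else reducer()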

  13. HBase data export • Web service • Extracts data from the storage • Feeds the Dashboard XBrowse UI • XBrowse is a JavaScript framework used for both the ATLAS DDM Dashboard and the WLCG Transfer Dashboard; it is backend agnostic and requires JSON input. Easy to integrate with various underlying data sources
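
A sketch of such an export service, assuming Flask for the HTTP layer and happybase for reading the summary table; the URL, table, column names and row-key interpretation are assumptions, and the JSON layout expected by XBrowse is not reproduced here.

```python
# Sketch: read the summary table from HBase and return it as JSON for the UI.
# Framework choice, table/column names and the endpoint URL are placeholders.
import happybase
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/transfers/summary')
def transfer_summary():
    hbase = happybase.Connection('hbase-thrift.example.org')  # placeholder Thrift host
    table = hbase.table('xrootd_summary')                      # placeholder summary table
    rows = []
    for key, data in table.scan(limit=1000):                   # bounded scan for the sketch
        rows.append({
            'bin':     key.decode('utf-8'),                    # assumed: row key is the time bin
            'from':    data[b'sum:from'].decode('utf-8'),
            'to':      data[b'sum:to'].decode('utf-8'),
            'read':    int(data[b'sum:read']),
            'written': int(data[b'sum:written']),
        })
    hbase.close()
    return jsonify({'transfers': rows})

if __name__ == '__main__':
    app.run(port=8080)
```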

  14. Why did we ask for raw data reporting to a CERN server? • The current implementation of the federation monitor uses data about accomplished transfers. We assume this is not enough, since we would not be able to provide a close to real-time picture with accurate I/O measurements. • Certainly, the architecture does not foresee sending raw UDPs to a central server. UDP collectors should be deployed as close as possible to the xrootd servers of the federation. How many of them are required should be understood considering the amount of monitoring data and the size of the federation. Initially we foresaw per-server or per-site deployment (ALICE-like model), but this might require additional deployment/maintenance effort. Some reasonable compromise should be found. • Currently, raw data is required for: - cross-checks of the correctness of the aggregated data flow (our intention was to generate per client-server snapshots from the raw data and to compare them with similar snapshots based on data consumed from ActiveMQ) - implementing a collector which would provide a close to real-time picture not based only on accomplished transfers (process raw UDPs and republish the results of processing to the message bus, or even publish just the raw UDPs to the message bus?) - estimating the amount of transferred data and suggesting recommendations for various deployment scenarios in the future

  15. Issues and open questions • How to resolve the federation topology, in particular for clients (mapping of the IP domain to a particular site in either the ATLAS or the GOCDB/OIM naming convention, though this might not always be possible)? Is it foreseen to provide such a mapping through AGIS? • For the moment we did not manage to install the UCSD collector using the existing documentation (questions sent to Matevz) • Any new monitoring system requires validation (functionality, scalability, performance, reliability of information). Finding ATLAS pilot sites for validation of any T3Mon components on a real data flow is almost impossible (relevant not only for xrootd but, for example, for PROOF monitoring as well). • Does ATLAS need more accurate and near real-time monitoring than what can be provided based only on the reports about accomplished transfers? • Does ATLAS need a single monitoring display for federation monitoring similar to the ATLAS DDM Dashboard, with all necessary data available through a single entry point (no separate displays and completely different implementations for the summary and detailed xrootd data flows)?