

ATLAS Off-Grid sites (Tier-3) monitoring

A. Petrosyan on behalf of the ATLAS collaboration

GRID’2012, 17.07.12, JINR, Dubna


Goals of the project

  • Provide a reasonable monitoring solution for ‘off-grid’ sites (unplugged, geographically close computing resources)

  • Monitoring of the computing facilities of local groups with a collocated storage system (Tier-1+Tier-3, Tier-2+Tier-3)

  • Present Tier-3 site activity at the global level

  • Data transfer monitoring across the XRootD federation



Tier-3 sites monitoring levels

  • Monitoring of the local infrastructure for site administration

  • Central system for monitoring of the VO activities at Tier-3 sites



Objectives of the local monitoring system at a Tier-3 site

  • Detailed monitoring of the local fabric

  • Monitoring of the batch system

  • Monitoring of the job processing

  • Monitoring of the mass storage system

  • Monitoring of the VO computing activities on the local site



Objectives of the global Tier-3 monitoring

  • Monitoring of the VO usage of the Tier-3 resources in terms of data transfer, data access, and job processing

  • Quality of the provided service based on the job processing and data transfer monitoring metrics



Site monitoring

  • Based on the Ganglia monitoring system

  • Collects basic metrics using Ganglia sensors

  • Plugin system for monitoring specific metrics (a plugin module is sketched after this list)

  • PostgreSQL to aggregate data

  • More details for each package at https://svnweb.cern.ch/trac/t3mon/wiki/T3MONHome

  • Monitoring modules available for Condor, Lustre, PBS, Proof, and XRootD; each has a plugin to deliver data to the global level

  • Examples of UI for different systems at http://vm01.jinr.ru/ganglia/
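
For reference, a gmond Python metric module of the kind these plugins build on registers metric descriptors in metric_init() and exposes a sampling callback. A minimal sketch, assuming a PBS site; the metric name, the qstat parsing, and the group label are illustrative, not the actual T3Mon plugin code:

```python
# Minimal sketch of a gmond Python metric module in the style of the
# T3Mon site plugins. The metric name and the qstat parsing below are
# illustrative assumptions, not the actual T3Mon code.
import subprocess

def queued_jobs(name):
    """Callback gmond invokes to sample the metric value."""
    try:
        out = subprocess.check_output(['qstat'])
        # qstat prints two header lines before the job list (assumed).
        return max(len(out.splitlines()) - 2, 0)
    except (OSError, subprocess.CalledProcessError):
        return 0

def metric_init(params):
    """Called once by gmond; returns the metric descriptors to register."""
    return [{
        'name': 't3mon_pbs_queued_jobs',   # hypothetical metric name
        'call_back': queued_jobs,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'jobs',
        'slope': 'both',
        'format': '%u',
        'description': 'Number of queued PBS jobs',
        'groups': 't3mon',
    }]

def metric_cleanup():
    """Called by gmond on shutdown; nothing to release here."""
    pass
```

gmond would load such a module through a .pyconf stanza and poll the callback on the configured interval.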



Data flow for the site monitoring

  • Common UI for various data sources

  • A small core with separate modules allows a site to install only the software it needs

  • Delivery to the global level can be switched off (a configuration sketch follows)
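
To make that concrete, here is a hypothetical sketch of how a modular core might read its configuration and load only the enabled modules; the file path, option names, and the t3mon.* module layout are invented for illustration and are not the real T3Mon configuration format:

```python
# Hypothetical illustration of the "small core + modules" idea: the core
# reads a config file, loads only the enabled monitoring modules, and
# the delivery to the global level is a simple switch. All names here
# are assumptions, not the real T3Mon configuration.
try:
    import configparser                  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2
import importlib

config = configparser.ConfigParser()
config.read('/etc/t3mon/t3mon.conf')     # assumed location

# e.g. "modules = pbs, xrootd" in an assumed [core] section
enabled = [m.strip() for m in config.get('core', 'modules').split(',')]
send_global = config.getboolean('core', 'send_to_global_level')

for name in enabled:
    module = importlib.import_module('t3mon.' + name)  # hypothetical package
    module.collect()        # gather local metrics into the local DB
    if send_global:
        module.publish()    # hand the summary to the global-level publisher
```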



Global monitoring

  • Ganglia as executor

  • MSG as the message transport system

  • Publisher on the local site: executed by gmond; communicates with the local DB and sends information to the MSG system (sketched below)

  • Backend: message consumer(s) at CERN; data popularity and job statistics presented via the Dashboard
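
MSG is an ActiveMQ-based message bus, commonly spoken to over STOMP. A minimal sketch of what the gmond-triggered publisher could look like, assuming the stomp.py client and psycopg2; the broker host, credentials, queue name, and table layout are placeholders:

```python
# Minimal sketch of a local-site publisher: read the latest summary from
# the local DB and push it to the MSG (ActiveMQ) broker over STOMP.
# Broker host, credentials, queue name, and schema are placeholders.
import json
import psycopg2
import stomp

def latest_summary():
    conn = psycopg2.connect(dbname='t3mon')   # assumed local DB name
    cur = conn.cursor()
    cur.execute("""SELECT site, status, events, bytes_read, users
                   FROM job_summary ORDER BY report_time DESC LIMIT 1""")
    site, status, events, bytes_read, users = cur.fetchone()
    conn.close()
    return {'site': site, 'status': status, 'events': events,
            'bytes_read': bytes_read, 'active_users': users}

broker = stomp.Connection([('msg.example.cern.ch', 61613)])  # placeholder
broker.connect('t3mon', 'secret', wait=True)
broker.send(destination='/queue/t3mon.summary',              # placeholder
            body=json.dumps(latest_summary()))
broker.disconnect()
```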



Data flow for the global monitoring



Data flow for Proof and Condor

  • PostgreSQL for data aggregation on the local site

  • Ganglia UI to present data popularity at the site level

  • Ganglia gmond executes the summary gathering

  • The summary is delivered to the Dashboard historical views once per hour

  • Data sent to the global level (the message shape is sketched after this list):

    • Job status: Ok, stopped, aborted

    • Site name

    • Time of report

    • Number of processed events

    • Bytes read

    • Number of active users
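
Put together, one hourly summary record carrying the fields above might look like this; the key names and example values are illustrative, since the slide lists the content but not the exact schema:

```python
# Illustrative shape of the hourly Proof/Condor summary sent to the
# Dashboard historical views. Field names and values are assumptions;
# the slide lists the content but not the exact message schema.
import json
import time

summary = {
    'job_status':   {'ok': 120, 'stopped': 3, 'aborted': 2},
    'site':         'JINR-T3',          # placeholder site name
    'report_time':  int(time.time()),   # time of report (epoch seconds)
    'events':       1543200,            # number of processed events
    'bytes_read':   731000000000,
    'active_users': 14,
}
print(json.dumps(summary, indent=2))
```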



Data flow for XRootD

  • Both the summary and the detailed-events gatherers are implemented as Linux daemons (the summary path is sketched after this list)

  • Summary data goes directly to Ganglia

  • File-transfer data can be stored in a local PostgreSQL database and then presented via Ganglia

  • Detailed data can be delivered to ActiveMQ directly

  • Data sent to the global level:

    • Source domain, host, and IP address

    • Destination domain, host, and IP address

    • User

    • File, size

    • Bytes read, written

    • Time transfer started and finished
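
On the summary side, XRootD can be configured (via its xrd.report directive) to emit periodic XML reports over UDP. A sketch of a small daemon loop that receives such a report and republishes one counter to Ganglia through the gmetric command-line tool; the port, the XML element picked out, and the metric name are assumptions:

```python
# Sketch of the summary path: receive XRootD's periodic XML summary
# report over UDP and republish one counter to Ganglia via gmetric.
# The port, the XML element path, and the metric name are assumptions.
import socket
import subprocess
import xml.etree.ElementTree as ET

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 9931))            # assumed xrd.report target port

while True:
    data, _ = sock.recvfrom(65536)
    root = ET.fromstring(data)
    # Pull one counter out of the report; the element path is assumed.
    link_in = root.findtext(".//stats[@id='link']/in")
    if link_in is not None:
        subprocess.call(['gmetric', '--name', 'xrootd_link_in',
                         '--value', link_in, '--type', 'uint32',
                         '--units', 'bytes'])
```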



Tier-3 monitoring status

  • The full chain from Tier-3 site to Dashboard has been implemented and exercised

  • Site-level presentation via Ganglia Web 2.0

  • Global-level presentation of Proof jobs via Dashboard Historical Views

  • Tier-3 site to DQ2 popularity: formats agreed, data is being delivered; the consumer on the DQ2 side is in the testing stage

  • The T3Mon software has been installed on pilot sites

  • Distribution is available via our repository: https://svnweb.cern.ch/trac/t3mon/wiki/YumConfigure

  • We welcome more sites to try it and to send feedback to our support list: t3mon-jinr-@googlegroups.com



XRootD transfers monitoring

  • Goal: present transfers between servers and sites in the federation via one UI

  • Messages from XRootD servers are collected by the T3Mon UDP collector and then sent to ActiveMQ

  • Data is stored in HBase

  • Hadoop processing is used to prepare data summaries

  • Web services for data export

  • Dashboard transfer interface as UI



Data flow for the XRootD federation monitoring



T3Mon UDP messages collector

  • Can be installed anywhere; implemented as a Linux daemon

  • Extracts transfer info from several messages and composes a complete file-transfer message (the aggregation is sketched after this list)

  • Sends complete transfer message to ActiveMQ

  • Message includes:

    • Source domain, host, and IP address

    • Destination domain, host, and IP address

    • User

    • File, size

    • Bytes read/written

    • Time transfer started/finished
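
Composing one file-transfer message out of several monitoring packets is essentially a state machine keyed by transfer id: an open event creates state, I/O events accumulate, and the close event flushes a complete record to ActiveMQ. A sketch of that aggregation logic; parse_packet(), the field names, port, broker, and queue are placeholders, and the real XRootD detailed-monitoring packets are binary rather than JSON:

```python
# Sketch of the collector's aggregation logic: accumulate per-transfer
# state from individual UDP packets and emit one complete file-transfer
# message on close. parse_packet(), field names, broker, and queue are
# placeholders; real XRootD monitoring packets are binary.
import json
import socket
import stomp

def parse_packet(data):
    """Placeholder: decode one monitoring packet into a dict with an
    'event' key ('open' | 'io' | 'close') and a per-transfer 'id'."""
    return json.loads(data)

transfers = {}   # transfer id -> accumulated record
broker = stomp.Connection([('amq.example.org', 61613)])  # placeholder
broker.connect('t3mon', 'secret', wait=True)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 9930))            # assumed collector port

while True:
    data, _ = sock.recvfrom(65536)
    msg = parse_packet(data)
    tid = msg['id']
    if msg['event'] == 'open':
        transfers[tid] = {'src': msg['src'], 'dst': msg['dst'],
                          'user': msg['user'], 'file': msg['file'],
                          'size': msg['size'], 'read': 0, 'written': 0,
                          'started': msg['time']}
    elif msg['event'] == 'io' and tid in transfers:
        transfers[tid]['read'] += msg.get('read', 0)
        transfers[tid]['written'] += msg.get('written', 0)
    elif msg['event'] == 'close' and tid in transfers:
        record = transfers.pop(tid)
        record['finished'] = msg['time']
        broker.send(destination='/queue/t3mon.xrootd.transfers',
                    body=json.dumps(record))
```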



AMQ2Hadoop collector

  • Can be installed anywhere; implemented as a Linux daemon

  • Listens to an ActiveMQ queue

  • Extracts messages

  • Inserts them into the HBase raw table (a sketch follows)
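
A sketch of the daemon under those assumptions, using stomp.py for the ActiveMQ subscription and happybase (a Python client that talks to HBase through its Thrift gateway) for the insert; broker, queue, table, and column-family names are placeholders:

```python
# Sketch of the AMQ2Hadoop daemon: subscribe to the ActiveMQ queue and
# write each message into an HBase "raw" table. Broker, queue, table,
# column family, and row-key scheme are placeholders; assumes the
# stomp.py 4.x listener signature and happybase.
import json
import time
import happybase
import stomp

hbase = happybase.Connection('hbase-thrift.example.org')  # placeholder
raw = hbase.table('xrootd_raw')                           # assumed table

class RawWriter(stomp.ConnectionListener):
    def on_message(self, headers, message):
        record = json.loads(message)
        # Row key: source host + start time (an assumed keying scheme).
        row = '%s-%d' % (record['src'], record['started'])
        raw.put(row, {'t:msg': message})

conn = stomp.Connection([('amq.example.org', 61613)])     # placeholder
conn.set_listener('raw-writer', RawWriter())
conn.connect('t3mon', 'secret', wait=True)
conn.subscribe(destination='/queue/t3mon.xrootd.transfers', id=1, ack='auto')

while True:      # the daemon just keeps the subscription alive
    time.sleep(60)
```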



Hadoop processing

  • Reads raw table

  • Prepares a data summary: 10-minute statistics with the following structure:

    • From

    • To

    • Sum bytes read

    • Sum bytes written

    • Number of files read

    • Number of files written

  • Inserts summary data into summary table

  • MapReduce: we use Java; we are also working on enabling Pig routines (a Python streaming illustration follows)
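
The production job is Java, as stated above; purely as an illustration, the same 10-minute summarization can be written as a Hadoop Streaming mapper/reducer pair in Python: the mapper keys each transfer by (source, destination, 10-minute bucket), and the reducer sums bytes and counts files. The input layout (one JSON transfer record per line, with the field names from the collector sketch earlier) is assumed:

```python
# summarize.py: a Hadoop Streaming illustration (in Python, not the
# Java job the slide refers to) of the 10-minute transfer summary.
# Assumed input: one JSON transfer record per line, as composed by the
# UDP collector sketch above. Run the same script as mapper and reducer:
#   hadoop jar hadoop-streaming.jar \
#     -mapper 'summarize.py map' -reducer 'summarize.py reduce' ...
import json
import sys

def mapper():
    for line in sys.stdin:
        r = json.loads(line)
        bucket = int(r['started']) // 600 * 600   # 10-minute bucket
        key = '%s|%s|%d' % (r['src'], r['dst'], bucket)
        print('%s\t%d\t%d' % (key, r['read'], r['written']))

def reducer():
    # Streaming delivers mapper output sorted by key, so a simple
    # group-by-adjacent-key loop suffices.
    current, sread, swritten, nread, nwritten = None, 0, 0, 0, 0
    for line in sys.stdin:
        key, read, written = line.rstrip('\n').split('\t')
        if key != current:
            if current is not None:
                print('%s\t%d\t%d\t%d\t%d' %
                      (current, sread, swritten, nread, nwritten))
            current, sread, swritten, nread, nwritten = key, 0, 0, 0, 0
        sread += int(read)
        swritten += int(written)
        nread += 1 if int(read) else 0
        nwritten += 1 if int(written) else 0
    if current is not None:
        print('%s\t%d\t%d\t%d\t%d' %
              (current, sread, swritten, nread, nwritten))

if __name__ == '__main__':
    mapper() if sys.argv[1] == 'map' else reducer()
```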



Storage2UI data export

  • Web service

  • Extracts data from the storage

  • Feeds Dashboard XBrowse UI
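
A minimal sketch of such an export service, assuming Flask on the web side and happybase for HBase access; the endpoint, table layout, and the JSON shape XBrowse expects are assumptions:

```python
# Sketch of the Storage2UI export service: scan the HBase summary table
# and return JSON for the Dashboard UI. Endpoint, table layout, and the
# exact JSON shape XBrowse expects are assumptions.
import happybase
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/transfers')
def transfers():
    # e.g. /transfers?prefix=srchost  (assumed row-key prefix filter)
    prefix = request.args.get('prefix', '')
    hbase = happybase.Connection('hbase-thrift.example.org')  # placeholder
    table = hbase.table('xrootd_summary')                     # assumed table
    rows = [{'key': key.decode('utf-8'),
             'bytes_read': int(data[b's:read']),
             'bytes_written': int(data[b's:written'])}
            for key, data in table.scan(row_prefix=prefix.encode('utf-8'))]
    hbase.close()
    return jsonify(rows)

if __name__ == '__main__':
    app.run(port=8080)
```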



Status

  • In the prototype stage:

    • Hadoop processing is executed manually

    • Simulated data

  • UI:

    http://xrdfedmon-dev.jinr.ru/ui/#date.from=201206210000&date.interval=0&date.to=201206220000&grouping.dst=(host)&grouping.src=(host)

  • We are ready to start testing on a real federation



Thank you for your attention


