
Storage Federations and FAX (the ATLAS Federation)

Wahid Bhimji

University of Edinburgh



Outline

  • Introductory:

    • What storage federation and FAX are, and their goals (as stated by the ATLAS FAX project)

    • Some personal perspectives

  • UK deployment status

  • Testing / Monitoring / Use-Cases

  • Concerns and Benefits



What is Federation and FAX?

Description (from the FAX Twiki):

The Federated ATLAS Xrootd (FAX) system is a storage federation that aims to bring Tier1, Tier2 and Tier3 storage together as if it were a single giant storage system, so that users do not have to think about where the data is or how to access it. Client software such as ROOT or xrdcp interacts with FAX behind the scenes and reaches the data wherever it is in the federation.
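
To make the access model concrete, here is a minimal sketch of reading a file through the federation with PyROOT. The redirector hostname and file path are placeholders, not real federation endpoints:

    # Minimal sketch: open a file through a FAX redirector with PyROOT.
    # The hostname and path are placeholders, not real federation endpoints.
    import ROOT

    # A federation-wide ("global") name: the redirector locates a replica and
    # redirects the client to whichever site actually holds the file.
    url = "root://fax-redirector.example.org//atlas/somedataset/NTUP.example.root"

    f = ROOT.TFile.Open(url)
    if f and not f.IsZombie():
        tree = f.Get("physics")  # read directly over LAN or WAN, wherever the data sits
        print(tree.GetEntries() if tree else "tree not found")
        f.Close()

The same global name can be passed to xrdcp for a plain copy instead of a direct read.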

Goals (from Rob Gardner’s talk at the Lyon federation meeting, 2012); similar to CMS:

  • Common ATLAS namespace across all storage sites, accessible from anywhere;

  • Easy to use, homogeneous access to data

  • Use as failover for existing systems

  • Gain access to more CPUs using WAN direct read access

  • Use as caching mechanism at sites to reduce local data management tasks



Other details / oddities of FAX (some of this is my perspective)

  • Started in the US with pure-xrootd and xrootd-dCache

    • Now worldwide, inc. UK, IT, DE and CERN (EOS)

  • Uses topology of “regional” redirectors (see next slide)

  • The ATLAS federation uses a "Name2Name" LFC lookup, unlike CMS (see the conceptual sketch after this list)

  • Now moving from R&D to production

    • But not (quite) there yet IMHO

  • There is interest in http(s) federation instead / as well

    • Particularly from Europeans

    • But this is nowhere near as far along.
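
As a rough illustration of the Name2Name idea mentioned above, here is a conceptual sketch only: the function and the catalogue callable are invented for illustration and are not the actual FAX N2N plug-in.

    # Conceptual sketch of a Name2Name (N2N) translation: a federation-wide
    # logical name is mapped to a site-local physical path via a catalogue lookup.
    # The function and the lfc_lookup callable are illustrative, not real FAX code.
    def name_to_name(global_lfn, lfc_lookup):
        """Return the local storage path for a global logical file name."""
        # e.g. "/atlas/dq2/mc12/.../file.root" -> "/dpm/ed.ac.uk/home/atlas/.../file.root"
        local_pfn = lfc_lookup(global_lfn)  # ask the site's LFC for its local replica
        if local_pfn is None:
            raise FileNotFoundError("no local replica for %s" % global_lfn)
        return local_pfn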



Regional redirectors



UK FAX Deployment Status

  • dCache and Lustre (though not StoRM) setups are widely used in the US

  • DPM has a nice xrootd server now – details:

    • The UK has been a testbed for this, but setup is now entirely via YAIM (since 1.8.4): https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Xroot/Setup

    • Thanks to David Smith, many issues (in xrootd, not DPM) have all been solved (inc. SL6)

  • CASTOR required a somewhat custom setup by Shaun / Alastair, though building on the configuration of others.

  • Regional redirector set up for the UK: physically at CERN, currently managed by them, though we could take it over

  • UK sites working now

    • DPM: ECDF; Glasgow; Liverpool; Oxford – Working

      • Lancaster – Almost there; Manchester – Intending to deploy.

      • EMI push means all sites could install now

    • Lustre: QMUL – Working

    • CASTOR: RAL - Working



Testing and Monitoring

https://twiki.cern.ch/twiki/bin/view/Atlas/MonitoringFax

FAX-specific:

http://uct3-xrdp.uchicago.edu:8080/rsv/ (basic status)

http://atl-prod07.slac.stanford.edu:8080/display (previous page)

Now in normal ATLAS monitoring too:

  • http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?view=FAXMON#currentView=FAXMON&fullscreen=true&highlight=false

  • http://dashb-atlas-xrootd-transfers.cern.ch/ui/



Traffic monitoring

xrootd.monitor all rbuff 32k auth flush 30s window 5s dest files info user io redir atl-prod05.slac.stanford.edu:9930

xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
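
Roughly, these directives mean the following (a paraphrase; the xrootd monitoring documentation is the authoritative reference):

# xrootd.monitor ... dest files info user io redir atl-prod05.slac.stanford.edu:9930
#   -> stream detailed monitoring events (file open/close, info, user, I/O and
#      redirect records) to the SLAC collector, flushing buffers every 30s and
#      aggregating I/O in 5s windows
# xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
#   -> send summary statistics to the collector every 60 seconds, excluding the
#      buffer and poll categories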

These need to be on all disk nodes too

(set up by YAIM)

http://atl-prod07.slac.stanford.edu:8080/display



Aside – stress testing DPM xrootd

ANALY_GLASGOW_XROOTD queue

  • Stress-tested “local” xrootd access

    • For direct access we saw some server load (same as we do for rfio).

    • David did offer to help – we didn’t follow up much

    • I am still optimistic that xrootd will offer better performance than rfio (see the sketch after this list)

  • Trying PanDA failover tricks

    • Not done yet, but HammerCloud tests are planned for the dress rehearsal (see next page)

  • ASGC have done extensive HammerCloud tests on (non-FAX) dpm-xrootd:

    • Promising results. Using it in production now (?)
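
For context, here is a minimal sketch of the two local direct-access protocols being compared, rfio versus xrootd, both driven through ROOT's protocol plugins. The hostname and path are placeholders for a DPM site:

    # Sketch: open the same DPM-resident file via rfio and via xrootd with PyROOT.
    # The hostname and path are placeholders, not a real site.
    import ROOT

    urls = [
        "rfio:///dpm/example.ac.uk/home/atlas/data/some_file.root",
        "root://se01.example.ac.uk//dpm/example.ac.uk/home/atlas/data/some_file.root",
    ]

    for url in urls:
        f = ROOT.TFile.Open(url)  # ROOT picks the protocol plugin from the URL scheme
        ok = bool(f) and not f.IsZombie()
        print(url.split(":")[0], "open OK" if ok else "open failed")
        if ok:
            f.Close()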



    This week: FAX dress rehearsal

    • Initially just a continuation of existing tests

      • ramping up to O(10) jobs

      • Using standard datasets placed at sites

    • Also some HammerCloud tests – again low-level

      • Direct access (most interesting) not working yet.

    • _All_ increases will be discussed with cloud / sites

    • Not everything is ready to run (still placing datasets) so this is likely to take some weeks.



    Use Cases – revisiting goals

    • Common ATLAS namespace across all storage sites, accessible from anywhere; Easy to use, homogeneous access to data

      • Done – implicit in the setup

      • Keen users are being encouraged to try it: tutorials etc.

    • Use as failover for existing systems

      • Production jobs can now retry from the federation if all local tries fail… it works, though not tried in anger (sketched after this list).

    • Gain access to more CPUs using WAN direct read access

      • WAN access works – no reason not to use it in principle.

      • Timing info from WAN tests ready for brokering – not yet used (AFAIK)

    • Use as caching mechanism at sites to reduce local data management tasks

      • Nothing yet has been done on this (AFAIK).
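
    A conceptual sketch of the failover pattern above: the real retry logic lives in the pilot / PanDA machinery, and the redirector hostname and helper here are invented for illustration.

        # Conceptual sketch of "fail over to the federation": try the local replica
        # first, then ask a FAX redirector to locate another copy.
        # The hostname and helper are invented for illustration, not real pilot code.
        import ROOT

        FAX_REDIRECTOR = "root://regional-redirector.example.org/"  # placeholder

        def open_with_failover(local_pfn, global_lfn):
            """Open the site-local replica if possible, otherwise go via the federation."""
            f = ROOT.TFile.Open(local_pfn)
            if f and not f.IsZombie():
                return f  # local access worked
            # global_lfn starts with "/", giving the usual root://host//path form
            return ROOT.TFile.Open(FAX_REDIRECTOR + global_lfn)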



    Some of my concerns

    • If users start using this, it could result in a lot of unexpected traffic, with files served from sites (and read in by WNs):

      • The service is not yet in production – no SAM test, no clear service expectations, etc. Communication with sites is currently direct to the site admin (not via the cloud or GGUS).

      • We know some network paths are slow, and this is a new path involving WNs and the WAN

      • Multiple VO support: currently separate server instances

      • BUT a SAM test etc. is coming, as is configurable bandwidth limiting

    • But many site failures (and user frustrations) are storage-related, so if it solves those then it's worth it



    What about http://

    • Storage federation based on http (WebDav) has been demonstrated – see for example Fabrizio’s GDB talk

    • DPM has a WebDAV interface, as does dCache, and Chris has something with StoRM at QMUL (a minimal access sketch follows this list).

    • Sits better with wider goals of interoperability with other communities.

    • However, it doesn't have the same level of HEP/LHC uptake so far

    • ATLAS, however, want to use it within Rucio for renaming files. ECDF is involved in those tests – not going that well yet, but we will iron the issues out.
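
    As a point of comparison, a minimal sketch of what plain https/WebDAV access to a storage endpoint can look like. The endpoint, path and credential locations are placeholders; real ATLAS usage would go through the Rucio / federation tooling:

        # Sketch: authenticated HTTPS read from a WebDAV-enabled storage element.
        # The URL, certificate and CA paths are placeholders.
        import requests

        url = "https://webdav.example.ac.uk/dpm/example.ac.uk/home/atlas/data/some_file.root"
        cert = ("/path/to/usercert.pem", "/path/to/userkey.pem")  # user or proxy certificate

        resp = requests.get(url, cert=cert,
                            verify="/etc/grid-security/certificates", stream=True)
        resp.raise_for_status()
        with open("some_file.root", "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # stream in 1 MiB chunks
                out.write(chunk)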



    Conclusion / Discussion

    • Significant progress in UK FAX integration. We are well placed to completely deploy this in the coming months. Things appear to work but have not been stressed.

    • ATLAS are stating deadlines of April for both xrootd and WebDAV (though they don't really have the use-cases yet).

    • CMS had a deadline of the end of last year, I believe.

    • Now is the time to voice concerns! And also to decide, as a cloud, on the benefits and push in that direction.

    • But we should also progress deployment, and see what bottlenecks we have

      Chris made a wiki page for site status:

    • https://www.gridpp.ac.uk/wiki/WebDAV#Federated_storage_support

