Distributed Storage



  1. Distributed Storage Wahid Bhimji Outline: Update on some issues mentioned last time: SRM; DPM collaboration. Federation: WebDAV / xrootd deployment status. Other topics: Clouds and “Big Data”

  2. Update on issues • SRM: Still required, but progress towards removing the need for it for disk-only storage: GDB talk • DPM (Collaboration) • Agreed at DPM workshop at LAL, 04/12/2012: “more than adequate level of effort was pledged in principle to maintain, develop and support the DPM as an efficient storage solution.” • Strong commitment from TW; core support from CERN; a decent one from France and us: we didn’t really evaluate transitioning – now there seems no need to. • In practice CERN are currently providing more than promised – but there is a further reorganisation coming and the previous lead developer (Ricardo) has left. • Xrootd now working well in DPM (see federation slides) • DMLite is in production (but only actually used for WebDAV) • Still minimal experience / testing: HDFS / S3 apparently work; Lustre not finished.

  3. Federation: what is it? ATLAS project called “FAX”, CMS’s called “AAA” (Any data, Anytime, Anywhere). Description (from the FAX Twiki): The Federated ATLAS Xrootd (FAX) system is a storage federation that aims at bringing Tier1, Tier2 and Tier3 storage together as if it were a single giant storage system, so that users do not have to think about where the data is or how to access it. Client software like ROOT or xrdcp will interact with FAX behind the scenes and will reach the data wherever it is in the federation. Goals (from Rob Gardner’s talk at the Lyon Federation mtg. 2012), similar to CMS: • Common ATLAS namespace across all storage sites, accessible from anywhere • Easy to use, homogeneous access to data • Use as failover for existing systems • Gain access to more CPUs using WAN direct read access • Use as caching mechanism at sites to reduce local data management tasks
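
To make the “client software like ROOT or xrdcp” point concrete, here is a minimal PyROOT sketch of what federated access looks like from the user side. It is only an illustration: the redirector hostname and file path below are placeholders, not real endpoints from this talk.

    # Minimal sketch of federated ("behind the scenes") access via PyROOT.
    # The redirector hostname and path are placeholders.
    import ROOT

    # One global name is used; the federation redirects the client to
    # whichever site actually holds a copy of the file.
    url = "root://fax-redirector.example.org//atlas/some/global/path/file.root"

    f = ROOT.TFile.Open(url)      # xrootd handles the redirection transparently
    if f and not f.IsZombie():
        f.ls()                    # browse the file as if it were local
        f.Close()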

  4. ATLAS FAX Deployment Status • US fully deployed: Scalla, dCache and Lustre (though not StoRM) • DPM has a nice xrootd server now: details • The UK has been a testbed for this – but it is now an entirely YAIM-based setup (since 1.8.4): https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Xroot/Setup • Thanks to David Smith, all issues (in xrootd, not DPM) are solved. • CASTOR required a somewhat custom setup by the T1 – works now • Regional redirector set up for the UK: physically at CERN • UK sites working now • DPM: Cambridge; ECDF; Glasgow; Lancaster; Liverpool; Manchester; Oxford • EMI push means all sites could install now (but it’s still optional) • Lustre: QMUL – Working • dCache: RalPP – In progress • CASTOR: RAL – Working
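
As an illustration of how a newly enabled site endpoint might be smoke-tested (this is not an official test harness; the hostnames and file paths below are invented), a simple check is just to read one known file through each site’s xrootd door:

    # Hypothetical smoke test: try to read one known file from each site's
    # xrootd endpoint. Hostnames and paths are placeholders.
    import subprocess

    ENDPOINTS = {
        "QMUL (Lustre)": "root://xrootd.qmul.example//some/test/file.root",
        "RAL (CASTOR)":  "root://xrootd.ral.example//some/test/file.root",
    }

    for site, url in ENDPOINTS.items():
        # xrdcp to /dev/null just proves the file is readable remotely
        result = subprocess.run(["xrdcp", "-f", url, "/dev/null"],
                                capture_output=True, text=True)
        print(f"{site:15s} {'OK' if result.returncode == 0 else 'FAILED'}")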

  5. CMS AAA Deployment Status (Andrew Lahiff) • Site status • xrootd fallback: • Enabled at all US Tier-2s • UK/European sites being actively pursued & tested • xrootd access: • Enabled at some Tier-2s, FNAL, RAL • Sites being encouraged to provide access • xrootd fallback & access SAM tests not yet critical • UK Sites: RalPP, IC and Brunel all have fallback and access enabled • Usage: • User analysis jobs (ongoing) • Central MC & data reprocessing (started a few weeks ago) • Running workflows which read input data from remote sites using xrootd fallback • Makes better use of available resources without moving data around using FTS or wasting tape at Tier-1s.
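
The “fallback” idea itself is simple: jobs try their local storage first and only go out to the federation if that fails. The sketch below is a conceptual Python illustration of that logic, not the actual CMSSW site-local-config mechanism; the prefixes and logical file name are placeholders.

    # Conceptual illustration of xrootd fallback: local first, federation second.
    import ROOT

    LOCAL_PREFIX  = "root://local-se.example//store"           # placeholder
    GLOBAL_PREFIX = "root://global-redirector.example//store"  # placeholder

    def open_with_fallback(lfn):
        """Open a logical file name locally, falling back to the federation."""
        for prefix in (LOCAL_PREFIX, GLOBAL_PREFIX):
            f = ROOT.TFile.Open(prefix + lfn)
            if f and not f.IsZombie():
                return f
        raise IOError("not reachable locally or via the federation: " + lfn)

    # usage (made-up LFN): f = open_with_fallback("/mc/example/file.root")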

  6. CMS xrootd usage • Data reprocessing running at CNAF, reading using xrootd “fallback” • > 1000 slots • > 500 MB/s aggregate • ~25 TB of data copied • 350 kB/s/core • CPU efficiency > 95%
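
(As a rough consistency check on those figures: 500 MB/s aggregate at 350 kB/s per core corresponds to about 500,000 / 350 ≈ 1,400 cores reading concurrently, in line with the “> 1000 slots” quoted.)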

  7. CMS remote usage – last week: 350 TB. Big time – mostly in the US

  8. ATLAS remote usage: 15 TB. Small fry: currently tests and users in the know. But it works, and UK sites are up there

  9. What about http:// • Storage federation based on http (WebDAV) has been demonstrated – see for example Fabrizio’s GDB talk • DPM has a WebDAV interface, as does dCache. StoRM has just released something – testing at QMUL. (CASTOR (and EOS) not yet.) • Sits with wider goals of interoperability with other communities. • However, it doesn’t have the same level of HEP/LHC uptake so far • ATLAS, however, want to use it within Rucio • Initially for renaming files – but wider uses envisaged: e.g. user download • UK ironed out a few problems for ATLAS use with the DPM server • Those fixes will be in 1.8.7 – not yet released.
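
For illustration, the kind of rename Rucio needs maps onto a single WebDAV MOVE request. The sketch below uses the Python requests library against a placeholder DPM WebDAV endpoint, with a grid proxy for authentication; it is not the actual Rucio or DPM client code, and the hostnames, paths and proxy location are invented.

    # Hedged sketch: rename a file on a DPM WebDAV door with an HTTP MOVE.
    import requests

    ENDPOINT = "https://dpm.example.ac.uk/dpm/example.ac.uk/home/atlas"  # placeholder
    PROXY    = "/tmp/x509up_u1000"   # combined cert+key PEM (grid proxy)

    old_url = ENDPOINT + "/old/name.root"
    new_url = ENDPOINT + "/rucio/new/name.root"

    resp = requests.request(
        "MOVE",
        old_url,
        headers={"Destination": new_url},
        cert=PROXY,                                  # client certificate
        verify="/etc/grid-security/certificates",    # trusted CA directory
    )
    print(resp.status_code)   # typically 201/204 on a successful MOVE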

  10. Other things • Cloud Storage: • Plan for testing within GridPP (some existing studies at CERN) • Definition: “resources which have an interface like those provided by current commercial cloud resource providers” • Goal: to allow use of future resources that are provided in this way. • Testing: transfers and direct reading (ROOT S3 plugin) • Status: set up on IC Swift: • Copying works. • Added as a pool to the test DPM at Edinburgh – almost works. • “Big Data”: • “Big Data” is not just a buzzword; it is now business-as-usual in the private sector. • Why does HEP share so little of the same toolkit? • Explore via joint workshops + development activities: real physics use-cases. • Both of these are on the topic list for the WLCG Storage/Data WG.
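
As a sketch of what the “copying works” style of test looks like against an S3/Swift endpoint (the endpoint, credentials and bucket below are placeholders, and this uses the generic boto S3 API rather than whatever the GridPP test actually ran):

    # Hedged sketch: upload a file to an S3-compatible (Swift) endpoint with boto.
    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    conn = S3Connection(
        aws_access_key_id="TEST_ACCESS_KEY",          # placeholder credentials
        aws_secret_access_key="TEST_SECRET_KEY",
        host="swift.example.ac.uk",                   # S3-compatible gateway
        calling_format=OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket("gridpp-test")        # "transfers" part of the test
    key = bucket.new_key("test/file.root")
    key.set_contents_from_filename("local_file.root")

    # Direct reading could then be exercised with the ROOT S3 plugin, e.g.
    # ROOT.TFile.Open("s3://swift.example.ac.uk/gridpp-test/test/file.root"),
    # with credentials supplied in whatever way the ROOT version expects.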

  11. Conclusion / Discussion • Future of DPM is not so much of a worry • Reduced reliance on SRM will offer flexibility in storage solutions • Federated xrootd deployment is well advanced (inc. in UK) • Seems to work well in anger for CMS (aim: 90% of T2s by June) • So far so good for ATLAS – though not pushing all the way • LHCb also interested (“fallback onto other storage elements only as exception”) • WebDAV option kept alive by ATLAS Rucio: https://www.gridpp.ac.uk/wiki/WebDAV#Federated_storage_support • Cloud Storage: worth exploring – have just started • Big Data: surely an “Impact” in the room

  12. Backup

  13. Federation traffic • Modest levels now; will grow when in production • In fact, including local traffic, UK sites dominate • Oxford and ECDF switched to xrootd for local traffic

  14. Systematic FDR load tests in progress: EU cloud results (slide stolen from I. Vukotic). Absolute values not important (affected by CPU/HT, setup, etc.); the point is that remote read can be good, but varies

  15. Cost matrix measurements • Cost-of-access (pairwise network links, storage load, etc.)
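
Purely to illustrate what a cost matrix could feed into (all numbers below are invented, not the real measurements), one simple convention is cost = 1 / achievable rate, so a broker can pick the cheapest source holding a replica:

    # Toy cost matrix: invented source->destination remote-read rates in MB/s.
    measured_rate = {
        "ECDF":    {"ECDF": 90, "Glasgow": 40, "QMUL": 25, "RAL": 30},
        "Glasgow": {"ECDF": 45, "Glasgow": 95, "QMUL": 20, "RAL": 35},
        "QMUL":    {"ECDF": 20, "Glasgow": 25, "QMUL": 85, "RAL": 40},
        "RAL":     {"ECDF": 35, "Glasgow": 30, "QMUL": 40, "RAL": 100},
    }

    # Cheaper = faster: cost is the inverse of the achievable rate.
    cost = {s: {d: 1.0 / r for d, r in row.items()}
            for s, row in measured_rate.items()}

    def cheapest_source(replica_sites, dest):
        """Pick the replica site with the lowest access cost for this destination."""
        return min(replica_sites, key=lambda s: cost[s][dest])

    print(cheapest_source(["ECDF", "QMUL"], "RAL"))   # -> QMUL with these numbers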

  16. FAX by destination cloud

  17. Testing and Monitoring https://twiki.cern.ch/twiki/bin/view/Atlas/MonitoringFax FAX-specific: http://uct3-xrdp.uchicago.edu:8080/rsv/ (basic status) http://atl-prod07.slac.stanford.edu:8080/display (previous page) Now in normal ATLAS monitoring too: • http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?view=FAXMON#currentView=FAXMON&fullscreen=true&highlight=false • http://dashb-atlas-xrootd-transfers.cern.ch/ui/

  18. ATLAS FAX structure • Topology of “regional” redirectors • Needs “Name2Name” LFC lookup (unlike CMS) (probably not needed soon, with Rucio)
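
For context, the sketch below shows the simple deterministic global-to-local mapping that CMS can rely on (and that ATLAS expects to get with Rucio); pre-Rucio ATLAS instead has to resolve the global name through an LFC (“Name2Name”) lookup. The paths are placeholders.

    # Illustrative deterministic name mapping (the CMS / Rucio-style case).
    # Pre-Rucio ATLAS replaces this with a catalogue (LFC) query.
    def name2name(global_lfn, site_prefix="/dpm/example.ac.uk/home/atlas"):
        """Map a federation-wide logical name onto a local namespace path."""
        return site_prefix + global_lfn

    print(name2name("/atlas/example_dataset/file.root"))
    # -> /dpm/example.ac.uk/home/atlas/atlas/example_dataset/file.root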
