Xrootd usage @ LHC

An up-to-date technical survey about xrootd-based storage solutions


  1. Xrootd usage @ LHC An up-to-date technical survey about xrootd-based storage solutions

  2. Outline
  • Intro
  • Main use cases in the storage arena
  • Generic pure xrootd @ LHC
  • The ATLAS@SLAC way
  • The ALICE way
  • CASTOR2
  • Roadmap
  • Conclusions

  3. Introduction and use cases

  4. The historical problem: data access
  • Physics experiments rely on rare events and statistics
    • A huge amount of data is needed to collect a significant number of events
    • The typical data store can reach 5-10 PB... now
    • Millions of files, thousands of concurrent clients
  • The transaction rate is very high
    • O(10^3) file opens/sec per cluster is not uncommon (average, not peak)
    • Traffic sources: local GRID site, local batch system, WAN
    • Up to O(10^4) clients per server!
  • If these requirements are not met, the outcome is crashes, instability, workarounds, the "need" for crazy things
  • What is needed: scalable, high-performance direct data access
    • No imposed limits on performance, size or connectivity
    • Higher performance, with support for direct data access over the WAN
    • Avoids worker-node under-utilization: no inefficient local copies when they are not needed
    • Do we fetch entire websites to browse one page?

  5. The challenges: LHC user analysis
  • Batch data access - sequential file access; basic analysis (today): RAW, ESD; remote access protocols (RAP: root, dcap, rfio, ...)
    • Boundary conditions: GRID environment, GSI authentication, user-space deployment
    • High I/O load, moderate namespace load
    • Many clients, O(1000-10000)
  • Interactive data access - sparse file access; advanced analysis (tomorrow): ESD, AOD, ntuples, histograms; T0/T3 @ CERN
    • Boundary conditions: CC environment, Kerberos, admin deployment
    • Preferred interface is MFS (mounted file systems): easy, intuitive, fast response, standard applications
    • Moderate I/O load, high namespace load (compilation, software startup, searches)
    • Fewer clients, O(#users)

  6. Main requirement
  • Data access has to work reliably at the desired scale
  • This also means: it must not waste resources

  7. A simple use case
  • I am a physicist, waiting for the results of my analysis jobs
    • Many bunches, several outputs, saved e.g. to an SE at CERN
    • My laptop is configured to show histograms etc. with ROOT
  • I leave for a conference; the jobs finish while I am on the plane
    • Once there, I want to simply draw the results from my home directory
    • Once there, I want to save my new histos in the same place
  • I have no time to lose tweaking things to get a copy of everything; I lose copies in the confusion
  • I want to leave things where they are; I know nothing about things to tweak
  • What can I expect? Can I do it?
  F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)

  8. Another use case
  • ALICE analysis on the GRID
    • Each job reads ~100-150 MB from ALICE::CERN::SE
    • These are conditions data accessed directly, not file copies, i.e. VERY efficient: a job reads only what it needs
    • It just works, no workarounds
    • At 10-20 MB/s it takes 5-10 s (the most common case); at 5 MB/s it takes 20 s; at 1 MB/s it takes 100 s
  • Sometimes data are accessed elsewhere
    • AliEn can save a job by making it read data from a different site, with very good performance
    • Quite often the results are written/merged elsewhere
  F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)
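  To make the "direct access, no local copies" point concrete, here is a minimal ROOT/C++ sketch of what such a job does. The URL and object name are invented for illustration and error handling is kept minimal; this is not taken from the actual ALICE job code.

    // Hypothetical sketch: URL and object name are invented for illustration.
    // The job opens the conditions file directly on the remote SE through the
    // xrootd client; only the byte ranges it actually reads travel over the network.
    #include "TFile.h"
    #include "TH1F.h"

    void read_conditions()
    {
       TFile *f = TFile::Open("root://voalicefs.cern.ch//alice/conditions/calib.root");
       if (!f || f->IsZombie()) return;        // fault tolerance is handled by the client layer
       TH1F *calib = nullptr;
       f->GetObject("calibration", calib);     // fetches just the baskets of this object
       if (calib) calib->Print();
       f->Close();
       delete f;
    }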

  9. Pure Xrootd

  10. xrootd plugin architecture
  [Layer diagram: protocol driver (XRD); protocol, 1 of n (xrootd); file system (ofs, sfs, alice, etc.); storage system (oss, drm/srm, etc.); clustering (cmsd); with pluggable authentication (gsi, krb5, etc.), name-based authorization and lfn2pfn prefix encoding]
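  As a toy illustration of the lfn2pfn step named in the diagram, the sketch below maps a logical file name to a physical path by simple prefix substitution. The prefix is an assumption; real deployments configure this mapping in the storage (oss) layer.

    // Toy sketch of lfn2pfn prefix encoding; the storage prefix is an assumption,
    // not the configuration of any real site.
    #include <string>

    std::string lfn2pfn(const std::string &lfn)
    {
       const std::string storage_prefix = "/data/xrootd";   // hypothetical local data directory
       return storage_prefix + lfn;   // e.g. /atlas/file.root -> /data/xrootd/atlas/file.root
    }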

  11. The client side
  • Fault tolerance in data access
    • Meets WAN requirements, reduces job mortality
  • Connection multiplexing (authenticated sessions)
    • Up to 65536 parallel r/w requests at once per client process
    • Up to 32767 open files per client process
    • Opens bunches of up to O(1000) files at once, in parallel
    • Full support for huge bulk prestages
  • Smart r/w caching
    • Supports normal read-ahead and "informed prefetching"
  • Asynchronous background writes
    • Boost writing performance in LAN/WAN
  • Sophisticated integration with ROOT
    • Reads in advance the "right" chunks while the app computes the preceding ones
    • Boosts read performance in LAN/WAN (up to the same order)
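  A hedged example of the ROOT-side integration mentioned in the last bullet: enabling ROOT's tree cache so the client can prefetch upcoming baskets in the background while the application processes the current entry. The file URL, tree name and cache size are illustrative values, not taken from the talk.

    // Illustrative only: URL, tree name and cache size are assumptions.
    // With the tree cache enabled, the xrootd client fetches upcoming
    // baskets in the background while the application computes.
    #include "TFile.h"
    #include "TTree.h"

    void cached_loop()
    {
       TFile *f = TFile::Open("root://someserver.example.org//store/analysis.root");
       if (!f || f->IsZombie()) return;
       TTree *t = nullptr;
       f->GetObject("Events", t);
       if (!t) return;
       t->SetCacheSize(30 * 1024 * 1024);     // 30 MB read cache
       t->AddBranchToCache("*", kTRUE);       // cache every branch used by the loop
       for (Long64_t i = 0; i < t->GetEntries(); ++i)
          t->GetEntry(i);                     // most reads are served from the prefetched cache
       f->Close();
       delete f;
    }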

  12. The Xrootd "protocol"
  • The XRootD protocol is a good one
    • Efficient, clean, supports fault tolerance, etc.
  • It doesn't do any magic, however
    • It does not multiply your resources, and it does not overcome hardware bottlenecks
    • BUT it allows the true usage of the hardware resources
  • One of the aims of the project is still software quality, in the carefully crafted pieces of software which come with the distribution
  • What makes the difference with Scalla/XRootD is:
    • the implementation details (performance + robustness); bad performance can hurt robustness, and vice versa
    • the Scalla software architecture (scalability + performance + robustness), designed to fit the HEP requirements
  • You need a clean design into which to insert it
    • Born with efficient direct access in mind, but with the requirements of high-performance computing
    • Copy-like access becomes a particular case

  13. Pure Xrootd @ LHC

  14. The ATLAS@SLAC way with XROOTD
  [Diagram: a Scalla cluster of xrootd/cmsd/cnsd data servers behind a firewall; clients mount it via FUSE, while an adapter layer (SRM + GridFTP) exposes it to the GRID]
  • Pure xrootd + an xrootd-based "filesystem" extension
  • Adapters to talk to BeStMan SRM and GridFTP
  • More details in A. Hanushevsky's talk @ CHEP09

  15. The ALICE way with XROOTD
  [Diagram: xrootd/cmsd clusters at CERN, GSI and any other xrootd site joined into one globalized cluster under the ALICE global redirector; local clients keep working normally at each site]
  • Pure xrootd + the ALICE strong-authz plugin; no difference between T1 and T2 (only size and QoS)
  • WAN-wide globalized deployment, very efficient direct data access
  • CASTOR at Tier-0 serving data, pure xrootd serving conditions data to the GRID jobs
  • "Old" DPM + xrootd in several Tier-2s
  • A virtual mass storage system, built on data globalization: missing a file? Ask the global redirector, get redirected to the right collaborating cluster, and fetch it. Immediately. (A smart client could also point at the global redirector directly.)
  More details and complete info in "Scalla/Xrootd WAN globalization tools: where we are." @ CHEP09
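  A minimal sketch of what the "virtual mass storage" fallback looks like from a client's perspective, assuming hypothetical redirector hostnames; in the real ALICE deployment this logic lives in the site services and framework rather than in user code.

    // Sketch under assumptions: both hostnames are invented, and in practice
    // the fallback to the global redirector is done by the site services,
    // not hand-coded in user jobs.
    #include "TFile.h"
    #include "TString.h"

    TFile *open_with_fallback(const char *lfn)
    {
       // Try the local site's redirector first
       TFile *f = TFile::Open(Form("root://local-redirector.example.org/%s", lfn));
       if (f && !f->IsZombie()) return f;
       delete f;
       // Missing locally? The global redirector knows which collaborating cluster has it
       return TFile::Open(Form("root://alice-global-redirector.example.org/%s", lfn));
    }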

  16. CASTOR2: putting everything together @ Tier0/1s

  17. The CASTOR way
  • The client connects to a redirector node and asks to open file X
  • The redirector asks CASTOR where the file is; CASTOR triggers a tape migration/recall if needed
  • The client is redirected to, and connects directly to, the disk server holding the data
  • CASTOR handles the tape backend behind the scenes
  Credits: S. Ponce (IT-DM)

  18. CASTOR 2.1.8: improving latency - read
  • First focus: file (read) open latencies
  [Chart, October 2008: estimated read-open latencies in ms (log scale, 1 to 1000) for CASTOR 2.1.7 (rfio), 2.1.8 (xroot) and 2.1.9 (xroot), compared against the network latency limit]
  Credits: A. Peters (IT-DM)

  19. CASTOR 2.1.8: improving latency - metadata
  • Next focus: metadata (read) latencies
  [Chart, October 2008: estimated stat latencies in ms (log scale, 1 to 1000) for CASTOR 2.1.7, 2.1.8 and 2.1.9, compared against the network latency limit]
  Credits: A. Peters (IT-DM)

  20. Prototype architecture: XCFS overview (xroot + FUSE)
  [Diagram: on the client, an application reaches the store either via the remote access protocol (a ROOT plugin) or via POSIX access to /xcfs for generic applications: glibc -> VFS -> /dev/fuse -> libfuse -> xcfsd, which sits on the XROOT POSIX library (libXrdPosix) and the XROOT client library (libXrdClient). The metadata server runs an xrootd daemon with a strong-auth plugin (libXrdSec<plugin>), the namespace provider (libXrdCatalogFs / libXrdCatalogOfs) and the authorization filesystem (libXrdCatalogAuthz); disk servers run xrootd with libXrdSecUnix on top of an XFS data filesystem, and access is granted through capabilities issued by the metadata server]
  Credits: A. Peters (IT-DM)
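  The point of the FUSE mount is that unmodified applications simply use standard POSIX I/O. A trivial, hypothetical example (the path under /xcfs is invented):

    // Once the FUSE client exposes the catalogue as /xcfs, any unmodified
    // program can use plain standard/POSIX I/O; the path below is invented.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
       std::ifstream in("/xcfs/user/somebody/notes.txt");   // hypothetical file under the mount point
       std::string line;
       while (std::getline(in, line))
          std::cout << line << '\n';   // each read is translated into xrootd requests by xcfsd
       return 0;
    }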

  21. Early prototype evaluation: metadata performance
  • Measured operations and rates (chart): file creation ~1,000/s, file rewrite ~2,400/s, file read ~2,500/s, rm ~3,000/s, readdir/stat access Σ = 70,000/s
  • These values have been measured by executing shell commands on 216 mount clients
  • Creation performance decreases as the namespace fills on a spinning medium; using an XFS filesystem over a DRBD block device in a high-availability setup, file creation performance stabilizes at 400/s (20 million files in the namespace)
  Credits: A. Peters (IT-DM)

  22. Network usage (or waste!)
  • Network traffic is an important factor: it has to match the ratio IO(CPU server) / IO(disk server)
  • Too much unneeded traffic means fewer clients supported (a serious bottleneck: 1 client works well, 100-1000 clients do not work at all)
  • Lustre doesn't disable read-ahead during forward-seeking access and transfers the complete file if reads are found in the buffer cache (the read-ahead window starts at 1 MB and scales up to 40 MB)
  • The XCFS/Lustre/NFS4 network volume without read-ahead is based on 4 KB pages in Linux
    • Most requests are not page aligned and cause additional pages to be transferred (avg. read size 4 KB), hence they transfer twice as much data (but XCFS can skip this now!); a back-of-the-envelope illustration follows below
  • A 2nd execution plays no real role for analysis, since datasets are usually bigger than the client buffer cache
  Credits: A. Peters (IT-DM), ACAT2008
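  A small sketch of the arithmetic behind the "twice as much data" claim: a page-granular transport ships every 4 KB page a read touches, so an unaligned 4 KB read usually costs about 8 KB on the wire.

    // Back-of-the-envelope illustration: an unaligned 4 KB read straddles two
    // 4 KB pages, so a page-granular transport moves about twice the payload.
    #include <cstdint>
    #include <iostream>

    std::uint64_t bytes_on_the_wire(std::uint64_t offset, std::uint64_t length,
                                    std::uint64_t page = 4096)
    {
       std::uint64_t first = offset / page;
       std::uint64_t last  = (offset + length - 1) / page;
       return (last - first + 1) * page;                    // whole pages are transferred
    }

    int main()
    {
       std::cout << bytes_on_the_wire(100, 4096) << '\n';   // 8192: unaligned, twice the payload
       std::cout << bytes_on_the_wire(0, 4096)   << '\n';   // 4096: page-aligned case
    }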

  23. CASTOR 2.1.8-6: cross-pool redirection
  [Diagram: xrootd servers of a T3 stager pool and a T0 stager pool, each with its own manager (xrootd + cmsd cluster management), subscribed to a common meta-manager that shares one name space]
  • Example configuration
    • T3 pool subscribed: r/w for /castor/user, r/w for /castor/cms/user/
    • T0 pool subscribed: ro for /castor, ro for /castor/cms/data
  • Why is that useful?
    • Users can access data by LFN without specifying the stager
    • Users are automatically directed to "their" pool with write permissions
  • There are even more possibilities if a part of the namespace can be assigned to individual pools for write operations (a routing sketch follows below)
  Credits: A. Peters (IT-DM)
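  To make the selection rule concrete, here is a hypothetical sketch of prefix-based pool routing in the spirit of the example configuration above; the structure, names and flags are invented and do not reflect the cmsd implementation.

    // Hypothetical sketch, not the cmsd implementation: pick the pool whose
    // exported namespace prefix matches the LFN and whose subscription allows
    // the requested access mode (r/w vs ro).
    #include <string>
    #include <vector>

    struct PoolExport {
       std::string pool;     // e.g. "t3-pool"
       std::string prefix;   // e.g. "/castor/cms/user/"
       bool        writable; // r/w subscription vs ro
    };

    std::string choose_pool(const std::vector<PoolExport> &exports,
                            const std::string &lfn, bool want_write)
    {
       std::string best;
       std::size_t best_len = 0;
       for (const auto &e : exports) {
          if (lfn.compare(0, e.prefix.size(), e.prefix) != 0) continue;  // prefix must match
          if (want_write && !e.writable) continue;                       // ro pools can't take writes
          if (e.prefix.size() >= best_len) { best = e.pool; best_len = e.prefix.size(); }
       }
       return best;   // empty: no subscribed pool serves this path/mode
    }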

  24. Towards a production version: further improvements - security
  • A GSI/VOMS authentication plugin prototype has been developed, based on pure OpenSSL
    • it additionally uses code from mod_ssl and libgridsite
    • it is significantly faster than the GLOBUS implementation
  • After the security workshop with A. Hanushevsky, a virtual socket layer was introduced into the xrootd authentication plugin base, to allow socket-oriented authentication over the xrootd protocol layer
  • The final version should be based on OpenSSL and the VOMS library
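  Purely as an illustration of the "virtual socket" idea (none of these names come from the actual plugin): socket-oriented code such as an SSL handshake is written against a socket-like interface whose bytes are carried inside xrootd authentication messages instead of a TCP connection.

    // Illustration only; these names are invented and do not reflect the real
    // xrootd security code. The handshake logic talks to a socket-like object,
    // while the transport underneath is the xrootd authentication exchange.
    #include <cstddef>

    class VirtualSocket {                 // hypothetical socket-like interface
    public:
       virtual ~VirtualSocket() = default;
       // Bytes written here would be packed into an xrootd auth credential...
       virtual long send(const void *buf, std::size_t len) = 0;
       // ...and bytes read here would come from the server's auth response.
       virtual long recv(void *buf, std::size_t len) = 0;
    };

    // Any socket-oriented handshake (e.g. an OpenSSL BIO pair) can then be
    // driven over this interface without a dedicated TCP connection.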

  25. The roadmap

  26. XROOT roadmap @ CERN
  • XROOT is strategic for scalable analysis support with CASTOR at CERN and the T1s
    • other file access protocols will be supported until they become obsolete
  • CASTOR
    • Secure RFIO has been released in 2.1.8; its deployment impact in terms of CPU may be significant
    • Secure XROOT is the default in 2.1.8 (Kerberos or X509); its CPU cost is expected to be lower than rfio's thanks to the session model
    • No plans to provide unauthenticated access via XROOT

  27. XROOTD roadmap
  • CASTOR: as on the previous slide
  • DPM
    • support for authentication via xrootd is scheduled to start certification at the beginning of July
  • dCache
    • relies on a custom full re-implementation of the XROOTD protocol
    • the protocol docs have been updated by A. Hanushevsky
    • in contact with the CASTOR/DPM teams to add authentication/authorisation on the server side
    • evaluating a common client plug-in / security protocol

  28. Conclusion
  • A very dense roadmap, with many, many technical details
  • Heading for solid, high-performance data access
    • for production and analysis
    • for more advanced user analysis scenarios
  • The need remains to match existing architectures, protocols and workarounds

  29. Thank you Questions?
