
Hyper-Scaling Xrootd Clustering


Presentation Transcript


  1. Hyper-Scaling Xrootd Clustering
  Andrew Hanushevsky
  Stanford Linear Accelerator Center, Stanford University
  29-September-2005
  http://xrootd.slac.stanford.edu
  ROOT 2005 Users Workshop, CERN, September 28-30, 2005

  2. Outline
  • Xrootd Single Server Scaling
  • Hyper-Scaling via Clustering
    • Architecture
    • Performance
  • Configuring Clusters
    • Detailed relationships
    • Example configuration
    • Adding fault-tolerance
  • Conclusion

  3. Latency Per Request (xrootd) [chart]

  4. Capacity vs Load (xrootd) [chart]

  5. xrootd Server Scaling
  • Linear scaling relative to load
    • Allows deterministic sizing of the server
      • Disk, NIC, network fabric, CPU, memory
  • Performance tied directly to hardware cost
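  A minimal Python sketch of what "deterministic sizing" under a linear scaling model can look like; every number in it (service time, worker count, NIC and disk throughput, read size) is an assumed placeholder rather than a measurement from these slides:

    # Hypothetical single-server sizing sketch (illustrative assumptions only).

    def max_request_rate(service_time_s, n_workers):
        # Linear model: each worker completes 1/service_time requests per second.
        return n_workers / service_time_s

    def capacity_bound(per_request_bytes, nic_bytes_per_s, disk_bytes_per_s, cpu_req_per_s):
        # The server saturates at whichever resource runs out first.
        nic_limit = nic_bytes_per_s / per_request_bytes
        disk_limit = disk_bytes_per_s / per_request_bytes
        return min(nic_limit, disk_limit, cpu_req_per_s)

    cpu_limit = max_request_rate(service_time_s=0.0005, n_workers=8)  # assume 0.5 ms/request, 8 workers
    capacity = capacity_bound(per_request_bytes=64 * 1024,            # assume 64 KB reads
                              nic_bytes_per_s=125e6,                  # assume ~1 Gb/s NIC
                              disk_bytes_per_s=200e6,                 # assume 200 MB/s of disk
                              cpu_req_per_s=cpu_limit)
    print(f"estimated capacity: {capacity:.0f} requests/s")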

  6. Hyper-Scaling
  • xrootd servers can be clustered
    • Increase access points and available data (complete scaling)
    • Allow for automatic failover (comprehensive fault-tolerance)
  • The trick is to do so in a way that
    • Cluster overhead (human & non-human) scales linearly
    • Allows deterministic sizing of the cluster
    • Cluster size is not artificially limited
    • I/O performance is not affected

  7. Basic Cluster Architecture
  • Software cross-bar switch
    • Allows point-to-point connections between client and data server
    • I/O performance not compromised, assuming switch overhead can be amortized
  • Scale interconnections by stacking switches
    • Virtually unlimited connection points
    • Switch overhead must be very low

  8. Single Level Switch [diagram]
  • Client asks the redirector (head node) to open file X
  • Redirector asks its data servers (A, B, C) "Who has file X?"; server C answers "I have file X"
  • Redirector caches the file location and tells the client "go to C"
  • Client opens file X directly on data server C; a 2nd open of X is answered from the cache
  • Client sees all servers as xrootd data servers
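  A toy Python sketch of the redirect flow in this diagram, assuming an in-memory stand-in for the cluster (the class, dictionaries, and method names are invented for illustration; this is not the real xrootd protocol or client API):

    # Toy single-level redirect flow: query servers, cache the location, redirect the client.
    class Redirector:
        def __init__(self, data_servers):
            self.data_servers = data_servers      # server name -> set of files it holds
            self.location_cache = {}              # file name -> server name

        def open(self, filename):
            # A 2nd open is answered straight from the location cache.
            if filename in self.location_cache:
                return self.location_cache[filename]
            # "Who has file X?" -- ask the data servers; a holder replies "I have".
            for name, files in self.data_servers.items():
                if filename in files:
                    self.location_cache[filename] = name
                    return name                   # the client is told "go to <name>"
            return None                           # nobody has the file

    redirector = Redirector({"A": {"f1"}, "B": {"f2"}, "C": {"X"}})
    print(redirector.open("X"))   # first open: broadcast query, answer "C"
    print(redirector.open("X"))   # second open: served from the cache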

  9. Two Level Switch [diagram]
  • Client asks the redirector (head node) to open file X
  • Redirector broadcasts "Who has file X?"; the supervisor (sub-redirector) C relays the query to its data servers (D, E, F)
  • F answers "I have file X", so C reports "I have" to the redirector
  • Client is told "go to C", reopens file X there, and is then told "go to F"
  • Client opens file X directly on data server F
  • Client sees all servers as xrootd data servers

  10. Making Clusters Efficient
  • Cell size, structure, & search protocol are critical
  • Cell size is 64
    • Limits direct inter-chatter to 64 entities
    • Compresses incoming information by up to a factor of 64
    • Can use very efficient 64-bit logical operations
  • Hierarchical structures usually most efficient
    • Cells arranged in a B-Tree (i.e., B64-Tree)
    • Scales as 64^h (where h is the tree height)
    • Client needs h-1 hops to find one of 64^h servers (2 hops for 262,144 servers)
    • Number of responses is bounded at each level of the tree
  • Search is a directed broadcast query / rarely-respond protocol
    • Provably the best scheme if less than 50% of servers have the wanted file
    • Generally true if the number of files >> cluster capacity
  • Cluster protocol becomes more efficient as cluster size increases
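  The B64-tree and 64-bit claims can be checked with a few lines of Python; the bitmask encoding below is only an illustration of "one bit per cell member" and not the actual olbd message format:

    # Smallest tree height h with 64**h >= n_servers (integer arithmetic, no rounding surprises).
    def tree_height(n_servers, cell_size=64):
        h, capacity = 0, 1
        while capacity < n_servers:
            capacity *= cell_size
            h += 1
        return h

    print(tree_height(4096))     # 2
    print(tree_height(262144))   # 3 -> a client needs h-1 = 2 redirection hops

    # 64-bit compression: one bit per cell member meaning "I have the file".
    def members_to_mask(members_with_file):
        mask = 0
        for slot in members_with_file:   # slot numbers 0..63 within one cell
            mask |= 1 << slot
        return mask

    mask = members_to_mask([3, 17, 42])
    print(f"{mask:#018x}")               # one 64-bit word summarizes up to 64 servers
    print(bool(mask & (1 << 17)))        # cheap "does member 17 have it?" test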

  11. Cluster Scale Management
  • Massive clusters must be self-managing
    • Scales as 64^n, where n is the height of the tree
    • Scales very quickly (64^2 = 4,096; 64^3 = 262,144)
    • Well beyond direct human management capabilities
  • Therefore clusters self-organize
    • Single configuration file for all nodes
    • Uses a minimal spanning tree algorithm
    • 280 nodes self-cluster in about 7 seconds
    • 890 nodes self-cluster in about 56 seconds
    • Most overhead is in wait time to prevent thrashing

  12. Clustering Impact
  • Redirection overhead must be amortized
    • This is a deterministic process for xrootd
    • All I/O is via point-to-point connections
    • Can trivially use single-server performance data
  • Clustering overhead is non-trivial
    • 100-200 µs additional for an open call
    • Not good for very small files or very short "open" durations
    • However, compatible with HEP access patterns
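  A back-of-the-envelope Python sketch of the amortization argument; only the 100-200 µs open overhead comes from this slide, while the link throughput and per-open read volumes are assumptions chosen for illustration:

    # How much does a 200 us redirect add, relative to the time spent reading data?
    REDIRECT_OVERHEAD_S = 200e-6    # worst case quoted on the slide
    LINK_BYTES_PER_S = 100e6        # assumed ~100 MB/s effective client-server throughput

    for read_mb in (0.1, 1, 100, 1000):
        transfer_s = read_mb * 1e6 / LINK_BYTES_PER_S
        overhead_pct = 100 * REDIRECT_OVERHEAD_S / (REDIRECT_OVERHEAD_S + transfer_s)
        print(f"{read_mb:7.1f} MB read per open -> redirect overhead ~{overhead_pct:.3f}%")

  The overhead dominates only when very little data is read per open, which matches the slide's point about small files and short open times.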

  13. Detailed Cluster Architecture [diagram: head node manager M at the root of a tree of cells]
  • A cell is 1 to 64 entities (servers or cells) clustered around a cell manager
  • The cellular process is self-regulating and creates a B-64 tree

  14. The Internal Details [diagram: xrootd / ofs / odc / olb component stack on each node, paired with an olbd]
  • Data network (xrootd): redirectors steer clients to data; data servers provide the data
  • Control network (olbd): managers, supervisors & servers exchange resource information and file locations

  15. Schema Configuration (xrootd / ofs / odc / olb)
  • Redirectors (head node):
      ofs.redirect remote
      odc.manager host port
      olb.role manager
      olb.port port
      olb.allow hostpat
  • Supervisors (sub-redirector):
      ofs.redirect remote
      ofs.redirect target
      olb.role supervisor
      olb.subscribe host port
      olb.allow hostpat
  • Data Servers (end-node):
      ofs.redirect target
      olb.role server
      olb.subscribe host port

  16. Example: SLAC Configuration [diagram: client machines contact the redirector kanrdr-a (hosts kanrdr01, kanrdr02) and are steered to data servers kan01, kan02, kan03, kan04, ..., kanxx; some details hidden]

  17. Configuration File (xrootd / ofs / odc / olb)

  if kanrdr-a+
     olb.role manager
     olb.port 3121
     olb.allow host kan*.slac.stanford.edu
     ofs.redirect remote
     odc.manager kanrdr-a+ 3121
  else
     olb.role server
     olb.subscribe kanrdr-a+ 3121
     ofs.redirect target
  fi

  18. Potential Simplification? (xrootd / ofs / odc / olb)

  Today's form:
  if kanrdr-a+
     olb.role manager
     olb.port 3121
     olb.allow host kan*.slac.stanford.edu
     ofs.redirect remote
     odc.manager kanrdr-a+ 3121
  else
     olb.role server
     olb.subscribe kanrdr-a+ 3121
     ofs.redirect target
  fi

  Possible simplification:
  olb.port 3121
  all.role manager if kanrdr-a+
  all.role server if !kanrdr-a+
  all.subscribe kanrdr-a+
  olb.allow host kan*.slac.stanford.edu

  Is the simplification really better? We're not sure; what do you think?

  19. Adding Fault Tolerance [diagram: each node runs an xrootd/olbd pair]
  • Manager (head node): fully replicate
  • Supervisor (intermediate node): hot spares
  • Data server (leaf node): data replication, restaging, proxy search*
  * xrootd has built-in proxy support today; discriminating proxies will be available in a near-future release.
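  As a conceptual sketch of what fully replicated head nodes buy the client, the Python below simply retries the connection against the next replica. It is illustrative only and not the xrootd client's actual failover logic; the hostnames are taken from the SLAC example and the port number is an assumption:

    import socket

    REDIRECTORS = [("kanrdr01.slac.stanford.edu", 1094),   # replicated head nodes (example hosts)
                   ("kanrdr02.slac.stanford.edu", 1094)]   # port is assumed for illustration

    def connect_to_any_redirector(redirectors, timeout_s=2.0):
        # Return a socket to the first reachable redirector; raise if all replicas are down.
        last_error = None
        for host, port in redirectors:
            try:
                return socket.create_connection((host, port), timeout=timeout_s)
            except OSError as err:          # this replica is unreachable; try the next one
                last_error = err
        raise ConnectionError(f"no redirector reachable: {last_error}")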

  20. Conclusion
  • High-performance data access systems are achievable
    • The devil is in the details
  • High performance and clustering are synergistic
    • Allows unique performance, usability, scalability, and recoverability characteristics
    • Such systems produce novel software architectures
  • Challenges
    • Creating applications that capitalize on such systems
  • Opportunities
    • Fast, low-cost access to huge amounts of data to speed discovery

  21. Acknowledgements
  • Fabrizio Furano, INFN Padova: client-side design & development
  • Principal collaborators: Alvise Dorigo (INFN), Peter Elmer (BaBar), Derek Feichtinger (CERN), Geri Ganis (CERN), Guenter Kickinger (CERN), Andreas Peters (CERN), Fons Rademakers (CERN), Gregory Sharp (Cornell), Bill Weeks (SLAC)
  • Deployment teams: FZK, DE; IN2P3, FR; INFN Padova, IT; CNAF Bologna, IT; RAL, UK; STAR/BNL, US; CLEO/Cornell, US; SLAC, US
  • US Department of Energy: Contract DE-AC02-76SF00515 with Stanford University
