This presentation analyzes the performance and scalability of xrootd, a distributed data access technology widely used in High Energy Physics. Contributors from SLAC, INFN, CERN, IN2P3, and the University of Wisconsin cover an architectural overview, single-server performance benchmarks, resource overhead, and server and administrative scalability. The xrootd system is highly efficient for large data transfers and can scale to clusters of over 256,000 servers, making it a vital component of modern data-intensive applications in the field.
Performance and Scalability of xrootd
Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC), Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin), Les Cottrell (SLAC), Yee Ting Li (SLAC)
• Computing in High Energy Physics
• 13-17 February 2006
• http://xrootd.slac.stanford.edu
• xrootd is largely funded by the US Department of Energy
• Contract DE-AC02-76SF00515 with Stanford University
Outline
• Architecture Overview
• Performance & Scalability
  • Single Server Performance
    • Speed, latency, and bandwidth
    • Resource overhead
  • Scalability
    • Server and administrative
• Conclusion
2: http://xrootd.slac.stanford.edu
xrootd Plugin Architecture — [layer diagram]
• Protocol Driver (XRD)
• Protocol (1 of n) (xrootd)
• File System (ofs, sfs, alice, etc)
• Storage System (oss, drm/srm, etc)
• Clustering (olbd)
• Side plugins: authentication (gsi, krb5, etc), lfn2pfn prefix encoding, authorization (name based)
(Diagram labels: "Performance" on the protocol-to-storage stack, "Scaling" on the clustering layer; an illustrative sketch of the layering follows.)
3: http://xrootd.slac.stanford.edu
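The layering above can be sketched in C++ along the following lines. This is only an illustration of the plugin idea, assuming hypothetical interface names (StorageSystem, FileSystem, XrootdProtocol); it does not reproduce the actual xrootd plugin classes or their signatures.

```cpp
// Illustrative sketch only: these interfaces are hypothetical and do not mirror
// the real xrootd C++ plugin classes; they show how a protocol layer can delegate
// to interchangeable file-system and storage-system plugins.
#include <memory>
#include <string>
#include <vector>
#include <cstdio>

// Lowest layer: storage system plugin (e.g. plain disk, or an MSS/SRM bridge).
struct StorageSystem {
    virtual ~StorageSystem() = default;
    virtual std::vector<char> read(const std::string& pfn, size_t off, size_t len) = 0;
};

// Middle layer: file system plugin; maps logical names (lfn) to physical ones (pfn)
// before delegating to whatever storage plugin was configured.
struct FileSystem {
    explicit FileSystem(std::unique_ptr<StorageSystem> ss) : storage(std::move(ss)) {}
    virtual ~FileSystem() = default;
    virtual std::vector<char> read(const std::string& lfn, size_t off, size_t len) {
        std::string pfn = "/data" + lfn;       // lfn2pfn prefix encoding (simplified)
        return storage->read(pfn, off, len);   // delegate to the storage plugin
    }
    std::unique_ptr<StorageSystem> storage;
};

// Top layer: the protocol handler loaded by the protocol driver; it only parses
// requests and forwards them, so every layer below can be swapped out.
struct XrootdProtocol {
    explicit XrootdProtocol(std::unique_ptr<FileSystem> fs) : filesys(std::move(fs)) {}
    void handleRead(const std::string& lfn, size_t off, size_t len) {
        auto data = filesys->read(lfn, off, len);
        std::printf("served %zu bytes of %s\n", data.size(), lfn.c_str());
    }
    std::unique_ptr<FileSystem> filesys;
};

// Trivial disk-backed storage plugin for the sketch.
struct DiskStorage : StorageSystem {
    std::vector<char> read(const std::string&, size_t, size_t len) override {
        return std::vector<char>(len, 0);      // stand-in for a real pread()
    }
};

int main() {
    XrootdProtocol proto(std::make_unique<FileSystem>(std::make_unique<DiskStorage>()));
    proto.handleRead("/store/events.root", 0, 4096);
}
```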
Performance Aspects
• Speed for large transfers
  • MB/sec
  • Random vs sequential
  • Synchronous vs asynchronous
  • Memory mapped (copy vs "no-copy")
• Latency for small transfers
  • µsec round trip time
• Bandwidth for scalability
  • "your favorite unit"/sec vs increasing load
(See the timing sketch below.)
4: http://xrootd.slac.stanford.edu
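A minimal, generic timing harness for the two measurements the slide distinguishes might look like this. It is a sketch under assumptions: `fakeRead` is a stand-in, and a real benchmark would issue xrootd client reads while controlling sequential vs random offsets and synchronous vs asynchronous delivery.

```cpp
// Hedged sketch: a generic client-side timing loop, not the talk's actual benchmark.
// It measures mean round-trip latency per small request (microseconds) and
// aggregate transfer speed (MB/s) for large reads.
#include <chrono>
#include <cstdio>
#include <functional>

struct Result { double mean_latency_us; double rate_mb_s; };

Result timeReads(const std::function<size_t(size_t)>& doRead,   // returns bytes read
                 size_t requests, size_t bytesPerRequest) {
    using clock = std::chrono::steady_clock;
    size_t total = 0;
    auto start = clock::now();
    for (size_t i = 0; i < requests; ++i)
        total += doRead(bytesPerRequest);
    double secs = std::chrono::duration<double>(clock::now() - start).count();
    return { secs * 1e6 / requests, total / (1024.0 * 1024.0) / secs };
}

int main() {
    // Stand-in read function; a real test would call the xrootd client here.
    auto fakeRead = [](size_t n) { return n; };

    Result small = timeReads(fakeRead, 100000, 2048);            // latency-dominated (~2 KB events)
    Result large = timeReads(fakeRead, 1000, 8 * 1024 * 1024);   // bandwidth-dominated
    std::printf("latency: %.1f us/request, speed: %.1f MB/s\n",
                small.mean_latency_us, large.rate_mb_s);
}
```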
Raw Speed I (sequential) — [throughput chart; curves approach the disk limit; "sendfile() anyone?"]
Test host: Sun V20z, 2x1.86GHz Opteron 244, 16GB RAM, Seagate ST373307LC 73GB 10K rpm SCSI
5: http://xrootd.slac.stanford.edu
Raw Speed II (random I/O) (file not preloaded) 6: http://xrootd.slac.stanford.edu
Latency Per Request 7: http://xrootd.slac.stanford.edu
Event Rate Bandwidth — [chart comparing event-rate bandwidth on three servers]
• NetApp FAS270: 1250 dual 650 MHz cpu, 1Gb NIC, 1GB cache, RAID 5 FC 140 GB 10k rpm
• Apple Xserve: 1Gb NIC, RAID 5 FC 180 GB 7.2k rpm
• Sun 280r: UltraSparc 3 dual 900MHz cpu, Solaris 8, Seagate ST118167FC
• Cost factor: 1.45
8: http://xrootd.slac.stanford.edu
Latency & Bandwidth
• Latency & bandwidth are closely related
  • Inversely proportional if linear scaling is present
  • The smaller the overhead, the greater the bandwidth
• Underlying infrastructure is critical
  • OS and devices
(The relation is made explicit below.)
9: http://xrootd.slac.stanford.edu
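Stated explicitly (this is the standard throughput-latency relation, added here for clarity rather than taken from the slides): with N concurrent clients, each issuing synchronous requests of size s and observing per-request latency L,

```latex
% Standard relation, not a formula from the talk itself.
\[
  \text{bandwidth} \;\approx\; \frac{N \cdot s}{L}
  \qquad\Longrightarrow\qquad
  \text{bandwidth} \;\propto\; \frac{1}{L} \ \text{at fixed } N, s .
\]
```

Halving the per-request overhead therefore doubles the achievable request rate, which is why the small-read latency numbers translate directly into the bandwidth-versus-load curves on the following slides.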
Server Scaling (Capacity vs Load) 10: http://xrootd.slac.stanford.edu
I/O Bandwidth (wide area network)
• SC2005 BW Challenge
• Latency ⇔ Bandwidth
• 8 xrootd servers: 4 @ SLAC & 4 @ Seattle
  • Sun V20z w/ 10Gb NIC, dual 1.8/2.6GHz Opterons, Linux 2.6.12
• 1,024 parallel clients (128 per server)
• 35Gb/sec peak (higher speeds killed the router)
• 2 full duplex 10Gb/s links
• Provided 26.7% of overall BW; challenge BW averaged 106Gb/sec over 17 monitored links total
[Monitoring plots: "SLAC to Seattle" and "Seattle to SLAC"; legend: BW Challenge, ESnet routed, ESnet SDN layer 2 via USN]
http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2005/hiperf.html
11: http://xrootd.slac.stanford.edu
xrootd Server Scaling
• Linear scaling relative to load
• Allows deterministic sizing of the server
  • Disk
  • NIC
  • CPU
  • Memory
• Performance tied directly to hardware cost
• Underlying hardware & software are critical
(An illustrative sizing calculation follows.)
12: http://xrootd.slac.stanford.edu
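As an illustration of what deterministic sizing means under linear scaling (the NIC capacity and per-client demand below are assumed for the example, not measurements from the talk):

```latex
% Illustrative only: the per-client demand and NIC figure are assumed, not measured.
\[
  \text{clients per server} \;\approx\;
  \frac{\text{NIC capacity}}{\text{per-client demand}}
  \;=\; \frac{120\ \text{MB/s}\ (\text{1 Gb NIC})}{10\ \text{MB/s}}
  \;=\; 12 .
\]
```

Because throughput stays linear in load until one resource saturates, the first limit reached (NIC, disk, CPU, or memory) fixes the server's capacity, so capacity per dollar follows directly from the hardware chosen.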
Overhead Distribution 13: http://xrootd.slac.stanford.edu
OS Effects 14: http://xrootd.slac.stanford.edu
Device & File System Effects — [chart]
• I/O limited vs CPU limited regions
• UFS good on small reads; VXFS good on big reads
• 1 Event ≈ 2K
15: http://xrootd.slac.stanford.edu
NIC Effects 16: http://xrootd.slac.stanford.edu
Super Scaling
• xrootd Servers Can Be Clustered
  • Support for over 256,000 servers per cluster
  • Open overhead of 100µs × log64(number of servers)
• Uniform deployment
  • Same software and configuration file everywhere
  • No inherent 3rd party software requirements
• Linear administrative scaling
• Effective load distribution
(See the worked overhead calculation below.)
17: http://xrootd.slac.stanford.edu
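The arithmetic behind the two numbers on this slide, worked out from the slide's own formula (the base-64 logarithm corresponds to a 64-way fan-out per clustering level):

```latex
% Worked from the slide's formula: open overhead = 100 us x log_64(number of servers).
\[
  64^3 = 262{,}144 \;\;\Rightarrow\;\; \text{``over } 256{,}000 \text{ servers per cluster''},
  \qquad
  \log_{64}\!\left(262{,}144\right) = 3
\]
\[
  \text{worst-case open overhead} \;\approx\; 100\,\mu\text{s} \times 3 \;=\; 300\,\mu\text{s}.
\]
```

Even at the maximum cluster size, locating a file therefore adds only a few hundred microseconds to an open.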
Cluster Data Scattering (usage) 18: http://xrootd.slac.stanford.edu
Cluster Data Scattering (utilization) 19: http://xrootd.slac.stanford.edu
Low Latency Opportunities
• New programming paradigm
  • Ultra-fast access to small random blocks
  • Accommodate object data
• Memory I/O instead of CPU to optimize access
  • Allows superior ad hoc object selection
• Structured clustering to scale access to memory
  • Multi-Terabyte memory systems at commodity prices
  • PetaCache Project
• SCALLA: Structured Cluster Architecture for Low Latency Access
• Increased data exploration opportunities
(A minimal memory-mapped I/O sketch follows.)
20: http://xrootd.slac.stanford.edu
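The "memory I/O" point can be illustrated with a plain memory-mapped file, which serves small random blocks without a per-request copy. This is a generic sketch of the technique, not the PetaCache or SCALLA implementation; the 2 KB event size is taken from the earlier slide.

```cpp
// Hedged sketch of the "memory I/O" idea: serve small random blocks from a
// memory-mapped file instead of issuing a read()+copy for every request.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the whole file once; after this, a "read" of any 2 KB event is just
    // pointer arithmetic plus (at worst) a page fault, with no extra data copy.
    void* mapping = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (mapping == MAP_FAILED) { perror("mmap"); return 1; }
    const char* base = static_cast<const char*>(mapping);

    const size_t eventSize = 2048;                        // ~2 KB per event, as on the slides
    size_t nEvents = static_cast<size_t>(st.st_size) / eventSize;

    // Touch a sample of pseudo-random events; with the file resident in memory
    // this costs microseconds per access, not disk-seek milliseconds.
    unsigned long checksum = 0;
    for (size_t i = 0; i < 1000 && nEvents > 0; ++i) {
        const char* event = base + (i * 7919 % nEvents) * eventSize;
        checksum += static_cast<unsigned char>(event[0]);
    }
    std::printf("touched 1000 events, byte checksum = %lu\n", checksum);

    munmap(mapping, st.st_size);
    close(fd);
    return 0;
}
```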
Memory Access Characteristics — [two charts, each with Disk I/O and Mem I/O curves]
• Block size effect on average overall latency per I/O (1 job, 100k I/O's)
• Scaling effect on average overall latency vs clients (5 - 40 jobs)
21: http://xrootd.slac.stanford.edu
Conclusion
• System performs far better than we anticipated. Why?
  • Excruciating attention to detail: protocols, algorithms, and implementation
  • Effective software collaboration
    • INFN/Padova: Fabrizio Furano, Alvise Dorigo
    • ROOT: Fons Rademakers, Gerri Ganis
    • Alice: Derek Feichtinger, Guenter Kickinger
    • Cornell: Gregory Sharp
    • SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
    • BaBar: Pete Elmer
  • Critical operational collaboration: BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
  • Commitment to "the science needs drive the technology"
22: http://xrootd.slac.stanford.edu