
R&D Activities on Storage in CERN-IT’s FIO group



Presentation Transcript


  1. R&D Activities on Storage in CERN-IT’s FIO group
  Helge Meinhard / CERN-IT
  HEPiX Fall 2009, LBNL
  27 October 2009

  2. Outline
  Follow-up of two presentations given at the Umeå meeting:
  • iSCSI technology (Andras Horvath)
  • Lustre evaluation project (Arne Wiebalck)

  3. iSCSI - Motivation
  • Three approaches:
    • Possible replacement for rather expensive setups with Fibre Channel SANs (used e.g. for physics databases with Oracle RAC and for the backup infrastructure) or proprietary high-end NAS appliances
      • Potential cost saving
    • Possible replacement for bulk disk servers (Castor)
      • Potential gain in availability, reliability and flexibility
    • Possible use for applications for which small disk servers have been used in the past
      • Potential gain in flexibility, cost saving
  • Focus is on functionality, robustness and large-scale deployment rather than ultimate performance

  4. iSCSI terminology
  • iSCSI is a set of protocols for block-level access to storage
    • Similar to FC
    • Unlike NAS (e.g. NFS)
  • “Target”: storage unit listening to block-level requests
    • Appliances available on the market
    • Do-it-yourself: put a software stack on a storage node, e.g. our storage-in-a-box nodes
  • “Initiator”: unit sending block-level requests (e.g. read, write) to the target; a login example follows below
    • Most modern operating systems feature an iSCSI initiator stack: Linux (RHEL 4, RHEL 5), Windows
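For illustration, a minimal sketch of the initiator side using the open-iscsi tools shipped with RHEL/SLC 5: discovering a portal and logging in to a target. The portal address and the IQN are invented examples, not actual CERN names.

    # Discover the targets offered by a portal (address is a placeholder)
    iscsiadm -m discovery -t sendtargets -p 192.168.100.10

    # Log in to one of the discovered targets (IQN is a made-up example)
    iscsiadm -m node -T iqn.2009-10.ch.cern:sib01.disk1 -p 192.168.100.10 --login

    # The exported LUN then appears as a normal block device (e.g. /dev/sdb)
    # and can be partitioned, formatted and mounted like local storage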

  5. Hardware used
  • Initiators: a number of different servers, including
    • Dell M610 blades
    • Storage-in-a-box servers
    • All running SLC5
  • Targets:
    • Dell EqualLogic PS5000E (12 drives, 2 controllers with 3 GigE ports each)
    • Dell EqualLogic PS6500E (48 drives, 2 controllers with 4 GigE ports each)
    • Infortrend A12E-G2121 (12 drives, 1 controller with 2 GigE ports)
    • Storage-in-a-box: various models with multiple GigE or 10 GigE interfaces, running Linux
  • Network (where required): private, HP ProCurve 3500 and 6600

  6. Target stacks under Linux
  • Red Hat Enterprise Linux 5 comes with tgtd
    • Single-threaded
    • Does not scale well
  • Tests with IET (iSCSI Enterprise Target)
    • Multi-threaded
    • No performance limitation in our tests
    • Required a newer kernel to work out of the box (Fedora and Ubuntu Server worked for us)
  • In the context of the collaboration between CERN and Caspur, work is going on to understand the steps needed to backport IET to RHEL 5
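As a reference for the two stacks, a rough sketch of exporting one local block device; the device path and IQN are invented for illustration. With tgtd (scsi-target-utils, as shipped with RHEL 5):

    # Create target 1 and attach /dev/sdb as LUN 1
    tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2009-10.ch.cern:sib01.disk1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
    # Accept connections from any initiator (restrict this in real deployments)
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

With IET, the same export is described in /etc/ietd.conf, roughly:

    Target iqn.2009-10.ch.cern:sib01.disk1
        Lun 0 Path=/dev/sdb,Type=blockio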

  7. Performance comparison
  • 8 kB random I/O test with the Oracle Orion tool
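The slide does not give the exact Orion parameters; purely as an illustration, a small-random-I/O run with Orion is typically started along these lines, where mytest.lun is a text file listing the block devices under test (test name, device list and disk count are all assumptions):

    # mytest.lun contains the devices to exercise, one per line (e.g. /dev/sdb)
    ./orion -run simple -testname mytest -num_disks 12
    # the 'simple' run includes small random I/O (8 kB by default);
    # results are written to mytest_* summary and CSV files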

  8. Performance measurement
  • 1 server, 3 storage-in-a-box servers as targets
  • Each target exporting 14 JBOD disks over 10 GigE

  9. Almost production status…
  • Two storage-in-a-box servers with hardware RAID 5, running SLC5 and tgtd, on GigE
    • The initiator provides multipathing and software RAID 1 (see the sketch below)
    • Used for some grid services
    • No issues
  • Two Infortrend boxes (JBOD configuration)
    • Again, the initiator provides multipathing and software RAID 1
    • Used as backend storage for the Lustre MDT (see the next part)
  • Tools for setup, configuration and monitoring are in place
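A minimal sketch of the "multipathing plus software RAID 1 on the initiator" arrangement, assuming two iSCSI LUNs that each show up as a dm-multipath device; device names are examples only:

    # Verify that dm-multipath has aggregated the paths to each LUN
    multipath -ll

    # Mirror the two multipath devices with Linux md (software RAID 1)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/mapper/mpath0 /dev/mapper/mpath1

    # /dev/md0 can then be formatted and used like a local disk
    mkfs.ext3 /dev/md0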

  10. Being worked on
  • Large deployment of EqualLogic ‘Sumos’ (48 drives of 1 TB each, dual controllers, 4 GigE ports per controller): 24 systems, 48 front-end nodes
  • Experience encouraging, but there are issues:
    • Controllers don’t support DHCP, manual configuration required
    • Buggy firmware
    • Problems with the batteries on the controllers
    • Support not yet fully integrated into Dell’s structures
  • Remarkable stability
    • We have failed every network and server component that can fail; the boxes kept running
  • Remarkable performance

  11. EqualLogic performance
  • 16 servers, 8 Sumos, 1 GigE per server, iozone
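The iozone options used for the measurement are not on the slide; as an example only, a streaming test on one client could be started roughly as follows (record size, file size, thread count and paths are assumptions):

    # Sequential write/rewrite (-i 0) and read/reread (-i 1), 1 MB records,
    # 8 GB per file, two threads writing to the iSCSI-backed filesystem
    iozone -i 0 -i 1 -r 1m -s 8g -t 2 \
           -F /mnt/eql/iozone.1 /mnt/eql/iozone.2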

  12. Appliances vs. home-made
  • Appliances
    • Stable
    • Performant
    • Highly functional (EqualLogic: snapshots, relocation without server involvement, automatic load balancing, …)
  • Home-made with storage-in-a-box servers
    • Inexpensive
    • Complete control over the configuration
    • Can run things other than the target software stack
    • Can select the function at software installation time (iSCSI target vs. classical disk server with rfiod or xrootd)

  13. Ideas (testing partly started)
  • Two storage-in-a-box servers as a highly redundant setup
    • Running target and initiator stacks at the same time
    • Mounting half the disks locally, half on the other machine
    • A heartbeat detects failures and moves the functionality to one box or the other, e.g. by resetting an IP alias (a minimal sketch follows below)
  • Several storage-in-a-box servers as targets
    • Exporting disks either as JBOD or as RAID
    • A front-end server creates a software RAID (e.g. RAID 6) over volumes from all storage-in-a-box servers
    • Any one storage-in-a-box server (or two with software RAID 6) can fail entirely and the data remain available
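Neither idea is in production; the following only sketches the building blocks. The failover part assumes a trivial ping-based watchdog rather than a real heartbeat product, and the RAID 6 part assumes six iSCSI LUNs already logged in as /dev/sdc to /dev/sdh; all addresses and device names are invented:

    # Idea 1: crude watchdog on one box - if the partner stops answering,
    # take over its service IP alias (addresses are placeholders)
    while true; do
        if ! ping -c 3 -W 2 192.168.100.11 > /dev/null; then
            ip addr add 192.168.100.100/24 dev eth0 2>/dev/null  # take over alias
        fi
        sleep 10
    done

    # Idea 2: software RAID 6 across volumes exported by six different
    # storage-in-a-box targets; any two of them may fail entirely
    mdadm --create /dev/md10 --level=6 --raid-devices=6 /dev/sd[c-h]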

  14. Lustre Evaluation Project
  • Tasks and goals:
    • Evaluate Lustre as a candidate for storage consolidation
      • Home directories
      • Project space
      • Analysis space
      • HSM
    • Reduce the service catalogue
    • Increase the overlap between service teams
    • Integrate with the CERN fabric management tools

  15. Areas of interest (1/2)
  • Installation
    • Quattorized installation of Lustre instances
    • Client RPMs for SLC5
  • Backup
    • LVM-based snapshots for the metadata (see the sketch below)
    • Tested with TSM, set up for the PPS instance
    • Changelogs feature of v2.0 not yet usable
  • Strong authentication
    • v2.0: early adaptation, full Kerberos support in Q1/2011
    • Tested and used by other sites (not by us yet)
  • Fault tolerance
    • Lustre comes with built-in failover
    • PPS MDS iSCSI setup
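A minimal sketch of the LVM-based metadata backup, assuming the MDT lives on a logical volume /dev/vg_mdt/mdt and that the TSM client (dsmc) performs the actual backup; volume and mount-point names are assumptions:

    # Snapshot the MDT logical volume
    lvcreate --snapshot --size 2G --name mdt_snap /dev/vg_mdt/mdt

    # Mount the snapshot read-only (ldiskfs is Lustre's ext3-based backing
    # filesystem) and back it up with TSM
    mount -t ldiskfs -o ro /dev/vg_mdt/mdt_snap /mnt/mdt_snap
    dsmc incremental /mnt/mdt_snap/

    # Clean up
    umount /mnt/mdt_snap
    lvremove -f /dev/vg_mdt/mdt_snap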

  16. FT: MDS PPS Setup
  [Diagram: MDS/MDT and OSS nodes on Dell PowerEdge M600 blade servers (16 GB RAM), Dell EqualLogic iSCSI arrays (16x 500 GB SATA) on a private iSCSI network, plus clients (CLT)]
  • Fully redundant against component failure
  • iSCSI for shared storage
  • Linux device mapper + md for mirroring
  • Quattorized
  • Needs testing
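A rough sketch of how such a failover pair can be declared to Lustre (1.8-style options), assuming the md-mirrored iSCSI device /dev/md0 from above and two MDS blades mds1 and mds2; the filesystem name, NID and mount point are invented:

    # Format the shared device as combined MGS/MDT and register the
    # second blade (NID 10.1.1.2@tcp0) as failover node
    mkfs.lustre --fsname=pps --mgs --mdt --failnode=10.1.1.2@tcp0 /dev/md0

    # Normal operation: the MDT is mounted on mds1
    mount -t lustre /dev/md0 /mnt/pps-mdt

    # If mds1 fails, the same device is mounted on mds2 instead
    # (only one node may have it mounted at any time)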

  17. Areas of interest (2/2)
  • Special performance and optimization
    • Small files: “numbers dropped from slides”
    • Postmark benchmark (not done yet; a possible run is sketched below)
  • HSM interface
    • Active development, driven by CEA
    • Access to the Lustre HSM code (to be tested with TSM/CASTOR)
  • Life cycle management (LCM) and tools
    • Support for day-to-day operations?
    • Limited support for setup, monitoring and management
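Postmark has not been run yet; for orientation only, a run is usually driven by a short command file along these lines (all numbers and paths are placeholders):

    # pm.cfg contains the benchmark commands, e.g.:
    #   set location /mnt/lustre/postmark
    #   set number 10000
    #   set transactions 50000
    #   set size 500 10000
    #   run
    #   quit
    postmark pm.cfg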

  18. Findings and Thoughts
  • No strong authentication as of now
    • Foreseen for Q1/2011
  • Strong client/server coupling
    • Recovery
  • Very powerful users
    • Striping, pools
  • Missing support for life cycle management
    • No user-transparent data migration
    • Lustre/kernel upgrades difficult
  • Moving targets on the roadmap
    • v2.0 not yet stable enough for testing

  19. Summary
  • Some desirable features are not there (yet)
    • Wish list communicated to Sun
    • Sun interested in the evaluation
  • Some more tests to be done
    • Kerberos, small files, HSM
    • Documentation
