
CERN Data Services Update


Presentation Transcript


  1. CERN Data Services Update
  HEPiX 2004 / NeSC Edinburgh
  Data Services team: Vladimír Bahyl, Hugo Caçote, Charles Curran, Jan van Eldik, David Hughes, Gordon Lee, Tony Osborne, Tim Smith

  2. Outline
  • Data Services Drivers
  • Disk Service
    • Migration to Quattor / LEMON
    • Future directions
  • Tape Service
    • Media migration
    • Future directions
  • Grid Data Services

  3. Data Flows
  • Tier-0 / Tier-1 for the LHC
  • Data Challenges:
    • CMS DC04 (finished); PCP05 (Autumn) +80 TB; +170 TB
    • ALICE ongoing +137 TB
    • LHCb ramping up +40 TB
    • ATLAS ramping up +60 TB
  • Fixed Target Programme:
    • NA48 at 80 MB/s +200 TB
    • COMPASS at 70 MB/s (peak 120) +625 TB
    • nToF at 45 MB/s +180 TB
    • NA60 at 15 MB/s +60 TB
    • Testbeams at 1~5 MB/s (x 5)
  • Analysis…
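  A quick back-of-envelope check of how these sustained rates map onto the accumulated volumes quoted above; a Python sketch in which the 30-day running period is an assumption, and the larger totals simply reflect longer runs:

    SECONDS_PER_DAY = 86_400

    def tb_accumulated(rate_mb_s: float, days: float = 30.0) -> float:
        """Convert a sustained rate in MB/s into TB accumulated over `days`."""
        return rate_mb_s * 1e6 * SECONDS_PER_DAY * days / 1e12

    # NA48 at a sustained 80 MB/s lands close to its +200 TB figure;
    # the other totals differ mainly through longer or shorter run periods.
    print(f"NA48, 30 days at 80 MB/s: {tb_accumulated(80):.0f} TB")  # ~207 TB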

  4. Disk Server Functions

  5. Generations
  • 0th: Jumbos
  • 1st & 2nd: 4U
  • 3rd & 4th: 8U

  6. Warranties

  7. Disk Servers: Jan 2004
  • 370 EIDE disk servers
    • Commodity storage in a box
  • 544 TB of disk capacity
  • 6700 spinning disks
  • Storage configuration
    • HW RAID-1 mirrored for “maximum reliability”
    • ext2 file systems
  • Operating systems
    • RH 6.1, 6.2, 7.2, 7.3, RHES
    • 13 different kernels
  • Application uniformity: CASTOR SW

  8. Quattor-ising
  • Motivation: scale
    • Uniformity; manageability; automation
  • Configuration description (into CDB)
    • HW and SW; nodes and services
  • Reinstallation
    • Production machines: minimal service interruption!
  • Eliminate peculiarities from CASTOR nodes
    • MySQL, web servers
  • Refocus root control
    • Quiescing a disk server ≠ draining a batch node! (see the sketch after this list)
  • Gigabit card gymnastics
  • (ext2 -> ext3)
  • Complete (except 10 RH6 boxes for Objectivity)
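  Because quiescing a disk server is not like draining a batch node, a reinstallation has to wait out client transfers while staged files stay readable. A minimal sketch of that sequence; set_castor_state and active_transfers are hypothetical stand-ins, not real Quattor or CASTOR APIs:

    import time

    # Hypothetical stand-ins, not Quattor or CASTOR APIs.
    def set_castor_state(node: str, state: str) -> None:
        """Mark the server's state in the stager so no new files land on it."""
        print(f"{node}: state -> {state}")

    def active_transfers(node: str) -> int:
        """Count client transfers still running against this server."""
        return 0  # stub so the sketch is runnable

    def reinstall(node: str) -> None:
        # Quiescing a disk server ≠ draining a batch node: staged files
        # must stay readable until every client transfer has completed.
        set_castor_state(node, "DRAINING")
        while active_transfers(node) > 0:
            time.sleep(60)
        set_castor_state(node, "DOWN")
        # The box can now PXE-boot and reinstall from its Quattor CDB
        # profile with no visible interruption to the CASTOR service.

    reinstall("lxfsrk001")  # hypothetical node name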

  9. LEMON-ising
  • MSA everywhere
    • Linux box monitoring and alarms
    • Automatic HW static checks
  • Adding
    • CASTOR server specific
    • Service monitoring
  • HW monitoring
    • lm_sensors (see tape section)
    • smartmontools
      • smartd deployment: kernel issues; firmware bugs; through 3ware controller
      • smartctl auto checks; predictive monitoring (see the sketch after this list)
  • IPMI investigations, especially remote access
    • Remote reset / power-on / power-off
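  A minimal sketch of the kind of predictive check smartd automates, assuming smartmontools is installed; the device names are illustrative, and disks behind a 3ware controller need the "-d 3ware,N" device type instead:

    import subprocess

    def smart_health(device: str) -> bool:
        """Return True if smartctl's overall-health check reports PASSED."""
        out = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True,
        ).stdout
        return "PASSED" in out

    for dev in ["/dev/hda", "/dev/hdc"]:  # illustrative EIDE device names
        if not smart_health(dev):
            print(f"ALARM: {dev} predicts failure -- schedule an exchange")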

  10. Disk Replacement
  • Western Digital, type DUA: head instabilities
  • Failure rate unacceptably high
  • 10 months to be believed; 4 weeks to execute
  • 1224 disks exchanged (out of 6700), and the cages

  11. Disk Storage Futures
  • EIDE commodity storage in a box
    • Production systems: HW RAID-1 / ext3
    • Pilots (15 production systems): HW RAID-5 + SW RAID-0 / XFS (see Jan Iven’s talk next, and the capacity sketch after this list)
  • New tenders out…
    • 30 TB SATA in a box
    • 30 TB external SATA disk arrays
  • New CASTOR stager (see Olof’s talk)
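  To see why the RAID-5 + RAID-0 pilot is attractive capacity-wise, a rough comparison assuming a hypothetical box of 16 x 250 GB disks; the counts and sizes are illustrative, not the tender specification:

    # Hypothetical box: 16 data disks of 250 GB each (illustrative only).
    disks, size_gb = 16, 250

    raid1_usable = disks // 2 * size_gb          # mirrored pairs: 50% usable
    # Two 8-disk HW RAID-5 sets (one parity disk each), striped by SW RAID-0:
    raid50_usable = 2 * (8 - 1) * size_gb        # 14/16 = 87.5% usable

    print(f"HW RAID-1            : {raid1_usable} GB usable")   # 2000 GB
    print(f"HW RAID-5 + SW RAID-0: {raid50_usable} GB usable")  # 3500 GB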

  12. Tape Service
  • 70 tape servers (Linux)
  • (mostly) single FibreChannel-attached drives
  • 2 symmetric robotic installations
    • 5 x STK 9310 silos in each
  (charts: Drives / Media)

  13. Tape Server Temperatures
  • lm_sensors package: general SMBus access and hardware monitoring
  • Used to access:
    • LM87 chip: fan speeds, voltages, int/ext temperatures
    • ADM1023 chip: int/ext temperatures
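  A minimal sketch of collecting these readings by parsing the output of the lm_sensors "sensors" utility; it assumes the package is installed and the LM87/ADM1023 drivers are loaded, and the label patterns and alarm threshold are assumptions:

    import re
    import subprocess

    # Parse "sensors" output lines such as "temp1:  +42.0°C  (high = +60°C)".
    out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
    for line in out.splitlines():
        m = re.match(r"\s*(temp\w*|fan\w*|in\w*):\s*\+?([\d.]+)", line)
        if m:
            label, value = m.group(1), float(m.group(2))
            if label.startswith("temp") and value > 50.0:  # assumed threshold
                print(f"WARNING: {label} reads {value:.1f} C")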

  14. Tape Server Temperatures

  15. Media Migration
  • To 9940B (mainly from 9940A)
  • 200 GB: the extra capacity avoids unnecessary acquisitions
  • Better performance, though hard to benefit in normal chaotic mode
  • Reduced errors; fewer interventions
  • 1-2% of A tapes cannot be read (or only extremely slowly) on B drives
  • Have not yet been able to return all A drives

  16. Tape Service Developments
  • Removing tails…
    • Tracking of all tape errors (18 months)
    • Retiring of problematic media
    • Proactive retiring of heavily used media (>5000 mounts); repack on new media
  • Checksums (see the sketch after this list)
    • Populated when writing to tape
    • Verified when loading back to disk
    • 22% coverage already after a few weeks
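  A sketch of the populate-then-verify checksum cycle, using zlib's Adler-32 as an example algorithm; the slide does not name the checksum CASTOR records, so treat that choice and the file path as illustrative:

    import zlib

    def file_checksum(path: str, chunk: int = 1 << 20) -> int:
        """Stream the file and return its Adler-32 checksum."""
        value = 1  # Adler-32 initial value
        with open(path, "rb") as f:
            while data := f.read(chunk):
                value = zlib.adler32(data, value)
        return value

    # Create a small demo file so the sketch runs end to end.
    with open("/tmp/example.dat", "wb") as f:
        f.write(b"demo payload " * 1000)

    # On write: compute and store alongside the tape-segment metadata.
    stored = file_checksum("/tmp/example.dat")

    # On recall to disk: recompute and compare before handing data back.
    if file_checksum("/tmp/example.dat") != stored:
        raise IOError("checksum mismatch: flag the media for retirement")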

  17. Water Cooled Tapes!
  • Plumbing error!
  • 5000 tapes disabled for a few days
    • 550 superficially wet
    • 152 seriously wet – visually inspected

  18. Tape Storage Futures
  • Commodity drive studies
    • LTO-2 (collaboratively with CASPUR/Valencia)
  • Test and evaluate high-end drives
    • IBM 3592
    • STK NGD
  • Other STK offerings
    • SL8500 robotics and silos
    • Indigo: managed storage, tape virtualisation

  19. GRID Data Management
  • GridFTP + SRM servers (former setup)
    • Standalone / experiment-dedicated
    • Hard to intervene; not scalable
  • New load-balanced 6-node service: castorgrid.cern.ch (see the lookup sketch after this list)
    • SRM modifications to operate behind the load balancer
    • GridFTP standalone client
    • Retire ftp and bbftp access to CASTOR
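  How a client ends up on one of the six nodes: assuming castorgrid.cern.ch is published as a DNS-load-balanced alias (the single service name suggests this, but the mechanism is an assumption here), an ordinary lookup is all a GridFTP or SRM client needs:

    import socket

    alias = "castorgrid.cern.ch"
    try:
        name, _aliases, addresses = socket.gethostbyname_ex(alias)
        # The alias publishes several A records; the DNS server reorders
        # them (e.g. by load), so clients simply take the first address.
        print(f"{alias} -> {addresses[0]} (of {len(addresses)} published)")
    except socket.gaierror:
        print("lookup failed: retry the alias, never pin a fixed node name")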
