
Jefferson Lab Site Report





  1. Jefferson Lab Site Report
     Kelvin Edwards
     Thomas Jefferson National Accelerator Facility
     Newport News, Virginia, USA
     Kelvin.Edwards@jlab.org
     757-269-7770
     http://cc.jlab.org
     HEPiX - TRIUMF, Oct. 20, 2003

  2. Central Computing
     • Sun systems
       • Upgrade to Solaris 8 almost complete
     • HP systems
       • All upgraded to HP-UX 11i
       • Moving away from HP for central services
     • Linux systems
       • Still at Red Hat 7.2
       • Evaluating Red Hat 10 (Fedora 1)
     • Windows 2000 domain upgrade
       • Implemented in May
       • Working on Group Policy issues

  3. Central Computing (cont.)
     • Network Appliance
       • 2 recently upgraded to the FAS940 (~16k NFS ops/sec)
       • ~4.5 TB online disk space (1.5 TB home, 2 TB group)
     • Linux fileserver
       • 3ware SATA system
       • 2 TB scratch area (16 × 160 GB Seagate SATA drives)
     • Backups
       • QuickRestore
       • Seagate LTO drives, Overland tape library
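A scratch area like the 2 TB one above is typically served to client nodes over NFS. A minimal sketch of the export — the mount point, hostname pattern, and options here are illustrative assumptions, not taken from the slides:

```
# /etc/exports — hypothetical entry exporting a scratch area to farm nodes.
# Path, host pattern, and options are assumptions for illustration.
/scratch  farm*.jlab.org(rw,sync,no_root_squash)
```

After editing the file, `exportfs -ra` makes the running NFS server pick up the change.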

  4. Scientific Computing
     • JASMine & Auger (http://cc.jlab.org/scicomp)
       • JASMine: mass storage (tape + disk cache)
       • Auger: batch farm management & monitoring
     • Typical day
       • 2 - 4 TB of input data through the farm
       • 2000 - 5000 jobs processed
     • Certificates used for all user authentication
     • Tape drives
       • 6 9840s: migrating their data to 9940Bs
       • 13 9940As: read-only
       • 15 9940Bs: all new data written to these tapes
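Auger farm jobs are described by small plain-text command files handed to its submission tool. The sketch below shows that style only; every keyword and value here is a hypothetical assumption for illustration, not taken from the slides or from Auger's actual documentation:

```
# Hypothetical Auger job description (submitted with something like "jsub <file>").
# All keywords and values are illustrative assumptions.
PROJECT: myexperiment
TRACK:   reconstruction
COMMAND: ./recon.sh run_12345
```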

  5. Scientific Computing (cont.)
     • Linux file servers
       • 16 data movers
         • 10 Mylex eXtremeRAID 2000 cards (RAID-5, SCSI)
         • 6 Adaptec 2200S RAID cards (RAID-50, U320 SCSI)
       • 32 cache/work file servers
         • Mixture of Mylex and 3ware cards
     • Batch farming: over 24,000 SPECint95 under LSF
       • 178 Red Hat 7.2 dual-processor nodes (P2 750 MHz to P4 2.66 GHz)
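A quick back-of-envelope check of the farm figures above — 24,000 SPECint95 spread across 178 dual-processor nodes — gives a rough sense of per-node capacity:

```shell
# Average SPECint95 per farm node, from the slide's aggregate figures.
awk 'BEGIN { printf "~%.0f SPECint95 per node\n", 24000 / 178 }'
```

That lands around 135 per node, consistent with the stated P2-to-P4 hardware mix.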

  6. Noteworthy
     • kswapd failures: solved
       • Automount timeouts set to 60 seconds, NOT minutes
     • Adaptec 2200S RAID cards
       • Used instead of the MegaRAID cards
       • Not quite as fast, but acceptable
       • Timeout problem: fix available
     • Adaptec TOE (TCP Offload Engine)
       • Problems with RH 7.2, a custom kernel (XFS), and their driver
       • Anyone else using them? Good results?
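The automount point above comes down to one map option: Linux autofs interprets its timeout value in seconds. A minimal sketch, assuming autofs and an illustrative mount point not taken from the slides:

```
# /etc/auto.master — hypothetical entry.
# autofs reads --timeout in SECONDS, so 60 expires idle mounts
# after one minute, not one hour.
/misc  /etc/auto.misc  --timeout=60
```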

  7. Projects
     • Windows
       • Standard builds (server, IIS, desktop, laptop)
     • Backup software upgrade
       • Reliaty (formerly QuickRestore)
     • SSH v2 internally
     • Networks
       • Gigabit connection to our border router
       • VLANs for use on site
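On the server side, the "SSH v2 internally" item typically reduces to a single OpenSSH directive; a minimal sketch, assuming OpenSSH of that era:

```
# /etc/ssh/sshd_config — accept only SSH protocol 2 connections.
Protocol 2
```

Clients still offering only protocol 1 will be refused once the daemon is restarted with this setting.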

  8. Projects (cont.)
     • JASMine
       • Rewrite the disk cache
       • Support farm output caches
       • Policy-based file movement off-site
     • Auger
       • Better file scheduling/pinning

  9. Projects (cont.)
     • PPDG
       • SRM version 2
       • Replication
         • Replica catalog web service interface
       • Remote job submission
         • User and system JDLs
       • Batch web service integration with Auger
