
London Tier 2



  1. London Tier 2 Status Report GridPP 12, Brunel, 1st February 2005 Owen Maroney

  2. LT2 Sites
  • Brunel University
  • Imperial College London (including the London e-Science Centre)
  • Queen Mary, University of London
  • Royal Holloway, University of London
  • University College London

  3. LT2 Management
  • The management board had its first meeting on 3rd December 2004
  • Next meeting: 9th March 2005
  • Members of the management board:
    • Brunel: Paul Kyberd
    • IC: John Darlington (& Steve McGough)
    • RHUL: Michael Green
    • QMUL: Alex Martin
    • UCL: Ben Waugh
    • Chair: David Colling
    • Secretary: Owen Maroney

  4. Brunel
  • 1 WN PBS farm @ LCG-2_2_0
  • R-GMA installed, but not APEL
  • In the process of adding 60 WNs
  • Issues with private networking were being addressed under LCG-2_2_0; will now proceed directly to LCG-2_3
  • Investigating installation of SL on the nodes: if that goes well, YAIM will be used; if not, RH7.3 with LCFG

  5. Imperial College London
  • 66 CPU PBS HEP farm @ LCG-2_2_0; APEL installed
  • Upgrading to LCG-2_3_0 (this week!); will still use RH7.3 with LCFGng
  • HEP computing is undergoing re-organisation:
    • The LCG nodes will be incorporated into the SGE cluster and made available to LCG (dependency on the LeSC SGE integration)
    • Will re-install with a RHEL OS at that time
  • London e-Science Centre:
    • Problems over internal re-organisation
    • SGE farm, 64-bit RHEL
    • Problems with the default installation tool (APT) supplied by LCG; LCG-2_3 is also not supported on 64-bit systems
    • Working on deploying LCG-2_3 on 32-bit frontend nodes using YUM and RHEL; tarball install on the WNs (hope this is binary compatible!)
    • Then need to work on an SGE information provider (see the sketch below)
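On that last point, a dynamic information provider for SGE essentially has to translate qstat output into the GLUE CE state attributes and publish them as LDIF. The following is a minimal sketch only, not the plug-in the site will deploy: the GlueCEUniqueID is a made-up placeholder and the qstat parsing is simplified.

    #!/usr/bin/env python
    # Minimal sketch of a dynamic GLUE information provider for SGE.
    # The DN below is a hypothetical example, and the qstat parsing is
    # simplified; a real plug-in would follow the site's queue setup.
    import subprocess

    CE_DN = "GlueCEUniqueID=ce.example.ac.uk:2119/jobmanager-sge-long"  # placeholder

    def sge_job_counts():
        """Count running and waiting jobs from `qstat -u '*'` output."""
        out = subprocess.run(["qstat", "-u", "*"],
                             capture_output=True, text=True).stdout
        running = waiting = 0
        for line in out.splitlines()[2:]:      # skip the two header lines
            fields = line.split()
            if len(fields) < 5:
                continue
            state = fields[4]
            if "r" in state:
                running += 1
            elif "qw" in state:
                waiting += 1
        return running, waiting

    def print_ldif(running, waiting):
        """Emit the dynamic GLUE CE state attributes as an LDIF fragment."""
        print("dn: %s,mds-vo-name=local,o=grid" % CE_DN)
        print("GlueCEStateRunningJobs: %d" % running)
        print("GlueCEStateWaitingJobs: %d" % waiting)
        print("GlueCEStateTotalJobs: %d" % (running + waiting))

    if __name__ == "__main__":
        print_ldif(*sge_job_counts())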

  6. Queen Mary
  • 320 CPU Torque farm; the OS is Fedora 2
  • Currently running LCG-2_1_1 on the frontend and LCG-2_0_0 on the WNs
  • More recent versions of LCG were not binary compatible with Fedora
  • Trinity College Dublin have recently provided a Fedora port of LCG-2_2_0 and are working on a port of LCG-2_3_0
  • Plan: install the LCG-2_3_0 frontends as SL3 machines using YAIM, install LCG-2_2_0 on the Fedora WNs, and upgrade the WNs to 2_3_0 when TCD are ready

  7. Royal Holloway
  • Little change: 148 CPU PBS farm
  • APEL installed, but no data reported!
  • Very little manpower available
  • Currently running LCG-2_2_0; hoped to upgrade to LCG-2_3_0 during February
  • Late breaking news: the RHUL PBS server has been hacked and taken offline…

  8. University College London
  • UCL-HEP: 20 CPU PBS farm @ LCG-2_2_0
    • In the process of upgrading to LCG-2_3_0: frontends on SL3 using YAIM, WNs stay on RH7.3
  • UCL-CCC: 88 CPU PBS farm @ LCG-2_2_0, running APEL
    • Upgrade to LCG-2_3_0 on SL3 during February

  9. Contribution to GridPP
  • Promised vs. delivered: no change since GridPP11
  • *The CPU count includes shared resources where the CPUs are not 100% dedicated to Grid/HEP; the kSI2K value takes this sharing into account (illustrated below)
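As an illustration of how the sharing is folded into the kSI2K figure (the numbers below are hypothetical, not the actual LT2 values): a farm's delivered capacity is its CPU count times the per-CPU SpecInt2000 rating, scaled down by the fraction of the farm actually available to Grid/HEP work.

    # Illustrative only: hypothetical numbers, not the actual LT2 figures.
    def delivered_ksi2k(n_cpus, si2k_per_cpu, grid_share):
        """kSI2K credited to the Grid for a partially shared farm."""
        return n_cpus * si2k_per_cpu * grid_share / 1000.0

    # e.g. a 320-CPU farm rated at 800 SI2K per CPU, half of which is
    # available to Grid/HEP work, counts as 128 kSI2K rather than 256.
    print(delivered_ksi2k(320, 800, 0.5))   # -> 128.0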

  10. Usage by VO (APEL)

  11. Usage by VO (Jobs)

  12. Usage by VO (CPU)

  13. Site Experiences (I)
  • Storage Elements are all 'classic' GridFTP servers; still waiting for a deployment release of an SRM solution
  • Problems with the experiments' use of Tier 2 storage
    • Assumption: the Tier 2 SE is used as an import/export buffer for the local farm
      • Input data is staged in for jobs on the farm
      • Output data is staged out to long-term storage at Tier 0/1
      • Tier 2 is not permanent storage: no backup!
    • In practice: the Grid does not distinguish between SEs, and there are no automatic data-migration or SE "clean-up" tools
      • All SEs are advertised as "Permanent" by default; are the "Volatile" and "Durable" settings only appropriate for SRM?
      • SEs fill up with data and become 'read-only' data servers
      • Some data files are left on an SE without an entry in RLS: dead space!
      • One VO can fill an SE, blocking all other VOs; disk quotas need integrating with the information provider
    • Clean-up tools are needed to deal with files older than "x" weeks (see the sketch below)
      • Delete from the SE, and the entry in RLS, if another copy exists
      • Migrate to a different (nearest Tier 1?) SE if it is the only copy
      • But the site admin needs to be in all VOs to do this!
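A minimal sketch of that clean-up policy, assuming hypothetical helper functions wrapping the site catalogue and the LCG-2 replica tools (listing replicas, deleting a replica plus its RLS entry, and replicating a file elsewhere); the destination SE name and the age threshold are placeholders.

    # Sketch of the SE clean-up policy described above.  The helpers
    # (list_old_files, list_replicas, delete_replica, replicate_to) are
    # hypothetical wrappers around the catalogue and the LCG-2 replica
    # tools; TIER1_SE and MAX_AGE are placeholders.
    import datetime

    TIER1_SE = "se.tier1.example.ac.uk"      # hypothetical destination SE
    MAX_AGE = datetime.timedelta(weeks=8)    # "x" weeks

    def clean_up(local_se, list_old_files, list_replicas,
                 delete_replica, replicate_to):
        """Delete old files that have another copy elsewhere; migrate
        files for which this SE holds the only copy, then delete."""
        for lfn in list_old_files(local_se, MAX_AGE):
            replicas = list_replicas(lfn)             # all SURLs known to RLS
            others = [r for r in replicas if local_se not in r]
            if others:
                # Another copy exists: remove the local replica and its
                # RLS entry.
                delete_replica(lfn, local_se)
            else:
                # Only copy: migrate it (e.g. to the nearest Tier 1)
                # before removing the local replica.
                replicate_to(lfn, TIER1_SE)
                delete_replica(lfn, local_se)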

  14. Site Experiences (II)
  • The timing and release of LCG-2_3_0 could still have been improved
    • Information flow around the (pre-)release is still a problem
    • But at least a long upgrade period was allowed!
  • The structure of the documentation has changed: generally an improvement, but some documents were clearly not proof-read before release
  • BUT: NO LT2 sites have managed to upgrade yet! WHY NOT?
    • Lots of absence over the Christmas/New Year period: not really 2 months
    • Perception that the YAIM installation tool was not mature: lots of 'bugs'
      • Bugs were fixed quickly, but there is still the temptation to let other sites 'go first'
    • YAIM did not originally handle a separate CE and PBS server, the most common configuration in LT2!
    • Still need to schedule time against other constraints
    • Hardware support posts are still not appointed; sites are still supported on an unfunded 'best-effort' basis
    • Uncertainty at sites over whether the experiments were ready to use SL
  • The new release schedule proposed by LCG Deployment at CERN should help, as should the appointment of the hardware support posts

  15. Summary
  • Little change since GridPP11
    • R-GMA and APEL installations
    • Additional resources (Brunel, LeSC) still to come online
  • Failure to upgrade to LCG-2_3_0 rapidly
    • Significant effort over Summer 2004 put a lot of resources into LCG, but the manpower was coming from unfunded 'best-effort'
    • When term-time starts, much less effort is available: maintenance is manageable, upgrades are difficult, and major upgrades are very difficult!
  • The use of resources in practice is turning out to be different from expectations!
