
Tier 1 (Grid) Services

Ian Collier, GridPP Review, June 20th 2012. Summary of the past year: EMI updates (migration from gLite to EMI-2), formal engagement with the Staged Rollout & Early Adopters process, and virtualisation of (nearly) all services on a Hyper-V platform.





Presentation Transcript


  1. Tier 1 (Grid) Services, Ian Collier, GridPP Review, June 20th 2012

  2. Past Year • EMI Updates • Migration from gLite to EMI-2 • Formally engaged with the Staged Rollout & Early Adopters process • Virtualisation • (Nearly) all services on a (Hyper-V) virtualised platform • Much easier to set up & manage than a collection of bare-metal hosts • Quick recovery after power events notable • CVMFS Stratum 0 for non-LHC VOs • Actively used now • Responses have been very positive • WeNMR the latest, and enthusiastic, users
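The Stratum 0 mentioned above is the single writable copy of a CVMFS repository, into which a VO's software releases are published. As an illustration only, the current `cvmfs_server` workflow looks roughly like the following (the repository name and paths are hypothetical, and the server tooling available in 2012 differed from today's):

```
# open a writable transaction on the repository (name is hypothetical)
cvmfs_server transaction example.gridpp.ac.uk

# stage the new software release into the repository mount point
cp -r /tmp/new-release /cvmfs/example.gridpp.ac.uk/software/

# sign the changes and publish a new repository revision from the Stratum 0
cvmfs_server publish example.gridpp.ac.uk
```

Stratum 1 replicas then pull the published revision from this machine, so worker nodes never mount the Stratum 0 directly.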

  3. Operational Issues • Batch start rates • Limited • Have been testing alternatives to Torque/Maui • Condor & SLURM frontrunners • Condor looking very good • Have been hitting scaling limits with SLURM • As a side effect also looking at the ARC CE • Next step: test with half of the retiring 2007 WNs on SL6 with new Condor & ARC CE • CVMFS job timeout failures • Low but persistent rate (~5%, varying) • Have been testing the 2.1.x client • Found much worse problems • Investigation continuing
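For context on the Condor evaluation above, a minimal HTCondor submit description file has the following shape (the executable and file names here are illustrative, not the Tier 1 configuration):

```
# Minimal HTCondor submit description file (names are illustrative)
universe     = vanilla
executable   = run_analysis.sh
arguments    = --input data.txt
output       = job.$(Cluster).$(Process).out
error        = job.$(Cluster).$(Process).err
log          = job.log
request_cpus = 1
queue 1
```

A file like this is submitted with `condor_submit`; the `queue` statement controls how many job instances are created, which is where start-rate behaviour becomes visible at scale.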

  4. Coming Year • Continue updates • Starting on EMI-3 • Further Staged Rollout & Early Adoption • Complete SL6 migrations • Virtualisation • Shared storage just coming on-line • Investigating how to make full use of that • Replication between buildings, etc. • Distribute services • Between R89 & the Atlas ‘outpost’ as it develops • i.e. BDIIs, FTSs, CEs, etc., spread between the 2 buildings • CVMFS Stratum 0 • Erasmus project to build a web interface for software upload • Negotiating for sites to replicate • Reference architecture may be different from WLCG's • EGI have picked up coordinating a network of repositories & replicas • NIKHEF & OSG, maybe CERN

  5. Configuration Management • Quattor working well • Although we benefit from QWG, we could do so more • Made some ‘expedient’ choices early on – ready to revisit now • Quattor community more active recently • No longer held back by backward compatibility for CERN • Migration to Aquilon • Opportunity to refactor • Will allow more automation • Will improve workflows • Of course we track other activities & developments
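Quattor describes machine configuration declaratively in the Pan language. As a rough sketch only (the template name, include, paths, and values below are assumptions for illustration, not the actual site templates):

```
# Illustrative Pan object template for one grid service host
# (names and paths are assumptions, not the real site configuration)
object template clusters/gridpp/ce01;

# shared base configuration would be pulled in here
include { "site/base" };

# declarative configuration paths; values are examples only
"/system/network/hostname" = "ce01";
"/system/network/domainname" = "example.ac.uk";
```

Aquilon keeps the same Pan compiler but wraps templates like this in a broker-managed workflow, which is where the additional automation mentioned above comes from.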

  6. Cloud • SCD Cloud • Concept well proven • ~300 cores, 90-95% utilisation • Adding half of the 2007 WNs • A member of staff (not a rotating graduate) in the plan • Storage • Have a small Ceph cluster to deploy • Image store • Object (S3) store as a service • Active use cases: • Internal (Tier 1 & SCD) development & testbeds • High level of user trust • Developing use cases • Other users in STFC (ISIS, RAL Space) • EGI, GridPP & WLCG cloud work

  7. Looking to the Future Starting to think about: • Post-GridPP4 • Cloud is great for ‘disposable’ resources • What would it take for us to consider it solid enough for the services now on Hyper-V? • What about such a layer (& interface) in the batch farm?
