CREAM deployment news


Presentation Transcript


  1. CREAM deployment news John Gordon, Antonio Retico GDB 10-Feb-10 - CERN

  2. Agenda
     Good afternoon!
     Status of deployment
     Open questions
       • a) How is the experiment testing going?
       • b) Are there still blockers to phasing out lcg-CE?
       • c) Have scalability and reliability been proven?
       • d) Why don't sites upgrade?

  3. Deployment status
     Sites encouraged to deploy CREAM at various levels
       • MB, GDB, SA1, Pilot Service
       • Nick’s list has been around for 1 year now (Happy Birthday!)
       • lcg-CE still far from replacement
     ~50 instances around in Feb 2010
       • Clearly not very popular yet
     6 European T1s have installed some
       • Do we expect ASGC, TRIUMF, BNL and FNAL to use it?
     Time for a mini-review?

  4. Open questions
     a) How is the experiment testing going?
     b) Are there still blockers to phasing out lcg-CE?
     c) Have scalability and reliability been proven?
     d) Why don't sites upgrade?

  5. Open questions
     a) How is the experiment testing going?

  6. Experiment@Work (ALICE)
     Historically the happiest
     On the way to deprecating lcg-CEs at their sites
       • Also for submission via WMS
     Can they do it?
       • Would that affect A/R metrics (see next slides)?

  7. Experiment@Work (LHCb)
     Submission to CREAM seamlessly enabled
     SAM tests show many sites still failing
       • ~40% of sites are passing the tests
       • Mostly faulty configuration of the LHCb queues
       • Not a bug but widespread inexperience with CREAM configuration at sites

  8. Experiment@Work (CMS)
     Considerable testing activity registered recently
     Trying to use PROD Agent with ICE-CREAM
     A couple of issues reported
       • Bookkeeping
         • Problems in updating the job status
         • Jobs that have actually finished are still reported as running
       • Operations
         • Services started in the wrong order by YAIM after updates
     The first is seen as a showstopper for production
       • Not a bug but a CREAM/Blparser configuration issue
       • CREAM 1.6 (patch 3179) will make configuration easier
       • Fix for ICE bug #61405 expected

  9. Experiment@Work (ATLAS)
     lcg-CE required until the end of 2010
     Outcome of first CondorG submission testing
       • Testing promising but inconclusive
       • Problems are only found through heavy usage
       • Shift expert support from lcg-CE to CREAM CE
       • Keep lcg-CE but recommend that sites with >1 CE install a CREAM CE
     (Rod Walker @ ATLAS Tier-1/2/3 Jamboree)

  10. A note about the release
      CREAM 1.6 expected to come with many bug fixes
        • Most of them found by developers
      Still with the developers (now Product Team)
        • They will do the certification of 1.6
      Entered pre-certification today (10-Feb)
      First release with a new delivery process
        • May take some time

  11. Open questions
      b) Are there still blockers to phasing out lcg-CE?
      (My view on Nick’s list. Comments welcome)

  12. CondorG (point B)
      Testing of CondorG submission path taking off now
        • Issues are still under analysis
      Need to wait

  13. Operations tools (point D)
      SAM/Nagios tests are there
      What about A/R metrics?
        • Can a site run only CREAM (and still count as a CE provider to WLCG)?
      Long transition period to be expected
        • With CEs we cannot use the ‘SRMv2 test approach’
        • Wait for enough CREAMs to be there
        • Switch the A/R to use CREAM “overnight”
      What is the CE availability for a site?
        • Av[CE] = OR [Av(CREAM), Av(lcg-CE), Av(ARC)] or
        • Av[CE] = AND [Av(CREAM), Av(lcg-CE), Av(ARC)]?
        • Something new to be implemented in Gridview
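
The OR/AND question on the slide above can be made concrete with a small sketch. It assumes that availability is sampled as an up/down flag per CE flavour in each time bin, which is a simplification and not how Gridview computes A/R; the function name `site_ce_availability` and the sample data are hypothetical.

```python
from typing import Dict, List


def site_ce_availability(samples: Dict[str, List[bool]], combine: str) -> float:
    """Fraction of time bins in which the site's CE service counts as available.

    samples maps a CE flavour name to its per-bin up/down flags (hypothetical data).
    combine is "or" (any flavour up is enough) or "and" (every flavour must be up).
    """
    if combine not in ("or", "and"):
        raise ValueError("combine must be 'or' or 'and'")
    flavours = list(samples.values())
    if not flavours or not flavours[0]:
        raise ValueError("need at least one flavour with at least one time bin")
    n_bins = len(flavours[0])
    if any(len(flags) != n_bins for flags in flavours):
        raise ValueError("all flavours must cover the same time bins")
    reducer = any if combine == "or" else all
    # zip(*flavours) yields, for each time bin, the flags of all flavours in that bin.
    up_bins = sum(1 for flags_in_bin in zip(*flavours) if reducer(flags_in_bin))
    return up_bins / n_bins


# Made-up example: four time bins for a site running all three flavours.
samples = {
    "CREAM": [True, True, False, True],
    "lcg-CE": [True, False, False, True],
    "ARC": [False, True, False, True],
}
or_av = site_ce_availability(samples, "or")    # 0.75: three of four bins have some CE up
and_av = site_ce_availability(samples, "and")  # 0.25: only the last bin has every CE up
```

With these made-up samples the OR rule gives 0.75 and the AND rule gives 0.25, which shows how strongly the choice affects a site's reported figure. Under OR a CREAM-only site can still count as a CE provider, while under AND every deployed flavour can pull the site's number down.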

  14. Graceful failure (point O)
      Still some developments expected to fix point O
        • “Graceful failure or self-limiting behavior when the CE load reaches its maximum”
        • Problem probably hit at KIT (pending jobs)
        • New limiter expected in 1.6
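
As an illustration of what "self-limiting behavior" could mean in practice, here is a toy sketch of a front end that refuses new submissions once a ceiling of pending jobs is reached, instead of accepting work until it falls over. It is not the CREAM limiter: the class `SelfLimitingCE`, the `max_pending` threshold, and the choice of pending-job count as the load measure are all assumptions made for this example.

```python
from typing import List


class SelfLimitingCE:
    """Toy CE front end that rejects submissions above a pending-job ceiling."""

    def __init__(self, max_pending: int) -> None:
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending  # hypothetical threshold, not a CREAM setting
        self.pending: List[str] = []

    def submit(self, job_id: str) -> bool:
        """Accept the job if there is room; otherwise refuse it cleanly."""
        if len(self.pending) >= self.max_pending:
            return False  # graceful refusal: the client can retry or go elsewhere
        self.pending.append(job_id)
        return True

    def job_started(self, job_id: str) -> None:
        """A pending job has left the queue, which frees one slot."""
        self.pending.remove(job_id)


ce = SelfLimitingCE(max_pending=2)
results = [ce.submit(f"job-{i}") for i in range(3)]  # third submission is refused
ce.job_started("job-0")
accepted_after_drain = ce.submit("job-3")  # a slot is free again
```

The point of the sketch is only the shape of the behaviour: the refusal is explicit and cheap, so overload shows up as rejected submissions and not as a backlog of pending jobs.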

  15. Open questions
      c) Have scalability and reliability been proven?

  16. Scalability/Reliability (various points)
      Scalability
        • Which sites can report production experience at a significant scale?
      Reliability
        • Issues still being found affecting version 1.5
        • Mainly bad configurations concerning the WMS submission path
        • Mostly fixed with CREAM 1.6 + a new version of ICE

  17. Open questions
      d) Why don't sites upgrade?

  18. Some hypotheses
      No pressure from the experiments
        • Are the experiments happy with the current scale?
      New “latest and greatest” updates always in the pipeline
        • One could say that time is still needed to mature
      lcg-CE works for now (don’t fix it!)
      lcg-CE is still the only reference for site computing quality reports
      Others?

  19. Questions?
