
Current status WMS and CREAM CE deployment

Current status WMS and CREAM CE deployment. Patricia Mendez Lorenzo ALICE TF Meeting (CERN, 02/04/09). WMS: some highlights. In December 2008 ALICE finished the migration of all sites to a WMS submission approach


Presentation Transcript


  1. Current status WMS and CREAM CE deployment. Patricia Mendez Lorenzo, ALICE TF Meeting (CERN, 02/04/09)

  2. WMS: some highlights
     • In December 2008 ALICE finished the migration of all sites to a WMS submission approach
     • The instabilities found in the system have forced the experiment and support to continuously babysit the system and the production
     • This procedure does not scale to real data taking (a few months away)
     • ALICE has not changed the submission procedure, which was defined even before the 2006 DC
     • IMHO it is not for the experiment to change the submission procedure because a new service is not providing the corresponding stability
     • It is the service that has to cope with the experiment requirements and computing model, not the opposite
     • Let's stop:
        • Saying that this issue affects ALICE only: it is simply NOT TRUE
        • Daily I see similar issues with Geant4, Lattice QCD, sixT
        • Asking ALICE to change the submission procedure
        • It is not realistic at this point; in addition, I do not see the point of changing one workload management system due to (not well understood) instabilities in a service

  3. ALICE approach
     • ALICE requires deployment of the CREAM-CE at all sites
     • This is the highest priority
     • Sites might be excluded from the production if the service is not provided
     • The experiment therefore will not maintain a new submission procedure for some months
     • Intermediate period from WMS to CREAM
     • In addition both systems must be maintained together
     • Bulk submission is not yet supported at the CLI level by CREAM
     • It is not realistic for ANY application to have 2 submission approaches at this time

  4. Status of the WMS in production
     • Distribution of WMS in the ALICE production
     • For the T0 site
        • Optimal situation: 3 WMS covering the production and the Pass 1 reconstruction at the T0 only
        • The reality: each node has reached a limit of 13K jobs/day (confirmed by the WMS operation experts). In addition these nodes have to cope with the instabilities of external WMS
     • For T1 sites
        • Optimal situation: each T1 site should provide at least 2 WMS, which should be dedicated when many T2 sites in the country depend on it
        • The reality: this basically affects Italy and France, and it is ensured by Italy
     • For T2 sites
        • Optimal situation: large federations WITHOUT a regional T1 should follow the structure asked for the T2 sites (case of Russia)
        • The reality: the available T1 WMS must move from one T2 to another depending on the daily overload status

  5. Some truths and some lies about the ALICE submission procedure and the WMS
     • The latest WMS mega-patch solves the overloading issues observed in gLite 3.0: FALSE
     • We have not seen huge backlogs anymore: TRUE
     • The ALICE submission procedure has changed recently, producing the instabilities observed in some WMS: FALSE
     • The experiment tried to accommodate the submission procedure to the WMS as much as possible within its own computing model limits: TRUE
        • Same WMS configuration file as in AFS@CERN
        • Proxy renewal triggered only once per hour
        • RESUBMISSION FEATURE OF THE WMS DISCARDED BY THE EXPERIMENT AT THE JDL LEVEL SINCE FEB 2009
     • ALICE is therefore using the WMS to a tree level (RB mode)
     • All the rest of the features are simply not used and not required

  6. WHAT WAS HAPPENING IN FRANCE?
     • Issues in GRIF and CCIN2P3 are totally uncorrelated
     • GRIF
        • grid33.lal.in2p3.fr got overloaded yesterday
        • In addition it was announced that ALICE was overloading the CE
        • Resubmission approach was discarded
        • Number of jobs not visible in the IS nor the LB (later on)
     • CCIN2P3
        • This is the unique VO supporting CE in the T1 and T2
        • CEs with different ranks
        • This situation was filling one CE (best ranking), leaving the rest of the CEs empty
        • The query to the info system was returning 0 waiting jobs for those (worse-ranking) CEs and therefore the system kept on submitting jobs
        • T1 and T2 clusters will be separated into different VOBOXES

  7. Status of the CREAM-CE
     • New sites providing CREAM-CE:
        • RU-SPbSU (under testing)
        • Prague (still to be tested)
        • Subatech (still to be tested)
     • Already existing sites with production infrastructures:
        • FZK (just upgraded to the next version)
        • Kolkata (performing fine)
        • KISTI (no issues)
        • GSI (pending the setup in production)
        • RAL (no issues)
        • CNAF (no issues)
        • CERN (moving the system from SLC5 to SLC4 to increase the number of resources)
        • Torino (no issues)
        • SARA (no issues)
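
The CCIN2P3 behaviour described in slide 6 (one best-ranked CE filling up while the info system still reports 0 waiting jobs on the others) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the ALICE or WMS code: the `CE` class, `broker_pick`, `run_cycles`, the rank convention (lower is better), the batch size and the "submit while some CE shows 0 waiting jobs" rule are all hypothetical names and simplifications, and jobs never drain from the queues.

```python
from dataclasses import dataclass


@dataclass
class CE:
    name: str
    rank: int         # assumption: lower value means better ranking
    waiting: int = 0  # jobs queued on this CE


def min_waiting(ces):
    """Smallest queue length the (hypothetical) submitter sees across its CEs."""
    return min(ce.waiting for ce in ces)


def broker_pick(ces):
    """Hypothetical broker: always sends jobs to the best-ranked CE."""
    return min(ces, key=lambda ce: ce.rank)


def run_cycles(ces, cycles, batch, threshold=0):
    """Each cycle, submit `batch` jobs if any CE reports <= threshold waiting jobs."""
    for _ in range(cycles):
        if min_waiting(ces) <= threshold:
            broker_pick(ces).waiting += batch
    return {ce.name: ce.waiting for ce in ces}


# One submitter watching two CEs with different ranks: the worse-ranked CE
# always reports 0 waiting jobs, so submission never stops.
shared = run_cycles([CE("ce-best", 1), CE("ce-worse", 2)], cycles=5, batch=10)
print(shared)  # {'ce-best': 50, 'ce-worse': 0}

# One submitter per CE (assumed reading of "separated into different VOBOXES"):
# submission stops as soon as that CE has waiting jobs.
separate = run_cycles([CE("ce-t1", 1)], cycles=5, batch=10)
print(separate)  # {'ce-t1': 10}
```

In this model the queue on the best-ranked CE grows on every cycle as long as a second, never-selected CE is part of the same query, which is the feedback loop the slide attributes to the shared setup.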
