This status report discusses the initial plan and the set-up issues for the pilot deployment of Glexec/SCAS at IN2P3-CC. It also gives an overview of grid job management and the latest BQS JM enhancements.
Glexec/SCAS Pilot: IN2P3-CC status
Pierre Girard, CCIN2P3
T1-T2, 2009-02-03
Content
• Grid deployment at CCIN2P3
• Initial plan for pilot of Glexec/Scas
• Setting-up issues
• Conclusion
Pierre Girard - Glexec/SCAS: IN2P3-CC status
Grid Job Management at CCIN2P3
• Several Grid WN versions at a time
[Figure: four Computing Elements in front of BQS and the Anastasie computing farm of worker nodes (WN). No MW is installed locally on the workers; the WN profiles are hosted on the shared FS (afs.in2p3.fr): Glite-WN-3.1.26-glexec, Glite-WN-3.1.26-prod, Glite-WN-3.1.19-prod, Glite-WN-3.1.666-pps, Globus4-WN.]
Overview of grid job submission
[Figure: submission flow, steps 1 to 4]
• 1: the grid job (U-job) is submitted to the Computing Element with its credentials and RSL
• 2: the Computing Element spawns the Job Manager
• 3: Local Job Wrapping: the Job Manager wraps the U-job into lcg0507012233-1234.sh and submits it to BQS with qsub
• 4: the wrapped U-job runs on a WN (SL4.5, Glite-WN)
Local job wrapper (lcg0507012233-1234.sh):
    #!/bin/sh
    #PBS -q T
    #PBS -l M=2200MB
    #PBS -l T=3801600
    #PBS -l scratch=16250MB
    #PBS -l platform=LINUX
    #PBS --share T1prod
    …
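The wrapping step can be illustrated with a minimal Python sketch that renders the directive header shown on this slide from a set of submission parameters. The `BqsRequest` class and the `render_wrapper_header` function are hypothetical names introduced here for illustration; the actual BQS Job Manager code is not shown in the slides, and only the directive lines themselves are taken from the example wrapper.

```python
from dataclasses import dataclass


@dataclass
class BqsRequest:
    """Hypothetical container for the values the Job Manager passes to BQS."""
    job_class: str   # BQS class, e.g. "T"
    memory_mb: int   # value of the "M=" resource, in MB
    cpu_time: int    # value of the "T=" resource, unit as used on the slide
    scratch_mb: int  # value of the "scratch=" resource, in MB
    platform: str    # e.g. "LINUX"
    share: str       # VO share, e.g. "T1prod"


def render_wrapper_header(req: BqsRequest) -> str:
    """Render the directive header of the local job wrapper script."""
    lines = [
        "#!/bin/sh",
        f"#PBS -q {req.job_class}",
        f"#PBS -l M={req.memory_mb}MB",
        f"#PBS -l T={req.cpu_time}",
        f"#PBS -l scratch={req.scratch_mb}MB",
        f"#PBS -l platform={req.platform}",
        f"#PBS --share {req.share}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values copied from the wrapper shown on the slide.
    request = BqsRequest("T", 2200, 3801600, 16250, "LINUX", "T1prod")
    print(render_wrapper_header(request), end="")
```

Keeping the parameters in one structure and rendering them in a single place mirrors the idea that the Job Manager outputs are computed first and only then written into the wrapper.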
WN profile selection by BQS JobManager
[Figure: same submission flow as on the previous slide (steps 1 to 6). The BQS JM, spawned by the Computing Element, applies the rules of its configuration (BQS-JM config) during Local Job Wrapping of lcg0507012233-1234.sh and sets the WN profile at qsub time, here Glite-WN-3.1.26-glexec. On the WN (SL4.5), the U-job is dynamically linked to the selected WN profile on AFS. Available profiles: Glite-WN-3.1.26-glexec, Glite-WN-3.1.26-prod, Glite-WN-3.1.19-prod, Glite-WN-3.1.666-pps, Globus4-WN.]
Latest BQS JM enhancements
• BQS JM control
    • Submission policy (deny, accept)
    • Forbearance management if BQS becomes unresponsive
• BQS JM outputs
    • BQS submission parameters
        • Class: A (= short), G (= medium), T (= long), J (= verylong)
        • Amount of {Mem, CPU, Scratch}
        • Farm name
        • Platform (SL3, SL4, SL5)
        • Logical resources (list of)
            • u_dcache_atlas, u_dcache_alice, u_OracleStress_atlas, …
        • VO share
    • Wrapped data
        • WN profile to be used
          profilesDirectory = /afs/in2p3.fr/grid/profiles/glite/3.1.25-0/SL4_64/WN32
        • Site name
        • AFS token (or not)
Latest BQS JM enhancements
• BQS JM configuration capabilities
    • (Most of) the BQS JM outputs are determined by configuration rules
    • A rule is basically an assignment
      Ex.: SubmissionPolicy = ACCEPT
    • But it can be conditioned on some job input data (in order of precedence)
        • Mapped account
        • Mapped group
        • CE queue
      Ex.: UserSubmissionPolicy_atlas050 = DENY
           # Specific requirements for ATLAS with queue verylong
           GroupVirtualQueueMaxMem_atlas_verylong = default
           GroupVirtualQueueMaxCPU_atlas_verylong = max
           GroupVirtualQueueMaxScratch_atlas_verylong = default
• Configuration syntax
    • Is quite ugly
    • Does not allow conditions to be combined
    • But seems sufficient for now
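The precedence order above (mapped account, then mapped group, then CE queue, then the unconditioned rule) can be sketched in Python as a simple lookup. This is an illustration under assumptions, not the BQS JM implementation: the `resolve_rule` function is hypothetical, the `User<Key>_<account>` form follows the examples on the slides, while the `Group<Key>_<group>` and `Queue<Key>_<queue>` forms are assumed by analogy. The account `atlas001` is also made up for the example.

```python
from typing import Mapping, Optional


def resolve_rule(config: Mapping[str, str], key: str,
                 account: Optional[str] = None,
                 group: Optional[str] = None,
                 queue: Optional[str] = None) -> Optional[str]:
    """Return the most specific value configured for `key`, or None."""
    # Conditioned keys capitalise the base key, as in
    # profilesDirectory -> UserProfilesDirectory_dteam049.
    cap = key[0].upper() + key[1:]
    candidates = []
    if account:
        candidates.append(f"User{cap}_{account}")   # form seen on the slides
    if group:
        candidates.append(f"Group{cap}_{group}")    # assumed naming
    if queue:
        candidates.append(f"Queue{cap}_{queue}")    # assumed naming
    candidates.append(key)                          # unconditioned rule
    for name in candidates:
        if name in config:
            return config[name]
    return None


if __name__ == "__main__":
    config = {
        "SubmissionPolicy": "ACCEPT",
        "UserSubmissionPolicy_atlas050": "DENY",
    }
    print(resolve_rule(config, "SubmissionPolicy", account="atlas050"))
    print(resolve_rule(config, "SubmissionPolicy", account="atlas001", group="atlas"))
```

Because each conditioned key carries a single suffix, a lookup like this cannot express "account X on queue Y", which matches the limitation noted under the configuration syntax.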
Glexec deployment at IN2P3-CC
• Glexec is a tool to be deployed on the WN
    • To be used by the VOs to manage the « real user jobs » within a pilot job
    • With a setuid capability (the pilot job forks the « real user job » under another account)
    • Site authorization per « real user job », based on the real user's proxy
• How the deployment was planned
    • Deploy the Glite-WN/Glexec relocated on AFS
    • Use the configuration capabilities to redirect the pilot jobs to this deployment
      profilesDirectory = /afs/in2p3.fr/grid/profiles/glite/3.1.25-0/SL4_64/WN32
      UserProfilesDirectory_dteam049 = /afs/in2p3.fr/grid/profiles/glite/3.1.25-0/SL4_32/WN32_GLEXEC
• Sounded easy…
Glexec deployment issues at IN2P3-CC
• Glexec must be installed locally on the worker
    • The absolute path of the configuration file is hardcoded
        • /opt/glite/etc/glite.conf
        • Only one MW configuration possible
    • Dynamic library configuration (due to « setuid »)
        • /etc/ld.so.conf
        • Only one MW installation possible
    • Log configuration (syslog)
        • Not so problematic for now
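The two local dependencies listed above can be checked on a worker with a small Python sketch. It is only an illustration: the function name `missing_glexec_prerequisites` and the library directory `/opt/glite/lib` are assumptions not taken from the slides, and the check reads only the top-level `/etc/ld.so.conf` without following any `include` lines.

```python
from pathlib import Path
from typing import List


def missing_glexec_prerequisites(root: str = "/",
                                 lib_dir: str = "/opt/glite/lib") -> List[str]:
    """List the local prerequisites that are not satisfied under `root`.

    `root` allows checking a relocated or test tree; `lib_dir` is an
    assumed location for the MW libraries.
    """
    root_path = Path(root)
    problems = []

    # Hardcoded configuration path named on the slide.
    conf = root_path / "opt/glite/etc/glite.conf"
    if not conf.is_file():
        problems.append(f"missing configuration file: {conf}")

    # A setuid binary ignores LD_LIBRARY_PATH, so the library directory
    # has to be declared system-wide.
    ld_conf = root_path / "etc/ld.so.conf"
    if not ld_conf.is_file():
        problems.append(f"missing dynamic linker configuration: {ld_conf}")
    else:
        entries = [line.strip() for line in ld_conf.read_text().splitlines()]
        if lib_dir not in entries:
            problems.append(f"{lib_dir} is not listed in {ld_conf}")

    return problems


if __name__ == "__main__":
    for problem in missing_glexec_prerequisites():
        print(problem)
```

Both paths are fixed per machine, which is why a single worker can carry only one MW configuration and one MW installation for glexec.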
Glexec deployment in use at IN2P3-CC
• We are part of the « SCAS Pilot Service »
    • Asked to provide SCAS/glexec in production
    • Load test for SCAS services by ATLAS and LHCb
• Deployment done
    • Usable by both LHCb and Dteam
        • Through the T1 CEs
        • According to specific VOMS roles/groups
• But
    • Deployment issues
        • Break our WN setup strategy
        • Relocatable distribution was not ready (home-made)
    • First tests with LHCb
        • Were not satisfactory
        • Raised some questions
Glite-3.2.0 (SL5) at IN2P3-CC
• Glite-WN only
    • Deployed on AFS
    • Tested with a test CE on BQS Farm « lcg »
• Will be activated
    • As soon as SL5 workers enter production (done)
    • A queue will be added to the T2 (T1?) CEs