
glexec on worker nodes

Presentation Transcript


  1. glexec on worker nodes. David Groep, NIKHEF

  2. Why glexec? The VO side
     Background: some VOs prefer to use their own scheduling and job management
     • late binding of jobs to job slots
     • first establishing an overlay network
     • subsequent scheduling and starting of jobs is faster
     • hide the differences between the various grid flavours
     • implement VO priorities
     • full use of allocated slots, up to the maximum wall-clock time
     But these VOs will need their ‘own’ scheduler
     • some of them have one already
     • others don’t, so this must never be the only (or even the default) way of using resources
     JRA3 EU Review Input, DavidG, December 7th 2005

  3. Sites: glexec on WN requirements
     [Diagram labels: submitting user’s identity & job; VO identity/process or VO placeholder manager; site managed and trusted services]
     Basic principle
     • VO-supplied schedulers should comply with and implement the same policies as the corresponding functionality in the native batch systems and grid middleware, both now and in the future
     Essential ingredients
     • independent auditing of the VO actions
     • accounting at the user level, which is no longer to be done by the site
     • a ‘trusted’ way to get the user credentials from the VO

  4. Current mode
     Job submission in the gLite-CE
     • VO scheduler: Condor-C & BLAHP
     • the VO scheduler on the head node changes to the end-user’s identity (i.e. to the job owner in the VO job source)
     • on change, site policies are checked
     • the job on the batch queue has the ‘proper’ identity
     Some current practice
     • several VOs submit ‘placeholder’ jobs with (essentially) a single identity for all of the VO
     • the ‘checkered’ placeholder then gets user jobs in ‘some’ way and executes them with the placeholder’s identity
     • the site does not ‘see’ the original submitter
     Of course, there are also ‘classic’ submissions and proper uid changes by Condor-C & BLAHP on the head node

  5. VO scheduler on the node
     Job submission in a glexec-on-WN scenario
     • The VO scheduler submits a placeholder job to the batch system, and the VO ‘placeholder job’ submitter is responsible for the placeholder behaviour; this might be a specific role in the VO, or a locally registered ‘badged’ user at each site
     • The placeholder job is subject to the normal site policies for jobs
     • The placeholder obtains the true user job, and presents the user credentials and the job (executable name) to the site to request a decision
       On success: the site will set the uid/gid of the new user’s job
       On failure: glexec will return with an error, and the placeholder job can terminate or obtain another job
     Proper uid changes by Condor-C & BLAHP on the head node SHOULD REMAIN THE DEFAULT
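The decision step on this slide can be illustrated with a small toy model. This is only a sketch of the control flow, not glexec itself: `UserJob`, `Mapping`, `site_decision`, `run_placeholder`, the account table and the allowed-executable set are all hypothetical names invented for illustration, and no real credential validation or uid switching takes place.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UserJob:
    user_dn: str     # identity of the submitting user (hypothetical format)
    executable: str  # path of the job executable


@dataclass(frozen=True)
class Mapping:
    uid: int
    gid: int


# Hypothetical site policy tables, standing in for the site's real policy engine.
SITE_ACCOUNTS = {"/O=example/CN=alice": Mapping(uid=5001, gid=500)}
ALLOWED_EXECUTABLES = {"/opt/vo/bin/analysis"}


def site_decision(job: UserJob) -> Optional[Mapping]:
    """Return the uid/gid the job would run under, or None if the site refuses."""
    if job.executable not in ALLOWED_EXECUTABLES:
        return None
    return SITE_ACCOUNTS.get(job.user_dn)


def run_placeholder(queue: List[UserJob]) -> List[str]:
    """Model of the placeholder loop: ask the site for each job, skip refusals."""
    log = []
    for job in queue:
        mapping = site_decision(job)
        if mapping is None:
            # On failure the placeholder may terminate or obtain another job;
            # this sketch simply moves on to the next one.
            log.append(f"refused {job.user_dn}")
            continue
        # On success the site would set the uid/gid; here we only record it.
        log.append(f"run {job.executable} as uid={mapping.uid} gid={mapping.gid}")
    return log
```

The point of the model is that the decision and the mapping sit entirely on the site side: the placeholder only presents a credential and an executable name and acts on the answer.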

  6. Status today
     • ‘glexec’ is part of gLite 3.0
       • based on the Apache HTTP Server suexec code base
       • uses LCAS and LCMAPS for enforcement and mapping
       • library-based implementation
       • needs the gLite flavour of LCAS/LCMAPS (not the LCG2.x versions)
     • New modules have been added
       • LCAS: RSL (executable path) constraints
       • validation of cert chain and proxy lifetime
     • Restrictions
       • policy should be located on local POSIX-accessible file systems
       • policy transport should be ‘trustworthy’

  7. Still needed
     • Make the credential acquisition process work across the network, so there can be a site-central policy engine
       • enforcement will have to stay local
     • Same for LCAS
     • A changeover to standard callouts is needed for both
     This is planned work, but it is work and will take time

  8. Needed components, procedures
     • Auditing the VO placeholder job/scheduler on the WN
       • compare the number of ‘fork-execs’ done by the placeholder with the number of glexec invocations; a discrepancy means the VO is cheating on you
       • check that the VO placeholder job is not using too much CPU; its CPU-time / walltime ratio should be close to zero
     • Credential mapping auditing/logging
       • ‘JobRepository’ fits the bill
       • schema allows for recording and retrieving all aspects of credential mapping
       • records both user identity and any VO attributes
       • retains the credential mapping for each ‘job’ or glexec invocation
       • JR is part of the stack, but not widely deployed yet
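The two placeholder checks above could be expressed roughly as follows. This is a minimal sketch under stated assumptions: the `PlaceholderAudit` record, the `audit_findings` function and the 5% CPU-ratio threshold are hypothetical, and the slide does not say how the counts would be collected or what value counts as "close to zero".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaceholderAudit:
    fork_execs: int          # fork-execs observed for the placeholder
    glexec_invocations: int  # glexec invocations logged by the site
    cpu_seconds: float       # CPU time used by the placeholder itself
    wall_seconds: float      # wall-clock time the placeholder has been running


def audit_findings(rec: PlaceholderAudit, max_cpu_ratio: float = 0.05) -> List[str]:
    """Return human-readable findings; an empty list means both checks passed.

    max_cpu_ratio is an assumed threshold, not a value given in the slides.
    """
    findings = []
    if rec.fork_execs != rec.glexec_invocations:
        findings.append(
            f"fork-exec count {rec.fork_execs} does not match "
            f"glexec invocations {rec.glexec_invocations}"
        )
    if rec.wall_seconds > 0 and rec.cpu_seconds / rec.wall_seconds > max_cpu_ratio:
        findings.append(
            f"placeholder CPU/walltime ratio "
            f"{rec.cpu_seconds / rec.wall_seconds:.2f} exceeds {max_cpu_ratio:.2f}"
        )
    return findings
```

A site would feed such a check from its own audit and accounting records; the sketch only shows the comparison logic.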

  9. Needed auditing
     Detailed auditing
     • ‘enterprise class’ operating systems have some kind of auditing
     • system-call level auditing is typically part of EAL3+ certification
     • “LAuS” for Linux systems, like RHEL3+ and SLES
     • gives a wealth of information, even today without ‘glexec-on-WNs’, so it’s a good idea even now, and not too hard to do

  10. Summary
     • We have to realise that some VOs are doing ‘agent’ jobs today
       • there is no effective enforcement against this
       • some sites may simply not care yet, but others have hard requirements on auditability and regulatory compliance
     • Some VOs have been given a specific target date for leaving this model
     • This glexec-on-WN model, giving the VOs the tools to comply with site requirements, seems a reasonable way forward
       • at least it makes things better than they are today
       • but many will miss the warm and fuzzy feeling of trust here
     There has been a lot of discussion in the group, so have a look at the minutes for details and many more considerations
