A GSI-secured job manager for connecting PBS servers in independent administrative domains




  1. A GSI-secured job manager for connecting PBS servers in independent administrative domains
     John Walsh, Brian Coghlan, Stephen Childs, Eamonn Kenny (Trinity College Dublin/EGEE)
     EGEE 2nd User Forum – Manchester, May 2007

  2. Introduction
     • RemotePBS
       • Based on/extends the lcgpbs job manager on the LCG-CE
       • Implements secure execution of grid jobs on remote batch systems (RBS)
       • Separate administrative domains
       • Single gatekeeper, multiple RBS model
     • RBS head/submit node
       • Installed with gLite WN (+ YAIM/Quattor)
       • Lightweight
       • Additional “mini” information provider (IP)
       • Remote access uses grid credentials
     • A work in progress, but used at three production EGEE sites
       • Restricted VO/users
     EGEE 2nd User Forum, Manchester, May 11th 2007

  3. Current GK problems
     lcgpbs
     • Allows “remote” execution using routing queues
     • Requires /etc/hosts.equiv authentication
     • Known PBS issue
     • Remote batch submit node → gatekeeper
     • Weak security model
     gLite-CE
     • Separate CE/RBS possible
     • RBS requires /etc/hosts.equiv
     • Same administrative domain

  4. Mini IP
     # gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan, mpUCDie, local, grid
     dn: GlueCEUniqueID=gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan,mds-vo-name=mpUCDie,mds-vo-name=local,o=grid
     GlueCEHostingCluster: gridgate.ucd.ie
     GlueCEName: rowan
     GlueCEUniqueID: gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan
     GlueCEInfoGatekeeperPort: 2119
     GlueCEInfoHostName: gridgate.ucd.ie
     GlueCEInfoLRMSType: remotepbs
     GlueCEInfoLRMSVersion: 2.1.8
     GlueCEInfoTotalCPUs: 194
     GlueCEInfoJobManager: remotepbs
     GlueCEInfoContactString: gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan
     GlueCEInfoApplicationDir: /home/

     # cosmo, gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan, mpUCDie, local, grid
     dn: GlueVOViewLocalID=cosmo,GlueCEUniqueID=gridgate.ucd.ie:2119/jobmanager-remotepbs-rowan,mds-vo-name=mpUCDie,mds-vo-name=local,o=grid
     GlueVOViewLocalID: cosmo
     GlueCEAccessControlBaseRule: VO:cosmo
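As a rough illustration of what a "mini" information provider has to emit, the sketch below rebuilds the first entry on this slide from a handful of values. It is an assumption-laden example, not the authors' provider: the function name `glue_ce_entry` and its parameters are hypothetical, and it reproduces only attributes that appear on the slide.

```python
def glue_ce_entry(gatekeeper, port, queue, site, total_cpus, lrms_version):
    """Return LDIF text for one CE entry (hypothetical helper, illustration only)."""
    # The CE identifier combines gatekeeper, port and advertised queue name,
    # following the pattern visible on the slide.
    ce_id = f"{gatekeeper}:{port}/jobmanager-remotepbs-{queue}"
    lines = [
        f"dn: GlueCEUniqueID={ce_id},mds-vo-name={site},mds-vo-name=local,o=grid",
        f"GlueCEHostingCluster: {gatekeeper}",
        f"GlueCEName: {queue}",
        f"GlueCEUniqueID: {ce_id}",
        f"GlueCEInfoGatekeeperPort: {port}",
        f"GlueCEInfoHostName: {gatekeeper}",
        "GlueCEInfoLRMSType: remotepbs",
        f"GlueCEInfoLRMSVersion: {lrms_version}",
        f"GlueCEInfoTotalCPUs: {total_cpus}",
        "GlueCEInfoJobManager: remotepbs",
        f"GlueCEInfoContactString: {ce_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values taken from the slide.
    print(glue_ce_entry("gridgate.ucd.ie", 2119, "rowan", "mpUCDie", 194, "2.1.8"))
```

A real provider would also need the object classes and the remaining GLUE attributes, which the slide does not show.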

  5. RemotePBS network architecture

  6. Job execution flow
     Remote PBS queue info published by site BDII
     TLGS/RB ↔ GK interaction remains the same
     However, no local queue required on GK
     • Queue name used by remotepbs as lookup to config data
       • Remote submission node name
       • Remote gsisshd port
       • Real remote queue name on RBS
       • Additional PBS server directives (PPN etc.)
     Job script/data constructed on GK
     • Minor modifications
     • Symbolic links are now relative
     • Copied to remote submission node via gsissh
     • Job submitted via gsissh using qsub on remote submission node
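The lookup and submission steps on this slide can be sketched as follows. This is a minimal Python illustration, not the RemotePBS implementation: the host name rbs.example.org, port 2222, queue name batch, the `RemoteQueue` type and both helper functions are hypothetical. It also assumes that gsiscp (the scp counterpart of gsissh in GSI-OpenSSH) performs the copy, whereas the slide only says the copy goes via gsissh. The functions build command lines and run nothing.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class RemoteQueue:
    host: str          # remote submission node name
    port: int          # remote gsisshd port
    queue: str         # real queue name on the RBS
    directives: list = field(default_factory=list)  # extra qsub arguments


# Advertised queue name -> remote settings (all values are made up).
CONFIG = {
    "rowan": RemoteQueue(
        host="rbs.example.org",
        port=2222,
        queue="batch",
        directives=["-l", "nodes=1:ppn=2"],
    ),
}


def copy_command(advertised, local_script, remote_dir):
    """Command that would copy the job script to the remote submission node."""
    rq = CONFIG[advertised]
    return ["gsiscp", "-P", str(rq.port), local_script, f"{rq.host}:{remote_dir}/"]


def submit_command(advertised, remote_script):
    """Command that would run qsub on the remote submission node via gsissh."""
    rq = CONFIG[advertised]
    qsub = ["qsub", "-q", rq.queue, *rq.directives, remote_script]
    # The remote command is passed as one string, quoted for the remote shell.
    return ["gsissh", "-p", str(rq.port), rq.host, shlex.join(qsub)]


if __name__ == "__main__":
    print(copy_command("rowan", "/tmp/job.sh", "jobs"))
    print(submit_command("rowan", "jobs/job.sh"))
```

Keeping the advertised queue name as the only lookup key mirrors the slide's point that the gatekeeper needs no local queue of its own.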

  7. Job status
     Job ID tracked by GK
     Monitor process on GK looks up all jobs
     • Iterates over all remote jobs
     • Gets unique remote host/queue name pairs
     • Runs qstat via gsissh on all unique hosts for user jobs
     • Removes completed jobs
     • Safe clean-up of job data on RBS
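One way to picture the monitor loop on this slide is the sketch below. It is an illustration under assumptions, not the actual monitor: the tuple layout `(user, host, queue, job_id)`, the function names and the default port 2222 are hypothetical, and it treats a tracked job that no longer appears in the qstat listing as completed, which the slide does not state.

```python
from collections import defaultdict


def group_jobs(jobs):
    """Group tracked jobs by unique (host, queue) pair."""
    by_target = defaultdict(list)
    for user, host, queue, job_id in jobs:
        by_target[(host, queue)].append((user, job_id))
    return dict(by_target)


def qstat_commands(jobs, port=2222):
    """One gsissh/qstat command line per unique remote host (built, not executed)."""
    hosts = sorted({host for _user, host, _queue, _job_id in jobs})
    return [["gsissh", "-p", str(port), host, "qstat"] for host in hosts]


def completed(tracked_ids, qstat_ids):
    """Tracked job IDs that are absent from the qstat listing (assumed finished)."""
    return sorted(set(tracked_ids) - set(qstat_ids))


if __name__ == "__main__":
    jobs = [
        ("alice", "h1", "q1", "1.h1"),
        ("bob", "h1", "q1", "2.h1"),
        ("alice", "h2", "q2", "3.h2"),
    ]
    print(group_jobs(jobs))
    print(qstat_commands(jobs))
    print(completed(["1.h1", "2.h1"], ["2.h1"]))
```

Querying each host once, rather than once per job, keeps the number of gsissh connections proportional to the number of remote systems.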

  8. RBS setup
     Gsisshd from VDT
     • RBS needs host cert
     • Config can limit connections to only those from GK
     Shared home directory on RBS/W
     Modules (optional)
     • User grid context can be determined at login to RBS
     • Grid environment set up
     • “module load grid.ie” implicit with gsissh connection
     • Allows static user to use local batch + grid access

  9. Current issues
     JM doesn’t yet implement access control on users/VO
     • Globus monitoring process connects for invalid user
     Lifetime of LCG-CE
     • Move to gLite-CE(?)
     • Timeframe to implement equivalent + improvements
     • gLite-CE BLAHP could simplify matters
     Independent pool accounts not yet possible
     • Username and $HOME must be same on GK and RBS
     • Use static accounts
     • Need to implement pool on CE + pool or static on RBS
     gsissh needs quick timeout
     • RBS responsive?
     APEL accounting records
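The "quick timeout" item could, for example, be approached by wrapping each remote call in a hard time limit. The sketch below is one possible approach and not something the slides describe: `responds_within` is a hypothetical helper, and the demonstration uses the local Python interpreter as a stand-in for a gsissh command so that it runs anywhere.

```python
import subprocess
import sys


def responds_within(command, timeout_s):
    """Run a command and report whether it exits successfully within timeout_s seconds."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it runs past the limit
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-ins for a responsive and an unresponsive remote system.
    print(responds_within([sys.executable, "-c", "pass"], 10))
    print(responds_within([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

In practice the command would be a gsissh invocation, and a failed check would mark the RBS as unresponsive for that monitoring pass.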

  10. Summary
     • RemotePBS
       • Implements secure execution of grid jobs on RBS
       • Separate administrative domains
       • Single gatekeeper, multiple RBS model
       • Accommodates Compute Centres with headnode-only model
       • A work in progress, but used at three production EGEE sites
     • Acknowledgements
       • David Golden (UCD & DIAS)
       • Maarten Litmaath & David Smith (CERN)
       • Alastair McKinstry (ICHEC)
       • Stephane Dudzinski (DIAS & TCD)
       • CosmoGrid project consortium
