
Challenges Running an NFSv4-backed OSG Cluster

Kevin Coffman (kwc@citi.umich.edu), Center for Information Technology Integration, University of Michigan. Overview: basic NFSv4 in production; Open Science Grid (OSG) overview; OSG installation; OSG configuration; submitting a job.


Presentation Transcript


  1. Kevin Coffman kwc@citi.umich.edu Center for Information Technology Integration University of Michigan Challenges Running an NFSv4-backed OSG Cluster

  2. Overview • Basic NFSv4 in production • Open Science Grid (OSG) Overview • OSG Installation • OSG Configuration • Submitting a job! • Authentication differences (AFS vs. NFSv4) • Authentication futures

  3. Basic NFSv4 file service in production • Basic file storage • User name mappings • Home directories • Kernel builds, etc.

  4. Open Science Grid Overview • Architecture • Head node & worker nodes • Core is NSF Middleware Initiative (including Globus, Condor, kx.509) • Authentication • X.509, kx.509, proxy certs • No cluster file-system required • “Home”, Base, Data, Apps, Temp, Worker node temp

  5. OSG Installation • New Linux kernels, new NFSv4 code, new OSG releases, repeat! • Base installation is done solely on head node • Credentials needed • Root access assumed for local file system access • Mapping machine cred now necessary • Kerberos credentials for NFS file system access • Name-to-UID mapping issues • Found the need for tools/scripts for flushing mappings
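
The name-to-UID issue above can be illustrated with a small model. NFSv4 carries file owners as `user@domain` strings, and each side translates them to local UIDs and caches the result, which is why a stale cache has to be flushed after accounts change. The sketch below is a toy stand-in, not the real idmapper: the class `NameMapper`, the dictionary-backed account table, and the fallback UID 65534 are all assumptions made for illustration.

```python
from typing import Dict

NOBODY_UID = 65534  # assumption: conventional "nobody" UID on many Linux systems


class NameMapper:
    """Toy model of an NFSv4 name-to-UID mapper with a cache (hypothetical)."""

    def __init__(self, domain: str, accounts: Dict[str, int]) -> None:
        self.domain = domain.lower()
        self.accounts = accounts          # stand-in for passwd/LDAP lookups
        self.cache: Dict[str, int] = {}   # cached owner-string -> UID results

    def name_to_uid(self, owner: str) -> int:
        """Map 'user@domain' to a local UID, falling back to nobody."""
        if owner in self.cache:
            return self.cache[owner]
        user, sep, domain = owner.rpartition("@")
        if not sep or domain.lower() != self.domain:
            uid = NOBODY_UID              # wrong or missing domain
        else:
            uid = self.accounts.get(user, NOBODY_UID)
        self.cache[owner] = uid
        return uid

    def flush(self) -> None:
        """Drop cached mappings so later lookups see new accounts."""
        self.cache.clear()
```

In this model, a name looked up before its account exists stays mapped to nobody until `flush()` is called, which mirrors the need for flushing tools noted on the slide.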

  6. OSG Configuration • Daemons (e.g., MonALISA and Condor) on head node and worker nodes require authentication for file system access • Keytabs • More name-to-UID mapping required • Virtual Organization (VO) accounts • DN to UNIX account name via grid-mapfile • Name-to-UID mappings required for file system access
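
As a rough illustration of the DN-to-account step, the sketch below parses grid-mapfile text into a dictionary. It assumes the common layout of one quoted certificate DN per line followed by a local account name (or a comma-separated list, of which the first is taken). The function name `parse_grid_mapfile` and the sample DN are made up for the example.

```python
import shlex
from typing import Dict


def parse_grid_mapfile(text: str) -> Dict[str, str]:
    """Return a {certificate DN: local account} map from grid-mapfile text."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = shlex.split(line)         # handles the quoted DN
        if len(parts) < 2:
            continue                      # malformed entry: no account given
        dn, accounts = parts[0], parts[1]
        mapping[dn] = accounts.split(",")[0]
    return mapping


# Hypothetical entry; the DN and account name are illustrative only.
SAMPLE = '''
# VO account mappings
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345" osgvo
'''
```

The account name returned here still has to resolve to a UID on every node that touches the file system, which is the second mapping the slide refers to.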

  7. Submitting a job! • Job submission uses X.509 authentication • Need Kerberos authentication for file-system access • Gatekeeper forks a job manager process for each job • Job manager process runs as the original user and needs the user’s credentials • Verified that this works as expected with AUTH_SYS, which does not require Kerberos credentials

  8. MGRID Architecture [Diagram. Labeled components: user workstation (browser, kinit, kx509, libpkcs11); KDC, KCA, KCT; MGRID portal (Apache with SSL, client certificate required; mod_ssl, mod_kct, mod_kx509, mod_jk, mod_php; Tomcat; CHEF); grid resource (GateKeeper, resource manager, authorization); LDAP. Labeled protocols: Kerberos V5, GSI, SASL. Numbered flows 1 to 8.]

  9. Grid job authentication issues • Jobs scheduled to run in the future • Long-running jobs (refreshing credentials) • Combination of both (future and long-running) • Distribution of user credentials to worker nodes for file system access
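
The future-start and long-running cases come down to simple arithmetic on credential lifetimes. The sketch below counts how many refreshes a job would need, under the simplifying assumption that each refresh grants a fresh full lifetime starting exactly when the previous credential expires. The function name and all time values are illustrative, not taken from the talk.

```python
import math
from datetime import datetime, timedelta


def refreshes_needed(issued: datetime, lifetime: timedelta,
                     job_start: datetime, job_duration: timedelta) -> int:
    """Count the refreshes needed to keep a credential valid until the job ends."""
    job_end = job_start + job_duration
    expires = issued + lifetime
    if expires >= job_end:
        return 0                          # original credential covers the job
    uncovered = job_end - expires
    return math.ceil(uncovered / lifetime)


# Illustrative values: a 10-hour credential, a job queued to start 8 hours
# after submission and expected to run for 30 hours.
issued = datetime(2006, 1, 1, 0, 0)
count = refreshes_needed(issued, timedelta(hours=10),
                         issued + timedelta(hours=8), timedelta(hours=30))
```

With these numbers the job ends 38 hours after issue, so three refreshes are required; each one has to happen somewhere that holds, or can obtain, the user's credentials.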

  10. Authentication differences (AFS vs. NFSv4)

  11. Current Architecture [Diagram. Client and server, each divided into user and kernel space. Labeled components: KDC (AS, TGS), user process, GSSD, SVC GSSD, two gss context caches, NFS, NFSD, credentials on disk, keytab. Numbered flows 1 to 13.]

  12. Authentication futures • SPKM3 • Allows us to stay in X.509 world • Anonymous (DH) • Certificate on server to prevent man-in-the-middle (MITM) attacks • X.509 Certificates • LIPKEY • Built on top of SPKM3 • Allows TLS-like password authentication

  13. Linux kernel keys support (a.k.a. keyring) • General credential storage in-kernel • thread-specific keyring • process-specific keyring • session-specific keyring (PAG-like via JOIN_SESSION_KEYRING) • Different key types: ‘user’, ‘rpcsec_gss context’ • Create, delete, link, search, revoke, etc. • Quotas and permissions • Referenced by serial # and description
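
The three keyrings listed above are searched in a fixed order when a key is requested: thread, then process, then session. The toy model below shows that order with plain dictionaries and an optional callback standing in for the user-space upcall. It does not call the kernel API, and the function and parameter names are assumptions for illustration.

```python
from typing import Callable, Dict, Optional

Keyring = Dict[str, bytes]  # toy keyring: key description -> payload


def request_key(description: str,
                thread_keyring: Keyring,
                process_keyring: Keyring,
                session_keyring: Keyring,
                upcall: Optional[Callable[[str], Optional[bytes]]] = None
                ) -> Optional[bytes]:
    """Search thread, process, then session keyrings; fall back to an upcall."""
    for keyring in (thread_keyring, process_keyring, session_keyring):
        if description in keyring:
            return keyring[description]   # first match wins
    if upcall is not None:
        # Stand-in for handing the request to a user-space helper.
        # The result is returned but not stored in any keyring here.
        return upcall(description)
    return None
```

Because the most specific keyring is consulted first, a key placed in a thread keyring shadows one with the same description in the session keyring.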

  14. MIT Kerberos ccache using keyring as backing storage • Assumes a single “active” credentials cache • Can store more than one ccache in same session keyring • All user-level code

     Session
     |
     +---> krb5_cc_active (key: contains 0x00004f12)
     |
     +---> /tmp/krb5cc_20010_XF45C2 (keyring: id is 0x000023cd)
     |     |
     |     +---> kwc@CITI.UMICH.EDU (principal info)
     |     +---> krbtgt/CITI.UMICH.EDU@CITI.UMICH.EDU
     |     +---> nfs/screamer.citi.umich.edu@CITI.UMICH.EDU
     |     +---> nfs/troy.citi.umich.edu@CITI.UMICH.EDU
     |     +---> pop/citi.umich.edu@CITI.UMICH.EDU
     |     +---> afs@CITI.UMICH.EDU
     |
     +---> /tmp/krb5cc_20010_umich (keyring: id is 0x00004f12)
           |
           +---> kwc@UMICH.EDU (principal info)
           +---> krbtgt/UMICH.EDU@UMICH.EDU
           +---> imap/tremors.itd.umich.edu@UMICH.EDU
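
To make the layout concrete, the sketch below models the session keyring from the slide as Python data and resolves the active ccache through the ID held in `krb5_cc_active`. The IDs, names, and principals are copied from the slide; the dictionary structure and the helper functions are assumptions for illustration and do not reflect how the MIT Kerberos code is organized.

```python
from typing import Dict, List, Optional

# Session keyring contents, keyed by keyring ID as shown on the slide.
CCACHES: Dict[int, Dict[str, object]] = {
    0x000023CD: {
        "name": "/tmp/krb5cc_20010_XF45C2",
        "principal": "kwc@CITI.UMICH.EDU",
        "tickets": [
            "krbtgt/CITI.UMICH.EDU@CITI.UMICH.EDU",
            "nfs/screamer.citi.umich.edu@CITI.UMICH.EDU",
            "nfs/troy.citi.umich.edu@CITI.UMICH.EDU",
            "pop/citi.umich.edu@CITI.UMICH.EDU",
            "afs@CITI.UMICH.EDU",
        ],
    },
    0x00004F12: {
        "name": "/tmp/krb5cc_20010_umich",
        "principal": "kwc@UMICH.EDU",
        "tickets": [
            "krbtgt/UMICH.EDU@UMICH.EDU",
            "imap/tremors.itd.umich.edu@UMICH.EDU",
        ],
    },
}
KRB5_CC_ACTIVE = 0x00004F12  # payload of the krb5_cc_active key


def active_principal(active_id: int, ccaches: Dict[int, Dict[str, object]]) -> str:
    """Follow the active-ccache pointer and return its principal."""
    return str(ccaches[active_id]["principal"])


def ccache_holding(service: str, ccaches: Dict[int, Dict[str, object]]) -> Optional[str]:
    """Return the name of the first ccache holding a ticket for service."""
    for cache in ccaches.values():
        tickets: List[str] = cache["tickets"]  # type: ignore[assignment]
        if service in tickets:
            return str(cache["name"])
    return None
```

Here the active pointer selects the UMICH.EDU cache, while the NFS service tickets live in the other one, which shows why a single "active" cache is a limiting assumption when several ccaches share a session.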

  15. Mount using keyring support • Mount program will use keytab to set up machine credentials in keyring • /sbin/request-key invoked and finds machine credentials • Context is negotiated and “rpcsec_gss context” key instantiated

  16. User access using keyring support • Assumes they have credentials in keyring via kinit or PAM • No more looking around blindly for creds in filesystem • /sbin/request-key invoked and finds user’s session-specific credentials

  17. Keyring issues • Upcalls from asynchronous events • Still need to tie “rpcsec_gss context” keys to Kerberos credentials

  18. Future Architecture [Diagram. Client and server, each divided into user and kernel space. Labeled components: KDC (AS, TGS), user process, request-key handler, SVC GSSD, TGT, gss context cache (in keyring), gss context cache, NFS, NFSD, keytab. Numbered flows 1 to 11.]

  19. Questions / Discussion http://www.citi.umich.edu/projects

  20. References • Open Science Grid • http://www.opensciencegrid.org • MonALISA • http://monalisa.cacr.caltech.edu • Condor • http://www.cs.wisc.edu/condor • Keyring • Kernel Source: /Documentation/keys.txt

  21. Backup Slides

  22. Krb5: Obtaining gss context • TGT: currently stored in file system • Per NFSD service ticket: currently stored in file system • GSSD locates user credentials by convention (/tmp/krb5cc_uid) • Synchronizing gss_context and credential problematic
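
The "by convention" lookup can be sketched as a directory scan for files named after the UID. This is a simplified illustration and not the GSSD source: the function name, the rule of accepting `krb5cc_<uid>` or `krb5cc_<uid>_<suffix>`, the ownership check, and the newest-file tie-break are all assumptions.

```python
import os
from typing import Optional


def find_ccache(uid: int, directory: str = "/tmp") -> Optional[str]:
    """Return the newest credentials cache file for uid, or None if absent."""
    prefix = f"krb5cc_{uid}"
    best_path: Optional[str] = None
    best_mtime = -1.0
    for name in os.listdir(directory):
        # Accept krb5cc_<uid> or krb5cc_<uid>_<suffix>, but not a longer UID.
        if name != prefix and not name.startswith(prefix + "_"):
            continue
        path = os.path.join(directory, name)
        info = os.lstat(path)             # lstat: do not follow symlinks
        if not os.path.isfile(path) or os.path.islink(path):
            continue
        if info.st_uid != uid:
            continue                      # ignore files planted by other users
        if info.st_mtime > best_mtime:
            best_path, best_mtime = path, info.st_mtime
    return best_path
```

A scan like this can only guess which cache the user intended, and nothing ties the file it picks to an existing gss context, which is the synchronization problem the slide points out.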

  23. Linux credential interface • New system calls for kernel credential placement • Available for upcoming PAG implementation • Passed via upcall to GSSD • Credential vs. gss context management no longer a problem

  24. Linux Krb5 kernel credential • Pass TGT to kernel as credential • Stored in user process (PAG) • Passed to GSSD via gss_init_sec_context upcall • GSSD manages Krb5 NFSD service tickets • Multiple in-kernel TGTs vs. cross-realm authentication

  25. Client: LIPKEY with SPKM3 • Initiator • Anonymous SPKM3 client • Credential: • LIPKEY username and password • sent to server encrypted in SPKM3 session key • Context • per <user, nfsd> LIPKEY(?) and SPKM3 gss context

  26. Linux LIPKEY kernel credential • LIPKEY credential (username and password) is per server. • Not stored in kernel • Instead, store information to be passed to GSSD which will prompt user for LIPKEY password for each NFSD.

  27. Client: SPKM with X.509 • Initiator • password for long-term user X.509 private key • Credential • short-term proxy X.509 credential and private key (grid-proxy-init) • Context • per <user, nfsd> SPKM gss context

  28. Linux SPKM kernel credential • Pass proxy (short-term) X.509 credential and private key to kernel as credential • Stored in user process (PAG) • Passed to GSSD via gss_init_sec_context upcall • GSSD manages CA hierarchy and credential checking
