
WestGrid Overview






Presentation Transcript


  1. WestGrid Overview Dr. Rob Simmonds Distributed Systems Architect

  2. Talk Overview • The WestGrid project • The WestGrid HPTC resources • Grid services for HPTC and how they will be used in WestGrid

  3. WestGrid Project • 8 institutions • More than 250 researchers • Technical and operational officers • HPTC: compute resources and storage • Visualization and collaboration

  4. WestGrid People • PIs • Jonathan Borwein (SFU), Gren Patey (UBC), Jonathan Schaeffer (UofA), Brian Unger (UofC), Mike Vetterli (SFU/TRIUMF) • HPC planning committee • Rob Balantyne, Matthew Choptuik, Corrie Kost, Harold Esche, Paul Lu, Richard Marchand, Seamus O'Shea, Mark Thachuk, Ron Senda, Martin Siegert, Rob Simmonds, Mike Vetterli • Visualization planning committee • Lyn Bartram, Kelly Booth, Pierre Boulanger, Brian Corrie, Sara Diamond, Larry Katz, John MacDonald, Trever Woods • CAO • Ken Hewitt

  5. WestGrid HPTC Resources • 140TB IBM storage server (Power4/AIX) • 1008-processor IBM cluster (IA-32/Linux) • 256-processor SGI Origin (MIPS/Irix) • 144-processor HP SC45 (Alpha/Tru64) • All connected by Canada’s world-class networks

  6. Grid Computing • A “Grid” is a set of software services • Combines meta-computing, resource discovery and security • Designed to enable access to resources in different management domains • Grid services will enable WestGrid resources to be integrated into individual researchers’ computing environments

  7. Grid Standardization • The Global Grid Forum (GGF) is working to provide standards • The Open Grid Services Architecture (OGSA) defines low-level Grid services

  8. Grid toolkits • Globus (Public domain – ANL/ISI) • Version 2.x currently used for production • Version 3 provides a reference implementation of OGSA • Legion (Commercial – Avaki) • Provides more support for data handling • Will support OGSA

  9. Grid Security Infrastructure • Ability for trusted users to access remote resources without re-authentication • Ability for trusted jobs to access remote resources without re-authentication • Protection against stolen credentials • Avoid requirement for dedicated, highly available security server(s)

  10. Certificate Authority Model • CA issues certificates to trusted users and services • Certificates used to authenticate with remote resources that trust issuing CA • Grid Canada CA will be trusted by WestGrid resources

  11. GSI Proxy Certificates • User credentials delegated from user certificate to proxy certificate • Proxy certificate used for authentication • Proxy certificates have limited lifetime • can also be limited to only authenticate with certain services • Proxy certificate copied to remote resource when job is started
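
The limited lifetime of a proxy can be illustrated with a small model. This is a minimal sketch, assuming a proxy is described only by its subject, issue time and lifetime; the `ProxyCertificate` class and its 12-hour default are hypothetical and ignore signatures and service restrictions entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProxyCertificate:
    """Hypothetical model of a GSI proxy: only its validity window is tracked."""
    subject: str
    issued_at: datetime
    lifetime: timedelta = timedelta(hours=12)  # assumed default, for illustration only

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.lifetime

    def time_left(self, now: datetime) -> timedelta:
        """Remaining validity, clamped at zero once the proxy has expired."""
        return max(self.expires_at - now, timedelta(0))

    def is_valid(self, now: datetime) -> bool:
        """The proxy only authenticates inside its validity window."""
        return self.issued_at <= now < self.expires_at
```

The short window is what limits the damage from a stolen proxy: once `expires_at` passes, the copy is useless even though the user’s long-term key was never exposed.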

  12. Globus Security Commands • Users can request a certificate using ‘grid-cert-request’ • This creates userkey.pem and usercert_request.pem in ~/.globus/ • Certificate request file sent to CA • usercert.pem is returned and placed in ~/.globus/ Aim to automate this process for WestGrid users
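
The three files named above are enough to tell how far a user has got in the request process, which is the kind of check an automated WestGrid procedure might start from. A minimal sketch, assuming only those filenames; `certificate_state` and its return labels are hypothetical, and it looks only at which files exist, not at their contents.

```python
from pathlib import Path


def certificate_state(globus_dir: Path) -> str:
    """Hypothetical helper: classify a ~/.globus directory by the files present."""
    key = globus_dir / "userkey.pem"
    request = globus_dir / "usercert_request.pem"
    cert = globus_dir / "usercert.pem"
    if key.exists() and cert.exists():
        return "ready"        # the CA's signed usercert.pem has been installed
    if key.exists() and request.exists():
        return "awaiting CA"  # grid-cert-request has run; request not yet signed
    return "no request"       # nothing usable yet
```

Typical use would be `certificate_state(Path.home() / ".globus")`.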

  13. Globus Security – Cont. • Proxy certificate created using ‘grid-proxy-init’ • Proxy certificate examined using ‘grid-proxy-info’ • Proxy certificate destroyed using ‘grid-proxy-destroy’ Proxy certificates could be created during login process

  14. GSI initialization demo …

  15. Enabling Access to Resources • Holding a certificate from a trusted CA does not guarantee access to resources • Users are given access to a resource by being included in the resource’s grid-mapfile • This allows the owner of a resource to choose which users may use it • The grid-mapfile maps a Grid user to a local account
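
A grid-mapfile is a plain-text table from certificate subjects to local accounts. The sketch below assumes the common layout of a quoted distinguished name (DN), whitespace, then one or more comma-separated account names with no spaces, with `#` starting a comment line; `parse_grid_mapfile` and the sample DNs are hypothetical.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Map each certificate subject (DN) to the local account(s) it may use.

    Assumed layout per line: "<quoted DN>" account[,account...]
    Blank lines and lines starting with '#' are skipped.
    """
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, accounts = shlex.split(line)  # shlex keeps the quoted DN in one token
        mapping[dn] = accounts.split(",")
    return mapping
```

Keeping the mapping in a file on each resource is what lets the resource owner, rather than the CA, decide who gets in.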

  16. Globus Job Starting • Run a job on a remote resource using ‘globus-job-run <host> <program>’ • <host> must trust the CA that signed the user’s certificate, and the user must be listed in the grid-mapfile • The proxy certificate is copied to the GASS cache on <host> so the program can authenticate with other remote resources
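
The two conditions a host checks before starting a job can be written as one function. A minimal sketch, assuming certificate signature verification has already succeeded and that the issuing CA and user DN are available as strings; `authorize` is hypothetical and the CA names in the usage are shortened labels rather than real CA DNs.

```python
from typing import Optional


def authorize(issuer: str, subject: str,
              trusted_cas: set[str],
              grid_map: dict[str, str]) -> Optional[str]:
    """Return the local account a job would run as, or None if refused.

    Check 1: the host must trust the CA that signed the certificate.
    Check 2: the user's DN must appear in the host's grid-mapfile.
    """
    if issuer not in trusted_cas:
        return None               # certificate from an untrusted CA
    return grid_map.get(subject)  # None when the DN is not mapped
```

For example, a certificate from "Grid Canada CA" for a mapped DN yields that DN’s local account, while the same DN from any other CA is refused.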

  17. Batch Job Starting • ‘globus-job-submit <host> <program>’ • Returns a URL used to query the job • ‘globus-job-status <url>’ • Find out whether the job is waiting, running or finished • ‘globus-job-get-output <url>’ • Get output produced by the job; this is stored in the GASS cache on the host where the job runs • ‘globus-job-clean <url>’ • Remove the GASS cache entry for the job in question
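
The submit, poll, fetch and clean cycle above usually ends up wrapped in a polling loop. A sketch under stated assumptions: the status lookup is passed in as a function so it can be backed by ‘globus-job-status <url>’ or by a stub, and the terminal status strings are a parameter because the exact text the real command prints is not given here. `wait_for_job` and `status_via_cli` are hypothetical names.

```python
import subprocess
import time
from typing import Callable


def status_via_cli(url: str) -> str:
    """Run 'globus-job-status <url>' and return its trimmed output.

    Requires the Globus client tools on PATH.
    """
    result = subprocess.run(["globus-job-status", url],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_for_job(url: str, get_status: Callable[[str], str],
                 interval: float = 30.0, timeout: float = 3600.0,
                 stop_on: frozenset[str] = frozenset({"finished"})) -> str:
    """Poll get_status(url) until it reports a status in stop_on.

    Raises TimeoutError if the job has not stopped within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(url)
        if status in stop_on:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {url} still {status!r} after {timeout}s")
        time.sleep(interval)
```

Once `wait_for_job` returns, the output can be fetched with ‘globus-job-get-output <url>’ and the cache entry removed with ‘globus-job-clean <url>’.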

  18. GridFTP • ‘globus-url-copy <original> <copy>’ • Copies file from one location to another • file:/<filename> - a file on a local file-system • gsiftp://<host>/<filename> - a file on GridFTP server <host> • Extensions to standard FTP include • Third party transfers • Parallel transfers
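
The URL forms above also determine what kind of transfer takes place: when both ends are ‘gsiftp://’ URLs, the client only brokers the copy and the data moves directly between the two servers, which is a third-party transfer. A minimal sketch; `copy_command` and `is_third_party` are hypothetical helpers that only build and inspect arguments, and the hostnames in the usage are placeholders.

```python
from urllib.parse import urlparse


def copy_command(source: str, dest: str) -> list[str]:
    """Build a 'globus-url-copy <original> <copy>' argument list."""
    for url in (source, dest):
        if urlparse(url).scheme not in ("file", "gsiftp"):
            raise ValueError(f"unsupported URL scheme: {url}")
    return ["globus-url-copy", source, dest]


def is_third_party(source: str, dest: str) -> bool:
    """True when both ends are GridFTP servers, so data flows server to server."""
    return (urlparse(source).scheme == "gsiftp"
            and urlparse(dest).scheme == "gsiftp")
```

The resulting list can be passed to `subprocess.run` on a machine with the Globus tools installed.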

  19. Credential Repository • NCSA’s MyProxy server provides an on-line credential repository • User stores proxy certificate in repository • This certificate can be long lived • User can later recover a short lived certificate from the repository
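
The long-lived/short-lived split can be modelled as a simple lifetime calculation. A sketch, assuming a retrieved proxy can never outlive the stored credential it is derived from; `StoredCredential` and `issue_short_lived` are hypothetical, and a real MyProxy server also applies its own policy limits, which are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredCredential:
    """Hypothetical repository entry: a long-lived proxy held by the server."""
    owner: str
    expires_at: datetime


def issue_short_lived(stored: StoredCredential, now: datetime,
                      requested: timedelta) -> datetime:
    """Return the expiry time of a new short-lived proxy derived from `stored`.

    The lifetime is the smaller of what was requested and what remains
    on the stored credential.
    """
    if now >= stored.expires_at:
        raise ValueError("stored credential has expired")
    return min(now + requested, stored.expires_at)
```

This keeps the long-lived credential on the server while users and services only ever carry short-lived copies.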

  20. Credential Repository Uses • Used to authenticate with the Grid environment when the user does not have access to their certificate • e.g., in a Web portal • Could be used to authenticate and get a proxy certificate during the login process, eliminating the need for Unix passwords

  21. MyProxy Commands • myproxy-init -s <host> • Put a proxy certificate into the MyProxy server on <host> • The host can also be specified with the MYPROXY_SERVER environment variable • myproxy-info -s <host> • View information about the user’s proxy certificate • myproxy-get-credential • Get a proxy certificate • myproxy-destroy • Remove the proxy certificate from the MyProxy server
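
Choosing the server for these commands follows a simple precedence: an explicit ‘-s <host>’ wins, otherwise the MYPROXY_SERVER environment variable is used. A minimal sketch that only builds argument lists; `myproxy_command` is hypothetical, the action names are limited to the four commands listed above, and the hostnames in the usage are placeholders.

```python
import os
from typing import Mapping, Optional


def myproxy_command(action: str, host: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> list[str]:
    """Build e.g. ['myproxy-init', '-s', host] for one of the listed commands.

    An explicit host takes precedence over MYPROXY_SERVER.
    """
    if action not in {"init", "info", "get-credential", "destroy"}:
        raise ValueError(f"unknown MyProxy action: {action}")
    server = host or env.get("MYPROXY_SERVER")
    if server is None:
        raise ValueError("no MyProxy host given and MYPROXY_SERVER is unset")
    return [f"myproxy-{action}", "-s", server]
```

Passing `env` explicitly makes the precedence rule easy to check without touching the real environment.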

  22. Inserting Credential

  23. Recovering Credential

  24. MyProxy Certificate Renewal • Allows automated proxy certificate renewal • Special proxy certificate enables trusted service to renew standard proxy certificate • e.g., trust a local scheduler to renew the certificate before starting a job • Should help to prevent users resorting to insecure means for automating proxy renewal
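
A trusted scheduler needs a rule for when to renew. One plausible rule, sketched here as an assumption rather than MyProxy’s own logic, is to renew whenever the proxy would expire before the job’s expected runtime plus a safety margin; `needs_renewal` and the 30-minute margin are hypothetical.

```python
from datetime import timedelta


def needs_renewal(time_left: timedelta, expected_runtime: timedelta,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """Hypothetical policy: renew if the proxy would expire before the job,
    plus a safety margin, has finished."""
    return time_left < expected_runtime + margin
```

Running this check at job start, rather than leaving it to users, removes the temptation to automate renewal by storing passphrases insecurely.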

  25. GSI Enabled SSH Tools • GSI-enabled versions of the OpenSSH tools will be used in WestGrid • gsi-ssh: authenticates through GSI and copies proxy certificates to the remote host • gsi-scp: authenticates through GSI

  26. GSI Enabled SSH

  27. Resource Discovery • Globus uses MDS for resource discovery • GRIS – provides information about individual hosts • GIIS – provides information about groups of hosts • In WestGrid each of the 4 major resources will run a GRIS • At least one GIIS will be provided to hold aggregate information • Probably use one per site

  28. MDS • Publish information to LDAP servers • Information used by Grid services to locate needed resources • Publish information such as • Type(s) of job scheduler available • Parameters accepted by job scheduler • Number of processors • Amount of RAM, disk or tape • Software and license availability
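
The GRIS/GIIS split is essentially "publish per host, aggregate, then query". A minimal sketch that replaces MDS’s LDAP entries with plain Python records; `ResourceRecord`, `aggregate` and `find` are hypothetical, and the sample data reuses the processor counts of the three compute resources listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical, much-simplified stand-in for what one GRIS publishes."""
    name: str
    processors: int
    platform: str


def aggregate(gris_reports: list[list[ResourceRecord]]) -> list[ResourceRecord]:
    """GIIS role: merge the records reported by several GRIS instances."""
    return [record for report in gris_reports for record in report]


def find(giis: list[ResourceRecord], min_processors: int) -> list[str]:
    """Names of resources offering at least `min_processors` processors."""
    return [r.name for r in giis if r.processors >= min_processors]
```

A Grid service looking for a machine able to run a 200-processor job would query the aggregate view once instead of contacting every host.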

  29. MDS Example

  30. Meta-scheduling • A meta-scheduler is used to submit jobs to other job schedulers • WestGrid will employ meta-scheduling • Condor-G, Silver and Trellis are under consideration • Multiple meta-schedulers could be used • Hierarchical meta-scheduling can be employed
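
Hierarchical meta-scheduling works because a meta-scheduler can present the same interface as the schedulers it submits to. A minimal sketch of that idea; the classes are hypothetical, and the shortest-queue placement policy is an illustrative assumption, not the policy of Condor-G, Silver or Trellis.

```python
from typing import Protocol


class Scheduler(Protocol):
    def queue_length(self) -> int: ...
    def submit(self, job: str) -> str: ...


class LocalScheduler:
    """Stand-in for a site batch scheduler; it just records queued jobs."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.queue: list[str] = []

    def queue_length(self) -> int:
        return len(self.queue)

    def submit(self, job: str) -> str:
        self.queue.append(job)
        return self.site  # report where the job landed


class MetaScheduler:
    """Sends each job to the child scheduler with the shortest queue.

    Children may themselves be MetaSchedulers, giving a hierarchy.
    """

    def __init__(self, children: list[Scheduler]) -> None:
        self.children = children

    def queue_length(self) -> int:
        return sum(c.queue_length() for c in self.children)

    def submit(self, job: str) -> str:
        target = min(self.children, key=lambda c: c.queue_length())
        return target.submit(job)
```

Because `MetaScheduler` satisfies the same `Scheduler` protocol, a top-level instance can treat a site-level meta-scheduler exactly like a single batch queue.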

  31. Condor-G • Can be used to submit jobs to specific machines • Can use ‘glideins’ to add resources to local condor pool • New version will include support for batch scheduler advertisements

  32. Condor-G : Glidein Example Movie at http://www.cpsc.ucalgary.ca/~simmonds/EdmontonTalk1/condor_demo1.avi

  33. Result: Solar System Viz Movie at http://www.cpsc.ucalgary.ca/~simmonds/EdmontonTalk1/solarsystem.avi

  34. WestGrid Accounting • Use MDS to publish accounting information from each site to LDAP • WestGrid wide accounting calculated and also published in secure LDAP • Users will be able to gain access to information, filtered by a policy manager

  35. Scheduling Priorities • Plan to use accounting information to provide fairness in scheduling priorities across WestGrid • Feed values calculated using global accounting information back into local batch schedulers
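
Feeding global accounting back into local schedulers needs two steps: sum usage across sites, then turn it into a priority. A sketch assuming each group has an agreed fraction of WestGrid and that usage is reported in CPU-hours; the "share minus used fraction" rule is a swapped-in illustration of fair-share scheduling, not WestGrid’s actual formula, and the site and group names are hypothetical.

```python
from collections import defaultdict


def global_usage(site_reports: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum per-site CPU-hours for each group into WestGrid-wide totals."""
    totals: dict[str, float] = defaultdict(float)
    for usage in site_reports.values():
        for group, hours in usage.items():
            totals[group] += hours
    return dict(totals)


def fair_share_priority(shares: dict[str, float],
                        usage: dict[str, float]) -> dict[str, float]:
    """Priority = target share minus fraction of total usage consumed.

    Positive values mean a group has used less than its share across all
    sites, so local batch schedulers should favour it.
    """
    total = sum(usage.values())
    return {group: share - (usage.get(group, 0.0) / total if total else 0.0)
            for group, share in shares.items()}
```

With equal shares, a group that consumed 80% of the total CPU-hours ends up with a negative priority everywhere, even at sites where it has run little.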

  36. Data Storage • Grid enabled access to storage • Accessible from researcher’s desktop • Distributed file systems currently limited • Security and caching issues • Data repository systems provide much of the functionality required • SRB from SDSC • Giggle from ISI/ANL

  37. Repository management • Large network available file stores • Annotation – meta-data tagging • Data representation optimization • Files, collections and containers • User level replication aided by catalogs
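
User-level replication relies on a catalog that maps a logical file name to its physical copies. A toy in-memory sketch of that idea; `ReplicaCatalog`, the `lfn:` naming and the hostnames are hypothetical and do not reflect the SRB or Giggle interfaces.

```python
from typing import Optional
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its registered physical copies."""

    def __init__(self) -> None:
        self._replicas: dict[str, list[str]] = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        """Record a replica; registering the same URL twice is a no-op."""
        urls = self._replicas.setdefault(logical_name, [])
        if physical_url not in urls:
            urls.append(physical_url)

    def lookup(self, logical_name: str, prefer_host: Optional[str] = None) -> list[str]:
        """All replicas of a logical file, with copies on `prefer_host` first."""
        urls = list(self._replicas.get(logical_name, []))
        if prefer_host:
            # sort() is stable; False (on the preferred host) sorts before True.
            # urlparse lowercases hostnames, so compare in lowercase.
            urls.sort(key=lambda u: urlparse(u).hostname != prefer_host.lower())
        return urls
```

A client can then copy from the first URL returned, falling back to the others if that server is unavailable.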

  38. Look at SRB

  39. SRB – “S commands”

  40. Wide Area Message Passing • MPICH-G2 enables message passing jobs to run in a Grid environment • Attempts to use the best MPI implementation at each site • Provides process mapping configuration to group tightly coupled processes
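
Grouping tightly coupled processes amounts to knowing which ranks share a site, so that their messages stay on the fast local interconnect and only inter-group traffic crosses the wide-area network. A minimal sketch assuming contiguous rank blocks per site; `assign_ranks` and `group_by_site` are hypothetical and are not the MPICH-G2 configuration mechanism.

```python
def assign_ranks(procs_per_site: dict[str, int]) -> dict[int, str]:
    """Give each site a contiguous block of MPI ranks, in site order."""
    mapping: dict[int, str] = {}
    rank = 0
    for site, count in procs_per_site.items():
        for _ in range(count):
            mapping[rank] = site
            rank += 1
    return mapping


def group_by_site(rank_to_site: dict[int, str]) -> dict[str, list[int]]:
    """Collect ranks per site; communication inside a group stays local."""
    groups: dict[str, list[int]] = {}
    for rank, site in sorted(rank_to_site.items()):
        groups.setdefault(site, []).append(rank)
    return groups
```

An application would then place strongly interacting parts of its decomposition within one group and keep wide-area exchanges to boundaries between groups.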

  41. Web Portals • Enable access to Grid services via web browser • Start a secure session then authenticate this session with GSI using credential server • Web session now acts as you in Grid environment WestGrid mock up

  42. WestGrid mock up

  43. WestGrid mock up

  44. WestGrid mock up

  45. WestGrid mock up

  46. Getting a WestGrid Account • Centralized Web-based account requests • We get a certificate for you, or you use an existing certificate • We set up accounts, install certificates and email you

  47. WestGrid Grid Environment • Initial Grid services use • Globus, MyProxy, OpenSSH, SRB • Services include • Job starting, resource discovery, credential management and repository management • Working on having meta-scheduler(s) • Condor-G, …

  48. Lots of work to do … • Distributed file systems • Improved replica management • Fine-grain security • Performance measurement and analysis • Credential based information discovery • Enhanced meta-scheduling • Workflow

  49. Credits – TeleSim helpers • Mark Fox mfox@cpsc.ucalgary.ca (TeleSim programmer) • Web portals, demo • Andrey Mirchovski mirchov@cpsc.ucalgary.ca (TeleSim research student) • Security and chief Globus critic • Phil Rizk rizkp@cpsc.ucalgary.ca (Hons project student/TeleSim programmer) • MDS, accounting and Web services
