
how Shibboleth can work with job schedulers to create grids to support everyone

Presentation Transcript


  1. how Shibboleth can work with job schedulers to create grids to support everyone H. David Lambert, Stephen Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata Exposing Computational Resources Across Administrative Domains

  2. The Paradox of Grid Computing. Large amounts of computing power go untapped, yet researchers typically cannot find the computing power they need. Resource owners must set policies for the use of their equipment. Users must find and leverage resources that meet their needs.

  3. Secure grid-like installations are not growing beyond small groups of known players. But why? The only method currently available for securing a resource involves personal interaction between resource owners and resource consumers. Granting a user or resource access to a resource requires manually adding that user to a local map file. Various methods of grouping users and resources so they can share certificates have sprung up.

  4. What does this mean? Historically, getting massive quantities of resources onto the grid has been a challenge. However, where potential resource owners are relieved of heavy administrative burdens, they flock to the grid, and when massive numbers of resources are made available to researchers, real work gets accomplished. On the other hand, grids that encourage resource owners to connect their machines to a central portal that runs only specific projects have exploded: SETI@home, United Devices' Grid.org, and IBM's World Community Grid.

  5. Modern job scheduling software includes Condor, Sun Grid Engine (N1), PBS (Pro and Open), and Platform LSF. How are jobs executed?

  6. Job scheduling software is unsurpassed in environments with a single administrative domain: Beowulf clusters, high-performance n-way devices, and intra-campus grids. Unfortunately, as soon as you cross any sort of administrative line, as in inter-campus grids, these products become less robust. Attempts to leverage existing grid tools to handle this have resulted in compromises: groups of users sharing one certificate, user management issues, and accounting issues.

  7. In general, job scheduling software accepts a job description file that describes the work to be done. The job file is free-form text containing name-value pairs, so we can add anything we want to these files, as long as we teach the execution machines to understand the additions.

  8. Example submission file (Condor):

     # Example condor_submit input file
     # (Lines beginning with # are comments)
     Universe    = vanilla
     Executable  = /home/arnie/condor/my_job.condor
     Input       = my_job.stdin
     Output      = my_job.stdout
     Error       = my_job.stderr
     Arguments   = -arg1 -arg2
     InitialDir  = /home/arnie/condor/run_1
     Queue
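  Not part of the original slide, but for context: a file like this is handed to the local scheduler from the command line. The file name below is assumed for illustration.

     condor_submit my_job.submit    # hand the description file to the local schedd
     condor_q                       # check the status of the queued job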

  9. Condor in the Beowulf, supercomputer, or campus grid world: the user has an account on the cluster or high-performance device, and all nodes are in a closely controlled administrative domain.

     Universe    = vanilla
     Executable  = /home/arnie/condor/my_job.condor
     Input       = my_job.stdin
     Output      = my_job.stdout
     Error       = my_job.stderr
     Arguments   = -arg1 -arg2
     InitialDir  = /home/arnie/condor/run_1
     Queue

  10. Condor grid with flocking. [Diagram: the schedd on a submit machine talks to its own central manager (CONDOR_HOST) and to the collector/negotiator pairs on the Pool-Foo and Pool-Bar central managers.] “Flocks” are introduced to each other by hostname or IP address.
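  As a rough sketch of how that introduction is configured (the host names below are hypothetical, not from the slides): the submitting pool lists the remote central managers it may flock to, and the pool being flocked to lists who may flock in.

     # condor_config on the submit side
     FLOCK_TO   = cm.pool-foo.example.edu, cm.pool-bar.example.edu

     # condor_config in the pool being flocked to
     FLOCK_FROM = submit.pool-home.example.edu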

  11. Job scheduling with conventional “grid” products: Globus and Condor-G. The user submits a job via a Globus-enabled version of Condor. Any number of resources “on the grid” accept jobs from the Globus Gatekeeper, which hands them to Globus Job Managers for distribution to the resources. Each resource must physically map a Globus X.509 certificate to a local user account.
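  That mapping typically lives in a grid-mapfile on each resource. A minimal sketch, with a made-up distinguished name and local account:

     # /etc/grid-security/grid-mapfile (one line per authorized user)
     "/C=US/O=Example Grid/OU=Site A/CN=Jane Researcher" jresearcher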

  12. User and Resource Management Problems: a summary of the limitations in the previous examples. How does the owner of a grid resource grant access to large numbers of individuals? How does the owner of a grid resource know when a user who was granted access through membership in an organization leaves that organization? How does a user easily get added to a resource? How does a user find available resources?

  13. While Condor was already able to leverage user attributes from a local LDAP store, this project demonstrates for the first time that Condor can consume user attributes from a remote store. SAML-based solutions give a resource secure access to attributes about a user, making them a powerful partner to existing batch job schedulers.

  14. What we are doing now with Shibboleth, LDAP, and Condor. [Diagram: Bob, a user at Georgetown University (Site 'A'), signs in through the federation WAYF to the Georgetown IdP, which pulls his attributes from LDAP and a database; a Shib/Condor portal builds the job ClassAd and passes it to a Condor schedd, which matches it against the resource ClassAd of a Condor startd at Site 'B', where the job runs.] The user at Site 'A' is aware of a resource at Site 'B', and the owner of resource 'B' has granted access to Site 'A'. We leverage the free-text job submission files to add attributes from SAML to our jobs, as sketched below.
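  A minimal sketch of what such a submit file could look like. The "+" prefix is standard Condor syntax for adding arbitrary attributes to the job ClassAd, but the attribute names and values below are illustrative, not the ones used in the project.

     Universe    = vanilla
     Executable  = /home/arnie/condor/my_job.condor
     # Hypothetical attributes copied from the user's SAML assertion:
     +ShibIdentityProvider     = "urn:example:site-a-idp"
     +ShibEduPersonAffiliation = "member"
     Queue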

  15. New Work, Phase II: a SAML-based grid work engine with intelligent resource management. [Diagram: a job submission client at University 'B' sends the user's job file, by way of an identity provider and a resource discovery network, to resource discovery network nodes fronting resource schedulers at Company 'A' and other sites, where the jobs run.] Now resource owners can grant access to users based upon their attributes instead of their identities, and management of users is again the responsibility of the local administration, as it should be. When resource owners can easily set policies without worrying about user management and group memberships, they will become willing to attach their resources to this new computational grid.
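  For illustration only: a resource owner could express such an attribute-based policy directly in the execute machine's Condor configuration. The attribute names match the hypothetical ones in the submit-file sketch above.

     # condor_config on the execute node: only start jobs whose ClassAd
     # carries an affiliation asserted by a trusted identity provider.
     START = (TARGET.ShibEduPersonAffiliation =?= "member") && \
             (TARGET.ShibIdentityProvider =?= "urn:example:site-a-idp")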

  16. Intelligent Resource Management. An intelligent resource management system will allow users to launch jobs from their portal and trust that the work will be sent to the resource that not only matches the user's job policy but also carries the least load, without the user being aware of where the work will be executed. This solution will be scheduler-agnostic. Users have their own policy decisions to make: processor type, operating system, executable location, data location, memory requirements, and so on. In a perfect world, users will have multiple resources to choose from; these resources will have different configurations that can match the user's policy requirements, and their availability will be ever-changing.
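  In Condor terms, such user policy decisions are already expressed as Requirements and Rank lines in the submit file; a sketch with illustrative values:

     Requirements = (OpSys == "LINUX") && (Arch == "X86_64") && (Memory >= 2048)
     Rank         = Memory    # among matching machines, prefer the most memory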

  17. Example of an intelligent agent. [Diagram: the user's job file flows from a job submission client at University 'B', through the identity provider and the resource discovery network, to resource discovery network nodes fronting schedulers at Company 'A' and other sites, where the jobs run.]

  18. Acknowledgments. Georgetown University: Charlie Leonhardt, Steve Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata. University of Wisconsin: Miron Livny, Todd Tannenbaum, Ian Alderman. Internet2: Ken Klingenstein, Mike McGill.
