
Exposing Computational Resources Across Administrative Domains


Presentation Transcript


  1. Exposing Computational Resources Across Administrative Domains The Condor Shibboleth Integration Project – a scalable alternative for the computational grid

  2. “… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …” Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.

  3. The single biggest roadblock to the grid • THE grid will not happen if resource owners do not bring their resources to the table • Resource owners must know their resources are secure • Security mechanisms must be scalable

  4. Grids focus on site autonomy • One of the underlying principles of the Grid is that a given site must have local control over its resources: which users can have an account, what the usage policies are, etc. • Grids: The Top Ten Questions, Jennifer M. Schopf and Bill Nitzberg

  5. Resources must be attracted to the system • Because users demand it! • It is secure • Rules for use of each resource can be established and enforced locally (WHO can do WHAT, WHEN, and for HOW LONG?) • It is scalable • Administering access to the resource is no longer a daunting (if not impossible) task • It is easy to set up

  6. Introduction to Condor • In a nutshell, Condor is a specialized batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications. Users submit their compute jobs to Condor, Condor puts the jobs in a queue, runs them, and then informs the user as to the result.
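
  For readers new to Condor, a minimal submit description file illustrates the queue-and-run model described above (the executable and file names here are placeholders):

      # Minimal Condor submit description file (illustrative; names are placeholders)
      universe   = vanilla
      executable = analyze
      arguments  = input.dat
      output     = job.out
      error      = job.err
      log        = job.log
      queue

  The user hands this file to condor_submit, Condor places the job in its queue, runs it on a matching machine, and the results land in the named output and log files.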

  7. Communities benefit from Matchmakers • … someone has to bring together community members who have requests for goods and services with members who offer them • Both sides are looking for each other • Both sides have constraints • Both sides have preferences • eBay is a matchmaker • Condor is a matchmaker
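
  As a concrete illustration of matchmaking, the job side of a match can state its constraints and preferences as ClassAd expressions in the submit file, which Condor matches against the attributes each machine advertises (the values here are illustrative):

      # Job-side constraints and preferences (illustrative values)
      requirements = (OpSys == "LINUX") && (Arch == "INTEL") && (Memory >= 512)
      rank         = Mips

  Machines advertise their own requirements and rank expressions, and the matchmaker pairs compatible ads from both sides.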

  8. Condor’s Power • A user submits a job to Condor • Condor finds an available machine on the network and begins the job • The definition of ‘available’ is highly configurable • If a machine becomes unavailable, the job is checkpointed until the resource becomes available again, or is migrated to a different resource • Condor does not require an account on the remote machine
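
  The definition of ‘available’ is set by the resource owner through policy expressions in the local Condor configuration; a common desktop-style policy looks roughly like this (thresholds are illustrative):

      # Run jobs only when the machine has been idle and lightly loaded (illustrative thresholds)
      START   = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
      SUSPEND = KeyboardIdle < 60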

  9. Migrating jobs to other Pools • Flocking • Flocking is Condor's way of allowing jobs that cannot immediately run (within the pool of machines where the job was submitted) to instead run on a different Condor pool. • Condor-C • Condor-C allows jobs in one machine's job queue to be moved to another machine's job queue. These machines may be far removed from each other, providing powerful grid computation mechanisms, while requiring only Condor software and its configuration • Condor-C is highly resistant to network disconnections and machine failures on both the submission and remote sides.
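
  Flocking itself is driven by configuration on both sides: the submitting pool lists the central managers it may flock to, and each remote pool lists who is allowed to flock in (host names here are placeholders):

      # Submit side: pools this schedd may flock to (placeholder host name)
      FLOCK_TO   = condor.remote-pool.example.edu
      # Execute side: pools allowed to flock in (placeholder host name)
      FLOCK_FROM = submit.our-pool.example.edu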

  10. DAGMan • DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor. It manages dependencies between jobs at a higher level than the Condor Scheduler
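
  A DAGMan input file expresses those dependencies directly; for example, a two-stage workflow where job B runs only after job A completes (submit file names are placeholders), submitted with condor_submit_dag:

      # Two-node DAG: B depends on A (placeholder submit files)
      JOB  A  stage1.sub
      JOB  B  stage2.sub
      PARENT A CHILD B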

  11. Disadvantages • Even with Flocking and Condor-C, administrative scalability doesn’t exist • The system requires excessive communication between administrators, who must exchange host names and/or IP addresses

  12. What is Shibboleth? • Internet2 Middleware • Shibboleth leverages campus identity and access management infrastructures to authenticate individuals and then sends information about them to the resource site, enabling the resource provider to make an informed authorization decision.

  13. Shibboleth Goals • Use federated administration as the lever; have the enterprise broker most services (authentication, authorization, resource discovery, etc.) in inter-realm interactions • Provide security while not degrading privacy. • Attribute-based Access Control • Mike Gettes, Duke University

  14. Shibboleth Components: Identity Provider • User’s home site, where authentication credentials and attribute info are stored. • Handle Server (HS) – provides a unique and potentially anonymous identity for the user. User authenticates using the site’s existing technology (LDAP, Kerberos, WebISO, etc.). • Attribute Authority (AA) – responds to requests about the user from the target. Retrieves user attributes from the site’s existing identity store – typically a user directory such as LDAP. • Both implemented with Apache, Tomcat and Java servlets/JSP.

  15. Shibboleth Components: Service Provider • Protects the target application, enforces authentication and authorization. • Assertion Consumer Service (ACS) – maintains state information about the user’s unique numerical identifier (handle) • Shibboleth Attribute Requester (SHAR) – makes requests for attributes to the user’s Identity Provider’s Attribute Authority • Co-located with the application web server as a server module. Implementations currently exist for Apache (UNIX and Windows) and IIS (Windows).
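
  On the Apache implementations, protecting a target application is largely a matter of web server configuration; a typical stanza for the Shibboleth 1.x module looks roughly like this (the path is a placeholder):

      <Location /grid-portal>
        AuthType shibboleth
        ShibRequireSession On
        require valid-user
      </Location>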

  16. Typical Access Flow • User attempts to access a new Shibboleth-protected resource on the target site application server. • User is redirected to the Where Are You From (WAYF) server and selects their home site (origin site). Only necessary once per user session. • User is redirected to the origin site Handle Server (HS) and authenticates with their local credentials (e.g. username/password)

  17. Typical Access Flow (cont.) • The Handle Server generates a unique numerical identifier (handle) and redirects the user to the target site’s ACS. • The target ACS hands off to the SHAR, which uses the handle to request attributes from the user’s origin site’s Attribute Authority. • The user’s AA responds with an attribute assertion, subject to Attribute Release Policies (ARPs). • The target site uses the returned user attributes for access control and other application-level decisions.

  18. Federations • Associations of enterprises that come together to exchange information about their users and resources in order to enable collaborations and transactions • Built on the premise of • Initially “Authenticate locally, act globally” • Now, “Enroll, authenticate and attribute locally, act federally.” • Federation provides only modest operational support and consistency in how members communicate with each other • Enterprises (and users) retain control over what attributes are released to a resource; the resources retain control (though they may delegate) over the authorization decision. • Mike Gettes, Duke University

  19. What if we integrated Shib with Condor? • Condor would function exactly as it does now • Flocking (or eventually Condor-C) would use Shibboleth as its authentication model • Classified Ads would include either user attributes or Shib unique identifiers • This brings us to the Condor-Shibboleth Integration Project!

  20. Project Goals • Primary goal is to create a scalable, expandable grid-aware universal workflow management tool for computational grids that doesn't require or exclude use of Globus grid map files. • Scalable. • Functions across unrelated administrative domains • Tied to the Federated Authentication Model. • No ties to Globus Certificates and Grid map files. • Expandable. • Design for relatively simple computational grids, but do not exclude connection to future projects requiring expanded grid services (Globus). • Grid-Shib project already in progress could become a future connection.

  21. Project Goals • Phase I: Shib-Enabled Condor Web Portal • Shibboleth was originally designed as a web-services federated authentication tool; fat clients are not yet available. • Phase II: Shib-Enabled Condor Fat Client • Extending the existing “submit client” model with Shib elements. • There are already other open-source projects in the works that will utilize Shibboleth in a fat-client model, so we should be sure our work now will be compatible with the fat-client model when it is supported by Shibboleth.

  22. Project Goals • Impact Condor as little as possible. • Identify key components that must be changed, mostly at the execute end. • Some work can be performed by preprocessing scripts at the submit end. • Ensure that changes made at execute end will not interfere with other modes of Condor use. • Ensure that changes made at execute end will also work with eventual fat client version.

  23. Web based Condor Scheduler node • Grid Portal must be running condor_schedd and have $(FLOCK_TO) populated with grid resources. • Grid Portal must be configured as a Shibboleth Server • User creates a job. • Uploads Condor submit script, or • Portal tools could help create simple submit scripts.

  24. Conceptual Workflow Model • User logs in to the Grid Portal web site (Phase I) • The Shibboleth web server module detects that no Shibboleth session exists and redirects the user to the “Where Are You From” (WAYF) server. • The user selects their home institution, is redirected to that institution's Identity Provider site, and provides credentials that are checked against the site's identity store (LDAP, PKI, RDBMS, Kerberos). • The user is redirected back to the Web Portal server. • The web server module makes a call back to the Shibboleth Attribute Authority and a Shibboleth session is established for the user. • A Base64-encoded SAML attribute assertion is now available for consumption by the Web Portal.

  25. Conceptual Workflow Model • User submits a job. • Clicks a 'Submit' button on the Portal site. • The Portal application pulls out key components (name, institution) from the mapped HTTP request headers to add to the class ad. • The Portal application appends the signed Base64-encoded attribute assertion to the Condor submit file. • The Portal application creates a temporary database entry to hold data files for the job. • The altered Condor submit file is passed as an argument to condor_submit from the web server.
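
  As a sketch of what the altered submit file might contain (the +Shib* attribute names and values are hypothetical placeholders invented for illustration, not existing Condor or Shibboleth names, and the Base64 value shown is a truncated stand-in for a full signed assertion), the portal could carry the user's identity into the job's class ad as custom attributes:

      universe         = vanilla
      executable       = analyze
      log              = job.log
      # Hypothetical attributes appended by the portal from the Shibboleth session
      +ShibUser        = "amiles"
      +ShibInstitution = "georgetown.edu"
      +ShibAssertion   = "PHNhbWw6QXNzZXJ0aW9uIC4uLg=="
      queue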

  26. Conceptual Workflow Model • Upon completion of the job, Condor will behave as though this were a Flocking job. • Data files are returned to the Grid Portal machine. • A better solution would be to allow the user to dictate in the job submission file where data will be stored. • Condor supports this; we need to evaluate how Shib-enabling Condor will affect it. • User is notified that the job is done.

  27. Behind the scenes • Execute resources will advertise not only the typical attributes, but also who they're willing to work for, based upon Grid Access Attributes. • The path to one final Grid Access File is declared in a Condor configuration file on each Master node (optionally each compute node). • Jobs are matched to resources based upon the usual “class ad” rules and new Shibboleth rules contained in the Grid Access File. • When a match is made, the execute resource will THEN parse the XML, verify the signature of the attributes added to the submit file by Shibboleth, and extract the relevant attributes. • If the signature is good, the job will be executed following all the functionality of flocking.

  28. Condor Modifications • We propose a hierarchical set of role-based authorization configuration files that match the config file structure already in place in Condor. • Each level should establish permissions for that specific level. • A site-wide file would allow the most generic level of access (anyone from Georgetown, anyone from the University of Wisconsin, no one else). • Resource-level files would specify more specific levels of access (Arnie on this machine any time, anyone from Georgetown only after 9:00 pm).

  29. Grid Access File Structure • Grid Access Template Files. • Grid Access Template Files contain: • Users (individual or affiliation). • Conditions of use (time, date, load, ranking). • Any machine on the Internet can house a Grid Access Template File. • Any Grid Access Template File can refer to one or more other Grid Access Template Files in an include fashion. • GAT files will be processed into a local Grid Access File by a small GAT parsing program.
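
  Because the Grid Access Template format is defined by this project rather than by Condor or Shibboleth, the following is only a hypothetical sketch of what a resource-level template might contain (every keyword, value, and URL is illustrative):

      # Hypothetical resource-level Grid Access Template (syntax is illustrative only)
      include  http://grid.example.edu/gat/site-wide.gat
      allow    user=amiles@georgetown.edu
      allow    affiliation=member@georgetown.edu   hours=21:00-06:00
      allow    affiliation=member@wisc.edu         max-load=0.5
      deny     *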

  30. Grid Access Parser • A Grid Access Parser will run on each machine hosting a Grid Access Template File. • “Interactive Mode” will allow the Parser to help resource owners build their custom Template Files. • “Cron Mode” can help track day-to-day changes in the Grid. • The Parser will create a human-readable/consumer-readable text file of all constraints listed. • All cited Grid Access Template Files will be parsed. • The final list will include ALL constraints listed in ALL Grid Access Template Files up the chain. • The Parser must handle duplicate entries gracefully. • Prevents bloated Grid Access Files. • Allows redundant Template Files to prevent any single points of failure.
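
  A minimal sketch in Python of how such a parser could follow include chains while de-duplicating entries (it assumes the hypothetical line-oriented template format sketched above and, for brevity, treats every include as a local file path rather than fetching it over the network):

      # Illustrative sketch only: resolve hypothetical "include <path>" lines recursively,
      # keep each remaining constraint line once, and skip templates already visited.
      def parse_gat(path, seen=None, constraints=None):
          seen = set() if seen is None else seen
          constraints = [] if constraints is None else constraints
          if path in seen:                    # redundant templates up the chain are harmless
              return constraints
          seen.add(path)
          with open(path) as f:
              for line in f:
                  line = line.strip()
                  if not line or line.startswith("#"):
                      continue
                  if line.startswith("include "):
                      parse_gat(line.split(None, 1)[1], seen, constraints)
                  elif line not in constraints:   # handle duplicate entries gracefully
                      constraints.append(line)
          return constraints

      # Write the final Grid Access File consumed on the execute side
      if __name__ == "__main__":
          with open("grid_access_file", "w") as out:
              out.write("\n".join(parse_gat("site-wide.gat")) + "\n")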

  31. Condor Modifications • $(FLOCK_FROM) should be pointed at this Grid Access File. • This will improve scalability by passing responsibility for deciding the final list of submitters to each resource owner. • Some other variable should be set to a site file and one or more trust files. • The site file includes information about identity providers and service providers. • The trust file includes information about trusted signing credentials. • These files are provided by the Federation; they are what represents the Federation in a physical sense, and they are necessary for the operation of every identity provider and service provider.
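
  A sketch of how the proposed execute-side configuration might read (GRID_ACCESS_FILE and the SHIB_* macro names are hypothetical; FLOCK_FROM is an existing Condor macro, but pointing it at a file is part of this proposal, not current Condor behavior; file paths are placeholders):

      # Proposed execute-side configuration (sketch; macro names other than FLOCK_FROM are hypothetical)
      GRID_ACCESS_FILE = /etc/condor/grid_access_file
      FLOCK_FROM       = $(GRID_ACCESS_FILE)
      SHIB_SITE_FILE   = /etc/condor/sites.xml
      SHIB_TRUST_FILE  = /etc/condor/trust.xml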

  32. Summary • We will create a scalable, expandable computational grid system that allows the implementation of easy-to-manage computational grids. • This approach does not require Globus certificates and mapfiles, but will not preclude them (particularly once the separate Grid-Shib project is completed). • This project can be rolled out in two phases: • one that allows the creation of a web-based Grid Portal • a second that allows command-line access from a fat client.

  33. Credits • University of Wisconsin • Miron Livny, Professor of Computer Science and Condor Project Lead • Todd Tannenbaum, Manager of Condor Development Staff • Ian Alderman, Researcher of Data Security for the Condor team • Georgetown University • Charlie Leonhardt, Chief Technologist • Chad LaJoie • Brent Putman, Programmer • Georgetown University Advanced Research Computing • Steve Moore, Director • Arnie Miles, Senior Systems Architect • Jess Cannata, Systems Administrator • Nick Marcou, Systems Administrator • Internet2 • Ken Klingenstein, Director of the Internet2 Middleware Initiative • Mike McGill, Program Manager for the Internet2 Health Sciences Initiative • Special thanks to Jess Cannata, who helped engineer the specific details of this idea.

  34. URL • http://www.guppi.arc.georgetown.edu/condor-shib/
