
The Unicore Project

The Unicore Project. Adam Belloum Computer Architecture & Parallel Systems group University of Amsterdam adam@science.uva.nl. UNICORE Objectives. UNICORE was to hide the seams resulting from different hardware architectures.



Presentation Transcript


  1. The Unicore Project Adam Belloum Computer Architecture & Parallel Systems group University of Amsterdam adam@science.uva.nl

  2. UNICORE Objectives • UNICORE was to hide the seams resulting from different hardware architectures. • Security is built into the design of UNICORE from the start relying on the emerging X.509 standard. • UNICORE is usable by scientists and engineers without having to study vendor or site specifications. • A GUI assists the user in creating and managing complex jobs and integrating applications.

  3. Unicore Functions • Job creation and submission: • Job management: • Data management: • Application support: • Flow control: • Meta-computing: • Single sign-on: • Support for legacy jobs: • Resource management:

  4. The UNICORE Functionality and Architecture The UNICORE system provides functionality to both the end-user and the computer centers • The end-user • Seamless, uniform access to computing and data resources • Move computing jobs between different platforms • The computer centers (or Grid sites) • Gain a solid authentication mechanism fully integrated into their administration procedures • The plugin mechanism allows integrating easy-to-use interfaces, increasing end-user productivity.

  5. End-User Functionality • The end-users of UNICORE interact with all sites and systems in a UNICORE Grid via a graphical Client. • Regardless of the target site for a job, the connection process is always the same: • User unlocks the UNICORE certificate (keystore password). • Client loads plugins configured in the plugin directory. • Client automatically connects to the Grid sites from a user or system–defined list.

  6. Different tasks can be performed • Construct or modify a (batch) job and submit it. • Inspect the queued, running, or completed jobs and retrieve results. • Perform data transfer or file management functions. • Access functions of the loaded plugins. • Manage the local keystore, i.e. • add/modify user certificates, • select an identity from several certificates, • select trusted certification authorities for Gateways/plugins.

  7. Job Construction and Submission • Jobs are constructed as a directed acyclic graph (DAG). • A job is constructed using abstract terms. • The end user can: • specify on which system a job should run, • specify the resources required to run the task, • save the job on the machine running the client, and later make modifications or submit the job again.

  8. Job Monitoring • The UNICORE NJS server tracks the status of jobs/tasks/job groups and can kill them or put them on hold. • Once a task or sub-job has terminated (by job monitor commands, or because of a failure), • the user can retrieve the results (standard output, standard error, result files) and transmit them to the local workstation. • After all results have been retrieved, the job information can be purged from the UNICORE system. NJS: Network Job Supervisor

  9. Job States • SUCCESSFUL: job or task has been run successfully. • For a job, this means that all components have run successfully • FAILED: execution of the job group or task has failed. • For a job at least one of the components has failed. • For a task, this means that the task has failed. • The end–user can inspect the status of the components to find out which have failed.

  10. Job States • PENDING: job/task is waiting to be started. • This happens if a predecessor task has not yet been executed, or if the whole job has not yet been started at all. • QUEUED: job/task is queued in the target batch system. • This happens if the resources on the target system are used by other jobs, which may or may not be UNICORE jobs. • EXECUTING: job or task is currently executing. • For a job group, this is indicated if at least one component is being executed.
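
The state rules on the two Job States slides can be sketched as a small aggregation function. This is an illustration only, not UNICORE code: the names `JobState` and `group_state` are hypothetical, and the precedence applied when components disagree (for example one FAILED and one EXECUTING component) is an assumption, since the slides do not specify it.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    SUCCESSFUL = "SUCCESSFUL"
    FAILED = "FAILED"


def group_state(components):
    """Derive the state of a job group from the states of its components."""
    states = list(components)
    if not states:
        # Assumption: an empty group has simply not been started yet.
        return JobState.PENDING
    # Slide rule: EXECUTING if at least one component is being executed.
    if any(s is JobState.EXECUTING for s in states):
        return JobState.EXECUTING
    # Slide rule: FAILED if at least one component has failed.
    if any(s is JobState.FAILED for s in states):
        return JobState.FAILED
    # Slide rule: SUCCESSFUL only if all components have run successfully.
    if all(s is JobState.SUCCESSFUL for s in states):
        return JobState.SUCCESSFUL
    # Assumption: otherwise report QUEUED if anything is in the batch queue.
    if any(s is JobState.QUEUED for s in states):
        return JobState.QUEUED
    return JobState.PENDING
```

For example, `group_state([JobState.SUCCESSFUL, JobState.FAILED])` evaluates to `JobState.FAILED`, which is the point at which the end user would inspect the individual components.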

  11. File and Data Management • The UNICORE system includes functions to Transmit files between • the local user workstation and file systems • or archives that can be accessed from the UNICORE site. • Perform Unix–style file management functions • copy, move, delete, chmod, etc. on files residing on file systems or archives that can be accessed from the UNICORE site, and generate directory listings.

  12. Functionality relevant for the computer centers comprises • Provision of a strong user authentication mechanism (X.509 certificates) • Extensibility by site-specific user authentication methods (like SecurID cards, S/Key one-time passwords, etc.) • Compatibility with the center’s authorization mechanisms and policy (mapping of UNICORE userids to local Unix userids, accounting, disk quotas, etc.).

  13. Functionality relevant for the computer centers comprises • Site- and system-specific incarnation of UNICORE jobs driven by a declarative Incarnation Database that can be adapted to the center’s needs • Declarative description of available resources, • capacity resources (processor count, computation time, memory size) • capability resources (like available software packages and special hardware capabilities). • Special support for applications that simplifies the entry and definition of jobs, or provides steering or control functionality

  14. Unicore Architecture

  15. Unicore Architecture

  16. Unicore Architecture • User is running the UNICORE Client • on a local workstation or PC. • Participating computer center defines UNICORE Grid site(s) • (Usite) that Clients can connect to. • A Usite offers access to computing or data resources. • organized as virtual sites (or Vsites) representing the execution or storage systems. • In the Client, the user addresses the Vsites to submit UNICORE jobs or sub-jobs to.

  17. Unicore Architecture • A list of available UNICORE Gateways • is maintained as an XML document under www.unicore.de, but users can configure their own list of sites. • A user certificate is needed to establish a connection to a Gateway and to sign the jobs submitted from • the Job Preparation Agent (JPA) part of the Client. • The user can check the status of the jobs and retrieve results with • the Job Monitor Controller (JMC) part of the Client.

  18. Network Job Supervisor (NJS) • Manages all submitted jobs. • It performs the user authorization • by looking for a mapping of the user certificate to a valid login in the UNICORE User Data Base (UUDB). • It is the NJS’ task to consider the dependencies between • job components and to schedule the components accordingly. • The NJS server stores all job • status and result information, • and replies to status/result requests from the client. [Architecture diagram: optional firewall; AJO/UPL over unsafe Internet (SSL); UNICORE Gateway (user authentication); safe intranet (TCP); Network Job Supervisor (user mapping, job incarnation, job scheduling) with IDB and UUDB; incarnated job / status request; Target System Interface (TSI); commands, files; batch subsystem (SV1)]

  19. The Network Job Supervisor • The NJS performs the following functions: • Receive UPL requests from the upstream Gateway. • Interpret requests for resource information (GetResources task), • send back the available resources in an array of resource objects. • Interpret the consign of a UNICORE job. UPL: UNICORE protocol layer

  20. The Network Job Supervisor • If a job group is encountered • Contact the Usite/Vsite with an NJS client certificate, and submit the job group. • Poll for status updates until execution of the job group has been completed. • Run tasks on the local systems • using the Incarnation DB to create a script that executes a correct interpretation of the job. • Whenever tasks/job groups have completed execution, save the status information and result files for later retrieval. • Interpret requests to list known jobs or to send job status and job results according to the local knowledge about job/task status.

  21. The Network Job Supervisor • The NJS is customized by a configuration file that is first read at start-up and then repeatedly to pick up any changes. The settings include: • Address & port for the incoming connections from the Gateway • Path to the NJS certificates • Path to Incarnation DB • Path to UNICORE User DB • The addresses and ports of TSI instances are specified in the Incarnation Data Base.

  22. The Network Job Supervisor • The NJS keeps a non-volatile up to date copy of its state and so can be stopped and restarted without loss of executing jobs. • The state is sufficiently up-to-date to allow the NJS to recover from most crashes as well. • NJS provides an administrator interface. • Administrator can list all executing jobs at various levels of details with different selection criteria • Administrator can terminate, hold and resume abstract actions on the NJS.

  23. The Incarnation Database (IDB) • Addresses and ports of TSI instances • Incarnations of abstract commands • Declarations of available capacity/capability resources • For the prescribed capacity resources, three entries must be included: min, max, and default supported values • The essential capacity resources are: • Processor and node counts • Computation time • Core memory size • Disk space
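
The min/max/default triple for a capacity resource can be illustrated with a short sketch. The class `CapacityResource`, its `resolve` method, and the numeric limits are hypothetical; the slide only states that the three values must be declared, not how the NJS checks a request against them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapacityResource:
    """One capacity resource entry with its three declared values."""
    name: str
    minimum: int
    maximum: int
    default: int

    def resolve(self, requested: Optional[int] = None) -> int:
        """Return the value to use, or raise if the request is out of range."""
        if requested is None:
            # No explicit request: fall back to the declared default.
            return self.default
        if not self.minimum <= requested <= self.maximum:
            raise ValueError(
                f"{self.name}: {requested} outside "
                f"[{self.minimum}, {self.maximum}]"
            )
        return requested


# Illustrative values only; real limits are site-specific.
processors = CapacityResource("processors", minimum=1, maximum=64, default=4)
```

Under these assumed limits, `processors.resolve()` falls back to the default of 4, while `processors.resolve(128)` raises a `ValueError` because the request exceeds the declared maximum.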

  24. The Incarnation Database (IDB) • For the incarnation of commands, the IDB contains a set of rules that map • the abstract commands/options of the abstract tasks • to concrete command sequences/options. • For each abstract task, • the NJS incarnation engine applies the matching rules until all abstract terms have been translated.
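
The idea of incarnation, translating abstract tasks into concrete commands through a rule table, can be sketched as follows. The rule table, the command templates, and the function names `incarnate` and `incarnate_job` are assumptions made for illustration; the real IDB rule syntax is not shown in the slides. Only the abstract action names (CopyFile, DeleteFile, CreateDirectory) are taken from the presentation.

```python
import shlex

# Hypothetical incarnation rules for one Vsite:
# abstract action name -> concrete command template.
INCARNATION_RULES = {
    "CopyFile": "cp {source} {target}",
    "DeleteFile": "rm {path}",
    "CreateDirectory": "mkdir -p {path}",
}


def incarnate(action, **params):
    """Translate one abstract action into a concrete shell command line."""
    try:
        template = INCARNATION_RULES[action]
    except KeyError:
        raise ValueError(f"no incarnation rule for {action!r}") from None
    # Quote every parameter so that spaces or shell metacharacters are safe.
    quoted = {key: shlex.quote(value) for key, value in params.items()}
    return template.format(**quoted)


def incarnate_job(tasks):
    """Build a shell script from a list of (action, params) pairs."""
    lines = ["#!/bin/sh"]
    lines += [incarnate(action, **params) for action, params in tasks]
    return "\n".join(lines) + "\n"
```

A different Vsite could supply a different rule table (for instance a site-specific copy command) while the abstract job stays unchanged, which is the portability benefit the slide describes.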

  25. The Incarnation Database (IDB) • The context and application resources are specified as software resources in the IDB • The capability resources indicating the availability of special hardware/software support can be defined: • Storage servers: long-term storage or archive partitions, • Context resources: Parallel execution systems (MPI), debugging support, performance analysis support, • Application resources: Availability of application libraries or of standard applications.

  26. The User Data Base (UUDB) • The User Database (UUDB) is indexed by the user certificates and an optional accounting string. • A UUDB may be shared between several Vsites at a UNICORE site. • This depends on whether the computer center grants all users in the UUDB access to all the target systems of the Vsites. • The UUDB returns • whether the user is authorized to compute on the Vsite, and the Unix userid. • Tasks to be executed at the Vsite are executed with this userid, called the Xlogin.

  27. The User Data Base (UUDB) • The usual model will be to map different U-users to different Xlogins. • Several U-users can also be mapped to the same Xlogin. • To administrate the User DB, a set of administration commands • for adding, listing, changing, and removing U-users is available. [Diagram labels: typical UNICORE user; user has to specify Xlogin in job; ASP without specific login per user]
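
The mapping from U-users to Xlogins can be pictured as a keyed lookup. In this sketch the UUDB is a plain in-memory dictionary and the certificate is represented by its subject string; both are simplifying assumptions, as are all names and entries shown. The slides state only that the index is the user certificate plus an optional accounting string.

```python
from typing import Dict, Optional, Tuple

# Hypothetical UUDB contents.
# Key: (certificate subject, accounting string or None). Value: Xlogin.
UUDB: Dict[Tuple[str, Optional[str]], str] = {
    ("CN=Alice Example,O=Site A", None): "alice",
    # Two different U-users mapped to the same Xlogin.
    ("CN=Bob Example,O=Site A", None): "guest",
    ("CN=Carol Example,O=Site B", "project42"): "guest",
}


def lookup_xlogin(subject: str, accounting: Optional[str] = None) -> Optional[str]:
    """Return the Xlogin for a U-user, or None if the user is not authorized."""
    return UUDB.get((subject, accounting))
```

A `None` result stands for "not authorized on this Vsite"; a non-empty result is the local userid under which the tasks would run.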

  28. Target System Interface (TSI) • To run jobs on a Vsite at a different Usite, • the NJS takes the role of a Client and submits the sub-job to the remote Gateway. • Because of the SSL connection, this means that a certificate for the NJS itself is also required. • The UNICORE Target System Interface (TSI) • accepts incarnated job components from the NJS, • passes them to the local batch systems, • handles file import and export tasks, • implements low-level status reporting and control of batch jobs. [Architecture diagram: optional firewall; UNICORE Site List; AJO/UPL over unsafe Internet (SSL); UNICORE Gateway (user authentication); safe intranet (TCP); Network Job Supervisor (user mapping, job incarnation, job scheduling) with IDB and UUDB; incarnated job / status request; Target System Interface (TSI); commands, files; batch subsystem (SV1)]

  29. The Target System Interface • The NJS server uses the UNICORE Target System Interface (TSI) • to access the actual execution platforms. • The TSI provides functionality to interface to a batch subsystem or an interactive shell, or to operate on the file system of the target system. [Architecture diagram: optional firewall; UNICORE Site List; AJO/UPL over unsafe Internet (SSL); UNICORE Gateway (user authentication); safe intranet (TCP); Network Job Supervisor (user mapping, job incarnation, job scheduling) with IDB and UUDB; incarnated job / status request; Target System Interface (TSI); commands, files; batch subsystem (SV1)]

  30. The Target System Interface • All TSI processes run as server daemons that communicate with the NJS via plain sockets. • Different sockets are used for the transfer of commands and data. • A TSI worker issues batch sub-system (or shell) commands on behalf of a given user (Xlogin) indicated in the NJS request. • To do this, the TSI worker must have permission to change its effective user and group ids. [Diagram labels: optional firewall; AJO/UPL; UNICORE Site; UNICORE Gateway; safe intranet (TCP); NJS with IDB and UUDB; several TSI instances; blade; any cluster management system]

  31. The Target System Interface • Create an execution directory for a job group. • Submit a sequence of concrete commands, transmitted as a batch script, into a queue. • Execute a transmitted shell script immediately; • used to transmit data from/to the execution space of a job. • Create files in the execution space; • data is sent inline. • Send back the contents of files in the execution space, • including spooled files, outcomes, and streamed files. • Send status and result data for a batch job and purge the execution space. [Diagram labels: NJS; UUDB; IDB; TSI; look up login for user certificate; look up incarnation rules; execute job on target system]

  32. The Target System Interface • Monitor the status of UNICORE jobs. • Purge the execution space. • Generate a list of currently known jobs • executing or queued in the batch sub-system. • Abort jobs from the batch sub-system; • hold/resume jobs if supported by the batch sub-system. • List the contents of directories on the target platform. [Diagram labels: NJS; UUDB; IDB; TSI; look up login for user certificate; look up incarnation rules; execute job on target system]

  33. The Target System Interface • For each command, • the TSI sends a response back to the NJS. • Result data is sent inline as a stream of bytes. • When starting up, the TSI and the NJS perform a simple authentication step to • ensure that no fraudulent programs are trying to connect. • The TSI is not portable between systems of different vendors, • but modifications are restricted to only 3 Perl modules. • Configuration of an existing TSI is done by editing a single file. [Diagram labels: NJS; UUDB; IDB; TSI; look up login for user certificate; look up incarnation rules; execute job on target system]

  34. The Unicore Gateway • Acts as the point of entry for UPL connections. • It performs the following functions: • As an SSL server, it accepts incoming SSL connections on a (configurable) port. • It authenticates the user certificate presented by the connecting client against a configurable list of trusted CAs. • It transmits other requests to the Vsite, drawing on a configurable list of known Vsites and the addresses and port numbers of the corresponding NJS or storage servers. [Diagram labels: Arcon Client Toolkit with user certificate (runtime interface, job preparation/control); UNICOREPro Client with user certificate (preparation and control of jobs, plugins); UNICORE Site List; AJO/UPL over unsafe Internet (SSL); optional firewall; UNICORE Gateway (user authentication)]

  35. The Unicore Gateway • Outside the firewall • Configure communication to the downstream NJS servers. • Allow SSL traffic from the Gateway, tunneling the firewall, to the addresses and ports of the NJS servers. • Inside the firewall • Allow SSL traffic to the known port of the Gateway to cross the firewall. • Communication to the downstream NJS servers can be performed using SSL or using plain sockets, depending on internal security procedures. [Diagram labels: two optional firewalls; AJO/UPL over unsafe Internet (SSL); UNICORE Site; UNICORE Gateway (user authentication); safe intranet (TCP); Network Job Supervisor (user mapping, job incarnation, job scheduling)]

  36. The Unicore Gateway • If SSL is used for downstream communication, • the Gateway will use a client certificate to authenticate with the NJS servers. • The Gateway has a “plugin” interface • that can be configured to handle non-UPL connections and pass them on to other servers. • Connections using a “plugin” have the same authentication as UNICORE protocol connections.

  37. The UNICORE Security Model

  38. The UNICORE Security Model • Based on the use of X.509 certificates for the authentication of both users and software. • Certificates are issued by a trusted Certificate Authority (CA). • Authorization of users is performed with mechanisms local to the UNICORE sites. • All communication across the outside (insecure) Internet is based on SSL. [Diagram: Client and Gateway establish an SSL connection; the Client sends the user certificate and the Gateway sends the Gateway certificate; each side checks whether it trusts the issuer of the other side's certificate]
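
The mutual certificate check between Client and Gateway corresponds to what is commonly called mutual TLS. The sketch below uses Python's standard `ssl` module purely to illustrate the configuration on each side; UNICORE itself is Java-based, and the function names and file-path parameters here are hypothetical. Modern libraries negotiate TLS, the successor of the SSL protocol named on the slide.

```python
import ssl


def make_gateway_context(cert_file=None, key_file=None, trusted_ca_file=None):
    """Server-side context that demands a certificate from every client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject clients that do not present a certificate from a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The Gateway's own certificate, presented to connecting clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if trusted_ca_file:
        # CAs whose user certificates the Gateway is willing to accept.
        ctx.load_verify_locations(cafile=trusted_ca_file)
    return ctx


def make_client_context(cert_file=None, key_file=None, trusted_ca_file=None):
    """Client-side context; PROTOCOL_TLS_CLIENT verifies the server by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert_file:
        # The user certificate, presented to the Gateway.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if trusted_ca_file:
        # CAs whose Gateway certificates the client is willing to accept.
        ctx.load_verify_locations(cafile=trusted_ca_file)
    return ctx
```

Both contexts end up with `verify_mode == ssl.CERT_REQUIRED`, so a handshake succeeds only if each side trusts the issuer of the other side's certificate, mirroring the two "Trust ... Certificate Issuer?" checks in the diagram.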

  39. The UNICORE Security Model • Support delegation for hierarchical jobs: • the sub-jobs need to be transmitted to their respective target UNICORE sites by the primary NJS. • The primary NJS acts as the consigner for the sub-jobs it transfers to the secondary NJSs. • The NJS performs authorization on the basis of both the consigner and the endorser; the consigner information is passed by the Gateway, and the endorser certificate is contained as part of the sub-job. [Diagram labels: Client; Gateway; AJO with user certificate; send signed job object over SSL; AJO certificate == SSL certificate?; forward signed job object; NJS; UUDB; IDB; TSI; look up login for user certificate; look up incarnation rules; execute job on target system]

  40. The UNICORE Security Model • Authorizing users • Certificates • Mapping abstract identity to concrete local userid • Authentication and Securing Plugins • Client plugins have to be signed by the CA [Diagram labels: typical UNICORE user; user has to specify Xlogin in job; ASP without specific login per user]

  41. The UNICORE Job Model

  42. The UNICORE Job Model • For each job a Vsite is assigned • for execution where all tasks immediately belonging to the job will be run. • sub-jobs of a job group are executed • in no particular order, • parallel execution is supported if the resources are available on the target execution systems.

  43. The Abstract Job Object (AJO) • It contains the following components: • Basic UNICORE protocol layer (UPL) for • Client ↔ Gateway • Gateway ↔ Server communication (on top of SSL) • Resource object hierarchy for • the specification of resource requests (need certain resources) • the available resources (can provide certain resources) • Object hierarchy of abstract actions containing the definition of all supported tasks • user and system tasks • abstract UNICORE (sub-)jobs.

  44. The UNICORE Job Model • A UNICORE job on user level is modeled as a directed acyclic graph (DAG) of abstract actions • Abstract tasks, • plus a resource request. • The DAG is signed with the user’s private key preventing tampering with the job definition while it is being transmitted.
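
Modeling a job as a DAG means that a valid execution order can be derived from the dependencies alone. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group tasks into batches that could run in parallel. The task names and the helper `execution_batches` are hypothetical, and the sketch says nothing about how the NJS actually schedules components.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job: each task maps to the set of tasks it depends on.
job = {
    "import_input": set(),
    "compile": {"import_input"},
    "run": {"compile"},
    "export_results": {"run"},
    "export_log": {"run"},
}


def execution_batches(dag):
    """Yield lists of tasks; tasks within one list do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        yield ready
        sorter.done(*ready)  # mark the batch finished to unlock successors


batches = list(execution_batches(job))
```

Here the two export tasks land in the same final batch because both depend only on `run`, which matches the statement that independent components may execute in parallel when resources allow.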

  45. Abstract tasks • ExecuteScriptTask • UserTask • ImportTask/ExportTask • TransferTask • File operation tasks like CopyFile, RenameFile, DeleteFile, CreateDirectory, ChangePermissions • ListVsites • GetResourceDescription • GetJobs • GetActionStatus • RetrieveOutcome • ControlAction • If, RepeatGroup, ForGroup

  46. The UNICORE Protocol

  47. The UNICORE Protocol (UPL) • The UNICORE Protocol Layer (UPL) defines how data is sent between UNICORE Clients and Servers so that: • Clients ask Servers to execute AJOs • Clients get information about the services offered • UPL is a protocol level that is layered on top of existing transport protocols such as SSL.

  48. The UNICORE Protocol • Requests and Replies have two parts: a serialized Java object and an optional stream of bytes. • UPL is a protocol level that is layered on top of existing transport protocols such as SSL. [Diagram labels: ConsignJob, ConsignJobReply; RetrieveOutcome, RetrieveOutcomeReply, RetrieveOutcomeAckReply; ListVsites, ListVsitesReply]
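
The two-part shape of a request or reply (a structured object plus an optional byte stream) can be illustrated with a simple framing sketch. This is not the UPL wire format, which the slides do not describe: the length-prefixed layout, the use of JSON in place of Java object serialization, and the function names are all assumptions chosen for illustration.

```python
import json
import struct
from typing import Tuple


def encode_message(header: dict, payload: bytes = b"") -> bytes:
    """Pack a structured header and an optional raw byte stream into one frame."""
    head = json.dumps(header).encode("utf-8")
    # Two big-endian 32-bit lengths, then the header, then the raw bytes.
    return struct.pack(">II", len(head), len(payload)) + head + payload


def decode_message(data: bytes) -> Tuple[dict, bytes]:
    """Split a frame back into its header object and byte stream."""
    head_len, payload_len = struct.unpack(">II", data[:8])
    head = json.loads(data[8:8 + head_len].decode("utf-8"))
    payload = data[8 + head_len:8 + head_len + payload_len]
    return head, payload
```

Keeping bulk data outside the structured part means large result files never have to pass through object serialization; a reply carrying results would put its metadata in the header and the file contents in the byte stream.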

  49. Unicore Protocols • The protocol used for File Transfer and Management is more complex than the UPL protocol used for job submission. • The major reasons are the need for • Synchronous communication, as the user initiates commands on the (Storage) Server and expects an immediate reply. • A multi-stage protocol, to prevent the user from sending gigabytes of data to the Storage Server that cannot be stored because the user has no write privileges.

  50. Unicore Protocols • Request data is stored within a Java object that is digitally signed by the user. • The Java object is serialized and sent using the Secure Socket Layer (SSL) protocol via the Gateway to the (SAFE) Server. • When the request entails the transfer of a set of files from or to the server, an extended format is needed. • As the file data can be very large, it cannot be included within a Java object and serialized/deserialized.
