Co-allocation Using HARC
IV. Resource Managers
HARC Workshop, University of Manchester
Philosophy
• New types of RMs can be written by others
• Existing RMs can be customized
• Interfaces can be enhanced or changed
• None of this requires changing the Acceptor code
• The API is extensible too
• A good model for community contributions
  • CCT keeps control of the Acceptor code
  • The Acceptor code will become very stable (already fewer than one commit per month)
  • The community evolves the system
Are RMs Easy to Install?
• Harder than the client software
• Much easier than Acceptors
• Complexity is in the right place:
  • Only a few people install and configure Acceptors (infrastructure), which is hard
  • Some people modify or write RMs, which is not too hard
  • More people install and configure RMs, which is easy
  • Many people install and configure the client software, which is trivial
Pre-installation - Perl
• RMs are written in Perl, to make installation trivial
• However, they need a large number of CPAN modules to be installed
  • Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial to install
• There is a document listing things to watch out for
  • Previously seen problems, with solutions
  • Basically a list of exceptions
  • Now 7 pages of text!
  • There’s a lot of AIX content...
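Since missing or broken CPAN modules are the most common installation problem, a quick pre-flight check can save time. The Python sketch below uses the standard `perl -MModule -e 1` idiom, which exits non-zero when Perl cannot load a module. Only Net::SSLeay and Crypt::SSLeay come from the slide. The full list of modules an RM needs is not given here, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List

# Only these two modules are named on the slide; a real RM install needs
# many more CPAN modules, so supply your own list.
EXAMPLE_MODULES = ["Net::SSLeay", "Crypt::SSLeay"]


def perl_check_command(module: str) -> List[str]:
    """Command that exits 0 only if Perl can load `module`."""
    return ["perl", f"-M{module}", "-e", "1"]


def run_quietly(cmd: List[str]) -> int:
    """Run a command, discarding its output, and return its exit status."""
    try:
        return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:  # perl itself is not on PATH
        return 127


def missing_modules(modules: Iterable[str],
                    run: Callable[[List[str]], int] = run_quietly) -> List[str]:
    """Return the modules that Perl fails to load."""
    return [m for m in modules if run(perl_check_command(m)) != 0]


if __name__ == "__main__":
    for name in missing_modules(EXAMPLE_MODULES):
        print(f"missing: {name}")
```

The `run` parameter is injectable so the logic can be exercised without Perl installed.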
Pre-installation - Certificate
• The HARC RM needs a certificate
  • We don’t recommend re-using the host certificate
  • Get a service certificate instead
• The UK e-Science CA now supports:
  • harccrm for Compute RMs (CRMs)
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=...
  • harcacceptor for Acceptors
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=...
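These service DNs put a slash inside the CN (harccrm/man2.nw-grid.ac.uk), so a naive split on "/" breaks the DN apart in the wrong places. The minimal Python sketch below splits only where a new `attr=` component begins. The function names are hypothetical, and the elided emailAddress value stays as `...`.

```python
import re
from typing import List, Tuple

# Split only on a "/" that starts a new "attr=" component, so the slash
# inside CN=harccrm/man2.nw-grid.ac.uk is kept.
_COMPONENT_SPLIT = re.compile(r"/(?=[A-Za-z][A-Za-z0-9]*=)")


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Parse a '/C=UK/O=...' style DN into ordered (attribute, value) pairs."""
    pairs = []
    for part in _COMPONENT_SPLIT.split(dn.lstrip("/")):
        attr, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed DN component: {part!r}")
        pairs.append((attr, value))
    return pairs


def service_and_host(dn: str) -> Tuple[str, str]:
    """Split a service CN such as 'harccrm/host' into ('harccrm', 'host')."""
    cns = [value for attr, value in parse_dn(dn) if attr == "CN"]
    if not cns:
        raise ValueError("DN has no CN component")
    service, _, host = cns[-1].partition("/")
    return service, host
```

A list of pairs is returned rather than a dict because attributes such as OU may legitimately repeat within a DN.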
Installation Procedure
• An installer installs files from the CVS tree (this may change)
• The HARC environment variable points to the root of the repository (the “negotiation” directory)
• Your RM's configuration goes in a subdirectory of $HARC/rm-service/config
  • For example, $HARC/rm-service/config/nw-grid/man2
Installation Procedure
1. Create the contents of that subdirectory:
  • install.config - more on this shortly
  • grid-mapfile - a Globus Toolkit (GT)-style mapfile mapping certificate DNs to usernames (usually a symlink to /etc/grid-security/grid-mapfile)
  • acceptor_mapfile - a list of the Acceptor DNs, together with the DNs of their CA certificates
  • cacerts directory - the CA certificates for your certificate and for the Acceptor certificates, in PEM format with suffix .crt
2. Then do a trivial install:
  • install-rm nw-grid/man2 /usr/local/man2-rm
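Before running install-rm, it can be worth checking that the configuration subdirectory is complete. This Python sketch assumes only the layout described in these installation slides: $HARC/rm-service/config/<site>/<host> containing install.config, grid-mapfile, acceptor_mapfile and cacerts/*.crt. The helper names are hypothetical, and the sketch checks only that the files are present, not what they contain.

```python
import os
from pathlib import Path
from typing import List, Optional


def rm_config_dir(name: str, harc_root: Optional[str] = None) -> Path:
    """Resolve e.g. 'nw-grid/man2' to $HARC/rm-service/config/nw-grid/man2."""
    root = harc_root if harc_root is not None else os.environ["HARC"]
    return Path(root) / "rm-service" / "config" / name


def check_config_dir(config_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    problems = []
    for fname in ("install.config", "grid-mapfile", "acceptor_mapfile"):
        # exists() follows symlinks, so a dangling grid-mapfile link is reported
        if not (config_dir / fname).exists():
            problems.append(f"missing {fname}")
    cacerts = config_dir / "cacerts"
    if not cacerts.is_dir():
        problems.append("missing cacerts directory")
    elif not any(cacerts.glob("*.crt")):
        problems.append("no .crt files in cacerts")
    return problems
```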
install.config
RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=184.108.40.206
RM_URL=man2-rm
RM_PORT=9393
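install.config looks like a set of shell-style KEY=VALUE assignments, with quoting for values that contain spaces. The slides don't say exactly how the `commands` script consumes the file, so this Python sketch assumes that shell-style reading and parses it with `shlex`. The function name is hypothetical. A mismatched quote, such as a curly closing quote picked up when copying from slides, makes `shlex` raise a "No closing quotation" error rather than silently producing a wrong value.

```python
import shlex
from typing import Dict


def parse_install_config(text: str) -> Dict[str, str]:
    """Parse shell-style KEY=VALUE lines, honouring quotes and # comments."""
    settings = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank or comment-only line
        key, sep, value = tokens[0].partition("=")
        if not sep or len(tokens) != 1:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        settings[key] = value
    return settings
```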
install.config, and the corresponding Resource element
RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=220.127.116.11
RM_URL=man2-rm
RM_PORT=9393

<Resource>
  <Compute>man2.nw-grid.ac.uk</Compute>
  <Endpoint type="REST">
    <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
  </Endpoint>
</Resource>
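The Resource element on this slide repeats values from install.config. The Compute name is RM_COMPUTE_NODENAME, and the REST endpoint has the form https://RM_COMPUTE_NODENAME:RM_PORT/RM_URL/. That mapping is inferred from this one example rather than taken from the HARC source, so treat the following Python sketch (function name hypothetical) as illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def resource_element(cfg: Dict[str, str]) -> str:
    """Build a compute <Resource> description from install.config values.

    Assumption: the endpoint pattern https://NODENAME:PORT/URL/ is inferred
    from the man2 example, not from the HARC source.
    """
    resource = ET.Element("Resource")
    ET.SubElement(resource, "Compute").text = cfg["RM_COMPUTE_NODENAME"]
    endpoint = ET.SubElement(resource, "Endpoint", type="REST")
    ET.SubElement(endpoint, "RESTEndpoint").text = (
        f"https://{cfg['RM_COMPUTE_NODENAME']}:{cfg['RM_PORT']}/{cfg['RM_URL']}/"
    )
    return ET.tostring(resource, encoding="unicode")
```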
Installation Step
• Before installing:
  • If needed, define PERL5LIB and LD_LIBRARY_PATH in your environment when you install
  • Or add them to the config file
  • Otherwise, you don’t have to set them
• Then do a trivial install:
  • install-rm nw-grid/man2 /usr/local/man2-rm
  • The script is in $HARC/rm-service/scripts
• What does this do?
What happens?
• Installs the source files
• Creates a crontab and scripts for restarting the RM
• Customizes some scripts for stopping/starting the RM
• Installs and hashes the CA certificates
• Output:
    rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
    Makefile.crt ... Skipped
    cct-ca.crt ... 5fb2fc80.0
    old-uk-escience-ca.crt ... 01621954.0
    uk-escience-ca.crt ... adcbc9ef.0
    uk-escience-root.crt ... 8175c1cd.0
    Notice: Don't forget to place your certificate and key files at:
    /Users/jonmaclaren/man2-rm/x509/server_cert.pem
    /Users/jonmaclaren/man2-rm/x509/server_key.pem
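The "hashes CA certificates" step produces the HASH.N names seen in the output (for example cct-ca.crt ... 5fb2fc80.0). OpenSSL finds a CA in a certificate directory by the hash of its subject name, and the `c_rehash` tool creates these links. The Python sketch below follows that convention and shells out to `openssl x509 -hash -noout`. OpenSSL 1.0.0 changed the hash algorithm, so a modern `openssl` may print values different from those above. The helper names are hypothetical, and treating unreadable files as "Skipped" is an assumption about why Makefile.crt is skipped.

```python
import subprocess
from pathlib import Path
from typing import Set


def subject_hash(cert: Path) -> str:
    """Subject-name hash of a PEM certificate, as printed by `openssl x509 -hash`."""
    out = subprocess.run(
        ["openssl", "x509", "-hash", "-noout", "-in", str(cert)],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def next_link_name(hash_value: str, taken: Set[str]) -> str:
    """First free HASH.N name; N increases when two CAs share a subject hash."""
    n = 0
    while f"{hash_value}.{n}" in taken:
        n += 1
    return f"{hash_value}.{n}"


def hash_cacerts(cert_dir: Path) -> None:
    """Create HASH.N symlinks for every *.crt file, printing the output format above.

    Re-running adds duplicate links; a fuller version would first remove
    old HASH.N links.
    """
    taken = {p.name for p in cert_dir.iterdir() if p.is_symlink()}
    for cert in sorted(cert_dir.glob("*.crt")):
        try:
            digest = subject_hash(cert)
        except subprocess.CalledProcessError:
            print(f"{cert.name} ... Skipped")  # not a certificate openssl can read
            continue
        name = next_link_name(digest, taken)
        (cert_dir / name).symlink_to(cert.name)
        taken.add(name)
        print(f"{cert.name} ... {name}")
```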
What’s in /usr/local/man2-rm?
• Some Perl modules
  • And OuterRM.pl, which is the script that gets run
• commands - configures and runs the RM (based on install.config, etc.)
• rerun - runs “commands” in the background from crontab
• crontab - a crontab line which can be added directly to your crontab (don’t cut and paste!)
• start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
  • ./stop-rm
  • ./start-rm [ -w ]
• x509 - a subdirectory containing all the CA certs, mapfiles, etc.
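The start-rm/stop-rm mechanism is a simple control-file pattern. cron calls rerun regularly, and rerun starts the RM only if .do_not_restart is absent. The Python sketch below shows that logic. The real scripts are generated by the installer. The `is_running` and `launch` hooks are hypothetical stand-ins for however those scripts detect and start the RM, and the `-w` option of start-rm is not modelled.

```python
from pathlib import Path
from typing import Callable

CONTROL_FILE = ".do_not_restart"  # file name taken from the slide


def stop_rm(install_dir: Path) -> None:
    """Create the control file so that rerun no longer starts the RM."""
    (install_dir / CONTROL_FILE).touch()


def start_rm(install_dir: Path) -> None:
    """Remove the control file so that rerun may start the RM again."""
    (install_dir / CONTROL_FILE).unlink(missing_ok=True)


def rerun(install_dir: Path, is_running: Callable[[], bool],
          launch: Callable[[], None]) -> bool:
    """One cron tick: start the RM unless it is disabled or already running."""
    if (install_dir / CONTROL_FILE).exists() or is_running():
        return False
    launch()  # e.g. run the 'commands' script in the background
    return True
```

Because cron re-invokes rerun regularly, an RM that crashes is restarted on the next tick unless someone has run stop-rm.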
Perl Modules
• Just an overview here...
• An online document gives more detail on these
Key Modules
• OuterRM - just does the HTTP listening and the authentication/authorization (authN/authZ) of Acceptor certificates
• MainLoop - handles each request
• TransactionManager - remembers which transactions (by TID) are running, and what their states are
• InnerRM - the main class for the different types of RM
  • SimpleComputeRM
  • SimpleNetworkRM
  • Both inherit from InnerRM
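As described above, TransactionManager's job is bookkeeping: a table from transaction ID (TID) to state. The minimal Python sketch below shows that table. The state names and method names are hypothetical, since the slides don't list HARC's actual states. The table lives only in memory, so a restart loses it.

```python
from enum import Enum
from typing import Dict, List


class TxState(Enum):
    # Hypothetical states; the slides do not list HARC's actual ones.
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TransactionManager:
    """Remembers which transactions (keyed by TID) are running, and their states."""

    def __init__(self) -> None:
        self._states: Dict[str, TxState] = {}

    def begin(self, tid: str) -> None:
        if tid in self._states:
            raise ValueError(f"duplicate TID {tid}")
        self._states[tid] = TxState.PREPARED

    def finish(self, tid: str, committed: bool) -> None:
        if self._states.get(tid) is not TxState.PREPARED:
            raise ValueError(f"TID {tid} is not in progress")
        self._states[tid] = TxState.COMMITTED if committed else TxState.ABORTED

    def running(self) -> List[str]:
        return [tid for tid, s in self._states.items() if s is TxState.PREPARED]

    def state(self, tid: str) -> TxState:
        return self._states[tid]
```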
SimpleComputeRM
• Handles batch queue systems
• Deals only with processors/memory
• To talk to the scheduler, a subclass of SCBatch is used
  • SCBatchTorqueMaui.pm
  • SCBatchTorqueMoab.pm
  • SCBatchLoadLeveler.pm - not in CVS yet...
  • Chosen at runtime via RM_COMPUTE_BATCH_TYPE
• Simple modules
  • Fewer than 200 lines
  • Override:
    • initialize
    • makeReservation
    • cancelReservation
    • getStatus
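Selecting an SCBatch subclass at runtime from RM_COMPUTE_BATCH_TYPE is a registry-plus-factory pattern. The Python analogue below shows it. The four abstract methods mirror initialize, makeReservation, cancelReservation and getStatus, renamed to Python style. Their signatures, the reservation ID format and the in-memory stub are hypothetical. A real SCBatchTorqueMaui would drive the Maui tools in RM_MAUI_COMMAND_DIR instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class SCBatch(ABC):
    """Python analogue of SCBatch: one subclass per batch scheduler."""

    registry: Dict[str, Type["SCBatch"]] = {}

    def __init_subclass__(cls, batch_type: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if batch_type:
            SCBatch.registry[batch_type] = cls  # register under its batch type

    @abstractmethod
    def initialize(self, cfg: Dict[str, str]) -> None: ...

    @abstractmethod
    def make_reservation(self, start: int, end: int, cpus: int) -> str: ...

    @abstractmethod
    def cancel_reservation(self, reservation_id: str) -> None: ...

    @abstractmethod
    def get_status(self, reservation_id: str) -> str: ...


class SCBatchTorqueMaui(SCBatch, batch_type="TorqueMaui"):
    # Hypothetical stub: records reservations in memory instead of calling Maui.
    def initialize(self, cfg):
        self.command_dir = cfg.get("RM_MAUI_COMMAND_DIR", "")
        self.reservations = {}

    def make_reservation(self, start, end, cpus):
        rid = f"harc.{len(self.reservations)}"
        self.reservations[rid] = (start, end, cpus)
        return rid

    def cancel_reservation(self, reservation_id):
        self.reservations.pop(reservation_id, None)

    def get_status(self, reservation_id):
        return "reserved" if reservation_id in self.reservations else "unknown"


def make_batch(cfg: Dict[str, str]) -> SCBatch:
    """Pick the scheduler module at runtime from RM_COMPUTE_BATCH_TYPE."""
    batch_type = cfg["RM_COMPUTE_BATCH_TYPE"]
    try:
        cls = SCBatch.registry[batch_type]
    except KeyError:
        raise ValueError(f"unsupported RM_COMPUTE_BATCH_TYPE: {batch_type}") from None
    batch = cls()
    batch.initialize(cfg)
    return batch
```

Adding a scheduler then means writing one new subclass, with no change to the code that selects it.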
Customizing InnerRM
• Startup/shutdown
  • initialize/remove
• Parsing (validating) the XML
  • parseResourceElement
  • parseWorkElement
  • maybe parseScheduleElement
• Co-allocation
  • tryMakeAction
  • tryCancelAction
  • addResourceBookings
  • completeTransactionBookings
• Others for getTimetable/getStatus
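Before a compute RM tentatively accepts a booking in its co-allocation hooks (such as tryMakeAction), it has to decide whether the request fits alongside existing bookings. For CPUs, one way to decide is a sweep over booking start and end times, sketched below in Python. This illustrates the kind of check involved and is not HARC's logic. Memory is ignored, times are plain integers over half-open intervals, and the function names are hypothetical.

```python
from typing import List, Tuple

Booking = Tuple[int, int, int]  # (start, end, cpus) over the half-open interval [start, end)


def peak_usage(bookings: List[Booking], start: int, end: int) -> int:
    """Maximum number of CPUs booked at any instant within [start, end)."""
    events = []
    for b_start, b_end, cpus in bookings:
        lo, hi = max(b_start, start), min(b_end, end)
        if lo < hi:  # booking overlaps the window
            events.append((lo, cpus))
            events.append((hi, -cpus))
    # At equal times, process releases (negative deltas) before acquisitions.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def fits(bookings: List[Booking], request: Booking, total_cpus: int) -> bool:
    """Could `request` be added without exceeding total_cpus at any time?"""
    start, end, cpus = request
    return peak_usage(bookings, start, end) + cpus <= total_cpus
```

With RM_COMPUTE_CPUS=8, a new 2-CPU request over [0, 10) fits beside existing bookings (0, 10, 4) and (5, 15, 2), because the peak during that window is 6 CPUs. A 3-CPU request over the same window does not fit.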
Steps for creating a new RM
• Design your XML
  • Resource element
  • Work element
• Create a new subclass of InnerRM.pm
  • Use the utility classes where possible
• To extend the API, create subclasses of
  • Resource.java
  • Work.java
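For the "design your XML" and parseResourceElement steps, the Python sketch below validates a compute Resource element like the one shown earlier for man2 and extracts its fields. It uses the standard library's `xml.etree.ElementTree`. It stands in for what a Perl InnerRM subclass would do. The function name and the returned dictionary keys are hypothetical, and the Work element is left out because its format isn't shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_resource_element(xml_text: str) -> Dict[str, str]:
    """Validate a compute <Resource> element and extract the fields an RM needs."""
    root = ET.fromstring(xml_text)
    if root.tag != "Resource":
        raise ValueError(f"expected <Resource>, got <{root.tag}>")
    compute = root.findtext("Compute")
    endpoint = root.find("Endpoint[@type='REST']/RESTEndpoint")
    if not compute or endpoint is None or not endpoint.text:
        raise ValueError("Resource needs <Compute> and a REST <Endpoint>")
    return {"compute": compute.strip(), "endpoint": endpoint.text.strip()}
```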
Caveats for RMs
• They need to be restarted to re-read the grid-mapfile
• When restarted, they forget their bookings
  • We want to add persistence in a way that is trivial for RM developers to use
• Thread handling needs work (soon!)
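The first caveat arises because the grid-mapfile is read only at startup. One possible way round it, and not something HARC is described as doing, is to re-parse the file whenever its modification time changes. The Python sketch below does this. It assumes the usual Globus grid-mapfile format: a quoted DN followed by comma-separated local account names. The class and function names are hypothetical.

```python
import os
import shlex
from typing import Dict, List, Optional


def parse_grid_mapfile(text: str) -> Dict[str, List[str]]:
    """Map each quoted certificate DN to its local account name(s)."""
    mapping = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 2:
            continue  # blank, comment-only or malformed line
        accounts = " ".join(tokens[1:])
        mapping[tokens[0]] = [a.strip() for a in accounts.split(",") if a.strip()]
    return mapping


class ReloadingGridMap:
    """Re-read the grid-mapfile whenever its modification time changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime: Optional[int] = None
        self._mapping: Dict[str, List[str]] = {}

    def lookup(self, dn: str) -> List[str]:
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:  # file changed (or first use): reload
            with open(self.path, encoding="utf-8") as f:
                self._mapping = parse_grid_mapfile(f.read())
            self._mtime = mtime
        return self._mapping.get(dn, [])
```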
What’s next?
• Discussion on MPIg...
• Beer?
But first...
...Any questions?