Co-allocation Using HARC IV. ResourceManagers

Presentation Transcript

  1. Co-allocation Using HARC IV. ResourceManagers • HARC Workshop • University of Manchester

  2. Philosophy • New types of RMs can be written by others • Existing RMs can be customized • Interfaces can be enhanced or changed • None of this means changing the acceptor code • API is extensible too • Good community contribution model • CCT keeps control of the acceptor code • The acceptor code will become very stable (already less than one commit per month) • The community evolves the system

  3. Are RMs Easy to Install? • Harder than client software • Much easier than Acceptors • Complexity is in the right place: • Only a few people install and configure Acceptors (infrastructure), which is hard • Some people modify/write RMs, which is not too hard • More people install and configure RMs, which is easy • Many people install and configure the Client software, which is trivial

  4. Pre-installation - Perl • RMs are written in Perl, to make installation trivial • However, they need a large number of CPAN modules to be installed • Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial to install • There is a document which contains things to watch out for • Lists previously seen problems, with solutions • Basically a list of exceptions • Now 7 pages of text! • There’s a lot of AIX content...

  5. Pre-installation - Certificate • HARC RM needs a certificate • We don’t recommend re-using the host certificate • Get a service certificate • UK e-Science CA now supports: • harccrm for Compute RMs (CRMs) • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=... • harcacceptor for Acceptors • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=....

  6. Installation Procedure • There’s an installer which installs stuff from the CVS tree - this may change • HARC environment variable points to the root of the repo (“negotiation” directory) • You have a subdirectory in • $HARC/rm-service/config • For example • $HARC/rm-service/config/nw-grid/man2

  7. Installation Procedure 1. Create Contents • install.config - more shortly • grid-mapfile - GT-style mapfile for cert to username mapping (usually a sym-link to /etc/grid-security/grid-mapfile) • acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs • cacerts directory, containing CA Certs for your cert and the Acceptor certs, in PEM format, suffix .crt 2. Then a trivial Install • install-rm nw-grid/man2 /usr/local/man2-rm
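The contents listed on the slide above can be checked before running install-rm. The following is a minimal sketch in Python (the RMs themselves are Perl); the file and directory names come from the slide, but the function `missing_config_items` and its checks are hypothetical and are not part of HARC.

```python
from pathlib import Path

# Names taken from the slide; the checking logic itself is an assumption.
REQUIRED_FILES = ["install.config", "grid-mapfile", "acceptor_mapfile"]
CACERTS_DIR = "cacerts"


def missing_config_items(config_dir):
    """Return a list of human-readable problems found in config_dir."""
    root = Path(config_dir)
    problems = []
    for name in REQUIRED_FILES:
        # exists() follows symlinks, so a dangling grid-mapfile link is reported.
        if not (root / name).exists():
            problems.append(f"missing file: {name}")
    cacerts = root / CACERTS_DIR
    if not cacerts.is_dir():
        problems.append(f"missing directory: {CACERTS_DIR}")
    elif not any(cacerts.glob("*.crt")):
        # The slide says CA certs are PEM files with the suffix .crt.
        problems.append(f"no .crt files in {CACERTS_DIR}")
    return problems
```

For example, `missing_config_items("rm-service/config/nw-grid/man2")` would return an empty list when everything the slide asks for is present.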

  8. install.config
       RM_INNER_TYPE=SimpleCompute
       RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
       RM_COMPUTE_BATCH_TYPE=TorqueMaui
       RM_COMPUTE_MEMORY_MB_PER_CPU=4096
       RM_COMPUTE_CPUS=8
       RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
       RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
       RM_HOST=130.88.200.242
       RM_URL=man2-rm
       RM_PORT=9393

  9. install.config
       RM_INNER_TYPE=SimpleCompute
       RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
       RM_COMPUTE_BATCH_TYPE=TorqueMaui
       RM_COMPUTE_MEMORY_MB_PER_CPU=4096
       RM_COMPUTE_CPUS=8
       RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
       RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
       RM_HOST=130.88.200.242
       RM_URL=man2-rm
       RM_PORT=9393

       <Resource>
         <Compute>man2.nw-grid.ac.uk</Compute>
         <Endpoint type="REST">
           <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
         </Endpoint>
       </Resource>
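The slide above pairs the install.config settings with the Resource element a client would use. As a rough illustration, the Python sketch below parses the KEY=VALUE lines and builds the REST endpoint URL. It assumes the endpoint is formed from RM_COMPUTE_NODENAME, RM_PORT and RM_URL, which is inferred from the values on the slide rather than confirmed; `parse_install_config` and `rest_endpoint` are hypothetical helpers, not HARC code.

```python
import shlex


def parse_install_config(text):
    """Parse shell-style KEY=VALUE lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes surrounding quotes the way a shell would.
        settings[key.strip()] = " ".join(shlex.split(value))
    return settings


def rest_endpoint(settings):
    """Build the endpoint URL (assumed mapping: nodename, port, URL path)."""
    return "https://{host}:{port}/{path}/".format(
        host=settings["RM_COMPUTE_NODENAME"],
        port=settings["RM_PORT"],
        path=settings["RM_URL"],
    )
```

With the values from the slide, `rest_endpoint` yields `https://man2.nw-grid.ac.uk:9393/man2-rm/`, matching the RESTEndpoint element shown.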

  10. Installation Step • Before Installing • Need PERL5LIB and LD_LIBRARY_PATH to be defined in your environment when you install • Or can add these to the config file • Don’t have to set these if you don’t need to • Then a trivial Install • install-rm nw-grid/man2 /usr/local/man2-rm • Script is in $HARC/rm-service/scripts • What does this do?

  11. What happens? • Installs Source files • Creates a crontab & scripts for restarting the RM • Customizes some scripts for stopping/starting the RM • Installs and hashes CA certificates • Output:
       rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
       Makefile.crt ... Skipped
       cct-ca.crt ... 5fb2fc80.0
       old-uk-escience-ca.crt ... 01621954.0
       uk-escience-ca.crt ... adcbc9ef.0
       uk-escience-root.crt ... 8175c1cd.0
       Notice: Don't forget to place your certificate and key files at:
       /Users/jonmaclaren/man2-rm/x509/server_cert.pem
       /Users/jonmaclaren/man2-rm/x509/server_key.pem

  12. What’s in /usr/local/man2-rm ? • Some Perl Modules • And OuterRM.pl which gets run • commands - which configures and runs the RM (based on install.config, etc.) • rerun - runs “commands” in the background from crontab • crontab - crontab line which can be added directly to your crontab (don’t cut and paste!) • start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart) • ./stop-rm • ./start-rm [ -w ] • x509 - subdirectory containing all the CA certs, mapfiles, etc.
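The control-file mechanism described above (stop-rm and start-rm deciding whether the cron-driven rerun script may start the RM) can be sketched as follows. This is a Python illustration of the idea only: the file name .do_not_restart is from the slide, while the function names and the exact restart rule are assumptions, and the `-w` option of start-rm is not modelled.

```python
from pathlib import Path

CONTROL_FILE = ".do_not_restart"  # name taken from the slide


def stop_rm(install_dir):
    """Mark the RM as intentionally stopped so cron will not restart it."""
    Path(install_dir, CONTROL_FILE).touch()


def start_rm(install_dir):
    """Allow restarts again by removing the control file (Python 3.8+)."""
    Path(install_dir, CONTROL_FILE).unlink(missing_ok=True)


def should_restart(install_dir, rm_is_running):
    """Decision a cron-driven 'rerun' script might make on each run."""
    if Path(install_dir, CONTROL_FILE).exists():
        return False  # an operator asked for the RM to stay down
    return not rm_is_running
```

Keeping the decision in a file means the crontab entry never has to be edited to take an RM out of service.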

  13. Perl Modules • Just an overview here... • There is a doc online which has some details on these

  14. Key Modules • OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ • MainLoop - handles each request • TransactionManager - remembers what transactions (by TID) are running, and what their states are • InnerRM - the main class for different types of RM • SimpleComputeRM • SimpleNetworkRM • Both inherit from InnerRM
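To make the TransactionManager role concrete, here is a minimal Python sketch of a store that remembers transaction states by TID. The state names and all method names are hypothetical; the slide does not say which states or interface the real Perl module uses.

```python
class TransactionManager:
    """Remembers the state of each known transaction, keyed by TID."""

    # Hypothetical state names, chosen only for illustration.
    STATES = ("prepared", "committed", "aborted")

    def __init__(self):
        self._state_by_tid = {}

    def set_state(self, tid, state):
        if state not in self.STATES:
            raise ValueError(f"unknown state: {state}")
        self._state_by_tid[tid] = state

    def get_state(self, tid):
        """Return the state for tid, or None if the TID is unknown."""
        return self._state_by_tid.get(tid)

    def forget(self, tid):
        """Drop a transaction; unknown TIDs are ignored."""
        self._state_by_tid.pop(tid, None)

    def tids_in_state(self, state):
        """Return the sorted TIDs currently in the given state."""
        return sorted(t for t, s in self._state_by_tid.items() if s == state)
```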

  15. SimpleComputeRM • Handles batch queue systems • Deals only with processors/memory • To talk to the scheduler, a subclass of SCBatch is used • SCBatchTorqueMaui.pm • SCBatchTorqueMoab.pm • SCBatchLoadLeveler.pm - not in CVS yet... • Chosen at runtime - RM_COMPUTE_BATCH_TYPE • Simple modules • Less than 200 lines • Override • initialize • makeReservation • cancelReservation • getStatus
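The pattern on this slide, a small scheduler-specific subclass chosen at runtime from RM_COMPUTE_BATCH_TYPE, might look like the Python sketch below. The four method names are those listed on the slide; their arguments, the in-memory `SCBatchInMemory` class, the `BATCH_CLASSES` registry and `load_batch` are all assumptions made for illustration, and no real scheduler is contacted.

```python
class SCBatch:
    """Base class; each subclass talks to one kind of scheduler."""

    def initialize(self, settings):
        raise NotImplementedError

    def makeReservation(self, start, end, cpus):
        raise NotImplementedError

    def cancelReservation(self, reservation_id):
        raise NotImplementedError

    def getStatus(self, reservation_id):
        raise NotImplementedError


class SCBatchInMemory(SCBatch):
    """Hypothetical stand-in that keeps reservations in a dict."""

    def initialize(self, settings):
        self.total_cpus = int(settings["RM_COMPUTE_CPUS"])
        self.reservations = {}
        self._next_id = 1

    def makeReservation(self, start, end, cpus):
        # Conservative check: count every reservation overlapping the window.
        in_use = sum(c for (s, e, c) in self.reservations.values()
                     if s < end and start < e)
        if in_use + cpus > self.total_cpus:
            return None
        rid = self._next_id
        self._next_id += 1
        self.reservations[rid] = (start, end, cpus)
        return rid

    def cancelReservation(self, reservation_id):
        return self.reservations.pop(reservation_id, None) is not None

    def getStatus(self, reservation_id):
        return "reserved" if reservation_id in self.reservations else "unknown"


# Registry keyed the way the module names on the slide are formed.
BATCH_CLASSES = {"SCBatchInMemory": SCBatchInMemory}


def load_batch(settings):
    """Pick the subclass named by RM_COMPUTE_BATCH_TYPE and initialize it."""
    cls = BATCH_CLASSES["SCBatch" + settings["RM_COMPUTE_BATCH_TYPE"]]
    batch = cls()
    batch.initialize(settings)
    return batch
```

Supporting another scheduler then means adding one subclass and one registry entry, which fits the slide's point that these modules stay small.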

  16. Customizing InnerRM • Startup/shutdown • initialize/remove • Parsing (validating) the XML • parseResourceElement • parseWorkElement • maybe parseScheduleElement • Co-allocation • tryMakeAction • tryCancelAction • addResourceBookings • completeTransactionBookings • Others for getTimetable/getStatus

  17. Steps for creating a new RM • Design your XML • Resource element • Work element • Create a new subclass of InnerRM.pm • Use the utility classes where possible • To extend the API, create subclasses of • Resource.java • Work.java

  18. Caveats for RMs • Need to restart to re-read grid-mapfile • When restarted, they forget the bookings • Want to add persistence so that it’s trivial for RM developers to utilize • Thread handling needs work (soon!)

  19. What’s next? • Discussion on MPIg... • Beer?

  20. But first... ...Any Questions?