Co-allocation Using HARC IV. ResourceManagers


  1. Co-allocation Using HARC IV. ResourceManagers
     HARC Workshop, University of Manchester

  2. Philosophy
     • New types of RMs can be written by others
     • Existing RMs can be customized
     • Interfaces can be enhanced or changed
     • None of this means changing the acceptor code
     • API is extensible too
     • Good community contribution model
     • CCT keeps control of the acceptor code
     • The acceptor code will become very stable (already less than one commit per month)
     • The community evolves the system

  3. Are RMs Easy to Install?
     • Harder than client software
     • Much easier than Acceptors
     • Complexity is in the right place:
        • Only a few people install and configure Acceptors (infrastructure), which is hard
        • Some people modify/write RMs, which is not too hard
        • More people install and configure RMs, which is easy
        • Many people install and configure the Client software, which is trivial

  4. Pre-installation - Perl
     • RMs are written in Perl, to make installation trivial
     • However, they need a large number of CPAN modules to be installed
     • Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial to install
     • There is a document which contains things to watch out for
        • Lists previously seen problems, with solutions
        • Basically a list of exceptions
        • Now 7 pages of text!
        • There’s a lot of AIX content...

  5. Pre-installation - Certificate
     • HARC RM needs a certificate
     • We don’t recommend re-using the host certificate
     • Get a service certificate
     • UK e-Science CA now supports:
        • harccrm for Compute RMs (CRMs)
          /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=...
        • harcacceptor for Acceptors
          /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=....

  6. Installation Procedure
     • There’s an installer which installs from the CVS tree - this may change
     • The HARC environment variable points to the root of the repo (“negotiation” directory)
     • You have a subdirectory in
          $HARC/rm-service/config
     • For example
          $HARC/rm-service/config/nw-grid/man2

  7. Installation Procedure
     1. Create Contents
        • install.config - more shortly
        • grid-mapfile - GT-style mapfile for cert to username mapping (usually a symlink to /etc/grid-security/grid-mapfile)
        • acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs
        • cacerts directory, containing CA Certs for your cert and the Acceptor certs, in PEM format, suffix .crt
     2. Then a trivial Install
          install-rm nw-grid/man2 /usr/local/man2-rm
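
Before running the installer it can help to confirm that the configuration subdirectory holds everything listed on this slide. The following is a minimal sketch in Python (not the Perl the RMs are written in); the function name `missing_config_items` is hypothetical and is not part of HARC, and only the file and directory names come from the slide.

```python
from pathlib import Path

# Entries the slide lists for a site configuration directory.
REQUIRED_FILES = ("install.config", "grid-mapfile", "acceptor_mapfile")
CA_DIR = "cacerts"


def missing_config_items(config_dir):
    """Return the names of missing items; an empty list means nothing is missing."""
    root = Path(config_dir)
    # exists() follows symlinks, so a dangling grid-mapfile link counts as missing.
    problems = [name for name in REQUIRED_FILES if not (root / name).exists()]
    ca_dir = root / CA_DIR
    if not ca_dir.is_dir():
        problems.append(CA_DIR + "/")
    elif not any(ca_dir.glob("*.crt")):
        # The slide asks for PEM certificates with the suffix .crt.
        problems.append(CA_DIR + "/*.crt")
    return problems
```

For the example site this would be called as `missing_config_items("rm-service/config/nw-grid/man2")` relative to `$HARC`. It checks presence only and does not validate the contents of any file.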

  8. install.config
       RM_INNER_TYPE=SimpleCompute
       RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
       RM_COMPUTE_BATCH_TYPE=TorqueMaui
       RM_COMPUTE_MEMORY_MB_PER_CPU=4096
       RM_COMPUTE_CPUS=8
       RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
       RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
       RM_HOST=130.88.200.242
       RM_URL=man2-rm
       RM_PORT=9393

  9. install.config
       RM_INNER_TYPE=SimpleCompute
       RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
       RM_COMPUTE_BATCH_TYPE=TorqueMaui
       RM_COMPUTE_MEMORY_MB_PER_CPU=4096
       RM_COMPUTE_CPUS=8
       RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
       RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
       RM_HOST=130.88.200.242
       RM_URL=man2-rm
       RM_PORT=9393

       <Resource>
         <Compute>man2.nw-grid.ac.uk</Compute>
         <Endpoint type="REST">
           <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
         </Endpoint>
       </Resource>
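
The relationship between the settings and the REST endpoint in the Resource element can be illustrated with a small Python sketch. The mapping used here (node name, then RM_PORT, then RM_URL as the path) is an assumption inferred from this one example, not documented HARC behaviour, and both function names are hypothetical. Skipping `#` comment lines is also an assumption about the file format.

```python
def parse_install_config(text):
    """Parse KEY=value lines into a dict, dropping surrounding quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments may appear and are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def rest_endpoint(settings):
    # Assumed mapping, inferred from the example on the slide.
    return "https://{0}:{1}/{2}/".format(
        settings["RM_COMPUTE_NODENAME"], settings["RM_PORT"], settings["RM_URL"]
    )


EXAMPLE = """\
RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393
"""

settings = parse_install_config(EXAMPLE)
print(rest_endpoint(settings))  # https://man2.nw-grid.ac.uk:9393/man2-rm/
```

Note that the endpoint in the example uses the host name rather than the IP address given in RM_HOST, which is why the sketch builds the URL from RM_COMPUTE_NODENAME.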

  10. Installation Step
     • Before Installing
        • PERL5LIB and LD_LIBRARY_PATH need to be defined in your environment when you install
        • Or you can add these to the config file
        • Don’t have to set these if you don’t need to
     • Then a trivial Install
          install-rm nw-grid/man2 /usr/local/man2-rm
        • Script is in $HARC/rm-service/scripts
     • What does this do?

  11. What happens?
     • Installs source files
     • Creates a crontab & scripts for restarting the RM
     • Customizes some scripts for stopping/starting the RM
     • Installs and hashes CA certificates
     • Output:
          rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
          Makefile.crt ... Skipped
          cct-ca.crt ... 5fb2fc80.0
          old-uk-escience-ca.crt ... 01621954.0
          uk-escience-ca.crt ... adcbc9ef.0
          uk-escience-root.crt ... 8175c1cd.0
          Notice: Don't forget to place your certificate and key files at:
          /Users/jonmaclaren/man2-rm/x509/server_cert.pem
          /Users/jonmaclaren/man2-rm/x509/server_key.pem

  12. What’s in /usr/local/man2-rm?
     • Some Perl Modules
        • And OuterRM.pl which gets run
     • commands - which configures and runs the RM (based on install.config, etc.)
     • rerun - runs “commands” in the background from crontab
     • crontab - crontab line which can be added directly to your crontab (don’t cut and paste!)
     • start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
          ./stop-rm
          ./start-rm [ -w ]
     • x509 - subdirectory containing all the CA certs, mapfiles, etc.
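
The control-file mechanism can be pictured with a short Python sketch. The slide only says that start-rm and stop-rm use .do_not_restart to control whether rerun starts the RM; the exact behaviour below (stop creates the file, start removes it, rerun starts the RM only when the file is absent and the RM is not running) is an assumption, and the function names are hypothetical. The sketch does not start or stop any process.

```python
from pathlib import Path

CONTROL_FILE = ".do_not_restart"  # file name taken from the slide


def stop_rm(install_dir):
    """Assumed effect of stop-rm on the control file: create it."""
    (Path(install_dir) / CONTROL_FILE).touch()


def start_rm(install_dir):
    """Assumed effect of start-rm on the control file: remove it."""
    control = Path(install_dir) / CONTROL_FILE
    if control.exists():
        control.unlink()


def rerun_should_start(install_dir, rm_is_running):
    """Decision a cron-driven rerun script might make each time it fires."""
    if (Path(install_dir) / CONTROL_FILE).exists():
        return False  # an operator asked for the RM to stay down
    return not rm_is_running
```

A flag file like this lets the crontab entry stay installed permanently while still giving the operator a way to keep the RM down for maintenance.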

  13. Perl Modules
     • Just an overview here...
     • There is a doc online which has some details on these

  14. Key Modules
     • OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ
     • MainLoop - handles each request
     • TransactionManager - remembers what transactions (by TID) are running, and what their states are
     • InnerRM - the main class for different types of RM
        • SimpleComputeRM
        • SimpleNetworkRM
        • Both inherit from InnerRM

  15. SimpleComputeRM
     • Handles batch queue systems
     • Deals only with processors/memory
     • To talk to the scheduler, a subclass of SCBatch is used
        • SCBatchTorqueMaui.pm
        • SCBatchTorqueMoab.pm
        • SCBatchLoadLeveler.pm - not in CVS yet...
        • Chosen at runtime - RM_COMPUTE_BATCH_TYPE
     • Simple modules
        • Less than 200 lines
        • Override
           • initialize
           • makeReservation
           • cancelReservation
           • getStatus
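
The pattern on this slide, a small scheduler-specific subclass picked at runtime from RM_COMPUTE_BATCH_TYPE, can be sketched as follows. This is Python for illustration only; the real modules are Perl, and their method signatures, return values and status strings are not given in the slides, so everything beyond the four method names and the setting name is an assumption. The toy subclass records reservations in memory instead of calling a scheduler.

```python
class SCBatch:
    """Base class; method names mirror the four overrides on the slide."""

    def initialize(self, settings):
        self.settings = settings

    def make_reservation(self, start, end, cpus):
        raise NotImplementedError

    def cancel_reservation(self, reservation_id):
        raise NotImplementedError

    def get_status(self, reservation_id):
        raise NotImplementedError


class SCBatchTorqueMaui(SCBatch):
    """Toy stand-in: a real module would invoke the scheduler's commands."""

    def initialize(self, settings):
        super().initialize(settings)
        self.reservations = {}

    def make_reservation(self, start, end, cpus):
        rid = "res.{0}".format(len(self.reservations) + 1)  # hypothetical ID format
        self.reservations[rid] = "reserved"
        return rid

    def cancel_reservation(self, reservation_id):
        if reservation_id in self.reservations:
            self.reservations[reservation_id] = "cancelled"

    def get_status(self, reservation_id):
        return self.reservations.get(reservation_id, "unknown")


# Registry keyed by the value of RM_COMPUTE_BATCH_TYPE.
BATCH_TYPES = {"TorqueMaui": SCBatchTorqueMaui}


def load_batch(settings):
    """Pick and initialize the batch module named in the settings."""
    name = settings["RM_COMPUTE_BATCH_TYPE"]
    if name not in BATCH_TYPES:
        raise ValueError("unsupported batch type: " + name)
    batch = BATCH_TYPES[name]()
    batch.initialize(settings)
    return batch
```

Supporting another scheduler then means writing one more small subclass and registering it, without touching the code that calls `load_batch`.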

  16. Customizing InnerRM
     • Startup/shutdown
        • initialize/remove
     • Parsing (validating) the XML
        • parseResourceElement
        • parseWorkElement
        • maybe parseScheduleElement
     • Co-allocation
        • tryMakeAction
        • tryCancelAction
        • addResourceBookings
        • completeTransactionBookings
     • Others for getTimetable/getStatus
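
The slides name the co-allocation hooks but do not define their semantics. One plausible reading, offered purely as an assumption, is a two-step booking: a "try" step that tentatively holds a time slot for a transaction (TID), and a "complete" step that confirms it. The Python sketch below illustrates that reading with a toy class; `ToyInnerRM` and `abort_transaction` are hypothetical names, and the real InnerRM.pm may work differently.

```python
class ToyInnerRM:
    """One bookable resource with tentative and confirmed time slots."""

    def __init__(self):
        self.confirmed = []   # list of (start, end) pairs
        self.tentative = {}   # tid -> (start, end)

    def _is_free(self, start, end):
        # Half-open intervals: a slot ending at t does not clash with one starting at t.
        held = self.confirmed + list(self.tentative.values())
        return all(end <= s or start >= e for s, e in held)

    def try_make_action(self, tid, start, end):
        """Step 1: hold the slot for this transaction if nothing overlaps."""
        if tid in self.tentative or not self._is_free(start, end):
            return False
        self.tentative[tid] = (start, end)
        return True

    def complete_transaction_bookings(self, tid):
        """Step 2: turn the held slot into a confirmed booking."""
        self.confirmed.append(self.tentative.pop(tid))

    def abort_transaction(self, tid):
        """Release a held slot (hypothetical helper, not named on the slide)."""
        self.tentative.pop(tid, None)


rm = ToyInnerRM()
held = rm.try_make_action("tid-1", 10, 20)      # slot is free, so it is held
clash = rm.try_make_action("tid-2", 15, 25)     # overlaps tid-1, so it is refused
rm.complete_transaction_bookings("tid-1")       # tid-1 becomes a confirmed booking
```

Holding a slot before confirming it is what lets several RMs be asked in turn, with the bookings confirmed only if every one of them agreed.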

  17. Steps for creating a new RM
     • Design your XML
        • Resource element
        • Work element
     • Create a new subclass of InnerRM.pm
     • Use the utility classes where possible
     • To extend the API, create subclasses of
        • Resource.java
        • Work.java

  18. Caveats for RMs
     • Need to restart to re-read grid-mapfile
     • When restarted, they forget the bookings
     • Want to add persistence so that it’s trivial for RM developers to utilize
     • Thread handling needs work (soon!)

  19. What’s next?
     • Discussion on MPIg...
     • Beer?

  20. But first...
     ...Any Questions?
