
Managing HPC Allocations and Users with Perl

How to manage HPC allocations and users with Perl scripting. This presentation covers the management challenges (multiple clusters, possibly with different schedulers, and the need for an audit trail), UMD's project types, and how the code is structured.


Presentation Transcript


  1. Managing HPC Allocations and Users with Perl
     Tom Payerle (payerle@umd.edu), ACIGS, Division of Information Technology, University of Maryland

  2. HPC Management Challenges
     • May have multiple clusters
        • Possibly with different schedulers
        • Other differences?
     • Many allocations and users
        • Many-to-many relationship between users and allocations?
        • Multiple "types" of allocations?
     • Need for an audit trail
        • Why was this done, and when?

  3. Scripting provides:
     • Reproducibility/consistency
     • Easier usage
        • Complex/multistep processes with a single command
        • Sanity checking
        • Delegation to junior staff?
     • Logging, which provides an audit trail
        • Who did what, when, and why?

  4. UMD Project Types
     • ‘coop’: for “paid” projects
        • Standard- and high-priority allocation accounts
        • Replenished quarterly/monthly
           • Each quarter, the full quarterly SUs (service units) are given to the standard-priority account
           • Each month, 1/3 of the quarterly amount is transferred from the standard- to the high-priority account
           • Can use the monthly allotment at high priority, or borrow within the quarter at standard priority
        • Can access the scavenger and debug partitions as well
        • Can access longer-walltime QoSes
     • ‘grants’: “unpaid”
        • One-time grant of SUs; the allocation expires in 1 year
        • Can access the scavenger and debug partitions as well
     • ‘condo’: unmetered usage of a limited number of nodes
        • No access to the scavenger partition
     • ‘organization unit’: “dummy” project for organizing things
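The coop replenishment rules come down to simple arithmetic. As a minimal sketch, assuming the quarterly grant is deposited into the standard-priority account and one third of it is moved to the "-hi" account at the start of each month (the "tptest"/"tptest-hi" naming follows the sacctmgr example on the next slide), the SU movements for one quarter could be computed like this. The function name and the rounding rule (any remainder goes to month 3) are hypothetical, not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical sketch: the SU movements for one quarter of a 'coop' project.
# Assumes the quarterly grant is deposited into the standard-priority account
# and one third of it is moved to the "-hi" account each month.  The rounding
# rule (remainder goes to month 3) is an assumption, not from the slides.
sub coop_quarter_schedule {
    my ($account, $quarterly_sus) = @_;
    my $monthly = int($quarterly_sus / 3);
    my @actions = ({ month => 1, action => 'deposit',
                     to => $account, sus => $quarterly_sus });
    for my $month (1 .. 3) {
        my $amount = $month < 3 ? $monthly
                                : $quarterly_sus - 2 * $monthly;
        push @actions, { month => $month, action => 'transfer',
                         from => $account, to => "$account-hi",
                         sus => $amount };
    }
    return @actions;
}

# Print the schedule for a hypothetical 30,000 SU/quarter project.
for my $step (coop_quarter_schedule('tptest', 30_000)) {
    printf "month %d: %-8s %6d SUs -> %s\n", @{$step}{qw(month action sus to)};
}
```

Returning the schedule as data, rather than applying it immediately, would let a script show (and log) the planned changes before making them.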

  5. Would expand to the following sacctmgr commands:
     • sacctmgr -i add user account=tptest user=kevin cluster=dt partition=standard qos=narrow-long,narrow-med,...
     • sacctmgr -i add user account=tptest-hi user=kevin cluster=dt partition=high-priority qos=narrow-long,narrow-med,...
     • sacctmgr -i add user account=tptest user=kevin cluster=dt partition=debug qos=debug
     • sacctmgr -i add user account=tptest-hi user=kevin cluster=dt partition=debug qos=debug
     • sacctmgr -i add user account=tptest user=kevin cluster=dt partition=scavenger qos=scavenger
     • sacctmgr -i add user account=tptest-hi user=kevin cluster=dt partition=scavenger qos=scavenger
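One request fans out into six sacctmgr calls here. The sketch below is hypothetical and simply mirrors the account/partition/QoS pairs in the list above; it builds each call as an argument list so the plan can be printed and reviewed, or later passed to system() in list form (which avoids shell quoting). The QoS list for the standard and high-priority partitions is elided on the slide, so the caller supplies it; the usage example passes only the two names the slide shows.

```perl
use strict;
use warnings;

# Hypothetical sketch: expand "add user to a coop project" into the
# individual sacctmgr invocations.  The account/partition/QoS pairs mirror
# the slide's example; they are not read from any real configuration.
sub coop_add_user_commands {
    my (%arg) = @_;
    my ($acct, $user, $cluster, $std_qos) =
        @arg{qw(account user cluster std_qos)};
    my $hi = "$acct-hi";
    my @plan = (
        [ $acct, 'standard',      $std_qos    ],
        [ $hi,   'high-priority', $std_qos    ],
        [ $acct, 'debug',         'debug'     ],
        [ $hi,   'debug',         'debug'     ],
        [ $acct, 'scavenger',     'scavenger' ],
        [ $hi,   'scavenger',     'scavenger' ],
    );
    return map {
        my ($account, $partition, $qos) = @$_;
        [ 'sacctmgr', '-i', 'add', 'user',
          "account=$account", "user=$user", "cluster=$cluster",
          "partition=$partition", "qos=$qos" ];
    } @plan;
}

# Dry run: print each command instead of executing it.
# (The slide's full QoS list is elided; only the two names shown are used.)
for my $cmd (coop_add_user_commands(account => 'tptest', user => 'kevin',
                                    cluster => 'dt',
                                    std_qos => 'narrow-long,narrow-med')) {
    print join(' ', @$cmd), "\n";
}
```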

  6. Code Structure
     • Most of the work is done in Perl modules
        • Mostly OO
     • Perl scripts for the frontend
     • Almost everything requires giving a reason, which is recorded in the logs
     • Major components:
        • Interface to the Projects DB
        • Classes for each cluster, containing information about its individual quirks
        • Utility modules to do the low-level work
        • Modules to orchestrate the low-level tasks
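Requiring a reason is easiest to enforce in one place. A minimal sketch, assuming a tab-separated log line of time, invoking Unix user, action, and reason; the function name, its arguments, and the line format are illustrative, not the actual modules' interface.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical "reason required" audit-log helper.  The field layout and the
# optional filehandle argument are assumptions for illustration.
sub log_action {
    my (%arg) = @_;
    my $action = $arg{action} // '(unspecified)';
    my $reason = $arg{reason} // '';
    die "A reason is required for '$action'\n" unless $reason =~ /\S/;
    my $who  = getpwuid($<) // $<;            # invoking Unix user (or raw uid)
    my $when = strftime('%Y-%m-%d %H:%M:%S', localtime);
    my $fh   = $arg{fh} // \*STDERR;
    print {$fh} join("\t", $when, $who, $action, $reason), "\n";
    return 1;
}

# Example: capture the log line in memory instead of a real log file.
my $log = '';
open my $fh, '>', \$log or die "Cannot open in-memory log: $!";
log_action(action => 'add user kevin to tptest',
           reason => 'requested by PI', fh => $fh);
close $fh;
print $log;
```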

  7. Glue::HPCC::Cluster Class
     • Base class, with defaults for all clusters
     • Subclasses for each cluster
     • Defines:
        • Type of scheduler (e.g. Slurm)
        • Where the Projects DB for the cluster is
        • Where home directories and data directories go
        • Any other cluster-specific information
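A minimal sketch of such a hierarchy in plain Perl OO, assuming per-cluster information is kept as a list of defaults that each subclass extends. The method names, paths, the Projects DB hostname, and the Glue::HPCC::Cluster::Example subclass are all hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical sketch in the spirit of Glue::HPCC::Cluster: the base class
# holds defaults, and each cluster subclass overrides what differs.
package Glue::HPCC::Cluster;

sub new {
    my ($class, %overrides) = @_;
    my %self = ($class->defaults, %overrides);   # later keys win
    return bless \%self, $class;
}

# Defaults shared by all clusters (placeholder values).
sub defaults {
    return (
        scheduler   => 'Slurm',
        projects_db => undef,
        home_root   => '/home',
        data_root   => '/data',
    );
}

sub scheduler   { return $_[0]{scheduler} }
sub projects_db { return $_[0]{projects_db} }
sub home_dir    { my ($self, $user)    = @_; return "$self->{home_root}/$user" }
sub data_dir    { my ($self, $project) = @_; return "$self->{data_root}/$project" }

# Hypothetical per-cluster subclass with its own quirks.
package Glue::HPCC::Cluster::Example;
use parent -norequire, 'Glue::HPCC::Cluster';

sub defaults {
    my ($class) = @_;
    return (
        $class->SUPER::defaults(),
        name        => 'dt',
        projects_db => 'projects-db.example.org',   # placeholder location
        data_root   => '/lustre/dt',                # placeholder path
    );
}

package main;

my $cluster = Glue::HPCC::Cluster::Example->new;
print $cluster->scheduler, "\n";            # Slurm (inherited default)
print $cluster->home_dir('kevin'), "\n";    # /home/kevin
print $cluster->data_dir('tptest'), "\n";   # /lustre/dt/tptest
```

Because subclass defaults are appended after the base-class list, any key a subclass repeats replaces the inherited value, so each cluster class only has to state its differences.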

  8. Utility Classes
     • Wrappers around Slurm command-line utilities
        • Slurm::Sacctmgr*, Slurm::Sshare*, Slurm::Squeue, Slurm::Scontrol, Slurm::Sinfo, Slurm::Sacct
        • * available on CPAN
     • Wrappers around Unix commands
        • Unix user/group management
           • Query whether a user exists, is in a Unix group, etc.
           • Create home directories, add/remove users from Unix groups, etc.
        • Netgroup utilities (add/remove/query a user to/from a netgroup)
     • Mail utilities (basically sending templated emails)
     • Etc.
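For the query side of the Unix user/group wrappers, Perl's built-in getpwnam and getgrnam already provide what is needed. A sketch with hypothetical function names: it treats a user as "in" a group if it is either the user's primary group or listed among the group's supplementary members, and it leaves out the side-effecting operations (home directory creation, group changes).

```perl
use strict;
use warnings;

# Hypothetical helper names; the lookups use Perl's built-in getpw*/getgr*.
sub user_exists {
    my ($user) = @_;
    my @pw = getpwnam($user);
    return @pw ? 1 : 0;
}

sub user_in_group {
    my ($user, $group) = @_;
    my @pw = getpwnam($user)  or return 0;   # no such user
    my @gr = getgrnam($group) or return 0;   # no such group
    return 1 if $pw[3] == $gr[2];            # it is the user's primary group
    # Otherwise check the group's supplementary member list.
    my @members = split ' ', ($gr[3] // '');
    return (grep { $_ eq $user } @members) ? 1 : 0;
}

for my $u (qw(root no_such_user_xyz)) {
    printf "%s exists: %s\n", $u, user_exists($u) ? 'yes' : 'no';
}
```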

  9. ClusterUtil::* Classes: Higher-Level Functionality
     • Split into User and Project subclasses
     • Split into scheduler-dependent and scheduler-independent subclasses
        • Currently only the Slurm scheduler is supported (+ a Dummy class)
     • Call ClusterUtil::Project and/or ClusterUtil::User routines with an HPCC class instance
        • The main class takes care of non-scheduler-specific tasks
        • ClusterUtil::Slurm::Project/User takes care of scheduler-specific tasks
        • Branch out based on project type as needed
     • Create/delete/update/replenish projects
     • Add/remove users from projects/clusters
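The delegation described here can be sketched as follows. This is hypothetical: plain hashes stand in for the HPCC cluster instance and the project record, the returned step strings stand in for real actions, ClusterUtil::Dummy::Project is an invented name for the Dummy backend, and giving only 'coop' projects a "-hi" account is an assumption based on the sacctmgr example earlier.

```perl
use strict;
use warnings;

# Scheduler-specific part: which Slurm accounts the user must be added to.
package ClusterUtil::Slurm::Project;

sub add_user {
    my ($class, $cluster, $project, $user) = @_;
    my @accounts = ($project->{name});
    # Branch on project type: coop projects also have a high-priority account.
    push @accounts, "$project->{name}-hi" if $project->{type} eq 'coop';
    return map { "slurm: add $user to $_ on $cluster->{name}" } @accounts;
}

# Hypothetical no-op backend, analogous to the Dummy class.
package ClusterUtil::Dummy::Project;
sub add_user { return () }

# Main class: scheduler-independent work, then delegate by scheduler type.
package ClusterUtil::Project;

my %backend = (
    Slurm => 'ClusterUtil::Slurm::Project',
    Dummy => 'ClusterUtil::Dummy::Project',
);

sub add_user {
    my ($class, $cluster, $project, $user) = @_;
    my $impl = $backend{ $cluster->{scheduler} }
        or die "Unsupported scheduler '$cluster->{scheduler}'\n";
    my @steps = ("unix: add $user to group $project->{name}");   # placeholder step
    push @steps, $impl->add_user($cluster, $project, $user);
    return @steps;
}

package main;

my $cluster = { name => 'dt', scheduler => 'Slurm' };
my $project = { name => 'tptest', type => 'coop' };
print "$_\n" for ClusterUtil::Project->add_user($cluster, $project, 'kevin');
```

Keeping the scheduler lookup in a single table means that supporting another scheduler would, in this sketch, only require a new backend class and one more table entry.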

  10. Questions?
