Genepool Modules: Setting up your environment at NERSC

Douglas Jacobsen, Bioinformatics Computing Consultant

Topics

UNIX Environment Basics

Constructing a default environment, dotfiles

Introduction to Modules

Extension to Modules – ModulesReloaded

Using modules interactively

Using modules in a batch job

Constructing basic modules for your software

Constructing pipeline modules

Motivation for this training
  • Most-common tickets at NERSC are issues with environment settings
  • /jgi/tools is being retired; old settings need to be changed!
  • The modules system on genepool has been updated to ease the transition and future production work
  • Example modulefiles are in:
    • /global/projectb/shared/data/training/modules
The UNIX Environment
  • What is it? Key/value store for every process
  • What does the UNIX environment do for you?
    • controls which programs you can easily run
      • PATH
      • Many Linux systems have a default PATH of:

PATH=/usr/local/bin:/usr/bin:/bin

    • Sets up linking paths to allow your programs to run
      • LD_LIBRARY_PATH
    • Controls how your programs run
      • MANPATH, PKG_CONFIG_PATH, PS1, OMPI_MCA_ras
      • Really the environment is a way for you to communicate with your programs
    • Useful convenience variables on the command line and scripts:
      • SCRATCH, NERSC_HOST, BOOST_ROOT
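
To make the role of PATH concrete, here is a minimal bash sketch. The command name hello-demo and the temporary directory are hypothetical, created only for this illustration: the shell cannot find the command until its directory is added to PATH, and a directory placed at the front of PATH is searched first.

```bash
#!/bin/bash
# Sketch: how PATH decides which program runs.
# "hello-demo" and the temporary directory are hypothetical, made just for this demo.

demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo_dir"\n' > "$demo_dir/hello-demo"
chmod +x "$demo_dir/hello-demo"

# Before: the shell cannot find the command, because demo_dir is not in PATH.
if ! command -v hello-demo >/dev/null 2>&1; then
    echo "hello-demo: not found yet"
fi

# After prepending demo_dir, the shell searches that directory first.
PATH="$demo_dir:$PATH"
resolved=$(command -v hello-demo)
demo_output=$(hello-demo)
echo "resolved to: $resolved"
echo "$demo_output"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

Modules automate exactly this kind of PATH edit, and also undo it on unload.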
The UNIX Environment: The Rules

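[Figure: example process tree containing init, memtime, bash, ls, perl, blastx, /bin/sh, cat, and sort. The perl statement $data = `cat $file | sort ` spawns /bin/sh, which in turn runs cat and sort.]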

Each process has its own environment

Each process can manipulate its own environment but no others

A child process inherits its parent’s environment

A “login” shell reads special “dotfiles” which may reset parts of the environment
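
The inheritance rules above can be checked with a short bash sketch. The variable names DEMO_EXPORTED and DEMO_LOCAL are hypothetical and exist only for this demonstration.

```bash
#!/bin/bash
# Sketch: a child inherits exported variables, but cannot change its parent.
# DEMO_EXPORTED and DEMO_LOCAL are hypothetical names used only here.

export DEMO_EXPORTED="from-parent"   # exported: part of the environment
DEMO_LOCAL="shell-only"              # not exported: a plain shell variable

# The child (a new bash process) sees only the exported variable.
child_view=$(bash -c 'echo "${DEMO_EXPORTED:-unset}/${DEMO_LOCAL:-unset}"')
echo "child sees: $child_view"

# The child changes its own copy; the parent's value is untouched.
bash -c 'export DEMO_EXPORTED="changed-in-child"'
echo "parent still has: $DEMO_EXPORTED"
```

This is also why a script you run cannot "load a module for you": its changes end with the child process.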

Looking at the environment

$ env # dump the whole environment

$ echo $NERSC_HOST # just see NERSC_HOST

$ echo $PATH # view the compound variable PATH

$ env | grep MODULE # just variables with ‘MODULE’

We’ll be looking at the environment a lot today; env and echo are two easy ways to interrogate it from either bash or tcsh

What shell are you using? (hint: check $SHELL)

Changing the environment
  • bash (default on genepool)

export MYVAR="test" # when writing, don’t use ‘$’

echo $MYVAR # when reading, use ‘$’

export PATH=$HOME/bin:$PATH # prepend to your PATH

export MYVAR="${MYVAR}2" # append ‘2’ to MYVAR

  • tcsh

setenv MYVAR "test"

echo $MYVAR

setenv PATH $HOME/bin:$PATH

setenv MYVAR "${MYVAR}2"
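
For bash users, one way to keep repeated prepends from piling up duplicate PATH entries is a small helper like the one below. This is a sketch only: path_prepend is a hypothetical function name, not part of bash or of the modules system.

```bash
#!/bin/bash
# Sketch: prepend a directory to PATH only if it is not already present.
# path_prepend is a hypothetical helper, not a bash builtin.
path_prepend() {
    local dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                     # already present: leave PATH alone
        *) PATH="$dir${PATH:+:$PATH}" ;;   # otherwise put it at the front
    esac
}

path_prepend "$HOME/bin"
path_prepend "$HOME/bin"   # the second call is a no-op
echo "$PATH"
```

Wrapping PATH in colons before matching means an entry is only recognized as a whole path component, not as a substring of a longer directory name.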

NERSC Dotfiles – Your default Environment Pt 1
  • When you first login (or a batch script runs), a login shell is executed
    • A login shell is generated for every job – even if you transmit your environment, the login shell environment is overlaid on top of the transmitted environment
  • A login shell sources special files in your home directory, your dotfiles
  • bash users (files evaluated in this order):
    • $HOME/.profile (read-only symlink, do not change)
    • $HOME/.bash_profile.ext (user customizable)
    • $HOME/.bashrc (read-only symlink, do not change)
    • $HOME/.bashrc.ext (user customizable)
  • tcsh users (files evaluated in this order):
    • $HOME/.tcshrc (read-only symlink, do not change)
    • $HOME/.tcshrc.ext (user customizable)
    • $HOME/.login (read-only symlink, do not change)
    • $HOME/.login.ext (user customizable)
  • zsh and ksh execute some dotfiles, but NERSC support is being phased out
  • /bin/sh does not properly source the dotfiles (BEWARE!)
Using Software and the UNIX Environment
  • Providing large-scale installations of software for many different users on an HPC system presents a number of challenges:
    • Different users need different software, use different shells
    • Some users need different specific versions, including older versions
    • All users need to access the software quickly and easily from “everywhere” [network-mounted, non-standard paths]
    • Providing a user interface for accessing that software can be challenging
      • Example: How would you use software installed in

/usr/common/jgi/aligners/blast+/2.2.28

      • Answer:
        • Add /usr/common/jgi/aligners/blast+/2.2.28/bin to PATH;
        • csh: setenv PATH /usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
        • bash: export PATH=/usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
What are Modules?

A “module” is something that can be loaded or unloaded dynamically into the environment.

Modules have a name

Modules have a version, and can have many versions

Modules can have a default version

To refer to the default version of a module, use: <name>

e.g. module load gcc

To refer to a specific version of a module, use: <name>/<version>

e.g. module load gcc/4.8.1

Modules Interactive Example
  • Basic Commands:

module load <module id> [<module id> …]      Load a module
module unload <module id> [<module id> …]    Remove a module
module list                                  List all loaded modules
module show <module id>                      See module effects
module avail                                 See all modules
module purge                                 Remove all modules

  • Try the following:
    • Load the default blast+ module
    • Load the latest version of the hdf5 module (hint: not default)
    • Unload the above modules but leave the rest intact
    • What effects does the jgitools module have?
    • What versions of RSeQC are available on genepool? (try using grep)
  • Why didn’t grep work for the last step?
    • module avail | grep RSeQC won’t work
    • module communicates with you on stderr (stdout is used internally)
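
The stderr point can be reproduced without the modules system. In this sketch, fake_avail is a hypothetical stand-in for "module -t avail" that writes its listing to stderr only. The module names are taken from this training.

```bash
#!/bin/bash
# Sketch: why piping a stderr-only command into grep finds nothing.
# fake_avail is a hypothetical stand-in for "module -t avail".
fake_avail() {
    printf '%s\n' 'RSeQC/2.3.2' 'RSeQC/2.3.6(default)' 'hdf5/1.8.11' >&2
}

# A pipe carries stdout only, so grep receives no input.
# (stderr is sent to /dev/null here just to keep the demo quiet.)
without_redirect=$(fake_avail 2>/dev/null | grep -c RSeQC)

# Merging stderr into stdout first lets grep see the listing.
with_redirect=$(fake_avail 2>&1 | grep -c RSeQC)

echo "matches without 2>&1: $without_redirect"
echo "matches with 2>&1:    $with_redirect"
```

In bash, the same 2>&1 redirection placed before the pipe is what makes grep usable with the real module command.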
More awkward in tcsh, but possible:

( module -t avail ) |& grep RSeQC

dmj@genepool02:~$ module list
Currently Loaded Modulefiles:
  1) modules                     7) mysql/5.0.96
  2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
  3) uge/8.0.1                   9) perl/5.16.0
  4) jgitools/1.2.0             10) readline/6.2
  5) oracle_client/11.2.0.3.0   11) python/2.7.4
  6) gcc/4.6.3                  12) usg-default-modules/1.4

dmj@genepool02:~$ module load blast+
dmj@genepool02:~$ module load hdf5/1.8.11
dmj@genepool02:~$ module list
Currently Loaded Modulefiles:
  1) modules                     8) PrgEnv-gnu/4.6
  2) nsg/1.2.0                   9) perl/5.16.0
  3) uge/8.0.1                  10) readline/6.2
  4) jgitools/1.2.0             11) python/2.7.4
  5) oracle_client/11.2.0.3.0   12) usg-default-modules/1.4
  6) gcc/4.6.3                  13) blast+/2.2.26
  7) mysql/5.0.96               14) hdf5/1.8.11

dmj@genepool02:~$ module unload blast+ hdf5
dmj@genepool02:~$ module list
Currently Loaded Modulefiles:
  1) modules                     7) mysql/5.0.96
  2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
  3) uge/8.0.1                   9) perl/5.16.0
  4) jgitools/1.2.0             10) readline/6.2
  5) oracle_client/11.2.0.3.0   11) python/2.7.4
  6) gcc/4.6.3                  12) usg-default-modules/1.4

dmj@genepool02:~$ module -t avail 2>&1 | grep RSeQC
RSeQC/2.3.2
RSeQC/2.3.6(default)
dmj@genepool02:~$

Basic Modules Functionality
  • Modules manipulate the environment
    • Loading can:
      • Set an environment variable (possibly by replacing)
      • Append (or prepend) to a compound environment variable
      • Unset an environment variable
      • *can* execute a command (not recommended if the command changes the state of the system)
    • ‘module unload’ reverses the effects of the ‘module load’
    • Which effects of a module might be irreversible?
      • Answer:
        • Unloading a module that used setenv unsets the variable; it won’t restore the value the variable had before the load
        • Multiple modules calling ‘setenv’ or ‘unsetenv’ on the same variable might lead to an inconsistent state (those modules should conflict)
        • System calls that change system state (e.g. xhost) are not trivially reversible by unloading the module
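
The setenv case can be imitated in plain bash. In the sketch below, fake_load and fake_unload are hypothetical stand-ins for what a modulefile line such as "setenv GCC_DIR ..." does on load and unload. They are not the real module command, and the paths are placeholders.

```bash
#!/bin/bash
# Sketch: why a setenv-style change cannot be undone exactly.
# fake_load / fake_unload are hypothetical stand-ins, not the module command.
fake_load()   { export GCC_DIR="/path/from/module"; }   # load: overwrite the variable
fake_unload() { unset GCC_DIR; }                        # unload: the variable is unset

export GCC_DIR="/my/original/value"   # value the user had before loading
fake_load
fake_unload

# The original value is gone; nothing recorded it.
echo "GCC_DIR after load+unload: ${GCC_DIR:-<unset>}"
```

Compound variables edited with prepend or append do not have this problem, because unloading removes only the entry that was added.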
Modules: conflicting and swapping
  • Some modules are incompatible
    • E.g. both wublast and blast+ provide different blastn, blastx, etc. executables
    • To prevent these modules from being simultaneously loaded, they conflict

dmj@genepool02:~$ module load wublast

dmj@genepool02:~$ module load blast+

blast+/2.2.26(25):ERROR:150: Module 'blast+/2.2.26' conflicts with the currently loaded module(s) 'wublast/20060510'

  • Most of the time, only a single version of a module should be loaded at a time:
    • e.g., it doesn’t make sense to load more than one version of gcc
    • Try:

module purge ## cleans everything out

module load gcc

module load gcc/4.8.1

    • Error? To change from gcc/4.6.3 (the default) to gcc/4.8.1 (the latest), swap!

module swap gcc gcc/4.8.1

-or-

module swap gcc/4.8.1

Setting up your own modules
  • Modules are described by modulefiles
    • One version per modulefile, in a directory named for the module
    • Collections of modules are found in $MODULEPATH
    • Try looking at $MODULEPATH
    • Add your own modules directory:

genepool$ mkdir $HOME/modules

genepool$ mkdir $HOME/modules/my_first_module

genepool$ module use $HOME/modules

      • Try looking at $MODULEPATH again

genepool$ module avail my_first_module

    • Why doesn’t it show up?
      • No modulefiles installed yet… next slide.
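
The directory layout described above (one file per version, inside a directory named for the module) can be sketched as follows. Everything specific here is an assumption made for illustration: a temporary directory stands in for $HOME/modules, and the version "1.0" and the variable MY_FIRST_MODULE_VER are hypothetical. The modulefile is deliberately tiny and is not meant for production use on genepool.

```bash
#!/bin/bash
# Sketch: create a private modules directory holding one tiny modulefile.
# The temporary directory stands in for $HOME/modules; "1.0" and
# MY_FIRST_MODULE_VER are hypothetical names for this demo.
modules_dir=$(mktemp -d)
mkdir -p "$modules_dir/my_first_module"

# One file per version, inside a directory named for the module.
cat > "$modules_dir/my_first_module/1.0" <<'EOF'
#%Module1.0
## Minimal illustrative modulefile (not production-ready)
setenv MY_FIRST_MODULE_VER 1.0
EOF

# Only call "module" where the modules system is actually installed.
if type module >/dev/null 2>&1; then
    module use "$modules_dir"
    module -t avail my_first_module 2>&1
else
    echo "module command not available; created $modules_dir/my_first_module/1.0"
fi
```

With a modulefile in place, "module avail my_first_module" has something to list.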
Simple modulefile (TOO SIMPLE)

Modulefiles are written in (somewhat overloaded) TCL.

#%Module1.0
##
## Required internal variables
set name gcc
set version 4.6.3
set root /usr/common/usg/languages/$name/${version}_1

## List conflicting modules here
conflict $name

## Software-specific settings exported to user environment
prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path LD_LIBRARY_PATH $root/lib64
prepend-path PKG_CONFIG_PATH $root/lib/pkgconfig
setenv GCC_DIR $root

  • #%Module1.0: module identifier string (required)
  • Lines starting with ##: comments
  • set name / version / root: internal variables
  • conflict $name: don’t load more than one gcc!
  • prepend-path / setenv: the actual environment adjustments

WARNING: This example is simplified, do not use in production on genepool.

Refer to later ModulesReloaded examples.

Common Environment Variables in Modules

Be VERY careful about manipulating these environment variables!!!

  • Modules for software packages commonly set:
    • PATH
    • LD_LIBRARY_PATH
    • PYTHONPATH
    • PERL5LIB
  • Every usg/jgi module for software also sets an environment variable pointing to the base of the distribution:
    • E.g. BOOST_ROOT, PERL_DIR, PYTHON_DIR, GIT_PATH
  • Exercise:
    • Load the python module first
    • Use ‘module show’ to investigate the effects of:
      • graphviz
      • RSeQC
      • smrtanalysis
    • Are there commonalities? Differences?
Modules have dependencies

For the python module to function, both the gcc and readline modules need to be loaded

For the perl module to function, the gcc module needs to be loaded

  • Python needs some of gcc’s libraries
  • Perl needs some of gcc’s libraries
  • Python also needs readline’s libraries
Complexity of module dependencies on genepool
  • Highly inter-connected graph of dependencies
  • The most highly connected nodes:
    • gcc
    • perl
    • python
    • oracle-jdk
    • openmpi
  • Many modules are disconnected from the network, possibly because:
    • They are statically compiled
    • They rely only on base-system functionality
    • Their dependencies haven’t been modelled yet
ModulesReloaded
  • Automatically checks and loads dependencies
  • Automatically unloads orphaned dependencies
  • Differentiates between user-loaded modules and auto-loaded modules when manipulating modules
  • Does more extensive error checking
    • Modules failing to load return exit status 1 (echo $?)
  • Supports “variant” modules
    • Single modulefiles for multiple installations of similar software
  • Enables reporting of upcoming changes to modules system
  • Enhances logging capabilities of modules system
ModulesReloaded AutoLoad/Unload
  • Exercise:
    • Start by unloading all modules.
    • Load the python module.
    • Which modules were loaded?
    • Next, load the perl module.
    • Which modules are loaded now?
    • Now, unload the python module
    • Check module list
    • Finally, unload the perl module.
    • Check module list
    • Look at the details of the perl and python modules.
ModulesReloaded AutoLoad/Unload
  • Exercise:
    • Start by unloading all modules. [module purge]
    • Load the python module. [module load python]
    • Which modules were loaded? [gcc, readline, python]
    • Next, load the perl module. [module load perl]
    • Which modules are loaded now? [gcc, readline, python, perl]
    • Now, unload the python module [module unload python]
    • Check module list [gcc, perl]
    • Finally, unload the perl module. [module unload perl]
    • Check module list [None!]
    • Look at the details of the perl and python modules.

module show perl

module show python

ModulesReloaded AutoLoad/Unload
  • In the previous exercise, you should have noticed that the perl and python modules each depended on the gcc module (among others).
    • The gcc module won’t get unloaded while another loaded module still depends on it.
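
The orphan-unloading behaviour from the exercise can be modelled in a few lines of bash (version 4 or newer, for associative arrays). This is a toy model only, not how ModulesReloaded is implemented. The dependency lists are the ones stated in this training (python needs gcc and readline; perl needs gcc), and the names deps, requested, and effective_modules are hypothetical.

```bash
#!/bin/bash
# Toy model (bash 4+): a dependency stays loaded while any requested module needs it.
# deps, requested and effective_modules are hypothetical names for this sketch.
declare -A deps=( [python]="gcc readline" [perl]="gcc" )
declare -A requested=()      # modules the user asked for directly

# Print the sorted set of requested modules plus their dependencies.
effective_modules() {
    local -A seen=()
    local mod dep
    for mod in "${!requested[@]}"; do
        seen[$mod]=1
        for dep in ${deps[$mod]}; do
            seen[$dep]=1
        done
    done
    printf '%s\n' "${!seen[@]}" | sort | paste -sd' ' -
}

requested[python]=1; requested[perl]=1        # module load python; module load perl
after_loads=$(effective_modules)

unset 'requested[python]'                     # module unload python
after_python_unload=$(effective_modules)

unset 'requested[perl]'                       # module unload perl
after_perl_unload=$(effective_modules)

echo "after loading both:      $after_loads"
echo "after unloading python:  $after_python_unload"
echo "after unloading perl:    ${after_perl_unload:-<none>}"
```

Recomputing the set from the user's requests, rather than removing dependencies one by one, is what keeps gcc loaded while perl still needs it.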
ModulesReloaded User’s Choice!
  • Exercise:
    • Load the default hmmer module
    • Load the repeatmasker module
    • Why did that just happen?
    • ModulesReloaded tracks which modules the user directly requests (vs. those just loaded as dependencies), and won’t swap or remove them automatically.
    • Unload hmmer, then try loading repeatmasker.
ModulesReloaded Variants

https://www.nersc.gov/users/computational-systems/genepool/programming/

  • Programming Environments are integrated sets of modules
    • Attempt to provide a seamless and coherent build environment – regardless of compiler.
  • Exercise:
    • Purge all your modules.
    • Load ‘PrgEnv-gnu’
    • Load ‘boost’
    • Examine the BOOST_ROOT environment variable
    • Swap to ‘PrgEnv-gnu/4.8’
    • Examine the BOOST_ROOT environment variable again
ModulesReloaded Variants
  • The ‘boost’ module is a ‘variant’ module
    • When loaded, it detects which programming environment (PrgEnv) is loaded
    • When the PrgEnv is swapped, the variant module is also reloaded
    • A variant module cannot be loaded without its provider (e.g. boost cannot be loaded without some PrgEnv)
  • Earlier, we had to load python before we could interrogate RSeQC
    • because RSeQC is a variant on ‘python’ (instead of ‘PrgEnv’)
ModulesReloaded Variants

[Figure: graph of the PrgEnv and compiler modules and the software libraries (and their dependencies) that rely on them. Legend: “normal” module, PrgEnv-provider module, PrgEnv-client module, default module, non-default module.]

Each programming environment provides the ‘PrgEnv’ attribute, which is required by the libraries.

The PrgEnv meta-modules conflict with each other, but the compilers do not.

ModulesReloaded DefaultChange
  • Changing default module versions may be disruptive to some users
  • To advertise the change, the modules system issues a warning
  • Example:
    • The default version of blast+ is planned to be changed on August 6.
    • Load the default blast+ module
    • Unload the blast+ module
    • Load blast+/2.2.26 (which is the default)

dmj@genepool04:~$ module load blast+

WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.

dmj@genepool04:~$ module unload blast+

WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.

dmj@genepool04:~$ module load blast+/2.2.26

dmj@genepool04:~$

  • The warning is only sent to users accessing the default without specifying a version.
NERSC Dotfiles – Your default Environment Pt 2
  • Default modules are loaded in the .bashrc/.tcshrc files
    • System files load ‘uge’, ‘nsg’, ‘jgitools’
      • uge adds the scheduler
      • jgitools puts /jgi/tools/bin into your PATH
    • .bashrc loads ‘usg-default-modules’
    • usg-default-modules autoloads:
      • PrgEnv-gnu
      • perl
      • python
      • oracle_client
      • mysql
    • Are any additional modules auto-loaded as prerequisites?
  • You can add your own ‘module load’ commands to .bashrc.ext / .tcshrc.ext
    • Do this with care – modules added in the default environment become somewhat infectious
NERSC Dotfiles – Your default Environment Pt 2
  • What happens if a user does the following in their .bashrc.ext file?

module load smrtanalysis

export PERL5LIB=$HOME/perl

export LD_LIBRARY_PATH=/house/groupdirs/randd/lib:$LD_LIBRARY_PATH

    • Is something wrong here?
    • Answer: PERL5LIB shouldn’t be replaced. Replacing it invalidates the effects of the smrtanalysis module. Instead, use:

export PERL5LIB=$HOME/perl:$PERL5LIB

  • What about this:

export PATH=/jgi/tools/bin:$PATH

    • Is there something wrong with this?
    • Answer: The jgitools module is loaded very early in the environment. The jgitools module already implements this functionality. The many things in /jgi/tools/bin may override other settings you want.
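
A slightly more defensive form of the "add, don’t replace" advice is sketched below. The starting value of PERL5LIB is a hypothetical stand-in for whatever a module may have set. The ${VAR:+...} expansion is standard shell syntax that adds the ':' separator only when the variable is already non-empty.

```bash
#!/bin/bash
# Sketch: add to a compound variable without discarding what modules put there.
# The starting value is a hypothetical stand-in for a module's setting.
PERL5LIB="/path/set/by/module/lib/perl5"

# Wrong: replaces the value, discarding the module's entry.
#   export PERL5LIB=$HOME/perl

# Better: prepend, adding ':' only when the variable is already non-empty.
export PERL5LIB="$HOME/perl${PERL5LIB:+:$PERL5LIB}"
echo "$PERL5LIB"
```

Skipping the separator when the variable is empty avoids a trailing ':', which in variables such as PATH is treated as an extra entry meaning the current directory.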
NERSC Dotfiles – Your default Environment Pt 2
  • Best Practices:
    • Do put your settings in a “genepool”-only section of .bashrc.ext / .tcshrc.ext

if [ "$NERSC_HOST" == "genepool" ]; then
    : # genepool-only settings go here
fi

    • Limit the number of modules you load by default; loading many can complicate handing off batch scripts later
    • Do not replicate module functionality
      • i.e. don’t set environment variables with paths into /usr/common directly
      • Only add to variables like PATH, LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB as these are commonly
Using Modules Interactively

Use modules precisely as we have been in the exercises

Modules are great for interactive use!

Using Modules in Batch Scripts

#!/bin/bash -l
#$ -l ram.c=10G
#$ -l h_rt=8:00:00

set -e

module purge
module load PrgEnv-gnu/4.6
module load uge
module load blast+/2.2.28
module load python/2.7.4

#…. Run your programs here ….

  • #!/bin/bash -l: ensures login environment is initialized
  • Lines starting with #$: UGE options
  • set -e: kill script if any commands give non-zero exit status
  • module purge, then module load: clear all the modules, and then reload all needed modules by version

Using Modules in Batch Scripts
  • Using this approach:
    • Your batch script will terminate if something goes wrong (non-zero exit status)
    • No extraneous modules will be loaded, so exactly the calculation you intend is run, with no surprises
    • Using the precise version numbers means your script will work even after new defaults are installed
    • Purging the modules first will allow your script to work in other users’ hands without requiring anybody to change their dotfiles.
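
The effect of "set -e" can be seen in isolation with the sketch below. The inner script is run in a child bash so that the demonstration itself keeps going. The "false" command is a stand-in for any step that fails, such as a module that does not load.

```bash
#!/bin/bash
# Sketch: "set -e" stops a script at the first command that returns non-zero.
# The inner script runs in a child bash so this demo can report the result.
output=$(bash -c '
    set -e
    echo "step 1: ok"
    false                 # stands in for a failed "module load" or program
    echo "step 2: never reached"
')
status=$?

echo "$output"
echo "exit status: $status"
```

Because the failing step ends the script with a non-zero status, the scheduler sees the job as failed instead of continuing with a broken environment.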
Using Modules in Production Pipelines
  • Consider creating a pipeline module
    • e.g. jigsaw/5.1
    • The pipeline module could be a pure ‘meta-module’ or point to its own relevant scripts (and still be a meta-module)
    • A meta-module purely loads other modulefiles
      • E.g., PrgEnv-gnu
    • A full-featured modulefile could:
      • Load other modulefiles
      • Add entries to PATH, PERL5LIB, other parts of the environment
Writing a meta-modulefile

A pure meta-module:

#%Module1.0
##
## Required internal variables
set name MyPipeline
set version 1.0

## List conflicting modules here
set mod_conflict [list $name]

## List prerequisite modules here
set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]

## Source the common modules code-base
source /usr/common/usg/Modules/include/usgModInclude.tcl

## Software-specific settings exported to user environment
setenv MYPIPELINE_VER $version

  • mod_conflict replaces the conflict keyword to trap and exit with status 1
  • mod_prereq_autoload is the list of modules to autoload
  • mod_prereq is the list of modules that must be loaded first. This sets up the automatic load/swap protections.
  • usgModInclude.tcl is the ModulesReloaded include code. It should be sourced before any environment manipulations.

Writing a meta-modulefile

A full-featured pipeline module:

#%Module1.0
##
## Required internal variables
set name MyPipeline
set version 1.0
set root {/path/to/my/group/stuff/$name/$version}

## List conflicting modules here
set mod_conflict [list $name]

## List prerequisite modules here
set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]

## Source the common modules code-base
source /usr/common/usg/Modules/include/usgModInclude.tcl

## Software-specific settings exported to user environment
setenv MYPIPELINE_VER $version
setenv MYPIPELINE_ROOT $root
prepend-path PATH $root/bin

  • root should evaluate to the filesystem path for your pipeline. The braces instruct TCL not to evaluate it immediately. The include code will do the evaluation and perform additional error checking.
  • Position all your environment manipulations after the include file.
  • Do set an environment variable for the version and root of your pipeline.

Using Pipeline Modules in Batch Scripts

#!/bin/bash -l
#$ -l ram.c=10G
#$ -l h_rt=8:00:00

set -e

module purge
module load PrgEnv-gnu/4.6
module load python/2.7.4

module use /path/to/my/groups/modulefiles
module load MyPipeline/1.0

#…. Run your programs here ….

  • #!/bin/bash -l: ensures login environment is initialized
  • Lines starting with #$: UGE options
  • set -e: kill script if any commands give non-zero exit status
  • module purge, then module load: clear all the modules, load any needed variant-provider modules
  • module use: add your modulefiles to MODULEPATH
  • module load MyPipeline/1.0: load your pipeline module

Best Practices - Dotfiles
  • If you make changes to compound environment variables, make sure to only add to them
    • PATH, LD_LIBRARY_PATH, PERL5LIB, PYTHONPATH (many more)
  • Do not replace modules functionality in your dotfiles:
    • Don’t add /jgi/tools/bin to PATH
    • Don’t add any absolute paths in /usr/common to your environment
  • Limit the number of default modules
    • Large numbers of default modules complicate giving scripts to others (they need to change their default environment to run your script)
    • Instead, set up convenience meta-modules or pipeline modules and load them as needed
Best Practices - Modules
  • Avoid embedding absolute paths in your scripts
    • Instead use the environment variables set in your modules
    • This reduces maintenance work on your script and centralizes the work to a single place – the modulefile
  • In production scripts, purge the modules and load them by-version
    • This ensures the script runs reproducibly
  • Unloading modules and re-loading is sometimes more reliable than swapping
    • ModulesReloaded, for example, can’t unload orphaned dependencies when swapping:

module swap PrgEnv-gnu PrgEnv-intel

module swap PrgEnv-intel PrgEnv-gnu

    • The above will leave the intel module loaded due to a bug in the underlying modules system (will investigate and fix in the future).
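
The first practice in this list (use the variables set by your modules rather than absolute paths) might look like the sketch below in a script. MYPIPELINE_ROOT is the variable set in the pipeline modulefile example. The script name run_pipeline and the helper pipeline_cmd are hypothetical.

```bash
#!/bin/bash
# Sketch: refer to software through the variable its module sets.
# MYPIPELINE_ROOT comes from the pipeline modulefile example;
# "run_pipeline" and pipeline_cmd are hypothetical names.
pipeline_cmd() {
    # ${VAR:?message} fails with the message if the module was not loaded.
    : "${MYPIPELINE_ROOT:?load the MyPipeline module first}"
    echo "$MYPIPELINE_ROOT/bin/run_pipeline"
}

# Normally set by "module load MyPipeline/1.0"; assigned here only for the demo.
MYPIPELINE_ROOT="/path/to/my/group/stuff/MyPipeline/1.0"

cmd=$(pipeline_cmd)
echo "would run: $cmd"
```

If the installation moves, only the modulefile changes. The script keeps working, and it fails with a clear message when the module has not been loaded.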
Best Practices - General
  • Logout (and back in again)
    • Seriously, environments do not age like a fine wine
    • With consistent use of modules, however, they should be more stable
More Information
  • The NERSC website has a great deal of information about this:
    • Genepool User Environment:
      • http://www.nersc.gov/users/computational-systems/genepool/user-environment/
    • Running CGI Scripts with Modules:
      • https://www.nersc.gov/users/computational-systems/genepool/user-environment/scriptenv-loading-modules-before-starting-a-script/
    • Using modules within Perl and Python:
      • https://www.nersc.gov/users/computational-systems/genepool/user-environment/working-with-modules-within-perl-and-python/
    • ModulesReloaded
      • Coming soon…