High Performance Computing John Zaitseff September 2014 High Performance Computing
High Performance Computing architecture Massively Parallel Distributed Computational Cluster • Many individual servers (“nodes”): dozens to thousands • Multiple processors per node: between 8 and 64 cores • Interconnected by fast networks • Almost always run Linux • In our case: Rocks Linux Distribution on top of CentOS 6.x The Trentino cluster Image credit: John Zaitseff, UNSW
High Performance Computing architecture [Diagram: the Internet, a Head Node, a Storage Node and an Internal Network Switch, with standalone Compute Nodes 1 to n and Chassis 1 to m holding Compute Nodes 1-1 to m-n]
The Newton cluster: newton.mech.unsw.edu.au • 10 × Dell R415 server nodes • Head node: newton • Compute nodes: newton01 to newton09 • 160 × AMD Opteron 4386 3.1GHz processor cores • Two physical processors per node • Eight CPU cores per processor • Only four floating-point units per processor • 320 GB of main memory (32 GB per node) • 12 TB of storage: 6 × 3 TB drives in RAID 6 • 1Gb Ethernet network interconnect http://cfdlab.unsw.wikispaces.net/ The Newton cluster Image credit: John Zaitseff, UNSW
The Trentino cluster: trentino.mech.unsw.edu.au • 16 × Dell R815 server nodes • Head node: trentino • Compute nodes: trentino01 to trentino15 • 1024 × AMD Opteron 6272 2.1GHz processor cores • Four physical processors per node • Sixteen CPU cores per processor • Only eight floating-point units per processor • 2048 GB of main memory (128 GB per node) • 30 TB of storage: 12 × 3 TB drives in RAID 6 • 4×1Gb Ethernet network interconnect http://cfdlab.unsw.wikispaces.net/ The back of the Trentino cluster Image credit: John Zaitseff, UNSW
The Leonardi cluster: leonardi.eng.unsw.edu.au • 7 × HP BladeSystem c7000 blade enclosures • 1 × HP ProLiant DL385 G7 server: leonardi • 56 × HP BL685c G7 compute nodes • Compute nodes: ec01b01-ec07b08 • 2944 × AMD Opteron 6174 2.2GHz processor cores and Opteron 6276 2.3GHz processor cores • Four physical processors per node • Twelve or sixteen CPU cores per processor • 5888 GB of main memory (96 or 128 GB per node) • 95 TB of storage: 60 × 2 TB drives in RAID 60 • 2×10Gb Ethernet network interconnect http://leonardi.unsw.wikispaces.net/ Nodes in the Leonardi cluster Image credit: John Zaitseff, UNSW
The Raijin cluster: raijin.nci.org.au • 3592 × Fujitsu blade server nodes • Multiple login nodes • Multiple management nodes • 57,472 × Intel Xeon E5-2670 2.60GHz processor cores • 160 TB of main memory • 10 PB of storage using the Lustre distributed file system • 56Gb (4 × 14Gb lanes) InfiniBand FDR network interconnect http://nci.org.au/nci-systems/national-facility/peak-system/raijin/ Image credit: National Computational Infrastructure
High Performance Computing architecture [Diagram: the Internet, a Head Node, a Storage Node and an Internal Network Switch, with standalone Compute Nodes 1 to n and Chassis 1 to m holding Compute Nodes 1-1 to m-n; annotated with the warning “Do not run your jobs here!”]
Connecting to an HPC system • Use the Secure Shell protocol (SSH) • Under Linux: ssh username@hpcsystemname • Under Windows: PuTTY (Start » All Programs » PuTTY » PuTTY) • Can install Cygwin: “that Linux feeling under Windows” • Command line prompt • Will look something like: z9693022@newton:~ $ • May be different in different systems; may be customised • Try it now: PuTTY, Host name newton.mech.unsw.edu.au • RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce • User name: your zID; Password: your zPass • To exit: exit
Simple Linux commands
• List files in a directory: ls [pathname ...]
  • [] indicates optional parameters, ... indicates one or more parameters
  • Italic fixed-width font indicates replaceable parameters
• To show the current directory: pwd
• To change directories: cd directory
  • ~ is the home directory
  • .. is the directory above the current one
  • ~user is the home directory of user user
• Try it now:
    cd ~z9693022/src/trader-7.6
    ls                  # List files in current directory
    cd src
    pwd; ls             # More than one command at a time!
    cd ..; pwd          # You don’t have to enter the comments...
Directories and files: paths and pathnames • Files and directories are organised into a hierarchical tree structure • The top of the tree is called the root directory (or simply root), and is denoted as / (slash) • The root directory contains directories, which in turn contain files and directories of their own:
Absolute pathnames • Any file or directory can be represented as an absolute pathname: • gives the full name of the file or directory • starts with the root “/” • lists each directory along the way • has a “/” to separate each path (or pathname) component • For example: the directory /share/apps/ansys/15.0
Relative pathnames
• Second way of denoting a file or directory (a pathname)
• Relative to the current working directory
• Does not start with the root directory “/”
• Path components are still separated with slashes “/”
• Current directory is denoted by “.” (dot)
• Going up a level is denoted by “..” (dot-dot)
• Often just contains a filename with no directories listed
• Examples, assuming the current directory is /home/z9693022/src/trader-7.6:
    README               → /home/z9693022/src/trader-7.6/README
    src/trader.c         → /home/z9693022/src/trader-7.6/src/trader.c
    ../trader-7.6.tar.xz → /home/z9693022/src/trader-7.6.tar.xz
    src/.././README      → /home/z9693022/src/trader-7.6/README
    ./README             → /home/z9693022/src/trader-7.6/README
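The resolution rules above can be checked by hand. The following is a minimal sketch that rebuilds a similar layout in a throwaway directory; the helper name resolve_dir and the temporary directory are assumptions made for this illustration and do not come from the slides.

```shell
#!/bin/bash
# Build a throwaway copy of the directory layout used in the examples.
# (Everything lives under a temporary directory; nothing touches /home.)
top=$(mktemp -d)
mkdir -p "$top/src/trader-7.6/src"
touch "$top/src/trader-7.6/README" "$top/src/trader-7.6.tar.xz"

cd "$top/src/trader-7.6"

# Hypothetical helper: resolve a relative directory name to an absolute
# one by changing into it in a subshell and printing the directory.
resolve_dir() { (cd "$1" && pwd -P); }

here=$(resolve_dir .)
up=$(resolve_dir ..)
roundtrip=$(resolve_dir src/../.)

echo "current:    $here"
echo "parent:     $up"
echo "src/../. -> $roundtrip"

# Four different relative names for the same README file all work.
same=no
ls README ./README src/.././README ../trader-7.6/README >/dev/null && same=yes
echo "all four names refer to README: $same"
```

Going down into src and straight back up with “..” ends in the starting directory, which is why src/.././README and README name the same file.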
Important directories • Home directory: /home/user (e.g., /home/z9693022) • Scratch directory for temporary files: /share/scratch/user (but not available on Newton!) • Binary directories for utility programs: • /bin: for essential utilities • /usr/bin: for other utilities and some applications • /usr/local/bin: for local utilities and applications • /home/user/bin: for your own utilities • On our clusters, applications: /share/apps • On our clusters, module files: /share/apps/Modules • Note synonyms: path, pathname, filename
More with pathnames
• To change directories: cd dir
• To change to your home directory: cd ~ or cd $HOME or cd (by itself)
• To get current working directory: pwd
• To show the directory tree structure: tree, tree -d (directories only)
• To view a file page by page: less filename, “q” to quit, “h” for help
• Try it now:
    cd /home/z9693022/src/trader-7.6
    tree -d
    less README
    less src/trader.c
    cd src; pwd
    less README
    less ../README      # Different from README!
Getting help
• Many commands have a myriad of command line options
• For a brief summary of command line options, try command --help
• For a full explanation, try man command
• For some commands, try pinfo command
• To search for a keyword in the manual: man -k keyword
• Remember, “Google is your friend”
• Try it now:
    ls --help
    cd --help           # Does this work?
    man ls              # See “See Also” section at end
    pinfo coreutils     # “q” to quit
    man less            # 1571 lines!
    man cd              # What is “BASH_BUILTINS”?
The Bourne Again (Bash) shell • Official manual page entry: Bash is an sh-compatible command language interpreter that executes commands read from the standard input or from a file. Bash also incorporates useful features from the Korn and C shells (ksh and csh). Bash is intended to be a conformant implementation of the Shell and Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1). Bash can be configured to be POSIX-conformant by default. • Interprets your typed commands and executes them • Just another Linux program: nothing special about it! • Started by the system when you log in • You can then start another shell, if you like (e.g., ksh, tcsh, even python) • You can start a subshell by running bash • To exit a subshell (or the main shell): exit
Some features of Bash • Powerful command line facilities (shortcuts): • Tab completion (press the TAB key to complete commands and pathnames, TAB TAB to list all possibilities) • Command line editing: try ↑ (Up-Arrow) to recall previous commands, CTRL-R (C-R or ^R) to search for previous commands, ← and → to move along current command line • A full programming and scripting language: • Variables and arrays • Loops (for; while; until), control statements (if ... then ... else; case) • Functions and coprocesses • Text processing (“expansion” and “parameter substitution”) • Simple arithmetic calculations • Input/output redirection (e.g., redirect output to different files) • Much, much more! (The man page runs to over 5,300 lines)
Trying out some features of Bash • Try it now: • cd ~z9693022/src/trader-7.6/src • Type “less”, then space, but do not press ENTER yet • Press TAB once: nothing appears • Press TAB a second time: all relevant completions appear • Type “f”, then press TAB: the filename is completed to “fileio.” • Press TAB TAB again: two files are listed • Type “h” to select the second file, then press ENTER (and “q” to quit) • Try it now: • Press CTRL-R, then type “ls” (but do not press ENTER): previous commands with “ls” in them are listed • Press CTRL-R again a few times: will even list “pinfo coreutils” • Press ENTER when you get to the command you wish to execute • Press CTRL-C if you do not wish to execute any command
Listing files and directories • Already know the ls command: List directory contents • In full: ls [options] [pathname ...] • Some options: • “-a” for all files (including those starting with “.”) • “-l” for long (detailed) listing • Options sometimes can be combined: “-alF” • Try it now: ls -laF or dir (an alias to “ls -laF”); ll (“ls -lF”) • Example of a line in a long listing: -rw-r--r-- 1 z9693022 unsw 1266 May 24 07:59 README • The columns of information are: file permissions, number of links (usually 1 for files, 2 or more for directories), file owner, group owner, size in bytes (here, 1266), date last modified, the actual filename (README), with perhaps a trailing “*” for executable files and “/” for directories.
File and directory patterns
• The Bash shell interprets certain characters in the command line by replacing them with matching pathnames
• Called pathname expansion, pattern matching, wildcards or globbing
• For existing pathnames: “*” matches any string, “?” matches any single character, “[...]” matches any one of the enclosed characters
• Try it now:
    cd ~z9693022/src/trader-7.6/src; echo 1 2 3
    echo *c             # All filenames ending in “c”: “.” is not special
    echo ????.c         # All filenames six characters long (4 + “.c”)
    echo M*m            # All filenames starting with “M” and ending with “m”
    echo [it]*          # All filenames starting with either “i” or “t”
    echo ../lib/uni*    # All filenames in ../lib starting with “uni”
    echo ../*/*.c
More file and directory patterns
• Glob patterns “*”, “?” and “[...]” only match existing pathnames
• To generate names whether or not they exist: “{alt1,alt2,...}” lists alternatives, “{n..m}” lists all numbers between n and m, “{n..m..s}” in steps of s
• Technically called brace expansion
• Try it now:
    ls test-*                       # “No such file or directory”
    echo test-*                     # What happens?
    echo test-{one,two,three}
    echo newdir/{one,two,three}
    echo test-{1..100}
    echo test-{001..100}            # Zero-padding
    echo test-{1..100..3}           # By steps of three
    echo test-{100..1..-3}          # By steps of negative three
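The difference between globbing and brace expansion is easiest to see side by side. The sketch below runs in a temporary directory and assumes Bash 4 or later with default shell options (in particular, nullglob and failglob switched off); the filenames are made up for the illustration.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work"

# No files exist yet, so the glob has nothing to match and Bash
# (with default options) passes the pattern through unchanged.
unmatched=$(echo test-*)
echo "glob before: $unmatched"

# Brace expansion generates names whether or not they exist,
# so it can be used to create the files in the first place.
touch test-{one,two,three}
matched=$(echo test-*)
echo "glob after:  $matched"

# Numeric ranges, with a step and with zero-padding.
steps=$(echo {1..10..3})
padded=$(echo {08..10})
echo "steps:       $steps"
echo "padded:      $padded"
```

Note that the glob lists matches in sorted order, while brace expansion keeps the order in which the alternatives were written.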
Naming files and directories • Linux allows any characters in filenames except “/” and the NUL byte • You may create filenames with “weird” characters in them: • spaces and tabs • starting with “-”: conflicts with command line options • question marks “?”, asterisks “*”, brackets and braces • other characters with special meanings: “!”, “$”, “&”, “#”, “"”, etc. • Just because you can does not mean you should! • To match such files: use the glob characters “*” and “?” • Linux file systems are case-sensitive: README.TXT is different from readme.txt, which is different from Readme.txt and ReadMe.txt! • File type suffixes (e.g., “.txt”) are optional but recommended • Filenames starting with “.” are usually hidden from globs and ls output. • Recommendation: Use “a” to “z”, “A” to “Z”, “0” to “9”, “-”, “_” and “.” only.
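If you do end up with awkward names, a few standard techniques help. This sketch is not from the slides: it shows quoting, the conventional “--” end-of-options marker, and a leading “./”, all applied to made-up filenames in a temporary directory.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work"

# Create two awkward names: one with a space, one starting with "-".
# "--" marks the end of options, so "-n" is treated as a filename.
touch -- "my file.txt" "-n"

# Quoting keeps a name containing a space together as one argument.
quoted=no
ls -l "my file.txt" >/dev/null && quoted=ok

# A leading "./" also stops "-n" from being read as an option.
rm ./-n
rm -- "my file.txt"

remaining=$(ls -A | wc -l)
echo "quoted lookup: $quoted; files remaining: $remaining"
```

Sticking to the recommended character set avoids the need for any of this.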
Managing directories
• To create a directory: mkdir dir...
• To create parent directories as well: mkdir -p dir...
• To remove an empty directory: rmdir dir...
• Try it now:
    cd ~; ls
    mkdir gsoe9400/dir{1,2,3}       # Why does this fail?
    mkdir -p gsoe9400/dir{1,2,3,99} gsoe9400/x
    ls gsoe9400
    rmdir gsoe9400/dir?
    ls gsoe9400                     # Should list dir99 and x only
    rmdir gsoe9400/*                # Be careful...
Managing files • To output the contents of one or more files: cat filename... • To view one or more files page by page: less filename... • To copy one file: cp source destination • To copy one or more files to a directory: cp filename... dir • To preserve the “last modified” time-stamp: cp -p • To copy recursively: cp -pr source destination • To move one or more files to a different directory: mv filename... dir • To rename a file: mv oldname newname • To remove files: rm filename... • Recommendation: use “ls filename...” before rm or mv: what happens if you accidentally type “rm *”? or “rm * .c”? (note the space!)
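The recommendation above can be tried safely in a scratch directory. This sketch uses echo rather than ls for the preview (an assumption of this example, chosen so that the stray “.c” does not produce an error); the three filenames are invented.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work"
touch a.c b.c notes.txt

# Preview what each pattern expands to before handing it to rm.
intended=$(echo *.c)
typo=$(echo * .c)
echo "rm *.c   would remove: $intended"
echo "rm * .c  would remove: $typo"

# Only run the real command once the preview looks right.
rm *.c
left=$(ls)
echo "left behind: $left"
```

With the stray space, “*” on its own matches every file in the directory, so notes.txt would have been deleted as well.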
Managing files and directories, continued
• To copy whole directory trees: cp -pr filename... destination
• To copy to and from another Linux system (e.g., from Leonardi to Trentino), use Secure Copy: scp [-p -r] source... destination
• Either source or destination (but not both) can contain a remote system identifier followed by a colon: [user@]system:
• Can also use rsync or insync: insync [-d] source destination
• Examples:
    cp -pr ~z9693022/src/trader-7.6 .
    scp -p ~/file1.txt leonardi:file2.txt
    scp -p john@zap.org.au:src/README .
    mkdir dir1; insync ~/orig dir1
    insync /share/scratch/$USER/data1 $HOME/data1
    insync leonardi:/share/scratch/$USER/data2 .
Managing files and directories, continued
• Try it now:
    cd ~/gsoe9400
    cp -pr ~z9693022/src/trader-7.6 .; ls
    cd trader-7.6; pwd
    cat build-aux/bootstrap
    ls */*.c
    rm */*.c; ls */*.c              # What is the output of ls?
    insync ~z9693022/src/trader-7.6 .
    mkdir ../new; cp src/trader.c ../new
    cd ../new; ls
    mv trader.c new.c; rm new.c
    cp -p ../trader-7.6/src/trader.* .
    cp trader.c new.c
    ls -l trader.c new.c            # What is the difference between these files?
Transferring files • To copy files to another Linux system: use scp, rsync or insync • To copy files to and from a Windows machine: use WinSCP or scp, rsync or insync under Cygwin • Try it now: • Start WinSCP (Start » All Programs » WinSCP » WinSCP) • Host name newton.mech.unsw.edu.au • RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce • User name: your zID; Password: your zPass • Copy ~/gsoe9400/new/new.c to the Windows desktop • Rename it to newnew.c (using the usual Windows right-click or F2) • Copy it back • Under PuTTY: ls newnew.c
More Linux commands • What machine am I on? hostname • What is the date and time? date • Who is logged in? who • But who is user z1234567? finger [username...] • What is the user name for someone? finger part-of-name • Which files contain a particular string? grep 'pattern' filename... • What is the difference between two files? diff [-u] file1 file2 • How do I rename multiple files at once? rename or prename • Where is a file named filename? find dir... -name filename • How big is a file or directory? du -h [filename...] • How much space is available in a directory? df -h [dir...] • How much disk quota do I have? quota -s • “Blocks” is how many disk blocks you are using, in chunks of 1 kB • On Newton: “limit” is 10240M = 10 GB
Redirecting input and output • The terminal is treated as just another file (/dev/tty); use CTRL-D to signify the end of file • Other special files: /dev/null (an empty file), /dev/zero (an infinite number of binary zeros—can use up your quota in a hurry!) • Input and output from a program can be redirected to a file or even piped to another program • To redirect output to filename, use “>filename” • To append output to filename, use “>>filename” • To redirect input from filename, use “<filename” • To connect the output from one program to the input of another (pipes), use “program1|program2” • Multiple pipes are allowed: “program1|program2|...|programn” • Many utility programs are designed to be used in this way, as filters • Output can be substituted into a command line: $(commandline)
Redirecting input and output, continued
• Try it now:
    cd ~/gsoe9400/trader-7.6
    ls > ../dir-list1
    cat ../dir-list1
    cat ../dir-list1 | wc -l              # How many lines in ../dir-list1?
    ls ~/gsoe9400/trader-7.6 | wc -l      # Same as above
    rm ../dir-list1
    ls -l | grep May                      # How many files were last modified in May?
    ls -l | grep May | sort -nk5          # Same, but sort by file size (5th field)
    who | awk '{print $1}'                # Just list first field of “who” output
    finger $(who | awk '{print $1}')      # Full details of who is logged in
    finger $(who | awk '{print $1}') | less   # One page at a time
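The redirection operators and pipes described above can also be tried without touching any real files. The following is a small sketch in a temporary directory; the file fruit.txt and its contents are invented for the example.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work"

printf 'pear\napple\n' >fruit.txt     # ">" creates or overwrites the file
printf 'fig\n' >>fruit.txt            # ">>" appends to it

lines=$(wc -l <fruit.txt)             # "<" reads input from the file
first=$(sort fruit.txt | head -n 1)   # a pipe through two filters
echo "fruit.txt has $lines lines; first in sorted order: $first"

# Output sent to /dev/null is simply discarded.
echo "this is thrown away" >/dev/null
```

The two $(...) substitutions capture each command's output in a variable instead of printing it to the terminal.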
Simple scripting
• Shell scripts are just files containing a list of commands to be executed
• First line (“magic identifier”) must be #!/bin/bash
• Comments are introduced with “#”
• The script file must be made executable: chmod a+x filename
• Variables:
  • To set a variable, use varname=value (no spaces!)
  • To use a variable, use $varname or ${varname}
  • Variable names start with a letter, may contain letters, numbers and “_”
  • Variable names are case-sensitive (as with most things Linux)
• Functions (parameters are accessed using $1, $2, ...):
    funcname() {
        body of function
    }
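As a minimal sketch of variables and functions working together, the script below recomputes the Newton core count quoted earlier (10 nodes of 16 cores each). The variable and function names are chosen for this example only.

```shell
#!/bin/bash
# Set variables: no spaces around "=".
cluster=newton
nodes=10
cores_per_node=16

# A function; its parameters arrive as $1, $2, ...
total_cores() {
    echo $(( $1 * $2 ))
}

# Capture the function's output with $(...).
total=$(total_cores "$nodes" "$cores_per_node")
echo "${cluster} has ${total} cores"
```

Running it prints “newton has 160 cores”.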
Simple scripting, continued
• For loops:
    for varname in list...; do
        process using ${varname}
    done
• Control statements (multiple “elif” allowed; “elif” and “else” clauses are optional):
    if [ comparison ]; then
        if-true statements
    elif [ second-comparison ]; then
        if-second-true statements
    else
        if-false statements
    fi
• Example of comparisons: string1 = string2 (is equal)
• See the manual page for test (“man test”) for more information
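A concrete instance of the for and if templates above might look like the following sketch, which loops over the three UNSW cluster names and reports each one's network interconnect as listed on the hardware slides. The variable names are arbitrary.

```shell
#!/bin/bash
summary=""
for name in newton trentino leonardi; do
    # String comparison with "=": note the spaces inside [ ... ].
    if [ "$name" = newton ]; then
        kind="1Gb Ethernet"
    elif [ "$name" = trentino ]; then
        kind="4x1Gb Ethernet"
    else
        kind="2x10Gb Ethernet"
    fi
    echo "$name: $kind"
    summary="$summary$name "
done
```

Quoting "$name" inside the brackets keeps the comparison valid even if the variable is empty or contains spaces.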
Simple scripting, continued
• While loops:
    while [ comparison ]; do
        while-true statements
    done
• Until loops:
    until [ comparison ]; do
        while-false statements
    done
• Many, many other programming features available!
• Read the manual page: man bash
• Some books:
  • Cameron Newham, Learning the bash Shell, 3rd Edition, O’Reilly Media, March 2005. ISBN 9780596009656, 9780596158965
  • William E. Shotts Jr., The Linux Command Line, No Starch Press, January 2012. ISBN 9781593273897, 9781593274269
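The while and until forms shown on this slide differ only in the sense of the test. A short sketch, using the numeric comparisons -le and -eq documented in “man test” and made-up variable names:

```shell
#!/bin/bash
# while: repeat as long as the comparison is true.
n=1
sum=0
while [ "$n" -le 5 ]; do
    sum=$(( sum + n ))
    n=$(( n + 1 ))
done
echo "1 + 2 + ... + 5 = $sum"

# until: repeat as long as the comparison is false.
countdown=3
until [ "$countdown" -eq 0 ]; do
    echo "countdown: $countdown"
    countdown=$(( countdown - 1 ))
done
```

The first loop adds up the numbers 1 to 5; the second counts down from 3 and stops once the variable reaches zero.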
Editing files under Linux • Use an editor to edit text files • Many choices, leading to “religious wars”! • Some options: GNU Emacs, Vim, Nano • Nano is very simple to use: nano filename • CTRL-X to exit (you will be asked to save any changes) • GNU Emacs and Vim are highly customisable and programmable • For example, see the file ~z9693022/.emacs • Debra Cameron et al., Learning GNU Emacs, 3rd Edition, O’Reilly Media, December 2004. ISBN 9780596006488, 9780596104184 • Arnold Robbins et al., Learning the vi and Vim Editors, 7th Edition, O’Reilly Media, July 2008. ISBN 9780596529833, 9780596159351 • Try it now: cd ~/gsoe9400; nano script1
Creating a simple script file
• Try it now, continued: Enter the following text:
    #!/bin/bash
    # How much disk quota am I using?
    # (We want only the last line of "quota" output:
    # use the "tail" utility)
    blocks_used=$(quota | tail -n 1 | awk '{print $1}')
    blocks_limit=$(quota | tail -n 1 | awk '{print $3}')
    percent=$(( ${blocks_used} * 100 / ${blocks_limit} ))
    echo "I am using ${blocks_used} blocks (${percent}%)"
• Save the file and exit the editor, then:
    chmod a+x ./script1
    ./script1           # Execute the script! (Note the use of “./”)
Creating a script with loops
• Try it now:
  • Create and run the file script2, containing the following. What is the output? (Hint: remember “chmod a+x ./script2; ./script2”)
    #!/bin/bash
    module load matlab/2014a
    for n in {01..10}; do
        echo "n = $n;" >script${n}.m
        echo "sqrtn = sqrt(n);" >>script${n}.m
        echo "save('data${n}.txt', 'sqrtn', '-ascii');" \
            >>script${n}.m
        echo "quit" >>script${n}.m
        matlab -nojvm -r script${n} >/dev/null
        cat data${n}.txt
    done
Applications on the cluster • Applications are managed using the module system • Applications are stored in /share/apps • Module files are stored in /share/apps/Modules • Module files set shell environment variables such as PATH • PATH controls where applications are searched (the search path) • Try it now: echo $PATH • To see all available applications: module avail • To see currently loaded applications: module list • To load an application: module load application[/version] • To unload an application: module unload application[/version]
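To see how the search path works without the module system, the sketch below creates a throwaway program and puts its directory at the front of PATH by hand. This only approximates one effect of loading a module; the program name myapp and the temporary directory are invented for the illustration.

```shell
#!/bin/bash
# Print each directory on the search path on its own line.
echo "$PATH" | tr ':' '\n'

# Create a throwaway "application" in a temporary directory.
appdir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from myapp"\n' >"$appdir/myapp"
chmod a+x "$appdir/myapp"

# Put that directory first on the search path (by hand, for this
# sketch; on the cluster a module file would make such changes).
PATH="$appdir:$PATH"

found=$(command -v myapp)
output=$(myapp)
echo "myapp found at: $found"
echo "myapp says:     $output"
```

Because the new directory comes first, the shell finds myapp there before looking in any of the other directories.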
Submitting jobs to the cluster • So far, everything has been run on the head node: a very bad idea! • To submit a job to the cluster compute nodes: • Create a shell script file as per normal • Add #PBS directives as required directly after “#!/bin/bash” • Add “cd $PBS_O_WORKDIR” • Execute qsub ./scriptfile • Wait for the job to run, checking its status as required • Common #PBS directives (“man qsub” for full details): • #PBS -N scriptname — Set a name for the script • #PBS -M email — Send notifications to an email address • #PBS -m abe — What notifications to send • #PBS -l walltime=hh:mm:ss — How much time is required • #PBS -l vmem=sizegb — How much memory is required (GB) • #PBS -l nodes=1:ppn=n — Request n processors on one node • #PBS -q queuename — Which queue to submit to
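Putting the steps above together, a complete job script might look like the following sketch. The job name, time and memory values are placeholders, and the fallback to the current directory is an addition for this example so that the script can also be run directly for testing.

```shell
#!/bin/bash
#PBS -N example
#PBS -m abe
#PBS -l walltime=00:05:00
#PBS -l vmem=1gb
#PBS -l nodes=1:ppn=1

# Under PBS, PBS_O_WORKDIR is the directory qsub was run from.
# Falling back to $PWD (an assumption of this sketch, not part of
# the slides) lets the script run outside the queueing system too.
cd "${PBS_O_WORKDIR:-$PWD}"

# The real work goes here; this sketch just reports where it ran.
node=$(hostname)
started=$(date)
echo "Job running on ${node}, started ${started}"
```

To the shell, the #PBS lines are ordinary comments; only qsub reads them, which is why the same file works both as a job script and as a plain script.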
Checking your job status • Submit your jobs using “qsub” • You will be given a job number in the form jobnumber.systemname • Check job status: qstat [jobnumber] • Another way: showq • Yet another way: pestat or pestat | less -S • Use ← and → keys to scroll left and right (or expand your terminal!) • Show which nodes are reserved: showres -n | less -S • Get overall information about the cluster: visit http://systemname/ganglia/ • e.g., http://newton.mech.unsw.edu.au/ganglia/ • Currently only available within UNSW • Try it now: view the Ganglia page for the Newton cluster.
Managing your jobs • To see which nodes exist on the cluster: rocks list host or pestat • To see jobs belonging to you: qstat | grep $USER • To see when your job will start: showstart jobnumber • For more detailed information: checkjob jobnumber • To delete a queued job (whether running or not): qdel jobnumber... • To place a job on hold: qhold jobnumber... • To release a job currently on hold: qrls jobnumber... • To rerun a job (kill it and then restart it): qrerun jobnumber... • To move a job from one queue to another: qmove destqueue jobnumber...
Submitting and checking a job
• Try it now:
  • Create and change to the directory ~/gsoe9400/job1:
    mkdir ~/gsoe9400/job1; cd ~/gsoe9400/job1
  • Copy the previously created script file script2:
    cp ../script2 job1
  • Edit the file job1 and add the following lines just after “#!/bin/bash”:
    #PBS -N job1
    #PBS -M J.Zaitseff@unsw.edu.au
    #PBS -m abe
    #PBS -l walltime=00:10:00
    #PBS -l vmem=2gb
    #PBS -l nodes=1:ppn=1
    cd $PBS_O_WORKDIR
    (Do not replace the email address: it is used to assess you for this class!)
  • Submit the script:
    qsub ./job1
Conclusion You have begun your journey to using High Performance Computing clusters effectively. Well done! John Zaitseff J.Zaitseff@unsw.edu.au Available for consultations on Tuesdays 9:30am–4pm by appointment only. http://www.engineering.unsw.edu.au/hpc Image credit: John Zaitseff, UNSW