
FreeSurfing on the Supercomputer



Presentation Transcript


  1. FreeSurfing on the Supercomputer Malcolm Tobias mtobias@wustl.edu 314.362.1594 http://www.chpc.wustl.edu/

  2. What is a supercomputer? Then Now

  3. What do we have?
  - 168 “iDataPlex” compute nodes: 8 cores/node (without HyperThreading), each core 2.67 GHz, 24 GB RAM/node
  - 2 “login” nodes (also iDataPlex)
  - 7 “large SMP” compute nodes: 64 cores/node, each core 2.40 GHz, 256 GB RAM/node!
  - Very fast interconnect: InfiniBand QDR, 8 Gbit/s, latency ~140 ns
  - Small (4 TB) but very fast storage

  4. How to manage all the users and their jobs? Big jobs/small jobs, long jobs/short jobs. A queuing system (AKA resource manager): users specify the resources their jobs need and the steps to run the application in a 'batch file'. The scheduler picks when and where to run each job.

  5. Anatomy of a batch file
  #!/bin/sh
  #PBS OPTIONS
  STEPS FOR RUNNING YOUR APPLICATION

  6.
  #!/bin/sh
  # Give the job a name to help keep track of running jobs (optional)
  #PBS -N freesurfer
  # Specify the resources needed.
  # Even though we're only using 1 CPU, ask for all 8 to have exclusive
  # use of the node to get more accurate benchmarking data
  #PBS -l nodes=1:ppn=8:idataplex,walltime=48:00:00
  # Specify the default queue, not the SMP nodes
  #PBS -q dque

  7.
  # cd into the run directory and get the data we need:
  cd /scratch/mtobias
  mkdir freesurfer
  cd freesurfer
  # This is the directory we'll call the SUBJECTS_DIR
  export SUBJECTS_DIR=/scratch/mtobias/freesurfer
  cp /export/freesurfer/subjects/sample-001.mgz .
  cp /export/freesurfer/subjects/sample-002.mgz .
  # Make FreeSurfer happy
  export FREESURFER_HOME=/export/freesurfer
  export PATH=${FREESURFER_HOME}/bin:${PATH}
  export PATH=${FREESURFER_HOME}/mni/bin:${PATH}
  export PERL5LIB=${FREESURFER_HOME}/mni/lib/perl5/5.8.5/
  # Finally, run the command
  time recon-all -s ernie -i ./sample-001.mgz -i ./sample-002.mgz -all

  8. How to run a job
  3 commands to get started: qsub, qstat, and qdel
  - 'qsub BATCH_FILE' to submit a job
  - 'qstat' to see the status of all the current jobs
  - 'qdel JOBID' to delete a job
  When things go wrong, STDOUT and STDERR are redirected to files in the directory you submitted the job from. The files have the form JOB_NAME.{e|o}JOBID; substitute BATCH_FILE for JOB_NAME if you haven't defined a job name.
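The three commands above can be sketched as a short session. The scheduler calls are shown as comments (the job ID 12345 and the batch-file name freesurfer.pbs are hypothetical, for illustration only); the output-file naming follows the JOB_NAME.{e|o}JOBID pattern described in the slide:

```shell
# qsub freesurfer.pbs   -> submits the batch file; prints a job ID, e.g. 12345.login1
# qstat                 -> shows the state of all current jobs
# qdel 12345            -> deletes job 12345 if needed
# (scheduler commands are commented out so this sketch runs anywhere)

# When the job runs, STDOUT/STDERR are written to JOB_NAME.{o|e}JOBID:
JOBID=12345            # hypothetical job ID for illustration
JOBNAME=freesurfer     # from '#PBS -N freesurfer'
echo "${JOBNAME}.o${JOBID}"   # freesurfer.o12345  (STDOUT)
echo "${JOBNAME}.e${JOBID}"   # freesurfer.e12345  (STDERR)
```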

  9. How to get data on/off the system
  All access goes through the “login nodes”: login1.chpc.wustl.edu and login2.chpc.wustl.edu.
  All users have 2 work areas: home and scratch.
  - Home directories have a 5 GB quota.
  - Scratch space has no quotas but is shared by all users! Files should only be stored there temporarily and deleted after the job is finished.
  You move data into either area with the standard SSH tools: scp/sftp.
  BlueArc – COMING SOON!
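A minimal scp sketch of moving data through a login node. The username, file names, and result archive are hypothetical placeholders; the scp calls themselves are commented out so the sketch runs without network access:

```shell
# Hypothetical username and input file; substitute your own.
CHPC_USER=mtobias
LOCAL_FILE=sample-001.mgz
DEST="${CHPC_USER}@login1.chpc.wustl.edu:/scratch/${CHPC_USER}/freesurfer/"

# scp "$LOCAL_FILE" "$DEST"      # push input data to scratch before submitting
# scp "${DEST}results.tar.gz" .  # pull results back (and then delete them from scratch)

echo "$DEST"   # mtobias@login1.chpc.wustl.edu:/scratch/mtobias/freesurfer/
```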

  10. How fast is fast? We need a benchmark! If one doesn't exist, we need to create one. FreeSurfer ships with some data: subjects/sample-001.mgz subjects/sample-002.mgz What options should be standard? recon-all -s ernie -i ./sample-001.mgz -i ./sample-002.mgz -all

  11. How fast are the iDataPlex nodes? Best case (no other jobs, so Turbo Mode gives it an extra 'kick'): 1034 minutes (m). Realistic case (8 jobs filling a node): 1115 m. Can we do better? With HyperThreading (HT), I can run 16 jobs on a node in 1728 m. Without HT, it would have taken 2 x 1115 m = 2230 m to run 16 jobs, so a speed-up of ~23%.
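The ~23% figure can be re-derived from the timings above (time saved as a fraction of the non-HT total, rounded):

```shell
# Re-derive the HyperThreading speed-up from the benchmark timings.
WITHOUT_HT=$((2 * 1115))         # 16 jobs as two batches of 8: 2230 minutes
WITH_HT=1728                     # 16 jobs at once with HT enabled
SAVED=$((WITHOUT_HT - WITH_HT))  # 502 minutes saved
PCT=$(( (100 * SAVED + WITHOUT_HT / 2) / WITHOUT_HT ))  # rounded percentage
echo "${PCT}%"                   # 23%
```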

  12. Can we get even faster? If we can get the source code, we can try the Intel compilers, which are faster and optimize for our hardware. Will we encounter compile-time problems? How do we verify we're getting the correct answer? Can we use optimized libraries like the Intel Math Kernel Library?
