
MODIS Computing Cluster: An Introduction


Presentation Transcript


  1. MODIS Computing Cluster: An Introduction Liam Gumley Space Science and Engineering Center University of Wisconsin-Madison Mar. 10, 2005

  2. This briefing describes the design and operation of the MODIS compute cluster:
     - Hardware Description
     - System Software and User Filesystems
     - MODIS Batch Processing

  3. The current system (SGI Origin2000) was purchased in January 1998 and subsequently upgraded through March 2005:
     - 16 CPUs (250 MHz)
     - 4 GB RAM
     - 1.4 TB disk (70 drives)
     - SGI Irix 6.5
     - MIPS Pro Fortran/C compilers
     - No maintenance contract since Feb. 2004!

  4. The new system (“redback cluster”) was sized to exceed Origin performance by at least 4x:
     - 1 head node: Sun V40z [4 x Opteron 2.2 GHz CPUs, 8 GB RAM, 6 x 73 GB disk]
     - 8 compute nodes: Sun V20z [2 x Opteron 2.2 GHz CPUs, 2 GB RAM, 2 x 73 GB disk]
     - Storage: 5.6 TB RAID [16 x 400 GB SATA disks, 2 Gbps Fibre Channel interface]
     - Network: Gigabit Ethernet [Dell PowerConnect 24-port switch]

  5. Why “redback”?

  6. What is the difference between the head node and the compute nodes?
     - Head node: used for editing, compiling, visualizing, and submitting batch jobs
     - Compute nodes: used for running batch jobs
     [Diagram: compute nodes and head node on a private Gigabit network; head node connected to the SSEC network]
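
     A quick way to see this split in practice (a sketch, assuming the LSF setup described on slide 7; the short queue is taken from slide 15):
     $ hostname                  # interactive commands run on the head node
     $ bsub -q short hostname    # the same command submitted via LSF runs on a compute node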

  7. System software includes all the applications we normally use on origin:
     - Operating System: Rocks 3.3.0 (based on Red Hat Linux)
     - Compilers: Portland Group v5.2 (pgf90, pgcc)
     - Scientific Analysis: IDL v6.1, Matlab v7.0, ENVI 4.1
     - Visualization: McIDAS
     - Servers: ADDE
     - Batch Job Manager: LSF (e.g., bsub, bjobs, bkill)
     - Utilities: nedit, ncftp suite
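
     As a sanity check of the Fortran toolchain, a minimal sketch (hello.f90 is an illustrative file name, not part of the installed code):
     $ pgf90 -V                    # report the Portland Group compiler version
     $ pgf90 -o hello hello.f90    # compile a simple Fortran 90 program
     $ ./hello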

  8. Filesystems look the same on all nodes:
     - Home directory: /home/username (note: 2 GB quota!)
     - Scratch directory: /scratch/username (note: /scratch is always local)
     - Data directories on the redback RAID (5.6 TB): /modisnfs1/MODIS, /modisnfs2/MODIS, /modisnfs3/MODIS
     - Data directories on the falcon RAID (3.5 TB): /modisnfs12/MODIS, /modisnfs13/MODIS
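
     To check where you stand against the 2 GB quota and how much space is free, a sketch using standard Linux tools:
     $ quota -s                  # home-directory quota usage in human-readable units
     $ df -h /scratch            # free space on the node-local scratch disk
     $ du -sh /home/username     # total size of your home directory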

  9. Files from origin have already been copied to the redback RAID. These directories on redback are mirrored from origin every night:
     /modisnfs2/MODIS/origin/modishome
     /modisnfs2/MODIS/origin/modisnfs1
     /modisnfs2/MODIS/origin/modisnfs2
     /modisnfs2/MODIS/origin/modisnfs3
     /modisnfs2/MODIS/origin/modisnfs4
     Also remember that the following filesystems on redback are NFS mounted from falcon:
     /modisnfs12/MODIS
     /modisnfs13/MODIS
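
     The briefing does not say how the nightly mirror is implemented; a plausible sketch using rsync (the source host and path here are illustrative):
     # -a preserves permissions and timestamps; --delete drops files removed on origin
     $ rsync -a --delete origin:/modishome/ /modisnfs2/MODIS/origin/modishome/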

  10. UW MODIS Atmosphere Collection 5 code has been installed and tested.
     To check out your local copy from CVS:
     $ cd $HOME
     $ cvs checkout OPS (takes a few minutes)
     To set up the MODIS toolkit environment:
     $ . $HOME/OPS/MODIS/Setup.ksh (bash/ksh)
     % source $HOME/OPS/MODIS/Setup.csh (tcsh/csh)
     Installed Collection 5 algorithms include:
     - MOD_PRDS (destriping)
     - MOD_PR35 (cloud mask)
     - MOD_PR07 (atmospheric profiles)
     - MOD_PR06CR, MOD_PR06CT (cloud top properties)
     - MOD_PRCSRFM, MOD_PRCSRG (clear sky radiance)
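
     To avoid retyping the setup line every session, it can be sourced from your shell startup file (a suggestion, assuming a ksh/bash login shell as on slide 18):
     # append the setup line so every login shell gets the MODIS toolkit environment
     $ echo '. $HOME/OPS/MODIS/Setup.ksh' >> ~/.profile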

  11. New makefiles, scripts, and ancillary data are available to make life easier.
     Each algorithm has a UW-specific makefile, e.g.:
     $ cd ~/OPS/MODIS/MOD_PR35/src
     $ more MOD_PR35.mk_linux
     $ make -f MOD_PR35.mk_linux
     New scripts are available in OPS/MODIS/scripts to:
     a) run an algorithm for one input granule (run*.csh)
     b) run an algorithm for multiple granules (modis*.ksh)
     c) find ancillary data files (get*.ksh) according to DAAC production rules
     Ancillary data are available back to Jan. 2000 and updated nightly in /modisnfs1/MODIS/gumley/ancillary
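
     To see which scripts match each of the patterns above (the patterns come from the slide; exact file names may differ):
     $ cd ~/OPS/MODIS/scripts
     $ ls run*.csh       # a) single-granule drivers
     $ ls modis*.ksh     # b) multi-granule drivers
     $ ls get*.ksh       # c) ancillary-data finders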

  12. Local scratch disk is available on the head node and all compute nodes.
     Every node has a local scratch disk, e.g.:
     $ cd /scratch/username
     The head node has 200 GB of scratch; compute nodes have 70 GB.
     Think of it in the same way as /modisnfs2 on origin.
     Files larger than 100 KB are removed after 7 days.
     To see how scratch disk can be used for your own batch jobs, check this sample script:
     $ cd ~/OPS/MODIS/scripts
     $ cat sample.scr
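
     The contents of sample.scr are not reproduced here; a sketch of the usual stage-in/compute/stage-out pattern for scratch (all file names and the processing step are illustrative):
     #!/bin/ksh
     # Work in node-local scratch so I/O stays off the shared NFS data disks.
     WORKDIR=/scratch/$USER/$$          # per-job directory ($$ = process ID)
     mkdir -p $WORKDIR && cd $WORKDIR
     cp /modisnfs1/MODIS/input.hdf .    # stage input in (illustrative path)
     run_algorithm input.hdf            # illustrative processing step
     cp output.hdf /modisnfs2/MODIS/    # stage results back to shared disk
     cd / && rm -rf $WORKDIR            # clean up scratch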

  13. The modis_setup.ksh script is used to set up batch processing scripts:
     $ . ~/OPS/MODIS/Setup.ksh
     $ modis_setup.ksh
     Usage: modis_setup.ksh SAT PRODUCT IMGDIR GEODIR MSKDIR DATE1 DATE2 TIME1 TIME2
     where
       SAT     is the satellite name (terra or aqua)
       PRODUCT is the MODIS product (MOD35, MOD07, MOD06, MODDS, MODCS)
       IMGDIR  is the directory for the MODIS Level-1B 1KM files
       GEODIR  is the directory for the corresponding geolocation files
       MSKDIR  is the directory for the corresponding cloud mask files (set to MISSING if running MOD35)
       DATE1   is the start date (YYYYDDD)
       DATE2   is the end date (YYYYDDD)
       TIME1   is the start time (HHMM)
       TIME2   is the end time (HHMM)
     Note: MOD35, MOD07, MOD06, MODCS all include destriping

  14. Here’s an example of running modis_setup.ksh:
     $ modis_setup.ksh aqua MOD35 \
         /modisnfs2/MODIS/gumley/tobin/20020906_modis \
         /modisnfs2/MODIS/gumley/tobin/20020906_modis \
         MISSING \
         2002249 2002249 0000 0030
     2002249
     2002249.0000 2002249.0005 2002249.0010 2002249.0015 2002249.0020 2002249.0025 2002249.0030
     $ ls
     MYD35.A2002249.0000.scr  MYD35.A2002249.0010.scr  MYD35.A2002249.0020.scr  MYD35.A2002249.0030.scr
     MYD35.A2002249.0005.scr  MYD35.A2002249.0015.scr  MYD35.A2002249.0025.scr

  15. Here’s an example of running modis_submit.ksh:
     $ modis_submit.ksh MYD35
     Job <7948> is submitted to queue <short>.
     Job <7949> is submitted to queue <short>.
     Job <7950> is submitted to queue <short>.
     Job <7951> is submitted to queue <short>.
     Job <7952> is submitted to queue <short>.
     Job <7953> is submitted to queue <short>.
     Job <7954> is submitted to queue <short>.

  16. Here’s how to monitor and control your jobs:
     bjobs                   lists all jobs owned by you
     bjobs -r                lists only the running jobs
     bjobs -u all            lists all the jobs owned by everyone
     bsub < script           submits a job script to the default queue
     bsub -q short < script  submits a job script to the short queue
     bkill jobid             kills a job
     bkill 0                 kills all jobs (pending and running)
     bqueues                 lists the status of all job queues
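
     modis_submit.ksh handles submission for you, but the same effect can be sketched with a plain loop over the scripts generated on slide 14 (an illustration, not the script's actual contents):
     # submit every MYD35 script in the current directory to the short queue
     $ for f in MYD35.*.scr; do bsub -q short < $f; done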

  17. Benchmarks show about a 4-5x speed increase over origin (optimized Collection 5 algorithms):
     1 granule of Aqua MODIS (daytime, no destriping, 1 CPU):
       MOD35: 1 min 22 sec
       MOD07: 1 min 10 sec
     288 granules of Aqua MODIS (with destriping, 16 CPUs):
       MOD35: 1 hr 15 min
       MOD07: 1 hr 10 min

  18. Some UNIX tips to make life easier:
     - Use less instead of more (allows page up and down): $ less LogStatus
     - Use autocomplete (the Tab key)
     - Use rsync to synchronize files and directories (a sketch follows this list)
     - Use nedit to edit files instead of vi
     - Change your default shell if you wish, using chsh:
       $ chsh
       Changing shell for gumley.
       Password:
       New shell [/bin/tcsh]: /bin/ksh
       Shell changed.
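
     A minimal rsync example to go with the tip above (the source and destination paths are illustrative):
     # -a preserves permissions and timestamps; -v lists each file as it is copied
     $ rsync -av ~/OPS/MODIS/scripts/ /scratch/username/scripts/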
