
Experiences deploying Clusterfinder on the grid



Presentation Transcript


  1. Experiences deploying Clusterfinder on the grid Arthur Carlson (MPE) 7th AstroGrid-D Meeting TUM, 11th-12th June 2007

  2. Experiences deploying Clusterfinder on the grid
  • What is the deployment problem?
  • A prototype solution using “grid-modules” and “environments”
  • Status and conclusions

  3. Deployment is when ...

  4. Deployment is when ... each of many users can (build and) run each of many applications on each of many hosts.

  5. Deployment is when ... each of many users can (build and) run each of many applications on each of many hosts. “each” = >90%; “many” = >10

  6. Deployment is when ... each of many users can (build and) run each of many applications on each of many hosts. (“each” = >90%, “many” = >10) Each of the three brings its own obstacles:
  • users: certificates/password files, VOs (update of grid-mapfile, sharing software), firewalls
  • applications: repository/distribution/version control, data access
  • hosts: “standard software” (compiler, ...), environment

  7. grid-modules

  8. grid-modules A prototype system for getting software from where it is maintained to where it is used.
  • Inspired by the environment modules package:
    • load/unload (PATH)
    • initadd/initclear (.profile)
  • Extended for software from a remote repository:
    • update/deinstall
    • build/clean
    • test
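
To make the load/unload verbs concrete, here is a minimal sketch in their spirit. Everything in it is an assumption for illustration, not the actual grid-modules code: the function names and the ~/grid-modules/<module>/bin layout are invented.

# Illustrative only: "load" prepends a module's bin directory to PATH,
# "unload" strips it out again.
gridmod_load () {
    local bindir="$HOME/grid-modules/$1/bin"   # assumed layout
    case ":$PATH:" in
        *":$bindir:"*) ;;                      # already on PATH
        *) export PATH="$bindir:$PATH" ;;
    esac
}
gridmod_unload () {
    local bindir="$HOME/grid-modules/$1/bin"
    PATH=$(echo ":$PATH:" | sed "s|:$bindir:|:|;s|^:||;s|:$||")
    export PATH
}

The initadd/initclear pair would do the analogous edit persistently, in .profile, so that batch jobs see the same PATH.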

  9. grid-modules: install and use
  • grid-modules-clone NEWHOST(LIST)
    • also copies ~/.subversion for passwords
  • grid-module [update|load|initadd|build|test] [gridmod|env|gmon|cf|proc|gat]
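
A first session on a new host might then look like the following. The hostname is invented and the exact behavior of each verb is inferred from the slide, so treat this as illustrative, applied to the cf (Clusterfinder) module:

grid-modules-clone grid01.example.org   # hypothetical host; pushes grid-modules and ~/.subversion over
ssh grid01.example.org
grid-module update cf                   # check the module out of its repository
grid-module build cf                    # compile it on this host
grid-module load cf                     # put its binaries on PATH
grid-module test cf                     # verify the build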

  10. grid-modules: adding modules
  • set_module_info

agd_rep='svn://svn.gac-grid.org/software'
all_modules='gridmod cf'
case $module in
    gridmod) rep=$agd_rep/grid-modules;  frag=gridmod/bin;;
    cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
    *)       rep=unknown;                frag=unknown;;
esac

  • customization scripts
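
As a guess at the mechanics (the real grid-modules code may differ): update would check the module out of the repository named by $rep, and $frag would later tell load which directory to put on PATH. A sketch under those assumptions:

# Sketch, not the actual implementation: consume $rep and $frag
# as set by set_module_info above.
gridmod_update () {
    module=$1
    set_module_info                      # sets $rep and $frag for $module
    if [ "$rep" = unknown ]; then
        echo "no repository known for module $module" >&2
        return 1
    fi
    svn checkout "$rep" "$HOME/grid-modules/$module"
    # $frag (e.g. gridmod/bin) names the directory under
    # ~/grid-modules that "load" would splice into PATH.
}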

  11. grid-modules: adding modules
  • set_module_info

agd_rep='svn://svn.gac-grid.org/software'
planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
all_modules='gridmod cf proc'
case $module in
    gridmod) rep=$agd_rep/grid-modules;   frag=gridmod/bin;;
    cf)      rep=$agd_rep/clusterfinder;  frag=unknown;;
    proc)    rep=$planck_rep/ProC-2.3;    frag=proc/build/dist/bin;;
    *)       rep=unknown;                 frag=unknown;;
esac

  • customization scripts

===== proc.build =====
cd ~/grid-modules/proc/ProC-base
ant

===== proc.load =====
mkdir -p $HOME/.planck
echo "allowIncompleteConf = true" > "$HOME/.planck/pipelinecoordinator.pref"

===== proc.unload =====
rm -r $HOME/.planck

  12. environments

  13. environments A prototype system for making different hosts look alike.
  • Does a required software package exist on a remote host, and where is it installed?
    export IMAGEMAGICK_HOME=/usr/local/ImageMagick-6.3.2
  • Make it available!
    export PATH=$PATH:/usr/local/ImageMagick-6.3.2/bin
  • Host-specific information must be maintained by somebody, somewhere.
  • require modules, or take the bull by the horns
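
A minimal sketch of that idea (the helper name require_package and its layout are invented for illustration): probe the host-specific install location, and only if it exists publish both the _HOME variable and the PATH entry.

# Hypothetical helper: make a package visible only if it is installed here.
require_package () {
    local name=$1 home=$2
    if [ -d "$home" ]; then
        export "${name}_HOME=$home"
        export PATH="$PATH:$home/bin"
    else
        echo "warning: $name not found at $home" >&2
        return 1
    fi
}
require_package IMAGEMAGICK /usr/local/ImageMagick-6.3.2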

  14. environments: load_env The trick is to find the right scripts to execute for each host.

if ! hostname=`hostname -f 2>/dev/null`; then hostname=`hostname`; fi
scripts=`sed -n "s/^ *$hostname *//p" <<EOF
astrogrid.aei.mpg.de aei
buran.aei.mpg.de aei
lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
...
EOF`
cd ~/grid-modules/env/bin
source ./default
if [[ -f local ]]; then
    echo sourcing local environment script
    source local
elif [[ "$scripts" ]]; then
    echo For $hostname sourcing these scripts: $scripts
    for script in $scripts; do source ./$script; done
fi

This may need to be changed when adding a new host.
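
Adding a host is then usually a single new line in the table above; e.g. a hypothetical second 64-bit LRZ node needing the same settings would be entered as

lx64a3.cos.lrz-muenchen.de lrz g95 lrz-64

Listed hosts get default plus their named scripts; unlisted hosts fall back to default alone, and a per-host local script, if present, takes precedence over the table.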

  15. environments: scripts The work is done in the scripts.

===== default =====
export GSL_INCL=-I/usr/include
export GSL_LIBS=-L/usr/lib
export IMAGEMAGICK_INCL=-I/usr/include/
export IMAGEMAGICK_LIBS=-L/usr/lib/
export FC='gfortran -std=gnu -fno-second-underscore'
export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
export F_COMMONFLAGS='-W -Wall -Wno-uninitialized -Wno-unused -O2 -Wfatal-errors $(F_PORTABILITY_FLAGS)'
export FCFLAGS='-c $(F_COMMONFLAGS) -I$(INCDIR)'
export CC=gcc
export CCFLAGS_NO_C='-W -Wall -I$(INCDIR) $(GSL_INCL) $(IMAGEMAGICK_INCL) -fno-strict-aliasing -O2 -g0 -s -ffast-math'
export CCFLAGS='$(CCFLAGS_NO_C) -c'

===== lrz =====
export GSL_INCL='$(GSL_INC)'
export GSL_LIBS='$(GSL_SHLIB) $(GSL_BLAS_SHLIB)'
export ANT_HOME=/lrz/sys/apache-ant-1.6.5
module load gsl
module load java
module load gcc/4.1.0
module load g95
module load mpi.shmem/gcc
export PATH=/lrz/sys/jdk1.5.0_07/bin:${PATH}

===== g95 =====
export FC=g95
export F_PORTABILITY_FLAGS=-DPLANCK_G95

New scripts may need to be written for new hosts. Defaults work in most cases, cooperate with modules, and can be overridden.

  16. Status

  17. Status
  • ca. 23 AGD hosts + 9 DGI hosts are accessible
  • F90 build of Clusterfinder successful on 22 hosts (70%)
  • Some of the problems encountered:
    • difficulty finding FQDNs of resources, hosts listed by mistake
    • gsissh disabled
    • default job factory type disabled for globusrun-ws
    • no gsiscp installed, or unexpected default ports
    • svn not installed, too old, or connections not allowed
    • shell not bash, .profile not processed in batch jobs
    • file quota too small
    • some hosts (lx[32|64]ia1 at LRZ) share a file system
    • no F90 compiler installed, or hard to find
    • deep changes in grid-modules are hard to update

  18. Conclusions

  19. Conclusions
  • Clusterfinder has been deployed on “many” hosts using a prototype deployment system that is “easily” extendable to many users and many applications.
  • The system handles diversity without standing in the way of defining standards.
  • AGD should use this system or decide on something better, but should not diverge.
