
GCP HPC Structure

GCP HPC Structure. E2GRIS1 Rolando Navarro Jara Omar Palomino Huamaní International Potato Center Itacuruça (Brazil), 2-15 November 2008. OVERVIEW What is CIP? CGIAR – Consultative Group on International Agricultural Research GCP - Generation Challenge Programme


Presentation Transcript


  1. GCP HPC Structure. E2GRIS1. Rolando Navarro Jara, Omar Palomino Huamaní. International Potato Center. Itacuruça (Brazil), 2-15 November 2008

  2. OVERVIEW • What is CIP? • CGIAR – Consultative Group on International Agricultural Research • GCP - Generation Challenge Programme • Subprogram 4 – Bioinformatics

  3. Structure STRUCTURE software:
     - Structure is a free software package for genetic analysis developed by Pritchard, Stephens & Donnelly (2000).
     - From the website: “structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.”
     - The command-line version of Structure is easy to install on Linux, Windows, and Sun operating systems.

  4. How does it work?

  5. INPUT file: The input file contains the data that Structure will process. It has the following format:
     Number of individuals: 3
     Number of loci: 10
     Missing data value: -9
     Ploidy: 4
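The properties listed on this slide can be made concrete with a minimal sketch that builds a toy input file. It assumes the multi-row layout implied by the mainparams settings shown in this deck (ONEROWPERIND 0, LABEL 1, POPDATA 1, PHENOTYPE 1): each individual occupies PLOIDY consecutive rows, and each row holds a label, a population identifier, a phenotype value, and one allele per locus. The individual names, population codes, phenotype values, and allele values are hypothetical; the slide does not show the actual data.

```python
MISSING = -9    # missing-data code (from the slide)
PLOIDY = 4      # allele copies per individual (from the slide)
NUM_LOCI = 10   # loci per individual (from the slide)


def format_individual(label, pop, phenotype, copies):
    """Return PLOIDY text rows for one individual (assumed column order:
    label, population, phenotype, then one allele per locus)."""
    if len(copies) != PLOIDY:
        raise ValueError(f"{label}: expected {PLOIDY} rows, got {len(copies)}")
    rows = []
    for alleles in copies:
        if len(alleles) != NUM_LOCI:
            raise ValueError(f"{label}: expected {NUM_LOCI} loci, got {len(alleles)}")
        fields = [label, str(pop), str(phenotype)] + [str(a) for a in alleles]
        rows.append(" ".join(fields))
    return rows


def build_input(individuals):
    """Concatenate the rows of all individuals into one file body."""
    lines = []
    for label, pop, phenotype, copies in individuals:
        lines.extend(format_individual(label, pop, phenotype, copies))
    return "\n".join(lines) + "\n"


# Hypothetical toy data: 3 tetraploid individuals, 10 loci each.
toy_individuals = [
    ("ind1", 1, 0, [[1] * NUM_LOCI, [1] * NUM_LOCI, [2] * NUM_LOCI, [2] * NUM_LOCI]),
    ("ind2", 1, 0, [[1] * NUM_LOCI, [2] * NUM_LOCI, [2] * NUM_LOCI, [MISSING] * NUM_LOCI]),
    ("ind3", 2, 1, [[3] * NUM_LOCI, [3] * NUM_LOCI, [1] * NUM_LOCI, [2] * NUM_LOCI]),
]

input_text = build_input(toy_individuals)
print(input_text, end="")
```

With 3 individuals and ploidy 4, the sketch emits 12 rows; the number of rows and columns must agree with NUMINDS, NUMLOCI, and PLOIDY in mainparams.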

  6. Mainparams(1):
     #define INFILE testdata1 // (str) name of input data file
     #define OUTFILE results // (str) name of output data file
     #define NUMINDS 3 // (int) number of diploid individuals in data file
     #define NUMLOCI 10 // (int) number of loci in data file
     #define LABEL 1 // (B) Input file contains individual labels
     #define POPDATA 1 // (B) Input file contains a population identifier
     #define POPFLAG 0 // (B) Input file contains a flag which says whether to use popinfo when USEPOPINFO==1
     #define PHENOTYPE 1 // (B) Input file contains phenotype information
     #define EXTRACOLS 0 // (int) Number of additional columns of data before the genotype data start.

  7. Mainparams(2):
     #define PHASEINFO 0 // (B) the data for each individual contains a line indicating phase
     #define MARKOVPHASE 0 // (B) the phase info follows a Markov model.
     #define MISSING -9 // (int) value given to missing genotype data
     #define PLOIDY 4 // (int) ploidy of data
     #define ONEROWPERIND 0 // (B) store data for individuals in a single line
     #define MARKERNAMES 0 // (B) data file contains row of marker names
     #define MAPDISTANCES 0 // (B) data file contains row of map distances between loci
     Program Parameters
     #define MAXPOPS 2 // (int) number of populations assumed
     #define BURNIN 2000 // (int) length of burnin period
     #define NUMREPS 2000 // (int) number of MCMC reps after burnin
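Since mainparams is a list of "#define NAME value // comment" lines, it is easy to read programmatically, for example to check the settings before submitting a run. The following is a minimal sketch and not part of Structure or GCP HPC Structure; the function name parse_params and the embedded sample text are assumptions made for illustration.

```python
import re

# Matches "#define NAME value", ignoring any trailing "// comment".
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s+(\S+)")


def parse_params(text):
    """Return a dict of parameter names to values (int where possible)."""
    params = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if not match:
            continue  # skip blank lines, section titles, pure comments
        key, raw = match.groups()
        try:
            params[key] = int(raw)
        except ValueError:
            params[key] = raw  # e.g. file names such as testdata1
    return params


# Excerpt of the mainparams values shown on the slides.
sample = """
#define INFILE testdata1 // (str) name of input data file
#define OUTFILE results // (str) name of output data file
#define NUMINDS 3 // (int) number of diploid individuals in data file
#define NUMLOCI 10 // (int) number of loci in data file
#define MISSING -9 // (int) value given to missing genotype data
#define PLOIDY 4 // (int) ploidy of data
Program Parameters
#define MAXPOPS 2 // (int) number of populations assumed
#define BURNIN 2000 // (int) length of burnin period
#define NUMREPS 2000 // (int) number of MCMC reps after burnin
"""

params = parse_params(sample)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

The same function would work on extraparams, which uses the same "#define" syntax.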

  8. Extraparams
     #define FREQSCORR 1 // (B) allele frequencies are correlated among pops
     #define ONEFST 0 // (B) assume same value of Fst for all subpopulations.
     #define INFERALPHA 1 // (B) Infer ALPHA (the admixture parameter)
     #define POPALPHAS 0 // (B) Individual alpha for each population
     #define INFERLAMBDA 0 // (B) Infer LAMBDA (the allele frequencies parameter)
     #define POPSPECIFICLAMBDA 0 // (B) infer a separate lambda for each pop (only if INFERLAMBDA=1).
     #define NOADMIX 0 // (B) Use no admixture model
     #define LINKAGE 0 // (B) Use the linkage model
     #define PHASED 0 // (B) Data are in correct phase (required unless data are diploid)
     ............................

  9. PLATFORM LSF:
     - GCP HPC Structure is an application built for High Performance Computing environments. It uses workload management software (LSF) to run jobs in parallel inside the cluster.
     - Platform LSF is software for managing and accelerating batch workload processing for compute- and data-intensive applications. It takes advantage of modern multi-core and multi-threaded architectures, with scheduling controls for both sequential and parallel jobs.
     - Checking LSF status:
       [rcnavarro@hpc-cip ~]$ lsload
       HOST_NAME        status  r15s   r1m  r15m   ut    pg  ls     it    tmp    swp    mem
       hpc-cip.cgiar.o      ok   0.0   0.3   0.0   1%  64.0   0    471  4883M  3176M  1490M
       compute-0-2.cgi      ok   0.0   0.0   0.0   0%   3.4   0  66944  1444M  1000M  3690M
       compute-0-1.cgi      ok   0.0   0.0   0.0   0%   3.3   0  24944  3220M  1000M  3660M
       compute-0-0.cgi      ok   0.0   0.0   0.0   0%   2.9   0  19392  3218M   996M  3644M
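The lsload listing is whitespace-separated, so a script can read it to see which hosts are lightly loaded. The sketch below parses the output shown on this slide; the helper names and the 0.1 load threshold are hypothetical, and in practice the text would come from running lsload on the cluster rather than from an embedded string.

```python
# Output copied from the slide; a real script would capture it from `lsload`.
LSLOAD_OUTPUT = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hpc-cip.cgiar.o ok 0.0 0.3 0.0 1% 64.0 0 471 4883M 3176M 1490M
compute-0-2.cgi ok 0.0 0.0 0.0 0% 3.4 0 66944 1444M 1000M 3690M
compute-0-1.cgi ok 0.0 0.0 0.0 0% 3.3 0 24944 3220M 1000M 3660M
compute-0-0.cgi ok 0.0 0.0 0.0 0% 2.9 0 19392 3218M 996M 3644M
"""


def parse_lsload(output):
    """Turn lsload text into a list of dicts keyed by the header names."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def lightly_loaded(hosts, max_r1m=0.1):
    """Names of hosts with status 'ok' and a 1-minute load at or below max_r1m.

    The threshold is an arbitrary illustrative value.
    """
    return [
        h["HOST_NAME"]
        for h in hosts
        if h["status"] == "ok" and float(h["r1m"]) <= max_r1m
    ]


hosts = parse_lsload(LSLOAD_OUTPUT)
candidates = lightly_loaded(hosts)
print(candidates)
```

In the listing above, the head node hpc-cip has a 1-minute load of 0.3 and is filtered out, while the three compute nodes pass the check.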

  10. PLATFORM LSF and STRUCTURE
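One way LSF and Structure can be combined is to submit each Structure run as a separate batch job, so that runs for different numbers of populations execute in parallel across the cluster. The sketch below only builds the command lines and does not submit anything. It assumes the Structure command-line flags -m, -e, -K, -i, and -o and the LSF bsub options -J (job name) and -o (log file); the file names, job names, and the function build_jobs are hypothetical, and this is not necessarily how GCP HPC Structure itself submits jobs.

```python
import shlex


def build_jobs(k_values, runs_per_k, infile="testdata1", structure_bin="structure"):
    """Return one bsub command (as an argument list) per (K, run) pair."""
    jobs = []
    for k in k_values:
        for run in range(1, runs_per_k + 1):
            tag = f"K{k}_run{run}"
            structure_cmd = [
                structure_bin,
                "-m", "mainparams",      # main parameter file
                "-e", "extraparams",     # extra parameter file
                "-K", str(k),            # overrides MAXPOPS for this run
                "-i", infile,            # input data file
                "-o", f"results_{tag}",  # separate output file per run
            ]
            bsub_cmd = ["bsub", "-J", f"structure_{tag}", "-o", f"{tag}.log"]
            jobs.append(bsub_cmd + structure_cmd)
    return jobs


# Example: K = 1..3 with 2 independent runs each gives 6 jobs.
jobs = build_jobs(k_values=[1, 2, 3], runs_per_k=2)
for cmd in jobs:
    print(shlex.join(cmd))
```

Each printed line could be pasted into a shell on the LSF head node, or passed to subprocess.run as an argument list.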

  11. GCP HPC STRUCTURE – GUI - Developed at CIP by Luis Avila and Reinhard Simmon of the Research Informatics Unit

  12. GOALS • Remove the dependence of GCP HPC STRUCTURE on Platform LSF. • Run on a grid environment, for several populations and more than one run per analysis. • Provide a user-friendly interface.

  13. Questions …

  14. THANK YOU
