Large-Scale Profile-HMM on the Grid Laurent Falquet Swiss Institute of Bioinformatics

Large-Scale Profile-HMM on the Grid Laurent Falquet Swiss Institute of Bioinformatics CH-1015 Lausanne, Switzerland Borrowed from Heinz Stockinger June 17, 2006. Outline. Computing intensive sequence alignment Mapping the problem to the Grid Prototype implementation & results.

Large-Scale Profile-HMM on the Grid

Laurent Falquet

Swiss Institute of Bioinformatics

CH-1015 Lausanne, Switzerland

Borrowed from Heinz Stockinger

June 17, 2006

Large-Scale Profile-HMM on the Grid Laurent.Falquet@isb-sib.ch

Outline
  • Compute-intensive sequence alignment
  • Mapping the problem to the Grid
  • Prototype implementation & results
Introduction
  • Sequence alignment based on profile-HMM: popular but CPU intensive
  • The problem can easily be parallelised:
    • Embarrassingly parallel problem domain
  • Ideal for a Data Grid with lots of CPU power
HMM in Biology
  • Originally, HMMs were mainly used in speech recognition
  • In biology, they are used for sequence alignment and database search
  • Packages like
    • HMMER, PFTOOLS, SAM, etc.
  • Profile-HMMs are stored in databases like Prosite, PFAM, SMART, etc.
HMM on the Grid

Example: hmmsearch model.hmm database.seq

  • Gridification of the scenario:
    • Input dataset needs to be split (pre-processing)
    • Workload generation
    • Grid jobs submission
    • Remote execution (CPU intensive part)
    • Merging of results
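
The pre-processing and merging steps above can be sketched as follows. This is a minimal illustration, not the prototype's code: it assumes the sequence file is in FASTA format (the slides do not name the format), and the helper names `read_fasta_records`, `split_into_chunks`, and `merge_results` are hypothetical.

```python
from itertools import islice


def read_fasta_records(lines):
    """Yield one FASTA record (header line plus sequence lines) at a time."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():
            record.append(line.rstrip("\n"))
    if record:
        yield record


def split_into_chunks(records, chunk_size):
    """Pre-processing: group records into chunks of at most chunk_size."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def merge_results(partial_results):
    """Merging: concatenate per-job output lines in job-id order."""
    merged = []
    for job_id in sorted(partial_results):
        merged.extend(partial_results[job_id])
    return merged


# Toy input with three sequences (made-up data for illustration only).
fasta = """>seq1
MKT
>seq2
GAV
LLP
>seq3
QQW
""".splitlines()

chunks = list(split_into_chunks(read_fasta_records(fasta), chunk_size=2))
print(len(chunks), "chunks")

# Pretend two remote jobs returned their output out of order.
merged = merge_results({1: ["hit for chunk 1"], 0: ["hit for chunk 0"]})
print(merged)
```

Each chunk would become the input of one Grid job. Merging by job id keeps the combined output in a fixed order regardless of the order in which jobs finish.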
HMM on the Grid

[Diagram: Local Desktop, Grid, and Grid Storage Element (SE)]

  1. Data pre-processing (local desktop)
  2. Creation of job descriptors (local desktop)
  3. Job submission (local desktop)
  4. Remote execution on computing elements (CE) (Grid)
  5. Merging of results (local desktop)


[Diagram: splitting the input files into chunks]

  • Input files: Profiles 1..nnn and Seqs 1..zzzzz
  • wg on the local site splits them into chunks and stores the files on the Grid Storage Element (SE):
    • Profiles 1..100, Profiles 101..200, ..., Profiles kkk..nnn
    • Seqs 1..10’000, Seqs 10’001..20’000, ..., Seqs yyyyy..zzzzz
  • wg on the remote site gets files from the SE (for example Seqs 1..10’000 and Profiles 1..100) and runs hmmsearch for Profile 1, Profile 2, Profile 3, ..., Profile 100
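
The chunking shown in the diagram can be sketched as below. This is an illustration under assumptions: the slides do not say how wg pairs profile chunks with sequence chunks, so the sketch simply pairs every profile chunk with every sequence chunk. The function names and the input sizes (250 profiles, 25,000 sequences) are hypothetical.

```python
from itertools import product


def chunk_ranges(total, chunk_size):
    """Return 1-based inclusive (first, last) ranges covering 1..total."""
    return [(start, min(start + chunk_size - 1, total))
            for start in range(1, total + 1, chunk_size)]


def work_units(n_profiles, n_seqs, profiles_per_chunk, seqs_per_chunk):
    """Assumption: one work unit per (profile chunk, sequence chunk) pair."""
    return list(product(chunk_ranges(n_profiles, profiles_per_chunk),
                        chunk_ranges(n_seqs, seqs_per_chunk)))


# Hypothetical input sizes; chunk sizes follow the diagram (100 and 10'000).
units = work_units(n_profiles=250, n_seqs=25_000,
                   profiles_per_chunk=100, seqs_per_chunk=10_000)

for (p_first, p_last), (s_first, s_last) in units[:3]:
    print(f"Profiles {p_first}..{p_last} x Seqs {s_first}..{s_last}")
print(len(units), "work units in total")
```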

Prototype Implementation
  • Prototype code implemented in C++ to wrap hmmsearch as well as pfsearch
    • Other applications are possible, too
  • Client side code:
    • Uses EGEE as the main Grid middleware
    • Within Swiss BioGrid: ARC/NorduGrid
    • The tool also runs on LSF (Vital-IT cluster)
    • Globus needs to be installed on local machine
  • Takes care of job creation, remote execution, resubmission etc.
Usage Example for “wg” (workload generator)

Usage: wg <hmmFile> <seqFile> [options]

  <hmmFile>  path of the profile-HMM file containing 1 or more profiles.
  <seqFile>  path of the sequence file containing 1 or more sequences.

The following options are available:

  -p <no>  number of parallel jobs submitted to the Grid.
  -r       used for a remote execution of the program.
  -h       print this help message.
  -s       retrieve all status information of submitted jobs.
  -v       verbose: print debug messages.
  -O       do not retrieve the job output but only display status.
  -P       use pfsearch (default).
  -H       use hmmsearch.
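
As an illustration of how the options combine, the sketch below assembles a wg command line from the usage text above. The helper `build_wg_command` is hypothetical and not part of the tool. The file names are the ones from the hmmsearch example slide, and only flags listed in the usage text are used.

```python
def build_wg_command(hmm_file, seq_file, parallel_jobs=None,
                     use_hmmsearch=False, verbose=False):
    """Build an argument list following: wg <hmmFile> <seqFile> [options]."""
    cmd = ["wg", hmm_file, seq_file]
    if parallel_jobs is not None:
        cmd += ["-p", str(parallel_jobs)]  # number of parallel Grid jobs
    if use_hmmsearch:
        cmd.append("-H")                   # hmmsearch instead of the default pfsearch
    if verbose:
        cmd.append("-v")                   # print debug messages
    return cmd


cmd = build_wg_command("model.hmm", "database.seq",
                       parallel_jobs=100, use_hmmsearch=True)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where wg is installed.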

Benchmarks
  • Several benchmarks to check for:
    • Functionality (correctness)
    • Performance
    • Performance “prediction”
  • Preliminary benchmark dataset:
    • Profile-HMM DB with 7,868 entries (619 MB)
    • Sequence DB with 10,923 entries ( 7.4 MB)
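
To give a feel for the size of this dataset, the sketch below counts the profile/sequence pairs that have to be scored. It assumes that every profile is compared against every sequence and, hypothetically, that the work is spread evenly over 100 parallel jobs. The slides do not state how the load is actually balanced.

```python
import math

N_PROFILES = 7_868    # entries in the profile-HMM DB (from the slide)
N_SEQUENCES = 10_923  # entries in the sequence DB (from the slide)
N_JOBS = 100          # assumption: 100 parallel jobs with an even split

total_pairs = N_PROFILES * N_SEQUENCES
pairs_per_job = math.ceil(total_pairs / N_JOBS)

print(f"{total_pairs:,} profile/sequence pairs in total")
print(f"about {pairs_per_job:,} pairs per job with {N_JOBS} jobs")
```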
Preliminary Results

[Figure: hmmsearch execution time in hours versus number of processors (parallel jobs)]


Distribution of Job Execution Time

[Figure: distribution of job execution time for 100 parallel jobs]

  • Not really high performance
  • Need to get rid of the peaks


Run Time Sensitive (RTS) Scheduling/Execution

[Diagram: Task Server, Storage Element holding task 1, task 2, ..., task n, and a Worker Node with a running job]

  1. Get task (worker node asks the Task Server)
  2. Return task URL (Task Server to worker node)
  3. Retrieve task (from the Storage Element)
  • Task done

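
The slides give no further detail on the RTS algorithm, so the following is only a sketch of the general idea behind pull-based scheduling, not the authors' implementation. It compares a fixed up-front assignment with a scheme in which each worker fetches the next task as soon as it is idle. Task costs, worker speeds, and function names are all made up for illustration.

```python
import heapq


def static_makespan(task_costs, worker_speeds):
    """Assign tasks round-robin up front; return when the last worker finishes."""
    finish = [0.0] * len(worker_speeds)
    for i, cost in enumerate(task_costs):
        w = i % len(worker_speeds)
        finish[w] += cost / worker_speeds[w]
    return max(finish)


def pull_makespan(task_costs, worker_speeds):
    """Each worker pulls the next task from the server as soon as it is idle."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for cost in task_costs:
        now, w = heapq.heappop(idle)          # earliest idle worker gets the task
        done = now + cost / worker_speeds[w]
        makespan = max(makespan, done)
        heapq.heappush(idle, (done, w))
    return makespan


# Hypothetical workload: 12 equal tasks, two fast workers and one slow worker.
task_costs = [1.0] * 12
worker_speeds = [1.0, 1.0, 0.25]

static_time = static_makespan(task_costs, worker_speeds)
pull_time = pull_makespan(task_costs, worker_speeds)
print("static assignment:", static_time)
print("pull-based assignment:", pull_time)
```

In this toy setup the slow worker dominates the static schedule, which matches the peaks seen in the job execution time distribution. With pulling, the fast workers absorb most of the tasks.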

Execution Time of the RTS Algorithm

[Figure: time in hours versus sequence number of work unit (axis from 0 to 400)]

  • Overall performance: 2.5 hours
  • Similar to performance on the local cluster


Execution Time on LSF

[Figure: execution time on LSF, with axis ticks at 1 hour, 2 hours, and 3 hours]


Conclusion
  • Gridification works well for the selected problem
  • High performance is achieved via a run-time-sensitive scheduling algorithm
  • Heterogeneous Grids can achieve performance comparable to homogeneous clusters

Work done by Heinz Stockinger in co-operation with Marco Pagni, Lorenzo Cerutti and Laurent Falquet

The EMBRACE project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LHSG-CT-2004-512092.
