

HSP-HMMER vs MPI-HMMER & parallelization of PSI-BLAST

Christian Halloy, [email protected]

April 23, 2009



HMMER, PFAM, and PSI-BLAST

  • If you BLAST a protein sequence (or a translated nucleotide sequence), BLAST will look for known domains in the query sequence.

  • BLAST can also be used to map annotations from one organism to another or look for common genes in two related species.

  • HMMER's hmmpfam compares one or more sequences to a database of profile hidden Markov models, such as the Pfam library, in order to identify known domains within the sequences.

(Figure: part of an alignment for the Globin family, from the Pfam website.)

HMMER’s hmmpfam code searches an HMM-PFAM database for matches to a query sequence.

(Figure: molecular rendering of the Luciferase protein.)


Preliminaries

  • first meeting, May 15, 2008:
    • need to develop computational biology using HPC at ORNL
    • Jouline's group uses over 20 common computational biology tools

  • initial approach:
    • need to install at least two major tools, HMMER and PSI-BLAST, to run on the Cray XT4

  • status in May 2008:
    • the public MPI-HMMER scales OK only up to ~256 cores
    • the public (serial) PSI-BLAST had never been run on Jaguar/Kraken


Jaguar / Kraken – Cray XT4 → XT5

  • Cores: 31,328 / 18,048 (XT4) → 150,152 / 66,048 (XT5)

  • Peak performance: 263 TF / 166 TF (XT4) → 1,381 TF / 607 TF (XT5)

  • Memory: 2 GB / 1 GB

  • Quad-core AMD Opteron nodes

    • Compute nodes + Service nodes (login, I/O)



MPI-HMMER

  • HMMER – a Hidden Markov Model based program by the HHMI group at Washington University School of Medicine

  • MPI-HMMER – Wayne State University and SUNY Buffalo


MPI-HMMER (cont'd)

Analyzing MPI-HMMER – should we try to improve it?

97 programs written in C

~48,000 lines of code

508 MPI function calls, such as MPI_Init, MPI_Bcast, MPI_Request, MPI_Barrier, MPI_Send, MPI_Recv, MPI_Pack, MPI_Unpack, MPI_Wait, …

Master – Workers paradigm

a LOT of I/O going on

Answer: NO!




Highly Scalable Parallel (HSP) - HMMER

Use the serial HMMER code, split the data, launch with MPI

(Diagram: HSP-HMMER. The input data – thousands of protein sequences – is split into chunks data1 … dataN; an independent serial HMMER instance processes each chunk and writes its own output, result1 … resultN.)


HSP-HMMER (cont'd – 1)

A few details

The new HSP-HMMER code uses:

only 4 MPI function calls! (MPI_Init, MPI_Comm_size, MPI_Comm_rank, and MPI_Finalize)

it adds only ~100 lines of code to hmmpfam.c
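
A minimal sketch of this wrapper pattern is shown below. It is not the published HSP-HMMER code (which folds the logic directly into hmmpfam.c); the chunk/result file names and the call to a serial hmmpfam binary through system() are illustrative assumptions. It uses exactly the four MPI calls listed above.

/* Sketch of the HSP-HMMER idea (assumed, not the published code):
 * each MPI rank processes its own pre-split chunk of the sequence data
 * and writes its own output, so only four MPI calls are needed and no
 * messages are exchanged between ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    char cmd[512];

    MPI_Init(&argc, &argv);                /* 1: start MPI               */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* 2: total number of ranks   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* 3: this rank's id          */

    /* Hypothetical file naming: chunk "data.<rank>" in, "result.<rank>" out.
     * The real code embeds this logic inside hmmpfam.c instead of
     * shelling out to the serial binary. */
    snprintf(cmd, sizeof(cmd),
             "./hmmpfam Pfam_ls.hmm data.%d > result.%d", rank, rank);
    if (system(cmd) != 0)
        fprintf(stderr, "rank %d of %d: hmmpfam run failed\n", rank, size);

    MPI_Finalize();                        /* 4: shut down MPI           */
    return 0;
}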

However…

Although initial performance was better than MPI-HMMER's, scaling leveled off (but did not decrease) at ~1000 cores.

Intense simultaneous I/O to the Lustre file system was still causing significant slowdowns.



HSP-HMMER (cont'd – 2)

Problems:

'feeding' protein sequences of similar lengths to all nodes produces a synchronization effect, and thus I/O bottlenecks

Improvements:

Reorganize the input data so that a mixture of protein sequences of different lengths is given to each processor. This randomizes the I/O activity and minimizes bottlenecks. Performance gained: another 3 or 4x.
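
One simple way to build such a mixture, sketched below with made-up sequence IDs and lengths (this is not the authors' pre-processing code), is to sort the sequences by length and deal them out round-robin, so every chunk receives a spread of short and long sequences:

/* Sketch: deal length-sorted sequences round-robin into chunks so each
 * chunk gets a mix of lengths; the sequence ids and lengths below are
 * made-up examples. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { const char *id; int length; } Seq;

static int by_length(const void *a, const void *b)
{
    return ((const Seq *)a)->length - ((const Seq *)b)->length;
}

int main(void)
{
    Seq seqs[] = { {"p1", 120}, {"p2", 4500}, {"p3", 80},
                   {"p4", 950}, {"p5", 2300}, {"p6", 310} };
    const int nseq = (int)(sizeof(seqs) / sizeof(seqs[0]));
    const int nchunks = 3;   /* one chunk per MPI task in the real setup */

    qsort(seqs, (size_t)nseq, sizeof(Seq), by_length);

    /* Sequence i goes to chunk i % nchunks, so chunks hold mixed lengths
     * and ranks hit the file system at different times, not all at once. */
    for (int i = 0; i < nseq; i++)
        printf("chunk %d <- %s (length %d)\n",
               i % nchunks, seqs[i].id, seqs[i].length);

    return 0;
}

In the real workflow, each chunk would then be written to its own input file for one MPI task.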




HSP-HMMER (cont’d – 3)

Problems:

Opening a single file (for reads or writes) from 1000 or more processors overloads the MDS – MetaData Server

Reads/writes of many different files from many different processors overload the default 4 OSTs (Object Storage Targets)

Solutions:

Subdivide total input data into multiple data files

Give more work to each processor (more sequences) and write outputs using the Lustre striping mechanism:
    • distribute I/O activity among more OSTs (Object Storage Targets), but have each processor contact only one OST
    • use pthreads to improve utilization of the multicore nodes and their memory, while at the same time reducing the number of I/O requests
Another gain of 2 to 3x was observed (a sketch of the threading idea follows below).
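
The threading idea can be sketched as follows (an assumed design, not the published code: NSEQ, NTHREADS, the placeholder "scoring" step, and the output file name are illustrative). A few pthreads inside each MPI task fill an in-memory buffer with their share of the results, and the task then writes its output file in a single pass, so Lustre sees one write per task instead of one per sequence.

/* Sketch: pthreads "score" sequences into per-sequence buffers; the task
 * then writes a single output file.  NSEQ, NTHREADS, and the scoring step
 * are placeholders for the real per-task workload. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NSEQ     16

static char results[NSEQ][64];   /* one result line per sequence */

static void *worker(void *arg)
{
    long tid = (long)arg;
    /* Static round-robin split of this task's sequences over the threads. */
    for (int i = (int)tid; i < NSEQ; i += NTHREADS)
        snprintf(results[i], sizeof(results[i]),
                 "sequence %d scored by thread %ld\n", i, tid);
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);

    /* One buffered write per task instead of one write per sequence. */
    FILE *out = fopen("result.task0", "w");   /* per-task file name assumed */
    if (out == NULL)
        return 1;
    for (int i = 0; i < NSEQ; i++)
        fputs(results[i], out);
    fclose(out);
    return 0;
}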


HSP-HMMER (cont'd – 4)

Results:

Identifying the Pfam domains in all 6.5 million proteins of the "nr" (non-redundant) database takes less than 24 hours when using HSP-HMMER on 2048 dual-threaded processors; this would have taken ~2 months with MPI-HMMER.

This is critical, considering that the protein database is doubling in size every 6 months!




SUMMARY: Highly Scalable Parallel HMMER on a Cray XT4 – C. Halloy, B. Rekapalli, and I. Jouline

  • HMMER – Protein Domain Identification tool

  • existing MPI-HMMER – limited performance, did not scale well

  • new HSP-HMMER – excellent performance (~100x faster than MPI-HMMER for 4096 cores) and scales well beyond 8000 cores

  • The HSP-HMMER code brings the time to identify functional domains in millions of proteins down from 2 months to less than 20 hours.

  • The HSP-HMMER paper has been accepted for publication in the ACM SAC 2009 Bioinformatics Track.

  • Using a closely coupled supercomputer with high-bandwidth parallel I/O is crucial.

  • Further bioinformatics genomics research will benefit tremendously from the utilization of such powerful resources.


PSI-BLAST

Position-Specific Iterated Basic Local Alignment Search Tool (NCBI Software Development Toolkit)

National Center for Biotechnology Information, NIH

First steps:

serial version ‘blastpgp’ runs on 1 core of Cray XT4 and XT5

Studied the NCBI toolkit software and looked into possible MPI implementations of PSI-BLAST



PSI-BLAST (cont'd)

Analyzing PSI-BLAST (and the whole NCBI toolkit!) – should we try to improve it? Should we attempt convoluted MPI routines?

> 500 programs written in C

> 1,000,000 lines of code

no MPI function calls! (of course!)

Answer: NO! Let's first try a simple "ideally parallel" method! (nothing embarrassing about that!)



Initial results: HSP-PSIBLAST

Developing a Highly Scalable Parallel (HSP) - PSIBLAST

Wrote an MPI wrapper, modifying only the ncbimain.c program and adding only some ~50 lines of code.

It uses only 5 MPI function calls (MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, and MPI_Barrier).

An initial set of 27 protein sequences was "BLASTed" against the "nr" database with blastpgp running on 1, 8, 512, 1024, 2048, and 4096 tasks.
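
For illustration, such a wrapper might look like the sketch below. This is an assumption, not the actual ncbimain.c modification; the per-rank query file names and the blastpgp command line are hypothetical. It uses exactly the five MPI calls listed above.

/* Sketch of an HSP-PSIBLAST-style wrapper (assumed, not the real
 * ncbimain.c changes): each rank runs blastpgp on its own subset of the
 * query sequences, and a final barrier lets the overall wall time be
 * taken once every rank has finished. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    char cmd[512];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical naming: query.<rank> holds this rank's share of the
     * query sequences; nr is the shared protein database. */
    snprintf(cmd, sizeof(cmd),
             "./blastpgp -d nr -i query.%d -o psiblast.out.%d", rank, rank);
    if (system(cmd) != 0)
        fprintf(stderr, "rank %d of %d: blastpgp failed\n", rank, size);

    MPI_Barrier(MPI_COMM_WORLD);   /* wait until every rank is done */
    MPI_Finalize();
    return 0;
}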



Initial results: HSP-PSIBLAST (cont'd)

Highly Scalable Parallel (HSP) - PSIBLAST

The graph shows the best times for each run (several runs were done for each configuration). Numerical results were compared and shown to be identical.

Excellent scaling up to 1024 cores (tasks)



Future steps: HSP-PSIBLAST

Improve the performance and scalability of HSP-PSIBLAST

As with HSP-HMMER, we will pre-process the input data sets so that protein sequences of randomized sizes are submitted (in chunks of several thousand) to each MPI task

We will test different variations of parallel I/O with Lustre striping, and also different numbers of OSTs

Splitting up the "nr" database (1 GB in July 2008, 3 GB in March 2009) might also be helpful, and eventually necessary as it grows much larger; a sketch of a simple database splitter follows at the end of this list. This will require further improvements to the MPI I/O component to ensure optimal performance.

Test performance and scalability of other NCBI routines

e.g. blast, blastall, megablast, rpsblast, etc.
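
As an aside on the database-splitting idea mentioned above, the sketch below shows one way to split a FASTA database such as "nr" into smaller files by dealing whole records out round-robin. It is a hypothetical helper, not part of the NCBI toolkit, and the file names and part count are assumptions.

/* Sketch (hypothetical helper): split a FASTA database into NPARTS
 * smaller files, assigning whole records round-robin so the parts stay
 * roughly balanced by record count. */
#include <stdio.h>

#define NPARTS 8

int main(void)
{
    FILE *in = fopen("nr.fasta", "r");   /* input file name assumed */
    if (in == NULL)
        return 1;

    FILE *parts[NPARTS];
    char name[64], line[4096];
    for (int p = 0; p < NPARTS; p++) {
        snprintf(name, sizeof(name), "nr.part.%d", p);
        parts[p] = fopen(name, "w");
        if (parts[p] == NULL)
            return 1;
    }

    /* A FASTA record starts at a '>' header line; keep each record whole
     * and send record k to part k % NPARTS. */
    int current = -1;                    /* nothing chosen before first '>' */
    long nrecords = 0;
    while (fgets(line, sizeof(line), in) != NULL) {
        if (line[0] == '>')
            current = (int)(nrecords++ % NPARTS);
        if (current >= 0)
            fputs(line, parts[current]);
    }

    fclose(in);
    for (int p = 0; p < NPARTS; p++)
        fclose(parts[p]);
    return 0;
}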



Summary

  • outcome (as of today):
    • developed HSP-HMMER on the Cray XT4 and XT5
    • it scales well up to 8000 cores
    • it is MUCH faster than MPI-HMMER
    • developed a parallel PSI-BLAST that runs on up to 4096 cores
    • HSP-PSIBLAST scales very well up to 1024 cores

Questions? Comments? Thank you!

