Cluster Computing Applications Project: Parallelizing BLAST
William Burke, York College, City University of New York
John Mugler and Stephen Scott, Oak Ridge National Laboratory
Research Alliance of Minorities (RAM), Computer Science and Mathematics Division
Bioinformatics research needs faster text string matching algorithms.
The purpose of this project is to analyze the BLAST algorithm:
Define the structure of BLAST.
State why it is a valuable Bioinformatics tool.
Explore parallelizations of BLAST.
BLAST matches query string fragments against a target database.
Eliminates the need to run a full text string comparison.
Speeds up database search time.
Several methods of parallelizing BLAST have been explored.
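One common way to parallelize a database search is to split the database into chunks and search each chunk independently. The sketch below illustrates that idea only; it is not the method used in this project. The function names (`search_chunk`, `split_database`, `parallel_search`) are hypothetical, the search is a plain exact substring match rather than BLAST, and a local thread pool stands in for the cluster nodes that a message-passing library such as MPI or PVM would coordinate.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(query_word, chunk):
    """Return (sequence_id, position) for every exact occurrence of query_word."""
    hits = []
    for seq_id, seq in chunk:
        pos = seq.find(query_word)
        while pos != -1:
            hits.append((seq_id, pos))
            pos = seq.find(query_word, pos + 1)
    return hits


def split_database(database, n_chunks):
    """Deal the sequences round-robin into n_chunks lists."""
    return [database[i::n_chunks] for i in range(n_chunks)]


def parallel_search(query_word, database, n_workers=2):
    """Search every chunk in its own worker and merge the sorted results."""
    chunks = split_database(database, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda chunk: search_chunk(query_word, chunk), chunks)
        return sorted(hit for part in results for hit in part)


# Toy database of (id, sequence) pairs, invented for illustration.
database = [("s1", "AAPQGAA"), ("s2", "CCCC"), ("s3", "PQGPQG")]
print(parallel_search("PQG", database, n_workers=2))
```

Because each chunk is searched without reference to the others, the only coordination needed is the final merge, which is why this style of partitioning maps naturally onto separate compute nodes.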
Red Hat Linux 7.2
C3 - http://www.csm.ornl.gov/torc/C3/
LAM/MPI - http://www.lam-mpi.org/
Maui Scheduler - http://supercluster.org/maui/
MPICH - http://www-unix.mcs.anl.gov/mpi/mpich/
OpenSSH - http://www.openssh.com/
OpenSSL - http://www.openssl.org/
PBS - http://www.openpbs.org/
PVM - http://www.csm.ornl.gov/pvm/
SIS - http://www.sisuite.org/
OSCAR configures the head node.
OSCAR builds and configures compute nodes.
C3 reduces time and effort to operate and manage a cluster.
eXtreme TORC powered by OSCAR
What is BLAST?
A heuristic algorithm for matching query strings against a database.
How does the BLAST algorithm work?
It uses statistical means for comparison.
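The fragment-matching idea can be sketched as follows: index every word of length W in the database string, look up each query word in that index, and extend each hit in both directions. This is a simplified illustration, not the BLAST implementation. The helper names (`find_word_hits`, `extend_exact`) are hypothetical, and the extension step here stops at the first mismatch, whereas BLAST scores the extension with a substitution matrix and evaluates the result statistically.

```python
from collections import defaultdict


def find_word_hits(query, database, w=3):
    """Return (query_pos, db_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for d_pos in range(len(database) - w + 1):
        index[database[d_pos:d_pos + w]].append(d_pos)
    hits = []
    for q_pos in range(len(query) - w + 1):
        for d_pos in index.get(query[q_pos:q_pos + w], []):
            hits.append((q_pos, d_pos))
    return hits


def extend_exact(query, database, q_pos, d_pos, w=3):
    """Grow a word hit left and right while characters are identical.

    Returns (query_start, db_start, length) of the extended match.
    """
    start_q, start_d = q_pos, d_pos
    while start_q > 0 and start_d > 0 and query[start_q - 1] == database[start_d - 1]:
        start_q -= 1
        start_d -= 1
    end_q, end_d = q_pos + w, d_pos + w
    while end_q < len(query) and end_d < len(database) and query[end_q] == database[end_d]:
        end_q += 1
        end_d += 1
    return start_q, start_d, end_q - start_q


# Strings taken from the poster's example figure.
query = "SLAALLNKCKTPQGQWLVNQWIKWPLMDKNRIEERLN"
database = "GSWNLAALDKDPMGDKNRIEERLNLVEAIKWPLMDJN"

hits = find_word_hits(query, database, w=3)
q_start, d_start, length = extend_exact(query, database, *hits[0])
print(hits[0], query[q_start:q_start + length])
```

Only positions that share a word are ever examined further, which is how the approach avoids a full comparison of the two strings.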
How can you parallelize BLAST on a computational cluster?
Figure: BLAST neighborhood words for a query word
Query word length: W = 3
Neighborhood words (score): PEG (15), PRG (14), PMG (13), PSG (13), PQN (12)
Neighborhood score threshold: T = 13
Query string: SLAALLNKCKTPQGQWLVNQWIKWPLMDKNRIEERLN 365
Database string: GSWNLAALDKDPMGDKNRIEERLNLVEAIKWPLMDJN 330
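The neighborhood scores in the figure can be reproduced with a short sketch. Two assumptions are made here that the poster does not state: the query word is taken to be PQG (it appears in the query string and is consistent with the listed scores), and the pair scores are the corresponding BLOSUM62 entries. The table below is deliberately partial and covers only the residue pairs needed for the five listed words; the function names are hypothetical.

```python
# Partial substitution table (assumed BLOSUM62 values); only the pairs
# needed to score the five candidate words against "PQG" are included.
SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "R"): 1, ("Q", "M"): 0, ("Q", "S"): 0,
    ("G", "N"): 0,
}


def pair_score(a, b):
    """Look up a symmetric pair score; raises KeyError if the pair is not in the table."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def word_score(query_word, candidate):
    """Sum the position-by-position pair scores of two equal-length words."""
    return sum(pair_score(q, c) for q, c in zip(query_word, candidate))


def neighborhood(query_word, candidates, threshold):
    """Keep the candidates whose score meets or exceeds the threshold T."""
    kept = []
    for word in candidates:
        score = word_score(query_word, word)
        if score >= threshold:
            kept.append((word, score))
    return kept


candidates = ["PEG", "PRG", "PMG", "PSG", "PQN"]
for word in candidates:
    print(word, word_score("PQG", word))
print(neighborhood("PQG", candidates, threshold=13))
```

With T = 13, the words PEG, PRG, PMG, and PSG are retained, while PQN scores 12 and falls below the threshold.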
NBLAST SLRI Bioinformatics Toolkit
DNA sequence matching processor
I would like to extend my thanks to Stephen L. Scott, John Mugler, Thomas Naughton, and Brian Luethke for their invaluable mentoring; to Michaelangelo Salcedo for his guidance; and to Debbie McCoy and Cheryl Hamby for their support in the RAM program.
This research was performed under the Research Alliance for Minorities Program administered through the Computer Science and Mathematics Division, Oak Ridge National Laboratory. This Program is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science, U.S. Department of Energy. This work has been authored by a contractor of the U.S. Government under contract DE-AC05-00OR22725. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.