Optimizing SRA Sequence Searches Using BLAST Technology
This document outlines the use of the Basic Local Alignment Search Tool (BLAST) for similarity searches of biological sequences in the Sequence Read Archive (SRA). It details the requirements for building BLAST databases from SRA data, the method for searching the archive directly, and the advantage that changes in the SRA archive are visible immediately. It also covers planned developments, such as user-customized search sets and the use of mate-pair information, and closes with acknowledgements.
Presentation Transcript
SRA Transcript BLAST Tom Madden May 15, 2009
BLAST • Basic Local Alignment Search Tool • Calculates similarity for biological sequences. • Produces local alignments: only a portion of each sequence must be aligned. • Uses statistical theory to determine if a match might have occurred by chance.
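The two ideas on this slide, local alignment and chance-match statistics, can be illustrated with a minimal sketch. It is not how BLAST is implemented: BLAST uses a faster seed-and-extend heuristic, while the code below uses the exhaustive Smith-Waterman recurrence to show what "only a portion of each sequence must be aligned" means. The scoring values (match, mismatch, gap) and the statistical parameters `lam` and `k` are illustrative assumptions; in practice the latter depend on the scoring system.

```python
import math


def smith_waterman_score(a, b, match=1, mismatch=-3, gap=-5):
    """Best local alignment score between a and b (linear gap penalty).

    Cell values never drop below zero, so an alignment may start and end
    anywhere in either sequence; that is what makes it local.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def expected_chance_hits(score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    lam and k must match the scoring system; callers supply them.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# The 8-base query matches only the middle of the longer sequence.
score = smith_waterman_score("ACGTACGT", "TTTTACGTACGTTTTT")
print(score)
# Placeholder lam/k values, chosen only to show how E shrinks as the score grows.
print(expected_chance_hits(score, 8, 5 * 10**12, lam=1.0, k=0.5))
```

A higher score or a smaller database lowers the expected number of chance hits, which is why the same alignment is less significant against terabases of reads than against a small database.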
Requirements for searching SRA sequences as a BLAST DB • Extract new or updated sequences. • Format into a BLAST database. • Provide disks for eight copies of the BLAST databases, each with 5 terabases (as of January). • Distribute databases to storage in Bethesda and Virginia. • Know how to quickly re-dump after policy changes or data corruption (e.g., if unclipped or differently clipped reads should be searched).
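To give a feel for the disk requirement above, here is a rough back-of-the-envelope sketch. It assumes sequence data is packed at 2 bits per base (4 bases per byte) and ignores ambiguity codes, sequence identifiers, and index files, so the result is only a lower bound; the slide itself does not state a byte figure.

```python
BASES_PER_BYTE = 4  # assumption: 2-bit packing, no ambiguity or index overhead


def packed_sequence_bytes(total_bases, copies=1, bases_per_byte=BASES_PER_BYTE):
    """Lower-bound number of bytes for `copies` copies of packed sequence data."""
    per_copy = -(-total_bases // bases_per_byte)  # ceiling division
    return per_copy * copies


# Figures from the slide: 5 terabases per database, eight copies.
total = packed_sequence_bytes(5 * 10**12, copies=8)
print(f"{total / 1e12:.1f} TB")  # prints "10.0 TB"
```

Every re-dump or database update repeats the formatting and distribution of that volume, which is the cost the direct-search approach avoids.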
Direct BLAST searches against the SRA archive • Uses the SRA toolkit and the C++ BLAST API. • The smallest search unit is a “run”. • Multiple runs may be searched together. • Offers searches of 454 SRA transcripts (grouped by organism) on the NCBI web page. • Clipped application reads are searched.
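As an illustration of searching several runs together, the sketch below builds a command line for the `blastn_vdb` program. It assumes that this program, distributed with current releases of the SRA toolkit, is the relevant entry point and that it accepts a space-separated list of run accessions in `-db`; the talk does not name a command-line tool. The accessions and file names are placeholders.

```python
import shlex


def build_run_search_command(runs, query_fasta, out_path, program="blastn_vdb"):
    """Return an argument list that searches one or more SRA runs directly.

    Assumption: the program takes a space-separated accession list in -db.
    """
    if not runs:
        raise ValueError("at least one run accession is required")
    return [program, "-db", " ".join(runs), "-query", query_fasta, "-out", out_path]


# Placeholder accessions and file names, for illustration only.
cmd = build_run_search_command(["SRR_RUN_A", "SRR_RUN_B"], "query.fa", "hits.txt")
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where the toolkit is installed. No database is formatted first, because the runs themselves are the search set.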
Advantages • The search set offered no longer depends on how quickly BLAST databases can be produced and distributed. • Changes to the SRA archive are seen immediately (e.g., a change in the clipping algorithm).
Three most popular organisms • Human • Sus scrofa • Tachyglossus aculeatus Counts cover searches after April 29, 2009, and include only those with an average of two or more searches per session.
Future development • Allow users to build custom search sets. • Take mate-pair information into account. • Combine SRA searches with traditional BLAST database searches.
Acknowledgements • Kurt Rodarmer • Eugene Yaschenko • Ty Roach • Martin Shumway • Christopher O’Sullivan • Vahram Avagyan • Christiam Camacho • Yan Raytselis • Irena Zaretskaya