Searching Sequence Databases

BIO/CS 271 – Introduction to Bioinformatics. Searching Sequence Databases. Database Searching. How can we find a particular short sequence in a database of sequences (or one HUGE sequence)? Problem is identical to local sequence alignment, but on a much larger scale.

gwyn

Presentation Transcript


  1. BIO/CS 271 – Introduction to Bioinformatics: Searching Sequence Databases

  2. Database Searching
     • How can we find a particular short sequence in a database of sequences (or in one HUGE sequence)?
     • The problem is identical to local sequence alignment, but on a much larger scale.
     • We must also have some idea of the significance of a database hit.
     • Databases always return some kind of hit; how much attention should be paid to the result?
     Database Searches

  3. BLAST
     • BLAST: Basic Local Alignment Search Tool
     • A heuristic approximation of the Smith-Waterman local alignment algorithm
     • Sacrifices some search sensitivity for speed
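
For comparison with the speed/sensitivity trade-off above, here is a minimal sketch of exact local alignment scoring by dynamic programming (the Smith-Waterman recurrence). The scoring values (`match`, `mismatch`, `gap`) and the function name are illustrative assumptions, not taken from the slides. The nested loops show why the exact method is costly: every query position is compared with every database position.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

The running time is proportional to the product of the two sequence lengths, which is what makes an exhaustive scan of a large database expensive.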

  4. The BLAST algorithm
     • Break the search sequence into words of length W
     • W = 3 for proteins, W = 12 for DNA
     • Include in the search all words that score above a threshold value (T) when compared with any word of the search sequence
     • Example: the search sequence MCGPFILGTYC breaks into the words MCG, CGP, GPF, PFI, FIL, ILG, LGT, GTY, TYC
     • Words included for MCG: MCG, MCT, …, MCN, …
     • Words included for CGP: CGP, MGP, …, CTP, …
     • This list can be computed in linear time
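
The word-splitting step and the expanded word list can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a real substitution matrix (identical residues score 5, everything else scores -1), the threshold is treated as "score at least T", and the function names are invented for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(a, b):
    """Hypothetical residue score; a real search would use a substitution matrix."""
    return 5 if a == b else -1


def query_words(query, w=3):
    """All overlapping words of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word, threshold, score=toy_score):
    """Every word of the same length that scores at least `threshold` against `word`."""
    result = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        total = sum(score(x, y) for x, y in zip(word, candidate))
        if total >= threshold:
            result.append("".join(candidate))
    return result


words = query_words("MCGPFILGTYC")
neighbors = neighborhood("MCG", threshold=9)
```

With this toy scoring, a threshold of 9 admits the word itself plus every word that differs from it at exactly one position. A real substitution matrix would admit a different, biologically motivated set.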

  5. The BLAST Algorithm (2)
     • Search for the words in the database
     • Word locations can be precomputed and indexed
     • Searching for a short string in a long string
     • Regular expression matching: finite-state automaton (FSA)
     • HSP (High-scoring Segment Pair): a match between a query word and the database
     • Find a “hit”: two non-overlapping HSPs on a diagonal within distance A
     • Extend the hit until the score falls more than a threshold value, X, below the best score found so far
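
The search steps on this slide can be sketched in a few functions. This is a simplified illustration, not BLAST's actual implementation: it matches exact words only, uses a hypothetical +1/-1 scoring scheme, extends only to the right without gaps, and stops extending once the score drops more than `x_drop` below the best score seen (an assumed reading of the threshold X). The names `max_dist` and `x_drop` stand in for the slide's A and X.

```python
from collections import defaultdict


def build_index(db, w):
    """Precompute the start positions of every length-w word in the database."""
    index = defaultdict(list)
    for j in range(len(db) - w + 1):
        index[db[j:j + w]].append(j)
    return index


def word_hits(query, index, w):
    """(query_pos, db_pos) for every exact word match found through the index."""
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits


def two_hit_seeds(hits, w, max_dist):
    """Pairs of non-overlapping word matches on one diagonal, at most max_dist apart."""
    by_diag = defaultdict(list)
    for i, j in hits:
        by_diag[j - i].append(i)  # matches on the same diagonal share db_pos - query_pos
    seeds = []
    for diag, starts in by_diag.items():
        last = None
        for s in sorted(starts):
            if last is not None and s - last < w:
                continue  # overlaps the previous match, so ignore it
            if last is not None and s - last <= max_dist:
                seeds.append((diag, last, s))
            last = s
    return seeds


def extend_right(query, db, i, j, x_drop, match=1, mismatch=-1):
    """Ungapped extension to the right from query[i] / db[j].

    Returns (best_score, length_of_best_extension).
    """
    score = best = best_len = k = 0
    while i + k < len(query) and j + k < len(db):
        score += match if query[i + k] == db[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break
    return best, best_len


db = "GGGACGTTACGAGGG"
query = "ACGTTACGA"
index = build_index(db, w=3)
seeds = two_hit_seeds(word_hits(query, index, w=3), w=3, max_dist=10)
extension = extend_right(query, db, 0, 3, x_drop=2)
```

Indexing the database once means each query word is looked up in constant expected time instead of by rescanning the database.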

  6. Results from a BLAST search

  7. Search Significance Scores
     • A search will always return some hits.
     • How can we determine how “unusual” a particular alignment score is?
     • ORFs
     • Assumptions
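
One common way to quantify how unusual a score is, not spelled out on this slide, is the Karlin-Altschul expected-hit count E = K·m·n·e^(−λS): the number of alignments scoring at least S expected by chance for a query of length m and a database of length n. The sketch below assumes that formula. The values of `lam` and `k` depend on the scoring system, and those used here are purely illustrative placeholders.

```python
import math


def expected_hits(score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `score`.

    lam and k are scoring-system parameters; callers must supply real values.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative placeholder parameters, not values from any real scoring matrix.
e_low_score = expected_hits(30, query_len=100, db_len=1_000_000, lam=0.3, k=0.1)
e_high_score = expected_hits(60, query_len=100, db_len=1_000_000, lam=0.3, k=0.1)
```

A higher score gives an exponentially smaller expected count, while a larger database gives a proportionally larger one, which is why the same raw score can be significant in a small database and unremarkable in a large one.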
