1 / 13

Sequence Similarity Searching

Sequence Similarity Searching. 75321 Class 4 March 2010. Why Compare Sequences?. Identify sequences found in lab experiments What is this thing I just found? Compare new genes to known ones Compare genes from different species information about evolution

craigtorres
Download Presentation

Sequence Similarity Searching

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sequence Similarity Searching 75321 Class 4 March 2010

  2. Why Compare Sequences? Identify sequences found in lab experiments What is this thing I just found? Compare new genes to known ones Compare genes from different species information about evolution Guess functions for entire genomes full of new gene sequences

  3. Are there other sequences like this one? 1) Huge public databases - GenBank, Swissprot, etc. 2) Sequence comparison is the most powerful and reliable method to determine evolutionary relationships between genes 3) Similarity searching is based on alignment 4) BLAST and FASTA provide rapid similarity searching a. rapid = approximate (heuristic) b. false + and - scores

  4. Similarity ≠ Homology 1) 25% similarity ≥ 100 AAs is strong evidence for homology 2) Homology is an evolutionary statement which means “descent from a common ancestor” common 3D structure usually common function homology is all or nothing, you cannot say "50% homologous"

  5. How to Compare Sequences? GATGCCATAGAGCTGTAGTCGTACCCT <— —> CTAGAGAGC-GTAGTCAGAGTGTCTTTGAGTTCC Manually line them up and count? an alignment program can do it for you or a just use a text editor Dot Plot shows regions of similarity as diagonals

  6. GlobalvsLocalsimilarity 1) Global similarity uses complete aligned sequences - total % matches GCGGAP program, Needleman & Wunch algorithm 2) Local similarity looks for best internal matching region between 2 sequences GCGBESTFIT program, Smith-Waterman algorithm, BLAST and FASTA 3) dynamic programming optimal computer solution, not approximate

  7. Search with Protein, not DNA Sequences 1) 4 DNA bases vs. 20 amino acids - less chance similarity 2) can have varying degrees of similarity between different AAs - # of mutations, chemical similarity, PAM matrix 3) protein databanks are much smaller than DNA databanks

  8. Similarity is Based on Dot Plots 1) two sequences on vertical and horizontal axes of graph 2) put dots wherever there is a match 3) diagonal line is region of identity (local alignment) 4) apply a window filter - look at a group of bases, must meet % identity to get a dot

  9. Simple Dot Plot

  10. Dot plot filtered with 4 base window and 75% identity

  11. Dot plot of real data

  12. Global vs. Local Alignments

More Related