
Pairwise alignment

Pairwise alignment, Lesson 6. Based on a presentation by Irit Gat-Viks, which is based on a presentation by Amir Mitchel (Introduction to Bioinformatics course, Bioinformatics Unit, Tel Aviv University) and Benny Shomer (Bar-Ilan University).


Presentation Transcript


  1. Pairwise alignment, Lesson 6. Based on a presentation by Irit Gat-Viks, which is based on a presentation by Amir Mitchel (Introduction to Bioinformatics course, Bioinformatics Unit, Tel Aviv University) and Benny Shomer (Bar-Ilan University).

  2. Definition
     Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical characters in the sequences.
         VLSPADKTNVKAAWAKVGAHAAGHG
         ||| |    |   |||| |  ||||
         VLSEAEWQLVLHVWAKVEADVAGHG

  3. Sequence comparisons
     Goal: comparing two specific sequences
     • Single pairwise comparison
     • We wish to optimize for accuracy, not speed
     • Dynamic programming methods (Smith-Waterman, Needleman-Wunsch)
     • Identify homologs, common domains, common active sites, etc.
     Goal: similarity search on a sequence database
     • Multiple pairwise comparisons
     • We wish to optimize for speed, not accuracy
     • BLAST, FASTA programs
     • Next goal: refine the database search. Are the reported matches really interesting?

  4. How similar are two sequences? • The common measure of sequence similarity is the alignment score • Simpler measures, e.g., % identity, are also common • Both require an algorithm that computes the optimal alignment between the sequences
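As a small illustration of the simpler measure, the sketch below computes % identity for two sequences that are already aligned (equal length, with "-" for gaps). The function name `percent_identity` is hypothetical, and dividing by the number of alignment columns is an assumption; other denominators (for example, the shorter sequence length) are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same non-gap character.

    Assumption: the denominator is the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # Count columns where both rows carry the same residue (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / columns


# The ungapped example from the Definition slide: 14 identical columns out of 25.
print(percent_identity("VLSPADKTNVKAAWAKVGAHAAGHG",
                       "VLSEAEWQLVLHVWAKVEADVAGHG"))
```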

  5. Comparison methods • Global alignment – Finds the best alignment across the whole of the two sequences. • Local alignment – Finds regions of similarity in parts of the sequences. [Figure: schematic contrasting global and local alignment]
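To make the global/local distinction concrete, here is a minimal score-only sketch of local alignment in the style of Smith-Waterman (the method named on the Sequence comparisons slide). The scoring values (match +1, mismatch -1, linear gap -1) and the function name `local_alignment_score` are illustrative assumptions, not parameters taken from the slides.

```python
def local_alignment_score(s: str, t: str,
                          match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of s and t (score only)."""
    prev = [0] * (len(t) + 1)   # previous row of the table
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere; this is what
            # makes the alignment local instead of global.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Only the shared "ACGT" region contributes; the flanking residues are ignored.
print(local_alignment_score("XXXACGTYYY", "ZZACGTZZ"))
```

A global alignment of the same two strings would also be charged for the unrelated flanks, which is why local alignment suits sequences that share only a limited region.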

  6. Pairwise Alignment - Scoring • The final score of the alignment is the sum of the positive scores and the penalty scores: Alignment score = + number of identities + number of similarities - number of gap insertions - number of gap extensions
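The sum described on this slide can be sketched as follows for an already aligned pair. Everything numeric here is an assumption for illustration: the weights, the zero score for dissimilar mismatches, and the residue groups in `SIMILAR_GROUPS` (a hypothetical stand-in for what a substitution matrix would decide).

```python
# Hypothetical groups of "similar" amino acids, for illustration only.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def alignment_score(aligned_a: str, aligned_b: str,
                    identity: int = 2, similarity: int = 1,
                    gap_open: int = 4, gap_extend: int = 1,
                    gap: str = "-") -> int:
    """Identities and similarities add; gap insertions and extensions subtract."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            row = "a" if x == gap else "b"
            # The first column of a gap run is an insertion; the rest extend it.
            score -= gap_extend if row == prev_gap_row else gap_open
            prev_gap_row = row
            continue
        prev_gap_row = None
        if x == y:
            score += identity
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            score += similarity
        # Dissimilar mismatches score 0 in this sketch (an assumption).
    return score


# Three identities (+6), one gap insertion (-4), one gap extension (-1).
print(alignment_score("AC--E", "ACDDE"))
```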

  7. Intuition of Dynamic Programming
     If we already have the optimal solution to:
         XY
         AB
     then we know the next pair of characters will be one of:
         XYZ      XY-      XYZ
         ABC  or  ABC  or  AB-
     (where "-" indicates a gap). So we can extend the match by determining which of these has the highest score.

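  8. V(i,j) := optimal score of the alignment of S' = s_1…s_i and T' = t_1…t_j (0 ≤ i ≤ n, 0 ≤ j ≤ m), where σ(x,y) is the score of aligning character x with character y and '-' is a space.
     V(i,j) has the following properties:
     • Base conditions (alignment with 0 elements ⇒ spacing), with V(0,0) = 0:
       $$V(i,0) = \sum_{k=1}^{i} \sigma(s_k, -), \qquad V(0,j) = \sum_{k=1}^{j} \sigma(-, t_k)$$
     • Recurrence relation, for 1 ≤ i ≤ n, 1 ≤ j ≤ m:
       $$V(i,j) = \max \begin{cases} V(i-1,j-1) + \sigma(s_i, t_j) \\ V(i-1,j) + \sigma(s_i, -) \\ V(i,j-1) + \sigma(-, t_j) \end{cases}$$
       First case: S' = s_1…s_{i-1} aligned with T' = t_1…t_{j-1}, and s_i with t_j.
       Second case: S' = s_1…s_{i-1} aligned with T' = t_1…t_j, and s_i with '-'.
       Third case: S' = s_1…s_i aligned with T' = t_1…t_{j-1}, and '-' with t_j.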

  9. Optimal Alignment - Tabular Computation • Add back pointer(s) from cell (i,j) to the father cell(s) realizing V(i,j). • Backtracking the alignment: trace the pointers back from (n,m) to (0,0). • Needleman-Wunsch, '70
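The tabular computation and the backtracking step can be sketched as below. This is a minimal sketch under stated assumptions: a simple scoring scheme (match +1, mismatch -1, linear gap -2) chosen for illustration, and a traceback that re-derives the father cell from the table instead of storing explicit back pointers. The function name `needleman_wunsch` is our own label for the sketch.

```python
def needleman_wunsch(s: str, t: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment: returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)

    def sub(i: int, j: int) -> int:
        # Score of aligning s_i with t_j (1-based indices, as on the slides).
        return match if s[i - 1] == t[j - 1] else mismatch

    # Base conditions: a prefix aligned against nothing is all spaces.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * gap
    for j in range(1, m + 1):
        V[0][j] = j * gap

    # Recurrence: best of diagonal (s_i with t_j), up (s_i with '-'),
    # and left ('-' with t_j).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + sub(i, j),
                          V[i - 1][j] + gap,
                          V[i][j - 1] + gap)

    # Traceback from (n, m) to (0, 0), preferring the diagonal on ties.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and V[i][j] == V[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return V[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

When several father cells realize V(i,j), more than one optimal alignment exists; this sketch reports only one of them.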

  10. PAM vs. BLOSUM • Choosing n • Different BLOSUM matrices are derived from blocks with different identity percentages (e.g., BLOSUM62 is derived from blocks in which sequences sharing at least 62% identity are clustered together). Larger n ⇒ smaller evolutionary distance. • A single PAM matrix was constructed from a dataset of sequences with at least 85% identity; the other PAM matrices were derived from it computationally. Larger n ⇒ larger evolutionary distance. • BLOSUM uses more sequences. [Figure labels: 62, 120, 250]

  11. DNA scoring matrices • Non-uniform substitutions in all nucleotides: [Figure: scoring matrix distinguishing match, mismatch (transition), and mismatch (transversion)]
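A non-uniform DNA scoring scheme of this kind can be sketched as a small function. The numeric values (match +1, transition -1, transversion -2) are illustrative assumptions, since the matrix from the slide did not survive in the transcript; the function name `dna_score` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_score(x: str, y: str,
              match: int = 1, transition: int = -1, transversion: int = -2) -> int:
    """Score one nucleotide pair, penalizing transversions more than transitions."""
    x, y = x.upper(), y.upper()
    if x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
        raise ValueError("expected one of A, C, G, T")
    if x == y:
        return match
    # Transition: purine<->purine or pyrimidine<->pyrimidine (A<->G, C<->T).
    # Transversion: a purine exchanged for a pyrimidine, or the reverse.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


# Print the full 4x4 matrix implied by these assumed values.
for x in "ACGT":
    print(x, [dna_score(x, y) for y in "ACGT"])
```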

  12. Topics to be Covered • Introduction • Comparison methods – Global, local alignment • Alignment parameters • Alignment scoring matrices – proteins • Alignment scoring matrices – DNA • Evaluation • Comparison programs • Choosing between Global / local alignment

  13. Example: Global or local? • Two human transcription factors: • SP1 factor, binds to GC-rich areas. • EGR-1 factor, active at the differentiation stage (FASTA formats from http://us.expasy.org/sprot/)

  14. >sp|P08047|SP1_HUMAN Transcription factor Sp1 - Homo sapiens (Human).
      MSDQDHSMDEMTAVVKIEKGVGGNNGGNGNGGGAFSQARSSSTGSSSSTGGGGQESQPSP
      LALLAATCSRIESPNENSNNSQGPSQSGGTGELDLTATQLSQGANGWQIISSSSGATPTS
      KEQSGSSTNGSNGSESSKNRTVSGGQYVVAAAPNLQNQQVLTGLPGVMPNIQYQVIPQFQ
      TVDGQQLQFAATGAQVQQDGSGQIQIIPGANQQIITNRGSGGNIIAAMPNLLQQAVPLQG
      LANNVLSGQTQYVTNVPVALNGNITLLPVNSVSAATLTPSSQAVTISSSGSQESGSQPVT
      SGTTISSASLVSSQASSSSFFTNANSYSTTTTTSNMGIMNFTTSGSSGTNSQGQTPQRVS
      GLQGSDALNIQQNQTSGGSLQAGQQKEGEQNQQTQQQQILIQPQLVQGGQALQALQAAPL
      SGQTFTTQAISQETLQNLQLQAVPNSGPIIIRTPTVGPNGQVSWQTLQLQNLQVQNPQAQ
      TITLAPMQGVSLGQTSSSNTTLTPIASAASIPAGTVTVNAAQLSSMPGLQTINLSALGTS
      GIQVHPIQGLPLAIANAPGDHGAQLGLHGAGGDGIHDDTAGGEEGENSPDAQPQAGRRTR
      REACTCPYCKDSEGRGSGDPGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERPFMCTW
      SYCGKRFTRSDELQRHKRTHTGEKKFACPECPKRFMRSDHLSKHIKTHQNKKGGPGVALS
      VGTLPLDSGAGSEGSGTATPSALITTNMVAMEAICPEGIARLANSGINVMQVADLQSINI
      SGNGF
      >sp|P18146|EGR1_HUMAN Early growth response protein 1 (EGR-1) (Krox-24 protein) (ZIF268) (Nerve growth factor-induced protein A) (NGFI-A) (Transcription factor ETR103) (Zinc finger protein 225) (AT225) - Homo sapiens (Human).
      MAAAKAEMQLMSPLQISDPFGSFPHSPTMDNYPKLEEMMLLSNGAPQFLGAAGAPEGSGS
      NSSSSSSGGGGGGGGGSNSSSSSSTFNPQADTGEQPYEHLTAESFPDISLNNEKVLVETS
      YPSQTTRLPPITYTGRFSLEPAPNSGNTLWPEPLFSLVSGLVSMTNPPASSSSAPSPAAS
      SASASQSPPLSCAVPSNDSSPIYSAAPTFPTPNTDIFPEPQSQAFPGSAGTALQYPPPAY
      PAAKGGFQVPMIPDYLFPQQQGDLGLGTPDQKPFQGLESRTQQPSLTPLSTIKAFATQSG
      SQDLKALNTSYQSQLIKPSRMRKYPNRPSKTPPHERPYACPVESCDRRFSRSDELTRHIR
      IHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLR
      QKDKKADKSVVASSATSSLSSYPSPVATSYPSPVTTSYPSPATTSYPSPVPTSFSSPGSS
      TYPSPVHSGFPSPSVATTYSSVPPAFPAQVSSFPSSAVTNSFSASTGLSDMTATFSPRTI
      EIC

  15. SP1 at swissprot

  16. EGR1 at swissprot

  17. Available software • http://en.wikipedia.org/wiki/Sequence_alignment_software • http://fasta.bioch.virginia.edu/fasta_www/home.html • LAlign (local alignment), PLalign (dot plot) • PRSS / PRFX (significance by Monte Carlo) • http://bioportal.weizmann.ac.il/toolbox/overview.html (many useful tools), Needle, Water. • Bl2seq (NCBI)

  18. Using LAlign • http://www.ch.embnet.org/software/LALIGN_form.html • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_006758.2 • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_066300.1

  19. Bl2seq at NCBI: http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

  20. Bl2seq results

  21. Conclusions • The proteins share only a limited region of sequence similarity. Therefore, the use of local alignment is recommended. • We found a local alignment that points to a possible structural similarity, which in turn points to a possible functional similarity. • Reasons to use global alignment: • Checking minor differences between close homologs. • Analyzing polymorphism. • A good reason
