Comparing Two Protein Sequences - PowerPoint PPT Presentation

Presentation Transcript

  1. Comparing Two Protein Sequences Cédric Notredame

  2. Our Scope: Look once Under the Hood. Pairwise Alignment methods are POWERFUL. Pairwise Alignment methods are LIMITED. If You Understand the LIMITS they Become VERY POWERFUL.

  3. Outline -WHY Does It Make Sense To Compare Sequences -HOW Can we Compare Two Sequences ? -HOW Can we Align Two Sequences ? -HOW can I Search a Database ?

  4. Why Does It Make Sense To Compare Sequences ? Sequence Evolution

  5. Why Do We Want To Compare Sequences wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER | | |||||||| || | ||| ||| | |||| |||| ????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA EXTRAPOLATE Homology? SwissProt ??????

  6. Why Do We Want To Compare Sequences

  7. Why Does It Make Sense To Align Sequences ? -Evolution is our Real Tool. -Nature is LAZY and Keeps re-using Stuff. -Evolution is mostly DIVERGENT: Same Sequence -> Same Ancestor

  8. Why Does It Make Sense To Align Sequences ? Same Sequence Same Function Same Origin Same 3D Fold Many Counter-examples!

  9. Comparing Is Reconstructing Evolution

  10. An Alignment is a STORY ADKPKRPLSAYMLWLN ADKPKRPLSAYMLWLN ADKPKRPLSAYMLWLN Mutations + Selection ADKPKRPKPRLSAYMLWLN ADKPRRPLS-YMLWLN

  11. ADKPKRPLSAYMLWLN ADKPKRPLSAYMLWLN ADKPKRPLSAYMLWLN Mutations + Selection ADKPKRPKPRLSAYMLWLN ADKPRRPLS-YMLWLN ADKPRRP---LS-YMLWLN ADKPKRPKPRLSAYMLWLN Deletion Insertion Mutation An Alignment is a STORY
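
The "story" told by the alignment on this slide can be read column by column. The following is a minimal sketch, not part of the original slides: the function name `classify_columns` and the labels it returns are hypothetical. It only counts what each column shows (identical residues, a substitution, or a gap on one side). Whether a gap is an insertion or a deletion depends on which sequence is taken as the ancestor, so the sketch does not decide that.

```python
def classify_columns(top: str, bottom: str) -> dict:
    """Count matches, mutations and gap columns in a pairwise alignment.

    Assumption: both strings are already aligned (same length, '-' marks a gap).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mutation": 0, "gap_in_top": 0, "gap_in_bottom": 0}
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            continue  # a column with two gaps carries no information
        if a == "-":
            counts["gap_in_top"] += 1
        elif b == "-":
            counts["gap_in_bottom"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mutation"] += 1
    return counts


# The two aligned sequences shown on the slide.
story = classify_columns("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
print(story)
```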

  12. Evolution is NOT Always Divergent… AFGP with (ThrAlaAla)n Similar To Trypsinogen N S AFGP with (ThrAlaAla)n NOT Similar to Trypsinogen. Chen et al, 97, PNAS, 94, 3811-16

  13. Evolution is NOT Always Divergent AFGP with (ThrAlaAla)n Similar To Trypsinogen N S AFGP with (ThrAlaAla)n NOT Similar to Trypsinogen SIMILAR Sequences BUT DIFFERENT origin

  14. Evolution is NOT always Divergent… But in MOST cases, you may assume it is… Same Sequence, Same Origin, Same Function, Same 3D Fold. Similar Function DOES NOT REQUIRE Similar Sequence. Similar Sequence -> Historical Legacy

  15. How Do Sequences Evolve Each Portion of a Genome has its own Agenda.

  16. How Do Sequences Evolve ?

      Family             Ks     Ka
      Histone3           6.4    0
      Insulin            4.0    0.1
      Interleukin I      4.6    1.4
      a-Globin           5.1    0.6
      Apolipoprot. AI    4.5    1.6
      Interferon G       8.6    2.8

      Rates in Substitutions/site/Billion Years, as measured on Mouse Vs Human (80 Million years). Ks: Synonymous Mutations; Ka: Non-Neutral. CONSTRAINED Genome Positions Evolve SLOWLY. EVERY Protein Family Has its Own Level Of Constraint.

  17. Different molecular clocks for different proteins--another prediction

  18. How Do Sequences Evolve ? The Amino Acids Venn Diagram [Figure: Venn diagram grouping the amino acids into Small, Aliphatic, Aromatic, Hydrophobic and Polar sets] To Make Things Worse, Every Residue has its Own Personality

  19. How Do Sequences Evolve ? In a structure, each Amino Acid plays a Special Role. In the core, SIZE MATTERS. On the surface, CHARGE MATTERS. [Figure: OmpR, Cter Domain, with charged residues marked + and -]

  20. How Do Sequences Evolve ? Accepted Mutations Depend on the Structure. Big -> Big; Small -> Small; NO DELETION. Charged -> Charged; Small <-> Big or Small; DELETIONS.

  21. How Can We Compare Sequences ? Substitution Matrices

  22. How Can We Compare Sequences ? To Compare Two Sequences, We need: Their Structure, Their Function. We Do Not Have Them !!!

  23. How Can We Compare Sequences ? Same Sequence We will Need To Replace Structural Information With Sequence Information. Same Origin Same Function Same 3D Fold It CANNOT Work ALL THE TIME !!!

  24. How Can We Compare Sequences ? To Compare Sequences, We need to Compare Residues. We Need to Know How Much it COSTS to SUBSTITUTE an Alanine into an Isoleucine, a Tryptophan into a Glycine … The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX. How to derive that matrix?

  25. How Can We Compare Sequences ? Using Knowledge Could Work [Figure: Venn diagram grouping the amino acids into Small, Aliphatic, Aromatic, Hydrophobic and Polar sets] But we do not know enough about Evolution and Structure. Using Data works better.

  26. How Can We Compare Sequences ? Making a Substitution Matrix -Take 100 nice pairs of Protein Sequences, easy to align (80% identical). -Align them… -Count each mutation in the alignments -25 Tryptophans into Phenylalanine -30 Isoleucines into Leucine … -For each mutation, set the substitution score to the log odds ratio: $\text{score} = \log \dfrac{\text{Observed}}{\text{Expected by chance}}$
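
The recipe on this slide (align, count, take the log of observed over expected) can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to build PAM or BLOSUM: the function name `log_odds_scores` is hypothetical, the logarithm base (2) is an assumption, the expected frequency is taken as the product of background residue frequencies, and the toy alignment is invented for the example. Real matrices also scale and round the scores.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Log-odds substitution scores from a list of (top, bottom) aligned strings.

    Columns containing a gap ('-') are ignored. Pairs are unordered,
    so (I, L) and (L, I) are counted together.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for top, bottom in aligned_pairs:
        for a, b in zip(top, bottom):
            if a == "-" or b == "-":
                continue
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        # An unordered pair of different residues can arise in two orders.
        expected = freq_a * freq_b if a == b else 2 * freq_a * freq_b
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy data, invented for illustration only.
scores = log_odds_scores([("WWIL", "WWLL")])
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

A positive score means the pair is seen more often than chance would predict; a negative score means it is seen less often.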

  27. You’re kidding! … I was struck by lightning twice too!! Gary Larson, The Far Side

  28. How Can We Compare Sequences ? Making a Substitution Matrix. The Diagonal Indicates How Conserved a residue tends to be. W is VERY Conserved. Some Residues are Easier To mutate into other similar Residues. Cysteines that make disulfide bridges and those that do not get averaged.

  29. How Can We Compare Sequences ? Making a Substitution Matrix

  30. How Can We Compare Sequences ? Using Substitution Matrix Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment ADKPRRP---LS-YMLWLN ADKPKRPKPRLSAYMLWLN Deletion Insertion Mutation

  31. Scoring an Alignment. Raw Score: TPEA aligned with APGA, Score = 1 + 6 + 0 + 2 = 9 • Question: Is it possible to get such a good alignment by chance only? • Most popular Substitution Matrices • PAM250 • Blosum62 (Most widely used)
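
The raw score on this slide is a sum of one substitution-matrix entry per aligned column. The sketch below assumes the slide used PAM250, which the slide does not state: the four entries are the standard PAM250 values for these residue pairs and they reproduce the slide's total of 9. The names `PAM250_SUBSET` and `raw_score` are hypothetical, and a real program would load the full 20x20 table.

```python
# Only the PAM250 entries needed for this example (assumed matrix, see lead-in).
# Keys are alphabetically sorted residue pairs.
PAM250_SUBSET = {
    ("A", "A"): 2,
    ("A", "T"): 1,
    ("E", "G"): 0,
    ("P", "P"): 6,
}


def raw_score(top, bottom, matrix):
    """Sum the substitution scores of every column of a gap-free alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        total += matrix[tuple(sorted((a, b)))]
    return total


score = raw_score("TPEA", "APGA", PAM250_SUBSET)
print(score)
```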

  32. Insertions and Deletions • Gap Penalties: Gap Opening Penalty, Gap Extension Penalty • Opening a gap is more expensive than extending it. Seq A GARFIELDTHE----CAT Seq B GARFIELDTHELASTCAT
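
"Opening a gap is more expensive than extending it" is usually written as an affine gap cost. A minimal sketch follows; the penalty values (10 to open, 1 to extend) and the function names are assumptions chosen for illustration, and the convention used here charges the opening penalty for the first gap position and the extension penalty for each further one. Other programs define the two terms slightly differently.

```python
def affine_gap_cost(gap_length, gap_open=10, gap_extend=1):
    """Cost of one gap: opening penalty plus one extension per extra position."""
    if gap_length <= 0:
        return 0
    return gap_open + gap_extend * (gap_length - 1)


def gap_runs(aligned):
    """Lengths of the consecutive runs of '-' in one aligned sequence."""
    runs = []
    current = 0
    for ch in aligned:
        if ch == "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


seq_a = "GARFIELDTHE----CAT"  # Seq A from the slide
penalty = sum(affine_gap_cost(n) for n in gap_runs(seq_a))
print(gap_runs(seq_a), penalty)
```

With these assumed values, one gap of four positions costs less than four separate gaps of one position, which is the behaviour the slide describes.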

  33. How Can We Compare Sequences ? Limits of the substitution Matrices: They ignore non-local interactions and Assume that identical residues are equal. They assume evolution rate to be constant. [Figure: ADKPKRPLSAYMLWLN, Mutations + Selection, ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN]

  34. How Can We Compare Sequences ? Limits of the substitution Matrices Substitution Matrices Cannot Work !!!

  35. How Can We Compare Sequences ? Limits of the substitution Matrices. I know… But at least, could I get some idea of when they are likely to do all right?

  36. How Can We Compare Sequences ? The Twilight Zone. Similar Sequence -> Similar Structure. Different Sequence -> Structure ???? [Plot: % Sequence Identity against Length; above 30% identity at Length 100: Same 3D Fold; below: Twilight Zone]

  37. How Can We Compare Sequences ? The Twilight Zone Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
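
The rule of thumb on this slide (more than 30% identity over more than 100 residues) can be checked mechanically once an alignment exists. The sketch below is an illustration with hypothetical function names. It assumes one particular definition of percent identity (identical columns divided by columns where both sequences have a residue); other definitions divide by the full alignment length or by the shorter sequence and give different numbers.

```python
def percent_identity(top, bottom):
    """Return (percent identity, number of gap-free aligned columns)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(top, bottom) if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns), len(columns)


def outside_twilight_zone(top, bottom, min_identity=30.0, min_length=100):
    """True when the alignment passes both thresholds quoted on the slide."""
    identity, length = percent_identity(top, bottom)
    return identity > min_identity and length > min_length


# The alignment shown on slide 5 of this deck.
wheat = "--DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER"
other = "KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA"
identity, length = percent_identity(wheat, other)
print(round(identity, 1), length, outside_twilight_zone(wheat, other))
```

The slide 5 fragment is well above 30% identity but shorter than 100 residues, so the length criterion alone keeps it from passing this check.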

  38. How Can We Compare Sequences ? Which Matrix Shall I use? The Initial PAM matrix was computed on 80% similar Proteins. It has been extrapolated to more distantly related sequences: Pam 250, Pam 350. Other Matrices Exist: BLOSUM 42 BLOSUM 62 BLOSUM 62

  39. How Can We Compare Sequences ? Which Matrix Shall I use? Choosing The Right Matrix may be Tricky… • GONNET 250 > BLOSUM62 > PAM 250. • But This will depend on: • The Family. • The Program Used and Its Tuning. • Insertions, Deletions? PAM: Distant Proteins -> High Index (PAM 350). BLOSUM: Distant Proteins -> Low Index (Blosum30)

  40. HOW Can we Align Two Sequences ? Dot Matrices, Global Alignments, Local Alignment

  41. Dot Matrices QUESTION What are the elements shared by two sequences ?

  42. Dot Matrices >Seq1 THEFATCAT >Seq2 THELASTCAT [Figure: dot matrix with one sequence along each axis; the two parameters shown are Window and Stringency]

  43. Dot Matrices: Sequences, Window size, Stringency

  44. Dot Matrices: Stringency. Window=1, Stringency=1; Window=11, Stringency=7; Window=25, Stringency=15
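
The window and stringency settings on these slides can be made concrete with a small dot-matrix sketch. This is an illustration, not the program used for the slides: the function names are hypothetical, and it assumes the common definition in which a dot is placed at (i, j) when the windows starting at position i of the first sequence and position j of the second share at least `stringency` identical residues.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return the set of (i, j) window start positions that earn a dot."""
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Plain-text picture: seq_x across the top, seq_y down the side."""
    lines = ["  " + seq_x]
    for j, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1, seq2 = "THEFATCAT", "THELASTCAT"  # the two sequences from slide 42
loose = dot_matrix(seq1, seq2, window=1, stringency=1)
strict = dot_matrix(seq1, seq2, window=3, stringency=3)
print(render(seq1, seq2, loose))
print()
print(render(seq1, seq2, strict))
```

With window 1 every shared letter gives a dot, including chance matches; a larger window with a high stringency keeps only the longer shared stretches.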