Comparing Two Protein Sequences

Presentation Transcript


  1. Comparing Two Protein Sequences Cédric Notredame

  2. Our Scope: Look once Under the Hood. Pairwise Alignment methods are LIMITED. If You Understand the LIMITS, they Become VERY POWERFUL. Pairwise Alignment methods are POWERFUL.

  3. Outline
      - WHY Does It Make Sense To Compare Sequences?
      - HOW Can We Compare Two Sequences?
      - HOW Can We Align Two Sequences?
      - HOW Can I Search a Database?

  4. Why Does It Make Sense To Compare Sequences? Sequence Evolution

  5. Why Do We Want To Compare Sequences?
      wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
              | | |||||||| || | |||    ||| | ||||| ||||
      ????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
      [Figure labels: EXTRAPOLATE, Homology?, SwissProt ??????]
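The identity bars in this alignment can be recomputed column by column, and the same columns give a percent identity. A minimal Python sketch, assuming gaps are written as `-` and that identity is counted over gap-free columns only (other conventions divide by the full alignment length, which gives a lower figure); the function names are illustrative, not from any library.

```python
def identity_line(a: str, b: str) -> str:
    """Return a match line with '|' wherever both rows carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues among columns where neither row has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)


wheat = "--DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER"
query = "KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA"
print(identity_line(wheat, query))
print(f"{percent_identity(wheat, query):.1f}% identity")  # 29 of 38 gap-free columns
```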

  6. Why Do We Want To Compare Sequences?

  7. Why Does It Make Sense To Align Sequences?
      - Evolution is our Real Tool.
      - Nature is LAZY and Keeps re-using Stuff.
      - Evolution is mostly DIVERGENT: Same Sequence → Same Ancestor

  8. Why Does It Make Sense To Align Sequences? [Diagram: Same Sequence, Same Origin, Same Function, Same 3D Fold]

  9. Comparing Is Reconstructing Evolution

  10. An Alignment is a STORY [Diagram: the ancestral sequence ADKPKRPLSAYMLWLN is passed on to two descendants; through Mutations + Selection they become ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN]

  11. An Alignment is a STORY [Diagram: ancestor ADKPKRPLSAYMLWLN; Mutations + Selection; descendants ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN], aligned as:
      ADKPRRP---LS-YMLWLN
      ADKPKRPKPRLSAYMLWLN
      [Labels: Deletion, Insertion, Mutation]
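The story in slides 10 and 11 can be replayed on the ancestor. A minimal Python sketch: the three events (K→R at position 5, insertion of KPR after position 7, loss of the A at position 10) are read off the slide's sequences, positions are 1-based, and the helper names are hypothetical. It also labels the columns of the slide's alignment, which shows that an alignment on its own only records indels: telling an insertion from a deletion requires knowing the ancestor.

```python
ANCESTOR = "ADKPKRPLSAYMLWLN"


def mutate(seq: str, pos: int, new: str) -> str:
    """Substitute the residue at 1-based position `pos`."""
    return seq[: pos - 1] + new + seq[pos:]


def insert(seq: str, after: int, segment: str) -> str:
    """Insert `segment` right after 1-based position `after`."""
    return seq[:after] + segment + seq[after:]


def delete(seq: str, pos: int) -> str:
    """Remove the residue at 1-based position `pos`."""
    return seq[: pos - 1] + seq[pos:]


def column_events(top: str, bottom: str) -> list[str]:
    """Label each alignment column as match, substitution or indel."""
    labels = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            labels.append("indel")  # insertion or deletion: undecidable without the ancestor
        elif x == y:
            labels.append("match")
        else:
            labels.append("substitution")
    return labels


desc1 = insert(ANCESTOR, 7, "KPR")            # lineage 1: one insertion
desc2 = delete(mutate(ANCESTOR, 5, "R"), 10)  # lineage 2: K5 -> R, then loss of A10
print(desc1)  # ADKPKRPKPRLSAYMLWLN
print(desc2)  # ADKPRRPLSYMLWLN
print(column_events("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN"))
```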

  12. Evolution is NOT Always Divergent… [Figure, panels N and S: AFGP with (ThrAlaAla)n, Similar To Trypsinogen; AFGP with (ThrAlaAla)n, NOT Similar to Trypsinogen] (Chen et al., 1997, PNAS 94, 3811-3816)

  13. Evolution is NOT Always Divergent [Figure, panels N and S: AFGP with (ThrAlaAla)n, Similar To Trypsinogen; AFGP with (ThrAlaAla)n, NOT Similar to Trypsinogen] SIMILAR Sequences BUT DIFFERENT Origin.

  14. Evolution is NOT always Divergent… [Diagram: Same Sequence, Same Origin, Same Function, Same 3D Fold] But in MOST cases, you may assume it is… Similar Function DOES NOT REQUIRE Similar Sequence. Similar Sequence → Historical Legacy.

  15. How Do Sequences Evolve? Each Portion of a Genome has its own Agenda.

  16. How Do Sequences Evolve?
      Family              Ks     Ka
      Histone 3           6.4    0
      Insulin             4.0    0.1
      Interleukin I       4.6    1.4
      α-Globin            5.1    0.6
      Apolipoprot. AI     4.5    1.6
      Interferon G        8.6    2.8
      Rates in Substitutions/site/Billion Years, as measured on Mouse vs. Human (80 Million years). Ks: Synonymous Mutations; Ka: Non-Synonymous (Non-Neutral) Mutations.
      CONSTRAINED Genome Positions Evolve SLOWLY. EVERY Protein Family Has its Own Level Of Constraint.
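The table can be turned into a constraint ranking by dividing Ka by Ks: the smaller the ratio, the more strongly selection removes amino-acid changes. A short Python sketch using the slide's values; the Ka/Ks ratio is the standard summary for this kind of table, but the slide itself does not compute it.

```python
# (Ks, Ka) in substitutions/site/billion years, mouse vs. human, from the slide.
RATES = {
    "Histone 3": (6.4, 0.0),
    "Insulin": (4.0, 0.1),
    "Interleukin I": (4.6, 1.4),
    "alpha-Globin": (5.1, 0.6),
    "Apolipoprotein AI": (4.5, 1.6),
    "Interferon G": (8.6, 2.8),
}


def ka_ks(family: str) -> float:
    """Non-synonymous rate divided by synonymous rate for one family."""
    ks, ka = RATES[family]
    return ka / ks


# Most constrained (lowest ratio) first.
for family in sorted(RATES, key=ka_ks):
    print(f"{family:18s} Ka/Ks = {ka_ks(family):.3f}")
```

Histone 3, with Ka = 0, comes out as the most constrained family, while Apolipoprotein AI, Interferon G and Interleukin I tolerate the most amino-acid change relative to their synonymous rate.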

  17. Different molecular clocks for different proteins: another prediction

  18. How Do Sequences Evolve? [Figure: The Amino Acids Venn Diagram, with the sets Small, Aliphatic, Aromatic, Hydrophobic and Polar] To Make Things Worse, Every Residue has its Own Personality.

  19. How Do Sequences Evolve? In a structure, each Amino Acid plays a Special Role: in the core, SIZE MATTERS; on the surface, CHARGE MATTERS. [Figure: OmpR, C-terminal domain, with charged residues (+, -, -) marked]

  20. How Do Sequences Evolve? Accepted Mutations Depend on the Structure.
      - In the core: Big → Big, Small → Small, NO DELETION
      - On the surface: Charged → Charged, Small ↔ Big or Small, DELETIONS
      [Figure: the same structure, with charged residues (+, -, -) marked]

  21. How Can We Compare Sequences? Substitution Matrices

  22. How Can We Compare Sequences? To Compare Two Sequences, We need Their Structure and Their Function. We Do Not Have Them!!!

  23. How Can We Compare Sequences? [Diagram: Same Sequence, Same Origin, Same Function, Same 3D Fold] We will Need To Replace Structural Information With Sequence Information. It CANNOT Work ALL THE TIME!!!

  24. How Can We Compare Sequences? To Compare Sequences, We need to Compare Residues. We Need to Know How Much it COSTS to SUBSTITUTE an Alanine into an Isoleucine, a Tryptophan into a Glycine, … The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX. How do we derive that matrix?

  25. How Can We Compare Sequences? Using Knowledge Could Work [Figure: The Amino Acids Venn Diagram, with the sets Small, Aliphatic, Aromatic, Hydrophobic and Polar], But we do not know enough about Evolution and Structure. Using Data works better.

  26. How Can We Compare Sequences? Making a Substitution Matrix
      - Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
      - Align them…
      - Count each mutation in the alignments:
          - 25 Tryptophans into Phenylalanine
          - 30 Isoleucines into Leucine
          - …
      - For each mutation, set the substitution score to the log-odds ratio: $\text{Score} = \log\left(\frac{\text{Observed}}{\text{Expected by chance}}\right)$
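The recipe above can be sketched in Python. This is a simplified illustration under stated assumptions, not the actual PAM or BLOSUM procedure: it counts aligned residue pairs in a couple of toy alignments (hypothetical data), estimates the observed pair frequency and the background residue frequencies, and scores each pair as log2(observed / expected). Real matrices add corrections (sequence clustering in BLOSUM, extrapolation in PAM, pseudocounts for unseen pairs) and scale and round the scores.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def log_odds_matrix(aligned_pairs):
    """Return {(a, b): score} with score = log2(observed / expected by chance)."""
    pair_counts = Counter()
    residue_counts = Counter()
    for s1, s2 in aligned_pairs:
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue  # gapped columns are not substitutions
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())
    freq = {r: c / n_residues for r, c in residue_counts.items()}
    scores = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        observed = pair_counts[(a, b)] / n_pairs
        # An unordered pair a != b can arise as (a, b) or (b, a) by chance.
        expected = freq[a] * freq[b] * (1 if a == b else 2)
        if observed > 0:  # unseen pairs are left unscored (no pseudocounts here)
            scores[(a, b)] = scores[(b, a)] = math.log2(observed / expected)
    return scores


toy = [("AAWI", "AAWL"), ("AWIL", "AWLL")]  # hypothetical alignments
scores = log_odds_matrix(toy)
for pair in [("A", "A"), ("W", "W"), ("I", "L"), ("L", "L")]:
    print(pair, round(scores[pair], 3))
```

In this toy data the rarer W gets a higher diagonal score than the common A, which echoes the observation that a rare, strongly conserved residue such as W stands out on the diagonal.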

  27. How Can We Compare Sequences? Making a Substitution Matrix. The Diagonal Indicates How Conserved a residue tends to be: W is VERY Conserved. Some Residues mutate more Easily into other, similar residues. Cysteines that make disulfide bridges and those that do not get averaged together.

  28. How Can We Compare Sequences? Making a Substitution Matrix

  29. How Can We Compare Sequences? Using a Substitution Matrix: Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment.
      ADKPRRP---LS-YMLWLN
      ADKPKRPKPRLSAYMLWLN
      [Labels: Deletion, Insertion, Mutation]
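Finding the best alignment is usually done by dynamic programming; for a global alignment this is the Needleman-Wunsch algorithm. A minimal Python sketch, phrased with similarity scores that are maximized rather than costs that are minimized (the two views are equivalent up to a sign). The linear gap score of -4 and the identity-based `toy_score` function are hypothetical choices for illustration, not a real substitution matrix.

```python
def needleman_wunsch(a: str, b: str, score, gap: int = -4):
    """Global alignment of a and b maximizing summed substitution scores
    plus a linear gap score for every gapped position."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # identity or substitution
                F[i - 1][j] + gap,  # a[i-1] faces a gap
                F[i][j - 1] + gap,  # b[j-1] faces a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(x: str, y: str) -> int:
    """Hypothetical scoring: +5 for an identity, -1 for any substitution."""
    return 5 if x == y else -1


best, top, bottom = needleman_wunsch("ADKPRRPLSYMLWLN", "ADKPKRPKPRLSAYMLWLN", toy_score)
print(best)  # 53: 14 identities, 1 substitution, 4 gapped positions
print(top)
print(bottom)
```

Several alignments can reach the same best score here; the traceback returns one of them, which may place the gaps differently from the slide.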

  30. Scoring an Alignment: Raw Score
      TPEA
      ¦| |
      APGA
      Score = 1 + 6 + 0 + 2 = 9
      • Question: Is it possible to get such a good alignment by chance only?
      • Most popular Substitution Matrices:
          • PAM250
          • BLOSUM62 (Most widely used)
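The raw score is a sum of per-column look-ups in the substitution matrix. A Python sketch: the four values used (T/A = 1, P/P = 6, E/G = 0, A/A = 2) are the PAM250 entries for these pairs, which is consistent with the slide's total of 9. Only those four entries are included, so this is not a usable matrix for other sequences.

```python
# Only the PAM250 entries needed for this example. Keys are frozensets because
# the matrix is symmetric; frozenset({"P"}) stands for the P/P diagonal entry.
PAM250_SUBSET = {
    frozenset({"T", "A"}): 1,
    frozenset({"P"}): 6,
    frozenset({"E", "G"}): 0,
    frozenset({"A"}): 2,
}


def raw_score(top: str, bottom: str) -> int:
    """Sum the substitution score of every column of a gap-free alignment."""
    return sum(PAM250_SUBSET[frozenset({x, y})] for x, y in zip(top, bottom))


print(raw_score("TPEA", "APGA"))  # 9
```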

  31. Insertions and Deletions
      • Gap Penalties: Gap Opening Penalty, Gap Extension Penalty
      • Opening a gap is more expensive than extending it
      Seq A  GARFIELDTHE----CAT
             |||||||||||    |||
      Seq B  GARFIELDTHELASTCAT
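With an affine gap model, a gap costs the opening penalty once plus an extension penalty for each further position. A Python sketch, assuming the convention cost = open + (length - 1) × extend (some programs charge the extension for every position, including the first) and hypothetical penalties open = 10, extend = 1. The second row is a hypothetical variant with the same four gap positions split into two gaps.

```python
import re


def gap_penalty(length: int, open_: float = 10.0, extend: float = 1.0) -> float:
    """Affine penalty for one gap: pay `open_` once, then `extend` per extra position."""
    if length <= 0:
        return 0.0
    return open_ + (length - 1) * extend


def total_gap_penalty(row: str, open_: float = 10.0, extend: float = 1.0) -> float:
    """Sum affine penalties over every run of '-' in one aligned row."""
    return sum(gap_penalty(len(run), open_, extend) for run in re.findall(r"-+", row))


print(total_gap_penalty("GARFIELDTHE----CAT"))  # one gap of 4: 10 + 3 = 13.0
print(total_gap_penalty("GARFIELD-THE---CAT"))  # gaps of 1 and 3: 10 + 12 = 22.0
```

With a purely linear penalty (say 4 per gapped position) both rows would cost 16; only the affine model prefers one long gap over several short ones.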

  32. How Can We Compare Sequences? Limits of the Substitution Matrices
      • They ignore non-local interactions and assume that identical residues are equivalent, wherever they occur in the sequence (ADKPKRPLSAYMLWLN)
      • They assume the evolution rate to be constant [Diagram: ancestor ADKPKRPLSAYMLWLN; Mutations + Selection; descendants ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN]

  33. How Can We Compare Sequences? Limits of the Substitution Matrices: Substitution Matrices Cannot Work!!!

  34. How Can We Compare Sequences? Limits of the Substitution Matrices: I know… But at least, could I get some idea of when they are likely to do all right?

  35. How Can We Compare Sequences? The Twilight Zone [Figure: %Sequence Identity vs. Length, with thresholds at 30% identity and a length of 100. Above 30%: Similar Sequence, Similar Structure (Same 3D Fold). Below 30%, the Twilight Zone: Different Sequence, Structure ????]

  36. How Can We Compare Sequences? The Twilight Zone: Substitution Matrices Work Reasonably Well on Sequences that have more than 30% identity over more than 100 residues.
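The rule of thumb can be written as a simple check. A Python sketch using the slide's two thresholds (30% identity, 100 residues); the function name is hypothetical, and in practice the boundary of the twilight zone shifts gradually with alignment length rather than switching sharply.

```python
def above_twilight_zone(identity_pct: float, aligned_length: int,
                        min_identity: float = 30.0, min_length: int = 100) -> bool:
    """True if a pairwise alignment clears both of the slide's thresholds."""
    return identity_pct > min_identity and aligned_length > min_length


print(above_twilight_zone(45.0, 250))  # True: substitution matrices should behave
print(above_twilight_zone(25.0, 250))  # False: twilight zone
print(above_twilight_zone(60.0, 40))   # False: too short to be confident
```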

  37. How Can We Compare Sequences? Which Matrix Shall I Use? Other Matrices Exist: BLOSUM42, BLOSUM62. The Initial PAM matrix was computed on 80% similar Proteins. It has been extrapolated to more distantly related sequences: PAM250, PAM350.
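The extrapolation treats the 1-PAM mutation probability matrix as one step of a Markov chain and multiplies it by itself, so PAM250 corresponds to its 250th power. A Python sketch on a made-up 3-letter alphabet: the probabilities in `PAM1_TOY` are hypothetical (not Dayhoff's), chosen so that about 1% of residues change per step, and the sketch shows how the chance of keeping the same residue drops as the PAM distance grows.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Hypothetical 1-PAM-like matrix: row i gives P(residue i -> residue j) in one step.
# Each row sums to 1, and 1% of residues change per step.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]

for n in (1, 80, 250, 350):
    pn = mat_pow(PAM1_TOY, n)
    print(n, [round(pn[i][i], 3) for i in range(3)])
```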

  38. How Can We Compare Sequences? Which Matrix Shall I Use? Choosing The Right Matrix may be Tricky…
      • GONNET 250 > BLOSUM62 > PAM250.
      • But This will depend on:
          • The Family.
          • The Program Used and Its Tuning.
          • Insertions, Deletions?
      • PAM: Distant Proteins → High Index (PAM350)
      • BLOSUM: Distant Proteins → Low Index (BLOSUM30)

  39. HOW Can We Align Two Sequences? Dot Matrices, Global Alignments, Local Alignment

  40. Dot Matrices. QUESTION: What are the elements shared by two sequences?

  41. Dot Matrices
      >Seq1
      THEFATCAT
      >Seq2
      THELASTCAT
      [Figure: dot matrix of Seq1 against Seq2, annotated with Window and Stringency]

  42. Dot Matrices [Figure labels: Sequences, Window size, Stringency]

  43. Dot Matrices: Stringency [Figure panels: Window=1, Stringency=1; Window=11, Stringency=7; Window=25, Stringency=15]
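A windowed dot matrix can be computed directly: a dot is placed at (i, j) when the window of `window` residues starting at position i of the first sequence and the one starting at position j of the second share at least `stringency` identical positions. A Python sketch on the slide's sequences; scoring by identity and placing the dot at the window's start (rather than its centre) are simplifying assumptions, and real dot-plot programs may score windows with a substitution matrix.

```python
def dot_matrix(s1: str, s2: str, window: int = 1, stringency: int = 1) -> list[list[bool]]:
    """dots[i][j] is True when s1[i:i+window] and s2[j:j+window]
    have at least `stringency` identical positions."""
    rows = len(s1) - window + 1
    cols = len(s2) - window + 1
    dots = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            matches = sum(s1[i + k] == s2[j + k] for k in range(window))
            dots[i][j] = matches >= stringency
    return dots


def show(s1: str, s2: str, dots: list[list[bool]]) -> None:
    """Print s1 down the left side and s2 across the top; '*' marks a dot."""
    print("  " + " ".join(s2[: len(dots[0])]))
    for i, row in enumerate(dots):
        print(s1[i] + " " + " ".join("*" if d else "." for d in row))


seq1, seq2 = "THEFATCAT", "THELASTCAT"
show(seq1, seq2, dot_matrix(seq1, seq2))  # window 1: every identical letter
show(seq1, seq2, dot_matrix(seq1, seq2, window=3, stringency=3))  # shared 3-letter words only
```

Raising the window and stringency removes the scattered single-letter dots and keeps only the shared words THE, TCA and CAT, which is the noise filtering the slide's panels illustrate.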

  44. Dot Matrices [Figure]
