
Patterns and Profiles

Lisa Mullan, HGMP-RC


Presentation Transcript


  1. Patterns and Profiles Lisa Mullan, HGMP-RC

  2. Terminology
     Homologs: two proteins that share a common ancestor
     • Usually similar functions
     • Orthologs: different species
     • Paralogs: same genome
     Analogs: two sequences that have NO common ancestor, but have similar functions
     • Protein analogs may have the same fold.

  3. Multiple sequence alignments
     7 10
     CHERRIES
     CLEMENTIN-ES
     P-EAR--S
     GRE-ENAPPLES
     Most programs use “clustal”, a clustering algorithm.

  4. Multiple sequence alignments
     4 24
     P-EARS-----
     GREENAPPLES
     CLEMENTINES
     CHERR--I-ES

  5. Multiple sequence alignments
     0 24
     GREENAPPLES
     CHERR---IES
     P-EARS-----
     CLEMENTINES

  6. Multiple sequence alignments (cont.)
     Unaligned:
     GREENAPPLES
     CLEMENTINES
     CHERRIES
     PEARS
     Aligned:
     GREENAPPLES
     CLEMENTINES
     CHERR---IES
     P-EARS-----

  7. Multiple sequence alignments (cont.)
     CLUSTAL W (1.7) multiple sequence alignment

     Q40236/1-193  GTF-DQLQLVLRWPTSFCNGKNCKRTPKDFTIHGLWPDSEAGELNFCNPRASYTIVRHGTF
     Q40241/1-189  -----QLQLVLRWPTSFCNGKNCKRTPKDFTIHGLWPDSEAGELNFCNPRASYTIVRHGTF
     Q42513/1-193  GTF-NQLQLVLRWPASFCKGKKCERTPNNFTIHGLWPDIKGTILNNCNPDAKYASVTGGKF
     G255586/1-194 GAF-EYMQLVLQWPTAFCHTTPCKNIPSNFTIHGLWPDNVSTTLNFCGKEDDYNIIMDGP-
     Q40379/1-194  GAF-EYMQLVLQWPTTFCHTTPCKNIPSNFTIHGLWPDNVSTTLNFCGKEDDYNIIMDGP-
                         :****:**::**: . *:. *.:*********  .  ** *.   .*  :  *

     Q40236/1-193  EKRN---KHWPDLMRSKDNSMDNQEFWKHEYIKHGSCCTDLFNETQYFDLALVLKDRFDLLT
     Q40241/1-189  EKRN---KHWPDLMRSKDNSMDNQEFWKHEYIKHGSCCTDLFNETQYFDLALVLKDRFDLLT
     Q42513/1-193  VKRN---KHWPDLILTEAASLNSQGFWAYQFKKHGTCCSDLFNQEKYFDLALILKDKFDLLT
     G255586/1-194 EK-NGLYVRWPDLIREKADCMKTQNFWRREYIKHGTCCSEIYNQVQYFRLAMALKDKFDLLT
     Q40379/1-194  EK-NGLYVRWPDLIREKADCMKTQNFWRREYIKHGTCCSEIYNQVQYFRLAMALKDKFDLLT
                   :**     :****:  :  .:..* **  :: ***:**::::*: :** **: ***:*****

     Q40236/1-193  TFRIHGIVPRSSHTVDKIKKTIRSVTGVLPNLSCTKNMDLLEIGICFNREASKMIDCTRP
     Q40241/1-189  TFRIHGIVPRSSHTVDKIKKTIRSVTGVLPNLSCTKNMDLLEIGICFNREASKMIDCTRP
     Q42513/1-193  TFRNKGIIPKSTCTINKIQKTIRTVTGVVPNLSCTPTMELLEVGICFNRDASKLIDCDQP
     G255586/1-194 SLKNHGIIRGYKYTVQKINNTIKTVTKGYPNLSCTKGQELWEVGICFDSTAKNVIDCPNP
     Q40379/1-194  SLKNHGIIRGYKYTVQKINNTIKTVTKGYPNLSCTKGQELWEVGICFDSTAKNVIDCPNP
                   ::: :**:   . *::**::**::**   ******   :* *:****:  *.::*** .*

     Q40236/1-193  KTCNPGEDNLIGFP
     Q40241/1-189  KTCNPGEDNLIGFP
     Q42513/1-193  KTCDTSGNTEIFFP
     G255586/1-194 KTCKTASNQGIMFP
     Q40379/1-194  KTCKTASNQGIMFP
                   ***... :  * **
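The marker line under each CLUSTAL W block flags conservation column by column, with `*` under columns where every sequence carries the same residue. The following is a minimal sketch of that one rule only; the function name `conserved_columns` is made up for illustration, and the `:` and `.` markers are left out because they depend on residue similarity groups.

```python
def conserved_columns(rows):
    """Return a marker line with '*' under columns where every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # A column counts as conserved only if it holds one residue and no gap.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Final block of the alignment shown on the slide.
block = [
    "KTCNPGEDNLIGFP",
    "KTCNPGEDNLIGFP",
    "KTCDTSGNTEIFFP",
    "KTCKTASNQGIMFP",
    "KTCKTASNQGIMFP",
]
print(repr(conserved_columns(block)))
```

Run on the last block, the stars land under K, T, C, I, F and P, the same columns that carry `*` on the slide.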

  8. Multiple sequence alignments (cont.)
     ((Q40236/1-193:-0.00066, Q40241/1-189:0.00066):0.18460,
      Q42513/1-193:0.17928,
      (G255586/1-194:0.00258, Q40379/1-194:0.00258):0.32591);
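The parenthesised text on this slide is a guide tree in Newick-style notation: each sequence name is followed by a colon and a branch length, and nested parentheses group the closest sequences. A small sketch, assuming only that notation, that pulls out the leaf names and their branch lengths (the helper name `leaf_branch_lengths` is hypothetical, and internal branches are ignored):

```python
import re

# Tree text from the slide, with whitespace removed.
newick = (
    "((Q40236/1-193:-0.00066,Q40241/1-189:0.00066):0.18460,"
    "Q42513/1-193:0.17928,"
    "(G255586/1-194:0.00258,Q40379/1-194:0.00258):0.32591);"
)


def leaf_branch_lengths(tree):
    """Map each leaf name to the length of the branch leading to it."""
    # A leaf is a name directly followed by ':' and a number. Internal nodes
    # are unnamed here, so the ')' before their ':' never matches a name.
    pairs = re.findall(r"([\w/-]+):(-?\d+(?:\.\d+)?)", tree)
    return {name: float(length) for name, length in pairs}


for name, length in leaf_branch_lengths(newick).items():
    print(f"{name:15s} {length: .5f}")
```

The near-zero lengths for Q40236 and Q40241, and for G255586 and Q40379, reflect how similar each pair is in the alignment.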

  9. Motifs - assigned to the secondary structure of a protein
     E. coli trp repressor

  10. Leucine zipper motif L-X(6)-L-X(6)-L-X(6)-L
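A pattern written this way can be searched for mechanically once it is turned into a regular expression. The sketch below assumes the usual reading of the notation: elements separated by `-`, `X` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, and `(n)` or `(n,m)` for repeats. The function name `prosite_to_regex` is made up for this example, and the sketch covers only those elements.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a pattern such as L-X(6)-L into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core in ("x", "X"):
            core = "."                      # X stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {FYW}: anything except F, Y, W
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


zipper = prosite_to_regex("L-X(6)-L-X(6)-L-X(6)-L")
print(zipper)

# Toy sequence with a leucine at every seventh position.
toy = "MK" + "LAAAAAA" * 3 + "LGG"
print(bool(re.search(zipper, toy)))
```

A hit only says that the spacing of leucines fits; it does not show that the region actually forms a zipper.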

  11. http://bioinf.man.ac.uk/dbbrowser/PRINTS/ “A fingerprint is a group of conserved motifs used to characterise a protein family”

  12. Domains
      Many definitions: depends who you speak to!
      • Domains are discrete structural units
      • Defined by structure
      • Domain boundaries can be inferred from careful sequence analysis
      • Domains are the common currency of protein function

  13. EFGHIVW
      EYAHMIW
      DYAHSLW
      EFGHPLW
      [ED]-[FY]-[GA]-H-X-[VIL]-W
      But there are more glutamates than aspartates in the alignment (three E to one D at position one)!
      And could X be represented more accurately by {FYW}?
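The pattern on this slide can be read straight off the alignment columns. A minimal sketch of that step follows; the function name `alignment_to_pattern` and the cut-off `max_choices=3` (more distinct residues than that collapse to `X`) are assumptions chosen so that the result matches the slide, not a published rule.

```python
def alignment_to_pattern(rows, max_choices=3):
    """Build a pattern from the residues seen in each alignment column."""
    elements = []
    for column in zip(*rows):
        seen = []
        for residue in column:
            if residue not in seen:     # keep first-seen order, drop repeats
                seen.append(residue)
        if len(seen) == 1:
            elements.append(seen[0])                    # fully conserved
        elif len(seen) <= max_choices:
            elements.append("[" + "".join(seen) + "]")  # small set of choices
        else:
            elements.append("X")                        # too variable to list
    return "-".join(elements)


alignment = ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]
print(alignment_to_pattern(alignment))
```

Because only the set of residues in each column is kept, the counts are discarded: `[ED]` looks the same whether E occurs once or three times, which is the weakness the slide points out.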

  14. EFGHIVW
      EYAHMIW
      DYAHSLW
      EFGHPLW
      So, let’s add some numbers to the problem!

      Position     E   D   F   Y   H   G   A   I   M   S   P   L   V   W
      One         15   5   0   0   0   0   0   0   0   0   0   0   0   0
      Two          0   0  10  10   0   0   0   0   0   0   0   0   0   0
      Three        0   0   0   0   0  10  10   0   0   0   0   0   0   0
      Four         0   0   0   0  20   0   0   0   0   0   0   0   0   0
      Five         2   2  -2  -2   2   2   2   2   2   2   2   2   2  -2
      Six          0   0   0   0   0   0   0   5   0   0   0  10   5   0
      Seven        0   0   0   0   0   0   0   0   0   0   0   0   0  20
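A matrix like this scores a candidate sequence by adding up one entry per position. The sketch below uses the numbers from the slide; the function name `profile_score` and the choice to score residues outside the fourteen listed columns as 0 are assumptions made for illustration.

```python
RESIDUES = "EDFYHGAIMSPLVW"   # column order used on the slide

PROFILE = [
    [15, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],       # position one
    [0, 0, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],      # position two
    [0, 0, 0, 0, 0, 10, 10, 0, 0, 0, 0, 0, 0, 0],      # position three
    [0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0],       # position four
    [2, 2, -2, -2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -2],     # position five
    [0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 10, 5, 0],       # position six
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20],       # position seven
]


def profile_score(sequence, profile=PROFILE, residues=RESIDUES, default=0):
    """Sum the profile entries for a sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal the number of positions")
    total = 0
    for row, residue in zip(profile, sequence):
        index = residues.find(residue)
        # Residues without a column on the slide get the assumed default score.
        total += row[index] if index >= 0 else default
    return total


for candidate in ("EFGHIVW", "DYAHSLW", "EFGHWVW"):
    print(candidate, profile_score(candidate))
```

Unlike the pattern, the matrix rewards E over D at position one and penalises F, Y or W at position five instead of simply allowing or forbidding them.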

  15. http://www.expasy.ch/prosite

  16. http://protein.toulouse.inra.fr/prodomCG.html

  17. http://www.ebi.ac.uk/interpro

  18. But... profiles do not support gaps.
      EFH-IIVW
      EYH--MIW
      DYHSISLW
      EFH-IPLW
      Hidden Markov Models introduce statistics into profiles.
      [Figure: hidden Markov model for this alignment; legible emission probabilities include E .75, D .25; F .50, Y .50; H 1.0; S 1.0; V .25, I .25, L .50; W 1.0]
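Several of the probabilities on this slide (E .75 and D .25, F .50 and Y .50, V .25, I .25 and L .50) are consistent with simple per-column residue frequencies in the gapped alignment. The sketch below computes those frequencies together with the fraction of gaps in each column. It is not a profile HMM: state transitions and pseudocounts are not modelled, and the function name `column_frequencies` is hypothetical.

```python
from collections import Counter


def column_frequencies(rows, gap="-"):
    """Return (residue frequencies, gap fraction) for every alignment column."""
    profile = []
    for column in zip(*rows):
        residues = [r for r in column if r != gap]
        counts = Counter(residues)
        # Frequencies are taken over the residues present, ignoring gaps.
        freqs = {r: n / len(residues) for r, n in counts.items()} if residues else {}
        profile.append((freqs, column.count(gap) / len(column)))
    return profile


alignment = ["EFH-IIVW", "EYH--MIW", "DYHSISLW", "EFH-IPLW"]
for position, (freqs, gap_fraction) in enumerate(column_frequencies(alignment), start=1):
    print(position, freqs, f"gaps={gap_fraction:.2f}")
```

The fourth column shows why gaps matter: S has frequency 1.0 among the residues present, yet three of the four sequences have no residue there at all.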

  19. http://www.sanger.ac.uk/pfam

  20. Pfam-A
      • 2,216 curated families with annotation.
      Pfam-B
      • 40,000 families derived from Prodom.

  21. http://smart.embl-heidelberg.org.de/

  22. Four character ID code
