Duncan Legge EMBL-EBI

Presentation Transcript


  1. Duncan Legge EMBL-EBI

  2. Introduction to Protein Signatures & InterPro Introduction to InterPro

  3. Protein Signatures: a protein signature is an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.

  4. Integration of signatures • InterPro • Foundations of InterPro • Manual curation

  5. InterPro Consortium: a consortium of 11 major signature databases

  6. What is the value of signatures? • Better at finding proteins with a common function • Find more distant homologues than BLAST does

  7. What is the value of signatures? • Better at finding proteins with a common function • Classification of proteins • Associate proteins that share function, domains, sequence or structure

  8. What is the value of signatures? • Better at finding proteins with a common function • Classification of proteins • Annotation of protein sequences • Define conserved regions of a protein, e.g. the location and type of domains, and key structural or functional sites

  9. Protein Signature Methods

  10. How are protein signatures made? Protein family/domain → multiple sequence alignment → build model → search → refine → protein signature. Significant matches: E-value 1e-49 ITWKGPVCGLDGKTYRNECALL • E-value 3e-42 AVPRSPVCGSDDVTYANECELK • E-value 5e-39 SVPRSPVCGSDGVTYGTECDLK • E-value 6e-10 HPPPGPVCGTDGLTYDNRCELR
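The "build model" and "search" steps can be sketched with a position-specific scoring matrix (PSSM), a simpler relative of the profiles and HMMs used by the member databases. This is a minimal sketch under stated assumptions: gap-free alignments, a uniform background frequency of 1/20 per amino acid, a pseudocount of 1, and (hypothetically) treating the first three matches on the slide as the seed alignment and the fourth as a sequence to test. The function names `build_pssm` and `best_window` are invented for illustration, and the sketch reports raw log-odds scores in bits rather than the E-values on the slide, which also depend on the size of the database searched.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds position-specific scoring matrix (in bits) from
    equal-length, gap-free aligned sequences. Assumes a uniform background
    frequency of 1/20 for every amino acid (a simplification)."""
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(alignment)
    pssm = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts stop unseen residues from scoring minus infinity.
            prob = (counts[aa] + pseudocount * background) / (n_seqs + pseudocount)
            column[aa] = math.log2(prob / background)
        pssm.append(column)
    return pssm


def best_window(pssm, sequence):
    """Slide the model along `sequence`; return (best score, start position)."""
    width = len(pssm)
    best_score, best_pos = float("-inf"), None
    for pos in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids (e.g. X) score 0.
        score = sum(col.get(sequence[pos + i], 0.0) for i, col in enumerate(pssm))
        if score > best_score:
            best_score, best_pos = score, pos
    return best_score, best_pos


# "Build model" from three of the aligned matches listed on the slide...
seeds = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
]
pssm = build_pssm(seeds)

# ...then "search": the fourth match scores above zero, a poly-W stretch below.
print(best_window(pssm, "HPPPGPVCGTDGLTYDNRCELR"))
print(best_window(pssm, "W" * 22))
```

In a real pipeline, convincing new hits would be added back to the alignment and the model rebuilt (the "refine" step); that loop is left out of this sketch.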

  11. Types of Protein signatures (sequence based) Multiple protein alignment

  12. Types of Protein signatures (sequence based) • Single motif methods: regular expression patterns, e.g. C-C-{P}-x(2)-C-[STDNEKPI]-C

  13. Types of Protein signatures (sequence based) • Single motif methods: regular expression patterns, e.g. C-C-{P}-x(2)-C-[STDNEKPI]-C, where a single letter (e.g. C) = must be this amino acid; x = any amino acid; ( ) = number of amino acids; { } = cannot be any of these; [ ] = any of these
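The pattern notation on this slide (PROSITE-style syntax) maps almost one-to-one onto ordinary regular expressions: `x` becomes `.`, `{P}` becomes `[^P]`, a count such as `(2)` becomes `{2}`, and `[...]` is kept as-is. The Python sketch below assumes elements are separated by `-`; the helper names `prosite_to_regex` and `find_pattern` are hypothetical, and the translator does not cover every PROSITE feature (for example `>` inside square brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as C-C-{P}-x(2)-C-[STDNEKPI]-C
    into a Python regular expression. Handles single residues, x, [..], {..},
    repeat counts like x(2) or x(2,4), and the < / > terminal anchors."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        element = element.strip("<>")
        # Split an optional repeat count off the end: "x(2,4)" -> "x", "2", "4".
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            part = "."                      # x = any amino acid
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"  # { } = cannot be any of these
        else:
            part = core                     # fixed residue, or [ ] = any of these
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + part + suffix)
    return "".join(regex_parts)


def find_pattern(pattern, sequence):
    """Return (start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


pattern = "C-C-{P}-x(2)-C-[STDNEKPI]-C"
print(prosite_to_regex(pattern))              # CC[^P].{2}C[STDNEKPI]C
print(find_pattern(pattern, "AACCAGHCSCAA"))  # [(2, 'CCAGHCSC')]
print(find_pattern(pattern, "AACCPGHCSCAA"))  # [] because {P} forbids P there
```

`finditer` reports non-overlapping matches only; a scanner that must report overlapping sites would wrap the expression in a lookahead such as `(?=(...))`.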

  14. Types of Protein signatures (sequence based) • Single motif methods: regular expression patterns • Multiple motif methods: identity matrices (fingerprints)
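A fingerprint is matched only when its several motifs are all found in the expected order. The Python sketch below illustrates that idea with a simplified, hypothetical scoring rule: each motif is stored as unweighted residue counts per column (an identity matrix built from the aligned motif sequences), a window scores the sum of those counts, and each motif must reach half of its best possible score and lie downstream of the previous motif. The motifs, the `min_fraction` threshold and the function names are invented for illustration; PRINTS uses its own scoring and significance estimates. Each motif takes its best window greedily, so an optimal combination can occasionally be missed.

```python
from collections import Counter


def identity_matrix(motif_alignment):
    """Per-column residue counts for one aligned motif (unweighted)."""
    width = len(motif_alignment[0])
    return [Counter(seq[i] for seq in motif_alignment) for i in range(width)]


def best_hit(matrix, sequence, start):
    """Highest-scoring window at or after `start`. A window scores, per column,
    how many motif sequences share its residue, summed over all columns."""
    width = len(matrix)
    best_score, best_pos = -1, None
    for pos in range(start, len(sequence) - width + 1):
        score = sum(col[sequence[pos + i]] for i, col in enumerate(matrix))
        if score > best_score:
            best_score, best_pos = score, pos
    return best_score, best_pos


def match_fingerprint(motifs, sequence, min_fraction=0.5):
    """Return motif start positions if every motif is found, in order and
    without overlap; otherwise None. `min_fraction` is a hypothetical cut-off
    relative to the best possible score of each motif."""
    positions, start = [], 0
    for alignment in motifs:
        matrix = identity_matrix(alignment)
        max_score = sum(max(col.values()) for col in matrix)
        score, pos = best_hit(matrix, sequence, start)
        if pos is None or score < min_fraction * max_score:
            return None
        positions.append(pos)
        start = pos + len(matrix)  # the next motif must lie downstream
    return positions


# Toy two-motif fingerprint (invented sequences, not a real PRINTS entry).
motifs = [["GKT", "GKS"], ["DEAH", "DEAD"]]
print(match_fingerprint(motifs, "AAGKTAAADEADAA"))  # [2, 8]
print(match_fingerprint(motifs, "AADEADAAGKTAA"))   # None: motifs out of order
```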

  15. Types of Protein signatures (sequence based) • Single motif methods: regular expression patterns • Multiple motif methods: identity matrices (fingerprints) • Full domain alignment methods: profiles (Profile Library) and Hidden Markov Models (a mathematical model of amino acid probability) [Figure: hidden Markov model with match states M1-M4, insert states I1-I3 and delete states D2-D3]
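The HMM's "mathematical model of amino acid probability" can be written as a score. As a simplified sketch, take a sequence x_1…x_L aligned without gaps to match states M_1…M_L, with emission probabilities e_{M_k}(a), transition probabilities a_{M_{k-1}M_k} (M_0 being the begin state) and background amino acid frequencies q(a). Its log-odds score against a background-only model, and the E-value of a score S in a search of N sequences (the expected number of chance hits scoring at least S), are:

```latex
S(x) = \sum_{k=1}^{L} \left[ \log_2 a_{M_{k-1}M_k} + \log_2 \frac{e_{M_k}(x_k)}{q(x_k)} \right]
\qquad
E(S) = N \cdot \Pr\left( S_{\mathrm{random}} \ge S \right)
```

A full profile HMM also scores paths through the insert (I) and delete (D) states, either summing over all paths (Forward algorithm) or keeping only the best one (Viterbi algorithm). Dropping the transition terms leaves the position-specific log-odds score of a profile.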

  16. Contributing member databases (MDBs): models are built on either sequence or structural alignments, and each MDB has its own focus. Methods: Hidden Markov Models • fingerprints • profiles • patterns • sequence clusters. Focus: protein features (active sites…) • prediction of conserved domains • structural domains • functional annotation of families/domains

  17. A Closer look at InterPro

  18. Integration of signatures • InterPro • Foundations of InterPro • Manual curation

  19. InterPro Curation Principles • To represent MDB signatures as closely as possible to what their authors intended • To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping • To give the end user as much information as possible about each signature, by annotating signatures and providing links to other databases

  20. InterPro entry • Links related signatures • Groups similar signatures together • Adds extensive annotation • Linked to other databases • Structural information and viewers

  21. Link related signatures: relationships. 1) Parent-child (a child is a subgroup of more closely related proteins); applies to domains and families. Example: the parent, Protein kinase (PFAM, 100 proteins), has two children, Serine kinase (SMART, 75 proteins) and Tyrosine kinase (PROSITE, 25 proteins), which have no proteins in common.
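The parent-child relationship rests on which proteins each signature matches. A hedged Python sketch: given the sets of proteins matched by two signatures, call the smaller one a child when nearly all of its proteins (here a hypothetical 90%) are also matched by the larger one, and report when two signatures share no proteins at all. The threshold, the function name `classify_pair` and the protein accessions are invented; the protein counts follow the slide's kinase example, and in InterPro the final relationships are set through manual curation.

```python
def classify_pair(larger, smaller, min_containment=0.9):
    """Suggest a relationship between two signatures from the sets of
    proteins they match. `min_containment` is a hypothetical threshold."""
    shared = len(larger & smaller)
    if shared == 0:
        return "no proteins in common"
    if len(smaller) < len(larger) and shared / len(smaller) >= min_containment:
        return "parent-child"
    return "overlapping"


# Protein counts from the slide's example; the accessions are invented.
protein_kinase = {f"P{i:03d}" for i in range(100)}        # parent, 100 proteins
serine_kinase = {f"P{i:03d}" for i in range(75)}          # child, 75 proteins
tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}   # child, 25 proteins

print(classify_pair(protein_kinase, serine_kinase))    # parent-child
print(classify_pair(protein_kinase, tyrosine_kinase))  # parent-child
print(classify_pair(serine_kinase, tyrosine_kinase))   # no proteins in common
```

Comparing every pair of signatures this way would give candidate parent and child links that a curator could then accept or reject.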

  22. The InterPro entry types • Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure • Domain: biological units with defined boundaries • Repeat: short sequences typically repeated within a protein • Sites: active site, binding site, conserved site, PTM (post-translational modification)

  23. Searching InterPro: search by protein ID, or paste in an unknown sequence

  24. InterPro search results • Family • Domains and sites • Unintegrated signatures • Structural data, with a link to PDBe

  25. Link to InterPro entry • Links to signature databases

  26. InterProScan: https://www.ebi.ac.uk/Tools/pfa/iprscan/ • Select member databases

  27. Caveats • InterPro entries are based on signatures supplied to us by our member databases • ...this means no signature, no entry! • We need your feedback: missing/additional references, reporting problems, requests

  28. ACKNOWLEDGEMENTS InterPro Team: Craig McAnulla, Amaia Sangrador, Sarah Hunter, Alex Mitchell, Siew-Yit Yong, Maxim Scheremetjew, Phil Jones, Matthew Fraser, Sebastien Pesseat
