A brief on: Domain Families & Classification

Presentation Transcript


  1. A brief on: Domain Families & Classification

  2. Evolution by Protein Domains • The discovery of domains in protein structures • Domains at the sequence level • Examples of “Domain Resources” • Domain fusion • Supra-domains • Signaling domains and cell function • InterPro

  3. Classification to Families. Automatic (large scale) vs. manual (high quality). We can classify proteins into families by: • A. Sequence (motifs; proteins) • B. Structure • C. Function (annotation) • D. Evolution

  4. Sequence-Based Classification • Proteins as a unit • Proteins as a combination of domains (functional, structural, sequence). The goal: • New annotation, new family, family connections (sub/super) … • Predictive power (given a new unknown sequence)

  5. Protein Multiple Alignment (Structurally supported)

  6. Domains can be recognized through sequence similarity. Q: What is the best way to ‘represent’ this low sequence similarity over ~70 aa?

  7. Misannotation due to multidomain proteins. Figure labels: A, domain of known function (annotation: kinase); B, domain of unknown function (annotated kinase-like); C, multidomain protein (annotated kinase-like). A is similar to C, and C is similar to B, but A is not similar to B. Smith and Zhang. Nat Biotechnol 1997 15:1222-3

  8. Q: What is the best way to ‘represent’ this low sequence similarity over ~70 aa? • ‘Profile’ • PSSM • Regular expression • HMM • And more…
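
As a rough illustration of the PSSM option listed above, the following sketch builds a log-odds position-specific scoring matrix from a small gap-free alignment and slides it along a query sequence. The toy alignment, the uniform background frequency, the pseudocount, and the function names are all assumptions made for this example; none of them comes from the slides.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences.

    Assumes a uniform background of 1/20 per residue (a simplification).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along the sequence; return (best_score, start_index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


# Hypothetical toy alignment and query, for illustration only.
toy_alignment = ["GKSTL", "GKTTL", "GRSTI", "GKSSL"]
pssm = build_pssm(toy_alignment)
score, start = best_window(pssm, "MAAGKSTLQQ")
print(f"best window starts at index {start} with score {score:.2f}")
```

Unlike a regular expression, a matrix of this kind scores every window, so a weakly conserved domain can still rank above background even when no single position is strictly required.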

  9. Multi-domain protein families. Impossible to find ‘evolutionary relatedness’ without adding DOMAIN information…

  10. How is a novel gene born? • Domains are the evolutionary units of sequence that comprise the gene coding regions. • Most genes are built from more than one domain. • Novel genes can be created by recombination of domains into new domain arrangements.

  11. Correspondence between functional associations and genes linked by the fusion method. From glycolysis (figure labels): M. genitalium PGK, M. genitalium TIM, M. genitalium GAPDH; Thermotoga maritima PGK+TIM; Phytophthora infestans TIM+GAPDH. Metabolites shown: Glycerone-P, Glyceraldehyde-3P, Glycerate-1,3P2, Glycerate-3P. Enzymes shown: TIM, GAPDH, PGK1.

  12. False Transitivity of Local Alignment. Figure: input sequences K6A1_MOUSE, CSKP_HUMAN, DLG3_MOUSE, and MPP3_HUMAN, linked by pairwise BLAST E-values (1e-42, 9e-41, 8e-78, 2e-47). Pairwise similarities are better than an E-score of 1e-40. If we cluster these proteins, assuming transitivity of local alignment scores, we will cluster K6A1_MOUSE with MPP3_HUMAN.
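
The clustering pitfall on this slide can be sketched in a few lines. The example below applies single-linkage clustering (connected components) to the four proteins named on the slide. The domain labels assigned to each protein are a simplified assumption made for illustration, and an edge simply stands for "a local alignment hit better than the threshold"; the slide's actual pairing of E-values to protein pairs does not survive in the transcript and is not used here.

```python
from itertools import combinations


def single_linkage(nodes, edges):
    """Cluster nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


# Assumed, simplified domain content (illustration only).
domains = {
    "K6A1_MOUSE": {"kinase"},
    "CSKP_HUMAN": {"kinase", "second_domain"},
    "DLG3_MOUSE": {"second_domain"},
    "MPP3_HUMAN": {"second_domain"},
}

# Two proteins produce a strong local hit if they share at least one domain.
edges = [(a, b) for a, b in combinations(domains, 2) if domains[a] & domains[b]]
clusters = single_linkage(list(domains), edges)

print(clusters)
print("shared domains, K6A1 vs MPP3:", domains["K6A1_MOUSE"] & domains["MPP3_HUMAN"])
```

All four proteins land in one cluster even though the two end proteins share no domain, which is the false transitivity the slide warns about. Clustering domains rather than whole proteins avoids this.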

  13. Used terms: Motif = Domain = Signature = Profile = Seed Family = Cluster. These terms are used interchangeably; they are very (too) flexible. Domain Classification (intro to a few systems)

  14. Protein Sequence Domain Classification: DOMO, ADDA, EVEREST, InterPro, CDD, MetaFam, ProSite, Pfam, Blocks+, Profile, SBASE, TigrFam, eMotif, SMART, PRINTS, ProDom. Based on different principles and a different focus!

  15. Integration: Data Fusion InterPro 13,000 entries Based on UniProt DB

  16. Expert system Pfam InterPro - >13,000 entries

  17. Examples: complexity in domains • Identification? • Boundary? • Composition?

  18. Why domains and not proteins? • Reducing false transitivity • Exposing “mix and match” evolution • Immediate relevance to structural domain families • Suggesting evolutionary ‘robust units’ • Providing models for a family. Why automatic? • Overcoming large amounts of data • Unbiased identification of new families (even without an identified seed / without 3D structural information)

  19. Domains are the building blocks of evolution: some facts. Example shown: pyruvate kinase (PDB: 1pkn), 3 domains. • Each occurs in diverse sets of protein families • The number of domains in a protein ranges from 1 up to tens • Structure-based domains are ~150 aa • Length varies: some are very short (30-40 aa), others are long (> 500 aa) • Domain definition is somewhat blurred • Domain boundary is an unsolved problem

  20. What is a domain? You know it when you see one

  21. >13,000 entries Automatic vs Manual

  22. General approaches • Motif-based databases: Prosite, Prints, Blocks, eMotif, InterPro • Domain-based databases: Pfam, ProDom, Domo, Smart • Manual/semi-manual: Prosite • Semi-automatic: Pfam, Smart • Fully automatic: ProDom, Blocks, Domo, eMotif • Use different models (regular expressions, profiles, HMMs) • Based on each other

  23. Example of semi-automatic • Pfam: Nucleic Acids Research, 2007, 1–8 • Release 22.0 of Pfam contains 9318 protein families. • These cover 73.2% of sequences and 50.8% of residues. • Pfam is now based on UniProtKB, NCBI GenPept and metagenomics projects. • ~500 new Pfam-A families for PDB sequences and SCOP entries. • Increasing the amino-acid coverage! • Clans are built manually (supported by literature, SCOP, ...) • A total of 283 clans comprising 1808 Pfam-A families.
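
The two coverage figures quoted above measure different things: the fraction of sequences with at least one family hit, and the fraction of individual residues that fall inside a hit. A minimal sketch of the distinction follows; the sequence identifiers, lengths, hit coordinates, and the function name are made up for illustration and have nothing to do with the actual Pfam release data.

```python
def coverage(seq_lengths, hits):
    """Return (sequence_coverage, residue_coverage) as fractions.

    seq_lengths: dict mapping sequence id -> length in residues
    hits: iterable of (sequence id, start, end), 1-based and inclusive
    Overlapping hits on the same sequence are counted once per residue.
    """
    covered = {sid: set() for sid in seq_lengths}
    for sid, start, end in hits:
        covered[sid].update(range(start, end + 1))
    seqs_hit = sum(1 for positions in covered.values() if positions)
    residues_hit = sum(len(positions) for positions in covered.values())
    return seqs_hit / len(seq_lengths), residues_hit / sum(seq_lengths.values())


# Hypothetical toy data: four sequences, one of them with no domain hit.
toy_lengths = {"P1": 100, "P2": 200, "P3": 100, "P4": 100}
toy_hits = [("P1", 1, 50), ("P2", 21, 120), ("P2", 101, 150), ("P3", 11, 60)]

seq_cov, res_cov = coverage(toy_lengths, toy_hits)
print(f"sequence coverage: {seq_cov:.1%}, residue coverage: {res_cov:.1%}")
```

Residue coverage is always the lower of the two, because a sequence counts as covered as soon as any part of it matches a family.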

  24. The Power of Integration SCOP CATH FSSP Pfam, Prosite, SMART, PRINTS, tigrFam ProDom GO ENZ KEGG InterPro

  25. Proteins were found to have spatially distinct structural units. Structure: domains provide a “clean” definition. Example: TRANSFERASE (METHYLTRANSFERASE), PDB: 1adm

  26. In 1974, Michael Rossmann observed that structural domains can recur in different structural contexts • 1ht0: an alcohol dehydrogenase • 1i0z: a lactate dehydrogenase • Rossmann fold

  27. Domains can recur in multiple copies in the same protein. Example: fibronectin (PDB: 1fnf)

  28. Structural definition of domains A distinct, compact, and stable protein structural unit that folds independently of other such units.

  29. Structural definition of domains A distinct, compact, and stable protein structural unit that folds independently of other such units.

  30. Recurrent domains in diphtheria toxin (1ddt) The diphtheria toxin is made up of three domains, each of which is involved in a different stage of infection (receptor binding, membrane penetration, and catalysis of ADP-ribosylation of elongation factor 2). A structural neighbor is depicted next to each domain of diphtheria toxin (middle).

  31. Dominant domain fold types. Holm and Sander. PROTEINS: Structure, Function, and Genetics 33:88–96 (1998)

  32. SCOP – a structural classification of proteins. Counts shown on the slide (level labels not preserved): 701; 1,110; 1,940; 44,327. Families are in turn grouped into superfamilies where sequence similarity is still recognizable and basic biochemical properties are conserved. Superfamilies and families are monophyletic (derive from a common ancestor). Updated from Murzin et al. J. Mol. Biol. 247, 536-540.

  33. Dominant domain fold types. Holm and Sander. PROTEINS: Structure, Function, and Genetics 33:88–96 (1998)

  34. Sequence. Biology predominantly proceeds by decomposing proteins into their domains. Protein sequence families are constructed at the domain level.

  35. Prosite • A dictionary of functional and structural motifs and domains • Valuable biological information on each family • Each motif/domain/family is represented as a regular expression, a rule, or a profile • Models are generated from (usually published) multiple alignments and manually calibrated to ensure selectivity and sensitivity • Patterns do not always cover complete domains, whereas profiles usually span the whole domain • As of June 2002, contains 1800 patterns and profiles describing 1200 families or domains. Example representations, a per-position frequency matrix:
       Pos  1     2     3     4     5     6     7     8     9     10    11
       A    0     0.25  0.25  1     0.5   0     1     0.5   0     0.25  0
       C    0     0     0.25  0     0     0     0     0     0.25  0.25  1
       G    1     0.5   0     0     0     0.25  0     0.5   0.75  0.25  0
       T    0     0.25  0.5   0     0.5   0.75  0     0     0     0.25  0
     OR a pattern: G-x(2,3)-[MLIV]-x-P-{K,H}-x(2)-C
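
To make the two representations on this slide concrete, the sketch below converts the slide's pattern into a Python regular expression and computes the probability of an 11-base site under the slide's frequency matrix. The converter handles only the syntax elements that appear in this one pattern (it is not a full PROSITE parser), it accepts the slide's comma-separated `{K,H}` spelling, and the test sequences are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supports: literal residues, x (any residue), [..] (allowed set),
    {..} (forbidden set), and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1].replace(",", "") + "]"
        elif core.startswith("[") and core.endswith("]"):
            token = core
        else:
            token = re.escape(core)
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)


# Per-position base frequencies copied from the slide (positions 1..11).
FREQS = {
    "A": [0, 0.25, 0.25, 1, 0.5, 0, 1, 0.5, 0, 0.25, 0],
    "C": [0, 0, 0.25, 0, 0, 0, 0, 0, 0.25, 0.25, 1],
    "G": [1, 0.5, 0, 0, 0, 0.25, 0, 0.5, 0.75, 0.25, 0],
    "T": [0, 0.25, 0.5, 0, 0.5, 0.75, 0, 0, 0, 0.25, 0],
}


def site_probability(site):
    """Probability of an 11-base site, treating positions as independent."""
    p = 1.0
    for i, base in enumerate(site):
        p *= FREQS[base][i]
    return p


regex = prosite_to_regex("G-x(2,3)-[MLIV]-x-P-{K,H}-x(2)-C")
print(regex)
print(re.search(regex, "AAGSTLAPDQQCAA"))    # hypothetical matching sequence
print(re.search(regex, "AAGSTLAPKQQCAA"))    # K at the forbidden position
print(site_probability("GGTAATAAGAC"))
```

The pattern gives a yes/no answer, whereas the matrix grades every candidate site, which is why profiles tend to tolerate divergence that a pattern would reject outright.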

  36. Detecting domains at the sequence level From the SMART database

  37. Fusion Links. Glycyl-tRNA synthetase: in E. coli, glyQ and glyS are separate genes; in C. trachomatis, CT796 is the fusion link. The fact that glyQ and glyS interact could have been predicted from the fusion protein CT796.
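
The fusion-link idea on this slide can be sketched as a small search: two separate genes in one genome are linked if some protein in another genome contains the families of both. In the example below, the gene names come from the slide, while the family labels (`GlyRS_alpha`, `GlyRS_beta`), the data layout, and the function name are assumptions made for illustration.

```python
from itertools import combinations


def fusion_links(genomes):
    """Find gene pairs predicted to interact via a fused protein elsewhere.

    genomes: dict mapping genome name -> dict of gene name -> set of family labels
    Returns a set of (genome, gene_a, gene_b, fused_genome, fused_gene).
    """
    links = set()
    for fused_genome, fused_genes in genomes.items():
        for fused_gene, fused_fams in fused_genes.items():
            if len(fused_fams) < 2:
                continue  # a single-family protein cannot be a fusion
            for genome, genes in genomes.items():
                if genome == fused_genome:
                    continue
                for gene_a, gene_b in combinations(sorted(genes), 2):
                    fams_a, fams_b = genes[gene_a], genes[gene_b]
                    # Both genes must map into the fused protein, to different parts.
                    if (fams_a <= fused_fams and fams_b <= fused_fams
                            and not fams_a & fams_b):
                        links.add((genome, gene_a, gene_b, fused_genome, fused_gene))
    return links


# Gene names from the slide; family labels are hypothetical placeholders.
genomes = {
    "E. coli": {"glyQ": {"GlyRS_alpha"}, "glyS": {"GlyRS_beta"}},
    "C. trachomatis": {"CT796": {"GlyRS_alpha", "GlyRS_beta"}},
}

links = fusion_links(genomes)
for link in sorted(links):
    print(link)
```

A real analysis would derive the family assignments from sequence searches and would need to filter out promiscuous domains, which otherwise create many spurious links.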

  38. The good thing about standards is that there are so many of them to choose from… InterPro: an integrated resource of protein sites and functional domains

  39. Introducing InterPro… http://www.ebi.ac.uk/interpro/

  40. InterPro entry for a zinc finger domain

  41. Search by taxonomy:

  42. Example search results for the human protein Sirt1:

  43. Alignment view.

  44. HMM-Logo view.

  45. iPfam: a database of domain-domain interactions based on PDB entries.

  46. Notable advantages: • Links from the leading databases (UniProt, PDB, InterPro). • Manual curation of the division into families. • HMM-based search for global and local sequences. • A collection of the domain architectures in which the protein occurs. • Phylogenetic and taxonomic trees for finding known homologous proteins. • Graphical display of HMMs and alignments. • The option to download the entire database.

  47. Super-families of domains in InterPro (analogous to superfamilies in SCOP)

  48. Some domains actually contain other domains!

  49. Signaling and Multicellularity. One of the key problems of becoming a multicellular organism is solving the problem of cell signaling.

  50. Signal transduction. Phosphorylation can reversibly alter the activity of an enzyme through the combined action of a protein kinase and a protein phosphatase. Figure labels: kinase, phosphatase, P (phosphate), inactive, active, inactive.
