
Structural Genomics: Case studies in assigning function from structure


Presentation Transcript


  1. Structural Genomics: Case studies in assigning function from structure. James D Watson (watson@ebi.ac.uk)

  2. Structural Genomics Collaborators • MCSG – Mid-west Centre for Structural Genomics • SPINE – Structural Proteomics in Europe • SGC – Structural Genomics Consortium

  3. Structural Genomics Aims • Pathogens and disease • Automation / High Throughput • Coverage of Fold Space • Human Proteins

  4. Proteins: known sequences and 3D structures • 5,500 non-redundant structures • ~1.3m non-redundant protein sequences • ~260,000 homology models (figure: example protein sequences, e.g. MRTKSPGDSKFHEITKTPPKNQVSNS…)

  5. Proteins: known sequences and 3D structures • 5,500 non-redundant structures • ~10% unknown • 3D structures of ~16,000 carefully selected proteins • Homology models

  6. Protein Function • Protein function has many definitions: • Biochemical function: the biochemical role of the protein, e.g. serine protease • Biological function: the role of the protein in the cell/organism, e.g. digestion, blood clotting, fertilisation

  7. Function through homology • Motif searches • Sequence similarity • Active Site Templates • HTH motifs • Structural Similarity • Surface comparison

  8. Template Methodology • Use 3D templates to describe the active site of the enzyme, analogous to 1-D sequence motifs such as PROSITE, but in 3-D (Wallace et al. 1997) • A template defines a functional site • Search a new structure for a functional site • Search a database of structures for similar clusters
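The search step on slide 8 can be illustrated with a minimal sketch. It is not the method used in the talk: it assumes each residue is reduced to a single representative point, and it scores a candidate triplet by the rmsd between its three inter-residue distances and the template's (a distance rmsd), which avoids a superposition step but cannot tell a site from its mirror image. The function names, residue labels, coordinates and cut-off are all hypothetical.

```python
from itertools import permutations
from math import dist, sqrt

def distance_rmsd(a, b):
    """Rmsd between the pairwise distances of two equally sized point lists."""
    pairs = [(i, j) for i in range(len(a)) for j in range(i + 1, len(a))]
    total = sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs)
    return sqrt(total / len(pairs))

def search_template(template, structure, cutoff):
    """Find residue sets in `structure` that resemble `template`.

    template:  list of (residue_type, point)
    structure: list of (residue_id, residue_type, point)
    Returns (distance_rmsd, residue_ids) tuples, best first.
    """
    wanted_types = [res_type for res_type, _ in template]
    template_points = [point for _, point in template]
    hits = []
    # Brute force: try every ordered selection of residues from the structure.
    for combo in permutations(structure, len(template)):
        if [res[1] for res in combo] != wanted_types:
            continue
        score = distance_rmsd(template_points, [res[2] for res in combo])
        if score <= cutoff:
            hits.append((score, tuple(res[0] for res in combo)))
    return sorted(hits)

# Hypothetical template and structure (coordinates in Å, invented for illustration).
template = [("SER", (0, 0, 0)), ("ARG", (4, 0, 0)), ("GLU", (0, 3, 0))]
structure = [
    ("S1", "SER", (10, 10, 10)),
    ("R2", "ARG", (10, 14, 10)),
    ("E3", "GLU", (13, 10, 10)),
    ("S4", "SER", (30, 0, 0)),
    ("E5", "GLU", (40, 0, 0)),
]
hits = search_template(template, structure, cutoff=1.0)
print(hits)
```

Only the triplet S1/R2/E3 reproduces the template's 4, 3 and 5 Å distances, so it is the single hit within the cut-off. A real search would need spatial indexing rather than brute-force enumeration.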

  9. 3-residue templates; SiteSeer’s “reverse” templates (figure labels: query structure, residue positions 1 2 3 4 5 6 7 8 9 …)

  10. Problems with template methods • Too many hits (hundreds, thousands or even tens of thousands) • Use of rmsd rarely discriminates true from false positives • Local distortion in structure may give a large rmsd • Top hit rarely the correct hit – even in “obvious” cases

  11. An example • PDB code: 1hsk, UDP-N-acetylenolpyruvoylglucosamine reductase (MurB), E.C.1.1.1.158 • Contains the 3D template that characterises this enzyme class (figure labels: Ser, Arg, Glu) • Sequence identity to the template’s representative structure (1mbb) is 28%

  12. Enzyme active site templates: hits for 1hsk (figure labels: Ser, Arg, Glu; rmsd=2.19Å)
      Hit | E.C. number | Rmsd | Enzyme
      1 | E.C.1.3.99.2 | 0.76Å | Acyl-CoA dehydrogenase
      2 | E.C.4.2.1.20 | 0.76Å | Tryptophan synthase α-subunit
      3 | E.C.3.2.1.73 | 1.19Å | Glycosyl hydrolases, family 17
      4 | E.C.3.2.1.73 | 1.21Å | Glycosyl hydrolases, family 16
      5 | E.C.4.1.2.13 | 1.25Å | Fructose-bisphosphate aldolase (class I)
      … | … | … | …
      102 | E.C.1.1.1.158 | 2.19Å | UDP-N-acetylmuramate dehydrogenase
      … | … | … | …
      386 | … | 3.94Å | …

  13. Comparison of template environments • Match to template (figure labels: Ser, Arg, Glu) • Template structure – 1mbb • Query structure – 1hsk

  14. Comparison of template environments • Match to template (figure labels: Ser, Arg, Glu) • Template structure – 1mbb • Query structure – 1hsk

  15. Comparison of template environments • Identical residues in neighbourhood • Template structure – 1mbb • Query structure – 1hsk

  16. Comparison of template environments • Similar residues in neighbourhood (figure labels: Ser, Arg, Glu) • Template structure – 1mbb • Query structure – 1hsk

  17. Results for 1hsk
      Hit | E.C. number | Rmsd | Score | Enzyme
      1 | E.C.1.1.1.158 | 2.08 | 209.1 | UDP-N-acetylmuramate dehydrogenase
      2 | E.C.3.2.1.14 | 2.13 | 146.0 | Chitinase A (chitodextrinase; 1,4-beta-poly-N-acetylglucosaminidase; poly-beta-glucosaminidase)
      3 | E.C.3.2.1.17 | 1.92 | 142.4 | Turkey lysozyme
      4 | E.C.3.2.1.17 | 1.89 | 138.7 | Hen lysozyme
      5 | E.C.3.5.1.26 | 1.47 | 132.3 | Aspartylglucosylaminidase
      6 | E.C.3.2.1.3 | 1.54 | 131.1 | Glucan 1,4-alpha-glucosidase

  18. ProFunc – function from 3D structure. Methods that feed into the function prediction: • Homologous structures of known function • Homologous sequences of known function • Functional sequence motifs, e.g. Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC] • Residue conservation analysis • Binding site identification and analysis • HTH-motifs • Electrostatics • Surface comparison • Nests • Template based methods • … etc

  19. Large scale analysis • Created an edited version of the target database from the PDB – only those with status “In PDB” • Extract all PDB codes for each Structural Genomics group • Extract ‘prior’ knowledge (Header, Title, Jrnl, etc.) • Find any associated GOA annotation • Classify each structure by whether function is “known”, “unknown” or “limited info” • Run ProFunc in a batch process on all codes (~560) • Extract summary results from each analysis • Compare to prior knowledge and estimate success

  20. Number of deposits to the TargetDB by Structural Genomics group (Total of 577 unique entries) March 2004

  21. PDB Blast • Run query sequences against the PDB using BLAST • Filtered out those matches released AFTER the query sequence • Any hits are ignored from subsequent analyses • Still get significant matches – why? • Target selection criteria • Released within months of SG target
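The release-date filter described on slide 21 can be sketched as follows. This is only an illustration under stated assumptions: the hit records, identifiers, E-values and dates are invented, and the function name is hypothetical rather than part of any BLAST tool.

```python
from datetime import date

def hits_released_before(query_release, hits):
    """Drop matches released AFTER the query structure.

    What remains is what could have been known when the query was released.
    """
    return [hit for hit in hits if hit["released"] <= query_release]

# Hypothetical data for illustration only (not real BLAST output).
query_release = date(2003, 6, 10)
hits = [
    {"pdb": "hit_a", "evalue": 1e-40, "released": date(2001, 2, 14)},
    {"pdb": "hit_b", "evalue": 3e-35, "released": date(2003, 9, 1)},
    {"pdb": "hit_c", "evalue": 2e-12, "released": date(2003, 4, 30)},
]
prior = hits_released_before(query_release, hits)
print([hit["pdb"] for hit in prior])  # ['hit_a', 'hit_c']
```

In this invented example, hit_c survives the filter although it was released only weeks before the query, which is the kind of near-simultaneous release the slide gives as one reason significant matches remain.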

  22. InterPro Scan • InterPro scan on proteins of known function • Cannot “backdate” the InterPro database • Essentially picking up itself

  23. Function of query structure “known”

  24. Limited Functional Info

  25. Unknown Function

  26. The Good, the Not So Good and the Ugly. Three examples show the varying levels of information that can be retrieved from structures: 1. New functional assignment 2. Possible function identified 3. Function remains unknown

  27. The Good: BioH structure (MCSG) • One very strong hit: Ser-His-Asp catalytic triad of the lipases with rmsd=0.28Å (template cut-off is 1.2Å) • Novel carboxylesterase acting on short acyl chain substrates • Experimentally confirmed by hydrolase assays • Function Discovered

  28. The Not So Good: APC1040 (MCSG) • Assigned as a probable glutaminase • Most methods suggest β-lactamase activity • No match to PROSITE patterns • Function being assayed. APC1040 (residues 70–85): F-T-M-Q-S-I-S-K-V-I-S-F-I-A-A-C. Class A: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]
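The pattern comparison on slide 28 can be checked with a short sketch. It assumes the Class A pattern exactly as written on the slide (which may be a simplified form of the PROSITE entry) and handles only the syntax that pattern uses: `x`, `x(n)`, single residues and bracketed residue sets. The function names are hypothetical.

```python
import re

# Pattern and segment as quoted on the slide.
CLASS_A = "[FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]"
APC1040 = "FTMQSISKVISFIAAC"  # residues 70-85

def expand_pattern(pattern):
    """Expand a PROSITE-style pattern into one regex class per position."""
    positions = []
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.groups()
        # 'x' means any residue; a repeat count copies the position.
        positions.extend(["." if core == "x" else core] * int(repeat or 1))
    return positions

def mismatches(pattern, segment, first_residue=1):
    """List (residue_number, residue, required_class) where the segment fails."""
    positions = expand_pattern(pattern)
    if len(positions) != len(segment):
        raise ValueError("segment length differs from pattern length")
    return [
        (first_residue + i, residue, required)
        for i, (required, residue) in enumerate(zip(positions, segment))
        if not re.fullmatch(required, residue)
    ]

print(re.search("".join(expand_pattern(CLASS_A)), APC1040))  # None
print(mismatches(CLASS_A, APC1040, first_residue=70))
# [(75, 'I', '[TV]'), (82, 'I', '[AGLM]')]
```

The segment fails the pattern at two positions only, both isoleucines (Ile75 where [TV] is required and Ile82 where [AGLM] is required), which is consistent with a near miss that a strict motif scan reports as no match.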

  29. The Ugly: MT0777 (MCSG) Hypothetical protein from: Methanobacterium thermoautotrophicum • No sequence motifs • Residue conservation is poor. • Fold associated with many functions (Rossmann fold) • Template methods fail Function Unknown

  30. Future Work • Improvements to scoring system and additional templates • Further utilisation of SOAP services as they become available (e.g. KEGG API service) • Possible adaptation to use as part of a larger workflow or in LIMS systems (Taverna and MyGrid) • More truly predictive analyses being developed (e.g. Electrostatics, ligand prediction, catalytic residue prediction)

  31. Detection of DNA-binding proteins (with HTH motif) using structural motifs and electrostatics (Hugh Shanahan) • Combine electrostatics with HTH structural templates. • Can detect HTH DNA-binding proteins only. • 1/3 of DNA-binding protein families have an HTH motif. • Use a linear predictor as discriminant. • True positive rate (~80%) is comparable with that of more complicated methods. • Very low (< 0.01%) false positive rate.

  32. Ligand Prediction Can active site geometry, shape, physical-chemical properties etc. be used to predict the preferred ligand class? Active Site & Ligand description/fingerprinting methods: • Spherical Harmonics • Hybrid Ellipsoids

  33. Spherical Harmonics (Richard Morris) Spherical t-designs The computation of Legendre polynomials of high order requires a robust integration scheme

  34. Hybrid Ellipsoids (Rafael Najmanovich) • Every shape can be modelled by a set of hybrid ellipsoids • The parameters describe location and a,b,c of the ellipsoid and a smear factor • Similar parameters mean similar active sites and ligands

  35. Predicting Catalytic Residues (Alex Gutteridge) • Aims: • To predict the location of the active site in an enzyme structure. • To predict the catalytic residues of an enzyme. • How? • Train a neural network to identify catalytic residues. • Cluster high scoring residues to find the active site.
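The second step on slide 35, clustering high scoring residues, might look like the sketch below. The slide does not say which clustering method is used, so single-linkage grouping by a distance cut-off is an assumption here, as are the residue labels, scores, coordinates and both cut-offs. The per-residue scores stand in for the neural network output and are simply taken as input.

```python
from math import dist

def cluster_high_scoring(residues, score_cutoff, distance_cutoff):
    """Group high scoring residues that lie close together in space.

    residues: list of (residue_id, score, point)
    Returns clusters (lists of residues), highest total score first.
    """
    clusters = []
    for res in residues:
        if res[1] < score_cutoff:
            continue
        linked, unlinked = [], []
        for cluster in clusters:
            near = any(dist(res[2], other[2]) <= distance_cutoff for other in cluster)
            (linked if near else unlinked).append(cluster)
        # Merge every cluster the new residue touches (single linkage).
        merged = [res] + [member for cluster in linked for member in cluster]
        clusters = unlinked + [merged]
    return sorted(clusters, key=lambda c: sum(r[1] for r in c), reverse=True)

# Hypothetical scores and coordinates (Å), invented for illustration.
residues = [
    ("res1", 0.90, (0, 0, 0)),
    ("res2", 0.80, (3, 0, 0)),
    ("res3", 0.95, (0, 4, 0)),
    ("res4", 0.85, (30, 30, 30)),
    ("res5", 0.20, (1, 1, 1)),
]
clusters = cluster_high_scoring(residues, score_cutoff=0.5, distance_cutoff=6.0)
print([[r[0] for r in cluster] for cluster in clusters])
```

Here the top-ranked cluster (res1, res2, res3) would be reported as the predicted active site, while the isolated high scorer res4 forms a separate, lower-ranked cluster and the low scorer res5 is ignored.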

  36. Workflows and Taverna (Tom Oinn) • Most procedures used now follow a workflow type scheme • Taverna allows users to pick elements from services to create their own workflows for automation of complex sets of procedures. • Removes the need to write complex scripts Beta 9 release available at: http://taverna.sourceforge.net/

  37. Acknowledgements • Janet Thornton • Christine Orengo • Roman Laskowski – ProFunc • Richard Morris – InterPro search, Spherical Harmonics • Gail Bartlett, Craig Porter – Enzyme Templates • Alex Gutteridge – Catalytic Residue Prediction • Sue Jones – HTH motifs • Hugh Shanahan – DNA binding, Electrostatics • Jonathan Barker – JESS • Hannes Ponstingl – PITA • Rafael Najmanovich – Hybrid Ellipsoids • Martin Senger, Siamak Sobhany – SOAP, Tom Oinn – Taverna • Annabel Todd and Russell Marsden – UCL • MCSG consortium for lots of structures, plus many more at EBI and UCL • Work was supported by NIH grant (GM 62414) and by the US DoE under contract (W-31-109-Eng-38)
