Extracting and Exploiting Structural Patterns in Proteins, especially Relating to Function


Janet Thornton

James Watson, Roman Laskowski - EBI

Adel Golovin, Kim Henrick - EBI MSD

David Leader, James Milner-White – Glasgow

Andrzej Joachimiak, Aled Edwards – MCSG

(Mid-West Centre for Structural Genomics)

Outline
  • Structural Motifs
    • PDBsum
    • MSDmotif
  • Functional Motifs
    • Catalytic Site Atlas
    • DNA Binding Motifs
    • Automated templates
    • Reverse Templates
  • From Structure to Function? - ProFunc
Structural Motifs

Structural motifs are small, commonly occurring sections of proteins that are distinguished by:
  • Sequence – Gly-X-Gly
  • Conformation – φ, ψ angles
  • Secondary structure – helix, βαβ unit
  • Function – catalytic triad, calcium-binding site
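Motif definitions based on conformation rest on the backbone torsion angles φ and ψ. As a minimal sketch (plain Python, using the standard formula for a signed torsion angle; the function and helper names are illustrative), the angle defined by four consecutive backbone atoms can be computed like this:

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (-180, 180] about the p1-p2 bond.

    phi(i) uses C(i-1), N(i), CA(i), C(i);
    psi(i) uses N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)          # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)          # bond pointing forward from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, four atoms in a planar cis arrangement give 0°, the trans arrangement gives 180°. The sign follows the usual convention: positive when the front bond must be turned clockwise to eclipse the rear bond.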

Examples of Structural Motifs
  • Alpha-Beta Motif
  • Beta Turn
  • Schellman Loop
  • Beta Bulge (classic)
  • Nest
  • Beta Bulge Loop

Structural Motifs

They may be continuous along the chain (e.g. GXG) or discontinuous (e.g. the catalytic triad).

Historically, motifs were identified and analysed to understand the relationship between protein sequence and structure, and to improve prediction methods. They are also used to assign function (PROSITE).

Many motifs can now be recognised automatically from coordinates, using programs such as DSSP and PROMOTIF.

PDB files can be annotated with these structural motifs, e.g. in PDBsum.
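As a rough illustration of recognising a motif from coordinates, the sketch below flags candidate β-turns with a commonly used geometric rule: the Cα(i) to Cα(i+3) distance is below 7 Å and the two central residues are not helical. This is a simplification under stated assumptions, not PROMOTIF's implementation (which also classifies turn types from φ/ψ). The input format, a list of Cα coordinates plus a DSSP-style string with 'H' for helix, is hypothetical.

```python
import math


def find_beta_turns(ca_coords, ss, max_dist=7.0):
    """Start indices i of candidate beta turns spanning residues i..i+3.

    ca_coords: list of (x, y, z) C-alpha coordinates in chain order.
    ss: per-residue secondary-structure string ('H' marks helix).
    """
    turns = []
    for i in range(len(ca_coords) - 3):
        # chain folds back on itself: residues i and i+3 are close in space
        close = math.dist(ca_coords[i], ca_coords[i + 3]) < max_dist
        # the two central residues must not be part of a helix
        central_not_helix = ss[i + 1] != "H" and ss[i + 2] != "H"
        if close and central_not_helix:
            turns.append(i)
    return turns
```

An extended chain with 3.8 Å between consecutive Cα atoms never satisfies the distance rule, whereas a chain that reverses direction within four residues does.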

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/

Roman Laskowski

MSDmotif: http://www.ebi.ac.uk/msd-srv/msdmotif

Adel Golovin

Currently in alpha testing

Full release probably around October 2005

PDB: 1gci

MSDmotif
  • Small 3D motifs from J. Milner-White: search/view
  • Secondary structure patterns (HTH): search/view
  • φ,ψ-based: search/view
  • Ligands and their environment: search/view
  • Catalytic sites: search/view
  • BLAST sequence: search/view
  • PROSITE-compliant patterns: search/view
  • 3D multiple alignment

Small motifs
  • Alpha-Beta Motif
  • Nest
  • ST Staple

11 motifs in total (Prof. James Milner-White)

http://doolittle.ibls.gla.ac.uk:9006/david/ProteinMotifDB.html

Motifs In MSDmotif (1)
  • Alpha-Beta Motif
  • Beta Turn
  • Schellman Loop
  • Beta Bulge (classic)
  • Nest
  • Beta Bulge Loop

Motifs In MSDmotif (2)
  • Asx Motif
  • ST Motif
  • Asx Turn
  • ST Turn
  • ST Staple

Statistics provided by MSDmotif: ST motif

  a) Amino acid occurrence at each position
  b) Correlation between side-chain charge and residue position
  c) Motif parameter variation
Secondary structure patterns

  • Strand – turn – strand, with a 2-3 residue gap
  • Glycosylation pattern N-{P}-[ST]-{P}, where N binds the sugar: mannose (MAN) or N-acetylglucosamine (NAG)
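PROSITE patterns such as the glycosylation pattern above can be translated mechanically into regular expressions. The sketch below handles the common elements (x, [..], {..}, repeat counts, and the < and > terminus anchors) and searches with a lookahead so overlapping sites are all reported. It is an illustration with hypothetical function names, not the PROSITE tools themselves.

```python
import re

# One PROSITE element: optional '<', a core (x, [..], {..} or one residue),
# an optional repeat count (n) or (n,m), and an optional '>'
_ELEMENT = re.compile(
    r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}")` yields `N[^P][ST][^P]`. In the toy sequence `MNASNPTQNGSA`, the sites start at positions 1 and 8. The Asn at position 4 is rejected because it is followed by Pro.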

φ,ψ search

PDB: 1gci

Ideal for searching short loops
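One simple way to phrase a φ,ψ-based search is to slide a query's per-residue (φ, ψ) pairs along a target chain. A window is accepted when every angle agrees within a tolerance, with differences measured on the circle so that 170° and -170° are only 20° apart. The 30° tolerance and the all-residues-must-match rule are assumptions for illustration. The slides do not describe how MSDmotif actually scores φ,ψ matches.

```python
from typing import Optional

Angle = Optional[float]


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def _close(q: Angle, t: Angle, tol: float) -> bool:
    # an undefined angle (e.g. phi of a chain's first residue) matches anything
    return q is None or t is None or angular_diff(q, t) <= tol


def phi_psi_search(query: list[tuple[Angle, Angle]],
                   target: list[tuple[Angle, Angle]],
                   tol: float = 30.0) -> list[int]:
    """Start indices in target where each residue's (phi, psi) lies within
    tol degrees of the corresponding query residue."""
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        if all(_close(qphi, tphi, tol) and _close(qpsi, tpsi, tol)
               for (qphi, qpsi), (tphi, tpsi) in zip(query, window)):
            hits.append(start)
    return hits
```

Because no sequence information is used, such a search can find loops with the same backbone conformation in unrelated proteins.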

Example of a search using MSDmotif

PDB:1gci

Subtilases family

PDB:1f5p

Globins family

Phi/Psi Search using MSDmotif

+ Other Subtilases

Calcium binding site

Sequence search

Zn-binding pattern: CXXCXXXFXXXXXLXXHXXXH

MSDmotif
  • Available in alpha version
    • http://www.ebi.ac.uk/msd-srv/msdmotif
  • Will be published later this year
  • Incremental weekly updates
  • 20 GB of disk space in an Oracle database, growing linearly (~0.8 MB per PDB entry)
  • Web application server with a J2EE servlet engine
  • NCBI BLAST

Adel Golovin

Kim Henrick

Outline
  • Structural Motifs
    • PDBsum
    • MSDmotif
  • Functional Motifs
    • Catalytic Site Atlas
    • DNA Binding Motifs
    • Automated templates
    • Reverse Templates
  • From Structure to Function? - ProFunc

Catalytic Site Atlas

  • Taken from primary literature:
    • β-lactamase class A
    • EC: 3.5.2.6
    • PDB: 1btl
    • Reaction: β-lactam + H2O → substituted β-amino acid
    • Active site residues: S70, K73, S130, E166
    • Plausible mechanism:
The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data.

Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton

Nucleic Acids Res. 2004, 32: D129-D133.

http://www.ebi.ac.uk/thornton-srv/databases/CSA

Annotates catalytic residues in the PDB
  • Based on a dataset of 514 enzyme families
    • Representative catalytic site for each family
    • Homologues assigned by PSI-BLAST
    • Limited substitution allowed.
    • Homologues updated monthly.
  • Literature references
  • Data also available via MSDsite
  • http://www.ebi.ac.uk/thornton-srv/databases/CSA
  • http://www.ebi.ac.uk/msd-srv/msdsite
3-D templates
  • Use 3D templates to describe the active site of the enzyme
    • analogous to 1-D sequence motifs such as PROSITE, but in 3-D
  • Sequence-position independent
  • Capture the essence of the functional site in the protein
Aspartic Proteinase – Active Site Residues – DTG (×2)

Eukaryotic & Fungal Aspartic Proteinases: all-atom DTG-DTG template

Aspartic Proteases: Active Site Template

A template of 8 atoms is sufficient to identify all aspartic proteinases.

[Figure: template atoms labelled Asp CO2, Asp O, Gly C, Thr/Ser O and Thr O]

TEmplate Search and Superposition (TESS)

Wallace et al., 1997

  • defines a functional site as a sequence-independent set of atoms in 3-D space
  • searches a new structure for a functional site
  • searches a database of structures for similar clusters

e.g. the serine proteinase catalytic triad
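The matching criterion behind such templates can be shown with a deliberately brute-force sketch. Each template atom can be matched only by protein atoms with the same residue and atom name, and a combination is accepted when all pairwise distances agree within a tolerance. TESS itself uses geometric hashing to avoid this exhaustive enumeration. The 1.0 Å tolerance, the function name and the toy coordinates in the example are assumptions.

```python
import itertools
import math

# An atom is (residue name, atom name, (x, y, z))
Atom = tuple[str, str, tuple[float, float, float]]


def match_template(template: list[Atom], protein: list[Atom],
                   tol: float = 1.0) -> list[tuple[int, ...]]:
    """Find protein atoms whose pairwise distances reproduce the template's.

    Each template atom can only be matched by a protein atom with the same
    residue and atom name; residue numbering and chain order are ignored,
    which is what makes the template sequence-independent.
    Returns one tuple of protein-atom indices per match.
    """
    candidates = [
        [i for i, (res, name, _) in enumerate(protein) if (res, name) == (t_res, t_name)]
        for t_res, t_name, _ in template
    ]
    pairs = list(itertools.combinations(range(len(template)), 2))
    t_dist = {(a, b): math.dist(template[a][2], template[b][2]) for a, b in pairs}
    matches = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one protein atom cannot play two template roles
        if all(abs(math.dist(protein[combo[a]][2], protein[combo[b]][2]) - t_dist[(a, b)]) <= tol
               for a, b in pairs):
            matches.append(combo)
    return matches
```

Because only distances are compared, this sketch cannot tell a site from its mirror image. An explicit superposition step would be needed for that.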

Serine Proteinase templates
  • A trypsin-based template of 7 atoms was able to identify almost all serine proteinases in the PDB, including subtilisin
  • It also identified the active sites of several other, functionally distinct enzyme families: serine carboxypeptidase, acetylcholinesterase, lipase and dehalogenase
  • The catalytic triad has evolved independently many times
Active site convergence

[Figure: active sites of trypsin and subtilisin]

[Figure: active sites of trypsin, subtilisin, alpha/beta hydrolase, brain platelet-activating factor acetylhydrolase, Clp protease and CheB methylesterase]

Database of enzyme active site templates

189 templates

[Figure: example templates, including GARTfase, cholesterol oxidase, IIAglc histidine kinase, carbamoylsarcosine amidohydrolase, the Ser-His-Asp catalytic triad and dihydrofolate reductase]


DNA-binding Motifs
  • Helix-Turn-Helix (HTH)
    • Standard HTH
    • Winged helix
  • Beta Sheet
  • Zinc-finger
Prediction of DNA Binding Function using Structural Motifs
  • Predicting function from structure
  • Structural motifs
  • Helix-Turn-Helix (HTH)
    • Binds in the major groove
    • C-terminal helix: DNA recognition
    • Found in about 1/3 of DNA-binding protein families (16/54)
  • Brennan and Matthews, 1989; Brennan, 1991
HTH Motif Proteins

Catabolite activator protein (1ber)

Lambda repressor/operator complex (1lmb)

HTH Motif Templates

3D template library (e.g. 1berA16-36: residues 16-36 of chain A of PDB entry 1ber)

Predicting DNA binding function
  • Scan the template library against 3D structures
  • A template T (length n) is scanned against a protein P (length m): the RMSD after optimal superposition is calculated at each of the m - n + 1 possible positions in P
  • Record the lowest RMSD over all positions
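The scan described above can be sketched with NumPy. The Kabsch algorithm (an SVD of the covariance matrix, with a sign correction that excludes reflections) gives the optimal-superposition RMSD for each of the m - n + 1 windows, and the minimum is kept. The function names are illustrative. The code assumes Cα coordinates stored as (n, 3) arrays.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    p = p - p.mean(axis=0)               # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal proper rotation
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def scan_template(template: np.ndarray, protein: np.ndarray) -> tuple[int, float]:
    """Slide an n-residue template along an m-residue protein (C-alpha coords).

    Superposes the template on each of the m - n + 1 contiguous windows and
    returns (start index of the best window, lowest RMSD).
    """
    n, m = len(template), len(protein)
    rmsds = [kabsch_rmsd(template, protein[i:i + n]) for i in range(m - n + 1)]
    best = int(np.argmin(rmsds))
    return best, rmsds[best]
```

A protein would then be predicted to contain an HTH motif if the lowest RMSD to some template falls below the 1.2 Å threshold used in the slides.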
RMSD Distributions with HTH templates

RMSD threshold: 1.2 Å
  • False positives: 831/23,506 = 3.5%
  • False negatives: 2/142 = 1.4%

HTH Motif Extended Templates
  • Extend templates by adding 2 residues to the start and end
    • 1berA16-36 → 1berA14-38
RMSD Distributions with extended HTH templates

RMSD threshold: 1.2 Å
  • False positives: 110/23,506 = 0.5%
  • False negatives: 2/144 = 1.4%

HTH Accessible Surface Area (ASA, Å²)

Data set                 Min    Max    Mean
HTH proteins (144)       990    2740   1732
False positives (110)    856    2747   1264

An ASA threshold of 990 Å² reduced the false positives from 110 to 80, a false positive rate of 0.3% (80/23,506).
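The two filters can be combined in a few lines. The sketch below assumes each hit already carries its lowest RMSD and the accessible surface area of the matched motif; computing ASA itself is left to a dedicated program. The `Hit` class and chain identifiers are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass

RMSD_MAX = 1.2   # Angstrom, RMSD threshold from the slides
ASA_MIN = 990.0  # Angstrom^2, smallest ASA observed among true HTH motifs


@dataclass
class Hit:
    chain: str   # hypothetical chain identifier
    rmsd: float  # lowest RMSD to any HTH template, in Angstrom
    asa: float   # accessible surface area of the matched motif, in Angstrom^2


def predict_hth(hits: list[Hit]) -> list[Hit]:
    """Keep only hits that pass both the RMSD and the ASA filters."""
    return [h for h in hits if h.rmsd <= RMSD_MAX and h.asa >= ASA_MIN]


def rate_percent(count: int, total: int) -> float:
    """Percentage of count out of total."""
    return 100.0 * count / total
```

The rates quoted on the slides can be re-derived with `rate_percent`. For example, 831, 110 and 80 false positives out of 23,506 chains give 3.5%, 0.5% and 0.3% after rounding.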

Summary
  • Structural template library of 144 HTH motifs
  • Minimum RMSD for optimal superpositions on whole protein structures, based on Cα coordinates
  • Thresholds of 1.2 Å RMSD and 990 Å² ASA
  • Hit rate of 98.6% and false positive rate of 0.3%
  • Recognition across sequence families and fold families
Template databases
  • HAND CURATED
    • Enzyme active sites (PROCAT) – 189 templates
      • Currently being extended
    • Metal-binding sites – 600 templates
  • AUTOMATED
    • Ligand-binding sites – 10,000 templates
    • DNA-binding sites – 800 templates
1. Ligand-binding templates

Automatically generated templates

a. For each het group in the PDB, extract a non-homologous data set of proteins binding that het group

b. Identify residues interacting with the ligand (via H-bonds or non-bonded contacts)

c. Generate templates from overlapping local groups of 3-residue clusters

d. This gives over 10,000 ligand-binding templates
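Step c can be illustrated by enumerating every triplet of binding-site residues that forms a compact local cluster. Here each residue is reduced to one representative point, and a triplet is kept when all three pairwise distances are at most 8 Å. Both the representative point and the 8 Å cutoff are assumptions, not the published parameters, and the function name is hypothetical.

```python
import itertools
import math


def residue_triplets(residues, max_span=8.0):
    """Generate 3-residue template candidates from binding-site residues.

    residues: list of (label, (x, y, z)) with one representative point per
    residue (for instance its C-alpha or side-chain centroid).
    A triplet is kept when every pairwise distance is <= max_span. Triplets
    may share residues, so the resulting clusters overlap.
    """
    triplets = []
    for a, b, c in itertools.combinations(residues, 3):
        points = (a[1], b[1], c[1])
        if all(math.dist(p, q) <= max_span
               for p, q in itertools.combinations(points, 2)):
            triplets.append((a[0], b[0], c[0]))
    return triplets
```

Four residues packed around a ligand give four overlapping triplets, while a residue far from the others contributes none.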

2. DNA-binding templates

Automatically generated templates

a. Extract a non-homologous data set of DNA/RNA-binding proteins from the PDB

b. Identify residues interacting with DNA/RNA (via H-bonds or non-bonded contacts)

c. Generate templates from overlapping local groups of 3-residue clusters

d. This gives over 800 DNA/RNA-binding templates
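Step b amounts to a contact calculation. The sketch below simplifies it under stated assumptions. A polar (N or O) atom pair within 3.5 Å is treated as a possible hydrogen bond, and any atom pair within 4.0 Å counts as a non-bonded contact. Real hydrogen-bond assignment also checks angles. The input format and function name are hypothetical.

```python
import math


def interacting_residues(protein_atoms, na_atoms, hbond_cut=3.5, contact_cut=4.0):
    """Residues with at least one atom close to a nucleic-acid atom.

    protein_atoms: list of (residue_id, element, (x, y, z)).
    na_atoms: list of (element, (x, y, z)) for the DNA/RNA.
    Returns {residue_id: "hbond" or "contact"}; "hbond" takes precedence.
    """
    result = {}
    for res_id, p_el, p_xyz in protein_atoms:
        for n_el, n_xyz in na_atoms:
            d = math.dist(p_xyz, n_xyz)
            if d <= hbond_cut and p_el in ("N", "O") and n_el in ("N", "O"):
                result[res_id] = "hbond"      # distance-only H-bond proxy
            elif d <= contact_cut and result.get(res_id) != "hbond":
                result[res_id] = "contact"    # non-bonded contact
    return result
```

The same function would serve for ligand-binding residues if the ligand's atoms were passed in place of the nucleic-acid atoms.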

Problems with automated template methods

  • With a large number of templates:
    • Too many hits (usually tens, and often hundreds)
    • RMSD rarely discriminates true from false positives
    • Local distortion in structure may give a large RMSD
    • The top hit is rarely the correct hit, even in “obvious” cases
An example

PDB code: 1hsk, UDP-N-acetylenolpyruvoylglucosamine reductase (MurB), E.C.1.1.1.158

It contains the 3D template that characterises this enzyme class. Sequence identity to the template's representative structure (1mbb) is 28%.

[Figure: template residues Ser, Arg and Glu in 1hsk]

Enzyme active site templates: hits for 1hsk

Hit    E.C. number      RMSD     Enzyme
1.     E.C.1.3.99.2     0.76 Å   Acyl-CoA dehydrogenase
2.     E.C.4.2.1.20     0.76 Å   Tryptophan synthase α-subunit
3.     E.C.3.2.1.73     1.19 Å   Glycosyl hydrolases, family 17
4.     E.C.3.2.1.73     1.21 Å   Glycosyl hydrolases, family 16
5.     E.C.4.1.2.13     1.25 Å   Fructose-bisphosphate aldolase (class I)
…      …                …        …
102.   E.C.1.1.1.158    2.19 Å   UDP-N-acetylmuramate dehydrogenase
…      …                …        …
386.   …                3.94 Å   …

[Figure: the correct template match (Ser, Arg, Glu) in 1hsk, rmsd = 2.19 Å]

Comparison of template environments

Template structure – 1mbb; target structure – 1hsk

[Figure: match to template (Ser, Arg, Glu) and similar residues in the neighbourhood]


Environment similarity score

  • Take slices through a 10 Å sphere centred on the template match, in both the template structure (1mbb) and the target structure (1hsk)
  • Score equivalent grid points using the Dayhoff matrix, taking voids into account
  • The total similarity score is the sum of all grid-point scores
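A toy version of the environment comparison is sketched below. Both structures are assumed to be already superposed on the template match, so the same grid of points inside the 10 Å sphere can be used for each. Every point is labelled with the residue type of its nearest atom, or as a void, and equivalent points are scored with a substitution matrix. The dictionary passed as `matrix` stands in for the Dayhoff matrix. The void scores, the grid spacing and the 2 Å labelling cutoff are all assumptions.

```python
import itertools
import math

VOID = "-"  # label for a grid point with no protein atom nearby


def grid_points(center, radius=10.0, spacing=2.0):
    """Regular grid points lying inside a sphere around center."""
    steps = int(radius // spacing)
    points = []
    for i, j, k in itertools.product(range(-steps, steps + 1), repeat=3):
        offset = (i * spacing, j * spacing, k * spacing)
        if math.hypot(*offset) <= radius:
            points.append(tuple(c + o for c, o in zip(center, offset)))
    return points


def label_grid(points, atoms, cutoff=2.0):
    """Label each point with the residue type of its nearest atom, or VOID.

    atoms: list of (residue_type, (x, y, z)).
    """
    labels = []
    for p in points:
        nearest = min(atoms, key=lambda a: math.dist(p, a[1]), default=None)
        if nearest is not None and math.dist(p, nearest[1]) <= cutoff:
            labels.append(nearest[0])
        else:
            labels.append(VOID)
    return labels


def environment_score(labels_a, labels_b, matrix, void_match=1.0, void_mismatch=-1.0):
    """Sum of scores over equivalent grid points of two superposed environments."""
    total = 0.0
    for a, b in zip(labels_a, labels_b):
        if a == VOID and b == VOID:
            total += void_match
        elif VOID in (a, b):
            total += void_mismatch
        else:
            total += matrix[(a, b)]
    return total
```

With a real substitution matrix, a target whose neighbourhood resembles the template's, as 1hsk's does 1mbb's, accumulates a higher total than a hit that matches only the three template residues.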

Results for 1hsk

Hit   E.C. number     RMSD (Å)   Score   Enzyme
1.    E.C.1.1.1.158   2.08       209.1   UDP-N-acetylmuramate dehydrogenase
2.    E.C.3.2.1.14    2.13       146.0   Chitinase A; chitodextrinase; 1,4-beta-poly-N-acetylglucosaminidase; poly-beta-glucosaminidase
3.    E.C.3.2.1.17    1.92       142.4   Turkey lysozyme
4.    E.C.3.2.1.17    1.89       138.7   Hen lysozyme
5.    E.C.3.5.1.26    1.47       132.3   Aspartylglucosylaminidase
6.    E.C.3.2.1.3     1.54       131.1   Glucan 1,4-alpha-glucosidase

Residue conservation: rank template hits according to conservation scores of the matched residues

Hit   E.C. number     RMSD     Signif.   Enzyme
1.    E.C.1.1.1.158   2.08 Å   98.3%     UDP-N-acetylmuramate dehydrogenase
2.    E.C.3.5.1.11    2.06 Å   98.3%     Penicillin acylase
3.    E.C.5.99.1.2    2.22 Å   98.3%     Topoisomerase Ia/II
4.    E.C.5.1.2.2     2.69 Å   98.3%     Mandelate racemase
5.    E.C.5.1.2.2     2.59 Å   97.8%     Topoisomerase Ia/II
…     …               …        …         …

Residue conservation and cleft proximity: rank by conservation and proximity to the protein's two largest clefts

Hit   E.C. number     RMSD     Signif.   Enzyme
1.    E.C.5.1.2.2     2.69 Å   98.4%     Mandelate racemase
2.    E.C.1.1.1.158   2.08 Å   98.3%     UDP-N-acetylmuramate dehydrogenase
3.    E.C.3.5.1.11    2.06 Å   98.3%     Penicillin acylase
4.    E.C.5.99.1.2    2.22 Å   98.3%     Topoisomerase Ia/II
5.    E.C.5.1.2.2     2.59 Å   97.8%     Topoisomerase Ia/II
…     …               …        …         …

“Reverse” templates

[Figure: 3-residue templates, numbered 1-9, generated from 1hsk]

Comparison of template environments

Template structure – 1mbb; target structure – 1hsk

[Figure: identical residues in the neighbourhood]

“Reverse” templates
  • Typically get 20-40 templates from a single structure
  • Search each template against the PDB (or a representative subset):
    • a non-homologous dataset of 2,500 protein chains
    • a focused search (e.g. the top DALI hits)
  • Locate the known PDB entries with the closest local similarity
  • Program: the Protein SiteSeer
  • Search times against the 2,500-chain set:
    • JESS – 30 minutes
    • SiteSeer – 3 hours
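The sketch below shows one way the hits from a set of reverse templates might be combined. Database entries matched by more distinct query templates rank higher, and ties are broken by lower mean RMSD. This ranking rule, the 1.5 Å cutoff and the function name are assumptions for illustration, not SiteSeer's scoring.

```python
from collections import defaultdict


def rank_reverse_template_hits(hits, rmsd_cut=1.5):
    """Rank database entries by how many distinct query templates they match.

    hits: iterable of (template_id, pdb_entry, rmsd), one per template hit.
    Returns (pdb_entry, n_templates_matched, mean_rmsd) tuples, best first.
    """
    best = defaultdict(dict)  # pdb_entry -> {template_id: lowest rmsd}
    for template_id, entry, rmsd in hits:
        if rmsd > rmsd_cut:
            continue  # ignore poor superpositions
        previous = best[entry].get(template_id)
        if previous is None or rmsd < previous:
            best[entry][template_id] = rmsd
    ranking = [
        (entry, len(by_tpl), sum(by_tpl.values()) / len(by_tpl))
        for entry, by_tpl in best.items()
    ]
    # more matching templates first; lower mean RMSD breaks ties
    ranking.sort(key=lambda row: (-row[1], row[2]))
    return ranking
```

Counting distinct templates rewards entries that resemble the query over several local sites, which is less sensitive to a single chance superposition than ranking by the best RMSD alone.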
Structure to Function

[Figure: routes from 3D structure to function via fold, multimers, interactions, surface, electrostatics, ligands, clusters, and mutants & SNPs; annotations include evolutionary relationships, biological multimeric state, enzyme active sites, ligand & functional sites, and catalytic clusters, mechanisms & motifs]

Protein Function

Protein function has many definitions:
  • Biochemical function – the biochemical role of the protein, e.g. serine protease
  • Biological function – the role of the protein in the cell/organism, e.g. digestion, blood clotting, fertilisation

The 3D structure usually only provides information about biochemical function.

~250 structures solved to date by MCSG

~40% are ‘hypothetical proteins’. Some examples:
  • ylxR hypothetical cytosolic protein
  • ygbM hypothetical protein (EC1530)
  • Hypothetical protein (MTH1)
  • Conserved hypothetical protein (MT777)
  • Hypothetical protein (EC4030_F)
  • cutA protein implicated in Cu homeostasis (TM1056)

From Gene To Biochemical Function

Gene → Protein → 3D Structure → Function

Identifying sequence or structural similarity (i.e. identifying an evolutionary relationship) is the most powerful route to function.

Given a protein structure:

  • Where is the functional site?
  • Which ligands bind to the protein?
Predicting function from 3D Structure: conservation

Residue conservation
  • Conservation
    • Valdar & Thornton
    • Lichtarge et al.
    • Aloy et al.
    • Glaser et al.
    • etc.
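The methods listed above differ in detail. A simple, commonly used stand-in is one minus the normalised Shannon entropy of each alignment column. The sketch below computes this score and ignores gaps. It is a substitute for the cited scores, not a reproduction of any of them, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column: str) -> float:
    """1 - normalised Shannon entropy of one alignment column.

    Gaps ('-') are ignored. A fully conserved column scores 1.0; a column in
    which all 20 amino acids are equally frequent scores 0.0.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log(20)


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position scores for a list of equal-length aligned sequences."""
    return [column_conservation("".join(col)) for col in zip(*alignment)]
```

Mapping these per-position scores onto the structure highlights conserved surface residues, which often line functional sites.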
Predicting function from 3D structure: binding sites

Binding sites

  • Binding site comparison
  • Geometrical hashing
  • eF-site (Nakamura et al.)
  • PINTS (Russell)
  • Pseudospheres (Klebe)
  • pvSOAR (Binkowski et al.)
  • etc
Predicting function from 3D Structure: templates

3D templates

Template methods
  • PROCAT/CSA (Wallace et al.)
  • ASSAM (Artymiuk et al.)
  • RIGOR/SPASM (Kleywegt)
  • MSD (Henrick, Oldfield…)
  • etc.

Predicting Binding Site

Surface clefts and residue conservation are combined to find conserved surface patches, which indicate the most likely binding site.

Binding-site analysis: cutA
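Combining the two sources of evidence can be sketched as choosing the cleft whose lining residues are, on average, most conserved. The cleft assignments and per-residue conservation scores (assumed to lie in [0, 1]) are taken as given inputs from other programs. Averaging is an assumed scoring rule, not ProFunc's, and the function name is hypothetical.

```python
def most_likely_binding_site(clefts, conservation):
    """Pick the surface cleft whose residues are most conserved on average.

    clefts: dict cleft_id -> list of residue numbers lining that cleft.
    conservation: dict residue number -> conservation score in [0, 1].
    Returns (cleft_id, mean_conservation) for the best-scoring cleft, or
    (None, -inf) if no cleft has any scored residue.
    """
    best_id, best_score = None, float("-inf")
    for cleft_id, residues in clefts.items():
        scores = [conservation[r] for r in residues if r in conservation]
        if not scores:
            continue  # no conservation data for this cleft
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_id, best_score = cleft_id, mean
    return best_id, best_score
```

A large but poorly conserved cleft therefore loses to a smaller cleft lined by conserved residues.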

Identifying Binding Site Function Using Motifs

- 3D enzyme active site structural motifs (Craig Porter)

- Catalytic Site Atlas: identification of catalytic residues (Gail Bartlett, Alex Gutteridge)

- Metal binding sites (Malcolm MacArthur)

- Binding site features (Gareth Stockwell)

- Automatically generated templates of ligand-binding and DNA-binding motifs (Sue Jones, Hugh Shanahan)

- “Reverse” templates (Roman Laskowski)

- JESS: fast template search algorithm (Jonathan Barker)

An example

MCSG structure: BioH, unknown function, involved in biotin synthesis in E. coli
  • Expected to be an enzyme
  • Sequence contains two Gly-X-Ser-X-Gly motifs typical of acyltransferases and thioesterases
  • Structure: Rossmann fold, hence many structural homologues

PROCAT template search

One very strong hit: the Ser-His-Asp catalytic triad of the lipases, with rmsd = 0.28 Å (template cut-off is 1.2 Å)

Experimentally confirmed by hydrolase assays: a novel carboxylesterase acting on short acyl chain substrates

ProFunc – function from 3D structure
  • Homologous structures of known function
  • Homologous sequences of known function
  • DNA-, ligand-binding and “reverse” templates
  • Residue conservation analysis
  • Functional sequence motifs, e.g. Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]
  • Binding site identification and analysis
  • HTH motifs
  • Electrostatics
  • Surface comparison
  • Enzyme active site 3D templates
  • … etc.

Roman Laskowski

MCSG Dataflow
  • Crystallographers (structure solution)
  • Deposition and release
  • Central database
  • Function prediction (neural network)
  • NIH report
  • Experimental validation of function

Functional Annotation

All MCSG structures are automatically run through ProFunc. The results are examined manually to try to estimate the most likely function. The most recent (Nov 2004) dataset contains 193 unique structures:
  • Some assignment possible: 102 (53%)
    • Confident: 42/102
    • Putative: 50/102
    • Conflicting: 10/102
  • Function remains unknown: 23 (12%)
  • Prior function known: 68 (35%)


Acknowledgements

James Watson, Roman Laskowski - EBI

Adel Golovin, Kim Henrick - EBI MSD

David Leader, James Milner-White – Glasgow

Andrzej Joachimiak, Aled Edwards – MCSG

(Mid-West Centre for Structural Genomics)

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/

http://www.ebi.ac.uk/msd-srv/msdmotif

http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/
