Quality and effectiveness of protein structure models
Anna.Tramontano@uniroma1.it, DIMACS 2006

Presentation Transcript
Slide 1

Quality and effectiveness of protein structure models

DIMACS 2006

Anna.Tramontano@uniroma1.it


Slide 2

The paradigm

Sequence → Molecular structure → Molecular function


Slide 3

Detecting homology


Slide 4

Proteins evolve

[Plot: r.m.s.d. (axis ticks 0.00 to 3.50) and fraction sequence identity (axis ticks 0 to 1.0), after structural superposition]

$\mathrm{r.m.s.d.} = \left[ \frac{1}{N} \sum_{i=1}^{N} d_i^2 \right]^{1/2}$

Chothia and Lesk, EMBO J., 1986
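
The r.m.s.d. formula on this slide can be illustrated with a minimal sketch. It assumes the two structures have already been superposed and their equivalent atoms paired. The function name `rmsd` and the toy coordinates are illustrative and do not come from the talk.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # d_i^2 is the squared distance between the i-th pair of equivalent atoms.
    squared = [sum((p - q) ** 2 for p, q in zip(a, b))
               for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example (made-up coordinates in Å): every atom is displaced by 1 Å.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(model, target))
```

The sketch leaves out the superposition step itself, which has to be performed before the distances are meaningful.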


Slide 5

AVGIFRAAVCTRGVAKAVDFVP

+

AVGIFRAAVCTRGVAKAVDFVP
| || | | || |||||  ||
AIGIWRSATCTKGVAKA--FVA

Comparative modelling

If the alignment is correct, we can use the Chothia and Lesk relationship to predict the expected quality of the model.
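
As a hedged illustration of this idea, the sketch below computes the fraction of identical residues in the alignment shown on the slide and feeds it into an exponential fit of the form r.m.s.d. = 0.40 exp(1.87 H), where H is the fraction of non-identical residues. That form and its coefficients are the commonly quoted version of the Chothia and Lesk relationship and are an assumption here, to be checked against the 1986 paper. The function names are hypothetical.

```python
import math

def fraction_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over aligned (non-gap) positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned positions")
    return sum(a == b for a, b in pairs) / len(pairs)

def expected_core_rmsd(identity, a=0.40, b=1.87):
    """Assumed Chothia-Lesk style fit: rmsd = a * exp(b * H), H = 1 - identity."""
    return a * math.exp(b * (1.0 - identity))

# Alignment taken from the slide.
target = "AVGIFRAAVCTRGVAKAVDFVP"
template = "AIGIWRSATCTKGVAKA--FVA"
identity = fraction_identity(target, template)
print(f"identity = {identity:.2f}, "
      f"expected r.m.s.d. ~ {expected_core_rmsd(identity):.2f} Å")
```

For this alignment 14 of the 20 aligned positions are identical, so the identity is 0.70. The estimate only holds under the slide's condition that the alignment is correct.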


Slide 6

AVGIFRAAVCTRGVAKAVDFVPVESMETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGAPVTYSTYGKFLADGGCSGGAYDIIICDECHSTDSTTILGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSNTGEIP

Fold recognition

Score and select model

Orengo, Curr. Opin. Struct. Biol., 1994


Slide 7

AVGIFRAAVCTRGVAKAVDFVP…

AVGIFR

AAVCTR

GVAKAVDF

Fragment based

Bystroff and Baker, JMB, 1998


Slide 11

AVGIFRAAVCTRGVAKAVDFVP…

AVGIFR

AAVCTR

GVAKAVDF

Fragment based

Score and select model

Bystroff and Baker, JMB, 1998


Slide 12

AVSRAFT

RAFTAAF

DGHTYIPK

CASP: Critical assessment of techniques for protein structure prediction

The evaluation

Moult et al., Proteins, 1995


Slide 13

[Chart with series Groups, Targets and Models; one axis runs from 1 to 6; value scales 0 to 300, 0 to 70 and 0 to 30000]

The evaluation

Tramontano, NSB, 2003


Slide 14

[Plot: Max P.AL0 for the casp4, casp5 and casp6 series; axis ranges 20 to 120 and 0 to 80]

CASP4, CASP5, CASP6: Best models

The evaluation

Cozzetto and Tramontano, Proteins, 2004


Slide 15

http://predictioncenter.gov

State of the art

Moult et al., Proteins, 2005.


Slide 16

http://www.caspur.it/PMDB

State of the art

Castrignanò et al., NAR, 2006.


Slide 17

Structural genomics


Slide 18

Diffraction data measurements

Protein crystallization

Protein preparation

Phase estimation

Model building

Molecular replacement


Slide 19

Rotation search

?

Translation search

Model

Molecular replacement


Slide 20

Completely automatic procedure:

CASP Models

MolRep (10x10)

AMoRe (20)

RefMac (10)

ArpWarp

Molecular replacement


Slide 21

[Plot residue: axis ticks 40 to 100]

GDT-TS (distance based measure)

$\mathrm{GDT\text{-}TS} = \left[ N_{CA}(1\,\text{\AA}) + N_{CA}(2\,\text{\AA}) + N_{CA}(4\,\text{\AA}) + N_{CA}(8\,\text{\AA}) \right] / 4$

Molecular replacement

Giorgetti et al., Bioinformatics, 2005
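
The GDT-TS average can be sketched as follows. This is a simplified illustration, not the assessment software: it assumes a single fixed superposition and a precomputed list of Cα distances between model and target, whereas the full measure searches over superpositions for each cutoff. The name `gdt_ts`, its parameters and the use of "less than or equal" at the cutoffs are assumptions.

```python
def gdt_ts(ca_distances, n_target_residues=None, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS style score (0 to 100) from per-residue CA distances in Å."""
    n = n_target_residues if n_target_residues is not None else len(ca_distances)
    if n <= 0:
        raise ValueError("target must contain at least one residue")
    # N_CA(c): percentage of target CA atoms within c Å of their model position.
    percentages = [100.0 * sum(d <= c for d in ca_distances) / n
                   for c in cutoffs]
    return sum(percentages) / len(cutoffs)

# Made-up distances for a five-residue toy model.
distances = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(distances))
```

Passing `n_target_residues` separately lets residues missing from the model count against the score, since they contribute to the denominator but to none of the cutoffs.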


Slide 22

What if we don’t know the quality of the model?

What if we don’t know how to build models?

Molecular replacement

Giorgetti et al., submitted


Slide 23

ACTFGARTEADEASRTFCGAVHI

GFRLPMNHTYWPLYHMVCS…

Structure factors

Molecular replacement

Giorgetti et al., submitted


Slide 24

60% success rate

Molecular replacement


Slide 25

60% success rate

If one of the retrieved models works, the procedure is successful

Molecular replacement


Slide 26

Function prediction

biological: blood coagulation

molecular: catalytic activity

cellular: extracellular


Slide 27

?

AVSRAFT

RAFTAAF

DGHTYIPK

The experiment

Moult et al., Proteins, 1995


Slide 28

Scheme of the experiment

Collect known info on targets

Ask people to provide ADDITIONAL information

Compare predictions

Is there a consensus?

Once the structure is known, can we say more?

Function prediction


Slide 29

EC Number

Binding

Binding site(s)

Residue role(s)

PT modifications

Free text comments

Function prediction

Soro and Tramontano, Proteins, 2005


Slide 30

We had too few predictions per target to derive any sensible conclusion.

However, for the sake of the experiment, we tried to see what we could do and what problems would arise in analysing the data (other than the format), pretending that the numbers were significant.

Function prediction


Slide 31

Summary table for target T0230

Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)

Predictions:

GO number    GO name                                                   Frequency
287          magnesium ion binding                                     1
4176         ATP-dependent peptidase activity                          1
(missing)    mannose-1-phosphate guanylyltransferase activity          1
4672         protein kinase activity                                   1
5094         Rho GDP-dissociation inhibitor activity                   1
5554         Molecular function unknown                                1
6812         PROCESS                                                   (1)
6825         PROCESS                                                   (1)
8170         N-methyltransferase activity                              1
16822        hydrolase activity, acting on acid carbon-carbon bonds    1
46872        metal ion binding                                         1

Function prediction


Slide 32

Summary table for target T0230

Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)

Predictions:

GO number    GO name                                                   Frequency    GO parents
287          magnesium ion binding                                     1            46872, 43167, 5488
4176         ATP-dependent peptidase activity                          1            8233, 16787, 3824
(missing)    mannose-1-phosphate guanylyltransferase activity          1            8905, 16779, 16772, 16740, 3824
4672         protein kinase activity                                   1            16773, 16772, 16740 (16301), 3824
5094         Rho GDP-dissociation inhibitor activity                   1            5092, 5083, 30695, 30234
8170         N-methyltransferase activity                              1            8168, 16741, 16740, 3824
16822        hydrolase activity, acting on acid carbon-carbon bonds    1            16787, 3824
46872        metal ion binding                                         1            43167, 5488

Function prediction


Slide 34

Summary table for target T0230 (same table as on slide 32), with these GO parents singled out:

16787 hydrolase activity

16740 transferase activity

3824 catalytic activity

Function prediction
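
One way to look for a consensus among predictions that never agree on the exact term is to count how often each GO parent recurs. The sketch below does this for the GO parents column of the T0230 summary table. It is an illustration under stated assumptions, not the procedure used in the experiment: the names `parent_consensus` and `min_support` are hypothetical, the parenthesised 16301 is treated as an ordinary parent, and predictions are keyed by name because one GO number is missing from the table.

```python
from collections import Counter

# GO parents per prediction, transcribed from the T0230 summary table.
PARENTS = {
    "magnesium ion binding": [46872, 43167, 5488],
    "ATP-dependent peptidase activity": [8233, 16787, 3824],
    "mannose-1-phosphate guanylyltransferase activity":
        [8905, 16779, 16772, 16740, 3824],
    "protein kinase activity": [16773, 16772, 16740, 16301, 3824],
    "Rho GDP-dissociation inhibitor activity": [5092, 5083, 30695, 30234],
    "N-methyltransferase activity": [8168, 16741, 16740, 3824],
    "hydrolase activity, acting on acid carbon-carbon bonds": [16787, 3824],
    "metal ion binding": [43167, 5488],
}

def parent_consensus(parents_by_prediction, min_support=2):
    """Return (GO parent, count) pairs shared by at least min_support predictions."""
    counts = Counter()
    for parents in parents_by_prediction.values():
        counts.update(set(parents))  # count each parent once per prediction
    return [(go, n) for go, n in counts.most_common() if n >= min_support]

consensus = parent_consensus(PARENTS)
for go, n in consensus:
    print(go, n)
```

On this table the three parents singled out on the slide (3824, 16740 and 16787) are supported by five, three and two predictions respectively.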


Slide 35

Results: GO consensus

Function prediction

Soro and Tramontano, Proteins, 2005


Slide 36

18 months later…

Annotations in DB decreased by 5%

24 new targets were annotated

We looked at methods (abstracts, directly contacting predictors, literature)

Function prediction


Slide 37

[Figure: diagram labelled with five-digit binary codes (11011, 10011, 10100, 10001, 10000, 11100, 10101, 11001) and small counts; layout not recoverable]

Function prediction


Slide 38

18 months later…

4 newly annotated targets had been correctly predicted by at least one method

85% of the consensus non-redundant predictions were correct

Function prediction


Slide 39

Results: GO consensus

Function prediction

Soro and Tramontano, Proteins, 2005


Slide 40

Function prediction

[Figure not recoverable]


Slide 41

CASP is about to start again:

We will start collecting targets next week

There will be a few differences

http://predictioncenter.org

Announcements


Slide 42

Claudia Bonaccini

Michele Ceriani

Domenico Cozzetto

Emanuela Giombini

Alejandro Giorgetti

Paolo Marcatili

Veronica Morea

Romina Oliva

Massimiliano Orsini

Marialuisa Pellegrini

Domenico Raimondo

Simonetta Soro

Ivano Talamo

Krzysztof Fidelis

Tim Hubbard

Andriy Kryshtafovych

John Moult

Burkhard Rost

Adam Zemla

Structural biologists

Predictors

Acknowledgements

BioSapiens - EU VI Framework

Ministero della Salute

Università di Roma

Istituto Pasteur Roma

Facoltà di Medicina

San Paolo

CNR

