Exploiting structural and comparative genomics to reveal protein functions

  • Predicting domain structure families and their domain contexts

  • Exploring how structural divergence in domain families correlates with functional change

  • Predicting domain relatives likely to have significantly different structures and functions

CATH: domain families of known structure

Gene3D: protein families and domain annotations for completed genomes



Congratulations Swiss-Prot - 20 Years!!

Thanks to Amos, Rolf and the Swiss-Prot Team!!!!


CATH hierarchy (Orengo and Thornton, 1994), 86,000 domains

  • Class (3)

  • Architecture (36)

  • Topology or Fold (1100)

  • Homologous superfamily (2100)
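The four levels above map onto the dotted numeric identifiers that CATH uses for its nodes (Class.Architecture.Topology.Homologous superfamily). The following is a minimal sketch of how such an identifier could be split into its levels and how two domains could be compared by the depth of hierarchy they share. The names `CathNode`, `parse_cath_id` and `shared_levels` are hypothetical, and the two identifiers are illustrative values, not taken from the presentation.

```python
from typing import NamedTuple


class CathNode(NamedTuple):
    """One position in the four-level CATH hierarchy."""
    class_: int        # C: Class
    architecture: int  # A: Architecture
    topology: int      # T: Topology or Fold
    superfamily: int   # H: Homologous superfamily


def parse_cath_id(cath_id: str) -> CathNode:
    """Split a dotted C.A.T.H identifier into its four integer levels."""
    parts = cath_id.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated levels, got {cath_id!r}")
    return CathNode(*(int(p) for p in parts))


def shared_levels(a: CathNode, b: CathNode) -> int:
    """Count how many leading levels (C, then A, then T, then H) two nodes share."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


# Illustrative identifiers only (assumed for this sketch).
first = parse_cath_id("3.30.470.20")
second = parse_cath_id("3.30.1490.20")
print(shared_levels(first, second))  # same class and architecture, different fold
```

Two domains that share all four levels belong to the same homologous superfamily; sharing fewer levels means they diverge higher up the hierarchy.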


Gene3D: Domain annotations in genome sequences

  • >2 million protein sequences from 300 completed genomes and UniProt

  • scan against library of HMM models (~2100 CATH, ~8300 Pfam)

  • assign domains to CATH and Pfam superfamilies

Benchmarking by structural data shows that 76% of remote homologues can be identified using the HMMs


Gene3D: Domain annotations in genome sequences

DomainFinder: structural domains from CATH take precedence

[Figure: a UniProt sequence (N to C terminus) with candidate matches CATH-1, Pfam-1, Pfam-2 and NewFam, resolved into the assigned domains]
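The precedence rule on this slide can be illustrated with a small sketch. The slide does not describe how DomainFinder resolves competing matches, so the greedy scheme below is a stand-in and not the actual algorithm: hits are ranked with CATH first, then by score, and a hit is kept only if it does not overlap one already kept. The `Hit` class, the `PRIORITY` table (including Pfam ranking ahead of NewFam), the overlap test and all coordinates and scores are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    family: str   # label of the matched family, e.g. "CATH-1"
    source: str   # "CATH", "Pfam" or "NewFam"
    start: int    # first residue covered (inclusive)
    end: int      # last residue covered (inclusive)
    score: float  # match score, higher is better (assumed)


# CATH first, as stated on the slide; Pfam before NewFam is an assumption.
PRIORITY = {"CATH": 0, "Pfam": 1, "NewFam": 2}


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def assign_domains(hits: list[Hit]) -> list[Hit]:
    """Greedily keep hits in priority order, skipping any that overlap a kept hit."""
    ranked = sorted(hits, key=lambda h: (PRIORITY[h.source], -h.score))
    kept: list[Hit] = []
    for hit in ranked:
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


# Toy hits on one sequence (invented values).
hits = [
    Hit("Pfam-1", "Pfam", 1, 120, 95.0),
    Hit("CATH-1", "CATH", 100, 220, 60.0),
    Hit("Pfam-2", "Pfam", 230, 340, 80.0),
    Hit("NewFam", "NewFam", 350, 420, 10.0),
]
assigned = assign_domains(hits)
print([h.family for h in assigned])
```

In this toy input the CATH hit displaces the overlapping Pfam hit even though the Pfam hit scores higher, which is the behaviour the slide title describes.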


Domain families ranked by size (number of domain sequences)

[Figure: percentage of all domain family sequences in UniProt against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure and NewFam families of unknown structure (>50,000 families)]

  • >90% of domain sequences in UniProt can be assigned to ~7000 domain families

  • 100 largest families of known structure account for 30% of domain sequences in UniProt
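Both statements are cumulative-coverage figures read off a ranked list of family sizes. As a rough sketch of the arithmetic, the functions below rank families by size and report how many are needed to reach a target fraction of sequences, and what fraction the top n families cover. The function names and the family sizes are invented for illustration; they are not the UniProt data.

```python
def families_needed(family_sizes, target_fraction):
    """Number of largest families needed to cover target_fraction of all sequences."""
    ranked = sorted(family_sizes, reverse=True)
    total = sum(ranked)
    covered = 0
    for rank, size in enumerate(ranked, start=1):
        covered += size
        if covered >= target_fraction * total:
            return rank
    return len(ranked)


def coverage_of_top(family_sizes, n):
    """Fraction of all sequences that fall in the n largest families."""
    ranked = sorted(family_sizes, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


# Toy family sizes (number of domain sequences per family), 1000 in total.
sizes = [400, 250, 150, 120, 40, 20, 10, 5, 3, 2]
print(families_needed(sizes, 0.9))
print(coverage_of_top(sizes, 2))
```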


Correlation of sequence and structural variability of CATH families with the number of different functional groups

[Figure labels: structural diversity; population in genomes]



Some superfamilies show great structural diversity

Gabrielle Reeves, J. Mol. Biol. (2006)

Multiple structural alignment by CORA allows identification of consensus secondary structures and secondary structure embellishments

2DSEC algorithm

In 117 superfamilies, relatives have expanded by 2-fold or more



Structural embellishments can modify the active site

Galectin binding superfamily


Structural embellishments can modulate domain interactions

[Figure: glucose 6-phosphate dehydrogenase and dihydrodipicolinate reductase, with panels labelled side orientation and face orientation]

Additional secondary structures shown at (a) are involved in subunit interactions


Structural embellishments can modify function by modifying active site geometry and mediating new domain and subunit interactions

[Figure: ATP Grasp superfamily relatives biotin carboxylase and D-alanine-D-alanine ligase; dimer of biotin carboxylase]



Secondary structure insertions are distributed along the chain but aggregate in 3D




[Figure: frequency (%) against size of indel (number of secondary structures, 1 to 12); bars marked "Indel frequency < 1 %" are labelled 0.85%, 0.38%, 0.23%, 0.11%, 0.06%, 0.02%]

  • 85% of insertions comprise only 1 or 2 secondary structures

  • For ~70% of domains analysed, 80% of the secondary structure embellishments are co-located in 3D with 3 or more other embellishments

  • In 80% of domains, 1 or more embellishments contact other domains or subunits
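The co-location statistic can be pictured with a short sketch. The slides do not say how "co-located in 3D" was defined, so the version below makes two assumptions: each embellishment is reduced to a single centroid coordinate, and two embellishments count as co-located when their centroids lie within a chosen distance cutoff. The function name, the cutoff and the coordinates are all hypothetical.

```python
import math


def colocated_fraction(centroids, cutoff, min_neighbours=3):
    """Fraction of embellishments with at least min_neighbours other
    embellishments whose centroid lies within cutoff (same units as the coordinates)."""
    if not centroids:
        return 0.0
    hits = 0
    for i, a in enumerate(centroids):
        neighbours = sum(
            1
            for j, b in enumerate(centroids)
            if i != j and math.dist(a, b) <= cutoff
        )
        if neighbours >= min_neighbours:
            hits += 1
    return hits / len(centroids)


# Toy centroids: four embellishments packed together, one far away.
centroids = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (0, 0, 3), (40, 40, 40)]
frac = colocated_fraction(centroids, cutoff=10.0)
print(frac)
```

With these toy coordinates four of the five embellishments each have three close neighbours, so the fraction is 4 out of 5.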


Many structurally diverse superfamilies adopt folds with these regular layered architectures

  • 3 Layer Alpha/Beta Sandwich

  • 2 Layer Alpha/Beta

  • Alpha/Beta Barrel

  • 2 Layer Beta Sandwich



GEMMA – GEne Model and Model Annotation: Algorithm for Predicting Sequence Homologues with Similar Structures and Functions

Each structural superfamily is divided into subfamilies of close sequence relatives predicted to have similar functions (>=60% sequence identity)

Largest 100 CATH families have more than 20,000 subfamilies
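The 60% sequence identity cut-off can be made concrete with a small sketch. The slides do not state how identity is measured, so this version assumes two sequences that are already aligned (gaps written as "-") and counts identical residues over the columns where both have a residue; other denominators are in common use and would give different values. The function names and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_subfamily(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """Apply the >=60% identity cut-off quoted on the slide."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy pre-aligned sequences (invented).
seq_a = "MKV-LAGTDE"
seq_b = "MKVALSGT-E"
print(percent_identity(seq_a, seq_b))
print(same_subfamily(seq_a, seq_b))
```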


GEMMA – Predicting Functional Groups in CATH Superfamilies

  • Divide each structural superfamily into subfamilies of close relatives predicted to have similar function (>60% identity)

  • Build multiple sequence alignments for each subfamily

  • Cluster subfamilies predicted to have similar functions into functional groups
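The final clustering step can be sketched as follows. The slides do not give GEMMA's clustering procedure, so this example substitutes plain single-linkage merging with a union-find structure: any two subfamilies whose pairwise similarity reaches a threshold end up in the same functional group. The function name, the subfamily labels, the scores and the 0.6 threshold are all invented for illustration.

```python
def cluster_subfamilies(names, pair_scores, threshold):
    """Single-linkage grouping: merge subfamilies whose pairwise score >= threshold."""
    parent = {n: n for n in names}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy subfamilies and pairwise similarity scores (hypothetical values).
names = ["sub_A", "sub_B", "sub_C", "sub_D"]
pair_scores = {
    ("sub_A", "sub_B"): 0.85,
    ("sub_A", "sub_C"): 0.35,
    ("sub_A", "sub_D"): 0.20,
    ("sub_B", "sub_C"): 0.30,
    ("sub_B", "sub_D"): 0.25,
    ("sub_C", "sub_D"): 0.90,
}
groups = cluster_subfamilies(names, pair_scores, threshold=0.6)
print(groups)
```

Single linkage is the simplest choice and tends to chain groups together; a stricter linkage rule or a lower threshold would change the number of functional groups produced.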


ATP Grasp Family (192 subfamilies)

[Figure: pairwise comparisons between Pyruvate phosphate dikinase (subfamily 1), Pyruvate phosphate dikinase (subfamily 15) and Succinyl-CoA synthetase (subfamily 22), labelled with the scores below]

  • SSAP score = 68.69, PSS score = 0.375

  • SSAP score = 93.01, PSS score = 0.827

  • SSAP score = 68.32, PSS score = 0.333


Subfamily profiles coloured by residue conservation (red = high, blue = low)

Profiles aligned using profile-profile comparison (MAFFT)

Scorecons (Valdar and Thornton, Profunc)

  • Pyruvate phosphate dikinase vs Pyruvate phosphate dikinase: many fully conserved positions; 6/7 positions are fully conserved; equivalent functions

  • Succinyl-CoA synthetase vs Pyruvate phosphate dikinase: figure labels "Fully conserved positions" and "No fully conserved positions"; different functions
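The comparison of fully conserved positions between two subfamily profiles can be sketched in a few lines. This is not Scorecons and not the profile-profile alignment itself: it assumes the two subfamily alignments have already been placed in a shared column frame, treats a column as fully conserved only when every sequence carries the same residue, and then counts positions conserved with the same residue in both subfamilies. The function names and toy alignments are hypothetical.

```python
def fully_conserved_columns(alignment):
    """Map column index -> residue for columns where every sequence has the same residue."""
    conserved = {}
    for i, column in enumerate(zip(*alignment)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            conserved[i] = column[0]
    return conserved


def shared_conservation(aln_a, aln_b):
    """Return (positions fully conserved with the same residue in both alignments,
    positions fully conserved in at least one alignment)."""
    cons_a = fully_conserved_columns(aln_a)
    cons_b = fully_conserved_columns(aln_b)
    either = set(cons_a) | set(cons_b)
    both = {i for i in set(cons_a) & set(cons_b) if cons_a[i] == cons_b[i]}
    return len(both), len(either)


# Toy alignments for two subfamilies, assumed to share one column frame.
sub_x = ["GKTDLE", "GKSDLE", "GKTDLE"]
sub_y = ["GKADLE", "GKADIE"]
print(shared_conservation(sub_x, sub_y))
```

A high ratio of shared to total conserved positions would point towards equivalent functions, in the spirit of the "6/7 positions" example; a ratio near zero would point towards different functions.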


Performance in Merging Subfamilies into Functional Groups

[Figure: error rate against number of functional groups predicted]

10 enzyme functions have been experimentally identified in this family


GEMMA – Predicting Functional Groups in CATH Superfamilies

  • Subfamilies of close relatives predicted to have similar function (>60% identity) are merged into functional groups within each structural superfamily

  • Benchmarked on 12 large enzyme families in CATH

  • 6-10 fold reduction in the number of functional subfamilies



Summary

  • More than half the domains in UniProt can be assigned to families of known structure

  • Analysis of some very large structural families revealed how secondary structure insertions can modulate functions

  • Functional groups can be identified in diverse families by comparing multiple features (e.g. residue conservation, predicted secondary structure)



CATH

Gene3D

Lesley Greene

Stathis Sidderis

Russell Marsden

Ian Sillitoe

Sarah Addou

Juan Ranea

Tony Lewis

Dave Lee

Ollie Redfern

Alison Cuff

Mark Dibley

Ilhem Diboun

Adam Reid

Corin Yeats

Tim Dallman

http://www.biochem.ucl.ac.uk/bsm/cath_new

MRC, Wellcome Trust, NIH, EU (Biosapiens, Embrace, Enfin), BBSRC

