Exploiting structural and comparative genomics to reveal protein functions

  • Predicting domain structure families and their domain contexts
  • Exploring how structural divergence in domain families correlates with functional change
  • Predicting domain relatives likely to have significantly different structures and functions


Exploiting Structural and Comparative Genomics to Reveal Protein Functions

CATH: Domain families of known structure

Gene3D: Protein families and domain annotations for completed genomes


Congratulations Swiss-Prot - 20 Years!!

Thanks to Amos, Rolf and the Swiss-Prot Team!!!!


CATH hierarchy (Orengo and Thornton, 1994): 86,000 domains

  • Class (3)
  • Architecture (36)
  • Topology or Fold (1100)
  • Homologous superfamily (2100), e.g. H1, H2, H3


Gene3D: Domain annotations in genome sequences

>2 million protein sequences from 300 completed genomes and UniProt are scanned against a library of HMM models (~2100 CATH, ~8300 Pfam) to assign domains to CATH and Pfam superfamilies

Benchmarking by structural data shows that 76% of remote homologues can be identified using the HMMs


DomainFinder: structural domains from CATH take precedence

Gene3D: Domain annotations in genome sequences

[Figure: candidate domain matches (NewFam, Pfam-1, CATH-1, Pfam-2) along a UniProt sequence from N to C terminus, resolved into the assigned domains CATH-1, Pfam-2, Pfam-1 and NewFam.]
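The precedence rule named on this slide can be illustrated with a small sketch. The slides do not describe how DomainFinder actually resolves competing matches, so everything here is an assumption made for illustration: the `Hit` record, the greedy "CATH first, then Pfam, drop anything that overlaps" rule, and the `min_newfam` length cutoff for labelling unassigned stretches as NewFam regions are all hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    source: str  # "CATH" or "Pfam" (labels assumed for this sketch)
    family: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def assign_domains(hits: List[Hit], seq_len: int, min_newfam: int = 30) -> List[Hit]:
    """Greedy resolution: CATH hits are considered before Pfam hits,
    and a hit is kept only if it overlaps nothing already kept.
    Unassigned stretches of at least min_newfam residues become NewFam regions."""
    ranked = sorted(hits, key=lambda h: 0 if h.source == "CATH" else 1)
    kept: List[Hit] = []
    for hit in ranked:
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    kept.sort(key=lambda h: h.start)

    result: List[Hit] = []
    pos, n_new = 1, 0
    for hit in kept:
        if hit.start - pos >= min_newfam:
            n_new += 1
            result.append(Hit("NewFam", f"NewFam-{n_new}", pos, hit.start - 1))
        result.append(hit)
        pos = hit.end + 1
    if seq_len - pos + 1 >= min_newfam:
        n_new += 1
        result.append(Hit("NewFam", f"NewFam-{n_new}", pos, seq_len))
    return result


# Toy data, invented for illustration only.
hits = [
    Hit("Pfam", "Pfam-3", 40, 110),   # overlaps the CATH hit, so it is dropped
    Hit("CATH", "CATH-1", 50, 120),
    Hit("Pfam", "Pfam-2", 130, 220),
    Hit("Pfam", "Pfam-1", 230, 300),
]
assigned = assign_domains(hits, seq_len=400)
for h in assigned:
    print(h.family, h.start, h.end)
```

A real pipeline would likely also weigh match scores and tolerate small overlaps; the sketch ignores both to keep the precedence idea visible.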


Domain families ranked by size (number of domain sequences)

[Figure: percentage of all domain family sequences in UniProt plotted against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure, and NewFam families of unknown structure (>50,000 families).]

>90% of domain sequences in UniProt can be assigned to ~7000 domain families


Domain families ranked by size (number of domain sequences)

[Figure: percentage of all domain family sequences in UniProt plotted against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure, and NewFam families of unknown structure (>50,000 families).]

100 largest families of known structure account for 30% of domain sequences in UniProt
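The coverage statements on these two slides come from ranking families by size and accumulating the share of domain sequences they contain. A minimal sketch of that arithmetic follows; the family names and sizes are toy values invented for illustration, not Gene3D data.

```python
from itertools import accumulate
from typing import Dict, List


def cumulative_coverage(family_sizes: Dict[str, int]) -> List[float]:
    """Percentage of all domain sequences covered by the top-k families,
    for k = 1..N, with families ranked from largest to smallest."""
    sizes = sorted(family_sizes.values(), reverse=True)
    total = sum(sizes)
    return [100.0 * running / total for running in accumulate(sizes)]


def families_needed(coverage: List[float], target_pct: float) -> int:
    """Smallest number of top-ranked families that reaches target_pct."""
    for rank, pct in enumerate(coverage, start=1):
        if pct >= target_pct:
            return rank
    raise ValueError("target coverage is never reached")


# Toy family sizes (hypothetical).
toy_sizes = {"famA": 500, "famB": 300, "famC": 100, "famD": 60, "famE": 40}
coverage = cumulative_coverage(toy_sizes)
print(coverage)
print(families_needed(coverage, 90.0))
```

With real data the same two functions would answer both questions asked on the slides: how many families are needed for 90% coverage, and what share the 100 largest families account for (`coverage[99]`).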


Correlation of sequence and structural variability of CATH families with the number of different functional groups

[Figure: axes labelled Structural Diversity and Population in genomes.]


Some superfamilies show great structural diversity

Gabrielle Reeves, J. Mol. Biol. (2006)

Multiple structural alignment by CORA allows identification of consensus secondary structures and secondary structure embellishments

2DSEC algorithm

In 117 superfamilies, relatives have expanded 2-fold or more
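To make the distinction between consensus secondary structures and embellishments concrete, here is a minimal sketch. It is not the 2DSEC algorithm, whose details are not given on the slide. It assumes, purely for illustration, that a structural alignment has already been reduced to one row per relative with one column per aligned secondary structure slot (`H` helix, `E` strand, `-` absent), and that a slot counts as consensus when at least `min_fraction` of relatives have an element there.

```python
from typing import List


def consensus_slots(rows: List[str], min_fraction: float = 0.75) -> List[int]:
    """Columns where at least min_fraction of relatives have an element."""
    n = len(rows)
    width = len(rows[0])
    return [
        col for col in range(width)
        if sum(row[col] != "-" for row in rows) / n >= min_fraction
    ]


def embellishments(row: str, consensus: List[int]) -> List[int]:
    """Columns where this relative has an element outside the consensus."""
    core = set(consensus)
    return [col for col, state in enumerate(row) if state != "-" and col not in core]


# Toy alignment of four relatives over six slots (hypothetical data).
rows = [
    "EHE-E-",
    "EHEHE-",
    "EHE-EH",
    "EHEHEH",
]
core = consensus_slots(rows)
extra_per_relative = [embellishments(row, core) for row in rows]
print(core)
print(extra_per_relative)
```

The 0.75 threshold is an arbitrary choice for the toy data; a relative's degree of expansion could then be summarised by comparing its element count with the size of the consensus core.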


Structural embellishments can modify the active site

Galectin binding superfamily


Structural embellishments can modulate domain interactions

[Figure: Glucose 6-phosphate dehydrogenase and Dihydrodipicolinate reductase; panel labels: side orientation, face orientation, (a).]

Additional secondary structures shown at (a) are involved in subunit interactions


Structural embellishments can modify function by modifying active site geometry and mediating new domain and subunit interactions

[Figure: ATP Grasp superfamily: Biotin carboxylase, D-alanine-D-alanine ligase, and a dimer of biotin carboxylase.]




... chain but aggregate in 3D

[Figure: histogram of Frequency (%) against Size of Indel / Size of insertion (number of secondary structures, 1 to 12). Bars with indel frequency < 1% are labelled 0.85%, 0.38%, 0.23%, 0.11%, 0.06% and 0.02%.]

85% of insertions comprise only 1 or 2 secondary structures

For ~70% of domains analysed, 80% of the secondary structure embellishments are co-located in 3D with 3 or more other embellishments

In 80% of domains, 1 or more embellishments contact other domains or subunits


  • 3 Layer Alpha/Beta Sandwich
  • 2 Layer Alpha/Beta
  • Alpha/Beta Barrel
  • 2 Layer Beta Sandwich

Many structurally diverse superfamilies adopt folds with these regular layered architectures




GEMMA – GEne Model and Model Annotation

Algorithm for Predicting Sequence Homologues with Similar Structures and Functions

[Figure: a structural superfamily divided into subfamilies of close sequence relatives predicted to have similar functions (>=60% sequence identity).]

Largest 100 CATH families have more than 20,000 subfamilies
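The subfamily step can be pictured as grouping sequences that are linked by at least 60% sequence identity. The slide does not say which clustering method is used, so the single-linkage grouping below is an assumption, as are the helper names and the toy sequences; the sequences are also assumed to be pre-aligned to equal length.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def subfamilies(seqs: Dict[str, str], threshold: float = 0.60) -> List[List[str]]:
    """Single-linkage clustering with a union-find structure:
    two sequences end up together if a chain of pairs at or above
    the identity threshold connects them."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy pre-aligned sequences (hypothetical).
toy = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQL",
    "s3": "MKTGYLAKHL",
    "s4": "GSHMLEDPVV",
}
clusters = subfamilies(toy)
print(clusters)
```

The all-against-all loop is quadratic, which is fine for a toy set but would need a faster comparison strategy at the scale of the superfamilies described here.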


GEMMA – Predicting Functional Groups in CATH Superfamilies

[Figure: structural superfamily containing subfamilies of close relatives predicted to have similar function (>60% identity).]

Build multiple sequence alignments for each subfamily


GEMMA – Predicting Functional Groups in CATH Superfamilies

[Figure: structural superfamily containing subfamilies of close relatives predicted to have similar function (>60% identity).]

Cluster subfamilies predicted to have similar functions into functional groups


ATP Grasp Family: 192 subfamilies

[Figure: Pyruvate phosphate dikinase (subfamily 1), Pyruvate phosphate dikinase (subfamily 15) and Succinyl-CoA synthetase (subfamily 22), with comparison scores as labelled on the slide:]

  • SSAP score = 68.69, PSS score = 0.375
  • SSAP score = 93.01, PSS score = 0.827
  • SSAP score = 68.32, PSS score = 0.333


Subfamily profiles coloured by residue conservation (red = high, blue = low)

Profiles aligned using profile-profile comparison (MAFFT)

Pyruvate phosphate dikinase vs. Pyruvate phosphate dikinase

  • Many fully conserved positions
  • 6/7 positions are fully conserved
  • Equivalent functions

Scorecons (Valdar and Thornton, Profunc)


Subfamily profiles coloured by residue conservation (red = high, blue = low)

Profiles aligned using profile-profile comparison (MAFFT)

Succinyl-CoA synthetase vs. Pyruvate phosphate dikinase

  • Fully conserved positions
  • No fully conserved positions
  • Different functions

Scorecons (Valdar and Thornton, Profunc)
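The comparison on these two slides asks whether positions that are fully conserved in one subfamily are also fully conserved, with the same residue, in another. The sketch below shows that idea only. It does not reproduce Scorecons, which gives graded conservation scores, and it assumes for illustration that the two subfamily alignments already share a common column numbering (for example after a profile-profile alignment). All sequences are invented.

```python
from typing import Dict, List, Tuple


def fully_conserved(alignment: List[str]) -> Dict[int, str]:
    """Map column index -> residue for columns where every sequence
    has the same non-gap residue."""
    conserved: Dict[int, str] = {}
    for col, residues in enumerate(zip(*alignment)):
        first = residues[0]
        if first != "-" and all(r == first for r in residues):
            conserved[col] = first
    return conserved


def shared_conserved(a: List[str], b: List[str]) -> Tuple[int, int]:
    """Return (number of columns fully conserved in a that are also fully
    conserved in b with the same residue, number fully conserved in a)."""
    cons_a, cons_b = fully_conserved(a), fully_conserved(b)
    shared = [c for c in cons_a if cons_b.get(c) == cons_a[c]]
    return len(shared), len(cons_a)


# Toy subfamily alignments on a shared column numbering (hypothetical).
sub_a = ["GKSTLA", "GKSTMA", "GKSTLV"]
sub_b = ["GKSTIA", "GKSTIA"]
sub_c = ["AKDELA", "GRDQLA"]

same_function_like = shared_conserved(sub_a, sub_b)
different_function_like = shared_conserved(sub_a, sub_c)
print(same_function_like, different_function_like)
```

A high shared fraction, as in the first pair, is the pattern the slides associate with equivalent functions; little or no sharing, as in the second pair, is the pattern associated with different functions.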


Performance in Merging Subfamilies into Functional Groups

[Figure: axes labelled Number of functional groups predicted and Error rate.]

10 enzyme functions have been experimentally identified in this family


GEMMA – Predicting Functional Groups in CATH Superfamilies

[Figure: structural superfamily in which subfamilies of close relatives predicted to have similar function (>60% identity) are merged into functional groups.]

Benchmarked on 12 large enzyme families in CATH

6-10 fold reduction in the number of functional subfamilies


Summary

  • More than half the domains in UniProt can be assigned to families of known structure

  • Analysis of some very large structural families revealed how secondary structure insertions can modulate functions

  • Functional groups can be identified in diverse families by comparing multiple features (e.g. residue conservation, predicted secondary structure)


CATH

Gene3D

Lesley Greene

Stathis Sidderis

Russell Marsden

Ian Sillitoe

Sarah Addou

Juan Ranea

Tony Lewis

Dave Lee

Ollie Redfern

Alison Cuff

Mark Dibley

Ilhem Diboun

Adam Reid

Corin Yeats

Tim Dallman

http://www.biochem.ucl.ac.uk/bsm/cath_new

MRC, Wellcome Trust, NIH, EU -Biosapiens, Embrace, Enfin, BBSRC

