Exploiting Structural and Comparative Genomics to Reveal Protein Functions
Presentation Transcript
Slide 1

Exploiting Structural and Comparative Genomics to Reveal Protein Functions

  • Predicting domain structure families and their domain contexts
  • Exploring how structural divergence in domain families correlates with functional change
  • Predicting domain relatives likely to have significantly different structures and functions

CATH: domain families of known structure

Gene3D: protein families and domain annotations for completed genomes

Slide 2

Congratulations to Swiss-Prot on 20 years!

Thanks to Amos, Rolf and the Swiss-Prot team!

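Slide 3

The CATH hierarchy: 86,000 domains (number of groups at each level in parentheses)

  • C: Class (3)
  • A: Architecture (36)
  • T: Topology or fold (1100)
  • H: Homologous superfamily (2100), illustrated by superfamilies H1, H2 and H3

Orengo and Thornton (1994)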
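Each CATH superfamily is named by a dotted code that walks down the hierarchy, Class.Architecture.Topology.Homologous superfamily (for example 3.40.50.300). As a minimal sketch of how domain assignments roll up the four levels, the Python below counts the distinct groups at each level. The six codes are an illustrative sample, not the 86,000-domain dataset.

```python
from collections import defaultdict

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def count_groups(cath_codes):
    """Count distinct CATH nodes at each level for a list of domain codes.

    Each code is a dotted string 'C.A.T.H'; the node at level k is the
    prefix made of the first k+1 fields, so '3.40' is an architecture
    and '3.40.50' is a topology within it.
    """
    nodes = defaultdict(set)
    for code in cath_codes:
        fields = code.split(".")
        if len(fields) != 4:
            raise ValueError(f"expected a C.A.T.H code, got {code!r}")
        for depth, level in enumerate(LEVELS):
            nodes[level].add(".".join(fields[: depth + 1]))
    return {level: len(nodes[level]) for level in LEVELS}


# Illustrative superfamily codes for six domains (hypothetical sample).
domains = ["3.40.50.300", "3.40.50.720", "3.40.50.300",
           "1.10.10.10", "2.60.40.10", "3.30.70.270"]
print(count_groups(domains))
# {'Class': 3, 'Architecture': 4, 'Topology': 4, 'Homologous superfamily': 5}
```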

Slide 4

Gene3D: domain annotations in genome sequences

  • >2 million protein sequences from 300 completed genomes and UniProt are scanned against a library of HMMs (~2100 CATH, ~8300 Pfam)
  • Domains are assigned to CATH and Pfam superfamilies
  • Benchmarking with structural data shows that 76% of remote homologues can be identified using the HMMs
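The benchmark figure is a recall: of the test domains whose superfamily is known from structure, how many does the HMM scan place correctly? A minimal sketch, assuming each domain's scan results are available as (superfamily, E-value) pairs and that the best hit under an E-value cutoff is taken as the assignment. The cutoff, domain IDs and hits are all hypothetical, and this is not Gene3D's actual benchmarking protocol.

```python
def remote_homologue_recall(truth, hits, evalue_cutoff=1e-3):
    """Fraction of benchmark domains whose known superfamily is recovered.

    truth: {domain_id: superfamily assigned from structure}
    hits:  {domain_id: list of (superfamily, evalue) from the HMM scan}
    A domain counts as identified if its lowest-E-value hit that passes
    the cutoff is the correct superfamily. The default cutoff is an
    illustrative choice, not a published Gene3D setting.
    """
    found = 0
    for domain, true_sf in truth.items():
        passing = [(sf, ev) for sf, ev in hits.get(domain, []) if ev <= evalue_cutoff]
        if passing:
            best_sf, _ = min(passing, key=lambda hit: hit[1])
            if best_sf == true_sf:
                found += 1
    return found / len(truth) if truth else 0.0


# Hypothetical benchmark: four domains with structure-based labels.
truth = {"d1": "3.40.50.300", "d2": "1.10.10.10", "d3": "2.60.40.10", "d4": "3.40.50.720"}
hits = {
    "d1": [("3.40.50.300", 1e-20)],
    "d2": [("1.10.10.10", 1e-2)],            # correct, but too weak for the cutoff
    "d3": [("3.40.50.300", 1e-5), ("2.60.40.10", 1e-8)],
    "d4": [],                                 # no hit at all
}
print(remote_homologue_recall(truth, hits))  # 0.5
```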

Slide 5

DomainFinder: structural domains from CATH take precedence

Gene3D: domain annotations in genome sequences

Figure: candidate domain hits (NewFam, Pfam-1, CATH-1, Pfam-2) on a UniProt sequence (N- to C-terminus) and the resulting assigned domains (CATH-1, Pfam-2, Pfam-1, NewFam).
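The precedence rule can be illustrated with a greedy resolver: hits are visited in source order (CATH first, then Pfam, then NewFam) and a hit is kept only if it does not overlap a domain already kept. This is a simplified sketch under that assumption, not DomainFinder's actual algorithm, which may, for instance, tolerate small overlaps or rank hits by score; the family names and residue ranges are hypothetical.

```python
# Lower number = higher precedence (assumed ordering of sources).
PRIORITY = {"CATH": 0, "Pfam": 1, "NewFam": 2}


def overlaps(a, b):
    """True if inclusive residue ranges a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def resolve_domains(hits):
    """Pick non-overlapping domain hits, letting CATH hits take precedence.

    hits: list of (source, family, start, end), residues inclusive.
    Hits are considered in priority order and, within a source, from the
    N-terminus; a hit is kept only if it overlaps no hit already kept.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: (PRIORITY[h[0]], h[2])):
        if all(not overlaps((hit[2], hit[3]), (k[2], k[3])) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h[2])  # report in N-to-C order


hits = [
    ("NewFam", "NewFam-7", 1, 60),
    ("Pfam", "Pfam-1", 40, 150),    # overlaps CATH-1, so it is dropped
    ("CATH", "CATH-1", 130, 300),
    ("Pfam", "Pfam-2", 310, 420),
]
print([h[1] for h in resolve_domains(hits)])  # ['NewFam-7', 'CATH-1', 'Pfam-2']
```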

Slide 6

Domain families ranked by size (number of domain sequences)

Figure: percentage of all domain family sequences in UniProt plotted against rank by family size, for CATH superfamilies of known structure, Pfam families of unknown structure and NewFam families of unknown structure (>50,000 families).

>90% of domain sequences in UniProt can be assigned to ~7000 domain families
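Both headline numbers on these ranked-size plots (how many families are needed to reach 90% of domain sequences, and the share held by the 100 largest families) come from ranking families by size and summing from the top. A minimal sketch with hypothetical family sizes:

```python
def families_needed(sizes, target=0.9):
    """Smallest number of families, taken largest first, covering `target` of all sequences."""
    ranked = sorted(sizes, reverse=True)
    total = sum(ranked)
    covered = 0
    for rank, size in enumerate(ranked, start=1):
        covered += size
        if covered >= target * total:
            return rank
    return len(ranked)


def top_n_share(sizes, n):
    """Fraction of all domain sequences held by the n largest families."""
    ranked = sorted(sizes, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


# Hypothetical family sizes (domain sequences per family), in arbitrary order.
sizes = [30, 500, 5, 100, 300, 10, 50, 5]
print(families_needed(sizes, 0.9))  # 3
print(top_n_share(sizes, 2))        # 0.8
```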

Slide 7

Domain families ranked by size (number of domain sequences)

Figure: the same plot as slide 6 (CATH superfamilies of known structure; Pfam and NewFam families of unknown structure).

The 100 largest families of known structure account for 30% of domain sequences in UniProt

Slide 8

Correlation of sequence and structural variability of CATH families with the number of different functional groups

Figure labels: structural diversity; population in genomes
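The slide does not say which statistic measures the correlation; a rank correlation such as Spearman's rho is one reasonable choice when family populations are highly skewed. A self-contained sketch (scipy.stats.spearmanr gives the same value), with made-up diversity scores and functional-group counts for five hypothetical families:

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-family values.
diversity = [1.0, 2.5, 3.0, 4.2, 5.1]   # e.g. a structural diversity score
groups = [1, 3, 2, 6, 9]                # number of functional groups
print(spearman(diversity, groups))      # about 0.9
```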


Slide 10

Some superfamilies show great structural diversity

Gabrielle Reeves, J. Mol. Biol. (2006)

Multiple structural alignment by CORA allows identification of consensus secondary structures and secondary structure embellishments (2DSEC algorithm)

In 117 superfamilies, relatives have expanded by more than twofold
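Separating a conserved core from embellishments can be sketched once a multiple structural alignment has mapped each relative's secondary structure elements (SSEs) onto shared alignment positions. Here an aligned SSE position counts as consensus if at least 80% of relatives have it; that threshold, the set-of-positions input and measuring size as an SSE count are assumptions for illustration, and this is not the CORA or 2DSEC implementation. The relatives are hypothetical.

```python
from collections import Counter


def consensus_and_embellishments(sse_sets, min_fraction=0.8):
    """Split aligned SSE positions into a consensus core and per-relative embellishments.

    sse_sets: {relative: set of aligned SSE positions present in that relative}
    A position is consensus if at least `min_fraction` of relatives have it
    (illustrative threshold); every other SSE a relative carries is one of
    its embellishments.
    """
    n = len(sse_sets)
    counts = Counter(pos for sses in sse_sets.values() for pos in sses)
    consensus = {pos for pos, c in counts.items() if c / n >= min_fraction}
    embellishments = {rel: sorted(sses - consensus) for rel, sses in sse_sets.items()}
    return consensus, embellishments


def size_ratio(sse_sets):
    """Ratio of the largest to the smallest relative, counted in SSEs."""
    sizes = [len(s) for s in sse_sets.values()]
    return max(sizes) / min(sizes)


relatives = {
    "relA": {1, 2, 3, 4},
    "relB": {1, 2, 3, 4, 5, 6},
    "relC": {1, 2, 3, 4, 5, 6, 7, 8, 9},
}
core, extras = consensus_and_embellishments(relatives)
print(sorted(core))           # [1, 2, 3, 4]
print(extras["relC"])         # [5, 6, 7, 8, 9]
print(size_ratio(relatives))  # 2.25, i.e. more than twofold
```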

Slide 12

Structural embellishments can modulate domain interactions

Figure: glucose 6-phosphate dehydrogenase and dihydrodipicolinate reductase, shown in side and face orientations.

The additional secondary structures marked (a) are involved in subunit interactions.

Slide 13

Structural embellishments can modify function by altering active-site geometry and mediating new domain and subunit interactions

Figure: biotin carboxylase and D-alanine-D-alanine ligase (ATP Grasp superfamily); dimer of biotin carboxylase.

Slide 17

Figure: frequency (%) of insertions by size (number of secondary structures, 1 to 12). Indel sizes with a frequency below 1% are labelled 0.85%, 0.38%, 0.23%, 0.11%, 0.06% and 0.02%.

85% of insertions comprise only 1 or 2 secondary structures

For ~70% of domains analysed, 80% of the secondary structure embellishments are co-located in 3D with 3 or more other embellishments

In 80% of domains, one or more embellishments contact other domains or subunits
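One way to compute the co-location statistic is to represent each embellishment by the centre of its secondary structures and count near neighbours. The sketch below assumes centroid coordinates in Angstroms and a 10 Angstrom cutoff, both hypothetical choices rather than the published criteria; under those assumptions, a domain meets the slide's criterion when the returned fraction is at least 0.8.

```python
import math


def colocated_fraction(centroids, cutoff=10.0, min_neighbours=3):
    """Fraction of embellishments lying near at least `min_neighbours` others.

    centroids: list of (x, y, z) centres, one per embellishment, in Angstroms.
    Two embellishments count as co-located when their centres are within
    `cutoff` (an illustrative value).
    """
    if not centroids:
        return 0.0
    close = 0
    for i, a in enumerate(centroids):
        neighbours = sum(
            1 for j, b in enumerate(centroids)
            if i != j and math.dist(a, b) <= cutoff
        )
        if neighbours >= min_neighbours:
            close += 1
    return close / len(centroids)


# Hypothetical domain: four clustered embellishments and one isolated one.
points = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4), (50, 50, 50)]
fraction = colocated_fraction(points)
print(fraction, fraction >= 0.8)  # 0.8 True
```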

Slide 18

Many structurally diverse superfamilies adopt folds with these regular layered architectures:

  • 3 Layer Alpha/Beta Sandwich
  • 2 Layer Alpha/Beta
  • Alpha/Beta Barrel
  • 2 Layer Beta Sandwich


Slide 21

GEMMA (GEne Model and Model Annotation): Algorithm for Predicting Sequence Homologues with Similar Structures and Functions

Figure: a structural superfamily divided into subfamilies of close sequence relatives (>=60% sequence identity) predicted to have similar functions.

The 100 largest CATH families contain more than 20,000 subfamilies
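The slide does not say how subfamilies are formed at the 60% identity threshold. A common approach, shown here as an assumption rather than GEMMA's documented procedure, is greedy incremental clustering: sequences are visited longest first, and each joins the first existing representative it matches at >=60% identity, or else founds a new subfamily. The sequences are toy examples, pre-aligned to one length.

```python
def percent_identity(a, b):
    """Identity (%) of two pre-aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_subfamilies(aligned, threshold=60.0):
    """Greedy clustering: each sequence joins the first representative it matches.

    aligned: {name: aligned sequence}. Longer (ungapped) sequences are
    visited first; a sequence matching no representative at >= threshold
    identity becomes the representative of a new subfamily.
    """
    order = sorted(aligned, key=lambda n: len(aligned[n].replace("-", "")), reverse=True)
    clusters = []  # list of (representative name, member names)
    for name in order:
        for rep, members in clusters:
            if percent_identity(aligned[rep], aligned[name]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters


seqs = {
    "s1": "MKVLAAGRTE",
    "s2": "MKVLAAGRSE",   # 90% identical to s1
    "s3": "MRILSTGKDE",   # 40% identical to s1
    "s4": "MRILSTGKD-",   # identical to s3 where both have residues
}
print(greedy_subfamilies(seqs))
# [('s1', ['s1', 's2']), ('s3', ['s3', 's4'])]
```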

Slide 22

GEMMA: Predicting Functional Groups in CATH Superfamilies

Figure: a structural superfamily divided into subfamilies of close relatives predicted to have similar function (>60% identity).

Build multiple sequence alignments for each subfamily

Slide 23

GEMMA: Predicting Functional Groups in CATH Superfamilies

Figure: a structural superfamily divided into subfamilies of close relatives predicted to have similar function (>60% identity).

Cluster subfamilies predicted to have similar functions into functional groups
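Merging subfamilies into functional groups can be sketched as agglomerative clustering: repeatedly merge the two most similar groups until no pair scores above a threshold. In this sketch group similarity is the average of precomputed pairwise subfamily scores (average linkage); the scores, threshold and subfamily names are hypothetical, and GEMMA's own scoring of merged groups is not reproduced here.

```python
from itertools import combinations


def merge_into_groups(similarity, subfamilies, threshold):
    """Agglomeratively merge subfamilies into predicted functional groups.

    similarity: {frozenset({a, b}): score} for every pair of subfamilies,
    higher meaning more alike. The most similar pair of groups is merged
    repeatedly until no pair reaches `threshold`.
    """
    groups = [frozenset([s]) for s in subfamilies]

    def link(g1, g2):
        # Average linkage: mean score over all cross-group subfamily pairs.
        scores = [similarity[frozenset({a, b})] for a in g1 for b in g2]
        return sum(scores) / len(scores)

    while len(groups) > 1:
        (i, j), best = max(
            (((i, j), link(groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


# Hypothetical pairwise scores between three subfamilies.
scores = {
    frozenset({"sf_a", "sf_b"}): 0.9,
    frozenset({"sf_a", "sf_c"}): 0.3,
    frozenset({"sf_b", "sf_c"}): 0.35,
}
print(merge_into_groups(scores, ["sf_a", "sf_b", "sf_c"], threshold=0.5))
# two groups: {sf_c} and {sf_a, sf_b}
```

Raising or lowering the threshold trades the number of predicted groups against the risk of merging subfamilies with different functions.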

Slide 24

ATP Grasp family: 192 subfamilies

Figure: pairwise comparisons between pyruvate phosphate dikinase (subfamilies 1 and 15) and succinyl-CoA synthetase (subfamily 22). Scores shown: SSAP 68.69 with PSS 0.375; SSAP 93.01 with PSS 0.827; SSAP 68.32 with PSS 0.333.

Slide 25

Subfamily profiles coloured by residue conservation (red = high, blue = low), aligned using profile-profile comparison (MAFFT)

Pyruvate phosphate dikinase compared with pyruvate phosphate dikinase: many fully conserved positions; 6/7 positions are fully conserved

Equivalent functions

Scorecons (Valdar and Thornton; ProFunc)
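The comparison on this slide can be approximated with a simple binary test: a column is fully conserved in a subfamily when every sequence has the same residue there, and two profiles agree at a column fully conserved in the first when the second has it fully conserved with the same residue. This is a deliberately simplified stand-in for Scorecons, which uses a graded conservation score; it assumes the two subfamily alignments have already been aligned column-to-column (for example by a profile-profile step), and the sequences are toys.

```python
def conserved_residue(column):
    """Return the residue if every sequence has it at this column (no gaps), else None."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return residues.pop()
    return None


def shared_conserved_positions(profile_a, profile_b):
    """Compare two subfamily alignments already aligned column-to-column.

    Returns (shared, total): `total` counts columns fully conserved in
    profile_a; `shared` counts those also fully conserved, with the same
    residue, in profile_b.
    """
    if len({len(s) for s in profile_a + profile_b}) != 1:
        raise ValueError("all sequences must share one alignment length")
    shared = total = 0
    for col_a, col_b in zip(zip(*profile_a), zip(*profile_b)):
        res_a = conserved_residue(col_a)
        if res_a is None:
            continue
        total += 1
        if conserved_residue(col_b) == res_a:
            shared += 1
    return shared, total


subfamily_a = ["GKDHSR", "GKDHTR", "GKDHSR"]
subfamily_b = ["GKEHSR", "GKDHSR"]
print(shared_conserved_positions(subfamily_a, subfamily_b))  # (4, 5)
```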

Slide 26

Subfamily profiles coloured by residue conservation (red = high, blue = low), aligned using profile-profile comparison (MAFFT)

Succinyl-CoA synthetase compared with pyruvate phosphate dikinase. Labels: fully conserved positions; no fully conserved positions

Different functions

Scorecons (Valdar and Thornton; ProFunc)

Slide 27

Performance in Merging Subfamilies into Functional Groups

Figure axes: number of functional groups predicted; error rate

10 enzyme functions have been experimentally identified in this family

Slide 28

GEMMA: Predicting Functional Groups in CATH Superfamilies

Figure: a structural superfamily divided into subfamilies of close relatives predicted to have similar function (>60% identity), which are merged into functional groups.

Benchmarked on 12 large enzyme families in CATH

6- to 10-fold reduction in the number of functional subfamilies

Slide 29

Summary

  • More than half the domains in UniProt can be assigned to families of known structure
  • Analysis of some very large structural families revealed how secondary structure insertions can modulate functions
  • Functional groups can be identified in diverse families by comparing multiple features (e.g. residue conservation, predicted secondary structure)

Slide 30

CATH and Gene3D: Lesley Greene, Stathis Sidderis, Russell Marsden, Ian Sillitoe, Sarah Addou, Juan Ranea, Tony Lewis, Dave Lee, Ollie Redfern, Alison Cuff, Mark Dibley, Ilhem Diboun, Adam Reid, Corin Yeats, Tim Dallman

http://www.biochem.ucl.ac.uk/bsm/cath_new

Funding: MRC, Wellcome Trust, NIH, EU (BioSapiens, EMBRACE, ENFIN), BBSRC
