Identifying Functional signatures in Proteins - a computational design approach

Presentation Transcript

  1. Identifying Functional signatures in Proteins - a computational design approach
     David Bernick, Rohl group, 16-Mar-2005

  2. The big picture
     • what is function?
       • hinges
       • substrate/DNA/protein binding/alignment/recognition
       • catalytic sites
     • what isn’t function? (structure)
       • secondary structures
       • fold architecture
       • thermodynamically required elements
     • nature selects for function (structure is implicit)
     • computational methods select for structure
     • can we predict…quickly?

  3. Some terms
     • pssm - position-specific score matrix
       • a [20 x length] model of residue frequencies for every position of a sequence family
     • homolog - natural sequences evolved from a common parent
     • morpholog - computationally derived sequence generated from a parent structure
     • ortholog - common ancestor, derived by speciation (constrained functional divergence)
     • paralog - common ancestor, same species (unconstrained functional divergence)

  4. pssm from an alignment
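
The slide's figure is not preserved in this transcript. As a minimal sketch of the idea, assuming a simple frequency-based model with a uniform pseudocount (the function name `column_frequencies`, the pseudocount value, and the toy alignment are illustrative assumptions, not taken from the talk), a [20 x length] matrix can be built from an alignment like this:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_frequencies(alignment, pseudocount=1.0):
    """Return one {residue: frequency} dict per alignment column.

    alignment: aligned sequences of equal length. Gap characters ('-')
    and any non-standard symbols are skipped when counting.
    pseudocount: hypothetical smoothing value added to every residue count.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    pssm = []
    for pos in range(length):
        # Count only the 20 standard residues observed in this column.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return pssm


# Toy alignment, invented for illustration only.
toy_alignment = ["ACD", "ACE", "AC-", "GCD"]
pssm = column_frequencies(toy_alignment)
```

Each column sums to 1, so a column can be read as a point in 20-dimensional frequency space. Real PSSM builders usually add sequence weighting and log-odds scoring against a background distribution, which this sketch leaves out.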

  5. structure ensembles
     • Larson (2003) - Improved homology searches
     • Pei (2003) - Homology detection and active site searches
     • Kuhlman (2000) - Structural optimality of natural sequences


  6. Results - SH3 domain
     • 11 structures
     • 62 additional sequences

  7. Results - S100 domain
     • Ca++ loop1 not detected - backbone coordinated residues
     • Ca++ loop2 not detected - insufficient homolog depth
     • 11 structures
     • 30 additional sequences

  8. the protocol
     [flow diagram; only its labels survive, listed in transcript order: Sequence; CE + SCOP; Taylor Doms; Flexible Design; cogs, pfam, reverse blast; blast; representative structure; homolog Alignment; paralog structures; fixed design; score; pssmH; pssmM; statistical; geometric]

  9. genome scale
     • high cost step - producing pssmM
     • precalculate pssmM for every domain

  10. morpholog pssms - genome scale
     • Data Sources
       • Taylor parsed Domain database
       • CE all-to-all + SCOP
     • Precompute pssms for every domain
       • ~8000 domains
       • 100 sequences ~90% diversity
       • 1000 sequences ~99% diversity
       • ~4-8 wks, 70p cluster for initial set
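
To get a feel for the cost figures on this slide, here is a back-of-envelope sketch. It assumes that "70p" means 70 processors kept fully busy and that the "initial set" means 100 designed sequences for each of the ~8000 domains. Both readings are assumptions, as are all variable names.

```python
HOURS_PER_WEEK = 7 * 24


def cpu_hours(processors, weeks):
    """Total CPU-hours if every processor is busy for the whole period."""
    return processors * weeks * HOURS_PER_WEEK


DOMAINS = 8000              # "~8000 domains" from the slide
PROCESSORS = 70             # assumed meaning of "70p"
SEQUENCES_PER_DOMAIN = 100  # assumed size of the initial set

low_total = cpu_hours(PROCESSORS, 4)   # 4-week estimate
high_total = cpu_hours(PROCESSORS, 8)  # 8-week estimate

low_per_domain = low_total / DOMAINS
high_per_domain = high_total / DOMAINS

# Convert CPU-hours per domain to CPU-minutes per designed sequence.
low_minutes_per_sequence = low_per_domain * 60 / SEQUENCES_PER_DOMAIN
high_minutes_per_sequence = high_per_domain * 60 / SEQUENCES_PER_DOMAIN

print(f"total: {low_total} to {high_total} CPU-hours")
print(f"per domain: {low_per_domain:.2f} to {high_per_domain:.2f} CPU-hours")
```

Under these assumptions the initial set costs roughly 47,000 to 94,000 CPU-hours, or about 6 to 12 CPU-hours per domain, which is consistent with the talk naming pssmM production as the high-cost step.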

  11. scoring
     • compare PSSMh to PSSMm
       • PSSMm contains only structure signal
       • PSSMh contains both function and structure
     • each position represents a count-normalized position in 20-space (H or M)
     • R-position -- average aa position
     • RH and RM define 20 space vectors
       • ‘function vector’
       • ‘structure vector’
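
The transcript does not give the scoring formula, so the following is only one possible geometric reading of this slide. It treats each PSSM column as a count-normalized point in 20-space, takes R to be a reference point (here a uniform amino-acid distribution, which is an assumption), and compares the vectors from R to H and from R to M. The function names and the choice of Euclidean distance and cosine are hypothetical.

```python
import math


def normalize(counts):
    """Scale a list of 20 residue counts so that it sums to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("column has no counts")
    return [c / total for c in counts]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def compare_position(h_counts, m_counts, reference):
    """Compare one homolog column (H) with one morpholog column (M).

    Returns (distance, cosine): the Euclidean distance between the
    normalized points H and M, and the cosine of the angle between the
    vectors RH and RM. cosine is None if either vector has zero length.
    """
    h = normalize(h_counts)
    m = normalize(m_counts)
    rh = subtract(h, reference)
    rm = subtract(m, reference)
    distance = norm(subtract(h, m))
    denom = norm(rh) * norm(rm)
    cosine = sum(x * y for x, y in zip(rh, rm)) / denom if denom > 0 else None
    return distance, cosine


# Assumed reference point R: a uniform distribution over 20 residues.
uniform_reference = [1 / 20] * 20

# Toy columns, invented for illustration: H is fully conserved,
# M splits evenly between two residues.
h_column = [10] + [0] * 19
m_column = [5, 5] + [0] * 18
distance, cosine = compare_position(h_column, m_column, uniform_reference)
```

A position where H and M point in clearly different directions from R would be a candidate for functional rather than structural constraint, under the slide's premise that PSSMm carries only structure signal while PSSMh carries both.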

  12. next steps
     • complete this set of domains - verification
     • full domain pssmM generation

  13. acknowledgements
     • Carol Rohl
     • Kevin Karplus
     • Craig Lowe
     • Rohl group
     • HP