This presentation by David Bernick from the Rohl group discusses the computational methodologies for identifying functional signatures in protein structures. It covers fundamentals such as binding interactions, functional vs. non-functional structures, catalytic sites, and the importance of secondary structures and fold architecture. Emphasis is placed on computational techniques like position-specific score matrices (PSSM) for detecting homologous sequences and predicting protein functions. The outcomes of analyses using various structural domains underscore the challenge of distinguishing functional signals from structural data.
Identifying Functional Signatures in Proteins - a computational design approach • David Bernick • Rohl group • 16-Mar-2005
The big picture • what is function? • hinges • substrate/DNA/protein binding/alignment/recognition • catalytic sites • what isn’t function? (structure) • secondary structures • fold architecture • thermodynamically required elements • nature selects for function (structure is implicit) • computational methods select for structure • can we predict…quickly?
Some terms • pssm - position-specific score matrix • a [20 x length] model of residue frequencies for every position of a sequence family • homolog - natural sequence evolved from a common parent • morpholog - computationally derived sequence generated from a parent structure • ortholog - common ancestor, derived by speciation (constrained functional divergence) • paralog - common ancestor, same species (unconstrained functional divergence)
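As a minimal sketch of the "[20 x length] model of residue frequencies" defined above, the following builds a frequency matrix from a toy gapped alignment. The function name `build_pssm`, the pseudocount of 1, and the toy sequences are assumptions made for illustration; the talk does not say how its matrices were counted or smoothed.

```python
import numpy as np

# The 20 standard amino acids, one matrix row each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return a [20 x length] matrix of residue frequencies per column.

    ASSUMPTION: a flat pseudocount is added to every cell so that no
    frequency is zero; the talk does not specify a smoothing scheme.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    counts = np.full((20, length), pseudocount, dtype=float)
    for seq in aligned_seqs:
        for pos, aa in enumerate(seq.upper()):
            idx = AA_INDEX.get(aa)  # gaps and unknown residues are skipped
            if idx is not None:
                counts[idx, pos] += 1.0

    # Normalize each column so its 20 frequencies sum to 1.
    return counts / counts.sum(axis=0, keepdims=True)


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = ["ACD", "ACE", "A-D"]
pssm = build_pssm(toy_alignment)
```

Each column of `pssm` is then a point in 20-dimensional frequency space, which is the representation the scoring slide relies on.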
structure ensembles • Larson (2003) - Improved homology searches • Pei (2003) - Homology detection and active site searches • Kuhlman (2000) - Structural optimality of natural sequences
Results - SH3 domain • 11 structures • 62 additional sequences
Results - S100 domain • Ca++ loop1 not detected (backbone coordinated residues) • Ca++ loop2 not detected (insufficient homolog depth) • 11 structures • 30 additional sequences
the protocol • [flow diagram; labels only] • Sequence • CE+SCOP • Taylor Doms • flexible design • cogs, pfam, reverse blast • blast • representative structure • homolog alignment • paralog structures • fixed design • score • pssmH • pssmM • statistical • geometric
genome scale • high cost step - producing pssmM • precalculate pssmM for every domain
morpholog pssms - genome scale • Data Sources • Taylor parsed Domain database • CE all-to-all + SCOP • Precompute pssms for every domain • ~8000 domains • 100 sequences ~90% diversity • 1000 sequences ~99% diversity • ~4-8 wks, 70p cluster for initial set
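The figures on this slide imply a rough per-domain cost, which the arithmetic below works out. It assumes that "70p" means 70 processors and that the cluster runs at full utilization for the whole period; neither is stated in the slides, so the result is an order-of-magnitude estimate only.

```python
# Figures taken from the slide.
N_DOMAINS = 8000
WEEKS_LOW, WEEKS_HIGH = 4, 8

# ASSUMPTION: "70p" is read as 70 processors, all busy for the full run.
N_PROCESSORS = 70
HOURS_PER_WEEK = 7 * 24


def cpu_hours_per_domain(weeks):
    """Processor-hours spent per domain for a run of the given length."""
    total_cpu_hours = N_PROCESSORS * weeks * HOURS_PER_WEEK
    return total_cpu_hours / N_DOMAINS


low = cpu_hours_per_domain(WEEKS_LOW)
high = cpu_hours_per_domain(WEEKS_HIGH)
print(f"roughly {low:.1f} to {high:.1f} processor-hours per domain")
```

Under these assumptions each domain costs on the order of 6 to 12 processor-hours, consistent with pssmM generation being the high-cost step that is worth precalculating.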
scoring • compare pssmH to pssmM • pssmM contains only structure signal • pssmH contains both function and structure • each position represents a count-normalized position in 20-space (H or M) • R-position - average aa position • RH and RM define 20-space vectors • ‘function vector’ • ‘structure vector’
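One way to read the scoring slide is sketched below; it is an interpretation, not the speaker's stated method. Assumptions: R is a uniform background (1/20 per residue) unless one is supplied, the "structure vector" is RM = M - R, and the "function vector" is whatever part of RH is not accounted for by RM, that is RH - RM = H - M. The per-position score is the Euclidean length of that function vector. The names `position_vectors` and `function_score` are hypothetical.

```python
import numpy as np


def position_vectors(pssm_h, pssm_m, background=None):
    """Return (structure_vectors, function_vectors), each of shape [20 x length].

    ASSUMPTION: structure vector = M - R and function vector = (H - R) - (M - R).
    The slides name these vectors but do not define them.
    """
    pssm_h = np.asarray(pssm_h, dtype=float)
    pssm_m = np.asarray(pssm_m, dtype=float)
    if pssm_h.shape != pssm_m.shape or pssm_h.shape[0] != 20:
        raise ValueError("both matrices must share the same [20 x length] shape")

    # Count-normalize every column so each position is a point in 20-space.
    h = pssm_h / pssm_h.sum(axis=0, keepdims=True)
    m = pssm_m / pssm_m.sum(axis=0, keepdims=True)

    if background is None:
        r = np.full((20, 1), 1.0 / 20.0)  # ASSUMPTION: uniform average position
    else:
        r = np.asarray(background, dtype=float).reshape(20, 1)

    rh = h - r  # homolog signal: function plus structure
    rm = m - r  # morpholog signal: structure only
    return rm, rh - rm


def function_score(pssm_h, pssm_m):
    """Per-position length of the function vector (a simple geometric score)."""
    _, function_vectors = position_vectors(pssm_h, pssm_m)
    return np.linalg.norm(function_vectors, axis=0)


# Toy example: position 0 agrees in both matrices, position 1 does not.
toy_m = np.ones((20, 2))          # morphologs: no residue preference
toy_h = np.ones((20, 2))
toy_h[:, 1] = 0.0
toy_h[0, 1] = 1.0                 # homologs: position 1 fully conserved
scores = function_score(toy_h, toy_m)
```

In the toy case the first position scores zero, because homologs and morphologs agree, while the conserved second position stands out as a candidate functional site.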
next steps • complete this set of domains - verification • full domain pssmM generation
acknowledgements • Carol Rohl • Kevin Karplus • Craig Lowe • Rohl group • HP