
Anna Yershova Department of Computer Science Duke University February 5, 2010

## Anna Yershova Department of Computer Science Duke University February 5, 2010


**Automated High-Resolution Protein Structure Determination** using Residual Dipolar Couplings
Anna Yershova, Department of Computer Science, Duke University
February 5, 2010
Feb 5 2010, NC State University. Automated Protein Structure Determination using RDCs

**Introduction** Motivation: Protein Structure Determination is Important
[Diagram: amino acid sequences, structures, functions, protein redesign]
• High-resolution structures are needed for:
  • Determining protein functions
  • Protein redesign

**Introduction** Motivation: What is Protein Structure: Primary Structure
The sequence of amino acids forms the backbone. Residues are sidechains attached to the backbone.
[Figure labels: 1, 2, 3, 4; dihedral angle; side chain; amino acid]

**Introduction** Motivation: What is Protein Structure: Secondary Structure Elements
Local folding is maintained by short distance interactions.

**Introduction** Motivation: What is Protein Structure: 3D Fold
Global 3D folding is maintained by more distant interactions.
[Figure labels: alpha-helix, side chain, loop, beta-strands]

**Introduction** Motivation: High-Throughput Structure Determination Is Important
The gap between sequences and structures
http://www.metabolomics.ca/News/lectures/CPI2008-short.pdf

**Introduction** Motivation: Current Approaches for Structure Determination
• X-ray crystallography
  • Difficulty: growing good quality crystals
• Nuclear Magnetic Resonance (NMR) spectroscopy
  • Difficulty: lengthy (expensive) time in processing and analyzing experimental data
Both require expressing and purifying proteins.
**Introduction** Motivation: Bruce Donald’s Lab
• Michael Zeng
• Chittu Tripathy
• Lincong Wang
• Pei Zhou
• Bruce Donald
• Cheng-Yu Chen
• John MacMaster

**Introduction** Motivation: Types of NMR Spectroscopy Data
[Figure: residue diagram labeled R, Ha, NOE and B0, with example values 4.2, 8.9, 133.1, 172.1]
• Chemical shift (CS)
  • Unique resonance frequency, serves as an ID
• Nuclear Overhauser effect (NOE)
  • Local distance restraint between two protons
• Residual dipolar coupling (RDC)
  • Global orientational restraint for bond vectors

**Introduction** Motivation: Resonance Assignment Problem
Assigning chemical shifts to each atom
http://www.pnas.org/content/102/52/18890/suppl/DC1
Bailey-Kellogg et al., 2000, 2004

**Introduction** Motivation: NOE Assignment Problem
Obtain local distance restraints between protons
A famous bottleneck
Bailey-Kellogg et al., 2000, 2004

**Introduction** Motivation: Structure Determination from NOEs
[Diagram: NOESY spectrum and resonance assignments feed NOE assignment; labels: Assignment Ambiguity, Distance Geometry]
[Figure: partial distance matrix over atoms a1 … an, with entries 4 for (a1, a2), 3 for (a1, a3), and ? for (a2, a3)]
NP-Hard [Saxe ’79; Hendrickson ’92, ’95]

**Introduction** Motivation: Protein Structure Determination is Hard
Traditional Structure Determination Protocol
[Diagram components: resonance assignments; NOESY spectra; SA/MD; initial fold; NOE assignments; XPLOR-NIH structure refinement; RDCs; NOE assignments; 3D structures]
A famous bottleneck

**Introduction** Motivation: Protein Structure Determination is Hard
Traditional Structure Determination Protocol
[Diagram components: resonance assignments; NOESY spectra; SA/MD; initial fold; NOE assignments; XPLOR-NIH structure refinement; RDCs; NOE assignments; 3D structures]
• Error propagation
• Local minima
• Manual intervention for initial fold and for evaluation of NOE assignments
A famous bottleneck
Can we have a poly-time algorithm using orientational restraints?
• Yes: Wang and Donald, 2004; Wang et al., 2006

**Introduction** Motivation: Types of NMR Spectroscopy Data
[Figure: residue diagram labeled R, Ha, NOE and B0, with example values 4.2, 8.9, 133.1, 172.1]
• Chemical shift (CS)
  • Unique resonance frequency, serves as an ID
• Nuclear Overhauser effect (NOE)
  • Local distance restraint between two protons
• Residual dipolar coupling (RDC)
  • Global orientational restraint for bond vectors

**Background** RDCs: RDC Equation for a Single Bond
[Figure labels: Sxx, Syy, Szz; bond vector v; coupling D; alignment medium; B0; atoms a and b]
S: Saupe matrix
• S is traceless and symmetric
• S contains 5 degrees of freedom (dofs)

**Introduction** Motivation: Protein Structure Determination is Hard
Traditional Structure Determination vs. RDC-PANDA
[Traditional protocol components: resonance assignments; NOESY spectra; SA/MD; initial fold; NOE assignments; XPLOR-NIH structure refinement; RDCs; NOE assignments; 3D structures]
• Error propagation
• Local minima
• Manual intervention for initial fold and for evaluation of NOE assignments
[RDC-PANDA protocol components: constant number of NOEs; RDCs; RDC-ANALYTIC; PACKER; GlobalFold; sidechain placement; NOE assignments; XPLOR-NIH; NOE assignments; 3D structures]
Zeng et al. (Jour.
Biomolecular NMR, 2009)

**Introduction** Motivation: Importance of Backbone Structure Determination
• Global orientational restraints from RDCs
• Sparse data (high-throughput, large proteins, membrane proteins)
• Compute initial fold using exact solutions to RDC equations
• Avoid the NP-hard problem of structure determination from NOEs
• Resolve NOE assignment ambiguity
• Automated side-chain resonance assignment

**Introduction** Motivation: Current Limitations of RDC-Panda
Because it requires only 2 RDCs per residue:
• Only SSE elements can be reliably determined; NOEs are needed to determine the structure of loops
• Difficulty in handling missing data

**Introduction** Motivation: My Current Project
• Improve current protein structure determination techniques from our lab
• Design new algorithms for protein backbone structure determination using orientational restraints from RDCs

**Introduction** Motivation: Literature Overview
• Distance geometry based structure determination
  • Braun, 1987
  • Crippen and Havel, 1988
  • More and Wu, 1999
• Heuristic based structure determination
  • Brünger, 1992
  • Nilges et al., 1997
  • Güntert, 2003
  • Rieping et al., 2005
• RDC-based structure determination
  • Tolman et al., 1995
  • Tjandra and Bax, 1997
  • Hus et al., 2001
  • Tian et al., 2001
  • Prestegard et al., 2004
  • Wang and Donald (CSB 2004)
  • Wang and Donald (Jour. Biomolecular NMR, 2004)
  • Wang, Mettu and Donald (JCB 2005)
  • Donald and Martin (Progress in NMR Spectroscopy, 2009)
  • Ruan et al., 2008
  • Zeng et al. (Jour. Biomolecular NMR, 2009)
• Heuristic based automated NOE assignment
  • Mumenthaler et al., 1997
  • Nilges et al., 1997, 2003
  • Herrmann et al., 2002
  • Schwieters et al., 2003
  • Kuszewski et al., 2004
  • Huang et al., 2006
• Automated NOE assignment starting with initial fold computed from RDCs
  • Wang and Donald (CSB 2005)
  • Zeng et al. (CSB 2008)
  • Zeng et al. (Jour. Biomolecular NMR, 2009)
• Automated side-chain resonance assignment
  • Li and Sanctuary, 1996, 1997
  • Marin et al., 2004
  • Masse et al., 2006
  • Zeng et al.
(In submission, 2009)

**Background** RDCs: RDC Equation for a Single Bond
[Figure labels: Sxx, Syy, Szz; bond vector v; coupling D; S]
• Linear in S: a fixed v defines a hyperplane
• Quadratic in v: a fixed S defines a hyperboloid

**Background** RDCs: RDC Equation for a Single Bond
One RDC equation defines a collection of hyperplanes (7 variables).
• Linear in S: a fixed v defines a hyperplane
• Quadratic in v: a fixed S defines a hyperboloid

**Background** RDCs: RDC Equations for a Protein Portion
[Figure labels: 1, 2, 3, 4]

**Background** RDCs: RDC Equations for a Protein Portion
[Figure labels: 1, 2, 3, 4; vectors u1, v1, v2]
Too few equations, too many variables!
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print] PMID:19711185, 2009.

**Background** RDCs: Forward Kinematics Reduces the Number of Variables
Fix coordinate system.
[Figure labels: v1, v2, u1]

**Background** RDCs: RDC Equations for a Protein Portion
[Figure labels: v1, v2, u1]

**Background** RDCs: RDC Equations for a Protein Portion
Recursive representation is possible!

**Background** RDCs: One Equation Per Dihedral Angle is Not Enough!
• Each equation is linear in S, and quartic in either tan(φ) or tan(ψ)
• To be able to solve this system there must be additional information
• Possible scenarios:
  • Additional RDC measurement(s) for each dihedral angle
  • Additional alignment media
  • Additional NOE data
  • Modeling (Ramachandran regions, steric clashes, energy function)
  • Sampling (for alignment tensors)

**Background** RDC-Panda: The RDC-PANDA Structure Determination Package
• Current requirements
  • 2 RDCs per residue to obtain SSE structures
  • Sparse NOEs to pack the SSEs
• Current bottlenecks
  • Missing data (even in long SSEs)
  • Long loops
  • Sampling for computing alignment tensor(s)
  • Sampling for the orientation of the first pp
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol.
NMR, [Epub ahead of print] PMID:19711185, 2009.

**Background** RDC-Panda: When the Saupe Matrix is Known, the Solution Can Be Found Exactly!
Ellipse equations for CH bond vector
Wang & Donald, 2004; Donald & Martin, 2009.

**Background** RDC-Panda: Solution Structure Deposited Using RDC-Panda
Solution Structure of FF Domain 2 of human transcription elongation factor CA150 (FF2) using RDC-PANDA
PDB ID: 2KIQ
In collaboration with Dr. Zhou’s Lab

**Current Project** Problem Formulation: NH, CH RDCs in 2 Media
We require measurements for at least 9 consecutive bond vectors (4.5 residues) in 2 media.
The goal is to handle more equations and errors.

**Current Project** Relationship to Minimization

**Current Project** Relationship to Minimization and SVD
[Figure labels: b, A, s]
Solving an overconstrained system of linear equations is equivalent to finding the projection of the vector b onto the hyperplane A. This is also equivalent to minimizing the least-squares function of the terms.

**Current Project** Relationship to Minimization

**Current Project** Relationship to Minimization and SVD
[Figure labels: b, A(φi, ψi), s]
Solving such a system of non-linear equations is not trivial! There are multiple local minima in the corresponding minimization problem.

**Current Project** Advantages
• If the minimization problem is solved, then:
  • Computation of packed SSEs and loops is possible without additional NOE data.
  • Saupe matrices for each alignment medium can be computed without sampling.
• Robust handling of missing values

**Current Project** The Algorithm: Initialization Using Helix
• Initialize (φi, ψi) for a helix
• Compute initial approximation for Si using SVD
• Compute (φi, ψi) using tree search and minimization
• Update Si using SVD

**Current Project** The Algorithm: Protein Portion
• Initialize Si to computed approximations
• Compute (φi, ψi) using tree search and minimization
• Update Si using SVD

**Current Project** The Algorithm: Computing Dihedrals
Minimize each of the RMSD terms as a univariate function.
[Figure labels: dihedral angles ψ1 … ψn, with points marked x]
Iteratively minimize the RMSD function.
Compute the list of best solutions.

**Current Project** Advantages
• The algorithm converges, since every step minimizes the RMSD function.
• If the data were “perfect”, the solution to the minimization problem would be the roots of the polynomials in the RMSD terms, and the algorithm would find ALL of them.
• The minima of the RMSD terms give a good collection of initial structures for finding local and global minima.
• Robust handling of missing values

**Preliminary Results** Preliminary Results: Ubiquitin Helix
Conformation of the portion [25-31] of the helix for human ubiquitin computed using NH and CH RDCs in two media (red), superimposed on the same portion from the high-resolution X-ray structure (PDB Id: 1UBQ) (green). The backbone RMSD is 0.58 Å.

**Preliminary Results** Preliminary Results: Ubiquitin Strand
Conformation of the portion [2-7] of the beta-strand for human ubiquitin computed using NH and CH RDCs in two media, superimposed on the same portion from the high-resolution X-ray structure (PDB Id: 1UBQ). The backbone RMSD is 1.151 Å.

**Conclusions**
• Complete and exhaustive search over the space of all structures minimizing the RDC fit function seems feasible, thanks to an understanding of the structure of the solution.
• Possible and exciting extensions to more/different data
Funding: NIH
Thank you!

**Comparison** Sparse Accuracy: Data requirements vs. Accuracy (Ubiquitin)
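
**Illustrative sketch** Fitting a Saupe matrix when bond vectors are known
The Background slides note that the RDC equation is linear in S and that S is symmetric and traceless with 5 degrees of freedom, and the algorithm slides update S by solving an overconstrained linear system. A minimal sketch of that linear step follows. It assumes the common form D = D_max · vᵀ S v for a unit bond vector v, which is not spelled out in the transcript. All function names and the `d_max` parameter are hypothetical, and plain normal equations with Gaussian elimination stand in for the SVD used in the talk so that the example needs no third-party library.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def rdc_row(v):
    """Coefficients of (Syy, Szz, Sxy, Sxz, Syz) in v^T S v.

    Tracelessness is used to eliminate Sxx = -Syy - Szz, which leaves
    the 5 independent unknowns of the Saupe matrix.
    """
    x, y, z = v
    return [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_saupe(vectors, rdcs, d_max=1.0):
    """Least-squares Saupe matrix from unit bond vectors and measured RDCs.

    Assumed model (hypothetical here): D = d_max * v^T S v.
    Needs at least 5 bond vectors in general position.
    """
    rows = [rdc_row(v) for v in vectors]
    rhs = [d / d_max for d in rdcs]
    # Normal equations (A^T A) s = A^T b for the overconstrained system A s = b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * d for r, d in zip(rows, rhs)) for i in range(5)]
    syy, szz, sxy, sxz, syz = solve_linear(ata, atb)
    sxx = -syy - szz
    return [[sxx, sxy, sxz], [sxy, syy, syz], [sxz, syz, szz]]


def predict_rdc(s, v, d_max=1.0):
    """Back-compute the RDC of unit vector v under Saupe matrix s."""
    return d_max * sum(v[i] * s[i][j] * v[j] for i in range(3) for j in range(3))
```

With exact, noise-free RDCs and at least five bond vectors in general position, `fit_saupe` recovers the matrix that generated the data. With noisy or missing measurements it returns the least-squares projection described on the "Relationship to Minimization and SVD" slide. The harder non-linear part of the talk, where the bond vectors themselves depend on the unknown dihedrals, is not covered by this sketch.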