Principal Component Analysis in MD Simulation

Speaker: ZHOU Chen-Yang
Supervisor: Wu Yun-Dong

Methods to analyze MD trajectory
• Intuition-based coordinates
  • RMSD with respect to native state
  • Fraction of native contacts
  • Radius of gyration
  • Other observables
• Advantage
  • Easy to understand
  • Convenient to compute
• Disadvantage
  • Inaccurate
  • Ineffective for non-native structures, or when no good reference structure is available
  • Depends on prior knowledge

How to measure conformational change?
If we already have an optimal reaction coordinate, then we have: free energy landscape, transition pathway, transition rate ...
But usually we don't, and it doesn't come up automatically.
• What we have to do:
  • Reduce dimension
    • Trajectory is too complicated
    • A good projection should be able to separate signal from noise
  • Classification/Clustering
    • Classify structures into different states
• Algorithms include:
  • PCA: Principal Component Analysis
  • MDS: Multi-Dimensional Scaling

dPCA vs RMSD
Figure: free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb*-ildn, projected onto the 2nd principal component and RMSD.

General description of PCA
• The central idea of PCA is to:
  • reduce the dimension
  • retain the variation
• An example:
  • (x,y) is a randomly generated dataset
  • var(x) = 3.2, var(y) = 2.3
  • (x,y) is a mixture of points centered either at (0,0) or at (3,3)
  • PCA generates new coordinates (x',y'), and x' captures most of the variation
  • var(x') = 5.5, var(y') = 0.99
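The example on this slide can be sketched numerically. This is a minimal illustration, assuming two unit-variance Gaussian clouds centered at (0,0) and (3,3); the sample size, random seed, and variable names are hypothetical choices, and the printed variances are not meant to reproduce the slide's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two unit-variance Gaussian clouds, one centered at
# (0, 0) and one at (3, 3), mixed together.
n = 500
cloud_a = rng.normal(loc=0.0, scale=1.0, size=(n, 2))
cloud_b = rng.normal(loc=3.0, scale=1.0, size=(n, 2))
data = np.vstack([cloud_a, cloud_b])

# Center the data and build the 2x2 covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rotate the data into the new coordinates (x', y').
rotated = centered @ eigvecs

var_old = centered.var(axis=0, ddof=1)
var_new = rotated.var(axis=0, ddof=1)
print("var(x),  var(y) :", var_old)
print("var(x'), var(y'):", var_new)
```

Because PCA here is only a rotation, the total variance is unchanged; it is redistributed so that x' carries as much of it as possible.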

Key question in understanding PCA
• In practice, the principal components (PCs) are linear combinations of the original coordinates.
• Suppose we have a data set containing 2 columns, X1 and X2. Now we generate a new column Z = a1X1 + a2X2. What is the variance of Z?
Example: Z = X1 + X2 (variance and covariance)
Why is it important? Because we are going to project the data set onto a new coordinate Z, and our aim is to choose (a1, a2) so as to maximize the variance of Z.

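Next step: change a to search for the maximum of var(Z)
Z = X1 + X2: var(Z) = var(X1) + var(X2) + 2 cov(X1, X2)
Z = a1X1 + a2X2: var(Z) = a1² var(X1) + a2² var(X2) + 2 a1a2 cov(X1, X2)
Represented with matrix multiplication: Z = α'X
Covariance matrix: Σ, with var(X1) and var(X2) on the diagonal and cov(X1, X2) off the diagonal
Coefficients of the original coordinates in the PC: α = (a1, a2)'
var(Z) = var(α'X) = α'Σα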
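The identity var(Z) = α'Σα can be checked numerically. The sketch below uses hypothetical correlated data and arbitrary illustrative coefficients (a1, a2) = (0.8, 0.6); none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated data: two columns X1 and X2.
x1 = rng.normal(size=2000)
x2 = 0.6 * x1 + rng.normal(scale=0.5, size=2000)
X = np.column_stack([x1, x2])

# Arbitrary illustrative coefficients (a1, a2).
alpha = np.array([0.8, 0.6])

# Direct route: build Z = a1*X1 + a2*X2 and take its sample variance.
z = X @ alpha
var_direct = z.var(ddof=1)

# Matrix route: var(Z) = alpha' Sigma alpha.
sigma = np.cov(X, rowvar=False)
var_quadratic = alpha @ sigma @ alpha

# Expanded route: a1^2 var(X1) + a2^2 var(X2) + 2 a1 a2 cov(X1, X2).
var_expanded = (alpha[0] ** 2 * sigma[0, 0]
                + alpha[1] ** 2 * sigma[1, 1]
                + 2 * alpha[0] * alpha[1] * sigma[0, 1])

print(var_direct, var_quadratic, var_expanded)
```

All three routes agree up to floating-point error, which is why the search for the best projection can work with Σ alone.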

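Maximize var(Z)
First, we have to normalize α: α'α = 1
Then, maximizing var(Z) is to maximize α'Σα − λ(α'α − 1), where λ is a Lagrange multiplier
Differentiate with respect to α and set the result to zero: Σα − λα = 0, i.e. Σα = λα
λ is an eigenvalue and α is the corresponding eigenvector of Σ. Since var(Z) = α'Σα = λα'α = λ, the eigenvector with the largest eigenvalue gives the direction of maximum variance.
Pick the first several eigenvectors as PCs (strictly, as the coefficients of the PCs). Then project the data onto the PCs; the simplified data can be further analyzed with other techniques such as clustering.
Figure: eigenvalues plotted from large to small.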
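The procedure on this slide (covariance matrix, eigen-decomposition, sorting, projection) can be sketched as follows. The function name `pca` and the synthetic data are hypothetical; this is an illustrative NumPy sketch, not the analysis code used for the results in this talk.

```python
import numpy as np

def pca(data, n_components):
    """PCA of row-wise samples via the covariance matrix.

    Returns all eigenvalues (large to small), the leading eigenvectors
    as columns, and the data projected onto those eigenvectors.
    """
    centered = data - data.mean(axis=0)
    sigma = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(sigma)  # ascending order
    order = np.argsort(eigvals)[::-1]         # reorder: large to small
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    components = eigvecs[:, :n_components]
    projection = centered @ components
    return eigvals, components, projection

# Hypothetical data: 1000 frames of 6 coordinates whose variance is
# dominated by two hidden directions plus a little noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 6))
frames = latent @ mixing + 0.1 * rng.normal(size=(1000, 6))

eigvals, components, projection = pca(frames, n_components=2)
explained = eigvals / eigvals.sum()
print("eigenvalues, large to small:", np.round(eigvals, 3))
print("fraction of variance in first two PCs:", explained[:2].sum())
```

The variance of the data along each PC equals the corresponding eigenvalue, so the sorted eigenvalue spectrum indicates how many PCs are worth keeping.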

PCA in application: Cartesian coordinates
• Cartesian coordinates contain all the information
• But they are often noisy
Figure: comparison of cPCA and dPCA in the analysis of an Ala7 MD simulation. cPCA (Cartesian PCA) uses Cartesian coordinates. Dashed blue line: Cartesian PCA. Full red line: PCA using dihedral angles.
Mu, Y., Nguyen, P. H., & Stock, G. (2005). Proteins, 58(1), 45–52.

PCA in application: cPCA, dPCA and pPCA
Advantages of dihedral angles:
1. Reduction of dimensionality
2. Constraints are built into the coordinates
Problems with dihedral angles:
1. A dihedral angle is periodic
2. A dihedral angle is not linear
In application, each dihedral angle is transformed to its sin/cos values before PCA; this is called dPCA.
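The sin/cos transformation can be sketched as below. The helper `dihedral_features` and the synthetic angles are hypothetical; in practice the angles would come from a trajectory analysis tool. The first part shows why the transformation helps: +179° and −179° are only 2° apart, but their raw values differ by 358°.

```python
import numpy as np

def dihedral_features(angles_rad):
    """Map dihedral angles (radians, shape (n_frames, n_dihedrals)) to
    sin/cos features of shape (n_frames, 2 * n_dihedrals).
    Columns are all cosines followed by all sines.
    """
    return np.hstack([np.cos(angles_rad), np.sin(angles_rad)])

# Two angles on either side of the periodic boundary.
near_boundary = np.deg2rad(np.array([[179.0], [-179.0]]))
features = dihedral_features(near_boundary)
raw_gap = abs(near_boundary[0, 0] - near_boundary[1, 0])   # about 6.25 rad
feature_gap = np.linalg.norm(features[0] - features[1])    # small

# Hypothetical trajectory: 1000 frames, 4 dihedrals fluctuating around 180
# degrees, wrapped into [-pi, pi) so that raw values jump across the boundary.
rng = np.random.default_rng(3)
angles = np.pi + 0.3 * rng.normal(size=(1000, 4))
angles = (angles + np.pi) % (2 * np.pi) - np.pi

# dPCA: ordinary PCA on the sin/cos features.
x = dihedral_features(angles)
centered = x - x.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # large to small
projection = centered @ eigvecs[:, :2]

print("raw gap (rad):", raw_gap, " sin/cos gap:", feature_gap)
```

The price of the transformation is that the number of input coordinates doubles, from N dihedral angles to 2N sin/cos values.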

Application of dPCA: (Aβ16-22)6
Figure: free-energy diagram projected onto the first two principal components V1 and V2 of the dPCA for the hexamer.
Nguyen, P. H., Li, M. S., Stock, G., Straub, J. E., & Thirumalai, D. (2007). PNAS, 104(1), 111–6.

dPCA in RNA analysis: flexible choice of internal coordinates Riccardi, L., Nguyen, P. H., & Stock, G. (2009). JPCB, 113(52), 16660–8.

Using dPCA to compare Trp-zip2 potential energy surfaces in different force fields
• REMD simulation of a short β-hairpin, Trp-zip2, using:
  • ff99sb-ildn
  • ff99sb*-ildn
  • ff99sb-ildn-nmr
  • ff99C, our modified version of ff99sb-ildn

Using dPCA to compare Trp-zip2 potential energy surfaces in different force fields
Figure labels: helical structure, native-like turn.
Figure: free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb*-ildn, projected onto the 1st and 2nd principal components from dPCA of the turn region. The energy surface is extended because the peptide cannot form a stable hairpin.
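A free energy landscape of this kind is commonly estimated from the population of the projected frames as F = −kT ln P. The sketch below assumes equally weighted frames sampled at the target temperature; the function name, bin count, and synthetic two-basin data are hypothetical and are not taken from the simulations shown here.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_surface(pc1, pc2, temperature=300.0, bins=50):
    """Estimate F(pc1, pc2) = -kT ln P from a 2D histogram.

    Empty bins get +inf; the lowest populated bin is shifted to zero.
    """
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):  # log(0) for empty bins
        free_energy = -K_B * temperature * np.log(prob)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, xedges, yedges

# Hypothetical projection with two basins of different population.
rng = np.random.default_rng(4)
basin_a = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(6000, 2))
basin_b = rng.normal(loc=[2.0, 1.0], scale=0.7, size=(4000, 2))
proj = np.vstack([basin_a, basin_b])

fes, xedges, yedges = free_energy_surface(proj[:, 0], proj[:, 1])
print("highest populated free energy (kcal/mol):", fes[np.isfinite(fes)].max())
```

Regions that are never visited have no population estimate, so they are left at infinity rather than assigned a finite free energy.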

Figure labels: helical structure, native-like turn.
Figure: free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb-ildn, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region.

Figure labels: helical structure, native-like turn.
Figure: free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb-ildn-nmr, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region. 99sb-ildn-nmr cannot fold the Trp-zip2 hairpin.

Figure label: native-like turn.
Figure: free energy landscape of Trp-zip2 at 300 K, using force field 99C, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region. In our force field, Trp-zip2 forms a stable β-turn, so it rarely samples other conformations.
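Comparing several force fields on one map requires projecting every trajectory onto the same principal components, here those of the 99sb*-ildn reference. A minimal sketch of that step follows, with hypothetical function names and synthetic feature arrays standing in for the sin/cos dihedral features; the key point is that both the mean and the eigenvectors come from the reference ensemble only.

```python
import numpy as np

def fit_reference(features):
    """Return the mean and eigenvectors (sorted by decreasing eigenvalue)
    of the reference ensemble."""
    mean = features.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(features - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

def project(features, mean, eigvecs, n_components=2):
    """Project any ensemble onto the reference PCs, using the reference mean."""
    return (features - mean) @ eigvecs[:, :n_components]

# Hypothetical feature arrays: a reference ensemble and a second ensemble
# that is displaced along the first coordinate.
rng = np.random.default_rng(5)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.5, 0.5])
shift = np.array([2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
reference = rng.normal(size=(800, 6)) * scales
other = rng.normal(size=(400, 6)) * scales + shift

mean, eigvecs = fit_reference(reference)
ref_proj = project(reference, mean, eigvecs)
other_proj = project(other, mean, eigvecs)

print("reference center on PC1/PC2:", ref_proj.mean(axis=0))
print("other center on PC1/PC2:    ", other_proj.mean(axis=0))
```

Running a separate PCA on each trajectory would give each one its own axes, and the resulting landscapes could not be overlaid.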

Summary
• PCA is a linear transformation of the old coordinates that captures maximum variance
• Instead of Cartesian coordinates, dihedral angles can be a better choice for describing conformational change
• General coordinates, or a subset of coordinates (for a region of interest), can be used for PCA
• The result of PCA can be used for further analysis such as clustering and transition rate calculation

Thank you!
