
Bioinformatics Data Analysis & Tools - PowerPoint PPT Presentation


PowerPoint Slideshow about ' Bioinformatics Data Analysis & Tools' - agnes



Bioinformatics Data Analysis & Tools

Molecular Simulations & Sampling Techniques


Molecular Simulations: Brief History

Protein flexibility

  • Even a correctly folded protein is dynamic

    • The crystal structure yields the average positions of the atoms

    • Overall ‘breathing’ motion is possible

B-factors

  • The average motion of an atom around its mean position

  • [Figure labels: alpha helices; beta-sheet]
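
A B-factor can be turned into a positional fluctuation. The sketch below assumes the standard isotropic crystallographic relation B = (8π²/3)⟨u²⟩, which the slide itself does not state; the function name and the example value are illustrative only.

```python
import math

def rms_fluctuation(b_factor):
    """Root-mean-square displacement (in Å) from an isotropic B-factor (in Å^2).

    Assumes B = (8 * pi^2 / 3) * <u^2>, the usual isotropic relation.
    """
    mean_square = 3.0 * b_factor / (8.0 * math.pi ** 2)
    return math.sqrt(mean_square)

# Illustrative value only: an atom with B = 20 Å^2
example_rmsf = rms_fluctuation(20.0)
print(round(example_rmsf, 2))
```

Larger B-factors therefore correspond to atoms that move more around their mean position.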

Peptide folding from simulation

  • A small (beta-)peptide forms helical structure according to NMR

  • Computer simulations of the atomic motions: molecular dynamics


Folding and unfolding in 200 ns

  • [Figure: trajectory switching between ‘unfolded’ and ‘folded’ states]

  • Unfolded structures: all different? How different?

    • $3^{21} \approx 10^{10}$ possibilities!

  • Folded structures: all the same


Temperature dependence

  • The folding equilibrium depends on temperature

  • [Figure: unfolded/folded trajectories at 360 K, 350 K, 340 K, 320 K and 298 K]


Pressure dependence

  • The folding equilibrium depends on pressure

  • [Figure: unfolded/folded trajectories at 2000 atm, 1000 atm and 1 atm]

Surprising result

  • Number of relevant non-folded structures is very much smaller than the number of possible non-folded structures

  • If the number of relevant non-folded structures increases proportionally with the folding time, only $10^{9}$ protein structures need to be simulated instead of $10^{90}$ structures

  • The folding mechanism is perhaps simpler after all…

Phase Space

  • Defines state of classical system of N particles:

    • coordinates q = (x1, y1, z1, x2, … , zN)

    • momenta p = (px1, py1, pz1, px2, … , pzN)

  • One conformation (+ momenta) is one point (p,q) in phase space

  • Motion is a curved line in phase space

    • trajectory: (p(t),q(t))

Molecular Motions: Time & Length-scales

Newton Dynamics

[Figure: Sir Isaac Newton; the system is propagated from time t to t + Δt]

Classical (Newton) Mechanics

  • A system has coordinates q and momenta p (= mv):

    p = ( p1, p2, … , pN )

    q = ( q1, q2, … , qN )

  • Together, p and q span the phase space (q alone spans the configuration space).

  • The total energy can be split into two components:

    • kinetic energy (K):

      $K(p) = \tfrac{1}{2} m v^2 = \frac{p^2}{2m}$

    • potential energy (V):

      V(q) depends on interaction(s)

  • The potential energy is described by

    • bonded interactions (e.g. bond stretching, angle bending)

    • non-bonded interactions (e.g. van der Waals, electrostatic)

  • Non-bonded interactions determine the conformational variation that we observe for example in protein motions.

The Hamilton Function

  • The Hamiltonian function represents the total energy: H(p,q) = K(p) + V(q)

  • It is the generalised expression of classical mechanics

  • Expressed as two differential equations:

    $\dot{p}_k = \frac{dp_k}{dt} = -\frac{\partial H}{\partial q_k}$

    $\dot{q}_k = \frac{dq_k}{dt} = \frac{\partial H}{\partial p_k}$

  • These are Newton's equations of motion, but in a very elegant form

  • Use 'generalised coordinates' (p and q):

    • can use any coordinate system

      • e.g., Cartesian coordinates or Euler angles

Hamilton's Principle

  • The variation of the action integral is zero: $\delta \int_{t_1}^{t_2} \left( p\,\dot{q} - H(p,q) \right) dt = 0$

  • Hamilton's principle is the most fundamental formulation

    • Newton's equations of motion are only one set of equations that can be derived from Hamilton's principle.

  • The integral is called the 'action', meaning:

    • If we integrate along the trajectory of an object in the space given by positions q and momenta p between time points (integration limits) t1 and t2, then the value of the integral (= the 'action') of a 'real' trajectory is a minimum (more precisely, an extremum) compared to all other trajectories.

  • Example: Why does a thrown stone follow a parabolic trajectory?

    • If you vary the trajectory and calculate the action, the parabolic trajectory will yield the smallest 'action'.

Harmonic oscillator:

  • 1-dimensional motion

  • 2 dimensions in phase-space:

    • position (1-dimensional)

    • momentum (1-dimensional)

  • analytical solution for integration:

    • $q(t) = b \cos\left(\sqrt{k/m}\; t\right)$

    • $p(t) = -b \sqrt{mk}\, \sin\left(\sqrt{k/m}\; t\right)$

  • [Figure: q(t) and p(t)]
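
The analytical solution can be checked numerically: along q(t), p(t) the total energy H = p²/(2m) + kq²/2 should stay constant. This is a minimal sketch; the function names and the default values b = k = m = 1 are assumptions chosen for illustration.

```python
import math

def harmonic_state(t, b=1.0, k=1.0, m=1.0):
    """Analytical q(t), p(t) for amplitude b, force constant k and mass m."""
    omega = math.sqrt(k / m)
    q = b * math.cos(omega * t)
    p = -b * math.sqrt(m * k) * math.sin(omega * t)
    return q, p

def total_energy(q, p, k=1.0, m=1.0):
    """H(p, q) = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

# Energy at 100 time points along the analytical trajectory
energies = [total_energy(*harmonic_state(0.1 * i)) for i in range(100)]
print(max(energies) - min(energies))  # spread stays at rounding-error level
```

In phase space this trajectory is a closed curve (an ellipse) of constant energy.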

Calculating Averages

  • Integration of phase space:

    • 2 values per coordinate (e.g. up, down):

      • 1 particle: 1*6 degrees of freedom (dof); $2^{6}$ = 64 points

      • 2 particles: 2*6 dof; $2^{12}$ = 4,096 points

      • 3 particles: 3*6 dof; $2^{18}$ = 262,144 points

      • 4 particles: 4*6 dof; $2^{24}$ = 16,777,216 points

  • Do we need the whole of phase space?

    • only low energy states are relevant
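
The growth of the grid can be reproduced with a few lines. This is only a counting sketch; the function name and its parameters are hypothetical.

```python
def grid_points(n_particles, values_per_dof=2, dof_per_particle=6):
    """Number of phase-space grid points: values^(particles * dof per particle)."""
    return values_per_dof ** (n_particles * dof_per_particle)

for n in range(1, 5):
    print(n, "particle(s):", grid_points(n), "points")
```

Even with only two values per degree of freedom the count grows exponentially with the number of particles, which is why exhaustive integration is not an option for a protein.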

Solving Complex systems

  • No analytical solutions

  • Numerical integration:

    • by time (Molecular Dynamics)

    • by ensemble (Monte-Carlo)

  • Molecular Dynamics: numerical integration in time

    • Euler’s approximation:

      • q(t + Δt) = q(t) + p(t)/m·Δt

      • p(t + Δt) = p(t) + m·a(t) ·Δt

    • Verlet / Leap-frog
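
Euler's approximation can be tried on the harmonic oscillator, where the force is F = -kq. The sketch below uses assumed values (k = m = 1, Δt = 0.01) and shows why the plain Euler scheme is rarely used in practice: the total energy drifts upwards with every step.

```python
def euler_step(q, p, dt, k=1.0, m=1.0):
    """One explicit Euler step for a harmonic oscillator."""
    a = -k * q / m              # acceleration from the harmonic force F = -k q
    q_new = q + p / m * dt      # q(t + dt) = q(t) + p(t)/m * dt
    p_new = p + m * a * dt      # p(t + dt) = p(t) + m * a(t) * dt
    return q_new, p_new

def energy(q, p, k=1.0, m=1.0):
    """Total energy H = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

q, p = 1.0, 0.0
e_start = energy(q, p)
for _ in range(1000):
    q, p = euler_step(q, p, dt=0.01)
e_end = energy(q, p)
print(e_start, e_end)  # the energy grows although the exact dynamics conserve it
```

For this particular system each Euler step multiplies the energy by (1 + Δt²), so smaller steps only slow the drift down.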

Features of Newton Dynamics

  • Newton’s equations:

    • Energy conservative

    • Time reversible

    • Deterministic

  • Numerical integration by the Verlet algorithm (‘simulation’): $r(t+\Delta t) \approx 2r(t) - r(t-\Delta t) + \frac{F(t)}{m}\Delta t^2 \; [+ O(\Delta t^4)]$

  • In a ‘real’ simulation:

    • Rounding errors (cumulative)

      → not fully reversible

      → no full energy conservation

    • Coupling to a thermal bath → re-scaling

      → not fully deterministic

    • ‘Lyapunov’ instability → trajectories diverge

Derivation: Verlet

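  • Taylor expansion:

    • $q(t+\Delta t) = q(t) + q'(t)\Delta t + \frac{1}{2!} q''(t)\Delta t^2 + \frac{1}{3!} q'''(t)\Delta t^3 + \dots$

      • where: q’(t) = v(t) (1st derivative, velocity)

      • and: q’’(t) = a(t) (2nd derivative, acceleration)

    • Add the forward and backward expansions (the odd-order terms cancel):

        $q(t+\Delta t) = q(t) + q'(t)\Delta t + \frac{1}{2!} q''(t)\Delta t^2 + \frac{1}{3!} q'''(t)\Delta t^3 + O(\Delta t^4)$

        $q(t-\Delta t) = q(t) - q'(t)\Delta t + \frac{1}{2!} q''(t)\Delta t^2 - \frac{1}{3!} q'''(t)\Delta t^3 + O(\Delta t^4)$

        $q(t+\Delta t) + q(t-\Delta t) = 2q(t) + 2 \cdot \frac{1}{2!} q''(t)\Delta t^2 + O(\Delta t^4)$

    • Rearrange:

      $q(t+\Delta t) = 2q(t) - q(t-\Delta t) + a(t)\Delta t^2 + O(\Delta t^4)$

  • Only derivatives up to 2nd order are needed, but the Δt³ terms cancel, giving 3rd order accuracy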
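
The rearranged formula translates directly into code. The sketch below applies it to a harmonic oscillator with assumed parameters k = m = 1; the start-up value q(-Δt), taken from a backward Taylor expansion, is a choice of this example rather than part of the derivation.

```python
import math

def verlet_trajectory(q0, v0, dt, n_steps, k=1.0, m=1.0):
    """Positions from the Verlet recursion q(t+dt) = 2 q(t) - q(t-dt) + a(t) dt^2."""
    def accel(q):
        return -k * q / m  # harmonic force F = -k q

    # Start-up: estimate q(-dt) with a backward Taylor expansion (assumption)
    q_prev = q0 - v0 * dt + 0.5 * accel(q0) * dt * dt
    q = q0
    trajectory = [q0]
    for _ in range(n_steps):
        q_next = 2.0 * q - q_prev + accel(q) * dt * dt
        q_prev, q = q, q_next
        trajectory.append(q)
    return trajectory

traj = verlet_trajectory(q0=1.0, v0=0.0, dt=0.01, n_steps=1000)
error = abs(traj[-1] - math.cos(10.0))  # exact solution is cos(t) for b = k = m = 1
print(error)
```

Note that velocities never appear in the recursion; if they are needed (for example for the kinetic energy) they have to be estimated from the positions.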

What do we obtain?

  • Trajectory: q(t) and p(t)

  • Probability of occurrence: $P(p,q) = \frac{1}{Z} e^{-H(p,q)/kT}$

  • Averages along the trajectory: $\langle A(p,q) \rangle_T = \frac{1}{T} \int_0^T A(q(t),p(t))\, dt$ (where T denotes the total time, not the temperature!)
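
In practice the time integral becomes a sum over stored trajectory frames. The sketch below averages q² over one full period of a harmonic oscillator trajectory (b = k = m = 1, all assumed values), for which the exact answer is b²/2; the helper name `time_average` is hypothetical.

```python
import math

def time_average(observable, trajectory):
    """Discrete time average: (1/N) * sum of A(q, p) over equally spaced frames."""
    return sum(observable(q, p) for q, p in trajectory) / len(trajectory)

# Frames of the analytical trajectory q = cos(t), p = -sin(t) over one period
n_frames = 10000
period = 2.0 * math.pi
trajectory = [
    (math.cos(period * i / n_frames), -math.sin(period * i / n_frames))
    for i in range(n_frames)
]

avg_q2 = time_average(lambda q, p: q * q, trajectory)
print(avg_q2)
```

The average is only meaningful if the trajectory has visited the relevant part of phase space, which is the convergence question taken up on the next slides.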

Convergence

  • Amount of phase-space covered

    • “Sampling”

  • Impossible to prove: you cannot know what you don’t know

  • Energy “landscape” in phase-space

    • there might be a “next valley”

Example: Convergence (1)

Example: Convergence (2)

Example: Convergence (3)

  • Apparent convergence on all timescales, 100 ps – 10 ns!

Efficiency

  • Time step limited by vibrational frequencies

    • heavy-atom–hydrogen bond vibration: $10^{-14}$ s (10 fs)

    • 10-20 integration steps per vibrational period:

      • 0.5 fs time step; 2,000,000 steps for 1 ns

  • Removal of fast vibrations (constraining):

    • hydrogen atom bond and angle motion

    • heavy-atom bond motion

    • out-of-plane motions (e.g. aromatic groups)

  • In practice: 1-2 fs time step

    • 5-7 fs maximum
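
The step counts follow from simple arithmetic. A small sketch, with a hypothetical helper name, comparing the time steps mentioned above for 1 ns of simulated time:

```python
def steps_needed(simulated_time_fs, time_step_fs):
    """Number of integration steps needed to cover the simulated time."""
    return round(simulated_time_fs / time_step_fs)

NS_IN_FS = 1_000_000  # 1 ns expressed in femtoseconds

for dt in (0.5, 1.0, 2.0):
    print(dt, "fs time step:", steps_needed(NS_IN_FS, dt), "steps per ns")
```

Going from a 0.5 fs to a 2 fs time step by constraining the fastest vibrations cuts the number of force evaluations by a factor of four.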

Constraining

  • to remove degrees of freedom, e.g.:

    • bond i-j vibrations → keep distance i-j constant

    • angle i-j-k vibrations → keep distance i-k constant

  • Constraint Algorithms

    • SHAKE

      • iterative adjustment of Lagrange multipliers

    • LINCS

      • Taylor expansion of matrix inversion

      • non-iterative (more stable)

      • no highly connected constraints

    • SETTLE

      • Analytical Solution

        • for symmetric 3-atom molecules (like water)
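
To illustrate the iterative idea behind SHAKE, the sketch below resets a single bond i-j to its constrained length after an unconstrained step. It is a simplified, single-constraint version written for this example (function name, tolerance and coordinates are all assumptions), not the implementation of any particular MD package.

```python
import math

def shake_bond(old_i, old_j, new_i, new_j, d, m_i=1.0, m_j=1.0,
               tol=1e-10, max_iter=100):
    """Iteratively move two atoms along the old bond vector until |r_ij| = d."""
    s = [a - b for a, b in zip(old_i, old_j)]   # bond vector before the step
    new_i, new_j = list(new_i), list(new_j)
    inv_mass = 1.0 / m_i + 1.0 / m_j
    for _ in range(max_iter):
        r = [a - b for a, b in zip(new_i, new_j)]
        diff = d * d - sum(x * x for x in r)    # deviation of the squared length
        if abs(diff) < tol:
            break
        # Linearised estimate of the Lagrange multiplier for this constraint
        g = diff / (2.0 * inv_mass * sum(a * b for a, b in zip(s, r)))
        new_i = [x + g * c / m_i for x, c in zip(new_i, s)]
        new_j = [x - g * c / m_j for x, c in zip(new_j, s)]
    return new_i, new_j

# A bond of length 1.0 that was stretched by an unconstrained step
atom_i, atom_j = shake_bond([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                            [-0.05, 0.02, 0.0], [1.1, -0.01, 0.0], d=1.0)
length = math.sqrt(sum((a - b) ** 2 for a, b in zip(atom_i, atom_j)))
print(length)
```

With several coupled constraints the same correction is applied to each constraint in turn and the loop is repeated until all of them are satisfied, which is why highly connected constraints converge more slowly.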

Improving Performance

  • Pairwise potential: $F_{ij} = -F_{ji}$ (each pair needs to be evaluated only once)

  • Potential E(r) ~ 0 at large r: cut-off

    • Coulomb: ~ 1/r

    • Lennard-Jones: ~ $1/r^{6}$

  • Atoms move little in one step: pair-list

    • Evaluating r is expensive: $r = \sqrt{|\mathbf{r}_j - \mathbf{r}_i|^2}$

  • Large distances change less: twin-range

    • short-range each step; long range less often

  • Multiple time-step methods

  • Many Processor/Compiler/Language specific optimizations:

    • use of Fortran vs. C

    • optimize cache performance

      • arrays of positions, velocities, forces, parameters are very large

    • compiler optimizations
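
Three of the points above (each pair only once, a cut-off, and avoiding the square root) can be combined in a simple pair-list builder. This is a plain O(N²) sketch with made-up coordinates; production codes use grid or cell searches instead.

```python
def build_pair_list(positions, cutoff):
    """Return all pairs (i, j) with i < j whose distance is below the cut-off."""
    cutoff_sq = cutoff * cutoff
    pairs = []
    n = len(positions)
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):        # j > i: each pair once, since F_ij = -F_ji
            xj, yj, zj = positions[j]
            r_sq = (xj - xi) ** 2 + (yj - yi) ** 2 + (zj - zi) ** 2
            if r_sq < cutoff_sq:         # compare squared distances, no square root
                pairs.append((i, j))
    return pairs

# Illustrative coordinates only
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (3.2, 0.1, 0.0)]
pairs = build_pair_list(positions, cutoff=1.0)
print(pairs)
```

Because atoms move little per step, such a list can be reused for several steps before it is rebuilt.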

Ignoring Degrees of Freedom

  • Internal:

    • bonds, angles → Constraint algorithm

      • larger time steps

  • External:

    • “Solvent” → Langevin dynamics

      • fewer (explicit) particles

    • Inertia & “solvent” → Brownian dynamics

      • larger time steps

Trajectory on Energy Surface

Sampling in Conformational Space

  • Most of the computational time is spent on calculating (local, harmonic) vibrations.

  • [Figure: energy landscape with barriers ΔE >> kT; labels: Energy, vibration, Entropy]

Barriers

  • Kitao et al. (1998) Proteins 33, 496-517.

Psychology of Theorists

100%

“In theory, there should be no difference between theory and practice. In practice, however, there is always a difference...“ (Witten and Frank)

“For every complex question there is a simple and wrong solution.” (Albert Einstein)

“All models are wrong, but some are useful.” (George Box)

0%

OPTIMIST SCALE

Monte Carlo Sampling

  • Ergodic hypothesis:

    • Sampling over time (Molecular Dynamics approach); and

    • Ensemble averaging (Monte Carlo approach)

  • Yield the same result:

    $\rho(\mathbf{r}) = \langle \rho_i(\mathbf{r}) \rangle_{NVE}$

  • Detailed balance condition (o: old state, n: new state):

    $p(o)\, p(o \to n) = p(n)\, p(n \to o)$

Metropolis Selection Scheme

  • Metropolis acceptance rule that satisfies detailed balance:

    • $acc(o \to n) = p(n)/p(o) = e^{-\Delta E/kT}$ if $p(n) < p(o)$

    • $acc(o \to n) = 1$ if $p(n) \ge p(o)$

    → Metropolis Monte Carlo

  • Ergodic probability density for configurations around $r^N$: $p(r^N) = \frac{e^{-E/kT}}{\sum e^{-E/kT}}$
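
The acceptance rule is short enough to write out. The sketch below samples a one-dimensional harmonic potential E(x) = kx²/2 with assumed parameters (kT = k = 1, maximum trial move of 1, fixed random seed); for this potential the Boltzmann average of x² should approach kT/k.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def sample_harmonic(n_steps, kT=1.0, k=1.0, max_move=1.0, seed=0):
    """Metropolis Monte Carlo sampling of E(x) = k x^2 / 2."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)  # symmetric trial move
        delta_e = 0.5 * k * (trial * trial - x * x)
        if metropolis_accept(delta_e, kT, rng):
            x = trial
        samples.append(x)  # a rejected move counts the old state again
    return samples

samples = sample_harmonic(200000)
mean_x2 = sum(x * x for x in samples) / len(samples)
print(mean_x2)  # should be close to kT / k = 1
```

Counting the old state again after a rejection is essential; dropping rejected steps would bias the sampled distribution.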

Search Strategies

Leaps

Computational Scheme

  • Reduction of the leaps will lead to classical dynamics

  • Control parameter:

    • RMSD

    • Angle deviation

Computational Load: Solvation

  • Most computational time (>95%) spent on calculating (bulk) water-water interactions

Implicit Solvation

POPS

  • Solvent accessible area

    • fast and accurate area calculation

    • resolution:

      • POPS-A (per atom)

      • POPS-R (per residue)

    • parametrised on 120000 atoms and 12000 residues

    • differentiable → can be used in MD

  • Free energy of solvation: $\Delta G^{solv}_i = \mathrm{area}_i \cdot \sigma_i$

  • POPS is implemented in GROMOS96

  • parameters 'sigma' from simulations in water:

    • amino acids in helix, sheet and extended conformation

    • peptides in helix and sheet conformation
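
The solvation term is a sum of area times parameter over atoms (or residues). A minimal sketch of that sum follows; the function name is hypothetical and the areas and sigma values are invented for illustration, not actual POPS parameters.

```python
def solvation_free_energy(areas, sigmas):
    """Sum of area_i * sigma_i over all atoms (or residues)."""
    if len(areas) != len(sigmas):
        raise ValueError("need one sigma parameter per area")
    return sum(area * sigma for area, sigma in zip(areas, sigmas))

# Made-up numbers: accessible areas and per-atom solvation parameters
example_areas = [10.0, 20.0]
example_sigmas = [0.1, -0.02]
total_dg = solvation_free_energy(example_areas, example_sigmas)
print(total_dg)
```

Because the area is a smooth function of the coordinates, the same expression also yields forces, which is what makes it usable inside an MD simulation.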

POPS server

Test molecules: alanine dipeptide

Test molecules: BPTI / Y35G-BPTI

[Figure panels: Classical MD; Leap-dynamics; Essential dynamics]

Calmodulin domains

  • Apparent unfolding temperatures (CD)

    • C-domain : 315 K (42 °C)

    • N-domain : 328 K (55 °C)

  • LD simulations:

    • 3 ns

    • 4 trajectories

      • 290 K

      • 325 K

      • 360 K

Snapshots

Trajectories

Example: Protein & Ligand Dynamics

Example: Essential Dynamics Analysis

Cyt-P450 BM3: 7 x 10 ns “free” MD simulations

CD

Comparison CD / simulation

Example: Minima

Example: Conformations

Levinthal’s paradox

  • Protein folding problem:

    • Predict the 3D structure from the sequence

    • Understand the folding process


Folding energy

  • Each protein conformation has a certain energy and a certain flexibility (entropy)

  • It corresponds to a point on a multidimensional free energy surface

    • three coordinates per atom: 3N − 6 dimensions possible

    • ΔG = ΔH − TΔS

  • [Figure: energy E(x) against coordinate x; one state may have higher energy but lower free energy than another]
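
A small numerical example shows how a state with higher energy can still have the lower free energy. The two states and their values below are invented for illustration (energies in kJ/mol, entropies in kJ/(mol K)).

```python
def free_energy(enthalpy, entropy, temperature):
    """G = H - T * S."""
    return enthalpy - temperature * entropy

# Hypothetical states: 'rigid' is lower in energy, 'flexible' has more entropy
rigid = free_energy(enthalpy=-100.0, entropy=0.10, temperature=300.0)
flexible = free_energy(enthalpy=-90.0, entropy=0.20, temperature=300.0)
print(rigid, flexible)  # the flexible state has the lower free energy at 300 K
```

Lowering the temperature reduces the weight of the entropy term, so the ranking of the two states can change with temperature.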

Folded state

  • Native state = lowest point on the free energy landscape

  • Many possible routes

  • Many possible local minima (misfolded structures)

Molten globule

  • First step: hydrophobic collapse

  • Molten globule: globular structure, not yet correctly folded

  • Local minimum on the free energy surface

Force Field

“the collection of all forces that we consider to occur in a mechanical atomar system”

  • A generalised description:

    $E_{total} = E_{bonded} + E_{non\text{-}bonded} + E_{crossterm}$

  • Crossterms:

    • non-bonded interactions influence the bonded interactions (and vice versa).

    • Some force fields neglect those terms.

  • Note that force fields are (mostly) designed for pairwise atom interactions.

    • Higher order interactions are implicitly included in the pairwise interaction parameters.

Force Field Components: Bonded Interactions

Force Field Components: Non-Bonded Interactions

All Together…

Reduced Units

  • Generalise description of (atomic) systems

    • express all quantities in basic units derived from the system's dimensions

  • For example, a Lennard-Jones interaction: $V_{LJ} = \epsilon f(r/\sigma)$

    • ε is the characteristic interaction energy; σ is the characteristic length (for the standard 12-6 form, the distance at which $V_{LJ} = 0$)

  • Choose basic units:

    • unit of length, σ

    • unit of energy, ε

    • unit of mass, m (mass of the atoms in the system)

  • all other units can be derived from these, e.g.:

    • time: $\sigma \sqrt{m/\epsilon}$

    • temperature: $\epsilon / k_B$

      (from: Frenkel and Smit, 'Understanding Molecular Simulation', Academic Press.)

  • Other choices, e.g., ‘MD’ units:

    • length nm ($10^{-9}$ m), mass u, time ps ($10^{-12}$ s), charge e, temp K

    • energy kJ mol$^{-1}$, velocity nm ps$^{-1}$, pressure kJ mol$^{-1}$ nm$^{-3}$
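
As a worked example, the reduced time unit σ√(m/ε) can be converted to picoseconds. The sketch assumes commonly quoted Lennard-Jones parameters for argon (σ = 3.405 Å, ε/k_B = 119.8 K, m = 39.948 u); these values are not given on the slide and serve only as an illustration.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27   # atomic mass unit in kg

def reduced_time_unit(sigma_m, epsilon_j, mass_kg):
    """Reduced unit of time: sigma * sqrt(m / epsilon), in seconds."""
    return sigma_m * math.sqrt(mass_kg / epsilon_j)

# Assumed Lennard-Jones parameters for argon
sigma = 3.405e-10          # m
epsilon = 119.8 * K_B      # J
mass = 39.948 * AMU        # kg

tau = reduced_time_unit(sigma, epsilon, mass)
temperature_unit = epsilon / K_B   # reduced unit of temperature in K
print(tau * 1e12, "ps;", temperature_unit, "K")
```

One reduced time unit then corresponds to roughly two picoseconds, so a reduced time step of a few thousandths is of the same order as the femtosecond time steps discussed earlier.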

Main points
