
1) A Molecular Replacement pipeline – BALBES
2) Low resolution refinement problems and approaches

Garib N Murshudov

York Structural Laboratory

Chemistry Department

University of York

Contents

  • BALBES – molecular replacement pipeline

  • TWIN refinement in REFMAC and its extension

  • Problems of low resolution refinement

  • Some tools for low resolution: NCS, external restraints, B value restraints and “jelly” body

  • Map sharpening: General approach and some applications

Distribution of methods in PDB

Diagram showing the percentage of structures in the PDB solved by different techniques

67.5% of structures are solved by Molecular Replacement (MR)

21% of structures are solved by experimental phasing

Organisation of BALBES

Chains. The internal database has around 40,000 unique entries selected from more than 51,000 present in the PDB. All entries in the PDB are analysed according to their identity. Only non-redundant sets of structures are stored.

Domains. About 10,000 of them have two or more domains. The DB contains 35,000 domain definitions. Loops and other flexible parts are removed from the domain definitions.

Inputs (reflections and sequence)

Multimers of structures (using PISA)

Hierarchy is organised according to sequence identity and 3D similarity (rmsd over Cα atoms).





Search models

(Diagram labels: Input sequence; Domains from DB; Full chain models; Best multi-domain model; Search models)

Example of search: Multi-domain protein

This structure can be solved with a multi-domain model.

PDB entry 1z45 has three major domains. One of the domains also has two subdomains. Domain 1 is similar to 1ek6 (seq id 55%), domain 2 is similar to 1yga (seq id 51%) and domain 3 is similar to 1udc (seq id 49%).

1z45 - isomerase

1ek6 - two domains of isomerase

1yga - another domain of isomerase

1udc - two domains of isomerase

Although all these proteins are isomerases, they have slightly different activities.

Other functions


Flexible loop


  • Automatic generation and use of ensembles

  • Multiprotein complexes


BALBES Interface in Our Web Server

(running on our Linux cluster), designed by P. Young



Available refinement programs


  • CNS


  • TNT


  • Phenix.refine



  • XD

  • MAIN

What can REFMAC do?

  • Simple maximum likelihood restrained refinement

  • Twin refinement

  • Phased refinement (with Hendrickson-Lattman coefficients)

  • SAD/SIRAS refinement

  • Structure idealisation

  • Library for more than 8000 ligands (from the next version)

  • Covalent links between ligands, and between ligand and protein

  • Rigid body refinement

  • NCS local, restraints to external structures, jelly body

  • TLS refinement

  • Map sharpening

  • etc

Twin refinement in REFMAC

Twin refinement in REFMAC (5.5 or later) is automatic.

  • Identify “twin” operators

  • Calculate “Rmerge” (Σ|Ih-<I>twin| /ΣIh) for each operator. If Rmerge>0.50, discard the operator: twin plus crystal symmetry operators should form a group

  • Refine twin fractions. Keep only “significant” domains (default threshold is 5%): Twin plus symmetry operators should form a group

    Intensities can be used

    If phases are available they can be used

    Maximum likelihood refinement is used
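The Rmerge filter in the list above can be sketched in a few lines. This is a minimal illustration, not REFMAC's code: it assumes that <I>twin is the mean of I(h) and the intensity of its twin mate, that the operator acts on the index as a 3x3 integer matrix, and that reflections without a measured mate are skipped. The names `twin_rmerge`, `intensities` and `op` are hypothetical.

```python
def twin_rmerge(intensities, op):
    """Rmerge = sum|I_h - <I>_twin| / sum I_h for one candidate twin operator.

    intensities: dict mapping (h, k, l) -> intensity
    op: 3x3 integer matrix (tuple of rows) applied to the index (h, k, l)
    Assumption: <I>_twin is the mean of I_h and the intensity of its twin mate.
    """
    num = 0.0
    den = 0.0
    for hkl, i_h in intensities.items():
        # Index of the twin-related reflection
        mate = tuple(sum(op[r][c] * hkl[c] for c in range(3)) for r in range(3))
        if mate not in intensities:
            continue  # no measured mate, so this reflection carries no information
        mean = 0.5 * (i_h + intensities[mate])
        num += abs(i_h - mean)
        den += i_h
    return num / den if den > 0 else float("nan")


# Toy data: operator (h, k, l) -> (k, h, -l)
op = ((0, 1, 0), (1, 0, 0), (0, 0, -1))
data = {(1, 2, 3): 100.0, (2, 1, -3): 100.0, (1, 0, 0): 50.0, (0, 1, 0): 150.0}
rmerge = twin_rmerge(data, op)
```

A low value means the operator relates similar intensities and is a twin candidate; the result would then be compared with the threshold quoted above.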

Likelihood

The dimension of integration is in general twice the number of twin related domains. Since the phases do not contribute to the first part of the integrand, the second part becomes a Rice distribution.

The integration is carried out using the Laplace approximation.

These equations are general enough to account for non-merohedral twinning (including allotwinning) and unmerged data. A small modification should allow handling of simultaneous twinning and SAD/MAD phasing, and of radiation damage.

Twin: Few warnings about R factors

For the acentric case only, for a random structure:

Crystallographic R factors

  • No twinning: 58%

  • Perfect twinning, twin modelled: 40%

  • Perfect twinning, twin not modelled: 50%

Rmerge without experimental error

  • No twinning: 50%

  • Along a non-twin axis, when twinning is about a different axis: 37.5%
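The crystallographic R factors quoted above can be checked with a small Monte Carlo experiment. The sketch below assumes acentric Wilson statistics (intensities drawn from an exponential distribution), a perfect hemihedral twin (the average of two independent intensities), and a simple scale k = ΣFo/ΣFc. These are illustrative choices, not the procedure used for the slide.

```python
import math
import random


def r_factor(f_obs, f_calc):
    """R = sum|Fo - k*Fc| / sum Fo, with the simple scale k = sum Fo / sum Fc."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def simulate(n=50_000, seed=0):
    """Random-structure R factors for untwinned and perfectly twinned data."""
    rng = random.Random(seed)

    def wilson():
        # Acentric Wilson intensity with mean 1
        return rng.expovariate(1.0)

    def twinned():
        # Perfect hemihedral twin: average of two independent intensities
        return 0.5 * (wilson() + wilson())

    def amplitudes(draw):
        return [math.sqrt(draw()) for _ in range(n)]

    return {
        "no_twin": r_factor(amplitudes(wilson), amplitudes(wilson)),
        "twin_modelled": r_factor(amplitudes(twinned), amplitudes(twinned)),
        "twin_not_modelled": r_factor(amplitudes(twinned), amplitudes(wilson)),
    }


results = simulate()
```

The practical point is the one the slide makes: an R factor of about 40% for twinned data says no more about model quality than about 58% does for untwinned data.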



Problems of low resolution refinement

  • Function to describe fit of the model into experiment: likelihood or similar

    • Data may come from very peculiar “crystals”: Twin, OD, multiple cell

    • Radiation damage

    • Converting intensities (I) to |F| may not be valid: weak reflections, modulated crystals

  • Limited and noisy data: use of available knowledge

    • Known structures

    • Internal patterns: NCS, secondary structure

  • Smeared electron density with vanishing side chains, secondary structures and domains, caused by high B values and series termination:

    • Filtering methods: Solve inverse problem with regulariser

    • Missing data problem: Data augmentation, bootstrap

Use of available knowledge

1) NCS local

2) Restraints to known structure(s)

3) Restraints to current inter-atomic distances (implicit normal modes or “jelly” body)

4) Better restraints on B values

These are available from version 5.6.

Note:

  • BUSTER/TNT has local NCS and restraints to known structures

  • CNS has restraints to known structures (they call it deformable elastic network)

  • Phenix has B-value restraints on non-bonded atom pairs and automatic global NCS

  • Local NCS (only for torsion angle related atom pairs) has been available in SHELXL since the beginning of time

Auto NCS: local and global

  • Align all chains with all chains using the Needleman-Wunsch method

  • If the alignment score is higher than a predefined value (e.g. 80%) then consider them as similar

  • Find local RMS and if average local RMS is less than predefined value then consider them aligned

  • Find correspondence between atoms

  • If global restraints (i.e. restraints based on RMS between atoms of aligned chains) are used then identify domains

  • For local NCS make the list of corresponding interatomic distances (remove bond and angle related atom pairs)

  • Design weights

  • The list of interatomic distance pairs is calculated at every cycle
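The first two steps of the list above (global alignment, then a score threshold) can be sketched as follows. This is a textbook Needleman-Wunsch with unit match, mismatch and gap scores, which are assumptions made for brevity. The identity-based score and the function names are also hypothetical; REFMAC's actual scoring scheme is not given on the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b; returns aligned index pairs (i, j)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the dynamic-programming table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the aligned pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def identity_score(a, b):
    """Fraction of identical aligned residues, relative to the shorter chain."""
    pairs = needleman_wunsch(a, b)
    if not pairs:
        return 0.0
    same = sum(1 for i, j in pairs if a[i] == b[j])
    return same / min(len(a), len(b))


# Two chains count as similar when the score passes the chosen threshold, e.g. 0.8
similar = identity_score("ACDEF", "ACEF") >= 0.8
```

The aligned index pairs are what the later steps need: they give the residue correspondence from which atom pairs and interatomic distance lists are built.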

Auto NCS

Aligned regions

  • Global RMS is calculated using all aligned atoms.

  • Local RMS is calculated using k (default is 5) residue sliding windows and then averaging of the results
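The difference between global and local RMS can be illustrated with a superposition-free stand-in. Note the swap: the sketch below compares corresponding interatomic distances rather than superposed coordinates, which is presumably what the slide's RMS uses, so the numbers are not directly comparable with REFMAC output. The function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def distance_rms(xa, xb):
    """RMS difference between corresponding interatomic distances of two point sets."""
    diffs = [(math.dist(xa[i], xa[j]) - math.dist(xb[i], xb[j])) ** 2
             for i, j in combinations(range(len(xa)), 2)]
    return math.sqrt(sum(diffs) / len(diffs))


def local_rms(xa, xb, k=5):
    """Average of distance_rms over all k-residue sliding windows."""
    windows = range(len(xa) - k + 1)
    return sum(distance_rms(xa[s:s + k], xb[s:s + k]) for s in windows) / len(windows)


# Chain A is straight; chain B has the same two halves with a 90 degree hinge between them
chain_a = [(float(i), 0.0, 0.0) for i in range(12)]
chain_b = [(float(i), 0.0, 0.0) for i in range(6)] + [(5.0, float(j), 0.0) for j in range(1, 7)]

global_value = distance_rms(chain_a, chain_b)
local_value = local_rms(chain_a, chain_b, k=5)
```

With a hinge between two otherwise identical halves, the global value is large while the local average stays small, because only the few windows that span the hinge contribute.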

Chain A

Chain B


Auto NCS: Iterative alignment

Example of alignment: 2vtu.

There are two chains similar to each other. There appears to be gene duplication

RMS – all aligned atoms

Ave(RmsLoc) – local RMS

********* Alignment results *********
: N : Chain 1        : Chain 2        : No of aligned : Score  : RMS    : Ave(RmsLoc) :
: 1 : J( 131 - 256 ) : J( 3 - 128 )   : 126           : 1.0000 : 5.2409 : 1.6608      :
: 2 : J( 1 - 257 )   : L( 1 - 257 )   : 257           : 1.0000 : 4.8200 : 1.6694      :
: 3 : J( 131 - 256 ) : L( 3 - 128 )   : 126           : 1.0000 : 5.2092 : 1.6820      :
: 4 : J( 3 - 128 )   : L( 131 - 256 ) : 126           : 1.0000 : 3.0316 : 1.5414      :
: 5 : L( 131 - 256 ) : L( 3 - 128 )   : 126           : 1.0000 : 0.4515 : 0.0464      :


Auto NCS: Conformational changes

In many cases two or more copies of the same molecule can be expected to have (slightly) different conformations. For example, if there is a domain movement, the internal structures of the domains will be the same, but the distances between domains will differ between the two copies of the molecule.

(Figure: two copies of a molecule, each consisting of Domain 1 and Domain 2.)

Robust estimators

One class of estimators that are robust to outliers is called M-estimators (maximum-likelihood-like estimators). One of the popular functions is Geman-McClure.

Essentially when distances are similar then they should be kept similar and when they are too different they should be allowed to be different.

This function is used for NCS local restraints as well as for restraints to external structures

Red line: x^2

Black line: x^2/(1+w x^2)

where x=(d1-d2)/σ, w=0.1
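The two curves described above are easy to compare numerically. The sketch below uses the formulas and w = 0.1 from the slide; the derivative is worked out here rather than taken from the slide, and the function names are hypothetical.

```python
def least_squares(x):
    """Ordinary quadratic penalty (the red line)."""
    return x * x


def geman_mcclure(x, w=0.1):
    """Geman-McClure penalty x^2 / (1 + w x^2) (the black line)."""
    return x * x / (1.0 + w * x * x)


def geman_mcclure_gradient(x, w=0.1):
    """Derivative of the Geman-McClure penalty: 2x / (1 + w x^2)^2."""
    return 2.0 * x / (1.0 + w * x * x) ** 2


# Small differences behave like least squares; large ones saturate at 1/w
for x in (0.1, 1.0, 3.0, 10.0, 100.0):
    print(f"x={x:6.1f}  x^2={least_squares(x):10.2f}  GM={geman_mcclure(x):7.4f}  "
          f"dGM/dx={geman_mcclure_gradient(x):8.5f}")
```

Because the gradient tends to zero for large x, a pair of distances that genuinely differs (an outlier) exerts almost no pull, while similar distances are restrained nearly as strongly as with least squares.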

Restraints to external structures

This work is done by Rob Nicholls


  • Compares Two Protein Chains

  • Conformation-invariant structural comparison

  • Residue-residue alignment

  • Superimposition

  • Residue-based and global similarity scores

  • Produces local atomic distance restraints

  • Based on one or more aligned chains

  • Possibility of multi-crystal refinement

ProSmart Restrain

(Figure labels: structure to be refined; known similar structure (prior))

Remove bond and angle related pairs

Restraints to current distances

The term is added to the target function:
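The equation image for this slide did not survive in the transcript. A form consistent with the description that follows (a sum over atom pairs, with the current distance as the target) is given here as a reconstruction rather than the slide's exact notation; the weight symbol w is an assumption.

```latex
\sum_{\text{pairs}} w \,\bigl(|d| - |d_{\text{current}}|\bigr)^{2}
```

At d = d_current every term and its first derivative are zero, while the second derivative is 2w, which is why the term affects only the second derivative matrix.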

Summation is over all pairs in the same chain and within a given distance (default 4.2 Å). d_current is recalculated at every cycle. This function does not contribute to gradients. It only contributes to the second derivative matrix.

It is equivalent to adding springs between atom pairs. During refinement, inter-atomic distances do not change very much. If all pairs were used and the weights were very large, it would be equivalent to rigid body refinement.

It could be called “implicit normal modes”, “soft” body or “jelly” body refinement.

B value restraints and TLS

  • Designing restraints on B values is much more difficult.

  • Currently available options for dealing with B values at low resolution:

  • Group B as implemented in CNS

  • TLS group refinement as implemented in refmac, phenix.refine and buster

  • Both of them have some applications. TLS seems to work for wide range of cases but unfortunately it is very often misused. One of the problems is discontinuity of B values. Neighbouring atoms may end up having wildly different B values

  • In an ideal world, anisotropic U with good restraints should be used. Only in some cases does full anisotropic refinement at 3 Å give better R/Rfree than TLS refinement. These are cases with extremely anisotropic data.




Kullback-Leibler divergence

If there are two densities of distributions – p(x) and q(x) then symmetrised Kullback-Leibler divergence between them is defined (it is distance between distributions)

If both distributions are Gaussian with the same mean values and U1 and U2 variances then this distance becomes:

And for isotropic case it becomes
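The three equations referred to on this slide were images and are missing from the transcript. The standard results are given here as a reconstruction. Constant factors may differ from the slide's convention, and the last line assumes three dimensions with B = 8π²U.

```latex
% Symmetrised Kullback-Leibler divergence, KL(p||q) + KL(q||p)
D(p, q) = \int \bigl(p(x) - q(x)\bigr)\,\log\frac{p(x)}{q(x)}\,dx

% Two Gaussians with the same mean and variances U_1, U_2 (I is the 3x3 identity)
D = \tfrac{1}{2}\,\operatorname{tr}\bigl(U_1 U_2^{-1} + U_2 U_1^{-1} - 2I\bigr)

% Isotropic case
D = \frac{3}{2}\,\frac{(B_1 - B_2)^{2}}{B_1 B_2}
```

The isotropic form shows why this works as a restraint: it is zero when B1 = B2, and the same absolute difference is penalised more when the B values themselves are small.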

Restraints for bonded pairs have larger weights than those for non-bonded pairs. For non-bonded atoms the weights depend on the distance between the atoms.

This type of restraint is also applied for rigid bond restraints in anisotropic refinement

Example, after molecular replacement: 3 Å resolution, data completeness 71%

Rfactors vs cycle

Black – simple refinement

Red – Global NCS

Blue – Local NCS

Green – “Jelly” body

Solid lines – Rfactor

Dashed lines - Rfree

Data provided by: Marek Brzozowski and Colin Kleanthous

Example: 4 Å resolution, data from PDB 2r6c

Starting R/Rfree = 36.0/35.6

R/Rfree after 40 cycles of refinement

General reparameterisation

  • The effective number of parameters when restraints are used is reduced (it is an approximate equation):

  • Where L is –loglikelihood function, R is restraints function, w is weight parameter. When w=0 then the effective number of parameters is the same as the number of parameters in the model. When w>0 then the number of parameters becomes smaller.

  • For example if only jelly body is used then by changing maximum interatomic distances and w, we can get minimum number of parameters 6 (rigid body refinement) and the maximum number of parameters 3natom (for no B value refinement)

    NB: The number of observations is n
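The approximate equation itself is missing from the transcript. One standard expression with the properties described in the list above is the trace formula below. It is offered as an assumption, not as the slide's equation, and the symbols H_L and H_R (second derivative matrices of L and R with respect to the parameters) are introduced here.

```latex
N_{\text{eff}} \approx \operatorname{tr}\Bigl[ H_L \,\bigl(H_L + w\,H_R\bigr)^{-1} \Bigr],
\qquad
H_L = \frac{\partial^{2} L}{\partial p\,\partial p^{T}},
\quad
H_R = \frac{\partial^{2} R}{\partial p\,\partial p^{T}}
```

With w = 0 the trace is that of the identity matrix, i.e. the number of parameters in the model. For a jelly body with all atom pairs included and very large w, only the directions that leave all distances unchanged survive, which gives the 6 rigid body parameters.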

Desirable regularisers

Function minimised should have the form:

L is –loglikelihood function

f is a regulariser. It should have the following properties:

f(0)=0, f’(0)=0

I.e. the value of the function and its first derivative should be zero. In this case the minimum achieved will be the minimum of the likelihood function also. In other words derived parameters will depend on observations only.

Map sharpening: inverse problem

We want to observe ρ0(x) but we observe ρ(x). These two entities are related:

If K is known, calculate ρ0. In general the problem is easy: discretise and solve the linear equation. However, these problems are ill-posed (a small perturbation in the input causes large changes in the output). In practice, sharpening amplifies noise as well as signal.

Regularisation may help:

L is related to the regularisation function. For the L2 norm (the value of ρ should be small) L is the identity, and for the first-order Sobolev norm (ρ should be smooth) it is the Laplace operator.
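The two equations on this slide were images and did not survive extraction. In the standard formulation of such problems, which is assumed here to be what the slide showed, the blurring relation is an integral equation of the first kind and the regularised solution is of Tikhonov type. The symbol α is the regularisation weight.

```latex
% Relation between the observed and the desired density
\rho(x) = \int K(x, y)\,\rho_0(y)\,dy

% Regularised estimate of \rho_0
\hat{\rho}_0 = \arg\min_{\rho_0}\;
  \bigl\| K\rho_0 - \rho \bigr\|^{2} + \alpha\,\bigl\| L\rho_0 \bigr\|^{2}
```

Setting α = 0 recovers plain deconvolution, which is the ill-posed case described above.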


In general K is the effect of such terms as TLS or a smoothly varying blurring function.

Sources of noise in electron density: series termination, errors in phases and noise in the experimental data.

Very simple case: K is an overall B value. Then the problem is solved using FFT: the Fourier transform of a Gaussian is a Gaussian, and the Fourier transform of the Laplace operator is the squared length of the reciprocal space vector.
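For the very simple case just described, the regularised solution reduces to a multiplier applied to each Fourier coefficient. The sketch below is an illustration under several assumptions: Tikhonov regularisation with the Laplace operator, the usual convention exp(-B|s|²/4) with |s| = 1/d, and hypothetical names and values. It is not REFMAC's implementation.

```python
import math


def sharpening_factor(s2, b, alpha):
    """Multiplier applied to a Fourier coefficient at squared resolution s2 = |s|^2.

    Minimising |k*F0 - F|^2 + alpha*|s|^4*|F0|^2 gives F0 = F * k / (k^2 + alpha*|s|^4),
    where k = exp(-b*s2/4) is the Fourier transform of the Gaussian blur.
    """
    k = math.exp(-b * s2 / 4.0)
    return k / (k * k + alpha * s2 * s2)


# Compare unregularised (alpha = 0) and regularised sharpening at a few resolutions
b_value = 100.0
for d in (8.0, 6.0, 4.0, 3.0):
    s2 = 1.0 / (d * d)
    plain = sharpening_factor(s2, b_value, 0.0)
    damped = sharpening_factor(s2, b_value, 10.0)
    print(f"d={d:3.1f} A  alpha=0: {plain:10.2f}   alpha=10: {damped:8.3f}")
```

With α = 0 the factor is exp(B|s|²/4), which grows without bound at high resolution. With α > 0 it is damped where the blurred signal is weak, so noise is amplified less.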


Map sharpening: 2r6c, 4 Å resolution

(Figure panels: Original, no sharpening. Top left and bottom: after local NCS refinement. Sharpening, median B, α optimised. Sharpening, median B, α = 0.)


Example, 2r6c, Electron density

Known structure (2r6a) superimposed on the 2r6c structure. There is a helix. Side chains are visible to some degree.

Conclusion

  • Molecular replacement in many cases can be done automatically: Only experimental data (reflections and sequence) are needed

  • Twin refinement improves statistics and occasionally electron density

  • Use of known structures improves reliability of the derived model: Especially at low resolution

  • NCS restraints must be done automatically: but conformational flexibility must be accounted for

  • “Jelly” body works better than I thought it should

  • Regularised map sharpening looks promising. More work should be done on series termination and general sharpening operators

Future work

  • Finalise reticular twinning, multiple cells, modulations

  • Refinement in the presence of radiation damage

  • Implement multiple reference structures

  • Restraints to known templates

  • Implement jelly TLS and test L2 distance restraints between probability distributions

  • Bayesian sharpening and denoising for map calculation

  • Etc

Acknowledgment


Alexei Vagin Pavol Skubak

Andrey Lebedev Raj Pannu

Rob Nicholls

Fei Long

CCP4, YSBL people


REFMAC is available from CCP4 or from York’s ftp site:


Balbes and other programs:


This and other presentations can be found on: