Model Quality: Concepts & Statistics. Swanand Gore & Gerard Kleywegt, PDBe – EBI. May 6th 2010, 10:30-11:30 am. Macromolecular Crystallography Course.


Model Quality: Concepts & Statistics

Swanand Gore & Gerard Kleywegt


May 6th 2010, 10:30-11:30 am

Macromolecular Crystallography Course


  • Science, experiments, errors, validation

  • Features of a refined crystallographic model useful for quality checking

  • Model quality checks

    • Only data

    • Model coordinates

    • Model + data

How do scientists push the boundaries of knowledge?

[Figure: flow diagram; surviving labels: Prior Knowledge, Verdict on Hypothesis]




  • Prior knowledge aids in interpretation.

  • Measurements should conform to prior knowledge, or be strong and repeatable enough to refute it.

Crystal structure can confirm a hypothesis

Mutation A-Ala kills activity of enzyme X.

Inhibitor I reduces activity.

A takes part in catalysis.

Enzyme Crystallization.

Soaking with inhibitor.

X-ray diffraction.

A is a critical residue.

Hypothesis is correct.

Data Collection, Refinement.

A is in a pocket.

I binds pocket close to A.

  • Restraints derived from small-molecule crystallography and high-quality crystal structures help define a refinement target. Maybe a MR probe is available from a known homolog.

  • Solved structure’s features should be protein-like, with reasonable backbone and sidechain conformations. Inhibitor should have a believable covalent geometry.

  • Electron density should be strong at site of interest and good quality throughout.

Model quality checking

  • = Validation = establishing the truth or accuracy of

    • Theory

    • Hypothesis

    • Model

    • Claim

  • Integral to scientific activity!

  • “Science is a way of trying not to fool yourself. The first principle is that you must not fool yourself, and you are the easiest person to fool.”

    (Richard Feynman)

Errors affect measurement (and interpretation)

[Figure: accuracy versus precision; surviving labels, in order: "Precise, but inaccurate", "Systematic Errors", "Accurate, but imprecise", "Accurate and" (label truncated), "Random Errors"]

Model quality checking

  • Systematic errors

  • Random errors

  • Both + Mistakes

[Figure: diagram; surviving labels: Prior Knowledge, Optimized values, Other Prior Knowledge, Independent models, More Experiments]

Model quality checks are vital to avoid serious errors and consequences

1phy, 2phy

1pte, 3pte

  • From Jawahar’s roadshow ppt.

Model quality checks are vital to avoid serious errors and consequences

“were incorrect in both the hand of the structure and the topology. Thus, the biological interpretations based on the inverted models for MsbA are invalid.”


  • From Jawahar’s roadshow ppt.

Model quality checks are vital to avoid serious errors and consequences

“However, because of the lack of clear and continuous electron density for the peptide in the complex structure, the paper is being retracted.”


  • From Jawahar’s roadshow ppt.

A crystallographic model

  • Biochemical entities

    • Biopolymers

      • polypeptides, polynucleotides, carbohydrates

    • Small-molecule ligands (ions, organic)

      • Crystallographic additives, e.g. GOL, PEG

      • Physiologically relevant, e.g. heme, ions

      • Synthesized molecules, e.g. a drug candidate

    • Solvent

  • Coordinates, Displacement

    • Unique x,y,z

    • Partial, multiple, absent (occupancy)

    • Isotropic or anisotropic B factors

    • TLS approximation

  • Crystallographic etc.

    • Cell, symmetry, NCS

    • Bulk solvent model (Ksol, Bsol)

  • 3hbq images made with pymol.


  • B factor putty from Antonyuk et al. 10.1073/pnas.0809170106


A high-quality MX model makes sense in all respects

  • Chemical

    • Bond lengths, angles, planarity, chirality

  • Physical

    • Good packing, sensible interactions, reasonable thermal displacements

  • Crystallographic

    • Low crystallographic residual, residues fit density, flat difference map

  • Protein Structure

    • Ramachandran, peptide, rotamers, disulphides, salt bridges, pi-interactions, hydrophobic core

  • Statistical

    • Best possible hypothesis to fit data, no over-fitting, no under-modelling

  • Biological

    • Explains observations (activity, mutants, inhibitors)

    • Is predictive

Validation done against unrefined entities is powerful

Covalent geometry


Types of quality criteria for macromolecular crystallography

  • Model-only

    • How good is model irrespective of experiment?

    • Only coordinates are used

    • Simple, intuitive

  • Model and data

    • How well does the model fit the data?

    • Crucial! Sets your model apart from theoretical model!

  • Data-only

    • Data-Quality + Crystallographer = Model Quality

    • Good data necessary for reliable model

    • Can be understood readily only by expert crystallographer

  • Scope

    • Global quality: how well is the whole structure solved.

      • Primarily for at-a-glance check.

    • Local quality: how well solved and reliable are parts of structure, e.g. residue-wise quality.

      • For those who wish to improve or avoid bad quality regions.




Data-only quality checks

  • Quality data essential for good quality of model.

  • Wilson plot

    • Log(Average intensity) in resolution bins

    • Has a characteristic shape

    • Slope estimates overall B factor, and intercept used in scaling

    • Deviations indicate pseudo-symmetry, twinning, outliers

  • Twinning: Padilla-Yeates plot

    • Freq distribution of difference between locally-related intensities

    • L = (I(h1)-I(h2)) / (I(h1)+I(h2))

    • ⟨|L|⟩ ~ 0.5, ⟨L²⟩ ~ 0.333 for untwinned data


Wilson plot



  • Wilson plots from CCP4 wiki and B. Rupp book.
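The Padilla-Yeates L statistic from the slide above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular program: the function names are hypothetical, untwinned acentric intensities are simulated from an exponential (Wilson) distribution, and a perfect twin is modelled as the average of two independent intensities.

```python
import random
from statistics import mean

def l_values(pairs):
    """L = (I1 - I2) / (I1 + I2) for each pair of locally related intensities."""
    return [(i1 - i2) / (i1 + i2) for i1, i2 in pairs if (i1 + i2) > 0]

def l_test_summary(pairs):
    """Return (<|L|>, <L^2>) over all usable pairs."""
    ls = l_values(pairs)
    return mean(abs(l) for l in ls), mean(l * l for l in ls)

def simulate_pairs(n, twinned=False, seed=0):
    """Hypothetical test data: exponential intensities; a perfect twin averages two of them."""
    rng = random.Random(seed)

    def intensity():
        if twinned:
            return 0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
        return rng.expovariate(1.0)

    return [(intensity(), intensity()) for _ in range(n)]

for label, twinned in (("untwinned", False), ("perfect twin", True)):
    mean_abs_l, mean_l_sq = l_test_summary(simulate_pairs(20000, twinned))
    print(f"{label}: <|L|> = {mean_abs_l:.3f}, <L^2> = {mean_l_sq:.3f}")
```

With the simulated untwinned data the two averages should land close to 0.5 and 0.333; for the simulated perfect twin, theory gives 0.375 and 0.2, so a shift towards those values is the signature to look for.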

Data-only quality checks

  • Anisotropy

    • Mean amplitude vs 1/d² (resolution)

    • 1 plot each for a*, b*, c*

    • Anisotropic truncation and scaling

  • Data quality

    • Completeness

      • I / σ(I), signal to noise, drops at higher resolution

      • Completeness reduces towards higher resolution shells

    • Rmerge: how well do repeated measurements of a reflection agree across frames?

    • Rsym: how well do symmetry-related reflections agree?

    • Have the right resolution cutoffs been chosen?
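As a rough sketch of the merging statistic mentioned above, the conventional form is R = Σ_h Σ_i |I_i(h) - ⟨I(h)⟩| / Σ_h Σ_i I_i(h). The function name and the input layout below are assumptions made for illustration; whether the result is read as Rmerge or Rsym depends only on which observations are grouped under the same index.

```python
from collections import defaultdict

def r_merge(observations):
    """observations: iterable of (unique_hkl, intensity) for repeated measurements.

    Reflections measured only once carry no agreement information and are skipped.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue
        average = sum(values) / len(values)
        numerator += sum(abs(v - average) for v in values)
        denominator += sum(values)
    return numerator / denominator if denominator else float("nan")

# Hypothetical measurements: two reflections, each observed twice.
example = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((2, 0, 0), 50.0), ((2, 0, 0), 50.0)]
print(f"R = {r_merge(example):.4f}")
```

In practice the same quantity is usually reported per resolution shell, which is how it informs the choice of resolution cutoff.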



Model-only criteria

  • Stereochemistry

    • Covalent bonds, angles, dihedrals, chirality

    • Planarity, ring geometry

  • Dihedral angle distributions

    • Ramachandran, (flipped) sidechains, RNA backbone

    • Derived distributions from small-molecule datasets

  • Packing

    • Bad vdw clashes

    • Underpacking

    • Hydrogen bonds and environment

Covalent geometry

  • Reference sources for bonds and angles

    • For Proteins and Nucleotides

      • Small-molecule crystallography

        • does not suffer from the phase problem!

        • Numerous expt-structures (CCDC > 500’000)

      • Ultra-high resolution MX structures

      • Mean, variability = refinement target, force constants

      • Engh & Huber (1991,2001), Parkinson et al (1996)

    • For Small-molecules

      • More variety of bonds, angles, rings

      • Comparable fragments from small-molecule database can be used to estimate mean and std. dev.

  • Small variation -> highly restrained in refinement

    • Length variation ~ 0.02 Å, angle variation ~ 2°

    • But still useful to check large deviations

      • refinement problems, incorrect parameters

      • Systematic directional error in lengths due to wrong cell

  • See 104l’s pdbreport for systematic deviation in bond lengths


Covalent geometry: quality metrics

  • RMS-Z of bond lengths and angles

    • RMS of Z values

    • RMS

      • Root of mean of squares, √( ∑xᵢ² / N )

    • Z-value

      • How far is an observed value from the mean in terms of standard deviation?

      • Z = (value – μ) / σ

    • Each bond type and angle type have a different distribution

    • Find Z for each observed bond length and angle

    • RMS-Z = √( ∑Zᵢ² / N )

  • Local checks: Investigate > 4σ outliers
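The RMS-Z recipe above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the target means and standard deviations in the example are placeholder values of the kind found in restraint libraries, not an authoritative table.

```python
import math

def z_score(value, target_mean, target_sigma):
    """How many standard deviations an observed value lies from its target."""
    return (value - target_mean) / target_sigma

def rms_z(observations):
    """observations: list of (observed, target_mean, target_sigma), one per bond or angle."""
    zs = [z_score(*obs) for obs in observations]
    return math.sqrt(sum(z * z for z in zs) / len(zs))

def outliers(observations, cutoff=4.0):
    """Local check: return the observations lying more than `cutoff` sigma from target."""
    return [obs for obs in observations if abs(z_score(*obs)) > cutoff]

# Illustrative bond lengths in Å: (observed, target mean, target sigma).
bonds = [
    (1.343, 1.329, 0.014),  # Z close to +1
    (1.525, 1.525, 0.021),  # Z = 0
    (1.420, 1.458, 0.019),  # Z close to -2
]
print(f"RMS-Z = {rms_z(bonds):.3f}")
print("outliers beyond 4 sigma:", outliers(bonds))
```

Each bond or angle type is compared with its own distribution, so the Z values are comparable across types even though the raw deviations are not.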

Covalent geometry specific to proteins

  • Planarity

    • Peptide bond

    • Phe, Tyr, Trp, His, nucleotide bases

    • Arg, Gln, Asn, Glu, Asp

  • Chirality

    • Should be always L at CA

      • … unless solving a cone-snail structure!

    • Gly is not chiral!

    • CB is a second chiral centre in Ile (2S,3S) and Thr (2S,3R); in Val it is prochiral (CG1/CG2 naming)

    • CA-N-C-CB ~ 34°, chiral volume ~ 2.5 Å³

  • See 104l’s pdbreport for systematic deviation in bond lengths
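The chiral volume quoted above is a signed triple product of the three bond vectors leaving the central atom. The following sketch assumes an idealized, perfectly tetrahedral CA with illustrative bond lengths; the helper names and coordinates are hypothetical, and which sign corresponds to the L form depends on the atom order a given program uses.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def chiral_volume(center, a, b, c):
    """Signed volume (a - center) . ((b - center) x (c - center)), in Å^3."""
    va, vb, vc = _sub(a, center), _sub(b, center), _sub(c, center)
    return _dot(va, _cross(vb, vc))

def _scaled(direction, length):
    """Place an atom at the given distance from the origin along a direction."""
    norm = math.sqrt(_dot(direction, direction))
    return tuple(length * x / norm for x in direction)

# Idealized tetrahedral geometry around CA (assumed, not taken from a real entry).
CA = (0.0, 0.0, 0.0)
N = _scaled((1, 1, 1), 1.458)
C = _scaled((1, -1, -1), 1.525)
CB = _scaled((-1, 1, -1), 1.530)

volume = chiral_volume(CA, N, C, CB)
mirrored = chiral_volume(CA, N, CB, C)  # swapping two substituents inverts the hand
print(f"chiral volume = {volume:.2f} Å^3, mirror image = {mirrored:.2f} Å^3")
```

The magnitude comes out near the ~2.5 Å³ quoted on the slide, and a sign flip relative to the expected value is what flags an inverted centre.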



Covalent geometry of ligands

  • Small molecule ligands have huge variety

    • They can get modified on soaking.

  • Few geometric rules other than the basic rules

    • Chirality (when known)

    • planarity of aromatics and conjugated systems

    • almost invariant bond lengths and angles

    • CCDC preferences for fragments of molecules

  • Wrong ligand geometry does not result in overall bad crystallographic statistics for the complex

    • Very often ligands end up having a poor geometry.

  • SB-202190 in 1PME, 1998, 2.0Å, Prot. Sci.

  • 3-Phenylpropylamine, in 1TNK, 1994, 1.8Å, Nature Struct. Biol.

  • CCDC = Cambridge Crystallographic Data Centre

Covalent geometry of ligands

  • COA = coenzyme A. 2.25Å, R 0.25/0.28, Mol. Cell. Deposited 2003.

  • 4PN = 4-piperidinopiperidine, 2.5Å, R 0.23/0.29, 1k4y, Nature Struct. Biol. Deposited 2001

Ramachandran plot

  • Why are φ-ψ plots useful?

    • Simple description of the protein backbone

    • Frequencies mirror the energy landscape

    • Not used in refinement

    • Highly researched, various regions correspond to frequent secondary structures


  • Bosco et al (2003) Revisiting the Ramachandran plot: Hard-sphere repulsion, electrostatics, and H-bonding in the α-helix. Protein Sci 12 2508


Ramachandran plot

  • All regions are not equally populated

    • Multiple steric clashes

      • H-H(i+1), O(i-1)-H(i+1), O(i-1), H(i+1), ….

    • Favorability depends on which clashes occur to what extent

  • Rama plot is different for some residues

    • Gly, Pro, pre-Pro, rest

  • Various versions

    • WhatIf, EDS, MolProbity, ProCheck

    • Different definitions of favored, allowed, generously allowed, disallowed

    • Different quality-filtering criteria for choice of underlying distribution


  • Bosco et al (2003) Revisiting the Ramachandran plot: Hard-sphere repulsion, electrostatics, and H-bonding in the α-helix. Protein Sci 12 2508


Ramachandran plot: quality metrics

  • Contouring the Rama plot

    • Empirical Rama plots are result of data mining

      • High quality structures

      • Non-redundant chains

      • Low B factors, full occupancies

      • No missing atoms

    • Observations are discretized on a grid.

    • Probabilities are assigned on a 5° × 5° grid and smoothed.

    • Contours are drawn to enclose data from high to low probabilities

      • 1σ contour : 68.2% data, 2σ : 95.4%, 3σ : 99.7% etc

  • Overall expected-ness of a Ramachandran plot

    • How common is it to find a residue type in a particular secondary structure at a grid point on the Ramachandran plot? From database counts, this is expressed as a Z-score for each secondary structure (ss) and residue type (rt).

    • Rama score is just the mean of individual Z scores for all residues in a protein.

  • Global metrics

    • Percentage outliers > 3.5σ (99.8%)

    • Overall goodness of Rama plot

Objectively judging the quality of a protein structure from a Ramachandran plot. Rob W.W. Hooft, Chris Sander and Gerrit Vriend. Volume 13, Number 4 Pp. 425-430
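The contouring procedure described on this slide can be sketched as follows. This is a simplified illustration with hypothetical function names: reference φ/ψ pairs are binned on a 5° × 5° grid, the contour level is the cell probability at which the requested fraction of reference data is enclosed, and the smoothing step is deliberately left out.

```python
from collections import Counter

BIN = 5               # grid spacing in degrees
N_BINS = 360 // BIN   # 72 cells along each axis

def cell(phi, psi):
    """Grid cell for a (phi, psi) pair in degrees; +180 wraps onto -180."""
    return (int((phi + 180) // BIN) % N_BINS, int((psi + 180) // BIN) % N_BINS)

def grid_probabilities(reference_angles):
    """Fraction of reference observations falling in each occupied cell."""
    counts = Counter(cell(phi, psi) for phi, psi in reference_angles)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def contour_level(probabilities, enclosed_fraction):
    """Cell probability at which the most populated cells enclose the given fraction."""
    cumulative = 0.0
    for p in sorted(probabilities.values(), reverse=True):
        cumulative += p
        if cumulative >= enclosed_fraction:
            return p
    return 0.0

def outlier_fraction(probabilities, level, angles):
    """Fraction of residues whose cell probability lies below the contour level."""
    flagged = sum(1 for phi, psi in angles
                  if probabilities.get(cell(phi, psi), 0.0) < level)
    return flagged / len(angles)

# Toy reference set: 90 helical-like and 10 strand-like observations.
reference = [(-60.0, -45.0)] * 90 + [(-120.0, 130.0)] * 10
probs = grid_probabilities(reference)
level = contour_level(probs, 0.95)
print(outlier_fraction(probs, level, [(-60.0, -45.0), (60.0, 60.0)]))
```

A real reference distribution would be built separately for Gly, Pro, pre-Pro and the remaining residues, from quality-filtered structures as listed above.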

More checks on protein backbone

  • Kleywegt plot to check NCS quality

    • Plotting related copies together

  • Omega cis/trans

    • Cis < 0.1% generally

    • Pre-Proline cis ~ 6%

  • CA-only models

    • Checking against known distribution of CA(i)-CA(i+1)-CA(i+2) angle and CA(i)-CA(i+1)-CA(i+2)-CA(i+3) dihedrals

  • CB validation

    • (Molprobity)

    • Serves as a useful indicator of any problems in backbone bond lengths and angle parameters

  • Phi/Psi-chology: Ramachandran revisited. Gerard J Kleywegt and T Alwyn Jones. Structure. 1996 Dec 15;4(12):1395-400

  • Lovell, S. C., Davis, I. W., Arendall, W. B. III, de Bakker, P. I. W., Word, J. M., Prisant, M. G., Richardson, J. S. & Richardson, D. C. (2003). Proteins, 50, 437-450.

Protein sidechains

  • Dihedrals in organic molecules prefer anti over gauche over eclipsed

  • Rotamericity is mainly due to local minima in local energy, just like organic molecules

  • Rotamer preferences are residue and secondary structure specific

  • Many libraries of rotamers exist for modelling

  • Swanand Gore. PhD thesis.


Sidechain quality

  • Higher resolution structures have a higher fraction of rotameric sidechains

    • Rotamericity calculations vary slightly between MolProbity, ProCheck, WhatCheck

  • Non-rotameric

    • Does not mean incorrect

    • But is there clear density to justify the modelled conformation?

    • Does the conformation make sense in the environment?

  • Can the sidechain be flipped?

    • Asn (OD1, ND2), Gln (OE1, NE2), His (ND1, NE2) are not unambiguously defined by electron density

    • Does flipping make the model better?

      • E.g. Gln90 in 1REI : Better H-bonds and reduced bad contacts after flip

  • Asparagine and Glutamine: Using Hydrogen Atom Contacts in the Choice of Side-chain Amide Orientation. J. M. Word, Simon C. Lovell, J. S. Richardson and D. C. Richardson. J. Mol. Biol. (1999) 285, 1735-1747

  • Procheck sidechain plots for 1aac

  • Image from Jawahar’s roadshow ppt.

Sidechain quality metrics

  • Percentage of improbable rotamers

    • Similar to Ramachandran, high-quality data for sidechains is collected

    • Densities are determined and smoothed

    • χ dihedral space of sidechains is contoured in as many dimensions as necessary

    • Probability of occurrence of a sidechain is determined according to its location w.r.t. contours

  • Percentage of flippable NQH sidechains

Nucleotide validation

  • Essential to check quality of nucleotides as much as proteins, else they may become error-sinks!

  • Prominent tetrahedral phosphates and planar bases

  • Sugar-phosphate backbone defined by 6 dihedrals

    • ~ 50 frequent ‘suites’

  • Dominant puckers are C3’-endo, C2’-endo

  • Implemented in MolProbity

  • Quality metrics

    • Percentage of unfavorable backbone suites

    • Percentage of unlikely ribose puckers

  • RNA backbone: Consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution) Jane S. Richardson et al. RNA 2008. 14: 465-481

Packing: clashes

  • D(A,B) < vdwR(A) + vdwR(B)

    • Covalent bonding? Noncovalent interaction?

    • Steric clash! Unrelated atoms cannot get arbitrarily close (L-J’s 6-12 potential)

  • Heavy atom clashes are rare and avoided in refinement

  • Hydrogens

    • generally absent in refinement.

    • Clashes involving rebuilt hydrogens are a powerful validation check!

  • Quality metric

    • Number of bumps per 1000 atoms after H-addition

    • Local: per residue clashes

Clashes Without Hydrogens

Clashes With Hydrogens Added

  • Kinemages for 3lzm with and without hydrogen addition. From MolProbity server.
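The clash criterion D(A,B) < vdwR(A) + vdwR(B) and the bumps-per-1000-atoms metric can be sketched as below. This is a simplified illustration and not MolProbity's algorithm: the function names, the radii table, the 0.4 Å overlap cutoff and the way bonded pairs are passed in are all assumptions, and a real tool would also exclude atoms separated by a few bonds and treat hydrogen bonds specially.

```python
import math
from itertools import combinations

# Illustrative van der Waals radii in Å (assumed values for this sketch).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def find_clashes(atoms, bonded_pairs=frozenset(), min_overlap=0.4):
    """atoms: list of (label, element, (x, y, z)).

    bonded_pairs: set of frozenset({i, j}) index pairs to ignore.
    Returns (label_a, label_b, overlap) for every pair overlapping by min_overlap or more.
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded_pairs:
            continue
        distance = math.dist(a[2], b[2])
        overlap = VDW_RADII[a[1]] + VDW_RADII[b[1]] - distance
        if overlap >= min_overlap:
            clashes.append((a[0], b[0], overlap))
    return clashes

def clashes_per_1000_atoms(atoms, **kwargs):
    """Global metric: number of clashes scaled to 1000 atoms."""
    return 1000.0 * len(find_clashes(atoms, **kwargs)) / len(atoms)

# Hypothetical atoms: CA-CB is a covalent bond, CB-O is too close for unrelated atoms.
atoms = [
    ("A1:CB", "C", (0.0, 0.0, 0.0)),
    ("A2:O", "O", (2.5, 0.0, 0.0)),
    ("A3:N", "N", (10.0, 0.0, 0.0)),
    ("A1:CA", "C", (-1.53, 0.0, 0.0)),
]
bonded = {frozenset((0, 3))}
for first, second, overlap in find_clashes(atoms, bonded_pairs=bonded):
    print(f"{first} clashes with {second}: overlap {overlap:.2f} Å")
```

Running the same check after adding hydrogens to the atom list is what makes the test sensitive, since hydrogens are generally absent during refinement.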

Packing quality

  • Protein interiors

    • well-packed with complementary surfaces

    • Satisfied h-bond donors, acceptors

    • Do not have voids

  • Completeness of model: Fraction of non-solvent atoms present in the model with decent occupancy and B-factors

  • Interior voids can be due to inflated unit cell dimensions, e.g. Lysozyme identified by RosettaHoles.

  • Interaction quality for residues

    • Count number of unsatisfied buried hbond donors acceptors

    • Report atypical neighbourhood not observed previously in the database

    • E.g. DACA, verify3D

    • Fraction of unsatisfied buried h-bond donor-acceptors

  • Inside-outside profile

    • Likelihood of observing a residue-window buried or solvent-exposed

    • Can indicate register errors along with DACA

  • RosettaHoles: Rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Will Sheffler, David Baker. Volume 18, Issue 1, Pages 229-239.

  • Picture thanks to X-ray validation task force report.

  • Quality control of protein models: Directional atomic contact analysis. G. Vriend, C. Sander. J.Appl.Cryst. (1993) 26, 47-60.

  • Voronoi image from Swanand Gore poster on ProVAT.

Quality based on model & data

  • Data sufficiency for model parameterization

    • Resolution and data to parameters ratio

  • R factors

    • Match between observed and calculated structure factor amplitudes

  • Map quality

    • Clarity and noise in the final map

  • Quality of mutual fit between model and map

  • Symmetry-related packing

  • B factors

  • Estimate of coordinate precision

Is model plausible with amount of data available?

  • Model can be constructed at various levels of details

    • CA-only or heavy atoms only or hydrogens too

    • Macromolecule only or solvent also

    • TLS – isotropic – anisotropic B factors

    • Single or multiple conformers with partial occupancies

  • Images copied from Jawahar’s roadshow ppt

Is model plausible with amount of data available?

  • Not all detail can be modelled across all resolutions

    • More reflection data is available at better resolutions

    • A model with high data to params ratio is more credible

    • A good model has just enough detail to explain the observed data without overfitting it

    • Low data to params ratio can lead to overfitting which manifests as model errors

  • Beware of a model...

    • With anisotropic B factors at 3Å

    • With multi-model refinement at 4.5Å (e.g. Chang 2001)

    • With hydrogens or many waters modelled at 2.7Å

  • Images copied from Bernhard Rupp’s book and website.
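A back-of-the-envelope version of the data-to-parameters argument above: count the refined parameters per atom and divide the number of unique reflections by the total. The function name and the example numbers are hypothetical, and the count ignores occupancies, TLS groups and the extra information contributed by geometric restraints.

```python
# Refined parameters per atom: x, y, z plus one isotropic B,
# or x, y, z plus six anisotropic displacement terms.
PARAMS_PER_ATOM = {"isotropic": 4, "anisotropic": 9}

def data_to_parameter_ratio(n_reflections, n_atoms, b_model="isotropic"):
    """Unique reflections divided by the number of refined atomic parameters."""
    return n_reflections / (n_atoms * PARAMS_PER_ATOM[b_model])

# Hypothetical low-resolution case: 2000 atoms, 8000 unique reflections.
for model in ("isotropic", "anisotropic"):
    ratio = data_to_parameter_ratio(8000, 2000, model)
    print(f"{model:>11}: {ratio:.2f} observations per parameter")
```

In this made-up case the isotropic model is already at one observation per parameter, and switching to anisotropic B factors pushes the ratio well below one, which is the situation the "beware" list warns about.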

R and Rfree

  • R describes how well calculated and observed structure factor amplitudes match.

    • Low R is better!

    • Before refinement, Fo’s are divided into working and refinement-‘free’ sets.

    • Free-set reflections should not be related by symmetry to working-set reflections.

    • Rwork: R calculated on Fo’s exposed to refinement.

    • Rfree: R calculated on Fo’s free of refinement.

    • Rfree > Rwork: indicates over-fitting if difference is large.

  • Resolution-dependence of Rfree , Rwork and difference

  • R-factor increases in higher resolution shells

    • Greater detail to fit and higher chance of not getting it right

    • High R-factor at low resolution: is bulk solvent model correct?

  • Images copied from Bernhard Rupp’s book and website.
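The R-factor definition above, R = Σ| |Fo| - k|Fc| | / Σ|Fo|, and the split into working and free sets can be sketched as follows. The function names and amplitudes are hypothetical, and the amplitudes are assumed to be on a common scale already (k = 1), whereas real programs determine the scale, often per resolution shell.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fo| - k |Fc| |) / sum(|Fo|) over the supplied reflections."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)

def r_work_and_r_free(f_obs, f_calc, free_flags, scale=1.0):
    """Split reflections by free flag and return (Rwork, Rfree).

    Assumes both the working set and the free set are non-empty.
    """
    rows = list(zip(f_obs, f_calc, free_flags))
    work = [(fo, fc) for fo, fc, free in rows if not free]
    free = [(fo, fc) for fo, fc, free in rows if free]
    r_work = r_factor(*zip(*work), scale=scale)
    r_free = r_factor(*zip(*free), scale=scale)
    return r_work, r_free

# Hypothetical amplitudes; the last reflection is flagged as free.
f_obs = [100.0, 50.0, 80.0, 40.0]
f_calc = [90.0, 55.0, 80.0, 30.0]
flags = [False, False, False, True]
r_work, r_free = r_work_and_r_free(f_obs, f_calc, flags)
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}, gap = {r_free - r_work:.3f}")
```

A gap that keeps growing during refinement while Rwork falls is the over-fitting signal described above.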

Map quality

  • ρ(x) = (1/V) Σ F(h) exp(-2πi h·x)

    • Density errors result from errors in phases, amplitudes, bulk solvent model, occupancies, B factors

  • Map with clear separation of protein and solvent

    • Bulk solvent should have uniform low density

  • Crystallographic / NCS axes of symmetry do not have density

    • Special positions are rare

  • Flat difference density

    • e.g. 1lzw 2.5Å, 1aac 1.3Å

    • Image from Acta Cryst. (2003). D59, 1881-1890. The phase problem. G. Taylor

    • Images of difference maps with Coot.

    Quality of fit between model and map

    • Maps set the experimental model apart from theoretical model

      • Give an intuitive assessment of reliability of model features

    • Maps reveal

      • possibly unmodelled entities

      • poorly modelled entities

    • Different regions of map and model exhibit different quality of fit

      • When data are good and there is good reason for heterogeneity, even multiple conformers of the same sidechain can be modelled confidently

    • Image A copied from Bernhard Rupp’s book and website.

    Quantification of model-map fit

    • Real-space R

      • Combined map = 2mFo-DFc, αc

      • Calculated map = DFc

      • Maps have to be scaled together

      • RSR calculated on map-values on grid points surrounding a residue or a fragment of interest

    • RSCC is a correlation coefficient, does not need scaling of maps

    RSR = Σ|ρobs - ρcalc| / Σ|ρobs + ρcalc|, summed over grid points, where ρobs is the observed density and ρcalc the calculated density

    • Improved methods for building protein models in electron density maps and the location of errors in these models. T. A. Jones. Acta Cryst. (1991). A47, 110-119.
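The two fit measures on this slide can be compared on a handful of grid values. The sketch below uses hypothetical function names and made-up density values; it only illustrates that the real-space R depends on the relative scale of the two maps while the correlation coefficient does not.

```python
import math

def real_space_r(rho_obs, rho_calc):
    """RSR = sum|obs - calc| / sum|obs + calc| over grid points around a residue."""
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return numerator / denominator

def real_space_cc(rho_obs, rho_calc):
    """RSCC: Pearson correlation between observed and calculated density values."""
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_o * var_c)

# Made-up grid values; the calculated map is the observed one on twice the scale.
rho_obs = [0.2, 0.8, 1.5, 0.9]
rho_calc = [2.0 * value for value in rho_obs]
rescaled = [0.5 * value for value in rho_calc]

print(f"unscaled: RSR = {real_space_r(rho_obs, rho_calc):.3f}, "
      f"RSCC = {real_space_cc(rho_obs, rho_calc):.3f}")
print(f"scaled:   RSR = {real_space_r(rho_obs, rescaled):.3f}, "
      f"RSCC = {real_space_cc(rho_obs, rescaled):.3f}")
```

Before scaling, the identical-shaped maps give a misleadingly poor RSR, which is why the maps have to be scaled together first.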

    Maps, RSR, RSCC

    • RSR is dependent on residue type

      • Different flexibility and levels of solvent exposure

    • RSR depends on resolution

      • Calculated electron density will be poorer at lower resolution

    • RSR-Z

      • Brings RSRs of residues on same scale, by removing the effects of resolution and residue type

      • Z(RSR, residue-type, resolution) =

        (RSR - <RSR(aa,d)>) / σ(RSR(aa,d))
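The RSR-Z normalisation above amounts to a lookup of the mean and standard deviation of RSR for the same residue type and resolution bin, followed by a Z-score. In the sketch below the function names, the resolution-bin labels and the reference values are all hypothetical; a real reference table would be compiled from many deposited structures.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_reference(samples):
    """samples: iterable of (residue_type, resolution_bin, rsr) from a reference set.

    Returns {(residue_type, resolution_bin): (mean RSR, standard deviation)}.
    """
    grouped = defaultdict(list)
    for residue_type, resolution_bin, rsr in samples:
        grouped[(residue_type, resolution_bin)].append(rsr)
    return {key: (mean(values), pstdev(values)) for key, values in grouped.items()}

def rsr_z(rsr, residue_type, resolution_bin, reference):
    """Z(RSR) = (RSR - <RSR(aa, d)>) / sigma(RSR(aa, d)).

    Assumes the reference bin holds enough samples for a non-zero sigma.
    """
    mu, sigma = reference[(residue_type, resolution_bin)]
    return (rsr - mu) / sigma

# Made-up reference values for leucine in a single resolution bin.
reference = build_reference([
    ("LEU", "2.0-2.5", 0.08),
    ("LEU", "2.0-2.5", 0.10),
    ("LEU", "2.0-2.5", 0.12),
])
print(f"RSR-Z = {rsr_z(0.15, 'LEU', '2.0-2.5', reference):.2f}")
```

Once expressed this way, a poorly fitting lysine on the surface and a poorly fitting leucine in the core can be judged on the same scale.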

    Maps: unaccounted density

    • Ligand is not modelled.

      • 2A2U (2.5Å), 2A2G (2.9Å)

    Inspecting small molecules in maps

    • Ligand present but modelled as waters.

    • Ligand is forced into density?

    • 1FQH (2000, 2.8Å, JACS)

    Inspecting small molecules in maps

    • Ligand identity is mistaken

      • 1cbq, 2anq

    • Crystallographic refinement of ligand complexes. Gerard J. Kleywegt. Acta Crystallogr D Biol Crystallogr. 2007 January 1; 63(Pt 1): 94–100.

    Inspecting small molecules in maps

    • Is the expected ligand present?

      • 1CET : GOLD docking (blue) pose at least has more vdw interactions

    Symmetry and packing

    • There are substantial crystallographic interfaces across subunits

    • Poor contacts and big voids are a sure indicator of problems, e.g. wrong cell dimensions

    1aac, 106 aa


    abc: 28.95, 56.54, 27.55

    αβγ: 90.0, 96.38, 90.0

    1bef, 176 aa


    abc: 48.8, 62.4, 39.6

    αβγ: 90.0, 96.7, 90.0

    2hr0, 1560 aa


    abc: 151.2, 142.7, 203.7

    αβγ: 90.0, 98.9, 90.0

    B-factors

    • F(h) = Σ fᵢ exp(2πi h·xᵢ) exp(-Bᵢ sin²θ / λ²)

      • Bᵢ = 8π² Uᵢ²

      • B = 50 => U ≈ 0.8 Å; B = 100 => U ≈ 1.13 Å; B = 200 => U ≈ 1.6 Å

      • U = RMS displacement of atom, uncertainty in coordinates

        • Can be anisotropic ellipsoid described by 6 parameters

        • Diminish the scattering intensity rapidly at better resolutions

    • B factors can become “error sinks”

      • Refinement increases B factor to explain the absence of strong density

      • Dynamic disorder, thermal vibration, static disorder

      • Low occupancy can be modelled as high B factor!

      • Corresponding atoms don’t obey strict NCS, leading to high B

      • Wrong conformation, non-existent molecules, wrong atomtype

      • Essential to look at bad B factor windows
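The relation B = 8π²U² and the resolution-dependent attenuation it implies are easy to evaluate. The helper names below are hypothetical; the attenuation uses sinθ/λ = 1/(2d), so exp(-B sin²θ/λ²) becomes exp(-B / 4d²).

```python
import math

def rms_displacement(b_factor):
    """U (Å) from an isotropic B factor (Å^2) via B = 8 * pi^2 * U^2."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def debye_waller(b_factor, d_spacing):
    """Amplitude attenuation exp(-B sin^2(theta) / lambda^2) at resolution d (Å)."""
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))

for b in (20, 50, 100, 200):
    print(f"B = {b:3d} Å^2 -> U = {rms_displacement(b):.2f} Å, "
          f"attenuation at 2 Å = {debye_waller(b, 2.0):.3f}")
```

The rapid fall-off of the attenuation term at high B is why refinement can inflate B factors to "explain away" atoms that have little or no density.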


    B-factors and the model


    • Mainchain has lower B factors (~20) than sidechains (~35)

    • Unsuitable B-factor constraints / restraints can result in abnormal peaks in B-factor histogram

    • Average B factor of model should agree with initially estimated Wilson B.

    • B factors increase with solvent exposure and are lowest in the core.

    • Abrupt changes in B are not physically reasonable

      • Large differences in magnitude

      • Large incompatibilities in anisotropy e.g. TLS boundaries





    • E.A. Merritt (1999a) "Expanding the Model: Anisotropic Displacement Parameters in Protein Structure Refinement". Acta Cryst. D55, 1109-1117.

    • Wilson image from CCP4 wiki.

    Coordinate Precision

    • How precise is my model?

      • i.e. how repeatable is my solution? How dependable are the coordinates?

      • precision = accuracy if no systematic bias

    • Precision impacts all downstream calculations with the model

      • E.g. Estimation of bondlength variance

      • E.g. Estimation of hydrogen bonding

      • E.g. Estimation of active site volume

      • E.g. estimation of solvent exposure

      • This is why structural-bioinformatics calculations cannot be straightforward formulae; they must include a fuzz factor

    • Imprecise coordinates generally have higher B factors, lower occupancies or both

    • Estimation of precision

      • Luzzati plot

      • Sigma-A

      • Cruickshank DPI

    Coordinate Precision Calculations

    • Luzzati plot

      • Upper estimate of precision for low-B coordinates

      • Slope of R vs 1/d

    • Cruickshank DPI

      • Upper estimate of precision for average-B coordinates

      • σ(x) ≈ 2.2 · N_atoms^(1/2) · V_asu^(1/3) · n_obs^(-5/6) · R_free

      • Ignoring variation due to N_atoms and V_asu leads to a simple graph approximating the precision.

    • Calculated precision

      • Is positional

      • is estimate of 1σ distribution of observations

      • ~ 1 in 3 coordinates will have > 1σ imprecision

      • ~ 1 in 400 coordinates will have > 3σ imprecision

    • Image from Rupp book

    • Image from David Blow’s book.
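The Rfree-based DPI expression quoted above, 2.2 · N_atoms^(1/2) · V_asu^(1/3) · n_obs^(-5/6) · R_free, can be evaluated directly. The function name and the input numbers below are hypothetical and chosen only to show the order of magnitude; N_atoms is the number of atoms, V_asu the asymmetric-unit volume in Å³ and n_obs the number of unique reflections.

```python
def dpi_from_r_free(n_atoms, v_asu, n_obs, r_free):
    """Approximate positional precision (Å) for an atom with an average B factor."""
    return 2.2 * n_atoms ** 0.5 * v_asu ** (1.0 / 3.0) * n_obs ** (-5.0 / 6.0) * r_free

# Hypothetical structure: 2000 atoms, 60000 Å^3 asymmetric unit, Rfree = 0.25.
for n_obs in (10000, 20000, 40000):
    sigma = dpi_from_r_free(2000, 60000.0, n_obs, 0.25)
    print(f"n_obs = {n_obs:5d} -> estimated precision ~ {sigma:.2f} Å")
```

Because the number of observations grows steeply with resolution, the estimate tightens quickly as resolution improves, which is the trend the simplified graph on the slide captures.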

    Putting it together

    • Phenix quality polygons

      • Lots of criteria! - At a glance check of relative quality

      • What is the model quality with respect to other comparable models?

        • E.g. Is it ok to have obtained R=0.25, Rfree=0.3, average B = 50 at resolution of 3Å?

      • Compute same criteria for comparable models

      • Make a polygon using percentiles.

    • Coot + Molprobity

      • Integrating validation tightly with model building

    • Crystallographic model quality at a glance. Ludmila Urzhumtseva et al. Acta Cryst. (2009). D65, 297–300

    Tools for quality-checking

    • Data-only

      • CCP4 sfcheck

      • Phenix.Xtriage

      • Uppsala Software Factory

    • Model-only

      • ProCheck, PDBSum

      • WhatCheck

      • MolProbity, Coot

    • Model + Data

      • EDS

      • PARVATI

      • ValLigURL

    • Critical thinking!

    • Ability to script a quality check

    Summary

    • A good model makes sense from all aspects

      • chemical, physical, structural, crystallographic, statistical, biological

    • Errors are part of the game, but so is validation of model quality.

      • Most checks are diagnostic, but do not identify true cause or cure

      • Not for finding mistakes but preventing errors

    • Standard criteria and tools catch majority of errors and help build a high quality model.

    • Special attention should be given to non-standard entities like small molecules, carbohydrates etc.

    • Comparison against other models of similar resolution and size is useful.

    Model Quality and PDBe

    • Tools for checking model quality are not available in a centralized location.

    • Places where quality stats are most needed

      • PDB deposition process

      • Webpages for PDB entries

      • Peer-review process

    • X-ray Validation Task Force

      • VTF is compiling recommendations regarding calculation and presentation of quality metrics.

      • All criteria are to be reported as percentiles within entries of comparable resolution.

      • PDBe will implement a comprehensive resource for validating model quality, available as online or downloadable tool.

      • The resource will be useful for end-users and depositors.

    Acknowledgements

    • Alejandro and IPMont

    • Sameer Velankar, Jawahar Swaminathan at PDBe

    • Great books (McPherson, Blow, Rupp, Petsko-Ringe)

    • Excellent resources online

      • Rupp web

      • Randy Read’s course

      • Wikipedia

      • CCP4 wiki

      • ... many more