
Recent Developments in TEXTAL. Phenix Workshop Berkeley Sept. 2006 Thomas R. Ioerger Texas A&M University. NCS Identification via Pattern Recognition.


Recent Developments in TEXTAL

Phenix Workshop

Berkeley

Sept. 2006

Thomas R. Ioerger

Texas A&M University


NCS Identification via Pattern Recognition

  • Pai, R., Sacchettini, J.C. and Ioerger, T.R. (2006). Identifying non-crystallographic symmetry in protein electron-density maps: a feature-based approach. Acta Crystallographica, D62(9):1012-1021.

  • The Problem:

    • Symmetry averaging can greatly improve phases.

    • Typical methods for finding NCS require ≥ 3 heavy atoms, and are sensitive to errors in coordinates.

    • Despite noise and breaks from symmetry, similar patterns of density exist over large regions of real space (even if imperfectly phased).

    • How to efficiently identify these similarities and derive symmetry operators?


Our Approach to NCS

  • Step 1: calculate backbone using CAPRA

    • Putative C-alpha atoms become centers of regions for initial matching

  • Step 2: Calculate local features for each CA based on the pattern of surrounding CAs and density; select the subset of candidates that are likely to be similar

    • Example features: #CAs, center of mass, moments of inertia, std.dev., skewness, kurtosis…
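A minimal sketch of the kind of local features listed above, assuming the density around each putative C-alpha has already been sampled into a plain list of values. The function name, the dictionary keys and the 10 Å neighbourhood radius are hypothetical, not taken from the TEXTAL code; moments of inertia are left out for brevity.

```python
import math

def local_features(center, ca_coords, densities, radius=10.0):
    """Summarise the neighbourhood of one putative C-alpha.

    center    -- (x, y, z) of the C-alpha being described
    ca_coords -- (x, y, z) tuples for all putative C-alphas
    densities -- density values sampled around `center`
    radius    -- neighbourhood radius in angstroms (hypothetical default)
    """
    # Neighbouring C-alphas, excluding the center itself.
    near = [c for c in ca_coords if 0.0 < math.dist(c, center) <= radius]

    n = len(densities)
    mean = sum(densities) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in densities) / n)
    # Standardised third and fourth moments of the density values.
    skew = sum((d - mean) ** 3 for d in densities) / (n * std ** 3) if std else 0.0
    kurt = sum((d - mean) ** 4 for d in densities) / (n * std ** 4) if std else 0.0

    # Distance from the center to the center of mass of neighbouring C-alphas.
    if near:
        com = tuple(sum(c[i] for c in near) / len(near) for i in range(3))
        com_offset = math.dist(com, center)
    else:
        com_offset = 0.0

    return {"n_ca": len(near), "com_offset": com_offset,
            "std": std, "skewness": skew, "kurtosis": kurt}
```

Feature vectors like this can be compared cheaply to shortlist candidate pairs before the more expensive density correlation in Step 3.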


  • Step 3: Calculate local density correlation between each pair of CAs (over 5 Å spheres), with rotation optimization

  • Step 4: Cluster pairs of matching regions with similar rotation matrices

    • How can you tell if two local transformations are related (from same pair of domains)?

    • Each can transform the coordinates of the other.

  • Definition 1 (similar rotation matrices): Given $R_{UV}$ and $R_{PQ}$ as rotation matrices that optimally superpose regions U and V and regions P and Q, respectively, and u, v, p and q as the coordinates of the centers of regions U, V, P and Q, respectively, $R_{UV}$ is similar to $R_{PQ}$ if $\lVert q - R_{UV}\,p \rVert \le 2$ Å and $\lVert u - R_{PQ}\,v \rVert \le 2$ Å.

[Figure: regions U, V, P and Q]
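One possible reading of Definition 1 as code. This sketch assumes each operator is a full rigid-body transform (rotation plus translation) that maps the first-named region onto the second, so that the UV operator should carry p close to q and the PQ operator should carry u close to v. That direction convention, and the names `apply_op` and `similar`, are assumptions made for illustration; only the 2 Å tolerance comes from the slide.

```python
import math

def apply_op(op, x):
    """Apply a rigid-body operator (rot, trans) to point x, i.e. rot @ x + trans."""
    rot, trans = op
    return tuple(sum(rot[i][j] * x[j] for j in range(3)) + trans[i]
                 for i in range(3))

def similar(op_uv, op_pq, u, v, p, q, tol=2.0):
    """True if each operator also superposes the other pair of region centers.

    Assumed convention: op_uv maps region U onto V and op_pq maps P onto Q.
    """
    return (math.dist(apply_op(op_uv, p), q) <= tol and
            math.dist(apply_op(op_pq, u), v) <= tol)
```

Pairs of matching regions that pass this test against each other can then be grouped into one cluster in Step 4.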



Results on Experimental Maps

[Figure: 2a2u, 1a7a and 1p32. One subunit (identified by the algorithm) superposed on the other subunits using the symmetry operators (also identified by the algorithm)]


Availability

  • Pattern Recognition Algorithm for NCS (by Reetal Pai, PhD student in Ioerger lab)

    • Initial implementation in C and csh scripts

    • User input: structure factors (.mtz), expected # copies

    • Runs CAPRA, extracts features, matches regions…

    • Automatically runs DM to improve phases via averaging

    • Output:

      • NCS operators

      • masks for each region

      • C-alpha chains for each region

      • NCS-averaged structure factors (.mtz)

  • Web server: textal.tamu.edu/NCS

    • Users can upload reflection file; results emailed back


Port to Python

  • Command line

    # first source phenix_setup and ccp4_setup
    >textal.find_ncs prot.mtz <N> <FP> <PHIB> <FOM>
    ...
    Outputs: prot_ncs_ops.dat, prot_ncs_avg.mtz
             prot_mask_1.xplor, prot_mask_2.xplor...
             prot_region_1.pdb, prot_region_2.pdb...

  • Script-level API

    from textal.find_ncs import find_ncs
    from textal.io.reflection_file import reflection_file

    ref = reflection_file("mbp.mtz")
    obj = find_ncs(reflections=ref, copies=2,
                   amplitude='FP', phases='PHIB', FOM='FOM')
    obj.find_ncs()
    (rot_mat, trans_vec) = obj.get_operators(0)
    model1 = obj.get_subunit(0)  # type pdb_extended
    mask1 = obj.get_mask(0)      # type emap


Improving Sequence Alignment with Simplex

  • Romo, T.R., Sacchettini, J.C. and Ioerger, T.R. (2006). Improving Amino Acid Identification, Fit, and C-alpha Prediction using the Simplex Method in Automated Model-Building. Acta Crystallographica, accepted.

  • The Problem:

    • Most model-building programs build backbone first, then try to recognize side-chains (using probabilities, free atoms, features…)

    • Identification of amino acids is sensitive to errors in predicted Cα coordinates (often up to 1 Å rms)

    • Even if sequence alignment is used to correct mistakes, initial side-chains must be sufficiently accurate


Our Approach: Simplex Optimization

  • Simplex is a classic optimization algorithm

    • High radius of convergence

    • Does not require explicit computation of derivatives

  • Simplex can be applied to refine individual residues as rigid bodies (translation+rotation)

    • Several programs do local real-space rigid-body refinement of individual side-chains to improve fit.

    • Typically, applied after aa identity has been determined

  • We apply Simplex in Textal (LOOKUP) during residue selection, to help pick the template from our database that best matches the local density pattern, allowing the Cα atom to shift up to 2 Å


Effect of Errors in Cα Coordinates

[Figure: percent amino acid identity (accuracy of amino acids output by LOOKUP for CzrA, without sequence alignment) with artificially introduced errors, starting from perfect Cα's from the refined model]


Procedure

  • Step 1: Given a Cα, extract density-based features and retrieve the K=400 most similar regions from the database

  • Step 2: Re-rank by local density correlation (5Å)

    • Original method:

      • try to find optimal rotation only

    • New method:

      • Generate initial Simplex: N+1 perturbations of configuration vector (6-DOF)

      • Evaluate density correlation coefficient of each

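      • Pick the lowest-scoring vector and 'reflect' it through the average of the remaining configuration vectors

[Figure: reflection step in 6D config. space; the vertex with the worst score is reflected through the mean of the rest to give a new vertex. Each vector represents the original position (3 coords) and orientation (3 angles) of the side-chain]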
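The reflection step described above can be sketched as follows. This assumes the score is something to maximise (such as a density correlation coefficient) and shows only the reflection move; a full simplex (Nelder-Mead) search also uses expansion, contraction and shrink moves. The function names and the step size are hypothetical, and the initial simplex here is the starting vector plus one perturbation per degree of freedom.

```python
def initial_simplex(x0, step=1.0):
    """Build N+1 configuration vectors: x0 plus one perturbed copy per coordinate."""
    simplex = [list(x0)]
    for i in range(len(x0)):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    return simplex

def reflect_worst(simplex, score):
    """Reflect the lowest-scoring vertex through the mean of the others.

    `score` is maximised. The reflected vertex replaces the worst one only
    if it scores higher; otherwise the simplex is returned sorted, unchanged.
    """
    ranked = sorted(simplex, key=score)      # ranked[0] has the worst score
    worst, rest = ranked[0], ranked[1:]
    dim = len(worst)
    centroid = [sum(v[i] for v in rest) / len(rest) for i in range(dim)]
    reflected = [2.0 * centroid[i] - worst[i] for i in range(dim)]
    if score(reflected) > score(worst):
        return rest + [reflected]
    return ranked

# Toy 6-DOF example: the score peaks when all six parameters equal 3.
def toy_score(v):
    return -sum((c - 3.0) ** 2 for c in v)

start = initial_simplex([0.0] * 6)
after = reflect_worst(start, toy_score)
```

In the setting of the slide, the six parameters would be the three coordinates and three angles of the candidate side-chain, and the score would be the local density correlation.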


Results on Experimental Maps

Percent identity of model compared to true (refined) structure:

[Figure: results without Simplex and with Simplex]

[Figure: true structure compared with models built without Simplex and with Simplex]


TEXTAL for Molecular Replacement

  • Motivation:

    • Why not exploit the MR search model if available?

    • No excuse for mistakes in connectivity or aa identities

  • Steps toward larger goal of Model Completion

  • Idea:

    • Rotate search model into density (MR solution)

    • Replace amino acid identities with new sequence

    • Run LOOKUP to build side-chains into new density


  • Issues:

    • Backbones sometimes diverge (e.g. in loops)

    • Phase improvement: How to identify and edit-out incorrect parts of the model built?

    • Avoiding model bias

  • Our Approach:

    • Use CAPRA to generate backbone for new density

    • Match up Cα's with search model (core of protein)

    • Identify divergences (no nearby matches)

    • Fill in gaps with chains from new density
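A minimal sketch of the matching and divergence steps above, assuming both backbones are available as lists of Cα coordinates. The greedy closest-pair strategy and the function name are assumptions for illustration; the 3 Å cutoff is the value quoted in the Method slide.

```python
import math

def match_calphas(new_cas, model_cas, cutoff=3.0):
    """Pair new C-alphas with search-model C-alphas, closest pairs first.

    Returns (pairs, unmatched): `pairs` maps an index in new_cas to an index
    in model_cas; `unmatched` lists new C-alphas with no model atom within
    `cutoff` angstroms, i.e. candidate divergences.
    """
    candidates = sorted(
        (math.dist(a, b), i, j)
        for i, a in enumerate(new_cas)
        for j, b in enumerate(model_cas)
        if math.dist(a, b) <= cutoff
    )
    pairs, used_model = {}, set()
    for _, i, j in candidates:
        # Each atom may be used in at most one pair.
        if i not in pairs and j not in used_model:
            pairs[i] = j
            used_model.add(j)
    unmatched = [i for i in range(len(new_cas)) if i not in pairs]
    return pairs, unmatched
```

Runs of unmatched Cα's mark where the new backbone departs from the search model, which is where chains traced in the new density would be used instead.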


[Figure: deletion in model; labeled distance 5.35 Å]

  • Method

    • Generate map around search model (MR solution)

    • Run CAPRA to generate new backbone

    • Assign Cα's (closest match between models, up to 3 Å)

    • Assign new aa identities based on sequence alignment supplied by user

      ATAAEIAALPRQKVELVDPPFVHAHSQVAEGGPKVVEFTMVI----IVIDDAGTEVHAM...

      -------ELPVIDAVTTHAPEVPPAI--DRDYPAKVRVKMETVEKTMKMDD-GVEYRYW...

      • Format restricted (for now) to 2 long lines (or N pairs of lines for N subunits in search model)
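A sketch of how a two-line alignment like the one above could be turned into new residue identities. It assumes the first line is the search-model sequence and the second is the target sequence, which the slide does not state; the function name and return layout are hypothetical.

```python
def new_identities(model_line, target_line):
    """Map search-model residues to target residue letters.

    Returns (mapping, insertions):
      mapping    -- model residue index (0-based, in sequence order) to the
                    target letter, or None where the target has a gap
      insertions -- (model_index, letter) for target residues that face a
                    gap in the model, placed before that model index
    """
    if len(model_line) != len(target_line):
        raise ValueError("aligned lines must have equal length")
    mapping, insertions = {}, []
    model_idx = 0
    for m, t in zip(model_line, target_line):
        if m != "-":
            mapping[model_idx] = None if t == "-" else t
            model_idx += 1
        elif t != "-":
            insertions.append((model_idx, t))
    return mapping, insertions
```

For example, `new_identities("AC-DE", "A-GDF")` keeps A and D, renames E to F, marks C as deleted in the target, and reports G as an insertion before model residue 2.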


[Plot: -exp(-(r-1)) as a function of r]

  • Connect small gaps (len≤5)

    • Common (including due to alignment errors)

    • Method 1: Look for a bridge using existing Cα's

    • Method 2: Use a fragment library

      • 4188 9-mers extracted from 238 non-homologous proteins with min RMS of 1.25 Å

      • Superpose edges of each fragment on chain ends, with expected number of missing Cα's in middle

      • Select top 25 fragments by RMS (typically in range of 1-2Å)

      • Evaluate each fragment based on density measured every 0.5Å along fragment

      • Score(frag) = $\sum_i -\exp(-(r_i - 1))$, where $r_i$ is the density at sample point i along the fragment
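The scoring formula can be illustrated with a short sketch, assuming r is the density value at each sample point and that a higher (less negative) score is better. Both points are readings of the slide rather than confirmed details, and the function names are hypothetical.

```python
import math

def fragment_score(densities):
    """Sum of -exp(-(r - 1)) over density values r sampled along a fragment."""
    return sum(-math.exp(-(r - 1.0)) for r in densities)

def best_fragment(candidates):
    """Pick the candidate with the highest score.

    candidates -- dict mapping a fragment name to its sampled densities
    """
    return max(candidates, key=lambda name: fragment_score(candidates[name]))
```

Because the penalty grows exponentially as the density drops, a fragment that passes through even a few points of weak density scores much worse than one lying in consistently strong density.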


  • Run patch to make any remaining connections

    • More indiscriminate; may skip residues or insert extra atoms not consistent with alignment

    • Can turn off via --connectivity=conservative

  • Run ca_refine

    • reduces variance in inter-Cα distances

  • Run LOOKUP to build side-chains

  • Run simulated annealing


Results

  • 3 MR datasets from Phenix structure library:

    dataset            native reso  search model  perc ident  size      sec str  Rtrue      MR map corr
    -----------------  -----------  ------------  ----------  --------  -------  ---------  -----------
    a2u-globulin       2.5 Å        mup           63%         158 (x4)  alpha    0.20/0.26  0.94
    human-otc          2.4 Å        a1s           48%         354       mixed    0.23/0.27  0.89
    nitrite-reductase  1.7 Å        kbv           35%         339       beta     0.26/0.29  0.81

    * Rtrue is R-factor after simulated annealing with refined structure

    * MR map corr is density correl. between initial MR map and final 2Fo-Fc

  • After building model with textal.build_mr and running simulated annealing:

    dataset            perc built  num chains  perc ident  Rmod       map corr
    -----------------  ----------  ----------  ----------  ---------  --------
    a2u-globulin       93%         4/4         98%         0.24/0.30  0.95
    human-otc          93%         2           99%         0.30/0.36  0.82
    nitrite-reductase  84%         4           93%         0.35/0.39  0.85

    * Rmod is R-factor of model built by Textal, after simulated annealing

    * Map corr is between model 2Fo-Fc and refined 2Fo-Fc density maps

    * Ideal sequence alignments were used, based on structural alignments generated using Shindyalov's CE (Combinatorial Extension) algorithm


[Figure: a2u-globulin (white) and Textal model (green); disordered loop, res 60-64; 11-residue N-term tail not built]


[Figure: human-otc (white) and Textal (red, green); loop not built, res 266-275; C-term not built, res 345-352]


[Figure: human-otc (white) and Textal (red, green)]


[Figure: nitrite-reductase (white) and Textal model (colors); missing loops: res 29-36, res 159-170, res 186-205; missing termini: res 5-10, res 334-342]


[Figure: nitrite-reductase (white) and kbv (MR solution, purple); labels: large divergent loop, small differences, loop insertion]


Initial Steps Toward Model Evaluation

Run SFCHECK on model built…


Identifying Errors with SFCHECK

[Plot: quality score (SFCHECK) versus residues (sorted)]

  • Which combination of values correlates best with errors in model?

  • Use backbone_density_index from SFCHECK as residue quality score

    Thr-203 0.092

    Gly-226 0.297

    Glu-236 0.306

    Thr-269 0.354

    ...


[Figure: residues in purple (50/284) are those with low backbone density index scores (<0.92)]
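The selection of low-scoring residues can be sketched as a simple threshold filter. This assumes the per-residue backbone density index values have already been extracted from the SFCHECK output into a dictionary; the parsing step is not shown, and the function name and the "Ala-10" entry are hypothetical. The other four values are the ones listed on the slide.

```python
def flag_low_density(scores, cutoff=0.92):
    """Return residues whose backbone density index is below cutoff, worst first."""
    low = [name for name, value in scores.items() if value < cutoff]
    return sorted(low, key=scores.get)

scores = {
    "Thr-203": 0.092,
    "Gly-226": 0.297,
    "Glu-236": 0.306,
    "Thr-269": 0.354,
    "Ala-10": 0.97,   # hypothetical well-fitted residue, not from the slide
}
flagged = flag_low_density(scores)
```

The flagged residues are the candidates for editing out of the model before simulated annealing is re-run.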


Re-running SA on Edited Models

Hypothesis: impact of completeness versus accuracy of model on R-factor

[Figure: random deletions]

  • Issues:

    • B-factors

    • side-chains

    • lack of HETATMs (2 Cu, 3 Cd, 244 HOH in refined structure)

    • avoid model bias (use omit maps?)


Availability

  • Phenix command line:

    textal.build_mr [-c] [--symmetry] [--amplitudes] [--phases]

    <reflections> <search_model> <alignment_file>

    textal.build_mr --symmetry=nitrite-redct.inp --amplitudes=FULL_MOD nitrite-reduce.hkl kbv_mr_solution.pdb NR-KBV-align.txt

  • Python API:

    from textal.users.tom.textal_mr import MR_build

    MR_build(reflections=rx,model=mod,alignment=algn,capra_only=True)



Future Work

Conclusion

  • TEXTAL can build highly accurate models for Molecular Replacement (completely automatically), with almost perfect coordinates for backbone and side-chain atoms (with the help of simulated annealing), at least in the core (80-90%)

  • Handle missing domains in the search model

  • Incorporate better model evaluation methods

  • Automate the whole improvement cycle

