The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques

Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University. Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M Univ.

Presentation Transcript


  1. The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques. Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University. Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M Univ. With support from: National Institutes of Health

  2. Automated Structure Determination • Key step for high-throughput Structural Genomics, structure-based drug design, etc. • Many computational tools exist to generate a map, but... • Given an electron density map, how can atomic coordinates be extracted automatically? • Currently requires humans (working in model-building programs such as O): a potential bottleneck • Sources of difficulty: complexity, low resolution, phase errors, weak density • Related methods: Shake&Bake, ARP/wARP, X-Powerfit, template convolution...

  3. Overview of TEXTAL • Apply pattern recognition techniques • Exploit database of previously-solved maps • Model molecular structures in local regions (e.g. spheres of 5 Angstrom radius) • Intuitive principles: 1) Have I ever seen a region with a pattern of density like this before? 2) If so, what were previous local atomic coordinates?

  4. Overview (cont’d) • Divide-and-Conquer: 1) identify alpha-carbon positions (chain-tracing) 2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms 3) concatenate the local models back together, resolving any conflicts • Database contains many regions centered on CAs from previous maps • A radius of ~5 Å is the right scale for “structural repetition”

  5. Main Stages of TEXTAL (diagram): electron density map → CAPRA → C-alpha chains → LOOKUP (build in side-chain and main-chain atoms locally around each CA) → model (initial coordinates) → post-processing routines (e.g. real-space refinement) → model (final coordinates). Also shown: human crystallographer (editing); reciprocal-space refinement/ML DM

  6. Feature Extraction • Database: ~10^5 regions from ~100 maps • How can the closest match be identified efficiently? • Calculate numerical features that represent the pattern of density in each region • Features must be rotation-invariant • Search can then be very fast: just compare features

  7. F=<1.72,-0.39,1.04,1.55...> F=<1.58,0.18,1.09,-0.25...> F=<0.90,0.65,-1.40,0.87...> F=<1.79,-0.43,0.88,1.52...>

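  8. Rotation-Invariant Features • Average density: $m = \frac{1}{n}\sum_i \rho_i$, where $\rho_i$ is the density at each lattice point $i$ in the region • Other statistical features: standard deviation, kurtosis… • Distance to center of mass: • $\langle x_c, y_c, z_c \rangle = \frac{1}{n}\left\langle \sum_i \frac{x_i \rho_i}{m}, \sum_i \frac{y_i \rho_i}{m}, \sum_i \frac{z_i \rho_i}{m} \right\rangle$ • $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$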
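A minimal sketch of the statistical features on slide 8, assuming a region is supplied as lattice-point coordinates relative to the region center (Å) together with non-negative density values, so that $m > 0$. The function name `statistical_features` and the use of excess kurtosis are illustrative choices, not TEXTAL's actual code. Rotating the coordinates about the region center leaves every returned value unchanged, which is the property the slide asks for.

```python
import numpy as np


def statistical_features(coords, density):
    """Rotation-invariant features of one spherical region (hypothetical helper).

    coords  : (n, 3) lattice-point positions relative to the region center (Å)
    density : (n,) electron density at each lattice point, assumed non-negative
    """
    coords = np.asarray(coords, dtype=float)
    rho = np.asarray(density, dtype=float)
    n = rho.size

    m = rho.sum() / n                       # average density, m = (1/n) sum rho_i
    sd = rho.std()                          # standard deviation
    # Excess kurtosis (0 for a Gaussian); guard against a constant region.
    kurt = ((rho - m) ** 4).mean() / sd**4 - 3.0 if sd > 0 else 0.0

    # Density-weighted center of mass: (1/n) * sum_i (x_i * rho_i / m)
    center = (coords * rho[:, None]).sum(axis=0) / (n * m)
    d_cen = float(np.linalg.norm(center))   # distance from the region center

    return {"mean": m, "sd": sd, "kurtosis": kurt, "d_cen": d_cen}
```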

  9. More Features • Moments of inertia • measures dispersion around axes of symmetry in a density distribution • calculate 3x3 inertia matrix • diagonalize to get eigenvalues • sort from largest to smallest • take magnitudes and ratios of moments
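The moment-of-inertia steps on slide 9 (build the 3x3 matrix, diagonalize, sort, take magnitudes and ratios) can be sketched as below. This assumes density values act as point masses and that moments are taken about the density-weighted center of mass; the slide does not say whether TEXTAL uses the center of mass or the region center. `inertia_features` and the particular pair of ratios returned are hypothetical.

```python
import numpy as np


def inertia_features(coords, density):
    """Principal moments of inertia of a density region (hypothetical helper).

    coords  : (n, 3) lattice-point positions (Å)
    density : (n,) density values used as point masses (assumed non-negative)
    Returns (moments sorted largest-first, (I1/I2, I2/I3)).
    """
    x = np.asarray(coords, dtype=float)
    w = np.asarray(density, dtype=float)
    # Measure dispersion about the density-weighted center of mass.
    x = x - (w[:, None] * x).sum(axis=0) / w.sum()

    r2 = (x**2).sum(axis=1)
    # Inertia tensor: I = sum_i w_i * (|r_i|^2 * Id - r_i r_i^T)
    inertia = (w * r2).sum() * np.eye(3) - np.einsum("i,ij,ik->jk", w, x, x)

    # The tensor is symmetric, so eigvalsh gives real eigenvalues (ascending).
    moments = np.linalg.eigvalsh(inertia)[::-1]      # sort largest to smallest
    i1, i2, i3 = moments
    ratios = (i1 / i2 if i2 > 0 else 0.0, i2 / i3 if i3 > 0 else 0.0)
    return moments, ratios
```

Because eigenvalues of the inertia tensor do not change under rotation (or, with the center-of-mass shift, translation), the moments and their ratios qualify as rotation-invariant features.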

  10. More Features • Spoke angles • if region centered on CA, should have 3 “spokes” of density emanating from center • find best-fit vectors; calc. angles among them • surface area of contours • connectivity of density/bones in region • other geometrical features...

  11. Feature Weights

  12. CAPRA: C-Alpha Pattern-Recognition Algorithm (diagram: map → density trace → pseudo-atoms → neural network → predictions of distance to true CA → linking into C-alpha chains → C-alpha coordinates) • Tracer - remove lattice points from the map (lowest density first) without breaking connectivity • Neural network - for each pseudo-atom, extract features, input them to the network, and predict the distance to the nearest CA (1:10 in trace); trained on example points in real maps • Linking - desire long chains, good CA predictions (not in side-chains), “structurally plausible” geometry (e.g. linear, helical)
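A rough sketch of the tracer step, assuming the map has already been thresholded into a dict mapping lattice indices (i, j, k) to density. It makes a single pass in ascending density order and removes a point only if its 26-connected neighbors stay mutually reachable through the remaining points (if they do, no connected piece of density is split). It also keeps chain ends (points with fewer than two neighbors) so chains do not erode from their tips. The endpoint rule, the single pass, and the names `trace`, `neighbors`, and `stays_connected` are assumptions for illustration; CAPRA's actual tracer may differ.

```python
from collections import deque
from itertools import product

# The 26 offsets to neighboring lattice points in 3D.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def neighbors(p, occupied):
    """26-connected neighbors of lattice point p that are still occupied."""
    x, y, z = p
    candidates = ((x + dx, y + dy, z + dz) for dx, dy, dz in OFFSETS)
    return [q for q in candidates if q in occupied]


def stays_connected(p, occupied):
    """True if all occupied neighbors of p remain in one component without p."""
    nbrs = neighbors(p, occupied)
    if not nbrs:
        return True
    remaining = occupied - {p}
    targets = set(nbrs)
    seen = {nbrs[0]}
    queue = deque([nbrs[0]])
    # Breadth-first search until every neighbor of p has been reached.
    while queue and not targets <= seen:
        for q in neighbors(queue.popleft(), remaining):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return targets <= seen


def trace(density):
    """Thin a thresholded density grid, lowest density first (hypothetical)."""
    occupied = set(density)
    for p in sorted(density, key=density.get):
        # Keep chain ends; otherwise remove p if connectivity survives.
        if len(neighbors(p, occupied)) >= 2 and stays_connected(p, occupied):
            occupied.discard(p)
    return occupied
```

For example, on a two-row slab where one row is denser, the weaker row is peeled away and the denser row survives as a one-point-wide trace; a simple line of points is left untouched.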

  13. Example of the CAPRA Process

  14. Example of CAPRA chains

  15. The LOOKUP Process

  16. Database Construction • Ideally would use solved MAD/MIR maps • Using “back-transformed” maps works well: • PDB → structure factors (include B-factors) • keep reflections down to 2.8 Å • Fourier transform → electron density map • 50 proteins from PDBSelect (non-homologous) • about 50,000 regions • Feature extraction done offline
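The last step of building a back-transformed map (structure factors → electron density, keeping reflections to 2.8 Å) is a Fourier synthesis, $\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F_{hkl}| \cos\left(2\pi(hx + ky + lz) - \varphi_{hkl}\right)$. The sketch below evaluates it by direct summation at one point. It assumes an orthogonal unit cell, structure factors already computed from the PDB model (for example by a crystallographic package), and that only one member of each Friedel pair is stored. Real pipelines use an FFT over the whole grid; `density_at` and `resolution` are hypothetical names.

```python
import math


def resolution(h, k, l, a, b, c):
    """d-spacing (Å) of reflection hkl in an orthogonal cell with edges a, b, c."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return math.inf if inv_d2 == 0 else 1.0 / math.sqrt(inv_d2)


def density_at(xyz, reflections, cell, d_min=2.8):
    """Electron density at fractional coordinates xyz by direct Fourier summation.

    reflections : dict {(h, k, l): (amplitude, phase_in_degrees)} holding one
                  member of each Friedel pair; the mate is added by doubling.
    cell        : (a, b, c) in Å, orthogonal axes assumed.
    d_min       : resolution cutoff; reflections with d < d_min are dropped.
    """
    a, b, c = cell
    volume = a * b * c
    x, y, z = xyz
    rho = 0.0
    for (h, k, l), (amp, phase) in reflections.items():
        if resolution(h, k, l, a, b, c) < d_min:
            continue  # beyond the resolution cutoff
        arg = 2.0 * math.pi * (h * x + k * y + l * z) - math.radians(phase)
        # F(-h) = conj(F(h)) contributes the same cosine term, except for F(000).
        weight = 1.0 if (h, k, l) == (0, 0, 0) else 2.0
        rho += weight * amp * math.cos(arg)
    return rho / volume
```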

  17. Details of Matching Process • Feature-based matching: • Euclidean distance metric between feature vectors: • $\mathrm{dist}(R_1, R_2) = \sum_i w_i \left(F_i(R_1) - F_i(R_2)\right)^2$ • Must weight features by relevance • less-relevant features add noise • Slider algorithm: optimize weights by comparing features in matching regions versus mismatches • Verify selections by density correlation • requires search for optimal rotation
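The matching on slide 17 can be sketched as a weighted nearest-neighbor search followed by a density-correlation check. This is a minimal sketch: the function names are hypothetical, `weights` stands in for whatever the Slider algorithm produces, and the rotation search needed before computing the correlation is omitted (the second density sample is assumed to be already rotated onto the first region's grid points).

```python
import heapq
import math


def weighted_distance(f1, f2, weights):
    """dist(R1, R2) = sum_i w_i * (F_i(R1) - F_i(R2))**2"""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2))


def best_matches(query, database, weights, k=3):
    """Ids of the k database regions whose features are closest to query.

    database : list of (region_id, feature_vector) pairs
    """
    closest = heapq.nsmallest(
        k, database, key=lambda entry: weighted_distance(query, entry[1], weights)
    )
    return [region_id for region_id, _ in closest]


def density_correlation(rho1, rho2):
    """Pearson correlation of two density samples on the same grid points."""
    n = len(rho1)
    m1, m2 = sum(rho1) / n, sum(rho2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(rho1, rho2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in rho1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in rho2))
    return cov / (s1 * s2)
```

Setting a weight to zero drops that feature from the comparison entirely, which is one way less relevant features can be kept from adding noise.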

  18. Post-Processing Routines • Imperfections in the initial model: • backbone atoms not necessarily juxtaposed between adjacent residues, or in same direction • side-chains occasionally “flipped” into backbone • residue identities often incorrect (based on dens.) • Fixing “flips” and direction - take candidate match with next highest correlation • Real-space refinement: regularizes backbone • Use sequence alignment to fix identities?
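Slide 18 leaves open whether sequence alignment could fix residue identities. One possible approach, sketched here and not confirmed as TEXTAL's, is a Needleman-Wunsch global alignment of the residue string read off the model against the known sequence, then relabeling each modeled residue with the residue it aligns to. The match/mismatch/gap scores are arbitrary placeholders; a real implementation would more likely use a substitution matrix and affine gap penalties. `align` and `corrected_identities` are hypothetical names.

```python
def align(model_seq, true_seq, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of two one-letter residue strings.

    Returns a list of (model_residue, true_residue) pairs; '-' marks a gap.
    """
    n, m = len(model_seq), len(true_seq)
    # score[i][j] = best score aligning model_seq[:i] with true_seq[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if model_seq[i - 1] == true_seq[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if model_seq[i - 1] == true_seq[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                pairs.append((model_seq[i - 1], true_seq[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((model_seq[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", true_seq[j - 1]))
            j -= 1
    return pairs[::-1]


def corrected_identities(model_seq, true_seq):
    """Relabel each modeled residue with its aligned true residue ('-' if none)."""
    return "".join(t for res, t in align(model_seq, true_seq) if res != "-")
```

For example, a model read as "ACDGF" against the true sequence "ACDEF" aligns without gaps, so the misassigned G is relabeled as E.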

  19. New Results on Real MAD Maps • (a) CZRA: missed a 5-residue loop (weak density) and the C-terminus • (b) M01: missed a 17-residue helix; 9 deletions, 5 due to breaks; 3-residue false backbone

  20. Histograms of Distances Between Matched Atoms

  21. Analysis of Amino Acid Types • Confusion matrix for CZRA (axes: amino acid in true structure; amino acid in TEXTAL model)
