6. Homology Modeling

ashley
Presentation Transcript


  1. 6. Homology Modeling

  2. Prediction of structure from sequence: flowchart
     • Comparison of query sequence to nr database
     • Similar to a sequence of known structure?
       • Yes: Homology Modeling (Comparative Modeling)
       • No: Fold Recognition (Threading)
     • Fits a known fold?
       • Yes: use the recognized fold as template
       • No: Ab initio prediction
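
The flowchart's two decisions can be written as a small function. This is only a sketch of the decision logic; the function name and the two boolean inputs are hypothetical, and in practice each answer comes from a database search rather than a flag.

```python
def choose_method(similar_to_known_structure: bool, fits_known_fold: bool) -> str:
    """Return the modeling route suggested by the flowchart (illustrative only)."""
    if similar_to_known_structure:
        # A homolog of known structure exists: model by homology.
        return "homology modeling"
    if fits_known_fold:
        # No detectable homolog, but the sequence is compatible with a known fold.
        return "fold recognition (threading)"
    # Neither a homolog nor a compatible fold was found.
    return "ab initio prediction"


print(choose_method(False, True))
```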

  3. Homology modeling: 4 steps
     • Detect template
     • Align sequence onto template
     • Build model (loop modeling)
     • Refine model (relax)

  4. Errors in comparative modeling (Marti-Renom & Sali, 2000)
     • Wrong side chain conformations
     • Small backbone deviations
     • Wrong loop modeling
     • Wrong alignment
     • Wrong template

  5. Homology modeling: 4 steps
     • Detect template
     • Align sequence onto template
     • Build model (loop modeling)
     • Refine model (relax)

  6. The sequence identity needed to infer structural similarity depends on the length of the protein (Sander & Schneider, 1991)
     • No dissimilar pairs above the threshold line

  7. Template matching
     • Given: sequence
     • Wanted:
       • structural template
       • sequence-structure alignment
     • Easiest approach:
       • BLAST / PSI-BLAST against sequences with known structure
       • select template based on sequence identity (seqid):
         • >70%: straightforward
         • ~40-50%: usually clear
         • lower seqid: alignment is a challenge
     • State-of-the-art protocols include:
       • more sophisticated searches
       • additional information for improved template selection
       • profile-profile comparison (HHSEARCH)
       • sequence-structure compatibility (threading: RAPTOR)
     • The alignment step is critical
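
As a rough illustration of the identity-based rule of thumb above, the sketch below computes percent identity over the aligned (non-gap) columns of a pairwise alignment and maps it to the slide's categories. The function names are hypothetical, and the slide does not say how the 50-70% range should be treated; lumping it with "usually clear" is an assumption made here.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def template_difficulty(seqid: float) -> str:
    """Map sequence identity to the slide's categories (cutoffs are approximate)."""
    if seqid > 70.0:
        return "straightforward"
    if seqid >= 40.0:  # assumption: 50-70% is treated like the ~40-50% range
        return "usually clear"
    return "alignment is a challenge"


seqid = percent_identity("ACDE-G", "ACDQFG")
print(seqid, template_difficulty(seqid))
```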

  8. Sequence-sequence alignment
     • Information content:
       • Sequence
       • Profile (position-specific scoring matrix, PSSM): amino acid preferences for each position
       • Hidden Markov model (HMM): contains in addition position-specific insertion/deletion penalties
         • States: M (match), D (deletion), I (insertion)
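
To make "amino acid preferences for each position" concrete, here is a minimal sketch that builds a PSSM from a gap-free toy alignment. The uniform background frequency, the pseudocount of 1 and all function names are assumptions for illustration; real tools such as PSI-BLAST use sequence weighting and more elaborate pseudocounts.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def profile_score(pssm, sequence):
    """Score an ungapped sequence of the same length against the profile."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


toy_msa = ["ACD", "ACE", "ACD", "GCD"]
toy_pssm = build_pssm(toy_msa)
print(round(profile_score(toy_pssm, "ACD"), 2))
```

An HMM extends this idea by attaching position-specific insertion and deletion penalties to each column, which this sketch does not model.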

  9. Sequence-sequence alignment
     • Information content:
       • Sequence-sequence comparison, e.g. BLAST
       • Profile-sequence comparison, e.g. PSI-BLAST
       • Profile-profile comparison, e.g. LAMA, PROF_SIM, COMPASS
       • HMM-HMM comparison, e.g. HHSEARCH
     • More information: increased sensitivity in detecting a template

  10. No new folds & superfamilies lately -> template available for everyone
      • SCOP folds [plot: # of folds and # of new folds per year]: ~1400 unique folds; no new folds in the last years
      • SCOP superfamilies [plot: # of superfamilies and # of new superfamilies per year]: ~2300 unique superfamilies; no new superfamilies in the last years
      • Many sequences, few folds: how can I detect my fold?

  11. Additional ways to include structural information: Threading
      • Evaluate compatibility of sequence with fold, based on pairwise residue potentials
      • Essential components:
        • structural template
        • neighbor definition
        • energy function: $E = \sum_{i,j} E_{a_i b_j}$ over neighboring positions i, j
      • Pair potential $E_{ab}$ (excerpt):
              A    C    D    E
          A  -3   -1    0    0
          C  -1   -4    1    2
          D   0    1    5    6
          E   0    2    6    7
      • Example [figure: sequence ACCECADAAC placed on template positions 1-10]:
        E = -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23
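
The threading energy can be sketched as a sum of pair potentials over neighboring template positions. The potential values below are the ones given on the slide; the contact list is hypothetical, because the template figure did not survive extraction, and was chosen only so that the eight terms match those in the slide's sum.

```python
# Pair potential E_ab from the slide (symmetric; only one triangle is stored).
PAIR_POTENTIAL = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a: str, b: str) -> int:
    """Look up E_ab regardless of the order of the two residues."""
    return PAIR_POTENTIAL[(a, b)] if (a, b) in PAIR_POTENTIAL else PAIR_POTENTIAL[(b, a)]


def threading_energy(sequence: str, contacts) -> int:
    """Sum pair energies over template contacts given as 0-based index pairs."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


# Hypothetical neighbor definition for a 10-position template (not from the slide).
CONTACTS = [(0, 5), (0, 7), (5, 8), (1, 4), (2, 9), (4, 9), (0, 2), (5, 1)]

print(threading_energy("ACCECADAAC", CONTACTS))
```

Ranking templates then amounts to computing this energy for each candidate fold and keeping the lowest value.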

  12. Threading (fold recognition): find the best template for a given sequence
      • Query sequence: MAHFPGFGQSLLFGYPVYVFGD...
      • [Figure: the sequence is scored against each potential fold 1) ... 56) ... n); example scores shown: -10, -123, 20.5]

  13. RAPTOR: state-of-the-art threading method of choice
      • Successful for “low-homology” proteins (few homologous sequences; low entropy in alignment)
      • State-of-the-art threading protocol: uses linear programming to efficiently find the best sequence-structure threading (linear combination of regression trees)
      • Optimizes use of several templates
      • http://raptorx.uchicago.edu/
      • Jian Peng and Jinbo Xu. RaptorX: exploiting structure information for protein alignment by statistical inference. PROTEINS, 2011.
      • Jian Peng and Jinbo Xu. A multiple-template approach to protein threading. PROTEINS, 2011.

  14. Combine sequence-structure and sequence-sequence comparisons
      • Example 1: GENTHREADER (artificial neural network, ANN)
        • How likely are two amino acids to be neighbors?
        • How likely is an amino acid to be buried or exposed?

  15. Combine sequence and structure for template selection
      • Example 2: HHSEARCH*
        • Based on hidden Markov models (HMMs)
        • Sequence-HMM alignment
        • Here: extended to HMM-HMM alignment
      * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

  16. HHSEARCH: HMM-HMM alignment*
      • Formalization: [figure]
      • More sensitive (for hard cases with <20% seqid) than:
        • Profile-profile comparison
        • Profile-sequence comparison
        • Sequence-sequence comparison
      * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

  17. HHSEARCH includes structural information about the template*
      • Include secondary structure preference in the model: score pairs of aligned secondary structure elements with a substitution matrix
      • Query sequence: predicted secondary structure (PSIPRED: H/E/C) with confidence [0..9]
      • Structural template: secondary structure (DSSP: H/E/B/G/I/T/S)
        • H = alpha helix
        • E = extended strand
        • B = residue in isolated beta-bridge
        • G = 3-helix (3/10 helix)
        • I = 5-helix (pi helix)
        • T = hydrogen bonded turn
        • S = bend
      • 10 x 3 x 7 substitution values (confidence levels x PSIPRED states x DSSP states)
      * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

  18. Homology modeling: 4 steps
      • Detect template
      • Align sequence onto template
      • Build model (loop modeling)
      • Refine model (relax)

  19. Build model
      • Copy aligned regions from template
      • Rebuild missing pieces: model loops
      • Refine model: add side chains (and minimize; relax)

  20. Build model: loop modeling
      • Input:
        • 2 anchors
        • length of missing residues
      • 2 approaches:
        • Loop libraries: construct loops from fragments of known structures
        • Loop closure algorithms:
          • model new conformations
          • good for longer loops

  21. Fold-trees for loop modeling tasks
      • [Figure: fold tree for loop modeling, N - 1 x 1' - 2 x 2' - C; color: flexible backbone, gray: fixed backbone; edge types: flexible “peptide” edge, rigid “peptide” edge, flexible “jump”, rigid “jump”]
      • Legend: N: N-terminal; C: C-terminal; X: chain break; O: root of the tree

  22. Rosetta loop modeling
      • Define regions that are flexible, and perturb these in a fixed background
      • Same moves as described in ab initio, but more restricted
      • Use fold tree architecture: connect take-off and landing segments by a jump, cut the loop (at a defined place, or arbitrarily), apply perturbation, reclose the loop
      • Loop closure: using cyclic coordinate descent (CCD) or kinematic loop closure (KC)
      • Fragments can be used to improve knowledge-based modeling

  23. Cyclic Coordinate Descent (CCD): closure by moving each joint separately.. (Canutescu & Dunbrack, Protein Sci. 12, 963–972 (2003))

  24. ..to maximally approach end

  25. Repeat to obtain several conformations…. Refine, and select best!
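
The idea of slides 23-25 (move one joint at a time so that the chain end approaches its target as closely as possible, then repeat) can be sketched on a planar chain. This is a simplified 2D analogue with hypothetical function names: the published method works on backbone torsion angles in 3D and superimposes several anchor atoms, which is not reproduced here.

```python
import math


def chain_points(angles, lengths):
    """Joint positions of a planar chain; each angle is relative to the previous link."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_close(angles, lengths, target, tol=1e-3, max_sweeps=200):
    """Rotate one joint at a time so the chain end moves as close as possible to target."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for k in reversed(range(len(angles))):
            points = chain_points(angles, lengths)
            pivot, end = points[k], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotating joint k by this amount points the pivot-to-end vector at the target.
            angles[k] += to_target - to_end
        if math.dist(chain_points(angles, lengths)[-1], target) < tol:
            break
    return angles, math.dist(chain_points(angles, lengths)[-1], target)


closed_angles, gap = ccd_close([0.3] * 4, [1.0] * 4, target=(2.0, 1.5))
print(gap < 1e-2)
```

Running the closure from several different starting conformations yields several closed loops, from which the best can be selected after refinement.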

  26. Loop closure and degrees of freedom (DOF)
      • Over-constrained for <6 DOF
      • Under-constrained for >6 DOF: infinite number of solutions
      • A molecular loop closure problem with 6 DOF has at most 16 solutions
      • Kinematic loop closure allows calculation of an analytical solution

  27. Kinematic loop closure (Coutsias, 2004)
      • From robotics: analytical solution of loop closure for 6 degrees of freedom
      • Challenge: find an analytical formulation to extract all possible backbone structures of a chain segment that are geometrically consistent with the preceding and following parts of the given structure
      • Setup: [figure]

  28. Kinematic loop closure, cont.
      • [Figure: solutions aligned to each other; solutions aligned to the constant part]

  29. Kinematic closure (KC)
      • Analytical solution of loop closure for 6 degrees of freedom
      • Extension: analytical determination of all mechanically accessible conformations for 6 torsions of a peptide chain of any length (e.g. 25 residues):
        • (1) Randomly perturb non-pivot positions
        • (2) Apply KC to pivot positions

  30. Kinematic closure (KC) in Rosetta
      • Embedded into the MCM protocol (low-res + high-res)
      • Cycle: perturbation + KC, then loop backbone minimization
      • 720 steps
      • Repeat 1000 times
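
A Monte Carlo plus minimization (MCM) cycle of the kind named above can be sketched generically: perturb, minimize, then accept or reject with the Metropolis criterion. Everything below is a toy, assuming a one-dimensional energy function, a gradient-descent minimizer and arbitrary parameter values; it is not Rosetta code, and only the default of 720 steps is taken from the slide.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def mcm(energy, perturb, minimize, start, steps=720, kT=1.0, seed=0):
    """Monte Carlo with minimization: perturb, minimize, then Metropolis test."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        trial = minimize(perturb(current, rng))
        if metropolis_accept(energy(trial) - energy(current), kT, rng):
            current = trial
            if energy(current) < energy(best):
                best = current
    return best


def toy_energy(x):
    """Double-well toy landscape: local minimum near x = 1, global near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_perturb(x, rng):
    return x + rng.uniform(-1.5, 1.5)


def toy_minimize(x, rate=0.02, iterations=50):
    """Plain gradient descent on the toy energy."""
    for _ in range(iterations):
        x -= rate * (4.0 * x * (x * x - 1.0) + 0.3)
    return x


best = mcm(toy_energy, toy_perturb, toy_minimize, start=1.0)
print(toy_energy(best) <= toy_energy(1.0))
```

Repeating such a trajectory many times with different random seeds, as the slide describes, gives a population of candidate loops to choose from.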

  31. Kinematic closure (KC)
      • Improves median modeling quality from 2.0 Å RMSD (CCD) to 0.8 Å RMSD (on a set of 25 loops)

  32. Improve loop modeling by sampling along principal components (PCs) of natural variation (Qian 2004 PNAS)
      • Collect loops of a set of homologous templates
      • Perform principal component analysis (PCA): the collection of loops can be described by only a few (3) PCs
      • Improves model quality: the model becomes more similar to the final structure than to the template
      • Depends on a set of known homologous structures
      • [Figure: 8 protein structures plotted along PCA1, PCA2 and PCA3]
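
A minimal sketch of the PCA step, assuming each homologous loop has already been superimposed and flattened into one coordinate vector: it extracts only the first principal component by power iteration and then generates a new conformation along it. The function names and the toy data are hypothetical, and the published method uses several components.

```python
import math


def center(rows):
    """Return the mean vector and the mean-centered rows."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(row[d] for row in rows) / n for d in range(dim)]
    return mean, [[row[d] - mean[d] for d in range(dim)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    mean, x = center(rows)
    dim = len(mean)
    v = [1.0 / (d + 1) for d in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        projections = [sum(a * b for a, b in zip(row, v)) for row in x]
        # w = X^T (X v), i.e. the covariance matrix applied to v (up to scaling)
        w = [sum(p * row[d] for p, row in zip(projections, x)) for d in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return mean, v


def sample_along_pc(mean, pc, t):
    """Generate a conformation displaced by t along the principal component."""
    return [m + t * c for m, c in zip(mean, pc)]


toy_loops = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, pc = first_principal_component(toy_loops)
print([round(c, 3) for c in pc])
```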

  33. Free-energy optimization along PC of natural variation: example (Qian 2004 PNAS)
      • Red: model (2.36 Å RMSD)
      • Blue: native
      • Green: refined (1.42 Å RMSD)
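
The RMSD values quoted above measure the average deviation between corresponding atoms of two structures. A minimal sketch, assuming the two coordinate sets are already optimally superimposed and listed in the same atom order (the superposition step itself is not shown, and the function name is hypothetical):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, native))
```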

  34. Homology modeling: 4 steps
      • Detect template
      • Align sequence onto template
      • Build model (loop modeling)
      • Refine model (relax)

  35. Rosetta: refine model with the relax protocol
      • Same as in the last ab initio modeling step*
      • Introduce general flexibility
      • Relax protocol finds nearby minima (within 4-5 Å RMSD)
      • [Figure: cycle of side chain optimization, side chain optimization + minimization, and backbone optimization; labels: vdw repulsive; small backbone moves and MCM]
      * MCM protocol: small & shear moves (120 steps; see lecture 5)

  36. Homology modeling with Rosetta: summary of the basic protocol
      • Detect template and align sequence: based on HHSEARCH (alignment of two HMMs) or RAPTOR (threading)
      • Define aligned regions and loop regions; copy aligned regions and complete the protein structure with loop modeling (with KIC, kinematic loop closure, or CCD)
      • Refine structure with the “relax” protocol

  37. CASP: template-based modeling (TBM)
      • [Figure: improvement over single best target; axis from worse to better]
      • Single impressive improvements
      • Many targets better than template
      • Reasons: multiple templates, free modeling, refinement

  38. CASP7: example for improved TBM with Rosetta (T330)
      • Blue: native
      • Green: Baker Model04
      • Red: template
      • [Plot: % of residues aligned vs. distance cutoff]

  39. Rosetta in CASP7 & 8: use of several templates improves prediction
      • Templates that produce lower energy structures produce better models

  40. Homology modeling: summary
      • Homology modeling to high resolution is challenging (~ ab initio modeling)
      • Today models are already better than the template: GOOD NEWS!
      • Good alignment and template selection are critical
      • Sophisticated new approaches have improved homology modeling in recent years
      • Include additional information during template selection, alignment and refinement
