
Structure Prediction


Presentation Transcript


  1. Structure Prediction dmitra

  2. Methods • Ab initio • Heuristics • Machine learning • Homology modeling • Threading

  3. RNA Structure Prediction: Ab initio • Sequence over {A, C, G, U} • Complementary bases attract and form base pairs, which lowers the free energy • We are not interested in the absolute energy of the sequence, only in minimizing it • Reference state: the linear sequence with zero base pairs has energy = 0 • The physics is embedded in the “free-energy” parameter/function • Objective: minimize the energy

  4. RNA Structure Prediction: Knot-free • Knot-free assumption • Knot: base pairs (i, j) and (k, l) with i < k < j < l, i.e. the two pairs cross • A knot-free structure can be drawn as a planar graph, which makes a DP algorithm feasible • Any two base pairs are either disjoint or nested inside one another
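The knot-free condition can be checked mechanically. The sketch below assumes the usual reading that two base pairs form a knot when they cross (i < k < j < l); the function name `has_knot` and the example pairs are hypothetical.

```python
from itertools import combinations

def has_knot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    # Normalize each pair so that its first index is the smaller one.
    norm = sorted(tuple(sorted(p)) for p in pairs)
    # After sorting, the first pair of each combination starts no later than
    # the second, so one inequality chain covers every crossing.
    for (i, j), (k, l) in combinations(norm, 2):
        if i < k < j < l:
            return True
    return False

print(has_knot([(1, 10), (2, 9), (12, 20)]))  # nested and disjoint pairs only
print(has_knot([(1, 10), (5, 15)]))           # two crossing pairs
```

Nested pairs such as (1, 10) and (2, 9), or disjoint pairs such as (1, 10) and (12, 20), pass the check; only crossing pairs are rejected.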

  5. RNA Structure Prediction: Principle of optimality • Assumption 1: base pairs do not affect each other’s energy • One can then sum the energy contributions of all base pairs in a configuration and check which configuration gives the lowest total • The number of configurations is exponential • A further assumption is needed

  6. RNA Structure Prediction: DP Algorithm • Assume the energy of each component can be calculated independently • a(r,k): free energy of base pair (r,k), where r and k are from {A, C, G, U} • a is set to zero when a base would pair with itself (impossible)

  7. RNA Structure Prediction: DP Algorithm • E(S_{i,j}) = min{ E(S_{i+1,j-1}) + a(r_i,r_j), when i and j pair; min{ E(S_{i,k-1}) + E(S_{k,j}) }, when j pairs with k, i < k <= j } • Compute the (n x n) matrix over i and j bottom-up, for j-i = 0, j-i = 1, j-i = 2, … • Complexity: O(n^3)
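A minimal sketch of this bottom-up table fill, under stated assumptions: the values in `pair_energy` are hypothetical stand-ins for a(r,k), and the split case is read as cutting the string into S_{i,k-1} and S_{k,j} so that every base is covered. This simple model has no loop penalties, so even adjacent bases may pair.

```python
def pair_energy(x, y):
    # Hypothetical a(r, k): negative for complementary bases, zero otherwise.
    table = {("A", "U"): -2.0, ("U", "A"): -2.0,
             ("C", "G"): -3.0, ("G", "C"): -3.0}
    return table.get((x, y), 0.0)

def min_energy(seq):
    """Minimum free energy of seq under independent base-pair energies."""
    n = len(seq)
    if n == 0:
        return 0.0
    E = [[0.0] * n for _ in range(n)]      # single bases have energy 0

    def get(i, j):
        # Empty substrings (i > j) contribute no energy.
        return E[i][j] if i <= j else 0.0

    for d in range(1, n):                  # d = j - i, shortest spans first
        for i in range(n - d):
            j = i + d
            # Case 1: i pairs with j.
            best = get(i + 1, j - 1) + pair_energy(seq[i], seq[j])
            # Case 2: split into S[i..k-1] and S[k..j], i < k <= j.
            for k in range(i + 1, j + 1):
                best = min(best, get(i, k - 1) + get(k, j))
            E[i][j] = best
    return E[0][n - 1]

print(min_energy("GGGAAACCC"))
```

The three nested loops over d, i and k give the O(n^3) running time quoted on the slide.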

  8. RNA Structure Prediction: relax assumptions • Consider energy functions beyond the base-pair terms a(r,k) • This means distinguishing different “types” of base pairings • This gives a more practical topology

  9. RNA Structure Prediction: Loops • Say there is a base pair at (i,j) • A base v, with i < v < j, is accessible from base pair (i,j) if there is no base pair (u,w) with i < u < v < w < j • The loop closed by (i,j) is the set of bases accessible from (i,j) • Note: still no knots • Some loops: p249
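The accessibility rule translates directly into code. This sketch assumes that a base v is hidden from the closing pair (i, j) exactly when some inner base pair (u, w) strictly encloses it; `loop_bases` and the example structure are hypothetical.

```python
def loop_bases(pairs, closing):
    """Bases accessible from the closing pair, i.e. the loop it closes."""
    i, j = closing
    # Only pairs strictly inside (i, j) can hide a base from it.
    inner = [(u, w) for (u, w) in pairs if i < u < w < j]
    return [v for v in range(i + 1, j)
            if not any(u < v < w for (u, w) in inner)]

structure = [(1, 12), (3, 8)]
print(loop_bases(structure, (1, 12)))  # bases 4..7 are hidden by (3, 8)
print(loop_bases(structure, (3, 8)))   # nothing inside, so a hairpin loop
```

Note that the two bases of an inner pair, here 3 and 8, still count as accessible from the outer pair; only the bases strictly between them are hidden.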

  10. RNA Structure Prediction: Energy over loops • Say base pair (i,j) closes a loop • S_{i+1,j-1} may then not be in its minimum-energy configuration • Because the energy of S_{i+1,j-1} plus the free energy a(r_i,r_j) may be less than the minimum energy of the string (i+1 to j-1) computed without the base pair at (i,j) • This interaction was ignored under the previous assumption • Dynamic programming can still be done if we explicitly specify the energy parameters

  11. RNA Structure Prediction: Energy over loops • E(S_{i,j}) = min{ E(S_{i+1,j}), when i is not paired; E(S_{i,j-1}), when j is not paired; min{ E(S_{i,k-1}) + E(S_{k,j}) }, when i or j pairs with some other base and the string splits at k, i < k < j; E(L_{i,j}), when (i,j) base pairs and all special structures may appear within [embeds the first formula of the previous assumption] }

  12. RNA Structure Prediction: More assumptions • Disregard free energies that do not belong to any loop • The final energy of the string is just the sum of the energies of its components: no interaction between components • Only the 4 types of loops on p249 are used for E(L_{i,j}) (more can be added if their energy parameterization is known)

  13. RNA Structure Prediction: free energies for 4 loops • Hairpin loop of size k: zi(k) • Additional stabilizing energy for two adjacent base pairs (in addition to a(r,k)): eta, a constant • Destabilizing energy for a bulge of size k: beta(k) • Destabilizing energy for an interior loop of size k: gamma(k)

  14. RNA Structure Prediction: E(L_{i,j}) • Hairpin: a(r_i,r_j) + zi(j-i-1) • Stacked pair: a(r_i,r_j) + eta + E(S_{i+1,j-1}) • Bulge on i: min{ a(r_i,r_j) + beta(k) + E(S_{i+k+1,j-1}) }, k >= 1 • Bulge on j: min{ a(r_i,r_j) + beta(k) + E(S_{i+1,j-k-1}) }, k >= 1 • Interior loop: min{ a(r_i,r_j) + gamma(k1+k2) + E(S_{i+k1+1,j-k2-1}) }, k1, k2 >= 1
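The loop-based recurrence can be sketched as two mutually recursive, memoized functions. Everything numeric here is a hypothetical placeholder (the table `PAIR`, the constant `ETA`, and the linear penalties `zi`, `beta`, `gamma`). Two further assumptions are made: loop size counts unpaired bases, so a hairpin closed by (i, j) has size j-i-1, and the structure inside a stacked pair, bulge or interior loop is required to be closed by a base pair of its own, which is a stricter reading than the E(S) term on the slide.

```python
import math
from functools import lru_cache

# Hypothetical energy parameters, chosen only for illustration.
PAIR = {("A", "U"): -2.0, ("U", "A"): -2.0, ("C", "G"): -3.0, ("G", "C"): -3.0}
ETA = -1.0                          # stabilizing bonus for a stacked pair

def a(x, y):                        # non-complementary bases cannot close a loop
    return PAIR.get((x, y), math.inf)

def zi(k):                          # hairpin loop with k unpaired bases
    return 1.0 + 0.5 * k

def beta(k):                        # bulge of size k
    return 1.0 + 0.5 * k

def gamma(k):                       # interior loop of total size k
    return 1.0 + 0.5 * k

def fold_energy(seq):
    @lru_cache(maxsize=None)
    def E(i, j):
        """Minimum energy of seq[i..j]."""
        if i >= j:
            return 0.0
        # i unpaired, j unpaired, or (i, j) closes a loop.
        best = min(E(i + 1, j), E(i, j - 1), L(i, j))
        for k in range(i + 1, j):   # split into seq[i..k-1] and seq[k..j]
            best = min(best, E(i, k - 1) + E(k, j))
        return best

    @lru_cache(maxsize=None)
    def L(i, j):
        """Minimum energy of seq[i..j] given that (i, j) is a base pair."""
        if i >= j or math.isinf(a(seq[i], seq[j])):
            return math.inf
        base = a(seq[i], seq[j])
        best = min(base + zi(j - i - 1),                # hairpin
                   base + ETA + L(i + 1, j - 1))        # stacked pair
        for k in range(1, j - i - 2):
            best = min(best,
                       base + beta(k) + L(i + k + 1, j - 1),   # bulge on i
                       base + beta(k) + L(i + 1, j - k - 1))   # bulge on j
        for k1 in range(1, j - i - 2):                  # interior loop
            for k2 in range(1, j - i - 2 - k1):
                best = min(best, base + gamma(k1 + k2)
                           + L(i + k1 + 1, j - k2 - 1))
        return best

    return E(0, len(seq) - 1)

print(fold_energy("GGGAAAACCC"))
```

The recursion is only suitable for short sequences; a table filled bottom-up would avoid Python's recursion limit. The double loop over k1 and k2 is the term that drives the overall cost to O(n^4).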

  15. RNA Structure Prediction: complexity • O(n^2) table entries • For each entry: • First 2 formulae: O(1) each, O(n^2) in total • Third formula: O(n), O(n^3) in total • 4.1 (E(L), hairpin): O(1), O(n^2) in total • 4.2 (stacked pair): O(1), O(n^2) in total • 4.3 (bulge on i): O(n), loop over k, O(n^3) in total • 4.4 (bulge on j): O(n), loop over k, O(n^3) in total • 4.5 (interior loop): O(n^2), loop over k1 and k2, O(n^4) in total • Final complexity, from 4.5: O(n^4)

  16. Protein Threading • Interactions in proteins are among 20 x 20 pairs of residue types, as opposed to at most 4 x 4 pairs of nucleotides in RNA • Residue interactions are quite non-local, causing much more structural complexity • Proteins have frequent loops (helices are loops) • So ab initio prediction is extremely difficult

  17. Protein Threading • The number of protein folds is small (~1,000 for 20,000+ proteins) • Threading: map the target sequence onto a template fold • Threading is an alignment problem (Torda, Fig 1) • Find the fold to which the target “aligns” optimally (minimum “energy” function) • Needs basic scoring functions, as in sequence alignment

  18. Protein Threading: number of folds • The more folds in the database, the more time it takes to find the correct template • The scoring function for threading is quite imperfect, so more available templates are needed (contradictory requirements)

  19. Protein Threading: Scoring functions • A full force field is not necessarily ideal: • It involves dynamics between molecules, stretch, torsion, etc. • These are unimportant for a static alignment

  20. Protein Threading: Scoring functions • A scoring function could be defined between residues of the same sequence, for coming close to each other in the alignment • Torda, Fig 5 • Example scoring function (free energy): • For a pair of residues A and B at distance r (Torda, p7): G_AB(r) = -kT ln( rho_AB(r) / rho0_AB(r) ), where rho_AB(r) is the probability of A and B being at distance r and rho0_AB(r) is the probability of that occurring at random (k, T as usual)
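A small numerical sketch of such a knowledge-based potential, assuming the sign convention in which pairs seen more often than random get negative (favorable) energy. The counts are invented for illustration and do not come from the PDB; the distance bins, the temperature of 300 K and the function names are likewise assumptions.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K), molar units

def pair_free_energy(rho, rho0, temperature=300.0):
    """G = -kT ln(rho / rho0) for one residue pair at one distance."""
    return -K_B * temperature * math.log(rho / rho0)

def probabilities(counts):
    """Turn raw counts per distance bin into probabilities."""
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

# Hypothetical counts per distance bin (in angstroms): how often the residue
# pair A-B, and how often any residue pair, is observed at that distance.
counts_ab = {4.0: 30, 6.0: 50, 8.0: 20}
counts_all = {4.0: 200, 6.0: 450, 8.0: 350}

rho, rho0 = probabilities(counts_ab), probabilities(counts_all)
potential = {r: pair_free_energy(rho[r], rho0[r]) for r in rho}
for r in sorted(potential):
    print(r, potential[r])
```

With these toy counts the pair is over-represented at 4 Å and under-represented at 8 Å, so the potential is negative at the short distance and positive at the long one.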

  21. Protein Threading: Scoring functions • Probabilities are collected from PDB proteins with known structures • Different threading schemes use different scoring functions, but most are derived from the PDB

  22. Protein Threading: Scoring functions • Example (Setubal-Meidanis, p257): • G1(i, t_i) for placing the i-th residue of the sequence at position t_i in the fold • G2(i, j, t_i, t_j) for the simultaneous placement of residues i and j, for i < j • Each t_i is constrained to a range, say b_i < t_i < e_i
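The total score of one placement is the sum of the singleton and pairwise terms. The sketch below uses a made-up hydrophobic/polar toy model; `seq`, `buried`, `contacts` and the numeric rewards are all hypothetical, and only the G1 + G2 structure follows the slide.

```python
from itertools import combinations

def threading_score(t, g1, g2):
    """Sum of G1(i, t_i) over residues plus G2(i, j, t_i, t_j) over pairs i < j."""
    n = len(t)
    single = sum(g1(i, t[i]) for i in range(n))
    pairwise = sum(g2(i, j, t[i], t[j]) for i, j in combinations(range(n), 2))
    return single + pairwise

# Toy model: H = hydrophobic residue, P = polar residue.
seq = "HPPH"
buried = [True, False, False, True, True]   # one flag per template position
contacts = {(0, 3), (0, 4)}                 # template positions in contact

def g1(i, ti):
    # Reward hydrophobic residues in buried positions and polar ones outside.
    return -1.0 if (seq[i] == "H") == buried[ti] else 1.0

def g2(i, j, ti, tj):
    # Reward two hydrophobic residues placed on contacting positions.
    both_h = seq[i] == "H" and seq[j] == "H"
    return -2.0 if both_h and (ti, tj) in contacts else 0.0

print(threading_score([0, 1, 2, 3], g1, g2))
print(threading_score([1, 2, 3, 4], g1, g2))
```

Here a placement `t` lists, for each residue in order, the template position it occupies; lower scores are better.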

  23. Protein Threading • Optimization is over not only the placement but also the multiple folds in the database • Accuracy is very sensitive to alignment errors

  24. Protein Threading: Dynamic programming • Advantage/disadvantage of DP is that it is deterministic • Problem: “adjacency” is hard to define in 3D

  25. Protein Threading: Dynamic programming • DP: try out different combinations of “adjacent” residues on different parts of a template (Torda, Fig 5c: adjacency comes from the template sequence) • Start with a smaller number of elements and build up to the full sequence • Alternative approach: start by placing each residue at one of its “possible” positions and see where the next residue should go; continue residue by residue

  26. Protein Threading: Probabilistic algorithm • Monte Carlo simulation: randomly place residues at positions on the fold and check the aggregate scoring function • Simulated annealing: gradually move residues to optimize, making stochastic random shifts to avoid local optima • Time consuming, and the result is non-deterministic
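A minimal simulated-annealing sketch for the placement problem. It assumes order-preserving placements (residue i sits before residue i+1 on the template), single-position shifts as the move set, a geometric cooling schedule and the Metropolis acceptance rule; none of these choices come from the slides, and the quadratic toy score is purely illustrative.

```python
import math
import random

def anneal(score, n, m, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Place n residues on m template positions, keeping their order."""
    rng = random.Random(seed)
    current = list(range(n))                  # start with the leftmost placement
    cur_s = score(current)
    best, best_s = current[:], cur_s
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        new_pos = current[i] + rng.choice((-1, 1))
        lo = current[i - 1] + 1 if i > 0 else 0
        hi = current[i + 1] - 1 if i < n - 1 else m - 1
        if not lo <= new_pos <= hi:
            continue                          # move would break the ordering
        cand = current[:]
        cand[i] = new_pos
        delta = score(cand) - cur_s
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_s = cand, cur_s + delta
            if cur_s < best_s:
                best, best_s = current[:], cur_s
    return best, best_s

# Toy score: squared distance of each residue from a hypothetical ideal position.
target = [1, 3, 6]
toy_score = lambda t: float(sum((p - q) ** 2 for p, q in zip(t, target)))
print(anneal(toy_score, n=3, m=8))
```

Changing `seed` can change the result, which is the non-determinism the slide mentions; the best placement seen so far is returned rather than the final state.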

  27. Protein Threading: Branch and bound • In the worst case, try all possible alignments, but prune non-useful branches of the search space using some bounding function
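One possible bounding function, sketched under stated assumptions: placements preserve residue order, the score is a sum of singleton terms `g1` and pairwise terms `g2`, and a known lower limit `g2_min` on any pairwise term is supplied by the caller. The bound is deliberately simple and loose; the toy model (`seq`, `buried`, `contacts`) is hypothetical.

```python
import math

def branch_and_bound(n, m, g1, g2, g2_min):
    """Exact minimum-score order-preserving placement of n residues on m positions."""
    best = {"score": math.inf, "t": None}
    # Cheapest singleton term each residue could ever get, summed over suffixes.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + min(g1(i, p) for p in range(m))
    total_pairs = n * (n - 1) // 2

    def extend(t, partial):
        k = len(t)
        if k == n:
            if partial < best["score"]:
                best["score"], best["t"] = partial, t[:]
            return
        # Lower bound on any completion of the partial placement t.
        open_pairs = total_pairs - k * (k - 1) // 2
        if partial + suffix[k] + open_pairs * g2_min >= best["score"]:
            return                            # prune: cannot beat the incumbent
        start = t[-1] + 1 if t else 0
        for p in range(start, m - (n - k - 1)):   # leave room for later residues
            added = g1(k, p) + sum(g2(i, k, t[i], p) for i in range(k))
            t.append(p)
            extend(t, partial + added)
            t.pop()

    extend([], 0.0)
    return best["t"], best["score"]

# Hypothetical toy model: H = hydrophobic, P = polar.
seq = "HPPH"
buried = [True, False, False, True, True]
contacts = {(0, 3), (0, 4)}

def g1(i, ti):
    return -1.0 if (seq[i] == "H") == buried[ti] else 1.0

def g2(i, j, ti, tj):
    both_h = seq[i] == "H" and seq[j] == "H"
    return -2.0 if both_h and (ti, tj) in contacts else 0.0

print(branch_and_bound(len(seq), len(buried), g1, g2, g2_min=-2.0))
```

Because a branch is cut only when its lower bound cannot beat the best complete placement found so far, the result is still exact; a tighter bound prunes more and runs faster.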

  28. Protein Threading: Search on folds • Divide and conquer over the space of folds • Assumption: folds can be ordered by their “goodness” for the target protein • Example: Setubal-Meidanis, p258

  29. Protein Threading: Future • Slow • Subsumed by ab initio projects of the IBM Blue Gene™ type • De novo technique using linear programming (Xu and Li, 2003) • Threading techniques are useful not only for structure prediction but also for the fold recognition problem: no alignment, just find the template (fold suggests function)
