Comparative Modeling for Beta Protein Structure Prediction

Comparative Modeling for Beta Protein Structure Prediction. Lenore J. Cowen Tufts University. Amino Acids. A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids (a.k.a. residues).

Presentation Transcript


  1. Comparative Modeling for Beta Protein Structure Prediction. Lenore J. Cowen, Tufts University

  2. Amino Acids: A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids (a.k.a. residues). There are 20 different kinds of amino acids, each consisting of up to 18 atoms, e.g.:

         Name            3-letter code   1-letter code
         Leucine         Leu             L
         Alanine         Ala             A
         Serine          Ser             S
         Glycine         Gly             G
         Valine          Val             V
         Glutamic acid   Glu             E
         Threonine       Thr             T

  3. Protein Structure: [Figure: chemical structure of the peptide Asp-Arg-Val-Tyr-Ile-His-Pro-Phe, from the H3N+ terminus to the COO- terminus, with the repeating backbone structure marked and the side chains attached to it.] Protein sequence: DRVYIHPF
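The relation between the one-letter sequence and the three-letter residue names on this slide can be sketched in a few lines of Python. This is only an illustration: the lookup table holds the standard codes for all 20 amino acids (the slides list just seven of them), and the function name `to_three_letter` is an assumption made for this sketch.

```python
# Standard one-letter -> three-letter amino acid codes (all 20 residues).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str) -> str:
    """Spell out a one-letter protein sequence using three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


print(to_three_letter("DRVYIHPF"))
# Asp-Arg-Val-Tyr-Ile-His-Pro-Phe
```

A letter outside the 20 standard codes (such as X) raises a `KeyError` here, which is a deliberate choice for a sketch: it makes unexpected symbols visible instead of silently dropping them.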

  4. Protein Folding Problem Given an amino acid sequence, e.g., MDPNCSCAAAGDSCTCANSCTCLACKCTSCK, how will it fold in 3D? The fold is important because it determines the function of the protein.

  5. Note: The pictures I’ve been giving are “cartoons” of the backbone

  6. The Inverse Protein Folding Problem: Instead of taking a sequence and asking what its fold is, take a fold and ask for all the sequences that form that fold. …VLWIXS… …SSCILWG…

  7. What do we mean by “that fold”?

  8. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

  9. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

  10. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

  11. Can we recognize and model all folds that form a beta-trefoil, etc.? • If they are evolutionarily close enough the answer is YES. • Use BLAST to recognize homology (similar sequences have similar folds) and align conserved parts of the backbone. …GVFIIIMGSHGK… …GVD-LMG-HGR…

  12. Comparative modeling • Once the backbone of the conserved core is fixed, pack in the sidechains • Add loops and unstructured regions.

  13. Can we recognize and model all folds that form a beta-trefoil, etc.? • But STRUCTURE can be more CONSERVED than sequence: maybe the structures align, but we can no longer use BLAST because the sequence similarity is too weak. …GVFIIIMGSHGK… …GR--CV-GCAGR…

  14. Comparative modeling • If you CAN find the correct alignment, you can proceed as before. • Once the backbone of the conserved core is fixed, pack in the sidechains • Add loops and unstructured regions.

  15. Approaches to Structural Motif Recognition • Statistical template/profile methods (Altschul et al. 1990) • Hidden Markov Models (Eddy, 1998) • Threading Methods (Jones et al. 1992) • Combinations of two or more of the above

  16. Our Results Recognizing the Beta Helix and Beta Trefoil Folds

  17. The Right-handed Parallel Beta-Helix A processive fold composed of repeated super-secondary units. Each rung consists of three beta-strands separated by turn regions. No sequence repeat. Pectate Lyase C (Yoder et al. 1993)

  18. Biological Importance of Beta Helices • Surface proteins in human infectious disease: virulence factors, adhesins, toxins, allergens • Proposed as a model for amyloid fibrils (e.g. Alzheimer’s and Creutzfeldt-Jakob) • Virulence factors in plant pathogens

  19. What Was Known: Solved beta-helix structures: 12 structures in PDB in 7 different SCOP families • Pectate Lyase: Pectate Lyase C, Pectate Lyase E, Pectate Lyase • Galacturonase: Polygalacturonase, Polygalacturonase II, Rhamnogalacturonase A • Pectin Lyase: Pectin Lyase A, Pectin Lyase B • Chondroitinase B • Pectin Methylesterase • P.69 Pertactin • P22 Tailspike

  20. BetaWrap Program • [Bradley, Cowen, Menke, King, Berger, PNAS, 2001, 98:26, 14,819-14,824; Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276] • Performance: • On PDB: no false positives & no false negatives. Recognizes beta helices in PDB across SCOP families in cross-validation. • Recognizes many new potential beta helices when run on larger sequence databases. • Runs in linear time (~5 min. on SWISS-PROT).

  21. BetaWrap Program • Histogram of protein scores for: beta helices not in database (12 proteins); non-beta helices in PDB (1346 proteins)

  22. Single Rung of a Beta Helix

  23. 3D Pairwise Correlations • Stacking residues in adjacent beta-strands exhibit strong correlations. • Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking). [Figure: rung labeled B1, B2, B3 and T2.]

  24. Question: how can we find these correlations which are a variable distance apart in sequence?

  25. Finding Candidate Wraps • Assume we have the correct location of a single T2 turn (fixed B2 & B3). • Generate the 5 best-scoring candidates for the next rung. [Figure: candidate rung labeled B3, T2, B2.]

  26. Scoring Candidate Wraps (rung-to-rung): The rung-to-rung alignment score incorporates: • Beta-sheet pairwise alignment preferences taken from amphipathic beta structures in PDB (w/o beta helices). • Additional stacking bonuses on internal pairs. • Distribution on turn lengths.

  27. Scoring Candidate Wraps (5 rungs) • Iterate out to 5 rungs, generating candidate wraps. • Score each wrap: sum the rung-to-rung scores; apply the B1 correlations filter; screen for alpha-helical content.
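The wrap search on slides 25 to 27 can be sketched roughly as follows. This is not the BetaWrap implementation: the slides do not say how a rung is represented, how candidates are enumerated, or how many partial wraps are kept, so the sketch assumes a rung is identified by one integer position, takes hypothetical `next_rungs` and `rung_score` callables from the caller, and simply extends every partial wrap by its k best next rungs. The B1 correlations filter and the alpha-helix screen are left out.

```python
from typing import Callable, List, Sequence, Tuple

Rung = int  # assumption: a rung is identified by the position of its T2 turn
Wrap = Tuple[float, Tuple[Rung, ...]]  # (summed score, rungs in order)


def enumerate_wraps(
    start: Rung,
    next_rungs: Callable[[Rung], Sequence[Rung]],
    rung_score: Callable[[Rung, Rung], float],
    n_rungs: int = 5,
    k: int = 5,
) -> List[Wrap]:
    """Return candidate wraps of n_rungs rungs, best summed score first."""
    wraps: List[Wrap] = [(0.0, (start,))]
    for _ in range(n_rungs - 1):
        extended: List[Wrap] = []
        for total, rungs in wraps:
            last = rungs[-1]
            # Keep only the k best-scoring candidates for the next rung.
            best = sorted(
                next_rungs(last),
                key=lambda r: rung_score(last, r),
                reverse=True,
            )[:k]
            for r in best:
                # A wrap's score is the sum of its rung-to-rung scores.
                extended.append((total + rung_score(last, r), rungs + (r,)))
        wraps = extended
    return sorted(wraps, key=lambda w: w[0], reverse=True)


# Toy usage with made-up numbers: the next rung starts 18-27 residues
# downstream, and the toy score prefers a spacing of 22.
toy_wraps = enumerate_wraps(
    start=100,
    next_rungs=lambda last: range(last + 18, last + 28),
    rung_score=lambda last, r: -abs((r - last) - 22),
)
print(len(toy_wraps), toy_wraps[0])
```

With 5 candidates per step and 4 extension steps, this enumerates 5^4 = 625 wraps, which is small enough to score exhaustively.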

  28. Predicted Beta Helices • Features of the 200 top-scoring proteins in the NCBI’s protein sequence database: • Many proteins of similar function to the known beta-helices; some with similar sequences. • A significant fraction are characterized as microbial outer membrane or cell-surface proteins. • Mouse, human, worm and fly sequences significantly underrepresented – only two proteins!

  29. Some Predicted Beta Helices in Human Pathogens

         Disease                     Organism
         Cholera                     Vibrio cholerae
         Ulcers                      Helicobacter pylori
         Malaria                     Plasmodium falciparum
         Venereal infection          Chlamydia trachomatis
         Respiratory infection       Chlamydophila pneumoniae
         Listeriosis                 Listeria monocytogenes
         Sleeping sickness           Trypanosoma brucei
         Lyme disease                Borrelia burgdorferi
         Leishmaniasis               Leishmania donovani
         Respiratory infection       Bordetella bronchiseptica
         Sleeping sickness           Trypanosoma cruzi
         Whooping cough              Bordetella parapertussis
         Anthrax                     Bacillus anthracis
         Rocky Mtn. spotted fever    Rickettsia rickettsii
         Oriental spotted fever      Rickettsia japonica
         Meningitis                  Neisseria meningitidis
         Legionnaire’s disease       Legionella pneumophila

  30. The Beta-Trefoil: The beta-trefoil consists of three leaves around an axis of three-fold symmetry. [Figure: a single leaf (strands B1-B4, turns T1-T3, cap and barrel regions labeled), repeated x3 to give the entire trefoil (3 leaves); structure 1BFF (Kitagawa et al. 1991).]

  31. Templates: A leaf template consists of: • a B1-strand, followed by a T1 turn of length 2 to 17, followed by • a B2-strand, followed by a T2 turn of length 0 to 11, followed by a B3-strand, followed by • a T3 turn of length 4 to 20, followed by a B4 strand. In addition, it is between 26 and 64 residues long. A trefoil template consists of three leaf templates separated by two T4 turns of length 0 to 16. [Figure: cap template, labeled B1-B4 and T1-T3.]
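The length constraints in the leaf and trefoil templates translate directly into a validity check. The sketch below encodes only the ranges quoted on the slide. The slide gives no bounds on strand lengths, so the code assumes only that each strand has at least one residue, and all function and constant names are chosen for illustration.

```python
from typing import Sequence, Tuple

# Inclusive ranges quoted on the slide.
T1_RANGE = (2, 17)
T2_RANGE = (0, 11)
T3_RANGE = (4, 20)
T4_RANGE = (0, 16)
LEAF_RANGE = (26, 64)

Leaf = Tuple[Sequence[int], Sequence[int]]  # (strand lengths, turn lengths)


def in_range(value: int, bounds: Tuple[int, int]) -> bool:
    return bounds[0] <= value <= bounds[1]


def is_valid_leaf(strands: Sequence[int], turns: Sequence[int]) -> bool:
    """strands holds the lengths of B1..B4, turns the lengths of T1..T3."""
    if len(strands) != 4 or len(turns) != 3:
        return False
    t1, t2, t3 = turns
    total = sum(strands) + sum(turns)
    return (
        all(s >= 1 for s in strands)  # assumption: strands are non-empty
        and in_range(t1, T1_RANGE)
        and in_range(t2, T2_RANGE)
        and in_range(t3, T3_RANGE)
        and in_range(total, LEAF_RANGE)
    )


def is_valid_trefoil(leaves: Sequence[Leaf], t4_turns: Sequence[int]) -> bool:
    """Three valid leaves separated by two T4 turns of permitted length."""
    return (
        len(leaves) == 3
        and len(t4_turns) == 2
        and all(is_valid_leaf(strands, turns) for strands, turns in leaves)
        and all(in_range(t, T4_RANGE) for t in t4_turns)
    )


leaf = ([5, 5, 5, 5], [4, 3, 6])  # 33 residues in total
print(is_valid_trefoil([leaf, leaf, leaf], [0, 16]))  # True
```

A check like this only prunes impossible placements; choosing among the placements that pass is the job of the pairwise scoring described on the following slides.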

  32. What Pairs Do We Consider? In both the barrel and the cap, we consider both directly aligned pairs of residues and pairs of residues one-off from each other. Different tables are used for pairwise preferences for buried, exposed, and one-off pairs of residues. [Figure: leaf diagram labeled B1-B4 and T1-T3.]

  33. Packing moves earlier in the modeling process • In order to produce more accurate sequence-structure alignments, we return several possible “wraps” and try to pack sidechains. • So sidechain packing is used earlier in the comparative modeling process, where it also helps find the correct sequence-structure alignment.

  34. The Packing Function • Top wraps are fed to the packing function. • SCWRL (Canutescu, 2003) is better at packing caps than barrels. • Input to SCWRL: atomic coordinates of the backbone of cap strand pairs from a member of each trefoil superfamily in the training set; top 4 wraps of the target sequence onto the trefoil template. • Return the best-scoring wrap with a good packing, if one exists; otherwise reject.

  35. Example of the Packing Phase

     Partial PDB file from actual trefoil:

         ATOM 4340 N   LEU B 196 41.442 …
         ATOM 4341 CA  LEU B 196 40.705 …
         ATOM 4342 C   LEU B 196 40.704 …
         ATOM 4343 O   LEU B 196 41.787 …
         ATOM 4344 CB  LEU B 196 41.441 …
         ATOM 4345 CG  LEU B 196 41.503 …
         ATOM 4346 CD1 LEU B 196 41.902 …
         ATOM 4347 CD2 LEU B 196 40.155 …
         ATOM 4348 H   LEU B 196 42.299 …
         ATOM 4349 N   THR B 197 39.524 …
         ATOM 4350 CA  THR B 197 39.397 …
         ATOM 4351 C   THR B 197 38.506 …
         ATOM 4352 O   THR B 197 37.700 …
         ATOM 4353 CB  THR B 197 38.704 …
         ATOM 4354 OG1 THR B 197 39.307 …
         ATOM 4355 CG2 THR B 197 38.808 …
         ATOM 4356 H   THR B 197 38.752 …
         …

     Predicted cap atomic positions (output of SCWRL):

         ATOM 1  N   LEU 1 41.442 …
         ATOM 2  CA  LEU 1 40.705 …
         ATOM 3  C   LEU 1 40.704 …
         ATOM 4  O   LEU 1 41.787 …
         ATOM 5  CB  LEU 1 41.412 …
         ATOM 6  CG  LEU 1 40.686 …
         ATOM 7  CD1 LEU 1 39.364 …
         ATOM 8  CD2 LEU 1 41.533 …
         ATOM 9  N   ARG 2 39.524 …
         ATOM 10 CA  ARG 2 39.397 …
         ATOM 11 C   ARG 2 38.506 …
         ATOM 12 O   ARG 2 37.700 …
         ATOM 13 CB  ARG 2 38.788 …
         ATOM 14 CG  ARG 2 39.658 …
         ATOM 15 CD  ARG 2 38.984 …
         ATOM 16 NE  ARG 2 39.799 …
         ATOM 17 CZ  ARG 2 39.404 …
         …

     Known cap (positions 1-5 and 6-10): LTSKD STILL
     Cap from top wrap (positions 1-5 and 6-10): LRVYY RILHN

     [Figure: the ten cap positions drawn on strands B2 and B3, with a steric clash marked; structure 1ABR (Tahirov et al. 1995).]

  36. Toward Automation • For each SCOP beta-structural template: * align all known examples of fold * find pairs in conserved core * thread onto template (additionally use profiles); find candidate alignments • Pack sidechains for each, determine best structure • Place loops and unstructured regions

  37. Toward Automation • For each SCOP beta-structural template: * align all known examples of fold * find pairs in conserved core * thread onto template (additionally use profiles); find candidate alignments • Pack sidechains for each, determine best structure • Place loops and unstructured regions

  38. Multiple Structure Alignment for Remote Protein Homologs • We spend the remainder of the talk discussing our new program for multiple structure alignment: MATT

  39. The Multiple Structure Alignment Problem Input: atomic coordinates for the backbones of m protein structures Output: A sequence alignment of the protein structures, together with a superimposition of the structures in 3D space.

  40. The Multiple Structure Alignment Problem: Def: the common core of a multiple structure alignment is the set of positions where every structure contributes a residue to the alignment.
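Read as a statement about alignment columns, the common core can be computed with a short scan. The sketch assumes the alignment is given as equal-length strings, one per structure, with "-" marking a gap; that representation and the example rows are assumptions made for illustration, not taken from the slides.

```python
from typing import List, Sequence


def common_core_columns(alignment: Sequence[str], gap: str = "-") -> List[int]:
    """Return the column indices in which no structure has a gap."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    return [
        col
        for col in range(width)
        if all(row[col] != gap for row in alignment)
    ]


# Made-up three-structure alignment.
rows = ["GVFIIIMG", "GV-LL-MG", "G-FIL-MG"]
core = common_core_columns(rows)
print(core, len(core))
# [0, 3, 4, 6, 7] 5
```

The length of the returned list is the common core size that the geometric criterion on the next slide asks an aligner to maximize.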

  41. The Multiple Structure Alignment Problem: Geometric criteria: Good multiple structure alignments MAXIMIZE common core size while MINIMIZING pairwise RMSDs between structures. Note: even simplified versions are NP-hard (Goldman, Istrail and Papadimitriou, 1999).
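For reference, the RMSD between two structures over N aligned positions is the square root of the mean squared distance between corresponding atoms. The sketch below computes exactly that for two coordinate lists that are assumed to be already superimposed and listed in matching order. It does not search for the optimal superposition, which a real aligner would have to do, and the coordinates in the usage lines are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two equally long, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two made-up two-atom "structures" that differ in one atom by 2 units.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(a, b))  # sqrt(4 / 2), about 1.414
```

Because the sum runs only over aligned positions, a smaller common core can lower the RMSD at no geometric cost, which is why the two criteria on the slide have to be traded off against each other.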

  42. The Multiple Structure Alignment Problem Discrimination criteria: Good multiple structure alignments align what is “supposed to be aligned” because it is part of the evolutionarily conserved core.

  43. Approaches to Structure Alignment • AFP (aligned fragment pair) chaining methods align all short pieces and chain them together using dynamic programming • Contact map methods look for similarities within distance matrices • Geometric hashing, secondary structure elements, etc. [Figure: AFP chaining.]
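The chaining step in AFP methods can be illustrated with a small dynamic program. This is a generic sketch, not the algorithm of any aligner named in these slides: it assumes each AFP is described by its start positions in the two structures, a length of at least 1, and a precomputed score, and it picks the highest-scoring chain in which every AFP begins after the previous one ends in both structures. Gap penalties and any flexibility between fragments are left out.

```python
from typing import List, NamedTuple


class AFP(NamedTuple):
    i: int        # start position in structure A
    j: int        # start position in structure B
    length: int   # number of aligned residues (assumed >= 1)
    score: float  # similarity score of the fragment pair


def chain_afps(afps: List[AFP]) -> List[AFP]:
    """Return the best-scoring chain of mutually compatible AFPs."""
    ordered = sorted(afps, key=lambda a: (a.i, a.j))
    if not ordered:
        return []
    best = [a.score for a in ordered]  # best chain score ending at each AFP
    prev = [-1] * len(ordered)         # back-pointers for the traceback
    for k, cur in enumerate(ordered):
        for p in range(k):
            before = ordered[p]
            fits = (
                before.i + before.length <= cur.i
                and before.j + before.length <= cur.j
            )
            if fits and best[p] + cur.score > best[k]:
                best[k] = best[p] + cur.score
                prev[k] = p
    # Trace back from the AFP that ends the best chain.
    k = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


# Made-up fragment pairs; three of them lie on a common diagonal.
fragments = [
    AFP(0, 0, 5, 5.0), AFP(3, 10, 5, 4.0), AFP(6, 6, 5, 5.0),
    AFP(12, 2, 5, 6.0), AFP(12, 12, 5, 5.0),
]
print([(a.i, a.j) for a in chain_afps(fragments)])
# [(0, 0), (6, 6), (12, 12)]
```

The double loop makes this quadratic in the number of AFPs, which is acceptable for a sketch but would need pruning for long structures.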

  44. Some Popular Structure Aligners: Dali (Holm 93), VAST (Bryant 96), LOCK (Singh 97), FlexProt (Shatsky et al. 02), FATCAT (Ye&Godzik 04), LOVOALIGN (Andreani et al. 06), CE/CE-MC (Shindyalov 2000), SSAP (Orengo&Taylor 96), MultiProt (Shatsky&Wolfson 04), POSA (Ye&Godzik 05), Mustang (Konagurthu et al. 06), CBA (Ebert 07)

  45. The Benchmark Datasets • Globins • Homstrad: 1028 alignments; each alignment contains 2-41 structures; 399 sets with > 2 structures

  46. The Benchmark Datasets: Sabmark • Superfamily set: 3645 domains in 426 subsets • Twilight zone set: 1740 domains in 209 subsets • Both sets contain: between 3 and 25 structures; decoy structures (sequence matches that reside in different SCOP domains)

  47. Matt: Multiple Alignment with Translation and Twists • Matt is an AFP chaining method that adds flexibility in the form of geometrically impossible bends and breaks.

  48. Other work modeling flexibility • In structure alignment: • Flexprot [Shatsky et al., 2002] • Fatcat/POSA [Ye&Godzik, 2004, 2005] • For other reasons: • Molecular docking [Echols et al,03; Bonvin,06] • Ligand binding [Lemmen et al, 2006] • Decoy construction [Singh&Berger, 2006]

  49. The Matt algorithm Outline of the Matt Algorithm
