Protein Structural Prediction

Protein Structure is Hierarchical

Structure Determines Function: The Protein Folding Problem • What determines structure? • Energy • Kinematics • How can we determine structure? • Experimental methods • Computational predictions

Primary Structure: Sequence • The primary structure of a protein is the amino acid sequence

Primary Structure: Sequence • Twenty different amino acids have distinct shapes and properties

Primary Structure: Sequence A useful mnemonic for the hydrophobic amino acids is "FAMILY VW"
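
As a small illustration of the mnemonic, the sketch below labels each residue of a one-letter sequence as hydrophobic (H) or polar (P). The function names `to_hp` and `hydrophobic_fraction` are made up for this example, and treating every residue outside "FAMILY VW" as polar is a simplifying assumption.

```python
# Hydrophobic residues taken from the "FAMILY VW" mnemonic (one-letter codes).
HYDROPHOBIC = set("FAMILYVW")


def to_hp(sequence: str) -> str:
    """Map each residue to 'H' (in FAMILY VW) or 'P' (everything else)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues that the mnemonic classifies as hydrophobic."""
    hp = to_hp(sequence)
    return hp.count("H") / len(hp) if hp else 0.0


if __name__ == "__main__":
    print(to_hp("MKVLA"))
    print(hydrophobic_fraction("MKVLA"))
```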

Secondary Structure: α, β, & loops • α helices and β sheets are stabilized by hydrogen bonds between backbone oxygen and hydrogen atoms

Secondary Structure: α helix

Secondary Structure: β sheet • β sheet • β bulge

Second-and-a-half-ary Structure: Motifs • beta helix • beta barrel • beta trefoil

Tertiary Structure: Domains

Mosaic Proteins

Tertiary Structure: A Protein Fold

Protein Folds Composed of α, β, other

Quaternary Structure: Multimeric Proteins or Functional Assemblies • Multimeric proteins • Hemoglobin: a tetramer • Macromolecular assemblies • Ribosome: protein synthesis • Replisome: DNA copying

Protein Folding • The amino-acid sequence of a protein determines the 3D fold [Anfinsen et al., 1950s] Some exceptions: • All proteins can be denatured • Some proteins have multiple conformations • Some proteins get folding help from chaperones • The function of a protein is determined by its 3D fold • Can we predict 3D fold of a protein given its amino-acid sequence?

The Levinthal Paradox • Given a small protein (100 aa), assume 3 possible conformations per peptide bond • $3^{100} \approx 5 \times 10^{47}$ conformations • Fastest motions take $10^{-15}$ sec, so sampling all conformations would take $5 \times 10^{32}$ sec • 60 × 60 × 24 × 365 = 31536000 seconds in a year • Sampling all conformations would take $1.6 \times 10^{25}$ years • Yet each protein folds quickly into a single stable native conformation: the Levinthal paradox
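
The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `levinthal_years` is made up for the example, and the defaults are the slide's own assumptions (100 residues, 3 conformations per peptide bond, one conformation sampled every 10^-15 seconds, 365-day years).

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31536000, as on the slide


def levinthal_years(residues: int = 100,
                    states_per_bond: int = 3,
                    seconds_per_sample: float = 1e-15) -> float:
    """Years needed to visit every conformation once, under the slide's assumptions."""
    conformations = states_per_bond ** residues      # exact integer, about 5e47
    total_seconds = conformations * seconds_per_sample
    return total_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"conformations: {3 ** 100:.2e}")
    print(f"years to sample them all: {levinthal_years():.2e}")
```

Changing `residues` shows how quickly the estimate grows: each additional residue multiplies the search time by three.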

Quick Overview of Energy

The Hydrophobic Effect • Important for folding, because every amino acid participates! • Experimentally determined hydrophobicity levels: Fauchère and Pliska (1983). Eur. J. Med. Chem. 18, 369-75.

Protein Structure Determination • Experimental • X-ray crystallography • NMR spectroscopy • Computational – Structure Prediction (The Holy Grail) • Sequence implies structure, therefore in principle we can predict the structure from the sequence alone

Protein Structure Prediction • ab initio • Use just first principles: energy, geometry, and kinematics • Homology • Find the best match to a database of sequences with known 3D-structure • Threading • Meta-servers and other methods

Ab initio Prediction • Sampling the global conformation space • Lattice models / Discrete-state models • Molecular Dynamics • Pre-set libraries of fragment 3D motifs • Picking native conformations with an energy function • Solvation model: how protein interacts with water • Pair interactions between amino acids • Predicting secondary structure • Local homology • Fragment libraries

Lattice String Folding • HP model: main modeled force is hydrophobic attraction • NP-hard on both the 2-D square and 3-D cubic lattices • Constant-factor approximation algorithms exist • Not so relevant biologically
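
To make the HP model concrete, here is a minimal sketch of its energy function on the 2-D square lattice. It uses the usual convention of -1 for every pair of H residues that sit on neighboring lattice sites without being consecutive in the chain. The move encoding (`U`, `D`, `L`, `R`) and the function name `hp_energy` are choices made for this example only. The sketch scores one given fold; it does not search for the best fold, which is the NP-hard part.

```python
def hp_energy(hp: str, moves: str) -> int:
    """Energy of one 2-D square-lattice fold of an HP string.

    hp    -- string over {'H', 'P'}
    moves -- one of 'U', 'D', 'L', 'R' per bond, so len(moves) == len(hp) - 1
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(hp) - 1:
        raise ValueError("need exactly one move per bond")

    # Walk the chain across the lattice, starting at the origin.
    coords = [(0, 0)]
    for move in moves:
        dx, dy = steps[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")

    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if hp[i] != "H":
            continue
        # Look only right and up so that each lattice contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # chain bent into a square
    print(hp_energy("HPPH", "RRR"))  # fully extended chain
```

In the bent fold the two H residues at the chain ends become lattice neighbors, so that fold scores lower than the extended one.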

Lattice String Folding

ROSETTA • http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php • http://depts.washington.edu/bakerpg/papers/Bonneau-ARBBS-v30-p173.pdf • Monte Carlo based method • Limit conformational search space by using the sequence-structure motif I-Sites library (http://isites.bio.rpi.edu/Isites/) • 261 patterns in library • Certain positions in a motif favor certain residues • Remove all sequences with <25% identity • Find structures of the 25 nearest sequence neighbors of each 9-mer • Rationale • Local structures often fold independently of full protein • Can predict large areas of protein by matching sequence to I-Sites

I-Sites Examples • Non-polar helix • Abundance of alanine at all positions • Non-polar side chains favored at positions 3, 6, 10 (methionine, leucine, isoleucine) • Amphipathic helix • Non-polar side chains favored at positions 6, 9, 13, 16 (methionine, leucine, isoleucine) • Polar side chains favored at positions 1, 8, 11, 18 (glutamic acid, lysine)
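
The idea that "certain positions in a motif favor certain residues" can be sketched as a sliding-window scan. The example below hard-codes only the amphipathic-helix preferences listed on this slide and counts how many favored positions a window satisfies. This is a deliberately crude stand-in: the real I-Sites library uses position-specific residue profiles, and the names `motif_score` and `best_window` as well as the 0-to-8 score are assumptions made for illustration.

```python
MOTIF_LENGTH = 18
NONPOLAR_POSITIONS = (6, 9, 13, 16)   # 1-based positions from the slide
POLAR_POSITIONS = (1, 8, 11, 18)      # 1-based positions from the slide
NONPOLAR = set("MLI")                 # methionine, leucine, isoleucine
POLAR = set("EK")                     # glutamic acid, lysine


def motif_score(window: str) -> int:
    """Count how many of the 8 favored positions hold a favored residue."""
    if len(window) != MOTIF_LENGTH:
        raise ValueError(f"window must have {MOTIF_LENGTH} residues")
    score = sum(window[p - 1] in NONPOLAR for p in NONPOLAR_POSITIONS)
    score += sum(window[p - 1] in POLAR for p in POLAR_POSITIONS)
    return score


def best_window(sequence: str) -> tuple:
    """Return (0-based start, score) of the best window, or (-1, -1) if too short."""
    best = (-1, -1)
    for start in range(len(sequence) - MOTIF_LENGTH + 1):
        score = motif_score(sequence[start:start + MOTIF_LENGTH])
        if score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # Synthetic sequence built to satisfy all eight preferences at offset 2.
    print(best_window("GG" + "EAAAALAKIAEAMAALAK" + "GG"))
```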

ROSETTA Method • New structures generated by swapping compatible fragments • Accepted structures are clustered based on energy and structural size • The best cluster is the one with the greatest number of conformations within 4 Å rms deviation of the cluster center • Representative structures are taken from each of the best five clusters and returned to the user as predictions
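
The "best cluster" criterion can be sketched as follows, assuming a precomputed symmetric matrix of pairwise rms deviations between accepted structures. The greedy procedure (pick the center with the most neighbors within 4 Å, remove that cluster, repeat up to five times) is an assumption made for this example, not a description of Rosetta's actual clustering code, and `top_clusters` is a made-up name.

```python
def top_clusters(rmsd, threshold=4.0, n_clusters=5):
    """Greedily pick up to n_clusters clusters from a pairwise RMSD matrix.

    rmsd -- square symmetric list of lists with zeros on the diagonal (in Å)
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining and len(clusters) < n_clusters:
        best_center, best_members = None, []
        for center in sorted(remaining):
            # Members are the still-unassigned structures close to this center.
            members = [j for j in sorted(remaining) if rmsd[center][j] <= threshold]
            if len(members) > len(best_members):
                best_center, best_members = center, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


if __name__ == "__main__":
    # Toy matrix: structures 0-2 resemble each other, structure 3 is an outlier.
    toy = [
        [0, 2, 3, 9],
        [2, 0, 2, 9],
        [3, 2, 0, 9],
        [9, 9, 9, 0],
    ]
    for center, members in top_clusters(toy):
        print(center, members)
```

The center of each cluster would then serve as the representative structure returned as a prediction.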

Robetta & Rosetta

Rosetta results in CASP

Rosetta Results • In CASP4, Rosetta's best models ranged from 6–10 Å rmsd Cα • For comparison, good comparative models give 2–5 Å rmsd Cα • Most effective with small proteins (<100 residues) and structures with helices
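
For reference, the rmsd Cα figure quoted above is the root-mean-square distance between matching Cα atoms of a model and the native structure. The sketch below computes it for two coordinate lists that are assumed to be already optimally superimposed and listed in the same residue order. The superposition step itself is omitted, and `ca_rmsd` is a name chosen for this example.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in Å between two equal-length lists of (x, y, z) C-alpha positions.

    Assumes the two structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(ca_rmsd(model, native))
```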

Only a few folds are found in nature

The SCOP Database: Structural Classification Of Proteins • FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function • SUPERFAMILY: proteins whose families have some sequence and function/structure similarity, suggesting a common evolutionary origin • COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry • CLASS: alpha, beta, alpha–beta, alpha+beta, multidomain

Status of Protein Databases • PDB • SCOP: Structural Classification of Proteins, 1.67 release: 24037 PDB entries (15 May 2004), 65122 domains • EMBL

Evolution of Proteins – Domains • Number of members in different families obeys a power law • 429 families common in all 14 eukaryotes • 80% of animal domains, 90% of fungi domains • 80% of proteins are multidomain in eukaryotes • Domains usually combine pairwise in the same order. Why? • Evolution of proteins happens mainly through duplication, recombination, and divergence • Chothia, Gough, Vogel, Teichmann, Science 300:1701-1703, 2003

Homology-based Prediction • Align query sequence with sequences of known structure, usually >30% similar • Superimpose the aligned sequence onto the structure template, according to the computed sequence alignment • Perform local refinement of the resulting structure in 3D • The number of unique structural folds is small (possibly a few thousand) • 90% of new structures submitted to PDB in the past three years have similar folds in PDB
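
The template-selection step can be illustrated with a percent-identity check on an existing pairwise alignment. The sketch assumes the alignment was already computed by some other tool, uses identity over aligned (non-gap) columns as a stand-in for the slide's "similar", and applies the 30% figure as a cutoff. The names `percent_identity` and `usable_template` are made up for this example.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of aligned (non-gap) columns in which the two residues match."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def usable_template(aligned_query: str, aligned_template: str, cutoff: float = 30.0) -> bool:
    """True if the template passes the (assumed) >30% identity rule of thumb."""
    return percent_identity(aligned_query, aligned_template) > cutoff


if __name__ == "__main__":
    query = "ACDE-G"
    template = "ACDFHG"
    print(percent_identity(query, template))
    print(usable_template(query, template))
```

Other definitions of identity divide by the shorter sequence length or by the full alignment length instead, so the threshold is only meaningful once a definition is fixed.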

Examples of Fold Classes

Homology-based Prediction • Raw model • Loop modeling • Side chain placement • Refinement

Homology-based Prediction
