Bioinformatics of Protein Structure

Bioinformatics of Protein Structure

Protein structures often characterized by secondary structure content • All a • All b • a/b • a+b • There are tools available (for instance at www.expasy.ch that will allow one to predict secondary structure from sequence data

Sequence/structure • All a-proteins begin to reveal sequence/structure relationship • Coiled-coil proteins exhibit periodicity with hydrophobic residues • Observe hydrophobic moments in membrane proteins

~1/4 of all predicted proteins in a genome are membrane proteins

A different periodicity in b-structures

Common structures found in b structures • Barrels • Propellers • Greek key • Jelly roll (Contains one Greek key) • Helix

Barrels – anti-parallel sheets

Anti-parallel structures exhibit every other amino acid periodicity

Propellers • Variable number of propeller blades http://info.bio.cmu.edu/courses/03231/ProtStruc/b-props.htm

Quaternary structure of neuraminidase

Looking for active sites

g-crystallin has two domains with identical topology • Protein evolution – motif duplication and fusion

Three sheet b-helix = Toblerone

Protein structures containing a and b • Distinction between a/b and a + b • a/b -Mainly parallel beta sheets (beta-alpha-beta units) • a + b - Mainly antiparallel beta sheets (segregated alpha and beta regions)

a/b

Interspersed a and b

Generally, a tight hydrophobic core found in a/b barrels

How many folds are there? Proteins have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. To date we know ~26,000 protein structures Within this dataset, 945 folds are recognized http://scop.mrc-lmb.cam.ac.uk/scop/

How many non-folds are there? • http://www.scripps.edu/news/press/013102.html • 30-40% of human genome encodes for “unstructured” native proteins

Transition to structural classifications • Several useful databases link sequence analysis and protein structure information • Since structure is more highly conserved than sequence during evolution, structural alignment algorithms and classifications enable more distant evolutionary relatives to be identified. • CATH and SCOP are two databases that “organize” protein structures, each containing 950-1400 protein superfamilies

Structural Alignments • Various algorithms allow structure vs. structure comparisons • VAST, DALI • CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) also has SSAP and GRATH (one computationally intensive, one not) • [Sequence similarity to structural families for modeling often extracted using PSI-BLAST (Gene3D)]

Pairwise Structure Alignment: SSAP [1,4] Comparison of sequence and structure alignments: [1] Taylor WR, Orengo CA, 1989, Protein structure alignment. J Mol Biol 208:1-22[4] Mueller L, 2003, Protein structure alignment. Paper presentations 27.5:16:30h

Multiple structural alignments • CORA – from CATH (where?) • MultiProt - http://bioinfo3d.cs.tau.ac.il/MultiProt/ • DMAPS – (pre-calculated) http://dmaps.sdsc.edu/ • CE-MC - http://cemc.sdsc.edu/ • Others?

CATH • http://cathwww.biochem.ucl.ac.uk/latest/ • Classification Scheme: Class, Architecture, Topology and Homology • Class – secondary structure composition and packing • Architecture – orientation of secondary structures in 3D, regardless of connectivity • Topology – both orientation and connectivity of secondary structure is accounted for • Homologous superfamily – grouped based on whether an evolutionary relationship exists (clustered at different levels of sequence ID)

CATH hierarchy • Structural alignments To homologous super- Family, then sequence Alignments for sequence Family, and then domains.

Protein structure predictions • Identifying similar protein structures using only amino acid sequence • Modeling an amino acid sequence onto a known protein structure • Ab initio protein structure prediction

Test sequence >rsp2570 MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA

SCOP database • Classification scheme: Class, Fold, Superfamily, and Family, • Class – Type and organization of secondary structure • Fold – Share common core structure, same secondary structure elements in the same arrangement with the same topological connections • Superfamily – share very common structure and function • Family – protein domains share a clear common evolutionary origin as evidenced by sequence identity or similar structure/function

HMM’s are useful at SCOP • For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) HMMs are derived from the PDB databank at www.rcsb.org • Identify sequence signatures for specific domains

Modeling protein structure based on homology • SWISS-MODEL • http://swissmodel.expasy.org/ • Using first approach mode, submit test sequence, and use your email • PSI-Blast identifies the most similar sequence with a protein structure, and SWISS-MODEL wraps your input sequence around it • Note you can also specify which structure you would like your sequence to wrap around

Ab initio predictions • Protein folding is a complex problem

Ab initio attempts • Based on Ramachandran plot probabilities • Measure interatomic Interactions – has worked for small proteins <85 aa, which appear to Favor H-bonds and van Der Waal and ignore Electrostatic interactions

Bioinformatics of Protein Structure

Bioinformatics of Protein Structure

Presentation Transcript

Bioinformatics for biomedicine Protein domains and 3D structure

Protein Structure

Basics of Protein Bioinformatics and Structural Bioinformatics

Protein structure prediction: The holy grail of bioinformatics

PROTEIN STRUCTURE

Protein structure

PROTEIN STRUCTURE

Protein Structure

Protein Structure

Protein Structure

STRUCTURE OF PROTEIN

Protein Structure

Protein structure

COT 6930 HPC and Bioinformatics Protein Structure Prediction

Bioinformatics for biomedicine Protein domains and 3D structure

Protein Structure

Protein Structure

Bioinformatics for biomedicine Protein domains and 3D structure

Protein structure

Protein Structure

Structure of protein

Protein structure