
Bioinformatics of Protein Structure



Presentation Transcript


  1. Bioinformatics of Protein Structure

  2. Protein structures are often characterized by secondary structure content • All α • All β • α/β • α+β • Tools are available (for instance at www.expasy.ch) that allow one to predict secondary structure from sequence data

  3. Sequence/structure • All-α proteins begin to reveal the sequence/structure relationship • Coiled-coil proteins exhibit a periodicity of hydrophobic residues • Hydrophobic moments can be observed in membrane proteins
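
The periodicity of hydrophobic residues can be quantified with a hydrophobic moment: each residue's hydropathy is treated as a vector rotated by a fixed angle per residue (about 100° for an α-helix), and the vectors are summed. The Python sketch below assumes the Kyte-Doolittle hydropathy scale; the two peptides are made up for illustration and do not come from the slides.

```python
import math

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment per residue for a fixed rotation per residue.

    angle_deg = 100 corresponds to an ideal alpha-helix (3.6 residues/turn).
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    delta = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Hypothetical peptides: same composition, different arrangement.
periodic = "LKKLLKKLLKKLLKKL"   # hydrophobic residues recur every 3-4 positions
blocky = "LLLLLLLLKKKKKKKK"     # hydrophobic residues clustered at one end

print(hydrophobic_moment(periodic) > hydrophobic_moment(blocky))
```

The periodic peptide scores higher because its leucines fall on one face of a helix, which is the pattern expected for amphipathic helices.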

  4. ~1/4 of all predicted proteins in a genome are membrane proteins

  5. A different periodicity in β-structures

  6. Common structures found in β structures • Barrels • Propellers • Greek key • Jelly roll (contains one Greek key) • Helix

  7. Barrels – anti-parallel sheets

  8. Anti-parallel structures exhibit a periodicity of every other amino acid
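
Side chains along a β strand point alternately to opposite faces of the sheet, so a strand with one face in a hydrophobic core tends to alternate hydrophobic and polar residues. The sketch below is a rough way to score that alternation; the set of residues counted as hydrophobic and the example segments are assumptions made for illustration.

```python
# Residues treated as hydrophobic here (an assumption; other groupings exist).
HYDROPHOBIC = set("AVILMFWC")


def alternation_score(segment):
    """Fraction of adjacent residue pairs that switch between hydrophobic and polar."""
    if len(segment) < 2:
        raise ValueError("segment needs at least two residues")
    flags = [aa in HYDROPHOBIC for aa in segment]
    switches = sum(a != b for a, b in zip(flags, flags[1:]))
    return switches / (len(flags) - 1)


# Hypothetical segments, not taken from any real protein.
print(alternation_score("VKVTVEVRV"))   # strict every-other-residue pattern
print(alternation_score("LLKKLLKK"))    # helix-like pattern, fewer switches
```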

  9. Propellers • Variable number of propeller blades http://info.bio.cmu.edu/courses/03231/ProtStruc/b-props.htm

  10. Quaternary structure of neuraminidase

  11. Looking for active sites

  12. γ-crystallin has two domains with identical topology • Protein evolution – motif duplication and fusion

  13. Three-sheet β-helix = Toblerone

  14. Protein structures containing α and β • Distinction between α/β and α+β • α/β: mainly parallel beta sheets (beta-alpha-beta units) • α+β: mainly antiparallel beta sheets (segregated alpha and beta regions)
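
The four classes can be sketched as a simple decision rule on secondary structure content plus the proportion of parallel strand pairing. This is only an illustration: the function name, the 15% content cutoff, and the 50% parallel cutoff are assumptions, not thresholds used by any classification database.

```python
def assign_class(helix_frac, strand_frac, parallel_frac, min_content=0.15):
    """Assign a structural class from secondary structure fractions.

    helix_frac, strand_frac: fraction of residues in helix / strand (0..1).
    parallel_frac: fraction of strand pairs that are parallel (0..1).
    min_content: hypothetical cutoff below which a type is considered absent.
    """
    has_helix = helix_frac >= min_content
    has_strand = strand_frac >= min_content
    if has_helix and not has_strand:
        return "all alpha"
    if has_strand and not has_helix:
        return "all beta"
    if has_helix and has_strand:
        # Mainly parallel sheets suggest alpha/beta; mainly antiparallel, alpha+beta.
        return "alpha/beta" if parallel_frac > 0.5 else "alpha+beta"
    return "little secondary structure"


print(assign_class(0.40, 0.20, 0.80))  # mostly parallel strands
print(assign_class(0.30, 0.30, 0.10))  # mostly antiparallel strands
```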

  15. α/β

  16. Interspersed α and β

  17. Generally, a tight hydrophobic core is found in α/β barrels

  18. How many folds are there? Proteins have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. To date we know ~26,000 protein structures. Within this dataset, 945 folds are recognized. http://scop.mrc-lmb.cam.ac.uk/scop/

  19. How many non-folds are there? • http://www.scripps.edu/news/press/013102.html • 30-40% of the human genome encodes “unstructured” native proteins

  20. Transition to structural classifications • Several useful databases link sequence analysis and protein structure information • Since structure is more highly conserved than sequence during evolution, structural alignment algorithms and classifications enable more distant evolutionary relatives to be identified. • CATH and SCOP are two databases that “organize” protein structures, each containing 950-1400 protein superfamilies

  21. Structural Alignments • Various algorithms allow structure vs. structure comparisons • VAST, DALI • CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) also has SSAP and GRATH (one computationally intensive, one not) • [Sequence similarity to structural families for modeling often extracted using PSI-BLAST (Gene3D)]
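
One way to see what a structure vs. structure comparison measures is to compare the internal distances of two structures, which needs no superposition. The sketch below computes a distance RMSD between two coordinate lists; it is not the algorithm of VAST, DALI, SSAP, or GRATH, and it assumes the residue correspondence is already known, which is the hard part those tools solve. The coordinates are invented.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """RMS difference between all intra-structure distances of two structures.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, one point per
    residue (for example C-alpha atoms), already in corresponding order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        raise ValueError("need at least two points per structure")
    total = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
    return math.sqrt(total / len(pairs))


# Invented coordinates: 'shifted' is 'original' translated, 'stretched' is deformed.
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in original]
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

print(distance_rmsd(original, shifted))    # close to zero: same shape
print(distance_rmsd(original, stretched))  # larger: different shape
```

Because only internal distances are used, rigid translation or rotation of one structure does not change the score.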

  22. Pairwise Structure Alignment: SSAP [1,4] • Comparison of sequence and structure alignments • [1] Taylor WR, Orengo CA, 1989, Protein structure alignment. J Mol Biol 208:1-22 • [4] Mueller L, 2003, Protein structure alignment. Paper presentations 27.5:16:30h

  23. Multiple structural alignments • CORA – from CATH (where?) • MultiProt - http://bioinfo3d.cs.tau.ac.il/MultiProt/ • DMAPS – (pre-calculated) http://dmaps.sdsc.edu/ • CE-MC - http://cemc.sdsc.edu/ • Others?

  24. CATH • http://cathwww.biochem.ucl.ac.uk/latest/ • Classification Scheme: Class, Architecture, Topology and Homology • Class – secondary structure composition and packing • Architecture – orientation of secondary structures in 3D, regardless of connectivity • Topology – both orientation and connectivity of secondary structure is accounted for • Homologous superfamily – grouped based on whether an evolutionary relationship exists (clustered at different levels of sequence ID)
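
The four levels are reflected in CATH identifiers, which are written as four dot-separated numbers (class.architecture.topology.homologous superfamily). A minimal parsing sketch follows; the helper names are hypothetical, the identifier is used only as an example of the format, and the class names listed cover classes 1 to 4 only.

```python
from typing import NamedTuple

# Names of the main CATH classes (later releases may define additional classes).
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha beta",
    4: "few secondary structures",
}


class CathNode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_id(cath_id):
    """Split a 'C.A.T.H' identifier into its four numeric levels."""
    parts = cath_id.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    return CathNode(*(int(part) for part in parts))


node = parse_cath_id("3.40.50.720")  # example identifier, format illustration only
print(node)
print(CATH_CLASSES.get(node.cath_class, "unknown class"))
```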

  25. CATH hierarchy • Structural alignments to homologous superfamily, then sequence alignments for sequence family, and then domains.

  26. Protein structure predictions • Identifying similar protein structures using only amino acid sequence • Modeling an amino acid sequence onto a known protein structure • Ab initio protein structure prediction

  27. Test sequence >rsp2570 MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA

  28. SCOP database • Classification scheme: Class, Fold, Superfamily, and Family • Class – type and organization of secondary structure • Fold – share a common core structure: same secondary structure elements in the same arrangement with the same topological connections • Superfamily – share very common structure and function • Family – protein domains share a clear common evolutionary origin as evidenced by sequence identity or similar structure/function

  29. HMMs are useful at SCOP • For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) HMMs are derived from the PDB databank at www.rcsb.org • Identify sequence signatures for specific domains
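
SCOP writes a domain's position in this hierarchy as a compact identifier of the form class.fold.superfamily.family, for example `c.2.1.1`, where the class letters a, b, c, and d stand for all-α, all-β, α/β, and α+β. Comparing two such identifiers from left to right shows the deepest level two domains share. The sketch below assumes that format; the function name and the identifiers are illustrative only.

```python
LEVELS = ("class", "fold", "superfamily", "family")


def shared_level(sccs_a, sccs_b):
    """Return the deepest SCOP level shared by two identifiers, or None."""
    deepest = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break
        deepest = level
    return deepest


# Illustrative identifiers only.
print(shared_level("c.2.1.1", "c.2.1.5"))  # same superfamily, different family
print(shared_level("c.2.1.1", "c.3.1.1"))  # same class, different fold
print(shared_level("a.1.1.1", "b.1.1.1"))  # different class
```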

  30. Modeling protein structure based on homology • SWISS-MODEL • http://swissmodel.expasy.org/ • Using first approach mode, submit the test sequence and your email address • PSI-BLAST identifies the most similar sequence with a known protein structure, and SWISS-MODEL wraps your input sequence around it • Note that you can also specify which structure you would like your sequence to wrap around
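
The template-selection step can be illustrated with a much simpler stand-in: given pre-aligned target/template pairs, pick the template with the highest percent identity. PSI-BLAST itself uses iterated profile scores rather than plain identity, so this is a sketch of the idea and not of what SWISS-MODEL runs. The template names and template sequences are hypothetical; the target fragment is the first ten residues of the test sequence.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def pick_template(alignments):
    """alignments maps template name -> (aligned target, aligned template)."""
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Hypothetical alignments of the target's first ten residues to two templates.
candidates = {
    "template_A": ("MTLDGKTIAI", "MTLDGRTLAI"),
    "template_B": ("MTLDGKTIAI", "MSL-GKSVAV"),
}
print(pick_template(candidates))
```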

  31. Ab initio predictions • Protein folding is a complex problem

  32. Ab initio attempts • Based on Ramachandran plot probabilities • Measure interatomic interactions – has worked for small proteins (<85 aa), which appear to favor H-bonds and van der Waals interactions and ignore electrostatic interactions
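
A Ramachandran plot relates the backbone dihedral angles φ and ψ of each residue to the conformations that are sterically allowed. As a rough illustration, the sketch below labels a (φ, ψ) pair by the region it falls in. The rectangular boundaries are coarse assumptions chosen for this example; real methods use probability distributions derived from known structures, not hard boxes.

```python
def ramachandran_region(phi, psi):
    """Very rough region label for backbone dihedral angles in degrees.

    The box boundaries are illustrative assumptions, not published limits.
    """
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    if 20 <= phi <= 120 and -30 <= psi <= 100:
        return "left-handed helix"
    return "other"


print(ramachandran_region(-60, -45))    # typical alpha-helix angles
print(ramachandran_region(-120, 130))   # typical beta-strand angles
print(ramachandran_region(60, 45))      # left-handed helical region
```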
