Protein Structure and Function Prediction

  1. Protein Structure and Function Prediction

  2. Predicting 3D Structure: an outstanding, difficult problem • Comparative modeling (homology): based on sequence • Fold recognition (threading): based on secondary structure

  3. Based on Sequence: Comparative Modeling • Similar sequence suggests similar structure • Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database

  4. Based on Sequence: Comparative Modeling Modeling a sequence based on known structures consists of four major steps: 1. Finding one or more known structures related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST 2. Aligning the sequence with the templates 3. Building a model 4. Assessing the model
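Step 2 (aligning the sequence with a template) can be illustrated with a textbook global alignment. The sketch below is Needleman-Wunsch with toy scores (match +1, mismatch -1, linear gap -2) chosen only for illustration; real comparative-modeling pipelines typically use substitution matrices such as BLOSUM62, affine gap penalties, or profile-based alignments, so treat this as a minimal sketch rather than any specific tool's method.

```python
def needleman_wunsch(target: str, template: str,
                     match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with toy scores and a linear gap penalty."""
    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(target), len(template)
    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(target[i - 1], template[j - 1]),
                score[i - 1][j] + gap,  # gap in the template
                score[i][j - 1] + gap,  # gap in the target
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(target[i - 1], template[j - 1]):
            top.append(target[i - 1])
            bottom.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(target[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(template[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

The aligned pair is what step 3 (model building) would consume: each target residue aligned to a template residue inherits that residue's backbone position as a starting point.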

  5. Based on Sequence: Comparative Modeling • The accuracy of a comparative model is related to the sequence identity on which it is based: >50% sequence identity = high accuracy; 30%-50% sequence identity = medium accuracy (about 90% of the main chain modeled); <30% sequence identity = low accuracy (many errors) • Similarity is particularly high in the core • Alpha helices and beta sheets are preserved • Even near-identical sequences vary in loops
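The identity bands above can be applied to a target-template alignment as a rough rule of thumb. A minimal sketch follows; it assumes percent identity is computed over columns where neither sequence has a gap (other common definitions divide by the shorter sequence length or the full alignment length, which can give noticeably different numbers), and the band labels simply mirror the slide.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Identical positions as a percentage of columns where neither sequence is gapped."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def expected_accuracy(identity: float) -> str:
    """Rule-of-thumb accuracy band for a comparative model (thresholds from the slide)."""
    if identity > 50:
        return "high"
    if identity >= 30:
        return "medium"
    return "low"


pid = percent_identity("ACDEFGHIKL", "ACDEYGHWKL")
print(pid, expected_accuracy(pid))  # 80.0 high
```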

  6. Based on Sequence: Comparative Modeling Methods • MODELLER (Sali, Rockefeller/UCSF) • SCWRL (Dunbrack, UCSF) • SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html

  7. Based on Secondary Structure: Protein Folds • A combination of secondary structural units • Forms a basic level of classification • Each protein family belongs to a fold • An estimated 1000-3000 different folds exist • A fold is shared among close and distant family members • Different sequences can share similar folds

  8. Based on Secondary Structure: Protein folds are the sequential and spatial arrangement of secondary structures [Figure: example folds of hemoglobin and TIM (triosephosphate isomerase)]

  9. Fold classification (SCOP), from broadest to most specific level: • Class: all alpha, all beta, alpha/beta, alpha+beta • Fold • Superfamily • Family

  10. Based on Secondary Structure: Basic steps in fold recognition • Compare the query sequence (e.g. MTYGFRIPLNCERWGHKLSTVILKRP...) against a library of all known protein folds (a finite number) • Goal: find the folding template that the sequence fits best • This requires ways to evaluate sequence-structure fit

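  11. Based on Secondary Structure: Find the best fold for a protein sequence: fold recognition (threading) [Figure: the query MAHFPGFGQSLLFGYPVYVFGD... is fitted to every fold in the library (folds 1, ..., 56, ..., n) and each fit receives a score (values shown include -10, -123 and 20.5); the best-scoring fold is the potential fold]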
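The threading loop (score the query against every fold in the library, keep the best fit) can be sketched with a deliberately simple compatibility score. Everything here is a toy assumption, not the scoring used by any program on the next slide: each fold is reduced to a hypothetical per-position burial string ("B" buried, "E" exposed), a residue scores +1 when hydrophobic-in-buried or polar-in-exposed and -1 otherwise, and the query is placed without gaps at every offset. Real fold-recognition methods use statistical potentials, secondary-structure information and gapped sequence-structure alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic residue set


def env_score(residue: str, env: str) -> float:
    """+1 if the residue suits its environment (hydrophobic buried / polar exposed), else -1."""
    return 1.0 if (residue in HYDROPHOBIC) == (env == "B") else -1.0


def thread(query: str, fold_env: str) -> float:
    """Best ungapped placement of the query along a fold's burial pattern."""
    if len(query) > len(fold_env):
        return float("-inf")  # query does not fit on this toy template
    width = len(query)
    return max(
        sum(env_score(r, e) for r, e in zip(query, fold_env[off:off + width]))
        for off in range(len(fold_env) - width + 1)
    )


def best_fold(query: str, library: dict[str, str]) -> tuple[str, float]:
    """Thread the query onto every fold in the library and return the best-scoring one."""
    scores = {name: thread(query, env) for name, env in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical three-fold library, for illustration only
LIBRARY = {"fold_1": "EEEEEE", "fold_2": "BEBEBE", "fold_3": "BBBBBB"}
print(best_fold("LKLKLK", LIBRARY))  # ('fold_2', 6.0)
```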

  12. Based on Secondary Structure: Programs for fold recognition • TOPITS (Rost 1995) • GenTHREADER (Jones 1999) • SAM-T02 (UCSC HMM) • 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

  13. Ab Initio Modeling • Compute molecular structure from laws of physics and chemistry alone • Ideal solution (theoretically) • Simulate process of protein folding • Apply minimum energy considerations • Practically nearly impossible • Exceptionally complex calculations • Biophysics understanding incomplete
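The "minimum energy" idea, and why it is practically so hard, can be seen in a toy lattice model. The sketch below swaps in the 2D HP (hydrophobic-polar) lattice model, which is not what Rosetta or Undertaker do: each residue is H or P, a conformation is a self-avoiding walk on a square grid, and every pair of H residues that touch on the grid without being chain neighbors contributes -1. Exhaustive enumeration finds the true minimum, but the search space grows exponentially with chain length.

```python
from itertools import product

# Unit steps on a 2D square lattice
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(seq: str, coords: list[tuple[int, int]]) -> int:
    """-1 for every pair of H residues that touch on the lattice but are not chain neighbors."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbors and counts each contact once
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_exhaustively(seq: str):
    """Try all 4**(n-1) move strings; return (lowest energy, its coordinates, valid conformations)."""
    best_energy, best_coords, n_conformations = 0, None, 0
    for path in product(MOVES, repeat=len(seq) - 1):
        coords = [(0, 0)]
        for step in path:
            dx, dy = MOVES[step]
            x, y = coords[-1]
            nxt = (x + dx, y + dy)
            if nxt in coords:  # chain would overlap itself
                break
            coords.append(nxt)
        else:  # reached only if the walk never overlapped
            n_conformations += 1
            e = hp_energy(seq, coords)
            if best_coords is None or e < best_energy:
                best_energy, best_coords = e, coords
    return best_energy, best_coords, n_conformations


energy, coords, count = fold_exhaustively("HPPH")
print(energy, count)  # -1 36
```

A 4-residue chain already has 36 self-avoiding conformations (with the first residue fixed at the origin) and a 10-residue chain has 16,268; for real proteins with hundreds of residues and continuous coordinates, exhaustive search is out of the question, which is why practical ab initio methods rely on heuristic sampling.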

  14. Ab Initio Methods • Rosetta (Baker lab, Seattle) • Undertaker (Karplus, UCSC)

  15. PART 2: Predicting Protein Function "Tell me what you do and I will tell you what you are"

  16. Inferring protein function: • Based on the existence of known protein domains • Based on homology

  17. Protein Domains • Domains can be considered as building blocks of proteins. • Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function. • The presence of a particular domain can be indicative of the function of the protein.

  18. DNA-binding domain: zinc finger

  19. Protein Domain can be defined by : • A motif • A profile (PSSM) • A Hidden Markov Model

  20. MOTIF Rxx(F,Y,W)(R,K)SAQ
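In this notation, x stands for any residue and (F,Y,W) for any one of the listed residues, so the motif translates directly into a regular expression. A minimal sketch using Python's standard `re` module; the 1-based positions and the example sequence are illustrative choices, and `finditer` reports non-overlapping hits only.

```python
import re

# Rxx(F,Y,W)(R,K)SAQ as a regular expression:
#   x -> "." (any residue), (F,Y,W) -> "[FYW]" (any one of the listed residues)
MOTIF = re.compile(r"R..[FYW][RK]SAQ")


def find_motif(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(seq)]


print(find_motif("MKRAGWKSAQLL"))  # [(3, 'RAGWKSAQ')]
```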

  21. Profile Scoring
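The figure for this slide is not preserved, but profile (PSSM) scoring generally works as follows: each alignment column becomes a table of log-odds scores, and a candidate segment is scored by summing the per-column scores of its residues. The sketch below assumes a gap-free alignment block, one pseudocount per amino acid and uniform background frequencies (1/20); tools such as PSI-BLAST use sequence weighting and real background frequencies, so the numbers will differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log2-odds scores from a gap-free alignment block."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for column in zip(*block):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of column scores; residues outside AMINO_ACIDS raise KeyError."""
    return sum(col[aa] for col, aa in zip(pssm, window))


def best_window(pssm: list[dict[str, float]], seq: str) -> tuple[float, int]:
    """Slide the profile along seq; return (best score, 0-based start)."""
    width = len(pssm)
    if len(seq) < width:
        raise ValueError("sequence shorter than the profile")
    return max((score_window(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


pssm = build_pssm(["DRTR", "DRTS", "SPTR"])  # hypothetical aligned block
print(best_window(pssm, "GGDRTRGG")[1])  # 2
```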

  22. PROSITE • PROSITE is a database of protein domains that can be searched by either regular-expression patterns or sequence profiles • Example: Zinc_Finger_C2H2 pattern Cx{2,4}Cx3(L,I,V,M,F,Y,W,C)x8Hx{3,5}H
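In PROSITE's own pattern syntax, the zinc-finger pattern on this slide reads C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H: elements are separated by "-", x is any residue, [..] allows any listed residue, {..} excludes the listed residues, (n) or (n,m) gives a repeat count, and < / > anchor the N- and C-terminus. A minimal sketch of converting such a pattern into a Python regular expression; it covers only this common subset (for example, it does not handle < or > inside brackets).

```python
import re

# One PROSITE element: optional "<", a residue spec, optional "(n)" or "(n,m)", optional ">"
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Convert a subset of PROSITE pattern syntax into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            regex = "."                          # any residue
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            regex = residue                      # single residue or [ABC] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)


zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(zinc_finger)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```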

  23. Profile HMM (Hidden Markov Model) An HMM is a probabilistic model of the MSA consisting of a number of interconnected states [Figure: profile HMM for alignment columns 16-19 of ten sequences (DRTR, DRTS, S--S, SPTR, DRTR, DPTS, D--S, D--S, D--S, D--R), with match states M16-M19, insert states I16-I19 and delete states D16-D19; match-state emissions: M16 D 0.8, S 0.2; M17 R 0.6, P 0.4; M18 T 1.0; M19 R 0.4, S 0.6; half of the sequences pass through the delete states at columns 17-18]
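The match-state emission probabilities in this example follow from simple column counting. A minimal sketch, assuming every one of the four columns is a match column, plain maximum-likelihood estimates (no pseudocounts, no sequence weighting) and the ten aligned sequences from the slide's figure (DRTR, DRTS, S--S, SPTR, DRTR, DPTS, D--S, D--S, D--S, D--R); tools such as HMMER additionally apply priors and weighting, which matters with so few sequences.

```python
from collections import Counter

# Ten aligned sequences over columns 16-19, as in the slide's example
MSA = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
       "DPTS", "D--S", "D--S", "D--S", "D--R"]


def match_emissions(msa: list[str]) -> list[dict[str, float]]:
    """Maximum-likelihood emission probabilities per match column (gaps ignored)."""
    emissions = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in sorted(counts.items())})
    return emissions


def delete_fraction(msa: list[str]) -> list[float]:
    """Fraction of sequences that occupy the delete state at each column."""
    return [sum(c == "-" for c in column) / len(column) for column in zip(*msa)]


for pos, probs in zip(range(16, 20), match_emissions(MSA)):
    print(f"M{pos}", probs)
print(delete_fraction(MSA))  # [0.0, 0.5, 0.5, 0.0]
```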

  24. Pfam • A database that contains a large collection of multiple sequence alignments and profile hidden Markov models (HMMs) • The Pfam database is based on two distinct classes of alignments: • Seed alignments, which are deemed to be accurate and are used to produce Pfam-A • Alignments derived by automatic clustering of Swiss-Prot, which are less reliable and give rise to Pfam-B • High-quality seed alignments are used to build HMMs to which sequences are aligned

  25. InterPro • Built from protein classification databases such as: • PROSITE • ProDom • SMART • Pfam • PRINTS • Uses UniProt (Swiss-Prot and TrEMBL)

  26. Databases and Tools for protein families and domains • InterPro - Integrated Resource of Protein Domains and Functional Sites • PROSITE - A database of protein families and domains • BLOCKS - BLOCKS db • Pfam - Protein families db (HMM derived) • PRINTS - Protein Motif fingerprint db • ProDom - Protein domain db (automatically generated) • PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins • SBASE - SBASE domain db • SMART - Simple Modular Architecture Research Tool • TIGRFAMs - TIGR protein families db

  27. Inferring protein function based on sequence homology

  28. Clusters of Orthologous Groups of proteins (COGs) Classification of conserved genes according to their homologous relationships (Koonin et al., NAR) • Homologs - Proteins with a common evolutionary origin • Orthologs - Proteins from different species that evolved by vertical descent (speciation) • Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events

  29. Clusters of Orthologous Groups of proteins (COGs) Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages. Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.

  30. COGs - Clusters of Orthologous Groups * All-against-all sequence comparison of the proteins encoded in completed genomes (paralogs/orthologs) * For a given protein “a” in genome A, if there are several similar proteins in genome B, the most similar one (“b”) is selected * If, when protein “b” is used as a query, protein “a” in genome A is selected as its best hit, then “a” and “b” can be included in a COG * Proteins in a COG are more similar to other proteins in the COG than to any other protein in the compared genomes * A COG is defined when it includes at least three homologous proteins from three distant genomes
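The procedure above (best hit per genome, reciprocal check, groups spanning at least three genomes) can be sketched in a few functions. This is a simplified illustration, not the published COG pipeline, which also involved curation: it assumes precomputed pairwise similarity scores (e.g. alignment scores from the all-against-all comparison), keeps only reciprocal best hits between different genomes, seeds groups from triangles of three proteins in three genomes that are all mutual best hits, and merges triangles that share a side. The protein names, genomes and scores are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def best_hits(scores, genome_of):
    """For each protein, its highest-scoring hit in every other genome."""
    best = {}  # (query, target genome) -> (score, subject)
    for (q, s), sc in scores.items():
        if genome_of[q] == genome_of[s]:
            continue  # same-genome hits (paralogs) are not used as best hits
        key = (q, genome_of[s])
        if key not in best or sc > best[key][0]:
            best[key] = (sc, s)
    return {key: subject for key, (_, subject) in best.items()}


def reciprocal_best_hits(scores, genome_of):
    """Pairs {a, b} where b is a's best hit in b's genome and vice versa."""
    bh = best_hits(scores, genome_of)
    return {frozenset((q, s)) for (q, _g), s in bh.items()
            if bh.get((s, genome_of[q])) == q}


def cog_like_groups(scores, genome_of):
    """Merge triangles of mutual best hits (three genomes) that share a side."""
    rbh = reciprocal_best_hits(scores, genome_of)
    proteins = sorted({p for pair in rbh for p in pair})
    triangles = [t for t in combinations(proteins, 3)
                 if len({genome_of[p] for p in t}) == 3
                 and all(frozenset(e) in rbh for e in combinations(t, 2))]
    parent = list(range(len(triangles)))  # union-find over triangle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for idx, t in enumerate(triangles):
        for edge in map(frozenset, combinations(t, 2)):
            if edge in edge_owner:
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx
    groups = defaultdict(set)
    for idx, t in enumerate(triangles):
        groups[find(idx)].update(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: a1/a2 are paralogs in genome A; d1 has a best hit only to a1
GENOME_OF = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
SCORES = {}
for (x, y), sc in {("a1", "b1"): 100, ("a1", "c1"): 90, ("b1", "c1"): 95,
                   ("a2", "b1"): 60, ("a1", "a2"): 80, ("a1", "d1"): 30}.items():
    SCORES[(x, y)] = SCORES[(y, x)] = sc  # symmetric scores for simplicity

print(cog_like_groups(SCORES, GENOME_OF))  # [['a1', 'b1', 'c1']]
```

Here a2 is excluded because b1's best hit in genome A is a1, and d1 is excluded because its single reciprocal best hit does not form a triangle across three genomes.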
