Functional Annotation of Proteins via the CAFA Challenge

Lee Tien, Duncan Renfrow-Symon, Shilpa Nadimpalli, Mengfei Cao. COMP150PBT | Fall 2010.


Presentation Transcript


  1. Functional Annotation of Proteins via the CAFA Challenge
     Lee Tien, Duncan Renfrow-Symon, Shilpa Nadimpalli, Mengfei Cao
     COMP150PBT | Fall 2010

  2. What’s the problem?
     • Huge bottleneck = finding a protein’s function when given a protein sequence
     • Incomplete, inaccurate, or inconsistent annotations are difficult to work with and can propagate
     • No good way to measure the accuracy of an annotation predictor

  3. What is the CAFA Challenge?

  4. What are Gene Ontology (GO) terms?
     • GO = controlled vocabulary of “gene ontologies”
     • Cover three domains:
        • Cellular component
        • Molecular function
        • Biological process
     • Hierarchy:
        • Broad/general (e.g. “catalytic activity”)
        • Specific (e.g. “leukotriene-C4-synthase activity”)
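The broad-to-specific hierarchy on this slide can be sketched as a small parent map. This is a minimal illustration, not the real ontology: the two term names come from the slide, while the `PARENTS` table, the `ancestors` helper, and the direct edge between the two terms are assumptions made for the example (the actual GO graph has intermediate terms between them).

```python
# Toy fragment of the molecular-function domain. Term names are taken from
# the slide; the direct edge from the specific term to "catalytic activity"
# is a simplification, since the real ontology has intermediate terms.
PARENTS = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "leukotriene-C4-synthase activity": ["catalytic activity"],
}


def ancestors(term, parents=PARENTS):
    """Return every broader term reachable from `term` via parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("leukotriene-C4-synthase activity")))
```

Annotating a protein with a specific term implicitly annotates it with all of that term's ancestors, which is why a predictor can be partially right by naming a broader term.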

  5. Outline of Our Approach
     [Diagram: CAFA targets (FASTA sequences) → BLAST, Pfam, SMURF?, Betawrap Pro?, other secondary structure predictor? → GO ids for each CAFA target]

  6. Pfam: Protein Family Database
     • Collection of protein families represented by:
        • Multiple sequence alignments
        • Hidden Markov Models
     • Two sections of Pfam:
        • A: high-quality, manually curated
        • B: large, automatically generated
     [Figures: Sample Multiple Sequence Alignment; Sample Hidden Markov Model]

  7. BLAST: Basic Local Alignment Search Tool
     • Goal: find homologous (i.e. derived from a common ancestor) sequences from a database
     • Various BLAST programs:
        • blastp = query: protein, database: protein
        • blastn = query: nucleotide, database: nucleotide
        • blastx = query: translated nucleotide, database: protein
        • tblastn = query: protein, database: translated nucleotide
        • tblastx = query: translated nucleotide, database: translated nucleotide
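The program list above amounts to a lookup from (query type, database type) to a program name. A minimal sketch of that table follows; the `BLAST_PROGRAMS` dictionary and the `choose_blast_program` helper are hypothetical names for illustration and are not part of BLAST itself.

```python
# Lookup table mirroring the slide: (query type, database type) -> program.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program for a query/database pairing, or raise."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for query={query_type!r}, database={database_type!r}"
        ) from None


# CAFA targets are protein sequences searched against a protein database.
print(choose_blast_program("protein", "protein"))
```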

  8. SMURF: Structural Motifs Using Random Fields
     • Determines whether a protein sequence contains one of the following super-secondary structures:
        • 6-bladed propeller
        • 7-bladed propeller
        • 8-bladed propeller
        • Double blades (e.g. 6-6, 6-7, 6-8…)
     • Developed at Tufts!
     • Some propeller functions:
        • Often WD40 repeats: protein-protein interaction
        • Signaling, transcription, cell cycle
     [Images: Smurf; 7-bladed propeller]

  9. Final Database Structure
     [Diagram: INPUT, MAPPING, OUTPUT, RESULTS]

  10. Final Results Statistics
     Of 8,904 unknown sequences…
     • 4,265 had at least one hit in PDB BLAST
     • 4,824 had at least one hit in Pfam
     • 104 had at least one hit in SMURF
     In total, 5,694 unique sequences had at least one hit, a 63.9% success rate
     [Venn diagram: Distribution of sequence hits by method; region counts 789, 3,445, 12, 19, 4, 1,356, 69]
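The seven Venn-diagram counts can be checked against the per-method totals. The assignment of counts to regions below is an inference, not something the transcript states: it is the labeling that reproduces the reported totals of 4,265, 4,824, and 104. The names `REGIONS` and `hits_for` are illustrative.

```python
# Assumed mapping of Venn regions to counts (inferred from the totals, not
# given in the transcript). Each key is the exact set of methods that hit.
REGIONS = {
    frozenset({"BLAST"}): 789,
    frozenset({"BLAST", "Pfam"}): 3445,
    frozenset({"BLAST", "SMURF"}): 12,
    frozenset({"BLAST", "Pfam", "SMURF"}): 19,
    frozenset({"Pfam", "SMURF"}): 4,
    frozenset({"Pfam"}): 1356,
    frozenset({"SMURF"}): 69,
}
TOTAL_TARGETS = 8904


def hits_for(method):
    """Number of sequences with at least one hit from `method`."""
    return sum(count for methods, count in REGIONS.items() if method in methods)


any_hit = sum(REGIONS.values())
success_rate = 100 * any_hit / TOTAL_TARGETS

for method in ("BLAST", "Pfam", "SMURF"):
    print(method, hits_for(method))
print(any_hit, f"{success_rate:.1f}%")
```

Under this labeling, 69 sequences were annotated by SMURF alone, and 3,210 of the 8,904 targets received no hit from any method.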

  11. Example Result
     T38114
     MDLDMNGGNKRVFQRLGGGSNRPTTDSNQKVCFHWRAGRCNRYPCPYLHRELPGPGSGPVAASSNKRVADESGFAGPSHR
     RGPGFSGTANNWGRFGGNRTVTKTEKLCKFWVDGNCPYGDKCRYLHCWSKGDSFSLLTQLDGHQKVVTGIALPSGSDKLY
     TASKDETVRIWDCASGQCTGVLNLGGEVGCIISEGPWLLVGMPNLVKAWNIQNNADLSLNGPVGQVYSLVVGTDLLFAGT
     QDGSILVWRYNSTTSCFDPAASLLGHTLAVVSLYVGANRLYSGAMDNSIKVWSLDNLQCIQTLTEHTSVVMSLICWDQFL
     LSCSLDNTVKIWAATEGGNLEVTYTHKEEYGVLALCGVHDAEAKPVLLCSCNDNSLHLYDLPSFTERGKILAKQEIRSIQ
     IGPGGIFFTGDGSGQVKVWKWSTESTPILS
     • BLAST: matches with PDB structures 2OVP, 3MKS, 2CNX, 1P22, 1NEX, 3N0E
        • Transcription, mitosis, methylation, protein binding
     • Pfam: match to family PF00642
        • Zinc ion binding, nucleic acid binding
     • SMURF: match to 7-bladed β-propeller template
        • WD domain (protein binding)
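One possible way to combine the three methods' output for T38114 is to count how many methods support each function. The transcript does not say how the authors merged results, so this is only a sketch under that assumption; `PREDICTIONS` restates the slide's annotations, and `rank_by_support` is a hypothetical helper.

```python
from collections import Counter

# Functions reported for target T38114 on the slide, grouped by method.
# SMURF's "WD domain (protein binding)" is recorded as "protein binding".
PREDICTIONS = {
    "BLAST": {"transcription", "mitosis", "methylation", "protein binding"},
    "Pfam": {"zinc ion binding", "nucleic acid binding"},
    "SMURF": {"protein binding"},
}


def rank_by_support(predictions):
    """Sort functions by the number of methods predicting them, then by name."""
    support = Counter(term for terms in predictions.values() for term in terms)
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


for term, votes in rank_by_support(PREDICTIONS):
    print(votes, term)
```

Here "protein binding" is the only function supported by two independent methods (BLAST and SMURF); every other term comes from a single method.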

  12. Possible Future Directions
     • Improving functional annotation for β-propellers identified by SMURF
        • Analyze a training set of propeller proteins with known function to build a probabilistic model of protein function based on propeller type
     • Addition of other structural prediction tools for motifs with known function
        • G protein-coupled receptors, membrane-bound proteins
     • Expansion of BLAST search to include the full nr database
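The simplest form of the proposed probabilistic model is a table of conditional frequencies, P(function | propeller type), estimated from labeled examples. The sketch below assumes that form; the `TRAINING` pairs are hypothetical placeholders, not real annotations, and `function_given_propeller` is an illustrative name.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL training pairs (propeller type, known function). These are
# placeholders to show the computation, not data from the project.
TRAINING = [
    ("7-bladed", "protein-protein interaction"),
    ("7-bladed", "protein-protein interaction"),
    ("7-bladed", "signaling"),
    ("6-bladed", "transcription"),
]


def function_given_propeller(training):
    """Estimate P(function | propeller type) as relative frequencies."""
    counts = defaultdict(Counter)
    for propeller_type, function in training:
        counts[propeller_type][function] += 1
    return {
        propeller_type: {
            function: n / sum(function_counts.values())
            for function, n in function_counts.items()
        }
        for propeller_type, function_counts in counts.items()
    }


model = function_given_propeller(TRAINING)
print(model["7-bladed"])
```

With a real training set, smoothing (for example, pseudocounts) would likely be needed so that functions unseen for a given propeller type do not get probability zero.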

  13. Questions?
