Automate Function Prediction

kalkin

Presentation Transcript


  1. Automate Function Prediction

  2. Outline • Goal • How function is defined • Why Gene Ontology • Methods for protein function prediction • End points

  3. GOAL • A) You find a new protein • B) You sequence the whole genome of your favorite organism • The obtained gene(s) should be annotated • A can be solved manually. B needs automatic tools

  4. How function is defined • Functional description as text • Linking gene to keywords (UniProt) • Linking gene to Gene Ontology • Linking gene to signalling pathways or biochemical pathways (KEGG)

  5. Why Gene Ontology (GO) • GO is currently a popular standard in gene annotation • GO consists of categories that represent gene function • Creates a union for genes in the same process • Easy summary for genes with similar function

  6. Why Gene Ontology (GO) • 3 sub-parts: Biological Process, Molecular Function, Cellular Localization • Molecular Function => chemical activity • Biological Process => biology, cellular process • Cellular Localization => location of gene • Hierarchical structure • Categories with very precise function • Categories with less precise function • Categories with very broad function
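The hierarchical structure means that an annotation to a precise category also implies every broader category above it. A minimal sketch of that propagation follows; the four-term hierarchy is a simplified toy slice written out by hand (intermediate terms are omitted), and the names `PARENTS` and `with_ancestors` are illustrative only.

```python
# Toy is_a hierarchy (child -> parents). A simplified slice for illustration,
# not the full ontology; intermediate terms are left out on purpose.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def with_ancestors(term, parents=PARENTS):
    """Return the term plus every broader term reachable through is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


# A gene annotated with the precise class also belongs to the broader ones.
broader = with_ancestors("protein kinase activity")
```

Walking upward like this is what lets a tool report a less precise category when the evidence for the most precise one is weak.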

  7. How GO helps • End user: Summary categories for genes with various functions • Computer programs: Classifier algorithms can be taught to predict the categories for genes

  8. Understanding GO • AmiGO server (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi)

  9. Function Prediction: What can we use to predict function • Sequence homology (BLAST result list) • Phylogenetic tree of sequences • Protein domains (PFAM domains) • Short sequence patterns – motifs • Sequence features (secondary structure, low-complexity regions)

  10. Sequence Homology Methods • Do a BLAST search with a query sequence • Collect GO classes for genes in the BLAST result list • Give a weight to each BLAST hit • often -log(E-value), so that smaller E-values give larger weights • Combine the scores from the genes that belong to the same GO class • Report the top-scoring / most significant GO classes
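The steps on this slide can be sketched in a few lines. This is a minimal illustration, not the scoring of any particular program: it assumes the weight is -log10(E-value), that weights are simply summed per GO class, and that the BLAST hits have already been parsed into `(evalue, go_terms)` tuples. The GO labels in the example are placeholders.

```python
import math
from collections import defaultdict


def score_go_classes(hits, min_evalue=1e-180):
    """Rank GO classes by the summed weight of the BLAST hits that carry them.

    hits: iterable of (evalue, go_terms) tuples, already parsed from BLAST output.
    """
    scores = defaultdict(float)
    for evalue, go_terms in hits:
        # Weight = -log10(E-value); clamp so an E-value of 0.0 does not break log10.
        weight = -math.log10(max(evalue, min_evalue))
        if weight <= 0:
            continue  # hits with E-value >= 1 add no evidence
        for term in set(go_terms):
            scores[term] += weight
    # Highest combined score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder GO labels, for illustration only.
example_hits = [
    (1e-50, ["GO:A", "GO:B"]),
    (1e-10, ["GO:A"]),
    (5.0, ["GO:C"]),
]
ranking = score_go_classes(example_hits)
```

Summing is the simplest way to combine scores; it favours GO classes that are supported by many hits, which is one reason some tools normalise or test the scores for significance before reporting.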

  11. Sequence Homology Methods • Simple methods • Programs • BLAST2GO (http://www.blast2go.com/b2ghome) • GOTCHA (http://www.compbio.dundee.ac.uk/gotcha/gotcha.php) • ARGOT (http://www.medcomp.medicina.unipd.it/Argot2/form.php) • PFP (http://kiharalab.org/web/pfp.php)

  12. Phylogenetic tree methods • Create the pair-wise distances for the set of genes • Do a hierarchical clustering of genes • Map the known GO functions to the cluster tree • Look for unknown genes in a cluster with many genes from the same GO class • Report the top-scoring / most significant GO classes • More => http://genome.cshlp.org/content/8/3/163.full
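A rough sketch of the clustering idea, under strong simplifying assumptions: pair-wise distances are given up front, average-linkage clustering stands in for a real phylogenetic tree, merging stops at a fixed `cutoff`, and each unannotated gene simply receives the most common GO class in its cluster. The gene names, distances and GO labels are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cluster(names, dist, cutoff):
    """Average-linkage agglomerative clustering; stop when the closest pair exceeds cutoff."""
    clusters = [[name] for name in names]

    def average_distance(a, b):
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if average_distance(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def predict(clusters, known):
    """Assign each unannotated gene the most common GO class among its cluster members."""
    predictions = {}
    for members in clusters:
        votes = Counter(known[m] for m in members if m in known)
        if not votes:
            continue  # no annotated gene in this cluster, nothing to transfer
        best_class, _ = votes.most_common(1)[0]
        for member in members:
            if member not in known:
                predictions[member] = best_class
    return predictions


# Made-up example: the query sits close to g1 and g2, far from g3.
names = ["g1", "g2", "g3", "query"]
dist = {
    frozenset(("g1", "g2")): 0.10, frozenset(("g1", "query")): 0.20,
    frozenset(("g2", "query")): 0.15, frozenset(("g1", "g3")): 0.90,
    frozenset(("g2", "g3")): 0.90, frozenset(("g3", "query")): 0.90,
}
known = {"g1": "GO:A", "g2": "GO:A", "g3": "GO:B"}
predictions = predict(cluster(names, dist, cutoff=0.5), known)
```

Real tree-based tools model how function changes along the branches rather than taking a majority vote, so this only conveys the overall flow of the method.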

  13. Phylogenetic tree methods • These should outperform sequence homology methods (CAFA 2011?) • Require a set of related genes • Often much heavier calculations • Programs: • SIFTER (http://genome.cshlp.org/content/early/2011/07/22/gr.104687.109)

  14. Prediction with protein domains • Look at which protein domains the query protein contains (PFAM) • Map the functions that are linked to the domains to your query sequence • PFAM2GO • Programs: InterProScan + PFAM2GO • Drawbacks: • This mapping is the same in plant, mammal, bacteria • Many domains to specific function

  15. Prediction with protein domains • Benefits: • Can create an annotation from separate domains • Similar sequences do not have to be in the database • Programs (?): InterProScan (http://www.ebi.ac.uk/InterProScan/) • Drawbacks: • The mapping is the same in plant, mammal, bacteria • Many domains to specific function
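The domain-based approach boils down to a table lookup, which also makes the first drawback visible: the lookup knows nothing about the organism. The sketch below assumes the domains of the query have already been detected by a domain search; the domain names and GO labels in `DOMAIN_TO_GO` are hypothetical placeholders, not entries from a real mapping file.

```python
# Hypothetical domain -> GO table in the spirit of a PFAM2GO mapping.
# The keys and values are placeholders, not real accessions.
DOMAIN_TO_GO = {
    "DOMAIN_KINASE": {"GO:kinase_activity", "GO:ATP_binding"},
    "DOMAIN_SH2": {"GO:phosphotyrosine_binding"},
}


def annotate_from_domains(domains, mapping=DOMAIN_TO_GO):
    """Collect the GO classes linked to every domain found in the query protein."""
    terms = set()
    for domain in domains:
        # Domains without an entry in the table contribute nothing.
        terms |= mapping.get(domain, set())
    return terms


# Annotation assembled from two separate domains; the unknown one is ignored.
go_terms = annotate_from_domains(["DOMAIN_KINASE", "DOMAIN_SH2", "DOMAIN_UNKNOWN"])
```

Because the function takes no taxonomic argument, a plant, mammalian or bacterial protein with the same domains gets exactly the same annotation.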

  16. Prediction with patterns and motifs • Same principle as before, but we look at sequence patterns and motifs • Map the functions that are linked to patterns to your query sequence • Programs: • InterProScan • IBM BioDictionary (http://cbcsrv.watson.ibm.com/Tpa.html) • Drawbacks and benefits approximately the same as before
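Short motifs are often written as patterns that translate directly into regular expressions. As a small illustration, the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} becomes `N[^P][ST][^P]`; the function name and the example sequence are made up, and the lookahead is used only so that overlapping matches are not missed.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# Wrapped in a lookahead so overlapping occurrences are all reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (start_index, matched_text) for every occurrence of the motif."""
    return [(match.start(), match.group(1)) for match in pattern.finditer(sequence.upper())]


# Made-up sequence: NGSA matches, NPTL is rejected because of the proline.
sites = find_motif("MKNGSAANPTL")
```

Once the motif hits are known, linking them to functions is the same kind of table lookup as with protein domains.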

  17. Prediction with sequence features • Again the same principle as before • We look at sequence features (see picture) • These are given as input to a classifier algorithm (Support Vector Machine)
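To make "sequence features as classifier input" concrete, the sketch below turns a protein sequence into a fixed-length numeric vector. The chosen features (length, hydrophobic fraction, amino acid composition) and the residue set in `HYDROPHOBIC` are illustrative assumptions, far simpler than the secondary structure and low-complexity features mentioned earlier; the resulting vector is what a classifier such as a Support Vector Machine would be trained on.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative choice of hydrophobic residues; other groupings are in use.
HYDROPHOBIC = set("AVILMFWC")


def sequence_features(sequence):
    """Return [length, hydrophobic fraction, 20 amino acid frequencies]."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    hydrophobic_fraction = sum(1 for residue in sequence if residue in HYDROPHOBIC) / length
    composition = [sequence.count(residue) / length for residue in AMINO_ACIDS]
    return [float(length), hydrophobic_fraction] + composition


# Every protein maps to a vector of the same size, whatever its length.
vector = sequence_features("MKVLAAGIVGLLLA")
```

No alignment is involved at any point, which is why this family of methods can work when no similar sequence exists in the database.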

  18. Prediction with sequence features

  19. Prediction with sequence features • Benefits: • No actual sequence similarity needed • Info collected from vague similarities • Use of classifier => feature weighting • Program: FFPred (http://bioinf.cs.ucl.ac.uk/ffpred/) • Drawbacks: • Calculations probably quite heavy • No use of nearby sequence similarities (domains etc.)

  20. Our contribution: PANNZER • Use BLAST result list • Add taxonomic information • Score GO classes using a score that takes the frequency of the GO class in the sequence DB into account • Method is used to predict: • GO classes • Description line
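The slide does not give the actual PANNZER score, so the following is only a generic stand-in showing why the background frequency of a GO class matters: a hypergeometric tail probability, where `k` of the `n` BLAST hits carry a GO class that annotates `K` of the `N` sequences in the database. All names and numbers are illustrative.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) when n sequences are drawn from N, of which K carry the GO class."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Five of ten hits share a GO class: striking if the class is rare in the
# database, much less so if half of the database carries it.
p_rare = enrichment_pvalue(5, 10, 50, 10_000)
p_common = enrichment_pvalue(5, 10, 5_000, 10_000)
```

A score of this kind keeps very common GO classes from dominating the result list just because they appear in most BLAST hits anyway.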

  21. Our contribution: PANNZER • Benefits: • Taking the species taxonomy into account • Improved use of statistics • Not public yet

  22. Our contribution: No Name Yet • Take PFAM domain predictions, BLAST similarities and taxonomic information • Feed these to feature selection and to a classifier algorithm • …Wait… • Method is used to predict GO classes • Not public + testing is ongoing

  23. Conclusion • These methods are increasingly needed • Some methods exist • Unfortunately no clear evaluation (my opinion) • Remember: These are predictions. No certain info until they are tested in the wet lab…
