Automatic methods for functional annotation of sequences

Presentation Transcript


  1. Automatic methods for functional annotation of sequences. Petri Törönen

  2. What, Why, How??? • Functional annotation of sequence (seq.) • Definition of description line • Mapping seq. to functional categories • Simple solutions are error-prone • Review some available tools in the exercises

  3. Old, simple way • Do a Sequence Search (SS), like BLAST, with your sequence • Find the best match • Transfer all the info from the best match to your sequence • Everything done? Finished?
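The "old, simple way" can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the E-value cutoff of 1e-5, and the hit identifiers are assumptions made for the example, not the output format of BLAST or of any particular tool.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    # Hypothetical, simplified view of one row of a sequence search result.
    subject_id: str
    description: str
    evalue: float
    bitscore: float

def best_hit(hits, max_evalue=1e-5):
    """Return the highest-scoring hit under the E-value cutoff, or None."""
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return None
    return max(significant, key=lambda h: h.bitscore)

def transfer_annotation(query_id, hits):
    """Copy the description line of the best hit onto the query."""
    top = best_hit(hits)
    if top is None:
        return f"{query_id}: no significant match"
    return f"{query_id}: {top.description} (copied from {top.subject_id})"

# Invented example data: the best hit carries no useful description.
hits = [
    Hit("hit_A", "hypothetical protein", 1e-80, 310.0),
    Hit("hit_B", "ATP synthase subunit beta", 1e-75, 295.0),
]
print(transfer_annotation("query1", hits))
```

In this invented example the top-scoring hit is an uncharacterized sequence, so the query inherits an uninformative description even though the second hit is almost as good and is annotated.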

  4. Problems • First hit is an unknown seq. • First hit is a misannotated seq. (an increasing problem!!) • No significant matches found • Strong, but only local matches => impurities in search • Impurities in query seq.

  5. Why is manual analysis hard? • Large size of gene lists (SS result list) • False positives among observed results

  6. Why is manual analysis hard? • Each gene can have multiple functions - the important common theme among the genes can easily go unnoticed • Requires detailed knowledge of genes • Varying representations for the same function in description lines • Objectivity

  7. Gene Ontology (GO) www.geneontology.org • A controlled vocabulary of gene product roles in cells and the role associations • The roles can be applied to all organisms • Three main hierarchies: biological process, cellular component and molecular function include currently about 19,000 classes (=roles) - usually only a small portion of these classes is in use with one organism (example: chloroplast-related functions are important only within plants)

  8. Structure of GO • GO graph: hierarchical structure of linked nodes - each node represents one class that is part of its parental class • Directed Acyclic Graph (DAG) - a tree-like structure where branches can also merge when going from parental nodes to child nodes • Genes can be linked to many classes in the GO structure • [Figure: GO graph, running from the starting node (root of the hierarchical structure) through less detailed classes down to more detailed classes]
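A toy version of such a graph makes the difference from a plain tree concrete. The sketch below assumes a hand-written child-to-parents mapping with invented class names (`root`, `A` to `D`); it is not real GO data and does not use any GO library.

```python
# Toy DAG stored as child -> set of parents. All class names are invented.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},  # branches merge here: C has two parents
    "D": {"C"},
}

def ancestors(term, parents=PARENTS):
    """Return every class reachable from term by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct_terms, parents=PARENTS):
    """Expand a gene's direct classes with all of their ancestor classes."""
    full = set(direct_terms)
    for term in direct_terms:
        full |= ancestors(term, parents)
    return full

print(sorted(ancestors("D")))
print(sorted(propagate({"D"})))
```

Because each class is part of its parental classes, a gene linked to the detailed class `D` is implicitly a member of `C`, `A`, `B` and `root` as well, which is why counts per class are usually taken after this kind of propagation.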

  9. How GO helps • GO provides a terminology for presenting the known information about a gene • GO classifies genes according to their known/predicted functions • Classes represent varying detail • Classifications can be used to find over-represented functions in the results

  10. How GO helps • Look for over-represented GO classes in the gene list • We would like to ask: what is the probability of observing, by random chance, as many class members as we have in the cluster? • The solution from statistics is sampling without replacement (the hypergeometric distribution) • Sampling without replacement answers: in how many ways can 8 balls be selected from the whole data so that two of them are white and the rest are black?
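The ball-drawing question maps directly onto the hypergeometric distribution. The sketch below uses only the Python standard library; the slide gives the draw size (8) and the number of white balls drawn (2), while the population size of 20 and the 5 white balls in it are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k white balls when drawing n balls without
    replacement from N balls of which K are white."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_pvalue(k, N, K, n):
    """Probability of k or more white balls (upper tail), the usual
    over-representation p-value."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))

# Assumed numbers: 20 genes in total, 5 belong to the GO class (white),
# 8 genes in the result list, 2 of them in the class.
print(hypergeom_pmf(2, N=20, K=5, n=8))
print(enrichment_pvalue(2, N=20, K=5, n=8))
```

In the GO setting, white balls are the genes annotated to one class, the draw is the gene list, and a small upper-tail probability suggests the class is over-represented in the list.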

  11. Methods that predict protein function • Methods that summarize the SS result list • Methods that use profile searches • Methods that use sequence features • Methods based on sequence patterns • Methods based on sequence phylogeny

  12. SS list summarization • Consensus analysis of SS list • Do the SS • Look for repeatedly occurring descriptions / GO classes • Over-representation of GO classes (BLAST2GO) • Tools performing this: • Our method PANNZER (Koskinen et al. unpubl.) • BLAST2GO (http://www.blast2go.org/start_blast2go) • ConFunc
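The idea of looking for repeatedly occurring GO classes in a result list can be sketched as a simple frequency count. This is a minimal illustration and not the scoring used by PANNZER, BLAST2GO or ConFunc; the function name, the 50% threshold and the class labels are all assumptions.

```python
from collections import Counter

def recurring_classes(hit_annotations, min_fraction=0.5):
    """Return classes carried by at least min_fraction of the hits,
    most frequent first (ties broken alphabetically)."""
    n_hits = len(hit_annotations)
    if n_hits == 0:
        return []
    # Count each class at most once per hit.
    counts = Counter(term for terms in hit_annotations for term in set(terms))
    kept = [t for t, c in counts.items() if c / n_hits >= min_fraction]
    return sorted(kept, key=lambda t: (-counts[t], t))

# Invented annotations for four hits in a search result list.
hit_annotations = [
    {"class_a", "class_b"},
    {"class_a"},
    {"class_a", "class_c"},
    {"class_b"},
]
print(recurring_classes(hit_annotations))
```

A consensus of this kind is less sensitive to a single misannotated or unknown top hit than copying the best match, because one odd hit cannot outvote the rest of the list.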

  13. Profile search methods • Use profile searches instead of SS • Some positions are more conserved in the seq. • PFAM http://pfam.sanger.ac.uk/ • ConFunc http://www.sbg.bio.ic.ac.uk/~confunc/

  14. ConFunc in detail • BLAST search with query seq. • Obtain a result list • Seqs in the result list are clustered into groups of seqs with similar function (same GO classes) • Each cluster is used as a seed for a profile search • Test how well the query seq. matches each profile • Use link: http://www.sbg.bio.ic.ac.uk/confunc/indextemp.cgi
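The clustering step of this workflow can be illustrated as grouping result-list sequences by the GO classes they carry. The sketch below covers only that grouping; the profile building and profile search steps are left out, and the function name and data are hypothetical rather than taken from ConFunc itself.

```python
from collections import defaultdict

def cluster_by_go_class(hits):
    """Group sequence ids by GO class; a sequence with several classes
    ends up in several clusters."""
    clusters = defaultdict(list)
    for seq_id, go_terms in hits.items():
        for term in go_terms:
            clusters[term].append(seq_id)
    return {term: sorted(ids) for term, ids in clusters.items()}

# Invented result list: sequence id -> GO classes of that sequence.
hits = {
    "seq1": {"class_a"},
    "seq2": {"class_a", "class_b"},
    "seq3": {"class_b"},
}
for term, members in sorted(cluster_by_go_class(hits).items()):
    print(term, members)
```

Each resulting cluster would then serve as the seed for one profile, and the query would be scored against every profile.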

  15. Sequence feature methods • Look for sequence features • Features: secondary structure, protein domains • Compare sequences by looking at which features they have in common • Methods that do this: FACT http://www.cibiv.at/FACT/ • Limited search possibilities with FACT
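Comparing sequences by the features they have in common can be illustrated with a set-overlap score. The Jaccard index used below is a stand-in chosen for simplicity and is not claimed to be the scoring that FACT uses; the feature labels are invented.

```python
def shared_feature_score(features_a, features_b):
    """Jaccard index: shared features divided by all features seen."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented feature sets for a query and a database sequence.
query_features = {"domain_x", "domain_y", "helix_region"}
target_features = {"domain_y", "helix_region", "signal_peptide"}
print(shared_feature_score(query_features, target_features))
```

A score of this kind ignores residue-level similarity entirely, so two sequences with little detectable sequence identity can still be judged similar if they share the same domains and structural features.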

  16. Sequence pattern methods • Pattern => frequently observed short motif from seq. DB • InterProScan • BioDictionary from IBM Computational Biology (http://cbcsrv.watson.ibm.com/Tpa.html) • Extraction of most of the patterns from Swiss-Prot • Linking of each pattern to keywords seen in the seqs where the pattern was found • Query seq. is linked to keywords via the patterns it has
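The last two steps, linking patterns to keywords and then linking the query to keywords through its patterns, can be sketched with a plain dictionary. The motifs are treated as exact substrings and both the motifs and the keywords are invented placeholders; real pattern dictionaries are far larger and use richer pattern syntax.

```python
# Hypothetical pattern dictionary: short motif -> keywords seen in the
# database sequences that contain the motif.
PATTERN_KEYWORDS = {
    "GKST": {"keyword_1"},
    "HDEL": {"keyword_2"},
    "WWW": {"keyword_3"},
}

def keywords_for(query, pattern_keywords=PATTERN_KEYWORDS):
    """Collect the keywords of every pattern that occurs in the query."""
    found = set()
    for pattern, keywords in pattern_keywords.items():
        if pattern in query:
            found |= keywords
    return found

print(sorted(keywords_for("MAGKSTLLVAHDEL")))
```

The query never has to resemble any single database sequence as a whole; it only needs to contain patterns that were informative in the database.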

  17. Phylogeny-based methods • In short: include the species tree in the annotation of the sequences • Evolutionary distance is taken into account • Compara from ENSEMBL • http://www.ebi.ac.uk/GOA/compara_go_annotations.html

  18. Tip for testing the tools • For testing with a purely random sequence • http://www.bioinformatics.org/sms2/random_protein.html • For testing with a partially random sequence • http://www.bioinformatics.org/sms2/mutate_protein.html
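Test sequences of both kinds can also be produced locally. The sketch below is an assumption-laden stand-in for the linked web tools, not a reimplementation of them: it draws residues uniformly from the 20 standard amino acids and mutates a chosen number of distinct positions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def random_protein(length, rng):
    """Purely random sequence with uniform residue frequencies."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def mutate_protein(seq, n_mutations, rng):
    """Replace n_mutations distinct positions with a different residue."""
    residues = list(seq)
    for pos in rng.sample(range(len(seq)), n_mutations):
        alternatives = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(alternatives)
    return "".join(residues)

rng = random.Random(0)  # fixed seed so a test run can be repeated
purely_random = random_protein(50, rng)
partially_random = mutate_protein(purely_random, 10, rng)
print(purely_random)
print(partially_random)
```

A purely random sequence should yield no confident annotation from a well-behaved tool, while a real protein mutated at an increasing number of positions shows how quickly each tool's predictions degrade.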
