

Adding GO for Large Datasets

COST Functional Modeling Workshop

22-24 April, Helsinki

Large Datasets
  • RNA-Seq and similar data sets:
    • large data sets
    • often there is little functional information available
    • many enrichment analysis tools will not accept large gene lists
  • RNA-Seq data sets also contain “novel” genes
1. Finding Existing GO
  • Use GOProfiler to search based upon taxon or name.
  • Check the GO Consortium Website to see if your species of interest has an active annotation effort.
    • or to determine which related species may have GO annotations that can be transferred
  • Use QuickGO or GOProfiler to download existing GO annotations.
  • Add your own GO annotations…
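Once annotations have been downloaded, a quick summary of evidence codes helps judge how much of a species' annotation is experimental and therefore a candidate for transfer. The following is a minimal sketch, assuming the download is a tab-delimited GAF 2.x file (evidence code in column 7, taxon in column 13). The function names `read_gaf` and `evidence_summary` are hypothetical, not part of QuickGO or GOProfiler.

```python
from collections import Counter

# Experimental evidence codes defined by the GO Consortium.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def read_gaf(lines):
    """Yield (object_id, go_id, evidence, aspect, taxon) from GAF 2.x lines.

    Comment lines (starting with '!') and short rows are skipped.
    """
    for line in lines:
        if line.startswith("!"):
            continue
        row = line.rstrip("\n").split("\t")
        if len(row) < 15:
            continue
        yield row[1], row[4], row[6], row[8], row[12]


def evidence_summary(lines, taxon=None):
    """Count annotations per evidence code, optionally for one taxon.

    `taxon` is a string such as "taxon:9031"; the GAF taxon column may
    hold two IDs separated by '|', so each part is checked.
    """
    counts = Counter()
    for _obj, _go, evidence, _aspect, tax in read_gaf(lines):
        if taxon is None or taxon in tax.split("|"):
            counts[evidence] += 1
    return counts


def experimental_fraction(counts):
    """Fraction of counted annotations that carry an experimental code."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    experimental = sum(n for code, n in counts.items() if code in EXPERIMENTAL_CODES)
    return experimental / total
```

With a real file, `evidence_summary(open("annotations.gaf"), "taxon:9031")` would return the per-code counts for chicken; the file name here is a placeholder.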
2. Adding High-throughput GO

  • Input: nucleotide FASTA file (ntfastafile)
  • EMBOSS Transeq (or similar) → amino acid FASTA file (aafastafile)
  • InterProScan → list of motifs and domains → InterPro2GO → GO association file (IEA, ND)
  • GOanna, Blast2GO, etc., searching a BLAST database of EXP GO annotations for related species → GO association file (ISA)
  • Additional input: species’ taxon ID
  • Combine to make a single GO annotation file

Note: AgBase & iPlant are working to make these tools freely available via the AgBase & iPlant websites.
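The first step of the workflow above, turning nucleotide sequences into amino acid sequences, can be illustrated with a small standard-library sketch. It assumes the standard genetic code and defines an ORF as a stop-free stretch beginning at the first methionine, searched in all six frames. It is not how EMBOSS Transeq works internally, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one forward frame; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def longest_orf_protein(seq):
    """Return the longest M-initiated, stop-free peptide over six frames.

    Returns an empty string if no frame contains a methionine.
    """
    best = ""
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            for segment in translate(strand, frame).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best
```

Requiring a start codon is a design choice. Partial transcripts from RNA-Seq assemblies may lack one, in which case taking the longest stop-free segment regardless of methionine may be more appropriate.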

Comments
  • Translating transcripts to proteins:
    • many different programs are available
    • most assume proteins > 100 aa
    • assume that the protein is translated from the longest ORF
    • EMBOSS: free and high-throughput; also available on Galaxy and iPlant
  • InterProScan:
    • searches sequences for conserved domains and motifs
    • computationally intensive (needs HPC)
    • online tools at EBI: limited to proteins, low throughput
    • iPlant is preparing an instance
    • AgBase can help
  • InterPro2GO:
    • script that converts InterPro IDs into their corresponding GO IDs
    • available at geneontology.org
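The InterPro-to-GO conversion is essentially a table lookup. Below is a rough sketch, assuming the mapping file uses the usual external2go layout, with lines of the form `InterPro:IPR000003 <name> > GO:<term name> ; GO:0003677` and comment lines starting with `!`. The function names and the `(protein_id, interpro_id)` input shape are illustrative assumptions, not the actual InterPro2GO script.

```python
import re

# One mapping per line: InterPro accession at the start, GO ID after the last ';'.
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines):
    """Build {interpro_id: set of GO IDs} from mapping-file lines."""
    mapping = {}
    for line in lines:
        if line.startswith("!"):
            continue  # header / comment line
        match = MAPPING_LINE.match(line.strip())
        if match:
            mapping.setdefault(match.group(1), set()).add(match.group(2))
    return mapping


def go_for_proteins(domain_hits, mapping):
    """Map proteins to GO IDs via their InterPro hits.

    `domain_hits` is an iterable of (protein_id, interpro_id) pairs, e.g.
    extracted from InterProScan output. Proteins whose domains have no
    mapped GO term end up with an empty set.
    """
    result = {}
    for protein, interpro_id in domain_hits:
        result.setdefault(protein, set()).update(mapping.get(interpro_id, ()))
    return result
```

Proteins that come back with an empty set are the ones that would receive a “no data” (ND) annotation in the association file.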
Comments
  • Adding GO using BLAST:
    • Need to identify related species that have experimental GO annotations
    • Search a database of experimental GO annotations (do not transfer annotations with IEA, ISS, etc. evidence codes)
    • Use a test set of sequences to identify BLAST parameters (e.g. E-value (expect) threshold, etc.) for the full dataset
  • Combining GO from InterProScan & BLAST:
    • Remove any duplicate annotations derived from both InterProScan (IEA) and BLAST (ISA).
    • Remove any “no data” (ND) annotations where you have added an annotation using BLAST.
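The two combining rules above can be sketched as a small merge function. This sketch makes two assumptions that the slides do not specify: when the same gene and GO term appear in both sets, the BLAST (ISA) annotation is the one kept, and an ND annotation is dropped only when BLAST supplied an annotation for the same gene in the same ontology aspect. The `Annotation` record and `combine` function are hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str  # "IEA", "ISA" or "ND"
    aspect: str    # "P" (process), "F" (function) or "C" (component)


def combine(interpro, blast):
    """Merge InterProScan-derived and BLAST-derived annotations.

    Assumptions (not from the source): ISA wins over a duplicate IEA,
    and ND is removed per gene *and* aspect.
    """
    blast_terms = {(a.gene, a.go_id) for a in blast}
    blast_aspects = {(a.gene, a.aspect) for a in blast}

    merged = list(dict.fromkeys(blast))  # drop exact repeats, keep order
    seen = set(merged)
    for ann in interpro:
        if ann in seen:
            continue
        if ann.evidence == "ND":
            if (ann.gene, ann.aspect) in blast_aspects:
                continue  # BLAST added real data for this aspect
        elif (ann.gene, ann.go_id) in blast_terms:
            continue  # same gene/term already annotated via BLAST
        seen.add(ann)
        merged.append(ann)
    return merged
```

If the IEA annotation should be preferred over the ISA one, the duplicate check would be applied to the BLAST list instead.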

Note: GO IEA annotations are continually updated (by manual review) and are considered out of date after one year.