
Data Mining in Ensembl with BioMart




  1. Data Mining in Ensembl with BioMart Giulietta Spudich

  2. Simple Text-based Search Engine

  3. ‘Mouse Gene’ Gives Us Results

  4. A More Complex Query is Not as Useful

  5. BioMart: Data Mining • BioMart is a search engine that can retrieve several kinds of data at once and return them as a table. • For example: mouse gene IDs, chromosome, and base-pair position. • No programming required!

  6. General or Specific Data-Tables • All the genes for one species • Or… only genes on one specific region of a chromosome • Or… genes on one region of a chromosome associated with a disease

  7. The First Step: Choose the Dataset

  8. The Second Step: Filters. Filters define which genes we are looking at.

  9. Attributes Attach Information. Attributes determine the output columns.

  10. Results: tables or sequences

  11. Query: • For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes? • In the query: Filters: what we know. Attributes: what we want to know.
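Although the web interface needs no programming, the same query can also be written down as a BioMart XML query. The sketch below is only an illustration of how filters and attributes map onto such a query. The dataset name (`mmusculus_gene_ensembl`), the filter names (`chromosome_name`, `biotype`) and the attribute names (`ensembl_gene_id`, `mgi_symbol`, `go_id`) are assumptions that do not appear in the slides and may differ between releases. The Illumina probe attribute is left out because its internal name is not given here.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query as a string.

    filters    -- dict of filter name -> value ("what we know")
    attributes -- list of attribute names ("what we want to know")
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are assumed, not taken from the slides.
xml_query = build_query(
    dataset="mmusculus_gene_ensembl",
    filters={"chromosome_name": "10", "biotype": "protein_coding"},
    attributes=["ensembl_gene_id", "mgi_symbol", "go_id"],
)
print(xml_query)
```

Each filter narrows the gene set and each attribute becomes one output column, which mirrors the choices made by clicking in the interface.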

  14. A Brief Example. Change the dataset to mouse (Mus musculus).

  15. Select the Genes with Filters. Click ‘Filters’, then expand the ‘REGION’ panel. We are looking for mouse genes on chromosome 10 that are protein coding.

  16. Filters (selecting the genes): Set the chromosome to 10.

  17. Filters (selecting the genes): Select ‘protein coding’ in the ‘GENE’ section, then click ‘Attributes’.

  18. Attributes (Output Options): Expand the ‘EXTERNAL’ panel for non-Ensembl IDs. We would like GO terms and IDs in MGI (the Mouse Genome Informatics site).

  19. Attributes (Output): Scroll down to add ‘Illumina v1’ probes that map to these genes, then click ‘Results’.

  20. The Results Table: Preview. ‘Results’ shows gene IDs, GO terms, and Illumina probes for all protein-coding mouse genes on chromosome 10. For the full result table, click ‘Go’ or view ‘ALL’ rows.

  21. Full Result Table. Columns: Ensembl gene and transcript IDs, GO terms, MGI symbol, Illumina probes.
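An exported result table typically repeats a gene on several rows, one per GO term or probe. As a minimal sketch of post-processing such a table, assuming a tab-separated export with a header row, the function below collects the values of one column per gene. The column headers and every value in `sample` are made-up placeholders, not real BioMart output.

```python
import csv
import io
from collections import defaultdict


def group_by_gene(tsv_text, key, column):
    """Collect the distinct, non-empty values of `column` for each `key` value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        values = grouped[row[key]]      # creates an empty list for a new gene
        value = row[column]
        if value and value not in values:
            values.append(value)
    return dict(grouped)


# Placeholder table: headers and values are invented for illustration only.
sample = (
    "Gene ID\tGO ID\n"
    "GENE_A\tGO_X\n"
    "GENE_A\tGO_Y\n"
    "GENE_B\t\n"
)

go_terms = group_by_gene(sample, key="Gene ID", column="GO ID")
print(go_terms)
```

Genes with no value in the chosen column are kept with an empty list, so they are not silently dropped from the summary.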

  22. Original Query: • For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. Are there Illumina probes and GO IDs for these genes? • In the query: Filters: what we know. Attributes: columns in the Result Table.

  23. Other Export Options (Attributes) • Sequences: UTRs, flanking sequences, cDNA, peptides, etc. • Gene IDs from Ensembl and external sources (MGI, Entrez, etc.) • Microarray data • Protein functions/descriptions (InterPro, GO) • Orthologous gene sets • SNP/variation data

  24. BioMart Data Sets • Ensembl genes • Vega genes • SNPs • Compara (homologues and alignments)

  25. BioMart around the world… BioMart started at Ensembl…To where has it travelled?

  26. Central Server www.biomart.org

  27. WormBase

  28. HapMap: population frequencies, inter-population comparisons, gene annotation

  29. DictyBase

  30. GRAMENE Rice, Maize, Arabidopsis genomes…

  31. How to Get There • http://www.biomart.org/biomart/martview • http://www.ensembl.org/biomart/martview • Or click on ‘BioMart’ from Ensembl
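Besides the `martview` pages listed above, BioMart servers have generally exposed a `martservice` address that accepts an XML query in a `query` parameter. The sketch below only builds such a URL and sends nothing. The `martservice` path is an assumption that does not appear in the slides, and the short XML string is a placeholder rather than a complete query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint: same host as the martview URL, different path.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def query_url(xml_query, base=MARTSERVICE):
    """Return a URL carrying the XML query, percent-encoded, in the 'query' parameter."""
    return base + "?" + urlencode({"query": xml_query})


# Placeholder XML; a real query would also list filters and attributes.
placeholder_xml = '<Query><Dataset name="mmusculus_gene_ensembl"/></Query>'
url = query_url(placeholder_xml)
print(url)

# Decoding the URL gives back the original XML unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Percent-encoding matters here because the XML contains characters such as `<`, `"` and spaces that are not valid in a raw URL.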

  32. The Flow • Choose Dataset (All genes for a species) • Choose Filters (narrows the gene set) • Choose Attributes (output options)

  33. BioMart team • Arek Kasprzyk • Syed Haider • Richard Holland • Damian Smedley
