Data Mining with BioMart
PowerPoint slideshow about 'Data Mining with BioMart' - brigid



www.ensembl.org/biomart/martview

www.biomart.org/biomart/martview


What is BioMart?

  • A data export tool

  • A quick table generator

  • A web interface to mine Ensembl data


BioMart: Data Mining

  • BioMart is a search engine that retrieves several types of data at once and arranges them in a table.

  • For example: mouse gene IDs together with their chromosome and base-pair position.

  • No programming required!
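The same kind of table can also be requested from a script. The sketch below models a query as one dataset plus a list of output attributes and serialises it to BioMart's XML query format. The dataset name `mmusculus_gene_ensembl`, the attribute names and the XML element layout are assumptions based on Ensembl's BioMart conventions; the slides themselves do not specify them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class MartQuery:
    """A BioMart-style query: one dataset, optional filters, output attributes."""
    dataset: str
    attributes: list[str]
    filters: dict[str, str] = field(default_factory=dict)

    def to_xml(self) -> str:
        # Assumed layout: <Query><Dataset><Filter/>...<Attribute/>...</Dataset></Query>
        query = ET.Element("Query", virtualSchemaName="default",
                           formatter="TSV", header="1", uniqueRows="1")
        dataset = ET.SubElement(query, "Dataset", name=self.dataset,
                                interface="default")
        for name, value in self.filters.items():
            ET.SubElement(dataset, "Filter", name=name, value=value)
        for name in self.attributes:
            ET.SubElement(dataset, "Attribute", name=name)
        return ET.tostring(query, encoding="unicode")


# Mouse gene IDs with chromosome and base-pair position; no filters = all genes.
mouse_table = MartQuery(
    dataset="mmusculus_gene_ensembl",
    attributes=["ensembl_gene_id", "chromosome_name",
                "start_position", "end_position"],
)
print(mouse_table.to_xml())
```

Each attribute becomes one column of the output table, which mirrors what the web interface builds when you tick attributes.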


General or Specific Data Tables

  • All the genes for one species

  • Or… only genes on one specific region of a chromosome

  • Or… let BioMart select the genes for you

    (e.g. all transcripts that match a microarray probe set, GO term, or InterPro domain).
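In a scripted query, the three scopes above differ only in their filters. This minimal sketch returns filter name/value pairs, where an empty result means "all genes". The filter names `chromosome_name`, `start`, `end` and `affy_hg_u133_plus_2` are assumed Ensembl BioMart names. The coordinates and probe set IDs in the usage lines are placeholders, not real data.

```python
from typing import Optional


def build_filters(chromosome: Optional[str] = None,
                  start: Optional[int] = None,
                  end: Optional[int] = None,
                  probe_ids: Optional[list[str]] = None) -> dict[str, str]:
    """Return BioMart filter name -> value pairs; an empty dict means 'all genes'."""
    filters: dict[str, str] = {}
    if chromosome is not None:
        filters["chromosome_name"] = chromosome
        # start/end are only applied when both are given.
        if start is not None and end is not None:
            if start > end:
                raise ValueError("region start must not be greater than end")
            filters["start"] = str(start)
            filters["end"] = str(end)
    if probe_ids:
        # Several values for one filter are joined into a comma-separated list.
        filters["affy_hg_u133_plus_2"] = ",".join(probe_ids)
    return filters


print(build_filters())                                   # whole species
print(build_filters("7", 1_000_000, 2_000_000))          # placeholder region
print(build_filters(probe_ids=["PROBE_A", "PROBE_B"]))   # hypothetical probe IDs
```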


Results

Tables or sequences


The First Step: Choose the Dataset

Dataset: Current Ensembl, Human genes
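Behind the menu choice, each Ensembl gene dataset has an internal name. The sketch assumes the common convention: the first letter of the genus, then the species epithet, then `_gene_ensembl`. Some datasets (for example strains or subspecies) may not follow this pattern, so treat it as a convenience rather than a rule.

```python
def ensembl_gene_dataset(binomial: str) -> str:
    """Map 'Genus species' to an Ensembl gene dataset name (assumed convention)."""
    parts = binomial.strip().lower().split()
    if len(parts) < 2:
        raise ValueError("expected a binomial name such as 'Homo sapiens'")
    genus, species = parts[0], parts[1]
    # First letter of the genus + full species epithet + fixed suffix.
    return f"{genus[0]}{species}_gene_ensembl"


print(ensembl_gene_dataset("Homo sapiens"))   # hsapiens_gene_ensembl
print(ensembl_gene_dataset("Mus musculus"))   # mmusculus_gene_ensembl
```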


The Second Step: Filters

Filters: Define a gene set


Attributes attach information

Attributes: Determine output columns


Query

For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)


  • In the query:

    Filters: what we know

    Attributes: what we want to know.
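The CFTR question can be written as a BioMart XML query. "What we know" goes into a `Filter` element and "what we want to know" goes into `Attribute` elements. All the names here are assumptions: the dataset `hsapiens_gene_ensembl`, the filter `hgnc_symbol`, and the attributes `ensembl_gene_id`, `ensembl_transcript_id`, `entrezgene_id` and `affy_hg_u133_plus_2`. Attribute names have changed between Ensembl releases (older releases used `entrezgene`), so check them against the release you query.

```python
import xml.etree.ElementTree as ET

KNOWN = {"hgnc_symbol": "CFTR"}                      # filters: what we know
WANTED = ["ensembl_gene_id", "ensembl_transcript_id",
          "entrezgene_id", "affy_hg_u133_plus_2"]    # attributes: what we want to know


def cftr_query_xml() -> str:
    """Build the CFTR query as BioMart-style XML (assumed element layout)."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    dataset = ET.SubElement(query, "Dataset", name="hsapiens_gene_ensembl",
                            interface="default")
    for name, value in KNOWN.items():
        ET.SubElement(dataset, "Filter", name=name, value=value)
    for name in WANTED:
        ET.SubElement(dataset, "Attribute", name=name)
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


print(cftr_query_xml())
```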



A Brief Example

Use the current Ensembl release (archives are also available).

Select the dataset ‘Homo sapiens genes’.


Select the Genes with Filters

Click ‘Filters’.

Expand the ‘GENE’ panel to enter the gene ID(s).


Filters (and Count)

Change the ID type to ‘HGNC curated name’ and enter “CFTR” in the box.

Click “Count” to see how many genes pass your filters.
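"Count" has a scripted counterpart. BioMart's XML query format has a `count` attribute on the `Query` element that, when set to `1`, asks the server for the number of matching entries instead of the rows themselves. The sketch assumes that behaviour and only rewrites the XML locally. The example query string is hypothetical.

```python
import xml.etree.ElementTree as ET

GENE_QUERY = ('<Query virtualSchemaName="default" formatter="TSV">'
              '<Dataset name="hsapiens_gene_ensembl" interface="default">'
              '<Filter name="hgnc_symbol" value="CFTR"/>'
              '<Attribute name="ensembl_gene_id"/>'
              '</Dataset></Query>')


def as_count_query(query_xml: str) -> str:
    """Return a copy of the query that requests a count instead of rows (assumed flag)."""
    root = ET.fromstring(query_xml)
    root.set("count", "1")
    return ET.tostring(root, encoding="unicode")


print(as_count_query(GENE_QUERY))
```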


Attributes (Output Options)

Click ‘Attributes’.

‘Attributes’ lets you choose which information to output.


Attributes (Output Options)

Select ‘EntrezGene ID’


Attributes (Output Options)

Select the Affy Platform ‘HG U133-PLUS-2’ in the ‘Microarray’ section


The Results Table - Preview

For the full results table, click “Go” or view “ALL” rows.


Full Result Table

The full table lists the Ensembl Gene ID for CFTR, its Ensembl Transcript IDs, the EntrezGene ID and the matching Affy HG probesets.
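BioMart returns one row per combination of gene, transcript and probeset, so the same gene ID repeats down the table. If you export the table as tab-separated text, a few lines of Python can collapse it to one list of probesets per gene. The column headers below are assumed to match the slide's labels (real exports may use different header text), and the rows are placeholders, not real CFTR data.

```python
import csv
import io
from collections import defaultdict


def probesets_by_gene(tsv_text: str) -> dict[str, list[str]]:
    """Collapse BioMart-style rows (one per gene/transcript/probeset combination)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped: dict[str, set[str]] = defaultdict(set)
    for row in reader:
        probe = (row.get("Affy HG probeset") or "").strip()
        if probe:  # transcripts without a matching probeset leave the column empty
            grouped[row["Ensembl Gene ID"]].add(probe)
    return {gene: sorted(probes) for gene, probes in grouped.items()}


SAMPLE = (
    "Ensembl Gene ID\tEnsembl Transcript ID\tEntrezGene ID\tAffy HG probeset\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_B\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_A\n"
    "GENE_1\tTRANSCRIPT_2\tENTREZ_1\tPROBE_A\n"
    "GENE_1\tTRANSCRIPT_3\tENTREZ_1\t\n"
)
print(probesets_by_gene(SAMPLE))  # {'GENE_1': ['PROBE_A', 'PROBE_B']}
```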


Other Export Options (Attributes)

  • Sequences: UTRs, flanking sequences, cDNA, peptides, etc.

  • Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)

  • Microarray data

  • Protein functions and descriptions (InterPro, GO)

  • Orthologous gene sets

  • SNP/variation data
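These attribute groups matter when you script queries. In Ensembl's BioMart, attributes are organised into pages (features, sequences, homologues, variation and so on), and a single query cannot combine attributes from different pages. The sketch checks that constraint against a small page-membership table. The table and its page labels are hypothetical and simplified; they are not Ensembl's actual configuration.

```python
# Hypothetical, simplified page membership; real BioMart configurations differ.
ATTRIBUTE_PAGES: dict[str, set[str]] = {
    "ensembl_gene_id": {"features", "sequences", "homologs", "variation"},
    "entrezgene_id": {"features"},
    "affy_hg_u133_plus_2": {"features"},
    "cdna": {"sequences"},
    "mmusculus_homolog_ensembl_gene": {"homologs"},
}


def shared_pages(attributes: list[str]) -> set[str]:
    """Pages on which every requested attribute is available (empty = cannot combine)."""
    if not attributes:
        return set()
    return set.intersection(*(ATTRIBUTE_PAGES[name] for name in attributes))


print(shared_pages(["ensembl_gene_id", "entrezgene_id", "affy_hg_u133_plus_2"]))  # {'features'}
print(shared_pages(["entrezgene_id", "cdna"]))  # set()
```

An empty result suggests splitting the request into one query per page and merging the outputs on a shared ID.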


BioMart around the world…

BioMart started at Ensembl… where has it travelled since?


Central Portal

www.biomart.org



HapMap

  • Population frequencies

  • Inter-population comparisons

  • Gene annotation



Gramene

www.gramene.org



How to Get There

http://www.biomart.org/biomart/martview

http://www.ensembl.org/biomart/martview

  • Or click ‘BioMart’ on the Ensembl website
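Besides the martview web pages, Ensembl offers a programmatic endpoint that accepts an XML query in a `query` parameter. The sketch assumes that endpoint is `https://www.ensembl.org/biomart/martservice` and only builds the request URL. `fetch_tsv` is a hypothetical helper: it is defined but not called, because it needs network access.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

MARTSERVICE = "https://www.ensembl.org/biomart/martservice"  # assumed endpoint


def martservice_url(query_xml: str, base: str = MARTSERVICE) -> str:
    # urlencode percent-escapes the XML so it travels as a single query parameter.
    return f"{base}?{urlencode({'query': query_xml})}"


def fetch_tsv(query_xml: str, timeout: float = 60.0) -> str:
    """Hypothetical helper: send the query and return the server's text reply."""
    with urlopen(martservice_url(query_xml), timeout=timeout) as response:
        return response.read().decode("utf-8")


example = ('<Query virtualSchemaName="default" formatter="TSV">'
           '<Dataset name="hsapiens_gene_ensembl" interface="default">'
           '<Filter name="hgnc_symbol" value="CFTR"/>'
           '<Attribute name="ensembl_gene_id"/></Dataset></Query>')
url = martservice_url(example)
print(url)
# Decoding the parameter again recovers the original XML unchanged.
print(parse_qs(urlparse(url).query)["query"][0] == example)
```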


Worked Example

  • Follow the worked example on page 26.

  • Then do the exercises on page 34 (answers on page 37).

This module should show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.


Ensembl Core Databases

Relational database

Normalised: each data point stored only once

Therefore:

  • Quick updates

  • Minimal storage requirements

But:

  • Many tables

  • Many joins for complicated queries

  • Slow for data mining applications
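To see why normalisation costs joins, here is a toy normalised schema in SQLite. The tables are hypothetical and far simpler than the real Ensembl core schema, and all IDs are placeholders. Each gene symbol and Entrez cross-reference is stored once, but answering a CFTR-style question needs three joins across four tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene       (gene_id INTEGER PRIMARY KEY, stable_id TEXT, symbol TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER, stable_id TEXT);
CREATE TABLE gene_xref  (gene_id INTEGER, db TEXT, accession TEXT);
CREATE TABLE probe      (transcript_id INTEGER, probeset TEXT);

INSERT INTO gene       VALUES (1, 'GENE_1', 'SYMBOL_1');
INSERT INTO transcript VALUES (10, 1, 'TRANSCRIPT_1'), (11, 1, 'TRANSCRIPT_2');
INSERT INTO gene_xref  VALUES (1, 'EntrezGene', 'ENTREZ_1');
INSERT INTO probe      VALUES (10, 'PROBE_A'), (11, 'PROBE_B');
""")

# Symbol and Entrez ID are stored once, so the query must join four tables.
rows = conn.execute("""
SELECT g.stable_id, t.stable_id, x.accession, p.probeset
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN gene_xref x  ON x.gene_id = g.gene_id AND x.db = 'EntrezGene'
JOIN probe p      ON p.transcript_id = t.transcript_id
WHERE g.symbol = ?
ORDER BY t.stable_id
""", ("SYMBOL_1",)).fetchall()
for row in rows:
    print(row)
```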



BioMart Database

Data warehouse

De-normalised and query-optimised

Therefore:

  • Fast and flexible

  • Ideal for data mining

But:

  • Tables with apparent “redundancy”

  • Must be rebuilt from scratch from the normalised core databases for every release
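The warehouse side of the comparison can be sketched the same way. Here, one wide, pre-joined table answers the question with a single-table lookup; its columns are hypothetical and all IDs are placeholders. The price is that the gene ID, symbol and Entrez ID repeat on every row. Because the table is derived data, it has to be regenerated whenever the source tables change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide, pre-joined table: one row per gene/transcript/probeset combination.
conn.execute("""
CREATE TABLE gene_main (
    gene_stable_id TEXT, symbol TEXT, transcript_stable_id TEXT,
    entrezgene_id TEXT, affy_probeset TEXT
)""")
conn.executemany(
    "INSERT INTO gene_main VALUES (?, ?, ?, ?, ?)",
    [
        ("GENE_1", "SYMBOL_1", "TRANSCRIPT_1", "ENTREZ_1", "PROBE_A"),
        ("GENE_1", "SYMBOL_1", "TRANSCRIPT_2", "ENTREZ_1", "PROBE_B"),
    ],
)

# No joins: filter on one column and read the attribute columns directly.
rows = conn.execute("""
SELECT gene_stable_id, transcript_stable_id, entrezgene_id, affy_probeset
FROM gene_main WHERE symbol = ? ORDER BY transcript_stable_id
""", ("SYMBOL_1",)).fetchall()
print(rows)

# The apparent redundancy: the same gene ID sits on every row for that gene.
repeats = conn.execute(
    "SELECT COUNT(*) FROM gene_main WHERE gene_stable_id = 'GENE_1'").fetchone()[0]
print(repeats)  # 2
```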



Information Flow

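Figure: information flow from DATASET (species, focus) through FILTER to ATTRIBUTES. Filter and attribute categories: region, gene, expression, homology, protein, SNP. Other labels: SwissProt, EMBL, RefSeq, GO, InterPro, Affymetrix (external references) and FASTA, GTF, HTML, text, Excel, file (output formats).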
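The last stage of the flow turns the selected attributes into an output format. As a local illustration (not BioMart's own formatter), the sketch writes result rows as TSV or CSV and sequence records as FASTA. The function names, the line width and the placeholder data are assumptions.

```python
import csv
import io


def render_table(header: list[str], rows: list[list[str]], fmt: str = "TSV") -> str:
    """Write rows as tab- or comma-separated text with a header line."""
    if fmt not in ("TSV", "CSV"):
        raise ValueError(f"unsupported table format: {fmt}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t" if fmt == "TSV" else ",",
                        lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def render_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Write (identifier, sequence) pairs as FASTA, wrapping sequences at `width`."""
    lines: list[str] = []
    for ident, seq in records:
        lines.append(f">{ident}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


print(render_table(["gene", "probeset"], [["GENE_1", "PROBE_A"]]), end="")
print(render_fasta([("SEQ_1", "ACGTACGT")], width=4), end="")
```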

