Data mining with biomart

Data Mining with BioMart

www.ensembl.org/biomart/martview

www.biomart.org/biomart/martview


What is biomart

What is BioMart?

  • A data export tool

  • A quick table generator

  • A web interface to mine Ensembl data


Biomart data mining

BioMart: Data Mining

  • BioMart is a search engine that can find multiple terms and put them into a table format.

  • For example: mouse gene IDs, chromosome, and base-pair position

  • No programming required!


General or specific data tables

General or Specific Data-Tables

  • All the genes for one species

  • Or… only genes on one specific region of a chromosome

  • Or… make BioMart select genes

    (i.e., all transcripts that match a microarray probe set, GO term, or InterPro domain).


Results

Results

Tables or sequences


The first step choose the dataset

The First Step: Choose the Dataset

Dataset: Current Ensembl, Human genes


The second step filters

The Second Step: Filters

Filters: Define a gene set


Attributes attach information

Attributes Attach Information

Attributes: Determine output columns


Query

Query

For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)


  • In the query:

    Filters: what we know

    Attributes: what we want to know.
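
The same filters-and-attributes split carries over to BioMart's XML query format, which can be used instead of the web form. The following is a minimal sketch, not the official client: the helper `build_query` is hypothetical, and the internal names (`hsapiens_gene_ensembl`, `hgnc_symbol`, `ensembl_gene_id`, `entrezgene_id`, `affy_hg_u133_plus_2`) are assumptions that may differ between Ensembl releases, so check them against the MartView interface before relying on them.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string (hypothetical helper).

    filters:    dict of filter name -> value  (what we know)
    attributes: list of attribute names       (what we want to know)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All internal names below are assumptions; verify them in MartView.
xml_query = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},
    attributes=["ensembl_gene_id", "entrezgene_id", "affy_hg_u133_plus_2"],
)
print(xml_query)
```

The filter carries the one thing we know (the gene name), and each attribute becomes one column of the output table.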



A brief example

A Brief Example

Use the current Ensembl (archives are also available)

Select ‘Homo sapiens genes’.


Select the genes with filters

Select the Genes with Filters

Click ‘Filters’.

Expand the ‘GENE’ panel to enter the gene ID(s).


Filters and count

Filters (and Count)

Change the ID type in the drop-down to HGNC curated name. Enter “CFTR” in the box.

Click “Count” to see how many genes pass your filters.


Attributes output options

Attributes (Output Options)

Click on ‘Attributes’

‘Attributes’ allows you to output information.


Attributes output options1

Attributes (Output Options)

Select ‘EntrezGene ID’


Attributes output options2

Attributes (Output Options)

Select the Affy Platform ‘HG U133-PLUS-2’ in the ‘Microarray’ section


The results table preview

The Results Table - Preview

For the full result table: click “Go”, or set View to “ALL” rows.


Full result table

Full Result Table

Ensembl Transcript IDs

Affy HG probeset

EntrezGene ID

Ensembl Gene ID for CFTR
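
An exported table has one row per combination of transcript and probeset, so the same gene ID repeats across rows. As a rough sketch of how such a tab-separated export could be post-processed, the snippet below groups probesets by gene. The column headers and every identifier in `tsv_text` are placeholders invented for illustration, not real Ensembl, Entrez, or Affymetrix IDs.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows standing in for an exported table; not real identifiers.
tsv_text = (
    "Gene ID\tTranscript ID\tEntrezGene ID\tAffy probeset\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_A\n"
    "GENE_1\tTRANSCRIPT_1\tENTREZ_1\tPROBE_B\n"
    "GENE_1\tTRANSCRIPT_2\tENTREZ_1\tPROBE_A\n"
)


def probesets_by_gene(text):
    """Collect the distinct probesets reported for each gene ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    grouped = defaultdict(set)
    for row in reader:
        if row["Affy probeset"]:  # skip rows without a matching probeset
            grouped[row["Gene ID"]].add(row["Affy probeset"])
    return {gene: sorted(probes) for gene, probes in grouped.items()}


result = probesets_by_gene(tsv_text)
print(result)
```

Collecting into a set removes the repeats that arise when several transcripts of one gene match the same probeset.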


Other export options attributes

Other Export Options (Attributes)

  • Sequences: UTRs, flanking sequences, cDNA, peptides, etc.

  • Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)

  • Microarray data

  • Protein functions/descriptions (InterPro, GO)

  • Orthologous gene sets

  • SNP/variation data


Biomart around the world

BioMart around the world…

BioMart started at Ensembl… Where has it travelled to?


Central portal

Central Portal

www.biomart.org


Wormbase

WormBase


Hapmap

HapMap

  • Population frequencies

  • Inter-population comparisons

  • Gene annotation


Dictybase

DictyBase


Gramene

GRAMENE

www.gramene.org


The potato center

The Potato Center


How to get there

How to Get There

http://www.biomart.org/biomart/martview

http://www.ensembl.org/biomart/martview

  • Or click on ‘BioMart’ from Ensembl
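
Besides the `martview` web form, BioMart servers also expose a `martservice` endpoint that accepts an XML query as a URL parameter. The sketch below only builds such a URL and sends nothing over the network. The endpoint path and the `query` parameter name are stated from memory and should be treated as assumptions, `query_url` is a hypothetical helper, and the XML string is a stub rather than a complete query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed service endpoint alongside the martview page; verify before use.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def query_url(xml_query, base=MARTSERVICE):
    """Return a URL carrying the XML query in a 'query' parameter."""
    return base + "?" + urlencode({"query": xml_query})


# Stub XML for illustration only; a real query also needs filters and attributes.
example_xml = '<Query formatter="TSV"><Dataset name="hsapiens_gene_ensembl"/></Query>'
url = query_url(example_xml)
print(url)
```

`urlencode` percent-escapes the angle brackets and quotes, so the XML survives being placed in a URL.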


Worked example

Worked Example

  • Follow the worked example on pg 26

  • Then, do the exercises on pg 34 (answers on pg 37)

    This module should do the following:

  • Show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.


Ensembl core databases

Ensembl Core Databases

  • Relational database

  • Normalised: each data point is stored only once

  • Therefore: quick updates and minimal storage requirements

  • But: many tables, many joins for complicated queries, and slow for data mining applications


Normalised schema

Normalised Schema


Biomart database

BioMart Database

  • Data warehouse

  • De-normalised and query-optimised

  • Therefore: fast, flexible, and ideal for data mining

  • But: tables with apparent “redundancy”, and the mart must be rebuilt from scratch from the normalised core databases for every release
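
The trade-off between the two designs can be illustrated with a toy in-memory SQLite database. This is only a sketch of the general idea: the table names, columns, and values are invented for illustration and do not reflect the actual Ensembl or BioMart schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised: each fact is stored once and linked by keys.
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                             gene_id INTEGER, name TEXT);
    CREATE TABLE probe (probe_id INTEGER PRIMARY KEY,
                        transcript_id INTEGER, name TEXT);

    INSERT INTO gene VALUES (1, 'gene_A');
    INSERT INTO transcript VALUES (1, 1, 'tx_1'), (2, 1, 'tx_2');
    INSERT INTO probe VALUES (1, 1, 'probe_x'), (2, 2, 'probe_y');

    -- De-normalised "mart" table: built once from the normalised tables.
    -- The gene name is now repeated on every row (apparent redundancy).
    CREATE TABLE mart AS
        SELECT g.name AS gene, t.name AS transcript, p.name AS probe
        FROM gene g
        JOIN transcript t ON t.gene_id = g.gene_id
        JOIN probe p ON p.transcript_id = t.transcript_id;
""")

# Normalised query: two joins are needed to answer one question.
joined = conn.execute("""
    SELECT g.name, t.name, p.name
    FROM gene g
    JOIN transcript t ON t.gene_id = g.gene_id
    JOIN probe p ON p.transcript_id = t.transcript_id
    WHERE g.name = ?
    ORDER BY t.name
""", ("gene_A",)).fetchall()

# De-normalised query: a single-table lookup gives the same rows.
flat = conn.execute(
    "SELECT gene, transcript, probe FROM mart WHERE gene = ? ORDER BY transcript",
    ("gene_A",),
).fetchall()

print(joined == flat)
```

The `mart` table is a snapshot: if the normalised tables change, it has to be rebuilt, which mirrors the need to regenerate the mart for every release.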


De normalised schema

De-Normalised Schema


Information flow

Information Flow

[Diagram: DATASET (SPECIES, FOCUS) → FILTER → ATTRIBUTES. Filter and attribute categories: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP. External data: SWISSPROT, EMBL, REFSEQ, GO, INTERPRO, AFFYMETRIX. Output formats: FASTA, GTF, HTML, TEXT, EXCEL, FILE.]

