This guide provides an overview of using BioMart for data mining in the Ensembl databases. It shows how to filter and extract data on human genes, such as HGNC IDs, chromosome locations and genomic sequences, without any programming. Users choose a dataset, set filters and select attributes to produce custom tables or sequences in formats such as Excel, CSV, GFF and FASTA. A worked example looks for disease-related genes in the same chromosomal region as G6PD (Xq28) and asks whether they have InterPro domains.
Data Mining in Ensembl with BioMart
Giulietta Spudich, EBI, 2007
BioMart
• http://www.biomart.org/biomart/martview
• http://www.ensembl.org/biomart/martview
• Or click ‘BioMart’ on the Ensembl website
BioMart: Data mining
• BioMart filters the data in the Ensembl databases, combines multiple terms and presents the result as a table.
• For example: human genes (HGNC IDs) with their chromosome and base-pair position
• No programming required!
General or Specific Data Tables
• All the genes for one species
• Or only the genes in one region of a chromosome
• Or only the genes in one region of a chromosome that have homologues
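The progression from general to specific tables is just a chain of ever-narrower filters. The sketch below illustrates this on toy records. The `Gene` class, the `in_region` helper and the gene IDs are hypothetical placeholders. They are not Ensembl data or BioMart's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A toy gene record; fields are illustrative, not BioMart's schema."""
    gene_id: str
    chromosome: str
    start: int
    end: int
    has_homologue: bool


def in_region(gene: Gene, chromosome: str, start: int, end: int) -> bool:
    """True if the gene overlaps the inclusive region chromosome:start-end."""
    return gene.chromosome == chromosome and gene.start <= end and gene.end >= start


# Hypothetical placeholder genes (not real Ensembl data).
genes = [
    Gene("GENE_A", "X", 1_000, 5_000, True),
    Gene("GENE_B", "X", 8_000, 9_000, False),
    Gene("GENE_C", "7", 2_000, 3_000, True),
]

all_genes = genes                                                # every gene for the species
region = [g for g in genes if in_region(g, "X", 1, 10_000)]      # one chromosome region only
region_with_homologues = [g for g in region if g.has_homologue]  # region AND has homologues

print([g.gene_id for g in all_genes])
print([g.gene_id for g in region])
print([g.gene_id for g in region_with_homologues])
```

Each step only removes rows. This mirrors how adding filters in BioMart narrows the gene set without changing which columns are reported.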
Web Interface
Three main stages: Dataset, Filters and Attributes.
• Dataset: choose the data source (e.g. all genes for a species)
• Filters: define the gene set
• Attributes: choose the output information
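The same three stages can also be written down as a query document. The sketch below assumes the XML query format accepted by BioMart's martservice web service, which uses `Query`, `Dataset`, `Filter` and `Attribute` elements. Several names are assumptions that should be checked against the BioMart documentation for the release in use:

* the helper name `build_query_xml`
* the dataset name `hsapiens_gene_ensembl`
* the filter and attribute names in the example

The code only builds the string and does not contact a server.

```python
import xml.etree.ElementTree as ET


def build_query_xml(dataset: str, filters: dict, attributes: list,
                    formatter: str = "TSV") -> str:
    """Serialize Dataset, Filters and Attributes into a BioMart-style query XML string.

    Hypothetical helper; the element and attribute names follow the assumed
    martservice query format.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,   # output format, e.g. TSV
        "header": "1",            # include a column-header row
        "uniqueRows": "1",        # drop duplicate result rows
    })
    # Stage 1: Dataset (which database/species to query).
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Stage 2: Filters narrow the gene set.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Stage 3: Attributes choose the output columns, in order.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names: check them against the dataset's filter and attribute lists.
example = build_query_xml(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "X"},
    ["ensembl_gene_id", "chromosome_name", "start_position"],
)
print(example)
```

Keeping filters as a mapping and attributes as an ordered list reflects their roles. Filters are name/value constraints. Attributes are an ordered column specification.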
Results
• Tables or sequences
Export tables as…
• Microsoft Excel (xls)
• Text (csv, tsv)
• HTML
• GFF
• XML
Or export sequences in FASTA format
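Text exports are easy to post-process outside the browser. The sketch below assumes a tab-separated (tsv) export whose first line is a header row. The column names and gene IDs are hypothetical placeholders, not real BioMart output. The code loads the rows with Python's standard `csv` module and writes them back out as comma-separated text.

```python
import csv
import io


def read_tsv_export(text: str) -> list:
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def to_csv(rows: list) -> str:
    """Re-serialize the same rows as comma-separated text, header first."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical export text; column names and IDs are placeholders.
sample = "Gene ID\tChromosome\tStart\nGENE_A\tX\t1000\nGENE_B\tX\t8000\n"
rows = read_tsv_export(sample)
print(rows[0]["Chromosome"])
print(to_csv(rows))
```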
FASTA sequences
• Gene (unspliced)
• Transcript (cDNA)
• Translation (coding)
• UTR (5’ or 3’)
• Flanking sequence
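Downloaded FASTA files can be read with a few lines of code. The sketch below is a minimal FASTA reader that joins wrapped sequence lines under each `>` header. The header layout in the sample (IDs separated by `|`) and the placeholder IDs are assumptions; the code treats each header as an opaque string. Duplicate headers would overwrite earlier records, so a real pipeline might key on a parsed ID instead.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


# Hypothetical sample; IDs and sequences are placeholders, not Ensembl data.
sample = ">GENE_A|TRANSCRIPT_A1\nATGGCC\nGATTAA\n>GENE_B|TRANSCRIPT_B1\nATGAAA\n"
records = parse_fasta(sample)
for name, seq in records.items():
    print(name, len(seq))
```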
BioMart – Other Installations
Find more at www.biomart.org
The Flow
• Choose Dataset (all genes for a species)
• Choose Filters (narrow the gene set)
• Choose Attributes (output options)
Query:
• The human gene encoding glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Which other genes related to human disease map to the same region? Do they have InterPro domains?
Filters: what we know
Attributes: what we want to know
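The G6PD question splits cleanly into filters and attributes. The filters encode what we know: chromosome X, band q28. The attributes encode what we want to know: gene IDs, disease (MIM morbid) links and InterPro domains. The sketch below writes this down as a query in the assumed martservice XML format. Every dataset, filter and attribute name here is an assumption and may differ between Ensembl releases, including `hsapiens_gene_ensembl`, `band_start`, `band_end`, `mim_morbid_accession` and `interpro`. Because the disease criterion is requested as an output column rather than applied as a filter, genes without a disease link would still appear and would need to be dropped afterwards.

```python
import xml.etree.ElementTree as ET

# Filters = "what we know" (assumed filter names; check the dataset's list).
FILTERS = {"chromosome_name": "X", "band_start": "q28", "band_end": "q28"}

# Attributes = "what we want to know" (assumed attribute names).
ATTRIBUTES = ["ensembl_gene_id", "hgnc_symbol", "mim_morbid_accession",
              "interpro", "interpro_description"]


def g6pd_region_query() -> str:
    """Build a BioMart-style query XML for genes in Xq28 (hypothetical helper)."""
    query = ET.Element("Query", {"virtualSchemaName": "default", "formatter": "TSV",
                                 "header": "1", "uniqueRows": "1"})
    ds = ET.SubElement(query, "Dataset",
                       {"name": "hsapiens_gene_ensembl", "interface": "default"})
    for name, value in FILTERS.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in ATTRIBUTES:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


print(g6pd_region_query())
```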
BioMart team
• Arek Kasprzyk
• Syed Haider
• Richard Holland
• Damian Smedley