
Data retrieval



  1. Data retrieval
     • BioMart
     • Export View
     • Data sets on FTP site
     • MySQL queries of databases
     • Perl API access to databases

  2. ExportView

  3. Data Mining in Ensembl with EnsMart (August 2005)

  4. Possible queries…
     • All genes from a candidate region
     • Genes with a particular protein domain
     • Members of a protein family
     • Genes associated with SNPs

  5. More specific queries
     • Human genes with upstream regions conserved with respect to mouse
     • Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
     • Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs

  6. Ensembl core database
     • Normalised
     • Each data point stored only once
     • Quick updates
     • Minimal storage requirements
     • But:
     • Many tables
     • Many joins for complicated queries
     • Slow for data mining questions

  7. BioMart and EnsMart
     • Large-scale data retrieval tool
     • Query builder interface
     • Databases: Ensembl, SNP, Vega, (MSD, UniProt)
     • Associated features or sequences
     • Flexible output formats
     • http://www.ebi.ac.uk/biomart/
     • http://www.ensembl.org/EnsMart/

  8. Mart database
     • De-normalised
     • Tables with ‘redundant’ information
     • Query-optimised
     • Fast and flexible
     • Designed for data mining
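The contrast between the normalised core database and the de-normalised mart can be sketched with a toy example. This is a minimal illustration using an in-memory SQLite database. The table names (`gene`, `transcript`, `protein_domain`, `gene_main`), columns and rows are hypothetical and are not the real Ensembl or mart schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout (hypothetical toy schema): each fact is stored once,
# so a question about genes and domains has to join several tables.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
CREATE TABLE protein_domain (transcript_id INTEGER, domain TEXT);
INSERT INTO gene VALUES (1, 'geneA', '1'), (2, 'geneB', '2');
INSERT INTO transcript VALUES (10, 1), (20, 2);
INSERT INTO protein_domain VALUES (10, 'kinase'), (20, 'zinc_finger');
""")

normalised_sql = """
SELECT g.name
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id
WHERE d.domain = 'kinase'
"""
normalised_hits = [row[0] for row in cur.execute(normalised_sql)]

# De-normalised layout: the joins are done once, up front, and the result is
# kept as one wide table with 'redundant' columns.
cur.executescript("""
CREATE TABLE gene_main AS
SELECT g.gene_id, g.name, g.chromosome, d.domain
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id;
""")

# The same question is now a single-table lookup with no joins.
mart_hits = [
    row[0]
    for row in cur.execute("SELECT name FROM gene_main WHERE domain = 'kinase'")
]

print(normalised_hits, mart_hits)
```

Both queries return the same genes. The wide table costs extra storage and has to be rebuilt when the source data change, which is the trade-off the two slides describe.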

  9. Primary Data Sets
     • Ensembl genes
     • SNP
       • Single nucleotide polymorphisms
       • Deletion-insertion polymorphisms
       • Short tandem repeats
     • Vega genes
     • (MSD protein structures)
     • (UniProt proteomes)

  10. Secondary Data Sets
     • Markers
     • Diseases
     • Gene ontology
     • Gene expression information
     • Homology predictions
     • Protein annotation

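  11. Information flow: start → filter → output
     [Figure: three-stage query diagram]
     • Start: species, focus
     • Filter: region, gene, expression, homology, protein, SNP
     • Output: region, gene, expression, homology, protein and SNP attributes (labels in the figure include RefSeq, Affy, SwissProt, GO, InterPro), delivered as FASTA, EMBL, GTF, HTML, text, Excel or file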
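The three stages of the information flow (start, filter, output) can be sketched as a small function. This is an illustrative toy only: the `Gene` record, the `GENES` list and `run_query` are hypothetical names invented for the sketch and are not part of EnsMart or BioMart.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    species: str
    name: str
    chromosome: str
    start: int
    end: int


# Hypothetical toy data standing in for a mart dataset.
GENES = [
    Gene("human", "geneA", "1", 100, 500),
    Gene("human", "geneB", "1", 9000, 9500),
    Gene("mouse", "geneC", "1", 100, 500),
]


def run_query(genes, species, chromosome, region_start, region_end, attributes):
    # Start: choose the species (the dataset to query).
    dataset = [g for g in genes if g.species == species]

    # Filter: keep genes that overlap the requested region.
    hits = [
        g
        for g in dataset
        if g.chromosome == chromosome
        and g.start <= region_end
        and g.end >= region_start
    ]

    # Output: emit the chosen attributes as tab-separated text with a header.
    lines = ["\t".join(attributes)]
    for g in hits:
        lines.append("\t".join(str(getattr(g, a)) for a in attributes))
    return "\n".join(lines)


result = run_query(GENES, "human", "1", 1, 1000, ["name", "start"])
print(result)
```

Keeping the stages separate mirrors the query builder: the same filtered hit list could be written out in a different format without repeating the filtering step.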

  12. BioMart http://www.biomart.org/

  13. BioMart - Features

  14. BioMart - Sequences

  15. HTML Output formats

  16. What about queries that are not possible in EnsMart?
     • Direct database access at ensembldb.ensembl.org
     • martdb.ebi.ac.uk
     • MySQL client
       • Download MySQL for Windows: http://www.winmysql.com/page4.html (file: wmysr11.zip)
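As a rough sketch of what direct access looks like, the snippet below only assembles a command line for the standard `mysql` client and prints it; it does not connect to anything. The `anonymous` user name and the `SHOW DATABASES` statement are assumptions that are not stated on the slide, and `mysql_command` is a hypothetical helper written for this example.

```python
import shlex


def mysql_command(host, user="anonymous", statement="SHOW DATABASES"):
    # Build the argument list for the mysql command-line client.
    # -h selects the server, -u the user, -e runs one statement and exits.
    # The default user name is an assumption, not taken from the slide.
    return ["mysql", "-h", host, "-u", user, "-e", statement]


cmd = mysql_command("ensembldb.ensembl.org")

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

The same helper can be pointed at martdb.ebi.ac.uk by changing the host argument, provided that server accepts the same kind of login.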

  17. Access via Perl object API
     • Based on BioPerl
     • Ensembl modules
     • For an introduction, see the tutorial at:
     • http://www.ensembl.org/info/software/core/

  18. There are other ways…
     • MartShell: command-line interface to Mart, written in Java. It works with a Mart Query Language.

  19. MartExplorer
