
BioMart Query Network

Arek Kasprzyk, European Bioinformatics Institute, 8 January 2005


Presentation Transcript


  1. BioMart Query Network Arek Kasprzyk European Bioinformatics Institute 8 January 2005

  2. Biological databases • Distributed • Different format • Different focus • Different release schedule • Scalability factor

  3. BioMart

  4. [Diagram labels] • Retrieval: MartExplorer, MartShell, MartView • BioMart API: Java, Perl • Databases: public data (local or remote): Vega, SNP, MSD, UniProt, Ensembl; myDatabase; myMart • MartBuilder: schema transformation • MartEditor: configuration XML

  5. MartView

  6. BioMart@Ensembl

  7. MartShell

  8. MartExplorer

  9. Database

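  10. Schema [diagram: tables labelled with primary keys (PK) and foreign keys (FK)]

  11. Schema [diagram: tables labelled with primary keys (PK) and foreign keys (FK)]

  12. Schema [diagram: tables labelled with primary keys (PK) and foreign keys (FK)]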

  13. Schema - ‘reversed star’ [diagram: main table main1 and dimension (dm) tables, labelled with keys PK1, PK2, FK1 and FK2]

  14. Fixed schema transformation [diagram: tables A, B, C, TA and TB]

  15. Schema transformation • Central table • Longest n:1, 1:1 path • Dimension table • Central transformation ‘around’ a 1:n table • Link tables are first decomposed into a set of 1:n relations

  16. MartBuilder • Input • central object • database meta data • cardinalities • Output • Set of SQL statements: • “create table as select …” • Transformations • represented as asymmetric tree
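The MartBuilder output listed above can be illustrated with a minimal Python sketch. It is not MartBuilder's code: the function name `build_main_table_sql`, its arguments and the use of `source.*` in the column list are assumptions made for illustration. Only the idea of chaining "create table as select" statements through temporary tables is taken from the slides.

```python
def build_main_table_sql(central, joins, final_name):
    """Emit chained CREATE TABLE ... AS SELECT statements (illustrative only).

    central    -- name of the central table
    joins      -- list of (table, key, columns) for 1:1 / n:1 tables to merge;
                  `columns` are the non-key columns to carry over
    final_name -- name of the resulting main table
    """
    statements = []
    source = central
    for i, (table, key, columns) in enumerate(joins):
        is_last = i == len(joins) - 1
        # Intermediate results go to TEMP0, TEMP1, ...; the last join
        # writes the final main table.
        target = final_name if is_last else f"TEMP{i}"
        select_list = ", ".join(
            [f"{source}.*"] + [f"{table}.{col}" for col in columns]
        )
        statements.append(
            f"CREATE TABLE {target} AS SELECT {select_list} "
            f"FROM {source}, {table} "
            f"WHERE {table}.{key} = {source}.{key};"
        )
        # Drop the previous temporary table once it has been consumed.
        if source != central:
            statements.append(f"DROP TABLE {source};")
        source = target
    return statements


if __name__ == "__main__":
    for stmt in build_main_table_sql(
        "gene",
        [
            ("gene_description", "gene_id", ["description"]),
            ("gene_stable_id", "gene_id", ["stable_id", "version"]),
        ],
        "hsapiens_gene_ensembl__gene__main",
    ):
        print(stmt)
```

Selecting only the non-key columns of each joined table avoids duplicate column names in the created table; a generator could equally alias the duplicated key column instead.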

  17. MartBuilder

      DATASET: hsapiens_gene_ensembl
      TYPE MAIN [M] DIMENSION [D] EXIT [E]: M
      TABLE NAME: gene
      gene: alt_allele cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: gene cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: gene_description cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
      gene: gene_stable_id cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
      gene: kk__gene__main cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: transcript cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: analysis cardinality [11] [n1] [0n] [1n] [SKIP S]: n1
      gene: dna cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: dnac cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      gene: seq_region cardinality [11] [n1] [0n] [1n] [SKIP S]: S
      TYPE MAIN [M] DIMENSION [D] EXIT [E]: E
      ADD EXTENSION: hsapiens_gene_ensembl__gene__MAIN [Y|N]: N
      CHANGE FINAL TABLE NAME: hsapiens_gene_ensembl__gene__MAIN TO:

      CREATE TABLE TEMP0 as SELECT
        gene.gene_id, gene.type, gene.analysis_id, gene.seq_region_id,
        gene.seq_region_start, gene.seq_region_end, gene.seq_region_strand,
        gene.display_xref_id,
        gene_description.gene_id AS gene_id_TEMP0, gene_description.description
      FROM gene, gene_description
      WHERE gene_description.gene_id = gene.gene_id;

      CREATE TABLE hsapiens_gene_ensembl__gene__MAIN as SELECT
        TEMP0.gene_id, TEMP0.type, TEMP0.analysis_id, TEMP0.seq_region_id,
        TEMP0.seq_region_start, TEMP0.seq_region_end, TEMP0.seq_region_strand,
        TEMP0.display_xref_id, TEMP0.gene_id_TEMP0, TEMP0.description,
        gene_stable_id.gene_id AS gene_id_TEMP1, gene_stable_id.stable_id,
        gene_stable_id.version
      FROM TEMP0, gene_stable_id
      WHERE gene_stable_id.gene_id = TEMP0.gene_id;

      drop table TEMP0;

  18. Transformation configuration

      satellog_repeats  M  repeats  disease        n1
      satellog_repeats  M  repeats  gc             11
      satellog_repeats  M  repeats  linkage_depth  S
      satellog_repeats  M  repeats  repeats        S
      satellog_repeats  M  repeats  transcripts    S
      satellog_repeats  M  repeats  ugcount        S
      satellog_repeats  M  repeats  ugstats        S
      satellog_repeats  M  repeats  rep_class      n1
      satellog_repeats  D  ugcount  ugcount        S
      satellog_repeats  D  ugcount  ugstats        S
      satellog_repeats  D  ugcount  gc             S
      satellog_repeats  D  ugcount  repeats        n1r
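A configuration of this shape could be read with a few lines of Python. The sketch below rests on an assumption the slide does not state: that the five whitespace-separated fields are dataset, table type (M for main, D for dimension), central table, linked table and cardinality, with S meaning "skip" as in the MartBuilder prompts. The names `Rule`, `parse_rules` and `tables_to_merge` are hypothetical.

```python
from collections import namedtuple

# Field meanings are an assumption inferred from the MartBuilder prompts.
Rule = namedtuple("Rule", "dataset table_type central linked cardinality")


def parse_rules(text):
    """Parse one rule per line; lines without exactly five fields are ignored."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5:
            rules.append(Rule(*fields))
    return rules


def tables_to_merge(rules, central):
    """Return linked tables of `central` whose cardinality is not 'S' (skip)."""
    return [
        rule.linked
        for rule in rules
        if rule.central == central and rule.cardinality != "S"
    ]


if __name__ == "__main__":
    sample = """\
    satellog_repeats M repeats disease n1
    satellog_repeats M repeats gc 11
    satellog_repeats M repeats linkage_depth S
    satellog_repeats M repeats rep_class n1
    """
    print(tables_to_merge(parse_rules(sample), "repeats"))
```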

  19. Data access

  20. Dataset – Key Abstraction • Dataset • Organised into a single schema • A BioMart database contains one or more datasets • Attribute • Filter • Exportable/Importable (Links) • Dataset = equivalent of a relational table • Exportable/Importable = PK/FK

  21. Key Abstractions [diagram: Mart, Dataset, Attribute, Filter; example table GENE CENTRAL with columns gene_id (PK), gene_stable_id, gene_start, gene_chrom_end, chromosome, gene_display_id, description]

  22. Exportables, Importables and Links • Exportable = ordered list of attributes • Importable = ordered list of filters • WHERE filt1=value1 • WHERE filt1=value1 or filt1=value2 • WHERE filt1>value1 and filt2<value2 • Links = matching importable and exportable
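One way to picture a link is that the values returned for an exportable's attributes in one dataset become the filter values of the matching importable in another. The Python sketch below illustrates that idea only; `link_where`, its arguments and the example names are hypothetical and are not part of the BioMart API.

```python
def link_where(rows, exportable, importable):
    """Build a WHERE fragment that feeds exported values into importable filters.

    rows       -- result rows from the exporting dataset (dicts keyed by attribute)
    exportable -- ordered list of attribute names
    importable -- ordered list of filter names, matched to `exportable` by position
    Returns (sql, params); the SQL uses '?' placeholders so values can be bound.
    With no rows the SQL fragment is an empty string.
    """
    if len(exportable) != len(importable):
        raise ValueError("exportable and importable must have the same length")

    clauses, params, seen = [], [], set()
    for row in rows:
        key = tuple(row[attr] for attr in exportable)
        if key in seen:  # each distinct combination is used once
            continue
        seen.add(key)
        clauses.append("(" + " AND ".join(f"{f} = ?" for f in importable) + ")")
        params.extend(key)
    return " OR ".join(clauses), params


if __name__ == "__main__":
    exported = [{"gene_stable_id": "G1"}, {"gene_stable_id": "G2"}]
    sql, params = link_where(exported, ["gene_stable_id"], ["ensembl_gene_id"])
    print("WHERE " + sql, params)
```

Matching by position is what makes the ordering of both lists significant: the first attribute of the exportable supplies values for the first filter of the importable, and so on.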

  23. MartView

  24. Dataset Configuration • Dataset configuration • Attributes • Filters • Trees, Groups, Collections • Links • Semantics • Relational mapping • User interface • Linking datasets • XML-based

  25. Dataset Configuration [diagram: XML configuration documents]

  26. Table naming convention / Naïve configuration • Tables • Meta tables meta_content • Data tables dataset__content__type • Data tables • Main __main • Dimension __dm • Columns • Key _key • Boolean filter _bool • List filter _list
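The naming convention above is regular enough that table and column roles can be recovered from the names alone, which is presumably what makes a naïve configuration possible. A minimal Python sketch, assuming that names are split on the double underscore and that any column without one of the listed suffixes is a plain attribute (both are assumptions; the function names and example column names are hypothetical):

```python
def parse_table_name(name):
    """Split a data table name of the form dataset__content__type."""
    parts = name.split("__")
    if len(parts) != 3:
        raise ValueError(f"not a dataset__content__type name: {name!r}")
    dataset, content, kind = parts
    kind = kind.lower()  # the MartBuilder example writes the suffix as __MAIN
    if kind not in ("main", "dm"):
        raise ValueError(f"unknown table type: {kind!r}")
    return {"dataset": dataset, "content": content, "type": kind}


def classify_column(column):
    """Map a column name to its role using the suffix convention."""
    suffixes = (
        ("_key", "key"),
        ("_bool", "boolean filter"),
        ("_list", "list filter"),
    )
    for suffix, role in suffixes:
        if column.endswith(suffix):
            return role
    return "attribute"  # assumed default for columns without a known suffix


if __name__ == "__main__":
    print(parse_table_name("hsapiens_gene_ensembl__gene__main"))
    print(classify_column("gene_id_key"))
```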

  27. MartEditor

  28. MartEditor • Naïve configuration • Updates • Links • Automatic discovery of new tables

  29. Class diagram - configuration

  30. Class diagram - querying

  31. Information flow • Read connections • Register individual datasets and create linked datasets • Get input from the user and split queries across individual datasets • Find the shortest path between datasets (Dijkstra) • Compile SQL
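The shortest-path step can be sketched with a standard Dijkstra search over a graph whose nodes are datasets and whose edges are links. The slides do not say how links are weighted or stored, so the graph layout, the unit costs and the function name `shortest_path` below are assumptions for illustration.

```python
import heapq


def shortest_path(links, start, goal):
    """Dijkstra's algorithm over a dataset link graph.

    links -- dict mapping a dataset name to {neighbour: non-negative cost}
    Returns (total_cost, [datasets on the path]) or None if unreachable.
    """
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in links.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


if __name__ == "__main__":
    # Hypothetical link graph; dataset names are borrowed from the slides,
    # the links and costs between them are invented for the example.
    links = {
        "ensembl": {"snp": 1, "uniprot": 1, "msd": 5},
        "uniprot": {"ensembl": 1, "msd": 1},
        "snp": {"ensembl": 1},
        "msd": {"uniprot": 1, "ensembl": 5},
    }
    print(shortest_path(links, "snp", "msd"))
```

With every link given the same cost, the search returns the path that passes through the fewest intermediate datasets.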

  32. Summary

  33. BioMart • Domain independent • Platform independent • MySQL 4 • Oracle 9i • Plugin architecture

  34. BioMart model • Already applied • Ensembl • Vega • dbSNP • UniProt • MSD • Variety of small projects • In development • ArrayExpress • WormBase • RGD

  35. Future work • BioMart v0.2 to be released later in January • Java library to be upgraded to the new architecture over the coming months • BioMart has been integrated with Taverna • MartBuilder - to be properly implemented

  36. BioMart • www.ebi.ac.uk/biomart • Open source (LGPL) • Public MySQL server • ftp • mart-dev@ebi.ac.uk • mart-announce@ebi.ac.uk

  37. Acknowledgments • BioMart • Damian Smedley • Darin London • Contributors • Arne Stabenau (Ensembl) • Andreas Kahari (Ensembl) • Craig Melsopp (Ensembl) • Katerina Tzouvara (UniProt) • Paul Donlon (Unilever) • Will Spooner (CSHL)
