Damian Smedley

Presentation Transcript


  1. Modifying EnsMart Damian Smedley

  2. Modifying EnsMart • Modifying the existing EnsMart system • Current status of the distributed, generic BioMart system and future plans • Modifying the BioMart system

  3. EnsMart schema (diagram; only the table and column labels survive)
     • Main tables: %_gene_main (gene_id, attribute columns, filter columns); %_transcript_main (gene_id, transcript_id, attribute columns, filter columns)
     • Dimension tables: %_gene_snp_dm (gene_id, attribute columns); %_gene_xref_REFSEQ_dm (gene_id, transcript_id, attribute columns)
     • Lookup and support tables: %_karyotype_lookup, %_dna_chunks_support

  4. Modifying EnsMart • Adding new data to an existing EnsMart database • Changing species and foci available in EnsMart • Creating a completely new species/focus, e.g. a new chimpanzee-gene mart or a new protein focus

  5. Adding new data to an existing mart • Modify the database • Modify the API • Modify the Web code

  6. Modify database
     • New table(s) following the mart naming convention
     • New column(s) in existing tables
     • Example: some human transcripts mapped to some IDs

       mysql> SELECT * FROM example_mappings;
       +----------------------+------------+
       | transcript_stable_id | EXAMPLE_id |
       +----------------------+------------+
       | ENST00000326632      | AK024481   |
       | ENST00000269816      | AK024448   |
       | ENST00000326447      | AF346307   |
       | ENST00000018806      | AF118890   |
       | ENST00000018806      | U92072     |
       | ENST00000018784      | AF118889   |

  7. Modify database: new table
     • Left join onto the main table to create a new dimension (dm) table:

       mysql> CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm
           -> SELECT m.gene_id, m.gene_stable_id, m.transcript_id, m.transcript_stable_id,
           ->        m.translation_id, m.translation_stable_id,
           ->        e.EXAMPLE_id AS display_id, e.EXAMPLE_id AS dbprimary_id
           -> FROM hsapiens_ensembltranscript_main m
           -> LEFT JOIN example_mappings e ON m.transcript_stable_id = e.transcript_stable_id;

       mysql> SELECT * FROM hsapiens_ensemblgene_xref_EXAMPLE_dm;
       +---------+-----------------+---------------+----------------------+----------------+-----------------------+------------+--------------+
       | gene_id | gene_stable_id  | transcript_id | transcript_stable_id | translation_id | translation_stable_id | display_id | dbprimary_id |
       +---------+-----------------+---------------+----------------------+----------------+-----------------------+------------+--------------+
       |   97565 | ENSG00000023810 |        123373 | ENST00000036411      |         124178 | ENSP00000037729       | AK024495   | AK024495     |
       |   97565 | ENSG00000023810 |        123374 | ENST00000036403      |         124179 | ENSP00000038269       | AF123675   | AF123675     |
       |   97567 | ENSG00000014005 |        123376 | ENST00000018764      |         124181 | ENSP00000018764       | AK012367   | AK012367     |
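The left join on this slide can be tried without a MySQL server. What follows is a minimal sketch, assuming Python's standard-library sqlite3 module as a stand-in for MySQL and a cut-down main table (translation columns omitted). The two mapped rows reuse IDs shown on the slide; the third, unmapped transcript is a hypothetical placeholder added to show what the LEFT JOIN does with transcripts that have no EXAMPLE ID. Note that SQLite needs the AS keyword in CREATE TABLE ... AS SELECT, which the MySQL statement on the slide omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Cut-down stand-in for hsapiens_ensembltranscript_main (illustrative rows only).
cur.execute("""CREATE TABLE hsapiens_ensembltranscript_main (
    gene_id INTEGER, gene_stable_id TEXT,
    transcript_id INTEGER, transcript_stable_id TEXT)""")
cur.executemany(
    "INSERT INTO hsapiens_ensembltranscript_main VALUES (?, ?, ?, ?)",
    [
        (97565, "ENSG00000023810", 123373, "ENST00000036411"),
        (97567, "ENSG00000014005", 123376, "ENST00000018764"),
        # Hypothetical transcript with no mapping, not taken from the slides.
        (999999, "ENSG_HYPOTHETICAL", 999999, "ENST_HYPOTHETICAL"),
    ],
)

cur.execute(
    "CREATE TABLE example_mappings (transcript_stable_id TEXT, EXAMPLE_id TEXT)")
cur.executemany(
    "INSERT INTO example_mappings VALUES (?, ?)",
    [("ENST00000036411", "AK024495"), ("ENST00000018764", "AK012367")],
)

# The LEFT JOIN keeps every main-table row; unmapped transcripts get NULL IDs.
cur.execute("""CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm AS
    SELECT m.gene_id, m.gene_stable_id, m.transcript_id, m.transcript_stable_id,
           e.EXAMPLE_id AS display_id, e.EXAMPLE_id AS dbprimary_id
    FROM hsapiens_ensembltranscript_main m
    LEFT JOIN example_mappings e
           ON m.transcript_stable_id = e.transcript_stable_id""")

dm_rows = cur.execute("""SELECT transcript_stable_id, display_id
    FROM hsapiens_ensemblgene_xref_EXAMPLE_dm
    ORDER BY transcript_id""").fetchall()
for row in dm_rows:
    print(row)
```

Because every main-table row survives the join, the dm table has one row per transcript even where no EXAMPLE ID exists, which is what lets the "is null" / "is not null" filters later in the deck work against it.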

  8. Modify database: new column
     • Create flag columns in the gene and transcript main tables indicating whether a particular gene or transcript has a mapped EXAMPLE ID

       mysql> SELECT gene_id, gene_stable_id, has_EXAMPLE FROM hsapiens_ensemblgene_main LIMIT 5;
       +---------+-----------------+-------------+
       | gene_id | gene_stable_id  | has_EXAMPLE |
       +---------+-----------------+-------------+
       |   97565 | ENSG00000023810 |        NULL |
       |   97565 | ENSG00000023810 |        NULL |
       |   97567 | ENSG00000014005 |           1 |
       |   97569 | ENSG00000023795 |        NULL |
       |   97569 | ENSG00000023795 |           1 |
       +---------+-----------------+-------------+
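The slide shows the resulting flag column but not how it is filled. One possible way to populate it is sketched below, again assuming Python's sqlite3 as a stand-in for MySQL. The ALTER/UPDATE statements are an assumption rather than the production EnsMart procedure, and the second gene row is a hypothetical placeholder. The NULL-or-1 convention follows the output shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative stand-ins for the gene main table and the new dm table.
cur.execute(
    "CREATE TABLE hsapiens_ensemblgene_main (gene_id INTEGER, gene_stable_id TEXT)")
cur.executemany(
    "INSERT INTO hsapiens_ensemblgene_main VALUES (?, ?)",
    [(97567, "ENSG00000014005"),
     (999999, "ENSG_HYPOTHETICAL")],  # hypothetical gene without a mapping
)
cur.execute(
    "CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm (gene_id INTEGER, display_id TEXT)")
cur.executemany(
    "INSERT INTO hsapiens_ensemblgene_xref_EXAMPLE_dm VALUES (?, ?)",
    [(97567, "AK012367"), (999999, None)],
)

# New columns start out as NULL, matching the unmapped rows on the slide.
cur.execute("ALTER TABLE hsapiens_ensemblgene_main ADD COLUMN has_EXAMPLE INTEGER")

# Flag a gene with 1 when at least one of its dm rows carries an EXAMPLE ID.
cur.execute("""UPDATE hsapiens_ensemblgene_main
    SET has_EXAMPLE = 1
    WHERE EXISTS (
        SELECT 1 FROM hsapiens_ensemblgene_xref_EXAMPLE_dm d
        WHERE d.gene_id = hsapiens_ensemblgene_main.gene_id
          AND d.display_id IS NOT NULL)""")

flags = dict(cur.execute(
    "SELECT gene_id, has_EXAMPLE FROM hsapiens_ensemblgene_main").fetchall())
print(flags)
```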

  9. Modify API
     • MartGeneExtractor.pm
     • Table name:

       my $META_TABLES = {
         example => ["%s_%sgene_xref_EXAMPLE_dm", "gene_id"], ...

     • Attributes and filters:

       my $META_NAMES = {
         # attribute
         xexample_dis      => ["example", "example.display_id"],
         # filters
         FG_EXAMPLE_ID     => ["example", "example.display_id in(%s)"],
         example_exclusive => ["example", "example.display_id is not null"],
         example_excluded  => ["example", "example.display_id is null"], ...
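To make the role of the %s placeholders concrete, here is a small Python sketch of how such templates could be expanded. It is an assumption about the mechanism, not the actual MartGeneExtractor.pm code: the helper names table_name and filter_clause are hypothetical, and the choice of "hsapiens" and "ensembl" as the two substituted values is inferred only from the table name hsapiens_ensemblgene_xref_EXAMPLE_dm used earlier in the deck.

```python
# Python mirrors of the two Perl hashes shown on the slide.
META_TABLES = {
    "example": ["%s_%sgene_xref_EXAMPLE_dm", "gene_id"],
}
META_NAMES = {
    "xexample_dis": ["example", "example.display_id"],
    "FG_EXAMPLE_ID": ["example", "example.display_id in(%s)"],
    "example_exclusive": ["example", "example.display_id is not null"],
    "example_excluded": ["example", "example.display_id is null"],
}


def table_name(alias, species, prefix):
    """Fill the table-name template (hypothetical helper)."""
    template, _join_key = META_TABLES[alias]
    return template % (species, prefix)


def filter_clause(name, values=()):
    """Fill a filter template; ID lists are quoted for SQL (hypothetical helper)."""
    _alias, template = META_NAMES[name]
    if "%s" not in template:
        return template  # boolean filters need no substitution
    quoted = ", ".join("'%s'" % v.replace("'", "''") for v in values)
    return template % quoted


print(table_name("example", "hsapiens", "ensembl"))
print(filter_clause("FG_EXAMPLE_ID", ["AK024481", "U92072"]))
print(filter_clause("example_exclusive"))
```

A production system would normally pass the ID list as bound parameters rather than quoting it into the SQL string; the inline quoting here only keeps the sketch short.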

  10. Modify web code
      • MetaData.pm organised into:
        • Stages: collections of blocks on one HTML page
        • Blocks: collections of related forms
        • Forms: collections of entries
        • Entries: individual HTML elements
      • (Figure labels: STAGE, BLOCK, FORM, ENTRY)

  11. Modify web code
      • MetaData.pm
      • xexample_dis attribute (add Example ID as an attribute):

        FORM_XREF_ATTRIBUTES:{
          .....
          my %entry_labels =
            ( xgene_name_dis => [1, 'Gene Name'],
              .....
              xexample_dis   => [35, 'Example ID'],

  12. Modify web code
      • MetaData.pm (cont.)
      • Hyperlinks for attribute (add a hyperlink definition for the Example ID):

        my %hyperlinks =
          ( xhugo_dis    => ['exturl', 'HUGO'],
            xexample_dis => ['exturl', 'EXAMPLE'], ...

      • exturl defined in /conf/DEFAULTS.ini:

        [ENSEMBL_EXTERNAL_URLS]
        EXAMPLE = http://www.ebi.ac.uk/cgi-bin/emblfetch?###ID###
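The ###ID### marker in the DEFAULTS.ini entry is a placeholder for the attribute value. How the web code performs the substitution is not shown in the deck; the Python sketch below assumes a plain string replacement with URL quoting, and the function name external_url is hypothetical.

```python
from urllib.parse import quote

# Mirror of the [ENSEMBL_EXTERNAL_URLS] entry from the slide.
EXTERNAL_URLS = {
    "EXAMPLE": "http://www.ebi.ac.uk/cgi-bin/emblfetch?###ID###",
}


def external_url(database, identifier):
    """Swap the ###ID### placeholder for a URL-quoted identifier."""
    template = EXTERNAL_URLS[database]
    return template.replace("###ID###", quote(identifier, safe=""))


link = external_url("EXAMPLE", "AK024481")
print(link)
```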

  13. Modify web code

  14. Modify web code
      • MetaData.pm
      • The FG_EXAMPLE_ID filter is picked up automatically by a method in the web code that treats every filter whose name begins with FG_ as an ID list filter
      • example_exclusive/excluded filters (uses the gen_check_with_radio method in PanelMain.pm to organise the HTML layout):

        FORM_EXAMPLE:{
          my $form_name = 'example';
          my $form = $block->addobj_form();
          add_available_by_api_filter( $form, 'example_exclusive' );
          $form->set_name($form_name);
          $form->set_type('CHECK_WITH_RADIO');
          ENTRY_CHECK:{
            my $entry = $form->addobj_form_entry();
            $entry->set_value(1);
            $entry->set_label("Entries with an EXAMPLE ID");
          }

  15. Modify web code
      • Example exclusive/excluded filters (cont.)

          ENTRY_RADIO_1:{
            my $entry = $form->addobj_form_entry();
            $entry->set_name_suffix('_type');
            $entry->set_api_filter('example_exclusive');
            $entry->set_value('Only');
            $entry->set_default('Only');
            $entry->set_label('Only');
            $entry->set_label_summary("Has Example ID: %s");
            activate_filter_onchange( $entry );
            add_error_scalar( $entry );
          }

  16. Modify web code
      • Example exclusive/excluded filters (cont.)

          ENTRY_RADIO_2:{
            my $entry = $form->addobj_form_entry();
            $entry->set_name_suffix('_type');
            $entry->set_api_filter('example_excluded');
            $entry->set_value('Excluded');
            $entry->set_label('Excluded');
            $entry->set_label_summary("Has Example ID: %s");
            activate_filter_onchange( $entry );
            add_error_scalar( $entry );
          }
        }
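Slide 14 notes that filters whose names begin with FG_ are detected automatically as ID list filters, while the exclusive/excluded pair needs the hand-written form spelled out on slides 14 to 16. A minimal Python sketch of that naming-convention split follows; the function split_filters is hypothetical and only illustrates the prefix test, not the actual web-code method.

```python
def split_filters(filter_names):
    """Separate FG_-prefixed ID list filters from filters that need their own form."""
    id_list = [name for name in filter_names if name.startswith("FG_")]
    needs_form = [name for name in filter_names if not name.startswith("FG_")]
    return id_list, needs_form


api_filters = ["FG_EXAMPLE_ID", "example_exclusive", "example_excluded"]
id_list_filters, form_filters = split_filters(api_filters)
print("picked up automatically:", id_list_filters)
print("need a form in MetaData.pm:", form_filters)
```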

  17. Modify web code

  18. Changing species/focus available
      • Create an EnsMart database with:
        • only the tables for the species and focus combinations of interest, including all lookup and support tables:
          • e.g. hsapiens* for a human-only EnsMart
          • e.g. hsapiens_ensemblgene* plus hsapiens_*lookup and hsapiens_*support for a human, ensemblgene-only EnsMart
        • the _meta* tables
        • the evoc* and go* tables, if expression vocabulary and GO searching are wanted
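The wildcard patterns on this slide can be applied to a list of table names to decide what to copy into the reduced mart. The Python sketch below assumes shell-style matching via the standard-library fnmatch module. The table list is illustrative: most names are built from patterns seen elsewhere in the deck, and hsapiens_snp_main is a hypothetical name for a table belonging to another focus.

```python
from fnmatch import fnmatchcase

# Patterns for a human, ensemblgene-only EnsMart, taken from the slide.
PATTERNS = [
    "hsapiens_ensemblgene*",
    "hsapiens_*lookup",
    "hsapiens_*support",
    "_meta*",
]

# Illustrative table list; hsapiens_snp_main is a hypothetical other-focus table.
ALL_TABLES = [
    "_meta_release_info",
    "hsapiens_dna_chunks_support",
    "hsapiens_ensemblgene_main",
    "hsapiens_ensemblgene_xref_EXAMPLE_dm",
    "hsapiens_karyotype_lookup",
    "hsapiens_snp_main",
    "mmusculus_ensemblgene_main",
]


def select_tables(tables, patterns):
    """Keep the tables matching at least one pattern (case-sensitive)."""
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


selected = select_tables(ALL_TABLES, PATTERNS)
for table in selected:
    print(table)
```

The selected names could then be handed to whatever dump or copy tool is in use; that step is outside this sketch.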

  19. Changing species/focus available
      • Edit the _meta_release_info table:
      • To only have human datasets:

        mysql> DELETE FROM _meta_release_info WHERE species != 'homo_sapiens';

      • To further restrict the focus to Ensembl genes only:

        mysql> UPDATE _meta_release_info SET core_datasets = 'core' WHERE species = 'homo_sapiens';
        mysql> UPDATE _meta_release_info SET satellite_datasets = NULL WHERE species = 'homo_sapiens';
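The effect of these three statements can be checked on a throwaway copy of the table. The sketch below assumes Python's sqlite3 as a stand-in for MySQL and a _meta_release_info table reduced to the three columns named on the slide. The starting values in core_datasets and satellite_datasets are placeholders, since the real contents are not shown in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Reduced stand-in for _meta_release_info; dataset values are placeholders.
cur.execute("""CREATE TABLE _meta_release_info (
    species TEXT, core_datasets TEXT, satellite_datasets TEXT)""")
cur.executemany(
    "INSERT INTO _meta_release_info VALUES (?, ?, ?)",
    [
        ("homo_sapiens", "core,placeholder_a", "placeholder_b"),
        ("mus_musculus", "core,placeholder_a", "placeholder_b"),
    ],
)

# The three statements from the slide, unchanged apart from the client prompt.
cur.execute("DELETE FROM _meta_release_info WHERE species != 'homo_sapiens'")
cur.execute(
    "UPDATE _meta_release_info SET core_datasets = 'core' "
    "WHERE species = 'homo_sapiens'")
cur.execute(
    "UPDATE _meta_release_info SET satellite_datasets = NULL "
    "WHERE species = 'homo_sapiens'")

remaining = cur.execute(
    "SELECT species, core_datasets, satellite_datasets "
    "FROM _meta_release_info").fetchall()
print(remaining)
```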

  20. Changing species/focus available

  21. Adding a new species • Create the tables conforming to the mart naming convention • Filters and attributes corresponding to equivalent columns in existing EnsMart tables will be picked up automatically, e.g. the chromosome name attribute is already defined by %_ensemblgene_main.chr_name • Add new filters and attributes as detailed earlier

  22. Adding a new focus • Requires a new Extractor module in the API. For example a new protein focus would require a MartProteinExtractor.pm equivalent to MartGeneExtractor.pm • May require extra configuration methods in MartInfo.pm and MartDefs.pm • All filters and attributes need adding to MetaData.pm in the web code.

  23. Adding a new focus

  24. BioMart system • MartLib API allowing query chaining between distributed Marts • XML-based configuration system • MartEditor tool to create and edit the XML documents • MartShell command line tool and MartExplorer GUI • MartWeb servlet planned for this year • Currently Java-based, but a Perl API and web interface are coming to replace the existing EnsMart site • EBI Industry Programme 19th March includes “BioMart – a distributed, query-oriented data integration architecture”

  25. Adding new filters and attributes in BioMart
      • Edit the XML file:
      • example id attribute

        <AttributeDescription description="EXAMPLE Ids" displayName="EXAMPLE ID"
          field="display_id" internalName="xref_example_id" homepageURL="" linkoutURL=""
          maxLength="8" source="" tableConstraint="gene_xref_EXAMPLE_dm"/>

      • example id exclusive/excluded filter

        <Option description="filter to include/exclude genes mapping to EXAMPLE Ids"
          displayName="with EXAMPLE ID(s)" field="has_EXAMPLE" internalName="example_id_xrefs"
          isSelectable="true" legal_qualifiers="only,excluded" tableConstraint="main"
          type="boolean"/>

      • example id list filter

        <Option description="filter to include genes with supplied list of EXAMPLE Ids"
          displayName="EXAMPLE ID(s):" field="display_id" internalName="example_id"
          isSelectable="true" legal_qualifiers="=,in" qualifier="="
          tableConstraint="gene_xref_EXAMPLE_dm" type="list"/>
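For anyone generating such entries from a script rather than typing them, the attribute element can be built with Python's standard-library xml.etree.ElementTree, as sketched below. The element and attribute names are copied from the slide; whether the BioMart tools accept a fragment produced this way has not been verified here, and in practice the entry lives inside a larger configuration document that is not shown in the deck.

```python
import xml.etree.ElementTree as ET

# Attribute names and values copied from the slide's AttributeDescription entry.
attribute = ET.Element("AttributeDescription", {
    "description": "EXAMPLE Ids",
    "displayName": "EXAMPLE ID",
    "field": "display_id",
    "internalName": "xref_example_id",
    "homepageURL": "",
    "linkoutURL": "",
    "maxLength": "8",
    "source": "",
    "tableConstraint": "gene_xref_EXAMPLE_dm",
})

# Serialising and re-parsing confirms the fragment is well-formed XML.
xml_text = ET.tostring(attribute, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(xml_text)
```

Building the element programmatically also sidesteps the typographic-quote problem that creeps in when XML is pasted through a word processor or slide tool.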

  26. Adding new filters and attributes in BioMart

  27. Adding new datasets in BioMart • Once a mart-compliant database has been created, only the XML document needs to be written • The MartEditor tool simplifies this task: it creates a naïve initial XML view of the dataset and allows further editing in a GUI environment • Compare this with the existing Perl system, where a whole new Perl module has to be coded and new code added to several other modules

  28. Conclusions • Adding new filters, attributes or a whole new species to EnsMart requires some understanding of the mart schema and a bit of “copying and pasting” in MetaData.pm and the appropriate Extractor, e.g. MartGeneExtractor.pm • Adding new datasets/foci requires a good understanding of the mart API, as a new Extractor module needs to be coded • The new BioMart system reduces all this to the creation of a mart-compliant schema and the use of a GUI editing tool to produce an XML configuration file

  29. Acknowledgements
      • Arek Kasprzyk
      • EnsMart production and API: Damian Keefe, Darin London
      • Web code: Will Spooner
      • Java-based Mart system: Craig Melsopp, Darin London, Katerina Tzouvara
