1 / 50

Chado and interoperability

Chado and interoperability. Chris Mungall, BDGP Pinglei Zhou, FlyBase-Harvard. Databases and applications. Java. SQL. Chado. Application. ?. rad. seq. genetic. cv. Application. pub. phylo. Perl. How do we get databases and applications speaking to one another?.

lorae
Download Presentation

Chado and interoperability

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chado and interoperability Chris Mungall, BDGP Pinglei Zhou, FlyBase-Harvard

  2. Databases and applications Java SQL Chado Application ? rad seq genetic cv Application pub phylo Perl How do we get databases and applications speaking to one another?

  3. Databases and applications SQL Java Chado posgresql driver Application JDBC method calls row objects rad seq posgresql driver genetic cv Application DBI method calls perl arrays pub phylo Perl Generic database interfaces only solve part of the problem They let us embed SQL inside application code

  4. Why this alone isn’t enough • Interfacing applications to databases is a tricky business… • Issue: Language mismatch • Issue: Data structure mismatch • Issue: Repetitive code • Issue: No centralized domain logic

  5. Language mismatch String sql = "SELECT srcfeature_id, fmax, fmin “+ “FROM featureloc "+ "WHERE feature_id =" + featId; try { Statement s = conn.createStatement(); ResultSet rs = s.executeQuery(sql); rs.next(); sourceFeatureId = rs.getInt("srcfeature_id"); fmin = rs.getInt("fmin"); fmax = rs.getInt("fmax"); } catch (SQLException sqle) { System.err.println(this.getClass() + ": SQLException retrieving feature loc" + " for feature_id = " + featId); sqle.printStackTrace(System.err); }

  6. Data structure mismatch • Different formalisms relations - set theoretic - relational algebra X classes and structs - pointers - programs

  7. Repetitive code • Database fetch pattern • construct, ask, transform, repeat, stitch • Example: fetching gene models • fetch genes • fetch transcripts • fetch exons, polypeptides • fetch ancillary data (props, cvs, pubs, syns, etc) • Optimisation is difficult

  8. Examples of domain logic: project a feature onto a virtual contig revcomp or translate a sequence search by ontology term delete a gene model Domain logic should be reusable by different applications No centralized domain logic

  9. A solution:Object Oriented APIs Perl Perl chado driver DBI chado adapter Application API method calls domain objects Application biosql biosql adapter domain objects driver DBI Perl Different perl applications share the same API Different schemas can be added by writing adapters

  10. How do OO APIs help? • Issue: Language mismatch • Seperation of interface from implementation • Issue: Data structure mismatch • API talks objects • adapters hide and deal with conversion • Issue: Repetitive code • code centralized in both API and adapter • Issue: No centralized domain logic • object model encapsulates domain logic • object model can be used independently of database

  11. How do OO APIs hinder? • Writing or generating adapters • brittle, difficult to maintain • Restrictive • canned parameterized queries vs query language • Application language bound • very difficult to use a perl API from java • Application bound • sometimes generic, but often limited to one application • Opaque domain logic

  12. XML can help • XML is application-language neutral • XML can be used to specify: • data • transactions • queries and query constraints • XML can be used within both application languages and specialized XML languages • XPath • XSLT • XQuery

  13. XML middleware Perl Perl any chado driver DBI impl Application interface query params chado xml Application query params chado xml Generic SQL to XML mapper Java Database XML as lingua franca OO APIs can be implemented on top of XML layer

  14. XML middleware Java Perl any chado driver JDBC impl Application interface query params chado xml Application query params Implementation can be any language chado xml Java

  15. Mapping with XORT • XORT is a specification of how to map XML to the relational model • generic: independent of chado and biology • XML::XORT is a perl implementation of the XORT specification • Other implementations possible • DBIx::DBStag implements XORT xml->db • Application language agnostic • Easily wrapped for other languages

  16. Highlights • Proposal: XML mapping specification for Chado • Tools • Real Case

  17. XORT Mapping • Elements • Table • Column (except DB-specific value, e.g primary key in Chado schema -- not visible in XML) • Attributes • few and generic: transaction and reference control • Element nesting • column within table • joined table within table -- joining column is implicit • foreign key table within foreign key column • Modules • No module distinctions in chadoXML • Limitations of DTD • Cardinality, NULLness, data type

  18. Transactions and Operations • Lookup - Select only • Insert - Insert explicitly • Delete - Unique identifier with unique key(s) - One record per operation • Update - Two elements - Unique identifier with unique key(s) - One record per operation • Force Combination of lookup, insert and update (if not lookup, then insert, else update)

  19. Referencing Objects • By global accession - Format: dbname:accession[.version] - Only for dbxref, feature ?, cvterm ? • By a pre-defined local id - Allows reference to objects in same file - Need not be in DB - Can be any symbol • By lookup using unique key value(s) - Object can be in file or DB • Implicitly, using foreign key to identify information in the related link table

  20. Object ReferenceBy Global Accession <feature> <uniquename>CG3123</uniquename> <type_id>gene</type_id> <feature_relationship> <subject_id>Gadfly:CG3123-RA:1</subject_id> <type_id>producedby</type_id> <feature_relationship> …… </feature> </feature>

  21. Object ReferenceBy Local ID <cv id=“SO”> <name>Sequence Ontology</name> </cv> <cvterm id=“exon”> <cv_id>SO</cv_id> <name>exon</name> </cvterm> <feature> <type_id>exon</type_id>

  22. Object ReferenceBy key Value (s) <feature> <type_id>    <cvterm>        <cv_id> <cv> <name>Sequence Ontology</name>               </cv> </cv_id>        <name>exon</name>    </cvterm> </type_id> ….       

  23. ChadoXML Example <cv id=“SO”> <name>Sequence Ontology</name> </cv> <feature op=“lookup” id=“CG3312”> <uniquename>CG3312</uniquename> <type_id> <cvterm> <name>gene</name> <cv_id>SO</cv_id> </cvterm> <type_id> <organism_id> <feature_relationship> <subject_id>Gadfly:CG3312-RA</subject_id> <type_id>producedby</type_id> </feature_relationship> </feature>

  24. Schema-Driven Tools • DTD Generator: DDL-DTD • Validator • DB Not connected Syntax verification: legal XML, correct element nesting Some Semantic verfication: NULLness, cardinality, local ID reference • DB Connected: reference validation • Loader: XML->DB • Dumper:DB->XML • Driven by XML “dumpspec” • XORTDiff: diff two XORT XML files

  25. DumpSpec Driven Dumper • Default behavior: given an object class and ID, dump all direct values and link tables, with refs to foreign keys. • Non-default behavior specified by XML dumpspecs using same DTD with a few additions: • attribute dump= all | cols | select | none • attribute test = yes | no • element _sql • element _appdata • Workaround with views, _sql • Current use cases: • Dump a gene for a gene detail page • Dump a scaffold for Apollo

  26. DumpSpec Sample <feature dump="all"> <uniquename test="yes">CG3312</uniquename> <!-- get all mRNAs of this gene --> <feature_relationship dump="all"> <subject_id test="yes"> <feature> <type_id><cvterm> <name>mRNA</name> </cvterm> </type_id> </feature> </subject_id> <subject_id> <feature dump="all"> <!-- get all exons of those mRNAs --> <feature_relationship dump="all"> <subject_id test="yes"> <feature> <type_id> ><cvterm><name>exon</name> </cvterm> </type_id> </feature> </subject_id> <subject_id> <feature dump="all“/> </subject_id> </feature_relationship> </feature>

  27. Apollo Use Case 1Chado <-> Apollo Interaction DumpSpec1 DDL XML Loader + validator Chado XML Dumper Chado XML Chado XML By_product GameBridge GameBridge XML Schema GAME XML GAME XML

  28. Use Case 2Gene Page Dataflow DumpSpec2 Chado XML Dumper Chado XML acode FlyBase

  29. To Do Lists • External Object reference • Dump with auto-generated XML Schema • Output human-friendly

  30. Resources • Today’s slides • XORT package http://www.gmod.org • Protocol draft submit to Current Protocol In Bioinformatics • Using chado to Store Genome Annotation Data

  31. XORT Key points • Application language-neutral • reusable from within multiple languages and applications • Where does the domain logic live? • Unlike objects, XML does not have ‘behaviour’ • One solution: ChadoXML Services • Another solution: Inside the DBMS

  32. ChadoXML Services chado Application XORT interface query params driver DBI XML:: XORT chado xml chado xml query params chado xml other xml Lisp Java XSLT Prolog Format converters, dumpers Ontology services XQuery ChadoXML services interface companalysis logic sequence domain logic Java Perl Perl C

  33. ChadoXML Services Can be independent of DB Application chado xml query params chado xml other xml Lisp Java XSLT Prolog Format converters, dumpers Ontology services XQuery ChadoXML services interface companalysis logic sequence domain logic Java Perl Perl C

  34. DB Functions API Chado JDBC/ DBI posgresql driver Application db function calls DB Functions API sql result objects PL/PgSQL Function Impl sequence library C Implementation inside database

  35. DB Functions API • cv module • get_all_subject_ids(cvterm_id int); • get_all_object_ids(cvterm_id int); • fill_cvtermpath(cv_id int); Chado DB Functions API PL/PgSQL Function Impl Existing functions

  36. DB Functions API • sequence module • get_sub_feature_ids(feature_id int) • featuresplice(fmin int, fmax int) • get_subsequence(srcfeature_id int, fmin int, fmax int, strand int) • next_uniquename() Chado DB Functions API PL/PgSQL Function Impl Existing functions

  37. Putting it together: storing ontologies in chado XML::XORTor DBIx::DBStag Chado XML cv fill_cvtermpath() oboxml_to_ chadoxml.xsl Obo XML obo-edit owl_to_ oboxml.xsl protege OWL

  38. Benefits • Issue: Language mismatch • XORT dumpspecs and sql functions a more natural fit for application languages • Issue: Data structure mismatch • XML maps naturally to objects and structs • Issue: Repetitive code • XORT dumpspecs centralize db-fetch code • XORT loader centralizes db-store code • Issue: No centralized domain logic • domain logic can be encoded in: • PostgreSQL functions and triggers • ChadoXML services

  39. Other issues • Speed? • chained transformations may be slower (-) • generic code is often slower (-) • single point for optimization(+) • Verbosity • inevitable with a normalized database • reduced with XORT macros • Portability • XORT highly portable (+) • PostgreSQL functions must be manually ported to different DBMSs (-)

  40. Current plans • XORT wrappers • Improving efficiency • Documentation • Extend PostgreSQL function repertoire • More ChadoXML XSLTs • ChadoXML adapters • CGL • Apollo • BioPerl - Bio::{Seq,Search,Tree,..}IO::chadoxml

  41. Conclusions • ChadoXML • a common GMOD format • converted to other formats with XSLTs • XORT • centralises database interoperation logic • PostgreSQL functions • useful for certain kinds of domain logic • Object APIs • still required by many applications • can be layered on top of XORT if so desired

  42. Richard Bruskiewich Scott Cain Allen Day Karen Eilbeck Dave Emmert William Gelbart Mark Gibson Don Gilbert Aubrey de Grey Nomi Harris Stan Letovsky Suzanna Lewis Aaron Mackey Sima Misra Emmannel Mongin Simon Prochnik Gerald Rubin Susan Russo ShengQiang Shu Chris Smith Frank Smutniak Lincoln Stein Colin Wiel Mark Yandell Peili Zhang Mark Zythovicz Thanks to…

  43. Nothing to see here, move along….. • random deleted slides follow……

  44. How do we develop an OO API? • Ad-hoc • eg: GO API, GadFly API, Ensembl API, ApolloChadoJDBC • Object model and db can be developed independently • hand tuned • Generic • Class::DBI, JDO • efficiency issues?

  45. GBrowse API Perl BioSQL GBrowse Bio::DB ::DAS SQL Chado DBI Bio-DB-GFF BioPerl Gbrowse can connect to different database via the same API APIs are typically associated with an object model

  46. Choice of XML Datamodel • XML is a different formalism from the relational model • Should we: • design an XML datamodel from scratch? • adopt existing XML datamodel? • Generate XML datamodel from relational model?

  47. Chado XML • XML datamodel derived automatically from relational schema • mapping uses XORT specification • XML tightly coupled to relational schema • benefits & advantages..

  48. Data retrieval: XORT Dumpspecs • Examples: chado gene model dumpspec

  49. Data storage • Static or transactional

  50. بزرگترین بانک پاورپوینت ایرانwww.txtzoom.comبانک هوشمند اسناد متنی

More Related