Interoperability With BioMoby 1.0 It’s Better Than Sharing Your Toothbrush! Photo taken by http://flickr.com/people/mfsarwar/
A brief history of BioMoby
• Sept 2001 – Model Organism Bring Your own Database Interface Conference (MOBY-DIC)
• May 21, 2002 – Genome Canada Platform Award
• May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
• July 18, 2002 – First Moby Client (Gbrowse Moby)
• June 9, 2003 – API Version 0.5 deployed
• 2006 – Genome Canada Platform Award
• 2007 – Version 1.0 API submitted for publication
MOBY-DIC Chapter VII 7th Model Organism Bring Your-own Database Interface Conference Vancouver, BC, June 2007.
Wendy Richard Martin Mylah Eddie
Andreas Paul Ivan Mark’s Screen…
The BioMoby Plan
• Create an ontology of bioinformatics data-types
• Define a serialization of this ontology (data syntax)
• Create an open API over this ontology
• Define Web Service inputs and outputs in terms of this ontology
• Register Services in an ontology-aware Registry
• Machines can find an appropriate service
• Machines can execute that service unattended
• Ontology is community-extensible
Overview of BioMoby Transactions
[Figure labels: MOBY hosts & services (Sequence, Express., Protein, Alleles, …); MOBY Central; Align, Phylogeny, Primers; Sequence, Alignment, Gene names]
Overview of BioMoby Transactions
[Figure labels: MOBY Central; Align, Phylogeny, Primers; Sequence; Object ontology]
• What is a sequence?
• A sequence is a ___ that has these features ___
• Discovery of services that consume things LIKE sequences!
This is SCUFL – Simple Conceptual Unified Flow Language. It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…
Pipeline discovery “on the fly” • No explicit coordination between providers • Dynamic discovery of ~appropriate Services • Automated execution of services
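A minimal sketch of what "discovery of appropriate Services" could look like, assuming a registry that records one input class per service and an ISA table over the Object ontology. The service names and the `find_services` helper are hypothetical, chosen only for illustration; this is not the Moby Central API.

```python
from typing import Dict, List, Optional

# ISA edges taken from the data-type slides (child -> parent).
ISA: Dict[str, Optional[str]] = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "Object": None,
}

# Hypothetical registry: service name -> class it consumes.
SERVICES: Dict[str, str] = {
    "hypothetical_get_length": "VirtualSequence",
    "hypothetical_translate": "DNASequence",
    "hypothetical_echo": "Object",
}


def ancestors(class_name: str) -> List[str]:
    """Return the class followed by all of its ISA ancestors."""
    chain: List[str] = []
    current: Optional[str] = class_name
    while current is not None:
        chain.append(current)
        current = ISA[current]
    return chain


def find_services(data_class: str) -> List[str]:
    """Find services whose input is the data's class or one of its ancestors."""
    accepted = set(ancestors(data_class))
    return sorted(name for name, consumed in SERVICES.items() if consumed in accepted)


generic_hits = find_services("GenericSequence")
dna_hits = find_services("DNASequence")
```

In this sketch a GenericSequence matches the services that consume VirtualSequence or Object, but not the one that requires a DNASequence; this is one reading of "services that consume things LIKE sequences".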
Some BioMoby statistics
Moby: Breadth • Namespaces (data types): 418 • Objects (data syntaxes): >561 • Service Types (analytical categories): 112 • Providers: ~50 active • Service Instances: ~1200 currently “alive” • In main Moby Central server in Canada • Others in “boutique” Moby registries serving specialized communities worldwide
Moby: Clients
• Gbrowse_moby (M Wilkinson)
• PlaNet Locus_View (H Schoof, R Ernst)
• Blue-Jay (P Gordon)
• Taverna (T Oinn, M Senger, E Kawas)
• MOWserv (INB, Spain)
• Remora (S Carrere, J Gouzy, INRA)
• MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
• SeaHawk (P Gordon)
BioMoby in detail • MOBY Data typing system: Semantic Type • MOBY Data typing system: Syntactic Type • Moby Registry Queries
Moby Namespaces • A “Namespace” is a category of identifiers • NCBI has gi numbers (gi Namespace) • GO Terms have accession numbers (GO Namespace) • Namespaces indicate data’s semantic type. • GO:0003476 a Gene Ontology Term • gi|163483 a GenBank record • Though we are using the word “Namespace” correctly, it causes confusion! • “Namespace” in XML is tightly associated with an XML document and/or its syntax • In Moby, we are ONLY talking about data entities NOT THEIR SYNTAX
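The idea that a namespace plus an identifier names a data entity can be sketched as a small value type. The `MobyIdentifier` class and `parse_identifier` helper are hypothetical, and the ":" and "|" delimiters are assumptions based only on the two examples on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobyIdentifier:
    """A data entity named by a namespace and an ID within that namespace."""
    namespace: str
    id: str


def parse_identifier(text: str) -> MobyIdentifier:
    """Split strings such as 'GO:0003476' or 'gi|163483' (assumed delimiters)."""
    for delimiter in (":", "|"):
        if delimiter in text:
            namespace, _, local_id = text.partition(delimiter)
            return MobyIdentifier(namespace, local_id)
    raise ValueError(f"no namespace delimiter found in {text!r}")


go_term = parse_identifier("GO:0003476")   # a Gene Ontology Term
genbank = parse_identifier("gi|163483")    # a GenBank record
```

Nothing in the value says how the entity is represented; it only says which entity is meant.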
BioMoby in detail • MOBY Data typing system: Semantic Type • MOBY Data typing system: Syntactic Type • Moby Registry Queries
The MOBY Object Ontology
• Syntactic types are defined by a GO-like ontology
  • Class name at each node
  • Edges define the relationships between Classes
  • GO used as a model because of its familiarity in the community
• Edges define one of three relationships
  • ISA
    • Inheritance relationship
    • All properties of the parent are present in the child
  • HASA
    • Container relationship of ‘exactly 1’
  • HAS
    • Container relationship with ‘1 or more’
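The three edge types can be sketched as a small in-memory model. This is an illustration only: the `MobyClass` structure is hypothetical, single inheritance is an assumption, and the class names are borrowed from the data-type slides in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Member = Tuple[str, str]  # (articleName, class name)


@dataclass
class MobyClass:
    name: str
    isa: Optional[str] = None                         # inheritance edge
    hasa: List[Member] = field(default_factory=list)  # contains exactly 1
    has: List[Member] = field(default_factory=list)   # contains 1 or more


ONTOLOGY: Dict[str, MobyClass] = {
    cls.name: cls
    for cls in (
        MobyClass("Object"),
        MobyClass("Integer", isa="Object"),
        MobyClass("String", isa="Object"),
        MobyClass("VirtualSequence", isa="Object", hasa=[("length", "Integer")]),
        MobyClass("GenericSequence", isa="VirtualSequence",
                  hasa=[("SequenceString", "String")]),
        MobyClass("DNASequence", isa="GenericSequence"),
    )
}


def lineage(name: str) -> List[str]:
    """Return the class and its ISA ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].isa
    return chain


def hasa_members(name: str) -> List[Member]:
    """Collect HASA members, inherited ones first (ISA carries all parent properties)."""
    collected: List[Member] = []
    for ancestor in reversed(lineage(name)):
        collected.extend(ONTOLOGY[ancestor].hasa)
    return collected
```

Here `hasa_members("DNASequence")` yields both the inherited "length" Integer and the "SequenceString" String, even though DNASequence declares no members of its own.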
The Simplest Moby Data-Type: Object
<Object namespace='NCBI_gi' id='111076'/>
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s) or its representation.
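As a rough sketch, the base Object can be produced with Python's standard `xml.etree.ElementTree`. The `moby_object` helper is a hypothetical name, not part of any BioMoby library.

```python
import xml.etree.ElementTree as ET


def moby_object(namespace: str, identifier: str) -> ET.Element:
    """Build the simplest Moby data-type: an empty Object with a namespace and id."""
    return ET.Element("Object", {"namespace": namespace, "id": identifier})


xml_text = ET.tostring(moby_object("NCBI_gi", "111076"), encoding="unicode")

# Round-trip to show that the namespace/id pair is all the element carries.
parsed = ET.fromstring(xml_text)
```

The element has no child content: it refers to the entity without carrying any representation of it.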
Moby Primitives
Integer, String, Float and DateTime each ISA Object.
<Integer namespace='' id=''>38</Integer>
A Derived Data-Type: Virtual Sequence
Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Virtual Sequence HASA Integer.
<Integer namespace='' id=''>38</Integer>
<VirtualSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
</VirtualSequence>
The articleName describes the semantic relationship between the Integer and the Virtual Sequence.
A Derived Data-Type: Generic Sequence
Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Virtual Sequence HASA Integer; Generic Sequence ISA Virtual Sequence; Generic Sequence HASA String.
<VirtualSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
</VirtualSequence>
<GenericSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>
A Derived Data-Type: DNA Sequence
Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Virtual Sequence HASA Integer; Generic Sequence ISA Virtual Sequence; Generic Sequence HASA String; DNA Sequence ISA Generic Sequence.
<GenericSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>
<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
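Because a DNASequence carries every member of its ancestors, a consumer written against the parent class can read it unchanged. The sketch below parses the slide's DNASequence example with the standard library and looks members up by articleName; the `article` helper is hypothetical. The values are read as given on the slide, and the declared length is not checked against the sequence string.

```python
import xml.etree.ElementTree as ET

DNA_XML = """
<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
"""


def article(parent: ET.Element, name: str) -> str:
    """Return the text of the child whose articleName matches, with whitespace trimmed."""
    for child in parent:
        if child.get("articleName") == name:
            return (child.text or "").strip()
    raise KeyError(name)


record = ET.fromstring(DNA_XML)
length = int(article(record, "length"))        # member inherited from VirtualSequence
sequence = article(record, "SequenceString")   # member inherited from GenericSequence
```

The lookup never inspects the element's own tag, so the same code would work on a VirtualSequence (for "length") or a GenericSequence.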
Legacy file formats
• Containing “String” allows ontological classes to represent legacy data types
<NCBI_Blast_Report namespace='NCBI_gi' id='115325'>
  <String namespace='' id='' articleName='content'>
    TBLASTN 2.0.4 [Feb-24-1998]
    Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
    Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
    (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search
    programs", Nucleic Acids Res. 25:3389-3402.
    Query= gi|1401126
    (504 letters)
    Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
    336,723 sequences; 677,679,054 total letters
    Searching...done
                                                                       Score     E
    Sequences producing significant alignments:                        (bits)  Value
    gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009  0.0
    emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...    58  4e-07
    emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                   53  1e-05
  </String>
</NCBI_Blast_Report>
Binaries – pictures, movies
• Text-base64 is a Class that contains String
• Binaries are base64 encoded and passed in classes that inherit from text-base64
• base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
<base64_encoded_jpeg namespace='TAIR_image' id='3343532'>
  <String namespace='' id='' articleName='content'>
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
    HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
    bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
    MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
  </String>
</base64_encoded_jpeg>
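A minimal sketch of wrapping a binary payload this way, using only Python's standard `base64` and `xml.etree.ElementTree` modules. The helper names and the placeholder bytes are hypothetical; the element and articleName values follow the slide.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(class_name: str, namespace: str, identifier: str,
                payload: bytes) -> ET.Element:
    """Base64-encode the payload and store it in a String member named 'content'."""
    element = ET.Element(class_name, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        element, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(payload).decode("ascii")
    return element


def unwrap_binary(element: ET.Element) -> bytes:
    """Recover the original bytes from the 'content' member."""
    for child in element:
        if child.get("articleName") == "content":
            return base64.b64decode(child.text or "")
    raise ValueError("no 'content' member found")


# Placeholder bytes standing in for a real image file.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
wrapped = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", fake_jpeg)
restored = unwrap_binary(wrapped)
```

The round trip is lossless, which is the property that lets arbitrary binaries travel inside a text-only String member.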
Extending legacy datatypes
• With legacy data-types defined, we can extend them as we see fit
• annotated_jpeg ISA base64_encoded_jpeg
• annotated_jpeg HASA 2D_Coordinate_set
• annotated_jpeg HASA Description
<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
    <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
    <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName='Description'>
    This is the phenotype of a ufo-1 mutant under long daylength, 16’C
  </String>
  <String namespace='' id='' articleName='content'>
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
The same object…
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
    <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
    <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName='Description'>
    This is the phenotype of a ufo-1 mutant under long daylength, 16’C
  </String>
  <String namespace='' id='' articleName='content'>
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
The same object… with cross-references
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
<CrossReference>
  <Object namespace='TAIR_Allele' id='ufo-1'/>
</CrossReference>
<CrossReference>
  <Object namespace='TAIR_Tissue' id='122'/>
</CrossReference>
<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
    <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
    <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName='Description'>
    This is the phenotype of a ufo-1 mutant under long daylength, 16’C
  </String>
  <String namespace='' id='' articleName='content'>
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1U
  </String>
</annotated_jpeg>
Cross reference types
• Simple
  • A MOBY Object
  <Object namespace='foo' id='12345'/>
• Rich
  • Takes the form:
  <Xref namespace='' id='' authURI='' serviceName='' evidenceCode='' xrefType=''>
    ... Textual Description ...
  </Xref>
• …Incidentally, this avoids the problem of reification that is experienced in RDF
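The two cross-reference forms can be sketched with the standard library as follows. The helper names and the example attribute values are hypothetical, and placing the rich Xref inside a CrossReference wrapper is an assumption made by analogy with the simple form shown on the previous slide.

```python
import xml.etree.ElementTree as ET


def simple_xref(namespace: str, identifier: str) -> ET.Element:
    """Simple form: a CrossReference holding a bare MOBY Object."""
    wrapper = ET.Element("CrossReference")
    ET.SubElement(wrapper, "Object", {"namespace": namespace, "id": identifier})
    return wrapper


def rich_xref(namespace: str, identifier: str, description: str, *,
              auth_uri: str = "", service_name: str = "",
              evidence_code: str = "", xref_type: str = "") -> ET.Element:
    """Rich form: an Xref with provenance attributes and a textual description."""
    wrapper = ET.Element("CrossReference")  # assumed wrapper, see lead-in
    xref = ET.SubElement(wrapper, "Xref", {
        "namespace": namespace,
        "id": identifier,
        "authURI": auth_uri,
        "serviceName": service_name,
        "evidenceCode": evidence_code,
        "xrefType": xref_type,
    })
    xref.text = description
    return wrapper


allele = simple_xref("TAIR_Allele", "ufo-1")
tissue = rich_xref("TAIR_Tissue", "122", "Tissue shown in the image",
                   auth_uri="example.org", service_name="hypotheticalService")
```

The rich form attaches the statement about the link (who asserted it, on what evidence) directly to the link element itself.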
XML Schema?
• The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
• Minimize future heterogeneity
• Improve interoperability without requiring schema-to-schema mapping
XML Schema? • Object Ontology terms have semantically rich names, but this is primarily for human intuition • DNA Sequence • Annotated_GIF • Object Ontology does not define the meaning of an object to the machine • No machine-readable semantics • It does define the representation • SYNTAX