
From Web 1.0 to Web 3.0: Is RDF access to RDB enough?

Vipul Kashyap (vkashyap1@partners.org), Senior Medical Informatician, Clinical Informatics R&D, Partners Healthcare System; Martin Flanagan (mflanagan@insilicodiscovery.com), CTO, InSilico Discovery


Presentation Transcript


  1. From Web 1.0 to Web 3.0: Is RDF access to RDB enough?
     Vipul Kashyap, vkashyap1@partners.org, Senior Medical Informatician, Clinical Informatics R&D, Partners Healthcare System
     Martin Flanagan, mflanagan@insilicodiscovery.com, CTO, InSilico Discovery
     W3C Workshop on RDF Access to Relational Databases, October 26, 2007

  2. Outline
     • Position
     • Use Case Scenario
     • Solution Approach
     • A Generalized Framework for RDF Access
     • Next Steps:
         • Proposed Roadmap
         • Research Topics

  3. Position
     There is a need for a generalized framework (format, representation language, algebra?) for RDF access to:
     • Relational Databases
     • Tabular Data Sources, e.g., Excel Spreadsheets
     • Web Services
     Motivation:
     • Large amounts of “tabular” data and an increasing number of web services in the Healthcare and Life Sciences
     • Learn from the relational database success story: Declarative query language + Algebra + Opportunities for optimization
     • Potential for providing incremental value, increasing the adoption and acceptance of the Semantic Web

  4. Use Case Scenario: Biological Explanations for Statistical Correlations
     • What is the location of a given gene, e.g., CPNE1, on the human genome?
       Data Repository: NCBI Entrez
       Access Mechanism: Web Services
     • For what gene(s) is a given SNP, e.g., rs6060535, in the upstream regulatory region?
       Data Repository: RDBMS containing dbSNP and regulatory region data
       Access Mechanism: JDBC/SQL
     • What genes have been found to be "coexpressed" with CPNE1, and in what study?
       Data Repository: Excel spreadsheet containing the co-expression patterns of various genes in various studies
       Access Mechanism: .NET API, MS Office API

  5. Solution Approach
     • Ontology based RDF query specification
     • Mapping Framework
         • Relational Databases
         • Excel Spreadsheets
         • Web Services
     • Query Translations and Execution
     Illustrations of a working system based on the Semantic Discovery System by InSilico Discovery (http://www.insilicodiscovery.com)

  6. Ontology based RDF Query Specification
     SPARQL Query Generated:
       prefix example <http://www.semanticdiscoverysystems.com/Example.owl#>
       prefix ns <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       select distinct ?v0, ?v1
       where {
         ?v0 ns:type example:gene
         ?v0 example:has_gene_region ?v1
         ?v0 example:gname 'CPNE'
       }
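To make the meaning of the query on slide 6 concrete, here is a minimal sketch of how its three triple patterns could be evaluated over an in-memory set of triples. It is not the SDS implementation: the sample triples (`g1`, `g2`, `region_a`, `region_b`) and the helper names `match` and `select` are invented for illustration, and only the two namespace URIs and the property names come from the slide. The query is written as on the slide; standard SPARQL would additionally require a colon after each prefix name, `.` between triple patterns, and no comma in the `select` list.

```python
# Namespaces as given on the slide.
EX = "http://www.semanticdiscoverysystems.com/Example.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample data (assumption, for illustration only).
TRIPLES = {
    (EX + "g1", RDF + "type", EX + "gene"),
    (EX + "g1", EX + "gname", "CPNE"),
    (EX + "g1", EX + "has_gene_region", EX + "region_a"),
    (EX + "g2", RDF + "type", EX + "gene"),
    (EX + "g2", EX + "gname", "OTHER"),
    (EX + "g2", EX + "has_gene_region", EX + "region_b"),
}


def match(pattern, binding, triples):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must equal the triple's term exactly.
                ok = False
                break
        if ok:
            yield new


def select(variables, patterns, triples):
    """Evaluate a conjunction of patterns; return distinct, sorted result rows."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b, triples)]
    return sorted({tuple(b[v] for v in variables) for b in bindings})


# The three patterns from the slide's where-clause.
QUERY = [
    ("?v0", RDF + "type", EX + "gene"),
    ("?v0", EX + "has_gene_region", "?v1"),
    ("?v0", EX + "gname", "CPNE"),
]
RESULT = select(["?v0", "?v1"], QUERY, TRIPLES)
```

With the sample data above, only `g1` satisfies all three patterns, so `RESULT` holds the single pair of `g1` and its region.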

  7. Mapping to Relational Databases [screenshot; callouts: Mapping to Oracle Databases; Mapping to Gene Names Mediator Class]
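The screenshot for this slide is not in the transcript, so the following is only a rough sketch of what a table-to-ontology mapping can look like, not the SDS mapping itself. It assumes a hypothetical table `gene_names(gene_id, gname)`, uses an in-memory SQLite database as a stand-in for Oracle, and invents the `MAPPING` dictionary and the subject URI scheme. Only the namespace, the class `gene`, and the property `gname` come from the slides.

```python
import sqlite3

EX = "http://www.semanticdiscoverysystems.com/Example.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping: table -> class, column -> property.
MAPPING = {
    "table": "gene_names",
    "key": "gene_id",
    "class": EX + "gene",
    "columns": {"gname": EX + "gname"},
}


def rows_to_triples(conn, mapping):
    """Read every row of the mapped table and emit (subject, predicate, object) triples."""
    cols = [mapping["key"], *mapping["columns"]]
    sql = f"SELECT {', '.join(cols)} FROM {mapping['table']}"
    triples = []
    for row in conn.execute(sql):
        # Subject URI built from table name and key value (an assumed scheme).
        subject = f"{EX}{mapping['table']}/{row[0]}"
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for col, value in zip(cols[1:], row[1:]):
            triples.append((subject, mapping["columns"][col], value))
    return triples


# Stand-in database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_names (gene_id INTEGER PRIMARY KEY, gname TEXT)")
conn.execute("INSERT INTO gene_names VALUES (1, 'CPNE1')")
TRIPLES = rows_to_triples(conn, MAPPING)
```

Keeping the mapping as data rather than code is one way to make it declarative, in the spirit of the position stated on slide 3.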

  8. Mapping to Web Services [screenshot; callouts: Mapping to Web Services; Mapping to GetGenomeLocations in gene_regions Mediator class]
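As a hedged illustration of mapping a web-service operation to ontology properties, the sketch below wraps a service call and turns each returned record into a `has_gene_region` triple. No real NCBI call is made: `stub_get_genome_locations` is a local stand-in whose name only echoes the `GetGenomeLocations` callout, and its record shape, the placeholder region value, and the URI scheme are all assumptions.

```python
EX = "http://www.semanticdiscoverysystems.com/Example.owl#"


def stub_get_genome_locations(gene_name):
    """Local stand-in for the remote operation; returns a made-up placeholder record."""
    return [{"gene": gene_name, "region": "placeholder-region"}]


def service_to_triples(call, gene_name):
    """Invoke `call` for one gene and map each response record to a triple."""
    triples = []
    for record in call(gene_name):
        subject = EX + "gene/" + record["gene"]
        obj = EX + "region/" + record["region"]
        triples.append((subject, EX + "has_gene_region", obj))
    return triples


TRIPLES = service_to_triples(stub_get_genome_locations, "CPNE1")
```

Passing the service as a callable keeps the mapping independent of the transport, so a real client could be substituted without touching `service_to_triples`.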

  9. Mapping to Excel Spreadsheets [screenshot; callouts: Mapping to Spreadsheet Data; Mapping to Gene Names Mediator Class]
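A spreadsheet can be mapped in much the same way as a table. The sketch below is an assumption-laden stand-in: it reads CSV text (as a spreadsheet might be exported) rather than an Excel file, and the column names `gene`, `partner`, `study` and the properties `about_gene`, `coexpressed_with`, `reported_in` are hypothetical, as is the sample row. Each row becomes its own node so that the gene, its co-expressed partner, and the study stay associated.

```python
import csv
import io

EX = "http://www.semanticdiscoverysystems.com/Example.owl#"

# Hypothetical column-to-property mapping.
COLUMN_TO_PROPERTY = {
    "gene": EX + "about_gene",
    "partner": EX + "coexpressed_with",
    "study": EX + "reported_in",
}


def sheet_to_triples(text):
    """Map each data row of CSV text to one node with a triple per mapped column."""
    triples = []
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        subject = f"{EX}coexpression_row_{number}"
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


# Made-up sample sheet with a header row and one data row.
SHEET = "gene,partner,study\nCPNE1,GENE_X,STUDY_1\n"
TRIPLES = sheet_to_triples(SHEET)
```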

  10. Query Translation and Execution: This one SPARQL statement ‘joins’ data from NCBI, Excel, and Oracle: “who did what assay matching this sequence data …” [diagram label: Translators]
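The idea of one query joining three kinds of source can be sketched as follows. This is a toy mediator, not the SDS translators: each source is wrapped in a function returning records keyed by `gene`, and a nested-loop join combines them. The function names, the table `upstream_snps`, the CSV columns, and all data values are made up for illustration; SQLite and CSV text stand in for Oracle and Excel, and the web service is a local stub.

```python
import csv
import io
import sqlite3


def locate_gene(gene):
    """Stand-in for the web-service source; returns a made-up placeholder."""
    return [{"gene": gene, "region": "placeholder-region"}]


def snps_upstream_of(conn, gene):
    """Relational source: look up SNPs recorded upstream of the gene."""
    rows = conn.execute("SELECT snp FROM upstream_snps WHERE gene = ?", (gene,))
    return [{"gene": gene, "snp": snp} for (snp,) in rows]


def coexpression(sheet_text, gene):
    """Spreadsheet source (CSV stand-in): rows whose 'gene' column matches."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    return [dict(row) for row in reader if row["gene"] == gene]


def join_on_gene(*sources):
    """Nested-loop join of record lists that all share the key 'gene'."""
    results = [{}]
    for records in sources:
        results = [
            {**left, **right}
            for left in results
            for right in records
            if left.get("gene", right["gene"]) == right["gene"]
        ]
    return results


# Made-up data for the relational and spreadsheet stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upstream_snps (gene TEXT, snp TEXT)")
conn.execute("INSERT INTO upstream_snps VALUES ('CPNE1', 'SNP_PLACEHOLDER')")
SHEET = "gene,partner,study\nCPNE1,GENE_X,STUDY_1\n"

ANSWER = join_on_gene(
    locate_gene("CPNE1"),
    snps_upstream_of(conn, "CPNE1"),
    coexpression(SHEET, "CPNE1"),
)
```

A production system would presumably push selections down to each source and choose a join order, which is where the "opportunities for optimization" mentioned on slide 3 come in; the nested loop here is only the simplest correct strategy.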

  11. A Generalized Framework for RDF Access
     • Ontology Classes and Properties: Gene, GeneRegion; has_gene_region, gname
     • Mediator Framework Classes: gene.mdl, gene_region.mdl, gene_names.mdl, …
     • RDB specific classes: oracle.mdl
     • Web service specific classes: ncbi.mdl, keg.mdl
     • Excel specific classes: excel.mdl
     The SDS Platform is based on the Mediator Definition Language work done by Val Tannen and his students at U. Pennsylvania. It was earlier implemented in the K3 system and was widely used in Pharma.

  12. Conclusions
     • Need to think of various types of structured/semi-structured/tabular data sources in a holistic manner:
         • XML Documents (GRDDL Transforms)
         • Relational Databases
         • Web Services
         • Excel Spreadsheets
         • Other “Tabular” and “Tree” data sources
     • Potential for providing value beyond relational databases
     • Accelerate the transition to the Semantic Web
     • Increase Adoption and Acceptance

  13. Next Steps: Proposed Roadmap [diagram; labels: RDF; Generalized Transformation Language; Relational Algebra; GRDDL; Relational Databases; WSDL; XML; Excel Spreadsheets]

  14. Next Steps: Research
     • Extension of Relational Algebra?
     • XQuery
     • RDF
     • GRDDL Transformations
     • WSDL
     • Read only Web Service Choreography/Composition
     • What aspects of the above can be “webified”?
     • Access Transformation Languages
     • Mapping Languages: Is XQuery or RDF enough?
     • Existing efforts in Mediator research
         • E.g., Mediator Definition Language (MDL)
