
Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences

Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences. Kai Lin, Chaitan Baru, San Diego Supercomputer Center, University of California, San Diego. Data integration goal: query heterogeneous data sources as a single resource.


Presentation Transcript


  1. Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences
     Kai Lin, Chaitan Baru
     San Diego Supercomputer Center, University of California, San Diego
     www.geongrid.org

  2. Data Integration Goal
     • Query heterogeneous data sources as a single resource
     • Query: users do not write a program (“ad hoc, non-procedural query languages”)
     • Heterogeneous: each local resource controls the definition of its data
     • Single resource: remove the burden of accessing each data source individually

  3. Data Integration Challenges: Heterogeneities
     • Syntactic heterogeneity: heterogeneous data formats, e.g. 02-04-2004 vs. 02/04/04
     • Structural heterogeneity: heterogeneous data models and schemas, e.g. 02-04-2004 is stored as three columns or as one column
     • Semantic heterogeneity: fuzzy metadata, terminology, “hidden” semantics, implicit assumptions
     • GEON solution:
       • data should be semantically registered to GEON first
       • heterogeneities are resolved by registration
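The syntactic and structural examples on slide 3 can be illustrated with a minimal Python sketch. It is not part of GEON; the format list and the month-first reading of the dates are assumptions made only for illustration.

```python
from datetime import date, datetime

# Assumed formats: month-day-year with a 4-digit or 2-digit year.
KNOWN_FORMATS = ("%m-%d-%Y", "%m/%d/%y")


def normalize_date(text):
    """Syntactic case: try each known format and return one canonical date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def merge_columns(month, day, year):
    """Structural case: a date stored as three columns becomes one value."""
    return date(year, month, day)
```

Once both sources are mapped to the same canonical value, `normalize_date("02-04-2004")`, `normalize_date("02/04/04")` and `merge_columns(2, 4, 2004)` compare equal.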

  4. Levels of Registration
     • Metadata-level registration
       • Register metadata associated with a resource
       • Submit the required metadata; the semantics are predefined
     • “Item” level registration
       • Register the “schema” of a resource, e.g. relational database, shapefiles, …
       • Record semantics of schema elements, e.g. table name, column name
     • “Item-Detail” level registration
       • Register individual values in a dataset
       • Record semantics of each item in a record/column

  5. Registering Structured Data
     • Relational databases
     • Shapefiles → database tables
     • Excel spreadsheets → database tables
     • Delimited ASCII files → database tables
     • Headers of scientific data files, e.g. netCDF

  6. Item Level Database Registration and Access
     [Diagram: the data provider selects tables and views in the original database to register; the registered table and view definitions make up the published database, which an application reaches through the GEON JDBC Driver and the GEON Mediator.]

  7. How to Connect to GEON Databases
     • Download the GEON JDBC Driver
     • Use the following code to create a connection:

         import java.sql.Connection;
         import java.sql.DriverManager;

         // load the driver
         Class.forName("org.geongrid.jdbc.driver.Driver");
         // set the mediator URL
         String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f";
         // open the connection
         Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");

     • URL parts: "jdbc:geon" is the GEON JDBC protocol, "geon01.sdsc.edu:2532" is the host name and port number of the GEON Mediator, and the final path segment is the GEON ID
     • Note: the original account information is not accessible to end users
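To make the three labelled parts of the mediator URL concrete, here is a small Python sketch that splits such a URL into host, port and GEON ID. The function and type names are hypothetical and are not part of the GEON driver; only the URL layout is taken from the slide.

```python
from typing import NamedTuple


class GeonUrl(NamedTuple):
    host: str
    port: int
    geon_id: str


def parse_geon_url(url):
    """Split jdbc:geon://<host>:<port>/<GEON ID> into its three parts."""
    prefix = "jdbc:geon://"
    if not url.startswith(prefix):
        raise ValueError("not a GEON JDBC URL")
    authority, slash, geon_id = url[len(prefix):].partition("/")
    host, colon, port = authority.partition(":")
    if not (slash and colon and host and port.isdigit() and geon_id):
        raise ValueError("expected jdbc:geon://host:port/geon-id")
    return GeonUrl(host, int(port), geon_id)


parts = parse_geon_url(
    "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f"
)
```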

  8. GEON Mediator Enables Write Protection
     • Only accepts SELECT statements
     • Rejects any request other than SELECT
     [Diagram: an UPDATE request arrives at the mediator, which sits in front of a database with tables A, B and C.]

  9. Read Protection for Unregistered Tables and Views
     • An unregistered table or view is invisible to an end user
     • The data in the table can’t be viewed by a SELECT statement
     • The schema can’t be fetched
     [Diagram: SELECT * FROM A is sent to the mediator, which sits in front of a database with tables A, B and C.]
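The write and read protection described on slides 8 and 9 can be imitated with SQLite's authorizer hook. This is only a sketch of the idea, not the GEON Mediator's implementation; the table contents and the choice of B and C as the registered tables are assumptions.

```python
import sqlite3


def open_mediated(conn, registered):
    """Permit only SELECT statements that read registered tables."""
    def authorizer(action, arg1, arg2, db_name, source):
        if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
            return sqlite3.SQLITE_OK
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and arg1 in registered:
            return sqlite3.SQLITE_OK
        return sqlite3.SQLITE_DENY  # writes, DDL, unregistered tables, catalog
    conn.set_authorizer(authorizer)
    return conn


def run(conn, sql):
    """Return the rows, or 'rejected' when the statement is denied."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return "rejected"


backend = sqlite3.connect(":memory:")
backend.executescript(
    "CREATE TABLE A(x); CREATE TABLE B(x); CREATE TABLE C(x);"
    "INSERT INTO A VALUES (1); INSERT INTO B VALUES (2);"
)
mediated = open_mediated(backend, {"B", "C"})  # A stays unregistered
```

With this setup, `run(mediated, "SELECT x FROM B")` returns rows, while an UPDATE, a SELECT on table A, or a read of `sqlite_master` (the schema catalog) is rejected.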

  10. GEON Database Integration
     • GEON Mediator supports integration at three levels
     • Level 1: Federation-Based Integration
       • End users need to be knowledgeable about each database
     • Level 2: View-Based Integration
       • End users see “integrated views”. An intermediary designs these views.
     • Level 3: Ontology-Based Integration
       • End users can query using familiar concepts
       • Requires middleware and formal representation of domain knowledge

  11. Level 1: Federation-Based Integration
     • Use SQL to query the federated database
     • Users must resolve structural and semantic heterogeneity themselves
     [Diagram: the GEON Mediator federates one backend with tables A, B, C, D and another backend with tables E, F, G. Example query: SELECT * FROM A, E WHERE ……]

  12. Level 2: View-Based Integration
     • Allow defining views on top of the federated databases
     • Allow hiding the original backend schemas
     • Integration results can be shared and reused
     [Diagram: the GEON Mediator exposes views V and W over one backend with tables A, B, C, D and another backend with tables E, F, G. Example query: SELECT * FROM V, W WHERE ……]
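As a rough illustration of view-based integration, the following Python sketch uses two attached in-memory SQLite databases as stand-ins for two backends and defines a view V across them. The column names and rows are invented for the example; GEON's own view mechanism is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                       # first backend ("main")
conn.execute("ATTACH DATABASE ':memory:' AS backend2")   # second backend

conn.executescript("""
    CREATE TABLE main.A (sample_id INTEGER, rock TEXT);
    CREATE TABLE backend2.E (sid INTEGER, age_label TEXT);
    INSERT INTO main.A VALUES (1, 'granite'), (2, 'basalt');
    INSERT INTO backend2.E VALUES (1, 'Precambrian'), (2, 'Cenozoic');

    -- The integrated view hides both backend schemas from the end user.
    -- A TEMP view is used because only temp views may span attached databases.
    CREATE TEMP VIEW V AS
        SELECT a.sample_id AS id, a.rock AS rock, e.age_label AS age
        FROM main.A AS a JOIN backend2.E AS e ON a.sample_id = e.sid;
""")

rows = conn.execute("SELECT id, rock, age FROM V ORDER BY id").fetchall()
```

The end user queries only V; the join condition between A and E is written once by whoever designs the view and is then reused by every query.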

  13. Level 3: Ontology-Based Integration
     • Requires ontology annotations for backend databases
     • Use a simple ontology query language to query the integrated database
     • End users do not need to know the backend schemas and local semantics
     [Diagram: an ontology-based query is sent to the GEON Mediator, which sits over one backend with tables A, B, C, D and another backend with tables E, F, G.]

  14. GEON Ontology Based Data Integration
     [Diagram: Ontology1, Ontology2 and Ontology3 shown together with dataset1, dataset2, dataset3 and dataset4.]
     • Ontology-enabled semantic integration: challenges for computer scientists and domain scientists
       • Computer Scientists: build an integration system based on the ontological registration of datasets
       • Domain Scientists: create domain ontologies
       • Data Providers: register datasets to ontologies

  15. Ontological Data Registration for Data Integration
     • Registering a dataset to an ontology for data integration is a procedure that generates a partial model of the ontology from the dataset itself
     • The model is partial because the generated individuals do not satisfy all the constraints in the ontology
     [Diagram: registration maps a dataset to individuals of the ontology.]

  16. Registering Relational Tables to Ontology Classes
     • Associate one or more columns, under an optional SQL condition, with a selected class in the ontology
     • Provide a mapping method when the data holds no explicit names and names of individuals must be generated
     • Example: Location (23.5, 47.9) is the name of an individual of the class Location. The same name indicates the same location.
     [Example: class GeologicalAge with individuals Precambrian, Cenozoic, Paleozoic.]
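The naming rule on slide 16 (the same name indicates the same individual) can be sketched in a few lines of Python. The function name and the sample rows are hypothetical; only the name format "Location (23.5, 47.9)" comes from the slide.

```python
def individual_name(class_name, values):
    """Build an individual's name from the registered column values."""
    return f"{class_name} ({', '.join(str(v) for v in values)})"


# Hypothetical rows: (latitude, longitude, rock type).
rows = [
    (23.5, 47.9, "granite"),
    (23.5, 47.9, "basalt"),
    (10.0, 20.0, "shale"),
]

# Two rows share a latitude/longitude pair, so they yield one Location.
locations = {individual_name("Location", (lat, lon)) for lat, lon, _ in rows}
```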

  17. Registering Relational Tables to Ontology Object Properties
     • Associate two entities which are already registered to the domain class and the range class of a selected object property in the ontology
     [Example: object property hasAge with domain Rock and range GeologicAge.]

  18. ODAL and SOQL
     • ODAL (Ontological Database Annotation Language): register item/item-detail to an ontology
     • SOQL (Simple Ontology Query Language): user query

  19. ODAL (Ontological Database Annotation Language)

         <odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase">
           <odal:Class odal:resource="http://geon.vt.edu#RockSample" />
           <odal:Table>Samples</odal:Table>
           <odal:Table>RockTexture</odal:Table>
           <odal:Table>RockGeoChemistry</odal:Table>
           <odal:Table>ModalData</odal:Table>
           <odal:Table>MineralChemistry</odal:Table>
           <odal:Table>Images</odal:Table>
           <odal:Column>ssID</odal:Column>
         </odal:NamedIndividuals>

     The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample.
     • Create a partial model of ontologies from databases
     • Independent of end interface
     • Independent of specific database implementations
     • The ODAL mapping is itself a “first-class” object
     [Diagram: a GUI generates ODAL, which is passed to the ODAL processor.]

  20. ODAL: Import Ontologies
     The ontologies used for annotating a database can be imported as follows:

         <?xml version="1.0"?>
         <odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:owl="http://www.w3.org/2002/07/owl#"
                    xmlns:odal="http://www.sdsc.edu/odal#">
           <odal:Ontology>
             <odal:Imports rdf:resource="http://www.library.org/Book.owl"/>
             <odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/>
           </odal:Ontology>
           ……
         </odal:ODAL>

  21. ODAL: Database Connection Declaration
     The target database to be annotated is declared as follows:

         <?xml version="1.0"?>
         <odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:owl="http://www.w3.org/2002/07/owl#"
                    xmlns:odal="http://www.sdsc.edu/odal#">
           ……
           <odal:Database odal:id="PublicationDatabase">
             <odal:DatabaseProductName>Oracle</odal:DatabaseProductName>
             <odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion>
             <odal:Host>oracle.sdsc.edu</odal:Host>
             <odal:Port>3456</odal:Port>
             <odal:DatabaseName>Publications</odal:DatabaseName>
           </odal:Database>
           ……
         </odal:ODAL>

  22. ODAL: Simple Named Individuals

         <odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
           <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
           <odal:Schema>Collections</odal:Schema>
           <odal:Table>book-price</odal:Table>
           <odal:Column>ISBN</odal:Column>
         </odal:NamedIndividuals>

     Suppose the Book ontology contains a class Book and the schema Collections contains a table book-price with a column ISBN. The statement says that each value in the column ISBN represents a Book individual. odal:id gives a name to the declaration, and that name represents the set of individuals generated by the statement.

  23. ODAL: Named Individuals from Multiple Columns

         <odal:NamedIndividuals odal:id="LocationInTableRockSample">
           <odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/>
           <odal:Schema>California</odal:Schema>
           <odal:Table>Rock-Sample</odal:Table>
           <odal:Column>Latitude</odal:Column>
           <odal:Column>Longitude</odal:Column>
         </odal:NamedIndividuals>

     Suppose an ontology contains a class Location and a database contains a table Rock-Sample with two columns, Latitude and Longitude. The statement says that each latitude/longitude pair identifies a location.

  24. ODAL: Named Individuals with Conditions

         <odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee">
           <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
           <odal:Table>employee</odal:Table>
           <odal:Column>EmployeeId</odal:Column>
           <odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition>
         </odal:NamedIndividuals>

         <odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee">
           <odal:Class odal:resource="http://www.abc.com/Employee.owl#FemaleEmployee"/>
           <odal:Table>employee</odal:Table>
           <odal:Column>EmployeeId</odal:Column>
           <odal:Condition><![CDATA[ Gender='F' ]]></odal:Condition>
         </odal:NamedIndividuals>

     The content of an odal:Condition element must be a boolean expression that is valid in the WHERE clause of an SQL query.
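One way to read a NamedIndividuals declaration is as a recipe for an SQL query that enumerates the individuals. The Python sketch below parses a declaration like the one on slide 24 and builds such a query. How the actual ODAL processor works is not described in the slides, so the translation rule and the function names here are assumptions.

```python
import xml.etree.ElementTree as ET

ODAL_NS = {"odal": "http://www.sdsc.edu/odal#"}

ODAL_DOC = """
<odal:ODAL xmlns:odal="http://www.sdsc.edu/odal#">
  <odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee">
    <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
    <odal:Table>employee</odal:Table>
    <odal:Column>EmployeeId</odal:Column>
    <odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition>
  </odal:NamedIndividuals>
</odal:ODAL>
"""


def quote(identifier):
    """Double-quote an SQL identifier (needed for names such as book-price)."""
    return '"' + identifier.replace('"', '""') + '"'


def individuals_query(decl):
    """Assumed translation: registered columns become a SELECT DISTINCT."""
    schema = decl.findtext("odal:Schema", default="", namespaces=ODAL_NS).strip()
    table = decl.findtext("odal:Table", default="", namespaces=ODAL_NS).strip()
    columns = [c.text.strip() for c in decl.findall("odal:Column", ODAL_NS)]
    condition = decl.findtext("odal:Condition", default="", namespaces=ODAL_NS).strip()

    target = ".".join(quote(part) for part in (schema, table) if part)
    sql = f"SELECT DISTINCT {', '.join(quote(c) for c in columns)} FROM {target}"
    if condition:
        sql += f" WHERE {condition}"  # the condition is passed through verbatim
    return sql


root = ET.fromstring(ODAL_DOC)
decl = root.find("odal:NamedIndividuals", ODAL_NS)
sql = individuals_query(decl)
```

Because the condition is copied into the WHERE clause unchanged, it has to be a valid SQL boolean expression, which matches the requirement stated on the slide.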

  25. ODAL: Data Type Property Declaration
     [Diagram: table Person with columns SSN and age (sample row: 1234-56-7890, 8); the datatype property hasAge has range double.]

         <odal:NamedIndividuals odal:id="PersonInTablePerson">
           <odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/>
           <odal:Table>Person</odal:Table>
           <odal:Column>ssn</odal:Column>
         </odal:NamedIndividuals>

         <odal:OntologyProperty>
           <odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/>
           <odal:Table>person</odal:Table>
           <odal:Domain odal:resource="PersonInTablePerson" />
           <odal:Range odal:resource="age" />
         </odal:OntologyProperty>
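A datatype property declaration like the one on slide 25 can be thought of as producing (individual, property, value) assertions from table rows. The Python sketch below shows that reading with an in-memory SQLite table. The function name and the triple layout are assumptions, not the ODAL processor's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (ssn TEXT, age REAL)")
conn.execute("INSERT INTO Person VALUES ('1234-56-7890', 8)")


def property_assertions(conn, table, domain_column, range_column, prop):
    """Pair each individual (domain column) with its value (range column)."""
    sql = f'SELECT "{domain_column}", "{range_column}" FROM "{table}"'
    return [(subject, prop, value) for subject, value in conn.execute(sql)]


triples = property_assertions(conn, "Person", "ssn", "age", "hasAge")
```

Here the domain is the set of individuals named by the ssn column and the range is the age column, mirroring the odal:Domain and odal:Range elements in the declaration.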

  26. Conditions for Joining Individuals from Different Resources
     • To join data across independent resources, we need to know the correspondence between entities.
     • For example, does “10001” represent the same rock in the two resources? By default, we assume it does not.
     • A set of datatype properties can be declared as a key for a class in the ontology. Joins across multiple resources are based on keys.
     • E.g. {hasLatitude, hasLongitude} can be declared as a key of Location: two locations from different resources are the same if they have the same latitude and longitude.
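The key-based join rule on slide 26 can be sketched in Python as follows. The records and the function name are hypothetical; only the key {hasLatitude, hasLongitude} comes from the slide.

```python
def join_on_key(left, right, key):
    """Pair records from two resources whose key properties all agree."""
    index = {}
    for record in right:
        index.setdefault(tuple(record[k] for k in key), []).append(record)
    return [
        (l, r)
        for l in left
        for r in index.get(tuple(l[k] for k in key), [])
    ]


LOCATION_KEY = ("hasLatitude", "hasLongitude")

# Both resources use the local id "10001", but for different places.
resource1 = [
    {"id": "10001", "hasLatitude": 23.5, "hasLongitude": 47.9},
    {"id": "10002", "hasLatitude": 10.0, "hasLongitude": 20.0},
]
resource2 = [
    {"id": "10001", "hasLatitude": 10.0, "hasLongitude": 20.0},
]

matches = join_on_key(resource1, resource2, LOCATION_KEY)
```

Only the records that agree on latitude and longitude are joined, so the shared local id "10001" does not cause a match. A real system would likely also need a tolerance for floating-point coordinates, which this sketch ignores.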

  27. SOQL (Simple Ontology Query Language)
     [Diagram: RockSample has a property location with range Location (lat, long: float) and a property hasSiO2 with range ValueWithUnit (value: float, unit: string).]

         SELECT X.location.*
         FROM RockSample X
         WHERE X.location.lat > 60 AND X.location.long > 100
           AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercentage'

     • Query single or integrated resources
       • via ontologies (i.e., high-level logical views)
       • independent of schema-level representation
     [Diagram: a GUI generates SOQL, which is passed to the SOQL processor.]
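To give a feel for how ontology path expressions such as X.location.lat might be mapped onto schema columns, here is a deliberately small Python sketch. The mapping table and column names are invented, and the real SOQL processor works from ODAL registrations and the ontology rather than from a hard-coded dictionary.

```python
import re

# Hypothetical result of registration: ontology path -> backend column.
PATH_TO_COLUMN = {
    "location.lat": "latitude",
    "location.long": "longitude",
    "hasSiO2.value": "sio2_value",
    "hasSiO2.unit": "sio2_unit",
}


def rewrite_condition(condition, alias):
    """Replace each <alias>.<path> in a condition with <alias>.<column>."""
    pattern = re.compile(
        rf"\b{re.escape(alias)}\.([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)"
    )

    def replace(match):
        # An unregistered path raises KeyError in this sketch.
        return f"{alias}.{PATH_TO_COLUMN[match.group(1)]}"

    return pattern.sub(replace, condition)


rewritten = rewrite_condition(
    "X.location.lat > 60 AND X.hasSiO2.value < 30", "X"
)
```

The user writes conditions against ontology concepts only; the schema-level names appear solely in the registration mapping.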

  28. The Architecture of GEON Semantic Mediator
     [Architecture diagram. Components shown:
       • Clients: GUI, Portal or Application; Mediator JDBC Driver
       • Semantic layer: SOQL Processor (SOQL Parser, Semantic Query Rewriter, Ontology Reasoner) and ODAL Processor, with inputs SOQL, OWL and ODAL
       • Mediator: SQL Parser, Query Planning, Query Optimization, Query Execution, Internal Database; it receives spatial SQL against federated schemas
       • Backends: Oracle, DB2, MySQL, SQL Server, PostgreSQL, PostGIS]

  29. Question: Finding all seismic stations within 1 mile of railroads
     • SOQL query from the GEON SOQL GUI:

         SELECT X.code, X.location.*
         FROM SeismicStation X, Railroad Y
         WHERE distance(X.location, Y.geometry) < 1

     • After the SOQL Processor, the query sent to the Schema Mediator:

         SELECT X2.stationcode, X2.lat, X2.lon
         FROM railroads_of_the_united_states X1, stationdatatable X2
         WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1

     • The Schema Mediator splits the query by source:
       • Railroad shapefile: SELECT X1.the_geom FROM railroads X1
       • Seismic Stations: SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition
       • Remaining join condition: distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
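The split on slide 29 suggests a two-phase evaluation: a cheap bounding-box filter pushed to the station source, followed by the exact distance test. The Python sketch below imitates that pattern under strong simplifications: a railroad is reduced to a list of vertex points, distance is measured to the nearest vertex with the haversine formula, and the bounding-box margin in degrees is an assumed, generous value. All names and data are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bounding_box(points, margin_deg):
    """Box around the railroad vertices, widened by a margin in degrees."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats) - margin_deg, max(lats) + margin_deg,
            min(lons) - margin_deg, max(lons) + margin_deg)


def stations_near(stations, rail_points, max_miles, margin_deg):
    """Phase 1: bounding-box filter. Phase 2: exact distance test."""
    lat_min, lat_max, lon_min, lon_max = bounding_box(rail_points, margin_deg)
    candidates = [
        s for s in stations
        if lat_min <= s[1] <= lat_max and lon_min <= s[2] <= lon_max
    ]
    return [
        code for code, lat, lon in candidates
        if any(haversine_miles(lat, lon, rlat, rlon) < max_miles
               for rlat, rlon in rail_points)
    ]


rail = [(32.70, -117.10), (32.72, -117.10)]
stations = [
    ("S1", 32.705, -117.10),  # about a third of a mile from the line
    ("S2", 32.71, -117.00),   # outside the bounding box
    ("S3", 32.74, -117.10),   # inside the box but more than a mile away
]
near = stations_near(stations, rail, max_miles=1.0, margin_deg=0.05)
```

The bounding-box phase only has to be conservative (it must not drop any true match), which is why the margin is chosen larger than one mile expressed in degrees at this latitude.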
