Linked Open GeoData Management in the Cloud

K. Kritikos, Y. Roussakis ICS-FORTH

D. Kotzinos ICS-FORTH & TEI of Serres

Cloud Computing

  • Better (faster, more reliable, etc.) infrastructure: IaaS
  • Development infrastructure: PaaS
  • Software infrastructure: SaaS

Cloud Computing

Data as a Service (DaaS)
  • Publication
  • Querying
  • Updating

Linked (Open) Data as a Service
  • Publishing Linked Data
    • URI construction
    • Conceptual Model
    • Storage as RDF files or SPARQL endpoints
  • Querying Linked Data
    • SPARQL
    • GeoSPARQL
  • Updating Linked Data
    • SPARUL
    • Synchronization with original sources
Problem Introduction (I)
  • InGeoCloudS FP7 Pilot B Project (www.ingeoclouds.eu)
  • Geophysical data from different sources and in different formats (Excel, XML, relational, or no structured format …)
  • Borehole and Groundwater Analysis
    • Boreholes located in Mygdonia/Thriasio in Greece and across the whole country in Denmark and France, together with their features (data static over time)
    • Chemical analyses of groundwater sampled from these boreholes (data updated over time)
  • Earthquake events and features
  • Landslides
Data granularity

Data refer to different levels of granularity, e.g. susceptibility maps refer to a country-wide area while earthquakes or boreholes are point-level data

Data might need to be aggregated, but such aggregation is based on the spatial dimension, i.e. points contained within a polygon

Aggregation has its own problems: phenomena outside the area of concern may affect it, so spatial aggregation alone might not be enough
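
To make the "points contained within a polygon" aggregation concrete, here is a minimal sketch in plain Python. It uses the standard ray-casting test rather than any geo-spatial library, and the study area and borehole coordinates are made-up illustration values, not project data.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' for every edge crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(points: List[Point], polygon: List[Point]) -> int:
    """Count the point-level records (e.g. boreholes) that fall inside an area."""
    return sum(1 for p in points if point_in_polygon(p, polygon))


# Hypothetical data: a square study area and three point features.
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
boreholes = [(1.0, 1.0), (3.5, 2.0), (5.0, 5.0)]
count_inside = aggregate_points(boreholes, area)
```

A purely spatial count like this ignores phenomena outside the polygon, which is exactly the limitation noted above.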

Landslides:
  • Which area is affected, and how much?
  • How does this change over time?
  • Is the earthquake effect cumulative, or does it fade over time?

Earthquakes:
  • How far back in time should we go?
  • What information should be kept/would be relevant?
  • How should we query the repository to get the relevant information?
Problem Introduction (II)
  • Data/Metadata Standards
    • The INSPIRE standard proposes a generic conceptual schema for scientific data + models for 34 spatial data themes
  • Dealing with geospatial data & maintaining schemas/ontologies becomes difficult
    • Challenge is to exploit semantic heterogeneity
  • Need to offer a seamless & transparent LOD-as-a-Service (LODaaS) way to manage LOD
    • Lack of tools for mapping, transforming & synchronizing geo-spatial LD
    • Generic LOD management independent of the way LOD are stored
Points of interest
  • GeoData get bigger and more important
    • Used in a variety of applications in different fields
  • Size & high demand impose considerable requirements on infrastructure storage & compute power
  • Need to be reused and linked with other data sets
    • Go beyond current Web paradigm of isolated data silos
  • Current work on geo-spatial open data management does not offer this
    • Cloud-based approaches:
      • Do not provide geo-spatial support
      • Some do not fully support SPARQL or offer SPARQL endpoints
    • Centralized approaches offer geo-spatial support but:
      • Do not enable automatic mapping between relational and RDF data
      • Show worse performance in general (with the exception of Strabon wrt geo-spatial query support)
Proposed Solution (I)
  • A specific set of LODaaS services for geo-spatial LOD publishing, integration & querying
  • The cloud offers scalability & elasticity of computation, 24/7 availability & multiple data storage and integration offerings
  • Our cloud-based service-oriented system:
    • Exhibits good LOD management performance
    • Exposes a LOD management service that abstracts away RDF Store peculiarities & provides a generic way for LOD access and management
Proposed Solution (II)
  • A particular solution is adopted for mapping geo-spatial data in different formats to RDF data
  • The RDF data conform to extensible conceptual models that accurately capture thematic areas and are integrated via the Geo-Scientific Spatial Observation Model (GSOM)
    • This allows posing queries across providers and thematic fields
  • Our solution is part of the system developed in the InGeoCloudS project, which exploits cloud capabilities & LD technology to integrate & store heterogeneous geo-spatial data sets from different thematic fields + host & execute applications that exploit these data sets
Architecture (I)
  • System is scalable and elastic by exploiting cloud facilities
  • An extensive application pool can be built on top that exploits the offered services to perform various value-added and highly demanding tasks:
    • LO GeoData visualization, discovery & composition of data-sets, LO GeoData analytics
    • System could be extended to host such applications & offer various (geo-spatial) LO GeoData processing services and pre-built applications
Architecture (II)
  • Distributor: distributes generic queries equally & collects back the results; sends non-generic queries to the instances holding the appropriate data; distributes data by assigning new data to the scaling layer that is least loaded wrt storage space; exploits the CPU monitoring & elasticity facilities of Amazon
  • Scaling Layer: comprises one or more LOD management components, data are replicated across these components to enhance reliability & enable layer-based load balancing
  • LOD Management Component: comprises LOD Management Service (LMS) instance & Virtuoso server for storage
  • LMS: provides methods for data providers to manage LOD & for other users to query & export the LOD stored
  • Virtuoso: underlying RDF triple store also allowing the mapping & synchronization between relational and RDF data
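
The Distributor's routing rules can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the project's implementation: each `Instance` stands in for a whole scaling layer, round-robin is assumed as the way to spread generic queries "equally", and all class and method names are invented for the example.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import List


@dataclass
class Instance:
    """Stand-in for one LOD management component (LMS instance + RDF store)."""
    name: str
    used_storage_mb: int = 0
    graphs: List[str] = field(default_factory=list)


class Distributor:
    def __init__(self, instances: List[Instance]) -> None:
        self.instances = instances
        self._round_robin = cycle(instances)

    def route_generic_query(self) -> Instance:
        # Generic queries are spread equally; round-robin is one simple way (assumption).
        return next(self._round_robin)

    def route_specific_query(self, graph: str) -> List[Instance]:
        # Non-generic queries go only to the instances that hold the requested data.
        return [i for i in self.instances if graph in i.graphs]

    def place_new_data(self, graph: str, size_mb: int) -> Instance:
        # New data are assigned to the instance with the least storage in use.
        target = min(self.instances, key=lambda i: i.used_storage_mb)
        target.graphs.append(graph)
        target.used_storage_mb += size_mb
        return target


# Hypothetical setup with two instances of different storage load.
a = Instance("a", used_storage_mb=500)
b = Instance("b", used_storage_mb=200)
distributor = Distributor([a, b])
placed = distributor.place_new_data("boreholes", 100)
generic_targets = [distributor.route_generic_query().name for _ in range(3)]
```

Replication across components, which the scaling layer uses for reliability, is left out to keep the sketch short.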

[Figure: General Query Evaluation Behavior. Axis labels: Response Time, Time passed. Annotation: 2nd instance involvement.]

LOD Integration & Publishing (I)
  • Extension of the high-level CIDOC-CRM conceptual model
  • New model is called Geo-Scientific Spatial Observation Model (GSOM) & expressed in RDF/S
  • It makes it possible to capture all information coming from different fields & countries + link data across different providers
  • INSPIRE was not exploited as it did not cover all requirements:
    • Capturing of scientific events
    • Complicated and cumbersome for information integration
    • In some cases, does not cover all appropriate information required by the data providers in particular thematic fields
  • GSOM-to-INSPIRE mapping specification to enable exporting INSPIRE-compliant data
LOD Integration & Publishing (II)
  • Two alternatives for publishing LOD:
  • Alternative 1: create and import RDF-based descriptions of data sets via a particular LMS method
    • Data update process must be controlled by performing SPARUL updates via a particular LMS method
    • It is the data provider's responsibility to keep relational & RDF data synchronized
      • Perfect synchronization may also not be required, as it may incur costs -> the second alternative becomes preferable
LOD Integration & Publishing (III)
  • Alternative 2: the data provider publishes the relational data of his/her data sets + provides a mapping file in R2RML to enable the synchronization of relational to RDF data (by executing an LMS method)
    • System takes care of this synchronization
    • Relational storage stays as it has been used for many years + additional RDF storage for the data, with automatic one-way synchronization between the two
    • Provider should have a good knowledge of GSOM & RDF
LOD Integration & Publishing (IV)
  • R2RML:
    • W3C recommendation since 2012
    • Can specify customized mappings between RDB & RDF data
    • An R2RML mapping specification is just an RDF graph in Turtle
    • No specific implementation is imposed
  • Virtuoso supports R2RML by processing the R2RML specification & creating the respective RDB2RDF triggers (used for creating/updating RDF data from relational ones)
    • Either an RDF view or a physical RDF graph can be created, with the second option leading to far better performance

[Figure: R2RML mapping from the Borehole Relational Model (RDB) to GSOM, covering publication and synchronization.
GSOM classes: E26.Physical_Feature, S15.Acquifer_Concept, S16.Borehole, S2.SampleTaking, S13.Sample, E41.Appelation, E42.Identifier, E54.Dimension. Further label: Intake.
GSOM properties: O4.sampled_from, O5.removed, P121F.overlaps_with, P43F.has_dimension, P1F.is_identified_by.
Relational columns: Borehole_Name, Sample_ID, Waterlevel.
URI identification: http://orgURL/SampleID/XYZ]
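
The effect of such a relational-to-RDF mapping can be imitated in a few lines of plain Python. This is only a sketch of the idea, not R2RML itself and not what Virtuoso does internally: the GSOM namespace is an assumed placeholder, the identifier is emitted as a plain literal rather than as an E42.Identifier node, and only the URI pattern, the `Sample_ID` column, and the class and property names come from the borehole example.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed namespace for GSOM terms; the real namespace is not given in the slides.
GSOM = "http://example.org/gsom#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def sample_uri(org_url: str, sample_id: str) -> str:
    """Build a subject URI following the pattern http://orgURL/SampleID/XYZ."""
    return f"{org_url.rstrip('/')}/SampleID/{sample_id}"


def rows_to_triples(rows: Iterable[Dict[str, object]], org_url: str) -> List[Triple]:
    """Turn relational rows into (subject, predicate, object) triples."""
    triples: List[Triple] = []
    for row in rows:
        sample_id = str(row["Sample_ID"])
        subject = sample_uri(org_url, sample_id)
        # Each row of the sample table becomes one S13.Sample resource.
        triples.append((subject, RDF_TYPE, GSOM + "S13.Sample"))
        # Simplification: the identifier is attached directly as a literal value.
        triples.append((subject, GSOM + "P1F.is_identified_by", sample_id))
    return triples


# Hypothetical relational row, using the placeholder host from the slide.
triples = rows_to_triples([{"Sample_ID": "XYZ"}], "http://orgURL")
```

In the one-way synchronization described earlier, a step like this would be re-run (or triggered) whenever the relational rows change.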

LOD Management Service (I)
  • REST-based service with API exposing all appropriate management functionality needed by geo-spatial LOD users
    • Abstracts away from peculiarities of RDF triple stores
    • Enables simple & intuitive use of a specific set of LOD management methods
    • Programmatic or form-based access to methods
    • Production of query results in different forms, such as WKT, GML, & KML
    • Importing/exporting capabilities in different formats (RDF/XML, N-Triples, Turtle)
LOD Management Service (II)
  • The provided methods are:
    • meta_query (SPARQL string, timeout (opt.), row limit (opt.)): user-requested format (e.g., JSON)
    • meta_update (SPARUL string, baseURI, timeout (opt.), row limit (opt.))
    • meta_addMappings(R2RML string, graphURI) -> initiates mapping procedure
    • meta_export(graphURI, subjURI, predURI, objURI, internal): user-requested format -> last param indicates if result will be inline in the response
    • meta_import(url, graphUri, format, blocking): ImportStatus -> RDF data are imported by downloading them via provided URL or inline in user-request + method can be blocking or non-blocking
    • import_status(importID): ImportStatus -> in case of a non-blocking import request, the user can inquire about the status of his/her import by passing the importID value returned by meta_import as input to this method
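
A client for these methods might look roughly like the sketch below. Only the method names (`meta_query`, `import_status`) and their listed arguments come from the slides; the base URL, the HTTP parameter names, and the status strings are assumptions made for illustration. No network call is made: the status lookup is injected as a function so the polling logic can be tried offline.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlencode

# Hypothetical base URL; the slides only say that each method has its own URL.
BASE_URL = "http://lms.example.org/lms"


def build_meta_query_url(sparql: str, timeout: Optional[int] = None,
                         row_limit: Optional[int] = None) -> str:
    """Compose a GET URL for meta_query; the parameter names are assumptions."""
    params = {"query": sparql}
    if timeout is not None:
        params["timeout"] = str(timeout)
    if row_limit is not None:
        params["rowLimit"] = str(row_limit)
    return f"{BASE_URL}/meta_query?{urlencode(params)}"


def wait_for_import(import_id: str, fetch_status: Callable[[str], str],
                    poll_seconds: float = 0.0, max_polls: int = 10) -> str:
    """Poll import_status until the import leaves the (assumed) RUNNING state."""
    status = "RUNNING"
    for _ in range(max_polls):
        status = fetch_status(import_id)
        if status != "RUNNING":
            break
        time.sleep(poll_seconds)
    return status


# Offline usage: a fake status source that finishes on the third poll.
url = build_meta_query_url("SELECT * WHERE { ?s ?p ?o }", row_limit=10)
fake_statuses = iter(["RUNNING", "RUNNING", "DONE"])
final_status = wait_for_import("42", lambda _import_id: next(fake_statuses))
```

In a real client, `fetch_status` would issue the HTTP request to the `import_status` URL and parse the returned ImportStatus.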
LOD Management Service (III)
  • Each method accessible via specific URL + produces meaningful exception messages (e.g., in case user input is wrong)
  • User-friendly HTML Documentation produced via Enunciate
  • Implementation exploited Sesame RDF Data Management API, Virtuoso’s JDBC Driver & Jersey
Open Issues (I)
  • Model:
    • Extend it to capture other thematic fields
    • Data published in our system could fulfill all requirements to be 5-star LOD if respective owners decide to do so
  • Data mapping:
    • Cloud-based Virtuoso version supports native Relational DB for RDB2RDF synchronization
      • Trade-off between LOD management completeness & cost
    • Mapping tools are needed to allow visual editing of R2RML without requiring data providers to have good knowledge of RDF
    • Research issue: support bi-directional RDB2RDF mappings
Open Issues (II)
  • Geo-spatial query support:
    • Virtuoso does not support GeoSPARQL
    • Virtuoso has limited geo-spatial query support only in commercial versions
      • 2D geometries + limited set of topological relation operators
    • Additional support is needed in terms of geometry dimensionality + feature aggregation operators
      • Could extend Virtuoso via frameworks, such as uSeekM, which provide adequate geo-spatial support along with the capability of evaluating GeoSPARQL queries
        • Such solutions require processing all stored RDF data to create geo-spatial indices, as well as deploying another DB -> do not fit well with automatic geo-spatial LOD management
        • Could resolve the problem by: (a) re-indexing at infrequent intervals, (b) creating specialized triggers that start re-indexing only when RDF data are updated
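
The two re-indexing options can be combined in one small scheduler, sketched below. The class and its method names are invented for this example and do not correspond to any Virtuoso or uSeekM API; the sketch only shows the decision logic, not the index rebuild itself.

```python
class GeoIndexScheduler:
    """Decides when an external geo-spatial index should be rebuilt."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval_seconds = interval_seconds
        self.last_reindex = 0.0
        self.dirty = False  # becomes True once RDF data have changed

    def on_rdf_update(self) -> None:
        # Option (b): an update trigger only marks the index as stale.
        self.dirty = True

    def should_reindex(self, now: float) -> bool:
        # Option (a): re-index at most once per interval,
        # and (b): only if something has actually changed.
        interval_elapsed = now - self.last_reindex >= self.interval_seconds
        return self.dirty and interval_elapsed

    def mark_reindexed(self, now: float) -> None:
        self.last_reindex = now
        self.dirty = False


# Hypothetical timeline, with times given in seconds.
scheduler = GeoIndexScheduler(interval_seconds=3600)
before_update = scheduler.should_reindex(now=4000)   # nothing changed yet
scheduler.on_rdf_update()
too_early = scheduler.should_reindex(now=1000)       # changed, but interval not over
due = scheduler.should_reindex(now=4000)             # changed and interval over
scheduler.mark_reindexed(now=4000)
after_reindex = scheduler.should_reindex(now=8000)   # clean again
```

Combining both conditions keeps re-indexing infrequent while skipping rebuilds when the RDF data did not change.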
Open Issues (III)
  • Quality & provenance:
    • Original input data sets may not have the appropriate quality -> resulting RDF data can have the same or lower quality level
    • Proposed infrastructure must be extended with quality resolving procedures & methods (e.g., data cleansing methods for correcting the data exploited)
    • Provenance information can ensure the correct updating of LD + assist in LD reasoning process by deriving additional facts
    • Thus, provenance information should be exploited by our system, especially if we consider that such exploitation is not enabled by most LOD management systems
Conclusions
  • Proposed a scalable, geo-spatial LOD-as-a-Service management system deployed on the Amazon cloud
    • Distributes query load + scales up/down when CPU utilization crosses specific thresholds
    • Exposes REST-based service with LOD management methods
    • Provides two different ways for publishing open geo-spatial data sets
  • Advance the level of geo-spatial support in two directions:
    • Realize GSOM-to-INSPIRE mapping to enable producing INSPIRE-compliant data
    • Extend Virtuoso with geo-spatial indexing & query systems to enable the efficient processing of rich & expressive geo-spatial queries, expressed either in SPARQL or GeoSPARQL
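
The threshold-based scaling mentioned in the conclusions can be illustrated with a single decision function. The threshold values and instance limits below are hypothetical defaults chosen for the example; the slides do not state the actual values, and the real system relies on Amazon's monitoring and elasticity facilities rather than on code like this.

```python
def scaling_decision(cpu_percent: float, instances: int,
                     scale_up_at: float = 80.0, scale_down_at: float = 20.0,
                     min_instances: int = 1, max_instances: int = 4) -> int:
    """Return the desired instance count for one CPU monitoring sample."""
    # Scale up when CPU utilization exceeds the upper threshold.
    if cpu_percent > scale_up_at and instances < max_instances:
        return instances + 1
    # Scale down when CPU utilization drops below the lower threshold.
    if cpu_percent < scale_down_at and instances > min_instances:
        return instances - 1
    # Otherwise keep the current number of instances.
    return instances


# Hypothetical monitoring samples.
after_peak = scaling_decision(cpu_percent=90.0, instances=1)
after_idle = scaling_decision(cpu_percent=10.0, instances=2)
steady = scaling_decision(cpu_percent=50.0, instances=2)
```

Keeping a gap between the two thresholds avoids adding and removing an instance in quick succession.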