
Finding Spatial Equivalences Across Multiple RDF Datasets

Finding Spatial Equivalences Across Multiple RDF Datasets. Juan Salas, Andreas Harth. Outline. Motivation NeoGeo Vocabularies Geospatial Datasets Integration Challenges Finding Geometric Equivalences Conclusion. Motivation. Geodata is becoming increasingly relevant.


Presentation Transcript


  1. Finding Spatial Equivalences Across Multiple RDF Datasets Juan Salas, Andreas Harth

  2. Outline Motivation NeoGeo Vocabularies Geospatial Datasets Integration Challenges Finding Geometric Equivalences Conclusion

  3. Motivation • Geodata is becoming increasingly relevant. • Location-based services • Mobile applications • Ever-increasing amount of sensor data (phones, satellites) • Different sources. • Many formats: • GML, KML, Shapefile, GPX, WKT, RDF?… Applications require integrated access to geodata.

  4. NeoGeo Vocabularies • Geometry Vocabulary – http://geovocab.org/geometry • Representation of georeferenced geometric shapes. • Spatial Ontology – http://geovocab.org/spatial • Representation and reasoning on topological relations based on the Region Connection Calculus (RCC).

  5. Geospatial Datasets • GADM-RDF – http://gadm.geovocab.org • RDF representation of the administrative regions of the GADM project: http://gadm.org • NUTS-RDF – http://nuts.geovocab.org • RDF representation of Eurostat's NUTS nomenclature. They serve as: • New geospatial information on the Semantic Web. • Bridges between already published spatial datasets. • Proof-of-concept platforms.

  6. Integration Challenges • Vocabularies – http://geovocab.org/doc/survey.html • Survey of several well-known Linked Data datasets (Ordnance Survey, GeoLinkedData.es, LinkedGeoData.org, GeoNames, DBpedia). • Identified properties and classes mapped to the NeoGeo vocabularies published at GeoVocab.org • Instances • Finding equivalences between regions across multiple datasets at the geometry level.

  7. Integration Challenges

  8. Finding Geometric Equivalences Geometric shapes will not be vertex by vertex equivalent. A sensible criterion for finding geometric equivalences is needed. • NUTS-RDF and GADM-RDF have different: • Sampling values • Scales • Starting points • Rounding effects

  9. Algorithm Overview [Diagram: sample data in WGS-84 with Plate Carrée projection, compared by Hausdorff distance, yielding spatial:EQ links]

  10. 1. Retrieve sample data • The algorithm requires: • WGS-84 coordinate reference system. • Plate Carrée projection: X = longitude, Y = latitude. • Coordinates are treated as Cartesian. • The projection distorts all parameters (area, shape, distance, direction). • Geometric shapes are equally distorted in both datasets. • Local reprojections (e.g. UTM) are avoided. • Units will be presented in centesimal degrees.

  11. 2. Similarity threshold function The Hausdorff Distance provides a measure of similarity between geometric shapes. It can be intuitively defined as the largest of the distances from any point on one shape to the closest point on the other shape.
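
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each shape is given as a list of (longitude, latitude) vertices treated as Cartesian coordinates, and it only compares vertices rather than every point on the boundaries, so it approximates the true Hausdorff distance. The function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a vertex of `a` to its nearest vertex of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two made-up shapes that differ in a single vertex.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bumped = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.5), (0.0, 1.0)]

print(hausdorff(square, bumped))
```

The two directed distances are computed separately because the measure is not symmetric in general: a small shape can lie close to part of a large one while the large shape has points far from the small one.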

  12. 2. Similarity threshold function Smaller regions need a lower Hausdorff Distance threshold than larger regions.

  13. 2. Similarity threshold function We calculate the midpoint value between the Hausdorff Distances for a correct guess and the lowest wrong guess.
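
As a small sketch of this step, the midpoint can be computed per sample region as follows. The function name and the sample distances are hypothetical; the only part taken from the slide is the idea of averaging the distance of the correct match and the smallest distance among the wrong candidates.

```python
def threshold_midpoint(correct_distance, wrong_distances):
    """Midpoint between the correct match and the closest wrong candidate."""
    lowest_wrong = min(wrong_distances)
    if correct_distance >= lowest_wrong:
        # No threshold can separate the correct match from the wrong ones.
        raise ValueError("correct match is not closer than every wrong guess")
    return (correct_distance + lowest_wrong) / 2.0

# Made-up Hausdorff distances for one sample region.
midpoint = threshold_midpoint(0.25, [1.25, 0.75, 2.0])
print(midpoint)
```

Any threshold strictly between the two values would separate this sample; the midpoint simply leaves equal margin on both sides.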

  14. 2. Similarity threshold function We perform regression on the midpoint values to obtain the Hausdorff Distance threshold function.
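
The slide does not state which regression model or which independent variable was used. The sketch below assumes, purely for illustration, an ordinary least-squares line that maps region area to the Hausdorff Distance threshold, which is consistent with smaller regions needing lower thresholds. The sample numbers and function names are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (area, midpoint) pairs from a set of sample regions.
areas = [1.0, 2.0, 3.0, 4.0]
midpoints = [0.3, 0.5, 0.7, 0.9]

slope, intercept = fit_line(areas, midpoints)

def max_hausdorff_dist(area):
    """Threshold predicted by the fitted line for a region of this area."""
    return slope * area + intercept

print(max_hausdorff_dist(2.5))
```

A different model (for example a fit on the square root of the area) would slot into the same place; only `fit_line` and `max_hausdorff_dist` would change.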

  15. 3. Finding spatial equivalences

  16. Poor Geospatial Information Sometimes a location is approximated as a single point. This can lead to false assertions when calculating containment relations. <http://dbpedia.org/resource/Germany> geo:lat 52.516666; geo:long 13.383333 . <http://nuts.geovocab.org/id/DE30_geometry> rdf:type ngeo:Polygon . Germany is not contained in Berlin. Other properties must be considered to calculate containment relations (e.g. rdf:type). Other spatial relations (e.g. spatial:EQ) cannot be calculated.
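
The pitfall above can be reproduced with a plain point-in-polygon test. In this sketch the Berlin outline is a rough, hypothetical rectangle rather than the real DE30 geometry, and the `ADMIN_RANK` table is an invented stand-in for whatever type information (e.g. rdf:type) a real dataset would provide.

```python
def point_in_polygon(point, ring):
    """Ray-casting test; `ring` is a list of (lon, lat) vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical ranking: a region may only contain regions of finer rank.
ADMIN_RANK = {"country": 0, "state": 1, "city": 2}

def guarded_containment(point, inner_kind, ring, outer_kind):
    """Assert containment only when the types make it plausible."""
    if ADMIN_RANK[inner_kind] <= ADMIN_RANK[outer_kind]:
        return False
    return point_in_polygon(point, ring)

germany_point = (13.383333, 52.516666)  # (geo:long, geo:lat) from DBpedia
berlin_box = [(13.09, 52.34), (13.76, 52.34), (13.76, 52.68), (13.09, 52.68)]

naive = point_in_polygon(germany_point, berlin_box)
guarded = guarded_containment(germany_point, "country", berlin_box, "state")
print(naive, guarded)
```

The naive test places Germany inside Berlin because the point happens to fall within the polygon; the guarded version refuses the assertion because a country cannot sit inside a finer-grained region.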

  17. Optimizations The cost of calculating the Hausdorff distance depends on the number of vertices. The Ramer-Douglas-Peucker algorithm simplifies geometric shapes by discarding vertices that lie within an arbitrary maximum separation of the simplified outline.
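
A compact recursive version of the Ramer-Douglas-Peucker algorithm is sketched below, assuming an open polyline of (x, y) tuples and a tolerance `epsilon` in the same units as the coordinates. It is an illustration of the technique, not the code used in the presentation.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Drop vertices closer than epsilon to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # Keep the farthest vertex and simplify both halves around it.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
simplified = rdp(line, 0.1)
print(simplified)
```

Here the vertex at (1.0, 0.05) is removed because it deviates by less than the tolerance, while the spike at (3.0, 2.0) is kept. Fewer vertices mean fewer pairwise distance computations in the Hausdorff step.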

  18. Optimizations

  19. Spatial Databases • The algorithm also works well with spatial databases (e.g. PostgreSQL / PostGIS): SELECT g.gadm_id, n.nuts_id FROM nuts n INNER JOIN gadm g ON (n.geometry && g.geometry) WHERE n.shape_area BETWEEN (g.shape_area * 0.9) AND (g.shape_area * 1.1) AND ST_HausdorffDistance( ST_SimplifyPreserveTopology(n.geometry, 0.5), ST_SimplifyPreserveTopology(g.geometry, 0.5) ) < g.max_hausdorff_dist;
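
For readers without a spatial database, the three filters in the query (bounding-box overlap, area within ±10%, Hausdorff distance below the per-region threshold) can be mirrored in plain Python. This is a rough sketch under simplifying assumptions: polygons are single rings of (x, y) vertices, the Hausdorff distance is computed on vertices only, and the simplification step is omitted. All function names and sample shapes are hypothetical.

```python
import math

def bbox(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_overlap(a, b):
    """Counterpart of the && operator: do the bounding boxes intersect?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def area(ring):
    """Shoelace formula for a simple polygon ring."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def hausdorff(a, b):
    """Discrete (vertex-only) Hausdorff distance."""
    def directed(s, t):
        return max(min(math.dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))

def is_match(nuts_ring, gadm_ring, max_hausdorff_dist):
    if not bboxes_overlap(bbox(nuts_ring), bbox(gadm_ring)):
        return False
    gadm_area = area(gadm_ring)
    if not (gadm_area * 0.9 <= area(nuts_ring) <= gadm_area * 1.1):
        return False
    return hausdorff(nuts_ring, gadm_ring) < max_hausdorff_dist

gadm = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
nuts_close = [(0.1, 0.0), (2.0, 0.1), (2.1, 2.0), (0.0, 2.1)]
nuts_far = [(10.0, 10.0), (12.0, 10.0), (12.0, 12.0), (10.0, 12.0)]
nuts_big = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

print(is_match(nuts_close, gadm, 0.5))
```

The cheap tests run first so that the comparatively expensive Hausdorff computation is only reached for pairs that already overlap and have similar areas, which is the same ordering idea as in the SQL query.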

  20. Evaluation GADM 2_13988 Leicestershire NUTS UKF2 Leicestershire, Rutland and Northamptonshire • Not every NUTS region matches a GADM region. • Many NUTS regions represent parts or aggregations of GADM administrative boundaries. • 1,671 NUTS regions => 965 matches & 13 false positives.

  21. Evaluation

  22. Conclusion • NeoGeo vocabularies: • Survey and mappings to other vocabularies. • NUTS-RDF and GADM-RDF datasets: • GADM-RDF links to DBpedia, UK Ordnance Survey and NUTS-RDF. • Linked Data Services for accessing/querying spatial indices (withinRegion, boundingBox). • Work on spatial similarity metrics: • Promising results

  23. Future Work • NeoGeo vocabularies. • Temporal context. • Datasets: • More Earth and space science data. • Add more instance mappings. • Spatial similarity: • Improve precision. • Develop tools to support the mapping process. • More experiments: • Querying of integrated data and reasoning.

  24. Acknowledgements European Commission's Seventh Framework Programme FP7/2007-2013 (PlanetData, Grant 257641)
