GeoSpatial “Unstructured Data” Dan Rickman GeoSpatial SG
Agenda • What is geospatial data • What does “structured” geospatial data look like? • General data modelling issues regarding geospatial data • In search of the BLPU • A brief history of OS maps – how structured are they (then and now) • Raster map data • EDRM • Geo-parsers/gazetteers/metadata • Web-based systems • Future directions?
What is Geospatial Information? - 1 • Spatial data which relates to the surface of the Earth • Geodetic reference system as base e.g. WGS84 used for Global Positioning System (Earth as an ellipsoid), Latitude and Longitude (Earth as a sphere) • Ordnance Survey (GB) define National Grid – projection onto flat surface – NB: OS(NI) use Irish grid • Spatial relationships – defined around concept of neighbourhood – relates to two “laws” of geography: • Most things influence most other things in some way • Nearby things are usually more similar than things which are far apart
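The "Earth as a sphere" simplification mentioned above can be illustrated with a great-circle distance calculation. This is a minimal sketch, not part of the original slides: the function name `haversine_km` and the mean-radius constant are assumptions, and an ellipsoidal model such as WGS84 would give slightly different results.

```python
import math

# Mean Earth radius in km; an assumption of the spherical model, not a WGS84 parameter.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

On this model, two points on the equator that are 180 degrees of longitude apart are separated by half the sphere's circumference.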
What is Geospatial Information? - 2 • Unstructured – spaghetti data • Topology – information structured as networks, polygons • GeoSpatial information requires metadata – e.g. minimal information such as map projection used • GeoSpatial information may also require temporal modelling – e.g. farm subsidies vary as utilisation and legislation change • Field-based model versus object-based model of space, e.g. rainfall versus buildings on which rain falls • GeoSpatial information requires ontology • What is the “real world”, how classified • Relates to semantics
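The difference between spaghetti data and topology can be sketched by deriving a node/edge network from unconnected polylines. This is an illustrative sketch only: `build_topology` is a hypothetical helper, and it assumes line ends that should connect share exactly the same coordinates (real data usually needs snapping within a tolerance).

```python
from collections import defaultdict


def build_topology(lines):
    """Derive a node/edge network from 'spaghetti' polylines.

    lines: list of polylines, each a list of (x, y) tuples.
    Returns (nodes, edges, adjacency) where nodes maps coordinate -> node id,
    edges is a set of (node_a, node_b) pairs, and adjacency maps node id ->
    set of neighbouring node ids.
    """
    nodes = {}
    edges = set()
    adjacency = defaultdict(set)
    for line in lines:
        for start, end in zip(line, line[1:]):
            a = nodes.setdefault(start, len(nodes))
            b = nodes.setdefault(end, len(nodes))
            if a == b:
                continue  # skip zero-length segments
            edges.add((min(a, b), max(a, b)))
            adjacency[a].add(b)
            adjacency[b].add(a)
    return nodes, edges, adjacency


# Two separate lines that happen to meet at (1, 0).
nodes, edges, adjacency = build_topology([[(0, 0), (1, 0)], [(1, 0), (1, 1)]])
```

Once the shared endpoint is recognised as a single node, questions such as "what connects to what" can be answered from the adjacency structure rather than by comparing raw coordinates.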
What are GeoSpatial Systems? • Known as Geographic Information Systems, Spatial Information Systems • Enables capture, modelling, storage, retrieval, sharing, manipulation and analysis of geographically referenced data • Database is at the heart – as is “attribute” data • Model developing – perhaps GeoSpatial data better seen as “attribute” of alphanumeric business information • Presentation does not have to be map-based in all cases • Key element is spatial indexing – uses different techniques to alphanumeric indexing
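To show why spatial indexing differs from alphanumeric indexing, here is a minimal sketch of a uniform grid index, one of the simplest spatial indexing techniques. The class name `GridIndex`, the cell size and the sample points are all assumptions for illustration; production systems more commonly use structures such as R-trees or quadtrees.

```python
from collections import defaultdict


class GridIndex:
    """Bucket points into square cells so a window query only visits nearby cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, key, x, y):
        self.cells[self._cell(x, y)].append((key, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return keys of all points inside the rectangle (bounds inclusive)."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for key, x, y in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return hits


index = GridIndex(cell_size=100)
index.insert("a", 10, 10)
index.insert("b", 150, 150)
index.insert("c", 900, 900)
nearby = sorted(index.query(0, 0, 200, 200))
```

A sorted alphanumeric index orders keys along one dimension; the grid instead groups records by two-dimensional proximity, which is what a window or neighbourhood query needs.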
Where used? Examples • Central government – DEFRA, ODPM, Land Registry, ONS • Local government – planning, highways authorities • Utilities – physical and logical network • Insurance – flood plains • Health – epidemiology • Travel, multi-modal route planning • More widespread use – addresses, postcode based data against regional boundaries, infrastructure (“geographies” used to divide country, catchment area) • Fiat boundaries versus “bona fide” boundaries – what is the “real world” and how do we structure it?
Structured geo-database: Paradigm shift? [Figure: Relational Database (Attribute data), Spatial Data (proprietary format), Real Time/Engineering Systems, CRM, ERP] • Spatially extended RDBMS • Complex data types for spatial data • Computational geometry • Spatial indexing • DDL and DML extensions
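As one example of the computational geometry a spatially extended database has to provide, the sketch below tests whether a point lies inside a polygon using the ray-casting (even-odd) rule. The function name is an assumption, and points lying exactly on a boundary are not treated specially; it is not the implementation used by any particular RDBMS.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses.

    polygon: list of (x, y) vertices in order, without repeating the first vertex.
    An odd number of crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

A spatial SQL extension would typically expose this kind of test as a predicate that can be used in a query, so that the geometry is evaluated next to the attribute data.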
Geospatial data modelling • Field-based model versus object-based model • Geographic Information Systems are object-based in practice • Most common field-based information, e.g. Digital Elevation Model (line of sight applications), attached to objects • Objects rely on field-based model, i.e. spatial co-ordinates • Initiatives such as Digital National Framework encourage organisations to structure data on references to objects, not re-capture and duplicate data • GeoSpatial equivalent of “referential integrity” • Nevertheless duplication and lack of (referential) integrity are commonplace and hard to eradicate
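The idea of referencing objects rather than duplicating their geometry can be sketched as a simple integrity check. Everything here is hypothetical: the identifiers, the record layout and the function `find_dangling_references` are made up for illustration and do not reflect any real data set.

```python
def find_dangling_references(records, features):
    """Return ids of business records whose referenced feature does not exist.

    records: dict mapping record id -> referenced feature id.
    features: dict mapping feature id -> geometry (held once, in the reference data set).
    """
    return sorted(rec_id for rec_id, feature_id in records.items()
                  if feature_id not in features)


# Made-up reference features, each stored once with its geometry.
features = {
    "feature-001": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "feature-002": [(20, 0), (30, 0), (30, 10), (20, 10)],
}

# Made-up business records that point at features instead of copying coordinates.
records = {
    "planning-17": "feature-001",
    "planning-18": "feature-002",
    "planning-19": "feature-999",  # refers to a feature that is not in the reference set
}

dangling = find_dangling_references(records, features)
```

A record that points at a missing feature is the geospatial counterpart of a broken foreign key in a relational database.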
In search of the BLPU • Basic Land and Property Unit • “Holy grail” of industry – no Da Vinci code produced yet! • Example of Ordnance Survey Master Map (OSMM): • "St Mary's football stadium, Southampton" is one object • Typical detached house and its plot of land, likewise • Complex entities such as "Southampton railway station" are defined in terms of multiple objects: one for the main building, several for the platforms, one more for the pedestrian bridge over the tracks. (NB: See Wikipedia article on TOID) • Defining the candidate BLPU, their lifecycles and their attribute data and verifying that these are meaningful/practicable from the wide variety of business processes which apply to the BLPU and the aggregate entities which are created from them • Dependencies so that data sets are based on the BLPU wherever possible, limited by business use, e.g. field use change is quite different from a tenant/owner perspective
Evolution of geographic information [Figure: timeline from 1950 through 1970 and 1990 to 2010, showing paper mapping, paper records, digital mapping, digital records, database records and geographic information]
Raster map data • Scanned ortho-rectified map or map-based data – metadata is co-ordinates, projection, extent • For example Google Maps/Google Earth, Microsoft Virtual Earth • Traditionally stored outside the database as external files, analogous to vector data storage; can now be held in the database, e.g. Oracle 10g GeoRaster • Data stored as BLOBs, metadata required regarding number of bytes per pixel, compression algorithms and so on • Benefits limited as “intelligence” in map requires interpretation • Still limited progress on map-based pattern recognition – there are semi-automated solutions from companies such as Laser-Scan
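The role of raster metadata (extent, dimensions, bytes per pixel) can be sketched as follows. The `RasterMetadata` class and its field names are assumptions for illustration, not the schema of Oracle GeoRaster or any other product; the sketch assumes a north-up image with row 0 at the top and no compression.

```python
from dataclasses import dataclass


@dataclass
class RasterMetadata:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    width: int            # number of pixel columns
    height: int           # number of pixel rows
    bytes_per_pixel: int

    def uncompressed_size(self):
        """Size in bytes of the pixel data before any compression."""
        return self.width * self.height * self.bytes_per_pixel

    def world_to_pixel(self, x, y):
        """Map a map co-ordinate to a (column, row) pixel position."""
        if not (self.xmin <= x < self.xmax and self.ymin < y <= self.ymax):
            raise ValueError("co-ordinate lies outside the raster extent")
        col = int((x - self.xmin) / (self.xmax - self.xmin) * self.width)
        row = int((self.ymax - y) / (self.ymax - self.ymin) * self.height)
        return col, row


meta = RasterMetadata(0, 0, 1000, 1000, width=100, height=100, bytes_per_pixel=3)
```

Without the extent and dimensions, the BLOB is just an image; with them, each pixel can be tied back to a position on the ground.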
EDRM • Electronic document and records management • Increased usage in local/central government due to Freedom of Information Act • Contain potentially significant geospatial data • Most common example is address • Requires capture of appropriate metadata or appropriate pattern recognition to identify addresses • Requires gazetteers to provide reference to spatial co-ordinates • NB: most familiar gazetteer – list of streets in A-Z maps
Geo-parsers/gazetteers/metadata • Geo-parsers: identify spatial tags (geo-tags) in data • Context sensitivity and patterns of usage required • E.g. Jordan (country) != Jordan (Katie Price) • Can see an example at: • http://edina.ac.uk/projects/geoxwalk/geoparser.html • Relies on and populates gazetteer of associated names • Emerging standards for geo-parsing, e.g. Open GIS Consortium looking at: • Gazetteer service • Geo-coder service • Web services (WMS/WFS)
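A toy geo-parser makes the context-sensitivity point concrete. This is a deliberately naive sketch under stated assumptions: the two-entry gazetteer, its approximate co-ordinates, the list of "person" cue words and the function `geoparse` are all invented for illustration and bear no relation to how the geoXwalk geoparser works.

```python
import re

# Tiny illustrative gazetteer: name -> approximate (latitude, longitude).
GAZETTEER = {
    "jordan": (31.0, 36.0),
    "southampton": (50.9, -1.4),
}

# Hypothetical context cues suggesting a name refers to a person, not a place.
PERSON_CUES = {"katie", "price", "model", "celebrity"}


def geoparse(text, window=3):
    """Return (name, co-ordinates) for gazetteer names not surrounded by person cues."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        key = token.lower()
        if key not in GAZETTEER:
            continue
        context = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
        if context & PERSON_CUES:
            continue  # context suggests a person, so do not geo-tag
        hits.append((token, GAZETTEER[key]))
    return hits


places = geoparse("Flights to Jordan depart from Southampton")
people = geoparse("Jordan, the model Katie Price, was photographed")
```

Real geo-parsers rely on far richer context and usage patterns, but the principle is the same: a gazetteer match alone is not enough to assign a location.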
Web-based systems: Google Earth meets Flickr • Web-based systems (MetaCarta, KML, mashup)
Web-based systems • World wide wild west of unstructured data • Increasing use of systems to control, coordinate and make this accessible • Geo-enabled semantic web – raises issues of ontology • www.metacarta.com – provides web-based Geographic Text Search (GTS), which confines keyword searches to a geographic area and displays the information it retrieves on a map interface (now working with Google Earth).
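The general idea of a geographically confined text search can be sketched as a keyword filter combined with a bounding-box filter over geo-tagged documents. This is not MetaCarta's method or API: the function `geo_text_search`, the document layout and the sample documents are all hypothetical.

```python
def geo_text_search(docs, keyword, bbox):
    """Return ids of documents that mention keyword and fall inside bbox.

    docs: list of dicts with "id", "text", "lat" and "lon" keys.
    bbox: (lon_min, lat_min, lon_max, lat_max), bounds inclusive.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    needle = keyword.lower()
    return [
        doc["id"]
        for doc in docs
        if needle in doc["text"].lower()
        and lon_min <= doc["lon"] <= lon_max
        and lat_min <= doc["lat"] <= lat_max
    ]


# Made-up geo-tagged documents with approximate co-ordinates.
docs = [
    {"id": "d1", "text": "Flood warning issued for the city centre", "lat": 50.9, "lon": -1.4},
    {"id": "d2", "text": "Flood defences reviewed", "lat": 31.0, "lon": 36.0},
    {"id": "d3", "text": "New stadium opens", "lat": 50.9, "lon": -1.4},
]

# Rough bounding box around Great Britain, for illustration only.
results = geo_text_search(docs, "flood", (-8.0, 49.0, 2.0, 61.0))
```

Each result carries co-ordinates, so it can be plotted on a map interface as well as shown in a text results list.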
They know where you live • MetaCarta(R), Inc., a leading provider of geographic intelligence, announced today that it had won a one-year contract with … the Department of Homeland Security [which] identifies and assesses current and future threats to the homeland, maps those threats against the nation's vulnerabilities, issues timely warnings and takes preventative and protective action… The product automatically identifies geographic references using advanced natural language processing (NLP) from any type of unstructured content in a customer's archives such as email, web pages, newswires or cables. It assigns a latitude and longitude to these references so that users can analyze their text archives using geographic maps, keywords and time as filters. The results of a query are displayed on a map with icons representing the locations found in the natural language text of the documents and as a text results list. Both the icons and text summaries are hyperlinked to the documents they represent. • (Source: http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=109&STORY=/www/story/03-14-2005/0003193909&EDATE=)
The future (and summary) • Structured environment – will contain more “unstructured” data • Web will continue to provide unstructured distributed data • Success of semantic-based approach yet to be determined, experience with geospatial data indicates there are significant complexities based around our representations of the “real world” • One issue is clear – increasingly less privacy, location is already accessible through mobile phones and linking this to other data can provide significant intelligence information • Also clear – data quality issues will persist • They will still get it wrong!