Spatial Databases: Lecture 2

Spatial Databases: Lecture 2 DT249, DT211,DT228 Semester 2 2012-13 Pat Browne

Topics • Spatial Joins • Spatial databases • OGC • Themes • Indexing • Spatial Database Architecture • PostGIS

Spatial Join A spatial join associates two tables based on a spatial relationship, rather than an the classic non-spatial relational attribute. A spatial join operation is used to combine two or more dataset with respect to a spatial predicate or spatial operation. Predicates can be a combination of directional, distance, and topological spatial relations (e.g. overlap, contains). In case of non-spatial join, the joining attributes must of the same type, but for spatial join they can be of different types.

Spatial Join Example1 Query: For all the rivers listed in the River table, find the counties through which they pass. SELECT r.name, c.name FROM river AS r, country AS c WHERE crosses(r.the_geom,c.the_geom)=True The spatial predicate “Cross” is used to join River and Country tables To view this we would add asbinary(R.the_geom,C.the_geom). Note this will not work with our Irish data set.

Spatial Joins • In practice, spatial join operations are divided into a filter step and a refinement step to efficiently process complex spatial data types such as point collections in a row instance. In the filter step, the spatial objects are represented by simpler approximations such as their Minimum Bounding Rectangle or Box (MBR or MBB).

Spatial Joins Example 2 • A spatial join associate two tables based on a spatial relationship, rather than an attribute relationship. For example the query: Summarize the election results by municipality. • Could be answered using the following SQL:

SELECT m.name, sum(v.fiannfail) AS FF, sum(v.gp) AS Green, sum(v. finegael) AS FG, sum(v.vtotal) AS total FROM voting_areas v, municipality m, WHERE intersects(v.the_geom, m.the_geom) GROUP BY m.name ORDER BY m.name; http://www.oireachtas.ie/documents/publications/dublin2.pdf http://www.oireachtas.ie/viewdoc.asp?DocID=10838

Dynamic and Static Data • Static non-spatial data is usually maintained in the table with the geometry (e.g. county name or unique identified). In some cases this data along with the geometry is considered immutable. • Dynamic non-spatial data is often maintained in a separate table e.g. land ownership. • There can be more than one dynamic table for a geometry table. • Dynamic spatial can include moving objects or a changing world (temporality requires different treatment)

Traditional Joins • Pure data tables can be joined with the geometry tables for querying purposes • A primary key is used to relate the 2 tables together • A primary key is a unique identifier for each row in a table Primary Key

Spatial Join1 A typical example of spatial join is “Find all pair of rivers and cities that intersect”. The result of join between the set of rivers {R1, R2} and cities {C1, C2, C3, C4, C5} is { (R1, C1), (R2, C5)}.

Guting’s1 definition of a spatial database • (1) A spatial database system is a database system • (2) It offers spatial data types in its data model and query language • (3) It supports spatial data types in its implementation, providing at least spatial indexing and efficient algorithms for spatial join2.

Why use a database for GIS? • GIS are not database systems, although they can be connected to a DBMS. • A GIS cannot efficiently manage large quantities of non-spatial data (e.g. at government department level). • They lack ad hoc querying capability (they provide a restricted form of predefined queries) • They lack indexing structures for fast external data access (they use in memory techniques). • They lack a 'logic' (e.g. first order logic of the relational calculus)

Why use a database for GIS? • Databases offer the following functions: • Data independence • Data Abstraction • Self-describing • Concurrency • Distributed capabilities • High performance • Supports spatial data types using ADTs. • Alternative: files • Reliability • Integrity: enforces consistency • Security • User views • User interface • Querying • Updating • DB theory has a Mathematical basis

Programming Languages • The basic components of current languages are: • Data types e.g. Integers, String, Polygon. • Variables to refer to data types e.g. a = 2 • Operations on those data types e.g. area(polygon) • Control structures e.g. sequence, iteration, and conditions. • Logic is an important part of programming, but it is often implicit and external to the language. Some languages like SQL are quite close to logic.

Data types • A data type represents a constraint placed upon the interpretation of data in a type system, describing representation, interpretation, legal operations and structure of values. • Data types are a way to limit the kind of data that can be used by a particular program or stored in a database table. Types restrict the data to a certain set of values (e.g. 1,2,3,..for Integers). • Data types also are restricted to certain operations on the type (e.g. addition for Integers). SQL comes with a range of standard data types that can be used to represent strings, integers, etc, PostgreSQL can be extended to have additional types e.g. spatial data types.

SQL supports data abstraction Data Abstraction- allows users to ignore unimportant details View Level– a way of presenting data to a particular group of users Logical Level– how data is interpreted when writing queries Physical Level– how data is manipulated at storage level by a computer. Most users are not interested in the physical level.

Databases use high level declarative languages (SQL) • Data Definition Language (DDL) • Create, alter and delete data • CREATE TABLE, CREATE INDEX • Data Manipulation Language (DML) • Retrieve and manipulate data • SELECT, UPDATE, DELETE, INSERT • Data Control Languages (DCL) • Control security of data • GRANT, CREATE USER, DROP USER

Spatial Types – OGC Simple Features for SQL

Spatial Types – OGC Simple Features for SQL Composed Sub Type SpatialReferenceSystem Geometry Relationship GeometryCollection Point Curve Surface Polygon MultiSurface MultiCurve MultiPoint LineString Line LinearRing MultiPolygon MultiLineString

Spatial Types – OGC Simple Features for SQL (*)

Operations OGC Simple Feature Types

OGC Simple Features for SQL1(*) • The OGC SF (similar to ISO 19125-1) describes 2-D geometry with linear interpolation between vertices. The simple feature model consists of a root class Geometry and its subclasses Point, Curve, Surface, GeometryCollection. The class Geometry collection has the subclasses Multipoint, Multicurve, MultiSurface.

OGC Simple Features for SQL1 (*) • The OGC does not include complexes, a third dimension, non-linear curves, `networking or topology (i.e. connectivity information). • Because of it relative simplicity and its support in both the commercial & open source community SFSQL is widely used in DBMS and is supported in many Web applications. • It is expected that newer more sophisticated standards such as ISO-19107 will gradually replace OGC SF.

OGC Simple Features for SQL (*) • Brief description • A simple feature is defined to have both spatial and non-spatial attributes. Spatial attributes are geometry valued, and simple features are based on 2D geometry with linear interpolation between vertices. Each feature is stored as a row in a database table. This course covers the OGC: GEOMETRY type with subtypes such as POINT, LINE, POLYLINE, POLYGON, and collections of these.

OGC Simple Features for SQL (*) • Functionality can be described under the following headings. • Basic Methods on Geometry • Methods for testing Spatial Relations between geometric objects • Methods that support Spatial Analysis • Geometry Collection

OGC Simple Features for SQL (*) • Basic Methods on Geometry • Describes the dimensions and reference system (SRID) of the geometry. • Operations include Dimension, GeometryType, , conversions AsText, AsBinary, tests on geometry include IsEmpty, IsSimple. Operations that return geometry Boundary, Envelope returns bounding box • Methods for testing Spatial Relations between geometric objects • These polymorphic methods check relations on the generic or super class GEOMETRY and usually return a Boolean. Main methods Equals, Disjoint, Intersects, Touches, Crosses, Within, Contains, Overlaps, Relate( testing for intersections between the Interior, Boundary and Exterior of the two geometries) • Methods that support Spatial Analysis • A set of geometric and ‘metric’ methods. Methods calculate distances and areas with respect to the spatial reference system of this Geometry. Methods include Distance, Buffer, ConvexHull, Intersection, Union, Difference, SymDifference. • Geometry Collection • A GeometryCollection is a geometry that is a collection of 1 or more geometries. All the elements in a GeometryCollection must be in the same Spatial Reference. Subclasses of GeometryCollection may restrict membership based on dimension and may also place other constraints on the degree of spatial overlap between elements. Methods • NumGeometries( ):Integer—Returns the number of geometries in this GeometryCollection. • GeometryN(N:integer):Geometry—Returns the Nth

OGC Spatial Relations • Equals – same geometries • Disjoint – geometries share common point • Intersects – geometries intersect • Touches – geometries intersect at common boundary • Crosses – geometries overlap • Within– geometry within • Contains – geometry completely contains • Overlaps – geometries of same dimension overlap • Relate – intersection between interior, boundary or exterior

Semantics of geometry Here are some of Euclid’s definitions from about 300 BC A point is that which has no part. A line is breadthless length. The ends of a line are points. A straight line is a line which lies evenly with the points on itself. A surface is that which has length and breadth only. The edges of a surface are lines. A plane surface is a surface which lies evenly with the straight lines on itself. A boundary is that which is an extremity of anything A figure is that which is contained by any boundary or boundaries. Are these definitions consistent with your understanding? Is a line really composed of points? How does a computer draw a line?

Semantics of geometry Most computer displays use raster data structures to store pixel information. http://en.wikipedia.org/wiki/Raster_graphics

Drawing a line in JavaScript1 Note: Code use variable b for y-intercept. The equation of a line is y = mx + c. dx, dyare changes in x,y m is the slope. c is y-intercept.

Drawing a line in JavaScript1 The code on the previous slide is fine when the slope is less than or equal to 1 (dx >= dy). If the slope greater than 1, the line will have gaps (dx < dy). If dx = 0 then only a single point is plotted. Most computer displays are pixel based.

Calculating Distance in Java import java.util.Scanner; public class DistanceApp { public static void main(String[] args){ int x1, y1, x2, y2; double distance; Scanner scan = new Scanner (System.in); System.out.print("Enter the x coordinate for point 1: "); x1 = scan.nextInt(); System.out.println("The x coordinate for point 1: "+ x1); System.out.print("Enter the y coordinate for point 1: "); y1 = scan.nextInt(); System.out.println("The y coordinate for point 1: "+ y1); System.out.print("Enter the x coordinate for point 2: "); x2 = scan.nextInt(); System.out.println("The x coordinate for point 2: "+ y1); System.out.print("Enter the y coordinate for point 2: "); y2 = scan.nextInt(); System.out.println("The y coordinate for point 2: "+ y2); distance = Math.sqrt( Math.pow((x2 - x1),2) + Math.pow((y2 - y1),2)); System.out.println("The distance between the two points is " + distance); }} Change to floating point input using nextDouble and change coord to double.

Contains Relation Does the base geometry (small circles) contain the comparison geometry (big circles)? For the base geometry to contain the comparison geometry it must be a superset of that geometry. Geographic Information Systems and Science, ,Longley,Goodchild,Maguire,Rhind

Some spatial relations

Geographic Information Systems and Science , Longley,,Goodchild,,Maguire,Rhind Touches Relation Does the base geometry (small circles) touch the comparison geometry (big circles) ? Two geometries touch when their boundaries intersect. Raises deep mathematical issues e.g. what is the boundary of a point?, what about tolerance + or - a metre?

Spatial Methods • Distance – shortest distance • Buffer – geometric buffer • ConvexHull – smallest convex polygon geometry • Intersection – points common to two geometries • Union – all points in geometries • Difference – points different between two geometries • SymDifference – points in either, but not both of input geometries

Convex Hull The convex hull of a set of points is the intersection of all convex sets which contain the points. A set of points is convex if and only if for every pair of points p,q in S, the line segment pq is completely contained in S. Left is convex set and right non-convex set Convex hulls constructed around objects.

Operations on themes (*) • Theme projection1: ‘selecting’ some attributes from the countries theme. Get the Population and Geometry of European countries • Theme selection: Name and population of European countries with a population of 50 million or more. • Theme union : European countries with population less than 10 million joined with those over 10 million. • Theme overlay: See example • Theme merge : See example

Operations on themes (*) • Theme overlay1: Generates a new theme and new geometry from the overlaid themes. We get the geometric intersection of spatial objects with the required themes. See European language example. • Theme merge : The merge operation performs the geometric union of the spatial part of n geographic objects that belong to the same theme under a constraint condition supplied by the user. See East/West Germany example.

Projection on Theme (*) Find the countries of western Europe with population greater than 50 million. This is a projection on the attribute population. Unlike a conventional database query we often want the query result and the original context, in this case Europe.

Theme Merge (*) Merging two geographic objects in a selected theme (say country) into a single object.

Theme Overlay (*) Anglo-Saxon Latin languages The lower map represents the overlay of European countries and languages.

Indexing • Indexing is used to speed up queries and locate rows quickly • Traditional RDBMS use 1-d indexing (B-tree) • Spatial DBMS need 2-D, hierarchical indexing • Grid • Quadtree • R-tree • Others • Multi-level queries often used for performance (MBR)

R-tree Examples of R – Tree Index of polygons

Minimum Bounding Rectangle Study Area Minimum Bounding Rectangles

Spatial enabled DB Summary • Database – an integrated set of data on a particular subject. Can include spatial and non-spatial and possible temporal. • Databases offer many advantages over files • Relational databases dominate for non-spatial use, object-relational databases (ORDBMS) often used for spatial data. • Databases address some limitations for GIS

Choice database for GIS? • Choice of DBMS: • Commercial (Oracle, DB2) or • Open source (PostgreSQL, MySQL) . • We will PostgreSQL with PostGIS spatial extensions. • PostgreSQL is an Object Relational Database System (ORDBMS).

Database Architecture for GIS(*) • PureRelational Approach • Spatial data can be stored in a pure RDBMS. The coordinates for the spatial data can be stored in tables. Uses existing technologies, requires no additional software (for the pure DBMS perspective). • Drawbacks • It is difficult to represent and query complex spatial structures (such as a polygon with holes) or topological relationships ( network connectivity, polygon adjacency). No ordered lists. • Violates independence principle, user must know about data storage. Change of geometric representation requires deep reorganization of the database and query formulation. • Poor performance, requires a lot of processing of the relational tuples that represent the spatial information. • Lack of user friendliness because users have to manipulate tables of points. • Difficulty of defining new spatial types. • The impossibility of expressing geometric computations such as adjacency tests, point query, or window query.

Database Architecture for GIS(*) • Loosely Coupled • Many current commercial such as ArcInfo use this approach. • Uses a RDBMS to store 'attribute' or descriptive information e.g. the name of a road not its geometry. • A specific module for spatial data management. • Drawbacks • The coexistence of heterogeneous data models, which implies difficulties in modeling use and integration. • A partial loss of DBMS techniques e.g. recovery, querying, optimization.

Database Architecture for GIS(*) • Loosely Coupled Architecture Application Programs Relational DBMS Geometric Processing (GIS) Database Files

Spatial Databases: Lecture 2

Spatial Databases: Lecture 2

Presentation Transcript

Spatial Databases: Lecture 6

Spatial Databases: Lecture 5

Spatial Databases: Lecture 3

Spatial Databases: Lecture 9

Spatial Databases

Databases, Spatial Databases, and PostGIS

Spatial Databases - Indexing

Spatial Databases - Representation

Spatial Databases - Quiz

Spatial Databases: Spatio-Temporal Databases

Spatial Databases

Spatial Databases

Chapter 2: Spatial Databases

Spatial Databases - Topology

Spatial Databases

Spatial Databases

Spatial Databases

Spatial Databases - Topology

Lecture 2: Active Databases

Spatial Databases - Topology

5. Spatial databases

Spatial Databases - Indexing