Logical Models and Query Languages for Spatial Databases

Spatial Databases: Lecture 5 DT249, DT21, DT228 Semester 2 2013-14 Pat Browne Logical Models, ADT, Query Language and Intersection Model.

Outline • We cover logical models and query languages from Chapter 3 of Spatial Databases: With Application to GIS (SDWAG) by Rigaux, Scholl, and Voisard. • Initially we look at a set of reference queries. • Many of the figures are from the book.

Models and Implementaion • Conceptual: ERD,UML, • Logical: Network Model • Physical Models: ADT

ArgoCASEGEO

UML • UML can be used as a semi-formal representation of geographic objects. • It is assumed that you are familiar the main features of class diagrams (class, association, multiplicity). In the next few slides we review aggregation and composition. Classes can be converted to RDBMS relations. Associations can be converted into primary and foreign key representations.

UML : Aggregation and Composition • Both aggregation and composition represent a whole-part (has-a/part-of) association. The main difference between aggregation and composition is the lifecycle dependence between whole and part. In aggregation, the part may have an independent lifecycle, it can exist independently. When the whole is destroyed the part may continue to exist. Compositionis a stronger form of aggregation. The lifecycle of the part is strongly dependent on the lifecycle of the whole. When the whole is destroyed, the part is destroyed too.

UML : Aggregation and Composition • Aggregation example. A car has many parts. A part can be removed from one car and installed into a different car. If we consider a salvage business, before a car is destroyed, they remove all saleable parts. Those parts will continue to exist after the car is destroyed. • Composition example. A building has rooms. A room can exist only as part of a building. The room cannot be removed from one building and attached to a different one. When the building ceases to exist so do all rooms that are part of it.

Reference Schemas • A conceptual schema describes themes and their attributes and relations between themes. The following UML diagrams could be used to represent: • The Irish administrative hierarchy namely; country, province, and county. • The Irish Primary Road Network • Irish Cities. • Geological Information.

Reference Schemas

Reference Schemas1 CREATE TABLE Country ( CountryCode INT, CountryName CHAR(1), the_geom GEOMETRY, PRIMARY KEY (CountryCode)); ALTER TABLE County ADD ProvinceCode_FK INT; ALTER TABLE County ADD CONSTRAINT fk_Province_County FOREIGN KEY (ProvinceCode_FK) REFERENCES Province(ProvinceCode);

Reference Schemas1 CREATE TABLE Province ( ProvinceCode INT, ProvinceName CHAR(1), the_geom GEOMETRY, PRIMARY KEY (ProvinceCode)); ALTER TABLE Province ADD CountryCode_FK INT; ALTER TABLE Province ADD CONSTRAINT fk_Country_Province FOREIGN KEY (CountryCode_FK) REFERENCES Country(CountryCode);

Reference Schemas1 CREATE TABLE County ( CountyCode INT, CountyName CHAR(1), Populution INT, the_geom GEOMETRY, PRIMARY KEY (CountyCode)); CREATE TABLE Road ( RoadCode INT, RoadName CHAR(1), RoadType INT, the_geom GEOMETRY, PRIMARY KEY (RoadCode));

Reference Schemas1 CREATE TABLE RoadSection ( SectionCode INT, SectionName CHAR(1), NumberOfLanes INT, the_geom GEOMETRY, PRIMARY KEY (SectionCode)); ALTER TABLE RoadSection ADD CityName_FK CHAR(1); ALTER TABLE RoadSection ADD CONSTRAINT fk_City_RoadSection FOREIGN KEY (CityName_FK) REFERENCES City(CityName); CREATE TABLE City ( CityName CHAR(1), Population INT, the_geom GEOMETRY, PRIMARY KEY (CityName));

Reference Schemas1 CREATE TABLE Road_RoadSection ( RoadCode INT, SectionCode INT, PRIMARY KEY (RoadCode, SectionCode); ALTER TABLE Road_RoadSection ADD CONSTRAINT fk_Road_Road_RoadSection FOREIGN KEY (RoadCode) REFERENCES Road(RoadCode); ALTER TABLE Road_RoadSection ADD CONSTRAINT fk_RoadSection_Road_RoadSection FOREIGN KEY (SectionCode) REFERENCES RoadSection(SectionCode);

Reference Schema: Admin • An administrative hierarchy consists of a country that contains provinces, which in turn contain counties. Each are identified by a unique code and have their own geometry (no shared lines). As it happens all elements have unique name, but our schema will not enforce this.

Reference Schema: Admin • The term ‘aggregation’ is used in UML to describe a whole/part relationship. Aggregation is represented as an open diamond on a class diagram. • Composition (black diamond) is a stronger form of aggregation. It indicates that the lifetime of the part is dependent upon whole. When a country is deleted from the DB, then all its provinces and counties would be deleted too. • The UML diagrams in the previous slides used the less restrictive aggregation.

Reference Schema: Roads • A road has a unique code and name. It consists of a number of sections. The associations between city and section are named. The multiplicities capture the following semantics: • Many road sections can entry or leave a city, but a road section can only enter or leave one city. • A road is composed of one or more ordered sections. • A section can represent more that one road (shared geometry). For example, a primary road passing through a town could be both a primary road and the ‘main street’ of the town.

Reference Schema: Roads • Note, the association between Road and RoadSection as a constraint called ordered. The term ordered refers to a constraint that specifies that the order of participants within a given AssociationRole is semantically important. It could be represented by an List rather than a Set. • Intuitively an Association Role refers to sets of objects at a particular end of an association.

Reference Schema: Geology, Archaeology, GeoDirectory • We will use a geological data set which contains bedrock information for region queries. http://www.gsi.ie/mapping.htm • We will use some historical information and the Geodirectory for point data queries. • http://webgis.archaeology.ie/NationalMonuments/FlexViewer/ • https://www.geodirectory.ie/Maps.aspx https://www.geodirectory.ie/Maps.aspx

Reference Queries • Queries in a spatial DB can be purely spatial, purely thematic, or a mixture of both. For example: • select * from county where name = 'Meath'; • Is a purely thematic query no spatial operations are involved. We often wish to see the graphic result of such a query.

Reference Queries • Queries with alpha-numeric criteria • Queries with spatial criterion • Interactive queries, which require participation from the user. We do not use an interactive GIS with a ‘live’ two way connection to the geo-data but we can simulate constants to represent user selected items (e.g. ‘POINT(..)’).

Reference Queries

Interactive Reference Queries

Alpha-Numeric Queries

Querying Geographic Objects • Recall that geographic objects have both a spatial & descriptive part. Ideally users would like to query geographic objects without the need of a detailed understanding of their spatial component. Users should need only a general idea of spatial concepts e.g. a river is a linear object and county is a polygon. They should not be concerned with the low level spatial representations discussed in Chapter 2.

Querying Geographic Objects • Current relation databases systems (RDBMS) need to be extended to handle spatial data. We use PostgreSQL which is an Object Relational Database (ORDBMS). We use a spatial extension to PostgreSQL called PostGIS. The basic DB is extended with an Abstract Data Types (ADTs aka blades/cartridges) which implements the OGC’s Simple Features for SQL.

Querying Geographic Objects • The ADTs are encapsulated in the sense that the data is only accessible through the operations defined on the data. This means that the user of the ADT (usually a programmer or spatial specialist) is not aware how the operations are implemented. All the user of the ADT needs to know are the required arguments and the expected result. Encapsulation is an important concept in object orientation (as are identity, inheritance, and messages)

Querying Geographic Objects • Two design issues • The definition of spatial ADTs and • Their integration with SQL and the DBMS • These issues are largely application driven and in general do not rest on a sound mathematical or formal representations (e.g. topo-models). • The OGC’s Simple Features for SQL have columns that represent ‘geometry’ (normal SQL columns or tables represent ‘themes’ or ‘attributes’).

Perceived limitations of Relational Model for Spatial Data • Spatial data may be represented directly in a conventional relational database. But this is less than ideal. Currently, such an approach means that the spatial structures are visible at the table level, which requires that the user of the query language must understand the details of the spatial structures used. • Also, with current RDBMS, using standard relational tables is not efficient for spatial data. Indexing can speed improve performance.

Data Types • One of the purposes of data types is to allow correct expression: • select true or false; • But forbid meaningless expressions • select true + 1; • ERROR: operator does not exist: boolean + integer • HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. • select 0::boolean or 1::boolean;

Data Types • Another the purpose of data types is to free the programmer from the details of the underlying actual representation. The programmer merely state that they require a string or an integer and the system looks after the necessary storage details.

Data Types • Each data type comes with one or more operations or functions e.g. integers have the plus operation. • Integer + Integer -> Integer • We can also mix types. For example, grater-than (>) uses Integer and Boolean types. • Integer > Integer -> Boolean

Data Types • In SQL types describe the data that program will work with. • PostgreSQL has a rich set of native data types (e.g. Integers, decimal (aka double precision, float8, floating point numbers), Strings (aka text), Dates, Time, etc.) available to users.

Data Types • In PostgreSQL the users may add new types to PostgreSQL using the CREATE TYPE command. CREATE TYPE vertex AS (x double precision, y double precision); • We can use vertex as we would any built-in type. SELECT CAST(ROW(x,y*0.02) As vertex) as myvert FROM generate_series(1,10) As x CROSS JOIN generate_series(10,20,2) as y

Concrete Data Types • A concrete data type is a specialized language-oriented data type that represents a language specific domain concept (e.g. Integer). They are implemented automatically by the language (in our case SQL). A concrete data type can be embedded or composed with other data types to form larger data types.

Spatial Abstract Data Types1 • The term Abstract Data Type (ADT) refers to a data type that extends the SQL type system. ADTs can be used to define the column types for tables, this allows values stored in the columns of a table to be instances of ADTs. Some Geo-spatial Standards are defined in terms of ADTs, Individual implementations (e.g. PostGIS from Refractions Research) are free to choose their own implementation details (e.g. language) once they respect the specification of the ADT.

Spatial Abstract Data Types1 • SQL operations and functions may be declared to take ADT values as arguments, and return ADT values as results. An ADT may be defined as a subtype of another ADT, referred to as its super-type.

Spatial Abstract Data Types1 • Spatial ADTs are implemented in in various languages and for various environments. • GeoAPI.NET project provides a common framework based on OGC/ISO standards to improve interoperability among .NET GIS projects. • JASPA (JAvaSPAtial) is a spatial extension for relational database systems. JASPA implements the OpenGISSimple Features for SQL and the SQL/MM standard. • GeoTools is an open source Java library that provides tools for geospatial data. • PostGIS is written in C & implements the OpenGIS Simple Features for SQL.

Spatial Abstract Data Types1 • The JTS Topology Suite is an API of spatial predicates and functions for processing geometry. It has the following design goals: • JTS conforms to the Simple Features Specification for SQL published by the Open Geospatial Consortium • JTS provides a complete, consistent, robust implementation of fundamental algorithms for processing linear geometry on the 2-dimensional Cartesian plane.

Spatial Abstract Data Types1 • The sub-type mechanism allows an instance of the subtype to be stored in any column where an instance of the supertype is expected and allows an instance of the subtype to be used as an argument or return value in any SQL function that is declared to use the super type as an argument or return value. • The above definition of ADTs is value based, and value based ADTs with the above properties are defined as part of the current draft SQL3 standard

Spatial Abstract Data Types • Spatial ADT’s were introduced to overcome the perceived limitations of the relational model. Spatial ADT’s provide an abstraction for modelling the geometric & topological structure of objects in space, their relationships, properties and operations. PostGIS provides spatial ADT’s for the PostgreSQL DB. The semantics are that of the OGC Simple Features for SQL. In other extensible DBMS (e.g. ORACLE) spatial type extension packages are called spatial data blades or cartridges.

Spatial Abstract Data Types • Introducing these types in the DBMS requires not only defining their structure but the operations that can be performed on them. For instance, a type “polygon” can be introduced with operation PolygonArea (which computes the area of a polygon). Combinations of such data types and their operations are referred to as abstract data types (ADTs). An important point is that ADTs are encapsulated in the sense that they are accessible only through the operations defined on them. In addition, the DBMS need not have any knowledge on the ADT implementation (i.e., on the code corresponding to the operations). For instance, the way PolygonArea is implemented is not relevant.

Spatial Abstract Data Types • In order to support geo-spatial applications, the type system of the standard relational database needs to be extended by providing additional types beyond the built-in types (such as integer or real). These new types require a basic structure and a set of operations that can be performed on those structures. For example, a type Polygon could have an operation called Area, that computes the area of a polygon.

Spatial Abstract Data Types • Although more complex, the spatial types appear to the user to be similar to the built-in strings and integers. • A spatial ADT for a region includes a list of operations on regions. Query language user only interested in ADT’s interface (signature). Conceptual ERD Implementation of region operations such as touch, adjacent, overlap.

Spatial Abstract Data Types • The abstraction of the region type in the conceptual geographic model means that it is independent of the spatial model. As long as the same interface is presented to the geographic model any reasonable spatial model will work. The spaghetti, network, or topological models could be used to represent spatial objects.

Spatial Abstract Data Types • The user sees the data from the ‘geographic’ point of view. For example, objects O1, O2, O3, are stored using both the Topological Model and the Spaghetti Model.

Choosing Spatial Types • Trade-off between modelling power captured in the definition and the constraints imposed by the chosen representation.

Choosing Spatial Types • Consider constraint on polyline (b) • the end point of a segment is part of at most two segment. • This precludes branching, which is common in road and river networks.

Chosen Simple Spatial Types1 • Point: zero dimensional object • Polyline: one dimensional object, a list of pair wise connected segments, with constraint that endpoint shard by at most two segments. • Region: two dimensional object, a set of non-overlapping polygons (polygons assumed but not defined)

Operations on Spatial Types(*) • Definition of region1.The infinite set of points lying inside and on the boundary2 of a polygon. • Definition of operation intersect. All points common to both polygons2. • The next slide (page 79, SDWAG) represents three possible interpretations of the return type of the intersect operations on regions.

Logical Models and Query Languages for Spatial Databases