On Languages for the Specification of Integrity Constraints in Spatial Conceptual Models Mehrdad Salehi, Yvan Bédard, Mir Abolfazl Mostafavi, Jean Brodeur Center for Research in Geomatics (CRG), Department of Geomatics Sciences, Laval University, Canada ER 2007 Workshop on Semantics and Conceptual Issues in GIS (SeCoGIS), Auckland, New Zealand
Presentation Plan • The role of spatial integrity constraints in spatial data quality • Definition of spatial integrity constraints • Classification of languages for the specification of spatial integrity constraints at the conceptual level + Examples • Natural language • Visual language • First-order logic language • Hybrid language • Comparison of languages • Conclusion and on-going/future work
The Role of SIC in Spatial Data Quality • Spatial Data Quality • Internal Data Quality: Completeness, Positional Accuracy, Temporal Accuracy, Thematic Accuracy, Logical Consistency • External Data Quality: deals with fitness for use of the data • SIC carry semantic information of a database application and are used to preserve logical consistency in spatial databases
Spatial Integrity Constraints • A spatial IC (SIC) defines mandatory, allowed, and unacceptable spatial relationships and values, sometimes in relation to other specific attribute values, geometric feature shapes, specific relationships, or for given areas of validity. Simple examples of SIC: • Topological IC: based on topological properties and relationships • “Each building must be represented by a closed area.” • “Two buildings do not overlap.” • Metric IC: based on metric properties and relationships • “The area of a house must be more than 100 square meters.” • “The distance between a school and a gas station must be more than 30 meters.” • Integrity constraints (IC) are assertions that restrict the data values that may appear in the database, to prevent the insertion of incorrect data.
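The two metric IC examples above can be sketched as simple checks. This is only an illustration under stated assumptions: a house is represented by its footprint polygon, a school and a gas station by single points, coordinates are in meters, and all function names are hypothetical.

```python
from math import hypot

def polygon_area(vertices):
    """Area of a simple polygon (list of (x, y) pairs) via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return hypot(q[0] - p[0], q[1] - p[1])

def house_area_ok(footprint, min_area=100.0):
    # Metric IC: the area of a house must be more than 100 square meters.
    return polygon_area(footprint) > min_area

def school_station_distance_ok(school, station, min_distance=30.0):
    # Metric IC: a school and a gas station must be more than 30 meters apart.
    # Assumption: both features are reduced to single points.
    return point_distance(school, station) > min_distance

house = [(0, 0), (12, 0), (12, 10), (0, 10)]  # a 12 m x 10 m rectangle
print(house_area_ok(house))                           # True
print(school_station_distance_ok((0, 0), (30, 40)))   # True
```

Both checks use a strict inequality because the constraints say "more than"; a footprint of exactly 100 square meters would be rejected.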
Definition of SIC • SIC convey essential semantic information of database applications • It is necessary to define SIC at all levels of a spatial database design process • Each database design level requires its own SIC specification language (spatial ICSL) • Conceptual level: SIC must first be defined in a language understandable to database users • Implementation level: SIC are then translated into a DDL or a programming language so that they are understandable to a computer • SIC at the Conceptual Model: Road disjoint Building • SIC at the Implementation Model: Disjoint(Road.geometry, Building.geometry) • This presentation focuses on the spatial ICSL at the conceptual level.
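The step from the conceptual statement to the implementation statement can be sketched as a small translation function. The statement pattern, the function name `to_implementation`, and the `geometry` attribute name are assumptions made for illustration; they are not a published translation rule.

```python
import re

# Hypothetical pattern: "<ClassA> <relation> <ClassB>", each a single word.
_STATEMENT = re.compile(r"^(\w+)\s+(\w+)\s+(\w+)$")

def to_implementation(statement, geometry_attr="geometry"):
    """Translate a conceptual-level SIC into an implementation-level predicate string."""
    match = _STATEMENT.match(statement.strip())
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    class_a, relation, class_b = match.groups()
    # The relation becomes a predicate applied to the two geometry attributes.
    return f"{relation.capitalize()}({class_a}.{geometry_attr}, {class_b}.{geometry_attr})"

print(to_implementation("Road disjoint Building"))
# Disjoint(Road.geometry, Building.geometry)
```

A fixed three-word pattern is only workable because the input language is restricted; free natural language would not translate this mechanically.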
Classification of the Spatial ICSL at the Conceptual Level • We categorize the existing spatial ICSL at the conceptual level into: • Natural languages • Free natural languages • Controlled natural languages • Visual languages • First-order logic languages • Hybrid languages • Visual hybrid languages • Natural hybrid languages
Natural Languages • People use natural languages for their daily communications. • They are the easiest languages for a database client to express SIC. 1. Free Natural Languages: • are natural languages with no additional limits on their syntax and semantics • support a rich vocabulary • are sometimes ambiguous or used too loosely • Several words may bear the same semantics • Words may have several meanings depending on the context • Restrictive terms (and, or, must, can, …) are used loosely 2. Controlled Natural Languages: • are subsets of natural languages whose syntax and semantics are restricted • are proposed to overcome the ambiguity of free natural languages
Natural Languages Examples of controlled natural languages: • Ubeda and Egenhofer approach (1997): (Entity Class1, Topological Relation, Entity Class2, Quantifier) • Topological Relation: topological relationships of the extended 9I model (e.g., inside, cross) • Quantifier: forbidden, at least n times, at most n times, exactly n times • Example: (Road, Cross, Building, Forbidden) • Vallieres et al. approach (2006): “Objects Class1” + “Topological Relation” + “Objects Class2” + “[-,-]” • Topological Relation: 8 topological relations extended by three notions: tangent, border, strict • [-,-]: cardinality • Example: Road Segment Touch-Tangent Road Segment [1,2]
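The four-part constraint of Ubeda and Egenhofer can be sketched as a small data structure. The class name, field names, and the counting interpretation of the quantifier (how many objects of the second class stand in the relation to one object of the first class) are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TopologicalConstraint:
    """(Entity Class1, Topological Relation, Entity Class2, Quantifier)."""
    entity_class1: str
    relation: str
    entity_class2: str
    quantifier: str            # "forbidden", "at least", "at most", or "exactly"
    n: Optional[int] = None    # required for every quantifier except "forbidden"

    def is_satisfied(self, count: int) -> bool:
        """count: related entity_class2 objects found for one entity_class1 object."""
        if self.quantifier == "forbidden":
            return count == 0
        if self.n is None:
            raise ValueError(f"quantifier {self.quantifier!r} needs a value for n")
        if self.quantifier == "at least":
            return count >= self.n
        if self.quantifier == "at most":
            return count <= self.n
        if self.quantifier == "exactly":
            return count == self.n
        raise ValueError(f"unknown quantifier: {self.quantifier!r}")

# (Road, Cross, Building, Forbidden)
no_crossing = TopologicalConstraint("Road", "Cross", "Building", "forbidden")
print(no_crossing.is_satisfied(0))  # True
print(no_crossing.is_satisfied(2))  # False
```

Under the same counting assumption, a cardinality such as [1,2] in the second approach could be read as "at least 1" combined with "at most 2".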
Visual Languages • Employ graphical and image notations • Database end-users must • learn the semantics of every visual construct • understand the specific context in which each construct is used • Several ambiguities and unintended meanings can emerge • Example of a visual spatial ICSL: Pizano et al. (1989) • In this language, pictures termed “constraint pictures” show unacceptable database states • Example constraint picture: cars and people cannot be inside a crosswalk simultaneously
First-Order Logic Language • Supports precise semantics and syntax • However, using and understanding this language requires a mathematical background • Database end-users do not necessarily have a mathematical background Example of FOL for expressing SIC: Hadzilacos and Tryfona (1992) • The syntax of this language is structured as: • Atomic topological formulae consisting of: • Binary topological relations between objects • Geometric operators over objects • Comparisons between attributes of objects • Negation, conjunction, disjunction, and universal and existential quantification • Example of SIC “A Road and a Building are disjoint”:
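The slide's formula did not survive in this text. One plausible first-order rendering of the constraint is given below; the predicate names Road, Building, and disjoint are chosen for illustration and this is not necessarily the notation of Hadzilacos and Tryfona.

```latex
\[
\forall r \,\forall b \;\bigl(\mathrm{Road}(r) \land \mathrm{Building}(b)
  \rightarrow \mathrm{disjoint}(r, b)\bigr)
\]
```

Read aloud: for every object r and every object b, if r is a Road and b is a Building, then r and b are disjoint.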
Hybrid Languages • Are not purely natural, visual, or logical, but combine these • Depending on the dominant part of a language, they are: 1. Visual hybrid languages • The main part consists of visual symbols • Visual constructs are enriched by a limited number of natural language descriptions 2. Natural hybrid languages • The dominant part is a natural language • Complementary components are visual pictograms or symbols
Hybrid Languages 1. Example of a visual hybrid language • There is no visual hybrid spatial ICSL • However, spatio-temporal conceptual modeling languages (e.g., Perceptory) • specify a number of SIC in the conceptual schema • which contradicts the conciseness rule of conceptual schemas • are mostly limited to “constraints on spatial relations” • leave the remaining SIC to be defined by a specific spatial ICSL Example: “A Roundabout is crossed by at least one Route”
Hybrid Languages 2. Natural hybrid languages 2.1. Example of a natural hybrid language with pictograms • Normand (1999) • A language for defining SIC in the data dictionary; it includes: • three pictograms for point, line, and polygon • topological relations based on ISO 9I model • Defines topological and metric IC on the relationships between objects • Supports multiple geometries • Expresses a complex SIC for an object in tabular form
Hybrid Languages 2. Natural hybrid languages 2.2. Example of a natural hybrid language with symbols • Spatial OCL (Kang et al. 2004) • extends OCL, i.e., the ICSL that accompanies UML, by adding: • basic geometric primitives (e.g., point) to the OCL meta-model • 9I topological relations (e.g., overlap) to the OCL operators • specifies topological IC • Example “A Building is disjoint from a Road”: context Building inv: Road.allInstances()->forAll(R | R.geometry->Disjoint(self.geometry)) • Is it really a hybrid natural language?
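The meaning of the invariant above can be mirrored in a few lines of Python. This is a sketch under a strong simplifying assumption: each geometry is reduced to an axis-aligned bounding box, and the names `Box` and `building_invariant` are hypothetical, not part of Spatial OCL.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a real geometry (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def disjoint(self, other: "Box") -> bool:
        # Disjoint means no shared point, so touching boundaries do not count.
        return (self.xmax < other.xmin or other.xmax < self.xmin
                or self.ymax < other.ymin or other.ymax < self.ymin)

def building_invariant(building: Box, roads: Iterable[Box]) -> bool:
    """Counterpart of forAll: every road must be disjoint from this building."""
    return all(road.disjoint(building) for road in roads)

roads = [Box(0, 0, 10, 1), Box(0, 5, 10, 6)]
print(building_invariant(Box(2, 2, 4, 4), roads))    # True
print(building_invariant(Box(2, 0.5, 4, 3), roads))  # False
```

The `all(...)` call plays the role of `forAll` over `Road.allInstances()`, and `building` plays the role of `self`.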
Comparing Spatial ICSL • Why compare spatial ICSL? • We are not aiming to find “the best” spatial ICSL (if such a thing is possible!). • We are revisiting our past practices (i.e., is a hybrid natural language still the best ICSL for the conceptual level?) • Our goal is to summarize the potential avenues for developing ICSLs for spatio-temporal databases and spatial datacubes. • Comparison Criteria 1. Expressiveness: • Semantic quality: correspondence between the ICs’ meaning and the concepts supported by a spatial ICSL • Syntactic quality: degree to which the rules of a spatial ICSL govern the structure of expressions • Richness: capability to express the needed elements of SIC • Inherence: ability of an ICSL to be straight to the point and to focus on the essential aspects of SIC 2. Pragmatics: • Usability of the spatial ICSL by database end-users • Ease of translating spatial IC into technical languages In our context, the former pragmatic quality has priority over the latter.
Comparing Spatial ICSL • Three values, “Good”, “Medium”, and “Weak”, are used to rank the languages. • The values represent our opinion, based on a literature study and 20 years of experience in spatial database modeling and development. • Is «Natural» the way to go for the conceptual level?
Conclusions • Spatial IC convey important semantic information of applications • They must first be defined at the conceptual level for database end-users • We presented a classification of spatial ICSL at the conceptual level • In our opinion, “controlled natural languages” and “natural hybrid languages with pictograms” are good candidates
On-Going and Future Work • We are currently working on a classification of IC in spatio-temporal database applications • This classification provides the basic constructs to build an ICSL for spatio-temporal databases and spatial datacubes • We will build an ICSL for spatial datacubes based on: • The results of the classification of ICs • Spatial datacube vocabulary (e.g., Dimension and Measure) • The candidate languages resulting from the current research work