slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Data Modeling, Database Design Dr. David Arc PowerPoint Presentation
Download Presentation
Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Data Modeling, Database Design Dr. David Arc

Loading in 2 Seconds...

play fullscreen
1 / 55

Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Data Modeling, Database Design Dr. David Arc - PowerPoint PPT Presentation


  • 197 Views
  • Uploaded on

Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Data Modeling, Database Design Dr. David Arctur Research Fellow, Adjunct Faculty University of Texas at Austin Lecture 11 October 31, 2013. You are here. Food is there. Well, now it’s there. Outline.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Data Modeling, Database Design Dr. David Arc' - anise


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620)

Data Modeling,

Database Design

Dr. David Arctur

Research Fellow, Adjunct Faculty

University of Texas at Austin

Lecture 11

October 31, 2013

slide2

You are here

Food is there

Well, now it’s there

outline
Outline

INF385T(28620) – Fall 2013 – Lecture 11

The design process

Describing a system of viewpoints

Developing use cases

GIS design stages

Conceptual design

Normalization example

Database diagrams

Building the database

Prototype to production

best practices iterative incremental development
Best practices: iterative, incremental development

Planning, Cost-Benefit, Risk Management

Ideas, Use Cases, Requirements

Analysis, Design

and Evaluation

Maintenance, Evaluation

Development, Quality Control

and Evaluation

Note: This is only an illustration – your experience may differ!

Deployment

INF385T(28620) – Fall 2013 – Lecture 11

best practices iterative approach
Best practices: iterative approach
  • Focus on current issues
    • Short iterations better reflect near-term internal and external environment
  • Resolve misunderstandings early
  • Resolve analysis, design and implementation disparities early
  • More accurate overall project status
  • Workload better distributed across life cycle
    • Medium- to long-term changes in environment factored in more easily through spiral approach

INF385T(28620) – Fall 2013 – Lecture 11

starting out
Starting out
  • Define the project
  • Get to know the client
    • Stakeholders and decision makers
  • Build the design team
    • Analyst
    • Subject matter experts
    • Users
  • Inventory tasks, products, data
    • Identify model
    • Develop Use Cases
    • Inventory data
  • GIS Design Seminar
  • Teach client GIS concepts and design methods
  • Introduce yourself and the design team
  • Set realistic expectations for the GIS system
  • Alleviate fears and concerns

INF385T(28620) – Fall 2013 – Lecture 11

building a system of viewpoints
Building a System… of Viewpoints

Community Objectives

Business aspects: purpose, scope and policiesWhat for? Why? Who? When?

Enterprise

Viewpoint

Abstract/Best Practices

Information sources

and

models

What is it about?

Types of services and protocols

How does each bit work?

Information

Viewpoint

Computational

Viewpoint

Implementation/Development

Solution types: distribution infrastructureHow do the components work together?

Engineering

Viewpoint

Implementation system: hardware, software, distribution

With what?

Technology

Viewpoint

INF385T(28620) – Fall 2013 – Lecture 11

Viewpoints in “Reference Model - Open Distributed Processing (RM-ODP)” ISO/IEC 10746

use cases

Notify Owners

Use cases
  • A description of a task you want the system to perform
    • Add new water service, record parcel sale
  • Basis of all analysis and design
    • Start simple; expand with detail later
  • Analysis of use cases yields data, interfaces, applications
  • Use cases can:
    • Capture existing work flows
    • Define new applications
    • Help understand alternative and pathological work flows

Use case

Data

GIS Database

INF385T(28620) – Fall 2013 – Lecture 11

use case diagrams

Emergency Call Center

Document locations of flooding

Operations & MaintenanceStaff

Manage flood control structures

Produce flood maps

Geocode & map call list

GIS Analyst

Use case diagrams

System boundary

  • Show the actor/use case relationships
    • System architecture
    • Data flows, coordination
  • Develop with users
    • During meetings, interviews
    • Clean up and refine later
  • Graphical notation is helpful, but the use case document is the most important artifact

Actor

and name

Use case

and name

INF385T(28620) – Fall 2013 – Lecture 11

example use case

Use Case Name

Description:Actors:Pre-conditions:Post-conditions:

Flow of events:- business rules- user actions, responses

Exceptions:Alternates:

Example use case
  • Some context: For emergency response application, assume a set of use cases focused on exchanging information leading to creating, updating, and posting flood maps to interested agencies.
  • Each use case is documented according to a template, such as:

INF385T(28620) – Fall 2013 – Lecture 11

example use case1
Example use case
  • Use case: Flooding Information & Response
  • Description: The call center receives and documents citizen calls related to flooding during storm events. The information is geocoded, mapped, and provided to the Water District’s Operation & Maintenance staff, which makes decisions to manage control structures to mitigate flooding.
  • Actors: Emergency Response Call Center, GIS Analyst, Operations & Maintenance Staff.
  • Pre-conditions: The database of critical water facilities has been created. The Emergency Call Center has been activated for a storm event. Water district staff are operating under emergency operating procedures.
  • Post-conditions: Status change notices are sent to the relevant agencies registered to receive updates.

INF385T(28620) – Fall 2013 – Lecture 11

use case primary scenario
Use case primary scenario
  • Citizen places a call to the Emergency Call Center hotline.
  • Call center staff document location and description of flooding problem.
  • GIS analyst receives and geocodes locations from the call center, producing a map of call locations.
  • Call reports are symbolized based on flooding issue. Maps are produced and handed off to the operations staff.
  • Operations & Maintenance staff review maps of flooding incidents and make decisions for operating control structures, gates, and pumps.

INF385T(28620) – Fall 2013 – Lecture 11

are we finished yet
Are we finished yet?
  • What layers will you need based on the use case?
  • Where will you get these layers?
  • Are there any changes that can be made to the business process?

INF385T(28620) – Fall 2013 – Lecture 11

using the use cases
Using the use cases
  • From the set of use cases developed, the functional requirements and interfaces can be fleshed out.
  • From an understanding of the collaborators and stakeholders involved in the use cases, appropriate data sources and maintenance authorities can be determined.
  • From a comparison of all the use cases, redundant information and tasks can be discovered and minimized.
  • From an examination of potential alternate scenarios, pathological situations can be anticipated and mitigated.
  • BUT… beware the use case time sink
    • You cannot completely and correctly document all the use cases for a reasonably complex system in your lifetime
    • Keep it simple, and start prototyping as soon as you can – this will further inform the use cases and keep your project moving

INF385T(28620) – Fall 2013 – Lecture 11

gis design
GIS Design

INF385T(28620) – Fall 2013 – Lecture 11

designing the database

Conceptual

model

Key

Logical

model

Business

practices

Project

Feature collection

Physical

model

Roads

Network

table

Rail

Topology

Boundaries

Relationship

table

table

Designing the database

Collect information, identify desired themes and sources

Map themes to GIS database elements: define database entities and organization

Complete data organization, build full schema, test and refine

INF385T(28620) – Fall 2013 – Lecture 11

gis design practice
GIS design practice
  • Think about the GIS features represented by thematic layers, and about the integrity and behavior of those features--
    • Parcels are represented as polygons.
    • Parcels share geometry with boundaries.
    • Parcels do not overlap.
    • … etc.

INF385T(28620) – Fall 2013 – Lecture 11

conceptual design
Conceptual design
  • Entities, general relationships, important attributes
  • Sketches
  • ER/UML conceptual diagrams
  • Spreadsheets
  • Often reconstruct from existing systems/datasets
  • Very important for complex projects
  • Very useful to communicate with domain experts/business people

INF385T(28620) – Fall 2013 – Lecture 11

conceptual design1
Conceptual design
  • Purpose and usage of GIS
  • Data sources
    • coverages, shapefiles, CAD, etc.
    • compilation scale and accuracy
  • Spatial representation
    • raster, vector, surface, address
  • Attributes
    • required fields, types of measurement
  • Relationships
    • network, topological, general

INF385T(28620) – Fall 2013 – Lecture 11

conceptual design2

Key

Conceptual design

It is important to understand what you want to achieve from the outset

  • Symbology and labels
    • what symbols at which scales
    • text presentation on the map
  • Spatial reference
    • projection and datum
    • the largest area mapped
    • required detail and resolution
  • Special design cases, for example:
    • condominium parcels
    • parcel annotation

INF385T(28620) – Fall 2013 – Lecture 11

diagramming themes

Water Use Application

Hydrology

Utilities

Boundaries

Roads

Diagramming themes
  • Classic layer diagrams
    • Organize data into logical units
    • Focus on common dataelements to help determine:
      • Attributes
      • Associations
      • Spatial relationships

INF385T(28620) – Fall 2013 – Lecture 11

inventory existing data

Annotation

Boundaries

Lots

Parcels

PLSS Monument

PLSS Quarter

PLSS Section

PLSS Township

Inventory existing data

Target data layers

Legacy data

INF385T(28620) – Fall 2013 – Lecture 11

inventory existing data1

Annotation

Boundaries

Lots

Parcels

PLSS Monument

PLSS Quarter

PLSS Section

PLSS Township

Inventory existing data
  • Model database schema from existing data
  • Bridge existing data with current technology, for example:

Target Data Layers

Legacy data

GIS Database

• Boundaries hold survey attributes

• Coverage parcel polygons only exist for regions

INF385T(28620) – Fall 2013 – Lecture 11

database design
Lecture 11Database Design

INF385T(28620) – Fall 2013 – Lecture 11

files databases and gis
Files, databases, and GIS
  • Data files contain text or other data in arbitrary formats
  • Data tables contain records with fields (attributes, data items) identified by a primary key
  • Relational Database Management System (RDBMS or just DBMS):
    • creates and maintains relationships between data tables
    • allows one or more users to create or edit data in the tables
    • allows users to sort, select, and retrieve information using QUERIES and REPORTS
  • GIS adds a spatial dimension to databases, by integrating location and geometric shape information with the tables

INF385T(28620) – Fall 2013 – Lecture 11

relational database a formal information model called relational

Ab

Cd

Ef

xz

11

12

24

12

32

12

17

53

ed

vg

kl

op

34

9

11

46

4

9

2

2

R

A

X

G

w

f

c

v

Relational databaseA formal information model called “relational”
  • Tables can have formal & ad hoc relationships, based on:
    • Rows and columns
    • Known column types
    • Relationships
    • SQL language and operators
  • Relational is based on a simple, generic model with many implementations (MS Access, IBM DB2, Oracle, MS SQLServer, and many others)

INF385T(28620) – Fall 2013 – Lecture 11

data tables
Data tables

Organized into columns, rows, and cells (like a spreadsheet)

Columns = attributes = fields = data items

Rows = records

Cells = values

Attribute

or Column

Record or Row

Cell or Value

INF385T(28620) – Fall 2013 – Lecture 11

defining columns
Defining columns

To define a column or attribute, you must specify the column name and type

All DBMS’s support basic types:

  • Number (integer, float, decimal)
  • String (text)
  • Boolean (Yes/No)
  • Date

Many DBMS (SQL-99) and GIS systems support additional types (BLOB, XML, time series, …)

INF385T(28620) – Fall 2013 – Lecture 11

primary key
Primary key

The field or combination of fields that identifies each and every record uniquely within a table

Note: Primary key is more often arbitrary, meaningless to users; main purpose is to be unique

water use permit example
Water use permit example
  • Paper-based application form required for permit to withdraw surface or ground water
sample data for water use permits
Sample data for water use permits

Avg Gallons/Day

Max Gallons/Day

Withdrawal Code

Application ID

Source Name

Source Code

Use Codes

Owner

INF385T(28620) – Fall 2013 – Lecture 11

slide34

Water

use codes

INF385T(28620) – Fall 2013 – Lecture 11

relational organization
Relational organization
  • Tables should be organized according to basic rules of relational design for most efficient use.
  • Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data in a relational database. These steps reduce data redundancy and the chances of data becoming inconsistent.
  • 3NF or BCNF are the usual standards for relational database design, however performance and convenience may drive toward de-normalization.

INF385T(28620) – Fall 2013 – Lecture 11

database normalization steps
Database normalization steps
  • First Normal Form (1NF) eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
  • Second Normal Form (2NF) eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.
  • Third Normal Form (3NF) eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.
  • Boyce-Codd Normal Form (BCNF) is sometimes applied as a stronger form of 3NF in which every determinant of a functional dependency within a relation must be a candidate key for the schema.
  • Fourth Normal Form (4NF) separates independent multi-valued facts stored in one table into separate tables.
  • Fifth Normal Form (5NF) breaks out data redundancy that is not covered by any of the previous normal forms.

source - http://www.hyperdictionary.com/dictionary/database+normalisation

INF385T(28620) – Fall 2013 – Lecture 11

first normal form not
First normal form - NOT
  • Do you see any groups of repeating columns?
    • What’s wrong with that?
    • Can you think of a case where this is okay?
  • How would you reorganize to fix this?

INF385T(28620) – Fall 2013 – Lecture 11

first normal form
First normal form

Primary key

Foreign keys

  • The Use Code columns can be removed from the main table (A), and made into rows of a separate table (B), keyed by ActID. (compare with previous slide)
  • (C) is a lookup table for use code descriptions.

B

C

Primary key

A

INF385T(28620) – Fall 2013 – Lecture 11

relationship cardinality
Relationship cardinality
  • With this design, one ActID can have any number of use codes, and any one use code can be associated with many ActID’s
    • This is called a Many-to-Many (M:M) relationship
    • This is much more space-efficient for data storage
  • One record in the Use Code Descriptions table (C) can be associated with many records in the ActID-Use Codes table (B)
    • This is called a One-to-Many (1:M) relationship
  • You may also have relationships with fixed cardinalities, such as 1:1, 1:2, 1:0..5, etc.
  • Cardinality of 1:0 generally means “nulls are allowed”

INF385T(28620) – Fall 2013 – Lecture 11

second normal form not
Second normal form - NOT
  • Do you see any dependencies between non-key columns and a partial key?
    • If the primary key were compound and included an OwnerID, there could be such a dependency between Owner and OwnerID
  • What’s wrongwith that?
  • What wouldyou do to fix this?

Compound key

INF385T(28620) – Fall 2013 – Lecture 11

second normal form
Second normal form
  • Remove the non-key data to a separate table and link to it
  • … and clean up the data while you’re at it!
    • Spelling, abbreviations, punctuation
    • Firstname Lastname vs. Lastname, Firstname

INF385T(28620) – Fall 2013 – Lecture 11

third normal form not
Third normal form - NOT
  • Do you see any functional dependencies among non-key fields in the table below?
    • Need we ask again: what’s wrong with this?
  • How would you reorganize to fix this?
third normal form
Third normal form
  • Remove the source description to a separate table, and join using the source code field
  • This will reduce duplication of data (and errors)

INF385T(28620) – Fall 2013 – Lecture 11

is that all there is to it
Is that all there is to it?
  • This table is NOT in Third Normal Form:
    • The PostalCode field is dependent on the City and ST fields
  • To place this table in 3NF, a separate table would be created for the City and ST fields, and joined using the PostalCode field
    • But this is generally not done with address & postal codes… WHY?
normalization tradeoffs
Normalization tradeoffs
  • When would you expect to normalize tables?
    • For primary data entry and updates; easier to set up and manage data integrity validation
      • Such as name and address subfields
    • To support more kinds of ad hoc queries
  • When would you expect to denormalize?
    • For presentation of data to users
    • To reduce the number of table-joins for faster performance
      • Queries are known and fixed
      • Better performance for web publishing
  • Database views are often used to flatten relationship structure for read-only access

INF385T(28620) – Fall 2013 – Lecture 11

database diagramming
Lecture 11Database diagramming

INF385T(28620) – Fall 2013 – Lecture 11

database diagramming conceptual logical overview
Database diagramming: conceptual / logical overview
  • Database relationships and cardinality can be diagrammed for prototyping and documentation
  • The “*” on an associationlink means “many”

1

*

*

*

Owners

Applications

Use Codes

*

1

Use Code Descrips

INF385T(28620) – Fall 2013 – Lecture 11

database diagramming entity relationship e r or unified modeling language uml

1

*

*

*

Owners

*OwnerID

FirstName

LastName

Phone

StreetAddr

City

State

PostalCode

Applications

*ApplicationID

OwnerID

ApplicationType

ProjectLocation

BusinessName

BusinessType

Use Codes

*RowID

UseCode

ApplicationID

*

1

Use Code Descrips

*UseCode

Description

Database diagramming:Entity-Relationship (E-R) or Unified Modeling Language (UML)

INF385T(28620) – Fall 2013 – Lecture 11

normalization tradeoff referential integrity
Normalization tradeoff: referential integrity
  • Suppose you removed a record from the Owner table
    • What should be done with the related records from the Applications table?
    • Would this be easier or harder to manage than with the de-normalized design on slide 33, “Sample data for water use permits”?
  • The more tables are interconnected by relationships, the greater the need to support referential integrity within your applications
    • A DBMS’ default support for referential integrity may be very basic, such as to place Nulls in associated foreign key fields, but only when a relationship is declared

INF385T(28620) – Fall 2013 – Lecture 11

building the database
Lecture 11Building the database

INF385T(28620) – Fall 2013 – Lecture 11

prototype prototype prototype
Prototype prototype prototype…
  • Critical for validating your data model and applications
  • An easy way to discover project requirements
  • Don’t plan a lot of time for this, just do it!
  • Prototype in the simplest environment to learn the most in the least time
    • Validate that thematic choices, schema & integrity rules support your requirements
    • Reduce data management overhead with personal, single-user system

INF385T(28620) – Fall 2013 – Lecture 11

database environments
Database environments
  • Production/Publishing
    • Read-only copies of databases
    • Used by majority of users
    • Contains custom views of databases
  • Development/Maintenance
    • Where compilation and editing occur
    • Normalized for greater integrity enforcement
    • May have multiple environments by data model (cadastral/land use, transportation, utilities, hydro…)
  • Design/Test
    • Prototype validation, load testing
    • Isolate testing changes to the development environment, so as not to corrupt the development system

INF385T(28620) – Fall 2013 – Lecture 11

large projects can seem like this
Large projects can seem like this…

etc, etc…

smoke alarm…

1. Burning toast…

…fill glass!

INF385T(28620) – Fall 2013 – Lecture 11

but they can be simplified with common data models
But they can be simplified with common data models
  • Should have:
    • Simple structure with most common elements across a set of applications in a user community
    • Minimal rules, custom behavior, or cross-dependencies
    • May include collections or sets or systems of feature classes, e.g., networks, topologies, terrains
  • Should lend to:
    • Web distribution
    • Incremental, multi-user data updates
    • User-side fusion, densification, value-adding
  • Early versions of data models (Arc 8 & 9):

http://support.esri.com/en/knowledgebase/techarticles/detail/40585

  • Current approach: http://resources.arcgis.com/en/communities/

INF385T(28620) – Fall 2013 – Lecture 11

summary
Summary

INF385T(28620) – Fall 2013 – Lecture 11

The design process

Describing a system of viewpoints

Developing use cases

GIS design stages

Conceptual design

Normalization example

Database diagrams

Building the database

Prototype to production