340 likes | 358 Views
Learn about relational databases, SQL, database normalization, and geospatial databases to optimize data management processes. Understand the importance of database design and dictionary for long-term success in GIS projects.
E N D
Quick Lesson on Databases • Relational databases are key to managing complex data • You’ve been using relational databases with “Joins” and “Relates” in ArcGIS • GeoDatabases are relational databases • Structured Query Language (SQL) is the primary language for relational databases • You’ve been using SQL statements in ArcGIS to query data
Relational Databases • Need to represent data with a complex structure Plot Species Tree
Database Tables • What you’ve seen in ArcGIS only more flexible • Tables are made up of “fields” (columns) and “records” (rows) • Queries are used to combine and subset tables into new tables • Each table should have a unique, integer, ID, referred to as a primary key • Greatly improves query performance
Field Data Types • Numeric • Float or integer • Auto numbered, use for primary keys • Dates • YYYY-MM-DD HH:MM:SS.SS • 2013-04-05 14:23:12.34 • Text • Specified width • “Variant” width • Binary Large Objects (BLOB)
Relational Databases • Allow us to “relate” tables to: • Reduce the overall amount of data • Removes duplicates • Makes updates much easier • Improves search speeds
Entity-Relationship Diagram • ERD • Unified Markup Language (UML) Relationship Types One to one One to many Many to many Entities Plot Relationships Species Tree
Plot Tree Species Primary Key Foreign Key
Database Normalization • Eliminate duplicate columns from the same table • Move fields that have “duplicate” row entries and move them to a related table • All field entries should be dependent on the primary key • There should be only one primary key in each table
Database Dictionary • Defines each of the tables and fields in a database • A database forms the basis for data management behind many GIS projects, web sites, and organizations • Proper documentation is key to long term success! • Database design (including ERDs) • Database Dictionary
Geospatial Databases • Not required to store spatial data! • Provide: • Field types for spatial data: point, polyline, polygon, etc. • Spatial operations: union, intersect, etc. • Spatial queries: return records that overlap with a polygon, etc. • Some provide spatial reference control
Relational Databases • Enterprise-Level • SQL Server • PostgreSQL • MySQL • Oracle • Sybase • File-Level • Geodatabase • MS-Access
What we really want • What we need from a database: • Distributed, concurrent access (concurrency) • Automatic Backup • Version control • Unlimited amounts of data • Quick data access • Inexpensive • Broad OS Support • File-level copying • GeoSpatial queries, operations, data types
Structured Query Language (SQL) • Comes from the database industry • “INSERT”, “DELETE”, and “SELECT” rows in tables • Very rich syntax • Portions of “SELECT” grammar used heavily in ArcGIS: • Selecting attributes • Raster calculator • Geodatabases
Transaction SQL • “SQL” is a subset of T-SQL • T-SQL allows full management of a database: • Create & drop: • Tables, fields/columns, relationships, indexes, views, etc. • Administrative functions • Varies some between databases
Using SQL • All Databases have “query editors” that allow us to write, save, edit, and use SQL queries • Use programming languages to “write” queries and “fetch” records from the database
SQL: SELECT SELECT Field1, Field2 FROM TableName JOIN TableName2 WHERE Filter1 AND Filter 2 GROUP BY Field1,Field2 ORDER BY Field1 [DESC], Field2 [DESC]
Selecting Fields • SELECT * • Returns all fields as new table • SELECT Field1,Field2 • SELECT Table1.Field1,Table2.Field1 • Return specified fields • SELECT Table1.Field1 AS NewName • Avoids name collisions
Selecting Tables • FROM Table1 • Returns contents of one table • FROM Table1 INNER JOIN Table2 ON Table2.ForeignKey=Table1.PrimaryKey • Returns records from Table2 that match primary keys in Table1 • Does not return all rows in Table1
Selecting Tables (con’t) • FROM Table1 OUTER JOIN Table2 ON Table2.ForeignKey=Table1.PrimaryKey • Returns all matches between Table1 and Table2 and any records in Table1 that don’t match records in Table2 • Missing values are NULL
Filters or “WHERE” clauses SELECT * FROM Table1 WHERE (Field1 Operator Value1) BooleanOperator (Field1 Operator Field2)
Filter Examples • WHERE: • ID = 1 • Area < 10000 • Area <= 10000 • Name = “Crater Lake” (case dependent) • Name LIKE “Crater Lake” (ignores case) • Notice: • String values have double quotes • Syntax for strings vary some between databases
SQL Comparisons • Equals: = • Greater than: > • Less than: < • Greater than or equal: >= • Less than or equal: <= • Not equal: <> • Like: case independent string comparison with wild cards (%)
More Complex Filter Examples • WHERE: • Name LIKE “Hawaii” AND Area < 10000 • Species LIKE “Ponderosa” AND DBH > 1
ORDER BY SELECT * FROM Table 1 ORDER BY LastName DESC, FirstName DESC • Careful with performance on large datasets and string fields
GROUP BY • Aggregates data SELECT Species ,AVG(Height) FROM Trees GROUP BY Species • Only aggregated fields can appear in SELECT list
SQL INSERT • INSERT INTO TableName (Field1,Field2) VALUES (Value1,”Value2”) • String values must be in quotes • Other values can also be in quotes • If the table has an “auto numbered” ID field, it will be added automatically • Otherwise, very difficult to set the ID field
SQL DELETE DELETE FROM TableName WHERE ID=Value - Deletes one row DELETE FROM Plot WHERE PlotID=12 - Deletes all rows with PlotID=12 DELETE FROM TableName - Deletes everything in TableName!
Database Performance Default Search Indexed Search Primary Key Search
Indexes • Added to a table • Typically for one field • Adds overhead to INSERT and DELETEs • Important for: • Large tables • Complex queries • Especially text searches!
Maintaining Performance • Always use integer, auto numbered primary keys • Avoid iterative or hierarchical queries • Sometimes code is faster: • Do simple query, load into RAM and sort • With REALLY big data, don’t use SQL • NoSQL, accessing data directly, without the use of a relational database package • There are “NoSQL” products in the works • Avoid text searches and sorts
Rasters and Databases • Don’t put rasters into a database! • Makes it impossible to backup and restore the database • Put a file path to the rasters in the database