Slide1 l.jpg

Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model

Erin Peterson

Environmental Risk Technologies

CSIRO Mathematical & Information Sciences

St Lucia, Queensland


Slide2 l.jpg

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095

Space-Time Aquatic Resources Modeling and Analysis Program

The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. EPA does not endorse any products or commercial services mentioned in this presentation.


Slide3 l.jpg

Collaborators

Dr. David M. Theobald

Natural Resource Ecology Lab

Department of Recreation & Tourism

Colorado State University, USA

Dr. N. Scott Urquhart

Department of Statistics

Colorado State University, USA

Dr. Jay M. Ver Hoef

National Marine Mammal Laboratory, Seattle, USA

Andrew A. Merton

Department of Statistics

Colorado State University, USA


Slide4 l.jpg

Overview

Introduction

~

Background

~

Patterns of spatial autocorrelation in stream water chemistry

~

Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: A case study in Maryland, USA


Water quality monitoring goals l.jpg
Water Quality Monitoring Goals

  • Create a regional water quality assessment

    • Ecosystem Health Monitoring Program

  • Identify water quality impaired stream segments


Slide6 l.jpg

Probability-based Random Survey Designs

  • Advantages

  • Statistical inference about population of streams over large area

  • Reported in stream kilometers

  • Disadvantages

  • Does not take watershed influence into account

  • Does not identify spatial location of impaired stream segments


Slide7 l.jpg

Purpose

Develop a geostatistical methodology based on coarse-scale GIS data and field surveys that can be used to predict water quality characteristics about stream segments found throughout a large geographic area (e.g., state)


Slide8 l.jpg

SCALE: Grain

[Figure: hierarchy of terrestrial and aquatic factors arranged by grain from COARSE to FINE: Landscape, River Network, Segment, Reach, Microhabitat]


Slide9 l.jpg

[Figure: semivariogram of Semivariance vs. Separation Distance, marking the Nugget, Sill, and Range]

Geostatistical Modeling

  • Fit an autocovariance function to data

  • Describes relationship between observations based on separation distance

Distances and relationships are represented differently depending on the distance measure
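The semivariance plotted against separation distance on this slide can be estimated directly from site pairs. The sketch below is illustrative only: the function name `empirical_semivariogram`, the bin edges, and the toy values are assumptions, not part of the presentation. Any distance matrix (SLD, SHD, or another measure) can be passed in.

```python
import numpy as np

def empirical_semivariogram(dist, z, bin_edges):
    """Average 0.5 * (z_i - z_j)^2 over site pairs grouped by separation distance."""
    dist = np.asarray(dist, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)          # visit each pair of sites once
    h = dist[i, j]                               # separation distance per pair
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks an empty distance bin
    for b in range(len(bin_edges) - 1):
        in_bin = (h >= bin_edges[b]) & (h < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = half_sq_diff[in_bin].mean()
    return gamma

# Toy example: four sites on a line, 1 km apart (made-up values).
coords = np.array([0.0, 1.0, 2.0, 3.0])
dist = np.abs(coords[:, None] - coords[None, :])
z = np.array([1.0, 2.0, 4.0, 7.0])
gamma = empirical_semivariogram(dist, z, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

An autocovariance function is then fit to these binned values or, as later slides describe, estimated by maximum likelihood.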


Distance measures spatial relationships l.jpg

[Figure: sites A, B, and C]

Distance Measures & Spatial Relationships

Straight-line Distance (SLD)

Geostatistical models typically based on SLD


Distance measures spatial relationships11 l.jpg

[Figure: sites A, B, and C]

Distance Measures & Spatial Relationships

Symmetric Hydrologic Distance (SHD)

Hydrologic connectivity: Fish movement


Distance measures spatial relationships12 l.jpg

[Figure: sites A, B, and C]

Distance Measures & Spatial Relationships

Asymmetric Hydrologic Distance

Longitudinal transport of material


Distance measures spatial relationships13 l.jpg

[Figure: sites A, B, and C]

Distance Measures & Spatial Relationships

  • Challenge:

    • Spatial autocovariance models developed for SLD may not be valid for hydrologic distances

      • Covariance matrix is not positive definite
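Whether a candidate covariance matrix is positive definite can be checked numerically before it is used for estimation or kriging. The sketch below is a generic check, not the author's procedure; the two small matrices are made-up examples, not stream data.

```python
import numpy as np

def is_positive_definite(cov):
    """Return True if the symmetric matrix `cov` admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(np.asarray(cov, dtype=float))
        return True
    except np.linalg.LinAlgError:
        return False

# A valid covariance matrix (correlation decays smoothly with lag).
valid = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])

# Symmetric with unit diagonal, but not positive definite:
# sites 1-2 and 2-3 are strongly related while 1-3 are unrelated.
invalid = np.array([[1.0, 0.9, 0.0],
                    [0.9, 1.0, 0.9],
                    [0.0, 0.9, 1.0]])

valid_ok = is_positive_definite(valid)
invalid_ok = is_positive_definite(invalid)
```

A matrix that fails this check can imply negative prediction variances, which is why models built for one distance measure cannot simply be reused with another.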


Asymmetric autocovariance models for stream networks l.jpg

Flow

Asymmetric Autocovariance Models for Stream Networks

  • Weighted asymmetric hydrologic distance (WAHD)

  • Developed by Jay Ver Hoef

  • Moving average models

  • Incorporate flow volume, flow direction, and use hydrologic distance

  • Positive definite covariance matrices

Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M., Spatial Statistical Models that Use Flow and Stream Distance, Environmental and Ecological Statistics. In Press.



Slide16 l.jpg

Objectives

Evaluate 8 chemical response variables

  • pH measured in the lab (PHLAB)

  • Conductivity (COND) measured in the lab μmho/cm

  • Dissolved oxygen (DO) mg/l

  • Dissolved organic carbon (DOC) mg/l

  • Nitrate-nitrogen (NO3) mg/l

  • Sulfate (SO4) mg/l

  • Acid neutralizing capacity (ANC) μeq/l

  • Temperature (TEMP) °C

Determine which distance measure is most appropriate

  • SLD

  • SHD

  • WAHD

  • More than one?

Find the range of spatial autocorrelation


Dataset l.jpg
Dataset

Maryland Biological Stream Survey (MBSS) Data

  • Maryland Department of Natural Resources

    • Maryland, USA

    • 1995, 1996, 1997

  • Stratified probability-based random survey design

  • 881 sites in 17 interbasins


Slide18 l.jpg

Study Area

Maryland, USA

Baltimore

Annapolis

Washington D.C.

Chesapeake Bay


Slide19 l.jpg

Spatial Distribution of MBSS Data

[Figure: map of MBSS survey sites]


Slide20 l.jpg

GIS Tools

[Figure: three panels labeled SLD, SHD, and AHD, each showing sites 1, 2, and 3]

Automated tools needed to extract data about hydrologic relationships between survey sites did not exist!

Wrote Visual Basic for Applications (VBA) programs to:

  • Calculate watershed covariates for each stream segment

    • Functional Linkage of Watersheds and Streams (FLoWS)

  • Calculate separation distances between sites

    • SLD, SHD, Asymmetric hydrologic distance (AHD)

  • Calculate the spatial weights for the WAHD

  • Convert GIS data to a format compatible with statistics software

  • FLoWS tools will be available on the STARMAP website:

  • http://nrel.colostate.edu/projects/starmap


Slide21 l.jpg

Spatial Weights for WAHD

  • Proportional influence (PI): influence of each neighboring survey site on a downstream survey site

    • Weighted by catchment area: Surrogate for flow volume

  • Calculate the PI of one survey site on another site

    • Flow-connected sites

    • Multiply the segment PIs

Segment PI of A = Watershed Area A / Watershed Area B

[Figure: survey sites A, B, and C, with the watersheds of Segment A and Segment B]


Slide22 l.jpg

Spatial Weights for WAHD

[Figure: stream network with segments labeled A through H; legend: survey sites, stream segment]


Slide23 l.jpg

Spatial Weights for WAHD

  • Proportional influence (PI): influence of each neighboring survey site on a downstream survey site

    • Weighted by catchment area: Surrogate for flow volume

  • Calculate the PI of one survey site on another site

    • Flow-connected sites

    • Multiply the segment PIs

Site PI = B * D * F * G

[Figure: stream network with segments labeled A through H]
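The product rule for the site PI can be sketched in a few lines. The segment PI follows the ratio of watershed areas given on the earlier slide. The function names and the numeric PI values below are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod

def segment_pi(segment_watershed_area, downstream_watershed_area):
    """Segment PI as a ratio of watershed areas (area is a surrogate for flow volume)."""
    return segment_watershed_area / downstream_watershed_area

def site_pi(segment_pis, path):
    """PI of an upstream site on a flow-connected downstream site:
    the product of the segment PIs along the path between them."""
    return prod(segment_pis[name] for name in path)

# Hypothetical segment PIs for the segments on the path B -> D -> F -> G.
segment_pis = {"B": 0.6, "D": 0.5, "F": 0.8, "G": 0.9}
pi_value = site_pi(segment_pis, ["B", "D", "F", "G"])   # 0.6 * 0.5 * 0.8 * 0.9

example_segment_pi = segment_pi(30.0, 120.0)            # made-up areas
```

Because every segment PI is at most 1, the site PI shrinks as more confluences separate the two sites.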


Data for geostatistical modeling l.jpg
Data for Geostatistical Modeling

  • Distance matrices

    • SLD, SHD, AHD

  • Spatial weights matrix

    • Contains flow dependent weights for WAHD

  • Watershed covariates

    • Lumped watershed covariates

      • Mean elevation, % Urban

  • Observations

    • MBSS survey sites


Slide25 l.jpg

Geostatistical Modeling Methods

  • Validation Set

  • Unique for each chemical response variable

  • Initial Covariate Selection

  • 5 covariates

  • Model Development

  • Restricted model space to all possible linear models

  • 4 model sets:


Slide26 l.jpg

Geostatistical Modeling Methods

  • Geostatistical model parameter estimation

  • Maximize the profile log-likelihood function

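Log-likelihood function of the parameters ($\beta$, $\sigma^2$, $\theta$) given the observed data $Z$, written here in the standard Gaussian form with $Z \sim N(X\beta, \sigma^2 V(\theta))$, is:

$$\ell(\beta, \sigma^2, \theta; Z) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\ln|V(\theta)| - \frac{1}{2\sigma^2}(Z - X\beta)^\top V(\theta)^{-1}(Z - X\beta)$$

Maximizing the log-likelihood with respect to $\beta$ and $\sigma^2$ yields:

$$\hat\beta = \left(X^\top V(\theta)^{-1} X\right)^{-1} X^\top V(\theta)^{-1} Z$$

and

$$\hat\sigma^2 = \frac{(Z - X\hat\beta)^\top V(\theta)^{-1} (Z - X\hat\beta)}{n}$$

Both maximum likelihood estimators can be written as functions of $\theta$ alone

Derive the profile log-likelihood function by substituting the MLEs ($\hat\beta$, $\hat\sigma^2$) back into the log-likelihood function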
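A minimal sketch of the profile log-likelihood, assuming a Gaussian model Z ~ N(Xβ, σ²V(θ)) in which V(θ) is the correlation structure implied by the autocorrelation parameters. The function name and the toy data are assumptions. For a fixed V, the regression coefficients and σ² have closed forms, so only θ needs a numerical optimizer.

```python
import numpy as np

def profile_log_likelihood(V, X, z):
    """Gaussian log-likelihood with beta and sigma^2 replaced by their MLEs for a given V."""
    n = len(z)
    V_inv_X = np.linalg.solve(V, X)
    V_inv_z = np.linalg.solve(V, z)
    # Generalized least squares estimate of the regression coefficients.
    beta_hat = np.linalg.solve(X.T @ V_inv_X, X.T @ V_inv_z)
    resid = z - X @ beta_hat
    # MLE of the variance scale (divides by n, not n - p).
    sigma2_hat = resid @ np.linalg.solve(V, resid) / n
    _, logdet_V = np.linalg.slogdet(V)
    # The quadratic term collapses to n once sigma2_hat is substituted.
    loglik = -0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2_hat) + logdet_V + n)
    return loglik, beta_hat, sigma2_hat

# Toy check: with V = I the estimates reduce to ordinary least squares.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([1.0, 3.0, 5.0, 8.0])
loglik, beta_hat, sigma2_hat = profile_log_likelihood(np.eye(4), X, z)
```

In practice this function would be called inside an optimizer that rebuilds V from trial values of the nugget, sill, and range.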


Slide27 l.jpg

Geostatistical Modeling Methods

  • Covariance matrix for SLD and SHD models

  • Fit exponential autocorrelation function (C1), where C1 is the covariance based on the distance between two sites, h, given the autocorrelation parameter estimates: nugget, sill, and range.

  • Covariance matrix for WAHD model

  • Fit exponential autocorrelation function (C1)

  • Hadamard (element-wise) product of C1 & square root of spatial weights matrix forced into symmetry
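The two covariance constructions can be sketched as follows. Several details are assumptions rather than the presentation's exact definitions: the exponential form `partial_sill * exp(-h / range)` (other parameterizations scale the range differently), treating the sill as a partial sill with the nugget added at zero distance, and symmetrizing the square-rooted weights by adding the transpose and setting the diagonal to 1.

```python
import numpy as np

def exponential_covariance(h, nugget, partial_sill, range_):
    """Exponential covariance C1(h); the nugget contributes only at h = 0."""
    h = np.asarray(h, dtype=float)
    return partial_sill * np.exp(-h / range_) + nugget * (h == 0)

def wahd_covariance(hydro_dist, pi, nugget, partial_sill, range_):
    """Hadamard product of C1 and the symmetrized square root of the spatial weights.

    pi[i, j] is the proportional influence of upstream site i on downstream
    site j, and 0 where the sites are not flow-connected in that direction.
    """
    c1 = exponential_covariance(hydro_dist, nugget, partial_sill, range_)
    root_w = np.sqrt(np.asarray(pi, dtype=float))
    root_w_sym = root_w + root_w.T       # at most one of [i, j], [j, i] is nonzero
    np.fill_diagonal(root_w_sym, 1.0)    # a site is fully connected to itself
    return c1 * root_w_sym               # element-wise (Hadamard) product

# Toy network: sites 0 and 1 sit on separate tributaries that both drain to site 2.
hydro_dist = np.array([[0.0, 30.0, 10.0],
                       [30.0, 0.0, 20.0],
                       [10.0, 20.0, 0.0]])
pi = np.array([[0.0, 0.0, 0.36],
               [0.0, 0.0, 0.64],
               [0.0, 0.0, 0.0]])
cov = wahd_covariance(hydro_dist, pi, nugget=0.5, partial_sill=2.0, range_=10.0)
```

Sites 0 and 1 are not flow-connected, so their covariance is zero regardless of how close they are along the network.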


Slide28 l.jpg

Geostatistical Modeling Methods

  • Model selection within model set

  • GLM: corrected Akaike Information Criterion (AICC)

  • Geostatistical models: Spatial AICC (Hoeting et al., in press)

In the spatial AICC, n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.

http://www.stat.colostate.edu/~jah/papers/spavarsel.pdf

  • Model selection between model types

  • 100 Predictions: Universal kriging algorithm

  • Mean square prediction error (MSPE)

  • Cannot use AICC to compare models based on different distance measures

  • Model comparison: r2 for observed vs. predicted values
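The three comparison statistics on this slide can be sketched as below. The spatial AICC expression is an assumption: it is the form commonly attributed to Hoeting et al., with n observations, p regression coefficients (p-1 covariates plus an intercept), and k autocorrelation parameters, and should be checked against that paper. The r2 here is the squared correlation between observed and predicted values, which is also an assumption about how the slide's r2 was computed.

```python
import numpy as np

def spatial_aicc(log_lik, n, p, k):
    """Corrected AIC for a geostatistical model (assumed form, see lead-in)."""
    return -2.0 * log_lik + 2.0 * n * (p + k + 1) / (n - p - k - 2)

def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((observed - predicted) ** 2)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(observed, predicted)[0, 1] ** 2

# Made-up numbers for illustration only.
aicc_value = spatial_aicc(log_lik=-100.0, n=50, p=3, k=3)
mspe_value = mspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
r2_value = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

AICC ranks covariate sets within one distance measure, while MSPE and r2 on held-out sites compare models across distance measures.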


Slide29 l.jpg

Results

  • Summary statistics for distance measures

  • Spatial neighborhood differs

  • Affects number of neighboring sites

  • Affects median, mean, and maximum separation distance

[Table: Summary statistics for distance measures in kilometers using DO (n=826).]

* Asymmetric hydrologic distance is not weighted here


Slide30 l.jpg

Results

[Figure: range of spatial autocorrelation for the SLD, SHD, and WAHD models; labeled values 180.79 and 301.76]

Mean Range Values

SLD = 28.2 km

SHD = 88.03 km

WAHD = 57.8 km

  • Range of spatial autocorrelation differs:

  • Shortest for SLD

  • TEMP = shortest range values

  • DO = largest range values


Slide31 l.jpg

Results

[Figure: MSPE for the GLM, SLD, SHD, and WAHD models]

  • Distance Measures:

  • GLM always has less predictive ability

  • More than one distance measure usually performed well

    • SLD, SHD, WAHD: PHLAB & DOC

    • SLD and SHD : ANC, DO, NO3

    • WAHD & SHD: COND, TEMP

  • SLD distance: SO4


Slide32 l.jpg

Results

[Figure: r2 for the GLM, SLD, SHD, and WAHD models]

Predictive ability of models:

Strong: ANC, COND, DOC, NO3, PHLAB

Weak: DO, TEMP, SO4


Slide33 l.jpg

Discussion

[Figure: panels labeled SLD, SHD, and WAHD]

Distance measure influences how spatial relationships are represented in a stream network

  • Site’s relative influence on other sites

  • Dictates form and size of spatial neighborhood

  • Important because…

  • Impacts accuracy of the geostatistical model predictions


Slide34 l.jpg

Patterns of spatial autocorrelation found at relatively coarse scale

  • Geostatistical models describe more variability than GLM

SLD, SHD, and WAHD represent spatial autocorrelation in continuous coarse-scale variables

  • More than one distance measure performed well

  • SLD never substantially inferior

  • Do not represent movement through network

  • Different range of spatial autocorrelation?

  • Larger SHD and WAHD range values

  • Separation distance larger when restricted to network

[Figure: panels labeled SLD and SHD]


Slide35 l.jpg

Discussion

[Figure: histogram of Frequency vs. Number of Neighboring Sites]

Sample Size = 881

244 sites did not have neighbors

Number of sites with ≤1 neighbor: 393

Mean number of neighbors per site: 2.81

  • Probability-based random survey design negatively affected WAHD

  • Maximize spatial independence of sites

  • Does not represent spatial relationships in networks

  • Validation sites randomly selected


Slide36 l.jpg

Discussion

[Figure: Difference (O – E) vs. Number of Neighboring Sites (0 to 17) for WAHD and GLM]

WAHD models explained more variability as neighboring sites increased

  • Not when neighbors had:

  • Similar watershed conditions

  • Significantly different chemical response values


Slide37 l.jpg

Discussion

[Figure: Difference (O – E) vs. Number of Neighboring Sites (0 to 17) for WAHD and GLM]

  • GLM predictions improved as number of neighbors increased

  • Clusters of sites in space have similar watershed conditions

    • Statistical regression pulled towards the cluster

  • GLM contained hidden spatial information

    • Explained additional variability in data with more neighbors


Slide38 l.jpg

Predictive Ability of Geostatistical Models

[Figure: r2 (0 to 1.0) for COND, SO4, ANC, PH, NO3, DOC, TEMP, and DO, plotted against the scale of influential ecological processes (Coarse to Fine)]


Slide39 l.jpg

Conclusions

  • Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale

  • Geostatistical models improve the accuracy of water chemistry predictions

  • Patterns of spatial autocorrelation differ between chemical response variables

    • Ecological processes acting at different spatial scales

  • SLD is the most suitable distance measure at regional scale at this time

    • Unsuitable survey designs

    • SHD: GIS processing time is prohibitive


Slide40 l.jpg

Conclusions

  • Results are scale specific

    • Spatial patterns change with survey scale

    • Other patterns may emerge at shorter separation distances

  • Further research is needed at finer scales

    • Watershed or small stream network

  • New survey designs for stream networks

    • Capture both coarse and fine scale variation

    • Ensure that hydrologic neighborhoods are represented


Slide41 l.jpg

Predicting Water Quality Impaired Stream Segments using Landscape-scale Data and a Regional Geostatistical Model: A Case Study In Maryland


Slide42 l.jpg

Objective

Demonstrate how a geostatistical methodology can be used to complement regional water quality monitoring efforts

  • Predict regional water quality conditions

  • Identify the spatial location of potentially impaired stream segments


Slide43 l.jpg

1996 MBSS DOC Data

[Figure: map of 1996 MBSS DOC data with north arrow and scale bar (0 to 20 kilometers)]


Slide44 l.jpg

Methods

Potential covariates


Slide45 l.jpg

Methods

Potential covariates after initial model selection (10)


Slide46 l.jpg

Methods

  • Fit geostatistical models

  • Two distance measures: SLD and WAHD

  • Restricted model space to all possible linear models

  • 1024 models per set

  • 9 model sets

  • Parameter Estimation

  • Maximized profile log-likelihood function


Slide47 l.jpg

Methods

Model selection within distance measure & autocorrelation function

  • Spatial AICC (Hoeting et al., in press)

Model selection between distance measure & autocorrelation function

  • Cross-validation method using Universal kriging algorithm

    • 312 predictions

  • MSPE

  • Model comparison: r2 for the observed vs. predicted values


Slide48 l.jpg

Results

[Figure: MSPE by Autocorrelation Function (Mariah, Linear with Sill, Rational Quadratic, Spherical, Exponential, Hole Effect)]

  • SLD models performed better than WAHD

  • Exception: Spherical model

  • Best models:

  • SLD Exponential, Mariah, and Rational Quadratic models

  • r2 for SLD model predictions

  • Almost identical

  • Further analysis restricted to SLD Mariah model


Slide49 l.jpg

Results

  • Covariates for SLD Mariah model:

  • WATER, EMERGWET, WOODYWET, FELPERC, & MINTEMP

  • Positive relationship with DOC:

  • WATER, EMERGWET, WOODYWET, MINTEMP

  • Negative relationship with DOC:

  • FELPERC


Slide50 l.jpg

Cross-validation intervals for Mariah model regression coefficients

Model coefficients represent change in log10 DOC per unit of X

  • Cross-validation interval: 95% of regression coefficients produced by leave-one-out cross validation procedure

  • Narrow intervals

  • Few extreme regression coefficient values

    • Not produced by common sites

    • Covariate values for the site are represented in observed data

    • Not clustered in space


Slide51 l.jpg

r2 Observed vs. Predicted Values

n = 312 sites

r2 = 0.72

1 influential site

r2 without site = 0.66



Slide53 l.jpg

Discussion

  • SLD models more accurate than WAHD models

  • Landscape-scale covariates were not restricted to watershed boundaries

  • Geology type

  • Temperature

  • Wetlands & water


Slide54 l.jpg

Discussion

  • Regression Coefficients

  • Narrow cross-validation intervals

  • Spatial location of the sites not as important as watershed characteristics

  • Extreme regression coefficient values

  • Not produced by common sites

  • Not clustered in space

  • Local-scale factor may have affected stream DOC

  • Point source of organic waste


Slide55 l.jpg

Spatial Patterns in Model Fit

[Figure: map of SPE values]

  • North and east of Chesapeake Bay - large SPE values

  • Naturally acidic blackwater streams with elevated DOC

  • Not well represented in observed dataset

    • 2 blackwater sites

  • Geostatistical model unable to account for natural variability

    • Large square prediction errors

    • Large prediction variances


Slide56 l.jpg

Spatial Patterns in Model Fit

[Figure: map of SPE values]

  • West of Chesapeake Bay - low SPE values

  • Due to statistical and spatial distribution of observed data

    • Regression equation fit to the mean in the data

    • Most observed sites = low DOC values

  • Less variation in western and central Maryland

    • Neighboring sites tend to be similar

  • Separation distances shorter in the west

    • Short separation distances = stronger covariances


Slide57 l.jpg

Model Performance

Unable to account for abrupt differences in DOC values between neighboring sites with similar watershed conditions

  • What caused abrupt differences?

  • Point sources of organic pollution

    • Not represented in the model

  • Non-point sources of pollution

    • Lumped watershed attributes are non-spatial

    • Differences due to spatial location of landuse are not represented

    • Challenging to represent ecological processes using coarse-scale lumped attributes

      • e.g., flow path of water


Slide58 l.jpg

Generate Model Predictions

  • Prediction sites

  • Study area

    • 1st, 2nd, and 3rd order non-tidal streams

    • 3083 segments = 5973 stream km

  • ID downstream node of each segment

    • Create prediction site

  • More than one site at each confluence

  • Generate predictions and prediction variances

  • SLD Mariah model

  • Universal kriging algorithm

  • Assigned predictions and prediction variances back to stream segments in GIS
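A minimal sketch of the universal kriging step that produces a prediction and a prediction variance at one site. This is the textbook predictor, not code from the FLoWS tools. The function name, argument names, and toy data are assumptions. `Sigma` is the fitted covariance matrix among observed sites, `c0` the covariances between the prediction site and the observed sites, and `sigma0_sq` the variance at the prediction site.

```python
import numpy as np

def universal_kriging(Sigma, X, z, c0, x0, sigma0_sq):
    """Universal kriging prediction and prediction variance at a single site."""
    Sigma_inv_X = np.linalg.solve(Sigma, X)
    A = X.T @ Sigma_inv_X                              # X' Sigma^-1 X
    beta_hat = np.linalg.solve(A, Sigma_inv_X.T @ z)   # GLS coefficients
    lam = np.linalg.solve(Sigma, c0)                   # Sigma^-1 c0
    # Trend at the new site plus a weighted sum of the observed residuals.
    prediction = x0 @ beta_hat + lam @ (z - X @ beta_hat)
    # Extra variance from having estimated the trend coefficients.
    d = x0 - X.T @ lam
    variance = sigma0_sq - c0 @ lam + d @ np.linalg.solve(A, d)
    return prediction, variance

# Toy check: with no nugget, predicting at an observed site returns the observation.
coords = np.array([0.0, 1.0, 3.0])
Sigma = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)
X = np.column_stack([np.ones(3), coords])   # intercept plus one made-up covariate
z = np.array([2.0, 3.0, 7.0])
pred, var = universal_kriging(Sigma, X, z, c0=Sigma[:, 1], x0=X[1], sigma0_sq=Sigma[1, 1])
```

Applied to the downstream node of each stream segment, the returned pair is what gets assigned back to the segment in GIS.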





Slide62 l.jpg

Water Quality Attainment by Stream Kilometers

  • Threshold values for DOC

  • Set by Maryland Department of Natural Resources

  • High DOC values may indicate biological or ecological stress


Slide63 l.jpg

Implications for Water Quality Monitoring

  • Can be used to provide an estimate of regional stream DOC values

  • Cannot identify point sources of organic pollution

  • Tradeoff between cost-efficiency and model accuracy

  • Western Maryland

    • Can be described using a single geostatistical model

  • Eastern and northeastern Maryland

    • Accept poor model fit

    • Collect additional survey data

    • Develop a separate geostatistical model for eastern Maryland


Slide64 l.jpg

Implications for Water Quality Monitoring

  • Apply this methodology to other regulated indices

  • e.g. conductivity and pH

  • Categorize predictions into potentially impaired or unimpaired status

  • Report on attainment in stream miles/kilometers
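Categorizing predictions and reporting attainment by stream length can be sketched as below. The threshold and segment values are made up for illustration; the actual DOC thresholds are set by the Maryland Department of Natural Resources and are not given in this transcript. The function name is hypothetical.

```python
def attainment_by_length(segments, threshold):
    """Summarize potentially impaired stream length.

    segments: iterable of (predicted_value, length_km) pairs, one per stream segment.
    A segment is flagged as potentially impaired when its prediction exceeds the threshold.
    """
    segments = list(segments)
    total_km = sum(length for _, length in segments)
    impaired_km = sum(length for value, length in segments if value > threshold)
    return {
        "impaired_km": impaired_km,
        "unimpaired_km": total_km - impaired_km,
        "percent_impaired": 100.0 * impaired_km / total_km,
    }

# Hypothetical predictions (mg/l) and segment lengths (km).
summary = attainment_by_length([(2.0, 1.5), (9.0, 2.0), (4.0, 0.5)], threshold=8.0)
```

The prediction variance could be used alongside the threshold, for example to flag segments whose status is uncertain, but that refinement is not shown here.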


Conclusions l.jpg
Conclusions

  • Geostatistical models generated more accurate DOC predictions than previous non-spatial models based on coarse-scale landscape data

  • SLD is more appropriate than WAHD for regional geostatistical modeling of DOC at this time

    • Probability-based random survey designs

    • Maryland, USA

  • Adds value to existing water quality monitoring efforts

    • Used to evaluate/report regional water quality conditions

    • Additional field sampling is not necessary

    • Generate inferences about regional stream condition

    • ID spatial location of potentially impaired stream segments


Conclusions66 l.jpg
Conclusions

  • Model predictions and prediction variances

    • Additional field efforts concentrated in

      • Areas with large amounts of uncertainty

      • Areas with a greater potential for water quality impairment

  • Model results displayed visually

    • Communicate results to a variety of audiences


Questions l.jpg
Questions?