
# Statistics in WR: Session 20


### Statistics in WR: Session 20

Introduction to Spatial Statistics

Ernest To

• Basics of spatial statistics

• Kriging

• Application of spatial-temporal statistics (Gravity currents in CCBay)

Ernest To 20090408

### Basics

Consider the following scenario.

• Two river stations, A and B, measure dissolved oxygen (DO).

• At Station A:

• mean DO = µA = 5 mg/L

• std dev = σA = 2 mg/L

• At Station B:

• mean DO = µB = 5 mg/L

• std dev = σB = 2 mg/L

• Correlation between measurements at Stations A and B = ρAB = 0.5.

• We collected a DO measurement of 2 mg/L at Station A.

• What is the updated mean (µB|XA ) and standard deviation (σB|XA) at Station B?

• (assume that the DO distributions are normal)

(Figure: Station A with µA = 5 mg/L, σA = 2 mg/L and the new sample XA = 2 mg/L; Station B with µB = 5 mg/L, σB = 2 mg/L and the unknowns µB|XA and σB|XA.)

• Distributions at A and B (assume normal)

• Joint distribution at A and B

(Figures: marginal pdfs f(xA) and f(xB), and the joint pdf f(xA,xB) over the XA-XB plane, for µA = 5 mg/L, σA = 2 mg/L, µB = 5 mg/L, σB = 2 mg/L.)

How does ρAB affect the shape of the joint distribution?

(Figure: scatter plots of XA vs XB and the corresponding joint distributions f(xA,xB) for ρAB = 0.99, ρAB = -0.99, ρAB = 0.5 and ρAB = 0.)

PRIOR STAGE

The prior pdf is the joint distribution of XA and XB.

CONDITIONALIZATION STAGE

Observed data is used to update the distribution: xA = 2 mg/L.

POSTERIOR STAGE

A conditional pdf for XB is generated.

(Figure: the prior joint pdf over the XA-XB plane is sliced at xA = 2 mg/L to give the conditional pdf of XB.)

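If the prior pdf is binormal, the conditional pdf of XB given XA = xA is also normal, with:

Mean:

$$\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$$

Variance:

$$\sigma_{B|X_A}^2 = \sigma_B^2\,(1 - \rho_{AB}^2)$$

(The variance does not depend on the observed value xA: homoscedasticity.)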

The expected value of the conditional pdf is a linear function of the conditioning data.

Updated mean and std. dev. at Station B, given the new sample XA = 2 mg/L:

Mean:

$$\mu_{B|X_A} = 5 + 0.5 \cdot \frac{2}{2}\,(2 - 5) = 3.5 \text{ mg/L}$$

Std. dev.:

$$\sigma_{B|X_A} = 2\sqrt{1 - 0.5^2} \approx 1.7 \text{ mg/L}$$
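As a quick check of the two-station example, the update could be sketched in Python as follows. The function name `conditional_normal` is illustrative and not from the slides; it assumes a bivariate normal prior, as the slides do.

```python
import math

def conditional_normal(mu_a, sigma_a, mu_b, sigma_b, rho_ab, x_a):
    """Mean and std. dev. of X_B given X_A = x_a for a bivariate normal prior."""
    # The conditional mean shifts linearly with the observed residual at A.
    mean = mu_b + rho_ab * (sigma_b / sigma_a) * (x_a - mu_a)
    # The conditional std. dev. depends only on the correlation, not on x_a.
    std = sigma_b * math.sqrt(1.0 - rho_ab ** 2)
    return mean, std

mean_b, std_b = conditional_normal(
    mu_a=5.0, sigma_a=2.0, mu_b=5.0, sigma_b=2.0, rho_ab=0.5, x_a=2.0
)
print(f"mean = {mean_b:.1f} mg/L, std = {std_b:.1f} mg/L")  # mean = 3.5 mg/L, std = 1.7 mg/L
```

With ρAB = 0 the observation at A would leave the distribution at B unchanged, and as |ρAB| approaches 1 the conditional std. dev. shrinks toward zero.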

Can we do the same for any pair of locations? Yes, we can, but only under the following conditions:

• Normality

• 2nd-order stationarity:

• Mean does not change with location

• Variance does not change with location

• We know the mean and variance.

• We have a function that determines the correlation between two locations.

(Figure: locations A and B, both with µ = 5 mg/L and σ = 2 mg/L.)

In spatial statistics, correlation is modeled as a function of the separation distance between two points:

$$\rho = f(h)$$

where h = separation distance (aka lag).

Most of the time, correlation decreases with distance (things that are closer together tend to be more correlated with each other).

Imagine the case where we have a smattering of data along an axis.

Any given pair of data points, i and j, will have two properties:

1. The semivariance, $\gamma_{ij} = \tfrac{1}{2}(Z_i - Z_j)^2$

2. The separation distance, $h_{ij}$

(Figure: data points i and j on an axis, with measured values Zi and Zj, separated by the distance hij.)

We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a semivariogram.
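A minimal sketch of building that plot's data with NumPy is shown below. The positions and values are made up for illustration, and averaging the cloud within lag bins is a common convention rather than something the slides specify.

```python
import numpy as np

def semivariogram_cloud(x, z):
    """Return lags h_ij and semivariances 0.5*(z_i - z_j)**2 for all pairs i < j."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(x), k=1)   # every unordered pair once
    lags = np.abs(x[i] - x[j])
    gammas = 0.5 * (z[i] - z[j]) ** 2
    return lags, gammas

def binned_semivariogram(lags, gammas, bin_edges):
    """Average the cloud within lag bins [lo, hi) to get an empirical semivariogram."""
    centers, means = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (lags >= lo) & (lags < hi)
        if in_bin.any():
            centers.append(0.5 * (lo + hi))
            means.append(gammas[in_bin].mean())
    return np.array(centers), np.array(means)

# Hypothetical positions along the axis and measured values, for illustration only.
x = [0.0, 1.0, 3.0, 4.0]
z = [5.0, 6.0, 4.0, 7.0]
lags, gammas = semivariogram_cloud(x, z)
centers, means = binned_semivariogram(lags, gammas, bin_edges=[0.0, 2.0, 5.0])
```

Plotting `gammas` against `lags` gives the raw cloud; `means` against `centers` gives the smoother curve through which a model is usually fitted.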

We can fit a curve through the semivariogram to model the semivariance as a function of the lag. This is the variogram model.

(Figure: fitted variogram model annotated with its sill, the level at which the curve flattens out, and its range, the lag at which the sill is reached.)

Assuming that the mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model by the equation:

$$\gamma(h) = \sigma^2 - C(h)$$

where σ² is the variance and C(h) is the covariance at lag h.

Assuming that the variance does not change with location (assumption of stationarity), the correlation model is related to the covariance model by the equation:

$$\rho(h) = \frac{C(h)}{\sigma^2}$$

(Figure: plot of ρ(h) against h, with ρ on a scale from 0 to 1.)
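The chain from variogram to covariance to correlation can be sketched as below. The spherical model and its parameters are assumptions chosen for illustration; any valid variogram model with a sill equal to the variance could be substituted.

```python
import numpy as np

def spherical_gamma(h, sill, a):
    """Spherical variogram without nugget: rises from 0 to `sill` at lag `a`."""
    h = np.asarray(h, dtype=float)
    inside = sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, inside, sill)

def covariance_from_variogram(gamma, variance):
    """C(h) = sigma^2 - gamma(h), valid under second-order stationarity."""
    return variance - gamma

def correlation_from_covariance(cov, variance):
    """rho(h) = C(h) / sigma^2."""
    return cov / variance

# Hypothetical parameters: variance (= sill) of 4 (mg/L)^2 and a range of 10 lag units.
variance = 4.0
h = np.array([0.0, 5.0, 10.0, 20.0])
gamma = spherical_gamma(h, sill=variance, a=10.0)
cov = covariance_from_variogram(gamma, variance)
rho = correlation_from_covariance(cov, variance)
```

At zero lag the correlation is 1, and at or beyond the range it drops to 0, which mirrors the variogram reaching its sill.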

(Figure: scatter plots of XA vs XB, the joint distribution of XA and XB, and the conditional distribution of XB|XA for ρAB = 0, ρAB = 0.5 and ρAB = 0.99, arranged along an axis of increasing h.)

### Kriging

What if we have more than one location that provides conditioning data?

(Assume distributions are STILL normal at all locations.)

• At stations A1, A2, A3, A4:

• µA1 = µA2 = µA3 = µA4 = 5 mg/L

• σA1 = σA2 = σA3 = σA4 = 2 mg/L

• At station B:

• mean DO = µB = 5 mg/L

• std dev = σB = 2 mg/L

• Correlation model: $\rho = f(h) = 0.0125h^2 - 0.225h + 1$

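Correlation model: $\rho = f(h) = 0.0125h^2 - 0.225h + 1$, with h the distance along the river in hundreds of meters.

(Figure: stations A1, A2, A3, A4 and B in that order along the river, with neighboring stations 2 units (200 m) apart.)

From the correlation model:

• Data to estimation point: ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6

• Data to data: ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6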
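These correlations can be reproduced from the model with a few lines of NumPy. The sketch assumes the stations sit 2 units (hundreds of meters) apart in the order A1, A2, A3, A4, B; putting A1 at position 0 is an arbitrary choice of origin.

```python
import numpy as np

def rho_model(h):
    """Correlation model from the slides; h is in hundreds of meters."""
    return 0.0125 * h ** 2 - 0.225 * h + 1.0

# Positions along the river in hundreds of meters (origin at A1 is arbitrary).
positions = {"A1": 0.0, "A2": 2.0, "A3": 4.0, "A4": 6.0, "B": 8.0}
names = list(positions)
coords = np.array([positions[n] for n in names])

# Matrix of pairwise lags, then the correlation for every pair of stations.
lags = np.abs(coords[:, None] - coords[None, :])
rho = rho_model(lags)

rho_data_to_b = rho[:-1, -1]       # A1..A4 against B
rho_data_to_data = rho[:-1, :-1]   # A1..A4 against each other
```

Rounded to one decimal, `rho_data_to_b` and the off-diagonal entries of `rho_data_to_data` match the values listed from the correlation model.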

Divide the locations into two groups:

• The vector, $\mathbf{X}_A = [X_{A1}, X_{A2}, X_{A3}, X_{A4}]^T$, representing the set of random variables at the locations contributing the conditioning data.

• The variable, $X_B$, representing the random variable at the point of estimation.

1. If the individual distributions are normal, we take the joint pdf to be multi-normal (normal marginals alone do not guarantee this, so it is an assumption).

2. Group the variables into two: one for the points with data, one for the point of estimation.

3. Intersect the prior pdf with the conditioning data to get the conditional pdf.

(Figure: prior pdf over XA1, XA2, XA3, XA4 and XB.)

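The updated mean and variance of the distribution at Station B (the conditional pdf) are given by:

Mean:

$$\mu_{B|\mathbf{X}_A} = \mu_B + \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_A)$$

Variance:

$$\sigma_{B|\mathbf{X}_A}^2 = \sigma_B^2 - \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,\mathbf{C}_{AB}$$

Where:

• $\mathbf{x}_A$ = observed values at A1 to A4, and $\boldsymbol{\mu}_A$ = their prior means

• $\mathbf{C}_{AA}$ = covariance matrix among the data locations

• $\mathbf{C}_{AB} = \mathbf{C}_{BA}^T$ = covariance vector between the data locations and the estimation point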

Recall the two-variable case:

$$\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A), \qquad \sigma_{B|X_A}^2 = \sigma_B^2\,(1 - \rho_{AB}^2)$$

• The multivariable case takes into account:

• Correlation between the data locations and the estimated location ($\mathbf{C}_{AB}$).

• Correlation among the data locations ($\mathbf{C}_{AA}$).

• This is the most fundamental form of kriging, i.e. Simple Kriging.

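• Recall that Cov(A,B) = ρAB σA σB

• Compute the data-to-data covariance (σ = 2 mg/L at every station, so each covariance is 4ρ):

$$\mathbf{C}_{AA} = 4\begin{bmatrix} 1 & 0.6 & 0.3 & 0.1 \\ 0.6 & 1 & 0.6 & 0.3 \\ 0.3 & 0.6 & 1 & 0.6 \\ 0.1 & 0.3 & 0.6 & 1 \end{bmatrix}$$

• Compute the data-to-estimation-point covariance:

$$\mathbf{C}_{AB} = 4\begin{bmatrix} 0.0 \\ 0.1 \\ 0.3 \\ 0.6 \end{bmatrix}$$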

The weights are:

$$\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n] = \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}$$

where $\mathbf{C}_{AA}$ is the data-to-data covariance matrix and $\mathbf{C}_{BA}$ is the covariance between the estimation point and the data locations. The updated mean is the prior mean plus the weighted sum of the data residuals. Plug and chug.

Note: The weights attributed to each station are determined by the prior (joint distribution) among them.

The updated mean and standard deviation of the distribution at Station B are:

Mean:

Standard deviation:

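A minimal simple-kriging sketch for the five-station example follows. It uses the standard simple-kriging relations: the weights λ solve C_AA λ = C_AB, the updated mean is µB + λ·(xA − µA), and the updated variance is σB² − λ·C_AB. The correlations come from the correlation model listed earlier; the observed values at A1 to A4 are hypothetical, because the slide values did not survive in this transcript.

```python
import numpy as np

sigma = 2.0   # std. dev. at every station (mg/L)
mu = 5.0      # prior mean at every station (mg/L)

# Correlations from the correlation model (order A1, A2, A3, A4).
rho_aa = np.array([
    [1.0, 0.6, 0.3, 0.1],
    [0.6, 1.0, 0.6, 0.3],
    [0.3, 0.6, 1.0, 0.6],
    [0.1, 0.3, 0.6, 1.0],
])
rho_ab = np.array([0.0, 0.1, 0.3, 0.6])

c_aa = sigma ** 2 * rho_aa   # data-to-data covariance
c_ab = sigma ** 2 * rho_ab   # data-to-estimation-point covariance

# Simple kriging weights: solve C_AA @ weights = C_AB rather than inverting C_AA.
weights = np.linalg.solve(c_aa, c_ab)

# HYPOTHETICAL observations at A1..A4 (mg/L), for illustration only.
x_a = np.array([3.0, 4.0, 4.5, 6.0])

mean_b = mu + weights @ (x_a - mu)    # updated mean at B
var_b = sigma ** 2 - weights @ c_ab   # updated variance at B
std_b = np.sqrt(var_b)
print(weights, mean_b, std_b)
```

The updated standard deviation does not depend on the observed values, only on the station geometry through the covariances, so it holds for any data collected at A1 to A4.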

• Ordinary kriging (OK)

• Does not require mean to be known

• Assumes that mean is constant and is somewhere in the range of the conditioning data

• Universal kriging (UK)

• Does not require mean to be known nor require it to be constant

• User specifies a model for the trend in mean. UK will then fit the model to the data.

• Indicator kriging (IK)

• Handles binary variables (0 or 1)

• Can accommodate non-normality in the data through iterative application

• Co-kriging (CK)

• Takes into account a related secondary variable to help estimate the primary variable

In more than one spatial dimension:

• The lag can be represented by the Euclidean distance between two points.

• So the covariance model of the form C = f(h) can still be used.

• Variables may be more correlated in one direction than in another (anisotropy).

• A linear transformation can be applied to the coordinates so that the correlation distance is the same in all directions (isotropy).
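One common way to carry out that linear transformation in 2-D is a rotation followed by a stretch of the minor axis. The sketch below assumes geometric anisotropy described by an angle and a range ratio; the function name and parameters are illustrative, not from the slides.

```python
import numpy as np

def isotropic_lag(p, q, angle_deg, ratio):
    """Lag between 2-D points p and q after removing geometric anisotropy.

    angle_deg: direction of the longest correlation range,
               counterclockwise from the x-axis.
    ratio:     (range along the minor axis) / (range along the major axis),
               with 0 < ratio <= 1.
    """
    theta = np.radians(angle_deg)
    # Rotate coordinates so the major axis lies along x.
    rotation = np.array([[np.cos(theta), np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])
    # Stretch the minor axis so both axes share the major-axis range.
    scaling = np.diag([1.0, 1.0 / ratio])
    d = scaling @ rotation @ (np.asarray(q, dtype=float) - np.asarray(p, dtype=float))
    return float(np.hypot(d[0], d[1]))

# With ratio = 0.5, one unit across the major axis counts as two units of lag.
lag_along = isotropic_lag((0.0, 0.0), (1.0, 0.0), angle_deg=0.0, ratio=0.5)
lag_across = isotropic_lag((0.0, 0.0), (0.0, 1.0), angle_deg=0.0, ratio=0.5)
```

The transformed lag can then be fed into a single isotropic model C = f(h) whose range is the major-axis range.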

• For space and time, there is no standard space-time metric.

• A Euclidean-style lag that mixes space and time, e.g. $h = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2 + \Delta t^2}$, is not always correct because the temporal and spatial axes are not always orthogonal to each other.

• Processes that happen in time usually have some dependency on processes that happen in space (they are not independent).

• A separate temporal lag term, τ, is usually used.

• The covariance function then takes on the form $C = f(h, \tau)$.

### Application (Gravity currents in Corpus Christi Bay)

(Figure: map of Corpus Christi Bay, Oso Bay and the Gulf of Mexico showing TCOON stations, TCEQ stations, SERF stations, HRI stations and USGS gages.)

(Figure: aerial photo and bathymetry of Oso Bay, West Laguna and East Laguna, with depressions, ridges and a channel marked; legend levels from 1.0 m to 5.0 m relative to Mean High Water Level.)

• Plume tracking survey: July 14 to 17, 2006 (while the gravity current was on the move). Ben Hodges, University of Texas at Austin.

• Water quality data: July 12 and 18, 2006 (at the birth and demise of the gravity current). Paul Montagna, Texas A&M University, Corpus Christi.

(Figure: synthesis of salinity-versus-depth profiles at t = 0, 1, 2 and 3 along the direction of flow.)

Salinity profiles collected at various locations and times are combined into a time history of the gravity current along the direction of flow.

(Figure: acquired data in the ArcHydro II Time Series Table; HRI stations.)

Data Preparation

1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with plume tracking data.

2. Data locations are projected onto a reference line following the general direction of flow.

• Space-time kriging is performed in 3 dimensions:

• X = Longitudinal measure (meters from origin point)

• Y = Time (days since 7/12/2006)

• Z = Elevation (meters from water surface)

(Figure: reference line along the direction of flow, with the origin at x = 0 m.)

Variogram models fitted for the space-time kriging (h = lag, C0 = nugget, C1 = sill, a = range):

• Along the direction of flow: Gaussian variogram model with C0 = 2 psu², C1 = 3.6 psu², a = 6000 m. (Figure: fitted model annotated with sill, nugget and range.)

• Gaussian variogram model with C0 = 0 psu², C1 = 3.6 psu², a = 1.7 m (presumably the elevation axis, given the range in meters).

• Spherical variogram model with C0 = 0 psu², C1 = 3 psu², a = 1 day (the time axis).
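The fitted equations are not preserved in this transcript, so the sketch below uses common textbook forms as an assumption: a Gaussian model with the practical-range factor of 3 in the exponent, a spherical model, and C1 treated as the contribution added on top of the nugget. The slides may have used a different convention.

```python
import numpy as np

def gaussian_variogram(h, nugget, c1, a):
    """Gaussian model: nugget + c1 * (1 - exp(-3 h^2 / a^2)), with gamma(0) = 0.

    The factor 3 is the practical-range convention (an assumption here).
    """
    h = np.asarray(h, dtype=float)
    gamma = nugget + c1 * (1.0 - np.exp(-3.0 * (h / a) ** 2))
    return np.where(h > 0, gamma, 0.0)

def spherical_variogram(h, nugget, c1, a):
    """Spherical model: reaches nugget + c1 exactly at h = a, with gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    gamma = nugget + c1 * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)

# Parameters quoted on the slides: lag along the direction of flow (m) and in time (days).
gamma_flow = gaussian_variogram([0.0, 3000.0, 6000.0], nugget=2.0, c1=3.6, a=6000.0)
gamma_time = spherical_variogram([0.0, 0.5, 1.0, 2.0], nugget=0.0, c1=3.0, a=1.0)
```

Under this convention the Gaussian model reaches about 95% of C1 above the nugget at h = a, while the spherical model reaches its plateau exactly at the range.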

(Figure: salinity in the space-time volume, with x = distance to origin point, y = time and z = elevation, plus longitudinal profiles on 7/12/2006 18:00 and 7/13/2006 18:00; legend bands from 37 to 46 psu.)

Cross-validation:

• A common method to evaluate variogram models.

• Aka the “fictitious point” method (Delhomme, 1978).

• Remove one data point at a time from the data set, then use the remaining n-1 points to estimate the removed point.

• The estimated and actual values are then compared with each other.
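The procedure can be sketched as a leave-one-out loop around a simple-kriging estimator. The data, the exponential correlation model and the function names below are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

def simple_kriging(x_data, z_data, x0, mean, variance, corr):
    """Simple-kriging estimate and variance at x0 for 1-D locations."""
    x_data = np.asarray(x_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    c_aa = variance * corr(np.abs(x_data[:, None] - x_data[None, :]))
    c_ab = variance * corr(np.abs(x_data - x0))
    weights = np.linalg.solve(c_aa, c_ab)
    estimate = mean + weights @ (z_data - mean)
    return estimate, variance - weights @ c_ab

def leave_one_out(x, z, mean, variance, corr):
    """Estimate each point from the other n-1 points; return estimates and errors."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    estimates = np.empty_like(z)
    for k in range(len(x)):
        keep = np.arange(len(x)) != k   # drop point k, keep the rest
        estimates[k], _ = simple_kriging(x[keep], z[keep], x[k], mean, variance, corr)
    return estimates, estimates - z

# Hypothetical data and correlation model, for illustration only.
corr = lambda h: np.exp(-h / 3.0)
x = [0.0, 2.0, 4.0, 6.0, 8.0]
z = [3.0, 4.0, 4.5, 6.0, 5.5]
estimates, errors = leave_one_out(x, z, mean=5.0, variance=4.0, corr=corr)
rmse = float(np.sqrt(np.mean(errors ** 2)))
```

Repeating the loop for several candidate variogram models and comparing a summary such as `rmse` is one way to choose between them.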

We’ve covered:

• Basics of spatial statistics

• Kriging

• Application of spatial-temporal statistics (Gravity currents in CCBay)

Spatial statistics is fun!


Software for spatial statistics:

• ArcGIS Geostatistical Analyst: easiest to use

• GSLIB: library of Fortran programs

• DeCesare’s version of GSLIB: modification of GSLIB to do space-time kriging

• BMELIB: library of MATLAB programs