
Statistics in WR: Session 20

Introduction to Spatial Statistics

Ernest To


Outline

  • Basics of spatial statistics

  • Kriging

  • Application of spatial-temporal statistics (Gravity currents in CCBay)

Ernest To 20090408



Consider the following scenario

  • Two river stations, A and B, measure dissolved oxygen (DO).

  • At station A

    • mean DO = µA = 5 mg/L

    • std dev at Station A= σA = 2 mg/L

  • At station B

    • mean DO = µB = 5 mg/L

    • std dev at Station B = σB = 2 mg/L

  • Correlation between measurements at stations A and B = ρAB = 0.5.

New data!

  • We collected a DO measurement of 2 mg/L at Station A.

  • What is the updated mean (µB|XA ) and standard deviation (σB|XA) at Station B?

    • (assume that the DO distributions are normal)

  • Station A: µA = 5 mg/L, σA = 2 mg/L; new sample XA = 2 mg/L

  • Station B: µB = 5 mg/L, σB = 2 mg/L; µB|XA = ?, σB|XA = ?

Let’s sketch out the distributions

  • Distributions at A and B (assume normal)

  • Joint distribution at A and B

[Figure: marginal pdfs f(xA) and f(xB), with µA = 5 mg/L, σA = 2 mg/L and µB = 5 mg/L, σB = 2 mg/L, and the joint pdf f(xA, xB) plotted over XA and XB]

Marginal and joint distributions

[Figure: joint pdf f(xA, xB) with the marginal pdfs f(xA) and f(xB) drawn along the XA and XB axes; µA = 5 mg/L, σA = 2 mg/L, µB = 5 mg/L, σB = 2 mg/L]

How does ρAB affect the shape of the joint distribution?

[Figure: scatter plots of XA vs XB and the joint distribution f(xA, xB) of XB and XA, for ρAB = 0.99, ρAB = -0.99, ρAB = 0.5 and ρAB = 0]


Bayesian conditioning

  • Prior stage: the prior pdf is the joint distribution of XA and XB.

  • Conditionalization stage: observed data (xA = 2 mg/L) is used to update the distribution.

  • Posterior stage: a conditional pdf for XB is generated.

[Figure: prior pdf over XA and XB, sliced at xA = 2 mg/L to give the conditional pdf of XB]


Conditional pdf

If the prior pdf is binormal, the conditional pdf is also normal with:

Mean: $\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$

Variance: $\sigma_{B|X_A}^2 = \sigma_B^2\,(1 - \rho_{AB}^2)$

  • The expected value of the conditional pdf is a linear function of the conditioning data.

  • The variance is independent of XA or XB (homoscedasticity).

[Figure: prior pdf over XA and XB, with the conditional pdf of XB|XA at xA = 2 mg/L]

Back to the problem

Updated mean and std. dev at Station B

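Mean: $\mu_{B|X_A} = 5 + 0.5 \cdot \frac{2}{2} \cdot (2 - 5) = 3.5$ mg/L

Std. dev: $\sigma_{B|X_A} = 2\sqrt{1 - 0.5^2} \approx 1.7$ mg/L

  • Station A: µA = 5 mg/L, σA = 2 mg/L; new sample XA = 2 mg/L

  • Station B: µB = 5 mg/L, σB = 2 mg/L; µB|XA = 3.5 mg/L, σB|XA = 1.7 mg/L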
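
The two-station update can be checked numerically. This is a minimal sketch that assumes the standard conditioning formulas for a bivariate normal; the function name `condition_bivariate_normal` is made up for illustration and does not come from the slides.

```python
import math

def condition_bivariate_normal(mu_a, sigma_a, mu_b, sigma_b, rho_ab, x_a):
    """Return (mean, std dev) of X_B given X_A = x_a for a bivariate normal."""
    # Conditional mean: a linear function of the observed value x_a.
    mean_b = mu_b + rho_ab * (sigma_b / sigma_a) * (x_a - mu_a)
    # Conditional std dev: does not depend on x_a (homoscedasticity).
    std_b = sigma_b * math.sqrt(1.0 - rho_ab ** 2)
    return mean_b, std_b

# Values from the scenario: both stations have mean 5 and std dev 2 (mg/L),
# rho_AB = 0.5, and the new sample at Station A is 2 mg/L.
mean_b, std_b = condition_bivariate_normal(5.0, 2.0, 5.0, 2.0, 0.5, 2.0)
print(f"updated mean = {mean_b:.1f} mg/L, updated std dev = {std_b:.1f} mg/L")
# prints: updated mean = 3.5 mg/L, updated std dev = 1.7 mg/L
```

Setting `rho_ab` to 0 returns the prior mean and standard deviation unchanged, which is one quick way to see that the update is driven entirely by the correlation.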

Can we do the same for any two points on the river?

Yes, we can, but only under the following conditions:

  • Normality

  • 2nd order stationarity:

    • Mean does not change with location

    • Variance does not change with location

  • The mean and variance are known.

  • A function is available that gives the correlation between any two locations.

[Figure: river with points A and B; µ = 5 mg/L and σ = 2 mg/L at every location]

Modeling correlation

In spatial statistics, correlation is modeled as a function of the separation distance between two points:

$\rho = f(h)$

where h = separation distance (also known as the lag).

Most of the time, correlation decreases with distance: things that are closer together tend to be more correlated with each other.

Estimating correlation model from data

Imagine the case where we have a smattering of data along an axis.

Any given pair of data points, i and j, will have two properties:

  1. The semivariance, $\gamma_{ij} = \tfrac{1}{2}(Z_i - Z_j)^2$, where Zi and Zj are the measured values at data points i and j.

  2. The separation distance, hij, between data points i and j.


We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a variogram.
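
The pairwise computation can be sketched as follows. The data values are hypothetical, and the binning step is an added convenience that averages the semivariances within each lag bin; the slides themselves only describe plotting every pair.

```python
import numpy as np

def variogram_cloud(x, z):
    """Return lags h_ij and semivariances 0.5*(z_i - z_j)^2 for all pairs i < j."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(x), k=1)   # every unordered pair exactly once
    lags = np.abs(x[i] - x[j])
    gammas = 0.5 * (z[i] - z[j]) ** 2
    return lags, gammas

def binned_variogram(lags, gammas, bin_edges):
    """Average the semivariances of the pairs that fall in each lag bin."""
    which = np.digitize(lags, bin_edges) - 1
    centers, means = [], []
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():                   # skip empty bins
            centers.append(0.5 * (bin_edges[b] + bin_edges[b + 1]))
            means.append(gammas[in_bin].mean())
    return np.array(centers), np.array(means)

# HYPOTHETICAL data: positions along an axis and the values measured there.
x = [0.0, 1.0, 2.5, 4.0, 6.0]
z = [5.1, 4.8, 5.6, 3.9, 4.2]
lags, gammas = variogram_cloud(x, z)
centers, means = binned_variogram(lags, gammas, bin_edges=np.arange(0.0, 8.1, 2.0))
```

With n data points there are n(n-1)/2 pairs, so the raw cloud grows quickly; averaging by lag bin is a common way to get a curve that is easier to fit.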


We can fit a curve through the semivariogram to model the semivariance as a function of the lag. This is the variogram model.

[Figure: variogram model curve fitted through the semivariogram, annotated with its sill and range]


Assuming that the mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model by the equation:

$C(h) = \sigma^2 - \gamma(h)$

where σ² is the variance.


Assuming that the variance does not change with location (assumption of stationarity), the correlation model is related to the covariance model by the equation:

$\rho(h) = C(h) / \sigma^2$

[Figure: correlation model ρ(h) plotted against the lag h, with ρ-axis ticks at .2, .4, .6, .8 and 1]

How does the correlation model affect the estimation

[Figure: for ρAB = 0, ρAB = 0.5 and ρAB = 0.99, the scatter plots of XA vs XB, the joint distribution f(xA, xB) of XA and XB, and the conditional distribution of XB|XA, with an arrow marking the direction of increasing h]

Multivariable case

What if more than one location provides conditioning data?

(Assume distributions are STILL normal at all locations).

  • At station A1, A2, A3, A4

    • µA1 = µA2 = µA3 = µA4 = 5 mg/L

    • σA1 = σA2 = σA3 = σA4 = 2 mg/L

  • At station B

    • mean DO = µB = 5 mg/L

    • std dev at Station B = σB = 2 mg/L

  • ρ = f(h) = 0.0125h² - 0.225h + 1

Modeling correlation

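$\rho = f(h) = 0.0125h^2 - 0.225h + 1$, where h is the distance along the river in hundreds of meters.

[Figure: stations B, A4, A3, A2 and A1 in that order along the river, with neighbouring stations 2 units (200 m) apart]

From the correlation model:

  • Data to estimation point: ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6

  • Data to data: ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6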
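
These correlations can be reproduced from the model with a short sketch. The station coordinates are an assumption read off the figure residue (B at 0, then A4, A3, A2, A1 at 2-unit spacing, in hundreds of meters); the variable names are illustrative.

```python
import numpy as np

def rho(h):
    """Correlation model from the slide; h is in hundreds of meters."""
    return 0.0125 * h**2 - 0.225 * h + 1.0

# ASSUMED positions along the river (hundreds of meters), 2 units between neighbours.
positions = {"B": 0.0, "A4": 2.0, "A3": 4.0, "A2": 6.0, "A1": 8.0}
data_names = ["A1", "A2", "A3", "A4"]
x_data = np.array([positions[name] for name in data_names])

# Data-to-data correlation matrix (rows and columns ordered A1, A2, A3, A4).
R_AA = rho(np.abs(x_data[:, None] - x_data[None, :]))
# Data-to-estimation-point correlation vector (A1, A2, A3, A4 against B).
r_AB = rho(np.abs(x_data - positions["B"]))

print(np.round(R_AA, 1))
print(np.round(r_AB, 1))
```

Note that this quadratic has its minimum at h = 9, so it is only a sensible correlation model over the short lags used in this example.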

Dealing with multiple variables

Divide locations into two groups:

  • The vector, $\mathbf{X}_A = [X_{A1}, X_{A2}, X_{A3}, X_{A4}]^T$, representing the set of random variables at the locations contributing the conditioning data.

  • The variable, $X_B$, representing the random variable at the point of estimation.

Concept

1. If the individual distributions are normal, the joint pdf is taken to be multi-normal.

2. Group the variables into two: one group for the points with data, one for the point of estimation.

3. Intersect the prior pdf with the conditioning data to get the conditional pdf.

[Figure: prior pdf over XA1, XA2, XA3, XA4 and XB, and the conditional pdf obtained from it]

Dealing with multiple variables

The updated mean and variance of the distribution at Station B are given by:

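Mean: $\mu_{B|\mathbf{X}_A} = \mu_B + \Sigma_{BA}\,\Sigma_{AA}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_A)$

Variance: $\sigma_{B|\mathbf{X}_A}^2 = \sigma_B^2 - \Sigma_{BA}\,\Sigma_{AA}^{-1}\,\Sigma_{AB}$

Where:

  • $\Sigma_{AA}$ is the covariance matrix among the data locations A1 to A4.

  • $\Sigma_{BA} = \Sigma_{AB}^T$ is the vector of covariances between the data locations and the estimation point B.

  • $\mathbf{x}_A$ is the vector of observed values and $\boldsymbol{\mu}_A$ the vector of means at the data locations.
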

Equations in multivariable case are more generalized

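Recall the two-variable case:

$\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$ and $\sigma_{B|X_A}^2 = \sigma_B^2\,(1 - \rho_{AB}^2)$

  • The multivariable case takes into account:

    • Correlation between the data locations and the estimated location (the covariance vector $\Sigma_{BA}$).

    • Correlation among the data locations (the covariance matrix $\Sigma_{AA}$).

  • This is the most fundamental form of kriging, i.e. Simple Kriging.
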
Plug and Chug

  • Recall that Cov(A,B) = ρAB σA σB.

  • Compute the data-to-data correlations (among A1, A2, A3 and A4).

  • Compute the data-to-estimation-point correlations (between each of A1, A2, A3, A4 and B).

Note: The weights attributed to each station are determined by the prior (joint distribution) among them.

Weights = [λ1, λ2, λ3,… λn]
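
The plug-and-chug steps can be sketched numerically with the correlations quoted earlier. The weights and the updated standard deviation follow from the slide values alone; the observed values at A1 to A4 did not survive in this transcript, so `x_A` below is a hypothetical placeholder and the resulting mean is illustrative only.

```python
import numpy as np

mu, sigma = 5.0, 2.0  # prior mean and std dev (mg/L), the same at every station

# Correlations from the correlation model, ordered A1, A2, A3, A4.
R_AA = np.array([[1.0, 0.6, 0.3, 0.1],
                 [0.6, 1.0, 0.6, 0.3],
                 [0.3, 0.6, 1.0, 0.6],
                 [0.1, 0.3, 0.6, 1.0]])
r_AB = np.array([0.0, 0.1, 0.3, 0.6])

# Cov(A, B) = rho_AB * sigma_A * sigma_B, with a common sigma.
C_AA = sigma**2 * R_AA
c_AB = sigma**2 * r_AB

# Simple kriging weights solve C_AA @ weights = c_AB.
weights = np.linalg.solve(C_AA, c_AB)

# The updated variance does not depend on the observed values.
cond_var = sigma**2 - c_AB @ weights
cond_std = np.sqrt(cond_var)

# HYPOTHETICAL observations at A1..A4 (the slide values were not recovered).
x_A = np.array([4.0, 4.5, 5.5, 6.0])
cond_mean = mu + weights @ (x_A - mu)

print("weights:", np.round(weights, 3))
print("updated mean:", round(float(cond_mean), 2), "std dev:", round(float(cond_std), 2))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual choice for numerical stability.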

Results from Simple Kriging

The updated mean and standard deviation of the distribution at Station B are:

Mean:

Standard deviation:

Other forms of kriging

  • Ordinary kriging (OK)

    • Does not require mean to be known

    • Assumes that mean is constant and is somewhere in the range of the conditioning data

  • Universal kriging (UK)

    • Does not require mean to be known nor require it to be constant

    • User specifies a model for the trend in mean. UK will then fit the model to the data.

  • Indicator kriging (IK)

    • Handles binary variables (0 or 1).

    • Can handle non-normality in the data through iterative application.

  • Co-kriging (CK)

    • Takes into account a related secondary variable to help estimate the primary variable.

Extension to 2D, 3D

  • The lag can be represented by the Euclidean distance between two points.

  • So a covariance model of the form C = f(h) can still be used.

  • Variables may be more correlated in one direction than in another (anisotropy).

    • A linear transformation can be applied to the distances so that the correlation distance is the same in all directions (isotropy).
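
One simple form of such a transformation is to rescale each axis by its own correlation range before computing the Euclidean lag. The sketch below assumes the anisotropy is aligned with the coordinate axes (a rotated anisotropy would need a rotation first); the function name and the ranges are hypothetical.

```python
import numpy as np

def isotropic_lag(p, q, ranges):
    """Euclidean lag between p and q after rescaling each axis by its range."""
    p, q, ranges = (np.asarray(v, dtype=float) for v in (p, q, ranges))
    scaled = (p - q) / ranges          # each axis is now in units of its own range
    return float(np.sqrt(np.sum(scaled**2)))

# HYPOTHETICAL 2D case: correlation reaches 4 times farther along x than along y.
ranges = [400.0, 100.0]
h_x = isotropic_lag([0.0, 0.0], [400.0, 0.0], ranges)   # one range along x
h_y = isotropic_lag([0.0, 0.0], [0.0, 100.0], ranges)   # one range along y
print(h_x, h_y)  # both separations map to the same transformed lag
```

After this rescaling a single isotropic model with unit range can be applied to the transformed lags.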

Extension to space-time

  • For space and time, there is no standard space-time metric.

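  • Treating time as just another Euclidean axis, so that the temporal separation is folded into a single lag h:

    • is not always correct because the temporal and spatial axes are not always orthogonal to each other.

    • Processes that happen in time usually have some dependency on processes that happen in space.

    • (They are not independent).

  • A separate temporal lag term is usually used.

  • The covariance function then becomes a function of both the spatial lag and the temporal lag, rather than of a single lag h.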

Application (Gravity currents in Corpus Christi Bay)


Sensors in Corpus Christi Bay

[Figure: aerial photo from Google Earth of Corpus Christi Bay, Oso Bay, Laguna Madre and the Gulf of Mexico, showing the TCOON stations, TCEQ stations, USGS gages, SERF stations and HRI stations]

Selecting a study area

[Figure: bathymetry map of the candidate study areas (Oso Bay, West Laguna Madre, East Laguna Madre), marking depressions, ridges and a channel, with a legend running from -5.0 m to -1.0 m above Mean High Water Level]

Downstream of East Laguna Madre

  • Plume tracking survey, July 14 to 17, 2006 (while the gravity current was on the move): Ben Hodges, University of Texas at Austin

  • Water quality data, July 12 and 18, 2006 (at the birth and demise of the gravity current): Paul Montagna, Texas A&M University, Corpus Christi

Synthesis of data

[Figure: salinity-versus-depth profiles collected at various locations and times (t = 0, 1, 2, 3), synthesized into a time history of the gravity current along the direction of flow]


Data Preparation

[Figure: HydroGet interface showing the HRI stations, and the acquired data in the ArcHydro II Time Series Table]

1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with plume tracking data.

2. Data locations are projected onto a reference line following the general direction of flow.

  • Space-time kriging is performed in 3 dimensions:

    • X = longitudinal measure (meters from origin point)

    • Y = time (days since 7/12/2006)

    • Z = elevation (meters from water surface)

[Figure: reference line along the direction of flow, with the origin at x = 0 m]

Variogram along direction of flow

Gaussian variogram model with:

h = lag distance along direction of flow

C0 = nugget = 2 psu²

C1 = sill = 3.6 psu²

a = range = 6000 m

[Figure: variogram along the direction of flow, annotated with the nugget, sill and range]

Variogram along depth

Gaussian variogram model with:

h = lag distance along depth

C0 = nugget = 0 psu²

C1 = sill = 3.6 psu²

a = range = 1.7 m

Variogram along time axis

Spherical variogram model with:

h = lag along the time axis

C0 = nugget = 0 psu²

C1 = sill = 3 psu²

a = range = 1 day
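
The model equations themselves did not survive in this transcript, so the following is only a sketch of the two named model types under stated assumptions: C1 is treated as the partial sill added on top of the nugget, and the Gaussian model uses the practical-range convention with a factor of 3 in the exponent. The slides may have used a different convention.

```python
import numpy as np

def gaussian_variogram(h, nugget, sill, a):
    """Gaussian model. ASSUMES 'sill' is the partial sill and 'a' the practical range."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-3.0 * (h / a) ** 2))
    return np.where(h > 0, gamma, 0.0)   # gamma(0) = 0 by definition

def spherical_variogram(h, nugget, sill, a):
    """Spherical model. Reaches nugget + sill exactly at h = a and stays there."""
    h = np.asarray(h, dtype=float)
    hr = np.minimum(h / a, 1.0)
    gamma = nugget + sill * (1.5 * hr - 0.5 * hr**3)
    return np.where(h > 0, gamma, 0.0)

# Parameters quoted on the three variogram slides.
along_flow = gaussian_variogram([0.0, 3000.0, 6000.0], nugget=2.0, sill=3.6, a=6000.0)
along_depth = gaussian_variogram([0.0, 0.85, 1.7], nugget=0.0, sill=3.6, a=1.7)
along_time = spherical_variogram([0.0, 0.5, 1.0, 2.0], nugget=0.0, sill=3.0, a=1.0)
```

The spherical model levels off exactly at its range, while the Gaussian model only approaches its sill asymptotically, which is why a convention for "range" has to be chosen for it.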

Interpolation results

[Figure: interpolated salinity shown in a block with axes x = distance to origin point, y = time and z = elevation, with longitudinal profiles on 7/12/2006 18:00 and 7/13/2006 18:00. Legend classes: 37 – 40, 40 – 42, 42 – 43, 42 – 44 and 44 – 46 psu.]

Longitudinal Profiles

Bottom salinities

Cross validation

  • A common method to evaluate variogram models.

  • Also known as the “fictitious point” method (Delhomme, 1978).

  • Remove one data point at a time from the data set, then use the remaining n-1 points to estimate the removed point.

  • The estimated and actual values are then compared with each other.
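
The leave-one-out procedure described above can be sketched as follows. This uses simple kriging in one dimension with a known mean; the data and the exponential correlation model are hypothetical and chosen only for illustration, not taken from the Corpus Christi Bay study.

```python
import numpy as np

def simple_krige(x_data, z_data, x0, mu, sigma, corr):
    """Simple kriging estimate at x0 from 1-D data, given a correlation model corr(h)."""
    C = sigma**2 * corr(np.abs(x_data[:, None] - x_data[None, :]))  # data to data
    c0 = sigma**2 * corr(np.abs(x_data - x0))                       # data to target
    w = np.linalg.solve(C, c0)
    return mu + w @ (z_data - mu)

def leave_one_out(x, z, mu, sigma, corr):
    """Estimate each point from the other n-1 points and return the estimates."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    estimates = np.empty_like(z)
    for k in range(len(x)):
        keep = np.arange(len(x)) != k      # drop point k
        estimates[k] = simple_krige(x[keep], z[keep], x[k], mu, sigma, corr)
    return estimates

# HYPOTHETICAL data and correlation model (exponential, for illustration only).
corr = lambda h: np.exp(-h / 3.0)
x = [0.0, 1.0, 2.5, 4.0, 6.0]
z = [5.1, 4.8, 5.6, 3.9, 4.2]
est = leave_one_out(x, z, mu=5.0, sigma=2.0, corr=corr)
errors = est - np.asarray(z)
rmse = float(np.sqrt(np.mean(errors**2)))
```

Repeating this for several candidate variogram models and comparing a summary such as the RMSE is one way to choose among them.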

Conclusions

We’ve covered:

  • Basics of spatial statistics

  • Kriging

  • Application of spatial-temporal statistics (Gravity currents in CCBay)

    Spatial statistics is fun!

Geostatistical tools

  • ArcGIS Geostatistical Analyst

    • Easiest to use

  • GSLIB

    • Library of Fortran programs

  • DeCesare’s version of GSLIB

    • Modification of GSLIB to do space-time kriging

  • BMELIB

    • Library of MATLAB programs
