University at Albany School of Public Health

Glen Johnson, PhD Lehman College / CUNY School of Public Health [email protected] PowerPoint PPT Presentation



University at Albany School of Public Health EPI 621, Geographic Information Systems and Public Health. Introduction to Smoothing and Spatial Regression. Glen Johnson, PhD Lehman College / CUNY School of Public Health [email protected] Consider points distributed in space.


Presentation Transcript


Glen johnson phd lehman college cuny school of public health glen johnson lehman cuny

University at Albany School of Public Health

EPI 621, Geographic Information Systems and Public Health

Introduction to Smoothing and Spatial Regression

Glen Johnson, PhD

Lehman College / CUNY School of Public Health

[email protected]



Consider points distributed in space

“Pure” point process:

Only coordinates locating some “events”.

Set of points,

$S = \{s_1, s_2, \ldots, s_n\}$

  • Examples include

  • location of burglaries

  • location of disease cases

  • location of trees, etc.

Points represent locations of something that is measured:

Values of a random variable, Z, are observed for a set S of locations, such that the set of measurements is

$Z(s) = \{Z(s_1), Z(s_2), \ldots, Z(s_n)\}$

  • Examples include

  • cases and controls (binary outcome) identified by location of residence

  • population-based counts (integer outcome) tied to geographic centroids

  • PCBs measured in mg/kg (continuous outcome) in soil cores taken at specific point locations



Example of a Pure Point Process: Baltimore Crime Events

Question: How to interpolate a smoothed surface that shows varying “intensity” of the points?

(source: http://www.people.fas.harvard.edu/~zhukov/spatial.html)



Kernel Density Estimation

From: Cromley and McLafferty. 2002. GIS and Public Health.



Kernel Density Estimation

Estimate “intensity” of events at regular grid points as a function of nearby observed events. A standard form of the estimator for any point x is:

$$\hat{\lambda}(x) = \frac{1}{h^2} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right)$$

where $x_i$ are “observed” points for i = 1, …, n locations in the study area, and k(.) is a kernel function that assigns decreasing weight to observed points as their distance from x approaches the bandwidth h. Points that lie beyond the bandwidth, h, are given zero weighting.
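As a rough illustration of the kernel estimate described on this slide, the sketch below evaluates an intensity surface on a regular grid. It assumes the common form in which kernel weights of scaled distances are summed and divided by h squared, and it assumes a quartic kernel; the slide does not say which kernel was used. The event coordinates and the function names (`quartic_kernel`, `kernel_intensity`) are hypothetical.

```python
import math

def quartic_kernel(u):
    """Quartic kernel in 2-D (an assumed choice); u is distance / bandwidth.

    Weight falls as u grows and is zero at or beyond u = 1.
    """
    if u >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u * u) ** 2

def kernel_intensity(x, events, h):
    """Estimated intensity at point x from the events within bandwidth h."""
    total = 0.0
    for ex, ey in events:
        d = math.hypot(x[0] - ex, x[1] - ey)
        total += quartic_kernel(d / h)
    return total / (h * h)

# Hypothetical event locations, not taken from the presentation.
events = [(1.0, 1.0), (1.5, 1.2), (4.0, 4.0)]

# Regular grid of evaluation points covering [0, 5] x [0, 5] in steps of 0.5.
grid = [(gx * 0.5, gy * 0.5) for gx in range(11) for gy in range(11)]
surface = {g: kernel_intensity(g, events, h=1.0) for g in grid}
```

A larger h spreads each event's contribution over a wider area and gives a smoother surface; a smaller h gives a spikier one.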



Results from Kernel Density Smoothing in R



Kernel Density Surface of Bike Share Locations in NYC

Source: http://spatialityblog.com/2011/09/29/spatial-analysis-of-nyc-bikeshare-maps/



Examples of Values Observed at Point Locations, Z(s):

Question: How to interpolate a smoothed surface that captures variation in Z(s)?



  • First, consider “deterministic” approaches to spatial interpolation:

  • Deterministic models do not acknowledge uncertainty.

  • Only real advantage is simplicity; good for exploratory analysis

  • Several options, all with limitations. We will consider Inverse Distance Weighted (IDW) because of its common usage.



Inverse Distance Weighted Surface Interpolation

Define search parameters

Define power of distance-decay function
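The two choices named on this slide, the search parameters and the power of the distance-decay function, can be sketched as follows. This is a minimal illustration that assumes weights of the form 1 / distance^power; the parameter names `search_radius` and `max_neighbors` are hypothetical stand-ins for whatever search settings a given GIS package exposes.

```python
import math

def idw_predict(target, samples, power=2.0, search_radius=None, max_neighbors=None):
    """Inverse-distance-weighted prediction at `target`.

    samples: list of ((x, y), value) pairs.
    Returns None when no sample falls inside the search radius.
    """
    candidates = []
    for (x, y), z in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return z  # target coincides with a sample, so return its value
        if search_radius is None or d <= search_radius:
            candidates.append((d, z))
    if not candidates:
        return None
    # Keep only the nearest samples when a neighbor limit is set.
    candidates.sort(key=lambda c: c[0])
    if max_neighbors is not None:
        candidates = candidates[:max_neighbors]
    weights = [d ** -power for d, _ in candidates]
    weighted_sum = sum(w * z for w, (_, z) in zip(weights, candidates))
    return weighted_sum / sum(weights)

# Hypothetical measurements at two locations.
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw_predict((1.0, 0.0), samples)
```

Raising the power makes the nearest samples dominate; a power near zero approaches a plain average of the samples in the search window.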



Illustration: Tampa Bay sediment total organic carbon



True “geostatistical” models assume the data, $Z(S) = \{Z(s_1), Z(s_2), \ldots, Z(s_n)\}$, are a partial realization of a random field.

Note that the set of locations S is a subset of some 2-dimensional spatial domain D, which is itself a subset of the real plane.



General Protocol:

Characterize properties of spatial autocorrelation through variogram modeling;

Predict values for spatial locations where no data exist, through Kriging.



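A semivariogram is defined as

$$\gamma(h) = \frac{1}{2}\,\mathrm{Var}\left[Z(s) - Z(s')\right]$$

for distance h between the two locations $s$ and $s'$, and is estimated as

$$\hat{\gamma}(h_j) = \frac{1}{2 n_h} \sum_{\|s_i - s_k\| \approx h_j} \left[Z(s_i) - Z(s_k)\right]^2$$

for $n_h$ pairs separated by distance $h_j$ (called a “lag”).

After repeating for different lags, say j = 1, …, 10, the semivariance can be plotted as a function of distance.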
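To make the estimation step concrete, here is a small sketch of an empirical semivariogram: for each lag bin it takes half the average squared difference between all pairs of values whose separation falls in that bin. It is omnidirectional (it ignores direction), and the equal-width binning scheme and function name are assumptions for illustration.

```python
import math

def empirical_semivariogram(samples, lag_width, n_lags):
    """Empirical semivariance per lag bin.

    samples: list of ((x, y), value) pairs.
    Returns a list of (mean pair distance, semivariance, number of pairs),
    one entry per non-empty bin [j * lag_width, (j + 1) * lag_width).
    """
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(samples)):
        (xi, yi), zi = samples[i]
        for k in range(i + 1, len(samples)):
            (xk, yk), zk = samples[k]
            d = math.hypot(xi - xk, yi - yk)
            b = int(d // lag_width)
            if b < n_lags:
                sq_diff[b] += (zi - zk) ** 2
                dist_sum[b] += d
                counts[b] += 1
    result = []
    for b in range(n_lags):
        if counts[b] > 0:
            result.append((dist_sum[b] / counts[b],
                           sq_diff[b] / (2 * counts[b]),
                           counts[b]))
    return result

# Hypothetical values measured along a transect.
samples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((2.0, 0.0), 4.0)]
variogram = empirical_semivariogram(samples, lag_width=1.0, n_lags=3)
```

Plotting the second element of each tuple against the first gives the semivariance-by-distance plot that the slide describes.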



Given any location si, all other locations are treated as within distance h if they fall within a search window defined by the direction, lag h, angular tolerance and bandwidth.


Adapted from Waller and Gotway. Applied Spatial Statistics for Public Health. Wiley, 2004.



Example semivariogram cloud for pairwise differences (red dots), with the average semivariance for each lag (blue +), and a fitted semivariogram model (solid blue line)



Characteristics of a semivariogram

Range = the distance within which positive spatial autocorrelation exists

Nugget = spatial discontinuity + observation error

Sill = maximum semivariance
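One way to see how these three parameters shape a fitted curve is the spherical model, a commonly used semivariogram form. The presentation does not say which model was fitted, so the choice here is an assumption. In this sketch `sill` is the total sill (the maximum semivariance, as defined above), so the partial sill is `sill - nugget`.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical semivariogram model evaluated at distance h.

    nugget: jump in semivariance just above h = 0
    sill:   maximum semivariance, reached at the range
    range_: distance beyond which the semivariance stays at the sill
    """
    if h <= 0.0:
        return 0.0  # semivariance of a location with itself
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

# Hypothetical parameter values, chosen only for illustration.
curve = [(h, spherical_semivariogram(h, nugget=1.0, sill=5.0, range_=10.0))
         for h in range(0, 16)]
```

The curve starts near the nugget for very small distances, rises through the range, and is flat at the sill afterwards.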



If the variogram form does not depend on direction, the spatial process is isotropic.

If it does depend on direction, it is anisotropic.

Multiple semivariograms for different directions. Note that the parameter that changes with direction is the range.

Surface map of semivariance shows values more similar in NW-SE direction and more different in SW-NE direction.



Kriging then uses semivariogram model results to define weights used for interpolating values where no data exist.

The result is called the “Best Linear Unbiased Predictor”. The basic form is

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i)$$

where $s_0$ is the prediction location and the $\lambda_i$ assign weights to neighboring values according to semivariogram modeling that defines a distance-decay relation within the range, beyond which the weight goes to zero.
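The weighting idea can be sketched for ordinary Kriging, where the weights come from solving a small linear system built from the semivariogram and are constrained to sum to one. This is a minimal sketch under stated assumptions: an isotropic spherical semivariogram with made-up parameters, all samples used as neighbors, and a hand-written solver to keep the example dependency-free. Function names and sample values are hypothetical.

```python
import math

def spherical(h, nugget=0.0, sill=1.0, range_=10.0):
    """Assumed spherical semivariogram model (parameters are illustrative)."""
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(target, samples, gamma):
    """Return (prediction, weights) at `target`.

    samples: list of ((x, y), value); gamma: semivariogram function of distance.
    The last row and column of the system carry the sum-to-one constraint.
    """
    n = len(samples)
    coords = [p for p, _ in samples]
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(target, c)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    prediction = sum(w * z for w, (_, z) in zip(weights, samples))
    return prediction, weights

# Hypothetical observations.
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((0.0, 2.0), 14.0)]
prediction, weights = ordinary_kriging((1.0, 1.0), samples, spherical)
```

Unlike inverse distance weighting, the weights here depend on the fitted semivariogram and on how the samples are arranged relative to one another, not only on their distance to the prediction location.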



  • Several variations of Kriging:

  • Simple (assumes known mean)

  • Ordinary (assumes constant mean, though unknown) [our focus this week]

  • Universal (non-stationary mean)

  • Cokriging (prediction based on more than one inter-related spatial process)

  • Indicator (probability mapping based on binary variable) [you will see in the lab work]

  • Block (areal prediction from point data)

  • And other variations …



Example of two types of Kriging for California O3:

Ordinary Kriging (Detrended, Anisotropic)

- continuous surface

Indicator Kriging

- probability isolines



What if point locations are centroids of polygons and the value Z(si) represents aggregation within polygon i?



With polygon data, we can still define neighbors as some function of Euclidean distance between polygon centroids, as we do for point-level data,

but now we have other ways to define neighbors and their weights …
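The distance-based option mentioned above can be sketched as a distance band: two polygons are neighbors when their centroids lie within a chosen threshold of each other. The row-standardized weights (each polygon's neighbor weights summing to one) are an added assumption, as are the centroid coordinates and the function name.

```python
import math

def distance_band_weights(centroids, threshold):
    """Neighbor weights from centroid-to-centroid Euclidean distance.

    centroids: dict mapping polygon id -> (x, y).
    Returns a dict mapping polygon id -> {neighbor id: weight}, where each
    polygon's weights are equal and sum to 1; polygons with no centroid
    within the threshold get an empty dict.
    """
    weights = {}
    for i, ci in centroids.items():
        nbrs = [j for j, cj in centroids.items()
                if j != i and math.dist(ci, cj) <= threshold]
        weights[i] = {j: 1.0 / len(nbrs) for j in nbrs}
    return weights

# Hypothetical polygon centroids.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
w = distance_band_weights(centroids, threshold=1.5)
```

Polygon "C" ends up with no neighbors at this threshold, which is one practical reason analysts often turn to contiguity-based definitions for irregular polygons.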



Defining spatial “Neighborhoods”

Raster or Lattice:

  • Rook

  • Queen, 1st order

  • Queen, 2nd order
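On a regular lattice these neighborhood rules reduce to simple index arithmetic, as in the sketch below. It assumes that rook neighbors share an edge, that queen neighbors share an edge or a corner, and that "order k" means every cell reachable within k such steps (a cumulative definition; some software instead reports only the cells at exactly k steps). The function name is hypothetical.

```python
def lattice_neighbors(row, col, n_rows, n_cols, rule="queen", order=1):
    """Cells within `order` steps of (row, col) under rook or queen contiguity."""
    neighbors = []
    for r in range(row - order, row + order + 1):
        for c in range(col - order, col + order + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            if (r, c) == (row, col) or not inside:
                continue
            dr, dc = abs(r - row), abs(c - col)
            # Rook steps move along edges only; queen steps may be diagonal.
            steps = dr + dc if rule == "rook" else max(dr, dc)
            if steps <= order:
                neighbors.append((r, c))
    return neighbors

# Neighbors of the center cell of a 5 x 5 lattice.
rook_1 = lattice_neighbors(2, 2, 5, 5, rule="rook", order=1)
queen_1 = lattice_neighbors(2, 2, 5, 5, rule="queen", order=1)
queen_2 = lattice_neighbors(2, 2, 5, 5, rule="queen", order=2)
```

Cells on the lattice edge simply have fewer neighbors, which matters when weights are later standardized by the number of neighbors.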



  • Spatial Regression Modeling as a method for both

  • assessing the effects of covariates and…

  • smoothing a response variable

