Going Beyond GIS for Environmental Health


Going Beyond GIS for Environmental Health. Frank C. Curriero [email protected] Environmental Health Sciences and Biostatistics Bloomberg School of Public Health EnviroHealth Connections Summer Institute 2006. Bio. Joint appt. in Env Health Sci and Biostatistics PhD in Statistics

Presentation Transcript
slide1
Going Beyond GIS

for

Environmental Health

Frank C. Curriero

[email protected]

Environmental Health Sciences and Biostatistics

Bloomberg School of Public Health

EnviroHealth Connections

Summer Institute

2006

slide2
Bio
  • Joint appt. in Env Health Sci and Biostatistics
  • PhD in Statistics
  • Research agenda is spatial statistics

Statistics

Geography (GIS)

Env Health

Spatial Statistics

slide3
Objectives
  • Provide exposure to the field of spatial statistics.
  • Keep it simple (non-technical)
  • Applications of GIS in Environmental Health
  • Beyond GIS, maps make you think/question
  • Current research topics
  • Geography (location) is a source of variation worth considering in environmental health investigations.
slide4
What is Spatial Statistics?

Statistics for the analysis of spatial data

“spatial”

“geographic”

What is Spatial Data?

The “where” in addition to the “what” was observed

or measured is important and recorded with the data.

Location information (the “where”) can vary.

What is GIS?

Stands for Geographic Information System

Anything more depends on who you ask!

slide5
What is a GIS?

One word def: Database

Two word def: Visual Database

Visual database for geographic data

  • Stores
  • Manipulates
  • Analyzes
  • Queries
  • Creates
  • Displays

. . . .

MAPS

“Layer cake of information”

slide6
What else:

- A computer system (piece of software) with a tremendous amount of capability for storing, querying, combining, presenting, . . . , spatial data.

- GIS is designed specifically for spatial data and hence built to handle all of its complicated features.

- GIS is a generic name like word processor. ArcGIS, MapInfo, Idrisi are examples of different GIS.

- The earth does not have to be the backdrop for every GIS application, but it is certainly the most common.

slide7
What else (cont.)

- Public health was not the first and probably will not be the last application of GIS and spatial statistics.

- GIS as a mechanism for generating hypotheses (exploratory spatial data analysis).

- GIS is a tool, a very powerful and valuable tool when working with spatial data.

slide8
Applications in Spatial Statistics and GIS
  • Waterborne disease outbreaks
  • DDE soil contamination
  • Lyme Disease
  • Prostate cancer mapping
  • Chesapeake Bay water quality assessment
slide9
US Waterborne Disease Outbreaks, 1948-1994

Outbreak Data

Location          Longitude   Latitude   Month   Year
AL, Anniston         -85.83      33.65   Oct     1953
AL, Center Pt.       -86.68      33.63   Nov     1958
   .                    .          .       .       .
WY, Cody            -109.06      44.53   July    1986

slide10
US Waterborne Disease Outbreaks, 1948-1994

Substantive Questions

Do outbreaks occur at random across the US?

Are outbreaks preceded by extreme precipitation events?

Does the risk of an outbreak vary spatially, and is it related to watershed vulnerability?

slide11
Objective: Association between extreme precip. and outbreaks

Methods: Overlay map of outbreaks and extreme precip events
  • 2,105 watersheds (USGS)
  • 16,000+ weather stations (NCDC)
  • define extreme precipitation
  • aggregate precip and outbreak to watershed

Results: 51% of outbreaks were coincident with extreme levels of precip within a 2 month lag preceding the outbreak month.

Conclusion: Is this evidence of an association?
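The coincidence count in the Results line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the watershed IDs, the toy records, and the helper names `months_back` and `is_coincident` are hypothetical, and the lag window is assumed to include the outbreak month itself plus the two months before it. The slides do not say how extreme precipitation was defined, so the sketch simply takes a set of flagged watershed-months as given.

```python
# Hypothetical toy data: each outbreak is (watershed_id, year, month).
outbreaks = [
    ("W1", 1953, 10),
    ("W2", 1958, 11),
    ("W3", 1986, 7),
    ("W1", 1960, 1),
]

# Hypothetical watershed-months already flagged as extreme precipitation.
extreme_months = {
    ("W1", 1953, 9),   # one month before the first outbreak
    ("W2", 1958, 11),  # same month as the second outbreak
    ("W3", 1985, 7),   # a full year before the third outbreak
    ("W1", 1959, 11),  # two months before the fourth outbreak
}


def months_back(year, month, lag):
    """Return the (year, month) that falls `lag` months before the given month."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def is_coincident(outbreak, extremes, max_lag=2):
    """True if the outbreak's watershed had an extreme month 0..max_lag months earlier."""
    watershed, year, month = outbreak
    return any(
        (watershed, *months_back(year, month, lag)) in extremes
        for lag in range(max_lag + 1)
    )


coincident = sum(is_coincident(o, extreme_months) for o in outbreaks)
fraction = coincident / len(outbreaks)
print(f"{coincident} of {len(outbreaks)} outbreaks coincident ({fraction:.0%})")
```

A percentage like this is descriptive. Answering the Conclusion question would additionally need a reference value, such as how often extreme precipitation precedes months without an outbreak.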

slide14
US Waterborne Disease Outbreaks, 1948-1994

Results: 51% of outbreaks were coincident with extreme levels of precip within a 2 month lag preceding the outbreak month.

Conclusion: Is this evidence of an association?

slide15
US Waterborne Disease Outbreaks, 1948-1994
  • Map generation included many involved GIS tasks on numerous data sources, GIS Spatial Analysis.
  • Statistically speaking though it represents risk factor data.
  • Spatial statistics often considers the map as a starting point, which in GIS is often an endpoint.
slide16
Western Maryland Superfund Site

DDE Soil Sample Data

Sample #   Easting   Northing   DDE (ppm)
       1   1108420     725173         160
       2   1108300     725378           4
       .         .          .           .
     110   1108490     725038          92

slide17
Substantive Questions

Does the site exceed regulated levels of DDE contamination, and is it in need of remediation?

What is the level of DDE in my backyard?

slide20
Kriged DDE Predictions

Kriging: Spatial prediction at unsampled locations based on data from sampled locations.

Environmental health applications of kriging: exposure maps

slide21
Baltimore County Lyme Disease: 1989-1990

Lyme Case

Lyme Control

Lyme Disease Cases and Controls

        Cases                   Controls
Longitude   Latitude     Longitude   Latitude
-76.4047    39.3421      -76.4054    39.3419
-76.3433    39.3736      -76.3522    39.3718

-76.7592 39.3265

-76.7665 39.3119

. . .

slide22
Baltimore County Lyme Disease: 1989-1990

Lyme Case

Lyme Control

Substantive Questions

Do cases of Lyme Disease tend to cluster, generally or as localized “hot spots?”

Does risk of Lyme Disease vary spatially over Balt. County?

Identify and quantify environmental risk factors associated with Lyme Disease.

slide23
Baltimore County Lyme Disease Risk: 1989-1990

Spatial Case/Control Analysis

  • Spatial density estimate of cases divided by spatial density estimate of controls (nonparametric kernel approach).
  • Logistic regression approach to include covariates.
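The first bullet, a ratio of two kernel density estimates, can be illustrated with a minimal Python sketch. The Gaussian kernel, the shared bandwidth of 1.0, the planar toy coordinates, and the function names are all assumptions made for illustration; the slides do not specify the kernel or bandwidth used in the Lyme disease analysis.

```python
import math


def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y) from a list of (x, y) points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return norm * total


def density_ratio(x, y, cases, controls, bandwidth):
    """Case density divided by control density at (x, y)."""
    return (kernel_density(x, y, cases, bandwidth)
            / kernel_density(x, y, controls, bandwidth))


# Hypothetical planar coordinates: cases concentrate near (1, 1),
# controls concentrate near (4, 4).
cases = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.0, 4.0)]
controls = [(1.0, 1.0), (4.0, 4.0), (4.2, 3.9), (3.8, 4.1)]

ratio_near_cases = density_ratio(1.0, 1.0, cases, controls, bandwidth=1.0)
ratio_near_controls = density_ratio(4.0, 4.0, cases, controls, bandwidth=1.0)
print(ratio_near_cases, ratio_near_controls)
```

A ratio above 1 marks locations where cases are denser than controls. Whether such an excess is larger than chance would produce is the inferential question the slides raise.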
slide24
Statistical Methods Exist to Address
  • Do cases (events) show a tendency to cluster?
  • Identifying “clusters” or “hot spots.”
  • Does risk of disease (or outcome of interest) vary spatially?
  • Is disease risk elevated near a particular point source?
  • Spatial prediction of outcomes at unobserved locations.
  • Risk factor estimation in the presence of residual spatial variation.
slide25
Types of Spatial Data

1. Geostatistical Data

Basic structure is data tagged with locations.

Locations can essentially exist anywhere.

Referred to as continuous spatial variation.

Example: MD Superfund Site DDE

slide26
2. Point Pattern Data

Locations are the data denoting occurrence of events.

Common to aggregate to area-level data.

Example: Baltimore County Lyme Disease Cases

Baltimore County Lyme Disease Controls

3. Area-level Data

Data summarized to an area unit.

Rarely arises naturally.

Often an aggregate form of point pattern data.

Referred to as discrete spatial variation.

Example: Maryland prostate cancer by zip code
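The step from point pattern data to area-level data is a simple aggregation, sketched below in Python. The coordinates are toy values and the 0.1-degree grid cells are a stand-in for real area units such as zip codes; assigning a point to an actual zip code would need boundary data that is not part of these slides.

```python
import math
from collections import Counter

# Hypothetical event locations as (longitude, latitude).
events = [
    (-76.4047, 39.3421),
    (-76.4054, 39.3419),
    (-76.3433, 39.3736),
    (-76.7592, 39.3265),
]

CELL_SIZE = 0.1  # assumed grid resolution in degrees


def area_unit(lon, lat, cell_size=CELL_SIZE):
    """Map a point to the index of the grid cell that contains it."""
    return math.floor(lon / cell_size), math.floor(lat / cell_size)


# Area-level data: number of events per area unit.
counts = Counter(area_unit(lon, lat) for lon, lat in events)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

After aggregation only the counts per unit remain, which is why area-level data are described as discrete spatial variation.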

slide27
Why Collect Locations as Part of Data?
  • Sometimes locations are the only data (as in point patterns).
  • Risk (or outcome of interest) may vary spatially.
  • Location can serve as an information gateway to other linked data sources:
      • environmental
      • demographic
      • social
      • etc.
  • Data are spatially dependent and locations are used in statistical methods that account for this dependence.
  • In general things can vary spatially and geography (location) may be a source of variation worth considering.
slide28
Temporal Dependence
  • Time series or longitudinal data.
  • Past/present direction inherent in temporal data.

Spatial Dependence

  • Dimensions > 1 and loss of directional component.
  • Observations closer together in space are more similar than observations further away (clustering).

“in space”

“on the earth”

slide29
Spatial Dependence (clustering) in

Environmental Health Data

Could be due to:

  • A contagious agent of the outcome under investigation.
  • The spatial variation in the population at risk.
  • An underlying shared environmental characteristic, measured or unmeasured, that also varies spatially (Shared Environment Effect).
slide30
What GIS is Not
  • A complete system for statistical or scientific inference.
  • Maps, the most basic and fundamental concept in GIS, are not statistical inference.
  • A GIS map of
      • one variable is analogous to a histogram display
      • two variables overlaid is analogous to an x-y scatterplot or 2x2 table.
  • In statistics we go beyond histograms and scatterplots.
slide31
An Important Distinction

In the GIS literature analysis or spatial analysis often means spatial data manipulation which is something different than statistical analysis.

slide32
Two Current Research Problems

in Spatial Statistics and GIS

Non-geocoded Data

Non-Euclidean Distance

slide33
Geographic Analysis of Prostate Cancer in Maryland

PI: Ann Klassen (HPM & Oncology)

Collaborators: Margaret Ensminger, Chyvette Williams, JeanHeeHong (HPM)

Frank Curriero (Biostat), Anthony Alberg (Epi)

Martin Kulldorff (Harvard), Helen Meissner (NCI)

Cooperative Agreement from Association of Schools of Public Health and Centers for Disease Control

Data Agreement with the Maryland Cancer Registry

One of six CDC projects investigating geography and prostate cancer, including NY, CT/MA, NJ, Kansas/Iowa, and Louisiana.

slide34
Prostate Cancer Reported to MD Cancer Registry 1992-1997

[Map: Proportion of an Outcome of Interest*. Legend: No Data, 0 - 12, 13 - 30, 31 - 67, 68 - 100]

* All geocoded cases

Outcomes of Interest Include

  • Incidence
  • Stage at diagnosis
  • Tumor grade at diagnosis
  • Failure to stage or grade
  • Treatment and mortality
slide35
[Map: Proportion of an Outcome of Interest*. Legend: No Data, 0 - 12, 13 - 30, 31 - 67, 68 - 100]

* All geocoded cases

slide36
What is Geocoding?

GIS process of translating mailing address information to coordinates on a map, such as with longitude and latitude

16 Goucher Woods Ct, Towson, MD 21286 → (-76.5883, 39.4005)

Nongeocoded Data

Mailing addresses that could not be geocoded

8123 Rose Haven Road, Rosedale, MD 21237 → Nongeocoded

slide37
Reasons for Nongeocodes

Address error

PO Box

Rural routes

Base maps out of date

slide38
[Maps: Proportion of Outcome of Interest, shown for Geocoded Cases (15,585) and for All Cases (17,091). Legends: 0 - 8, 9 - 12, 13 - 30, 31 - 67, 68 - 100; and No Data, 0 - 12, 13 - 30, 31 - 67, 68 - 100]
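The two case counts on this slide imply how many records are affected. A short calculation, assuming the difference between the two counts is exactly the set of nongeocoded cases:

```python
all_cases = 17091       # "All Cases" count from the slide
geocoded_cases = 15585  # "Geocoded Cases" count from the slide

# Assumption: every case that is not geocoded is a nongeocode.
nongeocoded = all_cases - geocoded_cases
share = nongeocoded / all_cases
print(f"{nongeocoded} nongeocoded cases ({share:.1%} of all cases)")
```

Under that assumption, 1,506 cases (about 8.8% of all cases) would be left out of a map built from geocoded cases only.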

slide39
Statistical Issues

(1) Common to just ignore nongeocodes

What's the Consequence?

Historically not well documented in publications

(2) Level of aggregation for analyses?

Zip code level

Census tract, county, etc.

slide40
Statistical Issues (cont.)

(3) Nongeocodes represent missing data and are most likely not missing at random

[Map: MD Prostate Cancer Proportion of NonGeocodes. Legend (% Nongeocoded): 0 - 9, 10 - 25, 26 - 47, 48 - 75, 76 - 100]

slide41
Statistical Issues (cont.)

(4) Nongeocodes carry plenty of information

Known Information (fictitious example)

Age = 72

Race = White

Year of Diagnosis = 1991

Stage at Diagnosis = Late

Tumor Grade = Aggressive

Zip Code = 21237

slide42
Statistical Solutions

(a) Impute a location for nongeocodes
  • Determine the age-race distribution within known zip codes
  • Weighted random selection based on known age and race
  • Sampling with and without replacement
  • Multiple imputation to assess bias
  (Joint work with Ann Klassen, HPM)

(b) Develop statistical models for outcomes at different levels of aggregation
  • Spatial variation in risk model for geocoded household level data and nongeocoded zip code level data
  (Joint work with Peter Diggle, Biost)
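Solution (a) can be made concrete with a small Python sketch. It is not the authors' procedure: the candidate locations, their population weights, the age grouping, and the function names are hypothetical, and only sampling with replacement is shown. The idea illustrated is that a nongeocoded case with a known zip code, age, and race is assigned a location drawn at random from matching candidates in that zip code, and that the draw is repeated to produce several imputed data sets.

```python
import random

# Hypothetical candidate locations per zip code. Each carries the age group
# and race of the population it represents and a population count as weight.
candidates = {
    "21237": [
        {"lon": -76.50, "lat": 39.33, "age_group": "65+", "race": "White", "pop": 120},
        {"lon": -76.49, "lat": 39.34, "age_group": "65+", "race": "White", "pop": 40},
        {"lon": -76.51, "lat": 39.32, "age_group": "65+", "race": "Black", "pop": 80},
        {"lon": -76.48, "lat": 39.35, "age_group": "40-64", "race": "White", "pop": 200},
    ]
}


def impute_location(case, candidates, rng):
    """Draw one location for a nongeocoded case, weighted by matching population."""
    pool = [
        c for c in candidates[case["zip"]]
        if c["age_group"] == case["age_group"] and c["race"] == case["race"]
    ]
    weights = [c["pop"] for c in pool]
    chosen = rng.choices(pool, weights=weights, k=1)[0]  # sampling with replacement
    return chosen["lon"], chosen["lat"]


def multiple_imputation(case, candidates, n_draws, seed=0):
    """Repeat the draw to obtain several imputed locations for the same case."""
    rng = random.Random(seed)
    return [impute_location(case, candidates, rng) for _ in range(n_draws)]


# Fictitious case in the spirit of the slide: age 72, White, zip code 21237.
case = {"zip": "21237", "age_group": "65+", "race": "White"}
draws = multiple_imputation(case, candidates, n_draws=5)
print(draws)
```

Running the downstream analysis once per imputed data set and comparing the results is one way to see how sensitive the conclusions are to the missing locations.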

slide43
Chesapeake Bay Water Quality Assessment

Data
  • Temperature
  • Turbidity
  • Dissolved Oxygen
  • Chlorophyll a

Needed

Assessments at unsampled locations

slide44
Kriging

A spatial regression method that provides optimal prediction at unsampled locations.

Kriged predictions are weighted averages of sampled data, higher weights given to data closer to the prediction site.

Proximity is measured by the straight line Euclidean distance (“as the crow flies”).
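The weighted-average description can be made concrete with a small ordinary kriging sketch in Python. Everything specific here is an assumption for illustration: the exponential covariance function and its parameters, the toy sites and values, and the function names. In practice the covariance model is estimated from the data, which this sketch skips.

```python
import math


def covariance(distance, sill=1.0, range_param=2.0):
    """Assumed exponential covariance: similarity decays with Euclidean distance."""
    return sill * math.exp(-distance / range_param)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ordinary_kriging(sites, values, target):
    """Return (prediction, weights) at `target`; the weights sum to one."""
    n = len(sites)
    # Covariances among sampled sites, bordered by the sum-to-one constraint.
    a = [[covariance(math.dist(sites[i], sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Covariances between each sampled site and the prediction site.
    b = [covariance(math.dist(s, target)) for s in sites] + [1.0]
    weights = solve(a, b)[:n]  # last entry is the Lagrange multiplier
    prediction = sum(w * z for w, z in zip(weights, values))
    return prediction, weights


# Hypothetical sampled sites and measurements.
sites = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
values = [10.0, 20.0, 30.0]

prediction, weights = ordinary_kriging(sites, values, (0.2, 0.0))
print(prediction, weights)
```

In this toy setup the site nearest the prediction location receives the largest weight and the distant site receives almost none, matching the slide's description.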

slide45
Chesapeake Bay Fixed Station Data

Euclidean distance may not be appropriate.

Propose a water metric

Currently kriging only works for Euclidean distance.

New methods needed.
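The gap between straight-line distance and distance through the water can be shown on a toy raster in Python. This is only one hypothetical way to define a water metric (shortest path over water cells with four-neighbor moves); the grid, the station positions, and the function name are made up, and the slides do not say how the proposed metric is computed.

```python
import math
from collections import deque

# Hypothetical raster: '~' is water, '#' is land.
grid = [
    "~~~~~",
    "~###~",
    "~###~",
    "~~#~~",
]


def water_distance(grid, start, goal):
    """Fewest four-neighbor steps from start to goal through water cells only."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "~" and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None  # goal not reachable through water


# Two hypothetical stations on opposite sides of a peninsula, as (row, col).
station_a = (3, 1)
station_b = (3, 3)

euclidean = math.dist(station_a, station_b)
through_water = water_distance(grid, station_a, station_b)
print(euclidean, through_water)
```

The two stations are 2 cells apart as the crow flies but 12 steps apart through the water, so a Euclidean kriging weight would treat them as far more similar than the water path suggests.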

slide46
Closing Remarks
  • GIS for spatial database management and hypothesis generation (posing the questions)
  • Spatial Statistics for inferential methods (answering the questions)
  • Why consider location
      • Scientific inference may depend on it
      • Gateway to environmental data
      • Source of variation worth considering
  • Biography and Geography of Public Health