Geocoding Public Health Data: Locating Street Addresses
Spatial Data Public health data are often inherently spatial: • Vital statistics records have residential street addresses • A cohort study of exposure to air pollution might consider residence and work addresses The problem is how to get these locations onto a map (i.e., into a format that is readily usable within a GIS). The process of placing such data onto a map or within a GIS is known as geocoding.
Types of Geocoding • Relational Joins for Spatially Aggregated Data • Address Matching
Aggregated Data For example: A table of data that is grouped at the county level… How do we match this up with a GIS map of counties?
Relational Join A GIS is based on the concept of relational databases, which allow us to match geographic features with the corresponding attribute data. In a GIS, a table of attributes can be “joined” with a table of geographic features based on a common identifier. Common identifiers might be: country name, county name, postal code, etc.
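The join described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular GIS implements it: the function name `join_on_key`, the field names, and all values are hypothetical.

```python
def join_on_key(features, attributes, key):
    """Left-join attribute rows onto feature rows using a shared identifier."""
    # Index the attribute table by the common identifier for fast lookup.
    lookup = {row[key]: row for row in attributes}
    joined = []
    for feature in features:
        # Features with no matching attribute row are kept, just without data.
        match = lookup.get(feature[key], {})
        joined.append({**feature, **match})
    return joined


# Hypothetical geographic features (one per county polygon).
features = [
    {"county": "Alameda", "geometry_id": 1},
    {"county": "Marin", "geometry_id": 2},
    {"county": "Napa", "geometry_id": 3},
]
# Hypothetical attribute table; the rates are made-up placeholders.
attributes = [
    {"county": "Alameda", "rate": 10.0},
    {"county": "Marin", "rate": 20.0},
]

joined = join_on_key(features, attributes, "county")
# Counties that found no attribute row, e.g. because of a spelling mismatch.
unmatched = [row["county"] for row in joined if "rate" not in row]
```

Listing the unmatched identifiers is a quick way to spot join keys that differ only in spelling or formatting.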
[Map] Source: CDC WONDER, 1999, diseases of the circulatory system (ICD codes I00-I99), age-adjusted to the year 2000 population.
Street Addresses For example: A table of individual street addresses… How do we match this up with a GIS map of streets?
Address Matching in GIS This is known as Address Matching. • Input address: 1234 University Ave • Street geography layer: street name, starting & ending address • Step 1: matching • Step 2: interpolation • Output: coordinates for the address
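The two steps, matching and interpolation, can be sketched as follows. This is a simplified illustration under stated assumptions: each street segment is a straight line with a single address range, the field names (`from_addr`, `to_addr`, `start`, `end`) are hypothetical, and the coordinates are made up.

```python
def normalize(street):
    """Uppercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(street.upper().split())


def geocode(number, street, segments):
    """Return (x, y) for a house number, or None if no segment matches."""
    target = normalize(street)
    for seg in segments:
        lo, hi = seg["from_addr"], seg["to_addr"]
        # Step 1: matching on street name and address range.
        if normalize(seg["name"]) == target and lo <= number <= hi:
            # Step 2: linear interpolation along the segment.
            t = (number - lo) / (hi - lo) if hi != lo else 0.0
            (x0, y0), (x1, y1) = seg["start"], seg["end"]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None


# One hypothetical segment covering the 1200 block.
segments = [
    {"name": "University Ave", "from_addr": 1200, "to_addr": 1298,
     "start": (0.0, 0.0), "end": (98.0, 0.0)},
]
point = geocode(1234, "university  ave", segments)
```

Real address locators also handle abbreviations, directional prefixes, and misspellings; the exact-match comparison here is only the simplest possible case.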
Geocoding TIGER The US Census Bureau’s TIGER files include street address information. TIGER = Topologically Integrated Geographic Encoding & Referencing
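[Figure: a University Ave street segment labeled with the TIGER address-range fields FRADDL and TOADDL (from/to address, left side) and FRADDR and TOADDR (from/to address, right side)]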
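A hedged sketch of how the four address-range fields might be used to decide which side of the street an address falls on. It assumes the common convention that odd and even house numbers lie on opposite sides, which does not hold everywhere; the function name `side_and_fraction` and the range values are hypothetical.

```python
def side_and_fraction(number, seg):
    """Return (side, fraction along the segment), or None if no range fits."""
    for side, from_key, to_key in (("L", "FRADDL", "TOADDL"),
                                   ("R", "FRADDR", "TOADDR")):
        start, end = seg[from_key], seg[to_key]
        lo, hi = min(start, end), max(start, end)
        # Assumption: a range holds only numbers with the parity of its start.
        if lo <= number <= hi and number % 2 == start % 2:
            # Signed arithmetic also handles ranges that run high to low.
            frac = (number - start) / (end - start) if end != start else 0.0
            return side, frac
    return None


# Hypothetical ranges: odd numbers on the left, even numbers on the right.
seg = {"FRADDL": 1201, "TOADDL": 1299, "FRADDR": 1200, "TOADDR": 1298}
even_result = side_and_fraction(1234, seg)
odd_result = side_and_fraction(1235, seg)
```

The returned fraction can then be used to interpolate a position along the segment and offset it to the chosen side.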
Geocoding Services in ArcGIS ArcGIS provides a tool known as Address Locator that lets us geocode street addresses in particular. Address matching relies on a street geography layer and on interpolating the address numbers along each street segment. ArcGIS comes with a license for StreetMap USA. For the following example, however, we will rely on TIGER files for geocoding.
Address Matching Difficulties Address matching isn’t as easy as it seems. Even in our little example, we had good matches for only around 50% of our addresses, and we tried just 18 addresses in Berkeley! Problems: • Not all mailing addresses correspond to street addresses (e.g., PO Box; 140 Warren Hall; trailer parks) • Newly developed areas lack street maps for geocoding • Data quality: poorly formatted address data and/or errors in the street geography data
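One way to anticipate some of these problems is to screen addresses before geocoding and to track the match rate afterwards. The sketch below is illustrative only: the category labels, the regular expressions, and the function names are assumptions, and real address data needs far more careful parsing.

```python
import re

# Matches "PO Box", "P.O. Box", "P O Box", and "Post Office Box" at the start.
PO_BOX = re.compile(r"^\s*(P\.?\s*O\.?\s*BOX|POST\s+OFFICE\s+BOX)\b", re.IGNORECASE)
# A street address is assumed to start with a house number and a street name.
STARTS_WITH_NUMBER = re.compile(r"^\s*\d+\s+\S+")


def screen(address):
    """Classify an address string before attempting to geocode it."""
    if PO_BOX.search(address):
        return "po_box"
    if not STARTS_WITH_NUMBER.search(address):
        return "no_house_number"
    return "candidate"


def match_rate(n_matched, n_total):
    """Fraction of addresses that geocoded successfully."""
    return n_matched / n_total if n_total else 0.0


labels = [screen(a) for a in ["PO Box 140", "Warren Hall", "1234 University Ave"]]
```

Note that a string such as "140 Warren Hall" would pass this screen as a candidate even though it names a building rather than a street, which is one reason screening cannot replace a manual review of unmatched records.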
Address Matching Difficulties • Texas DOH Guideline for Geocoding: http://www.tdh.state.tx.us/gis/Images/Docs/GUIDELINE_FOR_GEOCODING.pdf • New Jersey geocoding problems: http://www.state.nj.us/health/chs/releasable.htm • Jane McElroy’s talk (Univ. of Wisconsin), “Geocoding addresses from a large population-based study: Lessons learned and applied”: http://www.pophealth.wisc.edu/lecture/pm803-02/pm803-25slides.ppt Also a paper in 2003, Epidemiology, 14(4): 399-407.
Studies • Bonner et al., 2003, Positional Accuracy of Geocoded Addresses in Epidemiologic Research, Epidemiology, 14(4): 408-412. • 200 addresses from Erie and Niagara Counties, NY, geocoded using GIS • Compared to true locations (GPS measurements) • Any systematic errors between cases and controls in the study (differential misclassification)? • Any errors geocoding historical data? Data go back to 1918! • Errors for urban vs. non-urban addresses?
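Comparing a geocoded point with a GPS measurement comes down to computing the distance between two coordinate pairs. The sketch below uses the haversine great-circle formula with a mean Earth radius; this is a common approximation chosen here for illustration, not necessarily the method used in the study, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_error_m(pairs):
    """Mean positional error over ((geocoded lat, lon), (true lat, lon)) pairs."""
    errors = [haversine_m(*geocoded, *true) for geocoded, true in pairs]
    return sum(errors) / len(errors)
```

Computing `mean_error_m` separately for cases and controls, or for urban and non-urban addresses, gives the kind of comparison the questions above call for.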
Results • Good geocoding of historical data • Accuracy of urban addresses: 32 m vs. nonurban addresses: 52 m • Whether or not this matters in terms of classification of exposures depends upon the study and the geographic scale at which exposure changes. • e.g., proximity to a nuclear plant vs. EMF
Another study • Cayo et al., 2003, Positional error in automated geocoding of residential addresses, International Journal of Health Geographics, 2:10. • Compares geocodes to locations on orthophotos • Confirms differences in urban vs. suburban vs. rural geocoding (mean accuracies 58 m, 143 m, and 614 m, respectively) • Parcel data helps
Give it to someone else to do? • Krieger, et al., 2001, On the Wrong Side of the Tracts? Evaluating the Accuracy of Geocoding in Public Health Research, AJPH, 91(7): 1114-1116. • Protocols for evaluating commercial geocoding service providers. • Spiking a file with errors on purpose to test companies.