
Spatial Data Analysis Areas I: Rate Smoothing and the MAUP

Ifgi, Muenster, Fall School 2005. Spatial Data Analysis Areas I: Rate Smoothing and the MAUP. Gilberto Câmara, INPE, Brazil. Areal data: the study region is partitioned into disjoint areas; the region is the union of the areas; each map has one or more associated measures.


Presentation Transcript


  1. Ifgi, Muenster, Fall School 2005 • Spatial Data Analysis Areas I: Rate Smoothing and the MAUP • Gilberto Câmara, INPE, Brazil

  2. Areal data • The study region is partitioned into disjoint areas • The region is the union of the areas • Each map has one or more associated measures • These are treated as random variables • Examples: • Map of Germany divided into municipalities. For each area, we measure the unemployment rate and the literacy rate. • Is unemployment correlated with years of schooling? • What about Brazil?

  3. Violence in Minas Gerais

  4. Violence in Minas Gerais

  5. Violence in Minas Gerais

  6. Attributes in areal data • As a general rule, each measure is a sum, count or similar aggregate function over the whole area • Each value is associated with the entire corresponding area • If we need to choose a single location, we usually take the polygon centroid • There are no intermediate values

  7. What is mapped in areal data? • Typical values are rates or proportions • Numerator = events • Denominator = population at risk • Log maps?
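The rate construction on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `rate_per_100k` and `log_rate` are hypothetical, and the per-100,000 scaling is borrowed from the map titles on the following slides.

```python
import math

def rate_per_100k(events, population_at_risk):
    """Raw rate: events (numerator) over population at risk (denominator), per 100,000."""
    return 100_000 * events / population_at_risk

def log_rate(events, population_at_risk):
    """Natural log of the raw rate; None when there are no events (log undefined)."""
    rate = rate_per_100k(events, population_at_risk)
    return math.log(rate) if rate > 0 else None

# Hypothetical area: 5 events among 250,000 residents at risk.
example_rate = rate_per_100k(5, 250_000)
example_log_rate = log_rate(5, 250_000)
```

Areas with zero events have no defined log rate, which is one practical reason a log map needs care in sparsely populated areas.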

  8. Log rate of motor vehicle accident deaths per 100,000 residents, 1990-92

  9. Log rate of homicide deaths of males aged 15-49 per 100,000 residents of the same age group, 1990-92

  10. Models of Discrete Spatial Variation • Random variable in area i: • number of ill people • number of newborn babies • per capita income Source: Renato Assunção (UFMG/Brasil)

  11. Dealing with rates and proportions • When the study variable is a rate or a proportion, mapping those rates is the obvious first step in any analysis. However, raw observed rates can be misleading, since their variability is a function of the population counts, which differ widely between areas (Bailey, 1995).
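To make the dependence on population concrete, here is a small sketch that assumes event counts are Poisson with mean population × true rate (a modelling assumption, not something stated on this slide). Under that assumption the standard deviation of the raw rate is sqrt(true_rate / population). The function name `raw_rate_std` and the population sizes are illustrative.

```python
import math

def raw_rate_std(true_rate, population):
    """Standard deviation of the raw rate events/population,
    assuming events ~ Poisson(population * true_rate)."""
    return math.sqrt(true_rate / population)

# Same underlying risk (1 event per 1,000 people), very different populations.
true_rate = 0.001
spread = {pop: raw_rate_std(true_rate, pop) for pop in (500, 5_000, 500_000)}

for pop, sd in spread.items():
    # Expressed per 100,000 residents to match the usual map units.
    print(f"population {pop:>7}: std of raw rate = {100_000 * sd:.1f} per 100,000")
```

With identical true risk, the smallest area's raw rate fluctuates far more than the largest one's, so extreme values on a raw-rate map tend to come from sparsely populated areas.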

  12. Source: Fred Ramos (CEDEST/Brasil)

  13. Model-Driven Approaches • Model of discrete spatial variation • Each subregion i is described by a statistical distribution Zi • e.g., homicide counts are Poisson-distributed • The main objective of the analysis is to estimate the joint distribution of the random variables Z = {Z1,…,Zn} • We use a model-driven approach to correct the missing data • It is called the “Empirical Bayes” method... • We could also use the “Full Bayes” method (but that is another story...)

  14. In Bayesian statistics, the best estimate of the true and unknown rate in area i is [equation lost in extraction], defined in terms of the measured rate. Source: Fred Ramos (CEDEST/Brasil)

  15. Empirical Bayes • Simplifying assumptions for estimating the means and variances of the random variables of all areas (Marshall, 1991) Source: Fred Ramos (CEDEST/Brasil)
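As an illustration of this kind of smoothing, the sketch below implements one commonly cited form of the global Empirical Bayes estimator associated with Marshall (1991), using method-of-moments estimates for the overall mean and variance. It is a sketch under that assumption and may not match the slide's notation; `empirical_bayes_rates` and the sample counts are hypothetical.

```python
def empirical_bayes_rates(events, populations):
    """Shrink each area's raw rate toward the global rate.

    Assumed form (global, method of moments):
        m     = sum(events) / sum(populations)            # global mean rate
        s2    = sum(n_i * (r_i - m)^2) / sum(populations)  # weighted variance of raw rates
        a     = max(s2 - m / mean(populations), 0)         # estimated variance of true rates
        w_i   = a / (a + m / n_i)                          # weight on the raw rate
        est_i = m + w_i * (r_i - m)
    """
    total_pop = sum(populations)
    k = len(populations)
    raw = [y / n for y, n in zip(events, populations)]
    m = sum(events) / total_pop
    s2 = sum(n * (r - m) ** 2 for r, n in zip(raw, populations)) / total_pop
    a = max(s2 - m / (total_pop / k), 0.0)

    smoothed = []
    for r, n in zip(raw, populations):
        denom = a + m / n
        w = a / denom if denom > 0 else 0.0
        smoothed.append(m + w * (r - m))
    return smoothed

# Made-up counts: two small areas and two large ones.
sample_events = [1, 0, 30, 300]
sample_pops = [100, 150, 10_000, 50_000]
sample_smoothed = empirical_bayes_rates(sample_events, sample_pops)
```

The weight w_i approaches 1 for large populations, so their rates barely move, while rates from small populations are pulled strongly toward the global mean.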

  16. Source: Fred Ramos (CEDEST/Brasil)

  17. Infant Mortality Rate – São Paulo (Raw) Source: Fred Ramos (CEDEST/Brasil)

  18. Infant Mortality Rate – São Paulo (Corrected) Source: Fred Ramos (CEDEST/Brasil)

  19. Some Important Questions • How does scale matter? • How do the spatial partitions matter? • How does proximity matter? • What can we learn by studying how multiple variables vary in space? • How many prior assumptions can we impose on our spatial data?

  20. A Question of Scale: The Modifiable Areal Unit Problem (MAUP) • A basic problem with areal data • The spatial definition of the boundaries of the areas affects the results • Different results can be obtained just by changing the boundaries of these zones • This is known as the “modifiable areal unit problem”
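A toy example can make the zoning effect tangible. The sketch below uses eight made-up base units along a line, aggregates two variables by zone means under two different zonings, and compares the Pearson correlations. All data, zonings and function names are invented for illustration and are unrelated to the São Paulo data shown later.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def aggregate(values, zoning):
    """Mean of the base-unit values inside each zone (zones are tuples of indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zoning]

# Made-up values for eight base units arranged in a row.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two ways of drawing contiguous zone boundaries over the same units.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
zoning_b = [(0,), (1, 2), (3, 4), (5, 6), (7,)]

r_base = pearson(x, y)
r_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))
print(f"base units: {r_base:.3f}, zoning A: {r_a:.3f}, zoning B: {r_b:.3f}")
```

Nothing about the underlying units changes between the three numbers; only the boundaries do, which is the essence of the problem described on this slide.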

  21. Scale Effects • Per capita income • Jobs / population • Illiterate / population Source: Fred Ramos (CEDEST/Brasil)

  22. Scale Effects • Per capita income • Jobs / population • Illiterate / population Source: Fred Ramos (CEDEST/Brasil)

  23. Scale Effects: Fighting the MAUP • Population >60 years • Illiterates • Per capita income • 270 ZONES OD97 Source: Fred Ramos (CEDEST/Brasil)

  24. Scale Effects: Fighting the MAUP • Population >60 years • Illiterates • Per capita income • 96 DISTRICTS OF SÃO PAULO Source: Fred Ramos (CEDEST/Brasil)

  25. Scale Effects: Fighting the MAUP • Population >60 years • Illiterates • Per capita income • 96 INCOME-HOMOGENEOUS ZONES IN SÃO PAULO Source: Fred Ramos (CEDEST/Brasil)

  26. Correlation matrices • Zonings compared: 270 ZONES OD97, 96 DISTRICTS, 96 INCOME-AGGREGATED • Variables: A) Percentage of population aged 60 or older B) Percentage of illiterate population C) Per capita individual income Source: Fred Ramos (CEDEST/Brasil)

  27. The Question of Scale • Get census data • Adaptation • Identify inter-tract variation • Reduce data variability • Minimize the outlier effect

  28. Regionalization • Reaggregate N small areas (the finest scale available) into M bigger regions to reduce scale effects • A possible solution: constrained clustering

  29. Regionalization: Maps as graphs

  30. Regionalization: Maps as graphs • Simple aggregation • Population-constrained aggregation
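One way to picture population-constrained aggregation on a graph is the greedy sketch below. Areas are nodes, shared borders are edges, and the smallest under-populated region is repeatedly merged into its least populous neighbour until every region reaches a minimum population. This is a simplified illustration, not the method used in the presentation; the function name, the merge rule and the sample data are all assumptions.

```python
def aggregate_by_population(adjacency, population, min_population):
    """Greedily merge adjacent areas until each region has at least min_population.

    adjacency: dict mapping each area to a list of neighbouring areas.
    population: dict mapping each area to its population.
    Returns a sorted list of regions, each a sorted list of area names.
    """
    region_of = {area: area for area in adjacency}   # area -> region label
    members = {area: {area} for area in adjacency}   # region label -> areas
    region_pop = dict(population)                    # region label -> population

    def neighbours(region):
        # Regions touching this one through any member area.
        return {region_of[n] for a in members[region] for n in adjacency[a]} - {region}

    while True:
        candidates = [r for r in members
                      if region_pop[r] < min_population and neighbours(r)]
        if not candidates:
            break  # all regions are large enough, or the small ones are isolated
        small = min(candidates, key=lambda r: (region_pop[r], r))
        target = min(neighbours(small), key=lambda r: (region_pop[r], r))
        for area in members[small]:
            region_of[area] = target
        members[target] |= members.pop(small)
        region_pop[target] += region_pop.pop(small)

    return sorted(sorted(group) for group in members.values())

# Hypothetical chain of five areas: a - b - c - d - e.
sample_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
                    "d": ["c", "e"], "e": ["d"]}
sample_population = {"a": 100, "b": 300, "c": 50, "d": 400, "e": 500}
sample_regions = aggregate_by_population(sample_adjacency, sample_population, 400)
```

Because merges only follow graph edges, every resulting region stays spatially contiguous; practical constrained-clustering methods would also take attribute similarity into account when choosing which neighbour to merge with.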
