Spatial Association and spatial statistic techniques


Danlin Yu, Ph.D. Candidate, Dept. of Geography, UWM

**Detecting Spatial Association**

- What is spatial association?
  - Spatial objects tend to be related to one another
- Types of spatial association
  - Spatial autocorrelation: similar (or dissimilar) values tend to cluster together in space
  - Spatial heterogeneity: spatial regimes; space is not homogeneous
  - Autocorrelation and heterogeneity are closely related

**Detecting spatial association**

- Why study spatial association?
  - It is inherent in geographic research
  - When working with spatial data, analyses based on conventional (non-spatial) statistics are VERY likely to be misleading or incorrect
- How to detect spatial association
  - The power of GIS
  - Exploratory Spatial Data Analysis (ESDA): let the data speak

**Background**

- The first law of geography (Tobler):
  - Everything is related, but things nearby are more related than things far away
- Characteristics of spatial statistics
  - The existence of spatial association violates an important statistical assumption: independence of observations
  - Spatial patterns are the results of spatial processes; the pattern we see is only one of numerous possible outcomes of the same spatial process

**Types of spatial association**

- Point spatial association
  - Distance is critical in determining point spatial association
- Line spatial association
  - Distance and path
- Areal spatial association
  - Distance and contiguity

**Today's topic: univariate SA**

- Univariate: for pattern detection
  - Examples: per capita GDP for patterns of economic performance; surface temperature for local climate patterns, etc.
  - Central question: is the pattern we see the result of some specific process (usually a random or normal process, which is our null hypothesis)?
- Multivariate: spatial regression or geographically weighted regression (GWR)

**Research methods**

- Hypothesis testing to answer this question is carried out with spatial statistics
- For univariate geographic data, the literature offers a few indexes:
  - Moran's Index (Moran's I)
  - Geary's Index (Geary's c)
  - Getis's G or O

**Spatial statistic indexes**

- The three indexes serve very similar purposes: calculate an index from the geographic data, then test the index against the null hypothesis
- The most commonly encountered index is Moran's I
- The discussion of Moran's I applies to the other indexes, subject to minor adjustments

**Moran's Index (I)**

- Structured like Pearson's product-moment correlation coefficient: a measure of covariance

**Moran's I**

$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i} (y_i - \bar{y})^2}$$

- $w_{ij}$ is the weight: $w_{ij} = 1$ if locations i and j are adjacent and 0 otherwise ($w_{ii} = 0$; a region is not adjacent to itself)
- $y_i$ and $\bar{y}$ are the value of the variable at the ith location and the mean of the variable, respectively
- n is the total number of observations
- I is used to test hypotheses concerning similarity

**Determining the weights**

- Two rules:
  - Distance: locations within a certain distance of each other are considered neighbors
  - Border sharing (areal units only): areas that share a border are considered neighbors
- The weights matrix can be symmetric or asymmetric: a binary weights matrix, or a general weights matrix (e.g., distance decay)

**Determining the weights**

- The spatial weights matrix should be constructed judiciously
- Ideally, it should relate to general concepts from spatial interaction theory, such as the notions of accessibility and potential

**Determining the weights**

- When the matrix is used for hypothesis testing, this requirement is less stringent
  - since our purpose is only to test the null hypothesis of spatial independence
- Still, trying a few structures is a good idea: border sharing, different distances

**Determining the weights**

- A typical symmetric weights matrix is a binary weights matrix in which neighbors are coded 1 and all other locations 0
- In practice, it
is usually row standardized: all elements of each row sum to 1

**Hypothesis testing**

- The expected value and the variance of Moran's I are used for testing; under the null hypothesis, $E(I) = -1/(n-1)$
- However, Moran's I is often observed not to follow a normal distribution under the null hypothesis
- Alternatives:
  - Random permutation
  - Saddlepoint approximation

**Hypothesis testing**

- Monte Carlo (random) permutation for Moran's I:
  - Randomly rearrange the observed values among the locations and calculate I each time (e.g., 999 times)
  - Compare the actual I with the 999 randomly generated values of I
  - If the actual I falls above the 95th percentile or below the 5th percentile of these values, I is said to be pseudo-significant at the 5% level (positive or negative autocorrelation, respectively)

**Hypothesis testing**

- Saddlepoint approximation (Tiefelsdorf, 2001)
  - The exact distribution of Moran's I can be obtained, but it is computationally prohibitive for even medium-sized data sets
  - A saddlepoint distribution approximates the exact distribution with reasonable accuracy
  - It is based on the ratio of quadratic forms in normal variables
- Usually, random permutation does the job

**Global and local (1)**

- The Moran's I just introduced is based on simultaneous measurements from many locations; hence, it is a GLOBAL statistic
- A global statistic provides only a limited set of spatial association measurements
- You see the overall pattern, but the details are ignored: the tree-and-forest dilemma

**Global and local (2)**

- More recently, a number of statistics have been developed to measure dependence in portions of the study area: the local statistics
- In spatial data analysis, these are called Local Indicators of Spatial Association (LISA), after Anselin (1995)

**Global and local (3)**

- Definition of LISA (Anselin, 1995):
  - The local statistic for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation
  - The sum of the local statistics for all observations is proportional (or equal) to a corresponding global statistic

**Global and local (4)**

- Local statistics are well
suited to:
  - Identifying the existence of pockets or "hot spots"
  - Assessing assumptions of stationarity
  - Identifying distances beyond which no discernible association obtains
- Global and local statistics are often used together for a thorough understanding of spatial association and processes

**Global and local (5)**

- This discussion is based on the decomposition of Moran's I into its local version
- Other indexes can be decomposed similarly; however, Moran's I has an important property that will aid further understanding in spatial analysis:
  - It can be decomposed into a local version AND a graphic version: the Moran scatterplot

**Local Moran's I**

- Following Anselin's (1995) definition, a local Moran's $I_i$ may be defined as:

$$I_i = z_i \sum_{j} w_{ij} z_j$$

- The $z_i$ are the deviations of the $y_i$ from their mean ($z_i = y_i - \bar{y}$)
- The weights are row standardized

**Local Moran's I**

- Hypothesis testing for local Moran's I is more complex
  - The distribution of local Moran's I is definitely not normal; furthermore, it is influenced by the global pattern
  - Random permutation won't work: for a given location, the mean and variance of its local Moran's I keep changing during the permutation, which is not the case for the global statistic

**Local Moran's I**

- The exact distribution of local Moran's I can be obtained, but it is extremely computationally prohibitive
- The saddlepoint approximation is, so far, one potential resolution
- Details can be found in Tiefelsdorf (2000; 2002)

**Local Moran's I**

- In addition, the local Moran's $I_i$ values are correlated with one another because of overlapping neighborhoods
- A Bonferroni correction or another multiple-testing correction is needed to obtain robust test results
- All of these are implemented in the SPDEP package in R

**Moran's scatterplot**

- A graphic tool for detecting local spatial association
- Derived directly from the global Moran's I
- It can be used together with local Moran's I for better understanding

**Moran's scatterplot**

- Recall the formula of Moran's I:

$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i} (y_i - \bar{y})^2}$$

- If we use a row-standardized weights matrix, the first
term, $n / \sum_{i}\sum_{j} w_{ij}$, equals 1: each row sums to 1, so $\sum_{i}\sum_{j} w_{ij} = n$ (assuming every location has at least one neighbor)

**Moran's scatterplot**

- Therefore, writing $z_i = y_i - \bar{y}$, I can be rewritten as:

$$I = \frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^2}$$

- Or:

$$I = \frac{\sum_{i} z_i \left( \sum_{j} w_{ij} z_j \right)}{\sum_{i} z_i^2}$$

**Moran's scatterplot**

- Recall the coefficient b of a linear regression:

$$b = \frac{\sum_{i} (ind_i - \overline{ind})(dep_i - \overline{dep})}{\sum_{i} (ind_i - \overline{ind})^2}$$

- $ind_i$ and $dep_i$ are the independent and dependent variables; $\overline{ind}$ and $\overline{dep}$ are their respective means; and b is the regression coefficient

**Moran's scatterplot**

- Note the similarity between Moran's I and the regression coefficient b
- In fact, $\sum_{j} w_{ij} z_j$ is the so-called "spatial lag" of location i
- Setting $ind_i = z_i$ and $dep_i = \sum_{j} w_{ij} z_j$, and using $\sum_{i} z_i = 0$, b reduces exactly to the second form of I above
- So, I is formally equivalent to the coefficient in a regression of a location's spatial lag on its own value

**Moran's Scatterplot**

- This interpretation lets us visualize Moran's I in a scatterplot of each location's spatial lag against its own value: the Moran scatterplot
- Moran's I is the slope of the regression line
- A poor fit in the scatterplot would indicate important local spatial processes and associations (local pockets, non-stationarity)

**Moran's scatterplot**

- The scatterplot is centered on the origin of the coordinate system
- The first and third quadrants represent positive association (high-high and low-low), while the second and fourth represent negative association (low-high and high-low, respectively)
- The density of points in each quadrant indicates the dominant local spatial process

**Moran's scatterplot**

- A LOWESS (LOcally WEighted Scatterplot Smoothing) curve can aid visual interpretation
- Turns in the LOWESS curve usually indicate interesting local pockets, regimes, or non-stationarity
- An example: demonstration in R

**More about Moran's Scatterplot**

- A very important ESDA tool for spatial data analysis
- Further information: Anselin, L. (1996). The Moran Scatterplot as an ESDA tool to assess local instability in spatial association. pp. 111–125 in M. M. Fischer, H. J. Scholten and D. Unwin (eds.)
*Spatial analytical perspectives on GIS*. London: Taylor and Francis.

**An analytical example**

- Spatial pattern detection in China's provincial development
- Variable used: per capita GDP
- Dynamic patterns: global Moran's I
- Specific local spatial processes: local Moran's I and the Moran scatterplot

**An analytical example**

[Figure: Dynamic change of global Moran's I from 1978 to 2000; all values are significant at the 5% level per random permutation]

**An analytical example**

- There is a clustering trend in China's provincial-level development (represented by per capita GDP)
- But the global Moran's I cannot tell on which side the clustering takes place: do high values cluster, or low values?

**An analytical example**

- First, China's coast-interior divide persisted
  - Interior provinces exhibit great geographical similarity in economic development and in their spatial contributions to the global Moran's I
- Second, the municipalities (Beijing, Tianjin, Shanghai) always contribute the most
  - Shanghai's position is worth noting: its development changed the spatial pattern the most

**An analytical example**

- Third, Guangdong's contribution to the global index corresponds with its changing spatial behavior as depicted in the Moran scatterplot
- Fourth, while most interior provinces have similar patterns, coastal provinces vary greatly

**An analytical example**

- Fifth, Shandong fell into the low-low quadrant and contributed very little to the global index
- Sixth, Guizhou and Yunnan, two provinces in southwest China, contributed relatively heavily to the global index in 2000
  - The poorest provinces tend to form a poor cluster

**Demo – with R and SPDEP**

- A short demonstration
- The software package R: free, powerful, open source
- Packages: SPDEP and MAPTOOLS
- If you have spatial data and are interested in using ESDA, you are welcome to approach me about your research
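The weights slides above describe two neighbor rules (border sharing and distance) and the usual row standardization. Below is a minimal Python/NumPy sketch of both rules. It is not the R/SPDEP code used in the demo. It assumes a regular grid of square cells can stand in for border-sharing areal units, with cells that share an edge counted as neighbors ("rook" contiguity). The helpers `rook_contiguity`, `distance_band` and `row_standardize` are hypothetical names chosen for illustration.

```python
import numpy as np

def rook_contiguity(n_rows, n_cols):
    """Binary weights for the cells of an n_rows x n_cols grid:
    w_ij = 1 if cells i and j share an edge, 0 otherwise (w_ii = 0)."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c                      # cells numbered row by row
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w

def distance_band(coords, max_dist):
    """Binary weights: w_ij = 1 if points i and j are within max_dist
    (Euclidean) of each other, 0 otherwise; the diagonal is set to 0."""
    pts = np.asarray(coords, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= max_dist).astype(float)
    np.fill_diagonal(w, 0.0)
    return w

def row_standardize(w):
    """Scale each row to sum to 1; rows with no neighbors stay all zero."""
    row_sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
```

Both binary matrices are symmetric. Row standardization generally makes them asymmetric, because locations with different numbers of neighbors get different weights. Leaving a neighborless location as a row of zeros is one possible choice here, not the only one.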
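The global Moran's I formula and the Monte Carlo permutation test from the hypothesis-testing slides can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the SPDEP implementation. The names `chain_weights`, `morans_i`, `permutation_test` and `expected_i` are hypothetical. The pseudo p-value `(count + 1) / (n_perm + 1)` is a common convention assumed here; the slides only say to compare the observed I with the 999 simulated values.

```python
import numpy as np

def chain_weights(n):
    """Binary weights for n locations on a line: i and i+1 are neighbors."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

def morans_i(y, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z = y - mean(y) and S0 is the sum of all weights."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return (len(y) / w.sum()) * (z @ w @ z) / (z @ z)

def expected_i(n):
    """Expected value of Moran's I under the null of no autocorrelation."""
    return -1.0 / (n - 1)

def permutation_test(y, w, n_perm=999, seed=0):
    """Monte Carlo permutation test: shuffle the values over the locations
    n_perm times and compare the observed I with the simulated values.
    Returns the observed I and one-sided pseudo p-values for positive
    and negative autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, w)
    sims = np.array([morans_i(rng.permutation(y), w) for _ in range(n_perm)])
    p_pos = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    p_neg = (np.sum(sims <= observed) + 1) / (n_perm + 1)
    return observed, p_pos, p_neg
```

Take six locations on a line with values 1 to 6 and binary weights. Here `morans_i` gives 0.6, well above the null expectation of -1/(n-1) = -0.2. A `p_pos` below 0.05 corresponds to the slide's case of the observed I lying above the 95th percentile of the permuted values. With 999 permutations, the smallest attainable pseudo p-value is 0.001.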
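The local Moran's I definition and the Moran scatterplot derivation can be checked numerically. Below is a minimal NumPy sketch, assuming row-standardized weights in which every location has at least one neighbor. For each location it computes $I_i = z_i \sum_j w_{ij} z_j$, the spatial lag, the slope of the lag-on-value regression, and the scatterplot quadrant. It does not attempt the saddlepoint or multiple-testing-corrected inference discussed on the slides. `local_morans_i` and `moran_scatterplot` are hypothetical helper names, not SPDEP functions.

```python
import numpy as np

def local_morans_i(y, w_std):
    """Local Moran's I_i = z_i * sum_j w_ij z_j, with z_i = y_i - mean(y)
    and w_std a row-standardized weights matrix."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    lag = w_std @ z              # spatial lag of each location
    return z * lag, z, lag

def moran_scatterplot(y, w_std):
    """Return the local I_i values, the slope of the regression of the
    spatial lag on z, and each location's scatterplot quadrant."""
    local, z, lag = local_morans_i(y, w_std)
    # z has mean 0, so the OLS slope (with intercept) reduces to this ratio,
    # which is also the global Moran's I for row-standardized weights.
    slope = (z @ lag) / (z @ z)
    # Quadrant labels: location's own value first, its spatial lag second.
    # Exact zeros are treated as "L" here, an arbitrary choice.
    quads = [("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
             for zi, li in zip(z, lag)]
    return local, slope, quads

# Example: six locations on a line, binary adjacency, row-standardized.
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1.0
w_std = w / w.sum(axis=1, keepdims=True)
local, slope, quads = moran_scatterplot([1, 2, 3, 4, 5, 6], w_std)
```

In this example the three low values fall in the low-low quadrant and the three high values in the high-high quadrant. The slope, 5/7 (about 0.714), equals both the sum of the local $I_i$ divided by $\sum_i z_i^2$ and the global Moran's I computed with the same row-standardized weights. With binary (unstandardized) weights, the same six values give I = 0.6, so the choice of weighting scheme matters.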