Spatial Statistics. Concepts (O&U Ch. 3) Centrographic Statistics (O&U Ch. 4 p. 7781) single, summary measures of a spatial distribution Point Pattern Analysis (O&U Ch 4 p. 81114)  pattern analysis; points have no magnitude (“no variable”) Quadrat Analysis
Related searches for Spatial Statistics
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Concepts (O&U Ch. 3)
Centrographic Statistics (O&U Ch. 4 p. 7781)
single, summary measures of a spatial distribution
Point Pattern Analysis (O&U Ch 4 p. 81114)
 pattern analysis; points have no magnitude (“no variable”)
Quadrat Analysis
Nearest Neighbor Analysis
Spatial Autocorrelation (O&U Ch 7 pp. 180205
One variable
The Weights Matrix
Join Count Statistic
Moran’s I (O&U pp 196201)
Geary’s C Ratio (O&U pp 201)
General G
LISA
Correlation and Regression
Two variables
Standard
Spatial
Briggs UTDallas GISC 6382 Spring 2007
We will be looking at both!
Briggs UTDallas GISC 6382 Spring 2007
Formulae for variance
å
n
2
å
2
(
X
X
)
n
å
X
[(
X
)
/
N
]

2
i

i
=
=
=
1
i
i
1
N
N
Classic Descriptive Statistics: UnivariateMeasures of Central Tendency and DispersionThese may be obtained in ArcGIS by:
opening a table, right clicking on column heading, and selecting Statistics
going to ArcToolbox>Analysis>Statistics>Summary Statistics
Briggs UTDallas GISC 6382 Spring 2007
2.5%
1.96
1.96
0
Classic Descriptive Statistics: UnivariateFrequency distributionsA counting of the frequency with which values occur on a variable
In ArcGIS, you may obtain frequency counts on a categorical variable via:
ArcToolbox>Analysis>Statistics>Frequency
Briggs UTDallas GISC 6382 Spring 2007
n
2
(
Y
Y
)
Sy=
å

i
n


(
x
X
)(
y
Y
)
=
1
i
i
i
=
N
=
i
1
r
n
S
S
å
n
2
(
X
X
)
x
y
SX=

i
=
i
1
N
Classic Descriptive Statistics: BivariatePearson Product Moment Correlation Coefficient (r)1 implies perfect negative association
0 implies no association
+1 implies perfect positive association
Where Sx and Sy are the standard deviations of X and Y, and X and Y are the means.
Briggs UTDallas GISC 6382 Spring 2007
Classic Descriptive Statistics: BivariateCalculation Formulae for Pearson Product Moment Correlation Coefficient (r)
Correlation Coefficient example using “calculation formulae”
As we explore spatial statistics, we will see many analogies to the mean, the variance, and the correlation coefficient, and their various formulae
There is an example of calculation later in this presentation.
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
2.5%
1.96
0
1.96
Statistical Hypothesis Testing: Classic ApproachStatistical hypothesis testing usually involves 2 values; don’t confuse them!
We proceed from the null hypothesis (Ho ) that, in the population, there is “no difference” between the two sample statistics, or from spatial randomness*
If the test statistic is beyond +/ 1.96 (assuming a Normal distribution), we reject the null hypothesis (of no difference) and assume a statistically significant difference at at least the 0.05 significance level.
*O’Sullivan and Unwin use the term IRP/CSR: independent random process/complete spatial randomness
Briggs UTDallas GISC 6382 Spring 2007
Our observed value:
highly unlikely to have occurred if the process was random
conclude that process is not random
Empirical frequency distribution from 499 random patterns (“samples”)
This approach is used in Anselin’s GeoDA software
From O’Sullivan and Unwin p.69
Briggs UTDallas GISC 6382 Spring 2007
Processes differ from random in two fundamental ways
In practice, it is very difficult to disentangle these two effects merely by the analysis of spatial data
Briggs UTDallas GISC 6382 Spring 2007
What do we mean by spatially random?
UNIFORM/
DISPERSED
CLUSTERED
RANDOM
Measures of CentralityMeasures of Dispersion
This is a repeat of material from GIS Fundamentals. To save time, we will not go over it again here.Go to Slide # 25
Briggs UTDallas GISC 6382 Spring 2007
Distant points have large effect.
Provides a single point summary measure
for the location of distribution.
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
7,7
10
10
4,7
7,7
7,3
2,3
5
5
6,2
0
0
10
10
5
5
7,3
2,3
6,2
0
0
Calculating the centroid of a polygon or the mean center of a set of points.
(same example data as for area of polygon)
Calculating the weighted mean center. Note how it is pulled toward the high weight point.
Briggs UTDallas GISC 6382 Spring 2007
Source: Neft, 1966
Briggs UTDallas GISC 6382 Spring 2007
Median and Mean Centers for US Population
Median Center:
Intersection of a north/south and an east/west line drawn so half of population lives above and half below the e/w line, and half lives to the left and half to the right of the n/s line
Mean Center:
Balancing point of a weightless map, if equal weights placed on it at the residence of every person on census day.
Source: US Statistical Abstract 2003
Briggs UTDallas GISC 6382 Spring 2007
deviation of single variable
Standard Distance Deviationwhich by Pythagorasreduces to:
essentially the average distance of points from the center
Provides a single unit measure of the spread or dispersion of a distribution.
We can also calculate a weighted standard distance analogous to the weighted mean center.
Or, with weights
Briggs UTDallas GISC 6382 Spring 2007
7,7
10
7,3
2,3
5
6,2
0
10
5
0
Standard Distance Deviation ExampleCircle with radii=SDD=2.9
Briggs UTDallas GISC 6382 Spring 2007
The major axis defines the direction of maximum spreadof the distribution
The minor axis is perpendicular to itand defines the minimum spread
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
example
There appears to be no major difference between the location of the software and the telecommunications industry in North Texas.
Briggs UTDallas GISC 6382 Spring 2007
Analysis of spatial properties of the entire body of points rather than the derivation of single summary measures
Two primary approaches:
Although the above would suggest that the first approach examines first order effects and the second approach examines second order effects, in practice the two cannot be separated.
See O&U pp. 8188
Briggs UTDallas GISC 6382 Spring 2007
used for secondary
(e.g census) data
Random sampling
useful in field work
Frequency counts by Quadrat would be:
Multiple ways to create quadrats
and results can differ accordingly!
Quadrats don’t have to be square
and their size has a big influence
Briggs UTDallas GISC 6382 Spring 2007
P
Quadrat Analysis: Variance/Mean Ratio (VMR)Where:
A = area of region
P = # of points
See following slide for example. See O&U p 98100 for another example
Briggs UTDallas GISC 6382 Spring 2007
x
x
UNIFORM/
DISPERSED
CLUSTERED
random
uniform
Clustered
Formulae for variance
2
RANDOM
Note:
N = number of Quadrats = 10
Ratio = Variance/mean
Briggs UTDallas GISC 6382 Spring 2007
clustered
200(202)/10=80
2
random
60(202)/10=10
2
uniform
40(202)/10=0
2
(See O&U p 98100)
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
D = max [ Cum Obser. Freq – Cum Expect. Freq]
The largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency
D (at 5%) = 1.36where Q is the number of quadrats
Q
p(0) = eλ= 1 / (2.71828P/Q) and p(x) = p(x  1) * λ/x
Where x = number of points in a quadrat and p(x) = the probability of x points
P = total number of points Q = number of quadrats
λ = P/Q (the average number of points per quadrat)
See next slide for worked example for cluster case
The spreadsheet spatstat.xls contains worked examples for the Uniform/ Clustered/ Random data previously used, as well as for Lee and Wong’s data
For example, quadrat analysis cannot distinguish between these two, obviously different, patterns
For example, overall pattern here is dispersed, but there are some local clusters
Briggs UTDallas GISC 6382 Spring 2007
NNI=Observed Aver. Dist / Expected Aver. Dist
For random pattern, NNI = 1
For clustered pattern, NNI = 0
For dispersed pattern, NNI = 2.149
Standard Error
if Z is below –1.96 or above +1.96, we are 95% confident that the distribution is not randomly distributed. (If the observed pattern was random, there are less than 5 chances in 100 we would have observed a z value this large.)
(in the example that follows, the fact that the NNI for uniform is 1.96 is coincidence!)
Index
Where:
Significance test
Briggs UTDallas GISC 6382 Spring 2007
Mean distance
Mean distance
NNI
NNI
NNI
RANDOM
CLUSTERED
UNIFORM
Z = 5.508
Z = 0.1515
Z = 5.855
Source: Lembro
Briggs UTDallas GISC 6382 Spring 2007
The instantiation of Tobler’s first law of geography
Everything is related to everything else, but near things are more related than distant things.
Correlation of a variable with itself through space.
The correlation between an observation’s value on a variable and the value of closeby observations on the same variable
The degree to which characteristics at one location are similar (or dissimilar) to those nearby.
Measure of the extent to which the occurrence of an event in an areal unit constrains, or makes more probable, the occurrence of a similar event in a neighboring areal unit.
Several measures available:
Join Count Statistic
Moran’s I
Geary’s C ratio
General (GetisOrd) G
Anselin’s Local Index of Spatial Autocorrelation (LISA)
These measures may be “global” or “local”
Autocorrelation
Positive: similar values cluster together on a map
Auto:
self
Correlation:
degree of
relative
correspondence
Source: Dr Dan Griffith, with modification
Negative: dissimilar values cluster together on a map
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
For Regular Polygons
rook case or queen case
For Irregular polygons
For points
select the contiguity criteria
construct n x n weights matrix with 1 if contiguous, 0 otherwise
An archive of contiguity matrices for US states and counties is at: http://sal.uiuc.edu/weights/index.html(note: the .gal format is weird!!!)
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Lee and Wong pp. 147156
Small proportion (or count) of BW joins
Large proportion of BB and WW joins
Dissimilar proportions (or counts) of BW, BB and WW joins
Large proportion (or count) of BW joins
Small proportion of BB and WW joins
Briggs UTDallas GISC 6382 Spring 2007
Expected given by:
SD of Expected
Standard Deviation of Expected given by:
Where: k is the total number of joins (neighbors)
pB is the expected proportion Black
pW is the expected proportion White
m is calculated from k according to:
Note: the formulae given here are for free (normality) sampling. Those for nonfree (randomization) sampling are substantially more complex. See Wong and Lee p. 151 compared to p. 155
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Where: I is the calculated value for Moran’s I from the sample
E(I) is the expected value (mean)
S is the standard error
Statistical Significance Tests for Moran’s IE(I) = 1/(n1)
The actual formulae for calculation are in Lee and Wong p. 82 and 1601
Briggs UTDallas GISC 6382 Spring 2007
Moran’s I can be interpreted as the correlation between variable, X, and the “spatial lag” of X formed by averaging all the values of X for the neighboring polygons
We can then draw a scatter diagram between these two variables (in standardized form): X and lagX (or w_X)
Low/High negative SA
High/High positive SA
The slope of the regression line is Moran’s I
Each quadrant corresponds to one of the four different types of spatial association (SA)
High/Low negative SA
Low/Low positive SA
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Geary’s C varies on a scale from 0 to 2
Briggs UTDallas GISC 6382 Spring 2007
Where: C is the calculated value for Moran’s I from the sample
E(C) is the expected value (mean)
S is the standard error
however, E(C) = 1
The actual formulae for calculation are in Lee and Wong p. 81 and p. 162
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
where sample
Calculating General GWhere:
d is neighborhood distance
Wij weights matrix has only 1 or 0
1 if j is within d distance of i
0 if its beyond that distance
Briggs UTDallas GISC 6382 Spring 2007
with sample
Testing General GG(d) = 0.5557 E(G) = .5238.
Since G(d) is greater than E(G) this indicates potential “hot spots” (clusters of high values)
However, the test statistic Z= 0.3463
Since this does not lie “beyond +/1.96, our standard marker for a 0.05 significance level, we conclude that the difference between G(d) and E(G) could have occurred by chance.” There is no compelling evidence for a hot spot.
However, the calculation of the standard error is complex. See Lee and Wong pp 164167 for formulae.
Briggs UTDallas GISC 6382 Spring 2007
Moran’s I is most commonly used for this purpose, and the localized version is often referred to as Anselin’s LISA.
LISA is a direct extension of the Moran Scatter plot which is often viewed in conjunction with LISA maps
Ashtabula
Lake
Geauga
Cuyahoga
Trumbull
Summit
Portage
Ashtabula has a statistically significant
Negative spatial autocorrelation ‘cos it is
a poor county surrounded by rich ones
(Geauga and Lake in particular)
Source: Lee and Wong
Median
Income
(p< 0.10)
(p< 0.05)
Briggs UTDallas GISC 6382 Spring 2007
High crime clusters
LISA map
(only significant values plotted)
Significance map
(only significant values plotted)
For more detail on LISA, see:
Luc Anselin Local Indicators of Spatial AssociationLISAGeographical Analysis 27: 93115
Low crime clusters
Briggs UTDallas GISC 6382 Spring 2007
All measures so far have been univariate—involving one variable only
We may be interested in the association between two (or more) variables.
Briggs UTDallas GISC 6382 Spring 2007
Where Sx and Sy are the standard deviations of X and Y and X and Y are the means.
Pearson Product Moment Correlation Coefficient (r)1 implies perfect negative association
(price and quantity purchased)
0 implies no association
+1 implies perfect positive association
Briggs UTDallas GISC 6382 Spring 2007
Correlation Coefficient example using “calculation formulae”
Scatter Diagram
Source: Lee and Wong
Briggs UTDallas GISC 6382 Spring 2007
Y formulae”
b
1
a
X
Ordinary Least Squares (OLS) Simple Linear RegressionY = a +bY
The regression line minimizes the sum of the squared deviations between actual Yi and predicted Ŷi
Yi
Ŷi
Min (YiŶi)2
X
0
Briggs UTDallas GISC 6382 Spring 2007
In ordinary least squares regression (OLS), for example, the correlation coefficients will be biased and their precision exaggerated
Briggs UTDallas GISC 6382 Spring 2007
Classic Inner City: formulae”
Low value/
High crime
Gentrification?
High value/
High crime
Unique:
Low value/
Low crime
Classic suburb:
high value/
low crime
Bivariate LISA and Bivariate Moran Scatter PlotsBriggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007
You may copy this software to install elsewhere
Briggs UTDallas GISC 6382 Spring 2007
Briggs UTDallas GISC 6382 Spring 2007