Chapter 6 – Analysis of mapped point patterns

Chapter 6 – Analysis of mapped point patterns This chapter will introduce methods for analyzing and modeling the spatial distribution of mapped point data in which the location of every individual in the population is known. Two types of analyses can be conducted with mapped point patterns: (1) detecting patterns (hypothesis test – if a pattern is at random, regular or aggregated distribution), and (2) model fitting (inference – e.g., fit point pattern models to an observed point pattern, see Chapter 7.) In this chapter, we will concentrate on the first type of analysis by introducing an important technique for detecting spatial patterns of mapped data.

Nearest-neighbor distribution functions: G(r) and F(r) The various distance methods presented in Chapter 4 only provide summary information on a spatial pattern at a particular distance (e.g., first nearest neighbor distance, etc.). We now present methods that actually describe the distribution of the nearest-neighbor distances, i.e., we model the nn distances by considering the distance as a random variable. G(r) is defined as a probability that the distance from a random chosen event to its nearest neighbor is less than or equal to r: The estimator is: where ri is the nn distance for a randomly chosen event i (i = 1, 2, …, n ), I(ri r) is an indicator function, I(ri r) = 1 if (ri r) is true, 0 otherwise. F(r) is a probability that the distance from a random chosen point to its nearest neighbor is less than or equal to r, also called “empty space function”. It has exactly the same expression as G(r), but r in F(r) is a point-to-event distance.

600 500 400 Gr×652 300 Douglas-fir (n = 652) 200 100 0 0 5 10 15 20 25 30 dist More on G(r) and F(r) Under csr, it can be shown that G(r) has the form To judge how far the empirical is from the csr, a simulation envelope could be computed for based on, say, 100 realizations of s1, s2, …, s652 from a uniform distribution in a study area (i.e., assume the 652 Douglas-firs follow the Poisson distribution). The estimator is calculated from each realization and for each distance r, the largest and smallest values define the simulation envelope. (The envelope is not shown in the figure here.)

R package spatstat for point pattern analysis • Developed by Adrian Baddeley and Rolf Turner • The package supports: • creation, manipulation and plotting of point patterns • exploratory data analysis • simulation of point process models • parametric model-fitting • hypothesis tests and diagnostics The first thing to do for all these analyses is to create a ppp object! Use Douglas-fir data as example: > df.dat=subset(victoria.dat,victoria.dat$sp==“DF”) > df.ppp=ppp(df.dat$x,df.dat$y,c(0,103),c(0,87)) > df.ppp=ppp(df.dat$x,df.dat$y,window=owin(c(0,103),c(0,87)) > df.ppp=ppp(df.dat$x,df.dat$y,poly=list(x=c(0,50,60,0),y=c(0,0,60,50))) # ploygon window

R implementation • Prepare Douglas-fir and Hemlock data into ppp format (df.ppp, hl.ppp) • df.G=Gest(df.ppp) • plot(df.G) • plot(envelope(df.ppp,fun=Gest)) #generate envelope. • The pointwise envelopes are not “confidence bands” for the true value of the function! The test is constructed by choosing a fixed value of r, and rejecting the null hypothesis if the observed function value lies outside the envelope at this value ofr. This test has exact significance level alpha = 2nrank/(1 + nsim). nrank = the rank of the envelope value amongst the nsim simulated values. Regular 1st nn dist Aggregated Baddeley, A.J. & Gill, R.D. 1997. Kaplan-Meier estimators of interpoint distance distributions for spatial point processes. Annals of Statistics 25:263-292.

K-function It is the most important function for quantifying mapped point pattern, proposed by Ripley in 1976, often called Ripley’s K-function. K-function is a second-moment measure as it is closely related to the second-order intensity of a stationary isotropic point process. It captures the spatial (in)dependence between different regions of the point process. Let’s first look at the 1st- and 2nd-order properties of a spatial point process. 1st-order property: where Ax is an infinitesimal region which contains point x. For a stationary process, l(x) = constant. 2nd-order property: For a stationary + isotropic process, l2(x, y) = l2(h), where h = |x-y|. * Ripley, B. D. 1976. The second-order analysis of stationary point process. J. of Appl. Prob. 13:255-266.

. h Definition of K-function K-function is defined as K(h) = l-1E(# of other events within distance h of an arbitrary event). E(# of other events within distance h of an arbitrary event) = lK(h).

. h The relationship between K-function and l2(x, y) where l2(r)/l is interpreted as the conditional intensity of an event at x given an event at 0, i.e., l2(0, x)/l. This intensity corresponds to the intensity at the point x conditional on that there is an event at 0. For a Poisson process, l2(r) = l2, then K(h) = ph2. Use as a null model for csr: K(h) > ph2 suggests aggregated pattern. K(h) = ph2 suggests random pattern. K(h) < ph2 suggests regular pattern.

The properties of K(h) 1. For a Poisson process, l2(r) = l2, then K(h) = ph2. Use as a null model for csr: K(h) > ph2 suggests aggregated pattern. K(h) = ph2 suggests random pattern. K(h) < ph2 suggests regular pattern. 2. K-function is invariant under random thinning. By “random thinning”, we mean that if each event of a process is retained or not according to a series of Bernoulli trials. This property means that the K-function of the resulting thinned process is identical to that of the original, unthinned process.

. . . . . . . . . si . h . . . sj . . . A simple estimator of K(h) Edge effect: Those point close to the edges will have less # of points with the h circle than those points far from the edges. h .

h Toroidal unbiased estimator of K(h) Because of edge effect, the simple estimator is not very efficient and is biased. An alternative is the estimator based on toroidal correction. N+ is the number of points that fall within ||si – sj||  h. Toroidal edge correction, use only for stationary + isotropic patterns Misuse of toroidal edge correction for non-stationary patterns

. . . . . . . . . si . h . h . . sj . . . . Weighted unbiased estimator of K(h) Another unbiased estimator, initially proposed by Ripley (1976), gives more weight to those points near the boundaries. where the weight w(si, sj) is the proportion of the circumference of a circle centered at si, passing through sj (si must be within the study area). w(si, sj) = 1 if the circle entirely locates within the study area.

. . . . . . . . . . . . . . . . . . . h si si si • Computing w(si, sj) • Assume the study area is [0, a][0, b] andsi has coordinates si = (x, y). Rewrite w(si, sj) = w(si, h), h is the radius for a circle centered at si. Denote d1 = min(x, a-x), and d2 = min(y, b-y); thus d1 and d2 are the distances from si to the nearest vertical and horizontal edges of A. • w(si, h) is calculated as follows: • If h2 d12 + d22 (circle intersects with both vertical and horizontal edges): • If h2 d12 + d22 (circle intersects with one edge):

Variance and simulation envelopes As it was mentioned earlier that a csr has K(h) = ph2. It is usual to express K(h) as (*) This transformation stabilizes variance for the transformed K0(h), which is approximately: To judge how far the observed K-function deviates from the csr, a simulation envelope could be constructed based on, say, 99 realizations of s1, s2, …, s982 from a uniform distribution in a study area (i.e., the 982 western hemlock trees follow the Poisson distribution). The K-function is calculated from each realization, and for each distance h the largest and smallest values define the simulation envelope.

R implementation Let’s model the distribution of the 982 western hemlocks. The spatstat program computes the transformed K(h) presented on previous page. >hl.kest=Kest(hl.ppp) # hl.ppp = is ppp object of sptatstat >plot(hl.kest) >plot(hl.kest$r,sqrt(hl.kest$iso/pi)-hl.kest$r) >hl.env=envelope(hl.ppp) >plot(hl.kest$r,sqrt(hl.kest$iso/pi)) >lines(hl.env$r,sqrt(hl.env$lo/pi),col=2) >lines(hl.env$r,sqrt(hl.env$hi/pi),col=2)

3 0.5 Hemlock (n = 982) 2 0.0 L(h) 1 Douglas-fir (n = 652) -0.5 0 -1.0 0 10 20 30 40 50 0 10 20 30 40 50 h h • L-function • In practice, K-function is usually displayed in L-function, defined as • For an aggregated distribution • For a random distribution • For a regular distribution • Examples:

. h g-function (pair correlation function) g-function is derivative of K-function, defined as Obviously, g-function describes how K-function changes with spatial distance lag h. K-function is a cumulative function which may accumulate confounding large scale (large h) effect with the effect of small scales (small h). g-function is said to be able to separate these effects. R implementation: >pcf(hl.ppp)

Bivariate spatial point patterns A bivariate spatial point pattern consists of the locations of two types of events in a bounded study area A, e.g., the distributions of two tree species (Douglas-fir and western hemlock). It can be defined as {sj(i): i = 1, 2; j = 1, 2, …} of type i (i = 1, 2) species at jth location. The two species may or may not be spatially independent. A natural working hypothesis is that the patterns of the two species are independent. However, it is worth to note that the independence does not necessarily guarantee the csr for each of the species. Similar to the univariate case, the K-function can be extended to the bivariate case to quantify the relationship between the two species, defined as K12(h) = l2-1E(# of type 2 events within distance h of an arbitrary type 1 event). If both species are at csr, K12(h) (= K21(h)) has a simple result K12(h) = ph2.

. . . . . . . . . . . si(1) . h . . . sj(2) . . . An unbiased estimator For a given data, K12(h) and K21(h) can be respectively estimated as where w(si(1), sj(2)) is the proportion of the circumference of the circle with centre sj(1) and radius h that lies within the study region A.

An estimator of variance reduction • When the underlying process for both species are independent Poisson, Lotwick & Silverman (1982) show that the most efficient estimator is a linear combination • Because for csr K12(h) = ph2, we can similarly define an L-function: • For an aggregated distribution • For a random distribution • For a regular distribution • * Lotwick, H. W. & Silverman, B. W. 1982. Methods for analysing spatial processes of several types of • points. J. R. Stat. Soc. B, 44:406-413.

R implementation Let’s use Splus to compute K12(h) for redcedar and western hemlock. >victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp) >cdhl.kcross=Kcross(victoria.ppp,”HL”,”CD”) >plot(cdhl.kcross) Also see Kmulti >plot(Kmulti(victoria.ppp,victoria.ppp$marks=="CD",victoria.ppp$marks=="HL"))

25 20 20 15 15 K12(h) K12(h) 10 10 5 5 0 0 0 10 20 30 40 50 0 10 20 30 40 50 Distance h Hemlock-redcedar Douglas fir-hemlock

Assignment: Compute bivariate L function for CD and HL of Victoria.dat > victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp) > cdhl.kcross=Kcross(victoria.ppp,”HL”,”CD”) > plot(cdhl.kcross) > cdhl.env=envelope(victoria.ppp, Kcross, i="HL", j="CD") > cdhl.lfn=sqrt(cdhl.kcross$iso/pi)-cdhl.kcross$r > plot(cdhl.kcross$r, cdhl.lfn, ylim=c(-0.25,1.1), xlab="h", ylab="L function") > lines(cdhl.env$r, sqrt(cdhl.env$hi/pi)-cdhl.env$r, col="red") > lines(cdhl.env$r, sqrt(cdhl.env$lo/pi)-cdhl.env$r, col="blue")

Chapter 6 – Analysis of mapped point patterns