Topic 4 – Geographical Data Analysis

Topic 4 – Geographical Data Analysis • A – The Nature of Spatial Analysis • B – Elementary Spatial Analysis

A The Nature of Spatial Analysis • 1. Spatial Analysis and its Purpose • 2. Spatial Location and Reference • 3. Spatial Patterns • 4. Topological Relationships

1 Spatial Analysis and its Purpose • Conceptual framework • Search for order amid disorder. • Organize information in categories. • Method • Inducing or deducing conclusions from spatially related information. • Deduction: Deriving a conclusion from a model or a rule. • Induction: Learning new concepts from examples. • Spatial analysis as a decision-making tool. • Helps the user make better decisions. • Often involves the allocation of resources.

1 Spatial Analysis and its Purpose • Requirements • 1) Information to be analyzed must be encoded in some way. • 2) Encoding implicitly requires a spatial language. • 3) Some media to support the encoded information. • 4) Qualitative and/or quantitative methods to perform operations over encoded information. • 5) Ways to present the results in an explicit message. [Figure: Information, Encoding, Media, Methods, Message.]

1 Spatial Analysis and its Purpose • [Figure: spatial analysis in relation to Physical Geography (Geomorphology, Climatology, Biogeography, Soils), Human Geography (Historical, Political, Economic, Behavioral, Population) and Geographic Techniques (Remote sensing, Quantitative methods, Cartography, GIS).]

1 Mapping Deaths from Cholera, London, 1854 (Snow Study)

1 Spatial Analysis and its Purpose • Data Retrieval • Browsing; windowing (zoom-in & zoom-out). • Query window generation (retrieval of selected features). • Multiple map sheets observation. • Boolean logic functions (meeting specific rules). • Map Generalization • Line coordinate thinning of nodes. • Polygon coordinate thinning of nodes. • Edge-matching.

1 Spatial Analysis and its Purpose • Map Abstraction • Calculation of centroids. • Visual editing & checking. • Automatic contouring from randomly spaced points. • Generation of Thiessen / proximity polygons. • Reclassification of polygons. • Raster to vector/vector to raster conversion. • Map Sheet Manipulation • Changing scales. • Distortion removal/rectification. • Changing projections. • Rotation of coordinates.

1 Spatial Analysis and its Purpose • Buffer Generation • Generation of zones around certain objects. • Geoprocessing • Polygon overlay. • Polygon dissolve. • “Cookie cutting”. • Measurements • Points - total number or number within an area. • Lines - distance along a straight or curvilinear line. • Polygons - area or perimeter.
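The line and polygon measurements listed above can be sketched in a few lines of Python. This is a minimal illustration for planar coordinates, not how any particular GIS computes them; the function names and the sample coordinates are hypothetical.

```python
import math

def line_length(points):
    """Distance along a straight or curvilinear line given as [(x, y), ...]."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def polygon_perimeter(ring):
    """Perimeter of a polygon whose ring does not repeat the first vertex."""
    return line_length(ring + ring[:1])

def polygon_area(ring):
    """Area of a simple (non self-intersecting) polygon, shoelace formula."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

# Hypothetical coordinates in map units.
road = [(0, 0), (3, 4), (3, 10)]
parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]

road_length = line_length(road)               # 5 + 6 map units
parcel_area = polygon_area(parcel)            # 4 by 3 rectangle
parcel_perimeter = polygon_perimeter(parcel)
```

The sketch assumes projected (flat) coordinates; distances on latitude/longitude values would need a geodesic formula instead.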

1 Spatial Analysis and its Purpose • Raster / Grid Analysis • Grid cell overlay. • Area calculation. • Search radius. • Distance calculations. • Digital Terrain Analysis • Visibility analysis of viewing points. • Insolation intensity. • Grid interpolation. • Cross-sectional viewing. • Slope/aspect analysis. • Watershed calculation. • Contour generation.

3 Spatial Patterns • Relativity of objects • Definition of an object in view of another. • Create spatial patterns. • Main patterns • Size. • Distribution/spacing: Uniform, random and clustered. • Proximity. • Density: Dense and dispersed. • Shape. • Orientation. • Scale. [Figure: examples of size, form, orientation, scale and proximity.]

3 Spatial Patterns • Spatial autocorrelation • Set of objects that are spatially associated. • Relationship in the process affecting the object. • Negative autocorrelation. • Positive autocorrelation. [Figure: uniform, clustered and random patterns; positive autocorrelation.]
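One common way to put a number on positive and negative autocorrelation is Moran's I, which the slides do not name; it is used here only as an illustration. The sketch below assumes binary weights (1 for neighbours, 0 otherwise) and hypothetical values along a chain of six locations.

```python
def morans_i(values, neighbour_pairs):
    """Moran's I with binary weights.

    neighbour_pairs lists each pair of neighbouring indices once;
    every pair is counted in both directions (i -> j and j -> i).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 2 * sum(dev[i] * dev[j] for i, j in neighbour_pairs)
    total_weight = 2 * len(neighbour_pairs)
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Six locations in a row; each is a neighbour of the next one.
chain = [(i, i + 1) for i in range(5)]

clustered = morans_i([1, 1, 1, 5, 5, 5], chain)    # similar values side by side
alternating = morans_i([1, 5, 1, 5, 1, 5], chain)  # unlike values side by side
```

Here `clustered` comes out positive (about 0.6) and `alternating` negative (about -1.0), matching the idea that like values near each other give positive autocorrelation.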

4 Topological Relations • Proximity • Qualitative expression of distance. • Link spatial objects by their mutual locations. • Nearest neighbors. • Buffer around a point or a line. • Directionality
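Nearest neighbours and a buffer around a point can be sketched as follows. This assumes planar coordinates and straight-line distance; the well names and positions are hypothetical.

```python
import math

def nearest_neighbour(origin, places):
    """Name of the place closest to origin; places maps name -> (x, y)."""
    return min(places, key=lambda name: math.dist(origin, places[name]))

def within_buffer(origin, places, radius):
    """Names of the places inside a circular buffer of the given radius."""
    return sorted(
        name for name, xy in places.items() if math.dist(origin, xy) <= radius
    )

# Hypothetical point features in map units.
wells = {"W1": (1, 1), "W2": (4, 5), "W3": (9, 2)}
home = (0, 0)

closest = nearest_neighbour(home, wells)
nearby = within_buffer(home, wells, 7)
```

A buffer around a line works the same way once the distance from a point to the nearest segment of the line is used instead of point-to-point distance.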

4 Topological Relations • Adjacency • Link contiguous entities. • Share at least one common boundary. • Intersection • Containment • Link entities to a higher order set. [Figure: City A and City B.]

4 Topological Relations • Connectivity • Adjacency applied to a network. • Must follow a path, which is a set of linked nodes. • Shortest path. • All possible paths. [Figure: network with nodes 1 to 6.]
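A shortest path over a small network can be sketched with a breadth-first search, which finds the path with the fewest links when all links count equally. The six-node network below is hypothetical and is not taken from the slide's figure; weighted links (distance, travel time) would call for Dijkstra's algorithm instead.

```python
from collections import deque

def shortest_path(links, start, goal):
    """Breadth-first search: a path with the fewest links, or None."""
    previous = {start: None}           # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:    # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical network: node -> list of directly linked nodes.
links = {1: [2, 3], 2: [1, 4], 3: [1, 4, 5], 4: [2, 3, 6], 5: [3, 6], 6: [4, 5]}
path = shortest_path(links, 1, 6)
```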

4 Topological Relations • Intersection • What two geographical objects have in common. • Union • Summation of two geographical objects. • Complementarity • What is outside of the geographical object. [Figure: land divided into arable and non-arable land; arable land and flat land overlap as land suitable for agriculture.]
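Intersection, union and complementarity map directly onto set operations. The sketch below treats land as a set of parcel identifiers; the identifiers and their membership are hypothetical.

```python
# Hypothetical parcel identifiers 1..10 making up all the land.
land = set(range(1, 11))
arable = {1, 2, 3, 4, 5, 6}
flat = {4, 5, 6, 7, 8}

suitable = arable & flat      # intersection: arable AND flat
arable_or_flat = arable | flat  # union: arable OR flat
non_arable = land - arable    # complement of arable within land
```

Real polygon overlay works on geometry rather than identifiers, but the logic of the three operations is the same.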

B Elementary Spatial Analysis • 1. Statistical Generalization • 2. Data Distribution • 3. Spatial Inference

1 Statistical Generalization • Maps and statistical information • Important to display the underlying distribution of the data accurately. • Data is generalized to search for a spatial pattern. • If the data is not properly generalized, the message may be obscured. • Balance between remaining true to the data and generalizing enough to reveal spatial patterns. • Thematic maps are a good example of the issue of statistical generalization.

1 Statistical Generalization • [Figure: three panels titled Data, Classification and Spatial Pattern; raw values are grouped into the classes 0-30, 31-65 and above 65.]

1 Statistical Generalization • Number of classes • Too few classes: the contours of the data distribution are obscured. • Too many classes: the map becomes confusing. • Most thematic maps have between 3 and 7 classes. • 8 shades of gray are generally the maximum possible to tell apart.

1 Statistical Generalization • Classification methods • Thematic maps developed from the same data and with the same number of classes will convey a different message if the ranging method is different. • Each ranging method suits a particular data distribution.

2 Data Distribution • Histogram • The first step in producing a thematic map. • See how data is distributed. • Use of basic statistics such as mean and standard deviation. • A histogram plots the value against the frequency. [Figure: histograms of uniform, normal and exponential distributions; axes: Value, Frequency.]

2 Data Distribution • Equal interval • Each class has an equal range of values. • Difference between the lowest and the highest value divided by the number of categories. • (H - L) / C • Easy to interpret. • Good for uniform distributions and continuous data. • Inappropriate if data is clustered around a few values. [Figure: histogram split into classes C1 to C4 of equal width between L and H; axes: Value, Frequency.]
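The (H - L) / C rule can be sketched as below. The function names and sample values are hypothetical; the convention that the highest value falls in the last class is an assumption, since the slide does not say how class limits are closed.

```python
def equal_interval_breaks(values, n_classes):
    """Class limits from L to H, each class (H - L) / C wide."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + i * width for i in range(n_classes + 1)]

def classify(value, breaks):
    """Class index 0..C-1; the highest value goes in the last class."""
    for i in range(1, len(breaks) - 1):
        if value < breaks[i]:
            return i - 1
    return len(breaks) - 2

# Hypothetical observations with L = 0 and H = 100.
values = [0, 5, 12, 30, 45, 60, 100]
breaks = equal_interval_breaks(values, 4)       # width (100 - 0) / 4 = 25
classes = [classify(v, breaks) for v in values]
```

With these values four of the seven observations land in the first two classes and the third class holds only one, which hints at why equal intervals suit uniform distributions best.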

2 Data Distribution • Quantiles • Equal number of observations in each category. • n(C1) = n(C2) = n(C3) = n(C4). • Relevant for evenly distributed data. • Features with similar values may end up in different categories. • Equal area • Classes divided to have a similar area per class. • Similar to quantiles if size of units is the same. [Figure: histogram split into classes C1 to C4 holding n(C1) to n(C4) observations; axes: Value, Frequency.]
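A quantile classification only needs the observations sorted and cut into groups of equal size. The sketch below is one simple way to do it; the function name and the twelve sample values are illustrative, and when the count does not divide evenly the groups differ by at most one observation.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        ordered[i * n // n_classes:(i + 1) * n // n_classes]
        for i in range(n_classes)
    ]

# Twelve illustrative observations, four classes of three each.
values = [15, 25, 88, 34, 56, 7, 92, 61, 45, 77, 39, 21]
groups = quantile_classes(values, 4)
sizes = [len(g) for g in groups]
```

Note that 21 and 25 fall in different classes although they are close in value, which is the drawback the slide mentions.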

2 Data Distribution • Standard deviation • The mean (X) and standard deviation (STD) are used to set cutpoints. • Good when the distribution is normal. • Display features that are above and below average. • Very different (abnormal) elements are shown. • Does not show the values of the features, only their distance from the average. [Figure: histogram split into classes C1 to C4 at X - 1 STD, X and X + 1 STD; axes: Value, Frequency.]
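Cutpoints at the mean and one standard deviation either side can be sketched with Python's `statistics` module. The choice of the population standard deviation (`pstdev`) and the sample values are assumptions; a sample standard deviation (`stdev`) would shift the cutpoints slightly.

```python
import statistics

def std_dev_breaks(values, steps=(-1, 0, 1)):
    """Cutpoints at mean + k * STD for each k in steps."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [mean + k * std for k in steps]

def std_dev_class(value, breaks):
    """Number of cutpoints lying at or below the value (0 = lowest class)."""
    return sum(value >= b for b in breaks)

# Hypothetical observations with mean 5 and standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = std_dev_breaks(values)       # X - 1 STD, X, X + 1 STD
classes = [std_dev_class(v, breaks) for v in values]
```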

2 Data Distribution • Arithmetic and geometric progressions • The width of the class intervals increases at a non-linear rate. • Good for J-shaped distributions. [Figure: histogram split into classes C1 to C4 of increasing width; axes: Value, Frequency.]
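The two progressions can be sketched as follows, under one common reading of the terms: in an arithmetic progression each class is wider than the previous one by a constant amount, and in a geometric progression by a constant ratio. The function names and the example ranges are hypothetical.

```python
def arithmetic_breaks(low, high, n_classes):
    """Class widths grow as 1, 2, 3, ... times a common unit."""
    unit = (high - low) / (n_classes * (n_classes + 1) / 2)
    breaks = [low]
    for i in range(1, n_classes + 1):
        breaks.append(breaks[-1] + i * unit)
    return breaks

def geometric_breaks(low, high, n_classes):
    """Class limits grow by a constant ratio; requires 0 < low < high."""
    ratio = (high / low) ** (1 / n_classes)
    return [low * ratio ** i for i in range(n_classes + 1)]

arith = arithmetic_breaks(0, 100, 4)   # widths 10, 20, 30, 40
geom = geometric_breaks(1, 16, 4)      # widths 1, 2, 4, 8
```

Both schemes give narrow classes where a J-shaped distribution piles up its observations and wide classes along the long tail.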

2 Data Distribution • Natural breaks • Complex optimization method. • Minimize the sum of the variance in each class. • Good for data that is not evenly distributed. • Statistically sound. • Difficult to compare with other classifications. • Difficult to choose the appropriate number of classes. [Figure: histogram split into classes C1 to C4 at the gaps in the data; axes: Value, Frequency.]
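The optimization behind natural breaks can be illustrated with an exhaustive search: try every way of cutting the sorted values into classes and keep the one with the smallest total within-class spread. This is only a sketch for small data sets, not the optimized algorithm GIS packages use; it measures spread as the sum of squared deviations from each class mean, and the names and values are hypothetical.

```python
from itertools import combinations

def sum_sq_dev(group):
    """Sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)

def natural_breaks(values, n_classes):
    """Grouping of the sorted values with the least within-class spread."""
    ordered = sorted(values)
    best_cost, best_groups = None, None
    # Each combination is a set of positions where the sorted list is cut.
    for cuts in combinations(range(1, len(ordered)), n_classes - 1):
        bounds = (0, *cuts, len(ordered))
        groups = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups

# Hypothetical observations with two obvious gaps.
values = [1, 2, 3, 20, 21, 22, 50, 51]
groups = natural_breaks(values, 3)
```

The number of candidate groupings grows quickly with the number of observations, which is why the method is described as a complex optimization.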

2 Data Distribution • User defined • The user is free to select the class intervals that best fit the data distribution. • Last resort method, because the choice is conceptually difficult to justify. • Analysts with experience are able to make a good choice. • Also used to get round numbers after using another type of classification method. • $5,000 - $10,000 instead of $4,982 - $10,123. • Using classification • Classification can be used to deliberately confuse or hide a message.

2 Data Distribution • [Figure: two maps of the same data. “no problems” - Equal steps; “there is a problem” - Quantiles.]

2 Data Distribution • [Figure: map of the same data. “everything is within standards” - Standard deviation.]

3 Spatial Inference • Filling the gaps • Sampling shortens the time necessary to collect data. • Requires methods to “fill the gaps”. • Interpolation and extrapolation • Data at non-sampled locations can be predicted from sampled locations. • Interpolation: • Predict missing values when bounding values are known. • Extrapolation: • Predict missing values outside the bounding area. • Only one side is known.
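The difference between the two can be sketched with a straight line through two known samples: the same formula interpolates when the target lies between the samples and extrapolates when it lies beyond them. A linear relationship is an assumption made here for simplicity, and the sample heights are hypothetical.

```python
def linear_estimate(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Hypothetical height samples: 100 m at location 10 and 140 m at location 20.
inside = linear_estimate(15, 10, 100, 20, 140)   # interpolation, both sides known
outside = linear_estimate(25, 10, 100, 20, 140)  # extrapolation, one side known
```

The extrapolated value is less reliable because nothing beyond location 20 confirms that the trend continues.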

3 Spatial Inference: Interpolation and Extrapolation • [Figure: two plots with samples, interpolation lines and an extrapolation line: height against location, and delay at the traffic light against number of vehicles.]

3 Spatial Inference: Best Fit • [Figure: scatter plot of sex ratio against longitude with a fitted line, y = 0.1408x + 116.69, R² = 0.6779.]
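A best-fit line of the form y = ax + b with its R² can be obtained by ordinary least squares, sketched below. Least squares is assumed here as the fitting method; the four data points are hypothetical and are not the sex ratio data of the figure.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept and R squared for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical samples.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
slope, intercept, r_squared = fit_line(xs, ys)
```

For these points the fit is y = 0.8x + 1.3 with R² = 0.64, so the line accounts for roughly two thirds of the variation.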

3 Spatial Inference • Aggregation • Data within a boundary can be aggregated. • Often to form a new class. • Conversion • Data from a sample set can be converted for a different sample set. • Changing the scale of the geographical unit. • Switching from a set of geographical units to another.
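Aggregation amounts to summing values over a lookup from each unit (or class) to the larger one it belongs to. The sketch below uses the same function for merging geographical units and for forming a new class; the tree counts are hypothetical.

```python
from collections import defaultdict

def aggregate(counts, parent_of):
    """Sum the counts of the units that share the same parent."""
    totals = defaultdict(int)
    for unit, count in counts.items():
        totals[parent_of[unit]] += count
    return dict(totals)

# Switching to larger geographical units (hypothetical tree counts).
trees_by_district = {"District A": 120, "District B1": 40, "District B2": 60}
district_of = {
    "District A": "District A",
    "District B1": "District B",
    "District B2": "District B",
}
by_district = aggregate(trees_by_district, district_of)

# Forming a new class from existing ones.
trees_by_species = {"Pine": 150, "Poplar": 70}
class_of = {"Pine": "Boreal forest", "Poplar": "Boreal forest"}
by_class = aggregate(trees_by_species, class_of)
```

Going the other way, from District B back to B1 and B2, cannot be done by summing; it needs an assumption about how the total is distributed.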

3 Spatial Inference: Aggregation and Conversion • [Figure: pine trees and poplar trees aggregated into boreal forest; District B1 and District B2 combined into District B next to District A.]
