1 / 63

Introduction to Geospatial Analysis in R

Introduction to Geospatial Analysis in R. SURF – 24 April 2012 Daniel Marlay. Synopsis.

edolie
Download Presentation

Introduction to Geospatial Analysis in R

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction toGeospatial Analysis in R SURF – 24 April 2012 Daniel Marlay

  2. Synopsis • This month's talk is going to look at the geo-spatial capabilities of R. We'll look at how to import common geographical data formats into R and some of the free geographic data sources and map layers available. We'll then look at how to create maps in R using this data, and some of the ways to style it to display our data. We'll look at how R stores geographic data and how we can perform queries against that - for example identifying which points fall into a particular region. Finally, we'll take a brief look at modeling geospatial data and some of the issues to be aware of.

  3. Introduction • There are extensive geospatial capabilities in R • I’ve just started to scratch the surface • This presentation will give a little bit of theory • Most of the content is a walk through of doing geospatial analysis in R • I’ve picked data sets that are freely available • Trying this yourself is the best way to learn • And maybe we’ll learn something about the way Australians vote…

  4. R Geospatial Packages • sp – provides a generic set of functions, classes and methods for handling spatial data • rgdal – provides an R interface into the Geospatial Data Abstraction Library (GDAL) which is used to read and write geospatial data from R

  5. Types of Geospatial Data • Vector data • Points • Lines • Areas • Bitmap • Often used for image data (e.g. aerial photos) • Needs to be registered to a coordinate system • “Labelled” data • Has geographic information, but needs to be matched before it can be used

  6. Setting up the R Environment ## Set working directory to where the data is. Update as required if running this yourself setwd("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation"); ## Load the relevant libraries library(sp); # Basic R classes for handling geographic data library(rgdal); # Library for using the Geographic Data Abstraction Layer library(nlme); # Library that gives us generalised least squares

  7. Obtain Census Data (1/6)

  8. Obtain Census Data (2/6)

  9. Obtain Census Data (3/6)

  10. Obtain Census Data (4/6)

  11. Obtain Census Data (5/6)

  12. Obtain Census Data (6/6)

  13. Read In Census Data (1/3) ## Read in and clean the census data (Note: a lot of this cleaning could be done more easily in Excel) EducationLevel <- read.csv("EducationData.csv",skip=6,na.strings=""); EducationLevel <- EducationLevel[c(-1,-2),c(-1,-27)]; # Remove leading and trailing blank columns and blank second row EducationLevel <- EducationLevel[-(97:100),]; # Remove trailing blank lines #### Create some useable column names EduDataCols <- paste(c(rep("Male",8),rep("Female",8),rep("Total",8)), rep(c("NotStated","InadDescr","Postgrad","GradDipCert","Bachelor","Diploma","Certificate","NA"),3), sep="."); colnames(EducationLevel) <- c("SED",EduDataCols);

  14. Read In Census Data (2/3) #### Recode the data into character and numeric data to avoid weird errors from factors EducationLevel[,1] <- as.character(EducationLevel[,1]); for (col in EduDataCols) { EducationLevel[,col] <- as.numeric(as.character(EducationLevel[,col])); } #### Eyeball the data to make sure it is ok. summary(EducationLevel); head(EducationLevel,10); tail(EducationLevel,10);

  15. Read In Census Data (3/3)

  16. Obtain Electoral Data (1/4)

  17. Obtain Electoral Data (2/4)

  18. Obtain Electoral Data (3/4)

  19. Obtain Electoral Data (4/4)

  20. Read In Electoral Data (1/2) ## Read in the electoral data ElectionResults <- read.csv("2011NSWElectionResults.csv"); #### Eyeball data to make sure it is ok summary(ElectionResults); head(ElectionResults); tail(ElectionResults);

  21. Read In Electoral Data (2/2)

  22. Obtain Geography (1/4)

  23. Obtain Geography (2/4)

  24. Obtain Geography (3/4)

  25. Obtain Geography (4/4)

  26. Read In SED Geography (1/3) ## Read in the state electoral division boundaries (geography) and explore the SpatialPolygonsDataFrame class SED <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies","SED06aAUST_region"); #### Have an initial look at the SED data set that we've just read in summary(SED); plot(SED);

  27. Read In SED Geography (2/3)

  28. Read In SED Geography (3/3)

  29. Examining the SpatialPloygonsDataFrame (1/2) #### SED is a SpatialPolygonsDataFrame, an S4 object. We can have a look at how it is constructed mode(SED); slotNames(SED); summary(SED@data); summary(SED@polygons); SED@plotOrder; SED@bbox; SED@proj4string;

  30. Examining the SpatialPloygonsDataFrame(2/2)

  31. Simple Mapping of SpatialPolygonsDataFrames (1/2) #### Let's now look at some more mapping, we've seen that we can plot all of Australia plot(SED[SED$STATE_2006 == "1",]); # Plot NSW plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney - xlim and ylim from google maps ;-) plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney and put on some electoral district names text(coordinates(SED[SED$STATE_2006 == "1",]),labels=(SED[SED$STATE_2006 == "1",])$NAME_2006,cex=0.5);

  32. Simple Mapping of SpatialPolygonsDataFrames (1/2)

  33. Thematic Mapping (1/8) ## Thematic mapping SED.NSW <- SED[SED$STATE_2006 == "1",]; # subset of SED for convenience #### Create a ThemeData data set with a summary of the data we are interested in - proportion of people with a tertiary education ThemeData <- data.frame(SED = as.character(EducationLevel$SED), PropTertiaryEd = (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate) / (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate + EducationLevel$Total.NA), stringsAsFactors=FALSE); hist(ThemeData$PropTertiaryEd); # Histogram of the proportions to work out the appropriate cut points ThemeData$PropTertiaryEdFact <- cut(ThemeData$PropTertiaryEd,c(0,0.25,0.3,0.35,0.4,0.5,1.0)); # Create a factor for the proportion variable levels(ThemeData$PropTertiaryEdFact) <- c("25% or Less","25% to 30%","30% to 35%","35% to 40%","40% to 50%","More than 50%");

  34. Thematic Mapping (2/8)

  35. Thematic Mapping (3/8) #### Display a thematic map for all of NSW bands <- length(levels(ThemeData$PropTertiaryEdFact)); pal <- heat.colors(bands); plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]); # Note the use of match() to get the right rows legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01); #### Display a thematic map for Sydney plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);

  36. Thematic Mapping (4/8)

  37. Thematic Mapping (5/8) #### Now we'll add the election results to our ThemeData data set rownames(ElectionResults) <- as.character(ElectionResults$District); # Adding rownames allows us to index by them when matching ThemeData$PropGreenVote <- ElectionResults[ThemeData$SED,"GRN"] / ElectionResults[ThemeData$SED,"Total"]; # Create a green vote proportion variable hist(ThemeData$PropGreenVote,breaks=20); # Have a look at the distribution ThemeData$PropGreenVoteFact <- cut(ThemeData$PropGreenVote,c(0,0.05,0.06,0.08,0.1,0.15,1.0)); # Create a factor levels(ThemeData$PropGreenVoteFact) <- c("Less than 5%","5% to 6%","6% to 8%","8% to 10%","10% to 15%","More than 15%");

  38. Thematic Mapping (6/8)

  39. Thematic Mapping (7/8) #### And do some thematic maps of the election results bands <- length(levels(ThemeData$PropGreenVoteFact)); pal <- heat.colors(bands); plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]) legend("bottomright", legend=levels(ThemeData$PropPropGreenVoteFactFact), fill=pal, title="Prop. Voted Green",inset=0.01) plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)) legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01)

  40. Thematic Mapping (8/8)

  41. Obtain Topographic Map Data (1/9)

  42. Obtain Topographic Map Data (2/9)

  43. Obtain Topographic Map Data (3/9)

  44. Obtain Topographic Map Data (4/9)

  45. Obtain Topographic Map Data (5/9)

  46. Obtain Topographic Map Data (6/9)

  47. Obtain Topographic Map Data (7/9)

  48. Obtain Topographic Map Data (8/9)

  49. Obtain Topographic Map Data (9/9)

  50. Geographic Querying (1/4) ## Demonstration of geographic querying #### Read in the Localities layer from the TOPO 2.5M data set Locs <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies\\localities","aus25lgd_p"); Mtns <- Locs[Locs$LOCALITY == "6",]; # Select only mountains plot(Mtns) #### Use the over function to find a list of mountains in SEDs with more than 10% green votes over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns); # Only gets one mountain per SED over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE); # Gets all mountains, but in a less useful format do.call("rbind",over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE)); # Gives us something a bit more useable

More Related