Plotting Multivariate Data

Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland

- Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.
- Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

Graphical displays should:

- Show the data
- Induce the viewer to think about the substance of the data
- Avoid distorting what the data have to say
- Present many numbers in a small space
- Make large data sets coherent
- Encourage comparison
- Reveal the data at several levels of detail
- Serve a clear purpose
- Be closely integrated with the statistical and verbal descriptions of the data
- Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.

- Graphics reveal data.
- Graphics can be more precise and revealing than conventional statistics.
- Anscombe’s data
- Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21.

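- All four data sets have nearly identical summary statistics and the same fitted regression line (y = 3 + 0.5x), yet their scatterplots look completely different.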
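Anscombe's quartet ships with base R as the anscombe data frame (columns x1 to x4 and y1 to y4), so the point above can be checked directly. The sketch below fits the four regressions, prints their coefficients and correlations, and then draws the four scatterplots with their fitted lines; the 2 x 2 layout and plotting choices are illustrative only.

```r
# Anscombe's quartet is part of base R's datasets package.
data(anscombe)

# Fit y_i ~ x_i for each of the four data sets.
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# Collect intercepts and slopes: every row is close to 3.00 and 0.50.
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# The correlations agree as well.
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(cors, 3))

# ...but the scatterplots look nothing alike.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Data set", i),
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]], lwd = 2)   # same fitted line in every panel
}
par(op)
```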

- Scatterplots
- Demonstration

- “The convex hull of bivariate data”
- Demonstration

- Chiplot
- Demonstration

- Bivariate Boxplot
- Demonstration

- Bivariate Densities
- Demonstration

- Other Variables in a Scatterplot
- Demonstration

- Scatterplot Matrix
- Demonstration of pairs

- 3-D Plots
- Demonstration

- Conditioning Plots
- Demonstration

- Launch R
- Set the working directory to Statistics/RSPCMA/Data
- airpoll<-source("chap2airpoll.dat")$value
- Review exercises on pages 19-22

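- Scatterplots are often examined alongside the correlation coefficient of two variables, since a few outlying points can distort it.
- They are used to detect outliers.
- Convex hull trimming (removing the points that lie on the convex hull, then recomputing the correlation) gives a more robust estimate of the correlation coefficient.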
- Demonstration
- attach(airpoll)
- cor(SO2, Mortality)

- hull<-chull(SO2, Mortality) # finds the convex hull
- plot(SO2, Mortality, pch=1)
- polygon(SO2[hull],Mortality[hull], density=15, angle=30)
- cor(SO2[-hull],Mortality[-hull])
- Here the trimmed and untrimmed correlations are almost identical, which is unusual.
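The trimming step can be wrapped in a small helper. This is a minimal sketch: hull_trimmed_cor() is a hypothetical name, and the data are simulated (a positively correlated cloud plus one planted outlier) because the airpoll file is not bundled with R. With an outlier like this, unlike the airpoll example, trimming changes the estimate substantially.

```r
# Convex hull trimming: drop every point on the convex hull, then recompute r.
hull_trimmed_cor <- function(x, y) {
  hull <- chull(x, y)        # indices of the points on the convex hull
  cor(x[-hull], y[-hull])    # correlation of the remaining points
}

# Hypothetical example data: a correlated cloud plus one outlier at (6, -6).
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.8)
x <- c(x, 6)
y <- c(y, -6)

r_raw  <- cor(x, y)
r_trim <- hull_trimmed_cor(x, y)
cat("raw r:", round(r_raw, 3), " hull-trimmed r:", round(r_trim, 3), "\n")

# Shade the hull that gets peeled off, as in the slides.
hull <- chull(x, y)
plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)
```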

- A way of augmenting the scatterplot to spot dependence/independence.
- See Statistics/RSPCMA/functions.txt for the chiplot function.
- chiplot(SO2, Mortality, vlabs=c("SO2", "Mortality"))
- For independent data, the points will be scattered in a horizontal band centered around 0.
- Here, departure from independence is shown by the absence of points in (-0.25, 0.25).
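To see what chiplot() is plotting, the sketch below computes the chi-plot coordinates directly, following the usual definition due to Fisher and Switzer (1985). For each point, F and G are the proportions of the other points with smaller-or-equal x and y values, H is the proportion with both, chi is a local correlation and lambda a signed distance from the centre of the data. The function name chi_plot_values() is hypothetical, the version in functions.txt may differ in detail, the data are simulated, and the 1.78/sqrt(n) band is the commonly quoted approximate 95% limit under independence.

```r
# Chi-plot coordinates for paired data x, y (hypothetical helper).
chi_plot_values <- function(x, y) {
  n <- length(x)
  Fx <- Gy <- H <- numeric(n)
  for (i in seq_len(n)) {
    others <- seq_len(n)[-i]
    Fx[i] <- sum(x[others] <= x[i]) / (n - 1)                      # marginal in x
    Gy[i] <- sum(y[others] <= y[i]) / (n - 1)                      # marginal in y
    H[i]  <- sum(x[others] <= x[i] & y[others] <= y[i]) / (n - 1)  # joint
  }
  S <- sign((Fx - 0.5) * (Gy - 0.5))
  chi <- (H - Fx * Gy) / sqrt(Fx * (1 - Fx) * Gy * (1 - Gy))
  lambda <- 4 * S * pmax((Fx - 0.5)^2, (Gy - 0.5)^2)
  # Drop points near the edges of the data (and any 0/0 cases).
  keep <- abs(lambda) < 4 * (1 / (n - 1) - 0.5)^2 & is.finite(chi)
  list(lambda = lambda[keep], chi = chi[keep], band = 1.78 / sqrt(n))
}

# Hypothetical independent data: points should stay near the chi = 0 line.
set.seed(2)
u <- rnorm(100)
v <- rnorm(100)
res <- chi_plot_values(u, v)
plot(res$lambda, res$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = expression(lambda), ylab = expression(chi))
abline(h = c(-1, 1) * res$band, lty = 2)   # approximate 95% control band
```

For perfectly monotone data, such as chi_plot_values(1:20, 1:20), every retained chi equals 1, the opposite extreme from the independent case.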

- Two-dimensional analogue of the boxplot
- A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers.
- Regression lines of x on y and y on x are shown.
- bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality")

- Cleaned up (more robust):
- bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")
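The hinge-and-fence idea can be approximated in a few lines of base R. This sketch is a simplification, not the bvbox() algorithm: it uses the classical mean and covariance, and it places the hinge at squared Mahalanobis distance qchisq(0.5, 2) (about half the data if the data are roughly normal) and the fence at qchisq(0.99, 2). Both cut-offs, the helper names and the simulated data are assumptions chosen for illustration. Classical estimates are pulled toward outliers, which is why a more robust variant is worth having.

```r
# Points on the ellipse {z : (z - center)' S^-1 (z - center) = q}.
ellipse_points <- function(center, S, q, npts = 200) {
  theta <- seq(0, 2 * pi, length.out = npts)
  circle <- rbind(cos(theta), sin(theta))   # unit circle, 2 x npts
  R <- chol(S)                              # S = t(R) %*% R
  t(center + sqrt(q) * t(R) %*% circle)     # npts x 2 matrix of (x, y)
}

# Simplified bivariate boxplot (hypothetical helper, classical estimates).
simple_bvbox <- function(X, hinge_q = qchisq(0.5, 2), fence_q = qchisq(0.99, 2)) {
  center <- colMeans(X)
  S <- cov(X)
  d2 <- mahalanobis(X, center, S)           # squared Mahalanobis distances
  list(center = center, S = S,
       hinge = ellipse_points(center, S, hinge_q),
       fence = ellipse_points(center, S, fence_q),
       outliers = which(d2 > fence_q))
}

# Hypothetical data: 60 bivariate normal points plus an outlier in row 61.
set.seed(3)
X <- cbind(rnorm(60), rnorm(60))
X <- rbind(X, c(8, 8))
bx <- simple_bvbox(X)

plot(X, xlab = "x", ylab = "y")
lines(bx$hinge, lwd = 2)                    # hinge ellipse
lines(bx$fence, lty = 2)                    # fence ellipse
points(X[bx$outliers, , drop = FALSE], pch = 19)
```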

- The goal of examining a scatterplot is to identify clusters and outliers.
- Humans are not particularly good at this, so graphical aids help.
- Overlaying a bivariate density estimate helps.
- Bivariate histograms are too crude for this, though.

- den1<-bivden(SO2,Mortality)
- persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2)
- plot(SO2, Mortality)
- contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE)
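bivden() comes from the course's functions file. As an alternative that runs without it, the kde2d() function in the MASS package (installed with R as a recommended package) computes a Gaussian kernel density estimate on a grid and returns a list with components x, y and z, which plug straight into persp() and contour(). In this sketch the built-in faithful data stand in for SO2 and Mortality, and the 50 x 50 grid is an arbitrary choice.

```r
library(MASS)   # provides kde2d()

# Built-in Old Faithful data in place of the airpoll variables.
x <- faithful$eruptions
y <- faithful$waiting
den <- kde2d(x, y, n = 50)   # Gaussian kernel estimate on a 50 x 50 grid

# Perspective view of the estimated density surface.
persp(den$x, den$y, den$z, xlab = "eruptions", ylab = "waiting",
      zlab = "Density", theta = 30, phi = 30)

# Scatterplot with density contours overlaid.
plot(x, y, xlab = "eruptions", ylab = "waiting")
contour(den$x, den$y, den$z, nlevels = 20, add = TRUE, lwd = 2)
```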

- The bubble plot
- plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot.
- symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
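The same bubble-plot recipe works on any data frame; in this sketch the built-in mtcars data stand in for airpoll. symbols() treats circles= as radii, so passing a variable directly (as on the slide) makes bubble area grow with the square of its value. Taking the square root, as here, makes area proportional to the variable instead; which scaling to use is a design choice.

```r
# Weight vs fuel economy, with horsepower shown by bubble size.
radius <- sqrt(mtcars$hp)   # circles= gives radii; sqrt makes area proportional to hp
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lb)", ylab = "Miles per gallon")
symbols(mtcars$wt, mtcars$mpg, circles = radius, inches = 0.3, add = TRUE)
```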

- pairs(airpoll)
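- To add least-squares regression lines and lowess smooths to each panel:
- pairs(airpoll, panel=function(x, y, ...) {
    abline(lsfit(x, y)$coef, lwd=2)       # least-squares regression line
    lines(lowess(x, y), lty=2, lwd=2)     # lowess smooth, dashed
    points(x, y)
  })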
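The panel function can also be defined once and reused. In this sketch the name panel_fits is hypothetical and four columns of the built-in mtcars data replace airpoll; lsfit() produces the same least-squares line as lm(y ~ x).

```r
# Panel function: points, least-squares line (solid), lowess smooth (dashed).
panel_fits <- function(x, y, ...) {
  points(x, y, ...)
  abline(lsfit(x, y)$coef, lwd = 2)
  lines(lowess(x, y), lty = 2, lwd = 2)
}

cars4 <- mtcars[, c("mpg", "disp", "hp", "wt")]
pairs(cars4, panel = panel_fits)
```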

- For 3-D graphics, use cloud() from the lattice package (load it first with library(lattice))
- cloud(Mortality~SO2+Rainfall)
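cloud() returns a trellis object that is drawn when printed, which must be done explicitly inside scripts and functions. A minimal sketch on the built-in mtcars data; the screen= rotation is an arbitrary choice of viewing angle.

```r
library(lattice)   # cloud() is lattice's 3-D scatterplot

p <- cloud(mpg ~ wt + hp, data = mtcars,
           xlab = "wt", ylab = "hp", zlab = "mpg",
           screen = list(z = 30, x = -60))   # rotate the viewing angle
print(p)   # lattice plots are drawn when printed
```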

- coplot(Mortality~SO2|Popden)
- To add a local regression fit
- coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,span=1))
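The same pattern can be tried on data shipped with R. This sketch uses the built-in quakes data (latitude against longitude, conditioned on depth); number = 4 is an arbitrary choice of conditioning intervals. co.intervals() returns the overlapping intervals that coplot() conditions on by default, one row per interval.

```r
# Conditioning plot with a lowess smooth in each panel.
coplot(lat ~ long | depth, data = quakes, number = 4,
       panel = function(x, y, ...) panel.smooth(x, y, span = 1, ...))

# The overlapping depth intervals used for conditioning.
given <- co.intervals(quakes$depth, number = 4)
print(given)
```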

- The purpose of graphics is to aid your intuition.
- Explore them—the appropriate graphics reflect your questions and the structure of the data.
- Next week: graphic presentations to avoid, because they mislead you and your audience.
- Look at the books by Edward Tufte in the library.