Plotting Multivariate Data Harry R. Erwin, PhD School of Computing and Technology University of Sunderland
Resources • Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. • Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.
Edward Tufte’s Recommendations • Show the data • Induce the viewer to think about the substance of the data • Avoid distorting what the data have to say • Present many numbers in a small space • Make large data sets coherent • Encourage comparison • Reveal the data at several levels of detail • Serve a clear purpose • Be closely integrated with the statistical and verbal descriptions of the data • Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.
Tufte’s Points • Graphics reveal data. • Graphics can be more precise and revealing than conventional statistics. • Anscombe’s data • Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21. • All four data sets are described by the same linear model.
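The claim about Anscombe's data can be checked directly, since the quartet ships with base R as the data frame `anscombe`. The sketch below is not part of the original slides; it fits a straight line to each of the four data sets and then plots them, so the near-identical coefficients can be set against the very different scatterplots.

```r
# Anscombe's quartet is built into R (columns x1..x4 and y1..y4).
data(anscombe)

# Fit y_i ~ x_i for each of the four data sets and collect the coefficients.
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# The fitted lines agree closely, but the scatterplots do not.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), pch = 1)
  abline(fits[[i]], lwd = 2)
}
par(op)
```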
Ways of Looking at Data • Scatterplots • Demonstration • “The convex hull of bivariate data” • Demonstration • Chiplot • Demonstration • Bivariate Boxplot • Demonstration
And More Multivariate Graphics • Bivariate Densities • Demonstration • Other Variables in a Scatterplot • Demonstration • Scatterplot Matrix • Demonstration of pairs • 3-D Plots • Demonstration • Conditioning Plots • Demonstration
Demonstration • Launch R • Set the working directory to Statistics/RSPCMA/Data • airpoll<-source("chap2airpoll.dat")$value • Review exercises on pages 19-22
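The `source(...)$value` idiom works when the data file contains an R expression (presumably the `dput()` form of a data frame) rather than a plain table. The following sketch illustrates the round trip with a small made-up data frame and a temporary file; the values and the names `toy`, `toy2` and `path` are hypothetical and stand in for the course data.

```r
# Hypothetical stand-in for a file such as chap2airpoll.dat: a tiny data
# frame written in dput() format to a temporary file.
toy <- data.frame(SO2 = c(10, 24, 59), Mortality = c(921, 998, 962))
path <- tempfile(fileext = ".dat")
dput(toy, file = path)

# source() evaluates the file and returns a list; $value holds the result
# of the last expression, here the reconstructed data frame.
toy2 <- source(path)$value
str(toy2)
```

If a file is instead a plain whitespace-separated table, `read.table()` would be the usual choice.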
Convex Hull of Bivariate Data • A scatterplot usually accompanies the calculation of the correlation coefficient of two variables. • It is used to detect outliers. • Convex hull trimming (dropping the points that lie on the convex hull before computing the correlation) gives a robust estimate of the correlation coefficient. • Demonstration • attach(airpoll) • cor(SO2, Mortality)
Robust Estimation of the Correlation • hull<-chull(SO2, Mortality) # finds the indices of the points on the convex hull • plot(SO2, Mortality, pch=1) • polygon(SO2[hull],Mortality[hull], density=15, angle=30) # shades the hull • cor(SO2[-hull],Mortality[-hull]) # correlation with the hull points removed • Here the trimmed and untrimmed correlations are almost identical, which is unusual.
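To see why trimming matters when outliers are present, the same steps can be run on simulated data. This is a sketch on made-up values, not the airpoll data: a positively correlated cloud with two planted outliers (the last two points), where removing the hull points changes the correlation noticeably.

```r
# Simulated stand-in data (not the airpoll values): a correlated cloud
# plus two planted outliers in positions 61 and 62.
set.seed(1)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.8)
x <- c(x, 4, -4)
y <- c(y, -4, 4)

hull <- chull(x, y)            # indices of the points on the convex hull

plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)

r_all     <- cor(x, y)                  # all points
r_trimmed <- cor(x[-hull], y[-hull])    # hull points removed
print(c(all = r_all, trimmed = r_trimmed))
```

Only one layer of the hull is peeled here, as in the slide; repeated peeling is a possible variation.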
Chiplot • A way of augmenting the scatterplot to spot dependence/independence. • See Statistics/RSPCMA/functions.txt • chiplot(SO2, Mortality, vlabs=c("SO2", "Mortality")) • For independent data, the points will be scattered in a horizontal band centered around 0. • Departure from independence here is shown by the points missing from (-0.25,0.25)
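For readers without the course's functions.txt, the quantities behind a chi-plot can be computed in a few lines. The sketch below follows the Fisher and Switzer definition as I understand it; `chi_stats` is a hypothetical helper name, not the `chiplot` function used above, and it simply drops points with a zero denominator rather than applying the stricter exclusion rule of the original method. The band 1.78/sqrt(n) is the commonly quoted approximate 95% limit.

```r
# Chi-plot statistics: for each point, compare the joint empirical
# distribution H with the product of the marginals Fx * Gy.
chi_stats <- function(x, y) {
  n <- length(x)
  Fx <- Gy <- H <- numeric(n)
  for (i in seq_len(n)) {
    Fx[i] <- mean(x[-i] <= x[i])                    # marginal position of x_i
    Gy[i] <- mean(y[-i] <= y[i])                    # marginal position of y_i
    H[i]  <- mean(x[-i] <= x[i] & y[-i] <= y[i])    # joint position
  }
  chi    <- (H - Fx * Gy) / sqrt(Fx * (1 - Fx) * Gy * (1 - Gy))
  lambda <- 4 * sign((Fx - 0.5) * (Gy - 0.5)) *
            pmax((Fx - 0.5)^2, (Gy - 0.5)^2)
  keep <- is.finite(chi)       # drop points where the denominator is zero
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

# Simulated independent data: points should mostly stay inside the band.
set.seed(2)
x <- rnorm(100)
y <- rnorm(100)
cs <- chi_stats(x, y)
band <- 1.78 / sqrt(length(x))
plot(cs$lambda, cs$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "lambda", ylab = "chi")
abline(h = c(-band, band), lty = 2)

# Perfectly monotone data: every retained chi value equals 1.
mono <- chi_stats(1:20, 1:20)
```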
Bivariate Boxplot • Two-dimensional analogue of the boxplot • A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers. • Regression lines of x on y and y on x are shown. • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality") • Cleaned up (more robust): • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")
Bivariate Densities • The goal of examining a scatterplot is to identify clusters and outliers. • Humans are not particularly good at this, so graphical aids help. • Adding a bivariate density estimate to the plot helps. • Histogram-based estimates are too rough, though; a smooth estimate works better.
Demo of Bivariate Density • den1<-bivden(SO2,Mortality) • persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2) # perspective view of the density surface • plot(SO2, Mortality) • contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE) # overlay density contours on the scatterplot
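`bivden` comes from the course's function file. Where that file is not available, a similar picture can be drawn with `kde2d` from the MASS package, which is a different kernel density estimator swapped in here for illustration. The sketch uses simulated two-cluster data rather than airpoll, and assumes MASS is installed, as it is with standard R distributions.

```r
library(MASS)   # provides kde2d()

# Simulated stand-in data with two clusters.
set.seed(3)
x <- c(rnorm(80, 0, 1), rnorm(40, 4, 0.7))
y <- c(rnorm(80, 0, 1), rnorm(40, 3, 0.7))

# kde2d() returns a list: grid vectors x and y, and the density matrix z.
den <- kde2d(x, y, n = 50)

# Perspective view of the estimated density surface.
persp(den$x, den$y, den$z, xlab = "x", ylab = "y", zlab = "Density",
      theta = 30, phi = 30)

# Scatterplot with density contours overlaid.
plot(x, y, pch = 1)
contour(den$x, den$y, den$z, nlevels = 10, lwd = 2, add = TRUE)
```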
Adding a Third Variable to the Scatterplot • The bubble plot • plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot. • symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
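One design choice worth knowing: `symbols()` treats the `circles` argument as radii, so a variable passed directly is encoded by radius and its visual area grows with the square. A possible variation, sketched here on the built-in `mtcars` data as a stand-in for airpoll, passes the square root so that circle area is proportional to the third variable.

```r
# Stand-in data: the built-in mtcars data frame, not airpoll.
# Radii are the square root of horsepower, so circle AREA tracks hp.
radii <- sqrt(mtcars$hp)

plot(mtcars$wt, mtcars$mpg, pch = 1,
     xlab = "Weight", ylab = "Miles per gallon")
symbols(mtcars$wt, mtcars$mpg, circles = radii, inches = 0.3,
        add = TRUE, lwd = 2)
```

Whether radius or area scaling reads better depends on the data; the slide's version scales by radius.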
Scatterplot Matrix • pairs(airpoll) • To add regression lines: • pairs(airpoll, panel=function(x,y) { abline(lsfit(x,y)$coef, lwd=2); lines(lowess(x,y), lty=2, lwd=2); points(x,y) }) # least-squares line, lowess smooth, then the points • For 3D graphics, use cloud from the lattice package: • library(lattice) • cloud(Mortality~SO2+Rainfall)
Conditioning Plots • coplot(Mortality~SO2|Popden) • To add a local regression fit: • coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,span=1))
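A conditioning plot slices the data into overlapping ranges of the conditioning variable and draws one panel per range. The sketch below makes those ranges visible with `co.intervals()` before plotting. It uses the built-in `quakes` data as a stand-in for airpoll, and the settings shown (6 intervals, overlap 0.5) match the `coplot()` defaults.

```r
# Stand-in data: the built-in quakes data frame, not airpoll.
# co.intervals() returns one row per panel: lower and upper limits of
# the conditioning variable for that panel.
ints <- co.intervals(quakes$depth, number = 6, overlap = 0.5)
print(ints)

# Pass the intervals explicitly and add a local regression fit per panel.
coplot(lat ~ long | depth, data = quakes, given.values = ints,
       panel = function(x, y, col, pch, ...) panel.smooth(x, y, span = 1))
```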
Conclusions • The purpose of graphics is to aid your intuition. • Explore them—the appropriate graphics reflect your questions and the structure of the data. • Next week: graphic presentations to avoid, because they mislead you and your audience. • Look at the books by Edward Tufte in the library.