
# Plotting Multivariate Data



### Plotting Multivariate Data

Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland

Resources

• Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.

• Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

Edward Tufte’s Recommendations

• Show the data

• Induce the viewer to think about the substance of the data

• Avoid distorting what the data have to say

• Present many numbers in a small space

• Make large data sets coherent

• Encourage comparison

• Reveal the data at several levels of detail

• Serve a clear purpose

• Be closely integrated with the statistical and verbal descriptions of the data

• Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.

Tufte’s Points

• Graphics reveal data.

• Graphics can be more precise and revealing than conventional statistics.

• Anscombe’s data

• Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21.

• All four data sets are described by the same linear model.
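The claim about the four data sets can be checked directly, since R ships Anscombe's data as the built-in `anscombe` data frame (columns `x1`..`x4`, `y1`..`y4`). The following is a minimal sketch rather than part of the original slides; the helper name `fit_one` is made up for illustration.

```r
# Fit y_i ~ x_i for each of Anscombe's four data sets (built-in data frame).
fit_one <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))                 # c(intercept, slope)
}
coefs <- t(sapply(1:4, fit_one))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))            # the four rows should be nearly the same

# Plot the four sets with their fitted lines to see how different they look.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Data set", i))
  abline(coefs[i, "intercept"], coefs[i, "slope"])
}
par(op)
```

The fitted lines agree closely, while the scatterplots show a linear cloud, a curve, a line with one outlier, and a single high-leverage point.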

The Anscombe Graphics

Ways of Looking at Data

• Scatterplots

• Demonstration

• “The convex hull of bivariate data”

• Demonstration

• Chiplot

• Demonstration

• Bivariate Boxplot

• Demonstration

And More Multivariate Graphics

• Bivariate Densities

• Demonstration

• Other Variables in a Scatterplot

• Demonstration

• Scatterplot Matrix

• Demonstration of pairs

• 3-D Plots

• Demonstration

• Conditioning Plots

• Demonstration

• Launch R

• Set the working directory to Statistics/RSPCMA/Data

• airpoll<-source("chap2airpoll.dat")$value

• Review exercises on pages 19-22
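The `source(...)$value` idiom works because `source()` returns the value of the last expression it evaluates. The sketch below assumes the `.dat` file holds a single R expression such as the output of `dput()`; that is an assumption about the file format, not something the slides state. It uses a temporary file and an invented toy data frame in place of the real data.

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(SO2 = c(10, 45, 120), Mortality = c(850, 930, 1010))

# Write the data frame as R source text, the format assumed for the .dat file.
path <- tempfile(fileext = ".dat")
dput(toy, file = path)

# source() evaluates the file; $value is the result of its last expression.
restored <- source(path)$value
print(restored)
unlink(path)
```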

Convex Hull of Bivariate Data

• Scatterplots are often examined alongside the correlation coefficient of two variables.

• They help detect outliers, which can distort the coefficient.

• Convex hull trimming, which drops the points on the convex hull before the correlation is computed, gives a more robust estimate of the correlation coefficient.

• Demonstration

• attach(airpoll)

• cor(SO2, Mortality)

• hull<-chull(SO2, Mortality) # finds the convex hull

• plot(SO2, Mortality, pch=1)

• polygon(SO2[hull],Mortality[hull], density=15, angle=30)

• cor(SO2[-hull],Mortality[-hull])

• The results are almost identical, which is unusual.
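For readers without the air pollution file, the same steps can be tried on simulated data. This is a minimal sketch under stated assumptions: the data are random numbers with one planted outlier, chosen only to show a case where trimming does change the estimate.

```r
set.seed(1)                                  # reproducible simulated data
x <- rnorm(60)
y <- 0.5 * x + rnorm(60)
x[60] <- 6; y[60] <- -6                      # one planted outlier

hull <- chull(x, y)                          # indices of the hull vertices
r_all     <- cor(x, y)
r_trimmed <- cor(x[-hull], y[-hull])         # correlation without hull points

plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)
cat("all points:", round(r_all, 3), " hull removed:", round(r_trimmed, 3), "\n")
```

Because the planted outlier is a hull vertex, it is removed along with the other boundary points before the second correlation is computed.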

Chiplot

• A way of augmenting the scatterplot to make dependence or independence easier to spot.

• See Statistics/RSPCMA/functions.txt

• chiplot(SO2,Mortality,vlabs=c("SO2", "Mortality"))

• For independent data, the points will be scattered in a horizontal band centered around 0.

• Departure from independence here is shown by the points missing from (-0.25, 0.25).
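The quantities behind a chi-plot can be sketched in a few lines of base R. This is not the `chiplot` function from functions.txt; it is a minimal version of the statistics as defined by Fisher and Switzer (1985), assuming no tied values, and the name `chi_stats` is made up for illustration.

```r
# Chi-plot statistics following Fisher and Switzer (1985); assumes no ties.
chi_stats <- function(x, y) {
  n <- length(x)
  below_x <- outer(x, x, ">")                  # [i, j] TRUE when x[j] < x[i]
  below_y <- outer(y, y, ">")                  # [i, j] TRUE when y[j] < y[i]
  Fi <- rowSums(below_x) / (n - 1)             # share of other points left of i
  Gi <- rowSums(below_y) / (n - 1)             # share of other points below i
  Hi <- rowSums(below_x & below_y) / (n - 1)   # share both left of and below i
  Si <- sign((Fi - 0.5) * (Gi - 0.5))
  chi    <- (Hi - Fi * Gi) / sqrt(Fi * (1 - Fi) * Gi * (1 - Gi))
  lambda <- 4 * Si * pmax((Fi - 0.5)^2, (Gi - 0.5)^2)
  data.frame(lambda = lambda, chi = chi)
}

set.seed(2)
x <- rnorm(100); y <- rnorm(100)               # independent by construction
cs <- chi_stats(x, y)
# Drop undefined values (extreme points) and points at the edge of the lambda range
keep <- is.finite(cs$chi) & abs(cs$lambda) < 4 * (1 / (length(x) - 1) - 0.5)^2
plot(cs$lambda[keep], cs$chi[keep], xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "lambda", ylab = "chi")
abline(h = c(-1, 1) * 1.78 / sqrt(length(x)), lty = 2)  # approx. 95% band
```

For independent data most points should fall inside the dashed band; for perfectly monotone data every defined chi value equals 1.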

Bivariate Boxplot

• Two-dimensional analogue of the boxplot

• A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers.

• Regression lines of x on y and y on x are shown.

• bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality")

• Cleaned up (more robust):

• bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")
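The two regression lines mentioned above can be drawn with base R alone. The sketch below covers only the lines, not the ellipses that `bvbox` adds, and uses simulated data as a stand-in for the air pollution variables.

```r
set.seed(3)
x <- rnorm(80)
y <- 0.6 * x + rnorm(80, sd = 0.8)

fit_yx <- lm(y ~ x)               # regression of y on x
fit_xy <- lm(x ~ y)               # regression of x on y

plot(x, y)
abline(fit_yx, lty = 1)
# x = a + b*y rearranged to y = -a/b + x/b so it can be drawn on the same axes
a <- unname(coef(fit_xy)[1])
b <- unname(coef(fit_xy)[2])
abline(a = -a / b, b = 1 / b, lty = 2)

# The product of the two fitted slopes equals the squared correlation.
slope_product <- unname(coef(fit_yx)[2]) * b
```

The two lines coincide only when the correlation is exactly 1 or -1, so the angle between them gives a visual cue to the strength of the linear relationship.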

Bivariate Densities

• The goal of examining a scatterplot is to identify clusters and outliers.

• Humans are not particularly good at this, so graphical aids help.

• Overlaying a bivariate density estimate on the scatterplot helps.

• Histograms are too rough for this, though, so a smoothed density estimate is used instead.

Demo of Bivariate Density

• den1<-bivden(SO2,Mortality)

• persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2)

• plot(SO2, Mortality)

• contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE)
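Where `bivden` from functions.txt is not at hand, a similar picture can be produced with `kde2d` from the MASS package, which is normally installed with R. This swaps in a different estimator from the one used in the demonstration, and the two-cluster data are simulated for illustration.

```r
library(MASS)                                  # provides kde2d()

set.seed(4)
# Two simulated clusters standing in for real data
x <- c(rnorm(100, 0, 1), rnorm(50, 4, 0.7))
y <- c(rnorm(100, 0, 1), rnorm(50, 3, 0.7))

den <- kde2d(x, y, n = 40)                     # list: grid vectors x, y and matrix z

persp(den$x, den$y, den$z, xlab = "x", ylab = "y", zlab = "Density")
plot(x, y)
contour(den$x, den$y, den$z, nlevels = 10, add = TRUE)
```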

Adding a Third Variable to the Scatterplot

• The bubble plot

• plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot.

• symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
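One design choice worth knowing about: the `circles` argument of `symbols` sets radii, so a value twice as large gets a circle with four times the area. The sketch below is a variation on the demonstration, not what the slide does; it passes the square root so that area is proportional to the third variable, and it uses invented data.

```r
set.seed(5)
x <- runif(20, 0, 100)
y <- runif(20, 700, 1200)
z <- runif(20, 10, 60)            # third variable, invented values

plot(x, y, pch = 1)
# circles= sets radii, so sqrt(z) makes each circle's area proportional to z
radius <- sqrt(z)
symbols(x, y, circles = radius, inches = 0.4, add = TRUE)
```

Whether radius or area scaling is preferable depends on the comparison the plot is meant to support.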

Scatterplot Matrix

• pairs(airpoll)

• pairs(airpoll,panel=function(x,y) {

abline(lsfit(x,y)$coef,lwd=2)

lines(lowess(x,y),lty=2,lwd=2)

points(x,y)})

3-D Plots

• For 3D graphics, use cloud from the lattice package.

• library(lattice)

• cloud(Mortality~SO2+Rainfall)

Conditioning Plots

• coplot(Mortality~SO2|Popden)

• To add a local regression fit:

• coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,span=1))

• The purpose of graphics is to aid your intuition.

• Explore them: the appropriate graphics reflect your questions and the structure of the data.

• Next week: graphic presentations to avoid, because they mislead you and your audience.

• Look at the books by Edward Tufte in the library.