Plotting Multivariate Data


Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland


Resources

  • Everitt, B S, and G Dunn (2001), Applied Multivariate Data Analysis, London: Arnold.

  • Everitt, B S (2005), An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

Edward Tufte’s Recommendations

  • Show the data

  • Induce the viewer to think about the substance of the data

  • Avoid distorting what the data have to say

  • Present many numbers in a small space

  • Make large data sets coherent

  • Encourage comparison

  • Reveal the data at several levels of detail

  • Serve a clear purpose

  • Be closely integrated with the statistical and verbal descriptions of the data

    • Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.

Tufte’s Points

  • Graphics reveal data.

  • Graphics can be more precise and revealing than conventional statistics.

  • Anscombe’s data

    • Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21.

  • All four data sets are described by the same linear model.

The Anscombe Graphics
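
The figure from this slide is not reproduced here. As a minimal sketch, the four panels can be redrawn from the anscombe data frame that ships with R's datasets package (columns x1 to x4 and y1 to y4). Every pair gives roughly the same fitted line, with an intercept near 3 and a slope near 0.5, even though the scatterplots look nothing alike.

```r
# Fit y_i ~ x_i for each of Anscombe's four data sets (built into R as `anscombe`).
fits <- lapply(1:4, function(i) {
  lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
})

# One row per data set: the four fitted lines are nearly identical.
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# Draw the four scatterplots with their fitted lines on a 2 x 2 grid.
if (!interactive()) pdf(NULL)  # avoid writing a plot file when run as a script
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

The numbers agree while the pictures do not, which is the point of the example.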

Ways of Looking at Data

  • Scatterplots

    • Demonstration

  • “The convex hull of bivariate data”

    • Demonstration

  • Chiplot

    • Demonstration

  • Bivariate Boxplot

    • Demonstration

And More Multivariate Graphics

  • Bivariate Densities

    • Demonstration

  • Other Variables in a Scatterplot

    • Demonstration

  • Scatterplot Matrix

    • Demonstration of pairs

  • 3-D Plots

    • Demonstration

  • Conditioning Plots

    • Demonstration


  • Launch R

  • Set the working directory to Statistics/RSPCMA/Data

  • airpoll<-source("chap2airpoll.dat")$value

  • Review exercises on pages 19-22
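
The source(...)$value idiom above suggests that the data file holds an R object written out as source text, for example with dput(). That is an assumption about the file, not something the slides state. A minimal sketch of the round trip, using a small made-up data frame and a temporary file in place of chap2airpoll.dat:

```r
# Hypothetical values; this is NOT the airpoll data.
toy <- data.frame(SO2 = c(10, 20, 30), Mortality = c(900, 950, 1000))

path <- tempfile(fileext = ".dat")
dput(toy, file = path)            # write the object as R source text

# source() evaluates the file; $value holds the result of its last expression.
reloaded <- source(path)$value

print(all.equal(toy, reloaded))
```

If a data file is a plain table instead, read.table() would be the usual tool.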

Convex Hull of Bivariate Data

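  • Scatterplots are often examined alongside the correlation coefficient of two variables.

  • They are used to detect outliers, which can distort the coefficient.

  • Convex hull trimming (dropping the points on the convex hull before computing the correlation) gives a more robust estimate of the correlation coefficient.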

  • Demonstration

    • attach(airpoll)

    • cor(SO2, Mortality)

Robust Estimation of the Correlation

  • hull<-chull(SO2, Mortality) # finds the convex hull

  • plot(SO2, Mortality, pch=1)

  • polygon(SO2[hull],Mortality[hull], density=15, angle=30)

  • cor(SO2[-hull],Mortality[-hull])

  • The results are almost identical, which is unusual.
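
The same steps can be tried without the course data. The sketch below uses synthetic values as a stand-in for the SO2 and Mortality columns (the numbers are made up, so the correlations will not match the airpoll results) and compares the correlation before and after dropping the hull points.

```r
set.seed(42)
n <- 60
so2 <- rexp(n, rate = 1 / 50)                      # synthetic, skewed like a pollution measure
mortality <- 900 + 0.5 * so2 + rnorm(n, sd = 60)   # synthetic response

hull <- chull(so2, mortality)        # indices of the points on the convex hull
r_all <- cor(so2, mortality)
r_trimmed <- cor(so2[-hull], mortality[-hull])     # correlation without the hull points

if (!interactive()) pdf(NULL)        # avoid writing a plot file when run as a script
plot(so2, mortality, pch = 1)
polygon(so2[hull], mortality[hull], density = 15, angle = 30)

print(c(all = r_all, trimmed = r_trimmed, removed = length(hull)))
```

A large gap between the two values suggests that a few extreme points are driving the correlation.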


Chiplot

  • A way of augmenting the scatterplot to spot dependence/independence.

  • See Statistics/RSPCMA/functions.txt

  • chiplot(SO2,Mortality,vlabs=c("SO2", "Mortality"))

  • For independent data, the points will be scattered in a horizontal band centered around 0.

  • Departure from independence here is shown by the points missing from (-0.25,0.25)
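
chiplot() is not part of base R; it appears to be defined in the functions file mentioned above. As a rough sketch of what such a plot computes, the function below follows the chi-plot construction of Fisher and Switzer as it is commonly described. The name chi_stats, the synthetic data, and the 1.78/sqrt(n) band (an approximate 95% limit, assumed here) are illustrative and may differ from the course's implementation.

```r
# Chi-plot statistics for paired data (a sketch, not the course's chiplot()).
chi_stats <- function(x, y) {
  n <- length(x)
  x_le <- outer(x, x, ">=")   # x_le[i, j] is TRUE when x[j] <= x[i]
  y_le <- outer(y, y, ">=")
  diag(x_le) <- FALSE         # each point is compared with the other n - 1 points
  diag(y_le) <- FALSE
  Fx  <- rowSums(x_le) / (n - 1)           # empirical distribution of x at point i
  Gy  <- rowSums(y_le) / (n - 1)           # empirical distribution of y at point i
  Hxy <- rowSums(x_le & y_le) / (n - 1)    # joint empirical distribution at point i
  chi <- (Hxy - Fx * Gy) / sqrt(Fx * (1 - Fx) * Gy * (1 - Gy))
  lambda <- 4 * sign((Fx - 0.5) * (Gy - 0.5)) * pmax((Fx - 0.5)^2, (Gy - 0.5)^2)
  keep <- is.finite(chi)      # extreme points give 0/0 and are dropped
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

set.seed(1)
n <- 60
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)       # synthetic, positively dependent data

cp <- chi_stats(x, y)
band <- 1.78 / sqrt(n)        # assumed approximate 95% limits under independence

if (!interactive()) pdf(NULL) # avoid writing a plot file when run as a script
plot(cp$lambda, cp$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "lambda", ylab = "chi")
abline(h = c(-band, band), lty = 2)
```

With dependent data like this, many points should fall outside the dashed band; with independent data most should stay inside it.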

Bivariate Boxplot

  • Two-dimensional analogue of the boxplot

  • A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers.

  • Regression lines of x on y and y on x are shown.

    • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality")

  • Cleaned up (more robust):

    • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")
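
The slide mentions two regression lines. They are different lines unless the correlation is perfect: the slope of y on x multiplied by the slope of x on y equals the squared correlation. A small sketch with synthetic data (the variable names and values are made up, and this does not reproduce bvbox):

```r
set.seed(7)
x <- rnorm(80, mean = 50, sd = 10)          # synthetic predictor
y <- 900 + 2 * x + rnorm(80, sd = 40)       # synthetic response

fit_yx <- lm(y ~ x)                         # regression of y on x
fit_xy <- lm(x ~ y)                         # regression of x on y

b_yx <- coef(fit_yx)[["x"]]
b_xy <- coef(fit_xy)[["y"]]
a_xy <- coef(fit_xy)[["(Intercept)"]]

if (!interactive()) pdf(NULL)               # avoid writing a plot file when run as a script
plot(x, y)
abline(fit_yx)                              # y on x, solid
# x = a + b * y rewritten as y = -a / b + x / b so it can be drawn on the same axes
abline(a = -a_xy / b_xy, b = 1 / b_xy, lty = 2)

print(c(slope_product = b_yx * b_xy, r_squared = cor(x, y)^2))
```

The weaker the correlation, the wider the angle between the two lines.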

Bivariate Densities

  • The goal of examining a scatterplot is to identify clusters and outliers.

  • Humans are not particularly good at this, so graphical aids help.

  • Overlaying a bivariate density estimate on the scatterplot helps.

  • Two-dimensional histograms are too rough for this, though; a smooth estimate works better.

Demo of Bivariate Density

  • den1<-bivden(SO2,Mortality)

  • persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2)

  • plot(SO2, Mortality)

  • contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=T)
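
bivden() is not part of base R; it appears to come from the course's functions file. As a stand-in, the sketch below builds a product-Gaussian kernel estimate with base R only and draws the same two views. The function name kde2, the bw.nrd bandwidth rule, the grid size, and the synthetic data are all assumptions made for illustration.

```r
# Product-Gaussian kernel density estimate on a grid (base R only).
kde2 <- function(x, y, gx, gy, hx, hy) {
  kx <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = hx))  # length(gx) x n
  ky <- outer(gy, y, function(g, yi) dnorm(g, mean = yi, sd = hy))  # length(gy) x n
  tcrossprod(kx, ky) / length(x)   # average of the n kernels at every grid cell
}

set.seed(11)
x <- c(rnorm(40, 0, 1), rnorm(20, 4, 0.7))     # synthetic data with two clusters
y <- c(rnorm(40, 0, 1), rnorm(20, 3, 0.7))

hx <- bw.nrd(x)                                # assumed rule-of-thumb bandwidths
hy <- bw.nrd(y)
gx <- seq(min(x) - 3 * hx, max(x) + 3 * hx, length.out = 60)
gy <- seq(min(y) - 3 * hy, max(y) + 3 * hy, length.out = 60)
den <- kde2(x, y, gx, gy, hx, hy)

if (!interactive()) pdf(NULL)                  # avoid writing a plot file when run as a script
persp(gx, gy, den, xlab = "x", ylab = "y", zlab = "Density", theta = 30, phi = 30)
plot(x, y)
contour(gx, gy, den, nlevels = 10, add = TRUE)

# Sanity check: the estimate should integrate to about 1 over the grid.
mass <- sum(den) * diff(gx[1:2]) * diff(gy[1:2])
print(mass)
```

The contour view is usually easier to read than the perspective view because the raw points stay visible underneath.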

Adding a Third Variable to the Scatterplot

  • The bubble plot

  • plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot.

  • symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
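
One caveat with symbols(): the circles argument is read as a radius, so passing the third variable directly makes bubble area grow with its square. A hedged variant, using synthetic stand-in data rather than airpoll, that makes area proportional to the variable instead:

```r
set.seed(3)
n <- 30
so2 <- runif(n, 0, 280)                          # synthetic stand-in values
mortality <- 800 + 0.8 * so2 + rnorm(n, sd = 60)
rainfall <- runif(n, 10, 60)

# Choose radii so that circle area (pi * r^2) equals the rainfall value.
radius <- sqrt(rainfall / pi)

if (!interactive()) pdf(NULL)                    # avoid writing a plot file when run as a script
plot(so2, mortality, pch = 1, lwd = 2)           # basic scatterplot
symbols(so2, mortality, circles = radius, inches = 0.4, add = TRUE, lwd = 2)
```

Here inches = 0.4 only sets the size of the largest bubble; the relative sizes come from radius.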

Scatterplot Matrix

  • pairs(airpoll)

  • To add regression lines

    • pairs(airpoll,panel=function(x,y) {
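
The panel function above is incomplete here. A minimal sketch of one way to finish it, assuming the intent is a least-squares line in every panel; the helper name panel_with_fit is hypothetical, and the built-in mtcars data stands in for airpoll:

```r
# Panel function: draw the points, then add the least-squares line for that pair.
panel_with_fit <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), lwd = 2)
}

# mtcars is a built-in data set, used here only as a stand-in for airpoll.
stand_in <- mtcars[, c("mpg", "disp", "hp", "wt")]

if (!interactive()) pdf(NULL)   # avoid writing a plot file when run as a script
pairs(stand_in, panel = panel_with_fit)
```

Because pairs() calls the panel function once per off-diagonal cell, each cell gets its own fitted line.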




3-D Plots

  • For 3D graphics, use cloud from the lattice package

    • library(lattice)

    • cloud(Mortality~SO2+Rainfall)

Conditioning Plots

  • coplot(Mortality~SO2|Popden)

  • To add a local regression fit

    coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch)
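
The call above is incomplete here. A minimal sketch of a conditioning plot with a smooth in each panel, assuming panel.smooth from the graphics package as the local fit and span = 1 as the smoothing parameter (both are assumptions, as are the synthetic stand-in data). co.intervals() shows the overlapping ranges of the conditioning variable that define the panels.

```r
set.seed(5)
n <- 60
so2 <- rexp(n, rate = 1 / 50)                    # synthetic stand-in values
popden <- runif(n, 1000, 9000)
mortality <- 850 + 0.4 * so2 + 0.01 * popden + rnorm(n, sd = 40)
stand_in <- data.frame(so2, popden, mortality)

# Six overlapping intervals of popden; each one defines a panel.
shingles <- co.intervals(stand_in$popden, number = 6, overlap = 0.5)
print(shingles)

if (!interactive()) pdf(NULL)                    # avoid writing a plot file when run as a script
coplot(mortality ~ so2 | popden, data = stand_in, given.values = shingles,
       panel = function(x, y, ...) panel.smooth(x, y, span = 1, ...))
```

Reading the panels in order shows how the relationship between the two plotted variables changes as the conditioning variable increases.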



  • The purpose of graphics is to aid your intuition.

  • Explore them—the appropriate graphics reflect your questions and the structure of the data.

  • Next week: graphic presentations to avoid, because they mislead you and your audience.

  • Look at the books by Edward Tufte in the library.