Plotting Multivariate Data

Plotting Multivariate Data. Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland. Resources: Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.

Plotting Multivariate Data

Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland

  • Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London:Arnold.
  • Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London:Springer
Edward Tufte’s Recommendations
  • Show the data
  • Induce the viewer to think about the substance of the data
  • Avoid distorting what the data have to say
  • Present many numbers in a small space
  • Make large data sets coherent
  • Encourage comparison
  • Reveal the data at several levels of detail
  • Serve a clear purpose
  • Be closely integrated with the statistical and verbal descriptions of the data
    • Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.
Tufte’s Points
  • Graphics reveal data.
  • Graphics can be more precise and revealing than conventional statistics.
  • Anscombe’s data
    • Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21.
  • All four data sets are described by the same linear model.
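The point about Anscombe's data can be checked directly, since the quartet ships with base R as the data frame `anscombe`. The sketch below fits a straight line to each of the four sets and then plots them; the object names `fits` and `coefs` are only illustrative.

```r
# Anscombe's quartet: columns x1..x4 and y1..y4 of the built-in `anscombe`.
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# Collect intercept and slope of each fit into a 2 x 4 matrix.
coefs <- sapply(fits, coef)
rownames(coefs) <- c("intercept", "slope")
colnames(coefs) <- paste0("set", 1:4)
print(round(coefs, 2))

# Four very different scatterplots, each with its fitted line.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

The coefficient table is nearly the same in every column, while the four panels look nothing alike, which is the argument for plotting before summarising.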
Ways of Looking at Data
  • Scatterplots
    • Demonstration
  • “The convex hull of bivariate data”
    • Demonstration
  • Chiplot
    • Demonstration
  • Bivariate Boxplot
    • Demonstration
And More Multivariate Graphics
  • Bivariate Densities
    • Demonstration
  • Other Variables in a Scatterplot
    • Demonstration
  • Scatterplot Matrix
    • Demonstration of pairs
  • 3-D Plots
    • Demonstration
  • Conditioning Plots
    • Demonstration
  • Launch R
  • Set the working directory to Statistics/RSPCMA/Data
  • airpoll<-source("chap2airpoll.dat")$value
  • Review exercises on pages 19-22
Convex Hull of Bivariate Data
  • Scatterplots are often examined alongside the correlation coefficient of two variables.
  • They help to detect outliers that may distort the coefficient.
  • Convex hull trimming gives a more robust estimate: drop the points on the convex hull of the data, then compute the correlation from the rest.
  • Demonstration
    • attach(airpoll)
    • cor(SO2, Mortality)
Robust Estimation of the Correlation
  • hull<-chull(SO2, Mortality) # finds the convex hull
  • plot(SO2, Mortality, pch=1)
  • polygon(SO2[hull],Mortality[hull], density=15, angle=30)
  • cor(SO2[-hull],Mortality[-hull])
  • The trimmed and untrimmed correlations are almost identical here, which is unusual.
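For readers without the airpoll file, the same hull-trimming steps can be tried on simulated data. This is a minimal sketch under that assumption: the variables `x` and `y` and the names `r_all` and `r_trim` are made up for illustration, and only `chull`, `cor`, `plot` and `polygon` come from the slides.

```r
# Simulated stand-in for the (SO2, Mortality) pair; not the course data.
set.seed(1)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60, sd = 0.8)

# Indices of the points lying on the convex hull of the data.
hull <- chull(x, y)

# Correlation from all points, and from the points left after
# removing the hull (one round of convex hull trimming).
r_all  <- cor(x, y)
r_trim <- cor(x[-hull], y[-hull])
print(c(all = r_all, trimmed = r_trim))

# Scatterplot with the hull shaded, as in the slide.
plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)
```

Comparing `r_all` with `r_trim` shows how much the outermost points drive the estimate; a large gap suggests the untrimmed coefficient is being pulled by a few extreme observations.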
Chiplot
  • A way of augmenting the scatterplot to spot dependence or independence.
  • See Statistics/RSPCMA/functions.txt
  • chiplot(SO2,Mortality,vlabs=c("SO2", "Mortality"))
  • For independent data, the points will be scattered in a horizontal band centered around 0.
  • Departure from independence here is shown by the absence of points from the band (-0.25, 0.25).
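The course's `chiplot` function lives in functions.txt and is not reproduced in these slides. As a rough idea of what such a plot computes, here is a sketch of the chi-plot statistics in the usual Fisher and Switzer construction; the function name `chi_plot_stats` is hypothetical, and the course function may differ in its details and graphical output.

```r
# Sketch of chi-plot statistics (Fisher and Switzer construction).
# `chi_plot_stats` is an illustrative name, not the course's chiplot().
chi_plot_stats <- function(x, y) {
  n <- length(x)
  Fi <- Gi <- Hi <- numeric(n)
  for (i in seq_len(n)) {
    # Empirical distribution functions evaluated at point i,
    # computed from the other n - 1 points.
    Fi[i] <- mean(x[-i] <= x[i])
    Gi[i] <- mean(y[-i] <= y[i])
    Hi[i] <- mean(x[-i] <= x[i] & y[-i] <= y[i])
  }
  S <- sign((Fi - 0.5) * (Gi - 0.5))
  # chi: scaled departure of the joint distribution from independence.
  chi <- (Hi - Fi * Gi) / sqrt(Fi * (1 - Fi) * Gi * (1 - Gi))
  # lambda: signed distance of point i from the bivariate median.
  lambda <- 4 * S * pmax((Fi - 0.5)^2, (Gi - 0.5)^2)
  # Drop the most extreme points, where chi is unstable or undefined.
  keep <- abs(lambda) < 4 * (1 / (n - 1) - 0.5)^2
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

set.seed(4)
x <- rnorm(50)
y <- rnorm(50)                     # independent of x by construction
res <- chi_plot_stats(x, y)

plot(res$lambda, res$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "lambda", ylab = "chi")
# Approximate 95% band for independent data.
abline(h = c(-1, 1) * 1.78 / sqrt(length(x)), lty = 2)
```

For independent data most points should fall inside the dashed band around 0; points drifting well outside it are the visual signal of dependence.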
Bivariate Boxplot
  • Two-dimensional analogue of the boxplot
  • A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers.
  • Regression lines of x on y and y on x are shown.
    • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality")
  • Cleaned up (more robust):
    • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")
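`bvbox` comes from the course's function file and uses robust estimates. The sketch below is a simplified, non-robust stand-in on simulated data: it draws a single ellipse from the ordinary mean and covariance that should hold roughly half the points if the data are close to bivariate normal, plus the two regression lines. All object names here are illustrative assumptions.

```r
# Simulated data; a classical (non-robust) stand-in for bvbox().
set.seed(2)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200, sd = 0.8)
xy <- cbind(x, y)

# The two regressions shown on a bivariate boxplot.
fit_yx <- lm(y ~ x)   # y on x
fit_xy <- lm(x ~ y)   # x on y

# Ellipse of constant Mahalanobis distance holding about 50% of a
# bivariate normal sample: squared distance = chi-square(2) median.
centre <- colMeans(xy)
S <- cov(xy)
q <- qchisq(0.5, df = 2)
theta <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))
ellipse <- sweep(sqrt(q) * circle %*% chol(S), 2, centre, "+")

# Share of points that actually fall inside the ellipse.
inside <- mean(mahalanobis(xy, centre, S) <= q)
print(inside)

plot(x, y)
lines(ellipse, lwd = 2)
abline(fit_yx, lty = 1)
# x = a + b*y rewritten as y = -a/b + x/b to draw it on the same axes.
b <- coef(fit_xy)
abline(a = -b[1] / b[2], b = 1 / b[2], lty = 2)
```

The two regression lines coincide only when the correlation is perfect; the angle between them widens as the correlation weakens, which is why the plot shows both.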
Bivariate Densities
  • The goal of examining a scatterplot is to identify clusters and outliers.
  • Humans are not particularly good at this, so graphical aids help.
  • Overlaying a bivariate density estimate on the scatterplot helps.
  • Histogram-based estimates are too rough, though; a smooth estimate is easier to read.
Demo of Bivariate Density
  • den1<-bivden(SO2,Mortality)
  • persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2)
  • plot(SO2, Mortality)
  • contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE)
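`bivden` is one of the course's own functions. If it is not to hand, a similar picture can be produced with `kde2d` from the MASS package, which is distributed with R. The sketch below swaps in that kernel estimator on simulated two-cluster data, so it illustrates the idea and is not the demo's exact method; note that `kde2d` returns components named `x`, `y` and `z` where `bivden` uses `seqx`, `seqy` and `den`.

```r
library(MASS)  # for kde2d(); a recommended package shipped with R

# Simulated data with two clusters, standing in for the course data.
set.seed(3)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))

# Kernel density estimate on a 50 x 50 grid.
den <- kde2d(x, y, n = 50)

# Perspective view of the estimated density surface.
persp(den$x, den$y, den$z, xlab = "x", ylab = "y", zlab = "Density",
      theta = 30, phi = 30)

# Scatterplot with density contours overlaid.
plot(x, y)
contour(den$x, den$y, den$z, nlevels = 10, add = TRUE)
```

The two peaks in the surface and the two sets of closed contours make the clusters far easier to see than the raw points alone.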
Adding a Third Variable to the Scatterplot
  • The bubble plot
  • plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot.
  • symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
Scatterplot Matrix
  • pairs(airpoll)
  • To add regression lines
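    • pairs(airpoll, panel=function(x,y) {points(x,y); abline(lm(y~x))}) # the panel body is cut off in the transcript; a least-squares line is one plausible completion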
3-D Plots
  • For 3-D graphics, use cloud from the lattice package
    • library(lattice)
    • cloud(Mortality~SO2+Rainfall)
Conditioning Plots
  • coplot(Mortality~SO2|Popden)
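  • To add a local regression fit
    • coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,col=col,pch=pch)) # the panel body is cut off in the transcript; panel.smooth is one plausible completion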
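To see how a conditioning plot slices the data, it can help to look at the intervals it builds for the conditioning variable. The sketch below uses the built-in `quakes` data as a stand-in for airpoll (an assumption made so the example runs anywhere): `co.intervals` returns the overlapping ranges, and `coplot` draws one panel per range with a smooth added by `panel.smooth`.

```r
# Overlapping intervals of the conditioning variable (depth):
# six intervals, each sharing about half its points with its neighbour.
iv <- co.intervals(quakes$depth, number = 6, overlap = 0.5)
print(iv)

# One lat-versus-long panel per depth interval, each with a local
# regression smooth drawn by panel.smooth.
coplot(lat ~ long | depth, data = quakes,
       number = 6, overlap = 0.5, panel = panel.smooth)
```

Each row of `iv` gives the lower and upper bound of one panel's slice, so a point near a boundary appears in two adjacent panels.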


  • The purpose of graphics is to aid your intuition.
  • Explore them—the appropriate graphics reflect your questions and the structure of the data.
  • Next week: graphic presentations to avoid, because they mislead you and your audience.
  • Look at the books by Edward Tufte in the library.