
Unit 31: A Unified Perspective for Visual Display of Data (A work in progress)


lenore


  1. Unit 31: A Unified Perspective for Visual Display of Data (A work in progress)

  2. Give you more experience (lecture/lab) with display of data in R • One of R’s biggest strengths, but as a result complex • Sample code for common designs • Think about what you want to display • Understand what you can and can’t conclude from graphs • Emphasis on raw and sampling distributions; error bars and envelopes • This just scratches the surface • No clear consensus, and our field is out of date

  3. Some Options in R • Base package • ggplot2 • effects

  4. General Principles • Visually present our effect, the parameter estimate • Display information about the sampling distribution for that parameter estimate • Display information about the distribution of raw scores • Case analysis: outliers and influence • Reader understands what you are presenting • Consistent across typical designs and measurement strategies

  5. The Examples
     Between subjects designs
       • Two group, equal N
       • Two group, unequal N
       • Three group
       • Three group with covariates
       • One quantitative IV
     Mixed and within designs
       • Two conditions
       • Three conditions
       • 2 IV: quantitative and two conditions

  6. Two Group Equal N
     some(d)
         X        Y
     007 A 63.75985
     034 A 29.04556
     052 B 47.33331
     091 B 65.85238
     tapply(d$Y, d$X, 'length')
      A  B
     50 50
     tapply(d$Y, d$X, 'mean')
            A        B
     41.61268 51.37365
     tapply(d$Y, d$X, 'sd')
            A        B
     13.85891 15.04196
     tapply(d$Y, d$X, 'se')
            A        B
     1.959946 2.127254
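
The call `tapply(d$Y, d$X, 'se')` needs a function named `se`, which base R does not provide. The standard errors on the slide equal sd / sqrt(50), so the helper is presumably the standard error of the mean. A minimal sketch under that assumption follows; the data frame is simulated and only mimics the shape of the slide's `d`, so the numbers will not match.

```r
# Simulated stand-in for the slide's data frame d (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), each = 50)),
  Y = c(rnorm(50, mean = 42, sd = 14), rnorm(50, mean = 51, sd = 15))
)

# Assumed definition of se(): standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# One row per group: n, mean, sd, and se
desc <- data.frame(
  n    = as.vector(tapply(d$Y, d$X, length)),
  mean = as.vector(tapply(d$Y, d$X, mean)),
  sd   = as.vector(tapply(d$Y, d$X, sd)),
  se   = as.vector(tapply(d$Y, d$X, se)),
  row.names = levels(d$X)
)
print(desc)
```

Collecting the four summaries in one data frame keeps the group labels attached to every statistic, which helps when the same values are reused for plotting.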

  7. contrasts(d$X) = varContrasts(d$X, Type = 'POC', POCList = list(c(-1,1)))
       POC1
     A -0.5
     B  0.5
     m = lm(Y ~ X, data = d)
     modelSummary(m)
     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)   46.493      1.446  32.147  < 2e-16 ***
     XPOC1          9.761      2.893   3.375  0.00106 **

  8. contrasts(d$X) = varContrasts(d$X, Type = 'dummy', RefLevel = 1)
       B_v_A
     A     0
     B     1
     m = lm(Y ~ X, data = d)
     modelSummary(m)
     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)   41.613      2.045  20.345  < 2e-16 ***
     XB_v_A         9.761      2.893   3.375  0.00106 **

  9. contrasts(d$X) = varContrasts(d$X, Type = 'dummy', RefLevel = 2)
       A_v_B
     A     1
     B     0
     m = lm(Y ~ X, data = d)
     modelSummary(m)
     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)   51.374      2.045  25.118  < 2e-16 ***
     XA_v_B        -9.761      2.893  -3.375  0.00106 **
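
The three codings above change the intercept but not the size of the group effect. The sketch below reproduces that pattern with base R only, setting contrast matrices by hand instead of calling `varContrasts`. The data are simulated, so the estimates differ from the slides; the object names (`d_poc`, `d_dum`, `group_means`) are illustrative.

```r
# Simulated two-group data (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), each = 50)),
  Y = c(rnorm(50, mean = 42, sd = 14), rnorm(50, mean = 51, sd = 15))
)
group_means <- tapply(d$Y, d$X, mean)

# Centered contrast with a one-unit range (A = -0.5, B = 0.5)
d_poc <- d
contrasts(d_poc$X) <- matrix(c(-0.5, 0.5), ncol = 1,
                             dimnames = list(levels(d$X), "POC1"))
m_poc <- lm(Y ~ X, data = d_poc)

# Dummy coding with A as the reference level
d_dum <- d
contrasts(d_dum$X) <- contr.treatment(levels(d$X), base = 1)
m_dum <- lm(Y ~ X, data = d_dum)

# Same slope (B minus A) under both codings; the intercept is the
# mean of the two group means (POC) or the mean of group A (dummy)
rbind(POC = unname(coef(m_poc)), dummy_A_ref = unname(coef(m_dum)))
```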

  10. How do you display these results?

  11. p = data.frame(X = c('A', 'B'))
        X
      1 A
      2 B
      p = modelPredictions(m, p)
        Predicted      lwr      upr       se X
      1  41.61268 39.56737 43.65799 2.045311 A
      2  51.37365 49.32833 53.41896 2.045311 B
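
In the output above, `lwr` and `upr` equal `Predicted` minus and plus `se`, so the interval is one standard error wide on each side. A rough base R equivalent uses `predict()` with `se.fit = TRUE`. This is a sketch on simulated data, not the course's `modelPredictions` function; the column names `CILo` and `CIHi` follow the later plotting slides.

```r
# Simulated two-group data (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), each = 50)),
  Y = c(rnorm(50, mean = 42, sd = 14), rnorm(50, mean = 51, sd = 15))
)
m <- lm(Y ~ X, data = d)

# One row per group to predict for
p <- data.frame(X = factor(c("A", "B"), levels = levels(d$X)))

# Point estimates and their standard errors from the fitted model
pr <- predict(m, newdata = p, se.fit = TRUE)
p$Predicted <- as.numeric(pr$fit)
p$SE        <- as.numeric(pr$se.fit)
p$CILo      <- p$Predicted - p$SE   # lower bound: estimate - 1 SE
p$CIHi      <- p$Predicted + p$SE   # upper bound: estimate + 1 SE
print(p)
```

Because the model pools the error variance across groups, both groups get the same standard error when the group sizes are equal.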

  12. library(gplots)
      library(Hmisc)
      windows()
      par(lwd=3, cex=1.5, font=2, cex.axis=1, font.axis=2, cex.lab=1.5, font.lab=2)
      barplot2(p$Predicted, beside=TRUE, ylim=c(0,100), xlab='', ylab='', plot.ci=FALSE, axes=FALSE, col='white')

  13. axis(2, at=seq(0,100,by=25), lwd=3)
      mtext('Dependent Measure (units)', side=2, line=2, cex=1.5)
      mtext('Group', side=1, line=3, cex=1.5)

  14. x = jitter(rep(0,sum(d$X=='A')),2) + 0.7
      points(x, d$Y[d$X=='A'], pch=20, cex=.5, col='gray')
      x = jitter(rep(0,sum(d$X=='B')),2) + 1.9
      points(x, d$Y[d$X=='B'], pch=20, cex=.5, col='gray')

  15. errbar(x=c(0.7, 1.9), y=p$Predicted, p$CIHi, p$CILo, pch=NA_integer_, lwd=3, cap= .05, add=TRUE )

  16. lines(x=c(0.7,1.9), y=c(75,75), lwd=2)
      text(x=1.3, y=78, '**', cex=1.5)

  17. Figure Caption: Bars represent sample means of the dependent measure by group. Confidence interval bands (±1 standard error of point estimates from GLM) are provided to indicate the precision of the point estimates of the population group means. Dependent measure raw scores are presented by group as gray points. The horizontal line indicates a significant contrast between group means (** p < .01).
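
The figure built over the preceding slides relies on `gplots`, `Hmisc`, and the Windows-only `windows()` device. As a hedged alternative, the same elements (white bars, jittered raw scores, ±1 SE bars) can be drawn with base graphics. The sketch uses simulated data; `pdf(NULL)` is included only so it runs without a display and can be dropped for interactive use.

```r
# Simulated two-group data (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), each = 50)),
  Y = c(rnorm(50, mean = 42, sd = 14), rnorm(50, mean = 51, sd = 15))
)
m  <- lm(Y ~ X, data = d)
p  <- data.frame(X = factor(levels(d$X), levels = levels(d$X)))
pr <- predict(m, newdata = p, se.fit = TRUE)
p$Predicted <- as.numeric(pr$fit)
p$CILo <- p$Predicted - as.numeric(pr$se.fit)
p$CIHi <- p$Predicted + as.numeric(pr$se.fit)

pdf(NULL)  # null device; remove this line to draw on screen

# Bars for the group means; barplot() returns the bar midpoints
mids <- barplot(p$Predicted, ylim = c(0, 100), col = "white",
                names.arg = as.character(p$X),
                xlab = "Group", ylab = "Dependent Measure (units)")

# Jittered raw scores, one cloud per group, centered on its own bar
for (i in seq_along(mids)) {
  y <- d$Y[d$X == levels(d$X)[i]]
  x <- jitter(rep(mids[i], length(y)), amount = 0.15)
  points(x, y, pch = 20, cex = 0.5, col = "gray")
}

# Error bars: arrows() with flat heads at both ends
arrows(mids, p$CILo, mids, p$CIHi, angle = 90, code = 3,
       length = 0.05, lwd = 3)

invisible(dev.off())
```

Looping over factor levels sizes each jitter vector from its own group, so the code still works when the groups have unequal N.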

  18. What other error bars might you have put on the graph instead of the standard error of the point estimates?

  19. POC
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   46.493      1.446  32.147  < 2e-16 ***
      XPOC1          9.761      2.893   3.375  0.00106 **
      Dummy (A as reference)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   41.613      2.045  20.345  < 2e-16 ***
      XB_v_A         9.761      2.893   3.375  0.00106 **
      Dummy (B as reference)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   51.374      2.045  25.118  < 2e-16 ***
      XA_v_B        -9.761      2.893  -3.375  0.00106 **

  20. t = summary(m)
      t$coefficients
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   46.493      1.446  32.147  < 2e-16 ***
      XPOC1          9.761      2.893   3.375  0.00106 **
      errbar(x=c(0.7, 1.9), y=p$Predicted, p$Predicted + t$coefficients[2,2], p$Predicted - t$coefficients[2,2], pch=NA_integer_, lwd=3, cap=.05, add=TRUE)
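
The slide above swaps the standard error of each group mean for the standard error of the group contrast. To compare the candidates side by side, the following sketch computes three possible half-widths for the bars: SE of the group mean, SE of the difference, and a 95% confidence interval. It uses simulated data and base R only; the 95% level and the `half_widths` table are illustrative choices, not something the slides prescribe.

```r
# Simulated two-group data (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), each = 50)),
  Y = c(rnorm(50, mean = 42, sd = 14), rnorm(50, mean = 51, sd = 15))
)
m   <- lm(Y ~ X, data = d)
new <- data.frame(X = factor(levels(d$X), levels = levels(d$X)))

# Option 1: SE of each group's point estimate
se_mean <- as.numeric(predict(m, newdata = new, se.fit = TRUE)$se.fit)

# Option 2: SE of the group difference (second row of the coefficient table)
se_diff <- coef(summary(m))[2, "Std. Error"]

# Option 3: half-width of a 95% confidence interval for each group mean
ci <- predict(m, newdata = new, interval = "confidence", level = 0.95)
ci_half <- as.numeric(ci[, "upr"] - ci[, "fit"])

half_widths <- data.frame(group = levels(d$X), se_mean = se_mean,
                          se_diff = se_diff, ci95 = ci_half)
print(half_widths)
```

With equal group sizes the SE of the difference is sqrt(2) times the SE of a group mean, which matches the 2.893 and 2.045 reported on the slides.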

  21. Two Group Unequal N
      describeBy(d,d$X) #an alternative to tapply used earlier
      group: A
         var  n  mean    sd median trimmed   mad   min   max range skew kurtosis   se
      X*   1 25  1.00  0.00   1.00    1.00  0.00     1  1.00  0.00  NaN      NaN 0.00
      Y    2 25 37.72 15.51  34.87   36.77 10.77    10 81.11 71.12 0.88     0.62 3.10
      -----------------------------------------------------------------------------------
      group: B
         var  n  mean    sd median trimmed   mad   min   max range skew kurtosis   se
      X*   1 75  2.00  0.00   2.00     2.0  0.00  2.00  2.00  0.00  NaN      NaN 0.00
      Y    2 75 48.84 13.38  47.52    48.8 14.79 18.85 81.98 63.13 0.11    -0.38 1.55

  22. POC
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   43.279      1.609  26.894  < 2e-16 ***
      XPOC1         11.122      3.218   3.456 0.000813 ***
      Dummy (A as reference)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   37.718      2.787  13.532  < 2e-16 ***
      XB_v_A        11.122      3.218   3.456 0.000813 ***
      Dummy (B as reference)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   48.840      1.609  30.350  < 2e-16 ***
      XA_v_B       -11.122      3.218  -3.456 0.000813 ***
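
With unequal N, two things change in the output above: the POC intercept (43.279) is the unweighted mean of the two group means rather than the grand mean of all scores, and the standard error of a group mean now depends on that group's size (2.787 for n = 25 versus 1.609 for n = 75). The sketch below illustrates both points on simulated data with the same group sizes; all values are illustrative.

```r
# Simulated two-group data with unequal N (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B"), times = c(25, 75))),
  Y = c(rnorm(25, mean = 38, sd = 15), rnorm(75, mean = 49, sd = 13))
)
contrasts(d$X) <- matrix(c(-0.5, 0.5), ncol = 1,
                         dimnames = list(levels(d$X), "POC1"))
m <- lm(Y ~ X, data = d)

group_means <- as.numeric(tapply(d$Y, d$X, mean))
group_n     <- as.numeric(table(d$X))

# Intercept under the centered contrast vs. two candidate "overall" means
c(intercept       = as.numeric(coef(m)[1]),
  unweighted_mean = mean(group_means),
  grand_mean      = mean(d$Y))

# Model-based SE of each group mean: pooled residual SD / sqrt(group n)
se_group <- sigma(m) / sqrt(group_n)
names(se_group) <- levels(d$X)
se_group
```

For the figure, this means the error bars for the smaller group should be visibly wider, even though the model assumes a common error variance.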

  23. Three Group Equal N
      tapply(d$Y, d$X, 'length')
       A  B  C
      50 50 50
      tapply(d$Y, d$X, 'mean')
             A        B        C
      40.56729 44.34783 55.38463
      tapply(d$Y, d$X, 'sd')
             A        B        C
      13.67520 16.14149 13.78084
      tapply(d$Y, d$X, 'se')
             A        B        C
      1.933965 2.282752 1.948905

  24. POC
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   46.767      1.190  39.293  < 2e-16 ***
      XPOC1          3.781      2.915   1.297    0.197
      XPOC2         12.927      2.525   5.120 9.43e-07 ***
      Dummy (A as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   40.567      2.062  19.678  < 2e-16 ***
      XB_v_A         3.781      2.915   1.297    0.197
      XC_v_A        14.817      2.915   5.082 1.12e-06 ***
      Dummy (B as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   44.348      2.062  21.512  < 2e-16 ***
      XA_v_B        -3.781      2.915  -1.297 0.196752
      XC_v_B        11.037      2.915   3.786 0.000223 ***
      Dummy (C as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   55.385      2.062  26.866  < 2e-16 ***
      XA_v_C       -14.817      2.915  -5.082 1.12e-06 ***
      XB_v_C       -11.037      2.915  -3.786 0.000223 ***
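
The slides do not show the contrast weights used for the three-group POC model. The estimates suggest them, though: XPOC1 (3.781) equals the B minus A difference, and XPOC2 (12.927) equals the C mean minus the average of the A and B means. The sketch below builds contrasts with that interpretation by hand in base R. The weights are an inference from the output, not the documented behavior of `varContrasts`, and the data are simulated.

```r
# Simulated three-group data (NOT the original data)
set.seed(31)
d <- data.frame(
  X = factor(rep(c("A", "B", "C"), each = 50)),
  Y = c(rnorm(50, mean = 41, sd = 14),
        rnorm(50, mean = 44, sd = 16),
        rnorm(50, mean = 55, sd = 14))
)

# Assumed orthogonal contrasts, each scaled to a one-unit range:
#   POC1: B vs. A          (-1/2,  1/2,   0)
#   POC2: C vs. mean(A, B) (-1/3, -1/3, 2/3)
poc <- cbind(POC1 = c(-1/2, 1/2, 0), POC2 = c(-1/3, -1/3, 2/3))
rownames(poc) <- levels(d$X)
contrasts(d$X) <- poc

m <- lm(Y ~ X, data = d)
group_means <- as.numeric(tapply(d$Y, d$X, mean))

# Coefficients next to the mean differences they should reproduce
cbind(coef     = as.numeric(coef(m)),
      expected = c(mean(group_means),
                   group_means[2] - group_means[1],
                   group_means[3] - mean(group_means[1:2])))
```

Scaling each contrast to span one unit is what makes the coefficients read directly as mean differences in the units of Y.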

  25. p = data.frame(X = c('A', 'B', 'C'))
        X
      1 A
      2 B
      3 C
      p = modelPredictions(m, p)
        Predicted      lwr      upr       se
      1  40.56729 38.50579 42.62880 2.061505
      2  44.34783 42.28632 46.40933 2.061505
      3  55.38463 53.32312 57.44613 2.061505

  26. windows()
      par(lwd=3, cex=1.5, font=2, cex.axis=1, font.axis=2, cex.lab=1.5, font.lab=2)
      barplot2(p$Predicted, beside=TRUE, ylim=c(0,100), xlab='', ylab='', plot.ci=FALSE, axes=FALSE, col='white')

  27. axis(2, at=seq(0,100,by=25), lwd=3)
      mtext('Dependent Measure (units)', side=2, line=2, cex=1.5)
      mtext('Group', side=1, line=3, cex=1.5)

  28. x = jitter(rep(0,sum(d$X=='A')),2) + 0.7
      points(x, d$Y[d$X=='A'], pch=20, cex=.5, col='gray')
      x = jitter(rep(0,sum(d$X=='B')),2) + 1.9
      points(x, d$Y[d$X=='B'], pch=20, cex=.5, col='gray')
      x = jitter(rep(0,sum(d$X=='C')),2) + 3.1
      points(x, d$Y[d$X=='C'], pch=20, cex=.5, col='gray')

  29. errbar(x=c(0.7,1.9,3.1), y=p$Predicted, p$CIHi, p$CILo, pch=NA_integer_, lwd=3, cap= .05, add=TRUE )

  30. lines(x=c(0.7, 3.1), y=c(70,70), lwd=2)
      text(x=1.9, y=73, '***', cex=1.5)
      lines(x=c(1.9,3.1), y=c(80,80), lwd=2)
      text(x=2.5, y=83, '***', cex=1.5)

  31. Figure Caption: Bars represent sample means of the dependent measure by group. Confidence interval bands (±1 standard error of point estimates from GLM) are provided to indicate the precision of the point estimates of the population group means. Dependent measure raw scores are presented by group as gray points. Horizontal lines indicate significant contrasts between group means (*** p < .001).

  32. What other error bars might you have put on the graph instead of the standard error of the point estimates?

  33. POC
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   46.767      1.190  39.293  < 2e-16 ***
      XPOC1          3.781      2.915   1.297    0.197
      XPOC2         12.927      2.525   5.120 9.43e-07 ***
      Dummy (A as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   40.567      2.062  19.678  < 2e-16 ***
      XB_v_A         3.781      2.915   1.297    0.197
      XC_v_A        14.817      2.915   5.082 1.12e-06 ***
      Dummy (B as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   44.348      2.062  21.512  < 2e-16 ***
      XA_v_B        -3.781      2.915  -1.297 0.196752
      XC_v_B        11.037      2.915   3.786 0.000223 ***
      Dummy (C as reference)
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   55.385      2.062  26.866  < 2e-16 ***
      XA_v_C       -14.817      2.915  -5.082 1.12e-06 ***
      XB_v_C       -11.037      2.915  -3.786 0.000223 ***

  34. Three Group with Two Covariates
      some(d)
          X        CQ CD        Y
      007 A 100.35521  1 143.1154
      029 A  98.86410  1 121.2442
      057 B  98.99465  0 130.4698
      070 B 109.54382  0 166.6895
      132 C 105.21880  0 146.7784
      tapply(d$Y, d$X, 'mean')
             A        B        C
      124.3091 131.0999 143.5009

  35. What is different here?

  36. str(d)
      'data.frame': 100 obs. of 2 variables:
       $ X: num 46.7 35.3 33 57 36.1 ...
       $ Y: num 32.1 19.8 26.7 63.1 53.9 ...
      varDescribe(d,1)
          n  mean    sd   min   max
      X 100 51.24  9.85 32.98 80.97
      Y 100 45.90 15.82 11.13 93.24

  37. m = lm(Y ~ X, data = d)
      modelSummary(m)
      Coefficients
                  Estimate     SE t-statistic Pr(>|t|)
      (Intercept)  17.3968 7.9405       2.191 0.030829 *
      X             0.5563 0.1522       3.655 0.000416 ***
      ---
      Sum of squared errors (SSE): 21795.3, Error df: 98
      R-squared: 0.1200

  38. p = data.frame(X = seq(33,80,.01))
      p = modelPredictions(m,p)
      plot(x=c(30,90), y=c(0,100), type='n', xlab='', ylab='', axes=FALSE, frame.plot=FALSE)
      axis(1, lwd=3, at=seq(30,90, by=10), cex.axis=1)
      mtext('Variable X', side=1, line=3, cex=1.5)
      axis(2, lwd=3, at=seq(0,100, by=25), cex.axis=1)
      mtext(expression(bold(paste('Variable Y (', mu, 'V)', sep=''))), side=2, line=2, cex=1.5)
      points(d$X, d$Y, cex=.5)
      #Draw new polygon shaded confidence bands with transparency. NOTE: Bands drawn before prediction lines in case of overlap
      polygon(c(p$X, rev(p$X)), c(p$CILo, rev(p$Predicted)), col=(rgb(1, 0, 0, .25)), border=NA)
      polygon(c(p$X, rev(p$X)), c(p$CIHi, rev(p$Predicted)), col=(rgb(1, 0, 0, .25)), border=NA)
      #Draw confidence bands as lines instead of region. NOTE: Bands drawn before prediction lines in case of overlap
      #lines(x=p$X, y=p$CILo, type='l', lty=1, col='gray', lwd=1)
      #lines(x=p$X, y=p$CIHi, type='l', lty=1, col='gray', lwd=1)
      lines(x=p$X, y=p$Predicted, type='l', lty=1, col='black', lwd=3)
      Coefficients
                  Estimate     SE t-statistic Pr(>|t|)
      (Intercept)  17.3968 7.9405       2.191 0.030829 *
      X             0.5563 0.1522       3.655 0.000416 ***
      ---
      Sum of squared errors (SSE): 21795.3, Error df: 98
      R-squared: 0.1200

  39. points(d$X, d$Y, cex=.5)
      polygon(c(p$X, rev(p$X)), c(p$CILo, rev(p$Predicted)), col=(rgb(1, 0, 0, .25)), border=NA)
      polygon(c(p$X, rev(p$X)), c(p$CIHi, rev(p$Predicted)), col=(rgb(1, 0, 0, .25)), border=NA)
      lines(p$X, p$Predicted, type='l', lty=1, col='black', lwd=3)
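
The quantitative-IV figure can also be sketched with base R alone. The version below gets the fitted line and its standard error from `predict()` and draws the ±1 SE envelope as a single polygon running along the lower bound and back along the upper bound, rather than two polygons that meet at the prediction line. The data are simulated to resemble the slide's variables, and `pdf(NULL)` is there only so the code runs without a display.

```r
# Simulated data with one quantitative predictor (NOT the original data)
set.seed(31)
d <- data.frame(X = rnorm(100, mean = 51, sd = 10))
d$Y <- 17 + 0.55 * d$X + rnorm(100, mean = 0, sd = 15)
m <- lm(Y ~ X, data = d)

# Predictions over a grid spanning the observed range of X
p  <- data.frame(X = seq(min(d$X), max(d$X), length.out = 200))
pr <- predict(m, newdata = p, se.fit = TRUE)
p$Predicted <- as.numeric(pr$fit)
p$SE        <- as.numeric(pr$se.fit)
p$CILo      <- p$Predicted - p$SE
p$CIHi      <- p$Predicted + p$SE

pdf(NULL)  # null device; remove this line to draw on screen
plot(d$X, d$Y, cex = 0.5, xlab = "Variable X", ylab = "Variable Y")

# Envelope first, so the prediction line is drawn on top of it
polygon(c(p$X, rev(p$X)), c(p$CILo, rev(p$CIHi)),
        col = rgb(1, 0, 0, 0.25), border = NA)
lines(p$X, p$Predicted, lwd = 3)
invisible(dev.off())
```

The envelope is narrowest near the mean of X and widens toward the extremes, which is the visual cue that predictions far from the center of the data are less precise.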
