
Outline

chyna

Presentation Transcript


  1. Outline • Research Question: What determines height? • Data Input • Look at One Variable • Compare Two Variables • Children’s Height and Parents’ Height • Children’s Height and Gender • Graphic Packages: ggplot2

  2. What factors are most responsible for height?

  3. Galton’s Family Height Dataset: X1 (father’s height), X2 (mother’s height), X3 (gender), Y (child’s height)

  4. Galton’s Notebook on Families & Height

  5. > getwd()
     [1] "C:/Users/johnp_000/Documents"
     > setwd()
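A minimal sketch of the working-directory step on slide 5. `setwd()` needs a path argument, which the slide leaves blank, so the example uses R's session temporary directory purely as a stand-in (an assumption, not the folder used in the presentation).

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# setwd() requires a path; tempdir() is only a placeholder here
# (an assumption, not the folder used in the presentation).
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Restore the original working directory.
setwd(old_wd)
```

Saving and restoring the old directory keeps the rest of a session unaffected by the change.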

  6. Dataset Input
     h <- read.csv("GaltonFamilies.csv")
     Object: h. Function: read.csv(). Filename: "GaltonFamilies.csv".
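The object / function / filename pattern on slide 6 can be tried without the original file. The sketch below writes a tiny toy table to a temporary CSV and reads it back; the values and the temporary path are invented for illustration and are not Galton's data.

```r
# Toy stand-in for GaltonFamilies.csv: the values below are made up
# for illustration and are NOT taken from Galton's data.
toy <- data.frame(
  father      = c(70.0, 68.5, 72.0),
  mother      = c(64.0, 65.5, 63.0),
  gender      = c("male", "female", "male"),
  childHeight = c(71.0, 64.5, 70.5)
)

# Write the toy data to a temporary CSV file (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# object <- function(filename), the same shape as on the slide.
# stringsAsFactors = TRUE turns the text column into a factor.
h <- read.csv(csv_path, stringsAsFactors = TRUE)
print(h)
```

With the real file in the working directory, only the filename passed to `read.csv()` would change.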

  7. str(), summary(). Data Types: Numbers and Factors (Categorical)
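To see how `str()` and `summary()` report numbers versus factors, here is a small sketch on simulated data. The data frame, its column names, and the means and standard deviations are assumptions chosen for illustration only.

```r
# Simulated heights (NOT Galton's data): one factor, one numeric column.
set.seed(1)
n <- 20
h <- data.frame(
  gender      = factor(rep(c("female", "male"), each = n / 2)),
  childHeight = c(rnorm(n / 2, mean = 64, sd = 2.5),
                  rnorm(n / 2, mean = 69, sd = 2.5))
)

# str() shows the structure: one line per column with its type.
str(h)

# summary() gives counts per level for factors and
# min / quartiles / mean / max for numeric columns.
print(summary(h))

# The class of each column, as a named character vector.
types <- sapply(h, class)
print(types)
```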

  8. Variable         Type          Steps
     Child’s Height   Continuous    Histogram
     Dad’s Height     Continuous    Scatter
     Mom’s Height     Continuous    Scatter
     Gender           Categorical   Boxplot

  9. Frequency Distribution, Histogram
     hist(h$childHeight)

  10. Density Plot (Area = 1)
      plot(density(h$childHeight))

  11. Mode, Bimodal
      hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
      curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

  12. Grammar of Graphics: Seven Components (labels on the slide include Transformations, Legend, and Axes). ggplot2 is built using the grammar of graphics approach.

  13. Hadley Wickham and ggplot2: Asst. Professor of Statistics at Rice University. Packages: ggplot2, plyr, reshape, rggobi, profr. http://ggplot2.org/

  14. ggplot2: In ggplot2 a plot is made up of layers.
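The layering idea can be sketched by building one plot step by step. The example assumes the ggplot2 package is installed (version 3.3.0 or later for `after_stat()`) and uses simulated heights rather than Galton's data; the object names are illustrative.

```r
library(ggplot2)

# Simulated heights (NOT Galton's data), for illustration only.
set.seed(2)
toy <- data.frame(childHeight = rnorm(200, mean = 67, sd = 3.5))

# Start with data and an aesthetic mapping; no layers yet.
p <- ggplot(toy, aes(x = childHeight))

# Layer 1: histogram drawn on the density scale.
p1 <- p + geom_histogram(aes(y = after_stat(density)),
                         binwidth = 1, fill = "grey80", colour = "white")

# Layer 2: kernel density curve on top of the histogram.
p2 <- p1 + geom_density(colour = "red")

# Labels are added with `+` as well, but they are not a layer.
p3 <- p2 + labs(x = "Height", y = "Density")
print(p3)

# Number of layers in the finished plot (the two geoms).
n_layers <- length(p3$layers)
print(n_layers)
```

Each `+` returns a new plot object, so intermediate versions such as `p1` can be printed on their own.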

  15. ggplot2
      library(ggplot2)
      h.gg <- ggplot(h, aes(childHeight))
      h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
      h.gg + geom_density()

  16. ggplot2
      h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
      h.gg + geom_density() + labs(x = "Height", y = "Density")
      h.gg + geom_density(aes(fill = factor(gender)), size = 2)

  17. Box Plot

  18. Children’s Height vs. Gender
      boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")

  19. Descriptive Stats: Box Plot
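The descriptive statistics behind a box plot can be computed directly. This sketch uses simulated heights (not Galton's data) and base R only; it compares `fivenum()` per group with the values that `boxplot()` returns invisibly.

```r
# Simulated heights by gender (NOT Galton's data).
set.seed(3)
toy <- data.frame(
  gender      = factor(rep(c("female", "male"), each = 50)),
  childHeight = c(rnorm(50, mean = 64, sd = 2.5),
                  rnorm(50, mean = 69, sd = 2.5))
)

# Five-number summary per group:
# minimum, lower hinge, median, upper hinge, maximum.
five <- tapply(toy$childHeight, toy$gender, fivenum)
print(five)

# boxplot() draws the plot and invisibly returns what it drew.
# bp$stats has one column per group; rows are lower whisker,
# lower hinge, median, upper hinge, upper whisker.
bp <- boxplot(childHeight ~ gender, data = toy,
              col = c("pink", "lightblue"),
              xlab = "Gender", ylab = "Height")
print(bp$stats)
```

The hinges and median match `fivenum()`; the whisker ends can differ from the minimum and maximum when a group has outliers.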

  20. Subset Males
      men <- subset(h, gender == 'male')

  21. Subset Females
      women <- subset(h, gender == 'female')

  22. Children’s Height: Males
      hist(men$childHeight)

  23. Children’s Height: Females
      hist(women$childHeight)

  24. ggplot2
      library(ggplot2)
      h.bb <- ggplot(h, aes(factor(gender), childHeight))
      h.bb + geom_boxplot()
      h.bb + geom_boxplot(aes(fill = factor(gender)))

  25. Label   Variable         Type          Steps
      Y       Child’s Height   Continuous    Histogram
      X1      Dad’s Height     Continuous    Scatter
      X2      Mom’s Height     Continuous    Scatter
      X3      Gender           Categorical   Boxplot

  26. Correlation

  27. Correlation
      ?cor
      cor(h$father, h$childHeight)
      [1] 0.2660385

  28. Scatterplot Matrix: pairs()
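A minimal sketch of `pairs()` on a data frame of numeric columns, with the matching correlation matrix from `cor()`. The data are simulated, and the column names and coefficients are assumptions for illustration, not Galton's measurements.

```r
# Simulated parent and child heights (NOT Galton's data).
set.seed(4)
n <- 100
father      <- rnorm(n, mean = 69, sd = 2.5)
mother      <- rnorm(n, mean = 64, sd = 2.3)
childHeight <- 0.4 * father + 0.3 * mother + rnorm(n, mean = 20, sd = 2)
toy <- data.frame(father, mother, childHeight)

# One scatterplot for every pair of columns.
pairs(toy, main = "Scatterplot matrix (simulated heights)")

# The correlation matrix summarises the same pairs numerically.
r <- cor(toy)
print(round(r, 2))
```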

  29. Correlation Matrix
      library(car)
      scatterplotMatrix(heights)

  30. ggplot2

  31. Analytics & History: 1st Regression Line The first “Regression Line”
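A regression line of child height on a parent's height can be sketched with `lm()` and `abline()`. The data below are simulated (not Galton's), and the intercept, slope, and noise level used to generate them are arbitrary assumptions. The last lines check the link to the correlation slide: for a single predictor, the fitted slope equals r times sd(y) / sd(x).

```r
# Simulated father and child heights (NOT Galton's data).
set.seed(5)
n <- 200
father      <- rnorm(n, mean = 69, sd = 2.5)
childHeight <- 40 + 0.4 * father + rnorm(n, mean = 0, sd = 2.5)
toy <- data.frame(father, childHeight)

# Fit a simple linear regression: child height on father's height.
fit <- lm(childHeight ~ father, data = toy)
print(coef(fit))

# Scatterplot with the fitted regression line drawn on top.
plot(childHeight ~ father, data = toy,
     xlab = "Father's height", ylab = "Child's height")
abline(fit, col = "red", lwd = 2)

# For one predictor, slope = correlation * sd(y) / sd(x).
slope_from_r <- cor(toy$father, toy$childHeight) *
  sd(toy$childHeight) / sd(toy$father)
print(slope_from_r)
```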

  32. Variable         Type          Steps
      Child’s Height   Continuous    Histogram
      Dad’s Height     Continuous    Scatter
      Mom’s Height     Continuous    Scatter
      Gender           Categorical   Boxplot

  33. Appendix

  34. What software do you use for creating charts or data visualizations? (May 2013, N=172) Responses include: BI Tools, Spotfire, Cognos, MicroStrategy, .NET, BIRT, Cytoscape, Flot, Gephi, gnuplot, Graphite, iDashboards, Incanter, Java, JMP, Protobi, Silverlight, Splunk, SSRS, Talend, WebGL, Wijmo, WPF, Xcelsius, XLMiner, LogiXML, MDX, Mondrian, Octave, OpenLayers, OpenViz, PHP, PowerPoint, Precog, Prezi, Processing; JavaScript: Raphael, Highcharts, Arbor; JFreeChart.

  35. Visualization and Reporting (chart axes: Steep Learning Curve vs. Easy to Use; Standard vs. Interactive Visualizations)

  36. BI Software: Tableau http://public.tableausoftware.com/views/PapelbonPitchFX/PapelbonPitchFX

  37. http://rcharts.io/gallery/

  38. https://plot.ly/r/

  39. http://shiny.rstudio.com/gallery/movie-explorer.html

  40. The next data visual was produced with about 150 lines of R code

  41. Data Viz Tutorials
