Workshop


  1. Workshop • Sarah Pendergrass, PhD, MS • Research Associate, Center for Systems Genomics

  2. Outline • ggplot2 • Cytoscape • PhenoGram

  3. ggplot2 • Plotting system for R • Flexible, accessible visualization of data • You must have R installed • You must have ggplot2 installed (once per computer): • install.packages("ggplot2") • Load it in every new R session: • library(ggplot2)
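A minimal sketch that installs ggplot2 only if it is missing and then loads it. The CRAN mirror URL is an assumption; any CRAN mirror works.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  # Assumed mirror: the CRAN cloud mirror (any CRAN mirror works)
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package for this session (needed in every new R session)
library(ggplot2)

# Show the installed version, useful when following older examples
packageVersion("ggplot2")
```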

  4. ggplot2 • Developed by Hadley Wickham • Based on the grammar of graphics: a formal, structured way of describing data graphics • Data properties: typically numerical or categorical values • Visual properties: x and y positions of points, colors of lines, heights of bars • Once you have written your code, you can reuse it again and again • Benefits compared with other R plotting packages: • The structure of the data can stay the same while you make very different types of plots • A standard format for generating plots
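To illustrate how the data structure can stay the same across plot types, here is a small sketch using a hypothetical two-column data frame (the column names group and value are made up). The data and mapping are written once and reused with two different geoms.

```r
library(ggplot2)

# Hypothetical toy data: one grouping column and one numeric column
dat <- data.frame(
  group = rep(c("A", "B"), each = 5),
  value = c(2.1, 2.5, 3.0, 2.8, 2.2, 3.9, 4.1, 3.5, 4.4, 4.0)
)

# The data and the aesthetic mapping are written once...
base <- ggplot(dat, aes(x = group, y = value))

# ...and reused with different geoms to get different plot types
p_points <- base + geom_point()    # one point per row
p_box    <- base + geom_boxplot()  # one box per group

# Type p_points or p_box at the console to draw them
```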

  5. ggplot2 • Plotting system for R • Flexible, accessible visualization of data • We will walk through some examples; great references: • “R Graphics Cookbook” • “ggplot2” by Hadley Wickham • Many more examples exist online • When you have a new data set to plot, an image search is a good way to find ideas and starter code to modify

  6. ggplot2 Input/Output • A little about file input • Input: a data frame • data <- read.table("datafile.txt", header=TRUE) # whitespace-separated by default • data <- read.csv("datafile.txt", header=FALSE) # comma-separated • There are also commands for importing Excel spreadsheets • Windows: • windows() will open a new figure window • Mac: • quartz() will open a new figure window
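A self-contained sketch of the header argument. It writes a tiny hypothetical tab-delimited file to a temporary location (standing in for datafile.txt) and reads it back both ways.

```r
# Write a small hypothetical tab-delimited file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("SampleID\tBMI", "S1\t22.5", "S2\t27.1", "S3\t31.0"), path)

# header = TRUE: the first line supplies the column names
dat <- read.table(path, header = TRUE, sep = "\t")

# header = FALSE: the first line is read as data and columns are named V1, V2
raw <- read.table(path, header = FALSE, sep = "\t")

str(dat)   # 3 rows with columns SampleID and BMI
```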

  7. ggplot2 Input/Output • A little about file output • Raster/bitmap • Array of pixels • Can go high resolution (e.g., 600 dpi) • Vector • Infinitely “zoomable” • Can be modified more easily in Adobe Illustrator • Slow to render with thousands of graphical objects • ggsave(filename = "output.pdf") saves the last plot • Or • pdf(file = "output.pdf") • Then the ggplot command (wrap it in print() when running a script) • dev.off() • Raster devices: png(), jpeg(), bitmap(), tiff() • Vector devices: pdf(), postscript(), svg()
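Both output routes from the slide, sketched with a hypothetical toy plot and written to R's temporary directory. The 6 by 4 inch size is an arbitrary choice.

```r
library(ggplot2)

dat <- data.frame(x = 1:10, y = (1:10)^2)   # hypothetical toy data
p <- ggplot(dat, aes(x = x, y = y)) + geom_point()

# Option 1: ggsave(); the file extension selects the device (here, PDF)
f1 <- file.path(tempdir(), "output.pdf")
ggsave(filename = f1, plot = p, width = 6, height = 4)   # size in inches

# Option 2: open a device, draw the plot, then close the device
f2 <- file.path(tempdir(), "output2.pdf")
pdf(file = f2, width = 6, height = 4)
print(p)             # explicit print() is required inside scripts and functions
invisible(dev.off())
```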

  8. ggplot2 Input/Output • A little about file output • Illustrator: svg • LaTeX: ps • MS Office: png (600 dpi) • OpenOffice: png (600 dpi) • pdfLaTeX: pdf, png (600 dpi) • Web: png (72 dpi) • Or whatever your journal specifies! • Specify the resolution with the dpi argument: • ggsave(filename = "output.png", dpi = 600)
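For raster formats, the pixel dimensions follow directly from the dpi: pixels = inches × dpi. The helper raster_pixels() below is hypothetical, written only to show this arithmetic for an assumed 6 by 4 inch figure.

```r
# Hypothetical helper: pixel dimensions of a raster image
# pixels = size in inches x dots (pixels) per inch
raster_pixels <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

raster_pixels(6, 4, dpi = 600)   # print quality: width 3600, height 2400
raster_pixels(6, 4, dpi = 72)    # web quality:   width 432,  height 288
```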

  9. ggplot2 vocabulary • Data: what we want to visualize • Consists of variables stored in a data frame • Data frame: R's primary structure for tabular data; each column holds one variable (columns can have different types) and each row holds one observation • Geoms: geometric objects drawn to represent the data • Aesthetics (aes): visual properties of geoms, such as x position, y position, line color, and point shape • Mappings: links from data values to aesthetics • Scales: control the mapping from data space to aesthetic space • Guides: show the viewer how to map visual properties back to data space: axis tick marks and labels, legends, etc.
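The vocabulary above maps onto the parts of a single ggplot2 call. In this sketch (hypothetical height/weight data), each comment names the term that line corresponds to.

```r
library(ggplot2)

# Data: a data frame with one variable per column (hypothetical values)
dat <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 72, 85),
  sex    = c("F", "F", "M", "M")
)

p <- ggplot(dat,                                         # data
            aes(x = height, y = weight, color = sex)) +  # mappings: columns -> aesthetics
  geom_point(size = 3) +                                 # geom: draw points
  scale_color_manual(values = c(F = "darkorange", M = "steelblue")) +  # scale: values -> colors
  labs(x = "Height (cm)", y = "Weight (kg)", color = "Sex")            # guides: axis and legend titles
```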

  10. ggplot2 basics • Data must be stored in a data frame • Each variable mapped to an aesthetic (e.g., your x and y variables) must be stored in its own column • Basic ggplot2 specification: • ggplot(dat, aes(x=xval, y=yval)) • x=xval maps the column xval to the x position • y=yval maps the column yval to the y position • Now you need to add geometric objects… let's go make a first plot!
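A minimal sketch of the basic specification with a hypothetical data frame that uses the slide's column names xval and yval. The ggplot() call alone only sets up the plot; the data appear once a geom is added.

```r
library(ggplot2)

# Hypothetical data frame with the column names used on the slide
dat <- data.frame(xval = c(1, 2, 3, 4), yval = c(2, 4, 3, 5))

# Data + mapping only: this sets up the axes but draws no data yet
p0 <- ggplot(dat, aes(x = xval, y = yval))

# Adding a geom layer draws one point per row of the data frame
p1 <- p0 + geom_point()
```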

  11. ggplot2 Example 1: Scatter Plot • Load example data frame 1 (a tab-delimited file with no header row): • dat <- read.table("datafile.txt", header=FALSE, sep="\t") • Name the columns: • names(dat) <- c("SampleID","PC1","PC2","Ancestry","Site","Platform","Sex","BMI") • Type dat to check your data frame • And remember: in the R console, the up and down arrow keys recall previous commands!
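The real datafile.txt is not included here, so this sketch writes a three-row hypothetical stand-in with the same eight columns and no header row, then reads and names it as on the slide.

```r
# Hypothetical stand-in for "datafile.txt": tab-delimited, no header row
path <- tempfile(fileext = ".txt")
writeLines(c(
  "S01\t0.12\t-0.40\tEUR\tSiteA\tChip1\tFemale\t24.3",
  "S02\t-0.35\t0.22\tAFR\tSiteB\tChip2\tMale\t28.9",
  "S03\t0.08\t0.51\tEUR\tSiteA\tChip2\tMale\t31.2"
), path)

# header = FALSE because the file has no column-name row
dat <- read.table(path, header = FALSE, sep = "\t")

# Assign names in the same order as the columns appear in the file
names(dat) <- c("SampleID", "PC1", "PC2", "Ancestry", "Site", "Platform", "Sex", "BMI")

head(dat)   # same check as typing dat at the prompt
```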

  12. ggplot2 Example 1: Scatter Plot • Type dat to check your data frame

  13. ggplot2 Example 1: Scatter Plot • ggplot2 example 1 • ggplot(dat, aes(x=PC1, y=PC2)) • Specifies the data (our data frame) and maps the PC1 values to the x position and the PC2 values to the y position • But we still need to add geometric objects such as points: • Command: • ggplot(dat,aes(x=PC1,y=PC2))+geom_point() • We can color the points by group by specifying an aesthetic for that particular geom: • Command: • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(aes(color=Ancestry)) • And if you want to save it: • ggsave(filename="output1.pdf")

  14. ggplot2 Example 1: Scatter Plot

  15. ggplot2 Example 1: Scatter Plot • ggplot2 example 1 • Command: • ggplot(dat,aes(x=PC1, y=PC2))+geom_point() • We can color the points by group by specifying an aesthetic for that particular geom; last_plot() adds a new, colored point layer to the previous plot: • last_plot()+geom_point(aes(color=Ancestry))
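A sketch of the two ways of adding color shown on slides 13 and 15, using hypothetical random PC scores. Mapping color inside geom_point() colors just that layer, while last_plot() + geom_point(...) stacks a second, colored point layer on top of the first.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the workshop data (PC scores and ancestry labels)
dat <- data.frame(
  PC1 = rnorm(30),
  PC2 = rnorm(30),
  Ancestry = rep(c("AFR", "EUR", "EAS"), each = 10)
)

# Color mapped inside geom_point(): applies only to this layer
p <- ggplot(dat, aes(x = PC1, y = PC2)) + geom_point(aes(color = Ancestry))

# last_plot() returns the most recently created plot (p_base here);
# adding geom_point() to it creates a second point layer
p_base <- ggplot(dat, aes(x = PC1, y = PC2)) + geom_point()
p_two  <- last_plot() + geom_point(aes(color = Ancestry))
```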

  16. ggplot2 Example 1: Scatter Plot

  17. ggplot2 Example 1: Scatter Plot • ggplot2 example 1 • How about changing the axes? • Command: • ggplot(dat,aes(x=PC1,y=PC2))+geom_point() • Modify the scale: • ggplot(dat,aes(x=PC1,y=PC2))+geom_point() + scale_x_continuous(limits=c(0,8)) • Note: points outside the scale limits are removed; coord_cartesian(xlim=c(0,8)) zooms in without dropping data
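Scale limits and coordinate limits behave differently, which matters when you change the axes. This sketch uses four hypothetical points, two of which fall outside the range 0 to 8.

```r
library(ggplot2)

# Hypothetical points; two have PC1 values outside the range 0 to 8
dat <- data.frame(PC1 = c(-1, 2, 5, 9), PC2 = c(1, 2, 3, 4))

# Scale limits: out-of-range values become NA and are not drawn
# (ggplot2 warns that rows were removed when the plot is drawn)
p_limits <- ggplot(dat, aes(x = PC1, y = PC2)) + geom_point() +
  scale_x_continuous(limits = c(0, 8))

# Coordinate limits: zoom the view without dropping any data
p_zoom <- ggplot(dat, aes(x = PC1, y = PC2)) + geom_point() +
  coord_cartesian(xlim = c(0, 8))
```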

  18. ggplot2 Example 1: Scatter Plot • ggplot2 example 1 • Change points • ggplot(dat,aes(x=PC1,y=PC2,color=Ancestry)) + geom_point(shape=1)+scale_colour_hue(l=50) # Open circles and a slightly darker palette than the default • Add a regression line • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(shape=1) + geom_smooth(method=lm,se=FALSE) # Add a linear regression line without the shaded confidence region • ggplot(dat,aes(x=PC1,y=PC2,color=Ancestry)) + geom_point(shape=1) + scale_colour_hue(l=50) + geom_smooth(method=lm,se=FALSE) # One regression line per Ancestry group
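A sketch with hypothetical data in which the PC1/PC2 relationship differs by group. Because color is mapped in the global aes(), geom_smooth() fits a separate linear model for each Ancestry group. The formula = y ~ x argument only silences ggplot2's message about the default formula.

```r
library(ggplot2)

set.seed(2)
# Hypothetical data: PC2 depends on PC1 with a different slope per group
dat <- data.frame(PC1 = rep(seq(0, 8, length.out = 20), 2),
                  Ancestry = rep(c("AFR", "EUR"), each = 20))
dat$PC2 <- ifelse(dat$Ancestry == "AFR", 0.5, -0.3) * dat$PC1 + rnorm(40, sd = 0.3)

# color is mapped globally, so geom_smooth() draws one line per Ancestry group
p <- ggplot(dat, aes(x = PC1, y = PC2, color = Ancestry)) +
  geom_point(shape = 1) +                                 # shape 1 = hollow circle
  scale_colour_hue(l = 50) +                              # darker hues (default l = 65)
  geom_smooth(method = lm, formula = y ~ x, se = FALSE)   # straight lines, no confidence band
```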

  19. ggplot2 Example 1: Scatter Plot

  20. ggplot2 Example 1: Scatter Plot • Set shape based on a variable: • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry)) + geom_point() • Set shape and color based on separate variables: • ggplot(dat,aes(x=PC1, y=PC2, color=Platform,shape=Ancestry)) + geom_point() • Same, but use hollow circles and triangles (one shape value per Ancestry group): • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry, color=Platform)) + geom_point() + scale_shape_manual(values=c(1,2))
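A sketch of mapping shape and color to different variables, using hypothetical data with exactly two Ancestry groups (scale_shape_manual() needs at least one value per level).

```r
library(ggplot2)

# Hypothetical data with two ancestry groups and two genotyping platforms
dat <- data.frame(
  PC1 = c(0.1, 0.4, 0.8, 1.2, 1.5, 1.9),
  PC2 = c(1.0, 0.7, 0.9, 0.2, 0.5, 0.1),
  Ancestry = c("AFR", "AFR", "AFR", "EUR", "EUR", "EUR"),
  Platform = c("Chip1", "Chip2", "Chip1", "Chip2", "Chip1", "Chip2")
)

# color and shape are mapped to different variables; each gets its own legend
p <- ggplot(dat, aes(x = PC1, y = PC2, shape = Ancestry, color = Platform)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(1, 2))   # 1 = hollow circle, 2 = hollow triangle
```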

  21. ggplot2 Example 1: Scatter Plot

  22. ggplot2 Example 1: Scatter Plot • Same, but with larger points and larger text: • theme_set(theme_gray(base_size = 15)) # changes the default theme for all plots drawn afterwards • ggplot(dat,aes(x=PC1,y=PC2,shape=Ancestry, color=Platform)) + geom_point(size=8)
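theme_set() changes a global default, which is easy to forget. This sketch, with hypothetical data, shows the global approach with a restore step and the per-plot alternative. The global theme is applied when a plot is drawn, not when it is created.

```r
library(ggplot2)

dat <- data.frame(PC1 = c(0.1, 0.8, 1.5), PC2 = c(1.0, 0.4, 0.2),
                  Ancestry = c("AFR", "EUR", "EUR"))   # hypothetical data

# theme_set() changes the default theme for every plot drawn from now on
# and returns the previous theme, so it can be restored later
old_theme <- theme_set(theme_gray(base_size = 15))
p_big <- ggplot(dat, aes(x = PC1, y = PC2, shape = Ancestry)) + geom_point(size = 8)
print(p_big)            # drawn with base_size = 15
theme_set(old_theme)    # later plots (and reprints of p_big) use the old default

# Alternative: enlarge the text of one plot only by adding a theme to it
p_one <- ggplot(dat, aes(x = PC1, y = PC2, shape = Ancestry)) +
  geom_point(size = 8) +
  theme_gray(base_size = 15)
```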

  23. ggplot2 Example 2: Histograms • Histogram: • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=.5, colour="black", fill="white") • Histogram with a dashed line at the mean: • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=.5, colour="black", fill="white") + geom_vline(aes(xintercept=mean(BMI, na.rm=TRUE)), color="red", linetype="dashed", linewidth=1) • Tip: use binwidth to adjust the bin size (wider bins hold more observations each) • ggplot(dat, aes(x=BMI)) + geom_histogram(binwidth=5, colour="black", fill="white") + geom_vline(aes(xintercept=mean(BMI, na.rm=TRUE)), color="red", linetype="dashed", linewidth=1) • In ggplot2 versions before 3.4.0, use size=1 instead of linewidth=1
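A sketch of the histogram with a mean line, using simulated (hypothetical) BMI values that include one missing value. Here the mean is computed once and passed as a constant to geom_vline(); inside aes(), the single mean is repeated for every row of the data, so many identical lines are drawn on top of each other. linewidth requires ggplot2 3.4.0 or later.

```r
library(ggplot2)

set.seed(3)
# Hypothetical BMI values, including one missing value
dat <- data.frame(BMI = c(round(rnorm(99, mean = 27, sd = 4), 1), NA))

bmi_mean <- mean(dat$BMI, na.rm = TRUE)

p <- ggplot(dat, aes(x = BMI)) +
  # na.rm = TRUE drops the missing value without a warning
  geom_histogram(binwidth = 5, colour = "black", fill = "white", na.rm = TRUE) +
  # a constant outside aes() draws exactly one line
  geom_vline(xintercept = bmi_mean, color = "red", linetype = "dashed", linewidth = 1)
```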

  24. ggplot2 Example 2: Histogram and Density Graphs

  25. ggplot2 Example 4: Bar Graph • Making a bar graph (stat="identity" uses the BMI values as bar heights instead of counting rows): • ggplot(data=dat, aes(x=SampleID, y=BMI))+ geom_bar(stat="identity") • Colors: • ggplot(data=dat, aes(x=SampleID, y=BMI, fill=Ancestry))+ geom_bar(stat="identity")
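A sketch with hypothetical per-sample values. geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), so the two plots below are equivalent.

```r
library(ggplot2)

# Hypothetical per-sample BMI values
dat <- data.frame(SampleID = c("S1", "S2", "S3", "S4"),
                  BMI = c(22.5, 27.1, 31.0, 25.4),
                  Ancestry = c("AFR", "EUR", "AFR", "EUR"))

# stat = "identity": bar heights are the BMI values themselves, not counts
p_bar <- ggplot(dat, aes(x = SampleID, y = BMI, fill = Ancestry)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity")
p_col <- ggplot(dat, aes(x = SampleID, y = BMI, fill = Ancestry)) +
  geom_col()
```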

  26. ggplot2: Bar Graph • The area below the top of each bar is usually meaningless: it only represents the distance between the baseline and the plotted value • The same information can be shown with a single dot per value • This cuts down on visual clutter and can make the plot more meaningful • One way to show the values as points (though more is needed to show the trends): • ggplot(data=dat, aes(x=SampleID, y=BMI))+ geom_point()

  27. ggplot2: Bar Graph • A better way to show the trends of the points, although connecting them may suggest that neighboring samples are related just because they are adjacent: • ggplot(data=dat, aes(x=SampleID, y=BMI))+ geom_line()+geom_point()

  28. ggplot2: Cleveland Dot Plot • Another way to show the trends of the points better: the Cleveland dot plot • ggplot(data=dat, aes(x=SampleID, y=BMI))+geom_segment(aes(xend=SampleID),yend=0,color="grey") + geom_point()
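Cleveland dot plots are usually drawn with the categories on the y axis and sorted by value, which makes rankings easy to read. A sketch with hypothetical BMI values, using reorder() to sort the samples and a grey segment from zero to each dot.

```r
library(ggplot2)

# Hypothetical per-sample BMI values
dat <- data.frame(SampleID = c("S1", "S2", "S3", "S4", "S5"),
                  BMI = c(22.5, 31.0, 27.1, 19.8, 25.4))

# Samples on the y axis, ordered from lowest to highest BMI
p <- ggplot(dat, aes(x = BMI, y = reorder(SampleID, BMI))) +
  # x = 0 inside aes() so the x axis is trained to include zero
  geom_segment(aes(x = 0, xend = BMI, yend = reorder(SampleID, BMI)), color = "grey") +
  geom_point(size = 3) +
  labs(y = "SampleID")
```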

  29. ggplot2 Example 5: Creating Boxplots • When comparing the distributions of groups of data, boxplots are often a better choice than bar charts • Command: • ggplot(dat,aes(x=Sex,y=BMI)) + geom_boxplot()

  30. ggplot2 Example 5: Creating Boxplots • Adding color to the box plot: • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot()

  31. ggplot2 Example 5: Creating Boxplots • Add a summary such as the mean to the box plot (shown as a diamond): • ggplot(dat,aes(x=Sex,y=BMI)) + geom_boxplot()+ stat_summary(fun=mean, geom="point", shape=5, size=4) • In ggplot2 versions before 3.3.0, use fun.y=mean instead of fun=mean

  32. ggplot2 Example 5: Creating Boxplots • Adding the individual data points to the box plot: • ggplot(dat, aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot()+geom_point()

  33. ggplot2 Example 5: Creating Boxplots • What about adding a title? • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex)) + geom_boxplot()+geom_point()+ggtitle("BMI for each Sex")

  34. ggplot2 Example 5: Creating Boxplots • What about modifying the axis titles? • ggplot(dat, aes(x=Sex, y=BMI,fill=Sex)) + geom_boxplot() +geom_point()+ggtitle("BMI for each Sex")+xlab("Sex")+ylab("Body Mass Index")
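The boxplot steps above can be combined into one call. This sketch uses simulated (hypothetical) data. It hides the boxplot's own outlier points, since the jittered raw points already show them (otherwise outliers are drawn twice), and uses stat_summary(fun = ...), which requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(4)
# Hypothetical BMI values for two groups
dat <- data.frame(Sex = rep(c("Female", "Male"), each = 25),
                  BMI = c(rnorm(25, 26, 4), rnorm(25, 28, 4)))

p <- ggplot(dat, aes(x = Sex, y = BMI, fill = Sex)) +
  geom_boxplot(outlier.shape = NA) +          # hide outliers; the raw points show them
  geom_jitter(width = 0.15, height = 0) +     # spread points sideways, keep y unchanged
  stat_summary(fun = mean, geom = "point", shape = 5, size = 4) +  # mean as a diamond
  ggtitle("BMI for each Sex") +
  xlab("Sex") +
  ylab("Body Mass Index")
```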

  35. ggplot2 Example 6: Facets • Facets let you split your data by one or more variables and plot the subsets side by side: • ggplot(dat,aes(x=PC1,y=PC2)) + geom_point(aes(color=Ancestry))+facet_grid(Sex ~ .) # one row of panels per Sex
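A sketch with hypothetical data comparing facet_grid(), which arranges panels in rows and/or columns given by a formula, with facet_wrap(), which wraps a single sequence of panels.

```r
library(ggplot2)

set.seed(5)
# Hypothetical data: PC scores with ancestry and sex labels
dat <- data.frame(PC1 = rnorm(40), PC2 = rnorm(40),
                  Ancestry = rep(c("AFR", "EUR"), times = 20),
                  Sex = rep(c("Female", "Male"), each = 20))

# One row of panels per Sex (Sex ~ .); use . ~ Sex for columns instead
p_rows <- ggplot(dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = Ancestry)) +
  facet_grid(Sex ~ .)

# facet_wrap() lays panels out in a wrapped ribbon, handy for many levels
p_wrap <- ggplot(dat, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = Ancestry)) +
  facet_wrap(~ Sex)
```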

  36. ggplot2 A Note on Colors • In the examples, we mostly used the ggplot2 default colors • There are many options for customizing colors in ggplot2 • http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ • Example • With scale_fill_manual, you can use hexadecimal color codes (you can get these from ColorBrewer, http://colorbrewer2.org/) • ggplot(dat,aes(x=Sex,y=BMI,fill=Sex))+geom_boxplot() +scale_fill_manual(values=c("#CC6666","#9999CC"))
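A sketch with hypothetical data showing two ways to control fill colors. Naming the values in scale_fill_manual() ties each color to a specific level regardless of level order; scale_fill_brewer() uses ColorBrewer palettes directly by name (the Set2 palette is an arbitrary choice).

```r
library(ggplot2)

set.seed(6)
dat <- data.frame(Sex = rep(c("Female", "Male"), each = 10),
                  BMI = c(rnorm(10, 26, 3), rnorm(10, 28, 3)))   # hypothetical

# Named values tie each color to a specific level
p_manual <- ggplot(dat, aes(x = Sex, y = BMI, fill = Sex)) +
  geom_boxplot() +
  scale_fill_manual(values = c(Female = "#CC6666", Male = "#9999CC"))

# ColorBrewer palettes by name, without copying hex codes
p_brewer <- ggplot(dat, aes(x = Sex, y = BMI, fill = Sex)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2")
```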

  37. ggplot2 Other Notes • Not covered here, but there are many more options! • Background color • Grid line modification • Font choice • Other kinds of plots, such as heatmaps; the techniques here can also be used to make Manhattan plots or to color maps with data
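A minimal sketch of a Manhattan plot built only from techniques covered above (points, manual colors, a reference line, custom axis breaks). It assumes a hypothetical table of simulated p-values with columns chr, pos and pval. The dashed line at 5e-8 is the commonly used genome-wide significance threshold; adjust it to your study.

```r
library(ggplot2)

set.seed(42)
# Hypothetical association results: 3 chromosomes with 100 SNPs each
res <- data.frame(
  chr  = rep(1:3, each = 100),
  pos  = rep(seq(1e6, 1e8, length.out = 100), times = 3),
  pval = runif(300)
)

# Shift each chromosome by the total length of the ones before it,
# so all SNPs share one continuous x axis
chr_len <- tapply(res$pos, res$chr, max)
offset  <- c(0, cumsum(chr_len)[-length(chr_len)])
res$cum_pos <- res$pos + offset[res$chr]

# Axis labels at the middle of each chromosome
centers <- tapply(res$cum_pos, res$chr, function(x) mean(range(x)))

manhattan <- ggplot(res, aes(x = cum_pos, y = -log10(pval), color = factor(chr %% 2))) +
  geom_point(size = 1) +
  scale_color_manual(values = c("grey40", "steelblue"), guide = "none") +  # alternate colors
  scale_x_continuous(breaks = as.numeric(centers), labels = names(centers)) +
  geom_hline(yintercept = -log10(5e-8), linetype = "dashed") +
  labs(x = "Chromosome", y = "-log10(p)")
```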

  38. ggplot2 Other Notes Examples and code are EVERYWHERE!! This was just a Google Image search on “ggplot2”!

  39. Cytoscape • Introduction to Cytoscape

  40. PhenoGram • Chromosomal ideogram • Can add lines, shapes, and text • Can add cytogenetic banding patterns • Web version: http://visualization.ritchielab.psu.edu/phenograms/plot • Example files: http://visualization.ritchielab.psu.edu/phenograms/examples • Currently supports only human chromosomal information; mouse is being added soon, with other model organisms to follow

  41. PhenoGram • Chromosomal Ideogram • Can add lines, shapes • Can add cytogenetic banding patterns • Download: phenogram-groups-sample.txt • Go to: http://visualization.ritchielab.psu.edu/phenograms/plot

  42. PhenoGram

  43. PhenoGram • Chromosomal Ideogram • Can plot just lines • Download: phenogram-groups-poscolor.txt • Go to: http://visualization.ritchielab.psu.edu/phenograms/plot
