Data visualization and graphic design

Introducing R for data visualization

Allan Just and Andrew Rundle

EPIC Short Course

June 21, 2011

Wickham 2008


Intro to R

Objectives

After this class, participants will be able to:

  • Describe some capabilities and uses of R

  • Search for help within R and use good coding practices for reproducible research in R

  • Read in and summarize a simple dataset with R/JGR/Deducer

  • Make some standard plots with Deducer templates


What is R?

nytimes.com


R has many uses

  • Work with data: subset, merge, and transform datasets with a powerful syntax

  • Analysis: use existing statistical functions like regression or write your own

  • Graphics: graphs can be made quickly during analysis and polished for publication quality displays


Why learn a whole language to look at data versus Excel?

  • Recreate/redo your exact analysis

  • Automate repetitive tasks

  • Access to statistical methods not available in Excel

  • Graphs are more elegant


Why R versus SAS, SPSS, or Stata?

  • It's free!

  • It runs on Mac, Windows, and Linux

  • It has state-of-the-art graphics capabilities

  • It contains advanced statistical routines not yet available in other packages – a de facto standard in statistics

  • You can program new statistical methods or automate data manipulation and analysis

adapted from statmethods.net


Made in SAS

Redone in R

learnr.wordpress.com


R plots from my own research


Scatterplot matrix: bivariate densities and correlations


Forest plot to compare parameter estimates from many models


Displaying lots of data: faceted histograms


Plotting data with a model


Automated report generation


Choropleth map

Choropleth map

Shapefile: CIESIN, Columbia University. Asthma data: http://nyc.gov/html/doh/downloads/pdf/asthma/asthma-hospital.pdf


Intro to R: recap

Objectives

After this class, participants will be able to:

  • Describe some capabilities and uses of R

    Statistical data analysis

    Automation (scripting) of functions to work with data

    Elegant graphics to facilitate data visualization

  • Search for help within R and use good coding practices for reproducible research in R

  • Read in and summarize a simple dataset with R/JGR/Deducer

  • Make some standard plots with Deducer templates


Learning a new language is difficult

flickr.com/photos/dnorman/3732851541/


What makes R difficult to learn

R is designed to be flexible and powerful rather than simple but limited.

R is a fully featured language mainly used from the command line. Learning the commands and the structure of the code takes time and practice.

If I made a a typo you would know what I meant...


The solution:

  • Be careful.

  • Build code in simple pieces and test as you go (learn to debug).

  • Reuse code that works.

  • Use helpful resources.

  • Consider an alternative GUI for R.


Getting help in R

Call up the help page for a function by typing a question mark followed by the function name, leaving off the ():

?functionname

Search online

statmethods.net

An Introduction to R: on Windows, found under Help → Manuals (in PDF)
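The help commands above can be tried directly in the console. A minimal sketch using functions from R's built-in utils package; `median` is only an example topic, and `help.search()` may take a moment the first time it indexes the installed packages:

```r
# Three equivalent ways to open the help page for median()
?median
help("median")
help(median)

# Search the installed help pages for a phrase (the same as ??"linear model")
help.search("linear model")

# Run the examples from the bottom of a help page
example(median)
```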


Suggestions for an R workflow

Save the bits of your code that work in a text editor, building a script of clean code that runs from start to finish.

Keeping clean code, rather than saved copies of transformed data files, makes it easier to redo analyses if your data are updated or you want to change an earlier step

Leave yourself informative comments

# everything to the right of the pound sign

# is unevaluated

Using spaces and indents can help readability

Use meaningful names for objects

Reproducible research!
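A short script in the style suggested above (comments, spacing, meaningful names) might look like the following sketch. It uses R's built-in `airquality` dataset; the object name `ozone.data` and the Celsius conversion are illustrative choices, not part of the course script:

```r
# Daily air quality measurements, New York, May-September 1973 (built into R)

# Work on a copy so the original dataset stays untouched
ozone.data <- airquality

# Convert daily maximum temperature from Fahrenheit to Celsius
ozone.data$temp.c <- (ozone.data$Temp - 32) * 5 / 9

# Summarize the new variable
summary(ozone.data$temp.c)
```

Because every step is in the script, re-running it from the top reproduces the result, even if `airquality` were replaced with updated data.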


Intro to R: recap

Objectives

After this class, participants will be able to:

  • Describe some capabilities and uses of R

  • Search for help within R and use good coding practices for reproducible research in R

    ?t.test will bring up R help

    Free manuals online: An Introduction to R. Also: statmethods.net

    # use comments; save the code that works to reproduce your results

  • Read in and summarize a simple dataset with R/JGR/Deducer

  • Make some standard plots with Deducer templates


Learning the language: many important features

  • Arithmetic and logical operators: +, <, …

  • Data types: numeric, logical, …

  • Data structures: vectors, matrices, …

  • Functions – always end with (): median(x)
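A quick sketch of the data types, data structures, and function-call syntax listed above, using only base R:

```r
# Data types
class(3.5)      # "numeric"
class(TRUE)     # "logical"
class("text")   # "character"

# Data structures
v <- c(4, 8, 15)              # a vector: one type, one dimension
m <- matrix(1:6, nrow = 2)    # a matrix: one type, rows and columns
dim(m)                        # 2 3

# Functions are called with parentheses around their arguments
median(v)                     # 8
```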


Using R as a calculator

Mathematical operators

+ - / * ^

log()

abs()
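The operators and functions above can be tried as a quick sketch; note that log() computes the natural logarithm unless you supply a base:

```r
2 + 3 * 4             # 14: multiplication happens before addition
(2 + 3) * 4           # 20: parentheses change the order
7 / 2                 # 3.5
2^10                  # 1024
log(100)              # natural log, about 4.60517
log(100, base = 10)   # 2
log10(100)            # 2, a shortcut for base 10
abs(-7)               # 7
```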


R can evaluate logical expressions

== equal

!= not equal

& and

| or (vertical pipe)

10 < 20

[1] TRUE

pi > 3 & 2^2 == 4

[1] TRUE

"This" != "That"

[1] TRUE
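Comparisons also work element by element on a vector, returning one TRUE or FALSE per element, which becomes useful for subsetting. A minimal sketch (the `!` "not" operator is an addition beyond the slide):

```r
x <- c(10, 20, 30)
x > 15               # FALSE  TRUE  TRUE
x > 15 & x < 25      # FALSE  TRUE FALSE
x == 10 | x == 30    #  TRUE FALSE  TRUE
!(x > 15)            #  TRUE FALSE FALSE
```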


Creating new objects

Assignment operator is <- (looks like an arrow)

x <- 10

“Set x to take the value 10”

The symbols in this operator must be adjacent.

x < - 10 (What does this do?)

You can overwrite old values

x <- x^2

“Set x to take the value x²”
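To answer the question above: with a space between the symbols, R no longer sees the assignment arrow and instead reads `x < -10`, a comparison of x with negative ten. A small sketch:

```r
x <- 10
x < - 10    # parsed as x < -10: a comparison that returns FALSE; x is unchanged
x           # 10
x <- x^2    # overwrite x with its square
x           # 100
```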


Indexing and subsetting

Concatenate function is c()

x <- c(10, 20, 30)

x

[1] 10 20 30

Refer to components of objects by a position index, which goes between square brackets

x[2] returns the value in the second position of x

[1] 20

x[c(1, 2)] returns the first and second positions of x

[1] 10 20

x[-3] returns all except the third position of x

[1] 10 20

What would x[c(3, 2)] return?
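A sketch answering the question above, plus two related forms of indexing that go beyond the slide: a logical index and assignment into a position.

```r
x <- c(10, 20, 30)
x[c(3, 2)]    # 30 20: elements come back in the order requested
x[x > 15]     # 20 30: a logical vector also works as an index
x[2] <- 99    # indexing on the left of <- replaces that element
x             # 10 99 30
```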


Data frames

A data frame is a rectangular collection of data

Rows: observations

Columns: variables

diamonds <- data.frame(carat, cut, price)

  carat       cut price
1  0.23     Ideal   326
2  0.21   Premium   326
3  0.23      Good   327
4  0.29   Premium   334
5  0.31      Good   335
6  0.24 Very Good   336


Data frames

You can extract the variables as vectors with a $

diamonds$cut

You can also index by position (or name) with square brackets

diamonds[2, 3] returns the single value in row 2, column 3

An empty index is treated like a wildcard and corresponds to all rows or columns depending on position

diamonds[, "cut"] (same result as diamonds$cut)

How would you return the first three rows and all columns? (Hint: the index order is [row, column].)
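The `data.frame()` call above assumes the vectors `carat`, `cut`, and `price` already exist. Here is a self-contained sketch that builds the same six rows by hand (ggplot2 also ships a full `diamonds` dataset) and answers the indexing question:

```r
# Rebuild the six example rows from the slide
carat <- c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24)
cut   <- c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good")
price <- c(326, 326, 327, 334, 335, 336)
diamonds <- data.frame(carat, cut, price)

diamonds$cut        # the cut column as a vector
diamonds[2, 3]      # row 2, column 3: 326
diamonds[1:3, ]     # first three rows, all columns (empty column index)
diamonds[, "cut"]   # same result as diamonds$cut
```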


R functions

Thousands of functions are built in:

median(), lm() (linear model), t.test(), chisq.test()

or make your own:

inch.to.cm <- function(x){x * 2.54}

inch.to.cm(74)

[1] 187.96
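Because `*` works element by element, inch.to.cm() also accepts a whole vector. User-defined functions can call built-in ones too; `summarize.heights()` below is a hypothetical helper written for this sketch, not part of R:

```r
inch.to.cm <- function(x) {
  x * 2.54
}
inch.to.cm(c(60, 74))   # 152.40 187.96

# Hypothetical helper: convert inches to cm, then return the median and mean
summarize.heights <- function(inches) {
  cm <- inch.to.cm(inches)
  c(median = median(cm), mean = mean(cm))
}
summarize.heights(c(60, 66, 74))
```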


Missing values

These take a value of NA

Can be in a data object of any type (logical, numeric, character)

By default, operations on NA return NA

NA == NA

[1] NA

Can check for NA with is.na()

y <- c(2, 10, NA, 12)

is.na(y)

[1] FALSE FALSE TRUE FALSE

You can often pass the na.rm = TRUE option to remove NA values before an operation

mean(y)

[1] NA

mean(y, na.rm = TRUE)

[1] 8
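A few more ways to work with NA, sketched with the same vector y:

```r
y <- c(2, 10, NA, 12)
sum(is.na(y))          # 1: count the missing values
y[!is.na(y)]           # 2 10 12: keep only the non-missing values
y == NA                # NA NA NA NA: never use == to test for NA
sum(y, na.rm = TRUE)   # 24
```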


R has several thousand additional packages

time series

survival

spatial

machine learning

bioinformatics

Interfaces to Excel, SQL databases, Twitter, Google Maps…


Installing a package

  • Open up R

  • Click into the console window and type:

    install.packages()

  • Select a mirror (anywhere in the US)

  • Find and select "Deducer" and choose OK.

  • This will download Deducer and the other packages which it requires, including ggplot2.


The default R graphical user interface (Windows)


JGR


Deducer


Recap on GUIs

R

Default Windows GUI: lacks additional features to make learning or programming easier

JGR: Makes programming easier with syntax highlighting and command argument suggestions. No menus for stats. Looks the same across platforms (Java-based)

Deducer: Adds menus for basic stats to JGR. Menu driven graphics options (building with ggplot2).


R graphics – 3 main "dialects"

Base: with(airquality, plot(Temp, Ozone))

Lattice: xyplot(Ozone ~ Temp, airquality)

ggplot2:

ggplot(airquality, aes(Temp, Ozone)) + geom_point()
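The three one-liners above can be run side by side. A sketch, assuming the lattice package (distributed with R as a recommended package) and ggplot2 are installed; one difference worth noticing is that base graphics draws immediately, whereas lattice and ggplot2 return plot objects that are drawn when printed:

```r
library(lattice)
library(ggplot2)

# Base graphics: draws directly on the current graphics device
with(airquality, plot(Temp, Ozone))

# Lattice: returns a "trellis" object; printing it draws the plot
lat <- xyplot(Ozone ~ Temp, airquality)
print(lat)

# ggplot2: returns a "ggplot" object; printing it draws the plot
gg <- ggplot(airquality, aes(Temp, Ozone)) + geom_point()
print(gg)
```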


Google image search: ggplot2


ggplot2 philosophy

Written by Hadley Wickham (Rice Univ.)

Extends The Grammar of Graphics (Wilkinson, 2005)

All graphs can be constructed by combining specifications with data (Wilkinson, 2005).

A specification is a structured way to describe how to build the graph from geometric objects (points, lines, etc.) projected onto scales (x, y, color, size, etc.)


ggplot2 philosophy

When you can describe the content of the graph with the grammar, you don’t need to know the name of a particular type of plot…

Dot plots, forest plots, and Manhattan plots are just special cases of this formal grammar.

…a plotting system with good defaults for a large set of components that can be combined in flexible and creative ways…


Building a plot in ggplot2

data to visualize (a data frame)

map variables to aesthetic attributes

geometric objects – what you see (points, bars, etc.)

scales map values from data to aesthetic space

faceting subsets the data to show multiple plots

statistical transformations – summarize data

coordinate systems put data on plane of graphic

Wickham 2009
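The components listed above can all appear in a single ggplot2 call. A sketch, assuming ggplot2 is installed and using R's built-in airquality data; the particular scale, statistic, and facet choices here are illustrative, not taken from the slides:

```r
library(ggplot2)

p <- ggplot(airquality,                   # data: a data frame
            aes(x = Temp, y = Ozone)) +   # aesthetics: map variables to x and y
  geom_point() +                          # geometric object: points
  stat_smooth(method = "lm") +            # statistical transformation: fitted line
  scale_y_log10() +                       # scale: log10 y axis
  facet_wrap(~ Month) +                   # faceting: one panel per month
  coord_cartesian()                       # coordinate system: the default x-y plane
print(p)
```

Each `+` adds one component to the specification, so any line can be removed or swapped (for example, `geom_line()` for `geom_point()`) without rewriting the rest.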


A basic ggplot2 graph

ggplot(airquality) + geom_point(aes(x = Temp, y = Ozone))

Aesthetics map variables to scales

Data

Geometric objects to display


A ggplot2 graph is an R object

p <- ggplot(airquality) + geom_point(aes(x = Temp, y = Ozone))

str(p) #structure of p

List of 8

$ data :'data.frame':153 obs. of 6 variables:

..$ Ozone : int [1:153] 41 36 12 18 NA 28 23 19 8 NA ...

..$ Solar.R: int [1:153] 190 118 149 313 NA NA 299 99 19 194 ...

..$ Wind : num [1:153] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...

..$ Temp : int [1:153] 67 72 74 62 56 66 65 59 61 69 ...

..$ Month : int [1:153] 5 5 5 5 5 5 5 5 5 5 ...

..$ Day : int [1:153] 1 2 3 4 5 6 7 8 9 10 ...

$ layers :List of 1

..$ :proto object

.. .. $ mapping :List of 2

.. .. ..$ x: symbol Temp

.. .. ..$ y: symbol Ozone

.. .. $ geom_params:List of 1

.. .. ..$ na.rm: logi FALSE

...

$ plot_env :<environment: R_GlobalEnv>

- attr(*, "class")= chr "ggplot"

Note that the plot object stores its own copy of the data

So if you update the data, re-run the call to ggplot()

shortened substantially
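A sketch of why this matters, assuming ggplot2 is installed: after the plot is created, changing the data frame does not change the plot. `layer_data()` (a ggplot2 function) returns the data for the plot's first layer as built from the stored copy; the object names `aq`, `p.before`, and `p.after` are illustrative:

```r
library(ggplot2)

# Build a plot from a copy of the built-in airquality data (153 rows)
aq <- airquality
p.before <- ggplot(aq) + geom_point(aes(x = Temp, y = Ozone))

# Change the data after the plot object was created: keep only May (31 rows)
aq <- subset(aq, Month == 5)

# The existing plot object still holds the original 153-row data frame
nrow(layer_data(p.before))

# To plot the updated data, re-run the call to ggplot()
p.after <- ggplot(aq) + geom_point(aes(x = Temp, y = Ozone))
nrow(layer_data(p.after))
```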


Help with learning ggplot2

Website:

had.co.nz/ggplot2/

Thousands of examples!

Book:

ggplot2: Elegant Graphics for Data Analysis

Hadley Wickham, 2009

Graphic User Interface:

Deducer (R package)

Ian Fellows


Intro to R: recap

Objectives

After this workshop, participants will be able to:

  • Describe some capabilities and uses of R

  • Search for help within R and use good coding practices for reproducible research in R

  • Read in and summarize a simple dataset with R/JGR/Deducer

    Together, let’s explore some data from the WHO - Global School Health Survey.

    I will also give you a script containing code which you can run, modify, and take home!

  • Make some standard plots with Deducer templates


Open JGR


Load the Deducer package


Note additional menus


Intro to R: recap

Objectives

After this workshop, participants will be able to:

  • Describe some capabilities and uses of R

  • Search for help within R and use good coding practices for reproducible research in R

  • Read in and summarize a simple dataset with R/JGR/Deducer

  • Make some standard plots with Deducer templates

    Using the gshs data frame, let's make some plots together using templates in:

    Deducer → Plots → Plot Builder


Since R, JGR, and Deducer are free,

you should install them at home or

work and play with them!


Installing R, JGR, Deducer. Part I: R on Windows (shown), Mac, or Linux

R is available from a set of mirrors known as the Comprehensive R Archive Network (CRAN)

http://cran.r-project.org/

Closest mirror and link for Windows:

http://software.rc.fas.harvard.edu/mirrors/R/bin/windows/base/

Uses a Windows installer – default options are fine


Installing R, JGR, Deducer. Part II: JGR on Windows (shown), Mac, or Linux

JGR requires a Java Development Kit (JDK)

You probably don't have this*

Available free at:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

*if you did have a JDK (and not just a JRE)

you would have a folder named something like …

C:\Program Files\Java\jdk1.6.0_20\


Installing R, JGR, Deducer. Part II: JGR on Windows (shown), Mac, or Linux

JGR requires a launcher file on Windows:

http://www.rforge.net/JGR/web-files/jgr-1_62.exe

Leave this as your desktop shortcut


Installing R, JGR, Deducer. Part III: Installing Deducer

Deducer is an R package

From within JGR

To install packages: Packages & Data -> Package Installer

To load packages: Packages & Data -> Package Manager


A few helpful R links

Download R: http://cran.r-project.org/ (available for Windows, Mac OS X, and Linux)

Advice – A clearly stated question with a reproducible example is far more likely to get help. You will often find your own solution by restating where you are getting stuck in a clear and concise way.

Writing reproducible examples: https://gist.github.com/270442

General R links

http://statmethods.net/ Quick-R for SAS/SPSS/Stata Users - An all around excellent reference site

http://www.ats.ucla.edu/stat/R/ Resources for learning R from UCLA with lots of examples

http://www.r-bloggers.com/learning-r-for-researchers-in-psychology/ This is a nice listing of R resources

http://stackoverflow.com/questions/tagged/r Q&A forum for R programming questions - lots of good help!

see also: http://crossvalidated.com for general stats & R

http://rstudio.org Integrated Development Environment for command line programming with R

ggplot2 links

http://had.co.nz/ggplot2/ ggplot2 help & reference – lots of examples

http://groups.google.com/group/ggplot2 ggplot2 user group – great for posting questions

https://github.com/hadley/ggplot2/wiki ggplot2 wiki: answers many FAQs, tips & tricks

http://www.slideshare.net/hadley/presentations Over 100 presentations by Hadley Wickham, author of ggplot2. A four-part video of a ½ day workshop by him starts here: http://had.blip.tv/file/3362248/

Setting up JGR in Windows

JGR requires a JDK – speak to your IT person if this seems daunting (http://www.oracle.com/technetwork/java/javase/downloads/index.html)

On Windows, JGR needs to be started from a launcher. For R version 2.13.0 on Windows with a 32-bit R, you will likely want to get the file jgr-1_62.exe as a launcher from here: http://www.rforge.net/JGR/

A discussion of the features of JGR can be found in this article (starting on page 9):

http://stat-computing.org/newsletter/issues/scgn-16-2.pdf

Deducer, an R package that works best within a running instance of JGR, has drop-down menus for ggplot2 functionality

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual

There are great videos linked here introducing the Deducer package (although the volume is quite low)

This slide last updated 06/19/2011

