Introduction to R

Summer session: Lecture 3

Brian Healy

Outline
  • Discussion of R
  • Importing and changing data
  • Creating your own data
  • Summary statistics / graphs
  • Tests for normality
What is R?
  • Statistical computer language similar to S-plus
  • Has many built-in statistical functions
  • Easy to build your own functions (similar to SAS macros)
  • Good graphic displays
  • Extensive help files
Strengths
  • Many built-in functions
  • Can get other functions from the internet by downloading libraries
  • Relatively easy data manipulations

Weaknesses

  • Not as commonly used by non-statisticians
  • Many datasets already in SAS form
Starting R
  • Windows / HSPH computers
    • Open using the start menu or g: drive
  • Unix / Telnet in HSPH from home
    • Open using R1.9 command
  • Unix specific commands will be discussed at the end of the session
Writing R code
  • Can input lines one at a time into R
  • Can write many lines of code in a text editor and run all at once
    • Using Windows version, simply paste the commands into R
    • Using Unix version, save the commands and run in batch mode
Types of commands
  • Defining variables
  • Inputting data
  • Using built-in functions
  • Using the help menu and notation
    • ?functionname, help.search("functionname")
  • Writing your own functions
Language layout
  • Three types of statement
    • expression: it is evaluated, printed, and the value is lost (3+5)
    • assignment: passes the value to a variable but the result is not printed automatically (out<-3+5)
    • comment: (#This is a comment)
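The three statement types can be sketched as follows, reusing the expressions from the slide. Typing a variable name on its own is also an expression, which is how a stored value gets printed.

```r
# Expression: evaluated and printed, but the value is not stored
3 + 5

# Assignment: the value is stored in `out`; nothing is printed
out <- 3 + 5

# Typing the name is itself an expression, so the stored value is printed
out

# Comment: everything after '#' on a line is ignored by R
```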
Naming conventions
  • Names can use Roman letters, digits, and ‘.’; a name cannot begin with a digit
  • Avoid using system names: c, q, s, t, C, D, F, I, T, diff, mean, pi, range, rank, tree, var
  • These rules hold for variables, data, and functions
Arithmetic operations and functions
  • Most operations in R are similar to Excel and calculators
  • Basic: +(add), -(subtract), *(multiply), /(divide)
  • Exponentiation: ^
  • Remainder or modulo operator: %%
  • Matrix multiplication: %*%
  • sin(x), cos(x), cosh(x), tan(x), tanh(x), acos(x), acosh(x), asin(x), asinh(x), atan(x), atan2(y, x), atanh(x)
  • abs(x), ceiling(x), floor(x)
  • exp(x), log(x, base=exp(1)), log10(x), sqrt(x), trunc(x) (the next integer closer to zero)
  • max(), min()
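A short sketch of the less familiar operators listed above. The values `x`, `y`, and `m` are made up for illustration, and `%/%` (integer division) is not on the slide; it is included only for contrast with `%%`.

```r
x <- 17
y <- 5

x %% y         # remainder after dividing 17 by 5
x %/% y        # integer division (not on the slide, shown for contrast)
x ^ 2          # exponentiation

# trunc() moves toward zero, floor() moves down, ceiling() moves up
trunc(-2.7)
floor(-2.7)
ceiling(-2.7)

# %*% is matrix multiplication; * multiplies element by element
m <- matrix(1:4, 2, 2)
m %*% m
m * m
```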
Defining new variables
  • Assignment symbol: use “<-” (“=” also works; the older “_” form no longer assigns in R 1.8 and later)
  • Scalars
    • scal<-6
    • value<-7
  • Vectors
    • vec<-c(0,1,2)
    • vec2<-c(1:10)
    • vec3<-c(8,6,4,2,10,12,14)
    • famnames<-c("Kate", "Andrew", "Brian")
  • Variable names are case sensitive
Examples
  • Try the following
    • 3*4+2
    • 5*scal
    • (value+scal)*2
    • 3+vec
    • sqrt(vec2)
    • trunc(log(vec3))
  • What happened in the last three cases?
  • How can we assign the result of these to a new variable?
  • What is the minimum of vec3?
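As a hint for the questions above, here is a minimal sketch of how R treats arithmetic on a whole vector. The vector `w` is hypothetical and is not one of the slide's variables.

```r
w <- c(10, 20, 30)   # hypothetical vector, not from the slides

w + 3                # 3 is added to every element
sqrt(w)              # functions are applied element by element
res <- w * 2         # assign the result to keep it
res
min(w)               # smallest element
```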
Indexing vectors
  • To choose individual observations from a vector, use vec[1]
  • Try the following:
    • Find the 3rd value from vec
    • List the 4th, 5th, and 6th values from vec2
    • Make a new variable that is only the first 2 values from vec2
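The indexing forms used in these exercises can be sketched on a made-up vector `w` (an assumption, chosen so the exercise answers are not given away):

```r
w <- c(5, 7, 9, 11, 13)   # hypothetical vector

w[2]                 # a single element
w[2:4]               # a run of consecutive elements
w[c(1, 5)]           # arbitrary positions
w[-1]                # everything except the first element
first_two <- w[1:2]  # store a subset in a new variable
first_two
```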
Matrix
  • There are several ways to make a matrix
  • To make a 2x3 (2 rows, 3 columns) matrix of 0’s:
    • mat<-matrix(0,2,3)
  • To make a matrix with 4 rows and 2 columns, by rows (rbind) or by columns (cbind):
    • mat2<-rbind(c(71,172),c(73,169),c(69,160),c(65,130))
    • mat3<-cbind(c(71,73,69,65),c(172,169,160,130))
  • To make a 2x5 matrix filled row by row from vec2:
    • mat4<-matrix(vec2,2,5, byrow=T)
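The effect of `byrow` is easiest to see side by side. This is a small sketch with a made-up vector `v`; the object names are assumptions for illustration only.

```r
v <- 1:6   # hypothetical vector

by_col <- matrix(v, 2, 3)                 # default: filled down the columns
by_row <- matrix(v, 2, 3, byrow = TRUE)   # filled across the rows

by_col
by_row
dim(by_row)

# rbind stacks vectors as rows; cbind would place them side by side as columns
r <- rbind(c(1, 2, 3), c(4, 5, 6))
r
```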
Matrix Examples
  • Create the following matrices
    • ex1, ex2, ex3 (matrices shown as images on the slide)
Rows and columns of matrices
  • You can pick an individual data point, row or column from a matrix
  • mat[1,2] will give the observation in the first row, second column
  • What happens when you type
    • mat2[,2]
    • mat2[2,]
    • mat4[1:3,1]
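A sketch of the row and column notation, using the numbers from the earlier matrix slide. The column names `height` and `weight` are an assumption added here for readability.

```r
# Column names are assumed for illustration; the slide's matrix is unnamed
hw <- cbind(height = c(71, 73, 69, 65), weight = c(172, 169, 160, 130))

hw[1, 2]     # one value: first row, second column
hw[, 2]      # leaving the row index empty selects the whole column
hw[2, ]      # leaving the column index empty selects the whole row
hw[1:2, ]    # several rows at once; the result is still a matrix
```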
Lists
  • The final form of data is lists
  • Type the following code
    • ourlist<-list(v1=vec,v2=vec2, fam=famnames)
  • Now, let’s look at ourlist
  • If there are no names, we can give the elements of our list names using
    • names(ourlist)<-c("v1", "v2", "fam")
  • To get individual parts of a list, use the $ sign (or double brackets, [[ ]])
  • What happens when you type ourlist$v1?
  • What happens when you type ourlist$v1[1]?
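A minimal sketch of naming and accessing list elements. The list `lst` and its element names are made up for illustration.

```r
# An unnamed list, then names added afterwards
lst <- list(c(0, 1, 2), c("Kate", "Andrew", "Brian"))
names(lst) <- c("nums", "fam")

lst$nums        # the whole element
lst$nums[1]     # index into the element after extracting it
lst[["fam"]]    # double brackets are equivalent to lst$fam
lst[1]          # single brackets return a (shorter) list, not the element
```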
Data frames
  • Very similar to matrices in form
  • Different columns can be of different types, unlike matrices
  • famages<-c(24, 29, 27); famheight<- c(64, 73, 71)
  • We can put my family names, ages and heights into a data frame
  • fam<-data.frame(famnames, famages, famheight)
Converting data
  • To change a matrix to a data frame
    • as.data.frame()
  • To change a data frame to a matrix use
    • fammat<-as.matrix(fam)
    • Try this on your own
    • How does this change?
  • What happens when we type fam[,2]+3 and fammat[,2]+3?
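One way to investigate the question above is to inspect the types before and after conversion. This sketch rebuilds the family data from the previous slide; `try()` is used only so that the script keeps running if the second addition fails.

```r
famnames  <- c("Kate", "Andrew", "Brian")
famages   <- c(24, 29, 27)
famheight <- c(64, 73, 71)

fam <- data.frame(famnames, famages, famheight)
str(fam)                 # each column keeps its own type

fammat <- as.matrix(fam)
mode(fammat)             # a matrix holds a single type for all cells

fam[, 2] + 3             # arithmetic on a numeric column
res <- try(fammat[, 2] + 3, silent = TRUE)
class(res)               # shows whether the same arithmetic worked on the matrix
```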
Opening data
  • You must always know the directory where your files have been stored
  • In R for Windows, use \\ (or /) to separate directories

c:\\splus\\free.dat

  • In Unix, splus/free.dat
  • For now we will show how to do this in Windows
Reading in data
  • Change the directory to g:\shared\bio271summer
  • Let’s look at the class data set in the notepad
  • To read in data, use this command: class<-read.table("class.dat", header=T)
  • This command assumes that the data is space or tab delimited
  • Reads in as data frame
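Since class.dat is not available here, the following sketch writes a tiny space-delimited file to a temporary location and reads it back. The file contents and column names are made up for illustration.

```r
# Create a small space-delimited file (contents are hypothetical)
tmp <- tempfile(fileext = ".dat")
writeLines(c("name age height",
             "Ann 23 65",
             "Bob 31 70"), tmp)

# header = TRUE treats the first line as column names
dat <- read.table(tmp, header = TRUE)
dat
is.data.frame(dat)
sapply(dat, class)    # the type R chose for each column
```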
Working with the data
  • Type class to look at the data
  • How could we find people's heights in cm?
    • Use cmheight<-class[,3]*2.54
    • This command completes the operation on the entire column as we discussed before
  • We can attach this new variable to the old dataset using newclass<-cbind(class,cmheight)
  • rbind is used to combine extra rows
    • In this example, that would be more students
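The difference between cbind and rbind can be sketched on a small stand-in for the class data. The data frame `students` and its values are hypothetical.

```r
# Hypothetical stand-in for the class data set
students <- data.frame(name = c("Ann", "Bob"),
                       age = c(23, 31),
                       height = c(65, 70))

# cbind adds a column: one new value per existing row
cmheight <- students[, 3] * 2.54
wider <- cbind(students, cmheight)
wider

# rbind adds rows: the new rows need the same column names
extra <- data.frame(name = "Cat", age = 27, height = 62)
longer <- rbind(students, extra)
longer
```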
Reading in data cont.
  • Look at the data in auto.dat in the notepad
  • Note that data is comma delimited
  • Try the read.table method
  • What is wrong? Look at ?read.table
  • auto<-read.table("auto2.dat", sep=",", header=T)
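To see why the sep argument matters, this sketch reads the same comma-delimited file twice. The file contents are made up; they are not the auto data.

```r
# A tiny comma-delimited file (hypothetical contents)
tmp <- tempfile(fileext = ".dat")
writeLines(c("make,price,mpg",
             "A,4099,22",
             "B,4749,NA"), tmp)

# Default separator is white space, so each line is read as one field
wrong <- read.table(tmp, header = TRUE)
ncol(wrong)

# Telling read.table about the commas splits the fields correctly
right <- read.table(tmp, sep = ",", header = TRUE)
ncol(right)
right
```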
Practice
  • Find the minimum height in the class
  • Add two family members to your class data set
  • Make a new variable mpg from the third column of auto
Outputting data-CHECK
  • write a vector to a file: write()

write(x,file="outdata")

  • write a matrix or data frame

write.matrix<-function(x,file="",sep=" ") {
  # coerce to a matrix so that data frames are handled too
  x<-as.matrix(x)
  p<-ncol(x)
  # column names first, then one row of x per line
  cat(dimnames(x)[[2]],format(t(x)),file=file,
      sep=c(rep(sep,p-1),"\n"))
}

  • Try to write your family dataset to your P:\\ drive
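Base R also provides write.table(), which writes a data frame or matrix directly. The sketch below writes to a temporary file rather than a P: drive, and the data frame is a made-up version of the family data.

```r
# Hypothetical family data
fam <- data.frame(name = c("Kate", "Andrew", "Brian"),
                  age = c(24, 29, 27))

out <- tempfile(fileext = ".txt")

# row.names = FALSE drops the 1, 2, 3 labels; quote = FALSE drops the quotes
write.table(fam, file = out, row.names = FALSE, quote = FALSE)

readLines(out)                          # inspect the raw file
back <- read.table(out, header = TRUE)  # read it back to check the round trip
back
```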
Input/Output
  • execute commands from a file:

source("command.s")

    • use source("command.s", echo=TRUE) to have the commands echoed.
  • divert output to a file:

sink("record.lis")

    • use sink() with no arguments to send output back to the screen.
  • write objects to an external file:

dump(c("a","x","ink"),file="outdat")

  • general print:

cat(format(iris3[,1,1]),fill=60)

Sorting data
  • There are several ways to sort data in R
  • To sort a vector, use sort()
    • To sort the ages in the class, sort(class[,2])
  • To sort the entire matrix, use order()
    • To order the class by age, class[order(class[,2]),]
    • To get the same result in two steps
      • o<-order(class[,2])
      • class[o,]
  • Try to sort auto by foreign
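The difference between sort() and order() can be sketched on a small made-up data frame: sort() returns the sorted values, while order() returns the row positions that would sort them.

```r
# Hypothetical data
students <- data.frame(name = c("Ann", "Bob", "Cat"),
                       age = c(31, 23, 27))

sort(students$age)          # sorted values only

o <- order(students$age)    # positions of the smallest, next smallest, ...
o

sorted <- students[o, ]     # use the positions to reorder whole rows
sorted

# decreasing = TRUE reverses the ordering
students[order(students$age, decreasing = TRUE), ]
```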
Missing values
  • What happens if there are missing values as in the auto data sets?
  • R codes these as NA
  • How can we change the NA’s to 0’s?
  • is.na(x) is a logical function that assigns a T to all values that are NA and F otherwise
  • Ans: data[is.na(data)]<-0
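The replacement works because is.na() returns a logical vector that can be used as an index. A minimal sketch on a made-up vector:

```r
x <- c(12, NA, 30, NA, 8)   # hypothetical values

is.na(x)          # TRUE where the value is missing
x[is.na(x)]       # the logical vector selects just those positions

y <- x            # work on a copy so the original is kept
y[is.na(y)] <- 0  # assign 0 to the selected positions
y
```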
Practice
  • In slide 11, several functions were mentioned including sum and min
  • Try these functions on mpg
  • What has happened?
  • How can we find the sum of mpg not including the missing values?
    • sum(mpg[!is.na(mpg)])
Practice
  • What is the difference when you use the function sqrt on the same data?
  • Why is there a difference in the effect of the missing data?
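A small sketch that contrasts a summarising function with an element-by-element function on data containing NA. The vector is made up, and `na.rm = TRUE` is an argument of sum() that is not mentioned on the slides.

```r
x <- c(4, NA, 9)   # hypothetical values

sum(x)                  # combines all values into one result
sum(x, na.rm = TRUE)    # na.rm = TRUE drops missing values first
sqrt(x)                 # works on each value separately
```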
Loops and conditionals
  • Conditional
    • if (expr) expr
    • if (expr) expr else expr
  • Iteration
    • repeat expr
    • while (expr) expr
    • for (name in expr1) expr
  • For comparisons use:
    • == for equal
    • != for not equal
    • > for greater than, < for less than
    • && for and (& compares vectors element by element)
    • || for or (| compares vectors element by element)
Examples
  • What happens with the following code?
    • if (value==1) {check<-1} else {check<-0}
    • counter<-0

for (i in 1:10) {
  if (auto[i,3]<10){counter<-counter+1}
}
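The same counting pattern can be sketched on a made-up vector. The check for NA is an addition, not part of the slide's code; it is there because `if` stops with an error when its condition is NA.

```r
vals <- c(12, 7, NA, 3, 25)   # hypothetical values, including a missing one

counter <- 0
for (v in vals) {
  # Guard against NA first; && only evaluates the right side if the left is TRUE
  if (!is.na(v) && v < 10) {
    counter <- counter + 1
  }
}
counter

# A vectorised alternative that gives the same count without a loop
sum(vals < 10, na.rm = TRUE)
```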

Basic R functions
  • Let’s use the class data set
  • Find the summary statistics of the data
    • summary(class)
    • Notice how names are handled compared to ages and heights
    • What happens when we type range(class)?
    • How can we find the range of age and height using this command?
Functions
  • You can define a function to complete any operation
  • out<-function(var){definition}
  • Let’s look at this function:

filter <- function(x){
  if (is.na(sum(x))){fil<-F}
  else {fil<-T}
  # return the flag explicitly
  fil
}

filt <- apply(auto, 1, filter)

newauto<-auto[filt,]

  • What is this function doing?
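The same pattern (define a function, apply it to every row, then subset with the result) can be sketched on a small numeric matrix. The function name `has_no_na` and the matrix `m` are hypothetical.

```r
# Hypothetical helper: TRUE when a row contains no missing values
has_no_na <- function(x) {
  !any(is.na(x))
}

m <- rbind(c(1, 2, 3),
           c(4, NA, 6),
           c(7, 8, 9))

# apply(m, 1, f) calls f once per row (1 = rows, 2 = columns)
keep <- apply(m, 1, has_no_na)
keep

# A logical vector in the row position keeps only the TRUE rows
clean <- m[keep, ]
clean
```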
Practice
  • Write a function to calculate the sum of the numbers in a vector greater than ten and the sum of the numbers less than or equal to ten
  • Output the answer in a list using list(ans1,ans2) at the end of your function
  • Try your function on the mpg data from the auto data set
  • Look at the output when you apply your function
Generating data from distributions
  • Many applications require generating data from specific distributions
  • r<distname>(n,<parameters>)
    • Possible distributions: beta, cauchy, chisq, f, gamma, norm, t, unif
  • You can find other characteristics of distributions as well
    • d<dist>(x,<parameters>): density at x
    • p<dist>(x,<parameters>): cumulative distribution function at x
    • q<dist>(p,<parameters>): inverse cdf
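The four prefixes can be sketched with the normal distribution. `set.seed()` is not on the slides; it is used here only so that the random draws are repeatable.

```r
set.seed(1)                            # makes the random draws repeatable
draws <- rnorm(5, mean = 0, sd = 1)    # r: five random values
draws

dnorm(0)             # d: density of the standard normal at 0
pnorm(1.96)          # p: probability of a value at or below 1.96
qnorm(0.975)         # q: the value with 97.5% of the distribution below it
qnorm(pnorm(1.5))    # q undoes p, so this returns 1.5
```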
Using the distribution functions
  • Often when we use simulations we need to use the r functions
  • Try sample<-runif(n=10,min=0,max=1)
    • What happens here?
Plots
  • One of the biggest advantages of R is the quality of the plots
  • Let’s plot the ages of the class
  • To make plots in R, use the following commands for the appropriate plots
    • age<-class[,2]
    • histogram- hist(age)
    • box plot- boxplot(age)
Plot Command

The basic command-line command for producing a scatter plot or line graph.

col= set colors,

lty= set line types,

lwd= set line widths,

pch= set the character type,

type= pick points (type = "p"), lines ("l"),

cex= set the "character expansion",

xlab= and ylab= set the labels,

xlim= and ylim= set the limits of the axes,

main= put a title on the plot,

sub= add a sub-title (the mtext() function adds text in the margins),

help(par) for details
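A sketch that combines several of these arguments in one call. The data are made up, and the plot is drawn into a temporary PDF file (an assumption made so the example also runs without a screen device).

```r
age    <- c(23, 31, 27, 25, 29)   # made-up values
height <- c(65, 70, 62, 68, 72)

out <- tempfile(fileext = ".pdf")
pdf(out)                          # open a file device instead of a window

plot(age, height,
     type = "p", pch = 16, col = "blue", cex = 1.5,
     xlab = "Age (years)", ylab = "Height (inches)",
     xlim = c(20, 35), ylim = c(60, 75),
     main = "Height against age", sub = "made-up data")

# lines() adds to the existing plot; lty and lwd control its appearance
idx <- order(age)
lines(age[idx], height[idx], lty = 2, lwd = 2)

# mtext() writes in the margin (side 3 is the top)
mtext("margin text added with mtext()", side = 3, line = 0.3)

dev.off()                         # close the device so the file is complete
```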

One-Dimensional Plots
  • barplot(height) #simple form
  • barplot(height, width, names, space=.2, inside=TRUE, beside=FALSE, horiz=FALSE, legend, angle, density, col, blocks=TRUE)
  • boxplot(..., range, width, varwidth=FALSE, notch=FALSE, names, plot=TRUE)
  • hist(x, nclass, breaks, plot=TRUE, angle, density, col, inside)
Two-Dimensional Plots
  • lines(x, y, type="l")
  • points(x, y, type="p")
  • matplot(x, y, type="p", lty=1:5, pch=, col=1:4)
  • matpoints(x, y, type="p", lty=1:5, pch=, col=1:4)
  • matlines(x, y, type="l", lty=1:5, pch=, col=1:4)
  • plot(x, y, type="p", log="")
  • abline(coef), abline(a, b), abline(reg), abline(h=), abline(v=)
  • qqplot(x, y, plot=TRUE)
  • qqnorm(x, datax=FALSE, plot=TRUE)
Three-Dimensional Plots
  • contour(x, y, z, v, nint=5, add=FALSE, labex)
  • interp(x, y, z, xo, yo, ncp=0, extrap=FALSE)
  • persp(z, eye=c(-6,-8,5), ar=1)
Multiple Plots Per Page
  • par(mfrow=c(nrow, ncol), oma=c(0, 0, 4, 0))
    • mfrow=c(m,n) : subsequent figures will be drawn row-by-row in an m by n matrix on the page.
    • oma=c(xbot,xlef,xtop,xrig): outer margin lines of text.
  • mtext(side=3, line=0, cex=2, outer=T, "This is an Overall Title For the Page")
  • Try this code on your own
    • par(mfrow=c(2,1))
    • hist(age)
    • plot(class[,2],class[,3])
Output to a postscript file
  • Often we want to output an R graph to a postscript file to place it into a LaTeX file or other document
  • To do this, we use the following code
    • postscript("graph1.ps") – This opens a postscript file in the current working directory
    • plot(regr) – This plots a graph into the file
    • dev.off() – This closes the postscript file
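The open, plot, close sequence can be sketched end to end, combined with par(mfrow=...) from the previous slide. The data are made up and the file goes to a temporary location rather than the working directory.

```r
age    <- c(23, 31, 27, 25, 29)   # made-up values
height <- c(65, 70, 62, 68, 72)

ps_file <- tempfile(fileext = ".ps")

postscript(ps_file)               # 1. open the file device
par(mfrow = c(2, 1))              #    two plots stacked on one page
hist(age, main = "Ages", xlab = "Age (years)")
plot(age, height, main = "Height against age")
dev.off()                         # 2. close the device; the file is now complete

file.exists(ps_file)
```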
Making plots of your own
  • Make the following plots
    • Histogram of height in the class with the appropriate labels
    • Scatterplot of height and age in the class using a different point
    • Make a postscript file with four plots of your choice from the baby dataset on one graph
    • Write a function to make a histogram and boxplot on one graph and use it on any of your data