1 / 25

R Lecture 5

R Lecture 5. Naomi Altman Department of Statistics. Example: Regression. The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat ?read.table body=read.table("body.txt",header=T) plot(body$hips,body$weight) plot(body$waist,body$weight) ?formula

vern
Download Presentation

R Lecture 5

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. R Lecture 5 Naomi Altman Department of Statistics

  2. Example: Regression The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat ?read.table body=read.table("body.txt",header=T) plot(body$hips,body$weight) plot(body$waist,body$weight) ?formula lm.out=lm(weight~hips+waist,data=body) attributes(lm.out)

  3. Formulas lm fits the regression of Y on a set of X variables. The variable for Y and the predictors are denoted by a formula of the form. You can also use formulas in other contexts. e.g. plot(weight~waist, data=body)

  4. Object Oriented Programming in R or how a bunch of smart programming types made R easier to use and harder to program - at least in the eyes of a statistician

  5. In the bad old days If I wanted to write a function similar to something already in R, I would edit the R code: myFun=edit(Rfun) myDensity=edit(density) Sometimes the R code would call a C or C++ program, but the code for that is also available.

  6. But now ... plot boxplot rnorm

  7. Classes and Generic Functions I have already mentioned that one of the attributes a R object can have is a class. A generic function is a function that captures the class of an object and then calls another function to do the actual work. If the function is called fun and the class is called cls, the function that does the work is (almost always) called fun.cls. If there is no suitable fun.cls, then fun.default is used.

  8. e.g. plot(body$hips,body$weight) plot(lm.out) plot.default plot.lm

  9. Classes Actually, a class can be a pair c("first","second") in which the "first" "inherits from" i.e. is a special case of "second". In practise, this means that it has all the components of class "first" objects but possibly some additional ones. If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.

  10. e.g. plot uses plot.lm on an object with class "lm" and also on an object with class ("glm","lm")

  11. 'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument glm.out=glm(weight~hips+waist,data=body) class(glm.out) "glm" "lm" inherits(lm.out,"lm") inherits(glm.out,"lm") inherits(lm.out,"glm") inherits(glm.out,"glm") plot.lm plot.glm plot(glm.out)

  12. unclass If you remove the class, most objects are just lists. lm.out unclass(lm.out) For example, the "lm" objects are lists with the following components: "coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model" Some of these components are obvious. Some of them are matrix computations that can be used to compute, e.g. the leverages and Cook's Distance (notice that these have not been stored). Some of them are only empty - they are used primarily when the predictor variable is a factor (ANOVA).

  13. Why use classes For the user: less to think about e.g. you can try generic functions like plot and summary with any output For the programmer: provides a framework e.g. you might think about having a plot.myfun and summary.myfun for the function you are writing also, you can use inheritance so that you do not need to write your own functions

  14. Learning to Use Objects and other Extensions Calling C or C++ from R: Writing R extensions Object oriented programming in R (S3 protocol) R Language Definition (S4 protocol) R Internals

  15. Odds and Ends • speeding things up (a bit) • errors in functions and loops • source and sink • attach and detach • resampling

  16. Speeding Things Up (I am assuming that R behaves like Splus) Try to minimize the number of expressions evaluated. R is an "interpreted" language, not a "compiled" language. This means that each time R encounters an expression, it has to decipher it and convert it to machine-readable microcode. If you write a loop, then R has to decipher the expressions within the loop again and again. Looking at the code for apply, it seems to create a loop, so apply does not get around this.

  17. Speeding Things Up Create space for new objects. Every time you change the dimensions of an object, R reallocates space. So my loop: • intercepts=NULL • for (i in 1:1000){y=line+rnorm(25,0,3) • lmout=lm(y~x) • intercepts=c(intercepts,getIntercept(lmout))} would be better as: • intercepts=rep(0,1000) • for (i in 1:1000){y=line+rnorm(25,0,3) • lmout=lm(y~x) • intercepts[i]=getIntercept(lmout))}

  18. Errors in Functions and Loops If an error occurs somewhere within a function or loop, R aborts the whole process and destroys the frame. Nothing will change in your .Rdata directory. There is no such thing as "partial execution"; if even one mistake occurs the whole process is aborted. If you want to "trap" errors so that your routine can recover, the "try" command can be used to return a flag.

  19. "source" and "sink" source("myfile.txt") runs the commands in the text file "myfile" until they are completed source("myOutput.txt") send everything that would be written to the screen to the text file "myOutput" until sink() is typed in

  20. Attach and Detach "attach" and "detach" expand and contract the search path for R. You can attach either a dataframe or a .Rdata workspace. Any new assignments are stored in the current workspace. When you want to stop searching the dataframe or workspace, you can detach it.

  21. Session: Attach and Detach myData=read.table("body.txt",header=T) lm.out=lm(weight~hips+waist,data=myData) hips[1:5] attach(myData) hips[1:5] waist[1:5] waist=hips waist[1:5] detach(myData) hips[1:5] waist[1:5] myData$waist[1:5]

  22. Resampling Many statistical methods require sampling from the data already collected. Examples include the bootstrap and permutation methods. sample(x, size, replace = FALSE, prob = NULL) allows you to sample with or without replacement from x, with unequal probabilities if you want

  23. Resampling sample(x,length(x),replace=F) creates a random permutation of the data

  24. Session: Resampling a=c("H","T") sample(a,10,replace=T) sample(a,10,replace=T,prob=c(.2,.8)) sample(1:5,5,replace=T) sample(1:5,5,replace=F)

  25. #bootstrap distribution of the intercept attach(myData) intercept=rep(0,1000) l=length(weight) for (i in 1:1000){ samp=sample(1:l,l,replace=T) intercept[i]=coefficients(lm(weight[samp]~ hip[samp]+waist[samp]))[1] } hist(intercept)

More Related