1 / 69

Ann Arbor ASA Up and Running Series R

A comprehensive introduction to R programming language for statistical analysis. Learn about functions, working with data, importing/exporting data, graphs and statistics, and more. Access presentation materials and additional resources.

abob
Download Presentation

Ann Arbor ASA Up and Running Series R

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ann Arbor ASA Up and Running Series R Sponsored by the Ann Arbor Chapter of the American Statistical Association and the Department of Statistics of the University of Michigan

  2. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  3. http://community.amstat.org/annarbor/home Presentation Materials  Up and Running Series  R Select files: R - code R - furniture_txt R - furniture_xlsx R - datafiles R - workshop Save upload of each file to Desktop Ann Arbor ASA Up and Running Series: R

  4. Introduction - What is R? • R is open source with code available to users • R is object-oriented programming • involves the S computer language • R is a commonly used for statistical analysis • R is a free software package • R-project.org Ann Arbor ASA Up and Running Series: R

  5. Introduction - Installation Ann Arbor ASA Up and Running Series: R

  6. Introduction - More About R • Statistical analysis is done using pre-defined functions in R. • Upon download of the ‘base’ package, you have access to many functions. • More advanced functions will require the of download other packages. Ann Arbor ASA Up and Running Series: R

  7. Introduction – What can you do with R? • Topics in statistics are readily available • Linear modeling, linear mixed modeling, clustering, multivariate analysis, non-parametric methods, classification, among others. • R produces high quality graphics • Simple plots are easy • With more practice, users can produce publishable graphics! Ann Arbor ASA Up and Running Series: R

  8. Introduction - Launch R • Start  All Programs  Math & Statistics  R Workspace Ann Arbor ASA Up and Running Series: R

  9. Introduction – Editor Window • Get Editor window: File  New script • More convenient than workspace Workspace Editor window Ann Arbor ASA Up and Running Series: R

  10. Introduction - Data Objects in R • Users create different data objects in R • Data objects refer to variables, arrays of numbers, character strings, functions and other more complicated data manipulations • <- allows you to assign data objects • Type in your editor window: a <- 7 • Submit this command by highlighting it and pressing ctrl+r • Practice creating different data objects and submit them to the workspace Ann Arbor ASA Up and Running Series: R

  11. Introduction - Data Objects in R • Type objects () • This allows you to see that you have created the object aduring this R session • You can view previously submitted commands by using the up/down arrow • You can remove this object by typing rm(a) • Try removing some objects you created and then type objects() to see if they are listed Ann Arbor ASA Up and Running Series: R

  12. Examples • Set up vector named x: • x <- c(5,4,3,6) • This is an assignmentstatement • the functionc() creates a vector by concatenating its arguments • Perform vector/matrix arithmetic: • v <- 3*x - 5 Ann Arbor ASA Up and Running Series: R

  13. Questions? Ann Arbor ASA Up and Running Series: R

  14. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  15. R Help – CRAN: Search • CRAN: Search • R archives (manuals, mail, help files, etc.) • faced with a tough analysis question • see if another R user has addressed the question before Ann Arbor ASA Up and Running Series: R

  16. R Help • To get help on any specific function: • help(function.name) • ?(function.name) • Sometimes help is not available from the packages downloaded • ??(function.name) Ann Arbor ASA Up and Running Series: R

  17. R Help • To see a list of all of the functions that come with the base R package • library(help = “base”) Error: unexpected input in "library(help = ““ • library(help = "base") Ann Arbor ASA Up and Running Series: R

  18. R Help • Two popular R resource websites: • Rseek.org • nabble.com Ann Arbor ASA Up and Running Series: R

  19. R Help • For help via the Internet submit • help.start() Ann Arbor ASA Up and Running Series: R

  20. Questions? Ann Arbor ASA Up and Running Series: R

  21. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  22. Functions - R Reference Card*created by Tom Short • There are thousands of available functions in R • Reference Card provides a strong working knowledge • Look at the organization of the Reference Card • Try out a few of the functions available! Ann Arbor ASA Up and Running Series: R

  23. Functions - R Reference Card*created by Tom Short Ann Arbor ASA Up and Running Series: R

  24. Functions - Generating Sequences • Sequences • seq(-5, 5, by=.2) • seq(length=51, from=-5, by=.2) • Both produce a sequence from -5 to 5 with a distance of .2 between objects Ann Arbor ASA Up and Running Series: R

  25. Functions - Replicating Objects • Replications • rep(“x”, times=5) • rep(“x”, each=5) • Both produce x replicated 5 times Ann Arbor ASA Up and Running Series: R

  26. Questions? Ann Arbor ASA Up and Running Series: R

  27. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  28. Working with Data • There are many data sets available for use in R • data()  to see what’s available • We will work with the trees data set • data(trees) • This data set is now ready to use in R • The following are useful commands: • summary(trees) – summary of variables • dim(trees) – dimension of data set • names(trees) – see variable names • attach(trees) – attaches variable names Ann Arbor ASA Up and Running Series: R

  29. Extracting Data • R has saved the data set trees as a data frame object • Check this by typing - class(trees) • R stores this data in matrix row/column format: data.frame[rows,columns] • trees[c(1:2),2] • first 2 rows and 2nd column • trees[3,c(“Height”, “Girth”)] • reference column names • trees[-c(10:20), “Height”] • skips rows 10-20 for variable Height Ann Arbor ASA Up and Running Series: R

  30. Extracting Data • The subset() command is very useful to extract data in a logical manner, where the 1st argument is data, and the 2nd argument is logical subset requirement • subset(trees, Height>80) • subset where all tree heights >80 • subset(trees, Height<70 & Girth>10) • subset where all tree heights<70 AND tree girth>10 • subset(trees, Height <60 | Girth >11) • subset where all tree heights <60 OR Girth >11 Ann Arbor ASA Up and Running Series: R

  31. Questions? Ann Arbor ASA Up and Running Series: R

  32. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  33. Importing Data • The most common (and easiest) file to import is a text file with the read.table() command • R needs to be told where the file is located • set the working directory setwd("C:\\Users\\akazanis\\Desktop") • tells R where all your files are located • OR point to working directory • File  Change dir… and choosing the location of the files • OR include the physical location of your file in the read.table() command Ann Arbor ASA Up and Running Series: R

  34. Using the read.table() command • Include the physical location of your file in the read.table() command read.table("C:\\Users\\akazanis\\Desktop\\furniture.txt",header=TRUE,sep="") • Important to use double slashes \\ • rather than single slash \ • header=TRUE or header=FALSE • Tells R whether you have column names on data Ann Arbor ASA Up and Running Series: R

  35. Using the read.table() command • Another way of specifying the file’s location is to set the working directory first and then read in the file • setwd(“C:\\Users\\akazanis\\Desktop”) • read.table(“furniture.txt”,header=TRUE,sep=“”) • OR point to the location File  Change dir… pointing to the file’s location • Then, read in the data file read.table(“furniture.txt”,header=TRUE,sep=“”) Ann Arbor ASA Up and Running Series: R

  36. read.table(), read.csv(), Missing Values • It is also popular to import csv files since excel files are easily converted to csv files • read.csv() and read.table() are very similar although, they handle missing values differently • read.csv() automatically assigns an ‘NA’ to missing values • read.table() will not load data with missing values • Assign ‘NA’ to missing values before reading it into R Ann Arbor ASA Up and Running Series: R

  37. read.table(), read.csv(), Missing Values • Let’s remove a data entry from both furniture.txt and furniture.csv • From the first row, erase 100 from the Area column • Now read in the data from these two files using read.table() and read.csv() • You should see that you cannot read the data in using the read.table() command unless you input an entry for the missing value Ann Arbor ASA Up and Running Series: R

  38. Other Options for Importing Data • *** When you download R, automatically obtain the foreignpackage*** • Submit library(foreign) • many more options for importing data: • read.xport(), read.spss(), read.dta(), read.mtp() • For more information on these options, submit help(read.xxxx) Ann Arbor ASA Up and Running Series: R

  39. Exporting Data • You can export data by using the write.table() command write.table(trees,“treesDATA.txt”,row.names=FALSE,sep=“,”) • Specifies that we want the trees data set exported • Type in name of file to be exported. • By default R writes file to working directory already specified unless you give a location • row.names=FALSE • tells R that we do not wish to preserve the row names • sep=“,” • data set is comma delimited Ann Arbor ASA Up and Running Series: R

  40. Questions? Ann Arbor ASA Up and Running Series: R

  41. Content • Introduction • R Help • Functions • Working with Data • Importing/Exporting Data • Graphs + Statistics • Practice Problems • Further Resources Ann Arbor ASA Up and Running Series: R

  42. Example - Furniture Data Set • Assign a name to the furniture data set, as we read it in, to do some analysis furn<-read.table(“furniture.txt”,sep=“”,h=T) • To examine data set • dim(furn) • summary(furn) • names(furn) • attach(furn) • It is important to attach before subsequent steps with the data Ann Arbor ASA Up and Running Series: R

  43. Graphs • R can produce very simple and very complex graphs • Make a simple scatter plot of the Area and Cost variables from the furniture data set • plot(Area,Cost,main=“Area vs Cost”,xlab=“Area”,ylab=“Cost”) • Area on the x-axis • Cost on the y-axis • Title and labels the axes Ann Arbor ASA Up and Running Series: R

  44. Graphs • Variables distribution using graphs in R • hist(Area) – histogram of Area • hist(Cost) – histogram of Cost • boxplot(Cost ~ Type) – boxplot of Cost by Type Ann Arbor ASA Up and Running Series: R

  45. Graphs • We can make the boxplot much prettier boxplot(Cost~Type,main=“Boxplot of Cost by Type”, col=c(“orange”,“green”,“blue”), xlab=“Type”, ylab=“Cost”) Ann Arbor ASA Up and Running Series: R

  46. Graphs • Scatter plot matrix of all variables in a data set using the pairs() function • pairs(furn) • Correlation/covariance matrix of numeric variables • cor(furn[,c(2:3)]) • cov(furn[,c(2:3)]) Ann Arbor ASA Up and Running Series: R

  47. Graphs + Statistics • Simple linear regression using the furniture data • m1<-lm(Cost ~ Area) • summary(m1) • coef(m1) • fitted.values(m1) • residuals(m1) Ann Arbor ASA Up and Running Series: R

  48. Graphs + Statistics • Plot the residuals against the fitted values • plot(fitted.values(m1), residuals(m1)) Ann Arbor ASA Up and Running Series: R

  49. Graphs + Statistics • Scatter plot of Area and Cost plot(Area,Cost,main=“Cost Regression Example”,xlab=“Cost”, ylab=“Area”) • abline(lm(Cost~Area), col=3, lty=1) • lines( lowess(Cost~Area), col=3, lty=2) • Interactively add a legend • legend(locator(1),c(“Linear”,“Lowess”),lty=c(1,2),col=3) • point to graph and place legend where you wish! Ann Arbor ASA Up and Running Series: R

  50. Graphs + Statistics • Identify different points on the graph • identify(Area, Cost, row.names(furn)) • Makes it easy to identify outliers • Use the locator() command to quantify differences between the regression fit and the loess line • locator(2) • Example - Compare predicted values of Cost when Area is equal to 50 Ann Arbor ASA Up and Running Series: R

More Related