
Training on R For 3rd and 4th Year Honours Students, Dept. of Statistics, RU

Training on R For 3rd and 4th Year Honours Students, Dept. of Statistics, RU. Installation and Data Structures of R. Empowered by the Higher Education Quality Enhancement Project (HEQEP), Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh.


Presentation Transcript


  1. Training on R For 3rd and 4th Year Honours Students, Dept. of Statistics, RU Installation and Data Structures of R Empowered by Higher Education Quality Enhancement Project (HEQEP) Department of Statistics Rajshahi University, Rajshahi-6205, Bangladesh March 21-23, 2013

  2. History of R
     • 1976: Statistical programming language S developed at Bell Labs.
     • 1983: Licensed as S-Plus.
     • 1990: R, an open-source program similar to S, developed by Robert Gentleman and Ross Ihaka (Auckland, NZ).
     • 1997: International "R-core" team formed.
     • Updated versions are available every couple of months.
     • For more: http://cran.r-project.org/mirrors.html

  3. Advantages of R
     • R is a free programming language, developed by renowned statisticians.
     • It is open-source and runs on Windows, Linux and Macintosh.
     • R has excellent graphing capabilities.
     • R has an excellent built-in help system.
     • R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
     • The language is easy to extend with user-written functions.

  4. To obtain and install R on your computer
     • Go to http://cran.r-project.org/mirrors.html to choose a mirror near you
     • Click on your operating system (Windows, Linux, or Mac)
     • Download and install from the "base"
     To install additional packages
     • Start R on your computer
     • Choose the appropriate item from the "Packages" menu
     Here, CRAN = Comprehensive R Archive Network.
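
The "Packages" menu step also has a console equivalent. The sketch below is not from the slides: the helper name install_if_missing is hypothetical, and it assumes base R's requireNamespace() and install.packages(), plus network access to a CRAN mirror whenever something really has to be downloaded.

```r
# Hypothetical helper (not from the slides): install only those packages
# that are not yet available, and return the names that had to be installed.
install_if_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing)   # needs a CRAN mirror and network access
  }
  invisible(missing)
}

# "stats" ships with every R installation, so this call installs nothing.
needed <- install_if_missing("stats")

# For an add-on package one might call, for example:
# install_if_missing("foreign")
```

After installation, a package still has to be loaded in each session with library(), for example library(foreign).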

  5-10. To obtain and install R on your computer (screenshots of the download and installation steps; slide 8 carries the annotation "Double Click")

  11. The R Environment (screenshot labels: Menu bar, Tools bar, Command Prompt)

  12. The R Environment: press Ctrl + L to clear the screen

  13. Creating a Script File

  14. Working in R: As Calculator
      Numeric Operators
      • 4 + 2 = 6
      • 4 - 2 = 2
      • 4 * 2 = 8
      • 4 / 2 = 2
      • 4 ^ 2 = 16

  15. Variables & Assignment Operator
      • Numeric: 5, 5.76, etc.
      • Logical: values corresponding to TRUE or FALSE
      • Character strings: sequences of characters ("blue", "male", "Rahim", etc.)
      • Variables are assigned with the operator <- or =
      • Data types need not be declared.
      a = 5          # or, a <- 5
      b = "blue"
      c = a^2 + 5
      c > a
      etc.
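
A short sketch of the three basic types and of the fact that no declaration is needed. The variable name is_bigger is an assumption made for illustration; class() is a base R function.

```r
a <- 5                      # numeric
b <- "blue"                 # character string
is_bigger <- a^2 + 5 > a    # logical: compares 30 with 5

class(a)           # "numeric"
class(b)           # "character"
class(is_bigger)   # "logical"

# The same name can later hold a value of another type.
a <- "five"
class(a)           # "character"
```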

  16. Data Structure • Vectors • Matrices • Arrays • Factors • Lists • Data frames

  17. Vector
      Here we introduce three functions, c, seq, and rep, that are used to create vectors in various situations.
      • c() to concatenate elements or sub-vectors
      • rep() to repeat elements or patterns
      • seq() to generate sequences
      > c(2, 7, 9)
      [1] 2 7 9
      > a = c(2, 7, 9)
      > b = c(3, 5, 8, a)
      > b
      [1] 3 5 8 2 7 9
      rep(value(s), number of repetitions)
      > rep(5, 10)
       [1] 5 5 5 5 5 5 5 5 5 5
      > rep(c(2, 4, 6), 3)
      [1] 2 4 6 2 4 6 2 4 6
      seq(initial value, terminal value, increment)
      > seq(2, 10, 2)
      [1]  2  4  6  8 10
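
Beyond the positional forms shown on the slide, seq() and rep() accept named arguments. The following is a small sketch using the base R arguments by, length.out, each and times; the variable names are chosen only for illustration.

```r
s1 <- seq(0, 1, by = 0.25)               # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(2, 10, length.out = 5)         # 2 4 6 8 10
r1 <- rep(c(2, 4, 6), each = 2)          # 2 2 4 4 6 6
r2 <- rep(c("a", "b"), times = c(3, 1))  # "a" "a" "a" "b"

# c() can combine the results of seq() and rep() into one vector.
v <- c(seq(1, 3), rep(0, 2))             # 1 2 3 0 0
```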

  18. Vector
      > h = c(21, 25, 19, 22, 23, 20)        # Numeric vector
      > h
      [1] 21 25 19 22 23 20
      > name = c("Rahim", "Rani", "Raju")    # Character vector
      > name
      [1] "Rahim" "Rani"  "Raju"
      > c = h > 22                           # Logical vector
      > c
      [1] FALSE  TRUE FALSE FALSE  TRUE FALSE
      > a = c(1, 2, 3, 4, 5)
      > a
      [1] 1 2 3 4 5
      > a = 1:5
      > a
      [1] 1 2 3 4 5

  19. Vector Indexing
      > w = c(1, 3, 5, 2, 10)
      > w[3]                # the third element of w
      [1] 5
      > w[3:5]              # the third to fifth elements of w, inclusive
      [1]  5  2 10
      > w[w > 3]            # elements in w greater than 3
      [1]  5 10
      > w[-2]               # all except the second element
      [1]  1  5  2 10
      > w[w > 2 & w <= 5]   # greater than 2 and less than or equal to 5
      [1] 3 5
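
Logical indexing works because the comparison itself returns a logical vector. The sketch below makes that intermediate step visible and adds two related base R features not shown on the slide, which() and indexing by name; the element names are made up for illustration.

```r
w <- c(1, 3, 5, 2, 10)

big <- w > 3          # FALSE FALSE TRUE FALSE TRUE
selected <- w[big]    # 5 10
positions <- which(big)   # 3 5

# Indexing can also appear on the left-hand side of an assignment.
w2 <- w
w2[w2 > 3] <- 0       # 1 3 0 2 0

# Elements can be given names and then selected by name.
names(w) <- c("a", "b", "c", "d", "e")
by_name <- w[c("b", "e")]   # b = 3, e = 10
```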

  20. Vector
      Vector used in functions
      w = c(1, 3, 5, 2, 10)
      length(w)
      sum(w)
      cumsum(w)
      min(w)
      max(w)
      range(w)
      mean(w)
      median(w)
      var(w)
      sd(w)                        # standard deviation
      summary(w)
      abs(10 - 50)
      sort(w)
      sort(w, decreasing = TRUE)
      etc.
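
As a quick check of what some of these functions compute, the sketch below recomputes the sample variance by hand. The name manual_var is introduced only for illustration; note that base R provides sd(), not std(), for the standard deviation.

```r
w <- c(1, 3, 5, 2, 10)

n <- length(w)      # 5
total <- sum(w)     # 21
avg <- mean(w)      # 4.2
v <- var(w)         # 12.7 (sample variance, divisor n - 1)
s <- sd(w)          # square root of var(w)

# Recompute the variance from its definition.
manual_var <- sum((w - avg)^2) / (n - 1)
all.equal(v, manual_var)    # TRUE
all.equal(s, sqrt(v))       # TRUE

running <- cumsum(w)                     # 1 4 9 11 21
ordered <- sort(w, decreasing = TRUE)    # 10 5 3 2 1
```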

  21. Working in R: Using help
      • Specific R keyword: ?keyword or help(keyword)
        > ?mean          # information on mean command
        > help(mean)
        > help(median)
      • Finding a "vague" topic: ??topic or help.search("topic")
      • HTML help (CRAN full manual): help.start()
        > help.start()

  22. Array & Matrix
      A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions:
      # Generate a 3 by 4 array
      > x <- 1:12
      > dim(x) <- c(3, 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      # Generate a 4 by 5 array
      > A <- array(1:20, dim = c(4, 5))
      > A
           [,1] [,2] [,3] [,4] [,5]
      [1,]    1    5    9   13   17
      [2,]    2    6   10   14   18
      [3,]    3    7   11   15   19
      [4,]    4    8   12   16   20
      • The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 × 4 matrix.
      • Notice that the storage is column-major; that is, the elements of the first column are followed by those of the second, etc.

  23. Array & Matrix
      A matrix in mathematics is just a two-dimensional array of numbers. Matrices and arrays are represented as vectors with dimensions:
      # Generate a 3 by 4 matrix, filled row by row
      > A = matrix(1:12, nrow = 3, byrow = TRUE)
      > A
           [,1] [,2] [,3] [,4]
      [1,]    1    2    3    4
      [2,]    5    6    7    8
      [3,]    9   10   11   12
      # 3 x 2 matrix of 0
      > Y <- matrix(0, nrow = 3, ncol = 2)
      > Y
           [,1] [,2]
      [1,]    0    0
      [2,]    0    0
      [3,]    0    0
      > A[ , 2]    # 2nd column of matrix A
      [1]  2  6 10
      > A[3, ]     # 3rd row of matrix A
      [1]  9 10 11 12
      > A[2, 2]    # (2, 2)th element of matrix A
      [1] 6
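
The difference between the default column-wise fill and byrow = TRUE is easiest to see side by side. A minimal sketch; the names by_col and by_row are illustrative, and drop = FALSE is a standard base R indexing argument not covered on the slide.

```r
by_col <- matrix(1:12, nrow = 3)                 # default: fill column by column
by_row <- matrix(1:12, nrow = 3, byrow = TRUE)   # fill row by row

by_col[2, 2]   # 5
by_row[2, 2]   # 6
dim(by_row)    # 3 4

second_col <- by_row[, 2]    # 2 6 10 (a plain vector)
third_row  <- by_row[3, ]    # 9 10 11 12

# drop = FALSE keeps the result a matrix (here 1 x 4) instead of a vector.
row_as_matrix <- by_row[2, , drop = FALSE]

# Transposing the row-filled matrix gives the column-filled 4 x 3 matrix.
same <- all(t(by_row) == matrix(1:12, ncol = 3))
```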

  24. Basic operations – Matrix

  25. Basic operations – Matrix
      > A.mat <- matrix(c(19, 8, 11, 2, 18, 17, 15, 19, 10), nrow = 3)
      > A.mat
           [,1] [,2] [,3]
      [1,]   19    2   15
      [2,]    8   18   19
      [3,]   11   17   10
      > inv.A <- solve(A.mat)    # inverse of matrix A.mat
      > t(A.mat)                 # transpose of matrix A.mat
      > A.mat %*% inv.A          # matrix product; the identity matrix up to rounding error
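
A sketch that checks the inverse numerically and contrasts the matrix product %*% with the element-wise product *. The right-hand side rhs is a made-up vector used only to illustrate solving a linear system with the two-argument form of solve().

```r
A.mat <- matrix(c(19, 8, 11, 2, 18, 17, 15, 19, 10), nrow = 3)
inv.A <- solve(A.mat)

# A times its inverse should equal the identity, up to floating-point error.
product <- A.mat %*% inv.A
is_identity <- isTRUE(all.equal(product, diag(3)))

# "*" multiplies element by element; "%*%" is the matrix product.
elementwise <- A.mat * A.mat      # elementwise[1, 1] is 19^2 = 361
matprod     <- A.mat %*% A.mat

# Solve A x = rhs directly, without forming the inverse (rhs is hypothetical).
rhs <- c(1, 2, 3)
x <- solve(A.mat, rhs)
residual <- max(abs(A.mat %*% x - rhs))   # close to zero
```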

  26. Basic operations – Matrix
      > a = matrix(1:9, nrow = 3)
      > b = matrix(2:10, nrow = 3)
      > a
           [,1] [,2] [,3]
      [1,]    1    4    7
      [2,]    2    5    8
      [3,]    3    6    9
      > b
           [,1] [,2] [,3]
      [1,]    2    5    8
      [2,]    3    6    9
      [3,]    4    7   10
      > cbind(a, b)    # bind side by side (column-wise)
           [,1] [,2] [,3] [,4] [,5] [,6]
      [1,]    1    4    7    2    5    8
      [2,]    2    5    8    3    6    9
      [3,]    3    6    9    4    7   10
      > rbind(a, b)    # stack on top of each other (row-wise)
           [,1] [,2] [,3]
      [1,]    1    4    7
      [2,]    2    5    8
      [3,]    3    6    9
      [4,]    2    5    8
      [5,]    3    6    9
      [6,]    4    7   10
      Cov.matrix = cov(b)
      Cor.matrix = cor(b)
      Row.mean = apply(b, 1, mean)
      Col.mean = apply(b, 2, mean)
      NOTE: apply(X, MARGIN, FUN), where MARGIN = 1 applies FUN over rows and MARGIN = 2 over columns.
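
To make the MARGIN argument of apply() concrete, the sketch below runs it on the matrix b from the slide and compares the result with the base R shortcuts rowMeans() and colMeans(). The anonymous range function is an illustrative addition.

```r
b <- matrix(2:10, nrow = 3)

Row.mean <- apply(b, 1, mean)   # MARGIN = 1: one value per row    -> 5 6 7
Col.mean <- apply(b, 2, mean)   # MARGIN = 2: one value per column -> 3 6 9

# Built-in shortcuts give the same results.
same_rows <- isTRUE(all.equal(Row.mean, rowMeans(b)))
same_cols <- isTRUE(all.equal(Col.mean, colMeans(b)))

# FUN can be any function, including one written on the spot.
Col.range <- apply(b, 2, function(col) max(col) - min(col))   # 2 2 2
```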

  27. List
      • vector: an ordered collection of data of the same type.
        > a = c(7, 5, 1)
        > a[2]
        [1] 5
      • list: an ordered collection of data of arbitrary types.
        > a = list(Name = "Rahim", age = c(12, 23, 10), Married = FALSE)
        > a
        $Name
        [1] "Rahim"
        $age
        [1] 12 23 10
        $Married
        [1] FALSE
      • Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string).
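
A sketch of the usual ways to reach into the list from the slide. The extra element City is hypothetical and added only to show that lists can grow.

```r
a <- list(Name = "Rahim", age = c(12, 23, 10), Married = FALSE)

ages1 <- a$age         # access by name with $
ages2 <- a[["age"]]    # access by name with double brackets
ages3 <- a[[2]]        # access by position
second_age <- a$age[2] # 23

# Single brackets return a (shorter) list, not the element itself.
sub <- a["age"]
class(sub)             # "list"

# New elements can be added by assignment (City is a made-up example).
a$City <- "Rajshahi"
names(a)               # "Name" "age" "Married" "City"
```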

  28. Data frames
      • A data frame represents the typical data table that researchers come up with, like a spreadsheet.
      • It is a rectangular table whose columns all have the same length; data within each column have the same type (e.g. number, text, logical), but different columns may have different types.
      • Example:
        > a
          localisation tumorsize progress
        1     proximal       6.3    FALSE
        2       distal       8.0     TRUE
        3     proximal      10.0    FALSE
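
The example table can be built with data.frame() as sketched below. Setting stringsAsFactors = FALSE explicitly is an assumption made here so that the text column is character in every R version; the name col_types is illustrative.

```r
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep text as character, not factor
)

dim(a)                          # 3 rows, 3 columns
col_types <- sapply(a, class)   # one type per column
str(a)                          # compact overview of the structure
```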

  29. Making data frames We illustrate how to construct a data frame from the following car data.

  30. Making data frames
      > Make <- c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen", "Buick", "Mitsbusihi",
      +           "Dodge", "Chrysler", "Acura")
      > Model <- c("Civic", "Beretta", "Escort", "Summit", "Jetta", "Le Sabre", "Galant",
      +            "Grand Caravan", "NewYorker", "Legend")
      > Cylinder <- c(rep("V4", 5), "V6", "V4", rep("V6", 3))
      > Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)
      > Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
      > Type <- c("Sporty", "Compact", rep("Small", 3), "Large", "Compact", "Van",
      +           rep("Medium", 2))

  31. Making data frames
      Now the data.frame() function combines the six vectors into a single data frame.
      > Car <- data.frame(Make, Model, Cylinder, Weight, Mileage, Type)
      > Car

  32. Making data frames
      > names(Car)
      [1] "Make"     "Model"    "Cylinder" "Weight"   "Mileage"  "Type"
      > Car[1, ]
         Make Model Cylinder Weight Mileage   Type
      1 Honda Civic       V4   2170      33 Sporty
      > Car[10, 4]
      [1] 3265
      > Car$Mileage
       [1] 33 26 33 33 26 23 25 18 22 20
      > mean(Car$Mileage)    # average mileage of the 10 vehicles
      [1] 25.9
      > min(Car$Weight)
      [1] 2170

  33. Making data frames
      > table(Car$Type)    # gives a frequency table
      Compact   Large  Medium   Small  Sporty     Van
            2       1       2       3       1       1
      > table(Car$Make, Car$Type)    # Cross tabulation
                   Compact Large Medium Small Sporty Van
        Acura            0     0      1     0      0   0
        Buick            0     1      0     0      0   0
        Chevrolet        1     0      0     0      0   0
        Chrysler         0     0      1     0      0   0
        Dodge            0     0      0     0      0   1
        Eagle            0     0      0     1      0   0
        Ford             0     0      0     1      0   0
        Honda            0     0      0     0      1   0
        Mitsbusihi       1     0      0     0      0   0
        Volkswagen       0     0      0     1      0   0

  34. Making data frames
      > Make.Small <- Car$Make[Car$Type == "Small"]    # makes of the small cars
      > summary(Car$Mileage)    # gives summary statistics
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        18.00   22.25   25.50   25.90   31.25   33.00
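
Row selection by condition extends naturally to whole data frames. The sketch below rebuilds a four-column version of the car data (values and spellings as on the slides) and uses the base R functions subset() and tapply(), which the slides do not cover; the object names small, heavy and by_type are illustrative.

```r
Car <- data.frame(
  Make    = c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen", "Buick",
              "Mitsbusihi", "Dodge", "Chrysler", "Acura"),   # spelling as in the slides
  Weight  = c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265),
  Mileage = c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20),
  Type    = c("Sporty", "Compact", rep("Small", 3), "Large", "Compact", "Van",
              rep("Medium", 2)),
  stringsAsFactors = FALSE
)

# All columns for the small cars (rows where the condition is TRUE).
small <- Car[Car$Type == "Small", ]

# subset() does the same and can also pick columns.
heavy <- subset(Car, Weight > 3000, select = c(Make, Weight))

# Mean mileage within each vehicle type.
by_type <- tapply(Car$Mileage, Car$Type, mean)
```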

  35. Making data frames
      > b = data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))    # random values; output differs on each run
      > b
                  x            y           z
      1  -1.7651180  0.462309932  0.09230914
      2  -0.7340731 -1.681826091  0.66648791
      3  -0.4968900  1.728658405 -0.68281664
      4  -1.3217873  0.307030157  0.24192745
      5  -0.2070019  0.003892192  1.19591807
      6  -0.9633084  0.060328696 -1.40424843
      7  -1.1323626  1.079521099  1.63552915
      8  -0.7301976 -1.422012899 -0.16695860
      9   0.2979073  0.528152338  0.65995778
      10 -0.5759655  0.655296337 -0.39156127
      > cor(b)
                   x             y           z
      x 1.0000000000  0.0007151043  0.12151913
      y 0.0007151043  1.0000000000 -0.05770153
      z 0.1215191317 -0.0577015345  1.00000000
      > apply(b, 1, var)    # variance of each row
       [1] 1.42472853 1.39573092 1.80047438 0.85041478 0.57226442 0.56454121
       [7] 2.14379987 0.39516798 0.03357767 0.44098693
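
Because rnorm() draws random numbers, the values above change on every run. A sketch of how to make such an example reproducible with set.seed(); the seed value 2013 is arbitrary and the object names are illustrative.

```r
set.seed(2013)   # arbitrary seed; fixes the random number stream
b <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

row_var <- apply(b, 1, var)   # one variance per row (10 values)
col_var <- apply(b, 2, var)   # one variance per column (3 values)
cors <- cor(b)                # 3 x 3 correlation matrix, ones on the diagonal

# Resetting the same seed reproduces exactly the same data.
set.seed(2013)
b_again <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
reproducible <- identical(b, b_again)
```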

  36. Making data frames
      > b = data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
      > b
                  x            y           z
      1  -1.7651180  0.462309932  0.09230914
      2  -0.7340731 -1.681826091  0.66648791
      3  -0.4968900  1.728658405 -0.68281664
      4  -1.3217873  0.307030157  0.24192745
      5  -0.2070019  0.003892192  1.19591807
      6  -0.9633084  0.060328696 -1.40424843
      7  -1.1323626  1.079521099  1.63552915
      8  -0.7301976 -1.422012899 -0.16695860
      9   0.2979073  0.528152338  0.65995778
      10 -0.5759655  0.655296337 -0.39156127
      attach(b)
      lm.D9 <- lm(y ~ x)         # Regression of y on x
      lm.D90 <- lm(y ~ x - 1)    # omitting intercept
      anova(lm.D9)
      summary(lm.D9)
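
An alternative to attach() is to pass the data frame through the data argument of lm(). The sketch below assumes simulated data with an arbitrary seed; the names fit and fit0 are illustrative, not from the slides.

```r
set.seed(1)   # arbitrary seed so the example is reproducible
b <- data.frame(x = rnorm(10), y = rnorm(10))

fit  <- lm(y ~ x, data = b)       # regression of y on x, with intercept
fit0 <- lm(y ~ x - 1, data = b)   # same regression without intercept

coef(fit)    # two coefficients: (Intercept) and x
coef(fit0)   # one coefficient: x

anova(fit)                          # analysis of variance table
r2 <- summary(fit)$r.squared        # proportion of variance explained
```

Using data = b keeps the variables inside the data frame, so nothing needs to be detached afterwards.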

  37. Data Entry using Data Editor
      • R has a Data Editor with a spreadsheet-like interface.
      • The interface is quite useful for small data sets.
      • Suppose we want to construct a data frame based on the following data

  38. Data Entry using Data Editor
      • To do this, type
        > result <- data.frame(Roll = integer(0), Bstat101 = numeric(0), Bstat102 = numeric(0))
        > result <- edit(result)
      • Then enter the data in the Data Editor and close the Editor
        > result                    # To see the data
        > result <- edit(result)    # To modify the data

  39. Reading data from File
      • An entire data frame can be read directly with the read.table() function.
      # Reading data from an Excel .csv file
      > data1 <- read.table(file = "d:/RFiles/data1.csv", header = TRUE, sep = ",")
      > data1 <- read.csv(file = "d:/RFiles/data1.csv", header = TRUE)
      > data1
      # Reading data from a text file
      > data2 <- read.table(file = "d:/RFiles/data3.txt", header = TRUE, sep = "\t")
      > data2
      > attach(data1)
      > detach(data1)
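
The file paths on the slide only exist on the presenter's machine, so here is a self-contained sketch that writes a small CSV to a temporary file and reads it back. The data frame scores and its values are made up for illustration; write.csv(), tempfile() and unlink() are base R functions.

```r
# Made-up example data.
scores <- data.frame(Roll = 1:3, Mark = c(71.5, 64, 80))

# Write to a temporary .csv file, then read it back in two equivalent ways.
csv_file <- tempfile(fileext = ".csv")
write.csv(scores, csv_file, row.names = FALSE)

data1  <- read.csv(csv_file, header = TRUE)
data1b <- read.table(csv_file, header = TRUE, sep = ",")

# Columns can be used via $ without attach().
avg_mark <- mean(data1$Mark)

unlink(csv_file)   # remove the temporary file
```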

  40. Importing from other statistical systems
      Package foreign on CRAN provides import facilities for files produced by the following statistical software.
      > read.mtp        # imports a 'Minitab Portable Worksheet'
      > read.xport      # reads a file in SAS XPORT (transport) format
      > read.spss       # reads files created by SPSS
      Package Rstreams on CRAN contains functions
      > readSfile       # reads binary objects produced by S-PLUS
      > data.restore    # reads S-PLUS data dumps (created by data.dump)

  41. Thanks
