Statistical Software

Statistical Software. An introduction to Statistics Using R. Instructed by Jinzhu Jia. Chap 1. R Basics. Installing R, R Data Structures, Vectors, Matrices and Arrays, Lists, Data Frames, Factors, Objects. Installing R. R can be downloaded freely from http://www.r-project.org.

Presentation Transcript


  1. Statistical Software An introduction to Statistics Using R Instructed by Jinzhu Jia

  2. Chap 1. R Basics • Installing R • R Data Structures • Vectors • Matrices and Arrays • Lists • Data Frames • Factors • Objects

  3. Installing R • R can be downloaded freely from http://www.r-project.org. • Versions are available for Windows, macOS, and Linux.

  4. R Interface

  5. An Example • Through this example, we will learn which data structures R uses: • Data frames • Vectors • Factors • Lists • Matrices

  6. Using R as a calculator • R provides built-in functions such as sin(), exp(), and log(). • Try sin(pi/2) and Sin(pi/2): the second call fails because R is case sensitive. • We will talk more about functions later.
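The calculator behaviour described on this slide can be tried with a short sketch like the one below. The tryCatch() wrapper and the message string are only there so that the failing call does not stop the script; they are not part of the original slides.

```r
# Built-in math functions work like a calculator.
sin(pi / 2)    # sine of a right angle
exp(1)         # Euler's number e
log(exp(2))    # log() is the natural logarithm, so it undoes exp()

# R is case sensitive: no function called Sin exists, so the call is an error.
# tryCatch() turns that error into a string instead of stopping the script.
result <- tryCatch(Sin(pi / 2), error = function(e) "error: Sin not found")
print(result)
```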

  7. Vectors • A vector is an ordered collection of elements of the same basic type. • Numeric vectors • Logical vectors • Character vectors

  8. Numeric vectors • final_scores <- c(100,99,98) • ## creates a vector • ## this is also an assignment statement • ## notice the differences between R and C • "final_scores" is the name of the created variable. • "<-" is the assignment operator. • 100, 99, 98 are the values of the elements of the created vector; they are concatenated with the function c(). • Type the variable name in R and press Enter to print the variable on screen.

  9. Numeric vectors -- variables • A variable is used to store information. • Its value can be altered. • Variable names use A-Z, a-z, 0-9, period (.) and underscore (_). • Variable names cannot include spaces. • Variable names are case sensitive. • Variable names must start with a letter or a period; a leading period must not be followed by a digit. • Variable names cannot be one of the reserved keywords.

  10. Vectors-- How long is a vector? • length() • A vector is an R object. • Each object has two intrinsic attributes: mode (or type) and length. • We can use mode() and length() to find these two attributes.
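As a small illustration of the two intrinsic attributes, the sketch below applies mode() and length() to the final_scores vector from the earlier slide. The character and logical vectors are made up for comparison.

```r
final_scores <- c(100, 99, 98)
final_scores                      # typing the name prints the vector

score_mode <- mode(final_scores)  # "numeric"
score_len <- length(final_scores) # 3

# Hypothetical vectors of the other two basic types, for comparison.
student_ids <- c("s01", "s02")
passed <- final_scores > 98.5

mode(student_ids)                 # "character"
mode(passed)                      # "logical"
```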

  11. Vectors – Change length of a vector • Below is an equivalent way to create the above vector: • final_scores2 <- numeric() ## the length is 0 • final_scores2[1] = 100 ## the length is 1 • final_scores2[2] = 99 ## the length is 2 • final_scores2[3] = 98 ## the length is 3; note that `=' is also an assignment operator • Try the following operations: • X = 1:10 • length(X) = 3 • What is X now? What is the difference between () and []?
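One way to explore the questions on this slide is sketched below. It shows that shortening a vector truncates it and lengthening it pads with NA, and that parentheses call a function while square brackets index into a vector. The variable name padded is introduced here only for illustration.

```r
# Grow a vector by assigning past its current end.
final_scores2 <- numeric()        # length 0
final_scores2[1] <- 100           # length 1
final_scores2[3] <- 98            # skipping position 2 fills it with NA

# Shrink a vector by assigning to its length.
X <- 1:10
length(X) <- 3                    # X is now 1 2 3

# Growing again pads with NA.
padded <- X
length(padded) <- 5               # 1 2 3 NA NA

# () calls a function; [] selects elements.
length(X)                         # function call
X[2]                              # second element
```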

  12. Vectors – Basic operations

  13. Vectors – Index vectors • An index vector is used to select subsets of a vector. • Below are four types of index vectors • A logical vector • A vector of positive integers • A vector of negative integers • A vector of character strings

  14. Logical index vectors • A logical index vector should have the same length as the vector from which elements are selected (a shorter one is recycled). • Elements corresponding to TRUE in the index vector are selected. • For example, find the scores that are 85 or higher: • scores[scores >= 85] • Y <- X[!is.na(X)] • scores[gender == 'Male']

  15. Positive or negative index vectors • A positive index vector can be of any length. • It specifies which elements should be included in the result, and an index may be repeated. • X[c(1,5,6,1,2,1)] • A negative index vector specifies which elements should be excluded. • X[-c(2,3)]
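The three logical-index expressions from this slide can be run on small made-up data as follows. The scores reuse the values that appear later in the deck, while the gender values and the vector X are assumptions for illustration.

```r
scores <- c(90, 85, 93, 78)
gender <- c("Male", "Female", "Male", "Female")   # hypothetical labels

# The comparison produces a logical vector, which then selects elements.
high <- scores[scores >= 85]                      # 90 85 93

# Drop missing values.
X <- c(1, NA, 3)
Y <- X[!is.na(X)]                                 # 1 3

# Select by a condition on a second vector of the same length.
male_scores <- scores[gender == "Male"]           # 90 93
```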
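A quick sketch of both index types, using a made-up vector X so that the selected positions are easy to read off:

```r
X <- c(10, 20, 30, 40, 50, 60)

# Positive indices pick elements; repeats are allowed, so the result
# can be longer or shorter than X itself.
picked <- X[c(1, 5, 6, 1, 2, 1)]   # 10 50 60 10 20 10

# Negative indices drop elements and keep everything else in order.
dropped <- X[-c(2, 3)]             # 10 40 50 60
```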

  16. Index vectors with character strings • This index vector is used when a vector has a names attribute. • scores = c(90,85,93,78) • names(scores) = c('LiBai','LiHei', 'Li Hong', 'LiXiaolan') • scores[c('LiBai','Li Hong')]

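  17. Vectors – A useful example • Plot a unit circle – a circle centered at 0 with radius 1. • X = seq(from = -1,to = 1, by = 0.001) • Y = sqrt(1 - X^2) • Z = c(Y,-Y) • plot(rep(X,2),Z,type = 'l') ## draws both halves, but joins them with a straight line across the diameter • n = length(X) • X1 = c(X,X[n:1]) • Z1 = c(Y, -Y[n:1]) • plot(X1,Z1,type = 'l') ## traces the lower half backwards, so the curve is continuous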

  18. Vectors -- Help • Try the following commands: • ?plot • ?seq • ?rep • ?'(' • ?'[' • You can also search the web, e.g. with Google or Baidu.

  19. Logical vectors • The elements of a logical vector can have the value TRUE, FALSE, or NA (not available). • Comparison operators: >, <, ==, >=, <=, != • Logical operators: • & (and) • | (or) • ! (not)
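The operators listed above work element by element. The sketch below uses made-up scores and also shows how NA behaves: the result is NA only when the missing value could change the answer.

```r
scores <- c(90, 85, 93, 78)

in_range <- scores > 80 & scores < 92   # TRUE TRUE FALSE FALSE
extreme  <- scores < 80 | scores > 92   # FALSE FALSE TRUE TRUE
not_85   <- scores != 85                # TRUE FALSE TRUE TRUE

# NA combined with a logical value.
na_and_false <- NA & FALSE              # FALSE, whatever NA stands for
na_or_true   <- NA | TRUE               # TRUE, whatever NA stands for
na_and_true  <- NA & TRUE               # NA, the answer depends on the NA
```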

  20. Character vectors • Each element is a character string: a sequence of characters delimited by double quotes or single quotes (there is no difference between the two). • For example, c('Li Bai', 'Li Hong', 'Li Xiaolan') is a character vector with 3 elements. • A useful function: paste() • paste(c('a','b','c'),c('1','2','3')) • paste(c('X','Y'), 1:10, sep = "") • Can you see the differences?
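Running the two paste() calls from this slide makes the differences visible: the default separator is a single space, and the shorter argument is recycled to match the longer one. The variable names are introduced here only to hold the results.

```r
# Equal-length arguments are pasted pairwise, separated by a space.
pairs <- paste(c("a", "b", "c"), c("1", "2", "3"))
pairs    # "a 1" "b 2" "c 3"

# c("X", "Y") is recycled to length 10, and sep = "" removes the space.
labels <- paste(c("X", "Y"), 1:10, sep = "")
labels   # "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
```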

  21. A simple text mining example • text1 = "China's Jade Rabbit moon rover has endured a long lunar night but is still malfunctioning, state media said on Thursday, after technical problems last month cast uncertainty over the country's first moon landing." • text2 = "Jade Rabbit, named after a lunar goddess in traditional Chinese mythology, landed to domestic fanfare in mid-December, on a mission to do geological surveys and hunt natural resources." • Questions: (1) How many characters are there in text1? (2) How many unique words are there in both text1 and text2? Hint: search the web for suitable functions.
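One possible approach to these two questions is sketched below; it is not necessarily the solution the instructor had in mind. It assumes that words are compared in lower case and that anything other than a letter or an apostrophe separates words. Because question (2) can be read either way, the sketch computes both the words shared by the two texts and the words appearing in either. The helper split_words() is a name made up for this example.

```r
text1 <- paste("China's Jade Rabbit moon rover has endured a long lunar night",
               "but is still malfunctioning, state media said on Thursday,",
               "after technical problems last month cast uncertainty over",
               "the country's first moon landing.")
text2 <- paste("Jade Rabbit, named after a lunar goddess in traditional Chinese",
               "mythology, landed to domestic fanfare in mid-December, on a mission",
               "to do geological surveys and hunt natural resources.")

# (1) Number of characters, counting spaces and punctuation.
n_chars <- nchar(text1)

# Hypothetical helper: lower-case the text and split on non-letters.
split_words <- function(txt) {
  w <- strsplit(tolower(txt), "[^a-z']+")[[1]]
  w[w != ""]                               # drop empty pieces, if any
}

words1 <- unique(split_words(text1))
words2 <- unique(split_words(text2))

# (2) Two readings of the question.
common_words <- intersect(words1, words2)  # words found in both texts
all_words <- union(words1, words2)         # words found in either text
length(common_words)
length(all_words)
```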

  22. Factors • A factor is a vector used to represent categories; we will learn more about factors later. • Here is just one example: • tapply(final,gender,mean), where gender is a factor; this function returns an array. • The function tapply() applies a function, here mean(), to each group of components of the first argument, here final, defined by the levels of the second argument, here gender, as if the groups were separate vectors.
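The tapply() call can be tried on a tiny made-up data set. The values of final and gender below are assumptions chosen so that the group means are easy to check by hand.

```r
final <- c(90, 85, 93, 78)                                # hypothetical final scores
gender <- factor(c("Male", "Female", "Male", "Female"))   # hypothetical groups

levels(gender)                              # "Female" "Male" (sorted alphabetically)

# Mean of final within each level of gender.
group_means <- tapply(final, gender, mean)
group_means                                 # Female 81.5, Male 91.5
```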

  23. A note on vector-recycling rule • Look at the following example: • X <- c(3,5,6) • Y <- 1 • Z <- c(1,2,3,4,5,6) • X+Y is computed as c(3,5,6) + c(1,1,1) • X+Z is computed as c(3,5,6,3,5,6) + c(1,2,3,4,5,6) • In words, shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector.
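The recycling rule can be checked directly with the vectors from this slide. The last case is an added illustration of fractional recycling: R still computes a result but issues a warning, which is suppressed here so that the sketch runs quietly.

```r
X <- c(3, 5, 6)
Y <- 1
Z <- c(1, 2, 3, 4, 5, 6)

xy <- X + Y    # Y is recycled to c(1, 1, 1):           4 6 7
xz <- X + Z    # X is recycled to c(3, 5, 6, 3, 5, 6):  4 7 9 7 10 12

# Fractional recycling: c(10, 20) is extended to c(10, 20, 10).
# Without suppressWarnings(), R warns that the lengths do not divide evenly.
frac <- suppressWarnings(X + c(10, 20))   # 13 25 16
```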

  24. Matrices and Arrays • Construction of a matrix: • X = matrix(,nrow = 2,ncol=2) ## a 2 x 2 matrix filled with NA • X[1,1] = 2 • X[2,2] = 3 • X = matrix(1:9,ncol=3) ## filled column by column • X = matrix(1:9,ncol=3,byrow = TRUE) ## filled row by row • as.vector(X) ## turn a matrix into a vector • c(X) ## the same as as.vector(X)

  25. Index matrices • Index matrices are used to extract information • Extract elements: X[1,3] • Extract a row: X[1,] • Extract a column: X[,2] • Extract a few rows and columns: X[c(1,2),c(3,3,2)]
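The construction and indexing expressions from the last two slides can be combined into one runnable sketch. The name Xr for the row-wise matrix is introduced here so that both fill orders can be compared side by side.

```r
X <- matrix(1:9, ncol = 3)                 # filled column by column
Xr <- matrix(1:9, ncol = 3, byrow = TRUE)  # filled row by row

X[1, 3]                   # single element: 7
X[1, ]                    # first row: 1 4 7
X[, 2]                    # second column: 4 5 6

# Rows 1 and 2, with column 3 taken twice and then column 2.
sub <- X[c(1, 2), c(3, 3, 2)]
sub                       # rows are 7 7 4 and 8 8 5

# Converting back to a vector always reads column by column.
as.vector(Xr)             # 1 4 7 2 5 8 3 6 9
```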

  26. Higher dimensional array • We take a 3-dimensional array as an example. • It can store several matrices. • For example: Z = array(dim=c(3,3,2)); Z[,,1] = X1; Z[,,2] = X2, where X1 and X2 are 3 x 3 matrices.

  27. Operations on Matrices • Transpose: t(X) • Dimensions: dim(X), ncol(X), nrow(X) • Addition: X + Y • Subtraction: X - Y • Matrix multiplication: X %*% Y, not X*Y (X*Y multiplies element by element) • Inversion: solve(X) • diag(): investigate diag(X), diag(c(1,2,3)), diag(3)
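The difference between X*Y and X %*% Y, and the three uses of diag(), are easiest to see on a small example. The matrices below are made up for illustration.

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns are (2, 1) and (1, 3)
Y <- matrix(1:4, nrow = 2)             # columns are (1, 2) and (3, 4)

t(Y)                         # transpose
dim(X)                       # 2 2

elementwise <- X * Y         # multiplies matching entries
matprod <- X %*% Y           # true matrix product

X_inv <- solve(X)            # inverse of X
X_inv %*% X                  # identity matrix, up to rounding error

diag(X)                      # diagonal of a matrix: 2 3
diag(c(1, 2, 3))             # 3 x 3 matrix with 1, 2, 3 on the diagonal
diag(3)                      # 3 x 3 identity matrix
```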

  28. Eigenvalues and SVD • Obj = eigen(X) ## eigenvalue decomposition • Obj2 = svd(X) ## singular value decomposition • Each returns a list.
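To see what the returned lists contain, the sketch below decomposes a small symmetric matrix (made up for this example) and rebuilds it from the pieces. The reconstruction from eigen() in this form relies on X being symmetric.

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # a symmetric 2 x 2 matrix

Obj <- eigen(X)
names(Obj)                   # "values" "vectors"
Obj$values                   # eigenvalues, largest first

Obj2 <- svd(X)
names(Obj2)                  # "d" "u" "v"
Obj2$d                       # singular values, largest first

# Rebuild X from each decomposition.
X_from_eigen <- Obj$vectors %*% diag(Obj$values) %*% t(Obj$vectors)
X_from_svd <- Obj2$u %*% diag(Obj2$d) %*% t(Obj2$v)
```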

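  29. cbind() and rbind() • cbind() forms a matrix by binding vectors or matrices together column-wise. • rbind() forms a matrix by binding them row-wise. • Vectors are treated as matrices with a single column (or row). • The recycling rule is applied to shorter vectors. • For example, try cbind(1,c(1,2),c(1,2,3)); R warns when a vector length does not divide the number of rows.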
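A short sketch of both functions follows. The first two calls use made-up vectors. The last call is the example from the slide, wrapped in suppressWarnings() because the length-2 vector does not fit evenly into 3 rows.

```r
A <- cbind(1:3, c(10, 20, 30))   # 3 x 2 matrix, one column per argument
B <- rbind(1:3, 4:6)             # 2 x 3 matrix, one row per argument

ones <- cbind(1, 1:3)            # the scalar 1 is recycled down the column

# The slide's example: the result has 3 rows, so 1 and c(1, 2) are recycled.
M <- suppressWarnings(cbind(1, c(1, 2), c(1, 2, 3)))
M
```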

  30. More comments on factors • table() returns frequency tables. • Examples: • tabl = tapply(gender,gender,length) • tabl2 = table(gender) • Best_scores = cut(final,breaks = c(min(final)-0.5,85,max(final)+0.5)) ## a factor with two score ranges, split at 85 • Tab3 = table(Best_scores,gender)
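These examples can be run on the same kind of made-up data used for tapply(). The values of final and gender are assumptions; the break points follow the slide.

```r
final <- c(90, 85, 93, 78)                                # hypothetical scores
gender <- factor(c("Male", "Female", "Male", "Female"))   # hypothetical groups

tabl <- tapply(gender, gender, length)   # counts per level via tapply()
tabl2 <- table(gender)                   # the same counts via table()

# cut() turns numbers into a factor of intervals: (77.5,85] and (85,93.5].
Best_scores <- cut(final, breaks = c(min(final) - 0.5, 85, max(final) + 0.5))
levels(Best_scores)

# Cross-tabulate score range against gender.
Tab3 <- table(Best_scores, gender)
Tab3
```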

  31. Lists • Recall that a vector consists of an ordered collection of elements of the same basic type. • A matrix also contains elements of a single type (numeric, for example). • A new type of object called a list consists of an ordered collection of objects of any kind, such as vectors, matrices, and other lists.

  32. Construction of a list • list(name1 = obj1, name2 = obj2) • A list is very useful for returning several values from a function. • For example, obj = svd(X). This obj is a list; it contains the singular values and the singular vectors. • Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9)) • What is the difference between Lst[1] and Lst[[1]]?

  33. Modifying Lists • Lst$wife and Lst[['wife']] both retrieve the component of the list named 'wife'. • You can also write Lst$w for Lst$wife as long as w identifies 'wife' uniquely, i.e. no other component name starts with 'w'. • You can concatenate lists with c(), e.g. c(list1,list2,list3).
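The access patterns from the last two slides can be compared using the Lst example. The extra list passed to c() is made up for illustration.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

first_as_list <- Lst[1]       # single brackets return a list of length 1
first_value <- Lst[[1]]       # double brackets return the component: "Fred"

Lst$wife                      # "Mary"
Lst[["wife"]]                 # same component, selected by name
partial <- Lst$w              # partial matching works because only 'wife' starts with 'w'

# Concatenate lists with c(); the second list is a hypothetical addition.
bigger <- c(Lst, list(pet = "cat"))
length(bigger)                # 5
```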

  34. Data Frames • A data frame is a special list. • It is a list of vectors of the same length. • A data frame is a list with the components arranged like a matrix – each column is one component of the list. • Some Examples:
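The examples for this slide did not survive in the transcript, so the following is only a sketch of what a data frame looks like. The student names reuse those from the character-index slide; the gender and final columns are made-up values.

```r
DF <- data.frame(
  name = c("LiBai", "LiHei", "Li Hong", "LiXiaolan"),
  gender = c("Male", "Female", "Male", "Female"),   # hypothetical
  final = c(90, 85, 93, 78)
)

is.list(DF)                   # TRUE: a data frame is a special list
nrow(DF)                      # 4 rows
ncol(DF)                      # 3 columns, i.e. 3 list components

DF$final                      # list-style access to one column
DF[["final"]]                 # the same column
DF[2, "final"]                # matrix-style access: row 2, column 'final'

# Logical indexing on rows, as with vectors.
top <- DF[DF$final >= 85, ]
top$name
```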

  35. attach() and detach() • After using attach(DF), you can use each column of DF as a vector whose name is the column name. • attach() works on a copy, so assigning to an attached name does not change the original column in DF. • After using detach(DF), the column names of DF are no longer available as variable names.
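The behaviour described above can be sketched as follows, with a small made-up data frame. The last line shows with(), a commonly used alternative that gives the same access to columns without changing the search path; it is mentioned here as an option and is not part of the original slides.

```r
DF <- data.frame(final = c(90, 85, 93, 78),
                 midterm = c(80, 88, 91, 70))   # hypothetical scores

attach(DF)
avg <- mean(final)            # 'final' now refers to the column DF$final

final_copy <- final           # work on a copy of the attached column
final_copy[1] <- 0            # DF itself is not changed

detach(DF)
DF$final[1]                   # still 90

# Alternative without attach(): evaluate an expression inside DF.
avg2 <- with(DF, mean(final))
```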

  36. Objects • The following are all R objects: • Vectors • Matrices and Arrays • Lists • Data Frames • Factors

  37. References • http://www.r-tutor.com/r-introduction/ • http://cran.r-project.org/doc/manuals/R-intro.pdf • http://ua.edu.au/ccs/teaching/lsr
