
R Data Import/Export

R Data Import/Export. Dr. Jieh-Shan George YEH, jsyeh@pu.edu.tw. Save and Load R Data: data in R can be saved as .Rdata files with the function save() and restored with load().


Presentation Transcript


  1. R Data Import/Export Dr. Jieh-Shan George YEH jsyeh@pu.edu.tw

  2. Save and Load R Data
     • Data in R can be saved as .Rdata files with the function save() and restored with load().

         getwd()
         setwd("c:\\temp")
         a <- 1:10
         save(a, file = "dumData.Rdata")
         rm(a)
         load("dumData.Rdata")
         print(a)
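
A minimal sketch of the same round trip with more than one object, written to a temporary directory so that no setwd() call is needed. The file name twoObjects.Rdata and the object b are assumptions made for illustration.

```r
# Save two objects into one file in a temporary directory (hypothetical names).
path <- file.path(tempdir(), "twoObjects.Rdata")
a <- 1:10
b <- c("x", "y")
save(a, b, file = path)

# Remove them, then restore; load() returns the names of the restored objects.
rm(a, b)
restored <- load(path)
print(restored)
print(a)
```

Keeping the value returned by load() is a convenient way to see which objects a file put back into the workspace.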

  3. Fixed-width-format files

         cat("2 3 5 7", "11 13 17 19", file = "ex1.data", sep = "\n")
         scan(file = "ex1.data", what = list(x = 0, y = "", z = 0), flush = TRUE)

         cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex2.data", sep = "\n")
         pp <- scan("ex2.data", skip = 1, quiet = TRUE)
         scan("ex2.data", skip = 1)
         scan("ex2.data", skip = 1, nlines = 1)  # only 1 line after the skipped one
         pp2 <- scan("ex2.data", what = list("", "", ""))  # flush is FALSE -> read "7"
         pp3 <- scan("ex2.data", what = list("", "", ""), flush = TRUE)
         unlink("ex2.data")  # unlink deletes the file
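
The scan() calls on this slide read whitespace-separated fields. For a file whose fields are defined purely by column position, base R also offers read.fwf(); the sketch below is an assumption about what a fixed-width example could look like, not part of the original slides.

```r
# Two records with no separators: fields are 1, 2 and 3 characters wide.
ff <- tempfile()
writeLines(c("123456", "987654"), ff)

# widths gives the number of characters in each field, in order.
fw <- read.fwf(ff, widths = c(1, 2, 3))
print(fw)
unlink(ff)
```

The columns get the default names V1, V2 and V3, as with read.table().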

  4. Import from and Export to .CSV Files
     • Create a data frame df1 and save it as a .CSV file with write.csv().
     • The data frame is then loaded from the file into df2 with read.csv().

         var1 <- 1:5
         var2 <- (1:5) / 10
         var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
         df1 <- data.frame(var1, var2, var3)
         names(df1) <- c("VariableInt", "VariableReal", "VariableChar")
         write.csv(df1, "dummmyData.csv", row.names = FALSE)
         df2 <- read.csv("dummmyData.csv")
         print(df2)
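
A short sketch of why the slide passes row.names = FALSE. It repeats the round trip with a temporary file (an assumption, to avoid writing into the working directory) and sets stringsAsFactors explicitly so the result does not depend on the R version.

```r
df1 <- data.frame(VariableInt  = 1:5,
                  VariableReal = (1:5) / 10,
                  VariableChar = c("R", "and", "Data Mining", "Examples", "Case Studies"),
                  stringsAsFactors = FALSE)
csv_path <- tempfile(fileext = ".csv")

# Without row names the data frame survives the round trip unchanged.
write.csv(df1, csv_path, row.names = FALSE)
df2 <- read.csv(csv_path, stringsAsFactors = FALSE)
print(all.equal(df1, df2))

# With the default row.names = TRUE, the row names come back as an extra column "X".
write.csv(df1, csv_path)
df3 <- read.csv(csv_path, stringsAsFactors = FALSE)
print(names(df3))
unlink(csv_path)
```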

  5. Scan
     • One common use of scan is to read in a large matrix. Suppose the file matrix.dat contains just the numbers for a 200 x 2000 matrix.
     • Then we can use

         A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)

       On one test this took 1 second under Linux (3 seconds under Windows on the same machine), whereas

         A <- as.matrix(read.table("matrix.dat"))

       took 10 seconds (and more memory), and

         A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200,
                                   comment.char = "", colClasses = "numeric"))

       took 7 seconds.
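
To try the two approaches without a real matrix.dat, the following sketch writes a small 20 x 30 matrix to a temporary file and reads it back both ways. The matrix size and the temporary file are assumptions; timings on such a small file will not resemble the figures quoted on the slide.

```r
m <- matrix(1:600, nrow = 20, ncol = 30)
mat_path <- tempfile()

# write() emits its argument column by column, so pass t(m) to store m row by row.
write(t(m), mat_path, ncolumns = 30)

# Read with scan() and reshape, filling by row to match the file layout.
A <- matrix(scan(mat_path, n = 20 * 30, quiet = TRUE), 20, 30, byrow = TRUE)

# Read with read.table(), giving it the hints used on the slide.
B <- as.matrix(read.table(mat_path, header = FALSE, nrows = 20,
                          comment.char = "", colClasses = "numeric"))

print(all(A == m))
print(all.equal(A, unname(B)))
unlink(mat_path)
```

The byrow = TRUE argument matters because the file stores one matrix row per line, while matrix() fills by column unless told otherwise.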

  6. Note that timings can depend on the type read and the data.

         writeLines(as.character((1+1e6):2e6), "ints.dat")
         xi <- scan("ints.dat", what = integer(0), n = 1e6)    # 0.77s
         xn <- scan("ints.dat", what = numeric(0), n = 1e6)    # 0.93s
         xc <- scan("ints.dat", what = character(0), n = 1e6)  # 0.85s
         xf <- as.factor(xc)                                   # 2.2s
         DF <- read.table("ints.dat")                          # 4.5s

  7. The same comparison for character data:

         code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
         writeLines(sample(code, 1e6, replace = TRUE), "code.dat")
         y <- scan("code.dat", what = character(0), n = 1e6)  # 0.44s
         yf <- as.factor(y)                                   # 0.21s
         DF <- read.table("code.dat")                         # 4.9s
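
The timings in the comments above can be reproduced on your own machine with system.time(). This sketch uses 1e4 values rather than 1e6 so that it runs quickly, and a temporary file instead of ints.dat; both choices are assumptions, and the measured numbers will differ from those on the slides.

```r
n <- 1e4  # much smaller than the 1e6 used on the slides
int_path <- tempfile()
writeLines(as.character(seq_len(n)), int_path)

# system.time() evaluates the expression and reports user, system and elapsed time.
t_scan  <- system.time(xi <- scan(int_path, what = integer(0), n = n, quiet = TRUE))
t_table <- system.time(DF <- read.table(int_path))

print(c(scan = t_scan[["elapsed"]], read.table = t_table[["elapsed"]]))
unlink(int_path)
```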

  8. Reshaping wide data into long form with stack():

         zz <- read.csv("mr.csv", strip.white = TRUE)
         zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))
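
The file mr.csv is not included with the slides, so here is a sketch of what the cbind/stack line does on a made-up data frame. The column names and values are hypothetical; the only structure assumed from the slide is two identifying columns followed by four measurement columns.

```r
# Hypothetical wide data: two identifying columns, then four measurements.
zz <- data.frame(subject = c("a", "b", "c"),
                 age     = c(30, 41, 25),
                 t1 = c(1, 2, 3),  t2 = c(4, 5, 6),
                 t3 = c(7, 8, 9),  t4 = c(10, 11, 12),
                 stringsAsFactors = FALSE)

# gl() yields the row indices 1..n repeated four times, so the identifying
# columns are replicated once per measurement column; stack() puts the four
# measurement columns end to end and records their source in 'ind'.
zzz <- cbind(zz[gl(nrow(zz), 1, 4 * nrow(zz)), 1:2], stack(zz[, 3:6]))
print(zzz)
```

Each original row now appears four times, once per measurement, with the measurement value in values and its original column name in ind.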

  9. read.table

         HousePrice <- read.table("houses.data")
         HousePrice <- read.table("houses.data", header = TRUE)
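
The difference between the two calls shows up when the first line of the file holds column names. The sketch below uses a small made-up file in place of houses.data; its column names and values are assumptions.

```r
house_path <- tempfile()
writeLines(c("Price Floor Area",
             "52.00 111.0 830",
             "54.75 128.0 710"), house_path)

# Without header = TRUE the name line is read as data, so every column is non-numeric.
h1 <- read.table(house_path)
# With header = TRUE the first line supplies the column names.
h2 <- read.table(house_path, header = TRUE)

str(h1)
str(h2)
unlink(house_path)
```

read.table() only assumes a header on its own when the first line has one field fewer than the rest, so it is safer to state header = TRUE explicitly.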

  10. scan() function

          inp <- scan("input.dat", list("", 0, 0))
          inp <- scan("input.dat", list(id = "", x = 0, y = 0))
          X <- matrix(scan("light.dat", 0), ncol = 5, byrow = TRUE)
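
A small sketch of what the named list form returns, using a made-up two-line file in place of input.dat (the contents are assumptions). Each list element in what sets the type of one field, and the names become the names of the result.

```r
in_path <- tempfile()
writeLines(c("a 1.5 2", "b 3 4.25"), in_path)

# One character field followed by two numeric fields per record.
inp <- scan(in_path, what = list(id = "", x = 0, y = 0), quiet = TRUE)
str(inp)

# The result is a list of equal-length vectors, so it converts directly to a data frame.
inp_df <- as.data.frame(inp, stringsAsFactors = FALSE)
print(inp_df)
unlink(in_path)
```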

  11. Built-in datasets

  12. Accessing built-in datasets
      • Around 100 datasets are supplied with R (in package datasets).

          data()
          data(infert)

      • To access data from a particular package, use the package argument.

          data(package = "rpart")
          data(Puromycin, package = "datasets")
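
Called without a dataset name, data() returns an object whose results component can be inspected programmatically. The sketch below lists the items in package datasets and loads one dataset into a separate environment; using a separate environment is an assumption made here to keep the workspace clean.

```r
# The 'results' matrix has one row per dataset; the "Item" column holds the names.
items <- data(package = "datasets")$results[, "Item"]
print(length(items))
print("infert" %in% items)

# Load a dataset into its own environment instead of the global workspace.
e <- new.env()
data(Puromycin, package = "datasets", envir = e)
str(e$Puromycin)
```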

  13. Editing data
      • edit() is useful for making small changes once a data set has been read. The commands

          data(car90, package = "rpart")
          xnew <- edit(car90)

        let you edit car90 in a spreadsheet-like window and assign the changed object to xnew.
      • If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold).
      • To enter new data via the spreadsheet interface, use

          xnew <- edit(data.frame())

  14. Package ‘xlsx’

  15. Package ‘xlsx’
      • http://cran.r-project.org/web/packages/xlsx/xlsx.pdf

          install.packages("xlsx")
          require(xlsx)

          # example of reading xlsx sheets
          file <- system.file("tests", "test_import.xlsx", package = "xlsx")
          res <- read.xlsx(file, 2)  # read the second sheet

          # example of writing xlsx sheets
          # USArrests contains statistics, in arrests per 100,000 residents, for assault,
          # murder, and rape in each of the 50 US states in 1973, plus the percent of the
          # population living in urban areas.
          file <- paste(tempfile(), "xlsx", sep = ".")
          write.xlsx(USArrests, file = file)

          res <- read.xlsx("mydata.xlsx", 1, encoding = "UTF-8")  # read the first sheet
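
A hedged sketch of a complete write-then-read round trip. It assumes the xlsx package (and the Java runtime it depends on) is installed, and it skips the work otherwise; the sheet name "arrests" and the temporary file are choices made for illustration.

```r
if (requireNamespace("xlsx", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")

  # Write the data frame to a named sheet, then read the first sheet back.
  xlsx::write.xlsx(USArrests, file = xlsx_path, sheetName = "arrests")
  back <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)

  print(head(back))
  unlink(xlsx_path)
} else {
  message("Package 'xlsx' is not available; skipping the round trip.")
}
```

Because write.xlsx() writes row names by default, the state names come back as the first column of the data frame that is read in.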

  16. Output to connections

          zz <- file("ex.data", "w")  # open an output file connection
          cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
          cat("One more line\n", file = zz)
          close(zz)
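
The mode string passed to file() decides whether existing content is kept. This sketch writes to a temporary file (an assumption, in place of ex.data), then reopens it in append mode "a" and adds a line; writeLines() is used here because it terminates every line with a newline.

```r
out_path <- tempfile()

# Mode "w" creates the file or truncates an existing one.
con <- file(out_path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Mode "a" keeps what is there and writes at the end.
con <- file(out_path, "a")
writeLines("appended line", con)
close(con)

result <- readLines(out_path)
print(result)
unlink(out_path)
```

Closing the connection is what guarantees the output has been flushed to disk before the file is read again.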

  17. Output to connections

          ## capture R output: use examples from help(lm)
          zz <- textConnection("ex.lm.out", "w")
          sink(zz)
          example(lm, prompt.prefix = "> ")
          sink()
          close(zz)
          ## now 'ex.lm.out' contains the output for further processing.
          ## Look at it by, e.g.,
          cat(ex.lm.out, sep = "\n")
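
A smaller version of the same pattern, capturing the printed form of a short vector instead of the lm examples. The variable name captured is an assumption; capture.output() is shown alongside as a base R shortcut that gives the same result in one call.

```r
# Open a text connection that writes into the character vector 'captured'.
zz <- textConnection("captured", "w")
sink(zz)      # divert printed output to the connection
print(1:3)
sink()        # restore normal output
close(zz)
print(captured)

# capture.output() wraps the sink/textConnection steps.
same <- capture.output(print(1:3))
print(identical(same, captured))
```

Always pair sink(zz) with a matching sink(); otherwise later output keeps disappearing into the connection.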

  18. Input from connections

          ## read in file created in last examples
          readLines("ex.data")
          unlink("ex.data")

          ## read listing of current directory (Unix)
          readLines(pipe("ls -1"))

          ## read listing of current directory (Windows)
          readLines(pipe("dir"))
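
An open connection remembers its position, so a file can be read in stages. The sketch below reads a title line with readLines() and the remaining numbers with scan() from the same connection; the temporary file and its contents are assumptions. It also shows list.files() as a portable alternative to piping ls or dir.

```r
ex_path <- tempfile()
writeLines(c("TITLE extra line", "2 3 5 7", "11 13 17"), ex_path)

con <- file(ex_path, "r")        # open explicitly so the position is kept between reads
title <- readLines(con, n = 1)   # first line only
nums  <- scan(con, quiet = TRUE) # everything after the title
close(con)

print(title)
print(nums)

# Portable directory listing, without relying on a shell command.
listing <- list.files(tempdir())
print(basename(ex_path) %in% listing)
unlink(ex_path)
```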
