html5-img
1 / 9

Basic Data Input

Basic Data Input. To get started, you can give students binary data already in the R format. s ave() one or more R objects to a file (with . rda extension) Put it on a Web site. Students use load() to read the data into an R session directly

lenore
Download Presentation

Basic Data Input

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Basic Data Input

  2. To get started, you can give students binary data already in the R format. • save() one or more R objects to a file (with .rda extension) • Put it on a Web site. • Students use load() to read the data into an R session directly • load(url(“http://eeyore.ucdavis.edu/ESR2010/bayAreaHousing.rda”)) • Note the use or url() – it is an example of a “connection”, a stream of bytes that come from “somewhere”, in this case a URL, but could be a file, another program outputting data, a character string.

  3. Reading ASCII data • Have to know how to read standard rectangular data • tab separated, comma-separated, etc. • R has functions for this, i.e. • read. table(), read.csv(), etc. • read.fwf() for fixed width format. • For efficiency reasons, very beneficial to use colClasses parameter to specify target type. • But there are lots of issues.

  4. Strings or factors • Common “gotcha” • For better or worse, by default, R turns strings in rectangular data read from an ASCII file into factor objects. • Use stringsAsFactors = FALSE

  5. Problems in reading • Quote characters • Missing values • Character Encoding • Comment characters

  6. Interactive code • read.table("~/problemData2", quote = "", comment.char = "", fill = TRUE)

  7. Accessing files - Paths • Students need to know about working directories (getwd() & setwd()) • This is where the R session is “rooted” • all relative file names are relative to this directory. • Students need to recognize that their code will not work if they move files, change directories, etc. • i.e. their code is not runnable and so we cannot help fix things. • Using URLs makes things universally locatable.

  8. Binary data • R can read binary data. • But one has to read the bytes and interpret them based on the actual known format of the data, e.g. • read 2 integers • then followed by n real numbers where n is the value of the second of the first two integers read, … • Students should not necessarily deal with this, but be aware of the existence of different binary formats & why they are used (compact representation)

  9. Non-standard data input • 3 problems: • Sample observations from a huge ASCII file w/o reading the whole file • Multiple data frames in a single CSV file. • ragged data# timestamp=2006-02-11 08:31:58# usec=250# minReadings=110t=1139643118358;id=00:02:2D:21:0F:33;pos=0.0,0.0,0.0;degree=0.0;00:14:bf:b1:97:8a=-38,2437000000,3;00:14:bf:b1:97:90=-56,2427000000,3;00:0f:a3:39:e1:c0=-53,2462000000,3;00:14:bf:b1:97:8d=-65,2442000000,3;

More Related