1 / 52

Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH. First: Advice to the R Newbie. Use the assignment arrow! <- Directional and not ambiguous Clearly shows an object created or modified X = Y X <- Y Y -> X

zalika
Download Presentation

Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

  2. First: Advice to the R Newbie • Use the assignment arrow! <- Directional and not ambiguous Clearly shows an object created or modified X = Y X <- Y Y -> X • Use caution with attach()!! * Does not work like SPSS or SAS! * Data manipulations are not part of the data frame

  3. Attach Headache An Introduction to Statistical Learning: with Applications in R James, Witten, Hastie and Tibshirani (2013) “To refer to a variable, we must type the data set and the variable name joined with a $ symbol. Alternatively, we can use the attach() function in order to tell R to make the variables in this data frame available by name.” Google R Style Guide: “Attach The possibilities for creating errors when using attach are numerous. Avoid it.”

  4. Attach Headache R is fundamentally different from SAS and SPSS, you’re not restricted to one dataset at a time! var1 <- gear * carb #Error: object 'gear' not foundattach(mtcars) var2 <- gear + carb mtcars$var3 <- gear + carb

  5. There are now twomtcars, the one attached was not changed! mtcars$var4 <-mtcars$cyl + mtcars$var3Formula notation lets you skip $Attach not necessary with formula notation (data = )

  6. REPRODUCIBILITY REPRODUCIBILITY REPRODUCIBILITY https://www.scienceexchange.com/reproducibility

  7. Ewww… It’s sorta GUI

  8. R in a Microsoft Office World • IT does not trust open source software • Your supervisors don’t trust open source software(but want to save $) • Everyone else only knows MS-Office (you might get hit by a bus) • Management only knows Excel (and they think its great!) • You have to be able to share data between R and MS-Office apps • Need to publish results in MS-Office formats but want reproducibility

  9. R in a Microsoft Office World Two basic strategies: • Work in MS Office with R plug-ins to execute R from within the app a. not open source b. not easily reproducible due to GUI • Work in R and use packages to tailor output for MS Office

  10. MS Office Plug-ins • statconnDCOMServer: server that allows integration of Rinto other applications. • rscproxy: R package required for rcom and for the statconn(D)COM Server. • rcom: R package for R and other apps to communicate • RExcel: embeds Rin Excel • SWord: embeds Rin Wordmodified from:http://www.londonr.org/Presentations/May'10%20Markus%20Gesmann.pdf

  11. Working in R:Data Exchange • ODBC • CSV files • Excel

  12. RODBC • Use RODBCto perform SQL queries • ODBC drivers need to be set up in your operating system. • Once set up, you use the Data Source Name (dsn) as a connection

  13. Control Panel > Administrative Tools >Data Sources (ODBC)

  14. Enter a name • Description optional • Select server from drop down menu • Defaults should work for the rest…

  15. library(RODBC) AvDW<-odbcConnect(dsn = "Avatar_Reports") AvLv<-odbcConnect( dsn = "AVCALPMLIVE", believeNRows = FALSE) ## , uid="User", pwd="password")

  16. X<-sqlQuery(AvDW,"select * from tableName") • Returns a data frame or SQL error message to the data object, no Rerror until you use X • Dates are returned in POSIXct format • Text fields are returned as factors, unless you specify stringsAsFactors = FALSEsqlSave(OpenChannel, dfName, append = FALSE)

  17. StDate<- "2012-07-01" EnDate<-"2013-06-30"idEpFY<-sqlQuery(AvDW, paste( "select distinct id, epi_num from billing_tx_history where date_of_service Between {d '",StDate,"'} and {d '",EnDate,"'} and rrg= 'Mental Health'")) odbcClose(AvDW) # {d '2012-07-01'}

  18. Access library(RODBC) Ac <-odbcConnectAccess("fileName.mdb") ## For newer versions of Access, 2007 or later... "accdb" ## dataObj <- odbcConnectAccess2007("fileName.accdb") ## In this example "Demos_FY" is a table in an Access database ## If the table name contains underscore, then the ## table name needs to be in back tick quote` DemoDat<-sqlQuery(Ac, "select * from `Demos_FY`")

  19. CSV • Comma Separated Values (in plain text) • Almost a universal format… ALMOST! • Microsoft opens CSV files with Excel by default ## To read a CSV fileAnx<-read.csv("ANexx.csv") str(Anx)## Returns dates as factor Anx$disDate<-as.Date(Anx$disDate, "%m/%d/%Y") ## To write CSV write.csv(Anx, file = "Ane.csv", row.names= FALSE)

  20. ## If you don’t use ## row.names= FALSEstr(Anx) 'data.frame': 2515 obs. of 23 variables: $ X : int 1 2 3 4 5 6 7 8 ... $ X.3 : int 1 2 3 4 5 6 7 8 ... $ X.2 : int 1 2 3 4 5 6 7 8 ... $ X.1 : int 1 2 3 4 5 6 7 8 ... $ program_code : chr "88121" "8919F1"...

  21. Excel CSV “Feature” Microsoft wants to be sure your data looks good, so….Leading zeroes are removed!program_code00201 becomes 201Not an option, it’s automatic.Only work around in Excel is use GUI and open as “text”Note: This problem can occur in R too, although not as common. The solution in R is to use the colClasses option to define data types with read.csv().

  22. Excel Package RODBChas a function odbcConnectExcel() Package gdataread.xls()and read.xlsx() Requires separate installation of pearl executable.Packages xlsx and XLConnect both use Java run time to parse MS Office “Open XML” format. They can both read and write. Best package to use to read and write Excel is…

  23. XLConnect • Uses Java runtime installed on mostcomputers • Multiplatform • Provided by Mirai Solutions GmbH • Can write Excel files with formatting

  24. Read Excel library(XLConnect) prg<-readWorksheetFromFile( "ExcelFile.xls", sheet = 1) ## Other arguments:## startRow= 1, startCol = 5,endRow= 1, endCol = 5 1 2 3 4## sr,sc,er,ec

  25. Excel file as often misused

  26. ## Create data frame from multiple Excel files## Read file names from the directory filenames <-list.files("./XLfiles", pattern = "*.xls", full.names= TRUE) lpd<-lapply(filenames, function(x) {cbind(readWorksheetFromFile(x,1,13, 1,75, 6), # main data readWorksheetFromFile(x,1, 1, 5, 1, 5, header = FALSE), # prog code sr 1,sc 5,er 1,ec 5 readWorksheetFromFile(x,1, 3, 5, 3, 5, header = FALSE), # program name readWorksheetFromFile(x,1, 5,13, 5,13, header = FALSE), # Date Submitted readWorksheetFromFile(x,1, 2,13, 2,13, header = FALSE))# FY }) ## Change the list object into one data frame ddf<-do.call(rbind,lpd) ## Give names to the columnsnames(ddf)<- c("A","B","LFname","Hours","Worked", "ClinHrsWk","prog","progN","DateSub","FY")

  27. Write Excel Basically the same as reading… To create a new Excel file from a data frame: writeWorksheetToFile("Name.xls", data = dfName, sheet = "Some Name",startRow= 3, startCol= 4)

  28. Add a sheet to an existing Excel file, put your data frame (dfName) into it. wb<-loadWorkbook("xlFile.xls") createSheet(wb, name = "Sheet Name") writeWorksheet(wb, dfName, sheet = "Sheet Name") saveWorkbook(wb)

  29. Reports and Publishing • Export data to Crystal Reports • Tableau integration • WordR2DOCX Markdown to Word (knitr)xtablesjPlot

  30. Tableau 8.1 integration involves installing the package Rserve which establishes connection to Tableau. • Short R scripts can be executed in one line • Tableau functions return either integer, real, string or boolean data type • R code is the part between red single quotes below. SCRIPT_INT('kmeans(data.frame(.arg1,.arg2,.arg3,.arg4),3)$cluster;',SUM([Petal length]), SUM([Petal width]),SUM([Sepal length]),SUM([Sepal width]))

  31. R2DOCX R2DOCX is a package on GitHub that allows you to write Word documents directly in R by using functions! library(R2DOCX) ## Create a document new object doc <-new("Docx", title = "My example", basefile = template.file) doc <-addParagraph(doc, stylename = "Normal", value = paste("This is a rather short paragraph. ", RObj," Not a lot of room on a PowerPoint screen."))

  32. Markdown to Word • Use knitr and knit2html() • Use Pandoc, or… A simple trick allows you to open HTML documents in Word • You can compose reproducible reports in R as markdown documents with plots and tables • Plots can be created as any device including Windows Metafile • Tables can be formatted using packages xtable or sjPlot (but not both)

  33. Markdown: A simplified # Heading ## Sub _italic_ __bold__ Heading ======= Sub-heading ----------- Paragraphs are separated by a blank line. Text attributes *italic*, **bold**, `monospace`. A [link](http://example.com). Shopping list: * apples * oranges Numbered list: 1. apples 2. oranges The rain---not the reign---in Spain.

  34. <h1>Heading</h1> <h2>Sub-heading</h2> <p>Paragraphs are separated by a blank line.</p> <p>Text attributes <em>italic</em>, <strong>bold</strong>, <code>monospace</code>.</p> <p>A <ahref="http://example.com">link</a>.</p> <p>Shopping list:</p> <ul> <li>apples</li> <li>oranges</li> <li>pears</li> </ul> <p>Numbered list:</p> <ol> <li>apples</li> <li>oranges</li> <li>pears</li> </ol> <p>The rain&mdash;not the reign&mdash;in Spain.</p>

  35. R in Markdown ## Code chunks A _paragraph_ here. A code chunk below ```{r} 1+1 ## defaults: echo=TRUE, include=TRUE .4-.7+.3 ## Use trunc() or round() ``` ## Inline code The value of pi is **`r pi`**, and 3 times pi is **`r sprintf("%.2f",3*pi)`**.

  36. HTML to Word: Pandoc

  37. HTML to Word library(knitr) knit2html("RMD_Web.rmd") Creates RMD_Web.html and RMD_Web.md How do you turn it into a Word file? file.copy("fileName.html", "fileName.doc") (not fileName.docx)

  38. Change Windows to show file extensionsOrganize > Folder and search options

  39. HTML to Word knit2html("RMD_Word.rmd", options = c('mathjax','smartypants', 'use_xhtml','highlight_code')) Rename RMD_Word.html as RMD_Word.docDouble click, it opens in Word as a web page, not text Change the view from web page… View > Print Layout Save as “docx” it is now a Word document!

  40. Plots and Images • By default knitr renders plots as PNG and stores in sub-folder “figure” • Default is to embed plots in the HTML as base 64 encryption. Word does not read that. If you render plots directly to output, then you need to turn off the base_64images option.

  41. More control by rendering plots using R device (PNG, JPEG, PDF…) then reading in the file • If you render image files then read them in, you don’t need to change options to turn off base_64images. • To use in Word, render plots using win.metafile() as device, then read in the file. ![plotName](plotName.wmf)

  42. Generate image file in R chunk, read it into document in markdown ```{r, echo=FALSE, include=FALSE} win.metafile("wmDense.wmf", width = 6, height = 4) densityplot(~mpg|as.factor(cyl), main = "Density Plot by Cylinders", xlab = "Miles per Gallon ", data = mtcars) dev.off() ``` ![wmhist](wmDense.wmf)

  43. xtable • Nice plots are not enough. You need to be able to print well formatted results and tables. • The crude text of R output won’t do in a publication. • knit2html() produces HTML • Tags can be used in your text! (i.e. Greek: &beta;prints out β) • xtableprints tables and data frames as HTML directly in the text

  44. xtable **Regression on Income** ```{r, echo = FALSE, results = 'asis'} reg <- lm(income ~ age + IQ, data = donner) rxl <- summary(reg) print(xtable(rxl), type = "HTML") ``` Residual standard error `r sprintf("%.1f", rxl$sigma)` on `r rxl$df[2]` degrees of freedom.

  45. Another Example of In Line Statistics **Chi Square** Statistical test from a cross tabulation. **&chi;^2** = `r xt$chisq$statistic`, df = `r xt$chisq$parameter`, *p* = `r xt$chisq$p.value`

  46. sjPlot • sj is for the blog “StrengeJacke!” German title: “Strict Jacket, the majority do not want to hear the Daniel.” (Daniel Lüdecke) • A Package for plots and to generate well formatted output for linear models, frequency tables, and cross tabs similar to SPSS output. • Originally designed to produce stand alone web pages or output to R Studio viewer window. • Should work with knitr…

  47. sjPlot • Use sjPlot functions in markdown in-line R statements • Internal objects of function page.styleand page.contentprovide the HTML • Double in-line calls to sjPlot functions • Caused conflicts with CSS of the web page • Automatically opened web browser • Not well suited for knitr use…

  48. sjPlot • So… • Comments on Daniel’s blog page • New version of sjPlot to be more knitr friendly  • Still using in line R in markdown but… • Embedded CSS • Option for no output to browser • knitr object added, only need one function

  49. sjPlot table functions: sjt.corr, sjt.df, sjt.frq, sjt.glm, sjt.xtab, sjt.pca, sjt.stackfrq ```{r echo = FALSE}library(sjPlot) ## One line ````r sjt.frq(efc$c160age, no.output = TRUE, variableLabels = list("Carer's Age"), autoGroupAt = 10)$knitr`

More Related