Data Visualization with R

Data Visualization with R. February 2019. What is R ?. R is: A programming language used for statistical computing and data visualization. Open source and freely available under the GNU General Public License. Supported by the R Project for Statistical Computing

asmiley

Presentation Transcript


  1. Data Visualization with R February 2019

  2. What is R? R is:
     • A programming language used for statistical computing and data visualization.
     • Open source and freely available under the GNU General Public License.
     • Supported by the R Project for Statistical Computing.
     • Download the latest version of R via: https://www.r-project.org/
     • Working in R is easier with RStudio. Download RStudio via: https://www.rstudio.com/products/rstudio/download/

  3. Basic R Data Modes
     • Numeric, e.g. 3.14, -543.6, 0.2, 10.
     • Integer, e.g. 3, -544, 0, 10 (R stores a whole number as an integer only when it is written with an L suffix, e.g. 3L; otherwise it is stored as numeric)
     • Complex, e.g. -17+8i
     • Logical, e.g. TRUE, FALSE
     • Character, e.g. "Hello", "To be or not to be", "ljfakl;r#"
     While other modes of data exist, they are far less common than those listed above.
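As a quick check of the modes listed above, the sketch below stores one value of each kind and asks R how it classifies them. The object names (`values`, `classes`) are illustrative and not part of the original slides.

```r
# One value of each basic mode described above.
values <- list(
  numeric   = 3.14,
  integer   = 3L,        # the L suffix requests integer storage
  complex   = -17 + 8i,  # the imaginary unit is written as a suffix
  logical   = TRUE,
  character = "Hello"
)

# class() reports how R classifies each value.
classes <- sapply(values, class)
print(classes)

# A bare whole number is stored as numeric unless it carries the L suffix.
class(3)
is.integer(3L)
```

Running `class()` on your own objects is a simple way to confirm what kind of data you are working with before plotting it.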

  4. Common R Objects In all of the above, integers are treated as numeric values.

  5. Some Distinguishing Characteristics
     • R is driven by typed commands rather than by menus and dialog boxes.
     • Coding is simplified through the use of products such as RStudio.
     • Commands consist of a function with associated arguments that define the manner in which the function should be executed.
     • For example, a command to create a vector (c is the command; the quoted names are its arguments):
       c("Larry", "Moe", "Curly")
     • For example, a command to load a comma-delimited data file (read.csv is the command; everything inside the parentheses is an argument):
       read.csv("bigfile.csv", header = TRUE, sep = ",", colClasses = c("character", "character", "logical", "numeric"))
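The read.csv command above refers to a file, "bigfile.csv", that is not supplied with the presentation. The sketch below is a self-contained stand-in: it writes a few rows of the built-in mtcars data to a temporary CSV file and reads them back, declaring each column's type with colClasses. The names `csv_path`, `cars`, and `loaded` are assumptions made for this illustration.

```r
# Write a small comma-delimited file to a temporary location
# (a hypothetical stand-in for "bigfile.csv").
csv_path <- tempfile(fileext = ".csv")
cars <- data.frame(
  model  = rownames(mtcars)[1:5],
  manual = mtcars$am[1:5] == 1,   # TRUE for a manual transmission
  mpg    = mtcars$mpg[1:5]
)
write.csv(cars, csv_path, row.names = FALSE)

# Read it back, stating the type of each column in order.
loaded <- read.csv(csv_path, header = TRUE, sep = ",",
                   colClasses = c("character", "logical", "numeric"))
str(loaded)
```

Declaring colClasses is optional, but it avoids surprises when R would otherwise guess a column's type.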

  6. Unsure about how to use a command?
     • Type a question mark (?), followed by the name of the command (without any arguments defined).
     • For example, to learn about the "par" command, which is used for formatting various aspects of plots and other data visualizations:
       ?par

  7. Creating New Objects The output from commands can be assigned to objects. For example:
       stooges <- c("Larry", "Moe", "Curly")
     This executes the command and assigns the resulting vector to an object named "stooges".

  8. R Packages • The capabilities of R are being continually expanded and improved through the creation of “packages” • A package may be developed to: • Develop capabilities previously unavailable • Improve upon existing capabilities • Focus upon the needs of a particular user group.

  9. Take a look for yourself. • Google “list of R packages”

  10. R Packages
      • List of R Packages: https://cran.r-project.org/web/packages/available_packages_by_name.html
      • Over 13,700 packages listed (as of 8 February 2019)
      • Examples of R packages:
        • dplyr: manipulates data by taking subsets, summarizing, rearranging and joining data sets.
        • tidyr: reformats layouts of data sets to make them more compatible with R.
        • lubridate: simplifies working with dates and times.
        • oce: analysis of oceanographic data.
        • AMR: antimicrobial resistance analysis.
        • WDI: used for downloading World Development Indicators data from the World Bank.

  11. Accessing Data Sets
      • Data sets may be imported from the Internet or from a computer directory.
      • R can import data in a wide variety of formats; some of the more common are: Excel, CSV, TXT, SPSS, Access, SQL Server, Minitab.
      • R also includes a set of standard data sets (for you to practice/play with).
      • To list available data sets, type the following command in R or RStudio:
        library(help = "datasets")
      • A more complete description of many of these may be found on the following site: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
      • For the examples in this presentation, we use data sets already included within R.
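Besides library(help = "datasets"), the catalogue of built-in data sets can also be examined as an ordinary R object. The sketch below uses the base function data() for that purpose; the object name `catalogue` is an assumption made for this illustration.

```r
# data() returns a catalogue of data sets; its $results component is a
# character matrix whose "Item" column holds the data set names.
catalogue <- data(package = "datasets")$results
head(catalogue[, c("Item", "Title")])

# mtcars, used throughout this presentation, appears in the catalogue.
"mtcars" %in% catalogue[, "Item"]

# Built-in data sets can be inspected like any other object.
head(mtcars, 3)
```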

  12. How to Specify Elements in Objects
      • Element positions are often indicated using index numbers.
      • Index numbers start at 1 and increase from there, unlike some programming languages, such as Python, where indices start at 0.
      • An index of 1 refers to:
        • The 1st element in a vector or list
        • The 1st row or column of a matrix or data frame
      • When specifying an element in a matrix or data frame, always indicate the row followed by the column.
        • e.g. mtcars[7, 4] refers to the element in the 7th row and 4th column of the data object mtcars.
      • Colons can be used to indicate ranges of index numbers.
        • e.g. 3:9 indicates the indices from 3 through 9.
      • Negative signs indicate the indices of elements to be excluded.
        • e.g. mtcars[7, -4] indicates everything in the 7th row of mtcars except what's in the 4th column.
      • Names can be assigned to elements in lists or to rows and columns in data frames.
        • mtcars["Valiant", "hp"] will go to the row for the Plymouth Valiant and retrieve its horsepower (hp).
        • mtcars$hp will retrieve the horsepower (hp) column from mtcars.
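The indexing rules above can be tried directly on the built-in mtcars data set. This is a minimal sketch; the object names on the left-hand side are illustrative only.

```r
# Row 7, column 4: the 4th column of mtcars is hp.
one_value <- mtcars[7, 4]

# A range of rows (3 through 9) from a single named column.
some_mpg <- mtcars[3:9, "mpg"]

# Everything in row 7 except the 4th column.
row_without_hp <- mtcars[7, -4]

# Selection by row name and column name.
valiant_hp <- mtcars["Valiant", "hp"]

# A whole column with the $ operator. Names are case-sensitive,
# so Mtcars$hp would fail with "object 'Mtcars' not found".
all_hp <- mtcars$hp

print(one_value)
print(row_without_hp)
```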

  13. Let's try this out. • Please open up the "RStudio" application. If you don't see 4 panes (windows), as shown here, let me know.

  14. Examples Suppose we have a list containing information about the Orlando Metropolitan Statistical Area (MSA). • Assume that the name of the list is "Orlando". • The list pairs element names with element values. (Elements of a list can be named; elements of a vector can be named too, using names().)

  15. Examples Question: What value(s) would be retrieved from the list “Orlando” by the following command? Orlando[5] Note: R is case-sensitive. • Therefore, in this command, the “O” in “Orlando” must be capitalized. • If it were not capitalized (i.e., orlando[5]), you would get the following error statement: • Error: object 'orlando' not found

  17. Examples Question: What value(s) would be retrieved from the list “Orlando” by the following command? Orlando[2:4]

  19. Examples Question: What value(s) would be retrieved from the list “Orlando” by the following command? Orlando[c(1, 5, 6)]

  21. Examples Question: What value(s) would be retrieved from the list “Orlando” by the following command? Orlando[c(1, 4:6)]

  23. Examples Question: What value(s) would be retrieved from the list “Orlando” by the following command? Orlando["population"]

  26. Examples Question: What value(s) would be retrieved from the list "Orlando" by the following command? Orlando[Orlando < 25000] Note: • When executing this command, you would get the following message: • Warning message: NAs introduced by coercion • This is due to R's attempt to evaluate a character string ("Orlando") as a number.
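The contents of the "Orlando" list appear only in the slide images, so the sketch below builds a hypothetical stand-in. Every element name and value in it is a placeholder invented for illustration, not a real statistic for the Orlando MSA. It repeats the indexing commands from the preceding slides and adds a filter that tests only numeric elements, which is one way to avoid the coercion warning.

```r
# Placeholder values only: these are NOT real statistics for the Orlando MSA.
Orlando <- list(
  name          = "Orlando",
  state         = "FL",
  counties      = 4,
  population    = 2000000,
  square_miles  = 4000,
  median_income = 50000
)

fifth   <- Orlando[5]             # a one-element list holding the 5th element
middle  <- Orlando[2:4]           # elements 2 through 4
picked  <- Orlando[c(1, 4:6)]     # element 1 plus elements 4 through 6
by_name <- Orlando["population"]  # selection by element name

# Single brackets return a list; double brackets return the value itself.
class(Orlando["population"])
class(Orlando[["population"]])

# Test only the numeric elements, so "Orlando" is never coerced to a number.
small <- Orlando[sapply(Orlando, function(x) is.numeric(x) && x < 25000)]
names(small)
```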

  27. Examples Assume that we have the following Data Frame, named “New_England”: Column Names Row Names

  28. Examples Question: What value(s) would be retrieved from the data frame “New_England” by the following command? New_England[3, 4]

  30. Examples Question: What value(s) would be retrieved from the data frame “New_England” by the following command? New_England[3, 4:5]

  33. Examples Question: What value(s) would be retrieved from the data frame "New_England" by the following command? New_England["Vermont", 1]

  35. Examples Question: What value(s) would be retrieved from the data frame "New_England" by the following command? New_England["Vermont", "Population"]

  37. Examples Question: What value(s) would be retrieved from the data frame "New_England" by the following command? New_England[ , "Per_Capita_Income"]

  39. Examples Question: What value(s) would be retrieved from the data frame “New_England” by the following command? New_England$Square_Miles
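The "New_England" data frame is shown only as an image on the slides. To try the commands above, one option is to build a similar data frame from state.x77, a matrix of 1970s state statistics that ships with R. This is a sketch under that assumption: the values, the column order, and the extra columns will differ from the slide's table, and the renaming below is done only so that the slide's column names resolve.

```r
ne_states <- c("Connecticut", "Maine", "Massachusetts",
               "New Hampshire", "Rhode Island", "Vermont")

# Assumed stand-in for the slide's table, built from built-in data.
# In state.x77, Population is in thousands and Area is in square miles.
New_England <- as.data.frame(state.x77[ne_states, ])
names(New_England)[names(New_England) == "Income"] <- "Per_Capita_Income"
names(New_England)[names(New_England) == "Area"]   <- "Square_Miles"

New_England[3, 4]                      # 3rd row, 4th column
New_England[3, 4:5]                    # 3rd row, columns 4 through 5
New_England["Vermont", 1]              # row by name, column by position
New_England["Vermont", "Population"]   # row and column by name
New_England[, "Per_Capita_Income"]     # an entire column
New_England$Square_Miles               # the same column-wise access using $
```

Selecting a single column returns a plain vector, while selecting several columns, as in `New_England[3, 4:5]`, returns a data frame.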

  41. Taking subsets of an object
      General form: subset(object name, criteria)
      Example: using a data set named mtcars (containing data about cars), extract those records where mileage is greater than 20 mpg and either of the following is true: the engine has more than 4 cylinders or more than 100 hp.
        subset(mtcars, mpg > 20 & (cyl > 4 | hp > 100))
      Here mtcars is the data set, "&" means "and", and "|" means "or".
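The subset command above can be run as written. The sketch below also shows the same selection using bracket indexing with a logical condition, which may help connect subset() to the indexing rules covered earlier. The object names `fast` and `same` are illustrative.

```r
# Cars with more than 20 mpg that also have more than 4 cylinders
# or more than 100 hp.
fast <- subset(mtcars, mpg > 20 & (cyl > 4 | hp > 100))
fast[, c("mpg", "cyl", "hp")]

# The same selection using bracket indexing; note that the column
# names must be prefixed with mtcars$ outside of subset().
same <- mtcars[mtcars$mpg > 20 & (mtcars$cyl > 4 | mtcars$hp > 100), ]
all.equal(fast, same)
```

The parentheses matter: without them, "&" is evaluated before "|", which would select a different set of cars.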

  42. Reordering data in an object
      General form: object name[order(sorting field, sort direction), ]
      Example: Arrange the mtcars data set in descending order by horsepower (hp).
        mtcars[order(mtcars$hp, decreasing = TRUE), ]
      Here mtcars is the data set, mtcars$hp is the field to sort by, and decreasing = TRUE sorts in descending order. The comma before the closing bracket keeps all columns.
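To see what the reordering command does, it helps to look at order() on its own: it returns row positions, which are then used as row indices. The sketch below is illustrative, and the object names are assumptions.

```r
# order() returns the row positions that would sort hp from high to low.
positions <- order(mtcars$hp, decreasing = TRUE)
head(positions)

# Using those positions as row indices rearranges the whole data frame.
by_power <- mtcars[positions, ]
head(by_power[, c("hp", "disp")], 3)

# For a numeric field, negating it is another way to sort in descending order.
also_desc <- mtcars[order(-mtcars$hp), ]
```

Note that mtcars itself is unchanged; the sorted copy exists only in the new objects.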

  43. The Basic Plot
        plot(mtcars$disp, mtcars$hp)
      The first argument supplies the x coordinates and the second supplies the y coordinates.

  44. Jazzing up your plot Let's start by introducing "par".
      • par is a command for specifying graphical parameters.
      • Type "par()" to get a listing of the current values assigned to all of your par settings.
      • Explanations of par settings are available at http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/par.html
      Command and arguments:
        par(bg="#262a35", mar=c(5, 4, 3, 2), oma=c(0,0,0,0), col.lab="darkorange2", col.axis="darkorange2", col.main="darkorange2", font.main=2, font.lab=2, cex.main=1.2, cex.axis=0.9, cex.lab=0.9, tck=0)
      • bg: background color, using a hex color code
      • mar: margins on the bottom, left, top and right sides of the plot area, in number of lines of text
      • oma: outer margins on the bottom, left, top and right sides of the plot area, in number of lines of text
      • col.lab: color for x- and y-axis labels (using standard R color set)
      • col.axis: color for axis annotation (using standard R color set)
      • col.main: color for main title (using standard R color set)
      • font.main: font setting for main title ("2" indicates bold type)
      • font.lab: font setting for x- and y-axis labels ("2" indicates bold type)
      • cex.main: scaling factor for size of main title (relative to default value)
      • cex.axis: scaling factor for size of axis annotations (relative to default value)
      • cex.lab: scaling factor for size of x- and y-axis labels (relative to default value)
      • tck: tick setting ("0" indicates no ticks on axes)

  45. Jazzing up your plot Now, create a blank plot using data from the mtcars data set (contains specifications for various car models).
      Command and arguments:
        plot(0, xlim=c(min(mtcars$disp), max(mtcars$disp)), ylim=c(min(mtcars$hp), max(mtcars$hp)), type="n", bty="n", las=1, main="Power as a Function of Displacement", xlab="Displacement", ylab="Power (hp)", asp=1/2)
      • 0: a placeholder value; don't plot anything yet
      • xlim: set the limits of the x axis to the minimum and maximum of the disp field
      • ylim: set the limits of the y axis to the minimum and maximum of the hp field
      • type: set type of plot ("n" indicates that no data will be plotted)
      • bty: style of box to draw around plot ("n" indicates no box)
      • las: orientation of axis labels ("1" indicates labels are always horizontal; "0" would make them parallel to the axes)
      • main: main title
      • xlab: label for x-axis
      • ylab: label for y-axis
      • asp: y-to-x aspect ratio

  46. So, what do we have so far?

  47. Add the data points and a legend
      Command and arguments:
        points(mtcars$disp, mtcars$hp, pch=mtcars$cyl, col="green")
      • mtcars$disp: where to find the x coordinates
      • mtcars$hp: where to find the y coordinates
      • pch: indicate different symbols for different numbers of cylinders
      • col: indicate color for symbols
      Command and arguments:
        legend(110, 380, pch=c(4,6,8), col="green", legend=c("4 cylinders", "6 cylinders", "8 cylinders"), bg="azure4", text.col="white")
      • 110: x-coordinate for upper left corner of legend box
      • 380: y-coordinate for upper left corner of legend box
      • pch: indicate the different types of symbols on the plot
      • col: indicate the color for the symbols
      • legend: provide the descriptions for the symbols
      • bg: indicate the background color for the legend box
      • text.col: indicate the color for the text

  48. …and a trend line would be nice First, create a linear model (lm) based upon the data in the plot.
        trend <- lm(hp ~ disp, data=mtcars)
      • trend: the object being created
      • hp ~ disp: indicate that power (hp) is a function of displacement (disp)
      • data: indicate the data set being used
      Then, add the line to the plot.
        abline(trend, lty=2, col="ghostwhite")
      • trend: indicate what is being added
      • lty: line type ("2" corresponds to a dashed line)
      • col: indicate the color of the line
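The plotting steps from the last few slides can be collected into a single script, sketched below. Two things are assumptions rather than part of the slides: the plot is drawn to a null PDF device (pdf(NULL)) so that the script also runs where no display is available, and range() is used as shorthand for c(min(...), max(...)). Only a subset of the par settings is repeated here.

```r
# Draw to a null PDF device so the script runs without a display. Remove this
# line and dev.off() below to draw in the RStudio Plots pane instead.
pdf(NULL)

# Dark background with orange titles and labels, and no axis ticks.
par(bg = "#262a35", mar = c(5, 4, 3, 2), col.lab = "darkorange2",
    col.axis = "darkorange2", col.main = "darkorange2", tck = 0)

# Empty plot sized to the data; range() gives c(min, max).
plot(0, xlim = range(mtcars$disp), ylim = range(mtcars$hp),
     type = "n", bty = "n", las = 1,
     main = "Power as a Function of Displacement",
     xlab = "Displacement", ylab = "Power (hp)", asp = 1/2)

# Data points: the plotting symbol is chosen by number of cylinders (4, 6 or 8).
points(mtcars$disp, mtcars$hp, pch = mtcars$cyl, col = "green")
legend(110, 380, pch = c(4, 6, 8), col = "green",
       legend = c("4 cylinders", "6 cylinders", "8 cylinders"),
       bg = "azure4", text.col = "white")

# Linear trend of hp on disp, drawn as a dashed line.
trend <- lm(hp ~ disp, data = mtcars)
abline(trend, lty = 2, col = "ghostwhite")
coef(trend)   # intercept and slope of the fitted line

dev.off()
```

Because par settings persist for the life of a graphics device, it is usually simplest to set them once, immediately before the first plot call.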

  49. Seems like a good time for an Updated Figure

  50. Just for fun, show the cars with the highest power and displacement Start by identifying the desired car models:
        subset(mtcars, hp == max(hp) | disp == max(disp))
      Take a subset of mtcars for which hp matches the maximum hp value or disp matches the maximum disp value. The result:
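The slide's result table is not preserved in this transcript, but the command can be run directly against the built-in data. A minimal sketch, with `top` as an illustrative object name:

```r
# Rows where hp equals the maximum hp, or disp equals the maximum disp.
top <- subset(mtcars, hp == max(hp) | disp == max(disp))

# Show only the two fields of interest, with the car names as row labels.
top[, c("hp", "disp")]

# The row names identify the car models.
rownames(top)
```

With the mtcars data that ships with R, this selects the Cadillac Fleetwood (largest displacement) and the Maserati Bora (highest power).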
