
Intermediate Topics in R

NYU Data Services. Intermediate Topics in R. Last Updated: 2013-11-22. Location: 5th Floor Research Commons, Bobst Library. Appointment: Email (data.services@nyu.edu) or Walk-in. Website: http://nyu.libguides.com/dataservices

serena

Presentation Transcript


  1. NYU Data Services • Intermediate Topics in R • Last Updated: 2013-11-22

  2. Data Services • Location: 5th Floor Research Commons, Bobst Library • Appointment: Email (data.services@nyu.edu) or Walk-in • Website: http://nyu.libguides.com/dataservices

  3. Outline • Review of Objects and More • Select Data Management Topics • Apply Functions • Writing Functions • Conditional Statements • For Loops • While Loops • A Practical Example • Additional Help

  4. Objects and Indices • Objects are symbolic names that represent some form of information: data sets, models, charts and essentially anything else. • Elements within vector objects can be referenced via “[ ]”: > v1 <- c("a","b","c") > v1[2] [1] "b" • Data frames are the class of object used for tabular datasets; both indices and row/column names can be used to reference elements: > <data frame>[<row indices/names>,<column indices/names>]
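As a minimal sketch of the indexing rules on this slide, the snippet below uses a small made-up data frame (the names `df`, `id`, `score` and the row names are illustrative assumptions, not part of the original deck):

```r
# Vector indexing: positions start at 1
v1 <- c("a", "b", "c")
second    <- v1[2]      # the element "b"
first_two <- v1[1:2]    # a range of positions

# A small made-up data frame for illustration
df <- data.frame(id = 1:3,
                 score = c(90, 75, 82),
                 row.names = c("r1", "r2", "r3"))

by_index <- df[2, 2]            # row 2, column 2
by_name  <- df["r2", "score"]   # the same cell, referenced by names
one_col  <- df[, "score"]       # leaving the row slot empty selects all rows
```

Leaving either slot empty inside `[ , ]` selects every row or every column, which is why the comma is still required.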

  5. Classes and Types • Every object has a type and a class: the type (checked with typeof()) describes how the object is stored internally, while the class (checked with class()) describes how functions should treat it. • One commonly used type is “list”; for instance, objects of class “data.frame” and “lm” are both stored as lists. Lists are particularly powerful because they can contain multiple named objects with varying classes. • Objects with different classes may be treated differently when passed to generic functions. For instance, summary() calls a different method depending on the class of its argument: summary.lm() for objects of class “lm”, summary.table() for objects of class “table”.
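The relationship between type, class and generic dispatch can be checked directly. The following is a small sketch with made-up data (`df`, `fit` and `info` are hypothetical names chosen for illustration):

```r
# Made-up data and a simple linear model
df  <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))
fit <- lm(y ~ x, data = df)

df_type   <- typeof(df)    # "list": how a data frame is stored
df_class  <- class(df)     # "data.frame"
fit_type  <- typeof(fit)   # "list"
fit_class <- class(fit)    # "lm"

# A list can hold named elements of different classes
info <- list(label = "demo", values = 1:3, model = fit)
model_class <- class(info$model)

# summary() is generic: the method is chosen from the argument's class
s_fit <- summary(fit)   # dispatches to summary.lm()
s_df  <- summary(df)    # dispatches to summary.data.frame()
```

Calling `methods(summary)` in an interactive session lists the class-specific versions that are available.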

  6. Selecting and Sorting Data • Many functions, such as which() and grep(), return a vector of indices (where a logical expression is TRUE, or where a pattern matches) that can be used to select cases within other objects. For example, subsets can be created via: <data frame>[which(<logical expression>),] • The same concept applies to sorting data frames. The sort() function sorts a vector, but it will not sort an entire data frame. However, the order() function returns the indices that would put a vector in sorted order, which can then be used to reorder the rows of a data frame: <data frame>[order(<object to sort by>),]
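A short sketch of index-based selection and sorting, using a made-up data frame (the column names `name` and `weight` are assumptions for illustration only):

```r
# Made-up data for illustration
df <- data.frame(name = c("cat", "dog", "cow", "ant"),
                 weight = c(4, 30, 600, 0.01),
                 stringsAsFactors = FALSE)

# which() returns the positions where the condition is TRUE
heavy_idx <- which(df$weight > 10)
heavy     <- df[heavy_idx, ]

# grep() returns the positions where the pattern matches
c_idx  <- grep("^c", df$name)
c_rows <- df[c_idx, ]

# order() returns the row positions in sorted order of the key
sorted      <- df[order(df$weight), ]
sorted_desc <- df[order(df$weight, decreasing = TRUE), ]
```

Note that the trailing comma in `df[idx, ]` matters: it selects whole rows rather than columns.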

  7. Reshaping, Merging, and SQL • Many data cleaning procedures go beyond simply computing or recoding variables, for example: • Panel data can be converted between long and wide format via the reshape() function (or the improved “reshape2” package). • Datasets can be combined using functions like merge(), rbind() and cbind(). • R also offers many other ways to manipulate data. For example, the “sqldf” package allows R objects to be manipulated, queried and created via SQL.
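The base R functions named on this slide can be sketched as follows. All data and names (`wide`, `people`, `score.1`, `year`) are made up for illustration, and the “sqldf” and “reshape2” packages are left out because they require separate installation:

```r
# Wide format: one row per id, one column per time point
wide <- data.frame(id = 1:2, score.1 = c(10, 20), score.2 = c(11, 21))

# Wide -> long: reshape() splits "score.1"/"score.2" on the "." separator
long <- reshape(wide, direction = "long",
                varying = c("score.1", "score.2"),
                idvar = "id", timevar = "year")

# merge() joins two data frames on a shared key (an inner join by default)
people <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cy"),
                     stringsAsFactors = FALSE)
merged <- merge(people, wide, by = "id")

# rbind() stacks rows; cbind() adds columns side by side
stacked <- rbind(people,
                 data.frame(id = 4, name = "Di", stringsAsFactors = FALSE))
widened <- cbind(people, age = c(30, 41, 25))
```

Passing `all = TRUE` to `merge()` would keep unmatched rows from both inputs instead of dropping them.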

  8. Apply Functions & Aggregates • The suite of “apply” functions, like apply(), lapply(), sapply() and tapply(), allow a function to be applied to elements of a data structure. For example: • apply(df,1,mean) – returns a vector containing the mean of every row in the data frame “df” (all columns must be numeric). • lapply(df,class) – returns a list containing the result of applying the class() function to each column in the data frame “df”. • Similarly, the by(), aggregate() and ave() functions provide a slightly more user-friendly but less flexible way to apply functions to grouped data. • Functions in the “plyr” package can also simplify complex aggregation and grouping tasks.
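A minimal sketch of the apply family and one aggregate function on a made-up data frame (the columns `a`, `b` and the grouping column `g` are illustrative assumptions):

```r
# Made-up data: two numeric columns and one grouping column
df <- data.frame(a = c(1, 2, 5), b = c(4, 5, 6), g = c("x", "y", "x"),
                 stringsAsFactors = FALSE)
num <- df[, c("a", "b")]   # numeric columns only

row_means   <- apply(num, 1, mean)         # one mean per row
col_classes <- lapply(df, class)           # a list, one entry per column
col_sums    <- sapply(num, sum)            # simplified to a named vector
group_means <- tapply(df$a, df$g, mean)    # mean of a within each level of g

# aggregate() gives the grouped result as a data frame
agg <- aggregate(a ~ g, data = df, FUN = mean)
```

The second argument of `apply()` selects the margin: 1 for rows, 2 for columns.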

  9. Writing Functions • Functions are incredibly useful, offering huge benefits for code reuse through abstraction. • Basic form: <function name> = function(<arg.1>,…,<arg.n>) { <block of code> return(<object to return>) } • R is statically (lexically) scoped. Inside a function, locally defined objects temporarily mask global objects of the same name, and ordinary assignments made inside the function do not change global objects. • Default arguments can be assigned in the function definition itself, such as function(a,b=3), i.e. “b” will be 3 unless specified otherwise.
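The basic form, default arguments and masking behaviour can be sketched as below; `scale_by` and `shadow` are hypothetical function names invented for this example:

```r
# A function with a default argument: b is 3 unless the caller overrides it
scale_by <- function(a, b = 3) {
  result <- a * b
  return(result)
}

with_default  <- scale_by(2)       # uses b = 3
with_override <- scale_by(2, 10)   # uses b = 10

# Scoping: a local x masks the global x only inside the function
x <- 1
shadow <- function() {
  x <- 100   # creates a local x; the global x is untouched
  x          # the last evaluated expression is returned
}
local_value  <- shadow()
global_value <- x
```

The explicit `return()` is optional in R, since a function returns the value of its last evaluated expression.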

  10. If, else if and else Statements • If statements are an essential method of control flow; however, in R they are rarely used outside of loops and/or functions. • Basic form: if(<condition>) { <block of code> } • How they work: if the logical expression, denoted by <condition> above, is TRUE then the block of code in braces is executed; otherwise the block is ignored. • else if (<condition>) {<block of code>} and else {<block of code>} statements can be chained after an if to handle alternative cases without nesting, which avoids excessive indentation.
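A small sketch of an if / else if / else chain inside a function; `describe_number` is a hypothetical name used only for illustration:

```r
# Classify a single number; only the first branch whose condition is TRUE runs
describe_number <- function(n) {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}

label_neg  <- describe_number(-5)
label_zero <- describe_number(0)
label_pos  <- describe_number(12)
```

The condition in `if()` should be a single TRUE or FALSE value; for element-wise tests on a whole vector, `ifelse()` is the usual alternative.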

  11. For Loops • “Loop” over each element in a vector, executing the same block of code for each one. • Why is this useful? : The same task can be done “N” times without any manual work or copying + pasting. • Basic Form: for(<object name> in <set> ) { <block of code> } • 1. Assign the first element in the set to the object name. 2. Execute the block of code. 3. If the set contains at least one more element then assign the next one to the object name and go to 2. Otherwise terminate the loop.
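The loop steps above can be sketched with two short examples on a made-up vector (`values`, `squares` and `total` are illustrative names):

```r
values <- c(3, 8, 1)

# Loop over positions and fill a pre-allocated result vector
squares <- numeric(length(values))
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}

# Loop over the elements themselves; each step depends on the previous one
total <- 0
for (v in values) {
  total <- total + v
}
```

Pre-allocating `squares` avoids growing the vector on every iteration, which is a common source of slow loops in R.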

  12. Apply Functions (Again!) • Apply functions can be used in place of for loops as long as the result of each iteration does not depend on the result of a previous iteration. • How does this work? : Write a function to do your “block of code” and then apply that function to your set. • Why use an apply function instead of a loop? : It may be easier to understand and is more “R-like”.
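As a sketch of swapping a loop for an apply function, the example below computes the same result both ways on a made-up list (`samples` and `range_width` are hypothetical names):

```r
# Made-up input: a named list of numeric vectors
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# The "block of code", written as a function
range_width <- function(x) max(x) - min(x)

# Version 1: a for loop filling a pre-allocated, named result
loop_result <- numeric(length(samples))
names(loop_result) <- names(samples)
for (nm in names(samples)) {
  loop_result[nm] <- range_width(samples[[nm]])
}

# Version 2: the same computation with sapply()
apply_result <- sapply(samples, range_width)
```

Both versions work here because each element is processed independently; a running total, by contrast, would still need a loop or a function such as `cumsum()`.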

  13. While Loops • Do a loop “while” a logical condition is true. • Why use this instead of a for loop or apply function? : Sometimes the number of iterations is unknown in advance, e.g. loop until an algorithm converges. • Note: Infinite loops are possible! • Basic Form: while(<condition>) { <block of code> } • 1. Evaluate the logical condition. If true go to 2, otherwise go to 3. 2. Execute the block of code then go to 1. 3. Skip the block of code and terminate the loop.
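A convergence loop of the kind mentioned above can be sketched as follows. The example approximates a square root by repeated averaging (the Babylonian method, chosen here only as an illustration); the tolerance and the iteration cap are assumed values, and the cap guards against an infinite loop:

```r
target   <- 2      # number whose square root we want
guess    <- 1      # starting value
tol      <- 1e-8   # assumed convergence tolerance
max_iter <- 100    # assumed safety cap against infinite loops
iter     <- 0

# Keep refining the guess until it is close enough or the cap is reached
while (abs(guess^2 - target) > tol && iter < max_iter) {
  guess <- (guess + target / guess) / 2
  iter  <- iter + 1
}
```

The number of iterations is not known before the loop starts, which is the situation where `while` fits better than `for`.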

  14. Evaluation • Please help us improve this session and others by filling out the following brief anonymous survey: http://tinyurl.com/IntermediateR

  15. References • Modern Applied Statistics with S-PLUS, 2nd Edition, W.N. Venables & B.D. Ripley • Linear Models with R, Julian J. Faraway • Quick-R : http://www.statmethods.net • UCLA idre : http://www.ats.ucla.edu/stat/
