This tutorial provides an introduction to key programming concepts in R, focusing on control structures such as if-else statements, switch statements, and loops—including repeat, for, and while loops. You'll learn how to effectively use logical and comparison operators, as well as gain an understanding of how to create and apply custom functions to manipulate and analyze data. Exercises are included to reinforce learning, making it a practical guide for both beginners and intermediate R programmers.
Programming in R: coding, debugging and optimizing
Katia Oleinik (koleinik@bu.edu)
Scientific Computing and Visualization, Boston University
http://www.bu.edu/tech/research/training/tutorials/list/
if
if (condition) { command(s) } else { command(s) }
Comparison operators: == equal, != not equal, > (<) greater (less), >= (<=) greater (less) or equal
Logical operators: & and, | or, ! not
if
> # define x
> x <- 7
> # simple if statement
> if (x < 0) print("Negative")
> # simple if-else statement
> if (x < 0) print("Negative") else print("Non-negative")
[1] "Non-negative"
> # an if statement may be used inside other constructions
> y <- if (x < 0) -1 else 0
> y
[1] 0
if
> # multiline if-else statement
> if (x < 0) {
+   x <- x + 1
+   print("Add one")
+ } else if (x == 0) {
+   print("Zero")
+ } else {
+   print("Positive value")
+ }
[1] "Positive value"
Note: For multiline if statements, braces are necessary even for single-statement bodies. In an interactive session, the else keyword must be on the same line as the closing brace of the preceding block.
ifelse
ifelse(test_condition, true_value, false_value)
> # ifelse statement
> y <- ifelse(x < 0, -1, 0)
> # nested ifelse statement
> y <- ifelse(x < 0, -1, ifelse(x > 0, 1, 0))
ifelse
Best of all: the ifelse statement operates on vectors!
> # ifelse statement on a vector
> digits <- 0:9
> (odd <- ifelse(digits %% 2 > 0, TRUE, FALSE))
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
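As a small illustration of the difference between ifelse() and a plain if on vectors, here is a hedged sketch. The vector `temps` and the labels are made up for this example and do not come from the tutorial.

```r
# Hypothetical data: temperatures in degrees Celsius
temps <- c(-5, 0, 12, 30)

# ifelse() tests every element and returns a vector of the same length
label <- ifelse(temps < 0, "freezing",
                ifelse(temps > 25, "hot", "mild"))
print(label)

# A plain if() expects a single TRUE/FALSE value,
# so the vector condition is reduced with any() first
if (any(temps < 0)) print("At least one value is below zero")
```

The nested call mirrors the nested ifelse pattern shown earlier; for more than a few categories a lookup with cut() or switch() may read better.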
ifelse
Exercise:
- define a random integer vector with values between -10 and 10:
  x <- as.integer( runif( 10, -10, 10 ) )
- create a vector y whose elements are equal to the absolute values of x
Note: normally, you would use the abs() function to achieve this result
switch
switch(statement, list)
> # simple switch statement
> x <- 3
> switch(x, 2, 4, 6, 8)
[1] 6
> switch(x, 2, 4)   # returns NULL since there are only 2 elements in the list
switch
switch(statement, name1 = str1, name2 = str2, … )
> # switch statement with named list
> day <- "Tue"
> switch(day, Sun = 0, Mon = 1, Tue = 2, Wed = 3, …)
[1] 2
> # switch statement with a "default" value
> food <- "meat"
> switch(food, banana = "fruit", carrot = "veggie", "neither")
[1] "neither"
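A hedged sketch of how a named switch with a default value might be wrapped in a function. The helper name `day_type` is hypothetical. It also relies on one feature not shown on the slides: an empty argument (`Sat = ,`) falls through to the next non-empty value.

```r
# Hypothetical helper: classify a day abbreviation
day_type <- function(day) {
  switch(day,
         Sat = ,              # empty argument: fall through to the next value
         Sun = "weekend",
         Mon = , Tue = , Wed = , Thu = ,
         Fri = "weekday",
         "unknown")           # unnamed last argument acts as the default
}

print(day_type("Sat"))
print(day_type("Wed"))
print(day_type("Xyz"))
```

Without the trailing unnamed argument, an unmatched string would make switch() return NULL invisibly, just like the out-of-range numeric case.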
loops
There are 3 statements that provide explicit looping:
- repeat
- for
- while
Built-in constructs to control the looping:
- next
- break
Note: Use explicit loops only when necessary. Vectorized operations usually run much faster, and R also has functions for implicit looping that are often more concise: apply(), sapply(), tapply(), and lapply().
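To make the note concrete, the same computation can be written as an explicit loop, as an implicit loop, and as a vectorized expression. This is a minimal sketch; the variable names are chosen for illustration only.

```r
# Explicit loop: fill a preallocated vector with squares
squares_loop <- numeric(5)
for (i in 1:5) squares_loop[i] <- i^2

# Implicit loop: sapply() calls the function once per element
squares_sapply <- sapply(1:5, function(i) i^2)

# Vectorized arithmetic: no loop written at all
squares_vec <- (1:5)^2

print(squares_vec)
```

All three produce the same values; the vectorized form is typically the shortest and the fastest.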
repeat
The repeat { } statement causes repeated evaluation of the body until break is requested. Be careful: an infinite loop may occur!
> # find the greatest odd divisor of an integer
> x <- 84
> repeat {
+   print(x)
+   if (x %% 2 != 0) break
+   x <- x / 2
+ }
[1] 84
[1] 42
[1] 21
>
for
for(object in sequence) { command(s) }
> # print all words in a vector
> names <- c("Sam", "Paul", "Michael")
>
> for (j in names) {
+   print(paste("My name is", j))
+ }
[1] "My name is Sam"
[1] "My name is Paul"
[1] "My name is Michael"
>
for
for(object in sequence) {
  command(s)
  if (…) next    # return to the start of the loop
  if (…) break   # exit from (innermost) loop
}
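The template above can be sketched with concrete conditions. The conditions and the variable `total` are assumptions chosen only to show next and break in action.

```r
# Sum the odd numbers up to 9 using next and break
total <- 0
for (i in 1:20) {
  if (i %% 2 == 0) next   # even number: skip to the next iteration
  if (i > 9) break        # first odd number above 9: leave the loop
  total <- total + i
}
print(total)
```

Here next skips the rest of the body for even values, while break ends the loop entirely at i = 11, so the remaining values are never visited.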
while
while(test_statement) { command(s) }
> # find the largest odd divisor of a given number
> x <- 84
> while (x %% 2 == 0) {
+   x <- x / 2
+ }
> x
[1] 21
>
loops
Exercise:
- Using any of the loop statements, print all the numbers from 0 to 30 that are divisible by 7.
- Use %%, the modular arithmetic operator, to check divisibility.
function
myFun <- function(ARG, OPT_ARGs) { statement(s) }
ARG: vector, matrix, list or a data frame
OPT_ARGs: optional arguments
Functions are powerful elements of R. They allow you to expand on existing functions by writing your own custom functions.
function
myFun <- function(ARG, OPT_ARGs) { statement(s) }
Naming: Variable naming rules apply. Avoid using the names of existing (built-in) functions.
Arguments: The argument list can be empty. Some (or all) of the arguments can have a default value ( arg1 = TRUE ). The argument '...' can be used to allow one function to pass on argument settings to another function.
Return value: The value returned by the function is the last value computed, but you can also use the return() statement.
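The three points above (default values, '...', and the return value) can be sketched in one small function. The function `rescale` is a hypothetical example, not part of the tutorial; it passes '...' on to the base function range().

```r
# Hypothetical function: rescale a numeric vector to the interval [0, 1]
rescale <- function(x, na.rm = TRUE, ...) {
  rng <- range(x, na.rm = na.rm, ...)   # '...' is passed on to range()
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))           # explicit early return: constant input
  }
  (x - rng[1]) / (rng[2] - rng[1])      # last value computed is returned
}

print(rescale(c(2, 4, 6)))
print(rescale(c(5, 5, 5)))
```

The early return() handles the degenerate case where all values are equal, which would otherwise divide by zero.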
function
> # simple function: calculate (x+1)^2
> myFun <- function(x) {
+   x^2 + 2*x + 1
+ }
> myFun(3)
[1] 16
>
function
> # function with optional arguments: calculate (x+a)^2
> myFun <- function(x, a = 1) {
+   x^2 + 2*x*a + a^2
+ }
> myFun(3)
[1] 16
> myFun(3, 2)
[1] 25
>
> # arguments can be passed by name (and out of order!)
> myFun(a = 2, x = 1)
[1] 9
function
> # Some optional arguments can be specified as '...' to pass them to another function
> myFun <- function(x, ...) {
+   plot(x, ...)
+ }
>
> # print all the words together in one sentence
> myFun <- function(...) {
+   print(paste(...))
+ }
> myFun("Hello", " R! ")
[1] "Hello  R! "
function
Local and global variables: Variables assigned inside a function are local. If a variable is read before it has been assigned inside the function, R takes its value from the global variable of the same name (if such a variable exists).
> # define a function
> myFun <- function(x) {
+   cat("u=", u, "\n")   # no local u yet: the global value is used
+   u <- u + 1           # creates a local u; the variable outside myFun() is not affected
+   cat("u=", u, "\n")   # prints the local u
+ }
>
> u <- 2      # define a variable
> myFun(5)    # execute the function
u= 2
u= 3
>
> cat("u=", u, "\n")   # print the value of the variable
u= 2
function
Local and global variables: If you want to modify a global variable from inside a function, you can use the super-assignment operator <<-. You should avoid doing this!!!
> # define a function
> myFun <- function(x) {
+   cat("u=", u, "\n")   # u is read from the global environment
+   u <<- u + 1          # this WILL affect the value of the variable outside myFun()
+   cat("u=", u, "\n")
+ }
>
> u <- 2      # define a variable
> myFun(u)    # execute the function
u= 2
u= 3
>
> cat("u=", u, "\n")   # print the value of the variable
u= 3
>
function
Call vector variables: Functions do not change their arguments.
> # define a function
> myFun <- function(x) {
+   x <- 2
+   print(x)
+ }
>
> x <- 3            # assign value to x
> y <- myFun(x)     # call the function
[1] 2
>
> print(x)          # print value of x
[1] 3
>
function
Call vector variables: If you want to change the value of the function's argument, reassign the return value to the argument.
> # define a function
> myFun <- function(x) {
+   x <- 2
+   print(x)
+ }
>
> x <- 3            # assign value to x
> x <- myFun(x)     # call the function
[1] 2
>
> print(x)          # print value of x
[1] 2
>
function
Finding the source code: You can view the source code of most R functions by typing the function name without parentheses.
> # get the source code of lm() function
> lm
function (formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    . . .
    z
}
<environment: namespace:stats>
>
function
Finding the source code: For generic functions there are many methods, chosen depending on the type of the argument. Printing the generic shows only the dispatch call.
> # get the source code of mean() function
> mean
function (x, ...)
UseMethod("mean")
<environment: namespace:base>
>
function
Finding the source code: You can first explore the available methods and then choose the one you need.
> # list the methods of mean() function
> methods("mean")
[1] mean.Date       mean.POSIXct    mean.POSIXlt    mean.data.frame
[5] mean.default    mean.difftime
>
> # get source code
> mean.default
function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    . . .
}
<environment: namespace:base>
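How a generic function picks a method can be sketched with a toy generic. The name `describe` and its methods are hypothetical and are not part of base R; only UseMethod() and the naming scheme generic.class come from R itself.

```r
# Hypothetical generic function: dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions named generic.class
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}
describe.character <- function(x, ...) {
  paste("character vector of length", length(x))
}
# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "something else"

print(describe(c(1.5, 2.5, 3.5)))
print(describe("a"))
print(describe(TRUE))
```

This mirrors the mean() case: printing the generic shows only the UseMethod() call, while the real work lives in methods such as mean.default.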
apply
apply(OBJECT, MARGIN, FUNCTION, ARGs)
object: vector, matrix or a data frame
margin: 1 for rows, 2 for columns, c(1,2) for both
function: function to apply
args: possible arguments
Description: Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix
apply
Example: Create a matrix and apply different functions to its rows and columns.
> # create 3x4 matrix
> x <- matrix(1:12, nrow = 3, ncol = 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
>
apply
> # find median of each row
> apply(x, 1, median)
[1] 5.5 6.5 7.5
>
apply
> # find mean of each column
> apply(x, 2, mean)
[1]  2  5  8 11
>
apply
> # create a new matrix with values 0 or 1 for even and odd elements of x
> apply(x, c(1,2), function(x) x %% 2)
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    0    1    0    1
[3,]    1    0    1    0
>
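The ARGs part of apply() is not exercised in the examples above. The following sketch passes na.rm = TRUE through to the applied function; the missing value is introduced artificially for illustration.

```r
# Same 3x4 matrix as in the examples, with one value made missing
x <- matrix(1:12, nrow = 3, ncol = 4)
x[2, 3] <- NA

# Extra arguments after the function are passed on to it
row_means <- apply(x, 1, mean, na.rm = TRUE)
print(row_means)

# A function returning two values per column yields a 2 x 4 matrix
col_ranges <- apply(x, 2, range, na.rm = TRUE)
print(col_ranges)
```

Without na.rm = TRUE, the mean of the second row and the range of the third column would be NA.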
lapply
The lapply() function returns a list: lapply(X, FUN, ...)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
> # compute the mean of each list element
> lapply(x, mean)
$a
[1] 5.5

$beta
[1] 4.535125

$logic
[1] 0.3333333

>
sapply
The sapply() function returns a vector or a matrix: sapply(X, FUN, ... , simplify = TRUE, USE.NAMES = TRUE)
> # create a list
> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
> # compute the mean of each list element
> sapply(x, mean)
        a      beta     logic
5.5000000 4.5351252 0.3333333
>
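The example above yields a vector because mean() returns one value per element. As a sketch of the matrix case, a function that returns two values per element, such as range(), makes sapply() simplify to a matrix with one column per list element.

```r
# Same list as in the example above
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))

# One value per element: named vector
lens <- sapply(x, length)
print(lens)

# Two values per element: 2 x 3 matrix, columns named after the list elements
rng <- sapply(x, range)
print(rng)
```

If the results had different lengths, sapply() could not simplify them and would return a list, as lapply() does.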
code sourcing
source("file", … )
file: file with the source code to load (usually with extension .r)
echo: if TRUE, each expression is printed after parsing, before evaluation.
code sourcing
Linux prompt:
katana:~ % emacs foo_source.r &
Text editor:
# dummy function
foo <- function(x) {
  x + 1
}
R session:
> # load foo_source.r source file
> source("foo_source.r")
> # create a vector
> x <- c(3,5,7)
> # call function
> foo(x)
[1] 4 6 8
code sourcing
> # load foo_source.r source file, echoing each expression
> source("foo_source.r", echo = TRUE)
> # dummy function
> foo <- function(x){
+   x+1
+ }
> # create a vector
> x <- c(3,5,7)
> # call function
> foo(x)
[1] 4 6 8
code sourcing
Exercise:
- write a function that computes the logarithm of the inverse of a number, log(1/x)
- save it in a file with the .r extension
- load it into your workspace
- execute it
- try executing it with the input vector ( 2, 1, 0, -1 ).
debugging
R includes debugging tools:
cat() & print(): print out the values
browser(): pause the code execution and "browse" the code
debug(FUN): execute the function line by line
undebug(FUN): stop debugging the function
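The simplest of these tools, cat(), can be sketched as an optional trace inside a function. The `verbose` argument is an assumption added for this illustration; it is not part of the tutorial's code.

```r
# Logarithm of the inverse, with an optional trace of the intermediate value
inv_log <- function(x, verbose = FALSE) {
  y <- 1 / x
  if (verbose) cat("after inversion: y =", y, "\n")   # print intermediate values
  log(y)
}

res <- inv_log(c(3, 2, 1), verbose = TRUE)
print(res)
```

Printing intermediate values this way is non-interactive; browser() and debug() are the better choice when you need to inspect or step through the function's state.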
debugging
File inv_log.r:
# dummy function
inv_log <- function(x) {
  y <- 1/x
  browser()
  y <- log(y)
}
R session:
> # load inv_log.r source file
> source("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x){
+   y <- 1/x
+   browser()
+   y <- log(y)
+ }
> x <- c(3, 2, 1, 0, -1)   # create a vector
> inv_log(x)   # call function
Called from: inv_log(x)
Browse[1]> y   # check the values of local variables
[1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000
debugging
<RET>       Go to the next statement if the function is being debugged. Continue execution if the browser was invoked.
c or cont   Continue execution without single stepping.
n           Execute the next statement in the function. This works from the browser as well.
where       Show the call stack.
Q           Halt execution and jump to the top-level immediately.
To view the value of a variable whose name matches one of these commands, use the print() function, e.g. print(n).
debugging
> inv_log(x)   # call function (inv_log.r with browser(), as before)
Called from: inv_log(x)
Browse[1]> y
[1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000
Browse[1]> n
debug: y <- log(y)
Browse[2]>
Warning message:
In log(y) : NaNs produced
>
debugging
File inv_log.r:
# dummy function
inv_log <- function(x) {
  y <- 1/x
  y <- log(y)
}
R session:
> # load inv_log.r source file
> source("inv_log.r", echo = TRUE)
> # dummy function
> inv_log <- function(x){
+   y <- 1/x
+   y <- log(y)
+ }
> debug(inv_log)   # debug mode
> inv_log(x)   # call function
Called from: inv_log(x)
debugging in: inv_log(x)
debug: {
    y <- 1/x
    y <- log(y)
}
Browse[2]>
. . .
> undebug(inv_log)   # exit debugging mode
timing
Use the system.time() function to measure the time of execution.
> # make a function
> g <- function(n) {
+   y = vector(length=n)
+   for (i in 1:n) y[i] = i/(i+1)
+   y
+ }
timing
> # execute the function g() defined above, measuring the time of the execution
> system.time( g(100000) )
   user  system elapsed
  0.107   0.002   0.109
optimization How to speed up the code?
optimization
How to speed up the code?
- Use vectors!
> # using loops
> g1 <- function(x) {
+   y = vector(length=x)
+   for (i in 1:x) y[i] = i/(i+1)
+   y
+ }
> # using vectors
> x <- (1:100000)
> g2 <- function(x) {
+   x/(x+1)
+ }
>
optimization
> # execute the loop version
> system.time( g1(100000) )
   user  system elapsed
  0.107   0.002   0.109
> # execute the vectorized version
> system.time( g2(x) )
   user  system elapsed
  0.002   0.000   0.003
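Before comparing timings it is worth confirming that the loop and the vectorized version compute the same values. The sketch below restates both functions so that it is self-contained; the names `loop_result`, `vec_result` and `same` are illustrative, and the timings will differ from machine to machine.

```r
# Loop version: fill a preallocated vector element by element
g1 <- function(n) {
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i / (i + 1)
  y
}

# Vectorized version: operate on the whole vector at once
g2 <- function(x) x / (x + 1)

n <- 100000
loop_result <- g1(n)
vec_result  <- g2(1:n)

# Check that both approaches agree (up to floating-point tolerance)
same <- isTRUE(all.equal(loop_result, vec_result))
print(same)

# Extract only the elapsed time from each measurement
loop_time <- system.time(g1(n))[["elapsed"]]
vec_time  <- system.time(g2(1:n))[["elapsed"]]
cat("loop:", loop_time, "s; vectorized:", vec_time, "s\n")
```

all.equal() is used instead of == because it compares numeric values with a tolerance.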
optimization • How to speed up the code? • Avoid dynamically expanding arrays
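What "dynamically expanding" means can be sketched by comparing a vector that grows on every iteration with one that is allocated once. The function names `grow` and `prealloc` are hypothetical, and no particular timing is claimed since results depend on the machine and the R version.

```r
# Growing the result: c() copies the whole vector on every iteration
grow <- function(n) {
  y <- c()
  for (i in seq_len(n)) y <- c(y, i^2)
  y
}

# Preallocating: the vector is created once and filled in place
prealloc <- function(n) {
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i^2
  y
}

n <- 20000
t_grow <- system.time(a <- grow(n))[["elapsed"]]
t_pre  <- system.time(b <- prealloc(n))[["elapsed"]]

cat("growing:", t_grow, "s; preallocated:", t_pre, "s\n")
```

Both functions return the same values; the difference in run time usually becomes more pronounced as n increases, because the growing version copies ever longer vectors.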