Henrik Bengtsson hb@maths.lth.se Mathematical Statistics, Centre for Mathematical Sciences

# Henrik Bengtsson hb@maths.lth.se Mathematical Statistics, Centre for Mathematical Sciences

##### Presentation Transcript

1. The R.oo Package – Object-Oriented Programming With References Using Standard R Code
Henrik Bengtsson, hb@maths.lth.se
Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Sweden
DSC-2003, Vienna, March 20-22, 2003

2. Outline
• Purpose and what the package is and is not.
• RCC: R Coding Conventions (draft).
• Reference variables.
• The root class Object.
• setMethodS3() & setConstructorS3().
• Rdoc comments.
• Static methods.
• Virtual fields.
• trycatch() - exception handling based on class.

For slides etc: http://www.maths.lth.se/help/R/

3. Purposes
• End user (the most important person at the end of the day!)
  • Provide consistent object-oriented APIs across different packages, e.g. by having a well-defined naming convention for classes, methods, fields and variables.
  • Make class inheritance more explicit.
  • Provide a simpler API, e.g. fewer arguments.
  • More memory-efficient packages.
• Developer / programmer
  • Provide reference variables to reduce memory requirements and data redundancy.
  • R Coding Convention, e.g. naming conventions.
  • Create generic functions automatically.
  • Make code cleaner and remove the need for tedious code repetition.
  • Minimize the risk of package conflicts.
  • More code checking when creating methods and classes to catch errors early on.
  • Catch rare but “classical” bugs, e.g. using reserved words in method names.
  • Keep help pages up to date with the source code by allowing Rd documentation to be placed together with the code in the source files.

4. Real world example

    # Read all GenePix Result files
    gpr <- MicroarrayData$read(pattern="*.gpr")

    # Extract the foreground & background signals of the red and
    # the green channels. The slide layout is also included.
    raw <- as.RawData(gpr)

    # Get the background corrected signal as M=log(R/G) and A=log(RG)/2.
    ma <- getSignal(raw, bgSubtract=TRUE)
    normalizeWithinSlide(ma, method="p")   # print-tip normalization.

    knownGenes <- c(50,194,3433,5541,6384)
    # highlight() highlights the data points from the correct slide
    # in the correct space.
    plot(ma); highlight(ma, knownGenes)
    plotPrintorder(ma); highlight(ma, knownGenes)
    plotSpatial(ma); highlight(ma, knownGenes)
    plotSpatial3d(gpr, field="area", col=getColor(ma))

    # Write the normalized data to a tab-delimited file
    write(ma, "NormalizedExpressions.dat")

5. What the package is and isn’t
• It is not supposed to replace S3 or S4, but
• it is an extra layer on top of S3 (eventually S4), to
• move the focus from S3 & S4 details to object-oriented design and implementation.
• It has been tested and verified for more than two years!

6. RCC: R Coding Conventions (draft)
http://www.maths.lth.se/help/R/RCC/
• Standardizes the coding style.
• Examples of the naming conventions:
  • Variables, objects, fields and methods should start with a lower case letter, and methods should be verbs, e.g. shape$side and normalize().
  • Classes should be nouns starting with an upper case letter, e.g. MicroarrayData.
  • Constants should be in all upper case, e.g. Colors$RED.HUE.
  • Similar to Java.
• Standards
  • make the code (and the design) easier to read, share and maintain.
  • reduce the risk of bugs and misunderstandings.

7. Reference variables
• Memory efficient.
• Minimizes the amount of redundant data.
• Very useful for some data structures, e.g. graphs.
• References in R.oo are implemented using the environment data type.
• Collected by the R garbage collector.
• (More user-friendly method interfaces, since methods can “communicate” with each other by updating the state of the object.)

8. A common root class: Object
• All classes should have the common root class Object.
• A similar idea exists in R today, e.g. print(), as.character() etc., but a common root class makes it more explicit.

Methods of the class Object:

    $(name): ANY
    $<-(name, value)
    [[(name): ANY
    [[<-(name, value)
    as.character(): character
    attach(private=FALSE, pos=2)
    clone(): Object
    data.class(): character
    detach()
    equals(other): logical
    extend(this, ...className, ...): Object
    finalize()
    getFields(private=FALSE): character[]
    hashCode(): integer
    ll(...): data.frame
    static load(file): Object
    objectSize(): integer
    print()
    save(file=NULL, ...)

9. Object – the common root class
[Figure: class diagram spanning the packages com.braju.sma, R.oo, R.graphics and R.io, with Object as the common root. Classes shown: Object, Exception, RccViolationException, File, FileFilter, Reporter, HtmlReporter, LaTeXReporter, TextReporter, MultiReporter, RspEngine, BitmapImage, MonochromeImage, GrayImage, RGBImage, Color, Device, MicroarrayData, GenePixData, ImaGeneData, QuantArrayData, ScanAlyzeData, SpotData, SpotFinderData, RawData, RGData, MAData, TMAData, Layout, GalLayout.]

10. A common root class: Object
• All classes should have the common root class Object.
• A similar idea exists in R today, e.g. print(), as.character() etc., but a common root class makes it more explicit.
• Fields of an Object can be accessed as elements of a list, e.g.:
  • square$side and
  • square[["side"]] <- 23
• Methods can also be called as
  • square$getArea()
• The implementation of reference variables is taken care of within the Object class. Under the hood, we roughly have:

    "$.Object" <- function(object, name) {
      # unclass() avoids dispatching back to "$.Object" when fetching 'env'
      get(name, envir=unclass(object)$env)
    }
    "$<-.Object" <- function(object, name, value) {
      assign(name, value, envir=unclass(object)$env)
      invisible(object)   # a replacement function must return the object
    }

11. setMethodS3()
(Does not require the Object class.)
• Defines a method of a class.
• Creates a generic function automatically if and only if it is missing.
• RCC:
  • Methods should start with a lower case letter.
  • Asserts that a correct method name is used; reserved words and names of basic functions that must not be overwritten or redefined are protected.

    setMethodS3("plotPrintorder", "MAData", function(object, ...) {
      ...
    })

    setMethodS3("next", "Iterator", function(object, ...) {
      ...
    })
    Error: [2003-03-18 16:28:00] RccViolationException: Method names must not be same as a reserved keyword in R: next, cf. http://www.maths.lth.se/help/R/RCC/

12. Problems with generic functions
• Hard to check whether a function (generic or not) already exists.
• Ad hoc solutions for creating generic functions “automatically”.
• Under the S3 scheme, it is possible to create generic functions that are truly generic:

    normalize <- function(...) UseMethod("normalize")

  Note that the first argument is omitted. If not, it would be impossible to have default functions with no arguments, e.g. search().
• The R.oo package automatically creates generic functions as above.
• We are not aware of how to do the same in S4 (this is the main reason why R.oo is currently staying with S3).

13. setConstructorS3()
(Does not require the Object class.)
• Defines the constructor method of a class, but also the class.
• RCC:
  • Asserts that a correct class name is used; reserved words and names of basic functions that must not be overwritten or redefined are protected.
  • Class and constructor names should start with an UPPER CASE letter.
  • Constructors should be named the same as the class.

    setConstructorS3("MAData", function(M, A, layout=NULL) {
      extend(MicroarrayData(layout=layout), "MAData",
        M = as.matrix(M),
        A = as.matrix(A)
      )
    })

Constructor/class definition hybrid: Creates an object of the super class, which is then “extended” into an MAData object with additional fields.

14. Quick inspection of a class
• print(<class name>) or simply type the class name at the prompt and press ENTER, e.g.

    > MAData
    MAData extends MicroarrayData, Object {
      public A
      public layout
      public M
      ...
      normalizeWithinSlide(...)
      ...
      public plot(what="MvsA", ...)
      public plot3d(...)
      public plotPrintorder(what="M", ...)
      ...
      public print(...)
      public save(file=NULL, path=NULL, ...)
    }

[Figure: UML class diagram with Object, MicroarrayData, Layout and MAData. MicroarrayData: plot(...), plot3d(...), plotPrintorder(...), ... Layout: ngrid.c: integer, ngrid.r: integer, nspot.c: integer, nspot.r: integer, ...; getName(...): character, getId(...): character, ..., nbrOfSpots(): integer, nbrOfGrids(): integer, ... MAData: A: matrix, M: matrix; as.RGData(): RGData, ..., normalizeWithinSlide(...), normalizeAcrossSlides(...), ...]

15. Quick inspection of an object
• print(<object>) or simply <object> and ENTER at the prompt, which by default is equivalent to print(as.character(<object>)), e.g.

    > ma
    [1] "MAData: M (5184x4), A (5184x4), Layout: Grids: 4x4 (=16), spots in grids:18x18 (=324), total number of spots: 5184. Spot name's are specified. Spot id's are specified."

• ll(<object>) gives detailed information about the (public) fields, e.g.

    > ll(ma)
      member data.class     dimension  object.size
    1 A      SpotSlideArray c(5184,4)  166008
    2 layout Layout         1          428
    3 M      SpotSlideArray c(5184,4)  166008

    > ll(ma$layout)   # or ll(getLayout(ma))
       member       data.class dimension object.size
    1  geneGrps     NULL       0         0
    2  geneSpotMap  NULL       0         0
    3  id           character  5184      63868
    4  ngrid.c      numeric    1         36
    ...
    11 printtipGrps NULL       0         0
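
The automatic creation of generic functions described for setMethodS3() (slides 11 and 12) can be sketched in base R. This is only an illustration under stated assumptions, not the R.oo implementation: `defineMethod()` and its short list of reserved words are hypothetical names made up for this example, and the real setMethodS3() performs more checks.

```r
# Hypothetical helper (NOT R.oo's setMethodS3): register an S3 method and
# create a "truly generic" function on the fly if none exists yet.
defineMethod <- function(name, class, fn, envir = parent.frame()) {
  # A few reserved words only; this list is an assumption, not exhaustive.
  reserved <- c("if", "else", "repeat", "while", "function", "for", "in",
                "next", "break", "TRUE", "FALSE", "NULL", "NA", "Inf", "NaN")
  if (name %in% reserved) {
    stop("Method names must not be the same as a reserved keyword in R: ", name)
  }

  # Create the generic only if no function of that name can be found.
  if (!exists(name, envir = envir, mode = "function")) {
    # Build function(...) UseMethod("<name>") with the name filled in.
    generic <- eval(
      substitute(function(...) UseMethod(NAME), list(NAME = name)),
      envir
    )
    assign(name, generic, envir = envir)
  }

  # Store the method under the usual S3 name <generic>.<class>.
  assign(paste(name, class, sep = "."), fn, envir = envir)
  invisible(NULL)
}

# The first call creates the generic getArea(); the second one reuses it.
defineMethod("getArea", "Square", function(shape, ...) shape$side^2)
defineMethod("getArea", "Circle", function(shape, ...) pi * shape$radius^2)

sq <- structure(list(side = 3), class = "Square")
getArea(sq)   # 9

# A reserved word is rejected before anything is defined.
res <- tryCatch(
  defineMethod("next", "Iterator", function(it, ...) NULL),
  error = function(e) conditionMessage(e)
)
```

Because the generic takes only `...`, it places no constraint on the arguments of the methods, which is the point made on slide 12.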

16. Rdoc: Source-to-Rd converter
(Does not require the Object class.)

    #####################################################################/**
    # @Class Matlab
    #
    # \title{Matlab client for remote or local Matlab access}
    #
    # \description{
    #  @include "Matlab.declaration.Rdoc"
    # }
    #
    # \usage{
    #   matlab <- Matlab(host="localhost", port=9999, remote=FALSE)
    # }
    #
    # \arguments{
    #   \item{host}{Name of host to connect to.
    #     Default value is \code{localhost}.}
    #   \item{port}{Port number on host to connect to.
    #     Default value is \code{9999}.}
    #   \item{remote}{If \code{TRUE}, all data to and from the Matlab server will
    #     be transferred through the socket connection, otherwise the data will
    #     be transferred via a temporary file. Default value is \code{FALSE}.}
    # }
    #
    # \section{Fields and Methods}{
    #  @include "Matlab.methods.Rdoc"
    #  @include "Matlab.inheritedMethods.Rdoc"
    # }
    #
    # \examples{\dontrun{@include "Matlab.Rex"}}
    #
    # \author{Henrik Bengtsson, \url{http://www.braju.com/R/}}
    #
    # \seealso{
    #   Stand-alone methods \code{\link{readMAT}()} and \code{\link{writeMAT}()}
    #   for reading and writing MAT file structures.
    # }
    #
    # @visibility public
    #*/######################################################################
    setConstructorS3("Matlab", function(host="localhost", port=9999, remote=FALSE) {
      extend(Object(), "Matlab",
      ...

• Rdoc comments are Rd documentation within the source files:
  • easy to generate complete Rd files from source files.
  • less risk of forgetting to update Rd files.
  • automatically generates class hierarchy and method lists.
  • extra tags to include external files, e.g. example code.
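
To illustrate why Rd text embedded in comments is easy to turn into an Rd file, here is a rough base-R sketch. It is not the Rdoc compiler shipped with R.oo: `extractRdoc()` is a hypothetical function, and it assumes only what the Matlab example shows, namely that a block opens on a line containing `/**`, closes on a line containing `#*/`, and that every line in between starts with `#`. Tags such as `@Class` and `@include` are left untouched here.

```r
# Hypothetical extractor (NOT the R.oo Rdoc compiler): return the lines of
# the first Rdoc block with the leading comment marker removed.
extractRdoc <- function(lines) {
  start <- grep("/\\*\\*", lines)[1]   # first line containing "/**"
  end <- grep("#\\*/", lines)[1]       # first line containing "#*/"
  if (is.na(start) || is.na(end) || end <= start + 1) {
    return(character(0))               # no block, or an empty block
  }
  body <- lines[(start + 1):(end - 1)]
  sub("^# ?", "", body)                # strip "#" and one optional space
}

# A shortened version of the slide's example, given as a character vector.
src <- c(
  "#####/**",
  "# @Class Matlab",
  "#",
  "# \\title{Matlab client for remote or local Matlab access}",
  "#*/#####",
  "setConstructorS3(\"Matlab\", function(host=\"localhost\") {})"
)

rd <- extractRdoc(src)
```

A full converter would additionally expand the `@` tags, e.g. replace `@include` with the contents of the named file, before writing the result to an Rd file.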

17. Static methods
• Methods that are specific to a class and do not belong to a certain object.
• Keeps the focus on classes/objects, not methods.
• For instance, static method names are easy to remember for the end user (“first class then method”), e.g.
  • MicroarrayData$read("slide1.gpr")
  • Sound$read("chime.wav")
  • Colors$getHeatColors(1:10)
  instead of
  • readMicroarrayData("slide1.gpr")
  • readSound("chime.wav")
  • getHeatColors(1:10)
  which might not even be unique!

18. Virtual fields
• Virtual fields are fields that do not exist, but appear to do so because of existing methods get<Field>() and set<Field>().
• Example 1: The virtual field area of the Square class is defined by defining getArea() and setArea():
  • square$area will call getArea(square), which will return the area (calculated from the field side or in some other way).
  • square$area <- -12 will call setArea(square, -12), which then throws an OutOfRangeException.
• Example 2: Private fields, e.g. side, can be protected by defining setSide(), which throws a NoSuchFieldException.
• Example 3: The constant field RED.HUE can be write protected by defining setRED.HUE(), which throws an AssignmentException.
• Example 4: Provide cached fields, i.e. values that can be calculated from the other fields but are cached because they are accessed often and take a long time to calculate. The cache can be removed when memory is low.

19. Summary example

    setConstructorS3("Square", function(side=0) {
      # Creates an object of class Square. Square, whose fields are
      # defined at the same time, extends the class Shape.
      extend(Shape(), "Square",
        side = side   # 'side' is public
      )
    })

    setMethodS3("setSide", "Square", function(this, side) {
      # sq$side <- "a" will throw a NonNumericException
      if (!is.numeric(side))
        throw(NonNumericException("Trying to set the side of a square to a non-numeric value: ", side))

      # sq$side <- -12 will throw an OutOfRangeException
      if (side < 0)
        throw(OutOfRangeException("The side of a square must be zero or greater: ", side))

      this$side <- side   # The assignment persists after the method returns!
    })
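
How reference variables and virtual fields fit together can be sketched with plain environments in base R. This is a simplified illustration, not R.oo's Object class: `newObject()`, the class name `RefObject` and the helper `fieldEnv()` are hypothetical, the get/set functions are ordinary functions rather than S3 methods, and errors are raised with stop() instead of Exception objects.

```r
# Hypothetical constructor (NOT R.oo's Object()): fields live in an
# environment, so every copy of the handle refers to the same state.
newObject <- function(class, ...) {
  env <- new.env()
  list2env(list(...), envir = env)
  structure(list(env = env), class = c(class, "RefObject"))
}

# Raw access to the field environment, bypassing the "$" methods below.
fieldEnv <- function(x) .subset2(x, "env")

# x$name: call get<Name>(x) if such a function exists, else read the field.
`$.RefObject` <- function(x, name) {
  getter <- paste0("get", toupper(substring(name, 1, 1)), substring(name, 2))
  if (exists(getter, mode = "function")) {
    return(get(getter, mode = "function")(x))
  }
  get(name, envir = fieldEnv(x), inherits = FALSE)
}

# x$name <- value: call set<Name>(x, value) if it exists, else store the field.
`$<-.RefObject` <- function(x, name, value) {
  setter <- paste0("set", toupper(substring(name, 1, 1)), substring(name, 2))
  if (exists(setter, mode = "function")) {
    get(setter, mode = "function")(x, value)
  } else {
    assign(name, value, envir = fieldEnv(x))
  }
  invisible(x)
}

# setSide() guards the real field 'side'.
setSide <- function(this, side) {
  if (!is.numeric(side)) stop("Non-numeric side: ", side)
  if (side < 0) stop("The side of a square must be zero or greater: ", side)
  assign("side", side, envir = fieldEnv(this))
  invisible(this)
}

# 'area' is a virtual field: it is never stored, only computed.
getArea <- function(this) this$side^2
setArea <- function(this, area) {
  if (area < 0) stop("The area of a square must be zero or greater: ", area)
  this$side <- sqrt(area)   # goes through setSide()
  invisible(this)
}

sq <- newObject("Square", side = 2)
sq$area          # 4, computed by getArea()
sq$area <- 9     # handled by setArea(); side becomes 3

# A function can update the object without returning it.
grow <- function(s) { s$side <- s$side * 2; invisible(NULL) }
grow(sq)
sq$side          # 6

# An invalid assignment is rejected and leaves the state unchanged.
res <- tryCatch({ sq$side <- -12; "no error" },
                error = function(e) conditionMessage(e))
```

The call to grow() shows the reference behaviour: with an ordinary list, the change made inside the function would be lost when it returns.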

20. Extended exception handling
(Does not require the Object class.)
• Throw Exception objects, which can be caught (quietly) based on class, e.g.

    trycatch({
      # Calls setSide(), which throws an OutOfRangeException.
      sq$side <- -12
    }, NonNumericException = {
      cat("The side of a square must be a numeric value.\n")
    }, ANY = {
      # catches any other types of Exception (also try-error).
      print(Exception$getLastException())
    }, finally = {
      # always double the side whatever happens.
      sq$side <- 2*sq$side
    })

    Error: [2003-03-08 12:11:43] OutOfRangeException: The side of a square must be zero or greater: -12

[Figure: class diagram with the R.oo package and the classes Object, Exception, RccViolationException, OutOfRangeException and NonNumericException. Methods of Exception: static getLastException(): Exception, getMessage(): character, getWhen(): POSIX time, throw().]
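
The same idea, catching by class, can be tried without R.oo using the condition system in base R. The sketch below uses base `tryCatch()`, `stop()` and `conditionMessage()`; it is not the package's `trycatch()`, and the constructor `exception()` and the simplified `setSide()` are hypothetical names introduced only for this example.

```r
# Hypothetical constructor for a classed error condition (base R only).
exception <- function(class, ...) {
  structure(
    class = c(class, "Exception", "error", "condition"),
    list(message = paste0(...), call = NULL)
  )
}

# Simplified setter on an ordinary list; it signals classed conditions.
setSide <- function(sq, side) {
  if (!is.numeric(side)) {
    stop(exception("NonNumericException", "Non-numeric side: ", side))
  }
  if (side < 0) {
    stop(exception("OutOfRangeException",
                   "The side of a square must be zero or greater: ", side))
  }
  sq$side <- side
  sq
}

sq <- list(side = 2)
events <- character(0)

result <- tryCatch({
  setSide(sq, -12)                       # signals an OutOfRangeException
}, NonNumericException = function(e) {
  "non-numeric"                          # not used for this error
}, Exception = function(e) {
  paste("caught", class(e)[1])           # any other Exception subclass
}, finally = {
  events <- c(events, "finally ran")     # runs whatever happens
})

result   # "caught OutOfRangeException"
```

Handlers are tried in the order given and the first one whose name matches a class of the condition is used, so the more specific classes should be listed first.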

21. Future
• Make the API (even) more similar to the S4 API.
  • Makes transitions between R.oo and S4 easier.
  • Less confusing for beginners.
• Make an S4 version of the package
  • when the problem “generic functions are too restricted on matching argument” is solved.
• Make it easier to declare private fields or constants.
• Implement the mechanisms for field access in native code.
• Publish R.oo on CRAN.
  • Requires a stable API. After 2+ years it is indeed very stable, but any major changes after v1.0 will be annoying for the user.

22. Acknowledgments
• The R development team
• People on the r-help mailing list
• All users that have given feedback on the project

See http://www.maths.lth.se/help/R/ for RCC, more documentation, help, examples, and installation of the R.classes bundle (R.audio, R.base, R.graphics, R.io, R.lang, R.matlab, R.oo, R.tcltk, R.ui) and the cDNA microarray package com.braju.sma.