Thursday, September 13, 2012

OO in R

The R Project

"Is there a package for obfuscating code in #rstats?", someone asked. "The S4 object system?!" came the snarky reply. If you're smiling right now, you know that it wouldn't be funny if it weren't at least a little bit true.

Options: S3, S4 or R5?

There can be little doubt that object oriented programming in R is the cause of some confusion. We'll look at S4 classes more closely in a minute, but be warned that S4 classes are just one of at least three object systems available to the R programmer:

  • S3: simple and lightweight
  • S4: formal classes implemented by the methods package
  • R5: Reference classes

It's not super clear when to use which, at least not to me. It seems to depend strongly on style and personal preference. The Bioconductor folks, for example, make heavy use of S4 classes. Google, on the other hand, advises to "avoid S4 objects and methods when possible".

Here's the way it looks to me. S3 classes feel a bit like Javascript classes - easy, loose and informal. S4 classes are rigid, verbose and harder to understand. But, they offer a better separation between interface and implementation, along with some advanced features like multiple dispatch, validation and type coercion. Reference classes (aka R5) encapsulate mutable state and look more like familiar Java-style classes. They're new and pass-by-reference can violate expectations of R users.

An S4 class example

Now, let's return to S4 classes with a simple example. First, we define a class to represent people.

# define an S4 class for people
setClass(
  "Person",
  representation(name="character", age="numeric"),
  prototype(name=NA_character_, age=NA_real_)
)

A person has a name and an age, which default to NAs of their respective types - character string and numeric. For the sake of demonstrating polymorphism, let's define a couple subclasses.

# define subclasses for different types of people
setClass("Musician",
  representation(instrument="character"),
  contains="Person")

setClass("Programmer",
  representation(language="character"),
  contains="Person")

There's no reason not to write normal R functions that take S4 classes as arguments. Polymorphism is called for when a method has different implementations for different classes. In that case, we declare a generic method.

# create a generic method called 'talent' that
# dispatches on the type of object it's applied to
setGeneric(
  "talent",
  function(object) {
    standardGeneric("talent")
  }
)

The following code implements two subtypes of person, each with a talent for something.

setMethod(
  "talent",
  signature("Programmer"),
  function(object) {
    paste("Codes in", 
      paste(object@language, collapse=", "))
  }
)

setMethod(
  "talent",
  signature("Musician"),
  function(object) {
    paste("Plays the",
      paste(object@instrument, collapse=", "))
  }
)

Now, let's make some talented people.

# create some talented people
donald <- new("Programmer",
  name="Donald Knuth",
  age=74,
  language=c("MMIX"))

coltrane <- new("Musician",
  name="John Coltrane",
  age=40,
  instrument=c("Tenor Sax", "Alto Sax"))

miles <- new("Musician",
  name="Miles Dewey Davis",
  instrument=c("Trumpet"))

monk <- new("Musician",
  name="Theloneous Sphere Monk",
  instrument=c("Piano"))

talent(miles)
[1] "Plays the Trumpet"

talent(donald)
[1] "Codes in MMIX"

talent(coltrane)
[1] "Plays the Tenor Sax, Alto Sax"

Mutability

One common stumbling block with S4 classes concerns changes in state. For instance, we might want to give our hard-working employees a raise.

setClass("Employee",
  representation(boss="Person", salary="numeric"),
  contains = "Person"
)

setGeneric(
  "raise",
  function(object, percent=0) {
    standardGeneric("raise")
  }
)

setMethod(
  "raise",
  signature("Employee"),
  function(object, percent=0) {
    object@salary <- object@salary * (1+percent)
    object
  }
)

True to it's functional heritage, R deals with immutable values. Changes in state happen by making new objects. The trick is to return the new object from the mutator methods and capture it on the way out.

smithers <- new("Employee",
  name="Waylon Smithers",
  boss=new("Person",name="Mr. Burns"),
  salary=100000)

# doesn't work?!?!
raise(smithers, percent=15)
smithers@salary
[1] 100000

Setting a new salary creates a new value. Notice that we return the modified object from the raise function. Don't forget to catch it.


# remember to reassign smithers to the new value
smithers <- raise(smithers, percent=15)
smithers@salary
[1] 115000

Multiple Inheritance

Through the magic of multiple inheritance, the lowly Code Monkey is both a programmer and an employee. Just set the contains value to indicate its two parent classes.


setClass("Code Monkey",
  contains=c("Programmer","Employee"))

setMethod(
  "talent",
  signature("Code Monkey"),
  function(object) {
    paste("Codes in",
      paste(object@language, collapse=", "),
        "for", object@boss@name)
  }
)

chris <- new("Code Monkey",
  name="Chris",
  age=29,
  boss=new("Person", name="The Man"),
  salary=2L,
  language=c("Java", "R", "Python", "Clojure"))

talent(chris)
[1] "Codes in Java, R, Python, Clojure for The Man"

So, there you have it - encapsulation, polymorphism and inheritance in S4 classes. Complete code for this example is in gist:3670578.

OO in R resources

It's lucky that there are loads of places to go to learn about S4 classes.

3 comments:

  1. Totally OT, but: Trane played tenor and soprano. I'm not aware of any tracks on which he played alto.

    ReplyDelete
    Replies
    1. @cellocgw, No not off topic at all. Thank you for the correction. What a total flub on my part. I should be forced to listen to a Justin Bieber album as a punishment.

      Delete
  2. Heh. http://www.thegearpage.net/board/showthread.php?t=756171

    ReplyDelete