In 2021 I tried to create a data application in R using Java-style object-oriented programming. It worked, but there was a problem...
In this post I am going to share details about how exciting my dating life has been recently. You see, after a relative period of… oh wait; hold on… sorry; looks like I mixed up posts here. That last bit was obviously intended to be the start of a post for a very different kind of personal blog…
What this post is actually going to be about is why you probably shouldn’t use Java-style object-oriented programming (OOP) when writing code for the data programming language R. Don’t worry though: if you’re interested in hearing details about my dating life I may still be able to work in a few details here and there. Just keep reading…
Section 1 – On the challenge of learning R
R is an interpreted GPL-licensed specialty language intended for statistical and analytical applications. This makes it different from languages such as Python or Java which are general-purpose programming languages which can be used—with heavy modification—for data and analytics programming. What does this mean, practically?
In R, robust support for linear algebra, regression analysis, tabular data processing, and chart creation is part of the standard library. Almost every non-function object in the language is also a vector or a matrix, and if you want to get the square of every number between 2 and 99 you just type (2:99)^2.
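To make the vectorization point concrete, here is a minimal sketch in base R. The variable names are purely illustrative.

```r
# Square every integer from 2 to 99 in one vectorized expression.
squares <- (2:99)^2

length(squares)   # 98 values
head(squares, 3)  # 4 9 16

# The same idea extends to matrices: arithmetic operators work element-wise,
# while %*% performs true matrix multiplication.
m <- matrix(1:4, nrow = 2)
scaled  <- m * 2     # element-wise scaling
product <- m %*% m   # matrix product
```

No loops, no imports: the operators already know what to do with a whole vector or matrix.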
That R (both the language and the interpreter that processes the language) was built around data also means that data processing in R is both fast and syntactically easy to accomplish. And R’s interpreter makes it easy to explore and experiment with data via the command-line console. If your project is configured right, rapid testing and debugging are also a breeze. The difficulty, though, is learning enough about the core language so that coding in R is easy.
The haRd part
If there is a single problem with R it’s that there is so much documentation related to it, and so much to learn, that novices may struggle to know where to even begin. People who are trying to learn how to program for the first time in addition to learning R are in for a very rough time. Even if you are coming from a language like Python you will feel like you are all thumbs and that everything about the language is awkward. (That goes double if you do what I did and try to master R and vim in parallel.)
The attention span and patience required to learn the R language with any depth is high, which is why you sometimes—but not always—see people with PhDs using it most effectively. Unlike Python or even Java this was never a language for the masses.
The documentation on the R standard library contains 3,000 pages worth of detailed function definitions and examples, and while almost every function needed to use R effectively already exists in the standard library, finding it is 80% of the effort: patience and tenacity are paramount.
Still, not all libraries in the standard library are equal, as I discovered…
OOPs!
While R documentation is robust, there is a rare but notable hole in it that has not been fixed for years. Curiously enough, this deficit involves the “methods” library, a library which (in part) allows for object-oriented programming (OOP) in R via Reference Classes.
(Note: I’ll be referring to the Reference Class in the R methods package as “OOP in R” for the remainder of the article.)
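For readers who have not met Reference Classes before, here is a minimal sketch of what this style looks like. The Counter class, its field, and its method are hypothetical names made up for illustration; only setRefClass() and the surrounding machinery come from the methods package.

```r
library(methods)

# Hypothetical example class; the name and fields are illustrative only.
Counter <- setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function(by = 1) {
      # Fields are modified in place with <<- (reference semantics).
      count <<- count + by
      invisible(.self)
    }
  )
)

counter <- Counter$new(count = 0)
counter$increment()
counter$increment(by = 2)
counter$count  # 3

# Unlike ordinary R values, assignment does not copy a reference object.
alias <- counter
alias$increment()
counter$count  # 4
```

If you squint, it looks a lot like a Java or Python class, which is exactly the appeal.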
All powerful programming languages have their quirks. Python has its weird (and poorly explained) pass-by-pointer-as-value list() object. JavaScript has its surprisingly worthless Date class. (See, I told you this would touch on my “dating” life!) And R has, among other things, the “methods” library.
While the R documentation—if you poke around the manual—certainly does instruct you on how you can implement OOP-style objects successfully in R using the standard library, I learned that while you can use OOP in R for data visualization projects you probably shouldn’t.
Section 2 – Object-oriented programming in R works but…
I used OOP in R on a few personal projects from late 2020 through early 2021. I even built a theoretically extensible tool in R using OOP that was intended to serve as a template for all my personal R projects going forward; an ambitious goal.
One of these projects was a data animation tool for R I completed from start to finish with no background whatsoever in animation, only moderate experience in R, and a lot of experience programming in Python using OOP. It took me roughly 4-5 weeks of my own time to execute it, and the results of my efforts were kind of impressive:
The tool I created could automatically generate complex multi-part animations using an animations API I wrote from the ground-up. And as you can see it supports not only some interesting one-to-many transitions, but there is also that nice-looking right-side legend, which I pulled off using only the R “graphics” library and some clever R coding tricks.
Early in the project though I made a decision to implement this using OOP in R, fearing that if I didn’t it would cause my code to snowball in complexity. At first everything worked perfectly, and using OOP in R allowed me to easily reuse large chunks of code. As I tried to make modifications and extensions to my code though some issues started to appear…
“Will, how is it you R still single?!?”
The problem, I quickly found, is that packages in R written using OOP are not as easy to organize as packages written without it, and it’s much more difficult to navigate code written using OOP in R than non-OOP code.
While you can use tools like rtags on non-OOP projects to get around ordinary code easily (ctags if you are using vim), they don’t work nearly as well for OOP in R. Documentation and debugging for OOP in R also pose a unique challenge, and it’s far more cumbersome to document and debug your work in OOP than it is in unadorned R.
While I found it was certainly easy to write OOP modules in R, coming back to the code a few weeks later and figuring out what I needed to change or add without breaking everything started to feel less like breezy software development and more like heavy manual labor: R is really not built for OOP in the same way that Java, Python, and C++ are.
A solution to the problem of OOP in R?
The solution, I’ve found, to tackling the issue of code complexity in R is not to force the use of OOP but to structure data tools so they capitalize on the unique features of the R language. While you use OOP in Java because there is no other way to program in Java, and you may use OOP in Python to handle certain complex object relationships, there is rarely a good reason to use it in R.
The animations you see scattered around this post—minus the ones from 2021—were all quickly created using an environment-based approach to code compartmentalization that is native to R and does not use Java-style OOP.
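This post does not show the actual code behind those animations, so the following is only a sketch of one common way to compartmentalize code with environments in base R. Every name in it (new_frame_counter, palette_tools, lighten) is hypothetical.

```r
# Hypothetical "module" whose private state lives in a function's environment.
new_frame_counter <- function(start = 0) {
  count <- start  # not visible to callers

  increment <- function(by = 1) {
    count <<- count + by  # updates the enclosing environment's `count`
    invisible(count)
  }
  current <- function() count

  # Expose only the functions; `count` stays hidden behind them.
  list(increment = increment, current = current)
}

frames <- new_frame_counter()
frames$increment()
frames$increment(by = 2)
frames$current()  # 3

# Alternatively, group related helpers in an explicit environment.
palette_tools <- new.env()
palette_tools$lighten <- function(x, amount = 0.1) pmin(x + amount, 1)
lightened <- palette_tools$lighten(c(0.2, 0.95))
```

Everything here is ordinary functions and environments, so the usual R tooling for navigating, documenting, and debugging functions still applies.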
While I know how strong the modern trend is to try and turn popular languages into a flavor of Python, “programming in R with R” really is the best approach to R programming, rather than trying to make it conform to other programming languages’ paradigms.
To be clear, I’m still a huge fan of OOP in other languages, but going forward I won’t be using OOP in R (or even a third-party OOP library like R6) for my R data projects. If I feel I absolutely must use OOP in my R project for some reason, I’d consider writing a C++ extension for R that uses OOP before I’d go back to OOP in base R.
Section 3 – All’s well that ends well
To be clear, modern S4 classes in R—not to be confused with OOP in R—are nifty and add much-needed language-level type-enforcement features to R. The issue I have is with these “reference class” objects, which are used to implement OOP-style coding in the R standard library. And my problem with them isn’t that they don’t work but that the nature of R makes them more difficult to debug, document, maintain, and re-use.
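As a rough illustration of the type enforcement S4 provides, here is a minimal sketch. The Frame class and its slots are hypothetical, invented for this example; setClass(), new(), and the validity mechanism are part of the methods package.

```r
library(methods)

# Hypothetical class for illustration; slot names are assumptions.
setClass(
  "Frame",
  slots = c(index = "integer", label = "character"),
  validity = function(object) {
    if (length(object@index) != 1L || object@index < 1L) {
      return("index must be a single positive integer")
    }
    TRUE
  }
)

ok <- new("Frame", index = 1L, label = "intro")

# Supplying the wrong type for a slot fails at construction time.
bad_type <- tryCatch(
  new("Frame", index = "one", label = "intro"),
  error = function(e) conditionMessage(e)
)

# The validity function rejects values of the right type but wrong range.
bad_value <- tryCatch(
  new("Frame", index = 0L, label = "intro"),
  error = function(e) conditionMessage(e)
)
```

Both failures surface as ordinary R errors at the point where the object is created, rather than somewhere downstream.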
In any case, building on my hits and misses in R, in late 2021 I was able to use R to effectively (and quickly) put together a solid data analysis presentation as part of the interview process for the Vanguard Group. The presentation allowed me to land a lead analyst job there, along with a healthy fifteen months of highly-productive full-time (remote!) employment alongside some really nice people.
(Interesting detail: while the analysis I produced using R played a major role in getting hired, what apparently sold them on me was that I aced the Python coding test they administered. I was actually told I could have used nothing but Excel to prepare my data analysis presentation and still have gotten the job. Go figure.)
I do recommend learning base R if you have the time: it does good things to your brain (usually). But let me come full circle: does R do good things for your dating life as well? Uh…
Anyway, if you take the time to learn the R language’s weirdness and don’t expect it to behave like Python or Java—and you stick to the standard library—you’ll be all right. Just remember that even though you can do OOP in R that doesn’t mean you should.
Suggested Readings on R
The R Manuals
An Introduction to R: A Sample Session by the R Core Team
Time Series Analysis and Its Applications With R Examples (2nd edition), by Shumway and Stoffer
S, R, and Data Science, by John M. Chambers (pdf)
Suggested Tools for R
Vim
gedit (seriously)
The R interactive terminal