Measuring performance of functions in R

In R you can use the system.time() function to measure how long it takes to execute an expression. Because I had to do some extended performance testing in R, I wrote a new function based on system.time that lets you run multiple samples of a function, to increase the reliability of the measurements. The function is called performance, and takes three arguments: the function you want to test, the number of samples (1 by default), and whether garbage collection should be performed before each run (yes by default).

For instance:

> performance(complex.function(1,5,"goal"), samples=100)

Average time per run:
User System Elapsed
0.338 0.014 0.352

Total time for all runs:
User System Elapsed
33.805 1.369 35.212
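For illustration, here is a minimal sketch of how such a wrapper around system.time could be implemented. This is a hypothetical reimplementation, not the original code; the argument names (expr, samples, gc.first) are my own:

```r
# Sketch of a system.time-based benchmarking wrapper (hypothetical
# implementation; argument names are my own, not the original function's).
performance <- function(expr, samples = 1, gc.first = TRUE) {
  expr <- substitute(expr)                 # capture the call unevaluated
  env  <- parent.frame()                   # evaluate it in the caller's environment
  total <- c(user = 0, system = 0, elapsed = 0)
  for (i in seq_len(samples)) {
    if (gc.first) gc(FALSE)                # optionally collect garbage before each run
    total <- total + system.time(eval(expr, env))[1:3]
  }
  cat("\nAverage time per run:\n"); print(round(total / samples, 3))
  cat("\nTotal time for all runs:\n"); print(round(total, 3))
  invisible(total)
}
```

Capturing the expression with substitute() ensures the function call is re-evaluated on every sample, rather than only once when the argument is first used.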

I included 'performance' in my R basic functions (see also Customizing R: startup script).

Set your R working directory in TextWrangler

Yesterday I figured out (together with a friend, Leendert) how to set your R working directory to the path of the current document you're working on in TextWrangler.

We developed two scripts: one for setting the working directory of R directly, and one for adding setwd('/the/directory/of/your/current/document/') to your current document.

If you would like to use them, download the scripts below, place them in ~/Library/Application Support/TextWrangler/Scripts/, and assign shortcuts using TextWrangler's Window -> Palettes -> Scripts palette.

Download the scripts here:
the script to set R's working directory directly
the script to add the setwd command to your document

Labeled outliers in R boxplot

Boxplots are a good way to get some insight into your data, and while R provides a fine ‘boxplot’ function, it doesn’t label the outliers in the graph. However, with a little code you can add labels yourself:
The numbers plotted next to the outliers indicate the row number in your original dataframe. Admittedly, it could look nicer, but it’s enough to trace your outliers back to your data.

It’s done by this code:
#create boxplot
boxdata <- with(data, boxplot(dependent ~ condition1 * condition2, range=4))

#for each outlier in boxdata
for(i in 1:length(boxdata$group)){
    #label the outlier with the matching row number(s) of the original dataframe
    text(boxdata$group[i], boxdata$out[i],
         which(data$dependent==boxdata$out[i]), pos=4)
}

You can download the code here.

Region Of Interest analysis of BOLD data

For my research I do Region-Of-Interest (ROI) analyses of BOLD fMRI data. That is, we look at how brain activity develops in certain predefined brain regions during an experiment. On this site I describe a script that automates these analyses, using SPM, MarsBaR, Matlab, & R.

Let’s start with a short example of a Region-Of-Interest analysis: when you let a person look at a stimulus, you will find a certain pattern of brain activation in the fusiform gyrus (the ROI in this case; it is involved in visual processing):
At scan 1 the stimulus was presented; you can see that the BOLD response is sluggish: only 3 scans later (= 4.5 seconds) does the activity reach its peak. How do we get from the raw scanner data to such graphs; how are these ROI analyses done?

fMRI analyses are often performed using the software package SPM (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging). For doing Region-Of-Interest analyses in SPM, the toolbox MarsBaR was developed. MarsBaR can be used to define ROIs, and to get the raw BOLD data from such regions (MarsBaR can also be used for more sophisticated analyses, see the MarsBaR website).

While MarsBaR is really useful as it is, it gets a little tedious to do the analyses by hand. As I often do the same sort of analyses (analyzing 16 predefined regions), I automated the whole procedure in a shell script, called jpbplot. The script calls an R file, which, in turn, executes a Matlab function that calls SPM and MarsBaR.

The script takes as input four arguments: the design of your experiment (more on that below), the ROI(s) you want to analyze, the names of your conditions, and the scans you want to use as a baseline. For example, to analyze a visual and a manual ROI with a two condition design, you could execute the following in a terminal window:

jpbplot experimentDesign.df visual_roi.mat,motor_roi.mat condition1_condition2 1,2

which will give you two pdf-files: the average BOLD response in each ROI
(see the example on the right) and the BOLD response per participant in each ROI. Furthermore, it will output a data file with the raw BOLD values in your ROI(s), per participant and condition, which can be used for further analyses.

As described above, you have to tell the script which ROIs to analyze. The ROIs have to be made with MarsBaR; 16 example ROIs are included in the zip-file containing jpbplot. These are the regions that are used for ACT-R fMRI research; as a shortcut you can use ‘ACT-R’ instead of these ROI-names when you call the script. For example, this will analyze all ACT-R regions:

jpbplot experimentDesign.df ACT-R condition1_condition2 1,2

You also have to provide the script with a ‘design’-file containing your experimental design. The design-file contains a list of your raw scan-files on disk, linking those files to participant number, condition, and scan within a trial. Thus, say one trial takes 3 scans (= 6 seconds) and you have two conditions that can both be EASY or HARD, then the top of the design file may look like this (the first 5 trials of the first participant):

1 EASY HARD 1 /Volumes/fMRI/pp1/functional/smoothed_00001
1 EASY HARD 2 /Volumes/fMRI/pp1/functional/smoothed_00002
1 EASY HARD 3 /Volumes/fMRI/pp1/functional/smoothed_00003
1 EASY EASY 1 /Volumes/fMRI/pp1/functional/smoothed_00004
1 EASY EASY 2 /Volumes/fMRI/pp1/functional/smoothed_00005
1 EASY EASY 3 /Volumes/fMRI/pp1/functional/smoothed_00006
1 HARD HARD 1 /Volumes/fMRI/pp1/functional/smoothed_00007
1 HARD HARD 2 /Volumes/fMRI/pp1/functional/smoothed_00008
1 HARD HARD 3 /Volumes/fMRI/pp1/functional/smoothed_00009
1 EASY HARD 1 /Volumes/fMRI/pp1/functional/smoothed_00010
1 EASY HARD 2 /Volumes/fMRI/pp1/functional/smoothed_00011
1 EASY HARD 3 /Volumes/fMRI/pp1/functional/smoothed_00012
1 HARD EASY 1 /Volumes/fMRI/pp1/functional/smoothed_00013
1 HARD EASY 2 /Volumes/fMRI/pp1/functional/smoothed_00014
1 HARD EASY 3 /Volumes/fMRI/pp1/functional/smoothed_00015

Thus, each line contains the participant number, the two conditions, the scan-within-a-trial, and the filename of the scan file (in that order; trials are not numbered, and it is assumed that the scan-files are listed in the order in which they were acquired). If you have more or fewer than two conditions, you simply include more or fewer columns: if you have only one condition, you could leave out the third column; if you have 12 conditions, columns 2-13 would describe the conditions. The list should contain all scan-files you want to analyze; each trial must have the same number of scans (you can remove scans for subsequent analyses using the raw data file that is part of jpbplot’s output). A complete example.df is included in the zip-file.
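If you want to sanity-check a design file before running jpbplot, you can also read it into R yourself. A minimal sketch, using a few of the example rows above (the column names are my own labels for illustration; jpbplot parses the file internally):

```r
# Sketch: reading a jpbplot-style design file into R for a sanity check
# (column names are my own labels, not part of jpbplot itself).
rows <- c("1 EASY HARD 1 /Volumes/fMRI/pp1/functional/smoothed_00001",
          "1 EASY HARD 2 /Volumes/fMRI/pp1/functional/smoothed_00002",
          "1 EASY HARD 3 /Volumes/fMRI/pp1/functional/smoothed_00003")
design <- read.table(text = rows,
                     col.names = c("pp", "cond1", "cond2", "scan", "file"),
                     stringsAsFactors = FALSE)

# every trial must contain the same number of scans, so the total number
# of rows should be a multiple of the scans-per-trial
stopifnot(nrow(design) %% max(design$scan) == 0)
```

In practice you would replace the inline rows with read.table("experimentDesign.df", ...).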

jpbplot is based on jmfplot, which can be used for the same kind of analyses, but uses different analysis and ROI software than SPM and MarsBaR. jmfplot was developed by Jon M. Fincham.

If you would like to use jpbplot, you can find a zip-file here with the scripts and ROIs. The zip-file also contains a readme.txt which explains how to install jpbplot and describes the design file in a little more detail. If you have any questions, don’t hesitate to email me!

In case you use jpbplot, please cite the following paper:

Borst, J.P., Taatgen, N.A., Stocco, A., & Van Rijn, H. (2010). The Neural Correlates of Problem States: Testing fMRI Predictions of a Computational Model of Multitasking. PLoS ONE 5(9): e12966. doi:10.1371/journal.pone.0012966. (pdf) (url)


Customizing R: startup script

While R is certainly a great tool, some functions seem to be missing. Take the standard error, for instance: for some reason there is no standard function for it, while you would expect se(x) to work. Therefore I usually include a couple of functions in my R-scripts, like a standard error, a correct round() function (the standard round(2.5) returns 2, because R rounds half to even), and some others.
Last week, I decided to come up with a constructive solution. When R starts up, it loads an .Rprofile file in your home directory (if it exists), and executes a .First() function. The solution is therefore really easy: you put such a file in your home directory (renamed to .Rprofile; note that Snow Leopard might try to resist this, in that case use the terminal: mv dotRprofile .Rprofile), which sources another file:

.First <- function(){
    cat("\nWelcome to R!\n",sep="")
    if(file.exists("~/RbasicFunctions.r")){
        source("~/RbasicFunctions.r")
        cat("RbasicFunctions.r was loaded, providing the following functions:\n\n",sep="")
        print.functions()
    }
}

As you can see, this file sources ‘RbasicFunctions.r’ if it exists, and executes print.functions(). RbasicFunctions.r is the file containing my functions, and that file also defines the print.functions() that displays which functions are in the file. You can either put the file itself in your home directory, or make a symbolic link to it with ‘ln -s RbasicFunctions.r target’. Now, when R starts up, it shows this:

Welcome to R!

RbasicFunctions.r was loaded, providing the following functions:

Rounding etc.:
roundup(x, numdigits=0) - correct rounding of .5 etc.
round.largest(x) - round to largest digit, e.g., 54 -> 50
ceiling.largest(x) - ceiling to largest digit, e.g., 54 -> 60
is.odd(x) - returns TRUE if roundup(x) is an odd number
is.even(x) - returns TRUE if roundup(x) is an even number

Standard errors, error bars, rmsd etc:
se(x) - standard error
rmsd(x) - root mean squared deviation

And so on: now I’ve got all the functions I was missing in R. Each time I come across something else that’s missing, I just have to extend the RbasicFunctions.r file.
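Two of the listed functions could be sketched like this (my own minimal versions; the implementations in RbasicFunctions.r may differ in detail):

```r
# Standard error of the mean (minimal version).
se <- function(x) sd(x) / sqrt(length(x))

# roundup: round .5 away from zero, unlike R's built-in round(),
# which rounds 2.5 to 2 (round-half-to-even).
roundup <- function(x, numdigits = 0) {
  f <- 10^numdigits
  sign(x) * floor(abs(x) * f + 0.5) / f
}
```

With these definitions, roundup(2.5) gives 3 while the built-in round(2.5) gives 2.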

Analyzing Nike+ using R

A couple of weeks ago, I finally took up running seriously. To keep myself motivated -- especially on dark winter nights -- I bought the Nike+ iPod sports kit. The sports kit was developed in a partnership between Nike and Apple, and consists of a tiny accelerometer that you attach to one of your shoes (according to Nike it should go in a special built-in pocket of your special Nike+ compatible shoe, but there are of course plenty of other solutions), and a receiver you attach to your iPod. The accelerometer sends its data to your iPod, which in turn can tell you how far you ran and how far you still have to go. So far so good, but what you probably want to do is analyze the data you have painfully gathered. And the way to do that, according to Apple and Nike, is to let iTunes send your data to the nikeplus website. This website is heavily Flash-based, slow, and only shows general trends -- it doesn’t let you touch your raw data. And that is exactly what I wanted. A couple of existing programs give you slightly better control over your data (e.g., Runner’s Log, TrailRunner), but none of them catered to my needs.

To solve this I decided to write an R-script: what better way to analyze data than with R? Fortunately, the running data is stored in XML files on your iPod, which makes life easy. The script starts by copying new .xml files from a connected iPod (if there is one) to your hard disk, and parses the contents of the new files. Then, it displays the speed and pace of your last run, and graphs of speed, pace and distance of all your runs to date. For example:


Of course, as the script gives you access to the raw data, you can do with it what you like.


The last thing the current script does is import a csv training schedule (e.g., 12/30/08,2,3.218688 - date/miles/km). It displays the distance you have to run each day, and shows what you have already managed:


If you would like to try it out, you can download my script here. If you have any questions about the script, please feel free to send me an email.
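For reference, a training-schedule line in the format described above can be parsed like this (a sketch; the actual script's handling of the columns may differ):

```r
# Sketch: parse one date,miles,km training-schedule line
# (the column names are my own labels for illustration).
schedule <- read.csv(text = "12/30/08,2,3.218688", header = FALSE,
                     col.names = c("date", "miles", "km"))
schedule$date <- as.Date(schedule$date, format = "%m/%d/%y")
```

From here it is straightforward to merge the schedule with the distances extracted from the Nike+ files, by date.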

TextWrangler and R

R is an environment for statistical computing (see also ‘Data Analysts Captivated by R's Power’ in the New York Times) which I use on an almost daily basis. However, the included text editor is not all that great, which is why I use TextWrangler to edit my R scripts. To get the most out of this combination, I searched the internet for syntax highlighting and for an AppleScript to execute my code directly from TextWrangler in R. It took me a while to find it, especially a script that executes only the selection in TextWrangler (as opposed to the complete file). This is the best I could find:
  • the syntax highlighting file. Save this file in ~/Library/Application Support/TextWrangler/Language Modules/ (make sure you keep the ‘.plist’ extension), and restart TextWrangler. Go to TextWrangler’s preferences -> Languages, and add a new suffix mapping (e.g., .r to R language).
  • an AppleScript to execute the selection in TextWrangler. Put this in ~/Library/Application Support/TextWrangler/Scripts/. You can assign a shortcut by going to the Window menu in TextWrangler, then choosing Palettes -> Scripts and assigning a shortcut, for instance cmd-return for the same behavior as the built-in R text editor. The script executes either your current selection, or the line your cursor is on. Note: for some versions of TextWrangler, you will have to rename the AppleScript from .txt to .scpt for TextWrangler to recognize the file (it turns out that you sometimes have to save the file with Script Editor to get TextWrangler to recognize it, thanks Brian!).

Update September 18, 2009:
If you like the scripts above, you might also be interested in:
  • a script to automatically set your working directory in R to your TextWrangler document's directory.
  • a start-up script for R with a couple of useful functions that should've been in R in the first place.