Thursday, April 24, 2008

Performance of loop functions in R

I've recently been paying special attention to loop functions in R because my intern jobs at PICB usually take a couple of days to run on the server. A colleague told me that apply() and similar functions might speed up my programs. I doubted this at first. Still, I reimplemented much of my code using replicate(), and the code looks much nicer.
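As a rough illustration of why replicate() can read more cleanly than an explicit loop, here is a minimal sketch. The simulation step `one_run` and the run count are made up for this example and are not taken from my actual jobs.

```r
# Hypothetical simulation step: the mean of 20 standard normal draws.
one_run <- function() mean(rnorm(20))

n_runs <- 1000

# Explicit for loop with a preallocated result vector.
set.seed(1)
means_for <- numeric(n_runs)
for (i in seq_len(n_runs)) {
  means_for[i] <- one_run()
}

# The same simulation with replicate(): no index and no preallocation.
set.seed(1)
means_rep <- replicate(n_runs, one_run())
```

Both versions run the same step the same number of times; replicate() only hides the bookkeeping.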

Then I came across a post saying that apply is even slower than for loops, which pushed me to run some simulations of my own. My simulations comparing sapply with a for loop show that sapply performs better. Maybe we need to choose the right *apply function to get the best performance.

This also gave me an alternative to the loops over vectors that I often write in practice. Sometimes I use Vectorize() on a function that handles one value at a time so that it can process the whole vector. This also yields tidy code.
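A minimal sketch of the three styles side by side, assuming a made-up scalar function `clip_at_zero` (it is only for illustration and is not part of my real code):

```r
# Hypothetical scalar function: `if` needs a single TRUE/FALSE,
# so this only works on one value at a time.
clip_at_zero <- function(x) {
  if (x < 0) 0 else x
}

v <- c(-2, -1, 0, 1, 2)

# 1. Explicit for loop with a preallocated result vector.
out_for <- numeric(length(v))
for (i in seq_along(v)) {
  out_for[i] <- clip_at_zero(v[i])
}

# 2. sapply() calls the function once per element.
out_sapply <- sapply(v, clip_at_zero)

# 3. Vectorize() returns a wrapper (built on mapply) that accepts a vector.
clip_vec <- Vectorize(clip_at_zero)
out_vec <- clip_vec(v)
```

All three give the same result. Vectorize() does not make the underlying function any faster; it still calls it once per element.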

Finally, my simulation results show the following performance ranking (best first):

sapply > Vectorize > for
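A comparison of this kind can be sketched with system.time(). The workload below (squaring 1e5 random numbers one element at a time) is an assumption made for illustration, not the simulation I actually ran, and the timings will differ between machines and R versions.

```r
# Hypothetical workload: square each element, one scalar call at a time.
f <- function(x) x^2
n <- 1e5
x <- runif(n)

# for loop with a preallocated result vector.
time_for <- system.time({
  res_for <- numeric(n)
  for (i in seq_len(n)) {
    res_for[i] <- f(x[i])
  }
})

# sapply over the same vector.
time_sapply <- system.time(res_sapply <- sapply(x, f))

# Vectorize() wrapper around the same function.
vf <- Vectorize(f)
time_vec <- system.time(res_vec <- vf(x))

# Elapsed seconds for each approach.
elapsed <- c(
  for_loop  = time_for[["elapsed"]],
  sapply    = time_sapply[["elapsed"]],
  Vectorize = time_vec[["elapsed"]]
)
print(elapsed)
```

A single run like this is noisy, so it is worth repeating the timing several times before trusting any ranking.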

I guess I should consult the R mailing list for a better understanding of how to evaluate and improve performance.

1 comment:

Yu-Sung Su said...

Hi,

I don't know why, but I feel your examples are not comparable.

I revised your code as:
http://yusung.googlepages.com/applytest2.R

Now the loop is faster than sapply().

For me,
loop > apply > sapply~lapply~tapply