By Kevin Keenan
The apply family of functions in R is extremely useful. I've been using them for quite a while now, generally in place of for loops. However, they are not particularly intuitive for R beginners in the way that loops can be.
One apply function that I have never paid much attention to in the past is mapply. I've attempted to use it a few times but could never make sense of the help file and just resorted to loops instead. This morning, however, I was trying to calculate some statistics from the individual elements of two lists that I had generated, and was determined to avoid using a for loop (my default position when writing R code).
A quick Google search suggested that mapply was the way to go. After some fumbling around and lots of trial and error, the scales fell from my eyes as I hit 'CTRL+ENTER' (in RStudio of course) and the stop icon disappeared as if it had never been there. Previously, when running similar calculations using for loops, the stop icon might have remained tauntingly for up to a minute, maybe more.
It appears that mapply is not only easier to use than I previously thought, but also lightning fast. Let the code below be a testament to its power:
Example code
For the purpose of illustration, imagine we have two lists of length 100,000, each element being a 10 x 10 matrix of 100 random values.
Imagine that we are interested in calculating the element-wise product of each pair of matrices (i.e. \( mat1 \times mat2 \) taken entry by entry, which is what the * operator does in R; a true matrix product would use %*% instead). Let's have a look at the speed difference between using a for loop and mapply.
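Before the timings, here is a minimal sketch of how mapply pairs up the elements of two lists, using toy data (small1 and small2 are made-up names, not the lists generated below). It also shows that `*` multiplies matrices entry by entry in R, whereas `%*%` gives the matrix product.

```r
# Two short lists of 2 x 2 matrices (hypothetical toy data)
small1 <- list(matrix(1:4, ncol = 2), matrix(5:8, ncol = 2))
small2 <- list(diag(2), matrix(2, nrow = 2, ncol = 2))

# mapply calls FUN(small1[[1]], small2[[1]]), then FUN(small1[[2]], small2[[2]])
# SIMPLIFY = FALSE keeps the results as a list of matrices
elementwise <- mapply(FUN = `*`, small1, small2, SIMPLIFY = FALSE)

# For a true matrix product, pass %*% instead
matprod <- mapply(FUN = `%*%`, small1, small2, SIMPLIFY = FALSE)

elementwise[[1]]  # entry-by-entry product of the first pair
matprod[[1]]      # matrix product of the first pair
```

Because the second list starts with an identity matrix, the two results for the first pair differ visibly: the element-wise version zeroes the off-diagonal entries, while the matrix product returns the first matrix unchanged.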
# generate the data
list1 <- list()
list2 <- list()
for (i in 1:1e+05) {
  list1[[i]] <- matrix(rnorm(100), ncol = 10)
  list2[[i]] <- matrix(rnorm(100), ncol = 10)
}
# Calculate the element-wise products using a 'for' loop
# (the result list starts empty and grows by one element per iteration)
system.time({
  listProd1 <- list()
  for (i in 1:1e+05) {
    listProd1[[i]] <- list1[[i]] * list2[[i]]
  }
})
## user system elapsed
## 32.33 1.34 34.31
# Calculate the element-wise products using 'mapply'
# (SIMPLIFY = FALSE returns a list rather than simplifying to an array)
system.time({
  listProd2 <- mapply(FUN = `*`, list1, list2, SIMPLIFY = FALSE)
})
## user system elapsed
## 0.34 0.03 0.38
# Test to make sure both methods do the same thing
listProd1[[1]] == listProd2[[1]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
listProd1[[1000]] == listProd2[[1000]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
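Spot-checking two elements by eye works, but the whole lists can also be compared in one call. A minimal sketch, using small stand-in lists so that it runs on its own (x1, x2, prodLoop and prodMapply are illustrative names, not the 100,000-element objects above):

```r
set.seed(1)
# Five pairs of 10 x 10 random matrices (stand-in data)
x1 <- replicate(5, matrix(rnorm(100), ncol = 10), simplify = FALSE)
x2 <- replicate(5, matrix(rnorm(100), ncol = 10), simplify = FALSE)

# Element-wise products via a 'for' loop
prodLoop <- list()
for (i in seq_along(x1)) {
  prodLoop[[i]] <- x1[[i]] * x2[[i]]
}

# Element-wise products via 'mapply'
prodMapply <- mapply(FUN = `*`, x1, x2, SIMPLIFY = FALSE)

# TRUE only if every element of both lists matches exactly
identical(prodLoop, prodMapply)
```

With the real data, the same call would be identical(listProd1, listProd2).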
We can see that there is a massive (in computational terms) difference in the performance of these two methods: roughly 34 seconds elapsed for the for loop versus 0.38 seconds for mapply. Although I don't know for sure, I suspect the time penalty in the for loop is due to growing the list from scratch, one element at a time, which takes time and is not the best way to do things.
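One way to probe that suspicion is to time the same loop with the result list allocated up front using vector("list", n). The following is a sketch rather than a benchmark: n is deliberately much smaller than above so that it runs quickly, the names l1, l2, grown and prealloc are made up for illustration, and the size of any difference will depend on the machine and the version of R.

```r
n <- 1000  # far smaller than the 1e+05 used above, to keep the sketch quick
l1 <- replicate(n, matrix(rnorm(100), ncol = 10), simplify = FALSE)
l2 <- replicate(n, matrix(rnorm(100), ncol = 10), simplify = FALSE)

# Loop that grows the result list one element at a time
tGrow <- system.time({
  grown <- list()
  for (i in seq_len(n)) {
    grown[[i]] <- l1[[i]] * l2[[i]]
  }
})

# Same loop, but with the result list allocated once beforehand
tPre <- system.time({
  prealloc <- vector("list", n)
  for (i in seq_len(n)) {
    prealloc[[i]] <- l1[[i]] * l2[[i]]
  }
})

# Elapsed times side by side, then a check that the results agree
c(grow = tGrow[["elapsed"]], preallocate = tPre[["elapsed"]])
identical(grown, prealloc)
```

If the preallocated loop comes out close to mapply once n is raised, that would support the idea that list growth, rather than the loop itself, accounts for most of the gap.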
Reproducibility
## R Under development (unstable) (2013-09-29 r64014)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.5.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3 evaluate_0.4.7 formatR_0.9 stringr_0.6.2
## [5] tools_3.1.0