By Kevin Keenan
I never thought my lack of familiarity with Mac OS X would cause me any problems; it is based on UNIX after all. Recently, however, I have been struggling with an apparent bug in the diveRsity package source code, and it quickly became clear that the users having trouble were all Mac OS X peeps. I couldn't reproduce their problem on my Windows XP, Windows 7 or Ubuntu (Linux) operating systems, so I asked a colleague to test the problem on their Mac (thanks Deirdre!). Sadly, the users weren't imagining it. The problem was mine to solve. I quickly went about trying to install Mac OS X on my PC (which isn't the most straightforward process, by the way), and managed to get it working using a combination of Ubuntu and VirtualBox (instructions can be found here).
After installing Mac OS X I went about the hefty task of finding an elusive bug in just shy of 9,000 lines of poorly annotated code (I wrote most of diveRsity late at night when being meticulous was the least of my worries). I finally localised the bug to the chunk of code below:
if (gp == 3) {
  plMake <- function(x) {
    matrix(sprintf("%06g", as.numeric(x)),
           nrow = nrow(x), ncol = ncol(x))
  }
} else if (gp == 2) {
  plMake <- function(x) {
    matrix(sprintf("%04g", as.numeric(x)),
           nrow = nrow(x), ncol = ncol(x))
  }
}
Apparently, because 'sprintf' is just a wrapper for the C-level function of the same name, the zero flag in "%06g" and "%04g" has undefined behaviour when the value being formatted is NA, and the result is OS dependent. On Windows and Linux the NA is padded with leading spaces, but on Mac OS X it is padded with leading zeros. This means that where the code downstream was expecting either '    NA' (four spaces and NA) or '  NA' (two spaces and NA), in Mac OS X the actual string was either '0000NA' or '00NA'.
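A quick way to see which behaviour a given machine has is to format one ordinary value and one missing value with the same format string. This is only a minimal sketch; the variable names (x, padded, na_padding) are mine and are not part of diveRsity.

```r
# Format a known allele code and a missing value with the same spec.
x <- c(101, NA)
padded <- sprintf("%06g", x)

# The numeric entry is zero padded to six characters on every platform.
print(padded[1])

# The NA entry is also six characters wide, but the four padding
# characters in front of "NA" are platform dependent (spaces or zeros).
na_padding <- substr(padded[2], 1, 4)
print(na_padding)
```

If na_padding comes back as zeros rather than spaces, the machine shows the behaviour I saw on Mac OS X.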
This bug resulted in the creation of two completely new alleles at a locus (where missing data should actually be), leading to erroneous calculation of allele frequencies downstream. To solve the problem without having to modify too much code, I simply added a conditional check to the above code. See below:
if (gp == 3) {
  plMake <- function(x) {
    out <- matrix(sprintf("%06g", as.numeric(x)),
                  nrow = nrow(x), ncol = ncol(x))
    # On Mac OS X, NA comes back zero padded; restore the space padding
    if (Sys.info()["sysname"] == "Darwin") {
      out[out == "0000NA"] <- "    NA"
    }
    return(out)
  }
} else if (gp == 2) {
  plMake <- function(x) {
    out <- matrix(sprintf("%04g", as.numeric(x)),
                  nrow = nrow(x), ncol = ncol(x))
    # Same fix for the two-digit format
    if (Sys.info()["sysname"] == "Darwin") {
      out[out == "00NA"] <- "  NA"
    }
    return(out)
  }
}
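For comparison, the OS check could probably be avoided altogether by never passing NA to 'sprintf' in the first place. The sketch below is not what diveRsity does; plMakePortable and its width argument are hypothetical names, and it assumes the input is a matrix of numeric-like allele codes with genuine NA entries for missing data.

```r
# Hypothetical alternative: pad missing values by hand so the result
# does not depend on how the platform's sprintf treats NA.
plMakePortable <- function(x, width) {
  vals <- as.numeric(x)
  # Start with every cell set to a space-padded "NA" of the right width
  out <- rep(paste0(strrep(" ", width - 2), "NA"), length(vals))
  # Zero pad only the non-missing values
  ok <- !is.na(vals)
  out[ok] <- sprintf(paste0("%0", width, "g"), vals[ok])
  matrix(out, nrow = nrow(x), ncol = ncol(x))
}

# Small usage example with one missing genotype code
m <- matrix(c("101", NA, "7", "202"), nrow = 2)
res <- plMakePortable(m, 6)
print(res)
```

The trade-off is a slightly longer function, but the padding of missing data is then explicit rather than left to the C library.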