A new version of diveRsity
is now available for download from 'CRAN'. It can also be downloaded using the method described here. While most of the updates in this version are minor, some are more substantial. One of these is the inclusion of an automated bias correction for confidence intervals (CIs) estimated for pairwise differentiation. Below is a coded demonstration of the problem of bias when estimating confidence intervals for diversity/differentiation statistics from bootstrapped data.
The code
Load diveRsity
and calculate pairwise differentiation, including \( 95\% \) Confidence Intervals (using 1000 bootstrap iterations).
# load diveRsity library(diveRsity)
# run fastDivPart on "Test_data" (1000 bootstraps)
data(Test_data)
system.time({
pwDiff <- fastDivPart(Test_data, pairwise = TRUE,
bs_pairwise = TRUE, WC_Fst = TRUE,
bootstrap = 1000, parallel = TRUE)
})
user system elapsed
2.810 0.054 456.167
Since Test_data
contains genotype data for six population samples, fastDivPart
will calculate confidence intervals for 15 separate pairwise estimates of \( G_{ST} \) (Nei & Chesser, 1983), \( G^{'}_{ST} \) (Hedrick, 2005), \( D_{Jost} \) (Jost, 2008) and \( \theta \) (Weir & Cockerham, 1984). We can plot these point estimates along with CIs as follows (N.B. by setting the argument plot = TRUE
in the original call, these results will be plotted automatically).
Plotting the results
# View the pairwise results data
names(pwDiff)
[1] "standard" "estimate" "pairwise" "meanPairwise" "bs_pairwise"
We can see that the pairwise bootstrap results are located at bs_pairwise
. This object within pwDiff
contains separate data tables for each of the four statistics listed above. All data tables have the same structure, so we can just look at one, \( D_{Jost} \).
pwDiff$bs_pairwise$djostEst
actual mean BC_mean Lower_95%CI Upper_95%CI BC_Lower_95%CI BC_Upper_95%CI
POP1_1 vs. POP2_1 0.6002342 0.5850703 0.6002342 5.169e-01 0.647625 0.5320938 0.6627891
POP1_1 vs. POP3_1 0.0006103 0.0011152 0.0006103 4.645e-06 0.003614 -0.0005003 0.0031092
POP1_1 vs. POP4_1 -0.0002870 0.0002585 -0.0002870 -7.567e-05 0.001304 -0.0006211 0.0007581
POP2_1 vs. POP3_1 0.5903311 0.5742200 0.5903311 5.086e-01 0.635704 0.5246634 0.6518151
POP2_1 vs. POP4_1 0.5933083 0.5783549 0.5933083 5.139e-01 0.641400 0.5288839 0.6563533
POP3_1 vs. POP4_1 0.0010899 0.0018571 0.0010899 3.038e-04 0.004484 -0.0004634 0.0037167
We can see that the output for each of the four statistics estimates contains seven data columns. We are interested in comparing the difference between the standard CIs and the bias corrected CIs.
# plot using ggplot2
library(ggplot2)
# create a djost dataframe from our pwDiff results to reduce typing
djost <- as.data.frame(pwDiff$bs_pairwise$djostEst)
# The column names cause problems, so we can change them as follows:
colnames(djost) <- c("actual", "mean", "bcMean", "bucLow", "bucUp", "bcLow", "bcUp")
# plot the point estimates for each of your four pw comparisons
# with uncorrected and corrected 95% CIs
CI_type <- as.factor(c(rep("uc", nrow(djost)), rep("bc", nrow(djost))))
ggplot(djost, aes(x = c(1:nrow(djost),1:nrow(djost) + 0.5),
y = c(actual, actual), colour = CI_type)) +
geom_errorbar(aes(ymin = c(bucLow, bcLow), ymax = c(bucUp, bcUp)),
width = 0.3) +
geom_point() +
ylab(expression(D["Jost"])) +
xlab("Pairwise Comparison")
A figure demonstrating the difference in bias corrected and uncorrected \( 95\% CIs \) derived from 1000 bootstrap iterations. Blue lines represent the uncorrected CI, while pink lines represent the bias corrected CIs.
Of particular note is the fact that often the uncorrected CIs don't even encompass the sample estimates. This occurs as a result of the known upward bias associated with bootstrapping these types of statistics. Another important point of notes is the increased risk of type I error seen for two of the pairwise comparisons when using uncorrected CIs. In these comparisons, the uncorrected CIs do not overlap \( 0 \), suggesting differentiation between the two populations is statistically significant. However, following bias correction, these pairwise differentiation estimates are no longer significantly different from \( 0 \) (i.e. they overlap \( 0 \)).
From this example, it is clear that correcting for bias in bootstrapped differentiation estimates is important. In line with this, diveRsity
now provides automatic bias correction in the fastDivPart
function.
N.B. divPart
does not have this capability since it is no longer supported.
References
-
Hedrick, P. (2005) A standardized genetic differentiation measure. Evolution, 59, 1633–1638
-
Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026
-
Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026
-
Keenan, K., McGinnity, P., Cross, T. F., Crozier, W. W., Prodöhl, P. A. (2013), diveRsity: An R package for the estimation and exploration of population genetics parameters and their associated errors. Methods in Ecology and Evolution, 4: 782–788. doi: 10.1111/2041-210X.12067
-
Weir, B. & Cockerham, C. (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370