By Kevin Keenan
This post is to announce the release of version 1.4.2 of the diveRsity package. This new version contains a new major function 'divBasic', which allows users to calculate basic population parameters such as allelic richness, HWE and heterozygosity. The basic usage of 'divBasic' is:
Version 1.4.2 also incorporates this new function into the 'divOnline' web app. Users can download 'divBasic' results in a pseudo-publication-ready format using the 'divOnline' web app. To launch the web app, simply type the following into the R console:
The new version also introduces a new argument, 'pairwise', to the major function 'div.part'. This argument allows users to skip the calculation of pairwise diversity statistics, which can be time-consuming for large data sets. The general usage of the 'div.part' function is:
For a more detailed explanation of the diveRsity package and its functionality you can download a user manual here.
By Kevin Keenan
I have a piece of code which reads genepop format files containing genotypic data and calculates lots of basic parameters, such as allele frequencies and the proportion of missing data per locus. I generally use this code in wrapper functions like 'div.part' from the diveRsity package to calculate these basic values for use in more complex analyses. Recently I adapted the code to let me resample k samples of n individuals from a large panmictic population of m individuals, in an attempt to assess the Type I error rate associated with specific statistical tests under various sampling routines. Changing the code wasn't particularly difficult, and I began running the simulations this morning. However, every once in a while my code would terminate with the following error:
Now, I'm a pretty haphazard coder, so I've seen this error message a million times. Usually the problem is obvious: maybe I haven't included flow control for missing data, or something simple like that. This time, however, I could not see the problem. I spent quite a while looking through the code, double-checking each relevant line of the other function I was using, which contained the line:
No matter how much I looked at it, I just couldn't see the problem. I finally decided to look through the results from the code which reads the genepop file, to see if the problem might be there and not in the function which was actually returning the error.
It turns out that this code contained the most obscure bug I've ever seen. It would only cause a problem under the most improbable conditions for microsatellite data from 'real' populations. Basically, I was using the R function 'apply' to convert the columns of a matrix into tables which list the number of occurrences of each allele at a particular locus. However, if all of the tables generated by 'apply' are the same length (i.e. all loci have the same number of alleles), the function simplifies the results to a matrix rather than returning the usual list of tables. This change in the format of the results meant that all parameters calculated from this point on were wrong. Here's the code that caused the problem:
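A minimal sketch of the behaviour, not the package's actual code: the matrices `equal` and `unequal` below are made-up illustrations with two loci each, and only base R's `apply` and `table` are used.

```r
# Hypothetical allele data: one column per locus (made-up values).
# In `equal`, both loci have exactly two distinct alleles.
equal <- cbind(loc1 = c(100, 100, 102, 102),
               loc2 = c(200, 202, 202, 202))

# In `unequal`, loc1 has three distinct alleles and loc2 has two.
unequal <- cbind(loc1 = c(100, 100, 102, 104),
                 loc2 = c(200, 202, 202, 202))

# Tabulate allele counts per locus (column) with apply().
res_equal   <- apply(equal, 2, table)
res_unequal <- apply(unequal, 2, table)

# Same-length tables are simplified to a matrix ...
is.matrix(res_equal)
# ... while tables of different lengths come back as a list.
is.list(res_unequal)
```

Any downstream code that indexes the result as a list, for example `res[[1]]`, gets a single count from the matrix rather than the full table for the first locus.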
And the simple fix:
In the fixed code, I can force the function afTab to return a list of tables, because lapply never simplifies its results; it always maintains the list structure required by downstream code.
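As a rough illustration of this kind of fix, assuming a made-up genotype matrix: `af_tab_sketch` is a hypothetical stand-in, not the real afTab, and simply loops over column indices with `lapply`.

```r
# Hypothetical stand-in for afTab: tabulate allele counts per locus
# and always return a list, whatever the number of alleles per locus.
af_tab_sketch <- function(geno) {
  tabs <- lapply(seq_len(ncol(geno)), function(i) table(geno[, i]))
  names(tabs) <- colnames(geno)  # keep locus names on the list
  tabs
}

# Made-up data in which every locus has the same number of alleles,
# i.e. the case in which apply() would have returned a matrix.
geno <- cbind(loc1 = c(100, 100, 102, 102),
              loc2 = c(200, 202, 202, 202))

fixed <- af_tab_sketch(geno)

is.list(fixed)          # a list, even though both tables have length 2
fixed$loc1[["100"]]     # count of allele 100 at loc1
```

In R 4.1.0 and later, `apply(geno, 2, table, simplify = FALSE)` should give the same list structure, though that option would not have been available when this post was written.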
By Kevin Keenan
diveRsity v1.3.6 has just been submitted to CRAN. This new version contains a function to calculate the statistical significance of genetic heterogeneity between population samples. A new function allowing users to launch diveRsity-online locally is also included.
About the authors
Kevin Keenan is currently working towards a PhD in population and evolutionary genetics.