I am currently working on a method to assess the statistical power provided by genetic loci, in an attempt to select the most efficient set of markers for addressing a specific question about particular populations. The general method involves estimating a few locus-specific population genetics parameters and then using these values to rank loci in terms of informativeness. These different rankings are then tested to assess their ability to predict statistical power (i.e. does one parameter provide a better power curve than another?).
The problem I have been having is that the power analysis requires hundreds of different input files. I started out optimistically by manually creating these files with the help of MS Excel. My optimism didn't last past coffee break. In the three hours or so before coffee, I only managed to create 42 input files out of a required 320. I calculated my completion time to be around 23 hours. I've never done anything for 23 hours in my life, let alone something as tedious as this. I decided that this was a job for my computer to do on its own, so I set about writing some R code that would take less than 23 hours to write and run! As a novice programmer with little experience, this might be a challenge...
Well, I overestimated the problem. It took me about 30 minutes to write the following and a further 2 minutes or so to run:
POWSIM.create.R
powsim.file.create <- function(infile) {
  # read the file once and keep the fixed header lines (1:14)
  lines <- readLines(infile)
  hdr <- lines[1:14]
  # count the total number of lines
  nlines <- length(lines)
  # define the data as everything after the first 14 fixed header lines
  data <- lines[15:nlines]
  # create the sequential input files
  for (i in 1:(nlines - 14)) {
    # open a file connection with a specific name
    out <- file(paste("POWSIM.IN_", i, sep = ""), "w")
    for (j in 1:14) {
      # header line 7 needs to change, as it defines
      # the number of loci present in the file
      if (j == 7) {
        cat(i, "\n", file = out, sep = "")
      } else {
        cat(hdr[j], "\n", file = out, sep = "")
      }
    }
    # append the first i data lines
    for (z in 1:i) {
      cat(data[z], "\n", file = out, sep = "")
    }
    close(out)
  }
}
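If you want to try the idea without a real POWSIM input to hand, here is a minimal sketch of the same approach that swaps the nested cat() loops for a single writeLines() call per file. The function name powsim_file_create_sketch, the out_dir, n_header and loci_line arguments, and the toy input are all my own assumptions for illustration. Only the 14-line header, the locus count on header line 7, and the POWSIM.IN_ naming come from the script itself.

```r
# Sketch only: one output file per locus count, each holding the header
# (with the locus count updated) plus the first i data lines.
# Argument names and defaults are illustrative assumptions.
powsim_file_create_sketch <- function(infile, out_dir = ".",
                                      n_header = 14, loci_line = 7) {
  lines <- readLines(infile)
  hdr <- lines[seq_len(n_header)]      # fixed header lines
  data <- lines[-seq_len(n_header)]    # everything after the header
  for (i in seq_along(data)) {
    hdr_i <- hdr
    hdr_i[loci_line] <- as.character(i)  # number of loci in this file
    writeLines(c(hdr_i, data[seq_len(i)]),
               file.path(out_dir, paste0("POWSIM.IN_", i)))
  }
  invisible(length(data))              # number of files written
}

# Toy input (made up): 14 header lines followed by 3 "locus" lines
tmp <- tempfile("powsim_demo")
dir.create(tmp)
toy <- file.path(tmp, "toy_input.txt")
writeLines(c(paste("header", 1:14), paste("locus", 1:3)), toy)

n_files <- powsim_file_create_sketch(toy, out_dir = tmp)
second <- readLines(file.path(tmp, "POWSIM.IN_2"))
```

Using seq_along(data) rather than 1:(nlines - 14) means the loop simply does nothing if the input happens to contain no data lines, and writing into out_dir keeps the hundreds of generated files out of the working directory.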
The moral of the story is to use R (or any other language you like). It gives you so much time to waste doing other things, like blogging instead of working.