Contact Us

Use the form on the right to contact us.

You can edit the text in this area, and change where the contact form on the right submits to, by entering edit mode using the modes on the bottom right. 


Santa Cruz, CA

Functional morphologist and evolutionary biologist

Blog

Parallel processing for MCMCglmm in R (Windows-friendly)

Vikram Baliga

Lately, I have been using the MCMCglmm package to run linear mixed-models in a Bayesian framework. The documentation is generally very good and there are lots of great tips on StackExchange and r-sig-phylo & r-sig-mixed-models…etc. But, there seems to be relatively little support for using parallel processing (here: using multiple cores on your machine) to speed up how long it takes to finish large volumes of mcmc runs. This is especially true for Windows users, who cannot use functions like mclapply().

I’m happy to share that I have worked out a solution using the parallel package. Basically, I set up a virtual cluster and then use the parLapply() function to run iterations of MCMCglmm() in parallel.

Tutorial setup

I’ll use “Example 2” from the MCMCglmm() function help. So, first I’ll load everything up and take care of a few preliminaries. You can skip ahead to the next section if instead you’d like to tailor this to your own data & analysis.

library(MCMCglmm) library(parallel)

Lifting this directly from the MCMCglmm() help page:

data(bird.families) phylo.effect<-rbv(bird.families, 1, nodes="TIPS") phenotype<-phylo.effect+rnorm(dim(phylo.effect)[1], 0, 1) # simulate phylogenetic and residual effects with unit variance test.data<-data.frame(phenotype=phenotype, taxon=row.names(phenotype)) Ainv<-inverseA(bird.families)$Ainv # inverse matrix of shared phyloegnetic history prior<-list(R=list(V=1, nu=0.002), G=list(G1=list(V=1, nu=0.002))) model2<-MCMCglmm(phenotype~1, random=~taxon, ginverse=list(taxon=Ainv), data=test.data, prior=prior, verbose=FALSE, nitt=1300, burnin=300, thin=1) summary(model2)

Of course, the example provided sets nitt to only 130, yielding an ESS of only ~460 for the fixed effect. I am guessing this is intended to make sure the example is quick to execute.

Boosting this to nitt=100000, burnin=10000, and thin=10 gives a more healthy ESS of ~8321. But please note that this will take a lot longer to finish (I’ll leave it up to you to use the Sys.time() function to time it yourself).

The code

Whenever conducting MCMC-based analyses, it’s advisable to conduct multiple runs (different chains) and then assess convergence. I’ll leave the convergence assessments for another day (but here’s a good StackExchange post).

For now we’ll just conduct 10 runs of this model, each using nitt=100000, using parallel processing. Make sure you load both the parallel and MCMCglmm packages if you skipped any of the stuff above

# PLEASE NOTE: I am setting this up to use only 80% of your machine's total # logical processors. You can certainly harness all of your CPUs if you'd like, # although I advise against doing so if any of your MCMC runs take more than a # few minutes. It also doesn't make sense to set the number of logical # processors to be greater than the number of runs (chains), but more on that # later. Anyway, treat your silicon well! setCores<-round(detectCores()*0.8) # use detectCores() by itself if you want all CPUs # make the cluster cl <- makeCluster(getOption("cl.cores",setCores)) # load the MCMCglmm package within the cluster cl.pkg <- clusterEvalQ(cl,library(MCMCglmm)) # import each object that's necessary to run the function clusterExport(cl,"prior") clusterExport(cl,"test.data") clusterExport(cl,"Ainv") # use parLapply() to execute 10 runs of MCMCglmm(), each with nitt=100000 model2_10runs<-parLapply(cl=cl,1:10, function(i) { MCMCglmm(phenotype~1, random=~taxon, ginverse=list(taxon=Ainv), data=test.data, prior=prior, verbose=FALSE, nitt=100000, burnin=10000, thin=10)} ) # once it's finished, use stopCluster() to stop running the parallel cluster stopCluster(cl)

Pretty simple!

The model2_10runs object is a list that contains each of the 10 mcmc models. You can perform all the usual summarization, plotting…etc, but just be sure to specify models within the list, e.g.:

summary(model2_10runs[[3]]) # summarize the third model out of the 10

As I mentioned above, we’ll leave convergence and other fun topics like autocorrelation for another day.