PARBOOT is a parallel implementation of the bootstrapping function available in many of the PHYLIP(1) phylogenetic analysis programs. The programs in the PHYLIP package that support bootstrapping take an input file containing multiple resampled data sets (usually generated with the PHYLIP program SEQBOOT), which could in principle be processed independently. Traditionally, however, each PHYLIP program in a bootstrap analysis processes these data sets sequentially. This is usually time-consuming, and it can make analysis by the iterative bootstrapping method impractical.
The PARBOOT application splits a multiple-dataset input file into its independent data sets and processes them in parallel on multiple hosts (or on one host with multiple CPUs). The speed of a bootstrap analysis can therefore improve roughly in proportion to the number and power of the CPUs available to the user. For example, a resampled data set in the file "infile" might contain 100 resamplings of the original data, as illustrated below (only three blocks shown):
---- Start of file infile ----
5 13
Alpha ACCGGGTTTG GCA
Beta AGGGGGTTTC CCA
Gamma CGGTTTTTTC CCA
Delta GGGAAATTTT TCG
Epsilon GGGAAATTTC CCG
5 13
Alpha ACCGGGTTGG CCC
Beta AGGGGGTTCC CCC
Gamma CGGTTTTTCC CCC
Delta GGGAAATTTT CCC
Epsilon GGGAAATTCC CCC
5 13
Alpha AACCGGGCCC AAA
Beta AAGGGCCCCC AAA
Gamma CCGGTCCCCC AAA
Delta GGGGATTCCC CCC
Epsilon GGGGACCCCC CCC
(...)
---- End of file infile ----
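The splitting step can be sketched in a few lines of Python. This is only an illustration, not PARBOOT's own code: the function name split_datasets is hypothetical, and the sketch assumes that every data set consists of a header line ("<number of taxa> <number of sites>") followed by exactly one line per taxon, as in the example above. Files in which a taxon's sequence spans several lines would need more careful parsing.

```python
def split_datasets(text):
    """Split a multiple-dataset file into a list of single-dataset strings.

    Assumption: each data set is a header "<ntaxa> <nsites>" followed by
    exactly <ntaxa> lines, one per taxon. Blank lines are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    datasets = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        # A header must be exactly two non-negative integers.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            raise ValueError(f"expected a header line, got: {lines[i]!r}")
        ntaxa = int(fields[0])
        block = lines[i:i + ntaxa + 1]  # header plus one line per taxon
        if len(block) != ntaxa + 1:
            raise ValueError("file ends in the middle of a data set")
        datasets.append("\n".join(block) + "\n")
        i += ntaxa + 1
    return datasets


if __name__ == "__main__":
    sample = (
        "5 13\n"
        "Alpha ACCGGGTTTG GCA\n"
        "Beta AGGGGGTTTC CCA\n"
        "Gamma CGGTTTTTTC CCA\n"
        "Delta GGGAAATTTT TCG\n"
        "Epsilon GGGAAATTTC CCG\n"
        "5 13\n"
        "Alpha ACCGGGTTGG CCC\n"
        "Beta AGGGGGTTCC CCC\n"
        "Gamma CGGTTTTTCC CCC\n"
        "Delta GGGAAATTTT CCC\n"
        "Epsilon GGGAAATTCC CCC\n"
    )
    for n, ds in enumerate(split_datasets(sample), start=1):
        print(f"data set {n}: {len(ds.splitlines()) - 1} taxa")
```

Each returned string is a complete single-dataset file in its own right, so it could be handed to a separate run of the analysis program.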
Bootstrapping using a distance matrix method (the PHYLIP program "DNADIST") could be performed using the "multiple dataset" option and specifying 100 data sets. Because this is equivalent to running DNADIST 100 times on 100 different files, each containing one data set, DNADIST is a good candidate for parallelization: the analysis can be broken into smaller parts that are performed on multiple machines.
Once PARBOOT is properly installed and configured, invoking it by typing:
"parboot infile dnadist y"
will result in the infile being split into individual data sets and each dataset being run on the specified hosts simultaneously. The results are collated as each sub-analysis is completed.
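The "run in parallel, collate as each sub-analysis completes" pattern described above can be illustrated with the following Python sketch. It rests on several assumptions and is not how PARBOOT itself is implemented: it uses local threads rather than remote hosts, and the function analyse is a hypothetical stand-in that merely counts taxa where a real run would invoke a PHYLIP program such as DNADIST. The names run_parallel and workers are likewise illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse(dataset):
    """Hypothetical stand-in for one analysis run: count taxa in a data set."""
    lines = dataset.strip().splitlines()
    return len(lines) - 1  # every line except the header is one taxon


def run_parallel(datasets, workers=4):
    """Analyse all data sets concurrently and collate results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember which data set each future belongs to.
        futures = {pool.submit(analyse, d): i for i, d in enumerate(datasets)}
        for fut in as_completed(futures):  # yields in completion order
            results[futures[fut]] = fut.result()
    # Report the results in the original data set order.
    return [results[i] for i in range(len(datasets))]


if __name__ == "__main__":
    blocks = [
        "2 3\nA AAA\nB CCC\n",
        "3 2\nX AA\nY CC\nZ GG\n",
    ]
    print(run_parallel(blocks))
```

Because the data sets are independent, the order in which they finish does not matter; the results only need to be put back in input order before being written out.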
PARBOOT requires the following:
See the INSTALLATION, getting.perl and getting.phylip documents for more information on installing parboot.
Files can be retrieved from the /pub/parboot directory.
All feedback welcome.
(1) Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Copyright OGMP, 1994. ogmp@bch.umontreal.ca
The development of PARBOOT was supported by a grant from the Canadian Genome Analysis and Technology Program (CGAT).