PARBOOT - parallel bootstrapping

What is PARBOOT, and how can I obtain it?

PARBOOT is a parallel implementation of the bootstrapping function available in many of the PHYLIP(1) phylogenetic analysis programs. The programs in the PHYLIP package that support bootstrapping take an input file containing multiple resampled data sets (usually generated with the PHYLIP program SEQBOOT), each of which could in principle be processed independently. Traditionally, however, a PHYLIP program processes these data sets sequentially, one after another. This is usually time-consuming, which can make bootstrap analysis impractical.

The PARBOOT application splits a multiple-dataset input file into its independent data sets and processes them in parallel on multiple hosts (or on one host with multiple CPUs). The speed of a bootstrap analysis can therefore improve approximately in proportion to the number and power of the CPUs available to the user. For example, the file "infile" might contain 100 resamplings of the original data, as illustrated below (only three blocks shown):

---- Start of file infile ----
    5   13
Alpha        ACCGGGTTTG GCA
Beta         AGGGGGTTTC CCA
Gamma        CGGTTTTTTC CCA
Delta        GGGAAATTTT TCG
Epsilon      GGGAAATTTC CCG
    5   13
Alpha        ACCGGGTTGG CCC
Beta         AGGGGGTTCC CCC
Gamma        CGGTTTTTCC CCC
Delta        GGGAAATTTT CCC
Epsilon      GGGAAATTCC CCC
    5   13
Alpha        AACCGGGCCC AAA
Beta         AAGGGCCCCC AAA
Gamma        CCGGTCCCCC AAA
Delta        GGGGATTCCC CCC
Epsilon      GGGGACCCCC CCC
(...)
---- End of file infile ----
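
The splitting step can be sketched in a few lines of Python. This is only an illustration, not PARBOOT's own code (PARBOOT itself relies on a Perl interpreter). The names split_datasets and SAMPLE are made up for the example. The sketch assumes DNA sequence data in which every data set begins with a header line holding exactly two integers (number of species, number of sites), as in the infile shown above.

```python
import re

# Assumption: a header line holds exactly two integers
# (number of species, number of sites), as in the example infile.
HEADER = re.compile(r"^\s*\d+\s+\d+\s*$")

# The first two blocks of the example infile.
SAMPLE = """\
    5   13
Alpha        ACCGGGTTTG GCA
Beta         AGGGGGTTTC CCA
Gamma        CGGTTTTTTC CCA
Delta        GGGAAATTTT TCG
Epsilon      GGGAAATTTC CCG
    5   13
Alpha        ACCGGGTTGG CCC
Beta         AGGGGGTTCC CCC
Gamma        CGGTTTTTCC CCC
Delta        GGGAAATTTT CCC
Epsilon      GGGAAATTCC CCC
"""


def split_datasets(text):
    """Split multiple-dataset text into a list of single-dataset strings."""
    datasets = []
    current = []
    for line in text.splitlines():
        if HEADER.match(line):
            # A new header closes the data set collected so far.
            if current:
                datasets.append("\n".join(current) + "\n")
            current = [line]
        elif current and line.strip():
            # Non-blank line belonging to the current data set.
            current.append(line)
    if current:
        datasets.append("\n".join(current) + "\n")
    return datasets


if __name__ == "__main__":
    for number, dataset in enumerate(split_datasets(SAMPLE), start=1):
        print("data set", number, "has", len(dataset.splitlines()), "lines")
```

Each returned string is a complete single-dataset input that could be written to its own file. Other data types (for example 0/1 character data) may contain lines that look like headers, so this test would need to be tightened for them.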

Bootstrapping with a distance matrix method (the PHYLIP program "DNADIST") could be performed using the "multiple dataset" option and specifying 100 data sets. Because this is equivalent to running DNADIST 100 times on 100 different files, each containing one data set, DNADIST is a good candidate for parallelization: the analysis can be broken into smaller parts that are run on multiple machines.

Once PARBOOT is properly installed and configured, invoking it by typing:

"parboot infile dnadist y"

will split the infile into individual data sets and run them simultaneously on the specified hosts. The results are collated as each sub-analysis is completed.
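
The run-and-collate pattern can be illustrated with a short Python sketch. It is not how PARBOOT is implemented: PARBOOT distributes work to remote hosts with "rsh" and "rcp", whereas this sketch uses a local thread pool as a stand-in. The names run_in_parallel and count_species are hypothetical, and count_species is only a placeholder for a real analysis such as a DNADIST run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(datasets, analyse, workers=4):
    """Apply analyse() to every data set concurrently.

    Results are stored as each job finishes, but are returned in the
    original data set order.
    """
    results = [None] * len(datasets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember which position each submitted job belongs to.
        futures = {pool.submit(analyse, d): i for i, d in enumerate(datasets)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def count_species(dataset):
    """Placeholder analysis: read the species count from the header line."""
    return int(dataset.split()[0])


if __name__ == "__main__":
    example = ["5 13\nAlpha ACCGGGTTTGGCA\n", "5 13\nAlpha ACCGGGTTGGCCC\n"]
    print(run_in_parallel(example, count_species))
```

Storing each result at the index of its data set keeps the collated output in input order even when the jobs finish out of order. This is one possible design choice, and the source does not say how PARBOOT orders its results.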

Platforms/Operating Systems

PARBOOT requires the following:

Networked UNIX computers.
An account on all hosts that will be used.
Working "rsh" and "rcp" commands.
A Perl interpreter (version 4.0) on all hosts.
The PHYLIP package accessible on all hosts.

See the INSTALLATION, getting.perl and getting.phylip documents for more information on installing parboot.

Obtaining PARBOOT

PARBOOT can be obtained by anonymous ftp to:

megasun.bch.umontreal.ca

Files can be retrieved from the /pub/parboot directory.

FURTHER INFORMATION

For more information about the parboot project, send email to the Informatics Division of the Organelle Genome Megasequencing Program at the Universite de Montreal:

riouxp@bch.umontreal.ca

All feedback welcome.

ACKNOWLEDGMENTS

This work was supported in part by the Medical Research Council, Canada (grant No. SP-34), the Canadian Genome Analysis & Technology program (grant No. GO-12323) and Sun Microsystems.

AUTHORS

Pierre Rioux, Tim Littlejohn (Project Management), Organelle Genome Megasequencing Program, Aug. 1994.

REFERENCES

(1) Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

The development of PARBOOT was supported by a grant from the Canadian Genome Analysis and Technology Program (CGAT).