Building a phylogenomic dataset often requires choosing among the available
sequences under several constraints:
- existence of partial sequences (e.g. ESTs or low-coverage genome sequencing),
- absence of genes for some species (e.g. gene loss or incomplete sequencing),
- existence of multiple sequences for one species (e.g. paralogous or xenologous genes).
SCaFoS is a software tool designed for this phylogenomic context: it simplifies
the handling of multiple alignment files. Starting from protein or nucleic acid
alignments, SCaFoS selects genes, species and sequences according to the needs
of the user. The options of SCaFoS include:
- the concatenation of multiple aligned files into a single super-matrix,
- the selection of genes for super-tree generation,
- the selection of species according to their frequency of presence or other user-defined criteria,
- the selection among different paralogous sequences from the same species,
- the creation of chimeric genes from closely related species.
Taken together, these features help minimize the amount of missing data in the
final dataset.
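
To make three of these operations concrete (species selection by frequency of
presence, chimera creation, and concatenation into a super-matrix), here is a
minimal Python sketch. It is not SCaFoS code and does not reproduce its actual
algorithms or file formats: the function names, the in-memory dictionary
layout, the "?" missing-data symbol and the column-by-column chimera rule are
all assumptions made for illustration.

```python
from collections import Counter

MISSING = "?-"  # assumed symbols for missing data and gaps


def species_presence(genes):
    """Count in how many gene alignments each species appears."""
    counts = Counter()
    for alignment in genes.values():
        counts.update(alignment.keys())
    return counts


def merge_chimera(seqs):
    """Merge aligned partial sequences from closely related species.

    At each column, keep the first character that is neither missing nor a
    gap; if no sequence is informative there, emit "?".
    """
    merged = []
    for column in zip(*seqs):
        informative = [c for c in column if c not in MISSING]
        merged.append(informative[0] if informative else "?")
    return "".join(merged)


def concatenate(genes, min_presence=1):
    """Concatenate gene alignments into one super-matrix.

    Species present in fewer than `min_presence` genes are dropped; a kept
    species absent from a gene is padded with "?" over that gene's length.
    """
    counts = species_presence(genes)
    kept = sorted(s for s, n in counts.items() if n >= min_presence)
    matrix = {s: [] for s in kept}
    for name in sorted(genes):
        alignment = genes[name]
        length = len(next(iter(alignment.values())))
        for s in kept:
            matrix[s].append(alignment.get(s, "?" * length))
    return {s: "".join(parts) for s, parts in matrix.items()}


# Hypothetical toy data: gene name -> {species name -> aligned sequence}
genes = {
    "geneA": {"Homo": "MKV", "Mus": "MRV", "Danio": "M-V"},
    "geneB": {"Homo": "LL", "Mus": "LI"},
}

supermatrix = concatenate(genes)
# {'Danio': 'M-V??', 'Homo': 'MKVLL', 'Mus': 'MRVLI'}

filtered = concatenate(genes, min_presence=2)
# {'Homo': 'MKVLL', 'Mus': 'MRVLI'}

chimera = merge_chimera(["MK??", "??VL"])
# 'MKVL'
```

In this toy example, raising min_presence removes the species that would
otherwise contribute a block of "?" characters, and the chimera fills the gaps
of one partial sequence with the residues of another. Both reduce missing data
in the resulting matrix.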
SCaFoS can be run either in a graphical mode, which guides the user through the
choice of options, or in a command-line mode that can be integrated into a
workflow.