Building a phylogenomic dataset often means choosing sequences under several constraints:
  • existence of partial sequences (e.g. ESTs or low-coverage genome sequencing),
  • absence of genes for some species (e.g. gene loss or unfinished sequencing),
  • existence of multiple sequences for one species (e.g. paralogous or xenologous genes).
SCaFoS is a tool designed for this phylogenomic context that makes it easy to handle multiple alignment files. Starting from alignments of proteins or nucleic acids, SCaFoS selects genes, species and sequences according to the user's needs. Its options include:
  • the concatenation of multiple aligned files into a single super-matrix,
  • the selection of genes for super-tree generation,
  • the selection of species according to their frequency of presence or other user-defined criteria,
  • the selection among different paralogous sequences from the same species,
  • the creation of chimeric genes from closely related species.
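To make the super-matrix and species-frequency options more concrete, here is a minimal sketch of the general idea, assuming each gene alignment is already loaded as a Python dict mapping species name to aligned sequence. The names `species_frequency`, `concatenate`, `min_genes` and the `?` padding character are hypothetical illustrations; this is not SCaFoS's own code or file handling.

```python
from typing import Dict, List

# Hypothetical in-memory representation: one alignment per gene,
# mapping species name -> aligned sequence (rows of one gene share a length).
Alignment = Dict[str, str]


def species_frequency(alignments: List[Alignment]) -> Dict[str, int]:
    """Count in how many gene alignments each species appears."""
    counts: Dict[str, int] = {}
    for aln in alignments:
        for species in aln:
            counts[species] = counts.get(species, 0) + 1
    return counts


def concatenate(alignments: List[Alignment], min_genes: int = 1, gap: str = "?") -> Alignment:
    """Build a super-matrix from species present in at least `min_genes` genes.

    A species missing from a gene is padded with `gap` characters so that
    every row of the super-matrix ends up with the same length.
    """
    counts = species_frequency(alignments)
    kept = sorted(s for s, n in counts.items() if n >= min_genes)
    matrix: Alignment = {s: "" for s in kept}
    for aln in alignments:
        if not aln:  # skip empty alignments: no length to pad with
            continue
        length = len(next(iter(aln.values())))
        for s in kept:
            matrix[s] += aln.get(s, gap * length)
    return matrix


genes = [{"A": "MKV", "B": "MKI", "C": "MRV"}, {"A": "LLG-", "B": "LIG-"}]
print(concatenate(genes, min_genes=2))
```

With `min_genes=2`, species `C` (present in only one gene) is dropped; with the default `min_genes=1` it is kept and its missing second gene is padded as `MRV????`. Raising the frequency threshold trades taxon sampling for a denser matrix.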
Taken together, these features help minimize the amount of missing data in the final dataset.
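One way to see how such selections reduce missing data is to measure it directly. The sketch below assumes that `-` and `?` both count as missing and uses a deliberately simple chimera rule (take the first non-missing character at each column). Both `missing_fraction` and `chimera` are hypothetical helpers for illustration, not the criteria SCaFoS actually applies.

```python
from typing import Dict, List

Alignment = Dict[str, str]

# Assumption: gaps and unknown characters are both treated as missing data.
MISSING = set("-?")


def missing_fraction(matrix: Alignment) -> float:
    """Fraction of cells in an alignment that are gaps or unknown."""
    total = sum(len(seq) for seq in matrix.values())
    if total == 0:
        return 0.0
    missing = sum(1 for seq in matrix.values() for c in seq if c in MISSING)
    return missing / total


def chimera(seqs: List[str]) -> str:
    """Merge equal-length aligned sequences from closely related species.

    Simplified rule: keep the first non-missing character in each column,
    falling back to '-' when every sequence is missing there.
    """
    out = []
    for i in range(len(seqs[0])):
        chars = [s[i] for s in seqs if s[i] not in MISSING]
        out.append(chars[0] if chars else "-")
    return "".join(out)


partial = {"sp1": "MK--", "sp2": "--VL"}
print(missing_fraction(partial))
print(chimera(list(partial.values())))
```

Here two complementary partial sequences (50% missing) merge into the complete chimera `MKVL`. A real tool would also need to check that the merged sequences agree where they overlap and that the species are truly close relatives.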
SCaFoS can be run either in an intuitive graphical mode, which makes choosing options easy, or in a command-line mode that can be integrated into automated workflows.
