Selection, Concatenation and Fusion of Sequences

Phylogenetic inference based on large amounts of sequence data (phylogenomics) is becoming an alternative to single-gene phylogenies, which are often insufficient to resolve many phylogenetic questions. In this context, handling large amounts of data means dealing with species and gene sampling, missing data, partial sequences, an uneven distribution of species and genes, and the presence of multiple sequences per species (usually due to paralogous genes).

Despite the huge size of molecular databases, large amounts of data are available for only a limited number of species, so a choice has to be made between using a large number of genes and using a large number of species. An alternative is a compromise that keeps as many genes and species as possible while limiting the amount of missing data. Missing data can be reduced by using the most broadly sequenced genes and species and by combining sequences from closely related species into a single one. Once species and genes have been selected, many approaches exist to infer phylogenies; the most common are (i) super-matrix approaches and (ii) super-tree approaches. SCaFoS is a tool that allows the easy selection of sequences, species and genes and the construction of datasets suitable for these approaches. In particular, it helps to maximize the amount of data usable for phylogenomic analysis.
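To make the super-matrix idea and the notion of missing data concrete, here is a minimal Python sketch. It is not SCaFoS code and does not reflect how SCaFoS works internally: the function names `build_supermatrix` and `missing_fraction`, the toy alignments and the use of `?` as the missing-data symbol are all assumptions chosen for illustration. It concatenates per-gene alignments into one sequence per species, pads species that lack a gene, and reports the resulting proportion of missing data.

```python
def build_supermatrix(alignments, missing="?"):
    """Concatenate per-gene alignments into one sequence per species.

    `alignments` maps a gene name to {species: aligned sequence}.
    A species absent from a gene is padded with the `missing` symbol.
    (Hypothetical helper for illustration, not part of SCaFoS.)
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    parts = {sp: [] for sp in species}
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences of gene {gene!r} are not aligned")
        width = lengths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, missing * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def missing_fraction(supermatrix, missing="?"):
    """Return the proportion of missing-data symbols in the matrix."""
    total = sum(len(seq) for seq in supermatrix.values())
    absent = sum(seq.count(missing) for seq in supermatrix.values())
    return absent / total if total else 0.0


# Toy data (invented): geneB has no sequence for Mus.
toy = {
    "geneA": {"Homo": "ACGT", "Mus": "ACGA", "Danio": "ACTT"},
    "geneB": {"Homo": "GGC", "Danio": "GAC"},
}
matrix = build_supermatrix(toy)
for name, seq in matrix.items():
    print(name, seq)
print(f"missing data: {missing_fraction(matrix):.1%}")
```

In this toy case, dropping Mus or dropping geneB would both remove all missing data, which is the trade-off between number of genes and number of species described above.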

Hervé PHILIPPE's Lab