We have developed automated annotation for mitochondrial and plastid genomes that requires little if any manual correction (reducing manual annotation work of often many days to just a few minutes). It makes intensive use of the RNA/intron detection tools described below, and is particularly helpful with organelle genomes that contain many introns. Intron-exon boundaries are identified by a combination of intron splice rules and exon similarities, and are thus precise in most instances. There is also a separate procedure for detecting small introns (as small as three nucleotides long, in the documented case of Podospora anserina).
MFannot outputs listings of gene coordinates, either in a format
that can be directly loaded into NCBI sequence submission tools,
or in masterfile format (computer-parsable as well as
human-readable; annotations embedded into the sequence). In its
current form, MFannot will not properly process bilaterian
mtDNAs, and we do not plan to develop this feature.
Similarly, annotation of plant mitochondrial genomes is less
effective.
We strongly recommend that MFannot annotations always be
carefully validated by an expert before GenBank submission,
in particular in the case of the large and small subunit rRNA
genes (rnl and rns).
At present, introns in these genes are only indicated as
present, but not positioned within a proper gene model
(exon/intron structure). For this, manual expert work is
required, either by sequence comparison with genes from closely
related species (preferably genes that contain no or few
introns), or by using RNA-seq data, which allow exons to be
positioned with relative confidence. 'Relative', because
interpretation of mitochondrial RNA-seq read mapping tends to be
challenging in many instances, due to ineffective intron
splicing (difficulties in identifying clear-cut exon-intron
boundaries).
Curiously, most of the mitochondrial genome research community seems to think that RNA-seq data are a luxury and not a necessity, whereas RNA sequence data are considered all but required for the annotation of nuclear genomes. We recommend producing RNA-seq data not only to check gene annotations: you may also identify very small exons that are difficult to infer in intron-rich mtDNA, and detect cool things like RNA editing, trans-splicing, ribosomal hopping, etc.
Results produced by our service are sent by email only (but you may venture to install MFannot locally on a Linux machine, from our GitHub page). We recommend that you view the file attached to the email with an ASCII text viewer that is set to a fixed-width (monospace) font. In case of problems, comments or questions, please contact Franz.Lang [at] Umontreal.ca
UPDATES: We are currently working on a variety of improvements that will result in regular updates over the coming months. Please send us feedback if any functionality has been lost. A gradual replacement of ERPIN intron predictions by cmsearch (Infernal package) will allow us to identify introns that are currently not recognized, so gene models are also expected to become more complete.
October 2019: a new version has been installed that
(i) is more effective in intron recognition, (ii) is more
complete in annotation of ORFs, dpo and rpo (which are usually
relics of mitochondrial plasmid integration into the mtDNA), and
(iii) provides E-values for intron and gene identification when
either Infernal or HMM searches are used.
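As an illustration of how E-values from an Infernal search might be used downstream, here is a minimal Python sketch that reads the tabular output written by cmsearch with its --tblout option and keeps only hits below an E-value cutoff. It is not part of MFannot and does not describe how MFannot itself processes hits. The column positions assume the default Infernal 1.1 tabular layout; the model names, sequence name and hit lines in SAMPLE are made up for the example, and the cutoff of 1e-3 is an arbitrary choice.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str    # name of the sequence that was searched
    query: str     # name of the covariance model
    seq_from: int  # hit start on the target (greater than seq_to on the minus strand)
    seq_to: int    # hit end on the target
    strand: str    # "+" or "-"
    score: float   # bit score
    evalue: float  # E-value reported by cmsearch


def parse_tblout(text: str, max_evalue: float = 1e-3) -> List[Hit]:
    """Parse cmsearch --tblout text and return hits with E-value <= max_evalue.

    Assumes the default Infernal 1.1 column order:
    target name, accession, query name, accession, mdl, mdl from, mdl to,
    seq from, seq to, strand, trunc, pass, gc, bias, score, E-value, inc,
    description of target.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # The last column (description) may contain spaces, so cap the split.
        fields = line.split(None, 17)
        if len(fields) < 17:
            continue  # not a complete hit line
        hit = Hit(
            target=fields[0],
            query=fields[2],
            seq_from=int(fields[7]),
            seq_to=int(fields[8]),
            strand=fields[9],
            score=float(fields[14]),
            evalue=float(fields[15]),
        )
        if hit.evalue <= max_evalue:
            hits.append(hit)
    # Most significant hits first.
    return sorted(hits, key=lambda h: h.evalue)


# Made-up example lines in the assumed layout (hypothetical names and values).
SAMPLE = """\
#target name accession query name accession mdl mdl from mdl to seq from seq to strand trunc pass gc bias score E-value inc description of target
contig1 - groupI_model - cm 1 251 1200 1450 + no 1 0.31 0.0 85.2 1.3e-18 ! -
contig1 - groupII_model - cm 1 80 5400 5320 - no 1 0.28 0.0 12.1 0.5 ? -
"""

for hit in parse_tblout(SAMPLE):
    print(hit.query, hit.seq_from, hit.seq_to, hit.strand, hit.evalue)
```

Raising max_evalue (for example to 1.0) would also return weaker candidates, which may be worth inspecting by hand in intron-rich genomes.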
RNAweasel predicts complex, structured
mitochondrial (and other organelle) RNAs, using ERPIN (1) as a
search engine. ERPIN's search algorithm is based on RNA
secondary structure profiles, which are computed from RNA
sequence alignments plus user-defined secondary structure
information as an input. Much of its efficiency stems from the
definition of precisely delimited structural elements that can
be searched individually or in combination, by using a defined
search order ('search strategy'). It is currently the
second-most sensitive search algorithm for structured RNAs
(after the outstanding covariance-based Infernal program,
which is based on Cove). We are planning to transition to
Infernal-based covariance models for RNA predictions in the near
future (2020?), because (i) these are more sensitive, including
detection of partial RNA motifs, and (ii) the search algorithm has
been largely improved in terms of execution speed and