About MFannot and RNAweasel

We have developed an automated annotation tool for mitochondrial and plastid genomes that requires few, if any, manual corrections, reducing an annotation task that often takes many days to just a few minutes. It makes intensive use of the RNA/intron detection tools described below, and is particularly helpful for organelle genomes that contain many introns. Intron-exon boundaries are identified by a combination of intron splice rules and exon similarities, and are thus precise in most instances. There is also a separate procedure for detecting small introns (as small as three nucleotides long, in the documented case of Podospora anserina).

The output of MFannot is a listing of gene coordinates, either in a format that can be loaded directly into NCBI sequence submission tools, or in masterfile format (computer-parsable as well as human-readable, with annotations embedded in the sequence). In its current form, MFannot does not properly process bilaterian mtDNAs, and we do not plan to develop this capability. Similarly, annotation of plant mitochondrial genomes is less effective.

We strongly recommend that MFannot annotations always be carefully validated by an expert before GenBank submission, in particular for the large and small subunit rRNA genes (rnl and rns). At present, introns in these genes are only reported as present, but are not positioned within a proper gene model (exon/intron structure). Building that model requires manual expert work, either by sequence comparison with genes from closely related species (preferably genes that contain no or few introns), or by using RNA-seq data that allow exons to be positioned with relative confidence. 'Relative', because interpreting mitochondrial RNA-seq read mapping is often challenging: inefficient intron splicing makes it difficult to identify clear-cut exon-intron boundaries.
Curiously, most of the mitochondrial genome research community seems to regard RNA-seq data as a luxury rather than a necessity, whereas RNA sequence data are all but required for the annotation of nuclear genomes. We recommend producing RNA-seq data for more than checking gene annotations: you may identify very small exons that are difficult to infer in intron-rich mtDNA, and also detect interesting phenomena such as RNA editing, trans-splicing, ribosomal hopping, etc.

Results produced by our service will be sent by email only (but you may venture to install MFannot locally on a Linux box, from our GitHub page). We recommend that you view the file attached to the email with an ASCII viewer that is set to a fixed-width font such as Monospace. In case of problems, comments or questions, please contact Franz.Lang [at] Umontreal.ca

UPDATES: We are currently working on a variety of improvements that will result in regular updates within the coming months. Please send us feedback if any functionality has been lost. A gradual replacement of ERPIN intron predictions by cmsearch (Infernal package) will allow us to identify introns that are currently not recognized; gene models are therefore expected to become more complete.

Update 23 October 2019: a new version has been installed that (i) is more effective in intron recognition, (ii) is more complete in the annotation of ORFs, dpo and rpo (which are usually relics of mitochondrial plasmid integration into the mtDNA), and (iii) provides E-values for intron and gene identification in the case of either Infernal or HMM searches.


RNAweasel predicts complex, structured mitochondrial (and other organelle) RNAs, using ERPIN (1) as a search engine. ERPIN's search algorithm is based on RNA secondary structure profiles, which are computed from RNA sequence alignments plus user-defined secondary structure information as input. Much of its efficiency stems from the definition of precisely delimited structural elements that can be searched individually or in combination, using a defined search order ('search strategy'). It is currently the second-most sensitive search algorithm for structured RNAs, following the outstanding covariance-based Infernal program, which is based on Cove (2). We are planning to transition to Infernal-based covariance models for RNA predictions in the near future (2020?), because (i) these are more sensitive, including in the detection of partial RNA motifs, and (ii) the search algorithm has been greatly improved in terms of execution speed and sensitivity.
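
To make the idea of a secondary structure profile more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not ERPIN's actual implementation: the alignment is gap-free, the structure is given as a dot-bracket string, the background is uniform, and the function names (pair_table, build_profile, score) are hypothetical. Paired columns get a log-odds table over the 16 possible base pairs, unpaired columns a table over the 4 bases.

```python
import math
from collections import Counter

BASES = "ACGU"

def pair_table(structure):
    """Map each '(' column to the column of its matching ')' (dot-bracket input)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs

def log_odds(count, total, n_states, pseudocount):
    """log2 of smoothed observed frequency over a uniform background (assumption)."""
    freq = (count + pseudocount) / (total + n_states * pseudocount)
    return math.log2(freq * n_states)

def build_profile(alignment, structure, pseudocount=1.0):
    """Build per-column (unpaired) and per-pair (helix) log-odds tables."""
    pairs = pair_table(structure)
    closing = set(pairs.values())
    n = len(alignment)
    profile = {}
    for i in range(len(structure)):
        if i in pairs:                      # helix position: score the base pair jointly
            j = pairs[i]
            counts = Counter(seq[i] + seq[j] for seq in alignment)
            profile[(i, j)] = {a + b: log_odds(counts[a + b], n, 16, pseudocount)
                               for a in BASES for b in BASES}
        elif i not in closing:              # single-stranded position
            counts = Counter(seq[i] for seq in alignment)
            profile[(i,)] = {a: log_odds(counts[a], n, 4, pseudocount) for a in BASES}
    return profile

def score(profile, candidate):
    """Sum the log-odds contributions of a candidate of the same length."""
    return sum(table["".join(candidate[c] for c in cols)]
               for cols, table in profile.items())

# Toy training set: three hairpins sharing the structure "((...))"
alignment = ["GCAAAGC", "GGAAACC", "CCAUAGG"]
profile = build_profile(alignment, "((...))")
print(score(profile, "GCAAAGC") > score(profile, "AAAAAAA"))
```

Scoring base pairs jointly, rather than column by column, is what lets such a profile reward compensatory changes (for example G-C replaced by C-G) that preserve a helix.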

The availability of correctly aligned RNA sequences as training sets, and the deduction of precise secondary structure definitions, are THE key prerequisites for using ERPIN. We have therefore developed the RNAweasel tools for the compilation and manipulation of sequence training sets. They include easy visualization and editing of alignments and structure definitions (using GDE; 3), automatic alignment of ERPIN results, normalization of training set sequences, and a reiterative search mode that helps to build training sets starting from just a few initial sequences. The optimized intron training sets used in this service are discussed in (4), and a recent application, the discovery of unorthodox trans-spliced group I introns in Trichoplax mtDNA, is described in (5). The current version of RNAweasel is based on ERPIN version 5.2.1.
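
The reiterative search mode can be pictured as a simple loop: search with the current training set, add any new hits, and repeat until nothing new turns up. The Python sketch below is only an illustration of that loop. The names iterative_training_set and toy_search are hypothetical, and the toy Hamming-distance matcher merely stands in for a real profile search such as ERPIN; in practice, new hits would also be inspected and realigned before being accepted.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def toy_search(training, candidates, max_mismatches=1):
    """Toy stand-in for a search engine (NOT ERPIN): a candidate is a hit
    if it is within max_mismatches of any training sequence of equal length."""
    return [c for c in candidates
            if any(len(c) == len(t) and hamming(c, t) <= max_mismatches
                   for t in training)]

def iterative_training_set(seed, candidates, search=toy_search, max_rounds=10):
    """Grow a training set from a few seed sequences by repeated searching."""
    training = list(seed)
    for _ in range(max_rounds):
        new_hits = [h for h in search(training, candidates) if h not in training]
        if not new_hits:          # converged: this round found nothing new
            break
        training.extend(new_hits)
    return training

seed = ["GCAAAGC"]
candidates = ["GCAUAGC", "GCUUAGC", "UUUUUUU"]
print(iterative_training_set(seed, candidates))
```

In this toy run the second candidate is only reachable after the first has joined the training set, which is the point of iterating: each round widens what the next round can recognize.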

RNAweasel searches mitochondrial and plastid genomes for:


(1) Gautheret D and Lambert A (2001)
      Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.
      J Mol Biol 313:1003-1011

(2) Eddy SR and Durbin R (1994)
      RNA sequence analysis using covariance models.
      Nucleic Acids Res 22:2079-2088

(3) Smith SW, Overbeek R, Woese CR, Gilbert W and Gillevet PM (1994)
      The genetic data environment an expandable GUI for multiple sequence analysis.
      Comput Appl Biosci 10:671-675

(4) Lang BF, Laforest M-J and Burger G (2007)
      Mitochondrial introns: a critical view.
      Trends Genet 23

(5) Burger G, Yan Y, Javadi P and Lang BF (2009)
      Group I-intron trans-splicing and mRNA editing in mitochondria of placozoan animals.
      Trends Genet 25:381-386


Support has been generously provided by NSERC, Genome Quebec/Canada and the Canada Research Chairs program. Special thanks to Guy Troughton (guytroughton@bigpond.com) for permitting us to use his artistic weasel drawing.