
About MFannot and RNAweasel

Background on the two pipelines this site exposes — what they do, what they don’t, and the people and references behind them. For day-to-day usage see the Docs; for the HTTP contract see the API reference.

MFannot

We have developed automated annotation for mitochondrial and plastid genomes that requires little if any manual correction, reducing what is often many days of manual annotation to just a few minutes. It makes heavy use of the RNA/intron detection tools described below, and is particularly helpful for organelle genomes that contain many introns. Intron-exon boundaries are identified by a combination of intron splice rules and exon similarities, and are thus precise in most instances. There is also a separate procedure for detecting small introns (as small as three nucleotides long, in the documented case of Podospora anserina).

MFannot outputs listings of gene coordinates, either in a format that can be loaded directly into NCBI sequence submission tools, or in masterfile format (machine-parsable as well as human-readable, with annotations embedded in the sequence). In its current form, MFannot does not process bilaterian mtDNAs properly, and we do not plan to add support for them. Annotation of plant mitochondrial genomes is also less effective.

We strongly recommend that an expert carefully validate MFannot annotations before GenBank submission, in particular for the large and small rRNA subunit genes (rnl and rns). At present, introns in these genes are only flagged as present; they are not positioned within a proper gene model (exon/intron structure). Building that model requires manual expert work, either by sequence comparison with genes from closely related species (preferably genes that contain few or no introns), or by using RNA-seq data that allow exons to be positioned with relative confidence. ‘Relative’, because mitochondrial RNA-seq read mapping is often challenging to interpret: intron splicing tends to be inefficient, which makes clear-cut exon-intron boundaries difficult to identify.

Curiously, most of the mito-genome research community seems to regard RNA-seq data as a luxury rather than a necessity, whereas RNA sequence data are all but required for the annotation of nuclear genomes. We recommend producing RNA-seq data for more than checking gene annotations: you may identify very small exons that are difficult to infer in intron-rich mtDNA, and also detect cool things like RNA editing, trans-splicing, ribosomal hopping, etc.

For problems, comments or questions specific to MFannot, please contact Franz.Lang [at] Umontreal.ca.

RNAweasel

RNAweasel predicts complex, structured mitochondrial (and other organelle) RNAs, using ERPIN (1) as a search engine. ERPIN’s search algorithm is based on RNA secondary-structure profiles, which are computed from RNA sequence alignments plus user-defined secondary-structure information. Much of its efficiency stems from the definition of precisely delimited structural elements that can be searched individually or in combination, using a defined search order (‘search strategy’). It is currently the second-most sensitive search algorithm for structured RNAs, behind only the outstanding covariance-based Infernal program, which is based on Cove (2). We are planning to transition to Infernal-based covariance models for RNA predictions, because (i) they are more sensitive, including in the detection of partial RNA motifs, and (ii) the Infernal search algorithm has been greatly improved in terms of execution speed and sensitivity.
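To make the idea of a secondary-structure profile more concrete, here is a minimal sketch in Python. It is a toy illustration of the general principle (log-odds scores per single-stranded column and per base-paired column pair, derived from an alignment plus a user-supplied pairing list), and should not be read as ERPIN's actual implementation. All function names, the uniform background frequencies, the pseudocount, and the gap-free toy alignment are assumptions made for this example.

```python
import math
from collections import Counter

BASES = "ACGU"
PAIRS = [a + b for a in BASES for b in BASES]  # all 16 ordered base pairs


def column_lod(symbols, alphabet, background, pseudocount=1.0):
    """Log-odds score of every alphabet symbol for one alignment column."""
    counts = Counter(symbols)
    total = len(symbols) + pseudocount * len(alphabet)
    return {
        s: math.log2(((counts[s] + pseudocount) / total) / background[s])
        for s in alphabet
    }


def build_profile(alignment, pairs):
    """Build a toy profile from gap-free, equal-length RNA strings.

    `pairs` lists the (i, j) column indices that base-pair; this plays the
    role of the user-defined secondary-structure information.
    """
    width = len(alignment[0])
    paired = {k for ij in pairs for k in ij}
    bg_single = {b: 1 / 4 for b in BASES}    # assumed uniform background
    bg_pair = {p: 1 / 16 for p in PAIRS}     # assumed uniform background
    single = {
        i: column_lod([seq[i] for seq in alignment], BASES, bg_single)
        for i in range(width)
        if i not in paired
    }
    helix = {
        (i, j): column_lod([seq[i] + seq[j] for seq in alignment], PAIRS, bg_pair)
        for i, j in pairs
    }
    return single, helix


def score(window, profile):
    """Sum single-strand and helix log-odds for one window (A/C/G/U only)."""
    single, helix = profile
    total = sum(table[window[i]] for i, table in single.items())
    total += sum(table[window[i] + window[j]] for (i, j), table in helix.items())
    return total


def scan(sequence, profile, width, threshold):
    """Slide a window along `sequence`; return (start, score) for each hit."""
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(sequence[start:start + width], profile)
        if s >= threshold:
            hits.append((start, s))
    return hits


# Hypothetical training set: four small hairpins, two base pairs each.
training = ["GCGAAAGC", "CGGAAACG", "AUGCAAAU", "GGGAGACC"]
profile = build_profile(training, pairs=[(0, 7), (1, 6)])
hits = scan("UUUUGCGAAAGCUUUU", profile, width=8, threshold=0.0)
```

Scoring paired columns jointly is what lets a profile of this kind reward compensatory changes (G-C replaced by A-U, for instance) that a plain per-column sequence profile would penalize.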

The key prerequisites for using ERPIN are correctly aligned RNA sequences to serve as training sets, and precise secondary-structure definitions deduced from them. We have therefore developed the RNAweasel tools for compiling and manipulating sequence training sets. They include easy visualization and editing of alignments and structure definitions (using GDE; 3), automatic alignment of ERPIN results, normalization of training-set sequences, and a reiterative search mode that helps build training sets starting from just a few initial sequences. A set of optimized intron training sets used in this service is discussed in (4), and a recent application, the discovery of unorthodox trans-spliced group I introns in Trichoplax mtDNA, is described in (5). The current version of RNAweasel is based on ERPIN version 5.2.1.
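The reiterative search mode can be pictured as a simple loop: build a model from the current training set, search, add any new hits, and repeat until nothing new turns up. The sketch below shows only that control flow. The function names are hypothetical, and the model and search steps are deliberately trivial stand-ins (a Hamming-distance match against the training set) rather than ERPIN profiles; in practice, new hits would presumably be inspected and aligned before being accepted.

```python
def iterative_training_set(seeds, database, build_model, search, max_rounds=10):
    """Grow a training set by repeated search until no new hits appear."""
    training = list(dict.fromkeys(seeds))  # de-duplicate, keep input order
    for _ in range(max_rounds):
        model = build_model(training)
        new_hits = [h for h in search(model, database) if h not in training]
        if not new_hits:
            break  # converged: this round found nothing new
        training.extend(new_hits)
    return training


def hamming(a, b):
    """Number of mismatching positions (assumes equal-length strings)."""
    return sum(x != y for x, y in zip(a, b))


def toy_build_model(training):
    """Stand-in for profile construction: the 'model' is the set itself."""
    return tuple(training)


def toy_search(model, database):
    """Stand-in for a profile search: accept sequences within one mismatch."""
    return [s for s in database if any(hamming(s, t) <= 1 for t in model)]


# Hypothetical data: the third sequence is only reachable via the second.
seeds = ["GCGAAAGC"]
database = ["GCGAAAGC", "GCGCAAGC", "GCGCGAGC", "UUUUUUUU"]
result = iterative_training_set(seeds, database, toy_build_model, toy_search)
```

In this toy run the third database sequence is two mismatches away from the seed, so it is only recovered in the second round, after the intermediate sequence has joined the training set. That is the benefit of iterating rather than searching once.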

RNAweasel searches mitochondrial and plastid genomes for:

introns of group I and group II
tRNAs
RNase P RNA (rnpB)
5S (rrn5) and small-subunit rRNA (rns)

Updates

A gradual replacement of ERPIN intron predictions by cmsearch (Infernal package) will allow us to identify introns that are currently not recognized, so gene models are expected to become more complete. Please let us know if functionality has been lost across versions.

23 October 2019

A new version was installed that is (i) more effective in intron recognition, (ii) more complete in annotation of ORFs, dpo and rpo (which are usually relics of mitochondrial plasmid integration into the mtDNA) and (iii) provides E-values for intron and gene identification in case of either Infernal or HMM searches.

References

1. Gautheret D and Lambert A (2001). Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 313:1003–1011.
2. Eddy SR and Durbin R (1994). RNA sequence analysis using covariance models. Nucleic Acids Res 22:2079–2088.
3. Smith SW, Overbeek R, Woese CR, Gilbert W and Gillevet PM (1994). The genetic data environment: an expandable GUI for multiple sequence analysis. Comput Appl Biosci 10:671–675.
4. Lang BF, Laforest M-J and Burger G (2007). Mitochondrial introns: a critical view. Trends Genet 23:119–125.
5. Burger G, Yan Y, Javadi P and Lang BF (2009). Group I-intron trans-splicing and mRNA editing in mitochondria of placozoan animals. Trends Genet 25:381–386.

Contact

Should you encounter any issues, please contact B.Franz.Lang [at] gmail.com.

Acknowledgements

Support has been generously provided by NSERC, Genome Quebec / Canada and the Canadian Research Chair program. Special thanks to Guy Troughton for permitting us to use his artistic weasel drawing.