MFannot
We have developed an automated
annotation tool for mitochondrial and plastid genomes that requires
little if any manual correction (reducing annotation work that
often takes many days to just a few minutes). It makes intensive use
of the RNA/intron detection tools described below, and is
particularly helpful with organelle genomes that contain lots of
introns. Intron-exon boundaries are identified by a combination
of intron splice rules and exon similarities, and are thus
precise in most instances. There is also a separate procedure
for detecting small introns (as small as three nucleotides long,
in the documented case of Podospora anserina).
MFannot outputs
listings of gene coordinates, either in a format that can be
directly loaded into NCBI sequence submission tools, or in
masterfile format (computer-parsable as well as human-readable;
annotations embedded in the sequence). In its current form,
MFannot will not properly process bilaterian mtDNAs, and we do
not plan to develop this feature.
Similarly, annotation of plant mitochondrial genomes is less effective.
We strongly recommend that MFannot annotations always be
carefully validated by an expert before GenBank submission,
in particular in the case of the large and small subunit rRNA
genes (rnl and rns).
At this time, introns in these genes are only flagged as
present, but not positioned within a proper gene model
(exon/intron structure). For this, manual expert work is
required, either by sequence comparison with genes from closely
related species (preferably genes that contain no or few
introns), or by using RNA-seq data that allow exons to be
positioned with relative confidence. 'Relative', because
interpretation of mitochondrial RNA-seq read mapping tends to be
challenging in many instances: inefficient intron splicing makes
clear-cut exon-intron boundaries difficult to identify.
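As a rough illustration of what "positioning introns within a gene model" means, the sketch below derives intron intervals from a list of exon coordinates, such as an expert might obtain from sequence comparison or RNA-seq mapping. It is not part of MFannot; the function name, the 1-based inclusive coordinate convention, the forward-strand assumption and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

# Assumption: 1-based, inclusive (start, end) coordinates on the forward strand.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons as intron intervals."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap; cannot build a gene model")
        # Directly adjacent exons (no gap) do not define an intron.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for a gene with two introns.
example_exons = [(100, 450), (1800, 2300), (3900, 4100)]
example_introns = introns_from_exons(example_exons)
```

In practice the hard part is obtaining reliable exon coordinates in the first place; once they are known, the intron positions follow mechanically.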
Curiously, most of the mitochondrial
genome research community seems to think that RNA-seq data are a
luxury and not a necessity, whereas the availability of RNA
sequence is otherwise all but required for the annotation of
nuclear genomes. We recommend producing RNA-seq data not only to
check gene annotations: you may also identify very small
exons that are difficult to infer in intron-rich mtDNA, and
detect cool things like RNA editing, trans-splicing, ribosomal
hopping, etc.
Results produced by our service
will be sent by email only (but you may venture to install
MFannot locally on a Linux box, from our GitHub page). We
recommend that you view the file attached to the email with an ASCII
viewer set to a fixed-width (monospace) font. In case of problems, comments or
questions, please contact Franz.Lang [at] Umontreal.ca
UPDATES: We are currently working on a variety of
improvements that will result in regular updates within the
coming months. Please provide us with feedback if
any functionality has been lost. A gradual replacement
of ERPIN intron predictions by cmsearch (Infernal package)
will allow us to identify introns that are currently not
recognized, so gene models are also expected to be more
complete.
Update 23
October 2019: a new version has been installed that is
(i) more effective in intron recognition, (ii) more complete in
annotation of ORFs, dpo and rpo (which are usually relics of
mitochondrial plasmid integration into the mtDNA) and (iii)
provides E-values for intron and gene identification when
either Infernal or HMM searches are used.
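For readers running Infernal themselves, the following sketch shows one way to filter hits by E-value. It assumes the default tabular output written by `cmsearch --tblout` (whitespace-separated, with the E-value in the 16th column), not MFannot's own report format, which is not described here. The function name, the default threshold and the sample hits (`contig_1`, `model_A`, `model_B`) are invented for illustration.

```python
from typing import Dict, List


def parse_cmsearch_tblout(text: str, max_evalue: float = 1e-5) -> List[Dict[str, object]]:
    """Parse default cmsearch --tblout text and keep hits with E-value <= max_evalue."""
    hits: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 columns; the last (description) may itself contain spaces.
        fields = line.split(None, 17)
        if len(fields) < 17:
            continue  # not a well-formed hit line
        evalue = float(fields[15])
        if evalue > max_evalue:
            continue
        hits.append({
            "target": fields[0],         # sequence that was searched
            "model": fields[2],          # covariance model (query) name
            "seq_from": int(fields[7]),
            "seq_to": int(fields[8]),
            "strand": fields[9],
            "score": float(fields[14]),
            "evalue": evalue,
        })
    return hits


# Invented sample data in the assumed column layout.
SAMPLE = """\
#target name accession query name accession mdl mdl from mdl to seq from seq to strand trunc pass gc bias score E-value inc description of target
contig_1 - model_A - cm 1 90 1200 1289 + no 1 0.35 0.0 55.2 3.1e-09 ! -
contig_1 - model_B - cm 1 70 5400 5331 - no 1 0.30 0.0 12.4 0.8 ? -
"""

significant_hits = parse_cmsearch_tblout(SAMPLE)
```

The threshold is a judgment call; a weak hit that is dropped here may still be worth a manual look in an intron-rich gene.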
RNAweasel
RNAweasel predicts complex, structured
mitochondrial (and other organelle) RNAs, using ERPIN (1) as a
search engine. ERPIN's search algorithm is based on RNA
secondary structure profiles, which are computed from RNA
sequence alignments plus user-defined secondary structure
information as an input. Much of its efficiency stems from the
definition of precisely delimited structural elements that can
be searched individually or in combination, by using a defined
search order ('search strategy'). It is currently the
second-most sensitive search algorithm for structured RNAs
(following the outstanding covariance-based Infernal program,
which is based on Cove [2]). We are planning to transition to Infernal-based
covariance models for RNA predictions in the near future (2020?),
because (i) these are more sensitive, including in the detection of
partial RNA motifs, and (ii) the search algorithm has
been greatly improved in terms of execution speed and
sensitivity.